CN112416296A

CN112416296A - Implementation method and device for calculating arc tangent function

Info

Publication number: CN112416296A
Application number: CN202011264893.2A
Authority: CN
Inventors: 陈虎; 龙科莅; 伍彬山; 杨焕荣
Original assignee: Hunan Guliang Microelectronics Co ltd
Current assignee: Hunan Guliang Microelectronics Co ltd
Priority date: 2020-11-12
Filing date: 2020-11-12
Publication date: 2021-02-26

Abstract

The invention discloses a method and a device for computing the arctangent function. The method comprises the following steps. Step S1, preprocessing: a quadrant reference angle and the corresponding coordinate ratio are obtained from the input two-dimensional coordinate value. Step S2, monotonicity control and table lookup: strict monotonicity of the output result is ensured by controlling the density of the anchor values and by selecting a first threshold and a second threshold; the coefficients are obtained by table lookup, and the variable of the polynomial operation is obtained by an addition operation. Step S3, polynomial operation: the polynomial operation is performed with the obtained coefficients and the variable of the polynomial operation. Step S4, post-processing: the result of the arctangent operation is added to the quadrant reference angle, then normalized and rounded, to obtain the arctangent result corresponding to the input two-dimensional coordinate value. The device is used to implement the method. The invention has the advantages of a simple principle, low hardware overhead and high precision.

Description

Implementation method and device for calculating arc tangent function

Technical Field

The invention mainly relates to the technical field of processors, in particular to a method and a device for calculating an arc tangent function.

Background

Evaluating the arctangent function is an important and common operation in scientific computing and engineering design. It is widely used in fields such as image processing and error calculation, and, with the current surge of interest in artificial intelligence and deep learning, it also serves as an important activation function in deep neural networks.

The arctangent function is the inverse of the tangent function f(x) = tan x. In general, the arctangent function takes inputs in the range (-∞, +∞) and produces outputs in the range (-π/2, π/2); this is referred to as the two-quadrant arctangent function.

In the prior art, in order to expand the output range, the input may be processed first: the quadrant in which the input coordinate point lies is determined, a two-quadrant arctangent operation is then performed, and the final result is obtained from that quadrant and the result of the two-quadrant arctangent operation. The input of this method is a two-dimensional coordinate value and the output range is (-π, π]; it is called the four-quadrant arctangent function.

The main methods for implementing the arctangent function in hardware circuits are polynomial approximation, the coordinate rotation method (CORDIC), table lookup and so on, wherein:

in the coordinate rotation method, the higher the required result precision, the more iterations are needed and the larger the delay;

although the table lookup method is fast, the required table space grows geometrically as the operation precision increases;

the piecewise polynomial approximation method is fast, occupies little table space and few other hardware resources, and is suitable for high-precision, high-speed operation.

It is worth noting that existing hardware circuits implementing the arctangent function rarely require or guarantee strict monotonicity of the output result. In addition, it is difficult for existing hardware implementations to ensure, at moderate hardware cost, that the result precision truly reaches the precision representable by a single-precision floating-point number.

Disclosure of Invention

The technical problem to be solved by the invention is as follows: aiming at the technical problems in the prior art, the invention provides the implementation method and the device for calculating the arc tangent function, which have the advantages of simple principle, low hardware overhead and high precision.

In order to solve the technical problems, the invention adopts the following technical scheme:

an implementation method for calculating an arctangent function, comprising the steps of:

step S1: preprocessing; obtaining a quadrant reference angle and the corresponding coordinate ratio from the input two-dimensional coordinate value;

step S2: monotonicity control and table lookup; strict monotonicity of the output result is ensured by controlling the density of the anchor values and by selecting a first threshold and a second threshold, the coefficients are obtained by table lookup, and the variable of the polynomial operation is obtained by an addition operation;

step S3: polynomial operation; performing the polynomial operation with the obtained coefficients and the variable of the polynomial operation;

step S4: post-processing; adding the result of the arctangent operation to the quadrant reference angle, then normalizing and rounding, to obtain the arctangent result corresponding to the input two-dimensional coordinate value.

As a further improvement of the method of the invention, step S1 includes:

step S101: determining the quadrant in which the two-dimensional coordinate value lies from the signs of the two coordinates and the relative magnitude of their absolute values, and then obtaining the quadrant reference angle from the corresponding relation;

step S102: calculating the coordinate ratio of the two coordinate values by using floating point division according to the quadrant and the corresponding relation in the step S101;

step S103: and performing division operation by using the selected dividend and the divisor.

As a further improvement of the method of the invention, step S2 includes:

step S201: monotonicity control: selecting a first threshold and a second threshold; if the input number is greater than the first threshold, the result at the first threshold is selected as a constant result, the constant-result valid flag is set to 1 and the overflow flag is set to 1; otherwise, the constant-result valid flag is 0 and the overflow flag is 0; the anchor value x₀ is obtained from the lower n bits of the exponent (order code) and the upper m bits of the mantissa of the input number; if the input number is less than the second threshold, the anchor value is 0;

step S202: table lookup: an index is obtained by decoding the anchor value, and the coefficients are obtained by looking up the table with the index;

step S203: variable acquisition: obtaining the variable D of the polynomial operation, i.e. D = x - x₀, where x is the absolute value of the input, i.e. the input number with its sign bit set to 0, and x₀ is the anchor value.

As a further improvement of the method of the invention, step S3 includes:

step S301: multiply-add operation: performing the multiply-add operation with the obtained coefficients and the variable D of the polynomial operation to obtain a result;

step S302: normalization and rounding: rounding and normalizing the result of the multiply-add operation, where normalization converts the result of the internal operation into a data format that conforms to the standard;

step S303: selection and output: selecting and outputting either the constant result or the result of the multiply-add operation according to the corresponding selection flag.

As a further improvement of the method of the invention, step S4 includes:

step S401: aligning the exponents of the polynomial operation result and the quadrant reference angle and shifting accordingly;

step S402: adding the two numbers after exponent alignment is completed;

step S403: rounding and normalizing the result of step S402.

The present invention further provides an implementation apparatus for calculating an arctangent function, comprising:

the preprocessing module comprises a quadrant reference angle acquisition unit, a divisor selection unit and a division unit; the quadrant reference angle acquisition unit is a mapping table that maps the input two-dimensional coordinate value to one of 5 constant values according to a defined range, based on the two input sign bits and a comparison of the absolute values of the two input numbers; the divisor selection unit selects among the input coordinate values according to the output of the quadrant reference angle acquisition unit and outputs the required dividend and divisor;

the monotonicity control and table look-up module comprises a monotonicity control unit, a variable acquisition unit and a table look-up unit; the monotonicity control unit is used for controlling strict monotonicity of a result, and the variable acquisition unit is used for acquiring a variable of polynomial operation to finish the operation; the lookup table unit is used for storing coefficients used by polynomial operation;

the polynomial operation module is used for carrying out polynomial operation;

and the post-processing module is used for finishing the addition operation of the quadrant reference angle and a final result obtained by polynomial operation to obtain a quadrant arc tangent operation result.

As a further improvement of the device of the invention: the monotonicity control unit comprises an anchor point generation unit, a decoding unit and a judgment and selection unit. The anchor point generation unit controls the density of the generated anchor values by controlling the size of m, thereby ensuring strict monotonicity of the result. The decoding unit obtains the corresponding index value by decoding the anchor value. The judgment and selection unit compares the absolute value of the input number with the first threshold.

As a further improvement of the device of the invention: the polynomial operation module comprises a multiply-add unit, a normalization and rounding unit, and a selection output unit. The multiply-add unit comprises two multipliers, two adders and two exponent-alignment shift units and performs the multiply-add operation; the normalization and rounding unit rounds the result of the polynomial operation and normalizes it; the selection output unit comprises a selector that selects the output result according to the constant-result valid flag.

As a further improvement of the device of the invention: the multiply-add unit comprises multiplier #1, multiplier #2, adder #1, adder #2, exponent-alignment shift unit #1 and exponent-alignment shift unit #2. First, multiplier #1 computes the product R2_0 = C₂ × D; then exponent-alignment shift unit #1 aligns R2_0 with the coefficient C₁ to obtain R2_1; adder #1 then computes R1 = R2_1 + C₁; multiplier #2 computes the product R0_0 = R1 × D; exponent-alignment shift unit #2 aligns R0_0 with the coefficient C₀ to obtain R0_1; finally, adder #2 adds R0_1 and C₀ to obtain the result R of the multiply-add operation. The exponent-alignment shift units do not need to compare the magnitudes of their two operands, because throughout the operation the exponent of every coefficient is greater than or equal to that of the other operand it is aligned with. Shift unit #1 aligns the exponent of R2_0 to that of C₁ and then shifts the mantissa of R2_0; shift unit #2 aligns the exponent of R0_0 to that of C₀ and then shifts the mantissa of R0_0.

As a further improvement of the device of the invention: the post-processing module comprises an exponent-alignment shift unit, an addition unit, and a normalization and rounding unit; the exponent-alignment shift unit performs exponent comparison, shift-amount acquisition and right shifting; the exponent comparison compares the magnitudes of the exponents of the two numbers.

Compared with the prior art, the invention has the advantages that:

the implementation method and the device for calculating the arc tangent function have the advantages of simple principle, low hardware cost and high precision, widen the range of output results and complete the hardware implementation of the four-quadrant arc tangent function. Furthermore, the invention ensures that the output result has strict monotonicity by optimizing the density of the configured anchor point value. By optimizing the anchor value density, the storage bit number of the order code and the mantissa of each coefficient is optimized, the size of the corresponding coefficient table is limited, and the storage space of the coefficient table is small in occupation.

Drawings

FIG. 1 is a schematic flow diagram of the process of the present invention.

Fig. 2 is a schematic diagram of the structural principle of the device of the present invention.

FIG. 3 is a schematic diagram of the structure of a pre-processing module in a specific application example of the apparatus of the present invention.

FIG. 4 is a schematic diagram of the structure of the monotonicity control and lookup table module in the embodiment of the present invention.

FIG. 5 is a schematic diagram of a polynomial operation module in an embodiment of the present invention.

FIG. 6 is a schematic diagram of a multiply-add unit according to an embodiment of the present invention.

FIG. 7 is a schematic diagram of the post-processing module of the apparatus of the present invention in a specific application example.

Detailed Description

The invention will be described in further detail below with reference to the drawings and specific examples.

As shown in fig. 1, an implementation method for calculating an arctangent function of the present invention includes the steps of:

Step S1: preprocessing;

namely, according to the input two-dimensional coordinate values, the quadrant reference angle and the corresponding coordinate ratio are obtained.

Step S2: monotonicity control and table look-up;

that is, strict monotonicity of an output result is controlled by controlling the density of anchor values and selecting appropriate first and second thresholds, and a coefficient is obtained by table lookup and a variable of polynomial operation is obtained by addition operation.

Step S3: performing polynomial operation;

that is, polynomial operation is performed based on the obtained coefficients and the variables of the polynomial operation.

In a specific application example, the step S3 includes polynomial operation, normalization and rounding processing, and result selection.

Step S4: post-processing;

that is, the result of the arctangent operation is added to the quadrant reference angle, and normalization and rounding processing are performed to obtain an arctangent result corresponding to the input two-dimensional coordinate value.

In a specific application example, the step S1 may include:

Step S101: the quadrant in which the two-dimensional coordinate value lies is determined from the signs of the two coordinates and the relative magnitude of their absolute values, and the quadrant reference angle is then obtained from the corresponding relation; the specific method is shown in Tables 1 and 2 below.

That is, the quadrant reference angle is determined from the input two-dimensional coordinate value (X, Y): the quadrant is determined from the signs of the coordinates and the comparison of their absolute values as shown in Table 1, and the quadrant reference angle is then obtained from the corresponding relation shown in Table 2. For example, if the sign bits of both coordinates are 0 and |X| ≥ |Y|, the quadrant reference angle is 0, as shown in the two tables below.

TABLE 1 relationship between the quadrant and sign and absolute value of two-dimensional coordinate

TABLE 2 quadrant reference angle and coordinate ratio and the corresponding relation of the quadrant in which the coordinate is located

Quadrant (angular range)	Quadrant reference angle	Coordinate ratio
[3π/4, π]	π	Y/X
(π/4, 3π/4)	π/2	-X/Y
[-π/4, π/4]	0	Y/X
(-3π/4, -π/4)	-π/2	-X/Y
(-π, -3π/4]	-π	Y/X

Step S102: as shown in Table 2 above, the coordinate ratio of the two coordinate values is calculated by floating-point division according to the quadrant and the corresponding relation from step S101. The coordinate ratio lies in the range [-1, 1]. That is, the dividend and the divisor are selected based on the obtained quadrant reference angle. For example, if the quadrant reference angle is 0, the dividend is Y and the divisor is X.
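Steps S101 and S102 can be illustrated with a minimal software sketch of Table 2. The function names `preprocess` and `atan2_reference` are hypothetical, `math.atan` stands in for the polynomial stage, and mapping the input (0, 0) to angle 0 is an assumption; none of this is the patented circuit itself.

```python
import math

def preprocess(x, y):
    """Return (quadrant reference angle, coordinate ratio) following Table 2."""
    if x == 0.0 and y == 0.0:
        return 0.0, 0.0  # assumption: an all-zero input maps to angle 0
    if abs(x) >= abs(y):
        # Point lies within pi/4 of the X axis: the ratio is Y/X.
        if x > 0:
            ref = 0.0
        else:
            ref = math.pi if y >= 0 else -math.pi
        return ref, y / x
    # Point lies closer to the Y axis: the ratio is -X/Y.
    ref = math.pi / 2 if y > 0 else -math.pi / 2
    return ref, -x / y

def atan2_reference(x, y):
    """Four-quadrant result: reference angle plus arctangent of the ratio."""
    ref, ratio = preprocess(x, y)
    return ref + math.atan(ratio)
```

Because the divisor is always the coordinate with the larger absolute value, the ratio stays within [-1, 1], which is the interval the later polynomial stage has to cover.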

In a specific application example, the step S2 may include:

step S201: monotonicity control: an appropriate first threshold and second threshold are selected.

If the input number is greater than the first threshold, the result at the first threshold is selected as a constant result, the constant-result valid flag is set to 1 and the overflow flag is set to 1; otherwise, the constant-result valid flag is 0 and the overflow flag is 0. The anchor value x₀ is obtained from the lower n bits of the exponent (order code) and the upper m bits of the mantissa of the input number. If the input number is less than the second threshold, the anchor value is 0.

Step S202: table lookup: the anchor value is decoded to obtain an index, and the coefficients are obtained by looking up the table with the index.

In step S201, the present invention further describes the anchor values as a preferred embodiment:

the anchor value refers to the midpoint of each segment interval in the segment polynomial operation.

The form of the piecewise polynomial operation is shown in equation (1).

Result = C₀ + D × (C₁ + D × C₂)    (1)

where D = x - x₀.

The method of the invention uses formula (1) to approximate the arctangent function within each segment interval. Each segment interval is identified by its anchor value, i.e. the value of x₀ used to calculate D. The length of each segment interval is 2|D|, with |D| taken at its maximum over the segment.

The segment intervals are contiguous and non-overlapping; joined together, they cover the effective operating interval of the input variable, namely [0, 1].

The density of the anchor values is controlled by the lower n bits of the exponent and the upper m bits of the mantissa. With n fixed, the larger the value of m, the greater the density of anchor values.

The anchor values are distributed between the first threshold and the second threshold. They are not uniformly distributed: the distribution density increases gradually as the point on the number axis approaches the second threshold, so as to ensure strict monotonicity of the result; that is, m decreases gradually as the value of the lower n bits of the exponent decreases.

The second threshold must be small enough to guarantee strict monotonicity of the result.

The density of the anchor values must be large enough to guarantee strict monotonicity of the result, i.e. m must be large enough.

The density of anchor values is related to the accuracy required of the final result. The higher the required accuracy, the greater the density of anchor values, i.e. the larger the corresponding value of m, in order to ensure strict monotonicity of the result.

In the present embodiment, the first threshold is set to 1.

The lower n bits of the exponent and the upper m bits of the mantissa of the anchor value are the same as those of the input number; the sign bit of the anchor value is the constant 0; the remaining exponent bits are constants; the (m+1)-th mantissa bit, counted from the left, is 1; and all remaining mantissa bits are 0.
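The anchor construction described above can be sketched in software on IEEE-754 single-precision bit patterns. This is an illustrative sketch only: the helper names are hypothetical, m is assumed to satisfy 0 ≤ m ≤ 22, and the whole 8-bit exponent is copied from the input instead of only its lower n bits, which is a simplification of the scheme in the text.

```python
import struct

def f32_bits(x):
    """Bit pattern of x rounded to IEEE-754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b):
    """Single-precision value encoded by the 32-bit pattern b."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def anchor_value(x, m):
    """Anchor x0 for |x|: keep the exponent and the upper m mantissa bits,
    set mantissa bit m+1 (counted from the left) to 1, clear the rest."""
    bits = f32_bits(abs(x))               # sign bit of the anchor is 0
    exponent = (bits >> 23) & 0xFF        # simplification: full exponent kept
    mantissa = bits & 0x7FFFFF
    kept = (mantissa >> (23 - m)) << (23 - m)   # upper m mantissa bits
    midpoint_bit = 1 << (23 - m - 1)            # the (m+1)-th mantissa bit
    return bits_f32((exponent << 23) | kept | midpoint_bit)
```

Setting the (m+1)-th bit places the anchor at the midpoint of the segment selected by the exponent and the upper m mantissa bits, so |x - x₀| never exceeds half the segment length. For example, with m = 2 the input 0.3 falls in the segment [0.25, 0.3125), whose midpoint is 0.28125.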

In step S202, as a preferred embodiment, the anchor value is decoded in two stages, i.e. the index is obtained by decoding part of the exponent and part of the mantissa of the anchor value.

The exponent bits of the anchor value used for decoding are its lower n bits, and the mantissa bits used are its upper M bits, where M is the maximum value of m in step S203.

Not all coefficient bits are stored in the coefficient table; for example, the sign bit and some of the exponent bits and mantissa bits of a coefficient are not stored. These bits are restored by the circuit after the table lookup.

In step S203, the x used to calculate the polynomial variable D and the anchor value x₀ in the same operation have the same exponent.

In a specific application example, the step S3 may include:

step S301: multiplication and addition operation: the obtained coefficient and a variable D of the polynomial operation are subjected to a multiply-add operation with reference to formula (1), and a result is obtained.

Step S302: normalization and rounding processing: the result of the multiply-add operation is rounded and normalized, where normalization is the conversion of the result of the internal operation into a standard-compliant data form, such as a single-precision form that is compliant with the IEEE-754 standard.

In step S301, the multiply-add operation mainly comprises two multiplications, two additions and two exponent-alignment shifts.

In the calculation, the coefficient C₂ is first multiplied by the variable D to obtain the result R2_0; R2_0 and the coefficient C₁ are exponent-aligned, shifted and added to obtain the result R1; the variable D is multiplied by R1 to obtain the result R0; finally, R0 and C₀ are exponent-aligned, shifted and added to obtain the result R, which is the result of the multiply-add operation.

In multiply-add operations, the intermediate results must guarantee sufficient accuracy to ensure strict monotonicity of the results.
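The data flow of the multiply-add operation can be sketched as follows. The helper `truncate_mantissa` and the default internal width of 32 mantissa bits are assumptions made for illustration; the text only requires that intermediate results be truncated while retaining sufficient accuracy.

```python
import math

def truncate_mantissa(v, bits):
    """Truncate v toward zero to `bits` significant mantissa bits."""
    if v == 0.0:
        return 0.0
    frac, exp = math.frexp(v)                # v = frac * 2**exp, 0.5 <= |frac| < 1
    scaled = math.trunc(frac * (1 << bits))  # drop the low-order bits
    return math.ldexp(scaled, exp - bits)

def multiply_add(c0, c1, c2, d, bits=32):
    """Evaluate C0 + D*(C1 + D*C2) with truncated intermediate results."""
    r2_0 = truncate_mantissa(c2 * d, bits)   # multiplier #1
    r1 = truncate_mantissa(r2_0 + c1, bits)  # alignment shift #1, adder #1
    r0_0 = truncate_mantissa(r1 * d, bits)   # multiplier #2
    return r0_0 + c0                         # alignment shift #2, adder #2
```

Only the value returned by `multiply_add` would then be rounded; choosing `bits` too small makes the truncation error visible in the final result, which is why the internal width has to exceed the output mantissa width.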

In step S302, the mantissa of the result of the multiply-add operation is rounded, and the mantissa, the sign bit and the exponent are then concatenated to obtain a result that meets the requirements of the operation format.

The normalization process essentially concatenates the rounded mantissa, the exponent and the sign bit into an output that meets the format requirements.

During rounding, if a carry out of the most significant bit of the mantissa occurs, the exponent must be incremented by 1.
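The rounding step, including the carry into the exponent, can be sketched on integer mantissas. The rounding mode is not fixed by the text above; round-to-nearest with ties to even is assumed here, and the function name and parameters are hypothetical.

```python
def round_mantissa(mant, exp, extra, width=24):
    """Round a (width + extra)-bit mantissa to `width` bits.

    mant  : integer mantissa including the hidden leading 1
    exp   : exponent of the number
    extra : number of low-order bits to round away (must be >= 1)
    Returns (rounded mantissa, adjusted exponent).
    """
    kept = mant >> extra
    remainder = mant & ((1 << extra) - 1)
    half = 1 << (extra - 1)
    # Assumption: round to nearest, ties to even.
    if remainder > half or (remainder == half and (kept & 1)):
        kept += 1
    if kept >> width:      # carry out of the most significant mantissa bit
        kept >>= 1
        exp += 1           # the exponent is incremented by 1
    return kept, exp
```

With a 4-bit mantissa and 2 extra bits, the input 0b111111 rounds up to 0b10000, which overflows the mantissa width, so the result becomes mantissa 0b1000 with the exponent increased by 1.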

In step S303, the specific method is to detect the valid flag bit of the constant result, and if the flag bit is valid, select the constant result to output; otherwise, outputting the result of the multiply-add operation.

In the whole multiplication and addition operation process, only the final result of the multiplication and addition operation is rounded, and the intermediate result is directly truncated.

In a specific application example, the step S4 may include:

Step S401: the exponents of the polynomial operation result and the quadrant reference angle are aligned and the mantissa is shifted accordingly.

Step S402: the two numbers are added after exponent alignment is completed.

Step S403: the result of step S402 is rounded and normalized.

In step S401, the quadrant reference angle includes 5 values, as shown in table 2.

This addition operation includes a comparison operation, an exponent-alignment shift operation and a mantissa addition operation.

To compare the magnitudes of the two numbers, only the exponents and mantissas are compared, regardless of the sign bits. Specifically, the exponents are compared first; if they are the same, the mantissas are compared; otherwise, the number with the larger exponent is the larger number.

In the shift operation, the exponent of the smaller number is aligned to that of the larger number, and the mantissa of the smaller number is then shifted right by the exponent difference. For example, if the exponents differ by 2, the mantissa of the smaller number is shifted right by 2 bits.

In step S402, the addition of the two numbers includes the mantissa addition and the carry into the exponent.

First, the mantissas of the two numbers are added; if the mantissa addition produces a carry out of the most significant bit, the exponent must be incremented by 1.
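The comparison, exponent alignment and mantissa addition of steps S401 and S402 can be sketched on (exponent, mantissa) pairs. This sketch assumes both operands have the same sign, truncates the bits shifted out, and uses hypothetical names; handling of operands with opposite signs is omitted.

```python
def align_and_add(exp_a, mant_a, exp_b, mant_b, width=24):
    """Add two same-sign numbers given as (exponent, integer mantissa).

    Each mantissa has `width` bits including the hidden leading 1.
    Returns (exponent, mantissa) of the sum before rounding.
    """
    # Comparison: exponents first, mantissas only if the exponents are equal.
    if (exp_a, mant_a) < (exp_b, mant_b):
        exp_a, mant_a, exp_b, mant_b = exp_b, mant_b, exp_a, mant_a
    # Alignment: shift the smaller number right by the exponent difference.
    mant_b >>= exp_a - exp_b
    total = mant_a + mant_b
    exp = exp_a
    if total >> width:     # carry out of the most significant bit
        total >>= 1
        exp += 1
    return exp, total
```

With 4-bit mantissas, adding 1.5 (exponent 0, mantissa 0b1100) and 0.25 (exponent -2, mantissa 0b1000) shifts the smaller mantissa right by 2 bits and yields mantissa 0b1110 at exponent 0, i.e. 1.75.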

In step S403, the rounding and normalization steps are substantially the same as those in step S302.

As shown in fig. 2, the present invention further provides an implementation apparatus for calculating an arctangent function, which comprises:

The preprocessing module mainly comprises a quadrant reference angle acquisition unit, a divisor selection unit and a division unit. The quadrant reference angle acquisition unit is a mapping table that maps the input two-dimensional coordinate value to one of 5 constant values according to a defined range, based on the two input sign bits and a comparison of the absolute values of the two input numbers. The specific mapping method is shown in Tables 1 and 2. The divisor selection unit selects among the input coordinate values according to the output of the quadrant reference angle acquisition unit and outputs the required dividend and divisor; the specific selection mode follows Table 1. The division unit is a typical floating-point division unit. Specifically, if both inputs are 0, the output is 0; if only the divisor is 0, the output is infinity, i.e. in the IEEE-754 floating-point representation the exponent bits of the output are all 1 and the mantissa bits are all 0.

As shown in Fig. 3, the preprocessing module includes an M101 quadrant reference angle acquisition unit, an M102 divisor selection unit and an M103 floating-point division unit. In this example, the M101 quadrant reference angle acquisition unit is a mapping table that maps the input two-dimensional coordinate value to one of 5 constant values according to a defined range, based on the two input sign bits and a comparison of the absolute values of the two input numbers. The specific mapping method is shown in Tables 1 and 2. The M102 divisor selection unit selects among the input coordinate values according to the output of the quadrant reference angle acquisition unit and outputs the required dividend and divisor; its structure is simple and is therefore not described in detail, and the specific selection mode follows Table 2. The M103 division unit is a typical floating-point division unit. It first checks its inputs: if both inputs are 0, the output is 0; if only the divisor is 0, the output is infinity, i.e. in the IEEE-754 floating-point representation the exponent bits are all 1 and the mantissa bits are all 0. After this check, the floating-point division is performed, which mainly comprises exponent subtraction and mantissa division.
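The input checks of the division unit can be sketched as follows. The function names are hypothetical, and giving the infinite result the sign of the dividend is an assumption; the text above specifies only that the output is infinity.

```python
import math
import struct

def guarded_divide(dividend, divisor):
    """Floating-point division with the special cases of the division unit."""
    if dividend == 0.0 and divisor == 0.0:
        return 0.0                                 # both inputs 0: output 0
    if divisor == 0.0:
        return math.copysign(math.inf, dividend)   # assumption: sign follows the dividend
    return dividend / divisor

def f32_bits(x):
    """Bit pattern of x rounded to IEEE-754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]
```

In single precision, positive infinity has the bit pattern 0x7F800000: all eight exponent bits are 1 and all 23 mantissa bits are 0, matching the description above.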

The monotonicity control and table lookup module mainly comprises a monotonicity control unit, a variable acquisition unit and a lookup table unit. The monotonicity control unit mainly comprises an anchor point generation unit, a decoding unit and a judgment and selection unit. The anchor point generation unit controls the density of the generated anchor values by controlling the size of m, thereby ensuring strict monotonicity of the result; to ensure strict monotonicity, the value of m must be sufficiently large. The decoding unit decodes the anchor value to obtain the corresponding index value; the decoding simply follows the mapping between the anchor value and the required index value. The judgment and selection unit compares the absolute value of the input number with the first threshold. It may compare only the exponents, and the comparison can be implemented with simple logic or with an addition unit. In addition, to decide whether the overflow flag is set to 1, a final check tests whether the mantissa equals a certain constant: if it does, the overflow flag is 0; otherwise the overflow flag is set to 1.

Referring to fig. 4, in the present embodiment, the monotonicity controlling and table lookup module mainly includes an M201 monotonicity controlling unit, an M202 variable obtaining unit, and an M203 table lookup unit.

The M202 variable acquisition unit obtains the variable of the polynomial operation by computing D = x - x₀, and mainly comprises an addition unit and a left-shift unit. In the variable acquisition unit, the subtraction of the two numbers can be implemented by adding their complements, i.e. both the input and output numbers are in complement form. The addition unit is a basic integer addition unit. The two numbers being added have the same exponent, so only the mantissas need to be added. After the addition is completed, the M202 variable acquisition unit uses the left-shift unit to remove the leading zeros of the result mantissa.
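The operation of the variable acquisition unit can be sketched on integer mantissas that share one exponent. The function name and the tuple layout are hypothetical, Python's signed integers stand in for the complement arithmetic of the hardware, and the all-zero encoding returned for D = 0 is an assumption.

```python
def polynomial_variable(exp, mant_x, mant_x0, width=24):
    """Compute D = x - x0 for x and x0 that share the exponent `exp`.

    Mantissas are `width`-bit integers including the hidden leading 1.
    Returns (sign, exponent, mantissa) of D, normalized so that the
    mantissa again has its leading 1 in the top bit.
    """
    diff = mant_x - mant_x0        # same exponent: only mantissas are subtracted
    sign = 1 if diff < 0 else 0
    magnitude = abs(diff)
    if magnitude == 0:
        return 0, 0, 0             # assumption: D = 0 is encoded as all zeros
    shift = width - magnitude.bit_length()   # number of leading zeros to remove
    return sign, exp - shift, magnitude << shift
```

With 4-bit mantissas and exponent 0, x = 1.375 (0b1011) and x₀ = 1.25 (0b1010) differ by 0b0001; removing the three leading zeros gives mantissa 0b1000 at exponent -3, i.e. D = 0.125.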

The M203 lookup table unit is a non-volatile storage device, or a circuit capable of storing a coefficient table, that stores the three coefficients used in the polynomial operation. In this embodiment, the coefficients are stored in floating-point form, and the lookup table need not store the sign bit of a coefficient. The coefficients stored inside the M203 lookup table unit are in complement form. After a coefficient is fetched through the index, the bits that were not stored, including the sign bit and part of the exponent and mantissa bits, are restored before the coefficient is output. The coefficients C₀, C₁ and C₂ stored in the lookup table unit are obtained by Taylor expansion. The coefficient values correspond one-to-one to the anchor values, and a definite functional relationship maps one to the other. Specifically, the coefficients are obtained from the anchor values through this functional relationship, which in turn comes from a second-order Taylor expansion: such an expansion yields a polynomial of the form of formula (1), from which the formulas for the coefficients follow directly. In this embodiment, when the input number is less than or equal to the second threshold, the three coefficients are not obtained in this way, in order to ensure high accuracy and strict monotonicity of the output result; instead, C₀ and C₂ are set to 0 and C₁ is set to 1. When the coefficients are computed, they are given sufficient bit width.
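For illustration, the second-order Taylor expansion of arctan about an anchor x₀ gives C₀ = arctan(x₀), C₁ = 1/(1 + x₀²) and C₂ = -x₀/(1 + x₀²)². The sketch below computes these in double precision and applies formula (1); the function names and the value 2⁻¹² used for the second threshold are hypothetical, since the text does not state the threshold.

```python
import math

def taylor_coefficients(x0):
    """Second-order Taylor coefficients of arctan about the anchor x0."""
    denom = 1.0 + x0 * x0
    c0 = math.atan(x0)            # f(x0)
    c1 = 1.0 / denom              # f'(x0)
    c2 = -x0 / (denom * denom)    # f''(x0) / 2
    return c0, c1, c2

def approx_atan(x, x0, second_threshold=2.0 ** -12):
    """Approximate arctan(x) for x in [0, 1] with formula (1)."""
    if x <= second_threshold:
        # Small-input case from the text: C0 = C2 = 0, C1 = 1, anchor 0.
        c0, c1, c2 = 0.0, 1.0, 0.0
        d = x
    else:
        c0, c1, c2 = taylor_coefficients(x0)
        d = x - x0
    return c0 + d * (c1 + d * c2)
```

In the small-input case the approximation reduces to Result = x, which is trivially strictly monotonic. For larger inputs the truncation error shrinks with |D|³, which is why a denser set of anchors supports higher accuracy.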

Referring to fig. 5 and 6, the polynomial operation module includes an M301 multiply-add unit, an M302 normalization and rounding unit, and an M303 selection output unit. The multiply-add unit performs the multiply-add operation and mainly comprises two multipliers, two adders and two order-matching shift units: multiplier #1, multiplier #2, adder #1, adder #2, order-matching shift unit #1 and order-matching shift unit #2. In a specific operation, referring to equation (1), multiplier #1 is first used to obtain the multiplication result R2_0 of C2 × D; order-matching shift unit #1 then completes the order matching of coefficient C1 and R2_0 to obtain the result R2_1; adder #1 then obtains the result R1 of R2_1 + C1; multiplier #2 obtains the multiplication result R0_0 of R1 × D; order-matching shift unit #2 completes the order matching of coefficient C0 and R0_0 to obtain R0_1; finally, adder #2 adds R0_1 and C0 to obtain the result R of the multiply-add operation. The order-matching shift units do not need to compare the magnitudes of the two numbers, because during the operation the order code of every coefficient is greater than or equal to that of the other number with which it is order-matched. Order-matching shift unit #1 aligns the order code of R2_0 to that of C1 and then shifts the mantissa of R2_0. Order-matching shift unit #2 aligns the order code of R0_0 to that of C0 and then shifts the mantissa of R0_0.
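The data flow through the two multipliers, two order-matching shift units and two adders amounts to a Horner evaluation, R = C0 + (C1 + C2 × D) × D. The sketch below models it with integer (mantissa, exponent) pairs. The pair representation, the truncating right shift and the function names are assumptions made for illustration; the patent does not specify bit widths in this passage.

```python
def align_to(mant, exp, exp_target):
    """Order matching without comparison: re-express mant * 2**exp at
    exp_target, which is assumed to be >= exp (low bits are truncated)."""
    assert exp_target >= exp
    return mant >> (exp_target - exp)


def multiply_add(c0, c1, c2, d):
    """Each argument is a (mantissa, exponent) pair meaning mantissa * 2**exponent.

    Returns R = C0 + (C1 + C2*D)*D as a (mantissa, exponent) pair.
    """
    r2_0 = (c2[0] * d[0], c2[1] + d[1])    # multiplier #1: C2 x D
    r2_1 = align_to(*r2_0, c1[1])          # order-matching shift unit #1
    r1 = (r2_1 + c1[0], c1[1])             # adder #1: R2_1 + C1
    r0_0 = (r1[0] * d[0], r1[1] + d[1])    # multiplier #2: R1 x D
    r0_1 = align_to(*r0_0, c0[1])          # order-matching shift unit #2
    return (r0_1 + c0[0], c0[1])           # adder #2: R0_1 + C0


# C0 = 1, C1 = 1, C2 = 0, D = 0.5, all in units of 2**-4
print(multiply_add((16, -4), (16, -4), (0, -4), (8, -4)))
```

Since each product is always shifted toward the coefficient's order code, `align_to` needs only a subtraction of order codes and a right shift, with no magnitude comparison.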

The M302 normalization and rounding unit rounds the result of the polynomial operation and normalizes it. The rounding operation mainly comprises a judgment operation, a mantissa addition operation and an order-code addition operation. The judgment operation examines the lowest bits of the mantissa and decides whether a carry is needed. The mantissa addition operation performs the carry when one is needed, that is, an addition on the high-order N bits of the mantissa. The order-code addition operation adds 1 to the order code when a carry out of the most significant bit occurs after the mantissa carry operation. Normalization refers to converting the number input to the normalization unit into the required representation and outputting it. In the present invention, this operation concatenates and outputs the sign bit, the order code and the rounded mantissa bits.
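The three rounding sub-operations can be sketched on integers as follows. The rounding mode used here, round to nearest with ties to even, is an assumption, as are the function name and the convention that the mantissa carries an explicit leading bit; the text above only states that the lowest bits decide whether a carry is needed.

```python
def round_and_normalize(sign, exp, mant, width, keep):
    """Round a `width`-bit mantissa to its high `keep` bits and return the fields.

    Assumes width > keep >= 1 and ties-to-even rounding (hypothetical choice).
    """
    drop = width - keep
    high = mant >> drop                    # the N = keep bits that survive
    low = mant & ((1 << drop) - 1)         # the bits examined by the judgment
    half = 1 << (drop - 1)
    carry = low > half or (low == half and (high & 1) == 1)   # judgment operation
    if carry:
        high += 1                          # mantissa addition operation
        if high >> keep:                   # carry out of the most significant bit
            high >>= 1
            exp += 1                       # order-code addition operation
    return sign, exp, high                 # fields to be concatenated on output


print(round_and_normalize(0, 0, 0b11111111, 8, 4))
```

In the printed case every kept bit is 1, so the rounding carry ripples out of the top bit and the order code is incremented.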

The M303 selection output unit selects which result to output according to the constant-result valid flag bit. It mainly comprises a selector and a detection circuit that detects the constant-result valid flag bit: if the flag bit is valid, that is, equal to 1, the constant result is output; otherwise, the result of the polynomial operation is output.

Referring to fig. 7, the post-processing module includes an M401 order-matching shift unit, an M402 addition unit, and an M403 normalization and rounding unit. The module adds the quadrant reference angle to the final result of the polynomial operation to obtain the four-quadrant arc tangent result.
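To show how adding a quadrant reference angle yields a four-quadrant result, here is a floating-point sketch. It is not the patented datapath: `math.atan` stands in for the table-and-polynomial stage, and the particular five constants (0, ±π/2, ±π) and the sign convention of the ratio are assumptions chosen so that the final step is a pure addition.

```python
import math


def atan2_sketch(y, x):
    """Four-quadrant arctangent as reference angle + arctan(ratio).

    Hypothetical convention: the ratio always has magnitude <= 1, and its
    sign is chosen so that the reference angle is simply added.
    """
    if x == 0.0 and y == 0.0:
        return 0.0
    if abs(y) <= abs(x):
        ratio = y / x
        if x > 0:
            ref = 0.0
        else:
            ref = math.pi if y >= 0 else -math.pi
    else:
        ratio = -x / y
        ref = math.pi / 2 if y > 0 else -math.pi / 2
    return ref + math.atan(ratio)          # post-processing addition


print(atan2_sketch(1.0, -2.0), math.atan2(1.0, -2.0))
```

Keeping the ratio's magnitude at or below 1 confines the polynomial stage to a narrow input range, while the reference angle restores the correct quadrant.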

In a specific application example, the multiply-add unit comprises two multipliers, wherein multiplier #1 is far smaller in scale than multiplier #2. The inputs and outputs of the multipliers and adders in the multiply-add unit must have sufficient bit width so that no additional loss of precision occurs.

In a specific application example, the lookup table unit mainly includes a nonvolatile storage device or a circuit for storing the lookup table. If a floating-point version of the coefficients is stored, the order codes of some coefficients may be stored only in part or not at all. The mantissas of the coefficients are stored in complement form. The lookup table does not store the sign bits of the coefficients; the sign bits are supplied uniformly by the circuit.

The coefficients C0, C1, C2 stored in the lookup table unit may be obtained by Taylor expansion or by other methods, such as interpolation. All coefficients are obtained in a uniform manner; that is, if the coefficients are obtained by Taylor expansion, all of them are obtained by Taylor expansion. The coefficient values correspond one-to-one with the anchor values described in step S301, and a determined functional relationship maps between the two. Specifically, the coefficients are obtained from the anchor values through functional relationships that are derived by Taylor expansion or a similar method. For example, a polynomial in the form of formula (1) can be obtained by a second-order Taylor expansion, from which the formulas for calculating the corresponding coefficients follow directly.

When the input number is less than or equal to the second threshold, the three coefficients are not obtained by the above method; instead, to ensure high accuracy and strict monotonicity of the output result, C0 and C2 are set to 0 and C1 is set to 1.

In the method for obtaining the coefficients, the coefficients must have enough bits that the final result of the polynomial operation meets the requirements of high precision and strict monotonicity.

In a specific application example, in order to obtain strict monotonicity, the anchor point generating unit takes the low n bits of the order code and the high m bits of the mantissa of the input number as the anchor value, where m is not a fixed value but changes with the order code.
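The bit selection can be illustrated on an IEEE 754 single-precision input. The choice of that format, the function names, and passing m as a plain parameter are all assumptions; this section does not give the input format or the rule by which m varies with the order code.

```python
import struct


def _float32_bits(x):
    """Raw 32-bit pattern of x rounded to single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def anchor_bits(x, n, m):
    """Concatenate the low n exponent bits with the high m mantissa bits."""
    bits = _float32_bits(x)
    exp = (bits >> 23) & 0xFF              # 8-bit biased exponent (order code)
    mant = bits & 0x7FFFFF                 # 23-bit stored mantissa
    return ((exp & ((1 << n) - 1)) << m) | (mant >> (23 - m))


def anchor_value(x, m):
    """The anchor x0 as a number: x with all but the top m mantissa bits cleared."""
    bits = _float32_bits(x) & ~((1 << (23 - m)) - 1)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


print(anchor_bits(0.75, 3, 4), anchor_value(0.8, 4))
```

A larger m places anchors more densely within each order-code interval, which shrinks D = x - x₀ and is the lever the text describes for controlling monotonicity.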

In a specific application example, when the input number is less than or equal to the second threshold, the anchor value is 0. To obtain strict monotonicity, the second threshold must be chosen sufficiently small.

In a specific application example, the selection output unit is a selection circuit, and the selection is completed through a constant result valid identifier.

In a specific application example, the post-processing module adds the quadrant reference angle to the final result of the polynomial operation to obtain the four-quadrant arc tangent result. The order-matching shift unit performs order-code magnitude comparison, shift-code acquisition and right shifting. The magnitude comparison of the two order codes can be implemented directly by a subtraction unit, by a logic unit that compares bit by bit starting from the highest bit, or by another comparison circuit.

The shift-code acquisition operation uses a subtraction unit to subtract the smaller order code from the larger order code, yielding the shift code.

The right shift operation shifts the mantissa to the right by the number of bits given by the shift code.
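The three order-matching steps of the post-processing module, followed by the addition, can be sketched on integer (mantissa, order code) pairs. The function name and the truncating shift are assumptions; the text says only that the mantissa is shifted right, and the choice of which operand to shift (the one with the smaller order code) is the conventional one rather than something stated in this passage.

```python
def align_and_add(mant_a, exp_a, mant_b, exp_b):
    """Add two numbers given as mantissa * 2**exp after order matching.

    Returns (mantissa, exponent); bits shifted out on the right are dropped.
    """
    if exp_a >= exp_b:                     # order-code magnitude comparison
        shift = exp_a - exp_b              # shift code = larger - smaller
        return mant_a + (mant_b >> shift), exp_a
    shift = exp_b - exp_a
    return (mant_a >> shift) + mant_b, exp_b


# 8 * 2**3 + 12 * 2**1 = 88 = 11 * 2**3
print(align_and_add(8, 3, 12, 1))
```

Unlike the shift units inside the multiply-add unit, this one cannot know in advance which operand has the larger order code, which is why the comparison step is needed here.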

The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above-mentioned embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may be made by those skilled in the art without departing from the principle of the invention.

Claims

1. An implementation method for calculating an arctangent function, the steps comprising:

2. The method of claim 1, wherein the step S1 includes:

3. The method of claim 1, wherein the step S2 includes:

step S201: monotonicity control: selecting a first threshold and a second threshold; if the input number is larger than the first threshold, selecting the first threshold result as a constant result, setting the constant-result valid flag bit to 1 and the overflow flag bit to 1; otherwise, setting the constant-result valid flag bit to 0 and the overflow flag bit to 0; obtaining the anchor value x₀ according to the low n bits of the order code and the high m bits of the mantissa of the input number; if the input number is smaller than the second threshold, the anchor value is 0;

4. The method of claim 1, wherein the step S3 includes:

step S303: selecting and outputting: selectively outputting the constant result or the result of the multiply-add operation according to the corresponding selection flag.

5. The method of claim 1, wherein the step S4 includes:

step S402: adding the two numbers after the step matching is completed;

step S403: the result of step S401 is rounded and normalized.

6. An implementation apparatus for calculating an arctangent function, comprising:

the preprocessing module comprises a quadrant reference angle acquisition unit, a divisor selection unit and a division unit; the quadrant reference angle acquisition unit is a mapping table, and maps the input two-dimensional coordinate value into 5 constant values according to the range, two input sign bits and the comparison of the absolute values of two input numbers; the divisor selection unit selects the input two-dimensional coordinate value according to the output result of the quadrant reference angle acquisition unit and outputs the required dividend and divisor;

the polynomial operation module is used for carrying out polynomial operation;

and the post-processing module is used for finishing the addition operation of the quadrant reference angle and a final result obtained by polynomial operation to obtain a four-quadrant arc tangent operation result.

7. The apparatus of claim 6, wherein the monotonicity controlling unit comprises an anchor point generating unit, a decoding unit and a judgment selecting unit; the anchor point generating unit controls the density of the generated anchor values by controlling the size of m, so as to control the strict monotonicity of the result; the decoding unit obtains a corresponding index value by decoding the anchor value; the judgment selecting unit compares an absolute value of the input number with a first threshold value.

8. The apparatus according to claim 6, wherein the polynomial operation module comprises a multiply-add unit, a normalization and rounding unit, and a selection output unit, wherein the multiply-add unit comprises two multipliers, two adders, and two order-matching shift units for performing the multiply-add operation; the normalization and rounding unit is used for rounding the result of the polynomial operation and normalizing the result; the selection output unit comprises a selector and is used for selectively outputting the result according to the constant-result valid flag bit.

9. The apparatus according to claim 8, wherein the multiply-add unit comprises a multiplier #1, a multiplier #2, an adder #1, an adder #2, an order-matching shift unit #1 and an order-matching shift unit #2; first, multiplier #1 is used to obtain the multiplication result R2_0 of C2 × D; order-matching shift unit #1 then completes the order matching of coefficient C1 and R2_0 to obtain a result R2_1; adder #1 then obtains the result R1 of R2_1 + C1; multiplier #2 obtains the multiplication result R0_0 of R1 × D; order-matching shift unit #2 completes the order matching of coefficient C0 and R0_0 to obtain R0_1; finally, adder #2 adds R0_1 and C0 to obtain a result R of the multiply-add operation; the order-matching shift units do not need to compare the magnitudes of the two numbers; in the operation process, the order code of every coefficient is greater than or equal to that of the other number with which it is order-matched; order-matching shift unit #1 aligns the order code of R2_0 to that of C1 and then shifts the mantissa of R2_0; order-matching shift unit #2 aligns the order code of R0_0 to that of C0 and then shifts the mantissa of R0_0.

10. The apparatus of claim 6, wherein the post-processing module comprises an order-matching shift unit, an addition unit, and a normalization and rounding unit; the order-matching shift unit performs order-code magnitude comparison, shift-code acquisition and right shifting; the order-code magnitude comparison compares the magnitudes of the two order codes.