CN109976706B

CN109976706B - Floating-point arithmetic device

Info

Publication number: CN109976706B
Application number: CN201910252430.5A
Authority: CN
Inventors: 冯春阳; 刘刚; 沙亮; 彭琅; 黄晶; 王俊杰; 张兴革
Original assignee: Hexin Technology Suzhou Co ltd
Current assignee: Hexin Technology Suzhou Co ltd
Priority date: 2019-03-29
Filing date: 2019-03-29
Publication date: 2023-03-31
Anticipated expiration: 2039-03-29
Also published as: CN109976706A

Abstract

The application discloses a floating-point arithmetic device in the technical field of microprocessors. The device comprises a decoding circuit, a first processing circuit, a second processing circuit and a selection circuit. The first processing circuit computes a floating-point reciprocal estimate or a floating-point reciprocal square root estimate of first floating-point data to obtain a first calculation result. The second processing circuit computes a floating-point base-2 exponent estimate or a floating-point base-2 logarithm estimate of second floating-point data to obtain a second calculation result. The selection circuit selectively outputs the first calculation result and/or the second calculation result as the target result. The device thus computes transcendental functions of the target floating-point data, which improves its applicability.

Description

Floating-point arithmetic device

Technical Field

The application relates to the technical field of microprocessors, and in particular to a floating-point arithmetic device.

Background

With growing demand for compute-intensive workloads and a sharp increase in computational complexity, floating-point processing devices that comply with the Institute of Electrical and Electronics Engineers (IEEE) 754 standard are increasingly widely used. This places high demands on the applicability and accuracy of such devices.

To meet these demands, floating-point processing devices in the related art compute floating-point transcendental functions, such as the floating-point reciprocal, reciprocal square root, exponent and logarithm, in built-in software. However, computing floating-point transcendental functions in software is generally slow and inefficient.

Disclosure of Invention

Therefore, an object of the present invention is to overcome the slow speed and low efficiency of computing floating-point transcendental functions in software in the prior art by providing a floating-point arithmetic device.

An embodiment of the present application provides a floating-point arithmetic device comprising a decoding circuit, a first processing circuit, a second processing circuit and a selection circuit, wherein:

the decoding circuit is configured to receive target floating point data and a control signal; to acquire first floating point data from the target floating point data according to the control signal and send it to the first processing circuit, and/or to acquire second floating point data from the target floating point data according to the control signal and send it to the second processing circuit; and to send the control signal to each of the first processing circuit, the second processing circuit and the selection circuit. The first floating point data is floating point data whose estimation function type is floating-point reciprocal estimation or floating-point reciprocal square root estimation, and the second floating point data is floating point data whose estimation function type is floating-point base-2 exponent estimation or floating-point base-2 logarithm estimation;

the first processing circuit is configured to perform a first estimation calculation on the first floating point data according to the control signal to obtain a first calculation result, and to send the first calculation result to the selection circuit, where the first estimation calculation is a floating-point reciprocal estimation or a floating-point reciprocal square root estimation;

the second processing circuit is configured to perform a second estimation calculation on the second floating point data according to the control signal to obtain a second calculation result, and send the second calculation result to the selection circuit, where the second estimation calculation includes a floating-point base 2 exponent estimation calculation or a floating-point base 2 logarithm estimation calculation;

and the selection circuit is configured to selectively output the first calculation result and/or the second calculation result to obtain a target calculation result of the target floating point data.
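The division of work between the two circuits can be motivated by standard floating-point identities, which the source does not spell out. Writing a normalized positive or negative input as x = (-1)^s m 2^E with 1 <= m < 2, each estimate reduces to a function of a bounded mantissa (or fraction) that a small table can approximate, plus integer exponent arithmetic. This is background reasoning under that normalization assumption, not the claimed circuit:

```latex
\begin{aligned}
\frac{1}{x} &= (-1)^{s}\,\frac{1}{m}\,2^{-E}, && \tfrac{1}{m}\in\left(\tfrac12,1\right] \\
\frac{1}{\sqrt{x}} &= \begin{cases} m^{-1/2}\,2^{-E/2}, & E \text{ even} \\ (2m)^{-1/2}\,2^{-(E-1)/2}, & E \text{ odd} \end{cases} && (s = 0) \\
2^{y} &= 2^{f}\,2^{n}, \quad n=\lfloor y\rfloor,\ f=y-n\in[0,1), && 2^{f}\in[1,2) \\
\log_2 x &= E + \log_2 m, && \log_2 m\in[0,1) \quad (s = 0)
\end{aligned}
```

The first two lines are the reciprocal-type functions handled by the first processing circuit, and the last two are the base-2 exponent and logarithm handled by the second processing circuit.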

In an alternative embodiment, the first floating point data comprises first 32-bit floating point data and/or first 64-bit floating point data, that is, first floating point data that is 32 bits or 64 bits wide;

the first processing circuit comprises at least one first processor and/or at least one second processor;

the first calculation result comprises at least one of at least one group of first calculation sub-results output by the at least one first processor, at least one group of first calculation sub-results output by the at least one second processor, and at least one group of second calculation sub-results output by the at least one second processor;

the first processor is configured to perform the first estimation calculation on a group of the first 32-bit floating point data according to the control signal to obtain a group of the first calculation sub-results, and send the first calculation sub-results to the selection circuit;

the second processor is configured to perform the first estimation calculation on a group of the first 32-bit floating point data according to the control signal to obtain a group of the first calculation sub-results, and to send the group of the first calculation sub-results to the selection circuit; or, according to the control signal, to perform the first estimation calculation on a group of the first 64-bit floating point data to obtain a group of the second calculation sub-results, and to send the group of the second calculation sub-results to the selection circuit.

In an alternative embodiment, the second floating point data includes a second 32-bit floating point data, the second 32-bit floating point data being a 32-bit second floating point data;

the second processing circuitry comprises at least one third processor;

the second calculation result comprises at least one set of third calculation sub-results output by the at least one third processor;

and the third processor is configured to perform the second estimation calculation on a group of the second 32-bit floating point data according to the control signal to obtain a group of the third calculation sub-results, and send the group of the third calculation sub-results to the selection circuit.

In an alternative embodiment, the first processing circuitry comprises at least two of the first processors, and at least two of the second processors;

the first calculation result comprises the first calculation sub-result output by each of the at least two first processors, and the first calculation sub-result output by each of the at least two second processors;

the decoding circuit is further configured to unpack the target floating point data into at least four groups of first 32-bit floating point data according to the control signal, and send each group of 32-bit floating point data of the at least four groups of first 32-bit floating point data to the at least two first processors and the at least two second processors, respectively.

In an alternative embodiment, the first processing circuitry comprises at least four of the second processors;

the first computation result comprises the first computation sub-result output by each of the at least four second processors;

the decoding circuit is further configured to unpack the target floating point data into at least four groups of first 32-bit floating point data according to the control signal, and to send each of the at least four groups of first 32-bit floating point data to a respective one of the at least four second processors.

In an alternative embodiment, the first processing circuit comprises at least two of the second processors;

the first calculation result comprises the second calculation sub-result output by each of the at least two second processors;

the decoding circuit is further configured to unpack the target floating point data into at least two groups of first 64-bit floating point data according to the control signal, and to send each of the at least two groups of first 64-bit floating point data to a respective one of the at least two second processors.

In an alternative embodiment, the second processing circuitry comprises at least four of the third processors;

the second computation result comprises the third computation sub-result output by each of the at least four third processors;

the decoding circuit is further configured to unpack the target floating point data into at least four groups of the second 32-bit floating point data according to the control signal, and to send each of the at least four groups of the second 32-bit floating point data to a respective one of the at least four third processors.
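The four unpacking embodiments above differ only in lane width and in which processors receive the lanes. The sketch below tabulates them. The configuration labels and processor names are hypothetical, and it assumes exactly 128 bits of target data (the text says "at least") with lane k, counted from the least-significant end, going to the k-th listed processor.

```python
# Hypothetical labels for the four lane assignments described in the embodiments.
# Assumptions: a 128-bit operand, least-significant lane first, lane k -> k-th processor.
CONFIGS: dict[str, tuple[int, list[str]]] = {
    "first+second, 4x32": (32, ["first0", "first1", "second0", "second1"]),
    "second only, 4x32": (32, ["second0", "second1", "second2", "second3"]),
    "second only, 2x64": (64, ["second0", "second1"]),
    "third only, 4x32": (32, ["third0", "third1", "third2", "third3"]),
}


def assign_lanes(config: str, word: int) -> dict[str, int]:
    """Unpack a 128-bit operand into lanes and map each lane to a processor."""
    lane_bits, processors = CONFIGS[config]
    if not 0 <= word < 1 << 128:
        raise ValueError("operand must fit in 128 bits")
    mask = (1 << lane_bits) - 1
    return {proc: (word >> (k * lane_bits)) & mask for k, proc in enumerate(processors)}
```

For example, with `w = (4 << 96) | (3 << 64) | (2 << 32) | 1`, the mixed configuration sends 1 and 2 to the first processors and 3 and 4 to the second processors.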

In an optional embodiment, the first processor, or the second processor, comprises:

the first pipeline circuit is configured to perform, according to the control signal, floating-point type determination, floating-point mantissa normalization and floating-point mantissa leading-zero detection on the first floating point data to obtain first pipeline output data, and to send the first pipeline output data to a second pipeline circuit;

the second pipeline circuit is configured to perform, according to the control signal, first intermediate processing of the floating-point exponent, special floating-point value processing, and a lookup-table access addressed by the normalized floating-point mantissa on the first pipeline output data to obtain second pipeline output data, and to send the second pipeline output data to a third pipeline circuit;

the third pipeline circuit is configured to perform, according to the control signal, fixed-point multiplication on the floating-point mantissa data path for the second pipeline output data to obtain third pipeline output data, and to send the third pipeline output data to a fourth pipeline circuit;

the fourth pipeline circuit is configured to perform floating point mantissa subtraction, normalization processing and second intermediate result processing of a floating point exponent on the third pipeline output data according to the control signal to obtain fourth pipeline output data, and send the fourth pipeline output data to a fifth pipeline circuit;

the fifth pipeline circuit is configured to post-process the fourth pipeline output data according to the control signal to obtain a calculation result; when the first floating point data comprises the first 32-bit floating point data, the calculation result comprises the first calculation sub-result; when the first floating point data comprises the first 64-bit floating point data, the calculation result comprises the second calculation sub-result.
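The five-stage reciprocal and reciprocal square root path above can be approximated in software as: classify special operands, normalize the mantissa (including subnormals), look up a table addressed by the top mantissa bits, and derive the result exponent from the input exponent. The sketch below follows that order under stated assumptions: an 8-bit table index (`TABLE_BITS`, hypothetical), midpoint-sampled tables, and Python floats in place of fixed-point data paths, with `math.frexp` standing in for leading-zero detection. It does not model the stage-3 fixed-point multiplication or the stage-4 mantissa subtraction, and it is not the patented circuit.

```python
import math

TABLE_BITS = 8           # hypothetical: mantissa fraction bits used to address each table
N = 1 << TABLE_BITS      # table size

# Hypothetical tables sampled at the midpoint c_i = 1 + (i + 0.5) / N of each mantissa
# interval [1 + i/N, 1 + (i + 1)/N), keeping relative error below 2**-(TABLE_BITS + 1).
RECIP_TABLE = [1.0 / (1.0 + (i + 0.5) / N) for i in range(N)]
RSQRT_EVEN_TABLE = [1.0 / math.sqrt(1.0 + (i + 0.5) / N) for i in range(N)]
RSQRT_ODD_TABLE = [1.0 / math.sqrt(2.0 * (1.0 + (i + 0.5) / N)) for i in range(N)]


def normalize(x: float) -> tuple[float, int, int]:
    """Return (m, e, index) with |x| = m * 2**e, 1 <= m < 2, index = top fraction bits of m."""
    frac, e = math.frexp(abs(x))   # frac in [0.5, 1); also normalizes subnormal inputs
    m = frac * 2.0
    return m, e - 1, int((m - 1.0) * N)


def recip_estimate(x: float) -> float:
    """Table-based estimate of 1/x."""
    if math.isnan(x):
        return math.nan
    if math.isinf(x):
        return math.copysign(0.0, x)          # 1/(+-inf) = +-0
    if x == 0.0:
        return math.copysign(math.inf, x)     # 1/(+-0) = +-inf
    _, e, index = normalize(x)
    try:
        magnitude = math.ldexp(RECIP_TABLE[index], -e)   # result exponent is -e
    except OverflowError:
        magnitude = math.inf                  # reciprocal of a tiny subnormal overflows
    return math.copysign(magnitude, x)


def rsqrt_estimate(x: float) -> float:
    """Table-based estimate of 1/sqrt(x)."""
    if math.isnan(x):
        return math.nan
    if x == 0.0:
        return math.copysign(math.inf, x)     # rsqrt(+-0) = +-inf
    if x < 0.0:
        return math.nan                       # negative operands (including -inf) are invalid
    if math.isinf(x):
        return 0.0                            # rsqrt(+inf) = +0
    _, e, index = normalize(x)
    if e % 2 == 0:
        return math.ldexp(RSQRT_EVEN_TABLE[index], -(e // 2))
    return math.ldexp(RSQRT_ODD_TABLE[index], -((e - 1) // 2))
```

Sampling each table at the interval midpoint, rather than the left endpoint, halves the worst-case error for the same table size; with 256 entries both estimates stay within about 2**-9 relative error.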

In an optional embodiment, the third processor comprises:

the sixth pipeline circuit is used for respectively executing decomposition operation and calculation type judgment operation on the second floating point data according to the control signal to obtain a sixth pipeline output result, and sending the sixth pipeline output result to a seventh pipeline circuit;

the seventh pipeline circuit is configured to execute special data processing on the sixth pipeline output result according to the control signal and based on a calculation type included in the sixth pipeline output result to obtain a seventh pipeline output result, and send the seventh pipeline output result to an eighth pipeline circuit;

the eighth pipeline circuit is configured to perform a first shift operation on the seventh pipeline output result according to the control signal to obtain an eighth pipeline first sub-result, and send the eighth pipeline first sub-result to a ninth pipeline circuit;

the ninth pipeline circuit is configured to perform an exponent table lookup operation and a complement operation on the eighth pipeline first sub-result according to the control signal to obtain a ninth pipeline first sub-result, and to send the ninth pipeline first sub-result to the tenth pipeline circuit;

and the tenth pipeline circuit is configured to perform, according to the control signal, a second shift operation and an exponent-result floating-point assembly operation on the ninth pipeline first sub-result to obtain the third calculation sub-result.
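The base-2 exponent path above (decompose, handle special data, shift to fixed point, look up a table, then shift again to assemble the result) can be sketched as follows. The sketch reads the "complement operation" as two's-complement handling of negative inputs, which is an assumption; Python's arithmetic right shift and bit masking on negative integers mirror that behaviour. The table width `FRAC_BITS`, midpoint sampling, and double-precision arithmetic standing in for the 32-bit path are all hypothetical choices.

```python
import math

FRAC_BITS = 8            # hypothetical: fraction bits kept after the first shift (table index width)
N = 1 << FRAC_BITS

# Hypothetical table of 2**f sampled at the midpoint of each fraction interval [i/N, (i+1)/N).
EXP2_TABLE = [2.0 ** ((i + 0.5) / N) for i in range(N)]


def exp2_estimate(x: float) -> float:
    """Table-based estimate of 2**x (Python double precision stands in for the 32-bit path)."""
    # Special data processing: NaN, infinities, and inputs whose result over- or underflows.
    if math.isnan(x):
        return math.nan
    if x >= 1024.0:
        return math.inf                   # also covers +inf
    if x < -1100.0:
        return 0.0                        # also covers -inf; result is below the smallest subnormal
    # First shift: convert x to a two's-complement fixed-point value with FRAC_BITS fraction bits.
    t = math.floor(x * N)
    # Arithmetic right shift gives the integer part n = floor(t / N); the low bits give the
    # table index, so negative inputs need no separate sign handling.
    n = t >> FRAC_BITS
    index = t & (N - 1)
    # Table lookup for 2**f, then the second shift (scaling by 2**n) assembles the result.
    return math.ldexp(EXP2_TABLE[index], n)
```

With 256 midpoint-sampled entries the relative error is below 2**(0.5/256) - 1, roughly 0.0014.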

In an optional embodiment, the eighth pipeline circuit is further configured to perform, according to the control signal, a first leading zero detection operation, a third shift operation, and a logarithm table look-up operation on the seventh pipeline output result, respectively, to obtain an eighth pipeline second sub-result, and send the eighth pipeline second sub-result to the ninth pipeline circuit;

the ninth pipeline circuit is further configured to perform a complement operation on the eighth pipeline second sub-result according to the control signal to obtain a ninth pipeline second sub-result, and to send the ninth pipeline second sub-result to the tenth pipeline circuit;

the tenth pipeline circuit is further configured to perform, according to the control signal, a second leading-zero detection operation, a fourth shift operation and a logarithm-result floating-point assembly operation on the ninth pipeline second sub-result to obtain the third calculation sub-result.
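The base-2 logarithm path above normalizes the input with leading-zero detection and shifting, looks up log2 of the mantissa, and then renormalizes the fixed-point sum of exponent and table value into a floating-point result. A minimal sketch under assumptions: `math.frexp` stands in for the first leading-zero detection and shift, ordinary float addition stands in for the second leading-zero detection and fourth shift, and the 8-bit midpoint-sampled table (`TABLE_BITS`) is hypothetical.

```python
import math

TABLE_BITS = 8          # hypothetical table index width
N = 1 << TABLE_BITS

# Hypothetical table of log2(m) sampled at the midpoint of each mantissa interval.
LOG2_TABLE = [math.log2(1.0 + (i + 0.5) / N) for i in range(N)]


def log2_estimate(x: float) -> float:
    """Table-based estimate of log2(x)."""
    if math.isnan(x) or x < 0.0:
        return math.nan                   # also covers -inf; -0.0 is not < 0.0
    if x == 0.0:
        return -math.inf                  # log2(+-0) = -inf
    if math.isinf(x):
        return math.inf
    # Leading-zero detection and shift: normalize x = m * 2**e with 1 <= m < 2.
    frac, e = math.frexp(x)
    m, e = frac * 2.0, e - 1
    # Logarithm table lookup addressed by the top TABLE_BITS fraction bits of m.
    index = int((m - 1.0) * N)
    # Integer exponent plus fractional log2(m); float addition renormalizes the sum.
    return e + LOG2_TABLE[index]
```

Because the table error is bounded in absolute terms (about 0.0028 here), results close to zero, that is inputs near 1.0, have a large relative error.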

The technical scheme of the application has the following advantages:

The decoding circuit obtains first floating point data and/or second floating point data from the target floating point data. When the target floating point data includes first floating point data, the first processing circuit performs the first estimation calculation on it to obtain a first calculation result, namely a floating-point reciprocal estimate or reciprocal square root estimate. When the target floating point data includes second floating point data, the second processing circuit performs the second estimation calculation on it to obtain a second calculation result, namely a floating-point base-2 exponent estimate or base-2 logarithm estimate. The selection circuit selectively outputs the first calculation result and/or the second calculation result as the target calculation result. The device therefore computes transcendental functions of the target floating point data, which improves the applicability of the floating-point arithmetic device.

Drawings

In order to more clearly illustrate the detailed description of the present application or the technical solutions in the prior art, the drawings needed to be used in the detailed description of the present application or the prior art description will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a schematic diagram of a floating-point arithmetic device according to an exemplary embodiment of the present application;

FIG. 2 is a block diagram of a floating-point arithmetic device according to an exemplary embodiment of the present application;

FIG. 3 is a block diagram of a first processor provided in an exemplary embodiment of the present application;

FIG. 4 is a schematic structural diagram of a third processor according to an exemplary embodiment of the present application.

Detailed Description

The technical solutions of the present application will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are only some embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.

In the description of the present application, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present application. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

In the description of the present application, it is to be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly: for example, a connection may be fixed, removable or integral; mechanical or electrical; direct, or indirect through an intermediate medium; internal to two elements; and wired or wireless. Those of ordinary skill in the art can understand the specific meaning of these terms in the present application according to the specific circumstances.

In addition, the technical features mentioned in the different embodiments of the present application described below may be combined with each other as long as they do not conflict with each other.

Fig. 1 is a block diagram illustrating a structure of a floating-point arithmetic device according to an exemplary embodiment of the present application. As shown in fig. 1, the floating-point arithmetic device 100 according to the present embodiment includes a decoding circuit (Decode) 110, a first processing circuit (PE) 120, a second processing circuit 130, and a selection circuit (Select) 140.

The decoding circuit 110 is configured to receive target floating point data and a control signal; to acquire first floating point data from the target floating point data according to the control signal and send it to the first processing circuit 120, and/or to acquire second floating point data from the target floating point data according to the control signal and send it to the second processing circuit 130; and to send the control signal to each of the first processing circuit 120, the second processing circuit 130 and the selection circuit 140.

The first floating point data is floating point data whose estimation function type is floating-point reciprocal estimation or floating-point reciprocal square root estimation; the second floating point data is floating point data whose estimation function type is floating-point base-2 exponent estimation or floating-point base-2 logarithm estimation.

When the target floating point data includes the first floating point data, the decoding circuit 110 sends the first floating point data to the first processing circuit 120 according to the control signal; when the target floating point data includes the second floating point data, the decoding circuit 110 sends the second floating point data to the second processing circuit 130 according to the control signal; when the target floating point data includes the first floating point data and the second floating point data, the decoding circuit 110 transmits the first floating point data to the first processing circuit 120 and transmits the second floating point data to the second processing circuit 130 according to the control signal.
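The three routing cases above (first data only, second data only, or both) amount to partitioning operand groups by estimation type. The sketch below assumes, hypothetically, that each group arrives tagged with its type using the R/RSR/EXP/LOG mnemonics of Table 1; in the device this information comes from the control signal.

```python
FIRST_CIRCUIT_OPS = {"R", "RSR"}     # reciprocal / reciprocal square root estimates
SECOND_CIRCUIT_OPS = {"EXP", "LOG"}  # base-2 exponent / base-2 logarithm estimates


def split_by_circuit(groups: list[tuple[str, float]]) -> tuple[list[float], list[float]]:
    """Partition (estimation type, operand) pairs between the two processing circuits."""
    to_first: list[float] = []
    to_second: list[float] = []
    for op, operand in groups:
        if op in FIRST_CIRCUIT_OPS:
            to_first.append(operand)
        elif op in SECOND_CIRCUIT_OPS:
            to_second.append(operand)
        else:
            raise ValueError(f"unsupported estimation type: {op}")
    return to_first, to_second
```

For a mixed input such as `[("R", 2.0), ("LOG", 8.0), ("RSR", 4.0)]`, both circuits receive work: `[2.0, 4.0]` goes to the first circuit and `[8.0]` to the second.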

The first processing circuit 120 is configured to perform a first estimation calculation on the first floating point data according to the control signal to obtain a first calculation result of the first floating point data, and to send the first calculation result to the selection circuit 140. The first estimation calculation is a floating-point reciprocal estimation or a floating-point reciprocal square root estimation.

In an alternative embodiment, the first floating point data includes first 32-bit floating point data and/or first 64-bit floating point data, that is, first floating point data that is 32 bits or 64 bits wide. Specifically, the 64-bit floating point data may be a 64-bit double-precision floating-point number or a single-precision floating-point number held in double-precision format.

In an alternative embodiment, the first processing circuit 120 includes at least one first processor and/or at least one second processor; the first calculation result comprises at least one of: at least one group of first calculation sub-results output by the at least one first processor, at least one group of first calculation sub-results output by the at least one second processor, and at least one group of second calculation sub-results output by the at least one second processor.

The first processor is configured to perform the first estimation calculation on a group of first 32-bit floating point data according to the control signal to obtain a group of first calculation sub-results, and to send the group of first calculation sub-results to the selection circuit.

The second processor is configured to perform the first estimation calculation on a group of first 32-bit floating point data according to the control signal to obtain a group of first calculation sub-results, and to send the group of first calculation sub-results to the selection circuit; or, according to the control signal, to perform the first estimation calculation on a group of first 64-bit floating point data to obtain a group of second calculation sub-results, and to send the group of second calculation sub-results to the selection circuit.

The second processing circuit 130 is configured to perform a second estimation calculation on the second floating point data according to the control signal to obtain a second calculation result, and to send the second calculation result to the selection circuit 140.

The second estimation calculation is a floating-point base-2 exponent estimation or a floating-point base-2 logarithm estimation. Optionally, the second floating point data includes second 32-bit floating point data, and the second processing circuit 130 is configured to perform the second estimation calculation on a group of the second 32-bit floating point data.

In an alternative embodiment, the second floating point data includes a second 32-bit floating point data, the second 32-bit floating point data being a 32-bit second floating point data.

In an alternative embodiment, the second processing circuit 130 comprises at least one third processor, and the second calculation result comprises at least one group of third calculation sub-results output by the at least one third processor.

The third processor is configured to perform the second estimation calculation on a group of second 32-bit floating point data according to the control signal to obtain a group of third calculation sub-results, and to send the group of third calculation sub-results to the selection circuit.

The selection circuit 140 is configured to selectively output the first calculation result and/or the second calculation result to obtain a target calculation result of the target floating point data.

In summary, in this embodiment the decoding circuit obtains first floating point data and/or second floating point data from the target floating point data. When the target floating point data includes first floating point data, the first processing circuit performs the first estimation calculation on it to obtain a first calculation result, namely a floating-point reciprocal estimate or reciprocal square root estimate. When it includes second floating point data, the second processing circuit performs the second estimation calculation on it to obtain a second calculation result, namely a floating-point base-2 exponent estimate or base-2 logarithm estimate. The selection circuit selectively outputs the first calculation result and/or the second calculation result as the target calculation result. The device therefore computes transcendental functions of the target floating point data, which improves the applicability of the floating-point arithmetic device.

Optionally, in this embodiment of the present application, providing at least one first processor and/or at least one second processor enables floating-point reciprocal estimation and floating-point reciprocal square root estimation of the 32-bit or 64-bit floating point data included in the target floating point data.

Optionally, in this embodiment of the application, providing a second processing circuit that performs floating-point base-2 exponent estimation or floating-point base-2 logarithm estimation on a group of 32-bit floating point data enables these estimations for the 32-bit floating point data in the target floating point data.

Fig. 2 is a block diagram illustrating a structure of a floating-point arithmetic device according to an exemplary embodiment of the present application. As shown in fig. 2, the floating-point arithmetic device 200 provided in the present embodiment includes a decoding circuit 210, at least two first processors 221 (two first processors 221 are exemplarily illustrated in fig. 2), at least two second processors 222 (two second processors 222 are exemplarily illustrated in fig. 2), at least four third processors 230 (four third processors 230 are exemplarily illustrated in fig. 2), and a selection circuit 240.

The decoding circuit 210 is configured to receive target floating point data and a control signal and, according to the control signal, to do one of the following: divide the target floating point data into at least four groups of first 32-bit floating point data and send each group to a respective one of the at least two first processors 221 and the at least two second processors 222; divide the target floating point data into at least two groups of first 64-bit floating point data and send each group to a respective one of the at least two second processors 222; or divide the target floating point data into at least four groups of second 32-bit floating point data and send each group to a respective one of the at least four third processors 230. The decoding circuit 210 also sends the control signal to the at least two first processors 221, the at least two second processors 222 and the at least four third processors 230.

The first 32-bit floating point data is 32-bit floating point data whose estimation function type is floating-point reciprocal estimation or floating-point reciprocal square root estimation. The first 64-bit floating point data is 64-bit floating point data with the same estimation function types; specifically, it comprises 64-bit double-precision floating-point numbers or single-precision floating-point numbers held in double-precision format. The second 32-bit floating point data is 32-bit floating point data whose estimation function type is floating-point base-2 exponent estimation or floating-point base-2 logarithm estimation.

For example, when the target floating point data is 128 bits wide, the decoding circuit 210 decodes it according to the received control signal, splits the 128 bits into one, two or four valid groups of 32-bit or 64-bit floating point data, distributes these groups to the corresponding calculation paths of the first processors 221, the second processors 222 or the third processors 230, and supplies each processor with control signals carrying, among other things, the precision type, the vector/scalar type, the estimation calculation type and the floating-point exception enable.
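The 128-bit decode step above can be sketched as splitting the operand by precision and keeping one group for a scalar operation or all groups for a vector operation. Assumptions not in the source: groups are numbered from the least-significant end, a scalar operation uses only the lowest group, and the function name `decode_128` is hypothetical.

```python
def decode_128(word: int, width: int, vector: bool) -> list[int]:
    """Split a 128-bit operand into the valid groups sent to the processors."""
    if width not in (32, 64):
        raise ValueError("precision must be 32 or 64 bits")
    if not 0 <= word < 1 << 128:
        raise ValueError("operand must fit in 128 bits")
    mask = (1 << width) - 1
    # Four 32-bit groups or two 64-bit groups, lowest group first.
    groups = [(word >> (k * width)) & mask for k in range(128 // width)]
    return groups if vector else groups[:1]
```

As a usage example, packing the IEEE 754 single-precision encodings of 4.0, 3.0, 2.0 and 1.0 (`0x40800000`, `0x40400000`, `0x40000000`, `0x3F800000`) from the low end upward yields four groups for a 32-bit vector operation, two for a 64-bit vector operation, and one for a scalar operation.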

The first processor 221 is configured to perform a first evaluation calculation on a group of first 32-bit floating point data according to the control signal, so as to obtain a group of first calculation sub-results, a first path exception signal, and a first valid flag signal.

The second processor 222 is configured to perform the first estimation calculation on a group of first 32-bit floating point data according to the control signal to obtain a group of first calculation sub-results, a first path exception signal and a first valid flag signal; or, according to the control signal, to perform the first estimation calculation on a group of first 64-bit floating point data to obtain a second calculation sub-result, a second path exception signal and a second valid flag signal.

The third processor 230 is configured to perform a second estimation calculation on a group of second 32-bit floating point data according to the control signal, so as to obtain a group of third calculation sub-results, a third path exception signal, and a third valid flag signal.

For example, Table 1 lists the evaluation function types supported by the floating-point arithmetic device of this embodiment. In Table 1, SP32 denotes 32-bit single-precision floating point data in IEEE 754 format; DP denotes 64-bit double-precision floating point data in IEEE 754 format; SP64 denotes single-precision floating point data held in IEEE 754 double-precision format. Scalar denotes an estimate calculation on one group of 32-bit or 64-bit floating point data, and vector denotes a parallel estimate calculation on four groups of 32-bit or two groups of 64-bit floating point data. S marks evaluation function types that support scalar operation; V marks evaluation function types that support vector operation; R denotes a floating-point reciprocal operation; RSR denotes a floating-point reciprocal square root operation; EXP denotes a floating-point base-2 exponent operation; LOG denotes a floating-point base-2 logarithm operation.

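Table 1 (evaluation function types supported by the floating-point arithmetic device; table image not available)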

In this embodiment, the first estimate calculation is a floating point reciprocal estimate calculation or a floating point reciprocal square root estimate calculation, i.e., any evaluation function type in Table 1 other than SP32_V_EXP and SP32_V_LOG. Specifically, the first processor 221 supports the evaluation function types for 32-bit floating point data other than SP32_V_EXP and SP32_V_LOG, and the second processor 222 supports the evaluation function types for 64-bit floating point data other than SP32_V_EXP and SP32_V_LOG.

In this embodiment, the second estimate calculation is a floating-point base-2 exponent estimate calculation or a floating-point base-2 logarithm estimate calculation, i.e., the SP32_V_EXP and SP32_V_LOG evaluation function types in Table 1. Specifically, the third processor 230 supports the SP32_V_EXP and SP32_V_LOG evaluation function types.

The selection circuit 230 is configured, according to the control signal, to: selectively output at least one of the first calculation sub-results from the at least two first processors 221, the second calculation sub-results from the at least two second processors 222, and the third calculation sub-results from the at least four third processors 230, to obtain a calculation output result; OR together at least one of the first path exception signals from the at least two first processors 221, the second path exception signals from the at least two second processors 222, and the third path exception signals from the at least four third processors 230, to obtain an exception output result; and OR together at least one of the first valid flag signals from the at least two first processors 221, the second valid flag signals from the at least two second processors 222, and the third valid flag signals from the at least four third processors 230, to obtain a flag output result. The calculation output result, the exception output result, and the flag output result together form the target calculation result.
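A behavioural sketch of the selection circuit follows, assuming the control signal has already been reduced to a per-lane enable list (a hypothetical simplification) and that enabled results are returned in lane order. The class `LaneOutput` and the function `select_outputs` are illustrative names, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class LaneOutput:
    """Outputs of one processor lane (field names are hypothetical)."""
    result: int      # calculation sub-result bits
    exception: int   # path exception signal, 0 or 1
    valid: int       # valid flag signal, 0 or 1


def select_outputs(lanes: list[LaneOutput], enabled: list[bool]):
    """Gate the enabled lanes' results and OR-reduce their exception and valid flags.

    Assumption: `enabled` stands in for the decoded control signal.
    """
    results = [lane.result for lane, on in zip(lanes, enabled) if on]
    exception_out = 0
    flag_out = 0
    for lane, on in zip(lanes, enabled):
        if on:
            exception_out |= lane.exception
            flag_out |= lane.valid
    return results, exception_out, flag_out
```

An exception raised by any single enabled lane is therefore visible in the combined exception output.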

In summary, by providing two first processors, two second processors, and four third processors, this embodiment supports vector transcendental-function calculation on floating point data of 128 bits and wider, which improves the applicability of the floating point arithmetic device.

Fig. 3 is a schematic diagram illustrating a structure of a first processor according to an exemplary embodiment of the present application. As shown in fig. 3, the first processor 300 includes a first pipeline circuit 310, a second pipeline circuit 320, a third pipeline circuit 330, a fourth pipeline circuit 340, and a fifth pipeline circuit 350, and the first processor 300 may be used as the first processing circuit 120 in the embodiment of fig. 1, the first processor 221 in the embodiment of fig. 2, or the second processor 222 in the embodiment of fig. 2.

The first pipeline circuit 310 is configured to perform floating point type determination, floating point mantissa normalization processing, and floating point mantissa leading zero detection on the first floating point data according to the control signal, to obtain first pipeline output data, and send the first pipeline output data to the second pipeline circuit 320.

For example, as shown in Fig. 3, the first pipeline circuit 310 includes a first decision circuit 311 and a first normalization circuit 312.

The first decision circuit 311 determines the floating point type of the first floating point data (32/64 bits), classifies it as special or normal floating point data, and generates three sets of data information: for the special floating point data path (mainly the special floating point, decoding control, and floating point exception enable signals); for the floating point exponent calculation path (mainly an 11-bit floating point exponent and a decoding control signal); and for the normal floating point data path (mainly a 53-bit floating point mantissa and a decoding control signal).
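The classification can be illustrated for IEEE 754 binary64 input with the Python sketch below. It reads the standard bit fields (1 sign bit, 11 exponent bits, 52 stored fraction bits) and makes the hidden bit explicit to form the 53-bit mantissa mentioned above. The category strings and function names are assumptions. Because the first subtraction circuit handles denormal numbers with a leading-zero correction, presumably only zero, infinity, and NaN would be routed to the special path.

```python
import struct


def fp64_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, 11-bit biased exponent, 52-bit stored fraction) of a binary64 value."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


def classify_fp64(x: float) -> str:
    """Classify a binary64 value by its exponent and fraction fields (names illustrative)."""
    _, exponent, fraction = fp64_fields(x)
    if exponent == 0x7FF:
        return "nan" if fraction else "infinity"
    if exponent == 0:
        return "denormal" if fraction else "zero"
    return "normal"


def mantissa53(x: float) -> int:
    """53-bit mantissa: hidden bit (1 for normal numbers, 0 otherwise) above the fraction."""
    _, exponent, fraction = fp64_fields(x)
    hidden = 1 if 0 < exponent < 0x7FF else 0
    return (hidden << 52) | fraction
```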

The first normalization circuit 312 is shared by the normal 32/64-bit floating point data calculations. Its input floating point mantissa is 53 bits wide, and it performs left-shift normalization and leading zero detection of the mantissa, producing three signals for the second pipeline circuit: a 6-bit leading zero detection result R1, input to the first subtraction circuit 321, and the fields xh and xl of the normalized floating point mantissa, used for the lookup table calculation (xh addresses the first lookup table circuit 322), where xh is [ 1.
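Left-shift normalization with leading zero detection can be sketched as below, assuming a non-zero 53-bit integer mantissa; the function name is hypothetical. The returned count plays the role of R1. It never exceeds 52, which is why 6 bits suffice.

```python
MANT_BITS = 53  # mantissa width, hidden bit included


def normalize_mantissa(m: int) -> tuple[int, int]:
    """Left-shift a non-zero 53-bit mantissa until bit 52 is set.

    Returns (normalized_mantissa, leading_zero_count); the count corresponds to R1.
    (Hypothetical helper for illustration.)
    """
    if not 0 < m < (1 << MANT_BITS):
        raise ValueError("mantissa must be a non-zero 53-bit value")
    lz = MANT_BITS - m.bit_length()   # leading zeros within the 53-bit field
    return m << lz, lz
```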

The second pipeline circuit 320 is configured to perform, on the first pipeline output data, first intermediate-result processing of the floating point exponent, special floating point calculation processing, and a table lookup addressed by the normalized floating point mantissa, to obtain second pipeline output data, and send the second pipeline output data to the third pipeline circuit 330.

Illustratively, as shown in fig. 3, the second pipeline circuit 320 includes a first subtracting circuit 321, a first look-up table circuit 322, and a first special processing circuit 323.

The first subtraction circuit 321 performs the first subtraction on the 11-bit floating point exponent. When the floating point number is a normalized number, it performs a subtraction in which the minuend is a constant C (one of 190, 253, 254, 1086, 1149, 1534, 2045, and 2046, selected according to the estimate calculation type and the corresponding 32/64-bit floating point flag) and the subtrahend is the 11-bit floating point exponent. When the floating point number is a denormal number, it performs an addition whose addends are the constant C and the input leading zero count R1. Under the control gating logic, it outputs the 12-bit exponent intermediate result of the addition or subtraction, including a sign extension bit.

The first lookup table circuit 322 is addressed by xh and outputs a 19-bit offset and an 8-bit slope value to the third pipeline circuit. The lookup table logic is built as a truth table. There are three tables: one reciprocal table, storing the offsets and slopes for [1, 2), and two reciprocal square root tables, storing the offsets and slopes for [0.5, 1) and [1, 2) respectively. Gating logic selects the required table output.
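One plausible reason for two reciprocal square root tables is range reduction. Writing x = m * 2^e and folding one factor of 2 into m when e is odd keeps the exponent even, so 1/sqrt(x) = (1/sqrt(m)) * 2^(-e/2) with m in either [1, 2) or [0.5, 1). The sketch below illustrates this reading with `math.frexp`. It is an assumption, not the patent's stated method, and `math.sqrt` stands in for the table.

```python
import math


def rsqrt_range_reduce(x: float) -> tuple[float, int]:
    """Write positive finite x as m * 4**k with m in [1, 2) or [0.5, 1).

    Assumed range-reduction reading of the two-table design.
    """
    frac, e = math.frexp(x)     # x = frac * 2**e, frac in [0.5, 1)
    m, e = frac * 2.0, e - 1    # now m in [1, 2)
    if e % 2:                   # odd exponent: move one factor of 2 into m
        m, e = m / 2.0, e + 1   # m in [0.5, 1), e even
    return m, e // 2


def rsqrt_via_reduction(x: float) -> float:
    """1/sqrt(x) = (1/sqrt(m)) * 2**(-k); math.sqrt stands in for the lookup table."""
    m, k = rsqrt_range_reduce(x)
    return math.ldexp(1.0 / math.sqrt(m), -k)
```

For x = 8.0 the reduction gives m = 0.5 and k = 2, so the [0.5, 1) table would be consulted.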

The first special processing circuit 323 computes the results for 32/64-bit special floating point numbers. It produces special outputs such as zero, infinity, and not-a-number (NaN), together with first exception flags such as divide-by-zero, invalid operation on NaN, invalid operation on a negative square root, overflow, inexact, and rounding.
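The special-operand cases can be sketched as below. The outputs follow IEEE 754 conventions for 1/x and for rSqrt (IEEE 754-2008 specifies rSqrt(±0) = ±∞ with divide-by-zero). The patent does not list its exact outputs, so these results, the flag names, and the function names are assumptions. Quiet and signalling NaNs are not distinguished.

```python
import math


def recip_special(x: float):
    """Return (result, flags) for special 1/x operands, or None for the normal path.

    Assumes IEEE 754 semantics; flag names are illustrative.
    """
    if math.isnan(x):
        return math.nan, set()                                  # NaN propagates
    if x == 0.0:
        return math.copysign(math.inf, x), {"divide_by_zero"}   # 1/±0 = ±inf
    if math.isinf(x):
        return math.copysign(0.0, x), set()                     # 1/±inf = ±0
    return None


def rsqrt_special(x: float):
    """Return (result, flags) for special 1/sqrt(x) operands, or None for the normal path."""
    if math.isnan(x):
        return math.nan, set()
    if x == 0.0:
        return math.copysign(math.inf, x), {"divide_by_zero"}   # rSqrt(±0) = ±inf
    if x < 0.0:
        return math.nan, {"invalid"}                            # negative operand, including -inf
    if math.isinf(x):
        return 0.0, set()                                       # rSqrt(+inf) = +0
    return None
```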

The third pipeline circuit 330 is configured to perform fixed-point multiplication on the second pipeline output data in the floating point mantissa data path to obtain third pipeline output data, and send the third pipeline output data to the fourth pipeline circuit 340.

For example, the third pipeline circuit 330 includes a multiplication circuit 331.

The multiplication circuit 331 performs an 8-bit × 11-bit fixed-point multiplication on the floating point mantissa data path. It applies Booth multiplication to the 8-bit slope value from the first lookup table circuit and the 11-bit xl from the first normalization circuit, and outputs a 19-bit product.
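Radix-4 (modified) Booth multiplication can be modelled bit-accurately in Python as below. The patent does not state which operand the hardware recodes. This sketch recodes the 8-bit slope (five Booth digits) and treats both operands as unsigned; both choices are assumptions, and the function name is hypothetical.

```python
def booth_radix4_multiply(multiplicand: int, multiplier: int, multiplier_bits: int) -> int:
    """Unsigned product multiplicand * multiplier via radix-4 (modified) Booth recoding.

    The multiplier is recoded into digits in {-2, -1, 0, 1, 2}; each digit selects a
    partial product (0, +/-M, +/-2M) weighted by 4**j, and the partial products are summed.
    Assumption: the 8-bit slope is the recoded operand.
    """
    # One extra zero bit keeps the unsigned multiplier non-negative as a two's-complement
    # value; round the width up to an even number for radix-4 grouping.
    width = multiplier_bits + 1
    width += width % 2
    product = 0
    prev = 0  # implicit bit y[-1] = 0
    for i in range(0, width, 2):
        b0 = (multiplier >> i) & 1
        b1 = (multiplier >> (i + 1)) & 1
        digit = -2 * b1 + b0 + prev           # Booth digit for bits (i+1, i, i-1)
        product += (digit * multiplicand) << i
        prev = b1
    return product
```

With an 8-bit slope and an 11-bit xl, the largest product, 255 × 2047, still fits in the 19-bit result width given above.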

The fourth pipeline circuit 340 is configured to perform, on the third pipeline output data according to the control signal, floating point mantissa subtraction, normalization, and second intermediate-result processing of the floating point exponent, to obtain fourth pipeline output data, and send the fourth pipeline output data to the fifth pipeline circuit 350.

Illustratively, the fourth pipeline circuit 340 includes a second subtraction circuit 341, a second normalization circuit 342, and a third subtraction circuit 343.

The second subtraction circuit 341 performs the floating point mantissa subtraction. It aligns the binary points of the minuend and subtrahend, subtracts the product from the offset, and sends the difference, as high-order 22-bit data, to the second normalization circuit 342.
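Evaluating the offset minus the product amounts to a piecewise-linear approximation. The sketch below builds, for 1/x on [1, 2), one (offset, slope) pair per segment from secant lines and evaluates offset - slope * xl, where xh selects the segment and xl is the position within it. The segment count (2^6), the secant fit, and the use of floating point instead of the 19-bit offset and 8-bit slope fixed-point formats are all assumptions for illustration.

```python
def build_reciprocal_table(index_bits: int) -> list[tuple[float, float]]:
    """(offset, slope) per segment of [1, 2): offset = 1/a, slope = secant slope magnitude.

    Assumption: 2**index_bits equal segments with a secant fit, in floating point.
    """
    n = 1 << index_bits
    h = 1.0 / n
    table = []
    for i in range(n):
        a = 1.0 + i * h
        offset = 1.0 / a
        slope = (1.0 / a - 1.0 / (a + h)) / h
        table.append((offset, slope))
    return table


def reciprocal_estimate(x: float, table: list[tuple[float, float]], index_bits: int) -> float:
    """Estimate 1/x for x in [1, 2) as offset[xh] - slope[xh] * xl."""
    if not 1.0 <= x < 2.0:
        raise ValueError("x must lie in [1, 2)")
    n = 1 << index_bits
    frac = x - 1.0
    xh = int(frac * n)      # high mantissa bits select the segment
    xl = frac - xh / n      # low mantissa bits: position inside the segment
    offset, slope = table[xh]
    return offset - slope * xl
```

With 64 segments the worst-case relative error of this secant fit is about 6e-5, roughly h^2/4 for segment width h = 1/64.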

The second normalization circuit 342 post-normalizes the subtraction result. The 22-bit result from the second subtraction circuit can take only three forms, 1.x…x, 0.1x…x, and 0.01x…x, so the leading zero detector and shifter are simplified: only the upper 3 bits of the floating point mantissa are examined, giving a leading zero count of 0, 1, or 2 and hence a left shift of 0, 1, or 2 bits. The circuit finally outputs the normalized 53-bit floating point mantissa M1 and the 2-bit leading zero count R2.
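The simplified post-normalization can be sketched as below. The sketch assumes the 22-bit difference is an integer whose binary point sits just below its most significant bit, and it examines only the top three bits, as described above. It keeps the result at 22 bits rather than widening it to the 53-bit M1, and the function name is hypothetical.

```python
WIDTH = 22  # width of the difference from the second subtraction circuit


def post_normalize(diff: int) -> tuple[int, int]:
    """Normalize a 22-bit value of the form 1.x..x, 0.1x..x or 0.01x..x.

    Only the top three bits are inspected; returns (shifted_value, r2), r2 in {0, 1, 2}.
    Assumption: the result stays 22 bits wide.
    """
    top3 = (diff >> (WIDTH - 3)) & 0b111
    if top3 & 0b100:
        r2 = 0
    elif top3 & 0b010:
        r2 = 1
    elif top3 & 0b001:
        r2 = 2
    else:
        raise ValueError("input is outside the three expected forms")
    return (diff << r2) & ((1 << WIDTH) - 1), r2
```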

The third subtraction circuit 343 performs the second subtraction on the floating point exponent. It subtracts the leading zero count R2 from the 12-bit exponent intermediate result generated by the first subtraction circuit 321 (because the floating point mantissa M1 has been shifted left, the exponent must be reduced by the number of bits shifted) and outputs a sign-extended 13-bit floating point exponent.
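Taken together, the two exponent subtractions compute the result exponent as (C - E) - R2. For single-precision reciprocal this can be checked in Python as below. The sketch assumes C = 254 and lets R2 absorb the one-bit normalization of 1/(1.m), which lies in (0.5, 1]. The constant list above contains both 253 and 254, so the hardware's actual selection rule may differ. Function names are hypothetical, and only positive normal inputs with normal reciprocals are handled.

```python
import math


def sp_fields(x: float) -> tuple[int, float]:
    """Biased exponent E (bias 127) and significand 1.m of a positive normal value."""
    frac, e = math.frexp(x)            # x = frac * 2**e, frac in [0.5, 1)
    return e - 1 + 127, frac * 2.0     # significand in [1, 2)


def recip_exponent(biased_exp: int, significand: float, c: int = 254) -> int:
    """Result exponent of 1/x as (C - E) - R2; C = 254 is an assumption."""
    q = 1.0 / significand              # mantissa of 1/x before post-normalization, in (0.5, 1]
    r2 = 0 if q >= 1.0 else 1          # 1.0 needs no shift; 0.1x..x needs one
    return (c - biased_exp) - r2
```

For x = 3.0, E = 128 and 1/1.5 needs one shift, so the result exponent is 254 - 128 - 1 = 125, the biased exponent of 1/3.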

The fifth pipeline circuit 350 is configured to post-process the fourth pipeline output data according to the control signal to obtain a first calculation result. When the first floating point data includes first 32-bit floating point data, the calculation result includes a first calculation sub-result; when it includes first 64-bit floating point data, the calculation result includes a second calculation sub-result.

Illustratively, as shown in FIG. 3, the fifth pipeline circuit 350 includes a post-processing circuit 351.

The post-processing circuit 351 post-processes the normalized floating point intermediate result. In the normal floating point calculation path, the final result type is determined from the floating point exponent range, and the outputs of the corresponding normalized number (Normal), overflow (Overflow), and underflow (Underflow) processing modules are gated to produce the calculation result and status flags of the 32-bit/64-bit normal floating point calculation path, classified as follows:

1. For a normalized result, the 1.frac mantissa and the biased exponent directly form the floating point output (DP exponent range [1, 2046] with bias 1023; 32-bit single-precision range [1, 254] with bias 127; single precision in double-precision format, range [897, 1150] with bias 1023). The intermediate result is in double-precision format, so for a 32-bit single-precision calculation the result format must be adjusted: the mantissa [0] 52 is truncated [ 1.

2. For an overflow result, an infinite floating point number is output. When the overflow control enable signal is 1, both the overflow and inexact exception flags are asserted; otherwise only the overflow flag is asserted and the inexact flag is not.

3. For an underflow result, zero or a tiny number is output (Tiny: the floating point mantissa is non-zero and the floating point exponent is less than zero, so the final result is encoded with an all-zero exponent and a non-zero mantissa). When the underflow control enable signal is 1, both the underflow and inexact exception flags are asserted; otherwise only the underflow flag is asserted and the inexact flag is not.
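The three-way classification can be sketched as below. The exponent limits default to the 32-bit single-precision range [1, 254] and can be set to the DP range [1, 2046] or the SP-in-DP range [897, 1150] quoted above. Deciding between zero and tiny from a single `mantissa_nonzero` argument is a simplification, and the flag and function names are hypothetical.

```python
def classify_result(biased_exp: int, mantissa_nonzero: bool,
                    overflow_enable: bool, underflow_enable: bool,
                    exp_min: int = 1, exp_max: int = 254) -> tuple[str, set[str]]:
    """Classify a post-normalization result by its biased exponent (simplified model).

    Returns (kind, flags): "normal", "infinity" (overflow), or "zero"/"tiny" (underflow).
    When the relevant enable is set, inexact is raised alongside overflow/underflow.
    """
    if exp_min <= biased_exp <= exp_max:
        return "normal", set()
    if biased_exp > exp_max:
        return "infinity", ({"overflow", "inexact"} if overflow_enable else {"overflow"})
    kind = "tiny" if mantissa_nonzero else "zero"
    return kind, ({"underflow", "inexact"} if underflow_enable else {"underflow"})
```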

In summary, this embodiment divides the first processing circuit into the first, second, third, fourth, and fifth pipeline circuits. The floating point reciprocal estimate calculation or floating point reciprocal square root estimate calculation is split into five pipeline stages, each executed by one of these pipeline circuits, to obtain the calculation result for the floating point data. This realizes both calculations in hardware and improves the calculation speed and efficiency of the floating point arithmetic device.

Fig. 4 is a schematic diagram illustrating a structure of a third processor according to an exemplary embodiment of the present application. As shown in fig. 4, the third processor 400 includes a sixth pipeline circuit 410, a seventh pipeline circuit 420, an eighth pipeline circuit 430, a ninth pipeline circuit 440, and a tenth pipeline circuit 450, and the third processor 400 may be used as the second processing circuit 130 in the embodiment of fig. 1 or the third processor 230 in the embodiment of fig. 2.

The sixth pipeline circuit 410 is configured to decompose the second floating point data and determine its calculation type according to the control signal, obtain a sixth pipeline output result, and send the sixth pipeline output result to the seventh pipeline circuit 420.

Illustratively, as shown in fig. 4, the sixth pipeline circuit 410 includes a second determination circuit 411.

The second decision circuit 411 performs the unpacking function and passes to the next stage the evaluation function type, the 32-bit floating point fields (a 1-bit sign S, an 8-bit exponent E, and a 24-bit mantissa M including the hidden bit), and the associated flag/control bits, which include flags such as normalized number (normal), denormal number (denormal), zero (zero), one (one), infinity (infinity), and not-a-number (NaN).
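The unpacking can be sketched with Python's `struct` module as below. The fields follow the IEEE 754 binary32 layout. Treating only +1.0 as "one", leaving the hidden bit clear for zero, denormal, infinity, and NaN encodings, and the function name are all assumptions.

```python
import struct


def unpack_sp(x: float) -> dict:
    """Split binary32 x into S, E, a 24-bit M with explicit hidden bit, and type flags."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    s, e, frac = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    hidden = 1 if 0 < e < 0xFF else 0     # assumption: only normal numbers carry the hidden 1
    return {
        "S": s, "E": e, "M": (hidden << 23) | frac,
        "normal": 0 < e < 0xFF,
        "denormal": e == 0 and frac != 0,
        "zero": e == 0 and frac == 0,
        "one": bits == 0x3F800000,        # +1.0 only (assumption)
        "infinity": e == 0xFF and frac == 0,
        "nan": e == 0xFF and frac != 0,
    }
```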

The seventh pipeline circuit 420 is configured to perform special data processing on the sixth pipeline output result based on the calculation type included in the sixth pipeline output result according to the control signal, obtain a seventh pipeline output result, and send the seventh pipeline output result to the eighth pipeline circuit 430.

Illustratively, as shown in fig. 4, the seventh pipeline circuit 420 includes a second special processing circuit 421.

The second special processing circuit 421 performs the calculation for special data according to the evaluation function type. At this stage, exponent and mantissa operations are applied separately to special and normal floating point numbers according to the 32-bit floating point type. Normal floating point numbers pass their S/E/M fields through unchanged, while special floating point numbers produce constant S/E/M values according to their type. The two results are then merged, and a unified word consisting of a 1-bit sign S, an 8-bit exponent E, and a 24-bit mantissa M including the hidden bit is sent to the eighth pipeline circuit 430.

The eighth pipeline circuit 430 is configured to perform, according to the control signal, either a first shift operation on the seventh pipeline output result to obtain an eighth pipeline first sub-result, or a first leading zero detection operation, a third shift operation, and a logarithm table lookup operation on the seventh pipeline output result to obtain an eighth pipeline second sub-result, and to send the resulting sub-result to the ninth pipeline circuit 440.

Illustratively, the eighth pipeline circuit 430 includes a first exponential subtraction circuit 431, a first right shift circuit 432, a first leading zero detection circuit 433, a first left shift circuit 434, a selection circuit 435, a logarithmic look-up table circuit 436, and a first data processing circuit 437.

The seventh pipeline output result for second floating point data that undergoes the floating-point base-2 exponent estimate calculation is sent to the first exponent subtraction circuit 431; the seventh pipeline output result for second floating point data that undergoes the floating-point base-2 logarithm estimate calculation is sent to the first leading zero detection circuit 433.

The first exponent subtraction circuit 431 prepares the shift that converts the input 32-bit floating point number into fixed-point format, as required by the exponent algorithm in the floating point exponent path of this pipeline stage. The minuend of the exponent subtraction is a constant integer (constant 8) and the subtrahend is the exponent E from the seventh pipeline output result. The circuit produces an 8-bit result RS and sends it to the first right shift circuit 432.

The inputs of the first right shift circuit 432 are RS [ 3] (the right shift count) and the data to be shifted, RM (a 32-bit value obtained by shifting the 24-bit mantissa M of the seventh pipeline output result left by 8 bits); the circuit outputs a 32-bit intermediate result IM. According to the floating point exponent calculation result (classified as zero, infinity, or normal), one of 0x00000000, 0x7F800000, and IM is then selected as 32-bit data texp and sent to the first data processing circuit 437, where it waits to be selected against the 32-bit data produced by the logarithm path. For the exponential operation, texp supplies the mantissa bits of the final floating point result, while the floating point exponent bits are passed from the eighth pipeline circuit 430 to the ninth pipeline circuit 440.
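The conversion to fixed point splits x into an integer part, which becomes the exponent of 2^x, and a fraction, which is later looked up in the exponent table. The sketch below shows that split using `math.ldexp` and `math.floor` instead of the bit-level shift of RM. The 23 fraction bits, the round-toward-minus-infinity behaviour, and the names are assumptions, and `2.0 ** f` stands in for the table when 2^x is reconstructed.

```python
import math

FRAC_BITS = 23  # assumed fraction width of the fixed-point intermediate


def exp2_split(x: float) -> tuple[int, int]:
    """Return (n, f) with x ~= n + f / 2**FRAC_BITS, n = floor(x), 0 <= f < 2**FRAC_BITS."""
    fixed = math.floor(math.ldexp(x, FRAC_BITS))  # move the binary point, round toward -inf
    n = fixed >> FRAC_BITS                        # arithmetic shift = floor for negatives
    f = fixed & ((1 << FRAC_BITS) - 1)            # fraction bits, always non-negative
    return n, f


def exp2_from_split(n: int, f: int) -> float:
    """2**x = 2**n * 2**(f / 2**FRAC_BITS); the power stands in for the exponent table."""
    return math.ldexp(2.0 ** (f / (1 << FRAC_BITS)), n)
```

For x = -1.25 the split gives n = -2 and a fraction of 0.75, i.e. 2^-1.25 = 2^-2 * 2^0.75.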

The first leading zero detection circuit 433 performs leading zero detection on RM and sends the lower 5 bits of the 6-bit leading zero count to the first left shift circuit 434.

The first left shift circuit 434 shifts RM left and outputs the 32-bit shift result SL to the selection circuit 435.

The selection circuit 435 selects, under the control signal, one of M [ 1] and SL [ 1] as the signal logic and sends it to the logarithm lookup table circuit 436.

The logarithm lookup table circuit 436 generates logout from a logarithm lookup table implemented as a logic circuit, combines logout with S and E into a 32-bit floating point number tlog, and sends tlog to the first data processing circuit 437.
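The logarithm path rests on the identity log2(x) = e + log2(1.m) for x = 1.m * 2^e. The sketch below reads log2(1.m) from a coarse table indexed by truncated mantissa bits. The table size (256 entries), its contents, and the use of `math.frexp` in place of the leading zero detection and left shift of circuits 433 and 434 are assumptions.

```python
import math

TABLE_BITS = 8  # hypothetical index width for the log2(1.m) table

LOG_TABLE = [math.log2(1.0 + i / (1 << TABLE_BITS)) for i in range(1 << TABLE_BITS)]


def log2_estimate(x: float) -> float:
    """log2(x) = e + log2(1.m), with log2(1.m) read from a truncation-indexed table."""
    if not (x > 0.0 and math.isfinite(x)):
        raise ValueError("x must be positive and finite")
    frac, e = math.frexp(x)        # x = frac * 2**e, frac in [0.5, 1)
    significand = frac * 2.0       # 1.m in [1, 2)
    exponent = e - 1               # unbiased exponent (E - 127 for single precision)
    index = int((significand - 1.0) * (1 << TABLE_BITS))
    return exponent + LOG_TABLE[index]
```

Truncating the index bounds the error by log2(1 + 1/256), about 0.0056; powers of two are exact.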

The first data processing circuit 437 selects one of the 32-bit texp and tlog under the control signal (which identifies an exponential or logarithmic operation) and outputs it as 32-bit data exp_log to the ninth pipeline circuit 440.

The ninth pipeline circuit 440 is configured either to perform an exponential table lookup operation and a complementation operation on the eighth pipeline first sub-result to obtain a ninth pipeline first sub-result, or to perform a complementation operation on the eighth pipeline second sub-result to obtain a ninth pipeline second sub-result, and to send the resulting sub-result to the tenth pipeline circuit 450.

Illustratively, as shown in fig. 4, the ninth pipeline circuit includes an exponential look-up circuit 441, a complementation circuit 442 and a second data processing circuit 443.

The exponent lookup circuit 441 performs the logical lookup of the exp exponent value (the exponent lookup table logic is built as a truth table according to the exponent algorithm) using exp_log [ 9] from stage 2 and produces an 11-bit lookup result. Together with exp_log [0 ], it assembles a 32-bit floating point number in which the lookup result forms the upper 11 bits of the 23-bit floating point mantissa (the lower 12 bits are all zero).

The complementation circuit 442 inverts M [1 ] and adds one (two's complement negation).

Under the control signal, the second data processing circuit 443 selects either the 32-bit floating point number output by the exponent lookup circuit or conv, and outputs the selection as 32-bit data LR to the tenth pipeline circuit.

The tenth pipeline circuit 450 is configured, according to the control signal, either to perform a second shift operation and an exponent floating point number synthesis operation on the ninth pipeline first sub-result, or to perform a second leading zero detection operation, a fourth shift operation, and a logarithm floating point number synthesis operation on the ninth pipeline second sub-result, in either case obtaining a third calculation sub-result.

Illustratively, as shown in fig. 4, the tenth pipeline circuit 450 includes a second right shift circuit 451, an exponent add circuit 452, an exponent synthesizing circuit 453, a second leading zero detection circuit 454, a second left shift circuit 455, a second exponent subtract circuit 456, a logarithm synthesizing circuit 457, and a third data processing circuit 458.

The tenth pipeline circuit 450 determines whether the ninth pipeline first sub-result is a denormal floating point number. If it is, the exponent bits of the 32-bit floating point number are set to all '0' and the data is passed to the second right shift circuit 451; if it is not, the ninth pipeline first sub-result is sent to the exponent add circuit 452.

The second right shift circuit 451 shifts the mantissa bits of the floating point number right by the corresponding amount to obtain 32-bit data rs_dout, and sends rs_dout to the exponent synthesis circuit 453.

The exponent adding circuit 452 is configured to perform an exponent adding process on the first sub-result of the ninth pipeline to obtain an exponent adding calculation result, and send the exponent adding calculation result to the exponent synthesizing circuit 453.

The exponent synthesis circuit 453 computes the floating point exponent result for two cases: (1) if rounding the floating point mantissa produces no carry, the result is finally computed from LR; (2) if rounding produces a carry and the exponent bits are zero, it outputs a 32-bit floating point number formed from all zeros in the upper 9 bits and rs_dout [0] in the lower 23 bits. It finally selects between (1) and (2) and outputs the 32-bit exp exponent estimate exp_dout.
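The choice between the normal-exponent path and the right-shift path for denormal results can be sketched at bit level as below. Given an integer exponent n and a 24-bit significand with the hidden bit set, a normal result stores n + 127 in the exponent field; otherwise the significand is shifted right and the exponent field is zero. Rounding (and therefore the carry case described above) is omitted, overflow simply returns +infinity, and the function name is hypothetical; these are simplifying assumptions.

```python
def exp2_result_bits(n: int, mant24: int) -> int:
    """binary32 bits of 2**n * (mant24 / 2**23), for 2**23 <= mant24 < 2**24.

    Simplified model: truncation instead of rounding, overflow saturates to +inf.
    """
    if not (1 << 23) <= mant24 < (1 << 24):
        raise ValueError("mant24 must have its hidden bit set")
    biased = n + 127
    if biased >= 255:
        return 0x7F800000                            # overflow: +infinity (simplified)
    if biased >= 1:
        return (biased << 23) | (mant24 & 0x7FFFFF)  # normal: exponent field = n + 127
    return mant24 >> (1 - biased)                    # denormal: exponent field all zero
```

For example, 2^-127 cannot be stored as a normal number, so its significand is shifted right by one and the result is the denormal encoding 0x00400000.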

The second leading zero detecting circuit 454 detects the number of leading zeros of the 32-bit data LR [1 ].

A second left shift circuit 455 is used to perform left shift normalization of LR [ 1.

The second exponent subtraction circuit 456 computes the corresponding floating point exponent by subtraction (134 - cnt, where cnt is the leading zero count) and outputs the 8-bit result as the floating point exponent bits, together with LR [0] as the sign bit, to the logarithm synthesis circuit 457.
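The complement, leading zero detection, left shift, and 134 - cnt exponent steps can be modelled together as a fixed-point to single-precision conversion. One field layout consistent with the constant 134 is a 31-bit magnitude with 8 integer bits and 23 fraction bits, since 127 + (31 - 1 - cnt) - 23 = 134 - cnt. That layout, the truncating mantissa, and the function names are assumptions, not details given in the patent.

```python
import struct

MAG_BITS = 31   # assumed magnitude width: 8 integer bits + 23 fraction bits
FRAC_BITS = 23


def log_fixed_to_sp_bits(value: int) -> int:
    """Convert a signed fixed-point log2 value into binary32 bits (mantissa truncated)."""
    sign = 1 if value < 0 else 0
    mag = (~value + 1) if sign else value            # two's-complement negation: invert, add one
    if mag >= 1 << MAG_BITS:
        raise ValueError("magnitude exceeds the assumed field width")
    if mag == 0:
        return 0                                     # log2(1) = +0
    cnt = MAG_BITS - mag.bit_length()                # leading zero count
    exponent = 127 + (MAG_BITS - 1 - cnt) - FRAC_BITS  # = 134 - cnt for these widths
    normalized = mag << cnt                          # leading one moved to bit MAG_BITS - 1
    fraction = (normalized >> (MAG_BITS - 1 - 23)) & 0x7FFFFF  # drop hidden bit, keep 23
    return (sign << 31) | (exponent << 23) | fraction


def bits_to_float(bits: int) -> float:
    """Reinterpret 32 bits as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

A magnitude that fills all 31 bits (cnt = 0) receives the exponent 134, the largest this layout can produce.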

The logarithm synthesis circuit 457 assembles the floating point mantissa bits and exponent bits into log_dout, the 32-bit log logarithm estimate.

The third data processing circuit 458 selects, under the control signal (mainly the exponential operation enable flag or the logarithmic operation enable flag), one of the 32-bit exp_dout and log_dout and outputs it as the third calculation sub-result.

In summary, this embodiment divides the third processor into the sixth, seventh, eighth, ninth, and tenth pipeline circuits. The floating-point base-2 exponent estimate calculation or floating-point base-2 logarithm estimate calculation is split into five pipeline stages, each executed by one of these pipeline circuits, to obtain the calculation result for the floating point data. This realizes both calculations in hardware and improves the calculation speed and efficiency of the floating point arithmetic device.

It should be understood that the above examples are given only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to list every embodiment exhaustively here, and obvious variations or modifications derived from the above remain within the scope of protection of the invention.

Claims

1. A floating-point arithmetic device, comprising a decoding circuit, a first processing circuit, a second processing circuit, and a selection circuit, wherein:

the decoding circuit is used for receiving target floating point data and a control signal, acquiring first floating point data from the target floating point data according to the control signal, and sending the first floating point data to the first processing circuit; and/or acquiring second floating point data from the target floating point data according to the control signal, sending the second floating point data to the second processing circuit, and sending the control signal to the first processing circuit, the second processing circuit and the selection circuit respectively, wherein the first floating point data comprises floating point data with an evaluation function type of reciprocal floating point evaluation calculation or reciprocal floating point square root evaluation calculation, and the second floating point data comprises floating point data with an evaluation function type of floating point base 2 exponent evaluation calculation or floating point base 2 logarithm evaluation calculation;

the first processing circuit is configured to perform a first estimation calculation on the first floating point data according to the control signal to obtain a first calculation result of the first floating point data, and send the first calculation result to the selection circuit, where the first estimation calculation includes a floating point reciprocal estimation calculation or a floating point reciprocal square root estimation calculation;

the second processing circuit is configured to perform a second estimation calculation on the second floating point data according to the control signal to obtain a second calculation result, and send the second calculation result to the selection circuit, where the second estimation calculation includes a floating-point base-2 exponent estimation calculation or a floating-point base-2 logarithm estimation calculation;

2. The floating point arithmetic device of claim 1, wherein the first floating point data comprises a first 32-bit floating point data, and/or a first 64-bit floating point data; the first 32-bit floating point data is 32-bit first floating point data, and the first 64-bit floating point data is 64-bit first floating point data;

the first processing circuit comprises at least one first processor, and/or at least one second processor;

3. The floating point arithmetic device of claim 2, wherein the second floating point data comprises a second 32-bit floating point data, the second 32-bit floating point data being a 32-bit second floating point data;

the second processing circuitry comprises at least one third processor;

4. The floating point arithmetic device of claim 3 wherein the first processing circuitry comprises at least two of the first processors and at least two of the second processors;

5. The floating point arithmetic device of claim 3 wherein the first processing circuitry comprises at least four of the second processors;

6. The floating point arithmetic device of claim 3 wherein the first processing circuitry comprises at least two of the second processors;

7. The floating point arithmetic device of claim 3 wherein the second processing circuitry comprises at least four of the third processors;

the decoding circuit is further configured to unpack the target floating point data into at least four groups of second 32-bit floating point data according to the control signal, and send each 32-bit floating point data of the at least four groups of second 32-bit floating point data to any one of the at least four third processors.

8. The floating point arithmetic device of any one of claims 2 to 7 wherein the first processor, or the second processor, comprises:

the fifth pipeline circuit is used for post-processing the fourth pipeline output data according to the control signal to obtain a calculation result; when the first floating point data comprises the first 32-bit floating point data, the calculation result comprises the first calculation sub-result; when the first floating point data includes the first 64-bit floating point data, the calculation result includes the second calculation sub-result.

9. The floating point arithmetic device of any one of claims 3 to 7 wherein the third processor comprises:

10. The floating-point arithmetic device of claim 9, wherein the eighth pipeline circuit is further configured to perform a first leading zero detection operation, a third shift operation and a logarithmic table lookup operation on the seventh pipeline output result according to the control signal, respectively, to obtain an eighth pipeline second sub-result, and send the eighth pipeline second sub-result to the ninth pipeline circuit;