CN116974512A - Floating point arithmetic device, vector processing device, processor, and electronic apparatus - Google Patents

Publication number
CN116974512A
CN116974512A CN202211436156.5A CN202211436156A CN116974512A CN 116974512 A CN116974512 A CN 116974512A CN 202211436156 A CN202211436156 A CN 202211436156A CN 116974512 A CN116974512 A CN 116974512A
Authority
CN
China
Prior art keywords
mantissa
unit
shift
floating point
exponent
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211436156.5A
Other languages
Chinese (zh)
Inventor
任子木
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202211436156.5A
Publication of CN116974512A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/499 Denomination or exception handling, e.g. rounding or overflow
    • G06F7/49905 Exception handling
    • G06F7/4991 Overflow or underflow
    • G06F7/49915 Mantissa overflow or underflow in handling floating-point numbers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/50 Adding; Subtracting

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Optimization (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The application relates to the technical field of integrated circuits and discloses a floating point arithmetic device, a vector processing device, a processor, and an electronic apparatus. The floating point arithmetic device comprises an exponent operation module and a first mantissa operation branch. The exponent operation module operates on the first exponent and the second exponent to output an exponent operation result. The first mantissa operation branch includes: a mantissa exchanging unit, which exchanges the first mantissa with the second mantissa according to the target difference so that the first target mantissa is input to a first alignment shift unit and the second target mantissa is input to a first fixed-point adder; the first fixed-point adder, which performs the addition; and a first normalization shift unit, which performs normalization shifting. The maximum shift amount of the first alignment shift unit is greater than the maximum shift amount of the first normalization shift unit. The first mantissa and the second mantissa are input to the first mantissa operation branch if the absolute value of the target difference is greater than a first threshold. The application can improve the efficiency of floating point addition and subtraction.

Description

Floating point arithmetic device, vector processing device, processor, and electronic apparatus
Technical Field
The present application relates to the field of integrated circuits, and more particularly, to a floating point arithmetic device, a vector processing device, a processor, and an electronic apparatus.
Background
In the technical field of AI (Artificial Intelligence), an AI processor is generally used for vector processing. Vector processing capability is a key factor limiting the performance of the AI processor, and the vector processing units of a high-performance AI processor are highly parallel. The key to vector processing is the floating point capability of the floating point arithmetic device. Improving the operation efficiency of the floating point arithmetic device is therefore a technical problem to be solved in the related art.
Disclosure of Invention
In view of the above, embodiments of the present application provide a floating point arithmetic device, a vector processing device, a processor, and an electronic apparatus, so as to improve the operation efficiency of the floating point arithmetic device.
According to an aspect of an embodiment of the present application, there is provided a floating-point arithmetic device for operating on a first floating-point operand including a first exponent and a first mantissa and a second floating-point operand including a second exponent and a second mantissa, the floating-point arithmetic device including:
an exponent operation module, configured to operate on the first exponent and the second exponent and to output an exponent operation result; the exponent operation result at least comprises an absolute value of a target difference, wherein the target difference is the difference between the first exponent and the second exponent;
a first mantissa operation module, comprising a first mantissa operation branch; the first mantissa and the second mantissa are input into the first mantissa operation branch if the absolute value of the target difference is greater than a first threshold;
the first mantissa operation branch comprises a mantissa exchanging unit, a first alignment shift unit, a first fixed-point adder, and a first normalization shift unit; the mantissa exchanging unit is configured to exchange the first mantissa with the second mantissa according to the target difference, so as to input a first target mantissa to the first alignment shift unit and input a second target mantissa to the first fixed-point adder; the first fixed-point adder is configured to add the alignment shift result output by the first alignment shift unit and the second target mantissa; the first normalization shift unit is configured to perform normalization shifting on the operation result output by the first fixed-point adder; the first target mantissa is whichever of the first mantissa and the second mantissa has the smaller corresponding exponent; the second target mantissa is the other of the first mantissa and the second mantissa; the maximum shift amount of the first alignment shift unit is greater than the maximum shift amount of the first normalization shift unit.
According to an aspect of an embodiment of the present application, there is provided a vector processing device including a plurality of parallel vector processing units and a register unit. Each vector processing unit includes a floating point arithmetic device as described above; the floating point arithmetic device reads a first floating point operand and a second floating point operand from the register unit and writes the operation result of the first floating point operand and the second floating point operand back to the register unit.
According to an aspect of an embodiment of the present application, there is provided a processor including a floating point arithmetic device as described above.
According to an aspect of an embodiment of the present application, there is provided an electronic device including a processor as above.
For two floating point operands, a large absolute value of the difference between the first exponent and the second exponent (i.e., the absolute value of the target difference) indicates that the two operands differ greatly in magnitude. In this case, on the one hand, aligning the smaller exponent to the larger exponent requires shifting the mantissa of the operand with the smaller exponent by many bits, so the alignment shift takes longer. On the other hand, if the mantissas of the two operands are subtracted, few leading zeros are generated, so the normalization shift needs to move only a few bits and takes less time. Therefore, in the present application, based on the absolute value of the target difference determined by the exponent operation module, the first mantissa and the second mantissa are input into the first mantissa operation branch when that absolute value is greater than the first threshold. Because the maximum shift amount of the first alignment shift unit in this branch is greater than the maximum shift amount of the first normalization shift unit, the branch provides the time needed for the alignment shift while avoiding an overly long normalization shift. This reduces the delay of floating point addition and subtraction and improves their overall efficiency.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. It is evident that the drawings in the following description are only some embodiments of the present application and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a block diagram illustrating a floating point arithmetic device, according to one embodiment of the present application.
Fig. 2 shows a schematic diagram of the format of a floating point operand under the IEEE754 standard.
Fig. 3 is a block diagram of a floating point arithmetic device according to another embodiment of the present application.
FIG. 4 is a schematic diagram of a second normalization shift unit according to an embodiment of the present application.
Fig. 5 is a circuit diagram of an exponent operation unit according to an embodiment of the present application.
Fig. 6 is a block diagram illustrating a floating point arithmetic device according to another embodiment of the present application.
Fig. 7 is a block diagram of a vector processing apparatus according to an embodiment of the present application.
Fig. 8 is a block diagram of an electronic device according to an embodiment of the application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the application may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
It should be noted that "a plurality" herein means two or more. "And/or" describes an association between associated objects and covers three relationships; for example, "A and/or B" may represent: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
FIG. 1 is a block diagram illustrating a floating point arithmetic device according to one embodiment of the present application. The device operates on a first floating-point operand and a second floating-point operand, each of which includes an exponent and a mantissa. In the present application, for convenience of distinction, the exponent of the first floating point operand is referred to as the first exponent, and the mantissa of the first floating point operand is referred to as the first mantissa; the exponent of the second floating point operand is referred to as the second exponent, and the mantissa of the second floating point operand is referred to as the second mantissa. In particular embodiments, the first and second floating-point operands may be floating-point operands in an IEEE 754 standard format.
FIG. 2 shows a schematic diagram of the format of a floating point operand under the IEEE 754 standard. As shown in FIG. 2, a floating point operand comprises three parts: a sign, an exponent (order code), and a mantissa, wherein:
sign: the sign bit of the mantissa; 1 indicates negative and 0 indicates positive;
exponent (order code): represents the power of the floating point operand with base 2, and is stored in biased (excess) representation;
mantissa: represents the fractional part of the floating point number in sign-magnitude form; with base 2, one bit (the leading 1) is hidden.
Assume a floating point number $n = (-1)^{m_s} \times 2^{E} \times (1.M)_2$, where $m_s$ is the sign, $E$ is the exponent, and $M$ is the mantissa. The subscript 2 in $(1.M)_2$ indicates that the base is 2.
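As an illustration of the three fields, the following minimal Python sketch splits a value into sign, exponent, and mantissa and then re-evaluates the formula above. It assumes the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits stored with a bias of 127, 23 stored mantissa bits), which the application does not mandate; the function names `decode_float32` and `rebuild_normal` are hypothetical.

```python
import struct


def decode_float32(x: float):
    """Split x, rounded to IEEE 754 single precision, into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # m_s: 1 = negative, 0 = positive
    biased_exp = (bits >> 23) & 0xFF  # exponent stored with a bias of 127
    fraction = bits & 0x7FFFFF        # M: 23 stored bits, leading 1 is hidden
    return sign, biased_exp, fraction


def rebuild_normal(sign: int, biased_exp: int, fraction: int) -> float:
    """Evaluate (-1)^m_s * 2^E * (1.M)_2 for a normal (not subnormal) number."""
    significand = 1 + fraction / (1 << 23)  # restore the hidden leading 1
    return (-1) ** sign * 2.0 ** (biased_exp - 127) * significand


if __name__ == "__main__":
    fields = decode_float32(-6.5)     # -6.5 = -(1.101)_2 * 2^2
    print(fields, rebuild_normal(*fields))
```

Zero, subnormal numbers, infinities, and NaN use reserved exponent encodings and are deliberately left out of this sketch.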
As shown in FIG. 1, the floating-point arithmetic device at least includes an exponent operation module 100 and a first mantissa operation module 200. The exponent operation module 100 is configured to operate on the first exponent and the second exponent and to output an exponent operation result; the exponent operation result at least includes an absolute value of a target difference, the target difference being the difference between the first exponent and the second exponent.
The first mantissa operation module 200 includes a first mantissa operation branch 210. If the absolute value of the target difference is greater than the first threshold, the first mantissa and the second mantissa are input into the first mantissa operation branch 210.
The first mantissa operation branch 210 includes a mantissa exchanging unit 211, a first alignment shift unit 212, a first fixed-point adder 213, and a first normalization shift unit 214. The mantissa exchanging unit 211 is configured to exchange the first mantissa with the second mantissa according to the target difference, so as to input the first target mantissa to the first alignment shift unit 212 and input the second target mantissa to the first fixed-point adder 213. The first target mantissa is whichever of the first mantissa and the second mantissa has the smaller corresponding exponent; the other of the two mantissas is the second target mantissa. The difference between the first exponent and the second exponent may be calculated by the exponent operation module 100.
Specifically, if the target difference is greater than zero (i.e., the first exponent is greater than the second exponent), the second mantissa is taken as the first target mantissa and the first mantissa as the second target mantissa; the mantissa exchanging unit 211 exchanges the first mantissa with the second mantissa, so that the second mantissa is input to the first alignment shift unit 212 and the first mantissa is input to the first fixed-point adder 213. If the target difference is smaller than zero (i.e., the first exponent is smaller than the second exponent), the first mantissa is taken as the first target mantissa and the second mantissa as the second target mantissa; the mantissa exchanging unit 211 does not exchange the two mantissas, and inputs the first mantissa to the first alignment shift unit 212 and the second mantissa to the first fixed-point adder 213. If the target difference is equal to zero, the mantissa exchanging unit 211 may either exchange the two mantissas or leave them unexchanged; to reduce the amount of processing, it may leave them unexchanged in this case.
By exchanging the first mantissa with the second mantissa where needed, the mantissa of the floating point operand with the smaller exponent (i.e., the first target mantissa) is input to the first alignment shift unit 212, and the mantissa of the floating point operand with the larger exponent (i.e., the second target mantissa) is input to the first fixed-point adder 213.
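The exchange rule can be sketched in a few lines of Python. This is only a behavioural model under the assumption that exponents and mantissas are plain integers; the function name `exchange_mantissas` and its return convention are hypothetical and do not come from the application.

```python
def exchange_mantissas(exp1: int, man1: int, exp2: int, man2: int):
    """Route mantissas the way the mantissa exchanging unit is described.

    Returns (first_target, second_target, shift_amount):
      first_target  goes to the alignment shifter (smaller exponent),
      second_target goes straight to the fixed-point adder,
      shift_amount  is the absolute value of the target difference.
    """
    target_diff = exp1 - exp2
    if target_diff > 0:
        # First exponent is larger: exchange, so the second mantissa is shifted.
        return man2, man1, target_diff
    # Target difference is negative or zero: no exchange is needed.
    return man1, man2, -target_diff


if __name__ == "__main__":
    print(exchange_mantissas(5, 0b1010, 3, 0b1100))
```

Leaving the mantissas unexchanged when the difference is zero mirrors the lower-processing option mentioned in the text.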
The first alignment shift unit 212 is configured to perform an alignment shift on the first target mantissa so that the smaller exponent is aligned to the larger one: the mantissa of the floating-point operand with the smaller exponent (i.e., the first target mantissa) is shifted right until the two floating-point operands have the same exponent. The number of bits shifted is the absolute value of the target difference. Alignment shifting (also called order shifting) means moving the position of the binary point in a mantissa so that the exponents of the two operands being operated on become equal; this is also described as aligning the exponents of the two operands. It should be noted that during alignment shifting the exponent of the operand whose mantissa is shifted changes accordingly: each one-bit right shift of the mantissa increases the exponent by 1. For example, if a floating point operand is $0.00110 \times 2^{8}$ and its mantissa 0.00110 is shifted right by two bits, the mantissa becomes 0.00001 (the bits shifted out of the five-bit fraction are dropped), the exponent changes from 8 to 10, and the shifted operand is expressed as $0.00001 \times 2^{10}$. In the present application, the first alignment shift unit 212 performs the alignment shift by shifting the first target mantissa right by K bits, where K is the absolute value of the target difference.
After the alignment shift of the first target mantissa, the exponent of the floating point operand from which the first target mantissa is derived equals the exponent of the floating point operand from which the second target mantissa is derived; in other words, the smaller exponent has been aligned to the larger exponent. Thus, if the two floating point operands are to be added or subtracted, the new mantissa obtained by the alignment shift of the first target mantissa can be directly added to or subtracted from the second target mantissa.
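The alignment step can be modelled in Python as follows. The sketch assumes the mantissa is held as an integer containing only its fraction bits (five of them in the example, so 0.00110 is `0b00110`) and that bits shifted out on the right are simply dropped; a hardware design would normally keep extra guard bits for rounding. The name `alignment_shift` is hypothetical.

```python
def alignment_shift(mantissa: int, exponent: int, k: int):
    """Shift the mantissa right by k bits and raise the exponent by k.

    Bits shifted out on the right are discarded, so the represented
    value may lose precision.
    """
    return mantissa >> k, exponent + k


if __name__ == "__main__":
    # 0.00110 * 2^8 shifted right by two bits gives 0.00001 * 2^10.
    man, exp = alignment_shift(0b00110, 8, 2)
    print(format(man, "05b"), exp)
```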
The first fixed-point adder 213 is configured to add the alignment shift result output by the first alignment shift unit 212 to the second target mantissa; the alignment shift result is the output obtained after the alignment shift of the first target mantissa.
The first normalization shift unit 214 is configured to perform normalization shifting on the operation result output by the first fixed-point adder 213. The maximum shift amount of the first alignment shift unit 212 is greater than the maximum shift amount of the first normalization shift unit 214. In a specific embodiment, the maximum shift amount of the first normalization shift unit is 1, i.e., the first normalization shift unit shifts by at most one bit during normalization.
The mantissa S of a normalized floating point operand must satisfy the normalized mantissa requirement. Therefore, after the mantissas of the floating-point operands have been operated on, the mantissa operation result must undergo normalization shifting so that the normalized result satisfies that requirement. Normalization shifting is performed by shifting the mantissa operation result left or right. As with alignment shifting, the exponent changes accordingly: for each bit the mantissa operation result is shifted left, the corresponding exponent is decreased by 1; for each bit it is shifted right, the corresponding exponent is increased by 1.
The floating point arithmetic device shown in FIG. 1 may be used both to add two floating point operands and to subtract them. It will be appreciated that when the device shown in FIG. 1 is used for subtraction, the floating point operand serving as the subtrahend may be represented in complement form, so that the subtraction of floating point operands is converted into an addition. From the exponent operation result output by the exponent operation module and the normalization shift result output by the first normalization shift unit, the result of adding (or subtracting) the two floating point operands can be determined.
For two floating point operands, a large absolute value of the difference between the first exponent and the second exponent (i.e., the absolute value of the target difference) indicates that the two operands differ greatly in magnitude. In this case, on the one hand, aligning the smaller exponent to the larger exponent requires shifting the mantissa of the operand with the smaller exponent by many bits, so the alignment shift takes longer. On the other hand, if the mantissas of the two operands are subtracted, few leading zeros are generated, so the normalization shift needs to move only a few bits and takes less time. Therefore, in the present application, when the absolute value of the target difference is greater than the first threshold, the first mantissa and the second mantissa are input into the first mantissa operation branch. Because the maximum shift amount of the first alignment shift unit in this branch is greater than the maximum shift amount of the first normalization shift unit, the branch provides the time needed for the alignment shift while avoiding an overly long normalization shift. This reduces the delay of floating point addition and subtraction and improves their overall efficiency.
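The reasoning above can be checked with a small behavioural model of the first mantissa operation branch. The sketch below is not the circuit of the application. It assumes 8-bit significands that include the hidden leading 1 (so each input lies in the range 128 to 255), a first threshold of 1, truncation of bits shifted out during alignment, and magnitude-only arithmetic with sign handling omitted; the names `W` and `far_path` are hypothetical. Under these assumptions the result never needs more than a one-bit normalization shift in either direction.

```python
W = 8  # assumed significand width, including the hidden leading 1


def far_path(exp_a: int, sig_a: int, exp_b: int, sig_b: int, subtract: bool = False):
    """Model of the first branch: exchange, align, add, normalize by <= 1 bit.

    Inputs are normalized W-bit significands; the result is the magnitude
    (exponent, significand) of a +/- b. Sign handling is left out.
    """
    diff = exp_a - exp_b
    assert abs(diff) > 1, "this branch expects an exponent gap above the threshold"
    # Mantissa exchange: the operand with the smaller exponent gets shifted.
    if diff > 0:
        exp_big, sig_big, sig_small = exp_a, sig_a, sig_b
    else:
        exp_big, sig_big, sig_small = exp_b, sig_b, sig_a
    aligned = sig_small >> abs(diff)  # alignment shift, possibly many bits
    total = sig_big - aligned if subtract else sig_big + aligned
    if total >> W:                    # carry out of the top bit: one bit right
        return exp_big + 1, total >> 1
    if not (total >> (W - 1)):        # a single leading zero: one bit left
        return exp_big - 1, total << 1
    return exp_big, total             # already normalized


if __name__ == "__main__":
    print(far_path(5, 0b10000000, 2, 0b11111111, subtract=True))
    print(far_path(5, 0b11111111, 2, 0b11111111))
```

With a gap of at least two, the aligned operand is smaller than a quarter of the full range, so subtracting it from a normalized significand can clear at most the top bit. That is why a one-bit normalization shifter is enough in this branch.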
FIG. 3 is a block diagram of a floating point arithmetic device according to another embodiment of the present application. Compared with the floating point arithmetic device shown in FIG. 1, in the embodiment corresponding to FIG. 3 the first mantissa operation module 200 further includes a second mantissa operation branch 220; the first mantissa and the second mantissa are input into the second mantissa operation branch 220 when the absolute value of the target difference is not greater than the first threshold.
The second mantissa operation branch 220 includes a second alignment shift unit 221, a second fixed-point adder 224, a leading zero prediction unit 223, and a second normalization shift unit 227. The second alignment shift unit 221 is configured to perform alignment shifting on the first mantissa and the second mantissa, aligning the smaller exponent to the larger one: the mantissa of the floating point operand with the smaller exponent is shifted right by a number of bits equal to the absolute value of the target difference. In one embodiment, the first threshold may be 1, and the maximum shift amount of the second alignment shift unit is one bit.
Specifically, the second alignment shift unit 221 may perform the alignment shift on the first mantissa and the second mantissa according to the target difference. It determines from the target difference which of the first mantissa and the second mantissa has the smaller corresponding exponent and then, following the principle of aligning the smaller exponent to the larger one, shifts that mantissa right by a number of bits equal to the absolute value of the target difference. After the alignment shift, the exponent corresponding to the first mantissa equals the exponent corresponding to the second mantissa.
The second fixed-point adder 224 is configured to add the alignment shift results output by the second alignment shift unit 221; the leading zero prediction unit 223 is configured to perform leading zero prediction for the operation result of the second fixed-point adder; the second normalization shift unit 227 is configured to perform normalization shifting on the operation result output by the second fixed-point adder 224 according to the leading zero prediction result output by the leading zero prediction unit 223. The maximum shift amount of the second normalization shift unit is greater than the maximum shift amount of the second alignment shift unit.
Because subtracting the mantissas of two floating-point operands may leave many zeros between the most significant bit and the first 1 of the result, the mantissa result must be normalized by shifting so that it meets the normalized mantissa requirement. To keep the normalization shift efficient, it may be guided by the leading zero prediction result.
Specifically, the leading zero prediction unit 223 includes a leading zero predictor (Leading Zero Anticipation, LZA), which uses the mantissas of the two floating-point numbers participating in the operation to detect the position of the first 1 in their operation result and thereby determine the number of zeros between the most significant bit and the first 1; this count is the number of leading zeros. The leading zero predictor comprises a pre-encoding logic circuit and a first-1 position detection logic circuit: the pre-encoding logic circuit generates a series of encoded signals with the same number of bits as the input mantissa, and the first-1 position detection logic encodes the position of the most significant 1.
Correspondingly, the leading zero prediction result indicates the number of leading zeros in the operation result of the second fixed-point adder. The second normalization shift unit may then perform the normalization shift by the number of leading zeros indicated by the leading zero prediction result. For example, if the leading zero prediction result indicates that the number of leading zeros is K, the second normalization shift unit shifts the mantissa operation result left by K bits, and the exponent operation module decreases the exponent corresponding to the mantissa operation result by K. Shifting the mantissa operation result left in this way is called left normalization.
For example, if a mantissa operation result is 0.01001 and it is shifted left by one bit, the mantissa operation result becomes 0.10010 and its exponent must correspondingly be decreased by 1. It should be noted that, because the number of bits of the mantissa operation result must remain constant during the left shift, the vacated low-order positions are filled during normalization shifting; in the above example, one 0 is appended at the least significant position of the mantissa operation result.
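The left normalization just described can be sketched functionally in Python. Note the substitution: a real leading zero predictor derives the count from the two input mantissas in parallel with the adder, whereas this sketch simply counts the leading zeros of the finished result. It assumes a nonzero fraction held as a fixed-width integer (five bits in the example); the names `count_leading_zeros` and `left_normalize` are hypothetical.

```python
def count_leading_zeros(value: int, width: int) -> int:
    """Number of zeros above the most significant 1 in a width-bit field."""
    return width - value.bit_length()


def left_normalize(value: int, exponent: int, width: int):
    """Shift left by the leading zero count K and decrease the exponent by K.

    The field width stays constant; vacated low-order bits are filled with 0.
    """
    k = count_leading_zeros(value, width)
    mask = (1 << width) - 1
    return (value << k) & mask, exponent - k


if __name__ == "__main__":
    # 0.01001 has one leading zero: it becomes 0.10010 and the exponent drops by 1.
    man, exp = left_normalize(0b01001, 3, 5)
    print(format(man, "05b"), exp)
```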
Similarly, the second mantissa operation branch 220 of the floating-point arithmetic device shown in FIG. 3 may be used together with the exponent operation module 100 to add and subtract two floating-point operands. It will be appreciated that when the second mantissa operation branch 220 and the exponent operation module 100 of FIG. 3 are used for subtraction, the floating point operand serving as the subtrahend may be represented in complement form, so that the subtraction of floating point operands is converted into an addition.
If the absolute value of the difference between the exponent of the first floating point operand and the exponent of the second floating point operand is small, the two operands are close in magnitude. On the one hand, subtracting one operand from the other then produces many leading zeros in the mantissa, so the normalization shift covers more bits and takes more time. On the other hand, because the two operands are close in magnitude, the exponent-alignment shift (the shift that makes the exponents of the two operands equal) covers only a few bits.
Therefore, in the present application, when the absolute value of the target difference is not greater than the first threshold, the first mantissa and the second mantissa are input into the second mantissa operation branch. In the second mantissa operation branch, the maximum shift amount of the second normalization shift unit is greater than the maximum shift amount of the second exponent-alignment shift unit, and a leading zero prediction unit is provided specifically for this branch, so that the addition result output by the second fixed-point adder is normalized according to the leading zero prediction result output by the leading zero prediction unit. The normalization shift is therefore guided by the prediction and becomes more efficient. Routing the first mantissa and the second mantissa into the second mantissa operation branch in this case keeps the exponent-alignment shift from taking an excessively long time while leaving enough time for the normalization shift, which reduces the latency of floating point addition and subtraction and improves their overall efficiency.
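The branch selection described above can be illustrated with a short sketch. The function name `select_branch` is hypothetical, and the threshold value of 1 used in the usage lines is an assumption chosen only for illustration; this passage does not fix a value for the first threshold.

```python
def select_branch(j_x: int, j_y: int, first_threshold: int) -> str:
    """Choose the mantissa operation branch from the exponent difference.

    Returns "second" when |j_x - j_y| is not greater than the threshold
    (few alignment bits, possibly many leading zeros), else "first".
    """
    target_difference = abs(j_x - j_y)
    if target_difference <= first_threshold:
        return "second"
    return "first"


# Assumed threshold of 1, for illustration only.
print(select_branch(10, 10, first_threshold=1))
print(select_branch(10, 14, first_threshold=1))
```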
FIG. 4 is a schematic diagram of a second normalization shift unit according to an embodiment of the present application. As shown in FIG. 4, the second normalization shift unit includes at least two cascaded shift circuits; the second normalization shift unit illustrated in FIG. 4 includes five shift circuits cascaded in series, namely a first stage shift circuit 810, a second stage shift circuit 820, a third stage shift circuit 830, a fourth stage shift circuit 840, and a fifth stage shift circuit 850. Every stage of shift circuit has the same structure; specifically, each shift circuit includes a shift control unit 811, a shift unit 812, a sticky unit 813, and a multiplexer (MUX) 814.
The shift control unit 811 is configured to determine whether the input data needs to be shifted in normalization, and if it is determined that the input data needs to be shifted in normalization, input the input data to the shift unit; if it is determined that the normalization shift is not required for the input data, inputting the input data into a multiplexer; the input data of the first-stage shifting circuit is data to be normalized and shifted, for example, an addition operation result output by the second fixed-point adder; the shift control unit may determine whether or not normalized shifting of the input data is required in the present stage shift circuit based on the leading zero prediction result output from the leading zero prediction unit.
The shift unit 812 is used for performing normalization shift processing on the input data, inputting the data segment discarded from the input data into the sticky unit, and inputting the normalization shift result of the input data into the multiplexer. For example, if the input data is 0.0001001, it needs to be shifted left by 3 bits to meet the normalization requirement, and the first three 0s, counted from the first digit of the input data, form the discarded data segment. It should be noted that, as described above, in order to keep the number of bits constant, if L bits are discarded from the high end of the input data, L bits need to be appended at the low end; for example, for the input data 0.0001001, three 0s need to be appended at the last bit during the normalization shift.
A sticky unit 813 is configured to perform an OR operation on the discarded data segment and input the OR operation result to the sticky unit in the next stage of shift circuit.
The multiplexer 814 is configured to selectively input the input data or a normalized shift result of the input data to the next stage shift circuit as the input data of the next stage shift circuit.
It can be understood that, in the case that the shift control unit determines that the input data of the present stage shift circuit does not need to be subjected to normalized shift, the multiplexer correspondingly selects to input the input data of the present stage shift circuit to the next stage shift circuit; in the case where the shift control unit determines that the input data of the shift circuit of the present stage needs to be normalized and shifted, the multiplexer selects the normalized shift result output from the shift unit to be input to the shift circuit of the next stage.
In some embodiments, in the cascade order from first to last in the second normalization shift unit, the maximum shift amount of an earlier stage of shift circuit is greater than the maximum shift amount of a later stage of shift circuit.
The embodiment of fig. 4 shows a process of normalizing and shifting 24-bit input data and merging the discarded data segments. The process adopts logarithmic shifting, that is, in the cascade order from first to last, the maximum shift amount of each stage of the cascaded shift circuits decreases by powers of two. In the 5-stage shift circuit shown in fig. 4, the maximum shift amount of the first stage shift circuit 810 is 16, that of the second stage shift circuit 820 is 8, that of the third stage shift circuit 830 is 4, that of the fourth stage shift circuit 840 is 2, and that of the fifth stage shift circuit 850 is 1.
In FIG. 4, for 24-bit input data, the normalization shift may be performed in 5 stages. First stage: if the shift control unit in the first stage shift circuit 810 determines that a 16-bit shift is required, the sticky unit performs an OR operation on the 16-bit data segment discarded by the shift unit during the shift, and inputs the OR operation result to the sticky unit of the next stage of shift circuit.
Second stage: if the shift control unit in the second stage shift circuit 820 determines that an 8-bit shift is required, the sticky unit 813 performs an OR operation on the 8-bit data segment discarded by the shift unit and the OR operation result of the previous stage of shift circuit, and inputs the OR operation result of the present stage to the sticky unit of the next stage of shift circuit.
Third stage: if the shift control unit in the third stage shift circuit 830 determines that a 4-bit shift is required, the sticky unit performs an OR operation on the 4-bit data segment discarded by the shift unit and the OR operation result of the previous stage of shift circuit, and inputs the OR operation result of the present stage to the sticky unit of the next stage of shift circuit.
Fourth stage: if the shift control unit in the fourth stage shift circuit 840 determines that a 2-bit shift is required, the sticky unit performs an OR operation on the 2-bit data segment discarded by the shift unit and the OR operation result of the previous stage of shift circuit, and inputs the OR operation result of the present stage to the sticky unit of the next stage of shift circuit.
Fifth stage: if the shift control unit in the fifth stage shift circuit 850 determines that a 1-bit shift is required, the sticky unit performs an OR operation on the 1-bit data segment discarded by the shift unit and the OR operation result of the previous stage of shift circuit and outputs the OR operation result of the present stage, and the multiplexer outputs the final normalized shift result.
Because the OR operation is performed while shifting, the bit width of the multiplexer in each stage of shift circuit can be reduced, which further reduces the area.
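The five-stage cascade above can be modelled behaviourally as follows. This is a sketch under stated assumptions rather than the circuit of fig. 4: the name `cascaded_shift` is hypothetical, and each stage is assumed to shift exactly when the corresponding bit of the binary shift amount is set, which is one common way for a shift control unit to use a leading zero count.

```python
STAGE_SHIFTS = (16, 8, 4, 2, 1)  # maximum shift amount of stages 810 to 850


def cascaded_shift(data: int, shift_amount: int, width: int = 24) -> tuple[int, int]:
    """Left-shift `data` through the cascade and OR together the discarded bits.

    Returns (shifted data, sticky bit). `shift_amount` must be in 0..31.
    """
    if not 0 <= shift_amount < 32:
        raise ValueError("shift_amount must fit the five stages (0..31)")
    mask = (1 << width) - 1
    data &= mask
    sticky = 0
    for step in STAGE_SHIFTS:
        if shift_amount & step:
            # Shift unit: the top `step` bits leave the word and go to
            # the sticky unit, which ORs them with the previous stage's result.
            discarded = data >> (width - step)
            sticky |= 1 if discarded else 0
            data = (data << step) & mask
        # Otherwise the multiplexer forwards the data unchanged.
    return data, sticky


print(cascaded_shift(0b1001, 20))
```

When the shift amount equals the true number of leading zeros, every discarded segment is zero and the sticky bit stays 0; a nonzero sticky bit indicates that a significant bit was shifted out.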
Further, as shown in fig. 1 and 3, the exponent operation module 100 includes: an exponent operation unit 110 for performing operations on the first exponent and the second exponent and outputting an initial exponent operation result, where the exponent operation unit is at least used for calculating the absolute value of the target difference; and an exponent normalization processing unit 120 configured to normalize the initial exponent operation result output by the exponent operation unit.
In some embodiments, the exponent operation unit is further configured to implement at least one of: calculating a sum of the first exponent and the second exponent; calculating a difference between the first exponent and the second exponent.
In the exponent operation process, the intermediate operation result (i.e. the initial exponent operation result output by the exponent operation unit) may exceed the representation range of a standard floating point number, or may be a non-normalized number. In this case, the exponent normalization processing unit performs normalization processing, so that the exponent operation result obtained after normalization meets the requirements of a standard floating point number.
FIG. 5 is a circuit diagram of an exponent operation unit according to an embodiment of the present application. In the embodiment corresponding to FIG. 5, the exponent operation unit may be used to calculate the difference between the first exponent and the second exponent, the sum of the first exponent and the second exponent, and the absolute value of the difference between the first exponent and the second exponent. As shown in fig. 5, the exponent operation unit 110 includes: a first inverting unit 111, a third fixed-point adder 112, a second inverting unit 113, and a first multiplexer 114;
the first inverting unit 111 is configured to selectively perform an inverting operation on the second exponent;
the third fixed-point adder 112 is configured to calculate a sum of the first exponent and the second exponent to obtain a first operation result, or calculate a sum of the first exponent and the inversion result output by the first inverting unit to obtain a second operation result;
the second inverting unit 113 is configured to selectively perform an inverting operation on the second operation result;
the first multiplexer 114 is electrically connected to the second inverting unit and to the third fixed-point adder.
Based on the exponent operation unit shown in fig. 5, assume that the first exponent is opA1 and the second exponent is opB1:
1) To calculate the sum of the first exponent and the second exponent: neither the first inverting unit 111 nor the second inverting unit 113 performs the inverting operation, and the third fixed-point adder 112 calculates the sum of the input first exponent opA1 and second exponent opB1; the first multiplexer 114 correspondingly outputs the result of adding the first exponent and the second exponent (i.e., the first operation result), that is, opA1 + opB1.
2) To calculate the difference between the first exponent and the second exponent: the first inverting unit 111 performs the inverting operation on the second exponent to obtain -opB1; the third fixed-point adder 112 adds the input first exponent opA1 and the inversion result of the second exponent (i.e., -opB1) output by the first inverting unit 111 to obtain opA1 + (-opB1); the first multiplexer 114 correspondingly outputs opA1 - opB1.
3) To calculate the absolute value of the difference between the first exponent and the second exponent: the first inverting unit 111 performs the inverting operation on the second exponent to obtain -opB1; the third fixed-point adder 112 adds the input first exponent opA1 and the inversion result of the second exponent (i.e., -opB1) output by the first inverting unit 111 to obtain opA1 + (-opB1), that is, opA1 - opB1, and selects an output channel according to the sign bit of opA1 - opB1. If the sign bit indicates that opA1 - opB1 is positive, the right-hand channel is selected for output, that is, opA1 - opB1 is input to the first multiplexer 114, which correspondingly outputs opA1 - opB1.
Conversely, if the third fixed-point adder 112 determines that the sign bit of opA1 - opB1 is negative, the left-hand channel is selected for output, that is, opA1 - opB1 is input to the second inverting unit 113, and the second inverting unit 113 performs the inverting operation on opA1 - opB1 to obtain -(opA1 - opB1), that is, opB1 - opA1; thereafter, the first multiplexer 114 correspondingly outputs opB1 - opA1.
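The three cases above can be summarized in a short behavioural sketch. The function name `exponent_unit` and the `mode` strings are assumptions introduced for illustration; Python integers stand in for the fixed-width signals, so the inverting operation is modelled simply as arithmetic negation.

```python
def exponent_unit(op_a1: int, op_b1: int, mode: str) -> int:
    """Model of one adder, two inverting units and one multiplexer.

    mode is "sum", "diff" or "abs_diff" (hypothetical control values).
    """
    if mode not in ("sum", "diff", "abs_diff"):
        raise ValueError(f"unknown mode: {mode}")

    # First inverting unit 111: negate the second exponent unless summing.
    b_input = op_b1 if mode == "sum" else -op_b1

    # Third fixed-point adder 112: the only adder in the unit.
    adder_out = op_a1 + b_input

    # Second inverting unit 113 and first multiplexer 114: for the absolute
    # value, a negative difference is negated before being output.
    if mode == "abs_diff" and adder_out < 0:
        return -adder_out
    return adder_out


print(exponent_unit(3, 5, "sum"), exponent_unit(3, 5, "diff"), exponent_unit(3, 5, "abs_diff"))
```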
Based on the exponent operation unit shown in fig. 5, all of the exponent operations are realized with one fixed-point adder, one multiplexer and two inverting units. The exponent operation unit therefore has few elements and occupies a small area in the floating point arithmetic device, which helps ensure the integration level of the floating point arithmetic device.
Fig. 6 is a circuit diagram of a floating point arithmetic device according to another embodiment of the present application, and the floating point arithmetic device shown in fig. 6 further includes a multiplication module 300 and a division module 400, compared to the floating point arithmetic device shown in fig. 3. The multiplication module 300 is configured to multiply the first mantissa and the second mantissa; the division operation module 400 is configured to divide the first mantissa and the second mantissa.
In this embodiment, the exponent operation unit in the floating point operation device may be as shown in fig. 5, and since the exponent operation unit shown in fig. 5 may be used to calculate the sum of the first exponent and the second exponent, the difference between the first exponent and the second exponent, and the absolute value of the difference between the first exponent and the second exponent, the first mantissa operation module 200, the multiplication operation module 300, and the division operation module 400 may be combined to implement addition and subtraction operations, multiplication operations, and division operations of the first floating point operand and the second floating point operand.
As shown in fig. 6, the multiplication module 300 includes: a partial product generating unit 310 for determining the partial products in the process of multiplying the first mantissa and the second mantissa, and a partial product compressing unit 320 for compressing the partial products and outputting a partial product compression result; the partial product compression result includes two parts, namely a carry and a sum.
The second mantissa operation branch 220 further includes a second multiplexer 222, where the second multiplexer 222 is configured to selectively input either the alignment shift result output by the second exponent-alignment shift unit or the partial product compression result (i.e. the carry and the sum) into the second fixed-point adder for addition. Based on the second multiplexer 222 in the second mantissa operation branch, the multiplication module may reuse the second fixed-point adder in the second mantissa operation branch for the final addition and reuse the second normalization shift unit for the normalization shift, which reduces the number of elements in the floating point arithmetic device as a whole, improves its integration level, and reduces its overall area.
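The split between partial product compression and the final addition can be sketched as follows. This is a minimal illustration that assumes simple one-bit-per-row partial products and a chain of 3:2 carry-save compressors; the application does not specify the encoding or the compressor structure in this passage, and the function names are hypothetical.

```python
def partial_products(a: int, b: int) -> list[int]:
    """One shifted copy of `a` for every set bit of `b`."""
    return [a << i for i in range(b.bit_length()) if (b >> i) & 1]


def compress_3_to_2(x: int, y: int, z: int) -> tuple[int, int]:
    """Carry-save step: three addends become a sum word and a carry word."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


def compress(products: list[int]) -> tuple[int, int]:
    """Reduce all partial products to two words without a carry-propagate add."""
    terms = list(products)
    while len(terms) > 2:
        sum_word, carry_word = compress_3_to_2(terms.pop(), terms.pop(), terms.pop())
        terms += [sum_word, carry_word]
    while len(terms) < 2:
        terms.append(0)
    return terms[0], terms[1]


# The single carry-propagate addition is left to the shared fixed-point adder.
word_a, word_b = compress(partial_products(0b1101, 0b1011))
print(word_a + word_b)
```

Only the last line performs a full addition, which corresponds to the step that the second multiplexer routes into the second fixed-point adder.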
Further, the floating point arithmetic device includes a fourth multiplexer 225, and the fourth multiplexer 225 is configured to selectively input either the output of the exponent normalization processing unit 120 or the output of the leading zero prediction unit 223 into the second normalization shift unit 227. For multiplication, the second normalization shift unit 227 performs the normalization shift on the mantissa multiplication result based on the output of the exponent normalization processing unit 120.
In the case where the multiplication operation module multiplexes the second fixed-point adder and the second normalization shift unit in the second mantissa operation branch 220, the second normalization shift unit may perform normalization shift on the addition result of the partial product output by the second fixed-point adder according to the sum of the first exponent and the second exponent.
With continued reference to fig. 6, the division operation module includes: an iterative division operation unit 410, configured to divide the first mantissa by the second mantissa iteratively; the iterative division operation unit 410 may perform each round of iteration by using the SRT algorithm. The SRT algorithm is a non-restoring binary division method.
The division register 420 is used for temporarily storing the quotient and the remainder output by each iteration of the iterative division operation unit, and for inputting the remainder output by the previous iteration to the iterative division operation unit for the next division operation.
A third fixed-point adder 430 is used for adding the remainder in the division register.
The second sticky unit 440 is electrically connected to the third fixed-point adder 430, and is configured to perform an OR operation on the remainder addition result output by the third fixed-point adder 430.
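A radix-2 SRT iteration and the sticky bit derived from the final remainder can be sketched as follows. This is an illustrative model only: the radix, the digit set {-1, 0, 1}, the selection constants of plus and minus 1/2, the scaling of the operands into [1/2, 1) and the function names are assumptions, since this passage does not give those details. Exact rational arithmetic is used so that the result can be checked.

```python
from fractions import Fraction


def srt_radix2_divide(x: Fraction, d: Fraction, n: int) -> tuple[int, Fraction]:
    """Return (Q, R) with 2**n * (x / 2) == Q * d + R and 0 <= R < d."""
    half = Fraction(1, 2)
    if not (half <= x < 1 and half <= d < 1):
        raise ValueError("operands must lie in [1/2, 1)")
    w = x / 2  # initial partial remainder, pre-shifted so that |w| <= d
    q = 0
    for _ in range(n):
        doubled = 2 * w
        # Quotient digit selection compares only against constants,
        # so no restoring step is ever needed.
        if doubled >= half:
            digit = 1
        elif doubled < -half:
            digit = -1
        else:
            digit = 0
        w = doubled - digit * d
        q = 2 * q + digit
    if w < 0:  # final correction so that the remainder is non-negative
        q -= 1
        w += d
    return q, w


def sticky_from_remainder(remainder: Fraction) -> int:
    """OR of all remainder bits: 1 if anything is left over, else 0."""
    return 0 if remainder == 0 else 1


q, r = srt_radix2_divide(Fraction(3, 4), Fraction(5, 8), 8)
print(q, r, sticky_from_remainder(r))
```

A nonzero remainder means the quotient is inexact, which is the information the rounding stage needs from the sticky bit.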
The second mantissa operation branch 220 further includes a third multiplexer 226, where the third multiplexer 226 is configured to selectively input either the operation result output by the second fixed-point adder 224 or the quotient output by the division register 420 into the second normalization shift unit 227.
Based on the third multiplexer 226 in the second mantissa operation branch, the division operation module 400 may multiplex the second fixed-point adder in the second mantissa operation branch to perform quotient addition and multiplex the second normalization shift unit to perform normalization shift, thereby further reducing the number of elements in the floating point operation device, improving the integration level of the floating point operation device, and reducing the area of the floating point operation device as a whole.
With continued reference to fig. 6, the floating point arithmetic device further includes: the rounding processing module 500 is configured to perform rounding processing on the normalized shift result output by the first normalization shift unit and rounding processing on the normalized shift result output by the second normalization shift unit. In the right normalization shift process, the lower bits of the mantissa portion shifted to the right are discarded, thereby causing a certain error, so that it is necessary to reduce such an error by rounding the result of the normalization shift process.
In fig. 6, the first mantissa operation module 200 further includes a fifth multiplexer 228 for selectively inputting the normalized shift result output by the first normalized shift unit or the normalized shift result output by the second normalized shift unit to the rounding processing module 500.
The rounding processing module 500 comprises an adder 510, a control unit 520 and a sixth multiplexer 530. The control unit 520 is configured to select, according to the rounding mode, whether to perform a +1 rounding operation on the input normalized shift result; the adder 510 is used to perform the +1 rounding operation on the input normalized shift result. Specifically, the control unit 520 may determine whether to perform the +1 rounding operation on the normalized shift result according to the sticky bit, where the sticky bit is either the result output by the second sticky unit or the result output by the sticky unit in the last stage of shift circuit in the second normalization shift unit. If the control unit 520 determines that the sticky bit is greater than the second threshold, it determines that the +1 rounding operation is required; conversely, if the sticky bit is less than the second threshold, the rounding mode that discards the excess bits is used.
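How a control unit can decide between +1 and discarding the excess bits may be illustrated with the following sketch. It shows two common rounding modes, truncation and round-to-nearest-even, and is not the decision rule of the application: the `guard` bit (the first discarded bit) and the mode names are assumptions that do not appear in the text above.

```python
def round_mantissa(mantissa: int, guard: int, sticky: int,
                   mode: str = "nearest_even") -> int:
    """Return the mantissa after an optional +1 at the lowest kept bit.

    guard  : first discarded bit (assumed input, not named in the text)
    sticky : OR of all remaining discarded bits
    """
    if mode == "truncate":
        increment = 0  # discard the excess bits
    elif mode == "nearest_even":
        # Round up when above the halfway point, or exactly halfway
        # with an odd mantissa (ties go to the even neighbour).
        increment = guard & (sticky | (mantissa & 1))
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    # Adder 510: the caller must renormalize if this carries out of the top bit.
    return mantissa + increment


print(bin(round_mantissa(0b1011, guard=1, sticky=0)))
```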
In the embodiment corresponding to fig. 6, the floating point arithmetic device further includes a post-processing module 700, configured to post-process the exponent operation result output by the exponent operation module and the rounding result output by the rounding processing module.
In floating point operations, when the input data is abnormal, the output needs to be NaN (Not a Number, i.e. an undefined or unrepresentable value). If an illegal operation occurs during the operation, the output also needs to be NaN; if an intermediate result of the operation exceeds the representation range of normalized numbers, the output needs to be INF (infinity) or be saturated to the maximum normalized value. The handling of such special values (for example, setting the output to NaN, setting the output to INF, or saturating an intermediate result that exceeds the representation range of normalized numbers to the maximum normalized value) can be performed by the post-processing module 700. In this embodiment, the post-processing of the various floating point operations is unified in the post-processing module 700, that is, the post-processing module 700 is shared by floating point multiplication, floating point addition and subtraction, and floating point division, instead of deploying a separate post-processing module for each of the floating point multiplication module, the floating point addition and subtraction module, and the floating point division module, thereby reducing the number of elements in the floating point arithmetic device and reducing its area.
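The special-value handling described above might look as follows for a 32-bit format. This sketch assumes the IEEE 754 binary32 layout (1 sign bit, 8 exponent bits, 23 fraction bits); the function name `post_process` and the `invalid` and `saturate` flags are hypothetical controls introduced for illustration.

```python
EXP_ALL_ONES = 0xFF      # binary32 exponent field reserved for INF and NaN
FRACTION_BITS = 23
QUIET_NAN = 0x7FC00000   # a standard binary32 quiet NaN bit pattern


def post_process(sign: int, biased_exponent: int, fraction: int,
                 invalid: bool = False, saturate: bool = False) -> int:
    """Pack a binary32 result, replacing overflow and invalid cases."""
    if invalid:
        return QUIET_NAN  # illegal operation: output NaN
    if biased_exponent >= EXP_ALL_ONES:
        if saturate:
            # Clamp to the largest normalized number.
            biased_exponent, fraction = EXP_ALL_ONES - 1, (1 << FRACTION_BITS) - 1
        else:
            # Overflow to infinity.
            biased_exponent, fraction = EXP_ALL_ONES, 0
    return (sign << 31) | (biased_exponent << FRACTION_BITS) | fraction


print(hex(post_process(0, 300, 0)))
print(hex(post_process(0, 300, 0, saturate=True)))
```

Underflow and denormal results are left out of this sketch.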
In the embodiment corresponding to fig. 6, the floating point arithmetic device further includes: the preprocessing module 600, configured at least to parse the first exponent and the first mantissa from the first floating point operand (OpA) and to parse the second exponent and the second mantissa from the second floating point operand (OpB). Further, under the IEEE 754 standard, floating point operands are classified as follows: normalized number (normal), denormalized number (denormal), infinity (INF), and not-a-number (NaN, Not a Number). The preprocessing module 600 may be configured to identify the type of an input floating point operand, generate a status flag identifying that type, and parse the floating point operand to determine its exponent bits and mantissa bits, so that the exponent can be input to the exponent operation module and the mantissa can be input to the appropriate mantissa operation module (i.e., the first mantissa operation module, the multiplication module, or the division operation module). In this way, multiplication, addition and subtraction, and division of floating point operands share one preprocessing module, which reduces the number of elements in the floating point arithmetic device, reduces its area, and improves its integration level.
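The parsing and classification performed by a preprocessing stage can be sketched for the IEEE 754 binary32 format as follows. The function names are hypothetical, and the separate "zero" class is an assumption added here for completeness; the text above lists only four operand types.

```python
import struct


def unpack_float32(value: float) -> tuple[int, int, int]:
    """Split a binary32 value into (sign, biased exponent, fraction) fields."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def classify(biased_exponent: int, fraction: int) -> str:
    """Status flag for the operand type, derived from the two fields."""
    if biased_exponent == 0xFF:
        return "infinity" if fraction == 0 else "nan"
    if biased_exponent == 0:
        return "zero" if fraction == 0 else "denormal"
    return "normal"


for sample in (1.0, -2.5, 2.0 ** -149, float("inf"), float("nan")):
    sign, exponent, fraction = unpack_float32(sample)
    print(sample, sign, exponent, hex(fraction), classify(exponent, fraction))
```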
In the embodiment corresponding to fig. 6, the floating point arithmetic device has the following characteristics: 1. the two input floating point operands may be added, subtracted, multiplied, and divided, and the four operations share the key logic processing blocks (e.g., the preprocessing, rounding, and post-processing modules); 2. the exponent operation unit realizes the addition and subtraction of exponents and the calculation of the absolute value of the exponent difference with one adder, one multiplexer and two inverting units; 3. the mantissa multiplication module and the mantissa division module reuse the adder and the second normalization shift unit in the second mantissa operation branch. The floating point arithmetic device is thus highly integrated and uses fewer elements to realize the four operations on floating point operands. Compared with a floating point arithmetic device in which a module for addition and subtraction of floating point operands, a module for multiplication of floating point operands and a module for division of floating point operands are deployed separately, the floating point arithmetic device of this embodiment has fewer elements, a smaller area and a higher integration level.
Of course, in other embodiments, the floating point arithmetic device may include an exponent operation module, a first mantissa operation module, and a multiplication module, so that the floating point arithmetic device implements addition, subtraction, and multiplication of floating point operands; or the floating point arithmetic device may include an exponent operation module, a first mantissa operation module and a division operation module, so that the floating point arithmetic device implements addition, subtraction and division of floating point operands. Correspondingly, a common preprocessing module, rounding processing module and post-processing module are shared by the addition, subtraction and multiplication operations, or by the addition, subtraction and division operations, of the floating point operands.
Based on the floating point arithmetic device shown in fig. 6:
1. If the first floating point operand X and the second floating point operand Y need to be added:
X and Y are input into the preprocessing module 600. If the preprocessing module 600 determines by parsing that both X and Y are normalized numbers, it further parses X to determine the first exponent J_X and the first mantissa S_X, and parses Y to determine the second exponent J_Y and the second mantissa S_Y. Thereafter, the first exponent J_X and the second exponent J_Y are input to the exponent operation unit 110 in the exponent operation module 100, and the exponent operation unit 110 calculates the absolute value M of the difference between the first exponent J_X and the second exponent J_Y (see the above description of the embodiment corresponding to fig. 5 for the specific calculation process, which is not repeated here).
If M is greater than the first threshold, the first mantissa S_X and the second mantissa S_Y are input to the first mantissa operation branch 210. Specifically, the mantissa exchange unit 211 exchanges the first mantissa S_X and the second mantissa S_Y according to the difference between the first exponent J_X and the second exponent J_Y (i.e. the target difference), so that whichever of S_X and S_Y has the smaller exponent is input to the first exponent-alignment shift unit 212 as the first target mantissa, and the other is input to the first fixed-point adder 213 as the second target mantissa. The first exponent-alignment shift unit 212 performs the alignment shift on the first target mantissa based on M, and the exponent operation unit 110 is correspondingly instructed to update the exponent corresponding to the first target mantissa. Assuming that the first mantissa S_X is the first target mantissa, the first exponent-alignment shift unit 212 shifts the first mantissa right by M bits, and the first exponent is correspondingly updated to J_X + M.
Then, the first fixed-point adder 213 adds the alignment shift result output by the first exponent-alignment shift unit 212 to the second target mantissa, and the first normalization shift unit 214 performs normalization shift processing on the operation result output by the first fixed-point adder. Thereafter, the normalized shift result of the first normalization shift unit 214 is input to the rounding processing module 500 through the fifth multiplexer for rounding, is then post-processed by the post-processing module 700, and the mantissa addition result is output.
The exponent operation module 100 updates the corresponding exponent according to the normalization shift performed by the first normalization shift unit 214, normalizes the exponent through the exponent normalization processing unit 120, and inputs the normalized exponent to the post-processing module 700, which then outputs the processed exponent. The exponent output by the post-processing module and the mantissa addition result are then combined to obtain the result of adding the first floating point operand X and the second floating point operand Y.
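The addition flow through the first mantissa operation branch can be condensed into the following sketch. It is a simplified integer model under stated assumptions: both operands are positive and normalized, each mantissa is a `width`-bit integer whose top bit is 1, exponents are unbiased, the bits shifted out during alignment are simply dropped, and rounding and post-processing are omitted. The function name `add_first_branch` is hypothetical.

```python
def add_first_branch(j_x: int, s_x: int, j_y: int, s_y: int,
                     width: int) -> tuple[int, int]:
    """Return (exponent, mantissa) of X + Y for positive normalized inputs."""
    m = abs(j_x - j_y)  # absolute value of the target difference

    # Mantissa exchange: the mantissa with the smaller exponent gets aligned.
    if j_x < j_y:
        first_target, second_target, exponent = s_x, s_y, j_y
    else:
        first_target, second_target, exponent = s_y, s_x, j_x

    aligned = first_target >> m       # alignment shift, exponent raised by m
    total = second_target + aligned   # fixed-point addition

    if total >> width:                # carry out: one-bit right normalization
        total >>= 1
        exponent += 1
    return exponent, total


# 1.5 * 2**3 + 1.0 * 2**1 = 14 = 1.75 * 2**3, with 4-bit mantissas.
print(add_first_branch(3, 0b1100, 1, 0b1000, 4))
```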
A subtraction between the first floating point operand X and the second floating point operand Y may be converted into an addition of two floating point operands, that is, the operand serving as the subtrahend is represented in complement form and added to the other operand. The process is substantially similar to the process of adding the first floating point operand X and the second floating point operand Y described above and is not repeated here.
2. If the first floating point operand X needs to be multiplied by the second floating point operand Y:
After processing by the preprocessing module 600, the first exponent J_X and the second exponent J_Y are input into the exponent operation module 100, which calculates the sum of the first exponent J_X and the second exponent J_Y; the first mantissa S_X and the second mantissa S_Y are input to the multiplication module 300, which multiplies the first mantissa S_X by the second mantissa S_Y.
Specifically, the partial product generating unit 310 in the multiplication module 300 calculates the plurality of partial products in the multiplication of the first mantissa S_X and the second mantissa S_Y, and the partial products are input to the partial product compression unit 320 to be compressed, which outputs a carry and a sum; thereafter, the second multiplexer 222 is gated, and the carry and the sum are input into the second fixed-point adder 224 through the second multiplexer 222 to be added; thereafter, the addition result of the carry and the sum output by the second fixed-point adder 224 is input to the second normalization shift unit 227 through the third multiplexer 226 for the normalization shift; thereafter, the normalized shift result of the second normalization shift unit 227 is input to the rounding processing module 500 through the fifth multiplexer 228 for rounding, is then post-processed by the post-processing module 700, and the mantissa multiplication result is output.
After the exponent operation module 100 calculates the sum of the first exponent J_X and the second exponent J_Y, the resulting exponent sum is updated according to the normalization shift processing performed by the second normalization shift unit 227; thereafter, the updated exponent sum is input to the exponent normalization processing unit 120 for normalization, the normalized exponent sum is input to the post-processing module 700, and the post-processing module outputs the processed exponent addition result. The result of multiplying the first floating point operand X by the second floating point operand Y is then obtained by combining the exponent addition result and the mantissa multiplication result output by the post-processing module.
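The multiplication flow can be condensed in the same way. This sketch assumes positive normalized operands whose mantissas are `width`-bit integers with the top bit set (values in [1, 2)), unbiased exponents, and truncation of the low product bits instead of the rounding and post-processing described above; the function name `multiply_operands` is hypothetical.

```python
def multiply_operands(j_x: int, s_x: int, j_y: int, s_y: int,
                      width: int) -> tuple[int, int]:
    """Return (exponent, mantissa) of X * Y for positive normalized inputs."""
    exponent = j_x + j_y   # exponent operation module: sum of the exponents
    product = s_x * s_y    # stands in for partial products plus final addition

    # The product of two values in [1, 2) lies in [1, 4) and occupies
    # 2 * width bits; its top bit tells whether it is 2 or more.
    if product >> (2 * width - 1):
        exponent += 1                      # normalization adjusts the exponent
        mantissa = product >> width
    else:
        mantissa = product >> (width - 1)
    return exponent, mantissa


# 1.5 * 1.5 = 2.25 = 1.125 * 2**1, with 4-bit mantissas.
print(multiply_operands(0, 0b1100, 0, 0b1100, 4))
```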
3. If it is desired to divide the first floating point operand X by the second floating point operand Y:
After processing by the preprocessing module 600, the first exponent J_X and the second exponent J_Y are input into the exponent operation module 100, which calculates the difference between the first exponent J_X and the second exponent J_Y; and the first mantissa S_X and the second mantissa S_Y are input to the division operation module 400, which divides the first mantissa S_X by the second mantissa S_Y.
Specifically, the iterative division operation unit 410 in the division operation module 400 uses the SRT algorithm to perform iterative division on the first mantissa S_X and the second mantissa S_Y, and temporarily stores the results (quotient and remainder) of each iteration in the division register 420. The division register 420 inputs the remainder obtained by the previous iteration into the iterative division operation unit 410 for the next division operation, and inputs the quotient of each iteration through the third multiplexer 226 into the second normalization shift unit 227 for normalization shift processing. The division register 420 also feeds the third fixed-point adder 430, which adds the remainder of the iterative division, and the remainder addition result is input to the second sticky unit 440, which performs an OR operation on it. Thereafter, the OR result output by the second sticky unit 440 controls the rounding processing module 500 in rounding the normalized shift result of the mantissa-division quotient output by the fifth multiplexer 228; that is, if the OR result output by the second sticky unit 440 is larger (e.g., larger than 0.5), a +1 rounding mode is adopted (i.e., a carry of 1 into the low-order bit), and if the OR result output by the second sticky unit 440 is smaller (e.g., smaller than 0.5), a rounding mode of discarding the redundant bits is adopted, which corresponds to directly outputting the normalized shift result of the quotient. Post-processing is then performed by the post-processing module 700, after which the mantissa division result is output.
After the exponent operation module 100 calculates the difference between the first exponent J_X and the second exponent J_Y, the resulting exponent difference is updated according to the normalization shift processing performed by the second normalization shift unit 227; the updated exponent difference is then input to the exponent normalization processing unit 120 for normalization, the normalized exponent difference is input to the post-processing module 700, and the post-processing module 700 outputs the processed exponent subtraction result. Thus, the result of dividing the first floating point operand X by the second floating point operand Y is obtained by combining the exponent subtraction result and the mantissa division result output by the post-processing module 700.
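The division flow described above can be sketched in Python as follows. For brevity the sketch substitutes a radix-2 restoring division for the SRT algorithm named in the text; the loop structure (one quotient digit per iteration, remainder fed back) is the same, but the digit selection is simpler. Exponents are assumed unbiased, mantissas are 24-bit integers with the hidden bit explicit, signs and special values are ignored, and round-to-nearest-even driven by a guard bit and a sticky bit is an assumed rounding mode. All function names are hypothetical.

```python
MANT_BITS = 24  # assumed FP32-style mantissa width, hidden bit included


def iterative_divide(sx: int, sy: int, steps: int) -> tuple[int, int]:
    """Restoring radix-2 division, one quotient bit per iteration.
    (quotient, remainder) plays the role of the division register."""
    quotient, remainder = 0, sx
    for _ in range(steps):
        bit = 1 if remainder >= sy else 0
        remainder = (remainder - bit * sy) << 1   # remainder fed back to next step
        quotient = (quotient << 1) | bit
    return quotient, remainder


def fp_divide(jx: int, sx: int, jy: int, sy: int) -> tuple[int, int]:
    """Return (exponent, mantissa) of X / Y for normalized inputs."""
    exponent = jx - jy                             # J_X - J_Y
    quotient, remainder = iterative_divide(sx, sy, MANT_BITS + 2)
    sticky = int(remainder != 0)                   # OR over all remainder bits

    if quotient >> (MANT_BITS + 1):                # quotient in [1, 2)
        sticky |= quotient & 1
        quotient >>= 1
    else:                                          # quotient in (0.5, 1)
        exponent -= 1                              # exponent updated by normalization

    guard = quotient & 1
    mantissa = quotient >> 1
    if guard and (sticky or mantissa & 1):         # round to nearest, ties to even
        mantissa += 1
        if mantissa >> MANT_BITS:
            mantissa >>= 1
            exponent += 1
    return exponent, mantissa
```

For example, dividing 1.0 (J = 0, S = 0x800000) by 3.0 (J = 1, S = 0xC00000) yields exponent -2 and mantissa 0xAAAAAB, which matches the single-precision encoding of 1/3.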
FIG. 7 is a block diagram of a vector processing apparatus 1100 according to one embodiment of the present application. As shown in FIG. 7, the vector processing apparatus 1100 includes a plurality of parallel vector processing units 1110 and a register assembly 1120, the register assembly 1120 including a plurality of registers 1121 for temporarily storing source and destination operands of vector processing; the vector processing unit 1110 includes the floating point arithmetic device 1000 provided in any one of the embodiments above, where the floating point arithmetic device 1000 reads the first floating point operand and the second floating point operand from the register assembly 1120 and writes the arithmetic result of the first floating point operand and the second floating point operand to the register assembly 1120.
Further, the vector processing unit 1110 further includes one or more fixed point operators 1111 and a seventh multiplexer 1112, and the seventh multiplexer 1112 is configured to selectively output an operation result of the fixed point operators 1111 or an operation result of the floating point arithmetic device 1000.
In the related art, a floating point operation unit for floating point addition and subtraction, a floating point operation unit for floating point multiplication and a floating point operation unit for floating point division are disposed independently of one another in a vector processing unit of a vector processing device, so that the vector processing device occupies too large a chip area, and the area efficiency is low for a highly parallel vector processing unit. In addition, operands read from the register assembly need to be broadcast to each operation unit, so the fan-out of the read port of the register is large, which hinders timing and routing convergence during back-end physical implementation of the chip. Furthermore, the outputs of the floating point operation units are selected by a multiplexer in the last stage; since the floating point operation units are independent of one another, they may be placed far apart during physical implementation, so the timing of the multiplexer stage does not converge easily.
In the vector processing device provided by the application, if the floating point arithmetic device is the one provided by the embodiment of FIG. 6, addition and subtraction, multiplication and division of floating point numbers are deeply fused and common logic processing units are multiplexed, so that the number of elements of the vector processing device is reduced as a whole and its area is reduced; in addition, an operand read from the register only needs to be transmitted to the preprocessing module in the floating point arithmetic device and need not be broadcast at the read port of the register, which can greatly reduce the difficulty of placement and routing; moreover, owing to the deep fusion and high cohesion of the floating point arithmetic device, the timing of the path for writing the arithmetic result back to the register assembly converges easily.
In practice, when synthesized with a synthesis tool (Design Compiler), the vector processing device in the related art has an area of 2150 μm², whereas the vector processing device including the floating point arithmetic device of the embodiment of FIG. 6 has an area of 1630 μm². It can be seen that the area of the vector processing device provided by the application is 76% of the area of the vector processing device in the related art, so the vector processing device provided by the application has a small area and a high integration level.
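The stated area ratio can be checked with a short calculation that uses only the two synthesis figures given above:

```latex
\frac{1630\ \mu\mathrm{m}^2}{2150\ \mu\mathrm{m}^2} \approx 0.758 \approx 76\%,
\qquad
1 - \frac{1630}{2150} \approx 0.242 \approx 24\%
```

That is, the fused design saves roughly 24% of the area in this comparison.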
Further, the timing closure of the vector processing device provided by the application was tested at a higher main frequency. The test shows that the two-stage floating point addition and subtraction pipeline in the vector processing device provided by the application can achieve timing closure at the higher main frequency, as can the three-stage floating point multiplication pipeline, so the performance is higher.
In the related art, vector processing capability is a key factor limiting the performance of an AI processor, and the parallelism of the vector processing units of a high-performance AI processor is very high, reaching 128 lanes of FP32 (Full Precise Float, single precision) or 256 lanes of FP16. The key component of the vector processing unit is the floating point arithmetic device. Floating point operations are widely used in inference and training of deep learning models because of their high precision and large dynamic range, but they are complex, and a floating point operation unit fully compliant with the IEEE 754 standard consumes a large amount of resources. For an ordinary scalar processor, only one instance of the floating point operation unit is needed, so even a large floating point operation unit has little overall impact; for a high-performance vector processor, however, the floating point operation unit must be instantiated many times, which greatly increases its share of the processor area. In the vector processing device provided by the application, the floating point arithmetic device is highly cohesive and the various floating point operations multiplex common logic processing modules, so that, while all the floating point operations of the floating point arithmetic device remain available, the area share of the floating point arithmetic device in the vector processing unit is reduced and the integration level of the vector processing device is improved.
The application also provides a processor which comprises the floating point arithmetic device in the above embodiments. The processor can be applied in the technical field of artificial intelligence, for example in neural network inference and training.
Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of sensing, reasoning and decision-making.
FIG. 8 is a block diagram of an electronic device according to an embodiment of the application. It should be noted that the electronic device 1300 shown in FIG. 8 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.
As shown in FIG. 8, the electronic device 1300 includes a processor 1301 that can execute various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 1302 or a program loaded from a storage section 1308 into a Random Access Memory (RAM) 1303. The processor 1301 may be a processor including the floating point arithmetic device provided by the present application, and may be used to perform various operations on floating point numbers.
In the RAM 1303, various programs and data required for the system operation are also stored. The processor 1301, the ROM 1302 and the RAM 1303 are connected to each other through a bus 1304. An Input/Output (I/O) interface 1305 is also connected to the bus 1304.
The following components are connected to the I/O interface 1305: an input section 1306 including a keyboard, a mouse, and the like; an output section 1307 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1308 including a hard disk or the like; and a communication section 1309 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 1309 performs communication processing via a network such as the Internet. A drive 1310 is also connected to the I/O interface 1305 as needed. Removable media 1311, such as magnetic disks, optical disks, magneto-optical disks, semiconductor memory, and the like, are mounted on the drive 1310 as needed, so that a computer program read therefrom is installed into the storage section 1308 as needed.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (16)

1. A floating point arithmetic device for operating on a first floating point operand and a second floating point operand, the first floating point operand including a first exponent and a first mantissa, the second floating point operand including a second exponent and a second mantissa, the floating point arithmetic device comprising:
the exponent operation module is used for operating on the first exponent and the second exponent and outputting an exponent operation result; the exponent operation result at least comprises an absolute value of a target difference value, wherein the target difference value is the difference between the first exponent and the second exponent;
the first mantissa operation module comprises a first mantissa operation branch; the first mantissa and the second mantissa are input into the first mantissa operation branch if the absolute value of the target difference value is greater than a first threshold;
the first mantissa operation branch comprises a mantissa exchanging unit, a first order shift unit, a first fixed-point adder and a first normalization shift unit; the mantissa exchanging unit is used for exchanging the first mantissa with the second mantissa according to the target difference value, so as to input a first target mantissa to the first order shift unit and input a second target mantissa to the first fixed-point adder; the first fixed-point adder is used for adding the order shift result output by the first order shift unit and the second target mantissa; the first normalization shift unit is used for performing normalization shift processing on the operation result output by the first fixed-point adder; the first target mantissa is whichever of the first mantissa and the second mantissa corresponds to the smaller exponent; the second target mantissa is the other of the first mantissa and the second mantissa; the maximum number of shiftable bits of the first order shift unit is greater than the maximum number of shiftable bits of the first normalization shift unit.
2. The floating point arithmetic device of claim 1, wherein the first mantissa operation module further comprises a second mantissa operation branch, the first mantissa and the second mantissa being input to the second mantissa operation branch if an absolute value of the target difference value is not greater than a first threshold;
the second mantissa operation branch comprises a second order shift unit, a second fixed-point adder, a leading zero prediction unit and a second normalization shift unit, wherein the second order shift unit is used for performing order shift processing on the first mantissa and the second mantissa; the second fixed-point adder is used for adding the order shift results output by the second order shift unit; the leading zero prediction unit is used for performing leading zero prediction on the operation result of the second fixed-point adder; the second normalization shift unit is used for performing normalization shift processing on the operation result output by the second fixed-point adder according to the leading zero prediction result output by the leading zero prediction unit; the maximum number of shiftable bits of the second normalization shift unit is greater than the maximum number of shiftable bits of the second order shift unit.
3. The floating point arithmetic device of claim 2, wherein the second normalization shift unit comprises at least two cascaded shift circuits including a shift control unit, a shift unit, a sticky unit, and a multiplexer;
the shift control unit is used for determining whether the input data is required to be subjected to normalized shift, and if the input data is determined to be subjected to normalized shift, the input data is input into the shift unit; inputting the input data into the multiplexer if it is determined that normalization shift is not required for the input data;
the shifting unit is used for carrying out normalized shifting processing on the input data, inputting discarded data segments to be discarded in the input data into the sticky unit and inputting normalized shifting results of the input data into the multiplexer;
the sticky unit is used for performing OR operation on the discarded data segment and inputting an OR operation result into a sticky unit in a next-stage shift circuit;
the multiplexer is used for selectively inputting the input data or normalized shift result of the input data to a next stage shift circuit to serve as the input data of the next stage shift circuit.
4. The floating point arithmetic device of claim 3, wherein, in the second normalization shift unit, in cascade order from first to last, the maximum shift number corresponding to an upper-stage shift circuit is greater than the maximum shift number corresponding to a lower-stage shift circuit.
5. The floating point arithmetic device of claim 2, wherein the exponent arithmetic module comprises:
the exponent operation unit is used for operating on the first exponent and the second exponent and outputting an initial exponent operation result; the exponent operation unit is at least used for calculating the absolute value of the target difference value;
and the exponent normalization processing unit is used for normalizing the initial exponent operation result output by the exponent operation unit.
6. The floating point arithmetic device of claim 5, wherein the exponent arithmetic unit is further configured to calculate a sum of the first exponent and the second exponent, and to calculate a difference between the first exponent and the second exponent;
the exponent operation unit includes: a first inverting unit, a third fixed-point adder, a second inverting unit and a first multiplexer;
the first inverting unit is used for selectively inverting the second exponent;
the third fixed-point adder is used for calculating the sum of the first exponent and the second exponent to obtain a first operation result, or is used for calculating the sum of the first exponent and the inversion result output by the first inverting unit to obtain a second operation result;
the second inverting unit is used for selectively inverting the first operation result;
the first multiplexer is electrically connected with the second inverting unit and the third fixed-point adder.
7. The floating point arithmetic device of claim 6, further comprising:
and the multiplication operation module is used for carrying out multiplication operation on the first mantissa and the second mantissa.
8. The floating point arithmetic device of claim 7, wherein the multiplication module comprises: the partial product generation unit is used for determining a partial product in the multiplication process of the first mantissa and the second mantissa, and the partial product compression unit is used for compressing the partial product and outputting a partial product compression result;
the second mantissa operation branch further includes a second multiplexer, where the second multiplexer is configured to selectively input the order shift result output by the second order shift unit to the second fixed-point adder, or input the partial product compression result to the second fixed-point adder.
9. The floating point arithmetic device of claim 6, further comprising:
and the division operation module is used for carrying out division operation on the first mantissa and the second mantissa.
10. The floating point arithmetic device of claim 9, wherein the division operation module comprises:
an iterative division operation unit, configured to iteratively perform division operation on the first mantissa and the second mantissa;
the division register is used for temporarily storing the quotient and the remainder output by each iteration of the iterative division operation unit and inputting the remainder output by the previous iteration into the iterative division operation unit to carry out the next division operation;
a third fixed-point adder for adding the remainder in the division register;
the second sticky unit is electrically connected with the third fixed-point adder and is used for performing OR operation on the remainder addition result output by the third fixed-point adder;
the second mantissa operation branch further includes a third multiplexer for selectively inputting an operation result output by the second fixed point adder or a quotient output by the division register to the second normalization shift element.
11. The floating point arithmetic device according to any one of claims 1 to 10, characterized in that the floating point arithmetic device further comprises: and the rounding processing module is used for rounding the normalized shift result output by the first normalized shift unit and rounding the normalized shift result output by the second normalized shift unit.
12. The floating point arithmetic device according to claim 11, further comprising a post-processing module for post-processing the exponent operation result output by the exponent operation module and the rounding result output by the rounding processing module.
13. The floating point arithmetic device according to any one of claims 1 to 10, characterized in that the floating point arithmetic device further comprises: and the preprocessing module is used for analyzing and determining at least a first exponent and a first mantissa in the first floating-point operand and analyzing and determining a second exponent and a second mantissa in the second floating-point operand.
14. A vector processing apparatus comprising a plurality of parallel vector processing units and a register assembly, the vector processing units comprising the floating point arithmetic device of any one of claims 1 to 13, the floating point arithmetic device reading first and second floating point operands from the register assembly and writing the arithmetic result of the first and second floating point operands to the register assembly.
15. A processor comprising a floating point arithmetic device as claimed in any one of claims 1 to 13.
16. An electronic device comprising the processor of claim 15.
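
As a non-limiting illustration of the cascaded shift circuits recited in claims 3 and 4, the following Python sketch models each stage as a shift control decision, a shift, a sticky OR over the discarded bits, and a selection between shifted and unshifted data. The stage sizes (16, 8, 4, 2, 1), the right-shift direction and the function name are assumptions made for illustration only; the claims fix neither the number of stages nor their sizes.

```python
STAGE_SHIFTS = (16, 8, 4, 2, 1)  # assumed stage sizes, largest stage first


def cascaded_right_shift(value: int, amount: int) -> tuple[int, int]:
    """Shift value right by amount through cascaded stages.
    Returns (shifted value, sticky bit)."""
    if not 0 <= amount <= sum(STAGE_SHIFTS):
        raise ValueError("shift amount out of range for the assumed stages")
    sticky = 0
    for step in STAGE_SHIFTS:
        if amount & step:                          # shift control: this stage shifts
            discarded = value & ((1 << step) - 1)  # data segment to be discarded
            sticky |= int(discarded != 0)          # OR result chained to next stage
            value >>= step                         # shift unit output is selected
        # otherwise the multiplexer passes the unshifted data to the next stage
    return value, sticky
```

Because each stage either shifts by its fixed amount or passes the data through, any shift amount up to 31 is reached in five stages, and the sticky bit records whether any nonzero bit was lost along the way.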
CN202211436156.5A 2022-11-16 2022-11-16 Floating point arithmetic device, vector processing device, processor, and electronic apparatus Pending CN116974512A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211436156.5A CN116974512A (en) 2022-11-16 2022-11-16 Floating point arithmetic device, vector processing device, processor, and electronic apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211436156.5A CN116974512A (en) 2022-11-16 2022-11-16 Floating point arithmetic device, vector processing device, processor, and electronic apparatus

Publications (1)

Publication Number Publication Date
CN116974512A true CN116974512A (en) 2023-10-31

Family

ID=88481960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211436156.5A Pending CN116974512A (en) 2022-11-16 2022-11-16 Floating point arithmetic device, vector processing device, processor, and electronic apparatus

Country Status (1)

Country Link
CN (1) CN116974512A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117692008A (en) * 2023-12-21 2024-03-12 摩尔线程智能科技(北京)有限责任公司 Circuit and method for normalizing data, chip and computing device



Legal Events

Date Code Title Description
PB01 Publication