CN115390790A - Floating point multiply-add unit with fusion precision conversion function and application method thereof - Google Patents

Floating point multiply-add unit with fusion precision conversion function and application method thereof

Info

Publication number
CN115390790A
CN115390790A (application CN202210917746.3A)
Authority
CN
China
Prior art keywords
precision
mantissa
multiply
floating
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210917746.3A
Other languages
Chinese (zh)
Inventor
黄立波
谭弘兵
肖立权
邓全
郭维
郭辉
雷国庆
沈俊忠
王俊辉
孙彩霞
隋兵才
王永文
倪晓强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202210917746.3A priority Critical patent/CN115390790A/en
Publication of CN115390790A publication Critical patent/CN115390790A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/544Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
    • G06F7/5443Sum of products
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers

Abstract

The invention discloses a floating-point multiply-add unit with a fused precision conversion function and an application method thereof. The exponent adjustment module, mantissa rounding module, and result output module of a standard floating-point multiply-add unit are modified: the exponent adjustment module adjusts the exponent based on the number of leading zeros used for the normalization shift and on the difference between the exponent bias of the addend precision of the current mixed-precision operation and that of the target precision; the mantissa rounding module rounds the mantissa for each target precision type to be supported; and the result output module combines the sign bit, exponent, and rounded mantissa into one result per precision and selects one of these results as the final output according to the precision conversion control signal. By modifying a standard floating-point multiply-add unit to integrate the precision conversion function, the invention reduces hardware cost and improves the implementation efficiency of deep learning algorithms, offering both low cost and high efficiency.

Description

Floating point multiply-add unit with fusion precision conversion function and application method thereof
Technical Field
The invention belongs to floating-point functional unit design in the technical field of processor design, and particularly relates to a floating-point multiply-add unit with a fused precision conversion function and an application method thereof.
Background
In current deep learning hardware platforms, algorithm model training is usually performed in floating-point data formats; the data flow is complex, many types of operations are involved, and the hardware overhead is high. To reduce the enormous hardware cost of training, mainstream deep learning training platforms usually adopt mixed-precision training: low-precision multiplication raises hardware throughput and lowers hardware complexity, while high-precision addition preserves training accuracy. However, during mixed-precision training of a deep learning network model, the operators in the model have different precision requirements, so precision conversion must be performed frequently as data passes between network layers to reduce overflow during computation, which places new demands on the hardware's computation mode. To meet the precision conversion requirements of deep learning model training, the traditional approach is to design a dedicated precision conversion unit, with an independent conversion circuit for each type of precision conversion. The mixed-precision floating-point multiply-add unit computes with low-precision multiplication and high-precision addition and outputs a high-precision result; this result is then sent to the precision conversion unit, which converts it to the target (low) precision required by the next network layer. The drawback of this traditional implementation is that the floating-point multiply-add operation and the precision conversion operation are completely independent and implemented in separate hardware units, so the overhead is high.
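For illustration, a minimal software sketch of this conventional two-step flow is given below; numpy's float16/float32 types stand in for the hardware formats, and the values are arbitrary examples rather than data from the patent.

```python
import numpy as np

# Conventional mixed-precision flow: low-precision multiply operands,
# high-precision accumulation, then a separate precision-conversion step.
a = np.float16(1.0009765625)   # low-precision (FP16) multiplicand
b = np.float16(3.0)            # low-precision (FP16) multiplicand
c = np.float32(0.125)          # high-precision (FP32) addend

# Step 1: mixed-precision multiply-add producing a high-precision result
# (the job of the floating-point multiply-add unit).
fp32_result = np.float32(a) * np.float32(b) + c

# Step 2: separate conversion to the precision required by the next layer
# (the job of the dedicated precision conversion unit in the traditional design).
fp16_result = np.float16(fp32_result)

print(fp32_result, fp16_result)
```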
Disclosure of Invention
Analysis of actual deep learning algorithm patterns shows that the floating-point multiply-add operator and the precision conversion operator are usually adjacent and can therefore be fused. Floating-point precision conversion essentially processes the exponent and mantissa of a floating-point number, which is similar to the exponent adjustment and mantissa rounding performed during a floating-point multiply-add operation. Therefore, to address the problems of the prior art and to meet the data requirements of deep learning mixed-precision training, the invention provides a floating-point multiply-add unit with a fused precision conversion function and an application method thereof.
To solve the above technical problems, the invention adopts the following technical solution:
a floating-point multiply-add unit with fused precision conversion functionality, comprising:
the data preprocessing module is used for extracting a sign bit, an exponent bit field and a mantissa bit field of an input floating point operand according to the operation type and supplementing a hidden bit of an unsigned mantissa in the mantissa bit field;
the mantissa multiplication module is used for multiplying unsigned mantissas which are supplemented with the hidden bits to obtain a mantissa product;
the exponent difference module is used for subtracting the exponents of the multiplication operands and the exponents of the addition operands to obtain exponent difference values;
the addend alignment shift module is used for shifting the addend for alignment according to the computed exponent difference;
the addition module is used for adding the mantissa product and the aligned addend to obtain a new mantissa;
the leading zero calculating module is used for counting the number of leading zeros in the new mantissa;
the normalization shift module is used for logically shifting the new mantissa to the left, where the shift amount is the number of leading zeros in the new mantissa calculated by the leading zero calculation module;
the floating-point multiply-add unit with the fusion precision conversion function further comprises:
the exponent adjustment module is used for adjusting the exponent based on the number of leading zeros in the new mantissa and on the difference between the exponent bias of the addend precision of the current mixed-precision operation and the exponent bias of the target precision;
the mantissa rounding module is used for rounding the mantissa according to the target precision type to be supported;
and the result output module is used for combining the sign bit, the exponent and the rounded mantissa, obtaining a combined result for each precision, and selecting one of the combined results as a final output result according to the specific precision conversion control signal.
Optionally, the exponent adjustment module includes:
the difference calculation submodule, which is used for calculating the exponent bias difference between the exponent bias of the addend precision of the current mixed-precision operation and the exponent bias of the target precision;
the merge-add submodule, which is used for summing the number of leading zeros in the new mantissa and the exponent bias difference;
the exponent correction subtraction submodule, which is used for subtracting the sum produced by the merge-add submodule from the exponent difference value obtained by the exponent difference module to obtain a new exponent;
the overflow judgment submodule, which is used for judging whether the new exponent overflows; if not, the new exponent is output directly as the precision-converted exponent, otherwise execution jumps to the maximum/minimum value module;
and the maximum/minimum value module, which is used, when the new exponent overflows, for selecting 0 as the precision-converted exponent output if the new exponent is less than 0, and for selecting the maximum value representable by the target precision exponent as the precision-converted exponent output if the new exponent is greater than that maximum value.
Optionally, when the overflow judgment submodule judges whether the new exponent overflows, the new exponent is judged to overflow if it is less than 0 or greater than 31; otherwise it is judged not to overflow.
Optionally, the mantissa rounding module comprises:
the control bit generation submodule, which is used for generating, for the mantissa after the logical shift of the normalization shift module, the rounding-related control bits for each of the different target precisions, the rounding-related control bits comprising a sticky bit G, a round bit R and a least-significant retained bit LSB;
the selection output submodule, which is used for selecting the rounding-related control bits corresponding to the target precision and outputting the corresponding rounded mantissa according to those control bits;
and the rounding carry judgment submodule, which is used for processing the carry of the rounded mantissa.
Optionally, the processing of the carry of the rounded mantissa by the rounding carry judgment submodule includes: judging whether the rounded mantissa produces a carry, where a mantissa that is all ones and rounds up carries out of its most significant bit; if the most significant bit carries out, 1 is added to the exponent and the rounded mantissa becomes all zeros; the sign bit is then output to the result output module together with the exponent and the rounded mantissa.
Optionally, the plurality of different target precisions includes some or all of single precision, half precision, bfloat16, and TensorFloat-32.
In addition, the invention also provides a processor, which comprises a processor body and a floating point multiply-add unit arranged in the processor body, wherein the floating point multiply-add unit is the floating point multiply-add unit with the fusion precision conversion function.
In addition, the invention also provides computer equipment which comprises a processor and a memory which are connected with each other, wherein the processor comprises a processor body and a floating point multiply-add unit arranged in the processor body, and the floating point multiply-add unit is the floating point multiply-add unit with the fusion precision conversion function.
In addition, the invention also provides an application method of the floating-point multiply-add unit with the fused precision conversion function, comprising:
S1, extracting a mixed-precision multiply-add operator and a precision conversion operator from the convolution or matrix multiplication operators of a target deep learning network model, where the mixed-precision multiply-add operator completes the mixed-precision multiply-add operation and the precision conversion operator completes the precision conversion, including converting a low-precision floating-point number into a high-precision floating-point number and converting a high-precision floating-point number into a low-precision floating-point number;
S2, fusing a pair of operators, with the mixed-precision multiply-add operator first and the precision conversion operator after it, into a fusion operator, where the fusion operator completes the mixed-precision multiply-add operation and then completes the precision conversion, and the target precisions to be supported by the fusion operator are determined by the application requirements of the target deep learning network model;
and S3, implementing the fusion operator with the floating-point multiply-add unit with the fused precision conversion function so as to accelerate the convolution or matrix multiplication operations in the target deep learning network model.
Optionally, in step S1 the low-precision floating-point number is one of half precision, bfloat16 and TensorFloat-32, and the high-precision floating-point number is single precision; when the fusion operator is obtained in step S2, the mixed-precision multiply-add operator of the forward propagation stage is fused with the precision conversion operator that converts the high-precision floating-point numbers after the forward propagation stage into low-precision floating-point numbers to form one fusion operator, and the mixed-precision multiply-add operator of the backward propagation stage is fused with the precision conversion operator that converts the high-precision floating-point numbers after the backward propagation stage into low-precision floating-point numbers to form another fusion operator.
Compared with the prior art, the invention mainly has the following advantages:
In this embodiment, precision conversion and floating-point multiply-add are integrated into one arithmetic unit, which effectively reduces hardware overhead and improves algorithm implementation efficiency. Specifically:
1. Low hardware overhead: the traditional implementation combines a floating-point multiply-add unit with a precision conversion unit and therefore requires two kinds of hardware circuits, at high hardware cost. In this embodiment the same function is achieved by adding only a small amount of hardware logic to the exponent adjustment and mantissa rounding parts of the floating-point multiply-add unit, effectively reducing the hardware overhead.
2. High execution efficiency: in the traditional approach, a multiply-add instruction invokes the floating-point multiply-add unit, the multiply-add result is written back to storage, a precision conversion instruction then reads the data from memory, sends it to the precision conversion unit, and writes the converted data back to storage, which involves 3 memory accesses. In this embodiment the precision conversion is performed directly within the floating-point multiply-add operation and the converted result is written back to storage, requiring only 1 memory access, which greatly improves efficiency and simplifies software control.
Drawings
FIG. 1 is a schematic diagram of a floating-point multiply-add unit with a fused precision conversion function according to the present invention.
Fig. 2 is a schematic structural diagram of the exponent adjustment module according to an embodiment of the present invention.
FIG. 3 is a block diagram of a mantissa rounding module and a result output module according to an embodiment of the present invention.
Fig. 4 shows the training data flow of a target deep learning network model according to an embodiment of the present invention.
Detailed Description
As shown in fig. 1, the floating-point multiply-add unit with the fused precision conversion function in this embodiment shares the following modules with a conventional floating-point multiply-add unit:
the data preprocessing module is used for extracting a sign bit, an exponent bit field and a mantissa bit field of an input floating point operand according to the operation type and supplementing a hidden bit of an unsigned mantissa in the mantissa bit field;
the mantissa multiplication module is used for multiplying unsigned mantissas which are supplemented with the hidden bits to obtain a mantissa product;
the exponent difference module is used for subtracting the exponents of the multiplication operands and the exponents of the addition operands to obtain exponent difference values;
the addend alignment shift module is used for shifting the addend for alignment according to the computed exponent difference;
the addition module is used for adding the mantissa product and the aligned addend to obtain a new mantissa;
the leading zero calculation module is used for counting the number of leading zeros in the new mantissa;
the normalization shift module is used for logically shifting the new mantissa to the left, where the shift amount is the number of leading zeros in the new mantissa obtained by the leading zero calculation module;
different from the structure of the existing floating-point multiply-add unit, the floating-point multiply-add unit with the fused precision conversion function in the embodiment further includes the following improved modules:
the exponent adjustment module is used for adjusting the exponent based on the number of leading zeros in the new mantissa and on the difference between the exponent bias of the addend precision of the current mixed-precision operation and the exponent bias of the target precision; in a conventional floating-point multiply-add unit the exponent adjustment module only needs to consider the number of leading zeros, whereas this embodiment, having merged the precision conversion function, must also consider the difference between the exponent bias of the high-precision mixed-precision multiply-add result and the exponent bias of the target precision;
the mantissa rounding module is used for rounding the mantissa according to the target precision type to be supported;
and the result output module is used for combining the sign bit, the exponent and the rounded mantissa, obtaining a combined result for each precision, and selecting one of the combined results as a final output result according to the specific precision conversion control signal.
In this embodiment, a circuit for data precision conversion is added on the basis of the conventional floating-point multiply-add unit, fusing the two kinds of operation, floating-point multiply-add and data precision conversion, so as to meet the data-flow format requirements of deep learning mixed-precision training. A conventional floating-point multiply-add operation is divided into several stages: data preprocessing, mantissa multiplication, exponent difference, addend alignment, addition, leading zero calculation, exponent adjustment, normalization shift, mantissa rounding and result output. This embodiment adds support for the precision conversion operation by improving three of these parts: exponent adjustment, mantissa rounding and result output. The specific improvements are as follows. Exponent adjustment: in a conventional floating-point multiply-add calculation, the predicted exponent must be decreased according to the result of the leading zero calculation to obtain the final result exponent. During floating-point precision conversion, because different floating-point formats use different exponent biases, the exponent must also be adjusted by addition/subtraction; for example, the exponent bias of the single-precision format is 127 and that of the half-precision format is 15, so 112 (127 − 15) is subtracted from the exponent to obtain the converted exponent. In this embodiment, to integrate the precision conversion function, the bias difference of the precision conversion is added to the leading zero count before the corresponding addition/subtraction is performed. When converting high-precision data to low precision, the narrower exponent field (for example, the single-precision exponent has 8 bits and the half-precision exponent has 5 bits) may cause overflow; overflow is handled according to the relevant provisions of the IEEE 754-2008 standard. Mantissa rounding: the mantissa is rounded in a manner similar to a conventional multiply-add unit, using the roundTiesToEven mode of the IEEE 754-2008 standard. A conventional floating-point multiply-add unit only needs to round for a single, fixed precision; in this embodiment, rounding circuit logic is designed for every possible target precision, producing one mantissa result per target precision. Result output: the adjusted exponent, the rounded mantissa and the sign bit are combined, one combined result is obtained for each possible target output precision, and one of them is selected as the output according to the actual target precision type.
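To make the three improved stages concrete, the following is a minimal software sketch that converts a finished single-precision result to half precision, combining exponent bias adjustment, roundTiesToEven rounding and overflow clamping. It is a functional model only, not the patent's hardware: in the actual unit these steps operate on the internal multiply-add datapath for several target precisions in parallel, and subnormal inputs and NaN payloads are ignored here to keep the sketch short.

```python
import struct

def fp32_to_fp16_bits(x: float) -> int:
    """Sketch of the fused exponent-adjust / round / pack path for one
    target precision (FP16); subnormals and NaN handling are omitted."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    sign = (bits >> 31) & 0x1
    exp  = (bits >> 23) & 0xFF          # biased FP32 exponent (bias 127)
    man  = bits & 0x7FFFFF              # 23-bit FP32 mantissa field

    # Exponent adjustment: subtract the bias difference 127 - 15 = 112.
    new_exp = exp - 112

    # Mantissa rounding (roundTiesToEven): keep the top 10 bits and inspect
    # the round bit R and the sticky information of the discarded bits.
    kept   = man >> 13                  # 10-bit FP16 mantissa candidate
    r_bit  = (man >> 12) & 0x1          # round bit
    sticky = 1 if (man & 0xFFF) else 0  # OR of all lower discarded bits
    lsb    = kept & 0x1
    if r_bit and (sticky or lsb):       # round to nearest, ties to even
        kept += 1
        if kept == 0x400:               # all-ones mantissa rounded up
            kept = 0
            new_exp += 1                # carry propagates into the exponent

    # Overflow / underflow of the narrowed 5-bit exponent field; clamping to
    # the field maximum / to 0 follows the max/min handling described above
    # (returning an infinity pattern on overflow is this sketch's choice).
    if new_exp >= 0x1F:
        new_exp, kept = 0x1F, 0
    elif new_exp <= 0:
        new_exp, kept = 0, 0

    return (sign << 15) | (new_exp << 10) | kept

print(hex(fp32_to_fp16_bits(3.140625)))   # 0x4248, the FP16 encoding of 3.140625
```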
As shown in fig. 2, the exponent adjustment module includes:
the difference calculation submodule, which is used for calculating the exponent bias difference between the exponent bias of the addend precision of the current mixed-precision operation and the exponent bias of the target precision;
For example, when the current mixed-precision multiply-add mode is FP16 × FP16 + FP32, corresponding to the floating-point precisions of matrices A, B and C in the general matrix multiply-add operation A × B + C, the multiply-add result has precision FP32 and an exponent bias of 127. If the target precision is FP16, whose exponent bias is 15, then an extra 112 (127 − 15) must be subtracted during exponent adjustment to obtain the converted exponent;
the merge-add submodule, which is used for summing the number of leading zeros in the new mantissa and the exponent bias difference;
the exponent correction subtraction submodule, which is used for subtracting the sum produced by the merge-add submodule from the exponent difference value obtained by the exponent difference module to obtain a new exponent;
the overflow judgment submodule, which is used for judging whether the new exponent overflows; if not, the new exponent is output directly as the precision-converted exponent, otherwise execution jumps to the maximum/minimum value module;
and the maximum/minimum value module, which is used, when the new exponent overflows, for selecting 0 as the precision-converted exponent output if the new exponent is less than 0, and for selecting the maximum value representable by the target precision exponent as the precision-converted exponent output if the new exponent is greater than that maximum value.
The exponent may overflow during conversion from high precision to low precision. For example, when converting from single precision to half precision, the exponent bit width shrinks from 8 bits to 5 bits, and an adjusted exponent value greater than 31 or less than 0 indicates overflow. In this embodiment, when the overflow judgment submodule judges whether the new exponent overflows, the new exponent is judged to overflow if it is less than 0 or greater than 31; otherwise it is judged not to overflow.
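The exponent-adjustment sub-modules can be summarized in the following software sketch; the function and argument names are illustrative rather than taken from the patent, and the default parameters assume a single-precision intermediate result converted to a half-precision (5-bit exponent) output.

```python
def adjust_exponent(exp_value: int, leading_zeros: int,
                    src_bias: int = 127, dst_bias: int = 15,
                    dst_exp_max: int = 31) -> int:
    """Exponent adjustment with fused precision conversion (sketch).

    exp_value     -- biased exponent produced by the exponent difference stage
    leading_zeros -- leading-zero count of the un-normalized mantissa sum
    src_bias      -- exponent bias of the high-precision multiply-add result
    dst_bias      -- exponent bias of the target (lower) precision
    dst_exp_max   -- largest value the target exponent field can hold
    """
    # Difference calculation submodule: bias difference between source and target.
    bias_diff = src_bias - dst_bias        # e.g. 127 - 15 = 112 for FP32 -> FP16

    # Merge-add submodule: sum the leading-zero count and the bias difference.
    correction = leading_zeros + bias_diff

    # Exponent correction subtraction submodule: a single subtraction.
    new_exp = exp_value - correction

    # Overflow judgment and maximum/minimum value submodules: clamp on overflow.
    if new_exp < 0:
        return 0
    if new_exp > dst_exp_max:
        return dst_exp_max
    return new_exp
```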
Similar to the modification of the exponent adjustment part, fig. 3 illustrates the mantissa rounding scheme after merging the precision conversion function in this embodiment, which can be configured to support conversion to multiple data precisions. As shown in fig. 3, the mantissa rounding module includes:
the control bit generation submodule (the upper dotted area in fig. 3), which is used for generating, for the mantissa after the logical shift of the normalization shift module, the rounding-related control bits for each of the different target precisions, the rounding-related control bits comprising a sticky bit G, a round bit R and a least-significant retained bit LSB; as shown in fig. 3, the target precisions in this embodiment include four kinds, namely single precision, half precision, bfloat16 and TensorFloat-32 (some or all of which may be supported);
the selection output submodule, which is used for selecting the rounding-related control bits corresponding to the target precision and outputting the corresponding rounded mantissa according to those control bits;
and the rounding carry judgment submodule (the lower dotted area in fig. 3), which is used for processing the carry of the rounded mantissa.
In this embodiment, the processing of the carry of the rounded mantissa by the rounding carry judgment submodule includes: judging whether the rounded mantissa produces a carry, where a mantissa that is all ones and rounds up carries out of its most significant bit; if the most significant bit carries out, 1 is added to the exponent and the rounded mantissa becomes all zeros; the sign bit is then output to the result output module together with the exponent and the rounded mantissa.
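A software sketch of the per-precision control-bit generation and rounding-carry handling follows. The mantissa widths are the standard ones for the four formats; the dictionary-driven structure and function names are illustrative, whereas the hardware of fig. 3 generates all rounded candidates in parallel and selects one with a multiplexer.

```python
# Mantissa field widths of the supported target precisions.
MANTISSA_BITS = {
    'single':        23,
    'half':          10,
    'bfloat16':       7,
    'tensorfloat32': 10,
}

def round_mantissa(norm_mant: int, norm_width: int, target: str):
    """Round a normalized mantissa (hidden bit removed, norm_width bits wide,
    norm_width > target width) with roundTiesToEven; return (mantissa, carry)."""
    keep_bits = MANTISSA_BITS[target]
    drop = norm_width - keep_bits

    # Control bit generation submodule: LSB, round bit R and sticky bit G
    # for this target precision.
    kept   = norm_mant >> drop
    lsb    = kept & 1
    r_bit  = (norm_mant >> (drop - 1)) & 1
    sticky = 1 if (norm_mant & ((1 << (drop - 1)) - 1)) else 0

    # Selection output submodule: apply roundTiesToEven.
    if r_bit and (sticky or lsb):
        kept += 1

    # Rounding carry judgment submodule: an all-ones mantissa that rounds up
    # carries out of the mantissa field; the caller then increments the
    # exponent and uses the cleared mantissa.
    carry = kept >> keep_bits
    kept &= (1 << keep_bits) - 1
    return kept, carry

# Example: a 24-bit normalized mantissa rounded to half precision.
print(round_mantissa(0b111111111111000000000001, 24, 'half'))   # (0, 1): carry out
```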
In addition, the present embodiment further provides a processor, which includes a processor body and a floating-point multiply-add unit disposed in the processor body, where the floating-point multiply-add unit is the floating-point multiply-add unit with the fusion precision conversion function.
In addition, the embodiment further provides a computer device, which includes a processor and a memory connected to each other, where the processor includes a processor body and a floating-point multiply-add unit disposed in the processor body, and the floating-point multiply-add unit is the aforementioned floating-point multiply-add unit with the fused precision conversion function.
This embodiment designs a floating-point multiply-add unit with a fused precision conversion function, an improvement on the way a conventional floating-point multiply-add unit is used in a deep learning hardware platform. Because the unit is tightly coupled with the deep learning algorithm, implementation requires analysis at the algorithm level: a fusion operator is extracted from the target operator structure and the network model data flow, and a customized arithmetic unit is designed around the functional requirements of that fusion operator. Accordingly, this embodiment further provides an application method of the floating-point multiply-add unit with the fused precision conversion function, comprising:
S1, extracting a mixed-precision multiply-add operator and a precision conversion operator from the convolution or matrix multiplication operators of a target deep learning network model, where the mixed-precision multiply-add operator completes the mixed-precision multiply-add operation and the precision conversion operator completes the precision conversion, including converting a low-precision floating-point number into a high-precision floating-point number and converting a high-precision floating-point number into a low-precision floating-point number; in this embodiment, the low-precision floating-point number in step S1 is one of half precision, bfloat16 and TensorFloat-32, and the high-precision floating-point number is single precision;
S2, fusing a pair of operators, with the mixed-precision multiply-add operator first and the precision conversion operator after it, into a fusion operator, where the fusion operator completes the mixed-precision multiply-add operation and then completes the precision conversion, and the target precisions to be supported by the fusion operator are determined by the application requirements of the target deep learning network model;
and S3, implementing the fusion operator with the floating-point multiply-add unit with the fused precision conversion function to accelerate the convolution or matrix multiplication operations in the target deep learning network model.
Fig. 4 depicts the data flow between operators across network layers in the deep learning network model; in fig. 4, L denotes the L-th network layer of the target deep learning network model, and L+1 and L-1 denote the next and previous layers analogously.
In this embodiment, when the fusion operator is obtained in step S2, the mixed-precision multiply-add operator of the forward propagation stage is fused with the precision conversion operator that converts the high-precision floating-point numbers after the forward propagation stage into low-precision floating-point numbers, forming one fusion operator; and the mixed-precision multiply-add operator of the backward propagation stage is fused with the precision conversion operator that converts the high-precision floating-point numbers after the backward propagation stage into low-precision floating-point numbers, forming another fusion operator. The operators involved in fig. 4 mainly include: the mixed-precision multiply-add operator 10 of the three stages of forward propagation, backward propagation and weight (gradient) update, the precision conversion operator 11 in the forward propagation process, the precision conversion operator 12 in the backward propagation process, and the weight precision conversion operator 13 in the forward propagation process.
The mixed-precision multiply-add operator 10 is the core operator of the three stages involved in deep learning training (forward propagation, backward propagation and weight/gradient update) and mainly corresponds to convolution or matrix multiplication. It is the most central operator in a deep learning algorithm: its multiplication operands use a low-precision floating-point format, its addition operand uses a high-precision floating-point format, and it outputs a high-precision floating-point multiply-add result. The low-precision formats in deep learning algorithms mainly include half precision, bfloat16 and TensorFloat-32, while high precision usually means single precision; the data formats are chosen according to the specific requirements of the algorithm model.
The purpose of the precision conversion operator 11 in the forward propagation process is to convert the high-precision multiply-add result of the current network layer into a low-precision format. Writing low-precision data back to storage effectively reduces storage overhead, and the low-precision data can be used directly as the activation input of the next network layer. The data type of this precision conversion corresponds to the data format selected in the mixed-precision multiply-add operator 10.
The precision conversion operator 12 in the backward propagation process is used for the precision conversion operation of the multiply-add result in the backward propagation process, and the principle is the same as that of the precision conversion operator 11 in the forward propagation process.
The weight precision conversion operator 13 in the forward propagation process performs precision conversion on the weights before the floating-point multiply-add: the weights are stored in high precision (single precision), and before the mixed-precision multiply-add operation they must first be converted to low precision. This precision conversion is not fused with the floating-point multiply-add because it preprocesses the operand data: it precedes the floating-point multiply-add in the data stream, cannot reuse the hardware logic inside the floating-point multiply-add unit, and therefore fusing the two would not save hardware overhead.
The computation order among the operators determines whether their common computation can be shared, that is, whether a fusion operator can be formed. In fig. 4, the mixed-precision multiply-add operator 10 can be fused with the precision conversion operator 11 of the forward propagation process, or with the precision conversion operator 12 of the backward propagation process, because these pairs satisfy the data-stream order of floating-point multiply-add followed by precision conversion, so the exponent adjustment and mantissa rounding calculations required by both can be merged. If the order is reversed, fusion is not possible; for example, the mixed-precision multiply-add operator 10 and the weight precision conversion operator 13 of the forward propagation process cannot form a fusion operator. The fusion operator is extracted from the deep learning network model data flow of fig. 4, the data precisions it must support are determined by the specific application requirements, and the conventional floating-point multiply-add structure is then modified to design a new arithmetic unit that fuses floating-point multiply-add and precision conversion.
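At the operator level, the fusion therefore amounts to replacing a multiply-add followed by a conversion with one call, while the weight conversion that precedes the multiply-add remains a separate operator. The sketch below is a functional model under assumed names (precision_convert, fused_multiply_add_convert and forward_layer are illustrative, not an API defined by the patent), using numpy casts in place of the hardware unit.

```python
import numpy as np

def precision_convert(x, dtype):
    """Stand-in for a separate precision conversion operator (e.g. operator 13)."""
    return x.astype(dtype)

def fused_multiply_add_convert(a_lp, w_lp, c_hp, out_dtype):
    """Stand-in for the fusion operator (operators 10 + 11 / 10 + 12):
    mixed-precision multiply-add with the conversion to the target precision
    folded into the same step before write-back."""
    hp_result = a_lp.astype(np.float32) @ w_lp.astype(np.float32) + c_hp
    return hp_result.astype(out_dtype)

def forward_layer(activations_lp, weights_hp, bias_hp):
    # Weight precision conversion (operator 13) precedes the multiply-add,
    # so it cannot be fused and stays a separate operator.
    weights_lp = precision_convert(weights_hp, np.float16)
    # Fusion operator: the multiply-add result is converted inside the same unit.
    return fused_multiply_add_convert(activations_lp, weights_lp, bias_hp,
                                      out_dtype=np.float16)

x = np.random.rand(4, 8).astype(np.float16)   # low-precision activations from layer L-1
w = np.random.rand(8, 8).astype(np.float32)   # weights stored in high precision
b = np.random.rand(8).astype(np.float32)      # high-precision addend
print(forward_layer(x, w, b).dtype)           # float16, fed to layer L+1
```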
In summary, this embodiment implements a floating-point arithmetic unit whose external interface differs from a conventional floating-point multiply-add unit only by one additional control signal enabling precision conversion. When applying this embodiment in a deep learning hardware platform, the original independent floating-point multiply-add unit and precision conversion unit are replaced directly, and the interfaces of the two units are merged and connected to the arithmetic unit of this embodiment. The replacement is very simple and is transparent to programmers, so it does not increase their control burden. Multiply-accumulate and precision conversion are among the most common operations in deep learning; the traditional implementation assigns them to different operators and designs a dedicated arithmetic unit for each, which incurs high hardware overhead and low operator implementation efficiency. This embodiment merges the floating-point multiply-add operation and the precision conversion operation into one unit, and only a small modification to the conventional floating-point multiply-add unit, with a small amount of extra hardware, is needed to effectively improve computational efficiency. Coupling the two types of operators together matches the data flow of deep learning algorithms, simplifies algorithm implementation and reduces control difficulty.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims (10)

1. A floating-point multiply-add unit with fused precision conversion functionality, comprising:
the data preprocessing module is used for extracting a sign bit, an exponent bit field and a mantissa bit field of an input floating point operand according to the operation type and supplementing a hidden bit of an unsigned mantissa in the mantissa bit field;
the mantissa multiplication module is used for multiplying unsigned mantissas which are supplemented with the hidden bits to obtain a mantissa product;
the exponent difference module is used for subtracting the exponents of the multiplication operands and the exponents of the addition operands to obtain exponent difference values;
the addend alignment shift module is used for shifting the addend for alignment according to the computed exponent difference;
the addition module is used for adding the mantissa product and the aligned addend to obtain a new mantissa;
the leading zero calculation module is used for counting the number of leading zeros in the new mantissa;
the normalization shift module is used for logically shifting the new mantissa to the left, where the shift amount is the number of leading zeros in the new mantissa calculated by the leading zero calculation module;
it is characterized by also comprising:
the exponent adjustment module is used for adjusting the exponent based on the number of leading zeros in the new mantissa and on the difference between the exponent bias of the addend precision of the current mixed-precision operation and the exponent bias of the target precision;
the mantissa rounding module is used for rounding the mantissa according to the target precision type to be supported;
and the result output module is used for combining the sign bit, the exponent and the rounded mantissa, obtaining a combined result for each precision, and selecting one of the combined results as a final output result according to the specific precision conversion control signal.
2. The floating-point multiply-add unit with fused precision conversion function according to claim 1, wherein the exponent adjustment module comprises:
the difference calculation submodule, which is used for calculating the exponent bias difference between the exponent bias of the addend precision of the current mixed-precision operation and the exponent bias of the target precision;
the merge-add submodule, which is used for summing the number of leading zeros in the new mantissa and the exponent bias difference;
the exponent correction subtraction submodule, which is used for subtracting the sum produced by the merge-add submodule from the exponent difference value obtained by the exponent difference module to obtain a new exponent;
the overflow judgment submodule, which is used for judging whether the new exponent overflows; if not, the new exponent is output directly as the precision-converted exponent, otherwise execution jumps to the maximum/minimum value module;
and the maximum/minimum value module, which is used, when the new exponent overflows, for selecting 0 as the precision-converted exponent output if the new exponent is less than 0, and for selecting the maximum value representable by the target precision exponent as the precision-converted exponent output if the new exponent is greater than that maximum value.
3. The floating-point multiply-add unit with fused precision conversion function according to claim 2, wherein, when the overflow judgment submodule judges whether the new exponent overflows, the new exponent is judged to overflow if it is less than 0 or greater than 31, and otherwise is judged not to overflow.
4. The floating-point multiply-add unit with fused precision conversion function according to claim 1, wherein the mantissa rounding module comprises:
the control bit generation submodule, which is used for generating, for the mantissa after the logical shift of the normalization shift module, the rounding-related control bits for each of the plurality of different target precisions, the rounding-related control bits comprising a sticky bit G, a round bit R and a least-significant retained bit LSB;
the selection output submodule, which is used for selecting the rounding-related control bits corresponding to the target precision and outputting the corresponding rounded mantissa according to those control bits;
and the rounding carry judgment submodule, which is used for processing the carry of the rounded mantissa.
5. The floating-point multiply-add unit with fused precision conversion function according to claim 4, wherein the processing of the carry of the rounded mantissa by the rounding carry judgment submodule comprises: judging whether the rounded mantissa produces a carry, where a mantissa that is all ones and rounds up carries out of its most significant bit; if the most significant bit carries out, adding 1 to the exponent and changing the rounded mantissa to all zeros; and then outputting the sign bit to the result output module together with the exponent and the rounded mantissa.
6. The floating-point multiply-add unit with fused precision conversion function according to claim 4, wherein the plurality of different target precisions comprises some or all of single precision, half precision, bfloat16 and TensorFloat-32.
7. A processor comprising a processor body and a floating-point multiply-add unit provided in the processor body, wherein the floating-point multiply-add unit is the floating-point multiply-add unit with a fused precision conversion function according to any one of claims 1 to 6.
8. A computer device comprising a processor and a memory connected to each other, wherein the processor comprises a processor body and a floating-point multiply-add unit provided in the processor body, and wherein the floating-point multiply-add unit is the floating-point multiply-add unit with the fused precision conversion function according to any one of claims 1 to 6.
9. The application method of the floating-point multiply-add unit with the fused precision conversion function according to any one of claims 1 to 6, characterized by comprising the following steps:
S1, extracting a mixed-precision multiply-add operator and a precision conversion operator from the convolution or matrix multiplication operators of a target deep learning network model, wherein the mixed-precision multiply-add operator completes the mixed-precision multiply-add operation and the precision conversion operator completes the precision conversion, including converting a low-precision floating-point number into a high-precision floating-point number and converting a high-precision floating-point number into a low-precision floating-point number;
S2, fusing a pair of operators, with the mixed-precision multiply-add operator first and the precision conversion operator after it, into a fusion operator, wherein the fusion operator completes the mixed-precision multiply-add operation and then completes the precision conversion, and the target precisions to be supported by the fusion operator are determined by the application requirements of the target deep learning network model;
and S3, implementing the fusion operator with the floating-point multiply-add unit with the fused precision conversion function according to any one of claims 1 to 6 to accelerate the convolution or matrix multiplication operations in the target deep learning network model.
10. The method according to claim 9, wherein in step S1 the low-precision floating-point number is one of half precision, bfloat16 and TensorFloat-32, and the high-precision floating-point number is single precision; and when the fusion operator is obtained in step S2, the mixed-precision multiply-add operator of the forward propagation stage is fused with the precision conversion operator that converts the high-precision floating-point numbers after the forward propagation stage into low-precision floating-point numbers to form one fusion operator, and the mixed-precision multiply-add operator of the backward propagation stage is fused with the precision conversion operator that converts the high-precision floating-point numbers after the backward propagation stage into low-precision floating-point numbers to form another fusion operator.
CN202210917746.3A 2022-08-01 2022-08-01 Floating point multiply-add unit with fusion precision conversion function and application method thereof Pending CN115390790A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210917746.3A CN115390790A (en) 2022-08-01 2022-08-01 Floating point multiply-add unit with fusion precision conversion function and application method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210917746.3A CN115390790A (en) 2022-08-01 2022-08-01 Floating point multiply-add unit with fusion precision conversion function and application method thereof

Publications (1)

Publication Number Publication Date
CN115390790A true CN115390790A (en) 2022-11-25

Family

ID=84117846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210917746.3A Pending CN115390790A (en) 2022-08-01 2022-08-01 Floating point multiply-add unit with fusion precision conversion function and application method thereof

Country Status (1)

Country Link
CN (1) CN115390790A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116643718A (en) * 2023-06-16 2023-08-25 合芯科技有限公司 Floating point fusion multiply-add device and method of pipeline structure and processor
CN116643718B (en) * 2023-06-16 2024-02-23 合芯科技有限公司 Floating point fusion multiply-add device and method of pipeline structure and processor
CN117251132A (en) * 2023-09-19 2023-12-19 上海合芯数字科技有限公司 Fixed-floating point SIMD multiply-add instruction fusion processing device and method and processor
CN117251132B (en) * 2023-09-19 2024-05-14 上海合芯数字科技有限公司 Fixed-floating point SIMD multiply-add instruction fusion processing device and method and processor
CN117632081A (en) * 2024-01-24 2024-03-01 沐曦集成电路(上海)有限公司 Matrix data processing system for GPU
CN117632081B (en) * 2024-01-24 2024-04-19 沐曦集成电路(上海)有限公司 Matrix data processing system for GPU
CN117648959A (en) * 2024-01-30 2024-03-05 中国科学技术大学 Multi-precision operand operation device supporting neural network operation
CN117648959B (en) * 2024-01-30 2024-05-17 中国科学技术大学 Multi-precision operand operation device supporting neural network operation

Similar Documents

Publication Publication Date Title
CN115390790A (en) Floating point multiply-add unit with fusion precision conversion function and application method thereof
US6751644B1 (en) Method and apparatus for elimination of inherent carries
US5128889A (en) Floating-point arithmetic apparatus with compensation for mantissa truncation
EP0973089B1 (en) Method and apparatus for computing floating point data
US5369607A (en) Floating-point and fixed-point addition-subtraction assembly
US10019231B2 (en) Apparatus and method for fixed point to floating point conversion and negative power of two detector
JP4500358B2 (en) Arithmetic processing apparatus and arithmetic processing method
EP0472139B1 (en) A floating-point processor
US5148386A (en) Adder-subtracter for signed absolute values
US11106431B2 (en) Apparatus and method of fast floating-point adder tree for neural networks
KR100241076B1 (en) Floating- point multiply-and-accumulate unit with classes for alignment and normalization
US5136536A (en) Floating-point ALU with parallel paths
EP0717350A2 (en) High-speed division and square root calculation unit
JPS5979350A (en) Arithmetic device for floating point
CN116594590A (en) Multifunctional operation device and method for floating point data
CN116643718B (en) Floating point fusion multiply-add device and method of pipeline structure and processor
CN117111881A (en) Mixed precision multiply-add operator supporting multiple inputs and multiple formats
EP1282034A2 (en) Elimination of rounding step in the short path of a floating point adder
CN117075842B (en) Decimal adder and decimal operation method
JP3517162B2 (en) Division and square root arithmetic unit
CN112579519B (en) Data arithmetic circuit and processing chip
KR100974190B1 (en) Complex number multiplying method using floating point
JP3174974B2 (en) Floating point arithmetic device and method
CN117270813A (en) Arithmetic unit, processor, and electronic apparatus
JP2801472B2 (en) Floating point arithmetic unit

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination