CN117435164A

CN117435164A - High-performance multiply-add device, multiply-add method and electronic equipment

Info

Publication number: CN117435164A
Application number: CN202210832162.6A
Authority: CN
Inventors: 余玉琴; 曾耀辉; 卞仁玉; 张淮声
Original assignee: Glenfly Tech Co Ltd
Current assignee: Glenfly Tech Co Ltd
Priority date: 2022-07-15
Filing date: 2022-07-15
Publication date: 2024-01-23

Abstract

The present application relates to a multiply-add method, apparatus, processor and computer program product. The method comprises the following steps: when the logic operation unit performs a single-precision floating-point multiply-add operation, the two half-precision multiply-add devices in each single-precision multiply-add unit jointly perform the multiply-add operation on the single-precision floating-point numbers to be processed to obtain a corresponding single-precision multiply-add result, so that N multiply-add results are obtained in total; when the logic operation unit performs a half-precision floating-point multiply-add operation, each half-precision multiply-add device performs the multiply-add operation on the half-precision floating-point numbers to be processed to obtain a corresponding half-precision multiply-add result, so that 2N multiply-add results are obtained in total. The utilization rate of the multiply-add devices is thereby improved.

Description

High-performance multiply-add device, multiply-add method and electronic equipment

Technical Field

The application relates to the technical field of chips, in particular to a high-performance multiply-add device, a multiply-add method and electronic equipment.

Background

In a logical operation unit of a microprocessor, a floating-point number multiply-add operation is generally implemented using a multiply-add device. In general, the design scheme of the multiplier-adder in the logic operation unit is as follows: n single-precision multiply-add devices and 2n half-precision multiply-add devices are arranged. When the logical operation unit performs single-precision floating point number multiply-add operation, n single-precision multiply-add devices work simultaneously, and n single-precision multiply-add results can be obtained; when the logical operation unit performs half-precision floating point number multiply-add operation, 2n half-precision multiply-add devices work simultaneously, and 2n half-precision multiply-add results can be obtained.

However, when the logic operation unit performs a single-precision floating-point multiply-add operation, the 2n half-precision multiply-add devices are idle, and when the logic operation unit performs a half-precision floating-point multiply-add operation, the n single-precision multiply-add devices are idle. The utilization rate of the multiply-add devices is therefore low, and providing such a large number of multiply-add devices also increases the hardware cost.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a high-performance multiply-add device, a multiply-add method, and an electronic apparatus that can improve the utilization of the multiply-add device.

In a first aspect, the present application provides a high-performance multiply-add device comprising: N single-precision multiply-add units, each single-precision multiply-add unit comprising: two half-precision multiply-add devices;

when the high-performance multiply-add device performs a single-precision floating-point multiply-add operation, the two half-precision multiply-add devices in each single-precision multiply-add unit are used for jointly performing the multiply-add operation on the single-precision floating-point numbers to be processed to obtain a corresponding single-precision multiply-add result, and N multiply-add results are obtained in total;

when the high-performance multiply-add device performs a half-precision floating-point multiply-add operation, each half-precision multiply-add device is used for performing the multiply-add operation on the half-precision floating-point numbers to be processed to obtain a corresponding half-precision multiply-add result, and 2N multiply-add results are obtained in total.

In a second aspect, the present application further provides a multiply-add method, the method comprising:

when the high-performance multiply-add device performs a single-precision floating-point multiply-add operation, the two half-precision multiply-add devices in each single-precision multiply-add unit jointly perform the multiply-add operation on the single-precision floating-point numbers to be processed to obtain a corresponding single-precision multiply-add result, and N multiply-add results are obtained in total;

when the high-performance multiply-add device performs the multiply-add operation of the half-precision floating-point number, each half-precision multiply-add device performs the multiply-add operation of the half-precision floating-point number to be processed to obtain corresponding half-precision multiply-add results, and 2N multiply-add results are obtained in total.

In a third aspect, the present application further provides an asymmetric multiply-add device, comprising: N multiply-add units, each multiply-add unit comprising: a single-precision multiply-add device and a half-precision multiply-add device;

when the asymmetric multiply-add device performs the multiply-add operation of the half-precision floating-point number, the single-precision multiply-add device and the half-precision multiply-add device perform multiply-add operation on the half-precision floating-point number to be processed respectively to obtain corresponding half-precision multiply-add results, and 2N half-precision multiply-add results are obtained in total;

when the asymmetric multiply-add device performs single-precision floating-point multiply-add operation, the single-precision multiply-add device performs multiply-add operation on the single-precision floating-point to be processed to obtain corresponding single-precision multiply-add results, and N single-precision multiply-add results are obtained in total.

In a fourth aspect, the present application further provides a multiply-add method, the method comprising:

In a fifth aspect, the present application also provides an electronic device. The electronic device comprises a memory storing a computer program and a processor implementing the method provided in the second aspect or the method provided in the fourth aspect when the processor executes the computer program.

In a sixth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the method provided in the second aspect or the method provided in the fourth aspect.

In a seventh aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the method provided in the second aspect or the method provided in the fourth aspect.

The multiply-add method, apparatus, processor and computer program product provide two design ideas. In the first design, 2n half-precision multiply-add devices are still arranged in the logic operation unit and are grouped in pairs to obtain n groups. When a single-precision floating-point multiply-add operation is performed, the half-precision multiply-add devices also participate in the operation and are not idle, which improves the utilization rate of the half-precision multiply-add devices. In the second design, n half-precision multiply-add devices and n single-precision multiply-add devices are arranged in the logic operation unit, and one half-precision multiply-add device and one single-precision multiply-add device form a group, so that n groups are obtained in total. When a half-precision floating-point multiply-add operation is performed, the single-precision multiply-add devices also participate in the operation and are not idle, which improves the utilization rate of the single-precision multiply-add devices. In addition, this scheme saves n half-precision multiply-add devices and reduces the hardware cost.

Drawings

FIG. 1 is a schematic diagram of a data format of a single precision floating point number in one embodiment;

FIG. 2 is a schematic diagram of a data format of a half-precision floating point number in one embodiment;

FIG. 3 is a schematic diagram of a design structure of a logic operation unit in one embodiment;

FIG. 4 is a schematic diagram of a design structure of a logic operation unit according to another embodiment;

FIG. 5 is a schematic diagram of a design structure of a logic operation unit according to another embodiment;

FIG. 6 is a flow diagram of a multiply-add method in one embodiment;

FIG. 7 is a schematic diagram of the internal structure of a multiplier-adder in one embodiment;

FIG. 8 is a flow diagram of a single precision floating point multiply-add operation in one embodiment;

FIG. 9 is a schematic diagram of the internal structure of a multiplier-adder according to another embodiment;

FIG. 10 is a flow diagram of a half-precision floating point multiply-add operation in one embodiment;

FIG. 11 is a flow chart of a multiply-add method in another embodiment;

FIG. 12 is a schematic diagram of the internal structure of a multiplier-adder according to another embodiment;

FIG. 13 is a flow chart of a half-precision floating point multiply-add operation in another embodiment;

FIG. 14 is a schematic diagram of the internal structure of a multiplier-adder according to another embodiment;

FIG. 15 is a flow chart of a single precision floating point multiply-add operation in another embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

For ease of understanding, the following terms are used in connection with the embodiments of the present application:

single precision floating point number: the binary floating point arithmetic standard (IEEE 754) specifies that single precision floating point numbers consist of 32-bit binary data, the data format of which is shown in fig. 1, wherein,

s: sign bit, s=0, indicates that the value represented by the single-precision floating point number is positive, and s=1 indicates that the value represented by the single-precision floating point number is negative.

Step code (exponent): the exponent field, 8 bits of binary data.

Mantissa (mantissa): the portion after the binary point, 23 bits of binary data.

Normalized number (Normal): the step code is neither all 1s nor all 0s, and the digit 1 before the binary point is implicit (not stored).

Denormalized number (Denormal): the step code is all 0s and the mantissa is not all 0s; the digit 0 before the binary point is implicit (not stored).

Step code bias (bias): bias=0x7F for Normal single-precision floating point numbers and bias=0x7E for Denormal single-precision floating point numbers.

The value represented by a Normal single-precision floating point number is:

data = (-1)^S * 2^(exponent - 0x7F) * (1.mantissa)

The value represented by a Denormal single-precision floating point number is:

data = (-1)^S * 2^(exponent - 0x7E) * (0.mantissa)

Half-precision floating point number: a half-precision floating point number consists of 16-bit binary data, the data format of which is shown in fig. 2, wherein,

S: sign bit, S=0 indicates that the value represented by the half-precision floating point number is positive, and S=1 indicates that the value represented by the half-precision floating point number is negative.

Step code (exponent): the exponent field, 5 bits of binary data.

Mantissa (mantissa): the portion after the binary point, 10 bits of binary data.

Normalized number (Normal): the step code is neither all 1s nor all 0s, and the digit 1 before the binary point is implicit (not stored).

Denormalized number (Denormal): the step code is all 0s and the mantissa is not all 0s; the digit 0 before the binary point is implicit (not stored).

Step code bias (bias): bias=0xF for Normal half-precision floating point numbers and bias=0xE for Denormal half-precision floating point numbers.

The value represented by a Normal half-precision floating point number is:

data = (-1)^S * 2^(exponent - 0xF) * (1.mantissa)

The value represented by a Denormal half-precision floating point number is:

data = (-1)^S * 2^(exponent - 0xE) * (0.mantissa)
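As an illustration of the Normal and Denormal value formulas for both formats, the following Python sketch decodes a raw bit pattern field by field. The function name `decode` and its parameters `exp_bits` and `man_bits` are hypothetical names chosen for this example and do not appear in the application. Bit patterns whose step code is all 1s (infinity and NaN) are not covered by the formulas above, so the sketch rejects them.

```python
def decode(bits: int, exp_bits: int, man_bits: int) -> float:
    """Decode a binary floating-point bit pattern with the Normal/Denormal formulas.

    exp_bits=8, man_bits=23 selects single precision (bias 0x7F / 0x7E);
    exp_bits=5, man_bits=10 selects half precision (bias 0xF / 0xE).
    """
    s = (bits >> (exp_bits + man_bits)) & 0x1
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)

    normal_bias = (1 << (exp_bits - 1)) - 1   # 0x7F or 0xF
    denormal_bias = normal_bias - 1           # 0x7E or 0xE
    fraction = mantissa / (1 << man_bits)     # the ".mantissa" part

    if exponent == (1 << exp_bits) - 1:
        raise ValueError("step code all 1s (infinity/NaN) is outside the formulas")
    if exponent == 0:
        # Denormal: 0.mantissa (also yields zero when the mantissa is all 0s)
        return (-1) ** s * 2.0 ** (exponent - denormal_bias) * fraction
    # Normal: 1.mantissa
    return (-1) ** s * 2.0 ** (exponent - normal_bias) * (1 + fraction)
```

For instance, `decode(0x3F800000, 8, 23)` and `decode(0x3C00, 5, 10)` both decode the value 1.0 in their respective formats.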

In a logic operation unit of a microprocessor, a floating-point multiply-add operation is generally implemented using multiply-add devices. In some embodiments, referring to fig. 3, the design scheme of the multiply-add devices in the logic operation unit is as follows: n single-precision multiply-add devices and 2n half-precision multiply-add devices are arranged. When the logic operation unit performs a single-precision floating-point multiply-add operation, the n single-precision multiply-add devices work simultaneously, and n single-precision multiply-add results can be obtained; when the logic operation unit performs a half-precision floating-point multiply-add operation, the 2n half-precision multiply-add devices work simultaneously, and 2n half-precision multiply-add results can be obtained. The problem with this approach is that the 2n half-precision multiply-add devices are idle during single-precision operations and the n single-precision multiply-add devices are idle during half-precision operations. The utilization rate of the multiply-add devices is therefore low, and providing such a large number of multiply-add devices also increases the hardware cost.

For this reason, two other design ideas are provided in the embodiments of the present application. In the first design, shown in fig. 4, a high-performance multiply-add device is provided in which 2n half-precision multiply-add devices are still arranged. The 2n half-precision multiply-add devices are grouped in pairs to obtain n groups. When the high-performance multiply-add device performs a single-precision floating-point multiply-add operation, the two half-precision multiply-add devices in each group jointly perform the multiply-add operation on the single-precision floating-point numbers to be processed to obtain a corresponding single-precision multiply-add result, and n single-precision multiply-add results can be obtained in total. When the high-performance multiply-add device performs a half-precision floating-point multiply-add operation, the 2n half-precision multiply-add devices work independently, that is, each half-precision multiply-add device performs the multiply-add operation on the half-precision floating-point numbers to be processed to obtain a corresponding half-precision multiply-add result, and 2n half-precision multiply-add results can be obtained in total. With this scheme, the half-precision multiply-add devices also participate in single-precision floating-point multiply-add operations and are not idle, which improves their utilization rate; compared with the design scheme shown in fig. 3, n single-precision multiply-add devices are saved and the hardware cost is reduced.

In the second design, shown in fig. 5, an asymmetric multiply-add device is provided in which n half-precision multiply-add devices and n single-precision multiply-add devices are arranged. One half-precision multiply-add device and one single-precision multiply-add device form a group, so that n groups are obtained in total; for convenience of explanation, each group is called a multiply-add unit in the embodiments of the present application. When the asymmetric multiply-add device performs a single-precision floating-point multiply-add operation, the n single-precision multiply-add devices work independently, that is, each single-precision multiply-add device performs the multiply-add operation on the single-precision floating-point numbers to be processed to obtain a corresponding single-precision multiply-add result, and n single-precision multiply-add results can be obtained in total. When the asymmetric multiply-add device performs a half-precision floating-point multiply-add operation, the n half-precision multiply-add devices work independently to obtain n half-precision multiply-add results, while each single-precision multiply-add device converts the half-precision floating-point numbers to be processed into single-precision floating-point numbers, performs the multiply-add operation, and finally converts the result back to half precision, which yields another n half-precision multiply-add results; 2n half-precision multiply-add results are thus obtained in total. With this scheme, the single-precision multiply-add devices also participate in half-precision floating-point multiply-add operations and are not idle, which improves their utilization rate; compared with the design scheme shown in fig. 3, n half-precision multiply-add devices are saved and the hardware cost is reduced.
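The device counts and result counts of the three layouts described above can be tabulated with a small Python sketch. The names `layout_summary` and `total_devices` and the dictionary keys are hypothetical and chosen only for this illustration; the numbers are taken directly from the descriptions of fig. 3, fig. 4 and fig. 5.

```python
def layout_summary(n: int) -> dict:
    """Device and result counts for the three layouts, as described in the text."""
    return {
        # fig. 3: n single-precision devices plus 2n half-precision devices
        "fig3": {"single_devices": n, "half_devices": 2 * n,
                 "single_results": n, "half_results": 2 * n},
        # fig. 4: 2n half-precision devices, paired for single precision
        "fig4": {"single_devices": 0, "half_devices": 2 * n,
                 "single_results": n, "half_results": 2 * n},
        # fig. 5: n single-precision devices plus n half-precision devices
        "fig5": {"single_devices": n, "half_devices": n,
                 "single_results": n, "half_results": 2 * n},
    }


def total_devices(layout: dict) -> int:
    """Total number of multiply-add devices in one layout."""
    return layout["single_devices"] + layout["half_devices"]
```

With n = 4, the layout of fig. 3 needs 12 devices, while the layouts of fig. 4 and fig. 5 each need 8 and still produce 4 single-precision or 8 half-precision results.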

It should be noted that: the high-performance multiply-add device shown in fig. 4 may be applied to any type of processor that needs to perform a floating-point multiply-add operation, and similarly, the asymmetric multiply-add device shown in fig. 5 may also be applied to any type of processor that needs to perform a floating-point multiply-add operation, where the embodiment of the present application does not limit the type of processor.

The high performance multiply-add device shown in fig. 4 and the asymmetric multiply-add device shown in fig. 5 are each described in detail below.

First, the high-performance multiplier-adder shown in fig. 4 will be described in detail.

In one embodiment, a high-performance multiply-add device comprises: N single-precision multiply-add units, each single-precision multiply-add unit comprising: two half-precision multiply-add devices; when the high-performance multiply-add device performs a single-precision floating-point multiply-add operation, the two half-precision multiply-add devices in each single-precision multiply-add unit are used for jointly performing the multiply-add operation on the single-precision floating-point numbers to be processed to obtain a corresponding single-precision multiply-add result, and N multiply-add results are obtained in total; when the high-performance multiply-add device performs a half-precision floating-point multiply-add operation, each half-precision multiply-add device is used for performing the multiply-add operation on the half-precision floating-point numbers to be processed to obtain a corresponding half-precision multiply-add result, and 2N multiply-add results are obtained in total.

Wherein the two half-precision multiply-add devices comprise: the first half-precision multiply adder and the second half-precision multiply adder, the single-precision floating point number to be processed comprises: a first single precision multiplier, a second single precision multiplier, and a single precision addend.

When the high-performance multiply-add device performs single-precision floating-point multiply-add operation, the first half-precision multiply-add device is particularly used for executing first partial multiply operation to obtain a first multiply operation result, and the first multiply operation result is transmitted to the second half-precision multiply-add device; the second half-precision multiply adder is specifically configured to perform a second partial multiply operation to obtain a second multiply operation result, where the first partial multiply operation and the second partial multiply operation are divided according to a preset rule based on the decimal of the first single-precision multiplier and the decimal of the second single-precision multiplier; the second half-precision multiply adder is further used for determining the decimal of the multiplication result according to the first multiplication result and the second multiplication result; and determining a single-precision multiplication and addition result according to the decimal of the multiplication result, the step code of the first single-precision multiplier, the step code of the second single-precision multiplier, the sign of the first single-precision multiplier, the sign of the second single-precision multiplier, the decimal of the single-precision addend, the step code of the single-precision addend and the sign of the single-precision addend.

Wherein the second half-precision multiply-add device comprises: a step code addition operation module, a decimal addition module and a determination module; the step code addition operation module is used for determining the step code of the multiplication result according to the step code of the first single-precision multiplier and the step code of the second single-precision multiplier; the decimal addition module is used for determining the sign of the multiplication result according to the sign of the first single-precision multiplier and the sign of the second single-precision multiplier; the determination module is used for determining the decimal of the multiply-add result, the sign of the multiply-add result and the step code of the multiply-add result according to the step code of the multiplication result, the step code of the single-precision addend, the sign of the multiplication result, the sign of the single-precision addend, the decimal of the multiplication result and the decimal of the single-precision addend; and determining the single-precision multiply-add result according to the decimal of the multiply-add result, the sign of the multiply-add result and the step code of the multiply-add result.

Wherein the determination module comprises: a first step code subtraction operation module, a first shift operation module, a first addition operation module and a first multiply-add result step code determination module; the first step code subtraction operation module is used for determining the absolute value of the step code difference according to the step code of the multiplication result and the step code of the single-precision addend; the first shift operation module is used for performing a shift operation on the decimal of the multiplication result or the decimal of the single-precision addend according to the step code of the multiplication result, the step code of the single-precision addend and the absolute value of the step code difference, to obtain the decimal of the multiplication result after the shift operation and the decimal of the single-precision addend after the shift operation; the first addition operation module is used for determining the decimal of the multiply-add result and the sign of the multiply-add result according to the sign of the multiplication result, the sign of the single-precision addend, the decimal of the multiplication result after the shift operation and the decimal of the single-precision addend after the shift operation; the first multiply-add result step code determination module is used for determining the step code of the multiply-add result according to the step code of the multiplication result and the step code of the single-precision addend.

Specifically, the step code addition operation module is specifically configured to:

the step code of the multiplication result is determined by adopting the following formula:

op01.exp = op0.exp + op1.exp - bias

wherein op01.exp is the step code of the multiplication result, op0.exp is the step code of the first single-precision multiplier, op1.exp is the step code of the second single-precision multiplier, and bias is the step code bias.

Specifically, the decimal addition module is specifically configured to:

and performing exclusive OR operation on the sign of the first single-precision multiplier and the sign of the second single-precision multiplier to obtain a sign of a multiplication result.
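The step code formula and the exclusive-OR rule for the sign can be sketched together in a few lines of Python. The function name `product_exp_and_sign` and the constant `BIAS_SINGLE` are hypothetical names for this illustration; the sketch assumes Normal single-precision operands, so the bias is 0x7F.

```python
BIAS_SINGLE = 0x7F  # step code bias for Normal single-precision numbers


def product_exp_and_sign(op0_exp: int, op1_exp: int,
                         op0_s: int, op1_s: int,
                         bias: int = BIAS_SINGLE) -> tuple:
    """Return (op01.exp, op01.s) for the product op0 * op1."""
    op01_exp = op0_exp + op1_exp - bias   # op01.exp = op0.exp + op1.exp - bias
    op01_s = op0_s ^ op1_s                # exclusive OR of the two sign bits
    return op01_exp, op01_s
```

For example, 2.0 has step code 0x80 and -4.0 has step code 0x81 with sign 1; the function returns step code 0x82 and sign 1, which are the step code and sign of -8.0.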

Specifically, the first shift operation module is specifically configured to:

comparing the step code of the multiplication result with the step code of the single-precision addend; if the step code of the multiplication result is larger than or equal to the step code of the single-precision addend, right-shifting the decimal of the single-precision addend by the bit number corresponding to the absolute value; if the step code of the multiplication result is smaller than the step code of the single-precision addend, the decimal of the multiplication result is shifted to the right by the bit number corresponding to the absolute value.
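The alignment rule above can be sketched as follows. The function name `align` is hypothetical, and the sketch assumes that both decimals are already held as unsigned integers at the same fixed-point scale; bits shifted out on the right are simply discarded here, since the text does not describe how they are treated.

```python
def align(op01_mant: int, op01_exp: int, op2_mant: int, op2_exp: int) -> tuple:
    """Right-shift the decimal of the operand with the smaller step code.

    Returns (shifted op01_mant, shifted op2_mant).
    """
    diff = abs(op01_exp - op2_exp)   # absolute value of the step code difference
    if op01_exp >= op2_exp:
        op2_mant >>= diff            # the addend has the smaller step code
    else:
        op01_mant >>= diff           # the product has the smaller step code
    return op01_mant, op2_mant
```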

Specifically, the first addition operation module is specifically configured to:

if the sign of the multiplication result is different from the sign of the single-precision addend, inverting the decimal of the single-precision addend after the shift operation and adding one (that is, taking its two's complement) to obtain the inverted-plus-one decimal of the single-precision addend; summing the decimal of the multiplication result after the shift operation and the inverted-plus-one decimal of the single-precision addend, and taking the summation result as the decimal of the multiply-add result; if the sign of the multiplication result is the same as the sign of the single-precision addend, summing the decimal of the multiplication result after the shift operation and the decimal of the single-precision addend after the shift operation, and taking the summation result as the decimal of the multiply-add result.
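A minimal Python sketch of this addition step is given below. The function name `add_mantissas` and the `width` parameter are hypothetical. The sketch assumes both aligned decimals are unsigned integers smaller than 2**(width-1). The handling of a negative difference (negating the sum and taking the sign of the addend) is an assumption added for completeness; the text above only describes the invert-and-add-one step.

```python
def add_mantissas(op01_s: int, op01_mant: int,
                  op2_s: int, op2_mant: int, width: int = 48) -> tuple:
    """Return (sign, magnitude) of the sum of two aligned sign-magnitude decimals."""
    mask = (1 << width) - 1
    if op01_s == op2_s:
        # Same sign: plain sum, the sign is unchanged.
        return op01_s, op01_mant + op2_mant
    # Different signs: invert the addend and add one (two's complement).
    op2_comp = (~op2_mant + 1) & mask
    total = (op01_mant + op2_comp) & mask
    if total >> (width - 1):
        # Assumption: a set top bit means the addend was larger in magnitude,
        # so negate the sum and use the addend's sign.
        return op2_s, (~total + 1) & mask
    return op01_s, total
```

With an 8-bit width, `add_mantissas(0, 12, 1, 5, width=8)` gives sign 0 and magnitude 7, and `add_mantissas(0, 5, 1, 12, width=8)` gives sign 1 and magnitude 7.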

Specifically, the first multiply-add result step code determination module is specifically configured to:

if the step code of the multiplication result is larger than or equal to the step code of the single-precision addend, taking the step code of the multiplication result as the step code of the multiply-add result; if the step code of the multiplication result is smaller than the step code of the single-precision addend, taking the step code of the single-precision addend as the step code of the multiply-add result.

In one embodiment, a half-precision floating point number to be processed includes: a first half precision multiplier, a second half precision multiplier, and a half precision addend.

When the high-performance multiply-add device performs a half-precision floating-point multiply-add operation, the half-precision multiply-add device is specifically used for determining the decimal of the multiplication result according to the decimal of the first half-precision multiplier and the decimal of the second half-precision multiplier; determining the step code of the multiplication result according to the step code of the first half-precision multiplier and the step code of the second half-precision multiplier; determining the sign of the multiplication result according to the sign of the first half-precision multiplier and the sign of the second half-precision multiplier; determining the decimal of the multiply-add result, the sign of the multiply-add result and the step code of the multiply-add result according to the decimal of the multiplication result, the step code of the half-precision addend, the sign of the half-precision addend and the decimal of the half-precision addend; and determining the half-precision multiply-add result according to the decimal of the multiply-add result, the sign of the multiply-add result and the step code of the multiply-add result.

The half-precision multiply-add device comprises a second step code subtraction operation module, a second shift operation module, a second addition operation module and a second multiply-add result step code determination module: the second step code subtraction operation module is used for determining the absolute value of the step code difference according to the step code of the multiplication result and the step code of the half-precision addend; the second shift operation module is used for performing a shift operation on the decimal of the multiplication result or the decimal of the half-precision addend according to the step code of the multiplication result, the step code of the half-precision addend and the absolute value of the step code difference, to obtain the decimal of the multiplication result after the shift operation and the decimal of the half-precision addend after the shift operation; the second addition operation module is used for determining the decimal of the multiply-add result and the sign of the multiply-add result according to the sign of the multiplication result, the sign of the half-precision addend, the decimal of the multiplication result after the shift operation and the decimal of the half-precision addend after the shift operation; the second multiply-add result step code determination module is used for determining the step code of the multiply-add result according to the step code of the multiplication result and the step code of the half-precision addend.

In one embodiment, as shown in fig. 6, a multiply-add method is provided, which is applied to the high-performance multiply-add device shown in fig. 4, and includes the steps of:

S602, when the high-performance multiply-add device performs a single-precision floating-point multiply-add operation, the two half-precision multiply-add devices in each single-precision multiply-add unit jointly perform the multiply-add operation on the single-precision floating-point numbers to be processed to obtain a corresponding single-precision multiply-add result, and N multiply-add results are obtained in total.

For convenience of explanation, the two half-precision multiply-add devices included in each single-precision multiply-add unit are referred to as a first half-precision multiply-add device and a second half-precision multiply-add device. A multiply-add operation, as the name implies, includes both a multiplication and an addition, so the single-precision floating-point numbers to be processed include two single-precision multipliers and one single-precision addend; for convenience of explanation, the two single-precision multipliers are referred to as a first single-precision multiplier and a second single-precision multiplier, respectively.

The multiply-add operation expression is:

dst = op0*op1 + op2

where op0 is a first single precision multiplier, op1 is a second single precision multiplier, op2 is a single precision addend, and dst is a multiply-add result.

The variables used in the multiply-add process are described below:

op0.mant: the fraction (decimal) of op0; op0.mant=1.mantissa when op0 is a normalized single-precision floating-point number, and op0.mant=0.mantissa when op0 is a denormalized single-precision floating-point number;

op1.mant: the fraction (decimal) of op1; op1.mant=1.mantissa when op1 is a normalized single-precision floating-point number, and op1.mant=0.mantissa when op1 is a denormalized single-precision floating-point number;

op2.mant: the fraction (decimal) of op2; op2.mant=1.mantissa when op2 is a normalized single-precision floating-point number, and op2.mant=0.mantissa when op2 is a denormalized single-precision floating-point number;

op01.mant: the fraction (decimal) of the result of multiplying op0 by op1;

dst.mant: the fraction (decimal) of dst;

op0.exp: the exponent (step code) of op0;

op1.exp: the exponent (step code) of op1;

op2.exp: the exponent (step code) of op2;

op01.exp: the exponent (step code) of the result of multiplying op0 by op1, op01.exp=op0.exp+op1.exp-bias;

dst.exp: the exponent (step code) of dst;

op0.s: the sign of op0;

op1.s: the sign of op1;

op2.s: the sign of op2;

op01.s: the sign of the result of multiplying op0 by op1, obtained by performing an exclusive-OR operation on op0.s and op1.s;

dst.s: the sign of dst.
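The variable definitions above can be illustrated with a minimal Python sketch that splits a single-precision value into its sign, exponent and fraction fields. The function name `unpack_single` is hypothetical and is not part of the described device; the sketch only assumes the standard 1/8/23-bit single-precision layout.

```python
import struct

def unpack_single(x):
    """Split a value (rounded to single precision) into (s, exp, mant)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    s = bits >> 31                   # 1-bit sign
    exp = (bits >> 23) & 0xFF        # 8-bit biased exponent (step code)
    frac = bits & 0x7FFFFF           # 23 stored mantissa bits
    # Normalized numbers use 1.mantissa (implicit leading one);
    # denormalized numbers (exp == 0) use 0.mantissa.
    mant = frac if exp == 0 else (1 << 23) | frac
    return s, exp, mant

print(unpack_single(-1.5))
```

In this sketch mant is a 24-bit integer, which matches the [23:0] bit ranges used for op0.mant and op1.mant in the description.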

Optionally, when performing the multiply-add operation on the single-precision floating-point numbers to be processed, the multiplication of op0.mant by op1.mant may be divided into two partial multiplications according to a preset rule; for convenience of explanation, these are referred to as the first partial multiplication and the second partial multiplication.

For example, since

op0.mant[23:0]*op1.mant[23:0]＝op0.mant[23:0]*op1.mant[11:0]+op0.mant[23:0]*op1.mant[23:12]

op0.mant[23:0]*op1.mant[23:0] may be divided into two partial multiplications, the first partial multiplication being op0.mant[23:0]*op1.mant[11:0] and the second partial multiplication being op0.mant[23:0]*op1.mant[23:12]. Alternatively, the first partial multiplication is op0.mant[23:0]*op1.mant[23:12], and the second partial multiplication is op0.mant[23:0]*op1.mant[11:0].

It should be noted that the above division of the multiplication is only an example, and the embodiments of the present application are not limited thereto; other division modes are possible, for example dividing op0.mant[23:0]*op1.mant[23:0] into op0.mant[23:0]*op1.mant[12:0] and op0.mant[23:0]*op1.mant[23:13].
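As an illustration of the division described above, the following Python sketch forms the two partial products and recombines them. It assumes that the partial product formed with the upper bits of op1.mant is weighted by 2^split (a left shift) when the two results are added, which the identity above leaves implicit. The function name `split_multiply` and the parameter `split` are hypothetical.

```python
def split_multiply(op0_mant, op1_mant, split=12):
    """Form a 24x24-bit product from two narrower partial multiplications."""
    low = op1_mant & ((1 << split) - 1)   # op1.mant[split-1:0]
    high = op1_mant >> split              # op1.mant[23:split]
    first = op0_mant * low                # one partial multiplication
    second = op0_mant * high              # the other partial multiplication
    # Assumption: the high partial product carries a weight of 2**split.
    return first + (second << split)

a, b = 0xC00000, 0xABCDEF
print(split_multiply(a, b) == a * b)
```

Calling `split_multiply(a, b, split=13)` corresponds to the alternative [12:0] and [23:13] division mentioned above.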

Optionally, the first half-precision multiply-adder may perform the first partial multiplication to obtain a first multiplication result and transmit it to the second half-precision multiply-adder. The second half-precision multiply-adder performs the second partial multiplication to obtain a second multiplication result and determines op01.mant from the two results; specifically, the second half-precision multiply-adder may add the first multiplication result and the second multiplication result to obtain op01.mant. Because the multiplication is divided into two parts that are executed by the two half-precision multiply-adders and then added, a single-precision floating-point multiply-add operation can be carried out by two half-precision multiply-adders. This improves the utilization rate of the half-precision multiply-adders and reduces hardware cost, since no additional single-precision multiply-adder is required.

Optionally, after the two half-precision multiply-adders have jointly obtained op01.mant, the second half-precision multiply-adder may determine the corresponding single-precision multiply-add result according to op01.mant, op0.exp, op1.exp, op0.s, op1.s, op2.mant, op2.exp and op2.s. Specifically, the second half-precision multiply-adder may determine op01.exp according to op0.exp and op1.exp; determine op01.s according to op0.s and op1.s; and determine dst.mant, dst.s and dst.exp according to op01.exp, op2.exp, op01.s, op2.s, op01.mant and op2.mant. Then dst.mant, dst.s and dst.exp are normalized to the data format of a single-precision floating-point number, giving the single-precision multiply-add result.

Optionally, the two operands of the addition are op01 and op2. To make the exponents (step codes) of the two numbers identical, one of the fractions must be shifted. The second half-precision multiply-adder may determine the absolute value of the exponent difference according to op01.exp and op2.exp; shift op01.mant or op2.mant according to op01.exp, op2.exp and the absolute value of the exponent difference, to obtain op01.mant and op2.mant after the shift operation; determine dst.mant and dst.s according to op01.s, op2.s, and op01.mant and op2.mant after the shift operation; and determine dst.exp according to op01.exp and op2.exp.

The structures of the first half-precision multiply-add and the second half-precision multiply-add are described below:

In one possible implementation, the first half-precision multiply-adder and the second half-precision multiply-adder may be designed with the structure illustrated in fig. 7. The first half-precision multiply-adder includes: a decimal multiplication operation module 10, a step code addition operation module 11, a step code subtraction operation module 12, a shift operation module 13, an addition operation module 14, a normalization module 15 and a multiply-add result step code determination module 16. Likewise, the second half-precision multiply-adder includes: a decimal multiplication operation module 20, a step code addition operation module 21, a step code subtraction operation module 22, a shift operation module 23, an addition operation module 24, a normalization module 25 and a multiply-add result step code determination module 26. It should be noted that, compared with the first half-precision multiply-adder, the second half-precision multiply-adder further includes a decimal addition module 27, which is connected to the decimal multiplication operation module 10 in the first half-precision multiply-adder. After the decimal multiplication operation module 10 obtains its multiplication result, it transmits that result to the decimal addition module 27, and the decimal addition module 27 adds the multiplication result obtained by the first half-precision multiply-adder and the multiplication result obtained by the second half-precision multiply-adder to obtain op01.mant.

It should be noted that, as indicated by the dashed box in fig. 7, when the high-performance multiply-add device performs a single-precision floating-point multiply-add operation, only the decimal multiplication operation module 10 of the first half-precision multiply-adder participates in the operation; all other operations are performed in the second half-precision multiply-adder.

The following describes in detail the process of single-precision floating point number multiply-add operation with reference to the structure shown in fig. 7, and referring to fig. 8, the process specifically includes:

S801, the fractional multiplication module 10 performs the first partial multiplication to obtain a first multiplication result, and transmits the first multiplication result to the fractional addition module 27. The fractional multiplication module 20 performs the second partial multiplication to obtain a second multiplication result, and transmits the second multiplication result to the fractional addition module 27. The fractional addition module 27 adds the first multiplication result and the second multiplication result to obtain op01.mant.

The manner of dividing the multiplication is described above and is not repeated here.

S802, the step code addition operation module 21 calculates op01.Exp by adopting the following formula:

op01.exp＝op0.exp+op1.exp-bias

where bias is the exponent bias of a normalized single-precision floating-point number, i.e., bias=0x7f.

S803, the decimal addition module 27 performs exclusive OR operation on the op0.S and the op1.S to obtain the op01.S.

Alternatively, op0.S and op1.S may be input to the fractional addition module 27 such that the fractional addition module 27 performs an exclusive or operation.
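Steps S802 and S803 can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes biased 8-bit exponents and 1-bit signs as plain integers; it does not model overflow or underflow of the exponent.

```python
BIAS_SINGLE = 0x7F  # bias of a normalized single-precision number

def product_exponent(op0_exp, op1_exp, bias=BIAS_SINGLE):
    """S802: op01.exp = op0.exp + op1.exp - bias (before normalization)."""
    return op0_exp + op1_exp - bias

def product_sign(op0_s, op1_s):
    """S803: op01.s is the exclusive OR of op0.s and op1.s."""
    return op0_s ^ op1_s

# 2.0 has biased exponent 128 and 4.0 has 129; their product 8.0 has 130.
print(product_exponent(128, 129), product_sign(1, 0))
```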

S804, the step-code subtraction operation module 22 calculates the absolute value of the difference between op01.exp and op2.exp.

Alternatively, after the step-code subtracting module 22 calculates the absolute value of the step-code difference, the op01.Exp, op2.Exp and the absolute value may be transmitted to the shift module 23.

S805, the shift operation module 23 compares op01.Exp and op2.Exp, and performs a shift operation according to the comparison result.

Specifically, if op01.Exp is greater than or equal to op2.Exp, then op2.Mant is shifted to the right by the number of bits corresponding to the absolute value; if the op01.Exp is smaller than the op2.Exp, the op01.Mant is shifted right by the bit number corresponding to the absolute value, and the op01.Mant and the op2.Mant after the shift operation are obtained.

That is, if op01.exp >= op2.exp, op2.mant is shifted right by |op01.exp-op2.exp| bits.

If op01.exp < op2.exp, op01.mant is shifted right by |op01.exp-op2.exp| bits.

Alternatively, two selectors a and b may be provided between the decimal addition module 27 and the shift operation module 23. If the shift operation module 23 determines that op01.exp >= op2.exp, op2.mant is selected by the selector b for the shift operation, and the selector a passes op01.mant to the addition operation module 24; if the shift operation module 23 determines that op01.exp < op2.exp, op01.mant is selected by the selector b for the shift operation, and the selector a passes op2.mant to the addition operation module 24.
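The alignment performed in S804 and S805 can be sketched as follows. This is a simplified model under stated assumptions: both fractions are integers already expressed on the same fixed-point scale, and bits shifted out on the right are simply discarded (the text does not say whether guard or sticky bits are kept). The function name `align` is hypothetical.

```python
def align(op01_exp, op01_mant, op2_exp, op2_mant):
    """Right-shift the fraction of the operand with the smaller exponent."""
    diff = abs(op01_exp - op2_exp)   # S804: absolute exponent difference
    if op01_exp >= op2_exp:
        op2_mant >>= diff            # the addend has the smaller exponent
    else:
        op01_mant >>= diff           # the product has the smaller exponent
    return op01_mant, op2_mant

print(align(130, 0b1000, 128, 0b1000))
```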

S806, the addition operation module 24 compares op01.S with op2.S, and performs a summation operation according to the comparison result.

If op01.s and op2.s are different, op2.mant after the shift operation of S805 is inverted and incremented by one (i.e., its two's complement is taken); the resulting op2.mant is summed with op01.mant after the shift operation of S805, the summation result is taken as dst.mant, and dst.s is set equal to op01.s;

if op01.S and op2.S are the same, then op2.Mant and op01.Mant after the shift operation of S805 are directly summed, the result of summation is taken as dst.mant, and dst.S is set equal to op01.S.

Alternatively, the fractional addition module 27 may transmit the op01.S to the addition module 24 through the shift operation module 23, and may input the op2.S to the addition module 24, so that the addition module 24 may determine whether the op01.S and the op2.S are the same. Alternatively, op0.S, op1.S and op2.S are all directly input to the addition module 24, so that the addition module 24 performs exclusive OR operation to obtain op01.S, and the above judgment is performed according to op01.S and op2.S.
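The summation in S806 can be sketched in Python as below. The adder width of 48 bits and the function name `add_mantissas` are assumptions made for illustration. The sketch only covers the case in which the shifted op01.mant is not smaller than the shifted op2.mant when the signs differ; the text does not describe how the opposite case is handled.

```python
def add_mantissas(op01_s, op01_mant, op2_s, op2_mant, width=48):
    """S806: add or effectively subtract the aligned fractions."""
    mask = (1 << width) - 1
    if op01_s != op2_s:
        # Invert and add one: two's complement within the adder width.
        op2_mant = (~op2_mant + 1) & mask
    dst_mant = (op01_mant + op2_mant) & mask   # carry out of the top is dropped
    dst_s = op01_s                             # dst.s is set equal to op01.s
    return dst_s, dst_mant

print(add_mantissas(0, 10, 1, 3))
```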

S807, the multiply-add result step code determination module 26 determines dst.exp based on op01.exp and op2.exp.

Specifically, if op01.exp >= op2.exp, dst.exp=op01.exp;

If op01.exp < op2.exp, dst.exp=op2.exp.

Alternatively, once dst.exp is obtained, it may be normalized to obtain the 8-bit binary exponent (step code) of the normalized single-precision floating-point number.

S808, the normalization module 25 normalizes the dst.mant obtained in the S806 to obtain a 23-bit binary mantissa of the normalized single-precision floating point number.

Alternatively, the addition module 24 may pass dst.s to the normalization module 25, in which case the normalization module 25 outputs sign bits in addition to the 23-bit binary mantissa. The sign bit, the 8-bit binary order code obtained in S807, and the 23-bit binary mantissa obtained in S808 constitute a single-precision multiply-add result.
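The normalization in S807 and S808 and the final assembly of the result can be sketched as follows. The sketch assumes that dst.mant is an integer whose binary point lies after bit 23 (so a normalized value has bit 23 set and nothing above it), that extra low bits are truncated rather than rounded, and that the exponent stays within the range of normalized numbers. The function names are hypothetical.

```python
import struct

def normalize_and_pack(dst_s, dst_exp, dst_mant):
    """Normalize dst.mant/dst.exp and assemble the 32-bit result."""
    if dst_mant == 0:
        return dst_s << 31                 # signed zero
    while dst_mant >= (1 << 24):           # too large: shift right, raise exponent
        dst_mant >>= 1
        dst_exp += 1
    while dst_mant < (1 << 23):            # too small: shift left, lower exponent
        dst_mant <<= 1
        dst_exp -= 1
    # Sign bit, 8-bit exponent and 23-bit mantissa (leading one is implicit).
    return (dst_s << 31) | (dst_exp << 23) | (dst_mant & 0x7FFFFF)

def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(bits_to_float(normalize_and_pack(0, 127, 0x1800000)))
```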

According to the multiplication and addition method, the multiplication operation is divided into two parts, and the two half-precision multiplication and addition devices respectively execute one part, so that the two half-precision multiplication and addition devices can be combined to finish the multiplication and addition operation of the single-precision floating point number, the utilization rate of the half-precision multiplication and addition devices is improved, the single-precision multiplication and addition devices are not required to be additionally arranged, and the hardware cost is reduced.

S604, when the high-performance multiply-add device performs the multiply-add operation of the half-precision floating-point number, each half-precision multiply-add device performs the multiply-add operation of the half-precision floating-point number to be processed to obtain corresponding half-precision multiply-add results, and 2N multiply-add results are obtained in total.

The half-precision floating point number to be processed comprises two half-precision multipliers and a half-precision addend, and for convenience of explanation, the two half-precision multipliers are respectively called a first half-precision multiplier and a second half-precision multiplier.

The multiply-add operation expression is:

dst＝op0*op1+op2

where op0 is the first half precision multiplier, op1 is the second half precision multiplier, op2 is the half precision addend, and dst is the multiply-add result.

It should be noted that, unlike the single-precision floating-point multiply-add process, no division of the multiplication is required, because a single half-precision multiply-adder directly supports the half-precision floating-point multiply-add operation.

Optionally, the half-precision multiply-adder may determine op01.mant according to op0.mant and op1.mant; determine op01.exp according to op0.exp and op1.exp; determine op01.s according to op0.s and op1.s; and determine dst.mant, dst.exp and dst.s according to op01.mant, op01.exp, op01.s, op2.mant, op2.exp and op2.s. Then dst.mant, dst.exp and dst.s are normalized to the data format of a half-precision floating-point number, giving the half-precision multiply-add result.

Similarly, the two operands of the addition are op01 and op2. To make the exponents (step codes) of the two numbers identical, one of the fractions must be shifted. The half-precision multiply-adder may determine the absolute value of the exponent difference according to op01.exp and op2.exp; shift op01.mant or op2.mant according to op01.exp, op2.exp and the absolute value of the exponent difference, to obtain op01.mant and op2.mant after the shift operation; determine dst.mant and dst.s according to op01.s, op2.s, and op01.mant and op2.mant after the shift operation; and determine dst.exp according to op01.exp and op2.exp.

The following describes in detail the process of the half-precision floating-point number multiply-add operation with reference to the structure shown in fig. 9, taking the second half-precision multiply-add device as an example, referring to fig. 10, the process specifically includes:

S1001, the decimal multiplication operation module 20 calculates the product of op0.mant and op1.mant.

Specifically, op0.mant and op1.mant are 11-bit binary data, and the calculated op01.mant is 22-bit binary data. Because the fractional multiplication module 10 is connected to the fractional addition module 27, and each half-precision multiply-adder performs its own half-precision multiply-add operation (that is, the first half-precision multiply-adder is also performing a half-precision multiply-add operation), the fractional addition module 27 may set the data transmitted by the fractional multiplication module 10 to zero to avoid being affected by it, so that the output of the fractional addition module 27 is the result calculated by the fractional multiplication module 20.
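The two roles of the fractional addition module 27 can be contrasted with a small Python model. The function name, the `single_mode` flag and the assignment of the low partial product to module 10 are assumptions made for illustration; the text allows either assignment of the partial multiplications.

```python
def fractional_add_27(product_20, product_10, single_mode, split=12):
    """Behavioural model of fractional addition module 27 (names assumed)."""
    if not single_mode:
        # Half-precision mode: the data arriving from module 10 is set to
        # zero, so the output is just the product computed by module 20.
        return product_20 + 0
    # Single-precision mode: module 10 is assumed to supply the low partial
    # product and module 20 the high one, weighted by 2**split.
    return product_10 + (product_20 << split)

m0, m1 = 0x600, 0x555          # two 11-bit half-precision fractions
print(fractional_add_27(m0 * m1, 123456, single_mode=False) == m0 * m1)
```

Since the largest 11-bit fraction is 0x7FF, the product in half-precision mode never exceeds 22 bits, consistent with the widths stated above.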

S1002, the step addition module 21 calculates op01.Exp.

S1003, the decimal addition module 27 performs exclusive OR operation on the op0.S and the op1.S to obtain the op01.S.

S1004, the step-code subtracting block 22 calculates the absolute values of the step-code difference values of op01.Exp and op2. Exp.

S1005, the shift operation module 23 compares op01.Exp and op2.Exp, and performs a shift operation based on the comparison result.

S1006, the addition operation module 24 compares op01.s with op2.s, and performs a summation operation according to the comparison result to obtain dst.mant.

S1007, the multiplication and addition result order code determining module 26 determines dst.exp according to op01.Exp and op2. Exp.

Once dst.exp is obtained, it may be normalized to obtain the 5-bit binary exponent (step code) of the normalized half-precision floating-point number.

S1008, the normalization module 25 normalizes dst.mant obtained in S1006 to obtain the 10-bit binary mantissa of the normalized half-precision floating-point number.

The implementation process of S1001-S1008 is similar to that of S801-S808, and the embodiments of the present application are not repeated here. The sign bit, the 5-bit binary order code obtained in S1007, and the 10-bit binary mantissa obtained in S1008 constitute a half-precision multiply-add result.
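The composition of the half-precision result from the sign bit, the 5-bit exponent and the 10-bit mantissa can be sketched as follows. The function name `pack_half` is hypothetical; the sketch relies on the half-precision format code "e" of the Python struct module to interpret the assembled 16-bit pattern.

```python
import struct

def pack_half(dst_s, exp5, frac10):
    """Assemble sign, 5-bit exponent and 10-bit mantissa into a half value."""
    bits = (dst_s << 15) | ((exp5 & 0x1F) << 10) | (frac10 & 0x3FF)
    # Reinterpret the 16-bit pattern as a half-precision number.
    return struct.unpack("<e", struct.pack("<H", bits))[0]

print(pack_half(1, 16, 0x200))
```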

According to the multiply-add method, when the high-performance multiply-add device performs single-precision multiply-add operation on the single-precision floating point number, the two half-precision multiply-add devices in each single-precision multiply-add unit are combined to perform multiply-add operation on the single-precision floating point number to be processed, so that corresponding single-precision multiply-add results are obtained, and N multiply-add results are obtained. When the high-performance multiply-add device performs the multiply-add operation of the half-precision floating-point number, each half-precision multiply-add device performs the multiply-add operation of the half-precision floating-point number to be processed to obtain corresponding half-precision multiply-add results, and 2N multiply-add results are obtained in total. According to the scheme of the embodiment of the application, when single-precision floating point number multiply-add operation is carried out, the half-precision multiply-add device also participates in operation, is not idle, improves the utilization rate of the half-precision multiply-add device, and compared with the design scheme shown in fig. 3, n single-precision multiply-add devices are saved, and hardware cost is reduced.

The asymmetric multiply-add device shown in fig. 5 will be described in detail.

In one embodiment, the asymmetric multiply-add device comprises N multiply-add units, each multiply-add unit comprising a single-precision multiply-adder and a half-precision multiply-adder. When the asymmetric multiply-add device performs a half-precision floating-point multiply-add operation, the single-precision multiply-adder and the half-precision multiply-adder each perform a multiply-add operation on the half-precision floating-point numbers to be processed to obtain corresponding half-precision multiply-add results, and 2N half-precision multiply-add results are obtained in total; when the asymmetric multiply-add device performs a single-precision floating-point multiply-add operation, the single-precision multiply-adder performs a multiply-add operation on the single-precision floating-point numbers to be processed to obtain corresponding single-precision multiply-add results, and N single-precision multiply-add results are obtained in total.

The half-precision floating-point numbers to be processed include a first half-precision multiplier, a second half-precision multiplier and a half-precision addend. The single-precision multiply-adder includes a first conversion unit, a determination module and a second conversion unit. The first conversion unit is used for converting the first half-precision multiplier, the second half-precision multiplier and the half-precision addend into single precision to obtain the first single-precision multiplier, the second single-precision multiplier and the single-precision addend; the determination module is used for determining a single-precision multiply-add result according to the decimal, step code and sign of the first single-precision multiplier, the decimal, step code and sign of the second single-precision multiplier, and the decimal, step code and sign of the single-precision addend; the second conversion unit is used for converting the single-precision multiply-add result into a half-precision multiply-add result.

The determination module includes: a decimal multiplication operation module, a step code addition operation module, an addition operation module and a determination unit. The decimal multiplication operation module is used for determining the decimal of the multiplication result according to the decimal of the first single-precision multiplier and the decimal of the second single-precision multiplier; the step code addition operation module is used for determining the step code of the multiplication result according to the step code of the first single-precision multiplier and the step code of the second single-precision multiplier; the addition operation module is used for determining the sign of the multiplication result according to the sign of the first single-precision multiplier and the sign of the second single-precision multiplier; the determination unit is used for determining the decimal, sign and step code of the multiply-add result according to the decimal, step code and sign of the multiplication result and the decimal, step code and sign of the single-precision addend, and for determining the single-precision multiply-add result according to the decimal, sign and step code of the multiply-add result.

The determination unit includes: a first step code subtraction operation module, a first shift operation module, a first addition operation module and a first multiply-add result step code determination module. The first step code subtraction operation module is used for determining a step code difference value according to the step code of the multiplication result and the step code of the single-precision addend; the first shift operation module is used for shifting the decimal of the multiplication result or the decimal of the single-precision addend according to the step code of the multiplication result, the step code of the single-precision addend and the step code difference value, to obtain the decimal of the multiplication result after the shift operation and the decimal of the single-precision addend after the shift operation; the first addition operation module is used for determining the decimal and sign of the multiply-add result according to the sign of the multiplication result, the sign of the single-precision addend, the decimal of the multiplication result after the shift operation and the decimal of the single-precision addend after the shift operation; the first multiply-add result step code determination module is used for determining the step code of the multiply-add result according to the step code of the multiplication result and the step code of the single-precision addend.

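Specifically, the step code addition operation module is configured to calculate the step code of the multiplication result as:

op01.exp＝op0.exp+op1.exp-bias

Specifically, the addition operation module is configured to perform an exclusive-OR operation on the sign of the first single-precision multiplier and the sign of the second single-precision multiplier to obtain the sign of the multiplication result.

Specifically, the first shift operation module is configured to: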

comparing the step code of the multiplication result with the step code of the single-precision addend; if the step code of the multiplication result is larger than or equal to the step code of the single-precision addend, shifting the decimal of the single-precision addend to the right by the bit number corresponding to the step code difference value; if the step code of the multiplication result is smaller than the step code of the single-precision addend, the decimal of the multiplication result is shifted to the right by the bit number corresponding to the step code difference value.

Specifically, the first addition operation module is configured to: if the sign of the multiplication result is different from the sign of the single-precision addend, invert and add one to the decimal of the single-precision addend after the shift operation, sum the decimal of the multiplication result after the shift operation with the inverted-and-incremented decimal of the single-precision addend, and take the summation result as the decimal of the multiply-add result; if the sign of the multiplication result is the same as the sign of the single-precision addend, sum the decimal of the multiplication result after the shift operation and the decimal of the single-precision addend after the shift operation, and take the summation result as the decimal of the multiply-add result.

In one embodiment, the single-precision floating-point numbers to be processed include a first single-precision multiplier, a second single-precision multiplier and a single-precision addend. The single-precision multiply-adder is specifically used for: determining the decimal of the multiplication result according to the decimal of the first single-precision multiplier and the decimal of the second single-precision multiplier; determining the step code of the multiplication result according to the step code of the first single-precision multiplier and the step code of the second single-precision multiplier; determining the sign of the multiplication result according to the sign of the first single-precision multiplier and the sign of the second single-precision multiplier; determining the decimal, sign and step code of the multiply-add result according to the decimal, step code and sign of the multiplication result and the decimal, step code and sign of the single-precision addend; and determining the single-precision multiply-add result according to the decimal, sign and step code of the multiply-add result.

The single-precision multiply-adder includes a second step code subtraction operation module, a second shift operation module, a second addition operation module and a second multiply-add result step code determination module. The second step code subtraction operation module is used for determining a step code difference value according to the step code of the multiplication result and the step code of the single-precision addend; the second shift operation module is used for shifting the decimal of the multiplication result or the decimal of the single-precision addend according to the step code of the multiplication result, the step code of the single-precision addend and the step code difference value, to obtain the decimal of the multiplication result after the shift operation and the decimal of the single-precision addend after the shift operation; the second addition operation module is used for determining the decimal and sign of the multiply-add result according to the sign of the multiplication result, the sign of the single-precision addend, the decimal of the multiplication result after the shift operation and the decimal of the single-precision addend after the shift operation; the second multiply-add result step code determination module is used for determining the step code of the multiply-add result according to the step code of the multiplication result and the step code of the single-precision addend.

In one embodiment, as shown in fig. 11, a multiply-add method is provided, which is applied to the asymmetric multiply-add device shown in fig. 5, and includes the steps of:

S1102, when the asymmetric multiply-add device performs the half-precision floating-point multiply-add operation, the single-precision multiply-add device and the half-precision multiply-add device perform multiply-add operation on the half-precision floating-point to be processed respectively to obtain corresponding half-precision multiply-add results, and 2N half-precision multiply-add results are obtained in total.

It should be noted that the process by which the half-precision multiply-adder executes a half-precision floating-point multiply-add operation can refer to the prior art and is not described again here. The following focuses on the process by which the single-precision multiply-adder performs a half-precision floating-point multiply-add operation.

The half-precision floating-point numbers to be processed include two half-precision multipliers and one half-precision addend; for convenience of explanation, the two half-precision multipliers are referred to as the first half-precision multiplier and the second half-precision multiplier, respectively.

Optionally, the single-precision multiply-add device may convert the first half-precision multiplier, the second half-precision multiplier, and the half-precision addend into single-precision, obtain the first single-precision multiplier, the second single-precision multiplier, and the single-precision addend, and perform multiply-add operation on the first single-precision multiplier, the second single-precision multiplier, and the single-precision addend.

The multiply-add operation expression is:

dst＝op0*op1+op2

where op0 is the converted first single precision multiplier, op1 is the converted second single precision multiplier, op2 is the converted single precision addend, and dst is the single precision multiply-add result.

Optionally, after the conversion, the single-precision multiply-adder may determine a single-precision multiply-add result according to op0.mant, op1.mant, op2.mant, op0.exp, op1.exp, op2.exp, op0.s, op1.s and op2.s, and then convert the single-precision multiply-add result to obtain a half-precision multiply-add result. In this way the single-precision multiply-adder can also perform half-precision floating-point multiply-add operations, which improves its utilization rate; the scheme of the embodiment of the application therefore needs only n half-precision multiply-adders rather than 2n, which reduces hardware cost.

Optionally, the process of converting the half-precision floating point number into the single-precision floating point number may refer to the prior art, which is not described in detail in the embodiments of the present application.
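For reference, one common way of widening a half-precision bit pattern to single precision is sketched below. This is a generic illustration under the standard 1/5/10-bit and 1/8/23-bit layouts, not a description of the first conversion unit itself; the function name is hypothetical.

```python
import struct

def half_bits_to_single_bits(h):
    """Widen a 16-bit half-precision pattern to a 32-bit single pattern."""
    s = (h >> 15) & 1
    exp = (h >> 10) & 0x1F
    frac = h & 0x3FF
    if exp == 0x1F:                        # infinity or NaN
        exp32, frac32 = 0xFF, frac << 13
    elif exp != 0:                         # normalized: rebias 15 -> 127
        exp32, frac32 = exp - 15 + 127, frac << 13
    elif frac == 0:                        # signed zero
        exp32, frac32 = 0, 0
    else:                                  # denormalized half: renormalize
        shift = 11 - frac.bit_length()     # move the leading one to bit 10
        frac32 = ((frac << shift) & 0x3FF) << 13
        exp32 = 1 - 15 + 127 - shift
    return (s << 31) | (exp32 << 23) | frac32

print(hex(half_bits_to_single_bits(0x3C00)))
```

Every half-precision value is exactly representable in single precision, so this widening step loses no information.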

Optionally, the single-precision multiply-add device may determine op01.mant according to op0.mant and op1.mant; determine op01.exp according to op0.exp and op1.exp; determine op01.s according to op0.s and op1.s; determine dst.mant, dst.s, and dst.exp according to op01.mant, op01.exp, op01.s, op2.exp, op2.s, and op2.mant; and, based on dst.mant, dst.s, and dst.exp, normalize them into the data format of a single-precision floating-point number, thereby obtaining the single-precision multiply-add result.
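The multiplication part of this flow can be illustrated with a short sketch. It assumes IEEE 754 single precision, normal operands only, and a 24-bit mantissa formed by restoring the implicit leading 1. The step code formula op01.exp = op0.exp + op1.exp - bias and the exclusive-OR of the signs are taken from the claims, while the helper names are hypothetical.

```python
import struct

BIAS = 127  # single-precision step code (exponent) bias


def single_fields(x: float):
    """Split a normal single-precision value into (s, exp, mant)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    s = bits >> 31
    exp = (bits >> 23) & 0xFF
    # Assumption: normal number, so the implicit leading 1 is restored,
    # giving a 24-bit mantissa with 23 fractional bits.
    mant = (bits & 0x7FFFFF) | (1 << 23)
    return s, exp, mant


def multiply_stage(op0: float, op1: float):
    """Return (op01_mant, op01_exp, op01_s) for the product op0 * op1."""
    s0, e0, m0 = single_fields(op0)
    s1, e1, m1 = single_fields(op1)
    op01_mant = m0 * m1          # 24 bits x 24 bits gives at most 48 bits
    op01_exp = e0 + e1 - BIAS    # op01.exp = op0.exp + op1.exp - bias
    op01_s = s0 ^ s1             # exclusive-OR of the two signs
    return op01_mant, op01_exp, op01_s
```

For example, multiply_stage(2.0, 4.0) yields a step code of 130, which is the biased exponent of 8.0.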

The following describes the structure of the single-precision multiply-add device and the half-precision multiply-add device:

In one possible implementation, the single-precision multiply-add device and the half-precision multiply-add device may be designed with the structure illustrated in fig. 12. As shown in fig. 12, the half-precision multiply-add device includes: a decimal multiplication operation module 30, a step code addition operation module 31, a step code subtraction operation module 32, a shift operation module 33, an addition operation module 34, a normalization module 35, and a multiply-add result step code determination module 36. The single-precision multiply-add device includes: a decimal multiplication operation module 40, a step code addition operation module 41, a step code subtraction operation module 42, a shift operation module 43, an addition operation module 44, a normalization module 45, and a multiply-add result step code determination module 46. It should be noted that, compared with the half-precision multiply-add device, the single-precision multiply-add device further includes a first conversion unit 47 and a second conversion unit 48. When the single-precision multiply-add device performs a multiply-add operation on half-precision floating-point numbers, the first conversion unit 47 converts the first half-precision multiplier, the second half-precision multiplier, and the half-precision addend into single precision to obtain op0, op1, and op2; dst = op0 * op1 + op2 is then calculated to obtain a single-precision multiply-add result, and the second conversion unit 48 converts the single-precision multiply-add result to obtain the half-precision multiply-add result.

The following describes in detail, with reference to the structure shown in fig. 12, the process of performing the half-precision floating-point multiply-add operation by the single-precision multiply-add device, and referring to fig. 13, the process specifically includes:

S1301, the first conversion unit 47 converts the first half-precision multiplier, the second half-precision multiplier, and the half-precision addend into single precision, obtaining op0, op1, and op2.

Optionally, the process of converting a half-precision floating-point number into a single-precision floating-point number may refer to the prior art and is not described again in the embodiments of the present application. After the first conversion unit 47 obtains op0, op1, and op2 by conversion, op0.mant and op1.mant are passed to the decimal multiplication operation module 40.

S1302, the decimal multiplication operation module 40 calculates the product op01.mant of op0.mant and op1.mant.

Specifically, op0.mant and op1.mant are each 24-bit binary data, and the calculated op01.mant is 48-bit binary data.

S1303, the step code addition operation module 41 calculates op01.exp.

S1304, the addition operation module 44 performs an exclusive-OR operation on op0.s and op1.s to obtain op01.s.

Optionally, op0.s, op1.s, and op2.s may be input to the addition operation module 44, so that the addition operation module 44 can perform the exclusive-OR operation to obtain op01.s.

S1305, the step code subtraction operation module 42 calculates the absolute value of the step code difference between op01.exp and op2.exp.

S1306, the shift operation module 43 compares op01.exp with op2.exp, and performs a shift operation according to the comparison result.

S1307, the addition operation module 44 compares op01.s with op2.s, and performs a summation operation according to the comparison result to obtain dst.mant.

S1308, the multiply-add result step code determination module 46 determines dst.exp according to op01.exp and op2.exp.

After dst.exp is obtained, it can be normalized to obtain the 8-bit binary step code of the normalized single-precision floating-point number.

S1309, the normalization module 45 normalizes the dst.mant obtained in S1307 to obtain the 23-bit binary mantissa of the normalized single-precision floating-point number.

S1310, the second conversion unit 48 converts the 23-bit binary mantissa into a 10-bit binary mantissa and converts the 8-bit binary step code into a 5-bit binary step code.

Optionally, the addition operation module 44 may pass dst.s to the normalization module 45, in which case the normalization module 45 outputs the sign bit in addition to the 23-bit binary mantissa. The sign bit, the 10-bit binary mantissa obtained in S1310, and the 5-bit binary step code obtained in S1310 constitute the half-precision multiply-add result.
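The narrowing performed in S1310 can be sketched as follows. This is an illustration under stated assumptions rather than the circuit of the second conversion unit 48: the low mantissa bits are simply truncated (the rounding mode is not specified in this passage), results outside the normal half-precision range are rejected instead of being handled, and the function names are hypothetical.

```python
import struct


def single_fields_to_half_bits(s: int, exp8: int, mant23: int) -> int:
    """Pack a sign bit, an 8-bit step code and a 23-bit mantissa into 16 bits."""
    exp5 = exp8 - 127 + 15            # re-bias the step code from 127 to 15
    if not 0 < exp5 < 0x1F:
        # Assumption: overflow and underflow handling is outside this sketch.
        raise ValueError("result outside the normal half-precision range")
    mant10 = mant23 >> 13             # keep the top 10 mantissa bits (truncation)
    return (s << 15) | (exp5 << 10) | mant10


def half_bits_to_float(h: int) -> float:
    """Decode a 16-bit half-precision pattern, for checking the result."""
    return struct.unpack("<e", struct.pack("<H", h))[0]
```

For instance, the single-precision fields of 1.5 (sign 0, step code 127, mantissa 0x400000) map to the half-precision pattern 0x3E00.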

According to the multiply-add method, the half-precision floating-point number is converted into single precision, so that the single-precision multiply-add device can also perform multiply-add operations on half-precision floating-point numbers. This improves the utilization rate of the single-precision multiply-add device, allows n fewer half-precision multiply-add devices to be provided, and reduces hardware cost.

It should be noted that, compared with the process shown in fig. 13, the process by which the half-precision multiply-add device in fig. 12 performs the half-precision floating-point multiply-add operation omits S1301 and S1310. The other steps are similar and are not repeated in the embodiments of the present application.

S1104, when the asymmetric multiply-add device performs a single-precision floating-point multiply-add operation, each single-precision multiply-add device performs the multiply-add operation on the single-precision floating-point number to be processed to obtain a corresponding single-precision multiply-add result, and N single-precision multiply-add results are obtained in total.

The single-precision floating point number to be processed comprises two single-precision multipliers and a single-precision addend, and for convenience of explanation, the two single-precision multipliers are respectively called a first single-precision multiplier and a second single-precision multiplier.

The multiply-add operation expression is:

dst = op0 * op1 + op2

It should be noted that: unlike the half-precision floating-point multiply-add process, the single-precision multiply-add device does not need to convert when performing the single-precision floating-point multiply-add operation because the single-precision multiply-add device supports the single-precision floating-point multiply-add operation.

Optionally, the single-precision multiply-add device may determine op01.mant according to op0.mant and op1.mant; determine op01.exp according to op0.exp and op1.exp; determine op01.s according to op0.s and op1.s; determine dst.mant, dst.exp, and dst.s according to op01.mant, op01.exp, op01.s, op2.mant, op2.exp, and op2.s; and then, based on dst.mant, dst.exp, and dst.s, normalize them into the data format of a single-precision floating-point number, thereby obtaining the single-precision multiply-add result.

Similarly, the two operands of the addition operation are op01 and op2. To make the step codes of the two numbers identical, a decimal shift is needed. The single-precision multiply-add device may determine the absolute value of the step code difference according to op01.exp and op2.exp; perform a shift operation on op01.mant or op2.mant according to op01.exp, op2.exp, and the absolute value of the step code difference, obtaining the shifted op01.mant and the shifted op2.mant; determine dst.mant and dst.s according to op01.s, op2.s, the shifted op01.mant, and the shifted op2.mant; and determine dst.exp according to op01.exp and op2.exp.
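The alignment and summation just described can be sketched in a few lines. The choice of which operand to shift, the same-sign summation, and the selection of the larger step code follow the rules stated in the claims. Several points are assumptions made for illustration: both mantissas are taken to be integers on the same fixed-point scale, shifted-out bits are simply discarded, integer subtraction stands in for the "invert and add one" step, and the way the result sign is chosen when the signs differ is not specified in this passage. The function name is hypothetical.

```python
def align_and_add(op01_mant, op01_exp, op01_s, op2_mant, op2_exp, op2_s):
    """Return (dst_mant, dst_exp, dst_s) for op01 + op2.

    Assumption: op01_mant and op2_mant are non-negative integers that
    already use the same number of fractional bits.
    """
    diff = abs(op01_exp - op2_exp)       # absolute step code difference
    if op01_exp >= op2_exp:
        op2_mant >>= diff                # shift the addend right
        dst_exp = op01_exp               # keep the larger step code
    else:
        op01_mant >>= diff               # shift the product right
        dst_exp = op2_exp
    if op01_s == op2_s:
        dst_mant = op01_mant + op2_mant  # same sign: plain sum
        dst_s = op01_s
    else:
        # Different signs: equivalent to adding the two's complement.
        dst_mant = op01_mant - op2_mant
        dst_s = op01_s
        if dst_mant < 0:                 # assumption: sign follows the larger magnitude
            dst_mant, dst_s = -dst_mant, op2_s
    return dst_mant, dst_exp, dst_s
```

The returned dst_mant is not yet normalized; it still has to be brought into the 23-bit mantissa form by the normalization step.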

The following describes in detail, with reference to the structure shown in fig. 14, a process of performing a single-precision floating-point number multiply-add operation by the single-precision multiply-add device, and referring to fig. 15, the process specifically includes:

S1501, the decimal multiplication operation module 40 calculates the product op01.mant of op0.mant and op1.mant.

Specifically, since no conversion is required when the single-precision floating-point multiply-add operation is performed, the first conversion unit 47 and the second conversion unit 48 are inactive at this time and only pass data through. op0.mant and op1.mant are both 24-bit binary data, and the calculated op01.mant is 48-bit binary data.

S1502, the step code addition operation module 41 calculates op01.exp.

S1503, the addition operation module 44 performs an exclusive-OR operation on op0.s and op1.s to obtain op01.s.

S1504, the step code subtraction operation module 42 calculates the absolute value of the step code difference between op01.exp and op2.exp.

S1505, the shift operation module 43 compares op01.exp with op2.exp, and performs a shift operation according to the comparison result.

S1506, the addition operation module 44 compares op01.s with op2.s, and performs a summation operation according to the comparison result to obtain dst.mant.

S1507, the multiply-add result step code determination module 46 determines dst.exp according to op01.exp and op2.exp.

S1508, the normalization module 45 normalizes the dst.mant obtained in S1506 to obtain the 23-bit binary mantissa of the normalized single-precision floating-point number.

The implementation process of S1501-S1508 is similar to that of S1302-S1309, and the embodiments of the present application are not described herein again. The sign bit, the 8-bit binary order code obtained in S1507, and the 23-bit binary mantissa obtained in S1508 constitute a single-precision multiply-add result.
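As an illustration of the last two steps, the sketch below normalizes an unnormalized mantissa and packs the three fields into a single-precision value. It assumes that dst.mant is an integer carrying 46 fractional bits (the scale of a product of two 24-bit mantissas), that low bits are truncated rather than rounded, and that the result is a normal number. The function names and the frac_bits parameter are hypothetical.

```python
import struct


def normalize(dst_mant: int, dst_exp: int, frac_bits: int = 46):
    """Return (mant23, exp8) with the leading 1 moved to the implicit position."""
    if dst_mant == 0:
        return 0, 0                       # zero result
    msb = dst_mant.bit_length() - 1       # position of the leading 1
    dst_exp += msb - frac_bits            # adjust the step code accordingly
    if msb >= 23:
        mant24 = dst_mant >> (msb - 23)   # drop low bits (truncation)
    else:
        mant24 = dst_mant << (23 - msb)
    return mant24 & 0x7FFFFF, dst_exp     # strip the implicit leading 1


def assemble_single(dst_s: int, exp8: int, mant23: int) -> float:
    """Pack sign bit, 8-bit step code and 23-bit mantissa into a float."""
    bits = (dst_s << 31) | (exp8 << 23) | mant23
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

For example, the product 1.5 * 1.5 has the unnormalized mantissa 9 << 44 with step code 127; normalization gives mantissa 0x100000 with step code 128, which assembles to 2.25.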

According to the multiply-add method, when the asymmetric multiply-add device performs a half-precision floating-point multiply-add operation, each single-precision multiply-add device and each half-precision multiply-add device performs the multiply-add operation on the half-precision floating-point number to be processed to obtain a corresponding half-precision multiply-add result, and 2N half-precision multiply-add results are obtained in total; when the asymmetric multiply-add device performs a single-precision floating-point multiply-add operation, each single-precision multiply-add device performs the multiply-add operation on the single-precision floating-point number to be processed to obtain a corresponding single-precision multiply-add result, and N single-precision multiply-add results are obtained in total. According to the scheme of the embodiments of the present application, the single-precision multiply-add device also participates in the half-precision floating-point multiply-add operation instead of sitting idle, which improves its utilization rate. Compared with the design scheme shown in fig. 3, n half-precision multiply-add devices are saved and hardware cost is reduced.

It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the steps are not strictly limited to this order of execution and may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are also not necessarily performed sequentially, but may be performed in turn or alternately with at least some of the other steps or stages. For the steps performed by the modules in the above apparatus embodiments, reference may be made to the description of the method embodiments.

The various modules in the multiply-add device described above may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.

In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.

The foregoing examples represent only a few embodiments of the present application. They are described in specific detail, but are not thereby to be construed as limiting the scope of the present application. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of the present application. The scope of the application is to be determined by the following claims.

Claims

1. A high performance multiply-add device, the high performance multiply-add device comprising: n single precision multiply-add units, each single precision multiply-add unit comprising: two half-precision multiply-add devices;

when the high-performance multiply-add device performs a single-precision floating-point multiply-add operation, the two half-precision multiply-add devices in each single-precision multiply-add unit are configured to jointly perform the multiply-add operation on the single-precision floating-point number to be processed to obtain a corresponding single-precision multiply-add result, N multiply-add results being obtained in total;

2. The high-performance multiply-add device of claim 1, wherein the two half-precision multiply-add devices comprise: a first half-precision multiply-add device and a second half-precision multiply-add device; the single-precision floating-point number to be processed comprises: a first single-precision multiplier, a second single-precision multiplier, and a single-precision addend;

when the high-performance multiply-add device performs single-precision floating-point multiply-add operation, the first half-precision multiply-add device is specifically configured to perform a first partial multiply operation to obtain a first multiply operation result, and transmit the first multiply operation result to the second half-precision multiply-add device; the second half-precision multiplier adder is specifically configured to perform a second partial multiplication operation to obtain a second multiplication operation result, where the first partial multiplication operation and the second partial multiplication operation are divided according to a preset rule based on the decimal of the first single-precision multiplier and the decimal of the second single-precision multiplier; the second half-precision multiply adder is further used for determining the decimal of the multiplication result according to the first multiplication result and the second multiplication result; determining the single-precision multiplication and addition result according to the decimal of the multiplication result, the step code of the first single-precision multiplier, the step code of the second single-precision multiplier, the sign of the first single-precision multiplier, the sign of the second single-precision multiplier, the decimal of the single-precision addend, the step code of the single-precision addend and the sign of the single-precision addend.

3. The high performance multiply-add of claim 2, wherein the second half-precision multiply-add comprises: the device comprises a step code addition operation module, a decimal addition module and a determination module;

the step code addition operation module is used for determining the step code of a multiplication operation result according to the step code of the first single-precision multiplier and the step code of the second single-precision multiplier;

the decimal addition module is used for determining the sign of the multiplication result according to the sign of the first single-precision multiplier and the sign of the second single-precision multiplier;

the determining module is used for determining the decimal of the multiply-add result, the sign of the multiply-add result and the step code of the multiply-add result according to the step code of the multiplication result, the step code of the single-precision addend, the sign of the single-precision addend, the decimal of the multiplication result and the decimal of the single-precision addend; and determining the single-precision multiply-add result according to the decimal of the multiply-add result, the sign of the multiply-add result and the step code of the multiply-add result.

4. The high performance multiply-add of claim 3, wherein the determining module comprises: the device comprises a first step code subtraction operation module, a first shift operation module, a first addition operation module and a first multiplication and addition result step code determination module;

The first step code subtraction operation module is used for determining an absolute value of a step code difference value according to the step code of the multiplication operation result and the step code of the single-precision addend;

the first shift operation module is configured to perform a shift operation on the decimal of the multiplication result or the decimal of the single-precision addend according to the step code of the multiplication result, the step code of the single-precision addend, and the absolute value of the step code difference value, so as to obtain the decimal of the multiplication result after the shift operation and the decimal of the single-precision addend after the shift operation;

the first addition operation module is used for determining the decimal of the multiply-add result and the sign of the multiply-add result according to the sign of the multiplication result, the sign of the single-precision addend, the decimal of the multiplication result after the shift operation and the decimal of the single-precision addend after the shift operation;

the first multiplication and addition result step code determining module is used for determining the step code of the multiply-add result according to the step code of the multiplication result and the step code of the single-precision addend.

5. The high-performance multiply-add device of claim 3, wherein the step-code addition operation module is specifically configured to:

Determining the step code of the multiplication result by adopting the following formula:

op01.exp = op0.exp + op1.exp - bias

6. The high-performance multiply-add device of claim 3, wherein the decimal addition module is specifically configured to:

and performing exclusive OR operation on the sign of the first single-precision multiplier and the sign of the second single-precision multiplier to obtain the sign of the multiplication result.

7. The high performance multiply-add of claim 4, wherein the first shift-operation module is specifically configured to:

comparing the step code of the multiplication result with the step code of the single-precision addend;

if the step code of the multiplication result is larger than or equal to the step code of the single-precision addend, right-shifting the decimal of the single-precision addend by the bit number corresponding to the absolute value;

if the step code of the multiplication result is smaller than the step code of the single-precision addend, the decimal of the multiplication result is shifted to the right by the bit number corresponding to the absolute value.

8. The high performance multiply-add device of claim 4, wherein the first add operation module is specifically configured to:

If the sign of the multiplication result is different from the sign of the single-precision addend, inverting the decimal of the single-precision addend after the shifting operation, and adding one to obtain the decimal of the single-precision addend after inverting and adding one; summing the decimal of the multiplication result after the shift operation and the decimal of the single-precision addend after the inversion and one addition, and taking the summation result as the decimal of the multiplication and addition result;

and if the sign of the multiplication result is the same as that of the single-precision addend, summing the decimal of the multiplication result after the shift operation and the decimal of the single-precision addend after the shift operation, and taking the summation result as the decimal of the multiplication and addition result.

9. The high performance multiply-add device of claim 4, wherein the first multiply-add result order code determination module is specifically configured to:

if the step code of the multiplication result is larger than or equal to the step code of the single-precision addend, taking the step code of the multiplication result as the step code of the multiply-add result;

and if the step code of the multiplication result is smaller than the step code of the single-precision addend, taking the step code of the single-precision addend as the step code of the multiply-add result.

10. The high performance multiply-add device of claim 1, wherein the half-precision floating-point number to be processed comprises: a first half precision multiplier, a second half precision multiplier, and a half precision addend;

when the high-performance multiply-add device performs a half-precision floating-point multiply-add operation, the half-precision multiply-add device is specifically configured to determine the decimal of a multiplication result according to the decimal of the first half-precision multiplier and the decimal of the second half-precision multiplier; determine the step code of the multiplication result according to the step code of the first half-precision multiplier and the step code of the second half-precision multiplier; determine the sign of the multiplication result according to the sign of the first half-precision multiplier and the sign of the second half-precision multiplier; determine the decimal of the multiply-add result, the sign of the multiply-add result and the step code of the multiply-add result according to the decimal of the multiplication result, the step code of the half-precision addend, the sign of the half-precision addend and the decimal of the half-precision addend; and determine the half-precision multiply-add result according to the decimal of the multiply-add result, the sign of the multiply-add result and the step code of the multiply-add result.

11. The high-performance multiply-add device of claim 10, wherein the half-precision multiply-add device comprises a second step code subtraction operation module, a second shift operation module, a second addition operation module, and a second multiplication and addition result step code determination module;

the second step code subtraction operation module is used for determining the absolute value of a step code difference value according to the step code of the multiplication result and the step code of the half-precision addend;

the second shift operation module is configured to perform a shift operation on the decimal of the multiplication result or the decimal of the half-precision addend according to the step code of the multiplication result, the step code of the half-precision addend, and the absolute value of the step code difference value, so as to obtain the decimal of the multiplication result after the shift operation and the decimal of the half-precision addend after the shift operation;

the second addition operation module is used for determining the decimal of the multiply-add result and the sign of the multiply-add result according to the sign of the multiplication result, the sign of the half-precision addend, the decimal of the multiplication result after the shift operation and the decimal of the half-precision addend after the shift operation;

the second multiplication and addition result step code determining module is used for determining the step code of the multiply-add result according to the step code of the multiplication result and the step code of the half-precision addend.

12. A multiply-add method for use with the high performance multiply-add apparatus of any one of claims 1-11, the method comprising:

when the high-performance multiply-add device performs single-precision floating point multiply-add operation, two half-precision multiply-add devices in each single-precision multiply-add unit are combined to perform multiply-add operation on the single-precision floating point to be processed, so that corresponding single-precision multiply-add results are obtained, and N multiply-add results are obtained;

13. The method of claim 12, wherein the two half-precision multiply-add devices comprise: a first half-precision multiply-add device and a second half-precision multiply-add device; the single-precision floating-point number to be processed comprises: a first single-precision multiplier, a second single-precision multiplier, and a single-precision addend;

the two half-precision multiply-add devices are combined to perform multiply-add operation on the single-precision floating point number to be processed to obtain a corresponding single-precision multiply-add result, and the method comprises the following steps:

the first half-precision multiply adder executes a first partial multiply operation to obtain a first multiply operation result, and transmits the first multiply operation result to the second half-precision multiply adder;

The second half-precision multiply adder executes a second partial multiply operation to obtain a second multiply operation result, and the first partial multiply operation and the second partial multiply operation are divided according to a preset rule based on the decimal of the first single-precision multiplier and the decimal of the second single-precision multiplier;

the second half-precision multiply adder determines the decimal of the multiplication result according to the first multiplication result and the second multiplication result;

the second half-precision multiply-add device determines the single-precision multiply-add result according to the decimal of the multiplication result, the step code of the first single-precision multiplier, the step code of the second single-precision multiplier, the sign of the first single-precision multiplier, the sign of the second single-precision multiplier, the decimal of the single-precision addend, the step code of the single-precision addend and the sign of the single-precision addend.

14. The method of claim 13, wherein the second half-precision multiply-add determining the single-precision multiply-add result from the fraction of the multiply operation result, the step code of the first single-precision multiplier, the step code of the second single-precision multiplier, the sign of the first single-precision multiplier, the sign of the second single-precision multiplier, the fraction of the single-precision addend, the step code of the single-precision addend, and the sign of the single-precision addend comprises:

The second half-precision multiply adder determines the step code of the multiplication result according to the step code of the first single-precision multiplier and the step code of the second single-precision multiplier;

the second half-precision multiplier adder determines the sign of the multiplication result according to the sign of the first single-precision multiplier and the sign of the second single-precision multiplier;

the second half-precision multiply-add device determines the decimal of the multiply-add result, the sign of the multiply-add result and the step code of the multiply-add result according to the step code of the multiply operation result, the step code of the single-precision addend, the sign of the multiply operation result, the decimal of the multiply operation result and the decimal of the single-precision addend;

the second half-precision multiply-add device determines the single-precision multiply-add result according to the decimal of the multiply-add result, the sign of the multiply-add result and the step code of the multiply-add result.

15. The method of claim 14, wherein the second half-precision multiply-add device determining the decimal of the multiply-add result, the sign of the multiply-add result, and the step code of the multiply-add result according to the step code of the multiplication result, the step code of the single-precision addend, the sign of the single-precision addend, the decimal of the multiplication result, and the decimal of the single-precision addend comprises:

The second half-precision multiply adder determines the absolute value of a step code difference value according to the step code of the multiplication operation result and the step code of the single-precision addend;

the second half-precision multiply-add device performs shift operation on the decimal of the multiplication result or the decimal of the single-precision addend according to the step code of the multiplication result, the step code of the single-precision addend and the absolute value of the step code difference value to obtain the decimal of the multiplication result after the shift operation and the decimal of the single-precision addend after the shift operation;

the second half-precision multiply-add device determines the decimal of the multiply-add result and the sign of the multiply-add result according to the sign of the multiply operation result, the sign of the single-precision addend, the decimal of the multiply operation result after the shift operation and the decimal of the single-precision addend after the shift operation;

and the second half-precision multiply-add device determines the step code of the multiply-add result according to the step code of the multiply operation result and the step code of the single-precision addend.

16. The method of claim 12, wherein the half-precision floating point number to be processed comprises: a first half precision multiplier, a second half precision multiplier, and a half precision addend;

Each half-precision multiply-add device performs multiply-add operation on the half-precision floating point number to be processed to obtain a corresponding half-precision multiply-add result, and the method comprises the following steps:

the half-precision multiply-add device determines the decimal of the multiplication result according to the decimal of the first half-precision multiplier and the decimal of the second half-precision multiplier;

the half-precision multiply-add device determines the step code of the multiplication result according to the step code of the first half-precision multiplier and the step code of the second half-precision multiplier;

the half-precision multiply-add device determines the sign of the multiplication result according to the sign of the first half-precision multiplier and the sign of the second half-precision multiplier;

the half-precision multiply-add device determines the decimal of the multiply-add result, the sign of the multiply-add result and the step code of the multiply-add result according to the decimal of the multiplication result, the step code of the half-precision addend, the sign of the half-precision addend and the decimal of the half-precision addend;

and the half-precision multiply-add device determines the half-precision multiply-add result according to the decimal of the multiply-add result, the sign of the multiply-add result and the step code of the multiply-add result.
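
For illustration only, the multiplication stage recited in claim 16 can be sketched in Python. This is a minimal software sketch, not the claimed hardware: it assumes the IEEE 754 binary16 layout (1 sign bit, 5-bit biased step code, 10-bit decimal with an implicit leading 1), handles normal numbers only, and the helper names `unpack_half`, `multiply_fields` and `fields_to_float` are hypothetical.

```python
import struct

HALF_BIAS = 15  # step code (exponent) bias of IEEE 754 binary16, assumed here


def unpack_half(x: float):
    """Split a half-precision value into (sign, step code, decimal).

    The decimal is returned with its implicit leading 1 restored (11 bits).
    Assumption: zeros, subnormals, infinities and NaNs are out of scope.
    """
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    sign = bits >> 15
    step_code = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF
    if step_code in (0, 0x1F):
        raise ValueError("this sketch handles normal numbers only")
    return sign, step_code, fraction | 0x400


def multiply_fields(a: float, b: float):
    """Compute sign, step code and decimal of a*b field by field."""
    sa, ea, ma = unpack_half(a)
    sb, eb, mb = unpack_half(b)
    sign = sa ^ sb                   # sign of the multiplication result
    step_code = ea + eb - HALF_BIAS  # biased step code of the product
    decimal = ma * mb                # up to 22 bits, 20 of them fractional
    return sign, step_code, decimal


def fields_to_float(sign, step_code, decimal, frac_bits=20):
    """Convert the fields back to a Python float to check the sketch."""
    return (-1) ** sign * decimal / 2 ** frac_bits * 2.0 ** (step_code - HALF_BIAS)
```

For example, `multiply_fields(1.5, -2.0)` yields sign 1, step code 16 and an unnormalized decimal equal to 1.5 scaled by 2^20, which `fields_to_float` maps back to -3.0. Normalization and rounding of the product are deliberately left out.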

17. The method of claim 16, wherein the half-precision multiply-add device determining the decimal of the multiply-add result, the sign of the multiply-add result, and the step code of the multiply-add result according to the decimal of the multiplication result, the sign of the multiplication result, the step code of the half-precision addend, the sign of the half-precision addend, and the decimal of the half-precision addend comprises:

the half-precision multiply-add device determines the absolute value of the step code difference according to the step code of the multiplication result and the step code of the half-precision addend;

the half-precision multiply-add device performs a shift operation on the decimal of the multiplication result or the decimal of the half-precision addend according to the step code of the multiplication result, the step code of the half-precision addend and the absolute value of the step code difference, to obtain the decimal of the multiplication result after the shift operation and the decimal of the half-precision addend after the shift operation;

the half-precision multiply-add device determines the decimal of the multiply-add result and the sign of the multiply-add result according to the sign of the multiplication result, the sign of the half-precision addend, the decimal of the multiplication result after the shift operation and the decimal of the half-precision addend after the shift operation;

and the half-precision multiply-add device determines the step code of the multiply-add result according to the step code of the multiplication result and the step code of the half-precision addend.
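
The alignment-and-add steps recited above can likewise be sketched in Python. The sketch rests on several assumptions that the claim does not state: both decimals share the same fixed-point scale, the operand with the smaller step code is the one shifted right, shifted-out bits are truncated, the larger step code is taken as the step code of the result, and no normalization or rounding follows. The function name `align_and_add` is hypothetical.

```python
def align_and_add(product, addend):
    """Add two (sign, step_code, decimal) triples after aligning them.

    Assumptions: both decimals use the same fixed-point scale, shifted-out
    bits are truncated, and the result is left unnormalized.
    """
    sp, ep, mp = product
    sa, ea, ma = addend

    # Absolute value of the step code difference.
    diff = abs(ep - ea)

    # Shift the decimal of the operand with the smaller step code
    # (assumed policy); the larger step code becomes the result step code.
    if ep >= ea:
        ma >>= diff
        step_code = ep
    else:
        mp >>= diff
        step_code = ea

    # Signed addition of the aligned decimals gives the decimal and sign
    # of the multiply-add result.
    total = (-mp if sp else mp) + (-ma if sa else ma)
    sign = 1 if total < 0 else 0
    return sign, step_code, abs(total)
```

With 20 fractional bits, the product 3.0 is `(0, 16, 3 << 19)` and the addend -1.0 is `(1, 15, 1 << 20)`; `align_and_add` shifts the addend by one position and returns `(0, 16, 1 << 20)`, that is, 1.0 x 2^1 = 2.0.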

18. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 12 to 17.

19. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 12 to 17.