CN109634558B

CN109634558B - Programmable mixed precision arithmetic unit

Info

Publication number: CN109634558B
Application number: CN201811514918.2A
Authority: CN
Inventors: 刘彦; 赵立东
Original assignee: Shanghai Suiyuan Technology Co Ltd
Current assignee: Shanghai Suiyuan Technology Co.,Ltd.
Priority date: 2018-12-12
Filing date: 2018-12-12
Publication date: 2020-01-14
Anticipated expiration: 2038-12-12
Also published as: CN109634558A

Abstract

The application provides a programmable mixed precision operation unit which can support floating point or fixed point multiplication and/or addition operation of various precisions, not only can realize multi-path concurrent low-precision operation, but also can integrally realize high-precision operation, and therefore, the programmable mixed precision operation unit has higher energy efficiency ratio.

Description

Programmable mixed precision arithmetic unit

Technical Field

The application relates to the field of electronic information, and in particular to a programmable mixed-precision arithmetic unit.

Background

Deep neural networks are widely applied in the field of artificial intelligence, and their application scenarios can be roughly divided into two types: training and inference. Inference algorithms have relatively low requirements on operation precision and mostly use 8-bit and 16-bit fixed-point precision; most training algorithms require 16-bit or 32-bit floating-point precision.

Existing arithmetic units either support only 8-bit or 16-bit fixed-point arithmetic and are therefore suitable only for inference, or support floating-point arithmetic and are suitable for both training and inference, but with high hardware cost, high energy consumption, and a low energy efficiency ratio when applied to inference scenarios.

Disclosure of Invention

The application provides a programmable mixed-precision arithmetic unit, which aims to support both fixed-point and floating-point operations while achieving a high energy efficiency ratio.

In order to achieve the above object, the present application provides the following technical solutions:

a programmable mixed-precision arithmetic unit, comprising:

four extended half-precision multipliers and four extended single-precision adders;

any one of the expanded half-precision multipliers is used for expanding an input numerical value to X bits and calculating the product of a first numerical value and a second numerical value, wherein the first numerical value is a high-order numerical value or a low-order numerical value in the expanded numerical value of one input numerical value, the second numerical value is a high-order numerical value or a low-order numerical value in the expanded numerical value of the other input numerical value, and X is a preset numerical value;

any one of the extended single-precision adders is used for extending an input numerical value to Y bits and calculating the sum of the extended numerical values, wherein Y is a preset numerical value;

wherein the four extended half-precision multipliers and the four extended single-precision adders are connected in a first manner or a second manner;

the first mode is as follows: the four extended half-precision multipliers and the four extended single-precision adders are correspondingly cascaded one by one to form four parallel half-precision multiply-add devices;

the second mode is as follows: the first extended single-precision adder is respectively cascaded with the first extended half-precision multiplier and the second extended half-precision multiplier;

the second extended single-precision adder is respectively cascaded with the third extended half-precision multiplier and the fourth extended half-precision multiplier;

and the third extended single-precision adder is respectively cascaded with the first extended single-precision adder and the second extended single-precision adder.

Optionally, the extended half-precision multiplier includes:

a single-precision exponent multiplier and an extended half-precision mantissa multiplier which are connected in parallel;

the extended half-precision mantissa multiplier is configured to extend an input value to X bits and calculate a product of the first value and the second value.

Optionally, the extended single-precision adder includes:

a single-precision exponent adder and an extended single-precision mantissa adder which are connected in parallel;

the extended single precision mantissa adder is used to extend an input numerical value to Y bits and calculate the sum of the extended numerical values.

A programmable mixed-precision arithmetic unit, comprising:

four extended single precision multipliers and four extended double precision adders;

the extended single-precision multiplier is the programmable mixed-precision operation unit of any one of the preceding items;

the extended double-precision adder is used for extending an input numerical value to M bits and calculating the sum of the extended numerical values, wherein M is a preset numerical value;

wherein the four extended single-precision multipliers and the four extended double-precision adders are connected in a first manner or a second manner;

the first mode is as follows: the four extended single-precision multipliers and the four extended double-precision adders are correspondingly cascaded one by one to form four single-precision multiply-add devices connected in parallel;

the second mode is as follows: the first extended double-precision adder is respectively cascaded with the first extended single-precision multiplier and the second extended single-precision multiplier;

the second extended double-precision adder is respectively cascaded with the third extended single-precision multiplier and the fourth extended single-precision multiplier;

the third extended double-precision adder is respectively cascaded with the first extended double-precision adder and the second extended double-precision adder.

Optionally, the extended double-precision adder includes:

a double-precision exponent adder and an extended double-precision mantissa adder which are connected in parallel;

the extended double-precision mantissa adder is used for extending an input numerical value to M bits and calculating the sum of the extended numerical values.

A programmable mixed-precision arithmetic unit, comprising:

a plurality of the programmable mixed-precision arithmetic units of any one of the preceding items, connected in parallel.

A programmable mixed-precision arithmetic unit, comprising:

four extended half-precision multipliers and three extended single-precision adders;

the first extended half-precision multiplier is used for extending the input numerical values to X bits and then calculating MSBa × MSBb to obtain a first product;

the second extended half-precision multiplier is used for extending the input numerical values to X bits and then calculating MSBa × LSBb to obtain a second product;

the third extended half-precision multiplier is used for extending the input numerical values to X bits and then calculating LSBa × MSBb to obtain a third product;

the fourth extended half-precision multiplier is used for extending the input numerical values to X bits and then calculating LSBa × LSBb to obtain a fourth product; the input numerical values are a first numerical value and a second numerical value, MSBa is the high-order part of the extended first numerical value, MSBb is the high-order part of the extended second numerical value, LSBa is the low-order part of the extended first numerical value, and LSBb is the low-order part of the extended second numerical value;

the first extended single-precision adder is used for extending the first product and the second product to Y bits and then calculating the sum of the extended first product and the extended second product to obtain a first addition result;

the second extended single-precision adder is used for extending the third product and the fourth product to Y bits, and then calculating the sum of the extended third product and the extended fourth product to obtain a second addition result;

and the third extended single-precision adder is used for calculating the sum of the extended first addition result and the extended second addition result after the first addition result and the second addition result are extended to Y bits.

Optionally, the unit further includes:

and a fourth extended single-precision adder, configured to extend the two single-precision values to Y bits, and then calculate a sum of the two extended single-precision values, where any one of the single-precision values is the sum of the first addition result and the second addition result.

A programmable mixed-precision arithmetic unit, comprising:

four extended single precision multipliers and three extended double precision adders;

the extended single-precision multiplier is used for realizing the function of the programmable mixed-precision arithmetic unit;

the first extended double-precision adder is used for extending the first product and the second product to M bits, and then calculating the sum of the extended first product and the extended second product to obtain a first addition result; the first product is an output result of a first extended single-precision multiplier, and the second product is an output result of a second extended single-precision multiplier;

the second extended double-precision adder is used for extending the third product and the fourth product to M bits, and then calculating the sum of the extended third product and the extended fourth product to obtain a second addition result; the third product is an output result of a third extended single-precision multiplier, and the fourth product is an output result of a fourth extended single-precision multiplier;

and the third extended double-precision adder is used for calculating the sum of the extended first addition result and the second addition result after the first addition result and the second addition result are extended to M bits.

Optionally, the unit further includes:

and a fourth extended double-precision adder, configured to extend the two double-precision values to M bits, and calculate a sum of the two extended double-precision values, where any one of the double-precision values is the sum of the first addition result and the second addition result.

The programmable mixed precision operation unit can support floating point or fixed point multiplication and/or addition operation of various precisions, can realize multipath concurrent low-precision operation, and can realize high-precision operation integrally, so that the programmable mixed precision operation unit has higher energy efficiency ratio.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a schematic structural diagram of a programmable mixed-precision computing unit according to an embodiment of the present disclosure;

FIG. 2 is a schematic structural diagram of another programmable mixed-precision computing unit disclosed in the embodiments of the present application;

FIG. 3 is a schematic diagram illustrating different operations implemented by switching the connection modes of the programmable mixed-precision operation unit disclosed in the embodiment of the present application;

FIG. 4 is a schematic structural diagram of another programmable mixed-precision computing unit disclosed in the embodiments of the present application;

FIG. 5 is a schematic structural diagram of another programmable mixed-precision computing unit disclosed in the embodiments of the present application;

FIG. 6 is a schematic structural diagram of another programmable mixed-precision arithmetic unit disclosed in the embodiment of the present application.

Detailed Description

The programmable mixed-precision arithmetic unit disclosed by the embodiments of the application can be applied to, but is not limited to, deep neural networks, and in terms of operation types is suitable for both training and inference. In terms of hardware, it can be provided in general-purpose central processing units (CPUs, such as Intel/AMD x86 CPUs), graphics processors (GPUs, such as the NVidia V100), neural network processors (such as the Google TPU), field programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs).

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Fig. 1 shows a programmable mixed-precision arithmetic unit disclosed in an embodiment of the present application, including: four extended half-precision multipliers 1 and four extended single-precision adders 2.

Any one of the extended half-precision multipliers is used for extending an input value to X bits and calculating a product of a first value and a second value, wherein the first value is a higher value or a lower value of the extended value of one input value, the second value is a higher value or a lower value of the extended value of the other input value, and X is a preset value.

Any one of the extended single-precision adders is used for extending an input numerical value to Y bits and calculating the sum of the extended numerical values, wherein Y is a preset numerical value.

In the present embodiment, "extended" means that the component is able to extend the number of bits of a numerical value to a preset width; if the input numerical value already has the preset width, no extension is performed.

Specifically, any one of the extended half-precision multipliers 1 includes a single-precision exponent multiplier 11 and an extended half-precision mantissa multiplier 12 connected in parallel.

Wherein the extended half-precision mantissa multiplier is configured to extend an input value to X bits and calculate a product of the first value and the second value.

Specifically, the value range of X includes:

1. 16 to 22 bits, where the bits beyond the 16-bit input width are used to maintain the precision of intermediate results and do not affect the implementation of the present invention;

2. 11 to 15 bits, which, compared with option 1, lacks only support for fixed-point 32-bit calculation and is also covered by the present invention.

Specifically, the single-precision exponent multiplier is used for calculating the product of the exponent terms of single-precision floating-point numbers (that is, 2^Ea × 2^Eb, which amounts to adding the exponents). The single-precision exponent multiplier supports multiplication of single-precision values and is backward compatible with multiplication of half-precision values.

Based on this structure, the extended half-precision multiplier can realize multiplication of half-precision floating-point numbers and multiplication of 8-bit or 16-bit fixed-point numbers. For example:

For half-precision floating-point multiplication, assume two half-precision floating-point numbers C and D:

C = 1.Mc × 2^Ec, D = 1.Md × 2^Ed, where 1.Mc is the mantissa of C, 2^Ec is the exponent of C, 1.Md is the mantissa of D, and 2^Ed is the exponent of D.

The product of C and D is then: C × D = (1.Mc × 1.Md) × 2^(Ec + Ed).

After 1.Mc and 1.Md are each extended from 11 bits to 16 bits by low-bit extension (zero padding), 1.Mc after extension is:

1.Mc’ = Mc × 2^-15.

1.Md after extension is:

1.Md’ = Md × 2^-15.

Here Mc and Md denote the extended 16-bit mantissas read as integers.

Let X = C × D = 1.Mx × 2^Ex;

then, before normalization, Ex = Ec + Ed, and two cases arise:

1) If Mc × Md × 2^-30 >= 2:

1.Mx = Mc × Md × 2^-31 (Mx rounded to 10 bits)

Ex = Ec + Ed + 1

2) If Mc × Md × 2^-30 < 2:

1.Mx = Mc × Md × 2^-30 (Mx rounded to 10 bits)

Ex = Ec + Ed.

Therefore, the multiplication of two half-precision floating-point numbers consists of a multiplication, a shift operation, and an addition: the exponent addition can be carried out by the single-precision exponent multiplier, the mantissa multiplication by the extended half-precision mantissa multiplier, and the shift by a conventional shift operation module.
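The half-precision case analysis above can be sketched in Python as follows. This is only an illustrative model under stated assumptions, not the patented circuit: the function names are hypothetical, only normal (non-zero, finite) inputs are handled, and round-to-nearest-even is assumed because the text says only that Mx is "rounded to 10 bits".

```python
import struct

def half_bits(x):
    """Return the 16-bit IEEE 754 half-precision pattern of x."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def decode_half(bits):
    """Split a normal half pattern into sign, unbiased exponent, 11-bit 1.M."""
    sign = bits >> 15
    biased = (bits >> 10) & 0x1F
    if not 0 < biased < 31:
        raise ValueError("sketch handles normal numbers only")
    return sign, biased - 15, (1 << 10) | (bits & 0x3FF)

def half_mul(c_bits, d_bits):
    """Multiply two half-precision patterns via a 16x16-bit mantissa product."""
    sc, ec, mc11 = decode_half(c_bits)
    sd, ed, md11 = decode_half(d_bits)
    # Low-bit extension (zero padding) from 11 to 16 bits: 1.Mc' = Mc * 2^-15.
    mc, md = mc11 << 5, md11 << 5
    prod = mc * md              # represents Mc * Md * 2^-30, a value in [1, 4)
    ex = ec + ed                # exponent path: Ex = Ec + Ed
    shift = 30
    if prod >= (2 << 30):       # case 1: Mc * Md * 2^-30 >= 2
        shift, ex = 31, ex + 1
    # Round 1.Mx to 10 fraction bits (round-to-nearest-even is an assumption).
    drop = shift - 10
    q, rem = prod >> drop, prod & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if rem > half or (rem == half and q & 1):
        q += 1
    if q == 1 << 11:            # rounding carried up to 2.0: renormalize
        q, ex = q >> 1, ex + 1
    if not -14 <= ex <= 15:
        raise OverflowError("result outside the normal half-precision range")
    return ((sc ^ sd) << 15) | ((ex + 15) << 10) | (q & 0x3FF)
```

For example, `half_mul(half_bits(1.5), half_bits(2.5))` yields the same pattern as `half_bits(3.75)`. The mantissa product, the exponent sum, and the final shift are kept as separate steps to mirror the three operations named in the text.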

For 8-bit fixed-point multiplication: assume the two 8-bit fixed-point numbers are A and B, and let 1.Ma = A (sign-extended or zero-extended to 16 bits) and 1.Mb = B; then A × B = 1.Ma × 1.Mb (keeping the lower 16 bits), and the operation can be completed using an extended half-precision mantissa multiplier.
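As a small illustration of the 8-bit case, the sketch below extends both operands to 16 bits, multiplies, and keeps the lower 16 bits. The function name and the `signed` flag are hypothetical, and two's-complement representation of signed operands is an assumption not stated in the text.

```python
def fixed8_mul(a, b, signed=True):
    """Multiply two 8-bit fixed-point integers on a 16-bit multiplier."""
    lo, hi = (-128, 127) if signed else (0, 255)
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError("operands must fit in 8 bits")
    # Extension to 16 bits: masking a Python int gives its two's-complement
    # pattern, i.e. sign extension for negatives and zero extension otherwise.
    ma, mb = a & 0xFFFF, b & 0xFFFF
    prod = (ma * mb) & 0xFFFF       # keep the lower 16 bits of the product
    if signed and prod & 0x8000:    # reinterpret the 16-bit pattern as signed
        prod -= 1 << 16
    return prod
```

Because the product of two 8-bit values always fits in 16 bits, keeping the lower 16 bits loses no information in this case.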

Any extended single precision adder 2 includes: a single-precision exponent adder 21 and an extended single-precision mantissa adder 22 connected in parallel.

The extended single-precision mantissa adder is used for extending an input numerical value to Y bits and calculating the sum of the extended numerical values.

Specifically, the value range of Y includes:

1. 32 to 44 bits, where the bits beyond the 32-bit output width are used to maintain the precision of intermediate results and do not affect the implementation of the present invention;

2. 22 to 30 bits, which, compared with option 1, lacks only support for fixed-point 32-bit calculation and is also covered by the present invention.

Specifically, the single-precision exponent adder is used for calculating the sum of exponents of single-precision or half-precision floating point numbers.

Based on the above structure, any one extended single-precision adder can support:

1. single precision addition;

2. half precision addition, wherein half precision is expanded into single precision and then converted into half precision after addition;

3. fixed-point 32-bit addition is realized by directly utilizing an extended mantissa adder, and the exponent parts of addition operands are defaulted to be the same constants;

4. the fixed point 16-bit and 8-bit addition is expanded to 32-bit fixed point for operation, and the result is converted into 16-bit or 8-bit (lower bit).
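Items 2 and 4 of the list above (widen, add, then narrow) can be sketched as follows. This is a behavioural illustration only: the helper names are hypothetical, Python's `struct` rounding stands in for the format conversions, and signed two's-complement operands are assumed for the fixed-point case.

```python
import struct

def to_single(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_half(x):
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def half_add(a, b):
    """Half-precision addition: widen to single, add, narrow back to half."""
    wide = to_single(to_single(to_half(a)) + to_single(to_half(b)))
    return to_half(wide)

def fixed_add(a, b, width):
    """8- or 16-bit fixed-point addition carried out at 32 bits."""
    total = ((a & 0xFFFFFFFF) + (b & 0xFFFFFFFF)) & 0xFFFFFFFF  # 32-bit add
    low = total & ((1 << width) - 1)                            # keep low bits
    return low - (1 << width) if low >> (width - 1) else low    # signed view
```

Note that keeping only the low bits makes the narrow fixed-point result wrap around on overflow; whether wrapping or saturation is wanted is left open by the text.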

The four extended half-precision multipliers 1 and the four extended single-precision adders 2 have two connection modes:

Fig. 1 shows the first connection mode: the four extended half-precision multipliers and the four extended single-precision adders are cascaded one-to-one to form four parallel half-precision multiply-add units.

Based on the above specific structure, any one of the half-precision multiply-add units can realize the following operations:

1. a half precision multiply or add operation;

2. a half precision multiply-add operation;

3. a fixed point 8-bit or 16-bit multiply or add operation;

4. a fixed point 8-bit or 16-bit multiply-add operation;

5. a single precision addition operation;

6. a fixed point 32-bit addition operation.

It can be seen that the first connection mode forms four half-precision multiply-add units, and that these four units can perform the above six operations in parallel (each unit performing one operation at a time).

It should be noted that, in the embodiments of the present application, according to an existing structure and a known operation type, a person skilled in the art can know how to implement the known operation type using the existing structure, and details are not repeated here. Other configurations can be similar and are not exhaustive in the embodiments of the present application.

Fig. 2 shows a second connection:

the four extended half-precision multipliers are respectively called: a first extended half-precision multiplier, a second extended half-precision multiplier, a third extended half-precision multiplier, and a fourth extended half-precision multiplier. The four extended single-precision adders are referred to as a first extended single-precision adder, a second extended single-precision adder, a third extended single-precision adder, and a fourth extended single-precision adder, respectively. The first extended single-precision adder is cascaded with the first extended half-precision multiplier and the second extended half-precision multiplier, respectively. The second extended single-precision adder is respectively cascaded with the third extended half-precision multiplier and the fourth extended half-precision multiplier. And the third extended single-precision adder is respectively cascaded with the first extended single-precision adder and the second extended single-precision adder. The fourth extended single-precision adder is not connected to other extended single-precision adders or extended half-precision multipliers.

Based on the above specific structure, an example of the second connection relationship shown in fig. 2 for implementing the multiply-add operation is:

Specifically, assume two single-precision floating-point numbers A and B:

A = 1.Ma × 2^Ea, B = 1.Mb × 2^Eb, where 1.Ma is the mantissa of A, 2^Ea is the exponent of A, 1.Mb is the mantissa of B, and 2^Eb is the exponent of B.

The product of A and B is: A × B = (1.Ma × 1.Mb) × 2^(Ea + Eb).

After 1.Ma and 1.Mb are each extended from 23 bits to 32 bits by low-bit extension (zero padding), 1.Ma after extension is:

1.Ma’ = MSBa × 2^-15 + LSBa × 2^-31.

1.Mb after extension is:

1.Mb’ = MSBb × 2^-15 + LSBb × 2^-31.

wherein MSB represents a high bit, LSB represents a low bit, MSBa represents a high bit of the mantissa obtained after 1.Ma expansion, LSBa represents a low bit of the mantissa obtained after 1.Ma expansion, MSBb represents a high bit of the mantissa obtained after 1.Mb expansion, and LSBb represents a low bit of the mantissa obtained after 1.Mb expansion. The MSB and LSB are 16-bit fixed point numbers respectively.

The product of the mantissas is:

X = 1.Ma × 1.Mb = 1.Ma’ × 1.Mb’

= (MSBa × 2^-15 + LSBa × 2^-31) × (MSBb × 2^-15 + LSBb × 2^-31)

= 2^-30 × ((MSBa × MSBb) + 2^-16 × (MSBa × LSBb + LSBa × MSBb) + 2^-32 × (LSBa × LSBb)). (1)

It can be seen that equation (1) contains four products and that the product of the mantissas is their shifted sum. Therefore, based on equation (1), the four extended half-precision multipliers in Fig. 2 (specifically, the extended half-precision mantissa multipliers within them) are used in turn to calculate the four products in equation (1), and three adders are used to calculate the sum of the four products.

Namely, Fig. 2 comprises three layers from top to bottom. In the first layer, from left to right, the first extended half-precision multiplier calculates MSBa × MSBb, the second calculates MSBa × LSBb, the third calculates LSBa × MSBb, and the fourth calculates LSBa × LSBb. In the second layer, from left to right, the first extended single-precision adder calculates the sum of MSBa × MSBb and MSBa × LSBb (hereinafter the first addition result), and the second extended single-precision adder calculates the sum of LSBa × MSBb and LSBa × LSBb (hereinafter the second addition result). The extended single-precision adder of the third layer calculates the sum of the first addition result and the second addition result.

It should be noted that the factors 2^-30, 2^-16 and 2^-32 in formula (1) are realized through shift operations, which can be implemented with an existing shift operation module; for details, refer to the prior art. The shift operation module is not shown in Fig. 2.

When shift operations are involved, the mantissa extension performed by the extended half-precision multipliers serves to protect precision.

As is apparent from the above description, in the configuration shown in Fig. 2, the mantissa operation of one single-precision floating-point multiplication is synthesized from four half-precision mantissa (fixed-point) multiplications and three extended single-precision mantissa (fixed-point) additions.
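The three-layer structure described above can be illustrated with a short Python sketch. The names are hypothetical, the significand is taken as 24 bits including the hidden leading 1 (an assumption about how 1.Ma is stored), and Python's unbounded integers stand in for the adder widths, so no intermediate truncation is modelled.

```python
def split_mantissa(frac23):
    """Extend 1.M of a single-precision number to 32 bits and split it in half."""
    m24 = (1 << 23) | frac23          # significand with the hidden bit (assumed)
    m32 = m24 << 8                    # zero-pad low bits: value = m32 * 2^-31
    return m32 >> 16, m32 & 0xFFFF    # (MSB, LSB), each a 16-bit integer

def mantissa_product(frac_a, frac_b):
    """Return T such that 1.Ma * 1.Mb = T * 2^-62."""
    msba, lsba = split_mantissa(frac_a)
    msbb, lsbb = split_mantissa(frac_b)
    # Layer 1: four 16x16-bit products, one per extended half-precision multiplier.
    p1, p2, p3, p4 = msba * msbb, msba * lsbb, lsba * msbb, lsba * lsbb
    # Layer 2: two adders; the shifts align the partial products.
    first = (p1 << 32) + (p2 << 16)
    second = (p3 << 16) + p4
    # Layer 3: one adder combines the two partial sums.
    return first + second
```

With 1.Ma’ = MSBa × 2^-15 + LSBa × 2^-31, each extended mantissa equals its 32-bit integer times 2^-31, so the returned integer carries a scale of 2^-62. For 1.5 × 1.0, `mantissa_product(0x400000, 0)` returns `3 << 61`, which is 1.5 at that scale.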

The exponent operation 2^Ea × 2^Eb in the product of A and B can be realized by any one of the single-precision exponent multipliers, and the mantissa product and the exponent product can be combined by a shift operation (the shift module is prior art and is not shown in Fig. 2). The connection relationship shown in Fig. 2 therefore realizes floating-point multiplication.

By additionally using the fourth extended single-precision adder, the result of the floating-point multiplication can be used as an input of the fourth extended single-precision adder, thereby realizing a multiply-add operation. Namely, four extended half-precision mantissa (fixed-point) multiplications, three extended single-precision mantissa (fixed-point) additions, and one extended single-precision floating-point addition can be combined into one single-precision floating-point multiply-add operation. That is, four extended half-precision multipliers and four extended single-precision adders can constitute one single-precision floating-point multiply-add unit that is backward compatible with half-precision floating-point multiply-add operations.

Because fixed-point operations can be implemented using the mantissa portion of the floating-point datapath while skipping the exponent portion, the connection shown in Fig. 2 can also implement fixed-point multiplication: based on the precision of the extended half-precision multipliers, it can implement 8-bit, 16-bit, or 32-bit fixed-point multiplication. The 32-bit fixed-point multiplication is implemented in the same way as the mantissa multiplication above (since the values input to the extended half-precision multipliers are already 32 bits wide, no extension is needed). For 8-bit or 16-bit fixed-point multiplication, the values are first extended by the half-precision multipliers, after which the operation proceeds in the same way as the mantissa multiplication; details are omitted here.

Here, an example is given of a 32-bit fixed-point multiply-add operation for the connection relationship shown in fig. 2:

Assume two 32-bit fixed-point numbers A and B:

A = MSBa × 2^16 + LSBa, B = MSBb × 2^16 + LSBb, where MSBa is the upper 16 bits of A, and LSBa is the lower 16 bits of A; MSBb is the upper 16 bits of B, and LSBb is the lower 16 bits of B.

The product of A and B is: A × B = (MSBa × 2^16 + LSBa) × (MSBb × 2^16 + LSBb)

= 2^32 × (MSBa × MSBb) + 2^16 × (MSBa × LSBb + LSBa × MSBb) + (LSBa × LSBb).

Similar to the single-precision floating-point operation, four extended half-precision mantissa multipliers, three shifters, and three extended single-precision mantissa adders are used to complete the operation A × B. The lower 32 bits of the result are then taken; if the result exceeds the maximum value that 32 bits can represent, saturation or overflow processing is performed according to the requirements of the algorithm.

Finally, an extended single-precision mantissa adder is cascaded to complete a 32-bit fixed point multiply-add operation.
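A minimal Python sketch of this 32-bit fixed-point multiply-add follows. It assumes unsigned operands for simplicity (the text does not fix the signedness), the function name and the `saturate` flag are hypothetical, and the final `+ c` stands in for the cascaded fourth adder.

```python
MAX32 = 0xFFFFFFFF

def fixed32_mul_add(a, b, c=0, saturate=True):
    """Unsigned 32-bit A * B + C built from four 16x16-bit partial products."""
    msba, lsba = a >> 16, a & 0xFFFF
    msbb, lsbb = b >> 16, b & 0xFFFF
    # Two second-layer adders; the shifts by 32 and 16 align the partial products.
    first = ((msba * msbb) << 32) + ((msba * lsbb) << 16)
    second = ((lsba * msbb) << 16) + lsba * lsbb
    # Third-layer adder, then the cascaded adder for the addend C.
    total = first + second + c
    if total > MAX32:
        # Saturate to the largest 32-bit value, or wrap to the lower 32 bits.
        return MAX32 if saturate else total & MAX32
    return total
```

For instance, `fixed32_mul_add(0x10000, 0x10000)` saturates to `0xFFFFFFFF`, while the same call with `saturate=False` keeps only the lower 32 bits and returns 0.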

In summary, the four extended half-precision multipliers and the four extended single-precision adders constitute a reconfigurable arithmetic unit (fig. 1 and 2) capable of supporting multiply and/or add operations of multiple precisions, such as half-precision, single-precision floating-point numbers, and also capable of supporting multiply and/or add operations of 8-bit, 16-bit, and 32-bit fixed-point numbers, i.e., capable of supporting mixed-precision floating-point and fixed-point multiply and/or add operations.

More importantly, a subset of the operators in the arithmetic unit can support lower-precision operations (for example, a single-precision exponent multiplier together with any one extended half-precision multiplier realizes half-precision floating-point multiplication), while the arithmetic unit as a whole supports higher-precision operations. Moreover, the operation of one subset of operators does not affect the other operators connected in parallel, so multi-path concurrent low-precision operation can be realized, and the whole can also serve as a high-precision arithmetic unit.

In the prior art, if independent single-precision and half-precision operators are adopted, only one of them is used when a task of a given precision is executed, so transistor utilization and energy efficiency are low; if only the highest-precision operator is adopted and low-precision calculations are converted into high-precision calculations, the energy efficiency of low-precision calculation is low (equivalent to that of high-precision calculation).

Therefore, the arithmetic unit shown in Figs. 1 and 2 can achieve high transistor utilization and a high energy efficiency ratio.

The support for low-precision multiplication and addition is not limited to half-precision floating point and 8-bit and 16-bit fixed point. Any non-standard floating-point data format with an exponent of no more than 8 bits and a fractional part of no more than 10 bits (e.g., BFloat16), and any non-standard fixed-point data format of no more than 16 bits (e.g., 2-bit, 4-bit, or 12-bit fixed point), can be supported with this structure.
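The format limits stated above can be written down as a simple check. The function names are hypothetical, and the sketch encodes only the two bounds given in the text (exponent at most 8 bits with fraction at most 10 bits, or fixed point of at most 16 bits); it says nothing about how a given format would be mapped onto the datapath.

```python
def fits_low_precision_float(exp_bits, frac_bits):
    """True if a float format fits one extended half-precision lane."""
    return 0 < exp_bits <= 8 and 0 <= frac_bits <= 10

def fits_low_precision_fixed(width):
    """True if a fixed-point format fits one extended half-precision lane."""
    return 0 < width <= 16

# BFloat16 has an 8-bit exponent and a 7-bit fraction; IEEE half has 5 and 10.
formats = {"BFloat16": (8, 7), "half": (5, 10), "single": (8, 23)}
supported = {name: fits_low_precision_float(*f) for name, f in formats.items()}
```

Here `supported` marks BFloat16 and half as fitting a single lane, while single precision does not and instead needs the combined four-multiplier configuration.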

It should be noted that switching between the two connection modes (i.e., programming) can be controlled by instructions. The switching is implemented by multiplexers (MUX, i.e., selectors).

Fig. 3 is a schematic diagram of a programmable mixed-precision computing unit that uses a selector to switch between the above first connection mode and the second connection mode to implement different operations:

wherein:

1. The solid path represents a floating-point single-precision scalar multiply-add:

D = A × B + C = (AM + AL) × (BM + BL) + C. AM represents the high 12 bits of the mantissa of the single-precision multiplier A, held in an extended half-precision floating-point number (8-bit exponent and 16-bit mantissa); AL represents the low 12 bits of the mantissa of A, held in an extended floating-point number (note that the exponent of AL requires a 2^12 adjustment relative to the exponent of A); BM and BL are defined in the same way for the single-precision multiplier B (the exponent of BL likewise requires a 2^12 adjustment relative to the exponent of B); and C represents the addend of the single-precision floating-point multiply-add.

2. The dashed path represents a (4x16) half-precision floating-point vector multiply/multiply-add: d_i = c_i + a_i × b_i, i = 0, 1, 2, 3, where d_i represents the corresponding element of the vector {d0, d1, d2, d3}, c_i represents the corresponding element of the vector {c0, c1, c2, c3}, a_i represents the corresponding element of the vector {a0, a1, a2, a3}, and b_i represents the corresponding element of the vector {b0, b1, b2, b3}.

3. Selectors a and b select the dashed paths and the remaining selectors select the solid paths, which represents a (4x16) half-precision floating-point vector dot product:

A × B^T = Σ (k = 0, 1, 2, 3) a_k × b_k, where A represents the vector A, B^T represents the transpose of the vector B, a_k represents the corresponding element of the vector A {a0, a1, a2, a3}, and b_k represents the corresponding element of the vector B {b0, b1, b2, b3}.
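The three operations selected by the paths of fig. 3 can be sketched in software as follows. This is an illustrative model under stated assumptions, not the disclosed circuit: the names `Mode`, `split_hi_lo`, `scalar_mantissa_product`, and `run` are hypothetical, mantissas and vector elements are modeled as plain integers, and exponent handling, alignment, and rounding are omitted. The scalar case shows only that the four partial products of the high and low 12-bit halves reconstruct the full 24-bit mantissa product.

```python
from enum import Enum


class Mode(Enum):
    SCALAR_FP32_FMA = 1  # all selectors on the solid path
    VECTOR_FP16_FMA = 2  # all selectors on the dashed path
    VECTOR_FP16_DOT = 3  # selectors a, b dashed; the rest solid


def split_hi_lo(mantissa, low_bits=12):
    """Split an integer mantissa into its high part and its low 12 bits."""
    return mantissa >> low_bits, mantissa & ((1 << low_bits) - 1)


def scalar_mantissa_product(ma, mb, low_bits=12):
    """(AM + AL) x (BM + BL) built from four narrow partial products."""
    am, al = split_hi_lo(ma, low_bits)
    bm, bl = split_hi_lo(mb, low_bits)
    p_hh = am * bm  # high x high
    p_hl = am * bl  # high x low
    p_lh = al * bm  # low x high
    p_ll = al * bl  # low x low
    # Each partial product is shifted to its weight before summation.
    return (p_hh << (2 * low_bits)) + ((p_hl + p_lh) << low_bits) + p_ll


def run(mode, a, b, c=None):
    """Dispatch on the selected mode, as the selectors would in hardware."""
    if mode is Mode.SCALAR_FP32_FMA:
        # a, b are integer mantissas; c is an already aligned integer addend.
        return scalar_mantissa_product(a, b) + c
    if mode is Mode.VECTOR_FP16_FMA:
        # d_i = c_i + a_i * b_i for four independent lanes.
        return [ci + ai * bi for ai, bi, ci in zip(a, b, c)]
    if mode is Mode.VECTOR_FP16_DOT:
        # Sum of a_k * b_k over the four lanes.
        return sum(ak * bk for ak, bk in zip(a, b))
    raise ValueError("unknown mode")
```

For example, `run(Mode.SCALAR_FP32_FMA, 0xABCDEF, 0x123456, 7)` equals `0xABCDEF * 0x123456 + 7`, which confirms that the same four multipliers that serve four independent lanes can be combined into one wider product.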

Fig. 4 is a schematic diagram of another programmable mixed-precision arithmetic unit according to an embodiment of the present application, including four extended single-precision multipliers and four extended double-precision adders.

The extended double precision adder is used for extending the input numerical value to M bits and calculating the sum of the extended numerical values.

Specifically, the value range of M includes: (1) 63 ~ 100 bits, where the portion beyond 64 bits (addition) serves as extra precision to maintain the precision of intermediate results, without affecting the implementation of the invention; and (2) 46 ~ 62 bits, which, compared with (1), merely lacks support for fixed-point 64-bit calculations and is likewise encompassed by the invention.

The arithmetic unit in fig. 4 differs from the arithmetic unit shown in fig. 1 in that: 1. replacing the extended single-precision exponential multiplication/adder with an extended double-precision exponential multiplication/adder; 2. replacing the expanded half-precision mantissa multiplier with an expanded single-precision mantissa multiplier; 3. the extended single-precision mantissa adder is replaced with an extended double-precision mantissa adder. The connection relationship in the arithmetic unit shown in fig. 4 is the same as that shown in fig. 1.

Each extended single-precision multiplier in the arithmetic unit shown in fig. 4 may be the part of the programmable mixed-precision arithmetic unit shown in fig. 2 other than the standalone extended single-precision adder (i.e., the fourth extended single-precision adder). Accordingly, the value range of the extended bit width of the extended single-precision multiplier includes: (1) 32 ~ 50 bits, where the portion beyond 32 bits (multiplication) serves as extra precision to maintain the precision of intermediate results, without affecting the implementation of the invention; and (2) 23 ~ 31 bits, which, compared with (1), merely lacks support for fixed-point 64-bit calculations and is likewise encompassed by the invention.
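The width ranges given for the extended double-precision adder and the extended single-precision multiplier can be summarized in a small lookup. This is only a sketch that restates the ranges from the text; the table and function names (`ADDER_M_RANGES`, `MULTIPLIER_RANGES`, `supports_fixed64`) are hypothetical.

```python
# (low, high, supports fixed-point 64-bit) as stated in the text.
ADDER_M_RANGES = [(63, 100, True), (46, 62, False)]
MULTIPLIER_RANGES = [(32, 50, True), (23, 31, False)]


def supports_fixed64(width_bits, ranges):
    """Report whether a chosen width keeps fixed-point 64-bit support."""
    for low, high, fixed64_ok in ranges:
        if low <= width_bits <= high:
            return fixed64_ok
    raise ValueError(f"width {width_bits} is outside the described ranges")
```

A design with a 64-bit adder and 32-bit multipliers keeps fixed-point 64-bit support in this model, while a narrower choice such as 53 and 27 bits does not.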

Fig. 5 shows another connection relationship of four extended single-precision multipliers and four extended double-precision adders, and the difference between fig. 5 and fig. 2 is that: 1. replacing the extended single-precision exponential multiplication/adder with an extended double-precision exponential multiplication/adder; 2. replacing the expanded half-precision mantissa multiplier with an expanded single-precision mantissa multiplier; 3. the extended single-precision mantissa adder is replaced with an extended double-precision mantissa adder.

Each extended single-precision multiplier in the arithmetic unit shown in fig. 5 may be the part of the programmable mixed-precision arithmetic unit shown in fig. 2 other than the standalone extended single-precision adder (i.e., the fourth extended single-precision adder). Correspondingly, the reconfigurable structure consisting of the four extended single-precision multipliers and the four extended double-precision adders supports the following operation types:

1. 1 double-precision floating-point multiply-add operation;

2. 1 64-bit fixed point multiply-add operation;

3. 4 concurrent fixed-point 8-bit, 16-bit or 32-bit multiply-add operations;

4. 4 concurrent 64-bit fixed-point addition operations;

5. 4 concurrent double precision addition operations.

For the processes by which the arithmetic unit shown in fig. 4 or fig. 5 implements multiplication, addition, and multiply-add, reference may be made to the processes shown in fig. 1 or fig. 2. The difference lies in the precision of the values involved in the operation; for example, the high-order value in the above operation formula is the high-order 32 bits of the extended mantissa. Details are not repeated here.
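As an illustration of the wider case, a 64-bit fixed-point multiply-add can be modeled with four 32-bit partial products and four additions. This is a sketch under assumptions: operands are unsigned, the full-width result is kept without truncation, and the assignment of partial products to the individual adders is hypothetical rather than taken from the figures. The function name `fixed64_multiply_add` is likewise illustrative.

```python
MASK32 = (1 << 32) - 1


def fixed64_multiply_add(a, b, c):
    """Unsigned a * b + c for 64-bit a, b using four 32 x 32 products."""
    a_hi, a_lo = a >> 32, a & MASK32
    b_hi, b_lo = b >> 32, b & MASK32

    # Four multipliers, one partial product each.
    p_hh = a_hi * b_hi
    p_hl = a_hi * b_lo
    p_lh = a_lo * b_hi
    p_ll = a_lo * b_lo

    # Assumed adder assignment: two pairwise sums, one combining sum,
    # and a final sum with the addend.
    first_sum = (p_hh << 64) + (p_hl << 32)
    second_sum = (p_lh << 32) + p_ll
    third_sum = first_sum + second_sum
    return third_sum + c
```

The result matches Python's arbitrary-precision `a * b + c` for any unsigned 64-bit inputs, which is the property the partial-product decomposition relies on.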

Fig. 6 is a schematic diagram of another programmable mixed-precision arithmetic unit (which may also be referred to as a multiply-add operator) disclosed in the embodiment of the present application, including a plurality of programmable mixed-precision arithmetic units connected in parallel (each denoted by Kernel in fig. 6), where each programmable mixed-precision arithmetic unit may be the arithmetic unit shown in fig. 1 and fig. 2 or the arithmetic unit shown in fig. 4 and fig. 5. It should be noted that fig. 6 is only one example of a parallel connection; the parallel connection mode and the dimensions are not limited in this embodiment.

The programmable mixed-precision arithmetic unit shown in fig. 6 may be applied to multiply-add operations on vectors, and the vectors may be one-dimensional, two-dimensional, or of other dimensions, which is not limited herein.
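One way to picture the parallel arrangement is to tile a long dot product across several four-lane kernels and then sum the partial results. The sketch below is an assumption about how such kernels might be used, not a description of fig. 6: `kernel_dot4`, `parallel_dot`, and `matvec` are hypothetical names, values are plain Python numbers, and the kernels are evaluated one after another rather than concurrently.

```python
LANES = 4  # each kernel handles four element pairs


def kernel_dot4(a4, b4):
    """Model of one kernel in dot-product mode: four products, one sum."""
    if len(a4) != LANES or len(b4) != LANES:
        raise ValueError("kernel expects exactly four element pairs")
    return sum(x * y for x, y in zip(a4, b4))


def parallel_dot(a, b):
    """Tile a one-dimensional dot product across as many kernels as needed."""
    a, b = list(a), list(b)
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    pad = (-len(a)) % LANES  # zero-pad the last tile
    a += [0] * pad
    b += [0] * pad
    partials = [
        kernel_dot4(a[i:i + LANES], b[i:i + LANES])
        for i in range(0, len(a), LANES)
    ]
    return sum(partials)


def matvec(matrix, vec):
    """Two-dimensional use: one tiled dot product per matrix row."""
    return [parallel_dot(row, vec) for row in matrix]
```

For instance, `parallel_dot(range(1, 11), range(1, 11))` uses three kernel tiles and returns 385, and `matvec([[1, 2], [3, 4]], [5, 6])` returns `[17, 39]`.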

The functions described in the method of the embodiment of the present application, if implemented in the form of software functional units and sold or used as independent products, may be stored in a storage medium readable by a computing device. Based on such understanding, the part of the embodiments of the present application that contributes to the prior art, or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.

The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A programmable mixed-precision arithmetic unit, comprising:

any one of the expanded half-precision multipliers is used for expanding an input numerical value to X bits and calculating the product of a first numerical value and a second numerical value, wherein the first numerical value is a high-order numerical value or a low-order numerical value in the expanded numerical value of one input numerical value, the second numerical value is a high-order numerical value or a low-order numerical value in the expanded numerical value of the other input numerical value, X is a preset numerical value, and the value range of X comprises 11 ~ 22 bits;

any one of the extended single-precision adders is used for extending an input numerical value to Y bits and calculating the sum of the extended numerical values, the calculated extended numerical value is a numerical value in which the product of a first numerical value and a second numerical value is extended to Y bits, Y is a preset numerical value, and the value range of Y comprises 22 ~ 30 bits and 32 ~ 44 bits;

2. A programmable mixed-precision arithmetic unit as claimed in claim 1 wherein said extended half-precision multiplier comprises:

3. A programmable mixed-precision arithmetic unit as claimed in claim 1 wherein said extended single-precision adder comprises:

the extended single-precision mantissa adder is used for extending an input numerical value to Y bits and calculating the sum of numerical values extended to the Y bits.

4. A programmable mixed-precision arithmetic unit, comprising:

the extended single-precision multiplier is a programmable mixed-precision arithmetic unit as claimed in any one of claims 1 to 3;

the extended double-precision adder is used for extending an input numerical value to M bits and calculating the sum of numerical values after the input numerical value is extended to the M bits, wherein M is a preset numerical value, and the value range of M comprises 46 ~ 100 bits;

5. A programmable mixed-precision arithmetic unit as claimed in claim 4, wherein said extended double-precision adder comprises:

the extended double-precision mantissa adder is used for extending an input numerical value to M bits and calculating the sum of the numerical values after the input numerical value is extended to M bits.

6. A programmable mixed-precision arithmetic unit, comprising:

a programmable mixed-precision arithmetic unit according to any one of claims 1 to 3 connected in parallel, or a programmable mixed-precision arithmetic unit according to any one of claims 4 to 5 connected in parallel.

7. A programmable mixed-precision arithmetic unit, comprising:

the fourth expanded half-precision multiplier is used for expanding the input numerical value to X bits and then calculating LSBa and LSBb to obtain a fourth product, wherein the input numerical value is a first numerical value and a second numerical value, MSBa is the high order of the expanded first numerical value, MSBb is the high order of the expanded second numerical value, LSBa is the low order of the expanded first numerical value, LSBb is the low order of the expanded second numerical value, and the value range of X comprises 11 ~ 22 bits;

and the third extended single-precision adder is used for extending the first addition result and the second addition result to Y bits, and calculating the sum of the extended first addition result and the extended second addition result, wherein the value range of Y comprises 22 ~ 30 bits and 32 ~ 44 bits.

8. A programmable mixed-precision arithmetic unit as claimed in claim 7, further comprising:

and a fourth extended single-precision adder, configured to extend two input single-precision values to Y bits, and then calculate a sum of the two extended single-precision values, where any one of the input single-precision values is the sum of the first addition result and the second addition result.

9. A programmable mixed-precision arithmetic unit, comprising:

four extended single-precision multipliers and three extended double-precision adders, wherein the value range of M comprises 46 ~ 100 bits;

the extended single-precision multiplier is used for realizing the function of the programmable mixed-precision arithmetic unit in claim 7;

10. A programmable mixed-precision arithmetic unit as claimed in claim 9, further comprising:

and a fourth extended double-precision adder, configured to extend two input double-precision values to M bits, and then calculate a sum of the two extended double-precision values, where any one of the input double-precision values is the sum of the first addition result and the second addition result.