CN109634558B - Programmable mixed precision arithmetic unit - Google Patents

Publication number
CN109634558B
Authority
CN
China
Prior art keywords
precision
extended
numerical value
bits
adder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811514918.2A
Other languages
Chinese (zh)
Other versions
CN109634558A (en)
Inventor
刘彦
赵立东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Suiyuan Technology Co.,Ltd.
Original Assignee
Shanghai Suiyuan Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Suiyuan Technology Co Ltd filed Critical Shanghai Suiyuan Technology Co Ltd
Priority to CN201811514918.2A priority Critical patent/CN109634558B/en
Publication of CN109634558A publication Critical patent/CN109634558A/en
Application granted granted Critical
Publication of CN109634558B publication Critical patent/CN109634558B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00: Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52: Multiplying; Dividing
    • G06F7/523: Multiplying only
    • G06F7/53: Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00: Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/50: Adding; Subtracting
    • G06F7/505: Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2207/00: Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F2207/38: Indexing scheme relating to groups G06F7/38 - G06F7/575
    • G06F2207/3804: Details
    • G06F2207/3808: Details concerning the type of numbers or the way they are handled
    • G06F2207/3812: Devices capable of handling different types of numbers
    • G06F2207/3824: Accepting both fixed-point and floating-point numbers

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The application provides a programmable mixed-precision arithmetic unit that supports floating-point or fixed-point multiplication and/or addition at multiple precisions. It can perform multiple concurrent low-precision operations and can also, as a whole, perform a high-precision operation, and therefore achieves a higher energy efficiency ratio.

Description

Programmable mixed precision arithmetic unit
Technical Field
The application relates to the field of electronic information, in particular to a programmable mixed-precision arithmetic unit.
Background
Deep neural networks are widely used in artificial intelligence, and their application scenarios fall roughly into two types: training and inference. Inference algorithms have relatively low precision requirements and mostly use 8-bit and 16-bit fixed-point precision; most training algorithms require 16-bit or 32-bit floating-point precision.
Existing arithmetic units either support only 8-bit or 16-bit fixed-point operations and are therefore suitable only for inference, or support floating-point operations and are suitable for both training and inference, but at high hardware cost and high energy consumption, so their energy efficiency ratio is low in inference scenarios.
Disclosure of Invention
The application provides a programmable mixed-precision arithmetic unit that aims to support both fixed-point and floating-point operations while achieving a high energy efficiency ratio.
In order to achieve the above object, the present application provides the following technical solutions:
a programmable mixed-precision arithmetic unit, comprising:
four extended half-precision multipliers and four extended single-precision adders;
any one of the extended half-precision multipliers is used for extending an input numerical value to X bits and calculating the product of a first numerical value and a second numerical value, wherein the first numerical value is the high-order part or the low-order part of the extended value of one input numerical value, the second numerical value is the high-order part or the low-order part of the extended value of the other input numerical value, and X is a preset numerical value;
any one of the extended single-precision adders is used for extending an input numerical value to Y bits and calculating the sum of the extended numerical values, wherein Y is a preset numerical value;
wherein the four extended half-precision multipliers and the four extended single-precision adders are connected in a first manner or a second manner;
the first mode is as follows: the four extended half-precision multipliers and the four extended single-precision adders are correspondingly cascaded one by one to form four parallel half-precision multiply-add devices;
the second mode is as follows: the first extended single-precision adder is respectively cascaded with the first extended half-precision multiplier and the second extended half-precision multiplier;
the second extended single-precision adder is respectively cascaded with the third extended half-precision multiplier and the fourth extended half-precision multiplier;
and the third extended single-precision adder is respectively cascaded with the first extended single-precision adder and the second extended single-precision adder.
Optionally, the extended half-precision multiplier includes:
a single-precision exponent multiplier and an extended half-precision mantissa multiplier which are connected in parallel;
the extended half-precision mantissa multiplier is configured to extend an input value to X bits and calculate a product of the first value and the second value.
Optionally, the extended single-precision adder includes:
a single-precision exponent adder and an extended single-precision mantissa adder which are connected in parallel;
the extended single precision mantissa adder is used to extend an input numerical value to Y bits and calculate the sum of the extended numerical values.
A programmable mixed-precision arithmetic unit, comprising:
four extended single precision multipliers and four extended double precision adders;
the extended single-precision multiplier is the programmable mixed-precision operation unit of any one of the preceding items;
the extended double-precision adder is used for extending an input numerical value to M bits and calculating the sum of the extended numerical values, wherein M is a preset numerical value;
wherein the four extended single-precision multipliers and the four extended double-precision adders are connected in a first manner or a second manner;
the first mode is as follows: the four extended single-precision multipliers and the four extended double-precision adders are correspondingly cascaded one by one to form four single-precision multiply-add devices connected in parallel;
the second mode is as follows: the first extended double-precision adder is respectively cascaded with the first extended single-precision multiplier and the second extended single-precision multiplier;
the second extended double-precision adder is respectively cascaded with the third extended single-precision multiplier and the fourth extended single-precision multiplier;
the third extended double-precision adder is respectively cascaded with the first extended double-precision adder and the second extended double-precision adder.
Optionally, the extended double-precision adder includes:
a double-precision exponent adder and an extended double-precision mantissa adder which are connected in parallel;
the extended double-precision mantissa adder is used for extending an input numerical value to M bits and calculating the sum of the extended numerical values.
A programmable mixed-precision arithmetic unit, comprising:
a programmable mixed-precision arithmetic unit of any of the preceding in parallel.
A programmable mixed-precision arithmetic unit, comprising:
four extended half-precision multipliers and three extended single-precision adders;
the first extended half-precision multiplier is used for extending the input numerical values to X bits and then calculating MSBa × MSBb to obtain a first product;
the second extended half-precision multiplier is used for extending the input numerical values to X bits and then calculating MSBa × LSBb to obtain a second product;
the third extended half-precision multiplier is used for extending the input numerical values to X bits and then calculating LSBa × MSBb to obtain a third product;
the fourth extended half-precision multiplier is used for extending the input numerical values to X bits and then calculating LSBa × LSBb to obtain a fourth product; the input numerical values are a first numerical value and a second numerical value, MSBa is the high-order part of the extended first numerical value, MSBb is the high-order part of the extended second numerical value, LSBa is the low-order part of the extended first numerical value, and LSBb is the low-order part of the extended second numerical value;
the first extended single-precision adder is used for extending the first product and the second product to Y bits and then calculating the sum of the extended first product and the extended second product to obtain a first addition result;
the second extended single-precision adder is used for extending the third product and the fourth product to Y bits, and then calculating the sum of the extended third product and the extended fourth product to obtain a second addition result;
and the third extended single-precision adder is used for calculating the sum of the extended first addition result and the extended second addition result after the first addition result and the second addition result are extended to Y bits.
Optionally, the unit further includes:
and a fourth extended single-precision adder, configured to extend the two single-precision values to Y bits, and then calculate a sum of the two extended single-precision values, where any one of the single-precision values is the sum of the first addition result and the second addition result.
A programmable mixed-precision arithmetic unit, comprising:
four extended single precision multipliers and three extended double precision adders;
the extended single-precision multiplier is used for realizing the function of the programmable mixed-precision arithmetic unit;
the first extended double-precision adder is used for extending the first product and the second product to M bits, and then calculating the sum of the extended first product and the extended second product to obtain a first addition result; the first product is an output result of a first extended single-precision multiplier, and the second product is an output result of a second extended single-precision multiplier;
the second extended double-precision adder is used for extending the third product and the fourth product to M bits, and then calculating the sum of the extended third product and the extended fourth product to obtain a second addition result; the third product is an output result of a third extended single-precision multiplier, and the fourth product is an output result of a fourth extended single-precision multiplier;
and the third extended double-precision adder is used for calculating the sum of the extended first addition result and the second addition result after the first addition result and the second addition result are extended to M bits.
Optionally, the unit further includes:
and a fourth extended double-precision adder, configured to extend the two double-precision values to M bits, and calculate a sum of the two extended double-precision values, where any one of the double-precision values is the sum of the first addition result and the second addition result.
The programmable mixed precision operation unit can support floating point or fixed point multiplication and/or addition operation of various precisions, can realize multipath concurrent low-precision operation, and can realize high-precision operation integrally, so that the programmable mixed precision operation unit has higher energy efficiency ratio.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic structural diagram of a programmable mixed-precision computing unit according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of another programmable mixed-precision computing unit disclosed in the embodiments of the present application;
FIG. 3 is a schematic diagram illustrating the different operations implemented by switching the connection mode of the programmable mixed-precision arithmetic unit disclosed in the embodiments of the present application;
FIG. 4 is a schematic structural diagram of another programmable mixed-precision computing unit disclosed in the embodiments of the present application;
FIG. 5 is a schematic structural diagram of another programmable mixed-precision computing unit disclosed in the embodiments of the present application;
FIG. 6 is a schematic structural diagram of another programmable mixed-precision arithmetic unit disclosed in the embodiments of the present application.
Detailed Description
The programmable mixed-precision arithmetic unit disclosed in the embodiments of the present application can be applied to, but is not limited to, deep neural networks, and in terms of operation type is suitable for both training and inference. In terms of hardware, it can be provided in general-purpose central processing units (CPUs, such as Intel/AMD x86 CPUs), graphics processors (GPUs, such as the NVIDIA V100), neural network processors (such as the Google TPU), field programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs).
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a programmable mixed-precision arithmetic unit disclosed in an embodiment of the present application, including: four extended half-precision multipliers 1 and four extended single-precision adders 2.
Any one of the extended half-precision multipliers is used for extending an input value to X bits and calculating a product of a first value and a second value, wherein the first value is a higher value or a lower value of the extended value of one input value, the second value is a higher value or a lower value of the extended value of the other input value, and X is a preset value.
Any one of the extended single-precision adders is used for extending an input numerical value to Y bits and calculating the sum of the extended numerical values, wherein Y is a preset numerical value.
In the present embodiment, "extended" means that the component has the function of extending the number of bits of a numerical value to a preset number; if the input numerical value already has the preset number of bits, no extension is performed.
Specifically, any one of the extended half-precision multipliers 1 includes: a single-precision exponent multiplier 11 and an extended half-precision mantissa multiplier 12 are connected in parallel.
Wherein the extended half-precision mantissa multiplier is configured to extend an input value to X bits and calculate a product of the first value and the second value.
Specifically, the value range of X includes: (1) 16 to 22 bits, where the bits beyond 16 (the input precision) are used to maintain the precision of intermediate results and do not affect the implementation of the present invention; and (2) 11 to 15 bits, which, compared with (1), only lacks support for fixed-point 32-bit calculation and is also covered by the present invention.
Specifically, the single-precision exponent multiplier is used for calculating the product of the exponent terms of single-precision floating-point numbers (for example 2^Ea × 2^Eb, which amounts to adding the exponents). It supports single-precision values and is downward compatible with half-precision values.
Based on this structure, the extended half-precision multiplier can perform multiplication of half-precision floating-point numbers and multiplication of 8-bit or 16-bit fixed-point numbers. For example:
For half-precision floating-point multiplication, assume two half-precision floating-point numbers C and D:
C = 1.Mc × 2^Ec, D = 1.Md × 2^Ed, where 1.Mc is the mantissa of C, 2^Ec is the exponent of C, 1.Md is the mantissa of D, and 2^Ed is the exponent of D.
The product of C and D is then: C × D = (1.Mc × 1.Md) × 2^(Ec + Ed).
1.Mc and 1.Md are each extended from 11 bits to 16 bits by low-bit extension (zero padding). Treating the extended 16-bit values Mc and Md as integers, 1.Mc after extension is:
1.Mc’ = Mc × 2^-15.
1.Md after extension is:
1.Md’ = Md × 2^-15.
Let X = C × D = 1.Mx × 2^Ex. Since 1.Mc’ × 1.Md’ = Mc × Md × 2^-30, there are two cases:
1) If Mc × Md × 2^-30 >= 2:
1.Mx = Mc × Md × 2^-31 (Mx rounded to 10 bits)
Ex = Ec + Ed + 1
2) If Mc × Md × 2^-30 < 2:
1.Mx = Mc × Md × 2^-30 (Mx rounded to 10 bits)
Ex = Ec + Ed.
Therefore, multiplying two half-precision floating-point numbers involves a multiplication, a shift, and an addition: the addition of the exponents can be carried out by the single-precision exponent multiplier, the multiplication of the mantissas by the extended half-precision mantissa multiplier, and the shift by a conventional shift module.
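The two normalization cases above can be checked with a small software model. The following Python sketch is illustrative only: the function names are hypothetical, inputs and results are assumed to be normal (non-zero, finite, non-subnormal) binary16 values, and round-to-nearest-even is an assumed rounding mode that the text does not specify.

```python
import struct

def to_bits(x):
    """Round a Python float to binary16 and return its 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def from_bits(bits):
    """Interpret a 16-bit pattern as a binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def decode(bits):
    """Return (sign, unbiased exponent, 11-bit significand with hidden 1)."""
    sign = (bits >> 15) & 1
    exponent = ((bits >> 10) & 0x1F) - 15
    significand = 0x400 | (bits & 0x3FF)
    return sign, exponent, significand

def half_mul(c_bits, d_bits):
    """Multiply two normal binary16 values via a 16x16-bit mantissa product."""
    sc, ec, c11 = decode(c_bits)
    sd, ed, d11 = decode(d_bits)
    mc = c11 << 5                 # 11 -> 16 bits, so 1.Mc' = mc * 2^-15
    md = d11 << 5
    prod = mc * md                # value of the mantissa product is prod * 2^-30
    ex = ec + ed                  # exponent addition
    if prod >= (2 << 30):         # case 1: mantissa product >= 2
        shift, ex = 31, ex + 1
    else:                         # case 2: mantissa product < 2
        shift = 30
    drop = shift - 10             # keep 10 fraction bits below the leading 1
    sig = prod >> drop
    rem = prod & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if rem > half or (rem == half and (sig & 1)):   # round to nearest even
        sig += 1
        if sig == (1 << 11):      # rounding carried into the next binade
            sig >>= 1
            ex += 1
    return ((sc ^ sd) << 15) | ((ex + 15) << 10) | (sig & 0x3FF)
```

For instance, `from_bits(half_mul(to_bits(1.5), to_bits(2.5)))` follows case 2, while squaring 1.5 follows case 1 because the mantissa product 2.25 is at least 2.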
For 8-bit fixed-point multiplication: assume the two 8-bit fixed-point numbers are A and B, and let 1.Ma = A (sign-extended or zero-extended to 16 bits) and 1.Mb = B. Then A × B = 1.Ma × 1.Mb (keeping the lower 16 bits), and the operation can be completed using an extended half-precision mantissa multiplier.
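As a rough illustration of this 8-bit case, the sketch below extends each operand to a 16-bit pattern, multiplies, and keeps the lower 16 bits. The function names and the two's-complement interpretation of signed operands are assumptions made for the example, not details given in the text.

```python
def extend_to_16(value, signed):
    """Extend an 8-bit operand to a 16-bit pattern (sign or zero extension)."""
    low, high = (-128, 127) if signed else (0, 255)
    if not low <= value <= high:
        raise ValueError("operand does not fit in 8 bits")
    # Masking a Python int with 0xFFFF yields its 16-bit two's-complement pattern.
    return value & 0xFFFF

def fixed8_mul(a, b, signed=True):
    """Multiply two 8-bit fixed-point numbers on a 16x16-bit multiplier."""
    prod = extend_to_16(a, signed) * extend_to_16(b, signed)
    low16 = prod & 0xFFFF              # keep the lower 16 bits of the product
    if signed and low16 & 0x8000:      # reinterpret as a signed 16-bit value
        return low16 - 0x10000
    return low16
```

Because the product of two 8-bit values always fits in 16 bits, keeping the lower 16 bits loses nothing in this case.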
Any extended single precision adder 2 includes: a single-precision exponent adder 21 and an extended single-precision mantissa adder 22 connected in parallel.
The extended single-precision mantissa adder is used for extending an input numerical value to Y bits and calculating the sum of the extended numerical values.
Specifically, the value range of Y includes: (1) 32 to 44 bits, where the bits beyond 32 (the output precision) are used to maintain the precision of intermediate results and do not affect the implementation of the present invention; and (2) 22 to 30 bits, which, compared with (1), only lacks support for fixed-point 32-bit calculation and is also covered by the present invention.
Specifically, the single-precision exponent adder is used for calculating the sum of exponents of single-precision or half-precision floating point numbers.
Based on the above structure, any one extended single-precision adder can support:
1. single-precision addition;
2. half-precision addition, in which the half-precision operands are extended to single precision, added, and the result is converted back to half precision;
3. fixed-point 32-bit addition, performed directly by the extended mantissa adder, with the exponent parts of the operands defaulting to the same constant;
4. fixed-point 16-bit and 8-bit addition, in which the operands are extended to 32-bit fixed point, added, and the result is converted to 16 bits or 8 bits (the lower bits).
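The fixed-point addition modes in items 3 and 4 can be sketched as follows. This is only a behavioral model under assumptions the text does not state: operands are signed two's-complement integers, and the result wraps around when the lower bits are kept. The function names are hypothetical.

```python
def to_signed(pattern, width):
    """Interpret the lower `width` bits of `pattern` as a two's-complement value."""
    sign_bit = 1 << (width - 1)
    return (pattern & (sign_bit - 1)) - (pattern & sign_bit)

def fixed_add(a, b, width):
    """Add two signed `width`-bit numbers on a 32-bit adder, keeping the lower bits."""
    if width not in (8, 16, 32):
        raise ValueError("width must be 8, 16 or 32")
    a32 = a & 0xFFFFFFFF                   # sign-extend to a 32-bit pattern
    b32 = b & 0xFFFFFFFF
    total32 = (a32 + b32) & 0xFFFFFFFF     # 32-bit fixed-point addition
    return to_signed(total32 & ((1 << width) - 1), width)
```

A saturating variant would clamp the 32-bit sum to the range of the target width instead of keeping the lower bits.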
The four extended half-precision multipliers 1 and the four extended single-precision adders 2 have two connection modes:
FIG. 1 shows the first connection mode: the four extended half-precision multipliers and the four extended single-precision adders are cascaded one to one to form four parallel half-precision multiply-add units.
Based on the above structure, any one of the half-precision multiply-add units can perform the following operations:
1. a half-precision multiply or add operation;
2. a half-precision multiply-add operation;
3. a fixed-point 8-bit or 16-bit multiply or add operation;
4. a fixed-point 8-bit or 16-bit multiply-add operation;
5. a single-precision addition operation;
6. a fixed-point 32-bit addition operation.
It can be seen that the first connection mode forms 4 half-precision multiply-add units, which can perform the above 6 operations in parallel (each unit performing one operation at a time).
It should be noted that, in the embodiments of the present application, according to an existing structure and a known operation type, a person skilled in the art can know how to implement the known operation type using the existing structure, and details are not repeated here. Other configurations can be similar and are not exhaustive in the embodiments of the present application.
Fig. 2 shows a second connection:
the four extended half-precision multipliers are respectively called: a first extended half-precision multiplier, a second extended half-precision multiplier, a third extended half-precision multiplier, and a fourth extended half-precision multiplier. The four extended single-precision adders are referred to as a first extended single-precision adder, a second extended single-precision adder, a third extended single-precision adder, and a fourth extended single-precision adder, respectively. The first extended single-precision adder is cascaded with the first extended half-precision multiplier and the second extended half-precision multiplier, respectively. The second extended single-precision adder is respectively cascaded with the third extended half-precision multiplier and the fourth extended half-precision multiplier. And the third extended single-precision adder is respectively cascaded with the first extended single-precision adder and the second extended single-precision adder. The fourth extended single-precision adder is not connected to other extended single-precision adders or extended half-precision multipliers.
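The difference between the two connection modes can be summarized with a purely structural sketch. It models only the wiring, using exact Python integers in place of the hardware operators and omitting the extension and shift steps; the function names are hypothetical.

```python
def first_mode(operand_pairs, addends):
    """First mode: four independent lanes, lane i computes a_i * b_i + c_i."""
    if len(operand_pairs) != 4 or len(addends) != 4:
        raise ValueError("the unit has four multipliers and four adders")
    return [a * b + c for (a, b), c in zip(operand_pairs, addends)]

def second_mode(operand_pairs):
    """Second mode: four products reduced by a two-level adder tree."""
    if len(operand_pairs) != 4:
        raise ValueError("the unit has four multipliers")
    p1, p2, p3, p4 = (a * b for a, b in operand_pairs)
    first_sum = p1 + p2               # first adder
    second_sum = p3 + p4              # second adder
    return first_sum + second_sum     # third adder; the fourth adder stays free
```

In the first mode the unit produces four results per pass; in the second mode it produces one result that combines all four products.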
Based on the above specific structure, an example of the second connection relationship shown in fig. 2 for implementing the multiply-add operation is:
Specifically, assume two single-precision floating-point numbers A and B:
A = 1.Ma × 2^Ea, B = 1.Mb × 2^Eb, where 1.Ma is the mantissa of A, 2^Ea is the exponent of A, 1.Mb is the mantissa of B, and 2^Eb is the exponent of B.
The product of A and B is: A × B = (1.Ma × 1.Mb) × 2^(Ea + Eb).
1.Ma and 1.Mb are each extended from 23 bits to 32 bits by low-bit extension (zero padding). 1.Ma after extension is:
1.Ma’ = MSBa × 2^-15 + LSBa × 2^-31.
1.Mb after extension is:
1.Mb’ = MSBb × 2^-15 + LSBb × 2^-31.
Here MSB denotes the high-order bits and LSB the low-order bits: MSBa and LSBa are the high-order and low-order bits of the mantissa obtained by extending 1.Ma, and MSBb and LSBb are the high-order and low-order bits of the mantissa obtained by extending 1.Mb. Each MSB and LSB is a 16-bit fixed-point number.
The product of the mantissas is:
X = 1.Ma × 1.Mb = 1.Ma’ × 1.Mb’
= (MSBa × 2^-15 + LSBa × 2^-31) × (MSBb × 2^-15 + LSBb × 2^-31)
= 2^-30 × ((MSBa × MSBb) + 2^-16 × (MSBa × LSBb + LSBa × MSBb) + 2^-32 × (LSBa × LSBb)). (1)
It can be seen that formula (1) contains four products and that the mantissa product is their (shifted) sum. Therefore, based on formula (1), the 4 extended half-precision multipliers in FIG. 2 (specifically, the extended half-precision mantissa multipliers within them) are used to calculate the 4 products in formula (1) in turn, and 3 adders are used to calculate the sum of the 4 products.
That is, FIG. 2 comprises 3 layers from top to bottom. In the first layer, from left to right, the first extended half-precision multiplier calculates MSBa × MSBb, the second calculates MSBa × LSBb, the third calculates LSBa × MSBb, and the fourth calculates LSBa × LSBb. The second layer contains two extended single-precision adders: from left to right, the first extended single-precision adder calculates the sum of MSBa × MSBb and MSBa × LSBb (hereinafter the first addition result), and the second extended single-precision adder calculates the sum of LSBa × MSBb and LSBa × LSBb (hereinafter the second addition result). The extended single-precision adder in the third layer calculates the sum of the first addition result and the second addition result.
It should be noted that the power-of-two factors in formula (1) are realized by shift operations, which can be performed by an existing shift module; see the prior art for details, which are not repeated here. The shift module is not shown in FIG. 2.
When shift operations are involved, the extension of the mantissas by the extended half-precision multiplier protects precision.
As is apparent from the above description, in the configuration shown in FIG. 2, the mantissa operation of one single-precision floating-point multiplication is composed of 4 extended half-precision mantissa (fixed-point) multiplications and 3 extended single-precision mantissa (fixed-point) additions.
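The composition of the mantissa product from four 16-bit products and three additions can be checked numerically. The sketch below is a minimal model under one stated assumption: the significand passed in is the 24-bit value consisting of the 23 stored fraction bits plus the hidden leading 1, zero-padded to 32 bits, so that 1.Ma’ = MSBa × 2^-15 + LSBa × 2^-31. The function names are hypothetical, and exact integers stand in for the extended-width adders.

```python
from fractions import Fraction

def split_extended(sig24):
    """Zero-pad a 24-bit significand to 32 bits and split it into 16-bit halves."""
    if not (1 << 23) <= sig24 < (1 << 24):
        raise ValueError("expected a normalized 24-bit significand")
    ext = sig24 << 8                   # low-bit extension (zero padding)
    return ext >> 16, ext & 0xFFFF     # (MSB, LSB)

def mantissa_product(sig_a, sig_b):
    """Return 1.Ma * 1.Mb exactly, built from four 16x16-bit products."""
    msb_a, lsb_a = split_extended(sig_a)
    msb_b, lsb_b = split_extended(sig_b)
    p1 = msb_a * msb_b                       # first multiplier
    p2 = msb_a * lsb_b                       # second multiplier
    p3 = lsb_a * msb_b                       # third multiplier
    p4 = lsb_a * lsb_b                       # fourth multiplier
    first_sum = (p1 << 32) + (p2 << 16)      # first adder, after shifting
    second_sum = (p3 << 16) + p4             # second adder, after shifting
    total = first_sum + second_sum           # third adder
    # Each extended operand carries a scale of 2^-31, so the product scale is 2^-62.
    return Fraction(total, 1 << 62)
```

For example, the significands `0x800000` (1.0) and `0xC00000` (1.5) give a product of 3/2.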
The exponent operation 2^Ea × 2^Eb in the product of A and B can be performed by any one of the single-precision exponent multipliers, and the mantissa product and the exponent result can be combined by a shift operation (the shift module is prior art and is not shown in FIG. 2). The connection shown in FIG. 2 can therefore perform floating-point multiplication.
With the fourth extended single-precision adder, the results of two floating-point multiplications can be used as the inputs of the fourth extended single-precision adder, thereby realizing a multiply-add operation. That is, one single-precision floating-point multiply-add operation can be composed of 4 extended half-precision mantissa (fixed-point) multiplications, 3 extended single-precision mantissa (fixed-point) additions, and one extended single-precision floating-point addition. In other words, 4 extended half-precision multipliers and 4 extended single-precision adders can form one single-precision floating-point multiply-add unit that is downward compatible with half-precision floating-point multiply-add operations.
Because fixed-point operations can be implemented using the mantissa portion of the floating-point datapath while skipping the exponent portion, the connection shown in FIG. 2 can also implement fixed-point multiplication: 8-bit, 16-bit, or 32-bit, depending on the precision of the extended half-precision multipliers. The 32-bit fixed-point multiplication is implemented in the same way as the mantissa multiplication above (since the input values are already 32 bits wide, the extended half-precision multipliers do not need to extend them). For 8-bit or 16-bit fixed-point multiplication, the extended half-precision multiplier first extends the values, after which the operation proceeds as for the mantissa multiplication and is not described again here.
Here, an example is given of a 32-bit fixed-point multiply-add operation for the connection relationship shown in fig. 2:
Assume two 32-bit fixed-point numbers A and B:
A = MSBa × 2^16 + LSBa, B = MSBb × 2^16 + LSBb, where MSBa is the upper 16 bits of A and LSBa is the lower 16 bits of A; MSBb is the upper 16 bits of B and LSBb is the lower 16 bits of B.
The product of A and B is: A × B = (MSBa × 2^16 + LSBa) × (MSBb × 2^16 + LSBb)
= 2^32 × (MSBa × MSBb) + 2^16 × (MSBa × LSBb + LSBa × MSBb) + (LSBa × LSBb).
Similar to the single-precision floating-point operation, 4 extended half-precision mantissa multipliers, three shifters, and three extended single-precision mantissa adders are used to compute A × B. The lower 32 bits of the result are kept; if the result exceeds the maximum value that 32 bits can represent, saturation or overflow handling is performed according to the requirements of the algorithm.
Finally, an extended single-precision mantissa adder is cascaded to complete a 32-bit fixed point multiply-add operation.
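The decomposition above can be checked with a short software model. This is a sketch under stated assumptions rather than the circuit itself: operands are treated as unsigned 32-bit integers, "overflow processing" is modelled as wrap-around, and the names `mul32_from_16`, `clamp32`, and `mac32` are hypothetical.

```python
MASK16 = 0xFFFF
MAX_U32 = 0xFFFFFFFF


def mul32_from_16(a: int, b: int) -> int:
    """Full product of two unsigned 32-bit values from four 16x16 products."""
    msb_a, lsb_a = a >> 16, a & MASK16
    msb_b, lsb_b = b >> 16, b & MASK16

    # Four half-width multiplications.
    p1 = msb_a * msb_b
    p2 = msb_a * lsb_b
    p3 = lsb_a * msb_b
    p4 = lsb_a * lsb_b

    # Three shifters align the partial products; three adders sum them.
    s1 = (p1 << 32) + (p2 << 16)
    s2 = (p3 << 16) + p4
    return s1 + s2


def clamp32(value: int, saturate: bool) -> int:
    """Reduce a non-negative value to 32 bits by saturating or wrapping."""
    if value > MAX_U32:
        return MAX_U32 if saturate else value & MAX_U32
    return value


def mac32(a: int, b: int, c: int, saturate: bool = True) -> int:
    """32-bit multiply-add: truncate the product, then add c with a fourth adder."""
    product = clamp32(mul32_from_16(a, b), saturate)
    return clamp32(product + c, saturate)
```

For example, `mac32(3, 4, 5)` gives 17, while `mac32(0xFFFFFFFF, 2, 0)` saturates to `0xFFFFFFFF` and wraps to `0xFFFFFFFE` when `saturate=False`.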
In summary, the four extended half-precision multipliers and the four extended single-precision adders constitute a reconfigurable arithmetic unit (fig. 1 and 2) capable of supporting multiply and/or add operations of multiple precisions, such as half-precision, single-precision floating-point numbers, and also capable of supporting multiply and/or add operations of 8-bit, 16-bit, and 32-bit fixed-point numbers, i.e., capable of supporting mixed-precision floating-point and fixed-point multiply and/or add operations.
More importantly, a subset of the operators in the arithmetic unit can support lower-precision operations (for example, a single-precision exponent multiplier together with any one extended half-precision multiplier implements half-precision floating-point multiplication), while the arithmetic unit as a whole supports higher-precision operations. Moreover, the operation of one subset of operators does not affect the other operators connected in parallel, so multiple low-precision operations can run concurrently, and the unit as a whole can also serve as a single high-precision arithmetic unit.
In the prior art, if separate single-precision and half-precision arithmetic units are used, only one of them is active when a task of a given precision is executed, so transistor utilization and energy efficiency are low. If instead only the highest-precision arithmetic unit is provided and low-precision computations are converted into high-precision ones, the energy efficiency of low-precision computation is low (it costs the same as high-precision computation).
Therefore, the arithmetic unit shown in Figs. 1 and 2 can achieve both high transistor utilization and high energy efficiency.
The support for low-precision multiplication and addition is not limited to half-precision floating point and 8-bit and 16-bit fixed point. Any non-standard floating-point data format (e.g., BFloat16) with an exponent of no more than 8 bits and a fraction of no more than 10 bits, and any non-standard fixed-point data format of no more than 16 bits (e.g., 2-bit, 4-bit, or 12-bit fixed point), can be supported with this structure.
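The limits stated above can be written as a simple check. This is only an illustrative sketch: the function names are hypothetical, and the bit widths for half precision (5-bit exponent, 10-bit fraction), BFloat16 (8-bit exponent, 7-bit fraction), and single precision (8-bit exponent, 23-bit fraction) are the commonly published values for those formats, not figures taken from this document.

```python
def low_precision_float_supported(exponent_bits: int, fraction_bits: int) -> bool:
    """True if a float format fits one low-precision lane per the stated limits."""
    return exponent_bits <= 8 and fraction_bits <= 10


def low_precision_fixed_supported(width_bits: int) -> bool:
    """True if a fixed-point format fits one low-precision lane per the stated limits."""
    return 0 < width_bits <= 16


# Half precision and BFloat16 fit a single lane; single precision does not,
# and would instead use the whole unit in the second connection mode.
formats = {"half": (5, 10), "bfloat16": (8, 7), "single": (8, 23)}
lane_ok = {name: low_precision_float_supported(e, f) for name, (e, f) in formats.items()}
```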
It should be noted that switching between the two connection modes (i.e., programming) can be controlled by instructions. The switching is implemented with multiplexers (MUXes).
Fig. 3 is a schematic diagram of a programmable mixed-precision arithmetic unit that uses selectors to switch between the first connection mode and the second connection mode described above in order to implement different operations, wherein:
1. The solid path represents a floating-point single-precision scalar multiply-add:
D = A * B + C = (AM + AL) * (BM + BL) + C, where AM is the upper 12 bits of the mantissa of the single-precision operand A, represented as an extended half-precision floating-point number (8-bit exponent and 16-bit mantissa); AL is the lower 12 bits of the mantissa of A, represented as an extended floating-point number (note that the exponent of AL requires a 2^12 scaling adjustment relative to the exponent of A); BM is the upper 12 bits of the mantissa of the single-precision operand B, represented as an extended half-precision floating-point number (8-bit exponent and 16-bit mantissa); BL is the lower 12 bits of the mantissa of B, represented as an extended floating-point number (note that the exponent of BL requires a 2^12 scaling adjustment relative to the exponent of B); and C is the single-precision floating-point addend.
2. The dashed path represents a (4x16) floating-point half-precision vector multiply / multiply-add: d_i = c_i + a_i * b_i, where i = 0, 1, 2, 3; d_i is the corresponding element of vector {d0, d1, d2, d3}; c_i is the corresponding element of vector {c0, c1, c2, c3}; a_i is the corresponding element of vector {a0, a1, a2, a3}; and b_i is the corresponding element of vector {b0, b1, b2, b3}.
3. When selectors a and b select the dashed paths and the remaining selectors select the solid paths, the unit computes a (4x16) floating-point half-precision vector dot product:
A * B^T = sum over k = 0..3 of (a_k * b_k),
where A represents vector A, B^T represents the transpose of vector B, a_k is the corresponding element of vector A {a0, a1, a2, a3}, and b_k is the corresponding element of vector B {b0, b1, b2, b3}.
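The three selector configurations listed above can be summarized with a purely functional model. This is a sketch under assumptions, not a description of the actual datapath: the function names are hypothetical, the values are plain Python numbers rather than extended-precision hardware formats, and the dot-product mode omits any accumulator input because the text does not state whether the fourth adder contributes one in that mode.

```python
from typing import List, Sequence


def multiply_lanes(xs: Sequence, ys: Sequence) -> List:
    """The four multipliers operating side by side."""
    return [x * y for x, y in zip(xs, ys)]


def adder_tree(products: Sequence):
    """Second connection mode: two adders feed a third adder."""
    first = products[0] + products[1]
    second = products[2] + products[3]
    return first + second


def scalar_multiply_add(am, al, bm, bl, c):
    """Solid path: D = (AM + AL) * (BM + BL) + C using four partial products."""
    products = multiply_lanes([am, am, al, al], [bm, bl, bm, bl])
    return adder_tree(products) + c  # the fourth adder adds C


def vector_multiply_add(a: Sequence, b: Sequence, c: Sequence) -> List:
    """Dashed path: d_i = c_i + a_i * b_i on four independent lanes."""
    return [ci + pi for ci, pi in zip(c, multiply_lanes(a, b))]


def dot_product(a: Sequence, b: Sequence):
    """Mixed path: independent lane inputs, shared adder tree."""
    return adder_tree(multiply_lanes(a, b))
```

For instance, `dot_product([1, 2, 3, 4], [5, 6, 7, 8])` evaluates to 70, and `vector_multiply_add([1, 2, 3, 4], [5, 6, 7, 8], [1, 1, 1, 1])` evaluates to `[6, 13, 22, 33]`.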
Fig. 4 is a schematic diagram of another programmable mixed-precision arithmetic unit according to an embodiment of the present application, including four extended single-precision multipliers and four extended double-precision adders.
The extended double precision adder is used for extending the input numerical value to M bits and calculating the sum of the extended numerical values.
Specifically, the range of M includes: (1) 63 to 100 bits, where the portion beyond 64 bits (for addition) serves as extra precision to maintain the precision of intermediate results, without affecting the implementation of the invention; and (2) 46 to 62 bits, which, compared with (1), merely lacks support for 64-bit fixed-point calculation and is likewise covered by the invention.
The arithmetic unit in Fig. 4 differs from the arithmetic unit shown in Fig. 1 as follows: 1. the extended single-precision exponent multiplier/adder is replaced with an extended double-precision exponent multiplier/adder; 2. the extended half-precision mantissa multiplier is replaced with an extended single-precision mantissa multiplier; 3. the extended single-precision mantissa adder is replaced with an extended double-precision mantissa adder. The connection relationship in the arithmetic unit shown in Fig. 4 is the same as that shown in Fig. 1.
Each extended single-precision multiplier in the arithmetic unit shown in Fig. 4 may be the portion of the programmable mixed-precision arithmetic unit shown in Fig. 2 other than the standalone extended single-precision adder (i.e., the fourth extended single-precision adder). Accordingly, the range of the extended bit width of the extended single-precision multiplier includes: (1) 32 to 50 bits, where the portion beyond 32 bits (for multiplication) serves as extra precision to maintain the precision of intermediate results, without affecting the implementation of the invention; and (2) 23 to 31 bits, which, compared with (1), merely lacks support for 64-bit fixed-point calculation and is likewise covered by the invention.
Fig. 5 shows another connection relationship of the four extended single-precision multipliers and four extended double-precision adders. Fig. 5 differs from Fig. 2 as follows: 1. the extended single-precision exponent multiplier/adder is replaced with an extended double-precision exponent multiplier/adder; 2. the extended half-precision mantissa multiplier is replaced with an extended single-precision mantissa multiplier; 3. the extended single-precision mantissa adder is replaced with an extended double-precision mantissa adder.
Likewise, each extended single-precision multiplier in the arithmetic unit shown in Fig. 5 may be the portion of the programmable mixed-precision arithmetic unit shown in Fig. 2 other than the standalone extended single-precision adder (i.e., the fourth extended single-precision adder). Correspondingly, the reconfigurable structure composed of the four extended single-precision multipliers and the four extended double-precision adders supports the following operation types:
1. 1 double-precision floating-point multiply-add operation;
2. 1 64-bit fixed point multiply-add operation;
3. 4 concurrent fixed-point 8-bit, 16-bit or 32-bit multiply-add operations;
4. 4 concurrent 64-bit fixed-point addition operations;
5. 4 concurrent double precision addition operations.
For the processes by which the arithmetic unit shown in Fig. 4 or Fig. 5 implements multiplication, addition, and multiply-add, refer to the processes described for Fig. 1 or Fig. 2. The difference lies in the precision of the values involved; for example, the high-order value in the formulas above becomes the upper 32 bits of the extended mantissa. Details are not repeated here.
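The nesting implied here, where each wider multiplier is itself built from four narrower ones, can be sketched recursively. This is an illustrative model only: it assumes unsigned operands, a 16-bit leaf multiplier, and widths that halve cleanly; the name `split_multiply` and its parameters are hypothetical.

```python
def split_multiply(a: int, b: int, width: int, base_width: int = 16) -> int:
    """Multiply two unsigned `width`-bit integers using only leaf multiplies.

    Each level splits both operands into upper and lower halves, forms four
    half-width products, and combines them with shifts and three additions.
    """
    if width <= base_width:
        return a * b  # leaf: one narrow hardware multiplier

    half = width // 2
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask

    p1 = split_multiply(a_hi, b_hi, half, base_width)
    p2 = split_multiply(a_hi, b_lo, half, base_width)
    p3 = split_multiply(a_lo, b_hi, half, base_width)
    p4 = split_multiply(a_lo, b_lo, half, base_width)

    s1 = (p1 << (2 * half)) + (p2 << half)
    s2 = (p3 << half) + p4
    return s1 + s2
```

With `width=64`, the first level splits each operand into 32-bit halves, matching the upper 32 bits and lower 32 bits mentioned above, and each 32-bit product is in turn formed from 16-bit halves.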
Fig. 6 is a schematic diagram of yet another programmable mixed-precision arithmetic unit (which may also be referred to as a multiply-add operator) disclosed in an embodiment of the present application. It comprises programmable mixed-precision arithmetic units connected in parallel (each denoted Kernel in Fig. 6), where each such unit may be the arithmetic unit shown in Figs. 1 and 2 or the one shown in Figs. 4 and 5. It should be noted that Fig. 6 is only one example of a parallel arrangement; this embodiment does not limit the manner or dimensions of the parallel connection.
The programmable mixed-precision arithmetic unit shown in Fig. 6 can be applied to vector multiply-add operations; the vectors may be one-dimensional, two-dimensional, or of other dimensionality, which is not limited here.
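One way to picture the parallel arrangement is to tile a longer vector across several kernels. The sketch below is an assumption-laden illustration: it treats each kernel as four independent multiply-add lanes (the first connection mode), models the kernels with a sequential loop even though hardware kernels would run concurrently, and interprets the two-dimensional case as an element-wise operation applied row by row. All names are hypothetical.

```python
from typing import List, Sequence

LANES_PER_KERNEL = 4  # assumed: four multiply-add lanes per kernel


def kernel_multiply_add(a: Sequence, b: Sequence, c: Sequence) -> List:
    """One kernel: up to four independent d_i = c_i + a_i * b_i lanes."""
    return [ci + ai * bi for ai, bi, ci in zip(a, b, c)]


def vector_multiply_add(a: Sequence, b: Sequence, c: Sequence) -> List:
    """Tile a one-dimensional multiply-add across as many kernels as needed."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("operand vectors must have equal length")
    result: List = []
    for start in range(0, len(a), LANES_PER_KERNEL):
        stop = start + LANES_PER_KERNEL
        result.extend(kernel_multiply_add(a[start:stop], b[start:stop], c[start:stop]))
    return result


def matrix_multiply_add(a: Sequence, b: Sequence, c: Sequence) -> List[List]:
    """Two-dimensional case, treated row by row as element-wise multiply-add."""
    return [vector_multiply_add(ra, rb, rc) for ra, rb, rc in zip(a, b, c)]
```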
The functions described in the method of the embodiments of the present application, if implemented in the form of software functional units and sold or used as independent products, may be stored in a storage medium readable by a computing device. Based on this understanding, the part of the embodiments of the present application that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A programmable mixed-precision arithmetic unit, comprising:
four extended half-precision multipliers and four extended single-precision adders;
any one of the expanded half-precision multipliers is used for expanding an input numerical value to X bits and calculating the product of a first numerical value and a second numerical value, wherein the first numerical value is a high-order numerical value or a low-order numerical value in the expanded numerical value of one input numerical value, the second numerical value is a high-order numerical value or a low-order numerical value in the expanded numerical value of the other input numerical value, X is a preset numerical value, and the value range of X comprises 11 ~ 22 bits;
any one of the extended single-precision adders is used for extending an input numerical value to Y bits and calculating the sum of the extended numerical values, the calculated extended numerical value is a numerical value in which the product of a first numerical value and a second numerical value is extended to Y bits, Y is a preset numerical value, and the value range of Y comprises 22 ~ 30 bits and 32 ~ 44 bits;
wherein the four extended half-precision multipliers and the four extended single-precision adders are connected in a first manner or a second manner;
the first mode is as follows: the four extended half-precision multipliers and the four extended single-precision adders are correspondingly cascaded one by one to form four parallel half-precision multiply-add devices;
the second mode is as follows: the first extended single-precision adder is respectively cascaded with the first extended half-precision multiplier and the second extended half-precision multiplier;
the second extended single-precision adder is respectively cascaded with the third extended half-precision multiplier and the fourth extended half-precision multiplier;
and the third extended single-precision adder is respectively cascaded with the first extended single-precision adder and the second extended single-precision adder.
2. A programmable mixed-precision arithmetic unit as claimed in claim 1 wherein said extended half-precision multiplier comprises:
a single-precision exponent multiplier and an extended half-precision mantissa multiplier which are connected in parallel;
the extended half-precision mantissa multiplier is configured to extend an input value to X bits and calculate a product of the first value and the second value.
3. A programmable mixed-precision arithmetic unit as claimed in claim 1 wherein said extended single-precision adder comprises:
a single-precision exponent adder and an extended single-precision mantissa adder which are connected in parallel;
the extended single-precision mantissa adder is used for extending an input numerical value to Y bits and calculating the sum of numerical values extended to the Y bits.
4. A programmable mixed-precision arithmetic unit, comprising:
four extended single precision multipliers and four extended double precision adders;
the extended single-precision multiplier is a programmable mixed-precision arithmetic unit as claimed in any one of claims 1 to 3;
the extended double-precision adder is used for extending an input numerical value to M bits and calculating the sum of numerical values after the input numerical value is extended to the M bits, wherein M is a preset numerical value, and the value range of M comprises 46 ~ 100 bits;
wherein the four extended single-precision multipliers and the four extended double-precision adders are connected in a first manner or a second manner;
the first mode is as follows: the four extended single-precision multipliers and the four extended double-precision adders are correspondingly cascaded one by one to form four single-precision multiply-add devices connected in parallel;
the second mode is as follows: the first extended double-precision adder is respectively cascaded with the first extended single-precision multiplier and the second extended single-precision multiplier;
the second extended double-precision adder is respectively cascaded with the third extended single-precision multiplier and the fourth extended single-precision multiplier;
the third extended double-precision adder is respectively cascaded with the first extended double-precision adder and the second extended double-precision adder.
5. A programmable mixed-precision arithmetic unit as claimed in claim 4, wherein said extended double-precision adder comprises:
a double-precision exponent adder and an extended double-precision mantissa adder which are connected in parallel;
the extended double-precision mantissa adder is used for extending an input numerical value to M bits and calculating the sum of the numerical values after the input numerical value is extended to M bits.
6. A programmable mixed-precision arithmetic unit, comprising:
a programmable mixed-precision arithmetic unit according to any one of claims 1 to 3 connected in parallel, or a programmable mixed-precision arithmetic unit according to any one of claims 4 to 5 connected in parallel.
7. A programmable mixed-precision arithmetic unit, comprising:
four extended half-precision multipliers and three extended single-precision adders;
the first extended half-precision multiplier is used for extending an input numerical value to X bits and then calculating MSBa * MSBb to obtain a first product;
the second extended half-precision multiplier is used for extending the input numerical value to X bits and then calculating MSBa * LSBb to obtain a second product;
the third extended half-precision multiplier is used for extending the input numerical value to X bits and then calculating LSBa * MSBb to obtain a third product;
the fourth extended half-precision multiplier is used for extending the input numerical value to X bits and then calculating LSBa * LSBb to obtain a fourth product, wherein the input numerical value is a first numerical value and a second numerical value, MSBa is the high order of the extended first numerical value, MSBb is the high order of the extended second numerical value, LSBa is the low order of the extended first numerical value, LSBb is the low order of the extended second numerical value, and the value range of X comprises 11 ~ 22 bits;
the first extended single-precision adder is used for extending the first product and the second product to Y bits and then calculating the sum of the extended first product and the extended second product to obtain a first addition result;
the second extended single-precision adder is used for extending the third product and the fourth product to Y bits, and then calculating the sum of the extended third product and the extended fourth product to obtain a second addition result;
and the third extended single-precision adder is used for extending the first addition result and the second addition result to Y bits, and calculating the sum of the extended first addition result and the extended second addition result, wherein the value range of Y comprises 22 ~ 30 bits and 32 ~ 44 bits.
8. A programmable mixed-precision arithmetic unit as claimed in claim 7, further comprising:
and a fourth extended single-precision adder, configured to extend two input single-precision values to Y bits, and then calculate a sum of the two extended single-precision values, where any one of the input single-precision values is the sum of the first addition result and the second addition result.
9. A programmable mixed-precision arithmetic unit, comprising:
four extended single-precision multipliers and three extended double-precision adders, wherein the value range of M comprises 46 ~ 100 bits;
the extended single-precision multiplier is used for realizing the function of the programmable mixed-precision arithmetic unit in claim 7;
the first extended double-precision adder is used for extending the first product and the second product to M bits, and then calculating the sum of the extended first product and the extended second product to obtain a first addition result; the first product is an output result of a first extended single-precision multiplier, and the second product is an output result of a second extended single-precision multiplier;
the second extended double-precision adder is used for extending the third product and the fourth product to M bits, and then calculating the sum of the extended third product and the extended fourth product to obtain a second addition result; the third product is an output result of a third extended single-precision multiplier, and the fourth product is an output result of a fourth extended single-precision multiplier;
and the third extended double-precision adder is used for calculating the sum of the extended first addition result and the second addition result after the first addition result and the second addition result are extended to M bits.
10. A programmable mixed-precision arithmetic unit as claimed in claim 9, further comprising:
and a fourth extended double-precision adder, configured to extend two input double-precision values to M bits, and then calculate a sum of the two extended double-precision values, where any one of the input double-precision values is the sum of the first addition result and the second addition result.
CN201811514918.2A 2018-12-12 2018-12-12 Programmable mixed precision arithmetic unit Active CN109634558B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811514918.2A CN109634558B (en) 2018-12-12 2018-12-12 Programmable mixed precision arithmetic unit

Publications (2)

Publication Number Publication Date
CN109634558A CN109634558A (en) 2019-04-16
CN109634558B true CN109634558B (en) 2020-01-14

Family

ID=66073086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811514918.2A Active CN109634558B (en) 2018-12-12 2018-12-12 Programmable mixed precision arithmetic unit

Country Status (1)

Country Link
CN (1) CN109634558B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110389746B (en) * 2019-07-29 2021-04-23 中国电子科技集团公司第二十四研究所 Hardware acceleration circuit, micro control chip and system
US20230076931A1 (en) * 2019-10-14 2023-03-09 Anhui Cambricon Information Technology Co., Ltd. Multiplier for floating-point operation, method, integrated circuit chip, and calculation device
US11275561B2 (en) * 2019-12-12 2022-03-15 International Business Machines Corporation Mixed precision floating-point multiply-add operation
CN111666077B (en) * 2020-04-13 2022-02-25 北京百度网讯科技有限公司 Operator processing method and device, electronic equipment and storage medium
CN111784489A (en) * 2020-06-28 2020-10-16 广东金宇恒软件科技有限公司 Financial accounting management system based on big data
CN111626414B (en) * 2020-07-30 2020-10-27 电子科技大学 Dynamic multi-precision neural network acceleration unit
CN112506468B (en) * 2020-12-09 2023-04-28 上海交通大学 RISC-V general processor supporting high throughput multi-precision multiplication operation

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1279781A (en) * 1997-11-26 2001-01-10 阿塔迈尔公司 Apparatus for multiprecision integer arithmetic
CN101916177A (en) * 2010-07-26 2010-12-15 清华大学 Configurable multi-precision fixed point multiplying and adding device
CN107153522A (en) * 2017-04-21 2017-09-12 东南大学 A kind of dynamic accuracy towards artificial neural networks can match somebody with somebody approximate multiplier
CN107967132A (en) * 2017-11-27 2018-04-27 中国科学院计算技术研究所 A kind of adder and multiplier for neural network processor
CN108564168A (en) * 2018-04-03 2018-09-21 中国科学院计算技术研究所 A kind of design method to supporting more precision convolutional neural networks processors
CN108694038A (en) * 2017-04-12 2018-10-23 英特尔公司 Dedicated processes mixed-precision floating-point operation circuit in the block
CN108958705A (en) * 2018-06-26 2018-12-07 天津飞腾信息技术有限公司 A kind of floating-point fusion adder and multiplier and its application method for supporting mixed data type

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2522194B (en) * 2014-01-15 2021-04-28 Advanced Risc Mach Ltd Multiply adder

Also Published As

Publication number Publication date
CN109634558A (en) 2019-04-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 200120 room a-522, 188 Yesheng Road, Lingang xinpian District, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Patentee after: Shanghai Suiyuan Technology Co.,Ltd.

Address before: 201203 Room 302, building 2, zhangrun building, Lane 61, shengxia Road, Pudong New Area, Shanghai

Patentee before: SHANGHAI ENFLAME TECHNOLOGY Co.,Ltd.