CN117742657A - Arithmetic unit, arithmetic logic unit and processor - Google Patents


Publication number
CN117742657A
Authority
CN
China
Prior art keywords
integer
bits
multiplier
bit
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211108747.XA
Other languages
Chinese (zh)
Inventor
周金元
刘璐
邹云晓
刘偲旸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pingtouge Shanghai Semiconductor Co Ltd
Original Assignee
Pingtouge Shanghai Semiconductor Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pingtouge Shanghai Semiconductor Co Ltd filed Critical Pingtouge Shanghai Semiconductor Co Ltd
Priority to CN202211108747.XA priority Critical patent/CN117742657A/en
Publication of CN117742657A publication Critical patent/CN117742657A/en
Pending legal-status Critical Current

Landscapes

  • Complex Calculations (AREA)

Abstract

An arithmetic unit, an arithmetic logic unit, and a processor are disclosed. The arithmetic unit operates on a plurality of integers having N bits and comprises a first multiplier, a second multiplier, a third multiplier, a logic operator and a first adder. The first multiplier multiplies integers having P bits, the second multiplier multiplies integers having Q bits, and the third multiplier multiplies integers having R bits. The logic operator performs a plurality of AND operations. In a first stage, the first multiplier, the second multiplier, the third multiplier and the logic operator each receive corresponding bits of a first input integer and a second input integer and operate on them, and the first adder sums the operation results to obtain an operation integer that includes the lowest (N+1) bits of the product of the first input integer and the second input integer.

Description

Arithmetic unit, arithmetic logic unit and processor
Technical Field
The present invention relates to an operator, and more particularly, to a high-order integer multiplier capable of multiplexing a plurality of low-order multipliers.
Background
The arithmetic logic unit (ALU) is a basic component of a computing circuit. It is commonly used in central processing units (CPUs) and graphics processing units (GPUs) and can perform arithmetic operations or bit operations on binary integers. In the prior art, an arithmetic logic unit often includes several kinds of operators to perform different types of operations. For example, an arithmetic logic unit often includes a 32-bit floating-point multiply-adder and a 16-bit floating-point multiply-adder to perform floating-point operations. According to the floating-point representation, a 32-bit floating-point number may include a 1-bit sign, an 8-bit exponent and a 23-bit mantissa. Only the mantissa portions are multiplied, and together with the implicit leading bit of a normalized number the mantissa is 24 bits wide, so a 32-bit floating-point multiply-adder typically uses only a 24-bit multiplier. Similarly, a 16-bit floating-point number may include a 1-bit sign, a 5-bit exponent and a 10-bit mantissa, so a 16-bit floating-point multiply-adder typically uses only an 11-bit multiplier.
However, in some applications the processor may need to perform integer multiplications with a higher number of bits, such as 32-bit integer multiplications. In this case, the arithmetic logic unit cannot complete the required operation using only the 32-bit floating-point multiply-adder and may require an additional 32-bit integer multiplier. Since a 32-bit multiplier occupies a very large area, adding a 32-bit integer multiplier significantly increases the area required by the processor. How to efficiently use the lower-bit multipliers in the arithmetic logic unit to perform higher-bit integer multiplication without excessively increasing the calculation time has therefore become a problem to be solved in the art.
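The field widths described above can be illustrated with a minimal Python sketch. It assumes the IEEE 754 binary32 layout (the text only says "the floating-point representation"), and the helper name float32_fields is hypothetical. It shows why the multiplier operand is 24 bits wide: the 23 stored mantissa bits plus the implicit leading 1 of a normal number.

```python
import struct

def float32_fields(x):
    """Split x, rounded to IEEE 754 binary32, into (sign, exponent, fraction, significand)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                   # 1 sign bit
    exponent = (bits >> 23) & 0xFF      # 8 exponent bits (biased)
    fraction = bits & 0x7FFFFF          # 23 stored mantissa bits
    # For normal numbers the leading 1 is implicit, giving a 24-bit significand.
    is_normal = 0 < exponent < 0xFF
    significand = fraction | (1 << 23) if is_normal else fraction
    return sign, exponent, fraction, significand

sign, exponent, fraction, significand = float32_fields(1.5)
print(sign, exponent, hex(fraction), significand.bit_length())
```

The product of two such significands needs at most 48 bits, which is the work a 24-bit multiplier performs.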
Disclosure of Invention
An embodiment of the present application provides an arithmetic unit for operating on a plurality of integers having N bits. The arithmetic unit includes a first multiplier, a second multiplier, a third multiplier, a logic operator and a first adder. The first multiplier is configured to multiply integers having P bits, the second multiplier is configured to multiply integers having Q bits, and the third multiplier is configured to multiply integers having R bits, where N, P, Q and R are integers and N is greater than P, Q and R. The logic operator is configured to perform a plurality of AND operations. In a first stage, the first multiplier, the second multiplier, the third multiplier and the logic operator each receive corresponding bits of a first input integer and a second input integer and operate on them, and the first adder receives at least the operation results of the first multiplier, the second multiplier, the third multiplier and the logic operator and sums them to obtain a first operation integer. The first operation integer includes the lowest (N+1) bits of the product of the first input integer and the second input integer.
Another embodiment of the present application provides an arithmetic logic unit comprising the aforementioned arithmetic unit.
Another embodiment of the present application provides a processor comprising the arithmetic logic unit described above.
The arithmetic unit, the arithmetic logic unit and the processor can effectively multiplex the multiplier with lower digits to complete multiply-add operation with higher digits, so that the area required by the arithmetic unit can be reduced, and the design and functional flexibility of the arithmetic logic unit and the processor can be improved.
Drawings
Aspects of the disclosure are better understood from the following embodiments when read in conjunction with the accompanying drawings. It should be noted that the various structures are not drawn to scale according to standard practice in the industry. In fact, the dimensions of the various structures may be arbitrarily increased or decreased for clarity of discussion.
Fig. 1 is a schematic diagram of an embodiment of a processor of the present application.
Fig. 2 is a schematic diagram of an embodiment of the operator of fig. 1.
Fig. 3 is a schematic diagram of the operator of fig. 2 multiplying an input integer in a first stage.
Fig. 4 is a schematic diagram of the operator of fig. 2 multiplying an input integer in a second stage.
Fig. 5 is a schematic diagram of another embodiment of the operator of the present application.
Fig. 6 is a schematic diagram of the operator of fig. 5 multiplying an input integer in a first stage.
Fig. 7 is a schematic diagram of the operator of fig. 5 multiplying an input integer in a second stage.
Detailed Description
The following disclosure provides many different embodiments, or examples, of the different means for implementing the provided subject matter. Specific examples of components and arrangements are described below to simplify the present disclosure. Of course, such is merely an example and is not intended to be limiting. For example, in the following description, the formation of a first member over or on a second member may include embodiments in which the first member and the second member are formed in direct contact, and may also include embodiments in which additional members may be formed between the first member and the second member such that the first member and the second member may not be in direct contact. In addition, the present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.
Moreover, for ease of description, spatially relative terms such as "under," "below," "lower," "over," "upper," and the like may be used herein to describe the relationship of one component or member to another component or member illustrated in the figures. In addition to the orientations depicted in the drawings, the spatially relative terms are intended to encompass different orientations of the device in use or operation. The apparatus may be otherwise oriented (rotated 90 degrees or at other orientations), and the spatially relative descriptors used herein may likewise be interpreted accordingly.
As used herein, terms such as "first," "second," and "third" describe various components, regions, layers and/or sections, but such components, regions, layers and/or sections should not be limited by such terms. Such terms may be used only to distinguish one component, region, layer or section from another. The terms such as "first," "second," and "third" when used herein do not imply a sequence or order unless clearly indicated by the context.
The singular forms "a", "an" and "the" may include plural forms as well, unless the context clearly indicates otherwise. The terms "connected" and "coupled," along with their derivatives, may be used herein to describe structural relationships between parts. "Connected" may be used to describe two or more elements in direct physical or electrical contact with each other. "Coupled" may be used to indicate that two or more elements are in direct or indirect (with intervening elements between them) physical or electrical contact with each other, and/or that the two or more elements cooperate or interact with each other.
Fig. 1 is a schematic diagram of an embodiment of a processor of the present application. The processor 100 may include an instruction fetch unit 110, an instruction decode unit 120, a reorder buffer 130, an arithmetic logic unit 140, a load store unit 150, a register file 160, and a memory 170. In this embodiment, the instruction fetch unit 110 may sequentially fetch the instructions to be executed by the processor 100, and the instruction decode unit 120 may parse the operations required by the instructions. The reorder buffer 130 may adjust the actual execution order of the operations according to the state of the hardware, and send the operations to the arithmetic logic unit 140 or the load store unit 150 for execution. The arithmetic logic unit 140 can perform various types of operations, such as multiplication, addition, and bitwise logic operations, on the data stored in the register file 160, and store the operation results in the register file 160. In addition, the load store unit 150 may read from and write to the register file 160 and the memory 170. Although the reorder buffer 130 does not necessarily send the operations to the arithmetic logic unit 140 or the load store unit 150 in the original program order, after the operations are completed the reorder buffer 130 still commits them in the original program order, so that the correctness of the program is maintained.
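The in-order commit behavior attributed to the reorder buffer 130 can be pictured with a toy Python model. This is only an illustrative sketch and not the structure of the reorder buffer in fig. 1: the function name commit_in_order and the string operation labels are hypothetical, and real hardware tracks far more state.

```python
from collections import deque

def commit_in_order(program, completion_order):
    """Toy model: operations finish in any order but are committed in program order."""
    done = set()
    pending = deque(program)        # operations not yet committed, in program order
    committed = []
    for op in completion_order:
        done.add(op)
        # Commit from the head of the queue only while the oldest operation has finished.
        while pending and pending[0] in done:
            committed.append(pending.popleft())
    return committed

print(commit_in_order(["mul", "add", "load"], ["load", "mul", "add"]))
```

Even though "load" finishes first in this example, it is committed last because the older "mul" and "add" must be committed before it.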
In the processor 100, since the arithmetic logic unit 140 performs various operations, it generally includes several kinds of operators, such as the operator 142, to satisfy different operation requirements, and therefore often occupies a large circuit area. Fig. 2 is a schematic diagram of an embodiment of the operator 142 in the arithmetic logic unit 140. In this embodiment, to reduce the area required by the arithmetic logic unit 140, the operator 142 can multiplex the lower-bit multipliers MTP1A and MTP2A in the floating-point multiply-adders 1421 and 1422 to complete higher-bit integer multiplication.
As shown in fig. 2, the operator 142 may include a 32-bit floating-point multiply-adder 1421, a 16-bit floating-point multiply-adder 1422, a multiplier MTP3A, a logic operator LGC1A, and an adder ADD1A. The multiplier MTP1A in the floating-point multiply-adder 1421 may multiply integers having 24 bits, the multiplier MTP2A in the floating-point multiply-adder 1422 may multiply integers having 11 bits, and the multiplier MTP3A may multiply integers having 8 bits. In this embodiment, the operator 142 may perform a 32-bit integer multiply-add operation by using the multipliers MTP1A, MTP2A and MTP3A, the logic operator LGC1A and the adder ADD1A. For example, it may multiply the 32-bit input integers INT1A and INT2A and add their product to another 64-bit input integer INT3A to obtain a 64-bit output integer OINT1A, as shown in formula (0).
OINT1A = INT1A × INT2A + INT3A    formula (0)
For example, in the first stage, the multipliers MTP1A, MTP2A and MTP3A and the logic operator LGC1A may each receive corresponding bits of the input integers INT1A and INT2A and operate on them. The adder ADD1A may receive the operation results of the multipliers MTP1A, MTP2A and MTP3A and the logic operator LGC1A and add them to the lowest 32 bits INT3A[31:0] of the input integer INT3A to obtain the operation integer PINT1A, which includes the lowest 32 bits OINT1A[31:0] of the output integer OINT1A.
Fig. 3 is a schematic diagram of the operator 142 multiplying the input integers INT1A and INT2A in the first stage. In the first stage, the multiplier MTP1A may multiply the lowest 24 bits INT1A[23:0] of the input integer INT1A by the lowest 24 bits INT2A[23:0] of the input integer INT2A to generate the operation result RA1A, the multiplier MTP2A may multiply the highest 8 bits INT1A[31:24] of the input integer INT1A by the lowest 9 bits INT2A[8:0] of the input integer INT2A to generate the operation result RA2A, and the multiplier MTP3A may multiply the lowest 8 bits INT1A[7:0] of the input integer INT1A by the highest 8 bits INT2A[31:24] of the input integer INT2A to generate the operation result RA3A. In addition, the logic operator LGC1A may perform an AND operation on the 9th lowest bit INT1A[8] of the input integer INT1A and the 25th lowest bit INT2A[24] of the input integer INT2A to generate the operation result RA4A. Then, the adder ADD1A may add the operation results RA1A, RA2A, RA3A and RA4A and the lowest 32 bits INT3A[31:0] of the input integer INT3A to generate the operation integer PINT1A.
In the first stage, the operations of the multipliers MTP1A, MTP2A and MTP3A, the logic operator LGC1A and the adder ADD1A are shown in formulas (1) to (5), respectively.
RA1A = INT1A[23:0] × INT2A[23:0]    formula (1)
RA2A = INT1A[31:24] × INT2A[8:0]    formula (2)
RA3A = INT1A[7:0] × INT2A[31:24]    formula (3)
RA4A = INT1A[8] & INT2A[24]    formula (4)
PINT1A = RA1A + RA2A + RA3A + RA4A + INT3A[31:0]    formula (5)
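The first stage can be sketched in Python as follows. The formulas do not spell out how the partial results are aligned before the addition, so the left shifts below (24, 24 and 32 bits) are an assumption inferred from the bit positions of the operands; the helper names bits and stage_one are hypothetical. Under that assumption, every partial product whose bit positions sum to 32 or less is counted exactly once, which is why the lowest 33 bits of the product come out exactly.

```python
def bits(x, hi, lo):
    """Return the bit field x[hi:lo] as an integer."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)

def stage_one(int1a, int2a, int3a=0):
    ra1a = bits(int1a, 23, 0) * bits(int2a, 23, 0)    # formula (1), 24-bit multiplier
    ra2a = bits(int1a, 31, 24) * bits(int2a, 8, 0)    # formula (2), 11-bit multiplier
    ra3a = bits(int1a, 7, 0) * bits(int2a, 31, 24)    # formula (3), 8-bit multiplier
    ra4a = bits(int1a, 8, 8) & bits(int2a, 24, 24)    # formula (4), single AND
    # Assumed alignment: each term is shifted to the weight of its lowest operand bits.
    return ra1a + (ra2a << 24) + (ra3a << 24) + (ra4a << 32) + bits(int3a, 31, 0)

a, b, c = 0x89ABCDEF, 0x12345678, 0x0123456789ABCDEF
pint1a = stage_one(a, b, c)
# The lowest 32 bits already match the full multiply-add result.
assert (pint1a & 0xFFFFFFFF) == ((a * b + c) & 0xFFFFFFFF)
```

Python integers are unbounded, so the sketch does not model the finite width of the adder ADD1A; only the bits up to PINT1A[32] are meaningful here.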
In this case, the operation integer PINT1A corresponds to the sum of the lowest 33 bits of the product of the input integers INT1A and INT2A and the lowest 32 bits INT3A[31:0] of the input integer INT3A. In some embodiments, the arithmetic logic unit 140 may only need some of the bits of the output integer OINT1A, such as the lowest 32 bits OINT1A[31:0]; the operator 142 may then directly output the lowest 32 bits PINT1A[31:0] of the operation integer PINT1A.
In the second stage, after the calculation of the first stage is completed, the multipliers MTP1A, MTP2A and MTP3A and the logic operator LGC1A may each receive corresponding bits of the input integers INT1A and INT2A and operate on them, and the adder ADD1A may receive their operation results and add them to the highest 32 bits INT3A[63:32] of the input integer INT3A to obtain the operation integer PINT2A.
Fig. 4 is a schematic diagram of the operator 142 multiplying the input integers INT1A and INT2A in the second stage. In the second stage, the multiplier MTP1A multiplies the highest 24 bits INT1A[31:8] of the input integer INT1A by the highest 24 bits INT2A[31:8] of the input integer INT2A and shifts the result toward the low-order side by 13 bits to generate the operation result RA5A. The multiplier MTP2A multiplies the highest 11 bits INT1A[31:21] of the input integer INT1A by the lowest 8 bits INT2A[7:0] of the input integer INT2A and shifts the result toward the low-order side by 8 bits to generate the operation result RA6A. The multiplier MTP3A multiplies the lowest 8 bits INT1A[7:0] of the input integer INT1A by the highest 8 bits INT2A[31:24] of the input integer INT2A and shifts the result toward the low-order side by 5 bits to generate the operation result RA7A.
In addition, the logic operator LGC1A may perform an AND operation on the 8th lowest bit INT1A[7] of the input integer INT1A and the 24th lowest bit INT2A[23] of the input integer INT2A to generate the operation result RA8A, perform an AND operation on the 7th lowest bit INT1A[6] of the input integer INT1A and the 24th lowest bit INT2A[23] of the input integer INT2A to generate the operation result RA9A, and perform an AND operation on the 8th lowest bit INT1A[7] of the input integer INT1A and the 23rd lowest bit INT2A[22] of the input integer INT2A to generate the operation result RA11A. The operation result RA8A corresponds to the product of INT1A[7] and INT2A[23], and the operation result RA9A corresponds to the product of INT1A[6] and INT2A[23]; the logic operator LGC1A may shift the operation result RA8A toward the high-order side by 1 bit and combine it with the operation result RA9A to form the operation result RA10A. The operation result RA11A corresponds to the product of INT1A[7] and INT2A[22]. Then, the adder ADD1A may add the operation results RA5A, RA6A, RA7A, RA10A and RA11A, together with the highest 32 bits INT3A[63:32] of the input integer INT3A shifted toward the high-order side by 3 bits, to generate the operation integer PINT2A.
In the second stage, the operations of the multipliers MTP1A, MTP2A and MTP3A, the logic operator LGC1A and the adder ADD1A are shown in formulas (6) to (13), respectively.
RA5A = (INT1A[31:8] × INT2A[31:8]) >> 13    formula (6)
RA6A = (INT1A[31:21] × INT2A[7:0]) >> 8    formula (7)
RA7A = (INT1A[7:0] × INT2A[31:24]) >> 5    formula (8)
RA8A = INT1A[7] & INT2A[23]    formula (9)
RA9A = INT1A[6] & INT2A[23]    formula (10)
RA10A = (INT1A[7] & INT2A[23], INT1A[6] & INT2A[23])    formula (11)
RA11A = INT1A[7] & INT2A[22]    formula (12)
PINT2A = RA5A + RA6A + RA7A + RA10A + RA11A + (INT3A[63:32] << 3)    formula (13)
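Formulas (6) to (13) can be written out as a short Python sketch. The helper names bits, stage_two and truncation_gap are hypothetical. Because the right shifts discard low-order bits and some low partial products are never computed, PINT2A scaled back to weight 2^29 can only fall below the exact multiply-add result, never above it; truncation_gap measures that shortfall.

```python
def bits(x, hi, lo):
    """Return the bit field x[hi:lo] as an integer."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)

def stage_two(int1a, int2a, int3a=0):
    ra5a = (bits(int1a, 31, 8) * bits(int2a, 31, 8)) >> 13    # formula (6)
    ra6a = (bits(int1a, 31, 21) * bits(int2a, 7, 0)) >> 8     # formula (7)
    ra7a = (bits(int1a, 7, 0) * bits(int2a, 31, 24)) >> 5     # formula (8)
    ra8a = bits(int1a, 7, 7) & bits(int2a, 23, 23)            # formula (9)
    ra9a = bits(int1a, 6, 6) & bits(int2a, 23, 23)            # formula (10)
    ra10a = (ra8a << 1) | ra9a                                # formula (11)
    ra11a = bits(int1a, 7, 7) & bits(int2a, 22, 22)           # formula (12)
    return ra5a + ra6a + ra7a + ra10a + ra11a + (bits(int3a, 63, 32) << 3)  # formula (13)

def truncation_gap(int1a, int2a, int3a=0):
    """Exact multiply-add result minus PINT2A scaled back to weight 2**29."""
    return (int1a * int2a + int3a) - (stage_two(int1a, int2a, int3a) << 29)

print(truncation_gap(0xFFFFFFFF, 0xFFFFFFFF))
```

A nonzero gap is the reason the second-stage result still needs a carry correction before its highest 32 bits can be used.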
In this case, the operation integer PINT2A should include the sum of the highest 35 bits of the product of the input integer INT1A and the input integer INT2A and the highest 32 bits of the input integer INT3A, i.e., the highest 35 bits OINT1A[63:29] of the output integer OINT1A.
It should be noted that, since the operator 142 does not calculate the low-order part of the product of the input integers INT1A and INT2A in the second stage, the operator 142 may further include an overflow checking circuit 1423 and an adder ADD2A in the present embodiment. The overflow checking circuit 1423 may generate a complementary bit DLT1 for determining whether the operation integer PINT2A is to be carried, and the adder ADD2A may ADD the complementary bit DLT1 to the highest 32 bits PINT2A [34:3] of the operation integer PINT2A, thereby obtaining the operation integer PINT3A. In this case, the arithmetic integer PINT3A corresponds to the sum of the highest 32 bits of the product of the input integers INT1A and INT2A and the highest 32 bits INT3A [63:32] of the input integer INT3A, so the arithmetic unit 142 can combine the lowest 32 bits PINT1A [31:0] of the arithmetic integer PINT1A and the arithmetic integer PINT3A into the complete output integer OINT1A, thereby completing the 32-bit integer multiply-add operation.
Since the operator 142 has calculated the lowest 32 bits OINT1A [31:0] of the output integer OINT1A in the first stage, the operator 142 would otherwise only need to calculate the highest 32 bits of the product of the input integer INT1A and the input integer INT2A to obtain the complete product in the second stage; however, in the second stage, since the lower part of the product of the input integer INT1A and the input integer INT2A may be carried to the highest 32 bits of the product of the input integer INT1A and the input integer INT2A, the operator 142 must determine whether the value of the lower part of the product of the input integer INT1A and the input integer INT2A is carried to the higher part.
In the present embodiment, for example, the dot-filled shaded portion S1 in fig. 4 represents the partial products whose bit-position sums fall below 29, which also corresponds to the difference between the highest 35 bits OINT1A[63:29] of the correct output integer OINT1A and the operation integer PINT2A. Calculation shows that even if every bit of the shaded portion S1 is 1, the total difference is less than 2^33, so the sum of the shaded portion S1 carries at most 1 into the 4th lowest bit PINT2A[3] of the operation integer PINT2A.
In addition, since the operation integer PINT1A calculated by the operator 142 in the first stage already includes the sum of the lowest 33 bits of the product of the input integers INT1A and INT2A and the lowest 32 bits INT3A[31:0] of the input integer INT3A, the exclusive-OR of the highest bit PINT1A[32] of the operation integer PINT1A and the 33rd lowest bit INT3A[32] of the input integer INT3A can be regarded as the true value of the 33rd lowest bit OINT1A[32] of the output integer OINT1A. The overflow checking circuit 1423 can exclusive-OR this true value with the 4th lowest bit PINT2A[3] of the operation integer PINT2A to obtain the complementary bit DLT1, which indicates whether a carry needs to be added at the 4th lowest bit PINT2A[3] of the operation integer PINT2A. Then, the adder ADD2A can add the highest 32 bits PINT2A[34:3] of the operation integer PINT2A to the complementary bit DLT1 to obtain the highest 32 bits of the output integer OINT1A. The operations of the overflow checking circuit 1423 and the adder ADD2A are shown in formulas (14) and (15) below, where ^ denotes exclusive OR.
DLT1 = PINT1A[32] ^ INT3A[32] ^ PINT2A[3]    formula (14)
OINT1A[63:32] = PINT2A[34:3] + DLT1    formula (15)
In this way, the operator 142 can calculate the lowest 32 bits OINT1A[31:0] and the highest 32 bits OINT1A[63:32] of the output integer OINT1A in the first stage and the second stage, respectively, and combine the two to form the complete output integer OINT1A, thereby completing the multiply-add operation on the integers INT1A, INT2A and INT3A.
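Putting the two stages together, the following Python sketch covers the pure-product case only, that is, with the input integer INT3A taken as 0. The function name mul32_two_stage and the helper bits are hypothetical, and the stage-one alignment shifts are an assumption inferred from the operand bit positions. The sketch makes no claim about nonzero INT3A, which it does not model.

```python
def bits(x, hi, lo):
    """Return the bit field x[hi:lo] as an integer."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)

def mul32_two_stage(a, b):
    """64-bit product of two 32-bit unsigned integers, with INT3A taken as 0."""
    # Stage one, formulas (1) to (5), with assumed alignment shifts.
    pint1a = (bits(a, 23, 0) * bits(b, 23, 0)
              + ((bits(a, 31, 24) * bits(b, 8, 0)) << 24)
              + ((bits(a, 7, 0) * bits(b, 31, 24)) << 24)
              + ((bits(a, 8, 8) & bits(b, 24, 24)) << 32))
    # Stage two, formulas (6) to (13).
    ra10a = ((bits(a, 7, 7) & bits(b, 23, 23)) << 1) | (bits(a, 6, 6) & bits(b, 23, 23))
    ra11a = bits(a, 7, 7) & bits(b, 22, 22)
    pint2a = (((bits(a, 31, 8) * bits(b, 31, 8)) >> 13)
              + ((bits(a, 31, 21) * bits(b, 7, 0)) >> 8)
              + ((bits(a, 7, 0) * bits(b, 31, 24)) >> 5)
              + ra10a + ra11a)
    # Formula (14) with INT3A[32] = 0: compare the exact bit 32 with the estimated one.
    dlt1 = bits(pint1a, 32, 32) ^ bits(pint2a, 3, 3)
    # Formula (15): correct the estimated high word by the complementary bit.
    high = (bits(pint2a, 34, 3) + dlt1) & 0xFFFFFFFF
    low = bits(pint1a, 31, 0)
    return (high << 32) | low

assert mul32_two_stage(0xFFFFFFFF, 0xFFFFFFFF) == 0xFFFFFFFF * 0xFFFFFFFF
```

The correction works here because, for a pure product, the bits dropped in the second stage add up to less than 2^32 + 2^16, so the estimated high word is short by at most 1 and bit 32 from the first stage reveals which case applies.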
In addition, in the present embodiment, the adder ADD2A may be an adder provided in the 32-bit floating-point multiply-adder 1421, so the operator 142 can complete the 32-bit multiply-add operation in two stages by adding only the 8-bit multiplier MTP3A, the adder ADD1A and the overflow checking circuit 1423 to the floating-point multiply-adders 1421 and 1422. Since the operator 142 can effectively multiplex the lower-bit multipliers in the floating-point multiply-adders 1421 and 1422 to perform higher-bit multiply-add operations, an excessive increase in the area of the operator is avoided and the flexibility of the processor in design and function is increased.
In some implementations, the arithmetic logic unit 140 may only need the product of the input integers INT1A and INT2A, without the input integer INT3A. In this case, the operator 142 may preset the 64 bits of the input integer INT3A to 0, or omit the terms involving the input integer INT3A in formula (0), formula (5), formula (13) and formula (14).
In fig. 2 to 4, the operator 142 multiplexes the 24-bit multiplier and the 11-bit multiplier in the floating-point multiply-adders 1421 and 1422 to complete the 32-bit multiply-add operation, but the present application is not limited thereto. In some other embodiments, a designer may multiplex lower-bit multipliers according to principles similar to those of the operator 142 to implement other higher-bit multiply-add operations.
Fig. 5 is a schematic diagram of another embodiment of the operator of the present application. In the present embodiment, the operator 242 may multiply the input integers INT1B and INT2B having N bits and add the product of the two with the input integer INT3B having 2N bits to complete the multiply-add operation of N bits.
As shown in fig. 5, the operator 242 may include multipliers MTP1B, MTP2B and MTP3B, adders ADD1B and ADD2B, a logic operator LGC1B, and an overflow checking circuit 2423. In this embodiment, the multiplier MTP1B can multiply integers having P bits, the multiplier MTP2B can multiply integers having Q bits, the multiplier MTP3B can multiply integers having R bits, and the operator 242 can perform N-bit multiplication using the multipliers MTP1B, MTP2B and MTP3B, where N, P, Q and R are positive integers, N is greater than P, P may be greater than Q, and Q may be greater than R. Further, in this embodiment, the sum of P and R is equal to N. Taking the operator 142 of fig. 2 as an example, N is 32, P is 24, Q is 11, and R is 8.
In the present embodiment, similar to the operator 142, the operator 242 may have the multipliers MTP1B, MTP2B and MTP3B and the logic operator LGC1B each receive corresponding bits of the input integers INT1B and INT2B and operate on them in the first stage. Then, the adder ADD1B may receive the operation results of the multipliers MTP1B, MTP2B and MTP3B and the logic operator LGC1B and add them to the lowest N bits INT3B[N-1:0] of the input integer INT3B to generate the operation integer PINT1B. In this case, the operation integer PINT1B corresponds to the sum of the lowest (N+1) bits of the product of the input integers INT1B and INT2B and the lowest N bits of the input integer INT3B.
Fig. 6 is a schematic diagram of the operator 242 multiplying the input integers INT1B and INT2B in the first stage. In the first stage, the multiplier MTP1B can multiply the lowest P bits INT1B[P-1:0] of the input integer INT1B by the lowest P bits INT2B[P-1:0] of the input integer INT2B to generate the operation result RA1B, the multiplier MTP2B can multiply the highest R bits INT1B[N-1:N-R] of the input integer INT1B by the lowest S bits INT2B[S-1:0] of the input integer INT2B (9 bits in the example of fig. 3) to generate the operation result RA2B, and the multiplier MTP3B can multiply the lowest R bits INT1B[R-1:0] of the input integer INT1B by the highest R bits INT2B[N-1:N-R] of the input integer INT2B to generate the operation result RA3B. In addition, the logic operator LGC1B may perform an AND operation on some of the bits of the input integers INT1B and INT2B to generate the operation result RA4B, such that the adder ADD1B obtains the lowest (N+1) bits of the product of the input integers INT1B and INT2B after adding the operation results RA1B, RA2B, RA3B and RA4B. In addition, in the present embodiment, the adder ADD1B can also add the operation results RA1B, RA2B, RA3B and RA4B to the lowest N bits INT3B[N-1:0] of the input integer INT3B to generate the operation integer PINT1B.
In the first stage, the operations of the multipliers MTP1B, MTP2B and MTP3B and the adder ADD1B are shown in formulas (16) to (19), respectively.
RA1B = INT1B[P-1:0] × INT2B[P-1:0]    formula (16)
RA2B = INT1B[N-1:N-R] × INT2B[S-1:0]    formula (17)
RA3B = INT1B[R-1:0] × INT2B[N-1:N-R]    formula (18)
PINT1B = RA1B + RA2B + RA3B + RA4B + INT3B[N-1:0]    formula (19)
In some embodiments, if the operator 242 is only required to calculate a partial bit number of the output integer OINT1B, such as the lowest N bits OINT1B [ N-1:0], the operator 242 can directly output the lowest N bits PINT1B [ N-1:0] of the operation integer PINT1B.
Fig. 7 is a schematic diagram of the operator 242 multiplying the input integers INT1B and INT2B in the second stage. In the second stage, the multiplier MTP1B multiplies the highest P bits INT1B[N-1:N-P] of the input integer INT1B by the highest P bits INT2B[N-1:N-P] of the input integer INT2B and shifts the result toward the low-order side by (N-Q+R)-2(N-P) bits to generate the operation result RA5B. The multiplier MTP2B multiplies the highest Q bits INT1B[N-1:N-Q] of the input integer INT1B by the lowest R bits INT2B[R-1:0] of the input integer INT2B and shifts the result toward the low-order side by (N-Q+R)-(N-Q) bits to generate the operation result RA6B. The multiplier MTP3B multiplies the lowest R bits INT1B[R-1:0] of the input integer INT1B by the highest R bits INT2B[N-1:N-R] of the input integer INT2B and shifts the result toward the low-order side by (N-Q+R)-(N-R) bits to generate the operation result RA7B.
In addition, the logic operator LGC1B may perform an AND operation on some of the bits of the input integers INT1B and INT2B to generate the operation result RA8B, such that the adder ADD1B obtains the highest (N+Q-R) bits of the product of the input integers INT1B and INT2B after adding the operation results RA5B, RA6B, RA7B and RA8B. In addition, the adder ADD1B may also shift the highest N bits of the input integer INT3B toward the high-order side by (Q-R) bits and add them to the operation results RA5B, RA6B, RA7B and RA8B to generate the operation integer PINT2B.
In the second stage, the operations of the multipliers MTP1B, MTP2B and MTP3B, the logic operator LGC1B and the adder ADD1B are shown in formulas (20) to (23), respectively.
RA5B = (INT1B[N-1:N-P] × INT2B[N-1:N-P]) >> [(N-Q+R) - 2(N-P)]    formula (20)
RA6B = (INT1B[N-1:N-Q] × INT2B[R-1:0]) >> [(N-Q+R) - (N-Q)]    formula (21)
RA7B = (INT1B[R-1:0] × INT2B[N-1:N-R]) >> [(N-Q+R) - (N-R)]    formula (22)
PINT2B = RA5B + RA6B + RA7B + RA8B + (INT3B[2N-1:N] << (Q-R))    formula (23)
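As a quick consistency check, the general shift amounts can be evaluated for the parameters of the operator 142 (N = 32, P = 24, Q = 11, R = 8). The short Python sketch below uses a hypothetical function name, stage_two_shifts, and simply evaluates the expressions of formulas (20) to (23); the results match the shifts of 13, 8, 5 and 3 bits used in formulas (6), (7), (8) and (13).

```python
def stage_two_shifts(n, p, q, r):
    """Shift amounts used in the second stage for given bit widths N, P, Q, R."""
    base = n - q + r                     # weight of the lowest bit kept in PINT2B
    return {
        "RA5B": base - 2 * (n - p),      # right shift in formula (20)
        "RA6B": base - (n - q),          # right shift in formula (21)
        "RA7B": base - (n - r),          # right shift in formula (22)
        "INT3B": q - r,                  # left shift in formula (23)
    }

print(stage_two_shifts(32, 24, 11, 8))
```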
In the present embodiment, the dot-filled shaded portion S2 in fig. 7 represents the partial products whose bit-position sums fall below (N-Q+R), which also corresponds to the difference between the highest (N+Q-R) bits of the correct output integer OINT1B and the operation integer PINT2B. In the present embodiment, as long as the maximum total value of the shaded portion S2 is confirmed to be less than 2^(N+1), the sum of the shaded portion S2 carries at most 1 into the (Q-R+1)-th lowest bit PINT2B[Q-R] of the operation integer PINT2B. In this case, the overflow checking circuit 2423 can determine, based at least on the highest bit PINT1B[N] of the operation integer PINT1B and the (Q-R+1)-th lowest bit PINT2B[Q-R] of the operation integer PINT2B, whether a carry is produced at the (Q-R+1)-th lowest bit PINT2B[Q-R] of the operation integer PINT2B, and generate the corresponding complementary bit DLT2.
Further, since the operation integer PINT1B generated by the operator 242 in the first stage includes the sum of the lowest (N+1) bits of the product of the input integers INT1B and INT2B and the lowest N bits INT3B[N-1:0] of the input integer INT3B, the overflow checking circuit 2423 may perform an exclusive-OR operation on the highest bit PINT1B[N] of the operation integer PINT1B and the (N+1)-th lowest bit INT3B[N] of the input integer INT3B, and the result may be regarded as the true value of the (N+1)-th lowest bit OINT1B[N] of the output integer OINT1B. The overflow checking circuit 2423 may then exclusive-OR this true value with the (Q-R+1)-th lowest bit PINT2B[Q-R] of the operation integer PINT2B to obtain the complementary bit DLT2, which indicates whether a carry needs to be added at the (Q-R+1)-th lowest bit PINT2B[Q-R] of the operation integer PINT2B. Then, the adder ADD2B can add the highest N bits PINT2B[N+Q-R-1:Q-R] of the operation integer PINT2B to the complementary bit DLT2 to obtain the highest N bits of the output integer OINT1B.
The operation contents of the overflow checking circuit 2423 and the adder ADD2B can be shown in the following equations (24) and (25).
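$$\mathrm{DLT2} = \mathrm{PINT1B}[N] \oplus \mathrm{INT3B}[N] \oplus \mathrm{PINT2B}[Q-R] \qquad \text{formula (24)}$$
$$\mathrm{OINT1B}[2N-1:N] = \mathrm{PINT2B}[N+Q-R-1:Q-R] + \mathrm{DLT2} \qquad \text{formula (25)}$$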
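The reasoning behind formulas (24) and (25) can be written out step by step. The symbols H, c and t below are introduced only for this illustration and do not appear in the original text: H stands for the N bits PINT2B[N+Q-R-1:Q-R], c for the carry that the second stage may have missed (assumed to be 0 or 1, as stated above for the shaded portion S2), and t for the true value of the bit OINT1B[N].

```latex
\begin{aligned}
t &= \mathrm{PINT1B}[N] \oplus \mathrm{INT3B}[N]
  && \text{(true value of } \mathrm{OINT1B}[N] \text{ from the first stage)} \\
\mathrm{OINT1B}[2N-1:N] &= H + c, \qquad c \in \{0, 1\}
  && \text{(second-stage result plus the missed carry)} \\
t &= H[0] \oplus c = \mathrm{PINT2B}[Q-R] \oplus c
  && \text{(lowest bit of } H + c \text{)} \\
c &= t \oplus \mathrm{PINT2B}[Q-R]
   = \mathrm{PINT1B}[N] \oplus \mathrm{INT3B}[N] \oplus \mathrm{PINT2B}[Q-R]
   = \mathrm{DLT2}
\end{aligned}
```

Under this reading, the supplemental bit DLT2 is exactly the missed carry, which is why adding it to PINT2B[N+Q-R-1:Q-R] yields the highest N bits of the output integer. The argument holds only as long as the missed carry cannot exceed 1.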
In this way, the operator 242 can calculate the lowest N bits OINT1B[N-1:0] of the output integer OINT1B in the first stage and the highest N bits OINT1B[2N-1:N] in the second stage, and can combine the two to form the complete output integer OINT1B, thereby completing the multiply-add operation on the input integers INT1B, INT2B and INT3B.
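As a rough illustration of the two stages summarised above, the following Python sketch models the operator for N = 32, P = 24, Q = 11 and R = 8, using the bit selections recited in claims 8 to 18. It is a behavioural model under stated assumptions, not the circuit of the embodiment. The function names, the helper `bits()`, the left-shift weights applied in the first stage, and the reading of "shifted by 13, 8 and 5 bits" as right shifts are all assumptions made for this sketch. The second stage is modelled for pure multiplication only, that is, with the third input integer taken as zero.

```python
MASK32 = (1 << 32) - 1
MASK33 = (1 << 33) - 1


def bits(x, hi, lo):
    """Return the bit field x[hi:lo] (inclusive) as an unsigned integer."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def first_stage(a, b, c_low=0):
    """Lowest 33 bits of a*b + c_low (c_low models the lowest 32 bits of the third input)."""
    r1 = bits(a, 23, 0) * bits(b, 23, 0)   # 24-bit multiplier: lowest 24 bits of each input
    r2 = bits(a, 31, 24) * bits(b, 8, 0)   # 11-bit multiplier: highest 8 bits x lowest 9 bits
    r3 = bits(a, 7, 0) * bits(b, 31, 24)   # 8-bit multiplier: lowest 8 bits x highest 8 bits
    r4 = bits(a, 8, 8) & bits(b, 24, 24)   # AND of the 9th and 25th lowest bits
    # Assumed weights: r2 and r3 sit at bit 24, r4 at bit 32.
    total = r1 + (r2 << 24) + (r3 << 24) + (r4 << 32) + (c_low & MASK32)
    return total & MASK33


def second_stage(a, b):
    """Approximation of product bits [63:29] (35 bits), third input assumed zero."""
    r5 = (bits(a, 31, 8) * bits(b, 31, 8)) >> 13   # highest 24 x highest 24, shifted 13
    r6 = (bits(a, 31, 21) * bits(b, 7, 0)) >> 8    # highest 11 x lowest 8, shifted 8
    r7 = (bits(a, 7, 0) * bits(b, 31, 24)) >> 5    # lowest 8 x highest 8, shifted 5
    r8 = bits(a, 7, 7) & bits(b, 23, 23)           # 8th lowest AND 24th lowest
    r9 = bits(a, 6, 6) & bits(b, 23, 23)           # 7th lowest AND 24th lowest
    r10 = bits(a, 7, 7) & bits(b, 22, 22)          # 8th lowest AND 23rd lowest
    r11 = (r8 << 1) | r9                           # eighth result moved up 1 bit, joined with ninth
    return r5 + r6 + r7 + r10 + r11


def multiply(a, b):
    """Combine both stages into the full 64-bit product of two 32-bit inputs."""
    p1 = first_stage(a, b)
    p2 = second_stage(a, b)
    dlt2 = ((p1 >> 32) & 1) ^ ((p2 >> 3) & 1)      # formula (24) with the third input at zero
    high = ((p2 >> 3) + dlt2) & MASK32             # formula (25)
    return (high << 32) | (p1 & MASK32)


if __name__ == "__main__":
    for a, b in [(0, 0), (1, 1), (0xFFFFFFFF, 0xFFFFFFFF), (0x12345678, 0x9ABCDEF0)]:
        assert multiply(a, b) == a * b
    print("two-stage model matches a * b on the sample inputs")
```

In this model the terms dropped by the second stage always sum to less than 2^33 together with the three discarded low bits of the second operation integer, so at most one carry is missed and the single supplemental bit is enough to correct the upper half.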
In the present embodiment, the multipliers MTP1B, MTP2B and MTP3B may be, for example, multipliers in multiplier-adders, or multipliers, that the operator 242 already requires for performing other operations, and the adder ADD2B may likewise be an adder provided in a multiplier-adder. The operator 242 can therefore effectively multiplex multipliers with lower bit widths to complete multiply-add operations with higher bit widths, which avoids excessively increasing the area of the operator and increases the flexibility of the processor in design and function. In addition, since the operator 242 can complete the higher-bit-width multiply-add operation in only two stages, its performance is sufficient to support most applications.
In summary, the arithmetic unit, the arithmetic logic unit and the processor of the present application can effectively multiplex the multiplier with a lower number of bits to complete the multiply-add operation with a higher number of bits, so that the area required by the arithmetic unit can be reduced, and the design and functional flexibility of the arithmetic logic unit and the processor can be increased.
The foregoing outlines structures of several embodiments so that those skilled in the art may better understand the aspects of the disclosure. Those skilled in the art should appreciate that they may readily use the present disclosure as a basis for designing or modifying other manufacturing processes and structures for carrying out the same purposes and/or achieving the same advantages of the embodiments introduced herein. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the present disclosure, and that they may make various changes, substitutions, and alterations herein without departing from the spirit and scope of the present disclosure.

Claims (20)

1. An operator for operating on a plurality of integers having N bits, comprising:
a first multiplier for performing a multiplication operation on an integer having P bits;
a second multiplier for performing a multiplication operation on an integer having Q bits;
a third multiplier for performing a multiplication operation on an integer having R bits, wherein N, P, Q and R are integers and N is greater than P, Q and R;
a logic operator for performing a plurality of AND operations; and
a first adder;
wherein:
in a first stage, the first multiplier, the second multiplier, the third multiplier, and the logic operator respectively receive corresponding bits of a first input integer and a second input integer and operate on them;
the first adder receives at least the operation results of the first multiplier, the second multiplier, the third multiplier and the logic operator and sums them to obtain a first operation integer; and
the first operation integer includes the lowest (N+1) bits of a product of the first input integer and the second input integer.
2. The operator according to claim 1, wherein the operator is further configured to calculate the lowest N bits of a sum of the product and a third input integer; in the first stage, the first adder generates the first operation integer from the operation results of the first multiplier, the second multiplier, the third multiplier and the logic operator and from the lowest N bits of the third input integer; and the lowest N bits of the first operation integer are equal to the lowest N bits of the sum of the product and the third input integer.
3. The operator according to claim 1 wherein:
P is greater than Q, Q is greater than R, the sum of P and R is greater than or equal to N, and the sum of P and R is greater than or equal to N;
in a second stage:
the first multiplier, the second multiplier, the third multiplier and the logic operator respectively receive corresponding bits of the first input integer and the second input integer and operate on them; and
the first adder receives at least the operation results of the first multiplier, the second multiplier, the third multiplier and the logic operator and sums them to obtain a second operation integer, wherein the second operation integer includes the highest (N+Q-R) bits of the product; and
the operator further comprises:
an overflow checking circuit for determining, according to at least the highest bit of the first operation integer and the (Q-R+1)-th lowest bit of the second operation integer, whether the (Q-R+1)-th lowest bit of the second operation integer overflows, so as to generate a supplemental bit; and
a second adder for adding the supplemental bit to the highest N bits of the second operation integer to obtain a third operation integer, the third operation integer being the highest N bits of the product;
wherein the operator combines the lowest N bits of the first operation integer and the third operation integer into the product.
4. The operator according to claim 3 wherein:
the operator is further configured to calculate a sum of the product and a third input integer, the third input integer having 2N bits;
the first adder generates the first operation integer by adding the operation results of the first multiplier, the second multiplier, the third multiplier and the logic operator to the lowest N bits of the third input integer in the first stage, and generates the second operation integer by adding the operation results of the first multiplier, the second multiplier, the third multiplier and the logic operator to the highest N bits of the third input integer in the second stage; and
the overflow checking circuit determines, according to the highest bit of the first operation integer, the (Q-R+1)-th lowest bit of the second operation integer and the (N+1)-th lowest bit of the third input integer, whether the N-th highest bit of the second operation integer overflows, so as to generate the supplemental bit.
5. The operator according to claim 3, wherein the sum of the products of all bits of the first input integer and the second input integer whose bit positions sum to within (N-Q+R) bits is less than 2 to the power of (N+1).
6. The operator according to any one of claims 3 to 5 further comprising:
a 32-bit floating-point multiplier-adder; and
a 16-bit floating-point multiplier-adder;
wherein the first multiplier and the second adder are arranged in the 32-bit floating-point multiplier-adder, and the second multiplier is arranged in the 16-bit floating-point multiplier-adder.
7. The operator according to claim 6, wherein N is 32, P is 24, Q is 11, and R is 8, the first operation integer is the lowest 33 bits of the product, and the third operation integer is the highest 32 bits of the product.
8. The operator according to claim 7, wherein in the first stage the first multiplier multiplies the lowest 24 bits of the first input integer by the lowest 24 bits of the second input integer to generate a first operation result.
9. The operator according to claim 8, wherein in the first stage the second multiplier multiplies the highest 8 bits of the first input integer by the lowest 9 bits of the second input integer to generate a second operation result.
10. The operator according to claim 9, wherein in the first stage the third multiplier multiplies the lowest 8 bits of the first input integer by the highest 8 bits of the second input integer to generate a third operation result.
11. The operator according to claim 10, wherein in the first stage the logic operator performs an AND operation on the 9th lowest bit of the first input integer and the 25th lowest bit of the second input integer to generate a fourth operation result.
12. The operator according to claim 11, wherein in the first stage the first adder generates the first operation integer from the first operation result, the second operation result, the third operation result, the fourth operation result, and the lowest 32 bits of the third input integer.
13. The operator according to claim 12, wherein in the second stage the first multiplier multiplies the highest 24 bits of the first input integer by the highest 24 bits of the second input integer and shifts the product by 13 bits to generate a fifth operation result.
14. The operator according to claim 13, wherein in the second stage the second multiplier multiplies the highest 11 bits of the first input integer by the lowest 8 bits of the second input integer and shifts the product by 8 bits to generate a sixth operation result.
15. The operator according to claim 14, wherein in the second stage the third multiplier multiplies the lowest 8 bits of the first input integer by the highest 8 bits of the second input integer and shifts the product by 5 bits to generate a seventh operation result.
16. The operator according to claim 15, wherein in the second stage the logic operator performs an AND operation on the 8th lowest bit of the first input integer and the 24th lowest bit of the second input integer to generate an eighth operation result, performs an AND operation on the 7th lowest bit of the first input integer and the 24th lowest bit of the second input integer to generate a ninth operation result, performs an AND operation on the 8th lowest bit of the first input integer and the 23rd lowest bit of the second input integer to generate a tenth operation result, and shifts the eighth operation result up by 1 bit and combines it with the ninth operation result to generate an eleventh operation result.
17. The operator according to claim 16, wherein in the second stage the first adder adds the fifth operation result, the sixth operation result, the seventh operation result, the tenth operation result, the eleventh operation result, and the highest 32 bits of the third input integer shifted by 3 bits to generate the second operation integer.
18. The operator according to claim 17, wherein the overflow checking circuit performs an exclusive-OR operation on the highest bit of the first operation integer, the 33rd lowest bit of the third input integer, and the 4th lowest bit of the second operation integer to generate the supplemental bit.
19. An arithmetic logic unit comprising an operator as claimed in any one of claims 1 to 18.
20. A processor comprising the arithmetic logic unit of claim 19.
CN202211108747.XA 2022-09-13 2022-09-13 Arithmetic unit, arithmetic logic unit and processor Pending CN117742657A (en)

Publications (1)

Publication Number Publication Date
CN117742657A true CN117742657A (en) 2024-03-22

Family

ID=90276222

Country Status (1)

Country Link
CN (1) CN117742657A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination