WO2023004783A1 - Accumulator, multiplier, and operator circuit - Google Patents

Accumulator, multiplier, and operator circuit

Info

Publication number
WO2023004783A1
Authority
WO
WIPO (PCT)
Prior art keywords
transistor
bits
node
twenty
coupled
Prior art date
Application number
PCT/CN2021/109751
Other languages
English (en)
French (fr)
Inventor
范团宝
时小山
蒋越星
蒋明峰
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to PCT/CN2021/109751 priority Critical patent/WO2023004783A1/zh
Priority to CN202180007101.XA priority patent/CN115917499A/zh
Priority to EP21951373.6A priority patent/EP4336345A1/en
Publication of WO2023004783A1 publication Critical patent/WO2023004783A1/zh
Priority to US18/424,893 priority patent/US20240168714A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only
    • G06F7/53 Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel
    • G06F7/5318 Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel with column wise addition of partial products, e.g. using Wallace tree, Dadda counters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/50 Adding; Subtracting

Definitions

  • the present application relates to the field of electronic technology, in particular to an accumulator, a multiplier and an operator circuit.
  • B0 to B7 in FIG. 1 correspond to different digits (i.e., 2^0 to 2^7).
  • accumulators are implemented based on the Wallace tree compression method: multiple standard adders (including full adders and half adders) compress the bits on each of the multiple digits layer by layer, two accumulated values are obtained after multi-layer compression, and finally the two accumulated values are added to obtain the final result.
  • the multiple binary numbers to be accumulated can be compressed into two rows (each row represents an accumulated value) through 4 compressor layers, and each compressor layer includes multiple standard adders.
  • the Wallace tree is an existing method for efficient accumulation. The bits on different digits of the same layer can be compressed in parallel, and the delay of each layer is the delay of one standard full adder, so the method is characterized by fast calculation speed.
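The layer-by-layer compression described above can be sketched in Python. This is a minimal illustration, not the circuit of the application: the names `full_adder`, `wallace_compress` and `column_value` are hypothetical, and the sketch assumes that every group of three bits in a column is compressed greedily by a standard full adder while leftover bits pass through to the next layer.

```python
from collections import defaultdict


def full_adder(a, b, c):
    """Standard 3:2 compressor: returns (carry, sum) for three bits of equal weight."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return carry, s


def wallace_compress(numbers, width):
    """Compress the bit columns of `numbers` layer by layer until every
    column holds at most two bits. Returns (columns, number_of_layers)."""
    cols = defaultdict(list)
    for n in numbers:
        for d in range(width):
            cols[d].append((n >> d) & 1)
    layers = 0
    while any(len(bits) > 2 for bits in cols.values()):
        nxt = defaultdict(list)
        for d, bits in cols.items():
            # Groups of three bits on the same digit are independent of each
            # other, so in hardware they are compressed in parallel.
            while len(bits) >= 3:
                carry, s = full_adder(bits.pop(), bits.pop(), bits.pop())
                nxt[d].append(s)          # sum stays on the same digit
                nxt[d + 1].append(carry)  # carry moves to the next digit
            nxt[d].extend(bits)           # leftover bits pass through
        cols = nxt
        layers += 1
    return cols, layers


def column_value(cols):
    """Numeric value represented by all bits in all columns."""
    return sum(bit << d for d, bits in cols.items() for bit in bits)
```

Because each full adder satisfies a + b + c = 2·carry + sum, the value of the columns is unchanged by every layer, and the remaining two rows can be handed to an ordinary adder.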
  • the application provides an accumulator, a multiplier and an operator circuit, which are used to reduce the difficulty of realizing the accumulator, thereby reducing the area and power consumption.
  • the application adopts the following technical solutions:
  • an accumulator is provided, including W compressor layers, where W is an integer greater than or equal to 1. The W compressor layers are used to compress multiple binary numbers to obtain multiple accumulated values, and the sum of the multiple accumulated values is the accumulated sum of the multiple binary numbers. The W compressor layers include at least one first compressor layer, and each first compressor layer is used to compress an input array to obtain an output array. The input array includes a first array and a second array, the first array includes a plurality of positive-phase bits, the second array includes a plurality of inverted bits, and the output array includes a first compressed array and a second compressed array. Each first compressor layer includes: a first compression circuit for compressing the first array to obtain the first compressed array; and a second compression circuit for compressing the second array to obtain the second compressed array.
  • the W compressor layers include at least one first compressor layer, and in the input array of each first compressor layer, the first array includes a plurality of positive-phase bits and the second array includes a plurality of inverted bits. The first array can therefore be considered as a Wallace tree including multiple positive-phase bits, and the second array as a Wallace tree including multiple inverted bits. That is, the input array of each first compressor layer includes two Wallace trees, and the phases of the bits included in the two Wallace trees are opposite.
  • the first compression circuit is used to compress the first array, and the second compression circuit is used to compress the second array, so that the bits of different phases in the input array can be compressed by different compression circuits. There is therefore no need to unify the bits in the input array of each first compressor layer to the same phase, and no need to add a functional circuit to perform phase unification, so that the accumulator is simple to implement and its area and power consumption can be reduced.
  • the first compression circuit includes one or more first compressors, and each of the one or more first compressors is used to compress three bits located on the same digit in the first array;
  • the second compression circuit includes one or more second compressors, and each of the one or more second compressors is used to compress three bits located on the same digit in the second array.
  • the one or more first compressors in the first compression circuit and the one or more second compressors in the second compression circuit can be used in parallel to compress the bits on the corresponding digits, thereby increasing the compression efficiency of each first compressor layer.
  • each of the first compressors and each of the second compressors is an inverting sum adder; the inverting sum adder is used to compress the three bits, a carry output bit and a sum output bit are obtained, the phase of the carry output bit is the same as the phase of the three bits, and the phase of the sum output bit is opposite to the phase of the three bits.
  • an inverting summation adder is provided, and the implementation scheme of the inverting summation adder is relatively simple, for example, the area is small and the power consumption is low.
  • the inverting sum adder is used to perform the following compression: if the three bits are all 0, the carry output bit is 0 and the sum output bit is 1; if the three bits are all 1, the carry output bit is 1 and the sum output bit is 0; if one of the three bits is 1 and the other two bits are 0, the carry output bit is 0 and the sum output bit is 0; if two of the three bits are 1 and the other bit is 0, the carry output bit is 1 and the sum output bit is 1.
  • a simple and effective compression manner of the inverting sum adder is provided.
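The four cases listed for the inverting sum adder can be written as a small behavioral model. This is a sketch only: the function name `inverting_sum_adder` is hypothetical, and it models logic values, not the transistor-level circuit.

```python
def inverting_sum_adder(in0, in1, in2):
    """Behavioral model: returns (carry, inverted_sum) for three bits of the same phase."""
    # Carry is the majority of the inputs and keeps their phase.
    carry = (in0 & in1) | (in0 & in2) | (in1 & in2)
    # Sum is the parity of the inputs, output with the opposite phase.
    inv_sum = 1 - (in0 ^ in1 ^ in2)
    return carry, inv_sum
```

For example, `inverting_sum_adder(1, 1, 0)` gives carry 1 and sum output 1, matching the "two bits are 1" case above.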
  • each of the first compressors and each of the second compressors is an inverting carry adder; the inverting carry adder is used to compress the three bits to obtain a carry output bit and a sum output bit, the phase of the carry output bit is opposite to the phase of the three bits, and the phase of the sum output bit is the same as the phase of the three bits.
  • an inverting carry adder is provided, and the implementation scheme of the inverting carry adder is relatively simple, for example, the area is small and the power consumption is low.
  • the inverting carry adder is used to perform the following compression: if the three bits are all 0, the carry output bit is 1 and the sum output bit is 0; if the three bits are all 1, the carry output bit is 0 and the sum output bit is 1; if one of the three bits is 1 and the other two bits are 0, the carry output bit is 1 and the sum output bit is 1; if two of the three bits are 1 and the other bit is 0, the carry output bit is 0 and the sum output bit is 0.
  • a simple and effective compression manner of the inverting carry adder is provided.
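The inverting carry adder cases can be modeled the same way. Again this is a behavioral sketch with a hypothetical function name, not the transistor-level circuit.

```python
def inverting_carry_adder(in0, in1, in2):
    """Behavioral model: returns (inverted_carry, sum) for three bits of the same phase."""
    # Carry is the majority of the inputs, output with the opposite phase.
    inv_carry = 1 - ((in0 & in1) | (in0 & in2) | (in1 & in2))
    # Sum is the parity of the inputs and keeps their phase.
    s = in0 ^ in1 ^ in2
    return inv_carry, s
```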
  • each of the first compressors and each of the second compressors is a double inverting adder; the double inverting adder is used to compress the three bits to obtain a carry output bit and a sum output bit, and the phases of the carry output bit and the sum output bit are both opposite to the phase of the three bits.
  • a double-inverting adder is provided, and the implementation scheme of the double-inverting adder is relatively simple, such as small area and low power consumption.
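No case-by-case listing is given here for the double inverting adder, so the following sketch derives one from the stated phase relationship alone, on the assumption that both outputs are simply the logical inversions of a standard full adder's carry and sum. The names are hypothetical.

```python
from itertools import product


def double_inverting_adder(in0, in1, in2):
    """Behavioral model: returns (inverted_carry, inverted_sum)."""
    inv_carry = 1 - ((in0 & in1) | (in0 & in2) | (in1 & in2))
    inv_sum = 1 - (in0 ^ in1 ^ in2)
    return inv_carry, inv_sum


def truth_table():
    """All eight input combinations with their (inverted_carry, inverted_sum) outputs."""
    return {bits: double_inverting_adder(*bits) for bits in product((0, 1), repeat=3)}
```

Under that assumption, inverting both outputs recovers the usual relation in0 + in1 + in2 = 2·carry + sum for every input combination.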
  • the accumulator further includes: a summing circuit, configured to receive the multiple accumulated values, and sum the multiple accumulated values to obtain the accumulated sum.
  • the accumulator further includes: one or more inverters, configured to invert at least one of the sum output bit and the carry output bit output by one or more first compressors or second compressors in the W compressor layers, or to invert the three bits input to the one or more first compressors or second compressors.
  • the compression efficiency of the W compressor layers can be improved while ensuring the accuracy of the compression result.
  • in a second aspect, a multiplier is provided. The multiplier includes an encoder and an accumulator, and the accumulator is the accumulator provided in the first aspect or any possible implementation manner of the first aspect.
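The encoder-plus-accumulator structure of the multiplier can be sketched as follows. The encoder is not detailed in this passage, so `partial_products` is a hypothetical stand-in that emits one shifted copy of the multiplicand per set multiplier bit, and `accumulate` is a word-level carry-save stand-in for the accumulator, not the phase-split circuit itself.

```python
def partial_products(x, y, width):
    """Hypothetical encoder: one shifted copy of x for each set bit of y."""
    return [(x << i) if (y >> i) & 1 else 0 for i in range(width)]


def accumulate(values):
    """Stand-in accumulator: 3:2 carry-save steps down to two rows, then a final add."""
    rows = list(values)
    while len(rows) > 2:
        a, b, c = rows.pop(), rows.pop(), rows.pop()
        s = a ^ b ^ c                                # bitwise sum row
        carry = ((a & b) | (a & c) | (b & c)) << 1   # bitwise carry row, shifted one digit
        rows += [s, carry]                           # a + b + c == s + carry
    return sum(rows)                                 # role of the summing circuit


def multiply(x, y, width=8):
    """Multiply x by a `width`-bit unsigned y by accumulating partial products."""
    return accumulate(partial_products(x, y, width))
```

For instance, `multiply(13, 11)` accumulates the partial products 13, 26, 0 and 104 (plus zeros for the unused multiplier bits) and returns 143.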
  • an operator circuit is provided. When the operator circuit is applied to an accumulator, it can be used as an adder in a compressor layer of the accumulator, and the adder is an inverting sum adder, including: a first transistor, a second transistor, a third transistor, a fourth transistor, a fifth transistor, a sixth transistor, a seventh transistor, an eighth transistor, a ninth transistor, a tenth transistor, an eleventh transistor, a twelfth transistor, a thirteenth transistor, a fourteenth transistor, a fifteenth transistor, a sixteenth transistor, a seventeenth transistor, an eighteenth transistor, a nineteenth transistor, a twentieth transistor, a twenty-first transistor, a twenty-second transistor, a twenty-third transistor and a twenty-fourth transistor; wherein the first transistor and the second transistor are coupled in parallel between the power supply terminal and the first node; the third transistor is coupled between the first node and the second node; the fourth transistor is coupled between the second node and the third node; the fifth transistor and the sixth transistor are coupled in parallel between the third node and the
  • the first transistor, the second transistor, the third transistor, the seventh transistor, the eighth transistor, the tenth transistor, the eleventh transistor, the fifteenth transistor, the sixteenth transistor, the seventeenth transistor, the eighteenth transistor and the twenty-third transistor are PMOS transistors; the fourth transistor, the fifth transistor, the sixth transistor, the ninth transistor, the twelfth transistor, the thirteenth transistor, the fourteenth transistor, the nineteenth transistor, the twentieth transistor, the twenty-first transistor, the twenty-second transistor and the twenty-fourth transistor are NMOS transistors.
  • an operator circuit is provided. When the operator circuit is applied to an accumulator, it can be used as an adder in a compressor layer of the accumulator, and the adder is an inverting carry adder, including: a first transistor, a second transistor, a third transistor, a fourth transistor, a fifth transistor, a sixth transistor, a seventh transistor, an eighth transistor, a ninth transistor, a tenth transistor, an eleventh transistor, a twelfth transistor, a thirteenth transistor, a fourteenth transistor, a fifteenth transistor, a sixteenth transistor, a seventeenth transistor, an eighteenth transistor, a nineteenth transistor, a twentieth transistor, a twenty-first transistor, a twenty-second transistor, a twenty-third transistor and a twenty-fourth transistor; wherein the first transistor and the second transistor are coupled in parallel between the power supply terminal and the first node; the third transistor is coupled between the first node and the first output terminal; the fourth transistor is coupled between the first output terminal and the second node; the fifth transistor and the sixth transistor are coupled in parallel
  • the first transistor, the second transistor, the third transistor, the seventh transistor, the eighth transistor, the tenth transistor, the eleventh transistor, the fifteenth transistor, the sixteenth transistor, the seventeenth transistor, the eighteenth transistor and the twenty-third transistor are PMOS transistors; the fourth transistor, the fifth transistor, the sixth transistor, the ninth transistor, the twelfth transistor, the thirteenth transistor, the fourteenth transistor, the nineteenth transistor, the twentieth transistor, the twenty-first transistor, the twenty-second transistor and the twenty-fourth transistor are NMOS transistors.
  • a processor including an accumulator, a multiplier, or an operator circuit; wherein, the accumulator is the accumulator provided in the above-mentioned first aspect or any possible implementation of the first aspect, the The multiplier is the multiplier provided in the above second aspect, and the operator circuit is the operator circuit provided in any possible implementation manner of the above third to fourth aspects, or the third to fourth aspects.
  • a chip including an accumulator, a multiplier, or an operator circuit; wherein the accumulator is the accumulator provided in the first aspect or any possible implementation of the first aspect, the multiplier is the multiplier provided in the above-mentioned second aspect, and the operator circuit is the operator circuit provided in the above-mentioned third to fourth aspects or in any possible implementation manner of the third to fourth aspects.
  • FIG. 2 is a schematic diagram of cumulative calculation in a multiplier provided in an embodiment of the present application
  • FIG. 3 is a schematic diagram of an accumulator based on standard adders accumulating a plurality of binary numbers
  • FIG. 4 is a schematic structural diagram of a communication device provided in an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an accumulator provided in an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an inverting sum adder provided in an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of another accumulator provided in the embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an inverting carry adder provided in an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of another accumulator provided in the embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a double inverting adder provided in an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of another accumulator provided in the embodiment of the present application.
  • FIG. 13 is a schematic diagram of an inverter in another accumulator provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of an inverting sum adder provided in an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of an inverting carry adder provided in an embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of a double inverting adder provided in an embodiment of the present application.
  • FIG. 17 is a schematic structural diagram of a multiplier provided in an embodiment of the present application.
  • FIG. 18 is a performance comparison diagram of accumulation calculation in a multiplier provided in an embodiment of the present application.
  • “At least one” means one or more, and “multiple” means two or more.
  • “And/or” describes the association relationship of associated objects, indicating that there may be three types of relationships, for example, A and/or B, which can mean: A exists alone, A and B exist simultaneously, and B exists alone, where A, B can be singular or plural.
  • “At least one of the following” or similar expressions refer to any combination of these items, including any combination of single or plural items.
  • at least one item (piece) of a, b, or c can represent: a, b, c, a and b, a and c, b and c, or a, b and c, wherein a, b, c can be single or multiple.
  • the current accumulator implemented based on the Wallace tree usually includes multiple compressor layers.
  • each bit in the input array and output array of each compressor layer is required to be positive phase.
  • multiple standard adders are used to perform parallel compression. After every three bits are compressed by a standard adder, a positive-phase carry output bit and a positive-phase sum output bit are obtained.
  • when the accumulator is used to accumulate 27 six-bit binary numbers, the accumulator may include multiple compressor layers, and each standard adder in each of the multiple compressor layers is used to compress three positive-phase bits (represented as IN0, IN1 and IN2 in FIG. 3); the positive-phase carry output bit is denoted C, and the positive-phase sum output bit is denoted S. Only the first to third compressor layers among the multiple compressor layers are shown in FIG. 3. A digit, also called a weight bit, refers to a bit position in the binary system, similar to the ones, tens, and hundreds places in decimal.
  • the above-mentioned accumulator uses a large number of standard adders, and the standard adder requires that the input bits and the output bits are both in phase when performing multi-bit compression, so the standard adder needs to include a phase unification functional circuit.
  • MOS metal-oxide-semiconductor
  • the standard adders in the above accumulators are basically standard full adders.
  • the number of MOS transistors included in each standard full adder is as high as 28, and the area occupied in a 7 nm process is 0.2736 µm², so that the area of the accumulator is large.
  • the large number of MOS transistors in the standard full adder will lead to a large number of flipping times of the MOS transistors in the average unit bit calculation, so that the power consumption of the accumulator is large. Therefore, the current accumulator implemented based on the Wallace tree has the problems of large area and high power consumption.
  • the embodiment of the present application provides an accumulator, which reduces the power consumption and area of the accumulator by compressing the bits of different phases in the same compressor layer respectively, and the accumulator can be used in communication equipment.
  • the specific description of the communication device and the accumulator can be found below.
  • the phase of a bit (also referred to as the signal corresponding to the bit) has two states, positive phase and reverse phase, and these two states are relative. For example, if the positive phase of a bit is G, the reverse phase of this bit is /G; that is, the reverse-phase signal is the logical inversion of the positive-phase signal.
  • the program storage area can store an operating system, at least one application program required by a function, etc., and the data storage area can store data created during operation of the device, etc.
  • the processor 102 is used to control and manage the actions of the communication device, for example, by running or executing software programs and/or modules stored in the memory 101, and calling data stored in the memory 101 to perform various functions of the device and process data.
  • the communication interface 103 is used to support the communication device to perform communication.
  • the processor 102 includes but is not limited to a central processing unit (central processing unit, CPU), a network processing unit (network processing unit, NPU), a graphics processing unit (graphics processing unit, GPU), or a digital signal processor (digital signal processor, DSP) or general-purpose processors, etc.
  • the processor 102 includes one or more accumulators, or includes one or more multipliers; for example, the processor 102 includes a multiplier array, and the multipliers implement the multiplication operations in the processor 102.
  • the bus 104 may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus, or an extended industry standard architecture (extended industry standard architecture, EISA) bus, etc.
  • PCI peripheral component interconnect
  • EISA extended industry standard architecture
  • the bus can be divided into address bus, data bus, control bus and so on. For ease of representation, only one thick line is used in FIG. 4 , but it does not mean that there is only one bus or one type of bus.
  • FIG. 5 is a schematic structural diagram of an accumulator provided by an embodiment of the present application, and the accumulator can be used to realize the accumulation of multiple binary numbers.
  • this accumulator includes W compressor layers, which are used to compress a plurality of binary numbers to obtain a plurality of accumulated values; the sum of the plurality of accumulated values is the accumulated sum of the plurality of binary numbers, and W is an integer greater than or equal to 1.
  • multiple lines (each line including one or more bits) can be obtained, and each line represents an accumulation value, that is, the multiple lines represent the multiple accumulation values.
  • the multiple accumulated values may be two accumulated values or more than two accumulated values, which is not specifically limited in this embodiment of the present application.
  • the W compressor layers may include one compressor layer or multiple compressor layers, for example, the W may be equal to 1, 4, or 6, etc., and the specific values may be set by those skilled in the art according to experience or actual needs. The embodiment does not specifically limit this.
  • the W compressor layers include a plurality of compressor layers, and the plurality of compressor layers are represented as L1 to LW as an example for illustration.
  • the W compressor layers include at least one first compressor layer, and each first compressor layer in the at least one first compressor layer is used to compress an input array to obtain an output array.
  • the input array includes a first array and a second array, the first array includes a plurality of positive-phase bits, the second array includes a plurality of inverted bits, and the output array includes a first compressed array and a second compressed array.
  • Each first compressor layer includes: a first compression circuit 21 for compressing the first array in the input array of the first compressor layer to obtain a first compressed array; and a second compression circuit 22 for compressing the second array in the input array of the first compressor layer to obtain a second compressed array.
  • the at least one first compressor layer may include one or more first compressor layers, and the number of layers of the one or more first compressor layers may be expressed as N, where N is a positive integer less than or equal to W.
  • the at least one first compressor layer may be any one or more of the W compressor layers.
  • In FIG. 5, as an example for illustration, the at least one first compressor layer includes (W-1) first compressor layers, and the (W-1) first compressor layers are the 2nd to the Wth compressor layers (that is, L2 to LW) of the W compressor layers. It should be noted that when N is less than W, the compressor layers in the W compressor layers other than the at least one first compressor layer can be implemented using existing technologies, which is not specifically limited in this embodiment of the present application.
  • in the input array of each first compressor layer, the first array includes a plurality of positive-phase bits and the second array includes a plurality of inverted bits, so that the first array can be considered as a Wallace tree including multiple positive-phase bits, and the second array can be regarded as a Wallace tree including multiple inverted bits. That is, the input array of each first compressor layer includes two Wallace trees, and the phases of the bits included in the two Wallace trees are opposite.
  • the first compression circuit 21 is used to compress the first array, and the second compression circuit 22 is used to compress the second array, so that bits of different phases in the input array can be compressed by different compression circuits. There is therefore no need to unify the bits in the input array of each first compressor layer into the same phase, so that the accumulator is simpler to implement than traditional designs and its area and power consumption can be reduced.
  • each compressor layer in the W compressor layers can include one or more compressors, and each compressor can be used to compress three bits on the same digit in the input array of the compressor layer .
  • for the first compression circuit 21 in each first compressor layer, the first compression circuit 21 may include one or more first compressors 211, and each of the one or more first compressors 211 is used to compress three bits located at the same digit in the first array.
  • for the second compression circuit 22 in each first compressor layer, the second compression circuit 22 may include one or more second compressors 221, and each of the one or more second compressors 221 is used to compress three bits located at the same digit in the second array.
  • the first compressor 211 and the second compressor 221 may be any of the following three adders, the three adders include an inverting sum adder, an inverting carry adder and a double inverting adder, The three adders are described below.
  • the first type, an inverting sum adder, is used to compress three bits to obtain a sum output bit and a carry output bit; the phase of the sum output bit is opposite to the phase of the three bits, and the phase of the carry output bit is the same as that of the three bits.
  • after the inverting sum adder compresses the three bits IN0, IN1 and IN2, it outputs a carry bit C and a sum bit /S; C has the same phase as IN0, IN1 and IN2, and /S has the opposite phase to IN0, IN1 and IN2.
  • the inverting sum adder can satisfy the logic functions shown in the following formulas (1-1) and (1-2), where NOT represents a negation operation, XOR represents an exclusive OR operation, AND represents an AND operation, and OR represents an OR operation.
  • the inverting sum adder is used to perform the following compression: if the three bits are all 0, the carry output bit is 0 and the sum output bit is 1; if the three bits are all 1, the carry output bit is 1 and the sum output bit is 0; if one of the three bits is 1 and the other two bits are 0, the carry output bit is 0 and the sum output bit is 0; if two of the three bits are 1 and the other bit is 0, the carry output bit is 1 and the sum output bit is 1. That is, the inverting sum adder can perform compression according to the logic table shown in Table 1 below.
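Formulas (1-1) and (1-2) and Table 1 are not reproduced in this text. As a hedged reconstruction, one pair of Boolean expressions that is consistent with the four compression cases listed above (and not necessarily the exact form used in the application) is:

```latex
\begin{align}
C  &= (IN0 \;\mathrm{AND}\; IN1) \;\mathrm{OR}\; (IN0 \;\mathrm{AND}\; IN2) \;\mathrm{OR}\; (IN1 \;\mathrm{AND}\; IN2) \\
/S &= \mathrm{NOT}\,(IN0 \;\mathrm{XOR}\; IN1 \;\mathrm{XOR}\; IN2)
\end{align}
```

Checking the cases: with all inputs 0, C = 0 and /S = NOT 0 = 1; with exactly one input 1, C = 0 and /S = NOT 1 = 0; with exactly two inputs 1, C = 1 and /S = NOT 0 = 1; with all inputs 1, C = 1 and /S = NOT 1 = 0.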
  • when the first compressor 211 is the inverting sum adder, the first compressor 211 is used to compress three positive-phase bits to obtain a positive-phase carry output bit and an inverted sum output bit.
  • when the second compressor 221 is the inverting sum adder, the second compressor 221 is used to compress three inverted bits to obtain an inverted carry output bit and a positive-phase sum output bit.
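The reason the same adder cell can serve as both the first and the second compressor can be checked exhaustively. The sketch below is a behavioral model with hypothetical names; it feeds the cell once with the true bits and once with their logical inversions and compares the outputs against a standard full adder.

```python
from itertools import product


def inverting_sum_adder(in0, in1, in2):
    """Behavioral model: carry keeps the input phase, sum is output inverted."""
    carry = (in0 & in1) | (in0 & in2) | (in1 & in2)
    inv_sum = 1 - (in0 ^ in1 ^ in2)
    return carry, inv_sum


def check_phases():
    """Return True if the cell behaves as described for both input phases."""
    for a, b, c in product((0, 1), repeat=3):
        true_carry = (a & b) | (a & c) | (b & c)
        true_sum = a ^ b ^ c
        # First compressor: positive-phase inputs give C and /S.
        c_out, s_out = inverting_sum_adder(a, b, c)
        if (c_out, s_out) != (true_carry, 1 - true_sum):
            return False
        # Second compressor: inverted inputs give /C and S.
        c_out, s_out = inverting_sum_adder(1 - a, 1 - b, 1 - c)
        if (c_out, s_out) != (1 - true_carry, true_sum):
            return False
    return True
```

The check passes because majority and three-input parity are both self-dual: inverting all three inputs inverts each of them, so the cell's already-inverted sum output comes back in positive phase while its carry output becomes inverted.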
  • each compressor layer in the W compressor layers includes a plurality of inverting sum adders. The input array of the 1st compressor layer (that is, L1) includes only a plurality of positive-phase bits, and the input arrays of the 2nd to the Wth compressor layers (i.e., L2 to LW) all include a plurality of positive-phase bits (i.e., the first array) and a plurality of inverted bits (i.e., the second array).
  • a first compression circuit 21 is included in the 1st compressor layer (i.e., L1), and the first compression circuit 21 includes a plurality of first compressors 211; each of the 2nd to the Wth compressor layers (i.e., L2 to LW) includes a first compression circuit 21 and a second compression circuit 22, the first compression circuit 21 includes a plurality of first compressors 211, and the second compression circuit 22 includes a plurality of second compressors 221.
  • the 1st compressor layer (i.e., L1) includes 54 first compressors 211, each of which is used to compress three positive-phase bits on the same digit to output a positive-phase carry output bit C and an inverted sum output bit /S.
  • the input matrix of the second compressor layer (i.e., L2) includes the output bits output by the 54 first compressors 211. The first matrix in the input matrix includes the positive-phase carry output bits C output by the 54 first compressors 211, and the second matrix includes the inverted sum output bits /S output by the 54 first compressors 211.
  • the first matrix in the input matrix of the third compressor layer (i.e., L3) includes the positive-phase carry output bits C output by the 18 first compressors 211 and the positive-phase sum output bits S output by the 18 second compressors 221, and the second matrix includes the inverted sum output bits /S output by the 18 first compressors 211 and the inverted carry output bits /C output by the 18 second compressors 221.
  • the first compression circuit 21 in the third compressor layer (i.e., L3) includes 12 first compressors 211, and the second compression circuit 22 includes 12 second compressors 221.
  • Each first compressor 211 in the 12 first compressors 211 is used to compress three positive-phase bits on the same digit in the first matrix to output a positive-phase carry output bit C and an inverted sum output bit /S.
  • Each second compressor 221 in the 12 second compressors 221 is used to compress three inverted bits on the same digit in the second matrix to output an inverted carry output bit /C and a positive-phase sum output bit S.
  • the input matrix of the 4th compressor layer (ie L4) includes the output bits output by the 12 first compressors 211 and the 12 second compressors 221 .
  • The first matrix in the input matrix includes the positive-phase carry output bits C output by the 12 first compressors 211 and the positive-phase sum output bits S output by the 12 second compressors 221, and the second matrix includes the inverted sum output bits /S output by the 12 first compressors 211 and the inverted carry output bits /C output by the 12 second compressors 221.
  • the first compression circuit 21 in the fourth compressor layer includes six first compressors 211
  • the second compression circuit 22 includes six second compressors 221 .
  • Each of the six first compressors 211 is used to compress three positive-phase bits on the same digit in the first matrix to output a positive-phase carry output bit C and an inverted sum output bit /S.
  • Each of the six second compressors 221 is used to compress three inverted bits on the same digit in the second matrix to output an inverted carry output bit /C and a positive-phase sum output bit S.
  • The output matrix of the 4th compressor layer includes: the positive-phase carry output bits C and the inverted sum output bits /S output by the 6 first compressors 211, the inverted carry output bits /C and the positive-phase sum output bits S output by the 6 second compressors 221, as well as the uncompressed positive-phase bits in the first array and the uncompressed inverted bits in the second array.
  • The first to fourth compressor layers (i.e., L1 to L4) in the accumulator are shown in FIG. 7. The other compressor layers after the fourth compressor layer can perform compression in a manner similar to that of the second to fourth compressor layers above; alternatively, the bits of different phases in the output matrix of the fourth compressor layer can be converted into bits of the same phase through an inversion operation and then compressed by existing compression methods, which will not be repeated in this embodiment of the present application.
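The phase bookkeeping described for these layers can be checked with a small behavioral sketch. It assumes the inverting sum adder behaves as a standard full adder whose sum output is complemented; the function names are hypothetical and are not taken from the application. The check confirms that the same cell yields C and /S from positive-phase inputs (first compressor 211) and /C and S from inverted inputs (second compressor 221).

```python
from itertools import product

def full_adder(a, b, c):
    """Standard 3:2 compressor: returns (carry, sum), both positive phase."""
    return (a & b) | (a & c) | (b & c), a ^ b ^ c

def inverting_sum_adder(a, b, c):
    """Assumed model: carry keeps the input phase, sum is complemented."""
    carry, total = full_adder(a, b, c)
    return carry, total ^ 1

def check_layer_phases():
    for a, b, c in product((0, 1), repeat=3):
        carry, total = full_adder(a, b, c)
        # First compressor 211: positive-phase inputs give C and /S.
        if inverting_sum_adder(a, b, c) != (carry, total ^ 1):
            return False
        # Second compressor 221: inverted inputs give /C and S.
        if inverting_sum_adder(a ^ 1, b ^ 1, c ^ 1) != (carry ^ 1, total):
            return False
    return True
```

Because a full adder fed with complemented inputs produces complemented outputs, one cell type can serve both compression circuits; only the interpretation of the wires changes.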
  • The second type, an inverting carry adder, is used to compress three bits to obtain a sum output bit and a carry output bit; the phase of the sum output bit is the same as the phase of the three bits, and the phase of the carry output bit is opposite to that of the three bits.
  • The inverting carry adder can satisfy the logic functions shown in the following formulas (2-1) and (2-2), where NOT represents a negation operation, XOR represents an exclusive-OR operation, AND represents an AND operation, and OR represents an OR operation.
  • The inverting carry adder is used to perform the following compression: if the three bits are all 0, the carry output bit is 1 and the sum output bit is 0; if the three bits are all 1, the carry output bit is 0 and the sum output bit is 1; if one of the three bits is 1 and the other two bits are 0, the carry output bit is 1 and the sum output bit is 1; if two of the three bits are 1 and the other bit is 0, the carry output bit is 0 and the sum output bit is 0. That is, the inverting carry adder can perform compression according to the logic table shown in Table 2 below.
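The four cases listed above can be reproduced with a short sketch. The Boolean expressions here are inferred from the described behavior (sum as the XOR of the three bits, carry as the complemented majority) rather than copied from formulas (2-1) and (2-2), and the function name is hypothetical.

```python
from itertools import product

def inverting_carry_adder(a, b, c):
    """Returns (inverted carry /C, positive-phase sum S) for three input bits."""
    majority = (a & b) | (a & c) | (b & c)
    return majority ^ 1, a ^ b ^ c

# Enumerate all eight input combinations as a logic table.
logic_table = {bits: inverting_carry_adder(*bits) for bits in product((0, 1), repeat=3)}
```

Looking up `logic_table[(0, 0, 0)]` or `logic_table[(1, 1, 0)]` gives the carry and sum values for the corresponding rows of the described table.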
  • the first compressor 211 is used to: compress the three positive phase bits to obtain an inverted carry output and a positive sum output bit.
  • the second compressor 221 is used to: compress the inverted three bits to obtain a positive phase carry output bit and an inverted sum output bit.
  • Each compressor layer in the W compressor layers includes a plurality of inverting carry adders. The input array of the first compressor layer (i.e., L1) includes only multiple positive-phase bits, while the input arrays of the 2nd to Wth compressor layers (i.e., L2 to LW) each include multiple positive-phase bits (that is, the first array) and multiple inverted bits (that is, the second array).
  • The above description can be understood as follows: the first compressor layer (i.e., L1) includes only one first compression circuit 21, and the first compression circuit 21 includes a plurality of first compressors 211; each compressor layer in the second to Wth compressor layers (i.e., L2 to LW) includes a first compression circuit 21 and a second compression circuit 22, where the first compression circuit 21 includes a plurality of first compressors 211 and the second compression circuit 22 includes a plurality of second compressors 221.
  • The first compressor layer (i.e., L1) includes 54 first compressors 211, each of which is used to compress three positive-phase bits of the same digit to output an inverted carry output bit /C and a positive-phase sum output bit S.
  • The input matrix of the second compressor layer (i.e., L2) includes the output bits of the 54 first compressors 211.
  • The first matrix in the input matrix includes the positive-phase sum output bits S output by the 54 first compressors 211.
  • The second matrix includes the inverted carry output bits /C output by the 54 first compressors 211.
  • the first compression circuit 21 in the second compressor layer includes 18 first compressors 211
  • the second compression circuit 22 includes 18 second compressors 221 .
  • Each of the 18 first compressors 211 is used to compress three positive-phase bits on the same digit in the first matrix to output an inverted carry output bit /C and a positive-phase sum output bit S.
  • Each of the 18 second compressors 221 is used to compress three inverted bits on the same digit in the second matrix to output a positive-phase carry output bit C and an inverted sum output bit /S.
  • the input matrix of the third compressor layer includes the output bits of the 18 first compressors 211 and the 18 second compressors 221 .
  • The first matrix in the input matrix includes the positive-phase sum output bits S output by the 18 first compressors 211 and the positive-phase carry output bits C output by the 18 second compressors 221, and the second matrix includes the inverted carry output bits /C output by the 18 first compressors 211 and the inverted sum output bits /S output by the 18 second compressors 221.
  • the first compression circuit 21 in the third compressor layer includes 12 first compressors 211
  • the second compression circuit 22 includes 12 second compressors 221 .
  • Each of the 12 first compressors 211 is used to compress three positive-phase bits on the same digit in the first matrix to output an inverted carry output bit /C and a positive-phase sum output bit S.
  • Each of the 12 second compressors 221 is used to compress three inverted bits on the same digit in the second matrix to output a positive-phase carry output bit C and an inverted sum output bit /S.
  • the input matrix of the fourth compressor layer (ie, L4) includes the output bits of the 12 first compressors 211 and the 12 second compressors 221 .
  • The first matrix in the input matrix includes the positive-phase sum output bits S output by the 12 first compressors 211 and the positive-phase carry output bits C output by the 12 second compressors 221, and the second matrix includes the inverted carry output bits /C output by the 12 first compressors 211 and the inverted sum output bits /S output by the 12 second compressors 221.
  • the first compression circuit 21 in the fourth compressor layer includes six first compressors 211
  • the second compression circuit 22 includes six second compressors 221 .
  • Each of the six first compressors 211 is used to compress three positive-phase bits on the same digit in the first matrix to output an inverted carry output bit /C and a positive-phase sum output bit S.
  • Each of the six second compressors 221 is used to compress three inverted bits on the same digit in the second matrix to output a positive-phase carry output bit C and an inverted sum output bit /S.
  • The output matrix of the 4th compressor layer (i.e., L4) includes: the inverted carry output bits /C and the positive-phase sum output bits S output by the 6 first compressors 211, the inverted sum output bits /S and the positive-phase carry output bits C output by the 6 second compressors 221, as well as the uncompressed positive-phase bits in the first array and the uncompressed inverted bits in the second array.
  • The first to fourth compressor layers (i.e., L1 to L4) in the accumulator are shown in FIG. 9. The other compressor layers after the fourth compressor layer can perform compression in a manner similar to that of the second to fourth compressor layers above; alternatively, the bits of different phases in the output matrix of the fourth compressor layer can be converted into bits of the same phase through an inversion operation and then compressed by existing compression methods, which will not be repeated in this embodiment of the present application.
  • The above description takes the case where the input matrix of the first compressor layer (i.e., L1) includes only a plurality of positive-phase bits as an example, which does not limit the embodiments of the present application.
  • The input matrix of the first compressor layer (i.e., L1) may also include only a plurality of inverted bits, or include both positive-phase bits and inverted bits.
  • The third type, a double inverting adder, is used to compress three bits to obtain a sum output bit and a carry output bit; the phase of the sum output bit and the phase of the carry output bit are both opposite to the phase of the three bits.
  • The double inverting adder can satisfy the logic functions shown in the following formulas (3-1) and (3-2), where NOT represents a negation operation, XOR represents an exclusive-OR operation, AND represents an AND operation, and OR represents an OR operation.
  • The double inverting adder is used to perform the following compression: if the three bits are all 0, the carry output bit is 1 and the sum output bit is 1; if the three bits are all 1, the carry output bit is 0 and the sum output bit is 0; if one of the three bits is 1 and the other two bits are 0, the carry output bit is 1 and the sum output bit is 0; if two of the three bits are 1 and the other bit is 0, the carry output bit is 0 and the sum output bit is 1. That is, the double inverting adder can perform compression according to the logic table shown in Table 3 below.
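A brief sketch of the double inverting adder follows. As with the previous types, the expressions are inferred from the behavior described above (both outputs are the complements of a standard full adder's outputs), not taken from formulas (3-1) and (3-2); the names are hypothetical. The helper also checks the property used by the second compressor 221: inverted inputs produce positive-phase C and S.

```python
from itertools import product

def double_inverting_adder(a, b, c):
    """Returns (inverted carry /C, inverted sum /S) for three input bits."""
    majority = (a & b) | (a & c) | (b & c)
    return majority ^ 1, (a ^ b ^ c) ^ 1

def inverted_inputs_give_positive_outputs():
    for a, b, c in product((0, 1), repeat=3):
        carry = (a & b) | (a & c) | (b & c)
        total = a ^ b ^ c
        # Feed the complemented bits; expect the true carry and sum back.
        if double_inverting_adder(a ^ 1, b ^ 1, c ^ 1) != (carry, total):
            return False
    return True
```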
  • the first compressor 211 is used to: compress the three positive-phase bits to obtain an inverted carry output and an inverted sum output bit.
  • the second compressor 221 is used to: compress the three inverted bits to obtain a positive-phase carry output and a positive-phase sum output bit.
  • Each compressor layer in the W compressor layers includes a plurality of double inverting adders, and the input arrays of the first to Wth compressor layers (i.e., L1 to LW) each include a first array and a second array.
  • each compressor layer in the 1st to Wth compressor layers includes a first compression circuit 21 and a second compression circuit 22, the first compression circuit 21 includes a plurality of first compressors 211 and the second compression circuit 22 includes a plurality of second compressors 221 .
  • The first compression circuit 21 in the first compressor layer includes 18 first compressors 211.
  • The second compression circuit 22 includes 18 second compressors 221.
  • Each of the 18 first compressors 211 is used to compress three positive-phase bits on the same digit in the first matrix to output an inverted carry output bit /C and an inverted sum output bit /S.
  • Each of the 18 second compressors 221 is used to compress three inverted bits on the same digit in the second matrix to output a positive-phase carry output bit C and a positive-phase sum output bit S.
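  • The first matrix in the input matrix of the second compressor layer (i.e., L2) includes the positive-phase carry output bits C and positive-phase sum output bits S output by the 18 second compressors 221.
  • The second matrix includes the inverted carry output bits /C and inverted sum output bits /S output by the 18 first compressors 211.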
  • the first compression circuit 21 in the second compressor layer includes 12 first compressors 211
  • the second compression circuit 22 includes 12 second compressors 221 .
  • Each of the 12 first compressors 211 is used to compress three positive-phase bits on the same digit in the first matrix to output an inverted carry output bit /C and an inverted sum output bit /S.
  • Each of the 12 second compressors 221 is used to compress three inverted bits on the same digit in the second matrix to output a positive-phase carry output bit C and a positive-phase sum output bit S.
  • The output matrix of the second compressor layer (i.e., L2) includes: the inverted sum output bits /S and the inverted carry output bits /C output by the 12 first compressors 211, and the positive-phase sum output bits S and the positive-phase carry output bits C output by the 12 second compressors 221.
  • The above describes the first and second compressor layers (namely L1 and L2) of the accumulator.
  • The other compressor layers after the second compressor layer can perform compression in a manner similar to that of the first and second compressor layers above; alternatively, the bits of different phases in the output matrix of the second compressor layer can be converted into bits of the same phase through an inversion operation and then compressed by existing compression methods, which will not be repeated in this embodiment of the present application.
  • the first compressor layers located on different levels among the multiple first compressor layers can be the same or different.
  • The multiple first compressor layers include at least two first compressor layers, and each first compressor and each second compressor in the at least two first compressor layers may adopt one of the above three adders.
  • Each first compressor and each second compressor in one part of the at least two first compressor layers adopts one of the above three adders, and each first compressor and each second compressor in the other part of the first compressor layers adopts one or two of the other two of the above three adders.
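One way to see why layers may mix the three adder types is to tag every wire with its phase and decode it only when needed. The sketch below is an illustration under that assumption; the `ADDER_TYPES` table, the phase encoding (0 for positive, 1 for inverted) and all function names are hypothetical and do not come from the application.

```python
from itertools import product

# (invert_carry, invert_sum) relative to the phase of the three inputs.
ADDER_TYPES = {
    "inverting_sum": (0, 1),
    "inverting_carry": (1, 0),
    "double_inverting": (1, 1),
}

def compress(kind, wires, phase):
    """wires: three wire levels sharing one phase (0 = positive, 1 = inverted).
    Returns ((carry_wire, carry_phase), (sum_wire, sum_phase))."""
    inv_c, inv_s = ADDER_TYPES[kind]
    a, b, c = wires
    carry_wire = ((a & b) | (a & c) | (b & c)) ^ inv_c
    sum_wire = (a ^ b ^ c) ^ inv_s
    return (carry_wire, phase ^ inv_c), (sum_wire, phase ^ inv_s)

def decode(wire, phase):
    """Recover the logical bit carried by a wire of the given phase."""
    return wire ^ phase

def all_types_preserve_value():
    for kind, phase in product(ADDER_TYPES, (0, 1)):
        for bits in product((0, 1), repeat=3):
            wires = tuple(bit ^ phase for bit in bits)
            carry, total = compress(kind, wires, phase)
            # The decoded outputs must encode the count of ones in `bits`.
            if 2 * decode(*carry) + decode(*total) != sum(bits):
                return False
    return True
```

Under this model every adder type preserves the arithmetic value of its column, so the choice of type per layer affects only which output array (positive-phase or inverted) each bit lands in.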
  • The accumulator may further include a summing circuit 23, configured to receive the multiple accumulated values and sum them to obtain the accumulated sum. Specifically, after the W compressor layers compress the multiple binary numbers to obtain the multiple accumulated values, the Wth compressor layer (i.e., LW) can send the multiple accumulated values to the summing circuit 23; when the summing circuit 23 receives the multiple accumulated values, it sums them to obtain the accumulated sum of the multiple binary numbers.
  • the multiple accumulated values are two accumulated values.
  • the summation circuit 23 is an adder, and the adder sums the two accumulated values to obtain the accumulated sum of the multiple binary numbers.
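The division of work between the compressor layers and the summing circuit 23 can be sketched at word level. This is a simplified model under stated assumptions: it uses plain positive-phase 3:2 compression on whole integers instead of the phase-aware compressors, and the function names are hypothetical.

```python
def compress_3_to_2(x, y, z):
    """Bitwise 3:2 compression of three non-negative integers into two words."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word

def accumulate(values):
    rows = list(values)
    while len(rows) > 2:
        next_rows = []
        # Each pass plays the role of one compressor layer.
        for i in range(0, len(rows) - 2, 3):
            next_rows.extend(compress_3_to_2(*rows[i:i + 3]))
        # Rows that did not fill a group of three pass through uncompressed.
        next_rows.extend(rows[len(rows) - len(rows) % 3:])
        rows = next_rows
    # Final carry-propagate addition, standing in for the summing circuit 23.
    return sum(rows)
```

The loop never propagates carries across digits; only the last addition does, which is why reducing the inputs to two accumulated values before the adder is attractive.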
  • The accumulator may also include one or more inverters, used to invert at least one of the sum output bit and the carry output bit output by one or more first compressors or second compressors, or to invert the three bits input to the one or more first compressors or second compressors.
  • The one or more inverters are used to invert at least one of the sum output bit and the carry output bit output by one or more first compressors or second compressors in the W compressor layers.
  • The input matrix of the i-th compressor layer includes 3 rows and 6 columns of bits.
  • the i-th compressor layer includes 6 first compressors, and the 6 first compressors are all inverting sum adders.
  • The 6 first compressors are used to compress the input matrix to output two rows: the first row includes 6 inverted sum output bits /S, and the second row includes 6 positive-phase carry output bits C.
  • the one or more inverters may include 6 inverters, and the 6 inverters may be used to respectively invert the 6 inverted summation output bits /S in the first row to obtain 6 positive-phase summation output bits S.
  • In this way, the two rows output by the i-th compressor layer are both converted into positive-phase rows.
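The output-side inversion in this example can be sketched as follows, assuming the inverting sum adder model used earlier (true carry, complemented sum). The example matrix and function names are hypothetical.

```python
def inverting_sum_adder(a, b, c):
    """Assumed model: returns (positive-phase carry C, inverted sum /S)."""
    carry = (a & b) | (a & c) | (b & c)
    return carry, (a ^ b ^ c) ^ 1

def compress_layer(matrix):
    """matrix: 3 rows x 6 columns of positive-phase bits."""
    outputs = [inverting_sum_adder(*column) for column in zip(*matrix)]
    carry_row = [carry for carry, _ in outputs]
    inverted_sum_row = [inv_sum for _, inv_sum in outputs]
    # The 6 inverters turn the /S row back into a positive-phase S row.
    sum_row = [bit ^ 1 for bit in inverted_sum_row]
    return sum_row, carry_row

example_matrix = [
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1],
]
sum_row, carry_row = compress_layer(example_matrix)
```

After the inverters, each column satisfies the usual relation: the number of ones in the column equals the sum bit plus twice the carry bit.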
  • the one or more inverters are used to invert the three bits input to one or more first compressors or second compressors in the W compressor layers .
  • The input matrix of the i-th compressor layer includes 6 rows and 3 columns of bits, where the 1st to 3rd rows are positive-phase bits and the 4th to 6th rows are inverted bits; the i-th compressor layer includes 6 first compressors, all of which are inverting sum adders.
  • The one or more inverters may include 9 inverters, which can be used to invert the 9 inverted bits in the 4th to 6th rows to obtain 9 positive-phase bits (that is, the bits in the 4th to 6th rows are converted to positive phase); in other words, the 9 inverters invert the bits input to the 3 first compressors in the second row.
  • The 6 first compressors in the i-th compressor layer can then each compress three positive-phase bits on the same digit to output four rows: two rows of inverted sum output bits /S and two rows of positive-phase carry output bits C.
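The input-side inversion can be sketched in the same style. The layout (rows 0 to 2 positive phase, rows 3 to 5 inverted) follows the example above; the adder model and all names are assumptions made for illustration.

```python
def inverting_sum_adder(a, b, c):
    """Assumed model: returns (positive-phase carry C, inverted sum /S)."""
    carry = (a & b) | (a & c) | (b & c)
    return carry, (a ^ b ^ c) ^ 1

def compress_with_input_inverters(wires):
    """wires: 6 rows x 3 columns; rows 0-2 positive phase, rows 3-5 inverted."""
    upper = wires[:3]
    # The 9 inverters bring rows 3-5 back to positive phase.
    lower = [[bit ^ 1 for bit in row] for row in wires[3:]]
    rows = []
    for group in (upper, lower):
        outputs = [inverting_sum_adder(*column) for column in zip(*group)]
        rows.append([inv_sum for _, inv_sum in outputs])  # row of /S bits
        rows.append([carry for carry, _ in outputs])      # row of C bits
    return rows  # [/S upper, C upper, /S lower, C lower]
```

With the inputs normalized first, all 6 compressors see positive-phase bits and the layer produces only two kinds of rows, /S and C.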
  • the input matrix of the i-th compressor layer and the included first compressor shown in FIG. 12 and FIG. 13 are only exemplary, and do not constitute a limitation to the embodiment of the present application.
  • Using the one or more inverters to invert at least one of the sum output bit and the carry output bit output by the one or more first compressors or second compressors, or to invert the three bits input to the one or more first compressors or second compressors, can improve the compression efficiency of the W compressor layers while ensuring the accuracy of the compression results, and thereby improve the computational efficiency of the accumulator.
  • FIG. 14 is a schematic structural diagram of an inverting sum compression operator circuit provided in an embodiment of the present application.
  • the inverting sum operator circuit may also be called an inverting sum adder.
  • The inverting sum adder includes: a first transistor M1, a second transistor M2, a third transistor M3, a fourth transistor M4, a fifth transistor M5, a sixth transistor M6, a seventh transistor M7, an eighth transistor M8, a ninth transistor M9, a tenth transistor M10, an eleventh transistor M11, a twelfth transistor M12, a thirteenth transistor M13, a fourteenth transistor M14, a fifteenth transistor M15, a sixteenth transistor M16, a seventeenth transistor M17, an eighteenth transistor M18, a nineteenth transistor M19, a twentieth transistor M20, a twenty-first transistor M21, a twenty-second transistor M22, a twenty-third transistor M23, and a twenty-fourth transistor M24.
  • The first transistor M1 and the second transistor M2 are coupled in parallel between the power supply terminal and the first node 1; the third transistor M3 is coupled between the first node 1 and the second node 2; the fourth transistor M4 is coupled between the second node 2 and the third node 3; the fifth transistor M5 and the sixth transistor M6 are coupled in parallel between the third node 3 and the ground terminal; the seventh transistor M7 is coupled between the power supply terminal and the fourth node 4; the eighth transistor M8 is coupled between the second node 2 and the fourth node 4; the ninth transistor M9 is coupled between the second node 2 and the fifth node 5; the tenth transistor M10 and the eleventh transistor M11 are coupled in series between the fourth node 4 and the first output terminal /S; the twelfth transistor M12 and the thirteenth transistor M13 are coupled in series between the fifth node 5 and the first output terminal /S; the fourteenth transistor M14 is coupled between the fifth node 5 and the ground terminal; the fifteenth transistor M15,
  • the control terminals of the third transistor M3, the fourth transistor M4, the eleventh transistor M11, the twelfth transistor M12, the fifteenth transistor M15 and the twentieth transistor M20 are used to receive the first input IN0;
  • the control terminals of the fifth transistor M5, the seventh transistor M7, the fourteenth transistor M14, the sixteenth transistor M16 and the twenty-first transistor M21 are used to receive the second input IN2;
  • the control terminals of the eighth transistor M8, the ninth transistor M9, the tenth transistor M10, the thirteenth transistor M13, the seventeenth transistor M17 and the twenty-second transistor M22 are used to receive the third input IN3.
  • The control terminals of the eighteenth transistor M18 and the nineteenth transistor M19 are coupled to the second node 2; the twenty-third transistor M23 and the twenty-fourth transistor M24 are coupled in series between the power supply terminal and the ground terminal, and the coupling point of the twenty-third transistor M23 and the twenty-fourth transistor M24 is the second output terminal C; the control terminals of the twenty-third transistor M23 and the twenty-fourth transistor M24 are both coupled to the second node 2.
  • The first input IN0, the second input IN2 and the third input IN3 may be the three bits in the relevant description of the inverting sum adder in the above accumulator embodiment; the first output terminal /S may be used to output the sum output bit of the inverting sum adder, and the second output terminal C may be used to output the carry output bit of the inverting sum adder.
  • the first transistor M1, the second transistor M2, the third transistor M3, the seventh transistor M7, the eighth transistor M8, the tenth transistor M10, the eleventh transistor M11, the fifteenth transistor M15, and the sixteenth transistor M16, the seventeenth transistor M17, the eighteenth transistor M18 and the twenty-third transistor M23 are PMOS transistors;
  • the thirteenth transistor M13 , the fourteenth transistor M14 , the nineteenth transistor M19 , the twentieth transistor M20 , the twenty-first transistor M21 , the twenty-second transistor M22 , and the twenty-fourth transistor M24 are NMOS transistors.
  • the above-mentioned control terminal may specifically refer to a gate of a corresponding PMOS transistor or an NMOS transistor.
  • the first transistor M1 to the twenty-fourth transistor M24 in the example above may be MOS transistors, or may be replaced by bipolar transistors.
  • The types of the transistors shown in the figure are not limited in the embodiments of the present application.
  • Figure 14 is only an example of a circuit, and any transistors added to the circuit so that the functions of multiple transistors are equivalent to those of one or more transistors in Figure 14 are also regarded as the same type of circuit .
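A behavioral (not switch-level) model may help when reading the FIG. 14 description. It assumes the circuit works like a classic mirror adder: the second node 2 carries the complemented carry, the sum stage reuses that node, and the inverter formed by M23 and M24 restores the positive-phase carry. The netlist text above is truncated, so this mapping is an assumption, and the function name is hypothetical.

```python
from itertools import product

def mirror_inverting_sum_adder(in0, in2, in3):
    """Behavioral model using the input names IN0, IN2, IN3 from the text."""
    majority = (in0 & in2) | (in0 & in3) | (in2 & in3)
    node2 = majority ^ 1                     # internal inverted carry (assumed)
    any_high = in0 | in2 | in3
    all_high = in0 & in2 & in3
    # Mirror-adder sum stage: S = all_high OR (any_high AND NOT carry).
    inverted_sum = (all_high | (any_high & node2)) ^ 1   # first output /S
    carry = node2 ^ 1                        # inverter stage -> second output C
    return inverted_sum, carry

def matches_inverting_sum_adder():
    for bits in product((0, 1), repeat=3):
        inverted_sum, carry = mirror_inverting_sum_adder(*bits)
        if sum(bits) != (inverted_sum ^ 1) + 2 * carry:
            return False
    return True
```

The model makes the timing argument visible: /S is available directly from the sum stage, whereas a standard adder would need one more inverter to produce S.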
  • FIG. 15 is a schematic structural diagram of an inverting carry compression operator circuit provided by an embodiment of the present application.
  • the inverting carry operator circuit may also be called an inverting carry adder.
  • The inverting carry adder includes: a first transistor M1, a second transistor M2, a third transistor M3, a fourth transistor M4, a fifth transistor M5, a sixth transistor M6, a seventh transistor M7, an eighth transistor M8, a ninth transistor M9, a tenth transistor M10, an eleventh transistor M11, a twelfth transistor M12, a thirteenth transistor M13, a fourteenth transistor M14, a fifteenth transistor M15, a sixteenth transistor M16, a seventeenth transistor M17, an eighteenth transistor M18, a nineteenth transistor M19, a twentieth transistor M20, a twenty-first transistor M21, a twenty-second transistor M22, a twenty-third transistor M23, and a twenty-fourth transistor M24.
  • The first transistor M1 and the second transistor M2 are coupled in parallel between the power supply terminal and the first node 1; the third transistor M3 is coupled between the first node 1 and the first output terminal /C; the fourth transistor M4 is coupled between the first output terminal /C and the second node 2; the fifth transistor M5 and the sixth transistor M6 are coupled in parallel between the second node 2 and the ground terminal; the seventh transistor M7 is coupled between the power supply terminal and the third node 3; the eighth transistor M8 is coupled between the third node 3 and the first output terminal /C; the ninth transistor M9 is coupled between the first output terminal /C and the fourth node 4; the tenth transistor M10 and the eleventh transistor M11 are coupled in series between the third node 3 and the fifth node 5; the twelfth transistor M12 and the thirteenth transistor M13 are coupled in series between the fourth node 4 and the fifth node 5; the fourteenth transistor M14 is coupled between the fourth node 4 and the ground terminal; the fifteenth transistor M
  • the control terminals of the third transistor M3, the fourth transistor M4, the eleventh transistor M11, the twelfth transistor M12, the fifteenth transistor M15 and the twentieth transistor M20 are used to receive the first input IN0;
  • the control terminals of the fifth transistor M5, the seventh transistor M7, the fourteenth transistor M14, the sixteenth transistor M16 and the twenty-first transistor M21 are used to receive the second input IN2;
  • the control terminals of the eighth transistor M8, the ninth transistor M9, the tenth transistor M10, the thirteenth transistor M13, the seventeenth transistor M17 and the twenty-second transistor M22 are used to receive the third input IN3.
  • the control terminals of the eighteenth transistor M18 and the nineteenth transistor M19 are coupled to the first output terminal /C; the twenty-third transistor M23 and the twenty-fourth transistor M24 are coupled in series between the power terminal and the ground terminal, and the second The coupling point of the thirteenth transistor M23 and the twenty-fourth transistor M24 is the second output terminal S; the control terminals of the twenty-third transistor M23 and the twenty-fourth transistor M24 are both coupled to the fifth node 5.
  • The first input IN0, the second input IN2 and the third input IN3 may be the three bits in the relevant description of the inverting carry adder in the above accumulator embodiment; the first output terminal /C may be used to output the carry output bit of the inverting carry adder, and the second output terminal S may be used to output the sum output bit of the inverting carry adder.
  • the first transistor M1, the second transistor M2, the third transistor M3, the seventh transistor M7, the eighth transistor M8, the tenth transistor M10, the eleventh transistor M11, the fifteenth transistor M15, and the sixteenth transistor M16, the seventeenth transistor M17, the eighteenth transistor M18 and the twenty-third transistor M23 are PMOS transistors;
  • the thirteenth transistor M13 , the fourteenth transistor M14 , the nineteenth transistor M19 , the twentieth transistor M20 , the twenty-first transistor M21 , the twenty-second transistor M22 , and the twenty-fourth transistor M24 are NMOS transistors.
  • the above-mentioned control terminal may specifically refer to a gate of a corresponding PMOS transistor or an NMOS transistor.
  • the first transistor M1 to the twenty-fourth transistor M24 in the example above may be MOS transistors, or may be replaced by bipolar transistors.
  • The types of the transistors shown in the figure are not limited in the embodiments of the present application.
  • Figure 15 is only an example of a circuit, and any transistors added to the circuit so that the functions of multiple transistors are equivalent to those of one or more transistors in Figure 15 are also regarded as the same type of circuit .
  • FIG. 16 is a schematic structural diagram of a double-inversion compression operator circuit provided by an embodiment of the present application.
  • the double-inversion operator circuit may also be called a double-inversion adder.
  • The double-inversion adder includes: a first transistor M1, a second transistor M2, a third transistor M3, a fourth transistor M4, a fifth transistor M5, a sixth transistor M6, a seventh transistor M7, an eighth transistor M8, a ninth transistor M9, a tenth transistor M10, an eleventh transistor M11, a twelfth transistor M12, a thirteenth transistor M13, a fourteenth transistor M14, a fifteenth transistor M15, a sixteenth transistor M16, a seventeenth transistor M17, an eighteenth transistor M18, a nineteenth transistor M19, a twentieth transistor M20, a twenty-first transistor M21, and a twenty-second transistor M22.
  • The first transistor M1 and the second transistor M2 are coupled in parallel between the power supply terminal and the first node 1; the third transistor M3 is coupled between the first node 1 and the first output terminal /C; the fourth transistor M4 is coupled between the first output terminal /C and the second node 2; the fifth transistor M5 and the sixth transistor M6 are coupled in parallel between the second node 2 and the ground terminal; the seventh transistor M7 is coupled between the power supply terminal and the third node 3; the eighth transistor M8 is coupled between the third node 3 and the first output terminal /C; the ninth transistor M9 is coupled between the first output terminal /C and the fourth node 4; the tenth transistor M10 is coupled between the fourth node 4 and the ground terminal; the eleventh transistor M11 and the twelfth transistor M12 are coupled in series between the third node 3 and the second output terminal /S; the thirteenth transistor M13 and the fourteenth transistor M14 are coupled in series between the second output terminal /S and the fourth node 4; the
  • the control terminals of the third transistor M3, the fourth transistor M4, the twelfth transistor M12, the thirteenth transistor M13, the fifteenth transistor M15 and the twentieth transistor M20 are used to receive the first input IN0;
  • the control terminals of the fifth transistor M5, the seventh transistor M7, the tenth transistor M10, the sixteenth transistor M16 and the twenty-first transistor M21 are used to receive the second input IN2;
  • the control terminals of the eighth transistor M8, the ninth transistor M9, the eleventh transistor M11, the fourteenth transistor M14, the seventeenth transistor M17 and the twenty-second transistor M22 are used to receive the third input IN3.
  • the control terminals of the eighteenth transistor M18 and the nineteenth transistor M19 are both coupled to the first output terminal /C.
  • The first input IN0, the second input IN2 and the third input IN3 may be the three bits in the relevant description of the double-inversion adder in the above accumulator embodiment; the first output terminal /C can be used to output the carry output bit of the double-inversion adder, and the second output terminal /S can be used to output the sum output bit of the double-inversion adder.
  • the sixteenth transistor M16, the seventeenth transistor M17 and the eighteenth transistor M18 are PMOS transistors;
  • the fourth transistor M4, the fifth transistor M5, the sixth transistor M6, the ninth transistor M9, the tenth transistor M10, the thirteenth transistor M13, the The fourteenth transistor M14, the nineteenth transistor M19, the twentieth transistor M20, the twenty-first transistor M21, and the twenty-second transistor M22 are NMOS transistors.
  • the above-mentioned control terminal may specifically refer to a gate of a corresponding PMOS transistor or an NMOS transistor.
  • the first transistor M1 to the twenty-second transistor M22 in the example above may be MOS transistors, or may be replaced by bipolar transistors.
  • the types of the transistors shown in FIG. 16 are merely examples and do not limit the embodiments of the present application.
  • FIG. 16 is only one example of the circuit; any circuit that adds transistors to it so that several transistors together perform the function of one or more transistors in FIG. 16 is regarded as the same type of circuit.
  • the embodiment of the present application also provides a multiplier.
  • as shown in FIG. 17, the multiplier may include multiple groups of encoders 301 and an accumulator 302; the encoders 301 can be used to encode a first value and a second value represented as binary numbers to obtain a plurality of partial product terms, and the accumulator 302 can be used to accumulate the plurality of partial product terms to obtain the product of the first value and the second value.
  • the accumulator 302 may be any accumulator provided above, and the plurality of partial product terms may be used as the input array of the 1st compressor layer (L1) among the W compressor layers of that accumulator.
  • the multiplier may further include multiple precoders 303.
  • the precoders 303 can be used to precode the first value to obtain a precoding result; correspondingly, the encoders 301 can be used to encode the precoding result and the second value to obtain the plurality of partial product terms.
  • for a more detailed description of the multiplier shown in FIG. 17, refer to international patent application PCT/CN2019/119993.
  • the embodiments of the present application compare a multiplier that uses the accumulator provided above (hereinafter, the inverting accumulation multiplier) with an existing multiplier whose accumulator is built from standard adders (hereinafter, the traditional accumulation multiplier); the power consumption and area in a 7 nm process are shown in FIG. 18. These values describe one embodiment, and the application is not limited to them.
  • (a) in FIG. 18 shows the power consumption of the accumulator when the inverting accumulation multiplier and the traditional accumulation multiplier each multiply two binary numbers of 8 bits to 32 bits.
  • (b) in FIG. 18 shows the area of the accumulator when the inverting accumulation multiplier and the traditional accumulation multiplier each multiply two binary numbers of 8 bits to 32 bits. It can be seen from FIG. 18 that both the power consumption and the area of the inverting accumulation multiplier are smaller than those of the traditional accumulation multiplier.
  • in the accumulator of the inverting accumulation multiplier, the input array of each first compressor layer in the at least one first compressor layer includes a first array and a second array
  • the first array can be regarded as a Wallace tree of positive-phase bits
  • the second array can be regarded as a Wallace tree of inverted-phase bits; that is, the input array includes two Wallace trees whose bits have opposite phases.
  • the first compression circuit 21 is used to compress the first array
  • the second compression circuit 22 is used to compress the second array, so bits of different phases in the input array are compressed by different compression circuits and need not be unified to the same phase. The accumulator is therefore simpler to implement than a traditional design and has lower area and power consumption, and so does a multiplier that uses the accumulator.
  • a processor is also provided, including an accumulator, a multiplier, or an operator circuit, where the accumulator is the accumulator provided above, the multiplier is the multiplier provided above that includes the accumulator, and the operator circuit is any one or more of the operator circuits provided above.
  • a chip is provided, including an accumulator, a multiplier, or an operator circuit, where the accumulator is the accumulator provided above, the multiplier is the multiplier provided above that includes the accumulator, and the operator circuit is any one or more of the operator circuits provided above.
  • a communication device is also provided.
  • the structure of the communication device may be as shown in FIG. 4, that is, the communication device may include a memory 101, a processor 102, a communication interface 103 and a bus 104.
  • the processor 102 may include the accumulator provided above, or the multiplier provided above including the accumulator.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Semiconductor Integrated Circuits (AREA)
  • Logic Circuits (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

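This application provides an accumulator, a multiplier, and an operator circuit, relates to the field of electronic technologies, and is intended to reduce the area and power consumption of an accumulator. The accumulator includes W compressor layers, where W is an integer greater than or equal to 1. The W compressor layers include at least one first compressor layer. In the input array of each first compressor layer, a first array includes a plurality of positive-phase bits and a second array includes a plurality of inverted-phase bits. Each first compressor layer includes a first compression circuit for compressing the first array and a second compression circuit for compressing the second array; that is, bits of different phases in the input array of each first compressor layer are compressed by different compression circuits.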

Description

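Accumulator, multiplier, and operator circuit
Technical Field
This application relates to the field of electronic technologies, and in particular to an accumulator, a multiplier, and an operator circuit.
Background
An accumulator is a computing circuit commonly used in digital circuit design. It adds (accumulates) a plurality of binary numbers, for example the addition "y=x0[7:0]+x1[7:0]+x2[7:0]+…" shown in (a) of FIG. 1. An accumulator can also be used in a multiplier to accumulate the binary numbers that arise in a multiplication, for example those in "y=x0[7:0]×x1[7:0]" shown in (b) of FIG. 1. In FIG. 1, B0 to B7 denote the digit positions 2^0 to 2^7.
In the prior art, accumulators are implemented with Wallace tree compression: a number of standard adders (full adders and half adders) compress the bits at each digit position layer by layer, two accumulated values remain after several layers of compression, and these two values are added to give the final result. For example, as shown in FIG. 2, the accumulation in an 8bits×8bits multiplier can be compressed into two rows (each row representing one accumulated value) by 4 compressor layers, each of which includes a plurality of standard adders. The Wallace tree is an existing method for efficient accumulation: bits at different digit positions in the same layer are compressed in parallel, and each layer has the delay of one standard full adder, so the computation is fast.
However, an accumulator built on the Wallace tree uses a large number of standard adders (mainly full adders). When compressing multiple bits, a standard adder requires both its input bits and its output bits to be positive-phase, so it must include circuitry that unifies the phase. This makes the accumulator large in area and high in power consumption.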
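Summary
This application provides an accumulator, a multiplier, and an operator circuit that make the accumulator simpler to implement and thereby reduce area and power consumption. To this end, this application adopts the following technical solutions.
According to a first aspect, an accumulator is provided, including W compressor layers, where W is an integer greater than or equal to 1. The W compressor layers compress a plurality of binary numbers to obtain a plurality of accumulated values, and the sum of the accumulated values is the accumulated sum of the binary numbers. The W compressor layers include at least one first compressor layer. Each first compressor layer compresses an input array to obtain an output array; the input array includes a first array of positive-phase bits and a second array of inverted-phase bits, and the output array includes a first compressed array and a second compressed array. Each first compressor layer includes a first compression circuit that compresses the first array to obtain the first compressed array, and a second compression circuit that compresses the second array to obtain the second compressed array.
In this solution, the first array in the input array of each first compressor layer can be regarded as a Wallace tree of positive-phase bits, and the second array as a Wallace tree of inverted-phase bits. In other words, the input array of each first compressor layer contains two Wallace trees whose bits have opposite phases. Because the first compression circuit compresses the first array and the second compression circuit compresses the second array, bits of different phases are compressed by different compression circuits and need not be unified to a single phase. No phase-unifying circuitry has to be added, so the accumulator is simpler to implement than a conventional design and has lower area and power consumption.
In a possible implementation of the first aspect, the first compression circuit includes one or more first compressors, each of which compresses three bits at the same digit position in the first array; the second compression circuit includes one or more second compressors, each of which compresses three bits at the same digit position in the second array. The first compressors and the second compressors can compress the bits at their respective digit positions in parallel, which improves the compression efficiency of each first compressor layer.
In a possible implementation of the first aspect, each first compressor and each second compressor is an inverted-sum adder. The inverted-sum adder compresses the three bits into one carry output bit and one sum output bit; the carry output bit has the same phase as the three bits, and the sum output bit has the opposite phase. The inverted-sum adder is simple to implement, for example with small area and low power consumption.
In a possible implementation of the first aspect, the inverted-sum adder performs the following compression: if all three bits are 0, the carry output bit is 0 and the sum output bit is 1; if all three bits are 1, the carry output bit is 1 and the sum output bit is 0; if one of the three bits is 1 and the other two are 0, the carry output bit is 0 and the sum output bit is 0; if two of the three bits are 1 and the other is 0, the carry output bit is 1 and the sum output bit is 1. This is a simple and effective compression scheme for the inverted-sum adder.
In a possible implementation of the first aspect, each first compressor and each second compressor is an inverted-carry adder. The inverted-carry adder compresses the three bits into one carry output bit and one sum output bit; the carry output bit has the opposite phase to the three bits, and the sum output bit has the same phase. The inverted-carry adder is simple to implement, for example with small area and low power consumption.
In a possible implementation of the first aspect, the inverted-carry adder performs the following compression: if all three bits are 0, the carry output bit is 1 and the sum output bit is 0; if all three bits are 1, the carry output bit is 0 and the sum output bit is 1; if one of the three bits is 1 and the other two are 0, the carry output bit is 1 and the sum output bit is 1; if two of the three bits are 1 and the other is 0, the carry output bit is 0 and the sum output bit is 0. This is a simple and effective compression scheme for the inverted-carry adder.
In a possible implementation of the first aspect, each first compressor and each second compressor is a double-inverting adder. The double-inverting adder compresses the three bits into one carry output bit and one sum output bit, both of which have the opposite phase to the three bits. The double-inverting adder is simple to implement, for example with small area and low power consumption.
In a possible implementation of the first aspect, the double-inverting adder performs the following compression: if all three bits are 0, the carry output bit is 1 and the sum output bit is 1; if all three bits are 1, the carry output bit is 0 and the sum output bit is 0; if one of the three bits is 1 and the other two are 0, the carry output bit is 1 and the sum output bit is 0; if two of the three bits are 1 and the other is 0, the carry output bit is 0 and the sum output bit is 1. This is a simple and effective compression scheme for the double-inverting adder.
In a possible implementation of the first aspect, the accumulator further includes a summation circuit that receives the plurality of accumulated values and sums them to obtain the accumulated sum.
In a possible implementation of the first aspect, the accumulator further includes one or more inverters that invert at least one of the sum output bit and the carry output bit output by one or more first compressors or second compressors in the W compressor layers, or invert the three bits input to the one or more first compressors or second compressors. This improves the compression efficiency of the W compressor layers while keeping the compression result correct.
According to a second aspect, a multiplier is provided, including an encoder and an accumulator, where the accumulator is the accumulator provided in the first aspect or in any possible implementation of the first aspect.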
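According to a third aspect, an operator circuit is provided. When applied to an accumulator, the operator circuit can serve as an adder in a compressor layer of the accumulator, and the adder is an inverted-sum adder. The operator circuit includes first to twenty-fourth transistors, where: the first and second transistors are coupled in parallel between a power supply terminal and a first node; the third transistor is coupled between the first node and a second node; the fourth transistor is coupled between the second node and a third node; the fifth and sixth transistors are coupled in parallel between the third node and a ground terminal; the seventh transistor is coupled between the power supply terminal and a fourth node; the eighth transistor is coupled between the second node and the fourth node; the ninth transistor is coupled between the second node and a fifth node; the tenth and eleventh transistors are coupled in series between the fourth node and a first output terminal; the twelfth and thirteenth transistors are coupled in series between the fifth node and the first output terminal; the fourteenth transistor is coupled between the fifth node and the ground terminal; the fifteenth, sixteenth, and seventeenth transistors are coupled in parallel between the power supply terminal and a sixth node; the eighteenth transistor is coupled between the first output terminal and the sixth node; the nineteenth transistor is coupled between the first output terminal and a seventh node; the twentieth, twenty-first, and twenty-second transistors are coupled in parallel between the seventh node and the ground terminal; the control terminals of the third, fourth, eleventh, twelfth, fifteenth, and twentieth transistors receive a first input; the control terminals of the first, fifth, seventh, fourteenth, sixteenth, and twenty-first transistors receive a second input; the control terminals of the second, sixth, eighth, ninth, tenth, thirteenth, seventeenth, and twenty-second transistors receive a third input; the control terminals of the eighteenth and nineteenth transistors are both coupled to the second node; the twenty-third and twenty-fourth transistors are coupled in series between the power supply terminal and the ground terminal, and their coupling point is a second output terminal; the control terminals of the twenty-third and twenty-fourth transistors are both coupled to the second node. This operator circuit has few transistors, occupies a small area, and is simple to implement, so using it in an accumulator reduces the area of the accumulator.
In a possible implementation of the third aspect, the first, second, third, seventh, eighth, tenth, eleventh, fifteenth, sixteenth, seventeenth, eighteenth, and twenty-third transistors are PMOS transistors; the fourth, fifth, sixth, ninth, twelfth, thirteenth, fourteenth, nineteenth, twentieth, twenty-first, twenty-second, and twenty-fourth transistors are NMOS transistors. In this implementation the transistors of the operator circuit toggle less often when compressing data, so using the operator circuit in an accumulator reduces the power consumption of the accumulator.
According to a fourth aspect, an operator circuit is provided. When applied to an accumulator, the operator circuit can serve as an adder in a compressor layer of the accumulator, and the adder is an inverted-carry adder. The operator circuit includes first to twenty-fourth transistors, where: the first and second transistors are coupled in parallel between a power supply terminal and a first node; the third transistor is coupled between the first node and a first output terminal; the fourth transistor is coupled between the first output terminal and a second node; the fifth and sixth transistors are coupled in parallel between the second node and a ground terminal; the seventh transistor is coupled between the power supply terminal and a third node; the eighth transistor is coupled between the third node and the first output terminal; the ninth transistor is coupled between the first output terminal and a fourth node; the tenth and eleventh transistors are coupled in series between the third node and a fifth node; the twelfth and thirteenth transistors are coupled in series between the fourth node and the fifth node; the fourteenth transistor is coupled between the fourth node and the ground terminal; the fifteenth, sixteenth, and seventeenth transistors are coupled in parallel between the power supply terminal and a sixth node; the eighteenth transistor is coupled between the fifth node and the sixth node; the nineteenth transistor is coupled between the fifth node and a seventh node; the twentieth, twenty-first, and twenty-second transistors are coupled in parallel between the seventh node and the ground terminal; the control terminals of the third, fourth, eleventh, twelfth, fifteenth, and twentieth transistors receive a first input; the control terminals of the first, fifth, seventh, fourteenth, sixteenth, and twenty-first transistors receive a second input; the control terminals of the second, sixth, eighth, ninth, tenth, thirteenth, seventeenth, and twenty-second transistors receive a third input; the control terminals of the eighteenth and nineteenth transistors are both coupled to the first output terminal; the twenty-third and twenty-fourth transistors are coupled in series between the power supply terminal and the ground terminal, and their coupling point is a second output terminal; the control terminals of the twenty-third and twenty-fourth transistors are both coupled to the fifth node. This operator circuit has few transistors, occupies a small area, and is simple to implement, so using it in an accumulator reduces the area of the accumulator.
In a possible implementation of the fourth aspect, the first, second, third, seventh, eighth, tenth, eleventh, fifteenth, sixteenth, seventeenth, eighteenth, and twenty-third transistors are PMOS transistors; the fourth, fifth, sixth, ninth, twelfth, thirteenth, fourteenth, nineteenth, twentieth, twenty-first, twenty-second, and twenty-fourth transistors are NMOS transistors. In this implementation the transistors of the operator circuit toggle less often when compressing data, so using the operator circuit in an accumulator reduces the power consumption of the accumulator.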
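According to a fifth aspect, a processor is provided, including an accumulator, a multiplier, or an operator circuit, where the accumulator is the accumulator provided in the first aspect or in any possible implementation of the first aspect, the multiplier is the multiplier provided in the second aspect, and the operator circuit is the operator circuit provided in the third or fourth aspect or in any possible implementation of the third or fourth aspect.
According to a sixth aspect, a chip is provided, including an accumulator, a multiplier, or an operator circuit, where the accumulator is the accumulator provided in the first aspect or in any possible implementation of the first aspect, the multiplier is the multiplier provided in the second aspect, and the operator circuit is the operator circuit provided in the third or fourth aspect or in any possible implementation of the third or fourth aspect.
It can be understood that any accumulator, processor, or chip provided above includes the accumulator or the operator circuit provided above. For the beneficial effects they achieve, refer to the beneficial effects of that accumulator or operator circuit; details are not repeated here.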
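Brief Description of Drawings
FIG. 1 is a schematic diagram of accumulating a plurality of binary numbers according to an embodiment of this application;
FIG. 2 is a schematic diagram of the accumulation performed in a multiplier according to an embodiment of this application;
FIG. 3 is a schematic diagram of an accumulator based on standard adders accumulating a plurality of binary numbers;
FIG. 4 is a schematic structural diagram of a communication device according to an embodiment of this application;
FIG. 5 is a schematic structural diagram of an accumulator according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of an inverted-sum adder according to an embodiment of this application;
FIG. 7 is a schematic structural diagram of another accumulator according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of an inverted-carry adder according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of yet another accumulator according to an embodiment of this application;
FIG. 10 is a schematic structural diagram of a double-inverting adder according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of another accumulator according to an embodiment of this application;
FIG. 12 is a schematic diagram of inverters in an accumulator according to an embodiment of this application;
FIG. 13 is a schematic diagram of inverters in another accumulator according to an embodiment of this application;
FIG. 14 is a schematic structural diagram of an inverted-sum adder according to an embodiment of this application;
FIG. 15 is a schematic structural diagram of an inverted-carry adder according to an embodiment of this application;
FIG. 16 is a schematic structural diagram of a double-inverting adder according to an embodiment of this application;
FIG. 17 is a schematic structural diagram of a multiplier according to an embodiment of this application;
FIG. 18 is a performance comparison of the accumulation in a multiplier according to an embodiment of this application.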
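Detailed Description
In this application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association between objects and indicates that three relationships may exist; for example, "A and/or B" may mean A alone, both A and B, or B alone, where A and B may be singular or plural. "At least one of the following" or a similar expression means any combination of the listed items, including a single item or any combination of several items. For example, at least one of a, b, or c may mean: a; b; c; a and b; a and c; b and c; or a, b, and c, where each of a, b, and c may be single or multiple. In addition, the embodiments use words such as "first" and "second" to distinguish objects with similar names, functions, or effects; a person skilled in the art will understand that these words do not limit quantity or order of execution. "Coupled" denotes an electrical connection, either direct through a wire or a connection terminal or indirect through other components, and should therefore be understood as a connection for electronic communication in a broad sense.
Before the embodiments are described, the existing accumulator based on the Wallace tree is introduced. Such an accumulator usually includes multiple compressor layers. When it accumulates a plurality of binary numbers, every bit in the input array and the output array of every compressor layer must be positive-phase. Within one compressor layer, the positive-phase bits are compressed in parallel by a number of standard adders; each standard adder compresses three bits and outputs one positive-phase carry output bit and one positive-phase sum output bit.
For example, as shown in FIG. 3, when the accumulator accumulates twenty-seven 6-bit binary numbers, it may include multiple compressor layers. Each standard adder in each compressor layer compresses three positive-phase bits (denoted IN0, IN1, and IN2 in FIG. 3) and outputs a positive-phase carry output bit, denoted C, and a positive-phase sum output bit, denoted S. FIG. 3 shows only the 1st to 3rd compressor layers. B0 to B5 denote the digit positions 2^0 to 2^5. A digit position, also called a weight position, refers to the position of a bit within a binary number and is analogous to the ones, tens, and hundreds places in decimal.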
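The standard 3:2 compression described above can be sketched in a few lines of Python. This is only an illustrative model, not the circuit of FIG. 3; the function names `full_adder` and `compress_column` are assumptions introduced here.

```python
def full_adder(in0, in1, in2):
    """Standard adder: returns (C, S), both positive-phase."""
    s = in0 ^ in1 ^ in2
    c = (in0 & in1) | (in1 & in2) | (in0 & in2)
    return c, s


def compress_column(bits):
    """Compress the bits of one digit position in groups of three.

    Returns (sums kept at this digit position,
             carries that move to the next digit position,
             bits left uncompressed).
    """
    sums, carries = [], []
    full = len(bits) - len(bits) % 3
    for i in range(0, full, 3):
        c, s = full_adder(*bits[i:i + 3])
        carries.append(c)
        sums.append(s)
    return sums, carries, bits[full:]
```

Each group of three bits with weight 2^d is replaced by one sum bit with weight 2^d and one carry bit with weight 2^(d+1), so the total value is unchanged.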
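The accumulator above uses many standard adders. When compressing multiple bits, a standard adder requires its input bits and output bits to be positive-phase, so it must include circuitry that unifies the phase. At present, additional metal-oxide-semiconductor (MOS) transistors are typically used to unify the phase of the bits, which gives the standard adder a large number of MOS transistors and a large number of MOS transistor toggles per bit of computation on average. Most of the standard adders in such an accumulator are standard full adders. Each standard full adder contains as many as 28 MOS transistors and occupies 0.2736 μm² in a 7 nm process, which makes the accumulator large. The large transistor count also means many toggles per bit of computation on average, which makes the power consumption of the accumulator high. The existing accumulator based on the Wallace tree therefore suffers from large area and high power consumption. For this reason, the embodiments of this application provide an accumulator that reduces power consumption and area by separately compressing the bits of different phases within the same compressor layer. The accumulator can be used in a communication device; both are described below.
In the accumulation described in the embodiments, the phase of a bit (or of the signal corresponding to the bit) has two states, positive and inverted, which are relative to each other: if the positive phase of a bit is G, its inverted phase is /G; that is, the inverted-phase signal is the logical negation of the positive-phase signal.
FIG. 4 is a schematic structural diagram of a communication device according to an embodiment of this application. The communication device may be a terminal or a server, or a chip, chipset, circuit board, or module built into a terminal or server. Referring to FIG. 4, the communication device may include a memory 101, a processor 102, a communication interface 103, and a bus 104, where the memory 101, the processor 102, and the communication interface 103 are connected to one another through the bus 104. The memory 101 stores data, software programs, and modules, and mainly includes a program storage area and a data storage area. The program storage area can store an operating system and the applications required by at least one function, and the data storage area can store data created during use of the device. The processor 102 controls and manages the actions of the communication device, for example by running or executing the software programs and/or modules stored in the memory 101 and invoking the data stored in the memory 101 to perform the functions of the device and process data. The communication interface 103 supports communication by the communication device.
The processor 102 includes, but is not limited to, a central processing unit (CPU), a network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), or a general-purpose processor. In the embodiments, the processor 102 includes one or more accumulators, or one or more multipliers; for example, the processor 102 includes a multiplier array, where the multiplier is the component that performs multiplication in the processor 102.
The bus 104 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is drawn in FIG. 4, but this does not mean that there is only one bus or one type of bus.
To describe the technical solution further, FIG. 5 is a schematic structural diagram of an accumulator according to an embodiment of this application. The accumulator can accumulate a plurality of binary numbers. Referring to FIG. 5, the accumulator includes W compressor layers that compress the binary numbers to obtain a plurality of accumulated values, where the sum of the accumulated values is the accumulated sum of the binary numbers and W is an integer greater than or equal to 1.
After the W compressor layers compress the binary numbers, multiple rows are obtained (each row includes one or more bits). Each row represents one accumulated value, so the rows represent the plurality of accumulated values. There may be two accumulated values or more than two, which is not limited in the embodiments.
The W compressor layers may be one compressor layer or multiple compressor layers; for example, W may be 1, 4, or 6. The value can be set by a person skilled in the art based on experience or actual requirements and is not limited in the embodiments. FIG. 5 takes as an example the case of multiple compressor layers, denoted L1 to LW.
In the solution of this application, the W compressor layers include at least one first compressor layer, and each first compressor layer compresses an input array to obtain an output array. The input array includes a first array of positive-phase bits and a second array of inverted-phase bits, and the output array includes a first compressed array and a second compressed array. Each first compressor layer includes a first compression circuit 21, which compresses the first array in the input array of that layer to obtain the first compressed array, and a second compression circuit 22, which compresses the second array in the input array of that layer to obtain the second compressed array.
Each of the W compressor layers has one input array and one output array. The input array of the 1st compressor layer (L1) may be the matrix obtained by arranging the binary numbers in order of digit position from high to low. The input matrix of each of the 2nd to W-th compressor layers (L2 to LW) may be the output matrix of the preceding compressor layer, or that output matrix after other transformation or preprocessing. That is, the input matrix of the i-th compressor layer is the output matrix of the (i-1)-th compressor layer, possibly transformed or preprocessed, for i from 2 to W.
The at least one first compressor layer may include one or more first compressor layers. Their number is denoted N, where N is a positive integer less than or equal to W. When N is less than W, the at least one first compressor layer may be any one or more of the W compressor layers. FIG. 5 takes as an example the case in which there are (W-1) first compressor layers and they are the 2nd to W-th compressor layers (L2 to LW). When N is less than W, the compressor layers other than the at least one first compressor layer may be implemented with existing techniques, which is not limited in the embodiments.
In the embodiments, the first array in the input array of each first compressor layer includes positive-phase bits and the second array includes inverted-phase bits, so the first array can be regarded as a Wallace tree of positive-phase bits and the second array as a Wallace tree of inverted-phase bits. In other words, the input array of each first compressor layer contains two Wallace trees whose bits have opposite phases. The first compression circuit 21 compresses the first array and the second compression circuit 22 compresses the second array, so bits of different phases in the input array are compressed by different compression circuits and need not be unified to a single phase. The accumulator is therefore simpler to implement than a conventional design and has lower area and power consumption.
Further, each of the W compressor layers may include one or more compressors, and each compressor can compress three bits at the same digit position in the input array of that layer. In each first compressor layer, the first compression circuit 21 may include one or more first compressors 211, each of which compresses three bits at the same digit position in the first array, and the second compression circuit 22 may include one or more second compressors 221, each of which compresses three bits at the same digit position in the second array.
The first compressor 211 and the second compressor 221 may each be any one of three adders: an inverted-sum adder, an inverted-carry adder, or a double-inverting adder. The three adders are described below.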
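Type 1: the inverted-sum adder compresses three bits into one sum output bit and one carry output bit. The sum output bit has the opposite phase to the three bits, and the carry output bit has the same phase as the three bits.
For example, as shown in FIG. 6, suppose the three bits input to the inverted-sum adder are IN0, IN1, and IN2. After compressing IN0, IN1, and IN2, the inverted-sum adder outputs a carry bit C and a sum bit /S, where C has the same phase as IN0, IN1, and IN2 and /S has the opposite phase. In one example, the inverted-sum adder satisfies the logic functions in formulas (1-1) and (1-2), where NOT denotes negation, XOR denotes exclusive OR, AND denotes logical AND, and OR denotes logical OR.
/S = NOT(IN0 XOR IN1 XOR IN2)   (1-1)
C = (IN0 AND IN1) OR (IN1 AND IN2) OR (IN0 AND IN2)   (1-2)
For the different values (0 or 1) that the three bits can take, the inverted-sum adder performs the following compression: if all three bits are 0, the carry output bit is 0 and the sum output bit is 1; if all three bits are 1, the carry output bit is 1 and the sum output bit is 0; if one of the three bits is 1 and the other two are 0, the carry output bit is 0 and the sum output bit is 0; if two of the three bits are 1 and the other is 0, the carry output bit is 1 and the sum output bit is 1. That is, the inverted-sum adder compresses according to the logic table in Table 1.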
Table 1
IN0 IN1 IN2 C /S
0 0 0 0 1
0 0 1 0 0
0 1 0 0 0
1 0 0 0 0
0 1 1 1 1
1 0 1 1 1
1 1 0 1 1
1 1 1 1 0
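As a cross-check of formulas (1-1) and (1-2) against Table 1, the inverted-sum adder can be modelled in Python. This is a behavioural sketch only; the names `inverted_sum_adder` and `TABLE_1` are introduced here for illustration and do not appear in the application.

```python
from itertools import product


def inverted_sum_adder(in0, in1, in2):
    """Return (C, /S) according to formulas (1-1) and (1-2)."""
    c = (in0 & in1) | (in1 & in2) | (in0 & in2)   # formula (1-2)
    s_bar = 1 - (in0 ^ in1 ^ in2)                 # formula (1-1)
    return c, s_bar


# Logic table keyed by (IN0, IN1, IN2); values are (C, /S).
TABLE_1 = {bits: inverted_sum_adder(*bits) for bits in product((0, 1), repeat=3)}
```

Because /S is the complement of the ordinary sum bit, the arithmetic identity IN0 + IN1 + IN2 = 2·C + (1 − /S) holds for every row.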
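When the first compressor 211 is the inverted-sum adder, it compresses three positive-phase bits into one positive-phase carry output bit and one inverted-phase sum output bit. When the second compressor 221 is the inverted-sum adder, it compresses three inverted-phase bits into one inverted-phase carry output bit and one positive-phase sum output bit.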
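The behaviour of the second compressor follows from two properties of the full-adder functions: the majority (carry) function is self-dual, and complementing all three inputs of a three-input XOR complements its result. The sketch below checks this exhaustively. It assumes the same behavioural model of the inverted-sum adder as formulas (1-1) and (1-2); `second_compressor` is a hypothetical helper name.

```python
def inverted_sum_adder(in0, in1, in2):
    """Return (carry, inverted sum) according to formulas (1-1) and (1-2)."""
    c = (in0 & in1) | (in1 & in2) | (in0 & in2)
    s_bar = 1 - (in0 ^ in1 ^ in2)
    return c, s_bar


def second_compressor(a, b, c):
    """Feed the inverted-phase versions of a, b, c to an inverted-sum adder.

    a, b, c are the positive-phase (true) values; the wires actually
    carry 1 - a, 1 - b, 1 - c. Returns the two bits seen on the outputs.
    """
    return inverted_sum_adder(1 - a, 1 - b, 1 - c)


def true_carry_and_sum(a, b, c):
    """Positive-phase carry and sum of the true values."""
    return (a & b) | (b & c) | (a & c), a ^ b ^ c
```

For every input combination the first output of `second_compressor` equals the complement of the true carry (that is, /C) and the second output equals the true sum (that is, S), which matches the phases stated above.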
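In a possible embodiment, suppose every one of the W compressor layers consists of inverted-sum adders, the input array of the 1st compressor layer (L1) contains only positive-phase bits, and the input arrays of the 2nd to W-th compressor layers (L2 to LW) each contain positive-phase bits (the first array) and inverted-phase bits (the second array). Equivalently, L1 includes only a first compression circuit 21 with a plurality of first compressors 211, and each of L2 to LW includes a first compression circuit 21 with a plurality of first compressors 211 and a second compression circuit 22 with a plurality of second compressors 221.
For example, as shown in FIG. 7, when the accumulator accumulates twenty-seven 6-bit binary numbers, L1 includes 54 first compressors 211. Each first compressor 211 compresses three positive-phase bits at the same digit position and outputs a positive-phase carry output bit C and an inverted-phase sum output bit /S. The input matrix of the 2nd compressor layer (L2) consists of the output bits of these 54 first compressors 211: its first matrix contains the 54 positive-phase carry output bits C, and its second matrix contains the 54 inverted-phase sum output bits /S.
In L2, the first compression circuit 21 includes 18 first compressors 211 and the second compression circuit 22 includes 18 second compressors 221. Each of the 18 first compressors 211 compresses three positive-phase bits at the same digit position in the first matrix and outputs a positive-phase carry output bit C and an inverted-phase sum output bit /S. Each of the 18 second compressors 221 compresses three inverted-phase bits at the same digit position in the second matrix and outputs an inverted-phase carry output bit /C and a positive-phase sum output bit S. The input matrix of the 3rd compressor layer (L3) consists of the output bits of these 18 first compressors 211 and 18 second compressors 221: its first matrix contains the positive-phase carry output bits C from the first compressors 211 and the positive-phase sum output bits S from the second compressors 221, and its second matrix contains the inverted-phase sum output bits /S from the first compressors 211 and the inverted-phase carry output bits /C from the second compressors 221.
In L3, the first compression circuit 21 includes 12 first compressors 211 and the second compression circuit 22 includes 12 second compressors 221. Each of the 12 first compressors 211 compresses three positive-phase bits at the same digit position in the first matrix and outputs a positive-phase carry output bit C and an inverted-phase sum output bit /S. Each of the 12 second compressors 221 compresses three inverted-phase bits at the same digit position in the second matrix and outputs an inverted-phase carry output bit /C and a positive-phase sum output bit S. The input matrix of the 4th compressor layer (L4) consists of the output bits of these 12 first compressors 211 and 12 second compressors 221: its first matrix contains the positive-phase carry output bits C from the first compressors 211 and the positive-phase sum output bits S from the second compressors 221, and its second matrix contains the inverted-phase sum output bits /S from the first compressors 211 and the inverted-phase carry output bits /C from the second compressors 221.
In L4, the first compression circuit 21 includes 6 first compressors 211 and the second compression circuit 22 includes 6 second compressors 221. Each of the 6 first compressors 211 compresses three positive-phase bits at the same digit position in the first matrix and outputs a positive-phase carry output bit C and an inverted-phase sum output bit /S. Each of the 6 second compressors 221 compresses three inverted-phase bits at the same digit position in the second matrix and outputs an inverted-phase carry output bit /C and a positive-phase sum output bit S. The output matrix of L4 includes the positive-phase carry output bits C and inverted-phase sum output bits /S from the 6 first compressors 211, the inverted-phase carry output bits /C and positive-phase sum output bits S from the 6 second compressors 221, the positive-phase bits of the first array that were not compressed, and the inverted-phase bits of the second array that were not compressed.
It should be noted that FIG. 7 shows only the 1st to 4th compressor layers (L1 to L4) of the accumulator. The compressor layers after the 4th may compress in a manner similar to the 2nd to 4th compressor layers, or the bits of different phases in the output matrix of the 4th compressor layer may be converted to a single phase by inversion and then compressed with an existing compression scheme. Details are not repeated here.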
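The compressor counts quoted for L1 to L4 can be reproduced with a small bookkeeping model that tracks only how many positive-phase and inverted-phase bits sit at each digit position. The sketch assumes that every group of three same-phase bits at one digit position is compressed and that leftover bits pass through unchanged; the function names are illustrative.

```python
from collections import Counter


def layer_counts(positive, inverted):
    """Apply one layer of inverted-sum adders to per-digit bit counts.

    Returns (next_positive, next_inverted, n_first, n_second).
    """
    next_pos, next_inv = Counter(), Counter()
    n_first = n_second = 0
    for digit, n in positive.items():       # first compressors 211
        k = n // 3
        n_first += k
        next_pos[digit + 1] += k            # carry C keeps the positive phase
        next_inv[digit] += k                # sum /S is inverted
        next_pos[digit] += n % 3            # uncompressed bits pass through
    for digit, m in inverted.items():       # second compressors 221
        k = m // 3
        n_second += k
        next_inv[digit + 1] += k            # carry /C stays inverted
        next_pos[digit] += k                # sum S is positive-phase
        next_inv[digit] += m % 3
    return next_pos, next_inv, n_first, n_second


def compressors_per_layer(num_operands, width, layers):
    """List of (first compressors, second compressors) for each layer."""
    positive = Counter({d: num_operands for d in range(width)})
    inverted = Counter()
    result = []
    for _ in range(layers):
        positive, inverted, a, b = layer_counts(positive, inverted)
        result.append((a, b))
    return result
```

Under these assumptions, `compressors_per_layer(27, 6, 4)` gives 54 first compressors in L1, then 18 + 18, 12 + 12, and 6 + 6 in L2 to L4, in agreement with the walk-through above.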
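Type 2: the inverted-carry adder compresses three bits into one sum output bit and one carry output bit. The sum output bit has the same phase as the three bits, and the carry output bit has the opposite phase to the three bits.
For example, as shown in FIG. 8, suppose the three bits input to the inverted-carry adder are IN0, IN1, and IN2. After compressing IN0, IN1, and IN2, the inverted-carry adder outputs a carry bit /C and a sum bit S, where /C has the opposite phase to IN0, IN1, and IN2 and S has the same phase. In one example, the inverted-carry adder satisfies the logic functions in formulas (2-1) and (2-2), where NOT denotes negation, XOR denotes exclusive OR, AND denotes logical AND, and OR denotes logical OR.
S = IN0 XOR IN1 XOR IN2   (2-1)
/C = NOT((IN0 AND IN1) OR (IN1 AND IN2) OR (IN0 AND IN2))   (2-2)
For the different values (0 or 1) that the three bits can take, the inverted-carry adder performs the following compression: if all three bits are 0, the carry output bit is 1 and the sum output bit is 0; if all three bits are 1, the carry output bit is 0 and the sum output bit is 1; if one of the three bits is 1 and the other two are 0, the carry output bit is 1 and the sum output bit is 1; if two of the three bits are 1 and the other is 0, the carry output bit is 0 and the sum output bit is 0. That is, the inverted-carry adder compresses according to the logic table in Table 2.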
Table 2
IN0 IN1 IN2 /C S
0 0 0 1 0
0 0 1 1 1
0 1 0 1 1
1 0 0 1 1
0 1 1 0 0
1 0 1 0 0
1 1 0 0 0
1 1 1 0 1
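Formulas (2-1) and (2-2) can be checked against Table 2 in the same way. The sketch below is a behavioural model only; `inverted_carry_adder` and `TABLE_2` are names chosen here for illustration.

```python
from itertools import product


def inverted_carry_adder(in0, in1, in2):
    """Return (/C, S) according to formulas (2-1) and (2-2)."""
    s = in0 ^ in1 ^ in2                                   # formula (2-1)
    c_bar = 1 - ((in0 & in1) | (in1 & in2) | (in0 & in2))  # formula (2-2)
    return c_bar, s


# Logic table keyed by (IN0, IN1, IN2); values are (/C, S).
TABLE_2 = {bits: inverted_carry_adder(*bits) for bits in product((0, 1), repeat=3)}
```

Here the carry is the complemented output, so the arithmetic identity reads IN0 + IN1 + IN2 = 2·(1 − /C) + S.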
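When the first compressor 211 is the inverted-carry adder, it compresses three positive-phase bits into one inverted-phase carry output bit and one positive-phase sum output bit. When the second compressor 221 is the inverted-carry adder, it compresses three inverted-phase bits into one positive-phase carry output bit and one inverted-phase sum output bit.
In a possible embodiment, suppose every one of the W compressor layers consists of inverted-carry adders, the input array of the 1st compressor layer (L1) contains only positive-phase bits, and the input arrays of the 2nd to W-th compressor layers (L2 to LW) each contain positive-phase bits (the first array) and inverted-phase bits (the second array). Equivalently, L1 includes only a first compression circuit 21 with a plurality of first compressors 211, and each of L2 to LW includes a first compression circuit 21 with a plurality of first compressors 211 and a second compression circuit 22 with a plurality of second compressors 221.
For example, as shown in FIG. 9, when the accumulator accumulates twenty-seven 6-bit binary numbers, L1 includes 54 first compressors 211. Each first compressor 211 compresses three positive-phase bits at the same digit position and outputs an inverted-phase carry output bit /C and a positive-phase sum output bit S. The input matrix of the 2nd compressor layer (L2) consists of the output bits of these 54 first compressors 211: its first matrix contains the 54 positive-phase sum output bits S, and its second matrix contains the 54 inverted-phase carry output bits /C.
In L2, the first compression circuit 21 includes 18 first compressors 211 and the second compression circuit 22 includes 18 second compressors 221. Each of the 18 first compressors 211 compresses three positive-phase bits at the same digit position in the first matrix and outputs an inverted-phase carry output bit /C and a positive-phase sum output bit S. Each of the 18 second compressors 221 compresses three inverted-phase bits at the same digit position in the second matrix and outputs a positive-phase carry output bit C and an inverted-phase sum output bit /S. The input matrix of the 3rd compressor layer (L3) consists of the output bits of these 18 first compressors 211 and 18 second compressors 221: its first matrix contains the positive-phase sum output bits S from the first compressors 211 and the positive-phase carry output bits C from the second compressors 221, and its second matrix contains the inverted-phase carry output bits /C from the first compressors 211 and the inverted-phase sum output bits /S from the second compressors 221.
In L3, the first compression circuit 21 includes 12 first compressors 211 and the second compression circuit 22 includes 12 second compressors 221. Each of the 12 first compressors 211 compresses three positive-phase bits at the same digit position in the first matrix and outputs an inverted-phase carry output bit /C and a positive-phase sum output bit S. Each of the 12 second compressors 221 compresses three inverted-phase bits at the same digit position in the second matrix and outputs a positive-phase carry output bit C and an inverted-phase sum output bit /S. The input matrix of the 4th compressor layer (L4) consists of the output bits of these 12 first compressors 211 and 12 second compressors 221: its first matrix contains the positive-phase sum output bits S from the first compressors 211 and the positive-phase carry output bits C from the second compressors 221, and its second matrix contains the inverted-phase carry output bits /C from the first compressors 211 and the inverted-phase sum output bits /S from the second compressors 221.
In L4, the first compression circuit 21 includes 6 first compressors 211 and the second compression circuit 22 includes 6 second compressors 221. Each of the 6 first compressors 211 compresses three positive-phase bits at the same digit position in the first matrix and outputs an inverted-phase carry output bit /C and a positive-phase sum output bit S. Each of the 6 second compressors 221 compresses three inverted-phase bits at the same digit position in the second matrix and outputs a positive-phase carry output bit C and an inverted-phase sum output bit /S. The output matrix of L4 includes the inverted-phase carry output bits /C and positive-phase sum output bits S from the 6 first compressors 211, the inverted-phase sum output bits /S and positive-phase carry output bits C from the 6 second compressors 221, the positive-phase bits of the first array that were not compressed, and the inverted-phase bits of the second array that were not compressed.
It should be noted that FIG. 9 shows only the 1st to 4th compressor layers (L1 to L4) of the accumulator. The compressor layers after the 4th may compress in a manner similar to the 2nd to 4th compressor layers, or the bits of different phases in the output matrix of the 4th compressor layer may be converted to a single phase by inversion and then compressed with an existing compression scheme. Details are not repeated here.
In addition, the examples for Type 1 and Type 2 both assume that the input matrix of the 1st compressor layer (L1) contains only positive-phase bits. This does not limit the embodiments. In practice, the input matrix of L1 may instead contain only inverted-phase bits, or both positive-phase and inverted-phase bits.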
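Type 3: the double-inverting adder compresses three bits into one sum output bit and one carry output bit. Both the sum output bit and the carry output bit have the opposite phase to the three bits.
For example, as shown in FIG. 10, suppose the three bits input to the double-inverting adder are IN0, IN1, and IN2. After compressing IN0, IN1, and IN2, the double-inverting adder outputs a carry bit /C and a sum bit /S, both of which have the opposite phase to IN0, IN1, and IN2. In one example, the double-inverting adder satisfies the logic functions in formulas (3-1) and (3-2), where NOT denotes negation, XOR denotes exclusive OR, AND denotes logical AND, and OR denotes logical OR.
/S = NOT(IN0 XOR IN1 XOR IN2)   (3-1)
/C = NOT((IN0 AND IN1) OR (IN1 AND IN2) OR (IN0 AND IN2))   (3-2)
For the different values (0 or 1) that the three bits can take, the double-inverting adder performs the following compression: if all three bits are 0, the carry output bit is 1 and the sum output bit is 1; if all three bits are 1, the carry output bit is 0 and the sum output bit is 0; if one of the three bits is 1 and the other two are 0, the carry output bit is 1 and the sum output bit is 0; if two of the three bits are 1 and the other is 0, the carry output bit is 0 and the sum output bit is 1. That is, the double-inverting adder compresses according to the logic table in Table 3.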
Table 3
IN0 IN1 IN2 /C /S
0 0 0 1 1
1 1 1 0 0
0 0 1 1 0
0 1 0 1 0
1 0 0 1 0
0 1 1 0 1
1 0 1 0 1
1 1 0 0 1
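Formulas (3-1) and (3-2) and Table 3 can likewise be modelled in Python. The names `double_inverting_adder`, `TABLE_3`, and `decoded_value` are illustrative assumptions; the model is behavioural and says nothing about the transistor-level circuit.

```python
from itertools import product


def double_inverting_adder(in0, in1, in2):
    """Return (/C, /S) according to formulas (3-1) and (3-2)."""
    s_bar = 1 - (in0 ^ in1 ^ in2)                          # formula (3-1)
    c_bar = 1 - ((in0 & in1) | (in1 & in2) | (in0 & in2))  # formula (3-2)
    return c_bar, s_bar


# Logic table keyed by (IN0, IN1, IN2); values are (/C, /S).
TABLE_3 = {bits: double_inverting_adder(*bits) for bits in product((0, 1), repeat=3)}


def decoded_value(in0, in1, in2):
    """Number of ones among the inputs, recovered from the inverted outputs."""
    c_bar, s_bar = double_inverting_adder(in0, in1, in2)
    return 2 * (1 - c_bar) + (1 - s_bar)
```

Both outputs are complements of the ordinary full-adder outputs, so complementing them recovers the count of ones among the three inputs.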
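When the first compressor 211 is the double-inverting adder, it compresses three positive-phase bits into one inverted-phase carry output bit and one inverted-phase sum output bit. When the second compressor 221 is the double-inverting adder, it compresses three inverted-phase bits into one positive-phase carry output bit and one positive-phase sum output bit.
In a possible embodiment, suppose every one of the W compressor layers consists of double-inverting adders, and the input array of each of the 1st to W-th compressor layers (L1 to LW) includes a first array and a second array. Equivalently, each of L1 to LW includes a first compression circuit 21 with a plurality of first compressors 211 and a second compression circuit 22 with a plurality of second compressors 221.
For example, as shown in FIG. 11, when the accumulator accumulates eighteen 6-bit binary numbers, suppose the first 9 binary numbers consist of positive-phase bits (the first matrix) and the last 9 consist of inverted-phase bits (the second matrix). Then in the 1st compressor layer (L1) the first compression circuit 21 includes 18 first compressors 211 and the second compression circuit 22 includes 18 second compressors 221. Each of the 18 first compressors 211 compresses three positive-phase bits at the same digit position in the first matrix and outputs an inverted-phase carry output bit /C and an inverted-phase sum output bit /S. Each of the 18 second compressors 221 compresses three inverted-phase bits at the same digit position in the second matrix and outputs a positive-phase carry output bit C and a positive-phase sum output bit S. In the input matrix of the 2nd compressor layer (L2), the first matrix contains the positive-phase carry output bits C and sum output bits S from the 18 second compressors 221, and the second matrix contains the inverted-phase carry output bits /C and sum output bits /S from the 18 first compressors 211.
In L2, the first compression circuit 21 includes 12 first compressors 211 and the second compression circuit 22 includes 12 second compressors 221. Each of the 12 first compressors 211 compresses three positive-phase bits at the same digit position in the first matrix and outputs an inverted-phase carry output bit /C and an inverted-phase sum output bit /S. Each of the 12 second compressors 221 compresses three inverted-phase bits at the same digit position in the second matrix and outputs a positive-phase carry output bit C and a positive-phase sum output bit S. The output matrix of L2 includes the inverted-phase sum output bits /S and carry output bits /C from the 12 first compressors 211 and the positive-phase sum output bits S and carry output bits C from the 12 second compressors 221.
It should be noted that FIG. 11 shows only the 1st and 2nd compressor layers (L1 and L2) of the accumulator. The compressor layers after the 2nd may compress in a manner similar to the 1st and 2nd compressor layers, or the bits of different phases in the output matrix of the 2nd compressor layer may be converted to a single phase by inversion and then compressed with an existing compression scheme. Details are not repeated here.
Optionally, when the at least one first compressor layer includes a plurality of first compressor layers, first compressor layers at different levels may use the same compression scheme or different ones. For example, with at least two first compressor layers, every first compressor and every second compressor in those layers may use one of the three adders above. Alternatively, every first compressor and second compressor in some of the first compressor layers uses one of the three adders, and every first compressor and second compressor in the other first compressor layers uses one or both of the remaining two adders.
Further, the accumulator may also include a summation circuit 23 that receives the plurality of accumulated values and sums them to obtain the accumulated sum. Specifically, after the W compressor layers compress the binary numbers into the plurality of accumulated values, the W-th compressor layer (LW) may send the accumulated values to the summation circuit 23, which sums them to obtain the accumulated sum of the binary numbers. Optionally, there are two accumulated values, in which case the summation circuit 23 is an adder that sums the two accumulated values to obtain the accumulated sum of the binary numbers.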
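The following end-to-end sketch illustrates why compressing the two phases separately preserves the accumulated sum. It is a software model under stated assumptions, not the structure of FIG. 5: all layers use inverted-sum adders, compression repeats until no digit position holds three bits of the same phase, and the final summation simply complements the inverted-phase bits before adding. The names `compress_layer`, `decode`, and `accumulate` are hypothetical.

```python
from collections import defaultdict


def inverted_sum_adder(a, b, c):
    """Return (carry, inverted sum) of three stored bits."""
    return (a & b) | (b & c) | (a & c), 1 - (a ^ b ^ c)


def compress_layer(positive, inverted):
    """One first compressor layer.

    Both arguments map digit position -> list of stored bits; bits in
    `inverted` are stored complemented.
    """
    next_pos, next_inv = defaultdict(list), defaultdict(list)
    for src, same, opposite in ((positive, next_pos, next_inv),
                                (inverted, next_inv, next_pos)):
        for digit, bits in src.items():
            full = len(bits) - len(bits) % 3
            for i in range(0, full, 3):
                carry, s_bar = inverted_sum_adder(*bits[i:i + 3])
                same[digit + 1].append(carry)   # carry keeps the input phase
                opposite[digit].append(s_bar)   # sum takes the opposite phase
            same[digit].extend(bits[full:])     # uncompressed bits pass through
    return next_pos, next_inv


def decode(positive, inverted):
    """Stand-in for the final summation: complement inverted bits, then add."""
    total = sum(b << d for d, bits in positive.items() for b in bits)
    total += sum((1 - b) << d for d, bits in inverted.items() for b in bits)
    return total


def accumulate(values, width):
    """Accumulate `width`-bit unsigned values through two-phase compression."""
    positive, inverted = defaultdict(list), defaultdict(list)
    for v in values:
        for d in range(width):
            positive[d].append((v >> d) & 1)
    while any(len(bits) >= 3
              for arr in (positive, inverted) for bits in arr.values()):
        positive, inverted = compress_layer(positive, inverted)
    return decode(positive, inverted)
```

For instance, `accumulate([63] * 27, 6)` equals `27 * 63`. Each compressor replaces three bits by two, so the loop terminates, and the decoded value is unchanged by every layer.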
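Further, the accumulator may also include one or more inverters that invert at least one of the sum output bit and the carry output bit output by one or more first compressors or second compressors in the W compressor layers, or invert the three bits input to the one or more first compressors or second compressors. In a possible embodiment, the one or more inverters invert at least one of the sum output bit and the carry output bit output by one or more first compressors or second compressors in the W compressor layers.
For example, as shown in FIG. 12, take the i-th compressor layer of the W compressor layers (1≤i≤W). Suppose its input matrix contains 3 rows and 6 columns of positive-phase bits, and the layer includes 6 first compressors, all of which are inverted-sum adders. The 6 first compressors compress the input matrix into two rows: the first row contains 6 inverted-phase sum output bits /S and the second row contains 6 positive-phase carry output bits C. In this case the one or more inverters may be 6 inverters that invert the 6 inverted-phase sum output bits /S in the first row to obtain 6 positive-phase sum output bits S. After this inversion, the two rows output by the i-th compressor layer are both positive-phase.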
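The output-side inversion in this example can be sketched as follows. The model assumes the three input rows are lists of positive-phase bits indexed by digit position (index 0 is the lowest digit); the function names and this indexing convention are assumptions made for illustration.

```python
def compress_with_output_inverters(rows):
    """Compress 3 rows of positive-phase bits with inverted-sum adders,
    then invert each /S so that both output rows are positive-phase.

    Returns (s_row, c_row); c_row has the weight of the next digit position.
    """
    s_row, c_row = [], []
    for a, b, c in zip(*rows):
        carry = (a & b) | (b & c) | (a & c)
        s_bar = 1 - (a ^ b ^ c)      # raw sum output of the inverted-sum adder
        s_row.append(1 - s_bar)      # inverter restores the positive phase
        c_row.append(carry)
    return s_row, c_row


def row_value(row):
    """Unsigned value of a row whose index is the digit position."""
    return sum(bit << d for d, bit in enumerate(row))
```

With 6 columns this uses 6 adders and 6 inverters, and the value of the three input rows equals `row_value(s_row) + 2 * row_value(c_row)`.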
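In another possible embodiment, the one or more inverters invert the three bits input to one or more first compressors or second compressors in the W compressor layers. For example, as shown in FIG. 13, take the i-th compressor layer of the W compressor layers (1≤i≤W). Suppose its input matrix contains 6 rows and 3 columns of bits, where rows 1 to 3 are positive-phase and rows 4 to 6 are inverted-phase, and the layer includes 6 first compressors, all of which are inverted-sum adders. In this case the one or more inverters may be 9 inverters that invert the 9 inverted-phase bits in rows 4 to 6 to obtain 9 positive-phase bits (that is, rows 4 to 6 are converted to the positive phase); in other words, the 9 inverters invert the bits input to the 3 first compressors in the second row of compressors. After this inversion, each of the 6 first compressors in the i-th compressor layer compresses three positive-phase bits at the same digit position, and four rows are output: two rows of inverted-phase sum output bits /S and two rows of positive-phase carry output bits C.
It should be noted that the input matrices and the first compressors of the i-th compressor layer shown in FIG. 12 and FIG. 13 are only examples and do not limit the embodiments. In the embodiments, using the one or more inverters to invert at least one of the sum output bit and the carry output bit output by one or more first compressors or second compressors, or to invert the three bits input to them, improves the compression efficiency of the W compressor layers while keeping the compression result correct, and thus improves the computing efficiency of the accumulator.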
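FIG. 14 is a schematic structural diagram of an inverted-sum compression operator circuit according to an embodiment of this application; the inverted-sum operator circuit may also be called an inverted-sum adder. The inverted-sum adder includes a first transistor M1 to a twenty-fourth transistor M24.
M1 and M2 are coupled in parallel between the power supply terminal and the first node ①; M3 is coupled between the first node ① and the second node ②; M4 is coupled between the second node ② and the third node ③; M5 and M6 are coupled in parallel between the third node ③ and the ground terminal; M7 is coupled between the power supply terminal and the fourth node ④; M8 is coupled between the second node ② and the fourth node ④; M9 is coupled between the second node ② and the fifth node ⑤; M10 and M11 are coupled in series between the fourth node ④ and the first output terminal /S; M12 and M13 are coupled in series between the fifth node ⑤ and the first output terminal /S; M14 is coupled between the fifth node ⑤ and the ground terminal; M15, M16, and M17 are coupled in parallel between the power supply terminal and the sixth node ⑥; M18 is coupled between the first output terminal /S and the sixth node ⑥; M19 is coupled between the first output terminal /S and the seventh node ⑦; M20, M21, and M22 are coupled in parallel between the seventh node ⑦ and the ground terminal.
The control terminals of M3, M4, M11, M12, M15, and M20 receive the first input IN0; the control terminals of M1, M5, M7, M14, M16, and M21 receive the second input IN2; the control terminals of M2, M6, M8, M9, M10, M13, M17, and M22 receive the third input IN3.
The control terminals of M18 and M19 are both coupled to the second node ②; M23 and M24 are coupled in series between the power supply terminal and the ground terminal, and their coupling point is the second output terminal C; the control terminals of M23 and M24 are both coupled to the second node ②.
In the embodiments, the first input IN0, the second input IN2, and the third input IN3 may be the three bits described for the inverted-sum adder in the accumulator embodiments above; the first output terminal /S may output the sum output bit of the inverted-sum adder, and the second output terminal C may output the carry output bit of the inverted-sum adder.
Optionally, M1, M2, M3, M7, M8, M10, M11, M15, M16, M17, M18, and M23 are PMOS transistors; M4, M5, M6, M9, M12, M13, M14, M19, M20, M21, M22, and M24 are NMOS transistors. Correspondingly, a control terminal is the gate of the corresponding PMOS or NMOS transistor.
It should be noted that the first transistor M1 to the twenty-fourth transistor M24 in the example above may be MOS transistors or may be replaced with bipolar transistors; the transistor types shown in FIG. 14 are only examples and do not limit the embodiments. In addition, FIG. 14 is only one example of the circuit; any circuit that adds transistors to it so that several transistors together perform the function of one or more transistors in FIG. 14 is regarded as the same circuit.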
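FIG. 15 is a schematic structural diagram of an inverted-carry compression operator circuit according to an embodiment of this application; the inverted-carry operator circuit may also be called an inverted-carry adder. The inverted-carry adder includes a first transistor M1 to a twenty-fourth transistor M24.
M1 and M2 are coupled in parallel between the power supply terminal and the first node ①; M3 is coupled between the first node ① and the first output terminal /C; M4 is coupled between the first output terminal /C and the second node ②; M5 and M6 are coupled in parallel between the second node ② and the ground terminal; M7 is coupled between the power supply terminal and the third node ③; M8 is coupled between the third node ③ and the first output terminal /C; M9 is coupled between the first output terminal /C and the fourth node ④; M10 and M11 are coupled in series between the third node ③ and the fifth node ⑤; M12 and M13 are coupled in series between the fourth node ④ and the fifth node ⑤; M14 is coupled between the fourth node ④ and the ground terminal; M15, M16, and M17 are coupled in parallel between the power supply terminal and the sixth node ⑥; M18 is coupled between the fifth node ⑤ and the sixth node ⑥; M19 is coupled between the fifth node ⑤ and the seventh node ⑦; M20, M21, and M22 are coupled in parallel between the seventh node ⑦ and the ground terminal.
The control terminals of M3, M4, M11, M12, M15, and M20 receive the first input IN0; the control terminals of M1, M5, M7, M14, M16, and M21 receive the second input IN2; the control terminals of M2, M6, M8, M9, M10, M13, M17, and M22 receive the third input IN3.
The control terminals of M18 and M19 are both coupled to the first output terminal /C; M23 and M24 are coupled in series between the power supply terminal and the ground terminal, and their coupling point is the second output terminal S; the control terminals of M23 and M24 are both coupled to the fifth node ⑤.
In the embodiments, the first input IN0, the second input IN2, and the third input IN3 may be the three bits described for the inverted-carry adder in the accumulator embodiments above; the first output terminal /C may output the carry output bit of the inverted-carry adder, and the second output terminal S may output the sum output bit of the inverted-carry adder.
Optionally, M1, M2, M3, M7, M8, M10, M11, M15, M16, M17, M18, and M23 are PMOS transistors; M4, M5, M6, M9, M12, M13, M14, M19, M20, M21, M22, and M24 are NMOS transistors. Correspondingly, a control terminal is the gate of the corresponding PMOS or NMOS transistor.
It should be noted that the first transistor M1 to the twenty-fourth transistor M24 in the example above may be MOS transistors or may be replaced with bipolar transistors; the transistor types shown in FIG. 15 are only examples and do not limit the embodiments. In addition, FIG. 15 is only one example of the circuit; any circuit that adds transistors to it so that several transistors together perform the function of one or more transistors in FIG. 15 is regarded as the same circuit.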
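FIG. 16 is a schematic structural diagram of a double-inverting compression operator circuit according to an embodiment of this application; the double-inverting operator circuit may also be called a double-inverting adder. The double-inverting adder includes a first transistor M1 to a twenty-second transistor M22.
M1 and M2 are coupled in parallel between the power supply terminal and the first node ①; M3 is coupled between the first node ① and the first output terminal /C; M4 is coupled between the first output terminal /C and the second node ②; M5 and M6 are coupled in parallel between the second node ② and the ground terminal; M7 is coupled between the power supply terminal and the third node ③; M8 is coupled between the third node ③ and the first output terminal /C; M9 is coupled between the first output terminal /C and the fourth node ④; M10 is coupled between the fourth node ④ and the ground terminal; M11 and M12 are coupled in series between the third node ③ and the second output terminal /S; M13 and M14 are coupled in series between the second output terminal /S and the fourth node ④; M15, M16, and M17 are coupled in parallel between the power supply terminal and the fifth node ⑤; M18 is coupled between the fifth node ⑤ and the second output terminal /S; M19 is coupled between the second output terminal /S and the sixth node ⑥; M20, M21, and M22 are coupled in parallel between the sixth node ⑥ and the ground terminal.
The control terminals of M3, M4, M12, M13, M15, and M20 receive the first input IN0; the control terminals of M1, M5, M7, M10, M16, and M21 receive the second input IN2; the control terminals of M2, M6, M8, M9, M11, M14, M17, and M22 receive the third input IN3. The control terminals of M18 and M19 are both coupled to the first output terminal /C.
In the embodiments, the first input IN0, the second input IN2, and the third input IN3 may be the three bits described for the double-inverting adder in the accumulator embodiments above; the first output terminal /C may output the carry output bit of the double-inverting adder, and the second output terminal /S may output the sum output bit of the double-inverting adder.
Optionally, M1, M2, M3, M7, M8, M11, M12, M15, M16, M17, and M18 are PMOS transistors; M4, M5, M6, M9, M10, M13, M14, M19, M20, M21, and M22 are NMOS transistors. Correspondingly, a control terminal is the gate of the corresponding PMOS or NMOS transistor.
It should be noted that the first transistor M1 to the twenty-second transistor M22 in the example above may be MOS transistors or may be replaced with bipolar transistors; the transistor types shown in FIG. 16 are only examples and do not limit the embodiments. In addition, FIG. 16 is only one example of the circuit; any circuit that adds transistors to it so that several transistors together perform the function of one or more transistors in FIG. 16 is regarded as the same circuit.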
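On this basis, the embodiments of this application also provide a multiplier. As shown in FIG. 17, the multiplier may include multiple groups of encoders 301 and an accumulator 302. The encoders 301 encode a first value and a second value represented as binary numbers to obtain a plurality of partial product terms, and the accumulator 302 accumulates the partial product terms to obtain the product of the first value and the second value. The accumulator 302 may be any accumulator provided above, and the partial product terms may serve as the input array of the 1st compressor layer (L1) among the W compressor layers of that accumulator.
Optionally, the multiplier may further include multiple precoders 303. The precoders 303 precode the first value to obtain a precoding result; correspondingly, the encoders 301 encode the precoding result and the second value to obtain the partial product terms. For a more detailed description of the multiplier shown in FIG. 17, refer to international patent application PCT/CN2019/119993.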
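The division of work between the encoders and the accumulator can be sketched in software. The encoding used below (one shifted copy of the first value per set bit of the second value) is an assumption chosen for simplicity; it is not the encoding of the encoders 301 or the precoders 303, and Python's built-in `sum` merely stands in for the accumulator 302.

```python
def partial_products(x, y, width):
    """Hypothetical simple encoder: one partial product term per bit of y."""
    return [(x << i) if (y >> i) & 1 else 0 for i in range(width)]


def multiply(x, y, width=8):
    """Product of two unsigned `width`-bit values via accumulated partial products."""
    terms = partial_products(x, y, width)   # these would form the input array of L1
    return sum(terms)                       # stand-in for the accumulator
```

For two 8-bit operands this yields 8 partial product terms, whose bits would be arranged by digit position and fed to the compressor layers.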
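The embodiments of this application compare a multiplier that uses the accumulator provided above (hereinafter, the inverting accumulation multiplier) with an existing multiplier whose accumulator is built from standard adders (hereinafter, the traditional accumulation multiplier). The power consumption and area in a 7 nm process are shown in FIG. 18; these values describe one embodiment, and the application is not limited to them. (a) in FIG. 18 shows the power consumption of the accumulator when the inverting accumulation multiplier and the traditional accumulation multiplier each multiply two binary numbers of 8 bits to 32 bits. (b) in FIG. 18 shows the area of the accumulator when the inverting accumulation multiplier and the traditional accumulation multiplier each multiply two binary numbers of 8 bits to 32 bits. It can be seen from FIG. 18 that both the power consumption and the area of the inverting accumulation multiplier are smaller than those of the traditional accumulation multiplier.
In the embodiments, in the accumulator of the inverting accumulation multiplier, the input array of each first compressor layer includes a first array and a second array. The first array can be regarded as a Wallace tree of positive-phase bits and the second array as a Wallace tree of inverted-phase bits; that is, the input array contains two Wallace trees whose bits have opposite phases. The first compression circuit 21 compresses the first array and the second compression circuit 22 compresses the second array, so bits of different phases in the input array are compressed by different compression circuits and need not be unified to a single phase. The accumulator is therefore simpler to implement than a conventional design and has lower area and power consumption, and so does a multiplier that uses the accumulator.
In another embodiment of this application, a processor is also provided, including an accumulator, a multiplier, or an operator circuit, where the accumulator is the accumulator provided above, the multiplier is the multiplier provided above that includes the accumulator, and the operator circuit is any one or more of the operator circuits provided above.
In another embodiment of this application, a chip is provided, including an accumulator, a multiplier, or an operator circuit, where the accumulator is the accumulator provided above, the multiplier is the multiplier provided above that includes the accumulator, and the operator circuit is any one or more of the operator circuits provided above.
In another embodiment of this application, a communication device is also provided. Its structure may be as shown in FIG. 4; that is, the communication device may include a memory 101, a processor 102, a communication interface 103, and a bus 104. The processor 102 may include the accumulator provided above, or the multiplier provided above that includes the accumulator.
It should be noted that the descriptions of the accumulator and the operator circuits above apply correspondingly to the accumulator and operator circuits included in the multiplier shown in FIG. 17, the processor, the chip, and the communication device. Details are not repeated here.
Finally, it should be noted that the foregoing is only a specific implementation of this application, and the protection scope of this application is not limited to it. Any variation or replacement within the technical scope disclosed in this application shall fall within the protection scope of this application. The protection scope of this application shall therefore be subject to the protection scope of the claims.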

Claims (15)

  1. 一种累加器,其特征在于,包括W个压缩器层,W为大于或等于1的整数;
    所述W个压缩器层,用于压缩多个二进制数,以得到多个累加值,所述多个累加值之和为所述多个二进制数的累加和;
    其中,所述W个压缩器层包括至少一个第一压缩器层,每个第一压缩器层用于压缩输入阵列以得到输出阵列,所述输入阵列包括第一阵列和第二阵列,所述第一阵列包括多个正相的比特位,所述第二阵列包括多个反相的比特位,所述输出阵列包括第一压缩阵列和第二压缩阵列;
    其中,所述每个第一压缩器层包括:
    第一压缩电路,用于压缩所述第一阵列,以得到所述第一压缩阵列;
    第二压缩电路,用于压缩所述第二阵列,以得到所述第二压缩阵列。
  2. 根据权利要求1所述的累加器,其特征在于,所述第一压缩电路包括一个或多个第一压缩器,所述一个或多个第一压缩器中的每个第一压缩器用于压缩所述第一阵列中位于同一数位上的三个比特位;
    所述第二压缩电路包括一个或多个第二压缩器,所述一个或多个第二压缩器中的每个第二压缩器用于压缩所述第二阵列中位于同一数位上的三个比特位。
  3. 根据权利要求2所述的累加器,其特征在于,所述每个第一压缩器和所述每个第二压缩器均为反相求和加法器;
    所述反相求和加法器,用于压缩所述三个比特位,得到一个进位输出位和一个求和输出位,所述进位输出位的相位与所述三个比特位的相位相同,所述求和输出位的相位与所述三个比特位的相位相反。
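  4. The accumulator according to claim 3, wherein the inverted-sum adder is configured to perform the following compression:
    if the three bits are all 0, the carry output bit is 0 and the sum output bit is 1;
    if the three bits are all 1, the carry output bit is 1 and the sum output bit is 0;
    if one of the three bits is 1 and the other two bits are 0, the carry output bit is 0 and the sum output bit is 0; and
    if two of the three bits are 1 and the other bit is 0, the carry output bit is 1 and the sum output bit is 1.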
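  5. The accumulator according to claim 2, wherein each first compressor and each second compressor is an inverted-carry adder; and
    the inverted-carry adder is configured to compress the three bits to obtain one carry output bit and one sum output bit, wherein a phase of the carry output bit is opposite to a phase of the three bits, and a phase of the sum output bit is the same as the phase of the three bits.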
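  6. The accumulator according to claim 5, wherein the inverted-carry adder is configured to perform the following compression:
    if the three bits are all 0, the carry output bit is 1 and the sum output bit is 0;
    if the three bits are all 1, the carry output bit is 0 and the sum output bit is 1;
    if one of the three bits is 1 and the other two bits are 0, the carry output bit is 1 and the sum output bit is 1; and
    if two of the three bits are 1 and the other bit is 0, the carry output bit is 0 and the sum output bit is 0.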
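  7. The accumulator according to claim 2, wherein each first compressor and each second compressor is a double-inverted adder; and
    the double-inverted adder is configured to compress the three bits to obtain one carry output bit and one sum output bit, wherein phases of both the carry output bit and the sum output bit are opposite to a phase of the three bits.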
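  8. The accumulator according to claim 7, wherein the double-inverted adder is configured to perform the following compression:
    if the three bits are all 0, the carry output bit is 1 and the sum output bit is 1;
    if the three bits are all 1, the carry output bit is 0 and the sum output bit is 0;
    if one of the three bits is 1 and the other two bits are 0, the carry output bit is 1 and the sum output bit is 0; and
    if two of the three bits are 1 and the other bit is 0, the carry output bit is 0 and the sum output bit is 1.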
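  9. The accumulator according to any one of claims 1 to 8, wherein the accumulator further comprises:
    a summation circuit, configured to receive the plurality of accumulated values and sum the plurality of accumulated values to obtain the accumulated sum.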
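  10. The accumulator according to any one of claims 3 to 9, wherein the accumulator further comprises:
    one or more inverters, configured to invert at least one of the sum output bit and the carry output bit that are output by one or more first compressors or second compressors in the W compressor layers, or to invert the three bits that are input to the one or more first compressors or second compressors.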
  11. A multiplier, comprising an encoder and the accumulator according to any one of claims 1 to 10.
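  12. An operator circuit, comprising: a first transistor, a second transistor, a third transistor, a fourth transistor, a fifth transistor, a sixth transistor, a seventh transistor, an eighth transistor, a ninth transistor, a tenth transistor, an eleventh transistor, a twelfth transistor, a thirteenth transistor, a fourteenth transistor, a fifteenth transistor, a sixteenth transistor, a seventeenth transistor, an eighteenth transistor, a nineteenth transistor, a twentieth transistor, a twenty-first transistor, a twenty-second transistor, a twenty-third transistor, and a twenty-fourth transistor, wherein
    the first transistor and the second transistor are coupled in parallel between a power supply terminal and a first node;
    the third transistor is coupled between the first node and a second node;
    the fourth transistor is coupled between the second node and a third node;
    the fifth transistor and the sixth transistor are coupled in parallel between the third node and a ground terminal;
    the seventh transistor is coupled between the power supply terminal and a fourth node;
    the eighth transistor is coupled between the second node and the fourth node;
    the ninth transistor is coupled between the second node and a fifth node;
    the tenth transistor and the eleventh transistor are coupled in series between the fourth node and a first output terminal;
    the twelfth transistor and the thirteenth transistor are coupled in series between the fifth node and the first output terminal;
    the fourteenth transistor is coupled between the fifth node and the ground terminal;
    the fifteenth transistor, the sixteenth transistor, and the seventeenth transistor are coupled in parallel between the power supply terminal and a sixth node;
    the eighteenth transistor is coupled between the first output terminal and the sixth node;
    the nineteenth transistor is coupled between the first output terminal and a seventh node;
    the twentieth transistor, the twenty-first transistor, and the twenty-second transistor are coupled in parallel between the seventh node and the ground terminal;
    control terminals of the third transistor, the fourth transistor, the eleventh transistor, the twelfth transistor, the fifteenth transistor, and the twentieth transistor are all configured to receive a first input;
    control terminals of the first transistor, the fifth transistor, the seventh transistor, the fourteenth transistor, the sixteenth transistor, and the twenty-first transistor are all configured to receive a second input;
    control terminals of the second transistor, the sixth transistor, the eighth transistor, the ninth transistor, the tenth transistor, the thirteenth transistor, the seventeenth transistor, and the twenty-second transistor are all configured to receive a third input;
    control terminals of the eighteenth transistor and the nineteenth transistor are both coupled to the second node;
    the twenty-third transistor and the twenty-fourth transistor are coupled in series between the power supply terminal and the ground terminal, and a coupling point of the twenty-third transistor and the twenty-fourth transistor is a second output terminal; and
    control terminals of the twenty-third transistor and the twenty-fourth transistor are both coupled to the second node.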
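  13. The operator circuit according to claim 12, wherein the first transistor, the second transistor, the third transistor, the seventh transistor, the eighth transistor, the tenth transistor, the eleventh transistor, the fifteenth transistor, the sixteenth transistor, the seventeenth transistor, the eighteenth transistor, and the twenty-third transistor are PMOS transistors; and
    the fourth transistor, the fifth transistor, the sixth transistor, the ninth transistor, the twelfth transistor, the thirteenth transistor, the fourteenth transistor, the nineteenth transistor, the twentieth transistor, the twenty-first transistor, the twenty-second transistor, and the twenty-fourth transistor are NMOS transistors.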
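  14. An operator circuit, comprising: a first transistor, a second transistor, a third transistor, a fourth transistor, a fifth transistor, a sixth transistor, a seventh transistor, an eighth transistor, a ninth transistor, a tenth transistor, an eleventh transistor, a twelfth transistor, a thirteenth transistor, a fourteenth transistor, a fifteenth transistor, a sixteenth transistor, a seventeenth transistor, an eighteenth transistor, a nineteenth transistor, a twentieth transistor, a twenty-first transistor, a twenty-second transistor, a twenty-third transistor, and a twenty-fourth transistor, wherein
    the first transistor and the second transistor are coupled in parallel between a power supply terminal and a first node;
    the third transistor is coupled between the first node and a first output terminal;
    the fourth transistor is coupled between the first output terminal and a second node;
    the fifth transistor and the sixth transistor are coupled in parallel between the second node and a ground terminal;
    the seventh transistor is coupled between the power supply terminal and a third node;
    the eighth transistor is coupled between the third node and the first output terminal;
    the ninth transistor is coupled between the first output terminal and a fourth node;
    the tenth transistor and the eleventh transistor are coupled in series between the third node and a fifth node;
    the twelfth transistor and the thirteenth transistor are coupled in series between the fourth node and the fifth node;
    the fourteenth transistor is coupled between the fourth node and the ground terminal;
    the fifteenth transistor, the sixteenth transistor, and the seventeenth transistor are coupled in parallel between the power supply terminal and a sixth node;
    the eighteenth transistor is coupled between the fifth node and the sixth node;
    the nineteenth transistor is coupled between the fifth node and a seventh node;
    the twentieth transistor, the twenty-first transistor, and the twenty-second transistor are coupled in parallel between the seventh node and the ground terminal;
    control terminals of the third transistor, the fourth transistor, the eleventh transistor, the twelfth transistor, the fifteenth transistor, and the twentieth transistor are all configured to receive a first input;
    control terminals of the first transistor, the fifth transistor, the seventh transistor, the fourteenth transistor, the sixteenth transistor, and the twenty-first transistor are all configured to receive a second input;
    control terminals of the second transistor, the sixth transistor, the eighth transistor, the ninth transistor, the tenth transistor, the thirteenth transistor, the seventeenth transistor, and the twenty-second transistor are all configured to receive a third input;
    control terminals of the eighteenth transistor and the nineteenth transistor are both coupled to the first output terminal;
    the twenty-third transistor and the twenty-fourth transistor are coupled in series between the power supply terminal and the ground terminal, and a coupling point of the twenty-third transistor and the twenty-fourth transistor is a second output terminal; and
    control terminals of the twenty-third transistor and the twenty-fourth transistor are both coupled to the fifth node.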
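  15. The operator circuit according to claim 14, wherein the first transistor, the second transistor, the third transistor, the seventh transistor, the eighth transistor, the tenth transistor, the eleventh transistor, the fifteenth transistor, the sixteenth transistor, the seventeenth transistor, the eighteenth transistor, and the twenty-third transistor are PMOS transistors; and
    the fourth transistor, the fifth transistor, the sixth transistor, the ninth transistor, the twelfth transistor, the thirteenth transistor, the fourteenth transistor, the nineteenth transistor, the twentieth transistor, the twenty-first transistor, the twenty-second transistor, and the twenty-fourth transistor are NMOS transistors.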
PCT/CN2021/109751 2021-07-30 2021-07-30 Accumulator, multiplier, and operator circuit WO2023004783A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
PCT/CN2021/109751 WO2023004783A1 (zh) 2021-07-30 2021-07-30 Accumulator, multiplier, and operator circuit
CN202180007101.XA CN115917499A (zh) 2021-07-30 2021-07-30 Accumulator, multiplier, and operator circuit
EP21951373.6A EP4336345A1 (en) 2021-07-30 2021-07-30 Accumulator, multiplier, and operator circuit
US18/424,893 US20240168714A1 (en) 2021-07-30 2024-01-29 Accumulator, multiplier, and operator circuit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/109751 WO2023004783A1 (zh) 2021-07-30 2021-07-30 Accumulator, multiplier, and operator circuit

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/424,893 Continuation US20240168714A1 (en) 2021-07-30 2024-01-29 Accumulator, multiplier, and operator circuit

Publications (1)

Publication Number Publication Date
WO2023004783A1 (zh) 2023-02-02

Family

ID=85087449

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109751 WO2023004783A1 (zh) 2021-07-30 2021-07-30 Accumulator, multiplier, and operator circuit

Country Status (4)

Country Link
US (1) US20240168714A1 (zh)
EP (1) EP4336345A1 (zh)
CN (1) CN115917499A (zh)
WO (1) WO2023004783A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030158879A1 (en) * 2000-12-11 2003-08-21 International Business Machines Corporation Pre-reduction technique within a multiplier/accumulator architecture
US20070244943A1 (en) * 2006-02-28 2007-10-18 Sony Computer Entertainment Inc. Methods and apparatus for providing a reduction array
CN105528191A (zh) * 2015-12-01 2016-04-27 中国科学院计算技术研究所 Data accumulation apparatus and method, and digital signal processing apparatus
CN112596699A (zh) * 2020-12-30 2021-04-02 海光信息技术股份有限公司 Multiplier, processor, and electronic device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KONG XIN, CHEN GANG; GONG GUOLIANG; LU HUAXIANG; MAO WENYU: "High Performance Multiply-accumulator for the Convolutional Neural Networks Accelerator", XI'AN DIANZI KE JI DAXUE XUEBAO - JOURNAL OF XIADIAN UNIVERSITY, XI'AN DIANZI KE JI DAXUE, XI'AN,, CN, vol. 47, no. 4, 31 August 2020 (2020-08-31), CN , XP093028523, ISSN: 1001-2400, DOI: 10.19665/j.issn1001-2400.2020.04.008 *
LIU, WEIQIANG ET AL.: "Design and Analysis of Approximate Redundant Binary Multipliers", IEEE TRANSACTIONS ON COMPUTERS, vol. 68, no. 6, 30 June 2019 (2019-06-30), XP011723015, ISSN: 0018-9340, DOI: 10.1109/TC.2018.2890222 *

Also Published As

Publication number Publication date
EP4336345A1 (en) 2024-03-13
CN115917499A (zh) 2023-04-04
US20240168714A1 (en) 2024-05-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21951373; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 2021951373; Country of ref document: EP)
ENP Entry into the national phase (Ref document number: 2021951373; Country of ref document: EP; Effective date: 20231205)
NENP Non-entry into the national phase (Ref country code: DE)