WO2022017179A1 - Adder, arithmetic circuit, chip and computing device - Google Patents

Adder, arithmetic circuit, chip and computing device

Info

Publication number
WO2022017179A1
WO2022017179A1 PCT/CN2021/104880 CN2021104880W WO2022017179A1 WO 2022017179 A1 WO2022017179 A1 WO 2022017179A1 CN 2021104880 W CN2021104880 W CN 2021104880W WO 2022017179 A1 WO2022017179 A1 WO 2022017179A1
Authority
WO
WIPO (PCT)
Prior art keywords
adder
bits
input
carry
subsections
Prior art date
Application number
PCT/CN2021/104880
Inventor
刘建波
范志军
李楠
郭海丰
Original Assignee
深圳比特微电子科技有限公司
Application filed by 深圳比特微电子科技有限公司
Publication of WO2022017179A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/50 Adding; Subtracting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/7817 Specially adapted for signal processing, e.g. Harvard architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/3012 Organisation of register space, e.g. banked or distributed register file
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present disclosure generally relates to digital circuits. Specifically, it relates to an adder, an arithmetic circuit including the adder, and a chip and a computing device.
  • Adders, which are used to perform addition operations, are an important part of many arithmetic circuits.
  • A high-speed device is usually used to realize the adder.
  • An adder is provided for calculating the sum of two input numbers. The adder has two inputs respectively representing the two numbers, wherein each input is divided into a plurality of mutually corresponding subsections, and the plurality of subsections sequentially represent partial bits of the input from low order to high order.
  • The adder includes: a plurality of first-stage addition modules, each used for summing the corresponding subsections of the two inputs; a plurality of intermediate registers, each coupled to a corresponding first-stage addition module and used to store the sum of the summation result of the corresponding subsections of the two inputs; one or more carry registers, each coupled to a corresponding first-stage addition module and used to store the carry of the summation result of the corresponding subsections of the two inputs; and a second-stage addition module, coupled to the plurality of intermediate registers and the one or more carry registers, for summing the sum from each intermediate register with the carry from the corresponding previous carry register to obtain the sum of the two inputs.
  • An adder is also provided for calculating the sum of an input number and a predetermined constant. The adder has an input representing the number, the input is divided into a plurality of subsections, and the plurality of subsections sequentially represent partial bits of the input from low order to high order.
  • The adder includes: one or more first-stage addition modules, each used for summing the corresponding subsection of the input with the corresponding bits of the constant; a plurality of intermediate registers, each coupled to a corresponding first-stage addition module and used to store the sum of the summation result of the corresponding subsection of the input and the corresponding bits of the constant; one or more carry registers, each coupled to a corresponding first-stage addition module and used to store the carry of the summation result of the corresponding subsection of the input and the corresponding bits of the constant; and a second-stage addition module, coupled to the plurality of intermediate registers and the one or more carry registers, for summing the sum from each intermediate register with the carry from the corresponding previous carry register to obtain the sum of the input and the constant.
  • An arithmetic circuit is provided, comprising an adder as described above and at least one of a pre-combination logic module coupled to an input of the adder and a post-combination logic module coupled to an output of the adder.
  • a chip including the arithmetic circuit as described above.
  • a computing device including the chip as described above.
  • FIG. 1 shows a schematic diagram of an adder for calculating the sum of two numbers of an input, according to one or more exemplary embodiments of the present disclosure.
  • FIG. 2 shows a schematic diagram of an adder for calculating the sum of an input number and a predetermined constant according to one or more exemplary embodiments of the present disclosure.
  • FIG. 3 illustrates a portion of an arithmetic circuit including an adder according to one or more exemplary embodiments of the present disclosure.
  • FIG. 4 shows a portion of an arithmetic circuit including an adder according to the related art.
  • FIG. 1 shows a schematic diagram of an adder 100 in accordance with one or more exemplary embodiments of the present disclosure.
  • The adder 100 is used to calculate the sum of two input numbers.
  • The adder 100 has two inputs 111, 112 and two outputs 161, 162.
  • The two inputs 111 and 112 respectively represent the two input numbers.
  • The outputs 161 and 162 respectively represent the sum and the carry of the summation result of the two numbers.
  • The configuration of the input and output of the adder 100 is not limited to the embodiment shown in FIG. 1.
  • The configuration of the input and output of the adder 100 can be appropriately adjusted according to the function of the adder and the needs of the arithmetic circuit, and the configuration of each module of the adder 100 can be adjusted accordingly.
  • The adder may also have only one output, i.e., it outputs only the sum of the summation result, without the carry.
  • Each input 111, 112 is divided into N mutually corresponding subsections, and the N subsections sequentially represent partial bits of the input from low order to high order.
  • Input 111 is divided into subsections 111-1, 111-2, . . . , 111-N from low to high.
  • Input 112 is divided into subsections 112-1, 112-2, . . . , 112-N from low to high.
  • The first subsections 111-1 and 112-1 represent the least significant one or more bits of the inputs 111 and 112, respectively, and 111-1 and 112-1 cover the same bit positions.
  • The second subsections 111-2 and 112-2 respectively represent one or more bits of the inputs 111 and 112 higher than 111-1 and 112-1, and 111-2 and 112-2 cover the same bit positions.
  • The Nth subsections 111-N and 112-N represent the highest one or more bits of the inputs 111 and 112, respectively, and 111-N and 112-N cover the same bit positions.
  • Each input 111, 112 has at least two subsections.
  • The output 161, representing the sum of the inputs 111 and 112, is divided into two subsections 161-1, 161-2.
  • Subsection 161-1 corresponds to the first subsections 111-1 and 112-1 of the inputs 111 and 112 and represents the lowest one or more bits of the output 161.
  • Subsection 161-2 corresponds to the other subsections of the inputs 111 and 112 and represents the other one or more bits of the output 161.
  • the adder 100 includes a first-stage addition module group 120 , an intermediate register group 130 , a carry register group 140 and a second-stage addition module 150 .
  • The first-stage addition module group 120 is coupled to the inputs 111, 112, and includes a plurality of first-stage addition modules 120-1, 120-2, . . . , 120-N. Each first-stage addition module 120-1, 120-2, . . . , 120-N is used to sum the corresponding subsections of the two inputs 111, 112.
  • The first first-stage addition module 120-1 is coupled to the first subsections 111-1 and 112-1 of the two inputs 111, 112 for summing 111-1 and 112-1.
  • The number of first-stage addition modules is equal to the number of input subsections, and the configuration of each first-stage addition module can be determined according to the number of bits of the corresponding input subsections.
  • The outputs of the first-stage addition module group 120 are coupled to the intermediate register group 130 and the carry register group 140, and output the sum and carry of the summation result to the intermediate register group 130 and the carry register group 140, respectively.
  • The intermediate register group 130 includes a plurality of intermediate registers 130-1, 130-2, . . . , 130-N. Each intermediate register 130-1, 130-2, . . . , 130-N is coupled to a corresponding first-stage addition module 120-1, 120-2, . . . , 120-N for storing the sum of the summation result of the corresponding subsections of the two inputs 111, 112.
  • The first intermediate register 130-1 is coupled to the first first-stage addition module 120-1 for storing the sum of the summation result of 111-1 and 112-1. That is, the first intermediate register 130-1 corresponds to the first subsections 111-1 and 112-1 of the two inputs 111, 112, and is used to store the sum of the summation result of the least significant one or more bits of the two inputs 111, 112 represented by 111-1 and 112-1.
  • The number of intermediate registers is equal to the number of input subsections, and the configuration of each intermediate register can be determined according to the number of bits of the corresponding input subsection.
  • The carry register group 140 includes a plurality of carry registers 140-1, 140-2, . . . , 140-N. Each carry register 140-1, 140-2, . . . , 140-N is coupled to a corresponding first-stage addition module 120-1, 120-2, . . . , 120-N for storing the carry of the summation result of the corresponding subsections of the two inputs 111, 112.
  • The first carry register 140-1 is coupled to the first first-stage addition module 120-1 for storing the carry of the summation result of 111-1 and 112-1. That is, the first carry register 140-1 is used to store the carry of the summation result of the least significant one or more bits of the two inputs 111, 112 represented by 111-1 and 112-1.
  • The number of carry registers can be determined based on the number of subsections of the input. In the embodiment shown in FIG. 1, the number of carry registers is equal to the number of input subsections. In other embodiments, the number of carry registers may be one less than the number of input subsections. That is, the Nth carry register 140-N may be absent from the carry register group 140, so that the carry of the summation result of the highest one or more bits of the two inputs 111 and 112, represented by 111-N and 112-N, is not stored.
  • The number of carry registers can be determined as needed. In embodiments where the adder 100 needs to output a carry of the summation result of the two inputs 111, 112 (i.e., the adder 100 includes output 162), the number of carry registers may be determined to be equal to the number of subsections of the inputs. In embodiments where the adder 100 does not need to output a carry of the summation result of the two inputs 111, 112 (i.e., the adder 100 does not include output 162), the number of carry registers may be determined to be one less than the number of subsections of the inputs, thereby reducing the additional cost of the adder 100. For example, in an embodiment where the two inputs 111, 112 each have two subsections, the adder 100 may include only one carry register.
  • the carry register is only used to store the carry bit, so each carry register can be implemented by a 1-bit register.
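The register budget described above can be sketched as a small calculation. This is a minimal sketch, not the patent's implementation: the function name `register_bits` is hypothetical, and it assumes each intermediate register is exactly as wide as its subsection, which the text leaves open (it only says the configuration is determined according to the number of bits).

```python
from typing import List, Tuple


def register_bits(widths: List[int], needs_carry_out: bool) -> Tuple[int, int]:
    """Return (intermediate register bits, carry register bits).

    widths lists the bit width of each input subsection, lowest first.
    Assumption: each intermediate register is as wide as its subsection.
    """
    if len(widths) < 2:
        raise ValueError("each input has at least two subsections")
    intermediate_bits = sum(widths)
    # One 1-bit carry register per subsection; the register for the highest
    # subsection is only needed when the adder outputs a carry.
    carry_bits = len(widths) if needs_carry_out else len(widths) - 1
    return intermediate_bits, carry_bits
```

For two 16-bit subsections and no carry output, this gives 32 intermediate bits and a single carry bit, in line with the two-subsection example in the text.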
  • The outputs of the intermediate register group 130 and the carry register group 140 are coupled to the second-stage addition module 150, and the results (including the sum and carry) of summing the corresponding subsections of the two inputs 111 and 112 are output to the second-stage addition module 150.
  • The second-stage addition module 150 is used to sum the sum from each intermediate register and the carry from the corresponding previous carry register to obtain the sum of the two inputs 111, 112.
  • The second-stage addition module 150 may output the sum, output from the first intermediate register 130-1, of the summation result of the least significant one or more bits of the two inputs 111, 112 represented by 111-1 and 112-1 as the corresponding least significant one or more bits of the sum of the two inputs 111, 112 (i.e., subsection 161-1 of output 161).
  • The second-stage addition module 150 may sum the sum, output from the second intermediate register 130-2, of the summation result of the one or more bits of the two inputs 111, 112 represented by 111-2 and 112-2 with the carry, output from the first carry register 140-1, of the summation result of the least significant one or more bits represented by 111-1 and 112-1. The sum of this summation result is output as the corresponding one or more bits of the sum of the two inputs 111, 112, and the carry of this summation result is used for further operations in the second-stage addition module 150.
  • The second-stage addition module 150 may sum the sum, output from the Nth intermediate register 130-N, of the summation result of the most significant one or more bits of the two inputs 111, 112 represented by 111-N and 112-N with the carry, output from the (N-1)th carry register 140-(N-1), of the summation result of the one or more bits represented by 111-(N-1) and 112-(N-1), and then output the sum of this summation result as the corresponding highest one or more bits of the sum of the two inputs 111 and 112.
  • The second-stage addition module 150 may sum the carry of the above summation result with the carry, output from the Nth carry register 140-N, of the summation result of the most significant one or more bits represented by 111-N and 112-N, and then output the result as the carry of the summation result of the two inputs 111 and 112, that is, the output 162.
  • the processing performed by the second-stage addition module 150 is not limited to the processing described above.
  • The configuration of the second-stage addition module 150 may be determined according to the function of the adder 100. For example, in embodiments that do not need to output a carry of the summation result of the two inputs 111, 112 (i.e., the adder does not include an output 162), the second-stage addition module 150 may omit the processing that computes and outputs the carry of the summation result of the two inputs 111, 112.
  • the output of the second stage addition module 150 is coupled to outputs 161 and 162, which represent the sum and carry of the summation results of the two inputs 111, 112, respectively.
  • The output 161 may be divided into two subsections: a first subsection 161-1, representing the least significant one or more bits, corresponding to 111-1 and 112-1, of the sum of the summation result of the two inputs 111, 112; and a second subsection 161-2, representing the other one or more bits of the sum of the summation result of the two inputs 111, 112.
  • the second stage addition module 150 may directly couple the output of the first intermediate register 130 - 1 to the first subsection 161 - 1 of the output 161 .
  • The registers mentioned herein may be edge-triggered registers (e.g., D-type flip-flops) or level-triggered registers (e.g., latches).
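The data flow of the adder 100 described above can be modelled in software. The following is a behavioural sketch only, under stated assumptions: the function names (`split`, `first_stage`, `second_stage`, `two_stage_add`) are hypothetical, subsection widths are passed as a list with the lowest subsection first, and the register boundary between the stages is represented simply by the values returned from `first_stage`.

```python
from typing import List, Tuple


def split(value: int, widths: List[int]) -> List[int]:
    """Split value into subsections, lowest-order bits first."""
    parts = []
    for w in widths:
        parts.append(value & ((1 << w) - 1))
        value >>= w
    return parts


def first_stage(a: int, b: int, widths: List[int]) -> Tuple[List[int], List[int]]:
    """Sum corresponding subsections independently.

    Returns the contents of the intermediate registers (sums) and of the
    1-bit carry registers (carries).
    """
    sums, carries = [], []
    for x, y, w in zip(split(a, widths), split(b, widths), widths):
        total = x + y
        sums.append(total & ((1 << w) - 1))
        carries.append(total >> w)
    return sums, carries


def second_stage(sums: List[int], carries: List[int],
                 widths: List[int]) -> Tuple[int, int]:
    """Combine registered sums with the carry from the previous subsection."""
    result = sums[0]          # first subsection passes straight to the output
    carry_in = carries[0]
    shift = widths[0]
    for s, c, w in zip(sums[1:], carries[1:], widths[1:]):
        total = s + carry_in
        result |= (total & ((1 << w) - 1)) << shift
        # The stored carry and the carry produced here cannot both be 1,
        # so their sum is still a single bit.
        carry_in = c + (total >> w)
        shift += w
    return result, carry_in


def two_stage_add(a: int, b: int, widths: List[int]) -> Tuple[int, int]:
    """Return (sum, carry) of a + b for inputs of sum(widths) bits."""
    sums, carries = first_stage(a, b, widths)
    return second_stage(sums, carries, widths)
```

For example, `two_stage_add(0xFFFFFFFF, 1, [16, 16])` models a 32-bit addition split into two 16-bit subsections, where the carry out of the low half is held in the first carry register and consumed by the second stage.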
  • The calculation speed of the adder 100 mainly depends on the calculation speed of the first-stage addition module group 120 and the second-stage addition module 150, which in turn depends on the number of subsections of the two inputs 111, 112 and their numbers of bits. Therefore, the number of subsections of the two inputs 111, 112 and their numbers of bits can be appropriately determined, thereby increasing the calculation speed of the adder 100.
  • The overall calculation delay of the first-stage addition module group 120 is determined by the calculation delay of the first-stage addition module with the longest calculation delay among the plurality of first-stage addition modules 120-1, 120-2, . . . , 120-N.
  • Each first-stage addition module 120-1, 120-2, . . . , 120-N is used to sum the corresponding subsections of the two inputs 111, 112. The more bits a subsection has, the longer the calculation delay of the corresponding first-stage addition module.
  • The overall calculation delay of the first-stage addition module group 120 depends on the largest number of bits among the numbers of bits of the multiple subsections of the inputs 111, 112.
  • The second-stage addition module 150 is used to sum the sums from the plurality of intermediate registers and the carries from the corresponding previous carry registers. The output of the first intermediate register 130-1 already represents the corresponding lowest one or more bits of the sum of the summation result, so the second-stage addition module 150 may not perform additional processing on it. In some embodiments, the second-stage addition module 150 may perform certain processing on the output of the first intermediate register 130-1 as required, but the time this takes will be much less than the time the second-stage addition module 150 needs for the summation processing described above on the outputs of the other intermediate registers and the carry registers. That is, the calculation delay of the second-stage addition module 150 is determined by the calculation delay of that summation processing.
  • The calculation delay of the second-stage addition module 150 may depend on the number of subsections other than the first subsections 111-1, 112-1 among the plurality of subsections of the inputs 111, 112 and on the sum of their numbers of bits.
  • The smaller the number of subsections of the inputs 111 and 112, or the larger the number of bits of the first subsections 111-1 and 112-1, the shorter the calculation delay of the second-stage addition module 150.
  • In some embodiments, the number of bits of the first subsections 111-1, 112-1 of the inputs 111, 112 is greater than or equal to the number of bits of the other subsections. In other embodiments, the multiple subsections of the inputs 111, 112 are substantially equal in number of bits. This helps reduce the calculation delay of the first-stage addition module group 120 and the second-stage addition module 150, thereby increasing the calculation speed of the adder 100 and further reducing the ratio of power consumption to computing power of the chip.
  • In some embodiments, the inputs 111, 112 have two subsections, and the first subsections 111-1, 112-1 have a number of bits greater than or equal to half the number of bits of the inputs 111, 112. This helps reduce the extra cost while increasing the calculation speed of the adder 100.
  • the expression “substantially equal” herein means that the two are approximately equal, but not necessarily strictly and precisely equal. Those skilled in the art should understand that this is consistent with technical principles and engineering practice. For example, the two may differ by about 5% or 10%. In some contexts, the two may differ by about 15% or 20%.
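The trade-off between the two stages can be illustrated with a toy delay model. This is a sketch under a strong assumption that is not in the source: delay grows linearly with bit count (as in a ripple-carry structure), with a hypothetical `delay_per_bit` unit. Real delays depend on the adder architecture and process.

```python
from typing import List, Tuple


def stage_delays(widths: List[int], delay_per_bit: float = 1.0) -> Tuple[float, float]:
    """Estimate (first-stage delay, second-stage delay) for a partition.

    widths lists subsection bit widths, lowest subsection first.
    Assumed model: delay is proportional to the number of bits summed.
    """
    # First stage: modules run in parallel, so the widest subsection dominates.
    first = max(widths) * delay_per_bit
    # Second stage: the first subsection passes through; the carry must
    # propagate across all remaining bits.
    second = sum(widths[1:]) * delay_per_bit
    return first, second


# Compare a few ways of partitioning a 32-bit input.
for widths in ([16, 16], [24, 8], [8, 8, 8, 8]):
    first, second = stage_delays(widths)
    print(widths, first, second, max(first, second))
```

Under this model, two 16-bit subsections balance both stages at 16 units each, while four 8-bit subsections shorten the first stage but lengthen the second, which is consistent with the preference stated above for few subsections and a first subsection of at least half the input width.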
  • FIG. 2 shows a schematic diagram of an adder 200 in accordance with one or more exemplary embodiments of the present disclosure.
  • the adder 200 is used to calculate the sum of an inputted number and a predetermined constant.
  • The adder 200 has one input 210 and two outputs 261, 262.
  • The input 210 represents the input number.
  • The outputs 261 and 262 represent the sum and the carry of the summation result of the number and a predetermined constant, respectively.
  • The adder 200 may also have only one output 261, that is, it outputs only the sum of the summation result, but not the carry.
  • The configuration of the adder 200 is similar to that of the adder 100, and can be appropriately adjusted according to the predetermined constant.
  • The input 210 is divided into N subsections, and the N subsections represent partial bits of the input sequentially from low order to high order. That is, the input 210 is divided into subsections 210-1, 210-2, . . . , 210-N from low to high, where N is an integer greater than or equal to 2. That is, input 210 has at least two subsections.
  • The output 261, representing the sum of the input 210 and the constant, is divided into two subsections 261-1, 261-2.
  • Subsection 261-1 corresponds to the first subsection 210-1 of the input 210 and represents the lowest one or more bits of the output 261.
  • Subsection 261-2 corresponds to the other subsections 210-2, . . . , 210-N of the input 210 and represents the other one or more bits of the output 261.
  • the adder 200 includes a first-stage addition module group 220 , an intermediate register group 230 , a carry register group 240 and a second-stage addition module 250 .
  • the first-stage addition module group 220 is coupled to the input 210 and includes a plurality of first-stage addition modules 220-1, 220-2, . . . , 220-N. Each of the first stage addition modules 220-1, 220-2, . . . , 220-N is used to sum the corresponding sub-portion of the input 210 with the corresponding bits of the constant.
  • The configuration of each of the first-stage addition modules 220-1, 220-2, . . . , 220-N may be related to the corresponding bits of the constant.
  • The number and configuration of the first-stage addition modules 220-1, 220-2, . . . , 220-N may be determined or adjusted based at least in part on the constant. For example, for any subsection of the input 210, if it is known that the corresponding bits of the predetermined constant are all zero, the subsection need not be summed with the corresponding bits of the constant, so the first-stage addition module group 220 may omit the corresponding first-stage addition module. This helps reduce the manufacturing cost of the adder 200.
  • The first-stage addition module group 220 may include only one first-stage addition module, that is, only the first first-stage addition module 220-1 corresponding to the first subsection 210-1 of the input 210.
  • For example, the adder 200 may be an increment-by-one adder (an adder that adds 1 to its input), and may include only one first-stage addition module.
  • the outputs of the first-stage addition module group 220 are coupled to the intermediate register group 230 and the carry register group 240, and output the summation result (including the sum and carry) to the intermediate register group 230 and the carry register group 240, respectively.
  • The intermediate register group 230 includes a plurality of intermediate registers 230-1, 230-2, . . . , 230-N. As shown in FIG. 2, each intermediate register 230-1, 230-2, . . . , 230-N is coupled to a corresponding first-stage addition module 220-1, 220-2, . . . , 220-N for storing the sum of the result of summing the corresponding subsection of the input 210 with the corresponding bits of the constant.
  • The carry register group 240 includes a plurality of carry registers 240-1, 240-2, . . . , 240-N. Each carry register 240-1, 240-2, . . . , 240-N is coupled to a corresponding first-stage addition module 220-1, 220-2, . . . , 220-N for storing the carry of the result of summing the corresponding subsection of the input 210 with the corresponding bits of the constant.
  • The configuration of the intermediate register group 230 and the carry register group 240 can be appropriately adjusted according to the configuration of the first-stage addition module group 220.
  • Where the first-stage addition module group 220 does not include the first-stage addition module for a subsection of the input 210, the corresponding intermediate register in the intermediate register group 230 may be directly coupled to that subsection of the input 210, and the corresponding carry register may not be included in the carry register group 240.
  • In some embodiments, the number of carry registers is equal to the number of input subsections. In other embodiments, the number of carry registers may be one less than the number of input subsections, i.e., there is no Nth carry register 240-N.
  • The outputs of the intermediate register group 230 and the carry register group 240 are coupled to the second-stage addition module 250, and the results (including the sum and carry) of summing the respective subsections of the input 210 with the corresponding bits of the constant are output to the second-stage addition module 250.
  • The second-stage addition module 250 is used to sum the sum from each intermediate register and the carry from the corresponding previous carry register, that is, the results (including the sum and carry) of summing the respective subsections of the input 210 with the corresponding bits of the constant, to obtain the sum of the input 210 and the constant.
  • The second-stage addition module 250 may output the sum, output from the first intermediate register 230-1, of the lowest one or more bits of the input 210 represented by 210-1 and the corresponding lowest one or more bits of the constant as the corresponding least significant one or more bits of the sum of the input 210 and the constant (i.e., subsection 261-1 of output 261).
  • The second-stage addition module 250 may sum the sum, output from the second intermediate register 230-2, of the one or more bits of the input 210 represented by 210-2 and the corresponding one or more bits of the constant with the carry output from the first carry register 240-1. The sum of this summation result is output as the corresponding one or more bits of the sum of the input 210 and the constant, and the carry of this summation result is used for further operations in the second-stage addition module 250.
  • The second-stage addition module 250 may sum the sum, output from the Nth intermediate register 230-N, of the highest one or more bits of the input 210 represented by 210-N and the corresponding highest one or more bits of the constant with the carry output from the (N-1)th carry register 240-(N-1), and then output the sum of this summation result as the corresponding highest one or more bits of the sum of the input 210 and the constant.
  • The second-stage addition module 250 may sum the carry of the above summation result with the carry output from the Nth carry register 240-N, and then output the result as the carry of the summation result of the input 210 and the constant (i.e., output 262).
  • the processing performed by the second-level addition module 250 is not limited to the above.
  • the configuration of the second-stage adding module 250 may be determined according to the function of the adder. For example, in embodiments where adder 200 does not include output 262, the configuration of second stage adder module 250 and the processing performed by it may be adjusted accordingly.
  • the output of the second stage addition module 250 is coupled to outputs 261 and 262, which respectively represent the sum and carry of the summation result of the input 210 and the constant.
  • The output 261 may be divided into two subsections: a first subsection 261-1, representing the least significant one or more bits, corresponding to 210-1, of the sum of the summation result of the input 210 and the constant; and a second subsection 261-2, representing the other one or more bits of the sum of the summation result of the input 210 and the constant.
  • the second stage addition module 250 may directly couple the output of the first intermediate register 230 - 1 to the first subsection 261 - 1 of the output 261 .
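The behaviour of the adder 200 can be sketched in the same spirit. This is an illustrative model, not the patent's circuit: the names `split` and `add_constant` are hypothetical, and a subsection whose constant bits are all zero is modelled by copying the input subsection straight into its intermediate register with a carry of zero, which is one reading of the description of omitted first-stage modules.

```python
from typing import List, Tuple


def split(value: int, widths: List[int]) -> List[int]:
    """Split value into subsections, lowest-order bits first."""
    parts = []
    for w in widths:
        parts.append(value & ((1 << w) - 1))
        value >>= w
    return parts


def add_constant(value: int, constant: int, widths: List[int]) -> Tuple[int, int]:
    """Return (sum, carry) of value + constant for a sum(widths)-bit input."""
    sums, carries = [], []
    for x, c, w in zip(split(value, widths), split(constant, widths), widths):
        if c == 0:
            # No first-stage module here: the intermediate register is fed
            # directly from the input subsection and no carry can arise.
            sums.append(x)
            carries.append(0)
        else:
            total = x + c
            sums.append(total & ((1 << w) - 1))
            carries.append(total >> w)

    # Second stage: the first subsection passes through, the others absorb
    # the carry from the previous subsection.
    result, carry_in, shift = sums[0], carries[0], widths[0]
    for s, c, w in zip(sums[1:], carries[1:], widths[1:]):
        total = s + carry_in
        result |= (total & ((1 << w) - 1)) << shift
        carry_in = c + (total >> w)
        shift += w
    return result, carry_in
```

With `constant=1` and `widths=[16, 16]`, only the low subsection is actually added in the first stage, which corresponds to the increment-by-one case mentioned in the text.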
  • the calculation speed of the adder 200 mainly depends on the calculation speed of the first-stage addition module group 220 and the second-stage addition module 250 .
  • the configuration of the first-stage addition modules 220-1, 220-2, . . . , 220-N can be related to the constant.
  • For a subsection of the input 210 whose corresponding constant bits are all zero, the first-stage addition module group 220 may not include the corresponding first-stage addition module. Therefore, the calculation delay of the first-stage addition module group 220 is independent of the number of bits of such subsections and depends only on the number of bits of the other subsections of the input 210 (i.e., those for which the corresponding bits of the constant are not all zero).
  • The number of subsections of input 210 and the number of bits per subsection are determined based at least in part on the constant. For example, if the constant is small (i.e., the upper bits of the constant are all zeros, e.g., the constant is 1), then the input 210 may have two subsections such that the bits of the constant corresponding to the second subsection are all zeros. As another example, if the constant includes a plurality of consecutive bits that are all zeros, a subsection of the input 210 may be divided so as to correspond to at least a portion of those consecutive bits.
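Whether a given partition lets first-stage modules be omitted can be checked mechanically. The helper below is a hypothetical sketch (`modules_needed` is not a name from the source); it reports which subsections still need a first-stage addition module for a given constant, using 0-based indices rather than the 1-based numbering of the figures.

```python
from typing import List


def modules_needed(constant: int, widths: List[int]) -> List[int]:
    """Return 0-based indices of subsections whose constant bits are not all zero.

    widths lists subsection bit widths, lowest subsection first.
    """
    needed = []
    for index, w in enumerate(widths):
        if constant & ((1 << w) - 1):
            needed.append(index)
        constant >>= w
    return needed
```

For the constant 1 and two 16-bit subsections, only index 0 is returned. For a constant such as `0x00FF00FF`, choosing four 8-bit subsections aligns two of them with the runs of zero bits, whereas two 16-bit subsections would need a module each.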
  • FIG. 3 illustrates a portion of an arithmetic circuit 3000 including an adder 300 according to one or more exemplary embodiments of the present disclosure.
  • adder 300 is shown as an adder as shown in FIG. 1 for calculating the sum of two numbers input.
  • the adder 300 can be replaced with an adder for calculating the sum of an input number and a predetermined constant as shown in FIG.
  • the arithmetic circuit 3000 includes registers 3101 and 3102 of the previous stage, an adder 300 , and a register 3200 of the subsequent stage.
  • the operation circuit 3000 may further include a pre-combination logic module 3110 and a post-combination logic module 3120 .
  • the previous stage registers 3101 , 3102 may be directly coupled to the adder 300 . In some embodiments, the previous stage registers 3101 , 3102 may be coupled to the adder 300 via a pre-combination logic module 3110 . In some embodiments, the adder 300 may be directly coupled to the subsequent stage register 3200 . In some embodiments, the adder 300 may be coupled to the post-stage register 3200 via the post-combination logic module 3120 .
  • the arithmetic circuit 3000 may include only one previous stage register 3101 , which provides two inputs 311 , 312 to the adder 300 via the pre-combination logic module 3110 .
  • FIG. 3 shows an embodiment in which the arithmetic circuit 3000 includes a pre-combination logic module 3110 and a post-combination logic module 3120 .
  • the following description also applies to embodiments in which the arithmetic circuit 3000 does not include the pre-combination logic module 3110 or the post-combination logic module 3120, with only appropriate adjustments.
  • the adder 300 is similar in configuration to the adder 100 shown in FIG. 1 .
  • the adder 300 has two inputs 311, 312 respectively representing the two numbers input, and two outputs 361, 362 representing the sum and carry of the summation result of the two numbers, respectively. Therein, the output 361 has two subsections 361-1, 361-2.
  • the adder 300 includes a first-stage addition module group 320 , an intermediate register group 330 including a plurality of intermediate registers 330 - 1 , 330 - 2 , . . . , 330 -N, a carry register group 340 , and a second-stage addition module 350 .
  • the second stage addition module 350 directly couples the output of the first intermediate register 330-1 to the first subsection 361-1 of the output 361.
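  • the clocks used for the previous-stage registers 3101 and 3102, the intermediate register group 330, and the subsequent-stage register 3200 have the same frequency. Therefore, it is desirable that the operations of the pre-combination logic module 3110 and the first-stage addition module group 320 complete within one clock cycle, and that the operations of the second-stage addition module 350 and the post-combination logic module 3120 complete within one clock cycle.
  • it is desirable that the computation delay of the first-stage addition module group 320 be less than the difference between the clock period and the computation delay of the pre-combination logic module 3110, and that the computation delay of the second-stage addition module 350 be less than the difference between the clock period and the computation delay of the post-combination logic module 3120.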
  • the maximum number of bits of the plurality of subsections of the inputs 311 , 312 of the adder 300 may be determined based at least in part on the difference between the clock period and the computation delay of the pre-combination logic module 3110 . Specifically, the maximum number of bits can be determined so that the calculation delay of the first-stage addition module group 320 is smaller than the difference between the clock cycle and the calculation delay of the pre-combination logic module 3110 . In some embodiments, the maximum number of bits may be determined such that the calculation delay of the first-stage addition module group 320 is substantially equal to the difference between the clock period and the calculation delay of the pre-combination logic module 3110 .
  • the maximum number of bits among the subsections of the inputs 311, 312 of the adder 300 is determined based at least in part on the clock period.
  • the computation delay of the first-stage addition module group 320 is also related to the constant.
  • the number of bits of the subsections can be adjusted according to this constant.
  • the maximum number of bits among the subsections of the input of the adder 300 may first be determined according to the difference between the clock period and the computation delay of the pre-combination logic module 3110, and the number of bits of the subsections may then be adjusted according to the constant. For example, if the constant includes consecutive bits that are all zeros, a subsection of the input may be defined to correspond to those consecutive bits, regardless of whether the number of bits in that subsection exceeds the determined maximum. In some embodiments, the number of bits of the subsections of the input may be adjusted such that the maximum number of bits among the subsections that do not correspond to all-zero bits of the constant is substantially equal to the determined maximum.
  • the lower bound on the number of bits of the first subsection of the inputs 311 , 312 of the adder 300 may be determined at least in part from the difference between the clock period and the computation delay of the post-combination logic module 3120 .
  • the number of bits of the first subsection of the inputs 311 , 312 of the adder 300 is determined based at least in part on the difference between the clock period and the computation delay of the post-combination logic module 3120 .
  • the number of bits of the first subsection can be determined such that the calculation delay of the second-stage addition module 350 is smaller than the difference between the clock cycle and the calculation delay of the post-combination logic module 3120 .
  • the number of bits of the first subsection may be determined such that the calculation delay of the second stage addition module 350 is substantially equal to the difference between the clock period and the calculation delay of the post-combination logic module 3120 .
  • the number of bits of the first subsection of the inputs 311, 312 of the adder 300 is determined based at least in part on the clock cycle.
  • the number of bits of the first subsection of the input 311, 312 is determined to be greater than or equal to the number of bits of the other subsections. In some embodiments, the number of bits of the multiple sub-portions of the inputs 311, 312 are determined to be substantially equal.
  • the strategies described above may be combined to determine the number and number of bits of subsections of the inputs 311 , 312 of the adder 300 .
  • the number of bits of the first subsection of the inputs 311, 312 of the adder 300 may first be determined according to the difference between the clock period and the computation delay of the pre-combination logic module 3110 and the difference between the clock period and the computation delay of the post-combination logic module 3120.
  • the upper limit of the number of bits of the first subsection can be determined according to the difference between the clock period and the computation delay of the pre-combination logic module 3110, and the lower limit of the number of bits of the first subsection can be determined according to the difference between the clock period and the computation delay of the post-combination logic module 3120.
  • the other bits of the inputs 311, 312 can be divided into the second subsection.
  • the input 311, 312 is divided into two subsections, wherein the number of bits in the first subsection is greater than or equal to the number of bits in the second subsection.
  • the other bits of the inputs 311, 312 can be divided into several subsections, such that the number of these subsections is as small as possible and the number of bits of each of them is less than or equal to the determined number of bits of the first subsection.
  • the number of bits of these subsections may be determined to be substantially equal to the number of bits of the first subsection.
  • the number of subsections and their numbers of bits can be further adjusted according to the constant. For example, if the constant includes a plurality of consecutive bits that are all zeros, a subsection of the input may be defined to correspond to those consecutive bits, regardless of whether the number of bits in that subsection exceeds the determined number of bits of the first subsection.
  • the present disclosure achieves an increase in the operation speed of the adder with lower cost and lower power consumption.
  • FIG. 4 shows a part of an arithmetic circuit 4000 including an adder 4120 according to the related art.
  • the arithmetic circuit 4000 includes first-level registers 4101 and 4102 , a pre-combination logic module 4110 , an adder 4120 , a second-level register 4200 , a post-combination logic module 4210 , and a third-level register 4300 .
  • the first-level registers 4101 and 4102 are coupled to the second-level register 4200 via the pre-combination logic module 4110 and the adder 4120 .
  • the second level register 4200 is coupled to the third level register 4300 via the post-combination logic module 4210 .
  • the first-level registers 4101 and 4102, the second-level register 4200, and the third-level register 4300 in the arithmetic circuit 4000 shown in FIG. 4 correspond, respectively, to the previous-stage registers 3101 and 3102, the intermediate register group 330, and the subsequent-stage register 3200 in the arithmetic circuit 3000 shown in FIG. 3.
  • the pre-combination logic module 4110 and the post-combination logic module 4210 in the operation circuit 4000 correspond to the pre-combination logic module 3110 and the post-combination logic module 3120 in the operation circuit 3000, respectively.
  • the adder 4120 in the related-art arithmetic circuit 4000 shown in FIG. 4 is coupled between the first-level registers 4101, 4102 and the second-level register 4200, whereas the adder 300 in the arithmetic circuit 3000 of the present disclosure shown in FIG. 3 is coupled between the previous-stage registers 3101, 3102 and the subsequent-stage register 3200, spanning the intermediate register group 330.
  • the adder 4120 can only use the clock cycle between the first-level registers 4101 and 4102 and the second-level register 4200 to perform operations together with the pre-combination logic module 4110.
  • the adder 300 can, together with the pre-combination logic module 3110 and the post-combination logic module 3120, use two clock cycles to perform operations: the cycle between the previous-stage registers 3101, 3102 and the intermediate register group 330, and the cycle between the intermediate register group 330 and the subsequent-stage register 3200.
  • the configuration of the adder 300 can be appropriately adjusted according to the configuration of the pre-combination logic module 3110 and the post-combination logic module 3120, so that the time of the two clock cycles can be used more fully and flexibly to complete the addition operation.
  • the operations performed by the first-stage addition module group 320 and the second-stage addition module 350 in the adder 300 shown in FIG. 3 are substantially equivalent to those performed by the adder 4120 shown in FIG. 4.
  • the adder 4120 also has modules or units equivalent to or corresponding to the first-stage addition module group 320 and the second-stage addition module 350 in the adder 300 . Therefore, compared with the related art, the configuration of the first-stage addition module group 320 and the second-stage addition module 350 in the adder 300 does not introduce additional cost.
  • each carry register in the carry register group 340 is implemented by a 1-bit register, which has a low manufacturing cost.
  • the additional cost of implementing the adder of the present disclosure is substantially only the manufacturing cost of several 1-bit registers.
  • the adder proposed by the present disclosure makes use of the clock cycle of an adjacent stage to complete part of the operation, thereby effectively improving, at a lower cost, the operation speed of the adder and of the arithmetic circuit including the adder.
  • a chip may include an arithmetic circuit as described above, and the chip may also be included in a computing device.
  • the word "exemplary” means “serving as an example, instance, or illustration” rather than as a “model” to be exactly reproduced. Any implementation illustratively described herein is not necessarily to be construed as preferred or advantageous over other implementations. Furthermore, the present disclosure is not to be bound by any expressed or implied theory presented in the preceding technical field, background, brief summary or detailed description.
  • the word “substantially” is meant to encompass any minor variation due to design or manufacturing imperfections, tolerances of devices or elements, environmental influences, and/or other factors.
  • the word “substantially” also allows for differences from a perfect or ideal situation due to parasitics, noise, and other practical considerations that may exist in an actual implementation.
  • connection means that one element/node/feature is electrically, mechanically, logically or otherwise directly connected to another element/node/feature (or direct communication).
  • coupled means that one element/node/feature can be mechanically, electrically, logically or otherwise linked, directly or indirectly, with another element/node/feature to allow interaction, even though the two features may not be directly connected. That is, “coupled” is intended to encompass both direct and indirect connections of elements or other features, including connections utilizing one or more intervening elements.
  • first,” “second,” and the like may also be used herein for reference purposes only, and are thus not intended to be limiting.
  • the terms “first,” “second,” and other such numerical terms referring to structures or elements do not imply a sequence or order unless the context clearly dictates otherwise.
  • “providing” is used broadly to encompass all ways of obtaining an object, thus “providing something” includes, but is not limited to, “purchasing,” “preparing/manufacturing,” “arranging/setting up,” “installing/assembling,” and/or “ordering” the object, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Signal Processing (AREA)
  • Mathematical Optimization (AREA)
  • Complex Calculations (AREA)
  • Logic Circuits (AREA)

Abstract

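An adder (100), an arithmetic circuit (3000), a chip, and a computing device. The adder (100) calculates the sum of two input numbers and has two inputs (111, 112) respectively representing the two numbers. Each input is divided, in correspondence with the other, into a plurality of subsections that represent partial bits of the input in order from the low-order bits to the high-order bits. The adder (100) includes: a plurality of first-stage addition modules (120-1, 120-2, …, 120-N), each for summing corresponding subsections of the two inputs (111, 112); a plurality of intermediate registers (130-1, 130-2, …, 130-N), each coupled to a corresponding first-stage addition module and storing the sum of the corresponding subsections of the two inputs (111, 112); one or more carry registers (140-1, 140-2, …, 140-N), each coupled to a corresponding first-stage addition module and storing the carry of the corresponding subsections of the two inputs (111, 112); and a second-stage addition module (150), coupled to the plurality of intermediate registers (130-1, 130-2, …, 130-N) and the one or more carry registers (140-1, 140-2, …, 140-N), for summing the sum from each intermediate register with the carry from the corresponding preceding carry register.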

Description

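Adder, arithmetic circuit, chip, and computing device
Cross-Reference to Related Applications
This application is based on and claims priority to CN application No. 202010711949.8, filed on July 22, 2020, the disclosure of which is incorporated herein in its entirety.
Technical Field
The present disclosure relates generally to digital circuits, and in particular to an adder, an arithmetic circuit including the adder, a chip, and a computing device.
Background
An adder, which performs addition, is an important component of many arithmetic circuits. In the related art, when the operation speed of an adder needs to be increased, the adder is usually implemented with high-speed devices.
Summary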
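According to one aspect of the present disclosure, there is provided an adder for calculating the sum of two input numbers. The adder has two inputs respectively representing the two numbers, wherein each input is divided, in correspondence with the other, into a plurality of subsections that represent partial bits of the input in order from the low-order bits to the high-order bits. The adder includes: a plurality of first-stage addition modules, each for summing corresponding subsections of the two inputs; a plurality of intermediate registers, each coupled to a corresponding first-stage addition module and storing the sum of the corresponding subsections of the two inputs; one or more carry registers, each coupled to a corresponding first-stage addition module and storing the carry of the corresponding subsections of the two inputs; and a second-stage addition module, coupled to the plurality of intermediate registers and the one or more carry registers, for summing the sum from each intermediate register with the carry from the corresponding preceding carry register.
According to another aspect of the present disclosure, there is provided an adder for calculating the sum of one input number and a predetermined constant. The adder has one input representing the number, the input being divided into a plurality of subsections that represent partial bits of the input in order from the low-order bits to the high-order bits. The adder includes: one or more first-stage addition modules, each for summing a corresponding subsection of the input with the corresponding bits of the constant; a plurality of intermediate registers, each coupled to a corresponding first-stage addition module and storing the sum of the corresponding subsection of the input and the corresponding bits of the constant; one or more carry registers, each coupled to a corresponding first-stage addition module and storing the carry of the corresponding subsection of the input and the corresponding bits of the constant; and a second-stage addition module, coupled to the plurality of intermediate registers and the one or more carry registers, for summing the sum from each intermediate register with the carry from the corresponding preceding carry register.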
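According to another aspect of the present disclosure, there is provided an arithmetic circuit including the adder described above, and at least one of a pre-combination logic module coupled to the input of the adder and a post-combination logic module coupled to the output of the adder.
According to another aspect of the present disclosure, there is provided a chip including the arithmetic circuit described above.
According to yet another aspect of the present disclosure, there is provided a computing device including the chip described above.
Other features and advantages of the present disclosure will become clearer from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.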
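Brief Description of the Drawings
The accompanying drawings, which form a part of the specification, illustrate embodiments of the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
The present disclosure can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram of an adder for calculating the sum of two input numbers according to one or more exemplary embodiments of the present disclosure.
FIG. 2 is a schematic diagram of an adder for calculating the sum of one input number and a predetermined constant according to one or more exemplary embodiments of the present disclosure.
FIG. 3 shows a portion of an arithmetic circuit including an adder according to one or more exemplary embodiments of the present disclosure.
FIG. 4 shows a portion of an arithmetic circuit including an adder according to the related art.
Note that in the embodiments described below, the same reference numeral is sometimes used in different drawings to denote the same part or parts having the same function, and repeated description thereof is omitted. In some cases, similar numerals and letters denote similar items, so once an item is defined in one drawing, it need not be discussed further in subsequent drawings.
For ease of understanding, the position, size, range, and the like of each structure shown in the drawings may not represent the actual position, size, range, and the like. The present disclosure is therefore not limited to the positions, sizes, ranges, and the like disclosed in the drawings.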
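Detailed Description
Various exemplary embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present disclosure.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present disclosure or its application or use. That is, the structures and methods herein are shown by way of example to illustrate different embodiments of the structures and methods of the present disclosure. Those skilled in the art will understand, however, that they merely illustrate exemplary ways in which the present disclosure may be practiced, not exhaustive ways. In addition, the drawings are not necessarily drawn to scale, and some features may be exaggerated to show details of particular components.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the granted specification.
In all examples shown and discussed herein, any specific value should be construed as merely exemplary and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.
In the related art, when the operation speed of an adder needs to be increased, the adder is usually implemented with high-speed devices. However, high-speed devices occupy a larger area and consume more power, which increases the area and power consumption of the adder and of the arithmetic circuit including the adder, and significantly increases the manufacturing cost and power consumption of the chip. It is therefore desirable to increase the operation speed of the adder at lower manufacturing cost and power consumption, and an improved adder is needed.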
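FIG. 1 is a schematic diagram of an adder 100 according to one or more exemplary embodiments of the present disclosure. The adder 100 calculates the sum of two input numbers.
As shown in FIG. 1, the adder 100 has two inputs 111, 112 and two outputs 161, 162. The two inputs 111, 112 respectively represent the two input numbers, and the outputs 161, 162 respectively represent the sum and the carry of the result of adding these two numbers.
Those skilled in the art will understand that the configuration of the inputs and outputs of the adder 100 is not limited to the embodiment shown in FIG. 1. The configuration of the inputs and outputs of the adder 100 may be adjusted as appropriate according to the function of the adder and the needs of the arithmetic circuit, and the configuration of each module of the adder 100 may be adjusted accordingly. For example, in some embodiments, the adder may have only one output, that is, it outputs only the sum of the result and not the carry.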
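As shown in FIG. 1, each input 111, 112 is divided, in correspondence with the other, into N subsections that represent partial bits of the input in order from the low-order bits to the high-order bits. For example, the input 111 is divided from low to high into subsections 111-1, 111-2, …, 111-N, and the input 112 is divided from low to high into subsections 112-1, 112-2, …, 112-N.
Specifically, the first subsections 111-1 and 112-1 respectively represent the lowest bit or bits of the inputs 111 and 112, and 111-1 and 112-1 cover the same bit positions. Correspondingly, the second subsections 111-2 and 112-2 respectively represent one or more bits of the inputs 111 and 112 that are higher than those of 111-1 and 112-1, and 111-2 and 112-2 cover the same bit positions. By analogy, the N-th subsections 111-N and 112-N respectively represent the highest bit or bits of the inputs 111 and 112, and 111-N and 112-N cover the same bit positions.
Here, N is an integer greater than or equal to 2. That is, each input 111, 112 has at least two subsections.
In some embodiments, as shown in FIG. 1, the output 161, which represents the sum of the inputs 111 and 112, is divided into two subsections 161-1, 161-2. The subsection 161-1 corresponds to the first subsections 111-1 and 112-1 of the inputs 111 and 112 and represents the lowest bit or bits of the output 161; the subsection 161-2 corresponds to the other subsections of the inputs 111 and 112 and represents the remaining bit or bits of the output 161.
Those skilled in the art will understand that dividing the inputs and outputs into subsections herein is only for convenience in describing the different coupling relationships of the subsections, and does not mean or imply that the subsections are necessarily physically separated or isolated. In particular, dividing the inputs and outputs into subsections with different coupling relationships does not require introducing additional components into the digital circuit or incurring additional cost.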
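As shown in FIG. 1, the adder 100 includes a first-stage addition module group 120, an intermediate register group 130, a carry register group 140, and a second-stage addition module 150.
The first-stage addition module group 120 is coupled to the inputs 111, 112 and includes a plurality of first-stage addition modules 120-1, 120-2, …, 120-N. Each first-stage addition module 120-1, 120-2, …, 120-N sums the corresponding subsections of the two inputs 111, 112.
For example, the first first-stage addition module 120-1 is coupled to the first subsections 111-1 and 112-1 of the two inputs 111, 112 and sums 111-1 and 112-1.
The number of first-stage addition modules equals the number of subsections of the input, and the configuration of each first-stage addition module may be determined according to the number of bits of the corresponding subsection of the input.
The output of the first-stage addition module group 120 is coupled to the intermediate register group 130 and the carry register group 140, and outputs the sums and the carries of the results to the intermediate register group 130 and the carry register group 140, respectively.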
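The intermediate register group 130 includes a plurality of intermediate registers 130-1, 130-2, …, 130-N. Each intermediate register 130-1, 130-2, …, 130-N is coupled to the corresponding first-stage addition module 120-1, 120-2, …, 120-N and stores the sum of the result of adding the corresponding subsections of the two inputs 111, 112.
For example, the first intermediate register 130-1 is coupled to the first first-stage addition module 120-1 and stores the sum of the result of adding 111-1 and 112-1. That is, the first intermediate register 130-1 corresponds to the first subsections 111-1 and 112-1 of the two inputs 111, 112 and stores the sum of the result of adding the lowest bit or bits of the two inputs 111, 112 represented by 111-1 and 112-1.
The number of intermediate registers equals the number of subsections of the input, and the configuration of each intermediate register may be determined according to the number of bits of the corresponding subsection of the input.
The carry register group 140 includes a plurality of carry registers 140-1, 140-2, …, 140-N. Each carry register 140-1, 140-2, …, 140-N is coupled to the corresponding first-stage addition module 120-1, 120-2, …, 120-N and stores the carry of the result of adding the corresponding subsections of the two inputs 111, 112.
For example, the first carry register 140-1 is coupled to the first first-stage addition module 120-1 and stores the carry of the result of adding 111-1 and 112-1. That is, the first carry register 140-1 stores the carry of the result of adding the lowest bit or bits of the two inputs 111, 112 represented by 111-1 and 112-1.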
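The number of carry registers may be determined according to the number of subsections of the input. In the embodiment shown in FIG. 1, the number of carry registers equals the number of subsections of the input. In other embodiments, the number of carry registers may be one less than the number of subsections of the input. That is, the carry register group 140 may omit the N-th carry register 140-N and not store the carry of the result of adding the highest bit or bits of the two inputs 111, 112 represented by 111-N and 112-N.
The number of carry registers may be determined as needed. In embodiments in which the adder 100 needs to output the carry of the result of adding the two inputs 111, 112 (i.e., the adder 100 includes the output 162), the number of carry registers may be set equal to the number of subsections of the input. In embodiments in which the adder 100 does not need to output that carry (i.e., the adder 100 does not include the output 162), the number of carry registers may be set to one less than the number of subsections of the input, which reduces the additional cost of the adder 100. For example, in embodiments in which the two inputs 111, 112 each have two subsections, the adder 100 may include only one carry register.
A carry register only stores a carry, so each carry register can be implemented as a 1-bit register.
The outputs of the intermediate register group 130 and the carry register group 140 are coupled to the second-stage addition module 150 and deliver to it the results (sums and carries) of separately adding the corresponding subsections of the two inputs 111, 112.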
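The second-stage addition module 150 sums the sum from each intermediate register with the carry from the corresponding preceding carry register, thereby obtaining the sum of the two inputs 111, 112.
Specifically, the second-stage addition module 150 may output the sum held in the first intermediate register 130-1, that is, the sum of the result of adding the lowest bit or bits of the two inputs 111, 112 represented by 111-1 and 112-1, as the corresponding lowest bit or bits of the sum of the two inputs 111, 112 (i.e., the subsection 161-1 of the output 161).
Further, the second-stage addition module 150 may add the sum held in the second intermediate register 130-2 (the sum of the result of adding the bits represented by 111-2 and 112-2) to the carry held in the first carry register 140-1 (the carry of the result of adding the lowest bit or bits represented by 111-1 and 112-1). It then outputs the sum of this addition as the corresponding bit or bits of the sum of the two inputs 111, 112, and uses the carry of this addition in further operations within the second-stage addition module 150.
By analogy, the second-stage addition module 150 may add the sum held in the N-th intermediate register 130-N (the sum of the result of adding the highest bit or bits represented by 111-N and 112-N) to the carry held in the (N-1)-th carry register 140-(N-1) (the carry of the result of adding the bits represented by 111-(N-1) and 112-(N-1)), and then output the sum of this addition as the corresponding highest bit or bits of the sum of the two inputs 111, 112.
Further, the second-stage addition module 150 may add the carry of the above addition to the carry held in the N-th carry register 140-N (the carry of the result of adding the highest bit or bits represented by 111-N and 112-N), and output the result as the carry of the sum of the two inputs 111, 112, i.e., the output 162.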
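The data flow described above can be sketched as a small behavioral model. This is a minimal Python sketch, not a hardware description: the function names (`split`, `first_stage`, `second_stage`, `two_stage_add`) are hypothetical, and the carry produced inside the second stage is assumed to propagate into the next-higher subsection together with that subsection's preceding registered carry, which is one reading of the "further operations" mentioned above.

```python
from typing import List, Tuple


def split(value: int, widths: List[int]) -> List[int]:
    """Split `value` into subsections of the given bit widths, lowest bits first."""
    parts = []
    for w in widths:
        parts.append(value & ((1 << w) - 1))
        value >>= w
    return parts


def first_stage(a: int, b: int, widths: List[int]) -> Tuple[List[int], List[int]]:
    """Sum corresponding subsections independently (modules 120-1 ... 120-N)."""
    sums, carries = [], []
    for x, y, w in zip(split(a, widths), split(b, widths), widths):
        t = x + y
        sums.append(t & ((1 << w) - 1))  # value latched in an intermediate register
        carries.append(t >> w)           # value latched in a 1-bit carry register
    return sums, carries


def second_stage(sums: List[int], carries: List[int], widths: List[int]) -> Tuple[int, int]:
    """Combine each registered sum with the carry from the preceding subsection."""
    total = sums[0]      # lowest subsection passes through unchanged
    shift = widths[0]
    carry = carries[0]
    for s, c, w in zip(sums[1:], carries[1:], widths[1:]):
        t = s + carry
        total |= (t & ((1 << w) - 1)) << shift
        shift += w
        # The carry out of this addition and the registered carry of the same
        # subsection are never both 1, so their sum is still a single bit.
        carry = (t >> w) + c
    return total, carry


def two_stage_add(a: int, b: int, widths: List[int]) -> Tuple[int, int]:
    sums, carries = first_stage(a, b, widths)   # first clock cycle
    return second_stage(sums, carries, widths)  # second clock cycle
```

For instance, `two_stage_add(0xFF, 0x01, [4, 4])` models an 8-bit adder with two 4-bit subsections and returns the sum bits together with the carry-out.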
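Those skilled in the art will understand that the processing performed by the second-stage addition module 150 is not limited to that described above. The configuration of the second-stage addition module 150 may be determined according to the function of the adder 100. For example, in embodiments that do not need to output the carry of the result of adding the two inputs 111, 112 (i.e., the adder does not include the output 162), the second-stage addition module 150 may omit the processing for calculating and outputting that carry.
The output of the second-stage addition module 150 is coupled to the outputs 161 and 162, which respectively represent the sum and the carry of the result of adding the two inputs 111, 112.
In some embodiments, the output 161 may be divided into two subsections: a first subsection 161-1, representing the lowest bit or bits, corresponding to 111-1 and 112-1, of the sum of the two inputs 111, 112; and a second subsection 161-2, representing the remaining bit or bits of that sum.
As shown in FIG. 1, in some embodiments, the second-stage addition module 150 may couple the output of the first intermediate register 130-1 directly to the first subsection 161-1 of the output 161.
Those skilled in the art will understand that the registers mentioned herein may be edge-triggered registers (e.g., D-type flip-flops) or level-triggered registers (e.g., latches).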
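The computation speed of the adder 100 depends mainly on the computation speed of the first-stage addition module group 120 and the second-stage addition module 150, which in turn depends on the number of subsections of the two inputs 111, 112 and their numbers of bits. The number of subsections and their numbers of bits can therefore be chosen appropriately to increase the computation speed of the adder 100.
The overall computation delay of the first-stage addition module group 120 is determined by the computation delay of the slowest of the first-stage addition modules 120-1, 120-2, …, 120-N. Each first-stage addition module sums the corresponding subsections of the two inputs 111, 112, and the more bits that subsection has, the longer the computation delay of the corresponding first-stage addition module.
Therefore, the overall computation delay of the first-stage addition module group 120 depends on the maximum number of bits among the subsections of the inputs 111, 112. The larger this maximum, the longer the overall computation delay of the first-stage addition module group 120.
The second-stage addition module 150 sums the sums from the intermediate registers with the carries from the corresponding preceding carry registers. The output of the first intermediate register 130-1 already represents the corresponding lowest bit or bits of the final sum, so the second-stage addition module 150 need not perform additional processing on it. In some embodiments, the second-stage addition module 150 may still process the output of the first intermediate register 130-1 as needed, but such processing takes far less time than the summation described above that the module must perform on the outputs of the other intermediate registers and the carry registers. In other words, the computation delay of the second-stage addition module 150 is determined by the computation delay of that summation.
Therefore, the computation delay of the second-stage addition module 150 may depend on the number of subsections of the inputs 111, 112 other than the first subsections 111-1, 112-1, and on the total number of bits of those subsections. The more such subsections there are, or the larger their total number of bits, the longer the computation delay of the second-stage addition module 150. In other words, the fewer subsections the inputs 111, 112 have, or the more bits the first subsections 111-1, 112-1 have, the shorter the computation delay of the second-stage addition module 150.
It is therefore desirable to reduce the maximum number of bits among the subsections of the two inputs 111, 112. At the same time, it is desirable to reduce the number of subsections of the inputs 111, 112 and to increase the number of bits of the first subsection.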
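The trade-off between the two stages can be illustrated with a toy delay model. The sketch below is an assumption for illustration only: it treats every addition as taking a delay proportional to its bit width (as in a ripple-carry adder), with a hypothetical per-bit delay `unit`. The text itself does not specify a delay model.

```python
from typing import List, Tuple


def stage_delays(widths: List[int], unit: float = 1.0) -> Tuple[float, float]:
    """Return (first_stage_delay, second_stage_delay) under a linear per-bit model.

    widths[0] is the lowest subsection. The first stage is limited by its widest
    subsection; the second stage is limited by the total width of all
    subsections above the first one.
    """
    first = unit * max(widths)
    second = unit * sum(widths[1:])
    return first, second


# Three ways to partition a 32-bit input (illustrative values only).
for widths in ([16, 16], [8, 8, 8, 8], [24, 8]):
    print(widths, stage_delays(widths))
```

Under this model, many narrow subsections shorten the first stage but lengthen the second, while a wide first subsection does the opposite, which matches the qualitative trends stated above.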
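In some embodiments, the number of bits of the first subsections 111-1, 112-1 of the inputs 111, 112 is greater than or equal to the number of bits of the other subsections. In other embodiments, the numbers of bits of the subsections of the inputs 111, 112 are substantially equal. This helps reduce the computation delays of the first-stage addition module group 120 and the second-stage addition module 150, which increases the computation speed of the adder 100 and in turn lowers the chip's ratio of power consumption to computing power.
In some embodiments, the inputs 111, 112 have two subsections, and the number of bits of the first subsections 111-1, 112-1 is greater than or equal to half the number of bits of the inputs 111, 112. This helps increase the computation speed of the adder 100 while reducing the additional cost.
Note that the expression "substantially equal" herein means that the two are approximately equal, but not necessarily strictly and exactly equal. Those skilled in the art will understand that this is consistent with technical principles and engineering practice. For example, the two may differ by about 5% or 10%. In some contexts, the two may differ by about 15% or 20%.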
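FIG. 2 is a schematic diagram of an adder 200 according to one or more exemplary embodiments of the present disclosure. The adder 200 calculates the sum of one input number and a predetermined constant.
As shown in FIG. 2, the adder 200 has one input 210 and two outputs 261, 262. The input 210 represents the input number, and the outputs 261, 262 respectively represent the sum and the carry of the result of adding this number and the predetermined constant.
Similarly, in some embodiments, the adder 200 may have only one output 261, that is, it outputs only the sum of the result and not the carry.
The configuration of the adder 200 is similar to that of the adder 100 and may be adjusted as appropriate according to the predetermined constant.
As shown in FIG. 2, the input 210 is divided into N subsections that represent partial bits of the input in order from the low-order bits to the high-order bits. That is, the input 210 is divided from low to high into subsections 210-1, 210-2, …, 210-N. Here, N is an integer greater than or equal to 2, so the input 210 has at least two subsections.
In some embodiments, as shown in FIG. 2, the output 261, which represents the sum of the input 210 and the constant, is divided into two subsections 261-1, 261-2. The subsection 261-1 corresponds to the first subsection 210-1 of the input 210 and represents the lowest bit or bits of the output 261; the subsection 261-2 corresponds to the other subsections 210-2, …, 210-N of the input 210 and represents the remaining bit or bits of the output 261.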
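As shown in FIG. 2, the adder 200 includes a first-stage addition module group 220, an intermediate register group 230, a carry register group 240, and a second-stage addition module 250.
The first-stage addition module group 220 is coupled to the input 210 and includes a plurality of first-stage addition modules 220-1, 220-2, …, 220-N. Each first-stage addition module 220-1, 220-2, …, 220-N sums the corresponding subsection of the input 210 with the corresponding bits of the constant.
The configuration of each first-stage addition module 220-1, 220-2, …, 220-N may be related to the corresponding bits of the constant. In some embodiments, the number and configuration of the first-stage addition modules 220-1, 220-2, …, 220-N may be determined or adjusted based at least in part on the constant. For example, for any subsection of the input 210, if the corresponding bits of the predetermined constant are known to be all zeros, that subsection need not be added to the corresponding bits of the constant, so the first-stage addition module group 220 may omit the corresponding first-stage addition module. This helps reduce the manufacturing cost of the adder 200.
For example, in some embodiments, if the constant is small (i.e., its upper bits are all zeros), the first-stage addition module group 220 may include only one first-stage addition module, namely the first first-stage addition module 220-1 corresponding to the first subsection 210-1 of the input 210. In particular, when the constant is 1, the adder 200 is an increment-by-1 adder and may include only one first-stage addition module.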
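The output of the first-stage addition module group 220 is coupled to the intermediate register group 230 and the carry register group 240, and outputs the results (sums and carries) to the intermediate register group 230 and the carry register group 240, respectively.
The intermediate register group 230 includes a plurality of intermediate registers 230-1, 230-2, …, 230-N. As shown in FIG. 2, each intermediate register 230-1, 230-2, …, 230-N is coupled to the corresponding first-stage addition module 220-1, 220-2, …, 220-N and stores the sum of the result of adding the corresponding subsection of the input 210 and the corresponding bits of the constant.
The carry register group 240 includes a plurality of carry registers 240-1, 240-2, …, 240-N. Each carry register 240-1, 240-2, …, 240-N is coupled to the corresponding first-stage addition module 220-1, 220-2, …, 220-N and stores the carry of the result of adding the corresponding subsection of the input 210 and the corresponding bits of the constant.
The configuration of the intermediate register group 230 and the carry register group 240 may be adjusted as appropriate according to the configuration of the first-stage addition module group 220. For example, when the first-stage addition module group 220 does not include a first-stage addition module corresponding to some subsection of the input 210, the corresponding intermediate register in the intermediate register group 230 may be coupled directly to that subsection of the input 210, and the carry register group 240 may omit the corresponding carry register.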
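The effect of all-zero constant bits can be sketched in a behavioral model. In this minimal Python sketch, the function name `add_constant` and the returned module count are hypothetical additions for illustration, and the constant is assumed to fit within the total width given by `widths`. A subsection whose constant bits are all zero bypasses the first stage entirely and contributes no registered carry.

```python
from typing import List, Tuple


def add_constant(x: int, const: int, widths: List[int]) -> Tuple[int, int, int]:
    """Return (sum, carry_out, number_of_first_stage_modules) for x + const."""
    sums, carries, modules = [], [], 0
    for w in widths:
        mask = (1 << w) - 1
        xs, cs = x & mask, const & mask
        if cs == 0:
            # Constant bits are all zero: the intermediate register takes the
            # input subsection directly; no adder and no carry register needed.
            sums.append(xs)
            carries.append(0)
        else:
            t = xs + cs
            sums.append(t & mask)
            carries.append(t >> w)
            modules += 1
        x >>= w
        const >>= w

    # Second stage: add each registered sum to the carry from the preceding
    # subsection, propagating any carry produced along the way.
    total, shift, carry = sums[0], widths[0], carries[0]
    for s, c, w in zip(sums[1:], carries[1:], widths[1:]):
        t = s + carry
        total |= (t & ((1 << w) - 1)) << shift
        shift += w
        carry = (t >> w) + c
    return total, carry, modules
```

With `const = 1` and `widths = [8, 24]`, the model uses a single first-stage module, corresponding to the increment-by-1 case mentioned in the text.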
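In the embodiment shown in FIG. 2, the number of carry registers equals the number of subsections of the input. In other embodiments, the number of carry registers may be one less than the number of subsections of the input, that is, the N-th carry register 240-N is absent.
The outputs of the intermediate register group 230 and the carry register group 240 are coupled to the second-stage addition module 250 and deliver to it the results (sums and carries) of separately adding each subsection of the input 210 and the corresponding bits of the constant.
The second-stage addition module 250 sums the sum from each intermediate register with the carry from the corresponding preceding carry register, that is, it combines the results (sums and carries) of separately adding each subsection of the input 210 and the corresponding bits of the constant, thereby obtaining the sum of the input 210 and the constant.
Specifically, the second-stage addition module 250 may output the sum held in the first intermediate register 230-1, that is, the sum of the lowest bit or bits of the input 210 represented by 210-1 and the corresponding lowest bit or bits of the constant, as the corresponding lowest bit or bits of the sum of the input 210 and the constant (i.e., the subsection 261-1 of the output 261).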
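Further, the second-stage addition module 250 may add the sum held in the second intermediate register 230-2 (the sum of the bits of the input 210 represented by 210-2 and the corresponding bits of the constant) to the carry held in the first carry register 240-1. It then outputs the sum of this addition as the corresponding bit or bits of the sum of the input 210 and the constant, and uses the carry of this addition in further operations within the second-stage addition module 250.
By analogy, the second-stage addition module 250 may add the sum held in the N-th intermediate register 230-N (the sum of the highest bit or bits of the input 210 represented by 210-N and the corresponding highest bit or bits of the constant) to the carry held in the (N-1)-th carry register 240-(N-1), and then output the sum of this addition as the corresponding highest bit or bits of the sum of the input 210 and the constant.
Further, the second-stage addition module 250 may add the carry of the above addition to the carry held in the N-th carry register 240-N, and output the result as the carry of the sum of the input 210 and the constant (i.e., the output 262).
Those skilled in the art will understand that the processing performed by the second-stage addition module 250 is not limited to that described above. The configuration of the second-stage addition module 250 may be determined according to the function of the adder. For example, in embodiments in which the adder 200 does not include the output 262, the configuration of the second-stage addition module 250 and the processing it performs may be adjusted accordingly.
The output of the second-stage addition module 250 is coupled to the outputs 261 and 262, which respectively represent the sum and the carry of the result of adding the input 210 and the constant.
In some embodiments, the output 261 may be divided into two subsections: a first subsection 261-1, representing the lowest bit or bits, corresponding to 210-1, of the sum of the input 210 and the constant; and a second subsection 261-2, representing the remaining bit or bits of that sum. As shown in FIG. 2, in some embodiments, the second-stage addition module 250 may couple the output of the first intermediate register 230-1 directly to the first subsection 261-1 of the output 261.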
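The computation speed of the adder 200 depends mainly on the computation speed of the first-stage addition module group 220 and the second-stage addition module 250.
Regarding the second-stage addition module 250, similarly, the fewer subsections the input 210 has, or the more bits the first subsection 210-1 has, the shorter the computation delay of the second-stage addition module 250. It is therefore desirable to reduce the number of subsections of the input 210 and to increase the number of bits of the first subsection 210-1.
On the other hand, unlike in the adder 100, in the adder 200 the configuration of the first-stage addition modules 220-1, 220-2, …, 220-N may be related to the constant. As described above, if the bits of the constant corresponding to some subsection of the input 210 are all zeros, the first-stage addition module group 220 may omit the corresponding first-stage addition module. The computation delay of the first-stage addition module group 220 is therefore independent of the number of bits of such subsections and depends only on the number of bits of the other subsections of the input 210 (i.e., the subsections for which the corresponding bits of the constant are not all zeros). Specifically, it is desirable to appropriately increase the number of bits of the subsections corresponding to all-zero bits of the constant, and to reduce the maximum number of bits among the other subsections.
In some embodiments, the number of subsections of the input 210 and the number of bits of each subsection are determined based at least in part on the constant. For example, if the constant is small (i.e., its upper bits are all zeros, e.g., the constant is 1), the input 210 may have two subsections such that the bits of the constant corresponding to the second subsection are all zeros. As another example, if the constant includes a plurality of consecutive bits that are all zeros, a subsection of the input 210 may be defined to correspond to at least a portion of those consecutive bits.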
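For the small-constant case, the two-subsection split can be sketched as follows. This is a minimal Python sketch under stated assumptions: `split_for_constant` and its `min_first` parameter are hypothetical, and the rule of placing the boundary just above the highest set bit of the constant is one possible choice consistent with the text, not the only one.

```python
from typing import List


def split_for_constant(const: int, total_bits: int, min_first: int = 1) -> List[int]:
    """Return [first_width, second_width] so that the constant's bits in the
    second (upper) subsection are all zero."""
    # int.bit_length() is the number of low-order bits needed to hold `const`.
    first = max(const.bit_length(), min_first)
    if first >= total_bits:
        raise ValueError("constant is too wide for a two-subsection split")
    return [first, total_bits - first]
```

A larger `min_first` widens the first subsection, which the surrounding text associates with a shorter second-stage delay; the upper subsection can be as wide as needed because it requires no first-stage addition module.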
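FIG. 3 shows a portion of an arithmetic circuit 3000 including an adder 300 according to one or more exemplary embodiments of the present disclosure.
By way of example only, in FIG. 3 the adder 300 is shown as the adder of FIG. 1, which calculates the sum of two input numbers. Those skilled in the art will understand, however, that the adder 300 can be replaced with the adder of FIG. 2, which calculates the sum of one input number and a predetermined constant, with only appropriate adjustments to the arithmetic circuit 3000.
The arithmetic circuit 3000 includes previous-stage registers 3101, 3102, the adder 300, and a subsequent-stage register 3200. In addition, in some embodiments, the arithmetic circuit 3000 may further include a pre-combination logic module 3110 and a post-combination logic module 3120.
In some embodiments, the previous-stage registers 3101, 3102 may be coupled directly to the adder 300. In some embodiments, the previous-stage registers 3101, 3102 may be coupled to the adder 300 via the pre-combination logic module 3110. In some embodiments, the adder 300 may be coupled directly to the subsequent-stage register 3200. In some embodiments, the adder 300 may be coupled to the subsequent-stage register 3200 via the post-combination logic module 3120.
Those skilled in the art will understand that the number and configuration of the previous-stage registers 3101, 3102 and the subsequent-stage register 3200 are not limited to the embodiment in FIG. 3. For example, in some embodiments, the arithmetic circuit 3000 may include only one previous-stage register 3101, which provides the two inputs 311, 312 to the adder 300 via the pre-combination logic module 3110.
By way of example only, FIG. 3 shows an embodiment in which the arithmetic circuit 3000 includes the pre-combination logic module 3110 and the post-combination logic module 3120. Those skilled in the art will understand that the following description also applies to embodiments in which the arithmetic circuit 3000 does not include the pre-combination logic module 3110 or the post-combination logic module 3120, with only appropriate adjustments.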
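The adder 300 is similar in configuration to the adder 100 shown in FIG. 1.
The adder 300 has two inputs 311, 312 respectively representing the two input numbers, and two outputs 361, 362 respectively representing the sum and the carry of the result of adding these two numbers. The output 361 has two subsections 361-1, 361-2.
The adder 300 includes a first-stage addition module group 320, an intermediate register group 330 including a plurality of intermediate registers 330-1, 330-2, …, 330-N, a carry register group 340, and a second-stage addition module 350. In some embodiments, the second-stage addition module 350 couples the output of the first intermediate register 330-1 directly to the first subsection 361-1 of the output 361.
In the arithmetic circuit 3000, the clocks used for the previous-stage registers 3101 and 3102, the intermediate register group 330, and the subsequent-stage register 3200 have the same frequency. It is therefore desirable that the operations of the pre-combination logic module 3110 and the first-stage addition module group 320 complete within one clock cycle, and that the operations of the second-stage addition module 350 and the post-combination logic module 3120 complete within one clock cycle.
Accordingly, it is desirable that the computation delay of the first-stage addition module group 320 be less than the difference between the clock period and the computation delay of the pre-combination logic module 3110, and that the computation delay of the second-stage addition module 350 be less than the difference between the clock period and the computation delay of the post-combination logic module 3120.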
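Regarding the first-stage addition module group 320, as described above, in the adder 300 the larger the maximum number of bits among the subsections of the inputs 311, 312, the longer the computation delay of the first-stage addition module group 320. The upper limit of that maximum number of bits can therefore be determined based at least in part on the difference between the clock period and the computation delay of the pre-combination logic module 3110.
In some embodiments, the maximum number of bits among the subsections of the inputs 311, 312 of the adder 300 may be determined based at least in part on the difference between the clock period and the computation delay of the pre-combination logic module 3110. Specifically, the maximum number of bits may be chosen such that the computation delay of the first-stage addition module group 320 is less than the difference between the clock period and the computation delay of the pre-combination logic module 3110. In some embodiments, the maximum number of bits may be chosen such that the computation delay of the first-stage addition module group 320 is substantially equal to that difference.
In addition, in embodiments in which the arithmetic circuit 3000 does not include the pre-combination logic module 3110, in some examples the maximum number of bits among the subsections of the inputs 311, 312 of the adder 300 is determined based at least in part on the clock period.
On the other hand, when the adder 300 is the adder of FIG. 2, which calculates the sum of one input number and a predetermined constant, the computation delay of the first-stage addition module group 320 is also related to the constant. As described above, the numbers of bits of the subsections can be adjusted according to the constant.
Therefore, in some embodiments, the maximum number of bits among the subsections of the input of the adder 300 may first be determined according to the difference between the clock period and the computation delay of the pre-combination logic module 3110, and the numbers of bits of the subsections may then be adjusted according to the constant. For example, if the constant includes a plurality of consecutive bits that are all zeros, a subsection of the input may be defined to correspond to those consecutive bits, regardless of whether the number of bits of that subsection exceeds the determined maximum. In some embodiments, the numbers of bits of the subsections of the input may be adjusted such that the maximum number of bits among the subsections that do not correspond to all-zero bits of the constant is substantially equal to the determined maximum.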
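Regarding the second-stage addition module 350, as described above, in the adder 300 the more subsections the inputs 311, 312 have, or the fewer bits the first subsection has, the longer the computation delay of the second-stage addition module 350. The lower limit of the number of bits of the first subsection of the inputs 311, 312 of the adder 300 can therefore be determined based at least in part on the difference between the clock period and the computation delay of the post-combination logic module 3120.
In some embodiments, the number of bits of the first subsection of the inputs 311, 312 of the adder 300 is determined based at least in part on the difference between the clock period and the computation delay of the post-combination logic module 3120. Specifically, the number of bits of the first subsection may be chosen such that the computation delay of the second-stage addition module 350 is less than the difference between the clock period and the computation delay of the post-combination logic module 3120. In some embodiments, the number of bits of the first subsection may be chosen such that the computation delay of the second-stage addition module 350 is substantially equal to that difference.
In addition, in embodiments in which the arithmetic circuit 3000 does not include the post-combination logic module 3120, in some examples the number of bits of the first subsection of the inputs 311, 312 of the adder 300 is determined based at least in part on the clock period.
On the other hand, as mentioned above, in some embodiments the number of bits of the first subsection of the inputs 311, 312 is set to be greater than or equal to the number of bits of the other subsections. In some embodiments, the numbers of bits of the subsections of the inputs 311, 312 are set to be substantially equal.
In some embodiments, the strategies described above may be combined to determine the number of subsections of the inputs 311, 312 of the adder 300 and their numbers of bits.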
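For example, in some embodiments, the number of bits of the first subsection of the inputs 311, 312 of the adder 300 may first be determined according to the difference between the clock period and the computation delay of the pre-combination logic module 3110 and the difference between the clock period and the computation delay of the post-combination logic module 3120. For example, the upper limit of the number of bits of the first subsection may be determined according to the difference between the clock period and the computation delay of the pre-combination logic module 3110, and the lower limit may be determined according to the difference between the clock period and the computation delay of the post-combination logic module 3120.
Then, if the determined number of bits of the first subsection is greater than or equal to half the number of bits of the inputs 311, 312, the other bits of the inputs 311, 312 may be assigned to a second subsection. The inputs 311, 312 are thus divided into two subsections, with the number of bits of the first subsection greater than or equal to that of the second subsection.
If the determined number of bits of the first subsection is less than half the number of bits of the inputs 311, 312, the other bits of the inputs 311, 312 may be divided into several subsections such that the number of these subsections is as small as possible and the number of bits of each of them is less than or equal to the determined number of bits of the first subsection. For example, the numbers of bits of these subsections may be set substantially equal to the number of bits of the first subsection.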
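The combined strategy can be sketched as a small sizing routine. This is a minimal Python sketch under assumptions that are not in the text: delays are integers in arbitrary time units, every addition costs a hypothetical `t_bit` per bit, the second-stage delay is modeled as `t_bit` times the number of bits above the first subsection, and the first subsection is simply set to its upper limit. The function name `choose_widths` and all parameters are hypothetical.

```python
from typing import List


def choose_widths(total_bits: int, period: int, t_pre: int, t_post: int, t_bit: int) -> List[int]:
    """Pick subsection widths (lowest first) that fit both timing budgets."""
    # Upper limit: the widest first-stage module must fit in period - t_pre.
    upper = (period - t_pre) // t_bit
    # Lower limit: the bits above the first subsection must fit in period - t_post.
    lower = max(1, total_bits - (period - t_post) // t_bit)
    first = min(upper, total_bits - 1)  # keep at least two subsections
    if first < lower:
        raise ValueError("no subsection width meets both timing budgets")
    remaining = total_bits - first
    # Use as few further subsections as possible, each no wider than `first`.
    count = -(-remaining // first)  # ceiling division
    base, extra = divmod(remaining, count)
    return [first] + [base + 1] * extra + [base] * (count - extra)
```

When the chosen first width is at least half of `total_bits`, `count` is 1 and the result is the two-subsection split described above; otherwise the remaining bits are spread as evenly as possible over the smallest number of subsections.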
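When the adder 300 is the adder 200 of FIG. 2, which calculates the sum of one input number and a predetermined constant, the number of subsections and their numbers of bits may be further adjusted according to the constant. For example, if the constant includes a plurality of consecutive bits that are all zeros, a subsection of the input may be defined to correspond to those consecutive bits, regardless of whether the number of bits of that subsection exceeds the determined number of bits of the first subsection.
Those skilled in the art will understand that the way of determining the number of subsections of the adder's input and their numbers of bits is not limited to the specific embodiments described above. The strategies described herein may be used independently or in combination, taking into account factors such as the function, configuration, area, cost, speed, and power consumption of the adder and the arithmetic circuit, to determine the number of subsections of the adder's input and their numbers of bits.
Compared with the related art, which requires high-speed devices, the present disclosure increases the operation speed of the adder at lower cost and with lower power consumption.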
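For comparison, FIG. 4 shows a portion of an arithmetic circuit 4000 including an adder 4120 according to the related art.
The arithmetic circuit 4000 includes first-level registers 4101, 4102, a pre-combination logic module 4110, the adder 4120, a second-level register 4200, a post-combination logic module 4210, and a third-level register 4300.
The first-level registers 4101, 4102 are coupled to the second-level register 4200 via the pre-combination logic module 4110 and the adder 4120. The second-level register 4200 is coupled to the third-level register 4300 via the post-combination logic module 4210.
As can be seen, the first-level registers 4101 and 4102, the second-level register 4200, and the third-level register 4300 in the arithmetic circuit 4000 shown in FIG. 4 correspond, respectively, to the previous-stage registers 3101 and 3102, the intermediate register group 330, and the subsequent-stage register 3200 in the arithmetic circuit 3000 shown in FIG. 3. Correspondingly, the pre-combination logic module 4110 and the post-combination logic module 4210 in the arithmetic circuit 4000 correspond, respectively, to the pre-combination logic module 3110 and the post-combination logic module 3120 in the arithmetic circuit 3000.
An important difference between the present disclosure and the related art is that the adder 4120 in the related-art arithmetic circuit 4000 shown in FIG. 4 is coupled between the first-level registers 4101, 4102 and the second-level register 4200, whereas the adder 300 in the arithmetic circuit 3000 of the present disclosure shown in FIG. 3 is coupled between the previous-stage registers 3101, 3102 and the subsequent-stage register 3200, spanning the intermediate register group 330.
In the related art shown in FIG. 4, the adder 4120 can only operate, together with the pre-combination logic module 4110, within the clock cycle between the first-level registers 4101, 4102 and the second-level register 4200. In the technical solution of the present disclosure shown in FIG. 3, by contrast, the adder 300 can operate, together with the pre-combination logic module 3110 and the post-combination logic module 3120, over two clock cycles: the cycle between the previous-stage registers 3101, 3102 and the intermediate register group 330, and the cycle between the intermediate register group 330 and the subsequent-stage register 3200. In the technical solution of the present disclosure, the configuration of the adder 300 can be adjusted as appropriate according to the configuration of the pre-combination logic module 3110 and the post-combination logic module 3120, so that the time of the two clock cycles is used more fully and flexibly to complete the addition.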
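The difference in timing budget can be made concrete with a rough comparison. The Python sketch below is illustrative only and rests on assumptions not stated in the text: a linear per-bit adder delay `t_bit`, integer delays in arbitrary units, and negligible register setup and clock-to-output times. The function names are hypothetical.

```python
from typing import List


def min_period_conventional(total_bits: int, t_pre: int, t_post: int, t_bit: int) -> int:
    """FIG. 4 style: the whole adder shares one cycle with the pre-combination logic."""
    return max(t_pre + t_bit * total_bits, t_post)


def min_period_two_stage(widths: List[int], t_pre: int, t_post: int, t_bit: int) -> int:
    """FIG. 3 style: the first stage shares a cycle with the pre-combination logic,
    and the second stage shares the next cycle with the post-combination logic."""
    first_cycle = t_pre + t_bit * max(widths)
    second_cycle = t_bit * sum(widths[1:]) + t_post
    return max(first_cycle, second_cycle)


# Illustrative numbers: 32-bit addition, 25 units per bit.
print(min_period_conventional(32, t_pre=300, t_post=200, t_bit=25))     # 1100
print(min_period_two_stage([16, 16], t_pre=300, t_post=200, t_bit=25))  # 700
```

Under these assumed numbers the split adder allows a shorter clock period because the slack in the cycle occupied by the post-combination logic is put to use.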
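In addition, those skilled in the art will understand that the operations performed by the first-stage addition module group 320 and the second-stage addition module 350 in the adder 300 shown in FIG. 3 are substantially equivalent to the operations performed by the adder 4120 shown in FIG. 4. In other words, the adder 4120 also contains modules or units equivalent or corresponding to the first-stage addition module group 320 and the second-stage addition module 350 in the adder 300. Therefore, compared with the related art, the configuration of the first-stage addition module group 320 and the second-stage addition module 350 in the adder 300 introduces no additional cost.
That is, compared with the related art shown in FIG. 4, the only additional module or unit required to implement the adder 300 in the arithmetic circuit 3000 shown in FIG. 3 is the carry register group 340. As described above, each carry register in the carry register group 340 is implemented as a 1-bit register, whose manufacturing cost is low. In other words, compared with the related art shown in FIG. 4, the additional cost of implementing the adder of the present disclosure is essentially just the manufacturing cost of a few 1-bit registers.
Therefore, compared with the related art, the adder proposed by the present disclosure makes use of the clock cycle of an adjacent stage to complete part of the operation, thereby effectively improving, at a lower cost, the operation speed of the adder and of the arithmetic circuit including the adder.
The adder and arithmetic circuit according to the present disclosure may be implemented in any suitable manner, such as software, hardware, or a combination of software and hardware. In one implementation, a chip may include the arithmetic circuit described above, and the chip may in turn be included in a computing device.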
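The words "front," "rear," "top," "bottom," "above," "below," and the like in the specification and claims, if any, are used for descriptive purposes and not necessarily to describe permanent relative positions. It should be understood that words so used are interchangeable under appropriate circumstances, such that the embodiments of the present disclosure described herein are, for example, capable of operating in orientations other than those illustrated or otherwise described herein.
As used herein, the word "exemplary" means "serving as an example, instance, or illustration," and not as a "model" to be exactly reproduced. Any implementation described herein as exemplary is not necessarily to be construed as preferred or advantageous over other implementations. Furthermore, the present disclosure is not bound by any expressed or implied theory presented in the preceding technical field, background, summary, or detailed description.
As used herein, the word "substantially" is meant to encompass any minor variation due to design or manufacturing imperfections, tolerances of devices or elements, environmental influences, and/or other factors. The word "substantially" also allows for differences from a perfect or ideal situation due to parasitics, noise, and other practical considerations that may exist in an actual implementation.
In addition, the foregoing description may refer to elements, nodes, or features being "connected" or "coupled" together. As used herein, unless expressly stated otherwise, "connected" means that one element/node/feature is electrically, mechanically, logically, or otherwise directly connected to (or directly communicates with) another element/node/feature. Similarly, unless expressly stated otherwise, "coupled" means that one element/node/feature may be mechanically, electrically, logically, or otherwise linked, directly or indirectly, with another element/node/feature to allow interaction, even though the two features may not be directly connected. That is, "coupled" is intended to encompass both direct and indirect links between elements or other features, including connections through one or more intervening elements.
In addition, the terms "first," "second," and the like may be used herein for reference purposes only and are thus not intended to be limiting. For example, the terms "first," "second," and other such numerical terms referring to structures or elements do not imply a sequence or order unless the context clearly indicates otherwise.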
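It should also be understood that the term "comprising/including," when used herein, specifies the presence of the stated features, integers, steps, operations, units, and/or components, but does not preclude the presence or addition of one or more other features, integers, steps, operations, units, and/or components, and/or combinations thereof.
In the present disclosure, the term "providing" is used broadly to encompass all ways of obtaining an object; thus "providing an object" includes, but is not limited to, "purchasing," "preparing/manufacturing," "arranging/setting up," "installing/assembling," and/or "ordering" the object.
Those skilled in the art will appreciate that the boundaries between the operations described above are merely illustrative. Multiple operations may be combined into a single operation, a single operation may be distributed among additional operations, and operations may be performed at least partially overlapping in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in various other embodiments. Other modifications, variations, and alternatives are also possible. The specification and drawings are therefore to be regarded as illustrative rather than restrictive.
Although some specific embodiments of the present disclosure have been described in detail by way of example, those skilled in the art will understand that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. The embodiments disclosed herein may be combined in any manner without departing from the spirit and scope of the present disclosure. Those skilled in the art will also understand that various modifications may be made to the embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.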

Claims (16)

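  1. An adder for calculating the sum of two input numbers, the adder having two inputs respectively representing the two numbers, wherein each input is divided, in correspondence with the other, into a plurality of subsections, the plurality of subsections representing partial bits of the input in order from the low-order bits to the high-order bits, and the adder comprising:
    a plurality of first-stage addition modules, each first-stage addition module being configured to sum corresponding subsections of the two inputs;
    a plurality of intermediate registers, each intermediate register being coupled to a corresponding first-stage addition module and configured to store the sum of the corresponding subsections of the two inputs;
    one or more carry registers, each carry register being coupled to a corresponding first-stage addition module and configured to store the carry of the corresponding subsections of the two inputs; and
    a second-stage addition module, coupled to the plurality of intermediate registers and the one or more carry registers, and configured to sum the sum from each intermediate register with the carry from the corresponding preceding carry register.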
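  2. The adder according to claim 1, wherein the second-stage addition module couples the output of a first intermediate register, among the plurality of intermediate registers, that corresponds to a first subsection of the input directly to the output of the adder, the first subsection representing the lowest bit or bits of the input.
  3. The adder according to claim 1 or 2, wherein the number of bits of the first subsection of the two inputs is greater than or equal to the number of bits of the other subsections.
  4. The adder according to claim 1 or 2, wherein each input has two subsections.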
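  5. An adder for calculating the sum of one input number and a predetermined constant, the adder having one input representing the number, the input being divided into a plurality of subsections, the plurality of subsections representing partial bits of the input in order from the low-order bits to the high-order bits, and the adder comprising:
    one or more first-stage addition modules, each first-stage addition module being configured to sum a corresponding subsection of the input with the corresponding bits of the constant;
    a plurality of intermediate registers, each intermediate register being coupled to a corresponding first-stage addition module and configured to store the sum of the corresponding subsection of the input and the corresponding bits of the constant;
    one or more carry registers, each carry register being coupled to a corresponding first-stage addition module and configured to store the carry of the corresponding subsection of the input and the corresponding bits of the constant; and
    a second-stage addition module, coupled to the plurality of intermediate registers and the one or more carry registers, and configured to sum the sum from each intermediate register with the carry from the corresponding preceding carry register.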
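  6. The adder according to claim 5, wherein the second-stage addition module couples the output of a first intermediate register, among the plurality of intermediate registers, that corresponds to a first subsection of the input directly to the output of the adder, the first subsection representing the lowest bit or bits of the input.
  7. The adder according to claim 5 or 6, wherein the number of subsections of the input and the number of bits of each subsection are determined based at least in part on the constant.
  8. The adder according to claim 5 or 6, wherein the number and configuration of the first-stage addition modules are determined based at least in part on the constant.
  9. The adder according to claim 8, wherein the constant is 1.
  10. The adder according to claim 5 or 6, wherein the input has two subsections.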
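  11. An arithmetic circuit, comprising:
    the adder according to any one of claims 1-10; and
    at least one of a pre-combination logic module coupled to the input of the adder and a post-combination logic module coupled to the output of the adder.
  12. The arithmetic circuit according to claim 11, wherein the number of subsections of the input of the adder and the number of bits of each subsection are determined based at least in part on the period of a clock used for the arithmetic circuit and the computation delay of at least one of the pre-combination logic module and the post-combination logic module.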
  13. 根据权利要求12所述的运算电路,其中,
    如果所述运算电路包括前置组合逻辑模块,则所述加法器的所述输入的所述多个 子部分的位数中的最大位数至少部分地根据用于所述运算电路的时钟的周期与所述前置组合逻辑模块的计算延时之差来确定,
    如果所述运算电路不包括前置组合逻辑模块,则所述加法器的所述输入的所述多个子部分的位数中的最大位数至少部分地根据用于所述运算电路的时钟的周期来确定。
  14. 根据权利要求12所述的运算电路,其中,
    如果所述运算电路包括后置组合逻辑模块,则所述加法器的所述输入的第一个子部分的位数至少部分地根据用于所述运算电路的时钟的周期与所述后置组合逻辑模块的计算延时之差来确定,
    如果所述运算电路不包括后置组合逻辑模块,则所述加法器的所述输入的第一个子部分的位数至少部分地根据用于所述运算电路的时钟的周期来确定。
  15. 一种芯片,所述芯片包括根据权利要求11-14中任一项所述的运算电路。
  16. 一种计算装置,所述计算装置包括根据权利要求15所述的芯片。
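
The two-stage structure recited in claims 1 to 4 can be illustrated with a small behavioral model. The following Python code is only a minimal sketch and not the applicant's circuit: the function names (`split`, `first_stage`, `second_stage`, `two_stage_add`) and the example subsection widths are hypothetical. Two points are assumptions that the claims do not spell out: the result is truncated to the total input width, and, when there are more than two subsections, a carry produced in the second stage is passed on to the next higher subsection.

```python
from typing import List, Sequence, Tuple


def split(value: int, widths: Sequence[int]) -> List[int]:
    """Split value into subsections, lowest-order bits first."""
    parts = []
    for w in widths:
        parts.append(value & ((1 << w) - 1))
        value >>= w
    return parts


def first_stage(a: int, b: int, widths: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Sum corresponding subsections independently (no carry between them).

    Returns the values that would be latched in the intermediate
    registers (sums) and in the carry registers (carries).
    """
    sums, carries = [], []
    for x, y, w in zip(split(a, widths), split(b, widths), widths):
        t = x + y
        sums.append(t & ((1 << w) - 1))  # intermediate register
        carries.append(t >> w)           # carry register (0 or 1)
    return sums, carries


def second_stage(sums: Sequence[int], carries: Sequence[int],
                 widths: Sequence[int]) -> int:
    """Add each stored sum to the carry coming from the preceding subsection.

    The first subsection receives no carry, so its stored sum passes
    straight through to the output (compare claim 2).
    """
    result, shift, carry_in = 0, 0, 0
    for s, c, w in zip(sums, carries, widths):
        t = s + carry_in
        result |= (t & ((1 << w) - 1)) << shift
        # Assumption: a carry generated here also feeds the next subsection.
        carry_in = c | (t >> w)
        shift += w
    return result  # assumption: truncated to sum(widths) bits


def two_stage_add(a: int, b: int, widths: Sequence[int]) -> int:
    """Model both pipeline stages back to back."""
    sums, carries = first_stage(a, b, widths)
    return second_stage(sums, carries, widths)
```

For example, `two_stage_add(a, b, [16, 16])` models a 32-bit adder with two subsections (claim 4). Passing a fixed value such as `1` as the second operand gives a rough model of the constant adder of claims 5 and 9, although a real circuit could simplify the first-stage modules for a known constant.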
PCT/CN2021/104880 2020-07-22 2021-07-07 Adder, arithmetic circuit, chip, and computing device WO2022017179A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010711949.8 2020-07-22
CN202010711949.8A CN111708512A (zh) 2020-07-22 2020-07-22 Adder, arithmetic circuit, chip, and computing device

Publications (1)

Publication Number Publication Date
WO2022017179A1 (zh) 2022-01-27

Family

ID=72547409

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/104880 WO2022017179A1 (zh) 2020-07-22 2021-07-07 Adder, arithmetic circuit, chip, and computing device

Country Status (3)

Country Link
CN (1) CN111708512A (zh)
TW (1) TWI776580B (zh)
WO (1) WO2022017179A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
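CN111708512A (zh) * 2020-07-22 2020-09-25 深圳比特微电子科技有限公司 Adder, arithmetic circuit, chip, and computing device
CN112506471A (zh) * 2020-12-21 2021-03-16 深圳比特微电子科技有限公司 Chip and computing system for digital currency operations
CN113419704A (zh) * 2021-07-23 2021-09-21 北京源启先进微电子有限公司 49-bit adder and implementation method thereof, arithmetic circuit, and chip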

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4853887A (en) * 1986-09-22 1989-08-01 Francis Jutand Binary adder having a fixed operand and parallel-serial binary multiplier incorporating such an adder
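CN1614553A (zh) * 2003-11-06 2005-05-11 International Business Machines Corporation Carry-save adder and system thereof
CN102043604A (zh) * 2010-12-17 2011-05-04 Central South University Parallel feedback carry adder and implementation method thereof
CN111708512A (zh) * 2020-07-22 2020-09-25 深圳比特微电子科技有限公司 Adder, arithmetic circuit, chip, and computing device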

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1069353A (zh) * 1992-04-29 1993-02-24 黄上立 Preset carry adder
TW201134100A (en) * 2010-03-26 2011-10-01 Novatek Microelectronics Corp (Xiu-accumulator) adder circuit and Xiu-accumulator circuit using the same
CN110688086A (zh) * 2019-09-06 2020-01-14 Xi'an Jiaotong University Reconfigurable integer-floating-point adder

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHI SHANG TAN XIN: "HDL Series: Principles and Design of a Carry-Save Adder", CSDN, 12 January 2020 (2020-01-12), pages 1 - 6, XP055889329, Retrieved from the Internet <URL:https://blog.csdn.net/zhouxuanyuye/article/details/103947258> *

Also Published As

Publication number Publication date
TW202143024A (zh) 2021-11-16
TWI776580B (zh) 2022-09-01
CN111708512A (zh) 2020-09-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 21846975; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established
    Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02.06.2023)
122 Ep: pct application non-entry in european phase
    Ref document number: 21846975; Country of ref document: EP; Kind code of ref document: A1