CN112214199B - 256 bit multiplier - Google Patents

Info

Publication number
CN112214199B
Authority
CN
China
Prior art keywords
bit
multiplier
output value
bit multiplier
adder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010953134.0A
Other languages
Chinese (zh)
Other versions
CN112214199A (en
Inventor
李树国
孔添琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Cao Mu Xin Technology Co ltd
Original Assignee
Beijing Cao Mu Xin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Cao Mu Xin Technology Co ltd
Priority to CN202010953134.0A
Publication of CN112214199A
Application granted
Publication of CN112214199B
Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52Multiplying; Dividing
    • G06F7/523Multiplying only
    • G06F7/533Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even
    • G06F7/5332Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even by skipping over strings of zeroes or ones, e.g. using the Booth Algorithm
    • G06F7/5334Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even by using multiple bit scanning, i.e. by decoding groups of successive multiplier bits in order to select an appropriate precalculated multiple of the multiplicand as a partial product
    • G06F7/5336Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even by using multiple bit scanning, i.e. by decoding groups of successive multiplier bits in order to select an appropriate precalculated multiple of the multiplicand as a partial product overlapped, i.e. with successive bitgroups sharing one or more bits being recoded into signed digit representation, e.g. using the Modified Booth Algorithm

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a 256-bit multiplier, which comprises a plurality of 128-bit product generating devices and a shift addition device, wherein the output ends of the 128-bit product generating devices are respectively connected with the input ends of the shift addition device, and each 128-bit product generating device comprises a first 64-bit adder, a second 64-bit adder, a first 64-bit multiplier, a second 64-bit multiplier, a third 64-bit multiplier, a first bitwise negation device, a second bitwise negation device and an addition device; the output ends of the first 64-bit adder and the second 64-bit adder are respectively connected with the input end of the first 64-bit multiplier, the output end of the first 64-bit multiplier is connected with the input end of the adding device, the output end of the second 64-bit multiplier is connected with the input end of the adding device through the first bit-by-bit negating device, the output end of the third 64-bit multiplier is connected with the input end of the adding device through the second bit-by-bit negating device, and the output end of the adding device is connected with the input end of the shift addition device.

Description

256 bit multiplier
Technical Field
The present invention relates to the field of multiplier technologies, and in particular, to a 256-bit multiplier.
Background
Multipliers are among the most basic functional cores in microprocessor design and modern cryptographic systems. Almost all hardware and software systems require a multiplier as an indispensable arithmetic unit. The efficiency of the multiplication implementation fundamentally determines the execution efficiency of an algorithm and the speed of the system, and it directly determines the performance of cryptographic algorithms such as the public-key algorithm RSA. The multiplier is widely used in cryptographic chip design and is also an indispensable arithmetic core in digital systems.
However, 256-bit multipliers in the related art have a high hardware multiplication delay, which reduces the overall operating efficiency of the multiplier.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the art described above. Therefore, an object of the present invention is to provide a 256-bit multiplier, which can shorten the delay of multiplication hardware implementation and can be applied to fast implementation of algorithms in a hardware circuit, thereby optimizing the performance of a digital circuit chip.
In order to achieve the above object, a 256-bit multiplier according to a first embodiment of the present invention includes a plurality of 128-bit product generation devices and a shift-add device, where the output ends of the 128-bit product generation devices are respectively connected to the input ends of the shift-add device. Each 128-bit product generation device includes a first 64-bit adder, a second 64-bit adder, a first 64-bit multiplier, a second 64-bit multiplier, a third 64-bit multiplier, a first bitwise negation device, a second bitwise negation device, and an adding device. The output ends of the first 64-bit adder and the second 64-bit adder are respectively connected to the input end of the first 64-bit multiplier; the output end of the first 64-bit multiplier is connected to the input end of the adding device; the output end of the second 64-bit multiplier is connected to the input end of the adding device through the first bitwise negation device; the output end of the third 64-bit multiplier is connected to the input end of the adding device through the second bitwise negation device; and the output end of the adding device is connected to the input end of the shift-add device. The adding device receives the operation results that the first 64-bit adder, the second 64-bit adder, the second 64-bit multiplier, and the third 64-bit multiplier obtain by operating on the 64-bit input data received at their corresponding input ends, and adds these operation results to obtain a 128-bit output value, where the operation results comprise a first operation result, a second operation result, and a third operation result. The shift-add device performs a shift-add operation on the 128-bit output values of the 128-bit product generation devices to obtain a 256-bit output value.
In the 256-bit multiplier according to the embodiment of the present invention, the adding device in each 128-bit product generation device first receives the operation results that the first 64-bit adder, the second 64-bit adder, the second 64-bit multiplier, and the third 64-bit multiplier obtain by operating on the 64-bit input data received at their corresponding input ends, and adds these operation results to obtain a 128-bit output value. The shift-add device then performs a shift-add operation on the 128-bit output values of the 128-bit product generation devices to obtain a 256-bit output value. The multiplier can therefore shorten the delay of the hardware multiplication and can be applied to the fast implementation of algorithms in hardware circuits, thereby optimizing the performance of a digital circuit chip.
In addition, the 256-bit multiplier proposed according to the above embodiment of the present invention may also have the following additional technical features:
In one embodiment of the invention, the adding device comprises a plurality of 128-bit adders.
In one embodiment of the invention, the shift-add device comprises a plurality of 128-bit shift-add adders.
In an embodiment of the present invention, after receiving the 64-bit input data through their corresponding input ends, the first 64-bit adder and the second 64-bit adder each perform an addition operation to obtain a first output value and a second output value, respectively, and transmit these values to the first 64-bit multiplier through their corresponding output ends; the first 64-bit multiplier multiplies the first output value by the second output value to obtain the first operation result.
In an embodiment of the present invention, after receiving the 64-bit input data, the second 64-bit multiplier performs a multiplication operation to obtain a third output value and transmits it to the first bitwise negation device through the output end of the second 64-bit multiplier; the first bitwise negation device inverts the third output value bit by bit and adds 1 (i.e., forms its two's complement) to obtain the second operation result.
In an embodiment of the present invention, after receiving the 64-bit input data, the third 64-bit multiplier performs a multiplication operation to obtain a fourth output value and transmits it to the second bitwise negation device through the output end of the third 64-bit multiplier; the second bitwise negation device inverts the fourth output value bit by bit and adds 1 to obtain the third operation result.
In one embodiment of the invention, the 64-bit multiplier comprises a plurality of 16-bit multipliers, each 16-bit multiplier comprising a plurality of 8-bit multipliers.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic diagram of a 256-bit multiplier according to one embodiment of the present invention;
FIG. 2 is a schematic diagram of an 8-bit multiplier partial product array based on radix-4 Booth encoding according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a circuit configuration of a half-adder according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an improved 8-bit multiplier partial product array in accordance with one embodiment of the present invention;
FIG. 5 is a schematic diagram of a single column 4-bit compressed logic gate structure in accordance with one embodiment of the present invention;
FIG. 6 is a schematic diagram of a logic gate circuit configuration of a dual column compression block in accordance with one embodiment of the present invention;
FIG. 7 is a diagram illustrating the allocation of compressed blocks in an array and the results of the compression, according to one embodiment of the present invention;
FIG. 8 is a schematic diagram of the KO-2 algorithm according to one embodiment of the present invention; and
FIG. 9 is a schematic diagram of the KO-4 algorithm according to one embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and intended to explain the invention; they are not to be construed as limiting the invention.
The 256-bit multiplier of the embodiment of the present invention is described below with reference to the drawings.
Fig. 1 is a schematic diagram of a 256-bit multiplier according to one embodiment of the invention.
As shown in FIG. 1, the 256-bit multiplier according to the embodiment of the present invention may include a plurality of 128-bit product generation devices 10 and a shift-add device 20.
The output ends of the plurality of 128-bit product generation devices 10 are respectively connected to the input ends of the shift-add device 20, and the shift-add device 20 may comprise a plurality of 128-bit shift-add adders.
Each 128-bit product generation means 10 comprises a first 64-bit adder 11, a second 64-bit adder 12, a first 64-bit multiplier 13, a second 64-bit multiplier 14, a third 64-bit multiplier 15, a first bitwise negation means 16, a second bitwise negation means 17 and an addition means 18, wherein the addition means 18 may comprise a plurality of 128-bit adders.
Wherein the output terminals of the first 64-bit adder 11 and the second 64-bit adder 12 are respectively connected to the input terminal of the first 64-bit multiplier 13, the output terminal of the first 64-bit multiplier 13 is connected to the input terminal of the adding means 18, the output terminal of the second 64-bit multiplier 14 is connected to the input terminal of the adding means 18 through the first bit-wise inverting means 16, the output terminal of the third 64-bit multiplier 15 is connected to the input terminal of the adding means 18 through the second bit-wise inverting means 17, and the output terminal of the adding means 18 is connected to the input terminal of the shift-add means 20.
It should be noted that the 64-bit multipliers described in this embodiment may be multipliers of 64-bit magnitude, for example 65-bit multipliers. Because the circuit overhead of a 65-bit multiplier differs little from that of a 64-bit multiplier, the two are regarded as being of the same order; that is, multipliers of 64-bit magnitude include 64-bit multipliers, 65-bit multipliers, and other multipliers of similar size.
The adding device 18 receives operation results obtained by the first 64-bit adder 11, the second 64-bit adder 12, the second 64-bit multiplier 14, and the third 64-bit multiplier 15 through operation of the 64-bit input data received by the corresponding input terminals, and performs addition operation based on the operation results to obtain a 128-bit output value, wherein the operation results include a first operation result, a second operation result, and a third operation result.
It should be noted that the 64-bit input data described in this embodiment may be a plurality of 64-bit input data, and each 64-bit input data in the plurality of 64-bit input data may be the same or different, and is not limited herein.
In an embodiment of the invention, the 64-bit input data may be 8 64-bit input data, and each of the first 64-bit adder 11, the second 64-bit adder 12, the second 64-bit multiplier 14, and the third 64-bit multiplier 15 may receive 2 64-bit input data.
In an embodiment of the present invention, after receiving 64-bit input data through their corresponding input ends, the first 64-bit adder 11 and the second 64-bit adder 12 each perform an addition operation to obtain a first output value and a second output value, respectively, and transmit these values to the first 64-bit multiplier 13 through their corresponding output ends. The first 64-bit multiplier 13 multiplies the first output value by the second output value to obtain a first operation result.
In another embodiment of the present invention, after receiving the 64-bit input data, the second 64-bit multiplier 14 performs a multiplication operation to obtain a third output value, which is transmitted to the first bitwise negation device 16 through the output end of the second 64-bit multiplier 14. The first bitwise negation device 16 inverts the third output value bit by bit and adds 1 (i.e., forms its two's complement) to obtain a second operation result.
In another embodiment of the present invention, after receiving the 64-bit input data, the third 64-bit multiplier 15 performs a multiplication operation to obtain a fourth output value, which is transmitted to the second bitwise negation device 17 through the output end of the third 64-bit multiplier 15. The second bitwise negation device 17 inverts the fourth output value bit by bit and adds 1 to obtain a third operation result.
Specifically, as shown in FIG. 1, during operation of the 256-bit multiplier, the first 64-bit adder 11 receives two 64-bit input data through its input end, and the second 64-bit adder 12 likewise receives two 64-bit input data through its input end. The first 64-bit adder 11 and the second 64-bit adder 12 then add their respective two 64-bit input data to obtain the first output value and the second output value, respectively, and pass these values through their output ends to the first 64-bit multiplier 13. After receiving the first output value and the second output value, the first 64-bit multiplier 13 multiplies them to obtain the first operation result and transmits the first operation result to the adding device 18.
During operation of the 256-bit multiplier, the second 64-bit multiplier 14 receives two 64-bit input data through its input end, multiplies them to obtain the third output value, and transmits the third output value to the first bitwise negation device 16 through its output end. After receiving the third output value, the first bitwise negation device 16 inverts it bit by bit and adds 1 to obtain the second operation result, and transmits the second operation result to the adding device 18.
Similarly, the third 64-bit multiplier 15 receives two 64-bit input data through its input end, multiplies them to obtain the fourth output value, and transmits the fourth output value to the second bitwise negation device 17 through its output end. After receiving the fourth output value, the second bitwise negation device 17 inverts it bit by bit and adds 1 to obtain the third operation result, and transmits the third operation result to the adding device 18.
Further, the adding device 18 receives the first operation result, the second operation result, and the third operation result, and adds them using its plurality of built-in 128-bit adders to obtain a 128-bit output value.
It should be noted that, in the operation of the 256-bit multiplier of the present invention, each 128-bit product generation apparatus 10 can output a 128-bit output value.
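The data path of one product generation device can be modeled in software. The following is a minimal Python sketch, assuming the eight 64-bit inputs are the high and low halves of two operands routed as (a1, a0) to the first adder, (b1, b0) to the second adder, (a1, b1) to the second multiplier, and (a0, b0) to the third multiplier. That routing is an interpretation, since the text does not spell it out. The function names and the 130-bit working width are hypothetical choices made so that the two's-complement sum cannot wrap; the source itself speaks of a 128-bit output value.

```python
WORK_BITS = 130  # assumed working width, not taken from the source


def negate_plus_one(value, bits=WORK_BITS):
    """Bitwise negation followed by +1, i.e. the two's complement of value."""
    mask = (1 << bits) - 1
    return ((~value) + 1) & mask


def product_generator(a1, a0, b1, b0, bits=WORK_BITS):
    """Behavioral model of one product generation device (hypothetical routing)."""
    mask = (1 << bits) - 1
    first_output = a1 + a0                      # first 64-bit adder 11
    second_output = b1 + b0                     # second 64-bit adder 12
    first_result = first_output * second_output  # first 64-bit multiplier 13
    second_result = negate_plus_one(a1 * b1, bits)  # multiplier 14, negation device 16
    third_result = negate_plus_one(a0 * b0, bits)   # multiplier 15, negation device 17
    # Adding device 18: adding the two's complements performs the subtractions.
    return (first_result + second_result + third_result) & mask
```

Under these assumptions the returned value equals a1*b0 + a0*b1, which is the cross term that the Karatsuba-Ofman identity obtains with one multiplication of the two sums instead of two separate multiplications.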
The shift-add device 20 performs a shift-add operation on the 128-bit output values of the 128-bit product generation devices 10 to obtain a 256-bit output value.
Specifically, after obtaining its 128-bit output value, the adding device 18 in each 128-bit product generation device 10 passes that value to the shift-add device 20. After receiving the 128-bit output value from each 128-bit product generation device 10, the shift-add device 20 performs the corresponding shift-add operation on these values using its plurality of built-in 128-bit shift-add adders to obtain a 256-bit output value.
In summary, the 256-bit multiplier provided in the embodiments of the present invention can shorten the delay of multiplication hardware implementation, and can be applied to fast implementation of algorithms in a hardware circuit, thereby optimizing the performance of a digital circuit chip.
Further, in one embodiment of the present invention, the 64-bit multipliers (e.g., the first 64-bit multiplier 13, the second 64-bit multiplier 14, and the third 64-bit multiplier 15) may include a plurality of 16-bit multipliers, and each 16-bit multiplier may include a plurality of 8-bit multipliers.
It should be noted that the 8-bit multiplier described in this embodiment may be an optimized 8-bit multiplier, for example an 8-bit multiplier generated by radix-4 Booth encoding.
In the embodiment of the present application, the delay of the multiplier can be shortened by optimizing the partial product array of the 8-bit multiplier, and the 256-bit multiplier of the present application can be constructed on that basis.
The optimization process of the above-described 8-bit multiplier is described in detail below in conjunction with fig. 2-7:
The partial product array of an 8-bit multiplier generated by the Booth encoding algorithm is shown in FIG. 2. As can be seen from FIG. 2, the left and right sides of the array have stepped, irregular sawtooth edges, and this irregular bit distribution complicates the subsequent compression. The present application therefore applies one-bit additions in parallel, using the half adder constructed as shown in FIG. 3, to regularize the left and right sides of the array into the partial product array shown in FIG. 4, at the cost of only one half-adder delay.
This array optimization not only regularizes the sawtooth distribution of bits on the right side of the array, but also shortens the carry chain of the single bit in the last row. Meanwhile, the constant '1's distributed in a sawtooth pattern on the left side can be calculated in advance. The critical-path delay of that calculation circuit is shorter than the one-bit half-adder delay used for the right-side optimization, and the calculation circuit can run in parallel with the right-side half-adder circuit, which further saves time.
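To illustrate where the rows of such a partial product array come from, the sketch below recodes an unsigned 8-bit multiplier into radix-4 Booth digits, each of which selects one partial product row. It is a behavioral model only, not the gate-level array of FIG. 2 or FIG. 4, and the function names are hypothetical.

```python
def booth_radix4_digits(multiplier, bits=8):
    """Recode an unsigned multiplier into radix-4 Booth digits in {-2,-1,0,1,2}.

    For an unsigned 8-bit value this yields five digits, least significant first.
    """
    digits = []
    prev = 0  # implicit zero bit below the least significant bit
    for i in range(0, bits + 1, 2):
        b0 = (multiplier >> i) & 1
        b1 = (multiplier >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)  # overlapping 3-bit group
        prev = b1
    return digits


def booth_multiply(multiplicand, multiplier, bits=8):
    """Sum the partial products selected by the Booth digits."""
    total = 0
    for k, digit in enumerate(booth_radix4_digits(multiplier, bits)):
        total += (digit * multiplicand) << (2 * k)  # row k is weighted by 4**k
    return total
```

Each digit needs only a shift and an optional negation of the multiplicand, which is why Booth recoding roughly halves the number of rows compared with one row per multiplier bit.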
Next, a compression procedure is designed for the improved 8-bit Booth-encoded multiplier array. The array is compressed in blocks, using the number of logic stages required to compress each block into one row as the criterion, so that the compression circuit delay of each block is as even as possible. The blocks, which are compressed in parallel, then consume nearly the same delay, and no time on the circuit's critical path is wasted.
For better understanding, the logic stage count can be analyzed for a compression block formed by the two middle columns. First, consider the gate-level design of the compression circuit for a single column, i.e., compressing 4 bits into one row; the gate-level logic circuit is shown in FIG. 5. The analysis and gate-level design of the two-column compression circuit built on this basis is shown in FIG. 6. The logic circuit shown compresses the 8 bits of two columns into 4 bits in one row, and the longest path in this process can be 11.5 logic gate stages.
Similarly, the right four columns can be analyzed. They are treated as one compression block, which can be compressed into one row of 6 bits, with a longest path of 12 logic gate stages. Likewise, the left 5 columns form one compression block that is finally compressed into one row of 5 bits (for the multiplication result, any overflow or carry beyond the most significant bit can be discarded), with a longest path of 13 logic stages. Note that one column in the middle part holds five bits. This column is grouped with the column to its left, and compressing the nine bits of these two columns into one row of four bits has a critical path of 13 logic gate stages (still within the current maximum number of logic stages). The whole array can thus be divided into compression blocks that require essentially equal numbers of logic gate stages: the four columns on the right, the five columns on the left, and the middle columns in groups of two (one group containing the five-bit column). The compression blocks are then executed in parallel, compressing the partial product array from five rows to two rows. The allocation of compression blocks and the compression results are shown in FIG. 7.
As shown in FIG. 7, after the blocks are compressed into two rows, the two rows are added to obtain the final product, i.e., the multiplication result of the optimized 8-bit multiplier.
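The general idea of reducing several rows to two and then performing a single final addition can be illustrated with a generic carry-save (3:2) reduction. This is a standard textbook technique swapped in for illustration; it is not the block-wise column compression of FIG. 5 to FIG. 7, and the function names are hypothetical.

```python
def carry_save_add(x, y, z):
    """3:2 compressor applied bitwise: three rows in, a sum row and a carry row out."""
    sum_row = x ^ y ^ z
    carry_row = ((x & y) | (x & z) | (y & z)) << 1  # carries move one column left
    return sum_row, carry_row


def reduce_to_two_rows(rows):
    """Repeatedly compress three rows into two until only two rows remain."""
    rows = list(rows)
    while len(rows) > 2:
        x, y, z = rows.pop(), rows.pop(), rows.pop()
        rows.extend(carry_save_add(x, y, z))
    return rows


def sum_rows(rows):
    """Final step: one ordinary addition of the two remaining rows."""
    return sum(reduce_to_two_rows(rows))
```

No carry propagates along a row during the reduction; only the final addition of the two remaining rows needs a carry-propagating adder, which is the property the two-row form is meant to exploit.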
Based on the optimized 8-bit multiplier, the final 256-bit multiplier is obtained using the divide-and-conquer fast multiplication algorithm Karatsuba-Ofman (KO), as described in detail below with reference to FIG. 8 and FIG. 9.
First, a 16-bit multiplier can be obtained using the KO-2 algorithm shown in FIG. 8, as follows. Let the two multiplication operands be A and B (both 16 bits):
$$A = a_1 \cdot 2^{8} + a_0 \tag{1}$$
$$B = b_1 \cdot 2^{8} + b_0 \tag{2}$$
Then:
$$A \cdot B = a_1 b_1 \cdot 2^{16} + (a_1 b_0 + a_0 b_1) \cdot 2^{8} + a_0 b_0 \tag{3}$$
The coefficient of $2^{8}$ can be rewritten as:
$$(a_1 b_0 + a_0 b_1) = (a_1 + a_0) \cdot (b_1 + b_0) - a_1 b_1 - a_0 b_0 \tag{4}$$
Thus, the two multiplications in $(a_1 b_0 + a_0 b_1)$ are reduced to the one multiplication $(a_1 + a_0) \cdot (b_1 + b_0)$; the extra overhead introduced by the two additions is small and negligible compared with a multiplication. Through the KO-2 process shown in FIG. 8, the 8-bit multiplier yields a 16-bit multiplier.
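The KO-2 step can be checked numerically. The following is a minimal Python sketch of a 16-bit multiplication built from three small multiplications; `small_mul` is a hypothetical stand-in for the optimized 8-bit multiplier, and it is assumed to accept 9-bit inputs because the sums of two 8-bit halves can carry into a ninth bit.

```python
def small_mul(x, y):
    """Stand-in for the small hardware multiplier (assumed to accept 9-bit inputs)."""
    assert 0 <= x < 512 and 0 <= y < 512
    return x * y


def ko2_mul16(a, b):
    """16-bit x 16-bit multiplication using three small multiplications."""
    a1, a0 = a >> 8, a & 0xFF
    b1, b0 = b >> 8, b & 0xFF
    high = small_mul(a1, b1)                          # a1*b1
    low = small_mul(a0, b0)                           # a0*b0
    # Middle coefficient: one multiplication and two subtractions
    # instead of the two multiplications a1*b0 and a0*b1.
    middle = small_mul(a1 + a0, b1 + b0) - high - low
    return (high << 16) + (middle << 8) + low
```

A schoolbook split would need four small multiplications; here the products `high` and `low` are reused when forming the middle coefficient, so only three are required.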
After the 16-bit multiplier is obtained, the KO-4 algorithm shown in FIG. 9 can be applied twice: the first application yields a 64-bit multiplier, and the second yields the 256-bit multiplier. The 256-bit KO-4 algorithm is derived as follows:
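First, let the two KO-4 multiplication operands be A and B (256 bits each), where $a_0, \dots, a_3$ and $b_0, \dots, b_3$ are all 64 bits:
$$A = a_3 \cdot 2^{192} + a_2 \cdot 2^{128} + a_1 \cdot 2^{64} + a_0 \tag{5}$$
$$B = b_3 \cdot 2^{192} + b_2 \cdot 2^{128} + b_1 \cdot 2^{64} + b_0 \tag{6}$$
Then:
$$A \cdot B = a_3 b_3 \cdot 2^{384} + (a_3 b_2 + a_2 b_3) \cdot 2^{320} + (a_3 b_1 + a_2 b_2 + a_1 b_3) \cdot 2^{256} + (a_3 b_0 + a_2 b_1 + a_1 b_2 + a_0 b_3) \cdot 2^{192} + (a_2 b_0 + a_1 b_1 + a_0 b_2) \cdot 2^{128} + (a_1 b_0 + a_0 b_1) \cdot 2^{64} + a_0 b_0 \tag{7}$$
The transformation of the coefficient of $2^{192}$ is more involved and is given in detail:
[Equation (8): figure BDA0002677697110000072]
Through the KO-4 process, the 64-bit multiplier yields a 256-bit multiplier.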
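Because equation (8) survives only as an image reference in this text, the sketch below uses one common four-way Karatsuba-Ofman formulation rather than the patent's exact transformation: each cross term a_i*b_j + a_j*b_i is computed as (a_i + a_j)*(b_i + b_j) - a_i*b_i - a_j*b_j. The function name and the limb layout are hypothetical, and Python's built-in integer multiplication stands in for the 64-bit-magnitude multipliers.

```python
def ko4_mul(a, b, limb_bits=64):
    """Four-limb Karatsuba-Ofman multiplication (one common formulation).

    Uses 4 diagonal products plus 6 cross products, i.e. 10 limb
    multiplications instead of the 16 of the schoolbook method.
    """
    mask = (1 << limb_bits) - 1
    a_limbs = [(a >> (limb_bits * i)) & mask for i in range(4)]
    b_limbs = [(b >> (limb_bits * i)) & mask for i in range(4)]

    # Diagonal products a_i * b_i, weighted by 2**(2 * limb_bits * i).
    diag = [x * y for x, y in zip(a_limbs, b_limbs)]
    result = 0
    for i in range(4):
        result += diag[i] << (2 * limb_bits * i)

    # Cross terms a_i*b_j + a_j*b_i for i < j, one multiplication each.
    for i in range(4):
        for j in range(i + 1, 4):
            cross = ((a_limbs[i] + a_limbs[j]) * (b_limbs[i] + b_limbs[j])
                     - diag[i] - diag[j])
            result += cross << (limb_bits * (i + j))  # shift-add accumulation
    return result
```

With `limb_bits=16` the same function models the first KO-4 application (16-bit limbs to a 64-bit multiplier), and with the default `limb_bits=64` it models the second (64-bit limbs to a 256-bit multiplier). Under this formulation, the coefficient of 2^192 is the sum of the cross terms for the limb pairs (0, 3) and (1, 2).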
In the embodiment of the invention, this method of constructing the 256-bit multiplier optimizes the partial product array of the 8-bit multiplier generated by radix-4 Booth encoding without affecting the logic function of the multiplier, shortens the delay of the multiplier, and builds a faster 256-bit multiplier on that basis.
To sum up, in the 256-bit multiplier according to the embodiment of the present invention, the adding device in each 128-bit product generation device first receives the operation results that the first 64-bit adder, the second 64-bit adder, the second 64-bit multiplier, and the third 64-bit multiplier obtain by operating on the 64-bit input data received at their corresponding input ends, and adds these operation results to obtain a 128-bit output value. The shift-add device then performs a shift-add operation on the 128-bit output values of the 128-bit product generation devices to obtain a 256-bit output value. The multiplier can therefore shorten the delay of the hardware multiplication and can be applied to the fast implementation of algorithms in hardware circuits, thereby optimizing the performance of a digital circuit chip.
In the description of the present invention, it is to be understood that the terms "central," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," "counterclockwise," "axial," "radial," "circumferential," and the like are used in the orientations and positional relationships indicated in the drawings for convenience in describing the invention and to simplify the description, and are not intended to indicate or imply that the referenced device or element must have a particular orientation, be constructed and operated in a particular orientation, and are not to be considered limiting of the invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the present invention, unless otherwise explicitly stated or limited, the terms "mounted," "connected," "fixed," and the like are to be construed broadly, e.g., as being permanently connected, detachably connected, or integral; can be mechanically or electrically connected; either directly or indirectly through intervening media, either internally or in any other relationship. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the present invention, unless otherwise expressly stated or limited, the first feature "on" or "under" the second feature may be directly contacting the first and second features or indirectly contacting the first and second features through an intermediate. Also, a first feature "on," "over," and "above" a second feature may be directly or diagonally above the second feature, or may simply indicate that the first feature is at a higher level than the second feature. A first feature being "under," "below," and "beneath" a second feature may be directly under or obliquely under the first feature, or may simply mean that the first feature is at a lesser elevation than the second feature.
In the description of the specification, reference to "one embodiment," "some embodiments," "an example," "a specific example," "some examples," or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, various embodiments or examples, and features of different embodiments or examples, described in this specification can be joined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (4)

1. A 256-bit multiplier comprising a plurality of 128-bit product generating means and a shift-add means, outputs of said plurality of 128-bit product generating means being connected to inputs of said shift-add means, respectively, wherein,
each 128-bit product generation device comprises a first 64-bit adder, a second 64-bit adder, a first 64-bit multiplier, a second 64-bit multiplier, a third 64-bit multiplier, a first bitwise negation device, a second bitwise negation device and an addition device;
the output ends of the first 64-bit adder and the second 64-bit adder are respectively connected with the input end of the first 64-bit multiplier, the output end of the first 64-bit multiplier is connected with the input end of the adding device, the output end of the second 64-bit multiplier is connected with the input end of the adding device through the first bitwise negating device, the output end of the third 64-bit multiplier is connected with the input end of the adding device through the second bitwise negating device, and the output end of the adding device is connected with the input end of the shift-add device;
the adding device receives operation results obtained by operation of the first 64-bit multiplier, the second 64-bit multiplier and the third 64-bit multiplier through 64-bit input data received by corresponding input ends, and performs addition operation on the operation results to obtain a 128-bit output value, wherein the operation results comprise a first operation result, a second operation result and a third operation result;
the shift-add means performs a shift-add operation on the 128-bit output value of each 128-bit partial product generation means to obtain a 256-bit output value;
after receiving the 64-bit input data through corresponding input ends, the first 64-bit adder and the second 64-bit adder respectively perform addition operation to obtain a first input value and a second output value, and transmit the first input value and the second output value to the first 64-bit multiplier through corresponding output ends;
the first 64-bit multiplier performs multiplication operation according to the first input value and the second output value to obtain the first operation result;
after receiving the 64-bit input data, the second 64-bit multiplier performs multiplication to obtain a third output value, and the third output value is transmitted to the first bitwise inverting device through the output end of the second 64-bit multiplier;
the first bitwise negation device is used for inverting the third output value bit by bit and adding 1 to obtain the second operation result;
after receiving the 64-bit input data, the third 64-bit multiplier performs multiplication to obtain a fourth output value, and the fourth output value is transmitted to the second bitwise inverting device through the output end of the third 64-bit multiplier;
and the second bitwise negation device is used for inverting the fourth output value bit by bit and adding 1 to obtain the third operation result.
2. A 256-bit multiplier according to claim 1, wherein the adding means comprises a plurality of 128-bit adders.
3. A 256-bit multiplier according to claim 1, characterized in that said shift-add means comprise a plurality of 128-bit shift-add adders.
4. The 256-bit multiplier of any of claims 1 to 3, wherein the 64-bit multiplier comprises a plurality of 16-bit multipliers, each 16-bit multiplier comprising a plurality of 8-bit multipliers.
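
The arrangement recited in claim 1 (two adders, three multipliers, two invert-and-add-1 stages, and a final addition) reads like the cross-term computation of a Karatsuba-style multiplication, in which each subtraction is carried out by adding a two's complement. The Python sketch below illustrates that arithmetic only and is not the claimed circuit. The function names, the 130-bit accumulator width, the split of each operand into two 64-bit halves, and the final shift-add into a 128-bit by 128-bit product are all assumptions made for the illustration and are not taken from the claims.

```python
MASK64 = (1 << 64) - 1
ACC_WIDTH = 130  # assumed accumulator width; (2**65 - 2)**2 fits in 130 bits


def negate_plus_one(value, width=ACC_WIDTH):
    """Invert every bit of value, then add 1, modulo 2**width (two's complement)."""
    mask = (1 << width) - 1
    return ((~value & mask) + 1) & mask


def cross_term(a_hi, a_lo, b_hi, b_lo):
    """Return a_hi*b_lo + a_lo*b_hi using three multiplications."""
    mask = (1 << ACC_WIDTH) - 1
    sum_a = a_hi + a_lo                  # first adder
    sum_b = b_hi + b_lo                  # second adder
    r1 = sum_a * sum_b                   # first multiplier
    r2 = negate_plus_one(a_hi * b_hi)    # second multiplier, then invert and add 1
    r3 = negate_plus_one(a_lo * b_lo)    # third multiplier, then invert and add 1
    return (r1 + r2 + r3) & mask         # final addition; carries past ACC_WIDTH drop out


def multiply_128(a, b):
    """Multiply two 128-bit integers by shift-adding the three partial results."""
    a_hi, a_lo = a >> 64, a & MASK64
    b_hi, b_lo = b >> 64, b & MASK64
    high = a_hi * b_hi
    low = a_lo * b_lo
    middle = cross_term(a_hi, a_lo, b_hi, b_lo)
    return (high << 128) + (middle << 64) + low


if __name__ == "__main__":
    a = (1 << 128) - 1
    b = 0x0123456789ABCDEF_FEDCBA9876543210
    print(multiply_128(a, b) == a * b)
```

Working modulo a fixed power of two lets the two subtractions share the same adder that sums the products, which is the usual motivation for an invert-and-add-1 stage in hardware.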
CN202010953134.0A 2020-09-11 2020-09-11 256 bit multiplier Active CN112214199B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010953134.0A CN112214199B (en) 2020-09-11 2020-09-11 256 bit multiplier

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010953134.0A CN112214199B (en) 2020-09-11 2020-09-11 256 bit multiplier

Publications (2)

Publication Number Publication Date
CN112214199A CN112214199A (en) 2021-01-12
CN112214199B CN112214199B (en) 2022-06-21

Family

ID=74049306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010953134.0A Active CN112214199B (en) 2020-09-11 2020-09-11 256 bit multiplier

Country Status (1)

Country Link
CN (1) CN112214199B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1118472A (en) * 1994-05-26 1996-03-13 摩托罗拉公司 Combined multiplier-shifter and method therefor
CN105045560A (en) * 2015-08-25 2015-11-11 浪潮(北京)电子信息产业有限公司 Fixed-point multiply-add operation method and apparatus
CN107977191A (en) * 2016-10-21 2018-05-01 中国科学院微电子研究所 A kind of low power consumption parallel multiplier
CN108351761A (en) * 2015-11-12 2018-07-31 Arm有限公司 Use the multiplication of the first and second operands of redundant representation
CN111522528A (en) * 2020-04-22 2020-08-11 厦门星宸科技有限公司 Multiplier, multiplication method, operation chip, electronic device, and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6102645B2 (en) * 2013-09-11 2017-03-29 富士通株式会社 Product-sum operation circuit and product-sum operation system

Also Published As

Publication number Publication date
CN112214199A (en) 2021-01-12

Similar Documents

Publication Publication Date Title
CN106909970B (en) Approximate calculation-based binary weight convolution neural network hardware accelerator calculation device
US5956265A (en) Boolean digital multiplier
US7805478B2 (en) Montgomery modular multiplier
JP2722411B2 (en) Implementation of modular reduction by Montgomery method
US7543011B2 (en) Montgomery modular multiplier and method thereof using carry save addition
US8756268B2 (en) Montgomery multiplier having efficient hardware structure
KR100591761B1 (en) Montgomery Modular Multiplication Method Using Montgomery Modular Multiplier and Carry Store Addition
CN115344237A (en) Data processing method combining Karatsuba and Montgomery modular multiplication
CN110362293B (en) Multiplier, data processing method, chip and electronic equipment
CN113794572A (en) Hardware implementation system and method for high-performance elliptic curve digital signature and signature verification
US20180165064A1 (en) Partial square root calculation
CN112214199B (en) 256 bit multiplier
US3842250A (en) Circuit for implementing rounding in add/subtract logic networks
US6480870B1 (en) Random number generator using lehmer algorithm
CN110688094B (en) Remainder operation circuit and method based on parallel cyclic compression
Matutino et al. An efficient scalable RNS architecture for large dynamic ranges
Lee Low-Latency Bit-Parallel Systolic Multiplier for Irreducible x^m + x^n + 1 with gcd(m, n) = 1
US6163790A (en) Modular arithmetic coprocessor comprising an integer division circuit
US5948051A (en) Device improving the processing speed of a modular arithmetic coprocessor
US20070180014A1 (en) Sparce-redundant fixed point arithmetic modules
US20040133618A1 (en) Method and system for performing a multiplication operation and a device
Pérez-Celis et al. An fpga architecture to accelerate the burrows wheeler transform by using a linear sorter
Saleh et al. Novel serial–parallel multipliers
JPH09245019A (en) Product sum arithmetic circuit
Kim et al. Digit-serial modular multiplication using skew-tolerant domino CMOS

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210928

Address after: Room 311, floor 3, building 26, yard 1, Baosheng South Road, Haidian District, Beijing 100192

Applicant after: Beijing Cao Mu Xin Technology Co.,Ltd.

Address before: 100084 Tsinghua Yuan, Beijing, Haidian District

Applicant before: TSINGHUA University

GR01 Patent grant