
CN112214199B - 256 bit multiplier

Info

Publication number: CN112214199B
Application number: CN202010953134.0A
Authority: CN
Inventors: 李树国; 孔添琪
Original assignee: Beijing Cao Mu Xin Technology Co ltd
Current assignee: Beijing Cao Mu Xin Technology Co ltd
Priority date: 2020-09-11
Filing date: 2020-09-11
Publication date: 2022-06-21
Anticipated expiration: 2040-09-11
Also published as: CN112214199A

Abstract

The invention discloses a 256-bit multiplier, which comprises a plurality of 128-bit product generating devices and a shift addition device, wherein the output ends of the 128-bit product generating devices are respectively connected with the input ends of the shift addition device, and each 128-bit product generating device comprises a first 64-bit adder, a second 64-bit adder, a first 64-bit multiplier, a second 64-bit multiplier, a third 64-bit multiplier, a first bitwise negation device, a second bitwise negation device and an addition device; the output ends of the first 64-bit adder and the second 64-bit adder are respectively connected with the input end of the first 64-bit multiplier, the output end of the first 64-bit multiplier is connected with the input end of the adding device, the output end of the second 64-bit multiplier is connected with the input end of the adding device through the first bit-by-bit negating device, the output end of the third 64-bit multiplier is connected with the input end of the adding device through the second bit-by-bit negating device, and the output end of the adding device is connected with the input end of the shift addition device.

Description

256 bit multiplier

Technical Field

The present invention relates to the field of multiplier technologies, and in particular, to a 256-bit multiplier.

Background

Multipliers are among the most basic functional cores in microprocessor design and modern cryptographic systems. Almost all hardware and software systems require a multiplier as an indispensable arithmetic unit. The efficiency of the multiplication implementation fundamentally determines the execution efficiency of algorithms and the speed of the system, and directly determines the performance of cryptographic algorithms such as the public-key cryptographic algorithm RSA. The multiplier is therefore not only widely used in the design of cryptographic chips, but is also an indispensable operation core in digital systems.

However, the delay of the multiplication hardware implementation of the 256-bit multiplier in the related art is high, which affects the overall operation efficiency of the multiplier.

Disclosure of Invention

The present invention is directed to solving, at least to some extent, one of the technical problems in the art described above. Therefore, an object of the present invention is to provide a 256-bit multiplier, which can shorten the delay of multiplication hardware implementation and can be applied to fast implementation of algorithms in a hardware circuit, thereby optimizing the performance of a digital circuit chip.

In order to achieve the above object, a 256-bit multiplier according to a first embodiment of the present invention includes a plurality of 128-bit product generation devices and a shift-add device, where the outputs of the 128-bit product generation devices are respectively connected to inputs of the shift-add device, and where each 128-bit product generation device includes a first 64-bit adder, a second 64-bit adder, a first 64-bit multiplier, a second 64-bit multiplier, a third 64-bit multiplier, a first bitwise negation device, a second bitwise negation device, and an adding device; the output ends of the first 64-bit adder and the second 64-bit adder are respectively connected with the input end of the first 64-bit multiplier, the output end of the first 64-bit multiplier is connected with the input end of the adding device, the output end of the second 64-bit multiplier is connected with the input end of the adding device through the first bitwise negation device, the output end of the third 64-bit multiplier is connected with the input end of the adding device through the second bitwise negation device, and the output end of the adding device is connected with the input end of the shift-add device; the adding device receives the operation results obtained by the first 64-bit adder, the second 64-bit multiplier and the third 64-bit multiplier operating on the 64-bit input data received at their corresponding input ends, and performs an addition operation on the operation results to obtain a 128-bit output value, wherein the operation results comprise a first operation result, a second operation result and a third operation result; the shift-add device performs a shift-add operation based on the 128-bit output value of each 128-bit product generation device to obtain a 256-bit output value.

In the 256-bit multiplier according to the embodiment of the present invention, the adding device in each 128-bit product generation device first receives the operation results obtained by the first 64-bit adder, the second 64-bit multiplier, and the third 64-bit multiplier from the 64-bit input data received at the corresponding input ends, and performs an addition operation on the operation results to obtain a 128-bit output value; the shift-add device then performs a shift-add operation on the 128-bit output values of the 128-bit product generation devices to obtain a 256-bit output value. The multiplier can therefore shorten the delay of the multiplication hardware implementation and can be applied to the fast implementation of algorithms in hardware circuits, thereby optimizing the performance of a digital circuit chip.

In addition, the 256-bit multiplier proposed according to the above embodiment of the present invention may also have the following additional technical features:

In one embodiment of the invention, the adding means comprises a plurality of 128-bit adders.

In one embodiment of the invention, the shift-add means comprises a plurality of 128-bit shift-add adders.

In an embodiment of the present invention, after receiving the 64-bit input data through the corresponding input terminals, the first 64-bit adder and the second 64-bit adder perform addition operation respectively to obtain a first input value and a second output value, and transmit the first input value and the second output value to the first 64-bit multiplier through the corresponding output terminals; the first 64-bit multiplier performs multiplication operation according to the first input value and the second output value to obtain the first operation result.

In an embodiment of the present invention, after receiving the 64-bit input data, the second 64-bit multiplier performs a multiplication operation to obtain a third output value, and transmits the third output value to the first bitwise negation device through an output terminal of the second 64-bit multiplier; the first bitwise negation device bitwise negates the third output value and adds 1 to obtain the second operation result.

In an embodiment of the present invention, after receiving the 64-bit input data, the third 64-bit multiplier performs a multiplication operation to obtain a fourth output value, and transmits the fourth output value to the second bitwise negation device through an output terminal of the third 64-bit multiplier; the second bitwise negation device bitwise negates the fourth output value and adds 1 to obtain the third operation result.

In one embodiment of the invention, the 64-bit multiplier comprises a plurality of 16-bit multipliers, each 16-bit multiplier comprising a plurality of 8-bit multipliers.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a schematic diagram of a 256-bit multiplier according to one embodiment of the present invention;

FIG. 2 is a schematic diagram of an 8-bit multiplier partial product array based on radix-4 Booth encoding according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a circuit configuration of a half-adder according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an improved 8-bit multiplier partial product array in accordance with one embodiment of the present invention;

FIG. 5 is a schematic diagram of a logic gate structure for single-column 4-bit compression in accordance with one embodiment of the present invention;

FIG. 6 is a schematic diagram of a logic gate circuit configuration of a dual column compression block in accordance with one embodiment of the present invention;

FIG. 7 is a diagram illustrating the allocation of compressed blocks in an array and the results of the compression, according to one embodiment of the present invention;

FIG. 8 is a schematic diagram of the KO-2 algorithm according to one embodiment of the present invention; and

FIG. 9 is a schematic diagram of the KO-4 algorithm according to one embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to explain the invention, and are not to be construed as limiting the invention.

The 256-bit multiplier of the embodiment of the present invention is described below with reference to the drawings.

Fig. 1 is a schematic diagram of a 256-bit multiplier according to one embodiment of the invention.

As shown in fig. 1, the 256-bit multiplier according to the embodiment of the present invention may include a plurality of 128-bit product generation means 10 and a shift-add means 20.

The outputs of the plurality of 128-bit product generation means 10 are respectively connected to the inputs of the shift-add means 20, and the shift-add means 20 may comprise a plurality of 128-bit shift-add adders.

Each 128-bit product generation means 10 comprises a first 64-bit adder 11, a second 64-bit adder 12, a first 64-bit multiplier 13, a second 64-bit multiplier 14, a third 64-bit multiplier 15, a first bitwise negation means 16, a second bitwise negation means 17 and an addition means 18, wherein the addition means 18 may comprise a plurality of 128-bit adders.

The output terminals of the first 64-bit adder 11 and the second 64-bit adder 12 are respectively connected to the input terminal of the first 64-bit multiplier 13, the output terminal of the first 64-bit multiplier 13 is connected to the input terminal of the adding means 18, the output terminal of the second 64-bit multiplier 14 is connected to the input terminal of the adding means 18 through the first bitwise negation means 16, the output terminal of the third 64-bit multiplier 15 is connected to the input terminal of the adding means 18 through the second bitwise negation means 17, and the output terminal of the adding means 18 is connected to the input terminal of the shift-add means 20.

It should be noted that the 64-bit multipliers described in this embodiment may be multipliers of 64-bit magnitude. For example, a 65-bit multiplier is considered to be of the same order as a 64-bit multiplier because its circuit overhead differs little from that of a 64-bit multiplier. That is, multipliers of 64-bit magnitude may include 64-bit multipliers, 65-bit multipliers, and the like.
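As a non-limiting illustration, the following Python sketch shows one plausible reason a 65-bit multiplier may be needed: the sum of two unsigned 64-bit operands can occupy 65 bits. The names `MAX64` and `adder64` are hypothetical and do not appear in the drawings, and the link between the adder outputs and the 65-bit multiplier is an assumption of this sketch.

```python
# Largest unsigned 64-bit value.
MAX64 = (1 << 64) - 1


def adder64(x, y):
    """Model of an unsigned 64-bit adder that keeps its carry-out as bit 64."""
    assert 0 <= x <= MAX64 and 0 <= y <= MAX64
    return x + y


# Worst case: both operands are all ones, so the sum needs one extra bit.
widest_sum = adder64(MAX64, MAX64)
assert widest_sum.bit_length() == 65
```

Under this assumption, a multiplier fed by two such adders would have to accept 65-bit operands, which is consistent with treating a 65-bit multiplier as being of 64-bit magnitude.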

The adding device 18 receives operation results obtained by the first 64-bit adder 11, the second 64-bit adder 12, the second 64-bit multiplier 14, and the third 64-bit multiplier 15 through operation of the 64-bit input data received by the corresponding input terminals, and performs addition operation based on the operation results to obtain a 128-bit output value, wherein the operation results include a first operation result, a second operation result, and a third operation result.

It should be noted that the 64-bit input data described in this embodiment may be a plurality of 64-bit input data, and each 64-bit input data in the plurality of 64-bit input data may be the same or different, and is not limited herein.

In an embodiment of the invention, the 64-bit input data may be 8 64-bit input data, and each of the first 64-bit adder 11, the second 64-bit adder 12, the second 64-bit multiplier 14, and the third 64-bit multiplier 15 may receive 2 64-bit input data.

In an embodiment of the present invention, the first 64-bit adder 11 and the second 64-bit adder 12 respectively perform an addition operation to obtain a first input value and a second output value after receiving 64-bit input data through corresponding input terminals, and transmit the first input value and the second output value to the first 64-bit multiplier 13 through corresponding output terminals, and the first 64-bit multiplier 13 performs a multiplication operation according to the first input value and the second output value to obtain a first operation result.

In another embodiment of the present invention, the second 64-bit multiplier 14 performs a multiplication operation to obtain a third output value after receiving the 64-bit input data, and the third output value is transmitted to the first bitwise negation means 16 through the output terminal of the second 64-bit multiplier 14; the first bitwise negation means 16 bitwise negates the third output value and adds 1 to obtain a second operation result.

In another embodiment of the present invention, the third 64-bit multiplier 15 performs a multiplication operation to obtain a fourth output value after receiving the 64-bit input data, and the fourth output value is transmitted to the second bitwise negation means 17 through the output terminal of the third 64-bit multiplier 15; the second bitwise negation means 17 bitwise negates the fourth output value and adds 1 to obtain a third operation result.

Specifically, as shown in fig. 1, during the operation of the 256-bit multiplier, the first 64-bit adder 11 can receive 2 64-bit input data through its own input terminal, and similarly, the second 64-bit adder 12 can also receive 2 64-bit input data through its own input terminal. Then, the first 64-bit adder 11 and the second 64-bit adder 12 perform corresponding addition operations on the received 2 64-bit input data to obtain a first input value and a second output value, respectively. The first 64-bit adder 11 and the second 64-bit adder 12 pass the first input value and the second output value, respectively, through their own outputs to the first 64-bit multiplier 13. The first 64-bit multiplier 13, after receiving the first input value and the second output value, performs a corresponding multiplication operation on the first input value and the second output value to obtain a first operation result, and transmits the first operation result to the adding device 18.

During the operation of the 256-bit multiplier, the second 64-bit multiplier 14 may receive 2 64-bit input data through its own input end, perform the corresponding multiplication operation on the received 2 64-bit input data to obtain a third output value, and transmit the third output value to the first bitwise negation means 16 through the output end of the second 64-bit multiplier 14. After receiving the third output value, the first bitwise negation means 16 bitwise negates the third output value and adds 1 (that is, forms its two's complement) to obtain a second operation result, and transmits the second operation result to the adding means 18.

Similarly, the third 64-bit multiplier 15 may receive 2 64-bit input data through its own input end, perform the corresponding multiplication operation on the received 2 64-bit input data to obtain a fourth output value, and transmit the fourth output value to the second bitwise negation means 17 through the output end of the third 64-bit multiplier 15. After receiving the fourth output value, the second bitwise negation means 17 bitwise negates the fourth output value and adds 1 to obtain a third operation result, and transmits the third operation result to the adding means 18.

Further, the adding means 18 receives the first operation result, the second operation result, and the third operation result, and performs the corresponding addition operation on the first operation result, the second operation result, and the third operation result based on a plurality of built-in 128-bit adders to obtain a 128-bit output value.
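As a non-limiting illustration, the data path of one product generation means can be modeled in Python as follows. The sketch assumes that the two adders receive (a1, a0) and (b1, b0), that the second and third multipliers receive (a1, b1) and (a0, b0), and that bitwise negation plus 1 is performed modulo 2^W. The working width `W = 130` and the names `negate_plus_one` and `product_generator` are assumptions of this sketch; the embodiment itself describes a 128-bit output value.

```python
W = 130                 # assumed working width, wide enough for a 65x65-bit product
MASK = (1 << W) - 1


def negate_plus_one(x):
    """Bitwise negation followed by +1, i.e. the two's complement of x in W bits."""
    return ((~x) + 1) & MASK


def product_generator(a1, a0, b1, b0):
    """Return a1*b0 + a0*b1 using three multiplications and modular additions."""
    s_a = a1 + a0                       # first 64-bit adder
    s_b = b1 + b0                       # second 64-bit adder
    r1 = (s_a * s_b) & MASK             # first multiplier: first operation result
    r2 = negate_plus_one(a1 * b1)       # second multiplier, then negation + 1
    r3 = negate_plus_one(a0 * b0)       # third multiplier, then negation + 1
    return (r1 + r2 + r3) & MASK        # adding means: modular sum of the three


# Small check: 3*11 + 5*7 = 68.
assert product_generator(3, 5, 7, 11) == 68
```

Adding the two's complements is equivalent to subtracting a1*b1 and a0*b0, so the adding means needs only adders and no dedicated subtractor.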

It should be noted that, in the operation of the 256-bit multiplier of the present invention, each 128-bit product generation apparatus 10 can output a 128-bit output value.

The shift-add means 20 performs a shift-add operation on the basis of the 128-bit output value of each 128-bit product generation means 10 to obtain a 256-bit output value.

Specifically, after obtaining the 128-bit output value, the adding means 18 in each 128-bit product generation means 10 passes the 128-bit output value to the shift-add means 20. After receiving the 128-bit output value output from each 128-bit product generation means 10, the shift-add means 20 may perform the corresponding shift-add operation on these 128-bit output values based on a plurality of built-in 128-bit shift-add adders to obtain a 256-bit output value.

In summary, the 256-bit multiplier provided in the embodiments of the present invention can shorten the delay of multiplication hardware implementation, and can be applied to fast implementation of algorithms in a hardware circuit, thereby optimizing the performance of a digital circuit chip.

Further, in one embodiment of the present invention, the 64-bit multipliers (e.g., the first 64-bit multiplier 13, the second 64-bit multiplier 14, and the third 64-bit multiplier 15) may include a plurality of 16-bit multipliers, and each 16-bit multiplier may include a plurality of 8-bit multipliers.

It should be noted that the 8-bit multiplier described in this embodiment may be an optimized 8-bit multiplier, wherein the 8-bit multiplier may be an 8-bit multiplier generated by a radix-4-Booth encoding.
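As a non-limiting illustration, the following Python sketch shows textbook radix-4 Booth recoding of an unsigned 8-bit multiplier operand. It produces five signed digits in {-2, -1, 0, 1, 2} and therefore five partial-product rows. The function names are hypothetical, and the sketch models only the arithmetic, not the gate-level array of FIG. 2.

```python
def booth4_digits(b, bits=8):
    """Recode unsigned b into radix-4 Booth digits, least significant first."""
    assert 0 <= b < (1 << bits)
    ext = b << 1                        # append the implicit 0 below the LSB
    digits = []
    for i in range(bits // 2 + 1):      # one extra digit because b is unsigned
        t = (ext >> (2 * i)) & 0b111    # bits b[2i+1], b[2i], b[2i-1]
        digits.append(((t >> 1) & 1) + (t & 1) - 2 * (t >> 2))
    return digits


def booth4_partial_products(a, b, bits=8):
    """One signed partial-product row per Booth digit, already weighted by 4**i."""
    return [d * a * (4 ** i) for i, d in enumerate(booth4_digits(b, bits))]


# The rows always sum to the ordinary product.
assert sum(booth4_partial_products(200, 123)) == 200 * 123
```

Each row is 0, ±a, or ±2a shifted by two bit positions per digit, so only shifts and negations are needed to form the rows.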

In the embodiment of the present application, the delay of the multiplier can be shortened by optimizing the partial product array of the 8-bit multiplier, and the 256-bit multiplier of the present application can be constructed on that basis.

The optimization process of the above-described 8-bit multiplier is described in detail below in conjunction with fig. 2-7:

An 8-bit multiplier partial product array generated by the Booth encoding algorithm is shown in FIG. 2. As can be seen from FIG. 2, the left and right sides of the array have stepped, irregular sawtooth portions, and this irregular bit distribution complicates the subsequent compression. The present application can use parallel one-bit additions, with the half adder constructed as shown in FIG. 3, to regularize the left and right sides of the array into the partial product array shown in fig. 4, at the cost of only the delay of a one-bit half adder.

This array optimization not only regularizes the sawtooth bit distribution on the right side of the array, but also shortens the carry chain of the single bit in the last row. Meanwhile, the constant '1' bits distributed in a sawtooth shape on the left side can be calculated in advance. The critical-path delay of the circuit that calculates them is shorter than that of the one-bit half adder used on the right side, and this circuit can execute in parallel with the right-side half adder circuit, which further saves time overhead.

Next, a compression step is designed for the improved 8-bit Booth-encoded multiplier array. The array can be compressed in blocks, using the number of logic stages required to compress a block to one row as the criterion, so that the compression-circuit delay of each block is as equal as possible. The blocks, which are compressed in parallel, then consume as nearly the same delay as possible, and no time on the critical path of the circuit is wasted.

For better understanding, the number of logic stages can be analyzed for one compression block, namely the middle two columns. First, if the gate-level design of the compression circuit is performed for a single column, i.e. 4 bits are compressed into one row, the gate-level logic circuit is configured as shown in fig. 5. The analysis and gate-level design of the compression circuit for two columns on this basis is shown in fig. 6. It can be seen that, through the compression of the logic circuit shown in the figure, 8 bits in two columns are changed into 4 bits in one row, and the longest path in this process can be 11.5 logic gate stages.
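As a non-limiting illustration, the logical behavior of a one-bit half adder can be sketched in Python as follows. This is the standard XOR/AND half adder; the exact gate structure of FIG. 3 is not reproduced, and the function name is hypothetical.

```python
def half_adder(x, y):
    """One-bit half adder: returns (sum, carry) with x + y == sum + 2 * carry."""
    return x ^ y, x & y


# Exhaustive check over the four input combinations.
for x in (0, 1):
    for y in (0, 1):
        s, c = half_adder(x, y)
        assert x + y == s + 2 * c
```

Because a half adder has no carry input, many half adders can operate on independent bit pairs in parallel, each contributing only a single half-adder delay.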

The right four columns can be analyzed similarly. The right four columns are treated as one compression block, which can be compressed to 6 bits in one row, and the longest path in this process can be 12 logic gate stages. Likewise, the left five columns are used as one compression block and are finally compressed into one row of 5 bits (for the multiplication result, any overflow or carry-out beyond the most significant bit can be omitted), and the longest path in this compression circuit can be 13 logic stages. It should be noted that one column in the middle part contains five bits. This five-bit column is grouped with the column to its left, and the critical path for compressing the nine bits of these two columns into one row of four bits is 13 logic gate stages (still within the current maximum number of logic stages). Thus, the whole array can be divided into several compression blocks that require essentially the same number of logic gate stages: the four columns on the right, the five columns on the left, and groups of two columns in the middle (one of which contains a five-bit column). The compression blocks are then executed in parallel, compressing the partial product array from five rows to two rows. The allocation of compression blocks and the compression results are shown in fig. 7.

As shown in fig. 7, after the blocks have been compressed into two rows, the two rows are added together to obtain the final multiplication result, i.e. the multiplication result of the optimized 8-bit multiplier.
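As a non-limiting illustration of reducing five rows to two rows and then performing a single final addition, the following Python sketch uses generic 3:2 carry-save adders. This is a swapped-in textbook technique, not the column-block compressors of FIG. 5 to FIG. 7. The 16-bit width and all names are assumptions of this sketch.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def csa(x, y, z):
    """3:2 carry-save adder: x + y + z == s + c (mod 2**WIDTH), no carry chain."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s & MASK, c & MASK


def compress_to_two_rows(rows):
    """Repeatedly replace three rows by a sum row and a carry row."""
    rows = [r & MASK for r in rows]     # negative rows become two's complement
    while len(rows) > 2:
        s, c = csa(rows[0], rows[1], rows[2])
        rows = rows[3:] + [s, c]
    return rows


def final_add(rows):
    """Single carry-propagating addition of the remaining rows."""
    return sum(rows) & MASK


# Five signed rows whose sum is 200 * 123 (radix-4 Booth rows for that product).
two_rows = compress_to_two_rows([-200, -800, 0, 25600, 0])
assert len(two_rows) == 2
assert final_add(two_rows) == 200 * 123
```

Only the final addition propagates carries across the full width; the compression steps before it are purely bitwise.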

Based on the optimized 8-bit multiplier, the final 256-bit multiplier is obtained by using the divide-and-conquer fast multiplication algorithm Karatsuba-Ofman (KO), as described in detail below with reference to fig. 8-9.

First, a 16-bit multiplier can be obtained using the KO-2 algorithm shown in FIG. 8, as follows. Let the two multiplication operands A and B (both 16 bits) be:

A = a₁·2⁸ + a₀ (1)

B = b₁·2⁸ + b₀ (2)

Then:

A·B = a₁b₁·2¹⁶ + (a₁b₀ + a₀b₁)·2⁸ + a₀b₀ (3)

The coefficient term preceding 2⁸ can be transformed as:

(a₁b₀ + a₀b₁) = (a₁ + a₀)·(b₁ + b₀) - a₁b₁ - a₀b₀ (4)

Thus, the two multiplications in (a₁b₀ + a₀b₁) are reduced to the single multiplication (a₁ + a₀)·(b₁ + b₀), because a₁b₁ and a₀b₀ are already required by the other terms of the product. The extra overhead introduced by the two additions is small and negligible compared with a multiplication. The 8-bit multiplier thus yields a 16-bit multiplier through the KO-2 process shown in fig. 8.
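As a non-limiting illustration, the KO-2 step can be sketched in Python as follows, with the operand split of equations (1) and (2) and the identity of equation (4). The function `mul8` is a hypothetical stand-in for the optimized 8-bit-magnitude multiplier; it is assumed to accept the 9-bit sums a₁ + a₀ and b₁ + b₀.

```python
def mul8(x, y):
    """Stand-in for the optimized 8-bit-magnitude multiplier (assumed)."""
    return x * y


def ko2_mul16(a, b):
    """16-bit by 16-bit multiplication using three smaller multiplications."""
    a1, a0 = a >> 8, a & 0xFF                   # A = a1 * 2**8 + a0
    b1, b0 = b >> 8, b & 0xFF                   # B = b1 * 2**8 + b0
    hi = mul8(a1, b1)                           # coefficient of 2**16
    lo = mul8(a0, b0)                           # coefficient of 2**0
    mid = mul8(a1 + a0, b1 + b0) - hi - lo      # equation (4): a1*b0 + a0*b1
    return (hi << 16) + (mid << 8) + lo         # shift-add recombination


assert ko2_mul16(0xFFFF, 0xFFFF) == 0xFFFF * 0xFFFF
```

The schoolbook method would need four 8-bit multiplications for the same result; here the fourth is traded for two additions and two subtractions.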

After the 16-bit multiplier is obtained, the KO-4 algorithm shown in fig. 9 can be applied twice: the first application of KO-4 yields the 64-bit multiplier, and the second application of KO-4 yields the 256-bit multiplier. The 256-bit KO-4 algorithm is derived as follows:

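Let the two multiplication operands A and B of the KO-4 (256 bits each) be as follows, where a₀ to a₃ and b₀ to b₃ are all 64 bits:

A = a₃·2¹⁹² + a₂·2¹²⁸ + a₁·2⁶⁴ + a₀ (5)

B = b₃·2¹⁹² + b₂·2¹²⁸ + b₁·2⁶⁴ + b₀ (6)

Then:

A·B = a₃b₃·2³⁸⁴ + (a₃b₂ + a₂b₃)·2³²⁰ + (a₃b₁ + a₂b₂ + a₁b₃)·2²⁵⁶ + (a₃b₀ + a₂b₁ + a₁b₂ + a₀b₃)·2¹⁹² + (a₂b₀ + a₁b₁ + a₀b₂)·2¹²⁸ + (a₁b₀ + a₀b₁)·2⁶⁴ + a₀b₀ (7)

The transformation of the coefficient term preceding 2¹⁹² is more complex and is described in detail:

The 64-bit multiplier thus yields a 256-bit multiplier through the KO-4 process.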
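As a non-limiting illustration, one common one-level four-way Karatsuba-Ofman arrangement can be sketched in Python as follows. It is an assumption of this sketch, not necessarily the arrangement of FIG. 9: every cross term aᵢbⱼ + aⱼbᵢ is obtained with the identity of equation (4), giving 10 limb multiplications instead of 16. The names `split4`, `ko4_mul`, `mul16`, `mul64`, and `mul256` are hypothetical, and the top limb is allowed to carry any extra high bits so that slightly wider operands (such as 65-bit sums) are still handled.

```python
from itertools import combinations
from operator import mul as builtin_mul


def split4(x, k):
    """Split x into four k-bit limbs, least significant first; top limb keeps extra bits."""
    mask = (1 << k) - 1
    return [(x >> (k * i)) & mask for i in range(3)] + [x >> (3 * k)]


def ko4_mul(a, b, k, mul=builtin_mul):
    """Multiply a by b with k-bit limbs using 10 calls to the limb multiplier."""
    al, bl = split4(a, k), split4(b, k)
    diag = [mul(al[i], bl[i]) for i in range(4)]            # a_i * b_i
    result = sum(diag[i] << (2 * k * i) for i in range(4))
    for i, j in combinations(range(4), 2):
        # Equation (4) applied to the limb pair (i, j): a_i*b_j + a_j*b_i.
        cross = mul(al[i] + al[j], bl[i] + bl[j]) - diag[i] - diag[j]
        result += cross << (k * (i + j))                    # shift-add
    return result


def mul16(x, y):
    """Stand-in for the 16-bit-magnitude multiplier built by KO-2 (assumed)."""
    return x * y


def mul64(x, y):
    """First KO-4 application: 64-bit-magnitude multiplier from 16-bit limbs."""
    return ko4_mul(x, y, 16, mul16)


def mul256(x, y):
    """Second KO-4 application: 256-bit multiplier from 64-bit limbs."""
    return ko4_mul(x, y, 64, mul64)


top = (1 << 256) - 1
assert mul256(top, top) == top * top
```

In this arrangement the coefficient of 2¹⁹² is the sum of the cross terms for the limb pairs (0, 3) and (1, 2), each of which costs one additional multiplication.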

In the embodiment of the invention, by the above method for constructing the 256-bit multiplier, the partial product array of the 8-bit multiplier generated by radix-4 Booth encoding can be optimized without affecting the logic function of the multiplier, the delay of the multiplier is shortened, and a faster 256-bit multiplier is constructed on that basis.

To sum up, in the 256-bit multiplier according to the embodiment of the present invention, the adding device in each 128-bit product generation device first receives the operation results obtained by the first 64-bit adder, the second 64-bit multiplier, and the third 64-bit multiplier from the 64-bit input data received at the corresponding input ends, and performs an addition operation on the operation results to obtain a 128-bit output value; the shift-add device then performs a shift-add operation on the 128-bit output values of the 128-bit product generation devices to obtain a 256-bit output value. The multiplier can therefore shorten the delay of the multiplication hardware implementation and can be applied to the fast implementation of algorithms in hardware circuits, thereby optimizing the performance of a digital circuit chip.

In the description of the present invention, it is to be understood that the terms "central," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," "counterclockwise," "axial," "radial," "circumferential," and the like are used in the orientations and positional relationships indicated in the drawings for convenience in describing the invention and to simplify the description, and are not intended to indicate or imply that the referenced device or element must have a particular orientation, be constructed and operated in a particular orientation, and are not to be considered limiting of the invention.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.

In the present invention, unless otherwise explicitly stated or limited, the terms "mounted," "connected," "fixed," and the like are to be construed broadly, e.g., as being permanently connected, detachably connected, or integral; can be mechanically or electrically connected; either directly or indirectly through intervening media, either internally or in any other relationship. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.

In the present invention, unless otherwise expressly stated or limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that the first and second features are in indirect contact through an intermediate. Also, a first feature being "on," "over," or "above" a second feature may mean that the first feature is directly or obliquely above the second feature, or may simply indicate that the first feature is at a higher level than the second feature. A first feature being "under," "below," or "beneath" a second feature may mean that the first feature is directly or obliquely under the second feature, or may simply mean that the first feature is at a lower level than the second feature.

In the description of the specification, reference to the description of "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. A 256-bit multiplier comprising a plurality of 128-bit product generating means and a shift-add means, outputs of said plurality of 128-bit product generating means being connected to inputs of said shift-add means, respectively, wherein,

each 128-bit product generation device comprises a first 64-bit adder, a second 64-bit adder, a first 64-bit multiplier, a second 64-bit multiplier, a third 64-bit multiplier, a first bitwise negation device, a second bitwise negation device and an addition device;

the output ends of the first 64-bit adder and the second 64-bit adder are respectively connected with the input end of the first 64-bit multiplier, the output end of the first 64-bit multiplier is connected with the input end of the adding device, the output end of the second 64-bit multiplier is connected with the input end of the adding device through the first bitwise negating device, the output end of the third 64-bit multiplier is connected with the input end of the adding device through the second bitwise negating device, and the output end of the adding device is connected with the input end of the shift-add device;

the adding device receives operation results obtained by operation of the first 64-bit adder, the second 64-bit multiplier and the third 64-bit multiplier through 64-bit input data received by corresponding input ends, and performs addition operation on the operation results to obtain a 128-bit output value, wherein the operation results comprise a first operation result, a second operation result and a third operation result;

the shift-add means performs a shift-add operation on the 128-bit output value of each 128-bit product generating means to obtain a 256-bit output value;

after receiving the 64-bit input data through corresponding input ends, the first 64-bit adder and the second 64-bit adder respectively perform addition operation to obtain a first input value and a second output value, and transmit the first input value and the second output value to the first 64-bit multiplier through corresponding output ends;

the first 64-bit multiplier performs multiplication operation according to the first input value and the second output value to obtain the first operation result;

after receiving the 64-bit input data, the second 64-bit multiplier performs multiplication to obtain a third output value, and the third output value is transmitted to the first bitwise inverting device through the output end of the second 64-bit multiplier;

the first bitwise negation device is used for bitwise negating the third output value and adding 1 to obtain the second operation result;

after receiving the 64-bit input data, the third 64-bit multiplier performs multiplication to obtain a fourth output value, and the fourth output value is transmitted to the second bitwise inverting device through the output end of the third 64-bit multiplier;

and the second bitwise negation device is used for bitwise negating the fourth output value and adding 1 to obtain the third operation result.

2. A 256-bit multiplier according to claim 1, wherein the adding means comprises a plurality of 128-bit adders.

3. A 256-bit multiplier according to claim 1, characterized in that said shift-add means comprise a plurality of 128-bit shift-add adders.

4. The 256-bit multiplier of any of claims 1 to 3, wherein the 64-bit multiplier comprises a plurality of 16-bit multipliers, each 16-bit multiplier comprising a plurality of 8-bit multipliers.