IL106923A - Device for performing modular multiplication

Info

Publication number: IL106923A
Application number: IL10692393A
Authority: IL
Original assignee: Fortress U & T Ltd
Priority date: 1993-09-06
Filing date: 1993-09-06
Publication date: 1998-07-15

Description

106923/2

DEVICE FOR PERFORMING MODULAR MULTIPLICATION

FORTRESS U & T LTD.

C: 19179

IMPROVED METHOD FOR PERFORMING MODULAR AND CONVENTIONAL MULTIPLICATION

Field of the Invention

The invention relates to a process and apparatus for carrying out modular multiplication and exponentiation of large numbers and related operations, particularly for Public Key Cryptographic authentication and encryption protocols.

This application makes reference to 2 applications which have since been cancelled: application No. 103921 and copending application No. 104753.

2815/93

Background of the Invention

In parent copending Application No. 103921, and in first copending addition application No. 104753, a device and method are described and claimed: more specifically, a compact synchronous microelectronic peripheral machine for standard microprocessors, with means for proper clocking and control, which is specifically intended to carry out a modular multiplication method known as the "Interleaved Montgomery Multiprecision Modular Multiplication Method", and modular exponentiation processes based on the said multiplication method, which processes are often used in encryption systems. The parent application fully describes the mathematical basis for those methods, which involves the use of a $\mathcal{P}$ operator, such that, for any two numbers A and B which are the multiplicand and the multiplier, and a modulus N, $A \cdot B \cdot I \bmod N \equiv \mathcal{P}(A \cdot B)_N$, wherein I is a parasitic factor which actually has no influence on the overall precision.

To enact the $\mathcal{P}$ operator on A·B the following process can be carried out, using a precalculated constant J:

1) $X = A \cdot B$
2) $Y = (X \cdot J) \bmod 2^n$ (only the n LS bits are necessary)
3) $Z = X + Y \cdot N$
4) $S = Z / 2^n$
5) P ¥ $S \bmod N$ (N is to be subtracted from S, if S > N)

Finally, at step 5): P ¥ $\mathcal{P}(A \cdot B)_N$. After the subtraction of N, if necessary: $P = \mathcal{P}(A \cdot B)_N = A \cdot B \cdot I \bmod N$.

J is a constant which depends on the modulus N only and is defined by $J \equiv -N^{-1} \bmod 2^n$. The symbol ≡ signifies congruence, while the symbol ¥ signifies limited congruence, viz. that an equality exists or an equality plus a modulo exists.
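The five steps above can be sketched in a few lines of Python. This is only an illustrative model of the arithmetic, not of the serial hardware; the function name `montgomery_p` and its parameter names are hypothetical, and the final comparison uses "greater than or equal to N", which is the condition stated later in the description. The sample values (N = a59, n = 12, J = 217, A = 91, H = 44b, all hexadecimal) are taken from the numerical example in this description.

```python
def montgomery_p(a: int, b: int, n_mod: int, n_bits: int, j: int) -> int:
    """Model of the P operator: returns a*b*I mod N, with I = 2^(-n_bits) mod N.

    Assumes n_mod is odd, a and b are below n_mod, and
    j = -n_mod^(-1) mod 2^n_bits has been precalculated.
    """
    x = a * b                              # step 1
    y = (x * j) % (1 << n_bits)            # step 2: only the n LS bits are kept
    z = x + y * n_mod                      # step 3: the n LS bits of z are zero
    s = z >> n_bits                        # step 4: exact division by 2^n
    return s - n_mod if s >= n_mod else s  # step 5: one conditional subtraction


if __name__ == "__main__":
    N, N_BITS, J = 0xA59, 12, 0x217
    print(hex(montgomery_p(0x91, 0x44B, N, N_BITS, J)))  # prints 0x220
```

With these inputs the result 220 matches the value of A* computed by hand in the numerical example.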

The hardware described in the parent application and in application No. 104753 carries out modular multiplication and exponentiation by applying the $\mathcal{P}$ operator in a new and original procedure. Further, squaring can be carried out by the same method, by applying it to a multiplicand and a multiplier that are equal. Modular exponentiation involves a succession of modular multiplications and squarings, and therefore is carried out by a method which comprises the repeated, suitably combined and oriented application of the aforesaid multiplication, squaring and exponentiation methods. However, a novel and improved way of carrying out modular exponentiation will be further specified herein.

The aforesaid copending applications describe a method for carrying out modular multiplication, wherein the multiplicand A, the multiplier B and the modulo N comprise m characters of k bits each, the multiplicand and the multiplier not being greater than the modulo, which comprises the steps of:

1) precalculating a parameter H and at least the least significant character J0 of another parameter J, as hereinafter defined, and loading J0 into a k-bit register;
2) loading the multiplier B and the modulo N into respective registers of n-bit length, wherein n = m·k;
3) setting an n-bit long register S to zero; and
4) carrying out an i-iteration m times, wherein i is from zero to m-1, each ith iteration comprising the following operations:
a) transferring the ith character A(i-1) of the multiplicand A from Ai register means to storing means chosen from among register and latch means;
b) generating the value X = S(i-1) + A(i-1)·B, wherein S(i-1) is the "updated" value of S, as hereinafter defined, by:
I - cycle right shifting of the B register into multiplying means,
II - serially multiplying B by A(i-1),
III - cycle right shifting of the modulo register N,
IV - determining the "updated" value of S(i-1) as the value stored in the S register after the (i-1)th iteration, if the same is not greater than N, or if it is greater than N, by serially subtracting N from it and assuming the resulting value as the "updated" value of S(i-1); and
V - cycle right shifting of the register S and serially adding the value of the multiplication A(i-1)·B bit by bit to the "updated" value of S;
c) multiplying the LS character of X, X0, by J0 and entering the value $X_0 \cdot J_0 \bmod 2^k$ into register means as Y0, while delaying N and X by k clock cycles;
d) calculating the value Z = X + Y0·N by:
I - multiplying Y0 by N by a delayed right shifting of the N register concurrent with the aforesaid right cycle shifting thereof, and
II - adding X to the value of Y0·N;
e) ignoring the least significant character of Z and entering the remaining characters into the S register, whereby to enter $Z/2^k$, except for the last iteration;
f) comparing $Z/2^k$ to N bit by bit for the purpose of determining the updated value of S, S(i), in the manner hereinbefore defined;
g) wherein the ith character of the multiplicand Ai is loaded into the Ai register means at any time during the aforesaid operations;
5) at the last (mth) iteration, ignoring the least significant character of $Z/2^k$ and entering the remaining characters into the B register, as the value of C ¥ $\mathcal{P}(A \cdot B)_N$;
6) repeating the steps 3) to 4), wherein C or C-N, if C is greater than N, is substituted for B and H is substituted for A, whereby to calculate $P = \mathcal{P}(C \cdot H)_N$; and
7) assuming the value of P obtained from the last iteration as the result of the operation A·B mod N.

Said multiplication method is particularly described in application No. 104753, and therefore will be designated as "the multiplication method of application No. 104753", or a Montgomery multiplication, or a multiplication in the $\mathcal{P}$ field of numbers.
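The character-by-character iteration described above can be modelled in software as follows. This is a minimal sketch under stated assumptions, not a description of the serial circuit: the names `interleaved_p` and `mod_mul` are hypothetical, Python integers stand in for the B, S and N shift registers, the "updated" value of S is obtained with a "greater than or equal to N" test, and H is taken to be $2^{2n} \bmod N$ (an inference from the numerical example in this description, where H = 44b for N = a59 and n = 12). The modular inverse call `pow(x, -1, m)` assumes Python 3.8 or later.

```python
def interleaved_p(a: int, b: int, n_mod: int, k: int, m: int, j0: int) -> int:
    """Interleaved model of P(a*b)_N using m characters of k bits each."""
    mask = (1 << k) - 1
    s = 0                                  # step 3: S register cleared
    for i in range(m):                     # step 4: one iteration per character
        a_i = (a >> (k * i)) & mask        # current character of the multiplicand
        if s >= n_mod:                     # "updated" value of S
            s -= n_mod
        x = s + a_i * b                    # operation b)
        y0 = ((x & mask) * j0) & mask      # operation c): X0 * J0 mod 2^k
        z = x + y0 * n_mod                 # operation d): LS character of z is zero
        s = z >> k                         # operation e): drop the LS character
    return s - n_mod if s >= n_mod else s


def mod_mul(a: int, b: int, n_mod: int, k: int, m: int) -> int:
    """a*b mod N by two passes: C = P(a*b), then P(C*H) removes the factor I."""
    r = 1 << (k * m)
    j0 = (-pow(n_mod, -1, 1 << k)) % (1 << k)   # LS character of J
    h = (r * r) % n_mod                          # assumed definition of H
    c = interleaved_p(a, b, n_mod, k, m, j0)
    return interleaved_p(c, h, n_mod, k, m, j0)


if __name__ == "__main__":
    # N = a59 split into m = 3 characters of k = 4 bits, so J0 = 7.
    print(hex(interleaved_p(0x91, 0x44B, 0xA59, 4, 3, 7)))  # prints 0x220
```

Only the least significant character J0 of J is needed because each iteration clears just one k-bit character of Z, which is why step 1 of the method loads J0 alone into a k-bit register.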

Said copending applications also describe a method for performing the modular exponentiation of $D = A^E \bmod N$ which comprises the following steps:

1) loading the modulo number into the aforesaid register N;
2) setting the aforesaid register S to zero;
3) loading the base A to be exponentiated into the aforesaid register B;
4) storing the exponent E in a computer register;
5) shifting said exponent E left;
6) ignoring all the zero bits thereof which precede the first 1 bit and ignoring the first 1 bit of said exponent E, and for all the following bits performing the operations 7 to 9:
7) for every one of said bits, regardless of its being 0 or 1, squaring the content of register B by the multiplication method hereinbefore set forth, wherein the successive characters of the base are loaded into register Ai from register B;
8) if and only if the current bit of the exponent E is 1, multiplying, after performing operation 7), the content of register B by the base A;
9) after each Montgomery square or Montgomery multiply operation, performing a Montgomery C·H multiplication, $\mathcal{P}(C \cdot H)_N$; and
10) after performing steps 6-9 for all bits of E, storing the result of the last operation as D ¥ $A^E \bmod N$ in register B.

Further, said copending applications describe a method for performing modular exponentiation of $D = A^E \bmod N$ which comprises the steps of:

1) loading the modulo number into the aforesaid register N;
2) setting the aforesaid register S to zero;
3) loading the base A to be exponentiated into the aforesaid register B;
4) storing the exponent E in a computer register, and a precalculated parameter T in the CPU memory;
5) shifting said exponent E left;
6) ignoring all the zero bits thereof which precede the first 1 bit and ignoring the first 1 bit of said exponent E, and for all the following bits performing the operations 7 to 8:
7) for every one of said bits, regardless of its being 0 or 1, carrying out operations 4 and 5 of the multiplication method hereinbefore set forth, wherein both the multiplicand and the multiplier are the base A, and wherein the successive characters of the base are loaded into register Ai from register B;
8) if and only if the current bit of the exponent E is 1, carrying out, after performing operation 7), operations 4 and 5 of the multiplication method hereinbefore set forth, wherein the multiplicand is the content of register B and the multiplier is the base A; and
9) after performing steps 7-8 for all bits of E, performing an additional Montgomery multiplication of register B by the parameter T, $\mathcal{P}(B \cdot T)_N$, and then storing the result of the last operation as D ¥ $A^E \bmod N$ in register B.

Parameter T is defined as $T = (2^n)^S \bmod N$, wherein $S = 2^{q-1} + E \bmod 2^{q-1}$, as explained in detail in the parent application.

It is the purpose of this invention to provide an improved method for performing an exponentiation operation, and to provide a method for performing a multiplication operation of large numbers in the conventional field of numbers (not the modular field), by the same hardware described therein, the preferred embodiment of which is the device described in the parent application and in the addition application No. 104753, and to which multiplexer M2_1;6 in Figure 3 (38 in Figure 2) has been appended.

Summary of the Invention

This invention provides a further improved method for performing modular exponentiation of $D = A^E \bmod N$, which comprises the steps of:

1) storing the exponent E in a computer register;
2) loading the modulo number into the aforesaid register N;
3) setting the aforesaid register S to zero;
4) performing a multiplication operation of A by H, by the method of application No. 104753, where A is the operand to be exponentiated and H is a precalculated parameter as defined before;
5) loading A into the base register B;
6) performing a squaring operation of the contents of register B;
7) shifting said exponent E left;
8) ignoring all the zero bits thereof which precede the first 1 bit and ignoring the first 1 bit of said exponent E, and for all the following bits performing the operations 9 to 10:
9) for every one of said E bits, regardless of its being 0 or 1, carrying out operations 4 and 5 of the squaring method hereinbefore set forth, wherein both the multiplicand and the multiplier originate from the B register, and wherein the successive characters of the Montgomery multiplier are loaded into register Ai from register B;
10) if and only if the current bit of the exponent E is 1, carrying out, after performing operation 9, operations 4 and 5 of the multiplication method hereinbefore set forth, wherein the multiplicand is the content of register B and the multiplier is the base A; and
11) after performing steps 8-10 for all bits of E, performing an additional Montgomery multiplication of register B by the original base A and then storing the result of the last operation as D ¥ $A^E \bmod N$ in register B if the exponent is odd; if the exponent is even, performing an additional Montgomery multiplication of D times 1: B ¥ $\mathcal{P}(D \cdot 1)_N$ ¥ $D \cdot I$.

It is seen that the exponentiation method of this invention eliminates the need for the computation of the parameter T, hereinbefore mentioned.

It has further been found, and this is another object of the present invention, that the machine described in the previous applications (in a 512 bit register size form) permits obtaining the result of the conventional multiplication of two n/2 bit numbers (actually any two operands which when multiplied will not cause a result longer than n bits, i.e. an overflow) without using the additional hardware or the cumbersome operations that would be required to obtain it according to the prior art. This is achieved by carrying out modular multiplication of said numbers by the multiplication process of the aforesaid application No. 104753, wherein the modulo number N is an n-bit number consisting of all "1's" (ffffff....fff), equating J0 to 1, and loading the multiplicand in B and manipulating A as in said multiplication process of application No. 104753.

The device for carrying out such multiplication in the normal field of numbers by the aforesaid method can be the same device described in our two copending applications, hereinbefore mentioned, which comprises control means including a CPU and a multiplication circuit which comprises: an n-bit shift register B for the multiplier; an n-bit shift register N for the modulo; an n-bit shift register for the value S as herein defined; a k-bit register Ai for the multiplicand; k-bit register means for the values J0 and Y0 as herein defined; multiplier means for multiplying the content of the B register by that of the Ai register; additional n-bit multiplier means; and adding, subtracting, multiplexing and delay means.

The said device is particularly described in parent application No. 103921 , and therefore will be called hereinafter "the device of application No. 103921".

Preferably, all connections between the n-bit registers and the remaining components and between components none of which is a latch, are 1-bit connections.

Description of the Drawings

In the drawings:
- Fig. 1 is a block diagram of an apparatus suitable for carrying out the invention;
- Fig. 2 is a schematic block diagram of a modular multiplication circuit forming part of said apparatus, with an additional multiplexer [38] which can force N to be all ones;
- Fig. 3 shows a particular modular multiplication circuit as appeared in the parent application, with an additional multiplexer [M2_1;6] which can force N to be all ones; and
- Fig. 4 is a schematic diagram illustrating the timing relationship between the various operations of an iteration of the multiplication operation according to an embodiment of the invention.

Detailed Description of Preferred Embodiments

Fig. 1 illustrates in block diagram form a device for carrying out the methods according to the invention. The device comprises:
1) A complete Central Processing Unit (CPU)
2) Counters
3) A State Machine.

The CPU contains volatile and non-volatile memory, some of which can be utilized by this multiplication process. The CPU controls the modular arithmetic block in the circuit.

The CPU:
1) Communicates with a host.
2) Loads and unloads data to and from the chip.
3) Commands the circuit to perform a sequence of mathematical operations.
4) Is responsible for other cryptographic and noncryptographic, and data processing operations.

The counters generate the address for the embodied State Machine.

The embodied State Machine decodes the addresses and generates control signals to the MULT block. These control signals command the MULT block to perform the proper sequence of operations necessary to calculate the $\mathcal{P}(A \cdot B)_N$ transformation (where A can be equal to B).

Fig. 2 shows in block diagram form a modular multiplication circuit according to the invention, which can be used for carrying out modular squaring and modular exponentiation. Numerals 10, 11 and 12 indicate three registers that are n bits long (n = m·k), which constitute the B, S and N registers respectively, into which the multiplier, the value S and the modulo number are loaded. The aforesaid registers are preferably divided into two n/2 registers, preferably including a k least significant bit subdivision for the N and B registers. Multiplexers 13, 14 and 15 respectively are placed before the said registers, and if they are subdivided into component parts, a multiplexer is placed before each subdivision. As shown in the block diagram, these registers are intended to be serially loaded, but it would also be possible to load them in parallel. 16, 17 and 18 are three registers, each of which is k bits long, for receiving the Ai, J0 and Y0 values respectively. Registers 16 and 17 are serial load-parallel output or serial and parallel load-parallel output shift registers. Register 18 is preferably a serial in parallel output shift register. The content of these registers is intended to be processed by multiplying means 19 and 20 through components 21 and 22, which are preferably k-bit latches. If they are latches, they are loaded from registers 16, 17 and 18 through k-bit buses. If they are registers, they can be serially loaded through 1-bit connections. Numerals 24, 25, 25', 26, 36, 37 and 38 also designate multiplexers. Multipliers 19 and 20 may be A serial, B parallel inputs, serial output multiplier means or any other serial/parallel inputs-serial outputs multiplying means. Multiplexer 38 can force the modulus N to be all "1"s for multiplying in the normal field of numbers.

Numerals 27, 28, 29, 30, and 31 designate 1-bit full/half adder/subtract means. 31 designates a full adder/subtract means. 32, 33 and 34 designate k-bit k-clock cycle delay means capable of delaying digital signals, which may be composed of analog or digital components, though digital components are preferred. 35 is a Borrow detector, which is a two bit latch/storage means. As is seen, the device according to the invention - although it is intended to handle large numbers such as 512-bit numbers - does not comprise buses, except optionally a few k bit buses, and this constitutes an important saving of hardware. When registers B, S and N comprise n/2 bit parts, the device of the invention can be used to carry out multiplication and exponentiation operations on 256-bit numbers, which is a substantial advantage as to the flexibility of the use of the device.

Fig. 3, which is self-explanatory and which is derived from the parent application, shows a preferred embodiment of the device of Fig. 2 wherein multiplexer M2_1;6 has been appended to the original Figure 3.

It will be evident to skilled persons how the device of Fig. 2 or Fig. 3 carries out the operations which constitute the multiplication method according to the invention. The timing relationship of said operations is, however, further illustrated in Fig. 4. Said figure diagrammatically illustrates all the various operations carried out in effective successive clock cycles in an embodiment of the invention, in which n = 512, k = 32 and m = 16. This is a fairly common situation in the encryption art. When the invention is carried out according to the embodiment illustrated in Fig. 3, the same device can be used to operate with n = 256, as well.

In Fig. 4 a succession of the various operations is illustrated as a function of the effective clock cycles, which are marked on the abscissa axis. At the beginning of the operation and before any of the iterations which form a part of the modular multiplication method according to the invention, the values of B, N and S are loaded in the respective registers. The first character of A is also loaded into the respective register [16]. As soon as an iteration begins and during k clock cycles, the shifting of the content of the B and S registers is carried out. The generation of the X value takes place during n+k effective clock cycles, the first k clock cycles being occupied by entering the value of X0. During the first effective k clock cycles the value of Y0 has been entered. During the next effective n+k clock cycles, the value of X, which had been introduced into multiplier 20, is now shifted or introduced into adder 31 after having been delayed by delay 34. The value of N is used at three different time phases: first, to "update" S and B; second, delayed k effective clock cycles, to multiply by Y0; and then, delayed a second k effective clock cycles, to sense how the next value of S or B will be "updated". During the same n+k effective clock cycles, Z is calculated, as well as $Z/2^k$. The value of Ai is loaded beginning with the first k effective clock cycles and continuing during the successive part of the iteration. The final value of $Z/2^k$ is entered into register S (or B) during n clock cycles after the first 2k effective clock cycles.

As is well known in the art, and by definition, an exponentiation operation is actually a series of multiplications of the base operand by itself. As mentioned in our two previous applications, each multiplication operation performed by our machine adds a parasitic factor I to the result; e.g. to calculate $C = A^2 \bmod N$ by the machine of our invention, a product $A^2 \cdot I \bmod N$ results. In order to eliminate the parasitic quantity I, a second operation of $C = \mathcal{P}(C \cdot H)_N = A^2 \bmod N$ is needed for each multiplication operation (H is a precalculated parameter). Thus, according to the parent application, in order to calculate $A^E$, the operation $C = \mathcal{P}(C \cdot H)_N$ should be repeated as many times as Montgomery squares or multiplies were performed. In our addition application No. 104753 we introduced the parameter T, that can be precalculated, and is equal to $T = (2^n)^S \bmod N$, wherein $S = 2^{q-1} + E \bmod 2^{q-1}$, as explained in detail in application No. 104753. As shown in application No. 104753, only one multiplication of $C = \mathcal{P}(C \cdot T)_N$ is needed at the end of all the multiplication operations, and there is no need for the many multiplications of $C = \mathcal{P}(C \cdot H)_N$ each time a Montgomery square or multiply is performed. For many purposes, however, the large exponents are not constants (e.g. in the NIST DSS signature algorithm) and T cannot be precalculated.
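For orientation, the constants mentioned above can be computed for a small modulus as sketched below. The function name `montgomery_constants` is hypothetical. The relations $I = 2^{-n} \bmod N$ and $H = 2^{2n} \bmod N$ are assumptions inferred from the numerical example in this description (where $\mathcal{P}(1 \cdot 1)_N = I$ and $\mathcal{P}(1 \cdot H)_N = I^{-1}$); J follows the definition $J \equiv -N^{-1} \bmod 2^n$ given in the Background. The modular inverse form of `pow` assumes Python 3.8 or later.

```python
def montgomery_constants(n_mod: int, n_bits: int):
    """Return (I, I_inv, H, J) for an odd modulus n_mod and register length n_bits."""
    r = 1 << n_bits                  # 2^n
    i_inv = r % n_mod                # I^-1 = 2^n mod N
    i = pow(r, -1, n_mod)            # parasitic factor I = 2^-n mod N
    h = (r * r) % n_mod              # assumed: H = 2^(2n) mod N = I^-2 mod N
    j = (-pow(n_mod, -1, r)) % r     # J = -N^-1 mod 2^n
    return i, i_inv, h, j


if __name__ == "__main__":
    i, i_inv, h, j = montgomery_constants(0xA59, 12)
    print(hex(i), hex(i_inv), hex(h), hex(j))  # prints 0x15a 0x5a7 0x44b 0x217
```

All four values depend on N alone, so they can be precalculated once per modulus, whereas T also depends on the exponent E.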

The introduction of the parameter T can be avoided if the following steps are followed in order to calculate $A^E$. Assuming that we have precalculated the Montgomery constant, H, and that our device can both square and multiply in the $\mathcal{P}$ field, we wish to calculate: $C = A^E \bmod N$.

Let E(j) denote the jth bit in the binary representation of the exponent E, starting with the MS bit whose index is 1 and concluding with the LS bit whose index is q. We can exponentiate as follows for odd exponents:

A* ¥ $\mathcal{P}(A \cdot H)_N$
B = A*
FOR j = 2 TO q
B ¥ $\mathcal{P}(B \cdot B)_N$
IF E(j) = 1 and j ≠ q THEN B ¥ $\mathcal{P}(B \cdot A^*)_N$
IF j = q THEN B ¥ $\mathcal{P}(B \cdot A)_N$
ENDFOR
C = B

In the transition from each step to the next, N is subtracted from B whenever B is larger than or equal to N. After the last iteration, the value B is ¥ to $A^E \bmod N$, and C is the final value.

For even exponents, the last step could be:

IF j = q THEN B ¥ $\mathcal{P}(B \cdot 1)_N$

To clarify, we shall use the following example:

E = 1011 --> E(1) = 1; E(2) = 0; E(3) = 1; E(4) = 1

To find $A^{11} \bmod N$; q = 4

$A^* = \mathcal{P}(A \cdot H)_N \equiv A \cdot I^{-2} \cdot I = A \cdot I^{-1} \bmod N$
B = A*
for j = 2 to q
j = 2: $B = \mathcal{P}(B \cdot B)_N$, which produces $A^2 (I^{-1})^2 \cdot I = A^2 I^{-1}$
E(2) = 0; $B = A^2 I^{-1}$
j = 3: $B = \mathcal{P}(B \cdot B)_N$ --> $(A^2 I^{-1})^2 \cdot I = A^4 I^{-1}$
E(3) = 1: $B = \mathcal{P}(B \cdot A^*)_N$ --> $(A^4 I^{-1})(A I^{-1}) \cdot I = A^5 I^{-1}$
j = 4: $B = \mathcal{P}(B \cdot B)_N$ --> $A^{10} \cdot I^{-2} \cdot I = A^{10} I^{-1}$

As E(4) = 1 (the exponent is odd), the last multiplication will be by A, to remove the parasitic $I^{-1}$.

$B = \mathcal{P}(B \cdot A)_N$ --> $A^{10} I^{-1} \cdot A \cdot I = A^{11}$
C = B

Numerical example: $A^E \bmod N$; $A = 91_{16}$; $E = 11_{10} = 1011_2$; $N = a59_{16}$; $H = 44b_{16}$

Therefore: q = 4; J = 217; n = 12 (all values below are hexadecimal unless marked otherwise).

To show that the basic tenets hold we calculate I and $I^{-1}$ using Montgomery multiplication:

$\mathcal{P}(1 \cdot 1)_N = 1 \cdot 1 \cdot I = I$
x = 1·1 = 1
y = 1·217 mod $2^n$ = 217
z = 1 + 217·a59 = 15a000
z/1000 = 15a < N
I = 15a

$\mathcal{P}(1 \cdot H)_N = 1 \cdot I^{-2} \cdot I = I^{-1}$
x = 44b
y = 44b·217 mod $2^n$ = 8bd
z = 44b + (8bd)(a59) = 5a7000
z/1000 = 5a7 < N
$I^{-1}$ = 5a7

q = 4; n = 12; J = 217; A = 91; E(2) = 0; E(3) = E(4) = 1; N = a59

$A^* = \mathcal{P}(A \cdot H)_N = A \cdot I^{-1} = \mathcal{P}(91 \cdot 44b)_N$
x = 91·44b = 26e7b
y = x·J mod $2^n$ = 30d
z = x + (30d)(a59) = 220000 --> z/1000 = 220
check: $A \cdot I^{-1} \bmod N$ = 91·5a7 mod a59; (91·5a7)/a59 = 4f remainder 220
A* = 220
B = A* = 220

for j = 2 to 4

j = 2: $B = \mathcal{P}(B \cdot B)_N = \mathcal{P}(A^* \cdot A^*)_N = \mathcal{P}(220 \cdot 220)_N$
x = 220·220 = 48400
y = x·J mod $2^n$ = c00
z = x + (c00)(a59) = 80b000
z/1000 = 80b
$A^2 I^{-1}$ = 80b
check: $A^2 \bmod$ a59 = 9b2 --> 9b2·5a7 mod a59: 36cd1e/a59 = 54b remainder 80b
E(2) = 0: $B = (A \cdot I^{-1} \cdot A \cdot I^{-1} \cdot I) = A^2 \cdot I^{-1}$

j = 3: $B = \mathcal{P}(B \cdot B)_N = (A^2 I^{-1})^2 \cdot I \bmod N = A^4 I^{-1}$
x = 80b·80b = 40b079
y = x·J mod $2^n$ = cdf
z = 40b079 + (cdf)(a59) = c5e000
z/1000 = c5e
c5e > N, so B = c5e - a59 = 205 = $A^4 I^{-1}$
check: $A^4$ = 9b2·9b2 mod N ≡ 577; $577 \cdot I^{-1}$ = 577·5a7 ≡ 205
E(3) = 1: $B = \mathcal{P}(205 \cdot A^*)_N = A^5 I^{-1}$
x = 205·220 = 44aa0
y = x·217 mod $2^n$ = 460
z = x + (460)(a59) = 319000
z/1000 = 319
check: $A^4 I^{-1} \cdot A$ ≡ 205·91 mod N = 319; $A^5$ = 577·91 ≡ 5fb; 5fb·5a7 = $A^5 \cdot I^{-1}$ ≡ 319

j = 4: $B = \mathcal{P}(319 \cdot 319)_N = A^{10} I^{-1}$
x = 319·319 = 99871
y = x·J mod $2^n$ = 427
z = x + (427)(a59) = 349000
z/1000 = 349
check: 5fb·5fb mod N ≡ $A^{10} \bmod N$ ≡ 8c5; $A^{10} I^{-1}$ ≡ 8c5·5a7 ≡ 349

As E(4) = 1, multiply by A (not A*): $B = \mathcal{P}(A^{10} I^{-1} \cdot A)_N = A^{11} \bmod N$
x = 349·91 = 1dc59
y = x·J mod $2^n$ = dff
z = x + (dff)(a59) = 92b000
$A^{11} \bmod N$ = 92b
check: X = 8c5·91 = 4f795 --> X/N = 7a remainder 92b

Had we wanted to find $A^{10}$ (E(4) = 0), i.e. an even exponent, we would take $A^{10} I^{-1}$ and multiply it by 1: $\mathcal{P}(349 \cdot 1)_N$
x = 349
y = 349·217 mod $2^n$ = d8f
z = x + (d8f)(a59) = 8c5000
$A^{10}$ = 8c5, which was shown previously.
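The exponentiation sequence worked through above, including the alternative last step for even exponents, can be sketched as follows. This is an illustrative software model only: `mont_mul` and `mont_exp` are hypothetical names, the whole-number form of the $\mathcal{P}$ operator is used rather than the character-serial one, H is assumed to equal $2^{2n} \bmod N$ (consistent with H = 44b in the example), exponents below 2 are handled separately because the loop needs at least two bits, and `pow(x, -1, m)` assumes Python 3.8 or later.

```python
def mont_mul(a: int, b: int, n_mod: int, n_bits: int, j: int) -> int:
    """P(a*b)_N = a*b*I mod N, with N subtracted when the result is >= N."""
    x = a * b
    y = (x * j) % (1 << n_bits)
    s = (x + y * n_mod) >> n_bits
    return s - n_mod if s >= n_mod else s


def mont_exp(a: int, e: int, n_mod: int, n_bits: int) -> int:
    """a^e mod N without the parameter T; assumes N odd and 0 <= a < N."""
    if e < 2:                                   # not covered by the FOR loop
        return a % n_mod if e == 1 else 1 % n_mod
    r = 1 << n_bits
    j = (-pow(n_mod, -1, r)) % r                # J = -N^-1 mod 2^n
    h = (r * r) % n_mod                         # assumed definition of H
    bits = bin(e)[2:]                           # E(1)..E(q), MS bit first
    q = len(bits)
    a_star = mont_mul(a, h, n_mod, n_bits, j)   # A* = A * I^-1 mod N
    b = a_star
    for idx in range(1, q):                     # j = 2 .. q in the text
        b = mont_mul(b, b, n_mod, n_bits, j)    # square for every bit
        if idx == q - 1:
            # last bit: multiply by A (odd E) or by 1 (even E) to remove I^-1
            last = a if bits[idx] == "1" else 1
            b = mont_mul(b, last, n_mod, n_bits, j)
        elif bits[idx] == "1":
            b = mont_mul(b, a_star, n_mod, n_bits, j)
    return b


if __name__ == "__main__":
    print(hex(mont_exp(0x91, 0b1011, 0xA59, 12)))  # prints 0x92b
    print(hex(mont_exp(0x91, 0b1010, 0xA59, 12)))  # prints 0x8c5
```

The two printed values agree with the hand-computed results for $A^{11}$ and $A^{10}$ above. Only H and J, which depend on N alone, are needed in advance, so the exponent may change from one call to the next.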

As hereinbefore stated, the apparatus described in the aforesaid copending applications permits obtaining the result of a conventional multiplication of large numbers. For example, it is possible to multiply two 256 bit numbers if a machine of 512 bit registers is used. In order to do this, a value of all "1's" (ffffff...fff) is loaded into register N, or alternatively supplied by a switch providing a constant logic "1" at the output of register N, and the remaining operation is carried out as in application No. 104753.
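The all-ones modulus technique can be tried out with the short model below. It is a sketch under stated assumptions rather than a description of the circuit: the function name `conventional_multiply` is hypothetical, J0 is fixed at 1 as the description specifies for an all-ones N, and the conditional subtraction of N is left out because Python integers do not overflow and the final value equals the plain product whenever that product fits in n = k·m bits. The test values are those of the hexadecimal example in this section (A = f79, B = efe, N = ffffff, k = 8).

```python
def conventional_multiply(a: int, b: int, k: int, m: int) -> int:
    """Plain product a*b via the interleaved iteration with N = 2^(k*m) - 1.

    Assumes a has at most m characters of k bits and a*b < 2^(k*m).
    """
    n_mod = (1 << (k * m)) - 1         # register N forced to all ones
    mask = (1 << k) - 1
    j0 = 1                             # J0 = 1 when N is all ones
    s = 0
    for i in range(m):
        a_i = (a >> (k * i)) & mask    # next character of the multiplicand
        x = s + a_i * b
        y0 = ((x & mask) * j0) & mask  # equals the LS character of x
        z = x + y0 * n_mod             # clears the LS character of x
        s = z >> k                     # drop the zero character
    return s


if __name__ == "__main__":
    print(hex(conventional_multiply(0xF79, 0xEFE, 8, 3)))  # prints 0xe7f80e
```

Adding Y0·N with N all ones is the same as adding Y0·2^n and subtracting Y0, so each iteration moves the low character of X to the top of the register; after m iterations the characters are back in their natural order.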

If N = ffff....fff, the following conditions exist:
1) $H = J = J_0 = I = I^{-1} = 1$, and:
2) $C = \mathcal{P}(A \cdot B)_N = A \cdot B \cdot I$, but I = 1 (as shown in 1), so that:
3) C = A·B

That means that the result C is equal to the multiplication A·B in the conventional field of numbers.

Example (all numbers in hexadecimal):

N = ffffff, so: $H = J = J_0 = I = I^{-1} = 1$
Let: A = f79, B = efe, so: A0 = 79, A1 = f, A2 = 0
S0 = 0
X = 0 + 79·efe = 7160e
Y = 1·0e = e = Y0
Z = 7160e + e·ffffff = e071600, and by dropping the least significant 00: S1 = e0716
X = e0716 + f·efe = ee7f8
Y = 1·f8 = f8 = Y0
Z = ee7f8 + f8·ffffff = f80ee700, and by dropping the least significant 00: S2 = f80ee7
X = S2 + 0·B = S2 = f80ee7
Y = 1·e7 = e7
Z = X + e7·ffffff = e7f80e00, and by dropping the least significant 00: S3 = e7f80e
Check: f79·efe = e7f80e

While an embodiment of the invention has been described by way of illustration, it will be evident that the invention may be carried out by skilled persons with many modifications, variations and adaptations, without departing from its spirit or exceeding the scope of the claims.

Claims

1. Microelectronic apparatus for performing modular multiplication of a multiplier by a multiplicand, the apparatus comprising: first (B), second (S) and third (N) main switched and clocked serial-in serial-out registers respectively operative to store the multiplier, a partial result and a modulus; a first multiplying device in which the multiplicand resides and which is operative, for each of a plurality of portions of the multiplicand in turn, to receive the multiplier from the B register, to multiply the multiplier by a current portion of the multiplicand, and to generate an output comprising a product of said multiplication; a serial adder operating on the output of the first multiplying device and a limited congruence of the partial result residing in the S register and operative to provide an output; a second multiplying device receiving, in a first phase, the output of the serial adder and a Montgomery constant and receiving, in a second phase, the modulus from the N register, and operative, in the first phase, to compute a first phase product of the Montgomery constant by a portion of the output of the serial adder and, in the second phase, to multiply the modulus by the first phase product, thereby to generate a second phase output which, when combined with the serial adder output, generates said partial result; and a subtractor for subtracting the modulus from the contents of the S register, to produce a limited congruence thereof, wherein, after the plurality of portions of the multiplicand have been processed by the first multiplying device, said partial result constitutes a limited congruence of a result of performing said modular multiplication of said multiplier by said multiplicand.

2. Apparatus according to claim 1 further comprising a second subtractor for subtracting the modulus from the contents of the B register, to produce the contents of the B register reduced by the modulus, and wherein said first multiplying device includes a first serial/parallel multiplying device serially receiving said contents of the B register reduced by the modulus, and receiving the multiplicand in parallel.

3. Apparatus according to claim 1 wherein said first multiplying device includes a first input latch in which the multiplicand resides.

4. Apparatus according to claim 1 wherein the second multiplying device includes a second input latch which receives the multiplicand.

5. Apparatus according to claim 1 employing no multiplying devices other than said first and second multiplying devices.

6. Apparatus according to claim 1 wherein said second multiplying device comprises a multiplexed serial/parallel multiplying device.

7. Apparatus according to claim 1 wherein a Montgomery constant J0 resides in the second multiplying device in the first phase and the first phase output of the second multiplying device resides in the second multiplying device in the second phase.

8. Apparatus according to claim 7 comprising an array of 2-to-1 multiplexers operative to feed the Montgomery constant into said second multiplying device in the first phase, and to feed the first phase output of the second multiplying device into the second multiplying device, in the second phase.

9. Apparatus according to claim 8 comprising a serial/parallel register serially receiving the output of the second multiplying device in the first phase and feeding the output in parallel into said second multiplying device via said multiplexer array in the second phase.

10. Apparatus according to claim 1 wherein the lengths of the first and second multiplying devices are both k, the apparatus also comprising: a second adder operative to receive, with a delay of k effective clock cycles, the output of the serial adder and to receive the second phase output of the second multiplying device and to add these outputs, thereby to generate second adder output in which the k least significant bits are zeros, and to feed said second adder output into a selected one of the B register or the S register; and a k-bit delay unit intermediate the serial and second adders operative to provide the delay of k effective clock cycles.

11. Apparatus according to claim 9 wherein the lengths of the first and second multiplying devices are both k, the apparatus also comprising a second adder operative to receive, with a delay of k effective clock cycles, the output of the serial adder and to receive the second phase output of the second multiplying device and to add these outputs, thereby to generate second adder output in which the k least significant bits are zeros, and to feed said second adder output into a selected one of the B register or the S register, wherein said array of multiplexers comprises k 2-to-1 multiplexers and wherein said serial/parallel register is of length k; and a k-bit delay unit intermediate the first and second adders operative to provide the delay of k effective clock cycles.

12. Apparatus according to claim 10 also comprising a borrow sensing device operative to receive output from the second adder and to determine whether the output of the second adder is larger than or equal to the modulus.

13. Apparatus according to claim 10 also comprising a first serial subtracter receiving the contents of the B register, subtracting therefrom the modulus, thereby to compute a modularly reduced multiplier, if the second adder output is larger than or equal to the modulus, and feeding the modularly reduced multiplier to said first multiplying device.

14. Apparatus according to claim 10 also comprising a second subtracter receiving the contents of the S register and subtracting therefrom the modulus, thereby to compute a modularly reduced multiplier, if the second adder output is larger than or equal to the modulus, and feeding the modularly reduced output of the S register to said multiplying device.

15. Apparatus according to claim 1 wherein said first and second multiplying devices are each of length k and wherein the duration of the first phase is k effective clock cycles.

16. Apparatus according to claim 1 wherein the serial adder operates on the output of the first multiplying device and modularly reduced contents of the S register.

17. Apparatus according to claim 1 wherein said main switched and clocked registers are subdivided.

18. Apparatus according to claim 13 also comprising: a second subtracter receiving the contents of the S register; and a comparator determining whether the second adder output is larger than or equal to the modulus, wherein the comparator is operatively associated with the second subtracter so as to control subtraction of the modulus from the contents of the S register.

19. Apparatus according to claim 13 comprising: a second subtracter receiving the contents of the B register; and a comparator determining whether the second adder output is larger than or equal to the modulus, wherein the comparator is operatively associated with the second subtracter so as to control subtraction of the modulus from the contents of the B register.

20. Apparatus according to claim 10 wherein said second adder comprises a serial adder.

21. Apparatus according to claim 1 wherein said second phase output comprises serial output.