JP5294787B2

JP5294787B2 - Data processing apparatus and data processing method

Info

Publication number: JP5294787B2
Application number: JP2008263670A
Authority: JP
Inventors: Masayuki Yoshino; Katsuyuki Okeya; Camille Vuillaume
Original assignee: Renesas Electronics Corp
Current assignee: Renesas Electronics Corp
Priority date: 2008-10-10
Filing date: 2008-10-10
Publication date: 2013-09-18
Anticipated expiration: 2028-10-10
Also published as: JP2010091913A

Abstract

PROBLEM TO BE SOLVED: To provide a data processing device capable of improving the computation efficiency of modular multiplication for data whose bit length exceeds twice the operand bit length of a modular multiplier.

SOLUTION: When a computation unit (310) recursively repeats modular multiplication a plurality of times to calculate the quotient and remainder of a 2w-bit modular multiplication from the quotients and remainders of w-bit modular multiplications, a control unit (320) performs control so that the remainder and quotient of the w-bit modular multiplication determined by the preceding computation are distributed to the next computation. Compared with an algorithm in which the preceding computation yields only the remainder of the w-bit modular multiplication, the quotient that the subsequent recursive computation needs does not have to be computed anew. This improves the computation efficiency of modular multiplication on data whose bit length is a multiple of two of the operand bit length of the modular multiplier.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a data processing device with a modular multiplication function in the field of information security, for example to an IC card microcomputer that applies cryptography based on modular multiplication, and further to an IC card equipped with such a microcomputer.

As typified by RSA, the de facto standard for public-key cryptography, modular multiplication is one of the most basic operations in cryptography. Because modular multiplication is computationally heavy, many cryptographic devices such as IC cards carry a modular multiplication unit (hereinafter also called the dedicated modular multiplier) as dedicated hardware so that it can be executed at high speed.

Meanwhile, cryptographic algorithms that use modular multiplication tend to recommend longer key lengths year after year for security reasons. In particular, RSA and elliptic curve cryptography, the representative public-key algorithms, require longer keys because of recent improvements in computing performance and in cryptanalytic algorithms. Supporting a long key requires a dedicated modular multiplier with a large circuit, but a larger circuit raises production cost. In recent years there has also been strong demand to adopt cryptographic devices even in small devices typified by RFID, for purposes such as protecting user privacy. There is therefore strong demand to reduce the circuit size of the dedicated modular multiplier. For this reason, data processing devices have been provided that compute long modular multiplications using an arithmetic unit whose maximum operand bit length is short. Non-Patent Documents 1, 2, and 3 describe techniques that realize a modular multiplication of twice the maximum bit length that the dedicated modular multiplier can compute, which keeps that maximum bit length small and helps shrink the overall circuit. Non-Patent Document 4 organizes and generalizes the calculation procedures of Non-Patent Documents 1 to 3.

Non-Patent Document 4 introduces Algorithm 1, which computes the quotient of a modular multiplication using the dedicated modular multiplier capable of modular multiplications of up to w bits, and Algorithm 2, which uses that quotient to compute (the remainder of) a modular multiplication of up to 2w bits. Here the quotient q and the remainder r of a modular multiplication satisfy the following equations.
r = xy 2^(-m) mod (z)   (Formula a)
xy = qz + r 2^m   (Formula b)

[Algorithm 1]
Input: x, y, z, where 0 ≤ x, y < z, gcd(z, 2^m) = 1, and 0 ≤ m < w
Output: xy / z, xy mod (z)
Step 1. r ← xy 2^(-m) mod (z)
Step 2. r' ← xy 2^(-m) mod (z + 2^m)
Step 3. q ← r - r'
Step 4. If q ≤ -2^m, then q ← q + z + 2^m
Step 5. Return (q, r)
Algorithm 1 computes the quotient (q) of a modular multiplication from two kinds of remainders (the remainder r of step 1 and the remainder r' of step 2). Algorithm 1 therefore needs at least two modular multiplication remainders. gcd(z, 2^m) = 1 means that the greatest common divisor of z and 2^m is 1, that is, z and 2^m are relatively prime.
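The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the hardware implementation: the function names are hypothetical, and `montgomery_remainder` merely stands in for the dedicated modular multiplier by using Python's built-in `pow` with a negative exponent (Python 3.8 or later) to obtain 2^(-m) modulo the given modulus.

```python
def montgomery_remainder(x, y, modulus, m):
    """Stand-in for the hardware unit: x*y*2^(-m) mod modulus."""
    return (x * y * pow(2, -m, modulus)) % modulus


def quotient_and_remainder(x, y, z, m):
    """Algorithm 1: return (q, r) with x*y == q*z + r*2**m (Formula b)."""
    r = montgomery_remainder(x, y, z, m)                   # step 1
    r_prime = montgomery_remainder(x, y, z + (1 << m), m)  # step 2
    q = r - r_prime                                        # step 3
    if q <= -(1 << m):                                     # step 4
        q += z + (1 << m)
    return q, r                                            # step 5


# Small check of Formula b with x = 7, y = 9, z = 13, m = 2.
q, r = quotient_and_remainder(7, 9, 13, 2)
assert 7 * 9 == q * 13 + r * 2**2
```

Step 3 works because Formula b can be rewritten as xy = q(z + 2^m) + (r - q) 2^m, so r' is congruent to r - q modulo z + 2^m; step 4 then moves the difference into the range that the true quotient occupies.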

[Algorithm 2]
Input: X = x1 c + x0 2^m, Y = y1 c + y0 2^m, Z = z1 c + z0 2^m, where 0 ≤ m < w
Output: XY mod Z
Step 1. r1 ← x1 y1 2^(-m) mod (z1) and q1 ← (x1 y1 - r1 2^m) / z1
Step 2. r2 ← q1 z0 2^(-m) mod (c) and q2 ← (q1 z0 - r2 2^m) / c
Step 3. r3 ← (x0 + x1)(y0 + y1) 2^(-m) mod (c) and q3 ← ((x0 + x1)(y0 + y1) - r3 2^m) / c
Step 4. r4 ← x0 y0 2^(-m) mod (c) and q4 ← (x0 y0 - r4 2^m) / c
Step 5. r5 ← c(-q2 + q3 - q4 + r1) 2^(-m) mod (z1) and q5 ← (c(-q2 + q3 - q4 + r1) - r5 2^m) / z1
Step 6. r6 ← q5 z0 2^(-m) mod (c) and q6 ← (q5 z0 - r6 2^m) / c
Step 7. Return (q2 + q4 - q6 - r1 - r2 + r3 - r4 + r5) c + (r2 + r4 - r6) 2^m
In each step the quotient is the value q that satisfies Formula b for that step's operands and modulus. In the input of Algorithm 2, the data X, Y, and Z of maximum bit length 2w are expressed with the shorter data x1, x0, y1, y0, z1, z0, c, and 2^m. Since m is determined by the modular multiplication that the dedicated modular multiplier implements, fixing the value of c determines the other values. For example, depending on m, c is set to a value such as c = 1 (when m = w), c = 2^m (m = 0), or c = 2^w - 1 (m arbitrary). Algorithm 2 repeatedly computes remainders on this shorter data using the quotients obtained by Algorithm 1, and finally obtains the remainder of XY divided by Z, written XY mod Z. Algorithm 2 requires six pairs of modular multiplication quotient and remainder (steps 1 to 6). Ignoring additions and subtractions, whose cost is comparatively light, the cost of Algorithm 2 is approximately the total cost of computing six quotient-remainder pairs.
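The seven steps of Algorithm 2 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patent's implementation: the helper `mul_qr` and the function name `double_size_mul` are hypothetical, each quotient is read as the value satisfying Formula b, and the quotient-remainder pair is computed with ordinary big-integer arithmetic instead of the dedicated multiplier. Under that reading, substituting the six identities into XY shows that the returned value is congruent to XY 2^(-2m) modulo Z; it is not necessarily reduced to the range 0 to Z - 1.

```python
def mul_qr(x, y, z, m):
    """Return (q, r) with x*y == q*z + r*2**m and 0 <= r < z (Formula b)."""
    r = (x * y * pow(2, -m, z)) % z      # needs gcd(z, 2**m) == 1
    q = (x * y - (r << m)) // z          # exact division by construction
    return q, r


def double_size_mul(x1, x0, y1, y0, z1, z0, c, m):
    """Algorithm 2 on X = x1*c + x0*2**m, Y = y1*c + y0*2**m, Z = z1*c + z0*2**m."""
    q1, r1 = mul_qr(x1, y1, z1, m)                       # step 1
    q2, r2 = mul_qr(q1, z0, c, m)                        # step 2
    q3, r3 = mul_qr(x0 + x1, y0 + y1, c, m)              # step 3
    q4, r4 = mul_qr(x0, y0, c, m)                        # step 4
    q5, r5 = mul_qr(c, -q2 + q3 - q4 + r1, z1, m)        # step 5
    q6, r6 = mul_qr(q5, z0, c, m)                        # step 6
    high = q2 + q4 - q6 - r1 - r2 + r3 - r4 + r5         # coefficient of c
    low = r2 + r4 - r6                                   # coefficient of 2**m
    return high * c + (low << m)                         # step 7


# Example with c = 2**4 - 1 and m = 3 (values chosen only for illustration).
c, m = 15, 3
x1, x0, y1, y0, z1, z0 = 5, 9, 11, 2, 13, 7
X, Y, Z = x1 * c + (x0 << m), y1 * c + (y0 << m), z1 * c + (z0 << m)
R = double_size_mul(x1, x0, y1, y0, z1, z0, c, m)
assert (R * (1 << (2 * m)) - X * Y) % Z == 0
```

The sketch assumes that z1 and c are odd so that the inverse of 2^m exists for every modulus used.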

Non-Patent Document 1: W. Fischer, J.-P. Seifert: "Increasing the bitlength of crypto-coprocessors", CHES 2002, vol. 2523 of Lecture Notes in Computer Science, Springer-Verlag, pp. 71-81 (2003).
Non-Patent Document 2: Benoit Chevallier-Mames, Marc Joye, and Pascal Paillier: "Faster Double-Size Modular Multiplication From Euclidean Multipliers", CHES 2003, vol. 2779 of Lecture Notes in Computer Science, Springer-Verlag, pp. 214-227 (2003).
Non-Patent Document 3: Masayuki Yoshino, Katsuyuki Okeya, and Camille Vuillaume: "Unbridle the Bit-Length of a Crypto-Coprocessor with Montgomery Multiplication", in Preproceedings of SAC 2006, pp. 184-198 (2006).
Non-Patent Document 4: Masayuki Yoshino, Katsuyuki Okeya, and Camille Vuillaume: "Double-Size Bipartite Multiplication", ACISP 2007, vol. 4586 of Lecture Notes in Computer Science, Springer-Verlag, pp. 230-244 (2007).

In the implementation method introduced in Non-Patent Document 4 (hereinafter, the prior art), Algorithm 1 and Algorithm 2 are applied to the dedicated modular multiplier, whose maximum bit length computable in one operation (w) is limited, in order to compute a modular multiplication of up to twice that bit length (2w). That is, Algorithms 1 and 2 are used to compute, one after another, the quotients and remainders of modular multiplications of maximum bit length w, from which the remainder of a 2w-bit modular multiplication is obtained. The present inventors developed this further and studied performing modular multiplication of four times, and even eight times, the bit length by recursively repeating the double-bit-length modular multiplication of the above algorithms. However, performing a quadruple-bit-length modular multiplication by recursion requires the quotient of the double-bit-length modular multiplication. Because the techniques described in the non-patent documents are specialized for doubling, they do not consider outputting that quotient in a form usable by a recursive computation. Consequently, if the above algorithms are simply applied to quadrupling, an extra operation must be added that specially computes each quotient according to the double-bit-length algorithm. Moreover, that extra operation must be added at every step of Algorithm 2, so the overall processing time increases markedly. The problems concerning modular multiplication of a multiple of the modular multiplier's maximum bit length are summarized below.

Problem 1: Efficiency
(a): In the prior art, computing the quotient of a modular multiplication that exceeds the maximum bit length (w) the dedicated modular multiplier can handle in one operation is expensive. As an example, consider using Algorithms 1 and 2 to compute a modular multiplication of four times that bit length (4w). By Algorithm 2, the 4w-bit modular multiplication needs six 2w-bit remainders (r1, r2, r3, r4, r5, r6) and six 2w-bit quotients (q1, q2, q3, q4, q5, q6). To compute each 2w-bit remainder, Algorithm 2 is applied again, which needs six w-bit remainders (r1, r2, r3, r4, r5, r6) and six w-bit quotients (q1, q2, q3, q4, q5, q6) each. To compute a 2w-bit quotient, on the other hand, Algorithm 1 requires two kinds of remainders: the 2w-bit remainder obtained above and the remainder of a modular multiplication with a different value. Computing the six 2w-bit quotients (q1, q2, q3, q4, q5, q6) therefore needs six further 2w-bit remainders (r1', r2', r3', r4', r5', r6'), each of which, by Algorithm 2, needs six w-bit quotient-remainder pairs.

Consequently, as the bit length of the modular multiplication increases, the number of w-bit remainders and quotients needed (that is, the number of w-bit modular multiplications) grows exponentially. For example, using a modular multiplier that computes remainders of up to w bits, a 2w-bit modular multiplication takes 12 w-bit modular multiplications, a 4w-bit modular multiplication takes 12² of them, and an 8w-bit modular multiplication takes 12³. This revealed the problem that the exponentially growing amount of computation must be suppressed.
(b): Algorithms 1 and 2 require not only modular multiplication but also the four arithmetic operations (addition, subtraction, multiplication, and division). Performance may therefore improve if the implementation environment provides some or all of the four arithmetic operations in addition to modular multiplication. This revealed that it is advantageous to use the four arithmetic operations, in addition to modular multiplication, when computing modular multiplications that exceed twice the maximum bit length the dedicated modular multiplier can handle in one operation.
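The exponential growth described under item (a) can be reproduced with a few lines of Python. The function name is hypothetical; the two constants come from the text: Algorithm 2 uses six quotient-remainder pairs, and Algorithm 1 obtains each pair from two remainders.

```python
def w_bit_multiplications(doublings):
    """Count w-bit modular multiplications for a (2**doublings * w)-bit one."""
    pairs_per_level = 6      # Algorithm 2: six (quotient, remainder) pairs
    remainders_per_pair = 2  # Algorithm 1: two remainders per pair
    return (pairs_per_level * remainders_per_pair) ** doublings


for doublings in (1, 2, 3):
    bits = 2 ** doublings
    print(f"{bits}w bits: {w_bit_multiplications(doublings)} multiplications")
```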

Problem 2: Flexible extension of the bit length
(a): In the prior art using Algorithms 1 and 2, when the dedicated modular multiplier computes modular multiplications of up to w bits in one operation, a 2w-bit modular multiplication can be computed. Applying the method recursively gives modular multiplications of w bits times a power of 2, such as 4w, 8w, and 16w bits. The multiple, however, is always restricted to a power of 2. When the bit length handled by the dedicated modular multiplier cannot be changed (for example, when only the maximum bit length can be computed), a shorter modular multiplication has to be replaced by a longer one. For example, if the dedicated modular multiplier supports up to 1024 bits and can also compute 512-bit modular multiplications, a 2048 (= 512 × 2²)-bit modular multiplication must be computed instead of a 1536 (= 512 × 3)-bit one. A longer modular multiplication increases both the computation time and the memory needed for temporary data. This revealed that it is desirable to compute modular multiplications of any integer multiple of the dedicated modular multiplier's bit length, not only of a power of 2.
(b): When the bit length of the modular multiplication is long, a larger area (memory or the like) is needed to hold temporary data. This revealed that it is desirable to be able to enlarge the work memory of the modular multiplier easily.
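The size penalty described under item (a) of Problem 2 can be made concrete with a short Python sketch. Both function names are hypothetical: the first models the prior-art restriction to power-of-2 multiples, the second models the integer-multiple sizing that the text calls desirable.

```python
def power_of_two_operand_bits(target_bits, unit_bits):
    """Smallest unit_bits * 2**j that covers target_bits (prior-art restriction)."""
    size = unit_bits
    while size < target_bits:
        size *= 2
    return size


def integer_multiple_operand_bits(target_bits, unit_bits):
    """Smallest integer multiple of unit_bits that covers target_bits."""
    return -(-target_bits // unit_bits) * unit_bits  # ceiling division


# The example from the text: a 1536-bit operand on a 512-bit unit.
assert power_of_two_operand_bits(1536, 512) == 2048
assert integer_multiple_operand_bits(1536, 512) == 1536
```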

An object of the present invention is to provide a data processing device that can improve the computation efficiency of modular multiplication for data whose bit length exceeds twice the operand bit length of a modular multiplier.

Another object of the present invention is to provide a data processing device that can improve the computation efficiency of modular multiplication for data whose bit length is an integer multiple, greater than two, of the operand bit length of a modular multiplier.

Still another object of the present invention is to provide a data processing device in which the work memory for modular multiplication of data whose bit length exceeds twice the operand bit length of a modular multiplier can be enlarged easily.

The above and other objects and novel features of the present invention will become apparent from the description of this specification and the accompanying drawings.

An outline of representative aspects of the invention disclosed in the present application is briefly given below.

[1] When the quotient and remainder of a 2w-bit modular multiplication are computed from the remainders and quotients of w-bit modular multiplications by recursively repeating the modular multiplication process a plurality of times, control is performed so that the remainder and quotient of the w-bit modular multiplication obtained in a preceding modular multiplication process are distributed to the next modular multiplication process. Compared with an algorithm in which the preceding process yields only the remainder of the w-bit modular multiplication, the quotient of the preceding process that the later recursive computation needs does not have to be computed anew. This improves the computation efficiency of modular multiplication for data whose bit length is a multiple of 2 of the operand bit length of the modular multiplier.

[2] When the quotient and remainder of a kw-bit (k > 2) modular multiplication are computed from the remainders and quotients of w-bit modular multiplications, the arithmetic unit is made to execute a splitting process that divides the kw-bit multiplication into w-bit multiplications, and a reduction process that computes the modular multiplication from the products of the split multiplications. This improves the computation efficiency of modular multiplication for data whose bit length is an integer multiple, greater than two, of the operand bit length of the modular multiplier.

[3] The control unit of the arithmetic unit that performs modular multiplication can use a RAM arranged in the address space of the central processing unit as work memory for the arithmetic unit. The work memory for modular multiplication can thus be increased without enlarging the work memory inside the arithmetic unit.
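The idea in item [1], that a quotient needed by the next recursion level can be assembled from values the current level has already produced, can be illustrated in Python. This sketch is an assumption-laden illustration and not the patent's Algorithm 3, which is not reproduced in this section: the names `mul_qr` and `double_mul_qr` are hypothetical, the six steps follow Algorithm 2 with each quotient read as the value satisfying Formula b, and the expression for Q is derived here by collecting the multiples of Z that appear when those six identities are substituted into XY.

```python
def mul_qr(x, y, z, m):
    """Return (q, r) with x*y == q*z + r*2**m and 0 <= r < z."""
    r = (x * y * pow(2, -m, z)) % z      # needs gcd(z, 2**m) == 1
    return (x * y - (r << m)) // z, r


def double_mul_qr(x1, x0, y1, y0, z1, z0, c, m):
    """Return (Q, R) with X*Y == Q*Z + R*2**(2*m), using six w-bit (q, r) pairs.

    X = x1*c + x0*2**m, Y = y1*c + y0*2**m, Z = z1*c + z0*2**m.
    R is congruent to X*Y*2**(-2*m) mod Z but is not range-reduced here.
    """
    q1, r1 = mul_qr(x1, y1, z1, m)
    q2, r2 = mul_qr(q1, z0, c, m)
    q3, r3 = mul_qr(x0 + x1, y0 + y1, c, m)
    q4, r4 = mul_qr(x0, y0, c, m)
    q5, r5 = mul_qr(c, -q2 + q3 - q4 + r1, z1, m)
    q6, r6 = mul_qr(q5, z0, c, m)
    R = (q2 + q4 - q6 - r1 - r2 + r3 - r4 + r5) * c + ((r2 + r4 - r6) << m)
    Q = q1 * c + ((q5 - q1) << m)        # reuses q1 and q5; no extra multiplication
    return Q, R


c, m = 3, 1
x1, x0, y1, y0, z1, z0 = 1, 1, 2, 1, 3, 1
X, Y, Z = x1 * c + (x0 << m), y1 * c + (y0 << m), z1 * c + (z0 << m)
Q, R = double_mul_qr(x1, x0, y1, y0, z1, z0, c, m)
assert X * Y == Q * Z + R * (1 << (2 * m))
```

In this sketch the double-size quotient costs only an addition, a subtraction, and a shift on top of the six pairs, which is the kind of saving item [1] describes relative to recomputing the quotient from a second double-size remainder.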

The effects obtained by representative aspects of the invention disclosed in the present application are briefly described below.

[1] The computation efficiency of modular multiplication can be improved for data whose bit length exceeds twice the operand bit length of the modular multiplier.

[2] The computation efficiency of modular multiplication can be improved for data whose bit length is an integer multiple, greater than two, of the operand bit length of the modular multiplier.

[3] The work memory for modular multiplication of data whose bit length exceeds twice the operand bit length of the modular multiplier can be enlarged easily.

1. Outline of Embodiments
First, an outline of representative embodiments of the invention disclosed in the present application is given. Reference numerals in parentheses that refer to the drawings in this outline merely exemplify what is included in the concept of the components to which they are attached.

[1] A data processing device (601) according to the present invention has an arithmetic unit (310) and a control unit (320) for modular multiplication. The arithmetic unit performs the modular multiplication process. When the quotient and remainder of a 2w-bit modular multiplication are computed from the remainders and quotients of w-bit modular multiplications by recursively repeating the modular multiplication process a plurality of times, the control unit performs control that distributes the remainder and quotient of the w-bit modular multiplication obtained in a preceding modular multiplication process to the next modular multiplication process (Algorithm 3).

[2] More specifically, a data processing device (601) according to the present invention has an arithmetic unit (310) and a control unit (320) for modular multiplication. Let w be a positive integer representing the bit length of operand values; let x, y, z be w-bit non-negative integers satisfying 0 ≤ x, y, z < 2^w; let X, Y, Z be 2w-bit non-negative integers satisfying 0 ≤ X, Y, Z < 2^(2w); and let m, n be non-negative integers. The arithmetic unit performs arithmetic processing that outputs integers q and r satisfying the modular multiplication expression xy = qz + r 2^n. When the arithmetic processing is repeated recursively, the control unit controls the process of distributing the integers q and r output by the dedicated modular multiplier to the next arithmetic processing for obtaining integers Q and R that satisfy the multiplication expression XY = QZ + R 2^(2m) (Algorithm 3).

[3] In the data processing device of item 2, the arithmetic unit has a modular multiplier (101), an adder (102), and a subtractor (103).

[4] In the data processing device of item 3, the arithmetic unit further has a data memory (306), an accumulator (312), and a selector that selects a data path from the data memory or the accumulator to the modular multiplier, the adder, or the subtractor. The accumulator accumulates the output of the modular multiplier, the adder, or the subtractor, and outputs the accumulated data to the selector or the data memory.

[5] In the data processing device of item 2, the control unit has a program memory (305) that holds an arithmetic control program describing the procedure of the processing, and a control circuit (303) that decodes arithmetic instructions read from the program memory and generates control signals for causing the arithmetic unit to execute the arithmetic processing.

[6] The data processing device of item 2 further comprises a central processing unit (603) that gives the control unit instructions for modular multiplication processing for encryption or decryption, and is formed on a single semiconductor substrate.

[7] The data processing device of item 2 further has a RAM (606) arranged in the address space of the central processing unit, and the control unit can use the RAM as work memory for the arithmetic unit.

〔８〕本発明の別の観点によるデータ処理装置（６０１）は、剰余乗算のための演算部（３１０）と制御部（３２０）を有する。前記演算部は剰余乗算の演算処理を行う。前記制御部は、ｗビットの剰余乗算の剰余乗算の剰余と商から、ｋｗビット（ｋ＞２）の剰余乗算の商と剰余を計算するとき、kｗビットの乗算をｗビットの乗算に分割する分割演算処理と、分割処理された乗算の積から剰余乗算を計算するためのリダクション処理を前記演算部に実行させる（アルゴリズム１０）。 [8] A data processing device (601) according to another aspect of the present invention includes a calculation unit (310) and a control unit (320) for remainder multiplication. The arithmetic unit performs arithmetic operation of remainder multiplication. The controller divides kw-bit multiplication into w-bit multiplication when calculating a kw-bit (k> 2) quotient and remainder from the remainder and quotient of the remainder multiplication of the w-bit remainder multiplication. The calculation unit is caused to execute a division calculation process and a reduction process for calculating a remainder multiplication from the divided multiplication product (algorithm 10).

〔９〕更に具体的な観点によるデータ処理装置（６０１）は、剰余乗算のための演算部（３１０）と制御部（３２０）を有する。前記演算部は剰余乗算の演算処理を行う。前記制御部は、Ｘ、Ｙ、Ｚを0≦Ｘ、Ｙ、Ｚ<2^kwを満たすkwビットの非負の整数とし、剰余乗算の演算式Ｒ＝ＸＹ2^-kw mod Ｚを満たす非負の整数Ｒを得るとき、kwビットの整数同士の乗算の積ＸＹを小さいビット数の乗算の積に分割する分割演算処理と、前記の整数Ｚに基づいて前記分割処理された前記乗算の積ＸＹに対して次数を低くするリダクション処理と、を前記演算部に実行させる（アルゴリズム１０）。 [9] The data processing device (601) according to a more specific viewpoint includes a calculation unit (310) and a control unit (320) for remainder multiplication. The arithmetic unit performs arithmetic operation of remainder multiplication. The control unit sets X, Y, and Z as kw-bit non-negative integers that satisfy 0 ≦ X, Y, and Z <2 ^kw, and sets a non-negative integer R that satisfies a remainder multiplication arithmetic expression R = XY 2 ^−kw mod Z A division operation for dividing a product XY of multiplications of integers of kw bits into products of multiplication of a small number of bits, and an order with respect to the multiplication product XY subjected to the division processing based on the integer Z The calculation unit is caused to execute a reduction process for lowering (algorithm 10).

〔１０〕項９のデータ処理装置において、前記リダクション処理は、前記分割処理された前記乗算の積ＸＹに対して、最終的に０以上Ｚ未満の値を求める処理である。 [10] In the data processing device according to item 9, the reduction process is a process of finally obtaining a value of 0 or more and less than Z for the product XY obtained by the division process.

〔１１〕項１０のデータ処理装置において、前記演算部は、剰余乗算器（１０１）、加算器（１０２）、及び減算器（１０３）を有する。 [11] In the data processing device according to item 10, the arithmetic unit includes a modular multiplier (101), an adder (102), and a subtractor (103).

〔１２〕項１１のデータ処理装置は更に、前記演算部は、データメモリ（３０６）と、アキュムレータ（３１２）と、前記データメモリ又は前記アキュムレータから前記剰余乗算器、前記加算器、又は前記減算器へのデータ経路を選択するセレクタ（３０８）とを有する。前記アキュムレータは、前記剰余乗算器、前記加算器、又は前記減算器の出力を累積し、累積されたデータをセレクタ又はデータメモリに出力する。 [12] In the data processing device according to item 11, the arithmetic unit further includes a data memory (306), an accumulator (312), and a selector (308) that selects a data path from the data memory or the accumulator to the modular multiplier, the adder, or the subtractor. The accumulator accumulates the outputs of the modular multiplier, the adder, or the subtractor, and outputs the accumulated data to the selector or the data memory.

〔１３〕項９のデータ処理装置において、前記制御部は、前記処理の手順を記述した演算制御プログラムを保持するプログラムメモリ（３０５）と、前記プログラムメモリから読み出される演算命令を解読して前記演算部に前記分割演算処理及びリダクション処理を実行させるための制御信号を生成する制御回路（３０３）とを有する。 [13] In the data processing device according to item 9, the control unit includes a program memory (305) that holds an arithmetic control program describing the procedure of the processing, and a control circuit (303) that decodes arithmetic instructions read from the program memory and generates control signals for causing the arithmetic unit to execute the splitting operation process and the reduction process.

〔１４〕項９のデータ処理装置は更に、前記制御部に暗号化又は復号のための剰余乗算処理の指示を与える中央処理装置（６０３）を有し、１個の半導体基板に形成されている。 [14] The data processing device according to item 9 further includes a central processing unit (603) that instructs the control unit to perform modular multiplication processing for encryption or decryption, and is formed on a single semiconductor substrate.

〔１５〕項１４のデータ処理装置は更に、前記中央処理装置のアドレス空間に配置されたＲＡＭ（６０６）を有し、前記制御部は前記ＲＡＭを前記演算部のワークメモリとして用いることが可能とされる。 [15] The data processing device according to item 14 further includes a RAM (606) arranged in the address space of the central processing unit, and the control unit can use the RAM as a work memory for the arithmetic unit.

２．実施の形態の詳細
実施の形態について更に詳述する。 2. Details of Embodiments Embodiments will be further described in detail.

《実施に形態１》
非特許文献４の上記アルゴリズム１と上記アルゴリズム２では、前記の剰余乗算専用器が1回で計算できる剰余乗算の最大ビット長の２倍を越える剰余乗算を計算する場合、剰余乗算の商の計算コストが大きいという前記課題１があった。剰余乗算の商の計算に、２種類の剰余乗算の剰余を要するからである。 << Embodiment 1 >>
In Algorithm 1 and Algorithm 2 of Non-Patent Document 4, when computing a modular multiplication whose bit length exceeds twice the maximum bit length that the dedicated modular multiplier can handle in a single operation, the cost of computing the quotient of the modular multiplication is high (Problem 1 above). This is because computing the quotient requires the remainders of two kinds of modular multiplication.

そこで、剰余乗算の剰余から商を計算する方式（即ち、上記アルゴリズム１と上記アルゴリズム２に沿った方式）ではなく、乗算を計算し、剰余乗算の商と剰余を効率的に計算するアルゴリズム３を以下に示す。 Therefore, instead of a method for calculating the quotient from the remainder of the remainder multiplication (that is, the method according to the algorithm 1 and the algorithm 2), an algorithm 3 for calculating the multiplication and efficiently calculating the quotient and the remainder of the remainder multiplication is used. It is shown below.

［アルゴリズム３］
入力：X = x1 c + x0 2^m、Y = y1 c + y0 2^m、Z = z1 c + z0 2^m、ただし0≦m<ｗ
出力：XY2^-2m / Z、XY2^-2m mod Z
ステップ1. r1 ← x0 y0 2^-m mod(2^w) and q1 ← x0 y0 -r1 2^w 2^m
ステップ2. r2 ← (x0+x1)(y0+y1)2^-m mod(z1) and q2 ← (x0+x1)(y0+y1) -r2 z1 2^m
ステップ3. r3 ← x1y1 2^-m mod(z1) and q3 ← x1y1 -r3 z1 2^m
ステップ4. r4 ← (r3-q1)c 2^-m mod(z1) and q4 ← (r3-q1)c -r4 z1 2^m
ステップ5. r5 ← q3 z0 2^-m mod(z1) and q5 ← q3 z0 -r5 z1 2^m
ステップ6. r6← (q2-q3+q4-q5) z0 2^-m mod(2^w) and q6 ← (q2-q3+q4-q5) z0 -r6 2^w 2^m
ステップ7. Return q3 2^w-m+(q2-q3+q4-q5)2^m and (q1-q6-r1+r2-r3+r4-r5)2^w-m+ (r1-r6)2^m
ただし、出力値 (XY2^-2m / Z)は、XY2^-2mをZで割り、小数点以下を切捨てた値である。上記アルゴリズム３における入出力データは、以下の恒等式を満たす。
XY = (XY2^-2m / Z)Z + (XY2^-2m mod Z)2^2m
即ち、(XY2^-2m / Z)の値をもつ剰余乗算の商をＱ、(XY2^-2m mod Z)の値をもつ剰余をＲとすると、(式ａ)、（式ｂ）同様、以下の式が成り立つ。
Ｒ＝XY2^-2m mod Z・・・（式Ａ）
XY＝QZ＋Ｒ2^2m ・・・(式Ｂ)
上記アルゴリズム３の各ステップは、剰余乗算の商と剰余を求める計算と、加算と減算から構成される。剰余乗算の商を計算する毎に、上記アルゴリズム２を２回（再帰的に）呼び出す必要があった従来技術と比べ、剰余乗算の商と剰余をまとめて計算する上記アルゴリズム３は、計算量が少なく、全体の処理時間を短縮できる。 [Algorithm 3]
Input: X = x1 c + x0 2^m, Y = y1 c + y0 2^m, Z = z1 c + z0 2^m, where 0 ≤ m < w
Output: XY 2^-2m / Z and XY 2^-2m mod Z
Step 1. r1 ← x0 y0 2^-m mod (2^w) and q1 ← (x0 y0 - r1 2^m) / 2^w
Step 2. r2 ← (x0+x1)(y0+y1) 2^-m mod (z1) and q2 ← ((x0+x1)(y0+y1) - r2 2^m) / z1
Step 3. r3 ← x1 y1 2^-m mod (z1) and q3 ← (x1 y1 - r3 2^m) / z1
Step 4. r4 ← (r3-q1) c 2^-m mod (z1) and q4 ← ((r3-q1) c - r4 2^m) / z1
Step 5. r5 ← q3 z0 2^-m mod (z1) and q5 ← (q3 z0 - r5 2^m) / z1
Step 6. r6 ← (q2-q3+q4-q5) z0 2^-m mod (2^w) and q6 ← ((q2-q3+q4-q5) z0 - r6 2^m) / 2^w
Step 7. Return q3 2^(w-m) + (q2-q3+q4-q5) 2^m and (q1-q6-r1+r2-r3+r4-r5) 2^(w-m) + (r1-r6) 2^m
Here the output value (XY 2^-2m / Z) is obtained by dividing XY 2^-2m by Z and truncating the fractional part. The input and output data of Algorithm 3 satisfy the following identity.
XY = (XY 2^-2m / Z) Z + (XY 2^-2m mod Z) 2^2m
That is, letting Q denote the quotient of the modular multiplication, with the value (XY 2^-2m / Z), and R the remainder, with the value (XY 2^-2m mod Z), the following formulas hold, as with (Formula a) and (Formula b).
R = XY 2^-2m mod Z ... (Formula A)
XY = QZ + R 2^2m ... (Formula B)
Each step of Algorithm 3 consists of a calculation that obtains the quotient and remainder of a modular multiplication, together with additions and subtractions. Compared with the prior art, which had to call Algorithm 2 twice (recursively) every time the quotient of a modular multiplication was computed, Algorithm 3 computes the quotient and the remainder together, requires less computation, and shortens the overall processing time.
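As a sanity check on Formula B, the following Python sketch walks through the seven steps of Algorithm 3 in the simplest setting. It is an illustration under stated assumptions, not the patented implementation: it assumes m = 0 and c = 2^w, and it reads each step's quotient as the ordinary integer quotient q = (xy - r) / z. The names `mm` and `algorithm3` are hypothetical, and Python integers hide the operand-width limits that a real w-bit multiplier would impose.

```python
def mm(x, y, z):
    """Assumed w-bit primitive for m = 0: returns (q, r) with x*y == q*z + r, 0 <= r < z."""
    return divmod(x * y, z)


def algorithm3(X, Y, Z, w):
    """Sketch of Algorithm 3 under the assumptions m = 0 and c = 2**w.

    Returns (Q, R) with X*Y == Q*Z + R and 0 <= R < Z.
    The upper half z1 of Z must be nonzero.
    """
    c = 1 << w
    x1, x0 = divmod(X, c)
    y1, y0 = divmod(Y, c)
    z1, z0 = divmod(Z, c)

    q1, r1 = mm(x0, y0, c)             # Step 1
    q2, r2 = mm(x0 + x1, y0 + y1, z1)  # Step 2
    q3, r3 = mm(x1, y1, z1)            # Step 3
    q4, r4 = mm(r3 - q1, c, z1)        # Step 4
    q5, r5 = mm(q3, z0, z1)            # Step 5
    t = q2 - q3 + q4 - q5
    q6, r6 = mm(t, z0, c)              # Step 6

    # Step 7: combine the w-bit partial results into the 2w-bit outputs.
    Q = q3 * c + t
    R = (q1 - q6 - r1 + r2 - r3 + r4 - r5) * c + (r1 - r6)

    # Not part of Algorithm 3 as listed: normalise R into [0, Z) so the
    # result can be compared with an ordinary division.
    k, R = divmod(R, Z)
    return Q + k, R


Q, R = algorithm3(0xBEEF, 0x1234, 0xC0DE, 8)
assert 0xBEEF * 0x1234 == Q * 0xC0DE + R  # Formula B with m = 0
```

Only six calls to the w-bit primitive are made, and each call yields a quotient and a remainder that are both reused, which is the point of the algorithm.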

例えば、最大ｗビットの剰余乗算の剰余を計算する前記剰余乗算器を用い、２ｗビットの剰余乗算を計算する場合にはｗビットの剰余乗算の回数は１２回であり、上記アルゴリズム２を用いた従来技術と同様の計算量である。しかし、４ｗビットの剰余乗算を計算する場合にはｗビットの剰余乗算は6*12(=72)回、８ｗビットの剰余乗算を計算するには6²*12(=432)回である。上記アルゴリズム１と上記アルゴリズム２を用いた従来技術では、４ｗビットの場合は12²(=144)回、8ｗビットの場合は12³(=1728)回であり、提案手法は計算量が50%（４ｗビットの場合）、25％(8wビットの場合)と少なくて済む。一般に、最大ｗビットの剰余乗算の剰余を計算する前記剰余乗算器を用い、２^ｘｗビットの剰余乗算を計算する場合、従来技術に比べ、計算量は僅か(1/2^x-１)*100%で済む（ただし、理論的に計算量の少ない、加減算等の他の演算は無視している）。 For example, when a 2w-bit modular multiplication is computed with a modular multiplier that calculates remainders of modular multiplications of at most w bits, the number of w-bit modular multiplications is 12, the same amount of computation as the prior art using Algorithm 2. However, a 4w-bit modular multiplication takes 6*12 (= 72) w-bit modular multiplications, and an 8w-bit modular multiplication takes 6²*12 (= 432). The prior art using Algorithm 1 and Algorithm 2 takes 12² (= 144) for 4w bits and 12³ (= 1728) for 8w bits, so the proposed method needs only 50% (4w bits) and 25% (8w bits) of the computation. In general, when a 2^x w-bit modular multiplication is computed with such a modular multiplier, the amount of computation is only (1/2^(x-1))*100% of that of the prior art (ignoring other operations such as addition and subtraction, whose computational cost is theoretically small).
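The counts quoted above follow a simple pattern: each doubling of the operand length multiplies the proposed count by 6 and the conventional count by 12. The short Python sketch below reproduces the quoted figures. The function name `w_bit_mm_counts` is hypothetical, and the closed forms are an extrapolation from the three quoted cases and the stated general ratio.

```python
def w_bit_mm_counts(x):
    """Number of w-bit modular multiplications for a (2**x)*w-bit modular multiplication."""
    proposed = 12 * 6 ** (x - 1)   # 12, 72, 432 for x = 1, 2, 3
    conventional = 12 ** x         # 12, 144, 1728 for x = 1, 2, 3
    return proposed, conventional, proposed / conventional


for x in (1, 2, 3):
    print(x, w_bit_mm_counts(x))
```

The third value is the ratio 1/2^(x-1), so the saving grows with every further doubling of the operand length.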

従って、本発明では、指数関数的に増加する計算量を抑制し、より効率的に剰余乗算の剰余と商の双方を計算する処理フローと、前記処理フローを用いて剰余乗算を計算する剰余乗算器が提供できる。 Therefore, the present invention can provide a processing flow that suppresses the exponentially increasing amount of computation and calculates both the remainder and the quotient of a modular multiplication more efficiently, and a modular multiplier that calculates modular multiplications using that processing flow.

上記アルゴリズム３の実装機器は、前記の剰余乗算専用器における剰余乗算の計算機能と、加算や減算の計算機能を実装すればよい。 The implementation device of the algorithm 3 may be implemented with a remainder multiplication calculation function and an addition or subtraction calculation function in the above-described residue multiplication dedicated device.

なお、上記のアルゴリズム３は、前記の剰余乗算専用器が1回で計算可能な最大ビット長以上の剰余乗算の商と剰余を計算する他のアルゴリズムと、本質的に同様である。例えば、下記のアルゴリズム４においても、同様に商と剰余が計算できる。従って、上記アルゴリズム３と下記アルゴリズム４の処理は、本質的に同じである。アルゴリズム4は、
[アルゴリズム４]
入力：X = x1 c + x0 2^m、Y = y1 c + y0 2^m、Z = z1 c + z0 2^m、ただし0≦m<ｗ
出力：XY2^-2m / Z、XY2^-2m mod Z
ステップ1. r1 ← x0 y0 2^-m mod(2^w) and q1 ← x0 y0 -r1 2^w 2^m
ステップ2. r2 ← x1 y0 2^-m mod(z1) and q2 ← x1 y0 -r2 z1 2^m
ステップ3. r3 ← x0 y1 2^-m mod(z1) and q3 ← x0 y1 -r3 z1 2^m
ステップ4. r4 ← x1 y1 2^-m mod(z1) and q4 ← x1 y1 -r4 z1 2^m
ステップ5. r5 ← r4 c 2^-m mod(z1) and q5 ← r4 c -r5 z1 2^m
ステップ6. r6 ← q4 z0 2^-m mod(z1) and q6 ← q4 z0 -r6 z1 2^m
ステップ7. r7← (-q2-q3-q5+q6) z0 2^-m mod(2^w) and q7 ← (-q2-q3-q5+q6) z0 -r7 2^w 2^m
ステップ8. Return q4 2^w-m + (q2+q3+q5-q6)2^m and (r2+r3+r5-r6+q1+q7)2^w-m +(r1+r7)2^m
である。 Note that the algorithm 3 is essentially the same as other algorithms for calculating the quotient and remainder of the remainder multiplication greater than or equal to the maximum bit length that can be calculated by the remainder multiplication unit at one time. For example, in the following algorithm 4, the quotient and the remainder can be calculated similarly. Therefore, the processing of the above algorithm 3 and the following algorithm 4 is essentially the same. Algorithm 4 is
[Algorithm 4]
Input: X = x1 c + x0 2^m, Y = y1 c + y0 2^m, Z = z1 c + z0 2^m, where 0 ≤ m < w
Output: XY 2^-2m / Z and XY 2^-2m mod Z
Step 1. r1 ← x0 y0 2^-m mod (2^w) and q1 ← (x0 y0 - r1 2^m) / 2^w
Step 2. r2 ← x1 y0 2^-m mod (z1) and q2 ← (x1 y0 - r2 2^m) / z1
Step 3. r3 ← x0 y1 2^-m mod (z1) and q3 ← (x0 y1 - r3 2^m) / z1
Step 4. r4 ← x1 y1 2^-m mod (z1) and q4 ← (x1 y1 - r4 2^m) / z1
Step 5. r5 ← r4 c 2^-m mod (z1) and q5 ← (r4 c - r5 2^m) / z1
Step 6. r6 ← q4 z0 2^-m mod (z1) and q6 ← (q4 z0 - r6 2^m) / z1
Step 7. r7 ← (-q2-q3-q5+q6) z0 2^-m mod (2^w) and q7 ← ((-q2-q3-q5+q6) z0 - r7 2^m) / 2^w
Step 8. Return q4 2^(w-m) + (q2+q3+q5-q6) 2^m and (r2+r3+r5-r6+q1+q7) 2^(w-m) + (r1+r7) 2^m
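To make the eight steps of Algorithm 4 concrete, here is a minimal Python sketch under the same simplifying assumptions one would use for a quick check: m = 0, c = 2^w, and each quotient read as the ordinary integer quotient q = (xy - r) / z. The function name `algorithm4` is hypothetical. The sketch differs from Algorithm 3 in using the four cross products x0 y0, x1 y0, x0 y1, and x1 y1 instead of the sum product (x0+x1)(y0+y1).

```python
def algorithm4(X, Y, Z, w):
    """Sketch of Algorithm 4 under the assumptions m = 0 and c = 2**w.

    Returns (Q, R) with X*Y == Q*Z + R and 0 <= R < Z.
    The upper half z1 of Z must be nonzero.
    """
    c = 1 << w
    x1, x0 = divmod(X, c)
    y1, y0 = divmod(Y, c)
    z1, z0 = divmod(Z, c)

    q1, r1 = divmod(x0 * y0, c)    # Step 1
    q2, r2 = divmod(x1 * y0, z1)   # Step 2
    q3, r3 = divmod(x0 * y1, z1)   # Step 3
    q4, r4 = divmod(x1 * y1, z1)   # Step 4
    q5, r5 = divmod(r4 * c, z1)    # Step 5
    q6, r6 = divmod(q4 * z0, z1)   # Step 6
    q7, r7 = divmod((-q2 - q3 - q5 + q6) * z0, c)  # Step 7

    # Step 8: combine the w-bit partial results.
    Q = q4 * c + (q2 + q3 + q5 - q6)
    R = (r2 + r3 + r5 - r6 + q1 + q7) * c + (r1 + r7)

    # Not part of Algorithm 4 as listed: normalise R into [0, Z).
    k, R = divmod(R, Z)
    return Q + k, R


Q, R = algorithm4(0xBEEF, 0x1234, 0xC0DE, 8)
assert 0xBEEF * 0x1234 == Q * 0xC0DE + R
```

This variant needs seven w-bit operations instead of six, which illustrates why the text treats Algorithms 3 and 4 as equivalent in result rather than in cost.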

また、下記のアルゴリズム５においても、同様に商と剰余が計算できる。アルゴリズム５は、
[アルゴリズム５]
入力：X = x1 s + x0 2^m、Y = y1 s + y0 2^m、Z = z1 s + z0 2^m、ただし0≦m<ｗかつs^２＝Z＋a
出力：XY2^-2m / Z、XY2^-2m mod Z
ステップ1. r1 ← x0 y0 2^-m mod(s) and q1 ← x0 y0 -r1 s 2^m
ステップ2. r2 ← (x1+x0)(y1+y0) 2^-m mod(s) and q2 ← (x1+x0) (y1+y0) -r2 s 2^m
ステップ3. r3 ← x1 y1 2^-m mod(s) and q3 ← x1 y1 -r3 s 2^m
ステップ4. r4 ← a q3 2^-m mod(s) and q4 ← a q3 -r4 s 2^m
ステップ5. r5 ← a(-q1+q2-q3+q4+r3) 2^-m mod(s) and q5 ← a(-q1+q2-q3+q4+r3)-r5 s 2^m
ステップ6. Return q3 2^w-m+(q1-q3+q4-q5)2^m and (r1+r4-r6)2^w-m+(q2-q6+r1-r2-r3+r4-r5)2^m
である。 Also in the algorithm 5 below, the quotient and the remainder can be calculated similarly. Algorithm 5 is
[Algorithm 5]
Input: X = x1 s + x0 2^m, Y = y1 s + y0 2^m, Z = z1 s + z0 2^m, where 0 ≤ m < w and s² = Z + a
Output: XY 2^-2m / Z and XY 2^-2m mod Z
Step 1. r1 ← x0 y0 2^-m mod (s) and q1 ← (x0 y0 - r1 2^m) / s
Step 2. r2 ← (x1+x0)(y1+y0) 2^-m mod (s) and q2 ← ((x1+x0)(y1+y0) - r2 2^m) / s
Step 3. r3 ← x1 y1 2^-m mod (s) and q3 ← (x1 y1 - r3 2^m) / s
Step 4. r4 ← a q3 2^-m mod (s) and q4 ← (a q3 - r4 2^m) / s
Step 5. r5 ← a (-q1+q2-q3+q4+r3) 2^-m mod (s) and q5 ← (a (-q1+q2-q3+q4+r3) - r5 2^m) / s
Step 6. Return q3 2^(w-m) + (q1-q3+q4-q5) 2^m and (r1+r4-r6) 2^(w-m) + (q2-q6+r1-r2-r3+r4-r5) 2^m

また、前記の剰余乗算専用器が実装する剰余乗算の種類によらず、任意の種類の剰余乗算の商と剰余が計算できる。例えば、前記の剰余乗算専用器における変数ｍがｍ＝１を満たすとき(一般に、モンゴメリ乗算と呼ばれる)に、変数ｍがｍ＝0.5となる剰余乗算(一般に、二分割剰余乗算と呼ばれる)の商と剰余も、下記のアルゴリズム６に従い、計算できる。アルゴリズム６は、
［アルゴリズム６］
入力：X = x1 c + x0 2^m、Y = y1 c + y0 2^m、Z = z1 c + z0 2^m、ただし0≦m<ｗ
出力：XY2^-m /Z and XY2^-m mod(Z)
ステップ1. r1 ← x1 y1 2^-m mod(2^w) and q1 ← x1 y1 -r1 2^w 2^m
ステップ2. r2 ← (x1+x0) (y1+y0) 2^-m mod(2^w) and q2 ← (x1+x0)(y1+y0) -r2 2^w 2^m
ステップ3. r3 ← x0 y0 2^-m mod(2^w) and q3 ← x0 y0 -r3 2^w 2^m
ステップ4. r4 ← q3 2^w 2^-m mod(z1) and q4 ← q3 2^w -r4 z1 2^m
ステップ5. r5 ← r1 1 2^-m mod(z0) and q5 ← r1 1 -r5 z0 2^m
ステップ6. r6 ← q4 z0 2^-m mod(2^w) and q6 ← q4 z0 -r6 2^w 2^m
ステップ7. r7 ← q5 z1 2^-m mod(2^w) and q7 ← q5 z1 -r7 2^w 2^m
ステップ8. Return q4 2^w + q5 and (r4-q1+q2-q3-q6-q7)2^w +(q1-r1+r2+r5-r6-r7)
である。 In addition, the quotient and remainder of any kind of modular multiplication can be calculated, regardless of the kind of modular multiplication implemented by the dedicated modular multiplier. For example, when the variable m in the dedicated modular multiplier satisfies m = 1 (generally called Montgomery multiplication), the quotient and remainder of a modular multiplication with m = 0.5 (generally called bipartite modular multiplication) can also be calculated according to Algorithm 6 below. Algorithm 6 is:
[Algorithm 6]
Input: X = x1 c + x0 2^m, Y = y1 c + y0 2^m, Z = z1 c + z0 2^m, where 0 ≤ m < w
Output: XY 2^-m / Z and XY 2^-m mod (Z)
Step 1. r1 ← x1 y1 2^-m mod (2^w) and q1 ← (x1 y1 - r1 2^m) / 2^w
Step 2. r2 ← (x1+x0)(y1+y0) 2^-m mod (2^w) and q2 ← ((x1+x0)(y1+y0) - r2 2^m) / 2^w
Step 3. r3 ← x0 y0 2^-m mod (2^w) and q3 ← (x0 y0 - r3 2^m) / 2^w
Step 4. r4 ← q3 2^w 2^-m mod (z1) and q4 ← (q3 2^w - r4 2^m) / z1
Step 5. r5 ← r1 1 2^-m mod (z0) and q5 ← (r1 1 - r5 2^m) / z0
Step 6. r6 ← q4 z0 2^-m mod (2^w) and q6 ← (q4 z0 - r6 2^m) / 2^w
Step 7. r7 ← q5 z1 2^-m mod (2^w) and q7 ← (q5 z1 - r7 2^m) / 2^w
Step 8. Return q4 2^w + q5 and (r4-q1+q2-q3-q6-q7) 2^w + (q1-r1+r2+r5-r6-r7)

また、法Zと変数ｍにおいて、Ｚ＝２^2ｗかつｍ＝０、またはＺ＝２^2ｗかつｍ＝１が成り立つ場合、剰余乗算の商と剰余を計算する上記アルゴリズム３に代えて、乗算を計算する下記アルゴリズム７を実施してもよい。ただし、下記アルゴリズム７では、乗算結果の上位２ｗビットを剰余乗算の商（ｍ＝０のとき）または剰余乗算の剰余（ｍ＝１）、下位２ｗビットを剰余乗算の剰余（ｍ＝１のとき）または剰余乗算の商（ｍ＝０）とみなす。アルゴリズム７は、
［アルゴリズム７］
入力：X = x1 c + x0 2^m、Y = y1 c + y0 2^m、Z = z1 c + z0 2^m、ただし0≦m<ｗ
出力：XY2^-2m /Z and XY2^-2m mod(Z)
ステップ1. r1 ← x0 y0 2^-m mod(2^w) and q1 ← x0 y0 -r1 2^w 2^m
ステップ2. r2 ← x0 y1 2^-m mod(2^w) and q2 ← x0 y1 -r2 2^w 2^m
ステップ3. r3 ← x1 y0 2^-m mod(2^w) and q3 ← x1 y0 -r3 2^w 2^m
ステップ4. r4 ← x1 y1 2^-m mod(2^w) and q4 ← x1 y1 -r4 2^w 2^m
ステップ5. sum = x1y12^2w + (x1y0+x0y1)2^w + x0y0
ステップ6. Return sum/2^2w and sum mod(2^2w)
である。 When Z = 2 ^2w and m = 0, or Z = 2 ^2w and m = 1 in the modulus Z and the variable m, the multiplication is performed in place of the algorithm 3 for calculating the quotient and the remainder of the remainder multiplication. The following algorithm 7 may be executed. However, in the algorithm 7 below, the upper 2w bits of the multiplication result is the quotient of the remainder multiplication (when m = 0) or the remainder of the remainder multiplication (m = 1), and the lower 2w bit is the remainder of the remainder multiplication (when m = 1) ) Or the quotient of remainder multiplication (m = 0). Algorithm 7 is
[Algorithm 7]
Input: X = x1 c + x0 2^m, Y = y1 c + y0 2^m, Z = z1 c + z0 2^m, where 0 ≤ m < w
Output: XY 2^-2m / Z and XY 2^-2m mod (Z)
Step 1. r1 ← x0 y0 2^-m mod (2^w) and q1 ← (x0 y0 - r1 2^m) / 2^w
Step 2. r2 ← x0 y1 2^-m mod (2^w) and q2 ← (x0 y1 - r2 2^m) / 2^w
Step 3. r3 ← x1 y0 2^-m mod (2^w) and q3 ← (x1 y0 - r3 2^m) / 2^w
Step 4. r4 ← x1 y1 2^-m mod (2^w) and q4 ← (x1 y1 - r4 2^m) / 2^w
Step 5. sum = x1 y1 2^2w + (x1 y0 + x0 y1) 2^w + x0 y0
Step 6. Return sum / 2^2w and sum mod (2^2w)
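For the special case Z = 2^2w, Algorithm 7 reduces to a schoolbook multiplication of two 2w-bit operands from four w-bit products. The Python sketch below illustrates this under the assumptions m = 0 and c = 2^w; the function name `algorithm7` is hypothetical.

```python
def algorithm7(X, Y, w):
    """Sketch of Algorithm 7 for Z = 2**(2*w), assuming m = 0 and c = 2**w.

    Returns (upper 2w bits, lower 2w bits) of X*Y, which for m = 0 are
    the quotient and the remainder of the modular multiplication.
    """
    c = 1 << w
    x1, x0 = divmod(X, c)
    y1, y0 = divmod(Y, c)

    # Steps 1 to 4 supply the four w-bit by w-bit products.
    p00, p01, p10, p11 = x0 * y0, x0 * y1, x1 * y0, x1 * y1

    # Step 5: sum = x1 y1 2^2w + (x1 y0 + x0 y1) 2^w + x0 y0
    total = (p11 << (2 * w)) + ((p10 + p01) << w) + p00

    # Step 6: split the 4w-bit sum into its upper and lower halves.
    return divmod(total, 1 << (2 * w))


hi, lo = algorithm7(0xBEEF, 0x1234, 8)
assert (hi << 16) + lo == 0xBEEF * 0x1234
```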

なお、上記のアルゴリズム３と同様に、上記アルゴリズム７も計算方式の一例であり、乗算を用いて、前記の剰余乗算専用器が1回で計算可能な最大ビット長以上の剰余乗算の商と剰余を計算する他のアルゴリズムと、本質的に同様である。例えば、他の計算方式として、下記のアルゴリズム８がある。アルゴリズム８は、
［アルゴリズム８］
入力：X = x1 c + x0 2^m、Y = y1 c + y0 2^m、Z = z1 c + z0 2^m、ただし0≦m<ｗ
出力：XY2^-2m /Z and XY2^-2m mod(Z)
ステップ1. r1 ← x0 y0 2^-m mod(2^w) and q1 ← x0 y0 -r1 2^w 2^m
ステップ2. r2 ← (x0+x1)(y1+y0) 2^-m mod(2^w) and q2 ← (x0+x1)(y0+y1) -r2 2^w 2^m
ステップ3. r3 ← x1 y1 2^-m mod(2^w) and q3 ← x1 y1 -r3 2^w 2^m
ステップ4. sum = x1y12^w(2^w-1) + (x1+x0)(y1+y0)2^w - x0y0(2^w-1)
ステップ5. Return sum/2^2w and sum mod(2^2w)
である。 Similar to algorithm 3 above, algorithm 7 is also an example of a calculation method, and by using multiplication, the quotient and remainder of the remainder multiplication greater than or equal to the maximum bit length that can be calculated by the remainder multiplication unit at a time. It is essentially the same as any other algorithm that computes. For example, there is the following algorithm 8 as another calculation method. Algorithm 8 is
[Algorithm 8]
Input: X = x1 c + x0 2^m, Y = y1 c + y0 2^m, Z = z1 c + z0 2^m, where 0 ≤ m < w
Output: XY 2^-2m / Z and XY 2^-2m mod (Z)
Step 1. r1 ← x0 y0 2^-m mod (2^w) and q1 ← (x0 y0 - r1 2^m) / 2^w
Step 2. r2 ← (x0+x1)(y1+y0) 2^-m mod (2^w) and q2 ← ((x0+x1)(y0+y1) - r2 2^m) / 2^w
Step 3. r3 ← x1 y1 2^-m mod (2^w) and q3 ← (x1 y1 - r3 2^m) / 2^w
Step 4. sum = x1 y1 2^w (2^w - 1) + (x1+x0)(y1+y0) 2^w - x0 y0 (2^w - 1)
Step 5. Return sum / 2^2w and sum mod (2^2w)
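Algorithm 8 obtains the same product as Algorithm 7 from three w-bit products instead of four, in the manner of Karatsuba multiplication. The Python sketch below checks the Step 4 formula under the assumptions m = 0 and c = 2^w; the function name `algorithm8` is hypothetical.

```python
def algorithm8(X, Y, w):
    """Sketch of Algorithm 8 for Z = 2**(2*w), assuming m = 0 and c = 2**w.

    Returns (upper 2w bits, lower 2w bits) of X*Y using three products.
    """
    c = 1 << w
    x1, x0 = divmod(X, c)
    y1, y0 = divmod(Y, c)

    low = x0 * y0                  # Step 1
    mid = (x0 + x1) * (y0 + y1)    # Step 2
    high = x1 * y1                 # Step 3

    # Step 4: sum = x1 y1 2^w (2^w - 1) + (x1+x0)(y1+y0) 2^w - x0 y0 (2^w - 1)
    total = high * c * (c - 1) + mid * c - low * (c - 1)

    # Step 5: split the sum into its upper and lower 2w bits.
    return divmod(total, c * c)


hi, lo = algorithm8(0xBEEF, 0x1234, 8)
assert (hi << 16) + lo == 0xBEEF * 0x1234
```

Expanding `mid` shows why this works: the extra x1 y1 and x0 y0 terms it contains are cancelled by the (2^w - 1) factors on the other two products.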

次に、剰余乗算ユニットにおいて、上記アルゴリズム３を実現する場合の処理フローを説明する。ただし、その処理フローは特に制限されず、上記アルゴリズム４、上記アルゴリズム６、上記アルゴリズム７、上記アルゴリズム８やその他のアルゴリズムを処理する場合でも同様に実現できる。 Next, a processing flow when the algorithm 3 is realized in the remainder multiplication unit will be described. However, the processing flow is not particularly limited, and can be realized in the same manner even when the algorithm 4, the algorithm 6, the algorithm 7, the algorithm 8, and other algorithms are processed.

先ず、その処理に用いる演算ユニットに付ついて説明する。特に制限されないが、図１中の（Ａ）に、与えられた所定の入力値から、各演算器固有の演算を計算し、計算結果を出力する演算器を示す。以下においては、剰余乗算の商と剰余を計算する剰余乗算器としてのＭＭ演算器１０１、加算を計算する加算器としてのＡＤＤ演算器１０２、減算を計算する減算器としてのＳＵＢ演算器１０３の３種類の演算器を用いて、上記のアルゴリズム３を処理する場合を以下で説明する。ただし、剰余乗算ユニットが上記アルゴリズム３を処理するには、剰余乗算と加算と減算が処理できればよく、それらは別々の演算器である必要は無い。例えば、剰余乗算と加算と減算の計算機能をもつ一つの演算器を用いても良い。また、２の補数表現で表したデータとＡＤＤ演算器１０２を用いて、減算を処理するように変更しても良い。 First, the arithmetic unit used for the processing will be described. Although not particularly limited, FIG. 1A shows an arithmetic unit that calculates an operation specific to each arithmetic unit from a given input value and outputs the calculation result. In the following, 3 of the MM calculator 101 as a remainder multiplier for calculating the quotient and remainder of the remainder multiplication, the ADD calculator 102 as the adder for calculating addition, and the SUB calculator 103 as the subtractor for calculating subtraction. A case where the above algorithm 3 is processed using a type of arithmetic unit will be described below. However, in order for the remainder multiplication unit to process the above algorithm 3, it is only necessary to be able to process the remainder multiplication, addition, and subtraction, and they need not be separate arithmetic units. For example, one arithmetic unit having a calculation function of remainder multiplication, addition, and subtraction may be used. Further, the subtraction may be processed by using the data represented by the two's complement expression and the ADD calculator 102.
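The remark that subtraction can be handled by the ADD unit with two's complement data can be illustrated as follows. This is a minimal Python sketch, assuming w-bit unsigned operands; the function name `sub_via_add` and the borrow convention (no carry out means a borrow occurred) are illustrative choices, not details taken from the source.

```python
def sub_via_add(a, b, w):
    """Compute (a - b) mod 2**w using only bit inversion and addition.

    a and b are assumed to be unsigned w-bit values.
    Returns (result, borrow), where borrow is True when a < b.
    """
    mask = (1 << w) - 1
    total = a + ((~b) & mask) + 1   # a + (two's complement of b), kept to w+1 bits
    result = total & mask           # the w-bit difference
    borrow = (total >> w) == 0      # a carry out of bit w means no borrow
    return result, borrow


print(sub_via_add(5, 3, 8))
print(sub_via_add(3, 5, 8))
```

With this arrangement a separate SUB unit is not required, at the cost of an inversion stage in front of the adder.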

さらに、剰余乗算の商と剰余の計算では、別の演算器を用いても良い。上記アルゴリズム２に従い、例えば、図１中の（Ｂ）に示す剰余乗算の剰余を計算するＭＭ２演算器１５１を用いても良い。また、乗算と割算を用いて、剰余乗算が計算できる。そのため、図１中の（Ｂ）に示す乗算を計算するＭＵ演算器１５２や、割算を計算するＤＩＶ演算器１５３を用いても良い。 Furthermore, another arithmetic unit may be used in the quotient of the remainder multiplication and the remainder calculation. In accordance with the algorithm 2, for example, the MM2 calculator 151 that calculates the remainder of the remainder multiplication shown in (B) in FIG. 1 may be used. In addition, modular multiplication can be calculated using multiplication and division. Therefore, a MU computing unit 152 that calculates multiplication shown in FIG. 1B or a DIV computing unit 153 that calculates division may be used.
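As a small illustration of replacing the MM unit with a multiplier and a divider, the Python sketch below obtains the quotient and remainder of a classical modular multiplication (m = 0 assumed) from one multiplication and one division. The function name `mm_via_mul_div` is hypothetical, and the mapping of lines to the MU and DIV units is only indicative.

```python
def mm_via_mul_div(x, y, z):
    """Quotient and remainder of x*y modulo z from a multiply and a divide (m = 0)."""
    product = x * y          # role of the MU unit 152
    q = product // z         # role of the DIV unit 153
    r = product - q * z      # remainder recovered from the quotient
    return q, r


print(mm_via_mul_div(0xEF, 0x34, 0xC1))
```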

図２には上記アルゴリズム３に関する処理フローが例示される。ここでは、図1の（Ａ）で定義したＭＭ演算器１０１、ＡＤＤ演算器１０２、ＳＵＢ演算器１０３を用いた、上記アルゴリズム３に関する処理フローを示す。ただし、図２の演算器内の()内の番号は、演算器を用いるステップ番号としての参照番号である。また、図２中で線が交差する場合、黒丸印が無い場合は互いに影響がなく、黒丸印がある場合は同一の値をもつデータの分岐処理を指す。従って、図２中の処理フローは、各演算器への入力順と、黒丸印の有無、結線処理により、２ｗビットの剰余乗算の商と剰余それぞれに必要な計算ｗビットの演算結果を振り分け処理を有する。また、この振り分け処理により、図２の処理フローは、最終的には、上記アルゴリズム３の出力値であるq3と(q2-q3+q4-q5)を結合した２ｗビットの剰余乗算の商であるq3c+(q2-q3+q4-q5)と、(q1-q6-r1+r2-r3+r4-r5)と(r1-r6)を結合した２ｗビットの剰余乗算の剰余である(q1-q6-r1+r2-r3+r4- r5)c+(r1-r6)を得る。 FIG. 2 illustrates a processing flow for Algorithm 3. The flow uses the MM arithmetic unit 101, the ADD arithmetic unit 102, and the SUB arithmetic unit 103 defined in (A) of FIG. 1. The number in parentheses inside each arithmetic unit in FIG. 2 is a reference number that serves as the step number at which the unit is used. Where lines cross in FIG. 2, lines without a black dot do not affect each other, and a black dot indicates a branch of data having the same value. Accordingly, through the order of inputs to each arithmetic unit, the presence or absence of black dots, and the wiring, the processing flow in FIG. 2 includes a distribution process that routes the w-bit results to the calculations needed for the quotient and for the remainder of the 2w-bit modular multiplication. Through this distribution process, the processing flow of FIG. 2 finally obtains the quotient of the 2w-bit modular multiplication, q3c+(q2-q3+q4-q5), which combines the Algorithm 3 outputs q3 and (q2-q3+q4-q5), and the remainder of the 2w-bit modular multiplication, (q1-q6-r1+r2-r3+r4-r5)c+(r1-r6), which combines (q1-q6-r1+r2-r3+r4-r5) and (r1-r6).

例えば、ＭＭ演算器（ｍ１）は、x0、y0、cを入力値として受け付け、剰余乗算の商q1と剰余r1を出力する。ＭＭ演算器(ｍ１)から出力された商ｑ１は結線されたＳＵＢ演算器(ｓ２)とＳＵＢ演算器(ｓ３)に入力値として受け付けられ、同様に、剰余ｒ１はＳＵＢ演算器(ｓ３)とＳＵＢ演算器(ｓ７)に入力値として受け付けられる。ｑ１とｒ１を入力値として受け付けたＳＵＢ演算器は、減算(q1-r1)を計算し、その結果をＡＤＤ演算器(ａ３)へ出力する。同様の処理を他の演算器でも実施し、結果的に、図２中の最下部に記した出力値として、Ｐ１＝q3、Ｐ２＝(q2-q3+q4-q5)、Ｐ３＝(q1-q6-r1+r2-r3+r4-r5)、Ｐ４＝(r1-r6)を得る。 For example, the MM calculator (m1) accepts x0, y0, and c as input values, and outputs a quotient q1 and a remainder r1 of the remainder multiplication. The quotient q1 output from the MM calculator (m1) is received as an input value by the connected SUB calculator (s2) and SUB calculator (s3), and similarly, the remainder r1 is the SUB calculator (s3) and SUB. It is accepted as an input value by the arithmetic unit (s7). The SUB calculator that receives q1 and r1 as input values calculates the subtraction (q1-r1) and outputs the result to the ADD calculator (a3). Similar processing is also performed in another arithmetic unit. As a result, P1 = q3, P2 = (q2-q3 + q4-q5), P3 = (q1- q6-r1 + r2-r3 + r4-r5), P4 = (r1-r6).

図３には図２の処理フローを実行可能な剰余乗算ユニット３の構成が例示される。なお、図３に示される全ての機能ブロックは、単結晶シリコン基板のような、一個の半導体基板に形成されている。 FIG. 3 illustrates the configuration of the modular multiplication unit 3 that can execute the processing flow of FIG. Note that all the functional blocks shown in FIG. 3 are formed on one semiconductor substrate such as a single crystal silicon substrate.

図３において、３００は、剰余乗算ユニットと他の機器の接続に用いるシステムバスを示す。３０１はクロック発生器、３０２は入出力ポート（単にＩ／Ｆと称する）、３０３はプログラムメモリとしてのプログラム用メモリ３０５に従って他の機器を制御する制御回路、３０４は制御回路３０３における各機器の状態管理用のレジスタ（管理レジスタと称する）、３０５は制御回路３０３から読み出されるプログラムやデータが格納されるプログラム用メモリ（読み込み可能であればよく、ＲＯＭ等の不揮発性媒体でもよい）、３０６はデータメモリとしてのデータの格納用メモリ（プログラムを格納してもよく、ＲＡＭ等の揮発性媒体が望ましい）、３０７は制御レジスタ、３０８はセレクタである。１０１は剰余乗算の商と剰余を計算するＭＭ演算器、１０２は加算を計算するＡＤＤ演算器、１０３は減算を計算するＳＵＢ演算器、３１２は各演算器から出力されたデータを格納するアキュムレータである。 In FIG. 3, reference numeral 300 denotes a system bus used to connect the modular multiplication unit to other devices. 301 is a clock generator; 302 is an input/output port (simply called the I/F); 303 is a control circuit that controls the other components according to the program held in the program memory 305; 304 is a register with which the control circuit 303 manages the state of each component (called the management register); 305 is the program memory, which stores the programs and data read by the control circuit 303 (it only needs to be readable, and may be a non-volatile medium such as a ROM); 306 is the data memory, which stores data (it may also store programs, and is preferably a volatile medium such as a RAM); 307 is a control register; and 308 is a selector. 101 is the MM arithmetic unit that calculates the quotient and remainder of a modular multiplication, 102 is the ADD arithmetic unit that calculates addition, 103 is the SUB arithmetic unit that calculates subtraction, and 312 is an accumulator that stores the data output from each arithmetic unit.

特に制限されないが、上記アルゴリズム３を実現するための処理手順が記述されたプログラムをプログラム用メモリ３０５、処理中の入出力や演算で用いるデータをデータ用メモリ３０６に格納する。また、プログラム用メモリ３０５や、データ用メモリ３０６を着脱可能にし、他のメモリと交換または追加して、利用量する記憶容量の調整をしてもよい。また、剰余乗算ユニット３は、クロック発生器３０１を備えず、外部から供給されるクロック信号に基づいて動作してもよい。 Although not particularly limited, a program in which a processing procedure for realizing the algorithm 3 is described is stored in the program memory 305, and data used for input / output or calculation during processing is stored in the data memory 306. Further, the memory capacity for use may be adjusted by making the program memory 305 and the data memory 306 detachable and replacing or adding other memory. The modular multiplication unit 3 does not include the clock generator 301 and may operate based on an externally supplied clock signal.

尚、剰余乗算ユニット３において、剰余乗算器（ＭＭ演算器）１０１、加算器（ＡＤＤ演算器）１０２、減算器（ＳＵＢ演算器）１０３、セレクタ３０８、アキュムレータ３１２及びデータ用メモリ３０６は演算部３１０の一例とされ、制御回路３０３及びプログラム用メモリ３０５は制御部３２０の一例とされる。 In the modular multiplication unit 3, the modular multiplier (MM arithmetic unit) 101, the adder (ADD arithmetic unit) 102, the subtractor (SUB arithmetic unit) 103, the selector 308, the accumulator 312, and the data memory 306 are an example of the arithmetic unit 310, and the control circuit 303 and the program memory 305 are an example of the control unit 320.

剰余乗算ユニット３において、図２の処理フローを実行するための、処理フローの概略を図４に示す。図４の処理フローにおいて、剰余乗算の商と剰余の計算はＭＭ演算器１０１、加算はＡＤＤ演算器１０２、減算はＳＵＢ演算器１０３が負担すべき演算処理とされる。 FIG. 4 shows an outline of the processing flow for executing the processing flow of FIG. 2 in the remainder multiplication unit 3. In the processing flow of FIG. 4, the quotient of the remainder multiplication and the calculation of the remainder are the MM calculator 101, the addition is the ADD calculator 102, and the subtraction is the calculation process that should be borne by the SUB calculator 103.

まず、制御回路３０３がプログラム用メモリ３０５から上記アルゴリズム３を記載したプログラムを読み込む（Ｓ４０１）。制御回路３０３は、読み込んだプログラムや管理レジスタ３０４の状態に従って、データ用メモリ３０６とアキュムレータ３１２におけるデータを転送する必要があるか否かを判断する（Ｓ４０２）。転送を必要と判断した場合、データ用メモリ３０６またはアキュムレータ３１２内から、または相互間で、データを転送する。例えば、アキュムレータ３１２に格納されたデータが次の演算処理で出力されるデータに上書きされないよう、アキュムレータ３１２内のデータをデータ用メモリ３０６に転送する（Ｓ４０３）。制御回路３０３は、制御信号を送信し、制御レジスタ３０７内のレジスタ値を設定する（Ｓ４０４）。制御レジスタ３０７内のレジスタ値は、図５中の（Ａ）に示すように、利用する演算器を決定する演算コード（５０１）と、演算器に入力するデータの居場所を示すアドレスコード（５０２）からなる。制御レジスタ３０７内のレジスタ値に従い、セレクタ３０８は、演算器にデータを送信する（Ｓ４０５）。データを送信された演算器は演算（剰余乗算の商と剰余の計算、加算または減算）を処理し、演算結果をアキュムレータ３１２に出力する（Ｓ４０６）。図５中の（Ｂ）に示すように、アキュムレータ３１２は出力値のキャリー(繰上げ)またはボロウ(繰下げ)の有無を、管理レジスタ３０４に伝達する（Ｓ４０７）。制御回路３０３は、読み込んだプログラムや管理レジスタ３０４の状態に従って、処理を終了するかを判断する（Ｓ４０８）。 First, the control circuit 303 reads a program describing the algorithm 3 from the program memory 305 (S401). The control circuit 303 determines whether or not it is necessary to transfer data in the data memory 306 and the accumulator 312 according to the read program and the state of the management register 304 (S402). When it is determined that the transfer is necessary, the data is transferred from the data memory 306 or the accumulator 312 or between each other. For example, the data in the accumulator 312 is transferred to the data memory 306 so that the data stored in the accumulator 312 is not overwritten with the data output in the next arithmetic processing (S403). The control circuit 303 transmits a control signal and sets a register value in the control register 307 (S404). As shown in FIG. 5A, the register value in the control register 307 includes an operation code (501) for determining a computing unit to be used and an address code (502) indicating the location of data to be input to the computing unit. Consists of. According to the register value in the control register 307, the selector 308 transmits data to the computing unit (S405). 
The computing unit to which the data has been sent processes the computation (modulus multiplication quotient and remainder computation, addition or subtraction), and outputs the computation result to the accumulator 312 (S406). As shown in FIG. 5B, the accumulator 312 transmits the presence / absence of carry (carrying up) or borrowing (carrying down) of the output value to the management register 304 (S407). The control circuit 303 determines whether to end the process according to the read program and the state of the management register 304 (S408).
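The control sequence S401 to S408 can be pictured as a small interpreter loop. The Python sketch below is a toy model only: the instruction format (a list of accumulator transfers, an operation code, and a tuple of operand addresses) is hypothetical, since the source states only that the control register holds an operation code (501) and an address code (502). It also assumes m = 0 for the MM operation and models only the borrow flag.

```python
def run(program, memory):
    """Toy model of steps S401 to S408; the instruction format is hypothetical."""
    acc = ()          # accumulator 312: output words of the last operation
    borrow = False    # flag reported to the management register 304
    for transfers, opcode, addresses in program:        # S401: read the next instruction
        for slot, name in transfers:                    # S402/S403: save accumulator words
            memory[name] = acc[slot]                    # before they are overwritten
        control_register = (opcode, addresses)          # S404: operation code + address code
        operands = [memory[a] for a in control_register[1]]  # S405: selector routes data
        if control_register[0] == "MM":                 # S406: run the selected unit
            x, y, z = operands
            acc = divmod(x * y, z)                      # (quotient, remainder), m = 0 assumed
        elif control_register[0] == "ADD":
            acc = (operands[0] + operands[1],)
        elif control_register[0] == "SUB":
            acc = (operands[0] - operands[1],)
        else:
            raise ValueError(f"unknown operation code: {opcode}")
        borrow = acc[-1] < 0                            # S407: report a borrow
    return acc, borrow                                  # S408: program exhausted


memory = {"x0": 0xEF, "y0": 0x34, "c": 256}
program = [
    ((), "MM", ("x0", "y0", "c")),                   # Step 1 of Algorithm 3
    (((0, "q1"), (1, "r1")), "SUB", ("q1", "r1")),   # q1 - r1
]
acc, borrow = run(program, memory)
```

The second instruction first moves q1 and r1 out of the accumulator into data memory, mirroring the transfer in S403, and then feeds them to the SUB unit.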

上記では、剰余乗算の商と剰余を計算できるＭＭ演算器１０１を仮定した。ＭＭ演算器１０１における剰余乗算の計算方法は問わず、例えば、古典的な剰余乗算やモンゴメリ乗算を実装するＭＭ演算器１０１であってもよい。また、ＭＭ演算器１０１の代わりに、他の演算器を用いても、同様に計算できる。例えば、図１中の（Ｂ）に示すような剰余乗算の剰余を出力するＭＭ２演算器１５１を用いてもよい。さらに、ＭＭ演算器１０１が剰余乗算だけでなく、加算や減算も計算できる場合、ＡＤＤ演算器１０２やＳＵＢ演算器１０３を用いなくても良く、剰余乗算ユニットの回路規模を削減できる。また、ＭＭ演算器１０１の代わりに、乗算を計算するＭＵ演算器１５２と割算を計算するＤＩＶ演算器１５３を用いてもよい。また、入力として、データXとデータYとデータZとデータTを受け付け、(XY＋T2^k)/ZとXY+T2^k(mod Z)を出力する演算器を用いても良い。この場合、下記のアルゴリズム９がある。アルゴリズム９は、
［アルゴリズム９］
入力：X = x1 c + x0 2^m、Y = y1 c + y0 2^m、Z = z1 c + z0 2^m、ただし0≦m<ｗ
出力：XY2^-2m /Z and XY2^-2m mod(Z)
ステップ1. r1 ← x1 y1 2^-m mod(z1) and q1 ← x1 y1 -r1 z1 2^m
ステップ2. r2 ← (x0+x1)(y1+y0) 2^-m mod(2^w-1) and q2 ← (x0+x1)(y0+y1) -r2 (2^w-1) 2^m
ステップ3. r3 ← x0 y0 2^-m mod(2^w) and q3 ← x0 y0 -r3 2^w 2^m
ステップ4. r4 ← q2 z0 2^-m + (r2-q3)2^w mod(2^w) and q4 ← q2 z0+(r2-q3)2^w-r4 2^w 2^m
ステップ5. r5 ← (q1-q2-q4) z0 2^-m mod(2^w) and q5 ← (q1-q2-q4) z0 -r5 2^w 2^m
ステップ6. Return q3 2^w + (q1-q2-q4) and (r1-r2-r3-r4+q3-q5)2^w+r3-r4-r5
である。 In the above description, the MM arithmetic unit 101 that can calculate the quotient and the remainder of the remainder multiplication is assumed. The calculation method of the modular multiplication in the MM computing unit 101 is not limited, and for example, the MM computing unit 101 that implements classical modular multiplication or Montgomery multiplication may be used. The same calculation can be performed by using another arithmetic unit instead of the MM arithmetic unit 101. For example, an MM2 calculator 151 that outputs the remainder of the remainder multiplication as shown in FIG. Furthermore, when the MM calculator 101 can calculate not only the remainder multiplication but also addition and subtraction, the ADD calculator 102 and the SUB calculator 103 need not be used, and the circuit scale of the remainder multiplication unit can be reduced. Further, instead of the MM calculator 101, a MU calculator 152 for calculating multiplication and a DIV calculator 153 for calculating division may be used. An arithmetic unit that receives data X, data Y, data Z, and data T as inputs and outputs (XY + T2 ^k ) / Z and XY + T2 ^k (mod Z) may be used. In this case, there is the following algorithm 9. Algorithm 9 is
[Algorithm 9]
Input: X = x1 c + x0 2^m, Y = y1 c + y0 2^m, Z = z1 c + z0 2^m, where 0 ≤ m < w
Output: XY 2^-2m / Z and XY 2^-2m mod (Z)
Step 1. r1 ← x1 y1 2^-m mod (z1) and q1 ← (x1 y1 - r1 2^m) / z1
Step 2. r2 ← (x0+x1)(y1+y0) 2^-m mod (2^w - 1) and q2 ← ((x0+x1)(y0+y1) - r2 2^m) / (2^w - 1)
Step 3. r3 ← x0 y0 2^-m mod (2^w) and q3 ← (x0 y0 - r3 2^m) / 2^w
Step 4. r4 ← q2 z0 2^-m + (r2-q3) 2^w mod (2^w) and q4 ← (q2 z0 + (r2-q3) 2^w - r4 2^m) / 2^w
Step 5. r5 ← (q1-q2-q4) z0 2^-m mod (2^w) and q5 ← ((q1-q2-q4) z0 - r5 2^m) / 2^w
Step 6. Return q3 2^w + (q1-q2-q4) and (r1-r2-r3-r4+q3-q5) 2^w + r3-r4-r5

特に、剰余乗算に加え、乗算を計算できる場合は、上記アルゴリズム３において、剰余乗算に代えて乗算を計算してもよい。図５中の（Ｃ）に示すように、上記アルゴリズム３のステップ１において、c=2^ｗのとき、ｗビット整数x0とy0の積の上位ｗビットは、剰余乗算の商q1と等しく（５５１）、下位ｗビットは剰余乗算の剰余r1と等しい。ステップ6も同様の原理で商q6と剰余r6が求まる（５５２）。 In particular, when the multiplication can be calculated in addition to the remainder multiplication, the algorithm 3 may calculate the multiplication instead of the remainder multiplication. As shown in FIG. 5C, in step 1 of algorithm 3 above, when c = 2 ^w , the upper w bits of the product of the w bit integers x0 and y0 are equal to the quotient q1 of the remainder multiplication (551). ), The lower w bits are equal to the remainder r1 of the remainder multiplication. In step 6, the quotient q6 and the remainder r6 are obtained on the same principle (552).
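The observation about Step 1 can be checked directly: when the modulus is 2^w and m = 0 (an assumption of this sketch), the quotient and remainder are simply the upper and lower halves of the plain product. The function name `step1_by_plain_multiply` is hypothetical.

```python
def step1_by_plain_multiply(x0, y0, w):
    """Quotient and remainder of x0*y0 modulo 2**w, read off a plain product (m = 0)."""
    product = x0 * y0                  # ordinary w-bit by w-bit multiplication
    q1 = product >> w                  # upper w bits: quotient q1
    r1 = product & ((1 << w) - 1)      # lower w bits: remainder r1
    return q1, r1


assert step1_by_plain_multiply(0xEF, 0x34, 8) == divmod(0xEF * 0x34, 1 << 8)
```

No division is performed at all in this case, only a shift and a mask, which is why substituting a plain multiplier for Steps 1 and 6 is attractive.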

さらに、演算器を利用する代わりに、各演算結果を予めメモリに書き込み、入力値から、適切にメモリの値を参照し、演算結果を得るように変更しても良い。この場合、演算器が必要とする回路規模を削減できるが、代わりにメモリの使用量が多くなる。 Further, instead of using a computing unit, each computation result may be written in the memory in advance, and the memory value may be appropriately referred to from the input value to obtain a computation result. In this case, the circuit scale required by the arithmetic unit can be reduced, but the amount of memory used is increased instead.
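The table-lookup alternative can be sketched as follows. This Python example is illustrative only: it assumes a fixed modulus z, m = 0, and a very small word size so that the table stays readable; the function name `build_mm_table` is hypothetical.

```python
def build_mm_table(w, z):
    """Precompute (quotient, remainder) of x*y modulo z for all w-bit x and y (m = 0)."""
    size = 1 << w
    return [[divmod(x * y, z) for y in range(size)] for x in range(size)]


table = build_mm_table(4, 13)   # 2**(2*4) = 256 entries for w = 4
q, r = table[11][7]             # lookup replaces the arithmetic unit
assert 11 * 7 == q * 13 + r
```

The table holds 2^(2w) entries, so the memory cost grows very quickly with w; this is the trade-off against circuit size that the paragraph above describes.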

FIG. 6 shows an example of a schematic block diagram of a microcomputer 601 capable of executing Algorithm 3 above, in which the modular multiplication unit computes the quotient and the remainder. In FIG. 6, 602 denotes a clock generator, 603 a CPU, 604 an input/output port (referred to as the I/O port), 605 a ROM, which is a read-only memory storing programs and data, 606 a RAM, which is a memory providing a work area for the CPU 603, 607 an EEPROM, which is a memory for storing programs and data, and 3 the modular multiplication unit. The CPU 603, the I/O port 604, the ROM 605, the RAM 606, the EEPROM 607, and the modular multiplication unit 3 are connected to a bus 611, a collective term for the address bus and the control bus, and to a bus 612, a collective term for the data bus. The clock generator 602 supplies the CPU 603 with a clock signal that is either based on the clock signal supplied from the clock terminal CLK or generated as an internal operation reference clock signal. The I/O port 604 is connected to the data input/output external terminal I/O. Vcc and Vss are external power supply terminals of the microcomputer 601, and RES is an external reset terminal of the microcomputer.

The microcomputer 601 shown in FIG. 6 is formed on a single semiconductor substrate, such as a single-crystal silicon substrate, by, for example, complementary MOS integrated circuit manufacturing technology. The microcomputer 601 shown in FIG. 6 is one implementation example, and the same implementation is possible in other devices. For example, it can also be implemented in small devices such as RFID tags, PDAs, and mobile phones.

<< Embodiment 2 >>
Algorithm 2 of Non-Patent Document 4 can handle only bit lengths that are a power-of-two multiple (= 2^x times) of the bit length of the dedicated modular multiplier. The bit length therefore cannot be finely adjusted, which leads to increased computation and increased memory consumption (Problem 2). An algorithm and a processing flow that address Problem 2 by computing a modular multiplication whose bit length is an integer multiple of the bit length of the dedicated modular multiplier are described below.

First, Algorithm 10, which computes a modular multiplication of up to three times the bit length that the dedicated modular multiplier can compute in one operation, is shown below.
[Algorithm 10]
Input: X = x2 c² + x1 c + x0, Y = y2 c² + y1 c + y0, Z = z2 c² + z1 c + z0
Output: XY mod Z
Step 1. q1 ← x2 y2 / z2 and r1 ← x2 y2 mod (z2)
Step 2. q2 ← q1 z1 / z2 and r2 ← q1 z1 mod (z2)
Step 3. q3 ← r1 c / z2 and r3 ← r1 c mod (z2)
Step 4. q4 ← x2 y1 / z2 and r4 ← x2 y1 mod (z2)
Step 5. q5 ← x1 y2 / z2 and r5 ← x1 y2 mod (z2)
Step 6. q6 ← q1 z0 / z2 and r6 ← q1 z0 mod (z2)
Step 7. q7 ← (-q2 + q3 + q4 + q5) z1 / z2 and r7 ← (-q2 + q3 + q4 + q5) z1 mod (z2)
Step 8. q8 ← (-r2 + r3 + r4 + r5) c / z2 and r8 ← (-r2 + r3 + r4 + r5) c mod (z2)
Step 9. q9 ← x2 y0 / z2 and r9 ← x2 y0 mod (z2)
Step 10. q10 ← x1 y1 / z2 and r10 ← x1 y1 mod (z2)
Step 11. q11 ← x0 y2 / z2 and r11 ← x0 y2 mod (z2)
Step 12. q12 ← (-q2 + q3 + q4 + q5) z0 / c and r12 ← (-q2 + q3 + q4 + q5) z0 mod (c)
Step 13. q13 ← (-q6 - q7 + q8 + q9 + q10 + q11) z1 / c and r13 ← (-q6 - q7 + q8 + q9 + q10 + q11) z1 mod (c)
Step 14. q14 ← x1 y0 / c and r14 ← x1 y0 mod (c)
Step 15. q15 ← x0 y1 / c and r15 ← x0 y1 mod (c)
Step 16. q16 ← (-q6 - q7 + q8 + q9 + q10 + q11) z0 / c and r16 ← (-q6 - q7 + q8 + q9 + q10 + q11) z0 mod (c)
Step 17. q17 ← x0 y0 / c and r17 ← x0 y0 mod (c)
Step 18. Return ((-q12 - q13 + q14 + q15 - r6 - r7 + r8 + r9 + r10 + r11) c² + (-r12 - r13 + r14 + r15 - q16 + q17) c + (-r16 + r17)) (mod Z)
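The 18 steps of Algorithm 10 can be traced with the following Python sketch. It is not the patent's hardware implementation: the helper `mm` is a hypothetical software stand-in for the MM arithmetic unit built on Python's `divmod` (floor division, so a*b = q*m + r also holds for negative intermediate values), operand widths are not restricted to w bits, and the names `qa`, `ra`, `qb`, `rb` are labels introduced here for the grouped quantities in Steps 7, 8, 13, 16, and 18.

```python
def mm(a: int, b: int, m: int) -> tuple[int, int]:
    """Stand-in for the MM unit: quotient and remainder of a*b divided by m."""
    return divmod(a * b, m)


def algorithm10(X: int, Y: int, Z: int, w: int) -> int:
    """Compute XY mod Z for operands of up to 3w bits, following Steps 1 to 18."""
    c = 1 << w
    mask = c - 1
    x0, x1, x2 = X & mask, (X >> w) & mask, X >> (2 * w)
    y0, y1, y2 = Y & mask, (Y >> w) & mask, Y >> (2 * w)
    z0, z1, z2 = Z & mask, (Z >> w) & mask, Z >> (2 * w)
    assert z2 != 0, "the top limb of Z is used as a divisor"

    q1, r1 = mm(x2, y2, z2)            # Step 1
    q2, r2 = mm(q1, z1, z2)            # Step 2
    q3, r3 = mm(r1, c, z2)             # Step 3
    q4, r4 = mm(x2, y1, z2)            # Step 4
    q5, r5 = mm(x1, y2, z2)            # Step 5
    qa = -q2 + q3 + q4 + q5
    ra = -r2 + r3 + r4 + r5
    q6, r6 = mm(q1, z0, z2)            # Step 6
    q7, r7 = mm(qa, z1, z2)            # Step 7
    q8, r8 = mm(ra, c, z2)             # Step 8
    q9, r9 = mm(x2, y0, z2)            # Step 9
    q10, r10 = mm(x1, y1, z2)          # Step 10
    q11, r11 = mm(x0, y2, z2)          # Step 11
    qb = -q6 - q7 + q8 + q9 + q10 + q11
    rb = -r6 - r7 + r8 + r9 + r10 + r11
    q12, r12 = mm(qa, z0, c)           # Step 12
    q13, r13 = mm(qb, z1, c)           # Step 13
    q14, r14 = mm(x1, y0, c)           # Step 14
    q15, r15 = mm(x0, y1, c)           # Step 15
    q16, r16 = mm(qb, z0, c)           # Step 16
    q17, r17 = mm(x0, y0, c)           # Step 17
    total = ((-q12 - q13 + q14 + q15 + rb) * c * c
             + (-r12 - r13 + r14 + r15 - q16 + q17) * c
             + (-r16 + r17))
    return total % Z                   # Step 18: bring the value into [0, Z)
```

For example, `algorithm10(0x123456, 0xABCDEF, 0xF1E2D3, 8)` can be compared against `(0x123456 * 0xABCDEF) % 0xF1E2D3` as a quick check.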

Algorithm 10 can likewise be implemented using the modular multiplication unit of Embodiment 1. Accordingly, the device that can be used is the same in Embodiment 1 and Embodiment 2 except for the program. In short, the program memory 305 stores a program describing the data processing procedure that implements Algorithm 10, and the control circuit 303, the arithmetic units 101, 102, 103, and other components that process data as specified by the program may use the configuration of FIG. 3 as is.

Each step of Algorithm 10 consists of a calculation that obtains the quotient and remainder of a modular multiplication, together with additions and subtractions. Compared with recursively calling, two times, Algorithm 3, which computes a modular multiplication of twice the bit length of the dedicated modular multiplier, the number of steps is far smaller. For example, Algorithm 10 requires 17 quotient-and-remainder calculations of modular multiplication, whereas the recursive use of Algorithm 3 requires 36. The amount of modular multiplication is therefore less than half (= 17/36), and the overall processing time can be shortened.

That Algorithm 10 correctly computes the modular multiplication (XY mod Z) is shown below.

With c = 2^w, the 3w-bit integer X can be divided, using the integer c and the expression X = x2c² + x1c + x0, into integers x2, x1, x0 of the bit length (w bits) of the dedicated modular multiplier. The other integers Y and Z are likewise expressed with the w-bit integers y2, y1, y0, z2, z1, z0 (Y = y2c² + y1c + y0, Z = z2c² + z1c + z0). Expanding the product XY in terms of the w-bit integers therefore takes (Equation 0) to (Equation 1).
XY = (x2c² + x1c + x0)(y2c² + y1c + y0) … (Equation 0)
= x2y2c⁴ + (x2y1 + x1y2)c³ + (x2y0 + x1y1 + x0y2)c² + (x1y0 + x0y1)c + x0y0 … (Equation 1)
Besides the simple expansion of (Equation 1), calculation methods known for converting to more efficient multiplication expressions may be used, such as the Karatsuba algorithm, the Toom-Cook multiplication algorithm, and fast Fourier transform algorithms.
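The simple (schoolbook) expansion from (Equation 0) to (Equation 1) can be sketched in Python as below. The helper names `split3` and `expand` are assumptions made for illustration; the sketch covers only the simple expansion, not the Karatsuba, Toom-Cook, or FFT alternatives.

```python
def split3(value: int, w: int) -> list[int]:
    """Split a value of up to 3w bits into limbs [v0, v1, v2], each w bits wide."""
    mask = (1 << w) - 1
    return [value & mask, (value >> w) & mask, (value >> (2 * w)) & mask]


def expand(X: int, Y: int, w: int) -> list[int]:
    """Return the coefficients of c^0 .. c^4 in the expansion of X*Y (Equation 1)."""
    x, y = split3(X, w), split3(Y, w)
    coeff = [0] * 5
    for i in range(3):
        for j in range(3):
            coeff[i + j] += x[i] * y[j]   # x_i * y_j contributes to the c^(i+j) term
    return coeff


w = 8
c = 1 << w
coeff = expand(0x123456, 0xABCDEF, w)
# Recombining the coefficients with powers of c reproduces the full product.
assert sum(k * c**i for i, k in enumerate(coeff)) == 0x123456 * 0xABCDEF
```

The entry `coeff[3]`, for instance, holds x2y1 + x1y2, the coefficient of c³ in (Equation 1).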

In the following, the terms of (Equation 1) are transformed and rearranged so that the value of (Equation 1) becomes less than the modulus Z. The key to this is the congruence
z2c² = -z1c - z0 (mod Z)
The transformation and rearrangement proceed as follows.

(First term of Equation 1) x2y2c⁴
= (q1z2 + r1)c⁴ (using x2y2 = q1z2 + r1, that is, r1 = x2y2 mod z2 and q1 = x2y2 / z2)
= -q1(z1c + z0)c² + r1c⁴ (using z2c² = -z1c - z0 (mod Z))
= (-q1z1 + r1c)c³ - q1z0c² … (Equation 2)
Between the first and second lines of the above expansion, x2y2 = q1z2 + r1 holds. From the relations (Equation a) and (Equation b), the q1 and r1 that satisfy this are the integer with the value (x2y2 / z2) and the integer with the value (x2y2 mod (z2)), respectively. Step 1 of Algorithm 10,
q1 ← x2 y2 / z2
r1 ← x2 y2 mod (z2)
is thus derived.
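As a numerical illustration of this derivation, the Python sketch below checks both the key congruence z2c² = -z1c - z0 (mod Z) and the resulting (Equation 2) for one set of limbs. The function names and the sample values are assumptions chosen for the example.

```python
def congruent(a: int, b: int, modulus: int) -> bool:
    """True when a and b leave the same remainder modulo `modulus`."""
    return (a - b) % modulus == 0


def check_equation_2(x2: int, y2: int, z2: int, z1: int, z0: int, w: int) -> bool:
    """Check that x2*y2*c^4 is congruent to (-q1*z1 + r1*c)*c^3 - q1*z0*c^2 mod Z."""
    c = 1 << w
    Z = z2 * c**2 + z1 * c + z0
    q1, r1 = divmod(x2 * y2, z2)       # Step 1: x2*y2 = q1*z2 + r1
    key_ok = congruent(z2 * c**2, -z1 * c - z0, Z)
    first_term = x2 * y2 * c**4
    equation_2 = (-q1 * z1 + r1 * c) * c**3 - q1 * z0 * c**2
    return key_ok and congruent(first_term, equation_2, Z)


assert check_equation_2(0x9A, 0xC3, 0xF1, 0xE2, 0xD3, 8)
```

The check passes for any nonzero z2, since the key congruence is simply Z = 0 (mod Z) with the lower two terms moved to the right-hand side.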

Next, common terms are sought between (Equation 2), the expansion of the first term of (Equation 1), and the other terms of (Equation 1). The first term of (Equation 2) and the second term of (Equation 1) can be grouped under the common factor c³. That is,
First term of (Equation 2) + second term of (Equation 1)
= (-q1z1 + r1c)c³ + (x2y1 + x1y2)c³
= ((-q2 + q3 + q4 + q5)z2 + (-r2 + r3 + r4 + r5))c³
= (q'z2 + r')c³ (writing q' = -q2 + q3 + q4 + q5 and r' = -r2 + r3 + r4 + r5)
= -q'z1c² - q'z0c + r'c³ … (Equation 3) (using z2c² = -z1c - z0 (mod Z))

As in the derivation of Step 1, from the relations (Equation a) and (Equation b), q1z1 = q2z2 + r2, r1c = q3z2 + r3, x2y1 = q4z2 + r4, and x1y2 = q5z2 + r5 hold between the first and second lines of the above expansion. Therefore,
Step 2. q2 ← q1 z1 / z2 and r2 ← q1 z1 mod (z2)
Step 3. q3 ← r1 c / z2 and r3 ← r1 c mod (z2)
Step 4. q4 ← x2 y1 / z2 and r4 ← x2 y1 mod (z2)
Step 5. q5 ← x1 y2 / z2 and r5 ← x1 y2 mod (z2)
Steps 2 to 5 of Algorithm 10 are thus derived.

Next, common terms are sought among (Equation 3), which contains the expansion of the first term of (Equation 2), the other term of (Equation 2), and the remaining terms of (Equation 1). The third term of (Equation 1), the second term of (Equation 2), and the first and third terms of (Equation 3) can be grouped under the common factor c². That is,
Third term of (Equation 1) + second term of (Equation 2) + first term of (Equation 3) + third term of (Equation 3)
= (x2y0 + x1y1 + x0y2)c² - q1z0c² - q'z1c² + r'c³
= ((-q6 - q7 + q8 + q9 + q10 + q11)z2 + (-r6 - r7 + r8 + r9 + r10 + r11))c²
= q''z2c² + r''c²
(writing q'' = -q6 - q7 + q8 + q9 + q10 + q11 and r'' = -r6 - r7 + r8 + r9 + r10 + r11)
= -q''z1c - q''z0 + r''c² … (Equation 4) (using z2c² = -z1c - z0 (mod Z))

As in the derivation of Steps 1 to 5, from the relations (Equation a) and (Equation b), q1z0 = q6z2 + r6, (-q2 + q3 + q4 + q5)z1 = q7z2 + r7, (-r2 + r3 + r4 + r5)c = q8z2 + r8, x2y0 = q9z2 + r9, x1y1 = q10z2 + r10, and x0y2 = q11z2 + r11 hold between the first and second lines of the above expansion. Therefore,
Step 6. q6 ← q1 z0 / z2 and r6 ← q1 z0 mod (z2)
Step 7. q7 ← (-q2 + q3 + q4 + q5) z1 / z2 and r7 ← (-q2 + q3 + q4 + q5) z1 mod (z2)
Step 8. q8 ← (-r2 + r3 + r4 + r5) c / z2 and r8 ← (-r2 + r3 + r4 + r5) c mod (z2)
Step 9. q9 ← x2 y0 / z2 and r9 ← x2 y0 mod (z2)
Step 10. q10 ← x1 y1 / z2 and r10 ← x1 y1 mod (z2)
Step 11. q11 ← x0 y2 / z2 and r11 ← x0 y2 mod (z2)
Steps 6 to 11 of Algorithm 10 are thus derived.

So that the remaining terms can be grouped under the common factors c² and c, a multiplication modulo c is calculated for the terms of each equation that cannot otherwise be grouped under a common factor. The following equations are thus derived. That is,
Fourth term of (Equation 1) + second term of (Equation 3) + first term of (Equation 4)
= (x1y0 + x0y1)c - q'z0c - q''z1c
= (-q12 - q13 + q14 + q15)c² + (-r12 - r13 + r14 + r15)c … (Equation 5)
Fifth term of (Equation 1) + second term of (Equation 4)
= -q''z0 + x0y0
= (-q16 + q17)c + (-r16 + r17) … (Equation 6)

As in the derivation of Steps 1 to 11, from the relations (Equation a) and (Equation b), (-q2 + q3 + q4 + q5)z0 = q12c + r12, (-q6 - q7 + q8 + q9 + q10 + q11)z1 = q13c + r13, x1y0 = q14c + r14, x0y1 = q15c + r15, (-q6 - q7 + q8 + q9 + q10 + q11)z0 = q16c + r16, and x0y0 = q17c + r17 hold. Therefore,
Step 12. q12 ← (-q2 + q3 + q4 + q5) z0 / c and r12 ← (-q2 + q3 + q4 + q5) z0 mod (c)
Step 13. q13 ← (-q6 - q7 + q8 + q9 + q10 + q11) z1 / c and r13 ← (-q6 - q7 + q8 + q9 + q10 + q11) z1 mod (c)
Step 14. q14 ← x1 y0 / c and r14 ← x1 y0 mod (c)
Step 15. q15 ← x0 y1 / c and r15 ← x0 y1 mod (c)
Step 16. q16 ← (-q6 - q7 + q8 + q9 + q10 + q11) z0 / c and r16 ← (-q6 - q7 + q8 + q9 + q10 + q11) z0 mod (c)
Step 17. q17 ← x0 y0 / c and r17 ← x0 y0 mod (c)
Steps 12 to 17 of Algorithm 10 are thus derived.

Putting all of the above together, (Equation 1) can be expressed as the sum of the third term of (Equation 4), the first and second terms of (Equation 5), and the first and second terms of (Equation 6). Therefore, the following holds:
(Equation 1)
= third term of (Equation 4) + first term of (Equation 5) + second term of (Equation 5) + first term of (Equation 6) + second term of (Equation 6)
= (-q12 - q13 + q14 + q15 + r'')c² + (-r12 - r13 + r14 + r15 - q16 + q17)c + (-r16 + r17) … (Equation 7)
The value of (Equation 7) can be less than 0 or greater than or equal to the modulus Z. When it is less than 0, the modulus Z is added, and when it is Z or more, the modulus Z is subtracted, repeatedly, until a value of at least 0 and less than Z is obtained. Step 18 of Algorithm 10 is thus derived.
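The final correction by repeated addition or subtraction of the modulus can be sketched in Python as follows. The function name `normalize` is an assumption; the loops mirror the wording of the text rather than any particular hardware sequence.

```python
def normalize(value: int, Z: int) -> int:
    """Bring `value` into the range 0 <= value < Z by adding or subtracting Z."""
    assert Z > 0
    while value < 0:        # below the range: add the modulus
        value += Z
    while value >= Z:       # at or above the range: subtract the modulus
        value -= Z
    return value


assert normalize(-5, 7) == 2
assert normalize(23, 7) == 2
```

In Python the same result is given by `value % Z`; the explicit loops use only the addition and subtraction that the ADD and SUB arithmetic units provide.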

A program describing the expansion method of (Equation 0) may be held in the program memory 305, and the control circuit 303 in the modular multiplication unit may calculate the expansion according to that program. Alternatively, the program may be held in the ROM 605, the RAM 606, or the EEPROM 607, and the CPU 603 in the microcomputer may calculate the expansion according to the program.

Algorithm 10 can be used to compute a modular multiplication of up to three times the bit length of the dedicated modular multiplier. FIG. 7 shows a processing flow that generalizes Algorithm 10 to compute a modular multiplication of an integer multiple (2, 3, 4, 5, 6, ... times) of the bit length of the dedicated modular multiplier. By following the processing flow of FIG. 7, a modular multiplication of an integer multiple of the bit length of the coprocessor can be computed.

First, the parameters required for the modular multiplication (multiplier X, multiplicand Y, modulus Z) are decomposed to match the bit length of the dedicated modular multiplier. The decomposition is not particularly limited, but can be expressed, for example, as follows.
X = Σ_(i=0)^[k/w] xi cⁱ, Y = Σ_(i=0)^[k/w] yi cⁱ, Z = Σ_(i=0)^[k/w] zi cⁱ … (Equation 8)
Here, k is the bit length of each integer, and [k/w] is the integer obtained by rounding down the bit length k divided by the bit length w of the dedicated modular multiplier.
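The decomposition of (Equation 8) can be sketched in Python as below, with limb indices running from 0 to [k/w] as in the text. The helper names `split_limbs` and `join_limbs` are assumptions. When k is an exact multiple of w, the highest limb produced this way is zero.

```python
def split_limbs(value: int, w: int, k: int) -> list[int]:
    """Split a k-bit value into w-bit limbs v_0 .. v_[k/w] (least significant first)."""
    mask = (1 << w) - 1
    return [(value >> (w * i)) & mask for i in range(k // w + 1)]


def join_limbs(limbs: list[int], w: int) -> int:
    """Recombine limbs as the sum of v_i * c^i with c = 2^w."""
    return sum(limb << (w * i) for i, limb in enumerate(limbs))


limbs = split_limbs(0x123456, 8, 24)
assert limbs == [0x56, 0x34, 0x12, 0]
assert join_limbs(limbs, 8) == 0x123456
```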

For example, (Equation 0) of Algorithm 10 corresponds to a decomposition step like (Equation 8) (S701). The decomposed parameters are expanded to form the multiplications between the individual parameters. In this expansion step, a simple expansion, or a calculation method such as the Karatsuba algorithm, the Toom-Cook multiplication algorithm, or a fast Fourier transform algorithm, can be used. For example, (Equation 1) of Algorithm 10 corresponds to this expansion step (S702). In the expanded expression, processing starts from the term of either the highest or the lowest order. When starting from the highest order in (Equation 1), the first term has 6w bits, the second 5w bits, the third 4w bits, the fourth 3w bits, and the fifth 2w bits. In (Equation 1), the first term x2y2c⁴ therefore has the greatest bit length (S703). A modular multiplication is calculated for that term, with the highest-order integer z[k/w] of the modulus Z in (Equation 8) used as the modulus of the modular multiplication. For example, in Algorithm 10, the integer z2 is the modulus of the modular multiplication for the highest-order term x2y2c⁴ (S704). Common terms are then grouped in order to determine whether terms can be calculated together. In Algorithm 10, in calculating the first term of (Equation 2) and the second term of (Equation 1), the integers q2, q3, q4, q5 that multiply z2c² and the integers r2, r3, r4, r5 that multiply c³ are collected into q' and r', respectively (S705).
Transforming (Equation 8) gives the following.
z[k/w] c^[k/w] = -Σ_(i=0)^([k/w]-1) zi cⁱ (mod Z) … (Equation 9)
Using (Equation 9), a reduction process is performed that lowers the order of the terms involving the quotient of the modular multiplication calculated in S704. For example, in Algorithm 10, in the second line of the transformation of x2y2c⁴ (the first term of Equation 1), the 6w-bit integer q1z2c⁴ is transformed using (Equation 9) into the 5w-bit integer q1z1c³ and the 4w-bit integer q1z0c² (S706).

If the maximum bit length of the terms is approximately equal to or less than the bit length of the modulus Z, processing proceeds to S708; otherwise, it returns to S702 (S707).

The values of the terms are summed. If the sum is negative or is greater than or equal to the modulus Z, the modulus Z is added or subtracted so that the result is at least 0 and less than Z. For example, in Algorithm 10, the processing from (Equation 7) onward corresponds to this step (S708).
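The overall flow S701 to S708 can be sketched in Python for an arbitrary number of limbs as shown below. This is a simplified illustration under stated assumptions, not the flow of FIG. 7 itself: the function names are hypothetical, Python's `divmod` stands in for the MM arithmetic unit, the number of limbs is taken as the ceiling of the bit length of Z divided by w so that the top limb of Z is nonzero, and each coefficient is divided once after its contributions are summed, whereas Algorithm 10 divides each product separately and then adds the quotients and remainders.

```python
def limbs(value: int, w: int, n: int) -> list[int]:
    """S701: split `value` into n limbs of w bits, least significant first."""
    mask = (1 << w) - 1
    return [(value >> (w * i)) & mask for i in range(n)]


def modmul_limbs(X: int, Y: int, Z: int, w: int) -> int:
    """Compute XY mod Z by limb-wise reduction with the top limb of Z as divisor."""
    c = 1 << w
    n = -(-Z.bit_length() // w)          # ceil(bit length / w): limbs in Z
    assert Z > 0 and 0 <= X < c**n and 0 <= Y < c**n
    x, y, z = limbs(X, w, n), limbs(Y, w, n), limbs(Z, w, n)
    top = n - 1

    # S702: simple expansion into coefficients of c^0 .. c^(2*top)
    t = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            t[i + j] += x[i] * y[j]

    # S703 to S707: process terms from the highest order down to order `top`
    for d in range(2 * top, top - 1, -1):
        q, r = divmod(t[d], z[top])      # S704: quotient and remainder by z_top
        for i in range(top):             # S706: q*z_top*c^d -> -q*sum(z_i*c^(d-top+i))
            t[d - top + i] -= q * z[i]
        if d > top:
            t[d - 1] += r * c            # fold r*c^d into the next lower coefficient
            t[d] = 0
        else:
            t[d] = r                     # the remainder stays at order `top`

    # S708: sum the remaining terms and bring the result into [0, Z)
    total = sum(coef * c**i for i, coef in enumerate(t))
    return total % Z
```

With w = 8, `modmul_limbs(0x123456, 0xABCDEF, 0xF1E2D3, 8)` exercises the three-limb case covered by Algorithm 10, and longer operands exercise the generalization.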

As in Embodiment 1, the processing flow shown in FIG. 7 (including Algorithm 10 above) can be processed using three types of arithmetic units: the MM arithmetic unit 101, which calculates the quotient and remainder of a modular multiplication, the ADD arithmetic unit 102, which calculates addition, and the SUB arithmetic unit 103, which calculates subtraction.

As in Embodiment 1, the modular multiplication unit 3 in the block diagram outlined in FIG. 3 can execute the processing flow of FIG. 7. All of the functional blocks shown in FIG. 3 can be formed on a single semiconductor substrate, such as a single-crystal silicon substrate.

As in Embodiment 1, the processing flow of FIG. 7 can be executed using the processing flow outlined in FIG. 4. In the processing flow of FIG. 4, the method by which the MM arithmetic unit 101 calculates the modular multiplication is not limited; in place of the MM arithmetic unit 101 assumed to calculate both the quotient and the remainder of a modular multiplication, an MM arithmetic unit 101 that implements, for example, classical modular multiplication or Montgomery multiplication may be used. The same calculation can also be performed with another arithmetic unit in place of the MM arithmetic unit 101. For example, an MM2 arithmetic unit 151 that outputs the remainder of a modular multiplication, as shown in (B) of FIG. 1, may be used. Furthermore, when the MM arithmetic unit 101 can calculate addition and subtraction as well as modular multiplication, the ADD arithmetic unit 102 and the SUB arithmetic unit 103 are unnecessary, and the circuit scale of the modular multiplication unit can be reduced. Instead of the MM arithmetic unit 101, an MU arithmetic unit 152 that calculates multiplication and a DIV arithmetic unit 153 that calculates division may be used. In particular, as in Embodiment 1, when multiplication can be calculated in addition to modular multiplication, multiplication may be calculated in place of modular multiplication. Further, instead of using arithmetic units, each calculation result may be written to memory in advance and obtained by looking up the appropriate memory value from the input values.

As in Embodiment 1, the processing flow of FIG. 4 can be executed on the microcomputer shown in the block diagram outlined in FIG. 6. The microcomputer 601 shown in FIG. 6 can be formed on a single semiconductor substrate, such as a single-crystal silicon substrate. Furthermore, the microcomputer 601 shown in FIG. 6 is one implementation example, and implementation in other devices, such as RFID tags, PDAs, and mobile phones, is also possible.

Cryptographic algorithms sometimes perform repeated modular multiplications, as in RSA encryption, which requires modular exponentiation. FIG. 8 outlines the processing flow of the microcomputer 601 equipped with the modular multiplication unit. Such repeated modular multiplications can be handled by repeatedly using the arithmetic functions of the modular multiplication unit. The calculation procedure for modular multiplication is not limited to the processing flow of FIG. 8. For example, in FIG. 8 the CPU 603 is responsible for the processing of S801 to S805 and the modular multiplication unit for the processing of S851 to S854, but this division of responsibility may be changed. For example, the modular multiplication unit may be responsible for all of the processing.

First, the CPU 603 specifies data X, data Y, and data Z to be input to the modular multiplication unit (S801). If input data is already set in the modular multiplication unit, the data already set may be omitted. Depending on the modular multiplication unit, variables required for the modular multiplication, the bit length of the modular multiplication, and the like may also be specified as input data. The values or addresses of the specified data (either direct addresses or addresses through which the data can be referenced indirectly) are sent from the CPU 603 to the modular multiplication unit (D801). The modular multiplication unit uses the specified data to set the input values of the modular multiplication (S851). The CPU 603 sends the modular multiplication unit a control signal instructing it to start the operation (S802). If variables are required to start the modular multiplication, they may be sent to the modular multiplication unit in S802. The control signal indicating the start of the operation is sent from the CPU 603 to the modular multiplication unit (D802). On receiving the control signal, the modular multiplication unit calculates the modular multiplication (S852). While the modular multiplication unit is calculating, the CPU 603 either waits for the modular multiplication unit to finish or executes other operations (S803). The modular multiplication unit outputs the result of the modular multiplication (S853). The modular multiplication unit notifies the CPU 603 that the operation has finished (S854). If an error occurs during the calculation or output process of the modular multiplication unit, a signal indicating the error may be sent to the CPU 603 in S854 (D803). The CPU 603 receives the signal and confirms that the operation has finished (S804). If a signal indicating an error is received, the CPU 603 confirms that an error occurred during the operation. Through the above process, the CPU 603 and the modular multiplication unit obtain the quotient and remainder of the modular multiplication, or the remainder of the modular multiplication. In accordance with, for example, an algorithm that executes modular exponentiation, the CPU 603 decides whether to repeat the processing (S806). If the processing is to be repeated, it returns to S801; otherwise, it ends.
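The repeated use of the modular multiplication unit for modular exponentiation can be sketched in Python with a right-to-left square-and-multiply loop. The patent does not prescribe a particular exponentiation algorithm; square-and-multiply is chosen here only as a common example. The function `modmul` is a hypothetical placeholder for one complete round trip between the CPU and the modular multiplication unit, and `modexp` plays the role of the CPU-side loop that decides whether to repeat.

```python
def modmul(x: int, y: int, z: int) -> int:
    """Placeholder for one request to the modular multiplication unit: XY mod Z."""
    return (x * y) % z


def modexp(base: int, exponent: int, modulus: int) -> int:
    """Right-to-left square-and-multiply built only from modmul calls."""
    assert exponent >= 0 and modulus > 0
    result = 1 % modulus
    base %= modulus
    while exponent > 0:                 # repeat decision made on the CPU side
        if exponent & 1:
            result = modmul(result, base, modulus)
        base = modmul(base, base, modulus)
        exponent >>= 1
    return result


assert modexp(4, 13, 497) == pow(4, 13, 497)
```

Each loop iteration issues one or two multiplication requests, so an exponent of k bits leads to at most 2k round trips of the kind described above.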

The above procedure applies to calculating the remainder of a modular multiplication, or the quotient and remainder of a modular multiplication, but the same procedure applies when other operations such as addition and subtraction are performed. Control signals may also be transmitted and received via the system bus 300.

The invention made by the present inventors has been described specifically above on the basis of embodiments, but the present invention is not limited to them, and it goes without saying that various modifications are possible without departing from its gist. The present invention can be widely applied not only to IC cards but also to various embedded devices with cryptographic functions and to arithmetic units used in information security technology.

FIG. 1 is an explanatory diagram showing the arithmetic units that the remainder multiplication unit has, or may have, in Embodiment 1 of the present invention.
FIG. 2 is a flowchart showing an outline of the processing flow in Embodiment 1 of the present invention.
FIG. 3 is a block diagram illustrating the configuration of a remainder multiplication unit capable of executing the processing flow shown in FIG. 2 in Embodiment 1 of the present invention.
FIG. 4 is a flowchart illustrating a data processing procedure for executing the processing flow shown in FIG. 2 in Embodiment 1 of the present invention.
FIG. 5 is an explanatory diagram showing, for Embodiment 2 of the present invention, an example of the register values in the control register 307 of the remainder multiplication unit, an example of the transfer procedure between the management register 304 and the accumulator 312 in the remainder multiplication unit, and, for the case where multiplication is executed instead of remainder multiplication in steps 1 and 6 of Algorithm 3, the portions of the multiplication result that correspond to the quotient and the remainder of the remainder multiplication.
FIG. 6 is a block diagram illustrating the configuration of a microcomputer 601 capable of executing the above Algorithm 3 for calculating the quotient and the remainder in the remainder multiplication unit.
FIG. 7 is a flowchart illustrating a processing flow for calculating a remainder multiplication whose bit length is an integral multiple of the bit length that an arithmetic unit capable of remainder multiplication can calculate at one time.
FIG. 8 is a flowchart schematically showing the data processing flow of the microcomputer 601 provided with the remainder multiplication unit.

Explanation of symbols

101 MM computing unit
102 ADD computing unit
103 SUB computing unit
152 MM2 computing unit
152 MU computing unit
153 DIV computing unit
301 Clock generator
302 Input/output port
303 Control circuit
305 Program memory
304 Management register
306 Data storage memory
307 Control register
308 Selector
601 Microcomputer
602 Clock generator
603 CPU
604 Input/output port
605 ROM
606 RAM
607 EEPROM

Claims

A data processing apparatus comprising an arithmetic unit and a control unit for remainder multiplication, wherein
the arithmetic unit performs arithmetic processing of remainder multiplication, and
the control unit, when calculating the quotient and remainder of a 2w-bit remainder multiplication from the remainders and quotients of w-bit remainder multiplications (where w is a positive integer representing the number of bits of an operation value) by recursively repeating the arithmetic processing of the remainder multiplication a plurality of times, performs control to distribute the remainder and the quotient of the w-bit remainder multiplication obtained in the preceding arithmetic processing to the next arithmetic processing of remainder multiplication.

A data processing apparatus comprising an arithmetic unit and a control unit for remainder multiplication, wherein,
where w is a positive integer representing the number of bits of an operation value,
x, y, and z are w-bit non-negative integers satisfying 0 ≤ x, y, z < 2^w,
X, Y, and Z are 2w-bit non-negative integers satisfying 0 ≤ X, Y, Z < 2^(2w), and
m and n are non-negative integers,
the arithmetic unit performs arithmetic processing that outputs an integer q and an integer r satisfying the remainder multiplication expression xy = qz + r·2^n, and
the control unit, when recursively repeating the arithmetic processing, performs control to distribute the integer q and the integer r output from the dedicated remainder multiplication unit to the next arithmetic processing for obtaining an integer Q and an integer R satisfying the remainder multiplication expression XY = QZ + R·2^(2m).
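As an illustration only, the relation xy = qz + r·2^n named in the claim above can be checked numerically with the short Python sketch below. The function name `remainder_multiplication` is hypothetical, and choosing r as the residue in the range 0 ≤ r < z (with z odd whenever n > 0) is an assumption of this sketch; the claim itself only requires that q and r be integers satisfying the relation. The sketch does not reproduce the recursive procedure by which the 2w-bit result is built from w-bit results.

```python
def remainder_multiplication(x, y, z, n):
    """Return integers (q, r) with x*y == q*z + r*2**n and 0 <= r < z.

    Assumption of this sketch: z is positive, and odd whenever n > 0,
    so that 2**n is invertible modulo z.
    """
    if z <= 0 or (n > 0 and z % 2 == 0):
        raise ValueError("z must be positive, and odd when n > 0")
    # Inverse of 2**n modulo z (Python 3.8+ accepts a negative exponent here).
    inverse = pow(2, -n, z) if z > 1 else 0
    r = (x * y * inverse) % z
    # x*y - r*2**n is a multiple of z by construction, so the division is exact.
    q, leftover = divmod(x * y - r * 2**n, z)
    assert leftover == 0
    return q, r


# w = 8 bits, n = 0: ordinary quotient and remainder of x*y divided by z.
q, r = remainder_multiplication(200, 123, 251, 0)
assert 200 * 123 == q * 251 + r

# 2w = 16 bits with exponent 2m = 16: the same relation at double width.
Q, R = remainder_multiplication(51234, 40321, 65521, 16)
assert 51234 * 40321 == Q * 65521 + R * 2**16
```

With n = 0 the relation reduces to ordinary division with remainder; with n > 0 it has the form used in Montgomery-style reduction, where q may be negative.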

The data processing apparatus according to claim 2, wherein the arithmetic unit includes a remainder multiplier, an adder, and a subtracter.

The data processing apparatus according to claim 3, wherein
the arithmetic unit includes a data memory,
an accumulator, and
a selector that selects a data path from the data memory or the accumulator to the remainder multiplier, the adder, or the subtracter, and
the accumulator accumulates the outputs of the remainder multiplier, the adder, or the subtracter, and outputs the accumulated data to the selector or the data memory.

The data processing apparatus according to claim 2, wherein the control unit includes a program memory that holds an arithmetic control program describing the procedure of the processing, and
a control circuit that decodes an arithmetic instruction read from the program memory and generates a control signal for causing the arithmetic unit to execute the arithmetic processing.

The data processing apparatus according to claim 2, further comprising a central processing unit that instructs the control unit to perform remainder multiplication processing for encryption or decryption, the data processing apparatus being formed on a single semiconductor substrate.

The data processing apparatus according to claim 6, further comprising a RAM disposed in the address space of the central processing unit,
wherein the control unit can use the RAM as a work memory of the arithmetic unit.

A data processing apparatus comprising an arithmetic unit and a control unit for remainder multiplication, wherein
the arithmetic unit performs arithmetic processing of remainder multiplication, and
the control unit, when calculating the quotient and remainder of a kw-bit (k > 2) remainder multiplication from the remainders and quotients of w-bit remainder multiplications (where w is a positive integer representing the number of bits of an operation value), causes the arithmetic unit to execute a partitioning process that divides the kw-bit multiplication into w-bit multiplications and a reduction process that calculates the remainder multiplication from the products of the divided multiplications.
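The two stages named in the claim above, partitioning a kw-bit multiplication into w-bit multiplications and then reducing the product, can be sketched in Python as follows. This is a hedged illustration, not the claimed procedure: the function names are hypothetical, the partitioning shown is plain schoolbook limb multiplication, and the reduction stage uses Python's built-in `divmod` as a stand-in for the reduction that the apparatus performs with its w-bit remainder multiplier.

```python
def split_into_limbs(value, w, k):
    """Split a kw-bit non-negative integer into k limbs of w bits, least significant first."""
    mask = (1 << w) - 1
    return [(value >> (w * i)) & mask for i in range(k)]


def multiply_by_limbs(X, Y, w, k):
    """Partitioning stage: build X*Y from k*k products of w-bit limbs."""
    x_limbs = split_into_limbs(X, w, k)
    y_limbs = split_into_limbs(Y, w, k)
    product = 0
    for i, xi in enumerate(x_limbs):
        for j, yj in enumerate(y_limbs):
            # Each xi*yj is a w-bit by w-bit multiplication, weighted by its position.
            product += (xi * yj) << (w * (i + j))
    return product


def remainder_multiplication_kw(X, Y, Z, w, k):
    """Reduction stage (stand-in): quotient and remainder of X*Y with respect to Z."""
    return divmod(multiply_by_limbs(X, Y, w, k), Z)


# Example with w = 8 and k = 3, i.e. 24-bit operands.
Q, R = remainder_multiplication_kw(0xABCDEF, 0x123456, 0xFEDCBB, 8, 3)
assert 0xABCDEF * 0x123456 == Q * 0xFEDCBB + R
```

The schoolbook partitioning needs k² limb products; it is used here only because it is the simplest way to show how a wide product decomposes into w-bit multiplications.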

A data processing method executed by a data processing apparatus having an arithmetic unit and a control unit for remainder multiplication, the method comprising:
arithmetic processing of remainder multiplication performed by the arithmetic unit; and
control, performed by the control unit, in which, when the quotient and remainder of a 2w-bit remainder multiplication are calculated from the remainders and quotients of w-bit remainder multiplications (where w is a positive integer representing the number of bits of an operation value) by recursively repeating the arithmetic processing of the remainder multiplication a plurality of times, the remainder and the quotient of the w-bit remainder multiplication obtained in the preceding arithmetic processing are distributed to the next arithmetic processing of remainder multiplication.

A data processing method executed by a data processing apparatus having an arithmetic unit and a control unit for remainder multiplication, wherein,
where w is a positive integer representing the number of bits of an operation value,
x, y, and z are w-bit non-negative integers satisfying 0 ≤ x, y, z < 2^w,
X, Y, and Z are 2w-bit non-negative integers satisfying 0 ≤ X, Y, Z < 2^(2w), and
m and n are non-negative integers,
the method comprises:
arithmetic processing, executed by the arithmetic unit, that outputs an integer q and an integer r satisfying the remainder multiplication expression xy = qz + r·2^n; and
control, executed by the control unit, in which, when the arithmetic processing is recursively repeated, the integer q and the integer r output from the dedicated remainder multiplication unit are distributed to the next arithmetic processing for obtaining an integer Q and an integer R satisfying the remainder multiplication expression XY = QZ + R·2^(2m).

The data processing method according to claim 10, wherein the arithmetic unit includes a remainder multiplier, an adder, and a subtracter.

The data processing method according to claim 11, wherein
the arithmetic unit includes a data memory,
an accumulator, and
a selector that selects a data path from the data memory or the accumulator to the remainder multiplier, the adder, or the subtracter, and
the accumulator accumulates the outputs of the remainder multiplier, the adder, or the subtracter, and outputs the accumulated data to the selector or the data memory.

The data processing method according to claim 10, wherein the control unit includes a program memory that holds an arithmetic control program describing the procedure of the processing, and
a control circuit that decodes an arithmetic instruction read from the program memory and generates a control signal for causing the arithmetic unit to execute the arithmetic processing.

The data processing method according to claim 10, wherein the data processing apparatus further comprises a central processing unit that instructs the control unit to perform remainder multiplication processing for encryption or decryption, and is formed on a single semiconductor substrate.

The data processing method according to claim 14, wherein the data processing apparatus further comprises a RAM disposed in the address space of the central processing unit,
and the control unit can use the RAM as a work memory of the arithmetic unit.