JP3742293B2

JP3742293B2 - Residue arithmetic unit

Info

Publication number: JP3742293B2
Application number: JP2000334978A
Authority: JP
Inventors: 信一川村
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1999-11-01
Filing date: 2000-11-01
Publication date: 2006-02-01
Anticipated expiration: 2020-11-01
Also published as: JP2001194993A

Description

【０００１】
【発明の属する技術分野】
本発明は、剰余演算系に基づき大きな整数の演算を並列処理により高速に計算する剰余演算処理装置及び方法に関する。
【０００２】
【従来の技術】
大きな整数を効率良く演算するための手法として剰余演算系（Modular ArithmeticまたはResidue Number System）が知られている。剰余演算系では、互いに素な比較的小さな整数の組{a₁, a₂,…, a_n}を用意し、表現対象となる大きな整数をこれらの整数で割った余りで表現する。以後、この整数の組を剰余演算系の基底(base)と称する。また、要素数ｎを基底サイズと称する。
【０００３】
例えば基底{a₁, a₂,…, a_n}が与えられている場合、整数ｘは、これを基底a_i(i=1,2,...,n)で除して得られるｎ個の余り{x₁, x₂,…, x_n}により表現される。このとき、数xが基底要素の積Ａ(=a₁a₂…a_n)未満の正整数であれば、数xは基底要素の積Ａを法として一意に表現できる。言いかえれば、数xとその剰余演算系表現{x₁, x₂,…, x_n}は一対一に対応する。
【０００４】
このような剰余演算系表現において２つの整数x，yの積を計算するには、まず、各要素毎の積を求め、さらに、対応する基底a_iで除した余りを求める。これは、一般的には、各要素毎に対応する基底a_iを法とする積を計算することで基底要素の積Ａを法とする積が求められることと言い換えられる。加算および減算についても同様であり、基底a_iに対応する要素x_i，y_iについて、a_iを法とする加算あるいは減算をすればよい。
【０００５】
このような剰余演算系を用いた演算では、乗算・加算・減算は、各要素毎独立に対応する基底を法とする演算を行えば良いのであるが、例えば基底として計算機のワード長以内の値を採用することで、非常に大きな整数の演算を単精度の演算の繰り返しによって実現できる。
【０００６】
また、それらの単精度演算は基底毎で独立して実行できるので、演算器を複数用意することで並列処理が可能になる。例えば、基底サイズがｎの場合、剰余機能付き乗算器をｎ個用意し、これらを並列に動作させることによって、１回の単精度剰余付き乗算と同じ時間内で基底要素の積Ａを法とする乗算を終えることができる。
【０００７】
現在の計算機内では、２進数表現が用いられているのが通常である。２進数表現に基づく大きな整数の演算では、LSB(Least Significant Bit)からMSB(Most Significant Bit)に向けて桁上がり（キャリー）が伝播し、大きな整数の全桁数（あるいはビット長）に比例した処理時間がかかる。したがって、剰余演算系を用いて並列処理した場合に比べて処理速度の点で不利である。
【０００８】
一方、剰余演算系はワード間の桁上りが生じないので２進数表現で代表される基数法(Radix representation)に比べ、大きい整数の乗算・加算・減算を効率良く行うための方式として古くから知られてきた。
【０００９】
しかしながら、除算や２数の大小比較については基数法に比べ効率良く行う手段は知られていなかった。このため公開鍵暗号のような大きな整数の演算を高速に行う応用に剰余演算系が適していると考えられながら、８０年代までは剰余演算系を具体的にどう適用したら良いか知られていなかった。
【００１０】
そして、PoschらはIEEE Transaction on Parallel and Distributed Systems, Vol.6, No.5, May 1995, pp.449-454に掲載された”Modulo Reduction in Residue Number Systems”およびComputer & Security誌 Vol.17, pp.637-650, 1998の”RNS-Modulo Reduction Upon a Restricted Base Value Set and its Applicability to RSA Cryptography”において、剰余演算系を利用し、公開鍵暗号系のRSA暗号法(RSA cryptography)の演算を高速に行う方式を提案した。
【００１１】
また、Kornerupらは13th IEEE Symposium on Computer Arithmetic (Proceedings of ARITH13), IEEE Computer Society, pp234-239の”An RNS Montgomery Modular Multiplication Algorithm”において、またPaillierはSpringer-Verlag, Lecture Notes in Computer Science No.1560 Public Key Cryptography (PKC’99), pp.223-234の“Low-Cost Double-Size Modular Exponentiation or How to Stretch Your Cryptoprocessor”において、類似の高速演算方式を提案した。
【００１２】
RSA暗号法に剰余演算系を用いる主な理由は、同暗号法が十進数２００桁程度以上の非常に大きな整数の剰余乗算演算の繰り返しにより構成され、これまで述べたような剰余演算系の乗算と加減算が高速に行える特性を利用して高速処理を実現することが可能であるためである。
【００１３】
上記Poschら、Kornerupら、およびPaillierのそれぞれの方式において共通するのは、剰余演算系において不利な除算を行うことを避けるために、Montgomeryの演算方式を剰余演算系に組合せている点である。また、処理の途中において、ある基底で剰余系表現された整数を別な基底で表現した値を求めるための基底変換(base conversion)あるいは基底拡張(base extension)が行われている点も３方式に共通している。さらに、いずれの方式とも基底変換または基底拡張を効率良く行うことができるか否かが処理全体の効率にかかわっている。
【００１４】
ここで、基底変換と基底拡張という２種類の用語を用いているが、基底変換とは、ある基底で表現された値をその基底と互いに素な別の基底で表現しなおすことをいう。また、基底拡張とは、サイズｎの基底で表現された値を、元の基底にそれと互いに素な１つの整数を加えた、サイズｎ＋１の基底で表した場合のｎ＋１番目の要素を求めることを指す。基底拡張の方式があれば、それをｎ回実行することにより基底変換を構成できることは明らかである。剰余演算系を用いたRSA暗号法の実現においては、基底変換（または基底拡張）を効率良く行うための方式および装置が必要となる。
【００１５】
しかしながら、上記３つの方式、並びにこれまで提案されている方式は、以下で説明するように、何らかの点で効率が悪い基底変換方式であるといえる。
【００１６】
まず、Poschらの提案した方式において、RSA暗号の演算において示された基底変換の方式は、変換前の値がある値よりも小さい場合には変換後の値に誤差を生じる可能性がある。そこでPoschらは基底変換処理の入力に適当なオフセットを加えることで該入力を基底変換処理において誤差が生じないような値に変換し、その変換結果を基底変換し、得られた基底変換結果からオフセットによる影響を取り除くという手順を提案している。しかし、このようなオフセットのための前処理および後処理は全体の演算量を増加させるので効率が悪い。
【００１７】
またPoschらの方式は与えられた基底で計算可能なRSA暗号の鍵のサイズが著しく限定される上、基底変換に必要な補正項を計算するために乗算器を必要とするので、回路化した際の面積および処理遅延の点でも不利である。
【００１８】
図５は、Poschらの方式によるRSA暗号演算に用いられる剰余演算回路の概略構成を示す図である。
【００１９】
剰余演算機能付き積和回路５０１、RAM５２１、ROM５３１は１つのユニットを構成し、同様の構成のユニットがｎ個並列に並ぶ構成になっている。ここでは基底のサイズをｎとしており、各ユニットは特定の基底に対応した演算を行う。例えば、各ユニットは基底Aのｎ個の各基底要素および基底Bのｎ個の各基底要素にそれぞれ対応しており、例えば積和回路５０１では基底a₁，b₁に対応した演算が行われる。また、これらｎ個のユニットはそれぞれｒビットの演算を行うよう構成されており、さらにｒビットのバスによって相互に接続されている。
【００２０】
図６に積和回路５０１〜５０ｎの内部構成を示す。ここでは、便宜上、積和回路５０１で示すユニットに関するものとして説明する。入力としては記号a，ｂで表すｒビットのデータと、図中で右側から入力されているｒビットのROM５３１からのデータ入力がある。図中で、aはRAM５２１からの入力、ｂはROM５３１からの入力を表す。a，bはまず乗算器６０１で掛け合わされ、結果は次段の加算器６０２に供給される。加算器６０２では、乗算結果とレジスタ６０４からのフィードバック値が入力され足し合わされる。加算器６０２の結果は剰余演算部６０３に供給され、レジスタ６０５にセットされた値により割った余りに変換される。ここではレジスタ６０５の値を記号ｍ_ｉと書いているが、これは基底a_１またはb_１を表すものとする。入力a，bには基底サイズと同じｎ組みのデータが供給されるが、ｎ個のデータをすべて計算した後には計算結果がレジスタ６０４に出来上がっており、これはｒビットのバスによってRAM５２１に供給される。
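[0021]
Returning to FIG. 5, the residue arithmetic circuit also includes a correction term calculation unit 510, which corrects the computed result during base conversion, and a ROM 530 attached externally to the correction term calculation unit 510, which supplies it with at least n words of parameters.
[0022]
The correction term calculation unit 510 proposed by Posch et al. is realized by a product-sum circuit such as that shown in FIG. 7. In the circuit of FIG. 7, the input r-bit data is first multiplied by the data supplied from the ROM 530 in a multiplier 701, and the products are then accumulated by an adder 702. The running sum is held in a register 703, and the value is fed back only after the correction term has been computed completely.
[0023]
A point to note here is that the circuit scale of the correction term calculation unit 510 is at least comparable to that of the product-sum circuit with a residue function shown in FIG. 6. In addition, the correction term computed here is about (r + log₂ n) bits long, so in FIG. 5 the bus that carries the correction term to the product-sum circuits 501 to 50n must be (r + log₂ n) bits wide rather than r bits, which increases the circuit area. Of these, r bits can be shared with the bus from the RAMs to the correction term calculation unit, but even then the remaining log₂ n bits require additional area for the feedback.
[0024]
Furthermore, the product-sum circuits 501 to 50n must perform at least one modular multiplication to reflect the correction term received from the correction term calculation unit 510 in the result computed so far. Processing time could be saved if the correction term could be fed back to the product-sum circuits successively while other processing is in progress, but in the configuration of Posch et al. the value cannot be fed back until the correction term has been computed completely. No means of solving these specific problems had been devised.
[0025]
The method of Kornerup et al., another prior art, computes the correction term using the method proposed by Shenoy and Kumaresan in "Fast Base Extension Using a Redundant Modulus in RNS", IEEE Transaction on Computers, Vol.38, No.2, February 1989, pp.292-297. In this case the correction term is about n in magnitude, far smaller than in the method of Posch et al., but computing it still requires multiplication, and a correction term calculation procedure that is more efficient in circuit scale and processing delay has been sought.
[0026]
In the method proposed by Paillier, yet another prior art, the base cannot be chosen arbitrarily: the base must allow very efficient conversion to and from radix representation, which limits the range of application. The only applicable example given concretely in the paper uses two bases of size n = 2, and no other practical example is known. When n is as small as 2, each base element is correspondingly large, and it is harder to raise the processing speed than when n can be made large and each base element small.
[0027]
As described above, three methods that use a residue number system for high-speed RSA processing are known. Although they improve processing efficiency over earlier RSA computation methods, in every one of them either the base conversion, the most important of the processing steps, is inefficient, or the base size is restricted.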
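[0028]
[Problems to be Solved by the Invention]
In view of the above, an object of the present invention is to provide a new base conversion method that is superior to previously proposed base conversion methods in all or some of the following respects.
[0029]
(a) The correction term is relatively small and can be processed successively.
[0030]
(b) The value after conversion matches the value represented before conversion, and no error arises.
[0031]
(c) Even if an error does arise, it can easily be controlled by the preceding and following processing or by limiting the input size.
[0032]
(d) When applied to RSA cryptography, there are few restrictions on the key size.
[0033]
(e) Computing the correction term requires no multiplication and is efficient.
[0034]
(f) There are few constraints on the choice of base, so the method is general.
[0035]
A further object is to realize, by combining such a base conversion method with Montgomery's algorithm, a high-speed residue arithmetic device and method for use in RSA processing and the like.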
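[0036]
[Means for Solving the Problems]
To solve the above problems and achieve the object, the present invention is configured as follows.
[0037]
(1) A residue arithmetic device of the present invention comprises a plurality of product-sum circuits having a residue arithmetic function and a correction term calculation unit that calculates a correction term used in the residue arithmetic of the product-sum circuits, wherein the correction term calculation unit calculates the correction term successively, one bit at a time, and the product-sum circuits perform base conversion or base extension while successively reflecting the correction term calculated by the correction term calculation unit.
[0038]
(2) The residue arithmetic device of the present invention is the device according to (1), wherein the product-sum circuits perform Montgomery multiplication.
[0039]
(3) A residue arithmetic device of the present invention comprises a plurality of product-sum circuits arranged in parallel and a correction term calculation unit that calculates a correction term used in the residue arithmetic of the product-sum circuits, wherein the correction term calculation unit calculates the correction term successively, one bit at a time, and the product-sum circuits perform an operation that converts a residue number system representation into a radix representation while successively reflecting the correction term calculated by the correction term calculation unit.
[0040]
(4) The residue arithmetic device of the present invention is the device according to any one of (1) to (3), wherein the correction term calculation unit has a division circuit, and the base elements of the residue number system handled by the product-sum circuits are powers of 2 or values close to a power of 2.
[0041]
(5) The residue arithmetic device of the present invention is the device according to any one of (1) to (4), further comprising a bit selection unit that selects the bits input to the correction term calculation unit.
[0042]
(6) The residue arithmetic device of the present invention is the device according to any one of (1) to (5), further comprising an I/O unit that exchanges data with the outside.
[0043]
(7) A residue arithmetic device of the present invention performs base conversion or base extension from one base to another in a predetermined arithmetic algorithm in a residue number system, and comprises: k output means for outputting the unknown parameter k of the base conversion or base extension, approximated by the carry produced by cumulative addition onto the previous calculation result for k; switching means for switching whether a specific term in the base conversion or base extension is computed, according to the unknown parameter k output from the k output means; and a plurality of arithmetic units that perform the calculation of the base conversion or base extension for each base element by a combination of multiplication, addition, and residue operations including the calculation of the specific term.
[0044]
(8) The residue arithmetic device of the present invention is the device according to (7), wherein the k output means approximates the denominator of the formula for the unknown parameter k based on the Chinese remainder theorem by a power of 2.
[0045]
(9) The residue arithmetic device of the present invention is the device according to (7), further comprising bit selection means, wherein the k output means approximates the numerator of the formula for the unknown parameter k based on the Chinese remainder theorem by truncating the bits outside the effective bit length selected by the bit selection means.
[0046]
(10) The residue arithmetic device of the present invention is the device according to (7), wherein the k output means approximates the denominator of the formula for the unknown parameter k based on the Chinese remainder theorem by a power of 2 and approximates the numerator of the formula by truncating the bits outside the effective bit length.
[0047]
(11) The residue arithmetic device of the present invention is the device according to (7), wherein the predetermined arithmetic algorithm consists of a Montgomery multiplication algorithm that, for input integers x, y, and N, outputs xyB^-1 mod N or (xyB^-1 mod N) + N.
[0048]
(12) The residue arithmetic device of the present invention is the device according to (11), comprising means for performing modular exponentiation according to a predetermined algorithm using the Montgomery multiplication.
[0049]
(13) The residue arithmetic device of the present invention is the device according to (7), comprising conversion means that converts a residue number system representation into a radix representation and outputs it, according to a predetermined formula including an unknown parameter based on the Chinese remainder theorem.
[0050]
(14) A residue arithmetic device of the present invention performs base conversion or base extension from one base to another in a predetermined arithmetic algorithm in a residue number system, and comprises: a plurality of arithmetic units for performing the calculation of the base conversion or base extension for each base element by a combination of multiplication, addition, and residue operations including the calculation of a specific term; k output means, provided in each of the arithmetic units, for outputting the unknown parameter k of the base conversion or base extension, approximated by the carry produced by cumulative addition onto the previous calculation result for k; switching means for switching, according to the unknown parameter k output from the k output means, whether the specific term is computed in the arithmetic unit corresponding to that k output means; and connection means between arithmetic units for sending an operand of an arithmetic unit to an adjacent arithmetic unit and receiving an operand from another adjacent arithmetic unit.
[0051]
(15) The residue arithmetic device of the present invention is the device according to (14), wherein the k output means approximates the denominator of the formula for the unknown parameter k based on the Chinese remainder theorem by a power of 2.
[0052]
(16) The residue arithmetic device of the present invention is the device according to (14), wherein the k output means approximates the numerator of the formula for the unknown parameter k based on the Chinese remainder theorem by truncating the bits outside the effective bit length.
[0053]
(17) The residue arithmetic device of the present invention is the device according to (14), wherein the k output means approximates the denominator of the formula for the unknown parameter k based on the Chinese remainder theorem by a power of 2 and approximates the numerator of the formula by truncating the bits outside the effective bit length.
[0054]
(18) The residue arithmetic device of the present invention is the device according to (14), wherein the predetermined arithmetic algorithm consists of a Montgomery multiplication algorithm that, for input integers x, y, and N, outputs xyB^-1 mod N or (xyB^-1 mod N) + N.
[0055]
(19) The residue arithmetic device of the present invention is the device according to (18), comprising means for performing modular exponentiation according to a predetermined algorithm using the Montgomery multiplication.
[0056]
(20) The residue arithmetic device of the present invention is the device according to (14), comprising conversion means that converts a residue number system representation into a radix representation and outputs it, according to a predetermined formula including an unknown parameter based on the Chinese remainder theorem.
[0057]
(21) A residue arithmetic method of the present invention performs base conversion or base extension from one base to another in a predetermined arithmetic algorithm in a residue number system, and comprises: approximating the unknown parameter k of the base conversion or base extension by the carry produced by cumulative addition onto the previous calculation result; switching whether a specific term in the base conversion or base extension is computed, according to the unknown parameter k so output; and performing the calculation of the base conversion or base extension for each base element by a combination of multiplication, addition, and residue operations including the calculation of the specific term.
[0058]
(22) The residue arithmetic method of the present invention is the method according to (21), wherein the denominator of the formula for the unknown parameter k based on the Chinese remainder theorem is approximated by a power of 2.
[0059]
(23) The residue arithmetic method of the present invention is the method according to (21), wherein the numerator of the formula for the unknown parameter k based on the Chinese remainder theorem is approximated by truncating the bits outside the effective bit length.
[0060]
(24) The residue arithmetic method of the present invention is the method according to (21), wherein the denominator of the formula for the unknown parameter k based on the Chinese remainder theorem is approximated by a power of 2 and the numerator of the formula is approximated by truncating the bits outside the effective bit length.
[0061]
(25) The residue arithmetic method of the present invention is the method according to (21), wherein the predetermined arithmetic algorithm consists of a Montgomery multiplication algorithm that, for input integers x, y, and N, outputs xyB^-1 mod N or (xyB^-1 mod N) + N.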
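[0062]
[Embodiments of the Invention]
Embodiments of the present invention are described in detail below with reference to the drawings.
[0063]
(First Embodiment)
First, the computation of RSA cryptography, the application to which the present invention is best suited, is described.
[0064]
RSA encryption and decryption are realized by the modular exponentiation expressed by the following equation.
[0065]
C = m^e mod N (1)
Here m and N are several hundred decimal digits long and the amount of processing is very large, so various methods have been devised to compute this efficiently.
A well-known way to implement RSA computation is to apply repeatedly the multiplication with reduction proposed by Montgomery (hereinafter, Montgomery multiplication). As introduced in the prior art section, performing Montgomery multiplication in a residue number system is taken up as one concrete application of the present invention. The procedure of ordinary Montgomery multiplication, outside a residue number system, is described first.
[0066]
Montgomery multiplication is an algorithm that, for input integers x, y, and N, outputs xyB^-1 mod N or (xyB^-1 mod N) + N, and consists of the following five steps.
[0067]
(1) s ← x·y
(2) t ← {s·(−N)^-1} mod B
(3) u ← t·N
(4) v ← s + u
(5) w ← v / B
Here s, t, u, v, and w are intermediate variables, and B is an arbitrary integer that is greater than N and relatively prime to N.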
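The five steps above can be sketched directly with Python integers. This is a minimal illustration rather than the circuit of the invention; the function name `montgomery_multiply` and the sample values are assumptions made for the example, and the three-argument `pow` with exponent -1 requires Python 3.8 or later.

```python
def montgomery_multiply(x, y, N, B):
    """Return w with w ≡ x*y*B^-1 (mod N), following steps (1) to (5)."""
    s = x * y                              # (1) s <- x*y
    t = (s * pow((-N) % B, -1, B)) % B     # (2) t <- s*(-N)^-1 mod B
    u = t * N                              # (3) u <- t*N
    v = s + u                              # (4) v <- s+u, a multiple of B
    assert v % B == 0
    return v // B                          # (5) w <- v/B, exact division


# Illustrative values only: N odd, B a power of 2 greater than N.
N, B = 97, 128
w = montgomery_multiply(45, 76, N, B)
```

For x, y < N the result satisfies 0 ≤ w < 2N, which is why the text describes the output as either xyB^-1 mod N or that value plus N.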
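[0068]
The idea of realizing this in a residue number system was first proposed by Posch et al. and can be written as the following seven steps.
[0069]
(1) s_A ← x_A·y_A, s_B ← x_B·y_B
(2) t_B ← {s_B·(−N_B)^-1} mod B
(3) Obtain t_A from t_B by base conversion.
(4) u_A ← t_A·N_A
(5) v_A ← s_A + u_A
(6) w_A ← v_A·B_A^-1
(7) Obtain w_B from w_A by base conversion.
Here a symbol with the subscript A or B denotes a number represented in the residue number system base A = {a₁, a₂, …, a_n} or base B = {b₁, b₂, …, b_n}, respectively. For example, x_A denotes the set of n remainders {x₁, x₂, …, x_n} obtained by dividing x, an element of the residue ring modulo the product A = a₁a₂…a_n of the base elements, by each element of base A. For the above procedure to compute correctly, at least N < A and N < B are necessary. Under this condition x and y can each be represented uniquely in base A alone or in base B alone, so representing x by the pair x_A, x_B is in itself redundant. However, the product s of x and y ranges over 0 ≤ s < N² and is represented correctly only when A*B is taken as the base. It follows that s is computed correctly as a product in the residue number system when x and y are also represented with A*B as the base. The sizes n and m of base A and base B generally differ, but in the special case n = m there is the advantage that the arithmetic units processing base A and those processing base B can be shared.
[0070]
Apart from steps (3) and (7), the correspondence between Montgomery multiplication in the residue number system and the five steps of ordinary Montgomery multiplication should be clear. Steps (1) to (2) and (4) to (6) are easily realized by multiplication or addition in the residue number system. For example, s_A in step (1) is obtained by multiplying each element of x by the corresponding element of y, both represented in base A, modulo the corresponding base element. The base conversions of steps (3) and (7), in contrast, have been the subject of several studies. How efficiently the base conversion is performed is the key to implementing the above algorithm efficiently.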
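The seven steps can be traced with a small software model. In the sketch below, the base conversions of steps (3) and (7) are done by full Chinese-remainder reconstruction, which is only a stand-in for the conversion circuits discussed in this document. The helper names, the tiny bases, and the modulus are assumptions chosen for illustration; the example also assumes 2N < A so that w is represented uniquely in base A.

```python
from math import prod


def to_rns(x, base):
    return [x % m for m in base]


def from_rns(residues, base):
    """Full CRT reconstruction (stand-in for an efficient base conversion)."""
    M = prod(base)
    total = 0
    for r, m in zip(residues, base):
        M_i = M // m
        total += r * pow(M_i, -1, m) * M_i
    return total % M


def convert(residues, src, dst):
    return to_rns(from_rns(residues, src), dst)


def rns_montgomery(xA, xB, yA, yB, N, baseA, baseB):
    Bprod = prod(baseB)
    sA = [(p * q) % a for p, q, a in zip(xA, yA, baseA)]              # (1)
    sB = [(p * q) % b for p, q, b in zip(xB, yB, baseB)]              # (1)
    tB = [(s * pow((-N) % b, -1, b)) % b for s, b in zip(sB, baseB)]  # (2)
    tA = convert(tB, baseB, baseA)                                    # (3)
    uA = [(t * (N % a)) % a for t, a in zip(tA, baseA)]               # (4)
    vA = [(s + u) % a for s, u, a in zip(sA, uA, baseA)]              # (5)
    wA = [(v * pow(Bprod % a, -1, a)) % a for v, a in zip(vA, baseA)] # (6)
    wB = convert(wA, baseA, baseB)                                    # (7)
    return wA, wB


# Illustrative parameters: pairwise coprime bases, N prime, 2N < A, N < B.
baseA, baseB, N = [13, 17, 19], [23, 29, 31], 1009
x, y = 123, 456
wA, wB = rns_montgomery(to_rns(x, baseA), to_rns(x, baseB),
                        to_rns(y, baseA), to_rns(y, baseB), N, baseA, baseB)
```

Every step except (3) and (7) is an independent per-element operation, which is what makes the per-base-element parallel units possible.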
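[0071]
First consider how to express a given x exactly as a non-negative value less than the product A (= a₁a₂…a_n) of the base elements. Let x be an integer with 0 ≤ x < A, and let {x₁, x₂, …, x_n} be its residue number system representation. The well-known Chinese remainder theorem then gives the following equation.
[0072]
[Math. 1]
$$x = \left( \sum_{i=1}^{n} \left( x_i A_i^{-1} \bmod a_i \right) A_i \right) \bmod A \qquad (2)$$
[0073]
Here A_i is A/a_i, and A_i^-1 is the multiplicative inverse of A_i modulo a_i. Then there exists a unique k such that
[Math. 2]
$$x = \sum_{i=1}^{n} \left( x_i A_i^{-1} \bmod a_i \right) A_i - kA \qquad (3)$$
[0074]
The only unknown parameter here is k, and we consider expressing k in terms of known parameters. k is the parameter that brings the value computed by the first term into the range of integers from 0 up to but not including A; hereinafter k is called the correction term.
[0075]
Dividing both sides of equation (3) by A gives
[Math. 3]
$$\frac{x}{A} = \sum_{i=1}^{n} \frac{x_i A_i^{-1} \bmod a_i}{a_i} - k \qquad (4)$$
Therefore,
[Math. 4]
$$k + \frac{x}{A} = \sum_{i=1}^{n} \frac{x_i A_i^{-1} \bmod a_i}{a_i} \qquad (5)$$
Considering that 0 ≤ x/A < 1,
[Math. 5]
$$k \le \sum_{i=1}^{n} \frac{x_i A_i^{-1} \bmod a_i}{a_i} < k + 1 \qquad (6)$$
holds. Writing [ ] for the operation that truncates the fractional part, the following relation follows from expression (6).
[0076]
[Math. 6]
$$k = \left[ \sum_{i=1}^{n} \frac{x_i A_i^{-1} \bmod a_i}{a_i} \right] \qquad (7)$$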
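[0077]
This resembles the expression of Posch et al., but the correction term k' in their method can be written as follows.
[0078]
[Math. 7]
$$k' = \left[ \sum_{i=1}^{n} \frac{x_i \left( A_i^{-1} \bmod a_i \right)}{a_i} \right] \qquad (8)$$
[0079]
Compared with equation (8) of Posch et al., equation (7) of the present invention differs in that the x_i term is brought inside the parentheses and multiplied by A_i^-1 modulo a_i. Hereinafter this product is denoted by the symbol ξ_i as follows.
[0080]
ξ_i = x_i·A_i^-1 mod a_i (9)
The correction term k based on equation (7) takes values from 0 up to but not including n, whereas the correction term k' based on equation (8) of Posch et al. can be as large as about Σ_{i=1}^{n} a_i. This correction term k' of Posch et al. satisfies
[Math. 8]

and in many cases greatly exceeds n. Min and Max denote functions that take the minimum and maximum value, respectively.
[0081]
The correction term k computed according to equation (7) is smaller than that of the method of Posch et al. The correction term calculation of the present invention is thus built on the relation of equation (7) as its starting point.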
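The relation between k, ξ_i, and x can be checked numerically. The sketch below evaluates k exactly with rational arithmetic and confirms that Σ ξ_i·A_i − k·A recovers x. The value returned as `k_prime` follows the textual description of the Posch et al. term (x_i multiplied outside the modular reduction), which is an assumption about equation (8); the function name and the small base are likewise illustrative only.

```python
from fractions import Fraction
from math import floor, prod


def correction_terms(x, base):
    """Return (k, k_prime) for x in [0, A), where A is the product of base."""
    A = prod(base)
    k_sum = Fraction(0)
    kp_sum = Fraction(0)
    total = 0
    for a_i in base:
        A_i = A // a_i
        inv = pow(A_i, -1, a_i)              # A_i^-1 mod a_i
        x_i = x % a_i
        xi_i = (x_i * inv) % a_i             # xi_i of equation (9)
        k_sum += Fraction(xi_i, a_i)         # terms of equation (7)
        kp_sum += Fraction(x_i * inv, a_i)   # assumed form of the Posch term
        total += xi_i * A_i
    k = floor(k_sum)
    assert total - k * A == x                # the CRT sum minus k*A gives x
    return k, floor(kp_sum)


k, k_prime = correction_terms(2024, [13, 17, 19])
```

Because each ξ_i/a_i is below 1, k never reaches n, whereas the unreduced products can push k' far higher.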
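[0082]
The configuration of the residue arithmetic circuit that realizes Montgomery multiplication according to the present invention is now described with reference to the drawings.
FIG. 1 shows the main part of a residue arithmetic device that realizes Montgomery multiplication. A product-sum circuit 101 with a residue function, a RAM 121, and a ROM 131 constitute one unit, and n units of the same configuration are arranged in parallel. Each unit corresponds to one of the n elements of base A and one of the n elements of base B; for example, the product-sum circuit 101 performs the operations corresponding to the base elements a₁ and b₁. Each of these n units is configured to perform r-bit operations, and the units are interconnected by an r-bit bus. Besides these n units, a bit selection unit 111 and a correction term calculation unit 110 are shown. The correction term calculation unit 110 is the unit needed to compute a value corresponding to the correction term k according to equation (7) above or a variant of it. The bit selection unit 111 extracts the required number (q) of upper bits from the r-bit bus, although in some implementations all r bits are supplied to the correction term calculation unit 110 as they are.
[0083]
FIG. 2 shows the configuration of one of the product-sum circuits 101 to 10n shown in FIG. 1. For convenience, the description refers to the unit of the product-sum circuit 101. The inputs are r-bit data denoted a and b, r-bit data from the ROM 131 entering from the right in the figure, and the 1-bit output of the correction term calculation unit. In the figure, a is the input from the RAM 121 and b is the input from the ROM 131. a and b are first multiplied in a multiplier 201, and the product is supplied to an adder 202 at the next stage. The adder 202 adds the product, the feedback value from a register 204, and the data from a register 205. The data from the register 205 is passed to the adder 202 unchanged while a switch 207 is closed and is replaced by 0 while the switch 207 is open. The switch 207 is controlled by the 1-bit data from the correction term calculation unit 110: it is closed when the data is 1 and open when the data is 0. The output of the adder 202 is supplied to a residue operation unit 203 and reduced to the remainder modulo the value set in a register 206. The value of the register 206 is written m_i here and represents the base element a₁ or b₁. n sets of data, as many as the base size, are supplied to the inputs a and b; once all n have been processed, the result is in the register 204 and is supplied to the RAM 121 over the r-bit bus.
[0084]
FIG. 3 shows one configuration example of the correction term calculation unit 110. This unit accumulates the input q-bit data with an adder 301. The (q+1)-bit sum is stored in a register 302, and the most significant bit of the register 302 is output as the successively computed correction term. The remaining q bits are supplied to the adder 301 again in the next processing step. Since n values, as many as the base size, are supplied as input, the correction term calculation unit 110 outputs a result n times.
[0085]
FIG. 4 shows the configuration of the bit selection unit 111. Of the r input bits, the upper q bits (q ≤ r) are output. When q = r, the bit selection unit may be omitted.
[0086]
FIG. 8 shows another configuration example of the correction term calculation unit 110. This example is characterized in that the input value is first divided by a division circuit 801. Such a division circuit 801 may at first seem disadvantageous compared with the configuration of FIG. 7, but when the divisor is a power of 2 or very close to a power of 2, efficient means of division are known, and the processing in the division circuit 801 is not necessarily heavy.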
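[0087]
As the characteristic feature of the first embodiment of the residue arithmetic device according to the present invention, the procedure for computing the correction term according to equation (7) is now described. This embodiment assumes the circuit configuration of FIG. 1 with q = r, and uses the correction term calculation unit of FIG. 8. The configuration of FIG. 8 performs a division to obtain the correction term, but has the advantage that the correct correction term k is computed for any x less than the product A of the base elements. In general the precision and cost of the division are a concern, but when special values such as 2^r, 2^r − 1, or 2^r + 1 are used as base elements, the correction term is easily computed by this technique.
[0088]
The flow for converting x represented in base A into its representation in base B is as follows.
[0089]
[Math. 9]
$$k = \left[ \sum_{i=1}^{n} \frac{\xi_i}{a_i} \right] \qquad (11)$$
[0090]
To implement the operation based on equation (11) in hardware, a procedure expressed by the following recurrences is used.
[0091]
σ_i = (σ_{i-1} − k_{i-1}) + ξ_i / a_i (12)
k_i = [σ_i] (13)
c_i = {c_{i-1} + ξ_i·(A_i mod b_j) + k_i·(b_j − A mod b_j)} mod b_j (14)
The procedure of equations (12) to (14) is repeated in order from i = 1 to n, for every destination base element b_j (j = 1, ..., m).
With the initial values σ_0 = k_0 = c_0 = 0, c_n is the result of the base conversion. Expressed as recurrences in this way, the correction term k is computed one bit at a time and, as equation (14) shows, is reflected in the intermediate result of the base conversion each time.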
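The recurrences (12) to (14) can be modelled in software as follows. The sketch uses exact rational arithmetic for σ, which corresponds to an ideal division circuit and is an assumption of this example; the function name `base_convert` and the small bases are hypothetical.

```python
from fractions import Fraction
from math import floor, prod


def base_convert(xA, baseA, baseB):
    """Convert residues xA over baseA into residues over baseB."""
    A = prod(baseA)
    # xi_i = x_i * A_i^-1 mod a_i, as in equation (9)
    xi = [(x_i * pow(A // a_i, -1, a_i)) % a_i for x_i, a_i in zip(xA, baseA)]
    out = []
    for b_j in baseB:
        sigma, k_prev, c = Fraction(0), 0, 0          # sigma_0 = k_0 = c_0 = 0
        for xi_i, a_i in zip(xi, baseA):
            sigma = (sigma - k_prev) + Fraction(xi_i, a_i)        # (12)
            k_i = floor(sigma)                                    # (13): 0 or 1
            c = (c + xi_i * ((A // a_i) % b_j)
                   + k_i * (b_j - A % b_j)) % b_j                 # (14)
            k_prev = k_i
        out.append(c)                                 # c_n for this b_j
    return out


baseA, baseB = [13, 17, 19], [23, 29, 31]
x = 2024
xB = base_convert([x % a for a in baseA], baseA, baseB)
```

Since σ_{i-1} − k_{i-1} is always below 1 and ξ_i/a_i is below 1, each k_i is a single bit, and the bits sum to the k of equation (11).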
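[0092]
The hardware that successively computes the correction term k according to equation (12) is the correction term calculation unit 110 of FIG. 8, already shown. ξ_i in equation (12) corresponds to the input x in FIG. 8, and a_i in equation (12) corresponds to the input y.
[0093]
An adder 802 adds the quotient (x/y) output from the division circuit 801 to the previous value held in a register 803 and outputs the result to the register 803. As shown in the figure, when a carry occurs in the register 803, the carry bit (1 bit) is output from the correction term calculation unit 110 as the correction term k (reduction factor). This correction term k takes the value 1 or 0.
[0094]
The hardware that computes the base-converted values in parallel according to equation (14), based on k output from the correction term calculation unit 110, is the product-sum circuits 101 to 10n of FIG. 2, already shown. One product-sum circuit, for example the product-sum circuit 101, is configured to support the following basic operation.
c_{i+1} = (c_i + ab + k_i·d) mod m_i (15)
In equation (15), k_i on the right-hand side is 1 or 0, so the third term on the right-hand side is realized by the switch 207 alone. This means that a single 1-bit line suffices for the feedback from the correction term calculation unit 110 shown in FIG. 1 to the product-sum circuits 101 to 10n. The circuit configuration of this embodiment is thus far simpler than the circuit of Posch et al. shown in FIG. 5. The structural advantage that 1-bit feedback suffices also holds in the other embodiments described later.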
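The basic operation (15) of one product-sum circuit can be modelled as a small stateful object. This is a behavioural sketch only; the class name `ProductSumCell` and the mapping of attributes to registers 204 to 206 and switch 207 are an interpretation of the description of FIG. 2, not a statement about the actual circuit.

```python
class ProductSumCell:
    """Behavioural model of c <- (c + a*b + k*d) mod m, equation (15)."""

    def __init__(self, modulus, d):
        self.m = modulus   # value held in register 206
        self.d = d         # value held in register 205
        self.c = 0         # accumulator, register 204

    def step(self, a, b, k_bit):
        # Switch 207: pass d when the correction bit is 1, otherwise 0.
        addend = self.d if k_bit else 0
        self.c = (self.c + a * b + addend) % self.m
        return self.c


cell = ProductSumCell(modulus=23, d=5)
cell.step(3, 4, 0)   # accumulator becomes 12
cell.step(2, 10, 1)  # accumulator becomes (12 + 20 + 5) mod 23 = 14
```

No multiplier is needed for the correction term: the 1-bit input only gates a constant into the adder.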
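[0095]
In the above procedure, computing the correction term first requires multiplying x_i by A_i^-1 to obtain ξ_i. When this base conversion is used in Montgomery multiplication in the residue number system, however, multiplying each element of the constant (−N_B^-1) in step (2) by A_i^-1 in advance removes any additional work for obtaining ξ_i. Likewise, the preprocessing needed for the conversion of step (7) can be folded into the constant B_A^-1 of step (6). The same applies to the other embodiments described later.
[0096]
It is also clear that the above procedure applies to base extension as well as base conversion. That is, performing the conversion only for a specific base element, instead of for all m base elements {b_j}, amounts to a base extension.
[0097]
The new base conversion (extension) according to the present invention, as applied to the residue arithmetic device of the first embodiment described above, provides the following effects.
(a) The correction term is kept relatively small and can be processed successively, one bit at a time.
(b) Because the value after base conversion equals the value represented before conversion, no error arises as it does in the method of Posch et al.
(c) Even if an error does arise, it can easily be controlled by the preceding and following processing or by limiting the input size.
(d) When applied to RSA cryptography, there are few restrictions on the key size.
(e) Computing the correction term requires no multiplication and is efficient.
(f) There are few constraints on the choice of base, so the method is general.
Therefore, with the base conversion (extension) of this embodiment, Montgomery multiplication can be accelerated with a simple configuration, and RSA processing can be accelerated in turn.
[0098]
The residue arithmetic device of this embodiment can also be applied to the procedure that converts a residue number system representation into a radix representation. The details of this procedure are described in the second embodiment.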
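(Second Embodiment)
The second embodiment approximates the formula for the correction term k in equation (11) by replacing the denominator of each term on the right-hand side with the power of 2 that is closest to it among those not less than it.
[0099]
That is, with r_i satisfying
2^(r_i − 1) < a_i ≤ 2^(r_i) (16)
a_i is approximated by 2^(r_i). In general r_i differs for each base element, but in an implementation, giving all base elements the same bit length has the advantage that the product-sum circuits 101 to 10n of FIG. 1 can share a common design.
For a suitable μ_i, a_i is expressed as follows.
[0100]
a_i = 2^(r_i) − μ_i (17)
Then the following l is used as an approximation of the correction term k computed by equation (11).
[Math. 10]
$$l = \left[ \sum_{i=1}^{n} \frac{\xi_i}{2^{r_i}} \right] \qquad (18)$$
Like k, l is computed successively by the following recurrences.
[0101]
σ_i = (σ_{i-1} − l_{i-1}) + ξ_i / 2^(r_i) (19)
l_i = [σ_i] (20)
Here the initial values of l and σ are both 0. The correction term can be computed according to equations (19) and (20). Like the first embodiment, this embodiment assumes the circuit configuration of FIG. 1 with q = r. From this embodiment onward, the correction term calculation unit of FIG. 3 is used.
[0102]
The correction term calculation of equations (19) and (20) can be used for base conversion and base extension as in the first embodiment. In this embodiment, however, it is applied to the procedure that converts a residue number system representation into a radix representation. Equation (21) shows this procedure.
[0103]
c_i = c_{i-1} + ξ_i·A_i − l_i·A (21)
Note that although equation (21) resembles equation (14), the variable c_i in equation (14) needs only enough precision to represent the largest base element, whereas c_i in equation (21) is assumed to hold a multiple-precision value of about the size of the product A of the base elements. In an actual hardware design, equation (21) would not be implemented as is but would be broken down into repeated single-precision operations or the like; it nevertheless suffices to explain the principle of converting a residue number system representation into a radix representation. The decomposition into single-precision operations is straightforward.
[0104]
When k is approximated according to equation (18), the conversion result of equation (21) may contain an error. This error is explained briefly here. First, ε, given by the following equation, is introduced as a measure of the approximation error.
ε = Max(μ_i / 2^(r_i)) (22)
Using this ε, when the input x satisfies
nεA ≤ x < A (23)
equation (18) gives the same value as the correct correction term k. When
0 ≤ x < nεA (24)
equation (18) gives the correct value k or k − 1.
According to expression (23), ε must satisfy nε < 1 and be chosen as small as needed. It is also known that choosing μ_i sufficiently small makes the mod a_i operation performed in the residue operation unit 203 of FIG. 2 easy.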
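The power-of-2 approximation and the conversion to radix form can be sketched together. In the example below the accumulator σ is held as an r-bit integer whose carry-out is l_i, which models the FIG. 3 style unit with q = r. The function name, the common width r = 8, and the base [256, 255, 253, 251] are assumptions chosen so that every base element fits in 8 bits and the elements are pairwise coprime.

```python
from math import prod


def rns_to_radix(xA, baseA, r):
    """Approximate RNS-to-radix conversion following equations (19) to (21)."""
    A = prod(baseA)
    mask = (1 << r) - 1
    sigma = 0            # fractional accumulator, scaled by 2**r
    c = 0                # multiple-precision result
    for x_i, a_i in zip(xA, baseA):
        A_i = A // a_i
        xi = (x_i * pow(A_i, -1, a_i)) % a_i
        sigma += xi                      # (19): add xi / 2**r
        l_i = sigma >> r                 # (20): carry bit
        sigma &= mask
        c += xi * A_i - l_i * A          # (21)
    return c


base = [256, 255, 253, 251]   # mu_i = 0, 1, 3, 5, so eps = 5/256
A = prod(base)
x = A // 2
value = rns_to_radix([x % a for a in base], base, 8)
```

For inputs at or above nεA the result equals x; for smaller inputs l may come out as k − 1, in which case the result is x + A, which matches the error behaviour described for expressions (23) and (24).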
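[0105]
According to the second embodiment described above, the procedure for converting a residue number system representation into a radix representation is realized by a residue arithmetic device that, as in the first embodiment, computes the correction term l at high speed with a simple configuration.
In the second embodiment the denominator of equation (12) was approximated by a power of 2, but the numerator of equation (12) may be approximated as well. Specifically, as described in the third embodiment below, the effective bit length of the numerator may be shortened within the permissible error.
[0106]
(Third Embodiment)
The third embodiment performs the approximation by shortening the effective bit length of the numerator of equation (12) within the permissible error. This corresponds to setting q < r in FIG. 1 and accumulating the upper q of the r bits in the correction term calculation unit.
[0107]
Let the correction term in this case be m, obtained for example by the following equation (25).
[0108]
[Math. 11]
$$m = \left[ \sum_{i=1}^{n} \frac{\mathrm{trunc}(\xi_i)}{2^{r_i}} \right] \qquad (25)$$
[0109]
Here trunc() is a function that keeps the upper q bits of its argument and sets the bits below them to 0. In principle the number of bits q could differ for each term, but using a common q for all terms usually makes the hardware simpler.
[0110]
The recurrences for computing m successively are as follows.
σ_i = (σ_{i-1} − m_{i-1}) + trunc(ξ_i) / 2^(r_i) (26)
m_i = [σ_i] (27)
where the initial values of σ and m are 0.
In this embodiment, approximation error arises in the numerator as well as in the denominator. The effect of these errors is explained next. As a measure of the numerator error, δ_i is defined as follows.
δ_i = {ξ_i − trunc(ξ_i)} / a_i (28)
Further,
δ = Max(δ_i) (29)
is defined.
[0111]
With δ introduced, conditions similar to those of the second embodiment are obtained.
When the input x satisfies
n(ε + δ)A ≤ x < A (30)
equation (25) gives the same value as the correct correction term k. When
0 ≤ x < n(ε + δ)A (31)
equation (25) gives the correct value k or k − 1.
According to the third embodiment, not only the denominator of equation (12) but also its numerator is approximated, by shortening its effective bit length within the permissible error, so the correction term calculation becomes simpler and faster.
The correction terms l and m given by the second and third embodiments have the property that they are correct when the input x is at or above a certain value and may be incorrect when x is below that value.
In some cases, however, the opposite property is preferable: the correction term may contain an error only when x is at or above a certain value, and is correct for arbitrarily small x at or below that value. An example is the conversion from base A to base B in step (7) of the Montgomery multiplication described above, where one wants the base conversion always to be correct simply by keeping the modulus N at or below a certain value.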
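The truncated accumulation of the third embodiment maps naturally onto integer shifts. The sketch below models a q-bit accumulator with a carry-out: each ξ_i is cut to its upper q bits by the bit selection step, added, and the overflow bit is emitted. The function names, the widths r = 8 and q = 6, and the sample base are assumptions for illustration.

```python
from math import prod


def xi_values(x, base):
    """xi_i = x_i * A_i^-1 mod a_i for every base element."""
    A = prod(base)
    return [((x % a) * pow(A // a, -1, a)) % a for a in base]


def approx_correction_bits(xi, r, q):
    """Emit one correction bit per input, per recurrences (26) and (27)."""
    mask = (1 << q) - 1
    acc = 0                       # q-bit accumulator (fraction scaled by 2**q)
    bits = []
    for v in xi:
        acc += v >> (r - q)       # bit selection: keep the upper q of r bits
        bits.append(acc >> q)     # carry-out is the correction bit m_i
        acc &= mask               # the lower q bits feed the next step
    return bits


bits = approx_correction_bits([200, 100, 255], r=8, q=6)   # [0, 1, 1]
```

The sum of the emitted bits is m of equation (25); with the sample base it equals the exact k or k − 1, never anything else, because n(ε + δ) stays well below 1.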
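[0112]
(Fourth Embodiment)
Next, as the fourth embodiment, a correction term calculation is described that gives the correct correction term for arbitrarily small x, provided x is at or below a certain value.
[0113]
The basic principle of the calculation is based on equation (11), with the denominator approximated by a power of 2 and only the upper q bits of the numerator used. This method introduces parameters α and β; α is a parameter that limits the magnitude of the input x as follows.
0 ≤ x < (1 − α)A (32)
The correction term m' in this embodiment is computed according to the following equation.
[0114]
[Math. 12]
$$m' = \left[ \beta + \sum_{i=1}^{n} \frac{\mathrm{trunc}(\xi_i)}{2^{r_i}} \right] \qquad (33)$$
[0115]
In this embodiment, with q < r in FIG. 1, the upper q of the r bits are input to the correction term calculation unit 110 (FIG. 3), and the accumulation is performed with the internal register 302 initialized to β. The recurrences corresponding to equation (33) are as follows.
σ_0 = β (34)
m'_0 = 0 (35)
σ_i = (σ_{i-1} − m'_{i-1}) + trunc(ξ_i) / 2^(r_i) (36)
m'_i = [σ_i] (37)
Then, if n(ε + δ) ≤ β ≤ α < 1, every x satisfying expression (32), 0 ≤ x < (1 − α)A, is converted correctly.
[0116]
For example, with α = β = 1/2, the correct correction term can always be computed for any x below A/2. To realize β = 1/2 in the residue arithmetic device of FIG. 1, it suffices to set the second bit from the top of the register 302 shown in FIG. 3 to 1. When β is chosen as the reciprocal of a power of 2 in this way, initializing the register only requires setting the one corresponding bit to 1, which is simple. In general, any β that is not less than the error n(ε + δ) and not greater than α can be set as the offset.
[0117]
According to the fourth embodiment, the parameters α and β are introduced as described above and x is limited to at most a certain value. A correction term calculation is thereby realized that always gives the correct correction term for any x within that limit, however small.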
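The effect of the offset β can be seen by presetting the accumulator before the truncated sums are added. The sketch below uses β = 1/2, that is, an initial accumulator value of 2^(q−1), and checks against the exact k computed with integers. The function names, the widths r = 8 and q = 6, and the base are illustrative assumptions, and the inputs are restricted to x < A/2 in line with expression (32) for α = 1/2.

```python
from math import prod


def xi_values(x, base):
    """xi_i = x_i * A_i^-1 mod a_i for every base element."""
    A = prod(base)
    return [((x % a) * pow(A // a, -1, a)) % a for a in base]


def offset_correction_bits(xi, r, q, offset):
    """Recurrences (34) to (37) with sigma_0 = offset / 2**q."""
    mask = (1 << q) - 1
    acc = offset & mask           # accumulator register preset to beta
    bits = []
    for v in xi:
        acc += v >> (r - q)       # upper q bits of xi_i
        bits.append(acc >> q)     # carry-out is m'_i
        acc &= mask
    return bits


def exact_k(x, base):
    A = prod(base)
    return sum(v * (A // a) for v, a in zip(xi_values(x, base), base)) // A


base = [256, 255, 253, 251]
x = 999
m_prime = sum(offset_correction_bits(xi_values(x, base), 8, 6, 1 << 5))
```

With the offset in place, small inputs such as x = 0 or x = 1 also yield the exact correction term, which is the property wanted for step (7) of the Montgomery multiplication.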
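[0118]
(Fifth Embodiment)
The fifth embodiment concerns parameter sizes. For RSA cryptography, a modulus size of about 1024 bits must be chosen, and bases A and B must each be slightly larger than 1024 bits. If each element of bases A and B is about 32 bits, that is, r = 32, the base size is about n = 33, so that n·r is about 1024. With α = β = 1/2 in the fourth embodiment, suppressing errors requires n(ε + δ) ≤ 1/2. Hence ε + δ ≤ 1/(2n) = 1/66, for which ε < 1/2^8 and δ < 1/2^8 is a sufficient condition. These parameter sizes correspond roughly to the precision of the adder 301 shown in FIG. 3, and show that an adder of about 8 bits suffices for the correction term calculation.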
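The arithmetic behind the 8-bit figure can be reproduced in a few lines. The helper below searches for the smallest t such that ε and δ, each assumed to be bounded by 2^-t, satisfy n(ε + δ) ≤ β. The function name and the assumption that both errors share the same bound are made for this example only.

```python
from fractions import Fraction


def min_fraction_bits(n, beta=Fraction(1, 2)):
    """Smallest t with n * (2**-t + 2**-t) <= beta."""
    t = 1
    while n * Fraction(2, 2**t) > beta:
        t += 1
    return t


n = 33                              # base size for about 1024 bits at r = 32
bound = Fraction(1, 2 * n)          # eps + delta must not exceed 1/66
bits_needed = min_fraction_bits(n)  # 8
```

With t = 7 the product n(ε + δ) could reach 33/64, just above 1/2, so 8 bits is the first width that meets the bound.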
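[0119]
(Sixth Embodiment)
The sixth embodiment concerns a device that performs modular exponentiation based on Montgomery multiplication using the base conversion (extension) of the present invention described so far.
FIG. 9 shows the overall configuration of the modular exponentiation device of this embodiment. Input and output data are exchanged through the I/O unit 1000 shown. Input data is first stored in a given RAM 1201 through the I/O unit 1000. When external data is input in residue number system representation, it is stored in the corresponding RAMs 1201 to 120n. n RAMs are shown in the figure, and the elements corresponding to the base elements a_i and b_i are written to each RAM. The product-sum circuits 1101 to 110n and the correction term calculation unit 1100 turn the input data into the modular exponentiation result by repeating the Montgomery multiplication described above. The result is stored in the corresponding RAMs 1201 to 120n and output to the outside through the I/O unit 1000.
[0120]
The procedure for performing modular exponentiation by repeated Montgomery multiplication is described following the flowchart of FIG. 10. The flowchart represents the process of raising the input value x, given in residue number system representation, to the e-th power and obtaining the remainder modulo N. N is assumed to be known, and processing such as obtaining the residue number system representation of N is assumed to have been done in advance and is not shown in FIG. 10. Alternatively, N may be an external input, with such processing performed each time.
[0121]
MM in FIG. 10 is a function denoting Montgomery multiplication in residue arithmetic. The residue number system representation of the input x is first multiplied, by Montgomery multiplication, by the residue number system representation (d_A, d_B) of a constant d, converting it into x', where d = B² mod N. The converted value x' (its residue number system representation) is then copied to the intermediate result c.
The next step is a loop in which the loop variable i runs from k − 1 down to 1. The externally supplied exponent e is expressed in binary, is k bits long, and has bits denoted e_i. e_k is the most significant bit and is taken to be 1 here. k is at least 2.
Inside the loop, the value corresponding to the square of the intermediate variable c is first computed by Montgomery multiplication. Next, the bit e_i of e corresponding to the loop variable i is tested: if it is 1, the product of c and x' is computed by Montgomery multiplication, and if it is not 1, this multiplication is skipped. The loop variable i is then tested: if it is not 1 the process returns to the start of the loop, and if it is 1 the loop ends.
In the final step, the product of the result c so far and the residue number system representation of 1 is computed by Montgomery multiplication, giving the result y (its residue number system representation).
[0122]
The above computes y = x^e mod N.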
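The flow just described can be sketched with ordinary integers standing in for the residue number system values. The function `mm` below is plain Montgomery multiplication and `mont_pow` is the square-and-multiply loop. Both names are hypothetical, the multiply step uses the converted value x', and choosing B > 4N is an assumption of this example that keeps intermediate results below 2N.

```python
def mm(x, y, N, B):
    """Montgomery multiplication: returns a value congruent to x*y*B^-1 mod N."""
    s = x * y
    t = (s * pow((-N) % B, -1, B)) % B
    return (s + t * N) // B


def mont_pow(x, e, N, B):
    """Left-to-right binary exponentiation built on mm()."""
    d = (B * B) % N
    x_prime = mm(x, d, N, B)       # x' = x*B mod N (Montgomery form)
    c = x_prime
    bits = bin(e)[2:]              # e_k ... e_1, most significant bit first
    for bit in bits[1:]:           # i = k-1 down to 1
        c = mm(c, c, N, B)         # square
        if bit == "1":
            c = mm(c, x_prime, N, B)   # multiply by x'
    return mm(c, 1, N, B)          # leave Montgomery form


N, B = 1009, 4096
y = mont_pow(123, 65537, N, B)
```

The result is congruent to x^e modulo N and may equal the reduced value plus N, so a caller that needs the canonical residue would apply one final reduction.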
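[0123]
(Seventh Embodiment)
The seventh embodiment concerns a residue arithmetic device with a ring configuration.
The n product-sum circuits shown in FIG. 1 are connected through an r-bit bus. This bus connection lets data output from one RAM be transmitted to all n product-sum units and processed in parallel. Connecting the product-sum circuits by a bus is one effective way to realize parallel processing.
On the other hand, as is well known in the field of network architecture, a ring connection is an alternative to such a bus connection for linking multiple units. A bus architecture is characterized by a bus that broadcasts the same data to n units, whereas in a ring connection, communication paths between neighboring units link the n units, forming a ring-shaped architecture overall.
[0124]
The residue arithmetic device of the present invention can also be realized with a ring connection. In a serial ring connection each unit only has to send data to its neighbor, so each unit needs less drive capability than in a bus configuration, where data must be sent to multiple units. In addition, all units can be controlled in exactly the same way. In the bus type, by contrast, when one unit broadcasts data the remaining (n − 1) units receive it, so the units do not all perform the same operation. Because all units operate identically, the ring type is easier to control.
[0125]
FIG. 11 is a block diagram showing the configuration of a ring-configured residue arithmetic device. In FIG. 11 the bus connection is replaced by a ring connection, and the bit selection unit 111 and the correction term calculation unit 110 shown in FIG. 1 are provided for each of the n product-sum circuits. The configuration of FIG. 1 has only one correction term calculation unit, and a 1-bit bus was enough to broadcast the computed correction term to the n product-sum circuits. Providing a correction term calculation unit for each product-sum circuit, as in this embodiment, increases the circuit scale slightly. However, the correction term calculation unit of the present invention is extremely simple, as FIG. 3 shows, and even with n such units their share of the overall circuit scale is very small.
[0126]
In the ring configuration of FIG. 11, because a correction term calculation unit is provided for each product-sum circuit, the bus for carrying the correction term to each product-sum circuit is unnecessary, and a connection unit is provided instead. FIG. 12 shows the connection unit in detail. It consists of a 2-input selector 960 and an r-bit register 961 that latches the output of the selector 960. In a given product-sum circuit, the register 961 stores one of the operands used in the current operation cycle. In the next operation cycle, that operand is transferred to the adjacent connection unit (for example, the left neighbor in the figure), and the next operand is received from the other adjacent unit (in this case, the right neighbor). The n operands held in the registers 961 of the n connection units are passed from neighbor to neighbor in bucket-brigade fashion, so that in exactly n cycles all n operands have circulated through all units.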
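The ring circulation can be modelled with a rotating list. In the sketch below each unit j holds one operand, keeps its own correction accumulator, and after every cycle receives the operand of its right neighbor. Exact rational accumulation, the function names, and the small bases are assumptions of this example; it also assumes both bases have the same size n.

```python
from fractions import Fraction
from math import floor, prod


def xi_values(x, base):
    A = prod(base)
    return [((x % a) * pow(A // a, -1, a)) % a for a in base]


def ring_base_convert(xi, baseA, baseB):
    """Base conversion where operands circulate around a ring of n units."""
    n = len(baseA)
    assert len(baseB) == n and len(xi) == n
    A = prod(baseA)
    regs = list(xi)                  # operand register of each connection unit
    sigma = [Fraction(0)] * n        # one correction accumulator per unit
    c = [0] * n
    for t in range(n):               # n cycles in total
        for j in range(n):           # all units act identically in a cycle
            i = (j + t) % n          # index of the operand unit j now holds
            b = baseB[j]
            sigma[j] += Fraction(regs[j], baseA[i])
            k = floor(sigma[j])      # local 1-bit correction term
            sigma[j] -= k
            c[j] = (c[j] + regs[j] * ((A // baseA[i]) % b)
                         + k * (b - A % b)) % b
        regs = regs[1:] + regs[:1]   # pass operands on to the left neighbor
    return c


baseA, baseB = [13, 17, 19], [23, 29, 31]
xB = ring_base_convert(xi_values(2024, baseA), baseA, baseB)
```

Each unit sees the operands in a different order, but the carries it emits still sum to the same k, so every unit arrives at the correct residue after n cycles.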
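[0127]
[Scalability of the Number of Units]
So far the number n of product-sum circuits (the number of units) has been taken to be equal to the base size n. In general, however, the number of arithmetic units need not equal the base size. Writing the number of units as m instead of n, under the constraint m ≤ n, a larger m allows faster processing. On the other hand, for a hardware implementation such as an LSI, a larger m means a larger circuit scale and higher power consumption. There is thus a trade-off between the number of units m and the computation speed. A typical way to choose m is to take a divisor of n. For example, for n = 33 the candidates for the number of units are m = 1, 3, 11, or 33. An m that is not a divisor of n can of course be used, but choosing a divisor of n makes the circuit control regular and raises the utilization of the arithmetic units. In any case, not restricting m to n greatly widens the freedom of LSI design.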
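The candidate unit counts can be listed mechanically. The helper below returns each divisor m of n together with n/m, read here as the number of base elements each unit would handle in turn; that interpretation and the function name are assumptions for illustration.

```python
def unit_count_options(n):
    """Return (m, n // m) for every divisor m of the base size n."""
    return [(m, n // m) for m in range(1, n + 1) if n % m == 0]


options = unit_count_options(33)   # [(1, 33), (3, 11), (11, 3), (33, 1)]
```

A divisor keeps every unit busy in every pass, which is the regular-control and high-utilization advantage mentioned in the text.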
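[0128]
[Effects of the Invention]
As described above, the present invention provides a new base conversion (extension) with the following effects.
(a) The correction term is kept relatively small and can be processed successively, one bit at a time.
(b) Because the value after base conversion equals the value represented before conversion, no error arises as it does in the method of Posch et al.
(c) Even if an error does arise, it can easily be controlled by the preceding and following processing or by limiting the input size.
(d) When applied to RSA cryptography, there are few restrictions on the key size.
(e) Computing the correction term requires no multiplication and is efficient.
(f) There are few constraints on the choice of base, so the method is general.
Therefore, Montgomery multiplication can be accelerated with a simple configuration, and RSA processing can be accelerated in turn.
[Brief Description of the Drawings]
[FIG. 1] Diagram showing the configuration of a residue arithmetic device according to an embodiment of the present invention
[FIG. 2] Diagram showing the configuration of the product-sum circuit shown in FIG. 1
[FIG. 3] Diagram showing the configuration of the correction term calculation unit shown in FIG. 1
[FIG. 4] Diagram showing the configuration of the bit selection unit shown in FIG. 1
[FIG. 5] Diagram showing the configuration of a residue arithmetic device according to the prior art
[FIG. 6] Diagram showing the configuration of the prior-art product-sum circuit shown in FIG. 5
[FIG. 7] Diagram showing the configuration of the prior-art correction term calculation unit shown in FIG. 5
[FIG. 8] Diagram showing another configuration of the correction term calculation unit according to an embodiment of the present invention
[FIG. 9] Diagram showing the configuration of a modular exponentiation device according to an embodiment of the present invention
[FIG. 10] Processing flowchart of modular exponentiation according to an embodiment of the present invention
[FIG. 11] Diagram showing another configuration of the residue arithmetic device according to an embodiment of the present invention
[FIG. 12] Diagram showing the configuration of the connection unit of the residue arithmetic device shown in FIG. 11
[Description of Reference Numerals]
101 to 10n: product-sum circuits
110: correction term calculation unit
111: bit selection unit
121 to 12n: RAM (random access memory)
131 to 13n: ROM (read-only memory)
201: multiplier
202: adder
203: residue operation unit
204 to 206: registers
207: switch
301: adder
302: register
501 to 50n: product-sum circuits
510: correction term calculation unit
521 to 52n: random access memory
530 to 53n: read-only memory
601: multiplier
602: adder
603: residue operation unit
604, 605: registers
701: multiplier
702: adder
703: register
801: division circuit
802: adder circuit
803: register
[0001]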
[Technical Field of the Invention]
The present invention relates to a residue arithmetic processing device and method that compute operations on large integers at high speed by parallel processing based on a residue number system.
[0002]
[Prior Art]
A residue number system (also called modular arithmetic, or RNS) is known as a technique for computing efficiently with large integers. In a residue number system, a set of relatively small, pairwise coprime integers {a₁, a₂, …, a_n} is prepared, and the large integer to be represented is expressed by its remainders on division by these integers. Hereinafter this set of integers is called the base of the residue number system, and the number of elements n is called the base size.
[0003]
For example, given a base {a₁, a₂, …, a_n}, an integer x is represented by the n remainders {x₁, x₂, …, x_n} obtained by dividing it by the base elements a_i (i = 1, 2, ..., n). If x is a non-negative integer less than the product A (= a₁a₂…a_n) of the base elements, x is represented uniquely modulo A. In other words, x and its residue number system representation {x₁, x₂, …, x_n} correspond one to one.
[0004]
To compute the product of two integers x and y in this representation, the product is formed element by element and then reduced modulo the corresponding base element a_i. Put generally, computing for each element the product modulo its base element a_i yields the product modulo the product A of the base elements. Addition and subtraction work the same way: the elements x_i and y_i corresponding to the base element a_i are added or subtracted modulo a_i.
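The element-wise arithmetic described here can be tried out in a few lines. The sketch below is illustrative only: the helper names and the small base are assumptions, and the reconstruction in `from_rns` uses the Chinese remainder theorem simply to check the result.

```python
from math import prod


def to_rns(x, base):
    """Represent x by its remainders modulo each base element."""
    return [x % a for a in base]


def rns_mul(xs, ys, base):
    """Element-wise product, each reduced modulo its own base element."""
    return [(p * q) % a for p, q, a in zip(xs, ys, base)]


def rns_add(xs, ys, base):
    """Element-wise sum, each reduced modulo its own base element."""
    return [(p + q) % a for p, q, a in zip(xs, ys, base)]


def from_rns(residues, base):
    """Recover the integer in [0, A) by the Chinese remainder theorem."""
    A = prod(base)
    total = 0
    for x_i, a_i in zip(residues, base):
        A_i = A // a_i
        total += x_i * pow(A_i, -1, a_i) * A_i
    return total % A


base = [13, 17, 19, 23]            # pairwise coprime, A = 96577
x, y = 1234, 56
product = from_rns(rns_mul(to_rns(x, base), to_rns(y, base), base), base)
```

No element depends on any other, so each position could be handled by a separate arithmetic unit with no carries passed between them.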
[0005]
In such an arithmetic operation using a remainder operation system, multiplication, addition, and subtraction may be performed by modulo the base corresponding to each element independently. For example, a value within the word length of the computer is used as the base. By adopting, it is possible to realize a very large integer operation by repeating a single precision operation.
[0006]
Moreover, since these single precision calculations can be executed independently for each base, parallel processing becomes possible by preparing a plurality of calculators. For example, when the base size is n, n multipliers with a remainder function are prepared, and these are operated in parallel to calculate the product A of the base elements in the same time as a single multiplication with single precision remainder. Can finish the multiplication.
[0007]
In a current computer, a binary representation is usually used. In large integer operations based on binary representation, carry (carry) propagates from LSB (Least Significant Bit) to MSB (Most Significant Bit), which is proportional to the total number of digits (or bit length) of a large integer. Processing time is required. Therefore, it is disadvantageous in terms of processing speed as compared with the case where parallel processing is performed using a remainder calculation system.
[0008]
On the other hand, since there is no carry between words, the remainder calculation system has long been known as a method for efficiently performing multiplication, addition, and subtraction of large integers compared to the radix representation represented by binary representation. Has been.
[0009]
However, there has been no known means for performing division or comparing two numbers more efficiently than in the radix method. For this reason, it is considered that the remainder operation system is suitable for applications that perform large integer operations at high speed, such as public key cryptography, but it has not been known how to apply the remainder operation system until the 1980s. It was.
[0010]
Posch et al., “Modulo Reduction in Residue Number Systems” published in IEEE Transaction on Parallel and Distributed Systems, Vol.6, No.5, May 1995, pp.449-454 and Computer & Security magazine Vol.17, pp.637-650, 1998 "RNS-Modulo Reduction Upon a Restricted Base Value Set and its Applicability to RSA Cryptography" A method to perform at high speed was proposed.
[0011]
Kornerup et al. In 13th IEEE Symposium on Computer Arithmetic (Proceedings of ARITH13), IEEE Computer Society, pp234-239, “An RNS Montgomery Modular Multiplication Algorithm”, and Paillier in Springer-Verlag, Lecture Notes in Computer Science No. 1560. A similar high-speed calculation method was proposed in "Low-Cost Double-Size Modular Exponentiation or How to Stretch Your Cryptoprocessor" in Public Key Cryptography (PKC'99), pp.223-234.
[0012]
The main reason for using the remainder operation system in RSA cryptography is that the encryption method is composed of repetition of a very large integer remainder multiplication operation of about 200 decimal digits or more, and the multiplication of the remainder operation system as described above. This is because it is possible to realize high-speed processing by using the characteristic that can perform addition and subtraction at high speed.
[0013]
What is common in each of the above Posch et al., Kornerup et al., And Paillier systems is that the Montgomery arithmetic system is combined with the remainder arithmetic system in order to avoid performing an unfavorable division in the remainder arithmetic system. In the middle of processing, there are also three methods in which base conversion or base extension is performed in order to obtain a value that represents an integer expressed in a residue system in one base in another base. Is common. Furthermore, in any of the methods, whether the base conversion or the base extension can be performed efficiently is related to the efficiency of the entire processing.
[0014]
Here, two types of terms, ie, basis transformation and basis extension, are used. Basis transformation means that a value represented by a certain basis is represented again by another basis that is disjoint from that basis. The base extension is to obtain the (n + 1) th element when the value expressed by the base of size n is expressed by the base of size n + 1, which is obtained by adding one integer that is relatively prime to the original base. Point to. If there is a base extension method, it is obvious that the base transformation can be configured by executing it n times. In order to realize RSA cryptography using a residue arithmetic system, a method and apparatus for efficiently performing base conversion (or base extension) are required.
[0015]
However, it can be said that the above three methods and the methods proposed so far are basis conversion methods that are inefficient at some point, as will be described below.
[0016]
First, in the method proposed by Posch et al., the base conversion used in the RSA operation may produce an erroneous converted value when the value before conversion is smaller than a certain value. Posch et al. therefore propose a procedure that adds an appropriate offset to the input of the base conversion so that no error occurs, performs the base conversion on the offset value, and then removes the effect of the offset from the converted result. However, the pre-processing and post-processing for this offset increase the overall amount of computation and are therefore inefficient.
[0017]
In addition, in the method of Posch et al., the RSA key size that can be handled with a given base is significantly limited, and a multiplier is required to calculate the correction term needed for base conversion, so the method is also disadvantageous in area and processing delay when implemented as a circuit.
[0018]
FIG. 5 is a diagram showing a schematic configuration of a remainder operation circuit used for RSA encryption operation by the Posch et al. Method.
[0019]
The product-sum circuit 501 with a remainder calculation function, the RAM 521, and the ROM 531 constitute one unit, and n units of the same configuration are arranged in parallel. Here the base size is n, and each unit performs the operations for specific base elements: each unit corresponds to one of the n elements of base A and one of the n elements of base B. For example, the product-sum circuit 501 performs the operations corresponding to the base elements a₁ and b₁. Each of the n units performs r-bit operations, and the units are connected to one another by an r-bit bus.
[0020]
FIG. 6 shows the internal configuration of the product-sum circuits 501 to 50n. For convenience, the product-sum circuit 501 is described as representative. The inputs are r-bit data denoted by the symbols a and b, together with r-bit data from the ROM 531 entering from the right side of the figure. In the figure, a denotes an input from the RAM 521 and b an input from the ROM 531. First, a and b are multiplied by the multiplier 601, and the result is supplied to the adder 602 in the next stage. The adder 602 adds the multiplication result and the feedback value from the register 604. The result of the adder 602 is supplied to the remainder calculation unit 603 and reduced to the remainder modulo the value set in the register 605. The value of the register 605 is denoted m_i; in this unit it is the base element a₁ or b₁. The inputs a and b are supplied with n pairs of data, as many as the base size. Once all n pairs have been processed, the final result is held in the register 604 and is supplied to the RAM 521 over the r-bit bus.
[0021]
Returning to FIG. 5, the remainder operation circuit further includes a correction term calculation unit 510 for correcting the calculation result in the base conversion, and a ROM 530, attached externally to the correction term calculation unit 510, that supplies it with at least n words of parameters.
[0022]
The correction term calculation unit 510 proposed by Posch et al. is realized by a product-sum circuit as shown in FIG. 7. In the circuit of FIG. 7, the input r-bit data and the data input from the ROM 530 are first multiplied by the multiplier 701 and then accumulated by the adder 702. The addition result is stored in the register 703, and the value is fed back only after the correction term has been completely calculated.
[0023]
Note that the circuit scale of the correction term calculation unit 510 is equal to or larger than that of the product-sum circuit with a remainder calculation function shown in FIG. 6. The correction term calculated here is about (r + log₂n) bits in size, so the bus that transmits the correction term to the product-sum circuits 501 to 50n must be (r + log₂n) bits wide rather than r bits, which increases the circuit area. The r bits can of course be shared with the bus from the RAM to the correction term calculation unit, but the remaining log₂n bits require extra area for the feedback.
[0024]
Furthermore, the product-sum circuits 501 to 50n must perform at least one modular multiplication to reflect the correction term received from the correction term calculation unit 510 in the results computed so far. Processing time could be saved if the correction term were fed back to the product-sum circuits sequentially while other processing is in progress, but in the configuration of Posch et al. the value cannot be fed back until the correction term has been completely calculated. No means of solving these specific problems has been devised so far.
[0025]
In another conventional technique, Kornerup et al. calculate the correction term using the method proposed by Shenoy and Kumaresan in "Fast Base Extension Using a Redundant Modulus in RNS," IEEE Transactions on Computers, Vol. 38, No. 2, February 1989, pp. 292-297. In this case the correction term is about n in size, much smaller than in the method of Posch et al. However, this method also requires multiplication to calculate the correction term, so a correction term calculation procedure that is more efficient in circuit scale and processing delay is still needed.
[0026]
In the method proposed by Paillier, another prior art, conversion from a radix representation to a residue number system representation, or from a residue number system representation to a radix representation, can be performed very efficiently, but an arbitrary base cannot be chosen, so the scope of application is limited. The only concrete example given in the paper uses two bases, each of base size n = 2, and no other practical examples are known. When n is as small as 2, each base element is correspondingly large, and it is harder to increase the processing speed than when n can be made large and each base element small.
[0027]
As described above, three methods are known that use the residue number system to speed up RSA processing, and each is more efficient than the RSA computation methods proposed before it. However, in every one of them either the base conversion, which is the most important of the processing steps, is inefficient, or the base size is restricted.
[0028]
[Problems to be solved by the invention]
In view of the above points, an object of the present invention is to provide a new basis conversion method that is superior in all or a part of the following points as compared to a conventionally proposed basis conversion method.
[0029]
(a) The value of the correction term is relatively small and can be sequentially processed.
[0030]
(b) The value after conversion matches the value expressed before conversion, and no error occurs.
[0031]
(c) Even if an error occurs, the error can be easily controlled by the processing before and after and the restriction of the input size.
[0032]
(d) There are few restrictions on the key size in application to RSA encryption.
[0033]
(e) Multiplication is not necessary to calculate the correction term, and processing efficiency is good.
[0034]
(f) There are few restrictions on how to take bases, and versatility is high.
[0035]
An object of the present invention is to realize a high-speed remainder calculation device and method used for RSA encryption processing and the like by combining such a basis conversion method with the Montgomery algorithm.
[0036]
[Means for Solving the Problems]
In order to solve the above problems and achieve the object, the present invention is configured as follows.
[0037]
(1) A remainder calculation apparatus according to the present invention comprises a plurality of product-sum circuits having a remainder calculation function, and a correction term calculation unit that calculates a correction term used for the remainder calculation in the product-sum circuits, wherein the correction term calculation unit calculates the correction term sequentially, one bit at a time, and the product-sum circuits perform base conversion or base extension by sequentially reflecting the correction term calculated by the correction term calculation unit.
[0038]
(2) The remainder calculation apparatus according to the present invention is the remainder calculation apparatus according to (1) described above, wherein the product-sum circuit performs Montgomery multiplication.
[0039]
(3) A remainder calculation apparatus according to the present invention comprises a plurality of product-sum circuits arranged in parallel, and a correction term calculation unit that calculates a correction term used for the residue operation in the product-sum circuits, wherein the correction term calculation unit calculates the correction term sequentially, one bit at a time, and the product-sum circuits sequentially reflect the correction term calculated by the correction term calculation unit to perform an operation that converts a residue number system representation into a radix representation.
[0040]
(4) The remainder calculation apparatus according to the present invention is the apparatus according to any one of (1) to (3) above, wherein the correction term calculation unit includes a division circuit, and each element of the base of the residue number system handled by the product-sum circuits is a power of 2 or close to a power of 2.
[0041]
(5) The remainder calculation apparatus according to the present invention is the apparatus according to any one of (1) to (4) above, further comprising a bit selection unit that selects the bits input to the correction term calculation unit.
[0042]
(6) The remainder calculation apparatus according to the present invention is the apparatus according to any one of (1) to (5) above, further comprising an I/O unit for inputting and outputting data to and from the outside.
[0043]
(7) A remainder calculation apparatus according to the present invention performs base conversion or base extension from the base used in a predetermined arithmetic algorithm in a residue number system to another base, and comprises: k output means for outputting the unknown parameter k of the base conversion or base extension by approximating it with the carry generated by cumulative addition onto the previous calculation result of the unknown parameter k; switching means for switching whether or not a specific term in the base conversion or base extension is calculated, according to the unknown parameter k output from the k output means; and a plurality of arithmetic units that perform the calculation of the base conversion or base extension for each base element by a combination of multiplication, addition, and remainder calculation, including the calculation of the specific term.
[0044]
(8) The remainder calculation apparatus according to the present invention is the apparatus described in (7) above, wherein the k output means approximates the denominator of the formula for the unknown parameter k, which is based on the Chinese remainder theorem, by a power of 2.
[0045]
(9) The remainder calculation apparatus according to the present invention is the apparatus described in (7) above, further comprising bit selection means, wherein the k output means approximates the numerator of the formula for the unknown parameter k, which is based on the Chinese remainder theorem, by truncating, by means of the bit selection means, all bits other than the effective bit length.
[0046]
(10) The remainder calculation apparatus according to the present invention is the apparatus described in (7) above, wherein the k output means approximates the denominator of the formula for the unknown parameter k, which is based on the Chinese remainder theorem, by a power of 2, and approximates the numerator of the formula by truncating all bits other than the effective bit length.
[0047]
(11) The remainder calculation apparatus according to the present invention is the apparatus described in (7) above, wherein the predetermined arithmetic algorithm is a Montgomery multiplication algorithm that, for input integers x, y, and N, outputs xyB^-1 mod N or xyB^-1 mod N + N.
[0048]
(12) The remainder calculation apparatus according to the present invention is the apparatus described in (11) above, further comprising means for performing modular exponentiation according to a predetermined algorithm using the Montgomery multiplication.
[0049]
(13) The remainder calculation apparatus according to the present invention is the apparatus described in (7) above, further comprising conversion means for converting a residue number system representation into a radix representation according to a predetermined formula, based on the Chinese remainder theorem, that includes an unknown parameter, and outputting the result.
[0050]
(14) A remainder calculation apparatus according to the present invention performs base conversion or base extension from the base used in a predetermined arithmetic algorithm in a residue number system to another base, and comprises: a plurality of arithmetic units that perform the calculation of the base conversion or base extension for each base element by a combination of multiplication, addition, and remainder calculation, including the calculation of a specific term; k output means, provided in each of the plurality of arithmetic units, for outputting the unknown parameter k of the base conversion or base extension by approximating it with the carry generated by cumulative addition onto the previous calculation result of the unknown parameter k; switching means for switching whether or not the specific term is calculated in the arithmetic unit corresponding to the k output means, according to the unknown parameter k output from that k output means; and inter-unit connection means for transmitting the operand of each arithmetic unit to an adjacent arithmetic unit and receiving an operand from another adjacent arithmetic unit.
[0051]
(15) The remainder calculation apparatus according to the present invention is the apparatus described in (14) above, wherein the k output means approximates the denominator of the formula for the unknown parameter k, which is based on the Chinese remainder theorem, by a power of 2.
[0052]
(16) The remainder calculation apparatus according to the present invention is the apparatus described in (14) above, wherein the k output means approximates the numerator of the formula for the unknown parameter k, which is based on the Chinese remainder theorem, by truncating all bits other than the effective bit length.
[0053]
(17) The remainder calculation apparatus according to the present invention is the apparatus described in (14) above, wherein the k output means approximates the denominator of the formula for the unknown parameter k, which is based on the Chinese remainder theorem, by a power of 2, and approximates the numerator of the formula by truncating all bits other than the effective bit length.
[0054]
(18) The remainder calculation apparatus according to the present invention is the apparatus described in (14) above, wherein the predetermined arithmetic algorithm is a Montgomery multiplication algorithm that, for input integers x, y, and N, outputs xyB^-1 mod N or xyB^-1 mod N + N.
[0055]
(19) The remainder calculation apparatus according to the present invention is the apparatus described in (18) above, further comprising means for performing modular exponentiation according to a predetermined algorithm using the Montgomery multiplication.
[0056]
(20) The remainder calculation apparatus according to the present invention is the apparatus described in (14) above, further comprising conversion means for converting a residue number system representation into a radix representation according to a predetermined formula, based on the Chinese remainder theorem, that includes an unknown parameter, and outputting the result.
[0057]
(21) A remainder calculation method according to the present invention performs base conversion or base extension from the base used in a predetermined arithmetic algorithm in a residue number system to another base, and comprises: outputting the unknown parameter k of the base conversion or base extension by approximating it with the carry generated by cumulative addition onto the previous calculation result of the unknown parameter k; switching whether or not a specific term in the base conversion or base extension is calculated, according to the output unknown parameter k; and calculating the base conversion or base extension for each base element by a combination of multiplication, addition, and remainder calculation, including the calculation of the specific term.
[0058]
(22) The remainder calculation method according to the present invention is the method described in (21) above, wherein the denominator of the formula for the unknown parameter k, which is based on the Chinese remainder theorem, is approximated by a power of 2.
[0059]
(23) The remainder calculation method according to the present invention is the method described in (21) above, wherein the numerator of the formula for the unknown parameter k, which is based on the Chinese remainder theorem, is approximated by truncating all bits other than the effective bit length.
[0060]
(24) The remainder calculation method according to the present invention is the method described in (21) above, wherein the denominator of the formula for the unknown parameter k, which is based on the Chinese remainder theorem, is approximated by a power of 2, and the numerator of the formula is approximated by truncating all bits other than the effective bit length.
[0061]
(25) The remainder calculation method according to the present invention is the method described in (21) above, wherein the predetermined arithmetic algorithm is a Montgomery multiplication algorithm that, for input integers x, y, and N, outputs xyB^-1 mod N or xyB^-1 mod N + N.
[0062]
[Detailed Description of the Invention]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0063]
(First embodiment)
First, the operations of RSA cryptography, the application to which the present invention is best suited, are described.
[0064]
Encryption and decryption in RSA are realized by the modular exponentiation expressed by the following equation.
[0065]
$$C = m^{e} \bmod N \qquad (1)$$
Here, m and N are numbers several hundred decimal digits long, so the amount of computation is very large. Various methods have therefore been devised to calculate this efficiently.
A well-known way of implementing the RSA operation is to use repeatedly the modular multiplication proposed by Montgomery (hereinafter referred to as Montgomery multiplication). As introduced in the prior art section, performing Montgomery multiplication in the residue number system is taken up as one specific application of the present invention. The ordinary Montgomery multiplication procedure, which does not use the residue number system, is described first.
[0066]
Montgomery multiplication is an algorithm that, for input integers x, y, and N, outputs xyB^-1 mod N or xyB^-1 mod N + N. It consists of the following five steps.
[0067]
(1) s ← x · y
(2) t ← {s · (-N)^-1} mod B
(3) u ← t · N
(4) v ← s + u
(5) w ← v / B
Here, s, t, u, v, and w represent intermediate variables, and B is an arbitrary integer larger than N and relatively prime to N.
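The five steps can be traced with ordinary Python integers. The following is a minimal sketch, not the circuit of the invention; the function name `montgomery_multiply` and the sample values are illustrative assumptions, and x, y < N < B with gcd(N, B) = 1 is assumed.

```python
def montgomery_multiply(x: int, y: int, N: int, B: int) -> int:
    """Return w with w = x*y*B^-1 (mod N) and 0 <= w < 2N, following steps (1)-(5).

    Assumes 0 <= x, y < N < B and gcd(N, B) == 1 (Python 3.8+ for pow(..., -1, ...)).
    """
    neg_n_inv = pow((-N) % B, -1, B)  # constant (-N)^-1 mod B, precomputable
    s = x * y                         # step (1)
    t = (s * neg_n_inv) % B           # step (2)
    u = t * N                         # step (3)
    v = s + u                         # step (4): v is now a multiple of B
    assert v % B == 0
    return v // B                     # step (5): exact division by B


if __name__ == "__main__":
    N, B = 97, 128
    w = montgomery_multiply(45, 76, N, B)
    # w is either (x*y*B^-1 mod N) or that value plus N
    print(w, w % N == (45 * 76 * pow(B, -1, N)) % N)
```

Because s < N² and t < B, the result satisfies w < 2N, which is why the output may exceed the fully reduced value by exactly N.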
[0068]
Posch et al. were the first to propose realizing this in the residue number system, using the following seven steps.
[0069]
(1) s_A ← x_A · y_A, s_B ← x_B · y_B
(2) t_B ← {s_B · (-N_B)^-1} mod B
(3) Obtain t_A from t_B by base conversion.
(4) u_A ← t_A · N_A
(5) v_A ← s_A + u_A
(6) w_A ← v_A · B_A^-1
(7) Obtain w_B from w_A by base conversion.
Here, a symbol with the subscript A or B denotes a number represented in base A = {a₁, a₂, …, a_n} or base B = {b₁, b₂, …, b_n}. For example, x_A is the set of n residues {x₁, x₂, …, x_n} obtained by dividing x, an element of the residue ring modulo the product of base elements A = a₁a₂…a_n, by each element of base A. For the above procedure to compute correctly, at least N < A and N < B are required. Under this condition x and y can each be represented uniquely in base A alone or in base B alone, so representing x by the pair x_A, x_B is redundant. However, the product s of x and y ranges over 0 ≦ s < N² and is represented correctly only when A * B is taken as the base. Hence, by representing x and y with A * B as the base, their product can be calculated correctly in the residue number system. The sizes n and m of base A and base B generally differ, but in the special case n = m there is the advantage that the arithmetic units that process base A can be shared with those that process base B.
[0070]
Apart from steps (3) and (7), the correspondence between Montgomery multiplication in the residue number system and the five steps of ordinary Montgomery multiplication should be clear. Steps (1) to (2) and (4) to (6) are easily realized by multiplication or addition in the residue number system. For example, s_A in step (1) is obtained by multiplying each element of x by the corresponding element of y, both represented in base A, modulo the corresponding base element. For the base conversions of steps (3) and (7), on the other hand, several approaches have been studied so far. How efficiently the base conversion is performed is the key to implementing the above algorithm efficiently.
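The seven steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the base conversions of steps (3) and (7) are done here by exact Chinese-remainder reconstruction as a stand-in, not by any of the conversion methods discussed in this document. All function names and the sample bases are hypothetical, and the example additionally assumes 2N < A so that w is uniquely represented in base A.

```python
from math import prod


def to_rns(x, base):
    """Residue number system representation of x in the given base."""
    return [x % m for m in base]


def crt(residues, base):
    """Exact reconstruction of the integer in [0, prod(base))."""
    M = prod(base)
    total = 0
    for r, m in zip(residues, base):
        Mi = M // m
        total += Mi * ((r * pow(Mi, -1, m)) % m)
    return total % M


def base_convert_exact(residues, src, dst):
    """Stand-in base conversion: reconstruct the integer, then reduce."""
    return to_rns(crt(residues, src), dst)


def rns_montgomery(xA, xB, yA, yB, N, base_a, base_b):
    """Steps (1)-(7); returns (w_A, w_B) with w = x*y*B^-1 (mod N), w < 2N."""
    B = prod(base_b)
    sA = [(p * q) % m for p, q, m in zip(xA, yA, base_a)]            # (1)
    sB = [(p * q) % m for p, q, m in zip(xB, yB, base_b)]            # (1)
    tB = [(s * pow(-N % m, -1, m)) % m for s, m in zip(sB, base_b)]  # (2)
    tA = base_convert_exact(tB, base_b, base_a)                      # (3)
    uA = [(t * (N % m)) % m for t, m in zip(tA, base_a)]             # (4)
    vA = [(s + u) % m for s, u, m in zip(sA, uA, base_a)]            # (5)
    wA = [(v * pow(B % m, -1, m)) % m for v, m in zip(vA, base_a)]   # (6)
    wB = base_convert_exact(wA, base_a, base_b)                      # (7)
    return wA, wB


if __name__ == "__main__":
    base_a, base_b, N = [13, 17, 19], [23, 29, 31], 1009
    x, y = 123, 456
    wA, wB = rns_montgomery(to_rns(x, base_a), to_rns(x, base_b),
                            to_rns(y, base_a), to_rns(y, base_b),
                            N, base_a, base_b)
    w = crt(wA, base_a)
    print(w % N == (x * y * pow(prod(base_b), -1, N)) % N)
```

Every step except (3) and (7) acts on each residue independently, which is what allows one arithmetic unit per base element to work in parallel.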
[0071]
First, consider how to express a given x exactly as a non-negative value less than the product A (= a₁a₂…a_n) of the base elements. Let x be an integer satisfying 0 ≦ x < A, and let its residue number system representation be {x₁, x₂, …, x_n}. The following equation holds by the well-known Chinese remainder theorem.
[0072]
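[Expression 1]
$$x = \left( \sum_{i=1}^{n} \left( x_i A_i^{-1} \bmod a_i \right) A_i \right) \bmod A \qquad (2)$$
[0073]
Here A_i is A / a_i, and A_i^-1 is the multiplicative inverse of A_i modulo a_i. Then there exists exactly one integer k satisfying
[Expression 2]
$$x = \sum_{i=1}^{n} \left( x_i A_i^{-1} \bmod a_i \right) A_i - kA \qquad (3)$$
[0074]
Here k is the only unknown parameter, so consider expressing k in terms of known parameters. k is the parameter that brings the value of the first term into the range of integers from 0 to less than A; hereinafter k is called the correction term.
[0075]
Dividing both sides of equation (3) by A gives
[Equation 3]
$$\frac{x}{A} = \sum_{i=1}^{n} \frac{x_i A_i^{-1} \bmod a_i}{a_i} - k \qquad (4)$$
Therefore,
[Expression 4]
$$k + \frac{x}{A} = \sum_{i=1}^{n} \frac{x_i A_i^{-1} \bmod a_i}{a_i} \qquad (5)$$
Since 0 ≦ x / A < 1,
[Equation 5]
$$k \le \sum_{i=1}^{n} \frac{x_i A_i^{-1} \bmod a_i}{a_i} < k + 1 \qquad (6)$$
holds. Writing the operation that truncates the fractional part as [ ], the following relation is derived from equation (6).
[0076]
[Formula 6]
$$k = \left[ \sum_{i=1}^{n} \frac{x_i A_i^{-1} \bmod a_i}{a_i} \right] \qquad (7)$$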
[0077]
This is similar to the expression by Posch et al., But the correction term k 'by their method can be written as follows.
[0078]
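[Expression 7]
$$k' = \left[ \sum_{i=1}^{n} \frac{x_i \left( A_i^{-1} \bmod a_i \right)}{a_i} \right] \qquad (8)$$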
[0079]
Compared with equation (8) of Posch et al., equation (7) of the present invention differs in that x_i is brought inside the parentheses and multiplied by A_i^-1 modulo a_i. Hereinafter this product is denoted ξ_i.
[0080]
$$\xi_i = x_i A_i^{-1} \bmod a_i \qquad (9)$$
The correction term k given by equation (7) takes a value from 0 to less than n, whereas the correction term k′ given by equation (8) of Posch et al. is on the order of Σ_{i=1}^{n} a_i. For the correction term k′ of Posch et al.,
[Equation 8]

holds, and in many cases its value greatly exceeds n. Here Min and Max are functions that return the minimum value and the maximum value, respectively.
[0081]
The correction term k calculated according to equation (7) is thus smaller in value than the one in the method of Posch et al. The method of calculating the correction term k according to the present invention therefore takes relation (7) as its starting point.
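The difference in size between the two correction terms can be checked numerically. The sketch below computes k from ξ_i as in equations (7) and (9), and a Posch-style k′ in which x_i·A_i^-1 is not reduced modulo a_i, which is how equation (8) is read here from the surrounding text (an assumption). The function name and the small sample base are illustrative only.

```python
from fractions import Fraction
from math import floor, prod


def correction_terms(x, base):
    """Return (k, k_posch, xi) for an integer 0 <= x < prod(base).

    k       : floor(sum(xi_i / a_i)), with xi_i = x_i * A_i^-1 mod a_i
    k_posch : floor(sum(x_i * (A_i^-1 mod a_i) / a_i)), no reduction of the product
    """
    A = prod(base)
    k_sum, kp_sum, xi = Fraction(0), Fraction(0), []
    for a in base:
        inv = pow(A // a, -1, a)      # A_i^-1 mod a_i
        x_i = x % a
        xi_i = (x_i * inv) % a        # equation (9)
        xi.append(xi_i)
        k_sum += Fraction(xi_i, a)
        kp_sum += Fraction(x_i * inv, a)
    return floor(k_sum), floor(kp_sum), xi


if __name__ == "__main__":
    base = [7, 11, 13]
    A = prod(base)
    x = 1000
    k, k_posch, xi = correction_terms(x, base)
    # Equation (3): x equals sum(xi_i * A_i) - k * A
    print(k, k_posch, sum(v * (A // a) for v, a in zip(xi, base)) - k * A == x)
```

Since each ξ_i / a_i is less than 1, k is always less than n, so it can be delivered as n single-bit carries.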
[0082]
Here, the configuration of a remainder operation circuit that realizes Montgomery multiplication according to the present invention will be described with reference to the drawings.
FIG. 1 shows the main part of a remainder calculation apparatus that implements Montgomery multiplication. The product-sum circuit 101 with a remainder calculation function, the RAM 121, and the ROM 131 constitute one unit, and n units of the same configuration are arranged in parallel. Each unit corresponds to one of the n elements of base A and one of the n elements of base B. For example, the product-sum circuit 101 performs the operations corresponding to the base elements a₁ and b₁. Each of the n units performs r-bit operations, and the units are connected to one another by an r-bit bus. In addition to these n units, a bit selection unit 111 and a correction term calculation unit 110 are shown. The correction term calculation unit 110 calculates a value corresponding to the correction term k according to equation (7) above or a variant of it. The bit selection unit 111 extracts the required number q of upper bits from the r-bit bus. Depending on the implementation, however, all r bits may be supplied to the correction term calculation unit 110 as they are.
[0083]
FIG. 2 shows the configuration of one of the product-sum circuits 101 to 10n shown in FIG. 1. For convenience, the product-sum circuit 101 is described as representative. The inputs are r-bit data denoted by the symbols a and b, the r-bit input from the ROM 131 entering from the right side of the figure, and the 1-bit output of the correction term calculation unit. In the figure, a denotes an input from the RAM 121 and b an input from the ROM 131. First, a and b are multiplied by the multiplier 201, and the result is supplied to the adder 202 in the next stage. In addition to the multiplication result, the adder 202 receives and adds the feedback value from the register 204 and the data from the register 205. The data from the register 205 is supplied to the adder 202 unchanged when the switch 207 is closed and is replaced with 0 when the switch 207 is open. The switch 207 is controlled by the 1-bit data from the correction term calculation unit 110: it is closed when the data is 1 and open when the data is 0. The result of the adder 202 is supplied to the remainder calculation unit 203 and reduced to the remainder modulo the value set in the register 206. The value of the register 206 is denoted m_i; in this unit it is the base element a₁ or b₁. The inputs a and b are supplied with n pairs of data, as many as the base size. Once all n pairs have been processed, the final result is held in the register 204 and is supplied to the RAM 121 over the r-bit bus.
[0084]
FIG. 3 shows an example of the configuration of the correction term calculation unit 110. The correction term calculation unit 110 has a configuration in which input q-bit data is cumulatively added by an adder 301. The q + 1 bits of the addition result are stored in the register 302, and the most significant bit of the register 302 is output as the sequential calculation result of the correction term. The q bits other than the most significant bit are supplied to the adder 301 again in the next processing step. Since n values equal to the base size are supplied as input, the correction term calculation unit 110 outputs the calculation result n times.
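The behavior described for FIG. 3 can be modeled in a few lines. This is a behavioral sketch only, assuming each input is an unsigned q-bit value; the function name `correction_term_bits` and the sample inputs are hypothetical.

```python
def correction_term_bits(inputs, q):
    """Behavioral model of the accumulator described for FIG. 3.

    Each step adds a q-bit input to the lower q bits kept from the previous
    step, stores the (q+1)-bit sum, and outputs its most significant bit.
    """
    mask = (1 << q) - 1
    reg = 0                          # models register 302 (q+1 bits)
    bits = []
    for value in inputs:
        assert 0 <= value <= mask    # inputs must fit in q bits
        reg = (reg & mask) + value   # adder 301: fed-back low bits + new input
        bits.append(reg >> q)        # most significant bit is the output
    return bits


if __name__ == "__main__":
    bits = correction_term_bits([9, 8, 15, 1], q=4)
    print(bits)  # one output bit per input
```

The emitted bits sum to the integer part of (sum of inputs) / 2^q, so the total correction is obtained without ever holding more than q + 1 bits.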
[0085]
FIG. 4 is a diagram illustrating a configuration of the bit selection unit 111. Here, of the input r bits, the upper q bits (q ≦ r) are output. Note that in the case of a configuration in which q = r, the bit selection unit may not be provided.
[0086]
FIG. 8 shows another configuration example of the correction term calculation unit 110. This configuration is characterized in that the input value is first divided by a division circuit 801. At first glance, such a division circuit 801 may seem disadvantageous compared with the configuration of FIG. 7, but efficient division methods are known for a divisor that is a power of 2 or very close to a power of 2, so the processing load of the division circuit 801 is not necessarily large.
[0087]
As the characteristic feature of the first embodiment of the remainder calculation apparatus according to the present invention, the procedure for calculating the correction term according to equation (7) is now described. This embodiment assumes the circuit configuration with q = r and uses the correction term calculation unit shown in FIG. 8. The configuration of FIG. 8 performs a division to obtain the correction term, but it has the advantage that the correct correction term k can be calculated for any x less than the product A of the base elements. In general, the precision and cost of division are a problem, but when special values such as 2^r, 2^r − 1, or 2^r + 1 are used as base elements, the correction term can be calculated easily by this method.
[0088]
Here, a flow until x expressed in the base A is converted into a base B expression will be described.
[0089]
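[Equation 9]
$$x \bmod b_j = \left\{ \sum_{i=1}^{n} \xi_i \left( A_i \bmod b_j \right) + \left[ \sum_{i=1}^{n} \frac{\xi_i}{a_i} \right] \left( b_j - A \bmod b_j \right) \right\} \bmod b_j \qquad (11)$$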
[0090]
In order to actually implement the calculation based on the above formula (11) in hardware, a procedure expressed by a recurrence formula such as the following formula is used.
[0091]
$$\sigma_i = (\sigma_{i-1} - k_{i-1}) + \xi_i / a_i \qquad (12)$$
$$k_i = [\sigma_i] \qquad (13)$$
$$c_i = \{ c_{i-1} + \xi_i (A_i \bmod b_j) + k_i (b_j - A \bmod b_j) \} \bmod b_j \qquad (14)$$
The procedure based on equations (12) to (14) is repeated in order for i = 1 to n, for every element b_j (j = 1, ..., m) of the destination base.
With the initial values σ₀ = k₀ = c₀ = 0, c_n is the result of the base conversion. Expressed as a recurrence in this way, the correction term k is calculated one bit at a time and, as equation (14) shows, is reflected in the intermediate conversion result each time.
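The recurrence (12) to (14) can be followed step by step in Python. The sketch below uses exact fractions for ξ_i / a_i, which corresponds to an ideal division rather than any particular hardware divider; the function name `base_convert` and the sample bases are assumptions made for illustration.

```python
from fractions import Fraction
from math import floor, prod


def base_convert(x_res, base_a, base_b):
    """Convert residues of x in base A to residues in base B via (12)-(14)."""
    A = prod(base_a)
    # xi_i = x_i * A_i^-1 mod a_i, equation (9)
    xi = [(x * pow(A // a, -1, a)) % a for x, a in zip(x_res, base_a)]
    out = []
    for b in base_b:
        sigma, k, c = Fraction(0), 0, 0           # sigma_0 = k_0 = c_0 = 0
        for xi_i, a in zip(xi, base_a):
            sigma = (sigma - k) + Fraction(xi_i, a)                 # (12)
            k = floor(sigma)                                        # (13): 0 or 1
            c = (c + xi_i * ((A // a) % b) + k * (b - A % b)) % b   # (14)
        out.append(c)                              # c_n = x mod b
    return out


if __name__ == "__main__":
    base_a, base_b = [7, 11, 13], [17, 19, 23]
    x = 500
    converted = base_convert([x % a for a in base_a], base_a, base_b)
    print(converted == [x % b for b in base_b])
```

In this sketch σ and k are recomputed for every b_j for clarity; they do not depend on b_j, so a single correction term unit can compute each k_i once and send it to all product-sum circuits. Passing a one-element `base_b` gives a base extension.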
[0092]
The hardware that sequentially calculates the correction term k according to equation (12) is the correction term calculation unit 110 shown in FIG. 8. ξ_i in equation (11) corresponds to the input x shown in FIG. 8, and a_i in equation (11) corresponds to the input y.
[0093]
The adder 802 adds the division result (x / y) output from the division circuit 801 and the previous value held in the register 803, and outputs the result to the register 803. As shown in the figure, when a carry occurs in the register 803, the carry bit (1 bit) is output from the correction term calculation unit 110 as a correction term k (reduction factor). This correction term k takes a value of 1 or 0.
[0094]
The hardware configuration that, based on k output from the correction term calculation unit 110, calculates the values after basis conversion in parallel according to equation (14) is the set of product-sum circuits 101 to 10n of FIG. 1. Each product-sum circuit, for example the product-sum circuit 101, is configured to support the following basic operation.
c_{i+1} = (c_i + a·b + k_i·d) mod m_i   (15)
Because k_i on the right side of equation (15) is 1 or 0, the third term on the right side is realized by the switch 207 alone. This means that the feedback from the correction term calculation unit 110 shown in FIG. 1 to the product-sum circuits 101 to 10n needs only a 1-bit connection. This circuit configuration is extremely simple compared with the conventional circuit of Posch et al. The structural advantage that only one bit of feedback is required also holds in the other embodiments described later.
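The basic operation (15) can be modeled in a few lines. This is a minimal sketch; the function name `product_sum_step` is hypothetical, and the `if k:` branch stands in for the switch 207, which either passes d to the adder or blocks it.

```python
def product_sum_step(c, a, b, k, d, m):
    """One step of operation (15): c_next = (c + a*b + k*d) mod m, with k a single bit."""
    if k not in (0, 1):
        raise ValueError("k must be 0 or 1")
    total = c + a * b      # multiplier output plus register feedback
    if k:                  # models the switch: add d only when k = 1
        total += d
    return total % m       # remainder calculator


if __name__ == "__main__":
    print(product_sum_step(5, 3, 4, 1, 6, 7))  # (5 + 12 + 6) mod 7 -> 2
    print(product_sum_step(5, 3, 4, 0, 6, 7))  # (5 + 12) mod 7 -> 3
```

No multiplication by k is needed, which is why a 1-bit control line is sufficient.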
[0095]
In the above procedure, x_i must be multiplied by A_i^{-1} to obtain ξ_i when calculating the correction term. However, if this basis conversion is used for Montgomery multiplication in the remainder calculation system, each element can be multiplied in advance by the constant (−N_B^{-1}) A_i^{-1}, so no extra work is needed to obtain ξ_i. Likewise, the preprocessing necessary for the conversion in step (7) can be incorporated into the constant B_A^{-1} in step (6). This also applies to the other embodiments described later.
[0096]
Moreover, it is clear that the above procedure can be applied not only to base conversion but also to base extension. That is, if the conversion is performed only for specific base elements rather than for all m base elements {b_j}, the result is a base extension.
[0097]
According to the new basis transformation (expansion) according to the present invention applied to the remainder calculation apparatus of the first embodiment described above, the following operational effects can be obtained.
(a) The value of the correction term can be made relatively small and can be sequentially processed in units of 1 bit.
(b) Since the value after the base conversion is the same as the value expressed before the conversion, an error like the method of Posch et al. does not occur.
(c) Even if an error occurs, the error can be easily controlled by the processing before and after and the restriction of the input size.
(d) There are few restrictions on the key size in application to RSA encryption.
(e) No multiplication is required for the calculation of the correction term, and the processing efficiency is good.
(f) There are few restrictions on how to take bases, and versatility is high.
Therefore, according to the basis conversion (extension) as in the present embodiment, Montgomery multiplication can be speeded up with a simple configuration, and hence the processing speed of RSA cryptography can be speeded up.
[0098]
Moreover, the remainder calculation apparatus of this embodiment is also applicable to the procedure that converts a remainder calculation system representation into a radix representation. Details of this procedure will be described in the second embodiment.
(Second Embodiment)
In the second embodiment, an approximation is made by replacing the denominator of each term on the right side of the calculation formula for the correction term k in equation (11) with the power of 2 that is greater than or equal to the denominator and closest to it.
[0099]
That is, a_i is approximated by 2^{r_i}, where r_i satisfies
2^{r_i − 1} < a_i ≦ 2^{r_i}   (16)
In general, r_i may differ for each i. However, if all the base elements have the same bit length, there is the advantage that the product-sum circuits 101 to 10n in FIG. 1 can share a common configuration.
Using a suitable μ_i, a_i is expressed as follows.
[0100]
a_i = 2^{r_i} − μ_i   (17)
At this time, the following is used as an approximate value l of the correction term k calculated by equation (11).
l = ⌊ Σ_{i=1}^{n} ξ_i / 2^{r_i} ⌋   (18)
Like k, l can also be calculated sequentially with a recurrence formula, as follows.
[0101]
σ_i = (σ_{i-1} − l_{i-1}) + ξ_i / 2^{r_i}   (19)
l_i = ⌊σ_i⌋   (20)
Here, the initial values of l and σ are both 0. The correction term can be calculated according to equations (19) and (20). As in the first embodiment, this embodiment also assumes a circuit configuration in which q = r in FIG. 1. From this embodiment onward, the configuration of the correction term calculation unit shown in FIG. 3 is used.
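With a power-of-2 denominator, the recurrence of equations (19) and (20) reduces to an r-bit accumulator whose carry-out is the correction bit. The following sketch assumes a common bit length r for all base elements and q = r; the function name `carry_out_bits` is hypothetical.

```python
def carry_out_bits(xis, r):
    """Accumulate xi_i in an r-bit register and return the carry bit l_i of each step."""
    mask = (1 << r) - 1
    reg = 0                      # register contents, sigma_0 = 0
    bits = []
    for xi in xis:
        total = reg + xi         # adder output, at most r + 1 bits wide
        bits.append(total >> r)  # carry-out = l_i, equation (20)
        reg = total & mask       # the register keeps the fractional part
    return bits


if __name__ == "__main__":
    print(carry_out_bits([200, 100, 250, 30], 8))  # -> [0, 1, 1, 0]
```

The sum of the carry bits equals ⌊(Σ ξ_i) / 2^r⌋, so no division circuit is needed once the denominators are powers of 2.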
[0102]
The correction term calculation according to the equations (19) and (20) can be used for basis conversion and basis extension as in the first embodiment. However, in the present embodiment, this is applied to a procedure for converting the remainder calculation system representation into the radix representation. The following equation (21) shows a procedure for converting the remainder calculation system representation into the radix representation.
[0103]
c_i = c_{i-1} + ξ_i A_i − l_i A   (21)
Equation (21) resembles equation (14), but there is a difference. In equation (14), the variable c_i only needs enough precision to represent the largest base element, whereas in equation (21) the variable c_i is assumed to be a multiple-precision variable able to hold a value about as large as the product A of the base elements. When actually designing hardware, equation (21) would not be implemented as it stands; some device is needed, such as dividing the calculation into repeated single-precision operations. Nevertheless, equation (21) is sufficient to explain the principle of conversion to the radix representation, and decomposing it into single-precision operations is easy.
[0104]
When k is approximated according to Expression (18), an error may occur in the conversion result of Expression (21). Here, some explanation will be given about the error. First, ε represented by the following equation is introduced as a measure of approximation error.
ε = max_i (μ_i / 2^{r_i})   (22)
Using this ε, when the input x satisfies
nεA ≦ x < A   (23)
equation (18) gives the same value as the correct correction term k. When the input x satisfies
0 ≦ x < nεA   (24)
equation (18) gives either the correct value k or k − 1.
According to condition (23), ε must be chosen so that nε < 1, and as small as necessary. It is also known that when μ_i is chosen sufficiently small, the mod a_i computation performed in the remainder calculation unit 203 of the product-sum circuit is easy.
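The conversion of equation (21) and the error behaviour of conditions (23) and (24) can be checked with a small software model. This is a sketch under stated assumptions: the function name `rns_to_radix` is hypothetical, all base elements share one bit length r, ξ_i is taken to be x_i·A_i^{-1} mod a_i, and Python integers stand in for the multiple-precision variable c.

```python
from math import prod


def rns_to_radix(residues, base, r):
    """Equation (21) with the approximate correction bits l_i of equations (19)-(20)."""
    A = prod(base)
    mask = (1 << r) - 1
    reg = 0                                 # r-bit accumulator for sigma
    c = 0                                   # multiple-precision result
    for x_i, a_i in zip(residues, base):
        A_i = A // a_i
        xi = x_i * pow(A_i, -1, a_i) % a_i  # assumed definition of xi_i
        total = reg + xi
        l_i = total >> r                    # carry-out = correction bit
        reg = total & mask
        c += xi * A_i - l_i * A             # equation (21)
    return c


if __name__ == "__main__":
    base = [251, 253, 255]                  # each a_i = 2**8 - mu_i, mu_i = 5, 3, 1
    x = 12_345_678
    print(rns_to_radix([x % a for a in base], base, 8) == x)
```

In this example n = 3 and ε = 5/256, so inputs with x ≥ (15/256)·A are recovered exactly, while smaller inputs may come back as x + A, matching the statement that l equals k or k − 1.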
[0105]
According to the second embodiment described above, a procedure for converting a remainder calculation system expression into a radix expression can be realized by a remainder calculation apparatus that calculates the correction term l at a high speed with a simple configuration as in the first embodiment.
In the second embodiment, the denominator of Expression (12) is approximated by a power of 2. However, not only the denominator but also the numerator may be approximated in Expression (12). Specifically, as described in the following third embodiment, the effective bit length of the numerator may be shortened within an allowable error range.
[0106]
(Third embodiment)
In the third embodiment, an approximation is made by shortening the effective bit length of the numerator of Expression (12) within an allowable error range. This approximation corresponds to setting q < r in FIG. 1 and cumulatively adding only the upper q bits of each r-bit value in the correction term calculation unit.
[0107]
In this case, the correction term is set as m, and m is obtained by the following equation (25), for example.
[0108]
m = ⌊ Σ_{i=1}^{n} trunc(ξ_i) / 2^{r_i} ⌋   (25)

[0109]
Here, trunc() is a function that leaves the upper q bits of its argument unchanged and sets the remaining lower bits to 0. In principle, the number of bits q taken out may differ for each term, but the hardware configuration is usually simpler if q is common to all terms.
[0110]
The recurrence formula for calculating m sequentially is as follows.
σ_i = (σ_{i-1} − m_{i-1}) + trunc(ξ_i) / 2^{r_i}   (26)
m_i = ⌊σ_i⌋   (27)
Here, the initial values of σ and m are both 0.
In this embodiment, an approximation error arises in the numerator as well as in the denominator. The combined influence of the two approximation errors is described next. First, as a measure of the approximation error of the numerator, δ_i is defined as
δ_i = {ξ_i − trunc(ξ_i)} / a_i   (28)
and, further, δ is defined as
δ = max_i (δ_i)   (29)
[0111]
When this δ is introduced, the following conditions similar to those in the second embodiment are given.
When the input x satisfies
n(ε + δ)A ≦ x < A   (30)
equation (25) gives the same value as the correct correction term k. When the input x satisfies
0 ≦ x < n(ε + δ)A   (31)
equation (25) gives either the correct value k or k − 1.
According to the third embodiment, not only the denominator of Expression (12) but also the numerator is approximated, by shortening its effective bit length within the allowable error range, so the calculation of the correction term is further simplified and can be made faster.
The correction terms l and m given by the second and third embodiments are correct when the input x is at or above a certain value, but may be incorrect when x is below that value.
In some cases, however, the opposite behaviour is preferable: the correction term may contain an error only when x is at or above a certain value, and a correct correction term is given for every x below that value, however small. For example, in the conversion from the base A to the base B in step (7) of the Montgomery multiplication described above, it is desirable that the base conversion is always performed correctly simply by keeping the modulus N below a certain value.
[0112]
(Fourth embodiment)
Next, as a fourth embodiment, a correction term calculation method is described that gives a correct correction term for every x not exceeding a certain value, however small x is.
[0113]
The basic principle of calculating the correction term follows equation (11), with the denominator approximated by a power of 2 and only the upper q bits of the numerator used. In this method, parameters α and β are introduced. α is a parameter that limits the size of the input x as in the following inequality.
0 ≦ x < (1 − α)A   (32)
The correction term m′ in this embodiment is calculated according to the following equation.
[0114]
m′ = ⌊ Σ_{i=1}^{n} trunc(ξ_i) / 2^{r_i} + β ⌋   (33)

[0115]
In this embodiment, with q < r in FIG. 1, the upper q bits of each r-bit value are input to the correction term calculation unit 110 (FIG. 3), and the cumulative addition is performed with the initial value of the internal register 302 set to β. The recurrence formula corresponding to formula (33) is as follows.
σ_0 = β   (34)
m′_0 = 0   (35)
σ_i = (σ_{i-1} − m′_{i-1}) + trunc(ξ_i) / 2^{r_i}   (36)
m′_i = ⌊σ_i⌋   (37)
At this time, if n(ε + δ) ≦ β ≦ α < 1, any x satisfying 0 ≦ x ≦ (1 − α)A is correctly converted.
[0116]
For example, when α = β = 1/2, a correct correction term can always be calculated for any x not exceeding A/2. To realize β = 1/2 with the remainder calculation apparatus of FIG. 1, it suffices to set the second highest bit of the register 302 shown in FIG. 3 to 1. When β is chosen to be the reciprocal of a power of 2 in this way, the initial value of the register is set simply by setting the one corresponding bit to 1. In general, any value can be set as the offset β as long as it is at least the error n(ε + δ) and at most α.
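The recurrences of the third embodiment (equations (26) and (27)) and of the fourth embodiment (equations (34) to (37)) differ only in the initial register value, so one sketch can cover both. The names below are hypothetical, a common bit length r is assumed, and the register is modeled as q fractional bits with the carry kept separately, so β = 1/2 corresponds to `beta_num = 2**(q - 1)`.

```python
from fractions import Fraction
from math import floor, prod


def xi_values(x, base):
    """Assumed definition: xi_i = x_i * A_i^{-1} mod a_i for every base element a_i."""
    A = prod(base)
    return [(x % a) * pow(A // a, -1, a) % a for a in base]


def exact_k(xis, base):
    """Exact correction term k = floor(sum(xi_i / a_i)), used here only as a reference."""
    return floor(sum(Fraction(xi, a) for xi, a in zip(xis, base)))


def approx_correction(xis, r, q, beta_num=0):
    """Sum of carry bits from a q-bit accumulator started at beta = beta_num / 2**q."""
    mask = (1 << q) - 1
    reg = beta_num                     # sigma_0 = beta, equation (34)
    m = 0
    for xi in xis:
        total = reg + (xi >> (r - q))  # upper q bits of xi_i, i.e. trunc(xi_i) / 2**r
        m += total >> q                # carry-out bit m'_i, equation (37)
        reg = total & mask
    return m


if __name__ == "__main__":
    base = [65535, 65533, 65531]       # pairwise coprime, each 2**16 - mu_i
    x = prod(base) // 3                # below A/2, so alpha = beta = 1/2 applies
    xis = xi_values(x, base)
    print(approx_correction(xis, 16, 8, beta_num=128) == exact_k(xis, base))
```

With `beta_num=0` the function models the third embodiment, where the result may be k − 1 for small x; with `beta_num=128` and x below A/2 the offset absorbs the approximation error.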
[0117]
According to the fourth embodiment, the parameters α and β are introduced as described above and x is limited to a certain value or less. This makes it possible to calculate a correction term that is always correct for every x within that limit, however small.
[0118]
(Fifth embodiment)
The fifth embodiment relates to parameter sizes. In the case of RSA encryption, a modulus size of about 1024 bits must be selected, and both bases A and B need to be slightly larger than 1024 bits. If the elements of bases A and B are each about 32 bits, that is, r = 32, the base size is about n = 33, so that n × r is about 1024. When α = β = 1/2 in the fourth embodiment, n(ε + δ) ≦ 1/2 is required in order to suppress the occurrence of errors. Therefore ε + δ ≦ 1/(2n) = 1/66, and ε < 1/2⁸, δ < 1/2⁸ is a sufficient condition for this. This parameter size roughly corresponds to the precision of the adder 301 shown in FIG. 3, which means that an adder of about 8 bits suffices for the calculation of the correction term.
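The arithmetic behind the 8-bit figure can be reproduced as follows. This is a sketch assuming the sufficient condition is taken in the form ε < 2^(−t) and δ < 2^(−t), so that ε + δ < 2·2^(−t) must not exceed β/n; the function name `min_precision_bits` is hypothetical.

```python
from fractions import Fraction


def min_precision_bits(n, beta=Fraction(1, 2)):
    """Smallest t such that eps, delta < 2**-t guarantees n * (eps + delta) <= beta."""
    t = 1
    while 2 * Fraction(1, 2 ** t) > beta / n:
        t += 1
    return t


if __name__ == "__main__":
    print(min_precision_bits(33))  # -> 8
```

For n = 33 and β = 1/2 the bound is 1/66; 7 bits would only guarantee ε + δ < 1/64, which is not enough, while 8 bits guarantees ε + δ < 1/128.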
[0119]
(Sixth embodiment)
The sixth embodiment relates to an apparatus that performs a modular exponentiation operation based on Montgomery multiplication by basis conversion (extension) according to the present invention described so far.
FIG. 9 is a diagram illustrating the overall configuration of the modular exponentiation apparatus according to the present embodiment. Input data and output data are exchanged via the illustrated I/O unit 1000. Input data is first stored in a predetermined RAM 1201 via the I/O unit 1000. When external data is input in the remainder calculation system representation, it is stored in the corresponding RAMs 1201 to 120n. Although n RAMs are shown in the figure, the elements corresponding to the base elements a_i and b_i are written into each RAM. The modular exponentiation result for the input data is obtained by repeating the Montgomery multiplication described so far using the product-sum circuits 1101 to 110n and the correction term calculation unit 1100. The calculation result is stored in the corresponding RAMs 1201 to 120n and output to the outside via the I/O unit 1000.
[0120]
A procedure for performing a modular exponentiation by repeating Montgomery multiplication will be described with reference to the flowchart of FIG. 10. The flowchart represents a process for obtaining the remainder when the input value x, given in the remainder calculation system representation, is raised to the power e and divided by N. It is assumed that N is known; processing such as obtaining the remainder calculation system representation of N is treated as precomputed and is not shown in FIG. 10. Note that N may instead be an external input, in which case such processing is performed each time.
[0121]
MM shown in FIG. 10 is a function that denotes Montgomery multiplication in the remainder calculation system. The input x, given in the remainder calculation system representation, is first Montgomery-multiplied by the value (d_A, d_B) to convert it to x′, where d = B² mod N. Next, the converted value x′ (in the remainder calculation system representation) is copied to the intermediate result c.
The next step is loop processing, in which the loop variable i runs from k − 1 down to 1. Here, the externally input exponent e is expressed in binary with k bits, and each bit is denoted e_i. e_k is the most significant bit, which is 1 here. Further, k is 2 or more.
In the loop, first, a value corresponding to the square of the intermediate variable c is calculated using Montgomery multiplication. Subsequently, it is determined whether the bit e_i of e corresponding to the loop variable i is 1. If it is not 1, the process returns to the start of the loop. If it is 1, the process proceeds to the next step, in which the product of c and the converted value x′ is obtained by Montgomery multiplication. Subsequently, it is determined whether the loop variable i is 1. If it is not 1, the process returns to the start of the loop; if it is 1, the loop processing ends.
In the final step, the product of the calculation result c so far and the value 1 expressed in the remainder calculation system representation is obtained by Montgomery multiplication, giving the result y (in the remainder calculation system representation).
[0122]
Thus, y = x^e mod N is calculated.
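The control flow of FIG. 10 can be sketched with ordinary integers. This is only an illustration of the loop structure: the names `mont_mul` and `mod_exp` are hypothetical, and `mont_mul` is a reference stand-in that computes x·y·B^{-1} mod N directly rather than through the remainder calculation system and the basis conversions of the embodiment.

```python
def mont_mul(x, y, n, b_inv):
    """Reference stand-in for MM: returns x * y * B^{-1} mod N."""
    return x * y * b_inv % n


def mod_exp(x, e, n, b):
    """Left-to-right binary exponentiation built only from Montgomery multiplications."""
    b_inv = pow(b, -1, n)             # requires gcd(B, N) = 1
    d = b * b % n                     # d = B^2 mod N
    x_m = mont_mul(x, d, n, b_inv)    # x' = x * B mod N
    c = x_m                           # copy x' to the intermediate result c
    k = e.bit_length()                # bits e_k ... e_1, with e_k = 1
    for i in range(k - 1, 0, -1):     # i = k-1 down to 1
        c = mont_mul(c, c, n, b_inv)          # square
        if (e >> (i - 1)) & 1:                # test bit e_i
            c = mont_mul(c, x_m, n, b_inv)    # multiply by x'
    return mont_mul(c, 1, n, b_inv)   # multiply by 1 to leave the Montgomery form


if __name__ == "__main__":
    n = 1009 * 1013
    print(mod_exp(12345, 65537, n, 1 << 20) == pow(12345, 65537, n))
```

Every intermediate value carries one extra factor of B, which the final multiplication by 1 removes.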
[0123]
(Seventh embodiment)
The seventh embodiment relates to a remainder computing device having a ring configuration.
The n product-sum circuits shown in FIG. 1 are connected via an r-bit bus. With this bus connection, data output from a certain RAM can be transmitted to all n product-sum units and processed in parallel. It can be said that the configuration in which the product-sum circuits are connected by a bus is one of the effective configuration methods for realizing parallel processing.
On the other hand, as is well known in the field of network architecture, a ring connection is conceivable as a method of connecting a plurality of units in addition to such a bus connection. A bus-type architecture is characterized by a bus for broadcasting the same data to n units, whereas in a ring connection, a communication path connecting adjacent units connects n units. As a whole, it becomes a ring-shaped architecture.
[0124]
The remainder calculation apparatus according to the present invention can also be realized by ring connection. In the case of a serial ring connection, each unit only needs to send data to the adjacent unit, so the data drive capability required of each unit is smaller than in a bus-type configuration, in which data must be sent to a plurality of units. Also, all units can be controlled in exactly the same way. In the bus type, by contrast, when one unit broadcasts data the remaining (n − 1) units receive it, so not all units perform the same operation. Since all units operate identically, the ring type is easier to control.
[0125]
FIG. 11 is a block diagram showing a configuration of a ring-shaped remainder computing device. In the configuration of FIG. 11, the bus connection is changed to a ring connection, and the bit selection unit 111 and the correction term calculation unit 110 shown in FIG. 1 are added to each of the n product-sum circuits. The configuration shown in FIG. 1 has only one correction term calculation unit, and a 1-bit bus is all that is needed to broadcast the correction term it calculates to the n product-sum circuits. When a correction term calculation unit is provided in each of the n product-sum circuits as in this embodiment, the circuit scale increases slightly. However, the correction term calculation unit according to the present invention has a very simple configuration as shown in FIG. 3, so even when n correction term calculation units are provided, they account for a very small proportion of the entire circuit scale.
[0126]
In the ring connection configuration of FIG. 11, a correction term calculation unit is provided for each product-sum circuit, so a bus for transmitting the correction term to each product-sum circuit is unnecessary, and a connection unit is provided instead. The detailed structure of this connection unit is shown in FIG. 12. The connection unit includes a two-input selector 960 and an r-bit register 961 that latches the output of the selector 960. In a given product-sum circuit, the register 961 stores one of the operands used in the current operation cycle. In the next operation cycle, that operand is transferred to the adjacent connection unit on one side (for example, the left neighbor in the figure), and the next operand is received from the adjacent unit on the other side (in this case, the right neighbor). The n operands stored in the registers 961 of the n connection units are thus passed from unit to unit in bucket-brigade fashion, and all n operands reach every unit in exactly n cycles.
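The bucket-brigade circulation can be modeled as a list rotation. In this sketch the function name `ring_circulate` is hypothetical, unit u is assumed to receive from unit u + 1 (its right neighbor) in every cycle, and the list `seen[u]` records which operand sits in the register of unit u at each cycle.

```python
def ring_circulate(operands):
    """Rotate n operands around a ring of n units and record what each unit holds per cycle."""
    n = len(operands)
    regs = list(operands)             # one register per connection unit
    seen = [[] for _ in range(n)]
    for _ in range(n):                # exactly n cycles
        for u in range(n):
            seen[u].append(regs[u])   # operand used by unit u in this cycle
        regs = regs[1:] + regs[:1]    # unit u takes the value held by unit u + 1
    return seen


if __name__ == "__main__":
    for row in ring_circulate([10, 20, 30]):
        print(row)
```

Every unit performs the same shift in every cycle, which reflects the point that ring control is uniform across units.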
[0127]
[Scalability of the number of units]
So far, the number of product-sum circuits (the number of units) has been described as being equal to the base size n. In general, however, the number of arithmetic units and the base size do not need to match. If the number of units is denoted by m instead of n, then under the constraint m ≦ n, processing becomes faster as m increases. On the other hand, when hardware implementation in an LSI or the like is considered, the circuit scale and power consumption grow as m increases. There is thus a trade-off between the number of units m and the calculation speed. A typical way of choosing the number of units m is to use a divisor of n. For example, if n = 33, then m = 1, 3, 11, or 33 is a candidate for the number of units. It is of course possible to use an m that is not a divisor of n, but when m is a divisor of n there is the advantage that the control of the circuit becomes regular and the utilization of the arithmetic units is high. In any case, it is easy to see that not restricting m to n greatly expands the degree of freedom in LSI design and the like.
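The candidate unit counts can be enumerated directly. In this sketch both function names are hypothetical, and the cost model (each unit handling its share of the n base elements one after another, giving ⌈n/m⌉ passes) is an assumption made for illustration rather than something stated in the embodiment.

```python
def unit_count_candidates(n):
    """Divisors of the base size n, as candidate numbers of units m."""
    return [m for m in range(1, n + 1) if n % m == 0]


def passes_needed(n, m):
    """Assumed cost model: sequential passes when m units share n base elements."""
    return -(-n // m)  # ceiling division


if __name__ == "__main__":
    print(unit_count_candidates(33))  # -> [1, 3, 11, 33]
    print(passes_needed(33, 11))      # -> 3
```

When m divides n, every unit is busy in every pass; otherwise some units idle in the last pass, which is the utilization point made above.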
[0128]
[Effects of the invention]
As described above, according to the present invention, a new basis conversion (extension) is provided, and the following operational effects can be obtained.
(a) The value of the correction term can be made relatively small and can be sequentially processed in units of 1 bit.
(b) Since the value after the base conversion is the same as the value expressed before the conversion, an error like the method of Posch et al. does not occur.
(c) Even if an error occurs, the error can be easily controlled by the processing before and after and the restriction of the input size.
(d) There are few restrictions on the key size in application to RSA encryption.
(e) No multiplication is required for the calculation of the correction term, and the processing efficiency is good.
(f) There are few restrictions on how to take bases, and versatility is high.
Therefore, the Montgomery multiplication can be speeded up with a simple configuration, and the speeding up of the RSA encryption method can be realized.
[Brief description of the drawings]
FIG. 1 is a diagram showing a configuration of a remainder arithmetic apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing a configuration of a product-sum circuit shown in FIG. 1.
FIG. 3 is a diagram showing a configuration of the correction term calculation unit shown in FIG. 1.
FIG. 4 is a diagram showing a configuration of the bit selection unit shown in FIG. 1.
FIG. 5 is a diagram illustrating a configuration of a remainder calculation apparatus according to a conventional technique.
FIG. 6 is a diagram showing a configuration of a product-sum circuit shown in FIG. 5 according to the prior art.
FIG. 7 is a diagram showing a configuration of the correction term calculation unit shown in FIG. 5 according to the prior art.
FIG. 8 is a diagram showing another configuration of the correction term calculation unit according to the embodiment of the present invention.
FIG. 9 is a diagram showing a configuration of a modular exponentiation apparatus according to an embodiment of the present invention.
FIG. 10 is a process flowchart of modular exponentiation according to the embodiment of the present invention.
FIG. 11 is a diagram showing another configuration of the remainder calculation apparatus according to the embodiment of the present invention.
FIG. 12 is a diagram showing a configuration of a connection unit of the remainder calculation apparatus shown in FIG. 11.
[Explanation of symbols]
101 to 10n ... Product-sum circuit
110 ... Correction term calculation unit
111 ... Bit selection unit
121 to 12n ... RAM (random access memory)
131 to 13n ... ROM (read only memory)
201 ... Multiplier
202 ... Adder
203 ... Remainder calculator
204 to 206 ... Registers
207 ... Switch
301 ... Adder
302 ... Register
501 to 50n ... Product-sum circuit
510 ... Correction term calculation unit
521 to 52n ... Random access memory
530 to 53n ... Read only memory
601 ... Multiplier
602 ... Adder
603 ... Remainder calculator
604, 605 ... Registers
701 ... Multiplier
702 ... Adder
703 ... Register
801 ... Division circuit
802 ... Adder circuit
803 ... Register

Claims

1. In a residue calculation device comprising a plurality of product-sum circuits having a residue calculation function and a correction term calculation unit for calculating a correction term used for residue calculation in the product-sum circuit,
The correction term calculation unit outputs 1-bit data by sequentially calculating the correction term bit by bit,
Each of the product-sum circuits is
A multiplier that multiplies the first input value and the second input value to obtain a multiplication result;
An adder for adding a register feedback value from a first register to the multiplication result;
A switch for controlling whether or not to input the value of the second register holding the second input value to the adder by the 1-bit data;
A remainder calculator that calculates a remainder obtained by dividing the addition result by the adder by the second input value and outputs the remainder to the first register;
A residue arithmetic apparatus that performs base conversion or base extension by sequentially reflecting the correction terms calculated by the correction term calculation unit.

2. The residue arithmetic apparatus according to claim 1, wherein the product-sum circuit performs Montgomery multiplication.

3. In a residue arithmetic processing device comprising a plurality of product-sum circuits arranged in parallel and a correction term calculation unit for calculating a correction term used for a residue operation in the product-sum circuit,
The correction term calculation unit outputs 1-bit data by sequentially calculating the correction term bit by bit,
Each of the product-sum circuits is
A multiplier that multiplies the first input value and the second input value to obtain a multiplication result;
An adder for adding a register feedback value from a first register to the multiplication result;
A switch for controlling whether or not to input the value of the second register holding the second input value to the adder by the 1-bit data;
A remainder calculator that calculates a remainder obtained by dividing the addition result by the adder by the second input value and outputs the remainder to the first register;
A residue calculation device that converts a residue calculation system representation into a radix representation by sequentially reflecting the correction terms calculated by the correction term calculation unit.

4. The residue calculation device according to claim 1, wherein the correction term calculation unit has a division circuit, and a basis of the residue calculation system handled by the product-sum circuits is set to be a power of 2 or close to a power of 2.

5. The remainder calculation apparatus according to any one of claims 1 to 4, further comprising a bit selection unit that selects an input bit to the correction term calculation unit.

6. The remainder calculation apparatus according to claim 1, further comprising an I / O unit for inputting / outputting data to / from the outside.

7. In a residue arithmetic unit that performs base conversion or base extension on one basis to another basis in a predetermined operation algorithm in a residue operation system,
k output means for approximating the unknown parameter k of the basis conversion or basis extension to a carry generated by the cumulative addition of the previous calculation result of the unknown parameter k;
A plurality of arithmetic units for performing the calculation of the basis transformation or the basis extension for each basis element,
A multiplier that multiplies the first input value and the second input value to obtain a multiplication result;
An adder for adding a register feedback value from a first register to the multiplication result;
Whether or not to input the value of the second register holding the second input value to the adder is controlled by 1-bit data corresponding to the unknown parameter k, and according to the unknown parameter k Switching means for switching whether to calculate a specific term in the basis conversion or basis extension;
An arithmetic unit comprising: a remainder calculator that calculates a remainder obtained by dividing the addition result by the adder by the second input value and outputs the remainder to the first register ;
A remainder calculation device comprising:

8. The remainder calculation apparatus according to claim 7, wherein the k output means approximates the denominator of the calculation formula of the unknown parameter k based on the Chinese remainder theorem by a power of 2.

9. The remainder calculation apparatus according to claim 7, further comprising a bit selection unit, wherein the k output means approximates the numerator of the calculation formula of the unknown parameter k based on the Chinese remainder theorem by having the bit selection unit truncate the bits outside the effective bit length.

10. The remainder calculation apparatus according to claim 7, wherein the k output means approximates the denominator of the calculation formula of the unknown parameter k based on the Chinese remainder theorem by a power of 2, and approximates the numerator of the calculation formula by truncating the bits outside the effective bit length.

11. The remainder calculation apparatus according to claim 7, wherein the predetermined arithmetic algorithm comprises a Montgomery multiplication algorithm that, for input integers x, y, and N, outputs xyB^{-1} mod N or xyB^{-1} mod N + N.

12. The remainder calculation apparatus according to claim 11, further comprising means for performing a modular exponentiation according to a predetermined algorithm using the Montgomery multiplication.

13. The remainder calculation apparatus according to claim 7, further comprising conversion means for converting a remainder calculation system representation into a radix representation and outputting the result according to a predetermined calculation formula including an unknown parameter based on the Chinese remainder theorem.

14. In a residue arithmetic unit that performs base conversion or base extension on one basis to another basis in a predetermined operation algorithm in a residue operation system,
A multiplier that multiplies the first input value and the second input value to obtain a multiplication result;
An adder for adding a register feedback value from a first register to the multiplication result;
A switch for controlling whether or not to input the value of the second register holding the second input value to the adder by 1-bit data;
A remainder calculator that calculates a remainder obtained by dividing the addition result by the adder by the second input value and outputs the remainder to the first register, and calculates the basis conversion or basis extension for each base element Multiple computing units to perform,
k output means, provided in each of the plurality of arithmetic units, for approximating an unknown parameter k of the base conversion or base extension to a carry generated by cumulative addition of the previous calculation result of the unknown parameter k, and for outputting the unknown parameter k as the 1-bit data;
A remainder arithmetic apparatus comprising: a means for transmitting an operand of the arithmetic unit to an adjacent arithmetic unit, and connecting means between arithmetic units for receiving operands from other adjacent arithmetic units.

15. The remainder calculation apparatus according to claim 14, wherein the k output means approximates the denominator of the calculation formula of the unknown parameter k based on the Chinese remainder theorem by a power of 2.

16. The remainder calculation apparatus according to claim 14, wherein the k output means approximates the numerator of the calculation formula of the unknown parameter k based on the Chinese remainder theorem by truncating the bits outside the effective bit length.

17. The remainder calculation apparatus according to claim 14, wherein the k output means approximates the denominator of the calculation formula of the unknown parameter k based on the Chinese remainder theorem by a power of 2, and approximates the numerator of the calculation formula by truncating the bits outside the effective bit length.

18. The remainder calculation apparatus according to claim 14, wherein the predetermined arithmetic algorithm comprises a Montgomery multiplication algorithm that, for input integers x, y, and N, outputs xyB^{-1} mod N or xyB^{-1} mod N + N.

19. The remainder calculation apparatus according to claim 18, further comprising means for performing a modular exponentiation according to a predetermined algorithm using the Montgomery multiplication.

20. The remainder calculation apparatus according to claim 14, further comprising conversion means for converting a remainder calculation system representation into a radix representation according to a predetermined calculation formula including an unknown parameter based on the Chinese remainder theorem.