JP4293665B2

JP4293665B2 - Remainder multiplier

Info

Publication number: JP4293665B2
Application number: JP05480399A
Authority: JP
Inventors: 貴敏小野; なつめ松崎; 浩柏
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1998-03-02
Filing date: 1999-03-02
Publication date: 2009-07-08
Anticipated expiration: 2019-03-02
Also published as: JPH11316544A

Description

【０００１】
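【Technical Field of the Invention】
The present invention relates to a modular multiplication device that performs modular multiplication at high speed with a small circuit scale.
【０００２】
【Prior Art】
In recent years, as encryption technology in the communications field has advanced, demand has grown for the arithmetic devices used in encryption.
For example, elliptic curve cryptosystems use a modular multiplication device. Modular multiplication here means computing (a*b) mod p for three integers a, b, and p, where a is the multiplicand, b is the multiplier, and p is the modulus.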
【０００３】
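FIG. 17 shows the configuration of a conventional modular multiplication device.
To simplify the explanation, a, b, and p in the modular multiplication device 1 of the figure are small values that fit in 4 bits.
The modular multiplication device 1 comprises a product calculation unit 2, a remainder calculation unit 3, and a control unit 4.
The product calculation unit 2 comprises a multiplicand register 5 (4 bits), an adder 6 (4 bits), and a product register 7 (9 bits), and computes the product a*b of the multiplicand a and the multiplier b under the control of the control unit 4.
【０００４】
The control unit 4 drives the product calculation unit 2 as follows.
Step 1: As initialization, load the multiplicand a into the multiplicand register 5, load the multiplier b into the lower 4 bits of the product register 7 (9 bits), and load 0 into the upper 5 bits of the product register 7.
Step 2: Test whether the least significant bit of the product register 7 is 0 or 1. If it is 1, have the adder 6 add the multiplicand a held in the multiplicand register 5 to the 4 bits of the upper 5 bits of the product register 7 excluding the most significant bit, and store the sum in the upper bits of the product register 7.
【０００５】
Step 3: Shift the product register 7 right by 1 bit.
Step 4: Repeat steps 2 and 3 four times.
After the loop of steps 2 and 3 has run four times, the product register 7 holds the product a*b.
The remainder calculation unit 3 comprises a modulus register 8 (4 bits), an adder 9 (5 bits), and a remainder register 10 (13 bits). Under the control of the control unit 4 it computes the remainder (a*b) mod p of the product a*b in the same way as pencil-and-paper long division. The remainder produced by the remainder calculation unit 3 is a signed 5-bit value.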
【０００６】
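The control unit 4 drives the remainder calculation unit 3 as follows.
Step 5: As initialization, load the modulus p into the modulus register 8, load the product a*b held in the product register 7 into the lower 8 bits of the remainder register 10, and load 0 into its upper 5 bits.
Step 6: Shift the remainder register 10 left by 1 bit.
【０００７】
Step 7: Have the adder 9 subtract the modulus p held in the modulus register 8 from the value in the upper 5 bits of the remainder register 10, and store the result in the upper 5 bits of the remainder register 10.
Step 8: Determine the sign of the upper 5 bits of the remainder register 10 from the most significant bit. If the value is negative (most significant bit is 1), have the adder 9 add the modulus p back to the upper 5 bits, restoring the remainder register 10 to its value before the subtraction in step 7.
【０００８】
Step 9: Repeat steps 6 to 8 eight times.
After the loop of steps 6 to 8 has run eight times, the upper 5 bits of the remainder register 10 hold the remainder (a*b) mod p.
In summary, the product calculation unit 2 of the modular multiplication device 1 computes the partial product of the multiplicand a and each bit of the multiplier b, starting from the low-order side of b, and accumulates these partial products to obtain a*b. The remainder calculation unit 3 then shifts the product a*b so that the modulus p is aligned from the high-order digits of a*b down to the low-order digits, subtracting p at each position to obtain (a*b) mod p.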
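The conventional procedure of steps 1 to 9 can be sketched in Python as follows. This is a minimal model for illustration only: the function names and the use of unbounded Python integers in place of the fixed-width registers are assumptions, not part of the patent.

```python
def shift_add_multiply(a, b, bits=4):
    """Steps 1-4: b sits in the low bits of the product register; whenever the
    LSB is 1 the multiplicand is added into the upper part, then the register
    is shifted right by one bit."""
    reg = b                      # upper bits start at 0
    for _ in range(bits):
        if reg & 1:
            reg += a << bits     # add a into the upper part
        reg >>= 1
    return reg                   # now holds a*b


def restoring_mod(x, p, bits=4):
    """Steps 5-9: shift left, subtract p from the upper part, and keep the
    old value (restore) when the subtraction would go negative."""
    low_mask = (1 << (2 * bits)) - 1
    reg = x                      # product in the low 2*bits bits, upper part 0
    for _ in range(2 * bits):
        reg <<= 1
        upper = (reg >> (2 * bits)) - p
        if upper >= 0:           # non-negative: keep the subtracted value
            reg = (upper << (2 * bits)) | (reg & low_mask)
    return reg >> (2 * bits)     # remainder is left in the upper part


def conventional_modmul(a, b, p):
    return restoring_mod(shift_add_multiply(a, b), p)
```

For 4-bit operands this takes 4 loop passes for the product and 8 for the remainder, which is the cost the later paragraphs scale up to 160 and 320 passes.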
【０００９】
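【Problem to Be Solved by the Invention】
In the conventional modular multiplication device 1 described above, a, b, and p are small 4-bit values, so the adders 6 and 9 only need to perform additions of about 4 bits, and the loop counts are small: 4 in the product calculation unit 2 and 8 in the remainder calculation unit 3. In actual elliptic curve cryptography, however, a, b, and p are far larger, for example 160 bits. Adapting the conventional device 1 to 160-bit operands would require the adders 6 and 9 to perform 160-bit additions, which increases the circuit scale. The loops would also run 160 times in the product calculation unit 2 and 320 times in the remainder calculation unit 3, which increases the computation time.
【００１０】
Thus, if the conventional device 1 is to handle operands of arbitrary size, the adder width, and therefore the adder circuit scale, grows with the operand bit length, and the loop count, and therefore the computation time, grows as well.
Another prior-art technique that addresses the computation time of the device 1 is the modular multiplication method described in "United States Patent 5,144,574 Modular multiplication method and the system for processing data".
【００１１】
In that method, the multiplier b is matched against the multiplicand a two bits at a time from the high-order side, and a partial product of a and each 2-bit piece of b is computed. Each time a partial product is obtained, an integer multiple of the modulus p is subtracted from it to give a partial remainder modulo p, and the partial remainders are accumulated to obtain (a*b) mod p.
【００１２】
Whereas the device 1 computes partial products one bit of b at a time, a device using this method computes partial products of a and two bits of b at a time, so fewer loop passes are needed and the computation time is shorter.
The method thus shortens the computation time mainly by computing partial products of a with two bits of b at a time. However, the multiplier is consumed only two bits per partial product, so when a, b, and p are large the reduction in computation time is limited.
【００１３】
In view of the above problems, an object of the present invention is to provide a modular multiplication device that operates at high speed while keeping the circuit scale small.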
【００１４】
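【Means for Solving the Problem】
To solve the above problem, the modular multiplication device of the present invention computes, for the product of a multiplicand a and a multiplier b (b is kb-bit data), a value congruent to the remainder modulo p (p is k-bit data) as the accumulated value of the following expression:
Accumulated value = Σ C(i)*b[s*i+s-1:s*i]
(Here Σ denotes accumulation over i = 0 to [[kb/s]], where [[kb/s]] is the integer part of the quotient kb/s and i is an integer from 0 to [[k/s]]. C(i) is given by a recurrence: C(i) = a when i = 0, and C(i) ≡ (C(i-1)*2^s) mod p when i >= 1, where ≡ indicates that both sides are congruent modulo p.)
b[s*i+s-1:s*i] is the s-bit partial multiplier spanning the 2^(s*i+s-1) place down to the 2^(s*i) place of the k-bit multiplier b. The device comprises table means that stores in advance, for each value representable in m bits (m is an integer of at least s), the remainder modulo p of that value multiplied by 2^k, and intermediate-number calculation means that outputs the multiplicand a as the intermediate number C(0) in the first round (i = 0),
and in the second and later rounds (i > 0) carries the previously output intermediate number C(i-1) up by s bits, reads from the table means the remainder for the upper m bits above the lower k bits of the carried intermediate number, and adds the read remainder to the lower k bits to obtain a new intermediate number C(i). The partial products C(i)*b[s*i+s-1:s*i] of each intermediate number C(i) and its corresponding partial multiplier b[s*i+s-1:s*i] are accumulated in sequence to obtain the accumulated value.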
【００１５】
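The intermediate-number calculation means comprises: first holding means that holds the intermediate number; carry means that carries the intermediate number held in the first holding means up by s bits in correspondence with the partial multiplier b[s*i+s-1:s*i]; division means that divides the carried intermediate number into upper data, which is the m-bit portion above its lower k bits, and lower data, which is the lower k-bit portion; reading means that reads from said table means the remainder modulo p for the upper data; and addition means that adds the remainder read from the table means to the lower data to obtain a new intermediate number. Each time the addition means produces a new intermediate number, the first holding means updates its contents to that value; it outputs the multiplicand as the intermediate number in the first round and the updated intermediate number in the second and later rounds.
【００１６】
The modular multiplication device further comprises post-processing means that obtains the remainder modulo p of the accumulated value by using the division means, the reading means, and the addition means.
The table means has a memory element that receives an address corresponding to each value representable in m bits and stores in advance, in the storage area indicated by that address, the remainder modulo p of the corresponding m-bit value multiplied by 2^k.
【００１７】
The m bits may be divided into lower m1 bits and upper m2 bits (m = m1 + m2), each m-bit value corresponding to a combination of an m1-bit value and an m2-bit value. The table means then comprises first partial table means that stores in advance, for each m1-bit value, the remainder modulo p of that value multiplied by 2^k, and second partial table means that stores in advance, for each m2-bit value, the remainder modulo p of that value multiplied by 2^(k+m1). The addition means adds the remainders read from the first and second partial tables to the lower data to obtain a new intermediate number.
The m bits may also be divided from the low-order side into t (3 ≦ t ≦ m) partial bit groups m1, ..., mt, each m-bit value corresponding to a combination of the t values represented by the mi-bit groups (i is an integer from 1 to t). The table means then comprises t partial table means Ti, each storing in advance, for each mi-bit value, the remainder modulo p of that value multiplied by 2^(k+x) (where x = m1 + ... + m(i-1)). The addition means adds the t remainders read from the t partial table means Ti to the lower data to obtain a new intermediate number.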
【００１８】
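The modular multiplication device further comprises accumulation means and correction means. The accumulation means comprises a third register that holds 0 as its initial value, and an adder that adds the partial product C(i)*b[s*i+s-1:s*i] to the accumulated value held in the third register and outputs the sum to the third register as the new accumulated value. The correction means comprises correction-value holding means that holds a correction value equal to an integer multiple of p, and correction control means that causes the adder to subtract the held correction value when the accumulated value in the third register is at or above a predetermined value.
【００１９】
The third register has a sign bit. When the accumulated value in the third register is positive, the correction control means causes the adder to subtract the correction value at the same time as it adds the accumulated value and the partial product. The correction value is an integer multiple of p whose absolute value does not exceed the maximum partial product (t+1)(2^s - 1)p.
The modulus p satisfies p = 2^k − α, and each partial table means Ti stores the remainder as k3-bit data, where k3 is the bit length of t*2^m*α and α is a constant chosen so that k3 is smaller than k.
【００２０】
Each partial table means Ti has 2^mi entries corresponding to the values 0 to (2^mi - 1) representable in mi bits, and the j-th entry (j from 0 to 2^mi - 1) stores j*2^(m1+...+m(i-1))*α.
In another aspect, the modular multiplication device of the present invention computes a number congruent to the remainder modulo p (p is k-bit data) of the product of a multiplicand and a multiplier, and comprises: output means that outputs, in order from the low-order side, the s-bit partial multipliers obtained by dividing the multiplier into groups of s bits (s is an integer of at least 2); first calculation means that carries the multiplicand up according to the position of each partial multiplier and computes a number congruent to the remainder modulo p of the carried multiplicand (hereinafter, the intermediate number); second calculation means that computes, as a partial product, the product of the partial multiplier output by the output means and the corresponding intermediate number computed by the first calculation means; accumulation means that accumulates the partial products; correction means that adds or subtracts an integer multiple of p to or from the accumulated value so that it does not exceed a predetermined number of bits; and control means that repeats the intermediate-number calculation, the partial-product calculation, the accumulation, and the correction until the output means has output all partial multipliers. The first calculation means stores in advance, for each value representable in m bits (m is an integer of at least s), the remainder modulo p of that value multiplied by 2^k. In the first repetition it outputs the multiplicand as the intermediate number; in the second and later repetitions it carries the previously output intermediate number up by s bits, reads the table means for the m bits above the lower k bits of the carried value, and adds the value read to the lower k bits to obtain a new intermediate number.
【００２１】
The control means controls pipeline processing that includes first to third stages. In the first stage it causes the output means to output a partial multiplier and the first calculation means to output an intermediate number; in the second stage it causes the second calculation means to compute the partial product; in the third stage it causes the accumulation means to accumulate and the correction means to correct.
The output means comprises multiplier holding means that initially holds the multiplier and outputs the lower s bits of the held value as the partial multiplier, and shift means that shifts the value held in the multiplier holding means s bits toward the low-order side and writes the shifted value back to the multiplier holding means.
【００２２】
The second generation means comprises first through (s−1)-th shift means, the i-th shift means (i is an integer from 1 to s−1) carrying the intermediate number computed by the first calculation means up by an i-bit left shift. The first generation means comprises s-th shift means that carries the intermediate number up by an s-bit left shift, complement generation means that generates the one's complement of the intermediate number, and constant output means that outputs the constant 1. The second addition means comprises selection means and an adder. When all bits of the partial multiplier are determined to be 1, the selection means selects the output of the s-th shift means, the one's complement generated by the complement generation means, and the constant 1. Otherwise it selects the intermediate number if the 2⁰ bit of the partial multiplier is "1", and the carry result of the i-th shift means if the 2ⁱ bit is "1". The adder adds the selected values to obtain the partial product.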
【００２３】
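【Embodiments of the Invention】
＜First Embodiment＞
The modular multiplication device of this embodiment is described below with reference to the drawings.
The device is used, for example, in elliptic curve cryptography. Given a multiplicand A (160 bits) and a multiplier B (160 bits), it computes the remainder R (= A*B mod P) of the product A*B modulo P (a 160-bit prime) shown in (Formula 1), or a value equivalent to R in the residue field. The modulus P satisfies P = 2¹⁶⁰ − α, where α is a value of at most 54 bits.
【００２４】
The device is built on (Formula 2), which is a rearrangement of (Formula 1). For reference, the derivation of (Formula 2) from (Formula 1) is given at the end of the description of the embodiments.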
【００２５】
【数３】
(Formula 1)  R = A * B mod P
【００２６】
【数４】
(Formula 2)  R1 = Σ C(i) * b(i)   (sum over i = 1 to k/s),   R1 ≡ R (mod P)
【００２７】
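＜Explanation of the Formulas＞
In (Formula 2), R1 is a value equivalent to R in the residue field of P. That is, the difference between R1 and R is an integer multiple of P.
R1 is obtained by accumulating C(i)*b(i) for i = 1 to k/s. Here k is the bit length of the multiplier B, and s is the bit length of each partial multiplier when B is divided into partial multipliers of a few bits each. k/s is therefore the number of partial multipliers. k and s are chosen so that k is divisible by s.
【００２８】
b(i) = b[s*i−1 : s*i−s] denotes the partial multipliers obtained by dividing B into s-bit groups from the low-order side: b(1) = b[s−1:0] for i = 1, b(2) = b[2s−1:s] for i = 2, b(3) = b[3s−1:2s] for i = 3, and so on. The k-bit multiplier B is written in binary as b_{k−1} b_{k−2} ... b_2 b_1 b_0, and b[x:y] denotes the bit string from bit x+1 down to bit y+1 of B counted from the low-order end (x > y). For example, b[15:8] is the partial multiplier from the 16th to the 9th bit of B counted from the low-order end, written in binary as b_15 b_14 b_13 b_12 b_11 b_10 b_9 b_8.
【００２９】
C(i) is defined by the recurrence C(1) = A (A is the multiplicand) for i = 1, and C(i) ≡ C(i−1)*2^s mod P for 2 <= i <= k/s. Thus C(1) = A, C(2) ≡ A*2^s mod P, C(3) ≡ (A*2^s mod P)*2^s mod P, and so on. The factor 2^s in the recurrence corresponds to the carry of the s-bit partial multiplier b(i): C(i) is the remainder modulo P of the previous value C(i−1) carried up by s bits. C(i) is hereinafter called the intermediate number.
【００３０】
In (Formula 2), then, R1 is obtained by repeating the following k/s times for i = 1 to k/s: compute the intermediate number C(i) recursively as the remainder modulo P of the multiplicand A carried up by s bits, take the partial multiplier b(i) from B in s-bit groups from the low-order side, and accumulate the partial product C(i)*b(i). Repeating the operation k/s times to obtain R1 is hereinafter called the iterative processing.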
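The recurrence for C(i) and the accumulation of C(i)*b(i) described above can be checked numerically with a short Python sketch. The function name and the concrete value α = 47 used in the usage note are illustrative assumptions; the `%` operator stands in for any reduction that yields a value congruent modulo P.

```python
def congruent_product(A, B, P, k=160, s=8):
    """Accumulate C(i)*b(i) for i = 1..k/s, with C(1) = A and
    C(i) congruent to C(i-1)*2^s modulo P."""
    mask = (1 << s) - 1
    C = A
    R1 = 0
    for i in range(1, k // s + 1):
        if i > 1:
            C = (C << s) % P                  # carry up by s bits, then reduce
        b_i = (B >> (s * (i - 1))) & mask     # b(i) = b[s*i-1 : s*i-s]
        R1 += C * b_i
    return R1                                 # congruent to A*B modulo P
```

With P = 2**160 − 47 and any 160-bit A and B, `congruent_product(A, B, P) % P` equals `(A * B) % P`. R1 itself is generally larger than P, which is why the text speaks of a value equivalent to R rather than R itself.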
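＜Configuration of the Modular Multiplication Device＞
FIG. 1 is a block diagram showing the schematic configuration of the modular multiplication device 100 of this embodiment.
【００３１】
In the figure, the modular multiplication device 100 comprises a multiplier division unit 11, a remainder calculation unit 12, a partial product calculation unit 13, an accumulation unit 14, a correction unit 15, and a control unit 16. The device 100 is built on (Formula 2) with k = 160 and s = 8.
The multiplier division unit 11 outputs the 8-bit partial multipliers b(i) = b[8i−1 : 8i−8] (i = 1, ..., 19, 20) in order from the low-order side of the multiplier B.
【００３２】
The remainder calculation unit 12 computes the intermediate numbers C(i) (i = 1, ..., 19, 20) recursively. In the first round (i = 1) it outputs the multiplicand A as the intermediate number C(1). In the second and later rounds (i = 2, 3, ..., 20) it carries the previous intermediate number C(i−1) up by 8 bits and computes the remainder modulo P of the carried value C(i−1)*2⁸ as the intermediate number C(i). Concretely, it computes A in the first round, A*2⁸ mod P in the second, (A*2⁸ mod P)*2⁸ mod P in the third, ((A*2⁸ mod P)*2⁸ mod P)*2⁸ mod P in the fourth, and so on. The value C(i) that the remainder calculation unit 12 produces is congruent to the carried value C(i−1)*2⁸ modulo P and may be larger than P.
【００３３】
To compute remainders quickly, the remainder calculation unit 12 has ROM tables 308 and 309. As detailed later, these ROMs store in advance the remainders modulo P corresponding to the part of the carried value C(i−1)*2⁸ above its lower 160 bits, that is, the part that overflows 160 bits because of the carry. The remainder calculation unit 12 looks up these ROMs for the part of C(i−1)*2⁸ above 160 bits and adds the remainder obtained from the ROMs to the lower 160 bits of C(i−1)*2⁸ to obtain the intermediate number C(i).
【００３４】
The partial product calculation unit 13 outputs the partial product C(i)*b(i) of the intermediate number C(i) output by the remainder calculation unit 12 and the partial multiplier b(i) output by the multiplier division unit 11.
The accumulation unit 14 accumulates the partial products C(i)*b(i) output by the partial product calculation unit 13.
【００３５】
The correction unit 15 corrects the accumulated value by adding or subtracting an integer multiple of P according to the accumulation result, thereby preventing overflow in the accumulation unit 14.
The control unit 16 controls the iterative processing performed 20 times (i = 1 to 20) in the remainder calculation unit 12, the partial product calculation unit 13, the accumulation unit 14, and the correction unit 15. It also performs pipeline control in which the first stage is the calculation of the intermediate number by the remainder calculation unit 12 and of the partial multiplier by the multiplier division unit 11, the second stage is the calculation of the partial product by the partial product calculation unit 13, and the third stage is the accumulation by the accumulation unit 14 and the correction by the correction unit 15. After the 20 iterations, if the final accumulated value in the accumulation unit 14 is larger than P, the control unit 16 feeds it to the remainder calculation unit 12 to compute its remainder modulo P.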
【００３６】
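FIG. 2 shows the remainder calculation unit 12 in more detail.
In the figure, the remainder calculation unit 12 comprises a selector 301, a register A 302, a shifter 303, a selector 304, and a remainder operation unit 311.
Under the control of the control unit 16, the selector 301 first outputs the externally supplied multiplicand A to the register A 302 as initialization (i = 1), and thereafter (2 <= i <= 20) outputs the intermediate number C(i) supplied by the remainder operation unit 311 to the register A 302.
【００３７】
The register A 302 is a 161-bit register. As initialization (i = 1) it holds the multiplicand A (= C(1)) supplied externally through the selector 301; thereafter (2 <= i <= 20) it holds the intermediate number C(i) supplied from the adder 310 through the selector 301. Its output is connected to the partial product calculation unit 13 and the shifter 303. On instruction from the control unit 16 it replaces the held intermediate number with the newly supplied one and outputs it to the partial product calculation unit 13 and the shifter 303.
【００３８】
The shifter 303 is a 169-bit-wide shifter that shifts its input left by 8 bits. It shifts the 161-bit intermediate number C(i−1) from the register A 302 left by 8 bits and outputs the 169-bit carried value C(i−1)*2⁸. The 8-bit left shift thus carries the intermediate number C(i−1) up by 8 bits.
【００３９】
During the iterative processing, the selector 304 passes the carried value C(i−1)*2⁸ from the shifter 303 to the remainder operation unit 311; after the iterative processing, it passes the accumulated value from the accumulation unit 14 to the remainder operation unit 311.
The remainder operation unit 311 comprises the ROM table 308, the ROM table 309, and an adder 310. It outputs the remainder modulo P, or a congruent value, of the carried value C(i−1)*2⁸ supplied from the shifter 303 through the selector 304, or of the accumulated value supplied from the accumulation unit 14. This remainder or congruent value is 161 bits wide.
【００４０】
The remainder operation unit 311 also has a bus 305 that feeds the lower 160 bits of the 169-bit value from the selector 304 to the adder 310, a bus 306 that feeds bits 161 to 165 to the ROM table 308, and a bus 307 that feeds bits 166 to 169 to the ROM table 309. These buses divide the carried value C(i−1)*2⁸ or the accumulated value, from the low-order side, into a 160-bit part (the lower part), a 5-bit part, and a 4-bit part, and feed them to the adder 310, the ROM table 308, and the ROM table 309 respectively.
【００４１】
FIGS. 3(a) and 3(b) show the contents of the ROM table 309 and the ROM table 308.
FIG. 3(b) shows the contents of the ROM table 308. The ROM table 308 takes the 5-bit values (00000)₂ to (11111)₂ as inputs and stores, as the corresponding outputs, the remainders modulo P of those values multiplied by 2¹⁶⁰. Because P = 2¹⁶⁰ − α, α = 2¹⁶⁰ mod P. The outputs 0, 1*α, 2*α, ..., 31*α in FIG. 3(b) are therefore the remainders modulo P of the input value (the 5-bit part) multiplied by 2¹⁶⁰. Since α is at most 54 bits and the input is 5 bits, the output of the ROM table 308 is at most 59 bits. When a read signal arrives from the control unit 16, the ROM table 308 reads out and outputs the remainder corresponding to the 5-bit part supplied on the bus 306.
【００４２】
FIG. 3(a) shows the contents of the ROM table 309. The ROM table 309 takes the 4-bit values (0000)₂ to (1111)₂ as inputs and stores, as the corresponding outputs, the remainders modulo P of those values multiplied by 2¹⁶⁵. Since α = 2¹⁶⁰ mod P, the outputs 0, 1*32*α, 2*32*α, ..., 15*32*α in FIG. 3(a) are the remainders modulo P of the input value (4 bits) multiplied by 2¹⁶⁵. Since α is at most 54 bits, the input is 4 bits, and multiplying by 32 adds 5 bits, the output of the ROM table 309 is at most 63 bits. When a read signal arrives from the control unit 16, the ROM table 309 reads out and outputs the remainder corresponding to the 4-bit part supplied on the bus 307.
【００４３】
Because the remainder operation unit 311 obtains the remainder of the part that overflows 160 bits from the ROM tables 308 and 309, it is faster than the conventional configuration that obtains the remainder by repeated subtraction. Splitting the table into two ROMs also reduces the number of input values to 2⁵ for the ROM table 308 and 2⁴ for the ROM table 309. Lookups for the 5-bit and 4-bit parts are therefore faster, which speeds up the remainder calculation.
【００４４】
The adder 310 adds the 160-bit lower part, the remainder of the 5-bit part output by the ROM table 308 (59 bits), and the remainder of the 4-bit part output by the ROM table 309 (63 bits), and outputs C(i−1)*2⁸ mod P or a value equivalent to it in the residue field of P.
The value output by the adder 310 is 161 bits wide.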
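The table-based reduction performed by the remainder operation unit 311 can be modeled as below. This is a sketch under stated assumptions: `ALPHA = 47` is an arbitrary illustrative value (the text only requires α of at most 54 bits), and the names `ROM_308`, `ROM_309`, `reduce_169`, and `next_intermediate` are hypothetical labels for the blocks in FIG. 2 and FIG. 3.

```python
K = 160
ALPHA = 47                       # illustrative assumption; P = 2^160 - ALPHA
P = (1 << K) - ALPHA

# ROM table 308: 5-bit input v -> (v * 2^160) mod P = v * ALPHA
ROM_308 = [v * ALPHA for v in range(1 << 5)]
# ROM table 309: 4-bit input v -> (v * 2^165) mod P = v * 32 * ALPHA
ROM_309 = [v * 32 * ALPHA for v in range(1 << 4)]


def reduce_169(x):
    """Split a 169-bit value into 160 + 5 + 4 bits, look up the two upper
    parts, and add the three terms (one lookup pass, one addition)."""
    assert 0 <= x < (1 << 169)
    lower = x & ((1 << K) - 1)       # bus 305: lower 160 bits
    part5 = (x >> 160) & 0x1F        # bus 306: bits 161-165
    part4 = (x >> 165) & 0xF         # bus 307: bits 166-169
    return lower + ROM_308[part5] + ROM_309[part4]


def next_intermediate(c_prev):
    """C(i) from C(i-1): 8-bit left shift followed by the table reduction."""
    return reduce_169(c_prev << 8)
```

The result is congruent to the input modulo P and fits in 161 bits, but it is not necessarily smaller than P, which matches the statement that C(i) may exceed P.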
【００４５】
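FIG. 4 shows the multiplier division unit 11 in more detail.
In the figure, the multiplier division unit 11 comprises a shifter 503, a selector 501, and a register B 502.
Under the control of the control unit 16, the selector 501 outputs the externally supplied multiplier B to the register B 502 as initialization (i = 1), and thereafter (2 <= i) outputs the value supplied by the shifter 503 to the register B 502.
【００４６】
The register B 502 is a 160-bit register. As initialization it holds the multiplier B supplied externally through the selector 501; thereafter it holds the value supplied from the shifter 503 through the selector 501. Its output is connected to the shifter 503, and its lower 8 bits are connected to the partial product calculation unit 13. Under the control of the control unit 16, the register B 502 replaces the held value with the value newly supplied from the shifter 503 through the selector 501. The held value is output to the shifter 503, and its lower 8 bits are output to the partial product calculation unit 13.
【００４７】
The shifter 503 is a 160-bit shifter that shifts the value from the register B 502 right by 8 bits and outputs it.
With this configuration, the multiplier division unit 11 outputs the 8-bit partial multipliers b(i) = b[8i−1 : 8i−8] (i = 1, ..., 19, 20) to the partial product calculation unit 13 in order from the low-order side of B.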
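The register-and-shifter arrangement of the multiplier division unit can be mimicked with a small Python generator. The name `partial_multipliers` is an illustrative assumption.

```python
def partial_multipliers(B, k=160, s=8):
    """Yield b(1), b(2), ..., b(k/s): the low s bits of the register,
    followed by an s-bit right shift, as register B and the shifter do."""
    reg = B
    mask = (1 << s) - 1
    for _ in range(k // s):
        yield reg & mask   # low s bits go to the partial product unit
        reg >>= s          # shifter output is written back to the register
```

Reassembling the pieces as `sum(b << (8 * n) for n, b in enumerate(partial_multipliers(B)))` returns the original B for any 160-bit B.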
【００４８】
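FIG. 5 shows the partial product calculation unit 13 in more detail.
In the figure, the partial product calculation unit 13 comprises a shifter unit 512, a selector unit 513, an adder 522, an adder 523, a register Ca 524, and a register Cb 525.
The operation of the partial product calculation unit 13 is expressed by (Formula 3).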
【００４９】
【数５】

【００５０】
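For convenience, the formula writes the partial multiplier b(i) supplied by the multiplier division unit 11 in binary as b_i7 b_i6 b_i5 b_i4 b_i3 b_i2 b_i1 b_i0.
As the formula shows, the partial product calculation unit 13 first carries the intermediate number C(i) from the remainder calculation unit 12 up by 0 to 7 bits according to each bit position of b(i) (part ① of the formula), then multiplies each carried value by the corresponding bit of b(i) (part ②), and finally adds the results (parts ③ and ④).
The shifter unit 512 has a bus 504 and shifters 505 to 511, which shift their input left by 1 to 7 bits respectively, and performs the operation corresponding to part ①.
【００５１】
That is, the shifter unit 512 carries the intermediate number C(i) from the remainder calculation unit 12 up according to the weight of each bit of b(i) and outputs the results to the selector unit 513. The shifters 505 to 511 shift the input C(i) left by 1 to 7 bits respectively, outputting C(i)*2¹ to C(i)*2⁷. The bus 504 passes C(i) unchanged to the selector unit 513.
【００５２】
The selector unit 513 has selectors 514 to 521 and performs the operation corresponding to part ②. The selectors 514 to 521 receive C(i)*2⁰ to C(i)*2⁷ from the shifter unit 512 and the bits b_i0 to b_i7 of b(i) from the multiplier division unit 11. According to its bit, each selector outputs either its input C(i)*2ⁿ or 0: selector n (n = 0, 1, ..., 7) outputs C(i)*2ⁿ when b_in is 1 and 0 when b_in is 0.
【００５３】
The adder 522 performs the operation corresponding to part ③. It adds the four values from the selectors 514 to 517 and outputs the sum (hereinafter, partial product a) to the register Ca 524.
The adder 523 performs the operation corresponding to part ④. It adds the four values from the selectors 518 to 521 and outputs the sum (hereinafter, partial product b) to the register Cb 525.
【００５４】
The registers Ca 524 and Cb 525 hold the partial products a and b respectively and, under the control of the control unit 16, replace them with newly supplied values. Their outputs are connected to the accumulation unit 14, to which they output the held partial products a and b.
【００５５】
The sum of the partial product a and the partial product b is the partial product C(i)*b(i).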
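The shift, select, and two-adder structure described above can be written out as a short Python sketch. The function name `partial_product_8` is an illustrative assumption.

```python
def partial_product_8(c, b):
    """Model of FIG. 5: eight shifted copies of c, eight selectors gated by
    the bits of b, and two four-input adders."""
    # Selectors 514-521: pass c * 2^n when bit n of b is 1, otherwise 0.
    selected = [(c << n) if (b >> n) & 1 else 0 for n in range(8)]
    part_a = sum(selected[0:4])   # adder 522 -> register Ca
    part_b = sum(selected[4:8])   # adder 523 -> register Cb
    return part_a, part_b
```

The two values are not added here; as in the text, they are handed separately to the accumulation stage, where their sum equals c*b.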
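FIG. 6 shows the accumulation unit 14 and the correction unit 15 in detail.
The accumulation unit 14 comprises an adder 601 and a register S 602.
The register S 602 is a 170-bit register that holds the accumulated value. Under the control of the control unit 16 it replaces the held value with the new accumulated value supplied by the adder 601. The most significant bit of the register S 602 is a sign bit indicating whether the accumulated value is positive or negative.
【００５６】
The adder 601 adds the accumulated value held in the register S 602, the partial products a and b held in the registers Ca 524 and Cb 525, and the correction value supplied by the correction unit 15, and outputs the sum to the register S 602.
The correction unit 15 comprises a register E 603, an inverter 606, a selector 604, and a selector 605.
【００５７】
The register E 603 holds the value that is larger than, and closest to, the maximum possible sum of the partial products a and b. As the following description indicates, this value is 510*P.
The inverter 606 outputs the bitwise inverse of the 510*P held in the register E 603 (hereinafter written ¬510*P).
Under the control of the control unit 16, the selector 604 selects one of 0, 510*P, and ¬510*P as the correction value and supplies it to the adder 601.
【００５８】
Under the control of the control unit 16, the selector 605 supplies 0 or 1 to the adder 601.
The control unit 16 has the selector 604 select the correction value, and the selector 605 select 0 or 1, according to the sign of the accumulated value held in the register S 602.
【００５９】
Specifically, when the most significant bit of the register S 602 is 0, the control unit 16 has the selector 604 select ¬510*P and the selector 605 select 1. Thus, when the accumulated value in the register S 602 is positive (most significant bit 0), ¬510*P is supplied to the adder 601 from the register E 603 through the inverter 606 and the selector 604, and 1 is supplied through the selector 605. As a result, 510*P is subtracted in the next addition of the adder 601, which prevents overflow in the adder 601 and the register S 602.
【００６０】
When the accumulated value held in the register S 602 is negative (most significant bit 1), the control unit 16 has both the selector 604 and the selector 605 select 0.
When the iterative processing for i = 1 to 20 has finished and the register S 602 has been updated to the final accumulated value, the control unit 16 checks whether its most significant bit is 1 or 0. If it is 1 (negative), the control unit 16 has the selector 604 select 510*P and the selector 605 select 0, so that the adder 601 adds 510*P to the accumulated value in the register S 602. This makes the final accumulated value positive. If the most significant bit is 0 (positive), the control unit 16 has both selectors select 0 so that no correction is applied.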
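Putting the pieces of the first embodiment together, the accumulate-and-correct loop can be modeled as follows. This is a sketch under stated assumptions: `ALPHA = 47` is an arbitrary illustrative value, the names are hypothetical, and `fold` multiplies the overflow part by α directly instead of using the two ROM tables (the two are arithmetically equivalent because 2¹⁶⁰ ≡ α modulo P). Subtraction is written with `-` where the hardware adds the inverted value plus a carry-in of 1.

```python
K, S, ALPHA = 160, 8, 47            # ALPHA is an illustrative assumption
P = (1 << K) - ALPHA
CORRECTION = 510 * P                # value held in register E
LOW_MASK = (1 << K) - 1


def fold(x):
    """Lower 160 bits plus (overflow part * ALPHA); stands in for the ROMs."""
    return (x & LOW_MASK) + (x >> K) * ALPHA


def modmul_congruent(a, b):
    c, acc = a, 0                   # C(1) = A, register S starts at 0
    for i in range(K // S):
        if i > 0:
            c = fold(c << S)        # next intermediate number
        part = c * ((b >> (S * i)) & 0xFF)
        # Subtract 510*P in the same addition when S is non-negative.
        acc = acc + part - (CORRECTION if acc >= 0 else 0)
        assert -(1 << 169) <= acc < (1 << 169)   # fits 170-bit signed register
    if acc < 0:
        acc += CORRECTION           # final correction to a non-negative value
    return acc
```

The returned value is congruent to a*b modulo P and non-negative; a final pass through `fold` brings it down to 161 bits, corresponding to feeding the accumulated value back into the remainder calculation unit.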
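＜Second Embodiment＞
The modular multiplication device of the second embodiment is described below with reference to the drawings.
【００６１】
Given a multiplicand A (160 bits) and a multiplier B (161 bits), the device of this embodiment computes the remainder R (= A*B mod P) of the product A*B modulo P (160 bits), or a value equivalent to R in the residue field. The modulus P satisfies P = 2¹⁶⁰ − α, where α is a value of at most 54 bits.
【００６２】
Like the first embodiment, this device is built on (Formula 2), but with s = 9 instead of s = 8. Because k/s = 161/9 is not an integer, k/s is replaced by 18 in this embodiment.
【００６３】
FIG. 7 is a block diagram showing the schematic configuration of the modular multiplication device 200 of the second embodiment.
In the figure, the modular multiplication device 200 comprises a multiplier division unit 81, a remainder calculation unit 82, a partial product calculation unit 83, an accumulation unit 84, a correction unit 85, and a control unit 86.
The remainder calculation unit 82 differs from the remainder calculation unit 12 in having only one ROM table. As described in detail with FIG. 11, the partial product calculation unit 83 reduces the number of inputs to its internal adders compared with the partial product calculation unit 13, so that the adders are smaller. Furthermore, whereas in the device 100 the carried value C(i−1)*2⁸ is computed by the remainder calculation unit 12, in the device 200 the carried value C(i−1)*2⁹ is computed by the partial product calculation unit 83 rather than the remainder calculation unit 82.
【００６４】
The multiplier division unit 81, the accumulation unit 84, the correction unit 85, and the control unit 86 differ from the first embodiment in the bit widths of their inputs, outputs, and registers, because the multiplier B is 161 bits and s is 9 bits, but are otherwise configured in the same way.
The multiplier division unit 81 outputs the 9-bit partial multipliers b(i) = b[9i−1 : 9i−9] (i = 1, ..., 17, 18) in order from the low-order side of B.
【００６５】
The remainder calculation unit 82 computes the intermediate numbers C(i) (i = 1, ..., 17, 18) recursively. In the first round (i = 1) it outputs the multiplicand A as the intermediate number C(1). In the second and later rounds (i = 2, ..., 17, 18) it computes, as the intermediate number C(i), the remainder C(i−1)*2⁹ mod P of the carried value C(i−1)*2⁹ obtained by carrying the previous intermediate number C(i−1) up by 9 bits. Concretely, it computes A in the first round, A*2⁹ mod P in the second, (A*2⁹ mod P)*2⁹ mod P in the third, ((A*2⁹ mod P)*2⁹ mod P)*2⁹ mod P in the fourth, and so on. The remainder calculation unit 82 does not itself perform the 9-bit carry on C(i−1); it uses the carried value C(i−1)*2⁹ computed in the partial product calculation unit 83 and computes its remainder modulo P.
【００６６】
To compute remainders quickly, the remainder calculation unit 82 has a ROM table 904. The ROM table 904 stores in advance the remainders modulo P corresponding to the part of the carried value C(i−1)*2⁹ above its lower 160 bits, that is, the part that overflows 160 bits because of the carry. The remainder calculation unit 82 looks up the ROM table 904 for the part of C(i−1)*2⁹ above 160 bits and adds the remainder obtained from the ROM to the lower 160 bits of C(i−1)*2⁹ to obtain the intermediate number C(i).
【００６７】
The partial product calculation unit 83 outputs the partial product C(i)*b(i) of the intermediate number C(i) output by the remainder calculation unit 82 and the partial multiplier b(i) output by the multiplier division unit 81. It also computes the 9-bit carried value of the intermediate number C(i−1) output by the remainder calculation unit 82 and returns it to the remainder calculation unit 82.
The accumulation unit 84 accumulates the partial products C(i)*b(i) output by the partial product calculation unit 83.
【００６８】
The correction unit 85 corrects the accumulated value by adding or subtracting an integer multiple of P according to the accumulation result, thereby preventing overflow in the accumulation unit 84.
The control unit 86 controls the iterative processing performed 18 times (i = 1 to 18) in the multiplier division unit 81, the remainder calculation unit 82, the partial product calculation unit 83, the accumulation unit 84, and the correction unit 85. It also performs pipeline control in which the first stage is the calculation of the intermediate number by the remainder calculation unit 82 and of the partial multiplier by the multiplier division unit 81, the second stage is the calculation of the partial product by the partial product calculation unit 83, and the third stage is the accumulation by the accumulation unit 84 and the correction by the correction unit 85. After the 18 iterations, if the final accumulated value in the accumulation unit 84 is larger than P, the control unit 86 feeds it to the remainder calculation unit 82 to compute its remainder modulo P.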
【００６９】
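FIG. 8 shows the remainder calculation unit 82 in more detail.
In the figure, the remainder calculation unit 82 comprises a selector 901, a register A 906, a selector 902, and a remainder operation unit 907.
Under the control of the control unit 86, the selector 901 first outputs the externally supplied multiplicand A to the register A 906 as initialization (i = 1), and thereafter (i = 2, ..., 17, 18) outputs the intermediate number C(i) supplied by the remainder operation unit 907 to the register A 906.
【００７０】
The register A 906 is a 161-bit register. As initialization (i = 1) it holds the multiplicand A (= C(1)) supplied externally through the selector 901; thereafter (i = 2, ..., 17, 18) it holds the intermediate number C(i) supplied from the adder 905 through the selector 901. Its output is connected to the partial product calculation unit 83. On instruction from the control unit 86 it replaces the held intermediate number with the newly supplied one and outputs it to the partial product calculation unit 83. The register A 906 finally holds the modular multiplication result R.
【００７１】
During the iterative processing, the selector 902 passes the carried value C(i−1)*2⁹ from the partial product calculation unit 83 to the remainder operation unit 907; after the iterative processing, it passes the accumulated value from the accumulation unit 84 to the remainder operation unit 907.
The remainder operation unit 907 comprises the ROM table 904 and an adder 905. It outputs the remainder modulo P, or a congruent value, of the 170-bit carried value C(i−1)*2⁹ supplied from the partial product calculation unit 83 through the selector 902, or of the 170-bit accumulated value supplied from the accumulation unit 84. This remainder or congruent value is 161 bits wide.
【００７２】
The remainder operation unit 907 has a bus 908 that feeds the lower 160 bits of the 170-bit value from the selector 902 to the adder 905, and a bus 909 that feeds the remaining upper 10 bits (bits 161 to 170) to the ROM table 904. The buses 908 and 909 divide the 170-bit value into the lower 160 bits (the lower part) and the upper 10 bits (the upper part) and feed them to the adder 905 and the ROM table 904 respectively.
【００７３】
FIG. 9 shows the contents of the ROM table 904.
The ROM table 904 takes the 10-bit values (0000000000)₂ to (1111111111)₂ as inputs and stores, as the corresponding outputs, the remainders modulo P of those values multiplied by 2¹⁶⁰. When a read signal arrives from the control unit 86, the ROM table 904 reads out and outputs the remainder corresponding to the upper part supplied on the bus 909.
【００７４】
The adder 905 adds the remainder for the upper part output by the ROM table 904 to the lower part and outputs the 161-bit value C(i−1)*2⁹ mod P or a value congruent to it.
In this way the remainder operation unit 907 obtains from the ROM table 904 the remainder of the part that overflows 160 bits because of the 9-bit carry, and adds it to the lower 160 bits to obtain the remainder of the carried value C(i−1)*2⁹ or a congruent value.
【００７５】
If the remainder operation unit 907 were built like the prior-art remainder calculation unit 3, it would have to run the loop of steps 6 to 8 several times, and the remainder calculation would take time. The remainder operation unit 907 of this embodiment instead needs only one read of the ROM table 904 and one addition in the adder 905, and is therefore faster than the conventional configuration.
【００７６】
FIG. 10 shows the multiplier division unit 81 in more detail.
In the figure, the multiplier division unit 81 comprises a selector 801, a register B 802, and a shifter 803.
Under the control of the control unit 86, the selector 801 outputs the externally supplied multiplier B to the register B 802 as initialization (i = 1), and thereafter (i = 2, ..., 17, 18) outputs the value supplied by the shifter 803 to the register B 802.
【００７７】
The register B 802 is a 161-bit register. As initialization it holds the multiplier B supplied externally through the selector 801; thereafter it holds the value supplied from the shifter 803 through the selector 801. Its output is connected to the shifter 803, and its lower 9 bits are connected to the partial product calculation unit 83. Under the control of the control unit 86, the register B 802 replaces the held value with the value newly supplied from the shifter 803 through the selector 801. The held value is output to the shifter 803, and its lower 9 bits are output to the partial product calculation unit 83.
【００７８】
The shifter 803 is a 161-bit shifter that shifts the value from the register B 802 right by 9 bits and outputs it.
With this configuration, the multiplier division unit 81 outputs the 9-bit partial multipliers b(i) = b[9i−1 : 9i−9] (i = 1, ..., 17, 18) to the partial product calculation unit 83 in order from the low-order side of B.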
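＜Partial Product Calculation Methods＞
Before the detailed configuration of the partial product calculation unit 83 is described, the two product calculation methods on which it is based are explained.
【００７９】
The product of a multiplicand X and a 3-bit multiplier Y (= y2 y1 y0) expands as in (Formula 4).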
【００８０】
【数６】
(Formula 4)  X * Y = X * 2² * y2 + X * 2¹ * y1 + X * 2⁰ * y0
【００８１】
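That is, in (Formula 4) the product X*Y is obtained by adding the values produced by multiplying the multiplicand X by the weight (2², 2¹, 2⁰) and the value (y2, y1, y0) of each digit of Y (X*2²*y2, X*2¹*y1, X*2⁰*y0; hereinafter, bit products).
For example, when Y = 101, (Formula 4) gives X*Y = X*2²*y2 + X*2⁰*y0, the sum of two bit products. When Y = 110, X*Y = X*2²*y2 + X*2¹*y1, again the sum of two bit products. When Y = 111, X*Y = X*2²*y2 + X*2¹*y1 + X*2⁰*y0, the sum of three bit products.
【００８２】
Computing a product by adding the bit products obtained from the weight and value of each multiplier digit is called the general calculation method. The partial product calculation unit 13 of the first embodiment is also based on this method.
Now, 111 = 2³ − 1. When Y = 111, the product X*Y can therefore also be computed by multiplying X by 2³ − 1 instead of by 111. The product of X and 2³ − 1 expands as follows.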
【００８３】
【数７】
(Formula 5)  X * (2³ − 1) = X * 2³ − X
【００８４】
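That is, by (Formula 5), when Y = 111 the product X*Y is obtained by subtracting X from X multiplied by 2³.
Likewise, when Y' = 111000, Y' = Y*2³, so the product X*Y' can be computed by multiplying X by 2⁶ − 2³ instead of by 111000.
【００８５】
In general, when all bits of the multiplier from the 2^p place (bit p+1 from the low-order end) down to the 2^q place (bit q+1 from the low-order end), with p > q, are 1, the product for that part can be computed by subtracting the multiplicand times 2^q (hereinafter, the subtrahend) from the multiplicand times 2^(p+1) (hereinafter, the minuend). This is called the special calculation method.
【００８６】
In an adder, the difference s − t is computed as the sum of s, the bitwise inverse (one's complement) of t, and the constant 1. Therefore, when all multiplier bits from the 2^p place down to the 2^q place (p > q) are 1 and the product for that part is computed by the special calculation method, the adder only needs to add the multiplicand times 2^(p+1), the bitwise inverse of the multiplicand times 2^q (hereinafter, the inverted value), and the constant 1. Moreover, the constant 1 can be supplied through the carry-in of the adder's lowest bit. With the special calculation method the adder thus needs only two addends plus a carry-in.
【００８７】
By contrast, when all n bits of a multiplier are 1 and the product is computed by the general calculation method, the adder must add n bit products.
Consequently, where three or more consecutive 1s appear in the multiplier, computing that part by the special calculation method rather than the general one reduces the number of addends, which shrinks the adder and lightens its load.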
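The special calculation method can be illustrated with fixed-width arithmetic in Python. The function name and the default `width` are assumptions for illustration; the mask emulates a hardware adder of that width, and `width` must be large enough to hold `x << (p + 1)`.

```python
def run_of_ones_product(x, p, q, width=64):
    """Product of x with a multiplier whose bits p..q (p > q) are all 1,
    computed as minuend + inverted subtrahend + carry-in 1."""
    mask = (1 << width) - 1
    minuend = (x << (p + 1)) & mask      # x * 2^(p+1)
    inverted = ~(x << q) & mask          # one's complement of x * 2^q
    return (minuend + inverted + 1) & mask
```

For example, `run_of_ones_product(11, 2, 0)` computes 11 * 0b111 with two addends and a carry-in, where the general method would add three bit products.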
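The partial product calculation unit 83 is configured to compute the partial product C(i)*b(i) using these two methods.
【００８８】
It applies the special calculation method to runs of 1s in the bit string of b(i) and the general calculation method elsewhere. More specifically, it applies the special calculation method when the 9-bit partial multiplier b(i) is 111111111, when the upper 6 bits or the lower 6 bits of b(i) are 111111, and when the upper 3 bits, middle 3 bits, or lower 3 bits are 111, to the part concerned; otherwise it applies the general calculation method.
【００８９】
For example, when b(i) is 111111011, the partial product calculation unit 83 applies the special calculation method to the upper 6 bits 111111 and the general calculation method to the lower 3 bits 011 to compute the product C(i)*b(i). Because the upper 6 bits 111111 are carried up 3 bits from the low-order end (multiplied by 2³), the unit must apply a 2³ carry to the upper 6-bit part when using the special calculation method.
【００９０】
Concretely, the product expands as in (Formula 6).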
【００９１】
【数８】

【００９２】
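As (Formula 6) shows, the upper 6 bits 111111 expand as in part ① of the formula through the special calculation method and the 2³ carry. That is, the partial product calculation unit 83 computes the product for the upper 6 bits 111111 by adding the intermediate number C(i) carried up 9 bits, the inverted value of C(i) carried up 3 bits, and the constant 1. The lower 3 bits 011 expand as in part ② through the general calculation method. That is, according to the value 011, the unit computes the product for the lower 3 bits by adding C(i) carried up 1 bit and C(i) itself.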
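The mixed use of the two methods on a 9-bit partial multiplier can be sketched as below. This is an illustrative model, not the selector logic of the patent figures: the function name `addends_9` is hypothetical, and a negative term stands for the inverted value plus a carry-in of 1 that the hardware adder would use.

```python
def addends_9(c, b):
    """Return signed addends whose sum is c*b for a 9-bit b, treating b as
    three 3-bit groups: runs of all-ones groups use the special method,
    other groups the general method."""
    groups = [(b >> (3 * g)) & 0b111 for g in range(3)]   # low, middle, high
    terms, g = [], 0
    while g < 3:
        if groups[g] == 0b111:
            h = g
            while h + 1 < 3 and groups[h + 1] == 0b111:
                h += 1                           # extend the run of 111 groups
            terms.append(c << (3 * (h + 1)))     # minuend
            terms.append(-(c << (3 * g)))        # subtrahend
            g = h + 1
        else:
            for n in range(3):                   # general method: bit products
                if (groups[g] >> n) & 1:
                    terms.append(c << (3 * g + n))
            g += 1
    return terms
```

For b = 0b111111011 this yields the four terms c, c*2, c*2⁹, and −c*2³, matching the worked example. No 9-bit pattern needs more than six addends, compared with up to nine bit products under the general method alone.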
【００９３】
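FIG. 11 shows the partial product calculation unit 83 in more detail.
In the figure, the partial product calculation unit 83 comprises a shifter unit 51, a selection unit 52, an OR circuit 513, an adder A 701, an adder B 702, a register Ca 844, and a register Cb 845.
The shifter unit 51 has a bus 830 and shifters 831 to 839. The bus 830 passes the 161-bit intermediate number C(i) unchanged to the selection unit 52. The shifters 831 to 839 shift C(i) left by 1 to 9 bits respectively, carrying it up by 1 to 9 bits to produce C(i)*2, C(i)*2², C(i)*2³, C(i)*2⁴, C(i)*2⁵, C(i)*2⁶, C(i)*2⁷, C(i)*2⁸, and C(i)*2⁹, which they output to the selection unit 52. The shifter 839 also outputs its 9-bit-shifted value to the remainder calculation unit 82, where it serves as the carried value C(i−1)*2⁹ for the next round.
【００９４】
In the shifter unit 51, the outputs of the bus 830 and the shifters 831 to 838, namely C(i), C(i)*2¹, ..., C(i)*2⁸, are candidate bit products for the general calculation method. The outputs of the shifters 833, 836, and 839, namely C(i)*2³, C(i)*2⁶, and C(i)*2⁹, are candidate minuends for the special calculation method. The outputs of the bus 830 and the shifters 833 and 836, namely C(i), C(i)*2³, and C(i)*2⁶, are candidate subtrahends for the special calculation method. These are hereinafter called candidate values.
【００９５】
When the selection unit 52 receives the values carried up by 0 to 9 bits from the shifter unit 51 and the partial multiplier b(i) from the multiplier division unit 81, it determines whether the 9-bit b(i) is 111111111, whether the upper 6 bits or the lower 6 bits of b(i) are 111111, and whether the upper 3 bits, middle 3 bits, or lower 3 bits are 111. It then selects some of the candidate values according to the result. For a part where the determination is affirmative, it selects the corresponding minuend and subtrahend from the candidate values. For a part where the determination is negative, it selects the corresponding bit products. Selected minuends and bit products are output unchanged to the adder A 701 or the adder B 702. For a selected subtrahend, the selection unit 52 generates its inverted value and the constant 1 and outputs them to the adder A 701 or the adder B 702.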
【００９６】
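The selection unit 52 has operand selectors 840, 841, and 842.
FIG. 12 shows the operand selectors 840, 841, and 842 in detail.
The operand selector 840 has an inverter 1301, selectors 1302 and 1303, and a control circuit 1304. It also has input terminals S, Z, I0, I1, I2, and I3 and output terminals F, O0, and O1.
【００９７】
When the candidate value C(i) is supplied as a subtrahend at the input terminal I0, the inverter 1301 outputs its inverted value to the selector 1303.
The selector 1303 receives the inverted value of C(i) from the inverter 1301 at the input ¬I0, 0 at the input Z, the candidate value C(i) at the input I0, and the candidate value C(i)*2¹ at the input I1. Following the input/output logic of the control circuit 1304, it selects one of these inputs and outputs the selected value at the output terminal O0.
【００９８】
The selector 1302 receives 0 at the input Z, the candidate value C(i)*2¹ at the input I1, C(i)*2² at the input I2, and C(i)*2³ at the input I3. Following the input/output logic of the control circuit 1304, it selects one of these inputs and outputs the selected value at the output terminal O1.
【００９９】
When the partial multiplier b(i) is supplied at the input terminal S, the control circuit 1304 has the selectors 1302 and 1303 each select one input according to the logic shown in FIG. 13(c) and output its value at the output terminals O1 and O0. When it has the selector 1303 select the input ¬I0, the control circuit 1304 outputs 1 at the output terminal F; otherwise it outputs 0 at F.
【０１００】
FIG. 13(c) shows the partial multiplier b(i) supplied to the control circuit 1304 and the input/output logic.
The first column of FIG. 13(c) shows the partial multiplier b(i) supplied at the input terminal S. "−" denotes an arbitrary value (0 or 1), and "###" denotes any value other than "111". The second column shows the input that the selector 1302 is made to select, and the third column the input that the selector 1303 is made to select. The fourth column is the value the control circuit 1304 outputs at the output terminal F; a value of 1 represents the constant 1.
【０１０１】
For example, when the control circuit 1304 determines that all of the lower 6 bits of b(i) are 1, it outputs 0 at O1, the inverted value of C(i) at O0, and the constant 1 at F. When not all of the lower 6 bits are 1 but all of the lower 3 bits are 1, it outputs C(i)*2³ at O1, the inverted value of C(i) at O0, and the constant 1 at F. In other cases it outputs at O1 and O0 two of the values C(i)*2², C(i)*2¹, C(i), and 0 according to the lower 3 bits of b(i).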
【０１０２】
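The operand selector 841 has the inverter 1301, the selectors 1302 and 1303, and a control circuit 1305, together with input terminals S, Z, I0, I1, I2, and I3 and output terminals F, O0, and O1. It differs from the operand selector 840 in having the control circuit 1305 in place of the control circuit 1304. When b(i) is supplied at the input terminal S, the control circuit 1305 has the selectors 1302 and 1303 each select one input according to the logic shown in FIG. 13(b) and output its value at O1 and O0. The other components are the same as in the operand selector 840 and are not described again.
【０１０３】
The operand selector 842 has the inverter 1301, the selectors 1302 and 1303, and a control circuit 1306, together with input terminals S, Z, I0, I1, I2, and I3 and output terminals F, O0, and O1. It differs from the operand selector 840 in having the control circuit 1306 in place of the control circuit 1304. When b(i) is supplied at the input terminal S, the control circuit 1306 has the selectors 1302 and 1303 each select one input according to the logic shown in FIG. 13(a) and output its value at O1 and O0. The other components are the same as in the operand selector 840 and are not described again.
【０１０４】
The OR circuit 513 takes the logical OR of the F outputs of the operand selectors 840 and 841. As FIGS. 13(b) and 13(c) show, whenever the F output of one of the operand selectors 840 and 841 is the constant 1, the F output of the other is always 0. The OR circuit 513 therefore merges the two values into one, reducing the number of outputs of the selection unit 52.
【０１０５】
The adder A 701 adds the O1, O0, and F outputs of the operand selector 842 and the O1 output of the operand selector 841, and outputs the resulting partial product c to the register Ca 844.
The adder B 702 adds the O0 output of the operand selector 841, the O1 and O0 outputs of the operand selector 840, and the output of the OR circuit 513, and outputs the resulting partial product d to the register Cb 845.
【０１０６】
The sum of the partial product c and the partial product d is the partial product C(i)*b(i).
The registers Ca 844 and Cb 845 hold the partial products c and d respectively and, under the control of the control unit 86, replace them with newly supplied values. Their outputs are connected to the accumulation unit 84, to which they output the held partial products c and d.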
【０１０７】
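FIG. 14 shows the accumulation unit 84 and the correction unit 85 in more detail.
In the figure, the accumulation unit 84 comprises an adder 705 and a register S 707.
The register S 707 is a 171-bit register that holds the accumulated value. Under the control of the control unit 86 it replaces the held value with the new accumulated value supplied by the adder 705. The most significant bit of the register S 707 is a sign bit indicating whether the accumulated value is positive or negative.
【０１０８】
The adder 705 adds the accumulated value held in the register S 707, the partial products c and d held in the registers Ca 844 and Cb 845, and the correction value supplied by the correction unit 85, and outputs the sum to the register S 707.
The correction unit 85 comprises a register E 703, an inverter 708, a selector 704, and a selector 706.
【０１０９】
The register E 703 holds the value that is larger than, and closest to, the maximum possible sum of the partial products c and d. Specifically, the register E 703 holds 1022*P.
The inverter 708 outputs the bitwise inverse of the 1022*P held in the register E 703 (hereinafter written ¬1022*P).
【０１１０】
Under the control of the control unit 86, the selector 704 selects one of 0, 1022*P, and ¬1022*P as the correction value and supplies it to the adder 705. Under the control of the control unit 86, the selector 706 supplies 0 or 1 to the adder 705. The control unit 86 has the selector 704 select the correction value, and the selector 706 select 0 or 1, according to the sign of the accumulated value held in the register S 707.
【０１１１】
Specifically, when the most significant bit of the register S 707 is 0, the control unit 86 has the selector 704 select ¬1022*P and the selector 706 select 1. Thus, when the accumulated value in the register S 707 is positive (most significant bit 0), ¬1022*P is supplied to the adder 705 from the register E 703 through the inverter 708 and the selector 704, and 1 is supplied through the selector 706. As a result, 1022*P is subtracted in the next addition of the adder 705, which prevents overflow in the adder 705 and the register S 707.
【０１１２】
When the accumulated value held in the register S 707 is negative (most significant bit 1), the control unit 86 has both the selector 704 and the selector 706 select 0.
When the iterative processing for i = 1 to 18 has finished and the register S 707 has been updated to the final accumulated value, the control unit 86 checks whether its most significant bit is 1 or 0. If it is 1 (negative), the control unit 86 has the selector 704 select 1022*P and the selector 706 select 0, so that the adder 705 adds 1022*P to the accumulated value in the register S 707. This makes the final accumulated value positive. If the most significant bit is 0 (positive), the control unit 86 has both selectors select 0 so that no correction is applied.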
【０１１３】
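The partial product calculation unit 83 may also be configured as shown in FIG. 15.
FIG. 15 differs from FIG. 11 in having shifters 847 and 846, and in having operand selectors 850, 851, and 852 in place of the operand selectors 840, 841, and 842.
The shifter 847 shifts the partial multiplier b(i) right by 3 bits and outputs it.
【０１１４】
The shifter 846 shifts the partial multiplier b(i) left by 3 bits and outputs it.
FIG. 16 shows the input/output logic of the control circuit common to the operand selectors 850, 851, and 852. With the shifters 847 and 846, the partial product calculation unit 83 of FIG. 15 feeds the selection unit 52 with b(i) shifted right by 3 bits, b(i) itself, and b(i) shifted left by 3 bits. The operand selectors 850, 851, and 852 can then all use the same input/output logic, and because they do, a single control circuit may be shared among them.
【０１１５】
In the first embodiment the remainder calculation unit 12 has two ROM tables, and in the second embodiment the remainder calculation unit 82 has one, but three or more ROM tables may be provided.
＜Derivation of the Formula＞
The derivation of (Formula 2) from (Formula 1) in the first embodiment is shown below.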
【０１１６】
【数９】

【０１１７】
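【Effects of the Invention】
The modular multiplication device of the present invention computes, for the product of a multiplicand a and a multiplier b (b is kb-bit data), a value congruent to the remainder modulo p (p is k-bit data) as the accumulated value of the following expression:
Accumulated value = Σ C(i)*b[s*i+s-1:s*i]
(Here Σ denotes accumulation over i = 0 to [[kb/s]], where [[kb/s]] is the integer part of the quotient kb/s and i is an integer from 0 to [[k/s]]. C(i) is given by a recurrence: C(i) = a when i = 0, and C(i) ≡ (C(i-1)*2^s) mod p when i >= 1, where ≡ indicates that both sides are congruent modulo p.)
b[s*i+s-1:s*i] is the s-bit partial multiplier spanning the 2^(s*i+s-1) place down to the 2^(s*i) place of the k-bit multiplier b. The device comprises table means that stores in advance, for each value representable in m bits (m is an integer of at least s), the remainder modulo p of that value multiplied by 2^k, and intermediate-number calculation means that outputs the multiplicand a as the intermediate number C(0) in the first round (i = 0) and, in the second and later rounds (i > 0), carries the previously output intermediate number C(i-1) up by s bits, reads from the table means the remainder for the upper m bits above the lower k bits of the carried intermediate number, and adds the read remainder to the lower k bits to obtain a new intermediate number C(i). The partial products C(i)*b[s*i+s-1:s*i] of each intermediate number C(i) and its corresponding partial multiplier are accumulated in sequence to obtain the accumulated value.
【０１１８】
With this configuration, the table means stores in advance, for each value representable in m bits, that is, each value from 0 to 2^m − 1, the remainder modulo p of that value multiplied by 2^k. When i >= 1, the intermediate-number calculation means first reads from the table means the remainder for the upper m bits above the lower k bits of the carried intermediate number, that is, the remainder modulo p of the upper m bits multiplied by 2^k. It then adds the remainder read from the table means to the lower k bits to obtain the remainder modulo p of the carried intermediate number, or a congruent value.
【０１１９】
The lower k bits of the carried intermediate number are either the remainder modulo p itself or a value congruent to it that is larger than, but close to, the remainder. To obtain the remainder modulo p of the carried intermediate number, or a congruent value close to the remainder, the intermediate-number calculation means therefore only has to obtain the remainder for the upper m bits and add it to the lower k bits. By reading the remainder of the upper m bits from the table means, it obtains the remainder of the carried intermediate number, or a congruent value, in a shorter time.
【０１２０】
If the remainder of the carried intermediate number were instead obtained by the same procedure as the remainder calculation unit 3 of the prior-art device 1, the intermediate-number calculation means would have to repeat the loop of steps 6 to 8 several times; more precisely, as many times as the carried intermediate number has bits. In the present invention the intermediate-number calculation means obtains the remainder, or a congruent value, with one read of the table means and one addition, which makes the remainder operation fast.
【０１２１】
The intermediate-number calculation means also computes each intermediate number recursively as the remainder of the previous intermediate number carried up, which keeps the bit length of the intermediate numbers from growing. This in turn keeps the bit length of the partial products computed from them from growing. The partial products accumulated therefore have a bounded bit length, so the circuit scale of the adder can be kept small.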
【０１２２】
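The intermediate-number calculation means comprises: first holding means that holds the intermediate number; carry means that carries the intermediate number held in the first holding means up by s bits in correspondence with the partial multiplier b[s*i+s-1:s*i]; division means that divides the carried intermediate number into upper data, which is the m-bit portion above its lower k bits, and lower data, which is the lower k-bit portion; reading means that reads from said table means the remainder modulo p for the upper data; and addition means that adds the remainder read from the table means to the lower data to obtain a new intermediate number. Each time the addition means produces a new intermediate number, the first holding means updates its contents to that value; it outputs the multiplicand as the intermediate number in the first round and the updated intermediate number in the second and later rounds.
【０１２３】
With this configuration, each component of the intermediate-number calculation means can be built simply from general-purpose hardware elements. The first holding means can be a register, the carry means a shifter, the division means an m-bit bus and a k-bit bus connected to the upper m-bit and lower k-bit portions of the shifter, and the addition means an adder. With these components the intermediate-number calculation means computes the intermediate numbers C(i) recursively.
【０１２４】
The modular multiplication device further comprises post-processing means that obtains the remainder modulo p of the accumulated value by using the division means, the reading means, and the addition means.
With this configuration, the post-processing means obtains the remainder modulo p of the final accumulated value, so the final accumulated value can be made smaller than p.
【０１２５】
The table means has a memory element that receives an address corresponding to each value representable in m bits and stores in advance, in the storage area indicated by that address, the remainder modulo p of the corresponding m-bit value multiplied by 2^k.
With this configuration, the table means can be realized with a single memory element. The memory element stores in advance, for each value representable in m bits, that is, each value from 0 to 2^m − 1 in decimal, the remainder modulo p of that value multiplied by 2^k. The address of each remainder's storage area equals the value itself. When the reading means supplies the upper m bits, the memory element treats them as an address, and the remainder in the storage area at that address is read out. The intermediate-number calculation means thus obtains the remainder for the upper m bits quickly with a single read of the memory element.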
【０１２６】
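The m bits may be divided into lower m1 bits and upper m2 bits (m = m1 + m2), each m-bit value corresponding to a combination of an m1-bit value and an m2-bit value. The table means then comprises first partial table means that stores in advance, for each m1-bit value, the remainder modulo p of that value multiplied by 2^k, and second partial table means that stores in advance, for each m2-bit value, the remainder modulo p of that value multiplied by 2^(k+m1). The addition means adds the remainders read from the first and second partial tables to the lower data to obtain a new intermediate number.
【０１２７】
With this configuration, the table means has first partial table means for the lower m1 bits of the m bits and second partial table means for the upper m2 bits. The first partial table means stores 2^m1 remainders for the lower m1 bits, and the second partial table means stores 2^m2 remainders for the upper m2 bits. As these counts show, the total number of remainders stored in the two partial tables (2^m1 + 2^m2) is smaller than with a single memory element (2^m). The table means can therefore be built from two memory elements of small capacity.
【０１２８】
The m bits may also be divided from the low-order side into t (3 ≦ t ≦ m) partial bit groups m1, ..., mt, each m-bit value corresponding to a combination of the t values represented by the mi-bit groups (i is an integer from 1 to t). The table means then comprises t partial table means Ti, each storing in advance, for each mi-bit value, the remainder modulo p of that value multiplied by 2^(k+x) (where x = m1 + ... + m(i-1)). The addition means adds the t remainders read from the t partial table means Ti to the lower data to obtain a new intermediate number.
【０１２９】
The table means thus has t partial table means Ti corresponding to the values of the t partial bit groups into which the m bits are divided. Having several partial table means Ti reduces the number of remainders stored in each.
The modular multiplication device further comprises accumulation means and correction means. The accumulation means comprises a third register that holds 0 as its initial value, and an adder that adds the partial product C(i)*b[s*i+s-1:s*i] to the accumulated value held in the third register and outputs the sum to the third register as the new accumulated value. The correction means comprises correction-value holding means that holds a correction value equal to an integer multiple of p, and correction control means that causes the adder to subtract the held correction value when the accumulated value in the third register is at or above a predetermined value.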
【０１３０】
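With this configuration, the correction control means causes the adder to subtract the correction value from the accumulated value when the accumulated value is equal to or greater than the predetermined value, which prevents overflow in the third register.
The third register has a sign bit. When the accumulated value in the third register is positive, the correction control means causes the adder to subtract the correction value at the same time as it adds the accumulated value and the partial product. The correction value is an integer multiple of p whose absolute value is no greater than the maximum value of the partial product, (t+1)(2^s - 1)p.
[0131]
With this configuration, the correction control means causes the adder to subtract the correction value from the accumulated value when the sign bit of the accumulated value held in the third register is 1 (indicating that the accumulated value is positive), which prevents overflow in the third register. Because the correction value is an integer multiple of p no greater than the maximum value of the partial product, (t+1)(2^s - 1)p, the final accumulated value is either smaller than p or close to p.
[0132]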
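The modulus p satisfies p = 2^k - α, and each partial table means Ti stores the remainder as k3-bit data, where k3 is the number of bits of t * 2^m * α and α is a constant determined so that k3 is smaller than k.
With this configuration, p = 2^k - α holds, and α is a constant whose upper bound is set so that k3 is smaller than k. The remainders stored in the partial table means Ti can therefore be limited to k3 bits, which reduces the number of output bits from the partial tables. Because the remainders output from the partial table means Ti are at most k3 bits wide, the bit width of the adding unit can be limited accordingly.
[0133]
Each partial table means Ti has 2^mi entries corresponding to the values from 0 to (2^mi - 1) expressible in mi bits, and the j-th entry (j from 0 to 2^mi - 1) stores j * 2^(m1+...+m(i-1)) * α.
With this configuration, the partial table means Ti can limit the number of stored entries to 2^mi - 1, which reduces the size of the partial table.
[0134]
If α is u bits long, the number of bits of the j-th entry, j * 2^(m1+...+m(i-1)) * α, is at most the number of bits of j plus m1 + m2 + ... + m(i-1) plus u. The adding means that adds the remainders output from the partial table means Ti can therefore be limited in bit width.
[0135]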
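The remainder multiplication apparatus of the present invention calculates a number congruent to the remainder by the modulus p (p is k-bit data) of the product of a multiplicand and a multiplier, and comprises: output means that outputs, in order from the low-order side, the s-bit partial multipliers obtained by dividing the multiplier into groups of s bits (s is an integer of 2 or more); first calculation means that carries the multiplicand according to the position of each partial multiplier and calculates a number congruent to the remainder by the modulus p of the carried multiplicand (hereinafter, the intermediate number); second calculation means that calculates, as a partial product, the product of the partial multiplier output by the output means and the intermediate number calculated for that partial multiplier by the first calculation means; accumulation means that accumulates the partial products calculated by the second calculation means; correction means that corrects the accumulated value so that it does not exceed a predetermined number of bits by adding or subtracting an integer multiple of p; and control means that causes the calculation of the intermediate number by the first calculation means, the calculation of the partial product by the second calculation means, the accumulation by the accumulation means, and the correction by the correction means to be repeated until the output means has output all the partial multipliers. The first calculation means stores in advance, for each value expressible in m bits (m is an integer equal to or greater than s), the remainder by the modulus p of 2^k times that value; outputs the multiplicand as the intermediate number in the first iteration; and, from the second iteration on, carries the previously output intermediate number by s bits, reads the table means for the m bits above the lower k bits of the carried intermediate number, and calculates a new intermediate number by adding the value read to the lower k bits.
[0136]
With this configuration, from the second iteration onward the first calculation means reads from the table means the remainder corresponding to the upper m bits, excluding the lower k bits, of the intermediate number after the s-bit carry, and adds that remainder to the lower k bits, thereby calculating, as the intermediate number, the remainder by the modulus p of the carried intermediate number or a value congruent to it. The first calculation means thus calculates the remainder at high speed.
[0137]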
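The second calculation means calculates, for each s-bit partial multiplier, the partial product of the partial multiplier and the intermediate number. Because partial products are calculated for multi-bit partial multipliers, only as many partial products as there are partial multipliers need to be calculated, so the second calculation means iterates fewer times and is faster than the product calculation unit 2 of the prior art.
The control means controls pipeline processing comprising first to third stages: in the first stage it causes the output means to output a partial multiplier and the first calculation means to output an intermediate number; in the second stage it causes the second calculation means to calculate the partial product; and in the third stage it causes the accumulation means to accumulate and the correction means to correct.
[0138]
With this configuration, the remainder multiplication apparatus performs pipeline processing, so the processing in each component is carried out efficiently and the computation becomes faster.
The output means comprises multiplier holding means that initially holds the multiplier and outputs the lower s bits of the held value as a partial multiplier, and shift means that shifts the value held in the multiplier holding means by s bits toward the lower side and outputs the shifted value to the multiplier holding means, where it is held.
[0139]
With this configuration, the output means can be built simply from a register that holds the multiplier and a shifter that shifts the held multiplier by s bits toward the lower side.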
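The second generation means comprises first to (s-1)-th shift means, the i-th shift means (i is an integer from 1 to (s-1)) carrying the intermediate number calculated by the first calculation means by an i-bit left shift. The first generation means comprises s-th shift means that carries the intermediate number calculated by the first calculation means by an s-bit left shift, complement generation means that generates the one's complement of the intermediate number, and constant output means that outputs the constant 1. The second adding means comprises selection means and an adder. When all bits of the partial multiplier are determined to be 1, the selection means selects the output of the s-th shift means, the one's complement generated by the complement generation means, and the constant 1. Otherwise, it selects the intermediate number if the 2^0 bit of the partial multiplier is "1" and selects the carry result of the i-th shift means if the 2^i bit of the partial multiplier is "1". The adder calculates the partial product by adding the values selected by the selection means.
[0140]
With this configuration, the first to (s-1)-th shift means are implemented by shifters that perform left shifts of 1 to (s-1) bits. The s-th shift means is implemented by a shifter that performs an s-bit left shift. The complement generation means is implemented by inverters. The first and second generation means can thus be built from general-purpose hardware elements. These components generate in advance the candidate values to be added by the adding means. The selection means selects, according to the value of the partial multiplier, which candidate values the adding means should add. The adding means calculates the partial product by adding the selected candidates. With this configuration, the second calculation means can calculate the partial product of a multi-bit intermediate number and a partial multiplier at high speed.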
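[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the schematic configuration of the remainder multiplication apparatus 100 in the first embodiment.
[FIG. 2] A more detailed configuration diagram of the remainder calculation unit 12.
[FIG. 3] (a) shows the stored contents of the ROM table 309.
(b) shows the stored contents of the ROM table 308.
[FIG. 4] A more detailed configuration of the multiplier division unit 11.
[FIG. 5] A more detailed configuration of the partial product calculation unit 13.
[FIG. 6] A detailed configuration of the accumulation unit 14 and the correction unit 15.
[FIG. 7] A block diagram showing the schematic configuration of the remainder multiplication apparatus 200 in the second embodiment.
[FIG. 8] A more detailed configuration of the remainder calculation unit 82.
[FIG. 9] The stored contents of the ROM table 904.
[FIG. 10] A more detailed configuration of the multiplier division unit 81.
[FIG. 11] A more detailed configuration of the partial product calculation unit 83.
[FIG. 12] A detailed configuration of the operation value selectors 840, 841, and 842.
[FIG. 13] (a) shows the correspondence between the partial multiplier b(i) input to the control circuit 1306 and the input/output logic of the control circuit 1306.
(b) shows the correspondence between the partial multiplier b(i) input to the control circuit 1305 and the input/output logic of the control circuit 1305.
(c) shows the correspondence between the partial multiplier b(i) input to the control circuit 1304 and the input/output logic of the control circuit 1304.
[FIG. 14] A more detailed configuration of the accumulation unit 84 and the correction unit 85.
[FIG. 15] The partial product calculation unit 83 may be configured as shown in FIG. 15.
[FIG. 16] The input/output logic of the control circuit common to the operation value selectors 850, 851, and 852.
[FIG. 17] The configuration of a conventional remainder multiplication apparatus.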
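[Explanation of Reference Numerals]
100 Remainder multiplication apparatus
11 Multiplier division unit
12 Remainder calculation unit
13 Partial product calculation unit
14 Accumulation unit
15 Correction unit
16 Control unit
308 ROM table
309 ROM table
301 Selector
302 Register A
303 Shifter
304 Selector
310 Adder
311 Remainder calculation unit
[0001]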
BACKGROUND OF THE INVENTION
The present invention relates to a modular multiplication apparatus that performs modular multiplication with a high-speed and small-circuit configuration.
[0002]
[Prior art]
In recent years, with the development of encryption technology in the communication field, there is an increasing demand for various arithmetic units used for encryption.
For example, elliptic curve cryptosystems use a modular multiplication device to perform modular multiplication. Here, modular multiplication means computing (a * b) mod p for three integers a, b, and p, where a is the multiplicand, b is the multiplier, and p is the modulus.
[0003]
FIG. 17 shows a configuration of a conventional modular multiplication apparatus.
For simplicity, a, b, and p in the remainder multiplication apparatus 1 of the figure are assumed to be small values that can be represented in 4 bits.
The remainder multiplication apparatus 1 includes a product calculation unit 2, a remainder calculation unit 3, and a control unit 4.
The product calculation unit 2 includes a multiplicand register 5 (4 bits), an adder 6 (4 bits), and a product register 7 (9 bits), and calculates the product a * b of the multiplicand a and the multiplier b under the control of the control unit 4.
[0004]
The processing procedure of the control unit 4 for the product calculation unit 2 is shown below.
Step 1: As an initial setting, the multiplicand a is held in the multiplicand register 5, the multiplier b is held in the lower 4 bits of the product register 7 (9 bits), and 0 is held in the upper 5 bits of the product register 7.
Step 2: The least significant bit of the product register 7 is tested. If it is 1, the adder 6 adds the value held in the 4 bits that remain when the most significant bit is excluded from the upper 5 bits of the product register 7 to the multiplicand a held in the multiplicand register 5, and the sum is held in the upper 4 bits of the product register 7.
[0005]
Step 3: Shift product register 7 right by 1 bit.
Step 4: The process of steps 2-3 is repeated four times.
As described above, when the loop processing in steps 2 to 3 is repeated four times, the product a * b is held in the product register 7.
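Steps 1 to 4 can be sketched in software as follows. This is a minimal model under stated assumptions, not the circuit of FIG. 17: the function name shift_add_multiply and the parameter n are hypothetical, and the carry out of the 4-bit addition is assumed to land in the top bit of the 9-bit product register.

```python
def shift_add_multiply(a: int, b: int, n: int = 4) -> int:
    """Model of steps 1-4 for n-bit operands (n = 4 in the prior-art example)."""
    low_mask = (1 << n) - 1
    # Step 1: the lower n bits hold the multiplier b, the upper n+1 bits hold 0.
    product = b & low_mask
    for _ in range(n):                       # Step 4: repeat steps 2-3 n times
        if product & 1:                      # Step 2: test the least significant bit
            upper = (product >> n) + a       # add a to the upper part (carry kept, by assumption)
            product = (upper << n) | (product & low_mask)
        product >>= 1                        # Step 3: shift right by 1 bit
    return product


print(shift_add_multiply(13, 11))  # 13 * 11
```

Each iteration consumes one bit of b, which is why the loop count grows linearly with the operand width.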
The remainder calculation unit 3 includes a modulus register 8 (4 bits), an adder 9 (5 bits), and a remainder register 10 (13 bits). Under the control of the control unit 4, it calculates the remainder (a * b) mod p of the product a * b by the modulus p in the same manner as long division done by hand. The remainder calculated by the remainder calculation unit 3 is a signed 5-bit value.
[0006]
The processing procedure of the control unit 4 for the remainder calculation unit 3 is shown below.
Step 5: As an initial setting, the modulus p is held in the modulus register 8, the product a * b held in the product register 7 is held in the lower 8 bits of the remainder register 10, and 0 is held in the upper 5 bits.
Step 6: The remainder register 10 is shifted left by 1 bit.
[0007]
Step 7: The adder 9 subtracts the modulus p held in the modulus register 8 from the value held in the upper 5 bits of the remainder register 10, and holds it in the upper 5 bits of the remainder register 10.
Step 8: The sign of the upper 5 bits of the remainder register 10 is determined from the most significant bit. If the value is negative (the most significant bit is 1), the adder 9 adds the modulus p held in the modulus register 8 to the upper 5 bits of the remainder register 10, restoring the remainder register 10 to the value it held before the subtraction in step 7.
[0008]
Step 9: The process of steps 6-8 is repeated 8 times.
As described above, when the loop of steps 6 to 8 has been repeated eight times, the remainder (a * b) mod p is held in the upper 5 bits of the remainder register 10.
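Steps 5 to 9 amount to a restoring division that keeps only the remainder. The following is a minimal sketch under that reading; the function name restoring_remainder and the parameter n are hypothetical, and a Python integer stands in for the 13-bit remainder register.

```python
def restoring_remainder(product: int, p: int, n: int = 4) -> int:
    """Model of steps 5-9: remainder of a 2n-bit product by an n-bit modulus p."""
    low_bits = 2 * n                          # the product occupies the lower 2n bits
    low_mask = (1 << low_bits) - 1
    reg = product & low_mask                  # Step 5: the upper n+1 bits start at 0
    for _ in range(low_bits):                 # Step 9: repeat steps 6-8 2n times
        reg <<= 1                             # Step 6: shift left by 1 bit
        upper = (reg >> low_bits) - p         # Step 7: subtract p from the upper part
        if upper < 0:                         # Step 8: negative, so restore
            upper += p
        reg = (upper << low_bits) | (reg & low_mask)
    return reg >> low_bits                    # the upper part now holds the remainder


print(restoring_remainder(13 * 11, 7))  # (13 * 11) mod 7
```

The loop runs once per bit of the product, twice as often as the multiplication loop, which matches the 8 iterations stated for the 4-bit example.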
In this way, in the product calculation unit 2 the remainder multiplication apparatus 1 calculates the partial product of the multiplicand a and each bit of the multiplier b, starting from the low-order bit of b, and obtains the product a * b by accumulating the partial products in sequence. In the remainder calculation unit 3, it shifts the product a * b calculated by the product calculation unit 2 so that the modulus p is aligned with the product from its upper digits down to its lower digits, and subtracts p from a * b at each position to obtain the remainder (a * b) mod p.
[0009]
[Problems to be solved by the invention]
In the remainder multiplication apparatus 1 of the above prior art, a, b, and p are as small as 4 bits, so the adder 6 and the adder 9 need only handle additions of about 4 bits, and the loop counts are small: 4 iterations in the product calculation unit 2 and 8 in the remainder calculation unit 3. In contrast, the values a, b, and p used in actual elliptic curve cryptography are considerably larger, for example 160 bits. If the conventional remainder multiplication apparatus 1 is adapted to 160-bit a, b, and p, the adder 6 and the adder 9 must perform 160-bit additions, which increases the circuit scale. In addition, the loop must be repeated 160 times in the product calculation unit 2 and 320 times in the remainder calculation unit 3, so the computation takes a long time.
[0010]
Thus, when the prior-art remainder multiplication apparatus 1 is adapted to values of arbitrary size, the more bits the values have, the more bits the adders must add, which enlarges the circuit, and the more loop iterations are needed, which lengthens the computation time.
Another prior-art technique that addresses the computation time of the remainder multiplication apparatus 1 is the modular multiplication method described in "United States Patent 5,144,574 Modular multiplication method and the system for processing data".
[0011]
This method takes the multiplier b 2 bits at a time from the high-order side, calculates the partial product of the multiplicand a and each 2-bit group, calculates a partial remainder from each partial product by subtracting an integer multiple of the modulus p, and accumulates the partial remainders.
[0012]
Whereas the remainder multiplication apparatus 1 calculates a partial product of the multiplicand a for each single bit of the multiplier b, an apparatus using this method calculates a partial product of the multiplicand a for every 2 bits of the multiplier b, so fewer loop iterations are needed to calculate the partial products and the computation time is shortened.
As described above, this method aims to shorten the computation time mainly by calculating partial products of the multiplicand a with 2 bits of the multiplier b at a time. However, the multiplier b is still processed only 2 bits at a time, so when a, b, and p are large values the reduction in computation time is limited.
[0013]
In view of the above problems, an object of the present invention is to provide a modular multiplication apparatus that performs high-speed computation while suppressing the circuit scale.
[0014]
[Means for Solving the Problems]
In order to solve the above problem, the remainder multiplication apparatus of the present invention is a modular multiplication device that obtains a value congruent to the remainder by the modulus p (p is k-bit data) of the product of the multiplicand a and the multiplier b (b is kb-bit data) by calculating the accumulated value of the following expression:
Accumulated value = Σ C(i) * b[s*i+s-1 : s*i]
(Here Σ denotes accumulation from i = 0 to [[kb/s]], where [[kb/s]] is the integer part of the quotient kb/s. C(i) is defined by a recurrence: C(i) = a when i = 0, and C(i) ≡ (C(i-1) * 2^s) mod p when i >= 1, where ≡ indicates that both sides are congruent modulo p. b[s*i+s-1 : s*i] is the s-bit partial multiplier consisting of the bits of the multiplier b from the 2^(s*i+s-1) place down to the 2^(s*i) place.)
The apparatus comprises table means that stores in advance the remainder by the modulus p of 2^k times each value expressible in m bits, and intermediate number calculation means that outputs the multiplicand a as the intermediate number C(0) the first time (i = 0) and, from the second time on (i > 0), carries the previously output intermediate number C(i-1) by s bits, reads from the table means the remainder for the upper m bits of the carried intermediate number (the bits above the lower k bits), and adds the read remainder to the lower k bits to calculate a new intermediate number C(i). The apparatus is configured to calculate the accumulated value by accumulating the partial products C(i) * b[s*i+s-1 : s*i] of each calculated intermediate number C(i) and the corresponding partial multiplier b[s*i+s-1 : s*i].
[0015]
The intermediate number calculation means comprises: first holding means that holds the intermediate number; carry means that carries the intermediate number held in the first holding means by s bits in correspondence with the partial multiplier b[s*i+s-1 : s*i]; dividing means that divides the carried intermediate number into upper data, which is the m bits above the lower k bits, and lower data, which is the lower k bits; reading means that reads from the table means the remainder by the modulus p for the upper data produced by the dividing means; and adding means that obtains a new intermediate number by adding the remainder read from the table means and the lower data produced by the dividing means. The first holding means updates its contents to the new intermediate number each time the adding means obtains one, outputs the multiplicand as the intermediate number the first time, and outputs the updated intermediate number from the second time on.
[0016]
The remainder multiplication apparatus further includes post-processing means that obtains the remainder of the accumulated value by the modulus p using the dividing means, the reading means, and the adding means.
The table means has a memory element that receives an address corresponding to each value expressible in m bits and stores in advance, in the storage area indicated by that address, the remainder by the modulus p of 2^k times the m-bit value corresponding to the address.
[0017]
The m bits are divided into lower m1 bits and upper m2 bits (m = m1 + m2), and each value expressible in m bits corresponds to a combination of a value expressible in m1 bits and a value expressible in m2 bits. The table means comprises first partial table means that stores in advance, for each value expressible in m1 bits, the remainder by the modulus p of 2^k times that value, and second partial table means that stores in advance, for each value expressible in m2 bits, the remainder by the modulus p of 2^(k+m1) times that value. The adding means obtains a new intermediate number by adding the remainders read from the first and second partial tables and the lower data.
The m bits are divided, from the lower side, into t (3 <= t <= m) groups of partial bits m1, ..., mt, and each value expressible in m bits corresponds to a combination of the t values expressible in mi bits (i is an integer from 1 to t). The table means comprises t partial table means Ti, each of which stores in advance, for each value expressible in the mi partial bits, the remainder by the modulus p of 2^(k+x) times that value (where x = m1 + ... + m(i-1)). The adding means obtains a new intermediate number by adding the t remainders read from the t partial table means Ti and the lower data.
[0018]
The remainder multiplication apparatus further includes accumulation means and correction means. The accumulation means comprises a third register that holds 0 as an initial value, and an adder that adds the partial product C(i) * b[s*i+s-1 : s*i] to the accumulated value held in the third register and outputs the sum to the third register, where it is held as the new accumulated value. The correction means comprises correction value holding means that holds a correction value that is an integer multiple of p, and correction control means that, when the accumulated value held in the third register is equal to or greater than a predetermined value, causes the adder to subtract the correction value held in the correction value holding means.
[0019]
The third register has a sign bit. When the accumulated value in the third register is positive, the correction control means causes the adder to subtract the correction value at the same time as it adds the accumulated value and the partial product. The correction value is an integer multiple of p whose absolute value is no greater than the maximum value of the partial product, (t+1)(2^s - 1)p.
The modulus p satisfies p = 2^k - α, and each partial table means Ti stores the remainder as k3-bit data, where k3 is the number of bits of t * 2^m * α and α is a constant determined so that k3 is smaller than k.
[0020]
Each partial table means Ti has 2^mi entries corresponding to the values from 0 to (2^mi - 1) expressible in mi bits, and the j-th entry (j from 0 to 2^mi - 1) stores j * 2^(m1+...+m(i-1)) * α.
A remainder multiplication apparatus according to the present invention calculates a number congruent to the remainder by the modulus p (p is k-bit data) of the product of a multiplicand and a multiplier, and comprises: output means that outputs, in order from the low-order side, the s-bit partial multipliers obtained by dividing the multiplier into groups of s bits (s is an integer of 2 or more); first calculation means that carries the multiplicand according to the position of each partial multiplier and calculates a number congruent to the remainder by the modulus p of the carried multiplicand (hereinafter, the intermediate number); second calculation means that calculates, as a partial product, the product of the partial multiplier output by the output means and the intermediate number calculated for that partial multiplier by the first calculation means; accumulation means that accumulates the partial products calculated by the second calculation means; correction means that corrects the accumulated value so that it does not exceed a predetermined number of bits by adding or subtracting an integer multiple of p; and control means that causes the calculation of the intermediate number by the first calculation means, the calculation of the partial product by the second calculation means, the accumulation by the accumulation means, and the correction by the correction means to be repeated until the output means has output all the partial multipliers. The first calculation means stores in advance, for each value expressible in m bits (m is an integer equal to or greater than s), the remainder by the modulus p of 2^k times that value; outputs the multiplicand as the intermediate number in the first iteration; and, from the second iteration on, carries the previously output intermediate number by s bits, reads the table means for the m bits above the lower k bits of the carried intermediate number, and calculates a new intermediate number by adding the value read to the lower k bits.
[0021]
The control means controls pipeline processing comprising first to third stages: in the first stage it causes the output means to output a partial multiplier and the first calculation means to output an intermediate number; in the second stage it causes the second calculation means to calculate the partial product; and in the third stage it causes the accumulation means to accumulate and the correction means to correct.
The output means comprises multiplier holding means that initially holds the multiplier and outputs the lower s bits of the held value as a partial multiplier, and shift means that shifts the value held in the multiplier holding means by s bits toward the lower side and outputs the shifted value to the multiplier holding means, where it is held.
[0022]
The second generation means comprises first to (s-1)-th shift means, the i-th shift means (i is an integer from 1 to (s-1)) carrying the intermediate number calculated by the first calculation means by an i-bit left shift. The first generation means comprises s-th shift means that carries the intermediate number calculated by the first calculation means by an s-bit left shift, complement generation means that generates the one's complement of the intermediate number, and constant output means that outputs the constant 1. The second adding means comprises selection means and an adder. When all bits of the partial multiplier are determined to be 1, the selection means selects the output of the s-th shift means, the one's complement generated by the complement generation means, and the constant 1. Otherwise, it selects the intermediate number if the 2^0 bit of the partial multiplier is "1" and selects the carry result of the i-th shift means if the 2^i bit of the partial multiplier is "1". The adder calculates the partial product by adding the values selected by the selection means.
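The selection rule in the preceding paragraph can be illustrated with a short software model. It is only a sketch under stated assumptions: the function name partial_product is hypothetical, and the adder width used for the one's complement (width) is chosen here for illustration rather than taken from the specification. The all-ones case relies on the identity C * (2^s - 1) = (C << s) + (one's complement of C) + 1, taken modulo 2^width.

```python
def partial_product(c: int, b_part: int, s: int = 8) -> int:
    """Partial product c * b_part formed from shifted copies of c, as selected above."""
    if b_part == (1 << s) - 1:
        # All bits of the partial multiplier are 1: select (c << s), the one's
        # complement of c, and the constant 1, then add them in a fixed-width adder.
        width = c.bit_length() + s + 1        # assumed adder width, wide enough for the result
        mask = (1 << width) - 1
        ones_complement = ~c & mask           # inverter output
        return ((c << s) + ones_complement + 1) & mask
    total = 0
    for i in range(s):
        if (b_part >> i) & 1:                 # bit 2^i set: select c shifted left by i bits
            total += c << i
    return total


print(partial_product(5, 255) == 5 * 255)
```

In the all-ones case three values are added instead of s, which is the saving the selection rule provides.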
[0023]
DETAILED DESCRIPTION OF THE INVENTION
<First Embodiment>
The remainder multiplication apparatus in this embodiment will be described below with reference to the drawings.
The remainder multiplication apparatus according to the present embodiment is used for elliptic curve cryptography computations. When a multiplicand A (160 bits) and a multiplier B (160 bits) are input, it calculates the remainder R (= A * B mod P) of the product A * B by the modulus P (a 160-bit prime) shown in (Expression 1), or a value equal to R in the residue field. Here the modulus P satisfies P = 2^160 - α, where α is a value of 54 bits or less.
[0024]
This remainder multiplication apparatus is configured based on (Expression 2), which is obtained by transforming (Expression 1). The details of the transformation from (Expression 1) to (Expression 2) are given for reference at the end of the description of the embodiments.
[0025]
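[Equation 3]
(Expression 1) R = A * B mod P
[0026]
[Equation 4]
(Expression 2) R1 = Σ C(i) * b(i) (Σ is an accumulation from i = 1 to k/s; C(1) = A, and C(i) ≡ C(i-1) * 2^s mod P for 2 <= i <= k/s)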
[0027]
<Explanation of arithmetic expression>
In (Expression 2), R1 is a value equal to R in the residue field of the modulus P. That R and R1 are equal in the residue field of the modulus P means that the difference between R1 and R is an integer multiple of P.
R1 is calculated by accumulating C(i) * b(i) from i = 1 to k/s. Here, k is the number of bits of the multiplier B, and s is the number of bits of each partial multiplier when the multiplier B is divided into partial multipliers of several bits each. That is, k/s equals the number of partial multipliers. k and s are chosen so that k is divisible by s.
[0028]
b(i) = b[s*i-1 : s*i-s] denotes the partial multiplier obtained by dividing the multiplier B into s-bit groups from the low-order side: b(1) = b[s-1 : 0] when i = 1, b(2) = b[2s-1 : s] when i = 2, b(3) = b[3s-1 : 2s] when i = 3, and so on. Here the k-bit multiplier B is written in binary as b(k-1) b(k-2) ... b2 b1 b0, and b[x : y] denotes the bit string from the (x+1)-th bit to the (y+1)-th bit of B counted from the low-order end (where x > y). For example, b[15 : 8] denotes the partial multiplier from the 16th bit to the 9th bit from the low-order end of the multiplier B, written b15 b14 b13 b12 b11 b10 b9 b8 in binary.
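The b[x : y] notation corresponds to a shift followed by a mask. A minimal sketch, with hypothetical helper names bit_slice and partial_multiplier:

```python
def bit_slice(b: int, x: int, y: int) -> int:
    """Return b[x : y], the bits of b from bit x down to bit y (bit 0 is the lowest)."""
    return (b >> y) & ((1 << (x - y + 1)) - 1)


def partial_multiplier(b: int, i: int, s: int) -> int:
    """Return b(i) = b[s*i-1 : s*i-s], for i counted from 1."""
    return bit_slice(b, s * i - 1, s * i - s)


B = 0xABCD
print(hex(bit_slice(B, 15, 8)), hex(partial_multiplier(B, 1, 8)))
```

Concatenating b(k/s), ..., b(2), b(1) from high to low reproduces B, which is what allows the product to be split into partial products.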
[0029]
C(i) is defined by the recurrence C(1) = A (A is the multiplicand) when i = 1, and C(i) ≡ C(i-1) * 2^s mod P when 2 <= i <= k/s. That is, C(i) changes successively as C(1) = A, C(2) ≡ A * 2^s mod P, C(3) ≡ (A * 2^s mod P) * 2^s mod P, and so on. In this recurrence, 2^s corresponds to the s-bit carry of the partial multiplier b(i). C(i) is the remainder in the modulus P of the value obtained by carrying the previous value C(i-1) by s bits. Hereinafter, C(i) is referred to as the intermediate number.
[0030]
Thus, in (Expression 2), R1 is obtained by recursively calculating, for i = 1 to k/s, the intermediate number C(i) as the remainder of the multiplicand A carried by s bits at a time, dividing the multiplier B into s-bit partial multipliers b(i) from the low-order side, and calculating and accumulating the partial products C(i) * b(i) over k/s iterations. Hereinafter, repeating the operation k/s times to calculate R1 is referred to as the iterative process.
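The iterative process can be checked numerically with the sketch below. The function name congruent_product is hypothetical, the value of α is an illustrative choice that is not taken from the specification, and Python's % operator stands in for the table-based remainder calculation described later.

```python
def congruent_product(a: int, b: int, p: int, k: int = 160, s: int = 8) -> int:
    """Accumulate C(i) * b(i) for i = 1 .. k/s; the result R1 is congruent to a * b mod p."""
    c = a                                     # C(1) = A
    r1 = 0
    for i in range(1, k // s + 1):
        b_i = (b >> (s * (i - 1))) & ((1 << s) - 1)   # b(i) = b[s*i-1 : s*i-s]
        r1 += c * b_i                         # accumulate the partial product
        c = (c << s) % p                      # C(i+1) = C(i) * 2^s mod P
    return r1


ALPHA = 2**31 + 1                             # illustrative alpha (assumption)
P = 2**160 - ALPHA
A = 0x0123456789ABCDEF0123456789ABCDEF01234567
B = 0xFEDCBA9876543210FEDCBA9876543210FEDCBA98
print(congruent_product(A, B, P) % P == (A * B) % P)
```

R1 itself is generally larger than P; only its difference from R is guaranteed to be a multiple of P.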
<Configuration of Remainder Multiplier>
FIG. 1 is a block diagram illustrating a schematic configuration of a remainder multiplication apparatus 100 according to the present embodiment.
[0031]
In FIG. 1, the remainder multiplication apparatus 100 includes a multiplier division unit 11, a remainder calculation unit 12, a partial product calculation unit 13, an accumulation unit 14, a correction unit 15, and a control unit 16. The remainder multiplication apparatus 100 is configured based on (Expression 2) when k = 160 and s = 8.
The multiplier division unit 11 sequentially outputs the 8-bit partial multipliers b(i) = b[8i-1 : 8i-8] (i = 1, ..., 19, 20) from the low-order side of the multiplier B.
[0032]
The remainder calculation unit 12 recursively calculates the intermediate number C(i) (i = 1, ..., 19, 20). The first time (i = 1), it outputs the multiplicand A as the intermediate number C(1). From the second time on (i = 2, 3, ..., 20), it carries the previous intermediate number C(i-1) by 8 bits and calculates the remainder of the carry value C(i-1) * 2^8 by the modulus P as the intermediate number C(i). More specifically, it calculates A the first time, A * 2^8 mod P the second time, (A * 2^8 mod P) * 2^8 mod P the third time, ((A * 2^8 mod P) * 2^8 mod P) * 2^8 mod P the fourth time, and so on. Here, C(i) calculated by the remainder calculation unit 12 need only be congruent to the carry value C(i-1) * 2^8 and may be a value larger than P.
[0033]
The remainder calculation unit 12 includes a ROM table 308 and a ROM table 309 so that the remainder can be calculated at high speed. These ROMs, described in detail later, store in advance the remainders by the modulus P corresponding to the part of the carry value C(i-1) * 2^8 that remains when the lower 160 bits are excluded, that is, the part that exceeds 160 bits as a result of the carry. For the part of the carry value C(i-1) * 2^8 that exceeds 160 bits, the remainder calculation unit 12 refers to these ROMs and calculates the intermediate number C(i) by adding the remainder obtained from the ROMs to the lower 160 bits of the carry value.
[0034]
The partial product calculation unit 13 calculates and outputs the partial product C(i) * b(i) of the intermediate number C(i) output from the remainder calculation unit 12 and the partial multiplier b(i) output from the multiplier division unit 11.
The accumulation unit 14 accumulates the partial product C (i) * b (i) output from the partial product calculation unit 13.
[0035]
The correction unit 15 corrects the accumulated value by adding or subtracting an integer multiple of P according to the accumulation result of the accumulation unit 14, thereby preventing overflow of the accumulated value in the accumulation unit 14.
The control unit 16 controls the iterative process, which is performed 20 times for i = 1 to 20, in the remainder calculation unit 12, the partial product calculation unit 13, the accumulation unit 14, and the correction unit 15. The control unit 16 performs pipeline control in which the first stage is the calculation of the intermediate number by the remainder calculation unit 12 and the output of the partial multiplier by the multiplier division unit 11, the second stage is the calculation of the partial product by the partial product calculation unit 13, and the third stage is the calculation of the accumulated value by the accumulation unit 14 and the correction of the accumulated value by the correction unit 15. Further, after the 20 iterations, if the final accumulated value in the accumulation unit 14 is larger than P, the control unit 16 inputs the accumulated value to the remainder calculation unit 12 to calculate its remainder by P.
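The three stages can be modelled sequentially as follows. This is a rough sketch under several assumptions that are not taken from the specification: the value of ALPHA, the choice of 2^8 * P as the correction value, the rule of subtracting it whenever the accumulated value reaches it, and the use of % in place of the ROM-based remainder calculation. The stages run one after another here, whereas the apparatus overlaps them in a pipeline.

```python
K, S = 160, 8
ALPHA = 2**31 + 1                  # illustrative alpha (assumption)
P = 2**K - ALPHA
CORRECTION = (1 << S) * P          # assumed correction value: an integer multiple of P


def modmul_accumulate(a: int, b: int) -> int:
    """Return an accumulated value congruent to a * b mod P, kept below CORRECTION."""
    c, acc = a, 0                              # C(1) = A, accumulator starts at 0
    for i in range(K // S):
        b_i = (b >> (S * i)) & ((1 << S) - 1)  # stage 1: next partial multiplier
        partial = c * b_i                      # stage 2: partial product
        acc += partial                         # stage 3: accumulate ...
        if acc >= CORRECTION:                  # ... and correct by a multiple of P
            acc -= CORRECTION
        c = (c << S) % P                       # stage 1 of the next iteration
    return acc


A = 0x0123456789ABCDEF0123456789ABCDEF01234567
B = 0xFEDCBA9876543210FEDCBA9876543210FEDCBA98
print(modmul_accumulate(A, B) % P == (A * B) % P)
```

Because the correction value is a multiple of P, subtracting it changes the accumulated value without changing its residue, which is why the bound on the register width can be enforced at every step.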
[0036]
FIG. 2 shows a more detailed configuration diagram of the remainder calculation unit 12.
In the figure, the remainder calculation unit 12 includes a selector 301, a register A 302, a shifter 303, a selector 304, and a remainder calculation unit 311.
Under the control of the control unit 16, the selector 301 outputs the externally input multiplicand A to the register A302 as the initial setting (i = 1) and, after the initial setting (2 <= i <= 20), outputs the intermediate number C(i) input from the remainder calculation unit 311 to the register A302.
[0037]
The register A302 is a 161-bit register. As the initial setting (i = 1) it holds the multiplicand A (= C(1)) input from the outside via the selector 301, and after the initial setting (2 <= i <= 20) it holds the intermediate number C(i) input from the adder 310 via the selector 301. The output side of the register A302 is connected to the partial product calculation unit 13 and the shifter 303. On an instruction from the control unit 16, the register A302 replaces the intermediate number it holds with the newly input one and outputs it to the partial product calculation unit 13 and the shifter 303.
[0038]
The shifter 303 is a 169-bit-wide shifter that shifts its input left by 8 bits. It shifts the 161-bit intermediate number C(i-1) input from the register A302 left by 8 bits and outputs the 169-bit carry value C(i-1) * 2^8. The shifter 303 thus carries the intermediate number C(i-1) by 8 bits.
[0039]
During the iterative process, the selector 304 outputs the carry value C(i-1) * 2^8 input from the shifter 303 to the remainder calculation unit 311; after the iterative process, it outputs the accumulated value input from the accumulation unit 14 to the remainder calculation unit 311.
The remainder calculation unit 311 includes a ROM table 308, a ROM table 309, and an adder 310, and outputs the remainder by the modulus P, or a value congruent to it, of the carry value C(i-1) * 2^8 input from the shifter 303 via the selector 304 or of the accumulated value input from the accumulation unit 14. This remainder or congruent value is 161 bits.
[0040]
The remainder calculation unit 311 also includes a bus 305 that feeds the lower 160 bits of the 169-bit value input from the selector 304 to the adder 310, a bus 306 that feeds bits 161 to 165 to the ROM table 308, and a bus 307 that feeds bits 166 to 169 to the ROM table 309. These buses divide the carry value C(i-1) * 2^8 or the accumulated value, from the low-order side, into a 160-bit part (hereinafter, the lower part), a 5-bit part, and a 4-bit part, which are input to the adder 310, the ROM table 308, and the ROM table 309, respectively.
[0041]
FIGS. 3(a) and 3(b) show the stored contents of the ROM table 309 and the ROM table 308.
FIG. 3(b) shows the stored contents of the ROM table 308. The ROM table 308 takes each 5-bit value from (00000)₂ to (11111)₂ as an input value and stores, as the associated output value, the remainder by the modulus P of that value multiplied by 2^160. In FIG. 3(b), α satisfies P = 2^160 - α, so α = 2^160 mod P. Therefore the output values 0, 1 * α, 2 * α, ..., 31 * α in the figure are the remainders by the modulus P of the input values multiplied by 2^160. Since α is a value of 54 bits or less and the input value is 5 bits, the output of the ROM table 308 is at most 59 bits. When a read signal is input from the control unit 16, the ROM table 308 reads and outputs the remainder corresponding to the 5-bit part input from the bus 306.
[0042]
FIG. 3A shows the stored contents of the ROM table 309. The ROM table 309 takes each 4-bit value (0000)₂ to (1111)₂ as an input value and stores, as the associated output value, the remainder modulo P of that value multiplied by 2¹⁶⁵. Because α = 2¹⁶⁰ mod P, the output values 0, 1 * 32 * α, 2 * 32 * α, ..., 15 * 32 * α in FIG. 3A are the remainders modulo P of the input values multiplied by 2¹⁶⁵. Because α is a value of 54 bits or less, the input value is 4 bits, and multiplying by 32 adds 5 bits, the output of the ROM table 309 is at most 63 bits. When a read signal is input from the control unit 16, the ROM table 309 reads and outputs the remainder corresponding to the 4-bit portion input from the bus 307.
[0043]
As described above, the remainder calculation unit 311 uses the ROM table 308 and the ROM table 309 to obtain the remainder for the portion that exceeds 160 bits because of the carry, so it is faster than the conventional configuration, which obtains the remainder by repeated subtraction. Furthermore, because the lookup is split across two ROM tables, the number of input values is only 2⁵ for the ROM table 308 and 2⁴ for the ROM table 309, whereas a single table covering all 9 bits would need 2⁹ entries. As a result, the ROM table 308 and the ROM table 309 can look up the entries for the 5-bit portion and the 4-bit portion quickly, which speeds up the calculation of the remainder.
[0044]
The adder 310 adds the 160-bit lower portion, the remainder (59 bits) for the 5-bit portion output from the ROM table 308, and the remainder (63 bits) for the 4-bit portion output from the ROM table 309, and outputs C(i−1) * 2⁸ mod P or a value congruent to it modulo P. The value output from the adder 310 is 161 bits.
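The two-table reduction described above can be sketched in Python as follows. This is only an illustrative model, not the circuit itself: the value of `ALPHA` is a hypothetical 54-bit constant, and the names `ROM_308`, `ROM_309`, and `reduce_169` are introduced here for illustration.

```python
# Illustrative model of the remainder calculation unit 311 (names are assumptions).
ALPHA = (1 << 53) + 12345          # hypothetical alpha of at most 54 bits
P = (1 << 160) - ALPHA             # modulus P = 2^160 - alpha

# ROM table 308: 5-bit input n -> n * 2^160 mod P = n * alpha
ROM_308 = [n * ALPHA for n in range(32)]
# ROM table 309: 4-bit input n -> n * 2^165 mod P = n * 32 * alpha
ROM_309 = [n * 32 * ALPHA for n in range(16)]


def reduce_169(value):
    """Return a value below 2^161 that is congruent to `value` modulo P."""
    assert 0 <= value < (1 << 169)
    lower = value & ((1 << 160) - 1)   # bus 305: lower 160 bits
    five = (value >> 160) & 0x1F       # bus 306: 161st to 165th bits
    four = (value >> 165) & 0xF        # bus 307: 166th to 169th bits
    return lower + ROM_308[five] + ROM_309[four]   # adder 310
```

The result is not necessarily smaller than P; like the output of the adder 310, it is only guaranteed to fit in 161 bits and to be congruent to the input modulo P.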
[0045]
FIG. 4 shows a more detailed configuration of the multiplier dividing unit 11.
In the same figure, the multiplier dividing unit 11 includes a shifter 503, a selector 501, and a register B502.
Under the control of the control unit 16, the selector 501 outputs the multiplier B input from the outside to the register B502 as the initial setting (i = 1). After the initial setting (i ≥ 2), the selector 501 outputs the value input from the shifter 503 to the register B502.
[0046]
The register B502 is a 160-bit register that holds, as the initial setting, the multiplier B input from the outside via the selector 501 and, after the initial setting, the value input from the shifter 503 via the selector 501. The output side of the register B502 is connected to the shifter 503, and its lower 8 bits are connected to the partial product calculation unit 13. Under the control of the control unit 16, the register B502 updates its held value to the value newly input from the shifter 503 via the selector 501. The held value is output to the shifter 503, and its lower 8 bits are output to the partial product calculation unit 13.
[0047]
The shifter 503 is a 160-bit shifter that shifts the value input from the register B502 to the right by 8 bits and outputs the result.
With this configuration, the multiplier dividing unit 11 extracts the 8-bit partial multipliers b(i) = b[8i−1 : 8i−8] (i = 1, ..., 19, 20) from the low-order side of the multiplier B and sequentially outputs them to the partial product calculation unit 13.
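As a minimal sketch of this register-and-shifter arrangement, the following Python generator yields the partial multipliers b(1), b(2), ... from the low-order side. The function name and its parameters are assumptions made for illustration.

```python
def partial_multipliers(b, width=160, s=8):
    """Yield b(1), b(2), ... as s-bit slices of the multiplier, low bits first."""
    reg = b & ((1 << width) - 1)        # register B502 after the initial setting
    for _ in range(width // s):         # 20 iterations for width = 160, s = 8
        yield reg & ((1 << s) - 1)      # lower 8 bits go to the partial product unit
        reg >>= s                       # shifter 503: 8-bit right shift
```

For example, `list(partial_multipliers(0x1234))` starts with `0x34` and `0x12`, followed by zeros.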
[0048]
FIG. 5 shows a more detailed configuration of the partial product calculation unit 13.
In the figure, the partial product calculation unit 13 includes a shifter unit 512, a selector unit 513, an adder 522, an adder 523, a register Ca 524, and a register Cb 525.
When the calculation in the partial product calculation unit 13 is expressed by an equation, it is expressed as (Equation 3).
[0049]
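[Equation 5]
$$C(i) \cdot b(i) = \sum_{n=0}^{7} \underbrace{\underbrace{\bigl(C(i) \cdot 2^{n}\bigr)}_{(1)} \cdot b_{in}}_{(2)} = \underbrace{\sum_{n=0}^{3} \bigl(C(i) \cdot 2^{n}\bigr) \cdot b_{in}}_{(3)} + \underbrace{\sum_{n=4}^{7} \bigl(C(i) \cdot 2^{n}\bigr) \cdot b_{in}}_{(4)} \qquad \text{(Equation 3)}$$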

[0050]
In the equation, for convenience, the binary representation of the partial multiplier b(i) input from the multiplier dividing unit 11 is written bi7 bi6 bi5 bi4 bi3 bi2 bi1 bi0.
As the equation shows, the partial product calculation unit 13 first carries the intermediate number C(i) input from the remainder calculation unit 12 by 0 to 7 bits, one carry for each bit of the partial multiplier b(i) (term (1)), next multiplies each carried value by the corresponding bit of the partial multiplier b(i) (term (2)), and finally adds the results (terms (3) and (4)).
The shifter unit 512 includes a bus 504 and shifters 505 to 511, which shift the input value to the left by 1 to 7 bits respectively, and performs the operation corresponding to term (1).
[0051]
That is, the shifter unit 512 carries the intermediate number C(i) input from the remainder calculation unit 12 according to the weight of each bit of the partial multiplier b(i) and outputs the results to the selector unit 513. The shifters 505 to 511 shift the input C(i) to the left by 1 to 7 bits; that is, they output C(i) * 2¹ to C(i) * 2⁷, respectively. The bus 504 outputs the intermediate number C(i) to the selector unit 513 as it is.
[0052]
The selector unit 513 includes selectors 514 to 521 and performs the operation corresponding to term (2). The selectors 514 to 521 receive C(i) * 2⁰ to C(i) * 2⁷, respectively, from the shifter unit 512 and the bits bi0 to bi7 of b(i), respectively, from the multiplier dividing unit 11. Each selector selects and outputs either its input C(i) * 2ⁿ or 0 according to the value of its bit of the partial multiplier b(i). That is, when the input bit bin (n = 0, 1, ..., 7) is 1, the selector outputs C(i) * 2ⁿ; when bin is 0, it outputs 0.
[0053]
The adder 522 is an adder that performs an operation corresponding to (3), and outputs a value obtained by adding the four values input from the selectors 514 to 517 (hereinafter referred to as a partial product a) to the register Ca 524.
The adder 523 is an adder that performs an operation corresponding to (4), and outputs a value obtained by adding the four values input from the selectors 518 to 521 (hereinafter referred to as a partial product b) to the register Cb525.
[0054]
The register Ca524 and the register Cb525 hold the partial product a and the partial product b, respectively, and, under the control of the control unit 16, update them to the newly input values. The output sides of the register Ca524 and the register Cb525 are connected to the accumulation unit 14, to which they output the held partial products a and b.
[0055]
A value obtained by adding the partial product a and the partial product b corresponds to the partial product C (i) * b (i).
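The shift, select, and add structure of the partial product calculation unit 13 can be sketched in Python as follows. This is a behavioral illustration only; the function name `partial_product_8bit` is an assumption, and Python integers stand in for the fixed-width buses.

```python
def partial_product_8bit(c, b):
    """Return (partial product a, partial product b) for an 8-bit multiplier b."""
    assert 0 <= b < 256
    # Bus 504 and shifters 505 to 511: C(i) * 2^0 ... C(i) * 2^7
    shifted = [c << n for n in range(8)]
    # Selectors 514 to 521: pass the shifted value if bit n of b is 1, else 0
    selected = [shifted[n] if (b >> n) & 1 else 0 for n in range(8)]
    part_a = sum(selected[0:4])   # adder 522: outputs of selectors 514 to 517
    part_b = sum(selected[4:8])   # adder 523: outputs of selectors 518 to 521
    return part_a, part_b
```

Adding the two returned values gives C(i) * b(i); in the apparatus that final addition is left to the accumulation unit.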
FIG. 6 shows a detailed configuration of the accumulating unit 14 and the correcting unit 15.
The accumulating unit 14 includes an adder 601 and a register S602.
The register S602 is a 170-bit register that holds the accumulated value and, under the control of the control unit 16, updates it to the new accumulated value input from the adder 601. The most significant bit of the register S602 is a sign bit representing the sign of the accumulated value.
[0056]
The adder 601 adds the accumulated value held in the register S602, the partial product a and the partial product b held in the register Ca524 and the register Cb525, and the correction value input from the correction unit 15, and outputs the result to the register S602.
The correction unit 15 includes a register E603, an inverter 606, a selector 604, and a selector 605.
[0057]
The register E603 holds the value that is larger than, and closest to, the maximum value that the sum of the partial product a and the partial product b can take.
The inverter 606 outputs the bitwise inverse of the value 510 * P held in the register E603 (hereinafter, ¬510 * P).
The selector 604 selects one of 0, 510 * P, and ¬510 * P as the correction value under the control of the control unit 16 and supplies it to the adder 601.
[0058]
The selector 605 supplies 0 or 1 to the adder 601 under the control of the control unit 16.
Here, the control unit 16 causes the selector 604 to select a correction value according to whether the accumulated value held in the register S602 is positive or negative, and causes the selector 605 to select 0 or 1 and supply it to the adder 601.
[0059]
Specifically, if the most significant bit of the register S602 is 0, the control unit 16 causes the selector 604 to select ¬510 * P and causes the selector 605 to select 1. As a result, when the accumulated value held in the register S602 is positive (the most significant bit is 0), ¬510 * P is supplied to the adder 601 from the register E603 via the inverter 606 and the selector 604, and 1 is supplied via the selector 605. Because ¬510 * P + 1 is the two's complement of 510 * P, 510 * P is subtracted in the next addition of the adder 601, which prevents overflow in the adder 601 and the register S602.
[0060]
When the accumulated value held in the register S602 is negative (the most significant bit is 1), the control unit 16 causes the selector 604 to select 0 and the selector 605 to select 0.
When the iterative processing for i = 1 to 20 is complete and the value held in the register S602 has been updated to the final accumulated value, the control unit 16 checks whether the most significant bit of the accumulated value is 1 or 0. If it is 1 (negative), the control unit 16 causes the selector 604 to select 510 * P and the selector 605 to select 0, so that the adder 601 adds 510 * P to the accumulated value held in the register S602. In this way the control unit 16 corrects the final accumulated value to a positive value. If the most significant bit of the final accumulated value is 0 (positive), the control unit 16 causes the selector 604 to select 0 and the selector 605 to select 0, so that the accumulated value is not corrected.
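The accumulation and correction behavior described above can be modeled with the following Python sketch. It assumes a 170-bit two's-complement register and treats each element of `partial_products` as the sum of partial product a and partial product b; the function name and the constants `WIDTH`, `MASK`, and `SIGN` are illustrative assumptions.

```python
WIDTH = 170                      # width of register S602
MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)          # most significant bit is the sign bit


def accumulate(partial_products, p):
    """Accumulate partial products while keeping the sum inside 170 bits."""
    e = 510 * p                  # value held in register E603
    s = 0                        # register S602
    for pp in partial_products:
        if s & SIGN:             # accumulated value is negative: no correction
            corr, carry_in = 0, 0
        else:                    # positive: subtract 510*P as NOT(510*P) + 1
            corr, carry_in = ~e & MASK, 1
        s = (s + pp + corr + carry_in) & MASK    # adder 601
    if s & SIGN:                 # final value negative: add 510*P once
        s = (s + e) & MASK
    return s
```

Because every correction adds or subtracts a multiple of P, the returned value stays congruent modulo P to the plain sum of the partial products.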
Second Embodiment
The remainder multiplication apparatus of the second embodiment is described below with reference to the drawings.
[0061]
When the multiplicand A (160 bits) and the multiplier B (161 bits) are input, the remainder multiplication apparatus of this embodiment calculates the remainder R (= A * B mod P) of the product A * B modulo P (160 bits), or a value congruent to R modulo P. Here, the modulus P satisfies P = 2¹⁶⁰ − α, where α is a value of 54 bits or less.
[0062]
The remainder multiplication apparatus of this embodiment is also configured on the basis of (Equation 2), as in the first embodiment. However, s = 8 in the first embodiment, whereas s = 9 in the second embodiment. In addition, because k / s = 161/9 is not an integer, the k / s term is replaced with 18 in this embodiment.
[0063]
FIG. 7 is a block diagram showing a schematic configuration of the remainder multiplication apparatus 200 in the second embodiment.
In the figure, the remainder multiplication apparatus 200 includes a multiplier division unit 81, a remainder calculation unit 82, a partial product calculation unit 83, an accumulation unit 84, a correction unit 85, and a control unit 86.
The remainder multiplication apparatus 200 differs from the first embodiment in that the remainder calculation unit 82, unlike the remainder calculation unit 12, has only one ROM table. In addition, as described in detail with reference to FIG. 11, the partial product calculation unit 83 is configured so that its internal adders receive fewer inputs than those of the partial product calculation unit 13, which reduces the circuit scale of the adders. Furthermore, whereas the carry value C(i−1) * 2⁸ in the remainder multiplication apparatus 100 is calculated by the remainder calculation unit 12, the carry value C(i−1) * 2⁹ in the remainder multiplication apparatus 200 is calculated not by the remainder calculation unit 82 but by the partial product calculation unit 83.
[0064]
The multiplier dividing unit 81, the accumulating unit 84, the correcting unit 85, and the control unit 86 differ from their counterparts in the first embodiment in the number of input and output bits and in the register widths, because the multiplier B is 161 bits and s is 9 bits; in all other respects their configuration is the same as in the first embodiment.
The multiplier dividing unit 81 sequentially outputs the 9-bit partial multipliers b(i) = b[9i−1 : 9i−9] (i = 1, ..., 17, 18), starting from the low-order side of the multiplier B.
[0065]
The remainder calculating unit 82 recursively calculates the intermediate number C(i) (i = 1, ..., 17, 18). That is, the first time (i = 1), the remainder calculation unit 82 outputs the multiplicand A as the intermediate number C(1). From the second time onward (i = 2, ..., 17, 18), it calculates, as the intermediate number C(i), the remainder C(i−1) * 2⁹ mod P of the carry value C(i−1) * 2⁹ obtained by carrying the previous intermediate number C(i−1) by 9 bits. More specifically, the intermediate number is A the first time, A * 2⁹ mod P the second time, (A * 2⁹ mod P) * 2⁹ mod P the third time, ((A * 2⁹ mod P) * 2⁹ mod P) * 2⁹ mod P the fourth time, and so on. However, the remainder calculation unit 82 does not itself carry the intermediate number C(i−1) by 9 bits; it receives the carry value C(i−1) * 2⁹ calculated by the partial product calculation unit 83 and calculates the remainder C(i−1) * 2⁹ mod P from it.
[0066]
The remainder calculation unit 82 includes a ROM table 904 so that the remainder can be calculated at high speed. The ROM table 904 stores in advance the remainders modulo P that correspond to the part of the carry value C(i−1) * 2⁹ other than the lower 160 bits, that is, the part that exceeds 160 bits because of the carry. For the part of the carry value C(i−1) * 2⁹ that exceeds 160 bits, the remainder calculation unit 82 refers to the ROM table 904, and it calculates the intermediate number C(i) by adding the remainder obtained from the ROM table 904 to the lower 160 bits of the carry value C(i−1) * 2⁹.
[0067]
The partial product calculation unit 83 calculates and outputs the partial product C(i) * b(i) of the intermediate number C(i) output from the remainder calculation unit 82 and the partial multiplier b(i) output from the multiplier division unit 81. The partial product calculation unit 83 also calculates the 9-bit carry value of the intermediate number output from the remainder calculation unit 82 and outputs it to the remainder calculation unit 82.
The accumulation unit 84 accumulates the partial product C (i) * b (i) output from the partial product calculation unit 83.
[0068]
The correction unit 85 corrects the accumulated value by adding or subtracting an integer multiple of P according to the accumulation result of the accumulating unit 84, and thereby prevents the accumulated value in the accumulating unit 84 from overflowing.
The control unit 86 controls the iterative processing of 18 operations (i = 1 to 18) in the multiplier division unit 81, the remainder calculation unit 82, the partial product calculation unit 83, the accumulation unit 84, and the correction unit 85. The control unit 86 performs pipeline control in three stages: the first stage is the calculation of the intermediate number by the remainder calculating unit 82 and of the partial multiplier by the multiplier dividing unit 81, the second stage is the calculation of the partial product by the partial product calculating unit 83, and the third stage is the calculation of the accumulated value by the accumulating unit 84 and its correction by the correction unit 85. If the final accumulated value in the accumulating unit 84 is larger than P after the 18 iterations, the control unit 86 inputs the accumulated value to the remainder calculating unit 82 and has it calculate the remainder of the accumulated value modulo P.
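The overall data flow of the second embodiment can be summarized with the Python sketch below. It is a purely arithmetic model under simplifying assumptions: it uses Python's `%` operator for exact reduction where the apparatus uses the ROM table and multiples-of-P corrections, and it ignores the pipelining. The function name `modmul_sketch` is hypothetical.

```python
def modmul_sketch(a, b, p, s=9, rounds=18):
    """Compute A * B mod P by accumulating C(i) * b(i) over 9-bit slices of B."""
    c = a                                   # C(1) = A
    acc = 0                                 # accumulated value
    for i in range(1, rounds + 1):
        b_i = (b >> (s * (i - 1))) & ((1 << s) - 1)   # partial multiplier b(i)
        acc += c * b_i                      # accumulate partial product C(i) * b(i)
        c = (c << s) % p                    # C(i+1) = C(i) * 2^9 mod P
    return acc % p                          # final reduction of the accumulated value
```

With 18 rounds of 9 bits the loop covers 162 bits, which is enough for the 161-bit multiplier B.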
[0069]
FIG. 8 shows a more detailed configuration of the remainder calculation unit 82.
In the figure, the remainder calculation unit 82 includes a selector 901, a register A 906, a selector 902, and a remainder calculation unit 907.
Under the control of the control unit 86, the selector 901 outputs the multiplicand A input from the outside to the register A906 as the initial setting (i = 1) and, after the initial setting (i = 2, ..., 17, 18), outputs the intermediate number C(i) input from the remainder calculation unit 907 to the register A906.
[0070]
The register A906 is a 161-bit register. As the initial setting (i = 1), it holds the multiplicand A (= C(1)) input from the outside via the selector 901; after the initial setting (i = 2, ..., 17), it holds the intermediate number C(i) input from the adder 905 via the selector 901. The output side of the register A906 is connected to the partial product calculation unit 83; in accordance with an instruction from the control unit 86, the register A906 updates the held intermediate number to the newly input one and outputs it to the partial product calculation unit 83. The register A906 finally holds the remainder multiplication value R.
[0071]
During the iterative processing, the selector 902 outputs the carry value C(i−1) * 2⁹ input from the partial product calculation unit 83 to the remainder calculation unit 907; after the iterative processing, it outputs the accumulated value input from the accumulation unit 84 to the remainder calculation unit 907.
The remainder calculation unit 907 includes a ROM table 904 and an adder 905. It outputs the remainder modulo P, or a value congruent to it, of the 170-bit carry value C(i−1) * 2⁹ input from the partial product calculation unit 83 via the selector 902 or of the 170-bit accumulated value input from the accumulation unit 84. This remainder or congruent value is 161 bits.
[0072]
The remainder calculation unit 907 also includes a bus 908 that outputs the lower 160 bits of the 170-bit value input from the selector 902 to the adder 905, and a bus 909 that outputs the upper 10 bits (the 161st to 170th bits), that is, everything other than the lower 160 bits, to the ROM table 904. These buses 908 and 909 divide the 170-bit value into the lower 160 bits (hereinafter, the lower part) and the upper 10 bits (hereinafter, the upper part) and input the lower part and the upper part to the adder 905 and the ROM table 904, respectively.
[0073]
FIG. 9 shows the stored contents of the ROM table 904.
In the figure, the ROM table 904 takes each 10-bit value (0000000000)₂ to (1111111111)₂ as an input value and stores, as the associated output value, the remainder modulo P of that value multiplied by 2¹⁶⁰. When a read signal is input from the control unit 86, the ROM table 904 reads and outputs the remainder corresponding to the upper part input from the bus 909.
[0074]
The adder 905 adds the remainder for the upper part output from the ROM table 904 to the lower part and outputs the 161-bit value C(i−1) * 2⁹ mod P or a value congruent to it.
In this way, the remainder calculation unit 907 uses the ROM table 904 to obtain the remainder for the part that exceeds 160 bits because of the 9-bit carry, and adds that remainder to the lower 160 bits to obtain the remainder of the carry value C(i−1) * 2⁹ or a value congruent to it.
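The single-table step from C(i−1) to C(i) can be sketched in Python as follows. The constant `ALPHA` is a hypothetical value of at most 54 bits, and the names `ROM_904`, `reduce_170`, and `next_intermediate` are assumptions introduced for illustration.

```python
ALPHA = (1 << 53) + 12345               # hypothetical alpha of at most 54 bits
P = (1 << 160) - ALPHA                  # modulus P = 2^160 - alpha

# ROM table 904: 10-bit input n -> n * 2^160 mod P = n * alpha (1024 entries)
ROM_904 = [n * ALPHA for n in range(1 << 10)]


def reduce_170(value):
    """One ROM read and one addition; the result fits in 161 bits."""
    assert 0 <= value < (1 << 170)
    lower = value & ((1 << 160) - 1)    # bus 908: lower 160 bits
    upper = value >> 160                # bus 909: upper 10 bits
    return lower + ROM_904[upper]       # adder 905


def next_intermediate(c_prev):
    """C(i) from the 161-bit C(i-1): 9-bit carry followed by reduce_170."""
    return reduce_170(c_prev << 9)
```

Since the output again fits in 161 bits, `next_intermediate` can be applied repeatedly, which mirrors how the register A906 feeds each intermediate number back into the next iteration.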
[0075]
If the internal configuration of the remainder calculation unit 907 were similar to that of the remainder calculation unit 3 of the prior art, it would have to perform the loop processing of the prior art (steps 6 to 8) multiple times, and calculating the remainder would take time. By contrast, the remainder calculation unit 907 of this embodiment calculates the remainder with just one read of the ROM table 904 and one addition in the adder 905, so it is faster than the conventional configuration.
[0076]
FIG. 10 shows a more detailed configuration of the multiplier division unit 81.
In the figure, the multiplier division unit 81 includes a selector 801, a register B 802, and a shifter 803.
Under the control of the control unit 86, the selector 801 outputs the multiplier B input from the outside to the register B802 as the initial setting (i = 1) and, after the initial setting (i = 2, ..., 17, 18), outputs the value input from the shifter 803 to the register B802.
[0077]
The register B802 is a 161-bit register that holds, as the initial setting, the multiplier B input from the outside via the selector 801 and, after the initial setting, the value input from the shifter 803 via the selector 801. The output side of the register B802 is connected to the shifter 803, and its lower 9 bits are connected to the partial product calculation unit 83. Under the control of the control unit 86, the register B802 updates its held value to the value newly input from the shifter 803 via the selector 801. The held value is output to the shifter 803, and its lower 9 bits are output to the partial product calculation unit 83.
[0078]
The shifter 803 is a 161-bit shifter that shifts the value input from the register B 802 to the right by 9 bits and outputs the result.
With this configuration, the multiplier division unit 81 extracts the 9-bit partial multipliers b(i) = b[9i−1 : 9i−9] (i = 1, ..., 17, 18) from the low-order side of the multiplier B and sequentially outputs them to the partial product calculation unit 83.
<Partial product calculation method>
Before describing the detailed configuration of the partial product calculation unit 83, two product calculation methods that form the basis of the configuration of the partial product calculation unit 83 will be described.
[0079]
The product of the multiplicand X and the 3-bit multiplier Y (= y2y1y0) is expanded as shown in (Expression 4).
[0080]
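[Formula 6]
$$X \cdot Y = X \cdot 2^{2} \cdot y_{2} + X \cdot 2^{1} \cdot y_{1} + X \cdot 2^{0} \cdot y_{0} \qquad \text{(Expression 4)}$$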

[0081]
That is, in (Expression 4), the product X * Y is calculated by adding the values obtained by multiplying the multiplicand X by the weight (2², 2¹, 2⁰) and the value (y2, y1, y0) of each digit of the multiplier Y, namely X * 2² * y2, X * 2¹ * y1, and X * 2⁰ * y0 (hereinafter referred to as bit products).
Specifically, when Y = 101, for example, (Expression 4) gives X * Y = X * 2² * y2 + X * 2⁰ * y0, and X * Y is calculated by adding two bit products. When Y = 110, (Expression 4) gives X * Y = X * 2² * y2 + X * 2¹ * y1, and X * Y is again calculated by adding two bit products. When Y = 111, (Expression 4) gives X * Y = X * 2² * y2 + X * 2¹ * y1 + X * 2⁰ * y0, and X * Y is calculated by adding three bit products.
[0082]
A method of calculating a product by adding the bit products, each obtained by multiplying the multiplicand by the weight and the value of one digit of the multiplier, is referred to as the general calculation method. The partial product calculation unit 13 in the first embodiment is also configured on the basis of this method.
Incidentally, 111 = 2³ − 1. Therefore, when Y = 111, the product X * Y can also be calculated by multiplying the multiplicand X by 2³ − 1 instead of by 111. The product of the multiplicand X and 2³ − 1 is expanded as follows:
[0083]
[Expression 7]
$$X \cdot (2^{3} - 1) = X \cdot 2^{3} - X \qquad \text{(Expression 5)}$$

[0084]
In other words, in (Expression 5), when Y = 111, the product X * Y is calculated by subtracting the multiplicand X from the multiplicand X multiplied by 2³.
When Y′ = 111000, Y′ = Y * 2³, so the product X * Y′ can also be calculated by multiplying the multiplicand X by 2⁶ − 2³ instead of by 111000.
[0085]
More generally, when every bit of the multiplier from the 2^p place (bit p + 1 from the bottom) down to the 2^q place (bit q + 1 from the bottom), where p > q, is 1, the product for that part can be calculated by subtracting the multiplicand multiplied by 2^q (hereinafter also called the subtrahend) from the multiplicand multiplied by 2^(p+1) (hereinafter also called the minuend). This method is called the special calculation method.
[0086]
In an adder, the difference s − t is calculated by adding s, the inverted value of t (its one's complement), and the constant 1. Therefore, when every bit of the multiplier from the 2^p place (bit p + 1 from the bottom) down to the 2^q place (bit q + 1 from the bottom), where p > q, is 1 and the product for that part is calculated by the special calculation method, the adder only has to add the multiplicand multiplied by 2^(p+1), the inverted value of the multiplicand multiplied by 2^q (hereinafter also simply called the inverted value), and the constant 1. Moreover, the addition of the constant 1 can be realized with the carry-in to the least significant bit of the adder. That is, when the product is calculated by the special calculation method, the adder only has to add two values plus a carry-in.
[0087]
By contrast, when all the bits of an n-bit multiplier are 1 and the product is calculated by the general calculation method, the adder has to add n bit products.
Thus, when the multiplier contains a run of consecutive 1s, calculating the product by the special calculation method rather than the general calculation method reduces the number of values the adder has to add, so the circuit scale of the adder and the amount of addition can both be reduced.
The partial product calculation unit 83 is configured to calculate the partial product C (i) * b (i) based on the above two methods.
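The special calculation method for a run of 1s can be illustrated with the Python sketch below, which forms the product as minuend + inverted subtrahend + 1 in a fixed-width adder. The function name and the 64-bit default width are assumptions chosen only to keep the example small.

```python
def run_of_ones_product(x, p, q, width=64):
    """Multiply x by the multiplier whose bits q..p are all 1 (p > q).

    Uses x * 2^(p+1) + NOT(x * 2^q) + 1 instead of adding p - q + 1 bit products.
    """
    mask = (1 << width) - 1
    minuend = (x << (p + 1)) & mask          # multiplicand times 2^(p+1)
    inverted = ~(x << q) & mask              # one's complement of multiplicand times 2^q
    return (minuend + inverted + 1) & mask   # the constant 1 enters as the carry-in
```

For instance, `run_of_ones_product(13, 2, 0)` equals `13 * 0b111`, and `run_of_ones_product(13, 5, 3)` equals `13 * 0b111000`, each with only two operands and a carry-in.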
[0088]
The partial product calculation unit 83 applies the special calculation method to the portion where 1 is continuous in the bit string of the partial multiplier b (i), and applies the general calculation method to the other portions. More specifically, the partial product calculation unit 83 has a case where the 9-bit partial multiplier b (i) is 111111111, a case where the upper 6 bits or the lower 6 bits of the partial multiplier b (i) are 111111, When the upper 3 bits, the middle 3 bits, or the lower 3 bits are 111, the special calculation method is applied to that portion, and the general calculation method is applied otherwise.
[0089]
For example, when the partial multiplier b(i) is 111111011, the partial product calculation unit 83 calculates the partial product C(i) * b(i) by applying the special calculation method to the upper 6 bits, 111111, and the general calculation method to the lower 3 bits, 011. Here, however, the upper 6 bits 111111 are carried 3 bits up from the bottom (multiplied by 2³), so the partial product calculation unit 83 must include the 2³ carry when it applies the special calculation method to the upper 6 bits.
[0090]
Specifically, it is expanded as shown in (Expression 6).
[0091]
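[Equation 8]
$$C(i) \cdot (111111011)_{2} = \underbrace{\bigl(C(i) \cdot 2^{9} - C(i) \cdot 2^{3}\bigr)}_{(1)} + \underbrace{\bigl(C(i) \cdot 2^{1} + C(i) \cdot 2^{0}\bigr)}_{(2)} \qquad \text{(Expression 6)}$$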

[0092]
As shown in (Expression 6), the upper 6 bits 111111 are expanded as in term (1) by applying the special calculation method together with the 2³ carry. That is, for the upper 6 bits 111111, the partial product calculation unit 83 calculates the product by adding the intermediate number C(i) carried by 9 bits, the inverted value of the intermediate number C(i) carried by 3 bits, and the constant 1. The lower 3 bits 011 are expanded as in term (2) by applying the general calculation method. That is, for the lower 3 bits 011, the partial product calculation unit 83 calculates the product by adding the intermediate number C(i) carried by 1 bit and the intermediate number C(i) itself, according to the value 011.
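The way a 9-bit partial multiplier is split between the two methods can be sketched in Python as follows. The sketch treats the multiplier as three 3-bit groups and applies the special calculation method to each run of groups equal to 111; the function names and the (sign, shift) representation are assumptions made for illustration, not the circuit's actual signal encoding.

```python
def signed_terms(b):
    """Decompose a 9-bit multiplier into (sign, shift) terms.

    Runs of 3-bit groups equal to 111 use the special calculation method
    (+2^(p+1), -2^q); every other group uses the general calculation method.
    """
    assert 0 <= b < 512
    groups = [(b >> (3 * g)) & 0b111 for g in range(3)]   # lower, middle, upper
    terms, g = [], 0
    while g < 3:
        if groups[g] == 0b111:
            start = g
            while g < 3 and groups[g] == 0b111:
                g += 1
            terms.append((+1, 3 * g))        # minuend: C(i) * 2^(p+1)
            terms.append((-1, 3 * start))    # subtrahend: C(i) * 2^q
        else:
            for n in range(3):
                if (groups[g] >> n) & 1:
                    terms.append((+1, 3 * g + n))   # bit product C(i) * 2^n
            g += 1
    return terms


def partial_product_9bit(c, b):
    """Sum the signed, shifted copies of c selected by signed_terms(b)."""
    return sum(sign * (c << shift) for sign, shift in signed_terms(b))
```

For b(i) = 111111011 this yields the terms +C(i) * 2⁹, −C(i) * 2³, +C(i) * 2¹, and +C(i) * 2⁰, matching the worked example above.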
[0093]
FIG. 11 shows a more detailed configuration of the partial product calculation unit 83.
In the figure, the partial product calculation unit 83 includes a shifter unit 51, a selection unit 52, an OR circuit 513, an adder A701, an adder B702, a register Ca844, and a register Cb845.
The shifter unit 51 includes a bus 830 and shifters 831 to 839. The bus 830 outputs the input 161-bit intermediate number C(i) to the selection unit 52 as it is. The shifters 831 to 839 shift the input intermediate number C(i) to the left by 1 to 9 bits respectively, that is, carry it by 1 to 9 bits, to generate C(i) * 2¹, C(i) * 2², C(i) * 2³, C(i) * 2⁴, C(i) * 2⁵, C(i) * 2⁶, C(i) * 2⁷, C(i) * 2⁸, and C(i) * 2⁹ and output them to the selection unit 52. The shifter 839 shifts the intermediate number C(i) to the left by 9 bits and outputs the result to the selection unit 52 and, at the same time, outputs it to the remainder calculation unit 82 as the carry value C(i−1) * 2⁹.
[0094]
In the shifter unit 51, the outputs of the bus 830 and the shifters 831 to 838, that is, C(i), C(i) * 2¹, ..., C(i) * 2⁸, are candidates for the bit products in the calculation by the general calculation method. The outputs of the shifters 833, 836, and 839, that is, C(i) * 2³, C(i) * 2⁶, and C(i) * 2⁹, are candidates for the minuend in the calculation by the special calculation method. The outputs of the bus 830 and the shifters 833 and 836, that is, C(i), C(i) * 2³, and C(i) * 2⁶, are candidates for the subtrahend in the calculation by the special calculation method. Hereinafter, these values are referred to as candidate values.
[0095]
When the values carried by 0 to 9 bits are input from the shifter unit 51 and the partial multiplier b(i) is input from the multiplier dividing unit 81, the selection unit 52 determines whether the 9-bit partial multiplier b(i) is 111111111, whether the upper 6 bits or the lower 6 bits of the partial multiplier b(i) are 111111, and whether the upper 3 bits, the middle 3 bits, or the lower 3 bits are 111. Next, the selection unit 52 selects some of the candidate values input from the shifter unit 51 according to these determinations. More specifically, for a part for which the determination is affirmative, the selection unit 52 selects the minuend and the subtrahend corresponding to that part from the candidate values. For a part for which the determination is negative, it selects the bit products corresponding to that part from the candidate values. The selection unit 52 outputs the selected minuend and bit products as they are to the adder A701 or the adder B702. For the selected subtrahend, it generates the inverted value and the constant 1 and outputs them to the adder A701 or the adder B702.
[0096]
The selection unit 52 includes operation value selectors 840, 841, and 842.
FIG. 12 shows a detailed configuration of the operation value selectors 840, 841, and 842.
The operation value selector 840 includes an inverter 1301, selectors 1302 and 1303, and a control circuit 1304. It has input terminals S, Z, I0, I1, I2, and I3 and output terminals F, O0, and O1.
[0097]
When the candidate value C(i) is input from the input terminal I0 as a subtrahend, the inverter 1301 outputs its inverted value to the selector 1303.
The selector 1303 receives the inverted value of the candidate value C(i) from the inverter 1301 via the input terminal ¬I0, 0 from the input terminal Z, the candidate value C(i) from the input terminal I0, and the candidate value C(i) * 2¹ from the input terminal I1. It selects one of these input terminals according to the input/output logic of the control circuit 1304 and outputs the value input from the selected terminal from the output terminal O0.
[0098]
The selector 1302 receives 0 from the input terminal Z, the candidate value C(i) * 2^1 from the input terminal I1, the candidate value C(i) * 2^2 from the input terminal I2, and the candidate value C(i) * 2^3 from the input terminal I3. It selects one of these input terminals according to the input/output logic of the control circuit 1304, and outputs the value input at the selected terminal from the output terminal O1.
[0099]
When the partial multiplier b(i) is input from the input terminal S, the control circuit 1304 causes the selectors 1302 and 1303 each to select one of their input terminals based on the input/output logic shown in FIG. 13C. The values input at the selected input terminals are output from the output terminals O1 and O0. The control circuit 1304 outputs 1 from the output terminal F when the selector 1303 selects the input terminal ¬I0, and outputs 0 from the output terminal F otherwise.
[0100]
FIG. 13C shows the correspondence between the partial multiplier b(i) input to the control circuit 1304 and its input/output logic.
The first column of FIG. 13C shows the partial multiplier b(i) input from the input terminal S. “−” represents an arbitrary value of 1 or 0, and “####” represents an arbitrary value other than “111”. The second column shows the input terminal to be selected by the selector 1302. The third column shows the input terminal to be selected by the selector 1303. The fourth column shows the value output from the output terminal F by the control circuit 1304; a value of 1 indicates the constant 1.
[0101]
For example, if the control circuit 1304 determines that the lower 6 bits of the partial multiplier b(i) are all 1, it causes 0 to be output from the output terminal O1, the inverted value of C(i) from the output terminal O0, and the constant 1 from the output terminal F. If it determines that the lower 6 bits are not all 1 but the lower 3 bits are all 1, it causes C(i) * 2^3 to be output from the output terminal O1, the inverted value of C(i) from the output terminal O0, and the constant 1 from the output terminal F. In other cases, according to the value of the lower 3 bits of the partial multiplier b(i), it causes any two of C(i) * 2^2, C(i) * 2^1, C(i), and 0 to be output from the output terminals O1 and O0.
[0102]
The operation value selector 841 includes an inverter 1301, selectors 1302 and 1303, and a control circuit 1305. It has input terminals S, Z, I0, I1, I2, and I3, and output terminals F, O0, and O1. The operation value selector 841 differs from the operation value selector 840 in that it has the control circuit 1305 instead of the control circuit 1304. When the partial multiplier b(i) is input from the input terminal S, the control circuit 1305 causes the selectors 1302 and 1303 each to select one of their input terminals based on the input/output logic shown in FIG. 13B. The values input at the selected input terminals are output from the output terminals O1 and O0. The other components are the same as those of the operation value selector 840, so their description is omitted.
[0103]
The operation value selector 842 includes an inverter 1301, selectors 1302 and 1303, and a control circuit 1306. It has input terminals S, Z, I0, I1, I2, and I3, and output terminals F, O0, and O1. The operation value selector 842 differs from the operation value selector 840 in that it has the control circuit 1306 instead of the control circuit 1304. When the partial multiplier b(i) is input from the input terminal S, the control circuit 1306 causes the selectors 1302 and 1303 each to select one of their input terminals based on the input/output logic shown in FIG. 13A. The values input at the selected input terminals are output from the output terminals O1 and O0. The other components are the same as those of the operation value selector 840, so their description is omitted.
[0104]
The OR circuit 513 calculates the logical sum of the output values of the output terminals F of the operation value selectors 840 and 841. As can be seen from FIGS. 13B and 13C, when the output terminal F of one of the operation value selectors 840 and 841 outputs the constant 1, the other always outputs 0. The OR circuit 513 therefore combines the two values into one by taking their logical sum, thereby reducing the number of outputs of the selection unit 52.
[0105]
The adder A701 adds the output values of the output terminals O1, O0, and F of the operation value selector 842 and the output value of the output terminal O1 of the operation value selector 841, and outputs the resulting partial product c to the register Ca844.
The adder B702 adds the output value of the output terminal O0 of the operation value selector 841, the output values of the output terminals O1 and O0 of the operation value selector 840, and the output value of the OR circuit 513, and outputs the resulting partial product d to the register Cb845.
[0106]
Here, a value obtained by adding the partial product c and the partial product d is the partial product C (i) * b (i).
The register Ca844 and the register Cb845 hold the partial product c and the partial product d, respectively, and update the held values to newly input values under the control of the control unit 86. The output sides of the register Ca844 and the register Cb845 are connected to the accumulation unit 84, to which they output the held partial product c and partial product d.
[0107]
FIG. 14 shows a more detailed configuration of the accumulation unit 84 and the correction unit 85.
In the figure, the accumulating unit 84 includes an adder 705 and a register S707.
The register S707 is a 171-bit register that holds the accumulated value, and updates the held accumulated value to a new accumulated value input from the selector 706 under the control of the control unit 86. Here, the most significant bit of the register S707 is a sign bit representing the sign of the accumulated value.
[0108]
The adder 705 adds the accumulated value held in the register S707, the partial product c and the partial product d held in the register Ca844 and the register Cb845, and the correction value input from the correction unit 85, and outputs the result to the register S707.
The correction unit 85 includes a register E 703, an inverter 708, a selector 704, and a selector 706.
[0109]
The register E703 holds the value that is greater than, and closest to, the maximum possible value of the sum of the partial product c and the partial product d. Specifically, the register E703 holds 1022 * P.
The inverter 708 outputs the inverted value of 1022 * P held in the register E703 (hereinafter referred to as ˜1022 * P).
[0110]
The selector 704 selects one of 0, 1022 * P, and ˜1022 * P as the correction value under the control of the control unit 86 and supplies it to the adder 705. The selector 706 supplies 0 or 1 to the adder 705 under the control of the control unit 86. Here, the control unit 86 causes the selector 704 to select a correction value in accordance with the sign of the accumulated value held in the register S707, and causes the selector 706 to select 0 or 1 and supply it to the adder 705.
[0111]
Specifically, if the most significant bit of the register S707 is 0, the control unit 86 causes the selector 704 to select ˜1022 * P and causes the selector 706 to select 1. As a result, when the accumulated value held in the register S707 is positive (the most significant bit is 0), ˜1022 * P from the register E703 via the inverter 708 and the selector 704, and 1 via the selector 706, are supplied to the adder 705. Consequently, 1022 * P is subtracted in the next addition of the adder 705, which prevents overflow in the adder 705 and the register S707.
[0112]
When the accumulated value held in the register S707 is negative (the most significant bit is 1), the control unit 86 causes the selector 704 to select 0 and the selector 706 to select 0.
In addition, when the iterative process for i = 1 to 18 ends and the value held in the register S707 has been updated to the final accumulated value, the control unit 86 determines whether the most significant bit of the accumulated value is 1 or 0. If it is 1 (indicating negative), the control unit 86 causes the selector 704 to select 1022 * P and the selector 706 to select 0, and causes the adder 705 to add 1022 * P to the accumulated value held in the register S707 as a correction. The control unit 86 thereby corrects the final accumulated value to be positive. If the most significant bit of the final accumulated value is 0 (indicating positive), the control unit 86 causes the selector 704 to select 0 and the selector 706 to select 0, so that the accumulated value is not corrected.
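The correction described above can be sketched in Python as follows. This is a behavioural model under stated assumptions, not the circuit itself: the function name `accumulate_with_correction` is hypothetical, the accumulated value is assumed to start at 0, each partial product is assumed to be non-negative and smaller than 1022 * P, and Python's unbounded integers stand in for the 171-bit register S707.

```python
def accumulate_with_correction(partials, p, factor=1022):
    """Hypothetical model of accumulation with sign-based correction."""
    corr = factor * p   # the value held in register E703 (1022 * P)
    acc = 0             # accumulated value, assumed to start at 0
    for part in partials:
        if acc >= 0:
            # Sign bit 0: add the inverted value ~corr and the constant 1,
            # which in two's complement is the same as subtracting corr.
            acc = acc + part + ~corr + 1
        else:
            # Sign bit 1: correction value 0 and constant 0 are selected.
            acc = acc + part
    if acc < 0:
        # Final correction: add corr once so the result is not negative.
        acc += corr
    return acc
```

Because only multiples of p are added or subtracted, the result stays congruent to the plain sum of the partial products modulo p, while under the stated assumptions its magnitude never exceeds 1022 * P.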
[0113]
The partial product calculation unit 83 may instead be configured as shown in FIG. 15.
FIG. 15 differs from FIG. 11 in that shifters 847 and 846 are provided, and operation value selectors 850, 851, and 852 are provided instead of the operation value selectors 840, 841, and 842.
Shifter 847 shifts partial multiplier b (i) to the right by 3 bits and outputs the result.
[0114]
Shifter 846 shifts partial multiplier b (i) to the left by 3 bits and outputs the result.
FIG. 16 shows the input/output logic of the control circuit common to the operation value selectors 850, 851, and 852. Because the partial product calculation unit 83 shown in FIG. 15 includes the shifter 847 and the shifter 846, the partial multiplier b(i) shifted to the right by 3 bits, the partial multiplier b(i) itself, and the partial multiplier b(i) shifted to the left by 3 bits are input to the selection unit 52. As a result, the same input/output logic can be applied to the operation value selectors 850, 851, and 852. Further, since the same input/output logic applies to all of the operation value selectors 850, 851, and 852, a single control circuit may be shared among them.
[0115]
In the first embodiment, the remainder calculation unit 12 has two ROM tables, and in the second embodiment, the remainder calculation unit 82 has one ROM table. However, the remainder calculation unit may have three or more ROM tables.
<Derivation of the expression>
The derivation of (Equation 2) from (Equation 1) in the first embodiment is described below.
[0116]
[Equation 9]

[0117]
[Effects of the invention]
The remainder multiplication apparatus according to the present invention is a remainder multiplication device that calculates, for the product of a multiplicand a and a multiplier b (b is kb-bit data), a value congruent with the remainder by the modulus p (p is k-bit data) as the accumulated value of the following equation:
Accumulated value = Σ C(i) * b[s*i+s-1 : s*i]
(Here, Σ represents accumulation from i = 0 to [[kb/s]], where [[kb/s]] is the integer part of the quotient kb/s. i is an integer from 0 to [[k/s]]. C(i) is expressed by a recurrence formula: when i = 0, C(i) = a, and when i >= 1, C(i) ≡ (C(i-1) * 2^s) mod p (≡ indicates that the values on both sides are congruent modulo p). b[s*i+s-1 : s*i] is the s-bit partial multiplier from the 2^(s*i+s-1) place to the 2^(s*i) place of the k-bit multiplier b.)
The device includes: table means for storing in advance, for each value represented by m bits (m is an integer greater than or equal to s), the remainder by the modulus p of 2^k times that value; intermediate number calculation means that, at the first time (i = 0), outputs the multiplicand a as the intermediate number C(0) and, at the second and subsequent times (i > 0), carries the previously output intermediate number C(i-1) by s bits, reads from the table means the remainder for the upper m bits excluding the lower k bits of the intermediate number after the carry, and adds the read remainder and the lower k bits to calculate a new intermediate number C(i); and means that calculates the partial product C(i) * b[s*i+s-1 : s*i] of each calculated intermediate number C(i) and the corresponding partial multiplier b[s*i+s-1 : s*i], and sequentially accumulates the partial products to calculate the accumulated value.
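The accumulation formula above can be checked numerically with a short Python sketch. It follows the definition only: the function name `accumulated_value` is hypothetical, and C(i) is reduced with Python's `%` operator rather than with the table means, so the sketch shows why the accumulated value is congruent to a * b modulo p and not how the device computes it.

```python
def accumulated_value(a, b, p, s, kb):
    """Hypothetical check of: sum over i of C(i) * b[s*i+s-1 : s*i]."""
    acc = 0
    c = a                                  # C(0) = a
    for i in range(kb // s + 1):           # i = 0 .. [[kb/s]]
        part = (b >> (s * i)) & ((1 << s) - 1)  # s-bit partial multiplier
        acc += c * part                    # accumulate the partial product
        c = (c << s) % p                   # C(i+1) = (C(i) * 2^s) mod p
    return acc
```

For example, with a = 123, b = 201 (kb = 8), p = 251, and s = 3, `accumulated_value(123, 201, 251, 3, 8) % 251` equals `(123 * 201) % 251`, because each C(i) is congruent to a * 2^(s*i) modulo p.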
[0118]
According to this configuration, the table means stores in advance, for each value represented by m bits, that is, each value from 0 to 2^m - 1, the remainder by the modulus p of 2^k times that value. When i >= 1, the intermediate number calculation means first reads from the table means the remainder for the upper m bits excluding the lower k bits of the intermediate number after the carry, that is, the remainder by the modulus p of 2^k times the upper m bits. Next, it calculates the remainder by the modulus p of the intermediate number after the carry, or a value congruent with it, by adding the remainder read from the table means and the lower k bits.
[0119]
Here, the lower k bits of the intermediate number after the carry are either their own remainder by the modulus p or a value congruent with that remainder; such a congruent value is larger than the remainder but close to it. Therefore, to obtain for the intermediate number after the carry the remainder by the modulus p, or a congruent value close to the remainder, the intermediate number calculation means obtains the remainder for the upper m bits and adds it to the lower k bits. The intermediate number calculation means thus calculates the remainder or congruent value of the intermediate number after the carry by reading the remainder for the upper m bits from the table means.
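The table lookup and addition described above can be sketched in Python. The helper names `build_table` and `next_intermediate` are hypothetical, and the sketch assumes m = s + 1, which is enough here because every intermediate number it produces stays below 2^(k+1).

```python
def build_table(p, k, m):
    """Hypothetical table: entry v holds (v * 2^k) mod p for each m-bit v."""
    return [(v << k) % p for v in range(1 << m)]

def next_intermediate(c_prev, table, k, s):
    """One step: carry by s bits, look up the upper bits, add the lower k bits."""
    carried = c_prev << s
    upper = carried >> k               # bits above the lower k bits (table address)
    lower = carried & ((1 << k) - 1)   # lower k bits
    return table[upper] + lower        # congruent to carried modulo p
```

With p = 13, k = 4, s = 2, and m = 3, starting from C(0) = 11 the first step gives 18, which is congruent to 11 * 4 modulo 13 without being fully reduced. This matches the text: the result is the remainder or a congruent value close to it, obtained by one read and one addition.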
[0120]
If the remainder of the intermediate number after the carry were calculated by the same procedure as that of the remainder calculation unit 3 of the prior-art remainder multiplication apparatus 1, the loop process of steps 6 to 8 would have to be repeated many times. More specifically, the number of loop iterations corresponds to the number of bits of the intermediate number after the carry. In contrast, the intermediate number calculation means according to the present invention can calculate the remainder or a congruent value of the intermediate number after the carry by a single read of the table means and a single addition, so that high-speed remainder calculation can be realized.
[0121]
Further, the intermediate number calculation means calculates, as each intermediate number, the remainder of the value obtained by carrying the previous intermediate number, which prevents the number of bits of the calculated intermediate number from increasing. This in turn prevents an increase in the number of bits of the partial product calculated using the intermediate number. As a result, the partial products to be accumulated have a reduced number of bits, so that the circuit scale of the adder can be reduced.
[0122]
The intermediate number calculating means includes: first holding means for holding the intermediate number; carry means for carrying the intermediate number held in the first holding means by s bits, corresponding to the partial multiplier b[s*i+s-1 : s*i]; dividing means for dividing the intermediate number after the carry into upper data, which is the m bits above the lower k bits, and lower data, which is the lower k bits of the intermediate number after the carry; reading means for reading from the table means the remainder by the modulus p for the upper data produced by the dividing means; and adding means for obtaining a new intermediate number by adding the remainder read from the table means and the lower data produced by the dividing means. The first holding means updates its held content to the new intermediate number each time a new intermediate number is obtained by the adding means, outputs the multiplicand as the intermediate number at the first time, and outputs the updated new intermediate number at the second and subsequent times.
[0123]
According to this configuration, each component in the intermediate number calculation means can be easily configured using a general-purpose hardware element. That is, the first holding means uses a register, the carry means uses a shifter, the dividing means uses an m-bit bus and a k-bit bus connected to the upper m-bit part and the lower k-bit part of the shifter, and the adding means uses an adder. With these components, the intermediate number calculation means calculates the intermediate number C (i) incrementally.
[0124]
The remainder multiplication device further includes post-processing means for obtaining a remainder by the modulus p of the accumulated value by using the dividing means, the reading means, and the adding means.
According to this configuration, the post-processing means obtains the remainder by the modulus p of the last accumulated value, so that the last accumulated value can be made smaller than p.
[0125]
The table means has a memory element that receives, as an address, each value represented by m bits, and stores in advance, in the storage area indicated by the address, the remainder by the modulus p of 2^k times the m-bit value corresponding to the address.
According to this configuration, the table means can be realized by a single memory element. The memory element stores in advance, for each value represented by m bits, that is, each value from 0 to 2^m - 1 in decimal, the remainder by the modulus p of 2^k times that value. The address indicating the storage area of each remainder is the same as the value itself. When the upper m bits are input from the reading means as an address, the remainder in the storage area indicated by that address is read out to the reading means. As a result, the intermediate number calculation means can obtain the remainder corresponding to the upper m bits in a short time with only one read from the memory element.
[0126]
The m bits are divided into lower m1 bits and upper m2 bits (m = m1 + m2), and each value represented by the m bits corresponds to a combination of a value represented by the m1 bits and a value represented by the m2 bits. The table means includes first partial table means for storing in advance, for each value represented by the m1 bits, the remainder by the modulus p of 2^k times that value, and second partial table means for storing in advance, for each value represented by the m2 bits, the remainder by the modulus p of 2^(k+m1) times that value. The adding means obtains a new intermediate number by adding the remainders read from the first and second partial table means and the lower data.
[0127]
According to this configuration, the table means includes the first partial table means corresponding to the lower m1 bits of the m bits and the second partial table means corresponding to the upper m2 bits. The first partial table means stores 2^m1 remainders, one for each value of the lower m1 bits, and the second partial table means stores 2^m2 remainders, one for each value of the upper m2 bits. As these numbers show, the total number of remainders held in the first and second partial table means is smaller than in the case of a single memory element. Thus, the table means can be constituted by two memory elements each having a small storage capacity.
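The split into two partial tables can be sketched in Python as follows. The names `build_partial_tables` and `lookup_split` are hypothetical; the sketch only illustrates that the sum of the two partial remainders is congruent to the single-table entry while fewer entries are stored.

```python
def build_partial_tables(p, k, m1, m2):
    """Hypothetical partial tables for the lower m1 bits and the upper m2 bits."""
    t1 = [(v << k) % p for v in range(1 << m1)]         # (v * 2^k) mod p
    t2 = [(v << (k + m1)) % p for v in range(1 << m2)]  # (v * 2^(k+m1)) mod p
    return t1, t2

def lookup_split(upper, t1, t2, m1):
    """Add the two partial remainders for an (m1 + m2)-bit value."""
    low = upper & ((1 << m1) - 1)   # lower m1 bits address the first table
    high = upper >> m1              # upper m2 bits address the second table
    return t1[low] + t2[high]
```

With m1 = m2 = 3, for instance, the two tables hold 8 + 8 = 16 remainders in total, compared with 2^6 = 64 entries for a single table addressed by all 6 bits.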
[0128]
The m bits are divided from the lower side into t (3 ≦ t ≦ m) groups of partial bits m1, ..., mt, and each value represented by the m bits corresponds to a combination of t values, each represented by mi bits (i is an integer from 1 to t). The table means includes t partial table means Ti, each of which stores in advance, for each value represented by the mi partial bits, the remainder by the modulus p of 2^(k+x) times that value (here, x = m1 + ... + m(i-1)). The adding means obtains a new intermediate number by adding the t remainders read from the t partial table means Ti and the lower data.
[0129]
Thus, the table means has t partial table means Ti corresponding to the values of the t groups of partial bits obtained by dividing the m bits. Because the table means has a plurality of partial table means Ti, the number of remainders stored in each partial table means Ti is reduced.
The remainder multiplication device further includes an accumulation unit and a correction unit. The accumulation unit includes a third register that holds 0 as an initial value, and an adder that adds the partial product C(i) * b[s*i+s-1 : s*i] and the accumulated value held in the third register and causes the third register to hold the addition result as a new accumulated value. The correction unit includes correction value holding means for holding a correction value that is an integer multiple of p, and correction control means for causing the adder to subtract the correction value held in the correction value holding means if the accumulated value held in the third register is equal to or greater than a predetermined value.
[0130]
According to this configuration, the correction control means causes the adder to subtract the correction value from the accumulated value if the accumulated value is equal to or greater than a predetermined value, thereby preventing overflow in the third register.
The third register has a sign bit, and the correction control means causes the adder to subtract the correction value simultaneously with the addition of the accumulated value and the partial product if the accumulated value of the third register is positive. The correction value is an integer multiple of p, and its absolute value is equal to or less than the maximum value (t + 1)(2^s - 1)p of the partial product.
[0131]
According to this configuration, the correction control means causes the adder to subtract the correction value from the accumulated value when the sign bit of the accumulated value held in the third register indicates a positive value, so that overflow in the third register is prevented. Since the correction value is an integer multiple of p and is equal to or less than the maximum value (t + 1)(2^s - 1)p of the partial product, the final accumulated value is either smaller than p or close to p.
[0132]
The modulus p satisfies p = 2^k - α, and each partial table means Ti stores the remainder as k3-bit data, where k3 is the number of bits of t * 2^m * α and α is a constant determined such that k3 is smaller than k.
According to this configuration, p = 2^k - α is satisfied, and α is a constant whose upper limit is set so that k3 is smaller than k. As a result, the remainder stored in each partial table means Ti can be limited to k3 bits, and the number of output bits from the partial table can be reduced. Since the remainder output from the partial table means Ti is k3 bits or less, the bit width of the adder can be limited accordingly.
[0133]
Each partial table means Ti has 2^mi entries for the values from 0 to (2^mi - 1), and stores j * 2^(m1 + ... + m(i-1)) * α as the j-th entry (j is 0 to 2^mi - 1).
According to this configuration, the partial table means Ti stores only the entries 0 to 2^mi - 1, so that the scale of the partial table can be reduced.
[0134]
Also, if α is u bits, the number of bits of the j-th entry j * 2^(m1 + ... + m(i-1)) * α is at most the number of bits of j plus m1 + m2 + ... + m(i-1) plus u. The bit width of the adding means that adds the remainders output from the partial table means Ti can therefore be limited.
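The reason these entries can stand in for remainders is that 2^k ≡ α (mod p) when p = 2^k - α, so j * 2^(k+x) ≡ j * 2^x * α (mod p) for x = m1 + ... + m(i-1). A minimal Python sketch, with the hypothetical helper name `build_alpha_table` and small illustrative parameters that are not taken from the embodiments:

```python
def build_alpha_table(alpha, mi, x):
    """Hypothetical partial table Ti: entry j holds j * 2^x * alpha,
    where x = m1 + ... + m(i-1) is the bit offset of this group."""
    return [j * (1 << x) * alpha for j in range(1 << mi)]
```

With k = 8 and α = 5 (so p = 251), mi = 3, and x = 3, every entry j is congruent to j * 2^(k+x) modulo p and fits within mi + x + u bits, where u is the bit length of α. The entries are not necessarily fully reduced; they are values congruent with the remainder.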
[0135]
A remainder multiplication apparatus according to the present invention is a remainder multiplication apparatus that calculates a number congruent with the remainder by a modulus p (p is k-bit data) of the product of a multiplicand and a multiplier, and includes: output means for outputting, in order from the lower side, s-bit partial multipliers obtained by dividing the multiplier every s bits (s is an integer of 2 or more); first calculation means for carrying the multiplicand according to the position of each partial multiplier and calculating a number congruent with the remainder by the modulus p of the multiplicand after the carry (hereinafter referred to as an intermediate number); second calculation means for calculating, as a partial product, the product of the partial multiplier output by the output means and the intermediate number calculated by the first calculation means for that partial multiplier; accumulating means for accumulating the partial products calculated by the second calculation means; correction means for correcting the accumulated value accumulated by the accumulating means by adding or subtracting an integral multiple of p so that the accumulated value does not exceed a specified number of bits; and control means for causing the calculation of the intermediate number by the first calculation means, the calculation of the partial product by the second calculation means, the accumulation by the accumulating means, and the correction by the correction means to be repeated until all partial multipliers have been output by the output means. The first calculation means has table means that stores in advance, for each value represented by m bits (m is an integer equal to or greater than s), the remainder by the modulus p of 2^k times that value. In the first of the iterations controlled by the control means, the first calculation means outputs the multiplicand as the intermediate number. In the second and subsequent iterations, it carries the previous intermediate number by s bits, reads the table means for the m bits above the lower k bits of the carried number, and calculates a new intermediate number by adding the read number and the lower k bits.
[0136]
According to this configuration, in the second and subsequent iterations controlled by the control means, the first calculation means reads from the table means the remainder corresponding to the upper m bits, excluding the lower k bits, of the intermediate number carried by s bits, and adds that remainder and the lower k bits, thereby calculating as the intermediate number the remainder by the modulus p of the intermediate number after the carry or a value congruent with it. The first calculation means thus calculates the remainder at high speed.
[0137]
The second calculating means calculates the partial product of the partial multiplier and the intermediate number for each s-bit partial multiplier. Since a partial product is calculated for each multi-digit partial multiplier in this way, partial products need only be calculated as many times as there are partial multipliers; the number of iterations in the second calculation means is reduced, and the calculation is faster than in the prior-art product calculation unit 2.
The control means controls pipeline processing consisting of first to third stages. In the first stage, the output means outputs a partial multiplier and the first calculation means outputs an intermediate number; in the second stage, the second calculation means calculates the partial product; and in the third stage, the accumulating means performs the accumulation and the correcting means performs the correction.
[0138]
According to this configuration, the remainder multiplication apparatus performs pipeline processing, so that the processing in each component is efficiently performed, and the processing speed of the operation is further increased.
The output means includes multiplier holding means that initially holds the multiplier and outputs the lower s bits of the held value as a partial multiplier, and shift means that shifts the value held in the multiplier holding means toward the lower side by s bits and causes the multiplier holding means to hold the shifted value.
[0139]
According to this configuration, the output unit can be easily configured by the register that holds the multiplier and the shifter that shifts the held multiplier to the lower side of s bits.
The second calculation means includes: first generation means having first to (s-1)-th shift means, in which the i-th shift means (i is an integer from 1 to (s-1)) carries the intermediate number calculated by the first calculation means by an i-bit left shift; second generation means having s-th shift means for carrying the intermediate number calculated by the first calculation means by an s-bit left shift, complement generating means for generating the one's complement of the intermediate number, and constant output means for outputting a constant 1; selection means that, when all bits of the partial multiplier are determined to be 1, selects the output of the s-th shift means, the one's complement generated by the complement generating means, and the constant 1, and that, when not all bits are determined to be 1, selects the intermediate number if the 2^0 place of the partial multiplier is “1” and selects the carry result of the i-th shift means if the 2^i place of the partial multiplier is “1”; and an adder for calculating the partial product by adding the selection results of the selection means.
[0140]
According to this configuration, the first to (s−1) th shift means are realized by a shifter that performs 1 to (s−1) bit left shift. The s-th shift means is realized by a shifter that performs s-bit left shift. The complement generation means is realized by an inverter. As described above, the first generation unit and the second generation unit can be configured by general-purpose hardware elements. These components generate in advance candidate values to be added by the adding means. The selecting means selects a value to be added to the adding means from the candidate values according to the value of the partial multiplier. The adding means calculates a partial product by adding the values of the selected candidates. With such a configuration, the second calculating means can calculate a partial product of a multi-digit intermediate number and a partial multiplier at high speed.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a schematic configuration of a remainder multiplication apparatus 100 according to a first embodiment.
FIG. 2 shows a more detailed configuration diagram of the remainder calculation unit 12;
FIG. 3A shows the stored contents of the ROM table 309.
FIG. 3B shows the stored contents of the ROM table 308.
FIG. 4 shows a more detailed configuration of the multiplier dividing unit 11.
FIG. 5 shows a more detailed configuration of the partial product calculation unit 13.
FIG. 6 shows a detailed configuration of the accumulation unit 14 and the correction unit 15.
FIG. 7 is a block diagram showing a schematic configuration of a remainder multiplication apparatus 200 in the second embodiment.
FIG. 8 shows a more detailed configuration of the remainder calculation unit 82.
FIG. 9 shows the stored contents of the ROM table 904.
FIG. 10 shows a more detailed configuration of the multiplier dividing unit 81.
FIG. 11 shows a more detailed configuration of the partial product calculation unit 83.
FIG. 12 shows a detailed configuration of the operation value selectors 840, 841, and 842.
FIG. 13A shows the correspondence between the partial multiplier b(i) input to the control circuit 1306 and the input/output logic of the control circuit 1306.
FIG. 13B shows the correspondence between the partial multiplier b(i) input to the control circuit 1305 and the input/output logic of the control circuit 1305.
FIG. 13C shows the correspondence between the partial multiplier b(i) input to the control circuit 1304 and the input/output logic of the control circuit 1304.
FIG. 14 shows a more detailed configuration of the accumulation unit 84 and the correction unit 85.
FIG. 15 shows another configuration of the partial product calculation unit 83.
FIG. 16 shows the input/output logic of the control circuit common to the operation value selectors 850, 851, and 852.
FIG. 17 shows a configuration of a conventional modular multiplication apparatus.
[Explanation of symbols]
100 Remainder multiplication apparatus
11 Multiplier dividing unit
12 Remainder calculation unit
13 Partial product calculation unit
14 Accumulation unit
15 Correction unit
16 Control unit
308 ROM table
309 ROM table
301 Selector
302 Register A
303 Shifter
304 Selector
310 Adder
311 Remainder calculation unit

Claims

A remainder multiplication apparatus that calculates, for the product of a multiplicand a and a multiplier b (b is kb-bit data) and a modulus p (p is k-bit data), a value congruent to the remainder of the product modulo p as the accumulated value of the following expression:

$$\sum_{i=0}^{[[kb/s]]} C(i) \cdot b[s \cdot i + s - 1 : s \cdot i]$$

where [[kb/s]] is the integer part of the quotient kb/s,
i is an integer from 0 to [[kb/s]],
C(i) is defined by a recurrence: C(0) = a, and for i ≥ 1, C(i) ≡ (C(i−1) * 2^s) mod p (≡ indicates that the values on both sides are congruent modulo p), and
b[s*i+s−1 : s*i] is the s-bit partial multiplier consisting of the 2^(s*i+s−1) digit through the 2^(s*i) digit of the kb-bit multiplier b,
the remainder multiplication apparatus comprising:
table means for storing in advance, for each value represented by m bits (m is an integer equal to or greater than s), the remainder modulo p of 2^k times that value; and
intermediate number calculating means for outputting the multiplicand a as the intermediate number C(0) the first time (i = 0) and, from the second time on (i > 0), carrying the previously output intermediate number C(i−1) up by s bits, reading from the table means the remainder for the upper m bits of the carried intermediate number, excluding its lower k bits, and calculating a new intermediate number C(i) by adding the read remainder and the lower k bits,
wherein the partial products C(i) * b[s*i+s−1 : s*i] of each calculated intermediate number C(i) and the corresponding partial multiplier b[s*i+s−1 : s*i] are sequentially accumulated to calculate the accumulated value.
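
The recurrence and table lookup recited in this claim can be traced with a short Python sketch. It is a software illustration under stated assumptions, not the claimed hardware: the names `build_table`, `next_intermediate`, and `congruent_product` are hypothetical, the multiplicand is assumed to fit in k bits, and the table width is fixed at m = s + 1 so that the upper part of every carried intermediate number is a valid table index.

```python
def build_table(p: int, k: int, m: int) -> list:
    """For every m-bit value v, store (v * 2**k) mod p."""
    return [(v << k) % p for v in range(1 << m)]


def next_intermediate(c_prev: int, s: int, k: int, table: list) -> int:
    """Carry c_prev up by s bits, replace the part above the lower k bits
    by its tabulated remainder, and add the lower k bits."""
    shifted = c_prev << s
    upper = shifted >> k               # bits above the lower k bits
    lower = shifted & ((1 << k) - 1)   # lower k bits
    return table[upper] + lower        # congruent to shifted modulo p


def congruent_product(a: int, b: int, p: int, k: int, kb: int, s: int) -> int:
    """Return a value congruent to a * b modulo p (not fully reduced)."""
    m = s + 1                          # assumption of this sketch
    table = build_table(p, k, m)
    mask = (1 << s) - 1
    acc = 0
    c = a                              # C(0) = a, assumed to be < 2**k
    for i in range(kb // s + 1):       # i = 0 .. [[kb/s]]
        if i > 0:
            c = next_intermediate(c, s, k, table)
        acc += c * ((b >> (s * i)) & mask)
    return acc
```

The returned value is only congruent to the product; a final reduction modulo p, which the dependent claims assign to post-processing means, is still needed to obtain the remainder itself.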

The remainder multiplication apparatus according to claim 1, wherein the intermediate number calculating means includes:
first holding means for holding an intermediate number;
carrying means for carrying the intermediate number held in the first holding means up by s bits in correspondence with the partial multiplier b[s*i+s−1 : s*i];
dividing means for dividing the carried intermediate number into upper data, which is the m-bit portion above the lower k bits, and lower data, which is the lower k bits;
reading means for reading from the table means the remainder modulo p for the upper data obtained by the dividing means; and
adding means for obtaining a new intermediate number by adding the remainder read from the table means and the lower data obtained by the dividing means,
wherein the first holding means updates its held content to the new intermediate number each time the adding means obtains one, outputs the multiplicand as the intermediate number the first time, and outputs the updated intermediate number from the second time on.

The remainder multiplication apparatus according to claim 2, further comprising post-processing means for obtaining the remainder of the accumulated value modulo p by using the dividing means, the reading means, and the adding means.

The remainder multiplication apparatus according to claim 2 or 3, wherein the table means includes a memory element that receives as an address each value represented by m bits and stores in advance, in the storage area indicated by that address, the remainder modulo p of 2^k times the m-bit value corresponding to the address.

The remainder multiplication apparatus according to claim 2, wherein
the m bits are divided into lower m1 bits and upper m2 bits (m = m1 + m2),
each value represented by the m bits corresponds to a combination of a value represented by the m1 bits and a value represented by the m2 bits,
the table means includes:
first partial table means for storing in advance, for each value represented by m1 bits, the remainder modulo p of 2^k times that value; and
second partial table means for storing in advance, for each value represented by m2 bits, the remainder modulo p of 2^(k+m1) times that value, and
the adding means obtains the new intermediate number by adding the remainders read from the first and second partial table means and the lower data.
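
Splitting the table trades one extra addition for a much smaller memory: 2^m1 + 2^m2 entries instead of 2^(m1+m2). The Python sketch below illustrates the lookup with two partial tables; the function names and the example parameters in the usage note are hypothetical choices for illustration only.

```python
def build_split_tables(p: int, k: int, m1: int, m2: int):
    """First table: (v * 2**k) mod p for every m1-bit v.
    Second table: (v * 2**(k + m1)) mod p for every m2-bit v."""
    t1 = [(v << k) % p for v in range(1 << m1)]
    t2 = [(v << (k + m1)) % p for v in range(1 << m2)]
    return t1, t2


def reduce_with_split_tables(x: int, k: int, m1: int, m2: int, t1, t2) -> int:
    """Return a value congruent to x modulo p, for x of at most k+m1+m2 bits."""
    lower = x & ((1 << k) - 1)        # lower k bits (lower data)
    upper = x >> k                    # upper m1 + m2 bits (upper data)
    u1 = upper & ((1 << m1) - 1)      # lower m1 bits of the upper data
    u2 = upper >> m1                  # upper m2 bits of the upper data
    return t1[u1] + t2[u2] + lower    # two remainders plus the lower data
```

For example, with k = 8, m1 = 2, and m2 = 3 the two tables hold 4 + 8 = 12 entries in total, where a single table indexed by all 5 upper bits would hold 32.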

The remainder multiplication apparatus according to claim 2, wherein
the m bits are divided, from the lower side, into t (3 ≦ t ≦ m) groups of partial bits m1, ..., mt,
each value represented by the m bits corresponds to a combination of the t values represented by the respective mi bits (i is an integer from 1 to t),
the table means includes t partial table means Ti, each storing in advance, for each value represented by the mi partial bits, the remainder modulo p of 2^(k+x) times that value (where x = m1 + ... + m(i−1)), and
the adding means obtains the new intermediate number by adding the t remainders read from the t partial table means Ti and the lower data.

The remainder multiplication apparatus according to claim 2, wherein
the m bits are divided, from the lower side, into t (2 ≦ t ≦ m) groups of partial bits m1, ..., mt,
the table means includes t partial table means Ti (i is an integer from 1 to t), each storing in advance, for each value represented by the mi partial bits, the remainder modulo p of 2^(k+x) times that value, where x is given by the following expression:

$$x = \sum_{j=1}^{i-1} m_j$$

and the adding means obtains the new intermediate number by adding the t remainders read from the t partial table means Ti and the lower data.

The remainder multiplication apparatus according to claim 7, wherein each partial table means Ti stores the remainder as k-bit data.

The remainder multiplication apparatus according to claim 8, further comprising accumulation means and correction means, wherein
the accumulation means includes:
a third register holding 0 as an initial value; and
an adder that adds the partial product C(i) * b[s*i+s−1 : s*i] to the accumulated value held in the third register and outputs the addition result to the third register to be held as a new accumulated value, and
the correction means includes:
correction value holding means for holding a correction value that is an integer multiple of p; and
correction control means for causing the adder to subtract the correction value held in the correction value holding means when the accumulated value held in the third register is equal to or greater than a predetermined value.

The remainder multiplication apparatus according to claim 9, wherein
the third register has a sign bit,
when the accumulated value of the third register is positive, the correction control means causes the adder to subtract the correction value simultaneously with the addition of the accumulated value and the partial product, and
the correction value is an integer multiple of p whose absolute value is equal to or less than the maximum value (t+1)(2^s − 1)p of the partial product.
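
Because the correction value is a multiple of p, subtracting it never changes the residue of the accumulated value; it only keeps the register from growing. The Python sketch below illustrates the sign-controlled variant. The function name and the sample values in the usage note are hypothetical, and the bound mentioned there holds only under the additional assumption that every partial product lies between 0 and the correction value.

```python
def accumulate_with_correction(partial_products, correction: int) -> int:
    """Accumulate partial products; whenever the running value is positive,
    subtract the correction value in the same addition step."""
    acc = 0                            # the register starts at 0
    for pp in partial_products:
        if acc > 0:                    # sign bit clear and value nonzero
            acc = acc + pp - correction
        else:
            acc = acc + pp
    return acc
```

With p = 13, a correction value of 3p = 39, and partial products no larger than 39, the running value stays in the range from −38 to 39 while remaining congruent to the plain sum modulo 13.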

The remainder multiplication apparatus according to claim 7, wherein
the modulus p satisfies the relationship p = 2^k − α,
each partial table means Ti stores the remainder as k3-bit data, where k3 is the number of bits of t * 2^m * α, and
α is a constant determined so that k3 is smaller than k.

The remainder multiplication apparatus according to claim 11, wherein
each partial table means Ti has 2^mi entries corresponding to the values from 0 to (2^mi − 1) represented by mi bits, and
the j-th entry (j is from 0 to 2^mi − 1) stores j * 2^(m1 + ... + m(i−1)) * α.
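
When p = 2^k − α, the congruence 2^k ≡ α (mod p) means that j * 2^(k+x) ≡ j * 2^x * α, so an entry can hold the short value j * 2^x * α without any reduction. The Python sketch below builds such tables and is only an illustration: `build_alpha_tables` and the example parameters in the usage note are hypothetical.

```python
def build_alpha_tables(alpha: int, widths: list) -> list:
    """Build one table per bit group. widths lists m1, ..., mt from the
    lower side; entry j of table Ti holds j * 2**x * alpha, where
    x = m1 + ... + m(i-1)."""
    tables = []
    x = 0                              # bit offset of the current group
    for mi in widths:
        tables.append([j * (1 << x) * alpha for j in range(1 << mi)])
        x += mi
    return tables
```

For k = 8 and α = 5 (p = 251) with groups of 2 and 3 bits, the largest entry is 7 * 4 * 5 = 140, which already fits in 8 bits; a smaller α relative to 2^k would shorten the entries further.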

The remainder multiplication apparatus according to claim 12, further comprising accumulation means and correction means, wherein
the accumulation means includes:
a third register holding 0 as an initial value; and
an adder that adds the partial product C(i) * b[s*i+s−1 : s*i] to the accumulated value held in the third register and outputs the addition result to the third register to be held as a new accumulated value, and
the correction means includes:
correction value holding means for holding a correction value that is an integer multiple of p; and
correction control means for causing the adder to subtract the correction value held in the correction value holding means when the accumulated value held in the third register is equal to or greater than a predetermined value.

The remainder multiplication apparatus according to claim 13, wherein
the third register has a sign bit,
when the accumulated value of the third register is positive, the correction control means causes the adder to subtract the correction value simultaneously with the addition of the accumulated value and the partial product, and
the correction value is an integer multiple of p and is equal to or less than the maximum value (2^(s+1) − 2)p of the partial product.

A remainder multiplication apparatus that calculates, for the product of a multiplicand and a multiplier, a number congruent to the remainder modulo p (p is k-bit data), the apparatus comprising:
output means for sequentially outputting, in order from the lower side, s-bit partial multipliers obtained by dividing the multiplier into units of s bits (s is an integer of 2 or more);
first calculation means for carrying the multiplicand up according to the position of each partial multiplier and calculating a number congruent to the remainder modulo p of the carried multiplicand (hereinafter referred to as an intermediate number);
second calculation means for calculating, as a partial product, the product of the partial multiplier output by the output means and the intermediate number calculated by the first calculation means for that partial multiplier;
accumulating means for accumulating the partial products calculated by the second calculation means;
correction means for correcting the accumulated value so that it does not exceed a predetermined number of bits, by adding or subtracting an integer multiple of p to or from the accumulated value accumulated by the accumulating means; and
control means for causing the calculation of the intermediate number by the first calculation means, the calculation of the partial product by the second calculation means, the accumulation by the accumulating means, and the correction by the correction means to be repeated until all the partial multipliers have been output by the output means,
wherein the first calculation means has table means for storing in advance, for each value represented by m bits (m is an integer equal to or greater than s), the remainder modulo p of 2^k times that value, and
in the repetitions performed by the control means, outputs the multiplicand as the intermediate number the first time and, from the second time on, carries the previously output intermediate number up by s bits, reads from the table means the remainder for the m bits above the lower k bits of the carried intermediate number, and calculates a new intermediate number by adding the read remainder and the lower k bits.

The remainder multiplication apparatus according to claim 15, wherein
the first calculation means includes:
first holding means for holding the multiplicand;
carrying means for carrying the intermediate number held in the first holding means up by s bits, from the second time on, in correspondence with the partial multiplier output from the output means;
dividing means for dividing the carried intermediate number into upper data, which is the m-bit portion above the lower k bits, and lower data, which is the lower k bits;
reading means for reading from the table means the remainder modulo p for the upper data obtained by the dividing means; and
adding means for obtaining a new intermediate number by adding the remainder read from the table means and the lower data obtained by the dividing means,
the first holding means updates its held content to the new intermediate number each time the adding means obtains one, outputs the multiplicand as the intermediate number the first time, and outputs the updated intermediate number from the second time on, and
the second calculation means calculates the partial product using the content held by the first holding means as the intermediate number.

The remainder multiplication apparatus according to claim 16, wherein the control means controls pipeline processing consisting of first to third stages, causing the output means to output a partial multiplier and the first calculation means to output an intermediate number in the first stage, causing the second calculation means to calculate the partial product in the second stage, and causing the accumulating means to accumulate and the correction means to correct in the third stage.

The remainder multiplication apparatus according to claim 17, wherein the output means includes:
multiplier holding means for initially holding the multiplier and outputting the lower s bits of the held value as a partial multiplier; and
shift means for shifting the value held in the multiplier holding means toward the lower side by s bits and outputting the shifted value to the multiplier holding means to be held.
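
The hold-and-shift behaviour of the output means can be modelled as a small generator. This Python sketch is illustrative only: the name `partial_multipliers` is hypothetical, and the number of outputs is assumed to be the multiplier length kb divided by s, rounded up.

```python
def partial_multipliers(b: int, kb: int, s: int):
    """Yield the s-bit partial multipliers of a kb-bit multiplier b,
    lowest group first, by repeatedly taking the lower s bits of the
    held value and shifting it right by s bits."""
    held = b                           # multiplier holding means
    mask = (1 << s) - 1
    for _ in range(-(-kb // s)):       # ceil(kb / s) outputs (assumption)
        yield held & mask              # lower s bits are the output
        held >>= s                     # shift toward the lower side
```

For instance, `list(partial_multipliers(0b101101, 6, 2))` yields the groups `0b01`, `0b11`, and `0b10` in that order.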

The remainder multiplication apparatus according to claim 18, wherein the second calculation means includes:
multiplication means for calculating the partial product from the intermediate number held in the first holding means and the partial multiplier output from the multiplier holding means; and
second holding means for holding the partial product calculated by the multiplication means.

The remainder multiplication apparatus according to claim 19, wherein the accumulating means includes:
third holding means for holding the accumulated value; and
an adder for adding the partial product held in the second holding means and the accumulated value held in the third holding means,
and the third holding means holds the addition result as a new accumulated value.

The remainder multiplication apparatus according to claim 20, wherein the correction means includes:
correction value holding means for holding a correction value that is an integer multiple of p; and
correction control means for causing the adder, when the number of effective bits of the accumulated value held in the third holding means is equal to or greater than a predetermined number of bits, to add or subtract the correction value held in the correction value holding means simultaneously with the addition of the partial product and the accumulated value.

The remainder multiplication apparatus according to claim 21, wherein the control means controls the pipeline processing by using the first holding means and the multiplier holding means as pipeline latches between the first and second stages and the second holding means as a pipeline latch between the second and third stages.

The remainder multiplication apparatus according to claim 17, wherein the table means includes a memory element that receives as an address each value represented by m bits and stores in advance, in the storage area indicated by that address, the remainder modulo p of 2^k times the m-bit value corresponding to the address.

The remainder multiplication apparatus according to claim 17, wherein
the m bits are divided into lower m1 bits and upper m2 bits (m = m1 + m2),
each value represented by the m bits corresponds to a combination of a value represented by the m1 bits and a value represented by the m2 bits,
the table means includes:
first partial table means for storing in advance, for each value represented by m1 bits, the remainder modulo p of 2^k times that value; and
second partial table means for storing in advance, for each value represented by m2 bits, the remainder modulo p of 2^(k+m1) times that value, and
the adding means adds the remainders read from the first and second partial table means and the lower data.

The remainder multiplication apparatus according to claim 17, wherein
the m bits are divided, from the lower side, into t (3 ≦ t ≦ m) groups of partial bits m1, ..., mt,
each value represented by the m bits corresponds to a combination of the t values represented by the respective mi bits (i is an integer from 1 to t),
the table means includes t partial table means Ti, each storing in advance, for each value represented by the mi partial bits, the remainder modulo p of 2^(k+x) times that value (where x = m1 + ... + m(i−1)), and
the adding means adds the t remainders read from the t partial table means Ti and the lower data.

The remainder multiplication apparatus according to claim 16, wherein the second calculation means includes:
determining means for determining whether or not all bits of the s-bit partial multiplier output from the output means are 1;
first generation means for generating a value obtained by multiplying the intermediate number calculated by the first calculation means by a power of 2, and a negative value of the multiplicand;
second generation means for generating, for each bit of the partial multiplier, the product of the weight of that bit in the partial multiplier and the intermediate number, the product being the intermediate number carried up by the bit weight; and
second adding means for adding the values generated by the first generation means when all the bits are determined to be 1 and, when not all the bits are determined to be 1, adding those products generated by the second generation means that correspond to the bits of "1" in the partial multiplier.

The remainder multiplication apparatus according to claim 26, wherein
the second generation means includes first to (s−1)-th shift means, the i-th shift means (i is an integer from 1 to (s−1)) carrying the intermediate number calculated by the first calculation means up by an i-bit left shift,
the first generation means includes:
s-th shift means for carrying the intermediate number calculated by the first calculation means up by an s-bit left shift;
complement generation means for generating the one's complement of the intermediate number; and
constant output means for outputting the constant 1, and
the second adding means includes:
selection means for selecting, when all bits are determined to be 1, the output of the s-th shift means, the one's complement generated by the complement generation means, and the constant 1, and, when not all bits are determined to be 1, selecting the intermediate number if the 2^0 digit of the partial multiplier is "1" and selecting the carry result of the i-th shift means if the 2^i digit of the partial multiplier is "1"; and
an adder that calculates the partial product by adding the selection results of the selection means.

The remainder multiplication apparatus according to claim 16, wherein
s is a multiple of 3 (3n) bits,
the partial multiplier is represented by n pieces of 3-bit data sn, ..., s1,
the second calculation means includes:
first to n-th determination means, the j-th determination means (j is an integer from 1 to n) determining whether or not the 3-bit data sj is "111";
first to n-th special generation means, the j-th special generation means generating, when the j-th determination means determines that sj is "111", a value obtained by multiplying the intermediate number by "1000" and by the positional weight 2^(3(j−1)) of the 3-bit data sj, and a value obtained by multiplying the negative of the multiplicand by the positional weight 2^(3(j−1)) of the 3-bit data sj;
first to n-th general generation means, the j-th general generation means generating, when the j-th determination means determines that sj is not "111", for each bit of the 3-bit data sj, the product of the logical value of that bit, the weight of that bit in the s-bit partial multiplier, and the intermediate number; and
adding means for adding the values generated by the special generation means corresponding to each sj determined to be "111" and the products generated by the general generation means corresponding to each sj determined not to be "111".
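
Handling each 3-bit group separately caps the number of addends at two per group, because "111" is replaced by 8 − 1 and every other 3-bit pattern contains at most two 1 bits. The Python sketch below illustrates this; it is not the claimed circuit. The function name is hypothetical, and the negated value is assumed to be that of the intermediate number, as the one's-complement wording of the dependent claims suggests.

```python
def partial_product_3bit_groups(c: int, b_part: int, n: int):
    """Multiply intermediate number c by a 3n-bit partial multiplier,
    treating each 3-bit group sj separately.
    Returns (product, number of addends used)."""
    terms = []
    for j in range(1, n + 1):
        pos = 3 * (j - 1)                   # group weight is 2**pos
        sj = (b_part >> pos) & 0b111
        if sj == 0b111:
            # Special case: 7 * c = 8 * c - c, scaled by the group weight.
            terms.append(c << (pos + 3))    # c * "1000" * 2**pos
            terms.append(-(c << pos))       # negated value * 2**pos
        else:
            # General case: one shifted copy per '1' bit (at most two).
            for bit in range(3):
                if (sj >> bit) & 1:
                    terms.append(c << (pos + bit))
    return sum(terms), len(terms)
```

A plain shift-and-add over 3n bits could need up to 3n addends, so under these assumptions the grouping reduces the worst case to 2n.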

The remainder multiplication apparatus according to claim 28, wherein
the second calculation means includes first to s-th shift means, the i-th shift means (i is an integer from 1 to s) carrying the intermediate number up by an i-bit left shift,
the first general generation means uses the shift results of the first and second shift means,
the j-th general generation means other than the first uses the shift results of the (3j−3)-th, (3j−2)-th, and (3j−1)-th shift means,
the j-th special generation means uses the shift result of the (3j)-th shift means, and
the j-th special generation means and the (j+1)-th general generation means share the (3j)-th shift means.

The remainder multiplication apparatus according to claim 29, wherein
the j-th special generation means includes:
complement generation means for generating the one's complement of the intermediate number or of the shift result of the (3j−3)-th shift means; and
constant output means for outputting the constant 1, and
the adding means includes:
selection means for selecting, when sj is determined to be "111", the output of the (3j)-th shift means, the one's complement generated by the complement generation means in the j-th special generation means, and the constant 1 output by the constant output means in the j-th special generation means; selecting, when s1 is not determined to be "111", the intermediate number and the outputs of the (3j−2)-th and (3j−1)-th shift means; and selecting, when sj other than s1 is not determined to be "111", the outputs of the (3j−3)-th, (3j−2)-th, and (3j−1)-th shift means; and
an adder that calculates the partial product by adding the selection results of the selection means.

The remainder multiplication apparatus according to claim 16, further comprising post-processing means for obtaining the remainder modulo p of the last corrected accumulated value by using the dividing means, the reading means, and the adding means.