JP5127241B2

JP5127241B2 - Residue calculation device and residue calculation method

Info

Publication number: JP5127241B2
Application number: JP2007010538A
Authority: JP
Inventors: 大輔鈴木
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2007-01-19
Filing date: 2007-01-19
Publication date: 2013-01-23
Anticipated expiration: 2027-01-19
Also published as: JP2008176136A

Description

The present invention relates to a residue calculation device and a residue calculation method. In particular, it relates to a multiple-length arithmetic unit in a residue calculation device and to its control method (processing method).

The RSA (registered trademark) (Rivest-Shamir-Adleman) public-key cryptosystem (hereinafter, "RSA encryption") is currently used in electronic commerce, network communication, and similar applications. RSA encryption requires modular exponentiation. Because the operands are large, for example 512 or 1024 bits, the modular exponentiation used in RSA encryption is generally known to take a very long processing time.
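To make the cost concrete, the following minimal sketch shows binary square-and-multiply, a textbook way to compute a modular exponentiation. It is not taken from the patent, and the function name `modexp` is hypothetical. For a 1024-bit exponent the loop runs about 1024 times, and every pass performs one or two multiplications of full-size operands, each followed by a reduction modulo the modulus.

```python
def modexp(base, exponent, modulus):
    """Textbook right-to-left square-and-multiply (illustrative only)."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:
            # Multiply step: executed once per set bit of the exponent.
            result = (result * base) % modulus
        # Square step: executed once per bit of the exponent.
        base = (base * base) % modulus
        exponent >>= 1
    return result
```

Each `% modulus` here is a full division by a large number, which is the step that Montgomery multiplication is designed to avoid.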

As a method for performing modular exponentiation efficiently in hardware, the algorithm known as Montgomery multiplication, proposed by P. L. Montgomery, is widely known. Montgomery multiplication allows the division required for the residue operation to be replaced with a bit shift. Non-Patent Document 1 discloses an algorithm that computes Montgomery multiplication efficiently even when the circuit has latency. The Montgomery multiplication algorithm described in Non-Patent Document 1 is explained below.
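As an illustration of how the division is replaced by a bit shift, here is a minimal sketch of the textbook Montgomery product with R = 2^r_bits. It is not the latency-tolerant algorithm of Non-Patent Document 1; the function name and parameters are assumptions chosen for the example, and `pow(m, -1, ...)` requires Python 3.8 or later.

```python
def montgomery_multiply(a, b, m, r_bits):
    """Return a*b*R^-1 mod m, where R = 2**r_bits, m is odd, and a, b < m < R."""
    r_mask = (1 << r_bits) - 1
    # m_prime = -m^-1 mod R (precomputed once per modulus in practice).
    m_prime = (-pow(m, -1, 1 << r_bits)) & r_mask
    t = a * b
    # "mod R" is a bit mask, not a division.
    q = (t * m_prime) & r_mask
    # t + q*m is an exact multiple of R, so "divide by R" is a right shift.
    s = (t + q * m) >> r_bits
    # s < 2m, so one conditional subtraction completes the reduction.
    return s - m if s >= m else s
```

The only operations on full-size numbers are multiplications, additions, a mask, and a shift; no division by m occurs.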

As conventional hardware implementations of the Montgomery multiplication described above, Non-Patent Document 1 discloses a configuration example using 4-2 adders. Non-Patent Document 2 shows a systolic array whose basic building block is an arithmetic unit composed of the RAM (Random Access Memory) of an FPGA (Field Programmable Gate Array) and an adder. Non-Patent Document 3 shows a systolic array that uses the embedded multipliers of an FPGA. An example of such an FPGA is the one described in Non-Patent Document 4.

Non-Patent Document 1: H. Orup, "Simplifying Quotient Determination in High-Radix Modular Multiplication", Proc. of the 12th Symposium on Computer Arithmetic, 1995
Non-Patent Document 2: T. Blum, C. Paar, "High-Radix Montgomery Modular Exponentiation on Reconfigurable Hardware", IEEE Transactions on Computers, Vol. 50, No. 7, pp. 759-764, July 2001
Non-Patent Document 3: S. H. Tang, K. S. Tsui, P. H. W. Leong, "Modular Exponentiation using Parallel Multipliers", Proceedings. 2003 IEEE International Conference on Field-Programmable Technology (FPT), pp. 52-59, December 2003
Non-Patent Document 4: Xilinx, Inc., "Virtex-4 User Guide", September 12, 2005, Internet <URL: http://direct.xilinx.com/bvdocs/userguides/j_ug070.pdf>

Recent FPGAs have gained a variety of functional extensions. For example, many common devices come with dedicated circuits for high-speed multiplication of multiple-length data, so a device that multiplies efficiently can be built easily by using them. Some FPGAs, such as the one in Non-Patent Document 4, also embed multiple basic DSP (Digital Signal Processor) elements of the kind widely used for audio and image processing. The conventional hardware implementations of Montgomery multiplication described above were not designed with these FPGA extensions in mind, so simply implementing a conventional scheme on an extended FPGA cannot fully exploit the functions the FPGA provides.

An object of the present invention is to enable high-speed operation with fewer circuit resources, and further to increase the versatility of the circuit, by adopting, in a cryptographic processing circuit that processes a cryptographic algorithm by performing residue operations, a circuit configuration and processing method adapted to the functional extensions of FPGAs.

A residue calculation device according to one aspect of the present invention obtains a Montgomery multiplication result based on the sum of the product of A and B, the product of M and Q, and an intermediate result S, where A is the multiplicand, B is the multiplier, M is the modulus, and Q is an intermediate value. The device comprises:
a multiple-length multiplication unit that performs the multiplication of A and B and the multiplication of M and Q with a multiplier at an operating frequency twice a predetermined operating frequency, and outputs each product;
a multiple-length addition unit that adds the two products output by the multiple-length multiplication unit and S with a plurality of adders at the predetermined operating frequency, and outputs the addition result of each adder; and
a residue calculation unit that concatenates the per-adder addition results output by the multiple-length addition unit to obtain the Montgomery multiplication result,
wherein the multiple-length multiplication unit performs a multiplication of a predetermined bit length in each cycle of the doubled operating frequency, and
the multiple-length addition unit performs an addition of twice the predetermined bit length in each cycle of the predetermined operating frequency.

According to one aspect of the present invention, in a residue calculation device that obtains a Montgomery multiplication result based on the sum of the product of A and B, the product of M and Q, and an intermediate result S (where A is the multiplicand, B the multiplier, M the modulus, and Q an intermediate value), the multiple-length multiplication unit performs the multiplication of A and B and the multiplication of M and Q with a multiplier at twice a predetermined operating frequency and outputs each product; the multiple-length addition unit adds the two products and S with a plurality of adders at the predetermined operating frequency and outputs the addition result of each adder; and the residue calculation unit concatenates the per-adder addition results to obtain the Montgomery multiplication result. As a result, a cryptographic processing circuit that processes a cryptographic algorithm by performing residue operations can operate at high speed with fewer circuit resources, and the versatility of the circuit is improved.
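The sum of the two products and the intermediate result S can be illustrated with a textbook radix-2^k (word-serial) Montgomery multiplication. The sketch below is an assumption-laden illustration, not the patent's algorithm: it derives q in the same iteration in which it is used (no delay d), uses arbitrary-precision integers instead of k-bit word pipelines, and all names are hypothetical.

```python
def montgomery_word_serial(a, b, m, k, n):
    """Return a*b*2**(-k*n) mod m for odd m, with a, b < m < 2**(k*n)."""
    mask = (1 << k) - 1
    # -m^-1 mod 2**k, precomputed once per modulus (Python 3.8+).
    m_prime = (-pow(m, -1, 1 << k)) & mask
    s = 0  # intermediate result S
    for i in range(n):
        b_i = (b >> (k * i)) & mask            # i-th k-bit word of B
        q_i = ((s + a * b_i) * m_prime) & mask # intermediate value Q
        # S + A*b_i + M*q_i is a multiple of 2**k, so the shift is exact.
        s = (s + a * b_i + m * q_i) >> k
    return s - m if s >= m else s
```

Each iteration contains exactly the two products (A times a word of B, and M times q) and the addition with S that the multiplication and addition units described above divide between themselves.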

Embodiments of the present invention are described below with reference to the drawings.

Embodiment 1.
The cryptographic processing circuit according to this embodiment processes a cryptographic algorithm by performing residue operations and, in particular, achieves faster processing and a smaller circuit by making effective use of FPGA resources.

FIG. 1 is a block diagram showing the configuration of a security chip 101 that uses an FPGA, as an example of a device using the cryptographic processing circuit according to this embodiment. The security chip 101 is incorporated into, for example, small terminals such as mobile phones, IC (Integrated Circuit) cards, and IC card reader/writers, or communication equipment such as network routers, and is used for cryptographic processing of communication data.

In FIG. 1, the security chip 101 includes a CPU (Central Processing Unit) 102, an internal memory 103, an external interface 104, and a cryptographic processing circuit memory 105, which are connected by a bus 106. The security chip 101 also includes a Montgomery multiplication circuit 201 as the cryptographic processing circuit according to this embodiment.

The CPU 102 is a processor (processing device) that executes programs for controlling input to and output from the Montgomery multiplication circuit 201, reading and writing data in the internal memory 103, communicating with external devices via the external interface 104, and so on. The internal memory 103 consists of RAM (Random Access Memory), used as a work area for the CPU 102, and ROM (Read Only Memory), which stores programs, configuration parameters, and the like; it is assumed to be implemented with embedded memory, one of the functions the FPGA provides. The external interface 104 is any of various interfaces such as serial communication or LAN (local area network) and carries the data processed by the Montgomery multiplication circuit 201 into and out of the security chip 101. The cryptographic processing circuit memory 105 serves as the interface between the Montgomery multiplication circuit 201 and the bus 106.

The Montgomery multiplication circuit 201 is an example of a residue calculation device. By applying the functions and characteristics of the FPGA to cryptographic processing based on Montgomery multiplication, it enables more efficient computation than conventional schemes. The Montgomery multiplication circuit 201 is described in detail below.

First, the algorithm implemented in the Montgomery multiplication circuit 201 is shown below. It is an improved version of the Montgomery multiplication algorithm of Non-Patent Document 1 that takes the functions and characteristics of FPGAs into account.

In the pseudo code of the above algorithm, "|" denotes bit concatenation. carry_k denotes a k-bit intermediate value and carry_1 a 1-bit intermediate value. The variables p[i][j], u[i][j], v[i][j], and s[i][j] are each k bits wide and are initialized to 0.

The pseudo code has two distinguishing features: the multiplication is divided into two k-bit-wise multiplication processes, multiple-length multiplication process (1) and multiple-length multiplication process (2), and the addition is divided into two 2k-bit-wise addition processes, multiple-length addition process (1) and multiple-length addition process (2). A further feature is that multiple-length multiplication process (2) handles only the case j = 0 differently, which allows the q derivation process to run quickly. The reasons and details are described later.

FIG. 2 is a block diagram showing the configuration of the Montgomery multiplication circuit 201. The Montgomery multiplication circuit 201 shown in FIG. 2 as a whole processes the entire pseudo code described above.

In FIG. 2, the Montgomery multiplication circuit 201 includes four types of arithmetic circuits: arithmetic circuit A301, arithmetic circuit B401, arithmetic circuit C501, and arithmetic circuit D601. Arithmetic circuits A301 and D601 are examples of the multiple-length multiplication unit and perform multiple-length multiplication processes (1) and (2) of the pseudo code. Arithmetic circuits B401 and C501 are examples of the multiple-length addition unit and perform, respectively, multiple-length addition process (1) and multiple-length addition process (2) of the pseudo code. Before the overall configuration and operation of the Montgomery multiplication circuit 201 are described, each type of arithmetic circuit is described first.

FIG. 3 is a block diagram showing an example of the configuration of arithmetic circuit A301.

Arithmetic circuit A301 shown in FIG. 3 performs the iterations from j = 0 to j = 2t-1 of the loops in multiple-length multiplication processes (1) and (2) of the pseudo code described above. The processing of arithmetic circuit A301 can be expressed by the following pseudo code (hereinafter, "pseudo code A").

The correspondence between pseudo code A and the parts of arithmetic circuit A301 is as follows.

First, either a[j] and a[j+1] or m[j] and m[j+1] of pseudo code A are input simultaneously through input port 302, which therefore has a bus width of 2k bits. b[i] of pseudo code A is input through input port 303, and q[i-d] of pseudo code A through input port 304. Data from input port 302 are stored in input registers 309 and 310, data from input port 303 in input register 311, and data from input port 304 in input register 312.

When input registers 309 and 310 hold a[j] and a[j+1], multiplexer 306 selects one of them and the selected value is stored in intermediate register 313. Likewise, when they hold m[j] and m[j+1], multiplexer 306 selects one of them and the selected value is stored in intermediate register 313. Multiplexer 307 selects between b[i] and q[i-d], held in input registers 311 and 312, and the selected value is stored in intermediate register 314. The data in intermediate register 314 is also output to output port 319.

The data in intermediate registers 313 and 314 are multiplied by multiplier 316, and the product is stored in intermediate register 315. Because it holds the result of a k-bit by k-bit multiplication, intermediate register 315 is a 2k-bit register. Adder 317 adds the data in intermediate register 315 to either the data supplied through input port 305 and held in input register 308, or the upper k bits of the data in output register 322, and the sum is stored in output register 322. Multiplexer 318 selects between the data in input register 308 and the upper k bits of output register 322. Output register 322 is a 2k-bit register: its upper k bits are connected to an input of multiplexer 318 and to output port 320, and its lower k bits are connected to output port 321.

Next, the detailed operation of arithmetic circuit A301 is described. The clock for the portion outside the dotted line in FIG. 3 is denoted clk1x, and the clock for the portion inside the dotted line is denoted clk2x; clk2x runs at twice the frequency of clk1x. The portion inside the dotted line in FIG. 3 is a multiple-length multiplication circuit 323 that is provided in advance in the security chip 101, which is an FPGA. As noted above, FPGAs include dedicated circuits of this kind for multiplication, and one characteristic of FPGAs is that such dedicated circuits can generally operate at a high frequency.

First, a[1] | a[0] is input through input port 302, timed by clk1x. Data then continue to be input through input port 302 every clk1x cycle up to a[2t-1] | a[2t-2]. This is referred to below as input process a. In the cycle following the one in which a[2t-1] | a[2t-2] was input, m[1] | m[0] is input through input port 302, and data again continue to be input every clk1x cycle up to m[2t-1] | m[2t-2]. This process, which follows input process a, is referred to below as input process m. When input process m finishes, input process a and input process m are executed again, and this sequence is repeated a predetermined number of times (from i = 0 to i = n+d).

In synchronization with input processes a and m, b[i] is input through input port 303 and q[i-d] through input port 304, timed by clk1x.

Multiplexer 306 alternately selects the data in input register 309 and input register 310, timed by clk2x. Multiplexer 307 selects b[i], held in input register 311, when input registers 309 and 310 hold data from input process a, and selects q[i-d], held in input register 312, when they hold data from input process m. In other words, when intermediate register 313 holds a[j] or a[j+1], intermediate register 314 holds b[i]; when intermediate register 313 holds m[j] or m[j+1], intermediate register 314 holds q[i-d].
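The operand selection can be modelled in software as follows. This is a behavioural sketch only, not a description of the actual circuit timing: it assumes that the low word of each 2k-bit pair is selected first, and the generator name and its parameters are hypothetical.

```python
def clk2x_operand_stream(pairs_a, pairs_m, b_i, q_prev, k):
    """Yield (source, word, scalar) once per clk2x cycle for one value of i.

    pairs_a / pairs_m: 2k-bit values (high word | low word), one per clk1x cycle.
    b_i: the word b[i]; q_prev: the word q[i-d].
    """
    mask = (1 << k) - 1
    # Input process a pairs every word with b[i]; input process m with q[i-d].
    for source, pairs, scalar in (("a", pairs_a, b_i), ("m", pairs_m, q_prev)):
        for pair in pairs:            # one 2k-bit pair per clk1x cycle
            for half in (0, 1):       # two clk2x cycles per clk1x cycle
                word = (pair >> (k * half)) & mask  # assumed order: low, then high
                yield source, word, scalar
```

For example, with k = 4 the pair 0x21 yields the words 1 and then 2, both paired with b[i], so a k-bit multiplier fed from this stream stays busy on every clk2x cycle while its inputs arrive only once per clk1x cycle.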

As a result, multiplier 316 computes b[i]·a[j] for input process a ("·" denotes multiplication and is sometimes omitted) and q[i-d]·m[j] for input process m. The result is stored in intermediate register 315.

Next, when intermediate register 315 holds b[i]·a[0], input register 308 is set to 0, multiplexer 318 selects the value of input register 308, and adder 317 computes b[i]·a[0] + 0. After that, multiplexer 318 selects the upper k bits of output register 322, and adder 317 computes b[i]·a[j] + carry_k. When intermediate register 315 holds q[i-d]·m[0], input register 308 is set to p[i][0], multiplexer 318 selects the value of input register 308, and adder 317 computes q[i-d]·m[0] + p[i][0]. After that, multiplexer 318 selects the upper k bits of output register 322, and adder 317 computes q[i-d]·m[j] + carry_k.
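The multiply-and-accumulate behaviour described here can be sketched as a word-serial loop. This is an illustrative model under stated assumptions, not the patent's pseudo code A: the function name `mac_pass` is hypothetical, `initial` stands for the value held in input register 308 (0 for input process a, p[i][0] for input process m), and register-level timing is ignored.

```python
def mac_pass(words, scalar, k, initial=0):
    """Multiply a sequence of k-bit words by a k-bit scalar, one word per step.

    Returns (low_words, carry_k): the lower k bits produced at each step and
    the final k-bit carry (upper half of the last 2k-bit sum).
    """
    mask = (1 << k) - 1
    carry_k = initial          # first step adds the externally supplied value
    low_words = []
    for word in words:
        # k x k product plus a k-bit addend always fits in 2k bits:
        # (2**k - 1)**2 + (2**k - 1) < 2**(2*k).
        acc = word * scalar + carry_k
        low_words.append(acc & mask)   # lower k bits: the output word
        carry_k = acc >> k             # upper k bits: fed back as carry_k
    return low_words, carry_k
```

Reassembling `low_words` and the final `carry_k` gives the full product plus `initial`, which is why the 2k-bit register with its upper half fed back is sufficient for operands of any length.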

Through the above processing, output port 321 outputs p[i][0] through p[i][2t-1] for input process a, and v[i][0] followed by u[i][1] through u[i][2t-1] for input process m. Output port 320 outputs the carry_k corresponding to each output on output port 321. Output ports 320 and 321 output one data item per clk2x cycle.

Thus, in arithmetic circuit A301, data input to input ports 302 and 303 is timed by clk1x, whereas the multiplication and addition themselves are timed by clk2x. With this configuration, the external circuits connected to input ports 302 and 303 can be designed around clk1x while the multiplication and addition run at clk2x. Because the external circuits can be built for the lower operating frequency, their timing constraints, and hence their design requirements, are relaxed.

Recent FPGAs with various functional extensions incorporate dedicated multiplication and addition circuits, such as the multiple-length multiplication circuit 323 shown in FIG. 3, as hard macros. Such dedicated circuits can generally run at a high operating frequency. On the other hand, if the rest of the FPGA logic must run at the same operating frequency as the dedicated circuits, the timing constraints become severe and implementing complex circuits is difficult. In this embodiment, only the dedicated circuit portion runs at the high operating frequency and the remainder runs at the low operating frequency, without reducing overall performance, so efficient processing can be achieved.

FIG. 4 is a block diagram showing an example of the configuration of the arithmetic circuit that performs the q derivation process of the pseudo code of the algorithm described above.

The part of the arithmetic circuit shown in FIG. 4 other than arithmetic circuit A301 is an example of an intermediate value derivation unit. In this arithmetic circuit, s[i][1] is input through input port 205, and input register 202 captures s[i][1] in synchronization with clk1x. Intermediate register 203 captures data at the time v[i][0] is output from output port 321 of arithmetic circuit A301 shown in FIG. 3, so that v[i][0] is stored in intermediate register 203. Adder 204 computes v[i][0] + s[i][1] and outputs the result as q[i+1]. Adder 204 is connected to input port 304 of arithmetic circuit A301, and input register 312 captures q[i+1] from input port 304 at a predetermined time. Although not shown, a buffer is provided between adder 204 and input port 304 of arithmetic circuit A301; this buffer adjusts the timing by adding a delay d to the q[i+1] output by adder 204. As a result, q[i-d] is input through input port 304 of arithmetic circuit A301, as shown in FIG. 3. In addition, output port 321 of arithmetic circuit A301 is connected to input register 308 via input port 305, and input register 308 captures data from input port 305 at the time p[i][0] is output from output port 321, so that p[i][0] is stored in input register 308.
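The adder followed by a delay buffer can be modelled roughly as below. This is a sketch under assumptions that the text does not state: the sum is truncated to k bits, the buffer initially holds zeros, and the class and method names are hypothetical. The exact index offset between the derived value and the q[i-d] consumed by arithmetic circuit A301 is not reproduced here.

```python
from collections import deque


class QDelayLine:
    """Model of adder 204 followed by a d-stage delay buffer (illustrative)."""

    def __init__(self, d, k):
        self.mask = (1 << k) - 1
        # Assumed: the buffer starts out filled with zeros.
        self.fifo = deque([0] * d)

    def step(self, v_i0, s_i1):
        """Derive a new q from v[i][0] + s[i][1]; return the q derived d steps ago."""
        q_next = (v_i0 + s_i1) & self.mask   # assumed k-bit truncation
        self.fifo.append(q_next)
        return self.fifo.popleft()
```

With d = 2, the value derived in one step is delivered two steps later, which mirrors how the buffer lets the multiplier keep running on an older q while a newer one is still being derived.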

Multiple-length multiplication process (2) of the pseudo code computes v[i][0] only when j = 0 because this allows the processing performed by the arithmetic circuit shown in FIG. 4 to start without waiting for the result of multiple-length addition process (1). q[i+1] can therefore be derived quickly. If q[i+1] is derived quickly, the repeated input processes a and m can run back to back. Conversely, if the derivation of q[i+1] is slow, the multiplication cannot start until its operand q[i+1] has been input, even if input process m is executed, and the subsequent processing can no longer run continuously.

FIG. 5 is a block diagram showing an example of the configuration of arithmetic circuit B401.

Arithmetic circuit B401 shown in FIG. 5 performs the iterations from j = 0 to j = t-1 of the loop in multiple-length addition process (1) of the pseudo code of the algorithm described above.

Note that arithmetic circuit A301 shown in FIG. 3 performs 2t loop iterations, whereas arithmetic circuit B401 shown in FIG. 5 performs t loop iterations. Arithmetic circuit B401 operates on clk1x. It therefore processes half as many loop iterations as arithmetic circuit A301, most of which operates on clk2x.

Input port 402 is connected to output port 321 of arithmetic circuit A301. Of the data output from output port 321, the output for even values of j in the loop processing of arithmetic circuit A301 is stored in intermediate register 404, and the output for odd values of j is stored in intermediate register 406. Intermediate register 405 stores the output of intermediate register 404. Intermediate register 404 and intermediate registers 405 and 406 operate on opposite clock edges: if intermediate register 404 is clocked on the rising edge of clk1x, intermediate registers 405 and 406 are clocked on the falling edge of clk1x. With this configuration, intermediate registers 405 and 406 output, at the same timing, the concatenation of two adjacent intermediate values, p[i][j+1] | p[i][j] or u[i][j+1] | u[i][j]. For example, when intermediate register 405 outputs p[i][0], intermediate register 406 outputs p[i][1], and when intermediate register 405 outputs p[i][2], intermediate register 406 outputs p[i][3]. Likewise, when intermediate register 405 outputs u[i][2], intermediate register 406 outputs u[i][3]. However, as described above, multiple-length multiplication processing (2) of the pseudo code computes v[i][0] only for j = 0, so note that when intermediate register 405 outputs v[i][0], intermediate register 406 outputs u[i][1].
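The pairing performed by intermediate registers 404 to 406 can be pictured in software. The following Python sketch is illustrative only: the function name `pair_words` and the list representation of the word stream are assumptions made here, and clock edges are not modeled. It groups a word-serial stream (one k-bit word per clk2x cycle) into (low, high) pairs, one pair per clk1x cycle.

```python
def pair_words(words):
    """Group a word-serial stream into (low, high) pairs.

    words[j] stands for p[i][j] or u[i][j]; each returned pair stands for
    the concatenation p[i][j+1] | p[i][j] presented in one clk1x cycle.
    """
    if len(words) % 2:
        raise ValueError("expected an even number of words (2t per pass)")
    # Even-indexed words play the role of register 405, odd-indexed of 406.
    return [(words[j], words[j + 1]) for j in range(0, len(words), 2)]
```

The special case in which the first low word is v[i][0] rather than u[i][0] does not change the grouping; only the meaning of the first element differs.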

When intermediate registers 405 and 406 output p[i][j] | p[i][j+1], the outputs of AND gates 407 and 408 are masked to 0, so the outputs of adders 410 and 411 are p[i][j] | p[i][j+1]. Variable shift registers 412 and 413 capture these adder outputs, so p[i][j] | p[i][j+1] is stored in them. Next, when intermediate registers 405 and 406 output u[i][j] | u[i][j+1], variable shift registers 412 and 413 output the stored p[i][j] | p[i][j+1]. This time AND gates 407 and 408 do not mask, and the outputs of variable shift registers 412 and 413 pass through AND gates 407 and 408 to adders 410 and 411. Adders 410 and 411 then compute p[i][1] + u[i][1] or (p[i][j+1] | p[i][j]) + (u[i][j+1] | u[i][j]) + carry_1, where carry_1 is the value of output register 416. When p[i][1] + u[i][1] is computed, 0 is input from input port 403 and multiplexer 409 selects that 0 as the input to adder 410. Otherwise, multiplexer 409 selects the value of output register 416 as the input to adder 410. The addition results v[i][j] and v[i][j+1] and the carry value carry_1 generated by the addition are stored in output registers 414 to 416, respectively, and output to output ports 417 to 419.
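The two-words-per-cycle addition of arithmetic circuit B401 can be sketched as follows. This is a simplified software model under stated assumptions, not the circuit itself: the function name `add_word_pairs` is hypothetical, words are held in little-endian Python lists, and the j = 0 special case (where v[i][0] is already available and only p[i][1] + u[i][1] is added) is treated like every other pair.

```python
def add_word_pairs(p_words, u_words, k):
    """Add two little-endian lists of k-bit words, 2k bits per step.

    Returns (v_words, carry_1), where carry_1 is the final carry out.
    """
    mask = (1 << k) - 1
    v_words, carry = [], 0                       # carry-in is 0 for the first pair
    for j in range(0, len(p_words), 2):
        p_pair = p_words[j] | (p_words[j + 1] << k)   # p[i][j+1] | p[i][j]
        u_pair = u_words[j] | (u_words[j + 1] << k)   # u[i][j+1] | u[i][j]
        total = p_pair + u_pair + carry
        v_words += [total & mask, (total >> k) & mask]  # v[i][j], v[i][j+1]
        carry = total >> (2 * k)                 # role of output register 416
    return v_words, carry
```

Each loop iteration corresponds to one clk1x cycle, which is why t iterations cover the 2t words that arithmetic circuit A301 produces in 2t clk2x cycles.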

Variable shift registers 412 and 413 are shift registers with a configurable latency, which is set according to the value of t. For example, the latency is 1 when t = 0 and 2 when t = 1. In general, variable shift registers 412 and 413 are set to a latency of t + 1. Variable shift registers are a built-in FPGA feature, so this function is easy to implement on an FPGA.
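As a rough behavioral picture of a shift register with configurable latency, consider the Python sketch below. The class name `VariableShiftRegister` and its `clock` method are hypothetical names chosen for illustration; the sketch models only the delay behavior, not any FPGA primitive.

```python
from collections import deque


class VariableShiftRegister:
    """Delay line: a value clocked in reappears `latency` cycles later."""

    def __init__(self, latency, fill=0):
        if latency < 1:
            raise ValueError("latency must be at least 1")
        # One slot per cycle of delay; maxlen makes append() drop the oldest.
        self._stages = deque([fill] * latency, maxlen=latency)

    def clock(self, value):
        """Advance one cycle: return the oldest value and shift in `value`."""
        out = self._stages[0]
        self._stages.append(value)
        return out
```

With `latency = t + 1`, a value written in one cycle is read back t + 1 cycles later, which matches the setting described above.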

FIG. 6 is a block diagram showing an example of the configuration of arithmetic circuit C501.

Arithmetic circuit C501 shown in FIG. 6 performs the iterations from j = 0 to j = t-1 of the loop in multiple-length addition processing (2) of the pseudo code of the algorithm described above.

Note that arithmetic circuit A301 shown in FIG. 3 performs 2t loop iterations, whereas arithmetic circuit C501 shown in FIG. 6 performs t loop iterations. Arithmetic circuit C501 operates on clk1x. It therefore processes half as many loop iterations as arithmetic circuit A301, most of whose circuitry operates on clk2x.

Input port 502 is connected to output port 417 of arithmetic circuit B401 shown in FIG. 5, and input port 503 is connected to output port 418 of arithmetic circuit B401. The value v[i][j+1] | v[i][j] input from input ports 502 and 503 is added in adders 510 and 511 to the outputs of variable shift registers 512 and 513 or to data input from input port 504. This addition corresponds to (v[i][j+1] | v[i][j]) + (s[i][j+2] | s[i][j+1]) + carry_1 in the pseudo code. Adders 510 and 511 output the addition results s[i+1][j] and s[i+1][j+1] and the carry value carry_1 generated by the addition, and these are stored in output registers 514 to 516, respectively. s[i+1][j] and s[i+1][j+1] are also stored in variable shift registers 512 and 513. For i = 0, s[i][0], ..., s[i][t] are all 0 (S_0 is initialized to 0), so the addition must reduce to (v[0][j+1] | v[0][j]) + 0. To achieve this, AND gates 506 and 507 are controlled so that their outputs are 0, 0 is input from input port 505, and multiplexer 508 selects that 0 as the input to adder 510.

Next, the addition processing is described in detail.

As described above, variable shift registers 512 and 513 store s[i+1][j] and s[i+1][j+1]. When t = 1, after the processing for i = 0, s[1][0] is stored in variable shift register 512 and s[1][1] in variable shift register 513. The processing for i = 1 needs s[1][1] as an input to adder 510 and s[1][2] as an input to adder 511. s[1][1] is held in variable shift register 513, so it reaches adder 510 via AND gate 506, whereas s[1][2] is input from input port 504. The carry value generated by the addition involving s[1][1] and s[1][2] is stored in output register 516 and then output to output port 519. Processing for i = 2 onward proceeds in the same way.

When t = 2, after the processing for i = 0, s[1][0] and s[1][2] are stored in variable shift register 512, and s[1][1] and s[1][3] in variable shift register 513. The processing for i = 1 needs s[1][1] and s[1][3] as inputs to adder 510 and s[1][2] and s[1][4] as inputs to adder 511. As in the case of t = 1, the data needed by adder 510 is held in variable shift register 513 and reaches adder 510 via AND gate 506. Of the data needed by adder 511, s[1][2] is held in variable shift register 512 and reaches adder 511 via multiplexer 509 and AND gate 507, whereas s[1][4] is input from input port 504. The carry value generated by the addition involving s[1][1] and s[1][2] is stored in output register 516 and then fed to adder 510 via multiplexer 508 as the carry_1 needed for the next addition, which involves s[1][3] and s[1][4]. The carry value generated by that addition is stored in output register 516 and then output to output port 519. Processing for i = 2 onward proceeds in the same way.

When t is 3 or more, as in the case of t = 2, the values of s already stored in variable shift registers 512 and 513 are input to adder 510 via AND gate 506 as described above and used in the addition. Values of s not stored in variable shift registers 512 and 513 are input from input port 504 as described above and used in the addition.
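The word alignment used by arithmetic circuit C501, in which s[i+1][j] is formed from v[i][j] and s[i][j+1], amounts to adding V_i to S_i shifted down by one k-bit word. The Python sketch below illustrates this under stated assumptions: the function name `add_v_and_shifted_s` is hypothetical, the top word of `s_words` stands for the value that arrives on input port 504, and register timing is not modeled.

```python
def add_v_and_shifted_s(v_words, s_words, k):
    """Compute the words of V + (S >> k), two words (2k bits) per step.

    v_words: 2t little-endian k-bit words v[i][0..2t-1]
    s_words: 2t + 1 little-endian k-bit words s[i][0..2t]
    Returns (next_s_words, carry_1).
    """
    if len(v_words) % 2 or len(s_words) != len(v_words) + 1:
        raise ValueError("need 2t words of v and 2t + 1 words of s")
    mask = (1 << k) - 1
    out, carry = [], 0
    for j in range(0, len(v_words), 2):
        v_pair = v_words[j] | (v_words[j + 1] << k)        # v[i][j+1] | v[i][j]
        s_pair = s_words[j + 1] | (s_words[j + 2] << k)    # s[i][j+2] | s[i][j+1]
        total = v_pair + s_pair + carry
        out += [total & mask, (total >> k) & mask]         # s[i+1][j], s[i+1][j+1]
        carry = total >> (2 * k)                           # role of output register 516
    return out, carry
```

Note that s[i][0] never enters the sum; that dropped word is what realizes the one-word right shift.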

Next, the input/output relationship between the arithmetic circuit shown in FIG. 4 and arithmetic circuit C501 shown in FIG. 6 is described.

Output port 518 of arithmetic circuit C501 shown in FIG. 6 is connected to input port 205 of the arithmetic circuit shown in FIG. 4. The value s[i][1] needed for the q derivation, that is, for v[i][0] + s[i][1], is thus stored in input register 202 at the timing at which it appears on output port 518, and q[i+1] is derived by adding it in adder 204.

FIG. 7 is a block diagram showing an example of the configuration of arithmetic circuit D601.

Arithmetic circuit D601 shown in FIG. 7 performs the iterations from j = 2t to j = 4t-1 of the loops in multiple-length multiplication processing (1) and multiple-length multiplication processing (2) of the pseudo code of the algorithm described above. Here, t is a positive integer (t ≠ 0). The processing of arithmetic circuit D601 can be expressed by the following pseudo code (hereinafter referred to as "pseudo code D").

Here, each carry_k on the right-hand side in pseudo code D is the carry value generated by arithmetic circuit A301 shown in FIG. 3 in its processing for j = 2t-1. The correspondence between pseudo code D and the parts of arithmetic circuit D601 is described below.

First, a[j] and a[j+1], or m[j] and m[j+1], of pseudo code D are input simultaneously from input port 602, which therefore has a bus width of 2k bits. b[i] and q[i-d] of pseudo code D are input from input port 603. Input port 603 is connected to multiplier 614 and output port 616 via intermediate registers 608 and 609, so data input from input port 603 is multiplied in multiplier 614 with a latency of 2 and is also output to output port 616.

At input ports 602 and 603, data is input in the same way as in arithmetic circuit A301 shown in FIG. 3: input process a and input process m alternate every 2t cycles of clk2x.

Multiplexer 611 alternately selects, in step with clk2x, the data stored in input register 605 and input register 606, and the selected data is stored in intermediate register 607. When intermediate register 607 holds data of input process a, b[i] is input from intermediate register 609 to multiplier 614 at the same timing. When intermediate register 607 holds data of input process m, q[i-d] is input from intermediate register 609 to multiplier 614 at the same timing.

Multiplier 614 thus computes b[i]·a[j] for input process a and q[i-d]·m[j] for input process m. The result is stored in intermediate register 610.

Adder 615 then adds carry_k to the multiplication result of multiplier 614 stored in intermediate register 610. Only when j = 2t does multiplexer 612 select the value input from input port 604 as the carry_k used by adder 615. Otherwise, as in arithmetic circuit A301 shown in FIG. 3, multiplexer 612 selects the carry value generated by the addition one cycle earlier.

As a result, output port 618 outputs p[i][2t] through p[i][4t-1] for input process a and u[i][2t] through u[i][4t-1] for input process m. Output port 617 outputs the carry_k corresponding to the data on output port 618. Output ports 617 and 618 output data every clk2x cycle.
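The multiply-and-carry behavior of multiplier 614 and adder 615 can be modeled word by word. The Python sketch below is an illustration under assumptions, not a transcription of pseudo code D: the function name `multiply_word_serial` is hypothetical, `carry_in` stands for the value received on input port 604, and pipeline latency is ignored.

```python
def multiply_word_serial(a_words, b_i, k, carry_in=0):
    """Multiply little-endian k-bit words by the single k-bit word b_i.

    Each step forms b_i * a[j] + carry_k, emits the low k bits as p[i][j],
    and keeps the high bits as the carry_k for the next word.
    Returns (p_words, final_carry).
    """
    mask = (1 << k) - 1
    p_words, carry = [], carry_in
    for a_j in a_words:                 # one word per clk2x cycle
        total = b_i * a_j + carry       # multiplier 614, then adder 615
        p_words.append(total & mask)    # value seen on output port 618
        carry = total >> k              # value seen on output port 617
    return p_words, carry
```

Calling the same function with q[i-d] and the words of m in place of b[i] and the words of a gives the u[i][j] words, mirroring how the circuit alternates between input process a and input process m.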

As shown in FIG. 2, the circuit for the iterations from j = 2t to j = 4t-1 of the loop in multiple-length addition processing (1) is formed by connecting a circuit identical to arithmetic circuit B401 shown in FIG. 5 to arithmetic circuit D601 shown in FIG. 7. The circuit for the iterations from j = 2t to j = 4t-1 of the loop in multiple-length addition processing (2) is formed by further connecting to it a circuit identical to arithmetic circuit C501 shown in FIG. 6.

The processing for j = 4t onward is likewise built by providing, for every further 2t values of j, the required number of copies of arithmetic circuit D601 shown in FIG. 7, arithmetic circuit B401 shown in FIG. 5, and arithmetic circuit C501 shown in FIG. 6.

In FIG. 2, within Montgomery multiplication circuit 201, most of arithmetic circuit A301 and arithmetic circuit D601 (multiple-length multiplication circuit 323 in FIG. 3 and multiple-length multiplication circuit 619 in FIG. 7) and intermediate register 203, which stores the output of arithmetic circuit A301, operate on clk2x, and all other circuits operate on clk1x. Input and output of Montgomery multiplication circuit 201 as a whole are therefore performed on clk1x. External circuits can thus run at the lower operating frequency, which relaxes their timing constraints and design conditions. For example, reads from and writes to memory embedded in the FPGA can be handled by the external circuit. In addition, by using for arithmetic circuit A301 and arithmetic circuit D601 the dedicated multiplication and addition circuits built into recent FPGAs as extended functions, the high operating frequency those dedicated circuits support can be exploited directly, giving fast and efficient processing.

Next, the overall operation of Montgomery multiplication circuit 201 shown in FIG. 2 is described.

First, arithmetic circuit A301 starts the processing described with reference to FIG. 3. After arithmetic circuit A301 completes input process a for i = 0, that is, 2t clk2x cycles (t clk1x cycles) after the start of processing, the arithmetic circuit D601 connected to arithmetic circuit A301 starts input process a. Likewise, every 2t clk2x cycles, the next adjacent arithmetic circuit D601 starts input process a. Each arithmetic circuit then continues until the entire pseudo code of the algorithm described above has been processed. The leftmost arithmetic circuit C501 in FIG. 2 outputs the result 2k bits at a time, from s[n+d+1][1] | s[n+d+1][0] through s[n+d+1][2t-1] | s[n+d+1][2t-2], over t clk1x cycles. Next, the arithmetic circuit C501 to its right outputs s[n+d+1][2t+1] | s[n+d+1][2t] through s[n+d+1][4t-1] | s[n+d+1][4t-2], again 2k bits at a time over t clk1x cycles. Each subsequent adjacent arithmetic circuit C501 in turn outputs its part of the result over t clk1x cycles. The values s[n+d][1], ..., s[n+1][1], which are also needed for the result, are output by the leftmost arithmetic circuit C501.

FIG. 8 is a flowchart showing the operation of Montgomery multiplication circuit 201 (the residue calculation method).

Here, in the pseudo code of the algorithm described above, let:
multiplicand A = {A_{n+d}, ..., A_1, A_0} (each of A_{n+d}, ..., A_n is 0),
multiplier B = {B_{n+d}, ..., B_1, B_0} (each of B_{n+d}, ..., B_n is 0),
M''' = {M_0, M_1, ..., M_{n+d}},
intermediate value Q = {Q_{n+d+1}, ..., Q_1, Q_0, Q_{-1}, ..., Q_{-d}},
intermediate result S = {S_{n+d+1}, ..., S_1, S_0},
multiplication result of A and B: P = {P_{n+d}, ..., P_1, P_0},
multiplication result of M and Q: U = {U_{n+d}, ..., U_1, U_0},
addition result of P and U: V = {V_{n+d}, ..., V_1, V_0}.

Montgomery multiplication circuit 201 (the residue calculation device) sets S_0 to 0 and stores it, for example, in memory (step S101). It also sets each of Q_0, Q_{-1}, ..., Q_{-d} to 0 and stores them, for example, in memory (step S102).

Montgomery multiplication circuit 201 sets counter i to 0 and starts the loop processing (step S103).

Arithmetic circuit A301 and arithmetic circuit D601 (the multiple-length multiplication unit), running at operating frequency clk2x, which is twice the predetermined operating frequency clk1x, multiply A by B_i in multipliers 316 and 614 in the i-th iteration and output the product P_i (step S104: part of the multiple-length multiplication step). Also at clk2x, they multiply M''' by Q_{i-d} in multipliers 316 and 614 in the i-th iteration and output the product U_i (step S105: part of the multiple-length multiplication step). In step S105, as described later, they use Q_{i-d}, obtained by delaying the Q_i output in step S108 of the (i-1)-th iteration, for the multiplication of M''' by Q_{i-d}. In this embodiment, in step S105 of the i-th iteration, arithmetic circuit A301 multiplies the first k bits of M''' by Q_{i-d}, adds the product in adder 317 to the first k bits of P_i output in step S104, and outputs the sum as the first k bits of V_i.

Arithmetic circuit B401 (the multiple-length addition unit), running at the predetermined operating frequency clk1x, adds in adders 410 and 411, in the i-th iteration, the P_i output in step S104 and the U_i output in step S105, and outputs the sum V_i (step S106: part of the multiple-length addition step). Arithmetic circuit C501 (the multiple-length addition unit) then, in the first iteration, adds in adders 510 and 511 the V_0 output in step S106 and the S_0 stored in memory in step S101, and outputs the sum as S_1. In the second and subsequent iterations, it adds in adders 510 and 511 the V_i output in step S106 and the S_i output in the (i-1)-th iteration, and outputs the sum as S_{i+1} (step S107: part of the multiple-length addition step). In this embodiment, in step S107 of the i-th iteration, arithmetic circuit C501 adds V_i and S_i, stores S_{i+1} in variable shift registers 512 and 513, and outputs S_{i+1} from variable shift registers 512 and 513 with timing adjusted to the value of t.

The arithmetic circuit shown in FIG. 4 (the intermediate value derivation unit), in the i-th iteration, adds in adder 204 the first k bits of V_i output in step S105 and the second k bits of S_i output in step S107, and outputs the sum as Q_{i+1} (step S108: intermediate value derivation step). The arithmetic circuit shown in FIG. 4 may instead use the V_i output in step S106 to add the first k bits of V_i and the second k bits of S_i. In that case, arithmetic circuit A301 need not compute the first k bits of V_i in step S105.

Montgomery multiplication circuit 201 increments i by 1 until counter i reaches n+d, repeating the above processing n+d+1 times (step S109).

After the loop processing ends, arithmetic circuit C501 (the residue calculation unit) concatenates S_{n+d+1}, output in step S107 of the last iteration (the (n+d+1)-th), with the second k bits of each of S_{n+d} through S_{n+1}, output in step S107 of the d preceding iterations (in reverse chronological order, the (n+d)-th through the (n+1)-th), to obtain the bit string S_{n+d+2} ≡ ABR^{-1} mod M (step S110: residue calculation step). S_{n+d+2} ≡ ABR^{-1} mod M is the result of the Montgomery multiplication. As noted earlier, Montgomery multiplication is a modular operation in which division is replaced by bit shifts.
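The flow of steps S101 to S110 can be checked with a big-integer model. The Python sketch below is not cycle-accurate and rests on assumptions that this section does not state: the function name is hypothetical, M''' is derived as in the commonly described quotient-pipelining formulation (M' = -M^{-1} mod 2^{k(d+1)}, M''' = (M'·M + 1) / 2^{k(d+1)}), R is taken to be 2^{kn}, and the final concatenation uses the k-bit values Q_{n+1}, ..., Q_{n+d} produced by the loop. How those values map onto the stored words of S in the pseudo code is not modeled here.

```python
def montgomery_quotient_pipelined(A, B, M, k, n, d):
    """Big-integer model of steps S101-S110 (M must be odd, B < 2**(k*n))."""
    mask = (1 << k) - 1
    w = k * (d + 1)
    # Assumption: M''' from the standard quotient-pipelining construction.
    m_prime = (-pow(M, -1, 1 << w)) % (1 << w)
    m3 = (m_prime * M + 1) >> w           # exact, since M' * M = -1 mod 2**w
    S = 0                                 # S101: S_0 = 0
    q = {i: 0 for i in range(-d, 1)}      # S102: Q_0, Q_-1, ..., Q_-d = 0
    for i in range(n + d + 1):            # S103 / S109: i = 0 .. n+d
        b_i = (B >> (k * i)) & mask       # B_i is 0 for i >= n
        P = A * b_i                       # S104
        U = m3 * q[i - d]                 # S105: delayed quotient Q_{i-d}
        V = P + U                         # S106
        S = V + (S >> k)                  # S107: s[i+1][j] = v[i][j] + s[i][j+1]
        q[i + 1] = S & mask               # S108: (v[i][0] + s[i][1]) mod 2**k
    result = S << (k * d)                 # S110: S_{n+d+1} in the upper bits
    for j in range(d):
        result |= q[n + 1 + j] << (k * j)
    return result
```

The returned value is congruent to A·B·R^{-1} modulo M but is not necessarily reduced below M; any final reduction is outside the scope of this sketch.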

As described above, the residue calculation device according to this embodiment separates the multiple-length multiplication processing from the multiple-length addition processing and executes the multiple-length multiplication processing at twice the operating frequency of the multiple-length addition processing.

In the multiple-length addition processing, the residue calculation device processes in each cycle twice the bit length processed per cycle in the multiple-length multiplication processing.

In the residue calculation device, the multiple-length multiplication processing has the circuit configurations shown in FIG. 3 and FIG. 7.

In the residue calculation device, the multiple-length addition processing has the circuit configurations shown in FIG. 5 and FIG. 6.

Of the multiple-length addition processing, the residue calculation device performs only the processing of the first k bits in the circuit for multiple-length multiplication processing, which speeds up derivation of the intermediate value needed by the multiple-length multiplication processing.

In the circuit for multiple-length addition processing, the residue calculation device stores the result of the multiple-length addition processing in variable shift registers and outputs the stored values from the variable shift registers at the timing the multiple-length addition processing requires, and can thereby handle an arbitrary number of loop iterations. In other words, the residue calculation device according to this embodiment can adjust the number of loop iterations to the bit length of the input data.

FIG. 1 is a block diagram showing the configuration of a security chip using an FPGA.

FIG. 2 is a block diagram showing the configuration of the Montgomery multiplication circuit.

FIG. 3 is a block diagram showing an example of the configuration of arithmetic circuit A, which performs multiple-length multiplication processing (1) and multiple-length multiplication processing (2).

FIG. 4 is a block diagram showing an example of the configuration of the arithmetic circuit that performs the q derivation processing.

FIG. 5 is a block diagram showing an example of the configuration of arithmetic circuit B, which performs multiple-length addition processing (1).

FIG. 6 is a block diagram showing an example of the configuration of arithmetic circuit C, which performs multiple-length addition processing (2).

FIG. 7 is a block diagram showing an example of the configuration of arithmetic circuit D, which performs multiple-length multiplication processing (1) and multiple-length multiplication processing (2).

FIG. 8 is a flowchart showing the operation of the Montgomery multiplication circuit.

Explanation of symbols

101: security chip; 102: CPU; 103: internal memory; 104: external interface; 105: memory for cryptographic processing circuit; 201: Montgomery multiplication circuit; 202: input register; 203: intermediate register; 204: adder; 205: input port; 301: arithmetic circuit A; 302, 303, 304, 305: input ports; 306, 307: multiplexers; 308, 309, 310, 311, 312: input registers; 313, 314, 315: intermediate registers; 316: multiplier; 317: adder; 318: multiplexer; 319, 320, 321: output ports; 322: output register; 323: multiple-length multiplication circuit; 401: arithmetic circuit B; 402, 403: input ports; 404, 405, 406: intermediate registers; 407, 408: AND gates; 409: multiplexer; 410, 411: adders; 412, 413: variable shift registers; 414, 415, 416: output registers; 417, 418: output ports; 501: arithmetic circuit C; 502, 503, 504, 505: input ports; 506, 507: AND gates; 508, 509: multiplexers; 510, 511: adders; 512, 513: variable shift registers; 514, 515, 516: output registers; 517, 518, 519, 520: output ports; 601: arithmetic circuit D; 602, 603, 604: input ports; 605, 606: input registers; 607, 608, 609, 610: intermediate registers; 611, 612: multiplexers; 613: output register; 614: multiplier; 615: adder; 616, 617, 618: output ports; 619: multiple-length multiplication circuit.

Claims

A remainder calculation device that obtains the result of a Montgomery multiplication based on the sum of the multiplication result of A and B, the multiplication result of M and Q, and an intermediate result S, where A is the multiplicand, B is the multiplier, M is the modulus, and Q is an intermediate value, the device comprising:
A multiple length multiplication unit that performs the multiplication of A and B and the multiplication of M and Q with a multiplier at an operating frequency twice a predetermined operating frequency, and outputs each multiplication result;
A multiple length addition unit that adds the two multiplication results output by the multiple length multiplication unit and S at the predetermined operating frequency with a plurality of adders, and outputs the addition result of each adder; and
A remainder calculation unit that obtains the result of the Montgomery multiplication by concatenating the addition results of the adders output by the multiple length addition unit,
Wherein the multiple length multiplication unit performs a multiplication of a predetermined bit length in each cycle of the doubled operating frequency, and
The multiple length addition unit performs an addition of twice the predetermined bit length in each cycle of the predetermined operating frequency.

A remainder calculation device that obtains the result of a Montgomery multiplication based on the sum of the multiplication result of A and B, the multiplication result of M and Q, and an intermediate result S, where A is the multiplicand, B is the multiplier, M is the modulus, and Q is an intermediate value, the device comprising:
A multiple length multiplication unit that performs the multiplication of A and B and the multiplication of M and Q with a multiplier at an operating frequency twice a predetermined operating frequency, and outputs each multiplication result;
A multiple length addition unit that adds the two multiplication results output by the multiple length multiplication unit and S at the predetermined operating frequency with a plurality of adders, and outputs the addition result of each adder; and
A remainder calculation unit that obtains the result of the Montgomery multiplication by concatenating the addition results of the adders output by the multiple length addition unit,
Wherein, in the remainder calculation device:
The bit length of B is k bits × n, and the delay parameter is d;
$B = \{B_{n+d}, \ldots, B_1, B_0\}$, where each of $B_{n+d}, \ldots, B_n$ has the value 0;
$Q = \{Q_{n+d+1}, \ldots, Q_1, Q_0, Q_{-1}, \ldots, Q_{-d}\}$, where each of $Q_0, Q_{-1}, \ldots, Q_{-d}$ has the value 0;
$M''' = (M'' + 1) / 2^{k(d+1)}$, $M'' = (M' \bmod 2^{k(d+1)})\,M$, and $(-M M') \bmod 2^{k(d+1)} = 1$;
$S = \{S_{n+d+1}, \ldots, S_1, S_0\}$, where $S_0$ is set to 0;
$2^{kn} R^{-1} \bmod M = 1$;
The multiplication result of A and B is $P = \{P_{n+d}, \ldots, P_1, P_0\}$;
The multiplication result of M and Q is $U = \{U_{n+d}, \ldots, U_1, U_0\}$;
The addition result of P and U is $V = \{V_{n+d}, \ldots, V_1, V_0\}$;
A loop process is performed that increments a counter i by 1 from 0 to n + d and therefore repeats n + d + 1 times;
The multiple length multiplication unit, at the doubled operating frequency, in each process multiplies A and $B_i$ and outputs the multiplication result $P_i$, and in each process multiplies $M'''$ and $Q_{i-d}$ and outputs the multiplication result $U_i$;
The multiple length addition unit, at the predetermined operating frequency, in each process adds $P_i$ and $U_i$ output by the multiple length multiplication unit and outputs the addition result $V_i$; then, in the first process, it adds $V_0$ and $S_0$ and outputs the addition result as $S_1$, and in the second and subsequent processes, it adds $V_i$ and the $S_i$ output in the previous process and outputs the addition result as $S_{i+1}$;
The remainder calculation device further includes an intermediate value deriving unit that, in each process, adds the first k bits of $V_i$ and the second k bits of $S_i$ with an adder and outputs the addition result as $Q_{i+1}$;
The multiple length multiplication unit, in each process, performs the multiplication of $M'''$ and $Q_{i-d}$ using $Q_{i-d}$, which is obtained by delaying the $Q_i$ output by the intermediate value deriving unit in an earlier process; and
The remainder calculation unit, after the (n + d + 1)-th process, concatenates $S_{n+d+1}$ output by the multiple length addition unit with the second k bits of each of $S_{n+d}$ through $S_{n+1}$, thereby obtaining the bit string $S_{n+d+2} \equiv A B R^{-1} \bmod M$.
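The loop recited in the claim above can be sketched in software as follows. This is only an illustrative model, not the claimed circuit: it ignores clock frequencies and register timing, and it assumes the convention that $S_{i+1}$ is formed as $S_i$ with its lowest k-bit word dropped plus $V_i$, so the word positions used in the final concatenation follow that assumption rather than the claim's exact bit layout. The function name and argument names are hypothetical.

```python
def montgomery_quotient_pipelined(a, b, m, k, n, d):
    """Return S_{n+d+2}, which is congruent to a*b*R^{-1} mod m, R = 2**(k*n).

    Assumes m is odd and 0 <= b < 2**(k*n), so that B_n..B_{n+d} are zero.
    """
    width = k * (d + 1)
    modulus_w = 1 << width
    mask = (1 << k) - 1

    # M' satisfies (-M * M') mod 2^{k(d+1)} = 1.
    m_p = (-pow(m, -1, modulus_w)) % modulus_w
    # M'' = (M' mod 2^{k(d+1)}) * M, which is congruent to -1 mod 2^{k(d+1)}.
    m_pp = m_p * m
    # M''' = (M'' + 1) / 2^{k(d+1)} is therefore an exact division.
    m_ppp = (m_pp + 1) >> width

    s = 0                                   # S_0 = 0
    q = {j: 0 for j in range(-d, 1)}        # Q_{-d}, ..., Q_0 = 0

    for i in range(n + d + 1):              # i = 0 .. n+d
        b_i = (b >> (k * i)) & mask         # k-bit word B_i (zero for i >= n)
        p_i = a * b_i                       # P_i = A * B_i
        u_i = m_ppp * q[i - d]              # U_i = M''' * Q_{i-d}
        v_i = p_i + u_i                     # V_i = P_i + U_i
        # Q_{i+1}: low k bits of V_i plus the second k-bit word of S_i.
        q[i + 1] = ((v_i & mask) + ((s >> k) & mask)) & mask
        # S_{i+1}: S_i with its lowest word dropped, plus V_i (assumed convention).
        s = (s >> k) + v_i

    # Concatenate S_{n+d+1} with the d words Q_{n+d}, ..., Q_{n+1}.
    tail = sum(q[n + 1 + j] << (k * j) for j in range(d))
    return (s << (k * d)) | tail
```

The returned value is only guaranteed to be congruent to $A B R^{-1}$ modulo M; it is not reduced into the range below M, so a caller comparing against a reference value should reduce both sides modulo M first.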

The remainder calculation device according to claim 2, wherein:
The multiple length multiplication unit, in each process, multiplies the first k bits of $M'''$ by $Q_{i-d}$, then adds the multiplication result and the first k bits of $P_i$ with an adder, and outputs the addition result as the first k bits of $V_i$; and
The intermediate value deriving unit, in each process, adds the first k bits of $V_i$ output by the multiple length multiplication unit and the second k bits of $S_i$.
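The point of computing only the first k bits of $V_i$ is that $Q_{i+1}$ depends on nothing above those bits, so it can be derived before the full-width products are available. The sketch below checks this against a full-width reference. It assumes that "first k bits" means the least significant k-bit word and that $S_{i+1}$ is $S_i$ shifted right by k bits plus $V_i$; both function names are hypothetical.

```python
def next_q_full(p_i, m_ppp, q_delayed, s_i, k):
    """Reference: form the full-width V_i and S_{i+1}, then keep the low k bits."""
    v_i = p_i + m_ppp * q_delayed
    return ((s_i >> k) + v_i) & ((1 << k) - 1)


def next_q_low_words(p_i, m_ppp, q_delayed, s_i, k):
    """Derive Q_{i+1} from k-bit words only, as described in the claim above."""
    mask = (1 << k) - 1
    # First k bits of V_i: (low word of M''') * Q_{i-d} + (low word of P_i).
    v_low = ((m_ppp & mask) * q_delayed + (p_i & mask)) & mask
    # Add the second k-bit word of S_i and keep k bits.
    return (v_low + ((s_i >> k) & mask)) & mask
```

Both functions return the same value for any non-negative inputs, because carries only propagate upward and so the low k bits of a sum or product depend only on the low k bits of the operands.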

The remainder calculation device according to claim 2, wherein the multiple length addition unit, in each process, adds $V_i$ and $S_i$, then stores $S_{i+1}$ in a variable shift register and adjusts the timing at which $S_{i+1}$ is output from the variable shift register.

5. The remainder calculation device according to claim 1, wherein the remainder calculation device is implemented on an FPGA (Field Programmable Gate Array) provided in advance with a multiple length multiplication circuit that multiplies multiple length data at the doubled operating frequency, and the multiple length multiplication unit performs its multiplication with the multiple length multiplication circuit.

A remainder calculation method for obtaining the result of a Montgomery multiplication based on the sum of the multiplication result of A and B, the multiplication result of M and Q, and an intermediate result S, where A is the multiplicand, B is the multiplier, M is the modulus, and Q is an intermediate value, the method comprising:
A multiple length multiplication step of performing the multiplication of A and B and the multiplication of M and Q at an operating frequency twice a predetermined operating frequency, and outputting each multiplication result;
A multiple length addition step of adding the two multiplication results output in the multiple length multiplication step and S at the predetermined operating frequency with a plurality of adders, and outputting the addition result of each adder; and
A remainder calculation step of obtaining the result of the Montgomery multiplication by concatenating the addition results of the adders output in the multiple length addition step,
Wherein, in the multiple length multiplication step, a multiplication of a predetermined bit length is performed in each cycle of the doubled operating frequency, and
In the multiple length addition step, an addition of twice the predetermined bit length is performed in each cycle of the predetermined operating frequency.

A remainder calculation method for obtaining the result of a Montgomery multiplication based on the sum of the multiplication result of A and B, the multiplication result of M and Q, and an intermediate result S, where A is the multiplicand, B is the multiplier, M is the modulus, and Q is an intermediate value, the method comprising:
A multiple length multiplication step of performing the multiplication of A and B and the multiplication of M and Q at an operating frequency twice a predetermined operating frequency, and outputting each multiplication result;
A multiple length addition step of adding the two multiplication results output in the multiple length multiplication step and S at the predetermined operating frequency with a plurality of adders, and outputting the addition result of each adder; and
A remainder calculation step of obtaining the result of the Montgomery multiplication by concatenating the addition results of the adders output in the multiple length addition step,
Wherein, in the remainder calculation method:
The bit length of B is k bits × n, and the delay parameter is d;
$B = \{B_{n+d}, \ldots, B_1, B_0\}$, where each of $B_{n+d}, \ldots, B_n$ has the value 0;
$Q = \{Q_{n+d+1}, \ldots, Q_1, Q_0, Q_{-1}, \ldots, Q_{-d}\}$, where each of $Q_0, Q_{-1}, \ldots, Q_{-d}$ has the value 0;
$M''' = (M'' + 1) / 2^{k(d+1)}$, $M'' = (M' \bmod 2^{k(d+1)})\,M$, and $(-M M') \bmod 2^{k(d+1)} = 1$;
$S = \{S_{n+d+1}, \ldots, S_1, S_0\}$, where $S_0$ is set to 0;
$2^{kn} R^{-1} \bmod M = 1$;
The multiplication result of A and B is $P = \{P_{n+d}, \ldots, P_1, P_0\}$;
The multiplication result of M and Q is $U = \{U_{n+d}, \ldots, U_1, U_0\}$;
The addition result of P and U is $V = \{V_{n+d}, \ldots, V_1, V_0\}$;
A loop process is performed that increments a counter i by 1 from 0 to n + d and therefore repeats n + d + 1 times;
In the multiple length multiplication step, at the doubled operating frequency, A and $B_i$ are multiplied in each process and the multiplication result $P_i$ is output, and $M'''$ and $Q_{i-d}$ are multiplied in each process and the multiplication result $U_i$ is output;
In the multiple length addition step, at the predetermined operating frequency, $P_i$ and $U_i$ output in the multiple length multiplication step are added in each process and the addition result $V_i$ is output; then, in the first process, $V_0$ and $S_0$ are added and the addition result is output as $S_1$, and in the second and subsequent processes, $V_i$ and the $S_i$ output in the previous process are added and the addition result is output as $S_{i+1}$;
The remainder calculation method further includes an intermediate value derivation step of, in each process, adding the first k bits of $V_i$ and the second k bits of $S_i$ with an adder and outputting the addition result as $Q_{i+1}$;
In the multiple length multiplication step, in each process, the multiplication of $M'''$ and $Q_{i-d}$ is performed using $Q_{i-d}$, which is obtained by delaying the $Q_i$ output in the intermediate value derivation step of an earlier process; and
In the remainder calculation step, after the (n + d + 1)-th process, $S_{n+d+1}$ output in the multiple length addition step is concatenated with the second k bits of each of $S_{n+d}$ through $S_{n+1}$, thereby obtaining the bit string $S_{n+d+2} \equiv A B R^{-1} \bmod M$.
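As a small worked instance of the parameter definitions recited above, take the purely illustrative values $k = 2$, $d = 1$, and $M = 13$ (these numbers are assumptions chosen for readability and do not come from the patent):

```latex
\begin{aligned}
2^{k(d+1)} &= 2^{4} = 16,\\
M' &= 11, \quad\text{since } (-M M') \bmod 16 = (-143) \bmod 16 = 1,\\
M'' &= (M' \bmod 16)\,M = 11 \cdot 13 = 143,\\
M''' &= (M'' + 1) / 2^{k(d+1)} = 144 / 16 = 9.
\end{aligned}
```

Because $M'' \equiv -1 \pmod{2^{k(d+1)}}$ by construction, the division that defines $M'''$ is always exact.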