KR19990003678A

KR19990003678A - Montgomery Modular Multiplication Apparatus and Method

Info

Publication number: KR19990003678A
Application number: KR1019970027606A
Authority: KR
Inventors: 유기영
Original assignee: 유기영
Priority date: 1997-06-26
Filing date: 1997-06-26
Publication date: 1999-01-15
Also published as: KR100256776B1

Abstract

A Montgomery modular multiplication apparatus and method are disclosed. The apparatus comprises first through (n+1)-th processors. In response to a control signal received from the p-th processor, the (p+1)-th processor either (i) uses the first data (Ai) of an (n+1)-bit string, the second data (Bi) of an (m+2)-bit string, the third data (Ni) of an (m+2)-bit string, and the result data (Ti), all received from the p-th processor, to calculate a temporary variable (M) and a first carry (C = C₁C₀), and stores Ai, M, and C; or (ii) uses the stored A, M, and C together with the Ai, Bi, Ni, and previous result value received from the p-th processor to calculate a second carry (C' = C₁'C₀') and a current result value (Ti'). The (p+1)-th processor outputs the previous result value (Ti) or the current result value (Ti') together with the received Ai to the (p+2)-th processor, and outputs the control signal, Bi, and Ni to the (p+2)-th processor after a predetermined time delay. The previous result value (T) input to the first processor and all bits of the control signal except its least significant bit are at a low logic level. The first processor receives the control signal from outside, and the current result values output from the (n+1)-th processor, which form an (n+m+1)-bit string, are the result of Montgomery modular N multiplication of the first data and the second data. Montgomery modular multiplication can therefore be performed at high speed, and the hardware design cost is low.

Description

Montgomery Modular Multiplication Apparatus and Method

The present invention relates to modular multiplication for systems that require information protection, such as systems that perform encryption and decryption, and more particularly to an apparatus and method for Montgomery modular multiplication, which is one type of modular multiplication.

Modular multiplication is used mainly in encryption and decryption systems. In particular, the Montgomery modular multiplication method, which is much faster than other modular multiplication methods, is used to encrypt and decrypt large amounts of data.

The modular multiplication method or Montgomery modular multiplication method described above can be performed in software or in hardware. Because performing Montgomery modular multiplication in software takes far longer than performing it in hardware, systems that require high computation speed perform it in hardware.

However, conventional hardware apparatuses for Montgomery modular multiplication perform the multiplication sequentially and are therefore slow, and because they require many processors, the apparatus is expensive.

A technical object of the present invention is to provide a Montgomery modular multiplication apparatus that exploits the parallelism of Montgomery modular multiplication to perform it in parallel in optimal time.

Another technical object of the present invention is to provide a Montgomery modular multiplication method performed in that apparatus.

FIG. 1 is a block diagram of a Montgomery modular multiplication apparatus according to the present invention.

FIG. 2 is a block diagram, according to the present invention, of each processor shown in FIG. 1.

FIG. 3 is a flowchart illustrating the Montgomery modular multiplication method according to the present invention performed in the apparatus shown in FIG. 1.

To achieve the above object, a Montgomery modular multiplication apparatus according to the present invention performs Montgomery modular Ni multiplication (where N is third data consisting of an (m+2)-bit string) of first data (Ai, where i is the bit position) consisting of an (n+1)-bit string and second data (Bi) consisting of an (m+2)-bit string.

The apparatus comprises first through (n+1)-th processors. In response to a control signal received from the p-th processor, the (p+1)-th processor (where p is an integer and 1 ≤ p ≤ n+1) either uses the Ai, Bi, Ni, and previous result value (Ti) received from the p-th processor to calculate a temporary variable (M) and a first carry (C = C₁C₀) according to the equations below, stores Ai as A, and stores M and C; or uses the stored A, M, and C together with the Ai, Bi, Ni, and previous result value received from the p-th processor to calculate a second carry (C' = C₁'C₀') and a current result value (Ti') according to the equations below. It outputs the previous result value (Ti) or the current result value (Ti') and the received Ai to the (p+2)-th processor, and outputs the control signal, Bi, and Ni to the (p+2)-th processor after a predetermined time delay.

Preferably, the previous result value (T) input to the first processor and all bits of the control signal except its least significant bit are at a low logic level, the first processor receives the control signal from outside, and the current result values output from the (n+1)-th processor, which form an (n+m+1)-bit string, are the result of Montgomery modular N multiplication of the first data and the second data.

M = (Ti + Ai × Bi) MOD r

(where r is a predetermined number)

C = Ti + Ai × Bi + M × Ni

(where C₀ is C DIV r MOD r, and C₁ is C DIV r DIV r)

C' = Ti + A × Bi + M × Ni + C₀ + C₁ × r

(where C₀' is C' DIV r MOD r, and C₁' is C' DIV r DIV r)

Ti' = (Ti + A × Bi + M × Ni + C₀') MOD r

Here, the (p+1)-th processor preferably comprises: temporary variable calculating means that receives r and, from the p-th processor, the first and second data and the previous result value, and calculates the temporary variable; storage means that, in response to a selection signal, stores or reads out the temporary variable, the first carry, and the first data as A; first carry calculating means that calculates a third carry (C) according to the following equation, using r and the first carry stored in the storage means,

C = C₀ + C₁ × r

first selecting means that selectively outputs the low logic level or the third carry in response to the selection signal; second carry calculating means that, in response to the selection signal, either calculates the first carry using the temporary variable, the first, second, and third data, the previous result value, and the output of the first selecting means, and outputs it to the storage means, or calculates the second carry using the stored temporary variable, A, and first carry, the second and third data, the previous result value, and r; result value calculating means that calculates the current result value using r, the A and temporary variable stored in the storage means, the previous result value and the second and third data output from the p-th processor, and the second carry output from the second carry calculating means; second selecting means that, in response to the selection signal, selectively outputs the current result value received from the result value calculating means or the previous result value received from the p-th processor; a buffer that buffers the received control signal and second and third data and outputs them to the (p+2)-th processor; and comparing means that compares the control signal with the low logic level and outputs the selection signal in response to the comparison result.
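As a reading aid, the four formulas for M, C, C', and Ti' given above can be transcribed into a short Python sketch. The function names (`split_carry`, `first_pass`, `second_pass`) are illustrative and do not come from the patent. The formulas are copied as printed, including the use of C₀' in the expression for Ti', so the sketch shows only the arithmetic of a single step and makes no claim about the end-to-end behaviour of the processor array.

```python
def split_carry(c, r):
    """Return (C0, C1) with C0 = C DIV r MOD r and C1 = C DIV r DIV r."""
    q = c // r
    return q % r, q // r


def first_pass(t_i, a_i, b_i, n_i, r):
    """Control signal high: temporary variable M and first carry C, as printed."""
    m = (t_i + a_i * b_i) % r
    c = t_i + a_i * b_i + m * n_i
    c0, c1 = split_carry(c, r)
    return m, c0, c1


def second_pass(t_i, a, b_i, n_i, m, c0, c1, r):
    """Control signal low: second carry C' and current result Ti', as printed."""
    partial = t_i + a * b_i + m * n_i
    c_prime = partial + c0 + c1 * r
    c0p, c1p = split_carry(c_prime, r)
    t_next = (partial + c0p) % r
    return c0p, c1p, t_next
```

For example, with r = 2, `first_pass(1, 1, 1, 1, 2)` returns `(0, 1, 0)`, and feeding those stored values into `second_pass(0, 1, 1, 1, 0, 1, 0, 2)` returns `(1, 0, 0)`.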

To achieve the other object, a Montgomery modular multiplication method according to the present invention is performed in a Montgomery modular multiplication apparatus having n+1 processors, in order to perform Montgomery modular Ni multiplication (where N is third data consisting of an (m+2)-bit string) of first data (Ai, where i is the bit position) consisting of an (n+1)-bit string and second data (Bi) consisting of an (m+2)-bit string, using control data (CTi) and result data (Ti) each consisting of an (n+m+1)-bit string. The method preferably comprises: (a) determining whether the control data is '1'; (b) if the control data is '1', fetching the first, second, and third data and the result data from the previous processor; (c) obtaining a temporary variable (M) from the first and second data and the result data according to the following equation,

M = (Ti + Ai × Bi) MOD r

(where r is a predetermined number)

(d) obtaining a first carry (C = C₁C₀) from the temporary variable, the first, second, and third data, and the result data according to the following equation,

C = Ti + Ai × Bi + M × Ni

(e) storing the temporary variable, the first carry, and the first data as A; (f) sending the first data and the result data to the next processor, and sending the second and third data and the control data to the next processor after a predetermined time delay; (g) if the control data is not '1', fetching the first, second, and third data and the result data from the previous processor and reading out the stored A, M, and C; (h) obtaining a second carry (C' = C₁'C₀') from the second and third data, the stored A, M, and C, and the result data according to the following equation,

C' = Ti + A × Bi + M × Ni + C₀ + C₁ × r

(i) obtaining current result data (Ti') from the stored A and M, the second carry, the second and third data, and the result data according to the following equation,

Ti' = (Ti + A × Bi + M × Ni + C₀') MOD r

(j) sending the first data, and the current result data as the result data for the next processor, to the next processor, and sending the control data and the second and third data to the next processor after a predetermined time delay; (k) determining whether all result data have been obtained and, if not, returning to step (a); and (l) if all result data have been obtained, determining all the result data to be the Montgomery modular multiplication result.

Hereinafter, the configuration and operation of the Montgomery modular multiplication apparatus according to the present invention are described with reference to the accompanying drawings.

FIG. 1 is a block diagram of the Montgomery modular multiplication apparatus according to the present invention, which comprises first, second, ..., p-th, ..., and (n+1)-th processors (10, 12, ..., 14, ..., and 16).

The first processor (10) shown in FIG. 1 receives first data (Ai), second data (Bi), third data (Ni), result data (Ti), and control data (CTi). It either calculates a temporary variable (M) and a first carry (C = C₁C₀) and then stores the first data (Ai) as A together with the calculated M and C, or calculates a second carry (C' = C₁'C₀') and a current result value (Ti'). It outputs the previous result value (Ti) or the current result value (Ti') and the externally supplied first data (Ai) to the second processor (12), and outputs the control data (CTi) and the second and third data (Bi and Ni) to the second processor (12) after a predetermined time delay. The control data (CTi, where i is the bit position) consists of an (n+m+1)-bit string, namely CT₀ CT₁ CT₂ ... CT_{n+m} CT_{n+m+1}, and is input to the first processor (10) in order from the least significant bit (CT₀) to the most significant bit (CT_{n+m+1}). In a preferred embodiment of the present invention, the control data is input to the first processor (10) in the order 1 0 0 ... 0 0. Likewise, the result data (Ti) input to the first processor (10) consists of an (n+m+1)-bit string, namely T₀ T₁ T₂ ... T_{n+m} T_{n+m+1}, and is input to the first processor (10) in order from the least significant bit (T₀) to the most significant bit (T_{n+m+1}). In a preferred embodiment of the present invention, the result data is input to the first processor (10) in the order 0 0 0 ... 0 0.
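The input ordering described above can be illustrated with a minimal Python sketch. The helper names (`lsb_first`, `control_stream`, `initial_result_stream`) are hypothetical, and the stream length is left as a parameter because the text gives both an (n+m+1)-bit length and indices running up to n+m+1.

```python
def lsb_first(value, length):
    """Bits of value, least significant bit first, padded to length."""
    return [(value >> i) & 1 for i in range(length)]


def control_stream(length):
    """Control data 1 0 0 ... 0 0 as fed to the first processor (length >= 1)."""
    return [1] + [0] * (length - 1)


def initial_result_stream(length):
    """Result data 0 0 0 ... 0 0 as fed to the first processor."""
    return [0] * length
```

For instance, `lsb_first(0b0110, 4)` gives `[0, 1, 1, 0]`, and `control_stream(5)` gives `[1, 0, 0, 0, 0]`.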

The second processor (12) receives, via the first processor (10), the first, second, and third data (Ai, Bi, and Ni) and the control data (CTi) that are input to the first processor (10) from outside, receives the current result data (Ti') calculated by the first processor (10) as its result data, and performs the same operation as the first processor (10). The p-th processor (14) receives the first, second, and third data (Ai, Bi, and Ni) and the control data (CTi) output from the (p-1)-th processor, receives the current result data calculated by the (p-1)-th processor as its result data (Ti), and performs the same operation as the second processor (12).

The (n+1)-th processor (16) performs the same operation as the first processor (10) using the first, second, and third data, the control data, and the result data received from the n-th processor, and outputs the resulting current result data through the output terminal OUT. The result data output through the output terminal OUT, namely T₀ T₁ T₂ ... T_n T_{n+1}, are the result of Montgomery modular N multiplication of the first data, consisting of an (n+1)-bit string, and the second data, consisting of an (m+2)-bit string.
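For orientation, the quantity usually meant by a Montgomery product can be computed with the textbook bit-serial algorithm sketched below. This is not the processor array of FIG. 1: it is a sequential reference, the function name `montgomery_product` is illustrative, and it assumes an odd modulus n with a, b < n < 2^k and a final conditional subtraction, none of which is stated in the text above.

```python
def montgomery_product(a, b, n, k):
    """Textbook radix-2 Montgomery product: a * b * 2**(-k) mod n (n odd)."""
    t = 0
    for i in range(k):
        a_i = (a >> i) & 1          # i-th bit of a, least significant first
        t += a_i * b
        m = t & 1                   # n is odd, so adding m * n makes t even
        t = (t + m * n) >> 1        # exact division by 2
    return t - n if t >= n else t   # t < 2n here, so one subtraction suffices
```

With a = 3, b = 5, n = 7, and k = 3, the function returns 1, which equals 3 × 5 × 8⁻¹ mod 7.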

Hereinafter, the detailed operation of each processor shown in FIG. 1 is described.

FIG. 2 is a block diagram, according to the present invention, of each processor shown in FIG. 1. Each processor comprises a temporary variable calculator (40), first and second carry calculators (42 and 46), a first selector (44), a storage unit (50), a comparator (48), a result value calculator (52), a second selector (54), and a buffer (56).

The temporary variable calculator (40) shown in FIG. 2 receives the predetermined value (r) and, from the previous processor, the first, second, and third data (Ai, Bi, and Ni), the result data (Ti), and the control data (CTi), calculates the temporary variable (M) as in Equation 1 below, and outputs the calculated temporary variable to the second carry calculator (46).

M = (Ti + Ai × Bi) MOD r or M = (Ti + Ai × Bi × X) MOD r

Here, X is (r - No)' MOD r, where No denotes the least significant bit of the third data.

When r is binary (radix 2), the temporary variable calculator (40) obtains M without using X; when r is a general radix other than binary, such as hexadecimal or octal, it obtains M using X.
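One plausible reading of X, offered here only as an assumption, is that the prime in (r - No)' denotes a multiplicative inverse modulo r, and that for a general radix No is the least significant digit of N. Under that reading, the Python sketch below (with the hypothetical name `correction_factor`) shows why the binary case needs no X: for r = 2 and odd N, X is simply 1.

```python
def correction_factor(n0, r):
    """Return X such that ((r - n0) * X) % r == 1.

    Assumes the prime in (r - No)' means a modular inverse, which requires
    gcd(n0, r) == 1. Uses three-argument pow (Python 3.8+).
    """
    return pow(r - n0, -1, r)
```

For example, `correction_factor(1, 2)` is 1, while for radix 16 with n0 = 7 the factor is 9, since 9 × 9 = 81 ≡ 1 (mod 16).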

Meanwhile, the comparator (48) compares the control data (CTi) received from the previous processor with low-logic-level data and, in response to the comparison result, outputs a selection signal (SC) to the storage unit (50), the first and second selectors (44 and 54), and the second carry calculator (46). For example, the comparator (48) outputs a low-logic-level selection signal (SC) when the control data (CTi) is at a low logic level, and a high-logic-level selection signal (SC) when the control data (CTi) is at a high logic level.

The first carry calculator (42) calculates a third carry (C) as in Equation 2 below, using the predetermined value (r) and the first carry (C = C₁C₀) stored in the storage unit (50), and outputs the calculated third carry (C) to the first selector (44).

C = C₀ + C₁ × r

The first selector (44) selects either the third carry (C) output from the first carry calculator (42) or low-logic-level data in response to the selection signal (SC), and outputs it to the second carry calculator (46). That is, when a high-level selection signal (SC) is input, it selects the low-logic-level data and outputs it to the second carry calculator (46); when a low-level selection signal (SC) is input, it selects the third carry (C) received from the first carry calculator (42) and outputs it to the second carry calculator (46).
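The recombination of the stored carry digits and the selection rule just described can be sketched in a few lines of Python. The function names are illustrative, and logic levels are modelled as the integers 1 (high) and 0 (low), which is an assumption of this sketch.

```python
def third_carry(c0, c1, r):
    """Recombine the stored carry digits: C = C0 + C1 * r."""
    return c0 + c1 * r


def first_selector(sc, carry):
    """SC high (1) selects the low logic level (0); SC low (0) selects the carry."""
    return 0 if sc == 1 else carry
```

With r = 16, C₀ = 2, and C₁ = 1, `third_carry` gives 18, which the selector forwards only when SC is low.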

In response to the selection signal (SC), the second carry calculator (46) either calculates the first carry (C) using the temporary variable, the first, second, and third data, the previous result value, and the output of the first selector (44), and outputs it to the storage unit (50), or calculates the second carry (C') using the second and third data (Bi and Ni), the result data (Ti), the predetermined value (r), and the temporary variable (M), A, and first carry (C) stored in the storage unit (50). That is, when a high-level selection signal (SC) is input, the second carry calculator (46) calculates the first carry (C) as in Equation 3 below, using the result data (Ti), the first, second, and third data (Ai, Bi, and Ni), and the temporary variable (M) output from the temporary variable calculator (40), and outputs the calculated first carry (C) to the storage unit (50).

C = Ti + Ai × Bi + M × Ni

Here, C₀ is C DIV r MOD r, and C₁ is C DIV r DIV r.

That is, C₀ is the remainder obtained when the quotient of the first carry (C) divided by r is divided by r again, and C₁ is the quotient obtained when the quotient of the first carry (C) divided by r is divided by r again.
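The DIV and MOD definitions of C₀ and C₁ amount to taking the second and third radix-r digits of C. A minimal Python sketch, with the illustrative name `split_carry`, is:

```python
def split_carry(c, r):
    """Return (C0, C1) for a carry value C in radix r."""
    quotient = c // r               # C DIV r
    c1, c0 = divmod(quotient, r)    # (C DIV r DIV r, C DIV r MOD r)
    return c0, c1
```

For C = 300 and r = 16, the quotient is 18, so C₀ = 2 and C₁ = 1. Note that the lowest digit, C MOD r, belongs to neither C₀ nor C₁.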

However, when a low-level selection signal is input, the second carry calculator (46) calculates the second carry (C') as in Equation 4 below, using the result data (Ti), the first data (A), the temporary variable (M), the second and third data (Bi and Ni), the predetermined value (r), and the first carry (C) stored in the storage unit (50), and outputs the calculated second carry (C') to the result value calculator (52).

C' = Ti + A × Bi + M × Ni + C₀ + C₁ × r

Here, C₀' is C' DIV r MOD r, and C₁' is C' DIV r DIV r.

Meanwhile, in response to the selection signal (SC) output from the comparator (48), the storage unit (50) stores or reads out the temporary variable (M) output from the temporary variable calculator (40), the first data (Ai), and the first carry (C) output from the second carry calculator (46). The first data (Ai) is stored as A. That is, when a high-level selection signal (SC) is input, M, Ai, and C are stored; when a low-level selection signal (SC) is input, the stored M, A, and C are read out to the corresponding blocks.

The result value calculator (52) shown in FIG. 2 calculates the current result data (Ti') as in Equation 5 below, using the predetermined value (r), the A and temporary variable (M) stored in the storage unit (50), the result data (Ti) and the second and third data (Bi and Ni) output from the previous processor, and the second carry (C' = C₁'C₀') output from the second carry calculator (46), and outputs the calculated current result data to the second selector (54).

Ti' = (Ti + A × Bi + M × Ni + C₀') MOD r

The second selector (54) selects either the current result data (Ti') calculated by the result value calculator (52) or the result data (Ti) received from the previous processor in response to the selection signal (SC), and outputs it through the output terminal OUT1 as the result data for the next processor. That is, when a high-level selection signal (SC) is input, it outputs the result data (Ti) received from the previous processor through the output terminal OUT1 as the result data for the next processor; when a low-level selection signal (SC) is input, it outputs the current result data calculated by the result value calculator (52) through the output terminal OUT1 as the result data for the next processor.

Meanwhile, the buffer (56) buffers the control data (CTi) and the second and third data (Bi and Ni) received from the previous processor, and outputs the buffered control data and second and third data to the next processor. That is, it outputs the control data and the second and third data received from the previous processor after a predetermined time delay.
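The buffer's role can be modelled as a small delay line. The sketch below is illustrative only: the class name `DelayBuffer` is hypothetical, and the delay of one step and the all-zero initial contents are assumptions, since the text says only that the delay is predetermined.

```python
from collections import deque


class DelayBuffer:
    """Delays (CT, B, N) triples by a fixed number of steps."""

    def __init__(self, delay=1, fill=(0, 0, 0)):
        # Pre-load the queue so the first outputs are the fill value.
        self._queue = deque([fill] * delay)

    def step(self, ct, b, n):
        """Push the current inputs and return the inputs from `delay` steps ago."""
        self._queue.append((ct, b, n))
        return self._queue.popleft()
```

With the default delay, the first call to `step(1, 1, 1)` returns `(0, 0, 0)` and the next call returns `(1, 1, 1)`.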

도 3은 도 1에 도시된 장치에서 수행되는 본 발명에 의한 몽고메리 모듈러 승산 방법을 설명하기 위한 플로우차트로서, 제어 데이타에 따라 프로세서가 프로세싱하는 단계(제80∼98단계) 및 몽고메리 모듈러 승산 결과를 결정하는 단계(제100 및 102단계)로 이루어진다.FIG. 3 is a flowchart illustrating a Montgomery modular multiplication method according to the present invention performed in the apparatus shown in FIG. 1. Determination steps (100 and 102).

본 발명에 의한 몽고메리 모듈러 승산 방법에서는 먼저, 이전 프로세서 또는 외부로부터 입력한 제어 데이타가 '1'인가를 판단한다(제80단계). 만일, 제어 데이타가 '1'이면, 이전 프로세서 또는 외부로부터 Ai, Bi, Ni 및 Ti를 가져온다(제82단계). 제82단계후에, 제1 및 제2 데이타들(Ai 및 Bi)과 결과 데이타(Ti)들을 이용하여 수학식 1과 같이 임시 변수를 계산한다(제84단계).In the Montgomery modular multiplication method according to the present invention, first, it is determined whether control data input from a previous processor or external device is '1' (step 80). If the control data is '1', Ai, Bi, Ni and Ti are retrieved from the previous processor or the outside (step 82). After step 82, a temporary variable is calculated using Equation 1 using the first and second data Ai and Bi and the result data Ti (step 84).

제84단계후에, 임시 변수(M)와 제1, 제2 및 제3 데이타들(Ai, Bi 및 Ni)과 결과 데이타(Ti)를 이용하여 제1 캐리(C=C₁C₀)를 수학식 3과 같이 구한다(제86단계). 제86단계후에, 계산된 임시 변수(M), 제1 캐리(C) 및 A로서 제1 데이타(Ai)를 저장한다(제88단계). 이는 제어 데이타가 0인 경우 제2 캐리(C')를 구할 때 사용하기 위해서이다.After step 84, the first carry (C = C ₁ C ₀ ) is calculated using the temporary variable M, the first, second and third data Ai, Bi and Ni and the resultant data Ti. Obtained as in Equation 3 (Step 86). After step 86, the first data Ai is stored as the calculated temporary variable M, the first carry C, and A (step 88). This is for use when obtaining the second carry C 'when the control data is zero.

제88단계후에, 이전 프로세서로부터 입력한 제1 데이타(Ai) 및 결과 데이타(Ti)들을 다음 프로세서로 보내고, 제2 및 제3 데이타들(Bi 및 Ni)과 제어 데이타(CTi)를 소정 시간 지연후에 다음 프로세서로 보낸다(제90단계).After operation 88, the first data Ai and the result data Ti input from the previous processor are sent to the next processor, and the second and third data Bi and Ni and the control data CTi are delayed by a predetermined time. After that, the process is sent to the next processor (step 90).

그러나, 만일 제어 데이타가 '1'이 아니고 '0'이면, 도 2에 도시된 각 프로세서는 제1, 제2 및 제3 데이타들(Ai, Bi 및 Ni)과 결과 데이타(Ti)를 이전 프로세서 또는 외부로부터 가져오고, 저장한 A, M 및 C를 독출한다(제92단계). 제92단계후에, 제2 및 제3 데이타들(Bi 및 Ni)과 저장한 A, M 및 C 및 결과 데이타(Ti)를 이용하여 제2 캐리(C'=C₁'C₀')를 수학식 4와 같이 구한다(제94단계).However, if the control data is not '1' and '0', each processor shown in Fig. 2 transfers the first, second and third data Ai, Bi and Ni and the result data Ti to the previous processor. Alternatively, the stored A, M and C are read from the outside (step 92). After step 92, the second carry (C '= C ₁ ' C ₀ ') is calculated using the second and third data Bi and Ni, and the stored A, M and C, and the resultant data Ti. Obtained as in Equation 4 (Step 94).

제94단계후에, 저장한 A 및 M, 제2 캐리, 이전 프로세서로부터 입력한 제2 및 제3 데이타들(Bi 및 Ni) 및 결과 데이타(Ti)를 이용하여 현재 결과 데이타(Ti')를 수학식 5와 같이 구한다(제96단계). 제96단계후에, 이전 프로세서로부터 가져온 제1 데이타(Ai) 및 다음 프로세서의 결과 데이타로서 제96단계에서 구한 현재 결과 데이타를 다음 프로세서로 보내고, 제어 데이타(CTi), 제2 및 제3 데이타들(Bi 및 Ni)을 소정 시간 지연후에 다음 프로세서로 보낸다(제98단계).After operation 94, the current result data Ti ′ is calculated using the stored A and M, the second carry, the second and third data Bi and Ni and the result data Ti input from the previous processor. Obtained as in Equation 5 (Step 96). After step 96, the first data Ai obtained from the previous processor and the current result data obtained in step 96 are sent to the next processor as the result data of the next processor, and the control data CTi, the second and third data ( Bi and Ni) are sent to the next processor after a predetermined time delay (step 98).

After step 90 or step 98, it is determined whether all result data have been obtained (step 100), that is, whether all result data have been output from the (n+1)-th processor 16 shown in FIG. 1. If not all result data have been obtained, the method returns to step 80 to obtain the remaining result data. If all result data have been obtained, all the result data output from the (n+1)-th processor 16 are determined to be the result of the Montgomery modular N multiplication of the first data and the second data (step 102).
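
For checking the final output against a known value, a textbook bit-serial Montgomery multiplication can serve as a reference. The sketch below is that standard sequential algorithm, not the pipelined processor array described here. It assumes radix 2, an odd modulus, and that the number of iterations k equals the bit length of the first data (n + 1 in the notation of this document); the function name `montgomery_reference` is hypothetical.

```python
def montgomery_reference(a, b, n, k):
    """Textbook radix-2 Montgomery product: a * b * 2**(-k) mod n.

    Assumes n is odd and 0 <= a < 2**k. Not the systolic array of the patent.
    """
    if n % 2 == 0:
        raise ValueError("modulus must be odd")
    t = 0
    for i in range(k):
        a_i = (a >> i) & 1      # i-th bit of the first operand
        t += a_i * b            # add the partial product
        if t & 1:               # make t even by adding the modulus
            t += n
        t >>= 1                 # exact division by 2
    return t % n                # final reduction into [0, n)
```

For example, `montgomery_reference(7, 5, 13, 4)` returns 3, which equals 7 × 5 × 16⁻¹ mod 13.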

As described above, the Montgomery modular multiplication apparatus and method according to the present invention exploit the parallelism of Montgomery modular multiplication to perform the multiplication at high speed, and because the number of processors is smaller than in a conventional Montgomery modular multiplication apparatus, the hardware design cost is low.

Claims

1. A Montgomery modular multiplication apparatus for performing Montgomery modular N multiplication (where N is third data (Ni) consisting of an m+2 bit string) of first data (Ai, where i is the bit position) consisting of an n+1 bit string and second data (Bi) consisting of an m+2 bit string, the apparatus comprising first through (n+1)-th processors, wherein the (p+1)-th processor (where p is an integer satisfying 1 ≤ p ≤ n+1), in response to a control signal input from the p-th processor, either calculates a temporary variable (M) and a first carry (C = C₁C₀) according to the following equations using the Ai, Bi, Ni and previous result value (Ti) input from the p-th processor, stores Ai as A and stores M and C, or calculates a second carry (C' = C₁'C₀') and a current result value (Ti') according to the following equations using the stored A, M and C and the Ai, Bi, Ni and previous result value input from the p-th processor; outputs the previous result value (Ti) or the current result value (Ti') and the input Ai to the (p+2)-th processor; and outputs the control signal, Bi and Ni to the (p+2)-th processor after a predetermined time delay, wherein the previous result value (T) input to the first processor and all bits of the control signal other than its least significant bit are at a low logic level, the first processor receives the control signal from the outside, and the current result values output from the (n+1)-th processor, which consist of an n+m+1 bit string, are the result of the Montgomery modular N multiplication of the first data and the second data.

M = (Ti + Ai × Bi) MOD r

(Where r is a predetermined number)

C = Ti + Ai × Bi + M × Ni

(where C₀ is C DIV r MOD r and C₁ is C DIV r DIV r)

C' = Ti + A × Bi + M × Ni + C₀ + C₁ × r

(where C₀' is C' DIV r MOD r and C₁' is C' DIV r DIV r)

Ti' = (Ti + A × Bi + M × Ni + C₀') MOD r

2. The Montgomery modular multiplication apparatus according to claim 1, wherein the temporary variable (M) is calculated according to the following equation instead of M = (Ti + Ai × Bi) MOD r.

M = (Ti + Ai × Bi × X) MOD r

(where X = (r − N₀)⁻¹ MOD r)
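
The factor X can be computed as sketched below, reading X as the modular inverse of (r − N₀) modulo r and N₀ as the least significant radix-r digit of N. Both readings are assumptions, as is the function name `montgomery_x`; the inverse exists only when N₀ and r are coprime.

```python
def montgomery_x(n0, r):
    """Return X = (r - n0)^(-1) mod r under the assumed reading of the claim.

    Raises ValueError if (r - n0) has no inverse modulo r.
    Requires Python 3.8+ for pow() with a negative exponent and a modulus.
    """
    return pow((r - n0) % r, -1, r)
```

With this reading, N₀ × X ≡ −1 (mod r), which is the usual Montgomery constant. For r = 16 and N₀ = 13, `montgomery_x(13, 16)` returns 11, and 13 × 11 = 143 ≡ 15 (mod 16).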

3. The Montgomery modular multiplication apparatus according to claim 1 or 2, wherein the (p+1)-th processor comprises: temporary variable calculating means for calculating the temporary variable using r and the first, second and third data and the previous result value input from the p-th processor; storage means for storing or reading out, in response to a selection signal, the temporary variable, the first carry, and the first data as A; first carry calculation means for calculating a third carry (C) according to the following equation using r and the first carry stored in the storage means,

C = C₀ + C₁ × r

first selection means for selectively outputting the low logic level or the third carry in response to the selection signal; second carry calculation means for, in response to the selection signal, either calculating the first carry using the temporary variable, the first, second and third data, the previous result value and the output of the first selection means and outputting the first carry to the storage means, or calculating the second carry using the stored temporary variable, A and first carry, the second and third data, the previous result value and r; result value calculating means for calculating the current result value using r, the A and the temporary variable stored in the storage means, the previous result value output from the p-th processor, the second and third data, and the second carry output from the second carry calculation means; second selection means for selectively outputting, in response to the selection signal, the current result value input from the result value calculating means or the previous result value input from the p-th processor; a buffer for buffering the input control signal and the second and third data and outputting them to the (p+2)-th processor; and comparison means for comparing the control signal with the low logic level and outputting the selection signal in response to the result of the comparison.

4. A Montgomery modular multiplication method performed in a Montgomery modular multiplication apparatus having n+1 processors, for performing Montgomery modular N multiplication (where N is third data (Ni) consisting of an m+2 bit string) of first data (Ai, where i is the bit position) consisting of an n+1 bit string and second data (Bi) consisting of an m+2 bit string, using control data (CTi) and result data (Ti) each consisting of an n+m+1 bit string, the method comprising:

(a) determining whether the control data is '1',

(b) if the control data is '1', fetching the first, second and third data and the result data from a previous processor,

(c) obtaining a temporary variable M using the first and second data and the result data according to the following equation,

M = (Ti + Ai × Bi) MOD r

(Where r is a predetermined number)

(d) obtaining a first carry (C = C₁C₀) using the temporary variable, the first, second and third data and the result data according to the following equation,

C = Ti + Ai × Bi + M × Ni

(where C₀ is C DIV r MOD r and C₁ is C DIV r DIV r)

(e) storing the temporary variable and the first carry, and storing the first data as A,

(f) sending the first data and the result data to a next processor and sending the second and third data and the control data to the next processor after a predetermined time delay;

(g) if the control data is not '1', fetching the first, second and third data and the result data from the previous processor and reading the stored A, M and C,

(h) obtaining a second carry (C' = C₁'C₀') using the second and third data, the A, M and C, and the result data according to the following equation,

C' = Ti + A × Bi + M × Ni + C₀ + C₁ × r

(where C₀' is C' DIV r MOD r and C₁' is C' DIV r DIV r)

(i) obtaining current result data (Ti') using the stored A and M, the second carry, the second and third data and the result data according to the following equation,

Ti' = (Ti + A × Bi + M × Ni + C₀') MOD r

(j) sending the first data and, as the result data of the next processor, the current result data to the next processor, and sending the control data and the second and third data to the next processor after a predetermined time delay,

(k) determining whether all result data have been obtained, and if all the result data have not been obtained, proceeding to step (a); and

(l) if all the result data is obtained, determining all the result data as the Montgomery modular multiplication result.