KR101128505B1

KR101128505B1 - method and apparatus for modular multiplication

Info

Publication number: KR101128505B1
Application number: KR1020090032925A
Authority: KR
Inventors: 김영세; 박영수; 박지만; 김무섭; 전성익
Original assignee: 한국전자통신연구원
Priority date: 2008-12-03
Filing date: 2009-04-15
Publication date: 2012-03-27
Also published as: KR20100063623A

Abstract

Provided are a modular multiplication method, and an arithmetic apparatus therefor, that omit the separate comparison and subtraction steps otherwise needed to correct the magnitude of the intermediate and final results of Montgomery modular multiplication. The separate computation of the word-based Montgomery correction factor is omitted, and an adder unit that operates independently during the operation of the modular arithmetic unit keeps the result of the modular multiplication always smaller than the modulus N, which improves operation speed while minimizing hardware area.

Modular multiplication, Montgomery, RSA

Description

Modular multiplication operation method and apparatus

Embodiments of the present invention relate to a modular multiplication method, and an arithmetic apparatus therefor, that omit the separate comparison and subtraction steps for correcting the magnitude of the intermediate and final results of Montgomery modular multiplication.

The present invention is derived from research conducted as part of the IT Growth Engine Technology Development Project of the Ministry of Knowledge Economy and the Institute for Information Technology Advancement [Project number: 2006-S-041-03, Project title: Development of a common security core module for security and trust services of next-generation mobile terminals].

The RSA (Rivest Shamir Adleman) cryptographic algorithm is implemented by performing modular exponentiation, which in turn is carried out by repeated modular multiplication. The Montgomery algorithm is mainly used to perform these repeated modular multiplications quickly.

To implement RSA encryption on systems with limited hardware resources, such as mobile TPMs, IC cards, and USIMs, a word-based Montgomery algorithm is used, which does not process the entire data at once but divides it into words of an arbitrary size for computation.

However, because a modular multiplier designed with the word-based Montgomery algorithm operates word by word, it additionally requires a computation to obtain the Montgomery correction factor, as well as comparison and arithmetic steps that correct the intermediate and result values of each operation so that they are smaller than the modulus. These additional steps are mostly handled in software, and the resulting time delay is one of the major causes of degraded performance in modular multiplication and in RSA as a whole.

To address this, a method that does not compute the Montgomery correction factor separately has been proposed. It works by taking in and processing the entire data at its full size, so processing data of, for example, 1024 bits or more enlarges the hardware area accordingly, which makes the method difficult to apply to small embedded systems such as mobile TPMs, IC cards, and USIMs.

A method that omits the correction of intermediate and result values by extending the entire data has also been proposed. However, in a system with a 32-bit system bus, for example, a 2-bit extension requires an extension of at least 32 bits, and the resulting redundancy causes implementation problems in both hardware and software.

In other words, in hardware the size of the modular multiplier must inevitably be increased. For example, when a word size of 128 bits is chosen to implement a 1024-bit modular operation, a 128-bit modular arithmetic unit can be implemented as is. With data extension, however, a 160-bit modular arithmetic unit must be implemented to achieve similar performance, which increases the hardware area of the unit. In software, data input/output and other related operations must likewise be processed in units of 1120 bits (160 bits * 7) rather than 1024 bits, which increases resource usage and degrades performance.

The present invention is proposed to solve the above problems, and aims to provide a modular multiplication method and apparatus that omit the operations for correcting the magnitude of the intermediate and result values in word-based modular multiplication for RSA encryption.

One aspect of the present invention for achieving the above object relates to a word-based Montgomery modular multiplication method for obtaining a modular multiplication result R for a multiplier A, a multiplicand B, and a modulus N. The method comprises: a first step of initializing R, divided into a sum (S) and a carry (C); a second step of starting a loop over the word index i of B; a third step of starting, within the i loop, a loop over the word index j of A and initializing R; a fourth step of performing word-wise modular multiplication of A and B within the j loop; a fifth step of performing word-wise addition up to three times within the j loop according to a predetermined condition; a sixth step of ending the j loop and the i loop and performing word-wise addition twice; and a seventh step of outputting the final result (T) minus N if T is greater than N, and returning T if T is smaller than N.

Another aspect of the present invention relates to a word-based Montgomery modular multiplication apparatus for obtaining a modular multiplication result R for a multiplier A, a multiplicand B, and a modulus N. The apparatus comprises: an arithmetic unit that initializes R, divided into a sum (S) and a carry (C), starts a loop over the word index i of B and a loop over the word index j of A, performs word-wise modular multiplication of A and B within the j loop using the initialized R, and outputs the final result (T) minus N if T is greater than N or returns T if T is smaller than N; and an adder unit that performs word-wise addition up to three times within the j loop according to a predetermined condition and, after the j loop and the i loop end, performs word-wise addition twice.

According to the present invention, when designing a word-wise modular multiplier to implement RSA encryption on small embedded systems such as mobile TPMs, IC cards, and USIMs, which require both small area and high speed, the result of the modular multiplication can always be kept smaller than N, without separate data extension or additional software operations, by using an adder unit that operates independently during the operation of the modular arithmetic unit. This removes the increase in hardware area and software size and the time delay caused by additional operations, and thereby simplifies the overall RSA computation.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art can readily practice them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted for clarity, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components, rather than excluding them, unless specifically stated otherwise. Terms such as "... unit", "...er", and "module" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.

First, the Montgomery modular multiplication algorithm used in the embodiments of the present invention is described. For a multiplier A, a multiplicand B, and a modulus N, the operation that obtains the result R as in Equation 1 is called modular multiplication.

R = A * B mod N    (Equation 1)

Various algorithms exist for performing modular multiplication efficiently. Among them, the Montgomery algorithm uses a residue coefficient r to move the computation from the integer domain Z_n to the domain rZ_n, which makes modular operations possible without the classical division that computes a quotient and a remainder.

This requires additional operations to move between domains. However, the main purpose of a modular multiplier is not a single modular multiplication but modular exponentiation through repeated modular multiplication, as in RSA. In modular exponentiation, the domain move need not be performed for every multiplication, only at the beginning and the end of the whole exponentiation. Moreover, the domain move itself can be performed by Montgomery modular multiplication, so the Montgomery method is well suited to implementing a modular multiplier.

The residue coefficient r has the value 2^n for a data bit length n, and the result of the Montgomery modular multiplication algorithm is given by Equation 2.

R = A * B * r^-1 mod N    (Equation 2)
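The domain move described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `mont_product` and `mont_pow` are hypothetical, and `mont_product` is only a reference model of Equation 2 computed with an ordinary modular inverse, not the way hardware would compute it. The sketch assumes an odd modulus N smaller than 2^n and shows that the conversion into and out of the r-domain happens once at each end of the exponentiation, using the same Montgomery product.

```python
def mont_product(a, b, n, r):
    """Reference model of Equation 2: a * b * r^-1 mod n.

    Uses Python's modular inverse (Python 3.8+); illustrative only.
    """
    return (a * b * pow(r, -1, n)) % n


def mont_pow(base, exponent, n, n_bits):
    """Modular exponentiation that converts domains only at the start and end."""
    r = 1 << n_bits                          # residue coefficient r = 2^n
    r2 = (r * r) % n                         # precomputed constant r^2 mod n
    x = mont_product(base % n, r2, n, r)     # enter the domain: base * r mod n
    acc = mont_product(1, r2, n, r)          # the value 1 in the domain: r mod n
    for bit in bin(exponent)[2:]:            # left-to-right square-and-multiply
        acc = mont_product(acc, acc, n, r)
        if bit == "1":
            acc = mont_product(acc, x, n, r)
    return mont_product(acc, 1, n, r)        # leave the domain: multiply by 1
```

Every multiplication inside the loop stays in the r-domain, so only the first two calls and the last call deal with the conversion.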

A general Montgomery algorithm implementing Equation 2 can be expressed as Equation 3.

Equation 3:

Step 1 : R = 0
Step 2 : R = R + A*B
Step 3 : m = R*N' mod r (where r*r^-1 - N*N' = 1)
Step 4 : R = (R + m*N)/r
Step 5 : if(R>N) R = R - N
Step 6 : return (R)
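The six steps of this general algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `montgomery_multiply` is hypothetical, N is assumed odd with A, B < N < r = 2^n, and N' is derived from the stated relation r*r^-1 - N*N' = 1, which implies N*N' = -1 mod r. The sketch compares with `>=` rather than `>` in step 5 so that a result equal to N is also reduced to 0.

```python
def montgomery_multiply(a, b, n, n_bits):
    """Return a * b * r^-1 mod n with r = 2^n_bits (sketch of the general algorithm)."""
    r = 1 << n_bits
    # N' satisfies N * N' = -1 (mod r); pow(n, -1, r) needs Python 3.8+ and odd n.
    n_prime = (-pow(n, -1, r)) % r
    t = a * b                      # step 2: full product
    m = (t * n_prime) % r          # step 3: Montgomery correction factor
    t = (t + m * n) >> n_bits      # step 4: exact division by r, done as a shift
    if t >= n:                     # step 5: t < 2n here, so one subtraction suffices
        t -= n
    return t                       # step 6
```

The shift in step 4 is exact because t + m*N is a multiple of r by construction of m.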

In Equation 3, the division in step 4 can be implemented simply as a shift operation.

In this way, the Montgomery algorithm reduces the division required for modular multiplication to multiplications and shifts, which improves its speed. However, the computation of the Montgomery correction factor m in step 3 must be implemented in addition, and it costs a corresponding amount of extra operation time. It is therefore more efficient not to perform a separate computation for m.

A basic algorithm that performs modular multiplication without computing the Montgomery correction factor m is given by Equation 4.

Equation 4:

Step 1 : R = 0
Step 2 : for i = 0 to n-1 {
             R = R + A_i*B
             R = R + R_0*N
             R = R/2
         }
Step 3 : if(R>N) R = R - N
Step 4 : return (R)
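A minimal Python sketch of this bit-serial form follows. The function name `montgomery_bit_serial` is hypothetical, and the sketch assumes an odd N with A, B < N < 2^n; A_i is taken to be bit i of A and R_0 the least significant bit of R. As in the previous case, the final comparison uses `>=` so that a result equal to N is reduced to 0.

```python
def montgomery_bit_serial(a, b, n, n_bits):
    """Return a * b * 2^(-n_bits) mod n, one bit of a per iteration."""
    acc = 0
    for i in range(n_bits):
        a_i = (a >> i) & 1         # bit i of the multiplier A
        acc += a_i * b             # R = R + A_i * B
        acc += (acc & 1) * n       # R = R + R_0 * N, which makes R even (N is odd)
        acc >>= 1                  # R = R / 2, exact because R is even
    if acc >= n:                   # acc stays below 2n, so one subtraction suffices
        acc -= n
    return acc
```

No correction factor is computed: the least significant bit of the running sum decides whether N is added in each iteration.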

Equation 4 performs the entire modular multiplication using only step 2, with no additional computation of the Montgomery factor m. This scheme applies when the entire data is taken in and processed at its full size.

Alternatively, the data to be processed can be divided into words of an arbitrary size rather than handled all at once. Equation 5 is a step-by-step expression of Montgomery modular multiplication for such word-wise operation.

Equation 5:

Step 1 : R = 0
Step 2 : for i = 0 to p-1 {
Step 3 :     R = R + A*B_i (B_i is a w-bit word)
             m = R_0 * N_0' mod 2^w (N_0 * N_0' = -1 mod 2^w)
             R = R + N*m
             R = R/2^w
         }
Step 4 : if(R>N) R = R - N
Step 5 : return (R)
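The word-wise loop can be sketched in Python as below. The function name `montgomery_word_serial` is hypothetical, and the sketch assumes an odd N with A, B < N < 2^(w*p); R_0 and N_0 are taken to be the least significant w-bit words of R and N. Python integers stand in for the multi-word registers, so the p word-wise multiplications inside each column are not modeled separately.

```python
def montgomery_word_serial(a, b, n, w, p):
    """Return a * b * 2^(-w*p) mod n, consuming one w-bit word of b per iteration."""
    mask = (1 << w) - 1
    # N_0' satisfies N_0 * N_0' = -1 (mod 2^w); pow(..., -1, ...) needs Python 3.8+.
    n0_prime = (-pow(n & mask, -1, 1 << w)) % (1 << w)
    acc = 0
    for i in range(p):
        b_i = (b >> (w * i)) & mask          # i-th w-bit word of B
        acc += a * b_i                       # R = R + A * B_i
        m = ((acc & mask) * n0_prime) & mask # m = R_0 * N_0' mod 2^w
        acc += n * m                         # low w bits of R are now zero
        acc >>= w                            # R = R / 2^w
    if acc >= n:                             # acc stays below 2n
        acc -= n
    return acc
```

For example, with w = 4 and p = 3 the routine handles 12-bit operands in three iterations, compared with twelve iterations for a bit-serial loop.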

Looking at the computation of one column (A*B_i) in Equation 5, A is divided into p w-bit words just like B, so p word-wise multiplications are performed, and (N*m) is likewise divided into p w-bit words with p word-wise multiplications. The number of bits in a word is generally chosen as 32 or an integer multiple of 32, reflecting the 32-bit data width of the system bus.

Therefore, an apparatus implementing the algorithm of Equation 5 in hardware consists of a w-bit word-wise arithmetic module and a module that controls it, so its hardware area is smaller than when all data bits are processed at once.

In another embodiment of Montgomery modular multiplication for word-wise operation, each word-wise operation is performed independently of the previous operation result: the value shifted out of each partial operation is stored and then added to the result of the previous word operation to obtain the result.

In this case the word-wise multiplication result is always kept at w bits, and the addition that produces the result of each partial operation is preferably implemented as a separate module. The Montgomery modular multiplication algorithm of this embodiment can be expressed as Equation 6.

Equation 6:

Step 1 : R = 0, T = 0
Step 2 : for i = 0 to p-1 {
             Pre_c = 0, Next_c = 0
Step 3 :     for j = 0 to p-1 {
                 If(i == 0) R_j = 0 (R_j is w+1 bits wide)
                 Else R_j = T_j (T_j is w bits wide)
Step 4 :         for b = 0 to w-1 { (b is the bit index)
                     R_j = R_j + A_jb*B_i
                     if(j == 0) m_b = R_j0 (m is w bits wide)
                     else m_b = m_b
                     R_j = R_j + m_b*N_j
                     shift_data_b = R_j0 (shift_data is w bits wide)
                     R_j = R_j/2
                 } end for b
Step 5 :         (Pre_c, T_j) = R_j + Pre_c
                 (Next_c, T_(j-1)) = T_(j-1) + shift_data + Next_c
             } end for j
         } end for i
Step 6 : If(T>N) T = T-N (T is n+1 bits wide, including the last Next_c)
Step 7 : return (T)

For a more efficient implementation, when choosing the word size w in Equation 6, a total bit length n' that is at least 2 bits larger than n (n' ≥ n+2) is used, with n' = w*p. Here p is an arbitrary integer, and the bits above bit n are set to 0. Because the result of a modular operation is always smaller than 2N, this allows the result to be fully represented within the given bit width. Considering that the modular multiplier is a device intended only for modular exponentiation, computing with a bit length at least 2 bits larger means that the comparison step that subtracts N whenever a modular multiplication result exceeds N need not be performed each time, and the final result of the modular exponentiation still ends up smaller than N. If a single modular multiplication result is needed, the comparison can be performed separately outside the proposed apparatus. Step 6 of Equation 6 can therefore be omitted in a hardware implementation when data extension is used in this way.
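The effect of the 2-bit extension can be checked numerically with a small Python sketch. It rests on an assumption about what the text intends, namely the standard bound that when r = 2^n' exceeds 4N, operands below 2N always produce a Montgomery product below 2N, so results can be fed back into the multiplier without any comparison. The function name `mont_no_subtract` is hypothetical, and the full-width form of the general algorithm is used here in place of the word-serial structure of Equation 6.

```python
def mont_no_subtract(a, b, n, ext_bits):
    """Montgomery product with r = 2^ext_bits and no final comparison with n.

    Assumes n is odd and 2^ext_bits > 4 * n (for example ext_bits = n_bits + 2).
    For inputs below 2n the output is congruent to a * b * r^-1 mod n
    and is itself below 2n.
    """
    r = 1 << ext_bits
    n_prime = (-pow(n, -1, r)) % r       # N * N' = -1 (mod r), Python 3.8+
    t = a * b
    m = (t * n_prime) % r
    return (t + m * n) >> ext_bits       # deliberately no "if t >= n" step
```

With a, b < 2N the numerator is below 4N^2 + r*N, so the quotient is below N*(4N/r + 1), which is below 2N whenever r > 4N.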

In yet another embodiment of Montgomery modular multiplication for word-wise operation, the addition that produces the result of each partial operation and the final result also computes the result with the modulus N subtracted, and one of the two full-width results obtained at the end is selected. In this way the modular multiplication result can always be kept smaller than N without additional data extension. This is expressed in Equation 7.

In the following description, the notation 111…1 means that every bit of the value is 1. S and C denote the sum and the carry of R, respectively.

Equation 7:

Step 1 : R = (S,C) = (0,0), T = 0, T' = 0
Step 2 : for i = 0 to p-1 {
             pre_c = 0, next_c = 0, sign_c = 1 (all three carries are 1 bit)
Step 3 :     for j = 0 to p-1 {
                 If(i == 0) R = (S,C) = (0,0) (R is w+1 bits wide; S and C are w bits wide)
                 else {
                     If(j == p-1) {
                         If(check_ovf0 == 1) (S, C) = (T_j, 111…1)
                         Else if(check_ovf1 == 1) (S, C) = (1, 111…1)
                         Else (S, C) = (T_j, 0)
                     }
                     else (S, C) = (T_j, 0)
                 }
Step 4 :         for b = 0 to w-1 { (b is the bit index)
                     R = R + A_jb*B_i
                     if(j == 0) m_b = R_0 (m is w bits wide, m_b is bit b of m, R_0 is bit 0 of R)
                     else m_b = m_b
                     R = R + m_b*N_j
                     If(j == 0) shift_data_b = 0
                     else shift_data_b = R_0 (shift_data is w bits wide, shift_data_b is bit b of shift_data)
                     R = R/2
                 } end for b
Step 5 :         (pre_c, T_j) = S + C + pre_c,
                 If(j == p-1) check_ovf0 = pre_c
                 If(j == 0) shift_data = pre_c
                 (next_c, T_(j-1)) = T_(j-1) + shift_data + next_c (if j == 0, T_(j-1) = T_(p-1)),
                 If(j == 0 && i != 0) check_ovf1 = next_c
                 If(i == p-1 && j != 0) (sign_c, T'_(j-1)) = T_(j-1) + ~N_(j-1) + sign_c
             } end for j
         } end for i
Step 5-1 : (next_c, T_(p-1)) = T_(p-1) + next_c,
           check_ovf1 = next_c
           (sign_c, T'_(p-1)) = T_(p-1) + ~N_(p-1) + sign_c
           check_sign = check_ovf0 | check_ovf1 | sign_c
Step 6 : If(check_sign == 1) return(T')
         else return (T)

In Equation 7, j and i, which control the sequential iterations, are the word index of A and the word index of B, respectively. For the i-th word of B, words 0 through p-1 of A are multiplied in turn, and the multiplication then proceeds with the (i+1)-th word of B. The result of multiplying A by each word of B, that is, the result of one column (A*B_i), is an intermediate value that is fed in as the initial value of the next multiplication.

Compared with Equation 6, Equation 7 differs in the additions performed in step 5: Equation 6 performs two additions, whereas Equation 7 performs up to three. Two additions are performed while computing intermediate values, but when computing the final result, that is, when i is p-1, an additional step subtracts the corresponding word of N from each word of the result to obtain a second final result, T'.

All of these additions are performed independently of the modular multiplication in step 4, and the design can ensure that they complete within the time taken by the modular multiplication. Because subtraction can be performed by adding the two's complement, it is carried out as an addition of the one's complement of N with the carry sign_c initialized to 1. Step 5-1 is added to obtain the result for the most significant word after the last modular multiplication.
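The word-wise subtraction by complement addition can be sketched in Python as follows. This is an isolated illustration of the T' = T - N step only, not of the full Equation 7 loop; the helper names `to_words` and `subtract_wordwise` are hypothetical. Each word of T is added to the one's complement of the matching word of N, and the carry sign_c, initialized to 1, ripples from the least significant word upward.

```python
def to_words(value, w, p):
    """Split value into p words of w bits, least significant word first."""
    mask = (1 << w) - 1
    return [(value >> (w * j)) & mask for j in range(p)]


def subtract_wordwise(t_words, n_words, w):
    """Compute T - N word by word as T + ~N + 1.

    Returns (sign_c, words). A final sign_c of 1 means no borrow occurred,
    that is, T >= N for operands of equal word count.
    """
    mask = (1 << w) - 1
    sign_c = 1                                   # initial carry supplies the "+1"
    out = []
    for t_j, n_j in zip(t_words, n_words):
        total = t_j + (~n_j & mask) + sign_c     # (sign_c, T'_j) = T_j + ~N_j + sign_c
        out.append(total & mask)
        sign_c = total >> w
    return sign_c, out
```

For instance, `subtract_wordwise(to_words(0x1234, 8, 2), to_words(0x0FFF, 8, 2), 8)` subtracts two 16-bit values held as two 8-bit words each, and the returned carry indicates whether the difference is non-negative.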

The reason this addition scheme can always keep the intermediate and final values smaller than N is as follows.

First, consider the intermediate values. In Equation 7 the input and output of the modular multiplication in step 4 is R, and since the output range of the modular multiplication is 0<R<2N, a value one bit wider than N may be produced. When R is stored divided into S and C, S and C can each have the same number of bits as N. For intermediate values, however, T is obtained through two additions, so T again satisfies 0<T<2N. Equation 7 uses check_ovf0 and check_ovf1 to control this.

First, if an overflow occurs in the first addition that computes the most significant word T_(p-1), the carry pre_c is recorded in check_ovf0. To pass pre_c to the next i-loop iteration as an initial value, the scheme uses the fact that the initial value of R can be split into (S, C).

That is, the overflow value can be expressed, split into S and C, as (111…1, 1). So 1 is added in advance in the second addition that computes the intermediate value T_(p-1), and (T_(p-1), 111…1) is supplied as the initial value of the (p-1)-th operation of the next i-loop iteration. This passes the overflowed intermediate value on as the initial value of the next operation.

If T_(p-1) is 111…1 after pre_c has been generated, adding 1 in the second addition can cause another overflow. For this to happen, however, the most significant word of the modulus, N_(p-1), would have to be 111…1. In practice, a modulus N used as an RSA parameter is not chosen with, for example, all 128 bits of a 128-bit word equal to 1, so this case need not be considered.

If an overflow occurs in the second addition that computes T_(p-1), the carry next_c is recorded in check_ovf1 and passed to the next i-loop iteration as an initial value. In this case (1, 111…1) is supplied as the initial value of R for the (p-1)-th operation, because, as described above, check_ovf0 and check_ovf1 cannot both occur. For check_ovf1 to become 1, the inputs to the second addition, T_(p-1) and next_c, must be 111…1 and 1 respectively, so these are supplied as they are as the initial value of the next operation.

The intermediate values are controlled through the above process. In the stage that derives the final result, where i is p-1, one more addition is performed, because T' must be computed for use as the result when the final result is larger than N. This last addition subtracts the corresponding word of N from each word of the result and stores the outcome as T'. After the last addition, if check_ovf0 or check_ovf1 is 1, the final result is taken to be T' rather than T.

It can also happen that the final result exceeds N in value without an overflow that increases the number of bits. In that case the final value of sign_c, the carry stored during the subtraction, is 1. Consequently, the final result is taken to be T' rather than T whenever any one of the three values check_ovf0, check_ovf1, and sign_c is 1. Either all three values are 0, or exactly one of them is 1.

Accordingly, the structure proposed in the embodiment of the present invention can always keep the result of the modular multiplication smaller than N, without any data extension, by using an adder that operates independently during the operation of the modular operator.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart sequentially illustrating the modular multiplication (A * B mod N) method described in Equation 7.

When the operation starts, R, the input/output value of the word-unit modular multiplication within the operation, is split into S and C, each of which is initialized to 0. The word index j of A and the word index i of B are also initialized to 0 (S100). Next, the i loop starts; at this point the carries pre_c and next_c are each initialized to 0 and sign_c is initialized to 1 (S200).

The j loop for a given i now runs from 0 to p-1. Each j-th iteration first sets the initial value R = (S, C) for the word-unit modular multiplication. It is first checked whether i is 0 (S300); if i = 0, the initial value is set to (S, C) = (0, 0) (S301). If i ≠ 0, it is checked whether j is p-1, that is, whether this is the modular multiplication for the last word of A (S302); if j ≠ p-1, the initial value is set to (T_j, 0) (S303).

If j = p-1, the initial value depends on whether an overflow has occurred. First, it is checked whether check_ovf0 is 1 (S304). If check_ovf0 is 1, the initial value is set to (T_j, 111…1) (S305); otherwise, it is checked whether check_ovf1 is 1 (S306). If check_ovf1 is 1, the initial value is set to (1, 111…1) (S307); if check_ovf1 is not 1, no overflow has occurred, so the initial value is set to (T_j, 0) (S308).
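The initial-value selection of steps S300 to S308 can be sketched as a small decision function. This is only an illustrative sketch, not the patented circuit: the function name `initial_r`, the `word_bits` parameter, and the representation of the word 111…1 as an all-ones integer are assumptions introduced here, and the i = 0 branch follows the wording of the claims.

```python
def initial_r(i, j, p, t_j, check_ovf0, check_ovf1, word_bits=32):
    """Return the initial (S, C) pair for one word-unit multiplication.

    Hypothetical helper: names and word_bits are illustrative assumptions.
    """
    all_ones = (1 << word_bits) - 1   # the word written as 111...1 in the text
    if i == 0:                        # S300 -> S301: first word of B
        return (0, 0)
    if j != p - 1:                    # S302 -> S303: not the last word of A
        return (t_j, 0)
    if check_ovf0 == 1:               # S304 -> S305
        return (t_j, all_ones)
    if check_ovf1 == 1:               # S306 -> S307
        return (1, all_ones)
    return (t_j, 0)                   # S308: no overflow recorded
```

For example, with 8-bit words, `initial_r(1, 3, 4, 7, 0, 1, word_bits=8)` selects the S307 branch and returns `(1, 255)`.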

Once the initial value R is set, the word-unit modular multiplication A_j * B_i mod N_j is performed using this initial value and the current values of i and j (S400). This word-unit modular multiplication follows the same scheme as Equation 6, but applies an addition method that computes R split into S and C.

When the operation completes, the result R is output, together with the Montgomery correction factor m and the shift_data value. The Montgomery correction factor m is stored for the next word-unit modular multiplication, and R and shift_data are supplied as inputs to the additions that follow.

The addition process first adds the S, C, and pre_c values output in the j-th loop iteration and outputs (pre_c, T_j) (S500). At this point the sum T_j is not yet a complete intermediate value.
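Each addition in this process takes two words and one carry bit and produces one word and one carry bit. A minimal sketch of such an adder follows; the function name `add_words` and the `word_bits` parameter are assumptions made for illustration and do not appear in the source.

```python
def add_words(x, y, carry_in, word_bits=32):
    """Add two words and a carry bit; return (carry_out, sum_word).

    Hypothetical helper modelling a two-word-plus-carry adder.
    """
    mask = (1 << word_bits) - 1
    total = (x & mask) + (y & mask) + (carry_in & 1)
    return total >> word_bits, total & mask


# First addition (S500), using illustrative 8-bit values for S, C and pre_c.
pre_c, t_j = add_words(0xFF, 0x01, 0, word_bits=8)
```

Under these assumptions the first addition corresponds to `pre_c, t_j = add_words(s, c, pre_c)`, with the carry fed back into the next iteration.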

Next, it is checked whether the current j is p-1, that is, whether the operation is for the last word of A (S501). If j = p-1, pre_c is stored in check_ovf0 (S502); if j ≠ p-1, it is then checked whether the current j is 0 (S503).

The reason for this check is that, when j is 0, the second addition that follows is the one that produces the complete intermediate value of T_j for the preceding j = p-1. If check_ovf0 is 1, that value is therefore added in advance when computing the intermediate value T_{p-1}. Since shift_data is 0 when the complete intermediate value of T_{p-1} is computed, substituting pre_c for shift_data and performing the addition has the effect of adding 1 when an overflow has occurred and adding 0 when it has not (S504).

If j is not 0, the output shift_data is used as is. shift_data is then added, together with next_c, to the previously obtained T_{j-1}, and (next_c, T_{j-1}) is output as the intermediate value (S505). If j is 0 and i is not 0 (S506), the value just output is the intermediate value T_{p-1}, so the output next_c is stored in check_ovf1 to record whether T_{p-1} overflowed (S507).

When the addition output is needed only up to the intermediate value, that is, when i is not p-1, the process runs only through S507 and then begins a new loop iteration. When the addition is an intermediate step toward the final result, that is, when the operation is for the last word of B (i = p-1), one more addition is needed. To this end, it is checked whether i is p-1 and j is not 0 (S508). The condition that j is not 0 is needed because, when j is 0, the second addition outputs the intermediate value T_{p-1}.

If the check shows that i is p-1 and j is not 0, a third addition is performed to obtain T'_{j-1} from the computed T_{j-1}. Because this step actually subtracts N, the subtraction is carried out, as explained for Equation 7, by adding ~N_{j-1} (the one's complement of N_{j-1}) and sign_c to T_{j-1}, where the initial value of sign_c (when j is 1) is 1. The output is (sign_c, T'_{j-1}) (S509).
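Subtraction by adding the one's complement with an initial carry of 1 can be illustrated word by word. The sketch below is a software model under stated assumptions: words are stored least significant first, and the names `subtract_by_complement` and `word_bits` are introduced here for illustration. It relies on the standard identity T + ~N + 1 = T - N + 2^n, so the final carry is 1 exactly when T ≥ N.

```python
def subtract_by_complement(t_words, n_words, word_bits=32):
    """Compute T - N word by word as T + ~N + 1; return (sign_c, t_prime_words).

    Hypothetical helper; word lists are least significant word first.
    """
    mask = (1 << word_bits) - 1
    sign_c = 1                                # initial carry supplies the "+1"
    t_prime = []
    for t, n in zip(t_words, n_words):
        total = t + (~n & mask) + sign_c      # T_j + ~N_j + sign_c
        t_prime.append(total & mask)
        sign_c = total >> word_bits           # carry into the next word
    return sign_c, t_prime


# 0x0305 - 0x0207 = 0x00FE with 8-bit words; final carry 1 because T >= N.
sign_c, t_prime = subtract_by_complement([0x05, 0x03], [0x07, 0x02], word_bits=8)
```

When T is smaller than N, the same routine returns a final carry of 0 and the word list holds the difference modulo 2^n, which is why the carry can serve as the comparison result.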

While the steps above proceed in sequence, it is checked whether j is p-1 (S309). If j is not p-1, the process returns to step S300; if j is p-1, it is then checked whether i is p-1 (S201). If i is not p-1, the process returns to step S200. If i is p-1, no word-unit modular multiplication remains to be performed, so all loop operations end and the process moves on to the last addition.
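The loop control and the conditions under which each adder step runs can be summarized in a short scheduling sketch. It models only the branching described for S500 to S509 and performs no arithmetic; the function name `addition_schedule` and the use of step labels as strings are assumptions made for illustration, and any special-casing that FIG. 1 may apply beyond the text is not modelled.

```python
def addition_schedule(p):
    """Yield (i, j, steps) in loop order, listing the adder steps taken.

    Hypothetical helper: it mirrors the branch conditions only.
    """
    for i in range(p):                      # loop over words of B (S200/S201)
        for j in range(p):                  # loop over words of A (S300/S309)
            steps = ["S500"]                # first addition: S + C + pre_c
            if j == p - 1:
                steps.append("S502")        # check_ovf0 <- pre_c
            elif j == 0:
                steps.append("S504")        # shift_data <- pre_c
            steps.append("S505")            # second addition
            if j == 0 and i != 0:
                steps.append("S507")        # check_ovf1 <- next_c
            if i == p - 1 and j != 0:
                steps.append("S509")        # third addition (subtract N word)
            yield i, j, steps


schedule = list(addition_schedule(2))
```

For p = 2 the last entry is `(1, 1, ["S500", "S502", "S505", "S509"])`, which shows that the third addition appears only in the final i iteration and only for j ≠ 0.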

After the final loop finishes, the final result has been obtained only up to T_{p-2}, so the last addition process computes the final words T_{p-1} and T'_{p-1}. It proceeds in the same way as the second and third additions, and it yields the final check_ovf1. Because a final result rather than an intermediate value is being computed here, pre_c is not substituted into shift_data as it is in the second addition. check_sign, the value used to determine whether the final result is greater than N, is 1 if any one of check_ovf0, check_ovf1, and c_sign is 1 (S510).

Once T and T' have been obtained through the last addition, it is checked whether check_sign is 1 (S600). If it is 1, T' is returned (S601); otherwise, T is returned (S602) as the final output, and the modular multiplication ends.
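The final selection of S510 and S600 to S602 reduces to an OR of three flags followed by a choice between two stored values. A minimal sketch, assuming the flags are 0/1 integers and using the hypothetical function name `select_result`:

```python
def select_result(t_words, t_prime_words, check_ovf0, check_ovf1, sign_c):
    """Return T' when any flag is set, otherwise T (S510, S600 to S602).

    Hypothetical helper; flags are assumed to be the integers 0 or 1.
    """
    check_sign = check_ovf0 | check_ovf1 | sign_c   # S510
    return t_prime_words if check_sign == 1 else t_words


# With sign_c = 1 the subtracted value T' is chosen.
chosen = select_result([0x05, 0x03], [0xFE, 0x00], 0, 0, 1)
```

Because both T and T' are already stored when the flags are known, no comparison against N is performed at this point; the choice depends only on the recorded carries.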

FIG. 2 is a block diagram showing the configuration of a modular multiplication apparatus according to an embodiment of the present invention. The apparatus includes a system bus (100), a register group (200), a control unit (300), an operation unit (400), and an adder (500).

When modular multiplication is required for an RSA operation, the system drives the modular multiplication apparatus, supplies the required input values (A, B, N, T), and receives the output values (T, T'). This exchange takes place over the system bus (100).

The register group (200) stores the required input and output values in word units and may include other registers needed for internal operations.

The operation unit (400) performs the core operation, word-unit modular multiplication, and the adder (500) performs the additions that produce the intermediate and final result values after the word-unit modular multiplication. The control unit (300) controls the operation of all these internal modules and the interface with the system. The operation of these components is described below with reference to FIG. 1.

First, the input words of A, B, N, T, and so on are stored in the register group (200). A, B, and N are loaded sequentially into the registers of the register group (200) as the j loop and the i loop progress.

The control unit (300) controls the register group (200) to initialize pre_c, next_c, and sign_c (S200) and to initialize R (S300 to S308).

The word-unit modular multiplication (S400) is performed in the operation unit (400), and the subsequent additions (S500 to S509) are performed in the adder (500) under the control of the control unit (300). The adder (500) handles S500, S505, and S509, each of which adds two words and one carry and outputs one word and a carry.

The adder (500) may comprise a separate adder for each of the three additions, or it may perform all three using a single adder that adds two words and one carry. In the latter case, the input values for each addition are stored, in order, in predetermined registers of the register group (200), and the stored values are added under the control of the control unit (300). The final addition process, S510, is performed in the same way.

The adder (500) performs the three additions while the operation unit (400) performs one word-unit modular multiplication, and it operates independently of the operation unit (400). Because the operation unit (400) and the adder (500) thus run concurrently, the operations of the adder (S500 to S509) introduce no additional time delay.

The embodiments described above are not implemented only through an apparatus and a method. They may also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, or through a recording medium on which the program is recorded. Such implementations can be readily realized by those skilled in the art from the description of the embodiments above.

Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited thereto. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention, as defined in the following claims, also fall within the scope of the present invention.

FIG. 1 is a flowchart sequentially illustrating a modular multiplication method according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of a modular multiplication apparatus according to an embodiment of the present invention.

Claims

A word-based Montgomery modular multiplication operation method for obtaining a modular multiplication result R for a multiplier A, a multiplicand B, and a modulus N, the method comprising:

a first step of initializing R, divided into a sum S and a carry C;

a second step of starting a loop over the word index i of B;

a third step of starting a loop over the word index j of A within the i loop and initializing R;

a fourth step of performing word-unit modular multiplication on A and B within the j loop;

a fifth step of performing word-unit addition up to three times within the j loop;

a sixth step of terminating the j loop and the i loop and performing word-unit addition twice; and

a seventh step of outputting the value obtained by subtracting N from the final result value T if T is greater than N, and outputting T if T is less than N.

The method of claim 1, wherein the third step comprises:

setting the initial value of R to (S, C) = (0, 0) if i indicates the operation on the first word of B (i = 0);

setting the initial value of R to (T_j, 0) if i is not 0 and j does not indicate the operation on the last word of A (j ≠ p-1);

setting the initial value of R to (T_j, 111…1) if j is p-1 and an overflow occurred in the operation on the last word of A;

checking, if j is p-1 and no overflow occurred in the operation on the last word of A, whether an overflow occurred in the shift operation of the operation on the last word of A;

setting the initial value of R to (1, 111…1) if an overflow occurred in the shift operation of the operation on the last word of A; and

setting the initial value of R to (T_j, 0) if no overflow occurred in the shift operation of the operation on the last word of A.

The method of claim 2, wherein the modular multiplication operation of the fourth step comprises:

performing two addition operations by a carry-save addition method; and

calculating R, the shift_data factor for the shift operation, and the Montgomery correction factor m.

The method of claim 3, wherein the fifth step comprises:

a first addition step of taking the output R of the fourth step as input, adding S, C, and the preceding carry factor (pre_c), and outputting (pre_c, T_j).

The method of claim 4, wherein the fifth step further comprises:

setting a check factor (check_ovf0), which indicates whether the operation on the last word of A overflowed, to pre_c if j is p-1;

setting shift_data to pre_c if j is 0; and

a second addition step of taking T_{j-1}, output by the first addition step, as input, adding T_{j-1}, shift_data, and the next carry factor (next_c), and outputting (next_c, T_{j-1}).

The method of claim 5, wherein the fifth step further comprises:

setting a check factor (check_ovf1), which checks whether the final intermediate-value addition for the last word of A overflowed, to next_c if the second addition step is the final intermediate-value addition for the last word of A (j = 0, i ≠ 0); and

a third addition step of, if the second addition step is a final-result addition (i = p-1, j ≠ 0), taking T_{j-1} output by the second addition step as input, adding T_{j-1}, the one's complement of N_{j-1} (~N_{j-1}), and sign_c, and outputting (sign_c, T'_{j-1}).

The method of claim 6, wherein the fifth step further comprises:

returning to the third step if the multiplication for A * B_i is not finished (j ≠ p-1);

returning to the second step if j = p-1 and i ≠ p-1; and

exiting the i loop and the j loop if j = p-1 and i = p-1.

The method of claim 7, further comprising:

a first addition step of adding T_{p-1} and next_c and outputting (next_c, T_{p-1}) to obtain the last word T_{p-1} of the final result value;

setting check_ovf1 to the output next_c;

a second addition step of taking the output T_{p-1} as input, adding T_{p-1}, ~N_{p-1}, and sign_c, and outputting (sign_c, T'_{p-1}); and

setting check_sign, which determines whether the final result is greater than N, to the result of the operation check_ovf0 | check_ovf1 | c_sign.

A word-based Montgomery modular multiplication operation device for obtaining a modular multiplication result R for a multiplier A, a multiplicand B, and a modulus N, the device comprising:

an operation unit that initializes R, divided into a sum (S) and a carry (C), starts a loop over the word index i of B and a loop over the word index j of A, performs word-unit modular multiplication on A and B within the j loop using the initialized R, outputs the value obtained by subtracting N from the final result value T if T is greater than N, and outputs T if T is less than N; and

an adder that performs word-unit addition up to three times within the j loop and, after the j loop and the i loop end, performs word-unit addition twice.