KR102633416B1

KR102633416B1 - Method for privacy preserving using homomorphic encryption with private variables and apparatus thereof

Info

Publication number: KR102633416B1
Application number: KR1020210057872A
Authority: KR
Inventors: 이재욱; 변준영
Original assignee: 서울대학교산학협력단
Priority date: 2021-05-04
Filing date: 2021-05-04
Publication date: 2024-02-05
Also published as: KR20220150641A

Abstract

A ciphertext processing method is disclosed. The method includes storing a data set that contains, as slots, homomorphically encrypted messages and messages that are not homomorphically encrypted; applying the homomorphically encrypted messages and the unencrypted messages in the data set to a pre-stored linear model to calculate an estimated value for the linear model; and transmitting the calculated estimated value to an external device.

Description

Method and apparatus for securing private variables using homomorphic encryption {METHOD FOR PRIVACY PRESERVING USING HOMOMORPHIC ENCRYPTION WITH PRIVATE VARIABLES AND APPARATUS THEREOF}

The present disclosure relates to a method and apparatus for securing private variables using homomorphic encryption and, more specifically, to a method and apparatus capable of performing regression analysis by encrypting only those private variables, among various variables, that require security.

Over the past few years, machine learning has shown remarkable results in almost every sector, including autonomous driving, marketing, and finance. This success is due not only to improvements in computation and storage performance, typified by GPUs, but also to an explosive increase in the amount of publicly available data.

For this reason, companies and institutions are doing their utmost to collect data from various sources, especially from individuals. However, individual data contains personal information, which raises concerns about privacy infringement. Individuals may not want others to use their personal information for any reason, and even if they consent to providing it, an unwanted leak of personal information can occur if the database is attacked.

Applying security technology to data disclosure is therefore no longer optional but essential. Various security technologies for personal information have been proposed, and homomorphic encryption has recently been studied extensively as a way to address this problem.

Homomorphic encryption allows addition and multiplication to be performed on encrypted data without decrypting it. By using homomorphic encryption, a client can therefore delegate computation to an untrusted cloud server: the client sends its input data to the server in encrypted form, and all of the computation can be performed without any further queries. In this way, homomorphic encryption offers a simple and secure structure for delegating computation on personal data.
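To make the idea of computing on encrypted data concrete, the following is a minimal sketch that uses a toy Paillier cryptosystem. This is an assumption chosen for brevity: Paillier is not the scheme used in this disclosure, it supports only ciphertext addition and multiplication by a plaintext integer (not ciphertext-by-ciphertext multiplication), and the tiny primes below provide no security. All function names are illustrative.

```python
import math
import random

# Toy Paillier parameters: tiny primes, for illustration only (NOT secure).
P, Q = 293, 433
N = P * Q
N2 = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # modular inverse; this shortcut is valid because G = N + 1


def encrypt(m, rng=random):
    """Encrypt an integer 0 <= m < N under the toy public key (N, G)."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Decrypt with the toy secret key (LAM, MU)."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def add(c1, c2):
    """Homomorphic addition: decrypts to m1 + m2 (mod N)."""
    return (c1 * c2) % N2


def mul_plain(c, k):
    """Multiply an encrypted value by a plaintext integer k."""
    return pow(c, k, N2)


c_a, c_b = encrypt(17), encrypt(25)
total = decrypt(add(c_a, c_b))        # sum computed without decrypting the inputs
scaled = decrypt(mul_plain(c_a, 3))   # plaintext-by-ciphertext product
```

The server-side functions `add` and `mul_plain` never touch the secret key, which is the property that lets a client hand the computation to an untrusted party.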

However, applying homomorphic encryption to machine learning has been problematic in terms of time and storage efficiency. For example, a training process that can be completed within minutes on plaintext has taken days on homomorphic ciphertexts.

Accordingly, the present disclosure has been devised to solve the problems described above, and its purpose is to provide a method and apparatus capable of performing regression analysis by encrypting only those private variables, among various variables, that require security.

To achieve the above purpose, a ciphertext processing method in a computing device according to the present disclosure includes: storing a data set that contains, as slots, homomorphically encrypted messages and messages that are not homomorphically encrypted; applying the homomorphically encrypted messages and the unencrypted messages in the data set to a pre-stored linear model to calculate an estimated value for the linear model; and transmitting the calculated estimated value to an external device.

In this case, the calculating of the estimated value may include extracting, from the polynomial required to calculate the estimated value of the linear model, the terms that use the homomorphically encrypted messages, and performing homomorphic operations on the extracted terms to calculate the estimated value.
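The term-extraction step can be illustrated with a small sketch. It assumes a linear predictor of the form sum(w_i * x_i) and uses a hypothetical `MockCipher` class that only imitates the interface of a ciphertext (it performs no encryption). The point is simply that ordinary arithmetic handles the unencrypted slots, and homomorphic operations are spent only on the encrypted ones.

```python
from dataclasses import dataclass


@dataclass
class MockCipher:
    """Stand-in for a homomorphic ciphertext (hypothetical; no real encryption)."""
    hidden: float

    def __add__(self, other):
        value = other.hidden if isinstance(other, MockCipher) else other
        return MockCipher(self.hidden + value)

    def __mul__(self, plain):
        # Plaintext-by-ciphertext multiplication.
        return MockCipher(self.hidden * plain)


def estimate(weights, slots):
    """Evaluate sum(w_i * x_i); returns (result, number of homomorphic ops)."""
    plain_sum = 0.0
    encrypted_terms = []
    for w, x in zip(weights, slots):
        if isinstance(x, MockCipher):
            encrypted_terms.append(x * w)   # term that needs a homomorphic op
        else:
            plain_sum += w * x              # ordinary arithmetic
    if not encrypted_terms:
        return plain_sum, 0
    acc = encrypted_terms[0]
    for term in encrypted_terms[1:]:
        acc = acc + term
    acc = acc + plain_sum                   # fold in the plaintext part once
    # k multiplications + (k - 1) ciphertext additions + 1 plaintext addition
    return acc, 2 * len(encrypted_terms)


weights = [0.5, 2.0, 1.0, 3.0]
slots = [MockCipher(4.0), 1.5, 2.0, MockCipher(1.0)]  # slots 0 and 3 are private
result, ops = estimate(weights, slots)
```

With two private slots out of four, only four homomorphic operations are needed; the count grows with the number of encrypted slots rather than with the total number of variables.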

Meanwhile, the calculating of the estimated value may include generating a matrix corresponding to the data set, decomposing the matrix into a first matrix that includes the homomorphically encrypted messages and a second matrix that does not, making the second matrix orthogonal to the first matrix, and calculating the estimated value for the linear model using the first matrix and the second matrix.
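One reason orthogonality between the two blocks is convenient can be shown on plaintext data. The sketch below is not the algorithm of this disclosure; it only checks a standard linear-algebra fact under assumed toy data: if X = [X1 X2] with X1ᵀX2 = 0, then XᵀX + λI is block diagonal, so the ridge coefficients for each block can be solved separately. The helper names and the matrices are illustrative.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b for a small square system by Gauss-Jordan elimination."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ridge(x, y, lam):
    """Closed-form ridge solution of (X^T X + lam * I) beta = X^T y."""
    xt = transpose(x)
    gram = matmul(xt, x)
    for i in range(len(gram)):
        gram[i][i] += lam
    rhs = [sum(a * b for a, b in zip(col, y)) for col in xt]
    return solve(gram, rhs)


# Two column blocks chosen so that every column of x1 is orthogonal to every column of x2.
x1 = [[2.0, 1.0], [0.0, -1.0], [2.0, 1.0], [0.0, -1.0]]    # plays the "first matrix"
x2 = [[1.0, 2.0], [1.0, 0.0], [-1.0, -2.0], [-1.0, 0.0]]   # plays the "second matrix"
y = [1.0, 2.0, 3.0, 4.0]
lam = 0.5

x = [r1 + r2 for r1, r2 in zip(x1, x2)]             # full design matrix [x1 | x2]
joint = ridge(x, y, lam)                            # solve all four coefficients together
separate = ridge(x1, y, lam) + ridge(x2, y, lam)    # solve each block on its own
```

Here `joint` and `separate` agree up to rounding, which suggests why separating the encrypted block from the unencrypted block can reduce the amount of work that must be done under encryption.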

Meanwhile, the data set may include a plurality of different variables, each of which is homomorphically encrypted.

Meanwhile, the linear model may be a ridge regression linear model.

Meanwhile, a computing device according to an embodiment of the present disclosure includes a memory that stores at least one instruction and a processor that executes the at least one instruction. By executing the at least one instruction, the processor stores a data set that contains, as slots, homomorphically encrypted messages and messages that are not homomorphically encrypted, and applies the homomorphically encrypted messages and the unencrypted messages in the data set to a pre-stored linear model to calculate an estimated value for the linear model.

In this case, the processor may extract, from the polynomial required to calculate the estimated value of the linear model, the terms that use the homomorphically encrypted messages, and perform homomorphic operations on the extracted terms to calculate the estimated value.

Meanwhile, the processor may generate a matrix corresponding to the data set, decompose the matrix into a first matrix that includes the homomorphically encrypted messages and a second matrix that does not, make the second matrix orthogonal to the first matrix, and calculate the estimated value for the linear model using the first matrix and the second matrix.

Meanwhile, the linear model may be a ridge regression linear model.

According to the various embodiments of the present disclosure described above, private variables that require privacy protection are distinguished from variables that do not, and only the former are homomorphically encrypted before the computation is performed, so the computation can be carried out faster. Specifically, the complexity when all of the data is encrypted is O(p² log n), whereas the complexity of the method according to the present disclosure is O(p log n).

In addition, because gradient descent is not used in the training process, a more accurate solution can be obtained efficiently, and the learning-rate search needed to tune gradient descent can be omitted, which enables faster machine learning training.
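The contrast between a closed-form solution and gradient descent can be seen in the simplest possible case. The sketch below assumes a single-feature ridge regression on plaintext toy data (an illustration only, not the procedure of this disclosure): the closed form needs no learning rate, whereas gradient descent converges with one step size and diverges with another.

```python
def ridge_closed_form(xs, ys, lam):
    """Scalar ridge solution: beta = sum(x*y) / (sum(x*x) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def ridge_gradient_descent(xs, ys, lam, learning_rate, steps):
    """Minimise 0.5*sum((y - x*beta)^2) + 0.5*lam*beta^2 by gradient descent."""
    beta = 0.0
    for _ in range(steps):
        grad = -sum(x * (y - x * beta) for x, y in zip(xs, ys)) + lam * beta
        beta -= learning_rate * grad
    return beta


xs, ys, lam = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0], 1.0
exact = ridge_closed_form(xs, ys, lam)                     # 28 / 15
tuned = ridge_gradient_descent(xs, ys, lam, 0.05, 100)     # step size small enough: converges
mistuned = ridge_gradient_descent(xs, ys, lam, 0.2, 100)   # step size too large: diverges
```

Under homomorphic encryption every extra iteration and every retry with a new learning rate is expensive, so avoiding the search altogether is where the saving would come from.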

FIG. 1 is a diagram illustrating the structure of a network system according to an embodiment of the present disclosure;
FIG. 2 is a block diagram showing the configuration of a computing device according to an embodiment of the present disclosure;
FIG. 3 is a diagram illustrating basic operations on homomorphic ciphertexts according to an embodiment of the present disclosure;
FIG. 4 is a diagram illustrating examples of processing functions for homomorphic ciphertexts according to an embodiment of the present disclosure;
FIG. 5 is a diagram illustrating an algorithm for calculating intermediate values for ridge regression according to an embodiment of the present disclosure;
FIG. 6 is a diagram illustrating an algorithm for calculating the result of ridge regression using the calculated intermediate values according to an embodiment of the present disclosure;
FIG. 7 is a diagram illustrating a computation that uses a packing operation on homomorphic ciphertexts according to an embodiment of the present disclosure;
FIG. 8 is a diagram illustrating the performance of the ridge regression operation according to the present disclosure; and
FIG. 9 is a flowchart illustrating a ciphertext processing method according to an embodiment of the present disclosure.

Hereinafter, the present disclosure is described in detail with reference to the accompanying drawings. Encryption and decryption may be applied as needed to the information (data) transmission processes performed in the present disclosure, and every expression describing an information (data) transmission process in the present disclosure and the claims should be interpreted as covering the case where encryption and decryption are applied, even if this is not stated separately. In the present disclosure, expressions such as "transmitted (delivered) from A to B" or "A receives from B" include transmission (delivery) or reception through an intermediary, and do not refer only to direct transmission (delivery) or reception from A to B.

In the description of the present disclosure, the order of the steps should be understood as non-limiting unless a preceding step must logically and temporally be performed before a subsequent step. That is, except in such cases, the essence of the disclosure is not affected even if a process described as a subsequent step is performed before a process described as a preceding step, and the scope of rights should likewise be defined regardless of the order of the steps. In this specification, "A or B" is defined to mean not only either one of A and B selectively but also both A and B. In addition, the term "include" in the present disclosure covers the case where further components are included in addition to the listed elements.

The present disclosure describes only the components essential to its explanation and does not mention components unrelated to its essence. The description should not be interpreted in an exclusive sense as including only the components mentioned, but in a non-exclusive sense in which other components may also be included.

In the present disclosure, a "value" is defined as a concept that includes vectors as well as scalar values.

The mathematical operations and calculations of each step of the present disclosure described below may be implemented as computer operations by coding methods known for those operations or calculations and/or by coding devised to suit the present disclosure.

The specific equations described below are given as examples among several possible alternatives, and the scope of rights of the present disclosure should not be construed as limited to the equations mentioned herein.

Hereinafter, various embodiments of the present disclosure are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating the structure of a network system according to an embodiment of the present disclosure.

Referring to FIG. 1, the network system 1000 may include a user terminal device 100, a first server 200, and a second server 300, and these components may be connected to one another through a network.

Here, the network may be implemented as any of various wired or wireless communication networks, a broadcast communication network, an optical communication network, a cloud network, or the like, and the devices may also be connected directly, without a separate intermediary, by methods such as Wi-Fi, Bluetooth, or NFC (Near Field Communication).

Although FIG. 1 shows a single user terminal device 100, a plurality of user terminal devices 100 may be used in the network system 1000 in an actual implementation. For example, the user terminal device 100 may be implemented as any of various devices such as a smartphone, tablet, game player, PC, laptop PC, home server, or kiosk, and may also be implemented as a home appliance with IoT functionality.

The user terminal device 100 shown in FIG. 1 may be referred to as a data owner, the first server 200 as a crypto-service provider (CSP), and the second server 300 as a machine learning service provider (MLSP).

The first server 200 may generate the secret key and public key required for homomorphic encryption and provide the generated public key to the second server 300 and the user terminal device 100. Specifically, the first server 200 may generate the secret key and the public key based on the various parameters required for homomorphic encryption. The specific key generation operations are described later with reference to FIG. 3.

A user may enter various information through the user terminal device 100 that he or she uses. The entered information may be stored in the user terminal device 100 itself, or it may be transmitted to and stored in an external device for reasons such as storage capacity and security. In FIG. 1, the second server 300 may serve to store such information.

The user terminal device 100 may encrypt personal information using the provided public key and provide the encrypted personal information to the second server 300. In doing so, the user terminal device 100 may homomorphically encrypt only the personal information that requires protection, among the various pieces of information to be provided to the second server 300, and provide the remaining information to the second server 300 without homomorphic encryption.

That is, the user terminal device 100 may encrypt personal information (or private variables) using the public key and provide both the encrypted personal information and the unencrypted information (or variables) to the second server 300. Here, information that requires privacy protection may be referred to as a private variable, and it may be a single piece of information (or item) or multiple pieces of information (or items). For example, such private variables may be age, gender, race, and the like. The items that require privacy protection may be selected automatically, or the items designated by the user may become the private variables.

In the course of performing homomorphic encryption, the user terminal device 100 may include encryption noise, that is, an error, in the ciphertext. Specifically, the homomorphic ciphertext generated by each user terminal device 100 may be generated in a form such that, when it is later decrypted using the secret key, a result value containing the message and an error value is recovered. For example, such homomorphic encryption may be performed using the HEAAN scheme, but is not limited thereto. The detailed operation of the HEAAN scheme is described later.
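The idea that decryption returns the message plus a small error can be illustrated with a toy, symmetric, scalar, LWE-style construction. This is an assumption made for illustration: it is neither the HEAAN scheme nor secure, and the modulus `Q`, scaling factor `DELTA`, and noise range are arbitrary choices.

```python
import random

Q = 2**40       # ciphertext modulus (toy value)
DELTA = 2**20   # scaling factor used to encode a real number as an integer


def keygen(rng):
    """Toy secret key: a random integer modulo Q."""
    return rng.randrange(Q)


def encrypt(m, secret, rng):
    """Encrypt a real number m; a small noise term e is deliberately added."""
    a = rng.randrange(Q)
    e = rng.randint(-8, 8)
    b = (a * secret + round(m * DELTA) + e) % Q
    return a, b


def decrypt(ct, secret):
    """Recover roughly m + e / DELTA rather than m exactly."""
    a, b = ct
    v = (b - a * secret) % Q
    if v >= Q // 2:          # map to the centered range (-Q/2, Q/2)
        v -= Q
    return v / DELTA


rng = random.Random(0)
sk = keygen(rng)
ct = encrypt(3.14159, sk, rng)
recovered = decrypt(ct, sk)  # close to 3.14159, but generally not equal to it
```

The recovered value differs from the message by at most about 8.5 / DELTA here, which mirrors the approximate nature of the decrypted result described above.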

Having received the public key, the user terminal device 100 may use it to homomorphically encrypt those variables, among a plurality of variables (or pieces of personal information, or pieces of information), that require security, and may transmit data containing both the homomorphically encrypted variables and the unencrypted variables to the second server 300.
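A record that mixes encrypted and unencrypted variables might be assembled as in the following sketch. The field names, the `build_record` helper, and the `fake_encrypt` placeholder are all hypothetical; in practice the `encrypt` argument would be the encryption function of an actual homomorphic encryption library.

```python
def build_record(values, private_fields, encrypt):
    """Encrypt only the fields named in private_fields; pass the rest through."""
    record = {}
    for name, value in values.items():
        if name in private_fields:
            record[name] = {"encrypted": True, "payload": encrypt(value)}
        else:
            record[name] = {"encrypted": False, "payload": value}
    return record


def fake_encrypt(value):
    """Placeholder only: tags the value instead of encrypting it."""
    return ("ciphertext-of", value)


record = build_record(
    {"age": 41, "gender": 1, "income": 5200, "region": 3},
    private_fields={"age", "gender"},   # the private variables
    encrypt=fake_encrypt,
)
```

Keeping an explicit `encrypted` flag per slot lets the receiving server decide, slot by slot, whether ordinary arithmetic or a homomorphic operation is required.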

When the second server 300 receives data from the user terminal device 100, it may store the received data.

The second server 300 may then perform computation using the stored data in response to an external request or a request from the user terminal device 100. Here, the computation may be a machine learning operation such as regression analysis. This machine learning operation is described later with reference to FIG. 2.

The second server 300 may provide the computation result to the first server 200. Having received the computation result, the first server 200 may decrypt it using the secret key and transmit the decrypted result to the user terminal device 100.

Hereinafter, the first server 200 and the second server 300 are assumed to be honest-but-curious devices that do not collude with each other. Because a machine learning service provider and a crypto-service provider are generally different companies, each earning revenue by providing services to consumers, this assumption is reasonable.

Meanwhile, for the security of personal information, neither the first server 200 nor the second server 300 should be able to learn the personal information (that is, the private variables described above). The first server 200 should not be able to learn the final result while decrypting the computation result, and the second server 300 should not be able to infer the private variables from the unencrypted variables and the information available during training.

Specifically, when appropriate security parameters are used, the encrypted variables cannot be decrypted directly. However, there remains room to infer the private variables from the unencrypted variables with the help of external knowledge, for example through a linkage attack or an attribute inference attack. To guard against this, the user terminal device 100 may delete some of the data or add randomness when providing the data, thereby lowering the predictability of the private variables. Hereinafter, it is assumed that randomness has been added to the unencrypted variables in advance so that the second server 300 cannot effectively perform an attribute inference attack on the private variables.
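As a rough illustration of deleting some data or adding randomness, the sketch below suppresses entries at random and perturbs the rest with Gaussian noise. The function name, the use of Gaussian noise, and the parameters `noise_scale` and `drop_prob` are assumptions; the source does not specify how the randomness is generated or how much is needed.

```python
import random


def randomize(values, noise_scale, drop_prob, rng):
    """Perturb unencrypted values: drop some entries, add Gaussian noise to the rest."""
    out = []
    for v in values:
        if rng.random() < drop_prob:
            out.append(None)                             # suppressed entry
        else:
            out.append(v + rng.gauss(0.0, noise_scale))  # noisy entry
    return out


rng = random.Random(42)
incomes = [5200.0, 4100.0, 6100.0, 3900.0]
released = randomize(incomes, noise_scale=50.0, drop_prob=0.25, rng=rng)
```

Larger `noise_scale` or `drop_prob` values make inference harder but also degrade the regression, so the setting is a trade-off that this sketch does not attempt to resolve.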

Meanwhile, because the first server 200 holds the secret key, it could decrypt the final result directly and read it. To prevent this, the user terminal device 100 may determine a random vector, called a mask, whose length equals that of the final result, encrypt the mask, and transmit the data to the second server 300. Because the user terminal device 100 masks the data it provides to the second server 300 in this way, it is difficult for the first server 200 to obtain the private variables even if it decrypts the computation result using the secret key.
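The masking idea can be sketched as follows. The source defers the details of masking, so the additive form used here (the second server adds the encrypted mask to the encrypted result, and the user later subtracts the mask) is an assumption, and `MockCipher` is a hypothetical stand-in that provides no security.

```python
import random


class MockCipher:
    """Stand-in for a homomorphic ciphertext (hypothetical; provides no security)."""

    def __init__(self, value):
        self._value = value

    def __add__(self, other):
        return MockCipher(self._value + other._value)


def mock_decrypt(ciphertext):
    """What the key holder (the first server) would obtain on decryption."""
    return ciphertext._value


rng = random.Random(7)

# User terminal: choose a nonzero random mask as long as the result and encrypt it.
mask = [rng.randrange(1, 10**6) for _ in range(3)]
enc_mask = [MockCipher(m) for m in mask]

# Second server: holds the encrypted result and adds the encrypted mask to it.
true_result = [3, -7, 12]
enc_result = [MockCipher(v) for v in true_result]
enc_masked = [r + m for r, m in zip(enc_result, enc_mask)]

# First server: decrypts, but sees only result + mask.
seen_by_first_server = [mock_decrypt(c) for c in enc_masked]

# User terminal: removes the mask it chose to obtain the actual result.
recovered = [v - m for v, m in zip(seen_by_first_server, mask)]
```

Only the party that chose the mask can remove it, so the key holder learns a shifted vector rather than the result itself.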

In this case, the user terminal device 100 may perform masking on the data to be transmitted before transmitting it, and provide the masked data to the second server 300. The details of the masking process are described later.

As described above, the network system 1000 according to this embodiment encrypts only the items that require privacy protection, so the data can be processed securely. In addition, because not all of the user's information is encrypted and only the items that require privacy protection are homomorphically encrypted, the analysis can be computed faster.

Meanwhile, although FIG. 1 has been illustrated and described with a separate first server 200 generating the public key and secret key required for the ciphertexts and the user terminal device 100 receiving the public key and performing homomorphic encryption, the user terminal device 100 may perform the functions of the first server 200 in an actual implementation. That is, the system may also be implemented such that the user terminal device 100 generates the secret key and public key itself, homomorphically encrypts information using the generated public key and provides it to the second server 300, and then receives the computation result from the second server 300 and decrypts it directly with the secret key.

FIG. 2 is a block diagram showing the configuration of a computing device according to an embodiment of the present disclosure.

Specifically, in the system of FIG. 1, a device that performs homomorphic encryption, such as the user terminal device; a device that calculates the key values required for homomorphic ciphertexts, such as the first server; and a device that performs operations on homomorphic ciphertexts, such as the second server, may each be referred to as a computing device. Such a computing device may be any of various devices such as a personal computer (PC), laptop, smartphone, tablet, or server.

Referring to FIG. 2, the computing device 400 may include a communication device 410, a memory 420, a display 430, a manipulation input device 440, and a processor 450.

The communication device 410 is provided to connect the computing device 400 to an external device (not shown). It may connect to the external device through a local area network (LAN) or the Internet, or through a USB (Universal Serial Bus) port or a wireless communication port (for example, WiFi 802.11a/b/g/n, NFC, or Bluetooth). The communication device 410 may also be referred to as a transceiver.

The communication device 410 may receive a public key from an external device and may transmit a public key generated by the computing device 400 itself to an external device.

The communication device 410 may also receive a message from an external device and transmit a generated homomorphic ciphertext to an external device.

In addition, the communication device 410 may receive the various parameters required for ciphertext generation from an external device. In an actual implementation, these parameters may instead be entered directly by the user through the manipulation input device 440 described later.

In addition, the communication device 410 may receive a request from an external device for an operation on homomorphic ciphertexts and may transmit the result calculated in response to the external device. The requested operation may be an operation such as addition, subtraction, or multiplication, or it may be a regression analysis over multiple pieces of data.

At least one instruction for the computing device 400 may be stored in the memory 420. Specifically, the memory 420 may store various programs (or software) for operating the computing device 400 according to various embodiments of the present disclosure.

The memory 420 may be implemented in various forms such as RAM, ROM, flash memory, an HDD, external memory, or a memory card, and is not limited to any one of these.

The memory 420 may store a message to be encrypted. Here, the message may be credit information, personal information, or the like used by the user, or may be information related to usage history, such as location information or Internet usage time information, used in the computing device 400.

The memory 420 may store a public key. If the computing device 400 is the device that generated the public key, the memory 420 may store the secret key as well as the various parameters required to generate the public key and the secret key.

The memory 420 may store a homomorphic ciphertext generated in the process described later, a homomorphic ciphertext transmitted from an external device, and an operation result ciphertext produced by the operation process described later.

The memory 420 may also store a learning model required for machine learning, together with the operation functions used in that learning model and their approximating polynomials. For example, the memory 420 may store functions for a linear model such as Equations 10 and 15 described later.

The display 430 displays a user interface window for selecting among the functions supported by the computing device 400. The display 430 may be a monitor such as an LCD (liquid crystal display) or OLED (organic light emitting diode) display, and may also be implemented as a touch screen that simultaneously performs the function of the manipulation input device 440 described later.

The display 430 may display a message requesting input of the parameters required to generate the secret key and the public key. The display 430 may also display a message for selecting the message to be encrypted, or a message for selecting a private variable. In an implementation, the encryption target (that is, the private variable) may be selected directly by the user or selected automatically. In other words, personal information that requires encryption may be set as an encryption target automatically even if the user does not select it directly.

The manipulation input device 440 may receive, from the user, a selection of a function of the computing device 400 and a control command for that function. Specifically, the manipulation input device 440 may receive from the user the parameters required to generate the secret key and the public key. The manipulation input device 440 may also receive from the user a setting of the message to be encrypted.

The processor 450 controls the overall operation of the computing device 400. Specifically, the processor 450 may control the overall operation of the computing device 400 by executing at least one instruction stored in the memory 420. The processor 450 may be configured as a single device such as a CPU (central processing unit) or an ASIC (application-specific integrated circuit), or as a plurality of devices such as a CPU and a GPU (graphics processing unit).

When a message to be transmitted is input, the processor 450 may store it in the memory 420. The processor 450 may then homomorphically encrypt the message using the various setting values and programs stored in the memory 420. In this case, a public key may be used. Among a plurality of messages, homomorphic encryption may be performed on the items that require privacy protection (private variables), and no encryption may be performed on the items that do not require privacy protection.
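The selective treatment of private and non-private items can be illustrated with a minimal Python sketch. The field names, the `protect_record` helper, and the `fake_encrypt` placeholder are all hypothetical and are not taken from the disclosure; `fake_encrypt` only tags a value so that the split is visible and performs no encryption.

```python
from typing import Any, Callable, Dict, Set


def protect_record(
    record: Dict[str, Any],
    private_fields: Set[str],
    encrypt: Callable[[Any], Any],
) -> Dict[str, Any]:
    """Apply `encrypt` only to the private fields; pass other fields through."""
    return {
        name: encrypt(value) if name in private_fields else value
        for name, value in record.items()
    }


def fake_encrypt(value: Any) -> Any:
    # Placeholder only: marks the value as "encrypted". This is NOT encryption;
    # a real system would call a homomorphic encryption routine here.
    return ("ciphertext", value)


# Hypothetical record: only "income" is treated as a private variable.
record = {"age": 41, "income": 52000, "region": 3}
protected = protect_record(record, {"income"}, fake_encrypt)
```

In this sketch the non-private fields remain ordinary values, so any later computation that touches only those fields needs no homomorphic operations.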

In addition, before transmitting the plurality of messages to an external device, the processor 450 may perform mask processing on them and transmit the masked data to the external device.

The processor 450 may generate the public key required for encryption by itself, or may receive it from an external device. For example, when the computing device is the first server device 200 of FIG. 1, it may generate a secret key and a public key and transmit the generated public key to the user terminal device 100 of FIG. 1.

The processor 450 may generate a homomorphic ciphertext for a message. Specifically, the processor 450 may generate a homomorphic ciphertext by applying the public key to the items of the message that require privacy protection. When ciphertexts are processed using the HEAAN scheme, the processor 450 may convert the message into a polynomial (or vector) belonging to a ring and then generate the homomorphic ciphertext using the public key.

Once the homomorphic ciphertext is generated, the processor 450 may store it in the memory 420, or may control the communication device 410 to transmit it to another device according to a user request or a preset default command.

Meanwhile, according to an embodiment of the present disclosure, packing may be performed. When packing is used in homomorphic encryption, a plurality of messages can be encrypted into a single ciphertext. In this case, when the computing device 400 performs an operation between ciphertexts, the operations on the plurality of messages are processed in parallel, which greatly reduces the computational burden.

Specifically, when a message consists of a plurality of message vectors, the processor 450 may convert the message vectors into a polynomial in a form that allows them to be encrypted in parallel, multiply that polynomial by a scaling factor, and homomorphically encrypt the result using the public key. A ciphertext that packs the plurality of message vectors can thereby be generated.

When a homomorphic ciphertext needs to be decrypted, the processor 450 may apply the secret key to the homomorphic ciphertext to generate a decrypted text in polynomial form, and decode that decrypted text to recover the message.

The processor 450 may also perform operations on ciphertexts. Specifically, the processor 450 may perform operations such as addition, subtraction, or multiplication on homomorphic ciphertexts while they remain encrypted.

In addition, the processor 450 may perform ridge regression on ciphertexts. Ridge regression is a linear regression model with an added regularization term. A linear regression model makes predictions by fitting a linear function of the input features; because ordinary linear regression can overfit, ridge regression adds a regularization term. Ridge regression is described in more detail later.

When an operation is completed, the computing device 400 may detect the data of the valid region from the operation result data. Specifically, the computing device 400 may detect the data of the valid region by performing rounding processing on the operation result data. Rounding processing means rounding off the message while it remains encrypted, and may also be referred to as rescaling.
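The effect of rescaling can be sketched on plain integers. The sketch below assumes a fixed-point encoding in which a real value is multiplied by a scaling factor before encryption; `SCALE`, `encode`, `decode`, `mult`, and `rescale` are hypothetical names, and ordinary Python integers stand in for encrypted values. In an actual scheme the division would be carried out on the ciphertext itself.

```python
SCALE = 2 ** 20  # hypothetical scaling factor


def encode(x: float) -> int:
    """Fixed-point encoding: keep x * SCALE as an integer."""
    return round(x * SCALE)


def decode(m: int) -> float:
    """Inverse of encode for a value carrying a single factor of SCALE."""
    return m / SCALE


def mult(a: int, b: int) -> int:
    """Product of two encoded values; the result carries SCALE squared."""
    return a * b


def rescale(m: int) -> int:
    """Divide by SCALE with rounding, returning to a single factor of SCALE."""
    return (m + SCALE // 2) // SCALE


a = encode(1.5)
b = encode(2.25)
product = rescale(mult(a, b))
result = decode(product)  # approximately 1.5 * 2.25
```

Without the `rescale` step the scale would square with every multiplication, so the rounding step is what keeps the encoded magnitude bounded across successive products.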

In addition, when the proportion of the approximate message in the operation result ciphertext exceeds a threshold, the computing device 400 may perform a rebooting (bootstrapping) operation on the ciphertext.

As described above, the computing device 400 according to the present disclosure distinguishes private variables that require privacy protection from variables that do not, and applies homomorphic encryption only to the former, so operations can be performed faster. In addition, because gradient descent is not used in the training process, a more accurate solution can be obtained efficiently, and the learning rate search needed to optimize the performance of gradient descent can be omitted, which enables faster machine learning training.

Ridge regression as used in the present disclosure is first briefly described below.

Ridge regression is an essential component of various machine learning techniques that model a linear relationship between input variables and a dependent variable. The most resource-intensive step in training a ridge regression model is the matrix inversion operation.

The present disclosure describes a ridge regression analysis method that uses homomorphic encryption to encrypt only the private variables. The ridge estimation formula is split into a part that involves the private attributes and a part that does not, which minimizes the work involving ciphertexts. The computation is then optimized by forcing the redundant columns of the decomposed matrix to be orthogonal to the private attributes.

Because the method according to the present disclosure does not use gradient descent, an exact solution can be obtained more efficiently. The learning rate search needed to optimize the performance of gradient descent is also omitted, which enables fast processing.

Meanwhile, because parameter search in the encrypted domain requires a large amount of computation, the relevant parameters are currently searched in advance using plaintext. However, this is not a practical approach.

The operation of the homomorphic encryption used in the present disclosure is described below.

A homomorphic encryption scheme generally has the following structure.

KeyGen(1^λ, 1^τ) → (sKey, pKey): takes a security parameter (λ) and a functional parameter (τ) as input, and outputs a secret key (sKey) and a public key (pKey).

Encrypt(pKey, m) → c: outputs a ciphertext (c) using the public key and an input plaintext (m).

Decrypt(sKey, c) → m: takes the secret key and a ciphertext as input, and outputs the decrypted plaintext (m).

Evaluate(pKey, f, c) → c_f: takes a vector of ciphertexts as the input of a circuit (f), and outputs a vector of ciphertexts for the output of f.

Homomorphic encryption has a wide range of applications, from simple queries such as information retrieval to complex machine learning algorithms. The functional parameter (τ) determines the bound on the circuit depth that can be evaluated using the scheme.

The cryptosystems used to realize homomorphic encryption have been developed over many years. Early schemes supported only addition or only multiplication, and such schemes are referred to as partially homomorphic encryption.
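As an illustration of a partially homomorphic scheme and of the KeyGen / Encrypt / Decrypt pattern, the following is a toy sketch of the Paillier cryptosystem, which supports addition only. Paillier is used here purely as a familiar example and is not the scheme of the present disclosure; the function names and the tiny primes are assumptions chosen for readability and offer no security.

```python
import math
import random


def keygen(p: int = 1789, q: int = 1861):
    """Toy key generation with tiny fixed primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)   # public key n, secret key (lam, mu)


def encrypt(n: int, m: int) -> int:
    """c = (n + 1)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, secret, c: int) -> int:
    """m = L(c^lam mod n^2) * mu mod n, with L(u) = (u - 1) // n."""
    lam, mu = secret
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add(n: int, c1: int, c2: int) -> int:
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return (c1 * c2) % (n * n)


n, secret = keygen()
c_sum = add(n, encrypt(n, 12), encrypt(n, 30))
total = decrypt(n, secret, c_sum)
```

The sum is computed without ever decrypting the two inputs, which is the property the text calls operating in the encrypted state. There is no corresponding way to multiply two Paillier ciphertexts, which is why such a scheme is only partially homomorphic.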

More recent schemes support both addition and multiplication. Noise is added to a ciphertext during the encryption process, and multiplication in particular increases the noise more than addition. Once the noise exceeds a threshold, the ciphertext can no longer be decrypted. A scheme that, because of this noise growth, permits only a limited number of operations is called SHE (Somewhat Homomorphic Encryption). Among SHE schemes, one that can evaluate operations correctly up to a predetermined depth is called LHE (Leveled Homomorphic Encryption). In contrast, a fully homomorphic encryption (FHE) scheme can evaluate any function of arbitrary multiplicative depth.

The technique that makes an unlimited number of operations possible is bootstrapping. Bootstrapping homomorphically evaluates the decryption of a ciphertext and produces a new homomorphic ciphertext with reduced noise. However, because decryption is complex and nonlinear, bootstrapping requires a very large amount of resources.

Meanwhile, one matter to consider when using homomorphic encryption is the use of non-polynomial functions. A non-polynomial function such as the exponential function must be approximated by a polynomial function, which consists only of additions and multiplications, and a high-degree polynomial is required to guarantee high precision.

Recently, iterative methods have been used for approximation by high-degree polynomials, but these iterative methods increase the multiplicative depth. The matrix inversion operation likewise requires several matrix multiplications; it can be approximated using known algorithms such as the Schulz algorithm for numerical stability, but the approximation requires many resources.
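To make the cost of iterative inversion concrete, the following is a minimal sketch of the Schulz (Newton-Schulz) iteration on plain floating-point matrices. The helper names and the choice of starting point (the transpose scaled by the product of the 1-norm and the infinity-norm, a classical choice that ensures convergence for a nonsingular matrix) are assumptions for illustration and are not taken from the disclosure.

```python
def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def schulz_inverse(A, iterations=20):
    """Approximate the inverse of A with X <- X (2I - A X)."""
    n = len(A)
    norm_1 = max(sum(abs(A[i][j]) for i in range(n)) for j in range(n))
    norm_inf = max(sum(abs(v) for v in row) for row in A)
    # Starting point: transpose of A divided by (1-norm * infinity-norm).
    X = [[A[j][i] / (norm_1 * norm_inf) for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        AX = matmul(A, X)
        R = [[(2.0 if i == j else 0.0) - AX[i][j] for j in range(n)] for i in range(n)]
        X = matmul(X, R)  # second matrix product of this iteration
    return X


A = [[4.0, 1.0], [2.0, 3.0]]
A_inv = schulz_inverse(A)
```

Each iteration chains two matrix products, so under homomorphic encryption the multiplicative depth would grow by two levels per iteration, which is the resource cost the text refers to.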

As described above, a homomorphic encryption scheme must be used to perform machine learning while personal information remains protected. However, conventional machine learning requires many matrix multiplications, which increase the multiplicative depth under homomorphic encryption, so processing conventional machine learning algorithms as they are under homomorphic encryption requires many resources.

A regression analysis method using homomorphic encryption according to the present disclosure, which addresses the problems described above, is described below.

Meanwhile, for operations between ciphertexts, the number of plaintext slots in each ciphertext must be the same. Because the number of data points may differ from one data owner to another, the user terminal device 100 needs to determine the plaintext slot size in advance. That is, if the user terminal device 100 holds r data points and the plaintext slot size is N, the data owner needs to split each column into $\lceil r/N \rceil$ pieces of length N and pad the last piece with zeros.
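The splitting and zero-padding step can be sketched as follows. The function name `split_into_slots` is hypothetical, and plain Python lists stand in for the plaintext vectors that would subsequently be encoded and encrypted.

```python
import math


def split_into_slots(column, slot_count):
    """Split a column into ceil(r / N) pieces of length N, zero-padding the last."""
    pieces = math.ceil(len(column) / slot_count)
    padded = list(column) + [0] * (pieces * slot_count - len(column))
    return [padded[i * slot_count:(i + 1) * slot_count] for i in range(pieces)]


# r = 5 data points with slot size N = 4 give two pieces of length 4.
chunks = split_into_slots([1, 2, 3, 4, 5], 4)
```

Padding with zeros leaves sums and inner products over the column unchanged, which is why the padded slots do not disturb the later regression computations.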

This setting is no different from the user terminal device 100 holding several data sets. Accordingly, the regression analysis method is described below under the assumption that the second server 300 of FIG. 1 holds all of the data.

A linear model makes predictions by fitting a linear function of the input features. Several regularization methods may be used to improve the prediction accuracy of such a linear model. The present disclosure is described using ridge regression among these regularization methods, but it is applicable to other regression methods as well.

First, the case of a single private variable is described. For example, with one dependent variable ($Y$), p independent non-private variables ($X_1, X_2, \ldots, X_p$), and one private variable ($X_s$), the linear model can be expressed as Equation 1 below.

[Equation 1]

Here, $Y$ is the dependent variable, $X_1, X_2, \ldots, X_p$ are the p independent non-private variables, $X_s$ is the private variable, $\beta_0, \beta_1, \ldots, \beta_p, \beta_s$ are the regression coefficients to be estimated, and $\varepsilon$ is the error term. The i-th observation of $X_1, X_2, \ldots, X_p, X_s, Y$ is denoted $(x_{i1}, \ldots, x_{ip}, x_{is}, y_i)$, where $i = 1, \ldots, n$. It is assumed that $n > p + 1$, that is, that there are more observations than independent variables.
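Equation 1 itself is not reproduced in this text. A form consistent with the definitions above, offered only as an assumed reconstruction, is the usual linear model with one coefficient per variable:

```latex
Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \beta_s X_s + \varepsilon ,
\qquad
y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \beta_s x_{is} + \varepsilon_i ,
\quad i = 1, \ldots, n .
```

The second expression writes the same model per observation; the per-observation error $\varepsilon_i$ is notation introduced here for illustration.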

When there is one private variable and a plurality of non-private variables as described above, and homomorphic encryption is applied to the private variable, the data transmitted from the user terminal device 100 to the second server 300 is $(x_{i1}, \ldots, x_{ip}, h(x_{is}), y_i)$.

The case where homomorphic encryption is applied to such data is described below. Here, $h_i = h(x_{is})$ is the fully homomorphic encryption of the private variable $x_{is}$. For convenience, operations between a plaintext and a ciphertext, and operations between ciphertexts, are all written below in plaintext notation.

In this case, using the column mean $\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$, each $x_{ij}$ is replaced with $x_{ij} - \bar{x}_j$, and $\beta_0$ is estimated by $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, so that a regression model without an intercept can be expressed as Equation 2 below.

[Equation 2]

Here, $X$ is the matrix containing the private variable and the non-private variables (which may be referred to as the matrix corresponding to the data set), $\beta$ is the vector of regression coefficients, and $\varepsilon$ is the error term.

To control overfitting, a regularization term is added to the error function, and the total error function to be minimized can be expressed as Equation 3 below.

[Equation 3]

Here, $\varepsilon(\beta)$ is the total error, $\varepsilon_D(\beta)$ is the data-dependent error, and $\varepsilon_W(\beta)$ is the regularization term.
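Equations 2 and 3 are not reproduced in this text. The following is an assumed reconstruction that is consistent with the surrounding definitions; the regularization weight $\lambda > 0$ and the factors of one half are conventions introduced here, not values taken from the disclosure.

```latex
\begin{align}
  y &= X\beta + \varepsilon , \qquad X \in \mathbb{R}^{n \times (p+1)} \\
  \varepsilon(\beta) &= \varepsilon_D(\beta) + \lambda\,\varepsilon_W(\beta)
    = \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2
    + \tfrac{\lambda}{2}\,\lVert \beta \rVert_2^2
\end{align}
```

Setting the gradient $X^{T}(X\beta - y) + \lambda\beta$ to zero gives $(X^{T}X + \lambda I_{p+1})\beta = X^{T}y$, which is the closed-form ridge solution and involves no gradient descent.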

The resulting solution of ridge regression is given by Equation 4 below.

[Equation 4]

Here, $\hat{\beta}_{RLS}$ is the solution of ridge regression, and $I_{p+1}$ is the $(p+1)\times(p+1)$ identity matrix. The intercept $\beta_0$ is not regularized because the variables are centered. The ridge estimate can therefore be expressed as Equation 5 below.

[Equation 5]

Here, […] is the ridge estimate and $XX^{T}$ is an $n \times n$ matrix. The latter part of Equation 5 is equal to […], for the reason given in Equation 6. Therefore, […].

[Equation 6]

Here, $h_s$ is $(h(x_{1s}), \ldots, h(x_{ns}))^{T}$ and $X_{(-s)}$ is the remaining part of $X$. Applying the Sherman-Woodbury inversion formula gives Equation 7.

[Equation 7]
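Equations 4 to 7 are not reproduced in this text. The chain below is an assumed reconstruction built only from the definitions given here ($I_{p+1}$, the $n \times n$ matrix $XX^{T}$, $h_s$, and $X_{(-s)}$); the symbol $\lambda$ for the regularization weight and the shorthand $A$ are introduced for illustration.

```latex
\begin{align}
  \hat{\beta}_{RLS} &= \left(X^{T}X + \lambda I_{p+1}\right)^{-1} X^{T} y \\
  \hat{\beta} &= X^{T}\left(XX^{T} + \lambda I_{n}\right)^{-1} y \\
  XX^{T} &= X_{(-s)}X_{(-s)}^{T} + h_s h_s^{T} \\
  \left(A + h_s h_s^{T}\right)^{-1}
    &= A^{-1} - \frac{A^{-1} h_s h_s^{T} A^{-1}}{1 + h_s^{T} A^{-1} h_s},
    \qquad A = X_{(-s)}X_{(-s)}^{T} + \lambda I_{n}
\end{align}
```

The first two lines agree because $(X^{T}X + \lambda I)^{-1}X^{T} = X^{T}(XX^{T} + \lambda I)^{-1}$ for $\lambda > 0$. The third line holds because $XX^{T}$ is the sum of the outer products of the columns of $X$, and the last line is the rank-one case of the inversion formula, in which $A$ contains only non-private data.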

Here, […]. (Here, […] are orthogonal matrices, and […] is a diagonal matrix with diagonal entries $\sigma_1 \ge \ldots \ge \sigma_p$.) Using singular value decomposition (SVD), this can be expressed as Equation 8.

[Equation 8]

Here, $\sigma_{p+1} = \ldots = \sigma_n = 0$, and the following terms are defined.

[Equation 9]

The ridge estimate is therefore given by Equation 10 below.

[Equation 10]

Here, $u_j$ is a column of […], and […].

The above calculation involves n summations. To simplify the calculation, the fact that […] (where […]) is used.

To reduce the SVD, $U_2$ may be selected within the range satisfying […]. Then $u_{p+1}$ may be selected such that $u_j$ is orthogonal to $h_s$ for all $j = p+2, \ldots, n$. Here, $U_1$ is as given in Equation 11.

[Equation 11]

Therefore, […] and $\eta$ can be simplified as follows.

[Equation 12]

The ridge estimate can therefore be simplified and expressed as follows.

[Equation 13]

This involves only p summations.

When all of the data used in the regression analysis is homomorphically encrypted, it is difficult to decompose the polynomial required to compute the estimate of the linear model as described above. In the present disclosure, however, only the items that require privacy protection are homomorphically encrypted, and the ordinary variables that do not require privacy protection can be processed quickly with ordinary matrix operations. Accordingly, among the terms of the polynomial required to compute the estimate of the linear model, only those that use homomorphically encrypted messages are extracted separately, and homomorphic operations are performed only on those terms, which minimizes the number of homomorphic operations.
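The idea of isolating the terms that touch the private column can be checked numerically on plain data. The sketch below assumes the dual ridge form $X^{T}(XX^{T} + \lambda I_n)^{-1}y$ and a rank-one (Sherman-Morrison) update; it is not the simplified formula of the disclosure, whose equations are not reproduced here. All function names are hypothetical, exact fractions stand in for both plaintext and ciphertext values, and the final division by a quantity that depends on the private column would need a separate approximation under real homomorphic encryption.

```python
from fractions import Fraction


def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def inverse(M):
    """Gauss-Jordan inverse over exact fractions."""
    n = len(M)
    eye = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    aug = [[Fraction(v) for v in row] + eye[i] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ridge_direct(X, y, lam):
    """Reference solution (X^T X + lam I)^-1 X^T y using every column at once."""
    Xt = transpose(X)
    G = matmul(Xt, X)
    G = [[G[i][j] + (lam if i == j else 0) for j in range(len(G))] for i in range(len(G))]
    return matmul(inverse(G), matmul(Xt, [[v] for v in y]))


def ridge_split(X_pub, h, y, lam):
    """Same estimate, with the inversion done on non-private data only."""
    n = len(y)
    K = matmul(X_pub, transpose(X_pub))
    A = [[K[i][j] + (lam if i == j else 0) for j in range(n)] for i in range(n)]
    A_inv = inverse(A)                    # uses non-private columns only
    z = matmul(A_inv, [[v] for v in y])   # uses non-private data only
    w = matmul(A_inv, [[v] for v in h])   # plaintext matrix times private column
    num = sum(hi * zi[0] for hi, zi in zip(h, z))
    den = 1 + sum(hi * wi[0] for hi, wi in zip(h, w))
    alpha = [[zi[0] - wi[0] * num / den] for zi, wi in zip(z, w)]
    X = [row + [hi] for row, hi in zip(X_pub, h)]
    return matmul(transpose(X), alpha)


X_pub = [[1, 2], [0, 1], [2, 0], [1, 1]]  # non-private columns (made-up data)
h = [1, 0, 1, 2]                          # private column (made-up data)
y = [3, 1, 2, 4]
lam = Fraction(1, 2)
split_estimate = ridge_split(X_pub, h, y, lam)
direct_estimate = ridge_direct([row + [hi] for row, hi in zip(X_pub, h)], y, lam)
```

In `ridge_split` the only matrix inversion is performed on `A`, which is built from non-private data, while the private column `h` enters through a handful of matrix-vector and inner products.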

In addition, to reduce the computational complexity of the homomorphic operations, the matrix constituting the data set is decomposed into a first matrix that contains homomorphic ciphertexts and a second matrix that does not, and the second matrix is made orthogonal to the first matrix, so that the homomorphic operations described above can be processed faster.

Meanwhile, although the data set has been described above as including only one private variable, the data set may include a plurality of private variables. For example, when gender and race are included in the same data set, both need to be treated as private variables. The case of one private variable described above is extended below to the case of two private variables.

The case of two private variables can be expressed as Equation 14 below.

[Equation 14]

Here, […] and g = […], each of which is an encrypted private variable.

As in the approach above, the ridge regression estimate can be expressed as Equation 15 below.

[Equation 15]

Here, the intermediate values can be defined as in Equation 16 below.

[Equation 16]

Extending the previous result, $U_2$ may first be selected freely as […], after which orthogonalization is performed, with h processed first and g applied afterwards.

Each term in the above process is as follows.

[…]

Therefore, referring to Equation 12 for the case of one private variable, each of […] can be expressed in simplified form as Equation 17 below.

[Equation 17]

However, the calculation of […] appears more complicated because it involves $u_{p+2}$, which is computed using $u_{p+1}$. Nevertheless, […] can be expressed in a simplified form that does not use $u_{p+2}$. To this end, without loss of generality, the following approach may be used, in which orthogonalization is applied to g first and h is processed afterwards.

[Equation 18]

Therefore, […] can be simplified as in Equation 19 below.

[Equation 19]

The ridge estimate can therefore be written simply as Equation 20 below.

[Equation 20]

In this way, even when there are two private variables, the solution can be computed with only a limited number of operations. Although only the cases of one and two private variables have been described above, the same approach can be applied to cases with three or more private variables.

The case where the ridge estimation described above is applied to the HEAAN scheme is described below with reference to FIG. 3.

FIG. 3 is a diagram for explaining the basic operation on homomorphic ciphertexts according to an embodiment of the present disclosure.

A key feature of the HEAAN scheme is that it encodes a plaintext vector into a ring before encryption. Specifically, a message is encoded into a vector belonging to the ring, and the encoded message is converted into a homomorphic ciphertext using the public key. Conversely, in decryption, the ciphertext is decrypted with the secret key into a vector belonging to the ring, and a decoding procedure then restores the message from that vector.

Through this encoding, the HEAAN scheme supports encryption of complex-valued vectors and also enables SIMD operations.

Decoding is the inverse of encoding and is performed after decryption.

도 4는 본 개시의 일 실시 예에 따른 동형 암호문에 대한 처리 함수의 예를 설명하기 위한 도면이다. Figure 4 is a diagram for explaining an example of a processing function for homomorphic ciphertext according to an embodiment of the present disclosure.

도 4를 참조하면, HEAAN 스킴에서의 정수 덧셈, 덧셈, 정수 곱셈 및 곱셈 연산에 대한 연산 동작을 도시한다. 이러한 기본 연산을 조합하여 특정 함수에 대한 연산을 수행할 수 있다. Referring to Figure 4, it shows operation operations for integer addition, addition, integer multiplication, and multiplication operations in the HEAAN scheme. By combining these basic operations, you can perform operations on specific functions.

In the illustrated example, Add and Mult denote addition and multiplication between ciphertexts, whereas ConstAdd and ConstMult denote operations between a ciphertext and a constant polynomial. In the HEAAN scheme, as in other schemes that support parallel computation, these operations are performed slot-wise, so as many operations as there are plaintext slots can be performed at once.
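The slot-wise behavior of these four operations can be pictured with a plaintext stand-in. The sketch below is only an illustration under stated assumptions: it uses ordinary Python lists in place of packed ciphertexts, performs no encryption, and the function names `add`, `mult`, `const_add`, and `const_mult` are hypothetical rather than taken from any HEAAN library.

```python
from typing import List

# Plaintext stand-in for the slots of one packed ciphertext (assumption:
# no encryption is performed here; this only mirrors the slot-wise semantics).
Slots = List[float]


def _check(a: Slots, b: Slots) -> None:
    if len(a) != len(b):
        raise ValueError("operands must have the same number of slots")


def add(a: Slots, b: Slots) -> Slots:
    """Slot-wise addition of two packed vectors (ciphertext + ciphertext)."""
    _check(a, b)
    return [x + y for x, y in zip(a, b)]


def mult(a: Slots, b: Slots) -> Slots:
    """Slot-wise multiplication of two packed vectors (ciphertext * ciphertext)."""
    _check(a, b)
    return [x * y for x, y in zip(a, b)]


def const_add(a: Slots, constant: Slots) -> Slots:
    """Slot-wise addition of a packed vector and an unencrypted constant vector."""
    _check(a, constant)
    return [x + c for x, c in zip(a, constant)]


def const_mult(a: Slots, constant: Slots) -> Slots:
    """Slot-wise multiplication of a packed vector by an unencrypted constant vector."""
    _check(a, constant)
    return [x * c for x, c in zip(a, constant)]
```

For example, `add([1, 2, 3, 4], [10, 20, 30, 40])` yields `[11, 22, 33, 44]`: one call handles every slot at once, which is the source of the parallelism described above.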

In addition to the above operations, the HEAAN scheme also supports Left Rotate, Right Rotate, and Rescale. Left Rotate rotates the slots to the left, and Right Rotate rotates the slots to the right.

These rotations are useful when adding together values held in the same ciphertext, and they enable efficient parallel processing. Rescale may be performed after every ConstMult or Mult to reduce the magnitude of the error. However, because Rescale reduces the ciphertext modulus, the number of such operations is limited unless bootstrapping is used.

Because the above-described ridge regression requires as many summation operations as there are variables, each private variable can be placed in a ciphertext, with all values of each column grouped into a single ciphertext. The maximum number of plaintext values that can be packed into one ciphertext is determined by the security parameter of the homomorphic encryption. Hereinafter, it is assumed that the length of a column does not exceed this maximum. If it does, each column must be split across multiple ciphertexts, but this does not significantly affect the complexity of the ridge regression algorithm described above. A ciphertext may also hold a scalar value such as η or ξ, which can be treated as a vector that repeats the same value in every slot.
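The column layout described above can be sketched as follows. This is a minimal plaintext illustration, not the disclosed implementation: the names `pack_column` and `replicate_scalar` are hypothetical, `num_slots` stands for the per-ciphertext capacity fixed by the security parameter, and zero-padding the last chunk is an assumption made here for simplicity.

```python
from typing import List


def pack_column(column: List[float], num_slots: int) -> List[List[float]]:
    """Split one data column into chunks of `num_slots` values.

    Each chunk stands for the plaintext that would go into one ciphertext.
    The final chunk is zero-padded (an assumption of this sketch).
    """
    if num_slots <= 0:
        raise ValueError("num_slots must be positive")
    chunks = []
    for start in range(0, len(column), num_slots):
        chunk = column[start:start + num_slots]
        chunk = chunk + [0.0] * (num_slots - len(chunk))
        chunks.append(chunk)
    return chunks


def replicate_scalar(value: float, num_slots: int) -> List[float]:
    """Represent a scalar as a vector repeating the value in every slot."""
    return [value] * num_slots
```

With `num_slots = 4`, a five-element column `[1, 2, 3, 4, 5]` is split into two chunks, `[1, 2, 3, 4]` and `[5, 0, 0, 0]`, while a column of four or fewer elements fits in a single chunk.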

Packing is applied to the homomorphic ciphertexts. When packing is used in homomorphic encryption, multiple messages can be encrypted into a single ciphertext. When the computing device then performs an operation between ciphertexts, the operation is in effect carried out on the multiple messages in parallel, which greatly reduces the computational burden. As shown in FIG. 7, an addition or multiplication of two packed ciphertexts performs the addition or multiplication of a plurality of messages.

FIG. 5 is a diagram for explaining an algorithm for calculating intermediate values for ridge regression according to an embodiment of the present disclosure. Specifically, FIG. 5 shows a method of calculating the intermediate values for ridge regression estimation when only one private variable is included.

Referring to FIG. 5, the intermediate values included in Equation 15 or Equation 20 described above can be calculated using ConstMult, Mult, IRotate, Add, and the like supported by the HEAAN scheme.

FIG. 6 is a diagram for explaining an operation of calculating a ridge estimate according to an embodiment of the present disclosure.

Referring to FIG. 6, the final ridge regression estimate is calculated using the intermediate values calculated in FIG. 5. For convenience, the Rescale operation following each multiplication is omitted. Using homomorphic rotation, the sum of n elements in a ciphertext can be computed in log n steps. Accordingly, the complexity of the algorithm shown in FIG. 5 is O(p log n).
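The log n bound on summing the slots can be illustrated with the usual rotate-and-add pattern. The sketch below works on a plain Python list rather than a ciphertext, assumes the number of slots is a power of two, and uses the hypothetical names `rotate_left` and `rotate_and_sum`; it is not taken from the algorithm of FIG. 5 itself.

```python
from typing import List, Tuple


def rotate_left(slots: List[float], k: int) -> List[float]:
    """Cyclically rotate the slots k positions to the left."""
    k %= len(slots)
    return slots[k:] + slots[:k]


def rotate_and_sum(slots: List[float]) -> Tuple[List[float], int]:
    """Sum all slots using rotations by 1, 2, 4, ... and slot-wise additions.

    Returns the resulting vector (every slot holds the total) and the number
    of rotations used. Assumes len(slots) is a power of two.
    """
    n = len(slots)
    if n == 0 or n & (n - 1):
        raise ValueError("number of slots must be a power of two")
    acc = list(slots)
    rotations = 0
    shift = 1
    while shift < n:
        rotated = rotate_left(acc, shift)
        acc = [x + y for x, y in zip(acc, rotated)]  # slot-wise addition
        shift *= 2
        rotations += 1
    return acc, rotations
```

For eight slots holding 1 through 8, three rotations (log2 of 8) leave the total 36 in every slot. Repeating such a sum once for each of p columns is consistent with the O(p log n) complexity stated above.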

FIG. 8 is a diagram for explaining the performance of the ridge regression operation according to the present disclosure. Specifically, each point in FIG. 8 represents the result for one data set, and the x-axis and y-axis represent the number of columns and the performance ratio, respectively.

Referring to FIG. 8, there is a linear relationship between the number of columns and the performance ratio. When the number of columns in the data set is small, the method according to the present disclosure runs more efficiently. Conversely, the method might be regarded as less efficient when the data set has many columns. However, because the method according to the present disclosure does not use gradient descent, there is no need to search for a learning rate, and it can therefore still compute faster than existing methods.

FIG. 9 is a flowchart for explaining a ciphertext processing method according to an embodiment of the present disclosure.

First, a data set containing homomorphically encrypted messages and messages that are not homomorphically encrypted as slots is received and stored (S910). As described above, such a data set may be masked, and it may contain a single private variable or a plurality of private variables. Because the private variables it contains are homomorphically encrypted, personal information remains protected on the device that receives the data set.

The homomorphically encrypted messages and the messages that are not homomorphically encrypted in the data set are applied to a pre-stored linear model to calculate an estimate for the linear model. Specifically, among the polynomials needed to calculate the estimate of the linear model, the terms that use a homomorphically encrypted message are extracted, and homomorphic operations are performed on the extracted terms to calculate the estimate. Because not every variable in the data set is homomorphically encrypted, and only the private variables that require privacy protection are, the terms that use a homomorphically encrypted message can be distinguished from those that do not, and homomorphic operations need to be performed only on the former.
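One way to picture this separation of terms is to look at the inner products between data columns that a ridge estimate needs. The sketch below is an illustration under assumptions that are not taken from the disclosure: it supposes the relevant terms are the entries of the Gram matrix of the data columns, and the function name `split_gram_terms` and the index-based representation are hypothetical. It only classifies terms; it performs no encryption or homomorphic computation.

```python
from typing import List, Set, Tuple

Term = Tuple[int, int]  # (i, j) stands for the inner product of columns i and j


def split_gram_terms(num_columns: int, private: Set[int]) -> Tuple[List[Term], List[Term]]:
    """Classify the upper-triangular Gram-matrix entries by what they touch.

    A term that involves at least one private (encrypted) column would need
    homomorphic operations; a term over public columns only can be computed
    directly on plaintext. (Assumption: the terms of interest are Gram entries.)
    """
    plaintext_terms: List[Term] = []
    homomorphic_terms: List[Term] = []
    for i in range(num_columns):
        for j in range(i, num_columns):
            if i in private or j in private:
                homomorphic_terms.append((i, j))
            else:
                plaintext_terms.append((i, j))
    return plaintext_terms, homomorphic_terms
```

With three columns of which only column 2 is private, the terms (0, 0), (0, 1), and (1, 1) can be computed in plaintext, and only (0, 2), (1, 2), and (2, 2) involve the encrypted column. Under this assumption, when few columns are private, most of the work stays in plaintext.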

In addition, to speed up the homomorphic computation, the matrix corresponding to the data set is decomposed into a first matrix that contains the homomorphically encrypted messages and a second matrix that does not, and the second matrix is processed so as to be orthogonal to the first matrix (S920). This allows the estimate for the linear model to be calculated more quickly (S930).

Once the above computation is complete, the result may be transmitted to another device, or the result may be decrypted with a secret key.

As described above, the ciphertext processing method according to the present disclosure distinguishes private variables that require privacy protection from variables that do not and homomorphically encrypts only the former, so the computation can be performed faster. In addition, because gradient descent is not used in training, a more accurate solution can be obtained efficiently, and the learning-rate search needed to tune gradient descent can be omitted, which enables faster machine learning training.

Meanwhile, the ciphertext processing method according to the various embodiments described above may be implemented in the form of program code for performing each step, and may be stored on a recording medium and distributed. In this case, a device equipped with the recording medium can perform operations such as the above-described encryption or ciphertext processing.

The recording medium may be any of various types of computer-readable media, such as a ROM, a RAM, a memory chip, a memory card, an external hard drive, a hard drive, a CD, a DVD, a magnetic disk, or a magnetic tape.

Although the present disclosure has been described above with reference to the accompanying drawings, the scope of the present disclosure is determined by the claims set forth below and should not be construed as being limited to the above-described embodiments and/or drawings. It should also be clearly understood that improvements, changes, and modifications of the disclosure recited in the claims that are obvious to those skilled in the art are also included in the scope of the present disclosure.

1000: network system 100: user terminal device
200: first server 300: second server
400: computing device 410: communication device
420: memory 430: display
440: manipulation input device 450: processor

Claims

A method of processing ciphertext in a computing device, the method comprising:
storing a data set containing homomorphically encrypted messages and messages that are not homomorphically encrypted as slots;
applying the homomorphically encrypted messages and the messages that are not homomorphically encrypted in the data set to a pre-stored linear model to calculate an estimated value for the linear model; and
transmitting the calculated estimated value to an external device,
wherein the data set includes a plurality of different variables, and
wherein at least one variable among the plurality of different variables is a homomorphically encrypted private variable, and at least one other variable among the plurality of different variables is a non-private variable that is not homomorphically encrypted.

The ciphertext processing method of claim 1,
wherein the calculating of the estimated value comprises
extracting, from among polynomials required to calculate the estimated value of the linear model, a term that uses the homomorphically encrypted message, and performing a homomorphic operation on the extracted term to calculate the estimated value.

The ciphertext processing method of claim 1,
wherein the calculating of the estimated value comprises
generating a matrix corresponding to the data set, decomposing the matrix into a first matrix including the homomorphically encrypted message and a second matrix not including the homomorphically encrypted message, the second matrix being orthogonal to the first matrix, and calculating the estimated value for the linear model using the first matrix and the second matrix.

The ciphertext processing method of claim 1,
wherein the data set includes a plurality of different variables, each of which is homomorphically encrypted.

The ciphertext processing method of claim 1,
wherein the linear model is
a ridge regression linear model.

A computing device comprising:
a memory storing at least one instruction; and
a processor configured to execute the at least one instruction,
wherein the processor,
by executing the at least one instruction,
stores a data set containing homomorphically encrypted messages and messages that are not homomorphically encrypted as slots, and applies the homomorphically encrypted messages and the messages that are not homomorphically encrypted in the data set to a pre-stored linear model to calculate an estimated value for the linear model,
wherein the data set includes a plurality of different variables, and
wherein at least one variable among the plurality of different variables is a homomorphically encrypted private variable, and at least one other variable among the plurality of different variables is a non-private variable that is not homomorphically encrypted.

The computing device of claim 6,
wherein the processor
extracts, from among polynomials required to calculate the estimated value of the linear model, a term that uses the homomorphically encrypted message, and performs a homomorphic operation on the extracted term to calculate the estimated value.

The computing device of claim 6,
wherein the processor
generates a matrix corresponding to the data set, decomposes the matrix into a first matrix including the homomorphically encrypted message and a second matrix not including the homomorphically encrypted message, the second matrix being orthogonal to the first matrix, and calculates the estimated value for the linear model using the first matrix and the second matrix.

The computing device of claim 6,
wherein the data set includes a plurality of different variables, each of which is homomorphically encrypted.

The computing device of claim 6,
wherein the linear model is
a ridge regression linear model.