KR102010644B1

KR102010644B1 - METHOD AND SYSTEM FOR k-NN CLASSIFICATION PROCESSING BASED ON GARBLED CIRCUIT

Info

Publication number: KR102010644B1
Application number: KR1020170059798A
Authority: KR
Inventors: 장재우; 김형진; 신광식; 김현태
Original assignee: 전북대학교산학협력단; (주)아이엠시티
Priority date: 2017-05-15
Filing date: 2017-05-15
Publication date: 2019-08-14
Also published as: KR20180125227A

Abstract

The present invention relates to a k-NN classification processing algorithm on an encrypted database outsourced to the cloud. A garbled-circuit-based k-NN classification processing method according to an embodiment of the present invention may comprise: when result data for a kNN (k Nearest Neighbor) query from a user terminal has been derived, performing a multi-party frequency computation (SF, Secure Frequency) between the first cloud and the second cloud that took part in deriving the result data, thereby calculating the frequency between the kNN query and past result data derived by the kNN query; selecting a search category through the calculation of the frequency; and, if the search category matches the category of the derived result data, providing the derived result data to the user terminal.

Description

K-NN classification processing method based on garbled circuit and k-NN classification processing system based on garbled circuit {METHOD AND SYSTEM FOR k-NN CLASSIFICATION PROCESSING BASED ON GARBLED CIRCUIT}

The present invention relates to a k-NN classification processing algorithm on an encrypted database outsourced to the cloud, and is intended to protect the data access patterns that may be exposed while a user query against the encrypted database is processed, and to support data classification.

Recently, as research on cloud computing has become more active, interest has grown in database outsourcing, in which the management and operation of a database are entrusted to an external provider.

However, with an outsourced database in the prior art, the cloud or an attacker can extract meaningful information and misuse it, and personal information such as a user's tendencies and preferences can be inferred from the queries the user sends to the cloud. This is a security problem.

To address these problems, approaches have been studied that store the database in a distributed manner across multiple clouds, or that transform the contents of the database before outsourcing it. However, these approaches still cannot fully protect the data and the queries.

In addition, data mining techniques have typically been used for kNN classification in the prior art. Such a technique extracts the k data items nearest to a given query and then extracts the category with the highest frequency among them. It provides strong information protection but has the drawback of a high query processing cost.

Therefore, there is a need for a system and method that, in a database environment outsourced to the cloud, supports data protection, user query protection, and data access pattern protection together while providing efficient query processing and data classification performance.

The present invention has been devised to solve the above problems. One object is to classify and analyze the result data of a k-NN query processing algorithm so that a grade for the result data can be extracted without information leakage.

Another object of the present invention is to provide encrypted operation protocols based on a garbled circuit and a data packing technique, thereby reducing the number of operations and providing efficient query processing performance.

A further object of the present invention is to provide a k-NN query processing algorithm that supports encrypted index search based on the improved encrypted operation protocols and data access pattern protection on the encrypted database, thereby preventing the exposure of additional information and supporting data protection, user query protection, and data access pattern protection during query processing.

A garbled-circuit-based k-NN query classification method according to an embodiment of the present invention may comprise: when result data for a kNN (k Nearest Neighbor) query from a user terminal has been derived, performing a multi-party frequency computation (SF, Secure Frequency) between the first cloud and the second cloud that took part in deriving the result data, thereby calculating the frequency between the kNN query and past result data derived by the kNN query; selecting a search category through the calculation of the frequency; and, if the search category matches the category of the derived result data, providing the derived result data to the user terminal.

In addition, a garbled-circuit-based k-NN classification processing system according to an embodiment of the present invention may, when result data for a kNN query from a user terminal has been derived, perform a multi-party frequency computation (SF) between the first cloud and the second cloud that took part in deriving the result data, calculate the frequency between the kNN query and past result data derived by the kNN query, select a search category through the calculation of the frequency, and, if the search category matches the category of the derived result data, provide the derived result data to the user terminal.

According to an embodiment of the present invention, the result data of the k-NN query processing algorithm is classified and analyzed, so that a grade for the result data can be extracted without information leakage.

In addition, according to an embodiment of the present invention, at least one encrypted operation protocol among the ESSED protocol, the GSCMP protocol, and the GSPE protocol, each based on a garbled circuit and a data packing technique, is performed, which reduces the number of operations and provides efficient query processing performance.

In addition, according to an embodiment of the present invention, a k-NN query processing algorithm is provided that supports encrypted index search based on the improved encrypted operation protocols and data access pattern protection on the encrypted database. This prevents the exposure of additional information and supports data protection, user query protection, and data access pattern protection during query processing.

FIG. 1 is a diagram illustrating a garbled-circuit-based k-NN classification processing system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating data held in the first cloud according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an encrypted database according to an embodiment of the present invention.
FIG. 4 is a diagram for describing a kNN classification process according to an embodiment of the present invention.
FIGS. 5B to 6B are diagrams for describing a kNN classification processing algorithm according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating a point-region relationship in one-dimensional space according to an embodiment of the present invention.
FIGS. 8A to 9C are diagrams for describing a kNN classification processing algorithm according to an embodiment of the present invention.
FIG. 10 is a flowchart illustrating the sequence of a garbled-circuit-based k-NN classification processing method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. However, the present invention is not restricted or limited by the embodiments. Like reference numerals in the drawings denote like elements.

In the present invention, a "k-NN query" may refer to a query that searches for the k data items located at the closest distance from the query. The garbled-circuit-based k-NN classification processing method and system described herein use efficient encrypted operation protocols, based on a garbled circuit and a data packing technique, to calculate the Euclidean distance between two points, to compare two data items, and to determine whether an encrypted region contains an encrypted point. In this way a k-NN query can be processed on the encrypted database, and the k-NN query result can be analyzed through a classification scheme to classify the data.
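The plaintext logic that the encrypted protocols are meant to reproduce can be sketched as follows. This is only an unencrypted illustration, assuming squared Euclidean distance and a simple majority vote over the k nearest items; the function name `knn_classify` and the sample data are hypothetical and do not come from the specification.

```python
from collections import Counter

def knn_classify(points, labels, query, k):
    """Return (indices of the k nearest points, most frequent label among them)."""
    def sq_dist(p, q):
        # Squared Euclidean distance; the square root is not needed for ranking.
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Rank all points by distance to the query (sorted() is stable on ties).
    order = sorted(range(len(points)), key=lambda i: sq_dist(points[i], query))
    nearest = order[:k]
    # Count how often each category occurs among the k nearest items.
    counts = Counter(labels[i] for i in nearest)
    category, _ = counts.most_common(1)[0]
    return nearest, category

# Hypothetical sample data, for illustration only.
points = [(1, 1), (2, 2), (8, 8), (9, 9), (2, 1)]
labels = ["A", "A", "B", "B", "B"]
nearest, category = knn_classify(points, labels, query=(1, 2), k=3)
print(nearest, category)  # [0, 1, 4] A
```

In the described system the same two stages (nearest-neighbor search, then frequency counting) would run over ciphertexts between the two clouds rather than over plaintext lists.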

FIG. 1 is a diagram illustrating a garbled-circuit-based k-NN classification processing system according to an embodiment of the present invention.

The garbled-circuit-based k-NN classification processing system 100 of the present invention (hereinafter, the k-NN classification processing system) may include a database (T) 110, a kd tree 120, an encryption public key (pk) 130, a decryption secret key (sk) 140, a first cloud (C_A) 150, an encrypted database 160, an encrypted kd tree 170, and a second cloud (C_B) 180.

The k-NN classification processing system 100 divides the data stored in the database 110 into units of a predetermined number of items (for example, F items) and builds a kd tree 120 having a plurality of terminal nodes, each containing the divided data.

For example, the k-NN classification processing system 100 may build, from the database 110, a kd tree 120 of level h having a total of 2^(h-1) terminal nodes, and each terminal node may store up to F (fan-out) data items.

Each terminal node of the kd tree 120 may store, in plaintext form, region information about the node region it is responsible for and the data IDs of the data contained in that node region. Here, the region information may include, for each attribute (m), a lower bound (lb_{z,m}) and an upper bound (ub_{z,m}) of the node region (1 ≤ z ≤ num_node, 1 ≤ j ≤ m).

For example, referring to FIG. 2, the k-NN classification processing system 100 may store eight two-dimensional (for example, x and y dimensions) data items. The k-NN classification processing system 100 may partition this data on the basis of a kd tree. The resulting kd tree may include a total of four terminal nodes, and each terminal node may store the node's lower bound (lb_x and lb_y), its upper bound (ub_x and ub_y), and the IDs of the data contained in the node. That is, terminal node 'node 1' of the kd tree 120 may store the lower bound '(lb_{1,0}, lb_{1,1})' and the upper bound '(ub_{1,0}, ub_{1,1})' of the node region associated with 'node 1', together with the data IDs 't₁' and 't₂' of the data contained in that node region.
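A plaintext sketch of such a partition is given below. It assumes median splits that alternate between dimensions and a split value placed halfway between neighboring coordinates, so that no data item lies on a node boundary; the specification does not state the splitting rule, and the function name `build_leaves`, the initial region, and the eight sample points are hypothetical.

```python
def build_leaves(records, fanout, lb, ub, depth=0):
    """Split (data_id, point) records into terminal nodes holding at most `fanout` items.

    Each terminal node keeps its region bounds (lb, ub) and the IDs it contains.
    """
    if len(records) <= fanout:
        return [{"lb": lb, "ub": ub, "ids": [rid for rid, _ in records]}]
    axis = depth % len(lb)                      # alternate x, y, x, ...
    records = sorted(records, key=lambda rec: rec[1][axis])
    mid = len(records) // 2
    # Split halfway between the two middle coordinates (assumed distinct).
    split = (records[mid - 1][1][axis] + records[mid][1][axis]) / 2
    left_ub = ub[:axis] + (split,) + ub[axis + 1:]
    right_lb = lb[:axis] + (split,) + lb[axis + 1:]
    return (build_leaves(records[:mid], fanout, lb, left_ub, depth + 1)
            + build_leaves(records[mid:], fanout, right_lb, ub, depth + 1))

# Eight made-up 2-D points with fan-out F = 2, giving four terminal nodes.
data = [("t1", (1, 2)), ("t2", (2, 6)), ("t3", (3, 9)), ("t4", (4, 4)),
        ("t5", (6, 1)), ("t6", (7, 7)), ("t7", (8, 3)), ("t8", (9, 8))]
leaves = build_leaves(data, fanout=2, lb=(0, 0), ub=(10, 10))
for node in leaves:
    print(node)
```

The grouping of IDs per node here follows the made-up coordinates and is not meant to match FIG. 2.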

To make the containment relationship between data and nodes clear, it is assumed herein that no data lies on the boundary of any node of the kd tree, but the invention is not limited to this assumption.

The k-NN classification processing system 100 encrypts the kd tree 120 built from the database 110 to generate the encrypted kd tree 170.

Specifically, the k-NN classification processing system 100 may generate the encrypted kd tree 170 by additionally encrypting, attribute by attribute, the data IDs of the data contained in the node region associated with each terminal node.

For example, the k-NN classification processing system 100 may encrypt, for each attribute m, the data IDs 't₁' and 't₂' contained in terminal node 'node 1' of the kd tree 120, and likewise encrypt the data IDs contained in the remaining terminal nodes, to generate the encrypted kd tree 170. For example, the encrypted kd tree 170 includes four terminal nodes (node 1, node 2, node 3, node 4), and each terminal node 210, 220, 230, 240 may have a node region defined by a lower bound and an upper bound.

Here, the k-NN classification processing system 100 may generate the encrypted kd tree 170 by encrypting each terminal node of the kd tree 120 with the same encryption public key 130 that was used to generate the encrypted database 160.

For example, the k-NN classification processing system 100 may generate an encrypted database such as the one shown in FIG. 3. The k-NN classification processing system 100 may perform encryption dimension by dimension and transmit the result to the first cloud 150. That is, the k-NN classification processing system 100 may encrypt the region information of each terminal node in the kd tree 120 attribute by attribute, using the encryption public key of the encrypted database 160 as shown in FIG. 3.

The k-NN classification processing system 100 may provide a first cloud 150 and a second cloud 180 that are independent of each other (non-colluding).

In the present invention, when each cloud 150, 180 executes an encryption protocol to process a user query, it may be prevented from colluding with the other cloud to exchange data and information in order to obtain additional information from what it learned during query processing.

The k-NN classification processing system 100 stores the encrypted kd tree 170 in the first cloud 150, which holds the encrypted database 160 and the encryption public key 130 (step 101).

In addition, the k-NN classification processing system 100 stores the decryption secret key 140 corresponding to the encryption public key 130 in the second cloud 180, which is different from the first cloud 150 (step 102).

In other words, the k-NN classification processing system 100 may store the encrypted kd tree 170 in the first cloud 150 together with the encrypted database 160 and the encryption public key 130, and store the decryption secret key 140, generated as the secret key, in the separate second cloud 180.

In addition, the k-NN classification processing system 100 may provide the same encryption public key 130 that was used to encrypt the database 110 to the user terminal (AU; Authorized User) 190 (step 103). When the terminal 190 sends a query to the first cloud 150 to obtain data, it may encrypt the user query with the provided encryption public key 130 (step 104).

For example, the terminal 190 may request a user query by encrypting the query point with the encryption public key 130, for example as 'E(q_j) (1 ≤ j ≤ m)'.

In this way, the k-NN classification processing system 100 provides its service on the basis of encrypted queries and can thereby support data protection and user query protection.

When a user query is received from the user terminal 190, the k-NN classification processing system 100 may process the kNN query by performing secure multi-party computation (SMC, Secure Multiparty Computation) between the first cloud 150 and the second cloud 180 on the basis of a selected encrypted operation protocol (step 105).

In addition, the k-NN classification processing system 100 may transmit to the terminal 190 the result of processing the user query jointly with the first cloud 150 and the second cloud 180 (step 106).

Here, multi-party computation may refer to securely carrying out protocols and operations through other entities (the first cloud 150 and the second cloud 180) without exposing the original data held by the data owner.

To this end, the k-NN classification processing system 100 stores the decryption secret key 140 in the second cloud 180, separate from the first cloud 150 that holds the encrypted kd tree 170, the encrypted database 160, and the encryption public key 130, and on this basis may process the kNN query through multi-party computation between the first cloud 150 and the second cloud 180.

In this way, the k-NN classification processing system 100 can securely process user queries on the encrypted database 160, on the basis of the encrypted kd tree 170, while supporting data protection. In addition, the k-NN classification processing system 100 may have the advantage that no information related to data privacy, query privacy, or data access patterns is exposed while a query is processed by the kNN query processing algorithm on the encrypted database. The kNN query processing algorithm is described in more detail below with reference to FIGS. 4 to 9C. Before that description, the protocols used herein are explained.

The k-NN classification processing system 100 may select at least one of the ESSED (Enhanced Secure Squared Euclidean Distance) protocol, the GSCMP (Garbled Circuit based Secure Compare) protocol, and the GSPE (Garbled Circuit based Secure Point Enclosure) protocol as the encrypted operation protocol.

For example, the k-NN classification processing system 100 may use the ESSED protocol to calculate E(|X-Y|²), the encrypted square of the distance between the vectors E(X) and E(Y). Here, X and Y may be m-dimensional vectors.

First, the k-NN classification processing system 100 may have the first cloud 150 generate random numbers and then perform data packing through Equation 1 to calculate R.

Here, σ may be the bit length used to represent one data item. In addition, the k-NN classification processing system 100 may have the first cloud 150 encrypt R to generate E(R).

Next, the k-NN classification processing system 100 may have the first cloud 150 calculate the encrypted per-dimension difference E(x_j - y_j) (1 ≤ j ≤ m) between X and Y, and then perform data packing through Equation 2 to calculate E(v).

Next, the k-NN classification processing system 100 may have the first cloud 150 calculate E(v) = E(v) × E(R) and then transmit E(v) to the second cloud 180. The second cloud 180 may then decrypt the received E(v) to obtain [x₁-y₁+r₁ | … | x_m-y_m+r_m]. In addition, the second cloud 180 may perform unpacking on v to obtain x_j - y_j + r_j (1 ≤ j ≤ m), and then sum (x_j - y_j + r_j)² (1 ≤ j ≤ m) over the dimensions and store the sum in d (the initial value of d may be set to 0, but is not limited thereto).
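The packing and unpacking of several σ-bit values into a single integer can be sketched in plaintext as follows. The layout (first value in the most significant slot) is an assumption, since the packing equations themselves are not reproduced in this text; the names `pack` and `unpack` and the sample values are hypothetical.

```python
def pack(values, sigma):
    """Concatenate non-negative values into one integer, sigma bits per slot."""
    packed = 0
    for v in values:
        if not 0 <= v < (1 << sigma):
            raise ValueError("value does not fit in sigma bits")
        packed = (packed << sigma) | v          # append v as the next slot
    return packed

def unpack(packed, sigma, m):
    """Recover the m slot values from a packed integer."""
    mask = (1 << sigma) - 1
    return [(packed >> (sigma * (m - 1 - j))) & mask for j in range(m)]

sigma = 16
masked = [7, 300, 42]          # stand-ins for x_j - y_j + r_j
v = pack(masked, sigma)
print(unpack(v, sigma, 3))     # [7, 300, 42]
```

Each masked value must be non-negative and fit in σ bits, which is one reason the random r_j would need to be at least as large as the largest possible |x_j - y_j| in this sketch.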

Because the per-dimension distances are summed in plaintext, the k-NN classification processing system 100 can reduce the number of operations on encrypted data compared with the prior-art DPSSED protocol.

Next, the k-NN classification processing system 100 may have the second cloud 180 encrypt d and then transmit E(d) to the first cloud 150. Through this process, the k-NN classification processing system 100 can reduce the m data encryptions required at the second cloud 180 in the DPSSED protocol to a single encryption.

Finally, the k-NN classification processing system 100 may have the first cloud 150 remove, for each dimension, the value added by the random-number insertion using Equation 3, thereby calculating E(|X-Y|²), the encrypted square of the distance between the two vectors X and Y.
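The ESSED message flow can be traced end to end with a toy additively homomorphic scheme. The sketch below assumes a Paillier-style cryptosystem, which the ciphertext products in the description suggest but which this text does not name, and it assumes that the mask is removed through the identity (x_j - y_j + r_j)² = (x_j - y_j)² + 2·r_j·(x_j - y_j) + r_j², since Equation 3 is not reproduced here. The key size, the helper names (`enc`, `dec`, `add`, `mul_const`, `essed`), and the value ranges are all illustrative assumptions and are not secure.

```python
import math
import random

# Toy Paillier-style key built from two known Mersenne primes (NOT secure).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # secret key, held by C_B
MU = pow(LAM, -1, N)

def enc(m):
    """Encrypt m mod N with the public key (generator g = N + 1)."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (1 + (m % N) * N) * pow(r, N, N2) % N2

def dec(c):
    """Decrypt with the secret key (only C_B would do this)."""
    return (pow(c, LAM, N2) - 1) // N * MU % N

def add(c1, c2):
    return c1 * c2 % N2          # E(a) * E(b) = E(a + b)

def mul_const(c, k):
    return pow(c, k % N, N2)     # E(a)^k = E(k * a)

SIGMA = 16                       # bits per packed slot (assumed)

def essed(ex, ey):
    """Return E(|X - Y|^2) from per-dimension ciphertexts of X and Y."""
    m = len(ex)
    shifts = [1 << (SIGMA * (m - 1 - j)) for j in range(m)]
    # C_A: encrypted differences E(x_j - y_j) and random masks r_j.
    diffs = [add(cx, mul_const(cy, -1)) for cx, cy in zip(ex, ey)]
    r = [random.randrange(1 << 9, 1 << 10) for _ in range(m)]
    big_r = sum(rj * s for rj, s in zip(r, shifts))          # packed R
    ev = enc(0)
    for d, s in zip(diffs, shifts):
        ev = add(ev, mul_const(d, s))                        # packed E(v)
    ev = add(ev, enc(big_r))                                 # E(v) x E(R)
    # C_B: one decryption, unpack, square and sum in plaintext, one encryption.
    v = dec(ev)
    slots = [(v >> (SIGMA * (m - 1 - j))) & ((1 << SIGMA) - 1) for j in range(m)]
    ed = enc(sum(s * s for s in slots))
    # C_A: strip 2*r_j*(x_j - y_j) + r_j^2 for every dimension.
    for d, rj in zip(diffs, r):
        ed = add(ed, mul_const(d, -2 * rj))
    return add(ed, enc(-sum(rj * rj for rj in r)))

# Coordinates are assumed to lie in [0, 512) so masked slots stay non-negative.
X, Y = (3, 10, 250), (7, 2, 100)
e_dist = essed([enc(x) for x in X], [enc(y) for y in Y])
print(dec(e_dist))   # 22580 = 4^2 + 8^2 + 150^2
```

In this sketch the second cloud performs one decryption and one encryption regardless of m, which mirrors the saving over DPSSED described above; the first cloud still performs per-dimension homomorphic operations.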

Depending on the embodiment, the k-NN classification processing system 100 may process the kNN query using the GSCMP protocol.

Using the GSCMP protocol, when E(u) and E(v) are given to the first cloud 150, the k-NN classification processing system 100 may return E(1) if u < v and E(0) if u > v. Like the prior-art CMP-S, the k-NN classification processing system 100 may carry out the GSCMP protocol through a garbled circuit consisting of two ADD gates and one CMP gate. However, it may differ from CMP-S in that data containing random numbers can be exchanged between the first cloud 150 and the second cloud 180 while the GSCMP protocol is executed.

The overall algorithm of the GSCMP protocol is described below. First, the k-NN classification processing system 100 may have the first cloud 150 generate two random numbers r_u and r_v. Second, the first cloud 150 may encrypt r_u and r_v and then calculate E(m₁) = E(u) × E(r_u)² and E(m₂) = E(v)² × E(1) × E(r_v). Third, the first cloud 150 may randomly select one of two functionalities F (F₀ : u > v, F₁ : v > u). Which of F₀ and F₁ was selected may be kept hidden from the second cloud 180. The first cloud 150 may then proceed as follows according to the selected F.

If the first cloud 150 selected F₀ : u > v, the first cloud 150 may transmit the encrypted data to the second cloud 180 in the order <E(m₂), E(m₁)>.

If the first cloud 150 selected F₁ : u < v, the first cloud 150 may transmit the encrypted data to the second cloud 180 in the order <E(m₁), E(m₂)>.

네 번째로, k-NN 분류 처리 시스템(100)은 제2 클라우드(180)에서, 전송 받은 데이터를 복호화 할 수 있다. 제1 클라우드(150)에서 F ₀ : u>v를 선택한 경우 k-NN 분류 처리 시스템(100)은 제2 클라우드(180)에서, <m ₂, m ₁>을 획득할 수 있고, 제1 클라우드(150)에서 F ₁ : u<v를 선택한 경우 k-NN 분류 처리 시스템(100)은 제2 클라우드(180)에서, <m ₁, m ₂>를 획득할 수 있다.Fourth, the k-NN classification processing system 100 may decode the received data in the second cloud 180. When F ₀ : u> v is selected in the first cloud 150, the k-NN classification processing system 100 may obtain < m ₂ , m ₁ > in the second cloud 180, and the first cloud When F ₁ : u <v is selected at 150, the k-NN classification processing system 100 may obtain < m ₁ , m ₂ > from the second cloud 180.

다섯 번째로, k-NN 분류 처리 시스템(100)은 제1 클라우드(150)가 두 개의 ADD 게이트와 한 개의 CMP 게이트로 구성된 가블드 회로를 생성하도록 할 수 있다. 만약, F ₀이 선택된 경우 k-NN 분류 처리 시스템(100)은 제1 클라우드(150)에서, -r _v 및 -r _u 를 각각 제1 ADD 게이트와 제2 ADD 게이트에 전달할 수 있고, F ₁이 선택된 경우 -r _u 및 -r _v 를 각각 제1 ADD 게이트와 제2 ADD 게이트에 전달할 수 있다.Fifth, the k-NN classification processing system 100 may enable the first cloud 150 to generate a garbled circuit composed of two ADD gates and one CMP gate. If F ₀ is selected, the k-NN classification processing system 100 may transmit- r _v and- r _u to the first ADD gate and the second ADD gate, respectively, in the first cloud 150, and F _1. If the selected and _u -r - r _v may carry in each of the 1 ADD gate and the second gate 2 ADD.

Sixth, the k-NN classification processing system 100 may have the second cloud 180 pass the first item of the received data to the first ADD gate and the second item to the second ADD gate. Accordingly, the second cloud 180 passes m₂ and m₁ to the first ADD gate and the second ADD gate, respectively, when F₀ is selected, and passes m₁ and m₂ to the first ADD gate and the second ADD gate, respectively, when F₁ is selected.

Seventh, in the first ADD gate, the k-NN classification processing system 100 may add -r_v and m₂ = v + r_v when F₀ is selected, or -r_u and m₁ = u + r_u when F₁ is selected, and pass the result, "result₁", to the CMP gate.

Eighth, in the second ADD gate, the k-NN classification processing system 100 may add -r_u and m₁ = u + r_u when F₀ is selected, or -r_v and m₂ = v + r_v when F₁ is selected, and pass the result, "result₂", to the CMP gate. Because the output of each ADD gate is passed on in encoded form, as is characteristic of a garbled circuit, no information exposure may occur.

Ninth, in the CMP gate, the k-NN classification processing system 100 may return α = 1 if result₁ < result₂ and α = 0 otherwise.

Finally, the output α of the garbled circuit is visible to the second cloud 180, which may encrypt it and transmit it to the first cloud 150. However, because the second cloud 180 does not know which F the first cloud 150 selected, it cannot determine whether u < v. Only when F₀ was selected, the first cloud 150 changes the value of E(α) through the SBN protocol; the GSCMP protocol then terminates by returning E(α). Here, E(α) = E(1) means that u < v, but neither the first cloud 150 nor the second cloud 180 can learn the actual value of E(α).
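The GSCMP steps above can be traced on plaintext values. The following is only a sketch: it replaces Paillier encryption and the garbled circuit with ordinary integer arithmetic, the function name `gscmp_plain` and the mask range are hypothetical, and it assumes u ≠ v so that the bit flip under F₀ is unambiguous.

```python
import random

def gscmp_plain(u: int, v: int, rng: random.Random) -> int:
    """Plaintext walk-through of the masked comparison; returns 1 if u < v, else 0.

    Assumption: u != v (ties would need a tie-breaking encoding).
    """
    # First cloud: mask both operands with fresh random numbers.
    r_u, r_v = rng.randrange(1, 1 << 16), rng.randrange(1, 1 << 16)
    m1, m2 = u + r_u, v + r_v

    # First cloud: secretly pick F0 (u > v) or F1 (u < v); this fixes the order.
    f0_selected = rng.random() < 0.5
    seen_by_c2 = (m2, m1) if f0_selected else (m1, m2)
    unmask = (-r_v, -r_u) if f0_selected else (-r_u, -r_v)

    # Circuit: two ADD gates strip the masks, one CMP gate compares.
    result1 = seen_by_c2[0] + unmask[0]
    result2 = seen_by_c2[1] + unmask[1]
    alpha = 1 if result1 < result2 else 0   # this bit is what the second cloud sees

    # First cloud: under F0 the circuit evaluated v < u, so flip the bit (SBN step).
    return 1 - alpha if f0_selected else alpha
```

Because the second cloud sees only masked values and a bit whose meaning depends on the hidden choice of F, the bit alone does not tell it which operand is smaller.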

Depending on the embodiment, the k-NN classification processing system 100 may use the GSPE protocol. Given, at the first cloud 150, an m-dimensional point E(p) and encrypted range information range expressed by lower bounds E(lb_j) and upper bounds E(ub_j) (1 ≤ j ≤ m), the GSPE protocol may return E(1) if the point p is contained in range, and E(0) if the point p does not overlap range. The overall procedure of the GSPE protocol may be as follows.

First, the k-NN classification processing system 100 may generate two arrays of random numbers, ra_j and rb_j (1 ≤ j ≤ 2m), in the first cloud 150 and then compute RA and RB, respectively, by performing data packing through Equation 4 and Equation 5.

Here, σ may denote the bit length used to represent one data item. The first cloud 150 may then encrypt RA and RB to generate E(RA) and E(RB).

Next, the first cloud 150 may multiply the lower-bound value of the point and of the range in each dimension by 2 and store the results in E(μ_j) and E(ξ_j) (1 ≤ j ≤ m), respectively. This may be done through E(μ_j) ← E(range₁.lb_j)² and E(ξ_j) ← E(range₂.lb_j)². The first cloud 150 may also multiply the upper-bound value of the point and of the range in each dimension by 2, add 1, and store the results in E(δ_j) and E(ρ_j) (1 ≤ j ≤ m), respectively. This may be done through E(δ_j) ← E(range₁.ub_j)² × E(1) and E(ρ_j) ← E(range₂.ub_j)² × E(1). In this way, the k-NN classification processing system 100 can decide containment correctly even when the two values being compared are equal.
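The expressions E(x)² and E(x)² × E(1) rely on the additive homomorphism of Paillier encryption: multiplying ciphertexts adds plaintexts, and squaring a ciphertext doubles its plaintext. The toy implementation below is a sketch for illustration only; the primes, the helper names (`enc`, `dec`, `double`, `double_plus_one`) and the choice g = N + 1 are assumptions, and the key size is far too small for real use.

```python
import math
import random

P, Q = 1009, 1013                 # toy primes (assumption, insecure)
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)              # valid because the generator is g = N + 1

def enc(m: int) -> int:
    """Paillier encryption of m with fresh randomness."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def dec(c: int) -> int:
    """Paillier decryption: L(c^lambda mod N^2) * mu mod N."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N

def double(c: int) -> int:
    """E(x) -> E(2x), used for lower bounds."""
    return pow(c, 2, N2)

def double_plus_one(c: int) -> int:
    """E(x) -> E(2x + 1), used for upper bounds."""
    return (pow(c, 2, N2) * enc(1)) % N2
```

With lb = p = 5, the lower bound becomes 10 and the point's upper value becomes 11, so a strict comparison 10 < 11 reports containment even though the original values are equal.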

Next, the first cloud 150 may select one of the two F (F₀: u > v, F₁: v > u) at random. The k-NN classification processing system 100 may not disclose to the second cloud 180 which of F₀ and F₁ was selected. Depending on the selected F, the first cloud 150 may then perform data packing, dimension by dimension, on the values of p and the upper bounds of range₂ as follows.

When F₀: u > v is selected: [two packing equations, not reproduced in this text]. When F₁: v > u is selected: [two packing equations, not reproduced in this text].

That is, when F₀ is selected, the k-NN classification processing system 100 may pack the value of p in each dimension into E(RB) and the upper-bound value of range in each dimension into E(RA). When F₁ is selected, it may pack the value of p in each dimension into E(RA) and the upper-bound value of range in each dimension into E(RB). In addition, depending on the selected F, the first cloud 150 may perform data packing, dimension by dimension, on the values of p and the lower bounds of range as follows.

When F₀: u > v is selected: [two packing equations, not reproduced in this text]. When F₁: v > u is selected: [two packing equations, not reproduced in this text].

That is, when F₀ is selected, the k-NN classification processing system 100 may pack the lower-bound value of range in each dimension into E(RB) and the upper-bound value of p in each dimension into E(RA). When F₁ is selected, it may pack the lower-bound value of range in each dimension into E(RA) and the upper-bound value of p in each dimension into E(RB). Next, the first cloud 150 may transmit E(RA) and E(RB) to the second cloud 180.

Next, the second cloud 180 may decrypt E(RA) and E(RB) to obtain RA and RB. The second cloud 180 may unpack RA [unpacking equation not reproduced in this text] to obtain ra_j + u_j (1 ≤ j ≤ 2m), and may likewise unpack RB [unpacking equation not reproduced in this text] to obtain rb_j + v_j (1 ≤ j ≤ 2m). Here, u_j and v_j denote the values of p and the lower-bound and upper-bound values of range. Because these values are masked with random numbers, and because the second cloud 180 does not know which F the first cloud 150 selected, no additional information exposure may occur.
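The packing and unpacking idea can be illustrated on plaintext integers. The packing equations themselves are not reproduced in this text, so the slot layout below (σ-bit slots, first value in the most significant slot) is an assumption, as are the function names `pack` and `unpack`.

```python
def pack(values: list[int], sigma: int) -> int:
    """Pack non-negative integers into one big integer, sigma bits per slot."""
    packed = 0
    for v in values:
        if not 0 <= v < (1 << sigma):
            raise ValueError("value does not fit in a sigma-bit slot")
        packed = (packed << sigma) | v
    return packed

def unpack(packed: int, sigma: int, count: int) -> list[int]:
    """Recover `count` sigma-bit slots, most significant slot first."""
    mask = (1 << sigma) - 1
    return [(packed >> (sigma * (count - 1 - j))) & mask for j in range(count)]

# Adding two packed words adds them slot by slot, provided no slot overflows.
ra = [100, 200, 300, 400]     # random masks (hypothetical values)
u = [6, 7, 8, 9]              # data values (hypothetical values)
masked = unpack(pack(ra, 16) + pack(u, 16), 16, 4)   # ra_j + u_j per slot
```

Under this layout a single decryption yields all 2m masked values at once, which is the cost saving that packing is meant to provide; σ must be chosen large enough that each sum ra_j + u_j still fits in its slot.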

Next, the first cloud 150 may generate a CMP-S circuit. After generating the CMP-S circuit, the first cloud 150 may derive -ra_j and -rb_j (1 ≤ j ≤ 2m) from the ra_j and rb_j (1 ≤ j ≤ 2m) that it holds and feed them, in order, as inputs to CMP-S. Meanwhile, the second cloud 180 may feed the values ra_j + u_j and rb_j + v_j (1 ≤ j ≤ 2m) that it holds, in order, as inputs to CMP-S. The second cloud 180 can see the CMP-S outputs α′_j (1 ≤ j ≤ 2m), and may encrypt them and transmit them to the first cloud 150.

Next, only when F₀ was selected, the first cloud 150 may convert the values E(α′_j) (1 ≤ j ≤ 2m) through SBN. The k-NN classification processing system 100 may then use the SM (Secure Multiplication) protocol to multiply E(α) by each E(α′_j), where E(α) is initially set to E(1).

Finally, the first cloud 150 may terminate the GSPE protocol by returning E(α). Here, E(α) = E(1) means that the point p is contained in range. However, because neither the first cloud 150 nor the second cloud 180 can learn the actual value of E(α), neither can tell whether the point is contained in the range.
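The decision logic of GSPE can be checked on plaintext values. The sketch below omits encryption, masking, and the garbled circuit entirely; it only shows how 2m strict comparisons on the doubled values multiply into a single containment bit. The function name `contains_plain` is hypothetical.

```python
def contains_plain(p: list[int], lb: list[int], ub: list[int]) -> int:
    """Return 1 if lb_j <= p_j <= ub_j in every dimension, else 0."""
    alpha = 1                                  # plays the role of E(alpha) = E(1)
    for pj, lbj, ubj in zip(p, lb, ub):
        mu, delta = 2 * pj, 2 * pj + 1         # point as lower / upper value
        xi, rho = 2 * lbj, 2 * ubj + 1         # range lower / upper bound
        alpha *= 1 if mu < rho else 0          # equivalent to p_j <= ub_j
        alpha *= 1 if xi < delta else 0        # equivalent to lb_j <= p_j
    return alpha
```

For example, `contains_plain([3, 4], [2, 4], [5, 6])` evaluates to 1 even though the second coordinate sits exactly on the lower bound, while `contains_plain([3, 7], [2, 4], [5, 6])` evaluates to 0.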

FIG. 4 is a diagram illustrating a kNN classification process according to an embodiment of the present invention.

As its kNN classification processing algorithm, the k-NN classification processing system 100 may process and classify a k-NN query through an encrypted index search step 410, a kNN search step 420, a query result verification step 430, and a category check step 440. Each step is described in detail with reference to FIGS. 5A to 9C. That is, FIGS. 5A to 6B and FIGS. 8A to 9C are diagrams for describing a kNN classification processing algorithm according to an embodiment of the present invention.

First, the k-NN classification processing system 100 may search the encrypted index through the following process (410).

Referring to FIGS. 5A and 5B, in step 510, the k-NN classification processing system 100 may, in the first cloud 150, perform the GSPE protocol on E(q) and E(node_z) (1 ≤ z ≤ num_node) to search for the node containing the query point. A node for which the E(α_z) returned by GSPE equals E(1) is a node containing the query point. However, the first cloud 150 and the second cloud 180 cannot tell which node overlaps the query region, because the k-NN classification processing system 100 encrypts the database with the Paillier cryptosystem, which provides semantic security.

In step 520, the first cloud 150 may generate a permutation function π, reorder E(α) with it, and transmit the reordered E(α) to the second cloud 180.

In step 530, the second cloud 180 may decrypt E(α), count the number c of 1s, and create c node groups Group. The second cloud 180 may assign to each node group one node whose α value is 1 and (num_node/c) - 1 nodes whose α value is 0. It may then randomly shuffle the order of the nodes assigned to each node group and transmit the groups to the first cloud 150.

In step 540, the first cloud 150 may use the inverse permutation function π⁻¹ to restore the identification numbers of the nodes belonging to each node group.

In step 550, the first cloud 150 may perform the SM protocol on the data stored in the nodes of each node group and the E(α) of each node returned by the GSPE protocol and, using the homomorphic property of the encryption, store the data residing in the nodes relevant to the query in E(cand).

In step 560, the k-NN classification processing system 100 may terminate the encrypted index search by returning E(cand).
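Steps 520 to 560 can be traced on plaintext bits to see why neither cloud learns which node was hit. The following is a sketch under simplifying assumptions: no encryption is used, every node stores the same number of scalar records, and the function name `index_search_plain` is hypothetical.

```python
import random

def index_search_plain(alpha: list[int], node_data: list[list[int]],
                       rng: random.Random) -> list[int]:
    """alpha[z] is 1 if node z contains the query; returns the candidate records."""
    n = len(alpha)
    # Step 520: first cloud permutes alpha with a secret permutation pi.
    pi = list(range(n))
    rng.shuffle(pi)
    permuted = [alpha[pi[i]] for i in range(n)]

    # Step 530: second cloud sees only permuted bits and builds c groups,
    # each holding one hit plus a share of the non-hit (decoy) nodes.
    hits = [i for i, a in enumerate(permuted) if a == 1]
    decoys = [i for i, a in enumerate(permuted) if a == 0]
    rng.shuffle(decoys)
    c = len(hits)
    groups = []
    for g, h in enumerate(hits):
        group = [h] + decoys[g::c]
        rng.shuffle(group)
        groups.append(group)

    # Step 540: first cloud maps permuted positions back to node identifiers.
    groups = [[pi[i] for i in group] for group in groups]

    # Step 550: per group, sum alpha[z] * record; only the hit node's data survives.
    cand = []
    for group in groups:
        for s in range(len(node_data[group[0]])):
            cand.append(sum(alpha[z] * node_data[z][s] for z in group))
    return cand
```

In the encrypted setting the products `alpha[z] * record` would be computed with the SM protocol and the sums with homomorphic addition, so the first cloud touches every node in a group without learning which one contributed.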

Depending on the embodiment, the encrypted index search step described with reference to FIG. 5A may be implemented as the algorithm shown in FIG. 5B.

Returning to FIG. 4, the k-NN classification processing system 100 may perform the k-NN search step through the following process (420). In the k-NN search step, the k-NN classification processing system 100 may search for the k data items closest to the query among the data extracted in the encrypted index search step 410. It may partially reuse the SkNN_m algorithm, performing the kNN search over the cnt data items returned by the encrypted index search step 410. In addition, instead of the computationally expensive SBD, SMIN, and SMIN_n protocols, the k-NN classification processing system 100 may use efficient protocols based on data packing and garbled circuits (namely, ESSED and SMS_n).

Referring to FIG. 6A, in step 610, the first cloud 150 may use the ESSED protocol to compute the squared Euclidean distances E(d_i) (1 ≤ i ≤ cnt) between the query E(q) and the cnt encrypted data items E(cand_i) returned by the encrypted index search.

In step 620, the first cloud 150 may use SMS_n to find the minimum value E(d_min) among the encrypted distances (E(d_i) | 1 ≤ i ≤ cnt). The first cloud 150 may then compute the difference between E(d_min) and each E(d_i) as E(d_min) × E(d_i)^(N-1) (1 ≤ i ≤ cnt) and store the result in E(τ_i). It may further multiply E(τ_i) by an encrypted random number to produce a randomized E(τ_i). Finally, the first cloud 150 may apply a random permutation function π to E(τ) to generate E(β) and transmit E(β) to the second cloud 180.

In step 630, the second cloud 180 may decrypt each element of the received E(β) and set E(U_i) = E(1) if D(β_i) is 0 and E(U_i) = E(0) otherwise. The second cloud 180 may then transmit E(U) to the first cloud 150.

In step 640, the first cloud 150 may apply π⁻¹ to the E(U) received from the second cloud 180 to undo the permutation and store the result in E(V). The k-NN classification processing system 100 may then perform the SM protocol on E(V_i) (1 ≤ i ≤ cnt) and the E(cand_i,j) (1 ≤ i ≤ cnt, 1 ≤ j ≤ m) returned by the encrypted index search, and store the result in E(V_i,j). Next, using the homomorphic property of the encryption, the first cloud 150 may sum the values of E(V_i,j) in each dimension through Equation 6.

In step 650, if the k query results requested by the user have not all been found yet, the k-NN classification processing system 100 must prevent the E(t_s) already selected as a kNN result from being selected again in the next iteration. To this end, the first cloud 150 may apply Equation 7 to update the value of each E(d_i) (1 ≤ i ≤ cnt).

Here, E(max) may denote the maximum value of the data domain. Because the data item selected as a kNN result has E(V_i) = E(1), Equation 7 changes its distance to E(d_i) = E(max). Because the remaining data items have E(V_i) = E(0), their E(d_i) values are left unchanged. In this way, the k-NN classification processing system 100 can prevent encrypted data already selected as a kNN result from being selected again.

The k-NN classification processing system 100 may repeat the above process until k data items have been found. In step 660, the first cloud 150 may terminate the algorithm by returning the k query results found.
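The selection loop of steps 620 to 660 can be followed on plaintext values. The sketch below uses no encryption and makes several assumptions that are not stated in the text: distances are distinct, the random masking is a multiplication by a nonzero factor, and the distance update has the form V_i·max + (1 - V_i)·d_i. The function name `knn_select_plain` is hypothetical.

```python
import random

def knn_select_plain(cand: list[list[int]], dist: list[int], k: int,
                     max_dist: int, rng: random.Random) -> list[list[int]]:
    """Return the k records with the smallest distances, one per iteration."""
    d = list(dist)
    cnt, m = len(d), len(cand[0])
    results = []
    for _ in range(k):
        d_min = min(d)                                        # SMS_n
        # tau_i is zero only at the minimum; the random factor hides the gap.
        tau = [(d_min - di) * rng.randrange(1, 1000) for di in d]
        pi = list(range(cnt))
        rng.shuffle(pi)
        beta = [tau[pi[i]] for i in range(cnt)]               # sent to second cloud

        u = [1 if b == 0 else 0 for b in beta]                # second cloud, step 630

        v = [0] * cnt                                         # first cloud undoes pi
        for i in range(cnt):
            v[pi[i]] = u[i]

        # Select the record: per-dimension sum of V_i * cand_i,j.
        results.append([sum(v[i] * cand[i][j] for i in range(cnt)) for j in range(m)])
        # Assumed update: push the selected distance to the domain maximum.
        d = [v[i] * max_dist + (1 - v[i]) * d[i] for i in range(cnt)]
    return results
```

The second cloud learns only that one permuted position holds a zero, and the first cloud receives only an encrypted indicator vector, so neither side learns which record was selected.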

Depending on the embodiment, the k-NN search step described with reference to FIG. 6A may be implemented as the algorithm shown in FIG. 6B.

Returning to FIG. 4, the k-NN classification processing system 100 may perform the query result verification step, which uses node expansion search, through the following process (430).

The result of the kNN query search is obtained from the data of only some of the nodes partitioned by the kd-tree. It may therefore be necessary to verify whether an adjacent kd-tree node holds data closer to the query. To this end, the k-NN classification processing system 100 may search for nodes that lie closer to the query point than dist_k, the distance to the k-th result among the results E(t) returned by the k-NN search step 420. That is, it may search for nodes whose shortest distance from the query point is less than dist_k. For this purpose, the k-NN classification processing system 100 may use the shortest point of Definition 1.

Definition 1. Given a point p and a region, the shortest point sp is the point that has the shortest distance to p among all points in the region.

The properties of the shortest point are described with reference to FIG. 7. FIG. 7 is a diagram illustrating point-region relationships in one-dimensional space according to an embodiment of the present invention.

As shown in FIG. 7, given a one-dimensional point p = 3 and three regions (range₁, range₂, range₃), the positional relationship between the point and a region falls into one of three broad cases.

i) When, as with range₁, both the lower bound (e.g., 0) and the upper bound (e.g., 2) of the region are smaller than the value of point p (e.g., 3), the shortest point of range₁ with respect to p is the upper bound of the region. ii) When, as with range₂, both the lower bound (e.g., 4) and the upper bound (e.g., 6) of the region are larger than the value of point p (e.g., 3), the shortest point of range₂ with respect to p is the lower bound of the region. iii) When, as with range₃, the value of point p (e.g., 3) lies between the lower bound (e.g., 2) and the upper bound (e.g., 4) of the region, the shortest point of range₃ with respect to p is p itself. The present application extends this property to multidimensional space.
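On plaintext values, the three cases amount to clamping each coordinate of p to the interval of the region in that dimension. The following minimal sketch (the function name `shortest_point` is hypothetical) reproduces the FIG. 7 example and extends naturally to m dimensions.

```python
def shortest_point(p: list[int], lb: list[int], ub: list[int]) -> list[int]:
    """Clamp each coordinate of p into [lb_j, ub_j]; the result is the nearest point."""
    return [min(max(pj, lbj), ubj) for pj, lbj, ubj in zip(p, lb, ub)]

# FIG. 7 example with p = 3:
sp_range1 = shortest_point([3], [0], [2])   # region entirely below p
sp_range2 = shortest_point([3], [4], [6])   # region entirely above p
sp_range3 = shortest_point([3], [2], [4])   # region contains p
```

Clamping per dimension is valid because the squared Euclidean distance is a sum of independent per-dimension terms, so minimizing each term separately minimizes the total.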

Based on this property, the k-NN classification processing system 100 may search for the shortest point of each node with respect to a query over data encrypted with the Paillier cryptosystem, and verify the query result, through the following process.

Referring to FIG. 8A, in step 810, the first cloud 150 may use the ESSED protocol to compute the distance E(dist_k) between the query E(q) and E(t_k).

In step 820, the first cloud 150 may perform the GSCMP protocol between E(q_j) and the lower bound of each node, E(node_z.lb_j) (1 ≤ z ≤ num_node, 1 ≤ j ≤ m), and store the result in E(ψ₁). It may likewise perform the GSCMP protocol between E(q_j) and the upper bound of each node, E(node_z.ub_j) (1 ≤ z ≤ num_node, 1 ≤ j ≤ m), and store the result in E(ψ₂). When E(q_j) is less than or equal to the lower bound or the upper bound of the node, the corresponding E(ψ) takes the value E(1).

In step 830, the k-NN classification processing system 100 may perform the SBXOR (Secure Bit-XOR) protocol on E(ψ₁) and E(ψ₂) and store the result in E(ψ₃).

In step 840, the k-NN classification processing system 100 may apply Equation 8 and Equation 9 to compute the shortest point E(sp_z,j) in each dimension.
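The comparison bits ψ₁, ψ₂, and ψ₃ are enough to pick the shortest point using only additions and multiplications, which is what makes the selection expressible over encrypted data. Equations 8 and 9 are not reproduced in this text, so the arithmetic form below is an assumption rather than the claimed formula; the function name is hypothetical and the code works on plaintext values.

```python
def shortest_point_from_bits(q: int, lb: int, ub: int) -> int:
    """Select the 1-D shortest point from comparison bits (assumed arithmetic form)."""
    psi1 = 1 if q <= lb else 0        # GSCMP of q against the lower bound
    psi2 = 1 if q <= ub else 0        # GSCMP of q against the upper bound
    psi3 = psi1 ^ psi2                # SBXOR: 1 exactly when lb < q <= ub
    # Inside the interval take q; otherwise take lb (q at or below it) or ub.
    return psi3 * q + (1 - psi3) * (psi1 * lb + (1 - psi1) * ub)
```

When q equals lb, both ψ₁ and ψ₂ are 1, so ψ₃ is 0 and the expression returns lb, which coincides with q; the result therefore agrees with clamping q to [lb, ub] in every case.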

In step 850, once the shortest point E(sp_z) (1 ≤ z ≤ num_node) of each node with respect to E(q) has been found, the first cloud 150 may use the ESSED protocol to compute the squared Euclidean distance between E(q) and each E(sp_z) and store it in E(spdist_z) (1 ≤ z ≤ num_node). In addition, through Equation 10, the first cloud 150 may securely change the distance to the shortest point of every node that has already been searched to E(max), the maximum value in the domain.

Here, E(α_z) is the value returned through the GSPE protocol in step 810; a node that has already been searched has E(α_z) = E(1), and any other node has E(α_z) = E(0).

Thereafter, the k-NN classification processing system 100 may, at the first cloud 150, perform the GSCMP protocol on E(spdist_z) and E(dist_k) and store the result in E(α_z). A node whose E(spdist_z) is smaller than E(dist_k) may require further search, and for such a node the GSCMP protocol returns E(α_z)=E(1). Because E(spdist_z) of an already searched node holds the maximum value in the domain, such a node is not selected as an expansion node for query result verification.
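The pruning logic of step 850 can be sketched on plaintext values as follows. Equation 10 is not reproduced in this passage, so the masking formula below is an assumed form, and `expansion_flags` and `domain_max` are hypothetical names chosen for the illustration.

```python
def essed(x, y):
    """Plaintext stand-in for ESSED: squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def expansion_flags(q, shortest_points, searched, dist_k, domain_max):
    """Return one 0/1 flag per node: 1 means the node still needs to be searched."""
    flags = []
    for sp, alpha in zip(shortest_points, searched):
        spdist = essed(q, sp)
        # Assumed form of Equation 10: nodes already searched (alpha = 1)
        # get the domain maximum, so they can never beat dist_k.
        spdist = alpha * domain_max + (1 - alpha) * spdist
        flags.append(1 if spdist < dist_k else 0)
    return flags
```

With `dist_k` set to the squared distance of the current k-th neighbor, only unsearched nodes whose region comes closer than that distance are flagged for expansion.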

In step 860, the k-NN classification processing system 100 may re-run the encrypted index search step 410 to extract all data belonging to the nodes located within distance dist_k of E(q) and add them to E(t). The system may then re-run the k-NN search step 420 on E(t) to obtain the final query result E(result_i) (1≤i≤k).

In step 870, the k-NN classification processing system 100 may, at the first cloud 150, identify the category (label) of E(result_i) (1≤i≤k). The category identification process is described below with reference to FIGS. 9A and 9C.

Depending on the embodiment, the query result verification step described with reference to FIG. 8A may be implemented by the algorithm shown in FIG. 8B.

Returning to FIG. 4, the k-NN classification processing system 100 may perform the category identification step through the following process (440).

Referring to FIGS. 9A and 9C, the k-NN classification processing system 100 may compare the full category set Label = {L₁, L₂, …, L_n} with the categories of the extracted data Label = {L'₁, L'₂, …, L'_k}. First, the system may compute the frequencies through SF (Secure Frequency) between the first cloud 150 and the second cloud 180 (910) (line 1). By comparing the computed frequencies, it may select the category with the highest value (920) (line 2). At the first cloud 150, it may add a random integer r_q to the selected category and store the result in E(γ_q) (930) (line 3). Next, it may send E(γ_q) to the second cloud 180 and the random integer r_q to Bob (940) (line 4). At the second cloud 180, it may decrypt the received E(γ_q) and send the result to the user terminal (AU) (950) (lines 5~6). Finally, at the user terminal (AU), it may combine the results from the first cloud 150 and the second cloud 180 to obtain the category to which the query belongs as the final result (lines 7~8).
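The masking idea in lines 3 to 8 can be sketched with a toy additively homomorphic cryptosystem. This passage does not name the cryptosystem, so the use of Paillier here is an assumption, as are all function names and the tiny primes, which are insecure and serve only to keep the example readable. The frequency count and the selection of the top category are done in the clear as a stand-in for SF and the secure comparison.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key pair (insecure key size, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    return (n, n + 1), (lam, pow(lam, -1, n))  # valid because g = n + 1


def encrypt(pk, m, rng):
    n, g = pk
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def add(pk, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (pk[0] * pk[0])


def classify(neighbor_labels, all_labels, pk, sk, rng):
    # Lines 1-2 (stand-in for SF and comparison): pick the most frequent label.
    best = max(all_labels, key=neighbor_labels.count)
    # Line 3: the first cloud adds a random integer r_q under encryption.
    r_q = rng.randrange(1, 1000)
    gamma = add(pk, encrypt(pk, best, rng), encrypt(pk, r_q, rng))
    # Lines 4-6: the second cloud decrypts and sees only the masked value.
    masked = decrypt(pk, sk, gamma)
    # Lines 7-8: the user removes the mask it received from the first cloud.
    return (masked - r_q) % pk[0]
```

The point of the mask is that the key-holding cloud only ever sees `best + r_q`, while the cloud that knows `r_q` never sees a decrypted value; only the user holds both pieces.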

Depending on the embodiment, the k-NN classification processing system 100 may classify data as shown in FIG. 9C. For the insurance data shown in FIG. 9, an insurance company employee may enter the information of a new customer and use the k-NN classification processing system 100 to determine which grade that customer belongs to. For example, suppose the new customer is 33 years old, has disease A, and earns an annual salary of 45 million won. When the query is issued with k=2, the k-NN classification processing system 100 may extract the k nearest records, namely records 1 and 2, and from them determine that the new customer belongs to grade A. Because all of this data is sensitive customer data, the system may carry out the proposed algorithm with everything kept in encrypted form. In other words, the k-NN classification processing system 100 may determine the grade of the new customer's data without leaking information.
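For orientation, the plaintext computation that the encrypted protocol reproduces is an ordinary k-NN majority vote. The sketch below uses made-up records, since the table of FIG. 9 is not reproduced in this passage; the field layout (age, disease A flag, salary in units of 10,000 won) and all values except the query are hypothetical.

```python
from collections import Counter


def knn_classify(records, query, k):
    """Return the ids of the k nearest records and their majority grade."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(records, key=lambda rec: sq_dist(rec["x"], query))[:k]
    ids = [rec["id"] for rec in nearest]
    grade = Counter(rec["grade"] for rec in nearest).most_common(1)[0][0]
    return ids, grade


# Hypothetical records, NOT the data of FIG. 9.
records = [
    {"id": 1, "x": (34, 1, 4600), "grade": "A"},
    {"id": 2, "x": (30, 1, 4300), "grade": "A"},
    {"id": 3, "x": (52, 0, 9000), "grade": "B"},
    {"id": 4, "x": (24, 0, 2500), "grade": "C"},
]
ids, grade = knn_classify(records, (33, 1, 4500), k=2)
```

With these made-up records the two nearest neighbors are records 1 and 2 and the vote yields grade A, mirroring the outcome described in the example. In the described system the same result is reached without any party seeing the records or the query in the clear.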

Hereinafter, the workflow of the k-NN classification processing system 100 according to embodiments of the present invention is described in detail with reference to FIG. 10.

FIG. 10 is a flowchart illustrating the sequence of a garbled circuit-based k-NN classification processing method according to an embodiment of the present invention.

The garbled circuit-based k-NN classification processing method according to this embodiment may be performed by the k-NN classification processing system 100 described above.

First, as result data for a kNN (k Nearest Neighbor) query from a user terminal is derived, the k-NN classification processing system 100 performs secure frequency (SF) between the first cloud and the second cloud, which took part in deriving the result data, to calculate the frequency between the kNN query and the past result data derived by the kNN query (1010). That is, in step 1010, the k-NN classification processing system 100 may perform the secure frequency protocol SF(Δ, Δ') on the full label set Label = {L₁, L₂, …, L_n} and the labels of the extracted data Label = {L'₁, L'₂, …, L'_k} to calculate the frequencies.

Next, the k-NN classification processing system 100 selects a search category through the calculation of the frequencies (1020). That is, in step 1020, the k-NN classification processing system 100 may compare the calculated frequencies and select the category with the highest value.

Step 1020 may also include, as a first step, adding an arbitrary code value, at the first cloud, to the category of the past result data for which the highest frequency was calculated and encoding it into an encrypted category, and, as a second step, providing the encrypted category from the first cloud to the second cloud and providing the code value to the user terminal. That is, at the first cloud, the k-NN classification processing system 100 may add a random integer r_q to the selected category and store the result in E(γ_q).

Step 1020 may further include, as a third step, decrypting the encrypted category at the second cloud to generate a decrypted category and providing the decrypted category to the user terminal, and, as a fourth step, restoring the encrypted category at the user terminal using the provided code value and the decrypted category. That is, the k-NN classification processing system 100 may send E(γ_q) to the second cloud for decryption and send the random integer r_q to Bob. The second cloud may then decrypt the received E(γ_q) and send the result to the user terminal (AU).

Step 1020 may further include, if restoration to the encrypted category is possible, selecting as the search category the category of the past result data for which the highest frequency was calculated. That is, the k-NN classification processing system 100 may select, as the search category, the category with the highest frequency among the k nearest data items returned for the query.

Step 1020 may further include repeating the first to fourth steps while going through the categories of the past result data in order of frequency, and stopping the repetition once restoration to the encrypted category is possible. That is, the k-NN classification processing system 100 may repeat the first to fourth steps at least until a search category is selected.

Next, if the search category matches the category of the derived result data, the k-NN classification processing system 100 provides the derived result data to the user terminal (1030). That is, when the category selected through the frequency comparison matches the category of the result data, the k-NN classification processing system 100 may provide the result data to the user terminal.

Depending on the embodiment, the k-NN classification processing system 100 may build the first cloud and the second cloud, which is non-colluding with the first cloud. That is, when each cloud runs the cryptographic protocols to process a user query, the system may ensure that it does not collude with the other cloud by exchanging data or information in order to gain additional information from what it obtained during query processing.

The k-NN classification processing system 100 may also maintain, in the first cloud, an encrypted database obtained by encrypting the data stored in the original database, together with the encryption public key generated in association with that encryption.

The k-NN classification processing system 100 may also maintain, in the second cloud, the decryption secret key corresponding to the encryption public key.

In addition, when the kNN (k Nearest Neighbor) query is generated at the user terminal to which the encryption public key has been distributed, the k-NN classification processing system 100 may perform secure multiparty computation between the first cloud and the second cloud, based on the encryption public key and the decryption secret key, to derive the result data for the kNN query from the encrypted database. That is, the k-NN classification processing system 100 may determine whether a kNN query encrypted with the encryption public key used to encrypt the database has been received from the user terminal. For example, the terminal may request a user query by encrypting the query point with the encryption public key 130, as in E(q_j) (1≤j≤m). When a user query is received from the user terminal, the k-NN classification processing system 100 may process the kNN query by performing secure multiparty computation (SMC) between the first cloud and the second cloud based on a selected cryptographic operation protocol.

Here, secure multiparty computation refers to securely carrying out protocols and operations through other entities (the first cloud and the second cloud) without exposing the original data held by the data owner.

Depending on the embodiment, the k-NN classification processing system 100 may partition the data stored in the original database along multiple attributes and dimensions (columns) to construct a kd-tree, and may maintain an encrypted kd-tree, obtained by encrypting the kd-tree, in the first cloud. That is, the k-NN classification processing system 100 may split the data stored in the database into units of a predetermined size (for example, F items) and build a kd-tree with multiple leaf nodes that hold the split data. The encrypted kd-tree may then be maintained in the first cloud.

For example, the k-NN classification processing system 100 may construct from the database a kd-tree of level h with a total of 2^(h-1) leaf nodes, where each leaf node can store at most F (FanOut) data items.

Each leaf node of the kd-tree may store, in plaintext form, region information about the node region it is responsible for and the data IDs of the data contained in that region. Here, the region information may include, for each of the m attributes, the lower bound lb_z,j and the upper bound ub_z,j of the node region (1≤z≤num_node, 1≤j≤m).
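A minimal plaintext sketch of such an index is given below. It is an illustration under assumptions: the median split and the cycling of the split axis are common kd-tree choices that the passage does not specify, and `build_leaves` is a hypothetical helper. Each leaf keeps its region bounds and the IDs of at most `fanout` (F) data items, matching the leaf contents described above.

```python
def build_leaves(points, fanout, lb, ub, depth=0):
    """Split (id, coords) pairs into kd-tree leaves with region bounds.

    points: list of (data_id, coords) pairs; lb/ub: bounds of the current region.
    Returns a list of leaves, each with 'lb', 'ub' and at most `fanout` ids.
    """
    if len(points) <= fanout:
        return [{"lb": list(lb), "ub": list(ub), "ids": [i for i, _ in points]}]
    axis = depth % len(lb)                    # cycle through the dimensions
    pts = sorted(points, key=lambda ip: ip[1][axis])
    mid = len(pts) // 2
    split = pts[mid][1][axis]                 # median coordinate on this axis
    left_ub = list(ub)
    left_ub[axis] = split                     # left region ends at the split
    right_lb = list(lb)
    right_lb[axis] = split                    # right region starts at the split
    return (build_leaves(pts[:mid], fanout, lb, left_ub, depth + 1)
            + build_leaves(pts[mid:], fanout, right_lb, ub, depth + 1))
```

Because the leaf regions tile the whole domain, every query point falls inside at least one leaf region, which is what lets a point-enclosure test locate the starting node for the search.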

In the step of deriving the result data, the k-NN classification processing system 100 may process the kNN query on the encrypted database, using the encrypted kd-tree and a selected cryptographic operation protocol, to derive the result data. That is, the k-NN classification processing system 100 may search the encrypted index based on the encrypted kd-tree and search adjacent kd-tree nodes to process the kNN query.

Also in the step of deriving the result data, the k-NN classification processing system 100 may select any one of the ESSED (Enhanced Secure Squared Euclidean Distance) protocol, the GSCMP (Garbled Circuit based Secure Compare) protocol, and the GSPE (Garbled Circuit based Secure Point Enclosure) protocol as the cryptographic operation protocol. For example, the k-NN classification processing system 100 may use the ESSED protocol to compute E(|X-Y|²), the squared distance between vectors E(X) and E(Y). Using the GSCMP protocol, when E(u) and E(v) are given to the first cloud 150, the system may return E(1) if u<v and E(0) if u>v. Using the GSPE protocol, when the first cloud 150 is given an m-dimensional point E(p) and encrypted region information 'range' expressed by a lower bound E(lb_j) and an upper bound E(ub_j) (1≤j≤m), the system may return E(1) if the point p is contained in the region range.
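The input/output behavior of the three protocols can be summarized by plaintext reference functions. These are only functional specifications, not the garbled-circuit constructions: in the described system the inputs and outputs are ciphertexts. Two details are assumptions: the tie case u = v returns 1, following the "less than or equal" wording used for step 820, and the region bounds in `gspe` are treated as inclusive.

```python
def essed(x, y):
    """ESSED reference: squared Euclidean distance |X - Y|^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def gscmp(u, v):
    """GSCMP reference: 1 if u is below v (ties assumed to return 1), else 0."""
    return 1 if u <= v else 0


def gspe(p, lb, ub):
    """GSPE reference: 1 if point p lies in the box [lb, ub] (bounds assumed inclusive)."""
    return 1 if all(l <= c <= u for c, l, u in zip(p, lb, ub)) else 0
```

Working with the squared distance avoids a square root, which preserves the ordering of neighbors while needing only additions and multiplications.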

Depending on the embodiment, when the ESSED protocol is selected as the cryptographic operation protocol, the k-NN classification processing system 100 may, in the step of deriving the result data, perform the summation of the per-dimension distances for any data pair in the encrypted kd-tree in two dimensions, thereby reducing the number of operations on encrypted data needed to derive the result data. That is, the k-NN classification processing system 100 may use the ESSED protocol to compute the squared Euclidean distance E(d_i) (1≤i≤cnt) between the query E(q) and the cnt encrypted data items E(cand_i) returned by the encrypted index search. The k-NN classification processing system 100 may use the ESSED protocol both for the kNN search and for query result verification.

Depending on the embodiment, when the GSCMP protocol is selected as the cryptographic operation protocol, the k-NN classification processing system 100 may, in the step of deriving the result data, exchange random numbers between the first cloud and the second cloud and determine the value to return according to a magnitude comparison of any data pair in the encrypted kd-tree. For example, during query result verification, the k-NN classification processing system 100 may run the GSCMP protocol between E(q_j) and the node's lower bound E(node_z.lb_j) and between E(q_j) and the node's upper bound E(node_z.ub_j), storing the results in E(ψ₁) and E(ψ₂), respectively. In this case, the k-NN classification processing system 100 may return E(1) if ψ₁<ψ₂ and E(0) if ψ₁>ψ₂. The data exchanged during this process may contain random numbers.

Depending on the embodiment, when the GSPE protocol is selected as the cryptographic operation protocol, the k-NN classification processing system 100 may, in the step of deriving the result data, return 'E(1)' as the result of the GSPE protocol if the m-dimensional data E(p) in the encrypted kd-tree is contained in the query region associated with the kNN query, and return 'E(0)' if the data E(p) is not contained in the query region. For example, while searching the encrypted index, the k-NN classification processing system 100 may run the GSPE protocol on E(q) and E(node_z) (1≤z≤num_node) to find the node that contains the query point. A node for which the value E(α_z) returned by GSPE is E(1) may be the node that contains the query point. The k-NN classification processing system 100 may ensure that neither the first cloud nor the second cloud can learn which node overlaps the query region.

Depending on the embodiment, in the step of deriving the result data, the k-NN classification processing system 100 may, at the first cloud, search the encrypted database for a plurality of data E(a) relating to the point of the kNN query, based on the GSPE protocol, and send them to the second cloud. At the second cloud, it may decrypt each of the plurality of data E(a) into a plurality of data E'(a) using the decryption secret key and generate node groups that each contain the plurality of data E'(a). At the first cloud, it may receive the node groups from the second cloud in a predetermined order and perform secure multiparty computation between the first cloud and the second cloud based on the SM protocol, using the data E'(a) stored in the node groups and the data E(a).

That is, for the encrypted index search, the k-NN classification processing system 100 may run the GSPE protocol at the first cloud to find the node E(a) that contains the query point. It may then permute the order of E(α) and send it to the second cloud 180, which decrypts it and generates c node groups (Group). The order of the nodes assigned to each node group may be randomly shuffled before the groups are sent to the first cloud 150. At the first cloud, the k-NN classification processing system 100 may reverse the change to the identification numbers of the nodes in each node group and run the SM protocol using the data stored in the nodes of each node group and the E(α) of each node. The k-NN classification processing system 100 may then end the encrypted index search by returning E(cand).
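The data flow of this index search can be sketched on plaintext values. Several points are assumptions made for the illustration: SM is taken to be a secure multiplication, each node group is assumed to contain exactly one node whose α is 1, every node is assumed to hold exactly `fanout` scalar values, and `index_search` is a hypothetical name. In the described system α and the node data stay encrypted at the first cloud.

```python
import random


def index_search(alphas, node_data, fanout, rng):
    """Return the data of the nodes flagged by alpha without branching on alpha."""
    # First cloud: permute the alpha vector before sending it out.
    perm = list(range(len(alphas)))
    rng.shuffle(perm)
    permuted = [alphas[i] for i in perm]

    # Second cloud: after decryption, build groups with one hit each (assumed rule)
    # and shuffle the members so positions reveal nothing.
    hits = [i for i, a in enumerate(permuted) if a == 1]
    misses = [i for i, a in enumerate(permuted) if a == 0]
    if not hits:
        return []
    groups = [[h] for h in hits]
    for k, pos in enumerate(misses):
        groups[k % len(groups)].append(pos)
    for group in groups:
        rng.shuffle(group)

    # First cloud: undo the permutation and combine alpha * data per slot.
    cand = []
    for group in groups:
        nodes = [perm[pos] for pos in group]
        for s in range(fanout):
            cand.append(sum(alphas[z] * node_data[z][s] for z in nodes))
    return cand
```

Every node in a group contributes to the sum, but only the node with α = 1 contributes a nonzero term, so the candidate data is extracted without revealing which node inside the group was selected.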

By classifying and analyzing the result data of the k-NN query processing algorithm, this garbled circuit-based k-NN classification processing method can extract the grade of the result data without leaking information.

In addition, by performing at least one cryptographic operation protocol among the ESSED, GSCMP, and GSPE protocols, which are based on garbled circuits and data packing, the garbled circuit-based k-NN classification processing method can reduce the number of operations and provide efficient query processing performance.

In addition, by providing a k-NN query processing algorithm that supports encrypted index search based on the improved cryptographic operation protocols and protects data access patterns on the encrypted database, the garbled circuit-based k-NN classification processing method can prevent additional information from being exposed, supporting not only data protection and user query protection but also protection of data access patterns during query processing.

The method according to an embodiment of the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to a limited set of embodiments and drawings, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

100: garbled circuit-based k-NN classification processing system
110: database 120: kd tree
130: encryption public key 140: decryption secret key
150: first cloud 160: encryption database
170: encrypted kd tree 180: second cloud
190: user terminal

Claims

A garbled circuit-based k-NN classification processing method comprising:
building a first cloud and a second cloud that does not collude with the first cloud;
maintaining, in the first cloud, an encrypted database in which data stored in an original database is encrypted, and an encryption public key generated in association with the encryption;
maintaining, in the second cloud, a decryption secret key corresponding to the encryption public key;
when a kNN (k Nearest Neighbor) query is generated at a user terminal to which the encryption public key has been distributed, performing Secure Multiparty Computation (SMC) between the first cloud and the second cloud, based on the encryption public key and the decryption secret key, to derive result data of the kNN query from the encrypted database;
as the result data is derived, performing Secure Frequency (SF) between the first cloud and the second cloud to calculate a frequency between the kNN query and past result data derived by the kNN query;
selecting a search category through the calculation of the frequency; and
providing the derived result data to the user terminal if the search category matches the category of the derived result data,
wherein deriving the result data comprises:
retrieving, in the first cloud, based on a GSPE protocol, a plurality of data E(a) relating to a point of the kNN query from the encrypted database, and transmitting the plurality of data E(a) to the second cloud;
decrypting, in the second cloud, each of the plurality of data E(a) into a plurality of data E'(a) through the decryption secret key, and creating node groups that each include the plurality of data E'(a); and
receiving, in the first cloud, the node groups from the second cloud in a predetermined order, and performing a multiparty computation between the first cloud and the second cloud based on an SM protocol using the data E'(a) and the data E(a) stored in each node group.
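
As a point of reference for the claim above, the following is a minimal plaintext sketch of the result that the encrypted pipeline is meant to reproduce: find the k nearest records, count how often each category occurs among them, and select the most frequent category. It performs no encryption and no two-cloud interaction; the function names and sample records are hypothetical and are not taken from the patent.

```python
from collections import Counter

def squared_distance(a, b):
    # Squared Euclidean distance between two equal-length points.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_classify(records, query, k):
    """records is a list of (point, category) pairs (hypothetical layout)."""
    # Keep the k records closest to the query point.
    neighbours = sorted(records, key=lambda rec: squared_distance(rec[0], query))[:k]
    # Count category occurrences among the k neighbours (plaintext analogue
    # of the frequency calculation) and pick the most frequent one.
    frequency = Counter(category for _, category in neighbours)
    best_category, _ = frequency.most_common(1)[0]
    return best_category, frequency

records = [((1, 1), "A"), ((2, 1), "A"), ((8, 9), "B"), ((9, 8), "B"), ((2, 2), "B")]
category, frequency = knn_classify(records, query=(1, 2), k=3)
```

In the claimed method the same outcome would be obtained over encrypted data, with neither cloud learning the records, the query, or the access pattern.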

The method of claim 1,
wherein selecting the search category comprises:
a first step of adding, in the first cloud, an arbitrary code value to the category of the past result data for which the highest frequency was calculated, and encoding the result into an encrypted category;
a second step of providing, from the first cloud, the encrypted category to the second cloud and the code value to the user terminal;
a third step of decrypting, in the second cloud, the encrypted category to generate a decrypted category, and providing the decrypted category to the user terminal;
a fourth step of restoring, at the user terminal, the encrypted category by using the provided code value and the decrypted category; and
if restoration to the encrypted category is possible, selecting, as the search category, the category of the past result data for which the highest frequency was calculated.
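
One possible reading of the four steps is an additive masking exchange: the first cloud hides the category behind a random code value, the second cloud sees only the masked value after decryption, and the user terminal removes the code value. The sketch below illustrates only the masking arithmetic and omits the encryption layer entirely. `MODULUS`, `cloud1_mask`, and `user_unmask` are hypothetical names, and representing a category as a small integer identifier is an assumption.

```python
import secrets

MODULUS = 2 ** 32  # hypothetical plaintext space; not specified in the claims

def cloud1_mask(category_id):
    # First cloud: add a random code value to the category identifier.
    # In the claimed method this addition would happen on encrypted data.
    code = secrets.randbelow(MODULUS)
    masked = (category_id + code) % MODULUS
    return masked, code  # masked goes to the second cloud, code to the user

def user_unmask(masked, code):
    # User terminal: remove the code value from the value that the
    # second cloud decrypted and forwarded.
    return (masked - code) % MODULUS

masked, code = cloud1_mask(7)
recovered = user_unmask(masked, code)
```

Because the code value is uniformly random, the masked value alone reveals nothing about the category to the party that decrypts it.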

The method of claim 2,
wherein selecting the search category further comprises:
repeating the first step through the fourth step while sequentially using categories of the past result data in an order determined by the magnitude of the frequency, and stopping the repetition once restoration to the encrypted category is possible.

(Deleted)

The method of claim 1, further comprising:
dividing the data stored in the original database into a plurality of attributes and columns to construct a kd tree; and
maintaining, in the first cloud, an encrypted kd tree obtained by encrypting the kd tree,
wherein deriving the result data further comprises:
deriving the result data by processing the kNN query on the encrypted database, using the encrypted kd tree, in accordance with a selected encryption operation protocol.
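
For readers unfamiliar with the index structure, the following is a minimal sketch of a textbook kd tree built on plaintext points by splitting on the median along a cycling axis. It is an illustration only: the patent's tree may store data differently (for example, in leaf regions), and the encryption of the tree is not shown. `KdNode` and `build_kd_tree` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]

@dataclass
class KdNode:
    point: Point                      # splitting point stored at this node
    axis: int                         # dimension used for the split
    left: Optional["KdNode"] = None   # points at or below the median on axis
    right: Optional["KdNode"] = None  # points above the median on axis

def build_kd_tree(points: Sequence[Point], depth: int = 0) -> Optional[KdNode]:
    if not points:
        return None
    # Cycle through the dimensions as the tree gets deeper.
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KdNode(
        point=ordered[mid],
        axis=axis,
        left=build_kd_tree(ordered[:mid], depth + 1),
        right=build_kd_tree(ordered[mid + 1:], depth + 1),
    )

tree = build_kd_tree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
```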

The method of claim 5,
wherein deriving the result data further comprises:
selecting, as the encryption operation protocol, any one of an Enhanced Secure Squared Euclidean Distance (ESSED) protocol, a Garbled Circuit based Secure Compare (GSCMP) protocol, and a Garbled Circuit based Secure Point Enclosure (GSPE) protocol.

The method of claim 5,
wherein, if the ESSED protocol is selected as the encryption operation protocol, deriving the result data further comprises:
summing, in two dimensions, the per-dimension distances for any pair of data in the encrypted kd tree, thereby reducing the number of operations on encrypted data required to derive the result data.
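
The claim does not spell out the arithmetic, so the sketch below is only one way a squared Euclidean distance could be computed between two clouds with few operations on ciphertexts. It assumes an additively homomorphic scheme (a toy Paillier variant with insecure demonstration primes, which this section does not name) and an assumed blinding scheme: the first cloud blinds each encrypted difference with a random mask, the second cloud decrypts, sums the squares in plaintext and encrypts once, and the first cloud strips the mask terms. All function names are hypothetical. It requires Python 3.8 or later for `pow` with a negative exponent.

```python
import math
import random

def keygen(p=1009, q=1013):
    # Toy primes for illustration only; far too small to be secure.
    n = p * q
    phi = (p - 1) * (q - 1)
    return n, (phi, pow(phi, -1, n))  # public modulus, secret (phi, phi^-1 mod n)

def encrypt(n, m):
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            # With generator g = n + 1, g^m mod n^2 equals 1 + m*n.
            return (1 + (m % n) * n) * pow(r, n, n2) % n2

def decrypt(n, secret_key, c):
    phi, mu = secret_key
    return ((pow(c, phi, n * n) - 1) // n) * mu % n

def secure_squared_distance(n, secret_key, enc_x, enc_y):
    n2 = n * n
    # Cloud 1: E(d_i) = E(x_i - y_i), then blind with a random mask r_i.
    enc_diff = [cx * pow(cy, n - 1, n2) % n2 for cx, cy in zip(enc_x, enc_y)]
    masks = [random.randrange(1, 100) for _ in enc_diff]
    blinded = [cd * encrypt(n, r) % n2 for cd, r in zip(enc_diff, masks)]
    # Cloud 2 (holds the secret key): decrypt, sum squares, encrypt once.
    total = sum(decrypt(n, secret_key, c) ** 2 for c in blinded) % n
    enc_total = encrypt(n, total)
    # Cloud 1: remove mask terms, since d^2 = (d + r)^2 - 2*r*d - r^2.
    for cd, r in zip(enc_diff, masks):
        enc_total = enc_total * pow(cd, (-2 * r) % n, n2) % n2
    return enc_total * encrypt(n, -sum(r * r for r in masks)) % n2

n, secret_key = keygen()
enc_x = [encrypt(n, v) for v in (3, 7)]
enc_y = [encrypt(n, v) for v in (10, 2)]
distance = decrypt(n, secret_key, secure_squared_distance(n, secret_key, enc_x, enc_y))
```

In this sketch the second cloud performs a single encryption regardless of the number of dimensions, which is the kind of saving the claim appears to describe.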

The method of claim 5,
wherein, if the GSCMP protocol is selected as the encryption operation protocol, deriving the result data further comprises:
exchanging random numbers between the first cloud and the second cloud to determine the value of the data returned according to a magnitude comparison for any data pair in the encrypted kd tree.
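
A plaintext simulation may help show what a masked comparison of this kind computes. In the sketch below, which is an assumption about the general shape of such a protocol rather than the patent's exact procedure, the first cloud blinds both values with random numbers, the second cloud would hold only the blinded values, and a comparison function (standing in for the garbled circuit) removes the masks internally and outputs a single bit. `BOUND`, `cloud1_blind`, and `compare_circuit` are hypothetical names.

```python
import secrets

BOUND = 2 ** 16  # hypothetical upper bound on the random masks

def cloud1_blind(u, v):
    # First cloud: add a fresh random number to each value. In the claimed
    # setting u and v are encrypted and the addition would be homomorphic.
    r_u, r_v = secrets.randbelow(BOUND), secrets.randbelow(BOUND)
    return (u + r_u, v + r_v), (r_u, r_v)

def compare_circuit(masked, randoms):
    # Stand-in for the garbled circuit: the second cloud supplies the masked
    # values, the first cloud supplies the random numbers, and only the
    # comparison bit leaves the circuit.
    (m_u, m_v), (r_u, r_v) = masked, randoms
    return 1 if (m_u - r_u) <= (m_v - r_v) else 0

masked, randoms = cloud1_blind(12, 30)
result = compare_circuit(masked, randoms)
```

A real garbled circuit would evaluate the same function without either party seeing the other's inputs; this simulation shows only the input and output relationship.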

The method of claim 5,
wherein, if the GSPE protocol is selected as the encryption operation protocol, deriving the result data further comprises:
returning 'E(1)' as the execution result of the GSPE protocol if m-dimensional data E(p) in the encrypted kd tree is included in the query region associated with the kNN query; and
returning 'E(0)' as the execution result of the GSPE protocol if the data E(p) is not included in the query region.
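
The predicate that the GSPE protocol evaluates privately can be written in plaintext as a point-in-box test. The sketch below returns the bit that would be encrypted as E(1) or E(0). Representing the query region by lower and upper corner points, and treating the boundary as inside the region, are assumptions; `point_enclosed` is a hypothetical name.

```python
def point_enclosed(point, lower, upper):
    # The point is enclosed if every coordinate lies within the region's
    # bounds in that dimension (boundary counted as inside, by assumption).
    inside = all(lo <= x <= hi for x, lo, hi in zip(point, lower, upper))
    return 1 if inside else 0

inside_bit = point_enclosed((3, 4), lower=(0, 0), upper=(5, 5))
outside_bit = point_enclosed((6, 4), lower=(0, 0), upper=(5, 5))
```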

(Deleted)