KR101798378B1

KR101798378B1 - Method for de-identification of personal information based on genetic algorithm and apparatus for the same

Info

Publication number: KR101798378B1
Application number: KR1020160082860A
Authority: KR
Inventors: 최대우; 황명식
Original assignee: 주식회사 파수닷컴
Priority date: 2016-06-30
Filing date: 2016-06-30
Publication date: 2017-11-16

Abstract

A method and an apparatus for de-identifying personal information based on a genetic algorithm are disclosed. The method comprises the steps of: generating a primitive lattice including a plurality of layers, each composed of one or more nodes; setting a selection node-1 and a selection node-2; setting a crossover node and a mutation node; and setting a final lattice composed of those nodes, among the selection node-1, the selection node-2, the crossover node, and the mutation node, that correspond to a de-identified table having a suppression value ratio not greater than a preset suppression threshold. Accordingly, personal information can be de-identified effectively.

Description

Method and apparatus for de-identification of personal information based on a genetic algorithm

The present invention relates to data processing technology and, more particularly, to a technique for efficiently de-identifying personal information based on a genetic algorithm.

As information and communication technology (for example, big-data technology) advances, so do techniques for collecting personal information and for analyzing the collected personal information. Personal information may include a resident registration number, address, postal code, name, date of birth, gender, disease, annual salary, and the like. As big-data technology develops, personal information can thus be used in a variety of fields. For example, a company can advertise its goods and services to specific consumers based on personal information, and consumers can accordingly obtain information about the goods and services they want from the company with ease.

However, indiscriminate use of personal information may infringe the fundamental rights of the individual who is the data subject. De-identification of personal information may be considered to address this problem. De-identification means deleting or replacing some or all of the personal information (that is, generalizing the data that indicates the personal information) so that a specific individual cannot be identified even when the data is combined with other information. When personal information is de-identified, the extent to which it is generalized varies with the generalization level. If de-identification is performed for every generalization level, generating the de-identified personal information can take a long time.

In addition, the usability of personal information and the risk of re-identification vary with the generalization level. For example, when a relatively large portion of the personal information is generalized, relatively many errors occur when the de-identified personal information is analyzed, so its usability may decrease. Conversely, when a relatively small portion is generalized, the de-identified personal information can be inferred or re-identified relatively easily, so the risk of re-identification may increase.

In addition, attributes for the records of a table containing personal information can be set efficiently (or quickly). Accordingly, de-identification of personal information can be performed more quickly.

To solve the above problems, an object of the present invention is to provide a method and an apparatus for efficiently de-identifying personal information based on a genetic algorithm.

To achieve the above object, a personal information de-identification method according to an embodiment of the present invention, performed by a personal information de-identification apparatus, includes: generating a primitive lattice that includes a plurality of layers, each composed of at least one node indicating a generalization level for each type of personal information, based on the generalization level for each type of personal information indicated by the raw data recorded in each of the records included in a raw table; setting an arbitrary node belonging to layer-n among the plurality of layers as selection node-1, and an arbitrary node belonging to layer-m as selection node-2; setting arbitrary nodes belonging to the primitive lattice as a crossover node and a mutation node, respectively, based on the result of comparing the suppression value ratio of the de-identified table corresponding to each of selection node-1 and selection node-2 with a preset suppression threshold; and setting a final lattice composed of those nodes, among selection node-1, selection node-2, the crossover node, and the mutation node, that correspond to a de-identified table having a suppression value ratio not greater than the preset suppression threshold. Here, n and m are each natural numbers; the de-identified table is the result of de-identifying the raw table based on data corresponding to the generalization level indicated by a node; and the suppression value ratio is the proportion of equivalence classes that do not satisfy a preset K-anonymity among the equivalence classes constituting the de-identified table.
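The suppression value ratio defined above (the proportion of equivalence classes that fail the preset K-anonymity) can be illustrated with a minimal Python sketch. The function name, the row-as-dictionary layout, and the choice to group rows by their quasi-identifier values are assumptions made for illustration only; the source does not prescribe a data structure.

```python
from collections import Counter


def suppression_value_ratio(rows, qi_columns, k):
    """Return the fraction of equivalence classes with fewer than k rows.

    rows       -- list of dicts holding already generalized values (assumed layout)
    qi_columns -- names of the columns that define an equivalence class (assumed)
    k          -- the preset K of K-anonymity
    """
    # Rows that share the same generalized values form one equivalence class.
    class_sizes = Counter(tuple(row[c] for c in qi_columns) for row in rows)
    if not class_sizes:
        return 0.0
    # A class violates K-anonymity when it contains fewer than k rows.
    failing = sum(1 for size in class_sizes.values() if size < k)
    return failing / len(class_sizes)
```

A node of the lattice would then qualify for the final lattice when this ratio does not exceed the preset suppression threshold.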

Here, selection node-1 may be connected to selection node-2 within the primitive lattice.

Here, layer-n may be the layer located two-thirds of the way up from the lowest layer among the plurality of layers of the primitive lattice, and layer-m may be the layer located one-third of the way up from the lowest layer.
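As a rough illustration of the 2/3 and 1/3 positions, the following Python sketch computes the two layer indices. The 0-based indexing (lowest layer = 0) and the use of floor division when the position is not an integer are assumptions; the source does not state a rounding rule.

```python
def select_layers(num_layers):
    """Return (layer_n, layer_m) as 0-based indices, with the lowest layer at 0.

    layer_n sits about 2/3 of the way up and layer_m about 1/3 of the way up.
    Floor division is an assumed rounding rule.
    """
    if num_layers < 1:
        raise ValueError("the lattice needs at least one layer")
    top = num_layers - 1
    layer_n = (2 * top) // 3
    layer_m = top // 3
    return layer_n, layer_m
```

For a lattice with ten layers (indices 0 to 9), this sketch selects layer 6 as layer-n and layer 3 as layer-m.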

Here, when the suppression value ratio of the de-identified table corresponding to each of selection node-1 and selection node-2 is not greater than the preset suppression threshold, the crossover node may be set to an arbitrary node belonging to the layer located at the midpoint between layer-m and the lowest layer, and the mutation node may be set to an arbitrary node, other than selection node-2, among the nodes belonging to layer-m.

Here, when the suppression value ratio of the de-identified table corresponding to selection node-1 is not greater than the preset suppression threshold and the suppression value ratio of the de-identified table corresponding to selection node-2 exceeds the preset suppression threshold, the crossover node may be set to an arbitrary node belonging to the layer located at the midpoint between layer-n and layer-m, and the mutation node may be set to an arbitrary node, other than selection node-1, among the nodes belonging to layer-n.

Here, when the suppression value ratio of the de-identified table corresponding to each of selection node-1 and selection node-2 exceeds the preset suppression threshold, the crossover node may be set to an arbitrary node belonging to the layer located at the midpoint between layer-n and the highest layer, and the mutation node may be set to an arbitrary node, other than selection node-1, among the nodes belonging to layer-n.
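The three cases above can be summarized in a short Python sketch that returns which layer supplies the crossover node and which layer supplies the mutation node. The function name, the 0-based layer indices, and the floor-division midpoint are assumptions for illustration. The combination in which only selection node-1 exceeds the threshold is not described in the source, so the sketch raises an error for it rather than guessing.

```python
def plan_crossover_and_mutation(ratio_1, ratio_2, threshold,
                                layer_n, layer_m, top_layer, bottom_layer=0):
    """Return (crossover_layer, mutation_layer, excluded_selection_node).

    ratio_1 / ratio_2 -- suppression value ratios for selection node-1 / node-2
    excluded_selection_node -- 1 or 2: the selection node that the mutation
    node must differ from.
    """
    ok_1 = ratio_1 <= threshold
    ok_2 = ratio_2 <= threshold
    if ok_1 and ok_2:
        # Both nodes pass: look halfway between layer-m and the lowest layer.
        return (layer_m + bottom_layer) // 2, layer_m, 2
    if ok_1 and not ok_2:
        # Only node-1 passes: look halfway between layer-n and layer-m.
        return (layer_n + layer_m) // 2, layer_n, 1
    if not ok_1 and not ok_2:
        # Neither passes: look halfway between layer-n and the highest layer.
        return (layer_n + top_layer) // 2, layer_n, 1
    raise ValueError("case not described in the source: only node-1 exceeds the threshold")
```

An arbitrary node would then be drawn from each returned layer, excluding the indicated selection node when choosing the mutation node.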

Here, the number of nodes constituting the final lattice may be at least x times the number of nodes in the layer that contains the most nodes among the plurality of layers, where x may be a real number greater than 0.
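As a small worked example of this size condition, the Python sketch below computes the smallest node count that satisfies it. The function name and the list of per-layer node counts are hypothetical inputs; rounding up follows only from the node count being an integer.

```python
import math


def min_final_lattice_size(layer_sizes, x):
    """Smallest integer node count that is at least x times the widest layer."""
    if x <= 0:
        raise ValueError("x must be a real number greater than 0")
    return math.ceil(x * max(layer_sizes))
```

With layer sizes of 1, 3, 6, 3, and 1 nodes and x = 0.5, the final lattice would need at least 3 nodes under this reading.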

To achieve the above object, a personal information de-identification apparatus according to another embodiment of the present invention includes a processor and a memory storing at least one instruction executed by the processor, wherein the at least one instruction is executable to: generate a primitive lattice that includes a plurality of layers, each composed of at least one node indicating a generalization level for each type of personal information, based on the generalization level for each type of personal information indicated by the raw data recorded in each of the records included in a raw table; set an arbitrary node belonging to layer-n among the plurality of layers as selection node-1, and an arbitrary node belonging to layer-m as selection node-2; set arbitrary nodes belonging to the primitive lattice as a crossover node and a mutation node, respectively, based on the result of comparing the suppression value ratio of the de-identified table corresponding to each of selection node-1 and selection node-2 with a preset suppression threshold; and set a final lattice composed of those nodes, among selection node-1, selection node-2, the crossover node, and the mutation node, that correspond to a de-identified table having a suppression value ratio not greater than the preset suppression threshold. Here, n and m are each natural numbers; the de-identified table is the result of de-identifying the raw table based on data corresponding to the generalization level indicated by a node; and the suppression value ratio indicates the proportion of the raw data recorded in the records of the raw table that is set to a suppression value in order to generate the de-identified table.

According to the present invention, de-identification is performed on the personal information corresponding to generalization levels that satisfy a preset criterion, so the de-identification procedure can be performed quickly. In addition, the usability of the de-identified personal information can be improved, and its risk of re-identification can be reduced (or eliminated).

In addition, personal information can be de-identified in consideration of the customer's data type, purpose of use, and the like, which further improves the usability of the de-identified personal information. Because a genetic algorithm is used for the de-identification, it can be performed quickly.

FIG. 1 is a block diagram illustrating an embodiment of a personal information de-identification apparatus that performs the methods according to the present invention.
FIG. 2 is a flowchart illustrating an embodiment of a personal information de-identification method.
FIG. 3 is a flowchart illustrating an embodiment of a method of setting record attributes.
FIG. 4 is a conceptual diagram illustrating an embodiment of a table.
FIG. 5 is a flowchart illustrating an embodiment of a method of setting a GH model.
FIG. 6 is a conceptual diagram illustrating an embodiment of a GH model for a postal code record.
FIG. 7 is a conceptual diagram illustrating an embodiment of a GH model for an age record.
FIG. 8 is a conceptual diagram illustrating an embodiment of a GH model for a nationality record.
FIG. 9 is a conceptual diagram illustrating an embodiment of a GH model for a gender record.
FIG. 10 is a conceptual diagram illustrating an embodiment of a de-identified table.
FIG. 11 is a conceptual diagram illustrating another embodiment of a de-identified table.
FIG. 12 is a conceptual diagram illustrating an embodiment of a primitive lattice.
FIG. 13 is a flowchart illustrating a method of setting the final lattice.
FIG. 14 is a conceptual diagram illustrating an embodiment of a table that includes masked records.

The present invention may be modified in various ways and may have various embodiments, and specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the present invention to particular embodiments; the invention should be understood to include all modifications, equivalents, and substitutes that fall within its spirit and technical scope.

Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be referred to as a second component, and similarly a second component may be referred to as a first component. The term "and/or" includes any combination of a plurality of related listed items or any one of a plurality of related listed items.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but intervening components may also be present. In contrast, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no intervening components are present.

The terminology used in this application is used only to describe specific embodiments and is not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the accompanying drawings. To facilitate overall understanding, the same reference numerals are used for the same components in the drawings, and duplicate descriptions of the same components are omitted.

FIG. 1 is a block diagram illustrating an embodiment of a personal information de-identification apparatus that performs the methods according to the present invention.

Referring to FIG. 1, the personal information de-identification apparatus 100 may include at least one processor 110 and a memory 120. The apparatus 100 may further include a network interface device 130 that is connected to a network and performs communication, an input interface device 140, an output interface device 150, a storage device 160, and the like. The components of the personal information de-identification apparatus 100 may be connected by a bus 170 and communicate with one another. The personal information de-identification apparatus 100 may be referred to simply as the "de-identification apparatus 100".

The processor 110 may execute program commands stored in the memory 120 and/or the storage device 160. The processor 110 may be a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor on which the methods according to the present invention are performed. The memory 120 and the storage device 160 may each be composed of a volatile storage medium and/or a non-volatile storage medium. For example, the memory 120 may be composed of read-only memory (ROM) and/or random access memory (RAM).

Here, the de-identification apparatus 100 may be a desktop computer, a laptop computer, a tablet PC, a wireless phone, a mobile phone, a smartphone, or the like.

Meanwhile, even when only a method performed by the de-identification apparatus 100 (for example, transmission or reception of a signal) is described, the corresponding other apparatus may perform the counterpart method (for example, reception or transmission of the signal). That is, when an operation of the de-identification apparatus 100 is described, the corresponding other apparatus may perform an operation corresponding to it. Conversely, when an operation of another apparatus is described, the de-identification apparatus 100 may perform an operation corresponding to that operation.

FIG. 2 is a flowchart illustrating an embodiment of a personal information de-identification method.

Referring to FIG. 2, the personal information de-identification method may be performed by the de-identification apparatus 100 described with reference to FIG. 1 (for example, by the processor 110 included in the de-identification apparatus 100). The de-identification apparatus 100 may obtain a table composed of a plurality of records from a database (or from a comma-separated values (CSV) file or the like) (S200). Raw data indicating personal information may be recorded in each of the records. Raw data indicating information other than personal information (hereinafter, "non-personal information") may also be recorded in each of the records. The database may be located in the de-identification apparatus 100 or in another apparatus (for example, a server).
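For the CSV case of step S200, a minimal Python sketch using only the standard library might look as follows. The function name and the representation of the table as a list of dictionaries are assumptions for illustration; the source does not specify how the table is held in memory.

```python
import csv
import io


def load_table(csv_text):
    """Parse CSV text into a list of row dictionaries keyed by the header names."""
    return list(csv.DictReader(io.StringIO(csv_text)))
```

For example, `load_table("zip,age\n13053,28\n")` yields one row whose `zip` value is `"13053"` and whose `age` value is `"28"`; all values remain strings.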

To obtain the table, the de-identification apparatus 100 may generate the connection information used to connect to the database (for example, an Internet Protocol (IP) address, port number, identifier (ID), system ID (SID), and password). Alternatively, the connection information may be obtained from a user through the input interface device 140 of the de-identification apparatus 100. When connection to the database is approved based on the connection information, the de-identification apparatus 100 may obtain the table composed of a plurality of records from the database.

The de-identification apparatus 100 may set an attribute for each of the plurality of records included in the table (S210). The attribute of a record may be set as follows.

FIG. 3 is a flowchart illustrating an embodiment of a method of setting record attributes.

Referring to FIG. 3, the de-identification apparatus 100 may set a regular expression (S211). The regular expression may be used to search for the personal information, non-personal information, and the like recorded in the records of the table. Accordingly, the de-identification apparatus 100 may set the types of personal information to be searched for by the regular expression. Types of personal information may include a resident registration number (or a passport number or social security number (SSN)), name, address, postal code, age, nationality, gender, disease, and the like. The de-identification apparatus 100 may also set the types of non-personal information to be searched for by the regular expression. Types of non-personal information may include a patient number and the like. The type information for personal and non-personal information may be obtained from a user through the input interface device 140 of the de-identification apparatus 100.

The regular expression may also be used to set the attribute of a record in which the retrieved personal information, non-personal information, or the like is recorded. The attribute of a record may be classified as an identifier (ID), a quasi-identifier (QI), a sensitive attribute (SA), an insensitive attribute (IA) (or non-sensitive attribute (NSA)), and so on. An ID indicates personal information by which a specific individual is explicitly identified. A specific individual can be identified from a single piece of personal information set as an ID. For example, the de-identification apparatus 100 may set the regular expression so that the attribute of a record containing a resident registration number, name, address, or the like is set to ID. A QI indicates personal information by which a specific individual is identified implicitly (non-explicitly). A specific individual cannot be identified from a single piece of personal information set as a QI, but can be identified from a combination of that information with other personal information. For example, the de-identification apparatus 100 may set the regular expression so that the attribute of a record containing a postal code, age, nationality, gender, or the like is set to QI.

An SA indicates sensitive personal information that requires protection (for example, personal information with a sensitivity at or above a preset level). Disclosure of personal information set as an SA may cause problems for the individual concerned. For example, the de-identification apparatus 100 may set the regular expression so that the attribute of a record containing a disease or the like is set to SA. An IA indicates personal information that is not sensitive. Alternatively, an IA may indicate personal information with a lower sensitivity than an SA. Disclosure of personal information set as an IA may not cause problems for the individual concerned. For example, the de-identification apparatus 100 may set the regular expression so that the attribute of a record containing a postal code, age, nationality, gender, or the like is set to IA.

The de-identification apparatus 100 may set a search range for the table (S212). The search range indicates a portion of the table, and the regular expression set in step S211 is applied to the area indicated by the search range. That is, by using only the raw data within the search range, the type of personal information (or non-personal information) indicated by all the raw data in the table can be determined, and the attribute of the record in which the raw data is recorded can be decided. The search range may indicate a number of records (for example, a number of rows in the table). For example, the search range may be set to 100, 1000, and so on. Step S212 may be omitted as needed. The search range information may be obtained from a user through the input interface device 140 of the de-identification apparatus 100.

The de-identification apparatus 100 may set how records having the ID attribute are processed (S213). For example, the processing of records having the ID attribute may be classified as follows. In the first processing method, the de-identification apparatus 100 excludes records having the ID attribute from the table, so the table does not contain them. In the second processing method, the de-identification apparatus 100 masks the raw data recorded in records having the ID attribute, so the table contains those records with masked data recorded in them. In the third processing method, the de-identification apparatus 100 uses the raw data recorded in records having the ID attribute as it is.
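The three processing methods of step S213 can be sketched in Python as follows. The mode names `"exclude"`, `"mask"`, and `"keep"`, the `*` masking character, and the treatment of each ID record as a named column of a row dictionary are assumptions for illustration; the source does not specify a masking format.

```python
def handle_id_columns(rows, id_columns, mode):
    """Apply one of three assumed processing modes to the ID columns of a table.

    rows       -- list of dicts with string values (assumed table layout)
    id_columns -- set of column names that carry the ID attribute
    mode       -- "exclude", "mask", or "keep" (hypothetical names)
    """
    if mode == "exclude":
        # First method: drop ID columns from the table entirely.
        return [{c: v for c, v in row.items() if c not in id_columns}
                for row in rows]
    if mode == "mask":
        # Second method: keep the column but replace every character with '*'.
        return [{c: ("*" * len(v) if c in id_columns else v)
                 for c, v in row.items()}
                for row in rows]
    if mode == "keep":
        # Third method: use the raw data unchanged (copied to avoid aliasing).
        return [dict(row) for row in rows]
    raise ValueError(f"unknown mode: {mode}")
```

The function returns a new table in every mode, so the raw table is left untouched.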

The de-identification apparatus 100 may apply the regular expression to the area of the table indicated by the search target range (S214). For example, based on the regular expression, the de-identification apparatus 100 may search the search target range for raw data corresponding to personal information and may identify the type of personal information to which the retrieved raw data corresponds. The de-identification apparatus 100 may set the attribute of the record based on the identified type of personal information.
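Step S214 can be illustrated with a short sketch. The regular expressions set in step S211 are not given in this passage, so the patterns below, the `PATTERNS` and `ATTRIBUTE_OF` tables, and the rule that every sampled value must match are all assumptions made for illustration.

```python
import re

# Hypothetical patterns, checked in insertion order.
PATTERNS = {
    "resident_number": re.compile(r"\d{6}-\d{7}"),
    "zip_code": re.compile(r"\d{5}"),
    "age": re.compile(r"\d{1,3}"),
}
# Hypothetical mapping from the detected type to the record attribute.
ATTRIBUTE_OF = {"resident_number": "ID", "zip_code": "QI", "age": "QI"}


def classify_record(values, search_range=None):
    """Classify one record from the first `search_range` values only."""
    sample = values if search_range is None else values[:search_range]
    for kind, pattern in PATTERNS.items():
        if sample and all(pattern.fullmatch(str(v)) for v in sample):
            return kind, ATTRIBUTE_OF[kind]
    return None, None  # no pattern matched every sampled value


zip_result = classify_record(["13053", "13068", "n/a"], search_range=2)
age_result = classify_record(["28", "29", "21"])
```

With `search_range=2` only the first two values are inspected, which mirrors the idea that a limited range is enough to decide the attribute of the whole record.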

Specifically, the de-identification apparatus 100 may set to ID the attribute of the resident registration number record (that is, the record in which raw data indicating a resident registration number is recorded), the name record (that is, the record in which raw data indicating a name is recorded) and the address record (that is, the record in which raw data indicating an address is recorded) included in the table. A record having the ID attribute may be referred to as an "ID record", so the ID records may include the resident registration number record, the name record and the address record. The de-identification apparatus 100 may set to QI the attribute of the zip code record (that is, the record in which raw data indicating a zip code is recorded), the age record (that is, the record in which raw data indicating an age is recorded), the nationality record (that is, the record in which raw data indicating a nationality is recorded) and the gender record (that is, the record in which raw data indicating a gender is recorded) included in the table. A record having the QI attribute may be referred to as a "QI record", so the QI records may include the zip code record, the age record, the nationality record and the gender record.

The de-identification apparatus 100 may set to SA the attribute of the disease record (that is, the record in which raw data indicating a disease is recorded) included in the table. A record having the SA attribute may be referred to as an "SA record", so the SA records may include the disease record. The de-identification apparatus 100 may set to IA the attribute of the zip code record, the age record, the nationality record and the gender record included in the table. A record having the IA attribute may be referred to as an "IA record", so the IA records may include the zip code record, the age record, the nationality record and the gender record. Here, the attribute of the zip code record, the age record, the nationality record and the gender record included in the table may be set to both QI and IA. The table processed by the method described above may be as follows.

FIG. 4 is a conceptual diagram illustrating an embodiment of a table.

Referring to FIG. 4, the table 400 may include a plurality of records. The raw data recorded in each of the plurality of records may indicate personal information such as a resident registration number (or a passport number or SSN), name, address, zip code, age, nationality, gender and disease. The resident registration number record, the name record and the address record constituting the table 400 may be set as ID records. The zip code record, the age record, the nationality record and the gender record constituting the table 400 may be set as QI records. The disease record constituting the table 400 may be set as an SA record. The zip code record, the age record, the nationality record and the gender record constituting the table 400 may also be set as IA records.

Referring again to FIG. 3, the de-identification apparatus 100 may process the records having the ID attribute based on the processing method set in step S213. The de-identification apparatus 100 may display the table including the plurality of records whose attributes have been set through the output interface device 150 (S215). Here, each of the plurality of records included in the table may indicate the raw data (that is, the personal information) together with the attribute that was set. The de-identification apparatus 100 may receive from the user a message requesting modification of a set attribute and may modify the attribute of the corresponding record based on the received message. The de-identification apparatus 100 may then display the table including the plurality of records whose attributes have been modified through the output interface device 150. The de-identification apparatus 100 may receive from the user a message indicating that the attributes have been confirmed, in which case it may perform the next step. Here, the message requesting modification of an attribute and the message indicating that the attributes have been confirmed may be received through the input interface device 140 of the de-identification apparatus 100.

Referring again to FIG. 2, the de-identification apparatus 100 may set a generalization hierarchy (GH) model for the QI records included in the table (S220). The GH model may be set as follows.

FIG. 5 is a flowchart illustrating an embodiment of a method of setting a GH model.

Referring to FIG. 5, the de-identification apparatus 100 may set generalization levels for the raw data recorded in the QI records (S221). The de-identification apparatus 100 may set the generalization levels according to the type of QI record (that is, zip code record, age record, nationality record or gender record). For example, the de-identification apparatus 100 may set the range of generalization levels for the zip code record from generalization level-0 to generalization level-2, for the age record from generalization level-0 to generalization level-3, for the nationality record from generalization level-0 to generalization level-2, and for the gender record from generalization level-0 to generalization level-1.

The generalization range for raw data may be the same within each generalization level. For example, at generalization level-1 the generalization range may be the ones place, so that the ages "28", "29", "21" and "23" may be generalized to "2*". At generalization level-2 the generalization range may be the tens place, so that the zip codes "13053" and "13068" may be generalized to "130**".

The de-identification apparatus 100 may set the raw data recorded in the QI records to generalization level-0 (S222). The de-identification apparatus 100 may then set the range of data to be generalized based on the range of generalization levels, generalize the raw data based on that range, and set the generalized data to the corresponding generalization level (for example, generalization level-1, generalization level-2 or generalization level-3) (S223). The range of generalized data is smallest at generalization level-0 and may increase as the generalization level increases.

The de-identification apparatus 100 may generate the GH model by sequentially connecting the data corresponding to a lower generalization level to the data corresponding to a higher generalization level (S224). In the GH model, the raw data corresponding to generalization level-0 may be located in the lowest layer, the generalized data corresponding to generalization level-1 may be located in the layer above generalization level-0, the generalized data corresponding to generalization level-2 may be located in the layer above generalization level-1, and the generalized data corresponding to generalization level-3 may be located in the layer above generalization level-2. In the highest layer of the GH model, all data may be generalized into a single item of data. Embodiments of the GH model are as follows.

FIG. 6 is a conceptual diagram illustrating an embodiment of a GH model for a zip code record.

Referring to FIG. 6, the GH model for the zip code record may consist of generalization level-0 to generalization level-2. The raw data "13053", "13068", "14850" and "14853" recorded in the zip code record may be set to generalization level-0. Of the raw data recorded in the zip code record, "13053" and "13068" may be generalized to "130**", and the generalized data "130**" may be set to generalization level-1. Of the raw data recorded in the zip code record, "14850" and "14853" may be generalized to "148**", and the generalized data "148**" may be set to generalization level-1. The data "130**" and "148**" corresponding to generalization level-1 may be generalized to "*****" (or "1****"), and the generalized data "*****" (or "1****") may be set to generalization level-2. The GH model for the zip code record is not limited to the above description and may be set in various ways.
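The zip code hierarchy of FIG. 6 can be expressed as a small masking function. This is a sketch under the assumption of five-digit zip codes and the "*****" variant of level-2; `generalize_zip` and its `KEPT_DIGITS` table are hypothetical names.

```python
# Number of leading digits kept at each generalization level (assumed from FIG. 6).
KEPT_DIGITS = {0: 5, 1: 3, 2: 0}


def generalize_zip(zip_code, level):
    """Mask the trailing digits of a five-digit zip code with '*'."""
    keep = KEPT_DIGITS[level]
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


level1 = generalize_zip("13053", 1)
level2 = generalize_zip("14853", 2)
```

Using `{0: 5, 1: 3, 2: 1}` instead would give the alternative "1****" top level mentioned in the description.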

FIG. 7 is a conceptual diagram illustrating an embodiment of a GH model for an age record.

Referring to FIG. 7, the GH model for the age record may consist of generalization level-0 to generalization level-3. The raw data "28", "29", "21", "23", "31", "37", "36", "35", "47", "49", "50" and "55" recorded in the age record may be set to generalization level-0. Of the raw data recorded in the age record, "28", "29", "21" and "23" may be generalized to "2*", and the generalized data "2*" may be set to generalization level-1. Of the raw data recorded in the age record, "31", "37", "36" and "35" may be generalized to "3*", and the generalized data "3*" may be set to generalization level-1. Of the raw data recorded in the age record, "47" and "49" may be generalized to "4*", and the generalized data "4*" may be set to generalization level-1. Of the raw data recorded in the age record, "50" and "55" may be generalized to "5*", and the generalized data "5*" may be set to generalization level-1.

The data "2*" and "3*" corresponding to generalization level-1 may be generalized to "<40", and the generalized data "<40" may be set to generalization level-2. The data "4*" and "5*" corresponding to generalization level-1 may be generalized to "≥40", and the generalized data "≥40" may be set to generalization level-2. The data "<40" and "≥40" corresponding to generalization level-2 may be generalized to "**", and the generalized data "**" may be set to generalization level-3. The GH model for the age record is not limited to the above description and may be set in various ways.
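The four age levels of FIG. 7 can be sketched as a single function. It assumes two-digit integer ages, as in the example data; `generalize_age` is a hypothetical name.

```python
def generalize_age(age, level):
    """Generalize a two-digit age according to the levels described for FIG. 7."""
    if level == 0:
        return str(age)          # raw data
    if level == 1:
        return f"{age // 10}*"   # decade, e.g. 28 -> "2*"
    if level == 2:
        return "<40" if age < 40 else "≥40"
    if level == 3:
        return "**"              # every age collapses to one value
    raise ValueError(f"unsupported generalization level: {level}")


decade = generalize_age(28, 1)
band = generalize_age(47, 2)
```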

FIG. 8 is a conceptual diagram illustrating an embodiment of a GH model for a nationality record.

Referring to FIG. 8, the GH model for the nationality record may consist of generalization level-0 to generalization level-2. The raw data "Korea", "Japan", "United Kingdom" and "Germany" recorded in the nationality record may be set to generalization level-0. Of the raw data recorded in the nationality record, "Korea" and "Japan" may be generalized to "Asia", and the generalized data "Asia" may be set to generalization level-1. Of the raw data recorded in the nationality record, "United Kingdom" and "Germany" may be generalized to "Europe", and the generalized data "Europe" may be set to generalization level-1. The data "Asia" and "Europe" corresponding to generalization level-1 may be generalized to "World" (or "**"), and the generalized data "World" (or "**") may be set to generalization level-2. The GH model for the nationality record is not limited to the above description and may be set in various ways.

FIG. 9 is a conceptual diagram illustrating an embodiment of a GH model for a gender record.

Referring to FIG. 9, the GH model for the gender record may consist of generalization level-0 and generalization level-1. The raw data "male" and "female" recorded in the gender record may be set to generalization level-0. The data "male" and "female" corresponding to generalization level-0 may be generalized to "person" (or "*"), and the generalized data "person" (or "*") may be set to generalization level-1. The GH model for the gender record is not limited to the above description and may be set in various ways.
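Categorical hierarchies such as those for nationality and gender cannot be derived by masking digits, so one way to hold them is an explicit child-to-parent map, which matches the idea in step S224 of linking lower-level data to higher-level data. The `PARENT` table and `generalize` function below are hypothetical, and the values are written in English for the example.

```python
# Each value points to its generalization one level up (assumed from FIGS. 8 and 9).
PARENT = {
    "Korea": "Asia", "Japan": "Asia",
    "United Kingdom": "Europe", "Germany": "Europe",
    "Asia": "World", "Europe": "World",
    "male": "person", "female": "person",
}


def generalize(value, level):
    """Climb `level` steps up the hierarchy starting from a raw value."""
    for _ in range(level):
        value = PARENT[value]
    return value


region = generalize("Korea", 1)
top = generalize("Germany", 2)
```

A `KeyError` is raised if `level` exceeds the height of the hierarchy for that value, which makes an invalid level visible rather than silently ignored.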

Referring again to FIG. 2, the de-identification apparatus 100 may set the parameters used for de-identifying the table (hereinafter, "de-identification parameters") (S230). The de-identification parameters may include K-anonymity, L-diversity, T-closeness and the like. K-anonymity alone, "K-anonymity + L-diversity" or "K-anonymity + T-closeness" may be used to de-identify the table. Accordingly, the de-identification apparatus 100 may set K-anonymity by default and may additionally set L-diversity or T-closeness.

Specifically, the de-identification apparatus 100 may set the K value for K-anonymity. Alternatively, the de-identification apparatus 100 may obtain the K value for K-anonymity from the user through the input interface device 140 and use the obtained value. The K value for K-anonymity may indicate the number of rows constituting an equivalence class.

Here, the table may include at least one equivalence class, and within one equivalence class the ID records may indicate the same data (for example, raw data or generalized data). That is, the ID records in which the same data is recorded and the other records related to those ID records (for example, QI records, SA records and IA records) may constitute one equivalence class. The table may be de-identified based on K-anonymity, and when the K value for K-anonymity is 4 (that is, in the case of 4-anonymity), the de-identified table may be as follows.

FIG. 10 is a conceptual diagram illustrating an embodiment of a de-identified table.

Referring to FIG. 10, each of the equivalence classes may include a zip code record, an age record, a nationality record, a gender record and a disease record. Here, the de-identified table 400 shown in FIG. 10 may be the table 400 shown in FIG. 4 with the resident registration number record, the name record and the address record excluded. Within each equivalence class, the zip code record may indicate the same data and the age record may indicate the same data.

For example, in equivalence class-1 the zip code record may indicate "130**" and the age record may indicate "<30". In equivalence class-2 the zip code record may indicate "1485*" and the age record may indicate "≥40". In equivalence class-3 the zip code record may indicate "130**" and the age record may indicate "3*".
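A K-anonymity check over such a table can be sketched by grouping rows on their generalized values. The sketch assumes that rows are dictionaries and that equivalence classes are formed from the zip code and age values, as in the FIG. 10 example; the function names and the sample rows are hypothetical.

```python
from collections import Counter


def equivalence_class_sizes(rows, group_names):
    """Count how many rows share each combination of grouping values."""
    return Counter(tuple(row[name] for name in group_names) for row in rows)


def satisfies_k_anonymity(rows, group_names, k):
    """True if every equivalence class contains at least k rows."""
    sizes = equivalence_class_sizes(rows, group_names)
    return all(size >= k for size in sizes.values())


# Hypothetical rows: two equivalence classes of four rows each.
rows = (
    [{"zip": "130**", "age": "<30"}] * 4
    + [{"zip": "1485*", "age": "≥40"}] * 4
)
sizes = equivalence_class_sizes(rows, ["zip", "age"])
```

With these rows the table is 4-anonymous but not 5-anonymous.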

Referring again to FIG. 2, the de-identification apparatus 100 may set the L value for L-diversity. Alternatively, the de-identification apparatus 100 may obtain the L value for L-diversity from the user through the input interface device 140 and use the obtained value. The L value for L-diversity may be the number of distinct data items among the data recorded in the SA records belonging to each equivalence class in the table. In equivalence class-1 of the de-identified table 400 shown in FIG. 11, the L value may be 2 (that is, the number of different diseases indicated by the disease record), in equivalence class-2 the L value may be 3, and in equivalence class-3 the L value may be 1. The table may be de-identified based on K-anonymity and L-diversity, and when the K value is 4 and the L value is 3 (that is, in the case of 4-anonymity and 3-diversity), the de-identified table may be as follows.

FIG. 11 is a conceptual diagram illustrating another embodiment of a de-identified table.

Referring to FIG. 11, in equivalence class-1 of the de-identified table 400 the disease record may indicate three different diseases (that is, gastritis, bronchitis and pneumonia), in equivalence class-2 the disease record may indicate three different diseases (that is, pneumonia, gastritis and bronchitis), and in equivalence class-3 the disease record may indicate three different diseases (that is, gastritis, bronchitis and pneumonia).
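The L value of each equivalence class, as defined above, is the number of distinct SA values in that class. A minimal sketch, assuming rows are dictionaries and using hypothetical function names and sample rows:

```python
from collections import defaultdict


def l_values(rows, group_names, sa_name):
    """Number of distinct SA values in each equivalence class."""
    distinct = defaultdict(set)
    for row in rows:
        distinct[tuple(row[name] for name in group_names)].add(row[sa_name])
    return {key: len(values) for key, values in distinct.items()}


def satisfies_l_diversity(rows, group_names, sa_name, l):
    """True if every equivalence class has at least l distinct SA values."""
    return all(n >= l for n in l_values(rows, group_names, sa_name).values())


# Hypothetical rows: one class with three diseases, one with a single disease.
rows = [
    {"zip": "130**", "disease": "gastritis"},
    {"zip": "130**", "disease": "bronchitis"},
    {"zip": "130**", "disease": "pneumonia"},
    {"zip": "130**", "disease": "gastritis"},
    {"zip": "148**", "disease": "gastritis"},
    {"zip": "148**", "disease": "gastritis"},
]
per_class = l_values(rows, ["zip"], "disease")
```

Here the second class has an L value of 1, so the table as a whole does not reach 3-diversity.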

Referring again to FIG. 2, the de-identification apparatus 100 may set the T value for T-closeness. Alternatively, the de-identification apparatus 100 may obtain the T value for T-closeness from the user through the input interface device 140 and use the obtained value. The table may be de-identified based on K-anonymity, L-diversity and T-closeness (or on K-anonymity and T-closeness). The T value for T-closeness may be the distance between the data items indicated by the SA records belonging to each equivalence class in the table. For example, when the table includes a salary record, the table may be de-identified so that, in each equivalence class of the table, the distance (that is, the difference) between the salaries indicated by the salary record is within the T value.
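The salary example can be sketched as a check that the spread of salaries within each equivalence class stays within T. This follows the simple difference-based reading of the paragraph above and is an assumption; it is not a distribution-based distance. `within_t` and the sample rows are hypothetical.

```python
from collections import defaultdict


def within_t(rows, group_names, sa_name, t):
    """True if, in every equivalence class, max - min of the SA values is <= t."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[name] for name in group_names)].append(row[sa_name])
    return all(max(values) - min(values) <= t for values in groups.values())


# Hypothetical salary rows in two equivalence classes.
rows = [
    {"zip": "130**", "salary": 3000},
    {"zip": "130**", "salary": 3200},
    {"zip": "148**", "salary": 5000},
    {"zip": "148**", "salary": 5100},
]
ok_200 = within_t(rows, ["zip"], "salary", 200)
ok_150 = within_t(rows, ["zip"], "salary", 150)
```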

The de-identification apparatus 100 may set a threshold for the suppression value ratio (hereinafter, "suppression threshold") (S240). Alternatively, the de-identification apparatus 100 may obtain the suppression threshold from the user through the input interface device 140 and use the obtained value. The suppression value ratio may indicate the proportion of equivalence classes in the de-identified table that do not satisfy K-anonymity. Alternatively, the suppression value ratio may indicate the proportion of records in the de-identified table that do not satisfy K-anonymity. The suppression value ratio may be calculated based on Equation 1 below.

The suppression threshold may be set to various values. For example, the suppression threshold may be set to 10%.
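Equation 1 is not reproduced in this text, so the following sketch is an assumption based on the second definition given above: the share of rows that fall in equivalence classes smaller than K. `suppression_ratio`, the row layout and the sample data are hypothetical.

```python
from collections import Counter


def suppression_ratio(rows, group_names, k):
    """Fraction of rows belonging to equivalence classes with fewer than k rows."""
    if not rows:
        return 0.0
    keys = [tuple(row[name] for name in group_names) for row in rows]
    sizes = Counter(keys)
    violating = sum(1 for key in keys if sizes[key] < k)
    return violating / len(rows)


# Hypothetical table: nine rows in one class, one isolated row.
rows = [{"zip": "130**"}] * 9 + [{"zip": "148**"}]
ratio = suppression_ratio(rows, ["zip"], k=4)
passes = ratio <= 0.10  # compared against a 10% suppression threshold
```

One of ten rows violates 4-anonymity, so the ratio equals the 10% threshold and the table is accepted under a "not greater than" comparison.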

The de-identification apparatus 100 may generate a primitive lattice based on the GH models (S250). The primitive lattice may include a plurality of nodes, and each of the nodes may indicate a generalization level given by a GH model and the record corresponding to that generalization level. That is, the de-identification apparatus 100 may set nodes that indicate the generalization levels given by the GH models and the records corresponding to those levels, and may generate the primitive lattice by connecting the nodes in order of generalization level. The primitive lattice generated from the GH model of the zip code record shown in FIG. 6, the GH model of the age record shown in FIG. 7 and the GH model of the gender record shown in FIG. 9 may be as follows.

FIG. 12 is a conceptual diagram illustrating an embodiment of a primitive lattice.

Referring to FIG. 12, the primitive lattice may include a plurality of nodes and may consist of layer-0 through layer-6. At least one node may be located in each layer. For example, one node may be located in each of the lowest layer (that is, layer-0) and the highest layer (that is, layer-6). Three nodes may be located in each of layer-1 and layer-5. Five nodes may be located in layer-2. Six nodes may be located in each of layer-3 and layer-4.

Here, a₀ may indicate the age record at level-0 of the GH model shown in FIG. 7, a₁ the age record at level-1, a₂ the age record at level-2, and a₃ the age record at level-3 of that GH model. b₀ may indicate the zip code record at level-0 of the GH model shown in FIG. 6, b₁ the zip code record at level-1, and b₂ the zip code record at level-2 of that GH model. c₀ may indicate the gender record at level-0 of the GH model shown in FIG. 9, and c₁ the gender record at level-1 of that GH model.

Accordingly, the node "a₀, b₀, c₀" may indicate the age record at level-0, the zip code record at level-0 and the gender record at level-0. The node "a₁, b₀, c₀" may indicate the age record at level-1, the zip code record at level-0 and the gender record at level-0. The node "a₁, b₁, c₀" may indicate the age record at level-1, the zip code record at level-1 and the gender record at level-0.
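A lattice of this kind can be sketched by enumerating every combination of levels and placing each node in the layer given by the sum of its levels, which matches the examples above (for instance, "a₁, b₁, c₀" has level sum 2). Treating the lattice as the full Cartesian product is an assumption of this sketch; `build_lattice` and `successors` are hypothetical names.

```python
from collections import defaultdict
from itertools import product

# Highest generalization level per record: age (a), zip code (b), gender (c).
MAX_LEVELS = (3, 2, 1)


def build_lattice(max_levels):
    """Map each layer index to the nodes (level tuples) whose levels sum to it."""
    layers = defaultdict(list)
    for node in product(*(range(top + 1) for top in max_levels)):
        layers[sum(node)].append(node)
    return dict(layers)


def successors(node, max_levels):
    """Nodes one layer up: raise exactly one record by one level."""
    return [
        node[:i] + (node[i] + 1,) + node[i + 1:]
        for i in range(len(node))
        if node[i] < max_levels[i]
    ]


lattice = build_lattice(MAX_LEVELS)
up_from_110 = successors((1, 1, 0), MAX_LEVELS)
```

Under this full-product assumption the layers hold 1, 3, 5, 6, 5, 3 and 1 nodes (24 in total); the lattice drawn in FIG. 12 may be constructed differently.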

The de-identification apparatus 100 may set a final lattice within the primitive lattice using a genetic algorithm (S260). The final lattice may be set as follows.

FIG. 13 is a flowchart illustrating a method of setting the final lattice.

Referring to FIG. 13, in the primitive lattice shown in FIG. 12, the de-identification apparatus 100 may set as selection node A an arbitrary node among the nodes belonging to layer-4, which lies at the 2/3 point from the lowest layer, and may set as selection node B an arbitrary node among the nodes belonging to layer-2, which lies at the 1/3 point from the lowest layer (S261). Selection node A may be connected to selection node B. For example, the de-identification apparatus 100 may set the node "a₂, b₂, c₀" belonging to layer-4 as selection node A and the node "a₁, b₁, c₀" belonging to layer-2 as selection node B.
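Step S261 can be sketched as two random draws. The sketch assumes that nodes are tuples of levels, that a node's layer is the sum of its levels, and that "connected" means selection node B is component-wise no higher than selection node A; these readings and the helper names are assumptions.

```python
import random
from itertools import product

MAX_LEVELS = (3, 2, 1)  # age, zip code, gender


def layer(index):
    """All nodes whose generalization levels sum to `index`."""
    nodes = product(*(range(top + 1) for top in MAX_LEVELS))
    return [node for node in nodes if sum(node) == index]


def is_below(lower, upper):
    """True if `lower` can reach `upper` by only raising levels."""
    return all(lo <= up for lo, up in zip(lower, upper))


def pick_selection_nodes(rng):
    top = sum(MAX_LEVELS)                           # highest layer: 6
    node_a = rng.choice(layer(round(top * 2 / 3)))  # layer-4
    candidates = [n for n in layer(round(top / 3)) if is_below(n, node_a)]
    node_b = rng.choice(candidates)                 # layer-2, connected to A
    return node_a, node_b


node_a, node_b = pick_selection_nodes(random.Random(0))
```

Passing a seeded `random.Random` keeps the draw reproducible; the example pair from the description, (2, 2, 0) and (1, 1, 0), is one possible outcome.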

The de-identification apparatus 100 may de-identify the tables corresponding to selection node A and selection node B, respectively (S262). The de-identification apparatus 100 may generate de-identified tables that satisfy the de-identification parameters (for example, K-anonymity, L-diversity and T-closeness) set in step S230 described above. The result of de-identifying the table corresponding to selection node A may be referred to as "de-identified table A", and the result of de-identifying the table corresponding to selection node B may be referred to as "de-identified table B".

The de-identification apparatus 100 may determine whether the suppression value ratios of de-identified table A and de-identified table B are both less than or equal to the suppression threshold (S263). When the suppression value ratios of de-identified table A and de-identified table B are both less than or equal to the suppression threshold (hereinafter, "case 1"), the de-identification apparatus 100 may set the final lattice as follows.

Case 1. Method of setting the final lattice

The de-identification apparatus 100 may set, as the crossover node, an arbitrary node among the nodes belonging to layer-1, which corresponds to the 1/2 point between layer-2 (to which selection node B belongs) and the lowest layer (i.e., layer-0) of the primitive lattice, and may set, as the mutation node, an arbitrary node other than selection node B among the nodes belonging to layer-2 (S263-1). For example, the de-identification apparatus 100 may set the "a₀, b₁, c₀" node belonging to layer-1 as the crossover node and the "a₀, b₂, c₀" node belonging to layer-2 as the mutation node.

The de-identification apparatus 100 may perform de-identification on the table corresponding to each of the crossover node and the mutation node (S266). That is, because the suppression value ratio of de-identified table B is equal to or less than the suppression threshold, de-identification may not be performed on the tables corresponding to nodes that belong to layers higher than that of selection node B. The de-identification apparatus 100 may generate a de-identified table that satisfies the de-identification parameters (e.g., K-anonymity, L-diversity, T-closeness) set in step S230 described above.

In addition, the de-identification apparatus 100 may determine whether the suppression value ratio of each de-identified table generated in step S266 is equal to or less than the suppression threshold. The de-identification apparatus 100 may count the number of nodes (e.g., selection node B, the crossover node, the mutation node) that satisfy "suppression value ratio ≤ suppression threshold".
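The suppression value ratio that is compared with the threshold can be sketched as follows. The sketch uses one definition given in the claims, namely the share of equivalence classes in the de-identified table that fail the preset K-anonymity. The function name `suppression_ratio` and the sample rows are hypothetical, and each row is assumed to be a tuple of already generalized quasi-identifier values.

```python
from collections import Counter


def suppression_ratio(rows, k):
    """Share of equivalence classes whose size is below k (i.e., that fail K-anonymity)."""
    class_sizes = Counter(rows)  # one equivalence class per distinct generalized tuple
    failing = sum(1 for size in class_sizes.values() if size < k)
    return failing / len(class_sizes)


# Hypothetical generalized rows: (zip code, age band)
rows = [
    ("130**", "20-29"),
    ("130**", "20-29"),
    ("130**", "30-39"),
    ("148**", "20-29"),
    ("148**", "20-29"),
]
ratio = suppression_ratio(rows, k=2)  # one of three classes has fewer than 2 rows
meets_threshold = ratio <= 0.5        # 0.5 is an arbitrary example threshold
```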

Steps S261 to S266 may be repeated until the number of nodes (e.g., selection node B, the crossover node, the mutation node) satisfying "suppression value ratio ≤ suppression threshold" exceeds x times the number of nodes in the layer that contains the most nodes among the layers constituting the primitive lattice (e.g., six in FIG. 14). Here, x may be a real number greater than 0. For example, x may be set to 0.8, 1, or 1.2. The value of x is not limited to these examples and may be set variously.
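The stopping condition for this repetition can be expressed as a short check. This is only a sketch: `should_stop` and `LAYER_SIZES` are hypothetical names, and the layer sizes assume a lattice whose attributes have maximum generalization levels 3, 2, and 1, which gives a largest layer of six nodes.

```python
# Assumed node counts for layer-0 through layer-6 of the example lattice.
LAYER_SIZES = [1, 3, 5, 6, 5, 3, 1]


def should_stop(qualifying_count, layer_sizes, x):
    """True once the qualifying-node count exceeds x times the largest layer size."""
    return qualifying_count > x * max(layer_sizes)


print(should_stop(6, LAYER_SIZES, 1))    # six nodes do not exceed 1 x 6
print(should_stop(7, LAYER_SIZES, 1))    # seven nodes exceed 1 x 6
print(should_stop(5, LAYER_SIZES, 0.8))  # five nodes exceed 0.8 x 6
```

A smaller x ends the search earlier with fewer candidate nodes, while a larger x explores more of the lattice before the final lattice is set.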

For example, an arbitrary node belonging to the layer corresponding to the 2/3 point between layer-2 (to which selection node B belongs) and the lowest layer (i.e., layer-0) may be set as selection node A'. Likewise, an arbitrary node belonging to the layer corresponding to the 1/3 point between layer-2 and the lowest layer (i.e., layer-0) may be set as selection node B'. Steps S262 to S266 may then be performed again based on selection node A' and selection node B'. This process may be repeated until the number of nodes satisfying "suppression value ratio ≤ suppression threshold" exceeds x times the number of nodes in the layer that contains the most nodes among the layers constituting the primitive lattice.

The de-identification apparatus 100 may set a final lattice that includes the nodes (e.g., selection node B, the crossover node, the mutation node, etc.) satisfying "suppression value ratio ≤ suppression threshold" (S267).

Meanwhile, when Case 1 does not apply, the de-identification apparatus 100 may determine whether the suppression value ratio of de-identified table A is equal to or less than the suppression threshold while the suppression value ratio of de-identified table B exceeds the suppression threshold (S264). When the suppression value ratio of de-identified table A is equal to or less than the suppression threshold and that of de-identified table B exceeds it (hereinafter, "Case 2"), the de-identification apparatus 100 may select the final lattice as follows.

Case 2. Method for setting the final lattice

The de-identification apparatus 100 may set, as the crossover node, an arbitrary node among the nodes belonging to layer-3, which corresponds to the 1/2 point between layer-4 (to which selection node A belongs) and layer-2 (to which selection node B belongs) of the primitive lattice, and may set, as the mutation node, an arbitrary node other than selection node A among the nodes belonging to layer-4 (S264-1). For example, the de-identification apparatus 100 may set the "a₁, b₁, c₁" node belonging to layer-3 as the crossover node and the "a₂, b₁, c₁" node belonging to layer-4 as the mutation node.

The de-identification apparatus 100 may perform de-identification on the table corresponding to each of the crossover node and the mutation node (S266). That is, because the suppression value ratio of de-identified table B exceeds the suppression threshold, de-identification may not be performed for nodes that belong to layers lower than that of selection node B. The de-identification apparatus 100 may generate a de-identified table that satisfies the de-identification parameters (e.g., K-anonymity, L-diversity, T-closeness) set in step S230 described above.

In addition, the de-identification apparatus 100 may determine whether the suppression value ratio of each de-identified table generated in step S266 is equal to or less than the suppression threshold. The de-identification apparatus 100 may count the number of nodes (e.g., selection node A, the crossover node, the mutation node) that satisfy "suppression value ratio ≤ suppression threshold".

Steps S261 to S266 may be repeated until the number of nodes (e.g., selection node A, the crossover node, the mutation node) satisfying "suppression value ratio ≤ suppression threshold" exceeds x times the number of nodes in the layer that contains the most nodes among the layers constituting the primitive lattice (e.g., six in FIG. 14). Here, x may be a real number greater than 0. For example, x may be set to 0.8, 1, or 1.2. The value of x is not limited to these examples and may be set variously.

For example, an arbitrary node belonging to the layer corresponding to the 2/3 point between layer-4 (to which selection node A belongs) and layer-2 (to which selection node B belongs) may be set as selection node A'. Likewise, an arbitrary node belonging to the layer corresponding to the 1/3 point between layer-4 and layer-2 may be set as selection node B'. Steps S262 to S266 may then be performed again based on selection node A' and selection node B'. This process may be repeated until the number of nodes satisfying "suppression value ratio ≤ suppression threshold" exceeds x times the number of nodes in the layer that contains the most nodes among the layers constituting the primitive lattice.

The de-identification apparatus 100 may set a final lattice that includes the nodes (e.g., selection node A, the crossover node, the mutation node) satisfying "suppression value ratio ≤ suppression threshold" (S267).

Meanwhile, when neither Case 1 nor Case 2 applies, the de-identification apparatus 100 may determine whether the suppression value ratios of de-identified table A and de-identified table B both exceed the suppression threshold (S265). When both suppression value ratios exceed the suppression threshold (hereinafter, "Case 3"), the de-identification apparatus 100 may select the final lattice as follows.

Case 3. Method for setting the final lattice

The de-identification apparatus 100 may set, as the crossover node, an arbitrary node among the nodes belonging to layer-5, which corresponds to the 1/2 point between layer-4 (to which selection node A belongs) and the highest layer (i.e., layer-6) of the primitive lattice, and may set, as the mutation node, an arbitrary node other than selection node A among the nodes belonging to layer-4 (S265-1). For example, the de-identification apparatus 100 may set the "a₃, b₁, c₁" node belonging to layer-5 as the crossover node and the "a₂, b₁, c₁" node belonging to layer-4 as the mutation node.

The de-identification apparatus 100 may perform de-identification on the table corresponding to each of the crossover node and the mutation node (S266). The de-identification apparatus 100 may generate a de-identified table that satisfies the de-identification parameters (e.g., K-anonymity, L-diversity, T-closeness) set in step S230 described above.

In addition, the de-identification apparatus 100 may determine whether the suppression value ratio of each de-identified table generated in step S266 is equal to or less than the suppression threshold. The de-identification apparatus 100 may count the number of nodes (e.g., the crossover node, the mutation node) that satisfy "suppression value ratio ≤ suppression threshold".

Steps S261 to S266 may be repeated until the number of nodes (e.g., the crossover node, the mutation node) satisfying "suppression value ratio ≤ suppression threshold" exceeds x times the number of nodes in the layer that contains the most nodes among the layers constituting the primitive lattice (e.g., six in FIG. 14). Here, x may be a real number greater than 0. For example, x may be set to 0.8, 1, or 1.2. The value of x is not limited to these examples and may be set variously.

For example, an arbitrary node belonging to the layer corresponding to the 2/3 point between layer-4 (to which selection node A belongs) and the highest layer (i.e., layer-6) may be set as selection node A'. Likewise, an arbitrary node belonging to the layer corresponding to the 1/3 point between layer-4 and the highest layer (i.e., layer-6) may be set as selection node B'. Steps S262 to S266 may then be performed again based on selection node A' and selection node B'. This process may be repeated until the number of nodes satisfying "suppression value ratio ≤ suppression threshold" exceeds x times the number of nodes in the layer that contains the most nodes among the layers constituting the primitive lattice.

The de-identification apparatus 100 may set a final lattice that includes the nodes (e.g., the crossover node, the mutation node) satisfying "suppression value ratio ≤ suppression threshold" (S267).
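The layer choices in Cases 1 to 3 can be summarized in a small sketch. The function name `crossover_and_mutation_layers` and its parameters are hypothetical, and integer division is an assumption for midpoints that do not fall exactly on a layer. The combination in which table A exceeds the threshold while table B does not is not described above, so the sketch returns `None` for it.

```python
def crossover_and_mutation_layers(layer_a, layer_b, top, ratio_a, ratio_b, threshold):
    """Return (crossover layer, mutation layer) for Cases 1 to 3, or None otherwise.

    The mutation node is drawn from the returned mutation layer, excluding the
    selection node that already belongs to that layer.
    """
    ok_a = ratio_a <= threshold
    ok_b = ratio_b <= threshold
    if ok_a and ok_b:
        # Case 1: search below selection node B (midpoint of layer_b and layer-0).
        return (layer_b + 0) // 2, layer_b
    if ok_a and not ok_b:
        # Case 2: search between selection node A and selection node B.
        return (layer_a + layer_b) // 2, layer_a
    if not ok_a and not ok_b:
        # Case 3: search above selection node A (midpoint of layer_a and the top layer).
        return (layer_a + top) // 2, layer_a
    return None


# Example lattice from the description: node A in layer-4, node B in layer-2, top layer-6.
case1 = crossover_and_mutation_layers(4, 2, 6, ratio_a=0.1, ratio_b=0.1, threshold=0.2)
case2 = crossover_and_mutation_layers(4, 2, 6, ratio_a=0.1, ratio_b=0.5, threshold=0.2)
case3 = crossover_and_mutation_layers(4, 2, 6, ratio_a=0.5, ratio_b=0.5, threshold=0.2)
```

With the example lattice, the three calls reproduce the layers named in the description: layer-1 and layer-2 for Case 1, layer-3 and layer-4 for Case 2, and layer-5 and layer-4 for Case 3.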

In addition, the de-identification apparatus 100 may display the final lattice through the output interface device 150, may store the final lattice in the storage device 160 (or a database), and may transmit the final lattice to another apparatus through the network interface device 130.

Referring again to FIG. 2, the de-identification apparatus 100 may mask all or part of the raw data recorded in the ID records included in the de-identified table (S270). For example, when the region to be masked (e.g., a partial region) of the raw data recorded in an ID record is preset, masking may be performed on the preset region. When the region to be masked is not preset, masking may be performed on the entire region. Step S270 is not an essential step of the personal information de-identification method and may be omitted as needed. An embodiment of a table including masked records is as follows.

FIG. 14 is a conceptual diagram illustrating an embodiment of a table including masked records.

Referring to FIG. 14, a partial region of the raw data recorded in the resident registration number records included in table 400 may be masked. For example, the region after "-" in the raw data recorded in a resident registration number record may be masked. The entire region of the raw data recorded in the name records included in table 400 may be masked. A partial region of the raw data recorded in the address records included in table 400 may be masked. For example, the region after "서울시" (Seoul) in the raw data recorded in an address record may be masked.
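Partial and full masking of this kind can be sketched with two small helpers. The function names `mask_after` and `mask_all`, the "*" mask character, and the sample values are all illustrative assumptions; the sample resident registration number is fictional.

```python
def mask_after(value, marker, mask_char="*"):
    """Mask every character that follows the first occurrence of marker."""
    head, sep, tail = value.partition(marker)
    if not sep:
        # Marker not found: leave the value unchanged in this sketch.
        return value
    return head + sep + mask_char * len(tail)


def mask_all(value, mask_char="*"):
    """Mask the entire value."""
    return mask_char * len(value)


masked_id = mask_after("800101-1234567", "-")          # partial masking after "-"
masked_name = mask_all("Hong Gildong")                 # full masking
masked_address = mask_after("Seoul Gangnam-gu", "Seoul")  # partial masking after the city
print(masked_id, masked_name, masked_address)
```

Whether separators such as spaces are masked or preserved is a design choice; this sketch masks every character after the marker.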

Next, parameters indicating the risk of a de-identified table will be described.

The re-identification risk may be indicated by the reciprocal of the number of rows constituting an equivalence class of the de-identified table. The re-identification risk may vary depending on the maximum, minimum, and average number of rows constituting the equivalence classes.
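As a minimal sketch of this reciprocal relationship, the following computes the risk from the smallest, largest, and average equivalence class sizes. The function name `reidentification_risks`, the dictionary keys, and the sample rows are hypothetical, and taking the reciprocal of the average class size for the average risk is an assumption.

```python
from collections import Counter


def reidentification_risks(rows):
    """Risk as the reciprocal of equivalence class size (smallest, largest, average)."""
    sizes = list(Counter(rows).values())
    return {
        "max_risk": 1 / min(sizes),                 # smallest class is the most exposed
        "min_risk": 1 / max(sizes),                 # largest class is the least exposed
        "avg_risk": 1 / (sum(sizes) / len(sizes)),  # reciprocal of the average size
    }


# Hypothetical generalized rows forming classes of sizes 4, 2, and 2.
rows = [("130**", "20-29")] * 4 + [("130**", "30-39")] * 2 + [("148**", "20-29")] * 2
risks = reidentification_risks(rows)
```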

The sample risk may be calculated based on Equation 2 below.

The population risk may be as shown in Table 1 below. The sample rate may indicate the proportion sampled from the population.

Next, parameters indicating the usability of a de-identified table will be described.

Precision may be used to measure the precision of each node belonging to the lattice and may indicate the average height in the GH model. In the GH model, the higher the generalization level, the lower the precision and the greater the data loss. Precision may be calculated based on Equation 3 below.

Prec(GT) may indicate the precision of the generalization table (GT) (i.e., the de-identified table). N_A may indicate the number of variables belonging to the table (e.g., zip code, age, nationality, gender, and disease in FIG. 12). N may indicate the number of rows constituting the table. The remaining two symbols in Equation 3 may indicate, respectively, the generalization level of the corresponding variable in the GH model and the maximum value of the generalization level of that variable in the GH model.
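Equation 3 itself is not reproduced in this text, so the following is only a sketch under the assumption that it matches the widely used precision metric for generalized tables: one minus the average, over all rows and variables, of the applied generalization level divided by the maximum generalization level. The function name `precision` and the sample levels are hypothetical.

```python
def precision(levels_per_row, max_levels):
    """Assumed form: 1 - sum(level / max_level) / (N * N_A)."""
    n_rows = len(levels_per_row)  # N: number of rows in the table
    n_attrs = len(max_levels)     # N_A: number of variables in the table
    loss = sum(
        level / max_level
        for row in levels_per_row
        for level, max_level in zip(row, max_levels)
    )
    return 1 - loss / (n_rows * n_attrs)


# Every row generalized to node (a2, b2, c0), with assumed maximum levels (3, 2, 1).
prec = precision([(2, 2, 0)] * 4, (3, 2, 1))
```

Under this assumed form, an ungeneralized table has precision 1 and a table generalized to the top of every hierarchy has precision 0.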

The discernibility metric may be a parameter that takes into account the size of the equivalence classes, the generalization level in the GH model, and the like. The discernibility metric may indicate the ability to distinguish generalized data within an equivalence class. The discernibility metric may be calculated based on Equation 4 below.

DM may indicate the discernibility metric. f_i may indicate the size of an equivalence class. k may indicate the number of equivalence classes. N may indicate the number of rows (e.g., the rows constituting the table).
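Equation 4 is not reproduced in this text. As a hedged illustration, the sketch below uses the common sum-of-squares form of the discernibility metric, in which each of the k equivalence classes contributes the square of its size f_i. How Equation 4 uses N (for example, as a normalization or a penalty term) is unknown, so N is not used here. The function name `discernibility_metric` and the sample rows are hypothetical.

```python
from collections import Counter


def discernibility_metric(rows):
    """Assumed form: sum of squared equivalence class sizes over all k classes."""
    class_sizes = Counter(rows).values()  # f_i for each of the k equivalence classes
    return sum(size ** 2 for size in class_sizes)


# Hypothetical generalized rows forming classes of sizes 4, 2, and 2.
rows = [("130**", "20-29")] * 4 + [("130**", "30-39")] * 2 + [("148**", "20-29")] * 2
dm = discernibility_metric(rows)
```

Under this form, larger equivalence classes raise the metric, reflecting that rows inside a large class are harder to tell apart.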

Entropy may indicate the ability to distinguish generalized data within an equivalence class, or the amount of information. Entropy may be calculated based on Equation 5 below.

The result of Equation 5 may indicate the entropy. α_r may indicate raw data. b_r may indicate generalized data. R_ij may indicate a record in which raw data is recorded. R_ij' may indicate a record in which generalized data is recorded. I may indicate an indicator function.

Meanwhile, the de-identification apparatus 100 may display, through the output interface device 150, the risk parameters (e.g., re-identification risk, sample risk, population risk, etc.) and the usability parameters (e.g., precision, discernibility metric, entropy, etc.) for the nodes belonging to the final lattice. In addition, the de-identification apparatus 100 may display, through the output interface device 150, the table corresponding to a node belonging to the final lattice before and after de-identification (i.e., a comparison of the raw table and the de-identified table).

The methods according to the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software.

Examples of the computer-readable medium include hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as at least one software module in order to perform the operations of the present invention, and vice versa.

Although the present invention has been described with reference to the embodiments above, those skilled in the art will understand that the present invention may be variously modified and changed without departing from the spirit and scope of the invention as set forth in the claims below.

Claims

A method for de-identifying personal information, performed by a personal information de-identification apparatus, the method comprising:
setting a generalization level for each type of personal information indicated by raw data recorded in each of the records included in a raw table, and generating, based on the generalization level for each type of personal information, a primitive lattice including a plurality of layers each composed of at least one node;
setting an arbitrary node belonging to layer-n among the plurality of layers as selection node-1 and setting an arbitrary node belonging to layer-m as selection node-2;
setting a crossover node and a mutation node based on a result of comparing a suppression value ratio of a de-identified table corresponding to each of selection node-1 and selection node-2 with a preset suppression threshold; and
setting a final lattice composed of those nodes, among selection node-1, selection node-2, the crossover node, and the mutation node, that correspond to a de-identified table whose suppression value ratio is equal to or less than the preset suppression threshold, wherein each of n and m is a natural number, the de-identified table is a result of de-identifying the raw table based on data corresponding to the generalization level indicated by the node, and the suppression value ratio is a ratio of equivalence classes that do not satisfy a preset K-anonymity among the equivalence classes constituting the de-identified table, and
wherein layer-n is located at the 2/3 point from the lowest layer among the plurality of layers of the primitive lattice, and layer-m is located at the 1/3 point from the lowest layer among the plurality of layers of the primitive lattice.

The method according to claim 1,
wherein selection node-1 is connected to selection node-2 within the primitive lattice.

(Deleted)

The method according to claim 1,
wherein, when the suppression value ratios of the de-identified tables corresponding to selection node-1 and selection node-2 are both equal to or less than the preset suppression threshold,
the crossover node is set to an arbitrary node belonging to the layer located at the 1/2 point between layer-m and the lowest layer among the plurality of layers, and the mutation node is set to an arbitrary node other than selection node-2 among the nodes belonging to layer-m.

The method according to claim 1,
wherein, when the suppression value ratio of the de-identified table corresponding to selection node-1 is equal to or less than the preset suppression threshold and the suppression value ratio of the de-identified table corresponding to selection node-2 exceeds the preset suppression threshold,
the crossover node is set to an arbitrary node belonging to the layer located at the 1/2 point between layer-n and layer-m among the plurality of layers, and the mutation node is set to an arbitrary node other than selection node-1 among the nodes belonging to layer-n.

The method according to claim 1,
wherein, when the suppression value ratios of the de-identified tables corresponding to selection node-1 and selection node-2 both exceed the preset suppression threshold,
the crossover node is set to an arbitrary node belonging to the layer located at the 1/2 point between layer-n and the highest layer among the plurality of layers, and the mutation node is set to an arbitrary node other than selection node-1 among the nodes belonging to layer-n.

The method according to claim 1,
wherein the number of nodes constituting the final lattice is at least x times the number of nodes belonging to the layer that includes the most nodes among the plurality of layers, and x is a real number greater than 0.

A personal information de-identification apparatus comprising:
a processor; and
a memory storing at least one instruction executed by the processor,
wherein the at least one instruction is executable to:
set a generalization level for each type of personal information indicated by raw data recorded in each of the records included in a raw table, and generate, based on the generalization level for each type of personal information, a primitive lattice including a plurality of layers each composed of at least one node;
set an arbitrary node belonging to layer-n among the plurality of layers as selection node-1 and set an arbitrary node belonging to layer-m as selection node-2;
set a crossover node and a mutation node based on a result of comparing a suppression value ratio of a de-identified table corresponding to each of selection node-1 and selection node-2 with a preset suppression threshold; and
set a final lattice composed of those nodes, among selection node-1, selection node-2, the crossover node, and the mutation node, that correspond to a de-identified table whose suppression value ratio is equal to or less than the preset suppression threshold, wherein each of n and m is a natural number, the de-identified table is a result of de-identifying the raw table based on data corresponding to the generalization level indicated by the node, and the suppression value ratio indicates the ratio of the raw data recorded in the records of the raw table that is set to a suppression value in order to generate the de-identified table, and
wherein layer-n is located at the 2/3 point from the lowest layer among the plurality of layers of the primitive lattice, and layer-m is located at the 1/3 point from the lowest layer among the plurality of layers of the primitive lattice.

The apparatus according to claim 8,
wherein selection node-1 is connected to selection node-2 within the primitive lattice.

(Deleted)

The apparatus according to claim 8,
wherein, when the suppression value ratios of the de-identified tables corresponding to selection node-1 and selection node-2 are both equal to or less than the preset suppression threshold,
the crossover node is set to an arbitrary node belonging to the layer located at the 1/2 point between layer-m and the lowest layer among the plurality of layers, and the mutation node is set to an arbitrary node other than selection node-2 among the nodes belonging to layer-m.

The apparatus according to claim 8,
wherein, when the suppression value ratio of the de-identified table corresponding to selection node-1 is equal to or less than the preset suppression threshold and the suppression value ratio of the de-identified table corresponding to selection node-2 exceeds the preset suppression threshold,
the crossover node is set to an arbitrary node belonging to the layer located at the 1/2 point between layer-n and layer-m among the plurality of layers, and the mutation node is set to an arbitrary node other than selection node-1 among the nodes belonging to layer-n.

The apparatus according to claim 8,
wherein, when the suppression value ratios of the de-identified tables corresponding to selection node-1 and selection node-2 both exceed the preset suppression threshold,
the crossover node is set to an arbitrary node belonging to the layer located at the 1/2 point between layer-n and the highest layer among the plurality of layers, and the mutation node is set to an arbitrary node other than selection node-1 among the nodes belonging to layer-n.

The apparatus according to claim 8,
wherein the number of nodes constituting the final lattice is at least x times the number of nodes belonging to the layer that includes the most nodes among the plurality of layers, and x is a real number greater than 0.