KR20100008532A

KR20100008532A - Method of privacy preserving in dynamic datasets publication and privacy preserving system using the same

Info

Publication number: KR20100008532A
Application number: KR1020080069068A
Authority: KR
Inventors: 이주창; 고혁진; 이은주; 최원길; 김응모
Original assignee: 성균관대학교산학협력단
Priority date: 2008-07-16
Filing date: 2008-07-16
Publication date: 2010-01-26
Also published as: KR100954075B1

Abstract

PURPOSE: A privacy preserving method for dynamic dataset publication and a privacy preserving system using the same are provided to prevent privacy breaches caused by the repeated publication of data. CONSTITUTION: A protection table generating unit (720) reads a table T from a first storage unit. The protection table generating unit receives a predetermined privacy level value (p) as input. The protection table generating unit creates a table QIT having a schema that includes (QI_1, ..., QI_m, Row_ID) and a table PT composed of records having a schema that includes (Row_ID, S, PROB). The number of records having the same Row_ID in the table PT is p. The PROB column of the records having the same Row_ID in the table PT sums to 1.0.

Description

Personal information protection method in the distribution of dynamic data and personal information protection system using the same {METHOD OF PRIVACY PRESERVING IN DYNAMIC DATASETS PUBLICATION AND PRIVACY PRESERVING SYSTEM USING THE SAME}

The present invention relates to a method of protecting personal information and, more particularly, to a personal information protection method for dynamic data publication, and a personal information protection system using the same, which prevent an individual's sensitive information from being leaked when data containing personal information is distributed, in particular dynamic data to which records are added or from which records are deleted.

Advances in database technology and in information and communication technology have made it easy to collect, manage, and share large amounts of information. As a result, organizations and institutions increasingly collect large volumes of information containing personal information for research or statistical analysis. It is therefore becoming important to prevent the sensitive personal information contained in such data from being leaked.

FIG. 1A is a table illustrating patient medical record data generated at a hospital.

For example, consider the patient medical record data illustrated in FIG. 1A (data in table form that has not been aggregated or summarized in advance, as in FIG. 1A, may be defined as micro data). The table has four attributes: name, age, zip code, and disease name. Here, the name is an identifier that uniquely distinguishes an individual (in practice, people with the same name exist, so a name cannot be a unique identifier, but it is assumed to be one here), and the disease name is the sensitive attribute to be protected. When micro data such as that of FIG. 1A is released for research or statistical purposes, the simplest way to protect individual privacy is to distribute the table after removing the identifier that uniquely identifies an individual (the name, in the table of FIG. 1A). However, this method cannot sufficiently protect individual privacy.

This is because an attacker who knows the age and zip code of 'Cheolsu', and knows that 'Cheolsu' was treated at the hospital, can easily infer that the disease of 'Cheolsu' is 'hepatitis'. Attributes that, when combined with one another and linked to external information (here, the fact that 'Cheolsu' received hospital care), can be used to identify an individual are called quasi-identifiers.

The k-anonymity model and the l-diversity model have been proposed to solve this kind of privacy violation. The k-anonymity model provides privacy through anonymity: the quasi-identifiers are generalized to less specific values or suppressed so that each record cannot be distinguished from at least k-1 other records.

FIG. 1B is a table illustrating an example in which a 2-anonymity model is applied to the micro data illustrated in FIG. 1A.

Referring to FIG. 1B, the records constituting the micro data of FIG. 1A are grouped two at a time, by generalizing age and zip code, so that the records within a group cannot be distinguished from one another. A set of records grouped so as to be indistinguishable is called an equivalence class. The k-anonymity model provides anonymity but does not consider the distribution of the sensitive attribute within an equivalence class. The l-diversity model addresses this problem by requiring that the probability of the most frequently occurring sensitive attribute value (disease name) in each equivalence class be at most 1/l.

For example, FIG. 1B is a table that satisfies both 2-anonymity and 2-diversity. The larger the value of k or l, the stronger the privacy protection. However, as the degree of privacy protection increases, the usefulness of the data decreases because of the information lost through generalization.
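The two conditions described above can be checked mechanically. The following is a minimal Python sketch, not part of the patent: it groups records into equivalence classes by their generalized quasi-identifier values, then tests k-anonymity and the 1/l frequency bound. The table contents, column names, and function names are hypothetical illustrations; the actual values of FIG. 1B are not reproduced here.

```python
from collections import Counter, defaultdict


def equivalence_classes(rows, qi_names):
    """Group rows that share the same (generalized) quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[name] for name in qi_names)].append(row)
    return classes


def is_k_anonymous(rows, qi_names, k):
    """Every equivalence class must contain at least k records."""
    return all(len(g) >= k for g in equivalence_classes(rows, qi_names).values())


def is_l_diverse(rows, qi_names, sensitive, l):
    """In every class, the most frequent sensitive value has frequency <= 1/l."""
    for group in equivalence_classes(rows, qi_names).values():
        top_count = Counter(r[sensitive] for r in group).most_common(1)[0][1]
        if top_count * l > len(group):
            return False
    return True


# Hypothetical generalized release (NOT the data of FIG. 1B).
released = [
    {"age": "20-29", "zip": "110**", "disease": "hepatitis"},
    {"age": "20-29", "zip": "110**", "disease": "cold"},
    {"age": "30-39", "zip": "120**", "disease": "pneumonia"},
    {"age": "30-39", "zip": "120**", "disease": "gastritis"},
]

print(is_k_anonymous(released, ["age", "zip"], 2))
print(is_l_diverse(released, ["age", "zip"], "disease", 2))
```

Raising k or l in such a check forces larger or more mixed equivalence classes, which is where the loss of usefulness mentioned above comes from.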

These conventional privacy protection techniques work in a static setting, in which all of the data is ready at the moment of publication and is published only once. Problems arise, however, in an environment where the data changes continuously and records are inserted or deleted so that up-to-date information can be provided.

FIG. 2A is a table illustrating a situation in which records have been added to and deleted from the table illustrated in FIG. 1A.

For example, FIG. 2A illustrates a table in which four records ('Miyeon', 'Minseok', 'Hyeonseok', 'Jeongmin') have been added to, and four records ('Youngho', 'Minjae', 'Sujin', 'Yujin') have been deleted from, the table illustrated in FIG. 1A.

FIG. 2B is a table illustrating an example in which a 2-anonymity model is applied to the micro data illustrated in FIG. 2A for republication.

An attacker who obtains both FIG. 1B and FIG. 2B can use the relationship between the two tables to acquire an individual's sensitive information through the following inference. If the attacker knows, as background knowledge, the quasi-identifier values of 'Cheolsu' and the fact that the record of 'Cheolsu' is included in both tables, the attacker learns from FIG. 1B that the disease of 'Cheolsu' is hepatitis or a cold, and learns from FIG. 2B that it is hepatitis or pneumonia. Combining the two facts, the attacker concludes that the disease of 'Cheolsu' is hepatitis. Alternatively, in FIG. 1B the disease of 'Cheolsu' could be either hepatitis or a cold, but FIG. 2B contains no record at all with the disease name 'cold', so the attacker can again conclude that the disease of 'Cheolsu' is hepatitis. This phenomenon is called critical absence.
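The inference described above amounts to a set intersection over the candidate values visible in each release. The short Python sketch below illustrates this under stated assumptions: the candidate sets for 'Cheolsu' follow the text, while the full list of disease values in the second release and both function names are hypothetical.

```python
def intersect_candidates(*candidate_sets):
    """Values that remain possible after combining every release."""
    remaining = set(candidate_sets[0])
    for candidates in candidate_sets[1:]:
        remaining &= set(candidates)
    return remaining


def remove_absent(candidates, values_in_later_release):
    """Critical absence: drop candidates that no longer appear anywhere
    in a later release that is known to still contain the victim."""
    return set(candidates) & set(values_in_later_release)


# Candidate diseases for the victim's equivalence class in each release.
first_release = {"hepatitis", "cold"}
second_release = {"hepatitis", "pneumonia"}
leaked = intersect_candidates(first_release, second_release)
print(leaked)

# Hypothetical set of all disease values present in the second release.
all_values_second = {"hepatitis", "pneumonia", "gastritis", "diabetes"}
print(remove_absent(first_release, all_values_second))
```

In both cases a single candidate remains, so a release that is safe on its own stops being safe once it can be compared with an earlier one.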

To solve this problem, the m-invariance model was proposed for protecting the privacy of dynamic data in which insertions and deletions occur. When a deleted record and an inserted record have the same sensitive attribute value, the m-invariance model substitutes the inserted record for the deleted one and generalizes again; when a critical absence occurs, it inserts counterfeit records. However, the m-invariance technique still loses information through generalization, and when the sensitive attribute of one record is exposed to an attacker, the other records in the same equivalence class are affected as well. Therefore, there is a need for a personal information protection method for dynamic data that can be applied to dynamic data while avoiding the information loss caused by generalizing the quasi-identifiers.

An object of the present invention for solving the above problems is to provide a personal information protection method that, when data subject to dynamic additions and deletions is published, prevents the privacy breaches caused by repeated publication while also avoiding the information loss caused by generalizing the quasi-identifiers.

Another object of the present invention for solving the above problems is to provide a personal information protection device that, when data subject to dynamic additions and deletions is published, protects personal information by preventing the privacy breaches caused by repeated publication while also avoiding the information loss caused by generalizing the quasi-identifiers.

To achieve the above object, the present invention provides a method of protecting the information of a table T having n records {t_1, ..., t_n}, each with quasi-identifier attributes (QI_1, ..., QI_m) and a categorical sensitive attribute S belonging to a domain {S_1, ..., S_l}, the method comprising: generating, from the table T, a table QIT having a schema that includes (QI_1, ..., QI_m, Row_ID); and generating, from the table T, a table PT having a schema that includes (Row_ID, S, PROB) and that is the target of a join operation with the table QIT, wherein the number of records having the same Row_ID generated in the table PT equals a predetermined privacy level value (p, where p is a natural number of 2 or more), and the PROB column of the records having the same Row_ID sums to 1.0.

Here, the PROB column of the records having the same Row_ID may be configured to have a uniform value (1/p).

Here, when generating tables QIT' and PT' for a table T' obtained by adding records to and/or deleting records from the table T, the personal information protection method may further comprise: copying the records corresponding to T ∩ T' (unchanged records) as they are from the table QIT and the table PT to the table QIT' and the table PT', respectively; and, for the records corresponding to T'-T (added records), generating in the table QIT' corresponding records having a schema that includes (QI_1, ..., QI_m, Row_ID), and generating in the table PT' records having a schema that includes (Row_ID, S, PROB) and that are the target of a join operation with the table QIT'. The number of records having the same Row_ID generated in the table PT' equals the predetermined privacy level value (p, where p is a natural number of 2 or more), and the PROB column of the records having the same Row_ID sums to 1.0.

To achieve the other object, the present invention provides a personal information protection device comprising: a first storage unit storing a table T having n records {t_1, ..., t_n}, each with quasi-identifier attributes (QI_1, ..., QI_m) and a categorical sensitive attribute S belonging to a domain {S_1, ..., S_l}; a protection table generating unit that reads the table T from the first storage unit, receives a predetermined privacy level value (p, where p is a natural number of 2 or more), and generates a table QIT having a schema that includes (QI_1, ..., QI_m, Row_ID) and a table PT composed of records having a schema that includes (Row_ID, S, PROB) and that is the target of a join operation with the table QIT, in which the number of records having the same Row_ID is p and the PROB column of the records having the same Row_ID sums to 1.0; and a second storage unit storing the table QIT and the table PT output from the protection table generating unit.

Here, in the table PT generated by the protection table generating unit, the PROB column of the records having the same Row_ID may be configured to have a uniform value.

Here, the personal information protection device may further comprise: a third storage unit storing a table T' obtained by adding records to and/or deleting records from the table T; a change protection table generating unit that reads the table T, the table QIT, the table PT, and the table T' from the first, second, and third storage units, receives the predetermined privacy level value (p, where p is a natural number of 2 or more), copies the records corresponding to T ∩ T' (unchanged records) as they are from the table QIT and the table PT to a table QIT' and a table PT', respectively, and, for the records corresponding to T'-T (added records), generates in the table QIT' corresponding records having a schema that includes (QI_1, ..., QI_m, Row_ID) and generates in the table PT' p records per Row_ID having a schema that includes (Row_ID, S, PROB), which are the target of a join operation with the table QIT' and whose PROB column sums to 1.0 for each Row_ID; and a fourth storage unit storing the table QIT' and the table PT' output from the change protection table generating unit.

With the personal information protection method and device according to the present invention as described above, personal information can be protected, when data containing an individual's sensitive information is published, by adding noise to the sensitive attribute field that holds the sensitive information.

In particular, when dynamic data subject to additions and deletions is republished using the personal information protection method and device according to the present invention, the data is protected against an attacker's attempts to infer personal information by comparing previously published data with the most recently published data.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to the specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. Similar reference numerals are used for similar elements in describing the drawings.

Terms such as first, second, A, and B may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be named a second component, and similarly a second component may be named a first component. The term "and/or" includes any combination of a plurality of related listed items or any one of a plurality of related listed items.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may be present in between. On the other hand, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no other component is present in between.

The terminology used in this application is used only to describe specific embodiments and is not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "comprise" or "have" are intended to indicate that a feature, number, step, operation, component, part, or combination thereof described in the specification is present, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless expressly so defined in this application.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Personal information protection method according to the present invention

The personal information protection method according to the present invention constructs and publishes a table QIT and a table PT, from a table T having n records {t_1, ..., t_n} with quasi-identifier attributes (QI_1, ..., QI_m) and a categorical sensitive attribute S belonging to a domain {S_1, ..., S_l}, in such a way that personal information is protected.

First, the structure of the micro data to which the personal information protection method according to the present invention is applied will be described.

A table T containing n records (t_1, ..., t_n) that hold personal information can be expressed by [Expression 1] below.

[Expression 1]

T = {t_1, ..., t_n}

The cardinality of the table T can then be expressed by Expression 2 below.

[Expression 2]

|T| = n

The table T also has quasi-identifier attributes. If m fields of the table T are quasi-identifier attributes, they can be expressed by Expression 3 below. For example, in the tables illustrated in FIGS. 1A and 2A, the quasi-identifier attributes are 'age' and 'zip code', so m is 2.

[Expression 3]

Quasi-identifier attributes of table T: QI_1, ..., QI_m

The table T also has a sensitive attribute S (the table is assumed to have a single sensitive attribute). S is categorical data taking discrete values, and its value is one of the l values belonging to the domain D, as in Expression 4 below. The size of the domain is defined as l, as in Expression 5.

For example, in the table illustrated in FIG. 1A, the sensitive attribute is 'disease name', which is categorical data taking values such as 'hepatitis', 'cold', 'pneumonia', 'gastric ulcer', 'gastritis', and 'diabetes'.

[Expression 4]

D = {s_1, ..., s_l}

[Expression 5]

|D| = l

FIG. 3 is a flowchart illustrating the personal information protection method according to the present invention.

Referring to FIG. 3, the personal information protection method according to the present invention may comprise a step (S310) of generating, from the table T, a table QIT having a schema that includes (QI_1, ..., QI_m, Row_ID), and a step (S320) of generating, from the table T, a table PT having a schema that includes (Row_ID, S, PROB) and that is the target of a join operation with the table QIT.

That is, the table QIT generated in step S310 contains, for every t_i in the table T (where i is between 1 and n inclusive), a tuple (t_i[QI_1], ..., t_i[QI_m], Row_ID).

FIG. 4A is a table illustrating the table QIT generated when the personal information protection method according to the present invention is applied to the data illustrated in FIG. 1A.

Next, the table PT generated in step S320 has a Row_ID column in common with the table QIT, so that it can be joined with the table QIT, and consists of tuples having the schema (Row_ID, S, PROB).
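A recipient can recombine the two published tables through the shared Row_ID column. The following Python sketch shows such a join under stated assumptions: the records are hypothetical (they are not the contents of FIGS. 4A and 4B), each Row_ID is given two candidate records in PT for illustration, and `join_on_row_id` is an illustrative helper rather than anything named in the patent.

```python
def join_on_row_id(qit, pt):
    """Equi-join QIT and PT on Row_ID, one output row per PT candidate."""
    pt_by_id = {}
    for rec in pt:
        pt_by_id.setdefault(rec["Row_ID"], []).append(rec)
    joined = []
    for q in qit:
        for rec in pt_by_id.get(q["Row_ID"], []):
            joined.append({**q, "S": rec["S"], "PROB": rec["PROB"]})
    return joined


# Hypothetical published tables: quasi-identifiers are left ungeneralized.
qit = [
    {"age": 23, "zip": "11000", "Row_ID": 1},
    {"age": 27, "zip": "13000", "Row_ID": 2},
]
pt = [
    {"Row_ID": 1, "S": "hepatitis", "PROB": 0.5},
    {"Row_ID": 1, "S": "fracture", "PROB": 0.5},
    {"Row_ID": 2, "S": "cold", "PROB": 0.5},
    {"Row_ID": 2, "S": "diabetes", "PROB": 0.5},
]

joined = join_on_row_id(qit, pt)
for row in joined:
    print(row)
```

Each quasi-identifier tuple appears once per candidate value, so the join exposes exact ages and zip codes but only a set of possible sensitive values per person.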

FIG. 4B is a table illustrating the table PT generated when the personal information protection method according to the present invention is applied to the data illustrated in FIG. 1A.

The table PT is generated according to a predetermined privacy level value (p, where p is a natural number of 2 or more): for each Row_ID, p records having that same Row_ID value are generated. In addition, the probability (PROB) column of the records having the same Row_ID value sums to 1.0.

That is, to prevent sensitive attribute values from being exposed, the personal information protection method according to the present invention adds noise to the sensitive attribute values (p-1 noise records are added to the table PT for each original record) instead of generalizing the quasi-identifier attributes as in the prior art described above. For example, p-1 values may be drawn at random, with uniform probability, from the domain D of S, and the resulting additional records included in the table PT, so that an attacker cannot tell which value is the original sensitive attribute value.
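Steps S310 and S320 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the function name, column names, and sample records are hypothetical; Row_ID values are assigned sequentially from 1; the noise values are drawn without replacement from D excluding the true value (so the p candidates are distinct, which requires p ≤ |D|); and the candidates are shuffled so that position does not reveal the original value. The text itself only states that p-1 values are drawn from D with uniform probability.

```python
import random


def build_qit_pt(table, qi_names, sensitive, domain, p, rng=None):
    """Split table T into QIT (quasi-identifiers) and PT (noisy sensitive values)."""
    domain = sorted(set(domain))
    if not 2 <= p <= len(domain):
        raise ValueError("p must satisfy 2 <= p <= |D|")
    rng = rng or random.Random()
    qit, pt = [], []
    for row_id, row in enumerate(table, start=1):
        # S310: keep the quasi-identifiers as they are, plus the join key.
        qit.append({**{name: row[name] for name in qi_names}, "Row_ID": row_id})
        # S320: true value plus p-1 distinct noise values (assumption).
        true_value = row[sensitive]
        others = [v for v in domain if v != true_value]
        values = [true_value] + rng.sample(others, p - 1)
        rng.shuffle(values)
        pt.extend({"Row_ID": row_id, "S": v, "PROB": 1.0 / p} for v in values)
    return qit, pt


# Hypothetical micro data and domain (values taken from the examples in the text).
DOMAIN = ["hepatitis", "cold", "pneumonia", "gastric ulcer", "gastritis", "diabetes"]
table = [
    {"name": "Cheolsu", "age": 23, "zip": "11000", "disease": "hepatitis"},
    {"name": "Youngho", "age": 27, "zip": "13000", "disease": "cold"},
]

qit, pt = build_qit_pt(table, ["age", "zip"], "disease", DOMAIN, p=3,
                       rng=random.Random(0))
for rec in pt:
    print(rec)
```

The identifier column (`name`) is simply never copied into either output table, and the uniform PROB value 1/p makes the p records of each Row_ID sum to 1.0.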

Data republication in the personal information protection method according to the present invention

FIG. 5 is a flowchart illustrating a personal information protection method that can be applied when data is republished according to the present invention.

The personal information protection method for data republication illustrated in FIG. 5 assumes a situation in which a table T' has been produced by adding records to and/or deleting records from the table T described with reference to FIG. 3, and the table T' must be republished to the organizations or institutions that received the table T.

When data is republished, the personal information protection method according to the present invention protects personal information through a step (S510) of copying the records corresponding to T ∩ T' (unchanged records) as they are from the table QIT and the table PT to the table QIT' and the table PT', respectively, and a step (S520) of, for the records corresponding to T'-T (added records), generating in the table QIT' corresponding records having a schema that includes (QI_1, ..., QI_m, Row_ID) and generating in the table PT' records having a schema that includes (Row_ID, S, PROB) and that are the target of a join operation with the table QIT'.

In step S520, the number of records having the same Row_ID generated in the table PT' equals the predetermined privacy level value (p, where p is a natural number of 2 or more), and the PROB column of the records having the same Row_ID sums to 1.0.
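Steps S510 and S520 can be sketched in Python as follows. Several details here are assumptions made only for illustration: the publisher is assumed to keep an unpublished mapping `row_id_of` from an internal record key to the Row_ID used in the previous release, new Row_ID values continue from the largest one already used, noise values are sampled without replacement from the domain, and all names and sample records are hypothetical.

```python
import random


def republish(old_qit, old_pt, row_id_of, new_table, qi_names, sensitive,
              domain, p, rng=None):
    """Build QIT' and PT' for T' from the previous release (QIT, PT)."""
    domain = sorted(set(domain))
    if not 2 <= p <= len(domain):
        raise ValueError("p must satisfy 2 <= p <= |D|")
    rng = rng or random.Random()
    current_keys = {row["key"] for row in new_table}
    # S510: records in T ∩ T' keep their Row_ID and are copied verbatim;
    # records deleted from T are simply not carried over.
    new_row_id_of = {k: rid for k, rid in row_id_of.items() if k in current_keys}
    kept_ids = set(new_row_id_of.values())
    new_qit = [dict(r) for r in old_qit if r["Row_ID"] in kept_ids]
    new_pt = [dict(r) for r in old_pt if r["Row_ID"] in kept_ids]
    # S520: records in T' - T get a fresh Row_ID and p candidate values.
    next_id = max(row_id_of.values(), default=0) + 1
    for row in new_table:
        if row["key"] in row_id_of:
            continue
        new_qit.append({**{n: row[n] for n in qi_names}, "Row_ID": next_id})
        others = [v for v in domain if v != row[sensitive]]
        values = [row[sensitive]] + rng.sample(others, p - 1)
        rng.shuffle(values)
        new_pt.extend({"Row_ID": next_id, "S": v, "PROB": 1.0 / p} for v in values)
        new_row_id_of[row["key"]] = next_id
        next_id += 1
    return new_qit, new_pt, new_row_id_of


DOMAIN = ["hepatitis", "cold", "pneumonia", "gastritis", "diabetes", "fracture"]
old_qit = [{"age": 23, "zip": "11000", "Row_ID": 1},
           {"age": 27, "zip": "13000", "Row_ID": 2}]
old_pt = [{"Row_ID": 1, "S": "hepatitis", "PROB": 0.5},
          {"Row_ID": 1, "S": "fracture", "PROB": 0.5},
          {"Row_ID": 2, "S": "cold", "PROB": 0.5},
          {"Row_ID": 2, "S": "diabetes", "PROB": 0.5}]
row_id_of = {"cheolsu": 1, "youngho": 2}
# T': 'youngho' deleted, 'miyeon' added, 'cheolsu' unchanged.
new_table = [
    {"key": "cheolsu", "age": 23, "zip": "11000", "disease": "hepatitis"},
    {"key": "miyeon", "age": 31, "zip": "12000", "disease": "gastritis"},
]

new_qit, new_pt, new_row_id_of = republish(
    old_qit, old_pt, row_id_of, new_table, ["age", "zip"], "disease",
    DOMAIN, p=2, rng=random.Random(1))
for rec in new_pt:
    print(rec)
```

Because the candidate values of an unchanged record are copied rather than regenerated, intersecting the candidate sets of the two releases yields the same p values and eliminates nothing.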

FIGS. 6A and 6B are tables illustrating the table QIT' and the table PT' generated by applying the personal information protection method according to the present invention to the data illustrated in FIG. 2A, in which records have been added to and deleted from the data illustrated in FIG. 1A.

Referring to FIG. 6A, the table QIT' is illustrated for the data of FIG. 2A, in which four records have been added to and four records deleted from the data illustrated in FIG. 1A.

Referring to FIG. 6B, the table PT' that is joined with the table QIT' illustrated in FIG. 6A is shown. Because the example uses a privacy level value of 2 (p = 2), there are two records for each Row_ID. The records sharing a Row_ID hold the original sensitive attribute value and a noise value with equal probability (0.5, that is, 1/2).
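The join between the two published tables can be illustrated with a short sketch. The function name `join_on_row_id` and the column names `Age` and `Zip` are hypothetical and used only for illustration; the source specifies only that the tables are joined on Row_ID.

```python
def join_on_row_id(qit, pt):
    """Inner join of QIT-style rows and PT-style rows on Row_ID."""
    by_id = {row["Row_ID"]: row for row in qit}
    return [
        {**by_id[r["Row_ID"]], "S": r["S"], "PROB": r["PROB"]}
        for r in pt
        if r["Row_ID"] in by_id
    ]

# Toy data for p = 2: one person, two candidate sensitive values at 0.5 each.
qit = [{"Age": 30, "Zip": "111", "Row_ID": 1}]
pt = [
    {"Row_ID": 1, "S": "hepatitis", "PROB": 0.5},
    {"Row_ID": 1, "S": "fracture", "PROB": 0.5},
]
joined = join_on_row_id(qit, pt)
```

Each quasi-identifier row appears p times in the join result, once per candidate sensitive value, and the result alone does not indicate which candidate is the original.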

As a result, unlike personal information protection techniques based on generalization of the quasi-identifiers, such as those of FIGS. 1B and 2B discussed in the prior art, the method prevents two problems even when data is redistributed: the inference of a specific individual's sensitive attribute by cross-checking the two releases, and the critical absence that arises when a sensitive attribute value is missing from one of the releases.

That is, comparing FIGS. 4A and 4B with FIGS. 6A and 6B, respectively, shows that even if an attacker knows a specific person's quasi-identifier values and knows that the person (for example, 'Cheolsu') is included in both tables (FIGS. 4A and 6A), the same noise value ('fracture') is still attached, unchanged, to that person's sensitive attribute value ('hepatitis'), so the personal information remains protected.
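The effect of keeping the same noise across releases can be checked with a small sketch. The data below is a toy example: only the values 'hepatitis' and 'fracture' come from the description, while the Row_ID numbers, the other disease values, and the helper name `candidate_values` are hypothetical.

```python
def candidate_values(pt, row_id):
    """Sensitive values an attacker sees for one Row_ID in a PT-style table."""
    return {r["S"] for r in pt if r["Row_ID"] == row_id}

# First release: the retained person has Row_ID 1 (hypothetical id).
pt_first = [
    {"Row_ID": 1, "S": "hepatitis", "PROB": 0.5},
    {"Row_ID": 1, "S": "fracture", "PROB": 0.5},
]
# Second release: rows for Row_ID 1 are copied as they are; another person is added.
pt_second = [dict(r) for r in pt_first] + [
    {"Row_ID": 9, "S": "flu", "PROB": 0.5},
    {"Row_ID": 9, "S": "gastritis", "PROB": 0.5},
]

# Intersecting the candidates from both releases does not narrow them down.
narrowed = candidate_values(pt_first, 1) & candidate_values(pt_second, 1)
```

Had the second release drawn fresh noise for the same person, the intersection could have shrunk to the single true value, which is the kind of cross-release inference the method is meant to avoid.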

Personal information protection device to which the personal information protection method according to the present invention is applied

FIG. 7 is a block diagram illustrating an example configuration of a personal information protection device according to the present invention.

Referring to FIG. 7, an example configuration 700 of the personal information protection device according to the present invention may include a first storage unit 710, a protection table generating unit 720, and a second storage unit 730.

The personal information protection device 700 illustrated in FIG. 7 may be included in a conventional database management system (DBMS) or used as an add-on to one, and would generally be implemented in software.

For example, the first storage unit 710 stores a table T containing the original data that must be distributed with personal information protected; the term broadly covers optical recording media, magnetic recording media, semiconductor memory, and the like. The table T held in the first storage unit 710 may be a table managed in a relational database. As described above for the personal information protection method according to the present invention, table T is a table having n records {t₁, ..., t_n} with quasi-identifier attributes (QI₁, ..., QI_m) and a categorical sensitive attribute S belonging to the domain {S₁, ..., S_l}.

The protection table generating unit 720 is a component that can access the first storage unit 710 and the second storage unit 730 described below. As mentioned above, it may be a software component included in or added to a database management system or the like.

The protection table generating unit 720 reads the table T from the first storage unit, receives a predetermined privacy level value p (p is a natural number of 2 or more) as input, and generates a table QIT having a schema that includes (QI₁, ..., QI_m, Row_ID) and a table PT. The table PT contains records having a schema that includes (Row_ID, S, PROB), which are the target of a join operation with the table QIT; the number of records sharing the same Row_ID is p, and the PROB column values of the records sharing the same Row_ID sum to 1.0.

Accordingly, the protection table generating unit 720 receives the privacy level value p from outside (from the user) and generates the table QIT and the table PT by adding noise corresponding to that value (p - 1 noise records are added to the table PT for each original record). The higher the privacy level value, the more noise is added to the sensitive attribute, so the protection of personal information increases but the usefulness of the data decreases. Conversely, the lower the privacy level value, the less noise is added to the sensitive attribute, so the protection of personal information decreases but the usefulness of the data increases.
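The behavior of the protection table generating unit can be sketched as follows. This is only an illustrative sketch under assumptions that the description does not fix: the function name `build_protected_tables`, sequential Row_ID numbering, uniform probabilities of 1/p, and noise values drawn from the domain without repeating the true value (which requires p to be no larger than the domain size) are all choices made for this example.

```python
import random

def build_protected_tables(T, qi_names, s_name, domain, p, seed=0):
    """Split table T into QIT (quasi-identifiers + Row_ID) and PT (Row_ID, S, PROB)."""
    if p < 2 or p > len(domain):
        raise ValueError("this sketch assumes 2 <= p <= len(domain)")
    rng = random.Random(seed)
    qit, pt = [], []
    for row_id, record in enumerate(T, start=1):
        # One QIT row per original record, without the sensitive attribute
        qit.append({**{q: record[q] for q in qi_names}, "Row_ID": row_id})
        true_value = record[s_name]
        # p - 1 noise values, assumed distinct from each other and from the true value
        noise = rng.sample([s for s in domain if s != true_value], p - 1)
        values = [true_value] + noise
        rng.shuffle(values)  # row order should not reveal the true value
        for v in values:
            pt.append({"Row_ID": row_id, "S": v, "PROB": 1.0 / p})
    return qit, pt
```

With uniform probabilities, an observer who links a person to a Row_ID can attribute any one sensitive value to that person with confidence of only 1/p, while the PT table grows to p times the number of original records, which reflects the trade-off between protection and usefulness described above.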

Finally, the second storage unit 730 stores the table QIT and the table PT output from the protection table generating unit 720. Like the first storage unit, the term broadly covers optical recording media, magnetic recording media, semiconductor memory, and the like. The first storage unit 710 and the second storage unit 730 are described separately only in functional terms and may physically refer to the same storage device.

Referring again to FIG. 7, the example configuration 700 of the personal information protection device according to the present invention may further include a third storage unit 740, a change protection table generating unit 750, and a fourth storage unit 760.

That is, to carry out the personal information protection method applicable to the redistribution of data, illustrated in FIG. 5 above, the personal information protection device according to the present invention may include the third storage unit 740, the change protection table generating unit 750, and the fourth storage unit 760 as additional components.

The third storage unit 740 stores a table T' obtained by adding records to and/or deleting records from the table T. Likewise, the term broadly covers optical recording media, magnetic recording media, semiconductor memory, and the like.

The change protection table generating unit 750 reads the table T, the table QIT, the table PT, and the table T' from the first storage unit 710, the second storage unit 730, and the third storage unit 740, and receives a predetermined privacy level value p (p is a natural number of 2 or more) as input. It copies the records corresponding to T ∩ T' (the unchanged records) as they are from table QIT and table PT to table QIT' and table PT', respectively. For the records corresponding to T' - T (the added records), it generates in table QIT' corresponding records having a schema that includes (QI₁, ..., QI_m, Row_ID), and generates in table PT' p records per Row_ID having a schema that includes (Row_ID, S, PROB), which are the target of a join operation with the table QIT', such that the PROB column values of the records sharing the same Row_ID sum to 1.0.
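The two conditions stated for table PT' (p records per Row_ID, and PROB values that sum to 1.0 per Row_ID) lend themselves to a simple consistency check. The sketch below is one hypothetical way to write such a check; the function name `check_pt_invariants`, the tolerance parameter, and the extra requirement that every Row_ID in QIT' has matching rows in PT' are assumptions, not part of the described device.

```python
import math
from collections import defaultdict

def check_pt_invariants(qit, pt, p, tol=1e-9):
    """Return True if every Row_ID has exactly p PT rows whose PROB sums to 1.0."""
    groups = defaultdict(list)
    for row in pt:
        groups[row["Row_ID"]].append(row)
    # Assumption: the join is meaningful only if both tables cover the same Row_IDs
    if set(groups) != {row["Row_ID"] for row in qit}:
        return False
    for rows in groups.values():
        if len(rows) != p:
            return False
        if not math.isclose(sum(r["PROB"] for r in rows), 1.0, abs_tol=tol):
            return False
    return True
```

A check of this kind could be run on QIT' and PT' before they are written to the fourth storage unit, so that copied and newly generated records are validated together.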

The change protection table generating unit 750 is a component that can access the first, second, and third storage units 710, 730, and 740 and the fourth storage unit 760 described below. Like the protection table generating unit 720 mentioned above, it may be a software component included in or added to a database management system or the like.

The fourth storage unit 760 stores the table QIT' and the table PT' output from the change protection table generating unit. Likewise, the term broadly covers optical recording media, magnetic recording media, semiconductor memory, and the like.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that the invention may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

FIG. 1A is a table illustrating patient medical record data generated at a hospital, and FIG. 1B is a table illustrating an example in which the 2-anonymity model is applied to the microdata illustrated in FIG. 1A.

FIG. 2A is a table illustrating a situation in which records have been added to and deleted from the table illustrated in FIG. 1A, and FIG. 2B is a table illustrating an example in which the 2-anonymity model is applied for redistribution of the microdata illustrated in FIG. 2A.

FIGS. 4A and 4B are tables illustrating the table QIT and the table PT generated when the personal information protection method according to the present invention is applied to the data illustrated in FIG. 1A.

Claims

A method of protecting personal information in a table T having n records {t₁, ..., t_n} with quasi-identifier attributes (QI₁, ..., QI_m) and a categorical sensitive attribute S belonging to a domain {S₁, ..., S_l}, the method comprising:

generating, from the table T, a table QIT having a schema that includes (QI₁, ..., QI_m, Row_ID); and

generating, from the table T, a table PT having a schema that includes (Row_ID, S, PROB) and being the target of a join operation with the table QIT,

wherein, in the table PT, as many records having the same Row_ID are generated as a predetermined privacy level value p (p being a natural number of 2 or more), and the PROB column values of the records having the same Row_ID sum to 1.0.

The method of claim 1,

wherein the PROB columns of the records having the same Row_ID have equal values (1/p).

The method of claim 1,

further comprising, when generating tables QIT' and PT' for a table T' obtained by adding records to and/or deleting records from the table T:

copying the records corresponding to T ∩ T' (the unchanged records) as they are from the table QIT and the table PT to the table QIT' and the table PT'; and

for the records corresponding to T' - T (the added records), generating in the table QIT' corresponding records having a schema that includes (QI₁, ..., QI_m, Row_ID), and generating in the table PT' records having a schema that includes (Row_ID, S, PROB) and being the target of a join operation with the table QIT',

wherein, for each Row_ID generated in the table PT', as many records are generated as a predetermined privacy level value p (p being a natural number of 2 or more), and the PROB column values of the records having the same Row_ID sum to 1.0.

A personal information protection system comprising:

a first storage unit storing a table T having n records {t₁, ..., t_n} with quasi-identifier attributes (QI₁, ..., QI_m) and a categorical sensitive attribute S belonging to a domain {S₁, ..., S_l};

a protection table generating unit that reads the table T from the first storage unit, receives a predetermined privacy level value p (p being a natural number of 2 or more) as input, and generates a table QIT having a schema that includes (QI₁, ..., QI_m, Row_ID) and a table PT containing records having a schema that includes (Row_ID, S, PROB) and being the target of a join operation with the table QIT, wherein the number of records having the same Row_ID is p and the PROB column values of the records having the same Row_ID sum to 1.0; and

a second storage unit storing the table QIT and the table PT output from the protection table generating unit.

The system of claim 4,

wherein, in the table PT generated by the protection table generating unit, the PROB columns of the records having the same Row_ID have equal values.

The system of claim 4, further comprising:

a third storage unit storing a table T' obtained by adding records to and/or deleting records from the table T;

a change protection table generating unit that reads the table T, the table QIT, the table PT, and the table T' from the first storage unit, the second storage unit, and the third storage unit, receives a predetermined privacy level value p (p being a natural number of 2 or more) as input, copies the records corresponding to T ∩ T' (the unchanged records) as they are from the table QIT and the table PT to a table QIT' and a table PT', respectively, and, for the records corresponding to T' - T (the added records), generates in the table QIT' corresponding records having a schema that includes (QI₁, ..., QI_m, Row_ID) and generates in the table PT' p records per Row_ID having a schema that includes (Row_ID, S, PROB) and being the target of a join operation with the table QIT', wherein the PROB column values of the records having the same Row_ID sum to 1.0; and

a fourth storage unit storing the table QIT' and the table PT' output from the change protection table generating unit.