KR20210112469A

KR20210112469A - Method for Personal Information De-identification

Info

Publication number: KR20210112469A
Application number: KR1020200027515A
Authority: KR
Inventors: 김순석
Original assignee: 한라대학교산학협력단
Priority date: 2020-03-05
Filing date: 2020-03-05
Publication date: 2021-09-15
Also published as: KR102318666B1

Abstract

According to an embodiment of the present invention, a personal information de-identification method comprises: a step of measuring a risk of a data situation; a step of calculating a total risk considering the data situation and determining a processing level; a step of considering the data situation according to the determined processing level to perform pseudonym processing or anonymous processing; a step of evaluating adequacy of a de-identification data set wherein the pseudonym processing or the anonymous processing has been performed; and a step of completing de-identification when evaluated as appropriate. The de-identification includes the pseudonym processing or the anonymous processing.

Description

Method for Personal Information De-identification

This application relates to a method for de-identifying personal information.

According to the Personal Information Protection Act as amended in February 2020, personal information means information about a living individual that falls under any of the following items:

(a) Information that can identify an individual through a name, resident registration number, image, or the like.

(b) Information that cannot identify a specific individual by itself but can be easily combined with other information to identify the individual. Whether the information can be easily combined shall be determined by reasonably considering the time, cost, technology, and the like required to identify the individual, including the availability of the other information.

(c) Information obtained by pseudonymizing item (a) or (b) in accordance with subparagraph 1-2, such that a specific individual cannot be identified without the use or combination of additional information for restoring it to its original state.

In June 2016, six Korean government ministries jointly published the "Guidelines for Personal Information De-identification Measures." In February 2020, the Personal Information Protection Act was amended to clarify the conceptual framework as personal information, pseudonymous information, and anonymous information; to allow pseudonymous information to be processed for statistical compilation, scientific research, and archiving in the public interest; to allow pseudonymous information held by different personal information controllers to be combined only through a specialized agency equipped with security facilities prescribed by Presidential Decree; and to permit export of the combined data subject to approval by the specialized agency. Despite these guidelines and the amendment, however, no technique yet sets out in detail how general companies and institutions, in particular small and medium-sized enterprises and start-ups, should approach performing de-identification measures and judging their adequacy.

Accordingly, there is a need in the art for a way to perform personal information de-identification measures safely within an organization such as a company or institution, based on a risk level that takes into account not only the data but also the surrounding situation and environment, such as the purpose and method of use, and to evaluate whether the result of the measures is adequate.

To solve the above problem, an embodiment of the present invention provides a method for de-identifying personal information.

The personal information de-identification method may include: measuring a risk for a data situation; calculating a total risk in consideration of the data situation and determining a processing level; performing pseudonymization or anonymization in consideration of the data situation according to the determined processing level; evaluating the adequacy of the de-identified data set on which the pseudonymization or anonymization has been performed; and completing the de-identification measure when the data set is judged adequate, wherein the de-identification measure includes pseudonymization or anonymization.

The means for solving the problem described above do not list all features of the present invention. The various features of the present invention, and the advantages and effects that follow from them, can be understood in more detail with reference to the specific embodiments below.

According to an embodiment of the present invention, an organization such as a company or institution can perform personal information de-identification measures safely and evaluate whether the result of the measures is adequate.

FIG. 1 is a classification diagram of data situations according to an embodiment of the present invention.
FIG. 2 is a classification diagram of data utilization methods according to an embodiment of the present invention.
FIG. 3 is a classification flowchart of data utilization methods according to an embodiment of the present invention.
FIG. 4 is a flowchart of a personal information de-identification method according to an embodiment of the present invention.
FIG. 5 is a diagram showing the flow for simple use and the risk for each detailed situation according to an embodiment of the present invention.
FIG. 6 is a diagram showing the flow for internal combination and the risk for each detailed situation according to an embodiment of the present invention.
FIG. 7 is a diagram showing the flow for external combination through a specialized agency and the risk for each detailed situation according to an embodiment of the present invention.
FIG. 8 is a diagram showing a pseudonymization risk measurement flowchart and a risk calculation table according to an embodiment of the present invention.
FIG. 9 is a diagram showing the score distribution for each stage of pseudonymization and the final risk score according to frequency, according to an embodiment of the present invention.
FIG. 10 is a diagram showing the case of simple use with pseudonymization according to an embodiment of the present invention.
FIG. 11 is a risk measurement flowchart for simple use with pseudonymization according to an embodiment of the present invention.
FIG. 12 is a diagram showing a risk calculation table for simple use with pseudonymization according to an embodiment of the present invention.
FIG. 13 is a diagram showing the case of internal combination with pseudonymization according to an embodiment of the present invention.
FIG. 14 is a diagram showing a risk measurement flowchart and a risk calculation table for internal combination with pseudonymization according to an embodiment of the present invention.
FIG. 15 is a diagram showing the case of external combination with pseudonymization according to an embodiment of the present invention.
FIG. 16 is a diagram showing a risk measurement flowchart and a risk calculation table for external combination with pseudonymization according to an embodiment of the present invention.
FIG. 17 is a diagram showing an anonymization risk measurement flowchart and a risk calculation table according to an embodiment of the present invention.
FIG. 18 is a diagram showing the score distribution for each stage of anonymization and the final risk score according to frequency, according to an embodiment of the present invention.
FIG. 19 is a diagram showing the case of simple use with anonymization according to an embodiment of the present invention.
FIG. 20 is a risk measurement flowchart for simple use with anonymization according to an embodiment of the present invention.
FIG. 21 is a diagram showing a risk calculation table for simple use with anonymization according to an embodiment of the present invention.
FIG. 22 is a diagram showing the case of internal combination with anonymization according to an embodiment of the present invention.
FIG. 23 is a diagram showing a risk measurement flowchart and a risk calculation table for internal combination with anonymization according to an embodiment of the present invention.
FIG. 24 is a diagram showing the case of external combination with anonymization according to an embodiment of the present invention.
FIG. 25 is a diagram showing a risk measurement flowchart and a risk calculation table for external combination with anonymization according to an embodiment of the present invention.
FIG. 26 is a flowchart of an adequacy evaluation procedure according to an embodiment of the present invention.

Hereinafter, preferred embodiments are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the present invention pertains can readily practice the invention. In describing the preferred embodiments, detailed descriptions of related known functions or configurations are omitted where they would unnecessarily obscure the gist of the invention. The same reference numerals are used throughout the drawings for parts having similar functions and operations.

In addition, throughout the specification, when a part is said to be 'connected' to another part, this includes not only being 'directly connected' but also being 'indirectly connected' with another element interposed between them. Furthermore, 'including' a component means that other components may be further included, not excluded, unless specifically stated otherwise.

FIG. 1 is a classification diagram of data situations according to an embodiment of the present invention.

Referring to FIG. 1, in an embodiment of the present invention the data situation may be defined from a total of 12 viewpoints, as follows.

First, the data utilization method can be classified from seven viewpoints. Specifically, data utilization is classified as simple use or data combination (internal combination or external combination), and each case can be further classified as follows. In this specification, the term 'de-identification measure' includes pseudonymization and anonymization.

1.1 Why (purpose viewpoint): Why is the de-identification measure being taken? That is, what is the purpose of use?

1.2 What (source viewpoint): When combining data, is data within the same institution being combined, or data from different institutions?

1.3 Where (place viewpoint): Where will the data be used: internally, externally, or both?

1.4 Where (processing viewpoint): Where is the de-identification measure applied to the data (inside the institution, outside it, or at a specialized agency)?

1.5 to 1.7 How: (1.5 data disclosure viewpoint) How is the data accessed and analyzed? (1.6 data provision viewpoint) Over what period is the data provided? (1.7 data utilization viewpoint) In terms of purpose, is the data to be used for a single purpose, multiple purposes, or a contractual purpose?

Next, the data use environment can be classified from the following two viewpoints.

2.1 Likelihood of a re-identification attempt by the data user or recipient

2.2 Impact on the data subject upon re-identification

Finally, the risk of the data itself can be classified from the following three viewpoints.

3.1 Data composition

3.2 Data distribution

3.3 Data sensitivity

The data situation can thus be classified from the 12 viewpoints described above, each of which is explained in more detail below.

First, looking at the detailed composition of the data utilization method from each viewpoint, it can be classified into 19 categories in total, as follows. FIG. 2 is a classification diagram of data utilization methods according to an embodiment of the present invention.

1.1 Why? (purpose viewpoint: according to the purpose for which the data is used)

1.1.1 Refers to purposes such as statistical compilation, scientific research, and archiving in the public interest under Article 23-2 (Processing of Pseudonymous Information, etc.), Paragraph 1, of the Korean Personal Information Protection Act, or the purpose set out in Article 32 (Consent to Provision and Use of Personal Credit Information), Paragraph 6, Subparagraph 9-2, of the Credit Information Use and Protection Act, namely providing pseudonymous information for statistical compilation, research, archiving in the public interest, and the like (here, statistical compilation includes statistics for commercial purposes such as market research, and research includes industrial research).

1.1.2 Refers to purposes other than those described in 1.1.1.

1.2 What? (combination source viewpoint)

1.2.1 The source data to be combined belongs to the same institution.

1.2.2 The source data to be combined belongs to different institutions.

1.3 Where? (processing viewpoint: where the data is de-identified)

1.3.1 De-identification of the data inside the institution.

1.3.2 De-identification of the data outside the institution (for example, at a private specialized institution).

1.3.3 De-identification of the data at a specialized agency.

1.4 Where? (place viewpoint)

1.4.1 The data is used only inside the institution or company.

1.4.2 The data is used outside the institution or company (for example, at a private specialized institution).

1.4.3 Both 1.4.1 and 1.4.2 apply.

1.5 How? (disclosure viewpoint: how the data will be disclosed after it has been de-identified)

1.5.1 Disclosure only within the in-house analysis room of the company or institution.

1.5.2 Full disclosure to the general public.

1.5.3 Disclosure under a mutual contract between the data provider and the data user.

1.5.4 Disclosure only within a sandbox equipped with security facilities for external users.

1.6 How? (provision viewpoint)

1.6.1 The data is provided once.

1.6.2 The data is provided periodically (monthly, quarterly, semi-annually, and so on).

1.6.3 The data is provided for a predetermined fixed period.

1.7 How? (utilization viewpoint)

1.7.1 The data is to be used for a single purpose.

1.7.2 The data is to be used for multiple purposes.

When the data utilization methods above are divided into 1) simple use and 2) data combination (internal combination within the institution or company, and external combination outside it), the flow in order is as shown in FIG. 3.
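The seven data-utilization viewpoints and their sub-categories can be held in a simple lookup structure, which also makes it easy to confirm the count of 19 categories stated above. The following is a minimal sketch only: the dictionary name, keys, and labels are illustrative assumptions and do not appear in the specification; only the grouping and the counts come from the text.

```python
# Hypothetical encoding of data-utilization viewpoints 1.1 to 1.7.
# Names and labels are illustrative; only the grouping follows the text.
DATA_UTILIZATION = {
    "1.1 why (purpose)": [
        "statutory purposes (statistics, research, public-interest archiving)",
        "other purposes",
    ],
    "1.2 what (combination source)": [
        "same institution",
        "different institutions",
    ],
    "1.3 where (processing)": [
        "inside the institution",
        "outside the institution",
        "specialized agency",
    ],
    "1.4 where (place)": [
        "internal use only",
        "external use",
        "both internal and external use",
    ],
    "1.5 how (disclosure)": [
        "in-house analysis room only",
        "full public disclosure",
        "disclosure under mutual contract",
        "secured sandbox for external users",
    ],
    "1.6 how (provision)": [
        "one-time",
        "periodic",
        "fixed period",
    ],
    "1.7 how (utilization)": [
        "single purpose",
        "multiple purposes",
    ],
}


def count_categories(taxonomy: dict) -> int:
    """Return the total number of sub-categories across all viewpoints."""
    return sum(len(options) for options in taxonomy.values())


if __name__ == "__main__":
    print(len(DATA_UTILIZATION), "viewpoints,",
          count_categories(DATA_UTILIZATION), "categories")
```

Keeping the taxonomy as data rather than as branching logic means a scoring routine can iterate over it without hard-coding each viewpoint.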

Next, the data use environment and the data itself can each be classified as follows.

2.1 Likelihood of a re-identification attempt: the re-identification intent and capability, and the level of personal information protection, of the data user or the receiving institution or company.

2.1.1 The re-identification intent and re-identification capability of the data user or the receiving institution or company, and the possibility of linkage with external information.

2.1.2 The personal information protection capability of the data user or the receiving institution or company.

2.2 Impact on the data subject upon re-identification: the impact on the data subject when the data is re-identified, intentionally or unintentionally.

2.2.1 Same as 2.2 above.

3.1 Data composition: the composition of the original data itself.

3.1.1 Whether the data contains a unique identifier

3.1.2 Total number of columns in the data

3.1.3 Single or multiple statistical characteristics of the data set

3.1.4 Total number of quasi-identifier columns

3.1.5 Total number of sensitive-information columns

3.1.6 Whether the population of the original data is the entire national population

3.2 Data distribution: the distribution of attribute values within each column of the original data and whether outliers are included.

3.2.1 Distribution of attribute values within each column (attribute)

3.2.2 Whether outliers that can identify an individual are included

3.3 Data sensitivity: the sensitivity of the original data itself.

3.3.1 Single, multiple, or linked (behavioral or locational) temporal characteristics of the original data

3.3.2 Whether data restricted by Korean law is included

3.3.3 Whether sensitive information handled in the relevant industry is included

The following describes a method of performing de-identification measures in consideration of the data situation defined above.

FIG. 4 is a flowchart of a personal information de-identification method according to an embodiment of the present invention.

Referring to FIG. 4, the personal information de-identification method according to an embodiment of the present invention may broadly include six steps.

First, in the first step, measuring the risk of the data situation, the risk may be measured for the three data situations defined above: the data utilization method (A), the data use environment (B), and the data itself (C).

Next, in the second step, calculating the total risk and determining the processing level, the total risk (D = A + B + C) may be calculated from the three data situations measured in the first step, and the processing level may be determined.

In the third step, de-identification processing, pseudonymization or anonymization suited to the purpose of use may be performed according to the processing level determined in the second step, in consideration of the data situation of the first step.

In the fourth step, adequacy evaluation, the de-identified data set pseudonymized or anonymized in the third step may be evaluated for whether it was processed adequately. If it is judged adequate, the method proceeds to the fifth step; if it is judged inadequate, the method re-enters the third step and repeats the processing until the processing level becomes adequate.

In the fifth step, processing completion, the processing is completed when the adequacy evaluation of the fourth step is positive, and the method proceeds to the sixth step.

In the sixth step, process recording, the course of the first to fifth steps may be recorded and retained.

The personal information de-identification method shown in FIG. 4 may be performed by a processing device. Each step of FIG. 4 is described in more detail below.
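The six-step flow of FIG. 4 can be sketched as a small driver loop. This is an illustration under stated assumptions, not the patented implementation: the function name `run_deidentification`, the four callables, the `max_rounds` guard, and the log format are all hypothetical. The text only fixes the order of the steps, the total D = A + B + C, and the return to the third step when the evaluation fails.

```python
from typing import Any, Callable, List, Tuple


def run_deidentification(
    data: Any,
    measure_risks: Callable[[Any], Tuple[int, int, int]],  # hypothetical: returns (A, B, C)
    decide_level: Callable[[int], str],                     # hypothetical: maps D to a level
    process: Callable[[Any, str, int], Any],                # hypothetical: pseudonymize/anonymize
    is_adequate: Callable[[Any], bool],                     # hypothetical: adequacy evaluation
    max_rounds: int = 10,                                   # assumption: guard against endless loops
) -> Tuple[Any, List[tuple]]:
    """Run steps 1 to 6 and return the processed data plus the process record."""
    log: List[tuple] = []

    # Step 1: measure the risk of each data situation (A, B, C).
    a, b, c = measure_risks(data)
    log.append(("step1", (a, b, c)))

    # Step 2: total risk D = A + B + C, then determine the processing level.
    total = a + b + c
    level = decide_level(total)
    log.append(("step2", (total, level)))

    for round_no in range(1, max_rounds + 1):
        # Step 3: de-identification processing at the determined level.
        result = process(data, level, round_no)
        log.append(("step3", round_no))

        # Step 4: adequacy evaluation; on failure, re-enter step 3.
        adequate = is_adequate(result)
        log.append(("step4", adequate))
        if adequate:
            # Step 5: processing complete.
            log.append(("step5", "completed"))
            # Step 6: the log itself is the retained record of steps 1 to 5.
            return result, log

    raise RuntimeError("adequacy not reached within max_rounds")
```

Passing the round number to `process` is one way to let each retry apply stronger processing; the specification does not say how the repeated pass differs from the previous one.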

Step 1 (measuring the risk of the data situation)

According to an embodiment of the present invention, the risk measurement method and the weighting for the data situation may be set as follows.

1) Risk measurement for the data utilization method (40%)

For each case (simple use, internal combination, or external combination), each step is assigned Very low (1 point), Low (2 points), Normal (3 points), High (4 points), or Very high (5 points), and the points are summed.

FIG. 5 is a diagram showing the flow for simple use and the risk for each detailed situation according to an embodiment of the present invention, FIG. 6 is a diagram showing the flow for internal combination and the risk for each detailed situation, and FIG. 7 is a diagram showing the flow for external combination through a specialized agency and the risk for each detailed situation.

In addition, Table 1 below shows, for example, the possible types and levels of pseudonymous and anonymous information according to the data utilization method in the case of simple use.

First, the method of measuring risk for pseudonymization is described in detail.

FIG. 8 is a diagram showing a pseudonymization risk measurement flowchart and a risk calculation table according to an embodiment of the present invention.

As shown in FIG. 8, the data usage pattern of the institution or company is traced along the arrows from 1. purpose of use to 7. completion, and the sum of the points on the arrows along the path taken is the total risk points for the utilization method. The risk score corresponding to this total can then be looked up in the risk calculation table.

For example, when the data is used in-house, requires periodic analysis, and is used for a single purpose:

holding institution 3 points + periodic 4 points + single-purpose use 3 points = 10 points

According to the risk calculation table, this total falls in the band of 13 points or less, so the level is normal and the risk score may be calculated as 24 points.

In this way, when the risk is divided by frequency into three levels, normal (27.8%), high (45.6%), and very high (26.7%), based on the points for each step and their sum, and a score is assigned to each level, the result is as shown in FIG. 9. The risk score assigned is 24 points for normal, 32 points for high, and 40 points for very high.
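The worked example above amounts to a sum along the path followed by a table lookup. The sketch below reproduces only what the text states: the three path points (3, 4, 3), the level "normal" read from the risk calculation table, and the level-to-score mapping 24/32/40. The names `LEVEL_SCORE`, `path_total`, and `risk_score` are assumptions introduced for illustration.

```python
# Risk score assigned to each pseudonymization level, as stated in the text.
LEVEL_SCORE = {"normal": 24, "high": 32, "very high": 40}


def path_total(points):
    """Sum the points on the arrows along the path taken through the flowchart."""
    return sum(points)


def risk_score(level: str) -> int:
    """Look up the risk score for a level read from the risk calculation table."""
    return LEVEL_SCORE[level]


if __name__ == "__main__":
    # holding institution (3) + periodic (4) + single-purpose use (3)
    total = path_total([3, 4, 3])
    # The text reads this total as "normal" from the risk calculation table.
    print(total, risk_score("normal"))
```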

도 10은 본 발명의 일 실시예에 따른 가명처리 단순사용의 경우를 도시하는 도면이고, 도 11은 본 발명의 일 실시예에 따른 가명처리 단순사용의 경우의 위험도 측정 흐름도이며, 도 12는 본 발명의 일 실시예에 따른 가명처리 단순사용의 경우의 위험도 산출표를 도시하는 도면이다.10 is a diagram illustrating a case of simple use of pseudonym processing according to an embodiment of the present invention, FIG. 11 is a flowchart of risk measurement in case of simple use of pseudonym processing according to an embodiment of the present invention, and FIG. 12 is this It is a diagram showing a risk calculation table in the case of simple use of pseudonymization according to an embodiment of the present invention.

도 10 내지 도 12에 도시된 바와 같이, 각 단계별로 Very low(1점), Low(2점), Normal(3점), High(4점), Very high(5점)를 부여할 수 있으며, 합산 점수가 10(22%, 4Case)점 이하일 경우 Normal로, 11점~13점(45%, 8Case)일 경우 High로, 14점 이상(33%, 6Case)일 경우 Very high로 판정할 수 있다.10 to 12, in each step Very low (1 point), Low (2 points), Normal (3 points), High (4 points), and Very high (5 points) can be given. Normal, 11~13 points (45%, 8Case) can be judged high, and 14 points or more (33%, 6Case) can be judged as very high.

도 13은 본 발명의 일 실시예에 따른 가명처리 내부결합의 경우를 도시하는 도면이고, 도 14는 본 발명의 일 실시예에 따른 가명처리 내부결합의 경우의 위험도 측정 흐름도 및 위험도 산출표를 도시하는 도면이다.13 is a diagram showing a case of internal combination of pseudonym processing according to an embodiment of the present invention, and FIG. 14 is a flowchart showing a risk measurement flow chart and a risk calculation table in the case of internal combination of pseudonym processing according to an embodiment of the present invention It is a drawing.

도 13 및 도 14에 도시된 바와 같이, 각 단계별로 Very low(1점), Low(2점), Normal(3점), High(4점), Very high(5점)를 부여할 수 있으며, 합산 점수가 12(20.8%, 10Case)점 이하일 경우 Normal로, 13점~15점(43.8%, 23Case)일 경우 High로, 22점 이상(35.4%, 17Case)일 경우 Very high로 판정할 수 있다.13 and 14, very low (1 point), Low (2 points), Normal (3 points), High (4 points), and Very high (5 points) can be given for each step. , It can be judged as Normal when the total score is 12 (20.8%, 10Case) or less, High when it is 13-15 (43.8%, 23Case), and Very High when it is 22 or more (35.4%, 17Case). have.

FIG. 15 is a diagram illustrating the external-combination case of pseudonymization according to an embodiment of the present invention, and FIG. 16 is a diagram showing the risk measurement flowchart and the risk calculation table for the external-combination case of pseudonymization according to an embodiment of the present invention.

As shown in FIGS. 15 and 16, each step may be rated Very low (1 point), Low (2 points), Normal (3 points), High (4 points), or Very high (5 points). A total score of 15 points or less (25%, 18 cases) may be judged Normal, 16 to 18 points (41.7%, 30 cases) High, and 19 points or more (33.3%, 24 cases) Very high.

Next, the risk measurement method for anonymization is described in detail.

FIG. 17 is a diagram showing the anonymization risk measurement flowchart and risk calculation table according to an embodiment of the present invention. Because it is the same as the pseudonymization risk measurement described above with reference to FIG. 8, a redundant description is omitted. However, once all risks have been calculated, pseudonymization is classified into three levels (normal, high, very high), whereas anonymization may be classified into five levels (very low to very high). In addition, in the case of full disclosure, the risk may be set to the highest level, very high, without calculating a score.

FIG. 18 is a diagram showing the final risk score according to the score distribution and frequency at each step of anonymization according to an embodiment of the present invention.

As shown in FIG. 18, based on the risk score of each step and their sum, the risk may be divided by frequency into five levels, very low (18.89%), low (18.9%), normal (23.3%), high (21.11%), and very high (17.8%), and a score may be assigned accordingly. Here, the risk score may be 8 points for very low, 16 points for low, 24 points for normal, 32 points for high, and 40 points for very high.

In this way, for anonymization, the cumulative frequencies are divided into five levels to derive the risk level and the risk score.
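By way of example only, the mapping from the five anonymization risk levels to the risk score can be sketched as a lookup table. The function name and the `full_disclosure` flag are assumptions for this sketch; the flag reflects the statement that full disclosure is placed at very high without calculating a score.

```python
# Risk score assigned to each anonymization risk level, as stated in the text.
ANONYMIZATION_RISK_SCORE = {
    "very low": 8,
    "low": 16,
    "normal": 24,
    "high": 32,
    "very high": 40,
}


def anonymization_risk_score(level, full_disclosure=False):
    """Return the risk score for a level (hypothetical helper).

    With full disclosure, the level is forced to 'very high' and the
    supplied level is ignored.
    """
    if full_disclosure:
        level = "very high"
    return ANONYMIZATION_RISK_SCORE[level.lower()]


print(anonymization_risk_score("Normal"))                      # 24
print(anonymization_risk_score("Low", full_disclosure=True))   # 40
```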

FIG. 19 is a diagram illustrating the simple-use case of anonymization according to an embodiment of the present invention, FIG. 20 is a flowchart of risk measurement for the simple-use case of anonymization according to an embodiment of the present invention, and FIG. 21 is a diagram showing the risk calculation table for the simple-use case of anonymization according to an embodiment of the present invention.

As shown in FIGS. 19 to 21, each step may be rated Very low (1 point), Low (2 points), Normal (3 points), High (4 points), or Very high (5 points). A total score of 9 points or less (11.1%, 2 cases) may be judged Very low, 10 to 11 points (27.8%, 5 cases) Low, 12 to 13 points (27.8%, 5 cases) Normal, 14 points (22.2%, 4 cases) High, and 15 points (11.1%, 2 cases) Very high.

FIG. 22 is a diagram illustrating the internal-combination case of anonymization according to an embodiment of the present invention, and FIG. 23 is a diagram showing the risk measurement flowchart and the risk calculation table for the internal-combination case of anonymization according to an embodiment of the present invention.

As shown in FIGS. 22 and 23, each step may be rated Very low (1 point), Low (2 points), Normal (3 points), High (4 points), or Very high (5 points). A total score of 11 points or less (10.4%, 5 cases) may be judged Very low, 12 to 13 points (20.8%, 10 cases) Low, 14 to 15 points (33.3%, 16 cases) Normal, 16 to 17 points (22.9%, 11 cases) High, and 18 points or more (12.5%, 6 cases) Very high.

FIG. 24 is a diagram illustrating the external-combination case of anonymization according to an embodiment of the present invention, and FIG. 25 is a diagram showing the risk measurement flowchart and the risk calculation table for the external-combination case of anonymization according to an embodiment of the present invention.

As shown in FIGS. 24 and 25, each step may be rated Very low (1 point), Low (2 points), Normal (3 points), High (4 points), or Very high (5 points). A total score of 14 points or less (15.3%, 11 cases) may be judged Very low, 15 to 16 points (23.6%, 17 cases) Low, 17 to 18 points (27.8%, 20 cases) Normal, 19 to 20 points (20.8%, 15 cases) High, and 21 points or more (12.5%, 9 cases) Very high.
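For illustration only, the three anonymization judgments above (simple use, internal combination, external combination) can be expressed with one threshold table. The dictionary keys and the function name are assumptions for this sketch; the cut-off values are the ones stated in the preceding paragraphs.

```python
from bisect import bisect_left

LEVELS = ["Very low", "Low", "Normal", "High", "Very high"]

# Inclusive upper bound of the first four levels for each usage case.
# Any total above the last bound is judged Very high.
UPPER_BOUNDS = {
    "simple use": [9, 11, 13, 14],
    "internal combination": [11, 13, 15, 17],
    "external combination": [14, 16, 18, 20],
}


def anonymization_level(case, total):
    """Map a summed step score to one of the five levels (hypothetical helper)."""
    return LEVELS[bisect_left(UPPER_BOUNDS[case], total)]


print(anonymization_level("simple use", 14))            # High
print(anonymization_level("internal combination", 18))  # Very high
print(anonymization_level("external combination", 15))  # Low
```

Keeping the cut-offs in a table rather than in nested conditionals makes it easier to compare them against the risk calculation tables in the figures.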

2) Risk measurement for the data use environment (30%)

This measurement is divided into the pseudonymization case and the anonymization case and uses a checklist. Items on a five-point scale are given 1 to 5 points, yes/no items are given 5 points and 1 point respectively, and the scores are then summed.
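A minimal sketch of this checklist scoring rule follows. The representation of answers (integers for five-point items, booleans for yes/no items) is an assumption of the sketch, not part of the described method.

```python
def checklist_score(answers):
    """Sum checklist answers (hypothetical helper).

    Each answer is either an int from 1 to 5 (five-point item) or a bool
    (yes/no item, scored 5 for yes and 1 for no).
    """
    total = 0
    for answer in answers:
        # bool must be tested first because bool is a subclass of int.
        if isinstance(answer, bool):
            total += 5 if answer else 1
        elif isinstance(answer, int) and 1 <= answer <= 5:
            total += answer
        else:
            raise ValueError(f"invalid checklist answer: {answer!r}")
    return total


# Example: 3 + 5 (yes) + 1 (no) + 5 = 14 points.
print(checklist_score([3, True, False, 5]))
```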

Specifically, the detailed risk for the data use environment may be measured separately for the pseudonymization case and the anonymization case, as shown in Tables 2 and 3 below.

First, the likelihood of a re-identification attempt refers to two measurements of the data user, or of the institution or company receiving the data: 1) re-identification intent and capability, and 2) level of personal information protection.

Here, the re-identification intent and capability of 1) may be divided into internal use and external use. Each of these is further divided into the pseudonymization and anonymization cases, for which the following detailed indicators may be defined.

1-1) External use with pseudonymization (15%, converted to a 7.5-point scale)

The checklist in Table 4 below is used. Items on a five-point scale are given 1 to 5 points, yes/no items are given 5 points and 1 point respectively, and the scores are then summed. Here, the two indicators shown in blue are indicators added only for external use.

Depending on the total score of the evaluation results for these detailed indicators, the risk may be rated very low for 21 points or less (12.4%, 38,640 cases), low for 22 to 25 points (25.4%, 79,340 cases), normal for 26 to 28 points (24.4%, 76,540 cases), high for 29 to 32 points (25.4%, 79,340 cases), and very high for 33 points or more (12.4%, 38,640 cases) (see the risk calculation table in Table 5).

1-2) External use with anonymization (15%, converted to a 15-point scale)

In this case, the detailed indicators and the scoring method are the same as for pseudonymization, and the risk calculation table is as shown in Table 6.

1-3) Internal use with pseudonymization (15%, converted to a 7.5-point scale)

The checklist in Table 7 below is used. Items on a five-point scale are given 1 to 5 points, yes/no items are given 5 points and 1 point respectively, and the scores are then summed.

Depending on the total score of the evaluation results for these detailed indicators, the risk may be rated very low for 16 points or less (13.4%, 4,201 cases), low for 17 to 19 points (22.4%, 6,986 cases), normal for 20 to 22 points (28.4%, 8,876 cases), high for 23 to 25 points (22.4%, 6,986 cases), and very high for 26 points or more (13.4%, 4,201 cases) (see the risk calculation table in Table 8).

1-4) Internal use with anonymization (15%, converted to a 15-point scale)

In this case, the detailed indicators and the scoring method are the same as for pseudonymization, and the risk calculation table is as shown in Table 9.

Meanwhile, the detailed indicators for 2) the level of personal information protection may be defined as in Table 10, for the pseudonymization case only (25%, converted to a 7.5-point scale).

Depending on the total score of the evaluation results for these detailed indicators, the risk may be rated very low for 23 points or less (13.6%), low for 24 to 27 points (20.2%), normal for 28 to 32 points (32.4%), high for 33 to 36 points (20.2%), and very high for 37 points or more (13.6%) (see the risk calculation table in Table 11).

Next, the analysis of the impact on the data subject upon re-identification (50%, 15 points in total) concerns the impact on the data subject when data is re-identified, whether intentionally or unintentionally. Its detailed indicators are the same for pseudonymization and anonymization and may be defined as in Table 12 (50%, converted to a 15-point scale).

Depending on the total score of the evaluation results for these detailed indicators, the risk may be rated very low for 8 points or less (11.2%, 70 cases), low for 9 to 10 points (19.2%, 120 cases), normal for 11 to 13 points (39.2%, 245 cases), high for 14 to 15 points (19.2%, 120 cases), and very high for 16 points or more (11.2%, 70 cases) (see the risk calculation table in Table 13).
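The case counts quoted for the impact analysis can be cross-checked by enumeration. The sketch below assumes that Table 12, which is not reproduced here, consists of four indicators each scored from 1 to 5; this is an assumption inferred from the total of 625 cases, not something stated in the text. Under that assumption the enumeration reproduces the counts 70, 120, 245, 120, and 70.

```python
from bisect import bisect_left
from itertools import product

LEVELS = ["Very low", "Low", "Normal", "High", "Very high"]

# Inclusive upper bounds of the first four levels (8, 10, 13, 15);
# totals of 16 or more are Very high.
UPPER_BOUNDS = [8, 10, 13, 15]


def impact_level(total):
    """Map a summed impact score to a level (hypothetical helper)."""
    return LEVELS[bisect_left(UPPER_BOUNDS, total)]


def case_counts(num_items=4):
    """Count score combinations per level, assuming num_items items scored 1 to 5."""
    counts = {level: 0 for level in LEVELS}
    for combo in product(range(1, 6), repeat=num_items):
        counts[impact_level(sum(combo))] += 1
    return counts


print(case_counts())
# {'Very low': 70, 'Low': 120, 'Normal': 245, 'High': 120, 'Very high': 70}
```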

3) Risk measurement for the data itself (30%)

This measurement is the same for pseudonymization and anonymization and uses a checklist. Items on a five-point scale are given 1 to 5 points, yes/no items are given 5 points and 1 point respectively, and the scores are then summed.

Specifically, the risk measurement for the data itself may be divided into the following three areas.

- Data composition (50% weight, converted to a 15-point scale)

- Data distribution (20% weight, converted to a 6-point scale)

- Data sensitivity (30% weight, converted to a 9-point scale)

Here, data composition may be further divided into the following six sub-areas.

1) Whether the data contains a unique identifier

2) Total number of columns in the data

3) Single or multiple statistical characteristics of the data set

4) Total number of quasi-identifier columns

5) Total number of sensitive-information columns

6) Whether the population of the original data corresponds to the entire nation

In addition, data distribution may be further divided into the following two sub-areas.

1) Distribution of attribute values within each column (attribute)

2) Whether outliers that allow an individual to be identified are included

In addition, data sensitivity may be further divided into the following three sub-areas.

1) Single, multiple, or linked (behavioral or locational) temporal characteristics of the original data

2) Whether data restricted by Korean law is included

3) Whether sensitive information handled by the relevant industry group is included

The detailed indicators for data composition are the same for pseudonymization and anonymization and may be defined as in Table 14.

If the first of these detailed indicators (that is, whether a unique identifier is included) is rated inadequate, the total data composition score is converted to 0 points. Otherwise, the risk may be rated very low for a total of 13 points or less (15.94%, 51 cases), low for 14 to 16 points (20.94%, 67 cases), normal for 17 to 19 points (26.24%, 84 cases), high for 20 to 22 points (20.94%, 67 cases), and very high for 23 points or more (15.94%, 51 cases) (see the risk calculation table in Table 15).
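The data composition rule, including the override applied when the unique-identifier indicator fails, can be sketched as follows. The function name, the boolean parameter, and the choice to return no level when the score is forced to 0 are assumptions of this sketch; the text states only that the total is converted to 0 points in that case.

```python
from bisect import bisect_left

LEVELS = ["Very low", "Low", "Normal", "High", "Very high"]

# Inclusive upper bounds of the first four levels (13, 16, 19, 22);
# totals of 23 or more are Very high.
UPPER_BOUNDS = [13, 16, 19, 22]


def data_composition_result(unique_identifier_ok, total):
    """Return (score, level) for data composition (hypothetical helper).

    unique_identifier_ok is False when the first indicator (whether a unique
    identifier is included) was rated inadequate; the score is then 0 and no
    level is assigned in this sketch.
    """
    if not unique_identifier_ok:
        return 0, None
    return total, LEVELS[bisect_left(UPPER_BOUNDS, total)]


print(data_composition_result(True, 18))   # (18, 'Normal')
print(data_composition_result(False, 18))  # (0, None)
```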

The detailed indicators for data distribution are the same for pseudonymization and anonymization and may be defined as in Table 16.

Depending on the total score of the evaluation results for these detailed indicators, the risk may be rated low for 2 points (25%, 1 case), normal for 6 points (50%, 2 cases), and high for 10 points (25%, 1 case) (see the risk calculation table in Table 17).

The detailed indicators for data sensitivity are the same for pseudonymization and anonymization and may be defined as in Table 18.

Depending on the total score of the evaluation results for these detailed indicators, the risk may be rated very low for 5 points or less (16.67%, 2 cases), low for 7 points (25%, 3 cases), normal for 9 points (16.66%, 2 cases), high for 11 points (25%, 3 cases), and very high for 13 points or more (16.67%, 2 cases) (see the risk calculation table in Table 19).

Step 2 (Calculating the total risk in light of the data situation and determining the processing level)

The three risks calculated in Step 1 above, for the data utilization method (A), the data use environment (B), and the data itself (C), may be summed according to the weighting ratios in Table 20 below to calculate the total risk (D).

According to the measurement items and weighting ratios described above, the total score may be finally rated in three levels for pseudonymization (Level 1: less than 52 points, 29.2%; Level 2: 52 points or more and less than 70 points, 44.2%; Level 3: 70 points or more, 26.6%) and in five levels for anonymization (Level 1: less than 42 points, 9.8%; Level 2: 42 points or more and less than 53 points, 21.8%; Level 3: 53 points or more and less than 68 points, 36.8%; Level 4: 68 points or more and less than 79 points, 21.8%; Level 5: 79 points or more, 9.8%). The processing level corresponding to the final rating is shown in Table 21 below.
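As a non-limiting sketch, the final level determination can be written as a lookup over the stated cut-offs. The sketch assumes that A, B, and C have already been scaled by the weighting ratios of Table 20, so that D is their plain sum; the function and constant names are likewise assumptions.

```python
from bisect import bisect_right

# Lower bounds of Level 2 and above. A total equal to a cut-off belongs to the
# higher level ("52 points or more"), hence bisect_right.
PSEUDONYM_CUTOFFS = [52, 70]          # Levels 1 to 3
ANONYMOUS_CUTOFFS = [42, 53, 68, 79]  # Levels 1 to 5


def processing_level(a, b, c, anonymous):
    """Return (D, level number) from pre-weighted risks A, B, C (hypothetical helper)."""
    d = a + b + c
    cutoffs = ANONYMOUS_CUTOFFS if anonymous else PSEUDONYM_CUTOFFS
    return d, bisect_right(cutoffs, d) + 1


print(processing_level(24, 15, 13, anonymous=False))  # (52, 2)
print(processing_level(40, 30, 9, anonymous=True))    # (79, 5)
```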

Step 4 (Adequacy evaluation step)

According to an embodiment of the present invention, the adequacy evaluation step may be performed by the applicant institution itself, or by an adequacy evaluation team separate from the applicant institution.

Here, when the evaluator is the adequacy evaluation team, the evaluation may include a quantitative analysis of the K-anonymity model used (only where such a model was used).

In addition, the adequacy evaluation may consider the pseudonymization criteria and the anonymization criteria separately.

FIG. 26 is a flowchart of the adequacy evaluation procedure according to an embodiment of the present invention. The procedure may consist of six steps in total, and the adequacy evaluation may be performed in light of the method described in detail above in connection with the personal information de-identification method shown in FIG. 4.

First, the adequacy evaluation procedure for pseudonymization is described.

[Step 1] Measure and evaluate the risk of the data situation

[Step 2] Measure the total risk (D=A+B+C) in light of the data situation and evaluate the processing level

[Step 3] Evaluate whether the pseudonymization safety evaluation criteria, applied according to the processing-level evaluation result of [Step 2] (three risk levels in total), are satisfied, based on the pseudonymization safety evaluation criteria in Table 22 below

[Step 4] According to the result of [Step 3], judge the data set adequate if the processing criteria are satisfied and inadequate otherwise

[Step 5] Record and retain the entire evaluation process

Here, pseudonymization refers to a particular type of de-identification that removes every association with the data subject and adds an association between a particular set of characteristics relating to the data subject and one or more pseudonyms.

In addition, irreversible pseudonymization refers to a situation in which the pseudonym cannot be traced or computed back to the original identifier; a temporary table may be used during the process but is removed once the process is complete. Reversible pseudonymization refers to a situation in which the pseudonym can be traced or computed back to the original identifier, for example by means of a secret lookup table that an authorized party can use to discover the original identity.
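The distinction between the two modes can be illustrated with a minimal sketch. Random tokens are used here as pseudonyms purely for illustration; the description does not prescribe a particular pseudonym generation technique, and the function name and return format are assumptions.

```python
import secrets


def pseudonymize(identifiers, reversible):
    """Replace each identifier with a random pseudonym (illustrative only).

    Returns (pseudonyms, lookup). When reversible is True, lookup is a secret
    table mapping pseudonym -> original identifier, to be held only by an
    authorized party. When False, the temporary table is discarded and lookup
    is None, so the pseudonyms cannot be traced back.
    """
    forward = {}   # temporary table: identifier -> pseudonym
    used = set()
    for identifier in identifiers:
        if identifier not in forward:
            token = secrets.token_hex(8)
            while token in used:  # regenerate on the (unlikely) collision
                token = secrets.token_hex(8)
            used.add(token)
            forward[identifier] = token
    pseudonyms = [forward[identifier] for identifier in identifiers]
    lookup = {p: i for i, p in forward.items()} if reversible else None
    return pseudonyms, lookup


pseudonyms, table = pseudonymize(["alice", "bob", "alice"], reversible=True)
# The same identifier always receives the same pseudonym within one run.
print(pseudonyms[0] == pseudonyms[2], table[pseudonyms[1]])
```

In the reversible case the safety of the scheme rests on how the lookup table is stored and who may access it, which is why the evaluation considers the additional information generated during pseudonymization.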

In addition, identifiability means the possibility of recognizing (identifying) the data subject from the pseudonymized data set, and recoverability means the possibility of restoring the original data subject (reverting to the state before pseudonymization) without the additional information generated during pseudonymization (encryption keys, mapping tables, and the like).

In addition, the processing targets include unique identifiers, some quasi-identifiers, and outliers. A unique identifier is an attribute of a data set that by itself singles out a data subject in that data set. "Some quasi-identifiers" means quasi-identifiers that are easily obtained, or attributes that an attacker is highly likely to combine with external data. An outlier is unusual or rare data.

Next, the adequacy evaluation procedure for anonymization is described.

[Step 3] Evaluate whether the anonymization safety evaluation criteria, applied according to the processing-level evaluation result of [Step 2] (five risk levels in total), are satisfied, by applying the evaluation criteria and methods of Tables 23 to 25 below for each level according to the three criteria presented in the international standard ISO/IEC 20889, namely singling out, linkability, and inference

[Step 4] According to the result of [Step 3], judge the data set adequate if the processing criteria are satisfied and inadequate otherwise

Here, singling out refers to the act of isolating the records belonging to a data subject by observing a set of characteristics in the data set that uniquely identifies that data subject (see Table 23).

In addition, linkability refers to the act of linking records relating to the same data subject, or group of data subjects, in a separate data set (see Table 24).

In addition, inference refers to the act of deducing, with non-negligible probability, the value of an attribute from the values of a set of other attributes (see Table 25).

분류classification 사용 형태mode of use 평가 기준 및 방법Evaluation Criteria and Methods 최종 위험도 측정 결과의 LevelLevel of final risk measurement result Level 1Level 1 내부분석실internal analysis room [단계 1] 준식별자들에 대한 유일성(Unique) 확인, 유일성(Unique)이 있으면 부적정 그렇지 않으면 적정[Step 1] Check the uniqueness of the quasi-identifiers, if unique, not appropriate, otherwise appropriate 샌드박스
(밀실)sandbox
(den) [단계 1] 준식별자들에 대한 유일성(Unique) 확인, 유일성(Unique)이 있으면 부적정 그렇지 않으면 적정[Step 1] Check the uniqueness of the quasi-identifiers, if unique, not appropriate, otherwise appropriate 데이터이용
합의서Data use
agreement [단계 1] 준식별자들에 대한 유일성(Unique) 확인, 유일성(Unique)이 있으면 부적정 그렇지 않으면 [단계 2]로 이동
[단계 2] 이상치(Outlier)에 대한 유일성(Unique) 처리 여부 확인, 이상치(Outlier)에 대한 처리가 되어 있지 않으면 부적정 그렇지 않으면 적정[Step 1] Check the uniqueness of the quasi-identifiers, if it is unique, it is invalid. Otherwise, go to [Step 2]
[Step 2] Check whether the outlier is treated as unique, if the outlier is not processed, it is not appropriate. Otherwise, it is appropriate. Level 2Level 2 내부분석실internal analysis room [단계 1] 준식별자들에 대한 유일성(Unique) 확인, 유일성(Unique)이 있으면 부적정 그렇지 않으면 적정[Step 1] Check the uniqueness of the quasi-identifiers, if unique, not appropriate, otherwise appropriate 샌드박스
(밀실)sandbox
(den) [단계 1] 준식별자들에 대한 유일성(Unique) 확인, 유일성(Unique)이 있으면 부적정 그렇지 않으면 적정[Step 1] Check the uniqueness of the quasi-identifiers, if unique, not appropriate, otherwise appropriate 데이터이용
합의서Data use
agreement [단계 1] 준식별자들에 대한 유일성(Unique) 확인, 유일성(Unique)이 있으면 부적정 그렇지 않으면 [단계 2]로 이동
[단계 2] 이상치(Outlier)에 대한 유일성(Unique) 처리 여부 확인, 이상치(Outlier)에 대한 처리가 되어 있지 않으면 부적정 그렇지 않으면 적정[Step 1] Check the uniqueness of the quasi-identifiers, if it is unique, it is invalid. Otherwise, go to [Step 2]
[Step 2] Check whether the outlier is treated as unique, if the outlier is not processed, it is not appropriate. Otherwise, it is appropriate. Level 3Level 3 내부분석실internal analysis room [단계 1] 준식별자들에 대한 유일성(Unique) 확인, 유일성(Unique)이 있으면 부적정 그렇지 않으면 [단계 2]로 이동
[단계 2] 이상치(Outlier)에 대한 유일성(Unique) 처리 여부 확인, 이상치(Outlier)에 대한 처리가 되어 있지 않으면 부적정 그렇지 않으면 [단계 3]으로 이동
[단계 3] 준식별자가 아닌 컬럼에 대한 원 데이터와의 완전 일치여부 확인, 완전 일치하는 데이터가 없으면 적정, 완전 일치하는 데이터가 있으면 [단계 4]로 이동
[단계 4] 완전 일치하는 데이터의 값의 분석 목적에 대한 필요성 확인, 필요성이 있으면 관리적 절차와 이용환경의 보호수준을 감안하여 적정 여부를 판단, 필요성이 없으면 부적정
※ 원본과의 대조 가능성을 제거하기 위한 관리적 절차와 이용환경의 보호수준을 고려[Step 1] Check the uniqueness of the quasi-identifiers, if it is unique, it is invalid. Otherwise, go to [Step 2]
[Step 2] Check whether the outlier is processed as unique, if the outlier is not processed, it is not appropriate. Otherwise, go to [Step 3].
[Step 3] Check whether the column that is not a quasi-identifier exactly matches the original data, if there is no exact match, it is appropriate
[Step 4] Check the necessity for the analysis purpose of the data value that matches perfectly, if necessary, judge whether it is appropriate in consideration of the management procedure and the level of protection of the use environment, if there is no need, it is inappropriate
※ Consideration of administrative procedures to eliminate the possibility of collation with the original and the level of protection of the use environment 샌드박스
(밀실)sandbox
(den) [단계 1] 준식별자들에 대한 유일성(Unique) 확인, 유일성(Unique)이 있으면 부적정 그렇지 않으면 적정[Step 1] Check the uniqueness of the quasi-identifiers, if unique, not appropriate, otherwise appropriate 데이터이용
합의서Data use
agreement [단계 1] 준식별자들에 대한 유일성(Unique) 확인, 유일성(Unique)이 있으면 부적정 그렇지 않으면 [단계 2]로 이동
[단계 2] 이상치(Outlier)에 대한 유일성(Unique) 처리 여부 확인, 이상치(Outlier)에 대한 처리가 되어 있지 않으면 부적정 그렇지 않으면 적정[Step 1] Check the uniqueness of the quasi-identifiers, if it is unique, it is invalid. Otherwise, go to [Step 2]
[Step 2] Check whether the outlier is treated as unique, if the outlier is not processed, it is not appropriate. Otherwise, it is appropriate. 최종 위험도 측정
결과의 LevelFinal risk measurement
Result Level Level 4Level 4 내부분석실internal analysis room [단계 1] 준식별자들에 대한 유일성(Unique) 확인, 유일성(Unique)이 있으면 부적정 그렇지 않으면 [단계 2]로 이동
[단계 2] 이상치(Outlier)에 대한 유일성(Unique) 처리 여부 확인, 이상치(Outlier)에 대한 처리가 되어 있지 않으면 부적정 그렇지 않으면 [단계 3]으로 이동
[단계 3] 준식별자가 아닌 컬럼에 대한 원 데이터와의 완전 일치여부 확인, 완전 일치하는 데이터가 없으면 적정, 완전 일치하는 데이터가 있으면 [단계 4]로 이동
[단계 4] 완전 일치하는 데이터의 값의 분석 목적에 대한 필요성 확인, 필요성이 있으면 관리적 절차와 이용환경의 보호수준을 감안하여 적정 여부를 판단, 필요성이 없으면 부적정
※ 원본과의 대조 가능성을 제거하기 위한 관리적 절차와 이용환경의 보호수준을 고려[Step 1] Check the uniqueness of the quasi-identifiers, if it is unique, it is invalid. Otherwise, go to [Step 2]
[Step 2] Check whether the outlier is processed as unique, if the outlier is not processed, it is not appropriate. Otherwise, go to [Step 3].
[Step 3] Check whether the column that is not a quasi-identifier exactly matches the original data, if there is no exact match, it is appropriate
[Step 4] Check whether the exactly matching data values are needed for the analysis purpose. If they are needed, judge adequacy in consideration of the management procedures and the protection level of the use environment; if they are not needed, the result is inappropriate.
※ Consider the administrative procedures for eliminating the possibility of comparison with the original, and the protection level of the use environment.

Sandbox (closed room)
[Step 1] Check the uniqueness of the quasi-identifiers. If any are unique, the result is inappropriate; otherwise it is appropriate.

Data use agreement
[Step 1] Check the uniqueness of the quasi-identifiers. If any are unique, the result is inappropriate; otherwise go to [Step 2].
[Step 2] Check whether uniqueness caused by outliers has been treated. If outliers have not been treated, the result is inappropriate; otherwise go to [Step 3].
[Step 3] Check whether the columns other than the quasi-identifiers exactly match the original data. If no data match exactly, the result is appropriate; if any data match exactly, go to [Step 4].
[Step 4] Check whether the exactly matching data values are needed for the analysis purpose. If they are needed, determine adequacy based on the contents of the contract and the use environment.

Level 5

Data use agreement
[Step 1] Check the uniqueness of the quasi-identifiers. If any are unique, the result is inappropriate; otherwise go to [Step 2].
[Step 2] Check whether uniqueness caused by outliers has been treated. If outliers have not been treated, the result is inappropriate; otherwise go to [Step 3].
[Step 3] Check whether the columns other than the quasi-identifiers exactly match the original data. If no data match exactly, the result is appropriate; if any data match exactly, go to [Step 4].
[Step 4] Check whether the exactly matching data values are needed for the analysis purpose. If they are needed, determine adequacy based on the contents of the contract and the use environment.

Full disclosure
[Step 1] Check the uniqueness of the quasi-identifiers. If any are unique, the result is inappropriate; otherwise go to [Step 2].
[Step 2] Check whether uniqueness caused by outliers has been treated. If outliers have not been treated, the result is inappropriate; otherwise go to [Step 3].
[Step 3] Check whether the columns other than the quasi-identifiers exactly match the original data. If no data match exactly, the result is appropriate; if any data match exactly, go to [Step 4].
[Step 4] If any data match exactly, check the uniqueness of those data. If they are unique, the result is inappropriate; otherwise it is appropriate.
※ Techniques that prevent exact matches include noise addition and categorization; check whether such a technique has been applied to each column.
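The four full-disclosure steps above can be sketched in code. The following is a minimal illustration, not the patented implementation: the function names, the row-as-dictionary layout, and the `outliers_treated` flag (standing in for the reviewer's finding in Step 2) are all hypothetical. It also assumes that "exact match" means the combination of non-quasi-identifier values in a released row also appears in some original row, and that uniqueness in Step 4 is measured within the released data set. The released data set is assumed to be non-empty.

```python
from collections import Counter


def _key(row, columns):
    """Value combination of a row restricted to the given columns."""
    return tuple(row[c] for c in columns)


def has_unique(rows, columns, candidates=None):
    """True if any candidate row has a value combination that occurs once in rows."""
    counts = Counter(_key(r, columns) for r in rows)
    pool = rows if candidates is None else candidates
    return any(counts[_key(r, columns)] == 1 for r in pool)


def evaluate_full_disclosure(original, released, quasi_identifiers, outliers_treated):
    """Hypothetical walk through Steps 1 to 4 of the full-disclosure case."""
    # Step 1: a unique quasi-identifier combination is inappropriate.
    if has_unique(released, quasi_identifiers):
        return "inappropriate"
    # Step 2: outlier treatment is taken as a reviewer-supplied finding (assumption).
    if not outliers_treated:
        return "inappropriate"
    # Step 3: look for non-quasi-identifier values that exactly match the original.
    other = [c for c in released[0] if c not in quasi_identifiers]
    original_values = {_key(r, other) for r in original}
    matches = [r for r in released if _key(r, other) in original_values]
    if not matches:
        return "appropriate"
    # Step 4: exactly matching data must not be unique within the released set.
    return "inappropriate" if has_unique(released, other, matches) else "appropriate"


original = [
    {"age": 34, "zip": "12345", "income": 5000},
    {"age": 36, "zip": "12349", "income": 5000},
]
released = [
    {"age": "30-39", "zip": "1234*", "income": 5100},  # noise added to income
    {"age": "30-39", "zip": "1234*", "income": 4900},
]
print(evaluate_full_disclosure(original, released, ["age", "zip"], True))
```

In this example the quasi-identifiers are generalized so that both released rows share one combination, and noise on `income` prevents any exact match, so the sketch stops at Step 3.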

Classification (level of the final risk measurement result) | Mode of use | Evaluation criteria and methods

Level 1

Internal analysis room
[Step 1] If, based on the internal information management plan and the like, there is sufficient control over the risk of linkability with the original, the result is appropriate; otherwise adequacy is determined according to the judgment of the evaluation team.

Sandbox (closed room)
[Step 1] Check whether the sandbox (closed room) contains any data other than the data to be analyzed.
[Step 2] If there are no other data, there is no possibility of linkage and the result is appropriate; if there are, go to [Step 3].
[Step 3] Check whether K-anonymity has been applied. If it has been applied, the result is appropriate; if it has not, adequacy is determined according to the judgment of the adequacy evaluation team.
* The level of protection provided by K-anonymity (representative or average K value, etc.) is judged by the adequacy evaluation team.

Data use agreement
[Step 1] Review whether K-anonymity has been applied to the quasi-identifiers. If it has been applied, go to [Step 2]; if not, the result is inappropriate.
[Step 2] If, in the judgment of the adequacy evaluation team, a sufficient K value has been applied, go to [Step 3]; if it is not sufficient, the result is inappropriate.
[Step 3] Review the contract and determine adequacy based on whether it establishes technical, administrative, and procedural measures that sufficiently reduce linkage attacks.

Level 2

Internal analysis room
[Step 1] Review the internal information management plan and the like. If there is sufficient control over the risk of linkability with the original, the result is appropriate; otherwise adequacy is determined according to the judgment of the evaluation team.

Sandbox (closed room)
[Step 1] Check whether the sandbox (closed room) contains any data other than the data to be analyzed.
[Step 2] If there are no other data, there is no possibility of linkage and the result is appropriate; if there are, go to [Step 3].
[Step 3] Check whether K-anonymity has been applied. If it has been applied, the result is appropriate; if it has not, adequacy is determined according to the judgment of the adequacy evaluation team.
* The level of protection provided by K-anonymity (representative or average K value, etc.) is judged by the adequacy evaluation team.

Data use agreement
[Step 1] Review whether K-anonymity has been applied to the quasi-identifiers. If it has been applied, go to [Step 2]; if not, the result is inappropriate.
[Step 2] If, in the judgment of the adequacy evaluation team, a sufficient K value has been applied, go to [Step 3]; if it is not sufficient, the result is inappropriate.
[Step 3] Review the contract and determine adequacy based on whether it establishes technical, administrative, and procedural measures that sufficiently reduce linkage attacks.

Level 3

Internal analysis room
[Step 1] Review whether K-anonymity has been applied to the quasi-identifiers. If it has been applied, go to [Step 2]; if not, the result is inappropriate.
[Step 2] If, in the judgment of the adequacy evaluation team, a sufficient K value has been applied, go to [Step 3]; if it is not sufficient, the result is inappropriate.
[Step 3] Review the internal information management plan and the like. If there is sufficient control over the risk of linkability with the original and of linkability with other data, the result is appropriate; otherwise adequacy is determined according to the judgment of the evaluation team.

Sandbox (closed room)
[Step 1] Check whether the sandbox (closed room) contains any data other than the data to be analyzed.
[Step 2] If there are no other data, there is no possibility of linkage and the result is appropriate; if there are, go to [Step 3].
[Step 3] Check whether K-anonymity has been applied. If it has been applied, the result is appropriate; if it has not, the result is inappropriate.
* The level of protection provided by K-anonymity (representative or average K value, etc.) is judged by the adequacy evaluation team.

Data use agreement
[Step 1] Review whether K-anonymity has been applied to the quasi-identifiers. If it has been applied, go to [Step 2]; if not, the result is inappropriate.
[Step 2] If, in the judgment of the adequacy evaluation team, a sufficient K value has been applied, go to [Step 3]; if it is not sufficient, the result is inappropriate.
[Step 3] If the preceding evaluation of the possibility of singling out found the data to be de-identified to the full-disclosure level, the result is appropriate; otherwise go to [Step 4].
[Step 4] Review the contract and determine adequacy based on whether it establishes technical, administrative, and procedural measures that sufficiently reduce linkage attacks.

Level 4

Internal analysis room
[Step 1] Review whether K-anonymity has been applied to the quasi-identifiers. If it has been applied, go to [Step 2]; if not, the result is inappropriate.
[Step 2] If, in the judgment of the adequacy evaluation team, a sufficient K value has been applied, go to [Step 3]; if it is not sufficient, the result is inappropriate.
[Step 3] If, based on the internal information management plan and the like, there is sufficient control over the risk of linkability with the original and of linkability with other data, go to [Step 4]; otherwise the result is inappropriate.
[Step 4] Check whether the data are provided periodically. If a technique has been applied that sufficiently excludes the linkability arising from periodically provided data (linkability between the provided data sets), the result is appropriate; otherwise adequacy is determined, according to the judgment of the adequacy evaluation team, by reviewing whether technical, administrative, and procedural measures that sufficiently reduce linkage attacks have been established.
※ Consider the administrative procedures for eliminating the possibility of comparison with the original, and the protection level of the use environment.

Sandbox (closed room)
[Step 1] Check whether the sandbox (closed room) contains any data other than the data to be analyzed.
[Step 2] If there are no other data, there is no possibility of linkage and the result is appropriate; if there are, go to [Step 3].
[Step 3] Check whether K-anonymity has been applied. If it has been applied, the result is appropriate; if it has not, the result is inappropriate.
* The level of protection provided by K-anonymity (representative or average K value, etc.) is judged by the adequacy evaluation team.

Data use agreement
[Step 1] Check the sensitivity of the data to be used. If the data sensitivity evaluation rated it "very high" or above, the result is inappropriate; otherwise go to [Step 2].
[Step 2] Review whether K-anonymity has been applied to the quasi-identifiers (QI). If it has been applied, go to [Step 3]; if not, the result is inappropriate.
[Step 3] If, in the judgment of the adequacy evaluation team, a sufficient K value has been applied, go to [Step 4]; if it is not sufficient, the result is inappropriate.
[Step 4] If the preceding evaluation of the possibility of singling out found the data to be de-identified to the full-disclosure level, the result is appropriate; otherwise go to [Step 5].
[Step 5] Check whether the data are provided periodically. If a technique has been applied that sufficiently excludes the linkability arising from periodically provided data (linkability between the provided data sets), go to [Step 6]; otherwise the result is inappropriate.
[Step 6] Review the contract and determine adequacy based on whether it establishes technical, administrative, and procedural measures that sufficiently reduce linkage attacks.

Level 5

Data use agreement
[Step 1] Check the sensitivity of the data to be used. If the 'data sensitivity' evaluation described above rated it "high" to "very high" or above, the result is inappropriate; otherwise go to [Step 2].
[Step 2] Review whether K-anonymity has been applied to the quasi-identifiers. If it has been applied, go to [Step 3]; if not, the result is inappropriate.
[Step 3] If, in the judgment of the adequacy evaluation team, a sufficient K value has been applied, go to [Step 4]; if it is not sufficient, the result is inappropriate.
[Step 4] If the preceding evaluation of the possibility of singling out found the data to be de-identified to the full-disclosure level, the result is appropriate; otherwise go to [Step 5].
[Step 5] Check whether the data are provided periodically. If a technique has been applied that sufficiently excludes the linkability arising from periodically provided data (linkability between the provided data sets), go to [Step 6]; otherwise the result is inappropriate.
[Step 6] Review the contract and determine adequacy based on whether it establishes technical, administrative, and procedural measures that sufficiently reduce linkage attacks.

Full disclosure
[Step 1] Check the sensitivity of the data to be used. If the 'data sensitivity' evaluation described above rated it "normal" or above, the result is inappropriate; otherwise go to [Step 2].
[Step 2] Review whether K-anonymity has been applied to the quasi-identifiers. If it has been applied, go to [Step 3]; if not, the result is inappropriate.
[Step 3] If, in the judgment of the adequacy evaluation team, a sufficient K value has been applied, the result is appropriate; if it is not sufficient, the result is inappropriate.
(For example, in the United States and Canada, a representative K value of 20 or more is used for full disclosure.)
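The K-anonymity checks that recur in the table above can be illustrated with a short sketch of the full-disclosure case. It is a sketch under stated assumptions, not the method of the invention: the function names are hypothetical, the five-step rating scale is assumed, the smallest equivalence class is used as the representative K value, and K-anonymity is treated as "applied" when that value is at least 2. The table leaves the sufficient K value to the adequacy evaluation team, so `required_k` is a caller-supplied parameter. The data set is assumed to be non-empty.

```python
from collections import Counter

# Assumed five-step rating scale, ordered from lowest to highest.
SENSITIVITY_SCALE = ["very low", "low", "normal", "high", "very high"]


def equivalence_class_sizes(rows, quasi_identifiers):
    """Sizes of the groups of rows that share one quasi-identifier combination."""
    counts = Counter(tuple(row[c] for c in quasi_identifiers) for row in rows)
    return list(counts.values())


def evaluate_linkability_full_disclosure(sensitivity, rows, quasi_identifiers, required_k):
    """Hypothetical walk through Steps 1 to 3 of the full-disclosure case."""
    # Step 1: a sensitivity rating of "normal" or above is inappropriate.
    if SENSITIVITY_SCALE.index(sensitivity) >= SENSITIVITY_SCALE.index("normal"):
        return "inappropriate"
    # Representative K is taken as the smallest equivalence class (assumption).
    k = min(equivalence_class_sizes(rows, quasi_identifiers))
    # Step 2: K-anonymity counts as applied only if every class has 2+ rows (assumption).
    if k < 2:
        return "inappropriate"
    # Step 3: compare against the K value the evaluation team considers sufficient.
    return "appropriate" if k >= required_k else "inappropriate"


rows = [{"age": "30-39", "region": "A"}] * 3 + [{"age": "40-49", "region": "B"}] * 4
print(evaluate_linkability_full_disclosure("low", rows, ["age", "region"], 3))
```

Using the minimum class size is the strictest reading; an average K value, which the table also mentions, could be substituted by replacing `min` with a mean over the class sizes.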

Classification (level of the final risk measurement result) | Mode of use | Evaluation criteria and methods

Level 1

Internal analysis room
[Step 1] If, based on the internal information management plan and the like, there is sufficient control over identification through inference attacks, the result is appropriate; otherwise adequacy is determined according to the judgment of the evaluation team.

Sandbox (closed room)
[Step 1] It is determined that there is no possibility of inference.

Data use agreement
[Step 1] Review whether K-anonymity has been applied to the quasi-identifiers. If it has been applied, go to [Step 2]; if not, the result is inappropriate.
[Step 2] If, in the judgment of the adequacy evaluation team, a sufficient K value has been applied, go to [Step 3]; if it is not sufficient, the result is inappropriate.
[Step 3] Review the contract and determine adequacy based on whether it establishes technical, administrative, and procedural measures that sufficiently reduce inference attacks.

Level 2

Internal analysis room
[Step 1] If, based on the internal information management plan and the like, there is sufficient control over identification through inference attacks, the result is appropriate; otherwise adequacy is determined according to the judgment of the evaluation team.

Sandbox (closed room)
[Step 1] It is determined that there is no possibility of inference.

Data use agreement
[Step 1] Review whether K-anonymity has been applied to the quasi-identifiers. If it has been applied, go to [Step 2]; if not, the result is inappropriate.
[Step 2] If, in the judgment of the adequacy evaluation team, a sufficient K value has been applied, go to [Step 3]; if it is not sufficient, the result is inappropriate.
[Step 3] Review the contract and determine adequacy based on whether it establishes technical, administrative, and procedural measures that sufficiently reduce inference attacks.

Level 3

Internal analysis room
[Step 1] Check whether de-identification techniques that defend against inference attacks (noise addition and the like; see ISO/IEC 20889) have been sufficiently applied to all columns other than the quasi-identifiers. If they have been applied to all such columns, the result is appropriate; otherwise go to [Step 2].
[Step 2] Review the proportion of columns to which de-identification techniques that defend against inference attacks have been applied. If the evaluation team judges that the result is not inappropriate, go to [Step 3]. (Recommendation: review the possibility of inference column by column and, starting from the columns with the highest possibility, apply the techniques to 3/5 of the columns when the risk is very high, at least 1/2 when high, at least 2/5 when normal, at least 1/3 when low, and at least 1/5 when very low.)
[Step 3] If, based on the internal information management plan and the like, there is sufficient control over identification through inference attacks, the result is appropriate; otherwise adequacy is determined according to the judgment of the evaluation team.

Sandbox (closed room)
[Step 1] Check whether the data contain content that can be inferred by common sense. If they do not, the result is appropriate; if they do, adequacy is determined according to the judgment of the adequacy evaluation team.

Data use agreement
[Step 1] Check whether de-identification techniques that defend against inference attacks (noise addition and the like; see ISO/IEC 20889) have been sufficiently applied to all columns other than the quasi-identifiers. If they have been applied to all such columns, the result is appropriate; otherwise go to [Step 2].
[Step 2] Review the proportion of columns to which de-identification techniques that defend against inference attacks have been applied. If the evaluation team judges that the result is not inappropriate, go to [Step 3]. (Recommendation: review the possibility of inference column by column and, starting from the columns with the highest possibility, apply the techniques to at least 3/4 of the columns when the risk is very high, at least 2/3 when high, at least 1/2 when normal, at least 1/3 when low, and at least 1/4 when very low.)
[Step 3] Review the contract and determine adequacy based on whether it establishes technical, administrative, and procedural measures that sufficiently reduce inference attacks.

Level 4

Internal analysis room
[Step 1] Check whether de-identification techniques that defend against inference attacks (noise addition and the like; see ISO/IEC 20889) have been sufficiently applied to all columns other than the quasi-identifiers. If they have been applied to all such columns, the result is appropriate; otherwise go to [Step 2].
[Step 2] Review the proportion of columns to which de-identification techniques that defend against inference attacks have been applied. If the evaluation team judges that the result is not inappropriate, go to [Step 3]. (Recommendation: review the possibility of inference column by column and, starting from the columns with the highest possibility, apply the techniques to 4/5 of the columns when the risk is very high, at least 3/4 when high, at least 2/3 when normal, at least 1/2 when low, and at least 2/5 when very low.)
[Step 3] If, based on the internal information management plan and the like, there is sufficient control over identification through inference attacks, the result is appropriate; otherwise adequacy is determined according to the judgment of the evaluation team.

Sandbox (closed room)
[Step 1] Check whether the data contain content that can be inferred by common sense. If they do not, the result is appropriate; if they do, adequacy is determined according to the judgment of the adequacy evaluation team.

Data use agreement
[Step 1] Check the sensitivity of the data to be used. If the data sensitivity evaluation described above rated it "very high" or above, the result is inappropriate; otherwise go to [Step 2].
[Step 2] Check whether de-identification techniques that defend against inference attacks (noise addition and the like; see ISO/IEC 20889) have been sufficiently applied to all columns other than the quasi-identifiers. If they have been applied to all such columns, the result is appropriate; otherwise go to [Step 3].
[Step 3] Review the proportion of columns to which de-identification techniques that defend against inference attacks have been applied. If the evaluation team judges that the result is not inappropriate, go to [Step 4]. (Recommendation: review the possibility of inference column by column and, starting from the columns with the highest possibility, apply the techniques to at least 3/4 of the columns when the risk is high, at least 2/3 when normal, at least 1/2 when low, and at least 1/4 when very low.)
[Step 4] Review the contract and determine adequacy based on whether it establishes technical, administrative, and procedural measures that sufficiently reduce inference attacks.

Level 5

Data use agreement
[Step 1] Check the sensitivity of the data to be used. If the 'data sensitivity' evaluation described above rated it "high" to "very high" or above, the result is inappropriate; otherwise go to [Step 2].
[Step 2] Check whether de-identification techniques that defend against inference attacks (noise addition and the like; see ISO/IEC 20889) have been sufficiently applied to all columns other than the quasi-identifiers. If they have been applied to all such columns, the result is appropriate; otherwise go to [Step 3].
[Step 3] Review the proportion of columns to which de-identification techniques that defend against inference attacks have been applied. If the evaluation team judges that the result is not inappropriate, go to [Step 4]. (Recommendation: review the possibility of inference column by column and, starting from the columns with the highest possibility, apply the techniques to at least 3/4 of the columns when the risk is normal, at least 2/3 when low, and at least 1/2 when very low.)
[Step 4] Review the contract and determine adequacy based on whether it establishes technical, administrative, and procedural measures that sufficiently reduce inference attacks.

Full disclosure
[Step 1] Check the sensitivity of the data to be used. If the 'data sensitivity' evaluation described above rated it "normal" or above, the result is inappropriate; otherwise go to [Step 2].
[Step 2] Check whether de-identification techniques that defend against inference attacks (noise addition and the like; see ISO/IEC 20889) have been sufficiently applied to all columns other than the quasi-identifiers. If they have been applied to all such columns, the result is appropriate; otherwise the result is inappropriate.
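The recommended column proportions in the table above lend themselves to a lookup. The sketch below transcribes those fractions and compares them with the share of non-quasi-identifier columns that carry an inference-resistant technique. The dictionary layout, the key strings, and the function name are hypothetical, and ratings that a row of the table does not list are treated here as not meeting the recommendation, which is an assumption. The table gives these figures only as recommendations; the final decision remains with the evaluation team.

```python
from fractions import Fraction as F

# Minimum recommended share of protected columns, transcribed from the table.
# Keys are (level, mode of use); inner keys are the risk rating.
RECOMMENDED_RATIO = {
    ("Level 3", "internal analysis room"): {
        "very high": F(3, 5), "high": F(1, 2), "normal": F(2, 5),
        "low": F(1, 3), "very low": F(1, 5),
    },
    ("Level 3", "data use agreement"): {
        "very high": F(3, 4), "high": F(2, 3), "normal": F(1, 2),
        "low": F(1, 3), "very low": F(1, 4),
    },
    ("Level 4", "internal analysis room"): {
        "very high": F(4, 5), "high": F(3, 4), "normal": F(2, 3),
        "low": F(1, 2), "very low": F(2, 5),
    },
    ("Level 4", "data use agreement"): {
        "high": F(3, 4), "normal": F(2, 3), "low": F(1, 2), "very low": F(1, 4),
    },
    ("Level 5", "data use agreement"): {
        "normal": F(3, 4), "low": F(2, 3), "very low": F(1, 2),
    },
}


def meets_recommended_ratio(level, mode, risk, protected_columns, total_columns):
    """Compare the protected share of non-quasi-identifier columns with the table."""
    minimum = RECOMMENDED_RATIO[(level, mode)].get(risk)
    if minimum is None:
        # Rating not listed for this row; treated as not meeting it (assumption).
        return False
    # Exact rational arithmetic avoids rounding at the boundary, e.g. 3/5 vs 0.6.
    return F(protected_columns, total_columns) >= minimum


print(meets_recommended_ratio("Level 3", "internal analysis room", "very high", 3, 5))
```

`total_columns` must be greater than zero, since it counts the columns other than the quasi-identifiers.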

The present invention is not limited by the embodiments described above or by the accompanying drawings. It will be apparent to those of ordinary skill in the art to which the present invention pertains that the components according to the present invention may be substituted, modified, and changed without departing from the technical spirit of the present invention.

Claims

1. A method of de-identifying personal information, the method comprising:
measuring a risk of a data situation;
calculating a total risk in consideration of the data situation and determining a processing level;
performing pseudonymization or anonymization in consideration of the data situation according to the determined processing level;
evaluating the adequacy of the de-identified data set on which the pseudonymization or anonymization has been performed; and
completing the de-identification when the data set is evaluated as appropriate,
wherein the de-identification includes the pseudonymization or the anonymization.

2. The method of claim 1,
wherein the data situation includes a data utilization method, a data use environment, and the data itself.

3. The method of claim 2,
wherein, in the measuring of the risk, the risk of the data utilization method, the risk of the data use environment, and the risk of the data itself are each measured according to predefined indicators.

4. The method of claim 3,
wherein, in the calculating of the total risk and determining of the processing level, the total risk is calculated by summing the risk of the data utilization method, the risk of the data use environment, and the risk of the data itself according to a preset ratio, and the processing level is determined based on the total risk.

5. The method of claim 1,
wherein, when the data set is evaluated as inappropriate, the method re-enters the step of performing the pseudonymization or anonymization and repeats the processing until the data set is evaluated as appropriate.
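As an informal illustration of the weighted total risk and the repeat-until-appropriate loop recited in the claims above, the following sketch may help. It is not part of the claims. The weights, the level cutoffs, the function names, and the `max_rounds` safety limit are all hypothetical; the claims state only that a preset ratio is used and that processing repeats until the result is evaluated as appropriate.

```python
def total_risk(risks, weights):
    """Weighted sum of the three data-situation risks using a preset ratio."""
    return sum(risks[name] * weights[name] for name in weights)


def processing_level(total, cutoffs):
    """Map a total risk to a level using ascending upper bounds (hypothetical)."""
    for level, upper in enumerate(cutoffs, start=1):
        if total <= upper:
            return level
    return len(cutoffs) + 1


def de_identify(data, level, process, is_adequate, max_rounds=10):
    """Repeat processing until the adequacy evaluation passes.

    max_rounds is a safety limit added for this sketch only.
    """
    for _ in range(max_rounds):
        data = process(data, level)
        if is_adequate(data, level):
            return data
    raise RuntimeError("adequacy not reached within max_rounds")


# Hypothetical risk scores and ratio for the three aspects of the data situation.
risks = {"utilization": 2, "environment": 4, "data": 3}
weights = {"utilization": 0.25, "environment": 0.25, "data": 0.5}
level = processing_level(total_risk(risks, weights), cutoffs=[1, 2, 3, 4])
print(level)
```

Here `process` and `is_adequate` are caller-supplied callables standing in for the pseudonymization or anonymization step and the adequacy evaluation, respectively.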