JP5452187B2

JP5452187B2 - Public information privacy protection device, public information privacy protection method and program

Info

Publication number: JP5452187B2
Application number: JP2009268863A
Authority: JP
Inventors: 晋作清本; 俊昭田中
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2009-11-26
Filing date: 2009-11-26
Publication date: 2014-03-26
Anticipated expiration: 2029-11-26
Also published as: JP2011113285A

Description

The present invention relates to a privacy protection device for public information such as medical information, a privacy protection method for public information, and a program.

Conventionally, statistical processing is performed on large amounts of data and the results are widely published, for example information on which age groups, genders, regions, and races are prone to a specific disease, so that trends can be analyzed and countermeasures devised.

However, when data is published, privacy must be carefully protected so that the owner of the data cannot be identified, and the data must therefore be transformed. Many techniques for transforming data to protect privacy have been disclosed to date (see, for example, Non-Patent Document 1).

Non-Patent Document 1: B. Fung, K. Wang, and P. Yu, "Top-down specialization for information and privacy preservation," Proc. of ICDE 2005, pp. 205-216.

However, conventional methods treat all data equally in order to satisfy optimal k-anonymity, with the result that information the data user needs may be lost.

The present invention has been made in view of the above problem. Its object is to provide a public information privacy protection device, a public information privacy protection method, and a program that set a priority for each data item when the data is processed and evaluate the transformed data with a function, thereby retaining as much of the information the data user needs as possible.

To solve the above problem, the inventors propose the following. For ease of understanding, reference numerals corresponding to the embodiment of the present invention are given in the description, but the invention is not limited to them.

(1) The present invention proposes a public information privacy protection device that processes data to protect the privacy of information to be published, comprising: setting means (for example, corresponding to the setting unit 2 in FIG. 1) for setting a priority (weighting) for each attribute of the data in consideration of the requirements of the user of the public information; calculation means (for example, corresponding to the calculation unit 3 in FIG. 1) for calculating evaluation points of each data item based on the set priority (weighting); processing method selection means (for example, corresponding to the processing method selection unit 4 in FIG. 1) for selecting a data processing method that minimizes the decrease in the calculated evaluation points; and data processing means (for example, corresponding to the processing unit 5 in FIG. 1) for processing the data by the selected processing method.

According to this invention, the setting means sets a priority (weighting) for each attribute of the data in consideration of the requirements of the user of the public information. The calculation means calculates evaluation points of each data item based on the set priority (weighting). The processing method selection means selects a data processing method that minimizes the decrease in the calculated evaluation points. The data processing means processes the data by the selected processing method. Therefore, by setting a priority for each data item when the data is processed and evaluating the transformed data with a function to which bottom-up processing is applied, as much of the information the data user needs as possible is retained and data loss is prevented.

(2) The present invention proposes the public information privacy protection device of (1), wherein the processing method selection means selects the data processing method using bottom-up processing.

According to this invention, the processing method selection means selects the data processing method using bottom-up processing. In bottom-up processing, identical data is collected for each attribute and sorted and grouped, the number of attribute values of each attribute is calculated, and the evaluation points are calculated. Then, based on the set priority information (weighting) and the k-anonymity determination, the attribute and group to process are selected, the decrease in evaluation points caused by the processing is calculated, the selected group is processed, and k-anonymity is determined from the processing result of the entire data set. As a result, as much of the information the data user needs as possible is retained and data loss is prevented.

(3) The present invention proposes the public information privacy protection device of (1), wherein the data processing means processes attributes starting from the one with the lowest priority (weighting) set by the setting means and continues until k-anonymity is satisfied.

According to this invention, the data processing means processes attributes starting from the one with the lowest priority (weighting) set by the setting means and continues until k-anonymity is satisfied. This also prevents a user from being identified by combining multiple pieces of information that individually have little direct connection to the user, while retaining as much of the information the data user needs as possible.

(4) The present invention proposes the public information privacy protection device of (1), further comprising classification means (for example, corresponding to the classification unit 1 in FIG. 1) for classifying each attribute of the data as important information (Sensitive Information), a quasi-identifier (Quasi-Identifier), or information to be deleted.

According to this invention, the classification means classifies each attribute of the data as important information (Sensitive Information), a quasi-identifier (Quasi-Identifier), or information to be deleted. Information that can identify a user is therefore removed, the linkage between pieces of information is weakened, and identification of a user by combining multiple pieces of information is also prevented.

(5) The present invention proposes the public information privacy protection device of (4), wherein the important information (Sensitive Information) is not subject to processing and the information to be deleted is deleted automatically during processing.

According to this invention, the important information (Sensitive Information) is not subject to processing, and the information to be deleted is deleted automatically during processing. Privacy is thus protected by removing information that directly identifies a user, while the important information can still be published.

(6) The present invention proposes the public information privacy protection device of (1), wherein the calculation means calculates the evaluation points by multiplying the number of distinct attribute values of each attribute by the priority (weighting) set by the setting means.

According to this invention, the calculation means calculates the evaluation points by multiplying the number of distinct attribute values of each attribute by the priority (weighting) set by the setting means. Therefore, the more distinct values an attribute has and the higher its priority (weighting), the more likely its original information is to be retained.

(7) The present invention proposes a public information privacy protection device that processes data to protect the privacy of information to be published, comprising: setting means for setting a priority (weighting) for each attribute of the data in consideration of the requirements of the user of the public information; calculation means for calculating evaluation points of each data item based on the set priority (weighting); processing method selection means for selecting a data processing method that maximizes the increase in the calculated evaluation points; and data processing means for processing the data by the selected processing method.

According to this invention, the setting means sets a priority (weighting) for each attribute of the data in consideration of the requirements of the user of the public information. The calculation means calculates evaluation points of each data item based on the set priority (weighting). The processing method selection means selects a data processing method that maximizes the increase in the calculated evaluation points. The data processing means processes the data by the selected processing method. Therefore, by setting a priority for each data item when the data is processed and evaluating the transformed data with a function to which top-down processing is applied, as much of the information the data user needs as possible is retained and data loss is prevented.

(8) The present invention proposes a public information privacy protection method in a public information privacy protection device that processes data to protect the privacy of information to be published, comprising: a first step (for example, corresponding to step S201 in FIG. 5) of setting a priority (weighting) for each attribute of the data in consideration of the requirements of the user of the public information; a second step (for example, corresponding to step S202 in FIG. 5) of calculating evaluation points of each data item based on the set priority (weighting); a third step (for example, corresponding to step S203 in FIG. 5) of selecting a data processing method that minimizes the decrease in the calculated evaluation points; and a fourth step (for example, corresponding to step S204 in FIG. 5) of processing the data by the selected processing method.

According to this invention, a priority (weighting) is set for each attribute of the data in consideration of the requirements of the user of the public information, and evaluation points of each data item are calculated based on the set priority (weighting). A data processing method that minimizes the decrease in the calculated evaluation points is then selected, and the data is processed by the selected method. Therefore, by setting a priority for each data item when the data is processed and evaluating the transformed data with a function, as much of the information the data user needs as possible is retained and data loss is prevented.

(9) The present invention proposes a bottom-up processing method for selecting a method of processing data, comprising: a first step (for example, corresponding to step S101 in FIG. 4) of collecting identical data for each attribute and sorting and grouping it; a second step (for example, corresponding to step S102 in FIG. 4) of calculating the number of attribute values of each attribute; a third step (for example, corresponding to step S103 in FIG. 4) of calculating evaluation points; a fourth step (for example, corresponding to step S104 in FIG. 4) of selecting the attribute and group to process based on the set priority information (weighting) and the k-anonymity determination, and calculating the decrease in evaluation points caused by the processing; and a fifth step (for example, corresponding to step S105 in FIG. 4) of processing the selected group and determining k-anonymity from the processing result of the entire data set.

According to this invention, identical data is collected for each attribute and sorted and grouped, the number of attribute values of each attribute is calculated, and the evaluation points are calculated. Then, based on the set priority information (weighting) and the k-anonymity determination, the attribute and group to process are selected, the decrease in evaluation points caused by the processing is calculated, the selected group is processed, and k-anonymity is determined from the processing result of the entire data set. Therefore, as much of the information the data user needs as possible is retained and data loss is prevented.

(10) The present invention proposes a public information privacy protection method in a public information privacy protection device that processes data to protect the privacy of information to be published, comprising: a first step of setting a priority (weighting) for each attribute of the data in consideration of the requirements of the user of the public information; a second step of calculating evaluation points of each data item based on the set priority (weighting); a third step of selecting a data processing method that maximizes the increase in the calculated evaluation points; and a fourth step of processing the data by the selected processing method.

(11) The present invention proposes a program for causing a computer to execute a public information privacy protection method in a public information privacy protection device that processes data to protect the privacy of information to be published, the program causing the computer to execute: a first step (for example, corresponding to step S201 in FIG. 5) of setting a priority (weighting) for each attribute of the data in consideration of the requirements of the user of the public information; a second step (for example, corresponding to step S202 in FIG. 5) of calculating evaluation points of each data item based on the set priority (weighting); a third step (for example, corresponding to step S203 in FIG. 5) of selecting a data processing method that minimizes the decrease in the calculated evaluation points; and a fourth step (for example, corresponding to step S204 in FIG. 5) of processing the data by the selected processing method.

(12) The present invention proposes a program for causing a computer to execute a bottom-up processing method for selecting a method of processing data, the program causing the computer to execute: a first step (for example, corresponding to step S101 in FIG. 4) of collecting identical data for each attribute and sorting and grouping it; a second step (for example, corresponding to step S102 in FIG. 4) of calculating the number of attribute values of each attribute; a third step (for example, corresponding to step S103 in FIG. 4) of calculating evaluation points; a fourth step (for example, corresponding to step S104 in FIG. 4) of selecting the attribute and group to process based on the set priority information (weighting) and the k-anonymity determination, and calculating the decrease in evaluation points caused by the processing; and a fifth step (for example, corresponding to step S105 in FIG. 4) of processing the selected group and determining k-anonymity from the processing result of the entire data set.

(13) The present invention proposes a program for causing a computer to execute a public information privacy protection method in a public information privacy protection device that processes data to protect the privacy of information to be published, the program causing the computer to execute: a first step of setting a priority (weighting) for each attribute of the data in consideration of the requirements of the user of the public information; a second step of calculating evaluation points of each data item based on the set priority (weighting); a third step of selecting a data processing method that maximizes the increase in the calculated evaluation points; and a fourth step of processing the data by the selected processing method.

According to the present invention, setting a priority for each data item when the data is processed and evaluating the transformed data with a function makes it possible to retain as much of the information the data user needs as possible.

A configuration diagram of the public information privacy protection device according to this embodiment. A diagram illustrating data before processing according to this embodiment. A conceptual diagram showing the bottom-up processing according to this embodiment. A processing flow of the public information privacy protection device according to this embodiment. A processing flow of the bottom-up processing according to this embodiment.

Embodiments of the present invention are described in detail below with reference to the drawings.
The constituent elements of this embodiment can be replaced with existing constituent elements as appropriate, and many variations are possible, including combinations with other existing constituent elements. The description of this embodiment therefore does not limit the content of the invention described in the claims.

<Configuration of the public information privacy protection device>
The configuration of the public information privacy protection device according to this embodiment is described with reference to FIG. 1. As shown in FIG. 1, the device comprises a classification unit 1, a setting unit 2, a calculation unit 3, a processing method selection unit 4, and a processing unit 5.

The classification unit 1 classifies each attribute of the original data as important information (Sensitive Information), a quasi-identifier (Quasi-Identifier), or information to be deleted. In practice, the user performs the classification through a GUI (Graphical User Interface) or similar, pointing at the graphical display on the computer with a mouse or other device. Attributes designated as important information (Sensitive Information) are not changed. Information designated as information to be deleted is deleted automatically during processing. Privacy is thus protected by removing information that directly identifies a user, while the important information can still be published.

The setting unit 2 sets a priority (weighting) for each attribute of the data in consideration of the requirements of the user of the public information. Specifically, the weighting of each attribute is set by user input. The weighting expresses the priority order of the attributes, with the attribute the user values most ranked highest. Processing is performed in order starting from the lowest-priority attribute and ends once k-anonymity is satisfied. The higher an attribute is ranked, therefore, the more likely its original information is to be retained. This also prevents a user from being identified by combining multiple pieces of information that individually have little direct connection to the user, while retaining as much of the information the data user needs as possible. The user enters a priority for each attribute through a GUI (Graphical User Interface) or similar, and sets weighting points (a numerical value) for each priority. This value is used when selecting the attribute to process.
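The priority setting described above can be illustrated with a minimal sketch. The attribute names and weighting points below are hypothetical values chosen for illustration, and `processing_order` is an assumed helper name, not something defined in this specification.

```python
# Hypothetical weighting points entered by the user; a larger value
# means the user cares more about keeping that attribute intact.
weights = {"age": 4, "gender": 3, "birthplace": 2, "race": 1}


def processing_order(weights):
    """Return attribute names sorted from lowest to highest priority.

    Processing starts with the first attribute in this list and stops
    as soon as k-anonymity is satisfied.
    """
    return sorted(weights, key=lambda attr: weights[attr])


order = processing_order(weights)
print(order)  # ['race', 'birthplace', 'gender', 'age']
```

Because the lowest-weighted attribute is generalized first, the attributes at the end of the list are the ones most likely to survive unchanged.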

The calculation unit 3 calculates the evaluation points of each data item based on the priority (weighting) set by the setting unit 2. Specifically, the evaluation points are calculated with the following formula.
Evaluation points = (number of attribute values) * (weighting points)
Here, (number of attribute values) is the number of distinct values that the attribute takes. The attribute for which processing causes the smallest decrease in evaluation points is selected as the attribute to process.
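As a worked illustration of the formula, the sketch below computes evaluation points per attribute for three records modeled on the example data of FIG. 2. The function name `evaluation_points` and the weighting points are assumptions made for this example only.

```python
def evaluation_points(records, weights):
    """Evaluation points per attribute:
    (number of distinct attribute values) * (weighting points)."""
    return {
        attr: len({rec[attr] for rec in records}) * weight
        for attr, weight in weights.items()
    }


# Quasi-identifier columns of three example records.
records = [
    {"age": 25, "gender": "F", "birthplace": "Tokyo", "race": "Japanese"},
    {"age": 37, "gender": "M", "birthplace": "Hokkaido", "race": "Japanese"},
    {"age": 55, "gender": "M", "birthplace": "Okinawa", "race": "Japanese"},
]
# Hypothetical weighting points.
weights = {"age": 4, "gender": 3, "birthplace": 2, "race": 1}

points = evaluation_points(records, weights)
print(points)  # {'age': 12, 'gender': 6, 'birthplace': 6, 'race': 1}
```

Generalizing an attribute reduces its number of distinct values, so its evaluation points fall in proportion to its weight; this is why a heavily weighted attribute is costly to process.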

The processing method selection unit 4 selects a data processing method that minimizes the decrease in the evaluation points calculated by the calculation unit 3. The processing method selection unit 4 uses bottom-up processing, which is described in detail later. Although this embodiment is described using bottom-up processing, the invention is not limited to it, and the top-down processing described in the prior art document may be used instead. The processing unit 5 processes the data by the processing method selected by the processing method selection unit 4.

<Data before processing>
The data before processing is described with reference to FIG. 2.
FIG. 2 shows medical information as an example of data before processing. In this example, the data attributes include "name", "age", "gender", "birthplace", "race", and "disease name".

In this example, A is a 25-year-old woman, a Japanese person from Tokyo, who has obesity; B is a 37-year-old man, a Japanese person from Hokkaido, who has diabetes; and C is a 55-year-old man, a Japanese person from Okinawa, who has hypertension.

Of these, the attribute "name" directly identifies an individual and is therefore classified as "information to be deleted". The attribute "disease name" is privacy information and is therefore classified as "important information (Sensitive Information)". The attributes "age", "gender", "birthplace", and "race" do not directly identify an individual, so they are classified as "quasi-identifiers (Quasi-Identifier)" and are weighted according to the user's purpose.
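The classification just described could be represented as a simple mapping from attribute to category. The sketch below is one possible representation under that assumption; the constant names, the `drop_deleted` helper, and the placeholder names "A" and "B" are illustrative.

```python
SENSITIVE = "sensitive"            # published as-is, never generalized
QUASI_IDENTIFIER = "quasi"         # may be generalized during processing
DELETE = "delete"                  # removed automatically

# Classification of the example attributes, following the text above.
classification = {
    "name": DELETE,
    "age": QUASI_IDENTIFIER,
    "gender": QUASI_IDENTIFIER,
    "birthplace": QUASI_IDENTIFIER,
    "race": QUASI_IDENTIFIER,
    "disease": SENSITIVE,
}


def drop_deleted(records, classification):
    """Return copies of the records without the attributes marked DELETE."""
    return [
        {attr: value for attr, value in rec.items()
         if classification[attr] != DELETE}
        for rec in records
    ]


records = [
    {"name": "A", "age": 25, "gender": "F", "birthplace": "Tokyo",
     "race": "Japanese", "disease": "obesity"},
    {"name": "B", "age": 37, "gender": "M", "birthplace": "Hokkaido",
     "race": "Japanese", "disease": "diabetes"},
]
cleaned = drop_deleted(records, classification)
quasi_identifiers = [a for a, c in classification.items()
                     if c == QUASI_IDENTIFIER]
print(cleaned[0])
print(quasi_identifiers)
```

Only the attributes listed in `quasi_identifiers` would receive weighting points and take part in the later k-anonymity determination.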

<Bottom-up processing>
FIG. 3 is a conceptual diagram of bottom-up processing. As the figure shows, bottom-up processing generates anonymous data by working bottom-up from the start point to the end point. This processing is described in detail with reference to FIGS. 3 and 4.

First, as shown in FIG. 3, the start point, the end point, and the region that satisfies optimal k-anonymity are set, and identical data is collected for each attribute and sorted and grouped (step S101). Next, the number of attribute values of each attribute is calculated and plotted as in FIG. 3 (step S102).

Next, the evaluation points are calculated (step S103). Then, based on the set priority information (weighting) and the k-anonymity determination, the attribute and group to process are selected. Moving bottom-up in order from the start point, the decrease in evaluation points caused by processing is calculated for each of the elements that branch off at the next level (step S104).

To explain concretely with FIG. 3, the start point branches to "A" and "B", and the decrease in evaluation points caused by processing is calculated for each. The branch with the smallest decrease is selected; in the example of FIG. 3 this is "B". "B" in turn branches to "C", "D", and "E"; the decrease is calculated for each, and the branch with the smallest decrease is selected, which is "D" in the example. "D" branches to "F", "G", and "H", and "G" is selected in the same way. "G" branches to "I", "J", and "K", and "J" is selected in the same way. This processing is repeated until optimal k-anonymity is reached.
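The branch selection at a single level can be sketched as a greedy choice. In the sketch below each branch is modeled as one candidate generalization of one attribute; the helper names, the generalization functions, and the weights are all hypothetical, and the labels "A" and "B" merely echo the first level of FIG. 3 rather than reproduce its contents.

```python
def total_points(records, weights):
    """Sum over attributes of (distinct values) * (weighting points)."""
    return sum(len({r[a] for r in records}) * w for a, w in weights.items())


def generalize(records, attr, fn):
    """Return copies of the records with fn applied to one attribute."""
    return [{**r, attr: fn(r[attr])} for r in records]


def pick_branch(records, weights, candidates):
    """Pick the candidate whose generalization loses the fewest points.

    candidates maps a branch label to (attribute, generalization function).
    Returns (label, decrease in evaluation points).
    """
    before = total_points(records, weights)
    decreases = {
        label: before - total_points(generalize(records, attr, fn), weights)
        for label, (attr, fn) in candidates.items()
    }
    best = min(decreases, key=decreases.get)
    return best, decreases[best]


records = [
    {"age": 25, "birthplace": "Tokyo"},
    {"age": 37, "birthplace": "Hokkaido"},
    {"age": 55, "birthplace": "Okinawa"},
]
weights = {"age": 4, "birthplace": 1}  # hypothetical weighting points
candidates = {
    "A": ("age", lambda v: "20-59"),         # merge all ages into one range
    "B": ("birthplace", lambda v: "Japan"),  # merge all birthplaces
}
print(pick_branch(records, weights, candidates))  # ('B', 2)
```

Branch "A" would cost 8 points (age drops from 3 distinct values to 1 at weight 4), while "B" costs only 2, so the low-weight attribute is generalized first. Repeating this choice level by level yields a path like the one traced in the text.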

In other words, the processing is applied to the selected group as described above, and k-anonymity is then determined from the processing result for the entire data set (step S105). This determination checks whether the input data set satisfies k-anonymity. If it does, the data are passed on as the anonymous data set output. If it does not, the groups that failed are returned to step S101 as feedback information. The bottom-up process can therefore retain as much of the information required by the data user as possible while preventing data loss.
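The k-anonymity determination of step S105, including the feedback of groups that fail, can be sketched as follows. This is an illustrative sketch only: the function names failing_groups and is_k_anonymous, the record layout, and the sample values are assumptions, not part of the original description. A data set is treated as k-anonymous when every combination of quasi-identifier values occurs in at least k records.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def failing_groups(
    records: List[Record], quasi_identifiers: Sequence[str], k: int
) -> List[Tuple[str, ...]]:
    """Return the quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return [group for group, n in counts.items() if n < k]


def is_k_anonymous(
    records: List[Record], quasi_identifiers: Sequence[str], k: int
) -> bool:
    """A data set is k-anonymous when no group falls below size k."""
    return not failing_groups(records, quasi_identifiers, k)


# Hypothetical, already-generalized sample data.
records = [
    {"age": "30-39", "zip": "100**", "disease": "flu"},
    {"age": "30-39", "zip": "100**", "disease": "asthma"},
    {"age": "40-49", "zip": "101**", "disease": "flu"},
]

print(is_k_anonymous(records, ["age", "zip"], k=2))   # False
print(failing_groups(records, ["age", "zip"], k=2))   # [('40-49', '101**')]
```

The list returned by failing_groups plays the role of the feedback information: only those groups need further processing on the next pass.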

The top-down process is the reverse of the bottom-up process described above; in the case shown in FIG. 3, it generates anonymous data by proceeding from the end point toward the start point. In the bottom-up process, the data belonging to each attribute are defined in advance as several groups, so the evaluation points decrease as processing proceeds, and the option with the smallest decrease is selected. In the top-down process, the data belonging to each attribute are first treated as a single group and then successively subdivided, so the evaluation points increase, and the option with the largest increase is selected. Despite this difference, the top-down process yields the same effect as the bottom-up process.
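A top-down pass of this kind can be sketched as follows. Several points are assumptions rather than statements from the description: the LEVELS hierarchy, the WEIGHTS values, all function names, the scoring rule (weight times number of distinct values, following the wording of the claims), and the rule that a subdivision is accepted only while k-anonymity still holds.

```python
from collections import Counter

# Hypothetical generalization levels; index 0 is the coarsest form.
LEVELS = {
    "age": [lambda v: "*", lambda v: f"{v // 10 * 10}s", lambda v: str(v)],
    "zip": [lambda v: "*", lambda v: v[:3] + "**", lambda v: v],
}
WEIGHTS = {"age": 2.0, "zip": 1.0}  # assumed user priorities


def generalize(records, levels):
    """Apply the chosen generalization level to every attribute."""
    return [{a: LEVELS[a][levels[a]](r[a]) for a in LEVELS} for r in records]


def score(rows):
    """Evaluation points: weight times number of distinct values, summed."""
    return sum(w * len({row[a] for row in rows}) for a, w in WEIGHTS.items())


def k_anonymous(rows, k):
    """True when every identical row occurs at least k times."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return all(n >= k for n in counts.values())


def top_down(records, k):
    """Start fully generalized; repeatedly take the largest admissible gain."""
    levels = {a: 0 for a in LEVELS}
    while True:
        best = None
        current = score(generalize(records, levels))
        for a in LEVELS:
            if levels[a] + 1 >= len(LEVELS[a]):
                continue  # attribute is already at its finest level
            trial = {**levels, a: levels[a] + 1}
            rows = generalize(records, trial)
            if k_anonymous(rows, k):
                gain = score(rows) - current
                if best is None or gain > best[0]:
                    best = (gain, trial)
        if best is None:
            return levels  # no further subdivision keeps k-anonymity
        levels = best[1]


records = [
    {"age": 31, "zip": "10001"},
    {"age": 35, "zip": "10002"},
    {"age": 42, "zip": "10101"},
    {"age": 47, "zip": "10102"},
]
print(top_down(records, k=2))  # {'age': 1, 'zip': 1}
```

In this sample the higher-weighted age attribute is subdivided first because it yields the larger increase, which mirrors how the priorities steer the result.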

<Processing of the public information privacy protection device>
The processing of the public information privacy protection device is described with reference to FIG. 5.
First, a priority (weighting) is set for each attribute of the data, taking into account the requirements of the user of the public information (step S102). The evaluation point of each item of data is then calculated based on the set priority (weighting) (step S102).

Next, a data processing method that minimizes the decrease in the calculated evaluation points is selected (step S103), and the data are processed by the selected method (step S104).
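The selection in step S103 can be illustrated with a small sketch. It assumes the scoring rule stated in the claims (the number of distinct attribute values multiplied by the priority of that attribute), summed over attributes; the summation, the function names, the two candidate processing methods, and the sample data are assumptions introduced for the example.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def evaluation_points(rows: List[Row], weights: Dict[str, float]) -> float:
    """Sum over attributes of weight times number of distinct values."""
    return sum(w * len({row[a] for row in rows}) for a, w in weights.items())


def choose_method(
    rows: List[Row],
    weights: Dict[str, float],
    methods: Dict[str, Callable[[List[Row]], List[Row]]],
) -> Tuple[str, float]:
    """Return the method with the smallest decrease in evaluation points."""
    base = evaluation_points(rows, weights)
    losses = {
        name: base - evaluation_points(fn(rows), weights)
        for name, fn in methods.items()
    }
    best = min(losses, key=losses.get)
    return best, losses[best]


# Hypothetical candidate processing methods.
def generalize_age(rows: List[Row]) -> List[Row]:
    """Replace a two-digit age with its decade, e.g. "31" becomes "30s"."""
    return [{**r, "age": r["age"][0] + "0s"} for r in rows]


def suppress_region(rows: List[Row]) -> List[Row]:
    """Replace every region value with a placeholder."""
    return [{**r, "region": "*"} for r in rows]


rows = [
    {"age": "31", "region": "Tokyo"},
    {"age": "35", "region": "Osaka"},
    {"age": "42", "region": "Tokyo"},
]
weights = {"age": 3.0, "region": 1.0}  # the user values age more than region

print(evaluation_points(rows, weights))  # 11.0
print(choose_method(rows, weights, {
    "generalize_age": generalize_age,
    "suppress_region": suppress_region,
}))  # ('suppress_region', 1.0)
```

Generalizing age costs 3.0 points while suppressing region costs only 1.0, so the lower-priority attribute is sacrificed first and the attribute the user cares about is preserved.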

Thus, according to this embodiment, a priority is set for each item of data when the data are processed, and the transformed data are evaluated using a function, so that the information required by the data user is retained as far as possible.

The processing of the public information privacy protection device may be recorded as a program on a computer-readable recording medium, and the program recorded on this medium may be read into and executed by the public information privacy protection device, thereby realizing the public information privacy protection device of the present invention. The term "computer system" as used here includes an OS and hardware such as peripheral devices.

The "computer system" also includes a homepage providing environment (or display environment) when a WWW (World Wide Web) system is used. The program may be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium, or by a transmission wave in the transmission medium. Here, the "transmission medium" that transmits the program is a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line such as a telephone line.

The program may implement only part of the functions described above. It may also be a so-called difference file (difference program), which implements those functions in combination with a program already recorded in the computer system.

The embodiment of the present invention has been described in detail above with reference to the drawings. The specific configuration is not limited to this embodiment, however, and includes designs and the like that do not depart from the gist of the invention.

Description of Symbols
1; Classification unit
2; Setting unit
3; Calculation unit
4; Processing method selection unit
5; Processing unit

Claims

A public information privacy protection device that processes data to protect the privacy of information to be disclosed, the device comprising:
setting means for setting a priority (weighting) for each attribute of the data in accordance with input from a user of the public information, taking the user's requirements into account;
calculation means for calculating an evaluation point for each item of data based on the set priority (weighting);
processing method selection means for selecting a data processing method that minimizes the decrease in the calculated evaluation points; and
data processing means for processing the data by the selected processing method.

The public information privacy protection device according to claim 1, wherein the processing method selection means selects the data processing method using bottom-up processing.

The public information privacy protection device according to claim 1, wherein the data processing means processes the data starting from the lowest priority (weighting) set by the setting means until k-anonymity is satisfied.

The public information privacy protection device according to claim 1, further comprising classifying means for classifying each attribute of the data into important information (Sensitive Information), a quasi-identifier (Quasi-Identifier), and information to be deleted.

The public information privacy protection device according to claim 4, wherein the important information (Sensitive Information) is not subject to processing, and the information to be deleted is automatically deleted at the time of processing.

The public information privacy protection device according to claim 1, wherein the calculation means calculates the evaluation point by multiplying the number of kinds of attribute values that each attribute has by the priority (weighting) set by the setting means.

A public information privacy protection device that processes data to protect the privacy of information to be disclosed, the device comprising:
setting means for setting a priority (weighting) for each attribute of the data in accordance with input from a user of the public information, taking the user's requirements into account;
calculation means for calculating an evaluation point for each item of data based on the set priority (weighting);
processing method selection means for selecting a data processing method that maximizes the increase in the calculated evaluation points; and
data processing means for processing the data by the selected processing method.

A method for protecting the privacy of public information in a public information privacy protection device that processes data to protect the privacy of information to be disclosed, the method comprising:
a first step of setting a priority (weighting) for each attribute of the data in accordance with input from a user of the public information, taking the user's requirements into account;
a second step of calculating an evaluation point for each item of data based on the set priority (weighting);
a third step of selecting a data processing method that minimizes the decrease in the calculated evaluation points; and
a fourth step of processing the data by the selected processing method.

A bottom-up processing method for selecting a method for processing data, the method comprising:
a first step of collecting identical data for each attribute and performing sorting and grouping;
a second step of calculating the number of attribute values for each attribute;
a third step of calculating evaluation points;
a fourth step of selecting an attribute and a group to be processed based on the set priority information (weighting) and a k-anonymity determination, and calculating the decrease in evaluation points caused by the processing; and
a fifth step of performing the processing on the selected group and determining k-anonymity based on the processing result for the entire data set.

A method for protecting the privacy of public information in a public information privacy protection device that processes data to protect the privacy of information to be disclosed, the method comprising:
a first step of setting a priority (weighting) for each attribute of the data in accordance with input from a user of the public information, taking the user's requirements into account;
a second step of calculating an evaluation point for each item of data based on the set priority (weighting);
a third step of selecting a data processing method that maximizes the increase in the calculated evaluation points; and
a fourth step of processing the data by the selected processing method.

A program for causing a computer to execute a method for protecting the privacy of public information in a public information privacy protection device that processes data to protect the privacy of information to be disclosed, the method comprising:
a first step of setting a priority (weighting) for each attribute of the data in accordance with input from a user of the public information, taking the user's requirements into account;
a second step of calculating an evaluation point for each item of data based on the set priority (weighting);
a third step of selecting a data processing method that minimizes the decrease in the calculated evaluation points; and
a fourth step of processing the data by the selected processing method.

A program for causing a computer to execute a bottom-up processing method for selecting a method for processing data, the method comprising:
a first step of collecting identical data for each attribute and performing sorting and grouping;
a second step of calculating the number of attribute values for each attribute;
a third step of calculating evaluation points;
a fourth step of selecting an attribute and a group to be processed based on the set priority information (weighting) and a k-anonymity determination, and calculating the decrease in evaluation points caused by the processing; and
a fifth step of performing the processing on the selected group and determining k-anonymity based on the processing result for the entire data set.

A program for causing a computer to execute a method for protecting the privacy of public information in a public information privacy protection device that processes data to protect the privacy of information to be disclosed, the method comprising:
a first step of setting a priority (weighting) for each attribute of the data in accordance with input from a user of the public information, taking the user's requirements into account;
a second step of calculating an evaluation point for each item of data based on the set priority (weighting);
a third step of selecting a data processing method that maximizes the increase in the calculated evaluation points; and
a fourth step of processing the data by the selected processing method.