JP2014170369A

JP2014170369A - Information processor, information processing system, and information anonymization method

Info

Publication number: JP2014170369A
Application number: JP2013041743A
Authority: JP
Inventors: Takao Takenouchi; 隆夫竹之内
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2013-03-04
Filing date: 2013-03-04
Publication date: 2014-09-18

Abstract

PROBLEM TO BE SOLVED: To anonymize the data of provider devices with reduced communication volume and without requiring a trusted intermediary.

SOLUTION: An information processing device 20 includes: a policy determination unit 201 that calculates a policy to be used to generalize the data of a provider device, based on policy calculation data, which is the provider device's data obfuscated to the extent that the policy can still be calculated; and an anonymized data creation unit 202 that creates anonymized data from the data the provider device generated according to the policy, and provides the anonymized data to a user device.

Description

The present invention relates to information processing, and more particularly to anonymization.

In recent years, a large amount of personal data has been converted into electronic form.

As data becomes electronic, secondary use of personal data is expanding.

However, personal data includes data related to an individual that the individual does not want disclosed (sensitive data), so privacy must be protected when the data is released.

Anonymization is one technique for protecting privacy.

An anonymization technique related to the present invention, for example, deletes identifiers that uniquely identify individuals from personal data before releasing the data.

However, personal data may also include data that can identify an individual when combined with other data.

A "quasi-identifier" is such data: data that can identify an individual when combined with other data.

Therefore, anonymization techniques related to the present invention anonymize quasi-identifiers so as to satisfy a predetermined policy for protecting personal data. Several anonymization policies have been proposed.

For example, "k-anonymity" and "l-diversity" are widely used.

"k-anonymity" is a policy guaranteeing that, in each group of data, at least k records share the same quasi-identifier or set of quasi-identifiers.

"l-diversity" is a policy guaranteeing that each group of data contains at least l distinct values of sensitive data.

Note that "k-anonymization" is anonymization that satisfies k-anonymity, and "l-diversification" is anonymization that satisfies l-diversity.
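The two definitions above can be illustrated with a minimal sketch. It assumes records are plain dictionaries and that a "group" is the set of records sharing the same quasi-identifier values; the function names and sample records are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def group_by_quasi_identifier(records, qi_keys):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[key] for key in qi_keys)].append(rec)
    return groups


def is_k_anonymous(records, qi_keys, k):
    """True if every group contains at least k records."""
    groups = group_by_quasi_identifier(records, qi_keys)
    return all(len(group) >= k for group in groups.values())


def is_l_diverse(records, qi_keys, sensitive_key, l):
    """True if every group contains at least l distinct sensitive values."""
    groups = group_by_quasi_identifier(records, qi_keys)
    return all(
        len({rec[sensitive_key] for rec in group}) >= l
        for group in groups.values()
    )


# Hypothetical sample: "age" is the quasi-identifier, "disease" the sensitive data.
records = [
    {"age": "20-29", "disease": "flu"},
    {"age": "20-29", "disease": "cold"},
    {"age": "30-39", "disease": "flu"},
    {"age": "30-39", "disease": "flu"},
]

print(is_k_anonymous(records, ["age"], 2))           # True: both groups have 2 records
print(is_l_diverse(records, ["age"], "disease", 2))  # False: the 30-39 group has only "flu"
```

The sample shows why the two policies are checked separately: a table can be 2-anonymous while one of its groups still reveals the sensitive value of everyone in it.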

Other policies, such as "t-closeness" and "m-invariance", have also been proposed.

"t-closeness" is a policy guaranteeing that the distance between the distribution of sensitive information in each group and its distribution over the entire data set is at most t.

"m-invariance" is a policy for sequential data releases guaranteeing that at least m records share the same combination of quasi-identifiers and that those records all have different sensitive data.

Many anonymization methods have also been proposed (see, for example, Non-Patent Document 1).

"Mondrian Multidimensional", described in Non-Patent Document 1, is a method that first gathers the quasi-identifiers into a single group and then splits that group while k-anonymity remains satisfied.
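The split-while-k-anonymous idea can be sketched for a single numeric quasi-identifier. This is a simplified one-dimensional illustration, not the full multidimensional algorithm of Non-Patent Document 1; the function name `mondrian_1d` and the choice to split at the median value are assumptions made for the example.

```python
def mondrian_1d(values, k):
    """Split values at the median, recursively, while both halves keep >= k items.

    Returns a list of groups (each a sorted list of values).
    """
    values = sorted(values)
    if not values:
        return []
    median = values[len(values) // 2]
    left = [v for v in values if v < median]
    right = [v for v in values if v >= median]
    # Stop splitting when either side would violate k-anonymity.
    if len(left) < k or len(right) < k:
        return [values]
    return mondrian_1d(left, k) + mondrian_1d(right, k)


# Hypothetical ages used as the quasi-identifier.
ages = [21, 23, 25, 28, 31, 34, 36, 39]
print(mondrian_1d(ages, 2))  # [[21, 23], [25, 28], [31, 34], [36, 39]]
print(mondrian_1d(ages, 3))  # [[21, 23, 25, 28], [31, 34, 36, 39]]
```

Each resulting group can then be published as a range (for example 21 to 23), so that every released range covers at least k records.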

Furthermore, data is not always supplied by a single provider; there may be multiple providers.

The anonymization policy must be satisfied by the data obtained by aggregating (combining) the data of the multiple providers (the combined data). Therefore, when there are multiple providers, their data must be aggregated before it is anonymized. However, data is an asset for each provider, so a provider does not want to hand non-anonymized data to other providers.

If there is an intermediary that all providers trust, the providers can ask that intermediary to aggregate and anonymize the data.

However, an intermediary that all providers can trust does not always exist.

Methods that do not require an intermediary have therefore been proposed (see, for example, Non-Patent Document 2).

The anonymization method described in Non-Patent Document 2 uses Secure Multi-Party Computation (MPC) when aggregating the data of multiple data providers. It anonymizes the data as follows.

Each provider abstracts the data it holds to create a group. Using MPC, the providers then split the data into groups and refine the abstracted data without disclosing their data to one another. The providers repeat this process for as long as anonymity remains satisfied, thereby anonymizing the data.

In other words, by using MPC, the providers can anonymize the data without disclosing their data to one another. The user who receives the data receives it in anonymized form.

Non-Patent Document 1: K. LeFevre, D.J. DeWitt, R. Ramakrishnan, "Mondrian Multidimensional K-Anonymity", Data Engineering, 2006. ICDE '06. Proceedings of the 22nd International Conference, 03-07 April 2006

Non-Patent Document 2: Pawel Jurczyk, Li Xiong, "Distributed Anonymization: Achieving Privacy for Both Data Subjects and Data Providers", Proceedings of the 23rd Annual IFIP WG 11.3 Working Conference on Data and Applications Security XXIII, Montreal, Canada, July 12-15, 2009, pp. 191-207

The MPC described in Non-Patent Document 2 requires the data providers to communicate with one another and to process the communicated data. Consequently, with N denoting the number of data providers, the communication volume of MPC in O notation is roughly as follows. This figure is for the case in which each data provider holds one piece of data.

Communication volume: O(N·log N)
That is, with the method described in Non-Patent Document 2, the communication volume grows in proportion to the product of the number of providers and its logarithm.

Thus, the technique described in Non-Patent Document 2 has the problem that it becomes difficult to put into practical use as the number of data providers grows, because the communication volume increases.

An object of the present invention is to solve the above problem and to provide an information processing device, an information processing system, and an information anonymization method that achieve anonymization with reduced communication volume between devices even when no trusted intermediary exists.

The information processing device of the present invention includes: policy determination means for calculating a policy to be used to generalize the data of a provider device, based on policy calculation data, which is the provider device's data obfuscated to the extent that the policy can still be calculated; and anonymized data creation means for creating anonymized data based on data that the provider device has generalized according to the policy.

The information processing system of the present invention includes an information processing device and a provider device. The information processing device includes: policy calculation means for calculating a policy that the provider device uses to generalize its data, based on policy calculation data, which is the provider device's data obfuscated to the extent that the policy can still be calculated; and anonymized data creation means for creating anonymized data based on data that the provider device has generalized according to the policy. The provider device includes: data storage means for storing the provider device's data; policy calculation data creation means for creating the policy calculation data from the stored data; policy storage means for storing the policy calculated by the information processing device; and generalization means for generalizing the stored data based on the policy.

The information anonymization method of the present invention calculates a policy to be used to generalize the data of a provider device, based on policy calculation data, which is the provider device's data obfuscated to the extent that the policy can still be calculated, and creates anonymized data based on data that the provider device has generalized according to the policy.

The program of the present invention causes a computer to execute: a process of calculating a policy to be used to generalize the data of a provider device, based on policy calculation data, which is the provider device's data obfuscated to the extent that the policy can still be calculated; and a process of creating anonymized data based on data that the provider device has generalized according to the policy.

According to the present invention, anonymization using the data of multiple providers can be achieved with reduced communication volume and without requiring an intermediary that the providers trust.

FIG. 1 is a block diagram showing an example of the configuration of an information processing system including an information processing device according to the first embodiment of the present invention.
FIG. 2 is a block diagram showing an example of the configuration of the information processing device according to the first embodiment.
FIG. 3 is a block diagram showing an example of the configuration of the information processing device according to the first embodiment.
FIG. 4 is a block diagram showing an example of the configuration of the information processing device and the provider device according to the first embodiment.
FIG. 5 is a diagram showing an example of the stored original data used in the description of the first embodiment.
FIG. 6 is a diagram showing an example of the noise-added data used in the description of the first embodiment.
FIG. 7 is a diagram showing an example of the noise-added data after anonymization according to the first embodiment.
FIG. 8 is a block diagram showing an example of the data after generalization according to the first embodiment.
FIG. 9 is a diagram showing an example of the configuration of a modification of the first embodiment.
FIG. 10 is a diagram showing noise-added data used to describe the operation of the anonymization unit according to the second embodiment.
FIG. 11 is a diagram showing noise-added data used to describe the operation of the anonymization unit according to the second embodiment.
FIG. 12 is a diagram showing data after generalization, used to describe the operation of the anonymization unit according to the second embodiment.
FIG. 13 is a diagram showing data used to describe the operation of the anonymization unit according to the second embodiment.
FIG. 14 is a diagram showing an example of data after anonymization when the anonymization unit according to the second embodiment is used.

Next, embodiments of the present invention will be described with reference to the drawings.

Each drawing illustrates an embodiment of the present invention; the present invention is therefore not limited to what is shown in the drawings. Similar components in the drawings are given the same reference numbers, and repeated descriptions of them may be omitted.

The quasi-identifiers anonymized by the information processing device according to the present invention are not particularly limited, nor is the number of quasi-identifiers to be anonymized. In the following description, however, "age" is used as an example of a quasi-identifier.

Likewise, the sensitive information handled by the information processing device according to the present invention is not particularly limited. In the following description, however, "disease" is used as an example of sensitive information.

(First embodiment)
FIG. 1 is a block diagram showing an example of the configuration of an information processing system 10 including an information processing device 20 according to the first embodiment of the present invention.

The information processing system 10 includes an information processing device 20, a provider device 30, a user device 40, and a network 50.

The information processing device 20 according to the first embodiment anonymizes the data provided by the provider device 30 and transmits it to the user device 40. The information processing device 20 may also aggregate the data provided by provider devices 30. The information processing device 20 is described in detail later.

The provider device 30 is a device that provides generalized data to the information processing device 20. The information processing device 20 provides the user device 40 with anonymized data created from the generalized data.

The provider device 30 is not particularly limited as long as it generalizes data and provides it, so a detailed description of the provider device 30 is omitted. However, the configuration and operation of the provider device 30 that relate to the operation of the information processing device 20 are described in detail later, together with the information processing device 20.

In this embodiment, the number of provider devices 30 that provide generalized data to the information processing device 20 for anonymization is not particularly limited. The information processing device 20 may anonymize generalized data from multiple provider devices 30, or from a single provider device 30.

The user device 40 receives anonymized data from the information processing device 20. The user device 40 is, for example, a device that performs predetermined data analysis on the received data. It may be a general-purpose computer or terminal device and is not particularly limited, so a detailed description of the user device 40 is omitted.

The network 50 is a communication path or communication network that connects the information processing device 20 to the other devices. The network 50 is not particularly limited as long as the devices can connect through it, so a detailed description of the network 50 is omitted.

Next, the information processing device 20 will be further described with reference to the drawings.

First, the configuration of the information processing device 20 will be described.

FIG. 2 is a block diagram showing an example of the configuration of the information processing device 20 according to the first embodiment.

The information processing device 20 determines the policy for generalizing the data provided by the provider device 30, and creates the anonymized combined data to be provided to the user device 40 based on the generalized data provided by the provider device 30.

To this end, the information processing device 20 includes a policy determination unit 201 and an anonymized data creation unit 202.

The policy determination unit 201 receives the policy calculation data created by the provider device 30. Based on that data, it determines the policy that the provider device 30 uses for generalization.

Here, a "policy" is the generalization policy that the provider device 30 uses when generalizing its original data for provision to the user device 40. The information processing device 20 calculates the policy based on the anonymization applied to the data provided to the user device 40. For example, the policy is a generalization width, boundary, or level for the generalization processing in the provider device 30.

"Policy calculation data" is data that the provider device 30 provides to the information processing device 20 so that the information processing device 20 can calculate the policy.

The anonymized data creation unit 202 creates the anonymized combined data to be provided to the user device 40, based on the generalized data received from the provider device 30. As described later, the anonymized data creation unit 202 aggregates the generalized data and performs suppression as necessary.

The components of the information processing device 20 will be further described in detail with reference to the drawings.

FIG. 3 is a block diagram showing an example of the configuration of the information processing device 20 according to the first embodiment.

As shown in FIG. 3, the policy determination unit 201 of the information processing device 20 includes a policy calculation data aggregation unit 210, an anonymization unit 220, and a policy calculation unit 230. The anonymized data creation unit 202 includes a generalized data aggregation unit 240, an anonymity check unit 250, and a suppression unit 260.

The policy calculation data aggregation unit 210 aggregates the policy calculation data received from the provider device 30. Here, aggregation means collecting the data so that the anonymization unit 220, described next, can anonymize it.

The policy calculation data aggregation unit 210 may receive policy calculation data from the provider device 30 continuously, or in multiple batches.

As described later, the information processing device 20 checks the anonymity of the generalized data received from the provider device 30 and suppresses data where necessary. The provider device 30 therefore does not need to generalize its data all the way to the state in which it is finally provided to the user device 40. In addition, the provider device 30 wants to avoid, as far as possible, disclosing its stored original data to the information processing device 20 as it is.

The information processing device 20 of this embodiment therefore calculates the policy from policy calculation data that the provider device 30 has obfuscated to the extent that the policy can still be calculated.

That is, the provider device 30 transmits obfuscated data (policy calculation data), not the original data, to the information processing device 20. In response, the provider device 30 receives a generalization policy from the information processing device 20.

For example, the provider device 30 may transmit to the information processing device 20 "noise-added data", obtained by adding noise to the data that is to be provided to the user device 40. Alternatively, the provider device 30 may transmit data containing quasi-identifiers with the same structure as the data to be provided to the user device 40, for example previously disclosed past data. Policy calculation data is described further later.
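As a rough illustration of noise-added data, the sketch below perturbs the quasi-identifier "age" on the provider side before it is sent. This section does not specify the noise distribution, so uniform integer noise is an assumption, and the function name, the `max_noise` parameter, and the clamping at zero are hypothetical.

```python
import random


def add_noise_to_ages(ages, max_noise, rng=None):
    """Return a new list with integer noise in [-max_noise, +max_noise] added to each age.

    Uniform noise is an assumption made for this sketch only.
    """
    rng = rng or random.Random()
    # Clamp at 0 so that a noisy age never becomes negative.
    return [max(0, age + rng.randint(-max_noise, max_noise)) for age in ages]


# Hypothetical original ages held by one provider device.
original_ages = [25, 31, 47, 52]
policy_calculation_data = add_noise_to_ages(original_ages, max_noise=3, rng=random.Random(42))
print(policy_calculation_data)
```

The noise bound controls the trade-off named in the text: larger noise hides the original values better, but it moves the noisy data further from the real data that the resulting policy will later be applied to.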

The policy calculation data aggregation unit 210 sends the aggregated policy calculation data to the anonymization unit 220.

The anonymization unit 220 anonymizes the policy calculation data aggregated by the policy calculation data aggregation unit 210. As described later, the policy calculation unit 230 calculates the policy using the anonymized data. In other words, the anonymization unit 220 anonymizes the policy calculation data so that the policy can be calculated.

The anonymization unit 220 therefore anonymizes the aggregated data so that it satisfies about the same level of anonymity as is required of the data that the information processing device 20 provides to the user device 40.

The anonymization performed by the anonymization unit 220 of this embodiment is not particularly limited. For example, the anonymization unit 220 may apply k-anonymization to predetermined quasi-identifiers. In addition to k-anonymity, it may anonymize the data so as to satisfy l-diversity or t-closeness.

The anonymization unit 220 sends the anonymized data to the policy calculation unit 230.

The policy calculation unit 230 calculates, based on the data anonymized by the anonymization unit 220, the policy with which the provider device 30 generalizes its data.

The policy calculation unit 230 then transmits the calculated policy to the provider device 30.

The provider device 30 generalizes its data based on the policy and sends the generalized data to the information processing device 20.
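The provider-side generalization step can be sketched for the case where the received policy is a generalization width for "age". The fixed-width bucketing, the `origin` parameter, and the function name are assumptions made for illustration; this section does not define how a width is applied.

```python
def generalize_age(age, width, origin=0):
    """Map an exact age to an inclusive range of the given width.

    `origin` is the lower bound of the first range (hypothetical parameter).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    lower = origin + ((age - origin) // width) * width
    return (lower, lower + width - 1)


# Hypothetical original records held by the provider device.
records = [
    {"age": 27, "disease": "flu"},
    {"age": 30, "disease": "cold"},
]
width = 10  # generalization width received as the policy

generalized = [
    {"age": generalize_age(rec["age"], width), "disease": rec["disease"]}
    for rec in records
]
print(generalized)  # ages become (20, 29) and (30, 39)
```

Only the generalized ranges leave the provider device in this sketch; the exact ages stay local.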

The generalized data aggregation unit 240 aggregates, that is, combines, the generalized data received from the provider devices 30 to create combined data.

It then sends the combined data to the anonymity check unit 250.

The anonymity check unit 250 checks whether the combined data received from the generalized data aggregation unit 240 satisfies the predetermined anonymity.

The information processing device 20 includes the anonymity check unit 250 for the following reason.

As already described, the policy calculation data that the information processing device 20 uses to calculate the policy differs from the original data that the provider device 30 generalizes. The policy calculated by the information processing device 20 may therefore not be optimal as a policy for generalizing the data in the provider device 30. In other words, the generalized data received by the information processing device 20 may not satisfy the predetermined anonymity. The information processing device 20 therefore uses the anonymity check unit 250 to check the anonymity of the combined data.

If the combined data does not satisfy the anonymity, the anonymity check unit 250 sends the combined data to the suppression unit 260 and requests suppression.

The suppression unit 260 suppresses the received combined data to ensure its anonymity.

Here, suppression is processing for ensuring the anonymity of data, and is not particularly limited as long as it does so. Suppression, for example, deletes data that does not satisfy the predetermined anonymity, or further generalizes such data.

After suppression, the suppression unit 260 returns the processed combined data to the anonymity check unit 250.

The anonymity check unit 250 checks the anonymity of the combined data received from the suppression unit 260 and, if it is still not satisfied, sends the combined data to the suppression unit 260 again.

The anonymity check unit 250 repeats sending the combined data to the suppression unit 260 until the combined data satisfies the predetermined anonymity.

Alternatively, the suppression unit 260 may keep processing the data until the predetermined anonymity is satisfied and return the combined data to the anonymity check unit 250 only then.

If the combined data satisfies the anonymity, the anonymity check unit 250 transmits it to the user device 40 as the anonymized data.
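The check-then-suppress loop described above can be sketched as follows. The sketch assumes k-anonymity on a single generalized quasi-identifier as the predetermined anonymity and uses deletion as the form of suppression (the text also allows further generalization); all function names are hypothetical.

```python
from collections import Counter


def violating_groups(records, qi_key, k):
    """Return the quasi-identifier values whose groups have fewer than k records."""
    counts = Counter(rec[qi_key] for rec in records)
    return {value for value, n in counts.items() if n < k}


def suppress(records, qi_key, bad_values):
    """Suppress by deletion: drop records belonging to violating groups."""
    return [rec for rec in records if rec[qi_key] not in bad_values]


def check_and_suppress(records, qi_key, k):
    """Repeat check and suppression until the combined data is k-anonymous."""
    while True:
        bad = violating_groups(records, qi_key, k)
        if not bad:
            return records  # anonymity satisfied: ready to send to the user device
        records = suppress(records, qi_key, bad)


# Hypothetical combined data built from generalized data of several providers.
combined = [
    {"age": "20-29", "disease": "flu"},
    {"age": "20-29", "disease": "cold"},
    {"age": "30-39", "disease": "flu"},
]
print(check_and_suppress(combined, "age", 2))  # the single 30-39 record is removed
```

With deletion, the loop finishes after at most one suppression pass, because removing a whole undersized group cannot shrink any other group. An implementation that suppresses by further generalization would merge groups instead and may need several passes.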

次に、具体的にデータを用いて、本実施形態の情報処理装置２０について、さらに説明する。 Next, the information processing apparatus 20 of the present embodiment will be further described using specific data.

なお、以下の説明において、提供元装置３０は、曖昧化としてノイズ付加を用いるとする。つまり、提供元装置３０は、方針算出用データの一例として、「ノイズ付加データ」を送信する。 In the following description, it is assumed that the providing apparatus 30 uses noise addition as an ambiguity. That is, the providing source device 30 transmits “noise addition data” as an example of the policy calculation data.

また、情報処理装置２０は、方針の一例として「汎化幅」を算出するとする。 The information processing apparatus 20 calculates “generalization width” as an example of the policy.

準識別子は、「年齢」とする。また、センシティブ情報は、「疾病」とする。 The quasi-identifier is “age”. Sensitive information is “disease”.

なお、「ノイズ付加データ」及び「汎化幅」は、後ほど、図面を参照して、詳細に説明する。 The “noise addition data” and the “generalization width” will be described in detail later with reference to the drawings.

図４は、本実施形態の情報処理装置２１と提供元装置３０との構成の一例を示すブロック図である。図４の提供元装置３０は、例示として、１台としている。ただし、これは、図面の明確にするためである。情報処理装置２１は、情報処理装置２０と同様に、複数の提供元装置３０と接続しても良い。 FIG. 4 is a block diagram illustrating an example of the configuration of the information processing apparatus 21 and the providing source device 30 according to the present embodiment. FIG. 4 shows a single providing source device 30 as an example, but only to keep the drawing clear. Like the information processing apparatus 20, the information processing apparatus 21 may be connected to a plurality of providing source devices 30.

また、図４において、図３の同じ構成には同じ番号を付し、その詳細な説明を省略する。 In FIG. 4, the same components as those in FIG. 3 are denoted by the same reference numerals, and their detailed description is omitted.

情報処理装置２１は、方針決定部２０３と匿名化データ作成部２０２とを含む。 The information processing apparatus 21 includes a policy determination unit 203 and an anonymized data creation unit 202.

方針決定部２０３は、ノイズ付加データ集約部２１１と、匿名化部２２０と、汎化幅算出部２３１とを含む。 The policy determination unit 203 includes a noise addition data aggregation unit 211, an anonymization unit 220, and a generalization width calculation unit 231.

ノイズ付加データ集約部２１１は、図３の方針算出用データ集約部２１０に相当し、ノイズ付加データを集約する。 The noise addition data aggregating unit 211 corresponds to the policy calculation data aggregating unit 210 of FIG. 3 and aggregates noise addition data.

匿名化部２２０は、図３の匿名化部２２０と同様に、集約後のノイズ付加データを匿名化する。 The anonymization unit 220 anonymizes the aggregated noise-added data, in the same way as the anonymization unit 220 of FIG. 3.

汎化幅算出部２３１は、図３の方針算出部２３０に相当し、方針として汎化幅を算出する。 The generalization width calculation unit 231 corresponds to the policy calculation unit 230 in FIG. 3 and calculates a generalization width as a policy.

匿名化データ作成部２０２は、図３と同様に、汎化データ集約部２４０と、匿名性検査部２５０と、サプレッション部２６０とを含む。各構成は、図３と同様のため、構成の説明は、省略する。 As in FIG. 3, the anonymized data creation unit 202 includes a generalized data aggregation unit 240, an anonymity checking unit 250, and a suppression unit 260. Each component is the same as in FIG. 3, so its description is omitted.

提供元装置３０は、データ保存部３１０と、ノイズ付加部３２０と、汎化幅保存部３３０と、汎化部３４０とを含む。 The providing apparatus 30 includes a data storage unit 310, a noise addition unit 320, a generalization width storage unit 330, and a generalization unit 340.

データ保存部３１０は、情報処理装置２１を介して、利用者装置４０に提供する元のデータを保存する。つまり、データ保存部３１０が保存する元のデータは、汎化後、情報処理装置２１で集約され、利用者装置４０に提供される。 The data storage unit 310 stores original data provided to the user device 40 via the information processing device 21. That is, the original data stored by the data storage unit 310 is aggregated by the information processing apparatus 21 after being generalized and provided to the user apparatus 40.

ノイズ付加部３２０は、方針算出用データとして、データ保存部３１０が保存する元のデータに所定のノイズを付加した「ノイズ付加データ」を、作成する。つまり、ノイズ付加部３２０は、「方針算出用データ作成部」とも言える。ノイズ付加部３２０は、作成したノイズ付加データを情報処理装置２１に送信する。 The noise addition unit 320 creates “noise addition data” in which predetermined noise is added to the original data stored by the data storage unit 310 as the policy calculation data. That is, the noise adding unit 320 can be said to be a “policy calculation data creation unit”. The noise addition unit 320 transmits the created noise addition data to the information processing apparatus 21.

汎化幅保存部３３０は、情報処理装置２１から受け取った方針である汎化幅を保存する。つまり、汎化幅保存部３３０は、「方針保存部」とも言える。汎化幅保存部３３０は、保存した方針を、汎化部３４０に送る。 The generalization width storage unit 330 stores the generalization width that is the policy received from the information processing apparatus 21. That is, the generalization width storage unit 330 can be said to be a “policy storage unit”. The generalization width storage unit 330 sends the stored policy to the generalization unit 340.

汎化部３４０は、汎化幅を用いて、データ保存部３１０が保存する元のデータを汎化する。そして、汎化部３４０は、汎化後のデータを、情報処理装置２１に送信する。 The generalization unit 340 generalizes the original data stored by the data storage unit 310 using the generalization width. Then, the generalization unit 340 transmits the data after generalization to the information processing apparatus 21.

次に、図４に示す情報処理装置２１と提供元装置３０とを合わせた動作を説明する。 Next, the combined operation of the information processing apparatus 21 and the providing source apparatus 30 illustrated in FIG. 4 will be described.

まず、ノイズ付加部３２０は、方針算出用データとして、データ保存部３１０が保存する元のデータに所定のノイズを付加した「ノイズ付加データ」を作成し、情報処理装置２１に送信する。 First, the noise addition unit 320 creates “noise addition data” in which predetermined noise is added to the original data stored by the data storage unit 310 as policy calculation data, and transmits the data to the information processing apparatus 21.

ここで、ノイズは、その平均が「０」となるように、所定の分布に従ってランダムに発生された値である。つまり、ノイズの付加は、データの数が多くなると、付加の影響が少なくなる（０に近づく）値の付加である。ノイズのデータの分布は、特に制限はない。例えば、ノイズの発生の分布は、０を中心とした所定の範囲おいて、発生確率が均等（ホワイトノイズ）でも良い。あるいは、ノイズの発生の分布は、「０」を中心とする正規分布でも良い。 Here, the noise is a value generated randomly according to a predetermined distribution whose mean is “0”. That is, the added values are such that their net effect diminishes (approaches 0) as the number of data items increases. The distribution of the noise is not particularly limited. For example, the noise may be uniformly distributed over a predetermined range centered on 0 (white noise). Alternatively, the noise may follow a normal distribution centered on “0”.
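As a rough illustration of zero-mean noise, the sketch below adds uniform noise from a symmetric range to each age and then checks that the average of many noise samples is close to 0. The noise half-width of 2.0, the fixed seed, and the sample ages are assumptions made only for this example. A normal distribution could be used instead via `rng.gauss(0, sigma)`.

```python
import random

def add_noise(ages, width, rng):
    """Add uniform noise drawn from [-width, +width] (mean 0) to each age."""
    return [a + rng.uniform(-width, width) for a in ages]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
ages = [20, 22, 25, 27, 28, 30]   # hypothetical original ages
noisy = add_noise(ages, width=2.0, rng=rng)

# The average of many noise samples tends toward 0, which is why the
# influence of the noise shrinks as the number of records grows.
samples = [rng.uniform(-2.0, 2.0) for _ in range(100_000)]
mean_noise = sum(samples) / len(samples)
```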

具体的な値を用いて説明する。 This will be described using specific values.

図５は、説明に用いるデータ保存部３１０が保存する元のデータの一例を示す図である。 FIG. 5 is a diagram illustrating an example of original data stored by the data storage unit 310 used for explanation.

つまり、データ保存部３１０は、図５に示す元のデータを保存する。 That is, the data storage unit 310 stores the original data shown in FIG. 5.

図６は、図５に示すデータにノイズを付加した「ノイズ付加データ」の一例を示す図である。 FIG. 6 is a diagram illustrating an example of “noise-added data” obtained by adding noise to the data illustrated in FIG. 5.

ノイズ付加部３２０は、例えば、図５に示すデータの準識別子の年齢に、図６に示すノイズを付加し、識別子を削除し、ノイズ付加データを算出する（図６を参照）。このノイズ付加データは、方針算出用データに相当する。 For example, the noise adding unit 320 adds the noise shown in FIG. 6 to the age of the quasi-identifier of the data shown in FIG. 5, deletes the identifier, and calculates noise added data (see FIG. 6). This noise addition data corresponds to policy calculation data.

図４を用いた説明に戻る。 The description now returns to FIG. 4.

ノイズ付加部３２０は、ノイズ付加データを情報処理装置２１に送信する。 The noise addition unit 320 transmits noise addition data to the information processing device 21.

ノイズ付加データ集約部２１１は、ノイズ付加データを提供元装置３０から受け取り、ノイズ付加データを集約し、匿名化部２２０に送る。 The noise addition data aggregating unit 211 receives the noise addition data from the providing source device 30, aggregates the noise addition data, and sends the data to the anonymization unit 220.

匿名化部２２０は、集約後のノイズ付加データを、予め指定された匿名性を満たすように、匿名化する。 The anonymization unit 220 anonymizes the aggregated noise-added data so as to satisfy the anonymity specified in advance.

図７は、図６に示すノイズ付加データを匿名化した匿名化後データの一例を示す図である。 FIG. 7 is a diagram illustrating an example of anonymized data obtained by anonymizing the noise-added data shown in FIG. 6.

例えば、匿名化部２２０は、図６に示すデータの年齢を、最も汎化した状態（図７の左の図）に汎化する。そして、匿名化部２２０は、データの年齢の中央値「２６」を境界として、年齢を２グループに分割し、図７の右に示すように、ノイズ付加データを匿名化する。なお、図７において、匿名化部２２０は、「２−匿名性」を満たすように、データを匿名化している。 For example, the anonymization unit 220 generalizes the age of the data shown in FIG. 6 to the most generalized state (the left diagram in FIG. 7). Then, the anonymization unit 220 divides the age into two groups with the median age “26” of the data as a boundary, and anonymizes the noise-added data as shown on the right in FIG. In FIG. 7, the anonymization unit 220 anonymizes data so as to satisfy “2-anonymity”.
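The median split described above can be sketched as follows. This is a minimal sketch, assuming k-anonymity with a single numeric quasi-identifier and a single split. The function name and the sample values are hypothetical and are not taken from FIG. 6 or FIG. 7. They were chosen only so that the median comes out as 26.

```python
import statistics

def split_at_median(ages, k):
    """Split ages into two groups at the median if both keep at least k records."""
    boundary = statistics.median(ages)
    low = [a for a in ages if a < boundary]
    high = [a for a in ages if a >= boundary]
    if len(low) >= k and len(high) >= k:
        return boundary, [low, high]
    # Otherwise keep the most generalized state (a single group).
    return None, [list(ages)]

# Hypothetical noise-added ages
boundary, groups = split_at_median([20, 22, 25, 27, 28, 30], k=2)
```

Repeating the split inside each resulting group, while every group still holds at least k records, would give a finer partition in the style of Mondrian. The sketch stops after one split, as in the example above.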

図４の用いた説明に戻る。 The description now returns to FIG. 4.

匿名化部２２０は、匿名化後のデータを汎化幅算出部２３１に送る。 The anonymization unit 220 sends the data after anonymization to the generalization width calculation unit 231.

汎化幅算出部２３１は、匿名化後のデータを基に、「汎化幅」を算出する。 The generalization width calculation unit 231 calculates a “generalization width” based on the anonymized data.

ここで「汎化幅」とは、匿名化後のデータにおける、匿名化した値の範囲である。 Here, the “generalization width” is a range of anonymized values in the data after anonymization.

例えば、図７の右に示す匿名化後ノイズ付加データは、年齢が「２０−２５」と「２６−３０」との２つのグループに分かれている。この場合、年齢の「２０−２５」及び「２６−３０」が、汎化幅である。 For example, in the anonymized noise-added data shown on the right of FIG. 7, the ages are divided into two groups, “20-25” and “26-30”. In this case, the age ranges “20-25” and “26-30” are the generalization widths.

汎化幅算出部２３１は、匿名化後のデータを基に、汎化幅（図７では、「年齢：２０−２５」と「年齢：２６−３０」）を算出する。 The generalization width calculation unit 231 calculates the generalization width (“age: 20-25” and “age: 26-30” in FIG. 7) based on the data after anonymization.

なお、汎化幅算出部２３１が算出する汎化幅は、ノイズ付加データを基にした汎化幅である。そのため、汎化幅算出部２３１が算出した汎化幅は、提供元装置３０が保持する元のデータを基にした汎化幅と異なる可能性がある。しかし、既に説明したとおり、提供元装置３０が加算するノイズは、平均が０となる値の付加である。そのため、ノイズ付加データを基にした汎化幅は、元のデータを基に算出した汎化幅から大きくずれる可能性（確率）が低い。つまり、汎化幅算出部２３１が算出する汎化幅は、提供元装置３０のデータに対して、ある程度の妥当性を備えた汎化幅である。 The generalization width calculated by the generalization width calculation unit 231 is based on the noise-added data. It may therefore differ from the generalization width that would be obtained from the original data held by the providing source device 30. However, as already described, the noise added by the providing source device 30 has a mean of 0. The generalization width based on the noise-added data is therefore unlikely to deviate greatly from the one calculated from the original data. That is, the generalization width calculated by the generalization width calculation unit 231 is reasonably valid for the data of the providing source device 30.

汎化幅算出部２３１は、提供元装置３０に、汎化幅を送信する。 The generalization width calculation unit 231 transmits the generalization width to the providing apparatus 30.

なお、汎化幅算出部２３１は、汎化幅として、年齢の範囲とは異なる値を送信しても良い。例えば、汎化幅算出部２３１は、汎化の境界（例えば、図７に示すデータの「年齢の２６」）を送信しても良い。 Note that the generalization width calculation unit 231 may transmit a value different from the age range as the generalization width. For example, the generalization width calculation unit 231 may transmit a generalization boundary (for example, “age 26” in the data illustrated in FIG. 7).
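The two representations mentioned above, age ranges and generalization boundaries, carry the same information once the overall range is known. The following sketch converts boundaries into inclusive integer ranges. The function name and the assumption that ages are integers within a known overall range are introduced here for illustration only.

```python
def widths_from_boundaries(lowest, highest, boundaries):
    """Turn split boundaries into inclusive integer ranges."""
    widths = []
    start = lowest
    for b in sorted(boundaries):
        widths.append((start, b - 1))  # the range ends just below the boundary
        start = b
    widths.append((start, highest))
    return widths

# Boundary "age 26" over an overall range of 20 to 30
widths = widths_from_boundaries(20, 30, [26])
```

Sending only the boundaries keeps the policy message small, at the cost of requiring both sides to agree on the overall range and on which side of a boundary a value falls.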

汎化幅保存部３３０は、情報処理装置２０から受け取った汎化幅を保存する。そして、汎化幅保存部３３０は、汎化幅を汎化部３４０に送る。 The generalization width storage unit 330 stores the generalization width received from the information processing apparatus 20. Then, the generalization width storage unit 330 sends the generalization width to the generalization unit 340.

汎化部３４０は、情報処理装置２０から受け取った汎化幅を用いて、データ保存部３１０が保存する元のデータを汎化する。 The generalization unit 340 generalizes the original data stored by the data storage unit 310 using the generalization width received from the information processing apparatus 20.

図８は、提供元装置３０が汎化した、汎化後のデータの一例を示す図である。 FIG. 8 is a diagram illustrating an example of data after generalization that has been generalized by the providing source device 30.

例えば、汎化部３４０は、図５に示す元のデータを、受け取った汎化幅（「年齢：２０−２５」と「年齢：２６−３０」）を基に、図８に示すデータに汎化する。 For example, the generalization unit 340 generalizes the original data shown in FIG. 5 into the data shown in FIG. 8, based on the received generalization widths (“age: 20-25” and “age: 26-30”).
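On the providing source side, applying the received generalization widths amounts to replacing each exact age with the range that contains it. The sketch below assumes the widths arrive as inclusive integer ranges. The records are hypothetical and do not reproduce FIG. 5 or FIG. 8.

```python
def generalize(records, widths):
    """Replace each exact age with the received range that contains it."""
    out = []
    for age, disease in records:
        for lo, hi in widths:
            if lo <= age <= hi:
                out.append((f"{lo}-{hi}", disease))
                break
        else:
            # No range covers this age, so the policy does not apply to it.
            raise ValueError(f"age {age} is outside every range")
    return out

# Hypothetical original data: (age, disease); the identifier is already removed
original = [(21, "flu"), (25, "cold"), (26, "flu"), (30, "asthma")]
generalized = generalize(original, [(20, 25), (26, 30)])
```

Note that the generalization is applied to the original ages, not to the noise-added ones. The noise is used only to derive the policy.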

図４の用いた説明に戻る。 The description now returns to FIG. 4.

汎化部３４０は、汎化したデータを、情報処理装置２１に送信する。 The generalization unit 340 transmits the generalized data to the information processing apparatus 21.

汎化データ集約部２４０と、匿名性検査部２５０と、サプレッション部２６０は、図３の用いた説明と同様に動作する。繰り返しとなるが、各部について説明すると、次のようになる。 The generalized data aggregation unit 240, the anonymity checking unit 250, and the suppression unit 260 operate as described with reference to FIG. 3. Although it repeats that description, the operation of each unit is as follows.

汎化データ集約部２４０は、提供元装置３０から受け取った汎化後のデータを集約、つまり結合し、結合データを匿名性検査部２５０に送る。 The generalized data aggregating unit 240 aggregates, that is, combines, the data after generalization received from the providing apparatus 30, and sends the combined data to the anonymity checking unit 250.

匿名性検査部２５０は、汎化データ集約部２４０から受け取った結合データの匿名性を検査する。 The anonymity inspection unit 250 inspects the anonymity of the combined data received from the generalized data aggregation unit 240.

結合データが匿名性を満足しない場合、匿名性検査部２５０は、結合データをサプレッション部２６０に送り、データのサプレッションを依頼する。 If the combined data does not satisfy the anonymity, the anonymity checking unit 250 sends the combined data to the suppression unit 260 and requests data suppression.

サプレッション部２６０は、受け取った結合データをサプレッションし、匿名性検査部２５０に戻す。 The suppression unit 260 suppresses the received combined data and returns it to the anonymity checking unit 250.

匿名性検査部２５０は、結合データが匿名性を確保するまで、サプレッション部２６０への結合データの送信を繰り返す。 The anonymity inspection unit 250 repeats transmission of the combined data to the suppression unit 260 until the combined data ensures anonymity.

結合データが匿名性を満足する場合、匿名性検査部２５０は、結合データを、匿名化後データとして、利用者装置４０に送る。 When the combined data satisfies the anonymity, the anonymity checking unit 250 sends the combined data to the user device 40 as the anonymized data.

なお、既に説明したとおり、汎化幅算出部２３１が算出する汎化幅は、ある程度の妥当性を備えている。そのため、提供元装置３０が汎化したデータは、適切な汎化に近い汎化である。そのため、情報処理装置２１のサプレッション部２６０の処理は、大きな処理量とはならない。 As already described, the generalization width calculated by the generalization width calculation unit 231 has a certain degree of validity. Therefore, the data generalized by the providing source device 30 is generalization close to appropriate generalization. Therefore, the processing of the suppression unit 260 of the information processing apparatus 21 does not have a large processing amount.

ここで、本実施形態の情報処理装置２０及び情報処理装置２１（以下、まとめて情報処理装置２０と言う）の効果について説明する。 Here, the effects of the information processing apparatus 20 and the information processing apparatus 21 (hereinafter collectively referred to as the information processing apparatus 20) of the present embodiment will be described.

本実施形態の情報処理装置２０は、提供元装置３０が保存する元のデータを受信しなくても、提供元装置３０が保存する元のデータを匿名化して、利用者装置４０に提供する効果を実施できる。つまり、本実施形態の情報処理装置２０を用いて情報を匿名化する提供元装置３０は、情報処理装置２０を信頼しない場合でも、元のデータを汎化して集約し、匿名化できる効果を得ることができる。 The information processing apparatus 20 according to the present embodiment can anonymize the original data stored by the providing source device 30 and provide it to the user device 40 without receiving that original data. That is, a providing source device 30 that anonymizes information using the information processing apparatus 20 can have its original data generalized, aggregated, and anonymized even if it does not trust the information processing apparatus 20.

その理由は、次のとおりである。 The reason is as follows.

本実施形態の情報処理装置２０は、提供元装置３０から、元のデータを受信するのではなく、方針算出用データ（例えば、ノイズ付加データ）を受信し、汎化のための方針を算出し、提供元装置３０に送信する。そして、情報処理装置２０は、提供元装置３０が方針を基に汎化した汎化後のデータを受信する。 Instead of receiving the original data from the providing source device 30, the information processing apparatus 20 according to the present embodiment receives policy calculation data (for example, noise-added data), calculates a policy for generalization, and transmits the policy to the providing source device 30. The information processing apparatus 20 then receives the data that the providing source device 30 has generalized based on the policy.

つまり、情報処理装置２０は、提供元装置３０が保存する元のデータを受信しなくても、提供元装置３０から汎化したデータを受信できるためである。 In other words, this is because the information processing apparatus 20 can receive generalized data from the providing source device 30 without receiving the original data stored by the providing source device 30.

さらに、本実施形態の情報処理装置２０は、情報処理システム１０の通信量を削減する効果を得ることができる。 Furthermore, the information processing apparatus 20 according to the present embodiment can obtain an effect of reducing the communication amount of the information processing system 10.

その理由は、次のとおりである。 The reason is as follows.

本実施形態の情報処理装置２０を用いてデータを匿名化する提供元装置３０は、他の提供元装置３０と通信する必要がない。 The provider apparatus 30 that anonymizes data using the information processing apparatus 20 of the present embodiment does not need to communicate with other provider apparatuses 30.

提供元装置３０の送信は、情報処理装置２０に対する、ノイズ付加データと汎化後のデータとの送信である。 The transmission of the providing source device 30 is transmission of noise addition data and generalized data to the information processing device 20.

情報処理装置２０の送信は、提供元装置３０への、方針の送信である。方針の送信は、データの送信に比べると、十分小さな通信量である。 Transmission of the information processing apparatus 20 is transmission of a policy to the providing source apparatus 30. Policy transmission is a sufficiently small amount of communication compared to data transmission.

つまり、情報処理装置２０を含む情報処理システム１０の通信量は、各提供元装置３０の２回のデータ（ノイズ付加データと匿名化後のデータ）を送信となる。このように、情報処理装置２０を含む情報処理システム１０の通信量は、各提供元装置３０のデータの２倍である。つまり、提供元装置３０の数をＮとする、通信量は、「Ｏ（Ｎ）」のレベルとなるためである。 That is, the communication volume of the information processing system 10 including the information processing apparatus 20 consists of each providing source device 30 transmitting its data twice (the noise-added data and the anonymized data). The communication volume is thus twice the data of each providing source device 30. In other words, this is because, where N is the number of providing source devices 30, the communication volume is on the order of “O(N)”.
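The order estimate can be written out as a short calculation. The symbols are assumptions introduced only for this illustration: d_i is the size of one data transmission from the i-th providing source device 30 (the noise-added data and the generalized data are taken to be of about the same size), p_i is the size of the policy returned to it, and d_max and p_max are upper bounds on these sizes that do not depend on N.

```latex
C \;=\; \sum_{i=1}^{N} \left( 2 d_i + p_i \right)
  \;\le\; N \left( 2 d_{\max} + p_{\max} \right)
  \;=\; O(N)
```

Because no providing source device 30 exchanges data with another, there is no term that grows with the number of device pairs.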

また、本実施形態の情報処理装置２０は、提供元装置３０の処理量を削減する効果を得ることができる。 Further, the information processing apparatus 20 according to the present embodiment can obtain an effect of reducing the processing amount of the providing apparatus 30.

その理由は、次のとおりである。 The reason is as follows.

例えば、提供元装置３０が、本発明に関連する「Mondrian Multidimensional」と「ＭＰＣ」を用いる場合、計算量は、「Ｏ（Ｎ^２・ｌｏｇ^２Ｎ）」となる。 For example, when the provider apparatus 30 uses “Mondrian Multidimensional” and “MPC” related to the present invention, the calculation amount is “O (N ² · log ² N)”.

一方、本実施形態の情報処理装置２０を用いる提供元装置３０は、他の提供元装置３０のデータを処理する必要がない。 On the other hand, the provider apparatus 30 that uses the information processing apparatus 20 of the present embodiment does not need to process data of other provider apparatuses 30.

提供元装置３０の処理は、保存する元のデータにノイズを入れる処理、及び、保存する元のデータを汎化する処理となる。つまり、提供元装置３０の処理量は、「Ｏ（Ｎ）」のレベルとなる。このように、本実施形態の情報処理装置２０は、提供元装置３０の処理量を削減できる。 The processing of the providing source device 30 includes processing for adding noise to the original data to be stored and processing for generalizing the original data to be stored. That is, the processing amount of the providing source device 30 is at the level of “O (N)”. As described above, the information processing apparatus 20 according to the present embodiment can reduce the processing amount of the providing source apparatus 30.

＜変形例１＞ <Modification 1>
本実施形態の情報処理装置２０は、準識別子として、年齢のような数値に限る必要はない。例えば、情報処理装置２０は、準識別子の一部又は全てに、性別や病気名のような分類名（カテゴリー）を用いても良い。 The information processing apparatus 20 according to the present embodiment need not limit quasi-identifiers to numerical values such as age. For example, the information processing apparatus 20 may use classification names (categories) such as gender or disease name for some or all of the quasi-identifiers.

分類名を使用する場合、情報処理装置２０は、分類名を数値に置き換えて処理しても良い。分類名を数値に置き換えれば、情報処理装置２０は、第１の実施形態と同様の構成及び処理を用いて、処理を実現できる。 When using a classification name, the information processing apparatus 20 may process the classification name by replacing it with a numerical value. If the classification name is replaced with a numerical value, the information processing apparatus 20 can realize the processing using the same configuration and processing as in the first embodiment.

また、情報処理装置２０は、分類に木構造を適用し、一般的な木構造の処理を用いて、分類名の準識別子を処理しても良い。 In addition, the information processing apparatus 20 may apply a tree structure to the classification and process the quasi-identifier of the classification name using a general tree structure process.
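One common way to apply a tree structure to classification names is a generalization hierarchy, in which a value is generalized by moving toward the root. The sketch below is an assumption about how this could look. The hierarchy, the category names, and the function name are hypothetical and are not specified in this description.

```python
# Hypothetical hierarchy, stored as child -> parent
PARENT = {
    "influenza": "respiratory disease",
    "pneumonia": "respiratory disease",
    "gastritis": "digestive disease",
    "respiratory disease": "any disease",
    "digestive disease": "any disease",
}

def generalize_category(value, levels, parent=PARENT):
    """Climb `levels` steps toward the root, stopping once the root is reached."""
    for _ in range(levels):
        if value not in parent:
            break  # already at the root
        value = parent[value]
    return value

one_up = generalize_category("influenza", 1)
```

In this representation, the policy for a categorical quasi-identifier could be expressed as the tree level to which each value should be raised, in place of a numeric range.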

本変形例の情報処理装置２０は、数値以外の準識別子を取り扱う効果を得ることができる。 The information processing apparatus 20 according to this modification can obtain an effect of handling quasi-identifiers other than numerical values.

その理由は、次のとおりである。 The reason is as follows.

本変形例の情報処理装置２０は、数値以外の順識別子を数値に変換する、又は、木構造を用いて、数値以外の準識別子を処理できるためである。 This is because the information processing apparatus 20 of this modification can convert non-numeric quasi-identifiers into numerical values, or process them using a tree structure.

＜変形例２＞ <Modification 2>
情報処理装置２０は、提供元装置３０に、方針算出用データの算出について、指示しても良い。 The information processing apparatus 20 may instruct the providing source device 30 on how to calculate the policy calculation data.

例えば、提供元装置３０が、データにノイズを加える場合を用いて説明する。 For example, the case where the providing source device 30 adds noise to data will be described.

情報処理装置２０の匿名性検査部２５０の検査に結果において、集約後のデータの匿名性を満たすデータの比率が、所定の値より低い場合、情報処理装置２０は、ノイズ付加データを基にした方針の算出が、適切でないと判断する。 If the check by the anonymity checking unit 250 of the information processing apparatus 20 shows that the proportion of the aggregated data satisfying the anonymity is lower than a predetermined value, the information processing apparatus 20 determines that the policy calculated from the noise-added data is not appropriate.

そこで、情報処理装置２０は、提供元装置３０にノイズ幅の調整を依頼する。 Therefore, the information processing apparatus 20 requests the provider apparatus 30 to adjust the noise width.

そして、情報処理装置２０は、提供元装置３０からノイズ幅の調整後のノイズ付加データを受信し、新たな方針を算出する。そして、情報処理装置２０は、算出した方針を、提供元装置３０に送信する。 Then, the information processing device 20 receives the noise-added data after adjusting the noise width from the providing source device 30, and calculates a new policy. Then, the information processing apparatus 20 transmits the calculated policy to the providing source apparatus 30.

なお、情報処理装置２０は、新に算出した方針が、前回の方針と変化したか否かを判断しても良い。そして、変化しない場合、情報処理装置２０は、方針を送信せず、さらにノイズ幅の変更を依頼しても良い。 The information processing apparatus 20 may determine whether the newly calculated policy differs from the previous one. If it does not, the information processing apparatus 20 may skip transmitting the policy and request a further change of the noise width.

提供元装置３０は、新たな方針を基にデータを汎化し、情報処理装置２０に送信する。 The provider apparatus 30 generalizes the data based on the new policy and transmits the data to the information processing apparatus 20.

情報処理装置２０は、修正した方針を基に汎化したデータを集約し、匿名性を検査する。 The information processing apparatus 20 collects generalized data based on the revised policy and checks anonymity.

情報処理装置２０は、所定の匿名性を満たすまで、この処理を繰り返しても良い。 The information processing apparatus 20 may repeat this process until predetermined anonymity is satisfied.
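The decision that triggers a noise width adjustment can be sketched as follows. This is a minimal sketch, assuming k-anonymity over one generalized quasi-identifier. The threshold value and the function names are hypothetical. The direction of the adjustment (narrower or wider noise) is not stated in this description and is therefore left out.

```python
from collections import Counter

def satisfied_ratio(generalized_ages, k):
    """Fraction of records whose group has at least k members."""
    if not generalized_ages:
        return 1.0  # nothing to violate
    counts = Counter(generalized_ages)
    ok = sum(n for n in counts.values() if n >= k)
    return ok / len(generalized_ages)

def needs_noise_adjustment(generalized_ages, k, threshold):
    """True when too few records satisfy the anonymity, so the policy is suspect."""
    return satisfied_ratio(generalized_ages, k) < threshold

# Hypothetical aggregated data: three records in one group, one in another
aggregated = ["20-25", "20-25", "20-25", "26-30"]
ratio = satisfied_ratio(aggregated, k=2)
```

With these values the ratio is 0.75, so a threshold of 0.8 would lead to a request for adjusted noise-added data, while a threshold of 0.5 would not.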

なお、情報処理装置２０が提供元装置３０に指示する構成は、特に制限はない。 The component of the information processing apparatus 20 that issues the instruction to the providing source device 30 is not particularly limited.

例えば、情報処理装置２０の匿名性検査部２５０が、提供元装置３０にノイズの修正を指示してもよい。 For example, the anonymity inspection unit 250 of the information processing apparatus 20 may instruct the provider apparatus 30 to correct noise.

あるいは、情報処理装置２０が、図示しないノイズ修正指示部を含み、匿名性検査部２５０が、ノイズ修正指示部に提供元装置３０への通知を依頼してもよい。 Alternatively, the information processing apparatus 20 may include a noise correction instruction unit (not shown), and the anonymity inspection unit 250 may request the noise correction instruction unit to notify the providing apparatus 30.

あるいは、匿名性検査部２５０が、提供元装置３０に汎化幅を送信する方針算出部２３０に指示し、方針算出部２３０が、提供元装置３０にノイズの変更を指示してもよい。 Alternatively, the anonymity checking unit 250 may instruct the policy calculating unit 230 that transmits the generalization width to the providing source device 30, and the policy calculating unit 230 may instruct the providing source device 30 to change the noise.

本変形例に係る情報処理装置２０は、より適切な匿名性を実現する効果を得ることできる。 The information processing apparatus 20 according to the present modification can obtain an effect of realizing more appropriate anonymity.

その理由は、次のとおりである。 The reason is as follows.

本変形例の情報処理装置２０は、集約した汎化データの匿名性の検査結果を基に、提供元装置３０から受け取る方針算出用データを修正できるためである。 This is because the information processing apparatus 20 of the present modification can modify the policy calculation data received from the providing apparatus 30 based on the anonymity test result of the generalized data that has been aggregated.

＜変形例３＞ <Modification 3>
情報処理装置２０は、提供元装置３０から、各提供元装置３０が必要とするデータの匿名性を受け取っても良い。 The information processing apparatus 20 may receive, from each providing source device 30, the anonymity that the device requires for its data.

この場合、情報処理装置２０は、受け取ったデータの匿名性を満たすように、利用者装置４０に提供する結合データの匿名性を決定する。 In this case, the information processing device 20 determines the anonymity of the combined data provided to the user device 40 so as to satisfy the anonymity of the received data.

例えば、情報処理装置２０は、提供元装置３０から受け取った最も高い匿名性を満たすように、汎化の方針を決定し、データを匿名化しても良い。 For example, the information processing apparatus 20 may determine the generalization policy and anonymize the data so as to satisfy the highest anonymity received from the provider apparatus 30.

あるいは、情報処理装置２０は、提供元装置３０から受け取ったデータが含まれる各グループにおいて、受け取った匿名性を満足するように、各提供元装置３０の汎化の方針を決定し、データを匿名化しても良い。 Alternatively, the information processing apparatus 20 may determine the generalization policy for each providing source device 30 and anonymize the data so that the received anonymity is satisfied in each group containing data received from that providing source device 30.
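The first of the two options above, satisfying the highest anonymity received, reduces to taking a maximum when the anonymity is expressed as the k of k-anonymity. That representation, the provider names, and the function name are assumptions made for this sketch.

```python
def combined_k(required_k):
    """Pick the strictest (largest) k requested by any providing source device."""
    if not required_k:
        raise ValueError("no anonymity requirements received")
    return max(required_k.values())

# Hypothetical requirements received from three providing source devices
required_k = {"provider_a": 2, "provider_b": 5, "provider_c": 3}
k = combined_k(required_k)
```

Using the largest k satisfies every provider at once but may generalize more than some providers need. The per-group option described above trades that simplicity for less information loss.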

本変形例に係る情報処理装置２０は、提供元装置３０にとって、より適切な匿名性を実現する効果を得ることができる。 The information processing apparatus 20 according to the present modification can obtain an effect of realizing more appropriate anonymity for the provider apparatus 30.

その理由は、次のとおりである。 The reason is as follows.

本変形例の情報処理装置２０は、提供元装置３０が必要とするデータの匿名性を受け取り、その匿名性を満たすように、データを匿名化するためである。 This is because the information processing apparatus 20 of the present modification receives anonymity of data required by the provider apparatus 30 and anonymizes the data so as to satisfy the anonymity.

＜変形例４＞ <Modification 4>
情報処理装置２０の構成は、これまでの説明に限らない。 The configuration of the information processing apparatus 20 is not limited to the above description.

情報処理装置２０は、各構成を複数の構成に分けても良い。 The information processing apparatus 20 may divide each configuration into a plurality of configurations.

例えば、情報処理装置２０の方針算出部２３０は、方針を算出する構成と、方針を送信する構成とに分かれても良い。 For example, the policy calculation unit 230 of the information processing apparatus 20 may be divided into a configuration for calculating a policy and a configuration for transmitting a policy.

あるいは、情報処理装置２０は、１つの装置で構成される必要はない。例えば、情報処理装置２０は、ネットワーク５０を介して接続した方針決定部２０１を含む装置と、匿名化データ作成部２０２を含む装置とを用いて構成されても良い。 Alternatively, the information processing apparatus 20 need not consist of a single device. For example, the information processing apparatus 20 may be configured from a device including the policy determination unit 201 and a device including the anonymized data creation unit 202, connected via the network 50.

また、情報処理装置２０は、複数の構成を１つの構成としても良い。 The information processing apparatus 20 may also combine a plurality of components into one.

例えば、情報処理装置２０は、ＣＰＵ（Central Processing Unit）と、ＲＯＭ（Read Only Memory）と、ＲＡＭ（Random Access Memory）と、入出力接続回路（ＩＯＣ：Input/Output Circuit）と、ネットワークインターフェース回路（ＮＩＣ：Network Interface Circuit）とを含むコンピュータ装置として実現しても良い。 For example, the information processing apparatus 20 may be realized as a computer device including a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), an input/output connection circuit (IOC: Input/Output Circuit), and a network interface circuit (NIC: Network Interface Circuit).

図９は、本実施形態の情報処理装置２０の変形例である情報処理装置６０の構成の一例を示すブロック図である。 FIG. 9 is a block diagram illustrating an example of a configuration of an information processing device 60 that is a modification of the information processing device 20 of the present embodiment.

情報処理装置６０は、ＣＰＵ６１０と、ＲＯＭ６２０と、ＲＡＭ６３０と、内部記憶装置６４０と、ＩＯＣ６５０と、ＮＩＣ６８０とを含み、コンピュータを構成している。 The information processing apparatus 60 includes a CPU 610, a ROM 620, a RAM 630, an internal storage device 640, an IOC 650, and a NIC 680, and constitutes a computer.

ＣＰＵ６１０は、ＲＯＭ６２０からプログラムを読み込む。そして、ＣＰＵ６１０は、読み込んだプログラムに基づいて、ＲＡＭ６３０と、内部記憶装置６４０と、ＩＯＣ６５０と、ＮＩＣ６８０とを制御する。そして、ＣＰＵ６１０は、これらの構成を制御し、図２に示す、方針決定部２０１と匿名化データ作成部２０２としての各機能を実現する。ＣＰＵ６１０は、各機能を実現する際に、ＲＡＭ６３０をプログラムの一時記憶として使用しても良い。 The CPU 610 reads a program from the ROM 620. Based on the read program, the CPU 610 controls the RAM 630, the internal storage device 640, the IOC 650, and the NIC 680. By controlling these components, the CPU 610 realizes the functions of the policy determination unit 201 and the anonymized data creation unit 202 shown in FIG. 2. When realizing each function, the CPU 610 may use the RAM 630 as temporary storage for the program.

また、ＣＰＵ６１０は、コンピュータで読み取り可能にプログラムを記憶した記憶媒体７００が含むプログラムを、図示しない記憶媒体読み取り装置を用いて読み込んでも良い。あるいは、ＣＰＵ６１０は、ＮＩＣ６８０を介して、図示しない外部の装置からプログラムを受け取っても良い。 In addition, the CPU 610 may read a program included in the storage medium 700 that stores the program so as to be readable by a computer using a storage medium reading device (not shown). Alternatively, the CPU 610 may receive a program from an external device (not shown) via the NIC 680.

ＲＯＭ６２０は、ＣＰＵ６１０が実行するプログラム及び固定的なデータを記憶する。ＲＯＭ６２０は、例えば、Ｐ−ＲＯＭ（Programable-ROM）やフラッシュＲＯＭである。 The ROM 620 stores programs executed by the CPU 610 and fixed data. The ROM 620 is, for example, a P-ROM (Programmable-ROM) or a flash ROM.

ＲＡＭ６３０は、ＣＰＵ６１０が実行するプログラムやデータを一時的に記憶する。ＲＡＭ６３０は、例えば、Ｄ−ＲＡＭ（Dynamic-RAM）である。 The RAM 630 temporarily stores programs executed by the CPU 610 and data. The RAM 630 is, for example, a D-RAM (Dynamic-RAM).

内部記憶装置６４０は、情報処理装置６０が長期的に保存するデータやプログラムを記憶する。また、内部記憶装置６４０は、ＣＰＵ６１０の一時記憶装置として動作しても良い。内部記憶装置６４０は、例えば、ハードディスク装置、光磁気ディスク装置、ＳＳＤ（Solid State Drive）又はディスクアレイ装置である。 The internal storage device 640 stores data and programs that the information processing device 60 saves over a long period of time. Further, the internal storage device 640 may operate as a temporary storage device for the CPU 610. The internal storage device 640 is, for example, a hard disk device, a magneto-optical disk device, an SSD (Solid State Drive), or a disk array device.

ＩＯＣ６５０は、ＣＰＵ６１０と、入力機器６６０及び表示機器６７０とのデータを仲介する。ＩＯＣ６５０は、例えば、ＩＯインターフェースカードである。 The IOC 650 mediates data between the CPU 610, the input device 660, and the display device 670. The IOC 650 is, for example, an IO interface card.

入力機器６６０は、情報処理装置６０の操作者からの入力指示を受け取る機器である。入力機器６６０は、例えば、キーボード、マウス又はタッチパネルである。 The input device 660 is a device that receives an input instruction from an operator of the information processing apparatus 60. The input device 660 is, for example, a keyboard, a mouse, or a touch panel.

表示機器６７０は、情報処理装置６０の操作者に情報を表示する機器である。表示機器６７０は、例えば、液晶ディスプレイである。 The display device 670 is a device that displays information to the operator of the information processing apparatus 60. The display device 670 is a liquid crystal display, for example.

ＮＩＣ６８０は、ネットワークを介した外部の装置とのデータのやり取りを中継する。ＮＩＣ６８０は、例えば、ＬＡＮカードである。 The NIC 680 relays data exchange with an external device via a network. The NIC 680 is, for example, a LAN card.

このように構成された情報処理装置６０は、情報処理装置２０と同様の効果を得ることができる。 The information processing apparatus 60 configured as described above can obtain the same effects as the information processing apparatus 20.

その理由は、次のとおりである。 The reason is as follows.

情報処理装置６０のＣＰＵ６１０は、プログラムに基づいて情報処理装置２０と同様の機能を実現できるためである。 This is because the CPU 610 of the information processing device 60 can realize the same function as the information processing device 20 based on the program.

なお、提供元装置３０も、情報処理装置２０の同様に、図９で示すコンピュータを用いて実現されても良い。 Like the information processing apparatus 20, the providing source device 30 may also be realized using the computer shown in FIG. 9.

（第２の実施形態） (Second Embodiment)
第１の実施形態の情報処理装置２０は、匿名性検査部２５０とサプレッション部２６０とを用いて、適切な匿名化を実現する。 The information processing apparatus 20 according to the first embodiment realizes appropriate anonymization using the anonymity checking unit 250 and the suppression unit 260.

The generalized combined data whose anonymity the information processing apparatus 20 inspects has already been anonymized by the provider device 30 based on the policy. As already described, the policy calculated by the information processing apparatus 20 is a fairly reasonable value for generalization. The provider device 30 can therefore generalize the data reasonably well, and the processing load of the suppression unit 260 does not become large.

Nevertheless, the less processing the suppression unit 260 performs, the better. To reduce that processing, it is desirable for the information processing apparatus 20 to calculate a more appropriate policy (for example, a generalization width or a boundary).

Therefore, the information processing apparatus 20 of the second embodiment improves the policy calculation and calculates a more appropriate policy (for example, a generalization width or a boundary).

The configuration of the information processing apparatus 20 of the second embodiment is the same as in the first embodiment, so its description is omitted. The operation of the information processing apparatus 20 of the second embodiment is also the same as in the first embodiment, except for the operation of the anonymization unit 220. Descriptions of operations shared with the first embodiment are therefore omitted, and only the operations specific to the second embodiment are described.

The anonymization unit 220 of the first embodiment performed anonymization based on the received policy calculation data.

In contrast, the anonymization unit 220 of the second embodiment anonymizes the received policy calculation data while taking into account how the method used to create that data affects it. In other words, the anonymization unit 220 of the second embodiment refers to the method by which the provider device 30 created the policy calculation data when it performs anonymization.

The operation of the anonymization unit 220 is described below using the case where noise is added to the data shown in FIG. 5.

The operation of the anonymization unit 220 of the second embodiment is described with reference to FIGS. 10 to 13.

FIG. 10 shows the noise-added data of FIG. 6, which was used in the description of the first embodiment.

The black circles in the figure indicate the noise-added data. The range above and below each black circle is the noise range. Here, the maximum noise is “2”. That is, the original data corresponding to each item of noise-added data falls within this range.

The table on the right of FIG. 10 associates the “original data” shown in FIG. 5, the “noise-added data” shown in FIG. 10, and the “anonymized data” that the anonymization unit 220 produced from the noise-added data.

The “boundary” shown in FIG. 10 is the boundary of the anonymized data.

FIG. 11 shows noise-added data obtained by adding noise different from that of FIG. 10 to the data of FIG. 5.

The anonymization unit 220 of the first embodiment calculated the boundary (for example, the median of the noise-added data) based on the received data.

If it operates as in the first embodiment, the anonymization unit 220 anonymizes the data using “22”, the median of the noise-added data, as the boundary, as shown in FIG. 11. As a result, the information processing apparatus 20 transmits “age: 20-21” and “age: 22-30” to the provider device 30 as the generalization widths. The table on the right of FIG. 11 shows the data corresponding to these generalization widths.

In the anonymization shown in FIG. 11, the generalization widths are therefore “age: 20-21” and “age: 22-30”.
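The first-embodiment behaviour described here, splitting at the median of the noise-added data, can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `median_split`, the sample values, and the explicit `low` and `high` range bounds are hypothetical and are not taken from FIG. 11.

```python
from statistics import median_low

def median_split(noisy_values, low, high):
    """Split the range [low, high] at the median of the noise-added data.

    Returns the boundary and the two generalization widths as (start, end)
    pairs. The boundary itself belongs to the upper group.
    """
    boundary = median_low(noisy_values)  # lower median, always an actual data value
    return boundary, [(low, boundary - 1), (boundary, high)]

# Hypothetical noise-added ages whose median is 22 (not the FIG. 11 data).
noisy_ages = [20, 21, 22, 27, 28]
boundary, widths = median_split(noisy_ages, low=20, high=30)
print(boundary, widths)  # 22 [(20, 21), (22, 30)]
```

With these assumed values the sketch yields the same two widths as the text, 20-21 and 22-30.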

FIG. 12 shows the data obtained when the provider device 30 generalizes its data using the generalization widths calculated from the anonymization shown in FIG. 11.

As is apparent from FIG. 12, this anonymization is not appropriate. For example, the group of ages 20-21 contains only “1” item of data and does not satisfy “2-anonymity”. The information processing apparatus 20 that receives this generalized data therefore needs suppression processing in the suppression unit 260.
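The anonymity check that detects such a group can be sketched as follows. This is a minimal illustration, not the implementation of the anonymity inspection unit 250; the function name `violating_groups` and the sample records are hypothetical.

```python
from collections import Counter

def violating_groups(generalized_values, k):
    """Return the generalized groups that contain fewer than k records."""
    counts = Counter(generalized_values)
    return {group: n for group, n in counts.items() if n < k}

# Hypothetical generalized ages: one record in "20-21", five in "22-30".
records = ["20-21", "22-30", "22-30", "22-30", "22-30", "22-30"]
print(violating_groups(records, k=2))  # {'20-21': 1}
```

Any group returned by such a check would be a candidate for suppression.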

Therefore, the anonymization unit 220 of the present embodiment selects the boundary using the “unevenness of division” and the “risk of not satisfying anonymity” at each point.

“Unevenness of division” is the unevenness of the data after division when that point is used as the boundary. For example, the “distance from the median” is one measure of the “unevenness of division”. Division at a point far from the median produces a larger difference in the number of data items between the resulting groups than division at a point near the median.

The “risk of not satisfying anonymity” is the inappropriateness of the division, arising from the ambiguity of the policy calculation data, when that point is used as the boundary. For example, the “number of data items considering the noise range”, that is, the number of data items that include the division point once their noise range is taken into account, is one measure of the “risk of not satisfying anonymity”. If the noise ranges of many data items include a boundary, the distribution of the original data after division at that boundary can be expected to vary widely.

Therefore, the anonymization unit 220 of the present embodiment uses as the boundary (division point) a point with a small value (score) calculated using, for example, “Formula 1” below.

(Formula 1)
Score = “distance from the median” + “number of data items considering the noise range” ... (1)

FIG. 13 shows the data with the “Formula 1” score written for each item of data.

The value in parentheses for each age is the score calculated with “Formula 1”. The first term on the right-hand side is the “distance from the median”. The second term is the “number of data items considering the noise range”.

For example, the score “4” of age “25” is the sum of “3”, the “distance from the median”, and “1”, the “number of data items considering the noise range”.

Ages outside the range of the noise-added data (for example, below 20 and above 28 in FIG. 11) cannot become a boundary and need not be considered.

When the score is calculated for each item of data in FIG. 11, ages “22”, “23”, and “24” have the smallest score (3).

When several quasi-identifier values (here, ages) have the same score, the anonymization unit 220 of the present embodiment may adopt any of them as the boundary; there is no particular restriction.

However, one purpose of the information processing apparatus 20 of the present embodiment is to reduce the possibility that anonymity is not satisfied. The following description therefore assumes that the anonymization unit 220 adopts the quasi-identifier value with the smallest “number of data items considering the noise range”, which is the “risk of not satisfying anonymity”.

Here, age “24” has the smallest “number of data items considering the noise range”, namely “1”. The anonymization unit 220 therefore anonymizes the noise-added data using age “24” as the boundary (see the boundary shown in FIG. 13).
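The boundary selection with “Formula 1” and the tie-break described here can be sketched as follows. This is a hedged illustration, not the patented implementation: the function names are hypothetical, the overlap term is one assumed reading of the “number of data items considering the noise range” (the count of items whose noise range contains the point), and the sample ages are hypothetical values chosen so that the sketch reproduces the scores quoted in the text; they are not taken from FIG. 11.

```python
from statistics import median

def formula1_scores(noisy_values, max_noise):
    """Return {point: (score, distance, overlap)} for every candidate boundary.

    distance: absolute distance of the point from the median of the data.
    overlap:  number of data items whose noise range
              [v - max_noise, v + max_noise] contains the point
              (an assumed reading of the second term of Formula 1).
    """
    med = median(noisy_values)
    scores = {}
    # Only points inside the range of the noise-added data are candidates.
    for point in range(min(noisy_values), max(noisy_values) + 1):
        distance = abs(point - med)
        overlap = sum(1 for v in noisy_values if abs(v - point) <= max_noise)
        scores[point] = (distance + overlap, distance, overlap)
    return scores

def choose_boundary(noisy_values, max_noise):
    """Pick the lowest score; break ties by the smaller overlap term."""
    scores = formula1_scores(noisy_values, max_noise)
    return min(scores, key=lambda p: (scores[p][0], scores[p][2], p))

noisy_ages = [20, 21, 22, 27, 28]  # hypothetical values, not from FIG. 11
scores = formula1_scores(noisy_ages, max_noise=2)
print(scores[25])                                # (4, 3, 1)
print(choose_boundary(noisy_ages, max_noise=2))  # 24
```

With these assumed values, ages 22, 23, and 24 all score 3, and the tie-break on the overlap term selects 24.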

As shown in FIG. 13, age “24” is the value closest to the median among the ages where, taking the noise of the noise-added data into account, few data items overlap, that is, among the ages with a low “risk of not satisfying anonymity”.

FIG. 14 shows the data after generalization by the provider device 30 when age “24” is used as the boundary.

Compared with the data shown in FIG. 12, the data shown in FIG. 14 is appropriately anonymized. For example, the data shown in FIG. 14 satisfies “2-anonymity”.

The score used by the information processing apparatus 20 of the present embodiment is not limited to “Formula 1”.

For example, the information processing apparatus 20 may use a “weight” to adjust the influence of the “unevenness of division” or the “risk of not satisfying anonymity”.

The information processing apparatus 20 may therefore calculate the score using “Formula 2” below instead of “Formula 1”.

(Formula 2)
Score = “distance from the median” + “weight” × “number of data items considering the noise range” ... (2)

The “weight” in “Formula 2” is a parameter that adjusts how strongly the “number of data items considering the noise range” influences the score.

When the “weight” is set to a value greater than “1”, the score is strongly influenced by the “number of data items considering the noise range”. As a result, a boundary with a large “number of data items considering the noise range” becomes less likely to be selected. Conversely, a boundary with a large “distance from the median” becomes relatively more likely to be selected.

When the “weight” is set to a value smaller than “1”, the score is less influenced by the “number of data items considering the noise range”. As a result, a boundary with a large “number of data items considering the noise range” becomes more likely to be selected. Conversely, a boundary with a large “distance from the median” becomes relatively less likely to be selected.

The “weight” may instead be multiplied by the “distance from the median”. In that case, the relationship between the “weight” and the likelihood of being selected as a boundary is the reverse of the above.
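The effect of the “weight” in “Formula 2” can be illustrated with a short sketch. This is an assumption-laden example rather than the patented implementation: `best_boundaries` is a hypothetical helper, the overlap term counts the data items whose noise range contains the point (an assumed reading of the second term), and the sample ages are hypothetical.

```python
from statistics import median

def best_boundaries(noisy_values, max_noise, weight=1.0):
    """Return every candidate point that has the minimum Formula 2 score."""
    med = median(noisy_values)
    scores = {}
    for point in range(min(noisy_values), max(noisy_values) + 1):
        distance = abs(point - med)
        overlap = sum(1 for v in noisy_values if abs(v - point) <= max_noise)
        scores[point] = distance + weight * overlap  # Formula 2
    lowest = min(scores.values())
    return [p for p, s in scores.items() if s == lowest]

noisy_ages = [20, 21, 22, 27, 28]  # hypothetical values
for weight in (0.5, 1, 2):
    print(weight, best_boundaries(noisy_ages, max_noise=2, weight=weight))
```

For these assumed values, a weight of 0.5 selects the median itself (22), a weight of 1 leaves 22, 23, and 24 tied, and a weight of 2 selects 24, the point with the fewest overlapping noise ranges.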

Furthermore, the information processing apparatus 20 may change the “weight” according to the attribute of the quasi-identifier. For example, when it is known in advance how favorably each quasi-identifier attribute affects the division, the information processing apparatus 20 may set a different weight for each attribute.

For example, when the quasi-identifier attribute “age” is known to affect the division more favorably than the attribute “height”, the information processing apparatus 20 may use a large “weight” (for example, “2”) for the attribute “age” and a small “weight” (for example, “1”) for the attribute “height”. Such “weight” settings allow the information processing apparatus 20 to achieve an even better division.

In this way, the information processing apparatus 20 according to the second embodiment obtains, in addition to the effects of the first embodiment, the effect of calculating a more appropriate generalization width.

The reason is as follows.

The anonymization unit 220 according to the second embodiment selects the generalization policy for the provider device 30 using the “unevenness of division” and the “risk of not satisfying anonymity”. The information processing apparatus 20 of the present embodiment can therefore instruct the provider device 30 to perform more appropriate generalization.

Furthermore, the information processing apparatus 20 according to the second embodiment obtains the effect of reducing the processing of the suppression unit 260.

The reason is as follows.

The anonymization unit 220 can improve the generalization performed by the provider device 30. The number of times the suppression unit 260 needs to run is therefore reduced.

(Third Embodiment)
The provider device 30 wants to provide as little data as possible to the information processing apparatus 20.

In addition, the information processing apparatus 20 can calculate a policy with a predetermined level of fitness based on only part of the data of the provider device 30.

Therefore, the information processing apparatus 20 of the present embodiment receives from the provider device 30, as the policy calculation data, part of the data, or part of the attributes, to be provided to the user device 40.

Here, “part” covers both the case where a provider device 30 provides part of the data it holds as policy calculation data and the case where only some of the provider devices 30 provide policy calculation data.

The configuration of the information processing apparatus 20 may be the same as in the first and second embodiments. Its description is therefore omitted.

The operation of the information processing apparatus 20 may also be the same as in the first and second embodiments.

However, when the policy calculation data is part of the data provided by the provider device 30, the anonymization unit 220 of the information processing apparatus 20 may anonymize the policy calculation data so that it satisfies a level of anonymity different from that of the data provided to the user device 40. Even in this case, the policy calculation unit 230 calculates the policy based on the data anonymized by the anonymization unit 220.

Descriptions of operations shared with the first and second embodiments are omitted below, and the operations specific to the present embodiment are described.

Combined data with a large amount of data satisfies anonymity more easily than combined data with a small amount of data. For example, combined data on 10,000 people can satisfy a given “k-anonymity” more easily than combined data on 3,000 people.

Therefore, the anonymization unit 220 of the present embodiment anonymizes data in proportion to the amount of data.

This is explained using specific numbers.

Suppose the information processing apparatus 20 provides the user device 40 with anonymized data on 10,000 people that satisfies “10-anonymity”. Suppose also that the information processing apparatus 20 receives, as the policy calculation data, data sampled from the provider device 30 with a probability of 30%, that is, data on 3,000 people.

In this case, the anonymization unit 220 of the information processing apparatus 20 may apply a lower level of anonymization than that of the anonymized combined data, specifically “3-anonymity”, in which the “10” of “10-anonymity” is reduced to 30%.

This is because, as already described, the smaller the amount of data at the information processing apparatus 20, the harder it is to satisfy “k-anonymity”.

For example, a policy that satisfies “10-anonymity” on the data of 3,000 people is likely to yield unnecessarily high anonymity when applied to the data of 10,000 people.

On the other hand, a policy that satisfies “3-anonymity” on the data of 3,000 people can be expected to correspond to a policy that satisfies “10-anonymity” on the data of 10,000 people.
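The proportional scaling in this example can be written as a short sketch. The function name `scaled_k` and its rounding behaviour are assumptions made for illustration; the text gives only the 10 to 3 example and does not specify how non-integer results are handled.

```python
def scaled_k(target_k, target_size, sample_size):
    """Scale the k of k-anonymity in proportion to the sample size.

    Rounds down and never returns less than 1; both choices are assumptions
    of this sketch, since the text only gives the 10 -> 3 example.
    """
    return max(1, (target_k * sample_size) // target_size)

# 10-anonymity on 10,000 people, policy calculation data on 3,000 people.
print(scaled_k(target_k=10, target_size=10_000, sample_size=3_000))  # 3
```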

In this way, the information processing apparatus 20 of the present embodiment selects the anonymization performed by the anonymization unit 220 based on the ratio of the amount of policy calculation data to the amount of data provided to the user device 40. The policy calculation unit 230 calculates the policy based on the data anonymized by the anonymization unit 220. As a result, the information processing apparatus 20 changes the policy it calculates according to the amount of policy calculation data.

The anonymization unit 220 may change the anonymity level based on an index other than the data ratio.

In addition to the effects of the first and second embodiments, the information processing apparatus 20 of the present embodiment obtains the effect of reducing the amount of communication and the amount of processing.

The reason is as follows.

The policy calculation data aggregation unit 210 of the information processing apparatus 20 receives only part of the original data stored by the provider device 30. The amount of data communicated to the information processing apparatus 20 is therefore reduced.

In addition, the provider device 30 performs less processing to create the policy calculation data, and the anonymization unit 220 of the information processing apparatus 20 has less data to process during anonymization.

<Modification 1>
The information processing apparatus 20 may be composed of a plurality of hierarchically arranged apparatuses.

When the information processing apparatuses 20 form a hierarchy, a lower-level information processing apparatus 20 transmits anonymized data to an upper-level information processing apparatus 20 as necessary. The upper-level information processing apparatus 20 aggregates the anonymized data it receives and performs any further anonymization required.

The information processing apparatus 20 according to this modification changes the anonymity level that the data must satisfy based on the amount of data it handles.

Suppose that city X includes town A and town B, and that an information processing apparatus 20 is provided in each of city X, town A, and town B. Suppose that each information processing apparatus 20 performs anonymization in proportion to population, for example “[population / 1000]-anonymity”. Suppose also that town A has a population of 4,000 and town B a population of 6,000, so that city X has a population of 10,000.

In this case, the information processing apparatus 20 of town A anonymizes the data of town A to “4 (= 4000/1000)-anonymity”.

Similarly, the information processing apparatus 20 of town B anonymizes the data of town B to “6 (= 6000/1000)-anonymity”.

Furthermore, the information processing apparatus 20 of city X receives the anonymized data of town A from the information processing apparatus 20 of town A and the anonymized data of town B from the information processing apparatus 20 of town B, aggregates the data, and anonymizes it to “10 (= 10000/1000)-anonymity”.
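The population-proportional anonymity levels in this example can be reproduced with a small sketch. The function name `k_for_population` is hypothetical, and reading the brackets in “[population / 1000]” as rounding down is an assumption.

```python
def k_for_population(population, people_per_k=1000):
    """Return the k of k-anonymity for a population (rounded down, an assumption)."""
    return population // people_per_k

towns = {"A": 4000, "B": 6000}

# Lower level: each town anonymizes its own data.
town_k = {name: k_for_population(pop) for name, pop in towns.items()}

# Upper level: city X aggregates both towns and anonymizes the combined data.
city_k = k_for_population(sum(towns.values()))

print(town_k, city_k)  # {'A': 4, 'B': 6} 10
```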

The information processing apparatus 20 of this modification achieves an appropriate processing load and appropriate anonymization.

The reason is as follows.

Each information processing apparatus 20 of this modification anonymizes only the range of data it needs to handle.

In addition, the upper-level information processing apparatus 20 receives data that has already been anonymized from the lower-level information processing apparatuses 20 and anonymizes it.

The present invention has been described above with reference to the embodiments, but the present invention is not limited to those embodiments. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.

Some or all of the above embodiments can also be described as in the following supplementary notes, but are not limited to them.

(Supplementary note 1)
An information processing apparatus including:
policy determining means for calculating a policy to be used for generalizing the data of a provider device, based on policy calculation data, which is data obtained by obscuring the data of the provider device to the extent that the policy can still be calculated; and
anonymized data creating means for creating anonymized data based on data that the provider device has generalized based on the policy.

(Supplementary note 2)
The information processing apparatus according to supplementary note 1, wherein the policy determining means includes:
policy calculation data aggregating means for aggregating the policy calculation data;
anonymization means for anonymizing the aggregated policy calculation data; and
policy calculation means for calculating the policy based on the anonymized data.

(Supplementary note 3)
The information processing apparatus according to supplementary note 2, wherein the policy calculation data aggregating means aggregates, as the policy calculation data, data obtained by adding predetermined noise to the data of the provider device.

(Supplementary note 4)
The information processing apparatus according to supplementary note 2 or 3, wherein the policy calculation means calculates, as the policy, a generalization width or a generalization boundary for the generalization performed by the provider device.

(Supplementary note 5)
The information processing apparatus according to any one of supplementary notes 2 to 4, wherein the anonymization means anonymizes a quasi-identifier.

(Supplementary note 6)
The information processing apparatus according to any one of supplementary notes 2 to 5, wherein the anonymization means performs anonymization taking into account the “unevenness of division” and the “risk of not satisfying anonymity” in the anonymization.

(Supplementary note 7)
The information processing apparatus according to any one of supplementary notes 2 to 6, wherein
the policy calculation data is data obtained by obscuring part of the data of the provider device, and
the anonymization means anonymizes the policy calculation data based on the amount of the policy calculation data relative to the data of the provider device.

(Supplementary note 8)
The information processing apparatus according to any one of supplementary notes 1 to 7, wherein the anonymized data creating means includes:
generalized data aggregating means for aggregating data that the provider device has generalized based on the policy;
anonymity inspection means for inspecting the anonymity of the aggregated generalized data; and
suppression means for suppressing the aggregated generalized data when the inspection shows that the aggregated generalized data does not satisfy anonymity.

(Supplementary note 9)
The information processing apparatus according to supplementary note 8, wherein the anonymity inspection means instructs the provider device to modify the policy calculation data based on the inspection of the anonymity of the aggregated generalized data.

(Supplementary note 10)
The information processing apparatus according to supplementary note 8 or 9, wherein the suppression means suppresses a quasi-identifier.

(Supplementary note 11)
The information processing apparatus according to supplementary note 5 or 10, wherein the quasi-identifier is numerical data or classification name data.

(Supplementary note 12)
An information processing system including:
an information processing apparatus including
policy calculation means for calculating a policy that a provider device uses for generalizing data, based on policy calculation data, which is data obtained by obscuring the data of the provider device to the extent that the policy can still be calculated, and
anonymized data creating means for creating anonymized data based on data that the provider device has generalized based on the policy; and
the provider device including
data storage means for storing the data of the provider device,
policy calculation data creating means for creating the policy calculation data based on the stored data,
policy storage means for storing the policy calculated by the information processing apparatus, and
generalization means for generalizing the stored data based on the policy.

(Supplementary note 13)
The information processing system according to supplementary note 8, including a plurality of the information processing apparatuses connected in a hierarchical structure.

(Supplementary note 14)
An information anonymization method including:
calculating a policy to be used for generalizing the data of a provider device, based on policy calculation data, which is data obtained by obscuring the data of the provider device to the extent that the policy can still be calculated; and
creating anonymized data based on data that the provider device has generalized based on the policy.

(Supplementary note 15)
A program that causes a computer to execute:
a process of calculating a policy to be used for generalizing the data of a provider device, based on policy calculation data, which is data obtained by obscuring the data of the provider device to the extent that the policy can still be calculated; and
a process of creating anonymized data based on data that the provider device has generalized based on the policy.

DESCRIPTION OF SYMBOLS
10 Information processing system
20 Information processing apparatus
21 Information processing apparatus
30 Provider device
40 User device
50 Network
60 Information processing apparatus
201 Policy determination unit
202 Anonymized data creation unit
203 Policy determination unit
210 Policy calculation data aggregation unit
211 Noise-added data aggregation unit
220 Anonymization unit
230 Policy calculation unit
231 Generalization width calculation unit
240 Generalized data aggregation unit
250 Anonymity inspection unit
260 Suppression unit
310 Data storage unit
320 Noise addition unit
330 Generalization width storage unit
340 Generalization unit
610 CPU
620 ROM
630 RAM
640 Internal storage device
650 IOC
660 Input device
670 Display device
680 NIC
700 Storage medium

Claims

An information processing apparatus comprising:
policy determining means for calculating a policy to be used for generalizing data of a provider device, based on policy calculation data, which is data obtained by obfuscating the data of the provider device to the extent that the policy can still be calculated; and
anonymized data creating means for creating anonymized data based on data that the provider device has generalized according to the policy.

The information processing apparatus according to claim 1, wherein the policy determining means includes:
policy calculation data aggregating means for aggregating the policy calculation data;
anonymization means for anonymizing the aggregated policy calculation data; and
policy calculation means for calculating the policy based on the anonymized data.

The information processing apparatus according to claim 2, wherein the policy calculation data aggregating means aggregates, as the policy calculation data, data obtained by adding predetermined noise to the data of the provider device.

The information processing apparatus according to claim 2, wherein the policy calculation means calculates, as the policy, a generalization width or a generalization boundary for the generalization performed by the provider device.
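
As an illustration of a generalization boundary used as the policy, the following Python sketch derives cut points from obfuscated values so that each resulting interval holds at least `k` of them. The equal-frequency cutting rule, the function names `calculate_boundaries` and `generalize_with_boundaries`, and the parameter `k` are assumptions for this sketch and are not taken from the claim.

```python
from bisect import bisect_right

def calculate_boundaries(policy_data, k):
    """Return sorted cut points such that every interval between consecutive
    cut points contains at least k of the obfuscated values."""
    values = sorted(policy_data)
    boundaries = []
    i = k
    # A cut at values[i] closes a group of at least k values below it;
    # stop once fewer than k values would remain for the last group.
    while len(values) - i >= k:
        if values[i] != values[i - 1]:
            boundaries.append(values[i])
            i += k
        else:
            i += 1  # never split equal values across two intervals
    return boundaries

def generalize_with_boundaries(value, boundaries):
    """Provider side: map a raw value to the half-open interval [lo, hi)
    it falls in; None marks an open-ended side."""
    idx = bisect_right(boundaries, value)
    lo = boundaries[idx - 1] if idx > 0 else None
    hi = boundaries[idx] if idx < len(boundaries) else None
    return (lo, hi)

# Hypothetical obfuscated values received from provider devices.
boundaries = calculate_boundaries([21, 24, 26, 30, 33, 35, 41], k=3)
```

A fixed generalization width is the special case in which the cut points are evenly spaced, whereas boundaries computed this way adapt to where the values are dense.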

The information processing apparatus according to any one of claims 2 to 4, wherein the anonymization means anonymizes a quasi-identifier.

The information processing apparatus according to any one of claims 2 to 5, wherein the anonymization means performs the anonymization in consideration of "unevenness of division" and "risk of not satisfying anonymity".

The information processing apparatus according to any one of claims 2 to 6, wherein
the policy calculation data is data obtained by obfuscating a part of the data of the provider device, and
the anonymization means anonymizes the policy calculation data based on the amount of the policy calculation data relative to the data of the provider device.

The information processing apparatus according to any one of claims 1 to 7, wherein the anonymized data creating means includes:
generalized data aggregating means for aggregating the data that the provider device has generalized according to the policy;
anonymity inspection means for inspecting the anonymity of the aggregated generalized data; and
suppression means for suppressing the aggregated generalized data when the inspection shows that the aggregated generalized data does not satisfy anonymity.
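
The inspection and suppression steps can be sketched in Python as follows. The sketch assumes k-anonymity as the anonymity criterion and masks quasi-identifiers with `"*"`; both choices, as well as the names `check_anonymity` and `suppress` and the sample records, are assumptions for illustration rather than details stated in the claim.

```python
from collections import Counter

def check_anonymity(records, quasi_ids, k):
    """Return the quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return {combo for combo, n in counts.items() if n < k}

def suppress(records, quasi_ids, k, mask="*"):
    """Mask the quasi-identifiers of every record whose combination fails the check;
    other attributes are left unchanged."""
    violating = check_anonymity(records, quasi_ids, k)
    result = []
    for r in records:
        if tuple(r[q] for q in quasi_ids) in violating:
            r = {**r, **{q: mask for q in quasi_ids}}
        result.append(r)
    return result

# Hypothetical aggregated generalized data.
aggregated = [
    {"age": "20-29", "disease": "flu"},
    {"age": "20-29", "disease": "cold"},
    {"age": "30-39", "disease": "flu"},
]
published = suppress(aggregated, quasi_ids=["age"], k=2)
```

Here only the third record is masked. If the masked records themselves number fewer than `k`, a fuller implementation would need a further step, such as removing those records.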

The information processing apparatus according to claim 8, wherein the anonymity inspection means instructs the provider device to modify the policy calculation data based on the anonymity inspection of the aggregated generalized data.

The information processing apparatus according to claim 8 or 9, wherein the suppression means suppresses a quasi-identifier.

The information processing apparatus according to claim 5, wherein the quasi-identifier is numerical data or classification name data.

An information processing system comprising:
an information processing apparatus including policy calculation means for calculating a policy that a provider device uses for generalizing data, based on policy calculation data, which is data obtained by obfuscating the data of the provider device to the extent that the policy can still be calculated, and anonymized data creating means for creating anonymized data based on data that the provider device has generalized according to the policy;
data storage means for storing the data of the provider device;
policy calculation data creating means for creating the policy calculation data based on the stored data;
policy storage means for storing the policy calculated by the information processing apparatus; and
generalization means for generalizing the stored data based on the policy.

The information processing system according to claim 8, comprising a plurality of the information processing devices connected in a hierarchical structure.

An information anonymization method comprising:
calculating a policy to be used for generalizing data of a provider device, based on policy calculation data, which is data obtained by obfuscating the data of the provider device to the extent that the policy can still be calculated; and
creating anonymized data based on data that the provider device has generalized according to the policy.

A program that causes a computer to execute:
a process of calculating a policy to be used for generalizing data of a provider device, based on policy calculation data, which is data obtained by obfuscating the data of the provider device to the extent that the policy can still be calculated; and
a process of creating anonymized data based on data that the provider device has generalized according to the policy.