JP6627328B2

JP6627328B2 - Anonymous processing device and anonymous processing method

Info

Publication number: JP6627328B2
Application number: JP2015164276A
Authority: JP
Inventors: 秀暢小栗
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2015-08-21
Filing date: 2015-08-21
Publication date: 2020-01-08
Anticipated expiration: 2035-08-21
Also published as: JP2017041212A

Description

The present invention relates to an anonymous processing technique for anonymizing or diversifying personal information.

With the development of information processing technology, information is collected in many everyday situations, and processing is performed using the collected information. For example, when a consumer becomes a member of a store and purchases products, the consumer's name, age, gender, address, e-mail address, and the like are often registered at the time of member registration. When the consumer purchases a product, the store-side system records the consumer and the purchased product in association with each other. By accumulating and analyzing such purchase information, the consumer's preferences can be estimated, and a service such as sending direct mail when a new product that the consumer is likely to prefer is released can be provided. Furthermore, by analyzing the information of many consumers, information such as products preferred by women in their twenties or products favored in the Kanto area can be derived and used for marketing and the like.

Such information can be used not only by the store but also by the manufacturers of the products and by other companies, for example for developing new products or improving safety, and may therefore have value.

However, a store cannot provide the personal information of its consumers to others without each consumer's permission. Conventionally, therefore, the information has been used mainly within the store, within the scope permitted by each consumer under the contract made at the time of member registration, and has been managed so that personal information does not leak to third parties. In recent years, however, deriving useful information from such personal information and providing it to others in a state in which individuals cannot be identified has come to be seen as something that may invigorate industry, and providing the consumer information to others after anonymizing it is under consideration.

In this case, it is not enough simply to delete identifying information such as each consumer's name and telephone number; the data must be anonymized so that the original consumer cannot be identified even when the data is combined with other information.

For example, if a member list that records ages contains only one 25-year-old, that person can be identified as soon as someone learns that a 25-year-old acquaintance is a member. In other words, if only one person has the attribute of being a 25-year-old member, there is a high possibility that the individual can be identified by comparison with other information.

Therefore, if the ages in the member list are abstracted into 10-year bands so that multiple people share the same attribute, for example three people in their twenties, it is no longer possible to tell which of the three a given person is. A state in which k or more people have the same attribute is said to satisfy "k-anonymity", and processing data so as to reach that state is called "k-anonymization".
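The k-anonymity condition described above can be sketched in a few lines of Python. This is only an illustration: the function names (`generalize_age`, `min_occurrence`, `is_k_anonymous`) and the sample ages are hypothetical and do not come from the patent.

```python
from collections import Counter


def generalize_age(age: int) -> str:
    """Abstract an exact age into a 10-year band, e.g. 25 -> '20s'."""
    return f"{(age // 10) * 10}s"


def min_occurrence(values, key):
    """Smallest number of records that share one (generalized) attribute value."""
    counts = Counter(key(v) for v in values)
    return min(counts.values())


def is_k_anonymous(values, key, k):
    """True if every attribute value is shared by at least k records."""
    return min_occurrence(values, key) >= k


# Hypothetical member ages, invented for this example.
AGES = [21, 25, 28, 34, 36, 39]

print(is_k_anonymous(AGES, lambda a: a, k=3))     # exact ages: each one is unique
print(is_k_anonymous(AGES, generalize_age, k=3))  # 10-year bands: three per band
```

With exact ages every value occurs once, so the list is not 3-anonymous; after abstraction into 10-year bands each band holds three people and the check passes.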

Various criteria and methods for anonymization have been proposed; for example, l-diversity, Pk-anonymization, and t-closeness are known (see Non-Patent Document 1).

JP 2012-133451 A
JP 2011-108195 A
JP 2011-128862 A
JP 2012-78932 A

Non-Patent Document 1: Hiroshi Nakagawa, "Privacy-Preserving Data Mining", [retrieved May 23, 2014], Internet <URL: http://www.r.dl.itc.u-tokyo.ac.jp/~nakagawa/labintro/2010PPDM-summary.pdf>

K-anonymizing personal information reduces the risk of re-identification and misuse and makes it possible, for example, to provide the information to third parties. However, the processing that produces the anonymous state and the processing that tests it require a large amount of computational resources. Both are performed on combinations of attribute values of the target data. When the number of combinations (the number of sections described later) grows, either because more attributes are combined or because attributes with many kinds of values (many choices) are combined, the amount of calculation increases exponentially, so obtaining detailed data requires enormous computational resources.

Moreover, when many attributes are combined in an attempt to obtain detailed data, the number of occurrences of each combined attribute value drops dramatically and the data cannot be anonymized. If the attribute values are made more abstract so that anonymization becomes possible, the resulting data, even if anonymized, is highly abstract and carries little information.

For this reason, in order to obtain detailed yet anonymizable data with a small amount of computation, processing that checks the anonymous state bottom-up or top-down, such as the OLA method, has been proposed. However, the optimal solution (GOD: Globally Optimal Dataset) may be found near the bottom when computing from the top, or near the top when computing from the bottom, so these methods do not reliably reduce the amount of calculation.

The present invention therefore provides a technique for obtaining detailed and anonymizable data with a small amount of computation.

The anonymous processing device according to the present invention includes:
a storage unit that stores a correspondence between a number of sections and a minimum number of occurrences, where the number of sections is the number of combinations of attribute values that the attributes can take when personal information is anonymized with a combination of a plurality of attributes, and the minimum number of occurrences is the smallest of the numbers of times the combinations of attribute values occur when the personal information is anonymized with that combination of attributes;
a prediction unit that, when the minimum number of occurrences obtained by anonymizing the personal information to be processed is to be equal to or greater than a target value, takes as a predicted value the number of sections at which the minimum number of occurrences equals the target value, based on the correspondence between the number of sections and the minimum number of occurrences;
an anonymization unit that anonymizes the personal information to be processed with a combination of a plurality of attributes; and
a test unit that tests whether the minimum number of occurrences obtained by the anonymization is equal to or greater than the target value,
wherein the anonymization unit ranks the combinations of attributes based on the number of sections obtained when anonymization is performed with each combination, performs anonymization with the combination of attributes whose rank has the number of sections corresponding to the predicted value, and, when the test finds that the minimum number of occurrences is not equal to or greater than the target value, performs the anonymization and the test on further combinations, following the ranking, in order of closeness of their number of sections to the number of sections corresponding to the predicted value.
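The ranking and search order described for the anonymization unit and the test unit can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the predicted number of sections is passed in directly (the stored correspondence is not modeled), the search stops at the first combination that passes the test, and the names `min_occurrence`, `search`, `RECORDS`, and `SECTIONS` as well as the sample records are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple

Record = Dict[str, str]
Combo = Tuple[str, ...]  # e.g. ("A", "B1"): one generalization level per attribute


def min_occurrence(records: Sequence[Record], combo: Combo) -> int:
    """Smallest number of records sharing one combination of attribute values."""
    counts = Counter(tuple(r[name] for name in combo) for r in records)
    return min(counts.values())


def search(records: Sequence[Record], sections: Dict[Combo, int],
           predicted: float, k: int) -> Optional[Combo]:
    """Try combinations in order of closeness to the predicted number of sections."""
    # Rank the combinations by their number of sections.
    ranked = sorted(sections, key=sections.get)
    # Visit them starting from the one closest to the prediction
    # (the sort is stable, so ties keep the ranking order).
    order = sorted(ranked, key=lambda c: abs(sections[c] - predicted))
    for combo in order:
        if min_occurrence(records, combo) >= k:  # the test step
            return combo
    return None  # no combination reaches the target value


# Hypothetical records, already generalized to each level.
RECORDS = [
    {"A": "M", "B1": "1-50", "B2": "1-10"},
    {"A": "M", "B1": "1-50", "B2": "11-20"},
    {"A": "F", "B1": "1-50", "B2": "1-10"},
    {"A": "F", "B1": "1-50", "B2": "1-10"},
]
SECTIONS = {("A",): 2, ("A", "B1"): 8, ("A", "B2"): 40}

print(search(RECORDS, SECTIONS, predicted=10, k=2))
print(search(RECORDS, SECTIONS, predicted=40, k=2))
```

In the second call the closest candidate, ("A", "B2"), fails the test because one value combination occurs only once, so the search moves on to the next closest candidate.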

When the minimum number of occurrences obtained by the anonymization is equal to or greater than the target value, the anonymous processing device may update the predicted value with the number of sections used in that anonymization and perform the anonymization using the updated predicted value.

In the anonymous processing device, a plurality of selectable orders may be set for searching for a combination of attributes that satisfies anonymity, each order determining, according to the ranking, the combination of attributes to be anonymized next when the test finds that the minimum number of occurrences obtained by the anonymization is not equal to or greater than the target value.

In the anonymous processing method according to the present invention, a computer executes:
a step of referring to a storage unit that stores a correspondence between a number of sections and a minimum number of occurrences, where the number of sections is the number of combinations of attribute values that the attributes can take when personal information is anonymized with a combination of a plurality of attributes, and the minimum number of occurrences is the smallest of the numbers of times the combinations of attribute values occur when the personal information is anonymized with that combination of attributes;
a step of taking as a predicted value, when the minimum number of occurrences obtained by anonymizing the personal information to be processed is to be equal to or greater than a target value, the number of sections at which the minimum number of occurrences equals the target value, based on the correspondence between the number of sections and the minimum number of occurrences;
a step of anonymizing the personal information to be processed with a combination of a plurality of attributes; and
a step of testing whether the minimum number of occurrences obtained by the anonymization is equal to or greater than the target value,
wherein, in the anonymizing step, the combinations of attributes are ranked based on the number of sections obtained when anonymization is performed with each combination, and anonymization is performed with the combination of attributes whose rank has the number of sections corresponding to the predicted value, and
when the testing step finds that the minimum number of occurrences is not equal to or greater than the target value, the anonymization and the test are performed on further combinations, following the ranking, in order of closeness of their number of sections to the number of sections corresponding to the predicted value.

In the anonymous processing method, when the minimum number of occurrences obtained by the anonymization is equal to or greater than the target value, the predicted value may be updated with the number of sections used in that anonymization, and the anonymization may be performed using the updated predicted value.

In the anonymous processing method, a plurality of selectable orders may be set for searching for a combination of attributes that satisfies anonymity, each order determining, according to the ranking, the combination of attributes to be anonymized next when the test finds that the minimum number of occurrences obtained by the anonymization is not equal to or greater than the target value.

The present invention may also be a program for causing a computer to execute the above method. The program may be recorded on a computer-readable recording medium.

Here, a computer-readable recording medium is a recording medium that stores information such as data and programs by electrical, magnetic, optical, mechanical, or chemical action and from which a computer can read that information. Among such recording media, those that are removable from the computer include, for example, flexible disks, magneto-optical disks, CD-ROMs, CD-R/Ws, DVDs, DATs, 8 mm tapes, and memory cards.

Recording media fixed to the computer include hard disks and ROM (read-only memory).

The present invention can provide a technique for obtaining detailed and anonymizable data with a small amount of computation.

FIG. 1 is an explanatory diagram of the anonymization process.
FIG. 2 is an explanatory diagram of the diversification process.
FIG. 3 is a schematic configuration diagram of the anonymization system in the embodiment.
FIG. 4 is a diagram showing an example of personal information.
FIG. 5 is a diagram showing the value generalization hierarchy (VGH) of attribute B (dosage) shown in FIG. 4.
FIG. 6 is a diagram showing the value generalization hierarchy of attribute C (period until complete recovery) shown in FIG. 4.
FIG. 7 is a diagram showing the attribute generalization hierarchy (DGH: domain generalization hierarchy) of attribute B and attribute C.
FIG. 8 is a diagram showing the value generalization hierarchy and the attribute generalization hierarchy of a region (attribute D).
FIG. 9 is a diagram showing the number of sections when attributes are combined.
FIG. 10 is a diagram showing the hardware configuration of the anonymous processing device and the approximate function calculation device.
FIG. 11 is an explanatory diagram of the approximate function calculation method.
FIG. 12 is an explanatory diagram of the approximate function acquisition method.
FIG. 13 is a diagram illustrating anonymization processing based on a predicted GOD value.
FIG. 14 is a diagram illustrating anonymization processing based on a predicted GOD value.
FIG. 15 is a diagram showing an example of a value generalization hierarchy.
FIG. 16 is a diagram showing an example in which anonymization processing is performed repeatedly in succession.
FIG. 17 is a diagram showing a modification of the anonymization process.
FIG. 18 is a diagram showing examples of the approximation function.

Hereinafter, embodiments for carrying out the present invention will be described with reference to the drawings. The configurations of the following embodiments are examples, and the present invention is not limited to them.

<Embodiment 1>
FIG. 1 is an explanatory diagram of the anonymization process, and FIG. 2 is an explanatory diagram of the diversification process. FIG. 1(A) shows an example in which the family-name item has been deleted from member information containing family name, age, and gender items. As shown in FIG. 1(A), if the member information, which records ages, contains only one 16-year-old woman, that person can be identified as soon as it becomes known that a certain 16-year-old woman is a member. In other words, if only one person has the attributes "16 years old" and "female", the individual may be identified by comparison with other information.

In FIG. 1(B), the ages in the member list are abstracted into age bands such as under 10, teens, and twenties. Even in this case, however, there is only one woman in her teens, so the individual can be identified just as in FIG. 1(A), and the anonymization is insufficient.

Therefore, in FIG. 1(C), the data is abstracted further by changing the age bands to "teens or younger" (19 or younger) and "twenties". In FIG. 1(C), there are two women in their teens or younger, so the attribute combination [teens or younger] and [female] is no longer unique. Consequently, even if it becomes known that a certain 16-year-old woman is a member, it is not possible to tell which of the two records belongs to her. A state in which k or more people have the same attributes is said to satisfy "k-anonymity", and processing data so as to reach that state is called "k-anonymization".

FIG. 2 shows an example in which the data on the stations used by each user is abstracted into data on the wards to which those stations belong. In the data before abstraction the stations are specified, so a user might be identified by comparison with information such as a residence near Shinjuku Station and a workplace near Tokyo Station. By abstracting each station to the ward it belongs to, multiple users come to use a station in Shinjuku Ward and a station in Chiyoda Ward, and an individual user can no longer be identified. Abstracting data in this way, so that an attribute value such as "uses a station in Shinjuku Ward and a station in Chiyoda Ward" covers l possible values, is called l-diversification.

FIG. 3 is a schematic configuration diagram of the anonymization system 10 in the present embodiment. As shown in FIG. 3, the anonymization system 10 includes an anonymous processing device 1 and an approximate function calculation device 2.

The anonymous processing device 1 includes a data receiving unit 11, a section number acquisition unit 12, a coefficient acquisition unit 13, an anonymization unit 15, a test unit 16, a data output unit 18, a prediction unit 19, an anonymous result DB (database) 31, and an anonymous information column DB 32.

The data receiving unit 11 receives target data (personal information) containing a plurality of items (attributes) associated with individuals, anonymization conditions, commands relating to anonymization, and the like. The personal information, anonymization conditions, and so on may be received via a network such as the Internet, read from a storage medium, or entered through input means such as a keyboard. FIG. 4 is a diagram showing an example of personal information. In the example shown in FIG. 4, each row (tuple) records attributes of one individual, such as gender, dosage, and period until complete recovery, in association with one another.

The section number acquisition unit 12 acquires the number of sections used when the personal information to be anonymized is anonymized. The number of sections is the number of kinds of attribute values (words) that the attributes included in the anonymous information can take; in other words, it is the number of sections obtained when the values of the attributes constituting the anonymous information are grouped by identical value. The section number acquisition unit 12 may also acquire the number of sections of generalized (abstracted) attribute values.

FIG. 5 is a diagram showing the value generalization hierarchy (VGH) of attribute B (dosage) shown in FIG. 4. Attribute B is data indicating the dose administered, in the range of 1 ml to 200 ml. When it is divided into 1 ml steps as in B3 of FIG. 5, the possible attribute values are 1 ml, 2 ml, 3 ml, 4 ml, ..., 200 ml, that is, 200 kinds, so the number of sections is 200. When attribute B is generalized (abstracted) into 10 ml steps as in B2 of FIG. 5, the possible attribute values are 1 ml to 10 ml, 11 ml to 20 ml, 21 ml to 30 ml, ..., 191 ml to 200 ml, that is, 20 kinds, so the number of sections is 20. When attribute B is generalized further into 50 ml steps as in B1, the possible attribute values are 1 ml to 50 ml, 51 ml to 100 ml, 101 ml to 150 ml, and 151 ml to 200 ml, that is, 4 kinds, so the number of sections is 4.

In FIG. 5, the B3 attribute values 1 ml, 2 ml, 3 ml, 4 ml, ..., 10 ml generalize to the B2 attribute value 1 ml to 10 ml, and the B2 attribute values 1 ml to 10 ml, 11 ml to 20 ml, 21 ml to 30 ml, 31 ml to 40 ml, and 41 ml to 50 ml generalize to the B1 attribute value 1 ml to 50 ml. That is, the B2 attribute values are generalizations of the B3 attribute values, and the B1 attribute values are generalizations of the B2 attribute values, so the values are generalized hierarchically. A set of attribute values divided by the same criterion and therefore having the same degree of generalization, such as B3 (1 ml steps), forms one level. The set of attribute values divided by the B2 criterion (10 ml steps) forms level B2, which is more general than level B3, and the set divided by the B1 criterion (50 ml steps) forms level B1, which is more general than level B2. With the value generalization hierarchy set in this way, 7 ml at level B3 generalizes to 1 ml to 10 ml at level B2, and 68 ml at level B3 generalizes to 61 ml to 70 ml at level B2. Likewise, 1 ml to 10 ml at level B2 generalizes to 1 ml to 50 ml at level B1, and 61 ml to 70 ml at level B2 generalizes to 51 ml to 100 ml at level B1.
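The value generalization of attribute B can be expressed as a small bucketing function. The sketch below assumes integer dosages in the 1 ml to 200 ml range stated above; the names `LEVEL_WIDTH`, `generalize_dosage`, and `section_count` are hypothetical and chosen only for this example.

```python
# Step width in ml for each level of attribute B, as described for FIG. 5.
LEVEL_WIDTH = {"B3": 1, "B2": 10, "B1": 50}


def generalize_dosage(ml: int, level: str) -> str:
    """Map a dosage in 1..200 ml to its attribute value at the given level."""
    if not 1 <= ml <= 200:
        raise ValueError("dosage outside the 1-200 ml range")
    width = LEVEL_WIDTH[level]
    if width == 1:
        return f"{ml} ml"
    low = ((ml - 1) // width) * width + 1  # lower bound of the bucket
    return f"{low}-{low + width - 1} ml"


def section_count(level: str) -> int:
    """Number of sections (distinct attribute values) at the given level."""
    return 200 // LEVEL_WIDTH[level]


print(generalize_dosage(7, "B2"))    # 1-10 ml
print(generalize_dosage(68, "B2"))   # 61-70 ml
print(generalize_dosage(68, "B1"))   # 51-100 ml
print([section_count(lv) for lv in ("B3", "B2", "B1")])  # [200, 20, 4]
```

Subtracting 1 before the integer division keeps boundary values such as 10 ml and 50 ml in the lower bucket, which matches the ranges given for B2 and B1.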

FIG. 6 is a diagram showing the value generalization hierarchy of attribute C (period until complete recovery) shown in FIG. 4. Attribute C is data indicating the period until complete recovery, in the range of 1 week to 12 weeks. When it is divided into weekly values as in C2 of FIG. 6, the possible attribute values are 1 week, 2 weeks, 3 weeks, 4 weeks, ..., 12 weeks, that is, 12 kinds, so the number of sections is 12. When attribute C is generalized into monthly values as in C1 of FIG. 6, the possible attribute values are "less than 1 month", "1 month to less than 2 months", and "2 months to less than 3 months", that is, 3 kinds, so the number of sections is 3. In FIG. 6, the C2 attribute values 1 week, 2 weeks, 3 weeks, and 4 weeks generalize to the C1 attribute value "less than 1 month", and the C2 attribute values 9 weeks, 10 weeks, 11 weeks, and 12 weeks generalize to the C1 attribute value "2 months to less than 3 months". That is, the C1 attribute values are generalizations of the C2 attribute values, so the values are generalized hierarchically. A set of attribute values divided by the same criterion and therefore having the same degree of generalization, such as C2 (weekly), forms one level, and the set of attribute values divided by the C1 criterion (monthly) forms level C1, which is more general than level C2. With the value generalization hierarchy set in this way, 3 weeks at level C2 generalizes to "less than 1 month" at level C1, and 10 weeks at level C2 generalizes to "2 months to less than 3 months" at level C1.
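The week-to-month generalization of attribute C can be sketched in the same way. The text gives the mapping for weeks 1 to 4 and weeks 9 to 12 explicitly; placing weeks 5 to 8 in "1 month to less than 2 months" is an inference made for this example, and the names `C1_VALUES` and `generalize_weeks` are hypothetical.

```python
# The three C1 attribute values (monthly sections).
C1_VALUES = [
    "less than 1 month",
    "1 month to less than 2 months",
    "2 months to less than 3 months",
]


def generalize_weeks(week: int) -> str:
    """Map a C2 value (1..12 weeks) to its C1 value, four weeks per section."""
    if not 1 <= week <= 12:
        raise ValueError("period outside the 1-12 week range")
    return C1_VALUES[(week - 1) // 4]


print(generalize_weeks(3))   # less than 1 month
print(generalize_weeks(10))  # 2 months to less than 3 months
```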

FIG. 7 is a diagram showing the attribute generalization hierarchy (DGH: domain generalization hierarchy) of attribute B and attribute C. As shown in FIG. 7, the hierarchical structure of attribute B is level B3 → level B2 → level B1 → root. Level B3 contains the attribute values {1 ml, 2 ml, 3 ml, 4 ml, ..., 200 ml}, level B2 contains the attribute values {1 ml to 10 ml, 11 ml to 20 ml, 21 ml to 30 ml, ..., 191 ml to 200 ml}, and level B1 contains the attribute values {1 ml to 50 ml, 51 ml to 100 ml, 101 ml to 150 ml, 151 ml to 200 ml}. The hierarchical structure of attribute C is level C2 → level C1 → root. Level C2 contains the attribute values {1 week, 2 weeks, 3 weeks, 4 weeks, ..., 12 weeks}, and level C1 contains the attribute values {less than 1 month, 1 month to less than 2 months, 2 months to less than 3 months}. In this way, the attribute generalization hierarchy defines the hierarchical structure of each attribute and the attribute values contained in each level.
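One possible in-memory representation of the attribute generalization hierarchy is an ordered chain of level names per attribute, from the most detailed level to the root. This is only a sketch of a data structure; the patent does not prescribe one, and the names `DGH` and `next_level` are hypothetical.

```python
from typing import Optional

# Levels of each attribute, ordered from most detailed to most general.
DGH = {
    "B": ["B3", "B2", "B1", "root"],
    "C": ["C2", "C1", "root"],
}


def next_level(attribute: str, level: str) -> Optional[str]:
    """Return the next more general level, or None if already at the root."""
    chain = DGH[attribute]
    i = chain.index(level)
    return chain[i + 1] if i + 1 < len(chain) else None


print(next_level("B", "B3"))    # B2
print(next_level("C", "C1"))    # root
print(next_level("C", "root"))  # None
```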

Attributes are not limited to numerical values. FIG. 8 is a diagram showing the value generalization hierarchy and the attribute generalization hierarchy of a region (attribute D). In FIG. 8, level D3 represents the 47 prefectures, level D2 represents 14 regional divisions, and level D1 represents 8 regional divisions.

In the present embodiment, the level with the highest degree of generalization (the most abstract level) is called the first level. Attributes at the first level are written with the digit 1, indicating the level, appended to the attribute symbol, as in B1, C1, and D1; attributes at the second level are written B2, C2, and D2; and attributes at the third level are written B3, C3, and D3.

The value generalization hierarchy and the attribute generalization hierarchy of each attribute are, for example, determined in advance and stored in a storage device. When an attribute is numerical, the anonymous processing device 1 may create the value generalization hierarchy and the attribute generalization hierarchy by dividing the input target data or the range of the attribute at predetermined values.

また、属性は、一般化できるものに限定されず、一般化せずに用いる属性を有しても良い。例えば、属性Ａ（性別）は、男，女の二区分としている。なお、階層状に一般化していない属性は、最も一般化の程度が高い階層（第一階層）として扱う。 Further, the attributes are not limited to those that can be generalized, and may have attributes used without generalization. For example, the attribute A (sex) is classified into two categories, male and female. It should be noted that an attribute that is not generalized in a hierarchical manner is handled as a hierarchy having the highest generalization (first hierarchy).

図９は、これらの属性を組み合わせた場合の区分数を示す図である。図９に示すように、属性を組み合わせた場合の区分数は、各属性の区分数を乗じた値となり、属性Ａと属性Ｂ１を組み合わせた場合の区分数は８区分、属性Ａと属性Ｂ２を組み合わせた場合の区分数は４０区分、属性Ａと属性Ｃ１を組み合わせた場合の区分数は６区分、属性Ａと属性Ｃ２を組み合わせた場合の区分数は２４区分である。 FIG. 9 is a diagram showing the number of sections when these attributes are combined. As shown in FIG. 9, the number of sections for a combination of attributes is the product of the numbers of sections of the individual attributes: combining attribute A with attribute B1 gives 8 sections, attribute A with attribute B2 gives 40 sections, attribute A with attribute C1 gives 6 sections, and attribute A with attribute C2 gives 24 sections.

同様に、属性Ｂ１と属性Ｃ１であれば１２区分、属性Ｂ１と属性Ｃ２であれば４８区分、属性Ｂ２と属性Ｃ１であれば６０区分、属性Ｂ２と属性Ｃ２であれば２４０区分である。 Similarly, attribute B1 with attribute C1 gives 12 sections, attribute B1 with attribute C2 gives 48 sections, attribute B2 with attribute C1 gives 60 sections, and attribute B2 with attribute C2 gives 240 sections.

更に、属性Ａと属性Ｂ１と属性Ｃ１であれば２４区分、属性Ａと属性Ｂ１と属性Ｃ２であれば９６区分、属性Ａと属性Ｂ２と属性Ｃ１であれば１２０区分、属性Ａと属性Ｂ２と属性Ｃ２であれば４８０区分である。 Further, attributes A, B1, and C1 together give 24 sections; attributes A, B1, and C2 give 96 sections; attributes A, B2, and C1 give 120 sections; and attributes A, B2, and C2 give 480 sections.
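
The product rule for the number of sections can be checked with a short sketch. The layer sizes below are read off the example hierarchies in the text (2 values for attribute A, 4 and 20 ranges for B1 and B2, 3 and 12 values for C1 and C2); the names `LAYER_SIZES` and `section_count` are illustrative assumptions.

```python
from math import prod

# Number of attribute values per layer, taken from the examples in the text:
# A (sex) = 2, B1 = 4 ranges, B2 = 20 ranges, C1 = 3 ranges, C2 = 12 weeks.
LAYER_SIZES = {"A": 2, "B1": 4, "B2": 20, "C1": 3, "C2": 12}


def section_count(layers):
    """Multiply the sizes of the chosen layers (one layer per attribute)."""
    return prod(LAYER_SIZES[name] for name in layers)


for combo in [("A", "B1"), ("A", "B2"), ("A", "C1"), ("A", "C2"),
              ("A", "B1", "C1"), ("A", "B2", "C2")]:
    print(combo, section_count(combo))
```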

係数取得部１３は、近似関数算出装置２によって算出された近似関数を取得する。近似関数は、例えば、対象データを匿名化する際、区分数を増加させた場合の最少出現数の減少数又は前記減少数の全体数に対する割合である。 The coefficient acquisition unit 13 acquires the approximate function calculated by the approximate function calculation device 2. The approximate function expresses, for example, the amount by which the minimum number of occurrences decreases when the number of sections is increased in anonymizing the target data, or the ratio of that decrease to the total number.

匿名化部１５は、対象データを匿名化或いは多様化する際に、対象データ中の属性の値（属性値）であるワード（語）を前記値一般化階層に基づいて一般化したワードに替えることで匿名化を行い、対象データを匿名候補データとする。本実施形態においてワード（語）は、単語や句など、一まとまりの言葉であり、年齢や投薬量、完治までの期間、位置情報、電話番号等の数値、メールアドレスやＩＰアドレス等の識別情報、言葉と同様の意味を持つ記号等を含んでも良い。 When anonymizing or diversifying the target data, the anonymizing unit 15 replaces a word (word) that is an attribute value (attribute value) in the target data with a word generalized based on the value generalization hierarchy. In this way, anonymization is performed, and the target data is set as anonymous candidate data. In the present embodiment, a word (word) is a group of words such as a word or a phrase, and includes age, dosage, period until complete recovery, location information, numerical values such as telephone numbers, and identification information such as a mail address and an IP address. And symbols having the same meaning as words.

検定部１６は、匿名候補データの一個人と対応する属性値の組み合わせが、当該匿名候補データ中で基準数以上存在すること、少なくとも単一でないことを条件として検定する。換言すると、匿名候補データにおいて、一タプルの属性値の組み合わせのうち、一致する組み合わせの数（出現数）であって、最も少ないもの（最少出現数）が基準値以上であることを条件として検定する。即ち、ｋ値（最少出現数）が基準値以上であること、またはｌ値が基準値以上であることを条件とした場合、検定部１６は、匿名候補データがｋ−匿名性を満たしているかや、ｌ−多様性を満たしているかを検定する。検定部１６は、匿名候補データにおいて、一タプルの属性値の組み合わせが例えば（男，３５ｍｌ，２月）であった場合、この属性値の組み合わせ（男，３５ｍｌ，２月）と一致するものの数（出現数）をカウントし、当該匿名候補データの全ての組み合わせの中で、最も少ない出現数（最少出現数）が基準値以上であることを条件として検定する。検定部１６は、この検定の結果、匿名性を満たした匿名候補データを匿名情報として匿名結果ＤＢ３１に記憶させる。 The test unit 16 tests the anonymous candidate data on the condition that each combination of attribute values corresponding to one individual appears at least a reference number of times in the anonymous candidate data, or at minimum is not unique. In other words, it counts how many tuples in the anonymous candidate data share each combination of attribute values (the number of occurrences) and tests whether the smallest of these counts (the minimum number of occurrences) is equal to or greater than the reference value. That is, when the condition is that the k value (minimum number of occurrences) or the l value is equal to or greater than the reference value, the test unit 16 tests whether the anonymous candidate data satisfies k-anonymity or l-diversity. For example, when one tuple in the anonymous candidate data has the attribute-value combination (male, 35 ml, 2 months), the test unit 16 counts the tuples that match this combination and tests whether the smallest number of occurrences among all combinations in the anonymous candidate data is equal to or greater than the reference value. As a result of this test, the test unit 16 stores anonymous candidate data that satisfies anonymity in the anonymous result DB 31 as anonymous information.
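
The k-anonymity part of this test can be sketched as follows. It is a simplified illustration only: the function names are assumptions, each tuple is represented as a Python tuple of already generalized attribute values, and the l-diversity check is omitted.

```python
from collections import Counter


def minimum_occurrences(tuples):
    """Return the smallest count among identical attribute-value combinations."""
    counts = Counter(tuples)
    return min(counts.values()) if counts else 0


def satisfies_k_anonymity(tuples, reference_value):
    """True when every combination appears at least `reference_value` times."""
    return minimum_occurrences(tuples) >= reference_value


# Illustrative anonymous candidate data (values are made up for the example).
candidate = [
    ("male", "31 ml to 40 ml", "2 months"),
    ("male", "31 ml to 40 ml", "2 months"),
    ("female", "1 ml to 10 ml", "1 month"),
    ("female", "1 ml to 10 ml", "1 month"),
    ("female", "1 ml to 10 ml", "1 month"),
]
```

Here the rarest combination appears twice, so the candidate data passes the test for a reference value of 2 and fails it for 3.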

データ出力部１８は、匿名結果ＤＢ３１から匿名化情報を読み出して出力する。ここで、匿名化情報の出力とは、例えば、表示装置による表示出力や、プリンタによる印刷出力、他のコンピュータへの送信、記憶媒体への書き込み等である。 The data output unit 18 reads and outputs anonymized information from the anonymous result DB 31. Here, the output of the anonymized information is, for example, display output by a display device, print output by a printer, transmission to another computer, writing to a storage medium, and the like.

予測部１９は、処理対象の個人情報を匿名化した場合の最少出現数を目標値以上とする場合に、前記区分数と前記最少出現数の対応関係に基づき、前記最少出現数が前記目標値となる前記区分数を予測値とする。 When the minimum number of occurrences obtained by anonymizing the personal information to be processed is to be made equal to or greater than a target value, the prediction unit 19 uses the correspondence between the number of sections and the minimum number of occurrences to take, as the predicted value, the number of sections at which the minimum number of occurrences equals the target value.

また、近似関数算出装置２は、匿名情報取得部２１や、出現数取得部２２、係数算出部２３、頻出パターンＤＢ３３、近似関数ＤＢ３４を備えている。 The approximate function calculation device 2 includes an anonymous information acquisition unit 21, an appearance number acquisition unit 22, a coefficient calculation unit 23, a frequent pattern DB 33, and an approximate function DB 34.

匿名情報取得部２１は、個人情報を匿名化した匿名結果ＤＢ３１から匿名情報を取得する。また、出現数取得部２２は、匿名情報を構成する属性を語（属性値）毎に区分して区分数を求め、各区分における語の最少出現数を求める。 The anonymous information acquisition unit 21 acquires anonymous information from the anonymous result DB 31 obtained by anonymizing personal information. Further, the appearance number acquisition unit 22 obtains the number of divisions by dividing the attributes constituting the anonymous information for each word (attribute value), and obtains the minimum number of occurrences of the word in each division.

係数算出部２３は、区分数の異なる複数の前記区分数及び前記最少出現数の組み合わせに基づいて、前記区分数を増加させた場合の最少出現数の減少数又は前記減少数の全体数に対する割合を近似関数として求め、近似関数ＤＢ３４に記憶する。 Based on a plurality of pairs of the number of sections and the minimum number of occurrences that differ in the number of sections, the coefficient calculation unit 23 obtains, as an approximate function, the decrease in the minimum number of occurrences when the number of sections is increased, or the ratio of that decrease to the total number, and stores the approximate function in the approximate function DB 34.

図１０は匿名処理装置１及び近似関数算出装置２のハードウェア構成を示す図である。匿名処理装置１及び近似関数算出装置２は、ＣＰＵ１０１、メモリ１０２、通信制御部１０３、記憶装置１０４、入出力インタフェース１０５を有する所謂コンピュータである。 FIG. 10 is a diagram illustrating the hardware configuration of the anonymous processing device 1 and the approximate function calculation device 2. The anonymous processing device 1 and the approximate function calculation device 2 are so-called computers, each having a CPU 101, a memory 102, a communication control unit 103, a storage device 104, and an input/output interface 105.

ＣＰＵ１０１は、メモリ１０２に実行可能に展開されたプログラムを実行する。これにより、匿名処理装置１のＣＰＵ１０１は、前述のデータ受付部１１や、区分数取得部１２、係数取得部１３、匿名化部１５、検定部１６、データ出力部１８、予測部１９の機能を提供する。また、近似関数算出装置２のＣＰＵ１０１は、前述の匿名情報取得部２１や、出現数取得部２２、係数算出部２３の機能を提供する。 The CPU 101 executes a program loaded in executable form into the memory 102. The CPU 101 of the anonymous processing device 1 thereby provides the functions of the data receiving unit 11, the number-of-sections acquisition unit 12, the coefficient acquisition unit 13, the anonymizing unit 15, the test unit 16, the data output unit 18, and the prediction unit 19 described above. Likewise, the CPU 101 of the approximate function calculation device 2 provides the functions of the anonymous information acquisition unit 21, the appearance number acquisition unit 22, and the coefficient calculation unit 23 described above.

メモリ１０２は、主記憶装置ということもできる。メモリ１０２は、例えば、ＣＰＵ１０１が実行するプログラムや、通信制御部１０３を介して受信したデータ、記憶装置１０４から読み出したデータ、その他のデータ等を記憶する。 The memory 102 can also be called a main storage device. The memory 102 stores, for example, programs executed by the CPU 101, data received via the communication control unit 103, data read from the storage device 104, and other data.

通信制御部１０３は、ネットワークを介して他の装置と接続し、当該装置との通信を制御する。入出力インタフェース１０５は、表示装置やプリンタ等の出力手段や、キーボードやポインティングデバイス等の入力手段、ドライブ装置等の入出力手段が適宜接続される。ドライブ装置は、着脱可能な記憶媒体の読み書き装置であり、例えば、フラッシュメモリカードの入出力装置、ＵＳＢメモリを接続するＵＳＢのアダプタ等である。また、着脱可能な記憶媒体は、例えば、ＣＤ（Compact Disc）、ＤＶＤ（Digital Versatile Disk）等のディスク媒体であってもよい。ドライブ装置は、着脱可能な記憶媒体からプログラムを読み出し、記憶装置１０４に格納する。 The communication control unit 103 connects to another device via a network and controls communication with the device. The input / output interface 105 is appropriately connected to output means such as a display device and a printer, input means such as a keyboard and a pointing device, and input / output means such as a drive device. The drive device is a removable storage medium read / write device, such as a flash memory card input / output device, a USB adapter for connecting a USB memory, or the like. Further, the removable storage medium may be a disk medium such as a CD (Compact Disc) and a DVD (Digital Versatile Disk). The drive device reads the program from the removable storage medium and stores the program in the storage device 104.

記憶装置１０４は、外部記憶装置ということもできる。記憶装置１０４としては、ＳＳＤ（Solid State Drive）やＨＤＤ等であってもよい。記憶装置１０４は、ドライブ装置との間で、データを授受する。例えば、記憶装置１０４は、ドライブ装置からインストールされる情報処理プログラム等を記憶する。また、記憶装置１０４は、プログラムを読み出し、メモリ１０２に引き渡す。本実施形態では、匿名処理装置１の記憶装置１０４が前述の匿名結果ＤＢ３１を格納している。また、近似関数算出装置２の記憶装置１０４が、頻出パターンＤＢ３３、近似関数ＤＢ３４を格納している。 The storage device 104 can also be referred to as an external storage device. The storage device 104 may be an SSD (Solid State Drive), an HDD, or the like. The storage device 104 exchanges data with the drive device. For example, the storage device 104 stores an information processing program and the like installed from the drive device. The storage device 104 also reads the program and passes it to the memory 102. In the present embodiment, the storage device 104 of the anonymous processing device 1 stores the anonymous result DB 31 described above, and the storage device 104 of the approximate function calculation device 2 stores the frequent pattern DB 33 and the approximate function DB 34.

次に本実施形態における匿名化システム１０の近似関数算出装置２がプログラムに従って実行する近似関数算出方法について説明する。図１１は、近似関数算出方法の説明図である。近似関数算出装置２は、先ず他のコンピュータ或いは記憶装置から最も抽象的な値一般化階層と処理回数ｎを取得する（ステップＳ１０）。なお、処理回数ｎは、サンプルの取得数、即ちサンプルを取得する処理の繰り返し数を示す所定値である。 Next, an approximate function calculation method executed by the approximate function calculation device 2 of the anonymization system 10 according to the present embodiment according to a program will be described. FIG. 11 is an explanatory diagram of an approximate function calculation method. The approximate function calculation device 2 first obtains the most abstract value generalized hierarchy and the number of processings n from another computer or storage device (step S10). Note that the processing count n is a predetermined value indicating the number of samples obtained, that is, the number of repetitions of the processing for obtaining the samples.

次に匿名化処理ＤＢから本実施形態の近似関数算出装置２は、対象データを取得し(ステップＳ２０)、当該対象データの区分数をカウントして（ステップＳ３０)、匿名化処理ＤＢに記憶させる(ステップＳ４０)。そして、近似関数算出装置２は、処理回数ｎが１に達したか否かを判定し(ステップＳ５０)、処理回数ｎが１に達していなければ(ステップＳ５０，Ｎｏ)、処理回数ｎをデクリメントして(ステップＳ６０)、ステップＳ１０に戻り、ステップＳ１０〜Ｓ５０の処理を繰り返す。このステップＳ１０〜Ｓ５０の処理を所定回数繰り返し、ステップＳ５０で処理回数ｎが１と判定された場合(ステップＳ５０、Ｙｅｓ)、処理回数ｎを初期値に戻し(ステップＳ７０)、ステップＳ９０に移行する。なお、ステップＳ９０〜Ｓ１５０の処理は、値一般化階層の出現数の調査を行うものである。 Next, the approximate function calculation device 2 of the present embodiment acquires the target data from the anonymization processing DB (step S20), counts the number of sections of the target data (step S30), and stores the count in the anonymization processing DB (step S40). The approximate function calculation device 2 then determines whether the processing count n has reached 1 (step S50). If n has not reached 1 (step S50, No), the device decrements n (step S60), returns to step S10, and repeats steps S10 to S50. After steps S10 to S50 have been repeated the predetermined number of times and n is determined to be 1 in step S50 (step S50, Yes), the device resets n to its initial value (step S70) and proceeds to step S90. Steps S90 to S150 examine the number of occurrences for each value generalization hierarchy.

近似関数算出装置２は、まだ出現数を求めていない値一般化階層のうち、最も抽象的な値一般化階層を取得し (ステップＳ９０)、個人情報（対象情報）を取得する(ステップＳ１００)。個人情報の各属性値をステップＳ９０で取得した値一般化階層に合わせて一般化し、各属性値の組み合わせについて出現数をカウントし(ステップＳ１１０)、このうち最小の出現数（ｋ値）を求めて(ステップＳ１２０)、匿名化処理ＤＢに格納する(ステップＳ１３０)。そして、近似関数算出装置２は、処理回数ｎが１に達したか否かを判定し(ステップＳ１４０)、処理回数ｎが１に達していなければ(ステップＳ１４０，Ｎｏ)、処理回数ｎをデクリメントして(ステップＳ１５０)、ステップＳ９０に戻り、ステップＳ９０〜Ｓ１５０の処理を繰り返す。このステップＳ９０〜Ｓ１５０の処理を所定回数繰り返し、ステップＳ１４０で処理回数ｎが１と判定された場合(ステップＳ１４０、Ｙｅｓ)、図１１の処理を終了する。 The approximate function calculation device 2 acquires the most abstract value generalization hierarchy among those for which the number of occurrences has not yet been obtained (step S90) and acquires the personal information (target information) (step S100). It generalizes each attribute value of the personal information according to the value generalization hierarchy acquired in step S90, counts the number of occurrences of each combination of attribute values (step S110), obtains the smallest of these counts (the k value) (step S120), and stores it in the anonymization processing DB (step S130). The device then determines whether the processing count n has reached 1 (step S140). If n has not reached 1 (step S140, No), it decrements n (step S150), returns to step S90, and repeats steps S90 to S150. After steps S90 to S150 have been repeated the predetermined number of times and n is determined to be 1 in step S140 (step S140, Yes), the process of FIG. 11 ends.
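
The sampling loop of steps S90 to S150 can be sketched as follows. This is an illustration under assumptions, not the patented procedure itself: each value generalization hierarchy is represented as a pair of its number of sections and one generalizing function per attribute, and the names `sample_points`, `records`, and `hierarchies` are hypothetical.

```python
from collections import Counter


def sample_points(records, hierarchies):
    """Return (number of sections, k value) pairs, one per hierarchy.

    `hierarchies` is a list of (sections, generalizers) ordered from the most
    abstract to the most detailed; `generalizers` holds one function per
    attribute column. `records` is assumed to be non-empty.
    """
    points = []
    for sections, generalizers in hierarchies:
        counts = Counter(
            tuple(g(v) for g, v in zip(generalizers, record))
            for record in records
        )
        points.append((sections, min(counts.values())))
    return points


# Illustrative data: (sex, dosage in ml).
records = [("M", 35), ("M", 38), ("M", 12), ("F", 60), ("F", 95)]
keep = lambda v: v
hierarchies = [
    (8, (keep, lambda ml: (ml - 1) // 50)),   # A x B1: 50 ml ranges
    (40, (keep, lambda ml: (ml - 1) // 10)),  # A x B2: 10 ml ranges
]
points = sample_points(records, hierarchies)
```

The resulting list of pairs is the kind of sample from which an approximate function relating the number of sections to the k value could be fitted.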

次に、近似関数算出装置２は、図１２に示すように、近似関数を取得する。近似関数算出装置２は、値一般化階層の区分数とｋ値のリストを取得し(ステップＳ２１０)、ｋ値を配列に格納して平均値を求め(ステップＳ２２０)、式１によりβを求め(ステップＳ２３０)、式２によりαを求める(ステップＳ２４０)。 Next, the approximate function calculation device 2 obtains an approximate function as shown in FIG. 12. It acquires a list of the numbers of sections of the value generalization hierarchies and the corresponding k values (step S210), stores the k values in an array and obtains their average (step S220), obtains β by Expression 1 (step S230), and obtains α by Expression 2 (step S240).

但し、ｙ：最少出現数（ｋ値）、ｘ：区分数 Here, y is the minimum number of occurrences (k value) and x is the number of sections.

そして、式１と式２から、近似関数として式３（累乗近似式）を求め、匿名化処理ＤＢに格納する(ステップＳ２５０)。 Then, from Expressions 1 and 2, Expression 3 (a power approximation) is obtained as the approximate function and stored in the anonymization processing DB (step S250).
$y = \alpha x^{\beta}$ ・・・式３ (Expression 3)
図１８は、横軸に区分数、縦軸に最少出現数をとり、近似関数の例を示したグラフである。 FIG. 18 is a graph showing an example of the approximate function, with the number of sections on the horizontal axis and the minimum number of occurrences on the vertical axis.
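
Expressions 1 and 2 are not reproduced in this text, so the following sketch substitutes a standard technique: an ordinary least-squares fit of a straight line to the logarithms of the samples, which yields the coefficients of a power approximation of the form y = αx^β. Whether this matches the patent's Expressions 1 and 2 exactly is an assumption, and the function name `fit_power_law` is hypothetical.

```python
import math


def fit_power_law(points):
    """Fit y = alpha * x**beta to (x, y) samples by least squares in log-log space.

    Assumes at least two samples with distinct x and all x, y > 0.
    """
    log_x = [math.log(x) for x, _ in points]
    log_y = [math.log(y) for _, y in points]
    n = len(points)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    beta = sxy / sxx                              # slope of the log-log line
    alpha = math.exp(mean_y - beta * mean_x)      # intercept mapped back
    return alpha, beta


# Illustrative samples (number of sections, minimum number of occurrences).
alpha, beta = fit_power_law([(4, 500.0), (16, 250.0), (64, 125.0)])
```

A negative β reflects the behaviour described for FIG. 18: the minimum number of occurrences falls as the number of sections grows.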

また、近似関数算出装置２は、ターゲットとするｋ値、即ち、匿名化に必要な最少出現数を取得し(ステップＳ２６０)、式３を式４のように変形してターゲットのｋ値を満たすｘ値（区分数）を求め(ステップＳ２７０)、求めたｘ値を予測ＧＯＤ値として匿名化処理ＤＢに記憶する(ステップＳ２８０)。 The approximate function calculation device 2 also acquires the target k value, that is, the minimum number of occurrences required for anonymization (step S260), rearranges Expression 3 into Expression 4 to obtain the x value (number of sections) that satisfies the target k value (step S270), and stores the obtained x value in the anonymization processing DB as the predicted GOD value (step S280).

次に、図１３、図１４を用いて、予測ＧＯＤ値に基づく匿名化処理について説明する。匿名処理装置１は、図１３の処理を開始すると、先ず、処理対象とする個人データを個人データ蓄積ＤＢから取得する。また、匿名処理装置１は、予測ＧＯＤの値と値一般化階層（書き換えパターン）とを匿名化処理ＤＢから取得し（ステップＳ３４０）、取得した値一般化階層に含まれる属性の組み合わせのうち、予測ＧＯＤの値以下で最も大きい区分数を持つ組み合わせを選択する(ステップＳ３５０)。 Next, the anonymization process based on the predicted GOD value will be described with reference to FIGS. 13 and 14. When the process of FIG. 13 starts, the anonymous processing device 1 first acquires the personal data to be processed from the personal data storage DB. The anonymous processing device 1 also acquires the predicted GOD value and the value generalization hierarchy (rewrite pattern) from the anonymization processing DB (step S340) and selects, from the combinations of attributes included in the acquired value generalization hierarchy, the combination with the largest number of sections that does not exceed the predicted GOD value (step S350).
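
Steps S270 and S350 can be sketched together. Expression 4 is not reproduced in this text; assuming the power approximation has the form y = αx^β, solving for x gives x = (y/α)^(1/β), which is what `predict_sections` computes. Both function names and the representation of a combination as a (number of sections, label) pair are assumptions made for illustration.

```python
def predict_sections(alpha, beta, target_k):
    """Invert y = alpha * x**beta for x at y = target_k (assumed form of Expression 4)."""
    return (target_k / alpha) ** (1.0 / beta)


def select_combinations(combos, predicted):
    """Return the combinations with the largest number of sections <= predicted.

    `combos` is a list of (sections, label) pairs. More than one pair is
    returned when several combinations share that number of sections.
    """
    eligible = [c for c in combos if c[0] <= predicted]
    if not eligible:
        return []
    best = max(sections for sections, _ in eligible)
    return [c for c in eligible if c[0] == best]


# Illustrative combinations of attribute layers and their numbers of sections.
combos = [(8, "A,B1"), (24, "A,C2"), (24, "A,B1,C1"),
          (40, "A,B2"), (96, "A,B1,C2"), (120, "A,B2,C1")]
predicted = predict_sections(1000.0, -0.5, 100)
chosen = select_combinations(combos, predicted)
```

Returning a list rather than a single item leaves room for the tie handling that follows when several combinations have the same number of sections.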

図１５は、値一般化階層の一例を示す図である。図１５では、属性Ａ，Ｂ，Ｃについて、各階層の属性を組み合わせた場合の区分数を昇順に並べて示している。なお、図１５では、説明の便宜上、区分数を昇順に並べて示したが、必ずしも物理的に順番に並べる必要はなく、区分数に応じて属性の組み合わせを選択できれば良い。 FIG. 15 is a diagram illustrating an example of the value generalization hierarchy. FIG. 15 shows, for attributes A, B, and C, the number of sections when attributes of each layer are combined, arranged in ascending order. In FIG. 15, for convenience of explanation, the number of sections is shown in ascending order. However, it is not always necessary to physically arrange them in order, but it is sufficient that a combination of attributes can be selected according to the number of sections.

ここで匿名処理装置１は、ステップＳ３５０で選択した属性の組み合わせが複数か否か、即ち予測ＧＯＤの値以下で最も大きい区分数を持つ組み合わせが複数存在するか否かを判定し(ステップＳ３６０)、複数存在する場合には(ステップＳ３６０，Ｙｅｓ)、優先度が設定されているか否かを判定する（ステップＳ３７０）。匿名処理装置１は、予め優先度が設定されている場合(ステップＳ３７０，Ｙｅｓ)、この優先度を記憶装置から読み出し、優先度の高い属性の組み合わせを選択する(ステップＳ３８０)。優先度は、例えば属性毎に設定しておき、各組み合わせで夫々属性の優先度を合計し、合計した優先度の高い組み合わせを選択する、或は最も優先度の高い属性を含む組み合わせを選択する。 Here, the anonymous processing device 1 determines whether or not there are a plurality of combinations of the attributes selected in step S350, that is, whether or not there are a plurality of combinations having the largest number of divisions equal to or less than the value of the predicted GOD (step S360). If there is a plurality (Step S360, Yes), it is determined whether or not a priority is set (Step S370). When the priorities are set in advance (step S370, Yes), the anonymous processing device 1 reads out the priorities from the storage device and selects a combination of attributes with high priorities (step S380). The priorities are set, for example, for each attribute, and the priorities of the attributes are totaled for each combination, and a combination having a higher total priority is selected, or a combination including an attribute having the highest priority is selected. .

なお、優先度は、任意に設定した数値であっても良いし、インターネット等のネットワークの検索エンジンにおいて、各属性又は属性値を検索した際のヒット数や、各属性値の検索数、各属性値をＳＥＭの広告キーワードとした場合の価格を取得して優先度として用いても良い。例えば、投薬量や治療期間等のように属性を示す語について検索エンジンから、ヒット数や、検索回数、ＳＥＭ価格を取得し、ヒット数や検索回数が高いもの、ＳＥＭ価格の高いものを優先するよう優先順位を決定して属性の組み合わせを選択する。 The priority may be an arbitrarily set numerical value. Alternatively, the number of hits obtained when each attribute or attribute value is searched for with a search engine on a network such as the Internet, the number of times each attribute value is searched for, or the price of each attribute value when used as an SEM advertising keyword may be acquired and used as the priority. For example, for words that denote attributes, such as dosage and treatment period, the number of hits, the number of searches, and the SEM price are obtained from a search engine, the priority order is determined so that words with more hits, more searches, or a higher SEM price are preferred, and the combination of attributes is selected accordingly.

一方、このような優先度が設定されていない場合には(ステップＳ３７０，Ｎｏ)、各属性の内容に応じて優先順位を決定して属性の組み合わせを選択する(ステップＳ３９０)。例えば、組み合わせた属性の数が多いものを優先する。即ち、区分数が同じであれば属性を二つ組み合わせたものより、属性を三つ組み合わせたものを優先する。また、属性値の種類が多いものや、属性値の偏りが少ないものを優先しても良い。 On the other hand, when no such priority is set (step S370, No), a priority order is determined according to the content of each attribute and a combination of attributes is selected (step S390). For example, a combination containing more attributes is preferred: if the number of sections is the same, a combination of three attributes takes precedence over a combination of two. Combinations with more kinds of attribute values, or with less skew in their attribute values, may also be preferred.
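
The tie handling of steps S370 to S390 can be sketched as follows. Only two of the options mentioned in the text are shown: summing per-attribute priorities when priorities are set, and preferring the combination with more attributes otherwise. The function name `break_tie` and the priority values are illustrative assumptions.

```python
def break_tie(candidates, priority=None):
    """Pick one combination among candidates with the same number of sections.

    `candidates` is a list of tuples of layer names. If `priority` (a dict of
    per-layer scores) is given, the combination with the highest total score
    wins; otherwise the combination with the most attributes wins.
    """
    if priority:
        return max(candidates, key=lambda combo: sum(priority.get(a, 0) for a in combo))
    return max(candidates, key=len)


# A x C2 and A x B1 x C1 both have 24 sections in the example of FIG. 9.
tied = [("A", "C2"), ("A", "B1", "C1")]
without_priority = break_tie(tied)
with_priority = break_tie(tied, {"A": 1, "B1": 1, "C1": 1, "C2": 5})
```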

なお、ステップＳ３８０，Ｓ３９０で属性の組み合わせの優先度を決める処理は、ステップＳ３５０で選択した組み合わせが複数の場合だけでなく、値一般化階層に基づいて作成した属性の組み合わせの全てについて、同一の区分数が複数存在する場合には、ステップＳ３８０，Ｓ３９０で優先度を決めてもよい。即ち、属性の組み合わせについて、区分数に従って昇順又は降順にソートして順位を付ける場合に、区分数が同一の組み合わせについては、ステップＳ３８０，Ｓ３９０で優先度に従って順位を決定しても良い。 The process of determining the priority of the combination of attributes in steps S380 and S390 is not limited to the case where there are a plurality of combinations selected in step S350, and is the same for all the combinations of attributes created based on the value generalization hierarchy. If there are a plurality of sections, the priority may be determined in steps S380 and S390. That is, in the case where the combinations of attributes are sorted and ranked in ascending or descending order according to the number of divisions, the ranks may be determined according to the priorities in steps S380 and S390 for the combinations having the same number of divisions.

このステップＳ３８０，Ｓ３９０で属性の組み合わせを選択した場合、或はステップＳ３５０で選択した属性の組み合わせが複数存在しないと判定した場合(ステップＳ３６０，Ｎｏ)、選択した属性の組み合わせで個人情報Ａｎを匿名化し、最少出現数（ｋ値）を求める(ステップＳ４００)。 When a combination of attributes has been selected in step S380 or S390, or when it is determined that only one combination was selected in step S350 (step S360, No), the personal information An is anonymized with the selected combination of attributes and the minimum number of occurrences (k value) is obtained (step S400).

そして、匿名処理装置１は、図１４のステップＳ４１０において、予測ＧＯＤに基づいて選択した属性の組み合わせで匿名化した場合の最少出現数がターゲット（目標値）以上か否かを判定する(ステップＳ４１０)、即ちｋ値をターゲットとして設定した場合には、ｋ匿名性を満たしているか否かを判定する。 Then, in step S410 of FIG. 14, the anonymous processing device 1 determines whether or not the minimum number of appearances when anonymized by the combination of the attributes selected based on the predicted GOD is equal to or larger than the target (target value) (step S410). ), That is, when the k value is set as the target, it is determined whether or not k anonymity is satisfied.

匿名処理装置１は、この最少出現数がターゲット以上であれば(ステップＳ４１０、Ｙｅｓ)、ステップＳ４９０へ移行し、ターゲット以上でなければ(ステップＳ４１０、Ｎｏ)、予測ＧＯＤに基づいて選択した属性の組み合わせ、即ちステップＳ３８０，Ｓ３９０又はステップＳ３５０で選択した属性の組み合わせの順位を変数Ｊに設定し、変数ｍを初期値（ｍ＝１）に設定し(ステップＳ４２０)、順位がＪ＋ｍの属性の組み合わせで個人情報Ａｎを匿名化する(ステップＳ４３０)。従って、予測ＧＯＤに基づいて選択した属性の組み合わせよりも一つ順位が大きい属性の組み合わせ、即ち順位が一つ分詳細な属性の組み合わせで匿名化する。 If the minimum number of occurrences is equal to or greater than the target (step S410, Yes), the anonymous processing device 1 proceeds to step S490. If it is not (step S410, No), the device sets the variable J to the rank of the combination of attributes selected based on the predicted GOD, that is, the combination selected in step S380, S390, or S350, sets the variable m to its initial value (m = 1) (step S420), and anonymizes the personal information An with the combination of attributes at rank J + m (step S430). Anonymization is thus performed with the combination ranked one place higher than the combination selected based on the predicted GOD, that is, a combination that is one rank more detailed.

また、匿名処理装置１は、ステップＳ４３０で匿名化した場合の最少出現数がターゲット以上か否かを判定し(ステップＳ４４０)、この最少出現数がターゲット以上でなければ(ステップＳ４４０，Ｎｏ)、次に順位がＪ−ｍの属性の組み合わせで個人情報Ａｎを匿名化する(ステップＳ４５０)。従って、予測ＧＯＤに基づいて選択した属性の組み合わせよりも一つ順位が小さい属性の組み合わせ、即ち順位が一つ分抽象的な属性の組み合わせで匿名化する。 The anonymous processing device 1 then determines whether the minimum number of occurrences after the anonymization in step S430 is equal to or greater than the target (step S440). If it is not (step S440, No), the device next anonymizes the personal information An with the combination of attributes at rank J − m (step S450). Anonymization is thus performed with the combination ranked one place lower than the combination selected based on the predicted GOD, that is, a combination that is one rank more abstract.

匿名処理装置１は、ステップＳ４５０で匿名化した場合の最少出現数がターゲット以上か否かを判定し(ステップＳ４６０)、この最少出現数がターゲット以上でなければ(ステップＳ４６０，Ｎｏ)、変数ｍをインクリメントして(ステップＳ４７０)、ステップＳ４３０に戻り、順位がＪ＋ｍの属性の組み合わせで個人情報Ａｎを匿名化する。このため、ステップＳ４３０の処理が２回目であれば、予測ＧＯＤに基づいて選択した属性の組み合わせより二つ順位が大きい属性の組み合わせで匿名化し、更に、この最少出現数がターゲット以上でなければ(ステップＳ４４０，Ｎｏ) 、予測ＧＯＤに基づいて選択した属性の組み合わせより二つ順位が小さい属性の組み合わせで匿名化する(ステップＳ４５０)。そして、ステップＳ４３０の処理が３回目であれば、予測ＧＯＤに基づいて選択した属性の組み合わせより三つ順位が大きい属性の組み合わせで匿名化し、更に、この最少出現数がターゲット以上でなければ(ステップＳ４４０，Ｎｏ) 、予測ＧＯＤに基づいて選択した属性の組み合わせより三つ順位が小さい属性の組み合わせで匿名化する(ステップＳ４５０)。 The anonymous processing device 1 determines whether the minimum number of occurrences after the anonymization in step S450 is equal to or greater than the target (step S460). If it is not (step S460, No), the device increments the variable m (step S470), returns to step S430, and anonymizes the personal information An with the combination of attributes at rank J + m. Consequently, on the second pass through step S430, anonymization is performed with the combination ranked two places higher than the combination selected based on the predicted GOD, and if the minimum number of occurrences is still below the target (step S440, No), with the combination ranked two places lower (step S450). On the third pass through step S430, anonymization is performed with the combination ranked three places higher, and if the minimum number of occurrences is still below the target (step S440, No), with the combination ranked three places lower (step S450).

このように匿名性を満たさない場合には、予測ＧＯＤに基づいて選択した属性の組み合わせよりも詳細な組み合わせと、抽象的な組み合わせで交互に匿名化してＧＯＤを探索する。 When the anonymity is not satisfied as described above, the GOD is searched by alternately anonymizing a combination that is more detailed than a combination of attributes selected based on the predicted GOD and an abstract combination.
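
The alternating search of steps S420 to S470 can be sketched as follows. It is a simplified illustration: combinations are assumed to be ranked in ascending order of the number of sections, `k_of` is a hypothetical callable that anonymizes with a given combination and returns the resulting k value, and the bounds checks are an addition that the text does not describe.

```python
def search_god(ranked, start, target_k, k_of):
    """Search outward from rank `start` (J), trying J+m and then J-m for m = 1, 2, ...

    Returns the index of the first combination whose k value reaches
    `target_k`, or None when no combination does.
    """
    if k_of(ranked[start]) >= target_k:
        return start
    m = 1
    while start + m < len(ranked) or start - m >= 0:
        for idx in (start + m, start - m):        # more detailed first, then more abstract
            if 0 <= idx < len(ranked) and k_of(ranked[idx]) >= target_k:
                return idx
        m += 1
    return None


# Illustrative ranking by number of sections and a made-up k value model.
ranked = [2, 6, 8, 24, 40, 96]
k_of = lambda sections: 1000 // sections
found = search_god(ranked, 3, 100, k_of)
```

Because the search starts at the predicted rank, a good prediction means only a few anonymization attempts are needed before a satisfying combination is found.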

そして、匿名処理装置１は、ステップＳ４３０で匿名化した匿名情報の最少出現数がターゲット以上と判定した場合(ステップＳ４４０，Ｙｅｓ)、或はステップＳ４５０で匿名化した匿名情報の最少出現数がターゲット以上と判定した場合(ステップＳ４６０，Ｙｅｓ)、この匿名情報の区分数で、予測ＧＯＤ値を更新し(ステップＳ４８０)、匿名化処理の結果を出力する(ステップＳ４９０)。本例では、予測ＧＯＤ値、匿名化情報、及び当該匿名化情報の最少出現数（ｋ値）を匿名化処理ＤＢに記憶させる。 When the anonymous processing device 1 determines that the minimum number of occurrences of the anonymous information anonymized in step S430 is equal to or greater than the target (step S440, Yes), or that the minimum number of occurrences of the anonymous information anonymized in step S450 is equal to or greater than the target (step S460, Yes), it updates the predicted GOD value with the number of sections of this anonymous information (step S480) and outputs the result of the anonymization process (step S490). In this example, the predicted GOD value, the anonymized information, and the minimum number of occurrences (k value) of the anonymized information are stored in the anonymization processing DB.

このように図１３，図１４の匿名化処理によれば、予測ＧＯＤ値からＧＯＤ値の探索を開始するので、匿名化の処置負荷を確実に低減できる。 As described above, according to the anonymization processing of FIGS. 13 and 14, since the search for the GOD value is started from the predicted GOD value, the processing load of the anonymization can be reliably reduced.

なお、類似した属性情報を有する個人情報を匿名化する場合、予測ＧＯＤ値も類似する傾向にあるため、大量の個人情報を連続して匿名化処理する場合、図１３，図１４のように予測ＧＯＤ値を求めた後は、この予測ＧＯＤ値を用いて匿名化処理を行っても良い。図１６は、この連続処理の例を示す図である。図１６では、ｎ件の個人情報を１〜ｎまで順に処理する例を示し、処理対象の個人情報をＡｎと示している。 When personal information having similar attribute information is anonymized, the predicted GOD values also tend to be similar. Therefore, when a large amount of personal information is anonymized in succession, once the predicted GOD value has been obtained as in FIGS. 13 and 14, that predicted GOD value may be reused for the subsequent anonymization. FIG. 16 is a diagram illustrating an example of this continuous processing. FIG. 16 shows an example in which n items of personal information are processed in order from 1 to n, and the personal information to be processed is denoted An.

匿名処理装置１は、図１６の処理を開始すると、先ず処理数ｎを初期値に設定し(ステップＳ５１０)、処理対象とする個人データＡｎを個人データ蓄積ＤＢから取得する(ステップＳ５２０)。なお、処理数ｎの初期値は、通常１と設定して、ステップＳ５２０で個人データＡ１を取得するが、図１３，図１４の処理に続けて図１６の処理を行う場合、図１３，図１４で個人データＡ１を処理するので、ステップＳ５１０で設定する初期値を２とし、ステップＳ５２０でＡ２以降の個人データを取得する。ここで、匿名処理装置１は、個人データＡｎが取得できたか否か、即ち未処理の個人データＡｎが存在するか否かを判定し(ステップＳ５３０)、未処理の個人データＡｎが存在した場合(ステップＳ５３０，Ｙｅｓ)、予測ＧＯＤの値と値一般化階層（書き換えパターン）とを匿名化処理ＤＢから取得する（ステップＳ５４０）。そして、匿名処理装置１は、取得した値一般化階層に含まれる属性の組み合わせのうち、予測ＧＯＤの値以下で最も大きい区分数を持つ組み合わせを選択し、この属性の組み合わせで個人情報Ａｎを匿名化し、最少出現数（ｋ値）を求める(ステップＳ５５０)。 When the process of FIG. 16 starts, the anonymous processing device 1 first sets the processing number n to its initial value (step S510) and acquires the personal data An to be processed from the personal data storage DB (step S520). The initial value of n is normally 1, so that personal data A1 is acquired in step S520. However, when the process of FIG. 16 follows the process of FIGS. 13 and 14, personal data A1 has already been processed there, so the initial value set in step S510 is 2 and personal data from A2 onward is acquired in step S520. The anonymous processing device 1 then determines whether personal data An could be acquired, that is, whether unprocessed personal data An exists (step S530). If it exists (step S530, Yes), the device acquires the predicted GOD value and the value generalization hierarchy (rewrite pattern) from the anonymization processing DB (step S540). It then selects, from the combinations of attributes included in the acquired value generalization hierarchy, the combination with the largest number of sections that does not exceed the predicted GOD value, anonymizes the personal information An with this combination, and obtains the minimum number of occurrences (k value) (step S550).

そして、匿名処理装置１は、ステップＳ５５０で求めた最少出現数がターゲット（目標値）以上か否かを判定する(ステップＳ５６０)、即ちｋ値をターゲットとして設定した場合には、ｋ匿名性を満たしているか否かを判定する。 Then, the anonymous processing device 1 determines whether or not the minimum number of occurrences determined in step S550 is equal to or more than the target (target value) (step S560). It is determined whether or not the condition is satisfied.

匿名処理装置１は、この最少出現数がターゲット以上であれば(ステップＳ５６０、Ｙｅｓ)、匿名性を満たしているので、匿名化処理の結果を匿名化処理ＤＢに記録し(ステップＳ６６０)、処理数ｎをインクリメントして(ステップＳ６７０)、ステップＳ５２０に戻り、次の処理へ移行する。 If the minimum number of occurrences is equal to or greater than the target (step S560, Yes), anonymity is satisfied, so the anonymous processing device 1 records the result of the anonymization process in the anonymization processing DB (step S660), increments the processing number n (step S670), returns to step S520, and moves on to the next item.

On the other hand, if the minimum number of occurrences obtained in step S550 is not equal to or greater than the target (step S560, No), the anonymous processing device 1 sets the rank of the attribute combination selected based on the predicted GOD, that is, the combination selected in step S550, in variable J, sets variable m to its initial value (m = 1) (step S590), and anonymizes personal information An with the attribute combination of rank J + m (step S600). It thus anonymizes with the attribute combination ranked one higher than the combination selected based on the predicted GOD, that is, a combination one rank more detailed.

The anonymous processing device 1 also determines whether the minimum number of occurrences obtained by the anonymization in step S600 is equal to or greater than the target (step S610). If it is not (step S610, No), the device next anonymizes personal information An with the attribute combination of rank J − m (step S620). It thus anonymizes with the attribute combination ranked one lower than the combination selected based on the predicted GOD, that is, a combination one rank more abstract.

The anonymous processing device 1 determines whether the minimum number of occurrences obtained by the anonymization in step S620 is equal to or greater than the target (step S630). If it is not (step S630, No), the device increments variable m (step S640), returns to step S600, and anonymizes personal information An with the attribute combination of rank J + m. Accordingly, on the second pass through step S600 it anonymizes with the attribute combination ranked two higher than the combination selected based on the predicted GOD, and if the resulting minimum number of occurrences is still not equal to or greater than the target (step S610, No), it anonymizes with the attribute combination ranked two lower than the combination selected based on the predicted GOD (step S620).

When anonymity is not satisfied in this way, the device searches for the GOD by anonymizing alternately with combinations that are more detailed and more abstract than the attribute combination selected based on the predicted GOD.
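The alternating order J + 1, J − 1, J + 2, J − 2, ... can be sketched as a small Python generator. The names `alternating_ranks` and `search_god`, the 0-based ranks, and the skipping of ranks that fall outside the available range are assumptions; the specification does not state how out-of-range ranks are handled.

```python
def alternating_ranks(j, num_ranks):
    """Yield J+1, J-1, J+2, J-2, ... restricted to valid ranks 0..num_ranks-1."""
    m = 1
    while j + m < num_ranks or j - m >= 0:
        if j + m < num_ranks:   # more detailed combination (as in step S600)
            yield j + m
        if j - m >= 0:          # more abstract combination (as in step S620)
            yield j - m
        m += 1                  # as in step S640

def search_god(j, num_ranks, k_of_rank, target):
    """Return the first rank whose minimum occurrence count meets the target, else None.

    `k_of_rank` is a hypothetical callback that anonymizes with the given rank
    and returns the resulting minimum number of occurrences.
    """
    for rank in alternating_ranks(j, num_ranks):
        if k_of_rank(rank) >= target:
            return rank
    return None

# Hypothetical k values per rank (rank 0 is the most abstract combination).
k_table = [10, 5, 2, 1, 1]
order = list(alternating_ranks(2, len(k_table)))
found = search_god(2, len(k_table), lambda rank: k_table[rank], target=3)
```

Starting from J = 2, the sketch tries rank 3 first and then rank 1, which is the first rank that meets the target in this toy table.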

When the anonymous processing device 1 determines that the minimum number of occurrences of the anonymous information produced in step S600 is equal to or greater than the target (step S610, Yes), or that the minimum number of occurrences of the anonymous information produced in step S620 is equal to or greater than the target (step S630, Yes), it updates the predicted GOD value with the number of divisions of that anonymous information (step S650) and outputs the result of the anonymization (step S660). In this example, the predicted GOD value, the anonymized information, and the minimum number of occurrences (k value) of the anonymized information are stored in the anonymization processing DB.

When step S660 is complete for the anonymous information, the device increments the process count n (step S670), returns to step S520, and moves on to the next personal data An. When the next personal information An can no longer be acquired in step S520, that is, when no unprocessed personal information remains (step S530, No), the processing of FIG. 16 ends. With the anonymization processing of FIG. 16, the GOD value at which anonymization succeeded is used as the predicted GOD value for the next anonymization in a continuous run, so that large volumes of personal information can be processed even more efficiently.
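The continuous processing of FIG. 16 can be approximated by the following Python sketch, in which the division count that succeeded for one data set becomes the predicted GOD for the next. The function name `process_all`, the `k_value` callback, the 0-based ranks, the bounds handling, and the toy numbers are assumptions for illustration only.

```python
def process_all(datasets, divisions, k_value, target, predicted_god):
    """Process data sets in sequence, carrying the predicted GOD forward.

    divisions: division count for each rank, sorted ascending (assumed layout).
    k_value(data, rank): hypothetical callback returning the minimum number
    of occurrences when `data` is anonymized with the combination at `rank`.
    """
    chosen = []
    for data in datasets:
        # As in step S550: largest division count not exceeding the predicted GOD.
        j = max((r for r, d in enumerate(divisions) if d <= predicted_god), default=0)
        rank = j if k_value(data, j) >= target else None   # as in step S560
        m = 1
        while rank is None and (j + m < len(divisions) or j - m >= 0):
            for candidate in (j + m, j - m):               # as in steps S600 and S620
                if 0 <= candidate < len(divisions) and k_value(data, candidate) >= target:
                    rank = candidate
                    predicted_god = divisions[rank]        # as in step S650
                    break
            m += 1                                         # as in step S640
        chosen.append(rank)                                # as in step S660
    return chosen, predicted_god

# Hypothetical inputs: each data set is represented directly by its k value per rank.
divisions = [1, 4, 16, 64]
datasets = [[100, 30, 8, 2], [90, 25, 9, 1]]
chosen, final_god = process_all(
    datasets, divisions, lambda data, rank: data[rank], target=10, predicted_god=20
)
```

In this toy run the first data set needs a search (ranks 2 and 3 fail, rank 1 succeeds), while the second succeeds immediately at the updated prediction, which is the efficiency gain the paragraph describes.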

As described above, according to the present embodiment, the predicted GOD value is obtained based on the approximation function and the search for the GOD value starts from this predicted GOD value, so the processing required to obtain the GOD value can be reliably reduced.

In addition, by using the predicted GOD value to process similar personal information consecutively, a large amount of anonymous information can be created efficiently.

<Modified example>
FIG. 17 is a diagram illustrating a modification of the anonymization processing. In the processing of FIGS. 14 and 16, the GOD search alternates in order of the number of divisions, but the search may be performed in other orders. For example, the GOD search processing in FIG. 14 (steps S420 to S490) or the GOD search processing in FIG. 16 (steps S590 to S660) may be replaced with the processing of FIG. 17.

In this case, when the anonymous processing device 1 determines in step S410 of FIG. 14 that the minimum number of occurrences is not equal to or greater than the target (step S410, No), or determines in step S560 of FIG. 16 that the minimum number of occurrences is not equal to or greater than the target (step S560, No), it acquires a search pattern instruction (step S602). Here, the search pattern indicates the order in which attribute combinations are searched next when the minimum number of occurrences obtained by anonymizing with the attribute combination based on the predicted GOD is not equal to or greater than the target. In this example, pattern 1 searches alternately, as in FIGS. 14 and 16 described above; pattern 2 searches in order of increasing number of divisions m; and pattern 3 searches in order of decreasing number of divisions m. The search pattern may be received as operator input or read from a preset value.
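As a rough illustration, the three search patterns can be expressed as the order in which candidate ranks are visited. The function name `candidate_ranks`, the 0-based ranks, and the clipping to the valid range are assumptions, and the optional offsets Q and P described for patterns 2 and 3 are omitted here.

```python
def candidate_ranks(pattern, j, num_ranks):
    """Return the ranks to try, in order, for the given search pattern.

    pattern 1: alternate J+1, J-1, J+2, J-2, ...
    pattern 2: increasing number of divisions (J+1, J+2, ...)
    pattern 3: decreasing number of divisions (J-1, J-2, ...)
    Ranks outside 0..num_ranks-1 are dropped (an assumption).
    """
    if pattern == 1:
        order = []
        for m in range(1, num_ranks):
            order += [r for r in (j + m, j - m) if 0 <= r < num_ranks]
        return order
    if pattern == 2:
        return list(range(j + 1, num_ranks))
    return list(range(j - 1, -1, -1))

# With J = 2 and five ranked combinations:
pattern1 = candidate_ranks(1, 2, 5)
pattern2 = candidate_ranks(2, 2, 5)
pattern3 = candidate_ranks(3, 2, 5)
```

Representing each pattern as a plain list of ranks keeps the pattern choice separate from the anonymize-and-test loop that consumes it.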

Next, the anonymous processing device 1 sets the rank of the attribute combination selected based on the predicted GOD, that is, the combination selected in step S350 of FIG. 13 or step S550 of FIG. 16, in variable J, sets variables r and m to initial values (for example, r = 1, m = 1) (step S604), and determines whether the search pattern is 1 (step S606).

If the search pattern is 1 (step S606, Yes), the anonymous processing device 1 proceeds to step S608 and anonymizes personal information An with the attribute combination of rank J + m. It thus anonymizes with the attribute combination ranked m higher than the combination selected based on the predicted GOD; if m is 1, this is a combination one rank more detailed.

Next, the anonymous processing device 1 determines whether the minimum number of occurrences obtained by the anonymization in step S608 is equal to or greater than the target (step S610). If it is not (step S610, No), the device next anonymizes personal information An with the attribute combination of rank J − m (step S620). It thus anonymizes with the attribute combination ranked m lower than the combination selected based on the predicted GOD; if m is 1, this is a combination one rank more abstract.

The anonymous processing device 1 determines whether the minimum number of occurrences obtained by the anonymization in step S620 is equal to or greater than the target (step S630). If it is not (step S630, No), the device increments variable m (step S640) and returns to step S606. When the device determines that the minimum number of occurrences of the anonymous information produced in step S608 is equal to or greater than the target (step S610, Yes), or that the minimum number of occurrences of the anonymous information produced in step S620 is equal to or greater than the target (step S630, Yes), it outputs the result of the anonymization (step S660). In this example, the predicted GOD value, the anonymized information, and the minimum number of occurrences (k value) of the anonymized information are stored in the anonymization processing DB.

When it is determined in step S606 that the search pattern is not 1 (step S606, No), the anonymous processing device 1 determines whether the search pattern is 2 (step S710). If the search pattern is 2 (step S710, Yes), it proceeds to step S715 and anonymizes personal information An with the attribute combination of rank J + m (step S715). When searching with pattern 2, the value of J used for the anonymization in step S715 may differ from that of pattern 1; for example, it may be offset by subtracting a predetermined number Q, as in (J − Q) + m, so that the search starts from an attribute combination with fewer divisions.

Next, the anonymous processing device 1 determines whether the minimum number of occurrences obtained by the anonymization in step S715 is equal to or greater than the target (step S720). If it is not (step S720, No), the device determines whether variable r has reached a predetermined number R (step S750); if variable r has not reached the predetermined number R (step S750, No), it increments variable m (step S640) and returns to step S606.

On the other hand, when the anonymous processing device 1 determines in step S720 that the minimum number of occurrences obtained by the anonymization in step S715 is equal to or greater than the target (step S720, Yes), it outputs the result of the anonymization (step S730); in this example, the predicted GOD value, the anonymized information, and the minimum number of occurrences (k value) of the anonymized information are stored in the anonymization processing DB. It then increments variable r (the number of successful anonymizations) (step S740) and proceeds to step S750. That is, until the success count r reaches the predetermined value R, variable m is incremented and the pattern 2 search is performed with attribute combinations having more divisions (steps S710 to S750).

When it is determined in step S710 that the search pattern is not 2 (step S710, No), the anonymous processing device 1 proceeds to step S755 and anonymizes personal information An with the attribute combination of rank J − m. When searching with pattern 3, the value of J used for the anonymization in step S755 may differ from that of pattern 1; for example, it may be offset by adding a predetermined number P, as in (J + P) + m, so that the search starts from an attribute combination with more divisions.

Next, the anonymous processing device 1 determines whether the minimum number of occurrences obtained by the anonymization in step S755 is equal to or greater than the target (step S760). If it is not (step S760, No), the device determines whether variable r has reached the predetermined number R (step S750); if variable r has not reached the predetermined number R (step S750, No), it increments variable m (step S640) and returns to step S606.

On the other hand, when the anonymous processing device 1 determines in step S760 that the minimum number of occurrences obtained by the anonymization in step S755 is equal to or greater than the target (step S760, Yes), it outputs the result of the anonymization (step S770); in this example, the predicted GOD value, the anonymized information, and the minimum number of occurrences (k value) of the anonymized information are stored in the anonymization processing DB. It then increments variable r (the number of successful anonymizations) (step S780) and proceeds to step S750. That is, until the success count r reaches the predetermined value R, variable m is incremented and the pattern 3 search is performed with attribute combinations having fewer divisions (steps S755 to S780).
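The one-directional searches of patterns 2 and 3, which continue until a given number of successes has been recorded, can be sketched in Python as follows. The function name `collect_successes`, the `step` parameter (+1 for pattern 2, −1 for pattern 3), the success count starting from zero, and stopping at the edge of the rank range are assumptions for illustration.

```python
def collect_successes(j, step, num_ranks, k_of_rank, target, max_successes):
    """Walk ranks J + step*m for m = 1, 2, ... and collect ranks that meet the target.

    step = +1 mimics pattern 2 (more divisions), step = -1 mimics pattern 3
    (fewer divisions). The walk stops after `max_successes` successes or when
    the rank leaves 0..num_ranks-1 (the range check is an assumption).
    """
    successes = []
    m = 1
    while len(successes) < max_successes and 0 <= j + step * m < num_ranks:
        rank = j + step * m
        if k_of_rank(rank) >= target:      # as in steps S720 / S760
            successes.append(rank)         # as in steps S730-S740 / S770-S780
        m += 1                             # as in step S640
    return successes

# Hypothetical k values per rank (rank 0 is the most abstract combination).
k_table = [50, 20, 12, 8, 3]
downward = collect_successes(4, -1, len(k_table), lambda r: k_table[r], 10, 2)
upward = collect_successes(0, +1, len(k_table), lambda r: k_table[r], 10, 2)
```

Collecting several successful ranks rather than stopping at the first one mirrors the role of the success count r and the predetermined number R.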

When step S660 is complete, or when the success count r has reached the predetermined number in step S750 (step S750, Yes), the anonymous processing device 1 ends the processing in the case of the search processing of FIG. 14, and proceeds to step S670 in the case of the search processing of FIG. 16. In step S750, besides checking the success count r against the predetermined number, the device may also determine that the search of pattern 2 or 3 has finished when, with the success count r at 1 or more, the number of failures reaches a predetermined value (step S750, Yes), and then end the processing in the case of FIG. 14 or proceed to step S670 in the case of FIG. 16.

In this way, according to this modification, the GOD search can be performed with a desired search pattern. This makes it possible to perform anonymization suited to the purpose of the anonymization and the like.

<Others>
The present invention is not limited to the illustrated examples described above, and various changes can of course be made without departing from the gist of the present invention.

DESCRIPTION OF SYMBOLS
1 Anonymous processing device
2 Approximate function calculation device
10 Anonymization system
11 Data reception unit
12 Division number acquisition unit
13 Coefficient acquisition unit
15 Anonymization unit
16 Test unit
18 Data output unit
19 Prediction unit
21 Anonymous information acquisition unit
22 Occurrence number acquisition unit
23 Coefficient calculation unit

Claims

An anonymous processing device comprising:
a storage unit that stores a correspondence between a number of divisions and a minimum number of occurrences, where the number of divisions is the number of combinations of attribute values that the attributes can take when personal information is anonymized with a combination of a plurality of attributes, and the minimum number of occurrences is the smallest of the numbers of times the combinations of attribute values occur when the personal information is anonymized with the combination of attributes;
a prediction unit that, when the minimum number of occurrences upon anonymizing personal information to be processed is to be equal to or greater than a target value, takes as a predicted value, based on the correspondence between the number of divisions and the minimum number of occurrences, the number of divisions at which the minimum number of occurrences equals the target value;
an anonymization unit that anonymizes the personal information to be processed with a combination of a plurality of attributes; and
a test unit that tests whether the minimum number of occurrences upon the anonymization is equal to or greater than the target value,
wherein the anonymization unit ranks the combinations of attributes based on the number of divisions obtained when anonymizing with each combination of a plurality of attributes, and anonymizes with the combination of attributes whose rank has the number of divisions corresponding to the predicted value,
when the test finds that the minimum number of occurrences upon the anonymization is not equal to or greater than the target value,
the test unit causes the anonymization and the test to be performed, in accordance with the ranking, in order of closeness to the number of divisions corresponding to the predicted value,
when the test finds that the minimum number of occurrences upon the anonymization is equal to or greater than the target value,
the test unit updates the predicted value with the number of divisions used in that anonymization, and
the anonymization unit performs the anonymization using the updated predicted value.

The anonymous processing device according to claim 1, wherein, when the test finds that the minimum number of occurrences upon the anonymization is not equal to or greater than the target value, the order in which the next combination of attributes to be anonymized is determined in accordance with the ranking and a combination of attributes satisfying anonymity is searched for is selectable from a plurality of preset search orders.

An anonymous processing method in which a computer executes:
a step of referring to a storage unit that stores a correspondence between a number of divisions and a minimum number of occurrences, where the number of divisions is the number of combinations of attribute values that the attributes can take when personal information is anonymized with a combination of a plurality of attributes, and the minimum number of occurrences is the smallest of the numbers of times the combinations of attribute values occur when the personal information is anonymized with the combination of attributes;
a step of, when the minimum number of occurrences upon anonymizing personal information to be processed is to be equal to or greater than a target value, taking as a predicted value, based on the correspondence between the number of divisions and the minimum number of occurrences, the number of divisions at which the minimum number of occurrences equals the target value;
a step of anonymizing the personal information to be processed with a combination of a plurality of attributes; and
a step of testing whether the minimum number of occurrences upon the anonymization is equal to or greater than the target value,
wherein, in the anonymizing step, the combinations of attributes are ranked based on the number of divisions obtained when anonymizing with each combination of a plurality of attributes, and the anonymization is performed with the combination of attributes whose rank has the number of divisions corresponding to the predicted value,
in the testing step, when the test finds that the minimum number of occurrences upon the anonymization is not equal to or greater than the target value, the anonymization and the test are performed, in accordance with the ranking, in order of closeness to the number of divisions corresponding to the predicted value,
when the test finds that the minimum number of occurrences upon the anonymization is equal to or greater than the target value,
in the testing step, the predicted value is updated with the number of divisions used in that anonymization, and
in the anonymizing step, the anonymization is performed using the updated predicted value.