JP2015090617A - Anonymized data generation method, device and program

Info

Publication number: JP2015090617A
Application number: JP2013230613A
Authority: JP
Inventors: 裕司山岡; Yuji Yamaoka
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2013-11-06
Filing date: 2013-11-06
Publication date: 2015-05-11
Anticipated expiration: 2033-11-06
Also published as: JP6156071B2

Abstract

PROBLEM TO BE SOLVED: To anonymize data including numerical attribute values in a way that preserves appropriate analysis precision. SOLUTION: An anonymized data generation method includes: (A) for each of a plurality of kinds of meshes in a space spanned by the numerical attributes of a plurality of data blocks that are stored in a data storage unit and each include a confidential attribute value and a numerical attribute value, extracting, from among the plurality of data blocks, a group of data blocks that is contained in one mesh element of that mesh, includes a first data block that has not yet been grouped, and has a frequency distribution of confidential attribute values meeting a predetermined condition; and (B) replacing the numerical attribute value of each data block belonging to the group with a numerical attribute value for the group.

Description

The present technology relates to techniques for anonymizing information.

There are cases where one wishes to disclose or provide to others a group of records that contain numerical attribute values and were collected from a plurality of information providers, while keeping the information provider identifier (hereinafter simply "ID") of each record secret. Even if the IDs are deleted before disclosure or provision, others may still be able to infer the information provider of a record that has distinctive numerical attribute values.

For example, consider a collector of personal location data who provides the location data to an analyst in a form that does not reveal the information providers. Here, the collector may be a provider of a location data service, and the analyst may be a cloud service provider or a secondary user of the data (for example, a population density survey company).

Assume that the location data collected by the collector is as shown in FIG. 1. In the example of FIG. 1, each record includes a row number, an ID, X (latitude), and Y (longitude). Each record represents location data of one of three persons A, B, and C, and there are seven records in total. That is, records with the same ID may appear more than once. The ID may be a personal user ID, the ID of a measurement device, or the ID of the organization to which the person belongs.

Plotting the data of FIG. 1 on a map gives, for example, FIG. 2. An analyst who obtains data such as that in FIGS. 1 and 2 can use it for analysis. For example, it can be seen that people gather near A's home and B's home.

However, the collector may, for example, have a contract with the information providers under which data is not provided to others unless it is anonymized. An information provider may want anonymization because, for example, the provider does not want anyone other than the collector to know where the provider was at a particular time.

On the other hand, the analyst may not need information about the information providers, such as IDs, because an analysis such as a population density survey can be performed without knowing who provided the location data.

In such a case, the collector can anonymize the data of FIG. 1 to make it difficult to infer the information providers.

A simple anonymization method for the collector is to delete the IDs. An analyst who sees the data of FIG. 1 with the IDs deleted cannot tell directly which record belongs to whom. However, there is a problem in that the information provider of some records can be inferred from the location data.

If the data of FIG. 1 with the IDs deleted is plotted on a map as in FIG. 2, it becomes apparent that, for example, the location (X, Y) = (6, 2) of the first record is inside A's home. That is, even an analyst who sees only the ID-deleted data can infer that the information provider of the first record is A, so the data can hardly be called sufficiently anonymized. Similarly, every record other than the seventh is insufficiently anonymized.

As one conventional technique, there is a method that treats a plurality of predetermined, non-overlapping numerical ranges as groups and converts the records in each group into statistical values of those records.

In this conventional technique, a region is meshed based on latitude and longitude, and a statistical value is calculated for the records in each mesh element and then disclosed or provided.

The statistical value is, for example, the number of records per mesh element, such as "mesh element M1 contains 3 records". Alternatively, the ID of each record may be deleted and its position converted to the center point of the mesh element.

For example, consider grouping and converting the records of FIG. 1 using mesh elements with a side of 5. In that case, (X, Y) = ([5, 10), [0, 5)), for example, is one mesh element, that is, one group. If this mesh element is called M10, only the first record in FIG. 1 is classified into M10. Therefore, either "mesh element M10 contained 1 record" is disclosed, or the first record is converted to (X, Y) = (7.5, 2.5) (the center point of M10) and disclosed.
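The conversion in this example can be sketched in a few lines of Python. This is a minimal illustration, assuming an origin-aligned square mesh with half-open elements; the function name `mesh_element` is hypothetical and does not come from the patent.

```python
import math

def mesh_element(point, side):
    """Return (lower corner, center) of the origin-aligned mesh element containing point.

    Each element is the half-open box [lower, lower + side) on every axis.
    """
    lower = tuple(math.floor(v / side) * side for v in point)
    center = tuple(lo + side / 2 for lo in lower)
    return lower, center

# First record of FIG. 1: (X, Y) = (6, 2), mesh elements with a side of 5.
lower, center = mesh_element((6, 2), 5)
print(lower, center)  # (5, 0) (7.5, 2.5)
```

The lower corner (5, 0) corresponds to the element ([5, 10), [0, 5)) called M10 in the text, and the center matches the disclosed position (7.5, 2.5).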

With this conventional technique, anonymity is not a problem if the mesh size is sufficiently large, but anonymity is threatened when the mesh size is reduced. For example, if mesh element M10 is contained within the premises of A's home (for example, if the premises of A's home are (X, Y) = ([2, 10], [0, 6])), the information provider of a record classified into M10 can be inferred to be A. The smaller the mesh size, the more likely it is that a mesh element is contained in an area where only a specific ID can exist.

On the other hand, the larger the mesh size, the more the positions are generalized, which severely degrades the precision of the analyst's analysis. For example, statistical surveys sometimes use mesh elements with a side of about 1 km, but as long as only those results are used, it is generally impossible to obtain analysis results for areas finer than 1 km units.

Thus, this conventional technique must make the mesh size sufficiently large to ensure anonymity, which severely degrades analysis precision.

As another conventional technique for generating groups, there is a technique that, for predetermined values d and k, adjusts the positions of ranges so that each range of size less than d contains k or more records and does not overlap another range, and groups records based on those ranges.

This conventional technique assumes that the target data is a set of records with mutually different IDs, in which case appropriate anonymity is ensured. However, it cannot ensure sufficient anonymity for data in which multiple records with the same ID may exist, as in FIG. 1.

For example, consider applying part of this conventional technique to group the records of FIG. 1 into rectangles with sides less than 5 (d = (5, 5)) so that each contains three or more records (k = 3). In this case, two groups are formed, for example: rectangle R43: (X, Y) = ([2, 6], [2, 4]), containing records {1, 2, 3}, and rectangle R49: (X, Y) = ([2, 6], [8, 10]), containing records {4, 5, 6}. However, as in the example above, a rectangle may be contained in an area where only a specific ID can exist. For example, if rectangle R43 is contained within the premises of A's home, the information provider of records {1, 2, 3} classified into R43 can be inferred to be A.

In general, a method that can also handle a set of records containing multiple records with the same ID, as in FIG. 1, has a wider range of application and is preferable. For example, particularly when an organization is the information provider, the same ID (that is, the organization ID) may be recorded in the data of multiple measurement devices. In addition, allowing multiple records with the same ID makes it possible to analyze many records at once, which can be expected to improve analysis precision. However, this conventional technique can ensure anonymity only for special target data, so the situations in which it can be applied are limited.

As yet another conventional grouping technique, there is a technique that ensures that each group contains l or more kinds of confidential attribute values such as IDs (that is, satisfies l-diversity). With this conventional technique, it is difficult to keep a group within a range smaller than a predetermined size. If groups cannot be kept within such a range, analysis precision is severely degraded.

O. Abul, F. Bonchi, and M. Nanni. Never Walk Alone: Uncertainty for Anonymity in Moving Objects Databases. In Proceedings of the 24th International Conference on Data Engineering, ICDE 2008, pp. 376-385 (2008).
A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam. l-Diversity: Privacy Beyond k-Anonymity. ACM Transactions on Knowledge Discovery from Data, Vol. 1, Issue 1, Article No. 3 (2007).

Accordingly, an object of the present technology, according to one aspect, is to provide a technique that makes it possible to anonymize data including numerical attribute values in such a way that appropriate analysis precision can be obtained.

The anonymized data generation method according to the present invention includes: (A) for each of a plurality of kinds of meshes in a space spanned by the numerical attributes of a plurality of data blocks that are stored in a data storage unit and each include a confidential attribute value and a numerical attribute value, extracting, from among the plurality of data blocks, a group of data blocks that is contained in one mesh element of the mesh, includes a first data block that has not yet been grouped, and whose frequency distribution of confidential attribute values satisfies a predetermined condition; and (B) replacing the numerical attribute values of the data blocks belonging to the group with a numerical attribute value for the group.

According to one aspect, data including numerical attribute values can be anonymized in such a way that appropriate analysis precision can be obtained.

FIG. 1 is a diagram illustrating an example of data.
FIG. 2 is a diagram illustrating an example of superposing the data on other data.
FIG. 3 is a diagram illustrating a configuration example of the information processing apparatus according to the present embodiment.
FIG. 4 is a diagram illustrating an example of data stored in the first data storage unit.
FIG. 5 is a diagram illustrating a processing flow in the present embodiment.
FIG. 6 is a diagram illustrating a processing flow of the mesh generation process.
FIG. 7 is a diagram illustrating a state in which two kinds of meshes have been generated.
FIG. 8 is a diagram illustrating the variation of the reference points of the meshes.
FIG. 9 is a diagram illustrating the overlap of mesh elements of a plurality of kinds of meshes.
FIG. 10 is a diagram illustrating a processing flow of the group generation process.
FIG. 11 is a diagram illustrating an example of the classification result for the first mesh.
FIG. 12 is a diagram illustrating a processing flow of the record group Rs extraction process.
FIG. 13 is a diagram illustrating a processing flow of the exclusion process.
FIG. 14 is a diagram illustrating an example of a frequency distribution table generated in the exclusion process.
FIG. 15 is a diagram illustrating an example of a frequency distribution table generated in the exclusion process.
FIG. 16 is a diagram illustrating an example of a frequency distribution table generated in the exclusion process.
FIG. 17 is a diagram illustrating an example of a frequency distribution table generated in the exclusion process.
FIG. 18 is a diagram illustrating an example of a frequency distribution table generated in the exclusion process.
FIG. 19 is a diagram illustrating an example of a frequency distribution table generated in the exclusion process.
FIG. 20 is a diagram illustrating a processing flow of the record group Rs extraction process.
FIG. 21 is a diagram illustrating a processing flow of the transferable record extraction process.
FIG. 22 is a diagram illustrating an example of a frequency distribution table generated in the extraction process.
FIG. 23 is a diagram illustrating an example of a frequency distribution table generated in the extraction process.
FIG. 24 is a diagram illustrating an example of a frequency distribution table generated in the extraction process.
FIG. 25 is a diagram illustrating an example of a frequency distribution table generated in the extraction process.
FIG. 26 is a diagram illustrating an example of a frequency distribution table generated in the extraction process.
FIG. 27 is a diagram illustrating an example of a mesh-group association table.
FIG. 28 is a diagram illustrating an example of a group data table for distinguishing transferable records from non-transferable records.
FIG. 29 is a diagram illustrating an example of an association table between row numbers and group IDs.
FIG. 30 is a diagram illustrating an example of a mesh-group association table.
FIG. 31 is a diagram illustrating an example of a group data table for distinguishing transferable records from non-transferable records.
FIG. 32 is a diagram illustrating an example of an association table between row numbers and group IDs.
FIG. 33 is a diagram illustrating an example of the classification result for the second mesh.
FIG. 34 is a diagram illustrating an example of a mesh-group association table.
FIG. 35 is a diagram illustrating an example of a group data table for distinguishing transferable records from non-transferable records.
FIG. 36 is a diagram illustrating an example of an association table between row numbers and group IDs.
FIG. 37 is a diagram illustrating an example of the classification result for the third mesh.
FIG. 38 is a diagram illustrating an example of the result of the grouping process.
FIG. 39 is a diagram illustrating an example of the processing result.
FIG. 40 is a functional block diagram of a computer.

FIG. 3 shows a functional block diagram of an information processing apparatus 100 according to an embodiment of the present invention. The information processing apparatus 100 includes a first data storage unit 110, a setting data storage unit 120, a grouping processing unit 130, a second data storage unit 140, an anonymization processing unit 150, a third data storage unit 160, an output unit 170, a mesh generation unit 180, and a fourth data storage unit 190.

The first data storage unit 110 stores data before anonymization, for example as shown in FIG. 4. In the example of FIG. 4, each record (also called a data block) includes an ID, X (latitude), Y (longitude), and a speed. The row numbers are added for the purposes of the following description.

The setting data storage unit 120 stores the number x of kinds of meshes to be generated, the size d of the mesh elements, a condition on the frequency distribution (also called a frequency distribution pattern), and designations of which of the data stored in the first data storage unit 110 is the confidential attribute (for example, the ID attribute; also called a sensitive attribute) and which are the numerical attributes (for example, location data including latitude X and longitude Y). The frequency distribution pattern includes a minimum number of kinds l and a decay rate a. The minimum number of kinds l is an integer of 2 or more, and the decay rate a is a positive real number of 1 or less. For example, the condition is set as a frequency distribution pattern in which, for l kinds of IDs taken in descending order of frequency, the n-th frequency is at least a times the (n-1)-th frequency.
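One way to read the frequency distribution pattern is sketched below. This is only an illustration under stated assumptions: the function name `satisfies_pattern` is hypothetical, and the sketch assumes that only the l most frequent kinds of confidential attribute values are checked and that a group with fewer than l kinds fails the condition.

```python
from collections import Counter

def satisfies_pattern(ids, l, a):
    """Check a group's confidential attribute values against (l, a).

    Assumed reading: take the l most frequent kinds in descending order of
    frequency; the n-th frequency must be at least a times the (n-1)-th.
    """
    counts = sorted(Counter(ids).values(), reverse=True)
    if len(counts) < l:
        return False  # fewer than l kinds of confidential values
    top = counts[:l]
    return all(top[n] >= a * top[n - 1] for n in range(1, l))

print(satisfies_pattern(["A", "A", "B", "C"], l=3, a=0.5))  # True: frequencies 2, 1, 1
print(satisfies_pattern(["A", "A", "A", "B"], l=2, a=0.5))  # False: 1 < 0.5 * 3
```

With a = 1 the top l kinds must occur equally often, while smaller values of a tolerate a steeper drop between consecutive frequencies.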

The grouping processing unit 130 uses the data stored in the setting data storage unit 120 and the fourth data storage unit 190 to group the records stored in the first data storage unit 110, and stores the result in the second data storage unit 140. The anonymization processing unit 150 converts the numerical attribute values of the records belonging to each group based on the grouping result, and stores the result in the third data storage unit 160. The output unit 170 outputs the data stored in the third data storage unit 160 to another computer, a display device, a printing device, or the like.

The mesh generation unit 180 generates mesh data according to the number x of kinds of meshes and the mesh element size d stored in the setting data storage unit 120, and stores the mesh data in the fourth data storage unit 190.

The size d of a mesh element is a sequence of positive numbers giving the side lengths of the mesh element in the space spanned by the numerical attributes. For example, if the numerical attributes are the two dimensions (X, Y), d is set as d = (6, 6).

The number x of kinds of meshes is a natural number. For example, if six different meshes are used, x = 6. In the present embodiment, each mesh is an n-dimensional rectangular mesh (n is the number of numerical attributes). A mesh includes a plurality of mesh elements, and the boundaries (also called walls) between mesh elements are edges in two dimensions, planes in three dimensions, and hyperplanes in four or more dimensions. In general, a mesh need not be rectangular; a triangular mesh, a hexagonal mesh, or the like may be used as long as it follows a rule that allows the mesh element to which a numerical attribute value belongs to be computed immediately.

Next, the processing of the information processing apparatus 100 will be described with reference to FIGS. 5 to 39.

The mesh generation unit 180 executes the mesh generation process and stores the generated mesh data in the fourth data storage unit 190 (FIG. 5: step S1). The mesh generation process will be described with reference to FIGS. 6 to 9.

First, the mesh generation unit 180 generates an empty list M of meshes (FIG. 6: step S11). The mesh generation unit 180 then adds the origin (0, 0) to the list M (step S13). At first, meshes whose mesh elements have a side length of 1 are generated. Accordingly, in the subsequent processing, meshes whose reference point has each coordinate in the range [0, 1) are added to the list M.

The mesh generation unit 180 then determines whether the number |M| of kinds of meshes in the list M has reached the specified number x (step S15). Immediately after step S13, |M| = 1, so if, for example, x = 6, this condition is not satisfied.

If the condition of step S15 is not satisfied, the mesh generation unit 180 selects the points P' for which the distance to the nearest of the boundaries of the meshes in the list M is maximal (step S19).

In the present embodiment, points (also called reference points) are generated according to the number |M| of kinds of meshes in the list M. Specifically, c is defined as c = ceil(log2(|M| + 1)) (ceil is the ceiling function), and points are generated by starting from the base point (2^(-c), 2^(-c)) and varying each coordinate within [0, 1) in steps of 2^(-c+1). At first |M| = 1, so c = 1, and only (1/2, 1/2) is generated.
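The candidate reference points for a given |M| can be enumerated as in the following sketch. The function name `candidate_points` and the use of exact fractions are illustrative choices, not part of the patent; the integer `bit_length` call is used only as an exact way to evaluate ceil(log2(|M| + 1)).

```python
from fractions import Fraction
from itertools import product

def candidate_points(num_meshes, dims=2):
    """Enumerate candidate reference points in [0, 1)^dims for |M| = num_meshes."""
    # For num_meshes >= 1, num_meshes.bit_length() equals ceil(log2(num_meshes + 1)).
    c = num_meshes.bit_length()
    base = Fraction(1, 2 ** c)        # base coordinate 2^(-c)
    step = Fraction(1, 2 ** (c - 1))  # step 2^(-c+1)
    axis = []
    v = base
    while v < 1:
        axis.append(v)
        v += step
    return list(product(axis, repeat=dims))

print(candidate_points(1))       # only (1/2, 1/2)
print(len(candidate_points(2)))  # 4 points with coordinates 1/4 or 3/4
print(len(candidate_points(4)))  # 16 points with coordinates 1/8, 3/8, 5/8, 7/8
```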

Furthermore, from among the selected points P', the mesh generation unit 180 adds to the list M the mesh of the point for which the distance to the nearby lattice points of the meshes in the list M is maximal (step S21). At first, the mesh whose reference point is (1/2, 1/2) is added to the list M. The processing then returns to step S15.

In this state, meshes such as those shown in FIG. 7 have been generated. Note that the side length of the mesh elements is still 1.

When, in step S15, the number |M| of kinds of meshes in the list M has reached the specified number x, the mesh generation unit 180 sets each mesh according to the mesh element size d and stores it in the fourth data storage unit 190 (step S23). The processing then returns to the calling process.

In the example above, when the processing moves to step S19 with |M| = 2, c = 2, so P = {(1/4, 1/4), (1/4, 3/4), (3/4, 1/4), (3/4, 3/4)} is generated. At this stage, every point is at the same distance from the lattice points of the meshes in the list M, so in step S21 one of the points, for example p = (1/4, 1/4), is chosen and added to the list M. That is, M = {(0, 0), (1/2, 1/2), (1/4, 1/4)}.

Next, when the processing moves to step S19 with |M| = 3, c remains 2. The same P as before is therefore generated, but (3/4, 1/4) and (1/4, 3/4) lie on edges of the mesh of (1/4, 1/4), so (3/4, 3/4), which is farther away, is adopted. In step S21 as well, p = (3/4, 3/4) is selected and added to the list M. That is, M = {(0, 0), (1/2, 1/2), (1/4, 1/4), (3/4, 3/4)}.

Next, when the processing moves to step S19 with |M| = 4, c = 3, so P = {(1/8, 1/8), (1/8, 3/8), ..., (3/8, 1/8), (3/8, 3/8), ..., (7/8, 7/8)} is generated. Among the points in P, p = (1/8, 5/8), which is one of the points whose distance to the nearby lattice points of the meshes in the list M is maximal, is added to the list M. Here, whether the distance to the nearby lattice points is maximal is decided by comparing distances to lattice points in turn: the Euclidean distance to the nearest lattice point is compared first; if that is tied, the Euclidean distance to the next nearest lattice point is compared; and so on. As an example, the distance from the point (1/8, 5/8) to (1/4, 1/4), one of its nearest lattice points, is √10/8, whereas the distance from (1/8, 3/8) to its nearest lattice point (1/4, 1/4) is √2/8, so the latter point, whose distance is shorter, is not adopted. Then M = {(0, 0), (1/2, 1/2), (1/4, 1/4), (3/4, 3/4), (1/8, 5/8)}.
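The two distances quoted in this example can be checked with a short sketch. It assumes unit-side meshes, so that the lattice points of a mesh with reference point r are r plus any integer vector; the helper name `nearest_lattice_sq_dist` is hypothetical. Squared distances are used so that the arithmetic stays exact.

```python
from fractions import Fraction as F

def nearest_lattice_sq_dist(p, ref):
    """Squared Euclidean distance from p to the nearest lattice point of the
    unit mesh whose reference point is ref (lattice points are ref + integers)."""
    total = F(0)
    for pi, ri in zip(p, ref):
        off = (pi - ri) % 1              # offset within one unit cell, in [0, 1)
        total += min(off, 1 - off) ** 2  # nearer of the two neighbouring lattice coordinates
    return total

ref = (F(1, 4), F(1, 4))
print(nearest_lattice_sq_dist((F(1, 8), F(5, 8)), ref))  # 5/32 = 10/64, i.e. distance sqrt(10)/8
print(nearest_lattice_sq_dist((F(1, 8), F(3, 8)), ref))  # 1/32 = 2/64, i.e. distance sqrt(2)/8
```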

Next, when the processing moves to step S19 with |M| = 5, c remains 3. Processing in the same way selects (5/8, 1/8), giving M = {(0, 0), (1/2, 1/2), (1/4, 1/4), (3/4, 3/4), (1/8, 5/8), (5/8, 1/8)}. Now |M| = x = 6, and from d = (6, 6) the following mesh reference points are generated: M = {(0, 0), (3, 3), (3/2, 3/2), (9/2, 9/2), (3/4, 15/4), (15/4, 3/4)}. The processing then returns to the calling process.
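The whole walk-through of steps S11 to S23 can be reproduced with the following sketch. It is one possible reading, not the patent's implementation: all function names are hypothetical, ties are broken by enumeration order (the text only says that one of the tied points is chosen), and the lattice comparison in step S21 is assumed to use the nearest lattice point of each mesh in M, sorted from nearest to farthest.

```python
from fractions import Fraction
from itertools import product

def axis_gap(p, r):
    """Distance along one axis between unit-mesh walls through p and through r."""
    off = (p - r) % 1
    return min(off, 1 - off)

def candidates(n, dims):
    c = n.bit_length()  # equals ceil(log2(n + 1)) for n >= 1
    axis = [Fraction(2 * k + 1, 2 ** c) for k in range(2 ** (c - 1))]
    return list(product(axis, repeat=dims))

def wall_distance(p, meshes):
    """Smallest gap between any wall of the candidate mesh and any existing wall."""
    return min(axis_gap(pi, ri) for r in meshes for pi, ri in zip(p, r))

def lattice_key(p, meshes):
    """Squared distances to each mesh's nearest lattice point, nearest first."""
    return sorted(sum(axis_gap(pi, ri) ** 2 for pi, ri in zip(p, r)) for r in meshes)

def generate_reference_points(x, d):
    dims = len(d)
    meshes = [(Fraction(0),) * dims]                   # S11, S13: start from the origin
    while len(meshes) < x:                             # S15
        cand = candidates(len(meshes), dims)
        best = max(wall_distance(p, meshes) for p in cand)
        shortlist = [p for p in cand if wall_distance(p, meshes) == best]    # S19
        meshes.append(max(shortlist, key=lambda p: lattice_key(p, meshes)))  # S21
    return [tuple(v * s for v, s in zip(m, d)) for m in meshes]              # S23

for point in generate_reference_points(6, (6, 6)):
    print(tuple(str(v) for v in point))
```

Under these assumptions the sketch yields the same six reference points as the text, in the same order, for x = 6 and d = (6, 6).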

図６の処理フローでは、メッシュの境界を共有しないように、ずらしてメッシュを生成するようにしている。このようにずらしてメッシュを生成することで、以下で述べるグループ化処理で、固定のメッシュ要素単位でグループ化を行うが、多様なグループ化が可能になる。 In the processing flow of FIG. 6, the meshes are generated by shifting so as not to share the boundary of the meshes. By generating meshes by shifting in this way, grouping is performed in units of fixed mesh elements in the grouping process described below, but various groupings are possible.

なお、上で述べた基準点は、図８に示すようにばらつくようになる。２次元の矩形格子を採用しているので、メッシュがそれぞれ少しずつずらして生成されていることがこの図からも分かる。なお、図中の数字は生成順番を示している。また、図９に、実際のメッシュの重なり状態を示している。ここでは６番目に生成されるメッシュのメッシュ要素までを１つずつ示しているが、メッシュ要素の境界は共有されていない。 Note that the reference points described above vary as shown in FIG. Since a two-dimensional rectangular grid is used, it can be seen from this figure that the meshes are generated with a slight shift. The numbers in the figure indicate the generation order. FIG. 9 shows an actual mesh overlapping state. Here, the mesh elements of the mesh generated sixth are shown one by one, but the boundaries of the mesh elements are not shared.

なお、図６の処理フローは、リストＭに含まれるメッシュから最も遠い位置のメッシュをリストＭに追加している処理とも言える。最も遠い位置の定義としては、以下のようなものとなる。 Note that the processing flow of FIG. 6 can also be said to be processing in which a mesh farthest from the meshes included in the list M is added to the list M. The definition of the farthest position is as follows.

That is, a function cmp(M, m1, m2): {−1, 0, 1} is prepared that takes the list M and two meshes m1 and m2 as input and returns −1 if m2 is farther from M than m1, 1 if m1 is farther from M than m2, and 0 if m1 and m2 are equally far. The set F of farthest meshes is then defined as F = {m | ∀m', cmp(M, m, m') ≥ 0}.

For example, applying the processing of steps S19 and S21 in the processing flow of FIG. 6 to cmp(M, m1, m2) gives the following.
1. Let d1 and d2 be the distances from an arbitrary boundary (side) of m1 and of m2, respectively, to the nearest of the boundaries of the meshes in the list M. Return −1 if d1 < d2, and 1 if d1 > d2.
2. Let d1 and d2 be the distances from the reference points of m1 and m2, respectively, to the nearest of the grid points of the meshes in the list M. Return −1 if d1 < d2, and 1 if d1 > d2.
3. Otherwise, as in 2, compare the distances to the second-nearest grid point, ..., and finally to the |M|-th nearest grid point, and return −1 or 1 as soon as one of these comparisons is decided.
4. Otherwise, return 0.

F is obtained with this function cmp, and from F the mesh whose reference point is closest to the origin is selected.

Returning to the description of the processing flow of FIG. 5, the grouping processing unit 130 executes the group generation process on the records stored in the first data storage unit 110, in accordance with the data stored in the fourth data storage unit 190 and the setting data storage unit 120 (step S3). The group generation process will be described with reference to FIGS. 10 to 38.

まず、グループ化処理部１３０は、第４データ格納部１９０におけるリストＭから、未処理のメッシュｍを１つ特定する（図１０：ステップＳ３１）。そして、グループ化処理部１３０は、第１データ格納部１１０に格納されているレコード群を、特定されたメッシュｍのメッシュ要素で分類する（ステップＳ３３）。 First, the grouping processing unit 130 identifies one unprocessed mesh m from the list M in the fourth data storage unit 190 (FIG. 10: step S31). Then, the grouping processing unit 130 classifies the record group stored in the first data storage unit 110 by the mesh element of the identified mesh m (step S33).

Specifically, a mesh element identifier (ID) is calculated for each record, and a table of correspondences between IDs and row numbers is generated. For example, the mesh element ID is calculated as floor((numerical attribute − m)/d), where floor is the floor function. For example, the record with row number "8" in FIG. 4 has (X, Y) = (6, 5), and m = (0, 0) and d = (6, 6), so the mesh element ID is (floor((6 − 0)/6), floor((5 − 0)/6)) = (1, 0).
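The mesh element ID calculation can be sketched as follows. The function names `mesh_element_id` and `classify` and the record layout (a mapping from row number to coordinates) are assumptions made for illustration; only the floor formula itself comes from the description.

```python
import math
from collections import defaultdict
from fractions import Fraction


def mesh_element_id(point, m, d):
    """Per-dimension floor((x - m) / d), as in the description."""
    return tuple(math.floor(Fraction(x - mi) / di) for x, mi, di in zip(point, m, d))


def classify(records, m, d):
    """Build the correspondence table: mesh element ID -> list of row numbers."""
    table = defaultdict(list)
    for row, point in records.items():
        table[mesh_element_id(point, m, d)].append(row)
    return dict(table)


# Row 8 of FIG. 4 has (X, Y) = (6, 5); the other rows are not reproduced here.
print(mesh_element_id((6, 5), (0, 0), (6, 6)))
print(classify({8: (6, 5)}, (3, 3), (6, 6)))
```

Because floor is used rather than truncation, records to the left of or below the reference point receive negative element IDs, such as the ID (−1, 0) that appears later for the mesh (3, 3).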

以上のような処理を行うと、図１１に示すような分類がなされる。図１１の例では、３つのメッシュ要素に、各レコードが分類されている。 When the above processing is performed, classification as shown in FIG. 11 is performed. In the example of FIG. 11, each record is classified into three mesh elements.

次に、グループ化処理部１３０は、レコードを含むメッシュ要素のうち、未処理のメッシュ要素ｓを１つ特定する（ステップＳ３５）。例えば、メッシュ要素ＩＤ（０，０）のメッシュ要素を特定する。 Next, the grouping processing unit 130 identifies one unprocessed mesh element s from among the mesh elements including the record (step S35). For example, the mesh element with the mesh element ID (0, 0) is specified.

そして、グループ化処理部１３０は、特定されたメッシュ要素ｓ内において譲れないレコードとして第２データ格納部１４０に登録されているレコード以外のレコードＲａを抽出する（ステップＳ３７）。最初の場合には、譲れないレコードとして登録されているレコードは存在しない。但し、以下で述べる処理では譲れないレコードとして登録されているレコードが存在する場合には、当該レコードを除外してレコードＲａを特定する。最初の例では、Ｒａ＝｛１，２，３，４，５，６，７｝となる。 Then, the grouping processing unit 130 extracts a record Ra other than the record registered in the second data storage unit 140 as a record that cannot be transferred in the identified mesh element s (step S37). In the first case, there is no record registered as a record that cannot be transferred. However, if there is a record registered as a record that cannot be transferred in the process described below, the record Ra is specified by excluding the record. In the first example, Ra = {1, 2, 3, 4, 5, 6, 7}.

Thereafter, the grouping processing unit 130 executes the record group Rs extraction process (step S39). This process will be described with reference to FIGS. 12 to 19. When it ends, processing proceeds to the process of FIG. 20 via terminal A.

First, the grouping processing unit 130 generates a frequency distribution for the records Ra and determines whether the records Ra contain l or more kinds of ID attribute values (step S133). If they do not, the grouping processing unit 130 sets the record group Rs to empty (step S138), and the process returns to the caller, because grouping cannot be performed without l or more kinds of ID attribute values.

On the other hand, when the records Ra contain l or more kinds of ID attribute values, the grouping processing unit 130 determines whether the frequency distribution of the records Ra satisfies the condition a (= attenuation rate) of the frequency distribution pattern (step S134). Assume that l = 3 and a = 0.5 are set. The condition is that, with the frequencies arranged in descending order, the n-th frequency is at least a = 0.5 times the (n−1)-th frequency. For example, the frequency distribution {B: 2, A: 1, C: 1} satisfies this condition.

The attenuation rate a is used so that no group with an excessively skewed frequency distribution is formed, that is, to improve safety. It is a condition that disallows groups whose frequency distribution is too skewed even though they contain l or more kinds of IDs. For example, consider a range containing records whose frequency distribution is {A: 100, C: 1} (100 records of A and 1 record of C). Since this range contains two kinds of IDs, it is not contained in a region where only one specific ID can exist. However, 99% or more of the records are A, so most of this range may be a region where only A can exist. If so, disclosing this range is a problem, because it becomes easy to infer that most of the records in the range were provided by A.
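The check of steps S133 and S134 might look like the sketch below. The name `satisfies_pattern` is hypothetical. The sketch applies the attenuation condition only to the l largest frequencies; this is an assumption, suggested by the later walkthroughs in which frequencies outside the top l are left unconstrained, and it is not stated explicitly at this point of the description.

```python
def satisfies_pattern(freq, l, a):
    """Return True if freq (ID -> count) has at least l kinds of IDs and,
    among the l largest counts in descending order, each count is at least
    a times the preceding one."""
    counts = sorted(freq.values(), reverse=True)
    if len(counts) < l:          # step S133: fewer than l kinds of IDs
        return False
    top = counts[:l]             # assumption: only the top l ranks are constrained
    return all(top[n] >= a * top[n - 1] for n in range(1, l))


print(satisfies_pattern({"B": 2, "A": 1, "C": 1}, l=3, a=0.5))
print(satisfies_pattern({"A": 100, "C": 1}, l=2, a=0.5))
```

The first call corresponds to the example {B: 2, A: 1, C: 1} with l = 3, and the second to the skewed distribution {A: 100, C: 1}, which is rejected for a = 0.5 even when only two kinds of IDs are required.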

レコードＲａについての度数分布が、度数分布パターンにおける条件を満たしている場合には、グループ化処理部１３０は、レコード群Ｒｓに、レコードＲａを全て設定する（ステップＳ１３６）。そして、処理は呼出元の処理に戻る。 When the frequency distribution for the record Ra satisfies the condition in the frequency distribution pattern, the grouping processing unit 130 sets all the records Ra in the record group Rs (step S136). Then, the process returns to the caller process.

On the other hand, when the frequency distribution of the records Ra does not satisfy the condition of the frequency distribution pattern, the grouping processing unit 130 executes the exclusion process (step S135). As long as the condition l is satisfied, records can be excluded so that the condition a is satisfied, which is why the exclusion process is executed. The exclusion process will be described with reference to FIGS. 13 to 19.

そして、グループ化処理部１３０は、予め定められているルールに従って、除外処理で決定された数の除外すべきレコードを特定して、レコードＲａから除外して、残余をレコード群Ｒｓに設定する（ステップＳ１３７）。所定のルールは、例えばランダムといったような単純な方法でよい。 Then, according to a predetermined rule, the grouping processing unit 130 specifies the number of records to be excluded determined by the exclusion process, excludes the records from the record Ra, and sets the remainder in the record group Rs ( Step S137). The predetermined rule may be a simple method such as random.

ここで、除外処理について説明する。まず、グループ化処理部１３０は、レコードＲａについて度数分布表Ｆを生成し、度数の昇順に整列させる（図１３：ステップＳ１４１）。上で述べた例では、除外処理は行われないので、ここでは図１４に示すような度数分布表Ｆが生成されたものとする。また、ｌ＝４且つａ＝０．５であるものとする。 Here, the exclusion process will be described. First, the grouping processing unit 130 generates a frequency distribution table F for the records Ra and arranges them in ascending order of the frequencies (FIG. 13: step S141). In the example described above, since the exclusion process is not performed, it is assumed here that a frequency distribution table F as shown in FIG. 14 is generated. Also assume that l = 4 and a = 0.5.

そして、グループ化処理部１３０は、変数ｐを初期化し（ステップＳ１４３）、変数ｉを０に初期化する（ステップＳ１４５）。その後、グループ化処理部１３０は、ｉが度数分布表Ｆの行数｜Ｆ｜より小さいか判断する（ステップＳ１４７）。ｉが度数分布表Ｆの行数｜Ｆ｜より小さい場合には、グループ化処理部１３０は、ｉ＋ｌ−１が｜Ｆ｜より小さいか判断する（ステップＳ１４９）。ｉ＋ｌ−１が｜Ｆ｜より小さい場合には、グループ化処理部１３０は、変数ｐに対してＦ［ｉ］を代入する（ステップＳ１５１）。Ｆ［ｉ］は、Ｆのｉ＋１行目の度数である。ｉ＝０であれば、変数ｐには、Ｆの１行目の度数「１」が代入される。 Then, the grouping processing unit 130 initializes the variable p (step S143) and initializes the variable i to 0 (step S145). Thereafter, the grouping processing unit 130 determines whether i is smaller than the number of rows | F | of the frequency distribution table F (step S147). If i is smaller than the number of rows | F | in the frequency distribution table F, the grouping processing unit 130 determines whether i + l−1 is smaller than | F | (step S149). When i + l−1 is smaller than | F |, the grouping processing unit 130 substitutes F [i] for the variable p (step S151). F [i] is the frequency of the (i + 1) th row of F. If i = 0, the frequency “1” in the first row of F is assigned to the variable p.

On the other hand, if i + l − 1 is equal to or greater than |F|, the grouping processing unit 130 assigns min(F[i], floor(p/a)) to the variable p (step S153). min(A, B) is a function that outputs the smaller of A and B.

After step S151 or S153, the grouping processing unit 130 assigns F[i] − p to F[i] (step S155). When the processing from step S149 onward is executed with i = 0, the frequency distribution table F becomes as shown in FIG. 15.

その後、グループ化処理部１３０は、変数ｉを１インクリメントし（ステップＳ１５７）、処理はステップＳ１４７に戻る。 Thereafter, the grouping processing unit 130 increments the variable i by 1 (step S157), and the process returns to step S147.

In the second execution of step S147, |F| = 5 and i = 1, so i < |F|. Since l = 4, i + l − 1 < |F| also holds. Therefore p = 3 in step S151, and F[1] = 3 − 3 = 0. The frequency distribution table F then becomes as shown in FIG. 16. After that, i = 2.

In the third execution of step S147, |F| = 5 and i = 2, so i < |F|. Since l = 4, i + l − 1 < |F| does not hold, and the process proceeds to step S153. With a = 0.5 and p = 3, min(F[i] = 4, floor(p/a) = 6) = 4, so F[2] = 4 − 4 = 0. The frequency distribution table F then becomes as shown in FIG. 17. After that, i = 3.

In the fourth execution of step S147, |F| = 5 and i = 3, so i < |F|. Since l = 4, i + l − 1 < |F| does not hold, and the process proceeds to step S153. With a = 0.5 and p = 4, min(F[i] = 9, floor(p/a) = 8) = 8, so F[3] = 9 − 8 = 1. The frequency distribution table F then becomes as shown in FIG. 18. After that, i = 4.

In the fifth execution of step S147, |F| = 5 and i = 4, so i < |F|. Since l = 4, i + l − 1 < |F| does not hold, and the process proceeds to step S153. With a = 0.5 and p = 8, min(F[i] = 10, floor(p/a) = 16) = 10, so F[4] = 10 − 10 = 0. The frequency distribution table F then becomes as shown in FIG. 19. After that, i = 5.

６回目のステップＳ１４７では、｜Ｆ｜＝５、ｉ＝５であるから、ｉ＜｜Ｆ｜が成り立たなくなる。そうすると、処理は呼出元の処理に戻る。すなわち、この時点における度数分布表Ｆ（図１９）が、除外すべきレコードを示している。ここでは、ＩＤが「Ｅ」のレコードを１つ除外することになる。除外するレコードについては、上で述べたように、ランダムに選択すればよい。 In the sixth step S147, since | F | = 5 and i = 5, i <| F | does not hold. Then, the process returns to the caller process. That is, the frequency distribution table F (FIG. 19) at this time indicates records to be excluded. Here, one record with ID “E” is excluded. The records to be excluded may be selected at random as described above.
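The loop of steps S141 to S157 can be written compactly as below. This is a sketch under assumptions: the name `excess_counts` is hypothetical, the initial value of p is taken to be 0 (the description only says that p is initialized), and the input frequencies [1, 3, 4, 9, 10] are those implied by the walkthrough, since FIG. 14 is not reproduced here.

```python
import math


def excess_counts(freqs, l, a):
    """Return, per row of the ascending frequency table, how many records
    must be excluded so that the remaining frequencies satisfy condition a."""
    F = sorted(freqs)                      # step S141: ascending order of frequency
    p = 0                                  # step S143 (initial value is an assumption)
    for i in range(len(F)):                # steps S145, S147, S157
        if i + l - 1 < len(F):             # step S149
            p = F[i]                       # step S151: keep every record of this row
        else:
            p = min(F[i], math.floor(p / a))   # step S153: cap by the previous kept count
        F[i] -= p                          # step S155: what is left over is the excess
    return F


print(excess_counts([1, 3, 4, 9, 10], l=4, a=0.5))
```

In the returned list, a nonzero entry is the number of records of that row to exclude; for the walkthrough values only the fourth row (ID "E" in the description) has an excess of one record.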

図１０の端子Ａを介して図２０の処理の説明に移行する。なお、上で述べていた例のように、Ｒａ＝｛１，２，３，４，５，６，７｝の場合、ｌ及びａの条件を満たすので、Ｒａ＝Ｒｓとなる。 The processing shifts to the description of the processing in FIG. 20 via the terminal A in FIG. As in the example described above, when Ra = {1, 2, 3, 4, 5, 6, 7}, since the conditions of l and a are satisfied, Ra = Rs.

グループ化処理部１３０は、レコード群Ｒｓが空であるか判断する（ステップＳ４１）。レコード群Ｒｓが空であると、グループ化は行われないので、処理はステップＳ５５に移行する。 The grouping processing unit 130 determines whether the record group Rs is empty (step S41). If the record group Rs is empty, no grouping is performed, and the process proceeds to step S55.

一方、レコード群Ｒｓが空でない場合には、グループ化処理部１３０は、レコード群Ｒｓに未グルーピングのレコードが存在するか否かを判断する（ステップＳ４３）。未グルーピングのレコードが存在しない場合には、新たなグループの生成をわざわざ行うことはないので、処理はステップＳ５５に移行する。 On the other hand, when the record group Rs is not empty, the grouping processing unit 130 determines whether or not an ungrouped record exists in the record group Rs (step S43). If there is no ungrouped record, no new group is generated, and the process proceeds to step S55.

一方、レコード群Ｒｓに未グルーピングのレコードが存在する場合には、グループ化処理部１３０は、レコード群Ｒｓでグループを生成する（ステップＳ４５）。Ｒｓ＝｛１，２，３，４，５，６，７｝が１つのグループとなる。 On the other hand, when an ungrouped record exists in the record group Rs, the grouping processing unit 130 generates a group with the record group Rs (step S45). Rs = {1, 2, 3, 4, 5, 6, 7} forms one group.

Then, the grouping processing unit 130 executes the transferable record extraction process on the generated group (step S47). That is, it extracts the records in the generated group other than those that are essential for satisfying the frequency distribution pattern. This will be described in more detail with reference to FIGS. 21 to 26.

まず、グループ化処理部１３０は、レコードＲｓについて度数分布表Ｆを生成し、度数の昇順に整列させる（図２１：ステップＳ１７１）。処理を分かり易くするために、図２２に示すような度数分布表Ｆが生成されたものとする。 First, the grouping processing unit 130 generates a frequency distribution table F for the records Rs and arranges them in ascending order of the frequencies (FIG. 21: step S171). In order to make the process easy to understand, it is assumed that a frequency distribution table F as shown in FIG. 22 is generated.

Then, the grouping processing unit 130 sets the variable ci to |F| − l and the variable min to ceil(F[ci] * a) (step S173). ceil(x) is the ceiling function, which outputs the smallest integer greater than or equal to the real number x. F[i] denotes the frequency in row i + 1 of the frequency distribution table F, and |F| denotes the number of rows of F. Here, ci = 5 − 4 = 1 and min = ceil(2 * 0.5) = 1.

また、グループ化処理部１３０は、変数ｉを０に初期化し、変数ｍａｘを０に初期化する（ステップＳ１７５）。 In addition, the grouping processing unit 130 initializes the variable i to 0 and initializes the variable max to 0 (step S175).

その後、グループ化処理部１３０は、ｉ＜｜Ｆ｜であるか判断する（ステップＳ１７７）。ｉ＜｜Ｆ｜であれば、グループ化処理部１３０は、変数ｃを初期化する（ステップＳ１７９）。その後、グループ化処理部１３０は、ｉ＜ｃｉであるか判断する（ステップＳ１８１）。ｉ＝０であれば、ｃｉ＝１であるからこの条件は満たされている。 Thereafter, the grouping processing unit 130 determines whether i <| F | is satisfied (step S177). If i <| F |, the grouping processing unit 130 initializes the variable c (step S179). Thereafter, the grouping processing unit 130 determines whether i <ci (step S181). If i = 0, this condition is satisfied because ci = 1.

ｉ＜ｃｉであれば、グループ化処理部１３０は、ｃに０を設定する（ステップＳ１８３）。そうすると、グループ化処理部１３０は、Ｆ［ｉ］に、Ｆ［ｉ］−ｃを設定する（ステップＳ１８５）。Ｆ［ｉ］＝１であり、ｃ＝０であるから、Ｆ［ｉ］＝１となる。その後、グループ化処理部１３０は、ｉを１インクリメントし（ステップＳ１８７）、処理はステップＳ１７７に戻る。 If i <ci, the grouping processing unit 130 sets c to 0 (step S183). Then, the grouping processing unit 130 sets F [i] -c for F [i] (step S185). Since F [i] = 1 and c = 0, F [i] = 1. Thereafter, the grouping processing unit 130 increments i by 1 (step S187), and the process returns to step S177.

When i = 1, i < ci no longer holds in step S181, so the grouping processing unit 130 determines whether i + 1 = |F| (step S189). With i = 1, i + 1 = 2, so this condition is not satisfied. When the condition of step S189 is not satisfied, the grouping processing unit 130 assigns ceil(F[i + 1] * a) to c (step S191), giving c = ceil(F[2] * 0.5) = 2. The grouping processing unit 130 then determines whether max < c (step S193). Since max = 0, this condition is satisfied, and the grouping processing unit 130 assigns c to max (step S197), that is, max = c = 2. The process then proceeds to step S185, and in the second execution of step S185, F[1] = 2 − 2 = 0. The frequency distribution table F therefore becomes as shown in FIG. 23.

When i = 2, i < ci does not hold in step S181, so the process proceeds to step S189. Since i + 1 < |F|, the process proceeds to step S191, and c = ceil(F[3] * 0.5) = 2. Since max = 2, the condition max < c is not satisfied, so the grouping processing unit 130 assigns min to c (step S195). Since min = 1, c = 1. The process then proceeds to step S185, and in the third execution of step S185, F[2] = 3 − 1 = 2. The frequency distribution table F therefore becomes as shown in FIG. 24.

ｉ＝３になると、ステップＳ１８１では、ｉ＜ｃｉは成り立たないので、ステップＳ１８９に移行する。但し、ｉ＋１＜｜Ｆ｜であるから、ステップＳ１９１に処理は移行し、ｃ＝ｃｅｉｌ（Ｆ［４］＊０．５）＝３となる。ｍａｘ＝２でｃ＝３であるから、ｍａｘ＜ｃの条件を満たしている。従って、ｍａｘ＝ｃ＝３となる。そして、４回目のステップＳ１８５では、Ｆ［３］＝４−３＝１となる。従って、図２５に示すような度数分布表Ｆとなる。 When i = 3, since i <ci does not hold in step S181, the process proceeds to step S189. However, since i + 1 <| F |, the process proceeds to step S191, and c = ceil (F [4] * 0.5) = 3. Since max = 2 and c = 3, the condition of max <c is satisfied. Therefore, max = c = 3. In the fourth step S185, F [3] = 4-3 = 1. Therefore, a frequency distribution table F as shown in FIG. 25 is obtained.

ｉ＝４になると、ステップＳ１８１では、ｉ＜ｃｉは成り立たないので、ステップＳ１８９に移行する。ｉ＋１＝｜Ｆ｜を満たすので、ステップＳ１９５に処理は移行し、ｃ＝ｍｉｎ＝１となる。そして処理はステップＳ１８５に移行して、５回目のステップＳ１８５では、Ｆ［４］＝Ｆ［４］−ｃ＝５−１＝４となる。従って、図２６に示すような度数分布表Ｆが得られる。 When i = 4, since i <ci does not hold in step S181, the process proceeds to step S189. Since i + 1 = | F | is satisfied, the process proceeds to step S195, where c = min = 1. Then, the process proceeds to step S185, and in the fifth step S185, F [4] = F [4] −c = 5-1 = 4. Therefore, a frequency distribution table F as shown in FIG. 26 is obtained.

その後ｉ＝５になると、ステップＳ１７７ではｉ＜｜Ｆ｜の条件を満たさなくなるので、処理は呼出元の処理に戻る。従って図２６に示すように、ＩＤがＡの１レコード、ＩＤがＣの２レコード、ＩＤがＤの１レコード、及びＩＤがＥの４レコードが、譲渡可能なレコードとして特定されたことになる。 Thereafter, when i = 5, the condition of i <| F | is not satisfied in step S177, and the process returns to the caller process. Therefore, as shown in FIG. 26, one record whose ID is A, two records whose ID is C, one record whose ID is D, and four records whose ID is E are specified as transferable records.
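Steps S171 to S197 can be sketched in the same way. The name `transferable_counts` is hypothetical, and the input frequencies [1, 2, 3, 4, 5] (IDs A to E) are those implied by the walkthrough, since FIG. 22 is not reproduced here. The variables `min_keep` and `max_keep` stand for the variables min and max of the description, renamed to avoid shadowing Python built-ins.

```python
import math


def transferable_counts(freqs, l, a):
    """Return, per row of the ascending frequency table, how many records
    can be handed over to another group without breaking conditions l and a."""
    F = sorted(freqs)                          # step S171
    ci = len(F) - l                            # step S173
    min_keep = math.ceil(F[ci] * a)            # step S173: "min" in the description
    max_keep = 0                               # step S175: "max" in the description
    for i in range(len(F)):                    # steps S177, S187
        if i < ci:                             # step S181
            c = 0                              # step S183: row is not needed for condition l
        elif i + 1 == len(F):                  # step S189
            c = min_keep                       # step S195
        else:
            c = math.ceil(F[i + 1] * a)        # step S191
            if max_keep < c:                   # step S193
                max_keep = c                   # step S197
            else:
                c = min_keep                   # step S195
        F[i] -= c                              # step S185: keep c records, the rest is transferable
    return F


print(transferable_counts([1, 2, 3, 4, 5], l=4, a=0.5))
```

For the walkthrough values the result corresponds to one transferable record for A, none for B, two for C, one for D and four for E, matching the table described for FIG. 26.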

Returning to the description of the processing of FIG. 20, the grouping processing unit 130 concretely identifies the transferable records according to a predetermined rule, based on the result of step S47 (step S49). The predetermined rule may be, for example, random selection or descending order of row number.

上で述べた例では、Ｒｓ＝｛１，２，３，４，５，６，７｝のうち、譲れるレコードは｛２，４，６，７｝であり、譲れないレコードは｛１，３，５｝となる。 In the example described above, among Rs = {1, 2, 3, 4, 5, 6, 7}, the record that can be transferred is {2, 4, 6, 7}, and the record that cannot be transferred is {1, 3, 5}.

ここまでの処理によって、第２データ格納部１４０には、図２７乃至図２９のデータが格納される。図２７は、グループ化が行われたメッシュ及びメッシュ要素ＩＤとグループＩＤ（Ｇ＿ＩＤ）との対応付けを行うテーブルを表している。グループＩＤについてはユニークになるようにシリアルに番号付けする例を示している。また、図２８では、グループＩＤに対応付けて、当該グループに属するレコードのうち譲れないレコードの行番号集合と、譲れるレコードの行番号集合とが登録されるようになっている。さらに、図２９は、行番号とグループＩＤとの対応付けを示している。 Through the processing so far, the data of FIGS. 27 to 29 is stored in the second data storage unit 140. FIG. 27 shows a table that associates grouped meshes and mesh element IDs with group IDs (G_IDs). In the example, the group ID is serially numbered so as to be unique. Also, in FIG. 28, in association with a group ID, a row number set of records that cannot be transferred among records belonging to the group and a row number set of records that can be transferred are registered. Furthermore, FIG. 29 shows the correspondence between row numbers and group IDs.

そして、グループ化処理部１３０は、他のグループから譲ってもらったレコードが存在しているか否かを判断する（ステップＳ５１）。他のグループにグループ化されていたが、譲れるレコードに分類されていて今回生成されたグループに属するレコードが存在する場合には、所属グループを変更することになる。従って、他のグループから譲ってもらったレコードが存在している場合には、グループ化処理部１３０は、第２データ格納部１４０において、譲渡元グループから、譲ってもらったレコードの行番号を削除する（ステップＳ５３）。一方、他のグループから譲ってもらったレコードが存在していない場合には、処理はステップＳ５５に移行する。 Then, the grouping processing unit 130 determines whether there is a record given from another group (step S51). If there is a record that has been grouped into another group but is classified as a record to be transferred and belongs to the group generated this time, the group to which the group belongs is changed. Therefore, when there is a record transferred from another group, the grouping processing unit 130 deletes the row number of the record transferred from the transfer source group in the second data storage unit 140. (Step S53). On the other hand, if there is no record given from another group, the process proceeds to step S55.

Then, the grouping processing unit 130 determines whether an unprocessed mesh element exists (step S55). If so, the process returns to step S35 in FIG. 10 via terminal B. If not, the grouping processing unit 130 determines whether an unprocessed mesh exists (step S57). If so, the process returns to step S31 in FIG. 10 via terminal C. If not, the process returns to the calling process.

なお、上で述べた例では、メッシュ（０，０）についてメッシュ要素ＩＤ（１，０）に対する処理に移行する。但し、Ｒａ＝｛８｝となるので、ｌ及びａの条件を満たさないので、Ｒｓは空に設定されて、グループ化は行われない。 In the example described above, the processing shifts to mesh element ID (1, 0) for mesh (0, 0). However, since Ra = {8}, the conditions of l and a are not satisfied, so Rs is set to be empty and no grouping is performed.

また、メッシュ（０，０）についてメッシュ要素ＩＤ（１，１）に対する処理に移行する。この場合、Ｒａ＝｛９，１０，１１，１２｝となって、ｌ及びａの条件を満たすので、Ｒａ＝Ｒｓ＝｛９，１０，１１，１２｝が設定され、グループもそのまま生成される。なお、譲渡可能レコードの抽出処理（図２１）を実行すると、｛１２｝が譲れるレコードに特定されるので、譲れないレコードは｛９，１０，１１｝となる。 Further, the process proceeds to the process for the mesh element ID (1, 1) for the mesh (0, 0). In this case, Ra = {9, 10, 11, 12} and the conditions of l and a are satisfied. Therefore, Ra = Rs = {9, 10, 11, 12} is set, and the group is also generated as it is. . When the transferable record extraction process (FIG. 21) is executed, {12} is specified as the record to be transferred, so the record that cannot be transferred is {9, 10, 11}.

ここまで処理すると、第２データ格納部１４０には、図３０乃至図３２に示すようなデータが格納される。図３０は、図２７の後の状態を表しており、グループＩＤ「２」が、メッシュ（０，０）及びメッシュ要素ＩＤ（１，１）に付与されている。また、図３１は、図２８の後の状態を表しており、グループＩＤ「２」について、上で述べたように、譲れないレコードの行番号集合｛９，１０，１１｝、及び譲れるレコードの行番号集合｛１２｝が示されている。図３２は、図２９の後の状態を表しており、グループＩＤ「２」についての行のデータが追加されている。 When the processing so far is performed, the second data storage unit 140 stores data as illustrated in FIGS. 30 to 32. FIG. 30 shows a state after FIG. 27, and the group ID “2” is assigned to the mesh (0, 0) and the mesh element ID (1, 1). FIG. 31 shows the state after FIG. 28, and for group ID “2”, as described above, the row number set {9, 10, 11} of records that cannot be transferred, and the records that can be transferred A row number set {12} is shown. FIG. 32 shows a state after FIG. 29, and row data for the group ID “2” is added.

次に、メッシュ（３，３）の処理に移行して、レコードの分類を行うと、図３３のような分類結果が得られる。２つのメッシュ要素に分類されたことが分かる。 Next, when the processing is shifted to the mesh (3, 3) process and the records are classified, a classification result as shown in FIG. 33 is obtained. It can be seen that it is classified into two mesh elements.

まず、メッシュ要素ＩＤ（−１，０）のメッシュ要素に分類されたレコードを処理する。レコード｛１，２｝は、ｌ及びａの条件を満たさないので、メッシュ要素ＩＤ（０，０）の処理に移行する。 First, a record classified as a mesh element having a mesh element ID (-1, 0) is processed. Since the record {1, 2} does not satisfy the conditions of l and a, the process proceeds to processing of the mesh element ID (0, 0).

そうすると、｛３，４，５，６，７，８，９，１０，１１，１２｝のうち、これまでに生成されたグループにおいて譲れないレコードは｛１，３，５，９，１０，１１｝であるから、Ｒａ＝｛４，６，７，８，１２｝となる。このＲａはｌ及びａの条件を満たすので、Ｒａ＝Ｒｓとなる。 Then, among {3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, records that cannot be transferred in the groups generated so far are {1, 3, 5, 9, 10, 11 }, Ra = {4, 6, 7, 8, 12}. Since Ra satisfies the conditions of l and a, Ra = Rs.

Ｒｓの中にはレコード｛８｝が含まれているので、Ｒｓでグループを生成することになる。さらに、譲渡可能レコードの抽出処理（図２１）を実行すると、譲れるレコードは｛７，８｝と特定され、譲れないレコードは｛４，６，１２｝となる。なお、これらは図３１から分かるように既にグループ化されているので、各グループから譲ってもらうことになる。 Since Rs includes a record {8}, a group is generated with Rs. Further, when the transferable record extraction process (FIG. 21) is executed, the record that can be transferred is identified as {7, 8}, and the record that cannot be transferred becomes {4, 6, 12}. Since these are already grouped as can be seen from FIG. 31, they are handed over from each group.

Then, the data stored in the second data storage unit 140 is updated as shown in FIGS. 34 to 36. FIG. 34 shows the state after FIG. 30: the mesh element ID (0, 0) of the mesh (3, 3) has been added in association with a group ID. FIG. 35 shows the state after FIG. 31: the non-transferable records {4, 6, 12} and the transferable records {7, 8} are registered in association with the group ID "3". Since all of these records except record {8} were transferred from other groups, their row numbers have been deleted from groups "1" and "2". FIG. 36 shows the state after FIG. 32: record {8} has been newly added, and the group IDs of the transferred records have been changed.

さらに、メッシュ（３／２，３／２）についての処理に移行する。そうすると、図３７に示すような分類結果が得られる。図３７からも分かるように、メッシュ要素ＩＤ（−１，０）（０，１）（１，１）はｌ及びａの条件を満たさない。メッシュ要素ＩＤ（０，０）の場合には、これまでに生成されたグループにおいて譲れないレコードは｛１，３，４，５，６，９，１０，１１，１２｝であるから、Ｒａ＝｛２，７，８｝となる。これは、ｌ及びａの条件を満たすので、Ｒａ＝Ｒｓとなる。但し、｛２，７，８｝は皆グルーピング済みであるので、新たなグループが生成されないことになる。 Further, the processing shifts to mesh (3/2, 3/2). Then, a classification result as shown in FIG. 37 is obtained. As can be seen from FIG. 37, the mesh element ID (-1, 0) (0, 1) (1, 1) does not satisfy the conditions of l and a. In the case of the mesh element ID (0, 0), since the records that cannot be transferred in the groups generated so far are {1, 3, 4, 5, 6, 9, 10, 11, 12}, Ra = {2, 7, 8}. Since this satisfies the conditions of l and a, Ra = Rs. However, since {2, 7, 8} are all grouped, no new group is generated.

以下、他のメッシュについて処理を行っても、新たにグループが生成されることはない。 Hereinafter, even if processing is performed on other meshes, no new group is generated.

以上の処理結果を、平面上に表したのが図３８である。図３８において、三角はＩＤ「Ａ」のレコードを表し、×はＩＤ「Ｂ」のレコードを表し、丸はＩＤ「Ｃ」のレコードを表し、このような記号の脇の番号は行番号を表す。 FIG. 38 shows the above processing results on a plane. In FIG. 38, a triangle represents a record with ID “A”, a cross represents a record with ID “B”, a circle represents a record with ID “C”, and a number beside such a symbol represents a row number. .

Group "1" corresponds to the rectangle 1001 and contains records {1, 2, 3, 5}. Group "2" corresponds to the rectangle 1002 and contains records {9, 10, 11}. Group "3" corresponds to the rectangle 1003 and contains records {4, 6, 7, 8, 12}. In this way, the records in the overlapping regions are shared among the groups, and no rectangle contains records of only one specific ID.

Returning to the description of the processing of FIG. 5, the anonymization processing unit 150 performs anonymization for each group on the data stored in the second data storage unit 140 and stores the result in the third data storage unit 160 (step S5). For example, the data is converted so that the records belonging to a group are located at the center point of the mesh element corresponding to that group, and the ID, which is the confidential attribute, is deleted. For example, for the record group shown in FIG. 4, data as shown in FIG. 39 is obtained. It can be seen that (X, Y) in FIG. 4 has been replaced with the center coordinates of the mesh elements. Being anonymized in this way, individual records cannot be identified.
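The replacement by the mesh element center can be illustrated as follows. The names `element_center` and `anonymize`, the record layout and the sample coordinates are all hypothetical; the only parts taken from the description are that the coordinates are replaced by the center of the group's mesh element, that the ID is removed, and that records not belonging to any group are not disclosed.

```python
from fractions import Fraction


def element_center(mesh_ref, element_id, d):
    """Center of a mesh element: reference point + (element ID + 1/2) * element size."""
    return tuple(m + (e + Fraction(1, 2)) * w for m, e, w in zip(mesh_ref, element_id, d))


def anonymize(records, row_to_element, d):
    """records: row -> (ID, X, Y); row_to_element: row -> (mesh_ref, element_id).
    Returns row -> center coordinates, without the ID; ungrouped rows are dropped."""
    out = {}
    for row in records:
        if row in row_to_element:
            mesh_ref, element_id = row_to_element[row]
            out[row] = element_center(mesh_ref, element_id, d)
    return out


# Hypothetical sample: row 8 grouped in element (0, 0) of mesh (3, 3); row 99 ungrouped.
records = {8: ("B", 6, 5), 99: ("A", 20, 20)}
print(anonymize(records, {8: ((3, 3), (0, 0))}, d=(6, 6)))
```

With d = (6, 6), element (0, 0) of the mesh with reference point (0, 0) maps to (3, 3), and element (0, 0) of the mesh with reference point (3, 3) maps to (6, 6).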

The output unit 170 then outputs the data stored in the third data storage unit 160 to an output device (for example, a display device, a printing device, or another computer connected via a network) (step S7).

Processing the data as described above makes it possible to disclose data that achieves both anonymity and analysis accuracy. Moreover, because mesh generation is fast, the anonymization process as a whole is also fast.

Only records grouped so as to satisfy the diversity condition l are disclosed. Because each group contains l or more kinds of confidential attribute values, the records of a group are never enclosed in a mesh element in which only a specific ID could exist, so anonymity is ensured.

In addition, because each disclosed mesh element has a size less than d, specifying a small d can be expected to yield high-precision analysis. However, the smaller d is, the more records fall into no mesh element, and such records are not disclosed, so making d too small is also undesirable.

Furthermore, because whether the diversity condition l is satisfied is determined with a plurality of types of meshes, the number of records that fail the diversity condition l can be reduced. This increases the number of disclosed records, that is, the amount of data available for analysis, so high-precision analysis can be expected.

Although an embodiment of the present invention has been described above, the present invention is not limited to this embodiment. For example, the functional block diagram shown in FIG. 3 is only an example and may not match the actual program module configuration or file configuration. As for the processing flow, the order of steps may be changed or a plurality of steps may be executed in parallel as long as the processing result does not change.

The mesh generation processing of FIG. 6 is also only an example: similar meshes may be generated by another procedure, or meshes of a different form may be generated. A predetermined number of types of unit-size meshes may be generated in advance and then rescaled according to the size d.

The information processing apparatus 100 described above is, for example, a computer apparatus in which, as shown in FIG. 40, a memory 2501, a CPU 2503, a hard disk drive (HDD) 2505, a display control unit 2507 connected to a display device 2509, a drive device 2513 for a removable disk 2511, an input device 2515, and a communication control unit 2517 for connecting to a network are connected by a bus 2519. An operating system (OS) and an application program for performing the processing of this embodiment are stored in the HDD 2505 and are read from the HDD 2505 into the memory 2501 when executed by the CPU 2503. The CPU 2503 controls the display control unit 2507, the communication control unit 2517, and the drive device 2513 according to the processing of the application program so that they perform predetermined operations. Data being processed is stored mainly in the memory 2501 but may also be stored in the HDD 2505. In an embodiment of the present technology, the application program for performing the processing described above is stored on and distributed via the computer-readable removable disk 2511 and installed from the drive device 2513 onto the HDD 2505. It may also be installed onto the HDD 2505 via a network such as the Internet and the communication control unit 2517. Such a computer apparatus realizes the various functions described above through the organic cooperation of hardware such as the CPU 2503 and the memory 2501 with programs such as the OS and the application program.

The embodiment described above can be summarized as follows.

The anonymized data generation method according to the present embodiment includes processing of (A) for each of a plurality of types of meshes in a space spanned by the numerical attributes of a plurality of data blocks that are stored in a data storage unit and each include a confidential attribute value and a numerical attribute value, extracting, from the plurality of data blocks, a group of data blocks that is contained in one mesh element of that mesh, includes a first data block that has not yet been grouped, and has a frequency distribution of confidential attribute values satisfying a predetermined condition; and (B) replacing the numerical attribute values of the data blocks belonging to the group with a numerical attribute value for the group.
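
Step (A) can be sketched as a loop over meshes. The sketch below is a simplification under stated assumptions rather than the embodiment's procedure: the space is two-dimensional, each mesh is a square grid of cell size d identified by a hypothetical origin, only not-yet-grouped data blocks are considered in each cell, and the predetermined condition is passed in as an arbitrary predicate. The name extract_groups and the record layout are illustrative.

```python
from collections import defaultdict
from math import floor
from typing import Callable, Dict, Hashable, List, Sequence, Set, Tuple

Record = Tuple[Hashable, float, float]   # (confidential value, x, y)
Cell = Tuple[int, int, int]              # (mesh index, column, row)


def extract_groups(
    records: Sequence[Record],
    origins: Sequence[Tuple[float, float]],
    d: float,
    condition: Callable[[List[Hashable]], bool],
) -> Dict[Cell, List[int]]:
    """Return {cell: record indices} for every mesh cell that forms a group."""
    grouped: Set[int] = set()
    groups: Dict[Cell, List[int]] = {}
    for m, (ox, oy) in enumerate(origins):
        # Bucket the still-ungrouped records by the cell of mesh m they fall in.
        buckets: Dict[Cell, List[int]] = defaultdict(list)
        for i, (_, x, y) in enumerate(records):
            if i not in grouped:
                cell = (m, floor((x - ox) / d), floor((y - oy) / d))
                buckets[cell].append(i)
        # A bucket becomes a group only if its confidential values pass the test.
        for cell, members in buckets.items():
            if condition([records[i][0] for i in members]):
                groups[cell] = members
                grouped.update(members)
    return groups
```

Records left ungrouped by one mesh get another chance in the next mesh, whose cells cover different regions; this is the mechanism by which several meshes reduce the number of undisclosed data blocks.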

Because groups of data blocks whose frequency distribution of confidential attribute values satisfies the predetermined condition are generated, anonymization is reliable. In addition, using a plurality of types of meshes raises the likelihood that a data block is grouped in a mesh element of one of the meshes, so the number of undisclosed data blocks can be reduced. That is, the amount of data usable for analysis increases and analysis accuracy improves. The size of the mesh elements is also adjustable, which likewise contributes to analysis accuracy. The mesh element size may be the same for every type of mesh. Furthermore, processing is faster than when grouping is performed by defining grouping ranges at arbitrary positions.

The anonymized data generation method may further include processing of deleting the confidential attribute values of the data blocks belonging to the group described above. This makes it possible to disclose safe data.

The predetermined condition described above may include a lower limit on the number of kinds of confidential attribute values. In this case, the extracting processing described above may include (a1) determining whether the number of kinds of confidential attribute values in the set of data blocks that includes the ungrouped first data block and is contained in the one mesh element is equal to or greater than the lower limit. This makes the concealment more secure.
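
The determination (a1) amounts to counting distinct confidential values. A minimal sketch follows; the function name is_candidate and the use of a parallel list of "ungrouped" flags are assumptions made for illustration.

```python
from typing import Hashable, Sequence


def is_candidate(values: Sequence[Hashable],
                 ungrouped: Sequence[bool],
                 lower_limit: int) -> bool:
    """Check (a1) for the data blocks found in one mesh element.

    `values[i]` is the confidential attribute value of the i-th data block
    and `ungrouped[i]` tells whether that block has not been grouped yet.
    The set qualifies only if it contains at least one ungrouped block and
    at least `lower_limit` distinct confidential values.
    """
    return any(ungrouped) and len(set(values)) >= lower_limit
```

For example, three data blocks with IDs A, B, A pass a lower limit of 2 as long as at least one of them is still ungrouped, whereas two blocks that both carry ID A never pass.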

The extracting processing described above may also include (a2) determining whether the frequency distribution of confidential attribute values for a second set of data blocks, which includes the ungrouped first data block and is contained in the one mesh element, satisfies the predetermined condition; and (a3) when the frequency distribution of confidential attribute values for the second set does not satisfy the predetermined condition, generating the group of data blocks by excluding a second data block from the second set so that the predetermined condition is satisfied. For example, if the number of kinds of confidential attribute values in the second set is equal to or greater than its lower limit, excluding appropriate data blocks may allow the frequency distribution condition to be satisfied.

The extracting processing described above may also include (a4) extracting, from the group of data blocks, a third data block other than the data blocks that are essential for the frequency distribution of confidential attribute values to satisfy the predetermined condition. In this case, in the processing of extracting another group of data blocks, the third data block may be extracted in a mesh element of another type of mesh. Because the third data block is then used to generate another group, the number of data blocks that end up deleted can be reduced.

The anonymized data generation method according to the present embodiment may further include (C) generating the plurality of types of meshes so that they do not share mesh element boundaries. This makes it possible to reduce the number of data blocks that are not anonymized more efficiently.
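
One simple way to obtain meshes that share no boundary, given here only as an illustrative sketch and not as the procedure of FIG. 6, is to shift the origin of each successive mesh by d/k along every axis, where k is the number of meshes. Exact rational arithmetic is used so that the boundary check is not disturbed by floating-point rounding. The function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple

Origin = Tuple[Fraction, Fraction]


def staggered_origins(d: Fraction, k: int) -> List[Origin]:
    """Origins of k square meshes of cell size d, each shifted by d/k on both axes."""
    return [(Fraction(i) * d / k, Fraction(i) * d / k) for i in range(k)]


def share_boundary(o1: Origin, o2: Origin, d: Fraction) -> bool:
    """Two meshes of cell size d share a boundary line exactly when their
    origins coincide modulo d along the x axis or along the y axis."""
    return (o1[0] - o2[0]) % d == 0 or (o1[1] - o2[1]) % d == 0
```

With d = 4 and k = 4 the origins are (0, 0), (1, 1), (2, 2), and (3, 3), and no pair of these meshes has a vertical or horizontal boundary line in common.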

The anonymized data generation method according to the present embodiment may further include (D) selecting a reference point that maximizes the distance to the nearest of the boundaries of the meshes generated so far and also maximizes the distance to the grid points of the meshes generated so far; and (E) generating a mesh based on the selected reference point and the mesh element size. This makes it possible to reduce the number of data blocks that are not anonymized more efficiently.

A program for causing a computer to execute the processing described above can be created, and the program is stored in a computer-readable storage medium or storage device such as a flexible disk, an optical disk such as a CD-ROM, a magneto-optical disk, a semiconductor memory (for example, ROM), or a hard disk. Data being processed is temporarily stored in a storage device such as RAM.

The following appendices are given in connection with the present embodiment.

(Appendix 1)
An anonymized data generation method executed by a computer, the method including processing of:
for each of a plurality of types of meshes in a space spanned by the numerical attributes of a plurality of data blocks that are stored in a data storage unit and each include a confidential attribute value and a numerical attribute value, extracting, from the plurality of data blocks, a group of data blocks that is contained in one mesh element of the mesh, includes a first data block that has not yet been grouped, and has a frequency distribution of confidential attribute values satisfying a predetermined condition; and
replacing the numerical attribute values of the data blocks belonging to the group with a numerical attribute value for the group.

(Appendix 2)
The anonymized data generation method according to appendix 1, further including processing of deleting the confidential attribute values of the data blocks belonging to the group.

(Appendix 3)
The anonymized data generation method according to appendix 1 or 2, wherein
the predetermined condition includes a lower limit on the number of kinds of confidential attribute values, and
the extracting includes determining whether the number of kinds of confidential attribute values in the set of data blocks that includes the ungrouped first data block and is contained in the one mesh element is equal to or greater than the lower limit.

(Appendix 4)
The anonymized data generation method according to any one of appendices 1 to 3, wherein the extracting includes:
determining whether the frequency distribution of confidential attribute values for a second set of data blocks, which includes the ungrouped first data block and is contained in the one mesh element, satisfies the predetermined condition; and
when the frequency distribution of confidential attribute values for the second set of data blocks does not satisfy the predetermined condition, generating the group of data blocks by excluding a second data block from the second set of data blocks so that the predetermined condition is satisfied.

(Appendix 5)
The anonymized data generation method according to any one of appendices 1 to 4, wherein
the extracting includes extracting, from the group of data blocks, a third data block other than the data blocks that are essential for the frequency distribution of the confidential attribute values to satisfy the predetermined condition, and
in processing of extracting another group of data blocks, the third data block is extracted in a mesh element of another type of mesh.

(Appendix 6)
The anonymized data generation method according to any one of appendices 1 to 5, further including processing of generating the plurality of types of meshes so that they do not share mesh element boundaries.

(Appendix 7)
The anonymized data generation method according to any one of appendices 1 to 5, further including processing of:
selecting a reference point that maximizes the distance to the nearest of the boundaries of the meshes generated so far and maximizes the distance to the grid points of the meshes generated so far; and
generating a mesh based on the selected reference point and the size of the mesh element.

(Appendix 8)
An anonymized data generation program for causing a computer to execute processing of:
for each of a plurality of types of meshes in a space spanned by the numerical attributes of a plurality of data blocks that are stored in a data storage unit and each include a confidential attribute value and a numerical attribute value, extracting, from the plurality of data blocks, a group of data blocks that is contained in one mesh element of the mesh, includes a first data block that has not yet been grouped, and has a frequency distribution of confidential attribute values satisfying a predetermined condition; and
replacing the numerical attribute values of the data blocks belonging to the group with a numerical attribute value for the group.

(Appendix 9)
An information processing apparatus including:
a grouping processing unit that, for each of a plurality of types of meshes in a space spanned by the numerical attributes of a plurality of data blocks that are stored in a data storage unit and each include a confidential attribute value and a numerical attribute value, extracts, from the plurality of data blocks, a group of data blocks that is contained in one mesh element of the mesh, includes a first data block that has not yet been grouped, and has a frequency distribution of confidential attribute values satisfying a predetermined condition; and
an anonymization processing unit that replaces the numerical attribute values of the data blocks belonging to the group with a numerical attribute value for the group.

110 First data storage unit
120 Setting data storage unit
130 Grouping processing unit
140 Second data storage unit
150 Anonymization processing unit
160 Third data storage unit
170 Output unit
180 Mesh generation unit
190 Fourth data storage unit

Claims

An anonymized data generation method executed by a computer, the method including processing of:
for each of a plurality of types of meshes in a space spanned by the numerical attributes of a plurality of data blocks that are stored in a data storage unit and each include a confidential attribute value and a numerical attribute value, extracting, from the plurality of data blocks, a group of data blocks that is contained in one mesh element of the mesh, includes a first data block that has not yet been grouped, and has a frequency distribution of confidential attribute values satisfying a predetermined condition; and
replacing the numerical attribute values of the data blocks belonging to the group with a numerical attribute value for the group.

The anonymized data generation method according to claim 1, further comprising a process of deleting a confidential attribute value of a data block belonging to the group.

The anonymized data generation method according to claim 1 or 2, wherein
the predetermined condition includes a lower limit on the number of kinds of confidential attribute values, and
the extracting includes determining whether the number of kinds of confidential attribute values in the set of data blocks that includes the ungrouped first data block and is contained in the one mesh element is equal to or greater than the lower limit.

The anonymized data generation method according to any one of claims 1 to 3, wherein the extracting includes:
determining whether the frequency distribution of confidential attribute values for a second set of data blocks, which includes the ungrouped first data block and is contained in the one mesh element, satisfies the predetermined condition; and
when the frequency distribution of confidential attribute values for the second set of data blocks does not satisfy the predetermined condition, generating the group of data blocks by excluding a second data block from the second set of data blocks so that the predetermined condition is satisfied.

The anonymized data generation method according to claim 1, wherein
the extracting includes extracting, from the group of data blocks, a third data block other than the data blocks that are essential for the frequency distribution of the confidential attribute values to satisfy the predetermined condition, and
in processing of extracting another group of data blocks, the third data block is extracted in a mesh element of another type of mesh.

The anonymized data generation method according to any one of claims 1 to 5, further including a process of generating the plurality of types of meshes so as not to share a boundary between mesh elements.

The anonymized data generation method according to claim 1, further comprising processing of:
selecting a reference point that maximizes the distance to the nearest of the boundaries of the meshes generated so far and maximizes the distance to the grid points of the meshes generated so far; and
generating a mesh based on the selected reference point and the size of the mesh element.

An anonymized data generation program for causing a computer to execute processing of:
for each of a plurality of types of meshes in a space spanned by the numerical attributes of a plurality of data blocks that are stored in a data storage unit and each include a confidential attribute value and a numerical attribute value, extracting, from the plurality of data blocks, a group of data blocks that is contained in one mesh element of the mesh, includes a first data block that has not yet been grouped, and has a frequency distribution of confidential attribute values satisfying a predetermined condition; and
replacing the numerical attribute values of the data blocks belonging to the group with a numerical attribute value for the group.

An information processing apparatus comprising:
a grouping processing unit that, for each of a plurality of types of meshes in a space spanned by the numerical attributes of a plurality of data blocks that are stored in a data storage unit and each include a confidential attribute value and a numerical attribute value, extracts, from the plurality of data blocks, a group of data blocks that is contained in one mesh element of the mesh, includes a first data block that has not yet been grouped, and has a frequency distribution of confidential attribute values satisfying a predetermined condition; and
an anonymization processing unit that replaces the numerical attribute values of the data blocks belonging to the group with a numerical attribute value for the group.