JP6256035B2

JP6256035B2 - Data editing program, data editing method, and data editing apparatus

Info

Publication number: JP6256035B2
Application number: JP2014008149A
Authority: JP
Inventors: 裕司山岡
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2014-01-20
Filing date: 2014-01-20
Publication date: 2018-01-10
Anticipated expiration: 2034-01-20
Also published as: JP2015138302A

Description

The present invention relates to a data editing program, a data editing method, and a data editing apparatus.

Individual record data is a relation model represented as a two-dimensional table in which each row stores information about one sample. A sample may be a person, another animal, or a device; in the following, samples are assumed to be people. Each row of the individual record data includes, for example, an "identifier" (which may be a "name" when the sample is a person), a "date of birth", a "postal code", and a "hobby". It is sometimes desirable to generalize (convert) such individual record data so that as much information as possible is retained while the privacy of each sample is respected.

One situation in which individual record data needs to be generalized is the secondary use of data. For example, Company A collects individual record data such as that shown in FIG. 1 from its customers and sells the data to Company B, and Company B uses the knowledge obtained by analyzing the data for market analysis and the like. Out of consideration for its customers, Company A does not want to provide other companies such as Company B with data that could infringe on customer privacy. Company B, for its part, is not interested in information about any individual customer of Company A but wants the overall information to be as accurate as possible.

Per-attribute aggregation into statistics is known as one generalization method. For example, when the individual record data contains name, date of birth, postal code, and hobby, the "hobby" attribute is converted into frequency distribution information. If two people have golf as a hobby, two have swimming, two have reading, and two have cooking, the data is converted into a frequency distribution such as hobby = {golf: 2, swimming: 2, reading: 2, cooking: 2}. This information does not reveal whose hobby is which, but it does reveal the overall trend in hobbies.
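The per-attribute aggregation described above can be sketched in Python as follows. This is only an illustration: the function name `attribute_histogram`, the row layout, and the placeholder names `person0` ... `person7` are assumptions and do not come from the patent.

```python
from collections import Counter

def attribute_histogram(rows, attribute):
    """Count how often each value of one attribute occurs, discarding who holds it."""
    return Counter(row[attribute] for row in rows)

# Hypothetical individual record data reproducing the hobby counts in the text.
hobbies = ["golf", "golf", "swimming", "swimming",
           "reading", "reading", "cooking", "cooking"]
rows = [{"name": f"person{i}", "hobby": hobby} for i, hobby in enumerate(hobbies)]

hobby_hist = attribute_histogram(rows, "hobby")
```

The histogram keeps the overall trend but no longer links any hobby to a name or to any other attribute.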

Per-attribute aggregation is thus one way to meet these requirements. Its drawback, however, is that the relationships between attributes are lost. In the example above, if Company B wants to analyze the correlation between "birth year" and "hobby", it cannot do so even if it obtains separate statistics for "birth year" and for "hobby".

Anonymization is known as one generalization method that allows relationships between attributes to be analyzed. The result is the relation model obtained by deleting, from each row of the individual record data, the "identifier" or an attribute close to it (possibly the "name" when the sample is a person). On seeing this model, Company B can analyze the relationships among the attributes other than the "identifier".

l-diversification is known as another generalization method that allows relationships between attributes to be analyzed (for example, Non-Patent Document 1). l-diversification takes as input a Quasi-Identifier (QI), a Sensitive Attribute (SA), and so on, groups rows whose QI values are close, and ensures that the SA values within each group are diverse. The SA is a column of information that each sample, that is, the data provider of each row of the individual record data, does not want disclosed without good reason, and the QI is a set of information that others can easily learn. In the individual record data, the QI may correspond to the data in one or more columns. Diversity here is a property that can be evaluated on the frequency distribution of SA values; typically it is the property that the frequency distribution is not strongly skewed.

In connection with buying and selling goods and services via computers, a method of performing functional classification that includes at least one entity and at least one characteristic associated with each entity is known as a way of representing knowledge about those goods and services (for example, Patent Document 1). For example, a method is known that relies on an object-centered constraint approach combining object-oriented principles with constraint satisfaction techniques. The object-oriented principles include identification, classification, polymorphism, and inheritance, and the constraint satisfaction method represents inter-concept or intra-concept relationships between attributes. Adopting such a method allows multiple sellers to describe their products in a uniform way that can be customized, integrated, interchanged, and reused in various environments.

Japanese Laid-open Patent Publication No. 10-149288 (JP-A-10-149288)

Ashwin Machanavajjhala, Daniel Kifer, Johannes Gehrke, and Muthuramakrishnan Venkitasubramaniam, L-diversity: Privacy beyond k-anonymity, ACM Trans. Knowl. Discov. Data, Vol. 1, March 2007

In practice, however, which attributes a person does not want disclosed may differ from person to person. It is therefore difficult to set appropriately both the information that each sample, that is, the data provider of each row of the individual record data, does not want disclosed (the SA, when generalization by l-diversification is used) and the information that others can easily learn (the QI, in the same case). As a result, generalization that respects privacy cannot be achieved.

Accordingly, in one aspect, an object of the present invention is to provide a data editing program, a data editing method, and a data editing apparatus that allow relationships between attributes to be analyzed while respecting the privacy of the samples.

There is provided a data editing program that causes a computer to execute a process that uses a relation model including values of a plurality of attributes for each of a plurality of samples, together with generalization information in which, for each attribute, hierarchy levels and a generalized value corresponding to each level are defined, to extract a combination of the generalized values of the plurality of samples corresponding to one of the levels, the combination being one in which, for the generalized value corresponding to one attribute among the plurality of attributes, the generalized values corresponding to all the other attributes have diversity.

Relationships between attributes can be analyzed while respecting the privacy of the samples.

A diagram showing an example of individual record data.
A diagram showing an example of a relation model after anonymization.
A diagram showing an example of a relation model after l-diversification.
A diagram showing an example of the functional blocks of the data editing apparatus.
A diagram showing an example of the generalization tree for "birth year".
A diagram showing an example of the generalization tree for "postal code".
A diagram showing an example of the generalization tree for "hobby".
A diagram showing an example of the diversity definitions for each level of each generalization tree.
A diagram showing an example in which generalization information is applied to FIG. 2.
A diagram showing an example of a set of generalization information.
A diagram showing an example of a generalized result.
A diagram showing an example of a generalized result.
A diagram showing an example of a generalized result.
A diagram showing an example of the configuration of the data editing apparatus.
A diagram showing an example of the processing flow.
A diagram showing an example of the flow of processing that builds the set O for P.
A diagram showing an example of the flow of processing that builds the set I for P.

A data editing program, a data editing method, and a data editing apparatus according to an embodiment are described below with reference to the drawings. The data editing program, data editing method, and data editing apparatus of the embodiment generalize (convert) individual record data, that is, a relation model (two-dimensional table) in which each row stores information about one person, so that as much information as possible is retained while each person's privacy is respected.

FIG. 1 is a diagram showing an example of individual record data.
One situation in which individual record data needs to be generalized is the secondary use of data. For example, Company A collects individual record data such as that shown in FIG. 1 from its customers and sells the data to Company B, and Company B uses the knowledge obtained by analyzing the data for market analysis and the like. Out of consideration for its customers, Company A does not want to provide other companies such as Company B with data that could infringe on customer privacy. Company B, for its part, is not interested in information about any individual customer of Company A but wants the overall information to be as accurate as possible.

One generalization method is per-attribute aggregation into statistics. In the case of FIG. 1, for example, "hobby" is converted into frequency distribution information such as hobby = {golf: 2, swimming: 2, reading: 2, cooking: 2}. From this information Company B cannot tell whose hobby is which, but it can see the overall trend in hobbies.

Per-attribute aggregation is thus one way to meet these requirements. Its drawback, however, is that the relationships between attributes are lost. In the case of FIG. 1, if Company B wants to analyze the correlation between "birth year" and "hobby", it cannot do so even if it obtains separate statistics for "birth year" and for "hobby". The following description is therefore concerned with generalization that still allows relationships between attributes to be analyzed.

Anonymization is one generalization method that allows relationships between attributes to be analyzed.
FIG. 2 is a diagram showing an example of a relation model after anonymization. FIG. 2 shows the relation model obtained by anonymizing FIG. 1, that is, by deleting the row identifier or an attribute close to it, for example the "name" in FIG. 1.

On seeing this, Company B can analyze the relationships among the attributes (other than "name"). However, anonymization alone may not protect privacy sufficiently. Consider, for example, the following situation.
(1) "Hachijo Hachiro" does not want Company B to know that his hobby is dancing.
(2) Company B knows that the "birth year" of "Hachijo Hachiro" is 1989 and that his "postal code" is 122.
(3) Company B knows that the individual record data shown in FIG. 2 contains a row for "Hachijo Hachiro".

If all of these conditions hold, FIG. 2 does not protect privacy sufficiently. From the birth year and postal code, Company B can determine that the row for "Hachijo Hachiro" is the eighth row of FIG. 2, that is, the row with (birth year, postal code) = (1989, 122). Company B therefore learns that the "hobby" of "Hachijo Hachiro" is "dance", which is precisely what "Hachijo Hachiro" does not want.
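The linkage described above can be illustrated with a short Python sketch. The table below is hypothetical: only the row (1989, 122, dance) is taken from the text, and the other rows, the column names, and the function `link_rows` are assumptions made for illustration.

```python
# Hypothetical anonymized table: only the row (1989, "122", "dance") comes from
# the text; every other row is invented for illustration.
ANONYMIZED = [
    {"birth_year": 1970, "postal_code": "121", "hobby": "swimming"},
    {"birth_year": 1972, "postal_code": "122", "hobby": "reading"},
    {"birth_year": 1985, "postal_code": "122", "hobby": "reading"},
    {"birth_year": 1989, "postal_code": "122", "hobby": "dance"},
]

def link_rows(rows, known):
    """Return the rows consistent with the attribute values the observer knows."""
    return [row for row in rows if all(row[k] == v for k, v in known.items())]

# The observer knows the birth year and postal code of one person.
candidates = link_rows(ANONYMIZED, {"birth_year": 1989, "postal_code": "122"})
```

When exactly one candidate row remains, the remaining attribute of that row is disclosed even though the name column was removed.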

The data editing apparatus, data editing method, and data editing program of the embodiment address the case in which a person viewing the generalized relation model has detailed information about some one individual included in it. In that case, as in the example above, anonymization alone does not protect privacy sufficiently.

FIG. 3 is a diagram showing an example of a relation model after l-diversification. l-diversification is one generalization method that allows relationships between attributes to be analyzed.

l-diversification takes as input a Quasi-Identifier (QI), a Sensitive Attribute (SA), and so on, groups rows whose QI values are close (for example, by generalizing them to the same value), and ensures that the SA values within each group are diverse. The SA is a column of information that each person (the data provider of a row) does not want disclosed without good reason, and the QI is a set of columns (one or more columns) of information that others can easily learn.

Diversity in l-diversification is a property that can be evaluated on the frequency distribution of SA values; typically it is the property that the frequency distribution is not strongly skewed.

As an example of l-diversification, consider setting QI = {birth year, postal code} and SA = hobby, and defining diversity as "there are two or more distinct hobby values". Suppose that person k was born in 1970, lives in the area with postal code 121, and has swimming as a hobby, and that person m was born in 1972, lives in the area with postal code 122, and has reading as a hobby. If the QI is generalized to birth year "197?" and postal code "12?", the group contains two distinct SA values, SA = {swimming, reading}, so the diversity condition "there are two or more distinct hobby values" is satisfied. l-diversification is thus known as a generalization method that allows relationships between attributes to be analyzed while also taking privacy into account.
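The diversity check in this example can be sketched as follows. The sketch assumes the simplest diversity definition mentioned in the text (at least two distinct SA values per QI group); the function name, column names, and row layout are hypothetical, and the code only checks a given table rather than performing the grouping step of l-diversification.

```python
from collections import defaultdict

def satisfies_distinct_diversity(rows, qi_columns, sa_column, min_distinct=2):
    """True if every group of rows sharing the same QI values has at least
    min_distinct different SA values."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in qi_columns)
        groups[key].add(row[sa_column])
    return all(len(values) >= min_distinct for values in groups.values())

QI = ["birth_year", "postal_code"]

# Persons k and m from the text, before and after generalizing the QI columns.
raw = [
    {"birth_year": "1970", "postal_code": "121", "hobby": "swimming"},
    {"birth_year": "1972", "postal_code": "122", "hobby": "reading"},
]
generalized = [
    {"birth_year": "197?", "postal_code": "12?", "hobby": "swimming"},
    {"birth_year": "197?", "postal_code": "12?", "hobby": "reading"},
]

raw_ok = satisfies_distinct_diversity(raw, QI, "hobby")
generalized_ok = satisfies_distinct_diversity(generalized, QI, "hobby")
```

Before generalization each person forms a group of one, so the check fails; after generalization both rows share one QI value and the group holds two hobby values.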

The example shown in FIG. 3 is obtained by l-diversifying the example shown in FIG. 2 with QI = {birth year, postal code}, SA = {hobby}, and diversity defined as "there are two or more distinct hobby values". In FIG. 3, "?" represents an arbitrary digit and is used to generalize the QI values of a group to the same value. Rows whose QI values are close to each other are grouped: rows {1, 3}, rows {2, 4}, rows {5, 6}, and rows {7, 8}. The ninth row, which has no row with a close QI value, is redacted. The SA values of each group satisfy diversity; that is, the frequency distribution of SA values in every group contains two or more distinct SA values.

In the example shown in FIG. 3, relationships between attributes can be analyzed and privacy is also taken into account. In the earlier example, Company B cannot narrow the row of "Hachijo Hachiro" down to a single one of rows {7, 8}, and therefore cannot narrow his "hobby" down to a single one of {reading, dance}. Note that anyone who knows the l-diversification algorithm can tell that "Hachijo Hachiro" is not the ninth row of FIG. 3. The ninth row was redacted because no other row had a close QI value; had its QI value been close to that of the seventh or eighth row, it would not have been redacted. "Hachijo Hachiro", whose QI value corresponds to that of rows {7, 8}, therefore cannot be the ninth row.

In this way, l-diversification allows relationships between attributes to be analyzed while also taking privacy into account.

However, l-diversification has the problem that the user (for example, Company A) must choose the QI and the SA appropriately.

For example, consider a situation that differs from the earlier one: "Hachijo Hachiro" does not want Company B to know his birth year or postal code, and Company B knows that the "hobby" of "Hachijo Hachiro" is dancing. Company A, however, is unaware of this and sets the QI and SA as before. If Company B looks at the relation model of FIG. 3 in this situation, it learns that the row of "Hachijo Hachiro" is very likely the eighth row and that his "birth year" and "postal code" are very likely "1989" and "12?", respectively. In other words, privacy protection is insufficient for "Hachijo Hachiro". The row of "Hachijo Hachiro" could also be the ninth row, but the likelihood that it is the eighth row can increase when, for example, people whose "hobby" is dancing are rare or the frequency distribution of people's hobbies is known.

Moreover, in practice, which attributes a person does not want disclosed may differ from person to person, in which case there may be no appropriate way to set the QI and the SA at all.

In the data editing apparatus, data editing method, and data editing program of the embodiment described below, when the relation model to be generalized has n columns, generalization is performed so that, for any n−1 columns, the values of the remaining column are diverse among the rows that share the same values in those n−1 columns. A diversity definition for each column is received as input, and generalization proceeds while the diversity of each column is evaluated against its definition, so that the relation model is converted into one in which every column achieves diversity. In other words, the embodiment performs generalization that requires no QI or SA to be set, respects privacy, and still allows relationships between attributes to be analyzed. With the generalized relation model, relationships between attributes can be analyzed while the privacy of the samples is respected. For example, even if the model is viewed by someone who knows up to n−1 of the n columns for some one person included in the individual record data, the information in the remaining column is diversified, so a high level of privacy protection can be achieved.
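The condition stated above (every column is diverse among rows that agree on the other n−1 columns) can be checked with a sketch like the following. It is an illustration under assumptions, not the procedure of the embodiment: the per-column diversity definitions are all replaced by the hypothetical predicate `at_least_two_values`, and the tables and column names are invented.

```python
from collections import Counter, defaultdict
from itertools import product

def at_least_two_values(freq):
    """Hypothetical diversity definition: two or more distinct values occur."""
    return sum(1 for count in freq.values() if count > 0) >= 2

def every_column_is_diverse(rows, columns, diversity):
    """For each column, group rows by the other n-1 columns and apply that
    column's diversity definition to the frequency distribution of each group."""
    for target in columns:
        others = [c for c in columns if c != target]
        groups = defaultdict(Counter)
        for row in rows:
            groups[tuple(row[c] for c in others)][row[target]] += 1
        if not all(diversity[target](freq) for freq in groups.values()):
            return False
    return True

COLUMNS = ["birth_year", "postal_code", "hobby"]
DIVERSITY = {column: at_least_two_values for column in COLUMNS}

# Invented, already generalized table containing every combination of two
# values per column, so each group of n-1 columns sees two values in the third.
full_table = [dict(zip(COLUMNS, values))
              for values in product(["197?", "198?"], ["12?", "13?"], ["sports", "indoor"])]
small_table = full_table[:2]  # the two rows differ only in "hobby"

full_ok = every_column_is_diverse(full_table, COLUMNS, DIVERSITY)
small_ok = every_column_is_diverse(small_table, COLUMNS, DIVERSITY)
```

The two-row table is diverse in "hobby" but not in "birth_year" or "postal_code", which is the kind of case that fixed QI and SA settings would not detect.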

<Data editing apparatus>
FIG. 4 is a diagram showing an example of the functional blocks of the data editing apparatus 100.

The data editing apparatus 100 includes an input unit 102, a generalization unit 104, and an output unit 106. The generalization unit 104 may also be called the anonymization unit 104.

The input unit 102 of the data editing apparatus 100 receives the following as input.
(I1) The relation model R to be generalized
(I2) The generalization tree T of each column of the relation model R
(I3) The diversity definition D for each level of each generalization tree
(I4) The maximum number of redacted rows s
The relation model R is a two-dimensional table whose columns store information about the samples, for example the table shown in FIG. 2. The samples in the relation model R need not be people as in FIG. 2; they may be other animals or devices and are, in general, arbitrary.

The generalization tree T of each column of the relation model R is tree-structured data that describes how the values are classified. A generalization tree T defines levels and the values at each level (also called generalized values). Level 0 is the arbitrary value "*", and the degree of generalization decreases as the level number increases; that is, values move from abstract to concrete as the level number increases. A generalization tree is sometimes called generalization information.

FIG. 5 is an example of the generalization tree for "birth year". FIG. 5 shows only the part of the tree that concerns values appearing in FIG. 2, but the tree may also contain parts for other values. FIG. 5 has three levels, down to depth 3. Values become more general as edges are followed toward the root of the tree. Generalizing all the way to the root corresponds to redaction; for convenience, this is called level 0. For example, the value 1989 is 1989 at level 3, "198?" when generalized to level 2, and "19??" when generalized to level 1. Generalized to level 0, it is "*". Here "?" is an arbitrary single digit and "*" is an arbitrary number with any number of digits.
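For a four-digit birth year, the level-by-level generalization described above amounts to masking trailing digits. The following is a minimal sketch of that rule; the function name `generalize_birth_year` is hypothetical, and a real generalization tree could be stored as explicit tree data instead of being computed this way.

```python
def generalize_birth_year(year, level):
    """Generalize a four-digit birth year to a level of the generalization tree:
    level 3 keeps all digits, level 2 masks one, level 1 masks two, level 0 is "*"."""
    digits = str(year)
    if len(digits) != 4 or not digits.isdigit():
        raise ValueError("expected a four-digit year")
    if not 0 <= level <= 3:
        raise ValueError("level must be between 0 and 3")
    if level == 0:
        return "*"
    kept = level + 1  # level 1 keeps "19", level 2 keeps "198", level 3 keeps "1989"
    return digits[:kept] + "?" * (4 - kept)

levels_of_1989 = [generalize_birth_year(1989, level) for level in (3, 2, 1, 0)]
```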

FIG. 6 is a diagram showing an example of the generalization tree for "postal code", and FIG. 7 is a diagram showing an example of the generalization tree for "hobby".

Assume that the relation model R is the one shown in FIG. 2. The following description uses, as the generalization tree T, the mapping that associates "birth year" in FIG. 2 with the tree shown in FIG. 5, "postal code" in FIG. 2 with the tree shown in FIG. 6, and "hobby" in FIG. 2 with the tree shown in FIG. 7.

The diversity definition D for each level of each generalization tree is a mapping D: (a, l) → d from a pair of a column name a and a level l to a diversity definition. A diversity definition d is a mapping d: F → 1/0 from a frequency distribution to a truth value indicating whether the property is satisfied. In addition, the mapping d is required to have the following properties.
(d1) Monotonicity: (d(F1) = 1) ∧ (d(F2) = 1) ⇒ d(F1 + F2) = 1, where F1 + F2 is the sum of the two frequency distributions.
(d2) Hierarchical monotonicity: d(F) = 1 ⇒ d'(F') = 1, where d' is the diversity definition at the level one smaller than that of d, and F' is the frequency distribution at the level of d', that is, F generalized to the level of d'.

FIG. 8 is a diagram showing an example of the diversity definitions for each level of each generalization tree. FIG. 8 is an example of the mapping D corresponding to the generalization tree T described above.

For example, let d be the diversity definition for birth year: 3. Then d is true ("1") only when F contains values whose tens digits differ, so that, for example, F = {"1970": 1, "1989": 2} ⇒ d(F) = 1 and F = {"1970": 2, "1978": 2} ⇒ d(F) = 0. Every diversity definition in FIG. 8, including d, satisfies monotonicity and hierarchical monotonicity. For example, F1 = {"1970": 1, "1989": 2} ⇒ d(F1) = 1 and F2 = {"1970": 2, "1989": 1} ⇒ d(F2) = 1, and d(F1 + F2) = d({"1970": 3, "1989": 3}) = 1, which is indeed consistent with monotonicity. Likewise, F1 = {"1970": 1, "1989": 2} ⇒ d(F1) = 1 and d'(F1') = d'({"197?": 1, "198?": 2}) = 1, which is indeed consistent with hierarchical monotonicity. The diversity definition at level 0 of each column is a mapping that always returns true ("1"). This mapping D is used in the following description.
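The birth year: 3 example can be reproduced with a small sketch in which a frequency distribution is a `collections.Counter` and the sum F1 + F2 is Counter addition. The predicate below follows the literal wording of the text (values with differing tens digits exist); the function name is hypothetical, and the actual definitions in FIG. 8 may be stated differently.

```python
from collections import Counter

def d_birth_year_level3(freq):
    """Diversity definition d for birth year at level 3, read literally from the
    text: true only if the distribution holds years whose tens digits differ."""
    tens_digits = {value[2] for value, count in freq.items() if count > 0}
    return len(tens_digits) >= 2

F1 = Counter({"1970": 1, "1989": 2})
F2 = Counter({"1970": 2, "1989": 1})
F_same_decade = Counter({"1970": 2, "1978": 2})

# Monotonicity (d1): if d holds for F1 and for F2, it holds for F1 + F2.
F_sum = F1 + F2
monotone_here = (not (d_birth_year_level3(F1) and d_birth_year_level3(F2))
                 or d_birth_year_level3(F_sum))
```

This checks property (d1) only for the one pair of distributions given in the text; it is not a proof that the definition is monotone in general.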

The maximum number of redacted rows s is a non-negative integer smaller than the number of rows in the relation model R. In the following description, s = 2.

The output unit 106 of the data editing apparatus 100 outputs, as the result of the generalization unit 104, a set of generalization information and the relation model generalized according to generalization information G, which is one element of that set.

Generalization information G is a pair consisting of a set of {column name: level} entries and a set of row numbers to be redacted. There is at most one {column name: level} entry per column, and entries for columns with level = 0 may be omitted. In the following, rows are indexed 0, 1, ... starting from the first row. Generalization information G may be, for example, G = ({birth year: 2, postal code: 1, hobby: 1}, {8}). This means: generalize "birth year" to level 2, generalize "postal code" to level 1, generalize "hobby" to level 1, and redact the ninth row.

FIG. 9 is a diagram showing an example in which generalization information is applied to FIG. 2. More specifically, FIG. 9 is the relation model obtained by generalizing the relation model shown in FIG. 2 according to the generalization information G = ({birth year: 2, postal code: 1, hobby: 1}, {8}). If Company B, which knows part of the information about "Hachijo Hachiro", looks at FIG. 9, then knowing his "birth year" and "postal code" does not reveal his "hobby", knowing his "postal code" and "hobby" does not reveal his "birth year" in detail, and knowing his "birth year" and "hobby" does not reveal his "postal code" in detail. Rows may be deleted instead of being redacted.
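Applying generalization information G to a table can be sketched as below. Much of this is assumed rather than taken from the patent: the trees of FIG. 6 and FIG. 7 are not reproduced in the text, so the postal-code rule (level l keeps the first l digits) and the hobby categories in `HOBBY_CATEGORY` are hypothetical stand-ins, as are the sample rows and all function names.

```python
def mask_digits(value, kept):
    """Keep the first `kept` characters and mask the rest with "?"."""
    text = str(value)
    return "*" if kept <= 0 else text[:kept] + "?" * (len(text) - kept)

# Hypothetical level-1 categories standing in for the hobby tree of FIG. 7.
HOBBY_CATEGORY = {"golf": "sports", "swimming": "sports", "dance": "sports",
                  "reading": "indoor", "cooking": "indoor"}

# One generalizer per column: (value, level) -> generalized value.
GENERALIZERS = {
    "birth_year": lambda v, level: mask_digits(v, level + 1 if level else 0),
    "postal_code": lambda v, level: mask_digits(v, level),  # assumed rule
    "hobby": lambda v, level: "*" if level == 0 else (HOBBY_CATEGORY[v] if level == 1 else v),
}

def apply_generalization(rows, levels, redacted_rows):
    """Apply G = (levels, redacted_rows); rows are indexed from 0 and columns
    missing from `levels` are treated as level 0."""
    result = []
    for index, row in enumerate(rows):
        if index in redacted_rows:
            result.append({column: "*" for column in row})
        else:
            result.append({column: GENERALIZERS[column](value, levels.get(column, 0))
                           for column, value in row.items()})
    return result

# Invented two-row table; the second row (index 1) is redacted.
table = [
    {"birth_year": 1989, "postal_code": "122", "hobby": "dance"},
    {"birth_year": 1955, "postal_code": "300", "hobby": "golf"},
]
G = ({"birth_year": 2, "postal_code": 1, "hobby": 1}, {1})
generalized = apply_generalization(table, *G)
```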

The output unit 106 of the data editing apparatus 100 may output the set of generalization information in tabular form as in FIG. 10. The first row of FIG. 10 is the generalization information G. The set output from the output unit 106 of the data editing apparatus 100 is sometimes called a generalized relation model.

Because the diversity definition D is monotonic and hierarchically monotonic, when some generalized information G achieves the diversity definition D, every generalized information whose {column name: level} set is less than or equal to that of G also achieves D. This makes the generalized information G easier to find. {Column name: level} sets are compared column by column. For example, with P1 = {birth year: 2, postal code: 1}, P2 = {birth year: 1, postal code: 1}, and P3 = {birth year: 1, postal code: 2}, we have P1 > P2 and P3 > P2, while P1 and P3 are incomparable.
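The column-by-column comparison can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `leq` and `compare` and the dictionary representation are assumptions, and a column missing from a set is treated as level 0, following the rule that level-0 entries may be omitted.

```python
def leq(p, q):
    """Return True if p <= q, i.e. every column's level in p is at most its level in q.

    A column missing from a mapping is treated as level 0 (assumption based on
    the rule that level-0 entries may be omitted).
    """
    return all(p.get(col, 0) <= q.get(col, 0) for col in set(p) | set(q))


def compare(p, q):
    """Return '=', '<', '>', or None when p and q are incomparable."""
    if leq(p, q) and leq(q, p):
        return "="
    if leq(p, q):
        return "<"
    if leq(q, p):
        return ">"
    return None


P1 = {"birth_year": 2, "postal_code": 1}
P2 = {"birth_year": 1, "postal_code": 1}
P3 = {"birth_year": 1, "postal_code": 2}

print(compare(P1, P2), compare(P3, P2), compare(P1, P3))
```

Because the order is only partial, a search for "the largest" level set generally returns several maximal elements rather than a single maximum.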

When the relational model R received by the input unit 102 has n columns, the generalization unit 104 of the data editing apparatus 100 generalizes R so that, for any n−1 columns, the rows that share the same values in those columns have diverse values in the remaining column. The generalization unit 104 may work through the lattice formed by the power set of the columns, attempting to achieve diversity in order from the smallest sets. This generalization resembles l-diversification but differs from it in that the diversity criterion differs for each column.

The generalization unit 104 of the data editing apparatus 100 also has a function of repeatedly verifying, for each converted relational model (two-dimensional table), whether diversity can be achieved by deleting at most the maximum number of sanitized rows s. When s or more rows have been deleted, the output unit 106 need not output the result.

データ編集装置１００の入力部１０２で、関係モデルＲとして図２に示されているもの、一般化木Ｔとして図５〜図７に示されているもの、多様性定義Ｄとして図８に示されているもの、最大墨塗り行数ｓ＝２、を入力として受け入れたときの、データ編集装置１００の一般化部１０４の機能について説明する。 In the input unit 102 of the data editing apparatus 100, the relation model R shown in FIG. 2, the generalized tree T shown in FIGS. 5 to 7, and the diversity definition D shown in FIG. The function of the generalization unit 104 of the data editing apparatus 100 when the maximum sanitized line number s = 2 is accepted as an input will be described.

The input unit 102 of the data editing apparatus 100 has a function of preparing a set C whose elements are the {column name: maximum level} entries of the columns of the relational model R. The column names of R are {birth year, postal code, hobby}, and the generalization trees T give the maximum levels {birth year: 3, postal code: 2, hobby: 2}, so C = {{birth year: 3}, {postal code: 2}, {hobby: 2}}.

データ編集装置１００の一般化部１０４は、集合Ｃの各要素Ｐに対し、多様性定義Ｄを満たす最大の階層を要素とする集合Ｏを用意する機能を有する。たとえば、要素Ｐ＝｛生年：３｝の場合、多様性を満たす最大の階層は｛生年：３｝で墨塗りは不要である。また、要素Ｐ＝｛郵便番号：２｝の場合、多様性を満たす最大の階層は｛郵便番号：２｝であり、墨塗り不要である。さらに、要素Ｐ＝｛趣味：２}の場合、多様性を満たす最大の階層は｛趣味：２｝で墨塗りは不要である。よって、集合Ｏ＝｛（｛生年：３｝、｛｝）、（｛郵便番号：２｝、｛｝）、（｛趣味：２｝、｛｝）｝となる。 The generalization unit 104 of the data editing apparatus 100 has a function of preparing a set O having the maximum hierarchy satisfying the diversity definition D as an element for each element P of the set C. For example, when the element P = {birth year: 3}, the maximum hierarchy that satisfies the diversity is {birth year: 3} and no sanitization is required. In addition, when the element P = {zip code: 2}, the maximum hierarchy that satisfies the diversity is {zip code: 2}, and no sanitization is required. Further, when the element P = {hobby: 2}, the maximum hierarchy that satisfies the diversity is {hobby: 2}, and no sanitization is required. Therefore, the set O = {({birth year: 3}, {}), ({zip code: 2}, {}), ({hobby: 2}, {})}.

ここで、「多様性を満たす」、「多様性を達成する」という言葉は、「多様性定義Ｄを満足する」と同義で用いられ得る。 Here, the terms “satisfy diversity” and “achieve diversity” may be used synonymously with “satisfies diversity definition D”.

The generalization unit 104 of the data editing apparatus 100 has a function of taking every pair of generalized information G ∈ O whose column-name sets differ, merging their {column name: level} sets by giving each duplicated column name the minimum of its two levels, collecting the merged sets as the set C, and deleting from C every element whose levels are not maximal.

また、データ編集装置１００の一般化部１０４は、集合Ｏを空にし、集合Ｃの各要素Ｐにつき、多様性定義Ｄを満たす全ての極大の階層を集合Ｏに追加する機能を有する。 The generalization unit 104 of the data editing apparatus 100 has a function of emptying the set O and adding all maximum hierarchies satisfying the diversity definition D to the set O for each element P of the set C.

The generalization unit 104 of the data editing apparatus 100 also has a function of deleting from the set O the elements that have at most one column name with a positive level. Generalized information with at most one such column name can then no longer be calculated, but the overall processing load is reduced. Generalized information can still be calculated without this function; in that case, generalization by generalized information with at most one column name with a positive level can be replaced by aggregation into statistics.

さらに、データ編集装置１００の一般化部１０４は、（関係モデルＲの列数−１）回だけ、上記の処理を繰り返す機能を有している。 Further, the generalization unit 104 of the data editing apparatus 100 has a function of repeating the above process only (number of columns of the relational model R−1) times.

In the example of FIG. 2, there are the following three combinations of {column name: level} sets of two G ∈ O with different column-name sets.
(1) {birth year: 3}, {postal code: 2}
(2) {birth year: 3}, {hobby: 2}
(3) {postal code: 2}, {hobby: 2}
None of them has a duplicated column name, so they are merged as they are, giving C = {{birth year: 3, postal code: 2}, {birth year: 3, hobby: 2}, {postal code: 2, hobby: 2}}. All elements of C are maximal.

For example, for the element P = {birth year: 3, postal code: 2} of the set C, the maximal levels that satisfy diversity are {birth year: 3, postal code: 0} with no sanitization and {birth year: 2, postal code: 2} with the row set {8} sanitized. Similarly, for P = {birth year: 3, hobby: 2} they are ({birth year: 2, hobby: 1}, {8}) and ({birth year: 1, hobby: 2}, {8}), and for P = {postal code: 2, hobby: 2} they are ({postal code: 2, hobby: 1}, {8}) and ({postal code: 0, hobby: 2}, {}). Therefore, O = {({birth year: 3, postal code: 0}, {}), ({birth year: 2, postal code: 2}, {8}), ({birth year: 2, hobby: 1}, {8}), ({birth year: 1, hobby: 2}, {8}), ({postal code: 2, hobby: 1}, {8}), ({postal code: 0, hobby: 2}, {})}.

In the example of FIG. 2, among the elements of the set O, ({birth year: 3, postal code: 0}, {}) and ({postal code: 0, hobby: 2}, {}) each have only one column name with a positive level, so they are deleted, giving O = {({birth year: 2, postal code: 2}, {8}), ({birth year: 2, hobby: 1}, {8}), ({birth year: 1, hobby: 2}, {8}), ({postal code: 2, hobby: 1}, {8})}.
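The pruning step of this example can be sketched as a simple filter. The function name `drop_trivial` and the representation of each element of O as a (levels, sanitized rows) pair are assumptions made for illustration.

```python
def drop_trivial(o):
    """Keep only elements with at least two columns whose level is positive."""
    return [
        (levels, rows)
        for levels, rows in o
        if sum(1 for level in levels.values() if level > 0) >= 2
    ]


# The set O of the FIG. 2 example before pruning.
O = [
    ({"birth_year": 3, "postal_code": 0}, set()),
    ({"birth_year": 2, "postal_code": 2}, {8}),
    ({"birth_year": 2, "hobby": 1}, {8}),
    ({"birth_year": 1, "hobby": 2}, {8}),
    ({"postal_code": 2, "hobby": 1}, {8}),
    ({"postal_code": 0, "hobby": 2}, set()),
]

filtered = drop_trivial(O)
for levels, rows in filtered:
    print(levels, rows)
```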

In the second iteration, there are the following five combinations of {column name: level} sets of two G ∈ O with different column-name sets.
(1') {birth year: 2, postal code: 2}, {birth year: 2, hobby: 1}
(2') {birth year: 2, postal code: 2}, {birth year: 1, hobby: 2}
(3') {birth year: 2, postal code: 2}, {postal code: 2, hobby: 1}
(4') {birth year: 2, hobby: 1}, {postal code: 2, hobby: 1}
(5') {birth year: 1, hobby: 2}, {postal code: 2, hobby: 1}
Here, the generalization unit 104 of the data editing apparatus 100 merges each pair, giving a duplicated column name the minimum of its two levels. Applying this to (1') to (5') gives the following, respectively.
(1) {birth year: 2, postal code: 2, hobby: 1}
(2) {birth year: 1, postal code: 2, hobby: 2}
(3) {birth year: 2, postal code: 2, hobby: 1}
(4) {birth year: 2, postal code: 2, hobby: 1}
(5) {birth year: 1, postal code: 2, hobby: 1}
Of these, items (3) and (4) are identical to item (1), and item (5) < item (2), so these duplicate or non-maximal items are removed, leaving items (1) and (2). The set C = {{birth year: 2, postal code: 2, hobby: 1}, {birth year: 1, postal code: 2, hobby: 2}}.

Next, the generalization unit 104 of the data editing apparatus 100 again empties the set O and, for each element P of the set C, adds to O all maximal levels that satisfy the diversity definition D. For P = {birth year: 2, postal code: 2, hobby: 1}, the maximal levels that satisfy diversity are ({birth year: 2, postal code: 1, hobby: 1}, {8}) and ({birth year: 1, postal code: 2, hobby: 1}, {8}); for P = {birth year: 1, postal code: 2, hobby: 2}, they are ({birth year: 1, postal code: 2, hobby: 1}, {8}) and ({birth year: 1, postal code: 0, hobby: 2}, {8}). Therefore, O = {({birth year: 2, postal code: 1, hobby: 1}, {8}), ({birth year: 1, postal code: 2, hobby: 1}, {8}), ({birth year: 1, postal code: 0, hobby: 2}, {8})}. In this case the set O contains no element to be deleted, so the generalization unit 104 does nothing further.

図１０は一般化情報の集合の例を示す図である。ただし、図１０では階層の値が０の列に関する情報は省略している。 FIG. 10 is a diagram illustrating an example of a set of generalized information. However, in FIG. 10, information regarding the column whose hierarchy value is 0 is omitted.

Once FIG. 10 has been obtained, one piece of generalized information is chosen from it and used to generalize the relational model R. For example, one may choose the entry with the largest sum of levels; if several tie, the one with more columns whose level is positive; and if several still tie, the first one when their string representations are sorted. More generally, an evaluation function that takes generalized information as input and outputs an evaluation value is prepared in advance, and the entry with the minimum or maximum value is chosen.
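One possible evaluation function following the example criteria can be sketched as a sort key. The function name `choose` and the particular string representation used for the final tie-break are assumptions; a different string representation would break ties differently.

```python
def choose(candidates):
    """Pick one (levels, rows) pair: largest level sum, then most positive-level
    columns, then the smallest string representation (an assumed tie-break)."""
    def key(item):
        levels, _rows = item
        total = sum(levels.values())
        positives = sum(1 for level in levels.values() if level > 0)
        return (-total, -positives, repr(sorted(levels.items())))
    return min(candidates, key=key)


candidates = [
    ({"birth_year": 2, "postal_code": 1, "hobby": 1}, {8}),
    ({"birth_year": 1, "postal_code": 2, "hobby": 1}, {8}),
    ({"birth_year": 1, "postal_code": 0, "hobby": 2}, {8}),
]
best = choose(candidates)
print(best)
```

Here the first two candidates tie on both the level sum (4) and the number of positive-level columns (3), so the result depends entirely on the assumed string ordering.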

一般化は、関係モデルの各列ａにつき、一般化木ｔをＴから取得し、一般化情報から階層を取得し、各行の値をｔのその階層の値に置換し、さらに一般化情報の墨塗りすべき行番号集合にあたる行を全て墨塗りすることでおこなう。図９は図１０の最初の要素で一般化した結果である。 For generalization, for each column a of the relational model, the generalized tree t is acquired from T, the hierarchy is acquired from the generalized information, the value of each row is replaced with the value of that hierarchy of t, and the generalized information This is done by sanitizing all the lines corresponding to the set of line numbers to be sanitized. FIG. 9 is a result generalized with the first element of FIG.
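Applying a piece of generalized information to a table can be sketched as follows. The generalization trees here are hypothetical prefix-masking functions, not the trees of FIGS. 5 to 7, and the sketch assumes that level 0 is the most general level (which the rule that level-0 columns may be omitted suggests). The marker `SANITIZED` and the sample rows are likewise made up for illustration.

```python
SANITIZED = "(sanitized)"  # hypothetical marker for a sanitized cell


def prefix_tree(kept_chars_by_level, mask):
    """Build a toy generalization tree that keeps a prefix of the value.

    kept_chars_by_level[level] is the number of leading characters kept;
    level 0 keeps none, i.e. the value is fully generalized.
    """
    def generalize(value, level):
        kept = kept_chars_by_level[level]
        return value[:kept] + mask * (len(value) - kept)
    return generalize


# Hypothetical trees, one per column.
TREES = {
    "birth_year": prefix_tree({0: 0, 1: 2, 2: 3, 3: 4}, "*"),
    "postal_code": prefix_tree({0: 0, 1: 1, 2: 2}, "?"),
}


def apply_generalization(table, g, trees):
    """Replace each value by its generalized value and sanitize the listed rows."""
    levels, sanitized_rows = g
    result = []
    for index, row in enumerate(table):
        if index in sanitized_rows:
            result.append({col: SANITIZED for col in row})
        else:
            result.append({
                col: trees[col](value, levels.get(col, 0))
                for col, value in row.items()
            })
    return result


table = [
    {"birth_year": "1970", "postal_code": "123"},
    {"birth_year": "2000", "postal_code": "134"},
]
g = ({"birth_year": 2, "postal_code": 1}, {1})
generalized = apply_generalization(table, g, TREES)
print(generalized)
```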

データ編集装置１００の出力部１０６は、一般化情報か、変更後の関係モデルの、どちらかだけを出力しても良い。 The output unit 106 of the data editing apparatus 100 may output only the generalized information or the changed relation model.

データ編集装置１００の一般化部１０４の集合Ｃの各要素Ｐに対し多様性定義Ｄを満たす最大の階層を要素とする集合Ｏを用意する機能について、より詳細に説明する。 The function of preparing the set O having the maximum hierarchy satisfying the diversity definition D as an element for each element P of the set C of the generalization unit 104 of the data editing apparatus 100 will be described in more detail.

データ編集装置１００の一般化部１０４は、集合Ｃの各要素Ｐの列集合は｛生年、郵便番号｝なので、関係モデルＲの複写をこの列範囲でＰによって一般化する。 The generalization unit 104 of the data editing apparatus 100 generalizes the copy of the relationship model R by P in this column range because the column set of each element P of the set C is {birth year, postal code}.

図１１は、Ｐ’＝｛生年：３、郵便番号：２｝で一般化した結果を示す図である。多様性定義Ｄを達成するために、墨塗りすべき行集合Ｉ＝｛０、１、・・・、８｝となる。つまり、全行を墨塗りする必要がある。Ｉの要素数｜Ｉ｜＝９、最大墨塗り行数ｓ＝２なので条件は成立しない。ここで、集合Ｃ＝｛Ｐ｝＝｛｛生年：３、郵便番号：２｝｝である。 FIG. 11 is a diagram illustrating a result generalized by P ′ = {year of birth: 3, postal code: 2}. In order to achieve the diversity definition D, the row set to be sanitized is I = {0, 1,..., 8}. In other words, all lines need to be painted. The condition does not hold because the number of I elements | I | = 9 and the maximum number of sanitized lines s = 2. Here, the set C = {P} = {{year of birth: 3, postal code: 2}}.

データ編集装置１００の一般化部１０４は、集合Ｃから要素を１つ削除し、Ｐとする。たとえば、Ｐ←｛生年：３、郵便番号：２｝、Ｃ←｛｝である。 The generalization unit 104 of the data editing apparatus 100 deletes one element from the set C and sets it as P. For example, P ← {year of birth: 3, postal code: 2}, C ← {}.

そして、データ編集装置１００の一般化部１０４は、Ｐの各列名ａで且つａの階層が０より大きい列名ａに対し、まず、集合Ｃの要素Ｐを要素Ｐ’に複写し、要素Ｐ’のａの階層を１だけ減じる。たとえば、Ｐ’←｛生年：２、郵便番号：２｝となる。 The generalization unit 104 of the data editing apparatus 100 first copies the element P of the set C to the element P ′ for each column name a of P and the column name a in which the hierarchy of a is greater than 0. Decrease P's a hierarchy by 1. For example, P ′ ← {year of birth: 2, postal code: 2}.

The generalization unit 104 of the data editing apparatus 100 first sets the column name a ← birth year and, over the column range of the element P', calculates the sanitized row set I that the relational model R needs in order to achieve diversity under generalization by P'. For example, since the level of birth year in P is 3, the element P of the set C is copied to the element P', and the level of a in P' is reduced by 1, giving P' ← {birth year: 2, postal code: 2}. Since the column set of P' is {birth year, postal code}, a copy of the relational model R is generalized by P' over this column range.

図１２は、Ｐ’＝｛生年：２、郵便番号：２｝で一般化した結果の例を示す図である。多様性定義Ｄを達成するために、墨塗りすべき行集合Ｉ＝｛８｝となる。図１２に示されている例では、墨塗りすべき行集合Ｉの要素数｜Ｉ｜＝１、最大墨塗り行数ｓ＝２なので条件が成立する。 FIG. 12 is a diagram illustrating an example of a result generalized by P ′ = {year of birth: 2, postal code: 2}. In order to achieve diversity definition D, the set of rows to be sanitized is I = {8}. In the example shown in FIG. 12, the condition is satisfied because the number of elements | I | = 1 of the line set I to be sanitized and the maximum number of sanitized lines s = 2.

するとデータ編集装置１００の一般化部１０４は、集合Ｏに（Ｐ’、Ｉ）を追加する。結果、Ｏ＝｛（｛生年：２、郵便番号：２｝、｛８｝）｝となる。 Then, the generalization unit 104 of the data editing apparatus 100 adds (P ′, I) to the set O. As a result, O = {({year of birth: 2, postal code: 2}, {8})}.

次にデータ編集装置１００の一般化部１０４は、ａ←郵便番号とする。Ｐの郵便番号の階層は２であるため、ＰをＰ’に複写し、Ｐ’のａの階層を１だけ減じる。すると、Ｐ’←｛生年：３、郵便番号：１｝となる。 Next, the generalization unit 104 of the data editing apparatus 100 sets a ← zip code. Since the postal code hierarchy of P is 2, P is copied to P ′, and the hierarchy of a of P ′ is reduced by 1. Then, P '← {year of birth: 3, postal code: 1}.

次にデータ編集装置１００の一般化部１０４は、Ｐ’の列集合は｛生年、郵便番号｝なので、Ｒの複写をこの列範囲でＰ’によって一般化する。 Next, the generalization unit 104 of the data editing apparatus 100 generalizes the copy of R by P ′ within this column range since the column set of P ′ is {birth year, postal code}.

FIG. 13 shows an example of the result of generalization by P' = {birth year: 3, postal code: 1}. To achieve the diversity definition D, the row set to be sanitized is I = {0, 1, ..., 8}; that is, every row must be sanitized. Since |I| = 9 exceeds the maximum number of sanitized rows s = 2, the condition is not satisfied. P' is therefore added to C, giving C = {P'} = {{birth year: 3, postal code: 1}}. Since C is not empty, the generalization unit 104 of the data editing apparatus 100 removes one element from C and sets it as P. The generalization unit 104 has a function of repeating what was described above for the case a = birth year until the set C becomes empty. When C becomes empty, O = {({birth year: 2, postal code: 2}, {8}), ({birth year: 2, postal code: 1}, {8}), ({birth year: 3, postal code: 0}, {})}.

Of the three elements of the set O, ({birth year: 2, postal code: 1}, {8}) is not maximal, because its P' part is smaller than that of the element ({birth year: 2, postal code: 2}, {8}), so the generalization unit 104 of the data editing apparatus 100 deletes it. As a result, O = {({birth year: 2, postal code: 2}, {8}), ({birth year: 3, postal code: 0}, {})}.
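The downward search walked through above can be sketched as a worklist loop. This is an illustrative reading under stated assumptions: `maximal_levels`, `freeze`, and `leq` are hypothetical names, the `seen` set that avoids re-examining a level set is an addition not mentioned in the description, and `stub_rows` simply replays the sanitized row sets reported for FIGS. 11 to 13 instead of computing them from data.

```python
def freeze(levels):
    """Hashable form of a {column: level} dictionary."""
    return tuple(sorted(levels.items()))


def leq(p, q):
    return all(p.get(c, 0) <= q.get(c, 0) for c in set(p) | set(q))


def maximal_levels(start, rows_to_sanitize, s):
    """Lower one level at a time from `start` until at most s rows need sanitizing."""
    rows = rows_to_sanitize(start)
    if len(rows) <= s:
        return [(dict(start), rows)]
    found = {}
    pending = [dict(start)]          # plays the role of the set C
    seen = {freeze(start)}           # assumption: skip level sets already examined
    while pending:
        p = pending.pop()
        for col, level in p.items():
            if level <= 0:
                continue
            child = dict(p)
            child[col] = level - 1   # P' is P with one level reduced by 1
            key = freeze(child)
            if key in seen:
                continue
            seen.add(key)
            rows = rows_to_sanitize(child)
            if len(rows) <= s:
                found[key] = rows    # add (P', I) to O
            else:
                pending.append(child)
    results = [(dict(k), r) for k, r in found.items()]
    return [
        (lv, r) for lv, r in results
        if not any(lv != other and leq(lv, other) for other, _ in results)
    ]


ALL_ROWS = set(range(9))
STUB = {
    (("birth_year", 3), ("postal_code", 2)): ALL_ROWS,
    (("birth_year", 2), ("postal_code", 2)): {8},
    (("birth_year", 3), ("postal_code", 1)): ALL_ROWS,
    (("birth_year", 2), ("postal_code", 1)): {8},
    (("birth_year", 3), ("postal_code", 0)): set(),
}


def stub_rows(levels):
    return STUB[freeze(levels)]


result = maximal_levels({"birth_year": 3, "postal_code": 2}, stub_rows, s=2)
print(result)
```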

In the above example, the generalization unit 104 of the data editing apparatus 100 also computes results that turn out not to be maximal, but such computations may be avoided as far as possible. One way is to store the {column name: level} sets that need no computation. For example, whenever the generalization unit 104 adds an element to the set O, it may add every {column name: level} set less than or equal to that P' to a no-computation set, and it may have a function of checking whether a given P' is contained in the no-computation set.

データ編集装置１００の一般化部１０４の、Ｐに対する集合Ｉを作る機能について、より詳細に説明する。 The function of creating the set I for P of the generalization unit 104 of the data editing apparatus 100 will be described in more detail.

データ編集装置１００の一般化部１０４は、Ｐの要素ｐについて、以下のような処理を行う機能を有する。 The generalization unit 104 of the data editing apparatus 100 has a function of performing the following processing on the element p of P.

データ編集装置１００の一般化部１０４は、Ｐの要素ｐに対する多様性定義ｄを多様性定義Ｄから抽出する。たとえば、多様性定義ｄは、｛生年：３｝の多様性定義であり、図８に示されているように、１０の位の値が２種類以上のとき１、さもなくば０を返す写像であっても良い。 The generalization unit 104 of the data editing apparatus 100 extracts the diversity definition d for the element p of P from the diversity definition D. For example, the diversity definition d is a diversity definition of {birth year: 3}, and as shown in FIG. 8, a map that returns 1 if there are two or more values of the tenth place, and returns 0 otherwise. It may be.

From the rows of the relational model R' other than those in I, the generalization unit 104 of the data editing apparatus 100 forms groups J, each being a set of row numbers whose values in the columns other than the column of p are equal, and calculates the set U of these groups. In FIG. 11, the rows of R' other than I are all the rows of FIG. 11, and the only column other than the column of p is {postal code}, so grouping the rows by equal "postal code" values in FIG. 11 gives U = {{0, 2, 6, 7}, {1, 3, 4, 5}, {8}}. For example, J = {8} is the group with (postal code) = (13?). When there are no columns other than the column of p, all the rows of R' other than I form a single group J, and U = {J}.

The generalization unit 104 of the data editing apparatus 100 calculates from R' the frequency distribution F of the values in the column of p over the rows in J. The column of p is birth year and J = {0, 2, 6, 7}, so from FIG. 11, F = {"1970": 1, "1972": 1, "1989": 2}.

The generalization unit 104 of the data editing apparatus 100 has a function of determining whether d(F) = 1. In the above example, F contains two distinct tens-digit values, 7 and 8, so d(F) = 1.

The generalization unit 104 of the data editing apparatus 100 has a function of repeating the above processing for every group J in U.

Suppose the last iteration, J ← {8}, has been reached. The column of p is "birth year" and J = {8}, so from FIG. 11, F = {"2000": 1}. F contains only one tens-digit value, 0, so d(F) = 0, and all elements of J are added to I. In the above example, I = {8} and c = 1.
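The diversity definition used in these two evaluations can be sketched as a function of the frequency distribution. The name `d_birth_year` is hypothetical, and the sketch assumes that birth-year values are strings of digits, as in the examples above.

```python
from collections import Counter


def d_birth_year(freq):
    """Return 1 when the distribution contains at least two distinct tens digits, else 0."""
    tens_digits = {int(value) // 10 % 10 for value in freq}
    return 1 if len(tens_digits) >= 2 else 0


f_group = Counter({"1970": 1, "1972": 1, "1989": 2})  # group J = {0, 2, 6, 7}
f_last = Counter({"2000": 1})                          # group J = {8}
print(d_birth_year(f_group), d_birth_year(f_last))
```

The first distribution has tens digits 7 and 8 and passes; the second has only the tens digit 0 and fails, which is why row 8 is added to I.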

Next, the generalization unit 104 of the data editing apparatus 100 sets p ← postal code: 2 and extracts the diversity definition d for p from D. The rows of R' other than I are rows {0, 1, ..., 7} of FIG. 11, and the only column other than the column of p is {birth year}, so grouping rows {0, 1, ..., 7} of FIG. 11 by equal "birth year" values gives U = {{0, 1}, {2}, {3}, {4}, {5}, {6, 7}}. With J ← {0, 1}, F = {"12?": 1, "14?": 1}. F contains two distinct tens-digit values, 2 and 4, so d(F) = 1.

全てのＵの要素について繰り返し終了時には、Ｉ＝｛２、３、４、５、６、７、８｝となる。 At the end of repetition for all U elements, I = {2, 3, 4, 5, 6, 7, 8}.

The generalization unit 104 of the data editing apparatus 100 has a function of repeating the above processing for every element p of P.

The generalization unit 104 of the data editing apparatus 100 has a function of determining whether c = 0 and, when c = 0, passing I to the output unit 106.

データ編集装置１００の一般化部１０４は、上記のｃを用いることで、Ｉに変更があったら多様性定義Ｄが達成できているか再度調べる、といった機能を有している。 The generalization unit 104 of the data editing apparatus 100 has a function of checking again whether the diversity definition D has been achieved when I is changed by using the above c.
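The repeat-until-unchanged computation of I can be sketched as follows. This is a minimal sketch, not the patented code: `rows_to_sanitize` and `at_least_two` are hypothetical names, the boolean `changed` stands in for the counter c, the table is assumed to be already generalized, and the sample rows and the "at least two distinct values" criterion are made up for illustration.

```python
from collections import Counter, defaultdict


def rows_to_sanitize(table, columns, diversity):
    """Return the set I of row indices that must be sanitized.

    table: list of dicts holding already generalized values.
    columns: the column names of P.
    diversity: dict mapping a column name to a function Counter -> bool.
    """
    sanitized = set()
    changed = True                     # plays the role of c != 0
    while changed:
        changed = False
        for col in columns:
            others = [c for c in columns if c != col]
            groups = defaultdict(list)  # the set U of groups J
            for index, row in enumerate(table):
                if index not in sanitized:
                    groups[tuple(row[c] for c in others)].append(index)
            for members in groups.values():
                freq = Counter(table[i][col] for i in members)
                if not diversity[col](freq):
                    sanitized.update(members)  # add every row of J to I
                    changed = True
    return sanitized


def at_least_two(freq):
    return len(freq) >= 2


# Made-up generalized table, not the data of FIG. 11.
table = [
    {"birth_year": "197*", "postal_code": "12?"},
    {"birth_year": "198*", "postal_code": "12?"},
    {"birth_year": "197*", "postal_code": "13?"},
    {"birth_year": "198*", "postal_code": "13?"},
    {"birth_year": "200*", "postal_code": "14?"},
]
diversity = {"birth_year": at_least_two, "postal_code": at_least_two}
I = rows_to_sanitize(table, ["birth_year", "postal_code"], diversity)
print(I)
```

When `others` is empty the grouping key is the empty tuple, so all remaining rows fall into a single group, matching the rule for the case with no other columns. The loop terminates because every change adds at least one row to I.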

With the above functions, the data editing apparatus 100 can obtain a set of generalized information for generalizing the relational model R with the generalization trees T so that the diversity definition is satisfied while at most s rows are sanitized.

Therefore, the data editing apparatus 100 can perform privacy-conscious generalization that still allows the relationships between attributes to be analyzed, without requiring anyone to specify, for each sample (that is, for the data provider of each row of the individual data), which information the provider does not want disclosed (SA, for example, when generalization by l-diversification is used) and which information others can easily learn (QI, in the same case).

FIG. 14 is a diagram illustrating an example of the configuration of the data editing apparatus 100 according to the embodiment.
The computer 200 includes a Central Processing Unit (CPU) 202, a Read Only Memory (ROM) 204, and a Random Access Memory (RAM) 206. The computer 200 further includes a hard disk device 208, an input device 210, a display device 212, an interface device 214, and a recording medium driving device 216. These components are connected via a bus line 220 and can exchange various data with one another under the management of the CPU 202.

ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ（ＣＰＵ）２０２は、このコンピュータ２００全体の動作を制御する演算処理装置であり、コンピュータ２００の制御処理部として機能する。 A central processing unit (CPU) 202 is an arithmetic processing unit that controls the operation of the entire computer 200, and functions as a control processing unit of the computer 200.

The Read Only Memory (ROM) 204 is a read-only semiconductor memory in which a predetermined basic control program is recorded in advance. By reading and executing this basic control program when the computer 200 starts up, the CPU 202 can control the operation of each component of the computer 200.

ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ（ＲＡＭ）２０６は、ＣＰＵ２０２が各種の制御プログラムを実行する際に、必要に応じて作業用記憶領域として使用する、随時書き込み読み出し可能な半導体メモリである。 A random access memory (RAM) 206 is a semiconductor memory that can be written and read at any time and used as a working storage area as necessary when the CPU 202 executes various control programs.

ハードディスク装置２０８は、ＣＰＵ２０２によって実行される各種の制御プログラムや各種のデータを記憶しておく記憶装置である。ＣＰＵ２０２は、ハードディスク装置２０８に記憶されている所定の制御プログラムを読み出して実行することにより、後述する各種の制御処理を行えるようになる。 The hard disk device 208 is a storage device that stores various control programs executed by the CPU 202 and various data. The CPU 202 can perform various control processes described later by reading and executing a predetermined control program stored in the hard disk device 208.

The input device 210 is, for example, a mouse device or a keyboard device. When operated by a user of the information processing apparatus, it acquires the input of the various information associated with the operation and sends the acquired input information to the CPU 202.

表示装置２１２は例えば液晶ディスプレイであり、ＣＰＵ２０２から送付される表示データに応じて各種のテキストや画像を表示する。 The display device 212 is a liquid crystal display, for example, and displays various texts and images according to display data sent from the CPU 202.

インターフェース装置２１４は、このコンピュータ２００に接続される各種機器との間での各種情報の授受の管理を行う。 The interface device 214 manages the exchange of various information with various devices connected to the computer 200.

The recording medium driving device 216 is a device that reads the various control programs and data recorded on the portable recording medium 218. The CPU 202 can also perform the various control processes described later by reading, via the recording medium driving device 216, a predetermined control program recorded on the portable recording medium 218 and executing it. Examples of the portable recording medium 218 include a flash memory equipped with a USB (Universal Serial Bus) connector, a CD-ROM (Compact Disc Read Only Memory), and a DVD-ROM (Digital Versatile Disc Read Only Memory).

このようなコンピュータ２００を用いてデータ編集装置１００を構成するには、例えば、上述の各処理部における処理をＣＰＵ２０２に行わせるための制御プログラムを作成する。作成された制御プログラムはハードディスク装置２０８若しくは可搬型記録媒体２１８に予め格納しておく。そして、ＣＰＵ２０２に所定の指示を与えてこの制御プログラムを読み出させて実行させる。こうすることで、情報処理装置が備えている機能がＣＰＵ２０２により提供される。 In order to configure the data editing apparatus 100 using such a computer 200, for example, a control program for causing the CPU 202 to perform the processing in each processing unit described above is created. The created control program is stored in advance in the hard disk device 208 or the portable recording medium 218. Then, a predetermined instruction is given to the CPU 202 to read and execute the control program. By doing so, the CPU 202 provides the functions of the information processing apparatus.

<Data editing process>
The data editing process will be described with reference to FIGS. 15 to 17.

When the data editing apparatus 100 is a general-purpose computer 200 as shown in FIG. 14, the following description defines a control program that performs such processing. That is, what follows is also a description of a control program that causes a general-purpose computer to perform the processing described below.

That is, a data editing program is provided that causes the computer 200 to execute a process of using a relational model R, which contains the values of a plurality of attributes for each of a plurality of samples, and generalization information T, which defines for each attribute the levels and the generalized value corresponding to each level, to extract a combination of generalized values of the plurality of samples corresponding to one of the levels, in which, for the generalized value corresponding to one of the plurality of attributes, the generalized values corresponding to the other attributes have diversity.

さらに、階層の一つに対応する複数のサンプルの一般化値の組み合わせにおいて、複数の属性の一つの値に対して、複数の属性の他の全てに対する前記一般化値が、複数の属性の他の属性および前記他の属性に対応する階層によって定義される多様性定義Ｄを満足するために、関係モデルに含まれる複数の属性の値のうち、所定の数のサンプルに関する複数の属性の値を削除する処理をコンピュータ２００に実行させても良い。 Further, the computer 200 may be caused to execute a process of deleting, from the attribute values contained in the relational model, the attribute values of a predetermined number of samples so that, in the combination of generalized values of the samples corresponding to one of the hierarchies, for a value of one of the attributes, the generalized values of all the other attributes satisfy the diversity definition D defined by each other attribute and the hierarchy corresponding to it.

複数の属性の一つａと階層をｌの組から多様性定義への写像Ｄ：（ａ、ｌ）→ｄによって得られる写像ｄは、複数の属性の一つａに対する度数分布をＦとして論理値への写像ｄ：Ｆ→１／０であり、Ｆ１＋Ｆ２は前記度数分布Ｆ１及びＦ２の和、∧を論理積演算として、
（ｄ１）単調性：（ｄ（Ｆ１）＝１）∧（ｄ（Ｆ２）＝１）⇒ｄ（Ｆ１＋Ｆ２）＝１、
（ｄ２）階層単調性：ｄ（Ｆ）＝１⇒ｄ’（Ｆ’）＝１、
を満たす。
The mapping D: (a, l) → d takes a pair of one attribute a and a hierarchy l to a diversity definition d. Each d is itself a mapping d: F → 1/0 from a frequency distribution F of the attribute a to a logical value. With F1 + F2 denoting the sum of the frequency distributions F1 and F2 and ∧ denoting the logical product (AND), d satisfies:
(d1) Monotonicity: (d(F1) = 1) ∧ (d(F2) = 1) ⇒ d(F1 + F2) = 1,
(d2) Hierarchical monotonicity: d(F) = 1 ⇒ d'(F') = 1.
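Property (d1) can be illustrated with a minimal Python sketch. It assumes that a frequency distribution F is held as a `collections.Counter` keyed by value strings, and that d is the example definition used later for {birth year: 3} (1 when the tens place shows two or more distinct values). The name `d_tens_place` and the two sample distributions are hypothetical. Property (d2) is not exercised because d' and F' are not defined in this passage.

```python
from collections import Counter

def d_tens_place(freq: Counter) -> int:
    """Return 1 if the values in freq show two or more distinct tens-place characters."""
    tens = {value[-2] for value in freq}
    return 1 if len(tens) >= 2 else 0

# Two hypothetical frequency distributions of the "birth year" attribute.
f1 = Counter({"1970": 1, "1989": 2})
f2 = Counter({"1972": 1, "2000": 1})

# (d1) monotonicity: if both parts are diverse, their sum is diverse as well.
if d_tens_place(f1) == 1 and d_tens_place(f2) == 1:
    print(d_tens_place(f1 + f2))
```

Adding two `Counter` objects sums the counts per value, which matches the sum F1 + F2 of two frequency distributions.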

階層の一つに対応する複数のサンプルの一般化値の組み合わせであって、複数の属性のうち一つの属性に対応する一般化値に対して複数の属性の他の全ての属性に対応する一般化値が多様性を有する一般化値の組み合わせを抽出しても良い。 A combination of generalized values of the samples corresponding to one of the hierarchies may also be extracted such that, for a generalized value of one of the attributes, the generalized values of all the other attributes have diversity.

さらに、一般化情報Ｔを一般化木で表したとき、一般化木の根に相当する階層を含む属性を最大１つだけ含むように一般化を行っても良い。 Further, when the generalized information T is represented by a generalized tree, the generalization may be performed so as to include at most one attribute including a hierarchy corresponding to the root of the generalized tree.

図１５は、処理の流れの例を示す図である。 FIG. 15 is a diagram illustrating an example of the flow of processing.
処理が開始されると、Ｓ１００で、データ編集装置１００の入力部１０２は、関係モデルＲ、関係モデルの各列の一般化木Ｔ、各一般化木の各階層の多様性定義Ｄ、最大墨塗り行数ｓを入力として受入れる。 When the processing is started, in S100, the input unit 102 of the data editing apparatus 100 accepts as input the relational model R, the generalized tree T of each column of the relational model, the diversity definition D of each hierarchy of each generalized tree, and the maximum number of sanitized lines s.

次のＳ１０２でデータ編集装置１００の一般化部１０４は、関係モデルＲの各列名の列名；最大階層を要素とする集合Ｃを用意する。関係モデルＲの列名は｛生年、郵便番号、趣味｝であり、それぞれの最大階層は一般化木Ｔより｛生年：３、郵便番号：２、趣味：２｝なので、Ｃ＝｛｛生年：３｝、｛郵便番号：２｝、｛趣味：２｝｝である。本ステップの処理が終わると、処理はＳ１０４に進む。 In the next S102, the generalization unit 104 of the data editing apparatus 100 prepares a set C whose elements are {column name: maximum hierarchy} pairs, one for each column name of the relational model R. The column names of the relational model R are {birth year, zip code, hobby}, and the generalized trees T give their maximum hierarchies as {birth year: 3, zip code: 2, hobby: 2}, so C = {{birth year: 3}, {zip code: 2}, {hobby: 2}}. When the process in this step ends, the process proceeds to S104.

Ｓ１０４でデータ編集装置１００の一般化部１０４は、集合Ｃの各要素Ｐにつき、多様性を達成する（多様性定義Ｄを満たす）最大の階層を要素とする集合Ｏを用意する。たとえば、要素Ｐ＝｛生年：３｝の場合、多様性を満たす最大の階層は｛生年：３｝で墨塗りは不要である。また、要素Ｐ＝｛郵便番号：２｝の場合、多様性を満たす最大の階層は｛郵便番号：２｝であり、墨塗り不要である。さらに、要素Ｐ＝｛趣味：２}の場合、多様性を満たす最大の階層は｛趣味：２｝で墨塗りは不要である。よって、集合Ｏ＝｛（｛生年：３｝、｛｝）、（｛郵便番号：２｝、｛｝）、（｛趣味：２｝、｛｝）｝となる。 In S104, the generalization unit 104 of the data editing apparatus 100 prepares a set O having elements of the maximum hierarchy that achieves diversity (satisfies diversity definition D) for each element P of the set C. For example, when the element P = {birth year: 3}, the maximum hierarchy that satisfies the diversity is {birth year: 3} and no sanitization is required. In addition, when the element P = {zip code: 2}, the maximum hierarchy that satisfies the diversity is {zip code: 2}, and no sanitization is required. Further, when the element P = {hobby: 2}, the maximum hierarchy that satisfies the diversity is {hobby: 2}, and no sanitization is required. Therefore, the set O = {({birth year: 3}, {}), ({zip code: 2}, {}), ({hobby: 2}, {})}.

Ｓ１０４の処理について、図１６を参照しながら説明する。 The process of S104 will be described with reference to FIG. 16.
処理が開始されると、Ｓ２００でデータ編集装置１００の一般化部１０４は、関係モデルＲ、関係モデルの各列の一般化木Ｔ、各一般化木の各階層の多様性定義Ｄ、最大墨塗り行数ｓ、｛列名：階層｝の組Ｐ、を入力として受入れる。 When the processing is started, in S200, the generalization unit 104 of the data editing apparatus 100 accepts as input the relational model R, the generalized tree T of each column of the relational model, the diversity definition D of each hierarchy of each generalized tree, the maximum number of sanitized lines s, and the set P of {column name: hierarchy} pairs.

次のＳ２０２でデータ編集装置１００の一般化部１０４は、空集合Ｏを用意する。本ステップの処理が終わると、処理はＳ２０４に進む。 In step S202, the generalization unit 104 of the data editing apparatus 100 prepares an empty set O. When the process of this step ends, the process proceeds to S204.

Ｓ２０４でデータ編集装置１００の一般化部１０４は、Ｐの列範囲で、Ｐによる一般化で関係モデルＲが多様性を達成する（多様性定義Ｄを満たす）ために必要な墨塗り行集合Ｉを算出する。たとえば、集合Ｃの各要素Ｐの列集合は｛生年、郵便番号｝なので、関係モデルＲの複写をこの列範囲でＰによって一般化する。図１１は、Ｐ’＝｛生年：３、郵便番号：２｝で一般化した結果を示す図である。多様性定義Ｄを達成するために、墨塗りすべき行集合Ｉ＝｛０、１、・・・、８｝となる。つまり、全行を墨塗りする必要がある。 In S204, the generalization unit 104 of the data editing apparatus 100 calculates the sanitized line set I that is needed for the relational model R, generalized by P over the column range of P, to achieve diversity (satisfy the diversity definition D). For example, since the column set of the element P of the set C is {birth year, zip code}, a copy of the relational model R is generalized by P over this column range. FIG. 11 is a diagram illustrating the result of generalization by P' = {birth year: 3, zip code: 2}. To achieve the diversity definition D, the line set to be sanitized is I = {0, 1, ..., 8}. In other words, all lines need to be sanitized.

Ｓ２０４の処理について、図１７を参照しながら説明する。 The process of S204 will be described with reference to FIG. 17.
処理が開始されると、Ｓ３００でデータ編集装置１００の一般化部１０４は、関係モデルＲ、関係モデルの各列の一般化木Ｔ、各一般化木の各階層の多様性定義Ｄ、最大墨塗り行数ｓ、｛列名：階層｝の組Ｐ、を入力として受入れる。 When the processing is started, in S300, the generalization unit 104 of the data editing apparatus 100 accepts as input the relational model R, the generalized tree T of each column of the relational model, the diversity definition D of each hierarchy of each generalized tree, the maximum number of sanitized lines s, and the set P of {column name: hierarchy} pairs.

次のＳ３０２でデータ編集装置１００の一般化部１０４は、関係モデルＲを関係モデルＲ’に複写し、関係モデルＲ’を集合Ｃの要素Ｐによって一般化木Ｔに従って一般化する。本ステップの処理が終わると、処理はＳ３０４に進む。 In the next step S302, the generalization unit 104 of the data editing apparatus 100 copies the relation model R to the relation model R ′, and generalizes the relation model R ′ according to the generalized tree T by the element P of the set C. When the process of this step ends, the process proceeds to S304.

Ｓ３０４でデータ編集装置１００の一般化部１０４は、空集合Ｉを用意する。本ステップの処理が終わると、処理はＳ３０６に進む。 In S304, the generalization unit 104 of the data editing apparatus 100 prepares an empty set I. When the process in this step ends, the process proceeds to S306.

Ｓ３０６でデータ編集装置１００の一般化部１０４は、論理値ｃを用意し、論理値ｃに初期値０を代入する。すなわち、ｃ＝０となる。本ステップの処理が終わると、処理はＳ３０８に進む。 In S306, the generalization unit 104 of the data editing apparatus 100 prepares a logical value c, and substitutes an initial value 0 for the logical value c. That is, c = 0. When the process of this step ends, the process proceeds to S308.

Ｓ３０８でデータ編集装置１００の一般化部１０４は、集合Ｃの要素Ｐの要素ｐを特定する変数ｌをリセットする。たとえば、ｌ＝０としても良い。本ステップの処理が終わると、処理はＳ３１０に進む。 In S308, the generalization unit 104 of the data editing apparatus 100 resets the variable l that specifies the element p of the element P of the set C. For example, l = 0 may be set. When the process in this step ends, the process proceeds to S310.

Ｓ３１０でデータ編集装置１００の一般化部１０４は、ｌを更新し、ｌを使って集合Ｃの要素Ｐから要素ｐを得る。たとえば、ｌの値を１だけ増やし、それに対応する要素ｐを得ても良い。まず、ｐ←｛生年：３｝とする。本ステップの処理が終わると、処理はＳ３１２に進む。 In S310, the generalization unit 104 of the data editing apparatus 100 updates l and obtains the element p from the element P of the set C using l. For example, the value of l may be increased by 1, and the corresponding element p may be obtained. First, p ← {year of birth: 3}. When the process of this step ends, the process proceeds to S312.

Ｓ３１２でデータ編集装置１００の一般化部１０４は、要素ｐ∈Ｐに対応する多様性定義ｄを多様性定義Ｄから抽出する。たとえば、｛生年：３｝の多様性定義は、「１０の位の値が２種類以上のとき１、さもなくば０を返す」写像となる。本ステップの処理が終わると、処理はＳ３１４に進む。 In S312, the generalization unit 104 of the data editing apparatus 100 extracts the diversity definition d corresponding to the element p ∈ P from the diversity definition D. For example, the diversity definition of {birth year: 3} is a mapping that "returns 1 when there are two or more distinct values in the tens place, and 0 otherwise". When the process of this step ends, the process proceeds to S314.

Ｓ３１４でデータ編集装置１００の一般化部１０４は、関係モデルＲ’の墨塗り行集合Ｉ以外の行から、要素ｐの列名以外の値が等しい行番号の集合をグループＪとし、Ｊの集合Ｕを算出する。図１１では、Ｒ’のＩ以外の行は、図１１の全ての行であり、ｐの列名以外の列は｛郵便番号｝なので、図１１の「郵便番号」の値が等しいグループを作り、Ｕ＝｛｛０、２、６、７｝、｛１、３、４、５｝、｛８｝｝となる。なお、たとえばＪ＝｛８｝は（郵便番号）＝（１３？）のグループである。また、ｐの列名以外の列が｛｝であった場合は、Ｒ’のＩ以外の全行を１つのグループをＪとし、Ｕ＝｛Ｊ｝とする。本ステップの処理が終わると、処理はＳ３１６に進む。 In S314, the generalization unit 104 of the data editing apparatus 100 takes the rows of the relational model R' that are not in the sanitized line set I, forms each set of row numbers whose values in the columns other than the column of the element p are equal into a group J, and calculates the set U of these groups. In FIG. 11, the rows of R' other than I are all the rows of FIG. 11, and the only column other than the column of p is {zip code}, so the rows are grouped by equal "zip code" value, giving U = {{0, 2, 6, 7}, {1, 3, 4, 5}, {8}}. For example, J = {8} is the group with (zip code) = (13?). If there are no columns other than the column of p, all rows of R' other than I form a single group J, and U = {J}. When the process of this step ends, the process proceeds to S316.
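The grouping of S314 can be sketched in Python as follows. The table `R_PRIME` is a hypothetical stand-in for FIG. 11: its cell values are chosen only to be consistent with the groups and frequency distributions quoted in the text, and the birth years of rows 1, 3, 4 and 5 are invented. The function name `group_rows` is also an assumption.

```python
from collections import defaultdict

# Hypothetical stand-in for the generalized table R' of FIG. 11.
R_PRIME = {
    0: {"birth year": "1970", "zip code": "12?"},
    1: {"birth year": "1970", "zip code": "14?"},
    2: {"birth year": "1972", "zip code": "12?"},
    3: {"birth year": "1981", "zip code": "14?"},
    4: {"birth year": "1982", "zip code": "14?"},
    5: {"birth year": "1993", "zip code": "14?"},
    6: {"birth year": "1989", "zip code": "12?"},
    7: {"birth year": "1989", "zip code": "12?"},
    8: {"birth year": "2000", "zip code": "13?"},
}

def group_rows(table, sanitized, p_column):
    """S314: group the rows outside I by their values in every column except p_column."""
    groups = defaultdict(set)
    for row, values in table.items():
        if row in sanitized:
            continue
        # With no other column the key is (), so all rows fall into one group J.
        key = tuple(sorted((c, v) for c, v in values.items() if c != p_column))
        groups[key].add(row)
    return list(groups.values())

U = group_rows(R_PRIME, sanitized=set(), p_column="birth year")
print(sorted(sorted(J) for J in U))  # [[0, 2, 6, 7], [1, 3, 4, 5], [8]]
```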

Ｓ３１６でデータ編集装置１００の一般化部１０４は、集合Ｕの要素を特定する変数ｍをリセットする。たとえば、ｍ＝０としても良い。本ステップの処理が終わると、処理はＳ３１８に進む。以下では、Ｕの各要素について、Ｓ３２２〜Ｓ３２８を繰り返す。 In S316, the generalization unit 104 of the data editing apparatus 100 resets the variable m that identifies the elements of the set U. For example, m = 0 may be set. When the process of this step ends, the process proceeds to S318. Hereinafter, S322 to S328 are repeated for each element of U.

Ｓ３１８でデータ編集装置１００の一般化部１０４は、ｍの更新を行う。たとえば、ｍの値を１だけ増やしても良い。本ステップの処理が終わると、処理はＳ３２０に進む。 In step S318, the generalization unit 104 of the data editing apparatus 100 updates m. For example, the value of m may be increased by 1. When the process of this step ends, the process proceeds to S320.

Ｓ３２０でデータ編集装置１００の一般化部１０４は、現在のｍに対応する集合Ｕの要素Ｊを取得する。たとえば、Ｊ←｛０、２、６、７｝とする。本ステップの処理が終わると、処理はＳ３２２に進む。 In S320, the generalization unit 104 of the data editing apparatus 100 acquires the element J of the set U corresponding to the current m. For example, J ← {0, 2, 6, 7}. When the process of this step ends, the process proceeds to S322.

Ｓ３２２でデータ編集装置１００の一般化部１０４は、関係モデルＲ’から、要素ｐの列且つＪの値の度数分布Ｆを算出する。たとえば、ｐの列は生年、Ｊ＝｛０、２、６、７｝なので、図１１よりＦ＝｛“１９７０”：１、“１９７２”：１、“１９８９”：２｝となる。本ステップの処理が終わると、処理はＳ３２４に進む。 In S322, the generalization unit 104 of the data editing apparatus 100 calculates the frequency distribution F of the column of elements p and the value of J from the relationship model R ′. For example, since the row of p is the year of birth and J = {0, 2, 6, 7}, F = {“1970”: 1, “1972”: 1, “1989”: 2} from FIG. When the process of this step ends, the process proceeds to S324.

Ｓ３２４でデータ編集装置１００の一般化部１０４は、ｄ（Ｆ）＝０かどうかを判定する。もし、この判定の結果が“ＹＥＳ”、すなわちｄ（Ｆ）＝０である場合、処理はＳ３２６に進む。もし、この判定の結果が“ＮＯ”、すなわちｄ（Ｆ）≠０である場合、処理はＳ３２８に進む。たとえばＦ＝｛“１９７０”：１、“１９７２”：１、“１９８９”：２｝の場合、Ｆには１０の位の値が７と８の２種類あるため、ｄ（Ｆ）＝１となる。この場合、処理はＳ３２８に進む。 In S324, the generalization unit 104 of the data editing apparatus 100 determines whether d(F) = 0. If the result of this determination is "YES", that is, d(F) = 0, the process proceeds to S326. If the result of this determination is "NO", that is, d(F) ≠ 0, the process proceeds to S328. For example, in the case of F = {"1970": 1, "1972": 1, "1989": 2}, F has two distinct values in the tens place, 7 and 8, so d(F) = 1. In this case, the process proceeds to S328.

Ｓ３２６でデータ編集装置１００の一般化部１０４は、墨塗り行集合Ｉにｍに対応する集合Ｕの要素Ｊの全要素を追加し、論理値ｃに１を代入（ｃ←１）する。本ステップの処理が終わると、処理はＳ３２８に進む。 In S326, the generalization unit 104 of the data editing apparatus 100 adds all elements of the element J of the set U corresponding to m to the sanitized line set I, and substitutes 1 for the logical value c (c ← 1). When the process of this step ends, the process proceeds to S328.

Ｓ３２８でデータ編集装置１００の一般化部１０４は、集合Ｕの全ての要素について処理したかどうかを判定する。もし、この判定の結果が“ＹＥＳ”、すなわち、集合Ｕの全ての要素について処理した場合、処理はＳ３３０に進む。もし、この判定の結果が“ＮＯ”、すなわち集合Ｕの全ての要素について処理していない場合、処理はＳ３１８に戻る。 In S328, the generalization unit 104 of the data editing apparatus 100 determines whether all elements of the set U have been processed. If the result of this determination is “YES”, that is, if all elements of the set U have been processed, the process proceeds to S330. If the result of this determination is “NO”, that is, if all elements of the set U have not been processed, the process returns to S318.

最後の繰り返しでＪ←｛８｝になったとする。Ｓ３２２で、ｐの列は「生年」、Ｊ＝｛８｝なので、図１１よりＦ＝｛“２０００”：１｝となる。すると、Ｓ３２４でＦには１０の位の値が０の１種類しかないため、ｄ（Ｆ）＝０となり、Ｓ３２６でＩにＪの全要素を追加し、たとえば、Ｉ＝｛８｝、ｃ＝１となる。 Suppose that J ← {8} in the last iteration. In S322, since the column of p is "birth year" and J = {8}, F = {"2000": 1} from FIG. 11. Then, in S324, F has only one distinct value in the tens place, namely 0, so d(F) = 0, and in S326 all elements of J are added to I, giving, for example, I = {8} and c = 1.

Ｓ３３０でデータ編集装置１００の一般化部１０４は、集合Ｃの要素Ｐの全ての要素について処理したかどうかを判定する。もし、この判定の結果が“ＹＥＳ”、すなわち、要素Ｐの全ての要素について処理した場合、処理はＳ３３２に進む。もし、この判定の結果が“ＮＯ”、すなわち、要素Ｐの全ての要素について処理していない場合、処理はＳ３１０に戻る。 In S330, the generalization unit 104 of the data editing apparatus 100 determines whether all the elements p of the element P of the set C have been processed. If the result of this determination is "YES", that is, if all elements of P have been processed, the process proceeds to S332. If the result of this determination is "NO", that is, if not all elements of P have been processed, the process returns to S310.

２回目のＳ３１０で、ｐ←郵便番号：２としても良い。 In the second S310, p ← {zip code: 2} may be set.
２回目のＳ３１２では、｛郵便番号：２｝の多様性定義は、図８より「１０の位の値が２種類以上のとき１、さもなくば０を返す」写像となる。 In the second S312, according to FIG. 8, the diversity definition of {zip code: 2} is a mapping that "returns 1 when there are two or more distinct values in the tens place, and 0 otherwise".

２回目のＳ３１４で、Ｒ’のＩ以外の行は図１１の行｛０、１、・・・、７｝であり、ｐの列名以外の列は｛生年｝なので、図１１の行｛０、１、・・・、７｝の「生年」の値が等しいグループを作り、Ｕ＝｛｛０、１｝、｛２｝、｛３｝、｛４｝、｛５｝、｛６、７｝｝となる。 In S314 of the second time, the rows other than I of R ′ are the rows {0, 1,..., 7} of FIG. 11 and the columns other than the column name of p are {the year of birth}. 0, 1,..., 7} having the same “birth year” value, and U = {{0, 1}, {2}, {3}, {4}, {5}, {6, 7}}.

２回目のＳ３２０では、Ｊ←｛０、１｝となり、次のＳ３２２でＦ＝｛“１２？”：１、 “１４？”：１｝となる。 In the second S320, J ← {0, 1}, and in the next S322, F = {“12?”: 1, “14?”: 1}.

２回目のＳ３２４では、Ｆには１０の位の値が２と４の２種類あるため、ｄ（Ｆ）＝１である。 In the second S324, F has two distinct values in the tens place, 2 and 4, so d(F) = 1.

Ｓ３１８〜Ｓ３２８の処理を繰り返し、全てのＵの要素について繰り返し終了時には、Ｉ＝｛２、３、４、５、６、７、８｝となる。 The processes of S318 to S328 are repeated, and when all U elements are repeated, I = {2, 3, 4, 5, 6, 7, 8}.

Ｓ３３２でデータ編集装置１００の一般化部１０４は、論理値ｃの値が０かどうかを判定する。もし、この判定の結果が“ＹＥＳ”、すなわち、ｃ＝０の場合、処理はＳ３３４に進む。もし、この判定の結果が“ＮＯ”、すなわち、ｃ≠０の場合、処理はＳ３０６に戻る。 In S332, the generalization unit 104 of the data editing apparatus 100 determines whether the logical value c is 0. If the result of this determination is “YES”, ie, c = 0, the process proceeds to S334. If the result of this determination is “NO”, ie, c ≠ 0, the process returns to S306.

同様に処理をすると、次にＳ３３２に処理が移った時にはＩ＝｛０、１、２、３、４、５、６、７、８｝、ｃ＝１になっているため、再度Ｓ３０６に戻る。その次にＳ３３２に処理が移った時にはＩ＝｛０、１、２、３、４、５、６、７、８｝、ｃ＝０になっている。 Continuing in the same way, the next time the process reaches S332, I = {0, 1, 2, 3, 4, 5, 6, 7, 8} and c = 1, so the process returns to S306 again. The time after that, when the process reaches S332, I = {0, 1, 2, 3, 4, 5, 6, 7, 8} and c = 0.

Ｓ３３４でデータ編集装置１００の出力部１０６は、墨塗り行集合Ｉを出力する。本ステップの処理が終わると、処理は図１６のＳ２０６に進む。 In step S334, the output unit 106 of the data editing apparatus 100 outputs the sanitized line set I. When the process in this step is completed, the process proceeds to S206 in FIG.

図１７のＳ３３２により、墨塗り行集合Ｉに変更があった場合、多様性が達成できている（多様性定義Ｄが満たされている）か再度調べる、といった処理を実現している。上記例のように、最初にＳ３３２に制御が移ったときは行｛０、１｝は多様性が達成されていると扱われているが、墨塗り行集合Ｉが増えたことで結局行｛０、１｝も多様性が達成されていないという結果が算出されている。これは、複数列で多様性定義を同時に達成するように要求するために必要な特徴的な処理である。 S332 of FIG. 17 realizes the process of checking again whether diversity is achieved (the diversity definition D is satisfied) whenever the sanitized line set I has changed. As in the above example, when control first reaches S332, rows {0, 1} are treated as having achieved diversity, but because the sanitized line set I has grown, rows {0, 1} are eventually found not to achieve diversity either. This is a characteristic process that is needed in order to require the diversity definitions to be achieved simultaneously in multiple columns.
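The repeat-until-unchanged behaviour of FIG. 17 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the table `R_PRIME` is a hypothetical stand-in for FIG. 11 whose cells are chosen only to agree with the values quoted in the text, the diversity definition is the "two or more distinct tens-place values" example for both columns, and the names `tens_place_diverse` and `sanitized_rows` are invented.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the generalized table R' of FIG. 11.
R_PRIME = {
    0: {"birth year": "1970", "zip code": "12?"},
    1: {"birth year": "1970", "zip code": "14?"},
    2: {"birth year": "1972", "zip code": "12?"},
    3: {"birth year": "1981", "zip code": "14?"},
    4: {"birth year": "1982", "zip code": "14?"},
    5: {"birth year": "1993", "zip code": "14?"},
    6: {"birth year": "1989", "zip code": "12?"},
    7: {"birth year": "1989", "zip code": "12?"},
    8: {"birth year": "2000", "zip code": "13?"},
}

def tens_place_diverse(freq):
    """Example definition d: 1 if the tens place shows two or more distinct values."""
    return 1 if len({value[-2] for value in freq}) >= 2 else 0

def sanitized_rows(table, columns, d):
    """Sketch of S304 to S334: return the sanitized line set I."""
    sanitized = set()                                    # S304
    changed = True
    while changed:                                       # S332: repeat while c == 1
        changed = False                                  # S306: c <- 0
        for p in columns:                                # S308 to S330
            groups = defaultdict(set)                    # S314: build U
            for row, values in table.items():
                if row not in sanitized:
                    key = tuple(values[c] for c in columns if c != p)
                    groups[key].add(row)
            for rows in groups.values():                 # S316 to S328
                freq = Counter(table[r][p] for r in rows)    # S322
                if d[p](freq) == 0:                      # S324
                    sanitized |= rows                    # S326: add J to I
                    changed = True                       #       and set c <- 1
    return sanitized

columns = ["birth year", "zip code"]
d = {c: tens_place_diverse for c in columns}
I = sanitized_rows(R_PRIME, columns, d)
print(sorted(I))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

With this table the first pass leaves rows 0 and 1 unsanitized, and only the second pass, which regroups the remaining rows, finds that they no longer satisfy the definition.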

Ｓ２０６でデータ編集装置１００の一般化部１０４は、墨塗り行集合Ｉの要素数｜Ｉ｜が最大墨塗り行数ｓ以下であるかどうか、すなわち｜Ｉ｜≦ｓであるかどうかを判定する。もし、この判定の結果が“ＹＥＳ”、すなわち、墨塗り行集合Ｉの要素数｜Ｉ｜が最大墨塗り行数ｓ以下（｜Ｉ｜≦ｓ）の場合、処理はＳ２０８に進む。もし、この判定の結果が“ＮＯ”、すなわち、墨塗り行集合Ｉの要素数｜Ｉ｜が最大墨塗り行数ｓより大きい（｜Ｉ｜＞ｓ）場合、処理はＳ２１０に進む。たとえば、Ｉの要素数｜Ｉ｜＝９、最大墨塗り行数ｓ＝２の場合は、条件は成立しない。よって、処理はＳ２１０に進む。もし、条件が成立していた場合は、Ｓ２０８に進む。 In S206, the generalization unit 104 of the data editing apparatus 100 determines whether the number of elements | I | of the sanitized line set I is equal to or less than the maximum number of sanitized lines s, that is, whether | I | ≦ s. . If the result of this determination is “YES”, that is, if the number of elements | I | of the sanitized line set I is equal to or less than the maximum sanitized line number s (| I | ≦ s), the process proceeds to S208. If the result of this determination is “NO”, that is, if the number of elements | I | in the sanitized line set I is greater than the maximum sanitized line number s (| I |> s), the process proceeds to S210. For example, the condition is not satisfied when the number of elements I | I | = 9 and the maximum number of sanitized lines s = 2. Therefore, the process proceeds to S210. If the condition is satisfied, the process proceeds to S208.

Ｓ２０８でデータ編集装置１００の一般化部１０４は、集合Ｃの要素Ｐと墨塗り行集合Ｉの組を要素として集合Ｏに追加する。たとえば、Ｏ＝｛（Ｐ、Ｉ）｝となる。本ステップの処理が終わると、処理はＳ２４２に進む。 In S208, the generalization unit 104 of the data editing apparatus 100 adds the set of the element P of the set C and the sanitized line set I as an element to the set O. For example, O = {(P, I)}. When the process of this step ends, the process proceeds to S242.

Ｓ２１０でデータ編集装置１００の一般化部１０４は、空集合Ｃを用意し、集合Ｃの要素Ｐを集合Ｃの要素に追加する。たとえば、集合Ｃ＝｛Ｐ｝＝｛｛生年：３、郵便番号：２｝｝である。本ステップの処理が終わると、処理はＳ２１２に進む。 In S210, the generalization unit 104 of the data editing apparatus 100 prepares an empty set C and adds an element P of the set C to an element of the set C. For example, the set C = {P} = {{year of birth: 3, zip code: 2}}. When the process of this step ends, the process proceeds to S212.

Ｓ２１２でデータ編集装置１００の一般化部１０４は、Ｃの要素が空になるまで、Ｓ２１２からＳ２３２までの繰り返しを開始する。本ステップの処理が終わると、処理はＳ２１４に進む。 In S212, the generalization unit 104 of the data editing apparatus 100 starts to repeat S212 to S232 until the element C becomes empty. When the process of this step ends, the process proceeds to S214.

Ｓ２１４でデータ編集装置１００の一般化部１０４は、集合Ｃから要素を１つ削除し、Ｐとする。たとえば、Ｐ←｛生年：３、郵便番号：２｝、Ｃ←｛｝とする。本ステップの処理が終わると、処理はＳ２１６に進む。 In S214, the generalization unit 104 of the data editing apparatus 100 deletes one element from the set C and sets it to P. For example, P ← {year of birth: 3, postal code: 2}, C ← {}. When the process of this step ends, the process proceeds to S216.

Ｓ２１６でデータ編集装置１００の一般化部１０４は、Ｐの列名ａのそれぞれについて、Ｓ２１６からＳ２３０までの繰り返しを開始する。本ステップの処理が終わると、処理はＳ２１８に進む。 In S216, the generalization unit 104 of the data editing apparatus 100 starts repeating S216 to S230 for each of the P column names a. When the process of this step ends, the process proceeds to S218.

Ｓ２１８でデータ編集装置１００の一般化部１０４は、集合Ｃの要素Ｐの一つの列名ａを取得する。たとえば、ａ←生年とする。本ステップの処理が終わると、処理はＳ２２４に進む。 In S218, the generalization unit 104 of the data editing apparatus 100 acquires one column name a of the element P of the set C. For example, a ← birth year. When the process of this step ends, the process proceeds to S224.

Ｓ２１８でデータ編集装置１００の一般化部１０４は、集合Ｃの要素Ｐの列名ａの階層が０より大きいかどうかを判定する。もし、この判定の結果が“ＹＥＳ”、すなわち、集合Ｃの要素Ｐの列名ａの階層が０より大きい場合、処理はＳ２２０に進む。もし、この判定の結果が“ＮＯ”、すなわち、集合Ｃの要素Ｐの列名ａの階層が０より大きくない場合、処理はＳ２３０に進む。たとえば、Ｐの生年の階層は３であるので、処理はＳ２２０に進む。 In S218, the generalization unit 104 of the data editing apparatus 100 determines whether the hierarchy of the column name a of the element P of the set C is greater than zero. If the result of this determination is “YES”, that is, if the hierarchy of the column name a of the element P of the set C is greater than 0, the process proceeds to S220. If the result of this determination is “NO”, that is, if the hierarchy of the column name a of the element P of the set C is not greater than 0, the process proceeds to S230. For example, since the hierarchy of P's birth year is 3, the process proceeds to S220.

Ｓ２２０でデータ編集装置１００の一般化部１０４は、要素Ｐを要素Ｐ’に複写し、要素Ｐ’のａの階層を１だけ減じる。たとえば、Ｐ’←｛生年：２、郵便番号：２｝とする。本ステップの処理が終わると、処理はＳ２２２に進む。 In S220, the generalizing unit 104 of the data editing apparatus 100 copies the element P to the element P ', and reduces the hierarchy of a of the element P' by 1. For example, P ′ ← {year of birth: 2, postal code: 2}. When the process of this step ends, the process proceeds to S222.

Ｓ２２２でデータ編集装置１００の一般化部１０４は、Ｐ’の列範囲で、Ｐ’による一般化で関係モデルＲが多様性を達成する（多様性定義Ｄを満たす）ために必要な墨塗り行集合Ｉを算出する。たとえば、要素Ｐ’の列集合は｛生年、郵便番号｝なので、関係モデルＲの複写をこの列範囲でＰ’によって一般化する。図１２は、Ｐ’＝｛生年：２、郵便番号：２｝で一般化した結果の例を示す図である。多様性定義Ｄを達成するために、墨塗りすべき行集合Ｉ＝｛８｝となる。図１２に示されている例では、墨塗りすべき行集合Ｉの要素数｜Ｉ｜＝１、最大墨塗り行数ｓ＝２なので条件が成立する。 In S222, the generalization unit 104 of the data editing apparatus 100 performs the sanitization necessary for the relation model R to achieve diversity (satisfaction with the diversity definition D) in the column range of P ′ by the generalization by P ′. Set I is calculated. For example, since the column set of element P ′ is {birth year, postal code}, the copy of relational model R is generalized by P ′ in this column range. FIG. 12 is a diagram illustrating an example of a result generalized by P ′ = {year of birth: 2, postal code: 2}. In order to achieve diversity definition D, the set of rows to be sanitized is I = {8}. In the example shown in FIG. 12, the condition is satisfied because the number of elements | I | = 1 of the line set I to be sanitized and the maximum number of sanitized lines s = 2.

Ｓ２２２の処理は、Ｓ２０４の処理と同様であり、図１７に示されているので、繰り返しの説明は省略する。本ステップの処理が終わると、処理はＳ２２４に進む。 The process of S222 is the same as the process of S204, and is shown in FIG. When the process of this step ends, the process proceeds to S224.

Ｓ２２４でデータ編集装置１００の一般化部１０４は、墨塗り行集合Ｉの要素数｜Ｉ｜が最大墨塗り行数ｓ以下であるかどうか、すなわち｜Ｉ｜≦ｓであるかどうかを判定する。もし、この判定の結果が“ＹＥＳ”、すなわち、墨塗り行集合Ｉの要素数｜Ｉ｜が最大墨塗り行数ｓ以下（｜Ｉ｜≦ｓ）の場合、処理はＳ２２６に進む。もし、この判定の結果が“ＮＯ”、すなわち、墨塗り行集合Ｉの要素数｜Ｉ｜が最大墨塗り行数ｓより大きい（｜Ｉ｜＞ｓ）場合、処理はＳ２２８に進む。｜Ｉ｜＝１、ｓ＝２の場合は条件が成立し、Ｓ２２６に進む。 In S224, the generalization unit 104 of the data editing apparatus 100 determines whether the number of elements |I| of the sanitized line set I is equal to or less than the maximum number of sanitized lines s, that is, whether |I| ≤ s. If the result of this determination is "YES", that is, if |I| ≤ s, the process proceeds to S226. If the result of this determination is "NO", that is, if |I| > s, the process proceeds to S228. When |I| = 1 and s = 2, the condition is satisfied, and the process proceeds to S226.

Ｓ２２６でデータ編集装置１００の一般化部１０４は、集合Ｃの要素Ｐ’と墨塗り行集合Ｉの組を要素として集合Ｏに追加する。たとえば集合Ｏに（Ｐ’、Ｉ）を追加する。結果、Ｏ＝｛（｛生年：２、郵便番号：２｝、｛８｝）｝となる。本ステップの処理が終わると、処理はＳ２３０に進む。 In S226, the generalizing unit 104 of the data editing apparatus 100 adds the set of the element P ′ of the set C and the sanitizing line set I as an element to the set O. For example, (P ′, I) is added to the set O. As a result, O = {({year of birth: 2, postal code: 2}, {8})}. When the process of this step ends, the process proceeds to S230.

Ｓ２２８でデータ編集装置１００の一般化部１０４は、集合Ｃに要素Ｐ’を追加する。本ステップの処理が終わると、処理はＳ２３０に進む。 In S228, the generalization unit 104 of the data editing apparatus 100 adds the element P ′ to the set C. When the process of this step ends, the process proceeds to S230.

Ｓ２３０でデータ編集装置１００の一般化部１０４は、要素Ｐの全ての列について、Ｓ２１６〜Ｓ２２８の処理を行ったかどうかを判定する。もし、この判定の結果が“ＹＥＳ”、すなわち、要素Ｐの全ての列について、Ｓ２１６〜Ｓ２２８の処理を行った場合、処理はＳ２３２に進む。もし、この判定の結果が“ＮＯ”、すなわち、要素Ｐの全ての列について、Ｓ２１６〜Ｓ２２８の処理を行っていない場合、処理はＳ２１６に戻る。たとえば、ａ←郵便番号とし、Ｓ２２４に戻る。 In S230, the generalization unit 104 of the data editing apparatus 100 determines whether or not the processing of S216 to S228 has been performed for all the columns of the element P. If the result of this determination is “YES”, that is, if the processes of S216 to S228 have been performed for all the columns of the element P, the process proceeds to S232. If the result of this determination is “NO”, that is, if the processing of S216 to S228 has not been performed for all the columns of the element P, the processing returns to S216. For example, a ← zip code is set, and the process returns to S224.

２回目のＳ２１８では、Ｐの郵便番号の階層は２であるため、処理はＳ２２０に進む。
２回目のＳ２２０でＰをＰ’に複写し、Ｐ’のａの階層を１だけ減じる。すると、Ｐ’←｛生年：３、郵便番号：１｝となる。 In the second S218, since the postal code hierarchy of P is 2, the process proceeds to S220.
In the second S220, P is copied to P ′, and the level of P ′ a is reduced by 1. Then, P ′ ← {year of birth: 3, postal code: 1}.

２回目のＳ２２２では、Ｐ’の列集合は｛生年、郵便番号｝なので、Ｒの複写をこの列範囲でＰ’によって一般化する。図１３は、Ｐ’＝｛生年：３、郵便番号：１｝で一般化した結果の例を示す図である。多様性定義Ｄを達成するために、墨塗りすべき行集合Ｉ＝｛０、１、・・・、８｝となる。つまり、全行を墨塗りする必要がある。 In the second S222, since the column set of P 'is {birth year, postal code}, the copy of R is generalized by P' in this column range. FIG. 13 is a diagram illustrating an example of a result generalized by P ′ = {year of birth: 3, zip code: 1}. In order to achieve the diversity definition D, the row set to be sanitized is I = {0, 1,..., 8}. In other words, all lines need to be painted.

２回目のＳ２２４では、墨塗りすべき行集合Ｉの要素数｜Ｉ｜＝９、最大墨塗り行数ｓ＝２なので条件は成立しない。よって、処理はＳ２２８に進む。 In the second S224, the condition is not satisfied because the number of elements | I | = 9 of the line set I to be sanitized and the maximum number of sanitized lines s = 2. Therefore, the process proceeds to S228.

Ｓ２２８でデータ編集装置１００の一般化部１０４は、ＣにＰ’を追加し、集合Ｃ＝｛Ｐ’｝＝｛｛生年：３、郵便番号：１｝｝である。 In S228, the generalization unit 104 of the data editing apparatus 100 adds P' to C, so that the set C = {P'} = {{birth year: 3, zip code: 1}}.

Ｓ２３２でデータ編集装置１００の一般化部１０４は、集合Ｃの全ての要素Ｐについて、Ｓ２１２〜Ｓ２３０の処理を行ったかどうかを判定する。もし、この判定の結果が“ＹＥＳ”、すなわち、集合Ｃの全ての要素Ｐについて、Ｓ２１２〜Ｓ２３０の処理を行った場合、処理はＳ２３４に進む。もし、この判定の結果が“ＮＯ”、すなわち、集合Ｃの全ての要素Ｐについて、Ｓ２１２〜Ｓ２３０の処理を行っていない場合、処理はＳ２１２に戻る。 In S232, the generalization unit 104 of the data editing apparatus 100 determines whether or not the processing of S212 to S230 has been performed for all elements P of the set C. If the result of this determination is “YES”, that is, if the processes of S212 to S230 have been performed for all elements P of the set C, the process proceeds to S234. If the result of this determination is “NO”, that is, if the processes of S212 to S230 have not been performed for all the elements P of the set C, the process returns to S212.

たとえば、Ｃは空でない場合、データ編集装置１００の一般化部１０４は、集合Ｃから要素を１つ削除し、Ｐとする。つまり、データ編集装置１００の一般化部１０４は、この集合Ｃが空になるまで、Ｓ２１４〜Ｓ２３２の処理を繰り返す。結果、Ｃが空になった時点で、Ｏ＝｛（｛生年：２、郵便番号：２｝、｛８｝）、（｛生年：２、郵便番号：１｝、｛８｝）、（｛生年：３、郵便番号：０｝、｛｝）｝を得る。 For example, when C is not empty, the generalization unit 104 of the data editing apparatus 100 deletes one element from the set C and sets it as P. That is, the generalization unit 104 of the data editing apparatus 100 repeats the processes of S214 to S232 until the set C becomes empty. As a result, when C becomes empty, O = {({birth year: 2, zip code: 2}, {8}), ({birth year: 2, zip code: 1}, {8}), ({ Year of birth: 3, postal code: 0}, {})}.

Ｓ２３４でデータ編集装置１００の一般化部１０４は、集合Ｏから、極大でない要素Ｐ’を持つ要素を削除する。たとえば、集合Ｏの３要素のうち（｛生年：２、郵便番号：１｝、｛８｝）は極大でない（他の要素（｛生年：２、郵便番号：２｝、｛８｝）よりＰ’の部分が小さい）ので、これを削除する。結果、集合Ｏ＝｛（｛生年：２、郵便番号：２｝、｛８｝）、（｛生年：３、郵便番号：０｝、｛｝）｝となる。本ステップの処理が終わると、処理はＳ２３６に進む。 In S234, the generalization unit 104 of the data editing apparatus 100 deletes from the set O every element whose P' is not maximal. For example, of the three elements of the set O, ({birth year: 2, zip code: 1}, {8}) is not maximal (its P' part is smaller than that of another element, ({birth year: 2, zip code: 2}, {8})), so it is deleted. As a result, the set O = {({birth year: 2, zip code: 2}, {8}), ({birth year: 3, zip code: 0}, {})}. When the process of this step ends, the process proceeds to S236.

Ｓ２３６でデータ編集装置１００の出力部１０６は、集合Ｏを出力する。本ステップの処理が終わると、処理は図１５のＳ１０６に進む。 In S236, the output unit 106 of the data editing apparatus 100 outputs the set O. When the process in this step is completed, the process proceeds to S106 in FIG.

図１６では、｜Ｉ｜＞ｓの場合はその墨塗り行集合Ｉは使用しないため、図１７のＳ３２６の後などで｜Ｉ｜＞ｓとなった場合に、データ編集装置１００の出力部１０６は、すぐに墨塗り行集合Ｉを出力して図１７に示されている処理を終了しても良い。そうすることで、処理量を減らす効果がある。 In FIG. 16, when | I |> s, the sanitized line set I is not used. Therefore, when | I |> s after S326 in FIG. 17 or the like, the output unit 106 of the data editing apparatus 100 May immediately output the sanitized line set I and end the processing shown in FIG. By doing so, there is an effect of reducing the processing amount.
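The search of FIG. 16 (S204 to S236) can be sketched in Python as follows. It is a sketch under stated assumptions: the FIG. 17 computation is replaced by a lookup table `TEXT_RESULTS` holding only the sanitized line sets quoted in the text, the `seen` set that skips repeated P' is an addition not described in the source, and returning directly after S208 is an assumption. The names `search` and `lookup` are hypothetical.

```python
def search(p0, s, sanitized_rows):
    """Sketch of S204 to S236 for one starting element P = p0."""
    first = sanitized_rows(p0)                       # S204
    if len(first) <= s:                              # S206
        return [(p0, first)]                         # S208 (assumed to end the search)
    found = []                                       # the set O
    pending = [p0]                                   # S210: the work set C
    seen = set()                                     # assumption: skip repeated P'
    while pending:                                   # S212
        p = pending.pop()                            # S214
        for a in p:                                  # S216
            if p[a] <= 0:                            # S218
                continue
            lowered = {**p, a: p[a] - 1}             # S220: P' with hierarchy of a reduced by 1
            key = tuple(sorted(lowered.items()))
            if key in seen:
                continue
            seen.add(key)
            rows = sanitized_rows(lowered)           # S222
            if len(rows) <= s:                       # S224
                found.append((lowered, rows))        # S226
            else:
                pending.append(lowered)              # S228

    def dominated(p):                                # S234: P' is not maximal
        return any(q != p and all(q[c] >= p[c] for c in p) for q, _ in found)

    return [(p, rows) for p, rows in found if not dominated(p)]

# Sanitized line sets quoted in the text, keyed by (birth year, zip code) hierarchy.
TEXT_RESULTS = {
    (3, 2): set(range(9)), (3, 1): set(range(9)),
    (2, 2): {8}, (2, 1): {8}, (3, 0): set(),
}

def lookup(p):
    return TEXT_RESULTS[(p["birth year"], p["zip code"])]

O = search({"birth year": 3, "zip code": 2}, s=2, sanitized_rows=lookup)
print(O)
```

The early exit mentioned above would go inside the function passed as `sanitized_rows`: once its partial result exceeds s, it can return immediately, since `search` only compares the size with s.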

Ｓ１０６でデータ編集装置１００の一般化部１０４は、関係モデルの列を指定する変数ｉを０に設定する。 In S106, the generalization unit 104 of the data editing apparatus 100 sets a variable i for designating a relation model column to 0.

次のＳ１０８でデータ編集装置１００の一般化部１０４は、ｉの値を１だけ増やす。本ステップの処理が終わると、処理はＳ１１０に進む。 In the next S108, the generalization unit 104 of the data editing apparatus 100 increases the value of i by 1. When the process in this step is finished, the process proceeds to S110.

Ｓ１１０でデータ編集装置１００の一般化部１０４は、列名集合の異なる全ての２つのＧ∈Ｏの｛列名：階層｝の組み合わせについて、重複列名について階層を２つの最小値として融合した｛列名：階層｝をそれぞれ求め、それらを集合Ｃとし、集合Ｃから階層が極大でない要素を削除する。図２の例では、列名集合の異なる２つのＧ∈Ｏの｛列名：階層｝の組み合わせは、（１）｛生年：３｝、｛郵便番号：２｝、（２）｛生年：３｝、｛趣味：２｝、（３）｛郵便番号：２｝、｛趣味：２｝の３通りとなる。いずれも重複列名がないためそのまま融合し、集合Ｃ＝｛｛生年：３、郵便番号：２｝、｛生年：３、趣味：２｝、｛郵便番号：２、趣味：２｝｝となる。集合Ｃの要素は全て極大である。重複列名があった場合や、極大でない要素があった場合の例は、後の2回目の繰り返しの時に処理をする。本ステップの処理が終わると、処理はＳ１１２に進む。 In S110, for every pair of elements G ∈ O whose column name sets differ, the generalization unit 104 of the data editing apparatus 100 obtains the {column name: hierarchy} produced by merging the two, taking the minimum of the two hierarchies for any column name they share. It collects the results as the set C and deletes from C the elements whose hierarchies are not maximal. In the example of FIG. 2, the pairs of {column name: hierarchy} of two G ∈ O with different column name sets are (1) {birth year: 3}, {zip code: 2}, (2) {birth year: 3}, {hobby: 2}, and (3) {zip code: 2}, {hobby: 2}, three in total. None of them shares a column name, so they are merged as they are, giving the set C = {{birth year: 3, zip code: 2}, {birth year: 3, hobby: 2}, {zip code: 2, hobby: 2}}. All elements of the set C are maximal. Cases with a shared column name or a non-maximal element appear in the second iteration described later. When the process of this step ends, the process proceeds to S112.

Ｓ１１２でデータ編集装置１００の一般化部１０４は、集合Ｏを空にし、各要素Ｐ∈Ｃにつき、多様性を満たす（多様性定義Ｄを満たす）全ての極大の階層を集合Ｏに追加する。本ステップの処理は、図１６に示されている処理と同様である。よって、繰り返しの説明は省略する。たとえば、集合Ｃの各要素Ｐ＝｛生年：３、郵便番号：２｝の場合、多様性を満たす極大の階層は、｛生年：３、郵便番号：０｝で墨塗り不要と、｛生年：２、郵便番号：２｝で墨塗り｛８｝である。同様に、Ｐ＝｛生年：３、趣味：２｝の場合は（｛生年：２、趣味：１｝、｛８｝）と（｛生年：１、趣味：２｝、｛８｝）であり、Ｐ＝｛郵便番号：２、趣味：２｝の場合は（｛郵便番号：２、趣味：１｝、｛８｝）と（｛郵便番号：０、趣味：２｝、｛｝）である。よって、Ｏ＝｛（｛生年：３、郵便番号：０｝、｛｝）、（｛生年：２、郵便番号：２｝、｛８｝）、（｛生年：２、趣味：１｝、｛８｝）、（｛生年：１、趣味：２｝、｛８｝）、（｛郵便番号：２、趣味：１｝、｛８｝）、（｛郵便番号：０、趣味：２｝、｛｝）｝である。図２の例では、集合Ｏの要素のうち、（｛生年：３、郵便番号：０｝、｛｝）と（｛郵便番号：０、趣味：２｝、｛｝）はいずれも階層が正値である列名が１つしかないため、これらを削除し、Ｏ＝｛（｛生年：２、郵便番号：２｝、｛８｝）、（｛生年：２、趣味：１｝、｛８｝）、（｛生年：１、趣味：２｝、｛８｝）、（｛郵便番号：２、趣味：１｝、｛８｝）｝となる。本ステップの処理が終わると、処理はＳ１１４に進む。 In S112, the generalization unit 104 of the data editing apparatus 100 empties the set O and, for each element P ∈ C, adds to the set O all maximal hierarchies that achieve diversity (satisfy the diversity definition D). The processing in this step is the same as the processing shown in FIG. 16, so its description is not repeated. For example, for the element P = {birth year: 3, zip code: 2} of the set C, the maximal hierarchies that achieve diversity are {birth year: 3, zip code: 0}, which needs no sanitization, and {birth year: 2, zip code: 2}, with sanitized lines {8}. Similarly, for P = {birth year: 3, hobby: 2} they are ({birth year: 2, hobby: 1}, {8}) and ({birth year: 1, hobby: 2}, {8}), and for P = {zip code: 2, hobby: 2} they are ({zip code: 2, hobby: 1}, {8}) and ({zip code: 0, hobby: 2}, {}). Therefore, O = {({birth year: 3, zip code: 0}, {}), ({birth year: 2, zip code: 2}, {8}), ({birth year: 2, hobby: 1}, {8}), ({birth year: 1, hobby: 2}, {8}), ({zip code: 2, hobby: 1}, {8}), ({zip code: 0, hobby: 2}, {})}. In the example of FIG. 2, among the elements of the set O, ({birth year: 3, zip code: 0}, {}) and ({zip code: 0, hobby: 2}, {}) each have only one column name whose hierarchy is positive, so they are deleted, and O = {({birth year: 2, zip code: 2}, {8}), ({birth year: 2, hobby: 1}, {8}), ({birth year: 1, hobby: 2}, {8}), ({zip code: 2, hobby: 1}, {8})}. When the process of this step ends, the process proceeds to S114.

In S114, the generalization unit 104 of the data editing apparatus 100 deletes from the set O every element that has at most one column name whose hierarchy is positive. When this step ends, the process proceeds to S116.
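The filtering in S114 can be illustrated with a short sketch. It assumes, purely for illustration, that each element of the set O is a pair of a {column: hierarchy} dict and a set of sanitized row numbers; the function name `drop_single_column_elements` and the key names are hypothetical.

```python
def drop_single_column_elements(O):
    """Keep only elements in which at least two columns have a positive hierarchy."""
    return [
        (levels, sanitized_rows)
        for levels, sanitized_rows in O
        if sum(1 for level in levels.values() if level > 0) >= 2
    ]


# The set O of the FIG. 2 example after the first execution of S112 (hypothetical key names).
O = [
    ({"birth_year": 3, "postal_code": 0}, set()),
    ({"birth_year": 2, "postal_code": 2}, {8}),
    ({"birth_year": 2, "hobby": 1}, {8}),
    ({"birth_year": 1, "hobby": 2}, {8}),
    ({"postal_code": 2, "hobby": 1}, {8}),
    ({"postal_code": 0, "hobby": 2}, set()),
]
O = drop_single_column_elements(O)
```

The two elements whose postal code hierarchy is 0 have only one positive hierarchy each, so they are dropped and four elements remain.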

In S116, the generalization unit 104 of the data editing apparatus 100 determines whether the value of i is greater than or equal to (the number of columns of the relational model R − 1). If the result of this determination is "YES", that is, if the value of i is greater than or equal to (the number of columns of the relational model R − 1), the process proceeds to S118. If the result is "NO", that is, if the value of i is less than (the number of columns of the relational model R − 1), the process returns to S108.

In the second execution of S110, there are five pairs of elements G ∈ O whose column name sets differ, with the following {column name: hierarchy} parts: (1') {birth year: 2, postal code: 2} and {birth year: 2, hobby: 1}, (2') {birth year: 2, postal code: 2} and {birth year: 1, hobby: 2}, (3') {birth year: 2, postal code: 2} and {postal code: 2, hobby: 1}, (4') {birth year: 2, hobby: 1} and {postal code: 2, hobby: 1}, and (5') {birth year: 1, hobby: 2} and {postal code: 2, hobby: 1}.

Here, the generalization unit 104 of the data editing apparatus 100 merges each pair so that a duplicate column name takes the smaller of its two hierarchies. The pairs (1') to (5') then become (1) {birth year: 2, postal code: 2, hobby: 1}, (2) {birth year: 1, postal code: 2, hobby: 2}, (3) {birth year: 2, postal code: 2, hobby: 1}, (4) {birth year: 2, postal code: 2, hobby: 1}, and (5) {birth year: 1, postal code: 2, hobby: 1}, respectively. Items (3) and (4) are identical to item (1), and item (5) < item (2), so these duplicate or non-maximal items are removed and items (1) and (2) remain. The result is C = {{birth year: 2, postal code: 2, hobby: 1}, {birth year: 1, postal code: 2, hobby: 2}}.
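The removal of duplicate and non-maximal items in this step can be checked with a small sketch. It assumes that one item is "below" another when both have the same columns and every hierarchy is less than or equal to the other's; this ordering, the function names `is_below` and `maximal_elements`, and the key names are assumptions made for the example.

```python
def is_below(p, q):
    """True if p and q have the same columns and every hierarchy of p is <= that of q."""
    return p.keys() == q.keys() and all(p[column] <= q[column] for column in p)


def maximal_elements(items):
    """Remove exact duplicates, then remove every item strictly below another item."""
    unique = []
    for item in items:
        if item not in unique:
            unique.append(item)
    return [p for p in unique if not any(p != q and is_below(p, q) for q in unique)]


# Merged items (1) to (5) of the second execution of S110 (hypothetical key names).
merged = [
    {"birth_year": 2, "postal_code": 2, "hobby": 1},  # item (1)
    {"birth_year": 1, "postal_code": 2, "hobby": 2},  # item (2)
    {"birth_year": 2, "postal_code": 2, "hobby": 1},  # item (3)
    {"birth_year": 2, "postal_code": 2, "hobby": 1},  # item (4)
    {"birth_year": 1, "postal_code": 2, "hobby": 1},  # item (5)
]
C = maximal_elements(merged)
```

Items (1) and (2) are incomparable, since each is higher than the other in one column, so both survive as the elements of the set C.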

In the second execution of S112, for the element P = {birth year: 2, postal code: 2, hobby: 1} of the set C, the maximal hierarchies that satisfy diversity are ({birth year: 2, postal code: 1, hobby: 1}, {8}) and ({birth year: 1, postal code: 2, hobby: 1}, {8}); for P = {birth year: 1, postal code: 2, hobby: 2}, they are ({birth year: 1, postal code: 2, hobby: 1}, {8}) and ({birth year: 1, postal code: 0, hobby: 2}, {8}). Therefore, O = {({birth year: 2, postal code: 1, hobby: 1}, {8}), ({birth year: 1, postal code: 2, hobby: 1}, {8}), ({birth year: 1, postal code: 0, hobby: 2}, {8})}.

In the second execution of S114, the set O contains no element to be deleted, so the generalization unit 104 of the data editing apparatus 100 does nothing.

In S118, the output unit 106 of the data editing apparatus 100 outputs the set O. The set O output from the output unit 106 in S118 may also be called a generalized relational model. An example of the output is shown in FIG. 9.

The purpose of S110 in FIG. 15 is to compute, as the set C, the maximal {column name: hierarchy} candidates that may be included in the set O at S112, for column sets having one more column than the elements of the set C so far. S110 computes these candidates in a simple manner, but the {column name: hierarchy} values that cannot be included in the set O at S112 may instead be determined in more detail.

The processing in S114 of FIG. 15 makes it impossible to compute generalized information in which at most one column name has a positive hierarchy, but it has the effect of reducing the overall amount of processing. Such generalized information can still be computed if this processing is skipped; however, generalization by generalized information in which at most one column name has a positive hierarchy can be substituted by statistical processing.

With the above processing example, a set of generalized information is obtained for sanitizing the relational model R within the maximum number of sanitized rows s and generalizing it with the generalized tree T so as to satisfy the diversity definition D. The result of generalizing the relational model R with any one of the generalized information can also be obtained. The individual data can thereby be converted, for secondary use, into a relational model with a high degree of privacy protection.

Regarding the above embodiment, the following additional notes are further disclosed.
(Appendix 1)
A data editing program that causes a computer to execute a process of, using a relational model that includes values of a plurality of attributes for each of a plurality of samples and generalized information in which a hierarchy and a generalized value corresponding to each of the hierarchies are defined for each of the attributes, extracting a combination of the generalized values of the plurality of samples corresponding to one of the hierarchies, the combination being one in which, with respect to the generalized value corresponding to one attribute among the plurality of attributes, the generalized values corresponding to the other attributes of the plurality of attributes have diversity.
(Appendix 2)
The data editing program according to appendix 1, further causing the computer to execute a process of deleting, from among the values of the plurality of attributes included in the relational model, the values of the plurality of attributes for a predetermined number of samples so that, in the combination of the generalized values of the plurality of samples corresponding to one of the hierarchies, with respect to one value of the plurality of attributes, the generalized values for all the other attributes of the plurality of attributes satisfy a diversity definition D defined by the other attributes of the plurality of attributes and the hierarchies corresponding to the other attributes.
(Appendix 3)
The data editing program according to appendix 2, wherein a mapping d, obtained by a mapping D: (a, l) → d from a pair of one attribute a of the plurality of attributes and a hierarchy l to the diversity definition, is a mapping d: F → 1/0 from a frequency distribution F for the attribute a to a logical value, and, where F1 + F2 denotes the sum of frequency distributions F1 and F2 and ∧ denotes the logical AND operation, satisfies:
(d1) monotonicity: (d(F1) = 1) ∧ (d(F2) = 1) ⇒ d(F1 + F2) = 1, and
(d2) hierarchical monotonicity: d(F) = 1 ⇒ d'(F') = 1.
(Appendix 4)
The data editing program according to any one of appendices 1 to 3, causing the computer to execute a process of extracting a combination of the generalized values of the plurality of samples corresponding to one of the hierarchies, the combination being one in which, with respect to the generalized value corresponding to one attribute among the plurality of attributes, the generalized values corresponding to all the other attributes of the plurality of attributes have diversity.
(Appendix 5)
The data editing program according to any one of appendices 1 to 4, further causing the computer to execute a process of performing the generalization, when the generalized information is represented by a generalized tree, so as to include at most one attribute that includes the hierarchy corresponding to the root of the generalized tree.
(Appendix 6)
A data editing method executed by a computer, the method including: using a relational model that includes values of a plurality of attributes for each of a plurality of samples and generalized information in which a hierarchy and a generalized value corresponding to each of the hierarchies are defined for each of the attributes, extracting a combination of the generalized values of the plurality of samples corresponding to one of the hierarchies, the combination being one in which, with respect to the generalized value corresponding to one attribute among the plurality of attributes, the generalized values corresponding to the other attributes of the plurality of attributes have diversity.
(Appendix 7)
The data editing method according to appendix 6, further including deleting, from among the values of the plurality of attributes included in the relational model, the values of the plurality of attributes for a predetermined number of samples so that, in the combination of the generalized values of the plurality of samples corresponding to one of the hierarchies, with respect to one value of the plurality of attributes, the generalized values for all the other attributes of the plurality of attributes satisfy a diversity definition D defined by the other attributes of the plurality of attributes and the hierarchies corresponding to the other attributes.
(Appendix 8)
The data editing method according to appendix 7, wherein, in the diversity definition, a mapping d, obtained by a mapping D: (a, l) → d from a pair of one attribute a of the plurality of attributes and a hierarchy l to the diversity definition, is a mapping d: F → 1/0 from a frequency distribution F for the attribute a to a logical value, and, where F1 + F2 denotes the sum of frequency distributions F1 and F2 and ∧ denotes the logical AND operation, satisfies:
(d1) monotonicity: (d(F1) = 1) ∧ (d(F2) = 1) ⇒ d(F1 + F2) = 1, and
(d2) hierarchical monotonicity: d(F) = 1 ⇒ d'(F') = 1.
(Appendix 9)
The data editing method according to any one of appendices 6 to 8, including extracting a combination of the generalized values of the plurality of samples corresponding to one of the hierarchies, the combination being one in which, with respect to the generalized value corresponding to one attribute among the plurality of attributes, the generalized values corresponding to all the other attributes of the plurality of attributes have diversity.
(Appendix 10)
The data editing method according to any one of appendices 6 to 9, further including performing the generalization, when the generalized information is represented by a generalized tree, so as to include at most one attribute that includes the hierarchy corresponding to the root of the generalized tree.
(Appendix 11)
A data editing apparatus including a generalization unit that, using a relational model that includes values of a plurality of attributes for each of a plurality of samples and generalized information in which a hierarchy and a generalized value corresponding to each of the hierarchies are defined for each of the attributes, extracts a combination of the generalized values of the plurality of samples corresponding to one of the hierarchies, the combination being one in which, with respect to the generalized value corresponding to one attribute among the plurality of attributes, the generalized values corresponding to the other attributes of the plurality of attributes have diversity.
(Appendix 12)
The data editing apparatus according to appendix 11, wherein the generalization unit further deletes, from among the values of the plurality of attributes included in the relational model, the values of the plurality of attributes for a predetermined number of samples so that, in the combination of the generalized values of the plurality of samples corresponding to one of the hierarchies, with respect to one value of the plurality of attributes, the generalized values for all the other attributes of the plurality of attributes satisfy a diversity definition D defined by the other attributes of the plurality of attributes and the hierarchies corresponding to the other attributes.
(Appendix 13)
The data editing apparatus according to appendix 11, wherein, in the diversity definition, a mapping d, obtained by a mapping D: (a, l) → d from a pair of one attribute a of the plurality of attributes and a hierarchy l to the diversity definition, is a mapping d: F → 1/0 from a frequency distribution F for the attribute a to a logical value, and, where F1 + F2 denotes the sum of frequency distributions F1 and F2 and ∧ denotes the logical AND operation, satisfies:
(d1) monotonicity: (d(F1) = 1) ∧ (d(F2) = 1) ⇒ d(F1 + F2) = 1, and
(d2) hierarchical monotonicity: d(F) = 1 ⇒ d'(F') = 1.
(Appendix 14)
The data editing apparatus according to any one of appendices 11 to 13, wherein the generalization unit extracts a combination of the generalized values of the plurality of samples corresponding to one of the hierarchies, the combination being one in which, with respect to the generalized value corresponding to one attribute among the plurality of attributes, the generalized values corresponding to all the other attributes of the plurality of attributes have diversity.
(Appendix 15)
The data editing apparatus according to any one of appendices 11 to 14, wherein the generalization unit performs the generalization, when the generalized information is represented by a generalized tree, so as to include at most one attribute that includes the hierarchy corresponding to the root of the generalized tree.

DESCRIPTION OF SYMBOLS
100 Data editing apparatus
102 Input unit
104 Generalization unit
106 Output unit

Claims

A data editing program that causes a computer to execute a process of, using a relational model that includes values of a plurality of attributes for each of a plurality of samples and generalized information in which a hierarchy and a generalized value corresponding to each of the hierarchies are defined for each of the attributes, extracting a combination of the generalized values of the plurality of samples corresponding to one of the hierarchies, the combination being one in which, with respect to the generalized value corresponding to one attribute among the plurality of attributes, the generalized values corresponding to all the other attributes of the plurality of attributes have diversity.

A data editing program that causes a computer to execute a process of: using a relational model that includes values of a plurality of attributes for each of a plurality of samples and generalized information in which a hierarchy and a generalized value corresponding to each of the hierarchies are defined for each of the attributes, extracting a combination of the generalized values of the plurality of samples corresponding to one of the hierarchies, the combination being one in which, with respect to the generalized value corresponding to one attribute among the plurality of attributes, the generalized values corresponding to the other attributes of the plurality of attributes have diversity; and
deleting, from among the values of the plurality of attributes included in the relational model, the values of the plurality of attributes for a predetermined number of samples so that, in the combination of the generalized values of the plurality of samples corresponding to one of the hierarchies, the generalized values for all the other attributes of the plurality of attributes satisfy a diversity definition defined by the other attributes of the plurality of attributes and the hierarchies corresponding to the other attributes.

The data editing program according to claim 2, wherein, in the diversity definition, a mapping d, obtained by a mapping D: (a, l) → d from a pair of one attribute a of the plurality of attributes and a hierarchy l to the diversity definition, is a mapping d: F → 1/0 from a frequency distribution F for the attribute a to a logical value, and, where F1 + F2 denotes the sum of frequency distributions F1 and F2 and ∧ denotes the logical AND operation, satisfies:
(d1) monotonicity: (d(F1) = 1) ∧ (d(F2) = 1) ⇒ d(F1 + F2) = 1, and
(d2) hierarchical monotonicity: d(F) = 1 ⇒ d'(F') = 1.

The data editing program according to any one of claims 1 to 3, further causing the computer to execute a process of performing generalization, when the generalized information is represented by a generalized tree, so as to include at most one attribute that includes the hierarchy corresponding to the root of the generalized tree.

A data editing method executed by a computer, the method including: using a relational model that includes values of a plurality of attributes for each of a plurality of samples and generalized information in which a hierarchy and a generalized value corresponding to each of the hierarchies are defined for each of the attributes, extracting a combination of the generalized values of the plurality of samples corresponding to one of the hierarchies, the combination being one in which, with respect to the generalized value corresponding to one attribute among the plurality of attributes, the generalized values corresponding to all the other attributes of the plurality of attributes have diversity.

A data editing apparatus including a generalization unit that, using a relational model that includes values of a plurality of attributes for each of a plurality of samples and generalized information in which a hierarchy and a generalized value corresponding to each of the hierarchies are defined for each of the attributes, extracts a combination of the generalized values of the plurality of samples corresponding to one of the hierarchies, the combination being one in which, with respect to the generalized value corresponding to one attribute among the plurality of attributes, the generalized values corresponding to all the other attributes of the plurality of attributes have diversity.