JP2013125374A

JP2013125374A - Information processing method, device, and program

Info

Publication number: JP2013125374A
Application number: JP2011273037A
Authority: JP
Inventors: Yuji Yamaoka; 裕司山岡
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2011-12-14
Filing date: 2011-12-14
Publication date: 2013-06-24
Anticipated expiration: 2031-12-14
Also published as: JP5772563B2

Abstract

PROBLEM TO BE SOLVED: To protect privacy while preserving the tendency of the appearance distribution of the original values. SOLUTION: The information processing method includes the steps of: determining, from data stored in a data storage unit that stores, for each kind of attribute value of a first attribute that is included in a plurality of records and designated as an obfuscation target, the number of records in which that attribute value appears, whether the distribution of the record counts satisfies a condition indicating a large bias; and, when the distribution satisfies that condition, replacing the attribute value of the first attribute in at least one of the plurality of records with obfuscation data and storing the result in the data storage unit.

Description

The present technology relates to a technique for anonymizing data.

When statistics on privacy-related data are disclosed, privacy may be violated depending on how the data are distributed. For example, consider conducting a questionnaire survey of four employees and disclosing the statistical results to those employees.

FIG. 1 shows an example of questionnaire results. Four people answered each of three questions, and the statistics of the answers are shown. Here, each answer is assumed to be private information. That is, for each question, each employee does not want other employees to know how he or she answered. It is also assumed that each employee knows who answered the questionnaire.

In this case, because everyone answered "dissatisfied" to question 1, every employee's answer becomes known to the other employees. For question 2, everyone except one person answered "dissatisfied", so that one person learns which employees answered "dissatisfied". The answers to question 3 are only slightly biased, so no employee can uniquely determine another employee's answer unless employees collude.

A known anonymization technique is k-anonymization. For table data, for example, k-anonymization modifies the data so that, for every combination of values of the attributes that pose little privacy concern, at least k records share that same combination.
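The k-anonymity property described above can be checked mechanically. The following is a minimal sketch rather than part of the disclosed method; the function name `is_k_anonymous` and the sample records are assumptions made for illustration and are not taken from the figures.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    (the attributes that pose little privacy concern) occurs in at
    least k records."""
    counts = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())

# Hypothetical records: two groups of four after generalization.
records = (
    [{"department": "planning/development", "age": "25-42", "answer": a}
     for a in ("dissatisfied", "normal", "dissatisfied", "dissatisfied")]
    + [{"department": "administration/sales", "age": "24-44", "answer": a}
       for a in ("dissatisfied", "normal", "satisfied", "satisfied")]
)

print(is_k_anonymous(records, ["department", "age"], 4))
print(is_k_anonymous(records, ["department", "age"], 5))
```

The check only counts records per combination of "department" and "age"; it says nothing about how the private attribute is distributed inside each group.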

For example, consider surveying eight employees with a question like those in FIG. 1, and also collecting each employee's department and age.

FIG. 2 shows an example of the responses. Each record (row) holds one employee's response. Each attribute (column) is a survey item; "department" and "age" are attributes that pose little privacy concern, and the content of "answer" is private information. As before, each employee is assumed to know who answered the questionnaire.

Suppose further that, when disclosing the results, we want to disclose not only the overall statistics but also statistics that are as detailed as possible. Doing so may convey much more information, for example that the "development department" has a high dissatisfaction rate or that "young people" have a high dissatisfaction rate.

However, disclosing the data of FIG. 2 as is poses a privacy problem. For example, anyone who knows that Taro, age 26, of the planning department responded can tell that the first record is Taro's, and therefore learns that Taro is dissatisfied.

It is therefore conceivable to disclose data that have been modified by k-anonymization. With k-anonymization, the values of "department" and "age", the attributes that pose little privacy concern, are the ones that are modified.

FIG. 3 shows an example in which the data have been modified by applying k-anonymization (k = 4). From this table, one can tell only that the record of Taro, age 26, of the planning department is one of the first four records. At the same time, the table still conveys that the planning/development departments tend to have a high dissatisfaction rate.

The effect of k-anonymization is thus to retain some information while ensuring that the record of any individual (in general, not necessarily a person) cannot be narrowed down to fewer than k records.

However, applying k-anonymization does not guarantee that the resulting table can be disclosed without privacy problems.

In the example above, suppose Jiro, age 42, of the planning department, who answered "normal", looks at the data of FIG. 3. Because his own record is the second one, he knows that the record of Taro, age 26, of the planning department is the first, third, or fourth. He therefore learns that Taro answered "dissatisfied". This happens because the answer statistics {dissatisfied: 3, normal: 1} of the group with (department, age) = (planning/development, 25/26/28/42) are heavily biased.
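The inference a respondent can make by setting aside their own record can be sketched as follows. This is an illustrative sketch only; the function name `answer_revealed_to` is an assumption, and the group data reproduce the statistics {dissatisfied: 3, normal: 1} given in the text.

```python
from collections import Counter

def answer_revealed_to(own_answer, group_answers):
    """Return the single answer shared by all *other* members of the
    group, as seen by a member who knows their own answer, or None if
    the others' answers are not all the same."""
    remaining = Counter(group_answers)
    remaining[own_answer] -= 1  # set aside the viewer's own record
    others = {value for value, count in remaining.items() if count > 0}
    return next(iter(others)) if len(others) == 1 else None

# Answers of the four records in the planning/development group.
group = ["dissatisfied", "normal", "dissatisfied", "dissatisfied"]

print(answer_revealed_to("normal", group))        # the member who answered "normal"
print(answer_revealed_to("dissatisfied", group))  # a member who answered "dissatisfied"
```

The member who answered "normal" learns everyone else's answer, whereas a member who answered "dissatisfied" still faces two possible answers for each of the others.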

Disclosing a k-anonymized table thus amounts to disclosing multiple statistics, each computed over a small number of records, so heavily biased statistics are likely to arise.

One k-anonymization technique that reduces the bias in the values of a private attribute is k-anonymization that satisfies l-diversity.

l-diversity is the property that, in each group created by k-anonymization (a set of records whose values for the attributes that pose little privacy concern are all the same), at least l distinct values of the private attribute appear. For example, the data of FIG. 3 satisfy 2-diversity: the group with (department, age) = (planning/development, 25/26/28/42) contains two kinds of answers, "dissatisfied" and "normal"; the group with (department, age) = (administration/sales, 24/35/36/44) contains three kinds, "dissatisfied", "normal", and "satisfied"; and there are no other groups.
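The l-diversity property can be checked in the same way as k-anonymity. The sketch below is illustrative; the function name `satisfies_l_diversity` is an assumption, and the exact counts in the second group are assumed because the text states only that three kinds of answers appear there.

```python
from collections import defaultdict

def satisfies_l_diversity(records, quasi_identifiers, sensitive, l):
    """Return True if every group of records sharing the same
    quasi-identifier values contains at least l distinct values of
    the sensitive (private) attribute."""
    groups = defaultdict(set)
    for r in records:
        groups[tuple(r[a] for a in quasi_identifiers)].add(r[sensitive])
    return all(len(values) >= l for values in groups.values())

records = (
    # First group: {dissatisfied: 3, normal: 1}, as stated in the text.
    [{"department": "planning/development", "age": "25/26/28/42", "answer": a}
     for a in ("dissatisfied", "normal", "dissatisfied", "dissatisfied")]
    # Second group: three kinds of answers; the counts are assumed.
    + [{"department": "administration/sales", "age": "24/35/36/44", "answer": a}
       for a in ("dissatisfied", "normal", "satisfied", "satisfied")]
)

print(satisfies_l_diversity(records, ["department", "age"], "answer", 2))
print(satisfies_l_diversity(records, ["department", "age"], "answer", 3))
```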

The data of FIG. 3 satisfy 2-diversity, yet privacy protection is insufficient, as described above. In general, therefore, satisfying 2-diversity alone does not provide sufficient privacy protection.

Setting l ≥ 3 is conceivable, but increasing l reduces the amount of information that can be disclosed.

FIG. 4 shows an example in which the data of FIG. 2 have been modified by applying k-anonymization (k = 4) that satisfies l-diversity (l = 3). Even if this table is disclosed, nobody can uniquely determine who gave which answer, apart from their own.

However, it is difficult to obtain from the table of FIG. 4 any information more meaningful than the overall statistics. For example, one cannot learn that the development department has a higher-than-average dissatisfaction rate or that young people have a high dissatisfaction rate.

Thus, with k-anonymization that satisfies l-diversity, one would like to keep l small in order to retain more information, but with l ≤ 2 privacy protection may be insufficient.

Another conventional approach is perturbation, which probabilistically changes the values of the attributes that pose little privacy concern so that the resulting table can be disclosed with little privacy risk. With such a technique it is hard to infer who gave which answer, but it is also hard to obtain information more meaningful than the overall statistics; that is, one cannot learn that the development department or young people have a high dissatisfaction rate. Probabilistic information should be obtainable in principle, but computing the probabilities is difficult, and even if they could be computed, little information could be expected.

JP 2011-128862 A
JP 2011-100116 A

L. Sweeney. Achieving k-Anonymity Privacy Protection using Generalization and Suppression. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, Vol. 10, No. 5, pp. 571-588, 2002.
K. LeFevre, D. J. DeWitt, R. Ramakrishnan. Incognito: Efficient Full-Domain k-Anonymity. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pp. 49-60, 2005.
K. LeFevre, D. J. DeWitt, R. Ramakrishnan. Mondrian Multidimensional k-Anonymity. In Proceedings of the 22nd International Conference on Data Engineering, pp. 25-, 2006.
A. Machanavajjhala, J. Gehrke, D. Kifer, M. Venkitasubramaniam. l-Diversity: Privacy Beyond k-Anonymity. ACM Transactions on Knowledge Discovery from Data, Vol. 1, Issue 1, Article No. 3, 2007.

Accordingly, in one aspect, an object of the present technology is to provide a technique that protects privacy while preserving the tendency of the appearance distribution of the original values.

An information processing method according to the present technology includes: (A) determining, from data stored in a data storage unit that stores, for each kind of attribute value of a first attribute that is included in a plurality of records and designated as an obfuscation target, the number of records in which that attribute value appears, whether the distribution of the record counts satisfies a condition indicating a large bias; and (B) when the distribution satisfies the condition, replacing the attribute value of the first attribute in at least one of the plurality of records with obfuscation data and storing the result in the data storage unit.

Privacy can be protected while the tendency of the appearance distribution of the original values is preserved.

FIG. 1 is a diagram for explaining the background art.
FIG. 2 is a diagram for explaining the background art.
FIG. 3 is a diagram for explaining the background art.
FIG. 4 is a diagram for explaining the background art.
FIG. 5 is a functional block diagram of the information processing apparatus according to the first embodiment.
FIG. 6 is a diagram illustrating an example of a frequency table.
FIG. 7 is a diagram illustrating the processing flow according to the first embodiment.
FIG. 8 is a diagram illustrating an example of the frequency table after modification.
FIG. 9 is a diagram illustrating an example of the record group after modification.
FIG. 10 is a functional block diagram of the information processing apparatus according to the second embodiment.
FIG. 11 is a diagram illustrating the main processing flow in the second embodiment.
FIG. 12 is a diagram illustrating an example of a record group.
FIG. 13 is a diagram illustrating an example of a frequency table in the second embodiment.
FIG. 14 is a diagram illustrating the processing flow of the obfuscation processing.
FIG. 15 is a diagram illustrating the processing flow of the probability calculation processing.
FIG. 16 is a diagram illustrating the processing flow of the obfuscation processing.
FIG. 17 is a diagram illustrating an example of the frequency table after modification.
FIG. 18 is a diagram illustrating an example of a record group modified according to the modified frequency table.
FIG. 19 is a diagram illustrating another example of the frequency table after modification.
FIG. 20 is a diagram illustrating an example of a record group modified according to the modified frequency table.
FIG. 21 is a diagram illustrating an example of the original data after modification.
FIG. 22 is a diagram illustrating the processing flow of the obfuscation processing according to the third embodiment.
FIG. 23 is a diagram illustrating an example of data processed by the obfuscation processing according to the third embodiment.
FIG. 24 is a functional block diagram of a computer.

[Embodiment 1]
FIG. 5 shows a configuration example of the information processing apparatus according to this embodiment. As shown in FIG. 5, the information processing apparatus 100 includes a first data storage unit 110, a determination unit 120, an obfuscation processing unit 130, and a second data storage unit 140.

As shown in FIG. 6, for example, the first data storage unit 110 stores, for an attribute of a record group that is private information and is therefore an obfuscation target, the appearance frequency of each attribute value that appears in the record group. In the example of FIG. 6, the value "dissatisfied" has an appearance frequency of 3 and the value "normal" has an appearance frequency of 1. Such data are referred to as a frequency table. The first data storage unit 110 also stores m, the number of kinds of values the attribute can take.

The determination unit 120 determines, based on the frequency table stored in the first data storage unit 110, whether the distribution of appearance frequencies is biased. On detecting a bias, the determination unit 120 instructs the obfuscation processing unit 130 to perform processing. In response to the instruction, the obfuscation processing unit 130 replaces, with obfuscation data, the value of the obfuscation-target attribute in at least one record of the record group stored in the second data storage unit 140 (the source data of the data stored in the first data storage unit 110), and stores the result in the second data storage unit 140.

Next, the processing performed by the information processing apparatus 100 according to this embodiment is described with reference to FIG. 7. First, the determination unit 120 reads the frequency table for the records to be processed from the first data storage unit 110 (step S1). The determination unit 120 then determines from the frequency table whether the distribution of appearance frequencies is heavily biased (step S3).

For example, the distribution is judged to be heavily biased when only two kinds of attribute values appear in the frequency table and the less frequent one has an appearance frequency of 1 or 2, or when only one kind of attribute value appears. This condition is adopted when the respondents themselves view the results, so that no respondent can uniquely identify another respondent's answer. In this case, the attribute values of at most two records are obfuscated in step S5.

If it can be assumed that respondents do not view the results, the distribution is judged to be heavily biased when, for example, only two kinds of attribute values appear in the frequency table and the less frequent one has an appearance frequency of 1, or when only one kind of attribute value appears. In this case, the attribute value of at most one record is obfuscated in step S5.
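The two example conditions for step S3 can be written as a single predicate. This is a minimal sketch under the assumption that the frequency table is held as a mapping from attribute value to appearance frequency; the function name `is_heavily_biased` and the parameter `respondents_see_result` are hypothetical.

```python
def is_heavily_biased(freq, respondents_see_result=True):
    """Judge whether a frequency table is heavily biased (step S3).

    freq maps each attribute value to its appearance frequency.
    When respondents may view the result, a minority count of 1 or 2
    is treated as biased; otherwise only a minority count of 1 is.
    """
    counts = [c for c in freq.values() if c > 0]
    if len(counts) == 1:
        return True  # only one kind of value appears
    if len(counts) == 2:
        limit = 2 if respondents_see_result else 1
        return min(counts) <= limit
    return False

print(is_heavily_biased({"dissatisfied": 3, "normal": 1}))
print(is_heavily_biased({"dissatisfied": 3, "normal": 2}, respondents_see_result=False))
```

A threshold on some other bias index could replace the body of this predicate without changing how the rest of the flow uses it.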

Alternatively, a threshold may be set in advance, and the bias is judged to be large when an index value representing the bias in the appearance frequencies is equal to or greater than that threshold.

If the distribution is not judged to be heavily biased, the processing ends. If it is, the determination unit 120 instructs the obfuscation processing unit 130 to perform processing. In response, the obfuscation processing unit 130 probabilistically obfuscates one or more attribute values in the frequency table according to the bias in the appearance frequencies (step S5), and stores the modified frequency table in the first data storage unit 110.

In this embodiment, for example, a plurality of obfuscation modes are prepared according to the bias in the appearance frequencies (for example, whether the less frequent of the only two attribute values that appear has an appearance frequency of 1 or of 2) and according to n. In each obfuscation mode, one of a plurality of patterns, each of which replaces one or two attribute values with obfuscation data, is selected with a probability determined by, for example, m and the total appearance frequency n. When only one kind of attribute value appears, the distribution is heavily biased, but that one value is the only thing that can be obfuscated, so it is obfuscated as is.

For example, when the attribute values that appear are the two kinds "dissatisfied" and "normal" as shown in FIG. 6, and the less frequent value, "normal", has an appearance frequency of 1, the following obfuscation patterns are defined as one obfuscation mode for m = 3 and n = 4.
A: with probability 11%, obfuscate one "dissatisfied" record and one "normal" record
B: otherwise (89%), obfuscate one "dissatisfied" record
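The choice between patterns A and B and its effect on the frequency table can be sketched as follows. This is an illustration under stated assumptions: the 11% probability is the value given in the text for m = 3 and n = 4, while the function names, the dictionary representation of the frequency table, and the use of the key "?" to count obfuscated entries are hypothetical.

```python
import random

def choose_pattern(rng, p_a=0.11):
    """Select pattern A with probability p_a, otherwise pattern B."""
    return "A" if rng.random() < p_a else "B"

def apply_pattern(freq, pattern):
    """Return a modified copy of the frequency table.

    Pattern B obfuscates one "dissatisfied" entry; pattern A also
    obfuscates the single "normal" entry. Obfuscated entries are
    counted under the key "?".
    """
    new = dict(freq)
    new["dissatisfied"] -= 1
    new["?"] = new.get("?", 0) + 1
    if pattern == "A":
        new["normal"] -= 1
        new["?"] += 1
    return new

rng = random.Random()  # unseeded: the outcome varies from run to run
pattern = choose_pattern(rng)
print(pattern, apply_pattern({"dissatisfied": 3, "normal": 1}, pattern))
```

Either pattern leaves the total frequency n unchanged, so the modified table still accounts for every record in the group.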

The obfuscation processing unit 130 therefore generates a random number, selects pattern A or B, and modifies the frequency table. If pattern B is selected, for example, the frequency table is modified as shown in FIG. 8. In the example of FIG. 8, "?" is used as the obfuscation data; that is, the value is replaced with data from which the attribute value cannot be identified. The obfuscation data need not be "?" alone and may also include probability distribution information for the original value; in this example it may include the data (dissatisfied = 57%, normal = 43%).

The obfuscation processing unit 130 then modifies some of the records in the record group stored in the second data storage unit 140 according to the modified frequency table and stores the modified records in the second data storage unit 140 (step S7). To arrive at the frequency table shown in FIG. 8, one record whose attribute value is "dissatisfied" is selected and its value is replaced with the obfuscation data. In the example above, the frequency table covered rows 1 to 4 of FIG. 3, so rows 1 to 4 of FIG. 3 are subject to modification; one of the three records whose obfuscation-target attribute "answer" has the value "dissatisfied" is selected at random and its value is replaced with the obfuscation data. The data are modified as shown in FIG. 9, for example. In the example of FIG. 9, the value "dissatisfied" of the attribute "answer" in the third of rows 1 to 4 of FIG. 3 has been replaced with "? (dissatisfied = 57%, normal = 43%)".
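Step S7 amounts to picking one matching record at random within the group and overwriting its value. The sketch below assumes records are held as a list of dictionaries and uses zero-based row indices; the function name `obscure_one_record` and the sample rows are hypothetical.

```python
import random

def obscure_one_record(records, group_rows, attribute, value,
                       replacement="?", rng=random):
    """Replace `value` of `attribute` with `replacement` in one record
    chosen at random among the rows of the group that hold `value`.
    Returns the index of the modified row. Raises IndexError if no
    row in the group holds `value`."""
    candidates = [i for i in group_rows if records[i][attribute] == value]
    chosen = rng.choice(candidates)
    records[chosen] = {**records[chosen], attribute: replacement}
    return chosen

# Rows 1 to 4 of the group, held at indices 0 to 3.
rows = [
    {"answer": "dissatisfied"},
    {"answer": "normal"},
    {"answer": "dissatisfied"},
    {"answer": "dissatisfied"},
]
obscure_one_record(rows, range(4), "answer", "dissatisfied",
                   replacement="? (dissatisfied = 57%, normal = 43%)")
print(rows)
```

Which of the three "dissatisfied" rows is replaced depends on the random choice, so the printed result differs between runs.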

In this way, even a respondent who sees the results of FIG. 9 cannot uniquely determine how other respondents answered, while it can still be seen that the "planning/development departments" tend to have a high number or proportion of "dissatisfied" answers. That is, even when the appearance frequencies are heavily biased, privacy is protected while the tendency of the appearance frequencies of the original values is preserved.

Note that if it can be determined in step S7 which attribute values to replace with obfuscation data without modifying the frequency table, the values of the obfuscation-target attribute in the records may be replaced with obfuscation data directly.

[Embodiment 2]
FIG. 10 shows a configuration example of the information processing apparatus according to this embodiment. As shown in FIG. 10, the information processing apparatus 200 includes a first data storage unit 210, a k-anonymization processing unit 220, a grouping processing unit 230, an output unit 240, an input unit 250, an obfuscation processing unit 260, and a second data storage unit 270.

The first data storage unit 210 stores the record group to be processed. The k-anonymization processing unit 220 performs well-known k-anonymization processing on the record group stored in the first data storage unit 210.

The input unit 250 receives from the user the attributes that pose little privacy concern, together with combinations of an obfuscation-target attribute and the number m of kinds of values that attribute can take, and stores them in the first data storage unit 210.

For the record group after k-anonymization, the grouping processing unit 230 groups records whose values for the attributes that pose little privacy concern are the same, and stores data on the grouping in the first data storage unit 210.

The obfuscation processing unit 260 performs obfuscation processing on each group and stores the results in the first data storage unit 210. It stores intermediate data such as frequency tables in the second data storage unit 270. The output unit 240 outputs the data stored in the first data storage unit 210 to an output device (such as a display device or a printer).

Next, the processing performed by the information processing apparatus 200 is described with reference to FIGS. 11 to 21. It is assumed that the input unit 250 has already received from the user, and stored in the first data storage unit 210, the combinations of an obfuscation-target attribute in the record group stored in the first data storage unit 210 and the number m of kinds of values that attribute can take. For example, suppose that the attributes of the record group are department, age, and questionnaire answers 1 and 2, and that department and age are designated as attributes that pose little privacy concern. The obfuscation-target attributes, which are private information, are answer 1 and answer 2. Answer 1 can take three kinds of attribute values, "dissatisfied", "normal", and "satisfied", so an input such as {answer 1: 3} is made; that is, m = 3 for answer 1. Answer 2 can likewise take the three kinds of values "dissatisfied", "normal", and "satisfied", so an input such as {answer 2: 3} is made; that is, m = 3 for answer 2 as well.

The k-anonymization processing unit 220 then performs well-known k-anonymization processing on the record group stored in the first data storage unit 210, modifying the values of the attributes that pose little privacy concern so that at least k records share the same values, and stores the modified data in the first data storage unit 210 (FIG. 11: step S11). In this embodiment, data such as those shown in FIG. 12 are assumed to be stored in the first data storage unit 210 at this stage. In the example of FIG. 12, k = 4, and the values of the age attribute have been modified so that four records have department "manufacturing department" and age "25-42" and four records have department "sales department" and age "24-44".

The grouping processing unit 230 then groups the k-anonymized records stored in the first data storage unit 210 based on the values of the attributes other than the obfuscation-target attributes (step S13). In the example of FIG. 12, records with the same department and age values are placed in the same group. As described above, the four records with department "manufacturing department" and age "25-42" and the four records with department "sales department" and age "24-44" are each identified as a group, and the grouping data are stored in the first data storage unit 210. For example, data indicating that records 1 to 4 form the first group and records 5 to 8 form the second group are stored.
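The grouping of step S13 can be sketched as a dictionary keyed on the values of the non-target attributes. The department and age values below follow the description of FIG. 12; the answer values, the attribute names, and the function name `group_records` are assumptions made for illustration.

```python
from collections import defaultdict

def group_records(records, obfuscation_targets):
    """Group 1-based record numbers by the values of all attributes
    other than the obfuscation-target attributes (step S13)."""
    groups = defaultdict(list)
    for number, record in enumerate(records, start=1):
        key = tuple((a, v) for a, v in sorted(record.items())
                    if a not in obfuscation_targets)
        groups[key].append(number)
    return list(groups.values())

# Answer values are assumed; FIG. 12 itself is not reproduced here.
answers = ["dissatisfied", "normal", "dissatisfied", "dissatisfied"]
records = (
    [{"department": "manufacturing department", "age": "25-42",
      "answer1": a, "answer2": "normal"} for a in answers]
    + [{"department": "sales department", "age": "24-44",
        "answer1": "satisfied", "answer2": a} for a in answers]
)

print(group_records(records, {"answer1", "answer2"}))
```

With these records the call yields one list of record numbers per group, in order of first appearance.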

The obfuscation processing unit 260 then identifies one unprocessed group based on the grouping data stored in the first data storage unit 210 (step S15), and identifies one unprocessed obfuscation target attribute (step S16). For the identified group and obfuscation target attribute, the obfuscation processing unit 260 generates a frequency table and stores it in the second data storage unit 270 (step S17). For example, generating a frequency table for the attribute "answer 1" of the group consisting of records 1 to 4 yields the frequency table shown in FIG. 13.
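A frequency table of this kind is simply a count of each attribute value within one group. The sketch below assumes records are dictionaries and uses a hypothetical helper name `frequency_table`; the sample values follow the later description of FIG. 13 (three "dissatisfied" and one "normal").

```python
from collections import Counter

def frequency_table(records, group_indices, attr):
    """Count how often each value of `attr` appears in the given group (cf. step S17)."""
    return Counter(records[i][attr] for i in group_indices)

# Illustrative group: records 1 to 4 of the manufacturing department.
records = [{"answer1": a} for a in
           ["dissatisfied", "dissatisfied", "normal", "dissatisfied"]]
table = frequency_table(records, range(4), "answer1")
n = sum(table.values())   # total frequency n used in the later steps
```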

The obfuscation processing unit 260 then performs the obfuscation process (step S19). The obfuscation process is described with reference to FIGS. 14 to 21.

The obfuscation processing unit 260 reads the frequency table from the second data storage unit 270, and reads from the first data storage unit 210 the number m of kinds of attribute values that the identified group and obfuscation target attribute can take (FIG. 14: step S31). It also sets the variable n to the total frequency (the sum of the appearance frequencies) in the frequency table (step S33), and determines whether n is 3 or more (step S35). When n is less than 3, that is, 1 or 2, the obfuscation processing unit 260 obfuscates every element in the frequency table (step S37). The total frequency is too low to protect privacy if the data were published as is, so every attribute value is replaced with obfuscated data, for example "?". Probability distribution information may also be attached; in that case, data such as "probability 1/m for each attribute value" is added. The process then proceeds via terminal B to step S65 in FIG. 16.

If n is 3 or more, the obfuscation processing unit 260 determines whether the number of records in the frequency table is 2 (step S39). If it is not 2, the obfuscation processing unit 260 determines whether the number of records in the frequency table is 1 (step S41). If it is not 1, that is, 3 or more, leaving the values unobfuscated poses little privacy concern, so nothing is done and the process returns to the calling process via terminal C.

If the number of records in the frequency table is 1, the obfuscation processing unit 260 obfuscates two of the n occurrences of the only record in the frequency table (step S43). When only one kind of attribute value appears even though m kinds are possible, the appearance frequencies are judged to be heavily skewed, and two of the n occurrences are replaced with obfuscated data. The appearance frequency of the original attribute value then becomes n - 2, and that of the obfuscated data becomes 2. The obfuscated data is, for example, "? (a = probability P, each value other than a = (1 - P)/(m - 1))", where a is the attribute value that was replaced, and P is 4/7 when n = 4 and 2/(n + 1) otherwise. This is explained in detail below. The process then proceeds via terminal B to step S65 in FIG. 16.

If the number of records in the frequency table is 2, the obfuscation processing unit 260 sorts the records in the frequency table by appearance frequency and sets the more frequent value as a and the less frequent value as b (step S45). It then determines whether the appearance frequency of b is 2 or less (step S47). If the appearance frequency of b is 3 or more, leaving the values unobfuscated poses little privacy concern, so nothing is done and the process returns to the calling process via terminal C.

If the appearance frequency of b is 2, the obfuscation processing unit 260 performs the probability calculation process (step S49), in which the probabilities x and p are calculated from m and n. The process then proceeds via terminal A to step S51 in FIG. 16.
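The branching of FIG. 14 described so far can be summarized in a short sketch. The function name `classify` and the returned labels are hypothetical. Following the check in step S47, the sketch treats any b frequency of 2 or less as leading to the probability calculation.

```python
def classify(freq):
    """Return the action FIG. 14 takes for a frequency table (dict: value -> count)."""
    n = sum(freq.values())          # S33: total frequency
    kinds = len(freq)               # number of records (rows) in the frequency table
    if n < 3:
        return "obfuscate_all"      # S37: too few records to publish anything
    if kinds >= 3:
        return "leave"              # three or more kinds: no obfuscation
    if kinds == 1:
        return "obfuscate_two"      # S43: replace two occurrences with "?"
    b = min(freq.values())          # S45: frequency of the rarer value
    if b >= 3:
        return "leave"              # S47: rarer value is common enough
    return "probabilistic"          # S49, then the pattern selection of FIG. 16
```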

The probability calculation process is described with reference to FIG. 15. Before the specific processing is described, the reasoning for the case n ≥ 3 is explained.

First, when the frequency table is {a: n} (attribute value a appears n times), obfuscating at least two occurrences is enough to prevent others from uniquely determining that a particular person answered a, which removes the privacy problem. At the same time, obfuscation should be kept to a minimum so that as much information as possible is presented. The table should therefore be obfuscated to {a: n-2, ?: 2}, where "?" is obfuscated data.

To prevent the original frequency table of {a: n-2, ?: 2} from being identified as {a: n}, the probabilities x, y, p, and q are defined as follows.
When the frequency table is {a: n-1, b: 1}, x is the probability of changing it to {a: n-2, ?: 2}.
When the frequency table is {a: n-1, b: 1}, y is the probability of changing it to {a: n-2, b: 1, ?: 1}.
When the frequency table is {a: n-2, b: 2}, p is the probability of changing it to {a: n-2, ?: 2}.
When the frequency table is {a: n-2, b: 2}, q is the probability of changing it to {a: n-2, b: 1, ?: 1}.
Here, 0 ≤ x, 0 ≤ y, 0 ≤ p, 0 ≤ q, x + y ≤ 1, and p + q ≤ 1.

Assume that all possible attribute values (a, b, and so on) are equally likely to appear. Let A be the probability that the original frequency table of {a: n-2, ?: 2} is {a: n}. Let B be the probability, as seen by the person who answered b, that the original frequency table of {a: n-2, ?: 2} is {a: n-1, b: 1}. Let C be the probability, as seen by the person who answered b, that the original frequency table of {a: n-2, b: 1, ?: 1} is {a: n-1, b: 1}. A, B, and C are then given by equations in x, y, p, q, n, and v, where v = m - 1.
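The equations themselves are not reproduced in this text. The following is a reconstruction, not the published typesetting. It assumes that the n answers are independent and uniform over the m values, so that the tables {a: n}, {a: n-1, b: 1}, and {a: n-2, b: 2} have relative weights 1, n, and n(n-1)/2 for each of the v choices of b. It also assumes that the person who answered b knows their own answer, which gives weights 1 and n-1 for the remaining n-1 people. Setting A = B in this form yields the expression for p used in step S69.

```latex
A = \frac{1}{1 + vnx + \tfrac{1}{2}\,vn(n-1)\,p}, \qquad
B = \frac{x}{x + (n-1)\,p}, \qquad
C = \frac{y}{y + (n-1)\,q}
```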

Let E be the expected number of occurrences of a among the two "?" entries in the original frequency table of {a: n-2, ?: 2}, and let P be the probability that the "?" in {a: n-2, b: 1, ?: 1} was originally a. E and P are likewise given by equations in x, y, p, q, n, and v.
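As a hedged reconstruction under the same uniform-prior assumption (the published equations are not reproduced in this text), E and P can be written as below. The forms are consistent with the values the description states later: P = 2/(n + 1) when x = p and x + y = p + q = 1, and P = 4/7 for n = 4 with x + y = 1 and p + q = 1/2.

```latex
E = \frac{2 + vnx}{1 + vnx + \tfrac{1}{2}\,vn(n-1)\,p}, \qquad
P = \frac{2y}{2y + (n-1)\,q}
```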

To maximize privacy protection, consider finding the x, y, p, and q for which A = B = C and this common value is smallest. If any of A, B, or C becomes 1, others can uniquely determine that a particular person answered a.

From A = B = C, p can be expressed in terms of x.
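The relation between p and x is the one that step S69 of FIG. 15 computes, written here in LaTeX for readability:

```latex
p = \frac{2vn\,x^{2}}{(n-1)\,(2 - vnx)}
```

For example, with m = 3 (so v = 2), n = 4, and x = 3/28, this gives p = 3/56, matching the worked example given for FIG. 13.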

When n ≥ 5, A, B, and C are smallest when x + y = p + q = 1, which determines x, A, P, and E.

When n = 4, obfuscating {a: 2, b: 2} should produce {a: 2, ?: 2} and {b: 2, ?: 2} with equal probability, and likewise {a: 2, b: 1, ?: 1} and {a: 1, b: 2, ?: 1} with equal probability. A, B, and C are therefore smallest when x + y = 1 and p + q = 1/2, which determines x, A, P, and E.

When n = 3, symmetry is taken into account in the same way as for n = 4, setting x + y + p = 1 and q = y, which determines x, A, P, and E.
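The resulting values for the three cases can be tabulated as below. The x and P entries for n ≥ 5 and n = 4 are the values the description assigns in steps S65 and S67, and the n = 3 row applies the same step S65 formula. The A and E columns are not stated in this text: they are derived here under the uniform-prior assumption and should be read as an unverified reconstruction.

```latex
\begin{array}{c|cccc}
        & x & A & P & E \\ \hline
n \ge 5 & \dfrac{2(n-1)}{vn(n+1)} & \dfrac{1}{n} & \dfrac{2}{n+1} & \dfrac{4}{n+1} \\[2ex]
n = 4   & \dfrac{3}{14v}          & \dfrac{2}{5} & \dfrac{4}{7}   & \dfrac{8}{7}   \\[2ex]
n = 3   & \dfrac{1}{3v}           & \dfrac{1}{3} & \dfrac{1}{2}   & 1
\end{array}
```

In each row E/2 equals P, which agrees with the use of P as the per-entry probability in the label attached to {a: n-2, ?: 2}.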

Based on this reasoning, the processing shown in FIG. 15 is performed. The obfuscation processing unit 260 sets v = m - 1 (step S61) and determines whether n = 4 (step S63). If n = 4, the obfuscation processing unit 260 calculates x = 3/(14v) and P = 4/7, as described above (step S67), and the process proceeds to step S69. If n is not 4, the obfuscation processing unit 260 calculates x = 2(n - 1)/(vn(n + 1)) and P = 2/(n + 1), as described above (step S65), and the process proceeds to step S69.

The obfuscation processing unit 260 then calculates p = 2vnx²/((n - 1)(2 - vnx)) (step S69), and the process returns to the calling process.
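The probability calculation of FIG. 15 is small enough to sketch directly. The function name `calc_probabilities` is hypothetical, and exact fractions are used only so the results can be compared with the values quoted in the description.

```python
from fractions import Fraction

def calc_probabilities(m, n):
    """Return (x, P, p) for m possible attribute values and total frequency n >= 3."""
    v = m - 1                                           # S61
    if n == 4:                                          # S63
        x = Fraction(3, 14 * v)                         # S67
        P = Fraction(4, 7)
    else:
        x = Fraction(2 * (n - 1), v * n * (n + 1))      # S65
        P = Fraction(2, n + 1)
    p = 2 * v * n * x**2 / ((n - 1) * (2 - v * n * x))  # S69
    return x, P, p

x, P, p = calc_probabilities(m=3, n=4)   # the FIG. 13 example: x = 3/28, p = 3/56
```

For n other than 4, the step S65 value of x makes the step S69 expression reduce to p = x.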

The processing of FIG. 14 applies a uniform obfuscation mode to protect privacy in two cases: when the total frequency n is as small as 2 or less, and when n is 3 or more but only one kind of attribute value appears.

Next, the processing from terminal A onward is described with reference to FIG. 16. In FIG. 16, one of several obfuscation modes is selected according to n and the skew of the appearance frequencies, that is, whether the least frequent attribute value appears once or twice. Within each mode, one of several obfuscation patterns is selected at random according to the probabilities x and p calculated in FIG. 15.

The obfuscation processing unit 260 determines whether the appearance frequency of b is 2 (step S51). If it is not 2, that is, if it is 1, the obfuscation processing unit 260 determines whether n = 3 (step S53). If n = 3, the obfuscation processing unit 260 uses a random number to select and execute one of the following obfuscation patterns (step S57). (1) With probability x, one a and one b are obfuscated. The appearance frequency of b is 1, but this b is also obfuscated, giving {a: 1, ?: 2}, so the probability distribution information is "a = probability 1/2 (= 2/(3 + 1)), each value other than a = probability (1 - 1/2)/v". (2) With probability p, two a's are obfuscated. This gives {b: 1, ?: 2}, so the probability distribution information is "b = probability 1/2, each value other than b = probability (1 - 1/2)/v". (3) Otherwise, one a is obfuscated. This gives {a: 1, b: 1, ?: 1}, so the information is "a = probability 1/2, b = probability 1/2 (= 1 - 1/2)". The process then proceeds to step S65.

If n is not 3 (n is 4 or more), the obfuscation processing unit 260 uses a random number to select and execute one of the following obfuscation patterns (step S55). (1) With probability x, one a and one b are obfuscated. The appearance frequency of b is 1, but this b is also obfuscated, giving {a: n-2, ?: 2}. If m is 5 or more, the probability distribution information is "a = probability 2/(n + 2), each value other than a = probability (1 - 2/(n + 2))/v"; if m = 4, it is "a = probability 4/7, each value other than a = probability (1 - 4/7)/v". (2) Otherwise, one a is obfuscated. This gives {a: n-2, b: 1, ?: 1}, so the probability distribution information is "a = probability 1/2, b = probability 1/2 (= 1 - 1/2)". The process then proceeds to step S65.

If the appearance frequency of b is 2, the obfuscation processing unit 260 determines whether n = 4 (step S59). If n = 4, the obfuscation processing unit 260 uses a random number to select and execute one of the following obfuscation patterns (step S63). (1) With probability p, two a's are obfuscated. This gives {b: 2, ?: 2}, so the probability distribution information is "b = probability 4/7, each value other than b = probability (1 - 4/7)/v". (2) With probability p, two b's are obfuscated. This gives {a: n-2, ?: 2}, so the probability distribution information is "a = probability 4/7, each value other than a = probability (1 - 4/7)/v". (3) With probability 0.5 - p, one a is obfuscated. This gives {a: 1, b: 2, ?: 1}, so the probability distribution information is "b = probability 4/7, a = probability 3/7 (= 1 - 4/7)". (4) Otherwise, one b is obfuscated. This gives {a: 2, b: 1, ?: 1}, so the probability distribution information is "a = probability 4/7, b = probability 3/7 (= 1 - 4/7)". The process then proceeds to step S65.

If n is not 4 (n is 5 or more), the obfuscation processing unit 260 uses a random number to select and execute one of the following obfuscation patterns (step S61). (1) With probability p, two b's are obfuscated. This gives {a: n-2, ?: 2}, so the probability distribution information is "a = probability 2/(n + 1), each value other than a = probability (1 - 2/(n + 1))/v". (2) Otherwise, one b is obfuscated. This gives {a: n-2, b: 1, ?: 1}, so the probability distribution information is "a = probability 2/(n + 1), b = probability 1 - 2/(n + 1)". The process then proceeds to step S65.

The obfuscation processing unit 260 then stores the changed frequency table in the second data storage unit 270 (step S65), and the process returns to the calling process.
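The random pattern selection of FIG. 16 can be sketched as a single function that returns how many occurrences of a and of b to replace. The name `choose_pattern`, the tuple return value, and the `rng` parameter are assumptions for illustration. The probabilities x and p are taken as inputs, as computed in FIG. 15.

```python
import random

def choose_pattern(n, b_freq, x, p, rng=random):
    """Return (num_a, num_b): how many a's and b's to replace with "?" (cf. FIG. 16)."""
    r = rng.random()                    # uniform in [0, 1)
    if b_freq == 1:
        if n == 3:                      # S57: three patterns
            if r < x:
                return (1, 1)
            if r < x + p:
                return (2, 0)
            return (1, 0)
        return (1, 1) if r < x else (1, 0)      # S55: n >= 4
    if n == 4:                          # S63: symmetric case {a: 2, b: 2}
        if r < p:
            return (2, 0)
        if r < 2 * p:
            return (0, 2)
        if r < 0.5 + p:                 # this band has width 0.5 - p
            return (1, 0)
        return (0, 1)
    return (0, 2) if r < p else (0, 1)          # S61: n >= 5, b appears twice
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which may help when auditing the published output.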

Returning to the processing of FIG. 11, when a changed frequency table is stored in the second data storage unit 270, the obfuscation processing unit 260 changes the attribute values of the records that are stored in the first data storage unit 210 and belong to the group identified in step S15, according to the difference from the frequency table before the change (step S21). The obfuscation processing unit 260 then determines whether an unprocessed obfuscation target attribute remains (step S22). If one remains, the process returns to step S16. If none remains, the obfuscation processing unit 260 determines whether an unprocessed group remains (step S23). If one remains, the process returns to step S15. If none remains, the output unit 240 outputs the modified data stored in the first data storage unit 210 to an output device (for example, a display device, a printing device, or another computer connected via a network) (step S25).
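Step S21 turns a change in the frequency table into edits on concrete records. One way to sketch it, assuming dictionary records and a hypothetical helper `apply_obfuscation`, is to pick the required number of records at random among those in the group that hold the affected value:

```python
import random

def apply_obfuscation(records, group_indices, attr, value, count, label, rng=random):
    """Replace `attr` with `label` in `count` randomly chosen group records holding `value`."""
    candidates = [i for i in group_indices if records[i][attr] == value]
    for i in rng.sample(candidates, count):   # sample without replacement
        records[i][attr] = label
    return records

# Illustrative use: obfuscate one "dissatisfied" answer in a four-record group.
records = [{"answer1": a} for a in
           ["dissatisfied", "dissatisfied", "normal", "dissatisfied"]]
apply_obfuscation(records, range(4), "answer1", "dissatisfied", 1,
                  "? (dissatisfied = 57%, normal = 43%)")
```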

For example, when the group in FIG. 12 with department "manufacturing department" and age "25-42" is processed for the obfuscation target attribute "answer 1", the frequency table shown in FIG. 13 is obtained. Here the appearance frequency of b is 1 and n = 4, so the obfuscation mode of step S55 is selected. According to the processing flow of FIG. 15, x = 3/28 and p = 3/56. Thus, with probability x, one a and one b are obfuscated; otherwise, one a is obfuscated. If the latter is selected, the frequency table is changed as shown in FIG. 17. Since a is "dissatisfied", the appearance frequency of "dissatisfied" decreases by 1, and "? (dissatisfied = 57%, normal = 43%)" is added in its place. Under this changed frequency table, the relevant part of the original data of FIG. 12 becomes as shown in FIG. 18: one record whose answer 1 is "dissatisfied" (that is, a) is selected at random and changed to "? (dissatisfied = 57%, normal = 43%)".

If the former is selected, the frequency table is changed as shown in FIG. 19. In this case, the appearance frequency of "dissatisfied" decreases by 1, and because the appearance frequency of "normal" was 1, its row is removed from the frequency table altogether. Under this changed frequency table, the relevant part of the original data of FIG. 12 becomes as shown in FIG. 20. Only one record has "normal" as answer 1, so its attribute value is changed to "? (dissatisfied = 57%, each value other than dissatisfied = 21%)". In addition, three records have "dissatisfied" as answer 1, so one of them is selected at random and changed to "? (dissatisfied = 57%, each value other than dissatisfied = 21%)".

The entire original data of FIG. 12 is converted into, for example, the data shown in FIG. 21 and output in step S25. For the obfuscation target attribute "answer 1" of the group with department "sales department" and age "24-44", three kinds of attribute values appear, so the values are output as is without obfuscation. For the obfuscation target attribute "answer 2" of the same group, only one kind of attribute value, "normal", appears, so two records are selected at random and replaced with obfuscated data. For the other groups and obfuscation target attributes, one of the obfuscation patterns is selected probabilistically.

[Embodiment 3]
In the second embodiment, the answers of other respondents cannot be uniquely identified even if a respondent sees the modified data. If respondents never see the obfuscated data, however, the obfuscation process of FIGS. 14 and 16 may be replaced with the obfuscation process shown in FIG. 22.

When n is 3 or more, the reasoning is as follows. When the frequency table is {a: n} (the appearance frequency of the more frequent attribute value a equals the total frequency n), at least one occurrence must be obfuscated to prevent a third party unrelated to the frequency table from uniquely determining that a particular person answered a. Since obfuscation should be kept to a minimum, the table should be obfuscated to {a: n-1, ?: 1}.

To prevent the original frequency table of {a: n-1, ?: 1} from being identified as {a: n}, the probability x is defined as follows.

When the frequency table is {a: n-1, b: 1}, x is the probability of changing it to {a: n-1, ?: 1}. Here, 0 ≤ x ≤ 1.

Assume that all possible attribute values (a, b, and so on) are equally likely to appear, and let P be the probability that the original frequency table of {a: n-1, ?: 1} is {a: n}. P is then given by an equation in x, n, and v.

To maximize privacy protection, consider finding the x that minimizes P. If P becomes 1, even a third party unrelated to the frequency table can uniquely determine that a particular person answered a.

Accordingly, when n ≥ 3, P is smallest when x = 1.

When n = 2, obfuscating {a: 1, b: 1} should produce {a: 1, ?: 1} and {b: 1, ?: 1} with equal probability, so P is smallest when x = 1/2.
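The two special cases agree with the probabilities quoted for steps S121, S115, and S113, namely 1/(vn + 1) and 1/(v + 1). The general form on the left is a reconstruction under the uniform-prior assumption (relative weight 1 for {a: n} and n for each of the v tables {a: n-1, b: 1}); the published equation is not reproduced in this text.

```latex
P = \frac{1}{1 + vnx}, \qquad
P\big|_{\,n \ge 3,\; x = 1} = \frac{1}{vn + 1}, \qquad
P\big|_{\,n = 2,\; x = 1/2} = \frac{1}{v + 1}
```

With m = 3 (v = 2) and n = 4, the middle expression gives 1/9, or about 11%, and each of the other two values receives (1 - 1/9)/2, or about 44%.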

Based on this reasoning, the following processing is performed.

First, the obfuscation processing unit 260 reads the frequency table from the second data storage unit 270 (FIG. 22: step S101), and sets n to the total frequency (the sum of the appearance frequencies) (step S103).

The obfuscation processing unit 260 then determines whether the number of records in the frequency table is 2 (step S105). If it is not 2 (that is, 1, or 3 or more), the obfuscation processing unit 260 determines whether the number of records in the frequency table is 1 (step S119). If it is not 1, that is, 3 or more, the process returns to the calling process without obfuscation.

If the number of records in the frequency table is 1, the obfuscation processing unit 260 obfuscates one occurrence of the only record (step S121). The purpose is the same as in step S43. As described above, if n is 3 or more, the probability distribution information is "? (a = probability P, each value other than a = probability (1 - P)/v)". If m = 3 and n = 4, P = 1/(vn + 1) = 1/((3 - 1) × 4 + 1), or about 11%. The process then proceeds to step S117.

If the number of records in the frequency table is 2, the obfuscation processing unit 260 sorts the records in the frequency table by frequency and sets the more frequent value as a and the less frequent value as b (step S107). It then determines whether the frequency of b is 1 (step S109). If the frequency of b is 2 or more, there is no problem under the above premise, so the process returns to the calling process without obfuscation.

If the frequency of b is 1, the obfuscation processing unit 260 determines whether n = 2 (step S111). If n is not 2, the obfuscation processing unit 260 obfuscates the one b (step S115). This gives {a: n-1, ?: 1}, so if n is 3 or more the probability distribution information is "a = probability P (= 1/(vn + 1)), each attribute value other than a = probability (1 - P)/v". The process then proceeds to step S117.

If n = 2, the obfuscation processing unit 260 uses a random number to select and execute one of the following obfuscation patterns (step S113). (1) With probability 1/2, the one a is obfuscated. This gives {b: 1, ?: 1}, so the probability distribution information is "? (b = probability 1/(v + 1), each value other than b = probability (1 - 1/(v + 1))/v)". (2) Otherwise, the one b is obfuscated. This gives {a: 1, ?: 1}, so the probability distribution information is "? (a = probability 1/(v + 1), each value other than a = probability (1 - 1/(v + 1))/v)". The process then proceeds to step S117.

The obfuscation processing unit 260 then stores the changed frequency table in the second data storage unit 270 (step S117).
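The decision flow of FIG. 22 can be sketched as a function that returns the attribute value of which one occurrence is replaced, or `None` when nothing is obfuscated. The name `choose_value_to_obfuscate` and the `rng` parameter are assumptions for illustration, and the probability label attached to the "?" is left out to keep the sketch short.

```python
import random

def choose_value_to_obfuscate(freq, rng=random):
    """Return the value whose single occurrence becomes "?", or None (cf. FIG. 22)."""
    if len(freq) == 1:
        return next(iter(freq))                        # S121: only one kind appears
    if len(freq) != 2:
        return None                                    # three or more kinds: leave as is
    a, b = sorted(freq, key=freq.get, reverse=True)    # S107: a is the more frequent
    if freq[b] != 1:
        return None                                    # S109: b appears twice or more
    if sum(freq.values()) != 2:
        return b                                       # S115: obfuscate the lone b
    return a if rng.random() < 0.5 else b              # S113: n = 2, pick either value
```

For the manufacturing group's "answer 1" table {dissatisfied: 3, normal: 1}, this returns "normal", in line with the example described for FIG. 23.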

For example, processing the example of FIG. 12 with the processing flow of FIG. 22 yields the data shown in FIG. 23. The obfuscation target attribute "answer 2" of the group with department "manufacturing department" and age "25-42" was obfuscated in the second embodiment but is not obfuscated in this embodiment. For the obfuscation target attribute "answer 1" of the same group, the attribute value of the record whose answer 1 is "normal" is replaced in step S115 with the obfuscated data "? (dissatisfied = 11%, each value other than dissatisfied = 44%)". The obfuscation target attribute "answer 1" of the group with department "sales department" and age "24-44" is not obfuscated. When the obfuscation target attribute "answer 2" of that group is processed, a value is replaced with obfuscated data in step S121, namely "? (dissatisfied = 11%, each value other than dissatisfied = 44%)".

Embodiments of the present technology have been described above, but the present technology is not limited to them. The functional block configuration is an example and does not necessarily match the actual program module configuration. In the processing flows, the order of steps may be changed or multiple steps may be executed in parallel, as long as the processing result does not change.

In the embodiments described above, all attribute values are treated equally, but values that pose little privacy concern may be given special treatment. For example, if disclosing that a person's "answer" is "normal" causes no problem, the algorithm may leave data such as [normal, normal, normal, normal] or [normal, normal, satisfied, normal] unobfuscated even when the distribution is judged to be heavily skewed.

The information processing apparatuses 100 and 200 described above are computer apparatuses. As shown in FIG. 24, a memory 2501, a CPU (Central Processing Unit) 2503, a hard disk drive (HDD) 2505, a display control unit 2507 connected to a display device 2509, a drive device 2513 for a removable disk 2511, an input device 2515, and a communication control unit 2517 for connecting to a network are connected by a bus 2519. An operating system (OS) and an application program for performing the processing of this embodiment are stored in the HDD 2505 and are read from the HDD 2505 into the memory 2501 when executed by the CPU 2503. The CPU 2503 controls the display control unit 2507, the communication control unit 2517, and the drive device 2513 according to the processing of the application program so that they perform predetermined operations. Data being processed is stored mainly in the memory 2501 but may be stored in the HDD 2505. In an embodiment of the present technology, the application program for performing the processing described above is stored on and distributed via the computer-readable removable disk 2511, and is installed from the drive device 2513 onto the HDD 2505. It may also be installed onto the HDD 2505 via a network such as the Internet and the communication control unit 2517. Such a computer apparatus realizes the various functions described above through cooperation between hardware, such as the CPU 2503 and the memory 2501, and programs, such as the OS and the application program.

The embodiment described above can be summarized as follows.

The information processing method according to the present embodiment includes: (A) determining whether the distribution of record counts satisfies a condition indicating a large bias, based on data stored in a data storage unit that stores, for each kind of attribute value of a first attribute that is included in a plurality of records and designated as an obfuscation target, the number of records among the plurality of records in which that attribute value appears; and (B) when the distribution of record counts satisfies the condition indicating a large bias, replacing the attribute value of the first attribute in at least one of the plurality of records with obfuscation data and storing the result in the data storage unit.

Replacing the attribute value of the first attribute to be obfuscated with obfuscation data in this way protects privacy while preserving the tendency of the appearance distribution of the original values.

The information processing method according to the present embodiment may further include: (C) an extraction step of extracting the plurality of records by grouping the records stored in the data storage unit into groups of records that share the same attribute value of a second attribute (or second attribute group) different from the first attribute; and (D) counting, for each attribute value of the first attribute in the plurality of records, the number of records containing that attribute value, and storing the count in the data storage unit. In this way, the number of records may be calculated for each attribute value of the obfuscation target attribute.
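
Steps (C) and (D) can be illustrated with a minimal sketch. It assumes records are plain dictionaries held in memory rather than in the data storage unit; the function name `count_by_group` and the field names `dept` and `answer` are hypothetical.

```python
from collections import Counter, defaultdict

def count_by_group(records, group_keys, target):
    """Group records by the second attribute(s) and count first-attribute values.

    records: iterable of dicts (assumed in-memory representation).
    group_keys: names of the second attribute or attribute group.
    target: name of the first (obfuscation target) attribute.
    """
    groups = defaultdict(Counter)
    for rec in records:
        key = tuple(rec[k] for k in group_keys)  # step (C): grouping key
        groups[key][rec[target]] += 1            # step (D): per-value count
    return dict(groups)

records = [
    {"dept": "sales", "answer": "normal"},
    {"dept": "sales", "answer": "normal"},
    {"dept": "sales", "answer": "satisfied"},
    {"dept": "dev", "answer": "dissatisfied"},
]
counts = count_by_group(records, ["dept"], "answer")
```

Each entry of `counts` maps one group to the record count per attribute value, which is the input that the bias determination operates on.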

Furthermore, the obfuscation data described above may include at least data on the probability that the original value is the most frequent attribute value of the first attribute. Presenting such probability data makes it easier to grasp the tendency of the original values.

The extraction step described above may be performed after anonymization processing that makes k or more records share the same attribute value of the second attribute (or second attribute group). That is, performing k-anonymization provides basic privacy protection.
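
As a rough illustration of the property that k-anonymization is meant to establish, the following sketch checks whether every combination of second-attribute values is shared by at least k records. It only verifies the property and does not perform the anonymization itself; the function name `is_k_anonymous` and the dictionary record layout are assumptions.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True when every group of second-attribute values has >= k records."""
    sizes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(size >= k for size in sizes.values())
```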

Furthermore, the condition indicating a large bias may be a determination condition that is satisfied when either of the following holds: only two kinds of attribute values of the first attribute appear and the frequency of the less frequent attribute value is 1 or 2; or only one kind of attribute value of the first attribute exists. Such a condition is adopted to keep the amount of obfuscated data to a minimum while ensuring that even a respondent who views the processing result cannot uniquely identify another respondent's answer. In this case, at most two records are replaced with obfuscation data.

The obfuscation step may also be performed when the number of records in the plurality of records is 2 or less. In this way, when the original number of respondents is small, obfuscation is performed to protect privacy.

The obfuscation step described above may include: identifying one of a plurality of obfuscation patterns, each of which obfuscates the attribute value of the first attribute in one or two of the plurality of records, according to probabilities calculated from the number of records in the plurality of records and the number of kinds of values that the first attribute can take; and replacing the attribute value of the first attribute with the obfuscation data according to the identified obfuscation pattern. Such processing protects privacy effectively.
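
The two sub-steps above (pick a pattern by probability, then replace values) can be sketched as follows. The pattern probabilities themselves depend on the record count and the number of possible values and are not derived here, so the weights in the usage example are placeholders. The names `apply_pattern`, `obfuscate`, and the `"*"` placeholder are hypothetical.

```python
import random

def apply_pattern(values, indices, placeholder="*"):
    """Return a copy of values with the given positions replaced by obfuscation data."""
    out = list(values)
    for i in indices:
        out[i] = placeholder
    return out

def obfuscate(values, patterns, weights, rng=random):
    """Select one obfuscation pattern according to weights and apply it.

    patterns: tuples of record positions to obfuscate (one or two positions each).
    weights: selection probabilities, assumed to be computed elsewhere.
    """
    chosen = rng.choices(patterns, weights=weights, k=1)[0]
    return apply_pattern(values, chosen)

answers = ["normal", "normal", "normal", "satisfied"]
patterns = [(3,), (0, 3), (1, 3), (2, 3)]
weights = [0.4, 0.2, 0.2, 0.2]  # placeholder values, not taken from the embodiment
result = obfuscate(answers, patterns, weights)
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which is convenient for testing.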

Furthermore, the condition indicating a large bias may be a determination condition that is satisfied when either of the following holds: only two kinds of attribute values of the first attribute appear and the frequency of the less frequent attribute value is 1; or only one kind of attribute value of the first attribute exists. For example, when the respondents themselves never see the processing result, privacy is protected even under this condition. In this case, one record is replaced with obfuscation data.
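
The two bias conditions described above differ only in the largest minority frequency they allow, so both can be expressed by one function with a threshold parameter. This is a minimal sketch, not the embodiment's implementation; the name `is_highly_biased` and the parameter `max_minority` are hypothetical, and treating a tie between the two values as satisfying the condition is an assumption.

```python
def is_highly_biased(counts, max_minority=2):
    """Test the bias condition on per-value record counts.

    counts: mapping from attribute value to number of records.
    max_minority: 2 for the case where respondents view the result,
                  1 for the case where they do not.
    """
    freqs = sorted(c for c in counts.values() if c > 0)
    if len(freqs) == 1:
        return True                      # only one kind of value exists
    if len(freqs) == 2:
        return freqs[0] <= max_minority  # minority value is rare enough
    return False
```

A group of one or two records always satisfies this test, which is consistent with obfuscating whenever the record count is 2 or less.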

Furthermore, when the attribute values of the first attribute in the plurality of records include a first attribute value and a second attribute value that is less frequent than the first attribute value, and the number of records in the plurality of records is n, the probabilities described above may be calculated, for example, so that the following three probabilities are equal and minimal. Probability A is the probability that the first attribute value actually appeared n times, given that information has been generated indicating that the first attribute value appears (n−2) times and that two records have been replaced with the obfuscation data. Probability B is the probability, as seen by the person corresponding to the second attribute value, that the first attribute value actually appeared (n−1) times and the second attribute value appeared once, given that information has been generated indicating that the first attribute value appears (n−2) times and that two records have been replaced with the obfuscation data. Probability C is the probability, as seen by the person corresponding to the second attribute value, that the first attribute value actually appeared (n−1) times and the second attribute value appeared once, given that information has been generated indicating that the first attribute value appears (n−2) times, the second attribute value appears once, and one record has been replaced with the obfuscation data. In this way, appropriate probabilities can be calculated.

A program for causing a computer to execute the processing described above can be created, and the program is stored in a computer-readable storage medium or storage device such as a flexible disk, an optical disc such as a CD-ROM, a magneto-optical disk, a semiconductor memory (for example, a ROM), or a hard disk.

The following appendices are further disclosed with respect to the embodiments, including the examples above.

(Appendix 1)
An information processing method executed by a computer, the method comprising:
determining, based on data stored in a data storage unit that stores, for each kind of attribute value of a first attribute that is included in a plurality of records and designated as an obfuscation target, the number of records among the plurality of records in which that attribute value appears, whether the distribution of the record counts satisfies a condition indicating a large bias; and
when the distribution of the record counts satisfies the condition indicating a large bias, replacing the attribute value of the first attribute in at least one of the plurality of records with obfuscation data and storing the result in the data storage unit.

(Appendix 2)
The information processing method according to appendix 1, further comprising:
an extraction step of extracting the plurality of records by grouping the records stored in the data storage unit into groups of records that share the same attribute value of a second attribute different from the first attribute; and
counting, for each attribute value of the first attribute in the plurality of records, the number of records containing that attribute value, and storing the count in the data storage unit.

(Appendix 3)
The information processing method according to appendix 1 or 2, wherein the obfuscation data includes at least data on the probability that the original value is the most frequent attribute value of the first attribute.

(Appendix 4)
The information processing method according to appendix 1 or 2, wherein the extraction step is performed after anonymization processing that makes k or more records share the same attribute value of the second attribute.

(Appendix 5)
The information processing method according to any one of appendices 1 to 4, wherein the condition indicating a large bias is a determination condition that is satisfied when either of the following holds: only two kinds of attribute values of the first attribute appear and the frequency of the less frequent attribute value is 1 or 2; or only one kind of attribute value of the first attribute exists.

(Appendix 6)
The information processing method according to appendix 5, wherein the obfuscation step is performed when the number of records in the plurality of records is 2 or less.

(Appendix 7)
The information processing method according to any one of appendices 1 to 6, wherein the obfuscation step comprises:
identifying one of a plurality of obfuscation patterns, each of which obfuscates the attribute value of the first attribute in one or two of the plurality of records, according to probabilities calculated from the number of records in the plurality of records and the number of kinds of values that the first attribute can take; and
replacing the attribute value of the first attribute with the obfuscation data according to the identified obfuscation pattern.

(Appendix 8)
The information processing method according to any one of appendices 1 to 4, wherein the condition indicating a large bias is a determination condition that is satisfied when either of the following holds: only two kinds of attribute values of the first attribute appear and the frequency of the less frequent attribute value is 1; or only one kind of attribute value of the first attribute exists.

(Appendix 9)
The information processing method according to appendix 7, wherein
the attribute values of the first attribute in the plurality of records include a first attribute value and a second attribute value that is less frequent than the first attribute value,
the number of records in the plurality of records is n, and
the probabilities are calculated so as to satisfy the condition that the following three probabilities are equal and minimal:
probability A that the first attribute value actually appeared n times, given that information has been generated indicating that the first attribute value appears (n−2) times and that two records have been replaced with the obfuscation data;
probability B, as seen by the person corresponding to the second attribute value, that the first attribute value actually appeared (n−1) times and the second attribute value appeared once, given that information has been generated indicating that the first attribute value appears (n−2) times and that two records have been replaced with the obfuscation data; and
probability C, as seen by the person corresponding to the second attribute value, that the first attribute value actually appeared (n−1) times and the second attribute value appeared once, given that information has been generated indicating that the first attribute value appears (n−2) times, the second attribute value appears once, and one record has been replaced with the obfuscation data.

(Appendix 10)
A program that causes a computer to execute:
determining, based on data stored in a data storage unit that stores, for each kind of attribute value of a first attribute that is included in a plurality of records and designated as an obfuscation target, the number of records among the plurality of records in which that attribute value appears, whether the distribution of the record counts satisfies a condition indicating a large bias; and
when the distribution of the record counts satisfies the condition indicating a large bias, replacing the attribute value of the first attribute in at least one of the plurality of records with obfuscation data and storing the result in the data storage unit.

(Appendix 11)
An information processing apparatus comprising:
a determination unit that determines, based on data stored in a data storage unit that stores, for each kind of attribute value of a first attribute that is included in a plurality of records and designated as an obfuscation target, the number of records among the plurality of records in which that attribute value appears, whether the distribution of the record counts satisfies a condition indicating a large bias; and
an obfuscation processing unit that, when the distribution of the record counts satisfies the condition indicating a large bias, replaces the attribute value of the first attribute in at least one of the plurality of records with obfuscation data and stores the result in the data storage unit.

100, 200 Information processing apparatus
110 First data storage unit
120 Determination unit
130 Obfuscation processing unit
140 Second data storage unit
210 First data storage unit
220 k-anonymization processing unit
230 Grouping processing unit
240 Output unit
250 Input unit
260 Obfuscation processing unit
270 Second data storage unit

Claims

An information processing method executed by a computer, the method comprising:
determining, based on data stored in a data storage unit that stores, for each kind of attribute value of a first attribute that is included in a plurality of records and designated as an obfuscation target, the number of records among the plurality of records in which that attribute value appears, whether the distribution of the record counts satisfies a condition indicating a large bias; and
when the distribution of the record counts satisfies the condition indicating a large bias, replacing the attribute value of the first attribute in at least one of the plurality of records with obfuscation data and storing the result in the data storage unit.

The information processing method according to claim 1, further comprising:
an extraction step of extracting the plurality of records by grouping the records stored in the data storage unit into groups of records that share the same attribute value of a second attribute different from the first attribute; and
counting, for each attribute value of the first attribute in the plurality of records, the number of records containing that attribute value, and storing the count in the data storage unit.

The information processing method according to claim 1, wherein the obfuscation data includes at least data on the probability that the original value is the most frequent attribute value of the first attribute.

The information processing method according to claim 1 or 2, wherein the extraction step is performed after anonymization processing that makes k or more records share the same attribute value of the second attribute.

The information processing method according to any one of claims 1 to 4, wherein the condition indicating a large bias is a determination condition that is satisfied when either of the following holds: only two kinds of attribute values of the first attribute appear and the frequency of the less frequent attribute value is 1 or 2; or only one kind of attribute value of the first attribute exists.

The information processing method according to claim 1, wherein the obfuscation step comprises:
identifying one of a plurality of obfuscation patterns, each of which obfuscates the attribute value of the first attribute in one or two of the plurality of records, according to probabilities calculated from the number of records in the plurality of records and the number of kinds of values that the first attribute can take; and
replacing the attribute value of the first attribute with the obfuscation data according to the identified obfuscation pattern.

The information processing method according to any one of claims 1 to 4, wherein the condition indicating a large bias is a determination condition that is satisfied when either of the following holds: only two kinds of attribute values of the first attribute appear and the frequency of the less frequent attribute value is 1; or only one kind of attribute value of the first attribute exists.

A program that causes a computer to execute:
determining, based on data stored in a data storage unit that stores, for each kind of attribute value of a first attribute that is included in a plurality of records and designated as an obfuscation target, the number of records among the plurality of records in which that attribute value appears, whether the distribution of the record counts satisfies a condition indicating a large bias; and
when the distribution of the record counts satisfies the condition indicating a large bias, replacing the attribute value of the first attribute in at least one of the plurality of records with obfuscation data and storing the result in the data storage unit.

An information processing apparatus comprising:
a determination unit that determines, based on data stored in a data storage unit that stores, for each kind of attribute value of a first attribute that is included in a plurality of records and designated as an obfuscation target, the number of records among the plurality of records in which that attribute value appears, whether the distribution of the record counts satisfies a condition indicating a large bias; and
an obfuscation processing unit that, when the distribution of the record counts satisfies the condition indicating a large bias, replaces the attribute value of the first attribute in at least one of the plurality of records with obfuscation data and stores the result in the data storage unit.