JP5683425B2

JP5683425B2 - Data disturbance / reconstruction system, data reconstruction device, data reconstruction method, data reconstruction program

Info

Publication number: JP5683425B2
Application number: JP2011219796A
Authority: JP
Inventors: 亮菊池; 大五十嵐; 千田　浩司; 浩司千田; 浩気濱田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2011-10-04
Filing date: 2011-10-04
Publication date: 2015-03-11
Anticipated expiration: 2031-10-04
Also published as: JP2013080094A

Description

The present invention relates to a data disturbance/reconstruction system, a data reconstruction device, a data reconstruction method, and a data reconstruction program that obtain only statistical values while concealing the individual data in a database by a probabilistic method.

The method of Non-Patent Document 1 is a conventional technique for reconstructing only the cross tabulation result while concealing the individual data in a database by a probabilistic method. A disturbance/reconstruction method of this kind protects privacy by applying a probabilistic process called disturbance to the data, and restores statistical values through a step called reconstruction. Because each data provider can perform the disturbance on their own data, privacy is protected even from the administrator. Among disturbance/reconstruction methods, the combination of a disturbance process called retention-replacement disturbance with the iterative Bayes method, a reconstruction process for obtaining the statistic called a cross tabulation, is more computationally efficient than secure function evaluation. Experiments have also shown that, compared with the k-anonymity method, it yields more accurate statistical values at an equivalent degree of privacy protection (Non-Patent Document 2).

[Non-Patent Document 1] 五十嵐大, 千田浩司, 高橋克巳, "多値属性に適用可能な効率的プライバシー保護クロス集計" (Efficient Privacy-Preserving Cross Tabulation Applicable to Multi-Valued Attributes), CSS2008, 2008.
[Non-Patent Document 2] 永井彰, 五十嵐大, 濱田浩気, 松林達史, "クロネッカー積を含む行列積演算の最適化による効率的なプライバシ保護データ公開技術" (Efficient Privacy-Preserving Data Publishing by Optimizing Matrix Product Operations Involving Kronecker Products), SCIS, 2010.

However, the conventional technique has the problem that the processing cost of reconstruction increases sharply as the range of values the concealed data can take grows. In the combination of retention-replacement disturbance and the iterative Bayes method, the computation of the iterative Bayes method is dominant, and its cost depends strongly on the breadth of values the data can take, that is, on the number of combinations of all attribute values. Consequently, when the data has a wide range, for example age in one-year increments, the amount of computation becomes enormous, and there are cases where reconstruction cannot be executed in a realistic time even with the speed-up techniques proposed in Non-Patent Document 1 and elsewhere.

An object of the present invention is to reduce the processing cost of reconstructing concealed data.

The data disturbance/reconstruction system of the present invention includes a data disturbance device and a data reconstruction device. Let K be an integer of 2 or more indicating the number of attributes, N an integer of 2 or more indicating the number of records, and M the number of combinations of all attribute values. The data disturbance device generates a cross tabulation Y = (y_0, y_1, …, y_{M−1}) for a disturbance table, which is obtained by disturbing an initial table of N records having K attributes using a transition probability matrix A that disturbs the attribute values of some attributes and retains the attribute values of the other attributes. The data reconstruction device includes a range calculation unit, a matrix generation unit, a vector generation unit, and an iterative Bayes unit. The range calculation unit sets the number of combinations of attribute values of the disturbed attributes as the range Q of the entire quasi-identifier. For each combination p of attribute values of the retained attributes, the matrix generation unit generates a Q × Q partial transition probability matrix A_p from the components of the transition probability matrix A, using the transition probabilities of the disturbed attributes. For each combination p of attribute values of the retained attributes, the vector generation unit generates a Q-dimensional vector Y_p from the components of the cross tabulation Y, using the cross tabulation values whose combination of retained attribute values is p. For each combination p of attribute values of the retained attributes, the iterative Bayes unit uses the iterative Bayes method to obtain, from the partial transition probability matrix A_p and the vector Y_p, a Q-dimensional vector X_p representing the reconstructed cross tabulation for the disturbed attributes, and reconstructs the cross tabulation X from all the vectors X_p.

For example, if attribute numbers are assigned so that the number of every attribute to be disturbed is smaller than the numbers of all retained attributes, the range calculation unit, matrix generation unit, vector generation unit, and iterative Bayes unit may be configured as follows. The range calculation unit sets the number of combinations of attribute values of the disturbed attributes as the range Q of the entire quasi-identifier, and sets the number of combinations of attribute values of the retained attributes as the range S of the retained attributes. For each p = 0, Q, 2 × Q, …, (S−1) × Q, the matrix generation unit generates a Q × Q partial transition probability matrix A_p whose component in row i, column j is the component in row (p+i), column (p+j) of the transition probability matrix A. For each p = 0, Q, 2 × Q, …, (S−1) × Q, the vector generation unit generates the vector Y_p = (y_p, y_{p+1}, …, y_{p+Q−1}). For each p = 0, Q, 2 × Q, …, (S−1) × Q, the iterative Bayes unit uses the iterative Bayes method to obtain the vector X_p = (x_p, x_{p+1}, …, x_{p+Q−1}) from the partial transition probability matrix A_p and the vector Y_p, and sets the reconstructed cross tabulation X to X_0 ‖ X_1 ‖ … ‖ X_{(S−1)×Q}.

According to the data disturbance/reconstruction system of the present invention, the data reconstruction device can perform reconstruction separately for each combination of attribute values of the retained attributes, using a method whose equivalence to the conventional method has been proved. The same result as the conventional method is therefore obtained, and the processing cost of reconstruction is reduced.

FIG. 1 is a diagram showing an example of a table to be processed by the present invention.
FIG. 2 is a diagram showing a configuration example of the data disturbance/reconstruction system of the present invention.
FIG. 3 is a diagram showing an example of the processing flow of the data disturbance/reconstruction system of the present invention.
FIG. 4 is a diagram showing a functional configuration example of the iterative Bayes unit 140.
FIG. 5 is a diagram showing an example of the processing flow of the iterative Bayes unit 140.

Embodiments of the present invention are described in detail below. Components having the same function are given the same reference number, and duplicate descriptions are omitted.

FIG. 1 shows an example of a table to be processed by the present invention. FIG. 1(A) is an example of an initial table before the disturbance process, and FIG. 1(B) is an example of the table after cross tabulation. The initial table has attributes such as nationality, hobby, age, and sex, and the record in each row holds an attribute value for each attribute. A cross tabulation counts, for a given table, how many records have each combination of attribute values. For example, the cross tabulation of the table in FIG. 1(A) is as shown in FIG. 1(B).
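As a minimal sketch of cross tabulation, the following counts identical attribute-value combinations. The records and the function name `cross_tabulate` are hypothetical illustrations and do not reproduce the contents of FIG. 1.

```python
from collections import Counter

# Hypothetical initial table: each record is (nationality, hobby, age, sex).
records = [
    ("JP", "tennis", 30, "F"),
    ("JP", "tennis", 30, "F"),
    ("US", "chess", 25, "M"),
    ("JP", "chess", 30, "M"),
]


def cross_tabulate(rows):
    """Count how many records share each combination of attribute values."""
    return Counter(rows)


table = cross_tabulate(records)
```

Each key of `table` is one attribute-value combination, and its value is the number of records with that combination.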

In this specification, K is an integer of 2 or more indicating the number of attributes, N is an integer of 2 or more indicating the number of records, M is the number of combinations of all attribute values, k is an integer from 0 to K−1 indicating the attribute number, M_k is an integer of 1 or more indicating the number of attribute values that the k-th attribute can take, ‖ is the symbol for concatenating vectors, and T is the symbol for the transpose of a matrix or vector. M and M_k are related by M = M_0 · M_1 · … · M_{K−1}.

FIG. 2 shows a configuration example of the data disturbance/reconstruction system of the present invention, and FIG. 3 shows an example of its processing flow. The system of FIG. 2 includes a data disturbance device 200 and a data reconstruction device 100 connected via a network 900. If data is exchanged between the data disturbance device 200 and the data reconstruction device 100 on a recording medium of some kind, they need not be connected via the network 900. The data disturbance device 200 and the data reconstruction device 100 may also be implemented as a single device, in which case no network is needed either.

The initial table to be disturbed and reconstructed consists of N records having K attributes. The data disturbance device 200 generates a cross tabulation Y = (y_0, y_1, …, y_{M−1}) for the disturbance table obtained by disturbing the initial table using the transition probability matrix A (S200). The cross tabulation Y is transmitted to the data reconstruction device 100 via the network 900. The transition probability matrix A disturbs the attribute values of some attributes of the initial table and retains the attribute values of the other attributes. For every attribute value of every attribute, the transition probability matrix A defines the probability of each attribute value to which the disturbance may move it. Writing A^{(k)}_{ab} for the probability that, in the k-th attribute, the attribute value a before the disturbance becomes the attribute value b after the disturbance, the following holds:

[Equation not reproduced in the source.]

For an attribute whose value is retained, it suffices to set A^{(k)}_{ab} = 1 when a = b and A^{(k)}_{ab} = 0 when a ≠ b.
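The per-attribute matrices A^{(k)} can be sketched as follows. The text above fixes only the retained case (the identity matrix). The disturbed case shown here is an assumption: it uses one common form of retention-replacement disturbance, in which a value is kept with probability `rho` and otherwise replaced by a value drawn uniformly from all `m` values. The function names and the parameter `rho` are hypothetical.

```python
def retention_replacement_matrix(m, rho):
    """Assumed disturbed-attribute matrix: entry [a][b] is the probability
    that value a becomes value b when a value is kept with probability rho
    and otherwise replaced uniformly at random among all m values."""
    uniform = (1.0 - rho) / m
    return [[(rho if a == b else 0.0) + uniform for b in range(m)]
            for a in range(m)]


def retained_matrix(m):
    """Retained-attribute matrix: 1 when a == b and 0 otherwise."""
    return [[1.0 if a == b else 0.0 for b in range(m)] for a in range(m)]
```

Every row of either matrix sums to 1, since each row is a probability distribution over the values after the disturbance.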

The data reconstruction device 100 includes a range calculation unit 110, a matrix generation unit 120, a vector generation unit 130, an iterative Bayes unit 140, and a recording unit 190. The recording unit 190 records the cross tabulation Y and the transition probability matrix A. The range calculation unit 110 sets the number of combinations of attribute values of the disturbed attributes as the range Q of the entire quasi-identifier, and sets the number of combinations of attribute values of the retained attributes as the range S of the retained attributes (S110). For example, let t be the number of attributes to be disturbed (t is an integer from 1 to K−1), and suppose attribute numbers are assigned so that the number k of each attribute to be disturbed is an integer with 0 ≤ k ≤ t−1 and the number k of each attribute to be retained is an integer with t ≤ k ≤ K−1. In this case, Q = M_0 · M_1 · … · M_{t−1} and S = M_t · M_{t+1} · … · M_{K−1}. Since M = Q · S, the calculation of the range S may be omitted.
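The range calculation of step S110 can be sketched as below, assuming the numbering described above (attributes 0 to t−1 disturbed, attributes t to K−1 retained). The function name `value_ranges` and the list `m_sizes` holding M_0, …, M_{K−1} are hypothetical names.

```python
from math import prod


def value_ranges(m_sizes, t):
    """Return (Q, S) for attribute sizes m_sizes = [M_0, ..., M_{K-1}],
    where the first t attributes are disturbed and the rest are retained."""
    if not 1 <= t <= len(m_sizes) - 1:
        raise ValueError("t must be between 1 and K-1")
    q = prod(m_sizes[:t])   # Q = M_0 * ... * M_{t-1}
    s = prod(m_sizes[t:])   # S = M_t * ... * M_{K-1}
    return q, s
```

For instance, with sizes `[100, 2, 5, 3]` and `t = 2`, the disturbed attributes span Q = 200 combinations and the retained attributes S = 15, so M = Q · S = 3000.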

For each combination p of attribute values of the retained attributes, the matrix generation unit 120 generates a Q × Q partial transition probability matrix A_p from the components of the transition probability matrix A, using the transition probabilities of the disturbed attributes (S120). Since there are S combinations p of attribute values of the retained attributes, S partial transition probability matrices A_p of size Q × Q are generated. A specific example follows for the case where attribute numbers are assigned so that the number k of each attribute to be disturbed satisfies 0 ≤ k ≤ t−1 and the number k of each attribute to be retained satisfies t ≤ k ≤ K−1. In this case, the value p indicating the combination of attribute values of the retained attributes takes the S multiples of Q, p = 0, Q, 2 × Q, …, (S−1) × Q. For each such p, the matrix generation unit 120 generates the Q × Q partial transition probability matrix A_p whose component in row i, column j is the component in row (p+i), column (p+j) of the transition probability matrix A.
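A minimal sketch of step S120 under the numbering above: each A_p is the Q × Q diagonal block of A that starts at row p and column p. The matrix is represented as a list of rows, and the function names are hypothetical.

```python
def partial_matrix(a, p, q):
    """Q x Q block of a whose (i, j) entry is a[p + i][p + j]."""
    return [row[p:p + q] for row in a[p:p + q]]


def partial_matrices(a, q):
    """All S blocks, keyed by p = 0, Q, 2Q, ..., (S-1)Q."""
    return {p: partial_matrix(a, p, q) for p in range(0, len(a), q)}
```

Only the S diagonal blocks are extracted; the off-diagonal blocks of A are never read.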

A supplementary explanation of the value p indicating the combination of attribute values of the retained attributes follows. First, let q be a number indicating a combination of attribute values (a combination over all attributes, both disturbed and retained), and let q^{(k)} be the attribute value of the k-th attribute in combination q. Since there are M combinations of attribute values, q takes M distinct values; it is called a cross value. The cross value q may be defined as

[Equation not reproduced in the source.]

in which case q is an integer from 0 to M−1. If the value p indicating the combination of attribute values of the retained attributes is then defined as the quotient of q divided by Q, multiplied by Q, p takes the values 0, Q, 2 × Q, …, (S−1) × Q. Cross values q that give the same p have the same combination of attribute values of the retained attributes.
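The defining equation for the cross value is not reproduced in this text. One definition consistent with the surrounding description (q ranges over 0 to M−1, and the quotient of q by Q identifies the retained attribute values) is a mixed-radix number in which attribute 0 is the least significant digit. The sketch below uses that assumed definition; the function names are hypothetical.

```python
def cross_value(values, m_sizes):
    """Assumed definition: q = sum over k of q^(k) * (M_0 * ... * M_{k-1}),
    so the disturbed attributes (lowest numbers) occupy the low digits."""
    q, weight = 0, 1
    for v, m in zip(values, m_sizes):
        if not 0 <= v < m:
            raise ValueError("attribute value out of range")
        q += v * weight
        weight *= m
    return q


def retained_block(q, big_q):
    """p: the quotient of q divided by Q, multiplied by Q."""
    return (q // big_q) * big_q
```

With sizes `[3, 2, 2]` and two disturbed attributes (Q = 6), the combinations (2, 1, 1) and (0, 0, 1) map to q = 11 and q = 6, and both give p = 6 because they share the retained attribute value 1.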

For each combination p of attribute values of the retained attributes, the vector generation unit 130 generates a Q-dimensional vector Y_p from the components of the cross tabulation Y, using the cross tabulation values whose combination of retained attribute values is p (S130). When attribute numbers are assigned so that the number k of each attribute to be disturbed satisfies 0 ≤ k ≤ t−1 and the number k of each attribute to be retained satisfies t ≤ k ≤ K−1, the vector generation unit 130 generates the vector Y_p = (y_p, y_{p+1}, …, y_{p+Q−1}) for each p = 0, Q, 2 × Q, …, (S−1) × Q.

For each combination p of attribute values of the retained attributes, the iterative Bayes unit 140 uses the iterative Bayes method to obtain, from the partial transition probability matrix A_p and the vector Y_p, a Q-dimensional vector X_p representing the reconstructed cross tabulation for the disturbed attributes, and reconstructs the cross tabulation X from all the vectors X_p (S140). When attribute numbers are assigned so that the number k of each attribute to be disturbed satisfies 0 ≤ k ≤ t−1 and the number k of each attribute to be retained satisfies t ≤ k ≤ K−1, the iterative Bayes unit 140 obtains, for each p = 0, Q, 2 × Q, …, (S−1) × Q, the vector X_p = (x_p, x_{p+1}, …, x_{p+Q−1}) from the partial transition probability matrix A_p and the vector Y_p by the iterative Bayes method, and sets the reconstructed cross tabulation X to X_0 ‖ X_1 ‖ … ‖ X_{(S−1)×Q}.

The iterative Bayes unit 140 is now described in more detail. FIG. 4 shows a functional configuration example of the iterative Bayes unit 140, and FIG. 5 shows an example of its processing flow. The iterative Bayes unit 140 includes an initial setting means 141, a repetition means 142, a Y_p confirmation means 143, a component initialization means 145, a component calculation means 146, a component repetition control means 147, and an output means 149. In this description as well, attribute numbers are assumed to be assigned so that the number k of each attribute to be disturbed satisfies 0 ≤ k ≤ t−1 and the number k of each attribute to be retained satisfies t ≤ k ≤ K−1.

The initial setting means 141 substitutes 0 for p (S141). The repetition means 142 checks whether p = M; if Yes, the process proceeds to step S149 described later, and if No, it proceeds to step S143 described later (S142). If step S142 is No, the Y_p confirmation means 143 checks whether all elements of Y_p are 0; if Yes, the process proceeds to step S144 described later, and if No, it proceeds to step S145 described later (S143). The notation "0^Q" in FIG. 5 means that all Q elements are 0.

If step S143 is No, the component initialization means 145 takes the vector Y_p as the vector X_p^{(0)} and sets i = 0 (S145). The component calculation means 146 obtains X_p^{(i+1)} as

[Equation not reproduced in the source.]

(S146). The component repetition control means 147 checks whether the sum of the absolute values of the element-wise differences between the vector X_p^{(i+1)} and the vector X_p^{(i)} is within a predetermined range (S147). Being within the predetermined range means, for example, being less than a predetermined tolerance. If the sum is out of range (step S147 is No), the component repetition control means 147 substitutes i+1 for i and returns to step S146 (S148). If it is within range (step S147 is Yes), the process proceeds to step S144 described later.

The repetition means 142 takes the vector Y_p as the vector X_p if step S143 was Yes, and takes the vector X_p^{(i+1)} as the vector X_p if step S147 was Yes. The repetition means 142 then substitutes p + Q for p and returns to step S142 (S144). If step S142 is Yes, the output means 149 outputs the cross tabulation X as X_0 ‖ X_1 ‖ … ‖ X_{(S−1)×Q} (S149).
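The flow of steps S141 to S149 can be sketched as follows. The update equation of step S146 is not reproduced in this text, so the sketch assumes the standard iterative Bayes update x_c ← Σ_d y_d · A_cd x_c / Σ_c' A_c'd x_c', where A_cd is the probability that value c becomes value d. The function names, the tolerance `tol`, and the safety cap `max_iter` are hypothetical.

```python
def iterative_bayes_block(a_p, y_p, tol=1e-9, max_iter=10000):
    """Reconstruct X_p from A_p and Y_p (assumed standard update rule)."""
    n = len(y_p)
    if all(v == 0 for v in y_p):          # S143: all-zero block, X_p = Y_p
        return [float(v) for v in y_p]
    x = [float(v) for v in y_p]           # S145: X_p^(0) = Y_p
    for _ in range(max_iter):
        # Denominator for each disturbed value d: sum over c' of A[c'][d] * x[c'].
        denom = [sum(a_p[c][d] * x[c] for c in range(n)) for d in range(n)]
        new_x = [                          # S146: one update step
            sum(y_p[d] * a_p[c][d] * x[c] / denom[d]
                for d in range(n) if denom[d] > 0)
            for c in range(n)
        ]
        diff = sum(abs(u - v) for u, v in zip(new_x, x))
        x = new_x
        if diff < tol:                     # S147: change is within tolerance
            break
    return x


def reconstruct(a, y, q):
    """Process blocks p = 0, Q, 2Q, ... and concatenate the results (S149)."""
    x = []
    for p in range(0, len(y), q):          # S142 / S144: advance p by Q
        a_p = [row[p:p + q] for row in a[p:p + q]]
        x.extend(iterative_bayes_block(a_p, y[p:p + q]))
    return x
```

Each call to `iterative_bayes_block` touches only a Q × Q matrix and a Q-dimensional vector, which is where the cost reduction described in the text comes from.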

The computational complexity of the conventional iterative Bayes method is

[Equation not reproduced in the source.]

whereas the order of the computational complexity of the iterative Bayes unit of the present invention is

[Equation not reproduced in the source.]

In the iterative Bayes unit of the present invention, the number of multiplications is

[Equation not reproduced in the source.]

For example, if the range S of the retained attributes is 10 million, the range Q of the entire quasi-identifier is 2,000, the number of records N is 200,000, and the range M_k of the attribute with the widest range is 2,000, the amount of computation is about 1/150 of that of the conventional method.

＜Proof＞
The following shows that the data reconstruction device of the present invention obtains the same result as the conventional iterative Bayes method. The way attribute numbers are assigned, the definition of the cross value, and the way the value indicating the combination of retained attributes is chosen do not themselves affect the proof, so they are fixed as follows to simplify the explanation. Attribute numbers are assigned so that the number k of each attribute to be disturbed satisfies 0 ≤ k ≤ t−1 and the number k of each attribute to be retained satisfies t ≤ k ≤ K−1. Let c, d, and q be cross values (numbers indicating a combination of attribute values over all attributes, both disturbed and retained), and let c^{(k)}, d^{(k)}, and q^{(k)} be the attribute values of the k-th attribute for cross values c, d, and q, respectively. The cross value q is defined as

[Equation not reproduced in the source.]

The value p_q indicating the combination of retained attributes alone is the quotient of the cross value q divided by Q, multiplied by Q. In this case, p_q = 0, Q, 2 × Q, …, (S−1) × Q.

Because the attribute values of retained attributes do not change, for the k-th attribute with t ≤ k ≤ K−1 the transition probability A^{(k)}_{ab} is 0 for every pair of a and b in which the attribute value a differs from the attribute value b.

The probability A_{cd} that the disturbance moves cross value c to cross value d is

[Equation not reproduced in the source.]

so A_{cd} can be nonzero only if c^{(k)} = d^{(k)} for all k with t ≤ k ≤ K−1. Moreover, if c^{(k)} = d^{(k)} for all k with t ≤ k ≤ K−1, the value p indicating the combination of attribute values of the retained attributes is the same for c and d. The transition probability matrix A therefore satisfies the constraint that A_{cd} = 0 for every pair of c and d for which p_c and p_d differ. Note that, from the way the value indicating the combination of retained attributes is chosen, p_d equals p_c when d is at least p_c and less than p_c + Q.
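The block structure of A can be checked numerically. The sketch below assumes that attributes are disturbed independently, so that A_{cd} is the product over k of A^{(k)} evaluated at (c^{(k)}, d^{(k)}), and that cross values are mixed-radix numbers with attribute 0 as the least significant digit. Both are assumptions, since the corresponding equations are not reproduced in this text. The function names are hypothetical.

```python
from math import prod


def decode(q, m_sizes):
    """Attribute values of cross value q (attribute 0 is the lowest digit)."""
    values = []
    for m in m_sizes:
        values.append(q % m)
        q //= m
    return values


def joint_transition(mats):
    """M x M matrix with A[c][d] = product over k of mats[k][c_k][d_k]."""
    sizes = [len(m) for m in mats]
    total = prod(sizes)
    a = [[0.0] * total for _ in range(total)]
    for c in range(total):
        cv = decode(c, sizes)
        for d in range(total):
            dv = decode(d, sizes)
            a[c][d] = prod(m[i][j] for m, i, j in zip(mats, cv, dv))
    return a
```

When the retained attributes use identity matrices, every entry of the result with `c // Q != d // Q` is zero, which is the constraint stated above.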

In the conventional iterative Bayes method, the component for combination c of the M-dimensional vector X^{(i+1)} is computed as

[Equation not reproduced in the source.]

Applying the constraint above to this expression gives

[Equation not reproduced in the source.]

Similarly,

[Equation not reproduced in the source.]

holds, and since p_d = p_c whenever p_c ≤ d < (p_c + Q),

[Equation not reproduced in the source.]

holds. Expressing the denominator on the right-hand side of this expression with a vector and a matrix gives

[Equation not reproduced in the source.]

where [X]_d denotes the d-th element of the vector X. In the same way, the following holds.

[Equation not reproduced in the source.]

Writing the left-hand side in vector notation gives

[Equation not reproduced in the source.]

The result of steps S145 to S148 described above (see FIG. 5) is therefore the same as that of the conventional method.
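The chain of equalities in this proof can be sketched as below. This is a reconstruction under stated assumptions, not the patent's own typesetting: it assumes the conventional update is the standard iterative Bayes step, with A_{cd} the probability that cross value c becomes cross value d, and it indexes the elements of block vectors from 0.

```latex
% Assumed conventional update over all M cross values:
x_c^{(i+1)} = \sum_{d=0}^{M-1} y_d \,
  \frac{A_{cd}\, x_c^{(i)}}{\sum_{c'=0}^{M-1} A_{c'd}\, x_{c'}^{(i)}}

% A_{cd} = 0 whenever p_c \neq p_d, so only d in the block of c contributes:
x_c^{(i+1)} = \sum_{d=p_c}^{p_c+Q-1} y_d \,
  \frac{A_{cd}\, x_c^{(i)}}{\sum_{c'=0}^{M-1} A_{c'd}\, x_{c'}^{(i)}}

% The same constraint restricts the denominator to the block of d,
% and p_d = p_c for p_c \le d < p_c + Q:
\sum_{c'=0}^{M-1} A_{c'd}\, x_{c'}^{(i)}
  = \sum_{c'=p_c}^{p_c+Q-1} A_{c'd}\, x_{c'}^{(i)}
  = \bigl[ A_{p_c}^{T} X_{p_c}^{(i)} \bigr]_{d-p_c}

% Hence the update involves only A_{p_c}, Y_{p_c}, and X_{p_c}^{(i)}:
x_c^{(i+1)} = \sum_{d=p_c}^{p_c+Q-1} y_d \,
  \frac{A_{cd}\, x_c^{(i)}}{\bigl[ A_{p_c}^{T} X_{p_c}^{(i)} \bigr]_{d-p_c}}
```

Under these assumptions each component of X^{(i+1)} depends only on quantities from its own block, which is the property the per-block processing relies on.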

It also follows that, when all elements of Y_p are 0 as in step S143 described above (see FIG. 5), the processing of steps S145 to S148 can be omitted. The result of the method of the present invention is thus the same as that of the conventional method.
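The equivalence can be illustrated numerically by running the same number of update steps on the full matrix and on each block. The sketch assumes the standard iterative Bayes update (the patent's own equation is not reproduced in this text) and a fixed step count in place of the convergence test; the function names are hypothetical.

```python
def bayes_step(a, y, x):
    """One assumed iterative Bayes update; a[c][d] = P(c becomes d)."""
    n = len(y)
    denom = [sum(a[c][d] * x[c] for c in range(n)) for d in range(n)]
    return [
        sum(y[d] * a[c][d] * x[c] / denom[d]
            for d in range(n) if denom[d] > 0)
        for c in range(n)
    ]


def run_full(a, y, steps):
    """Conventional style: update all M components together."""
    x = [float(v) for v in y]
    for _ in range(steps):
        x = bayes_step(a, y, x)
    return x


def run_blockwise(a, y, q, steps):
    """Per-block style: update each Q-dimensional block independently."""
    x = []
    for p in range(0, len(y), q):
        a_p = [row[p:p + q] for row in a[p:p + q]]
        x.extend(run_full(a_p, y[p:p + q], steps))
    return x
```

For a block-diagonal matrix the two functions return the same vector, while `run_blockwise` performs on the order of S · Q² multiplications per step instead of M².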

[Program, recording medium]
The various processes described above need not be executed only in time series in the order described; they may also be executed in parallel or individually, depending on the processing capability of the device that executes them or as needed. It goes without saying that other modifications can be made as appropriate without departing from the spirit of the present invention.

When the configuration described above is implemented by a computer, the processing of the functions that each device should have is described by a program. Executing this program on the computer implements the above processing functions on the computer.

The program describing this processing can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers via a network.

A computer that executes such a program first stores, for example, the program recorded on a portable recording medium or the program transferred from the server computer in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes processing according to the program it has read. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may execute processing according to the received program each time the program is transferred to it from the server computer. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to the computer and the processing functions are implemented only through execution instructions and retrieval of results. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

In this embodiment, the apparatus is configured by executing a predetermined program on a computer; however, at least part of the processing may instead be implemented in hardware.

The present invention can be used for the confidential management of data, such as privacy protection.

100 Data reconstruction device
110 Range calculation unit
120 Matrix generation unit
130 Vector generation unit
140 Iterative Bayes unit
141 Initial setting means
142 Repetition means
143 Y_p confirmation means
145 Component initialization means
146 Component calculation means
147 Component repetition control means
149 Output means
190 Recording unit
200 Data disturbance device
900 Network

Claims

A data disturbance / reconstruction system comprising a data disturbance device and a data reconstruction device, wherein
K is an integer of 2 or more indicating the number of attributes, N is an integer of 2 or more indicating the number of records, M is the number of combinations of all attribute values, and attribute numbers are assigned so that the number of every disturbed attribute is smaller than the numbers of all retained attributes,
the data disturbance device
generates a cross tabulation Y = (y_0, y_1, ..., y_{M−1}) of a disturbance table obtained by disturbing an initial table consisting of N records having K attributes, using a transition probability matrix A under which some attributes have their attribute values disturbed and the other attributes have their attribute values retained, and
the data reconstruction device comprises:
a range calculation unit that sets the number of combinations of attribute values of the disturbed attributes as the range Q of the entire quasi-identifier;
a matrix generation unit that, for each combination p of attribute values of the retained attributes, generates a Q × Q partial transition probability matrix A_p from the components of the transition probability matrix A, using the transition probabilities of the disturbed attributes;
a vector generation unit that, for each combination p of attribute values of the retained attributes, generates a Q-dimensional vector Y_p from the components of the cross tabulation Y, using the cross tabulation values for which the combination of attribute values of the retained attributes is p; and
an iterative Bayes unit that, for each combination p of attribute values of the retained attributes, calculates, by the iterative Bayes method, a Q-dimensional vector X_p indicating the reconstructed cross tabulation for the disturbed attributes from the partial transition probability matrix A_p and the vector Y_p, and reconstructs the cross tabulation X using all the vectors X_p.
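The disturbance side of this claim can be illustrated with a minimal sketch; it is not the patented implementation. It assumes a single disturbed attribute with Q values and a single retained attribute with S values, a row-stochastic Q × Q matrix `A_dist` whose entry `[u][v]` is the probability that true value u is recorded as v, and the cell index s × Q + q, which matches the convention that disturbed attributes carry the smaller attribute numbers. The function names `retention_replacement_matrix` and `perturb_and_tabulate` are hypothetical, and retention-replacement is used only as one possible example of a transition matrix.

```python
import random

def retention_replacement_matrix(Q, rho):
    """Assumed example matrix: keep the value with probability rho,
    otherwise replace it with a uniformly random value out of Q."""
    return [[rho * (u == v) + (1.0 - rho) / Q for v in range(Q)]
            for u in range(Q)]

def perturb_and_tabulate(records, A_dist, Q, S, rng):
    """records: iterable of (q, s) pairs; q is the disturbed attribute value,
    s the retained one. Returns the cross tabulation Y of length M = S * Q."""
    Y = [0] * (S * Q)
    for q, s in records:
        # Draw the disturbed value from row q of the transition matrix.
        q_new = rng.choices(range(Q), weights=A_dist[q])[0]
        # The retained value s is left unchanged.
        Y[s * Q + q_new] += 1
    return Y

# Toy data (made up for illustration): 80 records, Q = 2, S = 2.
records = [(0, 0)] * 50 + [(1, 1)] * 30
A_dist = retention_replacement_matrix(2, 0.5)
Y = perturb_and_tabulate(records, A_dist, Q=2, S=2, rng=random.Random(0))
```

Because the retained attribute is never changed, the counts within each block of Q consecutive cells of Y still add up to the number of records that have that retained value; only their distribution inside the block is disturbed.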

A data reconstruction device, wherein
K is an integer of 2 or more indicating the number of attributes, N is an integer of 2 or more indicating the number of records, M is the number of combinations of all attribute values, and
Y = (y_0, y_1, ..., y_{M−1}) is a cross tabulation of a disturbance table obtained by disturbing an initial table consisting of N records having K attributes, using a transition probability matrix A under which some attributes have their attribute values disturbed and the other attributes have their attribute values retained,
the data reconstruction device comprising:
a range calculation unit that sets the number of combinations of attribute values of the disturbed attributes as the range Q of the entire quasi-identifier;
a matrix generation unit that, for each combination p of attribute values of the retained attributes, generates a Q × Q partial transition probability matrix A_p from the components of the transition probability matrix A, using the transition probabilities of the disturbed attributes;
a vector generation unit that, for each combination p of attribute values of the retained attributes, generates a Q-dimensional vector Y_p from the components of the cross tabulation Y, using the cross tabulation values for which the combination of attribute values of the retained attributes is p; and
an iterative Bayes unit that, for each combination p of attribute values of the retained attributes, calculates, by the iterative Bayes method, a Q-dimensional vector X_p indicating the reconstructed cross tabulation for the disturbed attributes from the partial transition probability matrix A_p and the vector Y_p, and reconstructs the cross tabulation X using all the vectors X_p.

A data reconstruction device, wherein
K is an integer of 2 or more indicating the number of attributes, N is an integer of 2 or more indicating the number of records, M is the number of combinations of all attribute values, ‖ is a symbol that concatenates vectors,
Y = (y_0, y_1, ..., y_{M−1}) is a cross tabulation of a disturbance table obtained by disturbing an initial table consisting of N records having K attributes, using a transition probability matrix A under which some attributes have their attribute values disturbed and the other attributes have their attribute values retained, and
attribute numbers are assigned so that the number of every disturbed attribute is smaller than the numbers of all retained attributes,
the data reconstruction device comprising:
a range calculation unit that sets the number of combinations of attribute values of the disturbed attributes as the range Q of the entire quasi-identifier, and sets the number of combinations of attribute values of the retained attributes as the range S of the retained attributes;
a matrix generation unit that, for each of p = 0, Q, 2 × Q, ..., (S−1) × Q, generates a Q × Q partial transition probability matrix A_p whose component in the i-th row and j-th column is the component in the (p + i)-th row and (p + j)-th column of the transition probability matrix A;
a vector generation unit that, for each of p = 0, Q, 2 × Q, ..., (S−1) × Q, generates a vector Y_p = (y_p, y_{p+1}, ..., y_{p+Q−1}); and
an iterative Bayes unit that, for each of p = 0, Q, 2 × Q, ..., (S−1) × Q, obtains a vector X_p = (x_p, x_{p+1}, ..., x_{p+Q−1}) by the iterative Bayes method from the partial transition probability matrix A_p and the vector Y_p, and sets the reconstructed cross tabulation X to X_0 ‖ X_1 ‖ ... ‖ X_{(S−1)×Q}.
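The index arithmetic in this claim (diagonal Q × Q blocks of A and consecutive length-Q slices of Y, for p = 0, Q, ..., (S−1) × Q) can be sketched as follows. This is a hedged illustration with 0-based indices; the helper names `split_blocks` and `concatenate` and the toy numbers are assumptions, not taken from the patent.

```python
def split_blocks(A, Y, Q, S):
    """Return a list of (p, A_p, Y_p) for p = 0, Q, 2Q, ..., (S-1)Q."""
    blocks = []
    for p in range(0, S * Q, Q):
        # A_p[i][j] is the component of A at row p + i, column p + j.
        A_p = [[A[p + i][p + j] for j in range(Q)] for i in range(Q)]
        # Y_p = (y_p, y_{p+1}, ..., y_{p+Q-1}).
        Y_p = list(Y[p:p + Q])
        blocks.append((p, A_p, Y_p))
    return blocks

def concatenate(vectors):
    """Join per-block vectors in order of p (the role of the joining symbol)."""
    return [x for vec in vectors for x in vec]

# Toy 4 x 4 matrix with Q = 2, S = 2 (values are made up for illustration).
A = [[0.75, 0.25, 0.0, 0.0],
     [0.25, 0.75, 0.0, 0.0],
     [0.0, 0.0, 0.6, 0.4],
     [0.0, 0.0, 0.4, 0.6]]
Y = [65, 35, 10, 20]
blocks = split_blocks(A, Y, Q=2, S=2)
```

Each block can then be reconstructed independently, so the iterative Bayes method works on S problems of size Q instead of one problem of size M = S × Q.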

The data reconstruction device according to claim 2 or 3, wherein
T is a symbol indicating the transpose of a matrix or vector, and
the iterative Bayes unit comprises:
component initialization means for setting the vector Y_p as the vector X_p^(0) and setting i = 0;
component calculation means for obtaining X_p^(i+1) by the following equation (the equation itself is missing from this text); and
component repetition control means for, when the sum of the absolute values of the differences of all elements between the vector X_p^(i+1) and the vector X_p^(i) is outside a predetermined range, substituting i + 1 for i and repeating the processing of the component calculation means, and, when the sum is within the range, setting the vector X_p^(i+1) as the vector X_p.
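The loop structure of this claim (initialize with Y_p, update, stop when the summed absolute change falls within a threshold) can be sketched as below. Since the claim's update equation is not available in this text, the sketch substitutes the commonly described iterative Bayes (EM-style) update; that choice, the row-stochastic convention `A_p[u][v]` = probability that true value u is observed as v, the function name `iterative_bayes`, and the parameters `tol` and `max_iter` are all assumptions.

```python
def iterative_bayes(A_p, Y_p, tol=1e-9, max_iter=10000):
    """Estimate X_p from observed counts Y_p and transition matrix A_p.

    Assumed convention: A_p[u][v] = P(observed v | true u).
    """
    Q = len(Y_p)
    x = [float(v) for v in Y_p]          # X_p^(0) = Y_p, i = 0
    for _ in range(max_iter):
        # Counts the current estimate predicts for each observed value v.
        pred = [sum(x[u] * A_p[u][v] for u in range(Q)) for v in range(Q)]
        # Assumed update: redistribute each observed count by Bayes' rule.
        new = [x[u] * sum(Y_p[v] * A_p[u][v] / pred[v]
                          for v in range(Q) if pred[v] > 0)
               for u in range(Q)]
        # Stop when the summed absolute change is within the threshold.
        if sum(abs(a - b) for a, b in zip(new, x)) <= tol:
            return new
        x = new                           # i <- i + 1 and repeat
    return x

A_p = [[0.75, 0.25], [0.25, 0.75]]
Y_p = [65, 35]
X_p = iterative_bayes(A_p, Y_p)
```

For this toy input the estimate settles near (80, 20), which is consistent with the observation because 80 × 0.75 + 20 × 0.25 = 65 and 80 × 0.25 + 20 × 0.75 = 35. The `max_iter` cap is a safety measure added in this sketch and is not part of the claim.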

A data reconstruction method using a data reconstruction device including a range calculation unit, a matrix generation unit, a vector generation unit, and an iterative Bayes unit, wherein
K is an integer of 2 or more indicating the number of attributes, N is an integer of 2 or more indicating the number of records, M is the number of combinations of all attribute values, and
Y = (y_0, y_1, ..., y_{M−1}) is a cross tabulation of a disturbance table obtained by disturbing an initial table consisting of N records having K attributes, using a transition probability matrix A under which some attributes have their attribute values disturbed and the other attributes have their attribute values retained,
the method comprising:
a range calculation step in which the range calculation unit sets the number of combinations of attribute values of the disturbed attributes as the range Q of the entire quasi-identifier;
a matrix generation step in which the matrix generation unit, for each combination p of attribute values of the retained attributes, generates a Q × Q partial transition probability matrix A_p from the components of the transition probability matrix A, using the transition probabilities of the disturbed attributes;
a vector generation step in which the vector generation unit, for each combination p of attribute values of the retained attributes, generates a Q-dimensional vector Y_p from the components of the cross tabulation Y, using the cross tabulation values for which the combination of attribute values of the retained attributes is p; and
an iterative Bayes step in which the iterative Bayes unit, for each combination p of attribute values of the retained attributes, calculates, by the iterative Bayes method, a Q-dimensional vector X_p indicating the reconstructed cross tabulation for the disturbed attributes from the partial transition probability matrix A_p and the vector Y_p, and reconstructs the cross tabulation X using all the vectors X_p.

A data reconstruction method using a data reconstruction device including a range calculation unit, a matrix generation unit, a vector generation unit, and an iterative Bayes unit, wherein
K is an integer of 2 or more indicating the number of attributes, N is an integer of 2 or more indicating the number of records, M is the number of combinations of all attribute values, ‖ is a symbol that concatenates vectors,
Y = (y_0, y_1, ..., y_{M−1}) is a cross tabulation of a disturbance table obtained by disturbing an initial table consisting of N records having K attributes, using a transition probability matrix A under which some attributes have their attribute values disturbed and the other attributes have their attribute values retained, and
attribute numbers are assigned so that the number of every disturbed attribute is smaller than the numbers of all retained attributes,
the method comprising:
a range calculation step in which the range calculation unit sets the number of combinations of attribute values of the disturbed attributes as the range Q of the entire quasi-identifier, and sets the number of combinations of attribute values of the retained attributes as the range S of the retained attributes;
a matrix generation step in which the matrix generation unit, for each of p = 0, Q, 2 × Q, ..., (S−1) × Q, generates a Q × Q partial transition probability matrix A_p whose component in the i-th row and j-th column is the component in the (p + i)-th row and (p + j)-th column of the transition probability matrix A;
a vector generation step in which the vector generation unit, for each of p = 0, Q, 2 × Q, ..., (S−1) × Q, generates a vector Y_p = (y_p, y_{p+1}, ..., y_{p+Q−1}); and
an iterative Bayes step in which the iterative Bayes unit, for each of p = 0, Q, 2 × Q, ..., (S−1) × Q, obtains a vector X_p = (x_p, x_{p+1}, ..., x_{p+Q−1}) by the iterative Bayes method from the partial transition probability matrix A_p and the vector Y_p, and sets the reconstructed cross tabulation X to X_0 ‖ X_1 ‖ ... ‖ X_{(S−1)×Q}.

The data reconstruction method according to claim 5 or 6, wherein
T is a symbol indicating the transpose of a matrix or vector, and
the iterative Bayes step includes:
a component initialization substep of setting the vector Y_p as the vector X_p^(0) and setting i = 0;
a component calculation substep of obtaining X_p^(i+1) by the following equation (the equation itself is missing from this text); and
a component repetition control substep of, when the sum of the absolute values of the differences of all elements between the vector X_p^(i+1) and the vector X_p^(i) is outside a predetermined range, substituting i + 1 for i and repeating the component calculation substep, and, when the sum is within the range, setting the vector X_p^(i+1) as the vector X_p.

A data reconstruction program for causing a computer to operate as the data reconstruction device according to claim 2.