JP6978385B2

JP6978385B2 - Anonymization device, anonymization method and anonymization program

Info

Publication number: JP6978385B2
Application number: JP2018140085A
Authority: JP
Inventors: 知明三本; 清良披田野; 晋作清本
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2018-07-26
Filing date: 2018-07-26
Publication date: 2021-12-08
Anticipated expiration: 2038-07-26
Also published as: JP2020017101A

Description

The present invention relates to an anonymization device, an anonymization method, and an anonymization program for anonymizing a data set.

Conventionally, when a data set containing user information is put to use, the data is anonymized to protect individual privacy.
Proposed anonymization methods include, for example, k-anonymization in Non-Patent Documents 1 to 3 and the addition of noise to the data in Non-Patent Documents 4 to 6.

Non-Patent Document 1: P. Samarati and L. Sweeney, "Generalizing data to provide anonymity when disclosing information," in Proc. of PODS 1998, 1998, p. 188.
Non-Patent Document 2: P. Samarati, "Protecting respondents' identities in microdata release," IEEE Trans. on Knowledge and Data Engineering, vol. 13, no. 6, pp. 1010-1027, 2001.
Non-Patent Document 3: L. Sweeney, "Achieving k-anonymity privacy protection using generalization and suppression," in J. Uncertainty, Fuzziness, and Knowledge-Base Systems, vol. 10(5), 2002, pp. 571-588.
Non-Patent Document 4: K. Mivule, "Utilizing noise addition for data privacy, an overview," arXiv preprint arXiv:1309.3958, 2013.
Non-Patent Document 5: J. J. Kim, "A method for limiting disclosure in microdata based on random noise and transformation," in Proceedings of the section on survey research methods. American Statistical Association, 1986, pp. 303-308.
Non-Patent Document 6: T. Yu and S. Jajodia, "Secure data management in decentralized systems," Springer Science & Business Media, 2007, vol. 33.

Data anonymization aims to reconcile two conflicting goals: protecting individual privacy and maintaining the usefulness of the data. With conventional anonymization methods, however, improving the anonymity of the data could greatly reduce its usefulness.

An object of the present invention is to provide an anonymization device, an anonymization method, and an anonymization program that can anonymize a data set while maintaining its usefulness.

The anonymization device according to the present invention includes: an input unit that accepts input of a tensor describing a data set of user information; a tensor decomposition unit that decomposes the tensor into a product of a plurality of factors by a predetermined tensor decomposition algorithm; an anonymization calculation unit that performs an anonymization operation on a specific factor, among the plurality of factors, that indicates the characteristics of each user; and an output unit that replaces the specific factor with the anonymized factor, and calculates and outputs an anonymized tensor approximating the tensor.

The anonymization calculation unit may perform a k-anonymization operation on the specific factor.

The anonymization calculation unit may perform an operation that adds noise to the specific factor.

The anonymization device may include a setting unit that accepts and sets, as parameters, the rank used in the tensor decomposition and the strength of the anonymization.

In the anonymization method according to the present invention, a computer executes: an input step of accepting input of a tensor describing a data set of user information; a tensor decomposition step of decomposing the tensor into a product of a plurality of factors by a predetermined tensor decomposition algorithm; an anonymization calculation step of performing an anonymization operation on a specific factor, among the plurality of factors, that indicates the characteristics of each user; and an output step of replacing the specific factor with the anonymized factor, and calculating and outputting an anonymized tensor approximating the tensor.

The anonymization program according to the present invention causes a computer to execute: an input step of accepting input of a tensor describing a data set of user information; a tensor decomposition step of decomposing the tensor into a product of a plurality of factors by a predetermined tensor decomposition algorithm; an anonymization calculation step of performing an anonymization operation on a specific factor, among the plurality of factors, that indicates the characteristics of each user; and an output step of replacing the specific factor with the anonymized factor, and calculating and outputting an anonymized tensor approximating the tensor.

According to the present invention, a data set can be anonymized while its usefulness is maintained.

FIG. 1 is a block diagram showing the functional configuration of the anonymization device according to the embodiment.
FIG. 2 is a diagram illustrating the algorithm of the anonymization method according to the embodiment.
FIG. 3 is a diagram showing the results of an experiment comparing the anonymization method according to the embodiment with a conventional method.

An example of an embodiment of the present invention is described below.
FIG. 1 is a block diagram showing the functional configuration of the anonymization device 1 according to the present embodiment.
The anonymization device 1 is an information processing device (computer) such as a server device or a personal computer, and includes a control unit 10, a storage unit 20, and various input/output devices.

The control unit 10 controls the anonymization device 1 as a whole, and realizes the functions of the present embodiment by reading and executing, as appropriate, the various programs stored in the storage unit 20. The control unit 10 may be a CPU.

The storage unit 20 is a storage area for the various programs and data that make the hardware group function as the anonymization device 1, and may be a ROM, a RAM, a flash memory, a hard disk drive (HDD), or the like. Specifically, the storage unit 20 stores the anonymization program that causes the control unit 10 to execute the functions of the present embodiment, as well as the data set to be processed, various parameters, and the like.

The control unit 10 includes an input unit 11, a tensor decomposition unit 12, an anonymization calculation unit 13, an output unit 14, and a setting unit 15. These functional units are realized by the control unit 10 executing the anonymization program stored in the storage unit 20.

The input unit 11 accepts input of a tensor describing a data set that contains users' personal information.
In the present embodiment, the input tensor is described, as an example, as a matrix, which is a second-order tensor, but the input is not limited to this.

The data set to be anonymized is, for example, users' movement history, purchase history, or access history over a certain period. Such a data set is described as an n-row, m-column matrix in which each element is 1 or 0 according to whether or not each of n users has a record for each of m locations, products, sites, or the like.
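As a rough illustration of this encoding, the following sketch builds such a 0/1 matrix from a list of (user, item) records. The function name `build_history_matrix` and the sample purchase log are hypothetical and do not appear in the source.

```python
import numpy as np

def build_history_matrix(records, users, items):
    """Return an n x m matrix whose (i, j) entry is 1 if user i has a record for item j, else 0."""
    user_index = {u: i for i, u in enumerate(users)}
    item_index = {x: j for j, x in enumerate(items)}
    M = np.zeros((len(users), len(items)), dtype=float)
    for user, item in records:
        # Repeated records for the same pair still yield a single 1.
        M[user_index[user], item_index[item]] = 1.0
    return M

# Hypothetical purchase log: (user, product) pairs.
records = [("u1", "tea"), ("u1", "rice"), ("u2", "rice"), ("u3", "soap"), ("u1", "tea")]
M = build_history_matrix(records, users=["u1", "u2", "u3"], items=["tea", "rice", "soap"])
```

Each row of `M` corresponds to one user and each column to one product, matching the n-row, m-column layout described above.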

The tensor decomposition unit 12 decomposes the input tensor into a product of a plurality of factors by a predetermined tensor decomposition algorithm.
Known algorithms can be used for the tensor decomposition. For example, for a second-order tensor, a matrix factorization algorithm such as SVD (Singular Value Decomposition) or NMF (Non-negative Matrix Factorization) can be used.

As a result, the matrix $M \in \mathbb{R}^{n \times m}$ is decomposed into two matrices, $U \in \mathbb{R}^{n \times r}$ and $V \in \mathbb{R}^{r \times m}$. U is a matrix indicating the characteristics of the rows, that is, the characteristics of each user. V, on the other hand, indicates the characteristics of the columns, that is, the characteristics of the history data and the like, and does not contain user-specific information.
Here, the smaller the rank r, which is a parameter of the matrix factorization, the fewer the features, and the lower the accuracy with which the product UV of the decomposed matrices U and V approximates the original matrix M.
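The effect of the rank on approximation accuracy can be checked numerically. The sketch below uses a truncated SVD, one of the algorithms named above, and folds the singular values into U; that split, the function name `low_rank_factors`, and the random test matrix are assumptions made for illustration.

```python
import numpy as np

def low_rank_factors(M, r):
    """Split M (n x m) into U (n x r) and V (r x m) using a truncated SVD."""
    P, s, Qt = np.linalg.svd(M, full_matrices=False)
    U = P[:, :r] * s[:r]   # assumption: singular values are folded into the user-side factor
    V = Qt[:r, :]
    return U, V

rng = np.random.default_rng(0)
M = (rng.random((20, 30)) < 0.3).astype(float)   # hypothetical 0/1 history matrix

# Frobenius-norm error of the product UV against M for several ranks.
errors = {}
for r in (2, 5, 10, 20):
    U, V = low_rank_factors(M, r)
    errors[r] = np.linalg.norm(M - U @ V)
```

The error does not increase as r grows, and at r = 20 (the smaller dimension of this example matrix) the product reproduces M up to rounding.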

The anonymization calculation unit 13 performs an anonymization operation on a specific factor, among the decomposed factors, that indicates the characteristics of each user.
For example, when the matrix M to be anonymized is decomposed into U and V, the anonymization calculation unit 13 performs the anonymization operation only on the matrix U, which indicates the characteristics of each user.
The anonymization operation may be, for example, k-anonymization or an operation that adds noise.
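The source does not specify how k-anonymization is applied to a real-valued factor such as U. One possible reading, shown here purely as an assumption, is microaggregation: rows are grouped into sets of at least k and each row is replaced by its group mean, so every output row is shared by at least k users. The function name `microaggregate_rows` and the crude grouping rule (sorting by the first feature) are hypothetical.

```python
import numpy as np

def microaggregate_rows(U, k):
    """Replace each row of U with the mean of a group containing at least k rows."""
    n = U.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of rows")
    # Assumption: group users by sorting on the first feature; a real system
    # would likely use a distance-based clustering instead.
    order = np.argsort(U[:, 0], kind="stable")
    out = np.empty_like(U, dtype=float)
    n_groups = n // k
    for g in range(n_groups):
        start = g * k
        stop = n if g == n_groups - 1 else start + k   # last group absorbs the remainder
        idx = order[start:stop]
        out[idx] = U[idx].mean(axis=0)
    return out

rng = np.random.default_rng(0)
U = rng.random((10, 3))             # hypothetical user factor with n = 10, r = 3
U_anon = microaggregate_rows(U, k=3)
```

Because each group is replaced by its own mean, the column means of `U_anon` equal those of `U`, while no row is unique to a single user.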

The output unit 14 replaces the specific factor with the anonymized factor, and calculates and outputs an anonymized tensor approximating the original tensor.
For example, where A() denotes the anonymization function applied to a matrix, the output unit 14 calculates A(U)V, an approximation of the original matrix M, and thereby outputs a matrix in which only the users' characteristics are anonymized.

The setting unit 15 accepts and sets, as parameters, the rank used in the tensor decomposition and the strength of the anonymization.
The strength of anonymization is, for example, the value of k in k-anonymization or the magnitude of the noise, such as the value of the variance $2\phi^2$ of a Laplace distribution, which is one example of noise.
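A minimal sketch of Laplace noise addition follows, assuming that φ is the scale parameter of the Laplace distribution, which is consistent with the stated variance of 2φ². The function name `add_laplace_noise` and the matrix size are hypothetical.

```python
import numpy as np

def add_laplace_noise(U, phi, rng):
    """Add zero-mean Laplace noise with scale phi (variance 2 * phi**2) to every entry of U."""
    return U + rng.laplace(loc=0.0, scale=phi, size=U.shape)

rng = np.random.default_rng(0)
U = np.zeros((200, 500))                       # all-zero input, so the output is pure noise
noisy = add_laplace_noise(U, phi=1.5, rng=rng)
empirical_variance = noisy.var()               # should lie near 2 * 1.5**2 = 4.5
```

Increasing `phi` strengthens the anonymization at the cost of a noisier factor.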

FIG. 2 is a diagram illustrating the algorithm of the anonymization method according to the present embodiment.
This example incorporates the anonymization operation into the NMF algorithm.
First, for the original matrix M to be anonymized, the rank r and the number of iterations I of the alternating optimization are given.

In steps 1 and 2, the tensor decomposition unit 12 sets $t = 0$ and randomly generates $U_t \in \mathbb{R}^{n \times r}$ and $V_t \in \mathbb{R}^{r \times m}$ as the initial values of the matrices U and V into which the matrix M is decomposed.

In steps 3 to 7, the tensor decomposition unit 12 performs alternating optimization, repeatedly calculating

$$U_{t+1} = U_t \cdot (M V_t^{T}) / (U_t V_t V_t^{T})$$

and

$$V_{t+1} = V_t \cdot (U_{t+1}^{T} M) / (U_{t+1}^{T} U_{t+1} V_t)$$

and incrementing t, to obtain $U_I$ and $V_I$.

In steps 8 and 9, the anonymization calculation unit 13 performs the anonymization operation on $U_I$; that is, with $t = I$, it calculates $U'_{t+1} = A_{(\mathrm{ano})}(U_t)$ and $V'_{t+1} = V_t \cdot ({U'_{t+1}}^{T} M) / ({U'_{t+1}}^{T} U'_{t+1} V_t)$.

This algorithm yields the anonymized matrix $U'_{t+1}$ and the matrix $V'_{t+1}$ for approximating the original matrix M, so the output unit 14 outputs $U'_{t+1} V'_{t+1}$ as the result of anonymizing the matrix M.
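The steps of FIG. 2 can be sketched as follows, reading the product and quotient in the update rules as element-wise operations, as in the standard NMF multiplicative updates. The small constant `EPS`, the function names, and the clipping of the noisy factor at zero are assumptions added so that the sketch stays numerically well defined; they are not part of the source.

```python
import numpy as np

EPS = 1e-12  # assumption: guards against division by zero

def anonymize_with_nmf(M, r, iterations, anonymize, rng):
    """Return (U' V', U', V') following steps 1 to 9 of the algorithm."""
    n, m = M.shape
    # Steps 1-2: random non-negative initial factors.
    U = rng.random((n, r))
    V = rng.random((r, m))
    # Steps 3-7: alternating multiplicative updates, repeated I times.
    for _ in range(iterations):
        U = U * (M @ V.T) / (U @ V @ V.T + EPS)
        V = V * (U.T @ M) / (U.T @ U @ V + EPS)
    # Step 8: anonymize only the user-side factor.
    U_anon = anonymize(U)
    # Step 9: update V once more against the anonymized factor.
    V_anon = V * (U_anon.T @ M) / (U_anon.T @ U_anon @ V + EPS)
    return U_anon @ V_anon, U_anon, V_anon

rng = np.random.default_rng(0)
M = (rng.random((30, 40)) < 0.3).astype(float)   # hypothetical 0/1 history matrix

def noisy_nonneg(U, phi=0.1):
    # Assumption: clip at zero so the step-9 update keeps non-negative factors.
    return np.clip(U + rng.laplace(scale=phi, size=U.shape), 0.0, None)

M_anon, U_anon, V_anon = anonymize_with_nmf(
    M, r=5, iterations=200, anonymize=noisy_nonneg, rng=rng
)
```

Passing the anonymization as a function keeps the factorization independent of the chosen operation, so noise addition can be swapped for another operation without touching the update loop.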

FIG. 3 is a diagram showing the results of an experiment comparing the anonymization method according to the present embodiment with a conventional method.
For the case where noise addition is adopted as the anonymization operation, the figure compares the conventional method, which anonymizes the entire original matrix M, with the method of the present embodiment, which anonymizes only the user matrix U while adjusting the noise magnitude and the rank.

The anonymization target used in this experiment is a matrix M of size n × m = 200 × 1000. The usefulness Utility(D) was calculated for four patterns: noise of φ = 1.5 added to the matrix M; noise of φ = 1.5 added to the matrix U of rank r = 40; noise of φ = 2.5 added to the matrix U of rank r = 80; and noise of φ = 3.5 added to the matrix U of rank r = 120.
The data sets of these four patterns have an equivalent probability (about 0.63) that the same user can be matched before and after anonymization, that is, re-identified.

Usefulness was evaluated using the F value, which is the prediction accuracy of machine learning trained on the data set, and the ratio of this F value to the F value obtained with the original matrix M before anonymization was taken as the evaluation value Utility(D).
Compared with the conventional method, the method of the present embodiment shows high usefulness, particularly when noise of φ = 1.5 is added to the matrix U of rank r = 40.
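The evaluation value can be sketched as a ratio of two F values. The source does not state which learning model or which F measure was used; the sketch below assumes the binary F1 score, and the label vectors are invented solely to exercise the arithmetic.

```python
def f_value(y_true, y_pred):
    """F1 score for binary labels, where 1 is the positive class (assumed F measure)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def utility(f_anonymized, f_original):
    """Utility(D): F value with anonymized training data relative to the original data."""
    return f_anonymized / f_original

# Hypothetical test labels and predictions from models trained on each data set.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
pred_original = [1, 1, 1, 0, 0, 1, 0, 0]     # model trained on the original matrix M
pred_anonymized = [1, 1, 0, 0, 1, 1, 0, 0]   # model trained on the anonymized matrix

score = utility(f_value(y_true, pred_anonymized), f_value(y_true, pred_original))
```

With these invented labels the F values are 2/3 and 8/9, so `score` is 0.75; a value of 1 would mean the anonymized data set supports prediction as well as the original.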

Because reducing the rank lowers the accuracy with which the original matrix is approximated, the tensor decomposition itself has an anonymizing effect. By combining it with an anonymization operation such as noise addition and selecting these parameters appropriately, a highly useful anonymized data set with the same level of security was obtained.

According to the present embodiment, the anonymization device 1 applies tensor decomposition to a data set described as a tensor such as a matrix, and thereby extracts a specific factor (a tensor such as a matrix) indicating the characteristics of each user. By performing the anonymization operation only on this specific factor, the anonymization device 1 can anonymize only the users' characteristics while maintaining the correlations among the other information (features), such as the history data contained in the user information.
Therefore, the anonymization device 1 can anonymize a data set while maintaining its usefulness.

The anonymization device 1 can apply known methods as the anonymization operation; for example, methods such as k-anonymization or noise addition can easily be used in combination with it.
In addition, by accepting the rank used in the tensor decomposition and the strength of anonymization as parameters, the anonymization device 1 can perform the calculation with settings suited to the data set to be anonymized, and can achieve both anonymity and usefulness of the data to a high degree.

Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment. The effects described for the embodiment merely list the most suitable effects arising from the present invention, and the effects of the present invention are not limited to those described for the embodiment.

The anonymization method performed by the anonymization device 1 is implemented in software. When it is implemented in software, the programs constituting the software are installed on an information processing device (computer). These programs may be recorded on removable media such as a CD-ROM and distributed to users, or may be distributed by being downloaded to users' computers via a network. Furthermore, these programs may be provided to users' computers as a Web service via a network without being downloaded.

1 Anonymization device
10 Control unit
11 Input unit
12 Tensor decomposition unit
13 Anonymization calculation unit
14 Output unit
15 Setting unit
20 Storage unit

Claims

An anonymization device comprising:
an input unit that accepts input of a tensor describing a data set of user information;
a tensor decomposition unit that decomposes the tensor into a product of a plurality of factors by a predetermined tensor decomposition algorithm;
an anonymization calculation unit that anonymizes a specific factor, among the plurality of factors, that indicates the characteristics of each user, and converts the other factors, excluding the specific factor, so that their product with the anonymized factor approximates the tensor; and
an output unit that calculates and outputs an anonymized tensor approximating the tensor as the product of the anonymized factor and the converted other factors.

The anonymization device according to claim 1, wherein the anonymization calculation unit performs a k-anonymization operation on the specific factor.

The anonymization device according to claim 1, wherein the anonymization calculation unit performs an operation that adds noise to the specific factor.

The anonymization device according to any one of claims 1 to 3, further comprising a setting unit that accepts and sets the rank in the tensor decomposition and the strength of the anonymization as parameters.

An anonymization method in which a computer executes:
an input step of accepting input of a tensor describing a data set of user information;
a tensor decomposition step of decomposing the tensor into a product of a plurality of factors by a predetermined tensor decomposition algorithm;
an anonymization calculation step of anonymizing a specific factor, among the plurality of factors, that indicates the characteristics of each user, and converting the other factors, excluding the specific factor, so that their product with the anonymized factor approximates the tensor; and
an output step of calculating and outputting an anonymized tensor approximating the tensor as the product of the anonymized factor and the converted other factors.

An anonymization program for causing a computer to execute:
an input step of accepting input of a tensor describing a data set of user information;
a tensor decomposition step of decomposing the tensor into a product of a plurality of factors by a predetermined tensor decomposition algorithm;
an anonymization calculation step of anonymizing a specific factor, among the plurality of factors, that indicates the characteristics of each user, and converting the other factors, excluding the specific factor, so that their product with the anonymized factor approximates the tensor; and
an output step of calculating and outputting an anonymized tensor approximating the tensor as the product of the anonymized factor and the converted other factors.