JP2015176496A - Risk analysis apparatus, method and program for anonymized data

Info

Publication number: JP2015176496A
Application number: JP2014054142A
Authority: JP
Inventors: Anirban Basu; Shinsaku Kiyomoto; Tadashi Yanagihara; Toshiro Hikita; Yusuke Tanaka
Original assignee: KDDI Corp; Toyota InfoTechnology Center Co Ltd
Current assignee: KDDI Corp; Toyota InfoTechnology Center Co Ltd
Priority date: 2014-03-17
Filing date: 2014-03-17
Publication date: 2015-10-05
Anticipated expiration: 2034-03-17
Also published as: JP6300588B2

Abstract

PROBLEM TO BE SOLVED: To provide a risk analysis apparatus that quantitatively analyzes the risk level at which an individual contained in anonymized data is uniquely identified, and that optimizes the parameters used for anonymization with the quantified risk level as a measure, and to provide a corresponding method and program.

SOLUTION: A risk analysis apparatus 10 quantitatively analyzes a risk for each record constituting the anonymized data, quantifies and calculates a risk level of the anonymized data according to a specific measure (e.g., the attacker's prior knowledge, or a content rate given by the F-measure of recall and precision) based on the risk analyzed for each record, and outputs the calculated risk level (which corresponds to the parameter used for anonymization).

Description

The present invention relates to a risk analysis apparatus, method, and program for anonymized data.

Data has long been put to effective use through statistical processing. For example, large volumes of data containing information such as the age groups, sexes, regions, and races that are susceptible to a specific disease are widely published, statistically processed, and used for trend analysis and preventive measures.

When such data is published, privacy must be carefully protected, so the data must be transformed so that its owner cannot be identified. Many techniques for transforming data to protect privacy have therefore been disclosed. For example, techniques have been disclosed that generalize or obscure part of the data so that individuals cannot be identified even when data is combined (for example, k-anonymization; see Non-Patent Document 1).

Non-Patent Document 2 discloses a technique that can lower the probability of individual identification across an entire data set. For movement data, which is a type of history data, Non-Patent Document 2 discloses a risk analysis method that takes the reciprocal of the number of cases at each point contained in a history as the probability of individual identification, and reduces the cases of history data whose value is close to 1/k.

Non-Patent Document 1: Latanya Sweeney, "k-anonymity: a model for protecting privacy," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, Volume 10, Issue 5, October 2002, pages 557-570.
Non-Patent Document 2: Anna Monreale, Gennady L. Andrienko, Natalia V. Andrienko, Fosca Giannotti, Dino Pedreschi, Salvatore Rinzivillo, Stefan Wrobel, "Movement Data Anonymity through Generalization," Transactions on Data Privacy 3(2): 91-121 (2010).

One example of conventional risk analysis is k-anonymization. It is a risk analysis that considers the worst case, and it is not necessarily a suitable method from the viewpoint of optimization.
Because conventional methods do not incorporate into the model the prior knowledge of an attacker who attempts to identify an individual, the estimated probability of identification may be unduly higher than the value in actual use.
In addition, the evaluation measures used by conventional methods cannot correctly evaluate techniques that add data not present in the original data (noise). Specifically, because the amount of noise is not reflected, the evaluation results cannot distinguish a large amount of noise from a small amount under the same technique (when the identification risk is the same, the data set with less noise can be said to be the better one).

The problem addressed by the present invention is to devise a method for quantitatively analyzing, under a more realistic model, the risk that an individual included in anonymized data is uniquely identified when that data is disclosed, and a method for optimizing the parameters used for anonymization with the quantified risk as a measure.

An object of the present invention is to provide a risk analysis apparatus, method, and program that quantitatively analyze the risk level at which an individual included in anonymized data is uniquely identified, and that optimize the parameters used for anonymization with the quantified risk level as a measure.

In the anonymized data, each individual making up the data is assumed to be assigned to one record. The present method quantitatively evaluates the risk level at which an individual is uniquely identified from anonymized data. It first quantitatively evaluates the identification risk of each record, then arranges the records in ascending order of the quantified risk, and quantifies the risk level according to a certain measure.

In the risk level quantification method described above, when the risk of each record is evaluated, the risk is calculated according to the amount of prior knowledge held by an attacker who attempts to identify an individual, with the worst case taken as the upper limit of the risk.

In the risk level quantification method described above, for movement trajectory data, the attacker's prior knowledge is defined as knowing part or all of a trajectory, and the number of known trajectory nodes is quantified as the amount of prior knowledge.

Alternatively, in the risk level quantification method described above, for anonymization of points, the content rate of the points included in the original data is calculated, together with a numerical value representing the content rate of data that the anonymization process introduces but that does not exist in the original data.

In the risk level quantification method described above, an anonymization parameter determination method determines the parameter automatically by comparing against the recall and content rate described above, in accordance with the number of output results calculated from the parameter used in the anonymization process.

Furthermore, a risk model is constructed using the above risk level quantification methods, and the optimal parameter is obtained by selecting the anonymization parameter that maximizes the gain in that model.

Specifically, the following solutions are provided.
(1) A risk analysis apparatus that analyzes the risk level at which an individual is uniquely identified from anonymized data including information about the individual, comprising: risk analysis means for quantitatively analyzing a risk for each record constituting the anonymized data; risk level calculation means for quantifying and calculating a risk level of the anonymized data according to a specific measure, based on the risk for each record analyzed by the risk analysis means; and risk level output means for outputting the risk level calculated by the risk level calculation means.

According to the configuration of (1), the risk analysis apparatus quantitatively analyzes a risk for each record constituting the anonymized data, quantifies and calculates the risk level of the anonymized data according to a specific measure (for example, the attacker's prior knowledge, or a content rate given by the F-measure of recall and precision) based on the risk analyzed for each record, and outputs the calculated risk level (which corresponds to the parameter used for anonymization).

Therefore, the risk analysis apparatus according to (1) can quantitatively analyze the risk level at which an individual included in anonymized data is uniquely identified, and can optimize the parameters used for anonymization with the quantified risk level as a measure.

(2) The risk level analysis apparatus according to (1), wherein the risk analysis means, when evaluating the risk of each record, calculates the risk according to the amount of prior knowledge of an attacker who attempts to identify an individual.

Therefore, the risk analysis apparatus according to (2) can quantitatively analyze the risk level at which an individual included in anonymized data is uniquely identified, taking the attacker's prior knowledge into account.

(3) The risk level analysis apparatus according to (2), wherein the risk analysis means, for movement trajectory data, treats knowing part or all of a trajectory as the attacker's prior knowledge, and quantifies the number of known trajectory nodes as the amount of prior knowledge.

Therefore, the risk analysis apparatus according to (3) can quantitatively analyze the risk level at which an individual included in anonymized data is uniquely identified, according to the amount of the attacker's prior knowledge.

(4) The risk analysis apparatus according to (1), wherein the risk level calculation means compares the risk quantified by the risk analysis means with a threshold, and calculates, as the risk level, the ratio of the number of records whose risk is at or below the threshold to the total number of records in the anonymized data.

Therefore, the risk analysis apparatus according to (4) can quantitatively analyze the risk level at which an individual included in anonymized data is uniquely identified.

(5) The risk analysis apparatus according to (4), wherein the risk level calculation means further comprises: recall calculation means for calculating a recall, which is the ratio of the number of attributes included in both the original data before anonymization and the anonymized data to the total number of attributes included in the original data; precision calculation means for calculating a precision, which is the ratio of the number of attributes included in both the original data and the anonymized data to the total number of attributes included in the anonymized data; and measure calculation means for calculating a specific measure based on the recall calculated by the recall calculation means and the precision calculated by the precision calculation means, and wherein the risk level calculation means calculates a parameter for anonymization based on the specific measure calculated by the measure calculation means.

Therefore, the risk analysis apparatus according to (5) can quantitatively analyze the risk level at which an individual included in the data is uniquely identified, even for anonymized data to which noise has been added.

(6) A method executed by the risk analysis apparatus according to (1), comprising: a risk analysis step in which the risk analysis means quantitatively analyzes a risk for each record constituting the anonymized data; a risk level calculation step in which the risk level calculation means quantifies and calculates a risk level of the anonymized data according to a specific measure, based on the risk for each record analyzed in the risk analysis step; and a risk level output step in which the risk level output means outputs the risk level calculated in the risk level calculation step.

Therefore, the method according to (6) can quantitatively analyze the risk level at which an individual included in anonymized data is uniquely identified, and can optimize the parameters used for anonymization with the quantified risk level as a measure.

(7) A program for causing a computer to execute each step of the method according to (6).

Therefore, the program according to (7) can cause a computer to quantitatively analyze the risk level at which an individual included in anonymized data is uniquely identified, and to optimize the parameters used for anonymization with the quantified risk level as a measure.

According to the present invention, the risk level at which an individual included in anonymized data is uniquely identified can be analyzed quantitatively.
Furthermore, according to the present invention, anonymization techniques that add noise can be evaluated, the risk of an individual being identified from information on points representing stays can be quantified, and the anonymization parameter for points can be determined automatically.

FIG. 1 is a diagram showing the configuration of the risk analysis apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing an example in which points are grouped by identical risk value in the risk analysis apparatus according to Embodiment 1 of the present invention.
FIG. 3 is a diagram showing an example of the ratio of the number of records whose risk value is at or below a threshold to the total number of records in the risk analysis apparatus according to Embodiment 1 of the present invention.
FIG. 4 is a flowchart showing the processing of the risk analysis apparatus according to Embodiment 1 of the present invention.
FIG. 5 is a diagram showing the configuration of the risk analysis apparatus according to Embodiment 2 of the present invention.
FIG. 6 is a diagram showing an example of the relationship between the recall for parameter k and the risk that an individual is identified in the risk analysis apparatus according to Embodiment 2 of the present invention.
FIG. 7 is a diagram showing an example of the relationship between the precision for parameter k and the risk that an individual is identified in the risk analysis apparatus according to Embodiment 2 of the present invention.
FIG. 8 is a diagram showing an example of the relationship between the risk threshold and the F-measure in the risk analysis apparatus according to Embodiment 2 of the present invention.
FIG. 9 is a flowchart showing the processing of the risk analysis apparatus according to Embodiment 2 of the present invention.

In the present embodiment, data including information about individuals (referred to as original data) is anonymized by an anonymized data creation apparatus (not shown) so that individuals cannot be uniquely identified.
The risk analysis apparatus 10 analyzes the risk level at which an individual is uniquely identified in the anonymized data.
Based on the risk level analyzed by the risk analysis apparatus 10, the anonymized data creation apparatus can create anonymized data in which individuals are less likely to be identified.
Embodiments of the present invention are described below with reference to the drawings.

[Embodiment 1]
The risk analysis apparatus 10 of Embodiment 1 calculates the risk according to the amount of prior knowledge of an attacker who attempts to identify an individual, and takes the worst case as the upper limit of the risk.

Specifically, for movement trajectory data, knowing part or all of a trajectory is treated as the attacker's prior knowledge, and the number of known trajectory nodes (for example, the proportion of the number of points indicating the trajectory) is quantified as the amount of prior knowledge.

The risk analysis apparatus 10 performs risk analysis on the anonymized data by the following steps (1) to (3).
(1) First, for each record constituting the anonymized data, the risk that an individual is identified is quantitatively evaluated.
(2) Based on the quantified risk of each record, the records are arranged in ascending order of risk.
(3) With the attacker's prior knowledge fixed, the risk level of the anonymized data is quantitatively calculated.

Specifically, the risk analysis apparatus 10 includes risk analysis means 11, risk level calculation means 12, and risk level output means 13.

The risk analysis means 11 quantitatively analyzes a risk for each record constituting the anonymized data. When evaluating the risk of each record, the risk analysis means 11 calculates the risk according to the amount of prior knowledge of an attacker who attempts to identify an individual. For movement trajectory data, the risk analysis means 11 treats knowing part or all of a trajectory as the attacker's prior knowledge, and quantifies the number of known trajectory nodes (for example, the number of pieces of position information indicating the traveled trajectory) as the amount of prior knowledge.

The risk level calculation means 12 quantifies and calculates the risk level of the anonymized data according to a specific measure (for example, the attacker's prior knowledge), based on the risk for each record analyzed by the risk analysis means 11. Specifically, the risk level calculation means 12 compares the risk quantified by the risk analysis means 11 with a threshold, and calculates, as the risk level, the ratio of the number of records whose risk is at or below the threshold to the total number of records in the anonymized data.

The risk level output means 13 outputs the risk level calculated by the risk level calculation means 12. Specifically, the risk level output means 13 accepts the amount of the attacker's prior knowledge as specified by the user, calculates the value of the parameter k corresponding to the accepted amount of prior knowledge, and outputs the calculated value of k as the risk level.

The above is illustrated below using the trajectory data of individuals as an example. Consider the case where the trajectory of each individual is expressed as in Equation 1.
Here, A to S denote positions, and A → B denotes the trajectory along which an individual moved.

For the original data expressed by Equation 1, the anonymized data is data such as that expressed by Equation 2.

As Equation 2 shows, the attributes of the anonymized data (for example, the points indicating a trajectory) have been processed, as in t1 to t6, t8, and t9, so that individuals cannot be identified. In addition, t7 has been processed based on the immediately preceding record t6, and t10 based on the immediately preceding record t9, so that each satisfies the condition of the anonymization parameter k.

The risk of each record is then evaluated as shown in Equation 3.

That is, the reciprocal of the frequency of a position in the trajectories is used as the quantitative evaluation. For example, because position "A" appears in seven records, the risk of a record containing A is 1/7.
Next, the points are grouped by identical risk value, and the proportion of the anonymized data that each group accounts for is obtained.
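The reciprocal-of-frequency evaluation can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names and the toy records are hypothetical, the frequency is assumed to be the number of records containing a position, and taking the highest per-position risk as a record's risk (with 0 for an empty record) is an assumption, since the text does not spell out how several positions in one record are combined.

```python
from collections import Counter
from fractions import Fraction


def location_risks(records):
    """Map each position to 1/n, where n is the number of records containing it."""
    counts = Counter(loc for rec in records for loc in set(rec))
    return {loc: Fraction(1, n) for loc, n in counts.items()}


def record_risks(records):
    """Assumed aggregation: a record's risk is the highest risk among its positions.

    A record with no remaining positions is given risk 0 (also an assumption).
    """
    risks = location_risks(records)
    return [max((risks[loc] for loc in rec), default=Fraction(0)) for rec in records]


# Hypothetical toy trajectories (not the data of Equation 2).
records = [["A", "B"], ["A", "B"], ["A", "C"], ["A"], []]
per_location = location_risks(records)
per_record = record_risks(records)
```

With these toy records, position "A" occurs in four records, so its risk is 1/4, mirroring the 1/7 example in the text.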

FIG. 2 is a diagram showing an example in which points are grouped by identical risk value. In the example of FIG. 2, the analysis by the risk analysis apparatus 10 shows that, of the anonymized data, 50% of the records have a risk of 0, 20% have a risk of 1/7, 10% have a risk of 1/6, and 15% have a risk of 1/3.

Accumulating these risk values gives, for example, the proportion of points whose risk value is at or below a threshold, which yields a graph like that of FIG. 3.
FIG. 3 is a diagram showing an example of the ratio of the number of records whose risk value is at or below the threshold to the total number of records. In FIG. 3, the horizontal axis indicates the risk threshold and the vertical axis indicates the attacker's prior knowledge level.
From a risk analysis result like that of FIG. 2, 50% of the records have a risk of 0.01 or less, 70% (50% + 20%) have a risk of 0.15 or less, 80% (50% + 20% + 10%) have a risk of 0.17 or less, and 95% (50% + 20% + 10% + 15%) have a risk of 0.35 or less. By associating the proportion of records with the attacker's prior knowledge, these values can be plotted as a graph like that of FIG. 3.
FIG. 3 shows example graphs for the case of three attributes (for example, number of points; h = 3) and the case of five attributes (h = 5).
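The accumulation behind these percentages can be reproduced with a short sketch. The names `risk_share` and `share_at_or_below` are illustrative only; the shares are the ones listed for FIG. 2, and the remaining 5% of records, which the text does not specify, is left out rather than guessed.

```python
from fractions import Fraction

# Share of records (in percent) per risk value, as listed for FIG. 2.
# The remaining 5% is not specified in the text and is deliberately omitted.
risk_share = {
    Fraction(0): 50,
    Fraction(1, 7): 20,
    Fraction(1, 6): 10,
    Fraction(1, 3): 15,
}


def share_at_or_below(threshold, shares=risk_share):
    """Percentage of records whose risk does not exceed the threshold."""
    return sum(pct for risk, pct in shares.items() if risk <= threshold)


thresholds = [Fraction("0.01"), Fraction("0.15"), Fraction("0.17"), Fraction("0.35")]
cumulative = [share_at_or_below(t) for t in thresholds]
print(cumulative)  # [50, 70, 80, 95]
```

The thresholds 0.15, 0.17, and 0.35 sit just above 1/7, 1/6, and 1/3 respectively, which is why each step adds exactly one group.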

Therefore, once the attacker's knowledge level is fixed, the value of the parameter k to be selected can be obtained.
For example, if the attacker's knowledge level is 95% and there are three attributes (h = 3), the value of the parameter k to be selected is, for example, 10000.

FIG. 4 is a flowchart showing the processing of the risk analysis apparatus 10 according to Embodiment 1 of the present invention. The risk analysis apparatus 10 consists of the hardware of a computer and its peripheral devices together with software that controls that hardware. The following processing is executed by a control unit (for example, the CPU of the risk analysis apparatus 10) in accordance with predetermined software.

In step S101, the CPU (risk analysis means 11) acquires the anonymized data. More specifically, the CPU acquires the anonymized data produced with the anonymization parameter k.

In step S102, the CPU (risk analysis means 11) calculates the risk for each record. More specifically, the CPU calculates the frequency with which an attribute that identifies an individual appears in the records constituting the anonymized data, and obtains the reciprocal of the calculated frequency.

In step S103, the CPU (risk level calculation means 12) calculates the risk level of the anonymized data from the calculated risks, according to the attacker's prior knowledge. More specifically, the CPU compares the risk calculated in step S102 with a threshold, and calculates, as the risk level, the ratio of the number of records whose risk is at or below the threshold to the total number of records in the anonymized data.

In step S104, the CPU (risk level output means 13) outputs the calculated risk level.

[Embodiment 2]
In Embodiment 2, when the risk of each record is evaluated, a content rate is calculated, which is the rate at which attributes included in the original data (for example, data representing places of stay) are included in the anonymized data, and the risk is calculated according to the calculated content rate.
As a premise, there is an apparatus that creates anonymized data using an anonymization parameter k, and inputting movement data that includes points yields a set of anonymized points (for example, anonymized movement data).

The risk analysis apparatus 10 includes risk analysis means 11, risk level calculation means 12, and risk level output means 13, and the risk level calculation means 12 further includes recall calculation means 121, precision calculation means 122, and measure calculation means 123. The risk analysis means 11 and the risk level output means 13 are the same as in Embodiment 1.

The recall calculation means 121 calculates the recall, which is the ratio of the number of attributes included in both the original data before anonymization and the anonymized data to the total number of attributes included in the original data.
The precision calculation means 122 calculates the precision, which is the ratio of the number of attributes included in both the original data and the anonymized data to the total number of attributes included in the anonymized data.
The measure calculation means 123 calculates a specific measure (for example, the F-measure) based on the recall calculated by the recall calculation means 121 and the precision calculated by the precision calculation means 122.

個人に関する情報として個人が滞在した地点の情報を含む移動データを例として、上述の内容を説明する。 The above-described content will be described by taking movement data including information on a place where an individual stays as information related to an individual as an example.

リスクレベル算出手段１２は、以下の３つの評価値（再現率、適合率、Ｆ−尺度）を算出する。３つの評価値は、次のように定義される。
再現率（Ｒｅｃａｌｌ）：（ＮｏＰ（Ｂ）∧ＮｏＰ（Ａ））／ＮｏＰ（Ｂ）
適合率（Ｐｒｅｃｉｓｉｏｎ）：（ＮｏＰ（Ｂ）∧ＮｏＰ（Ａ））／ＮｏＰ（Ａ）
Ｆ−尺度（Ｆ−Ｍｅａｓｕｒｅ）：適合率と再現率の調和平均 The risk level calculation means 12 calculates the following three evaluation values (recall, precision, and F-measure). The three evaluation values are defined as follows.
Recall: (NoP(B) ∧ NoP(A)) / NoP(B)
Precision: (NoP(B) ∧ NoP(A)) / NoP(A)
F-measure: harmonic mean of precision and recall

ここで、ＮｏＰ（Ａ）、ＮｏＰ（Ｂ）、ＮｏＰ（Ｂ）∧ＮｏＰ（Ａ）は、それぞれ次のように定義される。
ＮｏＰ（Ａ）：匿名化処理をした後の匿名化データに含まれる属性（例えば、地点）の集合
ＮｏＰ（Ｂ）：匿名化処理をする前の元データに含まれる属性（例えば、地点）の集合
ＮｏＰ（Ｂ）∧ＮｏＰ（Ａ）：元データにも、匿名化データにも含まれる属性（例えば、地点）の集合 Here, NoP (A), NoP (B), NoP (B) ∧NoP (A) are defined as follows.
NoP(A): the set of attributes (for example, points) included in the anonymized data after anonymization processing
NoP(B): the set of attributes (for example, points) included in the original data before anonymization processing
NoP(B) ∧ NoP(A): the set of attributes (for example, points) included in both the original data and the anonymized data
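The three evaluation values defined above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation. It assumes that each ratio is taken over the number of elements in the sets and that points are represented by hashable identifiers. The function name `recall_precision_f` and the point labels are hypothetical.

```python
def recall_precision_f(original_points, anonymized_points):
    """Return (recall, precision, F-measure) for two collections of points.

    original_points   -> NoP(B): points in the original data
    anonymized_points -> NoP(A): points in the anonymized data
    Assumption: ratios are taken over set sizes (element counts).
    """
    b = set(original_points)
    a = set(anonymized_points)
    common = len(a & b)  # size of NoP(B) ∧ NoP(A)

    recall = common / len(b) if b else 0.0
    precision = common / len(a) if a else 0.0

    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    total = recall + precision
    f_measure = 2 * recall * precision / total if total > 0 else 0.0
    return recall, precision, f_measure


# Hypothetical example: four original points, three anonymized points,
# two of which also appear in the original data.
example = recall_precision_f({"p1", "p2", "p3", "p4"}, {"p1", "p2", "x"})
```

In this example two of the four original points survive anonymization, and two of the three anonymized points are genuine, so the recall is 2/4 and the precision is 2/3.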

リスク分析装置１０は、以下の（１）〜（４）の処理を匿名化パラメータｋを２，３，・・・と増やしながら実施する。
（１）匿名化データを構成するレコードに含まれる地点を、同じ地点ごとに数えて地点の頻度を算出し、算出した頻度（ｎ）の逆数を求め、その地点のリスク（１／ｎ）とする。
（２）同一のリスクを持つ地点ごとにまとめる。
例えば、リスクが０．３３である地点同士、リスクが０．０２である地点同士、リスクが０．０１である地点同士をまとめる。
（３）リスクの値が小さい地点の集合から順に、それらの集合に含まれる地点に基づいて、それらの地点の集合における再現率を求める。
（４）このとき、ｎ番目に小さいエリアの集合から再現率を求める際に、１，２，・・・ｎ番目までの小さいエリアの集合を累積した結果から再現率を求める。
例えば、リスクが０．０１である地点同士の集合において、再現率を算出する。次に、リスクが０．０２である地点同士の集合において再現率を算出する際に、リスクが０．０１である地点同士の集合を累積した結果から再現率を算出する。同様に、リスクが０．３３である地点同士の集合において再現率を算出する際に、リスクが０．０１から０．３３までの地点同士の集合を累積した結果から再現率を算出する。
このように算出して結果をグラフに表わすと、図６のようなグラフに表現される。
図６は、パラメータｋにおける再現率と個人が特定されるリスクとの関係の例を示す図である。図６において、縦軸は再現率、横軸は個人が特定されるリスク（確率）である。 The risk analysis apparatus 10 performs the following processes (1) to (4) while increasing the anonymization parameter k as 2, 3, and so on.
(1) Count the points included in the records that make up the anonymized data, point by point, to obtain the frequency of each point, take the reciprocal of the obtained frequency (n), and use it as the risk (1/n) of that point.
(2) Group the points that have the same risk.
For example, points with a risk of 0.33 are grouped together, points with a risk of 0.02 are grouped together, and points with a risk of 0.01 are grouped together.
(3) In order from the set of points with the smallest risk value, obtain the recall for each set of points based on the points included in it.
(4) At this time, when the recall is obtained for the set of areas with the n-th smallest risk, it is obtained from the result of accumulating the sets with the 1st, 2nd, ..., n-th smallest risks.
For example, the recall is first calculated for the set of points with a risk of 0.01. Next, when the recall is calculated for the set of points with a risk of 0.02, it is calculated from the result that also accumulates the set of points with a risk of 0.01. Similarly, when the recall is calculated for the set of points with a risk of 0.33, it is calculated from the result of accumulating the sets of points with risks from 0.01 to 0.33.
When the results calculated in this way are plotted, a graph such as the one shown in FIG. 6 is obtained.
FIG. 6 is a diagram illustrating an example of the relationship between the recall for parameter k and the risk of identifying an individual. In FIG. 6, the vertical axis represents the recall, and the horizontal axis represents the risk (probability) of identifying an individual.
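Processes (1) to (4) for a single value of k can be sketched as below. This is an illustration under stated assumptions, not the patent's implementation. It assumes each anonymized record is an iterable of point identifiers, and that the recall of an accumulated set is the fraction of original points it contains. The function name `cumulative_recall_by_risk` and the sample data are hypothetical.

```python
from collections import Counter


def cumulative_recall_by_risk(anonymized_records, original_points):
    """Return [(risk, cumulative recall), ...] in ascending order of risk."""
    original = set(original_points)

    # (1) Frequency n of each point; its risk is 1/n.
    counts = Counter(p for record in anonymized_records for p in record)

    # (2) Group points with the same risk (equivalently, the same frequency).
    groups = {}
    for point, n in counts.items():
        groups.setdefault(n, set()).add(point)

    # (3), (4) Visit groups from the smallest risk (largest frequency),
    # accumulating all lower-risk groups before computing the recall.
    accumulated = set()
    curve = []
    for n in sorted(groups, reverse=True):
        accumulated |= groups[n]
        recall = len(accumulated & original) / len(original) if original else 0.0
        curve.append((1.0 / n, recall))
    return curve


# Hypothetical data: "a" appears 4 times, "b" twice, "c" once.
records = [["a", "b"], ["a", "b"], ["a", "c"], ["a"]]
curve = cumulative_recall_by_risk(records, {"a", "b", "c", "d"})
```

Each pair in `curve` corresponds to one point on a risk versus recall plot of the kind described for FIG. 6. Grouping by the integer frequency rather than by the floating-point risk avoids comparing floats for equality.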

再現率と同様に、適合率でも同様に計算する。
すなわち、リスクの値が小さい地点の集合から順に、それらの集合に含まれる地点に基づいて、それらの地点の集合における適合率を求める。
このとき、ｎ番目に小さいエリアの集合から適合率を求める際に、１，２，・・・ｎ番目までの小さいエリアの集合を累積した結果から適合率を求める。
このように算出して結果をグラフに表わすと、図７のようなグラフに表現される。
図７は、パラメータｋにおける適合率と個人が特定されるリスクとの関係の例を示す図である。図７において、縦軸は再現率、横軸は個人が特定されるリスク（確率）である。 The precision is calculated in the same way as the recall.
That is, in order from the set of points with the smallest risk value, the precision for each set of points is obtained based on the points included in it.
At this time, when the precision is obtained for the set of areas with the n-th smallest risk, it is obtained from the result of accumulating the sets with the 1st, 2nd, ..., n-th smallest risks.
When the results calculated in this way are plotted, a graph such as the one shown in FIG. 7 is obtained.
FIG. 7 is a diagram illustrating an example of the relationship between the precision for parameter k and the risk of identifying an individual. In FIG. 7, the vertical axis represents the recall, and the horizontal axis represents the risk (probability) of identifying an individual.

最後に横軸の各項目に対し、適合率と再現率との調和平均であるＦ−尺度を求め、最大となる点を選択する。
すなわち、リスク値についての閾値ｔを設定する（０＜ｔ＜１／ｋ）。
横軸の各項目に対し、適合率と再現率との調和平均であるＦ−尺度を求める。Ｆ−尺度（Ｆ−Ｍｅａｓｕｒｅ）は一般に以下の公式で求められる：
Ｆ−Ｍｅａｓｕｒｅ＝（２＊Ｒｅｃａｌｌ＊Ｐｒｅｃｉｓｉｏｎ）／（Ｒｅｃａｌｌ＋Ｐｒｅｃｉｓｉｏｎ） Finally, for each item on the horizontal axis, the F-measure, which is the harmonic mean of the precision and the recall, is obtained, and the point at which it is largest is selected.
That is, a threshold t for the risk value is set (0 < t < 1/k).
For each item on the horizontal axis, the F-measure, which is the harmonic mean of the precision and the recall, is obtained. The F-measure is generally given by the following formula:
F-Measure = (2 * Recall * Precision) / (Recall + Precision)

図８は、リスクの閾値とＦ−尺度との関係の例を示す図である。図８（１）の例は、匿名化パラメータｋが３の場合の例であり、図８（２）の例は、匿名化パラメータｋが４の場合の例であり、図８（３）の例は、匿名化パラメータｋが５の場合の例である。
この例ではｋ＝３，ｔ＝０．１５のときのＦ−尺度が最大となったため、これらのパラメータを確定させ、ｋ＝３のときにリスクが０．１５以下に該当する地点のみを含むデータを出力結果とする。 FIG. 8 is a diagram illustrating an example of the relationship between the risk threshold and the F-measure. FIG. 8(1) is an example in which the anonymization parameter k is 3, FIG. 8(2) is an example in which k is 4, and FIG. 8(3) is an example in which k is 5.
In this example, the F-measure is largest when k = 3 and t = 0.15, so these parameters are fixed, and the data for k = 3 that contains only the points whose risk is 0.15 or less is used as the output result.
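The selection of k and t described above can be sketched as follows. This is an illustration under assumptions, not the patent's implementation. It assumes the F-measure has already been computed for each candidate threshold of each k, and it applies the constraint 0 < t < 1/k stated earlier. The function names and all numeric values are hypothetical and are not taken from FIG. 8.

```python
def select_parameters(f_curves):
    """Pick (k, t, F) with the largest F-measure.

    f_curves maps each k to a list of (threshold t, F-measure) pairs.
    Thresholds outside 0 < t < 1/k are ignored.
    """
    best = None
    for k in sorted(f_curves):
        for t, f in f_curves[k]:
            if not (0 < t < 1.0 / k):
                continue
            if best is None or f > best[2]:
                best = (k, t, f)
    return best


def points_within_threshold(risk_of_point, t):
    """Keep only the points whose risk is t or less."""
    return {p for p, risk in risk_of_point.items() if risk <= t}


# Invented values, chosen only so that k = 3, t = 0.15 wins.
f_curves = {
    3: [(0.10, 0.60), (0.15, 0.80), (0.40, 0.95)],  # 0.40 violates t < 1/3
    4: [(0.10, 0.70), (0.20, 0.75)],
    5: [(0.10, 0.50)],
}
best = select_parameters(f_curves)
kept = points_within_threshold({"a": 0.10, "b": 0.15, "c": 0.33}, 0.15)
```

With ties, this sketch keeps the first maximum found, which means the smallest k. The source does not say how ties should be broken.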

なお、匿名化データの用途によって、適合率や再現率のどちらかをより重視する可能性があるため、そのときは代わりにＥ−尺度（Ｅ−ｍｅａｓｕｒｅ）を用いても良い。
Ｅ−Ｍｅａｓｕｒｅ＝１−（１＋ｂ^２）／（（ｂ^２／Ｒｅｃａｌｌ）＋（１／Ｐｒｅｃｉｓｉｏｎ））
ここで、ｂは重み係数であり、０から１の間の値を取る。出力結果に応じてｂの値を調整しても良い。 Note that, depending on the use of the anonymized data, either the precision or the recall may need to be given more weight; in that case, the E-measure may be used instead.
E-Measure = 1 - (1 + b^2) / ((b^2 / Recall) + (1 / Precision))
Here, b is a weighting coefficient and takes a value between 0 and 1. The value of b may be adjusted according to the output result.
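As a rough sketch, the E-measure can be computed as below. The code assumes the commonly used van Rijsbergen form, E = 1 - (1 + b^2) / (b^2 / Recall + 1 / Precision); this is an assumption about which form is intended. Under this form a smaller E is better, and with b = 1 it equals 1 minus the F-measure. The function name `e_measure` is hypothetical.

```python
def e_measure(recall, precision, b=1.0):
    """E-measure in the van Rijsbergen form (assumed here).

    b is the weighting coefficient; the text restricts it to 0..1.
    Returns 1.0 (the worst value) when recall or precision is 0.
    """
    if recall <= 0 or precision <= 0:
        return 1.0
    return 1.0 - (1.0 + b ** 2) / ((b ** 2) / recall + 1.0 / precision)


# With b = 1 the result is 1 - F-measure.
e_balanced = e_measure(0.5, 2.0 / 3.0, b=1.0)

# With b = 0 the recall term vanishes and the result is 1 - precision.
e_precision_only = e_measure(0.5, 2.0 / 3.0, b=0.0)
```

Because a lower E is better under this form, a selection step that maximizes the F-measure would minimize E instead.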

図９は、本発明の実施形態２に係るリスク分析装置１０の処理を示すフローチャートである。 FIG. 9 is a flowchart showing processing of the risk analysis apparatus 10 according to the second embodiment of the present invention.

ステップＳ２０１において、ＣＰＵ（リスク分析手段１１）は、匿名化データを取得する。より具体的には、ＣＰＵは、匿名化パラメータｋ（初期値を２とする）の匿名化データを取得する。 In step S201, the CPU (risk analysis means 11) acquires anonymized data. More specifically, the CPU acquires anonymization data of anonymization parameter k (initial value is 2).

ステップＳ２０２において、ＣＰＵ（リスク分析手段１１）は、レコードごとのリスクを算出する。より具体的には、ＣＰＵは、匿名化データを構成するレコードにおいて個人を特定する属性が出現する頻度を算出し、算出した頻度の逆数を求める。 In step S202, the CPU (risk analysis means 11) calculates the risk for each record. More specifically, the CPU calculates the frequency at which an attribute that specifies an individual appears in a record constituting anonymized data, and obtains the reciprocal of the calculated frequency.

ステップＳ２０３において、ＣＰＵ（リスクレベル算出手段１２）は、算出したリスク値ごとにまとめる。より具体的には、ＣＰＵは、同一のリスク値である属性（例えば、地点）ごとにまとめる。 In step S203, the CPU (risk level calculation means 12) groups the results by calculated risk value. More specifically, the CPU groups together the attributes (for example, points) that have the same risk value.

ステップＳ２０４において、ＣＰＵ（リスクレベル算出手段１２、再現率算出手段１２１、適合率算出手段１２２）は、まとめたリスク値ごとの再現率及び適合率を、リスク値ごとを累積しながら求める。より具体的には、ＣＰＵは、リスクの値が小さい属性（例えば、地点）の集合から順に、それらの集合に含まれる属性（例えば、地点）に基づいて、それらの属性（例えば、地点）の集合における再現率及び適合率を求める。このとき、ＣＰＵは、ｎ番目に小さいエリアの集合から再現率及び適合率を求める際に、１，２，・・・ｎ番目までの小さいエリアの集合を累積した結果から再現率及び適合率を求める。 In step S204, the CPU (risk level calculation means 12, recall calculation means 121, precision calculation means 122) obtains the recall and the precision for each grouped risk value while accumulating over the risk values. More specifically, in order from the set of attributes (for example, points) with the smallest risk value, the CPU obtains the recall and the precision for each set of attributes (for example, points) based on the attributes included in it. At this time, when the CPU obtains the recall and the precision for the set of areas with the n-th smallest risk, it obtains them from the result of accumulating the sets with the 1st, 2nd, ..., n-th smallest risks.

ステップＳ２０５において、ＣＰＵ（尺度算出手段１２３）は、再現率と適合率とのＦ−尺度を算出する。 In step S205, the CPU (scale calculation means 123) calculates the F-measure of the recall and the precision.

ステップＳ２０６において、ＣＰＵ（リスクレベル算出手段１２）は、終了か否かを判断する。具体的には、ＣＰＵは、匿名化パラメータｋが所定の値（例えば、ユーザによって指定された値）以上か否かを判断する。この判断がＹＥＳの場合、ＣＰＵは、処理をステップＳ２０７に移し、この判断がＮＯの場合、ＣＰＵは、処理をステップＳ２０８に移す。 In step S206, the CPU (risk level calculation means 12) determines whether or not the process is finished. Specifically, the CPU determines whether or not the anonymization parameter k is equal to or greater than a predetermined value (for example, a value specified by the user). If this determination is YES, the CPU moves the process to step S207, and if this determination is NO, the CPU moves the process to step S208.

ステップＳ２０７において、ＣＰＵ（リスクレベル算出手段１２、リスクレベル出力手段）は、最大のＦ−尺度に対するリスク値の閾値を求め、出力する。その後、ＣＰＵは、処理を終了する。 In step S207, the CPU (risk level calculation means 12, risk level output means) obtains and outputs the risk value threshold that gives the maximum F-measure. Thereafter, the CPU ends the process.

ステップＳ２０８において、ＣＰＵ（リスクレベル算出手段１２）は、匿名化パラメータｋを増加（＋１）させる。その後、ＣＰＵは、処理をステップＳ２０１に移す。 In step S208, the CPU (risk level calculation means 12) increases (+1) the anonymization parameter k. Thereafter, the CPU moves the process to step S201.
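The flow of steps S201 to S208 can be sketched as a single loop. This is an illustration under stated assumptions, not the patent's implementation. The anonymizer is passed in as a hypothetical callable `anonymize(records, k)`, because the source only assumes that such a device exists. Records are assumed to be iterables of point identifiers, and `k_max` stands for the predetermined end value checked in step S206.

```python
from collections import Counter


def f_measure(recall, precision):
    """Harmonic mean of recall and precision (0 when both are 0)."""
    total = recall + precision
    return 2 * recall * precision / total if total > 0 else 0.0


def analyze(original_records, anonymize, k_max):
    """Return (k, risk threshold, F-measure) with the largest F-measure."""
    original = {p for record in original_records for p in record}
    best = None
    k = 2                                                    # initial value of k
    while True:
        anonymized = anonymize(original_records, k)          # S201
        counts = Counter(p for rec in anonymized for p in rec)  # S202: risk = 1/n
        groups = {}
        for point, n in counts.items():                      # S203: group by risk
            groups.setdefault(n, set()).add(point)
        accumulated = set()
        for n in sorted(groups, reverse=True):               # S204: ascending risk
            accumulated |= groups[n]
            hits = len(accumulated & original)
            recall = hits / len(original) if original else 0.0
            precision = hits / len(accumulated)
            f = f_measure(recall, precision)                 # S205
            if best is None or f > best[2]:
                best = (k, 1.0 / n, f)
        if k >= k_max:                                       # S206
            return best                                      # S207
        k += 1                                               # S208
```

A caller supplies its own anonymizer. For instance, a toy function that drops every point occurring fewer than k times is enough to exercise the loop. The returned threshold is the risk value of the last accumulated group at the best F-measure.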

本実施形態１によれば、リスク分析装置１０は、匿名化データを構成するレコードごとに、リスクを定量的に分析し、分析したレコードごとのリスクに基づいて、攻撃者の予備知識に従って匿名化データのリスクレベルを定量化して算出し、算出したリスクレベル（匿名化に使用するパラメータに対応する）を出力する。
本実施形態２によれば、リスク分析装置１０は、匿名化データを構成するレコードごとに、リスクを定量的に分析し、分析したレコードごとのリスクに基づいて、特定の尺度（例えば、含有率（再現率及び適合率のＦ−尺度））に従って匿名化データのリスクレベルを定量化して算出し、算出したリスクレベル（匿名化に使用するパラメータに対応する）を出力する。
したがって、リスク分析装置１０は、匿名化されたデータにおいて、当該データに含まれる個人が一意に特定されるリスクレベルを定量的に分析することができる。さらに、本発明によれば、ノイズを加える匿名化手法を評価することが可能になり、滞在を表す地点の情報から個人が特定されるリスクを定量化でき、地点の匿名化パラメータを自動決定できる。 According to the first embodiment, the risk analysis apparatus 10 quantitatively analyzes the risk for each record constituting the anonymized data, quantifies and calculates the risk level of the anonymized data according to the attacker's prior knowledge based on the risk analyzed for each record, and outputs the calculated risk level (corresponding to the parameter used for anonymization).
According to the second embodiment, the risk analysis apparatus 10 quantitatively analyzes the risk for each record constituting the anonymized data, quantifies and calculates the risk level of the anonymized data according to a specific scale (for example, the content rate (the F-measure of the recall and the precision)) based on the risk analyzed for each record, and outputs the calculated risk level (corresponding to the parameter used for anonymization).
Therefore, the risk analysis apparatus 10 can quantitatively analyze the risk level at which an individual included in anonymized data is uniquely identified. Furthermore, according to the present invention, it becomes possible to evaluate anonymization methods that add noise, to quantify the risk that an individual is identified from information on points representing stays, and to automatically determine the anonymization parameter for the points.

以上、本発明の実施形態について説明したが、本発明は上述した実施形態に限るものではない。また、本発明の実施形態に記載された効果は、本発明から生じる最も好適な効果を列挙したに過ぎず、本発明による効果は、本発明の実施形態に記載されたものに限定されるものではない。 Although embodiments of the present invention have been described above, the present invention is not limited to the embodiments described above. The effects described in the embodiments are merely a list of the most preferable effects arising from the present invention, and the effects of the present invention are not limited to those described in the embodiments.

１０リスク分析装置
１１リスク分析手段
１２リスクレベル算出手段
１２１再現率算出手段
１２２適合率算出手段
１２３尺度算出手段
１３リスクレベル出力手段
DESCRIPTION OF SYMBOLS
10 Risk analysis apparatus
11 Risk analysis means
12 Risk level calculation means
121 Recall calculation means
122 Precision calculation means
123 Scale calculation means
13 Risk level output means

Claims

A risk analysis device that analyzes a risk level for uniquely identifying the individual from anonymized data including information about the individual,
Risk analysis means for quantitatively analyzing risk for each record constituting the anonymized data;
Risk level calculation means for quantifying and calculating the risk level of the anonymized data according to a specific scale based on the risk for each record analyzed by the risk analysis means;
A risk level output means for outputting the risk level calculated by the risk level calculation means;
A risk analysis apparatus comprising:

The risk analysis apparatus according to claim 1, wherein the risk analysis unit calculates the risk according to an amount of prior knowledge of an attacker who attempts to identify an individual when evaluating the risk of each record.

The risk analysis apparatus according to claim 2, wherein the risk analysis means takes knowledge of part or all of movement trajectory data as the prior knowledge of the attacker, and quantifies the number of nodes of the known trajectory as the amount of prior knowledge.

The risk analysis apparatus according to claim 1, wherein the risk level calculation means compares the risk quantified by the risk analysis means with a threshold, and calculates, as the risk level, the ratio of the number of the records whose risk is equal to or less than the threshold to the total number of the records of the anonymized data.

The risk analysis apparatus according to claim 4, wherein the risk level calculation means further comprises:
recall calculation means for calculating a recall, which is the ratio of the number of attributes that are included in the original data before anonymization and are also included in the anonymized data to the total number of the attributes included in the original data;
precision calculation means for calculating a precision, which is the ratio of the number of the attributes that are included in the original data and are also included in the anonymized data to the total number of the attributes included in the anonymized data; and
scale calculation means for calculating a specific scale based on the recall calculated by the recall calculation means and the precision calculated by the precision calculation means,
and wherein the risk level calculation means calculates a parameter for anonymization based on the specific scale calculated by the scale calculation means.

A method executed by the risk analysis apparatus according to claim 1,
A risk analysis step in which the risk analysis means quantitatively analyzes the risk for each record constituting the anonymized data;
A risk level calculating step in which the risk level calculating means quantifies and calculates the risk level of the anonymized data according to a specific measure based on the risk for each of the records analyzed by the risk analyzing step;
A risk level output step in which the risk level output means outputs the risk level calculated by the risk level calculation step;
A method comprising:

The program for making a computer perform each step of the method of Claim 6.