JP2016151908A

JP2016151908A - Personal information anonymization support device

Info

Publication number: JP2016151908A
Application number: JP2015029240A
Authority: JP
Inventors: Kazuaki Ihori; Kenichi Okada; Kenei Tanabe
Original assignee: Hitachi Solutions Ltd
Current assignee: Hitachi Solutions Ltd
Priority date: 2015-02-18
Filing date: 2015-02-18
Publication date: 2016-08-22

Abstract

PROBLEM TO BE SOLVED: To allow users to obtain the anonymized data they need through a flexible procedure. SOLUTION: A personal information anonymization support device includes: (1) an anonymization target data capturing unit that captures anonymization target data; (2) a generalization hierarchy creation unit that creates generalization hierarchy data on the basis of the anonymization target data; (3) a first k-anonymization execution unit that k-anonymizes the anonymization target data using the generalization hierarchy data, and outputs the anonymized data resulting from the k-anonymization together with incidental information on the k-anonymization; (4) an anonymization map editing unit that displays on a screen an anonymization map in which the quasi-identifier values used to create the k-anonymized data are highlighted on the generalization hierarchy data displayed as a tree structure, and that receives editing input for the anonymization map; (5) a second k-anonymization execution unit that k-anonymizes the anonymization target data using the edited anonymization map; and (6) a storage device that stores the anonymization target data, the generalization hierarchy data, the k-anonymized data, and the anonymization map. SELECTED DRAWING: Figure 4

Description

The present invention relates to a technique for supporting the work of anonymizing personal information.

At present, efforts to analyze information about individuals in order to create new services, and thereby contribute to the development of society, including the data subjects who provide that information, are becoming increasingly active in fields such as medicine and telecommunications. On the other hand, if such information is mishandled, the privacy of the data subjects is infringed. k-anonymization is a technique for reducing this risk. It obscures quasi-identifier values so that, for every record of information about an individual, at least (k-1) other records have the same combination of quasi-identifier values. A quasi-identifier is an attribute that cannot identify an individual on its own but can be used to identify an individual in combination with other attributes; examples include age, sex, and prefecture of residence. k-anonymization uses a generalization hierarchy, a hierarchical structure for replacing quasi-identifier values with more general values. FIG. 1 shows an example of a generalization hierarchy.
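The definition above can be made concrete with a minimal Python sketch. The hierarchy contents, the record layout, and the helper names `achieved_k` and `generalize` are assumptions chosen for illustration; they are not taken from FIG. 1.

```python
from collections import Counter

# Hypothetical generalization hierarchy for one quasi-identifier:
# each value maps to its parent (more general) value.
PARENT = {
    "Kawasaki City": "Kanagawa Prefecture",
    "Yokohama City": "Kanagawa Prefecture",
    "Kanagawa Prefecture": "Kanto Region",
}

def achieved_k(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the same
    combination of quasi-identifier values (0 for an empty table)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values(), default=0)

def generalize(value, levels=1):
    """Climb the hierarchy by `levels` steps, stopping at the root."""
    for _ in range(levels):
        value = PARENT.get(value, value)
    return value

records = [
    {"address": "Kawasaki City", "sex": "F"},
    {"address": "Yokohama City", "sex": "F"},
]
# Raise the address of every record by one hierarchy level.
generalized = [{**r, "address": generalize(r["address"])} for r in records]
```

In this sketch a table is k-anonymous for a given k when `achieved_k` returns at least k; the two raw records are unique, while the generalized records share one combination.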

As shown in Non-Patent Document 1, there are two main approaches to k-anonymization: hierarchy-search methods and clustering methods. A hierarchy-search method searches for a combination of generalization hierarchy levels that satisfies k-anonymity; known algorithms include Incognito (Non-Patent Document 2) and Flash (Non-Patent Document 3). Flash in particular is an algorithm that pursues fast search performance, and it is implemented in ARX (Non-Patent Document 4), an open-source software package for k-anonymization. A clustering method satisfies k-anonymity by grouping values that are close to one another, based on the distance between quasi-identifier values; Non-Patent Document 5 describes one such technique. According to Non-Patent Document 6, the problem of finding an optimal k-anonymization is NP-hard, so these algorithms give approximate solutions.

Non-Patent Document 1: FY2009 Ministry of Economy, Trade and Industry Information Grand Voyage Project (development, improvement, and verification of common infrastructure technology), development, improvement, and verification of a personal information protection and analysis infrastructure, External Specification of the Personal Information Anonymization Platform (March 2010, Hitachi Consulting)
Non-Patent Document 2: Kristen LeFevre, David J. DeWitt, Raghu Ramakrishnan, Incognito: efficient full-domain K-anonymity, Proceedings of the 2005 ACM SIGMOD international conference on Management of data, June 14-16, 2005, Baltimore, Maryland
Non-Patent Document 3: Florian Kohlmayer, Fabian Prasser, Claudia Eckert, Alfons Kemper, Klaus A. Kuhn, Flash: Efficient, Stable and Optimal K-Anonymity, Proceedings of the 4th IEEE International Conference on Information Privacy, Security, Risk and Trust (PASSAT), September 3-5, 2012, Amsterdam, Netherlands
Non-Patent Document 4: ARX - Powerful Data Anonymization | k-Anonymity, l-Diversity, t-Closeness, δ-Presence Implementation in Java, http://arx.deidentifier.org/ (as of November 20, 2014)
Non-Patent Document 5: Jiuyong Li, Raymond Chi-Wing Wong, Ada Wai-Chee Fu, and Jian Pei, Achieving k-Anonymity by Clustering in Attribute Hierarchical Structures, DaWaK 2006, LNCS 4081, pp. 405-416, 2006
Non-Patent Document 6: Adam Meyerson, Ryan Williams, On the complexity of optimal K-anonymity, Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 14-16, 2004, Paris, France

The k-anonymization algorithms above yield an approximation of the optimal solution, but the solution obtained may not be useful for the purpose of the analysis. To handle such cases, the technique of Non-Patent Document 4 provides a function for finding a solution suited to the analysis by letting the user select another solution from a lattice, which is the Cartesian product of the sets of hierarchy levels of the generalization hierarchies. With this function, another solution can be selected by, for example, raising the generalization level of one quasi-identifier while lowering the generalization level of another.
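As a rough illustration of such a lattice, the following Python sketch enumerates every combination of hierarchy levels with `itertools.product`. The attribute names, the level counts, and the helper `same_height_alternatives` are assumptions made for this example and do not describe how ARX itself represents its lattice.

```python
from itertools import product

def build_lattice(max_levels):
    """Enumerate every combination of hierarchy levels.
    `max_levels` maps each quasi-identifier to its highest level (0 = raw)."""
    names = list(max_levels)
    ranges = [range(max_levels[n] + 1) for n in names]
    return [dict(zip(names, combo)) for combo in product(*ranges)]

def same_height_alternatives(lattice, node):
    """Other nodes with the same total level as `node`, i.e. solutions that
    raise one attribute's level while lowering another's."""
    total = sum(node.values())
    return [n for n in lattice if sum(n.values()) == total and n != node]

# Hypothetical hierarchies: address has levels 0..2, age has levels 0..1.
lattice = build_lattice({"address": 2, "age": 1})
alternatives = same_height_alternatives(lattice, {"address": 1, "age": 1})
```

Every node fixes a single level per quasi-identifier, which is why a selection made in such a lattice applies to all values of that quasi-identifier at once.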

On the other hand, if the generalization level of a single quasi-identifier could be adjusted without changing it uniformly for all of its values, users could obtain k-anonymized data that is easier to use. In the case of FIG. 2, for example, 2-anonymity can be satisfied by raising the hierarchy level of both address and age by one. As shown in FIG. 3, however, 2-anonymity can also be satisfied by leaving the address unchanged for records whose address is "Kawasaki City" or "Yokohama City" and applying the same generalization as in FIG. 2 to the rest. The latter retains more detailed address information and is therefore easier to use for analysis.
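The difference between the two recodings can be sketched in Python as follows. The rows are hypothetical and are not the data of FIG. 2 or FIG. 3, the age column is assumed to be generalized to ten-year bands already, and the helper names are illustrative.

```python
from collections import Counter

def achieved_k(rows):
    """Smallest number of rows sharing one (address, age) combination."""
    return min(Counter(rows).values())

# Hypothetical rows (address, age band).
rows = [
    ("Kawasaki City", "30s"), ("Kawasaki City", "30s"),
    ("Yokohama City", "30s"), ("Yokohama City", "30s"),
    ("Shinjuku Ward", "20s"), ("Minato Ward", "20s"),
]

# Uniform recoding: every address climbs one hierarchy level.
uniform = {
    "Kawasaki City": "Kanagawa Prefecture",
    "Yokohama City": "Kanagawa Prefecture",
    "Shinjuku Ward": "Tokyo",
    "Minato Ward": "Tokyo",
}
# Per-value recoding: addresses that are already frequent enough stay as is.
per_value = {**uniform,
             "Kawasaki City": "Kawasaki City",
             "Yokohama City": "Yokohama City"}

def recode(rows, mapping):
    """Replace each address by its mapped value, keeping the age band."""
    return [(mapping[address], age) for address, age in rows]

def distinct_addresses(rows):
    """Number of different address values that remain after recoding."""
    return len({address for address, _ in rows})
```

Under these assumptions both recodings reach 2-anonymity, but the per-value recoding leaves three distinct address values instead of two.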

However, with the technique of Non-Patent Document 4, changing the hierarchy level of one value of a quasi-identifier requires changing the hierarchy level of all values of that quasi-identifier to the same level, so the solution cannot be adjusted only for specific quasi-identifier values as in FIG. 3. Thus, the technique of Non-Patent Document 4 has the problem that trial and error to obtain better anonymized data cannot be carried out flexibly.

In addition, with clustering methods such as that of Non-Patent Document 5, values that appear more frequently within the same quasi-identifier need less of an increase in hierarchy level. Clustering methods therefore have the advantage of more readily yielding data suited to analysis than hierarchy-search methods. However, because the technique of Non-Patent Document 4 is specialized for hierarchy-search methods as described above, clustering methods cannot be applied to it. That is, the technique of Non-Patent Document 4 cannot flexibly accommodate a new method that yields better anonymized data, even if one appears.

To solve the above problems, one invention provides a personal information anonymization support device having: (1) an anonymization target data capturing unit that captures anonymization target data; (2) a generalization hierarchy creation unit that creates generalization hierarchy data on the basis of the anonymization target data; (3) a k-anonymization execution unit that k-anonymizes the anonymization target data using the generalization hierarchy data and outputs the anonymized data resulting from the k-anonymization together with its incidental information; (4) an anonymization map display unit that displays on a screen an anonymization map highlighting the quasi-identifier values used to create the k-anonymized data; and (5) a storage device that stores the anonymization target data, the generalization hierarchy data, the k-anonymized data, and the anonymization map.

Another invention provides a personal information anonymization support device having: (1) an anonymization target data capturing unit that captures anonymization target data; (2) a generalization hierarchy creation unit that creates generalization hierarchy data on the basis of the anonymization target data; (3) a first k-anonymization execution unit that k-anonymizes the anonymization target data using the generalization hierarchy data and outputs the anonymized data resulting from the k-anonymization together with its incidental information; (4) an anonymization map editing unit that displays on a screen an anonymization map in which the quasi-identifier values used to create the k-anonymized data are highlighted on the generalization hierarchy data displayed as a tree structure, and that receives editing input for the anonymization map; (5) a second k-anonymization execution unit that k-anonymizes the anonymization target data using the edited anonymization map; and (6) a storage device that stores the anonymization target data, the generalization hierarchy data, the k-anonymized data, and the anonymization map.

According to one invention, the user can easily check, on the generalization hierarchy data displayed as a tree structure, the quasi-identifier values and hierarchy levels used to create the k-anonymized data. In addition, according to one invention, the user can more flexibly make adjustments to obtain k-anonymized data suited to the purpose of the analysis. Problems, configurations, and effects other than those described above will become apparent from the following description of the embodiments.

FIG. 1 is a diagram showing an example of a generalization hierarchy.
FIG. 2 is a diagram showing an example of performing 2-anonymization.
FIG. 3 is a diagram showing an example that retains quasi-identifier values in more detail than FIG. 2 while maintaining 2-anonymity.
FIG. 4 is a block diagram showing an example of the overall configuration of the personal information anonymization support device.
FIG. 5 is a flowchart showing the procedure by which the personal information anonymization support device creates k-anonymized data.
FIG. 6 is a diagram showing an example of the capture screen for anonymization target data.
FIG. 7 is a diagram showing an example of the input screen of the k-anonymization execution unit.
FIG. 8 is a diagram showing an example of an anonymization map file for a quasi-identifier.
FIG. 9 is a diagram for explaining the "k-anonymization threshold achieved by the k-anonymized data".
FIG. 10 is a diagram showing an example of the confirmation screen for the k-anonymization result.
FIG. 11 is a diagram showing a visualization of the pre-anonymization values corresponding to a post-anonymization value.
FIG. 12 is a diagram showing a visualization of the post-anonymization value corresponding to a pre-anonymization value.
FIG. 13 is a conceptual diagram of editing the anonymization map of a quasi-identifier to specialize values.
FIG. 14 is a conceptual diagram of editing the anonymization map of a quasi-identifier to generalize values.
FIG. 15 is a conceptual diagram of a screen that presents reference information for editing the anonymization map of a quasi-identifier.

Embodiments of the invention are described below with reference to the accompanying drawings. Note, however, that the embodiments described below are merely examples for realizing the invention and do not limit its technical scope. The following embodiments describe an example in which the anonymization map is presented through a GUI (Graphical User Interface), but it may also be presented through a CUI (Character User Interface).

(1) Basic Embodiment
(1-1) Overall Configuration
FIG. 4 shows the overall configuration of the personal information anonymization support device presented as an embodiment. This device, which supports trial-and-error adjustment, consists of an input/output device 400, an information processing device 410, and a storage device 420.

The input/output device 400 is used to input and output the information needed for k-anonymization. This information includes, in addition to the anonymization target data, operation inputs entered through the user interface and the various screens output through the user interface (for example, a confirmation screen containing the anonymization map).

The information processing device 410 is a computer that processes various kinds of information and is realized as a computer (comprising a CPU, RAM, ROM, and the like) that executes programs. Of the various functions realized through program execution, FIG. 4 shows only those that support the user's trial-and-error adjustment of k-anonymized data. The anonymization target data capturing unit 411 captures the anonymization target data 421 from the input/output device 400. The generalization hierarchy creation unit 412 creates a generalization hierarchy for the anonymization target data 421 and stores it in the storage device 420 as generalization hierarchy data 422. The k-anonymization execution unit 413 k-anonymizes the anonymization target data 421 using the generalization hierarchy data 422 and stores the resulting anonymized data 423 in the storage device 420. The k-anonymization result output unit 414 displays on a screen the created anonymized data 423, the anonymization map data 424 indicating the correspondence between the quasi-identifier values used in its creation, the anonymization summary data 425, and so on. In this specification, the anonymization map display function among the display functions of the k-anonymization result output unit 414 is referred to as the "anonymization map display unit". The anonymization map editing unit 415 receives the user's editing input for the anonymization map and edits the anonymization map accordingly. The post-adjustment anonymization execution unit 416 k-anonymizes the anonymization target data 421 using the edited anonymization map.

The storage device 420 stores the anonymization target data 421, the generalization hierarchy data 422, the anonymized data 423, the anonymization map data 424 corresponding to the anonymization map, and the anonymization summary data 425 expressing the data quality of the anonymized data 423. The storage device 420 consists of, for example, a hard disk drive or an SSD. Possible file systems running on the hardware include local file systems such as ext3 on Linux and distributed file systems such as HDFS of the Hadoop distributed processing platform.

In this specification, an "anonymization map" is a representation of the correspondence between quasi-identifier values before anonymization and quasi-identifier values after anonymization. For example, FIG. 8 shows anonymization map data in CUI format (CSV file format) for the quasi-identifier "prefecture". In FIG. 8, the quasi-identifier values before anonymization are placed in the first column and the values after anonymization in the second column. The anonymization map is not limited to the CUI format, however. For example, it may take a GUI format as shown in the anonymization map field 1009 of FIG. 10. As described later, when the anonymization map is represented by a tree structure corresponding to the generalization hierarchy data, the correspondence before and after anonymization is expressed by highlighting only the post-anonymization quasi-identifier values.
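A two-column CSV map of this kind could be read and applied as in the following Python sketch. The rows in `MAP_CSV` are hypothetical stand-ins, since the actual contents of FIG. 8 are not reproduced in the text, and the function names are illustrative.

```python
import csv
import io

# Hypothetical map contents: value before anonymization, value after.
MAP_CSV = """Tokyo,Kanto
Kanagawa,Kanto
Osaka,Osaka
"""

def load_map(text):
    """Parse two-column CSV text into a {before: after} dictionary."""
    return {before: after for before, after in csv.reader(io.StringIO(text))}

def apply_map(values, mapping):
    """Replace each value by its mapped value; unmapped values are kept."""
    return [mapping.get(v, v) for v in values]

def highlighted_nodes(mapping):
    """Values a tree view would highlight: the post-anonymization values."""
    return set(mapping.values())

prefecture_map = load_map(MAP_CSV)
```

Keeping unmapped values unchanged in `apply_map` is a design choice of this sketch; a stricter variant could raise an error so that no raw value slips through unnoticed.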

(1-2) Procedure for Creating k-Anonymized Data
FIG. 5 shows the procedure by which the personal information anonymization support device creates k-anonymized data. The following processing is realized through the execution of programs by the information processing device 410.

[Step S501]
The anonymization target data capturing unit 411 captures an anonymization target data file from the input/output device 400 and stores the captured file, together with the column definition information of the file, in the anonymization target data 421 of the storage device 420. Column definition information refers to metadata about the columns, such as column names and data types.

[Step S502]
The generalization hierarchy creation unit 412 creates a generalization hierarchy corresponding to the anonymization target data 421. The generalization hierarchy is stored in the storage device 420 as generalization hierarchy data 422.

[Step S503]
The k-anonymization execution unit 413 k-anonymizes the anonymization target data 421 using the generalization hierarchy data 422 and outputs the k-anonymized data together with its incidental information. The k-anonymized data is stored in the storage device 420 as anonymized data 423. The incidental information consists of anonymization map information, which defines the rules for replacing quasi-identifier values when k-anonymization is performed, and anonymization summary information, such as the value of k actually achieved by the k-anonymization. These are stored in the storage device 420 as anonymization map data 424 and anonymization summary data 425, respectively. The anonymization map data 424 and the anonymization summary data 425 are described in detail later.

[Step S504]
The k-anonymization result output unit 414 presents information corresponding to the created anonymized data 423, anonymization map data 424, anonymization summary data 425, and so on, on a confirmation screen (for example, FIG. 10). The user of the personal information anonymization support device checks this information on the confirmation screen.

[Step S505]
The anonymization map editing unit 415 determines whether the user has performed an editing operation on the anonymization map displayed on the operation screen. If there is no editing instruction for the anonymization map (for example, if the "OK" button is operated), the anonymization map editing unit 415 ends the processing. If there is an editing instruction for the anonymization map (for example, if an instruction is input), the anonymization map editing unit 415 proceeds to step S506.

[Step S506]
The anonymization map editing unit 415 changes the anonymization map based on the instruction input received through the operation screen, and calculates information such as the k-anonymization threshold satisfied by the changed anonymization map and the resulting information loss. Through this processing, the anonymization map data 424 and the anonymization summary data 425 stored in the storage device 420 are updated.
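The recalculation in this step might look like the following Python sketch for a single quasi-identifier column. The section does not define how information loss is measured, so the `loss` figure here (the share of rows whose value was replaced) is purely an assumed placeholder, and the function names and sample values are hypothetical.

```python
from collections import Counter

def summarize(values, mapping):
    """Recompute summary figures for one quasi-identifier column.
    `loss` is an assumed, simplistic metric: share of rows whose value changed."""
    recoded = [mapping[v] for v in values]
    k = min(Counter(recoded).values())
    loss = sum(a != b for a, b in zip(values, recoded)) / len(values)
    return {"k": k, "loss": loss}

def edit_map(mapping, before_value, new_after_value):
    """Return a copy of the map with one entry changed (the user's edit)."""
    edited = dict(mapping)
    edited[before_value] = new_after_value
    return edited

values = ["Tokyo", "Tokyo", "Kanagawa", "Kanagawa", "Chiba", "Saitama"]
current = {v: "Kanto" for v in set(values)}
# Specialize: keep "Tokyo" as is instead of generalizing it to "Kanto".
edited = edit_map(current, "Tokyo", "Tokyo")
```

With these sample values, the edit lowers the achieved k from 6 to 2 while reducing the assumed loss figure, which is the kind of trade-off the user would weigh before accepting the map.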

[Step S507]
The anonymization map editing unit 415 determines whether the anonymization map desired by the user has been created. If an instruction to accept the created anonymization map is input, the anonymization map editing unit 415 proceeds to step S508; if no such instruction is input, it proceeds to step S509.

[Step S508]
The post-adjustment anonymization execution unit 416 k-anonymizes the anonymization target data 421 using the anonymization map created in step S506 and creates anonymized data 423. The anonymized data 423 in the storage device 420 is updated with the newly created data.

[Step S509]
The generalization hierarchy creation unit 412 edits the generalization hierarchy. The edits to the generalization hierarchy are given by the user, for example through the operation screen. When editing is complete, the generalization hierarchy data 422 in the storage device 420 is updated.
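The branching among steps S501 to S509 can be summarized as a small transition function. This Python sketch assumes that steps S501 to S505 run in numerical order and that S507 follows S506; the text of this section does not state which step follows S508 or S509, so the sketch returns `None` for them rather than guessing.

```python
def next_step(step, edit_requested=False, map_accepted=False):
    """Return the step that follows `step`, or None when the text names none."""
    if step in ("S501", "S502", "S503", "S504"):
        return "S50" + str(int(step[-1]) + 1)
    if step == "S505":
        # No edit requested (e.g. "OK" pressed): the processing ends.
        return "S506" if edit_requested else None
    if step == "S506":
        return "S507"
    if step == "S507":
        return "S508" if map_accepted else "S509"
    return None  # S508, S509: successor not stated in this section

def trace(edit_requested, map_accepted):
    """List the steps visited from S501 under fixed user decisions."""
    path, step = [], "S501"
    while step is not None:
        path.append(step)
        step = next_step(step, edit_requested, map_accepted)
    return path
```

Calling `trace(True, False)` follows the branch in which the user edits the map but does not accept it, ending at the hierarchy editing of step S509.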

(1-3) Detailed Configuration
Next, each unit or functional unit of the personal information anonymization support device is described in detail.

(1-3-1) Input/Output Device
The input/output device 400 inputs data to and outputs data from the information processing device 410. The input/output device 400 is realized by, for example, a mouse, a keyboard, and a display.

(1-3-2) Anonymization Target Data Capturing Unit
The anonymization target data capturing unit 411 captures the anonymization target data and its column information into the storage device 420 as anonymization target data 421. For example, when the user presses the capture button 603 on the anonymization target data capture screen 600 shown in FIG. 6, the comma-separated string on the first line of the file specified in the anonymization target data field 601 is treated as the column names, and the second and subsequent lines are written to the anonymization target data 421 as the body of the anonymization target data. The anonymization target data 421 is then identified by the name specified in the anonymization target data name field 602. The method of specifying column information is not limited to this. For example, a file whose data starts on the first line may be specified, with the column information given in a separate input field. The character encoding used to read the anonymization target data file may also be made specifiable. One possible type of anonymization target data file is personal data in CSV file format that the user has obtained in advance.
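The capture behaviour described above, with the header row read as column names and an optional character encoding, can be sketched in Python as follows. The function name `ingest`, the returned dictionary layout, and the sample data are assumptions for illustration; data types, which the column definition information may also carry, are omitted here.

```python
import csv
import io

def ingest(raw_bytes, name, encoding="utf-8"):
    """Decode a CSV file, treat line 1 as column names and the rest as data."""
    text = raw_bytes.decode(encoding)
    reader = csv.reader(io.StringIO(text, newline=""))
    columns = next(reader)
    rows = [dict(zip(columns, row)) for row in reader]
    return {"name": name, "columns": columns, "rows": rows}

# Hypothetical file contents, encoded as Shift_JIS to exercise the parameter.
sample = "age,sex,prefecture\n34,F,Kanagawa\n27,M,Tokyo\n".encode("shift_jis")
dataset = ingest(sample, "patients_2015", encoding="shift_jis")
```

The `name` argument plays the role of the value entered in the data name field, so that later steps can refer to the captured data by that name.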

(1-3-3) Generalization Hierarchy Creation Unit
The generalization hierarchy creation unit 412 creates a generalization hierarchy based on the anonymization target data 421 captured by the anonymization target data capturing unit 411, names the created data, and outputs it to the generalization hierarchy data 422. When a generalization hierarchy is created, some or all of the columns in the column information of the anonymization target data 421 are designated as quasi-identifiers, and the hierarchy is created for those quasi-identifiers. The generalization hierarchy may be generated automatically, for example by the method described in Non-Patent Document 1, or created interactively on a GUI as in the method described in Non-Patent Document 4.
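As a simple illustration of automatic generation, the Python sketch below derives a three-level hierarchy for a numeric age column. This is an assumed scheme chosen for brevity (exact age, ten-year band, suppression); it is not the method of Non-Patent Document 1 or Non-Patent Document 4.

```python
def age_hierarchy(ages):
    """Map each distinct age to its chain of generalizations:
    level 0 = exact age, level 1 = ten-year band, level 2 = '*' (suppressed)."""
    hierarchy = {}
    for age in sorted(set(ages)):
        decade = age // 10 * 10
        hierarchy[age] = [str(age), f"{decade}-{decade + 9}", "*"]
    return hierarchy

# Hypothetical age values taken from a quasi-identifier column.
hierarchy = age_hierarchy([34, 27, 34, 41])
```

Each list can be read as one path from a leaf to the root of the tree, so the level index doubles as the hierarchy level used during k-anonymization.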

(1-3-4) k-Anonymization Execution Unit
The k-anonymization execution unit 413 k-anonymizes the anonymization target data 421 using the generalization hierarchy data 422 and outputs the result as anonymized data 423. At the same time, the k-anonymization execution unit 413 outputs the anonymization map data 424 and the anonymization summary data 425, which are incidental information on the result of the k-anonymization. The k-anonymization execution unit 413 performs k-anonymization based on information entered through, for example, the k-anonymization execution screen 700 shown in FIG. 7.

On the k-anonymization execution screen 700 shown in FIG. 7, the name of the anonymization target data stored in the anonymization target data 421 is specified in the anonymization target data name field 701. FIG. 7 shows a case in which operating a pull-down button lists the names, hierarchies, and so on registered in advance, and a specific name or the like can be selected from the displayed items; a name or the like may also be entered directly. The value of k that the k-anonymized data must satisfy is entered in the k-anonymization threshold field 702. FIG. 7 shows an example in which the value is entered directly, but it may instead be selected from items displayed by operating a pull-down button. Some or all of the quasi-identifiers designated in the generalization hierarchy creation unit 412 are specified in the quasi-identifier field 703. FIG. 7 shows an example in which the values are entered directly, but they may instead be selected from a list.

The name of the generalization hierarchy created by the generalization hierarchy creation unit 412 is entered in the generalization hierarchy field 704. In FIG. 7 a name is selected from pre-registered names with a pull-down button, but the name may be entered directly. The method used for k-anonymization is entered in the k-anonymization method field 705. In FIG. 7 the hierarchy-search method is selected, but the method is not limited to this; a clustering method, for example, may be used. Likewise, FIG. 7 shows the method being selected from pre-registered names with a pull-down button, but it may be entered directly. A name identifying the anonymized data 423 to be output is entered in the output data name field 706. FIG. 7 shows direct entry of the value, but it may instead be selected from items displayed by a pull-down button.

After this information has been entered, pressing the execute button 707 at the lower right of the screen runs k-anonymization using the method specified in the k-anonymization method field 705. The resulting data is output as anonymized data 423 under the name specified in the output data name field 706. Incidental information about the result of the k-anonymization is output as well.
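The text does not describe how the hierarchy-search method works internally. Purely to illustrate how the inputs of screen 700 fit together (threshold k, quasi-identifiers, generalization hierarchy), the following Python sketch tries uniform generalization levels for each quasi-identifier, least generalized first, and stops at the first combination that reaches k. The function names, the child-to-parent dictionaries, and the sample records are all assumptions made for this example and are not part of the described device.

```python
from collections import Counter
from itertools import product

def lift(value, levels, parent):
    """Replace value by its ancestor `levels` steps up (stops at the root)."""
    for _ in range(levels):
        value = parent.get(value, value)
    return value

def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records sharing all quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

def hierarchy_search(records, parents, heights, k):
    """Hypothetical search: try level combinations in order of total generalization."""
    qis = list(parents)
    combos = sorted(product(*(range(heights[q] + 1) for q in qis)), key=sum)
    for combo in combos:
        candidate = [
            {**r, **{q: lift(r[q], lv, parents[q]) for q, lv in zip(qis, combo)}}
            for r in records
        ]
        if smallest_group(candidate, qis) >= k:
            return dict(zip(qis, combo)), candidate
    return None, None

# Assumed hierarchies, written as child -> parent.
parents = {
    "age": {"31": "30s", "35": "30s", "38": "30s", "22": "20s", "24": "20s",
            "25": "20s", "28": "20s", "30s": "*", "20s": "*"},
    "sex": {"male": "*", "female": "*"},
}
heights = {"age": 2, "sex": 1}
records = (
    [{"sex": "male", "age": a} for a in ("31", "35", "38")]
    + [{"sex": "female", "age": a} for a in ("22", "24", "25", "28")]
)

levels, anonymized = hierarchy_search(records, parents, heights, k=2)
```

With these sample records, generalizing age by one level and leaving sex unchanged is the first combination that satisfies k = 2. A real implementation would also have to handle deleted records and multiple candidate solutions, which this sketch ignores.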

The items shown in FIG. 7 are only an example, and other items may also be made specifiable, such as the number of plans, the permitted number of deleted records, and the method used to generate the generalization hierarchy.

Next, the anonymization map data 424 and the anonymization summary data 425, which are the incidental information of the output, are described. The anonymization map data 424 is a correspondence table between the pre-conversion and post-conversion values of the quasi-identifiers used when k-anonymization is performed. As shown in FIG. 8, for example, it is expressed as a CSV file with the pre-conversion value in the first column and the post-conversion value in the second column. In this embodiment a separate CSV file in the format of FIG. 8 is prepared for each quasi-identifier, but the conversion information for all quasi-identifiers may be held in a single file, for example a CSV file with the quasi-identifier name in the first column, the pre-conversion value in the second column, and the post-conversion value in the third column.
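The two CSV layouts described above could be read as in the following Python sketch. It assumes the files have no header row and uses made-up sample contents; the function names are hypothetical.

```python
import csv
import io

def load_map(csv_text):
    """Two-column layout: pre-conversion value, post-conversion value."""
    return {row[0]: row[1] for row in csv.reader(io.StringIO(csv_text)) if row}

def load_combined_map(csv_text):
    """Three-column layout: quasi-identifier name, pre-conversion, post-conversion."""
    maps = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if row:
            name, before, after = row
            maps.setdefault(name, {})[before] = after
    return maps

# Assumed sample contents (no header row).
AGE_MAP_CSV = "31,30s\n35,30s\n38,30s\n22,20s\n"
COMBINED_CSV = "age,31,30s\nage,22,20s\nsex,male,male\n"

age_map = load_map(AGE_MAP_CSV)
combined = load_combined_map(COMBINED_CSV)
```

Both loaders produce plain dictionaries, so the per-quasi-identifier files and the single combined file can be used interchangeably downstream.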

The anonymization summary data 425 is statistical information about the k-anonymized data. It includes the k-anonymization threshold specified when k-anonymization was executed, the k-anonymization threshold actually realized by the k-anonymized data, the information loss rate, and the list of post-anonymization quasi-identifier values together with the number of values in that list.

The k-anonymization threshold realized by the k-anonymized data is the minimum number of records in any group when the k-anonymized data is grouped by combination of quasi-identifier values. It may be larger than the k-anonymization threshold specified at execution. For example, when the anonymization target data shown in FIG. 9 is 2-anonymized using the generalization hierarchy shown in the same figure, the anonymized data splits, as shown in FIG. 9, into a group with sex = male and age = 30s containing three records and a group with sex = female and age = 20s containing four records, so the realized k-anonymization threshold is 3.
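The realized threshold in this example can be computed as in the following Python sketch, which groups records by their quasi-identifier values and takes the smallest group size. The record layout (a list of dictionaries) and the function name are assumptions made for illustration.

```python
from collections import Counter

def realized_k(records, quasi_identifiers):
    """Smallest number of records sharing one combination of quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0

# Anonymized data matching the example: 3 records (male, 30s), 4 records (female, 20s).
anonymized = (
    [{"sex": "male", "age": "30s"}] * 3
    + [{"sex": "female", "age": "20s"}] * 4
)

k_realized = realized_k(anonymized, ["sex", "age"])
```

Although 2-anonymization was requested, the smallest group here has three records, so the realized threshold is 3.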

The information loss rate is an index of how much information the anonymized data has lost relative to the anonymization target data, and is defined, for example, by Equation 1 and Equation 2. In Equation 1 and Equation 2, D is the anonymization target data or the anonymized data, q is a quasi-identifier, i is the index of a record in the anonymization target data or the anonymized data, Q is the number of quasi-identifiers, R(D) is the number of records in D, and f(D, i, q) is the frequency with which r(D, i, q), the value of quasi-identifier q in record i of D, occurs among the values of quasi-identifier q in D. log denotes the base-2 logarithm. The base may, however, be any real number greater than 1, such as the base of the natural logarithm e = 2.71828... or the base 10 of the common logarithm, provided the same base is used throughout the definitions.
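The defining formulas are not reproduced in this text, so the following Python sketch should not be read as the definition used by the device. It only shows how the quantities named above, f(D, i, q), R(D), Q, and a base-2 logarithm, could be combined into one plausible entropy-style ratio. The `entropy` and `information_loss` functions are assumptions made for illustration; only `frequency` follows directly from the description of f(D, i, q).

```python
import math
from collections import Counter

def frequency(data, i, q):
    """f(D, i, q): how often record i's value of q occurs in column q of D."""
    value = data[i][q]
    return sum(1 for record in data if record[q] == value)

def entropy(data, quasi_identifiers):
    """ASSUMED measure: mean over q and i of -log2(f(D, i, q) / R(D))."""
    n = len(data)  # R(D)
    if n == 0:
        return 0.0
    total = 0.0
    for q in quasi_identifiers:
        counts = Counter(record[q] for record in data)
        total += sum(-math.log2(counts[record[q]] / n) for record in data)
    return total / (len(quasi_identifiers) * n)  # divide by Q * R(D)

def information_loss(original, anonymized, quasi_identifiers):
    """ASSUMED ratio: share of the original entropy that anonymization removed."""
    base = entropy(original, quasi_identifiers)
    if base == 0:
        return 0.0
    return 1.0 - entropy(anonymized, quasi_identifiers) / base

original = [{"age": a} for a in ("31", "35", "22", "28")]
anonymized = [{"age": a} for a in ("30s", "30s", "20s", "20s")]

loss = information_loss(original, anonymized, ["age"])
```

Under this assumed measure, four distinct ages carry 2 bits per record and two decade groups carry 1 bit per record, giving a loss of 0.5. Using a different logarithm base in both calls would leave the ratio unchanged, which is consistent with the remark about the choice of base.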

In this embodiment, creation of the generalization hierarchy and execution of k-anonymization are described as separate screens, but when a generalization hierarchy is generated, the processing of the generalization hierarchy creation unit and of the k-anonymization execution unit may be carried out as one continuous operation.

The input screen of the k-anonymization execution unit 413 may be as shown in FIG. 7, but it may additionally allow specifying a deleted-record count, that is, the maximum number of exceptional records that may be deleted in order to satisfy k-anonymity. Also, when the hierarchy-search method is selected in the k-anonymization method field 705, there are generally multiple solutions, so an upper limit on the number of solutions to search for may be specifiable.

In addition to the items listed above, when a deleted-record count is specified, the anonymization summary data 425 may include the number of records actually deleted when k-anonymization was executed.

(1-3-5) k-anonymization result output unit
The k-anonymization result output unit 414 outputs the anonymized data 423 resulting from k-anonymization, together with its incidental information, the anonymization map data 424 and the anonymization summary data 425. In this embodiment this information is output through the confirmation screen shown in FIG. 10.

・Output of the anonymized data 423
Selecting the name of the anonymized data in the post-anonymization data field 1001 of the confirmation screen shown in FIG. 10 and pressing the data output button 1002 outputs the corresponding anonymized data 423 as a CSV file, one record per line, with the columns in the order given in its column information. The output file path is specified through a file dialog or the like.

・Output of the anonymization map data 424
Based on the quasi-identifier selected in the quasi-identifier field 1006 of the confirmation screen shown in FIG. 10 and on the generalization hierarchy for that quasi-identifier stored in the generalization hierarchy data 422, the tree structure of the generalization hierarchy data 422 is displayed in the anonymization map field 1009. Then, based on the anonymization map data 424 for the quasi-identifier, the nodes corresponding to post-conversion quasi-identifier values are highlighted. The anonymization map is this tree structure of the generalization hierarchy data 422 with the post-anonymization quasi-identifier values highlighted. A leaf node none of whose ancestors is highlighted is an attribute value that appears unchanged in the anonymized data, just like a post-conversion value, so it is highlighted in the same way.
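The highlighting rule described above, including the special case for leaf nodes with no highlighted ancestor, can be sketched in Python as follows. The tree is assumed to be stored as a child-to-parent dictionary, and the function names are hypothetical.

```python
def ancestors(node, parent):
    """Yield the ancestors of node, nearest first."""
    while node in parent:
        node = parent[node]
        yield node

def highlighted_nodes(parent, anon_map):
    """Nodes to highlight: post-conversion values, plus leaves that stay unchanged."""
    nodes = set(parent) | set(parent.values())
    leaves = nodes - set(parent.values())
    converted = {after for after in anon_map.values() if after in nodes}
    untouched = {
        leaf for leaf in leaves
        if not any(a in converted for a in ancestors(leaf, parent))
    }
    return converted | untouched

# Assumed age hierarchy (child -> parent) and a map that only generalizes the 30s.
parent = {"22": "20s", "28": "20s", "31": "30s", "35": "30s", "20s": "*", "30s": "*"}
anon_map = {"31": "30s", "35": "30s"}

marked = highlighted_nodes(parent, anon_map)
```

Here "30s" is highlighted because it is a post-conversion value, while "22" and "28" are highlighted because they are leaves with no highlighted ancestor and therefore appear unchanged in the anonymized data.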

・Output of the anonymization summary data 425
From the anonymization summary data 425, the k-anonymization threshold specified at execution is output to the k-anonymization threshold field 1003, the k-anonymization threshold realized by the k-anonymized data to the realized k-anonymization threshold field 1004, and the information loss rate to the information loss field 1005. In addition, for the quasi-identifier selected in the quasi-identifier field 1006, all of its post-anonymization values stored in the anonymization summary data 425 are output to the post-anonymization value list field 1007, and the number of those values to the value count field 1008. From this information the user judges the quality of the anonymized data and decides whether the anonymization map needs editing.

(1-3-6) Anonymization map editing unit
Based on the display in the anonymization map field 1009, the anonymization map editing unit 415 presents the correspondence between pre-anonymization and post-anonymization values in a form the user can check. The correspondence can be checked by navigating from a post-anonymization value to its pre-anonymization values, or from a pre-anonymization value to its post-anonymization value. After checking the correspondence on these navigation screens, the user edits the original anonymization map by specializing or generalizing post-anonymization values. Through the anonymization map editing unit 415 the user can see how the values in the realized k-anonymization threshold field 1004 and the information loss field 1005 change, and can thereby arrive at an anonymization map suited to the analysis requirements.

・Navigation from a post-anonymization value to its pre-anonymization values
In this navigation, the anonymization map editing unit 415 starts from a post-anonymization value and displays a screen indicating which pre-anonymization values (that is, quasi-identifier values in the anonymization target data) are anonymized to it. The change in the anonymization map field 1009 is explained with reference to FIG. 11, in which the left diagram shows the post-anonymization quasi-identifier values and the right diagram shows the pre-anonymization quasi-identifier values. In the left diagram, among the nodes highlighted as post-anonymization values (light shading), the node outlined in bold is selected; right-clicking displays a context menu, and selecting "Display value before anonymization" from it brings up the right diagram. As a result, the pre-anonymization values that are anonymized to the selected value are highlighted in a manner different from the post-anonymization values, and the display moves to one of those pre-anonymization values. Because several pre-anonymization values are generally anonymized to the selected value, which of them the display moves to first is arbitrary; in this embodiment it is the one displayed leftmost in the anonymization map field 1009.

The pre-anonymization values that are anonymized to the node specified by the user (outlined in bold in the left diagram) are checked as follows. With a node specified, the user right-clicks to display the context menu and selects "Previous value" or "Next value". The highlight then moves among the pre-anonymization values that are anonymized to the previously selected post-anonymization value, allowing each of them to be checked.
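The navigation just described amounts to inverting the anonymization map and stepping through the result. A minimal Python sketch follows; the `Cursor` class, the wrap-around behavior at either end, and the sample values are assumptions, since the text does not say what happens after the last value.

```python
def originals_of(anonymized_value, anon_map, display_order):
    """Pre-anonymization values mapped to anonymized_value, in left-to-right order."""
    hits = [before for before, after in anon_map.items() if after == anonymized_value]
    return sorted(hits, key=display_order.index)

class Cursor:
    """Tracks which pre-anonymization value currently has the highlight."""

    def __init__(self, values):
        self.values = values
        self.position = 0  # start at the leftmost value, as in the embodiment

    def current(self):
        return self.values[self.position]

    def next_value(self):
        # Wrap-around is an assumption, not stated in the text.
        self.position = (self.position + 1) % len(self.values)
        return self.current()

    def previous_value(self):
        self.position = (self.position - 1) % len(self.values)
        return self.current()

anon_map = {"35": "30s", "22": "20s", "31": "30s", "38": "30s"}
display_order = ["22", "31", "35", "38"]  # left-to-right order of leaves on screen

cursor = Cursor(originals_of("30s", anon_map, display_order))
```

The opposite direction, from a pre-anonymization value to its post-anonymization value, is a single dictionary lookup such as `anon_map["35"]`.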

・Navigation from a pre-anonymization value to its post-anonymization value
In this navigation, the anonymization map editing unit 415 displays a screen indicating which post-anonymization value a given pre-anonymization value is anonymized to. The change in the anonymization map field 1009 is explained with reference to FIG. 12, in which the left diagram shows the pre-anonymization quasi-identifier values and the right diagram shows the post-anonymization quasi-identifier values. In the left diagram, among the nodes highlighted as pre-anonymization values (dark shading), the user selects the node outlined in bold, right-clicks to display the context menu, and selects "Display value after anonymization". The display then moves to the post-anonymization value to which the selected value is anonymized, that is, to the right diagram. At this point the highlighting of the pre-anonymization values is cleared.

・Specialization
In the navigation that specializes a quasi-identifier value used for anonymization, the anonymization map editing unit 415 displays a screen on which a user-specified value, among the quasi-identifier values used for anonymization, is changed to more specific values (values in the lower level of the hierarchy). The change in the anonymization map field 1009 is explained with reference to FIG. 13, in which the left diagram shows the quasi-identifier values before specialization and the right diagram shows them after specialization. In the left diagram, the user right-clicks a highlighted node to display the context menu and selects "Specialize". The post-anonymization value then changes from the single value selected in the left diagram to all of the values one level below it. All of the new values are highlighted, and the highlighting of the originally selected value is cleared. The realized k-anonymization threshold field 1004, the information loss field 1005, the post-anonymization value list field 1007, and the value count field 1008 are then updated to reflect the edit. An undo function may be provided for cases where the user wants to cancel the edit, for example because the edited result no longer satisfies k-anonymity.
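Specialization can be modeled as replacing one highlighted node with its children. The following Python sketch assumes the set of highlighted nodes is kept as a set and the hierarchy as a parent-to-children dictionary; the history list used for undo is likewise only one possible design.

```python
def specialize(selected, node, children):
    """Replace a highlighted node by all values one level below it."""
    if node not in selected:
        raise ValueError(f"{node!r} is not a current post-anonymization value")
    kids = children.get(node, [])
    if not kids:
        raise ValueError(f"{node!r} is a leaf and cannot be specialized")
    return (selected - {node}) | set(kids)

# Assumed age hierarchy, written as parent -> children.
children = {"*": ["20s", "30s"], "20s": ["22", "28"], "30s": ["31", "35"]}

# Keep every state so that an edit can be undone.
history = [frozenset({"*"})]
history.append(frozenset(specialize(history[-1], "*", children)))
history.append(frozenset(specialize(history[-1], "30s", children)))
after_edit = history[-1]

history.pop()  # undo the last specialization
after_undo = history[-1]
```

After each edit, the realized threshold and information loss would be recomputed from the data re-mapped through the new set of highlighted nodes.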

・Generalization
In the navigation that generalizes a quasi-identifier value used for anonymization, the anonymization map editing unit 415 displays a screen on which a user-specified value, among the quasi-identifier values used for anonymization, is changed to a more abstract value (a value in the higher level of the hierarchy). The change in the anonymization map field 1009 is explained with reference to FIG. 14, in which the left diagram shows the quasi-identifier values before generalization and the right diagram shows them after generalization. In the left diagram, the user right-clicks a highlighted node to display the context menu and selects "Generalize". The post-anonymization value then changes from the value selected in the left diagram to the value one level above it. Any other quasi-identifier values that were post-anonymization values before the generalization and that are descendants of the new value (that is, contained in it as lower levels) are likewise changed to the new value. The generalized value is highlighted, and the highlighting of the pre-generalization values that are its descendants is cleared. The realized k-anonymization threshold field 1004, the information loss field 1005, the post-anonymization value list field 1007, and the value count field 1008 are then updated to reflect the edit.

An undo function may be provided for cases where the user wants to cancel the edit, for example because the edit has raised the generalization level so far that the analysis requirements are no longer met.
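Generalization is the reverse operation, with the extra rule that every highlighted descendant of the new value is absorbed into it. A Python sketch under the same assumptions as before (a set of highlighted nodes, a parent-to-children dictionary, hypothetical function names):

```python
def descendants(node, children):
    """Yield every node below `node` in the hierarchy."""
    stack = list(children.get(node, []))
    while stack:
        current = stack.pop()
        yield current
        stack.extend(children.get(current, []))

def generalize(selected, node, parent, children):
    """Move a highlighted node one level up and absorb its highlighted relatives."""
    if node not in selected:
        raise ValueError(f"{node!r} is not a current post-anonymization value")
    if node not in parent:
        raise ValueError(f"{node!r} is the root and cannot be generalized")
    target = parent[node]
    return (selected - set(descendants(target, children))) | {target}

children = {"*": ["20s", "30s"], "20s": ["22", "28"], "30s": ["31", "35"]}
parent = {child: p for p, kids in children.items() for child in kids}

step1 = generalize({"22", "28", "30s"}, "22", parent, children)
step2 = generalize(step1, "20s", parent, children)
```

In the first step, generalizing "22" also removes its sibling "28", because both are descendants of the new value "20s". In the second step, "30s" is absorbed into the root in the same way.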

(1-3-7) Recommended editing content presentation unit
After a specialization or generalization as described above, the recommended editing content presentation unit 417 presents to the user, according to a fixed criterion, the edits that should be performed next. This function is activated, for example, by pressing the recommended editing content presentation button 1012 on the confirmation screen shown in FIG. 10. FIG. 15 shows an example of the recommended editing means presentation screen 1500, which lists the edits that can be applied to the anonymization map in the recommended editing means 1501, in ascending order of the information loss expected after the edit. Each row of the recommended editing means 1501 shows the quasi-identifier to be edited, the node representing its post-anonymization value, the edit to be applied to that node, and the k-anonymization threshold and information loss that the edit so described would achieve. When a deleted-record count is specified in the k-anonymization execution unit 413, the resulting number of deleted records may also be displayed. Selecting one row of the recommended editing means 1501 and pressing the apply button 1502 performs the edit shown in that row. After the edit completes, the edits that can be applied next may be displayed based on the edited anonymization map.
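The rows of the recommended editing means 1501 and their ordering could be represented as in the following Python sketch. The `CandidateEdit` fields mirror the columns described above, but the class itself, the field names, and the sample numbers are illustrative assumptions rather than values from the embodiment.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CandidateEdit:
    quasi_identifier: str     # quasi-identifier to be edited
    node: str                 # node holding the post-anonymization value
    operation: str            # "specialize" or "generalize"
    realized_k: int           # threshold expected after the edit
    information_loss: float   # loss expected after the edit
    deleted_records: int = 0  # optional, when a deleted-record count is in use

def rank_candidates(candidates):
    """Order candidate edits by ascending expected information loss."""
    return sorted(candidates, key=lambda c: c.information_loss)

# Made-up candidates for illustration only.
candidates = [
    CandidateEdit("age", "30s", "generalize", 7, 0.62),
    CandidateEdit("age", "20s", "specialize", 1, 0.18),
    CandidateEdit("sex", "male", "generalize", 7, 0.41),
]

ranked = rank_candidates(candidates)
```

Producing the candidates in the first place would require evaluating each possible edit against the data; this sketch covers only the presentation order.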

(1-3-8) Post-adjustment anonymization execution unit
The post-adjustment anonymization execution unit 416 replaces the quasi-identifier values of the anonymization target data 421 according to the edited anonymization map and outputs the result as anonymized data 423. It can be run, for example, from the re-anonymization execute button 1011 in FIG. 10.
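The replacement step can be sketched in Python as a per-record dictionary lookup. The record layout, the `reanonymize` name, and the choice to leave unmapped values unchanged are assumptions made for this example.

```python
def reanonymize(records, maps):
    """Replace each quasi-identifier value according to the edited maps."""
    result = []
    for record in records:
        converted = dict(record)  # other attributes are copied unchanged
        for quasi_identifier, mapping in maps.items():
            original = record[quasi_identifier]
            # Values without a map entry are kept as they are (assumption).
            converted[quasi_identifier] = mapping.get(original, original)
        result.append(converted)
    return result

target = [
    {"sex": "male", "age": "31", "diagnosis": "A"},
    {"sex": "female", "age": "22", "diagnosis": "B"},
]
edited_maps = {"age": {"31": "30s", "35": "30s"}}

output = reanonymize(target, edited_maps)
```

Because the original records are copied rather than modified, the anonymization target data stays available for further rounds of editing and re-anonymization.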

(1-4) Summary
With the personal information anonymization support device described above, the quasi-identifier values and hierarchy levels used to create the k-anonymized data are highlighted on the generalization hierarchy data displayed as a tree, so the user can easily check them. In addition, the quasi-identifier values used to obtain the k-anonymized data can be adjusted (for example, specialized or generalized) through GUI operations, which makes the adjustment work more flexible and simpler.

(2) Other embodiments
The present invention is not limited to the embodiment described above and includes various modifications. For example, the embodiment has been described in detail to make the invention easy to understand, and the invention need not include every configuration described. A known configuration may be added to the embodiment, part of the embodiment may be deleted, or part of the embodiment may be replaced with a known configuration.

Some or all of the configurations, functions, processing units, processing means, and the like described above may be implemented in hardware, for example by designing them as integrated circuits. Information such as the programs, tables, and files that implement each function can be stored in a storage device such as memory, a hard disk, or an SSD (Solid State Drive), or on a storage medium such as an IC card, SD card, or DVD. The control lines and information lines shown are those considered necessary for the explanation and do not necessarily represent all of the control lines and information lines required in a product. In practice, almost all of the configurations may be considered to be interconnected.

400 … Input/output device
410 … Information processing device
411 … Anonymization target data capturing unit
412 … Generalization hierarchy creation unit
413 … k-anonymization execution unit
414 … k-anonymization result output unit
415 … Anonymization map editing unit
416 … Post-adjustment anonymization execution unit
417 … Recommended editing content presentation unit
420 … Storage device
421 … Anonymization target data
422 … Generalization hierarchy data
423 … Anonymized data
424 … Anonymization map data
425 … Anonymization summary data
600 … Anonymization target data capture screen
601 … Anonymization target data field
602 … Anonymization target data name field
603 … Import button
700 … k-anonymization execution screen
701 … Anonymization target data name field
702 … k-anonymization threshold field
703 … Quasi-identifier field
704 … Generalization hierarchy field
705 … k-anonymization method field
706 … Output data name field
707 … Execute button
1000 … k-anonymization result viewer screen
1001 … Post-anonymization data field
1002 … Data output button
1003 … k-anonymization threshold field
1004 … Realized k-anonymization threshold field
1005 … Information loss field
1006 … Quasi-identifier field
1007 … Post-anonymization value list field
1008 … Value count field
1009 … Anonymization map field
1010 … Anonymization map output button
1011 … Re-anonymization execute button
1012 … Recommended editing content presentation button
1500 … Recommended editing means presentation screen
1501 … Recommended editing means
1502 … Apply button

Claims

An anonymization target data capturing unit for capturing anonymization target data;
A generalized hierarchy creating unit for creating generalized hierarchy data based on the anonymization target data;
A first k anonymization execution unit that anonymizes the anonymization target data using the generalized hierarchical data, outputs k anonymization data that is a result of k anonymization, and accompanying information;
An anonymization map editing unit that displays on a screen an anonymization map in which the values of the quasi-identifiers used to create the k anonymization data are highlighted on the generalized hierarchy data displayed in a tree structure, and that accepts editing input to the anonymization map;
A second k anonymization execution unit that anonymizes the anonymization target data using the anonymized map after editing;
A storage device for storing the anonymization target data, the generalized hierarchy data, the k anonymization data, and the anonymization map;
A trial and error support device for anonymizing personal information.

In the trial and error support device for personal information anonymization according to claim 1,
The anonymization map editing unit displays the amount of information lost at the time of creating the k anonymization data as an information loss rate together with the anonymization map.

In the trial and error support device for personal information anonymization according to claim 1,
The anonymization map editing unit accepts, as the editing input, designation of a quasi-identifier to be edited and designation of either generalization of the designated quasi-identifier to a higher level of the hierarchy or specialization of the designated quasi-identifier to a lower level of the hierarchy.

In the trial and error support device for personal information anonymization according to claim 1,
The anonymization map editing unit accepts designation of the quasi-identifier to be edited by means of a mouse pointer, and accepts designation of generalization of that quasi-identifier to a higher level of the hierarchy or specialization of it to a lower level of the hierarchy from candidate items displayed by right-clicking the mouse.

In the trial and error support device for personal information anonymization according to claim 1,
The anonymization map editing unit, when display of the value after anonymization is selected for a quasi-identifier designated on the anonymization map displayed on the screen, highlights on the anonymization map the pre-anonymization quasi-identifier value corresponding to the value of the designated quasi-identifier.

In the trial and error support device for personal information anonymization according to claim 1,
The anonymization map editing unit, when display of the value before anonymization is selected for a quasi-identifier designated on the anonymization map displayed on the screen, highlights on the anonymization map the value of the quasi-identifier after anonymization that corresponds to the value of the designated quasi-identifier.

In the trial and error support device for personal information anonymization according to claim 1,
The device further comprises a recommended edit content presentation unit that, for each quasi-identifier used to create the anonymization data, displays on the screen the k anonymization threshold assumed when that quasi-identifier is generalized to a higher level of the hierarchy than its current level, together with the information loss rate indicating the amount of information lost when k anonymization data is created using the quasi-identifier after generalization, and/or the k anonymization threshold assumed when that quasi-identifier is specialized to a lower level of the hierarchy than its current level, together with the information loss rate indicating the amount of information lost when k anonymization data is created using the quasi-identifier after specialization.
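
As an illustration only, the recommended edit presentation described in this claim could be sketched as follows. The claims do not define how the k anonymization threshold or the information loss rate is computed, so this sketch assumes that the threshold is the smallest group size obtained after the edit and that the loss rate is the summed generalization level divided by the maximum possible level. All names (`REGION`, `generalize`, `achieved_k`, `loss_rate`, `recommend_edits`) and the sample records are hypothetical.

```python
from collections import Counter

# Hypothetical two-attribute hierarchy: level 0 = original value,
# level 1 = one step more general, level 2 = fully suppressed ("*").
MAX_LEVEL = 2
REGION = {"Tokyo": "Kanto", "Kanagawa": "Kanto", "Osaka": "Kansai", "Kyoto": "Kansai"}


def generalize(attr, value, level):
    """Return the value of one quasi-identifier at the given hierarchy level."""
    if level >= MAX_LEVEL:
        return "*"
    if level == 0:
        return str(value)
    if attr == "age":
        return f"{value // 10 * 10}s"  # e.g. 23 -> "20s"
    return REGION[value]               # prefecture -> region


def achieved_k(records, levels):
    """Smallest group size after generalization (assumed meaning of the k threshold)."""
    rows = [tuple(generalize(a, r[a], levels[a]) for a in sorted(levels)) for r in records]
    return min(Counter(rows).values())


def loss_rate(levels):
    """Assumed loss metric: total level height divided by the maximum possible height."""
    return sum(levels.values()) / (MAX_LEVEL * len(levels))


def recommend_edits(records, levels):
    """List (attribute, edit, k, loss rate) for every one-step edit of the current levels."""
    out = []
    for attr in sorted(levels):
        for step, label in ((+1, "generalize"), (-1, "specialize")):
            new_level = levels[attr] + step
            if 0 <= new_level <= MAX_LEVEL:
                trial = {**levels, attr: new_level}
                out.append((attr, label, achieved_k(records, trial), loss_rate(trial)))
    return out


records = [
    {"age": 23, "prefecture": "Tokyo"},
    {"age": 27, "prefecture": "Kanagawa"},
    {"age": 34, "prefecture": "Osaka"},
    {"age": 38, "prefecture": "Kyoto"},
]
for row in recommend_edits(records, {"age": 1, "prefecture": 1}):
    print(row)
```

Each printed row corresponds to one candidate edit that a user could compare before choosing to generalize or specialize a quasi-identifier.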

An anonymization target data capturing unit for capturing anonymization target data;
A generalized hierarchy creating unit for creating generalized hierarchy data based on the anonymization target data;
A k anonymization execution unit that k-anonymizes the anonymization target data using the generalized hierarchy data and outputs, as a result of the k anonymization, k anonymization data and its incidental information;
An anonymization map display unit that displays on the screen an anonymization map highlighting the value of the quasi-identifier used to create the k anonymization data;
A storage device for storing the anonymization target data, the generalized hierarchy data, the k anonymization data, and the anonymization map;
A trial and error support device for anonymizing personal information.
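
For illustration only, the flow from generalized hierarchy data to k anonymization data recited above can be sketched as follows. This is a minimal exhaustive search under stated assumptions, not the method of the claimed device: the hierarchy, the function names (`generalize_age`, `generalize_prefecture`, `apply_levels`, `k_anonymize`) and the sample records are hypothetical, and the search simply picks the combination of hierarchy levels with the smallest total height that satisfies the requested k.

```python
from collections import Counter
from itertools import product

# Hypothetical generalization hierarchy with three levels per quasi-identifier:
# 0 = original value, 1 = one step more general, 2 = fully suppressed ("*").
MAX_LEVEL = 2
REGION = {"Tokyo": "Kanto", "Kanagawa": "Kanto", "Osaka": "Kansai", "Kyoto": "Kansai"}


def generalize_age(age, level):
    if level == 0:
        return str(age)
    if level == 1:
        return f"{age // 10 * 10}s"  # e.g. 23 -> "20s"
    return "*"


def generalize_prefecture(pref, level):
    if level == 0:
        return pref
    if level == 1:
        return REGION[pref]
    return "*"


GENERALIZERS = {"age": generalize_age, "prefecture": generalize_prefecture}


def apply_levels(records, levels):
    """Replace each quasi-identifier value by its value at the chosen hierarchy level."""
    return [tuple(GENERALIZERS[q](r[q], levels[q]) for q in sorted(levels)) for r in records]


def achieved_k(rows):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    return min(Counter(rows).values())


def k_anonymize(records, k):
    """Return (levels, rows) for the lowest total generalization height reaching k."""
    names = sorted(GENERALIZERS)
    candidates = sorted(product(range(MAX_LEVEL + 1), repeat=len(names)),
                        key=lambda c: (sum(c), c))
    for combo in candidates:
        levels = dict(zip(names, combo))
        rows = apply_levels(records, levels)
        if achieved_k(rows) >= k:
            return levels, rows
    raise ValueError("k exceeds the number of records")


records = [
    {"age": 23, "prefecture": "Tokyo"},
    {"age": 27, "prefecture": "Kanagawa"},
    {"age": 34, "prefecture": "Osaka"},
    {"age": 38, "prefecture": "Kyoto"},
]
levels, rows = k_anonymize(records, k=2)
print(levels, rows)
```

The returned `levels` dictionary plays the role of an anonymization map in this sketch: it records which hierarchy level of each quasi-identifier was used to create the k anonymization data.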