JP2015170169A

JP2015170169A - Personal information anonymization program and personal information anonymization device

Info

Publication number: JP2015170169A
Application number: JP2014045022A
Authority: JP
Inventors: Kazuaki Ihori (井堀 和明); Yuichi Nakamura (中村 雄一)
Original assignee: Hitachi Solutions Ltd
Current assignee: Hitachi Solutions Ltd
Priority date: 2014-03-07
Filing date: 2014-03-07
Publication date: 2015-09-28

Abstract

PROBLEM TO BE SOLVED: To provide a personal information anonymization technique that allows a user to correctly judge the usefulness of anonymized data by presenting an information loss rate that is meaningful to the user. SOLUTION: The personal information anonymization program sets a boundary between the part of the personal information to which simple anonymization is applied and the part to which it is not, and calculates the information loss rate using the former as the reference.

Description

The present invention relates to a technique for anonymizing personal information.

Advances in cloud computing and distributed processing have made it possible to analyze and make use of the large volumes of data accumulated by companies and organizations within a practical amount of time. In many cases, however, this data includes information that can identify an individual, such as a name or address, as well as information that relates to an individual without necessarily identifying them, such as age, occupation, and educational background. Examples include patient medication data in the medical field and individual purchase histories in the retail and distribution field. Such information is called personal information. Because using personal information raises concerns about privacy infringement, privacy protection must be taken into account.

Anonymization is a technique that prevents privacy leaks by processing part or all of the personal information, for example by making it less specific. Anonymization techniques fall broadly into two types: simple anonymization and integrated anonymization. Simple anonymization makes the information about an individual in a record less specific, for example by removing or replacing personal information such as health insurance insurer numbers and names, or by rounding ages and dates. However, there is generally no quantitative guarantee that data obtained by simple anonymization has truly lost the ability to identify individuals. Integrated anonymization makes information about individuals less specific in such a way that an index indicating whether individual identifiability has been lost satisfies a predetermined condition.

It is said that merely deleting the personal information defined by the Personal Information Protection Law is not sufficient to protect privacy. In the medical field, for example, an individual can sometimes be identified by combining attributes such as a patient's age, area of residence, occupation, and sex, none of which identifies the individual on its own. Attributes that make identification possible when combined in this way are called quasi-identifiers. k-anonymization is a representative integrated anonymization technique: it generalizes attribute values so that every combination of quasi-identifier values appears in the personal information at least k times (k is called the anonymization threshold). For example, the attribute value combination "31 years old, Kanagawa Prefecture, system engineer" is generalized to "30s, Kanagawa Prefecture, IT". To generalize the attribute values of the quasi-identifiers, k-anonymization uses a data structure called a generalized hierarchy.
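The k-anonymity condition described above can be illustrated with a short Python sketch. It assumes a toy, one-level generalized hierarchy and hypothetical names (`HIERARCHY`, `generalize`, `is_k_anonymous`); it is not the hierarchy-generation method of Patent Document 1, only an illustration of the rule that every quasi-identifier combination must occur at least k times.

```python
from collections import Counter

# Hypothetical one-level generalized hierarchy per quasi-identifier.
HIERARCHY = {
    "age": lambda v: f"{(int(v) // 10) * 10}s",  # 31 -> "30s"
    "occupation": {"system engineer": "IT", "programmer": "IT"},
}


def generalize(record, quasi_identifiers):
    """Return a copy of the record with each quasi-identifier moved up one level."""
    out = dict(record)
    for qi in quasi_identifiers:
        rule = HIERARCHY.get(qi)
        if callable(rule):
            out[qi] = rule(record[qi])
        elif isinstance(rule, dict):
            out[qi] = rule.get(record[qi], record[qi])
        # No rule for this attribute: the value is left unchanged.
    return out


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(r[qi] for qi in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


records = [
    {"age": 31, "prefecture": "Kanagawa", "occupation": "system engineer"},
    {"age": 35, "prefecture": "Kanagawa", "occupation": "programmer"},
]
qis = ["age", "prefecture", "occupation"]
generalized = [generalize(r, qis) for r in records]
```

In this toy data set the two original records are unique, so the data is not 2-anonymous; after one generalization step both records share the same quasi-identifier combination.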

Patent Document 1 below discloses a technique for obtaining a generalized hierarchy that theoretically minimizes information loss, by generating the hierarchy from the frequency of occurrence of the attribute values of the quasi-identifiers in the personal information.

Patent Document 1: WO2011145401

Whether the information loss caused by k-anonymization is acceptable can be left to the user's judgment, for example by presenting the user with the information loss rate caused by k-anonymization. Depending on the value presented, however, the user may be unable to judge correctly whether the information loss rate is appropriate. The reason is explained below.

When personal information is anonymized, simple anonymization is performed first and k-anonymization is performed afterwards. That is, k-anonymization is normally carried out only after personal information has been deleted or truncated and dates and numeric items have been rounded. Consequently, depending on the reference point used to calculate the information loss rate, the information may appear to the user to have been degraded excessively, or the anonymization may appear to be insufficient.

The present invention has been made in view of the above circumstances. Its object is to provide a personal information anonymization technique that allows the user to correctly judge the usefulness of anonymized data by providing an information loss rate that is meaningful to the user.

The personal information anonymization program of the present invention sets a boundary between the part of the personal information to which simple anonymization is applied and the part to which it is not, and calculates the information loss rate using the former as the reference.

According to the personal information anonymization program of the present invention, the information loss rate is calculated relative to the result of filtering out unnecessary information by simple anonymization in advance, so an information loss rate that is meaningful to the user can be presented. This allows the user to judge the usefulness of the anonymized data more accurately.

A diagram showing the configuration of the personal information anonymization device.
A screen image of the anonymization main screen 300 displayed by the reception unit 200.
A screen image of the anonymization target data capture screen 400.
A flowchart explaining the procedure for capturing the anonymization target data 241 via the anonymization target data capture screen 400.
A screen image of the simple anonymization execution screen 500.
A diagram showing an example of the setting gadget 561.
A diagram showing an example of the setting gadget 561.
A diagram showing an example of the setting gadget 561.
A diagram showing an example of the setting gadget 561.
A flowchart explaining the procedure for performing simple anonymization via the simple anonymization execution screen 500.
A screen image of the k-anonymization setting screen 600.
A flowchart explaining the procedure for setting parameters relating to k-anonymization via the k-anonymization setting screen 600.
A screen image of the generalized hierarchy editing screen 700.
A screen image of the k-anonymization execution screen 800.
A flowchart explaining the procedure for executing k-anonymization via the k-anonymization execution screen 800.
A screen image of the anonymized data output screen 900.
A flowchart explaining the procedure for outputting the anonymized data 246 via the anonymized data output screen 900.
A screen image of the calculation setting screen 1000.
A description example of the calculation definition file 1100.
A class diagram explaining the implementation rules for a calculation class that implements the procedure for calculating the information amount.
A flowchart explaining the procedure for registering a custom calculation class using the calculation setting screen 1000.
A flowchart explaining the procedure by which the information loss rate calculation unit 239 calculates the information amount of each data set.
A diagram showing a configuration example of the anonymization management data 243.
A diagram showing a configuration example of the anonymization attribute management data 244.
A diagram showing a configuration example of the anonymization target data 241.
A diagram showing a configuration example of the generalized hierarchy data 247.

Embodiments of the present invention are described below with reference to the accompanying drawings. Note that the embodiment is merely one example of realizing the present invention and does not limit its technical scope. The embodiment assumes questionnaire data as the anonymization target data, but the invention is not limited to this.

FIG. 1 is a diagram showing the configuration of the personal information anonymization device according to the present invention. The personal information anonymization device includes a reception unit 200, a calculation definition setting unit 210, an anonymization device 230, and an information management device 240. The reception unit 200 is a user interface through which the user instructs the setting and execution of anonymization. The calculation definition setting unit 210 is a user interface through which the user defines how the information loss caused by anonymization is calculated. These devices and functional units are connected by a network 220.

The anonymization device 230 includes a control unit 231, a data capture unit 232, a simple anonymization unit 233, a k-anonymization unit 234, a k-anonymization setting unit 235, a generalized hierarchy editing unit 236, an information amount calculation setting unit 237, an anonymized data output unit 238, and an information loss rate calculation unit 239.

The control unit 231 is an arithmetic function unit that accepts user operations and carries out each process. The data capture unit 232 captures the anonymization target data and sets its name, attribute information, and so on. The simple anonymization unit 233 performs simple anonymization. The k-anonymization unit 234 anonymizes the anonymization target data 241 and outputs it as the anonymized data 246. The k-anonymization setting unit 235 sets the parameters of the k-anonymization to be applied to the data after simple anonymization. The generalized hierarchy editing unit 236 edits and saves the generalized hierarchy of the quasi-identifiers specified by the user. The information amount calculation setting unit 237 sets the information amount calculation method based on the settings accepted by the calculation definition setting unit 210. The anonymized data output unit 238 outputs the k-anonymized data to a file. The information loss rate calculation unit 239 calculates the information loss rate based on the settings made by the information amount calculation setting unit 237.

The information management device 240 is configured using a storage device such as a hard disk device, and stores the anonymization target data 241, calculation origin data 242, anonymization management data 243, anonymization attribute management data 244, simple anonymization data 245, anonymized data 246, and generalized hierarchy data 247.

The anonymization target data 241 is the original personal information before anonymization. The calculation origin data 242 is the data that serves as the reference point for calculating the information loss rate. The anonymization management data 243 holds metadata, such as the names of the tables that store anonymization results, and the setting parameters for k-anonymization. The anonymization attribute management data 244 is setting data covering the attribute information of the anonymization target data 241, the quasi-identifier designation for each attribute, the information amount calculation method, and so on. The simple anonymization data 245 is the data obtained by applying simple anonymization to the anonymization target data 241. The anonymized data 246 is the data obtained by applying k-anonymization to the simple anonymization data 245. The generalized hierarchy data 247 is data defining the generalized hierarchy used in k-anonymization.

The reception unit 200 may be implemented, for example, as a Web page, or as a client-server application using SWT (Standard Widget Toolkit), a graphical user interface toolkit for Java. The anonymization device 230 can be realized, for example, by implementing each functional unit other than the control unit 231 as a software module that the control unit 231 executes, or by implementing equivalent functions in hardware such as circuit devices. The information management device 240 can be implemented, for example, as a relational database or a key-value distributed database.

<Reception unit 200>
FIG. 2 is a screen image of the anonymization main screen 300 displayed by the reception unit 200. The anonymization main screen 300 has a data menu 310, an anonymization menu 320, and a screen display area 330. The data menu 310 is an operation menu for data input and output. The anonymization menu 320 is an operation menu for instructing the setting and execution of anonymization. The screen display area 330 displays a sub-screen according to the menu item selected.

The data menu 310 has a menu 311 for capturing the anonymization target data 241 and a menu 312 for outputting the anonymized data 246 to a file. The anonymization menu 320 has a menu 321 for executing simple anonymization, a menu 322 for setting the parameters of k-anonymization, a menu 323 for starting to edit the generalized hierarchy, and a menu 324 for executing k-anonymization.

<Anonymization target data capture screen 400>
FIG. 3 is a screen image of the anonymization target data capture screen 400. When the user selects the menu 311 in FIG. 2, the reception unit 200 displays the anonymization target data capture screen 400 in the screen display area 330. On this screen the user captures the source data of the anonymization target data 241 and sets the data name, the column information contained in the data, the columns to be analyzed, and the information amount calculation method for each column to be analyzed. The items on the screen are described together with FIG. 4.

FIG. 4 is a flowchart explaining the procedure for capturing the anonymization target data 241 via the anonymization target data capture screen 400. Each step of FIG. 4 is described below.

(FIG. 4: Steps S401 to S402)
The user selects the menu 311 (capture) from the data menu 310 of the anonymization main screen 300 (S401). The reception unit 200 displays the anonymization target data capture screen 400 in the screen display area 330 (S402).

(FIG. 4: Step S403)
On the anonymization target data capture screen 400, the user enters a data name 410, a data file 420, and column information 430. The data name 410 is an arbitrary name given to distinguish the data being captured. The data file 420 is the path name of the data file that contains the personal information to be captured. The column information 430 is a field for displaying and editing the columns contained in the data file. The data file 420 may be specified, for example, by opening a file dialog and selecting the data file to capture.

(FIG. 4: Step S403: Details of the column information 430)
The column information 430 has a column name 431, a type 432, and an analysis target designation 433. When the captured data file is, for example, a CSV (Comma Separated Values) file, the file itself contains only raw data and does not describe the attributes of its columns. The user therefore specifies this attribute information in the column information 430 and thereby notifies the anonymization device 230 of it. The column name 431 is the name of each column. The type 432 is the data type of each column. The analysis target designation 433 specifies whether each column is to be analyzed. For each column whose analysis target designation 433 is turned on, the user additionally specifies an information amount calculation method 434 and a coefficient 435. The information amount calculation method 434 specifies how the information amount of the column is calculated; besides the default calculation methods, a custom calculation method set via the calculation definition setting unit 210 can also be selected. The procedure for specifying a custom calculation method is described later. The coefficient 435 is a coefficient used in the anonymization process and is specified as a positive real number.
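As a rough illustration of the column information 430, the per-column settings could be modeled as a small record type. The sketch below is an assumption for illustration only: the class name `ColumnInfo`, its field names, the method name `"entropy"`, and the `validate` check are hypothetical and do not come from the specification, which only states which items exist and that the coefficient is a positive real number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnInfo:
    name: str                            # column name 431
    dtype: str                           # type 432
    analyze: bool = False                # analysis target designation 433
    calc_method: Optional[str] = None    # information amount calculation method 434
    coefficient: Optional[float] = None  # coefficient 435

    def validate(self):
        """Check the extra settings required for columns marked as analysis targets."""
        if self.analyze:
            if not self.calc_method:
                raise ValueError(f"{self.name}: calculation method is required")
            if self.coefficient is None or self.coefficient <= 0:
                raise ValueError(f"{self.name}: coefficient must be a positive real number")
        return self


# "entropy" is a hypothetical method name used only for this example.
columns = [
    ColumnInfo("name", "string").validate(),
    ColumnInfo("age", "integer", analyze=True, calc_method="entropy", coefficient=1.0).validate(),
]
```

Columns that are not analysis targets need no calculation method or coefficient, which mirrors the way the screen asks for those two items only when the designation is turned on.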

(FIG. 4: Steps S404 to S405)
When the user presses the capture button 440 after entering each item, the reception unit 200 sends the data name 410, the contents of the data file 420, and the column information 430 to the anonymization device 230, and the control unit 231 receives them (S404). The data capture unit 232 stores the contents of the data file 420 as the anonymization target data 241 (S405).

(FIG. 4: Step S406)
The data capture unit 232 writes the values entered on the anonymization target data capture screen 400 to the anonymization management data 243 and the anonymization attribute management data 244. The structure of these data is described later. In this step, only the table name and the data name are written to the anonymization management data 243, and the column information is written to the anonymization attribute management data 244 under the same table name as in the anonymization management data 243. The table name is set automatically to a name that is unique across the entire personal information anonymization device. The same applies to the other table names.

(FIG. 4: Steps S407 to S408)
The data capture unit 232 notifies the control unit 231 that processing is complete (S407). The control unit 231 notifies the reception unit 200 via the network 220 that processing is complete (S408).

<Simple anonymization execution screen 500>
FIG. 5A is a screen image of the simple anonymization execution screen 500. When the user selects the menu 321 in FIG. 2, the reception unit 200 displays the simple anonymization execution screen 500 in the screen display area 330. On this screen the user sets the parameters for simple anonymization and executes simple anonymization based on those settings. The items on the screen are described together with FIG. 6.

FIGS. 5B to 5E are diagrams showing examples of the setting gadget 561 described later. The items that can be set in the setting gadget 561 differ for each simple anonymization method. What all methods have in common is that the column to which the method is applied must be specified.

FIG. 6 is a flowchart explaining the procedure for performing simple anonymization via the simple anonymization execution screen 500. Each step of FIG. 6 is described below.

(FIG. 6: Steps S601 to S602)
The user selects the menu 321 from the anonymization menu 320 of the anonymization main screen 300 (S601). The reception unit 200 displays the simple anonymization execution screen 500 in the screen display area 330 (S602).

(FIG. 6: Step S603)
The user enters the input data name 510. The input data name 510 may instead be selected from the "data names" stored in the anonymization management data 243. The user enters the name of the data to be output as the result of simple anonymization in the output data name 520. The user selects the anonymization method to apply from the anonymization methods 540 and presses the add button 550. The setting gadget 561 for the selected simple anonymization method is then added to the anonymization settings 560.

(FIG. 6: Step S603: Details of the setting gadget 561)
The setting gadget 561 is a GUI (Graphical User Interface) for setting the parameters of an individual simple anonymization method. Clicking the title part of a setting gadget 561 switches the setting target. In addition to the individual anonymization methods, the anonymization methods 540 include a boundary 564, one or more of which can be added to the anonymization settings 560. The calculation origin data 242 is the data obtained by applying simple anonymization to the anonymization target data 241 from its first column up to the last column before the boundary 564. The user can reorder the anonymization settings and the boundary 564 with the up button 562 and the down button 563. To delete an anonymization setting, the user selects the setting gadget 561 to be deleted and presses the delete button 551. When the settings for simple anonymization are complete, the user presses the execute button 570.

(FIG. 6: Steps S604 to S605)
The reception unit 200 sends a command to execute simple anonymization, together with the setting parameters the user entered on the simple anonymization execution screen 500, to the anonymization device 230 via the network 220 (S604). The control unit 231 receives the command and the setting parameters and passes them to the simple anonymization unit 233 (S605).

(FIG. 6: Step S606)
Following the order in which the setting gadgets 561 are displayed, the simple anonymization unit 233 applies simple anonymization to the columns of the anonymization target data 241 that are configured before the boundary 564, and stores the result as the calculation origin data 242. The calculation origin data 242 is stored as a database table. The simple anonymization unit 233 writes the name of that table to the anonymization management data 243 as the origin table name.

(FIG. 6: Step S607)
Following the order in which the setting gadgets 561 are displayed, the simple anonymization unit 233 applies simple anonymization to the columns of the calculation origin data 242 that are configured after the boundary, and stores the result as the simple anonymization data 245. The simple anonymization data 245 is stored as a database table. The simple anonymization unit 233 writes the table name and the data name entered on the simple anonymization execution screen 500 to the anonymization management data 243 as the simple anonymization table name and the simple anonymization data name, respectively.
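Steps S606 and S607 can be pictured as one ordered pipeline in which the boundary marks where a snapshot is taken. The following Python sketch is illustrative only: `BOUNDARY`, `drop_column`, `round_down`, and `run_simple_anonymization` are hypothetical names, the two anonymization steps are stand-ins for whatever methods the setting gadgets configure, and the handling of a missing or repeated boundary is an assumption, since the text does not specify it.

```python
BOUNDARY = object()  # hypothetical marker standing in for the boundary 564


def drop_column(name):
    """Simple anonymization step (illustrative): remove a column."""
    def step(rows):
        return [{k: v for k, v in row.items() if k != name} for row in rows]
    return step


def round_down(name, unit):
    """Simple anonymization step (illustrative): round a numeric column down."""
    def step(rows):
        return [{**row, name: (row[name] // unit) * unit} for row in rows]
    return step


def run_simple_anonymization(rows, settings):
    """Apply settings in display order and snapshot the data at the boundary.

    Returns (origin, simple): the baseline for the loss rate and the final result.
    """
    origin = None
    current = rows
    for setting in settings:
        if setting is BOUNDARY:
            if origin is None:  # assumption: only the first boundary counts
                origin = current
        else:
            current = setting(current)
    if origin is None:  # assumption: no boundary means the input is the baseline
        origin = rows
    return origin, current


rows = [{"name": "A", "age": 31}, {"name": "B", "age": 47}]
settings = [drop_column("name"), BOUNDARY, round_down("age", 10)]
origin, simple = run_simple_anonymization(rows, settings)
```

Here `origin` plays the role of the calculation origin data 242 (names removed, ages intact) and `simple` plays the role of the simple anonymization data 245 (ages rounded down to the decade). Each step returns new rows, so the snapshot is not altered by later steps.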

(FIG. 6: Step S608)
The information loss rate calculation unit 239 obtains the information amount of the calculation origin data 242 and of the simple anonymization data 245, and calculates the information loss rate. The procedure for calculating the information amount is described later. The information loss rate is obtained by the following formula.

(FIG. 6: Steps S609 to S611)
The simple anonymization unit 233 notifies the control unit 231 of the information loss rate (S609). The control unit 231 notifies the reception unit 200 of the information loss rate via the network 220 (S610). The reception unit 200 displays the received information loss rate in the "processing result: information loss rate" field 530 of the simple anonymization execution screen 500 (S611).

The significance of the calculation origin data 242 is as follows. Suppose, for example, that the anonymization target data 241 holds information about individuals' occupations and that the occupations are finely classified by type of business. The analyst may not need this fine classification; a coarser classification level may be sufficient. If the information amount is then calculated with the anonymization target data 241 as the reference, it can look as though far more information has been lost than the analyst would expect. From the analyst's standpoint, what is needed is occupation information at the coarse classification level, so it is appropriate to regard generalizing occupations up to that level as causing no information loss. By calculating the information loss rate with the calculation origin data 242 as the reference, the calculation starts from a state in which the level of information that is necessary and sufficient for the analyst is retained, and an information loss rate appropriate for the analyst can be obtained.
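The effect of the choice of baseline can be seen in a small worked example. The Python sketch below makes two loudly labeled assumptions that are not taken from the specification: it uses Shannon entropy of a single column as a stand-in for the configurable information amount, and it defines the loss rate as the relative reduction of that amount against the baseline. The function names `entropy` and `loss_rate` and the sample values are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits; an assumed stand-in for the information amount."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def loss_rate(baseline, anonymized):
    """Assumed definition: relative reduction of information amount vs. the baseline."""
    base = entropy(baseline)
    return (base - entropy(anonymized)) / base if base else 0.0


# Occupation column at three stages (illustrative values only).
original = ["web engineer", "db engineer", "nurse", "surgeon"]  # target data 241
origin = ["IT", "IT", "medical", "medical"]                     # origin data 242
anonymized = ["IT", "IT", "medical", "medical"]                 # nothing further lost

rate_vs_original = loss_rate(original, anonymized)
rate_vs_origin = loss_rate(origin, anonymized)
```

Under these assumptions, measuring against the original data reports that half of the information was lost, while measuring against the calculation origin data reports no loss, which matches the analyst's view that the coarse classification is all that was needed.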

＜ｋ匿名化設定画面６００＞
図７は、ｋ匿名化設定画面６００の画面イメージである。ユーザが図２におけるメニュー３２２を選択すると、受付部２００は画面表示エリア３３０内にｋ匿名化設定画面６００を表示する。ユーザはｋ匿名化設定画面６００を用いて、ｋ匿名化に関するパラメータ等を設定する。画面上の各項目については図８と併せて説明する。 <K anonymization setting screen 600>
FIG. 7 is a screen image of the k anonymization setting screen 600. When the user selects the menu 322 in FIG. 2, the reception unit 200 displays the k anonymization setting screen 600 in the screen display area 330. The user uses the k anonymization setting screen 600 to set parameters and other settings related to k anonymization. Each item on the screen is described together with FIG. 8.

図８は、ｋ匿名化設定画面６００を介してｋ匿名化に関するパラメータを設定する手順を説明するフローチャートである。以下図８の各ステップについて説明する。 FIG. 8 is a flowchart for explaining a procedure for setting parameters relating to k anonymization via the k anonymization setting screen 600. Hereinafter, each step of FIG. 8 will be described.

（図８：ステップＳ８０１〜Ｓ８０２）
ユーザは、匿名化メイン画面３００の匿名化メニュー３２０からメニュー３２２を選択する（Ｓ８０１）。受付部２００は、画面表示エリア３３０内にｋ匿名化設定画面６００を表示する（Ｓ８０２）。 (FIG. 8: Steps S801 to S802)
The user selects the menu 322 from the anonymization menu 320 of the anonymization main screen 300 (S801). The reception unit 200 displays the k anonymization setting screen 600 in the screen display area 330 (S802).

（図８：ステップＳ８０３）
ユーザは、処理対象データ名６１０を入力する。処理対象データ名６１０は、匿名化管理データ２４３が格納している「単純匿名化データ名」の中から選択するようにすることもできる。ユーザは、ｋ匿名化閾値６２０（２以上の整数）を入力する。ユーザは準識別子として用いるカラムを準識別子６３４のチェックボックスによって選択し、選択したカラムについてはさらに優先順位６３５を入力する。連番６３１、名前６３２、型６３３は読取専用である。優先順位６３５は、ｋ匿名化を実行するときに匿名化処理前の値を保存する優先順位である。言い換えると、一般化階層によって値を置き換えるときに、遷移する階層数をできるだけ小さくしたい準識別子を優先順に指定するものである。準識別子の値は１以上の整数で、値が小さいほど高優先順位となる。ユーザは全ての値を入力すると設定ボタン６４０を押す。 (FIG. 8: Step S803)
The user inputs the processing target data name 610. The processing target data name 610 may instead be selected from the “simple anonymized data names” stored in the anonymization management data 243. The user inputs the k anonymization threshold 620 (an integer of 2 or more). The user selects the columns to be used as quasi-identifiers with the check boxes of the quasi-identifier 634, and inputs a priority 635 for each selected column. The serial number 631, the name 632, and the type 633 are read-only. The priority 635 is the order of priority for preserving pre-anonymization values when k anonymization is executed. In other words, it specifies, in priority order, the quasi-identifiers for which the number of hierarchy levels traversed should be kept as small as possible when values are replaced according to the generalized hierarchy. The priority value is an integer of 1 or more, and a smaller value means a higher priority. After inputting all values, the user presses the setting button 640.

（図８：ステップＳ８０４）
受付部２００は、ユーザがｋ匿名化設定画面６００上で入力した各設定パラメータと併せて、これら設定パラメータを記録するよう指示する命令を、匿名化装置２３０に対してネットワーク２２０を経由して送信する。制御部２３１は、その命令と設定パラメータを受信し、ｋ匿名化設定部２３５に引き渡す。 (FIG. 8: Step S804)
The reception unit 200 transmits, to the anonymization device 230 via the network 220, the setting parameters input by the user on the k anonymization setting screen 600 together with a command instructing that these setting parameters be recorded. The control unit 231 receives the command and the setting parameters, and passes them to the k anonymization setting unit 235.

（図８：ステップＳ８０５）
ｋ匿名化設定部２３５は、ユーザがｋ匿名化設定画面６００上で入力したｋ匿名化設定パラメータを各データに書き込む。具体的には以下の通りである。ｋ匿名化閾値６２０は匿名化管理データ２４３のｋ値に書き込まれる。準識別子６３４によって選択されたカラムについては優先順位６３５を匿名化属性管理データ２４４の優先順位に書き込む。準識別子６３４によって選択されなかったカラムについては０（ゼロ）を匿名化属性管理データ２４４の優先順位に書き込む。 (FIG. 8: Step S805)
The k anonymization setting unit 235 writes the k anonymization setting parameters input by the user on the k anonymization setting screen 600 to the corresponding data, as follows. The k anonymization threshold 620 is written to the k value of the anonymization management data 243. For each column selected with the quasi-identifier 634, the priority 635 is written to the priority field of the anonymized attribute management data 244. For each column not selected with the quasi-identifier 634, 0 (zero) is written to the priority field of the anonymized attribute management data 244.
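The priority rule of step S805 can be sketched as follows. This is only an illustration under stated assumptions: the class name PrioritySketch, the method name storedPriorities, and the use of in-memory maps in place of the anonymized attribute management data 244 are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the priority values stored in step S805. */
final class PrioritySketch {

    /**
     * Returns the priority to store for each column: the user-entered priority
     * for columns checked as quasi-identifiers, and 0 for every other column.
     */
    static Map<String, Integer> storedPriorities(List<String> columns,
                                                 Map<String, Integer> selectedPriorities) {
        Map<String, Integer> stored = new LinkedHashMap<>();
        for (String column : columns) {
            Integer priority = selectedPriorities.get(column);
            if (priority != null && priority < 1) {
                // The screen accepts only integers of 1 or more as a priority.
                throw new IllegalArgumentException("priority must be 1 or more: " + column);
            }
            stored.put(column, priority == null ? 0 : priority);
        }
        return stored;
    }
}
```

Because valid priorities start at 1, a stored value of 0 can double as the marker for "not a quasi-identifier", so no separate flag is needed in this sketch.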

（図８：ステップＳ８０６）
一般化階層編集部２３６は、準識別子６３４によって選択されたカラムについて一般化階層を自動生成し、一般化階層データ２４７として書き込む。具体的な手順としては、例えば特許文献１に示す技術を用いることができる。 (FIG. 8: Step S806)
The generalized hierarchy editing unit 236 automatically generates a generalized hierarchy for each column selected with the quasi-identifier 634 and writes it as the generalized hierarchy data 247. As a specific procedure, the technique disclosed in Patent Document 1 can be used, for example.

（図８：ステップＳ８０７〜Ｓ８０８）
ｋ匿名化設定部２３５は制御部２３１に対してｋ匿名化設定完了を通知する（Ｓ８０７）。制御部２３１は受付部２００に対してネットワーク２２０経由でｋ匿名化設定完了を通知する（Ｓ８０８）。 (FIG. 8: Steps S807 to S808)
The k anonymization setting unit 235 notifies the control unit 231 of the completion of k anonymization setting (S807). The control unit 231 notifies the accepting unit 200 of the completion of the k anonymization setting via the network 220 (S808).

＜一般化階層編集画面７００＞
図９は、一般化階層編集画面７００の画面イメージである。ユーザが図２におけるメニュー３２３を選択すると、受付部２００は画面表示エリア３３０内に一般化階層編集画面７００を表示する。ユーザは一般化階層編集画面７００を用いて、分析に適した匿名化データ２４６が得られるように一般化階層２４７を編集し、一般化階層データ２４７に改めて格納する。 <Generalized hierarchy editing screen 700>
FIG. 9 is a screen image of the generalized hierarchy editing screen 700. When the user selects the menu 323 in FIG. 2, the reception unit 200 displays the generalized hierarchy editing screen 700 in the screen display area 330. Using the generalized hierarchy editing screen 700, the user edits the generalized hierarchy 247 so that anonymized data 246 suitable for analysis is obtained, and stores the edited hierarchy again in the generalized hierarchy data 247.

＜ｋ匿名化実行画面８００＞
図１０は、ｋ匿名化実行画面８００の画面イメージである。ユーザが図２におけるメニュー３２４を選択すると、受付部２００は画面表示エリア３３０内にｋ匿名化実行画面８００を表示する。ユーザはｋ匿名化実行画面８００を用いてｋ匿名化を実行し、その結果を匿名化データ２４６として出力する。画面上の各項目については図１１と併せて説明する。 <K anonymization execution screen 800>
FIG. 10 is a screen image of the k anonymization execution screen 800. When the user selects the menu 324 in FIG. 2, the reception unit 200 displays the k anonymization execution screen 800 in the screen display area 330. The user executes k anonymization using the k anonymization execution screen 800 and outputs the result as the anonymized data 246. Each item on the screen is described together with FIG. 11.

図１１は、ｋ匿名化実行画面８００を介してｋ匿名化を実行する手順を説明するフローチャートである。以下図１１の各ステップについて説明する。 FIG. 11 is a flowchart illustrating a procedure for executing k anonymization via the k anonymization execution screen 800. Hereinafter, each step of FIG. 11 will be described.

（図１１：ステップＳ１１０１〜Ｓ１１０２）
ユーザは、匿名化メイン画面３００の匿名化メニュー３２０からメニュー３２４を選択する（Ｓ１１０１）。受付部２００は、画面表示エリア３３０にｋ匿名化実行画面８００を表示する（Ｓ１１０２）。 (FIG. 11: Steps S1101 to S1102)
The user selects the menu 324 from the anonymization menu 320 of the anonymization main screen 300 (S1101). The reception unit 200 displays the k anonymization execution screen 800 in the screen display area 330 (S1102).

（図１１：ステップＳ１１０３）
ユーザは、処理対象データ名８１０を入力する。処理対象データ名８１０は、匿名化管理データ２４３が保持している「単純匿名化データ名」の中から選択するようにすることもできる。ユーザは出力される匿名化データ２４６のデータ名８２０を入力し、実行ボタン８４０を押す。 (FIG. 11: Step S1103)
The user inputs a processing target data name 810. The processing target data name 810 can be selected from “simple anonymized data names” held in the anonymization management data 243. The user inputs the data name 820 of the anonymized data 246 to be output and presses the execution button 840.

（図１１：ステップＳ１１０４）
受付部２００は、ユーザがｋ匿名化実行画面８００上で入力した処理対象データ名８１０と出力データ名８２０と併せて、ｋ匿名化を実行するよう指示する命令を、匿名化装置２３０に対してネットワーク２２０を経由して送信する。制御部２３１は、その命令と各データ名を受信し、ｋ匿名化部２３４に引き渡す。 (FIG. 11: Step S1104)
The reception unit 200 transmits, to the anonymization device 230 via the network 220, the processing target data name 810 and the output data name 820 input by the user on the k anonymization execution screen 800, together with a command instructing that k anonymization be executed. The control unit 231 receives the command and the data names, and passes them to the k anonymization unit 234.

（図１１：ステップＳ１１０５）
ｋ匿名化部２３４は、匿名化管理データ２４３のレコードの中から、単純匿名化データ名が処理対象データ名８１０と合致するレコードを特定する。ｋ匿名化部２３４は、そのレコードのｋ値（ｋ匿名化閾値）を満たすような一般化階層のレイヤの組み合わせを算出する。 (FIG. 11: Step S1105)
The k anonymization unit 234 specifies a record whose simple anonymized data name matches the processing target data name 810 from among the records of the anonymization management data 243. The k anonymization unit 234 calculates a combination of layers in the generalized hierarchy that satisfies the k value (k anonymization threshold) of the record.

（図１１：ステップＳ１１０６）
ｋ匿名化部２３４は、算出した組み合わせに従って単純匿名化データ２４５の準識別子の属性値を置き換えたレコードを作成し、匿名化データ２４６に書き出す。ｋ匿名化閾値を満たすような一般化階層のレイヤの組み合わせについては、公知の技術を用いて求めることができる。匿名化データ２４６はデータベースのテーブル形式で格納される。ｋ匿名化部２３４は、テーブル名とｋ匿名化実行画面８００上で入力された出力データ名８２０を、それぞれ匿名化管理データ２４３の匿名化テーブル名と匿名化データ名として書き込む。 (FIG. 11: Step S1106)
The k anonymization unit 234 creates a record in which the attribute value of the quasi-identifier of the simple anonymization data 245 is replaced according to the calculated combination, and writes it in the anonymization data 246. A combination of layers in the generalized hierarchy that satisfies the k anonymization threshold can be obtained using a known technique. Anonymized data 246 is stored in a database table format. The k anonymization unit 234 writes the table name and the output data name 820 input on the k anonymization execution screen 800 as the anonymization table name and the anonymization data name of the anonymization management data 243, respectively.
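The condition that a candidate combination of layers must meet can be sketched as a check on group sizes. This is a minimal sketch of the standard k-anonymity condition, not the search for the layer combination itself (for which the text defers to known techniques). The class name KAnonymityCheck and the representation of records as maps from column name to value are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical check of the k-anonymity condition on already generalized records. */
final class KAnonymityCheck {

    /**
     * Returns true when every combination of quasi-identifier values that occurs
     * in the records occurs in at least k records.
     */
    static boolean isKAnonymous(List<Map<String, String>> records,
                                List<String> quasiIdentifiers, int k) {
        Map<List<String>, Integer> groupSizes = new HashMap<>();
        for (Map<String, String> record : records) {
            // The group key is the tuple of quasi-identifier values of this record.
            List<String> key = new ArrayList<>();
            for (String column : quasiIdentifiers) {
                key.add(record.get(column));
            }
            groupSizes.merge(key, 1, Integer::sum);
        }
        for (int size : groupSizes.values()) {
            if (size < k) {
                return false;
            }
        }
        return true;
    }
}
```

A search over layer combinations would apply a check of this kind to each candidate generalization and keep the candidates that pass, preferring those that move high-priority quasi-identifiers up the fewest hierarchy levels.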

（図１１：ステップＳ１１０７）
情報損失率算出部２３９は、計算起点データ２４２と匿名化データ２４６それぞれの情報量を求め、情報損失率を計算する。情報量の計算手順については後述する。情報損失率は、下記式によって求める。 (FIG. 11: Step S1107)
The information loss rate calculation unit 239 calculates the information loss rate by obtaining the information amounts of the calculation starting point data 242 and the anonymized data 246, respectively. The information amount calculation procedure will be described later. The information loss rate is obtained by the following formula.

（図１１：ステップＳ１１０８〜Ｓ１１１０）
ｋ匿名化部２３４は、制御部２３１に対して情報損失率を通知する（Ｓ１１０８）。制御部２３１は、ネットワーク２２０を経由して、受付部２００に対して情報損失率を通知する（Ｓ１１０９）。受付部２００は、受信した情報損失率をｋ匿名化実行画面８００の処理結果：情報損失率８３０に表示する（Ｓ１１１０）。 (FIG. 11: Steps S1108 to S1110)
The k anonymization unit 234 notifies the control unit 231 of the information loss rate (S1108). The control unit 231 notifies the reception unit 200 of the information loss rate via the network 220 (S1109). The reception unit 200 displays the received information loss rate in the field "Processing result: information loss rate" 830 of the k anonymization execution screen 800 (S1110).

＜匿名化データ出力画面９００＞
図１２は、匿名化データ出力画面９００の画面イメージである。ユーザが図２におけるメニュー３１２を選択すると、受付部２００は画面表示エリア３３０内に匿名化データ出力画面９００を表示する。ユーザは匿名化データ出力画面９００を用いてｋ匿名化が完了した匿名化データ２４６（例えばＣＳＶ形式）を出力する。画面上の各項目については図１３と併せて説明する。 <Anonymized data output screen 900>
FIG. 12 is a screen image of the anonymized data output screen 900. When the user selects the menu 312 in FIG. 2, the reception unit 200 displays the anonymized data output screen 900 in the screen display area 330. Using the anonymized data output screen 900, the user outputs the anonymized data 246 for which k anonymization has been completed (for example, in CSV format). Each item on the screen is described together with FIG. 13.

図１３は、匿名化データ出力画面９００を介して匿名化データ２４６を出力する手順を説明するフローチャートである。以下図１３の各ステップについて説明する。 FIG. 13 is a flowchart for explaining the procedure for outputting the anonymized data 246 via the anonymized data output screen 900. Hereinafter, each step of FIG. 13 will be described.

（図１３：ステップＳ１３０１〜Ｓ１３０３）
ユーザは、匿名化メイン画面３００のデータメニュー３１０からメニュー３１２を選択する（Ｓ１３０１）。受付部２００は、画面表示エリア３３０に匿名化データ出力画面９００を表示する（Ｓ１３０２）。ユーザは、出力対象データ名９１０を入力して出力ボタン９２０を押す（Ｓ１３０３）。出力対象データ名９１０は、ｋ匿名化が完了した匿名化データ２４６の名称から選択できるようにしてもよい。 (FIG. 13: Steps S1301 to S1303)
The user selects the menu 312 from the data menu 310 of the anonymization main screen 300 (S1301). The reception unit 200 displays the anonymized data output screen 900 in the screen display area 330 (S1302). The user inputs the output target data name 910 and presses the output button 920 (S1303). The output target data name 910 may be made selectable from the names of the anonymized data 246 for which k anonymization has been completed.

（図１３：ステップＳ１３０４）
受付部２００は、ユーザが匿名化データ出力画面９００上で入力した出力対象データ名９１０と併せて、ネットワーク２２０を経由してデータ出力命令を匿名化装置２３０に対して送信する。制御部２３１は、その命令と出力対象データ名９１０を受信し、匿名化データ出力部２３８に引き渡す。 (FIG. 13: Step S1304)
The reception unit 200 transmits a data output command to the anonymization device 230 via the network 220, together with the output target data name 910 input by the user on the anonymized data output screen 900. The control unit 231 receives the command and the output target data name 910, and passes them to the anonymized data output unit 238.

（図１３：ステップＳ１３０５〜Ｓ１３０７）
匿名化データ出力部２３８は、出力対象データ名９１０によって指定される匿名化データ２４６のレコードを読み出してファイル出力する（Ｓ１３０５）。匿名化データ出力部２３８は、出力したファイルを制御部２３１に返す（Ｓ１３０６）。制御部２３１は、匿名化データ出力部２３８から受信したファイルを、ネットワーク２２０を経由して受付部２００へ転送する（Ｓ１３０７）。 (FIG. 13: Steps S1305 to S1307)
The anonymized data output unit 238 reads the records of the anonymized data 246 designated by the output target data name 910 and outputs them to a file (S1305). The anonymized data output unit 238 returns the output file to the control unit 231 (S1306). The control unit 231 transfers the file received from the anonymized data output unit 238 to the reception unit 200 via the network 220 (S1307).

（図１３：ステップＳ１３０８）
受付部２００は、転送されたファイルを受信し、匿名化データ出力画面９００上でファイル保存ダイアログを開く。ユーザはファイル保存ダイアログ上でファイルの保存先を指定する。受付部２００は、指定されたファイル名でファイルを保存する。 (FIG. 13: Step S1308)
The reception unit 200 receives the transferred file and opens a file save dialog on the anonymized data output screen 900. The user designates the save destination of the file in the file save dialog. The reception unit 200 saves the file under the designated file name.

＜計算設定画面１０００＞
図１４は、計算設定画面１０００の画面イメージである。計算定義設定部２１０は、受付部２００と同様に例えばＷｅｂアプリケーションとして実装することができる。計算定義設定部２１０は、計算設定画面１０００を提供する。計算設定画面１０００は、情報量を計算する手順を実装したカスタム計算モジュールを登録するために用いる画面である。画面上の各項目については図１７と併せて説明する。 <Calculation setting screen 1000>
FIG. 14 is a screen image of the calculation setting screen 1000. Like the reception unit 200, the calculation definition setting unit 210 can be implemented as a Web application, for example. The calculation definition setting unit 210 provides the calculation setting screen 1000. The calculation setting screen 1000 is used to register a custom calculation module that implements a procedure for calculating the information amount. Each item on the screen is described together with FIG. 17.

＜計算定義ファイル１１００＞
図１５は、計算定義ファイル１１００の記載例である。計算定義ファイル１１００は、情報量の計算方法の名称とその計算方法を実装したＪａｖａクラス名を記述するテキストファイルである。計算方法の名称は、”ａｎｏｎｙｍｉｚｅ．ｃａｌｃｕｌａｔｉｏｎ．ｃｌａｓｓ．”に、計算設定画面１０００上で入力された計算方法の名称１０１０を連結したものである。計算方法を実装したＪａｖａクラスは、後述するＩＣａｌｃｕｌａｔｏｒインタフェースを実装しており、完全修飾名で記述される。 <Calculation definition file 1100>
FIG. 15 is a description example of the calculation definition file 1100. The calculation definition file 1100 is a text file that describes the name of each information amount calculation method and the name of the Java class that implements that calculation method. The name of the calculation method is the string "anonymize.calculation.class." followed by the calculation method name 1010 input on the calculation setting screen 1000. The Java class that implements the calculation method implements the ICalculator interface, which is described later, and is written as a fully qualified name.
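How such a definition file might be read can be sketched as follows. The exact file syntax of FIG. 15 is not reproduced in this text, so the sketch assumes the common Java properties format (key=value), which java.util.Properties can parse. The class name CalculationDefinitionSketch, the sample entries, and the package name com.example.calc are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical reader for a calculation definition file, assuming key=value properties syntax. */
final class CalculationDefinitionSketch {

    /** Key prefix given in the description. */
    static final String KEY_PREFIX = "anonymize.calculation.class.";

    /** Made-up sample content; the class names are placeholders. */
    static final String SAMPLE =
            "anonymize.calculation.class.default=com.example.calc.DefaultCalculator\n"
          + "anonymize.calculation.class.age=com.example.calc.CustomCalculator\n";

    /** Returns the class name registered for a calculation method name, or null if there is none. */
    static String classNameFor(String definitionText, String methodName) {
        Properties definitions = new Properties();
        try {
            definitions.load(new StringReader(definitionText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return definitions.getProperty(KEY_PREFIX + methodName);
    }
}
```

Under these assumptions, classNameFor(SAMPLE, "age") yields the placeholder name com.example.calc.CustomCalculator, and an unregistered method name yields null.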

＜計算クラスの実装規約＞
図１６は、情報量を計算する手順を実装する計算クラスの実装規約を説明するクラス図である。計算クラスは、情報量を計算するメソッドを定義するＩＣａｌｃｕｌａｔｏｒインタフェースを実装する必要がある。このインタフェースはメソッドｃａｌｃｕｌａｔｅを持つ。このメソッドは、テーブル名と列名を引数として受け取り、引数によって指定されたテーブル：列の情報量を計算して倍精度浮動小数点数として返す。ＤｅｆａｕｌｔＣａｌｃｕｌａｔｏｒクラスは、ＩＣａｌｃｕｌａｔｏｒインタフェースを実装した上で既定の計算方法を実装したクラスである。カスタム計算方法を用いることを指定しなかったカラムについては、情報量計算設定部２３７はＤｅｆａｕｌｔＣａｌｃｕｌａｔｏｒクラスを用いて情報量を計算する。 <Calculation class implementation rules>
FIG. 16 is a class diagram explaining the implementation rules for a calculation class, that is, a class that implements a procedure for calculating the information amount. A calculation class must implement the ICalculator interface, which defines the method for calculating the information amount. This interface has a method named calculate. The method receives a table name and a column name as arguments, calculates the information amount of the column of the table specified by those arguments, and returns it as a double-precision floating-point number. The DefaultCalculator class implements the ICalculator interface and provides the default calculation method. For columns for which use of a custom calculation method has not been specified, the information amount calculation setting unit 237 calculates the information amount using the DefaultCalculator class.
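The interface contract described above can be sketched in Java as follows. The interface name ICalculator and the method name calculate come from the description; the parameter names are assumptions. The default calculation method is not specified in this text, so the example class EntropyCalculator is purely hypothetical: it computes the base-2 Shannon entropy of a column, and it reads values from an in-memory map rather than from a database table.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Interface named in the description; the parameter names are assumptions. */
interface ICalculator {
    double calculate(String tableName, String columnName);
}

/** Hypothetical calculator: Shannon entropy (base 2) of a column held in memory. */
final class EntropyCalculator implements ICalculator {

    /** Stand-in for the database; keys have the form "table:column". */
    private final Map<String, List<String>> columns;

    EntropyCalculator(Map<String, List<String>> columns) {
        this.columns = columns;
    }

    @Override
    public double calculate(String tableName, String columnName) {
        List<String> values = columns.get(tableName + ":" + columnName);
        if (values == null || values.isEmpty()) {
            return 0.0;
        }
        // Count how often each distinct value occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (String value : values) {
            counts.merge(value, 1, Integer::sum);
        }
        // Sum -p * log2(p) over the distinct values.
        double entropy = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / values.size();
            entropy -= p * (Math.log(p) / Math.log(2.0));
        }
        return entropy;
    }
}
```

Entropy is used here only because it is a familiar measure that shrinks as values are generalized into fewer categories; the calculation actually performed by DefaultCalculator may differ.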

＜計算設定画面１０００の動作＞
図１７は、計算設定画面１０００を用いてカスタム計算クラスを登録する手順を説明するフローチャートである。以下図１７の各ステップについて説明する。 <Operation of calculation setting screen 1000>
FIG. 17 is a flowchart illustrating a procedure for registering a custom calculation class using the calculation setting screen 1000. Hereinafter, each step of FIG. 17 will be described.

（図１７：ステップＳ１７０１〜Ｓ１７０２）
ユーザは、計算方法名１０１０、計算モジュール１０２０、計算クラス名１０３０を入力する（Ｓ１７０１）。計算方法名１０１０は、カスタム計算方法の識別名であり、情報量計算方法４３４に対応する。計算モジュール１０２０は、計算方法を実装したＪａｖａクラスを含むｊａｒファイルである。計算クラス名１０３０は、そのｊａｒファイル内に含まれる、計算方法を実装したＪａｖａクラスの完全修飾名である。ユーザは各項目を入力し、登録ボタン１０４０を押す（Ｓ１７０２）。 (FIG. 17: Steps S1701 to S1702)
The user inputs a calculation method name 1010, a calculation module 1020, and a calculation class name 1030 (S1701). The calculation method name 1010 is the identifying name of the custom calculation method and corresponds to the information amount calculation method 434. The calculation module 1020 is a jar file containing the Java class that implements the calculation method. The calculation class name 1030 is the fully qualified name of the Java class, contained in that jar file, that implements the calculation method. After inputting each item, the user presses the registration button 1040 (S1702).

（図１７：ステップＳ１７０３）
計算定義設定部２１０は、ネットワーク２２０を経由して、各項目を匿名化装置２３０へ転送する。情報量計算設定部２３７は、計算モジュール１０２０で指定したファイルを匿名化装置２３０のＪａｖａクラスライブラリに追加する。 (FIG. 17: Step S1703)
The calculation definition setting unit 210 transfers each item to the anonymization device 230 via the network 220. The information amount calculation setting unit 237 adds the file specified by the calculation module 1020 to the Java class library of the anonymization device 230.

（図１７：ステップＳ１７０４）
情報量計算設定部２３７は、計算定義ファイル１１００にエントリを追加する。エントリのキーは、ａｎｏｎｙｍｉｚｅ．ｃａｌｃｕｌａｔｉｏｎ．ｃｌａｓｓ．＜計算方法名１０１０＞である。エントリの値は、計算クラス名１０３０である。 (FIG. 17: Step S1704)
The information amount calculation setting unit 237 adds an entry to the calculation definition file 1100. The key of the entry is anonymize.calculation.class.<calculation method name 1010>. The value of the entry is the calculation class name 1030.

（図１７：ステップＳ１７０５〜Ｓ１７０６）
情報量計算設定部２３７は、制御部２３１に処理完了を通知する（Ｓ１７０５）。制御部２３１は、計算定義設定部２１０に対してネットワーク２２０を経由して処理完了を通知する（Ｓ１７０６）。 (FIG. 17: Steps S1705 to S1706)
The information amount calculation setting unit 237 notifies the control unit 231 of the completion of processing (S1705). The control unit 231 notifies the calculation definition setting unit 210 of the completion of processing via the network 220 (S1706).

＜カスタム計算クラスの実装例＞
以下ではＣｕｓｔｏｍＣａｌｃｕｌａｔｏｒクラスのメソッドｃａｌｃｕｌａｔｅの実装例を説明する。例えば医療分野において、年齢は分析に意味のある項目であるので情報量の計算対象として考えられる。一方で、年齢の数値を四捨五入するような単純匿名化を施すことを考えたとき、単純匿名化後に年齢の数値が取る値の個数は、匿名化前に比べて１０分の１程度になる。この条件下で情報量を計算した場合、情報損失が大きくなりすぎる場合がある。そこで、匿名化前後の年齢誤差を情報量に盛り込んだ情報量を考える。肉体の成長という観点からは、同じ１歳差でも高齢ほど誤差が小さいと考えられるので、年齢の逆数を利用し、以下のような情報量を考えることができる。この式のｌｏｇは２を底とする対数関数であるが、パーソナル情報匿名化装置全体で統一されていれば１を超える任意の実数を底として用いてもよい。 <Custom calculation class implementation example>
An implementation example of the method calculate of the CustomCalculator class is described below. In the medical field, for example, age is a meaningful item for analysis and is therefore a candidate for information amount calculation. On the other hand, if simple anonymization that rounds the age value is applied, the number of distinct values the age can take after simple anonymization is about one tenth of the number before anonymization. If the information amount is calculated under this condition, the information loss may come out too large. Therefore, consider an information amount that incorporates the age error before and after anonymization. From the viewpoint of physical growth, the same one-year difference is considered to be a smaller error at higher ages, so the following information amount can be defined using the reciprocal of the age. The log in this formula is the base-2 logarithm, but any real number greater than 1 may be used as the base as long as it is used consistently throughout the personal information anonymization device.

＜情報量を計算する手順＞
図１８は、情報損失率算出部２３９が各データの情報量を計算する手順を説明するフローチャートである。以下図１８の各ステップについて説明する。 <Procedure for calculating the amount of information>
FIG. 18 is a flowchart for explaining a procedure by which the information loss rate calculation unit 239 calculates the information amount of each data. Hereinafter, each step of FIG. 18 will be described.

（図１８：ステップＳ１８０１）
情報損失率算出部２３９は、各カラムの情報量の計算方法を匿名化属性管理データ２４４から取得する。例えば、匿名化データ２４６に対する情報量を計算したいときは、該当する匿名化データ２４６のテーブル名をキーにして、匿名化管理データ２４３から計算対象データを格納したテーブル名を取得し、そのテーブル名をキーにして匿名化属性管理データ２４４からレコードを取得する。 (FIG. 18: Step S1801)
The information loss rate calculation unit 239 obtains the information amount calculation method for each column from the anonymized attribute management data 244. For example, to calculate the information amount of the anonymized data 246, it obtains the name of the table storing the calculation target data from the anonymization management data 243, using the table name of the relevant anonymized data 246 as a key, and then acquires records from the anonymized attribute management data 244 using that table name as a key.

（図１８：ステップＳ１８０２〜Ｓ１８０３）
情報損失率算出部２３９は、情報量の値を０で初期化する（Ｓ１８０２）。情報損失率算出部２３９は、匿名化属性管理データ２４４から取得した各カラムについて、ステップＳ１８０４〜Ｓ１８０７を実施する（Ｓ１８０３）。 (FIG. 18: Steps S1802 to S1803)
The information loss rate calculation unit 239 initializes the information amount value to 0 (S1802). The information loss rate calculation unit 239 performs Steps S1804 to S1807 for each column acquired from the anonymized attribute management data 244 (S1803).

（図１８：ステップＳ１８０４〜Ｓ１８０５）
情報損失率算出部２３９は、匿名化属性管理データ２４４のレコードからテーブル名、属性名、計算方法、係数を読み出す（Ｓ１８０４）。情報損失率算出部２３９は、ステップＳ１８０４で読み出した計算方法をキーにして、計算定義ファイル１１００から計算方法を実装するクラスの完全修飾名を取得する（Ｓ１８０５）。 (FIG. 18: Steps S1804 to S1805)
The information loss rate calculation unit 239 reads the table name, attribute name, calculation method, and coefficient from the record of the anonymized attribute management data 244 (S1804). The information loss rate calculation unit 239 acquires the fully qualified name of the class implementing the calculation method from the calculation definition file 1100 using the calculation method read in step S1804 as a key (S1805).

（図１８：ステップＳ１８０６）
情報損失率算出部２３９は、ステップＳ１８０５で取得したクラスの完全修飾名を用いて、ＩＣａｌｃｕｌａｔｏｒインタフェースを実装するカスタム計算クラスのインスタンスを取得する。 (FIG. 18: Step S1806)
The information loss rate calculation unit 239 acquires an instance of a custom calculation class that implements the ICalculator interface using the fully qualified name of the class acquired in step S1805.

（図１８：ステップＳ１８０７）
情報損失率算出部２３９は、そのインスタンスに対して、ステップＳ１８０４で取得したテーブル名と属性名を渡し、計算結果を得る。情報損失率算出部２３９は、その結果に対してステップＳ１８０４で取得した係数を乗じて得られる値を、現在の情報量値に加算する。 (FIG. 18: Step S1807)
The information loss rate calculation unit 239 passes the table name and attribute name acquired in step S1804 to the instance, and obtains a calculation result. The information loss rate calculation unit 239 adds a value obtained by multiplying the result by the coefficient acquired in step S1804 to the current information amount value.

（図１８：ステップＳ１８０８）
情報損失率算出部２３９は、ステップＳ１８０４からステップＳ１８０７までのループの処理が完了したあと、最終的な情報量の値を返す。 (FIG. 18: Step S1808)
The information loss rate calculation unit 239 returns the final information amount value after the loop processing from step S1804 to step S1807 is completed.
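The loop of FIG. 18 can be sketched as a weighted sum over columns. This is a minimal sketch under stated assumptions: the calculation definition file is represented as an in-memory map, the class AttributeRecord and its field names stand in for a record of the anonymized attribute management data 244, and FixedCalculator is a hypothetical calculator that returns a constant so that the loop can run without a database.

```java
import java.util.List;
import java.util.Map;

/** Interface named in the description; the parameter names are assumptions. */
interface ICalculator {
    double calculate(String tableName, String columnName);
}

/** Hypothetical calculator that returns a fixed value. */
class FixedCalculator implements ICalculator {
    @Override
    public double calculate(String tableName, String columnName) {
        return 2.0;
    }
}

/** Stand-in for one record of the attribute management data (field names are assumptions). */
final class AttributeRecord {
    final String tableName;
    final String attributeName;
    final String calculationMethod;
    final double coefficient;

    AttributeRecord(String tableName, String attributeName,
                    String calculationMethod, double coefficient) {
        this.tableName = tableName;
        this.attributeName = attributeName;
        this.calculationMethod = calculationMethod;
        this.coefficient = coefficient;
    }
}

final class InformationAmountSketch {

    /** Sums coefficient x per-column information amount over all records. */
    static double informationAmount(List<AttributeRecord> records,
                                    Map<String, String> definitions) {
        double total = 0.0;                                        // S1802
        for (AttributeRecord record : records) {                   // S1803, S1804
            String key = "anonymize.calculation.class." + record.calculationMethod;
            String className = definitions.get(key);               // S1805
            if (className == null) {
                throw new IllegalArgumentException("no class registered for " + key);
            }
            ICalculator calculator = newCalculator(className);     // S1806
            total += record.coefficient
                    * calculator.calculate(record.tableName, record.attributeName); // S1807
        }
        return total;                                              // S1808
    }

    /** Creates a calculator from its class name by reflection. */
    static ICalculator newCalculator(String className) {
        try {
            return (ICalculator) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }
}
```

Loading the class by name is what lets a newly registered jar take effect without changes to the calling code, which appears to be the purpose of the definition file; the per-column coefficient then weights each column's contribution to the total.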

＜テーブル構成＞
図１９は、匿名化管理データ２４３の構成例を示す図である。匿名化対象データ２４１などの各データは、処理の便宜上、関係データベースのテーブル形式で格納することが望ましい。匿名化装置２３０の各機能部は、これらデータを格納する一意のテーブル名を自動的に生成し、そのテーブルに各データを格納する。匿名化管理データ２４３は、どのテーブルにどのデータが格納されているかを管理するために用いられる。 <Table configuration>
FIG. 19 is a diagram illustrating a configuration example of the anonymization management data 243. For convenience of processing, each piece of data, such as the anonymization target data 241, is preferably stored in the table format of a relational database. Each functional unit of the anonymization device 230 automatically generates a unique name for the table that stores such data, and stores the data in that table. The anonymization management data 243 is used to manage which data is stored in which table.

図２０は、匿名化属性管理データ２４４の構成例を示す図である。匿名化属性管理データ２４４は、匿名化対象データ取込画面４００とｋ匿名化設定画面６００で設定したパラメータを管理するテーブルである。 FIG. 20 is a diagram illustrating a configuration example of the anonymized attribute management data 244. The anonymization attribute management data 244 is a table for managing parameters set on the anonymization target data capture screen 400 and the k anonymization setting screen 600.

図２１は、匿名化対象データ２４１の構成例を示す図である。ここではテーブル形式で格納した後のデータ形式を例示した。計算起点データ２４２、単純匿名化データ２４５、匿名化データ２４６も同様の構成を有する。これらテーブルの構成は、取り込むデータのカラム数などによって異なる。 FIG. 21 is a diagram illustrating a configuration example of the anonymization target data 241. Here, the data format after being stored in the table format is illustrated. The calculation starting point data 242, the simple anonymization data 245, and the anonymization data 246 have the same configuration. The configuration of these tables differs depending on the number of columns of data to be captured.

図２２は、一般化階層データ２４７の構成例を示す図である。一般化階層データ２４７は、図９で例示した一般化階層の構成を記録するデータである。 FIG. 22 is a diagram illustrating a configuration example of the generalized hierarchy data 247. The generalized hierarchy data 247 is data that records the configuration of the generalized hierarchy exemplified in FIG. 9.

本発明は上記した実施形態に限定されるものではなく、様々な変形例が含まれる。上記実施形態は本発明を分かりやすく説明するために詳細に説明したものであり、必ずしも説明した全ての構成を備えるものに限定されるものではない。 The present invention is not limited to the embodiments described above, and includes various modifications. The above embodiment has been described in detail for easy understanding of the present invention, and is not necessarily limited to the one having all the configurations described.

上記各構成、機能、処理部、処理手段等は、それらの一部や全部を、例えば集積回路で設計する等によりハードウェアで実現してもよい。また、上記の各構成、機能等は、プロセッサがそれぞれの機能を実現するプログラムを解釈し、実行することによりソフトウェアで実現してもよい。各機能を実現するプログラム、テーブル、ファイル等の情報は、メモリ、ハードディスク、ＳＳＤ（ＳｏｌｉｄＳｔａｔｅＤｒｉｖｅ）等の記録装置、ＩＣカード、ＳＤカード、ＤＶＤ等の記録媒体に格納することができる。 Each of the above-described configurations, functions, processing units, processing means, and the like may be realized in hardware by designing a part or all of them, for example, with an integrated circuit. Each of the above-described configurations, functions, and the like may be realized by software by interpreting and executing a program that realizes each function by the processor. Information such as programs, tables, and files for realizing each function can be stored in a recording device such as a memory, a hard disk, an SSD (Solid State Drive), or a recording medium such as an IC card, an SD card, or a DVD.

本実施形態においては、境界５６４を１つのみ追加する例を説明したが、複数の境界５６４を設定できるようにしてもよい。この場合は、例えば単純匿名化実行画面５００上において計算起点データ２４２の対象とするカラムとそうでないカラムを選択できるようにしてもよい。 In the present embodiment, an example in which only one boundary 564 is added has been described, but a plurality of boundaries 564 may be set. In this case, for example, on the simple anonymization execution screen 500, a column that is the target of the calculation origin data 242 and a column that is not so may be selected.

２００：受付部、２１０：計算定義設定部、２３０：匿名化装置、２３１：制御部、２３２：データ取込部、２３３：単純匿名化部、２３４：ｋ匿名化部、２３５：ｋ匿名化設定部、２３６：一般化階層編集部、２３７：情報量計算設定部、２３８：匿名化データ出力部、２３９：情報損失率算出部、２４０：情報管理装置、２４１：匿名化対象データ、２４２：計算起点データ、２４３：匿名化管理データ、２４４：匿名化属性管理データ、２４５：単純匿名化データ、２４６：匿名化データ、２４７：一般化階層データ。 200: reception unit, 210: calculation definition setting unit, 230: anonymization device, 231: control unit, 232: data capture unit, 233: simple anonymization unit, 234: k anonymization unit, 235: k anonymization setting unit, 236: generalized hierarchy editing unit, 237: information amount calculation setting unit, 238: anonymized data output unit, 239: information loss rate calculation unit, 240: information management device, 241: anonymization target data, 242: calculation starting point data, 243: anonymization management data, 244: anonymized attribute management data, 245: simple anonymized data, 246: anonymized data, 247: generalized hierarchy data.

Claims

A personal information anonymization program for causing a computer to execute processing for anonymizing personal information, the program causing the computer to execute:
a simple anonymization step of outputting simple anonymization data obtained by performing simple anonymization on the personal information;
a k anonymization step of outputting k anonymization data obtained by performing k anonymization on the simple anonymization data;
an information amount calculation setting step of setting a reference point for calculating a loss rate of information lost through anonymization; and
an information loss rate calculating step of calculating the loss rate,
wherein, in the information amount calculation setting step, the computer is caused to set, as the reference point, a boundary in the personal information between a portion on which simple anonymization is performed in the simple anonymization step and a portion on which it is not performed, and
wherein, in the information loss rate calculating step, the computer is caused, when calculating the loss rate for the k anonymization step, to calculate the loss rate based on a ratio between the information amount of calculation origin data obtained by performing simple anonymization according to the reference point and the information amount of the k anonymization data.

The personal information anonymization program according to claim 1, wherein, in the information loss rate calculating step, the computer is caused, when calculating the loss rate for the simple anonymization step, to calculate the loss rate based on a ratio between the information amount of the calculation origin data and the information amount of the simple anonymization data obtained by performing simple anonymization on the entire personal information regardless of the boundary.

The personal information anonymization program according to claim 1, wherein the simple anonymization data or the k anonymization data has one or more columns, and
the program causes the computer to execute a calculation procedure defining step of defining, for each column, a procedure for calculating the information amount of the simple anonymization data or the information amount of the k anonymization data.

The personal information anonymization program according to claim 3, wherein, in the calculation procedure defining step, the computer is caused to receive an instruction designating, for each column, a custom calculation program module that implements the procedure, and to execute a step of reflecting the custom calculation program module in the procedure according to the instruction.

The personal information anonymization program according to claim 4, wherein the program includes a calculation program module that implements processing for calculating the information amount of the simple anonymization data or the information amount of the k anonymization data,
the calculation program module includes a program interface for invoking the processing for calculating the information amount of the simple anonymization data or the information amount of the k anonymization data, and a default calculation program module that implements processing for executing a prescribed calculation procedure via the program interface, and
in the calculation procedure defining step, the computer is caused to use, in place of the default calculation program module, the custom calculation program module in which the processing for calculating the information amount of the simple anonymization data or the information amount of the k anonymization data is implemented via the program interface.

A personal information anonymization device for anonymizing personal information, comprising:
a simple anonymization unit that outputs simple anonymization data obtained by performing simple anonymization on the personal information;
a k anonymization unit that outputs k anonymization data obtained by performing k anonymization on the simple anonymization data;
an information amount calculation setting unit that sets a reference point for calculating a loss rate of information lost through anonymization; and
an information loss rate calculation unit that calculates the loss rate,
wherein the information amount calculation setting unit sets, as the reference point, a boundary in the personal information between a portion on which the simple anonymization unit performs simple anonymization and a portion on which it does not, and
wherein the information loss rate calculation unit, when calculating the loss rate for the k anonymization unit, calculates the loss rate based on a ratio between the information amount of calculation origin data obtained by performing simple anonymization according to the reference point and the information amount of the k anonymization data.