JP2012079205A - Personal information anonymizing device and method

Info

Publication number: JP2012079205A
Application number: JP2010225582A
Authority: JP
Inventors: Kunihiko Harada; 邦彦原田; Yoshinori Sato; 嘉則佐藤; Masakazu Ito; 雅一伊藤
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2010-10-05
Filing date: 2010-10-05
Publication date: 2012-04-19

Abstract

PROBLEM TO BE SOLVED: To appropriately anonymize the personal information of a person who uses a graph structure environment.
SOLUTION: A personal information anonymizing device comprises storage means and anonymizing means. The storage means stores adjacency relation information and personal management information. The adjacency relation information includes, for each location, an adjacency relation tuple showing which locations are adjacent to which among a finite number of locations. The personal management information includes, for each person, a personal information tuple containing identification information of the person and location information showing the location used by that person among the locations. The anonymizing means anonymizes each personal information tuple by aggregating only locations that the adjacency relation information shows to be adjacent, so that the number of different persons corresponding to the anonymized personal information tuples that include the same location set is equal to or greater than a prescribed threshold.

Description

The present invention relates to computer technology for anonymizing personal information.

With ever larger volumes of data about individuals being accumulated, attention to privacy has become indispensable for companies that handle personal information. Business operators handling personal information must, at a minimum, comply with the Act on the Protection of Personal Information (hereinafter, the Protection Act) and related laws and regulations. The Protection Act mandates management measures for the collection, use, and other handling of personal information, and the guidelines of the individual ministries specify the concrete measures.

One of the management measures specified by these guidelines is the anonymization of personal information. For example, the Ministry of Health, Labour and Welfare requires that personal information related to medical care be anonymized when it is provided to third parties, presented at academic conferences, or included in medical accident reports, unless there is a particular need to do otherwise. The Ministry of Economy, Trade and Industry likewise lists anonymization of personal information as a desirable measure when providing it to third parties.

Targets of personal information anonymization include personal attributes such as name, address, and age. In addition, anonymization is now required for location information, such as the GPS (Global Positioning System) information that underlies the increasingly popular LBS (Location Based Service) and the history logged by electronic transit tickets (for railways, the ticket gate entry/exit history). This type of anonymization technique is described in, for example, Non-Patent Document 1.

Non-Patent Document 1 describes a method that, given a large number of tuples for individuals (each tuple consisting of a time and a position), makes the times and positions ambiguous and thereby guarantees conversion into data in which at least a predetermined number of individuals follow the same route. In other words, the technique of Non-Patent Document 1 is a location information anonymization technique for cases where positions can take arbitrary values, as with GPS data.

Non-Patent Document 1: M. E. Nergiz, M. Atzori, and Y. Saygin, “Towards Trajectory Anonymization: A Generalization-Based Approach,” Proceedings of the SIGSPATIAL ACM GIS 2008 International Workshop on Security and Privacy in GIS and LBS, California, pp. 52-61, 2008.

Some travel environments have a graph structure that defines which positions are adjacent to which (hereinafter, graph structure environments). Examples include railways (position = station), route buses (position = bus stop), airlines (position = airport), and road networks (position = intersection).

In a graph structure environment, adjacent positions are not necessarily the closest in terms of distance.

For this reason, if a location information anonymization technique designed for arbitrary positions (the technique of Non-Patent Document 1) is applied to anonymize the personal information of individuals who use a graph structure environment, positions that are not adjacent to each other may be generalized into the same group. Applying the technique of Non-Patent Document 1 to such personal information is therefore undesirable from a practical standpoint.

Accordingly, an object of the present invention is to appropriately anonymize the personal information of individuals who use a graph structure environment.

A personal information anonymization device has storage means and anonymization means. The storage means stores adjacency relation information and personal management information. The adjacency relation information includes, for each position, an adjacency relation tuple representing which positions are adjacent to which among a finite number of positions. The personal management information includes, for each individual, a personal information tuple, which is personal information containing the individual's identification information and position information representing the position that the individual uses among the positions. The anonymization means anonymizes each personal information tuple by grouping only positions that are adjacent according to the adjacency relation information, so that the number of different individuals corresponding to the anonymized personal information tuples containing the same position set is equal to or greater than a predetermined threshold.

The personal information of users of a graph structure environment can be appropriately anonymized.

FIG. 1 shows a configuration example of a computer to which the personal information anonymization device according to Embodiment 1 of the present invention is applied.
FIG. 2A shows an example of the station adjacent structure table 131. FIG. 2B is a schematic diagram of the adjacency relations represented by the table 131 shown in FIG. 2A.
FIG. 3 shows an example of the ticket gate entry/exit table 132 according to Embodiment 1.
FIG. 4 shows an example of the minimum equivalence number information 134 according to Embodiment 1.
FIG. 5 shows an example of the anonymous ticket gate entry/exit table 133 according to Embodiment 1.
FIG. 6 shows an example of the overall flow of processing executed by the computer 100 according to Embodiment 1.
FIG. 7 shows an example of the detailed flow of S601 in FIG. 6.
FIG. 8 shows an example of the detailed flow of S602 in FIG. 6.
FIG. 9 shows an example of the detailed flow of S803 in FIG. 8.
FIGS. 10A and 10B are explanatory diagrams of an example of the predicted disclosure risk according to Embodiment 1.
FIG. 11 shows a configuration example of a computer to which the personal information anonymization device according to Embodiment 2 of the present invention is applied.
FIG. 12A shows a configuration example of the route station adjacent structure table 1131. FIG. 12B is a schematic diagram of the route structure represented by the table 1131 shown in FIG. 12A.
FIG. 13 shows an output example of the route search unit 1122 according to Embodiment 2.
FIG. 14 shows an example of the anonymous ticket gate entry/exit table according to Embodiment 2.
FIG. 15 shows an example of the overall flow of processing executed by the computer 100 according to Embodiment 2.
FIG. 16 shows an example of the detailed flow of S1503 in FIG. 15.
FIG. 17 shows another example of the detailed flow of S1503 in FIG. 15.
FIG. 18A shows an example of a linear route structure. FIG. 18B shows an example of a circular route structure.
FIG. 19A shows an example of a Hu-Tucker tree according to Embodiment 2. FIG. 19B shows an example of usage frequencies according to Embodiment 2.
FIG. 20 shows an example of the detailed flow of S1703 in FIG. 17.
FIG. 21 shows a configuration example of a computer to which the personal information anonymization device according to Embodiment 3 of the present invention is applied.
FIG. 22 shows an example of the regular master table 2131 according to Embodiment 3.
FIG. 23 shows an example of the anonymous regular master table 2132 according to Embodiment 3.
FIG. 24 shows an example of the overall flow of processing executed by the computer 100 according to Embodiment 3.
FIG. 25 shows a configuration example of a computer to which the personal information anonymization device according to Embodiment 4 of the present invention is applied.
FIG. 26 shows an example of the output of the route search unit 2522 according to Embodiment 4.
FIG. 27 shows an example of the overall flow of processing executed by the computer 100 according to Embodiment 4.
FIG. 28 shows an example of the detailed flow of S2702 in FIG. 27.
FIG. 29 shows another example of the detailed flow of S2702 in FIG. 27.

Several embodiments of the present invention are described in detail below with reference to the drawings.

In the following description, various kinds of information may be described using the expression “xxx table”, but the information may be represented by a data structure other than a table. To show that the information does not depend on the data structure, an “xxx table” may be called “xxx information”.

In the following description, an ID (identifier) is used to identify an element, but a name, a number, or the like may be used as the identification information instead.

In the following description, processing may be described with a function as the subject, where the function is realized when a “program” is executed by a processor (for example, a CPU (Central Processing Unit)). Because the prescribed processing is performed using storage resources (for example, memory) and/or a communication interface device (for example, a communication port) as appropriate, the processor may equally be regarded as the subject of the processing. The processor may include a hardware circuit that performs part or all of the processing performed by the processor. A computer program may be installed on each computer from a program source. The program source may be, for example, a program distribution server or a storage medium.

All of the following embodiments relate mainly to techniques for protecting personal information in electronic form. In Embodiments 1 and 2, the personal information to be anonymized is log data (movement log data) of position information representing positions in a graph structure environment (positions for which an adjacency relation can be defined). A position is, for example, a station in the case of a railway, a bus stop in the case of a bus, or a mesh cell in the case of meshed map information.

All of the following embodiments take a railway as the example of a graph structure environment. In Embodiments 1 and 2, anonymization of personal information means converting the movement log data so that the information subject (user) cannot be uniquely identified. Re-encoding means replacing the time information or position information contained in the movement log data with a more ambiguous concept. In the following description, the term “station” is used in its ordinary sense to denote a single station. “Station A or station B or station C” may be expressed as a station set, as in “station {A, B, C}”.

The following description covers an “entry/exit non-distinguishing method” and an “entry/exit distinguishing method”. Where something has a different meaning in each method, this is stated explicitly. Where no method is specified, the description applies to both methods.

FIG. 1 shows a configuration example of a computer to which the personal information anonymization device according to Embodiment 1 of the present invention is applied.

The computer 100 is an information processing device such as a PC (Personal Computer), a server, or a workstation. The computer 100 has a CPU (Central Processing Unit) 101, a memory 102, a storage 103, an input device 104, an output device 105, and a communication device 106. These are all connected to one another by an internal communication line 107 such as a bus.

The storage 103 is, for example, a storage medium such as a CD-R (Compact Disc Recordable), a DVD-RAM (Digital Versatile Disk Random Access Memory), or a silicon disk together with a drive for that medium, or an HDD (Hard Disk Drive). The storage 103 stores the station adjacent structure table 131, the ticket gate entry/exit table 132, the anonymous ticket gate entry/exit table 133, the minimum equivalence number information 134, and a program 151.

The station adjacent structure table 131 stores information on which stations are neighbors of which stations.

The ticket gate entry/exit table 132 stores the history of individuals' position information (movement log data) that is left when individuals use the ticket gates at stations.

The anonymous ticket gate entry/exit table 133 stores the result of anonymizing the movement log data held in the ticket gate entry/exit table 132.

The minimum equivalence number information 134 stores a threshold value.

The program 151 implements the functions described later.

The input device 104 is, for example, a keyboard, a mouse, a scanner, or a microphone. The output device 105 is, for example, a display device, a printer, or a speaker. The input device 104 and the output device 105 may be integrated (for example, as a touch panel display device).

The communication device 106 is, for example, a LAN (Local Area Network) board, and can be connected to a communication network (not shown).

The CPU 101 loads the program 151 into the memory 102 and executes it, thereby realizing a time segmentation unit 121 and a position anonymization unit 122.

The time segmentation unit 121 takes the ticket gate entry/exit table 132 as input, outputs a table in which each piece of time information has been converted into a time interval between two times that contains that time information, and passes the output table to the position anonymization unit 122 as input.

The position anonymization unit 122 takes as input the table passed from the time segmentation unit 121, the station adjacent structure table 131, and the minimum equivalence number information 134. In the entry/exit non-distinguishing method, the position anonymization unit 122 re-encodes stations so that the total number of people present at any station in any time interval is either 0 or at least the value stored in the minimum equivalence number information 134. In the entry/exit distinguishing method, the position anonymization unit 122 re-encodes stations so that the total number of people who entered any station in any time interval and the total number of people who exited any station in any time interval are each either 0 or at least the value stored in the minimum equivalence number information 134. The position anonymization unit 122 stores the result of the re-encoding in the anonymous ticket gate entry/exit table 133. The position anonymization unit 122 may also output the result of the re-encoding via the output device 105.

Next, the tables described above are explained in detail.

FIG. 2A shows an example of the station adjacent structure table 131.

The station adjacent structure table 131 has a plurality of records. Each record indicates a pair of stations that are adjacent to each other. The table 131 in FIG. 2A represents the graph structure depicted as the track structure 200 in FIG. 2B. That is, the track structure 200 consists of positions (nodes) 201 representing stations and edges (links) 202 representing adjacency relations (tracks) between stations, and each edge corresponds to one record of the station adjacent structure table 131 (records and edges correspond one-to-one).
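As a rough illustration of the one-record-per-edge representation, the following sketch turns edge records into a neighbor lookup. The station layout in `EDGE_RECORDS` and the names `build_adjacency` and `are_adjacent` are assumptions made for this example; the actual contents of FIG. 2A are not reproduced here.

```python
from collections import defaultdict

# Hypothetical edge records in the spirit of the station adjacent structure
# table 131: each record names two stations joined by one track segment.
EDGE_RECORDS = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "H")]


def build_adjacency(records):
    """Return a mapping from each station to the set of its neighboring stations."""
    adjacency = defaultdict(set)
    for left, right in records:
        # Adjacency is symmetric, so one record yields both directions.
        adjacency[left].add(right)
        adjacency[right].add(left)
    return dict(adjacency)


def are_adjacent(adjacency, s1, s2):
    """True when a single edge (one table record) links s1 and s2."""
    return s2 in adjacency.get(s1, set())


ADJACENCY = build_adjacency(EDGE_RECORDS)
```

An edge list is only one of the possible representations; an adjacency matrix or per-station neighbor lists would serve equally well as long as the same adjacency relations can be recovered.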

The structure of the station adjacent structure table 131 is not limited to the example shown in FIG. 2A. Graph structures can be represented in various formats, and any of them may be used as long as it can define the actual adjacency relations between stations.

FIG. 3 shows an example of the ticket gate entry/exit table 132.

The ticket gate entry/exit table 132 has a plurality of records. One record means that a certain individual entered or exited a certain station at a certain time. That is, for each movement log entry, the ticket gate entry/exit table 132 has a record containing the following information:
(1) a user ID 301, which is the identifier of the individual to whom the movement log entry relates;
(2) a station 302, which is the identifier of the station where the individual was present;
(3) a time 303, which represents the time at which the individual entered or exited the station; and
(4) an entry/exit 304, which indicates whether the individual entered or exited the station.
According to the example of FIG. 3, the individual with user ID “2” entered station “B” at 7:22:19 on May 12, 2010, traveled to station “H”, exited station “H” at 7:42:11 on the same day, and entered station “H” again at 7:48:52 on the same day.
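The record layout can be sketched as a small data class. The class name `GateRecord`, the field names, and the `"entry"`/`"exit"` string encoding are assumptions for illustration; only the three events of user “2” are taken from the description of FIG. 3.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class GateRecord:
    user_id: str     # corresponds to user ID 301
    station: str     # corresponds to station 302
    time: datetime   # corresponds to time 303
    direction: str   # corresponds to entry/exit 304: "entry" or "exit"


# The three events of user "2" described for FIG. 3.
LOG = [
    GateRecord("2", "B", datetime(2010, 5, 12, 7, 22, 19), "entry"),
    GateRecord("2", "H", datetime(2010, 5, 12, 7, 42, 11), "exit"),
    GateRecord("2", "H", datetime(2010, 5, 12, 7, 48, 52), "entry"),
]


def events_of(log, user_id):
    """Return one user's records in chronological order."""
    return sorted((r for r in log if r.user_id == user_id), key=lambda r: r.time)
```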

The ticket gate entry/exit table 132 is updated as needed or periodically.

The structure of the ticket gate entry/exit table 132 is not limited to the example shown in FIG. 3. For example, a table that pairs each user's entry with the corresponding exit may be adopted. However, a table that can be converted into the representation of FIG. 3 is desirable. In the example of FIG. 3, railway ticket gate entry/exit history is used as the movement log, but as mentioned above, a movement log may be acquired for arbitrary positions in any graph structure, such as bus stops or geographic information that has a graph structure.

FIG. 4 shows an example of the minimum equivalence number information 134.

In the example of FIG. 4, the minimum equivalence number 401 is 5. In the entry/exit non-distinguishing method, the minimum equivalence number 401 is a value such that, even if information (anonymized information) is published in which the number of individuals present at any station in any time interval is either 0 or at least the minimum equivalence number 401, uniquely identifying an individual is regarded as difficult. In this embodiment, “present” means that the individual entered or exited. Note that what matters is the number of individuals, not the number of records, for a given time interval and station. In the entry/exit distinguishing method, the minimum equivalence number 401 is a value such that, even if information (anonymized information) is published in which the number of individuals who entered through the ticket gate and the number of individuals who exited through the ticket gate at any station in any time interval are each either 0 or at least the minimum equivalence number 401, uniquely identifying an individual is regarded as difficult.

The value of the minimum equivalence number 401 is not limited to 5 and may be any value. In this specification, the value of the minimum equivalence number 401 is sometimes written as “k”, because this embodiment concerns k-anonymization.

FIG. 5 shows an example of the anonymous ticket gate entry/exit table 133.

The anonymous ticket gate entry/exit table 133 has a plurality of records. The information in a record of the anonymous ticket gate entry/exit table 133 is the information in a record of the ticket gate entry/exit table 132 after re-encoding (obfuscation). The records of the anonymous ticket gate entry/exit table 133 (hereinafter, anonymous records) correspond one-to-one to the records of the ticket gate entry/exit table 132 (hereinafter, normal records). That is, the n-th normal record corresponds to the n-th anonymous record (n is a natural number). Each record of the anonymous ticket gate entry/exit table 133 has the following information:
(1) a user ID 501, which represents the identifier of the individual;
(2) a station 502, which represents the identifier of the station where the individual was present;
(3) a time interval 503, which represents the time interval in which the individual was present at the station; and
(4) an entry/exit 504, which indicates whether the individual entered or exited the station.

The user ID 501 in the n-th anonymous record matches the user ID 301 in the n-th normal record. In this embodiment, the user ID 501 and the user ID 301 are identical for the same individual, but to increase anonymity, a different user ID that corresponds one-to-one to the user ID 301 may be adopted as the user ID 501.

According to the station 502 of the second anonymous record, the individual “2” was present at station “B” or station “C”. According to the second normal record (see FIG. 3), the individual “2” was present at station “B”; through anonymization of the information in the second normal record, the information “B” represented by the station 302 has been re-encoded into the information “B, C” represented by the station 502.
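The re-encoding of “B” into “B, C” can be sketched as replacing a station with the station set it belongs to, where a set is acceptable only if its members hang together through adjacency. Treating “only adjacent positions are grouped” as “the group forms a connected subgraph” is one possible reading, adopted here as an assumption; the graph `ADJ` and the function names are likewise hypothetical.

```python
from collections import deque

# Hypothetical adjacency for a straight line A - B - C - D.
ADJ = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}


def is_connected_group(group, adjacency):
    """Check that every station in `group` is reachable from any other
    using only edges between members of the group."""
    group = set(group)
    if not group:
        return False
    start = next(iter(group))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, set()) & group:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == group


def recode_station(station, groups):
    """Replace a station with the station set that contains it, if any."""
    for group in groups:
        if station in group:
            return frozenset(group)
    return frozenset({station})
```

With this reading, `{"B", "C"}` is a valid group on the line above, while `{"A", "C"}` is not, because A and C are linked only through B.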

The time interval represented by the time interval 503 in the n-th anonymous record is an interval that contains the time represented by the time 303 in the n-th normal record.

The entry/exit 504 in the n-th anonymous record matches the entry/exit 304 in the n-th normal record.

In the entry/exit non-distinguishing method, the anonymous ticket gate entry/exit table 133 must satisfy the following constraint (X).
(X) The number of different individuals (user IDs 501) corresponding to the same combination of station 502 and time interval 503 is at least k (at least the minimum equivalence number 401).

In the entry/exit distinguishing method, on the other hand, the anonymous ticket gate entry/exit table 133 must satisfy the following constraint (Y).
(Y) The number of different individuals (user IDs 501) corresponding to the same combination of station 502, time interval 503, and entry/exit 504 is at least k (at least the minimum equivalence number 401).
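Constraints (X) and (Y) differ only in the grouping key, so a single check can cover both. The sketch below counts distinct users per key and reports the groups that fall short of k. The tuple layout of `RECORDS`, the function name `violating_groups`, and the sample data are assumptions made for this example.

```python
from collections import defaultdict


def violating_groups(records, k, distinguish_direction):
    """Return {group key: distinct user count} for groups with fewer than k users.

    Each record is (user_id, station_set, time_interval, direction).
    With distinguish_direction=False the key follows constraint (X);
    with True it follows constraint (Y).
    """
    users = defaultdict(set)
    for user_id, stations, interval, direction in records:
        if distinguish_direction:
            key = (stations, interval, direction)
        else:
            key = (stations, interval)
        # A set counts each individual once, however many records they have.
        users[key].add(user_id)
    return {key: len(ids) for key, ids in users.items() if len(ids) < k}


BC = frozenset({"B", "C"})
SLOT = ("07:22:23", "07:30:36")
RECORDS = [
    ("1", BC, SLOT, "entry"),
    ("2", BC, SLOT, "entry"),
    ("2", BC, SLOT, "exit"),  # second record of user "2" in the same group
    ("3", BC, SLOT, "exit"),
]
```

For k = 3, this sample satisfies (X), since three distinct users share the station set and time interval, but it violates (Y), since the entry group and the exit group each contain only two distinct users.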

This table 133 is generated by performing the series of processes shown in FIG. 6.

FIG. 6 shows an example of the overall flow of processing executed by the computer 100 according to Embodiment 1.

First, in S601, the time segmentation unit 121 refers to the ticket gate entry/exit table 132, re-encodes each time 303 into a time interval 503, and passes the time intervals 503 to the position anonymization unit 122 as input.

Next, in S602, the position anonymization unit 122 performs the following processing:
(a) refers to the output of the time segmentation unit 121 in S601, the station adjacent structure table 131, the ticket gate entry/exit table 132, and the minimum equivalence number information 134;
(b) in the entry/exit non-distinguishing method, re-encodes the stations 302 so that the number of different individuals corresponding to the tuples (records) that contain the same combination of station 502 and time interval 503, as they will be stored in the anonymous ticket gate entry/exit table 133, is at least k (at least the minimum equivalence number 401);
(c) in the entry/exit distinguishing method, re-encodes the stations 302 so that the number of different individuals corresponding to the tuples (records) that contain the same combination of station 502, time interval 503, and entry/exit 504, as they will be stored in the anonymous ticket gate entry/exit table 133, is at least k (at least the minimum equivalence number 401); and
(d) stores, in the anonymous ticket gate entry/exit table 133, the station 502 resulting from (b) or (c) (the information obtained by re-encoding the station 302) together with the corresponding user ID 501, time interval 503, and entry/exit 504.

図７は、図６のＳ６０１の詳細な流れの一例を示す。従って、図７に示す全てのステップの処理の主体は、時区間化部１２１である。 FIG. 7 shows an example of the detailed flow of S601 in FIG. 6. Accordingly, every step shown in FIG. 7 is performed by the time segmentation unit 121.

まず、時区間化部１２１は、最終的に出力するテーブルのレコードを格納するＢを空集合（レコードなし）に初期化する（Ｓ７０１）。 First, the time segmentation unit 121 initializes B, which stores a record of a table to be finally output, to an empty set (no record) (S701).

次に、時区間化部１２１は、改札入退場テーブル１３２を参照し、これを各日ごとに分割し、それぞれの日について以下のループを実行する。なお、実行対象とする日のレコード全体をＤに格納する（Ｓ７０２）。 Next, the time segmentation unit 121 refers to the ticket gate entry / exit table 132, divides it into each day, and executes the following loop for each day. The entire record of the day to be executed is stored in D (S702).

時区間化部１２１は、Ｄ内のレコードを時刻順にソートする。時区間化部１２１は、ｉ番目のレコードとｊ（ｊ＞ｉ）番目のレコードに対して、ｊの時刻がｉの時刻よりも後になるようにする（Ｓ７０３）。また、時区間化部１２１は、レコードを格納及び編集するためのテーブルＡを初期化する（Ｓ７０４）。 The time segmentation unit 121 sorts the records in D in order of time. The time segmentation unit 121 causes the time of j to be later than the time of i for the i-th record and the j (j> i) -th record (S703). Further, the time segmentation unit 121 initializes a table A for storing and editing records (S704).

次に、時区間化部１２１は、Ｄ内のレコード数が、システムによって定められたＮよりも小さいかどうかで条件分岐を行う（Ｓ７０５）。ただし、Ｎはｋ以上の整数値とする。Ｎは、例えば１０００００などの値を持ち、最終的に各時区間の有するレコード数の概数にあたる。 Next, the time segmentation unit 121 performs conditional branching depending on whether the number of records in D is smaller than N determined by the system (S705). Here, N is an integer value greater than or equal to k. N has a value such as 100,000, for example, and finally corresponds to the approximate number of records in each time interval.

Ｓ７０５で、時区間化部１２１は、Ｄ内のレコード数がＮよりも小さくないと判定された場合には（Ｓ７０５：Ｎｏ）、Ａ内のレコード全てを、Ａ内の任意のレコードが持つ時刻を含む時区間に置き換える（Ｓ７０６）。例えば、Ａ内のレコードの中で最も早い時刻が10/05/12 07:22:23で、最も遅い時刻が10/05/12 07:30:36の場合には、時区間化部１２１は、10/05/12 07:22:23〜10/05/12 07:30:36という時区間を算出する。さらに、時区間化部１２１は、ＢにＡ内のレコードを全て追加し、Ａを初期化する（Ｓ７０７）。なお、Ｓ７０６及びＳ７０７は、初回に実行される時にはどの変数も変化しない。 If the time segmentation unit 121 determines in S705 that the number of records in D is not smaller than N (S705: No), it replaces the time of every record in A with a time interval that contains the time of any record in A (S706). For example, if the earliest time among the records in A is 10/05/12 07:22:23 and the latest is 10/05/12 07:30:36, the time segmentation unit 121 computes the time interval 10/05/12 07:22:23 to 10/05/12 07:30:36. The time segmentation unit 121 then adds all records in A to B and initializes A (S707). Note that the first time S706 and S707 are executed, A is still empty, so no variable changes.

次に、時区間化部１２１は、ＡにＤの中の最初のＮ件のレコードを格納する。図７でＤ（Ｎ）は、Ｄの最初のＮ件を意味する。時区間化部１２１は、ＡにＮ件のレコードを格納した後、Ｄ（Ｎ）をＤから削除する（Ｓ７０８）。 Next, the time segmentation unit 121 stores the first N records in D in A. In FIG. 7, D (N) means the first N cases of D. After storing N records in A, the time segmenting unit 121 deletes D (N) from D (S708).

さらに、時区間化部１２１は、Ａのレコードの時刻の中で最も遅い時刻と同じ時刻を持つレコードをＤの中から全てＡに移す。時区間化部１２１は、移したレコードをＤから削除する（Ｓ７０９）。このプロセスにより、時区間化部１２１の処理結果のデータの持つ各時区間が、重ならないようにすることができる。Ｓ７０９の処理を終えた後は、Ｓ７０５の条件分岐に再度戻る。 Further, the time segmentation unit 121 moves from D to A every record whose time equals the latest time among the records in A, and deletes the moved records from D (S709). This step ensures that the time intervals in the output of the time segmentation unit 121 do not overlap. After S709, processing returns to the conditional branch of S705.

Ｓ７０５で、時区間化部１２１は、Ｄ内のレコード数がＮよりも小さいと判定された場合には（Ｓ７０５：Ｙｅｓ）、まず、Ｄ内の全てのレコードをＡに追加する（Ｓ７１０）。このプロセスにより、時区間化部１２１の処理結果のデータは、どの時区間をとっても少なくともＮ件以上あるようにすることができる。 If the time segmentation unit 121 determines in S705 that the number of records in D is smaller than N (S705: Yes), it first adds all records in D to A (S710). This step ensures that every time interval in the output of the time segmentation unit 121 contains at least N records.

続いて行うＳ７１１及びＳ７１２は、それぞれＳ７０６、Ｓ７０７と全く同一である。 Subsequent S711 and S712 are exactly the same as S706 and S707, respectively.

時区間化部１２１は、以上のループ処理を、全ての日について行い、最後に、テーブルＢを出力して、Ｓ６０２の入力とし（Ｓ７１３）、処理を終える。 The time segmentation unit 121 performs the above loop processing for all the days, and finally outputs the table B to be input in S602 (S713) and ends the processing.
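The loop of S701 to S713 for a single day can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `segment_day`, the record layout `(user_id, station, time)`, and the representation of a time interval as a `(earliest, latest)` pair are assumptions made for the example.

```python
def segment_day(records, n):
    """Re-encode the times of one day's records into non-overlapping intervals.

    records: list of (user_id, station, time) tuples (hypothetical layout).
    n: target number of records per interval (N in the text).
    """
    d = sorted(records, key=lambda r: r[2])   # S703: sort D by time
    a, b = [], []                             # S704: table A, S701: output B

    def flush():
        # S706/S707: replace every time in A with the interval
        # [earliest, latest], append A to B, then reset A.
        if a:
            interval = (a[0][2], a[-1][2])
            b.extend((uid, st, interval) for uid, st, _ in a)
            a.clear()

    while len(d) >= n:                        # S705: No
        flush()                               # no-op on the first pass
        a.extend(d[:n])                       # S708: move the first N records
        del d[:n]
        while d and d[0][2] == a[-1][2]:      # S709: keep equal times together
            a.append(d.pop(0))
    a.extend(d)                               # S710: fewer than N records remain
    flush()                                   # S711/S712
    return b
```

Because the leftover records are appended to the last full chunk, the final interval never falls below N records unless the whole day contains fewer than N records.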

図７の例では、各時区間の持つレコードの概数をＮ件とすることにより、各時区間の持つレコード数の平準化を図っている。これは、通勤時間帯や帰宅時間帯は鉄道の利用者が大幅に増えることや、日中の鉄道利用が前記時間帯に比べて少ないことに対応するためのものである。これを行うことにより、利用頻度の高い通勤時間帯などは、時刻が他時間帯と比べてそれほど曖昧化されずに、匿名改札入退場データを利用する立場からは有用性が高いデータを生成できる。 In the example of FIG. 7, setting the approximate number of records per time interval to N levels the number of records across time intervals. This addresses the fact that railway usage rises sharply during the morning and evening commuting hours and is lower during the daytime. As a result, times in heavily used periods such as commuting hours are blurred less than times in other periods, which yields data that is more useful to those who consume the anonymized ticket gate entrance/exit data.

また、時区間の有するレコード数を平準化する方法は、図７に示した手法に限られない。例えば、１日ｔ件のレコードが存在した場合に、時区間の分割数Ｌを予め指定しておくことで、各々の時区間をおよそｔ／Ｌのレコード数を持つよう分割していく方法なども用いることができる。 Also, the method of leveling the number of records in the time interval is not limited to the method shown in FIG. For example, when there are t records per day, a method of dividing each time interval so as to have a record number of about t / L by specifying the division number L of the time interval in advance. Can also be used.

なお、図７は、各時区間の持つレコード数を平準化する１つの方法を例示したが、例えば、１５分毎や１時間毎といった一定の時間で区切って時区間を構成しても良い。例えば、１５分毎の場合には、07:23を07:15〜07:30に再符号化する、等である。 FIG. 7 illustrates one method for leveling the number of records per time interval, but the time intervals may instead be formed by splitting at a fixed width, such as every 15 minutes or every hour. With 15-minute intervals, for example, 07:23 is re-encoded to the interval 07:15 to 07:30.
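The fixed-width alternative can be illustrated with a short sketch. The function name `fixed_interval` is an assumption for the example, and the code assumes the width divides 60 minutes evenly.

```python
from datetime import datetime, timedelta

def fixed_interval(ts, minutes=15):
    """Return the (start, end) of the fixed-width interval containing ts.

    Assumes `minutes` divides 60 (for example 15, 30, or 60).
    """
    start = ts - timedelta(minutes=ts.minute % minutes,
                           seconds=ts.second,
                           microseconds=ts.microsecond)
    return start, start + timedelta(minutes=minutes)

# 07:23 falls in the interval 07:15 to 07:30.
start, end = fixed_interval(datetime(2010, 5, 12, 7, 23))
```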

図８は、図６のＳ６０２の詳細な流れの一例を示す。従って、図８に示す全てのステップの処理の主体は、位置匿名化部１２２である。 FIG. 8 shows an example of the detailed flow of S602 in FIG. 6. Accordingly, every step shown in FIG. 8 is performed by the position anonymization unit 122.

まず、位置匿名化部１２２は、最終的に出力するテーブルのレコードを格納するＢを空集合（レコードなし）に初期化する（Ｓ８０１）。 First, the position anonymization unit 122 initializes B, which stores the record of the table to be finally output, to an empty set (no record) (S801).

次に、位置匿名化部１２２は、入退場非区別方式では、Ｓ６０１から入力として渡されたレコード集合を、各レコードの持つ時区間のデータを用いて、同じ時区間を持つレコード集合に分割し、分割した各レコード集合Ｄに対して以下の処理を行う。入退場区別方式では、位置匿名化部１２２は、同じ時区間と入退場のタプルを持つレコード集合に分割し、分割した各レコード集合Ｄに対して以下の処理を行う（Ｓ８０２）。 Next, in the entrance / exit non-distinguishing method, the position anonymization unit 122 divides the record set passed as input from S601 into record sets having the same time interval using the data of the time interval possessed by each record. The following processing is performed on each divided record set D. In the entry / exit distinction method, the position anonymization unit 122 divides the record set into the record sets having the same time interval and the entry / exit tuple, and performs the following processing on each divided record set D (S802).

まず、位置匿名化部１２２は、駅隣接構造テーブル１３１をメモリ１０２にロードする（Ｓ８０３）。 First, the location anonymization unit 122 loads the station adjacent structure table 131 into the memory 102 (S803).

次に、位置匿名化部１２２は、Ｄから各駅の頻度を取得する（Ｓ８０４）。ここでの頻度とは、同一駅と関連付けられる相異なるユーザＩＤの総数を意味する。 Next, the position anonymization part 122 acquires the frequency of each station from D (S804). The frequency here means the total number of different user IDs associated with the same station.

次に、位置匿名化部１２２は、頻度が正で、かつｋ件未満の駅があるかどうか条件判定を行う（Ｓ８０５）。ここで、正とは０を含まない。 Next, the position anonymization unit 122 determines whether or not there is a station having a positive frequency and less than k (S805). Here, positive does not include 0.

Ｓ８０５で頻度が正で、ｋ件未満の駅が存在すると判定された場合には（Ｓ８０５：Ｙｅｓ）、位置匿名化部１２２は、頻度が正で、ｋ件未満の駅Ｘをランダムに１つ選択する（Ｓ８０６）。 If it is determined in S805 that there is a station whose frequency is positive and less than k (S805: Yes), the position anonymization unit 122 randomly selects one such station X, that is, a station whose frequency is positive and less than k (S806).

次に、位置匿名化部１２２は、Ｓ８０３でメモリ１０２にロードした（あるいは、Ｓ８１０でそれを書き換えた）駅隣接関係を示すテーブルから、駅Ｘの隣駅にあたるものを全て列挙し、この集合をＣＡＮＤとおく（Ｓ８０７）。ＣＡＮＤは、図２に示すテーブル構造から簡単に取得できる。 Next, the location anonymization unit 122 lists all of the stations adjacent to the station X from the table indicating the station adjacency relationship loaded into the memory 102 in S803 (or rewritten it in S810), and sets this set. It is set as CAND (S807). The CAND can be easily obtained from the table structure shown in FIG.

次に、位置匿名化部１２２は、ＣＡＮＤの中からシステムで定まっている指標を用いて、１つの駅Ｙを選択する（Ｓ８０８）。この処理の一例については、図９を用いて後に詳しく説明する。 Next, the location anonymization unit 122 selects one station Y from the CAND using an index determined by the system (S808). An example of this processing will be described in detail later with reference to FIG.

次に、位置匿名化部１２２は、Ｄ内に出現する全てのＸ、またはＹのセルを｛Ｘ、Ｙ｝に再符号化する（Ｓ８０９）。｛Ｘ、Ｙ｝は駅Ｘまたは駅Ｙに存在したことを示す。便宜上、以降駅｛Ｘ、Ｙ｝という呼び方をする。 Next, the position anonymization unit 122 re-encodes every cell in D that contains X or Y to {X, Y} (S809). {X, Y} indicates that the person was at station X or station Y. For convenience, it is referred to below as station {X, Y}.

次に、位置匿名化部１２２は、Ｓ８０３でメモリにロードした駅隣接構造テーブル内のＸ、Ｙも全て｛Ｘ、Ｙ｝に置き換え、｛Ｘ、Ｙ｝の頻度を駅Ｘの頻度と駅Ｙの頻度の和とする（Ｓ８１０）。以上の処理を終えたら、Ｓ８０５に戻る。 Next, the location anonymization unit 122 replaces all X and Y in the station adjacent structure table loaded into the memory in S803 with {X, Y}, and changes the frequency of {X, Y} to the frequency of the station X and the station Y. (S810). When the above processing is completed, the process returns to S805.

Ｓ８０５で、正の頻度でｋ件未満の駅がないと判定された場合には（Ｓ８０５：Ｎｏ）、位置匿名化部１２２は、ＢにＤ内のレコードを全て追加し（Ｓ８１１）、次のレコード集合に対するループ処理を行う。 If it is determined in S805 that there is no station whose frequency is positive and less than k (S805: No), the position anonymization unit 122 adds all records in D to B (S811) and proceeds to the loop processing for the next record set.

以上のループ処理を全ての対象に対して終えると、位置匿名化部１２２は、Ｂを匿名改札入退場テーブルに出力し、処理を終える（Ｓ８１２）。 When the above loop processing is finished for all the objects, the position anonymization unit 122 outputs B to the anonymous ticket gate entrance / exit table and finishes the processing (S812).
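The merging loop of S803 to S811 for one record set D can be sketched as below. The names `anonymize_stations` and `pick`, the edge-list form of the adjacency table, and the use of `frozenset` to represent a merged station such as {X, Y} are assumptions for illustration. The default `pick` (the neighbour with the smallest frequency) is only a placeholder for the system-defined index of S808.

```python
import random

def anonymize_stations(records, edges, k, pick=None, rng=random):
    """records: list of (user_id, station); edges: list of adjacent station pairs."""
    adj = {}                                              # S803: adjacency in memory
    for s, t in edges:
        fs, ft = frozenset([s]), frozenset([t])
        adj.setdefault(fs, set()).add(ft)
        adj.setdefault(ft, set()).add(fs)
    recs = [(uid, frozenset([s])) for uid, s in records]
    users = {}
    for uid, s in recs:
        users.setdefault(s, set()).add(uid)
    freq = {s: len(u) for s, u in users.items()}          # S804: distinct user IDs
    if pick is None:                                      # placeholder index
        pick = lambda x, cand, f: min(cand, key=lambda s: (f.get(s, 0), sorted(s)))
    while True:
        small = sorted((s for s, c in freq.items() if 0 < c < k), key=sorted)
        if not small:                                     # S805: No
            return recs                                   # S811: D goes to B
        x = rng.choice(small)                             # S806
        cand = adj.get(x, set())                          # S807: CAND
        if not cand:
            raise ValueError("station has no neighbour left to merge with")
        y = pick(x, cand, freq)                           # S808
        xy = x | y
        recs = [(uid, xy if s in (x, y) else s) for uid, s in recs]   # S809
        freq[xy] = freq.pop(x, 0) + freq.pop(y, 0)        # S810: sum of frequencies
        nbrs = (adj.pop(x, set()) | adj.pop(y, set())) - {x, y}
        adj[xy] = nbrs
        for s in nbrs:                                    # rewrite X and Y to {X, Y}
            adj[s] = (adj[s] - {x, y}) | {xy}
```

Because only adjacent stations are merged, every re-encoded station corresponds to a connected part of the station graph.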

図８の例では、Ｓ８０６で、頻度がｋ件未満の駅がランダムに選択されるが、何らかの指標を用いて選択が行われても良い。例えば、最も頻度の小さい駅を選択するという方法が適用されても良い。 In the example of FIG. 8, in S806, stations with a frequency of less than k are randomly selected, but selection may be performed using some index. For example, a method of selecting a station with the lowest frequency may be applied.

図９は、図８のＳ８０８の詳細な流れの一例を示す。図９を参照して、位置匿名化部１２２がＳ８０８でＣＡＮＤの中から１つの駅Ｙを選択する動作の一例を説明する。なお、図９の例では、ＣＡＮＤ、ＣＡＮＤに含まれる各駅の頻度、駅Ｘの頻度を入力として、駅Ｙが返される。 FIG. 9 shows an example of the detailed flow of S808 in FIG. 8. With reference to FIG. 9, an example of the operation in which the position anonymization unit 122 selects one station Y from CAND in S808 is described. In the example of FIG. 9, the inputs are CAND, the frequency of each station in CAND, and the frequency of station X, and station Y is returned.

まず、位置匿名化部１２２は、指標を格納する変数aを無限大に初期化し、最終的に返す駅を格納する変数Ｙをヌル値に初期化する（Ｓ９０１）。 First, the position anonymization unit 122 initializes a variable a for storing an index to infinity, and initializes a variable Y for storing a station to be finally returned to a null value (S901).

次に、位置匿名化部１２２は、駅Ｘと再符号化した時に指標が最小となるＣＡＮＤ内の駅を探索するために、ＣＡＮＤ内の各駅Ｓに対して以下の手続きを実行する（Ｓ９０２）。 Next, the position anonymization unit 122 executes the following procedure for each station S in the CAND in order to search for a station in the CAND that has the smallest index when re-encoding with the station X (S902). .

位置匿名化部１２２は、ｂを以下の式で計算した指標とする（Ｓ９０３）。

The position anonymization unit 122 sets b as an index calculated by the following formula (S903).

ただし、ｃｎｔ（ｈ）は駅を引数とし、その駅の頻度を表すものとする。ｌｏｇの底はシステムで統一をしていれば任意でよい。ｌｏｇの底として、一般的には、２が用いられる。また、０ｌｏｇ０は、０とする。数１は、駅Ｘと駅Ｓを再符号化した時に失う情報エントロピーを表す量である。 Here, cnt(h) takes a station as its argument and denotes the frequency of that station. The base of the logarithm may be chosen freely as long as it is used consistently throughout the system; base 2 is commonly used. In addition, 0 log 0 is defined as 0. Equation 1 is a quantity representing the information entropy lost when station X and station S are re-encoded together.

次に、位置匿名化部１２２は、ａがｂより大きいかどうか条件判定を行う（Ｓ９０４）。ａがｂより大きいと判定された場合には、再符号化時に、より指標を小さくする駅候補が見つかったことを意味する。 Next, the position anonymization unit 122 determines whether or not a is larger than b (S904). If it is determined that a is larger than b, it means that a station candidate having a smaller index is found during re-encoding.

Ｓ９０４で、ａがｂより大きいと判定された場合には（Ｓ９０４：Ｙｅｓ）、位置匿名化部１２２は、ａにｂを代入し、ＹにＳを代入する（Ｓ９０５）。 If it is determined in S904 that a is greater than b (S904: Yes), the position anonymization unit 122 substitutes b for a and S for Y (S905).

以上のループ処理を終えた後、位置匿名化部１２２は、Ｙの値を返して処理を終了する。 After completing the above loop processing, the position anonymization unit 122 returns the value of Y and ends the processing.

図９の例によれば、駅Ｘに隣接する複数の駅のうち、上記ｂの値が最も小さくなる駅が、駅Ｙとして選択される。上記ｂの値が最も小さいということは、再符号化による情報損失が最も小さいということである。これは、利用者数が最も少ない駅を利用した個人の個人情報が、利用者数が多い駅を利用した個人の個人情報よりも価値が高いという考えに基づいている。すなわち、図９の例によれば、ｋ−匿名化を実現しつつ、情報の損失が少ない匿名改札入退場テーブル１３３を作成することができる。 According to the example of FIG. 9, the station having the smallest value of b among the plurality of stations adjacent to the station X is selected as the station Y. That the value of b is the smallest means that the information loss due to re-encoding is the smallest. This is based on the idea that the personal information of individuals using the station with the smallest number of users is more valuable than the personal information of individuals using the stations with the largest number of users. That is, according to the example of FIG. 9, it is possible to create the anonymous ticket gate entrance / exit table 133 with little information loss while realizing k-anonymization.
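The selection of S901 to S905 can be sketched as follows. The formula of Equation 1 is not reproduced in this text, so the function `entropy_loss` below is an assumption: it uses one common form of the entropy lost by merging two counts, (cX + cS) log(cX + cS) - cX log cX - cS log cS, with base 2 and 0 log 0 = 0 as stated in the description. The names `xlog2x`, `entropy_loss`, and `select_y` are hypothetical.

```python
import math

def xlog2x(n):
    """n * log2(n), with the convention 0 log 0 = 0."""
    return 0.0 if n == 0 else n * math.log2(n)

def entropy_loss(cx, cs):
    # Assumed stand-in for Equation 1 (the original formula is not shown here).
    return xlog2x(cx + cs) - xlog2x(cx) - xlog2x(cs)

def select_y(x, cand, cnt):
    """Return the station in cand that minimizes the index when merged with x."""
    a, y = math.inf, None                      # S901
    for s in cand:                             # S902
        b = entropy_loss(cnt[x], cnt[s])       # S903
        if a > b:                              # S904
            a, y = b, s                        # S905
    return y

cnt = {"X": 2, "P": 5, "Q": 1}                 # hypothetical frequencies
chosen = select_y("X", ["P", "Q"], cnt)        # Q: the less-used neighbour
```

Under this assumed index, and for a fixed frequency of station X, the value grows with the frequency of the candidate, so low-frequency neighbours are preferred.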

図９の例では、前述したように、損失する情報エントロピーが、再符号化対象を選択するための指標として採用されているが、特にこれに限る必要はない。例えば、地理上の距離の近さが指標とされてもよいし、利用頻度（利用者数）の和がｋにどれだけ近いかが指標とされてもよい。 In the example of FIG. 9, as described above, the lost information entropy is adopted as an index for selecting a re-encoding target, but it is not necessary to be limited to this. For example, the proximity of geographical distances may be used as an index, and how close the sum of usage frequencies (number of users) is to k may be used as an index.

以上のように、本実施例に係る計算機１００の１つの特徴は、時刻を時区間化する方法と隣駅だけ再符号化を行う位置匿名化方法を有することにある。複数ある隣駅から再符号化をする駅を選択する際には、何らかの指標が用いられる。図９を参照して説明したように、損失する情報エントロピーが指標として用いられれば、利用頻度の小さい駅同士が再符号化される可能性が高くなるため、過度の再符号化を避けた有用性の高い匿名改札入退場テーブル１３３を生成することができる。 As described above, one feature of the computer 100 according to this embodiment is that it has a method for converting times into time intervals and a position anonymization method that re-encodes a station only with its adjacent stations. When a station to be re-encoded is selected from among several adjacent stations, some index is used. As described with reference to FIG. 9, if the lost information entropy is used as the index, stations with low usage frequency are more likely to be re-encoded with each other, so a highly useful anonymous ticket gate entrance/exit table 133 that avoids excessive re-encoding can be generated.

また、本実施例によれば、乗車・降車を考慮した個人情報匿名化を行うことができる。 In addition, according to the present embodiment, personal information anonymization in consideration of getting on and off can be performed.

なお、計算機１００の実現する図６〜図９の動作の例では、以下のような方法で匿名性を破られてしまう可能性もある。図１０Ａのような駅隣接関係があり、駅Ｂの近くに住んでいるような場合には、ある１日の改札入退場履歴を参照した場合、最初に改札を入場した駅と最後に改札を退場した駅は同一にＢである場合が多い。図１０Ｂは、当該個人のある１日の改札入退場履歴を図６〜図９の動作の例で匿名化を行った例であるが、攻撃者は前述の前提知識により、図１０Ｂのデータだけから、｛Ａ，Ｂ｝と｛Ｂ，Ｃ｝の積集合を取って、当該個人の利用駅は駅Ｂに違いないと予測することができる。このリスクは、予測開示リスクと呼ばれ、図６〜図９の動作では回避したいリスクの対象とはなっていないものである。そこで、この予測開示リスクを回避したい場合に、どのように動作を変更すれば良いかの一例を示す。 Note that, with the example operation of FIGS. 6 to 9 implemented by the computer 100, anonymity may still be broken in the following way. Suppose the stations are adjacent as shown in FIG. 10A and a person lives near station B. In that person's ticket gate entrance/exit history for a given day, the station first entered and the station last exited are often both station B. FIG. 10B shows one day of that person's entrance/exit history anonymized by the example operation of FIGS. 6 to 9. Using the background knowledge above, an attacker can take the intersection of {A, B} and {B, C} from the data of FIG. 10B alone and infer that the station used by the person must be station B. This risk is called predictive disclosure risk, and the operation of FIGS. 6 to 9 is not designed to avoid it. An example of how to change the operation to avoid this risk is given below.
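The inference described above amounts to a set intersection. The following sketch, with the hypothetical helper `candidate_stations`, shows why differently re-encoded groups leak the home station while identically re-encoded groups do not.

```python
def candidate_stations(observations):
    """Intersect the anonymized station groups observed for one person."""
    groups = [set(g) for g in observations]
    return set.intersection(*groups)

# First entry re-encoded as {A, B}, last exit as {B, C}: only B remains.
leaky = candidate_stations([{"A", "B"}, {"B", "C"}])

# If both are re-encoded to the same group, nothing is narrowed down.
safe = candidate_stations([{"A", "B", "C"}, {"A", "B", "C"}])
```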

まず、図８の動作例において、位置匿名化部１２２は、Ｓ８０２のループを行わず、全ての時区間のレコードをＤとする。Ｓ８０４では、位置匿名化部１２２は、全時区間での駅頻度と、（時区間、駅）のタプルでの駅頻度を両方取得する。また、Ｓ８０５の条件判定は、「頻度がｋ件未満の（時区間、駅）のタプルがある」である。Ｓ８０６では、「頻度がｋ件未満の（時区間、駅）のタプルがある駅Ｘをランダムに１つ取得する」という処理が行われる。また、図９の動作例では、入力のＣＡＮＤ内各駅の頻度と駅Ｘの頻度が、Ｓ８０４で取得した全時区間での駅頻度とされる。 First, in the operation example of FIG. 8, the position anonymization unit 122 does not perform the loop of S802 and instead treats the records of all time intervals as D. In S804, the position anonymization unit 122 acquires both the station frequency over all time intervals and the station frequency for each (time interval, station) tuple. The condition evaluated in S805 becomes "there is a (time interval, station) tuple whose frequency is less than k." In S806, the processing becomes "randomly select one station X that has a (time interval, station) tuple whose frequency is less than k." In the operation example of FIG. 9, the frequencies of the stations in CAND and of station X given as input are the all-interval station frequencies acquired in S804.

以上の変更により、全ての時区間で、匿名化前の駅が同じであれば、それを全く同一の駅に再符号化することができる。例えば、図１０Ｂの例では、何れも駅｛Ａ，Ｂ，Ｃ｝に再符号化される。従って、前述の予測開示リスクをも回避する匿名化が可能となる。 With the above change, if the station before anonymization is the same in all time sections, it can be re-encoded into the same station. For example, in the example of FIG. 10B, all are re-encoded to the stations {A, B, C}. Therefore, anonymization that avoids the above-described risk of predictive disclosure is possible.

以下、本発明の実施例２を説明する。その際、実施例１との相違点を主に説明し、実施例１との共通点については説明を省略或いは簡略する。具体的には、例えば、実施例２を説明する場合、上述の実施例１と重複する構成に対しては同じ符号を付与して説明を省略する。また、実施例１と同じ動作に対しては、同じ符号を付与して説明を省略する。 Embodiment 2 of the present invention will be described below. At that time, differences from the first embodiment will be mainly described, and description of common points with the first embodiment will be omitted or simplified. Specifically, for example, when describing the second embodiment, the same reference numerals are given to the same components as those in the first embodiment, and the description thereof is omitted. Further, the same operations as those in the first embodiment are denoted by the same reference numerals and the description thereof is omitted.

実施例２は、異なる路線の駅とは再符号化を行わないことにより、実際に改札入退場を行った路線を保存することを１つの特徴とする。 The second embodiment is characterized in that a route that has actually entered and exited a ticket gate is stored by not performing re-encoding with a station on a different route.

図１１は、本発明の実施例２に係る個人情報匿名化装置が適用された計算機の構成例を示す。以下の説明でも実施例１と同様に、「入退場非区別方式」と「入退場区別方式」を説明する。それぞれの方式で別個の処理を持つ場合には明確にその旨を記す。特に方式を断らない場合には、両方式で共通であることを意味する。 FIG. 11 shows a configuration example of a computer to which the personal information anonymization device according to the second embodiment of the present invention is applied. As in the first embodiment, the following description covers both the "entrance/exit non-distinguishing method" and the "entrance/exit distinguishing method". Where the two methods differ in processing, this is stated explicitly. Where no method is specified, the description applies to both.

ＣＰＵ１０１は、メモリ１０２上にプログラム１１５１をロードし、実行することにより、時区間化部１２１及び位置匿名化部１１２１並びに路線探索部１１２２を実現する。 The CPU 101 loads the program 1151 on the memory 102 and executes it, thereby realizing the time segmentation unit 121, the position anonymization unit 1121, and the route search unit 1122.

路線探索部１１２２は、改札入退場テーブル１３２を入力とし、各個人の改札の入場時刻、入場駅のタプル、および該個人の直後の改札退場時刻、退場駅のタプルをセットにして、入場駅で乗車した路線および退場駅で降車した路線を算出する。路線探索部１１２２は、改札入退場テーブル１３２の各レコードに、この路線を付与して出力し、時区間化部１２１の入力に渡す。なお、路線探索部１１２２の実現には、既存の経路探索技術（例えば、Ｗｅｂ上で提供されている経路探索サービスに従う技術）をそのまま利用することができる。 The route search unit 1122 takes the ticket gate entrance / exit table 132 as an input, sets the entrance time of each individual ticket gate, the tuple of the entrance station, the ticket exit time immediately after the individual, and the tuple of the exit station as a set. Calculate the route you got on and the route you got off at the exit station. The route search unit 1122 assigns and outputs this route to each record in the ticket gate entrance / exit table 132 and passes it to the input of the time segmentation unit 121. In order to realize the route search unit 1122, an existing route search technology (for example, a technology according to a route search service provided on the Web) can be used as it is.

図１３は、路線探索部１１２２の出力例を示す。 FIG. 13 shows an output example of the route search unit 1122.

図３に示したような入力の改札入退場テーブル１３２から、路線探索部１１２２は、例えば、ユーザＩＤ「１」の個人が、2010/05/12 07:20:13に駅Ａに入場後、2010/05/12 07:43:10に駅Ｇを退場したという情報を取得する。そして、路線探索部１１２２は、これを既存の経路探索技術を用いて、ａｌｐｈａ路線で乗車、ａｌｐｈａ路線で降車という情報を推定する。さらに、路線探索部１１２２は、この情報を入場、退場それぞれのレコードに対し付与して出力する。この出力された情報の一例が、図１３に示す情報である。なお、乗車路線と降車路線が異なる場合もある。 From the input ticket gate entrance/exit table 132 as shown in FIG. 3, the route search unit 1122 obtains, for example, the information that the individual with user ID "1" entered station A at 2010/05/12 07:20:13 and then exited station G at 2010/05/12 07:43:10. Using existing route search technology, the route search unit 1122 then estimates that the individual boarded on the alpha route and got off on the alpha route. The route search unit 1122 attaches this information to the entrance record and the exit record and outputs them. The information shown in FIG. 13 is an example of this output. Note that the boarding route and the alighting route may differ.

時区間化部１２１は、路線探索部１１２２の出力を入力として受け取り、実施例１と同様に、各レコードの時刻のカラムを時区間へと変換したテーブルを出力とし、位置匿名化部１１２１の入力へと渡す。 The time segmentation unit 121 receives the output of the route search unit 1122 as an input, and outputs a table obtained by converting the time column of each record into a time interval as in the first embodiment, and the input of the position anonymization unit 1121 Pass to.

位置匿名化部１１２１は、時区間化部１２１から渡されたテーブル、及び路線駅隣接構造テーブル１１３１、及び最小同値件数情報１３４を入力とする。入退場非区別方式の場合には、位置匿名化部１１２１は、任意の時区間、任意の路線上の任意の駅に存在した総人数が０（又はｋ以上）になるように駅を再符号化する。入退場区別方式の場合には、位置匿名化部１１２１は、任意の時区間に任意の路線上の任意の駅に入場した総人数、及び、任意の時区間に任意の路線上の任意の駅から退場した総人数が、それぞれ０（又はｋ以上）になるように駅を再符号化する。このとき、再符号化を行う駅同士は、何れも同一路線上の駅に限定される。位置匿名化部１２２は、以上の結果を匿名改札入退場テーブル１３３に出力する。図１４は、実施例２での匿名化により生成された匿名改札入退場テーブル１３３の一例を示しているが、匿名化後の駅が示す本来の駅がいずれも同じ路線であることがわかる。なお、このテーブル１３３が表す情報が、出力装置１０５を介して出力されてもよい。 The position anonymization unit 1121 takes as input the table passed from the time segmentation unit 121, the route station adjacency structure table 1131, and the minimum equivalence count information 134. In the entrance/exit non-distinguishing method, the position anonymization unit 1121 re-encodes stations so that, for any time interval and any station on any route, the total number of people present is either 0 or at least k. In the entrance/exit distinguishing method, the position anonymization unit 1121 re-encodes stations so that the total number of people who entered any station on any route in any time interval, and the total number of people who exited any station on any route in any time interval, are each either 0 or at least k. In both cases, stations are re-encoded only with stations on the same route. The position anonymization unit 122 outputs the result to the anonymous ticket gate entrance/exit table 133. FIG. 14 shows an example of the anonymous ticket gate entrance/exit table 133 generated by the anonymization of the second embodiment; the original stations represented by each anonymized station are all on the same route. The information represented by the table 133 may also be output via the output device 105.

図１２Ａは、路線駅隣接構造テーブル１１３１の構成例を示す。 FIG. 12A shows a configuration example of the route station adjacent structure table 1131.

図２に示した駅隣接構造テーブル１３１との違いは、駅と駅を結ぶ線路をどの路線が走るかを表す路線のカラムが増えている点である。図１２Ａでは、路線駅隣接構造テーブル１１３１は、図１２Ｂに示す路線構造１２００に対応する。具体的には、図１２Ａに示す路線駅隣接構造テーブル１１３１によれば、路線ａｌｐｈａが、Ａ−Ｂ−Ｃ−Ｊ−Ｋ−Ｇ−Ｌ−Ｍと構成されていることがわかり、同様に路線ｂｅｔａは、Ｃ−Ｄ−Ｅ−Ｆ−Ｇ−Ｈ−Ｉ−Ｃという環状線になっていることがわかる。なお、同一駅間を異なる路線が走る場合には、例えば(Ｎ,Ｏ,ｇａｍｍａ)、（Ｎ,Ｏ,ｄｅｌｔａ)という２つのレコードで、駅Ｎと駅Ｏの間に２つの路線ｇａｍｍａ、ｄｅｌｔａが走っていることを表すことができる。 The difference from the station adjacency structure table 131 shown in FIG. 2 is the additional route column, which indicates which route runs on the track connecting two stations. The route station adjacency structure table 1131 of FIG. 12A corresponds to the route structure 1200 shown in FIG. 12B. Specifically, the table 1131 of FIG. 12A shows that the route alpha consists of A-B-C-J-K-G-L-M and that the route beta is a loop line C-D-E-F-G-H-I-C. When different routes run between the same pair of stations, two records such as (N, O, gamma) and (N, O, delta) can express that the two routes gamma and delta run between station N and station O.
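One possible in-memory form of such a table is a list of (station, station, route) records, matching the record style (N, O, gamma) used above. The sketch below is illustrative only: the exact column layout of FIG. 12A is not reproduced here, and the names `TABLE`, `route_adjacency`, and `is_loop` are assumptions. A route is treated as a loop when every station on it has exactly two neighbours.

```python
ALPHA = "ABCJKGLM"          # A-B-C-J-K-G-L-M
BETA = "CDEFGHIC"           # C-D-E-F-G-H-I-C (loop)

# One record per pair of adjacent stations, tagged with the route name.
TABLE = ([(s, t, "alpha") for s, t in zip(ALPHA, ALPHA[1:])] +
         [(s, t, "beta") for s, t in zip(BETA, BETA[1:])])

def route_adjacency(table, route):
    """Return {station: set of adjacent stations} restricted to one route."""
    adj = {}
    for s, t, r in table:
        if r == route:
            adj.setdefault(s, set()).add(t)
            adj.setdefault(t, set()).add(s)
    return adj

def is_loop(adj):
    """True if no station is an end point (every station has two neighbours)."""
    return all(len(nbrs) == 2 for nbrs in adj.values())

alpha_adj = route_adjacency(TABLE, "alpha")
beta_adj = route_adjacency(TABLE, "beta")
```

Stations C and G appear on both routes, but their neighbours are kept separate per route, which is what restricts re-encoding to stations on the same route.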

なお、路線駅隣接構造テーブル１１３１の構造は、必ずしも図１２Ａに示す構造に限られない。グラフ構造には様々な表現形式が存在し、実際の駅の隣接関係とその間を走る路線を定義できるものであれば、その何れでも良い。 The structure of the route station adjacent structure table 1131 is not necessarily limited to the structure shown in FIG. 12A. There are various representation formats in the graph structure, and any of them can be used as long as it can define the adjacent relationship between the actual stations and the route that runs between them.

次に、図１５は、実施例２に係る計算機１００が実行する処理の全体の流れの一例を示す。 Next, FIG. 15 illustrates an example of the overall flow of processing executed by the computer 100 according to the second embodiment.

まず、Ｓ１５０１で、路線探索部１１２２が、改札入退場テーブル１３２の各レコードの乗車又は降車の路線を推定する。なお、この推定は、前述したように、既存技術で良いので、詳細な説明を省略する。路線探索部１１２２は、改札入退場テーブル１３２の各レコードに推定路線情報（各レコードに登録されている駅の推定された路線を表す情報）が付加されたテーブルを出力する。 First, in S1501, the route search unit 1122 estimates the route of getting on or getting off of each record in the ticket gate entrance / exit table 132. Note that, as described above, this estimation may be performed using existing technology, and thus detailed description thereof is omitted. The route search unit 1122 outputs a table in which estimated route information (information indicating an estimated route of a station registered in each record) is added to each record of the ticket gate entrance / exit table 132.

次に、Ｓ１５０２で、時区間化部１２１が、路線探索部１１２２が出力したテーブルを入力とし、入力のテーブルの時刻のカラムを時区間へと変換する。Ｓ１５０２の詳細については、Ｓ７０２で改札入退場テーブル１３２を取得するのではなく、路線探索部１１２２の出力を取得する部分が異なる以外は、実施例１と同様なので、説明を省略する。 Next, in S1502, the time segmentation unit 121 takes the table output by the route search unit 1122 as input and converts its time column into time intervals. The details of S1502 are the same as in the first embodiment, except that in S702 the output of the route search unit 1122 is acquired instead of the ticket gate entrance/exit table 132, so the description is omitted.

次に、Ｓ１５０３で、位置匿名化部１１２１が、下記の処理、
（ａ）時区間化部１２１の出力と、路線駅隣接構造テーブル１１３１と、最小同値件数情報１３４とを参照する、
（ｂ）入退場非区別方式の場合には、同一路線の同一の駅５０２及び時区間５０３の組合せを含んだタプル（レコード）に対応する異なる個人の数がｋ以上（最小同値件数４０１以上）になるように駅３０２を再符号化する、
（ｃ）入退場区別方式の場合には、同一路線の同一の駅５０２、時区間５０３及び入退場５０４の組合せを含んだタプル（レコード）に対応する異なる個人の数がｋ以上（最小同件数４０１以上）になるように駅３０２を再符号化する、
（ｄ）上記（ｂ）又は（ｃ）の結果に従う駅５０２（駅３０２が再符号化されることにより得られた情報）と、それに対応するユーザＩＤ５０１、時区間５０３及び入退場５０４を、匿名改札入退場テーブル１３３に格納する、
を行う。 Next, in S1503, the position anonymization unit 1121 performs the following processing:
(a) refers to the output of the time segmentation unit 121, the route station adjacency structure table 1131, and the minimum equivalence count information 134;
(b) in the entrance/exit non-distinguishing method, re-encodes the station 302 so that the number of distinct individuals corresponding to the tuples (records) that contain the same combination of station 502 on the same route and time interval 503 is k or more (the minimum equivalence count 401 or more);
(c) in the entrance/exit distinguishing method, re-encodes the station 302 so that the number of distinct individuals corresponding to the tuples (records) that contain the same combination of station 502 on the same route, time interval 503, and entrance/exit 504 is k or more (the minimum equivalence count 401 or more);
(d) stores the station 502 obtained according to the result of (b) or (c) (the information obtained by re-encoding the station 302), together with the corresponding user ID 501, time interval 503, and entrance/exit 504, in the anonymous ticket gate entrance/exit table 133.

図１６は、図１５のＳ１５０３の詳細な流れの一例を示す。図１６の各ステップを位置匿名化部１１２１が行う。図１６に示した例は、Ｓ６０２の詳細とほとんど同じであるから、動作を異とするステップのみ、異なるステップ番号を付与し、その説明を行う。 FIG. 16 shows an example of a detailed flow of S1503 in FIG. The position anonymization part 1121 performs each step of FIG. Since the example shown in FIG. 16 is almost the same as the details of S602, only steps having different operations are given different step numbers and described.

Ｓ１６０１では、位置匿名化部１１２１は、入退場非区別方式では、Ｓ１５０２から入力として渡されたレコード集合を、同じ時区間を持つレコード集合に分割し、分割した各レコード集合Ｅに対して以下のループ処理を行う。入退場区別方式では、位置匿名化部１１２１は、同じ時区間と入退場のタプルを持つレコード集合に分割し、分割した各レコード集合Ｅに対して以下のループ処理を行う。 In S1601, in the entrance/exit non-distinguishing method, the position anonymization unit 1121 divides the record set passed as input from S1502 into record sets that share the same time interval and performs the following loop processing on each resulting record set E. In the entrance/exit distinguishing method, the position anonymization unit 1121 divides the input into record sets that share the same (time interval, entrance/exit) tuple and performs the following loop processing on each resulting record set E.

Ｓ１６０２では、位置匿名化部１１２１は、Ｅに含まれるレコードを路線ごとに分割し、その各々のレコード集合Ｄに対して、以降のループ処理を行う。 In S1602, the position anonymization unit 1121 divides the record included in E for each route, and performs the subsequent loop processing on each record set D.

Ｓ１６０３では、位置匿名化部１１２１は、Ｓ１６０２で選択した路線に関する駅隣接構造を、路線駅隣接構造テーブル１１３１からメモリ１０２にロードする。 In S1603, the position anonymization unit 1121 loads the station adjacent structure related to the route selected in S1602 from the route station adjacent structure table 1131 to the memory 102.

以上のように、図８の処理例を変更すれば、Ｓ１５０３の１つの処理例を実現することができる。 As described above, if the processing example of FIG. 8 is changed, one processing example of S1503 can be realized.
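The two-level split of S1601 and S1602 can be sketched as a single grouping step. The record layout `(user_id, station, interval, inout, route)` and the function name `partition` are assumptions for the example; each resulting group plays the role of one record set D that is then processed by the loop of FIG. 8 using the adjacency of that route only.

```python
from collections import defaultdict

def partition(records, distinguish_inout):
    """Group records by time interval (and entrance/exit), then by route.

    records: list of (user_id, station, interval, inout, route) tuples
    (hypothetical layout).
    """
    groups = defaultdict(list)
    for rec in records:
        _uid, _station, interval, inout, route = rec
        if distinguish_inout:
            key = (interval, inout, route)    # entrance/exit distinguishing
        else:
            key = (interval, route)           # entrance/exit non-distinguishing
        groups[key].append(rec)
    return dict(groups)
```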

図１７は、図１５のＳ１５０３の詳細な流れの別の一例を示す。図１７の各ステップを位置匿名化部１１２１が行う。図１７の例では、図８及び図１６と同じステップには、同一のステップ番号が付与されている。図８及び図１６と異なるステップを主に説明する。 FIG. 17 shows another example of the detailed flow of S1503 of FIG. The position anonymization unit 1121 performs each step of FIG. In the example of FIG. 17, the same step number is assigned to the same step as in FIGS. 8 and 16. Steps different from those in FIGS. 8 and 16 will be mainly described.

図１２Ａに示したテーブルから、路線の構造として、図１８Ａに示す直線（Ａ−Ｂ−Ｃ−Ｄ−Ｅと直線的に移動する通常の路線）と、図１８Ｂに示す環状線（Ａ−Ｂ−Ｃ−Ｄ−Ｅ−Ｆ−Ａと周回的に移動する環状線）とのうちのいずれかが特定される。 From the table shown in FIG. 12A, as the structure of the route, a straight line shown in FIG. 18A (a normal route that moves linearly with A-B-C-D-E) and a circular line shown in FIG. 18B (A-B -C-D-E-F-A and a circular line moving in a circular manner) are specified.

位置匿名化部１１２１は、Ｓ１６０３で処理中の路線の駅隣接関係と、Ｓ８０４で各駅の利用頻度とを取得した後、この駅隣接関係及び利用頻度を利用して、Ｈｕ−Ｔｕｃｋｅｒ木を生成する（Ｓ１７０１）。この符号木の構成方法として、例えば、文献「D.E. Knuth, “The Art of Computer Programming: Volume 3 Sorting and Searching,” Addison-Wesley, pp.439〜444, 1973」に記載の方法を採用することができる。Ｈｕ−Ｔｕｃｋｅｒ木は、全順序関係を保存した２分木を生成するアルゴリズムである。図１８Ａに示した直線の場合には、位置匿名化部１１２１は、その直線の一端の駅から順に他端の駅までを全順序として捉える。どちらの端を先頭にするかについては、アルゴリズムは依存しない。例えば、図１８Ａの例では、Ａ→Ｂ→Ｃ→Ｄ→Ｅという順序とする。図１８Ｂに示した環状線場合には、全順序ではなく、巡回的な順序となっているが、Ｈｕ−Ｔｕｃｋｅｒ木の構成アルゴリズムでは、隣接関係の判定のみしか行われないため、問題なく実行可能である。 After acquiring the station adjacency relationship of the route being processed in S1603 and the usage frequency of each station in S804, the position anonymization unit 1121 uses them to build a Hu-Tucker tree (S1701). The code tree can be constructed, for example, by the method described in D.E. Knuth, "The Art of Computer Programming: Volume 3 Sorting and Searching," Addison-Wesley, pp. 439-444, 1973. The Hu-Tucker method is an algorithm that builds a binary tree preserving a total order. For the straight line shown in FIG. 18A, the position anonymization unit 1121 treats the stations from one end of the line to the other as a total order. The algorithm does not depend on which end is taken as the start. In the example of FIG. 18A, the order is A → B → C → D → E. For the loop line shown in FIG. 18B, the order is cyclic rather than total, but because the Hu-Tucker construction algorithm only tests adjacency, it can be executed without any problem.

Next, the position anonymization unit 1121 assigns a label and a frequency to each internal node of the constructed Hu-Tucker tree (S1702). Specifically, for example, the label lists all leaves (actual stations) descended from the node, and the frequency is the sum of the frequencies of all those leaves.

FIG. 19A shows an example of the Hu-Tucker tree constructed in S1701 and S1702.

In FIG. 19A, the nodes 1901 drawn as circles correspond to stations, and the broken line in the figure indicates the route structure A-B-C-D-E. For example, when frequency information such as that in FIG. 19B is acquired in S804, the Hu-Tucker tree has the structure shown in FIG. 19A. Note that the broken lines directly connecting A, B, C, D, and E are not branches of the tree. The nodes 1902 drawn as squares are internal nodes. When labels and frequencies are assigned to internal nodes in S1702, for a node such as node 1902, for example, the label is {A, B, C}, the list of its descendant leaves, and the frequency is 10, the sum of the frequencies of leaves A, B, and C. The internal node that is most ambiguous when used for re-encoding and has the highest frequency, such as {A, B, C, D, E}, is called the root.
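The labeling rule of S1702 can be illustrated with a minimal Python sketch. The `Node` class, the helpers `leaf` and `join`, the tree shape, and the leaf frequencies are all assumptions made for illustration; the frequencies are chosen only so that {A, B, C} sums to 10 as in the text, and the tree is built by hand rather than by the Hu-Tucker algorithm.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    label: Tuple[str, ...]          # stations covered by this node, in route order
    freq: int                       # number of users counted for those stations
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaf(station: str, freq: int) -> Node:
    """A leaf is an actual station with its own usage frequency."""
    return Node((station,), freq)


def join(left: Node, right: Node) -> Node:
    """Internal node: label lists all descendant leaves, freq is their sum."""
    return Node(left.label + right.label, left.freq + right.freq, left, right)


# Hypothetical frequencies and a hand-built tree shape (not Hu-Tucker output).
abc = join(join(leaf("A", 2), leaf("B", 3)), leaf("C", 5))
root = join(abc, join(leaf("D", 6), leaf("E", 4)))

print(abc.label, abc.freq)
print(root.label, root.freq)
```

Because each internal node concatenates the labels of its children in order, every label is a contiguous run of stations along the route, which is what keeps re-encoding restricted to adjacent stations.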

Next, the position anonymization unit 1121 determines, on the created Hu-Tucker tree, which station is to be re-encoded to which internal node (S1703), and re-encodes the station of each record in D according to the determined correspondence (S1704). Specifically, for example, the position anonymization unit 1121 marks the nodes that are re-encoding targets in S1703, and in S1704 re-encodes each leaf to the first node marked as a re-encoding target that is reached by following parents upward from the leaf. When the leaf itself is a re-encoding target, no re-encoding is performed.
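As a rough sketch of the S1704 lookup, the following Python fragment walks from a leaf toward the root until it reaches a marked node. The function name `recode`, the `parent` dictionary, and the example `marked` set are hypothetical; nodes are represented simply by their label tuples.

```python
def recode(station, parent, marked):
    """Return the first marked node found when climbing from the leaf."""
    node = (station,)
    while node not in marked:
        # Raises KeyError if no ancestor is marked, which a complete
        # marking pass over the tree is assumed to rule out.
        node = parent[node]
    return node


# Hypothetical parent links for a five-station straight line A-B-C-D-E.
parent = {
    ("A",): ("A", "B"),
    ("B",): ("A", "B"),
    ("A", "B"): ("A", "B", "C"),
    ("C",): ("A", "B", "C"),
    ("D",): ("D", "E"),
    ("E",): ("D", "E"),
    ("A", "B", "C"): ("A", "B", "C", "D", "E"),
    ("D", "E"): ("A", "B", "C", "D", "E"),
}
marked = {("A", "B"), ("C",), ("D", "E")}  # assumed result of S1703

print(recode("A", parent, marked))
print(recode("C", parent, marked))
```

Station C is itself marked, so it is returned unchanged, matching the rule that a marked leaf is not re-encoded.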

FIG. 20 shows an example of the detailed flow of S1703 in FIG. 17 (the processing that determines the re-encoding correspondence on the created Hu-Tucker tree). Each step of FIG. 20 is performed by the position anonymization unit 1121.

First, the position anonymization unit 1121 initializes a stack S and puts the root node of the input Hu-Tucker tree into S (S2001). A stack is an array-like data structure that supports two operations, push and pop: push inserts data at the end of the array, and pop takes the data at the end of the array and removes it from the array.

Next, the position anonymization unit 1121 determines whether S is empty (S2002). If S is empty (S2002: Yes), the processing ends.

If S is determined in S2002 not to be empty (S2002: No), the position anonymization unit 1121 pops a node from S and calls it n (S2003).

Next, the position anonymization unit 1121 determines whether n is a leaf (S2004).

If n is determined in S2004 not to be a leaf (S2004: No), the position anonymization unit 1121 determines whether each of the two children of n has a frequency that is either equal to 0 or at least the minimum equivalence count 401, that is, at least k (S2005).

If n is determined in S2004 to be a leaf (S2004: Yes), or if at least one child of n has a positive frequency less than k in S2005 (S2005: No), the position anonymization unit 1121 flags node n as a re-encoding target (S2007) and returns to the condition determination of S2002.

If, in S2005, each of the two children of n has a frequency that is either equal to 0 or at least k (S2005: Yes), the position anonymization unit 1121 pushes every child of n that has a positive frequency onto S (S2006) and returns to the condition determination of S2002.
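The flow of S2001 to S2007 can be sketched in Python as follows. The `Node` class, the function name `mark_recoding_targets`, and the example tree with its frequencies are assumptions for illustration; only the control flow follows the steps described above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: Tuple[str, ...]
    freq: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def mark_recoding_targets(root: Node, k: int) -> List[Node]:
    marked = []
    stack = [root]                                    # S2001
    while stack:                                      # S2002
        n = stack.pop()                               # S2003
        if n.left is None and n.right is None:        # S2004: n is a leaf
            marked.append(n)                          # S2007
            continue
        children = [n.left, n.right]
        if all(c.freq == 0 or c.freq >= k for c in children):   # S2005
            stack.extend(c for c in children if c.freq > 0)     # S2006
        else:
            marked.append(n)                          # S2007
    return marked


def join(a: Node, b: Node) -> Node:
    return Node(a.label + b.label, a.freq + b.freq, a, b)


# Hypothetical hand-built tree; frequencies are made up for the example.
ab = join(Node(("A",), 2), Node(("B",), 3))
abc = join(ab, Node(("C",), 5))
de = join(Node(("D",), 6), Node(("E",), 4))
root = join(abc, de)

print(sorted(n.label for n in mark_recoding_targets(root, 5)))
```

With k = 5 in this example, the descent stops at {A, B} and {D, E} because each has a child with a positive frequency below k, while C is reached as a leaf and kept as is.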

As described above, one feature of Example 2 is that the route boarded and the route alighted from are preserved. As a result, even when analysis or route estimation is performed using the anonymous ticket gate entry/exit data 133, the route actually ridden can be estimated accurately.

In addition, the implementation of S1503 shown in FIG. 17 uses a Hu-Tucker tree. Because a Hu-Tucker tree places low-frequency items deep in the tree and high-frequency items at shallow levels, it can be expected to produce highly useful data that avoids excessive re-encoding.

Note that the prediction disclosure risk described at the end of Example 1 can also be avoided in Example 2 by a similar change to the processing.

Example 3 of the present invention is described below. The description focuses on the differences from Example 1, and points in common with Example 1 are omitted or simplified. Specifically, for example, in describing Example 3, components that duplicate those of Example 1 are given the same reference numerals and their description is omitted. Operations identical to those of Example 1 are likewise given the same reference numerals and their description is omitted.

In Example 3 (and Example 4 described later), the personal information to be anonymized is the personal information recorded in a commuter pass master table (for example, information that includes information representing the node at one end and the node at the other end of the section the user uses).

In Example 3, the commuter pass master table is anonymized. Example 3 does not preserve route information as Example 2 does; instead, the same technique as in Example 1 is used.

FIG. 21 shows a configuration example of a computer to which the personal information anonymization device according to Example 3 of the present invention is applied.

The CPU 101 loads the program 2151 into the memory 102 and executes it, thereby realizing the position anonymization unit 2121.

The position anonymization unit 2121 takes as input the station adjacency structure table 131, the commuter pass master table 2131, and the minimum equivalence count information 134. The position anonymization unit 2121 re-encodes the stations in the commuter pass master table 2131, described later, so that every station appearing in the table is used by at least k people, where k is the value stored in the minimum equivalence count information 134, and outputs the result to the anonymous commuter pass master table 2132. The position anonymization unit 2121 may also output the processing result via the output device 105.

FIG. 22 shows an example of the commuter pass master table 2131.

The commuter pass master table 2131 has a plurality of records. Each record contains a user ID 301, a station 1 (2201), and a station 2 (2202). One record means that the individual identified by the user ID 301 in that record holds a commuter pass from the station represented by station 1 (2201) to the station represented by station 2 (2202). In the example of FIG. 22, each individual holds only one commuter pass, but this need not be the case. In addition, each record of the commuter pass master table 2131 may, for example, contain a station m (m is an integer of 3 or more) as information representing a station passed through.

The information in the commuter pass master table 2131 described above is stored in advance.

FIG. 23 shows an example of the anonymous commuter pass master table 2132.

The anonymous commuter pass master table 2132 is an obscured version of the commuter pass master table 2131. The records of the anonymous commuter pass master table 2132 (hereinafter, anonymous records) correspond one-to-one to the records of the commuter pass master table 2131 (hereinafter, normal records).

Station 1 (2301) of the n-th anonymous record is the re-encoded form of station 1 (2201) of the n-th normal record, and station 2 (2302) of the n-th anonymous record is the re-encoded form of station 2 (2202) of the n-th normal record. This pairing is not strictly required: station 1 (2301) may instead be the re-encoded form of station 2 (2202), and station 2 (2302) the re-encoded form of station 1 (2201). For simplicity of explanation, however, this example assumes that station 1 (2301) is the re-encoded form of station 1 (2201) and station 2 (2302) is the re-encoded form of station 2 (2202).
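The requirement that every station value in the anonymized table be shared by at least k users can be checked with a small Python sketch. The function name `satisfies_k`, the tuple layout `(user_id, station1_set, station2_set)`, and the sample records are assumptions for illustration and do not reproduce FIG. 23.

```python
from collections import defaultdict


def satisfies_k(records, k):
    """True if every station set in either column is used by >= k distinct users."""
    users = defaultdict(set)
    for user_id, station1, station2 in records:
        users[frozenset(station1)].add(user_id)
        users[frozenset(station2)].add(user_id)
    return all(len(u) >= k for u in users.values())


# Hypothetical anonymous records: re-encoded stations are sets of stations.
anonymous_records = [
    (1, {"A", "B"}, {"G"}),
    (2, {"A", "B"}, {"G"}),
    (3, {"G"}, {"A", "B"}),
]

print(satisfies_k(anonymous_records, 3))
print(satisfies_k(anonymous_records, 4))
```

Counting distinct user IDs rather than records matters here, because one individual may hold more than one commuter pass.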

FIG. 24 shows an example of the overall flow of the processing executed by the computer 100 according to Example 3. In this example, only the position anonymization unit 2121 performs anonymization processing, so FIG. 24 shows an example of the processing flow of the position anonymization unit 2121. That is, each step shown in FIG. 24 is performed by the position anonymization unit 2121. Because FIG. 24 shares many steps with FIG. 8, those steps are given the same numbers and their description is omitted here.

In S2401, the position anonymization unit 2121 acquires the commuter pass master table 2131 and calls it D.

In S2402, the position anonymization unit 2121 acquires the appearance frequency of each station from D. Here, the appearance frequency of a station is the total number of users for whom either station 1 (2201) or station 2 (2202) matches that station.
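The frequency definition of S2402 can be written down as a short Python sketch. The function name `station_frequencies`, the tuple layout `(user_id, station1, station2)`, and the sample records are hypothetical.

```python
from collections import defaultdict


def station_frequencies(records):
    """Count, per station, the distinct users whose station 1 or station 2 matches it."""
    users = defaultdict(set)
    for user_id, station1, station2 in records:
        users[station1].add(user_id)
        users[station2].add(user_id)
    return {station: len(ids) for station, ids in users.items()}


# Hypothetical commuter pass records; user 1 holds two passes.
D = [
    (1, "A", "G"),
    (2, "A", "B"),
    (3, "B", "G"),
    (1, "A", "C"),
]

print(station_frequencies(D))
```

User 1 appears at station A in two records but is counted once, since the frequency is a number of users rather than a number of records.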

In S2403, the position anonymization unit 2121 outputs B to the anonymous commuter pass master table 2132 and ends the processing.

As described above, one feature of Example 3 is that, unlike Examples 1 and 2, anonymization is performed on the commuter pass master table. A further feature is that stations are always re-encoded only together with adjacent stations. The index used for this can be the same as in Example 1, and when information entropy is used as the index, a result with the properties described in Example 1 is obtained.
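One way to read the information entropy index is the expression stated in claim 5, which scores merging a station X with an adjacent candidate S from their frequencies. The Python sketch below assumes that reading; the function names `merge_cost` and `pick_partner`, the use of the natural logarithm, and the sample frequencies are assumptions, and all frequencies are taken to be positive.

```python
import math


def merge_cost(fx, fs):
    """-fx*log(fx/(fx+fs)) - fs*log(fs/(fx+fs)); requires fx > 0 and fs > 0."""
    total = fx + fs
    return -fx * math.log(fx / total) - fs * math.log(fs / total)


def pick_partner(x, neighbors, freq):
    """Choose the adjacent station whose merge with x has the smallest cost."""
    return min(neighbors, key=lambda s: merge_cost(freq[x], freq[s]))


# Hypothetical frequencies: B is adjacent to A and C on a straight line.
freq = {"A": 1, "B": 4, "C": 10}

print(pick_partner("B", ["A", "C"], freq))
```

The base of the logarithm scales every cost by the same factor, so it does not change which candidate is selected.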

Example 4 of the present invention is described below. Example 4 also anonymizes the commuter pass master table. Example 4 describes a technique that preserves route information as in Example 2. In the description of Example 4 below, components and operations that duplicate those of Examples 1, 2, and 3 are given the same reference numerals and their description is omitted.

FIG. 25 shows a configuration example of a computer to which the personal information anonymization device according to Example 4 of the present invention is applied.

The CPU 101 loads the program 2551 into the memory 102 and executes it, thereby realizing the position anonymization unit 2521 and the route search unit 2522.

The route search unit 2522 takes the commuter pass master table 2131 as input, calculates, from the two stations that define each pass holder's commuter pass section, the route used when boarding or alighting at each of the two stations, attaches these routes to each record of the commuter pass master table 2131, and passes the result to the position anonymization unit 2521 as its input. Existing route search technology can be used as is to realize the route search unit 2522. Alternatively, when a commuter pass designates its route in advance, that information may be recorded separately in the storage 103 or stored in an external computer (not shown). In such a case, the two routes to which the two stations respectively belong may be estimated from that information.

FIG. 26 shows an example of the output of the route search unit 2522.

The route search unit 2522 refers to the commuter pass master table 2131 shown in FIG. 22, for example to the information that the user with user ID 1 holds a commuter pass between station A and station G, and estimates that the alpha route is used when boarding or alighting at station A and that the alpha route is used when boarding or alighting at station G. The route search unit 2522 then attaches this information to station 1 and station 2 respectively and outputs it. That is, the information output from the route search unit 2522 is the commuter pass master table 2131 with route information added (a table to which information representing the route estimated for each of station 1 and station 2 has been added).

The position anonymization unit 2521 takes as input the table passed from the route search unit 2522, the route station adjacency structure table 1131, and the minimum equivalence count information 134. The position anonymization unit 2521 re-encodes the stations of the commuter pass master table 2131 so that every station appearing in the table is used by at least k people, where k is the value stored in the minimum equivalence count information 134, and so that every station contained as a set value in a re-encoded station is on the same route as the original station, and outputs the result to the anonymous commuter pass master table 2132. The position anonymization unit 2521 may also output the processing result via the output device 105.

FIG. 27 shows an example of the overall flow of the processing executed by the computer 100 according to Example 4.

First, the route search unit 2522 takes the commuter pass master table 2131 as input, estimates the routes used at the stations at both ends of each commuter pass section, creates a table such as that shown in FIG. 26, and passes it to the position anonymization unit 2521 (S2701). Next, the position anonymization unit 2521 re-encodes the stations to achieve anonymization (S2702).

FIG. 28 shows an example of the detailed flow of S2702 in FIG. 27. Each step of FIG. 28 is performed by the position anonymization unit 2521. Steps identical to those in FIGS. 8 and 16 are given the same numbers and their description is omitted. The processing example of FIG. 28 is the processing example of S1503 shown in FIG. 16 for Example 2, modified for the commuter pass master.

In S2801, the position anonymization unit 2521 takes the input table passed from the route search unit 2522 as the output of S2701 and calls it E.

In S2802, the position anonymization unit 2521 enumerates the routes contained in E and performs loop processing on each route separately. During the loop processing, the set of all stations on the route being processed is called D.

In S2803, the position anonymization unit 2521 acquires from E the frequency of each station contained in D. The frequency counted for each station is the total number of users for whom either station 1 or station 2 matches that station. However, even when the station matches, entries whose route differs (that is, whose route information obtained from E differs from the route being processed in the loop) are not counted.
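The per-route counting rule of S2803 can be sketched in Python as below. The function name `route_station_frequencies`, the row layout `(user_id, station1, route1, station2, route2)`, the route name "beta", and the sample rows are assumptions; the actual column order of the table in FIG. 26 is not reproduced here.

```python
def route_station_frequencies(table, route, stations_on_route):
    """Count users per station, ignoring entries whose route differs from `route`."""
    users = {station: set() for station in stations_on_route}
    for user_id, station1, route1, station2, route2 in table:
        if route1 == route and station1 in users:
            users[station1].add(user_id)
        if route2 == route and station2 in users:
            users[station2].add(user_id)
    return {station: len(ids) for station, ids in users.items()}


# Hypothetical table E: user 2 uses station A, but on a different route.
E = [
    (1, "A", "alpha", "G", "alpha"),
    (2, "A", "beta", "H", "beta"),
    (3, "B", "alpha", "A", "alpha"),
]

print(route_station_frequencies(E, "alpha", ["A", "B", "G"]))
```

User 2 is not counted at station A for the alpha route, which reflects the rule that a matching station on a different route is excluded.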

In S2804, the position anonymization unit 2521 converts entries in E that are station X or station Y on the route being processed in the loop into station {X, Y}. Because station X and station Y are both on the route being processed and therefore share the same route, the route does not need to be converted.

In S2805, the position anonymization unit 2521 deletes station X and station Y from D and adds station {X, Y} in their place.
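A minimal Python sketch of S2804 and S2805 follows. It assumes, for illustration only, that each station value is held as a `frozenset` so that an already merged station can be merged again; the function name `merge_stations`, the row layout `(user_id, station1, route1, station2, route2)`, the route name "beta", and the sample data are hypothetical.

```python
def merge_stations(table, stations, route, x, y):
    """Replace x and y by their union on `route` in the table and in the station set."""
    merged = x | y
    new_table = [
        (
            user_id,
            merged if r1 == route and s1 in (x, y) else s1,
            r1,
            merged if r2 == route and s2 in (x, y) else s2,
            r2,
        )
        for user_id, s1, r1, s2, r2 in table
    ]                                                   # S2804
    new_stations = (stations - {x, y}) | {merged}       # S2805
    return new_table, new_stations


A, B, G = frozenset("A"), frozenset("B"), frozenset("G")
E = [
    (1, A, "alpha", G, "alpha"),
    (2, B, "alpha", A, "beta"),
]
D = {A, B, G}

E2, D2 = merge_stations(E, D, "alpha", A, B)
print(E2)
print(D2)
```

The second record keeps station A in its station 2 column because that entry belongs to a different route, so only entries on the route being processed are converted.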

In S2806, the position anonymization unit 2521 stores E in the anonymous commuter pass master table 2132. In this example, the columns "Route 1" and "Route 2" are removed at the time of storage.

FIG. 29 shows another example of the detailed flow of S2702 in FIG. 27. All steps in FIG. 29 are identical to steps in FIGS. 8, 16, 17, and 28; they are given the same numbers and their description is omitted. The processing example of FIG. 29 is the processing example of S1503 shown in FIG. 17 for Example 2, modified for the commuter pass master.

As described above, one feature of Example 4 is that, as in Example 3, anonymization is performed on the commuter pass master table. The difference from Example 3 is that the route estimated from the station tuple contained in the re-encoded anonymous commuter pass master table matches the route estimated from the data before re-encoding.

Several examples of the present invention have been described above, but they are illustrations for explaining the invention and are not intended to limit its scope to these examples. That is, the present invention can also be implemented in various other forms. For example, the examples above have been described in detail to explain the invention clearly, and the invention is not necessarily limited to configurations that include every element described. Part of the configuration of one example can be replaced with the configuration of another example, and the configuration of another example can be added to that of a given example. For part of the configuration of each example, other configurations can be added, deleted, or substituted.

Some or all of the configurations, functions, processing units, processing means, and the like described above may be realized in hardware, for example by designing them as integrated circuits. The configurations, functions, and the like described above may also be realized in software, with a processor interpreting and executing programs that realize the respective functions. Information such as the programs, tables, and files that realize each function can be placed in a memory, in a recording device such as a hard disk or SSD (Solid State Drive), or on a recording medium such as an IC card, SD card, or DVD.

The control lines and information lines shown are those considered necessary for the explanation, and not all control lines and information lines in a product are necessarily shown. In practice, almost all components may be considered to be interconnected.

100: computer, 101: CPU, 102: memory, 121: time sectioning unit, 122: position anonymization unit, 103: storage, 131: station adjacency structure table, 132: ticket gate entry/exit table, 133: anonymous ticket gate entry/exit table, 134: minimum equivalence count information, 151: program, 104: input device, 105: output device, 106: communication device, 107: internal communication line, 1121: position anonymization unit, 1122: route search unit, 1131: route station adjacency structure table, 1151: program, 2121: position anonymization unit, 2131: commuter pass master table, 2132: anonymous commuter pass master table, 2151: program, 2521: position anonymization unit, 2522: route search unit, 2551: program

Claims

A personal information anonymization device comprising: storage means for storing adjacency information that includes, for each position, an adjacency tuple representing an adjacency relation indicating which positions among a finite number of positions are adjacent to which positions, and personal management information that includes, for each individual, a personal information tuple, which is personal information including identification information of the individual and position information indicating a position used by the individual among the plurality of positions; and
anonymization means for anonymizing each personal information tuple by aggregating only positions in the adjacency relation indicated by the adjacency information, so that the number of different individuals corresponding to a plurality of anonymized personal information tuples including the same position set is equal to or greater than a predetermined threshold.

The personal information anonymization device according to claim 1,
Route estimation means for estimating a route to which the position represented by the position information in the personal information tuple belongs, for each personal information tuple;
Further comprising
The adjacency information includes information indicating a route including a link connecting a position and a position adjacent to the position for each adjacency,
The anonymization means anonymizes each personal information tuple so that the number of different individuals corresponding to a plurality of anonymized personal information tuples including the same position set on the same route is equal to or greater than a predetermined threshold,
Personal information anonymization device.

The personal information anonymization device according to claim 2,
The structure of each route represented by the adjacency information is either a linear structure or a ring structure,
The anonymization means anonymizes each personal information tuple using a binary tree generated by constructing a Hu-Tucker tree;
Personal information anonymization device.

The personal information anonymization device according to claim 1 or 2,
The anonymization means takes as a second position the position used by the fewest individuals among the two or more positions adjacent to a first position among the plurality of positions, and aggregates the first position and the second position in the anonymization,
Personal information anonymization device.

The personal information anonymization device according to claim 4,
The second position is the position S, among the aggregation candidates that are the two or more positions adjacent to the first position (X), for which $-f(X)\log\frac{f(X)}{f(X)+f(S)} - f(S)\log\frac{f(S)}{f(X)+f(S)}$ is smallest, where $f(X)$ and $f(S)$ denote the frequencies of X and S,
Personal information anonymization device.

The personal information anonymization device according to any one of claims 1 to 5,
Each of the personal information tuples includes time information indicating the time of entering or leaving the used position.
A time sectioning means for obscuring the time represented by the time information of each personal information tuple into a time section;
Further comprising
Each anonymized personal information tuple contains information representing time intervals,
The anonymization means anonymizes each personal information tuple so that the number of different individuals corresponding to a plurality of anonymized personal information tuples including the same position set and time interval is equal to or greater than a predetermined threshold,
Personal information anonymization device.

The personal information anonymization device according to any one of claims 1 to 5,
Each of the personal information tuples includes entry / exit information that indicates whether the entry is to or from the location used.
The anonymization means anonymizes each personal information tuple so that the number of different individuals corresponding to a plurality of anonymized personal information tuples including the same position set and entry/exit information is equal to or greater than a predetermined threshold,
Personal information anonymization device.

The personal information anonymization device according to claim 6,
Each of the personal information tuples includes entry / exit information that indicates whether the entry is to or from the location used.
The anonymization means anonymizes each personal information tuple so that the number of different individuals corresponding to a plurality of anonymized personal information tuples including the same position set, time interval, and entry/exit information is equal to or greater than a predetermined threshold,
Personal information anonymization device.

The personal information anonymization device according to claim 6 or 8,
The time sectioning means obscures the time in each personal information tuple into a time interval so that the number of personal information tuples containing information representing a time belonging to each time interval is substantially uniform across the plurality of time intervals,
Personal information anonymization device.

The personal information anonymization device according to any one of claims 1 to 9,
The anonymization means anonymizes the personal information tuples so that the same position contained in the personal management information in the storage means is always anonymized to the same position set,
Personal information anonymization device.

The personal information anonymization device according to any one of claims 1 to 5,
Each personal information tuple includes, as position information, information representing two or more different positions,
The anonymization means anonymizes each personal information tuple by aggregating, for each of the two or more different positions, only positions in the adjacency relation indicated by the adjacency information, so that, for each of the two or more different positions, the number of different individuals corresponding to a plurality of anonymized personal information tuples including the same position set is equal to or greater than a predetermined threshold,
Personal information anonymization device.

A personal information anonymization method comprising: referring to adjacency information that includes, for each position, an adjacency tuple representing an adjacency relation indicating which positions among a finite number of positions are adjacent to which positions; and
anonymizing each personal information tuple in personal management information that includes, for each individual, a personal information tuple, which is personal information including identification information of the individual and position information indicating a position used by the individual among the plurality of positions, by aggregating only positions in the adjacency relation indicated by the adjacency information, so that the number of different individuals corresponding to a plurality of anonymized personal information tuples including the same position set is equal to or greater than a predetermined threshold.

A computer program that causes a computer to execute: referring to adjacency information that includes, for each position, an adjacency tuple representing an adjacency relation indicating which positions among a finite number of positions are adjacent to which positions; and
anonymizing each personal information tuple in personal management information that includes, for each individual, a personal information tuple, which is personal information including identification information of the individual and position information indicating a position used by the individual among the plurality of positions, by aggregating only positions in the adjacency relation indicated by the adjacency information, so that the number of different individuals corresponding to a plurality of anonymized personal information tuples including the same position set is equal to or greater than a predetermined threshold.