JP2017182508A

JP2017182508A - Anonymizing device, anonymizing method and computer program

Info

Publication number: JP2017182508A
Application number: JP2016069612A
Authority: JP
Inventors: Yuichi Sanada; Yusuke Enomoto; Kiyoshi Yanagimoto; Hiroshi Kuratome; Hiroshi Terakado
Original assignee: Nippon Telegraph and Telephone West Corp
Current assignee: Nippon Telegraph and Telephone West Corp
Priority date: 2016-03-30
Filing date: 2016-03-30
Publication date: 2017-10-05

Abstract

PROBLEM TO BE SOLVED: To provide an anonymization technique capable of reducing the time required for anonymization processing. SOLUTION: An anonymization device acquires a plurality of records, which are non-anonymized information having values of a plurality of non-anonymized attributes; acquires a priority order predetermined for the plurality of attributes; k-anonymizes the plurality of records attribute by attribute; and then anonymizes the resulting per-attribute anonymized records across the plurality of attributes by increasing, the lower the acquired priority of an attribute, the number of records in which different attribute values are replaced by one new attribute value. SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for anonymizing information.

Conventionally, large amounts of information have been collected as big data and analyzed to obtain new information. Big data also includes information, such as personal information, that cannot be submitted to analysis as it is. For this reason, anonymization processing is applied to the collected information so that it can be put to secondary use.

JP 2015-046030 A (Japanese Unexamined Patent Application Publication No. 2015-046030)

However, conventional anonymization processing has the problem that, when large volumes of data are anonymized across a plurality of attributes, the time required for the anonymization processing increases greatly with the number of pieces of information, the number of attributes, and the number of attribute values.
In view of the above circumstances, an object of the present invention is to provide an anonymization technique capable of reducing the time required for anonymization processing.

One aspect of the present invention is an anonymization device including: a record acquisition unit that acquires a plurality of records, which are non-anonymized information having values of a plurality of attributes that have not been anonymized; an order acquisition unit that acquires a priority order predetermined for the plurality of attributes; an individual attribute anonymization unit that anonymizes, attribute by attribute, the plurality of records acquired by the acquisition unit; and a multi-attribute anonymization unit that anonymizes the per-attribute anonymized records produced by the individual attribute anonymization unit across the plurality of attributes by increasing, the lower the priority of an attribute as acquired by the order acquisition unit, the number of records in which different attribute values are replaced by one new attribute value.

One aspect of the present invention is the above anonymization device, wherein the multi-attribute anonymization unit anonymizes the per-attribute anonymized records across the plurality of attributes by repeating the following: it replaces different attribute values with one new attribute value and determines anonymity, and, when anonymity is not ensured, it replaces different attribute values of an attribute having a higher priority than the attribute just processed with one new attribute value and determines anonymity again.

One aspect of the present invention is an anonymization method including: a record acquisition step of acquiring a plurality of records, which are non-anonymized information having values of a plurality of attributes that have not been anonymized; an order acquisition step of acquiring a priority order predetermined for the plurality of attributes; an individual attribute anonymization step of anonymizing, attribute by attribute, the plurality of records acquired in the acquisition step; and a multi-attribute anonymization step of anonymizing the per-attribute anonymized records produced in the individual attribute anonymization step across the plurality of attributes by increasing, the lower the priority of an attribute as acquired in the order acquisition step, the number of records in which different attribute values are replaced by one new attribute value.

One aspect of the present invention is a computer program for causing a computer to execute: a record acquisition step of acquiring a plurality of records, which are non-anonymized information having values of a plurality of attributes that have not been anonymized; an order acquisition step of acquiring a priority order predetermined for the plurality of attributes; an individual attribute anonymization step of anonymizing, attribute by attribute, the plurality of records acquired in the acquisition step; and a multi-attribute anonymization step of anonymizing the per-attribute anonymized records produced in the individual attribute anonymization step across the plurality of attributes by increasing, the lower the priority of an attribute as acquired in the order acquisition step, the number of records in which different attribute values are replaced by one new attribute value.

According to the present invention, the time required for anonymization processing can be reduced.

FIG. 1 is a system configuration diagram showing the system configuration of the anonymization system 1.
FIG. 2 is a flowchart showing a specific example of the processing flow of the anonymization device 20.
FIG. 3 is a flowchart showing a specific example of the processing flow of the anonymization device 20.
FIG. 4 is a flowchart showing a specific example of the flow of the equivalence record grouping process.
FIG. 5 is a diagram showing the content of the processing shown in the flowchart of FIG. 3.
FIG. 6 is a diagram showing an example of per-attribute k-anonymization when the condition information is an intra-hierarchy attribute value combination definition.
FIG. 7 is a diagram showing the result of k-anonymization by the anonymization device.
FIG. 8 is a diagram showing an example of the combining process when the condition information is a threshold definition.
FIG. 9 is a diagram showing an example of the combining process when the condition information is an intra-hierarchy attribute value combination definition.
FIG. 10 is a diagram showing an example of the combining process when the condition information is the centroid of time-series data.

FIG. 1 is a system configuration diagram showing the system configuration of the anonymization system 1. The anonymization system 1 includes a non-anonymized information storage unit 10, an anonymization device 20, and an anonymized information storage unit 30.

The non-anonymized information storage unit 10 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The non-anonymized information storage unit 10 stores information that has not been anonymized (hereinafter referred to as "non-anonymized information"). The non-anonymized information includes at least one attribute. A unit of information represented by one attribute or a combination of attributes is called a record. For example, a blood pressure record has a value for each of the following attributes: the measurement date and time, identification information indicating the person measured (for example, a name or a patient registration card number), and the blood pressure value. A unit of information in which one attribute is represented as time-series data is also called a record. For example, if n pieces of time-series data are regarded as points in n dimensions, the n pieces of time-series data are represented by a plurality of points. Furthermore, a unit of information in which one attribute is represented hierarchically is also called a record. The non-anonymized information may be represented as records of any of these kinds.

In the non-anonymized information stored in the non-anonymized information storage unit 10, an attribute that is a target of anonymization (hereinafter referred to as an "anonymization target attribute") is represented, for example, by a numerical value. Examples of anonymization target attributes are price, blood pressure, temperature, power consumption, salary, number of actions (for example, number of purchases or number of views), and age. Part of the information stored in the non-anonymized information storage unit 10 may already have been anonymized.

The non-anonymized information storage unit 10 further stores condition information. The condition information indicates various conditions relating to the non-anonymized information. Specific examples of condition information are a threshold definition, an information connection definition, an intra-hierarchy attribute value combination definition, the centroid of time-series data, and a priority order. A threshold definition is information indicating thresholds given in advance for the values of an attribute. The thresholds define a plurality of classes. For example, a threshold definition given in advance for numerical values of the attribute blood pressure may define three classes: a low blood pressure class for values of 90 or less, a normal blood pressure class for values from 90 to 140, and a high blood pressure class for values of 140 or more. An information connection definition is information indicating a plurality of sub-attributes given for the value of one attribute. For example, an information connection definition given in advance for numerical values of the attribute date may state that the first four digits represent the year, the next two digits the month, and the last two digits the day. In this case, three sub-attributes are defined: year, month, and day.
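The two definitions in this paragraph can be illustrated with a minimal Python sketch. The function names `blood_pressure_class` and `split_date` are hypothetical, and the handling of the boundary values is an assumption: the text assigns 90 and 140 to two classes each, so the sketch arbitrarily treats 90 as low and 140 as high.

```python
from typing import Dict


def blood_pressure_class(value: float) -> str:
    """Map a blood pressure value to one of the three classes in the text.

    Assumption: 90 counts as low and 140 as high, so that every value
    falls into exactly one class.
    """
    if value <= 90:
        return "low"
    if value < 140:
        return "normal"
    return "high"


def split_date(yyyymmdd: str) -> Dict[str, int]:
    """Split an 8-digit date into the year, month and day sub-attributes."""
    if len(yyyymmdd) != 8 or not yyyymmdd.isdigit():
        raise ValueError("expected an 8-digit string such as '20160330'")
    return {
        "year": int(yyyymmdd[:4]),    # first four digits
        "month": int(yyyymmdd[4:6]),  # next two digits
        "day": int(yyyymmdd[6:8]),    # last two digits
    }
```

For example, `split_date("20160330")` yields the sub-attributes year 2016, month 3 and day 30, each of which could then be generalized separately.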

An intra-hierarchy attribute value combination definition is information that indicates, for each value at each level of the hierarchy of one attribute, the other values in the same level with which it can be combined. Here, "can be combined" means that other values in the same level can be included in the group to which a given value at that level belongs. The centroid of time-series data is, when a plurality of pieces of time-series data are represented as points in n dimensions, the centroid of the figure formed by those points. The priority order is predetermined for the plurality of attributes and serves to preserve usefulness. For example, raising the anonymity of the values of an important attribute impairs usefulness. Usefulness is therefore preserved by assigning a high priority to important attributes.

The anonymization device 20 k-anonymizes a plurality of records of the non-anonymized information stored in the non-anonymized information storage unit 10. The anonymization device 20 is configured using an information processing device such as a mainframe, a workstation, or a personal computer. The anonymization device 20 includes a CPU (Central Processing Unit), a memory, an auxiliary storage device, and the like connected by a bus. By executing an anonymization program, the anonymization device 20 functions as a device including a condition information setting unit 201, an individual attribute anonymization unit 202, and a multi-attribute anonymization unit 203. All or some of the functions of the anonymization device 20 may be implemented using hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array).

The condition information setting unit 201 acquires condition information relating to the anonymization target attributes from the non-anonymized information storage unit 10. The condition information setting unit 201 sets the acquired condition information in the memory as information that the individual attribute anonymization unit 202 and the multi-attribute anonymization unit 203 can refer to. The individual attribute anonymization unit 202 acquires a plurality of records from the non-anonymized information storage unit 10, refers to the condition information, k-anonymizes the records attribute by attribute, and outputs the per-attribute k-anonymized records to the multi-attribute anonymization unit 203.

Even when each attribute has been k-anonymized individually, a record has a plurality of attributes, so the plurality of records as a whole is not necessarily k-anonymized. The multi-attribute anonymization unit 203 therefore k-anonymizes the per-attribute k-anonymized records across the plurality of attributes, so that the plurality of records as a whole is k-anonymized. The multi-attribute anonymization unit 203 then outputs the k-anonymized records to the anonymized information storage unit 30.
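A small Python sketch shows why per-attribute k-anonymity is not sufficient. The helper `is_k_anonymous` and the sample records are hypothetical and only illustrate the standard definition of k-anonymity (every combination of values occurs at least k times); they are not taken from the patent.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, str]


def is_k_anonymous(records: List[Record], attrs: Sequence[str], k: int) -> bool:
    """True if every combination of values over attrs occurs at least k times."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    return all(c >= k for c in counts.values())


# Hypothetical records that are already generalized attribute by attribute.
records = [
    {"age": "30s", "class": "normal"},
    {"age": "30s", "class": "high"},
    {"age": "40s", "class": "normal"},
    {"age": "40s", "class": "high"},
]

# Each attribute on its own satisfies k = 2 ...
per_attribute = all(is_k_anonymous(records, [a], 2) for a in ("age", "class"))
# ... but each (age, class) combination occurs only once.
whole_record = is_k_anonymous(records, ["age", "class"], 2)
```

Here `per_attribute` is true while `whole_record` is false, which is the gap that the multi-attribute anonymization unit 203 is described as closing.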

The anonymized information storage unit 30 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The anonymized information storage unit 30 stores the k-anonymized records output by the anonymization device 20.

FIG. 2 is a flowchart showing a specific example of the processing flow of the anonymization device 20. The flowchart shows the processing for k-anonymizing an entire set of NR records, where NR is the number of records. First, the condition information setting unit 201 acquires condition information relating to the plurality of records from the non-anonymized information storage unit 10 and sets the acquired condition information in the memory as information that the individual attribute anonymization unit 202 and the multi-attribute anonymization unit 203 can refer to (step S101).

The individual attribute anonymization unit 202 acquires records containing the attributes to be combined from the non-anonymized information storage unit 10 (step S102). The attributes to be combined are the attributes targeted for k-anonymization. For example, when a record has three attributes and only two of them are to be k-anonymized, the values of those two attributes are the non-anonymized information to be combined. Furthermore, when the non-anonymized information has been updated, what the individual attribute anonymization unit 202 acquires is limited to the attributes whose values have changed. Because only the attributes whose values have changed are anonymized, processing can be made faster.

The individual attribute anonymization unit 202 k-anonymizes the records in the acquired non-anonymized information attribute by attribute (step S103). The per-attribute k-anonymized records are output to the multi-attribute anonymization unit 203.

To k-anonymize the plurality of records as a whole, the multi-attribute anonymization unit 203 first initializes a loop counter i to 1 (step S104). The multi-attribute anonymization unit 203 sorts the attribute values of the lowest-priority attribute in ascending order (step S105). The multi-attribute anonymization unit 203 assigns the counter i to a variable t used for temporary storage, and assigns 1 to a variable j that holds the number of records with identical values (step S106).

The multi-attribute anonymization unit 203 performs the equivalence record grouping process (step S107). In this process, records whose attribute values are identical for all attributes are placed in one group. The process ends when a record with a different attribute value appears.

The multi-attribute anonymization unit 203 determines whether the variable j is greater than or equal to k (step S108). The "k" in step S108 is the k of k-anonymization. An affirmative determination in step S108 therefore means that k or more identical records exist, that is, that the j records are k-anonymized.

When the variable j is greater than or equal to k (step S108: YES), the multi-attribute anonymization unit 203 registers the record r(t) and its count (j) by outputting them to the anonymized information storage unit 30 (step S109). The multi-attribute anonymization unit 203 then determines whether the counter i is equal to the number of records NR (step S110). When the counter i is equal to NR (step S110: YES), all records have been processed, so the multi-attribute anonymization unit 203 ends the processing.

When the counter i differs from the number of records NR (step S110: NO), the multi-attribute anonymization unit 203 increments the counter i by 1 (step S111). The multi-attribute anonymization unit 203 determines whether r(i) has already been processed (step S112). A processed r(i) is a record that has been output to and registered in the anonymized information storage unit 30, as in step S109. When r(i) has been processed (step S112: YES), the multi-attribute anonymization unit 203 increments the counter i by 1 again in step S111. When r(i) has not been processed (step S112: NO), the multi-attribute anonymization unit 203 returns to step S105.

When the variable j is less than k in step S108 (step S108: NO), k-anonymity is not ensured. In that case, as shown in FIG. 3, the multi-attribute anonymization unit 203 increases, the lower the priority of an attribute, the number of records in which different attribute values are replaced by one new attribute value. In the following description, "replacing different attribute values with one new attribute value" is sometimes referred to as "combining". To describe combining concretely: at the point where step S108 yields NO, records r(t) through r(i-1) have identical values. Combining means giving these records and records from r(i) onward (for example r(s), r(u), ...) one new, shared attribute value for a certain attribute. The number of records added by combining may be one or more than one. It is then determined whether k-anonymity is ensured for the combined records as a whole. Whether a given record can be combined is determined from the condition information. Specific examples of the combining process are described later.
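The idea of combining can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `combine_numeric` is hypothetical, the attribute is assumed to be numeric, and representing the new attribute value as a "min-max" range string is an assumption, since the concrete representation is left to the examples described later in the source.

```python
from typing import Any, Dict, List


def combine_numeric(records: List[Dict[str, Any]], attr: str,
                    indices: List[int]) -> str:
    """Give the selected records one shared new value for attr.

    Assumption: the new value is written as a 'min-max' range string.
    """
    values = [records[i][attr] for i in indices]
    merged = f"{min(values)}-{max(values)}"
    for i in indices:
        records[i][attr] = merged
    return merged


# Hypothetical records: r(0) and r(1) are identical, r(2) differs only in age.
records = [
    {"age": 30, "class": "normal"},
    {"age": 30, "class": "normal"},
    {"age": 31, "class": "normal"},
]
combine_numeric(records, "age", [0, 1, 2])
```

After the call, all three records carry the same age value, so the group grows from two identical records to three, which would satisfy k = 3 for this group.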

Turning to FIG. 3, the multi-attribute anonymization unit 203 initializes a loop counter n to 1 (step S120). The counter n indicates the number of combining passes for each attribute. The multi-attribute anonymization unit 203 initializes a loop counter m to 1 (step S121). The counter m indicates the attribute on which the combining process is performed. A larger value of m indicates an attribute with a higher priority.

The multi-attribute anonymization unit 203 sorts the attribute values of the attribute with priority m in ascending order (step S122). The multi-attribute anonymization unit 203 then determines whether the attribute has any unprocessed attribute value that can be combined (step S123). Examples of cases in which no combinable unprocessed attribute value exists are as follows.
- When the attribute is numerical information, the largest attribute value of the attribute has already been processed.
- When the attribute is numerical information grouped by a threshold definition, the largest attribute value of the group has already been processed.
- When the attribute is hierarchical information, all attribute values that can be combined with the attribute value concerned have already been processed.
In these three cases, and in any other case in which no combinable unprocessed attribute value exists (step S123: NO), the multi-attribute anonymization unit 203 proceeds to step S128.

When a combinable unprocessed attribute value exists (step S123: YES), the multi-attribute anonymization unit 203 performs the combining process on the attribute with priority m (step S124). The multi-attribute anonymization unit 203 adds the number of records newly combined in step S124 to the variable j (step S125).

The multi-attribute anonymization unit 203 determines whether the variable j is greater than or equal to k (step S126). The "k" in step S126 is again the k of k-anonymization. An affirmative determination in step S126 therefore means that the combining process has made the j records k-anonymized. When the variable j is greater than or equal to k (step S126: YES), the multi-attribute anonymization unit 203 registers the record r(t) with its combined attribute portion changed, together with its count (j), by outputting them to the anonymized information storage unit 30 (step S127), and proceeds to step S110 in FIG. 2.

When the variable j is less than k in step S126 (step S126: NO), k-anonymity is not ensured, so the multi-attribute anonymization unit 203 determines whether n is less than or equal to N (step S128). Here, N is the number of attributes. When n is greater than N (step S128: NO), the multi-attribute anonymization unit 203 determines whether m is equal to N (step S131). When m is equal to N (step S131: YES), the multi-attribute anonymization unit 203 increments n (step S132) and proceeds to step S121. When m differs from N (step S131: NO), the multi-attribute anonymization unit 203 increments m (step S130) and proceeds to step S122. In other words, the combining process is performed on the attribute whose priority is one level higher.

When n is less than or equal to N in step S128 (step S128: YES), the multi-attribute anonymization unit 203 determines whether m is equal to n (step S129). When m differs from n (step S129: NO), the multi-attribute anonymization unit 203 proceeds to step S130. When m is equal to n (step S129: YES), the multi-attribute anonymization unit 203 proceeds to step S132.

As shown for step S123, when nothing can be combined, the multi-attribute anonymization unit 203 proceeds to step S128. Thus, when the current attribute has nothing that can be combined, combining is carried out on another attribute.

FIG. 4 is a flowchart showing a specific example of the flow of the equivalence record grouping process in step S107 of FIG. 2. First, the multi-attribute anonymization unit 203 increments the counter i by 1 (step S301). The multi-attribute anonymization unit 203 determines whether r(i) has already been processed (step S302). When r(i) has been processed (step S302: YES), the multi-attribute anonymization unit 203 increments the counter i by 1 again in step S301. When r(i) has not been processed (step S302: NO), the multi-attribute anonymization unit 203 determines whether r(t) and r(i) are equal (step S303). Here, r(*) denotes the entire *-th record. That r(t) and r(i) are equal means that every attribute value of r(i) equals the corresponding attribute value of r(t).

When r(t) and r(i) are equal (step S303: YES), the number of identical records has increased, so the multi-attribute anonymization unit 203 increments the variable j by 1 (step S304) and returns to step S301. When r(t) and r(i) differ (step S303: NO), the multi-attribute anonymization unit 203 ends the process.
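The loop of FIG. 4 can be sketched in Python as below. The function name `group_equal_records`, the 0-based indexing, and the representation of records as tuples with a parallel `processed` flag list are assumptions made for illustration. The end-of-list check is also an addition, since the flowchart as described does not state what happens when i runs past the last record.

```python
from typing import List, Sequence, Tuple


def group_equal_records(records: Sequence[Tuple], processed: List[bool],
                        t: int) -> Tuple[int, int]:
    """Count the unprocessed records after index t that equal records[t].

    Returns (i, j): i is the index of the first unprocessed record that
    differs from records[t] (or len(records) if there is none), and j is
    the size of the group, including records[t] itself.
    """
    i, j = t, 1
    while True:
        i += 1                          # step S301
        if i >= len(records):           # bounds check (not in the flowchart)
            return i, j
        if processed[i]:                # step S302: YES, skip registered records
            continue
        if records[i] == records[t]:    # step S303: YES
            j += 1                      # step S304
        else:                           # step S303: NO
            return i, j
```

With four sorted records of which the first three are identical, calling the function with `t = 0` returns `(3, 3)`; the caller would then compare `j` with k as in step S108.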

FIG. 5 is a diagram showing the content of the processing shown in the flowchart of FIG. 3. As an example, FIG. 5 shows the processing for the case of five attributes. In FIG. 5, the horizontal axis indicates the number of combining passes for the attribute with priority 1, and the vertical axis indicates the priority of each attribute. The horizontal axis corresponds to "n" in the flowchart of FIG. 3, and the vertical axis corresponds to "m" in the flowchart of FIG. 3.

FIG. 5(a) is a diagram showing the order of the processing shown in the flowchart of FIG. 3. First, with n = 1 and m = 1, the merge process is performed on the attribute with priority 1. Next, with n = 2 and m = 1 to 2, the merge process is performed on the attributes with priorities 1 and 2. Next, with n = 3 and m = 1 to 3, the merge process is performed on the attributes with priorities 1 to 3. Next, with n = 4 and m = 1 to 4, the merge process is performed on the attributes with priorities 1 to 4. Next, with n = 5 and m = 1 to 5, the merge process is performed on the attributes with priorities 1 to 5. Thereafter, the number of merge operations keeps increasing until the records are k-anonymized.

FIG. 5(b) is a diagram showing the relationship between "n" and "m" in the processing shown in the flowchart of FIG. 3. The arrows 50 to 54 correspond to n = 1 to 5, respectively. Along each of the arrows 50 to 54, the priority increases from the start point to the end point, which indicates that m increases.

Accordingly, the arrow 50 indicates the processing with n = 1 and m = 1 (the merge process on the attribute with priority 1). The arrow 51 indicates the processing with n = 2 and m = 1 to 2 (the merge process on the attributes with priorities 1 to 2). The arrow 52 indicates the processing with n = 3 and m = 1 to 3 (the merge process on the attributes with priorities 1 to 3). The arrow 53 indicates the processing with n = 4 and m = 1 to 4 (the merge process on the attributes with priorities 1 to 4). The arrow 54 indicates the processing with n = 5 and m = 1 to 5 (the merge process on the attributes with priorities 1 to 5). Thereafter, until the records are k-anonymized, the merge process is performed in the order indicated by the arrows. When the merge process reaches the end point of an arrow, the processing moves to the next arrow to the right as necessary, and the merge process is performed again.
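The order in which the pairs (n, m) are visited can be sketched with a small generator. This is only an illustration of the ordering described for the arrows 50 to 54. The name `merge_schedule`, the `max_rounds` parameter, and the cap of m at the number of attributes for n larger than that number are assumptions not found in the text, and the early exit once k-anonymity is reached is omitted.

```python
def merge_schedule(num_attributes, max_rounds):
    """Yield (n, m) pairs in the order n = 1, 2, ... and m = 1 .. n.

    Assumption: for n greater than the number of attributes, m is capped
    at num_attributes. The real process would stop as soon as the
    records are k-anonymized; that check is not modeled here.
    """
    for n in range(1, max_rounds + 1):
        for m in range(1, min(n, num_attributes) + 1):
            yield n, m


# Pairs visited for five attributes and the first three arrows.
first_three_arrows = list(merge_schedule(num_attributes=5, max_rounds=3))
```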

Furthermore, when the non-anonymized information is updated, the multi-attribute anonymization unit 203 processes only the attributes that have changed and reuses the previous anonymization result for the attributes that have not changed. Limiting the merge-based anonymization to the attributes whose values have changed further speeds up the processing.
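The reuse of previous results for unchanged attributes can be sketched as follows. This is a hypothetical illustration only: the function `reanonymize`, the column-oriented dictionaries, and the `anonymize_column` callback are assumptions introduced here, and the text does not say how changes are detected or how earlier results are stored.

```python
def reanonymize(columns, changed, previous, anonymize_column):
    """Recompute only the attributes listed in `changed`.

    columns:          attribute name -> list of current raw values
    changed:          set of attribute names whose values were updated
    previous:         attribute name -> previously anonymized values
    anonymize_column: hypothetical callback that anonymizes one attribute
    """
    result = {}
    for name, values in columns.items():
        if name in changed:
            result[name] = anonymize_column(values)   # recompute this attribute
        else:
            result[name] = previous[name]              # reuse the earlier result
    return result
```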

FIG. 6 is a diagram showing an example of k-anonymization (k = 3) by the individual attribute anonymization unit 202. In this example, each record has three attributes: attribute 1 is hierarchical information, and attributes 2 and 3 are numerical information. The per-attribute k-anonymization illustrated in FIG. 6 uses a generalization hierarchy tree and is an existing technique. In other words, the individual attribute anonymization unit 202, which performs the preprocessing for the multi-attribute anonymization unit 203, may use an existing anonymization technique.

In FIG. 6, FIG. 6(a) shows the non-anonymized information. FIG. 6(b) shows the generalization hierarchy for attribute 1. FIG. 6(c) shows k-anonymization of attribute 1 at level 1. FIG. 6(d) shows the generalization hierarchy for attribute 2. FIG. 6(e) shows k-anonymization of attribute 2 at level 1. FIG. 6(f) shows the generalization hierarchy for attribute 3. FIG. 6(g) shows k-anonymization of attribute 3 at level 2. As shown in FIGS. 6(c), 6(e), and 6(g), each attribute is k-anonymized.
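Per-attribute k-anonymization with a generalization hierarchy can be sketched as below. This is a generic illustration of the existing technique, not the content of FIG. 6: the function `generalize_attribute`, the hierarchy layout, the region names, and the choice of raising one level for the whole attribute at a time are assumptions made for this sketch.

```python
from collections import Counter


def generalize_attribute(values, hierarchy, k):
    """Return (level, generalized values) for the lowest level at which
    every generalized value occurs at least k times.

    hierarchy[v] lists the ancestors of v, level 1 first (assumed layout).
    Level 0 means the original values are kept. `values` must be non-empty.
    """
    max_level = len(next(iter(hierarchy.values())))
    for level in range(max_level + 1):
        if level == 0:
            generalized = list(values)
        else:
            generalized = [hierarchy[v][level - 1] for v in values]
        if min(Counter(generalized).values()) >= k:
            return level, generalized
    raise ValueError("k-anonymity not reachable with this hierarchy")


# Hypothetical hierarchy: prefecture -> region -> country.
hierarchy = {
    "Kyoto": ["Kansai", "Japan"],
    "Osaka": ["Kansai", "Japan"],
    "Tokyo": ["Kanto", "Japan"],
    "Chiba": ["Kanto", "Japan"],
}
values = ["Kyoto", "Kyoto", "Osaka", "Tokyo", "Tokyo", "Chiba"]
level, generalized = generalize_attribute(values, hierarchy, k=3)
```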

FIG. 7 is a diagram showing the result of k-anonymization by the multi-attribute anonymization unit 203. FIG. 7(a) shows the per-attribute k-anonymization result from FIG. 6. FIG. 7(b) shows the result of k-anonymization by the multi-attribute anonymization unit 203.

Following the processing shown in FIG. 2, records 1 to 4 are equivalent records (step S107), and there are four of them (affirmative determination in step S108), so they are k-anonymized. Next, records 5 to 7 are equivalent records (step S107), and there are three of them (affirmative determination in step S108), so they are k-anonymized.

Next, records 8 to 11 are equivalent records (step S107), and there are four of them (affirmative determination in step S108), so they are k-anonymized. Finally, records 12 to 14 are equivalent records (step S107), and there are three of them (affirmative determination in step S108), so they are k-anonymized.

Through the above processing, all 14 records are k-anonymized. The k-anonymized records are then registered in the anonymized information storage unit 30, as shown in FIG. 7(b).
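The check applied to the 14 records (groups of 4, 3, 4, and 3 equivalent records with k = 3) can be illustrated with a short sketch. The function `is_k_anonymous` and the attribute values in `records` are hypothetical placeholders, since the actual values of FIG. 7 are not reproduced here; only the group sizes follow the text.

```python
from collections import Counter


def is_k_anonymous(records, k):
    """True if every distinct record occurs at least k times."""
    return all(size >= k for size in Counter(records).values())


# Placeholder records with the group sizes 4, 3, 4, 3 described in the text.
records = (
    [("A", "31-50")] * 4
    + [("A", "51-70")] * 3
    + [("B", "31-50")] * 4
    + [("B", "51-70")] * 3
)
group_sizes = sorted(Counter(records).values())
```

With these group sizes the data satisfies k = 3 but would not satisfy k = 4, because two groups contain only three records.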

As described above, the per-attribute k-anonymization in FIG. 6 uses a generalization hierarchy tree and is an existing technique. In contrast, the k-anonymization by the multi-attribute anonymization unit 203 shown in FIG. 7 does not use a generalization hierarchy tree as the prior art does. K-anonymization with a generalization hierarchy tree checks, one parent level after another, whether the data is k-anonymized, until it is. The multi-attribute anonymization unit 203 instead k-anonymizes the attribute-unit anonymized records across a plurality of attributes by increasing, the lower the priority of an attribute, the number of records in which different attribute values are replaced with one new attribute value. It is therefore entirely different from prior-art k-anonymization using a generalization hierarchy tree.

Next, examples of the merge process of the multi-attribute anonymization unit 203 are described. FIG. 8 is a diagram showing an example of the merge process for numerical information. As shown in FIG. 8(a), only two records have the attribute value "31-40", so k-anonymity (k = 3) cannot be ensured. In this case, as shown in FIG. 8(b), the records having the adjacent attribute value "41-50" and the records having the attribute value "31-40" are turned into records having one new attribute value "31-50". As a result, five records have the attribute value "31-50", so the five merged records are k-anonymized. Note that the merge process in FIG. 8 is limited to records whose attribute values other than attribute m all match.
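The merge of adjacent numeric ranges in FIG. 8 can be sketched as follows. This is a minimal illustration assuming ranges are written as "low-high" strings with integer bounds. The helpers `merge_ranges` and `relabel` are hypothetical, and the precondition that all other attribute values match is assumed to have been checked by the caller.

```python
def merge_ranges(a, b):
    """Merge two adjacent ranges such as "31-40" and "41-50" into "31-50"."""
    (lo1, hi1), (lo2, hi2) = sorted(
        tuple(map(int, r.split("-"))) for r in (a, b)
    )
    if hi1 + 1 != lo2:
        raise ValueError("ranges are not adjacent")
    return f"{lo1}-{hi2}"


def relabel(values, a, b):
    """Replace every occurrence of range a or b with the merged range."""
    merged = merge_ranges(a, b)
    return [merged if v in (a, b) else v for v in values]


# Two records with "31-40" and three with "41-50", as in the example above.
values = ["31-40"] * 2 + ["41-50"] * 3
merged_values = relabel(values, "31-40", "41-50")
```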

FIG. 9 is a diagram showing an example of the merge process when the attribute is hierarchical information and the condition information includes a definition of which attribute values within a hierarchy level can be merged. As shown in FIG. 9(a), only two records have the attribute value "Kyoto Prefecture", so k-anonymity (k = 3) cannot be ensured. The multi-attribute anonymization unit 203 therefore refers to the mergeable attribute values, which indicate the attribute values that can be merged naturally. In the example of FIG. 9, Kyoto Prefecture and Osaka Prefecture, which are geographically adjacent, can be merged, whereas Kyoto Prefecture and Hyogo Prefecture are treated as not adjacent and cannot be merged. This makes it possible to preserve usefulness. Note that the merge process in FIG. 9 is limited to records whose attribute values other than attribute m all match.

In this case, as shown in FIG. 9(b), the records having the attribute value "Kyoto Prefecture" and the records having the mergeable attribute value "Osaka Prefecture" are turned into records having one new attribute value "Kyoto Prefecture, Osaka Prefecture". As a result, five records have the attribute value "Kyoto Prefecture, Osaka Prefecture", so the five merged records are k-anonymized.
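The lookup of mergeable attribute values can be sketched as follows. This is an illustration under assumptions: the set `MERGEABLE`, the helpers `can_merge` and `merge_values`, and the storage of the definition as unordered pairs are hypothetical, and only the one pair named in the text is registered.

```python
# Assumed representation of the merge definition: a set of unordered pairs.
MERGEABLE = {frozenset({"Kyoto Prefecture", "Osaka Prefecture"})}


def can_merge(a, b):
    """True if the definition allows merging attribute values a and b."""
    return frozenset({a, b}) in MERGEABLE


def merge_values(values, a, b):
    """Replace a and b with the combined value "a, b" when merging is allowed."""
    if not can_merge(a, b):
        raise ValueError(f"{a} and {b} cannot be merged")
    merged = f"{a}, {b}"
    return [merged if v in (a, b) else v for v in values]


# Two records with Kyoto Prefecture and three with Osaka Prefecture.
values = ["Kyoto Prefecture"] * 2 + ["Osaka Prefecture"] * 3
merged_values = merge_values(values, "Kyoto Prefecture", "Osaka Prefecture")
```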

FIG. 10 is a diagram showing an example of the merge process when the condition information is the centroid of time-series data. Because the time-series data shown in FIG. 10 is numerical, FIG. 10 shows the centroid-based case as one example of the merge process for numerical information. As shown in FIG. 10(a), there are only two records of the first time series, so k-anonymity (k = 3) cannot be ensured. The multi-attribute anonymization unit 203 therefore selects, as merge target attribute value candidates, the records that share with the record in question the largest number of columns that are non-blank in both. Next, referring to the centroids of the time-series data, it narrows the candidates down to the records whose centroid distance from the record in question is no more than a predetermined value. Among the remaining candidates, the record with the smallest centroid distance becomes the merge target. If no record remains after the narrowing, the records with the next largest number of columns that are non-blank in both are selected as candidates, and the same processing is repeated. When the condition information is the centroid of time-series data, the centroid of each time series is included in the condition information. A blank in a record in FIG. 10 indicates that anonymization has been performed.

In the case of FIG. 10, suppose, for example, that the condition for narrowing down the merge target attribute value candidates is a Manhattan distance of 2 or less. The fifth time series, which shares the largest number of non-blank columns with the first time series, is selected as a candidate. The Manhattan distance between the first time series and the fifth time series is 1, so the fifth time series is the only record within the predetermined value and is selected as the merge target. In this case, the first time series and the fifth time series are turned into records having one new attribute value in which the leftmost attribute value "Monday AM" is 1, the attribute value "Monday AM" one column to its right is 0, and all other columns are blank. As a result, six records have this new attribute value, so the six merged records are k-anonymized. The distance function used to obtain the distance between centroids may be any function that satisfies the axioms of a distance (non-negativity, non-degeneracy, symmetry, and the triangle inequality). When records also have attributes with non-numerical values, records can be merged only if all of those non-numerical attribute values match.
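The candidate selection described for FIG. 10 can be sketched as follows. This is a hypothetical illustration: the function names, the use of `None` for blank columns, the dictionary of centroids supplied as condition information, and the sample data are assumptions. The data below is not the data of FIG. 10.

```python
def manhattan(p, q):
    """Manhattan distance between two centroids given as number tuples."""
    return sum(abs(a - b) for a, b in zip(p, q))


def shared_columns(r, s):
    """Number of columns that are non-blank (not None) in both rows."""
    return sum(1 for a, b in zip(r, s) if a is not None and b is not None)


def pick_merge_partner(target, rows, centroids, max_dist):
    """Pick the merge partner for `target`, or None if no row qualifies.

    Candidates are tried in decreasing order of shared non-blank columns.
    Within one level, only rows whose centroid distance is <= max_dist
    remain, and the closest one is chosen.
    """
    overlap = {
        name: shared_columns(rows[target], row)
        for name, row in rows.items()
        if name != target
    }
    for level in sorted(set(overlap.values()), reverse=True):
        near = [
            (manhattan(centroids[target], centroids[name]), name)
            for name, count in overlap.items()
            if count == level
        ]
        near = [(d, name) for d, name in near if d <= max_dist]
        if near:
            return min(near)[1]     # smallest centroid distance wins
    return None


# Hypothetical rows (None = blank column) and centroids.
rows = {"s1": [1, 0, None], "a": [1, 1, None], "b": [1, None, None]}
centroids = {"s1": (1.0,), "a": (6.0,), "b": (2.0,)}
partner = pick_merge_partner("s1", rows, centroids, max_dist=2)
```

In this sample, row "a" shares the most columns with "s1" but is too far away, so the search falls back to the next overlap level and selects "b", mirroring the fallback rule described above.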

The anonymization device 20 configured in this way can maintain both anonymity and usefulness. The details are as follows.

First, by increasing, for lower-priority attributes, the number of records in which different attribute values are replaced with one new attribute value, the time required for anonymization can be reduced compared with anonymizing by brute force over all combinations of records. In addition, the anonymization device 20 k-anonymizes the attribute-unit anonymized records across a plurality of attributes by increasing, the lower the priority of an attribute, the number of records in which different attribute values are replaced with one new attribute value. Accordingly, if attributes that should be anonymized as little as possible are assigned higher priorities, they are anonymized less strongly than attributes with lower priorities, so usefulness can be preserved.

An embodiment of the present invention has been described in detail above with reference to the drawings. However, the specific configuration is not limited to this embodiment and includes designs and the like within a scope that does not depart from the gist of the present invention.

DESCRIPTION OF SYMBOLS 1 ... anonymization system, 10 ... non-anonymized information storage unit, 20 ... anonymization device, 30 ... anonymized information storage unit, 201 ... condition information setting unit, 202 ... individual attribute anonymization unit, 203 ... multi-attribute anonymization unit

One aspect of the present invention is an anonymization device including: a record acquisition unit that acquires a plurality of records that are non-anonymized information having values of a plurality of attributes that are not anonymized; a rank acquisition unit that acquires a predetermined priority order for the plurality of attributes; an individual attribute anonymization unit that anonymizes, in attribute units, the plurality of records acquired by the acquisition unit; and a multi-attribute anonymization unit that anonymizes, across a plurality of attributes, the attribute-unit anonymized records anonymized in attribute units by the individual attribute anonymization unit, by increasing, the lower the priority of an attribute as acquired by the rank acquisition unit, the number of records in which a plurality of mutually different attribute values are replaced with one new attribute value.

One aspect of the present invention is the above anonymization device, in which the multi-attribute anonymization unit determines anonymity after replacing a plurality of mutually different attribute values with one new attribute value and, when anonymity is not ensured, repeatedly replaces a plurality of mutually different attribute values of an attribute having a higher priority than that attribute with one new attribute value and determines anonymity again, thereby anonymizing the attribute-unit anonymized records across a plurality of attributes.

One aspect of the present invention is an anonymization method in an anonymization device, including: a record acquisition step in which the anonymization device acquires a plurality of records that are non-anonymized information having values of a plurality of attributes that are not anonymized; a rank acquisition step in which the anonymization device acquires a predetermined priority order for the plurality of attributes; an individual attribute anonymization step in which the anonymization device anonymizes, in attribute units, the plurality of records acquired in the acquisition step; and a multi-attribute anonymization step in which the anonymization device anonymizes, across a plurality of attributes, the attribute-unit anonymized records anonymized in attribute units in the individual attribute anonymization step, by increasing, the lower the priority of an attribute as acquired in the rank acquisition step, the number of records in which a plurality of mutually different attribute values are replaced with one new attribute value.

One aspect of the present invention is a computer program for causing a computer of an anonymization device to execute: a record acquisition step in which the anonymization device acquires a plurality of records that are non-anonymized information having values of a plurality of attributes that are not anonymized; a rank acquisition step in which the anonymization device acquires a predetermined priority order for the plurality of attributes; an individual attribute anonymization step in which the anonymization device anonymizes, in attribute units, the plurality of records acquired in the acquisition step; and a multi-attribute anonymization step in which the anonymization device anonymizes, across a plurality of attributes, the attribute-unit anonymized records anonymized in attribute units in the individual attribute anonymization step, by increasing, the lower the priority of an attribute as acquired in the rank acquisition step, the number of records in which a plurality of mutually different attribute values are replaced with one new attribute value.

Claims

An anonymization device comprising:
a record acquisition unit that acquires a plurality of records that are non-anonymized information having values of a plurality of attributes that are not anonymized;
a rank acquisition unit that acquires a predetermined priority order for the plurality of attributes;
an individual attribute anonymization unit that anonymizes, in attribute units, the plurality of records acquired by the acquisition unit; and
a multi-attribute anonymization unit that anonymizes, across a plurality of attributes, the attribute-unit anonymized records anonymized in attribute units by the individual attribute anonymization unit, by increasing, the lower the priority of an attribute as acquired by the rank acquisition unit, the number of records in which a plurality of mutually different attribute values are replaced with one new attribute value.

The anonymization device according to claim 1, wherein the multi-attribute anonymization unit determines anonymity after replacing a plurality of mutually different attribute values with one new attribute value and, when anonymity is not ensured, repeatedly replaces a plurality of mutually different attribute values of an attribute having a higher priority than that attribute with one new attribute value and determines anonymity again, thereby anonymizing the attribute-unit anonymized records across a plurality of attributes.

An anonymization method in an anonymization device, comprising:
a record acquisition step in which the anonymization device acquires a plurality of records that are non-anonymized information having values of a plurality of attributes that are not anonymized;
a rank acquisition step in which the anonymization device acquires a predetermined priority order for the plurality of attributes;
an individual attribute anonymization step in which the anonymization device anonymizes, in attribute units, the plurality of records acquired in the acquisition step; and
a multi-attribute anonymization step in which the anonymization device anonymizes, across a plurality of attributes, the attribute-unit anonymized records anonymized in attribute units in the individual attribute anonymization step, by increasing, the lower the priority of an attribute as acquired in the rank acquisition step, the number of records in which a plurality of mutually different attribute values are replaced with one new attribute value.

A computer program for causing a computer of an anonymization device to execute:
a record acquisition step in which the anonymization device acquires a plurality of records that are non-anonymized information having values of a plurality of attributes that are not anonymized;
a rank acquisition step in which the anonymization device acquires a predetermined priority order for the plurality of attributes;
an individual attribute anonymization step in which the anonymization device anonymizes, in attribute units, the plurality of records acquired in the acquisition step; and
a multi-attribute anonymization step in which the anonymization device anonymizes, across a plurality of attributes, the attribute-unit anonymized records anonymized in attribute units in the individual attribute anonymization step, by increasing, the lower the priority of an attribute as acquired in the rank acquisition step, the number of records in which a plurality of mutually different attribute values are replaced with one new attribute value.