JP7179795B2

JP7179795B2 - Anonymization device, anonymization method and anonymization program

Info

Publication number: JP7179795B2
Application number: JP2020047509A
Authority: JP
Inventors: 知明三本; 晋作清本
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2020-03-18
Filing date: 2020-03-18
Publication date: 2022-11-29
Anticipated expiration: 2040-03-18
Also published as: JP2021149398A

Description

The present invention relates to a device, a method, and a program for anonymizing datasets.

Various techniques have been proposed for anonymizing data from the viewpoint of privacy protection, so that individuals cannot be identified from the records of a dataset (see, for example, Non-Patent Documents 1 to 5).

Non-Patent Document 1: K. LeFevre, D. J. DeWitt, and R. Ramakrishnan, "Mondrian multidimensional k-anonymity," in Proc. of the 22nd International Conference on Data Engineering (ICDE '06), pp. 25-35, IEEE, 2006.
Non-Patent Document 2: P. Samarati and L. Sweeney, "Generalizing data to provide anonymity when disclosing information," in Proc. of PODS 1998, 1998, p. 188.
Non-Patent Document 3: P. Samarati, "Protecting respondents' identities in microdata release," IEEE Trans. on Knowledge and Data Engineering, vol. 13, no. 6, pp. 1010-1027, 2001.
Non-Patent Document 4: L. Sweeney, "Achieving k-anonymity privacy protection using generalization and suppression," J. Uncertainty, Fuzziness, and Knowledge-Based Systems, vol. 10, no. 5, pp. 571-588, 2002.
Non-Patent Document 5: J.-W. Byun, A. Kamra, E. Bertino, and N. Li, "Efficient k-anonymization using clustering techniques," in International Conference on Database Systems for Advanced Applications, pp. 188-200, Springer, 2007.

Conventional anonymization methods require every record to be anonymized to have the same set of attributes. Real data, however, includes datasets such as health insurance claim (receipt) data in which the payload differs from record to record. Conventional anonymization methods cannot handle such data.

An object of the present invention is to provide an anonymization device, an anonymization method, and an anonymization program capable of anonymizing data with different payloads.

An anonymization device according to the present invention includes: an input unit that receives input of a dataset consisting of records whose payload attributes can be identified by a code included in the payload; a storage unit that stores the order of the records; an anonymization processing unit that extracts a subset of the records of the dataset for each code or each combination of codes and performs anonymization processing on the attributes common to the extracted records; and an integration unit that integrates the anonymized subsets of records according to the order and reconstructs an anonymized dataset.

The anonymization processing unit may process a plurality of codes having common attributes into the same generalized code, and extract the subset of records for each generalized code.

The anonymization processing unit may perform the anonymization processing for each generalized code and then perform the anonymization processing for each refined (more specific) code.

In an anonymization method according to the present invention, a computer executes: an input step of receiving input of a dataset consisting of records whose payload attributes can be identified by a code included in the payload; a storage step of storing the order of the records; an anonymization processing step of extracting a subset of the records of the dataset for each code or each combination of codes and performing anonymization processing on the attributes common to the extracted records; and an integration step of integrating the anonymized subsets of records according to the order and reconstructing an anonymized dataset.

An anonymization program according to the present invention causes a computer to function as the anonymization device.

According to the present invention, data with different payloads can be anonymized.

FIG. 1 is a diagram showing the functional configuration of the anonymization device in the embodiment.
FIG. 2 is a diagram illustrating the relationship between codes and payload attributes in the embodiment.
FIG. 3 is a diagram showing an outline of the anonymization method in the embodiment.
FIG. 4 is a flowchart illustrating the flow of the anonymization method in the embodiment.

An example of an embodiment of the present invention is described below.
FIG. 1 is a diagram showing the functional configuration of an anonymization device 1 according to this embodiment.
The anonymization device 1 is an information processing device (computer) such as a server or a personal computer, and includes a control unit 10 and a storage unit 20 as well as input/output devices for various data, communication devices, and the like.

The control unit 10 controls the anonymization device 1 as a whole, and implements each function of this embodiment by reading and executing, as appropriate, the various programs stored in the storage unit 20. The control unit 10 may be a CPU.

The storage unit 20 is a storage area for the various programs and data that cause the hardware group to function as the anonymization device 1, and may be a ROM, a RAM, a flash memory, a hard disk drive (HDD), or the like. Specifically, the storage unit 20 stores the program (anonymization program) that causes the control unit 10 to execute each function of this embodiment, the dataset to be anonymized, and the codes that define the attributes of the data stored in the payload of the dataset.

The control unit 10 includes an input unit 11, an anonymization processing unit 12, and an integration unit 13. Using these functional units, the control unit 10 anonymizes and outputs a dataset whose records have different payload attributes.

Each record of the dataset is divided into a common part and a payload (a code followed by payload attribute 1, payload attribute 2, and so on). The common part consists of attributes shared by all records (for example, age and address). The payload is not uniform across the dataset; the code identifies which payload attributes each record contains.
The correspondence between codes and payload attributes is stored in a predetermined database in the storage unit 20 and referenced as needed.

FIG. 2 is a diagram illustrating the relationship between codes and payload attributes in this embodiment.
For example, a payload with code 00 stores "admission date", "discharge date", and "disease name". Payload attributes are defined in the same way for codes 01, 10, and 11.
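The record layout described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: only the attributes of code 00 come from the description, while the attribute names for codes 01, 10, and 11, the common attributes `age` and `address`, and the helper `parse_record` are assumptions made for the example.

```python
# Hypothetical code-to-attribute table in the spirit of FIG. 2.
# Only the entry for code "00" is taken from the description;
# the entries for "01", "10", and "11" are invented for illustration.
PAYLOAD_ATTRIBUTES = {
    "00": ["admission_date", "discharge_date", "disease_name"],
    "01": ["admission_date", "discharge_date", "procedure_name"],
    "10": ["prescription_date", "drug_name"],
    "11": ["prescription_date", "pharmacy_name"],
}

# Attributes shared by every record (the "common part").
COMMON_ATTRIBUTES = ["age", "address"]


def parse_record(row):
    """Split a flat row into its common part, its code, and named payload attributes."""
    n = len(COMMON_ATTRIBUTES)
    common = dict(zip(COMMON_ATTRIBUTES, row[:n]))
    code = row[n]
    names = PAYLOAD_ATTRIBUTES[code]  # the code decides how to read the payload
    values = row[n + 1:]
    if len(values) != len(names):
        raise ValueError(f"code {code!r} expects {len(names)} payload values")
    return {"common": common, "code": code, "payload": dict(zip(names, values))}
```

Two rows with different codes then yield payloads with different attribute names, which is the situation that a method requiring identical attributes in every record cannot handle.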

The input unit 11 receives input of a dataset consisting of records whose payload attributes can be identified by a code included in the payload.
The input unit 11 also stores the order of the records of the received dataset in the storage unit 20.

The anonymization processing unit 12 extracts a subset of the records of the dataset for each code or each combination of codes, and performs anonymization processing on the attributes common to the extracted records. The anonymization technique is not limited; various existing techniques can be applied.

In doing so, the anonymization processing unit 12 may process a plurality of codes having common attributes into the same generalized code, and extract a subset of the records for each generalized code.
For example, when the character string of a code is related to which payload attributes are shared, the code is processed based on that relationship. In the example of FIG. 2, codes 00 and 01 share attributes PL1 and PL2, so generalizing both to "0*" extracts the corresponding records together. Similarly, codes 10 and 11 share attribute PL1, so both are generalized to "1*".
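Generalizing codes by masking trailing characters, and grouping records by the result, might look like the sketch below. The function names `generalize` and `group_by_generalized_code`, the `keep` parameter, and the representation of a record as a dict with a `"code"` key are assumptions for illustration; the description only states that codes such as 00 and 01 are generalized to "0*".

```python
from collections import defaultdict


def generalize(code, keep):
    """Keep the first `keep` characters of a code and mask the rest with '*'."""
    if not 0 <= keep <= len(code):
        raise ValueError("keep must be between 0 and the code length")
    return code[:keep] + "*" * (len(code) - keep)


def group_by_generalized_code(records, keep):
    """Map each generalized code to the positions of the records it covers."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        groups[generalize(record["code"], keep)].append(index)
    return dict(groups)
```

With `keep=1`, records carrying codes 00 and 01 land in the "0*" group and records carrying codes 10 and 11 land in the "1*" group, so each group can be anonymized on the attributes its members share.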

The anonymization processing unit 12 may also perform anonymization processing for each generalized code (for example, "0*" and "1*") and then perform anonymization processing for each refined, original code (for example, 00, 01, 10, and 11).

The integration unit 13 integrates the anonymized subsets of records according to the stored order, and reconstructs and outputs an anonymized dataset.

FIG. 3 is a diagram showing an outline of the anonymization method in this embodiment.
The anonymization device 1 first extracts the common part of the attributes and the codes from a dataset (A) to which order data for later reshaping has been added, and performs anonymization processing. At this point, the codes (00, 01, 10, 11) are generalized to "0*" or "1*".

Using each generalized code as a key, the anonymization device 1 extracts part of the dataset and performs anonymization processing on the attributes (bold frames) that are common within each group (B, C).
The anonymization device 1 then integrates the separately anonymized groups based on the order data and outputs the result (D).
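The split, anonymize, and merge sequence outlined for FIG. 3 can be sketched as below. The sketch makes several assumptions that are not in the description: records are dicts with `"code"` and `"payload"` keys, the shared payload attributes hold ISO date strings, and `coarsen` (which truncates a date to year and month) merely stands in for whichever existing anonymization technique is actually used. Anonymization of the common part is omitted.

```python
def shared_keys(members):
    """Payload attribute names that appear in every record of a group."""
    keys = set(members[0]["payload"])
    for member in members[1:]:
        keys &= set(member["payload"])
    return keys


def coarsen(value):
    """Stand-in anonymizer (an assumption): keep only the year-month of an ISO date."""
    return value[:7]


def split_anonymize_merge(dataset):
    # Attach order data so the output can be restored to the input order.
    groups = {}
    for order, record in enumerate(dataset):
        code = record["code"]
        key = code[0] + "*" * (len(code) - 1)  # e.g. "00" and "01" become "0*"
        groups.setdefault(key, []).append((order, record))

    processed = []
    for members in groups.values():
        keys = shared_keys([record for _, record in members])
        for order, record in members:
            # Anonymize only the attributes common to the whole group.
            payload = {
                name: coarsen(value) if name in keys else value
                for name, value in record["payload"].items()
            }
            processed.append((order, {**record, "payload": payload}))

    # Merge the groups and restore the original record order.
    processed.sort(key=lambda pair: pair[0])
    return [record for _, record in processed]
```

Sorting on the stored position alone keeps the merge independent of how the groups happened to be iterated, which is the role the order data plays in the description.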

FIG. 4 is a flowchart illustrating the flow of the anonymization method in this embodiment.
In this example, the payload code has multiple levels of generalization, and anonymization processing is performed at each level.
For example, code 012 is generalized hierarchically to "01*" and then to "0**".

In step S1, the input unit 11 receives input of a dataset whose records have different payloads. All attributes in the dataset, including the payload codes, are assumed to be known.

In step S2, the input unit 11 stores the order data of each record in the storage unit 20 for use when the anonymization result is output.

In step S3, the anonymization processing unit 12 extracts the common part and the payload codes from the dataset and anonymizes them.
In doing so, the anonymization processing unit 12 processes each code into its most generalized form according to the code's definition (for example, a generalization hierarchy tree).

In step S4, the anonymization processing unit 12 extracts the records whose codes match, and anonymizes the attributes that are common within each extracted group.

In step S5, the anonymization processing unit 12 determines whether all attributes have been anonymized. If the determination is YES, the process proceeds to step S8; if NO, the process proceeds to step S6.

In step S6, the anonymization processing unit 12 determines whether the code used in step S4 is the original code before generalization. If the determination is YES, the process proceeds to step S8; if NO, the process proceeds to step S7.

In step S7, the anonymization processing unit 12 refines the code by one level. The process then returns to step S4.

In step S8, the integration unit 13 integrates the groups anonymized in step S4, rearranges the processed records into the same order as the input dataset based on the previously stored order data, and outputs them.
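Steps S1 to S8 can be sketched as a loop over generalization levels, as shown below. This is a simplified illustration under stated assumptions, not the patented implementation: codes are fixed-length strings whose most generalized form keeps one leading character, records are dicts with `"code"` and `"payload"` keys, anonymization of the common part in step S3 is omitted, and `anonymize_value` is a hypothetical per-value callback standing in for an existing technique that would normally operate on the group as a whole.

```python
def generalize(code, keep):
    """Mask all but the first `keep` characters of a code with '*'."""
    return code[:keep] + "*" * (len(code) - keep)


def run_flow(dataset, anonymize_value):
    if not dataset:
        return []
    # S1, S2: accept the dataset and store each record's position.
    work = [(i, {**r, "payload": dict(r["payload"])}) for i, r in enumerate(dataset)]
    code_len = len(dataset[0]["code"])   # assumption: fixed-length codes
    done = {i: set() for i, _ in work}   # payload attributes already processed
    keep = 1                             # S3: most generalized level, e.g. "0**"
    while True:
        # S4: group records whose codes match at the current level and
        # anonymize the attributes common to each group.
        groups = {}
        for i, r in work:
            groups.setdefault(generalize(r["code"], keep), []).append((i, r))
        for members in groups.values():
            shared = set.intersection(*(set(r["payload"]) for _, r in members))
            for i, r in members:
                for name in shared - done[i]:
                    r["payload"][name] = anonymize_value(name, r["payload"][name])
                    done[i].add(name)
        # S5: finish once every payload attribute has been processed.
        if all(done[i] == set(r["payload"]) for i, r in work):
            break
        # S6: finish if the original, ungeneralized codes were just used.
        if keep == code_len:
            break
        keep += 1                        # S7: refine the code by one level
    # S8: merge the groups and restore the input order.
    merged = [pair for members in groups.values() for pair in members]
    merged.sort(key=lambda pair: pair[0])
    return [r for _, r in merged]
```

Tracking which attributes have already been processed per record is a design choice of this sketch; it prevents an attribute anonymized at a coarse level from being processed again at a finer one.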

As described above, according to this embodiment, the anonymization device 1 receives input of a dataset consisting of records whose payload attributes can be identified by a code, stores the order of the records, extracts a subset of the records of the dataset for each code or each combination of codes, and performs anonymization processing on the common attributes. The anonymization device 1 then integrates the anonymized subsets of records according to the stored order and reconstructs an anonymized dataset.
Therefore, whereas conventional methods require all records to have the same attributes, the anonymization device 1 can anonymize a dataset whose records have different payload attributes.

In doing so, the anonymization device 1 processes a plurality of codes having common attributes into the same generalized code, and extracts a subset of the records for each generalized code.
Therefore, when the codes are related to which attributes are shared, the anonymization device 1 groups records by generalized code. It can thus efficiently extract the portions whose attributes match and, by repeating the anonymization processing, appropriately anonymize the entire dataset.

The anonymization device 1 also performs anonymization processing for each generalized code and then performs anonymization processing for each refined code.
Therefore, the anonymization device 1 can efficiently extract the attributes that are common at each level of code generalization and appropriately anonymize the entire dataset.

Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment. The effects described in the embodiment are merely a list of the most suitable effects arising from the present invention, and the effects of the present invention are not limited to those described in the embodiment.

The embodiment above describes the case where the commonality of attributes can be determined from the sequence of characters in the code, but the invention is not limited to this case.
For example, the anonymization device 1 may extract, from the database describing the correspondence between codes and payload attributes, combinations of codes that share at least some payload attributes, and may extract a subset of the records for each such combination.
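When the code strings carry no structure, the combinations can be derived from the correspondence table itself. The sketch below assumes the table is available as a dict from code to a list of attribute names and considers only pairs of codes; the function name `codes_sharing_attributes` and the sample codes in the usage note are hypothetical.

```python
from itertools import combinations


def codes_sharing_attributes(table):
    """Map each pair of codes to the payload attributes the two codes share."""
    shared = {}
    for a, b in combinations(sorted(table), 2):
        common = set(table[a]) & set(table[b])
        if common:  # keep only pairs with at least one attribute in common
            shared[(a, b)] = common
    return shared
```

For a table in which codes "A7" and "K2" both list `admission_date` and code "Z9" lists only `drug_name`, the function returns a single entry for the pair ("A7", "K2"), and the records carrying either code could then be extracted together.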

The anonymization method performed by the anonymization device 1 is implemented in software. In that case, the programs constituting the software are installed on an information processing device (computer). These programs may be recorded on removable media such as a CD-ROM and distributed to users, or may be distributed by being downloaded to users' computers via a network. They may also be provided to users' computers as a Web service over a network without being downloaded.

1 Anonymization device
10 Control unit
11 Input unit
12 Anonymization processing unit
13 Integration unit
20 Storage unit

Claims

1. An anonymization device comprising:
an input unit that receives input of a dataset consisting of records whose payload attributes can be identified by a code included in the payload;
a storage unit that stores the order of the records;
an anonymization processing unit that extracts a subset of the records of the dataset for each code or each combination of codes and performs anonymization processing on common attributes; and
an integration unit that integrates the anonymized subsets of records according to the order and reconstructs an anonymized dataset.

2. The anonymization device according to claim 1, wherein the anonymization processing unit processes a plurality of codes having common attributes into the same generalized code and extracts the subset of records for each generalized code.

3. The anonymization device according to claim 2, wherein the anonymization processing unit performs the anonymization processing for each detailed code after performing the anonymization processing for each generalized code.

4. An anonymization method in which a computer executes:
an input step of receiving input of a dataset consisting of records whose payload attributes can be identified by a code included in the payload;
a storage step of storing the order of the records;
an anonymization processing step of extracting a subset of the records of the dataset for each code or each combination of codes and performing anonymization processing on common attributes; and
an integration step of integrating the anonymized subsets of records according to the order and reconstructing an anonymized dataset.

5. An anonymization program for causing a computer to function as the anonymization device according to any one of claims 1 to 3.