JP7151759B2

JP7151759B2 - Information processing device, information processing method, and program

Info

Publication number: JP7151759B2
Application number: JP2020503639A
Authority: JP
Inventors: 敦典坂井
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2018-03-02
Filing date: 2019-03-01
Publication date: 2022-10-12
Anticipated expiration: 2039-03-01
Also published as: WO2019168144A1; JPWO2019168144A1

Description

The present invention relates to information processing and, in particular, to the anonymization of information.

Big data analysis, that is, the analysis of large amounts of personal information using techniques such as machine learning or deep learning, makes it possible to analyze the behavior and preferences of many people in fine detail. Building on such analysis, there is demand for new products and services that make use of large amounts of personal information. Techniques for processing predetermined content of this kind have been proposed (see, for example, Patent Document 1).

However, handling personal information requires attention to the protection of sensitive information. For example, providing personal information to a third party, for analysis or other purposes, requires the consent of the individual concerned (the data subject). Analyzing a large amount of personal information therefore means obtaining consent from many individuals, which takes an enormous number of man-hours and is very costly. In some cases consent cannot be obtained at all, for example because the individual refuses or the individual's whereabouts are unknown. For these reasons, it has not been easy for third parties to use personal information.

Laws and regulations that promote the use of personal information have therefore been enacted. For example, personal information that has been processed so that individuals cannot be identified can now be used by third parties without the individuals' consent. Accordingly, techniques for processing personal information so that individuals cannot be identified have been proposed.

Anonymization is one such technique for processing personal information so that individuals cannot be identified (see, for example, Patent Documents 2 to 4). Anonymization first removes attributes that can identify an individual on their own (called "identifiers"). It then anonymizes attributes that can identify an individual when combined (called "quasi-identifiers") so as to satisfy a predetermined anonymization strength.

Patent Document 1: JP 2017-004260 A
Patent Document 2: JP 2017-049693 A
Patent Document 3: JP 2016-139261 A
Patent Document 4: JP 2015-232863 A

Many ways of anonymizing data (anonymization methods) exist. Here, an "anonymization method" covers not only the type of anonymization but also how far the data is anonymized (the anonymization strength).

One of the most widely used anonymization methods is k-anonymization. k-anonymization partitions all records into groups and anonymizes them so that, within each group, at least k records share the same quasi-identifier value (or the same combination of quasi-identifier values). The "k" is the index of k-anonymization.
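As a rough illustration of the k-anonymity condition just described, the sketch below groups records by their quasi-identifier values and checks that every group has at least k members. The attribute names and rows are hypothetical, and this only checks the condition; it does not perform the anonymization itself.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence

Record = Dict[str, object]

def group_sizes(records: Iterable[Record], quasi_identifiers: Sequence[str]) -> Counter:
    """Count records per combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def is_k_anonymous(records: Iterable[Record], quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if every quasi-identifier combination occurs in at least k records."""
    return all(n >= k for n in group_sizes(records, quasi_identifiers).values())

# Hypothetical, already-generalized rows.
rows = [
    {"age": "10-19", "gender": "M", "item": "A bread"},
    {"age": "10-19", "gender": "M", "item": "B juice"},
    {"age": "20-29", "gender": "F", "item": "A bread"},
    {"age": "20-29", "gender": "F", "item": "C tea"},
]
print(is_k_anonymous(rows, ["age", "gender"], 2))          # True
print(is_k_anonymous(rows, ["age", "gender"], 3))          # False
print(min(group_sizes(rows, ["age", "gender"]).values()))  # 2, the largest k satisfied
```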

Increasing k makes it harder to identify individuals in the anonymized data. From the standpoint of protecting individual privacy, therefore, a larger anonymization index is desirable.

On the other hand, increasing the anonymization index increases the loss of information in the anonymized data.

Consequently, when anonymizing personal information, the anonymization index must be chosen appropriately.

In typical practice, the person carrying out the anonymization (the implementer) decides, each time anonymization is required, the anonymization index for the quasi-identifiers based on the personal information to be anonymized and on the third party that will use the data. More specifically, the implementer anonymizes the data with several anonymization methods, inspects the anonymized results, and selects one. Deciding on the anonymization therefore takes many man-hours. Moreover, different implementers do not necessarily reach the same judgment, so the resulting anonymization depends on who performs it.

Furthermore, when personal information contains multiple quasi-identifiers, an anonymization index must be determined for each of them. In this case, the implementer must decide on an index for each quasi-identifier, that is, on a combination of indices.

Because this leaves many options, the implementer has had to inspect a large amount of anonymized data before deciding on an anonymization. As a result, verifying the anonymized data requires a great deal of work.
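To make "many options" concrete: if each quasi-identifier has its own set of candidate generalization levels, the number of combinations to inspect is the product of the set sizes. The levels below are purely hypothetical; the point is only the multiplicative growth.

```python
from itertools import product
from math import prod

# Hypothetical candidate generalization levels for each quasi-identifier.
candidates = {
    "age_band_years": [1, 5, 10, 20],
    "address": ["municipality", "prefecture", "deleted"],
    "time_window_hours": [1, 2, 3, 6],
}

settings = list(product(*candidates.values()))    # every combination of levels
print(len(settings))                               # 48 = 4 * 3 * 4
print(prod(len(v) for v in candidates.values()))   # 48, counted without enumerating
print(settings[0])                                 # (1, 'municipality', 1)
```

Adding one more quasi-identifier with five levels would multiply the count by five, which is why manual inspection of every anonymized result scales poorly.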

If an index or evaluation value that allows one anonymization method to be compared with others could be calculated, the implementer could select an anonymization method simply by comparing those values. A way to calculate values for comparing anonymizations is therefore desired.

The invention described in Patent Document 1 processes synonyms in content; it is not concerned with anonymization.

The inventions described in Patent Documents 2 to 4 relate to anonymization but do not compare multiple anonymizations.

Thus, the inventions described in Patent Documents 1 to 4 have the problem that they cannot compare anonymizations.

An object of the present invention is to solve the above problem and to provide an information processing apparatus and the like that calculates a value for use in comparing anonymizations.

An information processing apparatus according to one aspect of the present invention includes: pre-anonymization confidence calculation means for calculating, using a model that calculates confidence, a pre-anonymization confidence, which is the confidence for pre-anonymization data; anonymization means for creating post-anonymization data by applying an anonymization method to the pre-anonymization data; anonymization strength calculation means for calculating the anonymization strength of the anonymization method; post-anonymization confidence calculation means for calculating, using the model, a post-anonymization confidence, which is the confidence for the post-anonymization data; and evaluation value calculation means for calculating an evaluation value of the anonymization method based on the difference between the pre-anonymization confidence and the post-anonymization confidence and on the anonymization strength.

An information processing method according to one aspect of the present invention calculates, using a model that calculates confidence, a pre-anonymization confidence, which is the confidence for pre-anonymization data; creates post-anonymization data by applying an anonymization method to the pre-anonymization data; calculates the anonymization strength of the anonymization method; calculates, using the model, a post-anonymization confidence, which is the confidence for the post-anonymization data; and calculates an evaluation value of the anonymization method based on the difference between the pre-anonymization confidence and the post-anonymization confidence and on the anonymization strength.

A program according to one aspect of the present invention causes a computer to execute: a process of calculating, using a model that calculates confidence, a pre-anonymization confidence, which is the confidence for pre-anonymization data; a process of creating post-anonymization data by applying an anonymization method to the pre-anonymization data; a process of calculating the anonymization strength of the anonymization method; a process of calculating, using the model, a post-anonymization confidence, which is the confidence for the post-anonymization data; and a process of calculating an evaluation value of the anonymization method based on the difference between the pre-anonymization confidence and the post-anonymization confidence and on the anonymization strength.
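The five steps shared by the apparatus, method, and program above can be sketched as a single function. This is a minimal sketch, not the patented implementation: the model, the anonymization, the strength calculation, and the rule that combines difference and strength are all passed in as hypothetical callables, and averaging per-record confidences over the data set is an assumption made here for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

Record = Dict[str, object]

@dataclass
class Evaluation:
    pre_confidence: float   # confidence on the pre-anonymization data
    post_confidence: float  # confidence on the post-anonymization data
    strength: float         # anonymization strength of the method
    value: float            # evaluation value of the method

def evaluate_method(
    data: List[Record],
    anonymize: Callable[[List[Record]], List[Record]],
    strength_of: Callable[[], float],
    model: Callable[[Record], float],
    combine: Callable[[float, float], float],
) -> Evaluation:
    pre = mean(model(r) for r in data)          # 1. pre-anonymization confidence (assumed: mean)
    anonymized = anonymize(data)                # 2. apply the anonymization method
    z = strength_of()                           # 3. anonymization strength
    post = mean(model(r) for r in anonymized)   # 4. post-anonymization confidence
    value = combine(abs(pre - post), z)         # 5. evaluation value from difference and strength
    return Evaluation(pre, post, z, value)

# Toy usage: the hypothetical "model" is more confident when the city is known.
data = [{"age": 15, "city": "Ichihara"}, {"age": 42, "city": "Chiba"}]

def drop_city(rows: List[Record]) -> List[Record]:
    return [{k: v for k, v in r.items() if k != "city"} for r in rows]

def toy_model(r: Record) -> float:
    return 0.8 if "city" in r else 0.6

result = evaluate_method(data, drop_city, lambda: 2.0, toy_model,
                         combine=lambda diff, z: z / (1.0 + diff))
print(result)
```

Keeping each step as a parameter mirrors the patent's point that the model, the anonymization method, and the strength measure are each "not limited".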

The present invention has the effect of calculating a value for use in comparing anonymizations.

FIG. 1 is a block diagram showing an example of the configuration of an information processing apparatus according to the first embodiment.
FIG. 2 is a diagram showing an example of data to be anonymized.
FIG. 3 is a diagram showing data anonymized by an example anonymization method.
FIG. 4 is a diagram showing data anonymized by another example anonymization method.
FIG. 5 is a diagram showing an example of feature amounts calculated by a learning feature amount calculation unit according to the first embodiment.
FIG. 6 is a diagram for explaining the anonymization of data.
FIG. 7 is a flowchart showing an example of the operation of creating a model in the information processing apparatus according to the first embodiment.
FIG. 8 is a flowchart showing an example of the operation of calculating the evaluation value of an anonymization method in the information processing apparatus according to the first embodiment.
FIG. 9 is a flowchart showing an example of the operation of the information processing apparatus, including the operation of the anonymization method selection unit, according to the first embodiment.
FIG. 10 is a block diagram showing an example of a schematic configuration of the information processing apparatus according to the first embodiment.
FIG. 11 is a block diagram showing an example of a hardware configuration of the information processing apparatus according to the first embodiment.
FIG. 12 is a block diagram showing an example of the configuration of an information processing system including the information processing apparatus according to the first embodiment.

Next, embodiments of the present invention will be described with reference to the drawings.

The drawings are provided to illustrate embodiments of the present invention; the present invention is not limited to what the drawings show. Identical components in different drawings are given the same reference numerals, and repeated descriptions of them may be omitted. In the drawings used in the following description, components unrelated to the description of the present invention may be omitted and not illustrated.

<First Embodiment>
A first embodiment will be described below with reference to the drawings.

The information processing apparatus 10 according to the first embodiment of the present invention relates to the anonymization of data.

The anonymization of data is therefore described first.

FIG. 6 is a diagram for explaining the anonymization of data.

The upper part of FIG. 6 shows the data before anonymization, and the lower part shows the data after anonymization. In FIG. 6, for example, the attribute "age" has been converted into 20-year ranges. Therefore, even if a user of the anonymized data tries to determine an individual's age from it, the user can narrow the individual down only to a 20-year age range. Likewise, the attribute "municipality" (city, ward, town, or village) has been deleted, so a user trying to identify an individual's address can identify only the prefecture.
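The two operations described for FIG. 6, generalizing an attribute into ranges and deleting an attribute, can be sketched as follows. The attribute names are hypothetical, and the assumption that the 20-year bands start at age 0 (0-19, 20-39, ...) is ours; the figure may use different boundaries.

```python
from typing import Dict, List

Record = Dict[str, object]

def age_band(age: int, width: int = 20) -> str:
    """Map an exact age to a band of `width` years, e.g. 15 -> '0-19' (bands start at 0)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def anonymize_like_fig6(rows: List[Record]) -> List[Record]:
    """Generalize 'age' to 20-year bands and delete the 'municipality' attribute."""
    out = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != "municipality"}  # delete the attribute
        new["age"] = age_band(r["age"])                              # generalize the attribute
        out.append(new)
    return out

rows = [
    {"age": 15, "prefecture": "Chiba", "municipality": "Ichihara"},
    {"age": 42, "prefecture": "Chiba", "municipality": "Chiba"},
]
print(anonymize_like_fig6(rows))
# [{'age': '0-19', 'prefecture': 'Chiba'}, {'age': '40-59', 'prefecture': 'Chiba'}]
```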

However, there is more than one anonymization method. Examples of anonymization methods are described below with reference to the drawings.

FIG. 2 is a diagram showing an example of data to be anonymized. The upper part of FIG. 2 is an example of customers' personal information, and the lower part is an example of purchase-history information.

For example, if the data in FIG. 2 were provided to a user, the user could learn from it that "Taro Yamada, a 15-year-old male student living in Ichihara City, Chiba Prefecture, purchased A bread at the Chiba store at 12:30 on September 19, 2017."

FIG. 3 is a diagram showing data anonymized by an example anonymization method. In the anonymized data shown in FIG. 3, attributes that identify an individual (identifiers), such as the member ID (identifier) and the name, have been deleted, and the remaining attributes have been anonymized.

If the data in FIG. 3 were provided to a user, the user could learn from it that "a male student in his teens living in Ichihara City, Chiba Prefecture, purchased A bread at the Chiba store during the 12 o'clock hour."

FIG. 4 is a diagram showing data anonymized by another example anonymization method. The anonymization method shown in FIG. 4 generalizes the data further than the method shown in FIG. 3.

If the data in FIG. 4 were provided to a user, the user could learn from it that "a male student aged 19 or under living in Chiba Prefecture purchased bread in Chiba between 12:00 and 14:00."

The anonymization method shown in FIG. 4 generalizes the information further. However, it leaves less usable information.
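The contrast between FIG. 3 and FIG. 4 can be sketched by applying two hypothetical functions to the FIG. 2 record as described in the text. The member ID value, the attribute names, and the product and store-area mappings are illustrative assumptions; only the resulting generalizations (teens vs. 19 or under, 12 o'clock hour vs. 12:00 to 14:00, city kept vs. deleted, "A bread" vs. "bread") come from the description above.

```python
from datetime import datetime

IDENTIFIERS = ("member_id", "name")
PRODUCT_CATEGORY = {"A bread": "bread"}   # hypothetical product hierarchy
STORE_AREA = {"Chiba store": "Chiba"}     # hypothetical store -> area mapping

record = {
    "member_id": "M0001",                 # hypothetical ID
    "name": "Taro Yamada", "age": 15, "gender": "male", "occupation": "student",
    "prefecture": "Chiba", "city": "Ichihara",
    "purchased_at": datetime(2017, 9, 19, 12, 30),
    "store": "Chiba store", "item": "A bread",
}

def method_fig3(r):
    """Milder method: drop identifiers, age -> decade, time -> 1-hour slot."""
    out = {k: v for k, v in r.items() if k not in IDENTIFIERS}
    out["age"] = f"{r['age'] // 10 * 10}s"
    hour = r["purchased_at"].hour
    out["purchased_at"] = f"{hour}:00-{hour}:59"
    return out

def method_fig4(r):
    """Stronger method: also drop the city and coarsen age, time, store and product."""
    out = {k: v for k, v in r.items() if k not in IDENTIFIERS + ("city",)}
    out["age"] = "19 or under" if r["age"] <= 19 else "20 or over"
    start = r["purchased_at"].hour // 2 * 2   # 2-hour slots starting at even hours
    out["purchased_at"] = f"{start}:00-{start + 2}:00"
    out["store"] = STORE_AREA.get(r["store"], r["store"])
    out["item"] = PRODUCT_CATEGORY.get(r["item"], r["item"])
    return out

print(method_fig3(record))
print(method_fig4(record))
```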

In practice, an appropriate anonymization must be selected from among multiple candidates. In typical anonymization, the implementer selected an anonymization at their own discretion based on the results of several anonymizations, so the chosen anonymization could differ from one implementer to another.

The information processing apparatus 10 according to the first embodiment calculates an index (evaluation value) for evaluating anonymization methods.

[Description of Configuration]
First, the configuration of the information processing apparatus 10 according to the first embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing an example of the configuration of the information processing apparatus 10 according to the first embodiment.

The information processing apparatus 10 includes an anonymization unit 201, a pre-anonymization confidence calculation unit 202, a post-anonymization confidence calculation unit 203, an evaluation value calculation unit 204, and an anonymization strength calculation unit 211. The information processing apparatus 10 further includes a learning feature amount calculation unit 103 and a learning unit 104, as well as an anonymization method selection unit 205.

The information processing apparatus 10 uses confidence to evaluate anonymization methods, and it uses a predetermined model to calculate the confidence.

"Confidence" is a value indicating how much certainty (likelihood) can be placed in a judgment made using the data. Its specific definition is not limited; it is determined based on the situation in which the data is expected to be used (the assumed use case) and on the data used to calculate it. For example, when predicting drink purchases from user information, the model outputs, for the input user information, the probability that the user purchases a drink (the degree to which the judgment is correct, that is, the accuracy rate). This probability is one example of confidence. Confidence is not limited to values between 0 and 1, as a probability is; it may take values in other ranges.

The model is selected according to the assumed use case. For example, it is chosen from models that were used in the past for cases similar to the assumed use case. As described later, the information processing apparatus 10 trains the model, so the model may also be one that was applied to a case different from the assumed use case. However, starting from a model used for a case similar to the assumed use case makes training converge faster.

The learning feature amount calculation unit 103 acquires the teacher data that the learning unit 104 uses to train the model. Based on the teacher data, the learning feature amount calculation unit 103 then calculates the feature amounts (learning feature amounts) that the learning unit 104 uses to train the model.

The teacher data acquired by the information processing apparatus 10 is not limited. For example, the teacher data may be a portion of the data to be anonymized (the target data) for which the user of the information processing apparatus 10 has set correct/incorrect values for the assumed use case.

An example of how a user creates teacher data from the target data is as follows. First, based on the assumed use case for the anonymized data, the user extracts from the target data a portion whose correctness can be judged. Next, the user deletes from the extracted data the attributes, such as identifiers, that can identify an individual on their own. The user also deletes attributes that are unnecessary for the assumed use case. Finally, based on the assumed use case, the user sets a correct/incorrect judgment result for each extracted record. The result is the teacher data.
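The four user steps above (extract judgeable records, delete identifiers, delete unneeded attributes, attach a correct/incorrect label) can be sketched as one function. The `judge` callable, the attribute names, and the "did this customer buy a drink?" use case are hypothetical; in the patent this judgment is made by the user.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set

Record = Dict[str, object]

def make_teacher_data(
    target: Iterable[Record],
    identifiers: Set[str],
    unneeded: Set[str],
    judge: Callable[[Record], Optional[int]],
) -> List[Record]:
    drop = identifiers | unneeded
    teacher = []
    for r in target:
        verdict = judge(r)  # 1 = correct, 0 = incorrect, None = cannot be judged
        if verdict is None:
            continue        # step 1: keep only records whose correctness can be judged
        row = {k: v for k, v in r.items() if k not in drop}  # steps 2-3: delete attributes
        row["correct"] = verdict                              # step 4: attach the label
        teacher.append(row)
    return teacher

# Hypothetical use case: "did this customer buy a drink?"
target = [
    {"member_id": "M0001", "name": "Taro Yamada", "age": 15, "memo": "n/a", "bought_drink": True},
    {"member_id": "M0002", "name": "Customer B", "age": 30, "memo": "n/a", "bought_drink": None},
]
teacher = make_teacher_data(
    target,
    identifiers={"member_id", "name"},
    unneeded={"memo", "bought_drink"},  # the label source is not kept as a feature
    judge=lambda r: None if r["bought_drink"] is None else int(r["bought_drink"]),
)
print(teacher)  # [{'age': 15, 'correct': 1}]
```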

The information processing apparatus 10 may also include a component that creates or extracts teacher data from a predetermined data set (for example, the target data) using deep learning or the like, and may use that component to create or extract the teacher data.

The teacher data is not limited to part of the target data. It may be data created based on predetermined knowledge, or it may include both part of the target data and other data.

The feature amounts calculated by the learning feature amount calculation unit 103 are not limited; they are determined according to how the learning unit 104 trains the model.

One example of a feature amount is a scalar value. For example, when the target data includes "gender" as an attribute, the learning feature amount calculation unit 103 may convert gender into a binary value (for example, "0" for male and "1" for female). A feature amount is not limited to two values and may take more than two. Nor is it limited to integers; it may be a rational number such as a decimal or a fraction.

Alternatively, a feature amount may be a vector. For example, when the attribute "occupation" in the target data takes several different occupations as values, the learning feature amount calculation unit 103 may calculate a vector as the occupation feature amount. Suppose the attribute "occupation" has four possible values: student, office worker, housewife, and civil servant. In this case, for target data whose "occupation" value is "office worker", the learning feature amount calculation unit 103 calculates the vector (0, 1, 0, 0) as the feature amount.

FIG. 5 is a diagram showing an example of feature amounts calculated by the learning feature amount calculation unit 103 according to the first embodiment. The learning feature amount calculation unit 103 converts the teacher data shown in the upper part of FIG. 5 into the feature amount data shown in the lower part. The feature amounts shown in FIG. 5 are described below.

The attribute "gender" is converted into a binary value ("0" for male, "1" for female). The attribute "age" is converted from character data into an integer. The attribute "occupation" is converted into a vector over (student, office worker, housewife, civil servant) in which the matching element is "1" and the other elements are "0". The attribute "prefecture" is converted into a number (integer) assigned to each prefecture. The attribute "purchase time" is converted into a number (rational number) with two decimal places. The attribute "store area" is converted into a number (integer) indicating the store area. The attribute "number (number purchased)" is already numeric (an integer) in the original data, so it is kept as an integer. The attribute "correct/incorrect" is converted into a binary value ("1" for correct, "0" for incorrect).
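A sketch of the FIG. 5 style conversion for one record follows. The prefecture and store-area numbering tables are hypothetical, and reading "two decimal places" as decimal hours (12:30 becomes 12.5) is our assumption; the figure may encode time differently. The one-hot helper reproduces the (0, 1, 0, 0) example for "office worker".

```python
OCCUPATIONS = ("student", "office worker", "housewife", "civil servant")
PREFECTURE_CODE = {"Chiba": 12, "Tokyo": 13}   # hypothetical prefecture numbering
STORE_AREA_CODE = {"Chiba": 1, "Ichihara": 2}  # hypothetical store-area numbering

def one_hot(value, categories):
    """1 for the matching category, 0 for every other category."""
    return [1 if value == c else 0 for c in categories]

def to_features(r):
    hh, mm = (int(p) for p in r["purchase_time"].split(":"))
    return (
        [0 if r["gender"] == "male" else 1,       # gender: male 0, female 1
         int(r["age"])]                            # age: text -> integer
        + one_hot(r["occupation"], OCCUPATIONS)    # occupation: one-hot vector
        + [PREFECTURE_CODE[r["prefecture"]],       # prefecture: assigned integer
           round(hh + mm / 60, 2),                 # purchase time: decimal hours (assumed)
           STORE_AREA_CODE[r["store_area"]],       # store area: assigned integer
           int(r["count"]),                        # number purchased: already an integer
           1 if r["correct"] == "correct" else 0]  # correct 1, incorrect 0
    )

row = {"gender": "male", "age": "15", "occupation": "student", "prefecture": "Chiba",
       "purchase_time": "12:30", "store_area": "Chiba", "count": 1, "correct": "correct"}
print(one_hot("office worker", OCCUPATIONS))  # [0, 1, 0, 0]
print(to_features(row))                       # [0, 15, 1, 0, 0, 0, 12, 12.5, 1, 1, 1]
```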

However, the feature amounts used by the information processing apparatus 10 are not limited to those in FIG. 5; the apparatus may use different feature amounts.

Returning to the description with reference to FIG. 1.

The learning unit 104 uses the feature amounts calculated by the learning feature amount calculation unit 103 to create and train a model that calculates the "confidence" for data. The machine learning method used by the learning unit 104 is not limited; for example, it may use a support vector machine, a neural network, or a Bayesian classifier. Because teacher data is used, the learning in the learning unit 104 is supervised learning.
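As one possible instance of the "Bayesian classifier" option, the sketch below trains a minimal categorical naive Bayes model (add-one smoothing) on labeled records and uses the posterior probability of label 1 ("correct") as the confidence. Using raw categorical values rather than the FIG. 5 numeric features, and the toy data, are assumptions made to keep the example short.

```python
from collections import Counter
from math import exp, log
from typing import Dict, Hashable, Sequence

class CategoricalNaiveBayes:
    """Minimal categorical naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, X: Sequence[Sequence[Hashable]], y: Sequence[int]) -> "CategoricalNaiveBayes":
        self.class_counts = Counter(y)
        self.n = len(y)
        n_features = len(X[0])
        # counts[c][j][v]: number of class-c rows whose feature j equals v
        self.counts = {c: [Counter() for _ in range(n_features)] for c in self.class_counts}
        self.seen = [set() for _ in range(n_features)]  # values seen per feature
        for xi, yi in zip(X, y):
            for j, v in enumerate(xi):
                self.counts[yi][j][v] += 1
                self.seen[j].add(v)
        return self

    def predict_proba(self, x: Sequence[Hashable]) -> Dict[int, float]:
        log_post = {}
        for c, nc in self.class_counts.items():
            s = log(nc / self.n)  # log prior
            for j, v in enumerate(x):
                s += log((self.counts[c][j][v] + 1) / (nc + len(self.seen[j])))
            log_post[c] = s
        top = max(log_post.values())  # subtract the max for numerical stability
        unnorm = {c: exp(s - top) for c, s in log_post.items()}
        total = sum(unnorm.values())
        return {c: u / total for c, u in unnorm.items()}

    def confidence(self, x: Sequence[Hashable]) -> float:
        """Probability that the judgment is 'correct' (label 1), used as the confidence."""
        return self.predict_proba(x).get(1, 0.0)

# Toy teacher data: (gender, occupation) -> correct (1) / incorrect (0)
X = [("M", "student"), ("M", "student"), ("F", "office worker"), ("F", "housewife")]
y = [1, 1, 0, 0]
model = CategoricalNaiveBayes().fit(X, y)
print(round(model.confidence(("M", "student")), 3))  # 0.9
```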

The learning unit 104 may use a model held in advance as the model to be trained. Alternatively, it may acquire the model from an external device (not shown) at training time.

The information processing apparatus 10 may also acquire feature amounts directly as teacher data. In this case, the learning unit 104 receives the teacher data and performs learning, so the information processing apparatus 10 need not include the learning feature amount calculation unit 103.

When training of the model is complete, the information processing apparatus 10 starts anonymizing the target data. If part of the target data was used as teacher data, the information processing apparatus 10 uses, in the following description, the target data excluding the data used as teacher data.

The target data is data on which the preprocessing of typical anonymization (for example, deletion of identifiers and unnecessary attributes) has already been completed.

The pre-anonymization confidence calculation unit 202 applies the target data to be anonymized (the pre-anonymization data) to the model to calculate a confidence (the pre-anonymization confidence). If necessary, the pre-anonymization confidence calculation unit 202 calculates the feature amounts of the target data in the same way as the learning feature amount calculation unit 103.

The anonymization unit 201 acquires an anonymization method and anonymizes the target data using it. The anonymization method used by the anonymization unit 201 is not limited. For example, the anonymization unit 201 may use k-anonymization, l-diversity, and/or t-closeness. l-diversity anonymizes the data so that it contains at least l distinct attribute values (in the usual formulation, at least l distinct values of a sensitive attribute within each group). t-closeness anonymizes the data so that the distribution of the data after k-anonymization stays close to the distribution before it (in the usual formulation, within a distance t).
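For comparison with the k-anonymity condition, the sketch below checks the common "distinct l-diversity" condition: every quasi-identifier group must contain at least l distinct values of a chosen sensitive attribute. Treating "item" as the sensitive attribute is a hypothetical choice for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence

def is_l_diverse(records: Iterable[Dict[str, object]], quasi_identifiers: Sequence[str],
                 sensitive: str, l: int) -> bool:
    """Distinct l-diversity: each quasi-identifier group has >= l distinct sensitive values."""
    groups = defaultdict(set)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)].add(r[sensitive])
    return all(len(values) >= l for values in groups.values())

rows = [
    {"age": "10-19", "gender": "M", "item": "A bread"},
    {"age": "10-19", "gender": "M", "item": "B juice"},
    {"age": "20-29", "gender": "F", "item": "A bread"},
    {"age": "20-29", "gender": "F", "item": "A bread"},
]
print(is_l_diverse(rows, ["age", "gender"], "item", 2))  # False: the second group has one item
```

The rows above are 2-anonymous on (age, gender) yet not 2-diverse, which is why the text allows these criteria to be combined.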

The source from which the anonymization unit 201 acquires anonymization methods is not limited. For example, the anonymization unit 201 may receive the anonymization method to be evaluated from a user's terminal device. Alternatively, it may select an anonymization method, according to a predetermined selection rule, from multiple anonymization methods stored in advance in a storage device (not shown).

The anonymization strength calculation unit 211 calculates the anonymization strength of the anonymization method used by the anonymization unit 201.

The anonymization strength in the first embodiment is not limited; it may be determined based on the anonymization method and the assumed use case. Equation 1 below is an example of the anonymization strength Z when k-anonymization is used.

[Equation 1]
$$Z = a \cdot K + b \cdot \frac{1}{L} + c \cdot T$$
In Equation 1, a, b, and c are predetermined weights. K is the k (index value) of k-anonymization; the larger K is (that is, the more records share the same attribute values), the higher the anonymization strength Z. L is the number of attributes; the smaller L is (that is, the fewer attributes can be combined to identify an individual), the higher Z. T is a value indicating the degree of generalization in the anonymization. For example, when time is anonymized, T corresponds to the width of the time range: T is "1" for a one-hour width and "2" for a two-hour width. The larger T is (that is, the wider the range used for anonymization), the higher Z. The anonymization strength Z is a positive value.
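Equation 1 translates directly into code. The default weights a = b = c = 1.0 below are placeholders, since the text only calls them "predetermined"; the two calls compare a one-hour and a two-hour time window with the same K and L.

```python
def anonymization_strength(K: int, L: int, T: float,
                           a: float = 1.0, b: float = 1.0, c: float = 1.0) -> float:
    """Equation 1: Z = a*K + b*(1/L) + c*T (weights a, b, c are placeholders)."""
    if K < 1 or L < 1:
        raise ValueError("K and L must be positive")
    return a * K + b * (1.0 / L) + c * T

print(anonymization_strength(K=5, L=4, T=1))  # 6.25 (1-hour time window)
print(anonymization_strength(K=5, L=4, T=2))  # 7.25 (2-hour time window)
```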

ここまでの説明では、ｋ－匿名化を用いて説明しているが、本実施形態は、ｋ-匿名化に限定されない。例えば、情報処理装置１０は、ｋ－匿名化に加え、ｌ－多様化及び／又はｔ－近接化を用いてもよい。この場合、情報処理装置１０は、匿名化強度にこれらの匿名化の指標を追加してもよい。 Although the description so far uses k-anonymization, this embodiment is not limited to k-anonymization. For example, the information processing device 10 may use l-diversity and/or t-closeness in addition to k-anonymization. In this case, the information processing apparatus 10 may add the indices of these anonymization criteria to the anonymization strength.
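To make the index values concrete, the sketch below computes K (the size of the smallest group of records that share the same quasi-identifier values) and, for the l-diversity extension mentioned above, the number of distinct sensitive values in the least diverse group. It assumes records are dictionaries and uses the simple "distinct l-diversity" variant; the field names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Record = Dict[str, Hashable]


def equivalence_classes(records: List[Record],
                        quasi_ids: Sequence[str]) -> Dict[tuple, List[Record]]:
    """Group records that share the same values for every quasi-identifier."""
    groups: Dict[tuple, List[Record]] = defaultdict(list)
    for record in records:
        groups[tuple(record[q] for q in quasi_ids)].append(record)
    return groups


def k_value(records: List[Record], quasi_ids: Sequence[str]) -> int:
    """k of k-anonymity: the size of the smallest equivalence class."""
    if not records:
        raise ValueError("no records")
    return min(len(g) for g in equivalence_classes(records, quasi_ids).values())


def l_value(records: List[Record], quasi_ids: Sequence[str], sensitive: str) -> int:
    """Distinct l-diversity: the fewest distinct sensitive values in any class."""
    if not records:
        raise ValueError("no records")
    return min(len({r[sensitive] for r in g})
               for g in equivalence_classes(records, quasi_ids).values())


if __name__ == "__main__":
    table = [
        {"age": "30-39", "zip": "123**", "disease": "flu"},
        {"age": "30-39", "zip": "123**", "disease": "cold"},
        {"age": "40-49", "zip": "456**", "disease": "flu"},
        {"age": "40-49", "zip": "456**", "disease": "flu"},
    ]
    print(k_value(table, ["age", "zip"]))             # 2
    print(l_value(table, ["age", "zip"], "disease"))  # 1
```

In the example the table is 2-anonymous, but the second group reveals its sensitive value (l = 1), which is the kind of weakness an additional l-diversity index in the strength would penalize.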

匿名化後確信度算出部２０３は、匿名化部２０１において匿名化された匿名化後データをモデルに適用して確信度（匿名化後確信度）を算出する。匿名化後確信度算出部２０３は、匿名化前確信度算出部２０２と同様に、必要に応じて、学習用特徴量算出部１０３と同様のやり方を用いて、匿名化後のデータの特徴量を算出する。 The post-anonymization confidence calculation unit 203 applies the post-anonymization data produced by the anonymization unit 201 to the model to calculate the confidence (post-anonymization confidence). As with the pre-anonymization confidence calculation unit 202, the post-anonymization confidence calculation unit 203 calculates, as necessary, the feature quantity of the anonymized data using the same method as the learning feature quantity calculation unit 103.
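As an illustration of the data flow through the anonymization unit 201 and the post-anonymization confidence calculation unit 203, the sketch below uses a hypothetical anonymization method that generalizes an hour-of-day field into bins T hours wide, then applies the same model to the data before and after anonymization. The `model` is any callable returning a confidence; the stub model in the example is invented for illustration and is not from the description.

```python
from typing import Callable, Dict, List

Record = Dict[str, int]


def generalize_hours(records: List[Record], field: str, width: int) -> List[Record]:
    """Hypothetical anonymization method: replace an hour with the start of its width-hour bin."""
    if width < 1:
        raise ValueError("width must be at least 1 hour")
    anonymized = []
    for record in records:
        copy = dict(record)  # leave the pre-anonymization data untouched
        copy[field] = (record[field] // width) * width
        anonymized.append(copy)
    return anonymized


def confidences(model: Callable[[Record], float], records: List[Record]) -> List[float]:
    """Apply the model to each record and collect its confidence."""
    return [model(record) for record in records]


if __name__ == "__main__":
    def stub_model(r: Record) -> float:  # invented stand-in for a trained model
        return 0.9 if r["hour"] >= 10 else 0.6

    data = [{"hour": 9}, {"hour": 10}, {"hour": 13}]
    after = generalize_hours(data, "hour", 4)  # T = 4: hours become 8, 8, 12
    print(confidences(stub_model, data))       # [0.6, 0.9, 0.9]
    print(confidences(stub_model, after))      # [0.6, 0.6, 0.9]
```

The second record changes from 0.9 to 0.6 because generalization moved it across the model's decision boundary; differences of this kind are what the evaluation value measures.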

評価値算出部２０４は、匿名化前確信度と、匿名化後確信度と、匿名化強度とを用いて、匿名化手法に対する評価値を算出する。 The evaluation value calculation unit 204 calculates an evaluation value for the anonymization method using the pre-anonymization confidence, the post-anonymization confidence, and the anonymization strength.

望ましい匿名化手法は、匿名化前確信度と匿名化後確信度との差が小さく（つまり、匿名化の前後で確信度の変化が少ない）、かつ、匿名化強度が大きい（つまり、個人が特定されにくい）匿名化手法である。そこで、評価値算出部２０４は、匿名化手法の評価値として、匿名化前確信度と匿名化後確信度との差が小さいと値が大きくなり、かつ、匿名化強度が高いと値が大きくなる評価値を算出する。 A desirable anonymization method is one for which the difference between the pre-anonymization confidence and the post-anonymization confidence is small (that is, the confidence changes little through anonymization) and the anonymization strength is high (that is, individuals are difficult to identify). Therefore, the evaluation value calculation unit 204 calculates, as the evaluation value of an anonymization method, a value that increases as the difference between the pre-anonymization and post-anonymization confidences decreases and increases as the anonymization strength increases.

評価値Ａの計算式の一例を、「数２」に示す。 An example of the formula for calculating the evaluation value A is shown in "Equation 2".

［数２］ [Equation 2]

$$A = -d \cdot \frac{1}{n} \sum_{i=1}^{n} \left| x_i - y_i \right| + e \cdot Z$$

数２において、右辺の第１項が、確信度の差の項である。第２項が、匿名化強度の項である。「ｄ」及び「ｅ」は、確信度の差と匿名化強度との比率を決める所定の重みである。「ｎ」は、対象データの数である。「ｘ_ｉ」は、匿名化前確信度（つまり、対象データの確信度）である。「ｙ_ｉ」は、匿名化後確信度（つまり、匿名化後データの確信度）である。右辺の第１項は、匿名化の前後における確信度の変化の絶対値の平均値である。第２項のＺは、上記で説明した匿名化強度である。 In Equation 2, the first term on the right-hand side is the confidence-difference term and the second term is the anonymization-strength term. "d" and "e" are predetermined weights that set the balance between the confidence difference and the anonymization strength. "n" is the number of target data items. "x_i" is the pre-anonymization confidence (that is, the confidence of the i-th target data item), and "y_i" is the post-anonymization confidence (that is, the confidence of the corresponding anonymized data item). The first term is the mean absolute change in confidence before and after anonymization, weighted by d and given a negative sign. Z in the second term is the anonymization strength described above.

第１項は、匿名化前後の確信度の差が小さいほど０に近い負の値となり、確信度の差が大きいほど絶対値が大きな負の値となる。第２項は、匿名化強度が大きいほど大きな正の値となり、匿名化強度が小さいほど小さな正の値となる。 The first term has a negative value close to 0 when the difference in confidence before and after anonymization is small, and a negative value with a large absolute value when the difference in confidence is large. The second term takes a larger positive value as the anonymization strength increases, and takes a smaller positive value as the anonymization strength decreases.
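The evaluation value can be computed directly from the quantities defined for Equation 2. The sketch below assumes the form A = -d * (1/n) * sum(|x_i - y_i|) + e * Z, inferred from the term-by-term description (a first term that is negative and approaches 0 as the confidence change shrinks, and a second term that grows with Z); the defaults d = e = 1 are illustrative, not values from the description.

```python
from typing import Sequence


def evaluation_value(pre: Sequence[float], post: Sequence[float], z: float,
                     d: float = 1.0, e: float = 1.0) -> float:
    """Evaluation value A = -d * mean(|x_i - y_i|) + e * Z (assumed form of Equation 2).

    pre:  pre-anonymization confidences x_1..x_n
    post: post-anonymization confidences y_1..y_n, in the same order as pre
    z:    anonymization strength Z
    d, e: weights; the defaults are illustrative assumptions
    """
    if not pre or len(pre) != len(post):
        raise ValueError("pre and post must be non-empty and of equal length")
    mean_abs_change = sum(abs(x - y) for x, y in zip(pre, post)) / len(pre)
    return -d * mean_abs_change + e * z


if __name__ == "__main__":
    # changes are 0 and 0.25, so the mean is 0.125; with Z = 2 the value is 1.875
    print(evaluation_value([0.5, 1.0], [0.5, 0.75], z=2.0))  # 1.875
```

Raising d favors methods that preserve the model's confidence; raising e favors methods with higher anonymization strength.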

評価値算出部２０４は、算出した評価値を所定の装置（例えば、匿名化手法の評価を依頼した装置）に出力する。評価値算出部２０４は、図示しない表示装置に評価値を表示してもよい。 The evaluation value calculation unit 204 outputs the calculated evaluation value to a predetermined device (for example, the device that requested the evaluation of the anonymization method). The evaluation value calculation unit 204 may display the evaluation value on a display device (not shown).

情報処理装置１０は、一つに限られず、複数の匿名化手法の評価値を算出してもよい。 The information processing apparatus 10 may calculate evaluation values for a plurality of anonymization methods, not limited to one.

匿名化手法選択部２０５は、複数の匿名化手法の中から、評価値が最大となる匿名化手法、評価値が所定の閾値より大きい匿名化手法、又は、評価値が大きい方から所定の数の匿名化手法を選択する。そして、匿名化手法選択部２０５は、選択した匿名化手法を所定の装置に出力する。 The anonymization method selection unit 205 selects, from among a plurality of anonymization methods, the anonymization method with the largest evaluation value, the anonymization methods whose evaluation values exceed a predetermined threshold, or a predetermined number of anonymization methods in descending order of evaluation value. The anonymization method selection unit 205 then outputs the selected anonymization method to a predetermined device.
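The three selection rules above can be sketched as follows, assuming the evaluation values have already been computed and are keyed by a method name; the method names in the example are hypothetical.

```python
from typing import Dict, List


def best_method(scores: Dict[str, float]) -> str:
    """The method with the largest evaluation value."""
    return max(scores, key=scores.__getitem__)


def methods_above(scores: Dict[str, float], threshold: float) -> List[str]:
    """All methods whose evaluation value exceeds the threshold, in input order."""
    return [name for name, value in scores.items() if value > threshold]


def top_methods(scores: Dict[str, float], count: int) -> List[str]:
    """The `count` methods with the largest evaluation values, best first."""
    return sorted(scores, key=scores.__getitem__, reverse=True)[:count]


if __name__ == "__main__":
    scores = {"k2": 1.5, "k5": 3.0, "hour_bins_2h": 2.2}  # hypothetical method names
    print(best_method(scores))         # k5
    print(methods_above(scores, 2.0))  # ['k5', 'hour_bins_2h']
    print(top_methods(scores, 2))      # ['k5', 'hour_bins_2h']
```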

あるいは、匿名化手法選択部２０５は、次のように動作してもよい。 Alternatively, the anonymization method selection unit 205 may operate as follows.

匿名化手法選択部２０５は、評価値が所定の条件を満たすか否かを判定する。例えば、匿名化手法選択部２０５は、評価値が所定の閾値を越えているか否かを判定する。 The anonymization method selection unit 205 determines whether the evaluation value satisfies a predetermined condition. For example, the anonymization method selection unit 205 determines whether the evaluation value exceeds a predetermined threshold.

評価値が所定の条件を満たさない場合、匿名化手法選択部２０５は、複数の匿名化手法の中から選択していない別の匿名化手法を選択する。そして、匿名化手法選択部２０５は、選択した匿名化手法を用いて、匿名化部２０１、匿名化後確信度算出部２０３、匿名化強度算出部２１１、及び評価値算出部２０４に同様動作を実行させ、選択した匿名化手法の評価値を取得する。匿名化手法選択部２０５は、評価値が所定の条件を満足するまで、匿名化手法を選択して各構成に同様の動作を繰り返させる。 If the evaluation value does not satisfy the predetermined condition, the anonymization method selection unit 205 selects another anonymization method that has not yet been selected from among the plurality of anonymization methods. Using the selected anonymization method, the anonymization method selection unit 205 then causes the anonymization unit 201, the post-anonymization confidence calculation unit 203, the anonymization strength calculation unit 211, and the evaluation value calculation unit 204 to perform the same operations, and obtains the evaluation value of the selected anonymization method. The anonymization method selection unit 205 keeps selecting anonymization methods and having each unit repeat the same operations until the evaluation value satisfies the predetermined condition.

評価値が所定の閾値を越えた場合、匿名化手法選択部２０５は、その匿名化手法を出力する。 When the evaluation value exceeds a predetermined threshold, the anonymization method selection unit 205 outputs the anonymization method.

このような動作を基に、情報処理装置１０は、評価値が所定の条件を満足する匿名化手法を選択できる。 Based on such operations, the information processing apparatus 10 can select an anonymization method whose evaluation value satisfies a predetermined condition.
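The repeat-until-satisfied behavior described above can be sketched as a simple loop. This is an illustrative sketch, not the patented implementation: `evaluate` is a hypothetical callable standing for one pass through the anonymization unit 201, the anonymization strength calculation unit 211, the post-anonymization confidence calculation unit 203, and the evaluation value calculation unit 204; candidates are tried in a fixed order; and returning None when every candidate fails is an assumption, since the description does not say what happens in that case.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

M = TypeVar("M")


def select_until_satisfied(candidates: Iterable[M],
                           evaluate: Callable[[M], float],
                           threshold: float) -> Optional[Tuple[M, float]]:
    """Try candidates in order until one's evaluation value exceeds the threshold.

    Returns (method, evaluation value), or None if no candidate qualifies.
    """
    for method in candidates:
        value = evaluate(method)  # one full anonymize-and-evaluate pass
        if value > threshold:
            return method, value
    return None


if __name__ == "__main__":
    table = {"k2": 1.5, "k5": 3.0, "k10": 4.0}  # hypothetical precomputed values
    print(select_until_satisfied(["k2", "k5", "k10"], table.__getitem__, 2.0))  # ('k5', 3.0)
```

Because the loop stops at the first qualifying method, "k10" is never evaluated in the example, which mirrors how the unit stops as soon as the condition is met.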

なお、情報処理装置１０は、他の装置で学習したモデルを取得してもよい。この場合、情報処理装置１０は、学習用特徴量算出部１０３と学習部１０４とを含まなくてもよい。 Note that the information processing apparatus 10 may acquire a model trained by another device. In this case, the information processing apparatus 10 need not include the learning feature quantity calculation unit 103 or the learning unit 104.

［動作の説明］ [Explanation of operation]
次に、図面を参照して、第１の実施形態に係る情報処理装置１０の動作について説明する。 Next, operation of the information processing apparatus 10 according to the first embodiment will be described with reference to the drawings.

まず、図面を参照して、教師データを用いてモデルを学習するまでの動作を説明する。 First, referring to the drawings, the operation up to learning a model using teacher data will be described.

図７は、第１の実施形態に係る情報処理装置１０におけるモデルを作成する動作の一例を示すフロー図である。 FIG. 7 is a flowchart showing an example of the operation of creating a model in the information processing apparatus 10 according to the first embodiment.

学習用特徴量算出部１０３は、所定の記憶装置又は処理装置から教師データを取得する（ステップＡ３０１）。 The learning feature quantity calculation unit 103 acquires teacher data from a predetermined storage device or processing device (step A301).

学習用特徴量算出部１０３は、教師データから、確信度を算出するためのモデルの学習に用いる特徴量（学習用特徴量）を算出する（ステップＡ３０２）。 From the teacher data, the learning feature quantity calculation unit 103 calculates a feature quantity (learning feature quantity) used to train the model that calculates the confidence (step A302).

学習部１０４は、学習用特徴量を用いて、モデルを学習する（ステップＡ３０３）。 The learning unit 104 trains the model using the learning feature quantity (step A303).

想定ユースケースが複数ある場合、学習用特徴量算出部１０３及び学習部１０４は、それぞれの想定ユースケースに対する教師データとモデルとを用いて、上記の動作を実行する。 When there are a plurality of assumed use cases, the learning feature quantity calculation unit 103 and the learning unit 104 execute the above operation using teacher data and models for each assumed use case.
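The description does not fix a model type. As one minimal, assumed example of steps A301 to A303, the sketch below "learns" a lookup-table classifier from teacher data and reports as its confidence the fraction of teacher examples with the same feature value that carry the predicted label. The feature function is hypothetical, and a real deployment would more likely use a trained machine-learning model.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Record = Dict[str, int]


def train_frequency_model(
    teacher: List[Tuple[Record, str]],
    feature_fn: Callable[[Record], Hashable],
) -> Callable[[Record], Tuple[Optional[str], float]]:
    """Learn a lookup-table model from (record, label) teacher data.

    The confidence of a prediction is the share of teacher examples with the
    same feature value that carry the predicted (majority) label.
    """
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for record, label in teacher:
        feature = feature_fn(record)  # A302: learning feature quantity
        counts[feature][label] += 1   # A303: update the model (label counts)

    def predict(record: Record) -> Tuple[Optional[str], float]:
        label_counts = counts.get(feature_fn(record))
        if not label_counts:          # feature value never seen in teacher data
            return None, 0.0
        label, hits = label_counts.most_common(1)[0]
        return label, hits / sum(label_counts.values())

    return predict


if __name__ == "__main__":
    def time_of_day(r: Record) -> str:  # hypothetical feature: morning or afternoon
        return "am" if r["hour"] < 12 else "pm"

    teacher = [({"hour": 8}, "commuter"), ({"hour": 9}, "commuter"),
               ({"hour": 10}, "shopper"), ({"hour": 18}, "shopper")]
    model = train_frequency_model(teacher, time_of_day)
    print(model({"hour": 7}))   # ('commuter', 0.6666666666666666)
    print(model({"hour": 20}))  # ('shopper', 1.0)
```

For multiple assumed use cases, the same function would simply be called once per use case with that use case's teacher data.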

次に、図面を参照して、匿名化手法の評価値を算出する動作を説明する。 Next, the operation of calculating the evaluation value of the anonymization method will be described with reference to the drawings.

図８は、第１の実施形態に係る情報処理装置１０における匿名化手法の評価値を算出する動作の一例を示すフロー図である。 FIG. 8 is a flowchart showing an example of the operation of calculating the evaluation value of the anonymization method in the information processing apparatus 10 according to the first embodiment.

匿名化前確信度算出部２０２は、対象データ（匿名化前データ）をモデルに適用して匿名化前確信度を算出する（ステップＢ３０１）。 The pre-anonymization confidence calculation unit 202 applies the target data (pre-anonymization data) to the model to calculate the pre-anonymization confidence (step B301).

匿名化部２０１は、匿名化手法を用いて、対象データを匿名化する（ステップＢ３０４）。 The anonymization unit 201 anonymizes the target data using an anonymization method (step B304).

匿名化強度算出部２１１は、匿名化手法の匿名化強度を算出する(ステップＢ３０５)。 The anonymization strength calculator 211 calculates the anonymization strength of the anonymization method (step B305).

匿名化後確信度算出部２０３は、匿名化された対象データ（匿名化後データ）をモデルに適用して、匿名化後データに対する確信度（匿名化後確信度）を算出する（ステップＢ３０６）。 The post-anonymization confidence calculation unit 203 applies the anonymized target data (post-anonymization data) to the model to calculate the confidence (post-anonymization confidence) of the anonymized data (step B306).

評価値算出部２０４は、匿名化強度と、匿名化前確信度と、匿名化後確信度とを用いて、匿名化手法の評価値を算出する（ステップＢ３０７）。 The evaluation value calculation unit 204 calculates the evaluation value of the anonymization method using the anonymization strength, the pre-anonymization confidence, and the post-anonymization confidence (step B307).

評価値算出部２０４は、算出した評価値を出力する（ステップＢ３０９）。 The evaluation value calculation unit 204 outputs the calculated evaluation value (step B309).

このような動作を基に、情報処理装置１０は、匿名化手法に対する評価値を出力する。 Based on such operations, the information processing apparatus 10 outputs an evaluation value for the anonymization method.
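The flow of FIG. 8 (steps B301 to B307) can be condensed into one function. In this sketch the model and the anonymization method are hypothetical callables, the anonymization strength Z (step B305) is supplied precomputed, and the evaluation value takes the assumed form of a negative weighted mean absolute confidence change plus a weighted strength, consistent with the description of the terms of Equation 2; the weights d = e = 1 are illustrative. Outputting the value (step B309) is left to the caller.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]


def evaluate_anonymization(model: Callable[[Record], float],
                           data: List[Record],
                           anonymize: Callable[[List[Record]], List[Record]],
                           strength: float,
                           d: float = 1.0, e: float = 1.0) -> float:
    """One pass of the FIG. 8 flow, returning the evaluation value.

    strength is the anonymization strength Z computed for the method (step B305).
    """
    pre = [model(r) for r in data]         # B301: pre-anonymization confidence
    anonymized = anonymize(data)           # B304: apply the anonymization method
    post = [model(r) for r in anonymized]  # B306: post-anonymization confidence
    if not pre or len(pre) != len(post):
        raise ValueError("anonymization must keep one record per input record")
    mean_change = sum(abs(x - y) for x, y in zip(pre, post)) / len(pre)
    return -d * mean_change + e * strength  # B307: weighted change vs. strength


if __name__ == "__main__":
    rows = [{"score": 0.5}, {"score": 1.0}]

    def coarsen(rs: List[Record]) -> List[Record]:  # hypothetical method: cap at 0.75
        return [{"score": min(r["score"], 0.75)} for r in rs]

    print(evaluate_anonymization(lambda r: r["score"], rows, coarsen, strength=2.0))  # 1.875
```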

なお、情報処理装置１０は、匿名化手法選択部２０５を用いて、複数の匿名化手法から、評価値が特定の条件（例えば、評価値が閾値より大きい）を満足する匿名化手法を選択してもよい。 Note that the information processing apparatus 10 may use the anonymization method selection unit 205 to select, from a plurality of anonymization methods, an anonymization method whose evaluation value satisfies a specific condition (for example, the evaluation value is greater than a threshold value).

図９は、第１の実施形態に係る匿名化手法選択部２０５の動作を含む情報処理装置１０の動作の一例を示すフロー図である。図９において、図８と同様の動作の説明については、適宜省略する。 FIG. 9 is a flowchart showing an example of operations of the information processing apparatus 10 including operations of the anonymization method selection unit 205 according to the first embodiment. In FIG. 9, description of the same operation as in FIG. 8 will be omitted as appropriate.

匿名化前確信度を算出後、匿名化手法選択部２０５は、匿名化手法を選択し、匿名化部２０１に渡す（ステップＢ３０３）。 After the pre-anonymization confidence is calculated, the anonymization method selection unit 205 selects an anonymization method and passes it to the anonymization unit 201 (step B303).

匿名化手法選択部２０５における上記の動作の一例を説明する。匿名化前確信度算出部２０２は、匿名化前確信度を算出すると、匿名化手法選択部２０５に算出の終了を通知する。通知を受けると、匿名化手法選択部２０５は、複数の匿名化手法の中から、所定の規則に沿って最初に使用する匿名化手法を選択する。なお、２回目以降においては、匿名化手法選択部２０５は、取得した評価値を用いて、匿名化手法を選択する。そして、匿名化手法選択部２０５は、選択した匿名化手法を匿名化部２０１に渡す。そして、匿名化手法選択部２０５は、匿名化部２０１、匿名化後確信度算出部２０３、匿名化強度算出部２１１、及び評価値算出部２０４に既に説明した動作と同様動作を実行させ、選択した匿名化手法の評価値を取得する。ただし、上記の説明は、匿名化手法選択部２０５の動作の一例である。匿名化手法選択部２０５は、上記と異なるように動作してもよい。 An example of the above operation of the anonymization method selection unit 205 will be described. After calculating the pre-anonymization confidence, the pre-anonymization confidence calculation unit 202 notifies the anonymization method selection unit 205 that the calculation has finished. Upon receiving the notification, the anonymization method selection unit 205 selects, according to a predetermined rule, the anonymization method to be used first from among the plurality of anonymization methods. From the second round onward, the anonymization method selection unit 205 selects an anonymization method using the evaluation values already obtained. The anonymization method selection unit 205 then passes the selected anonymization method to the anonymization unit 201, causes the anonymization unit 201, the post-anonymization confidence calculation unit 203, the anonymization strength calculation unit 211, and the evaluation value calculation unit 204 to perform the operations already described, and obtains the evaluation value of the selected anonymization method. The above description is only one example of the operation of the anonymization method selection unit 205, which may operate differently.

情報処理装置１０は、ステップＢ３０７までは、図８の同様に動作する。 The information processing apparatus 10 operates in the same manner as in FIG. 8 up to step B307.

匿名化手法選択部２０５は、評価値が所定の条件を満足するか否かを判定する(ステップＢ３０８)。 The anonymization method selection unit 205 determines whether the evaluation value satisfies a predetermined condition (step B308).

条件を満たさない場合（ステップＢ３０８でＮｏ）、情報処理装置１０は、ステップＢ３０３に戻る。そして、匿名化手法選択部２０５は、次の匿名化手法を選択する。以降、情報処理装置１０は、条件を満足するまで動作を繰り返す。 If the condition is not satisfied (No in step B308), the information processing apparatus 10 returns to step B303. Then, the anonymization method selection unit 205 selects the next anonymization method. Thereafter, the information processing apparatus 10 repeats operations until the conditions are satisfied.

条件を満たす場合（ステップＢ３０８でＹｅｓ）、情報処理装置１０は、匿名化手法を出力する（ステップＢ３１０）。なお、この場合、匿名化手法選択部２０５は、匿名化手法に合わせて評価値を出力してもよい。あるいは、匿名化手法選択部２０５は、評価値算出部２０４に評価値の出力を依頼してもよい。 If the condition is satisfied (Yes in step B308), the information processing apparatus 10 outputs the anonymization method (step B310). In this case, the anonymization method selection unit 205 may output the evaluation value together with the anonymization method. Alternatively, the anonymization method selection unit 205 may request the evaluation value calculation unit 204 to output the evaluation value.

［効果の説明］ [Explanation of effect]
次に、第１の実施形態に係る情報処理装置１０の効果を説明する。 Next, effects of the information processing apparatus 10 according to the first embodiment will be described.

このように、第１の実施形態に係る情報処理装置１０は、匿名化の比較に用いるための値を算出するとの効果を得ることができる。 As described above, the information processing apparatus 10 according to the first embodiment has the effect of calculating a value that can be used to compare anonymization methods.

その理由は、次のとおりである。 The reason is as follows.

情報処理装置１０は、匿名化前確信度算出部２０２と、匿名化部２０１と、匿名化後確信度算出部２０３と、匿名化強度算出部２１１と、評価値算出部２０４とを含む。匿名化前確信度算出部２０２は、確信度を算出するモデルを用いて匿名化前データにおける確信度である匿名化前確信度を算出する。匿名化部２０１は、匿名化前データに匿名化手法を適用して匿名化後データを作成する。匿名化強度算出部２１１は、匿名化手法の匿名化強度を算出する。匿名化後確信度算出部２０３は、モデルを用いて、匿名化後データにおける確信度である匿名化後確信度を算出する。評価値算出部２０４は、匿名化前確信度と匿名化後確信度との差と、匿名化強度とを基に匿名化手法の評価値を算出する。 The information processing apparatus 10 includes the pre-anonymization confidence calculation unit 202, the anonymization unit 201, the post-anonymization confidence calculation unit 203, the anonymization strength calculation unit 211, and the evaluation value calculation unit 204. The pre-anonymization confidence calculation unit 202 calculates the pre-anonymization confidence, which is the confidence in the pre-anonymization data, using a model that calculates confidence. The anonymization unit 201 applies an anonymization method to the pre-anonymization data to create post-anonymization data. The anonymization strength calculation unit 211 calculates the anonymization strength of the anonymization method. The post-anonymization confidence calculation unit 203 uses the model to calculate the post-anonymization confidence, which is the confidence in the post-anonymization data. The evaluation value calculation unit 204 calculates the evaluation value of the anonymization method based on the difference between the pre-anonymization confidence and the post-anonymization confidence and on the anonymization strength.

上記構成は、モデルと、匿名化手法と、対象データとが決まると、自動的に評価値を算出する。そのため、情報処理装置１０の利用者は、情報処理装置１０が算出した評価値を用いて、匿名化手法を判定することができる。例えば、利用者が複数の匿名化手法の中から適切な匿名化手法を選択する場合、利用者は、情報処理装置１０を用いて全ての匿名化手法の評価値を算出し、最も評価値が高い匿名化手法を選択すればよい。 With the above configuration, once the model, the anonymization method, and the target data are determined, the evaluation value is calculated automatically. The user of the information processing apparatus 10 can therefore judge an anonymization method using the evaluation value calculated by the information processing apparatus 10. For example, to choose an appropriate anonymization method from among several, the user can have the information processing apparatus 10 calculate the evaluation values of all the anonymization methods and then select the one with the highest evaluation value.

さらに、情報処理装置１０は、評価値が所定の条件を満たす匿名化手法を提供するとの効果を奏することができる。 Furthermore, the information processing apparatus 10 can provide an anonymization method whose evaluation value satisfies a predetermined condition.

その理由は、情報処理装置１０が、匿名化手法選択部２０５の動作を基に、評価値が所定の条件を満足するまで、匿名化手法の選択と、選択した匿名化手法の評価値の算出を繰り返し、所定の条件を満足する匿名化手法を出力するためである。 This is because, through the operation of the anonymization method selection unit 205, the information processing apparatus 10 repeats the selection of an anonymization method and the calculation of its evaluation value until the evaluation value satisfies the predetermined condition, and then outputs the anonymization method that satisfies the condition.

さらに、情報処理装置１０は、確信度の精度を向上するとの効果を奏することができる。 Furthermore, the information processing apparatus 10 can produce an effect of improving the accuracy of the confidence factor.

その理由は、情報処理装置１０が、教師データを基に確信度を算出するモデルを学習する学習部１０４を含むためである。 The reason for this is that the information processing apparatus 10 includes the learning unit 104 that learns a model for calculating certainty based on teacher data.

［実施形態の概要］ [Overview of embodiment]
次に、図面を参照して、第１の実施形態に係る情報処理装置１０の概要を説明する。 Next, an overview of the information processing apparatus 10 according to the first embodiment will be described with reference to the drawings.

図１０は、第１の実施形態に係る情報処理装置１０の概要である情報処理装置１５の構成の一例を示すブロック図である。 FIG. 10 is a block diagram showing an example of a configuration of an information processing device 15, which is an overview of the information processing device 10 according to the first embodiment.

情報処理装置１５は、匿名化前確信度算出部２０２と、匿名化部２０１と、匿名化後確信度算出部２０３と、匿名化強度算出部２１１と、評価値算出部２０４とを含む。匿名化前確信度算出部２０２は、確信度を算出するモデルを用いて匿名化前データにおける確信度である匿名化前確信度を算出する。匿名化部２０１は、匿名化前データに匿名化手法を適用して匿名化後データを作成する。匿名化強度算出部２１１は、匿名化手法の匿名化強度を算出する。匿名化後確信度算出部２０３は、モデルを用いて、匿名化後データにおける確信度である匿名化後確信度を算出する。評価値算出部２０４は、匿名化前確信度と匿名化後確信度との差と、匿名化強度とを基に匿名化手法の評価値を算出する。 The information processing apparatus 15 includes the pre-anonymization confidence calculation unit 202, the anonymization unit 201, the post-anonymization confidence calculation unit 203, the anonymization strength calculation unit 211, and the evaluation value calculation unit 204. The pre-anonymization confidence calculation unit 202 calculates the pre-anonymization confidence, which is the confidence in the pre-anonymization data, using a model that calculates confidence. The anonymization unit 201 applies an anonymization method to the pre-anonymization data to create post-anonymization data. The anonymization strength calculation unit 211 calculates the anonymization strength of the anonymization method. The post-anonymization confidence calculation unit 203 uses the model to calculate the post-anonymization confidence, which is the confidence in the post-anonymization data. The evaluation value calculation unit 204 calculates the evaluation value of the anonymization method based on the difference between the pre-anonymization confidence and the post-anonymization confidence and on the anonymization strength.

上記構成は、モデルと、匿名化手法と、対象データとが決まると、自動的に評価値を算出する。そのため、情報処理装置１５の利用者は、評価値を用いて、匿名化手法を比較することができる。 The above configuration automatically calculates the evaluation value when the model, the anonymization method, and the target data are determined. Therefore, the user of the information processing device 15 can compare the anonymization methods using the evaluation values.

情報処理装置１５は、情報処理装置１０と同様の効果を奏することができる。 The information processing device 15 can achieve the same effects as the information processing device 10.

その理由は、上記の情報処理装置１５の構成が、対応する情報処理装置１０の構成と同様に動作するためである。 The reason is that the components of the information processing device 15 described above operate in the same manner as the corresponding components of the information processing device 10.

なお、情報処理装置１５は、第１の実施形態の最小構成である。 The information processing device 15 is the minimum configuration of the first embodiment.

［ハードウェア構成］ [Hardware configuration]
図面を参照して、第１の実施形態に係る情報処理装置１０のハードウェアを説明する。 Hardware of the information processing apparatus 10 according to the first embodiment will be described with reference to the drawings.

例えば、情報処理装置１０の各構成部は、ハードウェア回路で構成されてもよい。 For example, each component of the information processing device 10 may be configured by a hardware circuit.

あるいは、情報処理装置１０において、各構成部は、ネットワークを介して接続した複数の装置を用いて、構成されてもよい。 Alternatively, in the information processing apparatus 10, each component may be configured using a plurality of devices connected via a network.

あるいは、情報処理装置１０において、複数の構成部は、１つのハードウェアで構成されてもよい。 Alternatively, in the information processing device 10, the plurality of components may be configured by one piece of hardware.

あるいは、情報処理装置１０は、ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）と、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）と、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）とを含むコンピュータ装置として実現されてもよい。情報処理装置１０は、上記構成に加え、さらに、入出力接続回路（ＩＯＣ：ＩｎｐｕｔａｎｄＯｕｔｐｕｔＣｉｒｃｕｉｔ）を含むコンピュータ装置として実現されてもよい。情報処理装置１０は、上記構成に加え、さらに、ネットワークインターフェース回路（ＮＩＣ：ＮｅｔｗｏｒｋＩｎｔｅｒｆａｃｅＣｉｒｃｕｉｔ）を含むコンピュータ装置として実現されてもよい。 Alternatively, the information processing apparatus 10 may be realized as a computer device including a CPU (Central Processing Unit), a ROM (Read Only Memory), and a RAM (Random Access Memory). In addition to the above components, the information processing apparatus 10 may be realized as a computer device that further includes an input/output connection circuit (IOC: Input and Output Circuit). The information processing apparatus 10 may also be realized as a computer device that further includes a network interface circuit (NIC: Network Interface Circuit).

図１１は、第１の実施形態に係る情報処理装置１０のハードウェア構成の一例である情報処理装置６００の構成を示すブロック図である。 FIG. 11 is a block diagram showing the configuration of an information processing device 600, which is an example of the hardware configuration of the information processing apparatus 10 according to the first embodiment.

情報処理装置６００は、ＣＰＵ６１０と、ＲＯＭ６２０と、ＲＡＭ６３０と、内部記憶装置６４０と、ＩＯＣ６５０と、ＮＩＣ６８０とを含み、コンピュータ装置を構成している。 The information processing device 600 includes a CPU 610, a ROM 620, a RAM 630, an internal storage device 640, an IOC 650, and a NIC 680, and constitutes a computer device.

ＣＰＵ６１０は、ＲＯＭ６２０からプログラムを読み込む。そして、ＣＰＵ６１０は、読み込んだプログラムに基づいて、ＲＡＭ６３０と、内部記憶装置６４０と、ＩＯＣ６５０と、ＮＩＣ６８０とを制御する。そして、ＣＰＵ６１０を含むコンピュータは、これらの構成を制御し、図１に示されている各構成の機能を実現する。各構成とは、学習用特徴量算出部１０３、学習部１０４、匿名化前確信度算出部２０２と、匿名化部２０１と、匿名化後確信度算出部２０３と、評価値算出部２０４と、匿名化手法選択部２０５とである。 The CPU 610 reads a program from the ROM 620 and, based on the read program, controls the RAM 630, the internal storage device 640, the IOC 650, and the NIC 680. By controlling these components, the computer including the CPU 610 implements the functions of the components shown in FIG. 1, namely the learning feature quantity calculation unit 103, the learning unit 104, the pre-anonymization confidence calculation unit 202, the anonymization unit 201, the post-anonymization confidence calculation unit 203, the evaluation value calculation unit 204, and the anonymization method selection unit 205.

ＣＰＵ６１０は、各機能を実現する際に、ＲＡＭ６３０又は内部記憶装置６４０を、プログラムの一時記憶媒体として使用してもよい。 When implementing each function, the CPU 610 may use the RAM 630 or the internal storage device 640 as a temporary storage medium for the program.

また、ＣＰＵ６１０は、コンピュータで読み取り可能にプログラムを記憶した記憶媒体７００が含むプログラムを、図示しない記憶媒体読み取り装置を用いて読み込んでもよい。あるいは、ＣＰＵ６１０は、ＮＩＣ６８０を介して、図示しない外部の装置からプログラムを受け取り、ＲＡＭ６３０又は内部記憶装置６４０に保存して、保存したプログラムを基に動作してもよい。 Further, the CPU 610 may read a program included in the storage medium 700 storing the computer-readable program using a storage medium reading device (not shown). Alternatively, CPU 610 may receive a program from an external device (not shown) via NIC 680, store the program in RAM 630 or internal storage device 640, and operate based on the stored program.

ＲＯＭ６２０は、ＣＰＵ６１０が実行するプログラム及び固定的なデータを記憶する。ＲＯＭ６２０は、例えば、Ｐ－ＲＯＭ（Ｐｒｏｇｒａｍｍａｂｌｅ－ＲＯＭ）又はフラッシュＲＯＭである。 The ROM 620 stores programs executed by the CPU 610 and fixed data. The ROM 620 is, for example, a P-ROM (Programmable-ROM) or a flash ROM.

ＲＡＭ６３０は、ＣＰＵ６１０が実行するプログラム及びデータを一時的に記憶する。ＲＡＭ６３０は、例えば、Ｄ－ＲＡＭ（Ｄｙｎａｍｉｃ－ＲＡＭ）である。 The RAM 630 temporarily stores programs executed by the CPU 610 and data. The RAM 630 is, for example, a D-RAM (Dynamic-RAM).

内部記憶装置６４０は、情報処理装置６００が長期的に保存するデータ及びプログラムを記憶する。また、内部記憶装置６４０は、ＣＰＵ６１０の一時記憶装置として動作してもよい。内部記憶装置６４０は、例えば、ハードディスク装置、光磁気ディスク装置、ＳＳＤ（ＳｏｌｉｄＳｔａｔｅＤｒｉｖｅ）又はディスクアレイ装置である。 The internal storage device 640 stores data and programs that the information processing device 600 saves for a long time. Moreover, the internal storage device 640 may operate as a temporary storage device for the CPU 610 . The internal storage device 640 is, for example, a hard disk device, a magneto-optical disk device, an SSD (Solid State Drive), or a disk array device.

ここで、ＲＯＭ６２０と内部記憶装置６４０は、不揮発性（ｎｏｎ－ｔｒａｎｓｉｔｏｒｙ）の記憶媒体である。一方、ＲＡＭ６３０は、揮発性（ｔｒａｎｓｉｔｏｒｙ）の記憶媒体である。そして、ＣＰＵ６１０は、ＲＯＭ６２０、内部記憶装置６４０、又は、ＲＡＭ６３０に記憶されているプログラムを基に動作可能である。つまり、ＣＰＵ６１０は、不揮発性記憶媒体又は揮発性記憶媒体を用いて動作可能である。 Here, the ROM 620 and the internal storage device 640 are non-transitory storage media, whereas the RAM 630 is a transitory (volatile) storage medium. The CPU 610 can operate based on a program stored in the ROM 620, the internal storage device 640, or the RAM 630. That is, the CPU 610 can operate using either a non-volatile or a volatile storage medium.

ＩＯＣ６５０は、ＣＰＵ６１０と、入力機器６６０及び表示機器６７０とのデータを仲介する。ＩＯＣ６５０は、例えば、ＩＯインターフェースカード又はＵＳＢ（ＵｎｉｖｅｒｓａｌＳｅｒｉａｌＢｕｓ）カードである。さらに、ＩＯＣ６５０は、ＵＳＢのような有線に限らず、無線を用いてもよい。 The IOC 650 mediates data between the CPU 610 and the input device 660 and display device 670. The IOC 650 is, for example, an IO interface card or a USB (Universal Serial Bus) card. Furthermore, the IOC 650 is not limited to a wired connection such as USB and may use a wireless connection.

入力機器６６０は、情報処理装置６００の操作者からの入力指示を受け取る機器である。入力機器６６０は、例えば、キーボード、マウス又はタッチパネルである。 The input device 660 is a device that receives input instructions from the operator of the information processing device 600. The input device 660 is, for example, a keyboard, a mouse, or a touch panel.

表示機器６７０は、情報処理装置６００の操作者に情報を表示する機器である。表示機器６７０は、例えば、液晶ディスプレイである。 The display device 670 is a device that displays information to the operator of the information processing device 600 . The display device 670 is, for example, a liquid crystal display.

ＮＩＣ６８０は、図示しない外部の装置とのネットワークを介したデータのやり取りを中継する。ＮＩＣ６８０は、例えば、ＬＡＮ（ＬｏｃａｌＡｒｅａＮｅｔｗｏｒｋ）カードである。さらに、ＮＩＣ６８０は、有線に限らず、無線を用いてもよい。ＮＩＣ６８０は、モデル、対象データ、及び匿名化手法を受信する。さらに、ＮＩＣ６８０は、教師データを受信する。 The NIC 680 relays data exchanged with an external device (not shown) via a network. The NIC 680 is, for example, a LAN (Local Area Network) card. The NIC 680 is not limited to a wired connection and may use a wireless connection. The NIC 680 receives the model, the target data, and the anonymization method, as well as the teacher data.

このように構成された情報処理装置６００は、情報処理装置１０と同様の効果を得ることができる。 The information processing device 600 configured in this way can obtain the same effects as the information processing device 10.

その理由は、情報処理装置６００のＣＰＵ６１０が、プログラムに基づいて情報処理装置１０と同様の機能を実現できるためである。 The reason is that the CPU 610 of the information processing device 600 can implement the same functions as the information processing device 10 based on the program.

［情報処理システム］ [Information processing system]
図面を参照して、情報処理装置１０を含む情報処理システム５０の一例を説明する。 An example of an information processing system 50 including the information processing device 10 will be described with reference to the drawings.

図１２は、第１の実施形態に係る情報処理装置１０を含む情報処理システム５０の構成の一例を示すブロック図である。情報処理システム５０は、情報処理装置１０と、データ格納装置３０と、表示装置４０とを含む。 FIG. 12 is a block diagram showing an example configuration of an information processing system 50 including the information processing apparatus 10 according to the first embodiment. The information processing system 50 includes the information processing device 10, a data storage device 30, and a display device 40.

各装置は、所定の通信網を用いて接続されている。 Each device is connected using a predetermined communication network.

データ格納装置３０は、匿名化の対象となるデータ（対象データ）を格納する。さらに、データ格納装置３０は、教師データを格納する。なお、情報処理装置１０が、教師データとして、匿名化の対象となるデータの一部を用いる場合、データ格納装置３０は、匿名化の対象となるデータとは別に教師データを保持せず、教師データとして用いるデータの範囲を保持してもよい。さらに、データ格納装置３０は、匿名化手法を保持してもよい。 The data storage device 30 stores data to be anonymized (target data). Further, the data storage device 30 stores teacher data. Note that when the information processing device 10 uses a part of the data to be anonymized as the teacher data, the data storage device 30 does not hold the teacher data separately from the data to be anonymized. A range of data used as data may be held. Furthermore, the data storage device 30 may hold an anonymization method.

データ格納装置３０は、さらにモデル（学習前のモデル）を保持してもよい。この場合、情報処理装置１０は、データ格納装置３０からモデルを取得する。 The data storage device 30 may further hold a model (an untrained model). In this case, the information processing device 10 acquires the model from the data storage device 30.

情報処理装置１０は、データ格納装置３０から、教師データと、匿名化の対象となるデータと、匿名化手法とを取得する。 The information processing device 10 acquires teacher data, data to be anonymized, and an anonymization method from the data storage device 30 .

そして、情報処理装置１０は、上記の動作を基に匿名化手法の評価値を算出し、算出した評価値を表示装置４０に送信する。情報処理装置１０は、評価値に合わせて、匿名化手法及び／又は匿名化後データを送信してもよい。 Then, the information processing device 10 calculates the evaluation value of the anonymization method based on the above operation, and transmits the calculated evaluation value to the display device 40 . The information processing device 10 may transmit the anonymization method and/or the post-anonymization data in accordance with the evaluation value.

表示装置４０は、受信した評価値を表示する。表示装置４０は、他の情報（匿名化手法など）を表示してもよい。 The display device 40 displays the received evaluation value. The display device 40 may display other information (anonymization method, etc.).

情報処理システム５０の利用者は、表示された評価値などを基に、匿名化手法を判定すればよい。 The user of the information processing system 50 can determine the anonymization method based on the displayed evaluation value.

以上、実施形態を参照して本願発明を説明したが、本願発明は上記実施形態に限定されるものではない。本願発明の構成及び詳細には、本願発明のスコープ内で当業者が理解し得る様々な変更をすることができる。 Although the present invention has been described with reference to the embodiments, the present invention is not limited to the above embodiments. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.

この出願は、２０１８年３月２日に出願された日本出願特願２０１８－０３７１１７を基礎とする優先権を主張し、その開示の全てをここに取り込む。 This application claims priority based on Japanese Patent Application No. 2018-037117 filed on March 2, 2018, and the entire disclosure thereof is incorporated herein.

本発明は、所定の解析等に利用するために、データ内に含まれる個人情報を匿名化する際に利用可能である。 INDUSTRIAL APPLICABILITY The present invention can be used when anonymizing personal information contained in data for use in predetermined analysis or the like.

10 information processing device
15 information processing device
30 data storage device
40 display device
50 information processing system
103 learning feature quantity calculation unit
104 learning unit
201 anonymization unit
202 pre-anonymization confidence calculation unit
203 post-anonymization confidence calculation unit
204 evaluation value calculation unit
205 anonymization method selection unit
211 anonymization strength calculation unit
600 information processing device
610 CPU
620 ROM
630 RAM
640 internal storage device
650 IOC
660 input device
670 display device
680 NIC
700 storage medium

Claims

1. An information processing apparatus comprising:
pre-anonymization confidence calculation means for calculating a pre-anonymization confidence, which is the confidence in pre-anonymization data, using a model that calculates a confidence, which is the degree of correctness of a judgment result obtained using data;
anonymization means for applying an anonymization method to the pre-anonymization data to create post-anonymization data;
anonymization strength calculation means for calculating an anonymization strength of the anonymization method;
post-anonymization confidence calculation means for calculating, using the model, a post-anonymization confidence, which is the confidence in the post-anonymization data;
evaluation value calculation means for calculating an evaluation value of the anonymization method based on the difference between the pre-anonymization confidence and the post-anonymization confidence and on the anonymization strength; and
anonymization method selection means for determining whether the evaluation value satisfies a predetermined condition and, if the evaluation value does not satisfy the condition, selecting another anonymization method and causing the anonymization means, the anonymization strength calculation means, the post-anonymization confidence calculation means, and the evaluation value calculation means to repeat the same operations until the evaluation value satisfies the condition,
wherein the anonymization strength calculation means calculates the anonymization strength using an index value of the anonymization method, the number of attributes, and a time span, and
the anonymization method selection means selects a predetermined number of the anonymization methods in descending order of the evaluation values and outputs the evaluation values together with the anonymization methods.
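The selection loop of claim 1 can be illustrated with a short Python sketch. It is not the claimed implementation: the `evaluate` callable is a hypothetical stand-in for the anonymization, strength, post-anonymization confidence and evaluation steps, and treating the "predetermined condition" as `value >= threshold` is an assumption.

```python
from collections.abc import Callable, Iterable


def select_methods(candidates: Iterable[str],
                   evaluate: Callable[[str], float],
                   threshold: float,
                   top_n: int) -> list[tuple[str, float]]:
    """Evaluate candidates in turn until one satisfies the condition, then return
    the top_n evaluated methods with their evaluation values, best first."""
    evaluated: list[tuple[str, float]] = []
    for method in candidates:
        # evaluate() stands in for: anonymize, compute strength,
        # compute post-anonymization confidence, compute evaluation value.
        value = evaluate(method)
        evaluated.append((method, value))
        if value >= threshold:  # assumed form of the "predetermined condition"
            break
    # Descending order of evaluation value; methods are output with their values.
    evaluated.sort(key=lambda pair: pair[1], reverse=True)
    return evaluated[:top_n]
```

If no candidate satisfies the condition, this sketch simply returns the best `top_n` of all candidates it evaluated; the claim itself does not say how that case is handled.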

2. The information processing apparatus according to claim 1, further comprising learning means for learning the model using teacher data.

3. The information processing apparatus according to claim 2, further comprising learning feature quantity calculation means for acquiring the teacher data and calculating, based on the teacher data, the feature quantity used by the model.
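Claims 2 and 3 do not name a model type or a feature definition. As one possible instance, the Python sketch below assumes logistic regression trained by stochastic gradient ascent on the log-likelihood, with the confidence taken to be the probability the model assigns to its own judgment. The `feature_quantities` function and the `visits_per_week` attribute are hypothetical.

```python
import math


def feature_quantities(record: dict) -> list[float]:
    # Hypothetical feature quantities: a bias term plus one numeric attribute.
    return [1.0, float(record["visits_per_week"])]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def learn_model(teacher_data: list[tuple[dict, int]],
                lr: float = 0.1, epochs: int = 500) -> list[float]:
    # Logistic regression fitted by stochastic gradient ascent on the log-likelihood.
    w = [0.0] * len(feature_quantities(teacher_data[0][0]))
    for _ in range(epochs):
        for record, label in teacher_data:
            x = feature_quantities(record)
            p = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi + lr * (label - p) * xi for wi, xi in zip(w, x)]
    return w


def judge_with_confidence(w: list[float], record: dict) -> tuple[int, float]:
    # Judgment result plus confidence = probability assigned to that result.
    p = _sigmoid(sum(wi * xi for wi, xi in zip(w, feature_quantities(record))))
    label = 1 if p >= 0.5 else 0
    return label, max(p, 1.0 - p)
```

Applying `judge_with_confidence` to a record before and after anonymization would yield the pre- and post-anonymization confidences that the evaluation value compares.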

4. An information processing method comprising, by a computer:
calculating a pre-anonymization confidence, which is the confidence in pre-anonymization data, using a model that calculates a confidence, which is the degree of correctness of a judgment result obtained using data;
creating post-anonymization data by applying an anonymization method to the pre-anonymization data;
calculating an anonymization strength of the anonymization method;
calculating, using the model, a post-anonymization confidence, which is the confidence in the post-anonymization data;
calculating an evaluation value of the anonymization method based on the difference between the pre-anonymization confidence and the post-anonymization confidence and on the anonymization strength;
determining whether the evaluation value satisfies a predetermined condition;
if the evaluation value does not satisfy the condition, selecting another anonymization method and repeating the creation of the post-anonymization data, the calculation of the anonymization strength, the calculation of the post-anonymization confidence, and the calculation of the evaluation value until the evaluation value satisfies the condition; and
selecting a predetermined number of the anonymization methods in descending order of the evaluation values and outputting the evaluation values together with the anonymization methods,
wherein the anonymization strength is calculated using an index value of the anonymization method, the number of attributes, and a time span.

5. A program causing a computer to execute:
a process of calculating a pre-anonymization confidence, which is the confidence in pre-anonymization data, using a model that calculates a confidence, which is the degree of correctness of a judgment result obtained using data;
a process of applying an anonymization method to the pre-anonymization data to create post-anonymization data;
a process of calculating an anonymization strength of the anonymization method;
a process of calculating, using the model, a post-anonymization confidence, which is the confidence in the post-anonymization data;
a process of calculating an evaluation value of the anonymization method based on the difference between the pre-anonymization confidence and the post-anonymization confidence and on the anonymization strength;
a process of determining whether the evaluation value satisfies a predetermined condition;
a process of, if the evaluation value does not satisfy the condition, selecting another anonymization method and repeating the process of creating the post-anonymization data, the process of calculating the anonymization strength, the process of calculating the post-anonymization confidence, and the process of calculating the evaluation value until the evaluation value satisfies the condition; and
a process of selecting a predetermined number of the anonymization methods in descending order of the evaluation values and outputting the evaluation values together with the anonymization methods,
wherein the process of calculating the anonymization strength calculates the anonymization strength using an index value of the anonymization method, the number of attributes, and a time span.