JP2022147342A

JP2022147342A - Secret information management program, secret information management method, data registration device and secret information management system

Info

Publication number: JP2022147342A
Application number: JP2021048542A
Authority: JP
Inventors: 利昭舟久保; Toshiaki Funakubo; 一穂前田; Kazuho Maeda
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2021-03-23
Filing date: 2021-03-23
Publication date: 2022-10-06

Abstract

PROBLEM TO BE SOLVED: To improve security against frequency analysis attacks on encrypted data. SOLUTION: A data registration device 1 associates n (n is a natural number) second item values with each of a plurality of first item values that can be set in one item of a confidential record in a confidential record group 4, which includes one or more confidential records to be kept confidential. Next, the data registration device 1 classifies each second item value into one of a plurality of item value sets 5a to 5c such that each of the item value sets 5a to 5c contains between 1 and n-1 of the two or more second item values associated with the same first item value. The data registration device 1 then probabilistically converts the first item value set in each confidential record into one of its associated second item values, and generates a disturbance record group 6 in which the numbers of search hits for second item values belonging to the same item value set are equalized. Finally, the data registration device 1 encrypts the disturbance record group 6. SELECTED DRAWING: Figure 1

Description

The present invention relates to a confidential information management program, a confidential information management method, a data registration device, and a confidential information management system.

Computer systems can handle large amounts of data, known as big data. Analyzing big data with a computer, for example, can yield a variety of findings. The more big data is used for analysis, the more diverse the knowledge that can be obtained from it, and the more reliable that knowledge becomes. Therefore, instead of each organization, such as a company, building its own database (DB) for big data, it is conceivable for a plurality of organizations to use a DB that integrates their data. Such an integrated DB service can be implemented using, for example, a cloud computing system (hereinafter referred to as the "cloud").

When a DB that integrates data from a plurality of organizations is managed in the cloud, the organization providing the data may want to limit use of that data to other organizations it has authorized. In addition, the organization providing the data and the organizations using it may not want even the cloud administrator to know the contents of the provided data or the contents of searches against the data in the DB. In these cases, the cloud that manages the big data receives encrypted data from each organization, for example, and stores the encrypted data in the DB. The cloud then searches the data in the DB in response to a search request encrypted with a key handed over by the data provider, using a collation technique that can determine whether data items are identical while they remain encrypted. As a result, the data provider can permit use of the provided data only by the organizations to which it has given the key. Moreover, because the provided data and the search requests remain encrypted in the cloud, the cloud administrator can be prevented from learning the contents of the data.

As a technique related to concealing data in a DB, a search system that realizes concealed search with resistance to frequency analysis has been proposed, for example. A data aggregation and analysis system has also been proposed that can improve the processing efficiency of analysis while protecting the privacy of information providers through encryption.

International Publication No. 2012/115031
International Publication No. 2016/120975

Even with a concealed search technique that allows data to be searched while still encrypted, the encrypted data cannot be said to be sufficiently secure. That is, a frequency analysis attack is possible against a DB storing encrypted data, and the contents of a search query or of the search results may be inferred.

In one aspect, an object of the present application is to improve security against frequency analysis attacks on encrypted data.

In one proposal, a confidential information management program is provided that causes a computer to execute the following processing.
The computer associates n (n is a natural number) second item values with each of a plurality of first item values that can be set in one item of a confidential record in a confidential record group, which includes one or more confidential records to be kept confidential. Next, the computer classifies each second item value into one of a plurality of item value sets such that each item value set contains between 1 and n-1 of the two or more second item values associated with the same first item value. Next, the computer probabilistically converts the first item value set in the one item of a confidential record into one of its associated second item values. Next, based on the confidential record group, the computer generates a disturbance record group in which, through the addition of dummy second item values, the numbers of search hits for second item values belonging to the same item value set are equalized. Next, the computer attaches to each record in the disturbance record group a flag indicating whether the second item value contained in that record is true or false. The computer then encrypts the disturbance record group.

According to one aspect, security against frequency analysis attacks on encrypted data can be improved.

A diagram showing an example of a confidential information management system according to the first embodiment.
A diagram showing an example of a confidential information management system.
A diagram showing an example hardware configuration of a data management server.
A diagram showing an example of a frequency analysis attack.
A diagram showing an example of generating a conversion set using dummy elements.
A diagram showing an example of converting item values when dummy elements are used.
A diagram showing an example of generating dummy values when dummy elements are used.
A diagram showing an example of registration data when dummy elements are used.
A diagram explaining the equalization of ciphertexts.
A diagram explaining an appropriate number of groups G.
A diagram showing an example of a frequency distribution after frequency disturbance.
A diagram showing an example of combination frequencies when dummy elements are used.
A diagram showing an example of narrowing down search targets using structural zeros.
A diagram showing an example of probabilistic conversion into divided keywords.
A diagram showing an example of converting divided keywords into dummy values using a conversion set.
A first diagram showing another example of generating a conversion set.
A second diagram showing another example of generating a conversion set.
A block diagram showing the functions of the confidential information management system.
A diagram showing an example of appearance frequency disturbance processing using dummy data.
A diagram showing an example of a DB of plaintext patient data.
A diagram showing an example of information stored in the conversion information storage unit of the data registration server.
A diagram showing an example of a keyword list.
A diagram showing a first example of generating a conversion set.
A diagram showing a second example of generating a conversion set.
A diagram showing an example of registration data generated using divided keywords.
A diagram showing an example of an anonymization DB.
A flowchart showing an example of the procedure of data registration processing.
A flowchart showing an example of the procedure of conversion set generation processing.
A flowchart showing an example of the procedure of dummy data generation processing.
A diagram showing an example of a search condition input screen.
A diagram showing an example of information stored in the conversion information storage unit of the terminal device.
A diagram showing an example of conversion into divided keywords.
A diagram showing an example of generating a disturbance query.
A diagram showing an example of search processing.
A flowchart showing an example of the procedure of search processing.
A diagram showing an example of inverse conversion of dummy values.
A diagram showing an example of a search result display screen.
A diagram showing an example of frequency disturbance using divided keywords.
A diagram showing an example of a frequency analysis attack.
A diagram showing an example of conversion into common keywords.
A first diagram showing another example of generating shared sets.
A diagram showing an example of information stored in the conversion information storage unit of the data registration server in the third embodiment.
A diagram showing an example of generating shared sets.
A diagram showing an example of an anonymization DB in which shared keywords are stored.
A flowchart showing an example of the procedure of data registration processing.
A flowchart showing details of the procedure for conversion into shared keywords.
A diagram showing an example of conversion from divided keywords into shared keywords.
A diagram showing an example of a record of registration data.
A diagram showing an example of information stored in the conversion information storage unit of the terminal device in the third embodiment.
A diagram showing an example of an anonymized search query.
A diagram showing an example of anonymized search using shared keywords.
A flowchart showing an example of the procedure of search processing.
A diagram showing the difficulty of a frequency analysis attack when shared keywords are used.
A diagram showing an example of a numerical range search.
A diagram showing an example (comparative example) of generating shared sets for age.
A diagram showing an example of generating shared sets for age.
A diagram showing an example of generating shared sets when the number of groups is 3.

Hereinafter, the embodiments will be described with reference to the drawings. The embodiments can be combined with one another to the extent that no contradiction arises.
[First embodiment]
First, a first embodiment will be described. In the first embodiment, a keyword is probabilistically converted into one of a plurality of divided keywords, and the divided keyword is encrypted and registered in a database (DB), thereby disturbing the frequency of keywords managed as confidential information. A confidential information management method that includes this frequency-disturbing data registration method and the corresponding data search method can be called SCPDK (Share Ciphertext with Probabilistically Divided Keywords).

FIG. 1 is a diagram showing an example of a confidential information management system according to the first embodiment. FIG. 1 shows an example of implementing a confidential information management method using the confidential information management system. The confidential information management system has a data registration device 1, a server 2, and a data utilization device 3. Each of the data registration device 1, the server 2, and the data utilization device 3 can carry out its part of the confidential information management method by, for example, executing a program that describes the processing procedure for that device.

To implement the confidential information management method, the data registration device 1 has a storage unit 1a and a processing unit 1b. The storage unit 1a is, for example, a memory or a storage device of the data registration device 1. The processing unit 1b is, for example, a processor or an arithmetic circuit of the data registration device 1. Although not shown, the server 2 and the data utilization device 3 also each have a storage unit and a processing unit. For example, the storage unit of the server 2 stores an anonymization database (DB) 2a.

The storage unit 1a of the data registration device 1 stores a confidential record group 4 that includes one or more confidential records to be kept confidential.
The processing unit 1b of the data registration device 1 generates and classifies second item values (step S1). Specifically, the processing unit 1b first associates n (n is a natural number) second item values with each of a plurality of first item values that can be set in one item of a confidential record in the confidential record group 4. In the example of FIG. 1, the first item value "Pediatrics" is associated with the second item values "Pediatrics 0" and "Pediatrics 1", the first item value "Gynecology" is associated with the second item values "Gynecology 0" and "Gynecology 1", and the first item value "Internal Medicine" is associated with the second item values "Internal Medicine 0" and "Internal Medicine 1".

Some first item values are allowed to have only one corresponding second item value. When there is only one corresponding second item value, the first item value and the second item value may be the same value.

The processing unit 1b classifies each second item value into one of the plurality of item value sets 5a to 5c such that each of the item value sets 5a to 5c contains between 1 and n-1 of the two or more second item values associated with the same first item value. For example, the processing unit 1b classifies "Pediatrics 0" and "Internal Medicine 0" into the item value set 5a, "Pediatrics 1" and "Gynecology 0" into the item value set 5b, and "Gynecology 1" and "Internal Medicine 1" into the item value set 5c. In doing so, the processing unit 1b prevents different item value sets from having the same combination of first item values corresponding to the second item values they contain.
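
The classification rule of step S1 can be illustrated with a minimal sketch. The ring assignment used here is an assumption made for illustration, not the method described in the publication, and the names `classify` and `item_value_sets` are hypothetical. It only guarantees that each set holds at most one second item value per first item value and that no two sets cover the same combination of first item values; the resulting sets differ slightly from the grouping shown for FIG. 1.

```python
from collections import defaultdict


def classify(first_values, n):
    """Split each first item value into n second item values and spread them
    over item value sets with a ring assignment (illustrative assumption)."""
    m = len(first_values)
    if not 2 <= n < m:
        # With n == m every set would cover the same first item values.
        raise ValueError("this sketch needs 2 <= n < number of first item values")
    sets = defaultdict(list)
    for i, value in enumerate(first_values):
        for j in range(n):
            # Second item value j of first item value i goes to set (i + j) mod m,
            # so the n second item values of one first value land in n distinct sets.
            sets[(i + j) % m].append(f"{value}{j}")
    return [sets[k] for k in range(m)]


item_value_sets = classify(["pediatrics", "gynecology", "internal medicine"], n=2)
for members in item_value_sets:
    print(members)
```

With three first item values and n = 2, every set ends up with two second item values that originate from two different first item values.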

Next, the processing unit 1b probabilistically converts the first item value set in the one item of a confidential record into one of its associated second item values (step S2). For example, a randomly determined selection probability is set for each second item value. For a given first item value, the processing unit 1b selects one of the associated second item values with a probability that follows their respective selection probabilities, and converts the first item value into the selected second item value.
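
Step S2 amounts to weighted random selection. The following is a minimal sketch under stated assumptions: the probability table `SELECTION_PROBS` and the function `to_second_value` are hypothetical, and the concrete probabilities are made up (the text only says they are determined at random). A seeded `random.Random` is used so the example is repeatable; a real deployment would presumably want a cryptographically secure source of randomness.

```python
import random

# Hypothetical selection probabilities for each second item value.
SELECTION_PROBS = {
    "pediatrics": {"pediatrics0": 0.3, "pediatrics1": 0.7},
    "gynecology": {"gynecology0": 0.6, "gynecology1": 0.4},
    "internal medicine": {"internal medicine0": 0.5, "internal medicine1": 0.5},
}


def to_second_value(first_value, rng):
    """Pick one associated second item value according to its selection probability."""
    candidates = SELECTION_PROBS[first_value]
    return rng.choices(list(candidates), weights=list(candidates.values()), k=1)[0]


rng = random.Random(0)  # seeded only to make the illustration repeatable
converted = [to_second_value("pediatrics", rng) for _ in range(1000)]
print(converted.count("pediatrics0"), converted.count("pediatrics1"))
```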

Based on the confidential record group 4, the processing unit 1b generates a disturbance record group 6 in which, through the addition of dummy second item values, the numbers of search hits for second item values belonging to the same item value set are equalized (step S3). For example, the processing unit 1b generates a dummy value for each second item value set in the one item of each confidential record, following a bijective relationship that maps each second item value in an item value set to a different second item value in the same item value set. Next, the processing unit 1b generates a dummy record group that has the same number of dummy records as there are confidential records, with the dummy values set in the one item. The processing unit 1b then generates the disturbance record group 6, which includes the confidential record group and the dummy record group.
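
The effect of step S3 on hit counts can be sketched as follows. This is an illustration under assumptions, not the publication's implementation: the bijection is taken to be a cyclic shift within each item value set, the item value sets are copied from the FIG. 1 example, and `dummy_value` and `disturb` are hypothetical names. Because each set here has two members, a single dummy record group is enough to equalize the counts; larger sets would presumably need several dummy groups with different bijections.

```python
from collections import Counter

# Item value sets 5a to 5c from the FIG. 1 example.
ITEM_VALUE_SETS = [
    ["pediatrics0", "internal medicine0"],
    ["pediatrics1", "gynecology0"],
    ["gynecology1", "internal medicine1"],
]


def dummy_value(second_value, shift=1):
    """Map a second item value to a different member of its own set (cyclic shift)."""
    for members in ITEM_VALUE_SETS:
        if second_value in members:
            idx = members.index(second_value)
            return members[(idx + shift) % len(members)]
    raise KeyError(second_value)


def disturb(confidential_values):
    """Return the confidential values followed by one equally sized dummy group."""
    dummies = [dummy_value(v) for v in confidential_values]
    return confidential_values + dummies


confidential = (["pediatrics0"] * 5 + ["internal medicine0"] * 1
                + ["pediatrics1"] * 2 + ["gynecology0"] * 3)
disturbed = disturb(confidential)
counts = Counter(disturbed)
print(counts)
```

Before disturbance, "pediatrics0" appears five times and "internal medicine0" once; afterwards both members of the set appear equally often, so their hit counts no longer distinguish them.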

Further, the processing unit 1b attaches to each record in the disturbance record group 6 a flag indicating whether the second item value contained in that record is true or false (step S4). For example, the processing unit 1b attaches a first flag indicating true to each confidential record and a second flag indicating false to each dummy record.

The processing unit 1b encrypts the disturbance record group (step S5). For example, the processing unit 1b encrypts the second item value set in the one item of each confidential record and each dummy record, the first flags attached to the confidential records, and the second flags attached to the dummy records.

The processing unit 1b then stores the encrypted disturbance record group 6 in, for example, the anonymization DB 2a of the server 2.
In this way, the disturbance record group 6, whose security against frequency analysis attacks has been improved by the addition of dummy records, is stored in the anonymization DB 2a of the server 2. The server 2 provides the data utilization device 3 with a search service for the data stored in the anonymization DB 2a.

The data utilization device 3 holds information on the item value sets 5a to 5c generated by the data registration device 1 and information indicating the correspondence between first item values and second item values. For example, the data utilization device 3 acquires this information from the data registration device 1. Alternatively, the data utilization device 3 may generate second item values from the first item values and classify the second item values into item value sets using the same algorithm as the processing unit 1b of the data registration device 1.

The data utilization device 3 accepts input of a search condition from a searcher. The search condition indicates, for example, a search item value for the one item in the anonymization DB 2a. The data utilization device 3 transmits a search query corresponding to the search condition to the server 2. For example, the data utilization device 3 converts the search item value of the one item indicated in the search condition into each of the second item values corresponding to the first item value that has the same value as the search item value. The data utilization device 3 then transmits to the server 2 a search query that searches for the resulting second item values while they remain encrypted. For example, the data utilization device 3 transmits to the server 2 a search query containing the logical sum (OR) of the ciphertexts of the second item values.

The server 2 performs the search on ciphertext, without decryption, in response to the received search query. For example, the server 2 collates the still-encrypted second item values in the search query with the encrypted item value of each record in the anonymization DB 2a. The server 2 transmits to the data utilization device 3 a search result containing the records whose item values match the search item values.
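
The query expansion and the server-side matching can be sketched as below. This is an illustration only: the publication does not specify the collation technique in this passage, so keyed HMAC-SHA256 tokens are swapped in as a simple deterministic stand-in that allows equality matching without decryption. The key, the column name `dept_token`, and the functions `token`, `build_query`, and `server_search` are all hypothetical.

```python
import hashlib
import hmac

KEY = b"demo key shared by registrant and searcher"  # hypothetical key

# Correspondence between first item values and second item values (FIG. 1 example).
SECOND_VALUES = {
    "pediatrics": ["pediatrics0", "pediatrics1"],
    "gynecology": ["gynecology0", "gynecology1"],
    "internal medicine": ["internal medicine0", "internal medicine1"],
}


def token(value):
    """Stand-in for a searchable ciphertext: a keyed, deterministic tag."""
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def build_query(search_value):
    """Expand a search item value into the OR of its second item value tokens."""
    return {token(v) for v in SECOND_VALUES[search_value]}


def server_search(db, query_tokens):
    """Server side: match tokens only; plaintext values are never seen."""
    return [row for row in db if row["dept_token"] in query_tokens]


stored = ["pediatrics0", "gynecology0", "pediatrics1", "internal medicine1"]
db = [{"id": i, "dept_token": token(v)} for i, v in enumerate(stored)]
hits = server_search(db, build_query("pediatrics"))
print([row["id"] for row in hits])
```

The server sees only opaque tokens for both the stored item values and the query terms, which mirrors the idea of matching while the data remains encrypted.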

The data utilization device 3 acquires from the server 2 the detected records in the anonymization DB 2a that were hit by the search query. Based on the flag attached to each detected record, the data utilization device 3 then determines whether the second item value that was hit in the detected record is true or false, and displays the true search results.

For example, the data utilization device 3 decrypts the item value set in the one item of each confidential record and each dummy record included in the search result, the first flags attached to the confidential records, and the second flags attached to the dummy records. Next, based on the first and second flags, the data utilization device 3 removes the dummy records from the search result and obtains the confidential records that satisfy the search condition. The data utilization device 3 then displays the true search results on, for example, a display screen.
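
The flag-based filtering on the data utilization device can be sketched as follows. The record layout, the flag values `REAL` and `DUMMY`, and the function `true_results` are assumptions for illustration; decryption is omitted, so the hits are shown as if already decrypted.

```python
REAL, DUMMY = "real", "dummy"  # stand-ins for the first flag and the second flag


def true_results(decrypted_hits):
    """Drop dummy records so that only genuine confidential records remain."""
    return [rec for rec in decrypted_hits if rec["flag"] == REAL]


# Hypothetical hits for a search on "pediatrics", after decryption.
hits = [
    {"id": 1, "department": "pediatrics0", "flag": REAL},
    {"id": 101, "department": "pediatrics0", "flag": DUMMY},
    {"id": 2, "department": "pediatrics1", "flag": REAL},
]
shown = true_results(hits)
print([rec["id"] for rec in shown])
```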

In this way, the appearance frequency of each first item value in the confidential record group 4 is disturbed, and security against frequency analysis attacks on the anonymization DB 2a can be improved.
When there are a plurality of dummy record groups, the processing unit 1b can use a different bijective relationship for each dummy record group so that the second item value set in the one item of a confidential record is converted into a different second item value in each dummy record group. This can further improve security against frequency analysis attacks.

The processing unit 1b may set, in the dummy records of each of the one or more generated dummy record groups, a second flag whose value differs for each dummy record group and also differs from the first flag. For example, the processing unit 1b attaches to each confidential record a first flag that includes the identifier of the confidential record and a value indicating the group number of the confidential record group, and sets in each dummy record a second flag that includes the identifier of the dummy record and the group number of the dummy record group to which it belongs. This makes it possible to identify, from the value of the second flag, the dummy record group to which a dummy record belongs. As a result, by inversely converting the second item value in a dummy record back into a first item value, the original record can be reproduced from the dummy record.
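
How a group number in the flag could drive the inverse conversion is sketched below. The scheme is an assumption made for illustration: group 0 is taken to be the confidential record group, and dummy group g is taken to use a cyclic shift by g within each item value set, so undoing the shift recovers the original second item value. The names `restore_first_value` and `FIRST_OF` are hypothetical.

```python
# Item value sets 5a to 5c from the FIG. 1 example.
ITEM_VALUE_SETS = [
    ["pediatrics0", "internal medicine0"],
    ["pediatrics1", "gynecology0"],
    ["gynecology1", "internal medicine1"],
]

# Reverse lookup from second item value to first item value.
FIRST_OF = {
    "pediatrics0": "pediatrics", "pediatrics1": "pediatrics",
    "gynecology0": "gynecology", "gynecology1": "gynecology",
    "internal medicine0": "internal medicine",
    "internal medicine1": "internal medicine",
}


def restore_first_value(second_value, group_no):
    """Undo the assumed shift of dummy group `group_no` (0 = confidential group)."""
    for members in ITEM_VALUE_SETS:
        if second_value in members:
            idx = members.index(second_value)
            original = members[(idx - group_no) % len(members)]
            return FIRST_OF[original]
    raise KeyError(second_value)


# A dummy record in group 1 that holds "internal medicine0" stems from "pediatrics0".
print(restore_first_value("internal medicine0", group_no=1))
print(restore_first_value("pediatrics0", group_no=0))
```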

The data registration device 1 can also realize frequency disturbance without using dummy records. For example, the processing unit 1b generates a disturbance record by converting the second item value set in the one item of a confidential record into a third item value that is associated with the item value set to which the second item value belongs and that is hit by a search for any second item value belonging to the same item value set. The processing unit 1b then generates a disturbance record group that includes the generated disturbance records. This eliminates the need for dummy records and reduces the amount of data registered in the anonymization DB 2a.

When converting a second item value into a third item value, the processing unit 1b attaches to the disturbance record, for example, a third flag that includes information indicating the second item value before conversion. For example, the processing unit 1b attaches to the confidential record a third flag that includes the identifier of the confidential record and the element number, within the item value set, of the element corresponding to the second item value before conversion. The processing unit 1b further encrypts the third item value set in the one item of the disturbance record and the attached third flag.
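
The conversion into a third item value with a third flag can be sketched as follows. This is a minimal illustration under assumptions: the third item value is represented simply by a label for the item value set ("S0" to "S2"), the flag is a tuple of record identifier and element number, and `to_third` is a hypothetical name. Encryption of the value and the flag is omitted.

```python
# Item value sets keyed by a hypothetical third item value label.
ITEM_VALUE_SETS = {
    "S0": ["pediatrics0", "internal medicine0"],
    "S1": ["pediatrics1", "gynecology0"],
    "S2": ["gynecology1", "internal medicine1"],
}


def to_third(record_id, second_value):
    """Replace a second item value by its set's third item value and keep the
    element number in a flag so that the original value can be recovered."""
    for third_value, members in ITEM_VALUE_SETS.items():
        if second_value in members:
            flag = (record_id, members.index(second_value))
            return {"value": third_value, "flag": flag}
    raise KeyError(second_value)


record = to_third(7, "internal medicine0")
print(record)
```

Every record whose second item value belongs to the same set carries the same third item value, so without the flag the records in that set look alike.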

The data utilization device 3 can use the third flag to restore a third item value to the second item value from which it was converted, and can then obtain the first item value corresponding to that second item value. For example, the data utilization device 3 converts the search item value of the one item indicated in the search condition into each of the one or more second item values corresponding to the first item value that has the same value as the search item value, and then converts each second item value into its corresponding third item value. The data utilization device 3 then transmits to the server 2 a search query containing the ciphertexts of the third item values.

After that, the data utilization device 3 acquires from the server 2 the result of searching the anonymization DB 2a with the search query. Next, the data utilization device 3 decrypts the third item value and the third flag set in each disturbance record included in the search result. The data utilization device 3 then determines, based on the third flag, the second item value from which the third item value was converted. Finally, the data utilization device 3 extracts from the search result the disturbance records that store the ciphertext of a second item value corresponding to the first item value satisfying the search condition.
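
The client-side steps for the dummy-free variant can be sketched as below. The layout is assumed for illustration: third item values are set labels "S0" to "S2", each hit carries a flag of record identifier and element number, and `third_values_for` and `filter_hits` are hypothetical names. Encryption and decryption are omitted, so the hits are shown as already decrypted.

```python
ITEM_VALUE_SETS = {
    "S0": ["pediatrics0", "internal medicine0"],
    "S1": ["pediatrics1", "gynecology0"],
    "S2": ["gynecology1", "internal medicine1"],
}
SECOND_VALUES = {
    "pediatrics": ["pediatrics0", "pediatrics1"],
    "gynecology": ["gynecology0", "gynecology1"],
    "internal medicine": ["internal medicine0", "internal medicine1"],
}


def third_values_for(search_value):
    """Search value -> second item values -> third item values to query."""
    wanted = set(SECOND_VALUES[search_value])
    return {name for name, members in ITEM_VALUE_SETS.items() if wanted & set(members)}


def filter_hits(decrypted_hits, search_value):
    """Use the element number in each flag to recover the second item value and
    keep only the records that really match the search value."""
    wanted = set(SECOND_VALUES[search_value])
    kept = []
    for rec in decrypted_hits:
        _, element_no = rec["flag"]
        if ITEM_VALUE_SETS[rec["value"]][element_no] in wanted:
            kept.append(rec)
    return kept


query = third_values_for("pediatrics")
hits = [
    {"value": "S0", "flag": (1, 0)},  # was pediatrics0
    {"value": "S0", "flag": (2, 1)},  # was internal medicine0
    {"value": "S1", "flag": (3, 1)},  # was gynecology0
    {"value": "S1", "flag": (4, 0)},  # was pediatrics1
]
matches = filter_hits(hits, "pediatrics")
print(sorted(query), [rec["flag"][0] for rec in matches])
```

The server returns every record in the queried sets, and only the device holding the decrypted flags can tell which of them actually satisfy the search condition.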

第１項目値が数値の場合、数値範囲検索が行われることにより、暗号文の元の平文の数値の並びが解析され、検索対象の絞り込みに利用される可能性がある。処理部１ｂは、複数の第１項目値が数値の場合、第２項目値を複数の項目値集合５ａ～５ｃのいずれかに分類する際に、数値の連続性を考慮した分類を行う。例えば処理部１ｂは、複数の第１項目値が数値の場合、第１数値範囲内の連続する複数の数値それぞれに対応する第２項目値を含む項目値集合に、第１数値範囲と重複しない第２数値範囲内の連続する数値それぞれに対応する第２項目値が含まれるようにする。これにより、数値の検索が行われた場合に、攻撃者によって、その数値の候補が、狭い数値範囲内の値に絞り込まれてしまうことを抑止できる。 When the first item values are numerical values, a numerical range search may allow the ordering of the plaintext numerical values underlying the ciphertexts to be analyzed and used to narrow down what is being searched for. When the plurality of first item values are numerical values, the processing unit 1b takes the continuity of the numerical values into account when classifying the second item values into the item value sets 5a to 5c. For example, the processing unit 1b ensures that an item value set containing the second item values corresponding to each of a plurality of consecutive numerical values within a first numerical range also contains the second item values corresponding to each of the consecutive numerical values within a second numerical range that does not overlap the first numerical range. This prevents an attacker from narrowing the candidates for a searched numerical value down to values within a narrow numerical range.

〔第２の実施の形態〕
次に第２の実施の形態について説明する。第２の実施の形態は、多数の医療機関が有する患者データを、患者データ収集活用基盤を用いて有効活用するものである。例えば患者データ収集活用基盤により、複数の病院のデータを統合してビッグデータ化し、ビッグデータを複数の製薬企業で活用できるようにする。これにより、製薬企業や病院は、新薬開発のための調査（対象疾患の患者数や所在地域など）を容易に把握できるようになる。 [Second embodiment]
Next, a second embodiment will be described. In the second embodiment, patient data held by many medical institutions is put to effective use through a patient data collection and utilization platform. For example, the platform integrates data from multiple hospitals into big data that multiple pharmaceutical companies can use. As a result, pharmaceutical companies and hospitals can easily obtain the information needed for surveys for new drug development (such as the number of patients with a target disease and the regions where they are located).

患者データ収集活用基盤は、ＩＣＴ（Information and Communications Technology）企業が管理するクラウドを用いて実現するのが効率的である。クラウドを用いることで、病院や製薬企業からのビッグデータへのアクセスが容易となる。しかしながら、患者データは要配慮個人情報であり、法的に参照が許される手続きを経たとしても、漏洩や目的外利用のリスクを考慮し、クラウドの管理者に対しても秘匿しておくのが適切である。また製薬企業による検索の内容は製薬企業の戦略に関する重要な企業秘密に結びつくため、検索内容についても秘匿しておくことが望ましい。そこで、患者データ収集活用基盤を実現するクラウドは、例えば暗号化したまま検索可能な暗号化方式を用いて、暗号化された患者データをＤＢで管理すると共に、暗号化された検索キーワードを用いて、暗号文のままでデータ検索を行う。これにより、クラウドの管理者に対しても、患者データと検索クエリの内容を秘匿しておくことができる。 It is efficient to implement the patient data collection and utilization platform using a cloud managed by an ICT (Information and Communications Technology) company. Using the cloud makes it easy for hospitals and pharmaceutical companies to access the big data. However, patient data is personal information requiring special care; even when it has gone through procedures that legally permit it to be referenced, it is appropriate to keep it concealed even from the cloud administrator, in consideration of the risk of leakage and use for unintended purposes. In addition, because the content of searches by a pharmaceutical company is tied to important trade secrets concerning its strategy, it is desirable to keep the search content concealed as well. Therefore, the cloud that implements the platform uses, for example, an encryption scheme that allows searching on encrypted data: it manages the encrypted patient data in a DB and performs data searches on the ciphertext as is, using encrypted search keywords. As a result, the patient data and the content of search queries can be kept concealed even from the cloud administrator.

複数の組織（例えば病院）のデータを同じ仕組みで使用する場合、ＤＢのフォーマットや格納する属性名と値の仕様は、共通の仕様として公開される。しかもシステム開発も担うクラウドの管理者は、秘匿化のアルゴリズムを熟知している。すると、クラウドの管理者の中に悪意を有する者が存在した場合、患者データを暗号文のまま管理するだけでは不十分な場合があり得る。 When data of multiple organizations (for example, hospitals) are used with the same mechanism, specifications of the DB format and attribute names and values to be stored are published as common specifications. Moreover, cloud administrators, who are also responsible for system development, are familiar with encryption algorithms. Then, if there is a person with malicious intent among the cloud administrators, it may not be enough to manage patient data as it is in ciphertext.

ここで、絞り込み検索などを用いて効率よく検索を行うために、データは行列形式で格納することを想定する。この場合、例えば項目「性別」のラベルが振られた列では平文候補が「男」または「女」の２種類しかなく、秘匿化ＤＢ内には同じ平文に基づく暗号文が多数存在することとなる。そして、攻撃者となり得るクラウドの管理者はこれらの暗号文を比較参照できる。 Here, it is assumed that data is stored in a tabular (row-and-column) format so that searches such as narrowing searches can be performed efficiently. In this case, for example, the column labeled with the item “gender” has only two plaintext candidates, “male” and “female”, so the anonymization DB contains many ciphertexts based on the same plaintext. A cloud administrator, who could be an attacker, can compare and inspect these ciphertexts.

さらに、大きな病院では広報の一環として疾患別患者数などの情報を公開している。同様にあらゆる情報について、このような頻度情報が公開される可能性がある。そのため、すべてのデータの頻度分布は公知となる場合がある。例えば複数の項目の値の組み合わせ頻度（肺がんの男性の人数など）についても公知となり得る。また、医療情報は日々新たな情報が追加され、利活用者は最新の情報を求める。よって、秘匿化ＤＢは逐次最新の平文ＤＢとの差分が反映できることが重要である。 In addition, large hospitals disclose information such as the number of patients by disease as part of public relations. Similarly, for any information, such frequency information may be published. Therefore, the frequency distribution of all data may be publicly known. For example, the combination frequency of the values of multiple items (the number of men with lung cancer, etc.) can also be publicly known. In addition, new information is added to medical information on a daily basis, and users demand the latest information. Therefore, it is important that the anonymization DB can reflect the difference from the latest plaintext DB sequentially.

以上により、第２の実施の形態では、以下の条件（ｉ）～（ｖ）下でも秘匿化ＤＢの内容や検索内容が、秘匿化ＤＢを管理するクラウド管理者を含む攻撃者に対し秘匿できることを、セキュリティ要件とする。
（ｉ）平文の種類や値は公知であり、極めて種類が少ない場合もあり得る。
（ｉｉ）攻撃者は、秘匿化ＤＢ内に存在するすべての暗号文と暗号化された検索クエリおよびこれに合致した秘匿化ＤＢ内の暗号文をすべて参照可能である。
（ｉｉｉ）意図的に秘密情報として管理する情報（秘密鍵）以外の、暗号化や照合のアルゴリズムは公知である。
（ｉｖ）すべてのデータの頻度分布は組み合わせも含めて公知である。
（ｖ）秘匿化ＤＢは逐次更新され、攻撃者は差分情報を参照可能である。 As described above, the security requirement in the second embodiment is that the contents of the anonymization DB and the contents of searches can be kept concealed from attackers, including the cloud administrator who manages the anonymization DB, even under the following conditions (i) to (v).
(i) Plaintext types and values are publicly known, and there may be cases where there are very few types.
(ii) An attacker can view all ciphertexts in the anonymization DB, the encrypted search queries, and all ciphertexts in the anonymization DB that match those queries.
(iii) The encryption and matching algorithms are publicly known, except for information intentionally managed as secret (the secret key).
(iv) The frequency distribution of all data, including combinations, is publicly known.
(v) The anonymization DB is updated sequentially, and the attacker can refer to the difference information.

条件（ｉ）～（ｖ）を満たす秘匿化ＤＢがあるとき、容易に想定される攻撃手法としては総当たり攻撃が考えられる。平文の種類が少なく公知なため、暗号化鍵が公知である場合、攻撃者は全種類の平文を暗号化して平文と暗号文の辞書を作成することで、秘匿化ＤＢ内のデータや検索クエリを容易に解読できてしまう。よって、セキュリティ要件を満たすには、暗号化鍵は秘密鍵とすることとなる。 When an anonymization DB is subject to conditions (i) to (v), a brute-force attack is an easily conceivable attack method. Because the plaintext types are few and publicly known, an attacker who knows the encryption key can encrypt every type of plaintext to build a dictionary of plaintext-ciphertext pairs, and can then easily decipher the data in the anonymization DB and the search queries. Therefore, to satisfy the security requirement, the encryption key must be a secret key.
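
The dictionary attack described above can be illustrated with a minimal sketch. It is not part of the embodiment: the keyed HMAC is only a stand-in for a deterministic cipher, and the names `det_encrypt`, `build_dictionary`, and `KNOWN_KEY` are hypothetical.

```python
import hashlib
import hmac


def det_encrypt(key: bytes, plaintext: str) -> str:
    """Stand-in for a deterministic cipher (assumption): the same key and
    plaintext always produce the same token."""
    return hmac.new(key, plaintext.encode("utf-8"), hashlib.sha256).hexdigest()


def build_dictionary(key: bytes, candidates: list[str]) -> dict[str, str]:
    """Encrypt every publicly known plaintext candidate and map
    ciphertext -> plaintext."""
    return {det_encrypt(key, p): p for p in candidates}


# Hypothetical setting: the encryption key is known to the attacker.
KNOWN_KEY = b"key-known-to-the-attacker"
# Condition (i): the plaintext candidates are few and publicly known.
CANDIDATES = ["male", "female"]

# A column of the anonymization DB as the attacker sees it (illustrative data).
stored_column = [det_encrypt(KNOWN_KEY, v) for v in ["female", "male", "female"]]

dictionary = build_dictionary(KNOWN_KEY, CANDIDATES)
recovered = [dictionary[c] for c in stored_column]
print(recovered)
```

With only two candidates the dictionary has two entries, which is why keeping the key secret is a precondition for everything that follows.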

また、秘匿化ＤＢの管理者でもある攻撃者は照合判定の結果を参照できる。そのため、ある検索クエリに対し一致判定された暗号文はすべて同じ平文に対応することが分かってしまう。よって、同じ平文でも暗号化するたびに異なる暗号文となる確率的暗号を用いても、攻撃者は同じ平文が同じ暗号文となる確定的暗号のような暗号文に変換できてしまう。そして、攻撃者はデータの頻度分布を知っているため、暗号文の頻度と比較することで容易に秘匿データの内容を推定できてしまう。頻度分布を正確に知らない場合でも、例えば婦人科の性別データを参照すると、多い方の暗号文の平文は「女」であると容易に特定できる。よって、暗号化だけでなく頻度攪乱などの対策を採ることが重要となる。 In addition, an attacker who is also the administrator of the anonymization DB can see the results of match determinations. The attacker therefore learns that all ciphertexts judged to match a given search query correspond to the same plaintext. Consequently, even if a probabilistic cipher is used, in which the same plaintext yields a different ciphertext each time it is encrypted, the attacker can in effect convert the ciphertexts into ones like those of a deterministic cipher, in which the same plaintext always yields the same ciphertext. Since the attacker knows the frequency distribution of the data, the attacker can easily estimate the content of the concealed data by comparing that distribution with the frequencies of the ciphertexts. Even without accurate knowledge of the frequency distribution, by looking at, for example, the sex data of a gynecology department, the attacker can easily determine that the plaintext of the more frequent ciphertext is "female." Therefore, it is important to take countermeasures such as frequency disturbance in addition to encryption.

そこで第２の実施の形態では、条件（ｉ）～（ｖ）下でも秘匿化ＤＢの内容や検索内容を、攻撃者に対し秘匿できる秘密情報管理システムを提供する。第２の実施の形態に係る秘密情報管理システムでは、ダミーデータを追加することで頻度攪乱を実現する。この際、秘密情報管理システムは、データの増加率は一定に保ち、不要なストレージや検索処理の増加を防止する。そして秘密情報管理システムは、ダミーレコードを追加することにより、ある１種類の項目値に対し、出現頻度が同程度となる他の種類の項目値の絞り込みを抑止する。例えば秘密情報管理システムは、攻撃者が検索クエリにヒットしたレコード数からその検索クエリで指定された項目値の候補を絞り込もうとしても、容易には絞り込めないようにする。 Therefore, the second embodiment provides a confidential information management system that can conceal the contents of the anonymized DB and the search contents from an attacker even under the conditions (i) to (v). In the confidential information management system according to the second embodiment, frequency disturbance is realized by adding dummy data. At this time, the secret information management system keeps the data growth rate constant to prevent unnecessary increases in storage and search processing. By adding a dummy record, the secret information management system suppresses narrowing down of other types of item values that have similar appearance frequencies to one type of item value. For example, the secret information management system prevents an attacker from easily narrowing down candidates for item values specified by a search query based on the number of records hit by the search query.

図２は、秘密情報管理システムの一例を示す図である。第２の実施の形態では、患者データ収集活用基盤１２がクラウドによって構築されている。患者データ収集活用基盤１２はデータ管理サーバ１００を有している。データ管理サーバ１００は、患者データを暗号文のままで管理するコンピュータである。データ管理サーバ１００は、ネットワーク２０を介して、病院１３，１４のデータ登録サーバ２００，３００と製薬企業１５，１６の端末装置４００，５００に接続されている。 FIG. 2 is a diagram showing an example of a confidential information management system. In the second embodiment, the patient data collection and utilization platform 12 is constructed by cloud. The patient data collection and utilization platform 12 has a data management server 100 . The data management server 100 is a computer that manages patient data in encrypted form. The data management server 100 is connected to data registration servers 200 and 300 of hospitals 13 and 14 and terminal devices 400 and 500 of pharmaceutical companies 15 and 16 via a network 20 .

病院１３のデータ登録サーバ２００は、病院１３で受診した患者の電子カルテなどの患者データを蓄積し、その患者データを暗号化してデータ管理サーバ１００に提供するコンピュータである。同様に、病院１４のデータ登録サーバ３００は、病院１４で受診した患者の電子カルテなどの患者データを蓄積し、その患者データを暗号化してデータ管理サーバ１００に提供する。 The data registration server 200 of the hospital 13 is a computer that accumulates patient data such as electronic medical charts of patients who have been examined at the hospital 13 , encrypts the patient data, and provides the data management server 100 with the encrypted patient data. Similarly, the data registration server 300 of the hospital 14 accumulates patient data such as electronic medical charts of patients examined at the hospital 14 , encrypts the patient data, and provides the data management server 100 with the encrypted patient data.

製薬企業１５の端末装置４００は、データ管理サーバ１００で管理されている患者データを検索するために、製薬企業１５の社員が使用するコンピュータである。製薬企業１６の端末装置５００は、データ管理サーバ１００で管理されている患者データを検索するために、製薬企業１６の社員が使用するコンピュータである。 The terminal device 400 of the pharmaceutical company 15 is a computer used by employees of the pharmaceutical company 15 to retrieve patient data managed by the data management server 100 . A terminal device 500 of the pharmaceutical company 16 is a computer used by an employee of the pharmaceutical company 16 to retrieve patient data managed by the data management server 100 .

このような秘密情報管理システムは、例えば医療情報を活用した新薬開発の効率化に有用である。例えば、製薬企業１５，１６が、治験を行う場合、対象疾患の患者がどの程度存在するか等を考慮して計画を立案することで、治験の成功率を向上させることができる。そこで、患者データ収集活用基盤１２で多数の病院１３，１４に分散する患者の電子カルテから抽出した患者データを集中管理することで、目的の疾患を有する患者の情報を容易に得ることが可能となる。 Such a secret information management system is useful, for example, in improving the efficiency of new drug development that makes use of medical information. For example, when the pharmaceutical companies 15 and 16 conduct a clinical trial, they can improve its success rate by drawing up a plan that takes into account, among other things, how many patients with the target disease exist. Therefore, by centrally managing, on the patient data collection and utilization platform 12, patient data extracted from the electronic medical records of patients distributed across many hospitals 13 and 14, information on patients with the target disease can be obtained easily.

なおデータ登録サーバ２００，３００は、第１の実施の形態におけるデータ登録装置１の一例である。データ管理サーバ１００は、第１の実施の形態におけるサーバ２の一例である。端末装置４００，５００は、第１の実施の形態におけるデータ利用装置３の一例である。 The data registration servers 200 and 300 are examples of the data registration device 1 in the first embodiment. The data management server 100 is an example of the server 2 in the first embodiment. Terminal devices 400 and 500 are examples of data utilization device 3 in the first embodiment.

図３は、データ管理サーバのハードウェアの一構成例を示す図である。データ管理サーバ１００は、プロセッサ１０１によって装置全体が制御されている。プロセッサ１０１には、バス１０９を介してメモリ１０２と複数の周辺機器が接続されている。プロセッサ１０１は、マルチプロセッサであってもよい。プロセッサ１０１は、例えばＣＰＵ（Central Processing Unit）、ＭＰＵ（Micro Processing Unit）、またはＤＳＰ（Digital Signal Processor）である。プロセッサ１０１がプログラムを実行することで実現する機能の少なくとも一部を、ＡＳＩＣ（Application Specific Integrated Circuit）、ＰＬＤ（Programmable Logic Device）などの電子回路で実現してもよい。 FIG. 3 is a diagram showing a configuration example of hardware of the data management server. The data management server 100 is entirely controlled by a processor 101 . A memory 102 and a plurality of peripheral devices are connected to the processor 101 via a bus 109 . Processor 101 may be a multiprocessor. The processor 101 is, for example, a CPU (Central Processing Unit), MPU (Micro Processing Unit), or DSP (Digital Signal Processor). At least part of the functions realized by the processor 101 executing the program may be realized by an electronic circuit such as an ASIC (Application Specific Integrated Circuit) or a PLD (Programmable Logic Device).

メモリ１０２は、データ管理サーバ１００の主記憶装置として使用される。メモリ１０２には、プロセッサ１０１に実行させるＯＳ（Operating System）のプログラムやアプリケーションプログラムの少なくとも一部が一時的に格納される。また、メモリ１０２には、プロセッサ１０１による処理に利用する各種データが格納される。メモリ１０２としては、例えばＲＡＭ（Random Access Memory）などの揮発性の半導体記憶装置が使用される。 The memory 102 is used as the main storage device of the data management server 100 . The memory 102 temporarily stores at least part of an OS (Operating System) program and application programs to be executed by the processor 101 . In addition, the memory 102 stores various data used for processing by the processor 101 . As the memory 102, for example, a volatile semiconductor memory device such as a RAM (Random Access Memory) is used.

バス１０９に接続されている周辺機器としては、ストレージ装置１０３、ＧＰＵ（Graphics Processing Unit）１０４、入力インタフェース１０５、光学ドライブ装置１０６、機器接続インタフェース１０７およびネットワークインタフェース１０８がある。 Peripheral devices connected to the bus 109 include a storage device 103 , a GPU (Graphics Processing Unit) 104 , an input interface 105 , an optical drive device 106 , a device connection interface 107 and a network interface 108 .

ストレージ装置１０３は、内蔵した記録媒体に対して、電気的または磁気的にデータの書き込みおよび読み出しを行う。ストレージ装置１０３は、コンピュータの補助記憶装置として使用される。ストレージ装置１０３には、ＯＳのプログラム、アプリケーションプログラム、および各種データが格納される。なお、ストレージ装置１０３としては、例えばＨＤＤ（Hard Disk Drive）やＳＳＤ（Solid State Drive）を使用することができる。 The storage device 103 electrically or magnetically writes data to and reads data from a built-in recording medium. The storage device 103 is used as an auxiliary storage device for the computer. The storage device 103 stores an OS program, application programs, and various data. As the storage device 103, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive) can be used.

ＧＰＵ１０４には、モニタ２１が接続されている。ＧＰＵ１０４は、プロセッサ１０１からの命令に従って、画像をモニタ２１の画面に表示させる。ＧＰＵ１０４は、グラフィックコントローラと呼ばれることもある。モニタ２１としては、有機ＥＬ（Electro Luminescence）を用いた表示装置や液晶表示装置などがある。 A monitor 21 is connected to the GPU 104 . The GPU 104 displays an image on the screen of the monitor 21 according to instructions from the processor 101 . GPU 104 is sometimes called a graphics controller. Examples of the monitor 21 include a display device using an organic EL (Electro Luminescence), a liquid crystal display device, and the like.

入力インタフェース１０５には、キーボード２２とマウス２３とが接続されている。入力インタフェース１０５は、キーボード２２やマウス２３から送られてくる信号をプロセッサ１０１に送信する。なお、マウス２３は、ポインティングデバイスの一例であり、他のポインティングデバイスを使用することもできる。他のポインティングデバイスとしては、タッチパネル、タブレット、タッチパッド、トラックボールなどがある。 A keyboard 22 and a mouse 23 are connected to the input interface 105 . The input interface 105 transmits signals sent from the keyboard 22 and mouse 23 to the processor 101 . Note that the mouse 23 is an example of a pointing device, and other pointing devices can also be used. Other pointing devices include touch panels, tablets, touchpads, trackballs, and the like.

光学ドライブ装置１０６は、レーザ光などを利用して、光ディスク２４に記録されたデータの読み取りを行う。光ディスク２４は、光の反射によって読み取り可能なようにデータが記録された可搬型の記録媒体である。光ディスク２４には、ＤＶＤ（Digital Versatile Disc）、ＤＶＤ－ＲＡＭ、ＣＤ－ＲＯＭ（Compact Disc Read Only Memory）、ＣＤ－Ｒ（Recordable）／ＲＷ（ReWritable）などがある。 The optical drive device 106 reads data recorded on the optical disc 24 using laser light or the like. The optical disc 24 is a portable recording medium on which data is recorded so as to be readable by light reflection. The optical disc 24 includes DVD (Digital Versatile Disc), DVD-RAM, CD-ROM (Compact Disc Read Only Memory), CD-R (Recordable)/RW (ReWritable), and the like.

機器接続インタフェース１０７は、データ管理サーバ１００に周辺機器を接続するための通信インタフェースである。例えば機器接続インタフェース１０７には、メモリ装置２５やメモリリーダライタ２６を接続することができる。メモリ装置２５は、機器接続インタフェース１０７との通信機能を搭載した記録媒体である。メモリリーダライタ２６は、メモリカード２７へのデータの書き込み、またはメモリカード２７からのデータの読み出しを行う装置である。メモリカード２７は、カード型の記録媒体である。 The device connection interface 107 is a communication interface for connecting peripheral devices to the data management server 100 . For example, the device connection interface 107 can be connected to the memory device 25 and the memory reader/writer 26 . The memory device 25 is a recording medium equipped with a communication function with the device connection interface 107 . The memory reader/writer 26 is a device that writes data to the memory card 27 or reads data from the memory card 27 . The memory card 27 is a card-type recording medium.

ネットワークインタフェース１０８は、ネットワーク２０に接続されている。ネットワークインタフェース１０８は、ネットワーク２０を介して、他のコンピュータまたは通信機器との間でデータの送受信を行う。 Network interface 108 is connected to network 20 . Network interface 108 transmits and receives data to and from other computers or communication devices via network 20 .

データ管理サーバ１００は、以上のようなハードウェア構成によって、第２の実施の形態の処理機能を実現することができる。なおデータ登録サーバ２００，３００および端末装置４００，５００も、データ管理サーバ１００と同様のハードウェアにより実現することができる。さらに図１に示したデータ登録装置１、サーバ２、およびデータ利用装置３も、データ管理サーバ１００と同様のハードウェアにより実現することができる。 The data management server 100 can implement the processing functions of the second embodiment with the hardware configuration described above. The data registration servers 200 and 300 and the terminal devices 400 and 500 can also be realized by hardware similar to that of the data management server 100. Further, the data registration device 1, the server 2, and the data utilization device 3 shown in FIG. 1 can also be realized by hardware similar to that of the data management server 100.

データ管理サーバ１００は、例えばコンピュータ読み取り可能な記録媒体に記録されたプログラムを実行することにより、第２の実施の形態の処理機能を実現する。データ管理サーバ１００に実行させる処理内容を記述したプログラムは、様々な記録媒体に記録しておくことができる。例えば、データ管理サーバ１００に実行させるプログラムをストレージ装置１０３に格納しておくことができる。プロセッサ１０１は、ストレージ装置１０３内のプログラムの少なくとも一部をメモリ１０２にロードし、プログラムを実行する。またデータ管理サーバ１００に実行させるプログラムを、光ディスク２４、メモリ装置２５、メモリカード２７などの可搬型記録媒体に記録しておくこともできる。可搬型記録媒体に格納されたプログラムは、例えばプロセッサ１０１からの制御により、ストレージ装置１０３にインストールされた後、実行可能となる。またプロセッサ１０１が、可搬型記録媒体から直接プログラムを読み出して実行することもできる。 The data management server 100 implements the processing functions of the second embodiment by executing a program recorded in a computer-readable recording medium, for example. A program describing the contents of processing to be executed by the data management server 100 can be recorded in various recording media. For example, a program to be executed by the data management server 100 can be stored in the storage device 103 . The processor 101 loads at least part of the program in the storage device 103 into the memory 102 and executes the program. The program to be executed by the data management server 100 can also be recorded in a portable recording medium such as the optical disk 24, memory device 25, memory card 27, or the like. A program stored in a portable recording medium can be executed after being installed in the storage device 103 under the control of the processor 101, for example. Alternatively, the processor 101 can read and execute the program directly from the portable recording medium.

次に、暗号データを格納するＤＢ（秘匿化ＤＢ）に対する頻度分析攻撃について説明する。
図４は、頻度分析攻撃の一例を示す図である。平文のデータを記憶するＤＢ３１に基づいて、秘匿化ＤＢ３２が生成されているものとする。例えばＤＢ３１の各レコードに含まれる項目ごとの項目値が個別に暗号化され、秘匿化ＤＢ３２に格納されている。なお第２の実施の形態では、各項目に設定される項目値をキーワードと呼ぶこともある。 Next, a frequency analysis attack on a DB (anonymized DB) that stores encrypted data will be described.
FIG. 4 is a diagram showing an example of a frequency analysis attack. It is assumed that an anonymization DB 32 is generated based on the DB 31 that stores plaintext data. For example, an item value for each item included in each record of the DB 31 is individually encrypted and stored in the anonymization DB 32 . Note that in the second embodiment, the item value set for each item may be called a keyword.

図４の例では、同一の平文からは同一の暗号データが生成される確定的暗号化技術によって暗号化が行われているものとする。また攻撃者３３は、元のＤＢ３１の項目に用いられている項目名（「診療科」、「性別」、「年齢層」）および各項目に格納され得るキーワードの候補（キーワード一覧）のすべてを知っているものとする。 In the example of FIG. 4, it is assumed that encryption is performed by a deterministic encryption technique that generates the same encrypted data from the same plaintext. In addition, the attacker 33 obtains all the item names (“medical department”, “sex”, “age group”) used in the items of the original DB 31 and keyword candidates (keyword list) that can be stored in each item. assume you know.

攻撃者３３は、暗号文の出現頻度と公知の頻度情報を比較して、暗号文に対応する平文を類推することができる。例えば秘匿化ＤＢ３２の２列目には、値が２種類しか登録されていない。そのため攻撃者３３は、秘匿化ＤＢ３２の２列目の項目「性別」に該当すると類推することができる。 The attacker 33 can infer the plaintext corresponding to a ciphertext by comparing the appearance frequency of the ciphertexts with publicly known frequency information. For example, only two kinds of values are registered in the second column of the anonymization DB 32. The attacker 33 can therefore infer that the second column of the anonymization DB 32 corresponds to the item “sex”.

さらに攻撃者３３は、秘匿化ＤＢ３２の２列目の値「ｔｍｕｅｓｆ」と「ｕｙｌｇｍ」との出現頻度を比較し、「ｕｙｌｇｍ」の方が多いことを確認できる。すると攻撃者３３は、人口分布を考えると出現頻度が多い方の「ｕｙｌｇｍ」が、「女」に対応する暗号データであると類推できる。 Furthermore, the attacker 33 can compare the frequency of appearance of the values "tmuesf" and "uylgm" in the second column of the anonymization DB 32 and confirm that "uylgm" is more common. Then, the attacker 33 can infer that "uylgm", which appears more frequently, is the encrypted data corresponding to "woman", considering the population distribution.

攻撃者３３は、複数の項目値の組み合わせによっても内容を類推できる。例えば攻撃者３３は、秘匿化ＤＢ３２の１列目の値「ｓＥｆｇｓｒ」に対応する２列目の値は「ｕｙｌｇｍ」のみであることを確認できる。すると攻撃者３３は、片方の性別との組み合わせしかない値は、診療科における「婦人科」であると類推することができ、「ｕｙｌｇｍ」の平文が「女」であることを強く確信できる。 The attacker 33 can also infer the contents by combining multiple item values. For example, the attacker 33 can confirm that the second column value corresponding to the first column value "sEfgsr" in the anonymization DB 32 is only "uylgm". Then, the attacker 33 can infer that the value with only one gender combination is "Gynecology" in the clinical department, and can be strongly convinced that the plaintext of "uylgm" is "female".

また攻撃者３３は、秘匿化ＤＢ３２の１列目の値「：ｏｐｆｙｙ」に対応する３列目の値は「ｊｒ８ｏｌｔ」のみであることを確認できる。すると攻撃者３３は、残りの３列目は年齢層であり、特定の年齢層との組み合わせしかない診療科は「小児科」であり、対応する年齢層は「児童」であると類推できる。 Also, the attacker 33 can confirm that the third column value corresponding to the first column value ":opfyy" in the anonymization DB 32 is only "jr8olt". Then, the attacker 33 can infer that the remaining third column is the age group, the clinical department that is only combined with a specific age group is "Pediatrics", and the corresponding age group is "Children".
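
The two inferences walked through for FIG. 4 (rank matching on frequency, and values that co-occur with only one partner value) can be sketched as follows. This is only an illustration under assumptions: the rows, the ciphertext "qqqqq", and the public frequency figures are invented for the example, and the helper names are hypothetical.

```python
from collections import Counter, defaultdict


def guess_by_frequency(cipher_column, public_frequency):
    """Pair ciphertexts with plaintexts by frequency rank (most frequent first)."""
    cipher_ranked = [c for c, _ in Counter(cipher_column).most_common()]
    plain_ranked = sorted(public_frequency, key=public_frequency.get, reverse=True)
    return dict(zip(cipher_ranked, plain_ranked))


def single_partner_values(rows, left, right):
    """Return left-column ciphertexts that co-occur with exactly one
    right-column ciphertext, together with that partner."""
    partners = defaultdict(set)
    for row in rows:
        partners[row[left]].add(row[right])
    return {k: next(iter(v)) for k, v in partners.items() if len(v) == 1}


# Illustrative rows of (department, sex) ciphertexts; the data is invented.
rows = [
    ("sEfgsr", "uylgm"),
    ("sEfgsr", "uylgm"),
    ("qqqqq", "uylgm"),
    ("qqqqq", "tmuesf"),
    ("qqqqq", "tmuesf"),
]
# Assumed public knowledge: "female" is the more frequent value.
public_sex_frequency = {"female": 0.6, "male": 0.4}

sex_guess = guess_by_frequency([r[1] for r in rows], public_sex_frequency)
one_sided = single_partner_values(rows, left=0, right=1)
print(sex_guess)
print(one_sided)
```

A department ciphertext that appears in `one_sided` is a candidate for a single-sex department such as gynecology, which in turn corroborates the frequency-based guess for the sex column.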

なお、図４の例では、確定的暗号化によって暗号化された場合を想定しているが、確率的暗号化で暗号化された秘匿化ＤＢに対しても頻度分析攻撃は可能である。確率的暗号化とは、同じ平文でも暗号化するたびに異なる暗号文に暗号化する暗号化技術である。確率的暗号化を行うと、同じ平文から生成された複数の暗号文のそれぞれの値が異なるため、暗号化ＤＢ内を参照しただけでは、元の平文の出現頻度を数えられない。そのため、確率的暗号化で暗号することで、確定的暗号化よりも頻度分析攻撃が困難となる。 In the example of FIG. 4, it is assumed that the DB is encrypted by deterministic encryption, but a frequency analysis attack is also possible for an anonymized DB encrypted by probabilistic encryption. Probabilistic encryption is an encryption technology that encrypts the same plaintext into different ciphertext each time it is encrypted. When probabilistic encryption is performed, multiple ciphertexts generated from the same plaintext have different values, so the appearance frequency of the original plaintext cannot be counted simply by referring to the encryption DB. Therefore, encryption using probabilistic encryption makes frequency analysis attacks more difficult than deterministic encryption.

ただし、確率的暗号化を用いても、秘匿化ＤＢを検索対象とする秘匿化検索を許容すると、攻撃者３３は、検索クエリに一致した暗号文はすべて同じ平文の暗号文であることが分かり、それらの暗号文は確定的暗号に変換できてしまう。そのため確率的暗号化を用いても、攻撃者３３は、データが活用されるに従って秘匿化ＤＢ内の多数の暗号文を確定的暗号に変換でき、確定的暗号化で暗号化した場合と同様に頻度分析攻撃が可能となる。 However, even with probabilistic encryption, if anonymized searches over the anonymization DB are allowed, the attacker 33 learns that all ciphertexts matching a search query are ciphertexts of the same plaintext, and those ciphertexts can in effect be converted into deterministic ciphertexts. Thus, even with probabilistic encryption, the attacker 33 can convert more and more of the ciphertexts in the anonymization DB into deterministic ciphertexts as the data is used, and a frequency analysis attack becomes possible just as when deterministic encryption is used.
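
The way search hits undo probabilistic encryption can be sketched as below. This is a simulation under assumptions, not the searchable encryption scheme itself: random tokens stand in for probabilistic ciphertexts, and `search` stands in for the server's encrypted matching, which the attacker can observe but not invert. All names are hypothetical.

```python
import secrets
from collections import Counter


def make_db(plaintexts):
    """Give every record a fresh random token, so equal plaintexts look different.
    `hidden` models what only the matching routine can use."""
    hidden = {secrets.token_hex(8): p for p in plaintexts}
    return list(hidden), hidden


def search(hidden, keyword):
    """Stand-in for encrypted matching: ids of records whose plaintext equals keyword."""
    return [cid for cid, p in hidden.items() if p == keyword]


def label_by_hits(hit_lists):
    """Attacker's view: label each ciphertext with the number of the query that hit it.
    Ciphertexts sharing a label share a plaintext, as with a deterministic cipher."""
    labels = {}
    for query_no, hits in enumerate(hit_lists):
        for cid in hits:
            labels[cid] = query_no
    return labels


ciphertexts, hidden = make_db(["female", "female", "male", "female"])
# The attacker observes the hit lists of two encrypted queries.
observed = [search(hidden, "female"), search(hidden, "male")]
labels = label_by_hits(observed)
label_counts = Counter(labels.values())
print(label_counts)
```

Once the labels exist, the counts per label are exactly the frequencies a deterministic cipher would have leaked directly.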

このように頻度分析攻撃は、平文と暗号文に使用される文字や文字列の出現頻度を手掛りとして平文を類推し、盗み見る攻撃手法である。そして、元の平文の項目値に偏りがある場合に、その平文を暗号化した暗号文は、頻度分析攻撃に対して脆弱となる。 In this way, the frequency analysis attack is an attack technique in which the appearance frequency of characters and character strings used in plaintext and ciphertext is used as a clue to infer the plaintext and to steal a glance. If the item values of the original plaintext are biased, the ciphertext obtained by encrypting the plaintext becomes vulnerable to frequency analysis attacks.

そこで頻度分析攻撃に対する安全性を高めるために、ダミーデータを用いて、元の平文の出現頻度を攪乱させることが考えられる。このとき、元の平文での項目値の種類や、各項目値の出現頻度を考慮せずにダミーデータ内の各項目の項目値（ダミー値）を追加しても、出現頻度を適切に攪乱させることはできない。 Therefore, to increase security against frequency analysis attacks, it is conceivable to use dummy data to disturb the appearance frequency of the original plaintext. However, if the item values (dummy values) of each item in the dummy data are added without considering the types of item values in the original plaintext and the appearance frequency of each item value, the appearance frequencies cannot be disturbed appropriately.

例えば元の平文のデータにおける項目値の出現頻度を考慮して、適切な内容のダミー値を有するダミーデータを登録することで、確実な頻度攪乱を実現することができる。この場合、データ登録サーバ２００，３００は、真のデータの頻度分布と同じ分布であるが、それぞれの値が別の値に変換されたダミーレコード群を生成する。このダミーレコード群の各ダミーレコードは真のデータのレコードに紐付き、１対１で対応する。そしてデータ登録サーバ２００，３００は、真のデータのレコードの項目値を別の項目値に変換して対応するダミーレコードのダミー値として設定する。この変換規則を定めたものを変換集合と呼ぶ。 For example, reliable frequency disturbance can be achieved by registering dummy data whose dummy values are chosen appropriately in light of the appearance frequency of the item values in the original plaintext data. In this case, the data registration servers 200 and 300 generate dummy record groups that have the same frequency distribution as the true data but in which each value has been converted to a different value. Each dummy record in a dummy record group is linked to, and corresponds one-to-one with, a record of the true data. The data registration servers 200 and 300 convert the item value of a true data record into another item value and set it as the dummy value of the corresponding dummy record. The definition of this conversion rule is called a conversion set.

以下に、第２の実施の形態に使用する文字または用語の意味について説明する。
「ｋ」は、予め設定されたセキュリティパラメータであり、２以上の整数が設定される。システムの管理者は、１つの項目の複数のキーワードのうち、頻度分析攻撃を受けたときに攻撃者３３が絞り込むことを許容可能な最小のキーワード数をｋに設定する。 The meanings of characters and terms used in the second embodiment will be explained below.
“k” is a preset security parameter and is set to an integer of 2 or more. The system administrator sets k to the smallest number of keywords, among the keywords of one item, to which it is acceptable for the attacker 33 to narrow the candidates down in a frequency analysis attack.

「真のデータ」は、平文のＤＢに格納されているデータである。「真のデータのレコード」は、真のデータ内のキーワードを含むレコードである。なお真のデータのレコードは、第１の実施の形態の秘匿レコードの一例である。「ダミー値」は、真のデータの頻度分布を攪乱するために追加した値である。「ダミーレコード」は、ダミー値を含むレコードである。「真の値」は、平文のＤＢに設定可能なキーワードである。真の値は、第１の実施の形態における第１項目値の一例である。 "True data" is data stored in the plaintext DB. A "true data record" is a record that contains keywords in the true data. Note that the true data record is an example of the confidential record of the first embodiment. A "dummy value" is a value added to disturb the frequency distribution of true data. A "dummy record" is a record that contains dummy values. "True value" is a keyword that can be set in the plaintext DB. A true value is an example of the first item value in the first embodiment.

「Ｇ」は群数であり、ダミーデータの追加によりデータ量が平文データの何倍になるかを示す値でもある。「群」は、複数のレコードの集合である。真のデータのレコードの集合が群「０」である。真のデータのレコード数と同数のダミーレコードを含むＧ－１個の群「１，２，・・・，Ｇ－１」が生成される。群「０」に含まれる１つのレコードに対し、Ｇ－１個のダミーレコードが生成され、Ｇ－１個のダミーレコードは、それぞれ群「１，２，・・・，Ｇ－１」に１つずつ追加される。「フラグ」は、端末装置４００，５００がダミーレコードを見分けられるように、ダミーレコードに付与される値である。 “G” is the number of groups, and is also the factor by which the data volume grows relative to the plaintext data when dummy data is added. A “group” is a set of records. The set of true data records is group “0”. G−1 groups “1, 2, ..., G−1” are generated, each containing the same number of dummy records as there are true data records. For each record in group “0”, G−1 dummy records are generated, and these are added one each to groups “1, 2, ..., G−1”. A “flag” is a value attached to a dummy record so that the terminal devices 400 and 500 can identify it as a dummy record.

「変換集合」は、各列（項目）について、真の値、分割キーワード（真の値から１つまたは複数生成されたキーワード）、またはダミー要素を含む集合である。変換集合には、１つの真の値に対してどのような値のダミーレコードを追加するかを決定するためのルールが表されている。変換集合の要素数は群数Ｇと等しい。変換集合は、分割クエリの攪乱や、秘匿化検索結果の復元にも用いられる。 A “conversion set” is, for each column (item), a set containing true values, split keywords (one or more keywords generated from a true value), or dummy elements. A conversion set expresses the rule for determining what values the dummy records added for one true value should have. The number of elements in a conversion set equals the number of groups G. Conversion sets are also used to disturb split queries and to restore anonymized search results.
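
How groups and a conversion set (変換集合) can work together may be easier to see in a small sketch. The text here does not fix the concrete conversion rule, so the cyclic shift below is an assumption made purely for illustration: the dummy record in group g receives the element g positions after the true value within the conversion set. The function name `expand_with_dummies` and the sample data are hypothetical.

```python
from collections import Counter


def expand_with_dummies(true_values, conversion_set):
    """Return G groups of item values. Group 0 holds the true values; groups
    1..G-1 hold dummy values obtained by a cyclic shift inside the conversion
    set (an assumed rule, not one stated in the text)."""
    G = len(conversion_set)
    index = {value: i for i, value in enumerate(conversion_set)}
    groups = [[] for _ in range(G)]
    for value in true_values:
        i = index[value]
        for g in range(G):
            groups[g].append(conversion_set[(i + g) % G])
    return groups


# Conversion set with G = 4 elements; "D1" is a dummy element.
conversion_set = ["normal", "low blood pressure", "high blood pressure", "D1"]
# Skewed true data (invented): three "normal" records and one "high blood pressure".
true_values = ["normal", "normal", "normal", "high blood pressure"]

groups = expand_with_dummies(true_values, conversion_set)
hit_counts = Counter(value for group in groups for value in group)
print(hit_counts)
```

Under this assumed rule, each true record and its G-1 dummies together cover every element of the conversion set once, so all elements of the set end up with the same number of hits, and each dummy group keeps the shape of the true frequency distribution with the values permuted.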

「ｊ」は列番号である。ｊは、レコードに登録された値の項目を特定するインデックスとして用いられる。「Ｘ_j」は、レコード内のｊ番目の列のフィールドに設定可能な項目値（キーワード）の種類数である。 "j" is the column number. j is used as an index specifying the item of the value registered in the record. “X _j ” is the number of types of item values (keywords) that can be set in the field of the j-th column in the record.

「ダミー要素」は、Ｘ_jがＧより小さいかまたはＧの倍数でない場合、変換集合に含める要素の不足を埋めるためのダミーの要素である。ダミー要素は、キーワード一覧には含まれない値である。 A "dummy element" is a dummy element to fill in the missing elements for inclusion in the transform set when X _j is less than G or not a multiple of G. A dummy element is a value that is not included in the keyword list.

次に、元の平文のデータにおける項目値の出現頻度を考慮したダミーデータの登録手法について説明する。 Next, a method of registering dummy data that takes into account the appearance frequency of item values in the original plaintext data will be described.

ダミーデータの登録手法の一例として、ダミー要素を用いる方法が考えられる。ダミー要素を用いる場合、データ登録サーバ２００，３００は、各項目の取り得るキーワードまたはダミー値から任意にＧ個選んだものを要素として変換集合を生成する。そしてデータ登録サーバ２００，３００は、各項目の取り得るキーワードがいずれか１つの変換集合の要素として含まれるまで変換集合を生成する。 As an example of a method of registering dummy data, a method using dummy elements is conceivable. When dummy elements are used, the data registration servers 200 and 300 generate a conversion set whose elements are G values arbitrarily selected from the possible keywords of each item or from dummy values. The data registration servers 200 and 300 keep generating conversion sets until every possible keyword of each item is included as an element of one of the conversion sets.

図５は、ダミー要素を用いた変換集合の生成例を示す図である。図５の例では、真のデータのレコードは、「血圧」と「血液型」の項目が含まれているものとする。キーワードリスト３４には、項目ごとに、その項目に設定可能なキーワードが登録されている。図５の例では、「血圧」に設定可能な項目値が３種類（正常、低血圧、高血圧）であり、「血液型」に設定可能な項目値が８種類（Ａ＋、Ｂ＋、Ｏ＋、ＡＢ＋、Ａ－、Ｂ－、Ｏ－、ＡＢ－）である。 FIG. 5 is a diagram illustrating an example of generating conversion sets using dummy elements. In the example of FIG. 5, the true data records are assumed to contain the items "blood pressure" and "blood type". The keyword list 34 registers, for each item, the keywords that can be set for that item. In the example of FIG. 5, three item values (normal, hypotension, hypertension) can be set for "blood pressure", and eight item values (A+, B+, O+, AB+, A-, B-, O-, AB-) can be set for "blood type".

図５の変換集合一覧３５には、ｋ＝３，Ｇ＝４の場合に、３種類のキーワードをもつ項目「血圧」と８種類のキーワードをもつ項目「血液型」の２項目それぞれに対応して生成された変換集合３５ａ，３５ｂ，３５ｃが示されている。項目「血圧」に対しては、｛正常、低血圧、高血圧、Ｄ１｝を要素として含む変換集合３５ａが１つだけ生成されている。「Ｄ１」は、ダミー要素である。項目「血液型」に対しては、｛Ａ＋，Ｂ＋，Ｏ＋，ＡＢ＋｝を要素として含む変換集合３５ｂと、｛Ａ－，Ｂ－，Ｏ－，ＡＢ－｝を要素として含む変換集合３５ｃとの２つが生成されている。 The conversion set list 35 of FIG. 5 shows the conversion sets 35a, 35b, and 35c generated, for k=3 and G=4, for the two items "blood pressure" (three keywords) and "blood type" (eight keywords). For the item "blood pressure", only one conversion set 35a is generated, containing {normal, hypotension, hypertension, D1} as elements. "D1" is a dummy element. For the item "blood type", two conversion sets are generated: the conversion set 35b containing {A+, B+, O+, AB+} as elements and the conversion set 35c containing {A-, B-, O-, AB-} as elements.

変換集合３５ａ，３５ｂ，３５ｃは、循環リスト構造となっており、要素が順番に配置されている。変換集合３５ａ，３５ｂ，３５ｃ内の各要素には、先頭から順に、０から昇順の要素番号が付与される。例えば変換集合３５ａでは、図中の左端を先頭とすると、「正常」が先頭の要素（要素番号「０」）であり、「低血圧」が次の要素（要素番号「１」）である。循環リスト構造となっているため、例えば最後の要素「Ｄ１」の次の要素は「正常」となる。 The conversion sets 35a, 35b, and 35c have a circular list structure, and the elements are arranged in order. Each element in the transformation sets 35a, 35b, and 35c is assigned an element number in ascending order from 0, starting from the top. For example, in the conversion set 35a, "normal" is the first element (element number "0") and "hypotension" is the next element (element number "1"), starting from the left end in the figure. Because of the circular list structure, for example, the element following the last element "D1" is "normal".
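The construction of conversion sets for FIG. 5 can be sketched as follows. This is a minimal illustration, not the procedure defined by the source: the function name `build_conversion_sets`, the even contiguous split of keywords across sets, and the dummy naming scheme `D1, D2, ...` are all assumptions (the source only says that G elements are chosen arbitrarily from keywords or dummy values).

```python
import math

def build_conversion_sets(keywords, G):
    """Split keywords into conversion sets of exactly G elements each.

    Assumption: keywords are spread evenly over ceil(X_j / G) sets and the
    remaining slots are padded with dummy elements named D1, D2, ...
    """
    if not keywords:
        return []
    n_sets = math.ceil(len(keywords) / G)
    base, extra = divmod(len(keywords), n_sets)
    sets, pos, dummy_no = [], 0, 0
    for i in range(n_sets):
        size = base + (1 if i < extra else 0)
        chunk = list(keywords[pos:pos + size])
        pos += size
        while len(chunk) < G:          # pad with dummy elements
            dummy_no += 1
            chunk.append(f"D{dummy_no}")
        sets.append(chunk)
    return sets

def next_element(conversion_set, element_number):
    """Circular list behaviour: the element after the last one is the first."""
    return conversion_set[(element_number + 1) % len(conversion_set)]

blood_pressure = build_conversion_sets(["normal", "hypotension", "hypertension"], G=4)
blood_type = build_conversion_sets(["A+", "B+", "O+", "AB+", "A-", "B-", "O-", "AB-"], G=4)
```

With these inputs the sketch yields one set for "blood pressure" and two sets for "blood type", and `next_element(blood_pressure[0], 3)` wraps around from "D1" to "normal".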

データ登録サーバ２００，３００は、真のデータのレコードのキーワードを、変換集合３５ａ，３５ｂ，３５ｃに基づいて変換することで、ダミーレコードに設定するダミー値を生成する。例えばデータ登録サーバ２００，３００は、真のデータのレコードの１つのキーワードに基づいて、ダミーレコード群ごとに、そのダミーレコード群内のダミーレコードに設定するダミー値を１つ生成する。 The data registration servers 200 and 300 generate dummy values to be set in dummy records by converting the keywords of the true data records based on the conversion sets 35a, 35b, and 35c. For example, the data registration servers 200 and 300 generate one dummy value to be set in the dummy record in each dummy record group based on one keyword of the true data record.

図６は、ダミー要素を用いた場合の項目値の変換例を示す図である。例えばデータ登録サーバ２００，３００は、真のデータのレコードについて、項目ごとに登録されているキーワードを参照し、そのキーワードが含まれる変換集合と、そのキーワードに該当する要素ｂ_aを特定する。ａは０以上の整数の要素番号であり、要素ｂ_aは要素番号ａの要素である。そしてデータ登録サーバ２００，３００は、真のデータに含まれるキーワードの要素ｂ_aに基づいて、以下の式（１）により、群番号ｇのダミーレコード群のダミー値ｃ_a,gを、変換集合を参照して決定する。 FIG. 6 is a diagram showing an example of conversion of item values when dummy elements are used. For example, for a true data record, the data registration servers 200 and 300 refer to the keyword registered for each item, and identify the conversion set containing that keyword and the element b_a corresponding to it. Here a is an element number (an integer of 0 or more), and b_a is the element with element number a. Then, based on the element b_a of the keyword included in the true data, the data registration servers 200 and 300 refer to the conversion set and determine the dummy value c_{a,g} for the dummy record group with group number g by the following equation (1).
c_{a,g} = b_{mod(a+g, G)}   (1)
図６の例では、真のデータのレコードに出現する項目値は「低血圧」であるものとする。ダミーレコード群が３つである（Ｇ＝４）ため、データ登録サーバ２００，３００は、キーワード「低血圧」を含む変換集合３５ａに基づいて、３つのダミーレコード群（ｇ＝１，２，３）それぞれのダミーレコードに設定するダミー値を生成する。まずデータ登録サーバ２００，３００は、キーワード「低血圧」に対応する要素の変換集合３５ａ内での要素番号「１」（ａ＝１）を取得する。 In the example of FIG. 6, the item value appearing in the true data record is assumed to be "hypotension". Since there are three dummy record groups (G=4), the data registration servers 200 and 300 generate, based on the conversion set 35a containing the keyword "hypotension", the dummy values to be set in the dummy records of each of the three dummy record groups (g=1, 2, 3). First, the data registration servers 200 and 300 acquire the element number "1" (a=1) that the element corresponding to the keyword "hypotension" has in the conversion set 35a.

データ登録サーバ２００，３００は、群番号「１」（ｇ＝１）のダミーレコード群のダミー値ｃ_1,1を生成する場合、まず、ｍｏｄ（１＋１，４）＝２を計算する。そしてデータ登録サーバ２００，３００は、計算して得られた値「２」を要素番号とする変換集合３５ａ内の要素ｂ₂に対応するキーワード「高血圧」を、群番号「１」のダミーレコード群に設定するダミー値に決定する。 When generating the dummy value c_{1,1} for the dummy record group with group number "1" (g=1), the data registration servers 200 and 300 first calculate mod(1+1, 4) = 2. They then take the keyword "hypertension", which corresponds to the element b_2 whose element number in the conversion set 35a is the calculated value "2", as the dummy value to be set in the dummy record group with group number "1".

データ登録サーバ２００，３００は、群番号「２」（ｇ＝２）のダミーレコード群のダミー値ｃ_1,2を生成する場合、まず、ｍｏｄ（１＋２，４）＝３を計算する。そしてデータ登録サーバ２００，３００は、計算して得られた値「３」を要素番号とする変換集合３５ａ内の要素ｂ₃に対応するダミー要素「Ｄ１」を、群番号「２」のダミーレコード群に設定するダミー値に決定する。 When generating the dummy value c_{1,2} for the dummy record group with group number "2" (g=2), the data registration servers 200 and 300 first calculate mod(1+2, 4) = 3. They then take the dummy element "D1", which corresponds to the element b_3 whose element number in the conversion set 35a is the calculated value "3", as the dummy value to be set in the dummy record group with group number "2".

データ登録サーバ２００，３００は、群番号「３」（ｇ＝３）のダミーレコード群のダミー値ｃ_1,3を生成する場合、まず、ｍｏｄ（１＋３，４）＝０を計算する。そしてデータ登録サーバ２００，３００は、計算して得られた値「０」を要素番号とする変換集合３５ａ内の要素ｂ₀に対応するキーワード「正常」を、群番号「３」のダミーレコード群に設定するダミー値に決定する。 When generating the dummy value c_{1,3} for the dummy record group with group number "3" (g=3), the data registration servers 200 and 300 first calculate mod(1+3, 4) = 0. They then take the keyword "normal", which corresponds to the element b_0 whose element number in the conversion set 35a is the calculated value "0", as the dummy value to be set in the dummy record group with group number "3".
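The three calculations above are instances of equation (1) and can be reproduced with a short sketch. The function name `dummy_values` and the English keyword strings are illustrative assumptions; the conversion set is the one shown for "blood pressure" in FIG. 5.

```python
def dummy_values(conversion_set, keyword):
    """Apply equation (1): c_{a,g} = b_{mod(a+g, G)} for g = 1 .. G-1.

    Returns the dummy values for dummy record groups 1, 2, ..., G-1 in order.
    """
    G = len(conversion_set)            # number of elements equals the number of groups
    a = conversion_set.index(keyword)  # element number of the true keyword
    return [conversion_set[(a + g) % G] for g in range(1, G)]

conversion_set_35a = ["normal", "hypotension", "hypertension", "D1"]
result = dummy_values(conversion_set_35a, "hypotension")
print(result)  # ['hypertension', 'D1', 'normal']
```

Because the shift by g is a rotation of the circular list, each group maps the elements of a conversion set onto one another one-to-one.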

なお、図６に示したようなキーワードの変換元と変換先との関係は、全単射の関係の一例である。全単射関係を満たしていれば、図６に示す例とは別の規則で変換元のキーワードと変換先のキーワードまたはダミー要素との関係を定義してもよい。 Note that the relationship between the conversion source and the conversion destination of a keyword as shown in FIG. 6 is an example of a bijective relationship. As long as the bijective relationship is satisfied, the relationship between the conversion-source keyword and the conversion-destination keyword or dummy element may be defined by rules other than the example shown in FIG.

データ登録サーバ２００，３００は、図６に示したようなダミー値の生成を、真のデータに出現する項目値それぞれについて行う。なお、以下の説明では、ある平文に対応する暗号文をＨ（平文）と表すこととする。 The data registration servers 200 and 300 generate dummy values as shown in FIG. 6 for each item value appearing in the true data. In the following description, a ciphertext corresponding to a certain plaintext is represented as H (plaintext).

図７は、ダミー要素を用いた場合のダミー値の生成例を示す図である。図７の例では、真のデータ３６には、項目「血圧」のキーワードとして、「正常」が２回出現し、「低血圧」と「高血圧」とがそれぞれ１回ずつ出現している。また項目「血液型」の項目値として、「Ａ＋」が２回出現し、「Ｏ－」と「ＡＢ－」とがそれぞれ１回ずつ出現している。 FIG. 7 is a diagram illustrating an example of generating dummy values when dummy elements are used. In the example of FIG. 7, in the true data 36, "normal" appears twice as a keyword for the item "blood pressure", and "hypotension" and "hypertension" appear once each. As the item value of the item "blood type", "A+" appears twice, and "O-" and "AB-" appear once each.

真のデータ３６に出現するキーワードそれぞれに基づいて、ダミーレコード群それぞれに設定するダミー値が生成されている。例えば血圧「正常」に基づいて、群番号１（ｇ＝１）のダミーレコード群用のダミー値「低血圧」、群番号２（ｇ＝２）のダミーレコード群用のダミー値「高血圧」、群番号３（ｇ＝３）のダミーレコード群用のダミー値「Ｄ１」が生成されている。また例えば血液型「Ａ＋」に基づいて、群番号１（ｇ＝１）のダミーレコード群用のダミー値「Ｂ＋」、群番号２（ｇ＝２）のダミーレコード群用のダミー値「Ｏ＋」、群番号３（ｇ＝３）のダミーレコード群用のダミー値「ＡＢ＋」が生成されている。 A dummy value to be set in each dummy record group is generated based on each keyword appearing in the true data 36 . For example, based on the blood pressure “normal”, the dummy value “hypotension” for the dummy record group of group number 1 (g=1), the dummy value “hypertension” for the dummy record group of group number 2 (g=2), A dummy value "D1" for a dummy record group with group number 3 (g=3) is generated. Also, for example, based on the blood type "A+", the dummy value "B+" for the dummy record group of group number 1 (g=1) and the dummy value "O+" for the dummy record group of group number 2 (g=2) , a dummy value "AB+" for a dummy record group with group number 3 (g=3) is generated.

データ登録サーバ２００，３００は、生成したダミー値をダミーレコードに割り当て、項目値を暗号化することで登録データを生成する。 The data registration servers 200 and 300 assign the generated dummy values to the dummy records and encrypt the item values to generate registration data.

図８は、ダミー要素を用いた場合の登録データの一例を示す図である。登録データ３７には、真のデータ３６のレコード群３７ａとダミーレコード群３７ｂ，３７ｃ，３７ｄが含まれている。登録データ３７内の各レコードにはフラグが付与されている。そして登録データ３７に含まれるキーワード（フラグ値も含む）が暗号化されている。 FIG. 8 is a diagram showing an example of registration data when dummy elements are used. The registration data 37 includes the record group 37a of the true data 36 and the dummy record groups 37b, 37c, and 37d. Each record in the registration data 37 is given a flag. The keywords (including flag values) included in the registration data 37 are encrypted.

フラグ値は、真のデータのレコードとダミーレコードとを識別するために用いられる。例えば真のデータのレコードにはフラグ値「０」を設定し、ダミーレコードにはフラグ値「１」を設定することができる。この場合、Ｇ≧３のときにはフラグ値「０」の暗号文（Ｈ（０））よりもフラグ値「１」の暗号文（Ｈ（１））の方が多くなる。すると、攻撃者３３は、フラグ値の暗号文の出現頻度に基づいて、そのレコードが真のデータのレコードなのかを判断できてしまう。そこでデータ登録サーバ２００，３００は、フラグ値の暗号文についても頻度攪乱を行う。 The flag value is used to distinguish between true data records and dummy records. For example, a true data record can be set with a flag value of "0", and a dummy record can be set with a flag value of "1". In this case, when G≧3, there are more ciphertexts (H(1)) with flag value “1” than ciphertexts (H(0)) with flag value “0”. Then, the attacker 33 can determine whether the record is a true data record based on the appearance frequency of the ciphertext of the flag value. Therefore, the data registration servers 200 and 300 perform frequency perturbation on the ciphertext of the flag value as well.

例えばデータ登録サーバ２００，３００は、図８に示すように各レコードが属する群の群番号を、そのレコードのフラグとして使用することができる。各群に属するレコード数は等しいため、これによりフラグ値の出現頻度がすべて等しくなる。 For example, the data registration servers 200 and 300 can use the group number of the group to which each record belongs as the flag of that record, as shown in FIG. Since the number of records belonging to each group is equal, this ensures that all flag values appear with equal frequency.

またデータ登録サーバ２００，３００は、フラグ値を決定するために、真のデータのレコード群およびダミーレコード群それぞれについて、同一の変数の値に対して異なる値を出力する異なる関数を用意してもよい。この場合、データ登録サーバ２００，３００は、用意した関数にＩＤ列の値を変数ｘとして入力した値をフラグ値とする。 In order to determine the flag value, the data registration servers 200 and 300 may prepare different functions for outputting different values for the same variable value for each of the true data record group and the dummy record group. good. In this case, the data registration servers 200 and 300 use the value obtained by inputting the value of the ID column as the variable x into the prepared function as the flag value.

群番号ｇの群（真のデータのレコード群またはダミーレコード群）のフラグ値生成関数をｆ_g（ｘ）としたとき、例えば以下のようなフラグ値生成関数を用いることができる。 Let f_g(x) be the flag value generation function of the group with group number g (the true data record group or a dummy record group). Then, for example, the following flag value generation function can be used.
f_g(x) = (G+1)x + g   (2)
図８の例ではＧ＝４である。従って、真のデータのレコード群３７ａ（群番号「０」）のフラグ値生成関数は「ｆ₀（ｘ）＝４ｘ＋０」となる。ダミーレコード群３７ｂ（群番号「１」）のフラグ値生成関数は「ｆ₁（ｘ）＝４ｘ＋１」となる。ダミーレコード群３７ｃ（群番号「２」）のフラグ値生成関数は「ｆ₂（ｘ）＝４ｘ＋２」となる。ダミーレコード群３７ｄ（群番号「３」）のフラグ値生成関数は「ｆ₃（ｘ）＝４ｘ＋３」となる。 In the example of FIG. 8, G=4. Therefore, the flag value generation function of the true data record group 37a (group number "0") is "f_0(x) = 4x+0". The flag value generation function of the dummy record group 37b (group number "1") is "f_1(x) = 4x+1". The flag value generation function of the dummy record group 37c (group number "2") is "f_2(x) = 4x+2". The flag value generation function of the dummy record group 37d (group number "3") is "f_3(x) = 4x+3".

端末装置４００，５００は、検索の際には、真のデータのレコード群３７ａまたはいずれか１つのダミーレコード群３７ｂ，３７ｃ，３７ｄを検索対象として指定した検索クエリを発行する。データ管理サーバ１００は、検索クエリに対する応答として、検索条件に合致したレコードを検索結果として応答する。そのときデータ管理サーバ１００は、検索結果にＩＤ列とフラグ列のフラグ値とを含める。端末装置４００，５００は、検索結果に示されるレコードのＩＤを、群ごとのフラグ値生成関数それぞれに入力し、得られた関数値を暗号化する。そして端末装置４００，５００は、フラグ値生成関数の関数値の暗号文とレコードのフラグ値（暗号文）と比較する。端末装置４００，５００は、比較の結果が一致したフラグ値生成関数に対応する群が、そのレコードが属する群であると判断する。 When searching, the terminal devices 400 and 500 issue a search query specifying the true data record group 37a or any one of the dummy record groups 37b, 37c, and 37d as a search target. As a response to the search query, the data management server 100 responds with records matching the search conditions as search results. At that time, the data management server 100 includes the ID column and the flag value of the flag column in the search result. The terminal devices 400 and 500 input the ID of the record shown in the search result to each flag value generation function for each group, and encrypt the obtained function value. Terminal devices 400 and 500 then compare the ciphertext of the function value of the flag value generating function with the flag value (ciphertext) of the record. The terminal devices 400 and 500 determine that the group corresponding to the flag value generation function with which the comparison results match is the group to which the record belongs.
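The group identification performed by the terminal devices can be sketched as follows. The sketch uses the four example functions listed for FIG. 8 (f_g(x) = 4x + g). The function names and the use of SHA-256 as the encryption H are assumptions made only so that the example runs; the source does not specify the encryption scheme, and a keyless hash is merely a deterministic stand-in for it.

```python
import hashlib

G = 4  # number of groups in the FIG. 8 example

def flag_value(g, record_id):
    """FIG. 8 example flag value generation functions: f_g(x) = 4x + g."""
    return 4 * record_id + g

def H(value):
    """Stand-in for the deterministic encryption H(plaintext) (assumption)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()

def identify_group(record_id, encrypted_flag):
    """Try every group's function on the record ID and compare ciphertexts."""
    for g in range(G):
        if H(flag_value(g, record_id)) == encrypted_flag:
            return g
    return None  # no function matched

# A record with ID 7 stored in dummy record group 2 carries the flag H(f_2(7)).
stored_flag = H(flag_value(2, 7))
group = identify_group(7, stored_flag)
is_true_record = (group == 0)
```

Since g ranges over 0 to 3, the values 4x + g are distinct for every pair of record ID and group, so each flag ciphertext appears once and its frequency reveals nothing about the group.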

このようにして生成された登録データ３７では、同一の変換集合に属する要素に対応する暗号文の出現頻度の均一化が図られている。 In the registration data 37 generated in this manner, the appearance frequencies of the ciphertexts corresponding to elements belonging to the same conversion set are made uniform.

図９は、暗号文の均一化を説明する図である。例えば血圧の項目に設定可能なキーワードは、すべて１つの変換集合３５ａ（図５参照）に属している。真のデータ３６におけるキーワード「正常」の出現頻度は「２」、キーワード「低血圧」の出現頻度は「１」、キーワード「高血圧」の出現頻度は「１」である。 FIG. 9 is a diagram for explaining the uniformization of ciphertexts. For example, all the keywords that can be set for the blood pressure item belong to one conversion set 35a (see FIG. 5). In the true data 36, the appearance frequency of the keyword "normal" is "2", that of the keyword "hypotension" is "1", and that of the keyword "hypertension" is "1".

登録データ３７では、各キーワードおよびダミー要素の値それぞれの暗号文の出現頻度が均一化されている。例えばキーワード「正常」の暗号文「Ｈ（正常）」、キーワード「低血圧」の暗号文「Ｈ（低血圧）」、キーワード「高血圧」の暗号文「Ｈ（高血圧）」、およびダミー要素の値「Ｄ１」の暗号文「Ｈ（Ｄ１）」それぞれの出現頻度は、いずれも「４」である。 In the registration data 37, the frequency of appearance of each ciphertext for each keyword and dummy element value is uniformed. For example, the ciphertext "H (normal)" for the keyword "normal", the ciphertext "H (low blood pressure)" for the keyword "hypotension", the ciphertext "H (hypertension)" for the keyword "hypertension", and the value of the dummy element The frequency of occurrence of each of the ciphertexts "H(D1)" of "D1" is "4".
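The uniform frequency of "4" can be checked with a small sketch that applies the rotation of equation (1) to the blood pressure values of the true data 36. The variable names are illustrative, and plaintext values stand in for their ciphertexts, which is harmless here because a deterministic encryption preserves counts.

```python
from collections import Counter

conversion_set = ["normal", "hypotension", "hypertension", "D1"]
G = len(conversion_set)

# Blood pressure values of the four true records in FIG. 7
# (normal appears twice, hypotension and hypertension once each).
true_values = ["normal", "normal", "hypotension", "hypertension"]

# Group 0 holds the true values; groups 1 .. G-1 hold the rotated dummy values.
registered = [
    conversion_set[(conversion_set.index(v) + g) % G]
    for g in range(G)
    for v in true_values
]

counts = Counter(registered)
print(counts)
```

Every element of the conversion set, including the dummy element "D1", ends up with a count equal to the number of true records, because each true value is mapped onto each element exactly once across the G groups.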

図５に示した変換集合の生成例からも分かるように、各変換集合にはｋ個（図５の例ではｋ＝３）以上のキーワードが含まれる。従って攻撃者３３は、特定の暗号文の出現頻度が分かっても、その暗号文が、同一の変換集合に属するキーワードのうちのどのキーワードの暗号文なのかを特定することはできない。すなわち攻撃者３３は、暗号文に対応するキーワードの候補をｋ個までしか絞り込めない。その結果、頻度分析攻撃に対する安全性が向上している。 As can be seen from the conversion set generation example shown in FIG. 5, each conversion set includes k or more keywords (k=3 in the example of FIG. 5). Therefore, even if the attacker 33 knows the appearance frequency of a specific ciphertext, the attacker cannot determine which of the keywords belonging to the same conversion set that ciphertext encrypts. That is, the attacker 33 cannot narrow the keyword candidates corresponding to a ciphertext down to fewer than k. As a result, security against frequency analysis attacks is improved.

なお、暗号文に対応するキーワードの候補をｋ個までしか絞り込めないようにするためには、群数Ｇの値を適切に決定することが重要となる。 Note that, to ensure that the keyword candidates corresponding to a ciphertext cannot be narrowed down to fewer than k, it is important to determine the value of the number of groups G appropriately.

図１０は、適切な群数Ｇについて説明する図である。群数Ｇは、ｋに応じて決定される。前述のようにｋは、暗号文に対応するキーワードの候補をｋ個までしか絞れないようにするために予め設定するセキュリティパラメータである。従ってｋは、システムに求められる頻度分析攻撃に対する安全性の度合いに応じて決められる。 FIG. 10 is a diagram illustrating an appropriate number of groups G. The number of groups G is determined according to k. As described above, k is a security parameter set in advance so that the keyword candidates corresponding to a ciphertext can be narrowed down only as far as k. Therefore, k is determined according to the degree of security against frequency analysis attacks required of the system.

このとき設定可能なキーワードの種類数Ｘ_j＜ｋとなる項目も存在する。このような項目は、変換集合内にｋ個以上のキーワードを含めることができない。そのため攻撃者３３は、該当項目の暗号文に対応するキーワードが、Ｘ_j個（ｋ＜）のキーワードのどれかであることが分かる。しかしこのような項目の暗号文に対応するキーワードが、Ｘ_j個のどれかであることは、公知の情報であり、攻撃者３３が知っていても問題ない。例えば図１０の例では、項目「性別」について設定可能なキーワードは「男」か「女」の２つしかない。しかし性別が「男」と「女」のいずれかであることは、攻撃者３３が知っていても問題ない情報である。 At this point, there may also be items for which the number of types of settable keywords satisfies X_j < k. For such an item, k or more keywords cannot be included in a conversion set. The attacker 33 therefore learns that the keyword corresponding to a ciphertext of that item is one of the X_j (< k) keywords. However, the fact that the keyword corresponding to a ciphertext of such an item is one of the X_j keywords is publicly known information, and it causes no problem even if the attacker 33 knows it. For example, in the example of FIG. 10, only two keywords, "male" and "female", can be set for the item "gender". That the gender is either "male" or "female" is information the attacker 33 may know without any problem.

他方、設定可能なキーワードの種類数Ｘ_j≧ｋとなる項目であれば、変換集合内にｋ個以上のキーワードが含まれるようにするのが適切である。そこで、Ｘ_jがｋの倍数でない項目がある場合、データ登録サーバ２００，３００は、Ｇを調整することで、ｋ個以上のキーワードが各変換集合に含まれるようにする。 On the other hand, if the number of types of settable keywords X _j ≥k, it is appropriate to include k or more keywords in the conversion set. Therefore, if there are items where X _j is not a multiple of k, the data registration servers 200 and 300 adjust G so that k or more keywords are included in each conversion set.

例えば図１０の「血液型」の例では、ｋ＝３に対し、Ｘ_j＝４である。このとき変換集合一覧３８のようにＧ＝３にすると、検索者が「Ｏ」または「ＡＢ」を検索した際、その検索結果を見た攻撃者３３は、検索対象が「Ｏ」と「ＡＢ」の２つのどちらかだと分かってしまう。そこで、Ｘ_j≧ｋの項目について該当項目に設定可能なキーワードが各変換集合にｋ個以上含まれるようにするためには、Ｇ≧ｋのうちＧ＝３以外となるようにＧが決定される。例えば変換集合一覧３９のようにＧ＝４にすれば、いずれの変換集合にも、対応する項目に設定可能なキーワードが３つ以上含まれる。 For example, in the "blood type" example of FIG. 10, X_j = 4 for k = 3. If G = 3 is used as in the conversion set list 38, then when a searcher searches for "O" or "AB", the attacker 33 who sees the search result learns that the search target is one of the two values "O" and "AB". Therefore, so that each conversion set contains k or more of the keywords settable for an item with X_j ≥ k, G is chosen from the values satisfying G ≥ k other than G = 3. For example, if G = 4 as in the conversion set list 39, every conversion set contains three or more keywords that can be set for the corresponding item.
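One way to search for a suitable G is sketched below. This is a hypothetical procedure, not one stated in the source, which only says that G is determined so that each conversion set holds k or more keywords. The function names are illustrative, and the sketch assumes that the X_j keywords of an item are spread as evenly as possible over ceil(X_j / G) conversion sets, which matches the two-keyword sets described for G = 3 in FIG. 10.

```python
import math

def min_keywords_per_set(x, G):
    """Smallest number of real keywords in any conversion set, assuming the
    x keywords are spread evenly over ceil(x / G) sets (assumption)."""
    n_sets = math.ceil(x / G)
    return x // n_sets

def smallest_valid_G(keyword_counts, k):
    """Smallest G >= k such that every item with X_j >= k has at least k
    keywords in each of its conversion sets. Items with X_j < k are skipped,
    since they can never reach k keywords."""
    G = k
    while True:
        relevant = [x for x in keyword_counts if x >= k]
        if all(min_keywords_per_set(x, G) >= k for x in relevant):
            return G
        G += 1  # terminates: once G >= max(x), each item fits in one set

# FIG. 10: blood type with X_j = 4 and k = 3.
g_fig10 = smallest_valid_G([4], k=3)
# FIG. 5: blood pressure (3 keywords) and blood type (8 keywords), k = 3.
g_fig5 = smallest_valid_G([3, 8], k=3)
```

Under these assumptions both calls return 4, in line with the G = 4 used in the conversion set list 39 and in FIG. 5.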

図１１は、頻度攪乱後の頻度分布の一例を示す図である。図１１に示す頻度分布表４０は、５種類のキーワード「Ａ」、「Ｂ」、「Ｃ」、「Ｄ」、「Ｅ」を設定可能な項目を有するＤＢにおける頻度攪乱後の各キーワードおよびダミー要素の値「Ｆ」の出現頻度を表している。図１１の例ではＧ＝３である。変換集合は、「Ａ，Ｂ，Ｃ」と「Ｄ，Ｅ，Ｆ」の２つである。 FIG. 11 is a diagram showing an example of a frequency distribution after frequency perturbation. The frequency distribution table 40 shown in FIG. 11 represents the post-perturbation appearance frequency of each keyword and of the dummy element value "F" in a DB having an item for which five keywords "A", "B", "C", "D", and "E" can be set. In the example of FIG. 11, G=3. There are two conversion sets, "A, B, C" and "D, E, F".

群番号「１」のダミーレコード群のダミー値は、真のデータの項目値を変換集合に基づいて「Ａ→Ｂ」、「Ｂ→Ｃ」、「Ｃ→Ａ」、「Ｄ→Ｅ」、「Ｅ→Ｆ」、「Ｆ→Ｄ」と変換することで生成される。群番号「２」のダミーレコード群のダミー値は、真のデータの項目値を変換集合に基づいて「Ａ→Ｃ」、「Ｂ→Ａ」、「Ｃ→Ｂ」、「Ｄ→Ｆ」、「Ｅ→Ｄ」、「Ｆ→Ｅ」と変換することで生成される。 The dummy values of the dummy record group with the group number "1" are converted from the item values of the true data to "A→B", "B→C", "C→A", "D→E", It is generated by converting “E→F” and “F→D”. The dummy values of the dummy record group with the group number "2" are converted from the item values of the true data to "A→C", "B→A", "C→B", "D→F", It is generated by converting "E→D" and "F→E".

真のデータにおける「Ａ」の出現頻度は「１０」、「Ｂ」の出現頻度は「７」、「Ｃ」の出現頻度は「２」、「Ｄ」の出現頻度は「１」、「Ｅ」の出現頻度は「５」、「Ｆ」の出現頻度は「０」である。すると、頻度攪乱後の変換集合「Ａ，Ｂ，Ｃ」に含まれる各キーワードの出現頻度は、すべて「１９」となる。また頻度攪乱後の変換集合「Ｄ，Ｅ，Ｆ」に含まれる各キーワードまたはダミー要素の値の出現頻度は、すべて「６」となる。すなわち項目値の３つずつの集合ごとに、集合内での項目値の出現頻度の均一化が図られている。 In the true data, the appearance frequency of "A" is "10", that of "B" is "7", that of "C" is "2", that of "D" is "1", that of "E" is "5", and that of "F" is "0". After frequency perturbation, the appearance frequency of each keyword included in the conversion set "A, B, C" is therefore "19", and the appearance frequency of each keyword or dummy element value included in the conversion set "D, E, F" is "6". That is, for each set of three item values, the appearance frequencies of the item values within the set are made uniform.

また各ダミーレコード群のダミー値は、真のデータのキーワードを１対１で変換することで生成されている。そのためキーワードの出現頻度のばらつき度合いは変わらない。例えば出現頻度が多い順にその出現頻度の値を並べた場合、真のデータのレコード群および２つのダミーレコード群のいずれにおいても「１０，７，５，２，１，０」である。このように、ダミーレコード群ごとの項目値の出現頻度のばらつき度合いは、すべてのダミーレコード群について真のデータと等しくなる。その結果、どのレコード群が真のデータのレコード群なのかを、キーワードの出現頻度のばらつき度合いから判断することが困難となる。また、すべてのダミーレコード群に真のデータの各レコードに１対１で対応するダミーレコードが存在する。そのため真のデータのあるレコードを削除する際には対応するダミーレコードを合わせて削除することで、データの秘匿性を維持したまま真のデータの任意のレコードを削除することができる。 The dummy value of each dummy record group is generated by converting the keywords of the true data on a one-to-one basis. Therefore, the degree of variation in the frequency of appearance of the keywords does not change. For example, when the values of the frequency of appearance are arranged in descending order of appearance frequency, they are "10, 7, 5, 2, 1, 0" for both the true data record group and the two dummy record groups. In this way, the degree of variation in the frequency of occurrence of item values for each dummy record group is equal to the true data for all dummy record groups. As a result, it becomes difficult to determine which record group is the true data record group from the degree of variation in keyword appearance frequency. In addition, all dummy record groups have dummy records corresponding to each record of true data on a one-to-one basis. Therefore, by deleting a corresponding dummy record when deleting a record containing true data, any record containing true data can be deleted while maintaining the confidentiality of the data.
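The figures quoted for FIG. 11 can be reproduced with the following sketch, which rotates the true frequencies within each conversion set for groups 0, 1, and 2. The names `group_frequencies`, `per_group`, and `total` are illustrative assumptions.

```python
conversion_sets = [["A", "B", "C"], ["D", "E", "F"]]  # "F" is the dummy element
true_freq = {"A": 10, "B": 7, "C": 2, "D": 1, "E": 5, "F": 0}
G = 3

def group_frequencies(g):
    """Frequency of each value inside group g (g = 0 is the true data).

    A true value with element number a becomes element (a + g) mod G,
    so it hands its frequency over to that element.
    """
    freq = {}
    for cs in conversion_sets:
        for a, value in enumerate(cs):
            freq[cs[(a + g) % G]] = true_freq[value]
    return freq

per_group = [group_frequencies(g) for g in range(G)]

# Post-perturbation frequency: sum over the true group and all dummy groups.
total = {v: sum(f[v] for f in per_group) for v in true_freq}

# The sorted frequency profile is identical in every group.
profiles = [sorted(f.values(), reverse=True) for f in per_group]
```

Here `total` gives 19 for "A", "B", and "C" and 6 for "D", "E", and "F", while every entry of `profiles` is `[10, 7, 5, 2, 1, 0]`, which is why the spread of frequencies does not reveal which group holds the true data.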

このようにダミー要素を用いることで、頻度分析攻撃に対する安全性を向上させることが可能である。ただし、ダミー要素を用いたとしても、複数の項目のキーワード間の組合せ頻度を知っている攻撃者３３に対しては、検索条件に含まれるキーワードの候補がｋ個未満に絞られてしまう可能性がある。 By using dummy elements in this way, it is possible to improve security against frequency analysis attacks. However, even if dummy elements are used, an attacker 33 who knows the combination frequencies between keywords of multiple items may be able to narrow the keyword candidates included in a search condition down to fewer than k.

例えば図１１の「変換集合２」が項目「性別」についての変換集合であるものとする。性別のキーワードが「男」と「女」しかなければ、ｋ＝３の場合であっても図１１に示すような変換集合を生成することが許容される。このとき複数の項目のキーワード間の組合せ頻度を用いた頻度分析攻撃が行われると、「変換集合１」に含まれるキーワードについても、ｋ未満に絞り込まれる可能性がある。 For example, assume that "conversion set 2" in FIG. 11 is a conversion set for the item "gender". If the only gender keywords are "male" and "female," it is permissible to generate a transformation set as shown in FIG. 11 even when k=3. At this time, if a frequency analysis attack using the combination frequency between keywords of a plurality of items is performed, the keywords included in "transformation set 1" may also be narrowed down to less than k.

図１２は、ダミー要素を用いた場合の組み合わせ頻度の一例を示す図である。図１２にはｋ＝Ｇ＝３のときの変換集合一覧４１とそのときの組み合わせ頻度表４２とが示されている。 FIG. 12 is a diagram showing an example of combination frequencies when dummy elements are used. FIG. 12 shows a transformation set list 41 when k=G=3 and a combination frequency table 42 at that time.

変換集合一覧４１には血圧と性別との変換集合が示されている。血圧の変換集合には「正常」、「低血圧」、「高血圧」が含まれる。性別の変換集合には「男」、「女」、およびダミー要素の値「Ｄ１」が含まれる。 A conversion set list 41 shows conversion sets for blood pressure and sex. The transform set for blood pressure includes "normal," "hypotensive," and "hypertensive." The gender transform set includes "male", "female", and a dummy element value of "D1".

組み合わせ頻度表４２には、血圧の値と性別の値との組み合わせ（論理積）に対応するレコードの出現頻度が示されている。真の頻度は、真のデータのレコード群内の該当するレコードの数である。攪乱後頻度は、真のデータのレコード群とダミーレコード群とにおける該当するレコードの数である。 The combination frequency table 42 shows the frequency of occurrence of records corresponding to combinations (logical products) of blood pressure values and gender values. The true frequency is the number of relevant records in the true data record set. The post-perturbation frequency is the number of corresponding records in the true data record group and the dummy record group.

例えば血圧が「正常」で性別が「男」のレコードは、真のデータのレコード群に「ｎ１」件含まれる。血圧が「低血圧」で性別が「女」のレコードは、真のデータのレコード群に「ｎ２」件含まれる。血圧が「高血圧」で性別が「Ｄ１」のレコードは、真のデータのレコード群に「ｎ３」（ｎ３＝０）件含まれる。血圧が「正常」で性別が「女」のレコードは、真のデータのレコード群に「ｎ４」件含まれる。血圧が「低血圧」で性別が「Ｄ１」のレコードは、真のデータのレコード群に「ｎ５」（ｎ５＝０）件含まれる。血圧が「高血圧」で性別が「男」のレコードは、真のデータのレコード群に「ｎ６」件含まれる。血圧が「正常」で性別が「Ｄ１」のレコードは、真のデータのレコード群に「ｎ７」（ｎ７＝０）件含まれる。血圧が「低血圧」で性別が「男」のレコードは、真のデータのレコード群に「ｎ８」件含まれる。血圧が「高血圧」で性別が「女」のレコードは、真のデータのレコード群に「ｎ９」件含まれる。 For example, “n1” records of “normal” blood pressure and “male” gender are included in the true data record group. "n2" records of "low blood pressure" for blood pressure and "female" for sex are included in the true data record group. Records with the blood pressure of "hypertension" and the sex of "D1" are included in the true data record group of "n3" (n3=0) records. "n4" records of "normal" blood pressure and "female" gender are included in the true data record group. "n5" (n5=0) records with "hypotension" as the blood pressure and "D1" as the sex are included in the true data record group. "n6" records of "hypertension" for blood pressure and "male" for sex are included in the true data record group. Records with "normal" blood pressure and "D1" sex are included in the record group of true data with "n7" (n7=0) records. "n8" records of "hypotension" for blood pressure and "male" for sex are included in the true data record group. "n9" records of "hypertension" for blood pressure and "female" for sex are included in the true data record group.

ここで群番号「１」（ｇ＝１）のダミーレコード群において、血圧のダミー値が「正常」で性別のダミー値が「男」となるレコードは、真のデータのレコード群内の血圧が「低血圧」で性別が「女」のレコードに基づいて生成される。そのため血圧が「正常」で性別が「男」のレコードは、群番号「１」（ｇ＝１）のダミーレコード群に「ｎ２」件含まれる。群番号「２」（ｇ＝２）のダミーレコード群において、血圧のダミー値が「正常」で性別のダミー値が「男」となるレコードは、真のデータのレコード群内の血圧が「高血圧」で性別が「Ｄ１」のレコードに基づいて生成される。そのため血圧が「正常」で性別が「男」のレコードは、群番号「２」（ｇ＝２）のダミーレコード群に「ｎ３」（ｎ３＝０）件含まれる。 Here, in the dummy record group with group number "1" (g=1), a record whose blood pressure dummy value is "normal" and whose gender dummy value is "male" is generated based on a record in the true data record group whose blood pressure is "hypotension" and whose gender is "female". Therefore, the dummy record group with group number "1" (g=1) contains "n2" records with blood pressure "normal" and gender "male". In the dummy record group with group number "2" (g=2), a record whose blood pressure dummy value is "normal" and whose gender dummy value is "male" is generated based on a record in the true data record group whose blood pressure is "hypertension" and whose gender is "D1". Therefore, the dummy record group with group number "2" (g=2) contains "n3" (n3=0) records with blood pressure "normal" and gender "male".

すると、秘匿化ＤＢにおける血圧が「正常」で性別が「男」のレコードの出現頻度（攪乱後頻度）は「ｎ１＋ｎ２」となる。同様に、秘匿化ＤＢにおける血圧が「低血圧」で性別が「女」のレコードの出現頻度（攪乱後頻度）、秘匿化ＤＢにおける血圧が「高血圧」で性別が「Ｄ１」のレコードの出現頻度（攪乱後頻度）も「ｎ１＋ｎ２」となる。 Then, the appearance frequency (post-disturbance frequency) of the record with "normal" blood pressure and "male" sex in the anonymized DB is "n1+n2". Similarly, the appearance frequency (post-disturbance frequency) of records with “low blood pressure” and “female” as the blood pressure in the anonymized DB, and the appearance frequency of records with “high blood pressure” as the blood pressure and “D1” as the gender in the anonymized DB. (Frequency after disturbance) is also "n1+n2".

図１２に示すような攪乱後頻度を有する秘匿化ＤＢに対して検索者が検索を行ったとき、複数の項目値の組合せ頻度を知っている攻撃者３３は、検索条件に含まれるキーワードの候補をｋ個（図１２の例では３個）未満に絞ることが可能である。例えば検索者が「正常∧男」（∧は論理積）の検索または「低血圧∧女」の検索を行ったものとする。これらのいずれの検索結果も錯乱後頻度は「ｎ１＋ｎ２」である。すなわち検索条件にヒットしたレコード数は「ｎ１＋ｎ２」である。なお検索者がダミー要素を検索することはない。 When a searcher searches an anonymized DB having post-perturbation frequencies as shown in FIG. 12, an attacker 33 who knows the combination frequencies of multiple item values can narrow the keyword candidates included in the search condition down to fewer than k (three in the example of FIG. 12). For example, suppose the searcher searches for "normal ∧ male" (∧ denotes logical AND) or for "hypotension ∧ female". For either search, the post-perturbation frequency is "n1+n2". That is, the number of records that hit the search condition is "n1+n2". Note that the searcher never searches for a dummy element.

ここで、血圧のキーワードと性別のキーワードとの間の組み合わせ頻度（真の頻度）を知っている攻撃者３３は、攪乱後頻度が「ｎ１＋ｎ２」となるのが「正常∧男」または「低血圧∧女」のみであることも知っている。すると攻撃者３３は、少なくとも「高血圧」が検索対象外であることを認識できる。これは、検索者の関心の対象が血圧については「正常」または「低血圧」であり、「高血圧」は関心の対象外であると、攻撃者３３に知られてしまうことを意味する。すなわち、検索の対象がｋ（＝３）未満に絞られてしまう。 Here, the attacker 33, who knows the combination frequencies (true frequencies) between blood pressure keywords and gender keywords, also knows that only "normal ∧ male" and "hypotension ∧ female" yield a post-perturbation frequency of "n1+n2". The attacker 33 can then recognize that at least "hypertension" is not a search target. This means the attacker 33 learns that, as far as blood pressure is concerned, the searcher is interested in "normal" or "hypotension" and not in "hypertension". That is, the search target is narrowed down to fewer than k (=3) candidates.
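The narrowing described for FIG. 12 can be illustrated with the sketch below. The concrete counts assigned to n1, n2, and so on are made-up values chosen only for the illustration, and the variable names are assumptions. Both items are rotated by the same group number g, following equation (1).

```python
from itertools import product

blood_pressure = ["normal", "hypotension", "hypertension"]
gender = ["male", "female", "D1"]          # "D1" is the dummy element
G = 3

# Hypothetical true combination frequencies (combinations with D1 are zero).
true_freq = {
    ("normal", "male"): 30,        # n1
    ("hypotension", "female"): 12, # n2
    ("normal", "female"): 25,      # n4
    ("hypertension", "male"): 8,   # n6
    ("hypotension", "male"): 5,    # n8
    ("hypertension", "female"): 20 # n9
}

# Post-perturbation frequency of each combination: the record at element
# numbers (i, j) in group g stems from the true record at (i - g, j - g).
post_freq = {}
for i, j in product(range(G), repeat=2):
    post_freq[(blood_pressure[i], gender[j])] = sum(
        true_freq.get((blood_pressure[(i - g) % G], gender[(j - g) % G]), 0)
        for g in range(G)
    )

# The attacker observes the hit count of the search "normal AND male".
observed_hits = post_freq[("normal", "male")]

# Candidates with that hit count, excluding anything containing a dummy element.
candidates = [
    combo for combo, f in post_freq.items()
    if f == observed_hits and "D1" not in combo
]
blood_pressure_candidates = {combo[0] for combo in candidates}
```

With these numbers the observed hit count is 42 (n1 + n2), three combinations share it, and dropping the one that contains "D1" leaves only "normal" and "hypotension" as blood pressure candidates, which is fewer than k = 3.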

このようにキーワードの種類数Ｘ_j（「男」、「女」の２種類）が少なく、Ｘ_j＜ｋとなる項目値と他の項目値とが組み合わさると、他の項目値も性別列のＸ_j個の値に絞られてしまう。 In this way, when a value of an item that has few keyword types X_j (here the two types "male" and "female"), such that X_j < k, is combined with a value of another item, the candidates for the other item's value are also narrowed down to the X_j values of the gender column.

図１２の例では、ダミー要素に対する検索が行われることがないという事実が、検索対象の絞り込みに利用されている。そして要素の組み合わせを考慮にいれた場合、ダミー要素を含む検索以外にも検索が行われることがない要素の組み合わせが存在する。例えば患者ごとのレコードが登録されたＤＢの項目に「診療科」と「年齢層」とが含まれている場合があり得る。ここで、小児科を成人や老人が受診することはない。そのため診療科の値が「小児科」のレコードにおいて、年齢層の値が「成人」、「老人」などであることはない。 In the example of FIG. 12, the fact that no dummy element is searched is used to narrow down the search target. When the combinations of elements are taken into consideration, there are combinations of elements that are not searched for other than searches that include dummy elements. For example, there may be a case where the items of the DB in which records for each patient are registered include "medical department" and "age group". Adults and the elderly do not visit pediatrics here. Therefore, in a record with the department value of "Pediatrics", the age group value is not "adult" or "elderly".

このように１つの項目の値がある項目値であるとき他の項目に設定されることがあり得ない値を、構造的ゼロと呼ぶことができる。それに対して、１つの項目の値がある項目値であるとき他の項目に設定されることがあり得る唯一の値（例えば診療科「小児科」に対する年齢層「児童」）を、非構造的ゼロと呼ぶことができる。構造的ゼロも検索対象の絞り込みに利用可能である。 A value that can never be set in another item when one item has a certain value, as in this case, can be called a structural zero. In contrast, the only value that can be set in another item when one item has a certain value (for example, the age group "child" for the clinical department "pediatrics") can be called a non-structural zero. Structural zeros can also be used to narrow down search targets.

図１３は、構造的ゼロを利用した検索対象の絞り込みの一例を示す図である。図１３の例では、ＤＢ４３には「診療科」、「性別」、および「年齢層」の項目が設けられている。このＤＢ４３の各項目の値が暗号化された秘匿化ＤＢ４４が生成されている。ここで診療科については、「内科」、「婦人科」、「小児科」を含む１つの変換集合が生成されているものとする。また年齢層については、「老人」、「成人」、「Ｄ２」（ダミー要素）を含む変換集合と、「青年」、「児童」、「Ｄ３」（ダミー要素）を含む変換集合とが生成されているものとする。 FIG. 13 is a diagram illustrating an example of narrowing down search targets using structural zeros. In the example of FIG. 13, the DB 43 has the items "medical department", "gender", and "age group". An anonymization DB 44 is generated in which the values of each item of the DB 43 are encrypted. Here, it is assumed that one conversion set including "internal medicine", "gynecology", and "pediatrics" is generated for the medical department. For the age group, it is assumed that a conversion set including "elderly", "adult", and "D2" (a dummy element) and a conversion set including "youth", "child", and "D3" (a dummy element) are generated.

この場合、「内科∧成人」、「婦人科∧Ｄ２」、「小児科∧老人」それぞれの組み合わせの頻度攪乱後の出現頻度が同じとなる。図１３の例では、これらの組み合わせの出現頻度は「１３３」である。 In this case, the appearance frequencies after the frequency perturbation are the same for the combinations of "internal medicine/adult", "gynecology/D2", and "pediatrics/elderly". In the example of FIG. 13, the appearance frequency of these combinations is "133".

ここで検索者４５が「内科∧成人」の検索を、秘匿化ＤＢ４４に対して行った場合を想定する。この検索に対しては、１３３件のレコードがヒットする。攻撃者３３が検索者４５による検索を監視していた場合、攻撃者３３は、各項目に設定可能な値の出現頻度を組み合わせて、出現頻度が「１３３」となる値の組み合わせを調べる。これにより攻撃者３３は、検索対象の候補を「内科∧成人」、「婦人科∧Ｄ２」、「小児科∧老人」の３つまで絞り込むことができる。 Here, it is assumed that the searcher 45 searches the anonymized DB 44 for “internal medicine/adult”. This search hits 133 records. When the attacker 33 monitors the search by the searcher 45, the attacker 33 combines the appearance frequencies of the values that can be set for each item, and checks the combination of values with the appearance frequency of "133". As a result, the attacker 33 can narrow down the search target candidates to three: "internal medicine/adults", "gynecology/D2", and "pediatrics/elderly".

次に攻撃者３３は、ダミー要素である「Ｄ２」が検索対象となることはないため、「婦人科∧Ｄ２」を検索対象の候補から除外する。さらに攻撃者３３は、小児科に老人という検索が行われることはない（構造的ゼロの検索をする理由がない）ため、「小児科∧老人」も検索対象の候補から除外する。その結果、検索者４５が送信した検索クエリは「内科∧成人」であることを攻撃者３３が認識できる。 Next, the attacker 33 excludes "Gynecology ∧D2" from the search target candidates because the dummy element "D2" will never be a search target. In addition, the attacker 33 excludes "Pediatrics ∧ Geriatrics" from the search target candidates because there is no search for "Geriatrics" in Pediatrics (there is no reason to search for structural zero). As a result, the attacker 33 can recognize that the search query sent by the searcher 45 is "internal medicine/adult".
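
The attacker's narrowing steps in the FIG. 13 example can be sketched in Python as follows. This is only an illustrative sketch: the function name `narrow_candidates` and the data layout are assumptions, and only the frequency 133 and the three matching combinations come from the description.

```python
def narrow_candidates(combo_freq, observed_hits, dummy_values, structural_zeros):
    """Return the combinations an attacker cannot rule out (hypothetical helper)."""
    # Step 1: keep every combination whose post-disturbance frequency equals the hit count.
    candidates = [c for c, f in combo_freq.items() if f == observed_hits]
    # Step 2: drop combinations containing a dummy element, since dummies are never searched.
    candidates = [c for c in candidates if not any(v in dummy_values for v in c)]
    # Step 3: drop structural zeros, since nobody searches for impossible combinations.
    return [c for c in candidates if c not in structural_zeros]


# Post-disturbance frequencies known to the attacker (133 is from the FIG. 13 example).
combo_freq = {
    ("internal medicine", "adult"): 133,
    ("gynecology", "D2"): 133,
    ("pediatrics", "elderly"): 133,
    ("pediatrics", "child"): 90,  # invented value for contrast
}

result = narrow_candidates(
    combo_freq,
    observed_hits=133,
    dummy_values={"D2"},
    structural_zeros={("pediatrics", "elderly")},
)
print(result)
```

With these inputs only one candidate survives, which mirrors how the query is identified even though three combinations share the same frequency.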

このようにダミー要素や構造的ゼロが存在することによって検索クエリに含まれるキーワードが絞り込まれてしまう。このことを前提とすると、各項目に設定可能なキーワードの組み合わせの出現頻度に基づいて、ｋ個程度にまで初期段階で絞り込まれていることに問題がある。そこで、データ登録サーバ２００，３００は、データを登録する際に、各項目に設定可能なキーワードの出現頻度が分かっていても、その情報からは検索クエリに含まれるキーワードの絞り込みが十分にできないようにする。具体的には以下の通りである。 Thus, the presence of dummy elements and structural zeros narrows down the keywords included in a search query. Given this, the problem is that the candidates are already narrowed down to about k at the initial stage, based on the appearance frequencies of the combinations of keywords that can be set in each item. Therefore, when registering data, the data registration servers 200 and 300 ensure that, even if the appearance frequencies of the keywords that can be set in each item are known, the keywords included in a search query cannot be sufficiently narrowed down from that information. Specifically, this is done as follows.

データ登録サーバ２００，３００は、ＤＢ４３に登録可能なキーワードを、複数のキーワードのいずれかに確率的に変換する。以下、変換先となるキーワードを分割キーワードと呼ぶこととする。 The data registration servers 200 and 300 probabilistically convert a keyword that can be registered in the DB 43 into one of a plurality of keywords. A keyword that serves as a conversion destination is hereinafter referred to as a split keyword.

図１４は、分割キーワードへの確率的変換の一例を示す図である。例えばデータ登録サーバ２００は、診療科、性別、年齢層の項目を含むＤＢ２１０を有する。データ登録サーバ２００は、ＤＢ２１０の各項目に登録可能なキーワードに基づいて、分割キーワード選択テーブル２３３を生成する。 FIG. 14 is a diagram showing an example of probabilistic conversion into split keywords. For example, the data registration server 200 has a DB 210 containing items such as clinical department, gender, and age group. The data registration server 200 generates a split keyword selection table 233 based on keywords that can be registered in each item of the DB 210 .

分割キーワード選択テーブル２３３には、ＤＢ２１０内の項目ごとに、該当項目に登録可能なキーワードに対応する分割キーワードと、分割キーワードの選択確率とが設定されている。例えば項目「診療科」に登録可能なキーワード「小児科」に対応する分割キーワードとして「小児科０」と「小児科１」とがある。 In the split keyword selection table 233, for each item in the DB 210, split keywords corresponding to keywords that can be registered in the corresponding item and selection probabilities of the split keywords are set. For example, there are "pediatrics 0" and "pediatrics 1" as divided keywords corresponding to the keyword "pediatrics" that can be registered in the item "medical department".

分割キーワードの選択確率は、乱数を生成することによって決定され、データ登録サーバ２００外には秘密となるように管理される。従って、ＤＢ２１０に登録可能なキーワードそれぞれに対して２つの分割キーワードが存在する場合でも、分割キーワードごとに選択確率は異なる。例えば分割キーワード「小児科０」の選択確率は「０．７２」であり、「小児科１」の選択確率は「０．２８」である。また分割キーワード「婦人科０」の選択確率は「０．６０」であり、「婦人科１」の選択確率は「０．４０」である。なお、ＤＢ２１０に登録可能な１つのキーワードに対応する複数の分割キーワードの選択確率の合計は「１」となる。 The selection probabilities of the split keywords are determined by generating random numbers and are kept secret from anything outside the data registration server 200. Therefore, even if there are two split keywords for each keyword that can be registered in the DB 210, the selection probabilities differ for each split keyword. For example, the selection probability of the split keyword "pediatrics 0" is "0.72", and that of "pediatrics 1" is "0.28". The selection probability of the split keyword "gynecology 0" is "0.60", and that of "gynecology 1" is "0.40". Note that the selection probabilities of the split keywords corresponding to one keyword that can be registered in the DB 210 sum to "1".

データ登録サーバ２００は、データ管理サーバ１００にＤＢ２１０内のデータを登録する際に、ＤＢ２１０内の各レコードの項目値を、分割キーワード選択テーブル２３３を参照して、確率的に対応する分割キーワードのいずれかに変換する。例えば「小児科」を変換する場合、７２％の確率で「小児科０」に変換され、残りの２８％の確率で「小児科１」に変換される。ＤＢ２１０内のキーワードを変換することで、分割済データ２１１が生成される。 When registering the data in the DB 210 with the data management server 100, the data registration server 200 refers to the split keyword selection table 233 and probabilistically converts the item values of each record in the DB 210 into one of the corresponding split keywords. For example, "pediatrics" is converted into "pediatrics 0" with a probability of 72% and into "pediatrics 1" with the remaining probability of 28%. The split data 211 is generated by converting the keywords in the DB 210.
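
The probabilistic conversion can be illustrated with a short Python sketch. The table layout, the names `SPLIT_TABLE` and `to_split_keyword`, and the use of Python's `random` module are assumptions made for illustration; only the probabilities 0.72/0.28 and 0.60/0.40 come from the FIG. 14 example. In the described system the probabilities themselves are generated randomly and kept secret.

```python
import random

# Hypothetical in-memory form of the split keyword selection table 233.
SPLIT_TABLE = {
    "pediatrics": [("pediatrics0", 0.72), ("pediatrics1", 0.28)],
    "gynecology": [("gynecology0", 0.60), ("gynecology1", 0.40)],
}


def to_split_keyword(keyword, table, rng):
    """Convert a keyword into one of its split keywords by weighted random choice."""
    entries = table.get(keyword)
    if entries is None:
        # A keyword without an entry is left unsplit, which the description also allows.
        return keyword
    names = [name for name, _ in entries]
    weights = [prob for _, prob in entries]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
converted = [to_split_keyword("pediatrics", SPLIT_TABLE, rng) for _ in range(10000)]
share0 = converted.count("pediatrics0") / len(converted)
print(f"share of pediatrics0: {share0:.3f}")  # should be close to 0.72
```

Because each record is converted independently, the observed share of each split keyword only approximates its selection probability, and an outsider who does not know the probabilities cannot map the split frequencies back to the true frequency.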

なお図１４の例では、すべてのキーワードを分割キーワードに変換しているが、一部のキーワードについては変換せずに、ＤＢ２１０に登録可能なキーワードのまま秘匿化を行うことも可能である。 In the example of FIG. 14, all keywords are converted into split keywords, but it is also possible to anonymize some of the keywords as they are, without conversion, as they can be registered in the DB 210 .

データ登録サーバ２００は、分割キーワード選択テーブル２３３に示される分割キーワードを要素として含む変換集合を生成し、その変換集合を用いてダミーデータを生成する。その際、データ登録サーバ２００は、各変換集合について、以下の生成条件を満たすようにする。 The data registration server 200 generates a conversion set containing the split keywords shown in the split keyword selection table 233 as elements, and uses the conversion set to create dummy data. At that time, the data registration server 200 satisfies the following generation conditions for each transformation set.

変換集合の生成条件は、その変換集合の要素のうちの少なくとも１つが分割キーワードであり、かつその分割キーワードと変換元のキーワードが共通の他の分割キーワードが他の変換集合に含まれることである。換言すると変換集合は、１つのキーワードに対応するｎ個の分割キーワードのうちのｎ－１個以下の分割キーワードを含む。 The generation condition for a conversion set is that at least one of its elements is a split keyword and that another split keyword derived from the same source keyword is included in a different conversion set. In other words, a conversion set includes at most n-1 of the n split keywords corresponding to one keyword.

図１５は、変換集合による分割キーワードからダミー値への変換の一例を示す図である。変換集合一覧２３２には、診療科についての３つの変換集合２３２ａ～２３２ｃ、性別についての３つの変換集合２３２ｄ～２３２ｆ、および年齢層についての４つの変換集合２３２ｇ～２３２ｊが含まれている。各変換集合２３２ａ～２３２ｊには、２つずつの要素が含まれている。変換集合２３２ａ～２３２ｊに含まれる要素は、例えば分割キーワード、またはダミー要素である。また分割しないキーワードがある場合、そのキーワードも変換集合２３２ａ～２３２ｊの要素となる。 FIG. 15 is a diagram showing an example of conversion from split keywords to dummy values by a conversion set. The conversion set list 232 includes three conversion sets 232a-232c for departments, three conversion sets 232d-232f for gender, and four conversion sets 232g-232j for age groups. Each transform set 232a-232j contains two elements. Elements included in conversion sets 232a-232j are, for example, split keywords or dummy elements. Also, if there is a keyword that is not divided, that keyword is also an element of the conversion sets 232a-232j.

なお図１５における変換集合２３２ａ～２３２ｊ内の「：」の左の数字は要素番号である。「：」の右の文字列が要素として設定された分割キーワードである。 Note that the numbers to the left of ":" in the conversion sets 232a to 232j in FIG. 15 are element numbers. The character string to the right of ":" is a split keyword set as an element.
図１５の例では、ＤＢ２１０に登録可能なキーワードはすべて分割されるため、変換集合２３２ａ～２３２ｊに要素として含まれるのは分割キーワードとダミー要素である。例えば診療科の変換集合２３２ａには、分割キーワード「小児科０」、「内科０」が含まれる。また性別の変換集合２３２ｄには、分割キーワード「男０」とダミー要素「Ｄ１」が含まれる。 In the example of FIG. 15, all the keywords that can be registered in the DB 210 are split, so the elements of the conversion sets 232a to 232j are split keywords and dummy elements. For example, the conversion set 232a for the medical department includes the split keywords "pediatrics 0" and "internal medicine 0". The conversion set 232d for gender includes the split keyword "male 0" and the dummy element "D1".

変換集合２３２ａ～２３２ｊのいずれも、少なくとも１つの分割キーワードを要素として含んでいる。そしてその分割キーワードを含む変換集合とは別の変換集合に、その分割キーワードと変換元のキーワードが共通の他の分割キーワードが含まれている。例えば変換集合２３２ａには、分割キーワード「小児科０」が含まれている。「小児科０」の変換元のキーワードは「小児科」である。「小児科」の分割キーワードとしては「小児科０」以外に「小児科１」が存在する。「小児科１」は、変換集合２３２ａとは別の変換集合２３２ｂに含まれている。従って変換集合２３２ａは、変換集合の生成条件を満たしている。また変換集合２３２ｄは、分割キーワードを「男０」しか含んでおらず、もう１つの要素はダミー要素「Ｄ１」である。ただし「男０」と変換元のキーワードが共通の他の分割キーワード「男１」が他の変換集合２３２ｅに含まれている。従って変換集合２３２ｄについても変換集合の生成条件が満たされている。 Each of conversion sets 232a-232j includes at least one split keyword as an element. A conversion set different from the conversion set containing the divided keyword contains another divided keyword that shares the same conversion source keyword as the divided keyword. For example, the conversion set 232a includes the split keyword "pediatrics 0". The keyword from which "pediatrics 0" is converted is "pediatrics". "Pediatrics 1" exists in addition to "Pediatrics 0" as a division keyword of "Pediatrics". "Pediatrics 1" is included in a transform set 232b different from the transform set 232a. Therefore, the transformation set 232a satisfies the generation condition of the transformation set. Also, the conversion set 232d includes only the split keyword "male 0", and another element is the dummy element "D1". However, another divided keyword "man 1" having a common conversion source keyword with "man 0" is included in another conversion set 232e. Therefore, the transformation set generation condition is also satisfied for the transformation set 232d.
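
The generation condition can be checked mechanically. The following Python sketch is one possible formulation, not the patented procedure itself: the function name, the list-of-lists layout, and the `source_of` mapping are assumptions, and apart from "male0", "D1", and the placement of "male1" in a different set, the set contents are invented.

```python
def satisfies_generation_condition(index, sets, source_of):
    """Check the generation condition for sets[index].

    source_of maps a split keyword to its source keyword; dummy elements and
    unsplit keywords are simply absent from the mapping.
    """
    for element in sets[index]:
        source = source_of.get(element)
        if source is None:
            continue  # dummy element or unsplit keyword
        for j, other in enumerate(sets):
            # A sibling split keyword must live in a different conversion set.
            if j != index and any(source_of.get(e) == source for e in other):
                return True
    return False


source_of = {"male0": "male", "male1": "male", "female0": "female", "female1": "female"}

# Gender sets loosely modeled on 232d to 232f (partly invented contents).
gender_sets = [["male0", "D1"], ["male1", "female0"], ["female1", "D4"]]
ok = [satisfies_generation_condition(i, gender_sets, source_of) for i in range(3)]

# Counterexample: every split keyword of a source keyword sits in the same set.
bad_sets = [["male0", "male1"], ["female0", "female1"]]
bad = [satisfies_generation_condition(i, bad_sets, source_of) for i in range(2)]
print(ok, bad)
```

The counterexample fails because each set holds all n split keywords of its source keyword, so the set frequency would equal the true keyword frequency.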

データ登録サーバ２００は、変換集合２３２ａ～２３２ｊに従いダミーデータを生成する。ダミーデータは、Ｇ－１個のダミーレコード群である。ダミーレコードに設定するダミー値の生成方法は、図６を参照して説明した通りである。 Data registration server 200 generates dummy data according to transformation sets 232a-232j. Dummy data is a group of G-1 dummy records. The method of generating dummy values to be set in dummy records is as described with reference to FIG.

分割済データ２１１内の真のデータのレコードとダミーレコードとを混在させることで、秘匿化ＤＢへの登録用の登録データ４６が生成される。登録データ４６は、第１の実施の形態における攪乱レコード群６の一例である。 By mixing true data records and dummy records in the divided data 211, registration data 46 for registration in the anonymization DB is generated. Registration data 46 is an example of disturbance record group 6 in the first embodiment.

登録データ４６においては、同じ変換集合に含まれるＧ個の要素の出現頻度が等しくなる。しかもそれら要素の出現頻度とＤＢ２１０データの頻度とを比較しても、平文を特定できないようになる。例えば変換集合２３２ｇに含まれる「老人０」と「児童０」の頻度とそれぞれの元データの頻度との相関は秘密の選択確率によって隠されている。そのため各変換集合の要素の頻度を合計しても、真のデータの２以上のキーワードの頻度の合計と一致しない。 In the registration data 46, the appearance frequencies of G elements included in the same transformation set are equal. Moreover, plaintext cannot be identified even by comparing the frequency of occurrence of these elements with the frequency of data in the DB 210 . For example, the correlation between the frequencies of "old man 0" and "child 0" included in the transform set 232g and the frequencies of the respective original data is hidden by secret selection probabilities. Therefore, the sum of the frequencies of the elements of each transformation set does not match the sum of the frequencies of two or more keywords in the true data.

またデータ登録サーバ２００は、検索時に暗号文がどの群に属しているかを見分けることができるようにするために、登録データ４６内の各レコードに群番号を含むフラグ値を追加する。例えばデータ登録サーバ２００は、群番号の分類攻撃を防ぐため、フラグ値に生成の元のレコードのＩＤを加え、すべてのフラグ値をユニークな値にする。 The data registration server 200 also adds a flag value including the group number to each record in the registration data 46 so that the group to which the ciphertext belongs can be identified at the time of retrieval. For example, the data registration server 200 adds the ID of the original record to the flag values to make all the flag values unique in order to prevent group number classification attacks.

登録データ４６の先頭のレコードは分割済データ２１１の先頭（ＩＤ「０」）のレコードであり、真のデータのレコード群（群番号「０」）に属するため、フラグ値は「０，０」である。登録データ４６の２つ目のレコードは分割済データ２１１の先頭（ＩＤ「０」）のレコードに基づくダミーレコードであり、ダミーレコード群（群番号「１」）に属するため、フラグ値は「０，１」である。 The leading record of the registered data 46 is the leading (ID "0") record of the divided data 211 and belongs to the true data record group (group number "0"), so the flag value is "0, 0". is. The second record of the registered data 46 is a dummy record based on the first record (ID "0") of the divided data 211 and belongs to the dummy record group (group number "1"), so the flag value is "0". , 1”.
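
The way true records, dummy records, and flag values fit together can be sketched as follows. The dummy value generation method is described with reference to FIG. 6 and is not reproduced in this part of the description, so the cyclic shift used here is an assumption, as are the function name and the record layout. Only the flag format ("record ID, group number") and the elements of the sets 232a and 232d follow the text.

```python
from collections import Counter

# Conversion sets with G = 2 elements each (232a and 232d in FIG. 15).
DEPT_SET = ["pediatrics0", "internal0"]
GENDER_SET = ["male0", "D1"]
SET_OF = {element: s for s in (DEPT_SET, GENDER_SET) for element in s}


def build_registration_records(split_records, set_of, group_size):
    """Emit one true record (group 0) and group_size - 1 dummy records per input record."""
    out = []
    for rec_id, rec in enumerate(split_records):
        for group in range(group_size):
            row = {}
            for item, value in rec.items():
                members = set_of[value]
                # Assumed rule: shift cyclically inside the conversion set.
                row[item] = members[(members.index(value) + group) % group_size]
            # Flag value: source record ID plus group number, so every flag is unique.
            row["flag"] = f"{rec_id},{group}"
            out.append(row)
    return out


split_records = [
    {"department": "pediatrics0", "gender": "male0"},
    {"department": "internal0", "gender": "male0"},
]
registration = build_registration_records(split_records, SET_OF, group_size=2)
dept_counts = Counter(r["department"] for r in registration)
gender_counts = Counter(r["gender"] for r in registration)
print([r["flag"] for r in registration], dept_counts, gender_counts)
```

Under this assumed rule every element of a conversion set appears equally often in the output, which is the equalization property stated for the registration data 46.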

図１４、図１５には、ＤＢ２１０に登録可能なすべてのキーワードについて分割キーワードに変換する例を示しているが、すべての変換集合が生成条件を満たせるのであれば、一部のキーワードについては分割しなくてもよい。 FIGS. 14 and 15 show an example in which all keywords that can be registered in the DB 210 are converted into split keywords. However, as long as all conversion sets can satisfy the generation condition, some keywords need not be split.

図１６は、変換集合の他の生成例を示す第１の図である。図１６に示す変換集合一覧４７には、診療科についての３つの変換集合４７ａ～４７ｃ、性別についての２つの変換集合４７ｄ～４７ｅ、および年齢層についての３つの変換集合４７ｆ～４７ｈが含まれている。図１６の例では、診療科の「内科」、性別のキーワード「女」、年齢層のキーワード「青年」および「児童」については分割キーワードへの変換が行われない。そのため、これらのキーワードはそのまま変換集合の要素となっている。 FIG. 16 is a first diagram showing another example of generating conversion sets. The conversion set list 47 shown in FIG. 16 includes three conversion sets 47a to 47c for the medical department, two conversion sets 47d to 47e for gender, and three conversion sets 47f to 47h for the age group. In the example of FIG. 16, the medical department keyword "internal medicine", the gender keyword "female", and the age group keywords "youth" and "child" are not converted into split keywords. These keywords are therefore elements of the conversion sets as they are.

このような変換集合一覧４７に含まれる各変換集合４７ａ～４７ｈは、変換集合の生成条件を満たしている。例えば変換集合４７ｄには、分割キーワード「男０」と分割されていないキーワード「女」が含まれている。このうち「男０」の変換元のキーワードは「男」である。「男」の分割キーワードとしては「男０」以外に「男１」が存在する。「男１」は、変換集合４７ｄとは別の変換集合４７ｅに含まれている。従って変換集合４７ｄは、変換集合の生成条件を満たしている。 Each of the transformation sets 47a to 47h included in the transformation set list 47 satisfies the transformation set generation conditions. For example, the conversion set 47d includes the divided keyword "man 0" and the undivided keyword "female". Among these, the keyword of the conversion source of "male 0" is "male". In addition to "male 0", "male 1" exists as a segmented keyword for "male". "Man 1" is included in a conversion set 47e different from the conversion set 47d. Therefore, the transformation set 47d satisfies the generation condition of the transformation set.

ただし変換集合一覧４７を用いてダミー値への変換を行うと、図１５に示した変換集合一覧２３２を用いた場合に比べて、平文推定の難易度が下がる。例えば変換集合一覧４７を用いてダミー値への変換を行った秘匿化ＤＢに対して、攻撃者３３は以下のような攻撃が可能である。 However, if conversion to dummy values is performed using the conversion set list 47, the degree of difficulty of plaintext estimation is lowered compared to the case where the conversion set list 232 shown in FIG. 15 is used. For example, the attacker 33 can attack the anonymized DB that has been converted into dummy values using the conversion set list 47 as follows.

攻撃者３３が性別に対する検索の検索対象の特定を試みる場合を想定する。攻撃者３３は、性別の変換集合４７ｄ，４７ｅごとの攪乱後頻度を得る。例えば攻撃者３３は、検索者が性別「女」のレコードを検索した場合にヒットしたレコード数を取得することで、変換集合４７ｄの攪乱後頻度（「男０」の頻度と「女」の頻度との合計）を得ることができる。攻撃者３３は、変換集合４７ｄの攪乱後頻度から平文の「女」の頻度を減算する。すると「男０」の頻度となる。さらに攻撃者３３は、減算結果にもう１つの変換集合４７ｅの攪乱後頻度を加算する。加算後の頻度は、「男０」の頻度と「男１」の頻度との合計である。 Assume that an attacker 33 attempts to identify a search target for a search for gender. The attacker 33 obtains post-perturbation frequencies for each gender conversion set 47d, 47e. For example, the attacker 33 acquires the number of hit records when the searcher searches for records with the gender “female”, thereby obtaining the post-disturbance frequencies (the frequency of “male 0” and the frequency of “female”) of the transformation set 47d. ) can be obtained. The attacker 33 subtracts the frequency of the plaintext "woman" from the perturbed frequency of the transformation set 47d. Then, the frequency of "male 0" is obtained. Furthermore, the attacker 33 adds the post-disturbance frequency of another transformation set 47e to the subtraction result. The frequency after addition is the sum of the frequency of "Male 0" and the frequency of "Male 1".

攻撃者３３は、加算結果が「男」の頻度と等しくなることが確認できれば、変換集合４７ｄの攪乱後頻度となる検索が行われた場合、対応する変換集合４７ｄには「女」が含まれると判断できる。以後、攻撃者３３は、検索者により変換集合４７ｄの攪乱後頻度となる検索のみが行われた場合、「女」を検索したと特定することができる。 If the attacker 33 can confirm that the addition result is equal to the frequency of "male", then if a search is performed with the post-disturbance frequency of the transformation set 47d, the corresponding transformation set 47d will include "female". can be judged. After that, the attacker 33 can identify that the searcher has searched for "woman" when only the search with the post-disturbance frequency of the conversion set 47d is performed by the searcher.

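攻撃者３３がこのような試行を繰り返すことで検索対象の絞り込みが可能となるが、同じ項目に登録可能なキーワード数が多くなるほど、このような攻撃の試行回数が増加し、検索対象が特定される危険性は低下する。 By repeating such trials, the attacker 33 can narrow down the search target. However, the more keywords that can be registered in the same item, the more trials such an attack requires, and the lower the risk that the search target is identified.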

このように、変換集合一覧４７に示すような変換集合４７ａ～４７ｈでは、性別のように登録可能なキーワード数が少ない項目がある場合に、頻度分析攻撃に対する安全性が低下してしまう。ただし、各項目の登録可能なキーワード数が多ければ、攻撃者３３が行う攻撃で用いるキーワード（攻撃において該当キーワードの頻度の加減算を行う）の組み合わせ数が膨大となる。その結果、変換集合一覧４７に示すように一部の要素が、ＤＢ２１０に登録されるキーワードのまま（分割キーワードに変換されていない）であっても十分に安全となる。 As described above, in the conversion sets 47a to 47h shown in the conversion set list 47, security against frequency analysis attacks decreases when there is an item such as gender that has a small number of keywords that can be registered. However, if the number of keywords that can be registered for each item is large, the number of combinations of keywords used in attacks by the attacker 33 (addition and subtraction of the frequency of the relevant keywords in the attacks) becomes enormous. As a result, as shown in the conversion set list 47, some elements are sufficiently safe even if the keywords registered in the DB 210 remain as they are (not converted into split keywords).

なお、すべてのキーワードを分割キーワードに変換したとしても、変換集合を適切に生成しないと、頻度分析攻撃に対する安全性が不十分となる場合がある。 Note that even if all keywords are converted into split keywords, security against frequency analysis attacks may be insufficient unless the conversion sets are generated appropriately.
図１７は、変換集合の他の生成例を示す第２の図である。図１７に示す変換集合一覧４８には、診療科についての３つの変換集合４８ａ～４８ｃ、性別についての３つの変換集合４８ｄ～４８ｆ、および年齢層についての４つの変換集合４８ｇ～４８ｊが含まれている。図１７の例では、年齢層の変換集合４８ｇ～４８ｊの要素が、図１５に示した変換集合２３２ｇ～２３２ｊと異なっている。 FIG. 17 is a second diagram showing another example of generating conversion sets. The conversion set list 48 shown in FIG. 17 includes three conversion sets 48a to 48c for the medical department, three conversion sets 48d to 48f for gender, and four conversion sets 48g to 48j for the age group. In the example of FIG. 17, the elements of the age group conversion sets 48g to 48j differ from those of the conversion sets 232g to 232j shown in FIG. 15.

変換集合４８ｇ～４８ｊでは、「老人」の分割キーワード「老人０」、「老人１」が、それぞれ変換集合４８ｇと変換集合４８ｈとに格納されている。また「青年」の分割キーワード「青年０」、「青年１」が、それぞれ変換集合４８ｇと変換集合４８ｈとに格納されている。すなわち変換集合４８ｇと変換集合４８ｈとは、属する分割キーワードの変換元のキーワードの組み合わせ（「老人」、「青年」）が同じである。同様に、変換集合４８ｉと変換集合４８ｊとも、属する分割キーワードの変換元のキーワードの組み合わせ（「成人」、「児童」）が同じである。 In the conversion sets 48g to 48j, the divided keywords "old man 0" and "old man 1" of "old man" are stored in the conversion set 48g and the conversion set 48h, respectively. Separated keywords "seinen 0" and "seinen 1" of "youth" are stored in the conversion set 48g and the conversion set 48h, respectively. That is, the conversion set 48g and the conversion set 48h have the same combination of keywords ("old man", "young man") as the conversion sources of the divided keywords to which they belong. Similarly, both the conversion set 48i and the conversion set 48j have the same combination of keywords (“adult” and “children”) as the conversion sources of the divided keywords to which they belong.

このような変換集合４８ｇ～４８ｊが生成された場合、変換集合４８ｇに属する分割キーワードの攪乱後頻度と変換集合４８ｈに属する分割キーワードの攪乱後頻度との合計が、ＤＢ２１０における「老人」と「青年」との出現頻度の合計に等しくなる。そうすると「老人」または「青年」の検索が行われたとき、ヒットしたレコードの件数から、検索対象のキーワードが「老人」または「青年」であると攻撃者３３が推定できてしまう。すなわち検索対象のキーワードの候補が２つに絞り込まれてしまう。 When such conversion sets 48g to 48j are generated, the sum of the post-disturbance frequencies of the split keywords belonging to the conversion set 48g and the post-disturbance frequencies of the split keywords belonging to the conversion set 48h is the same as the "elderly" and "youth" in the DB 210. is equal to the sum of the occurrence frequencies of . Then, when a search for "old man" or "young man" is performed, the attacker 33 can estimate that the keyword to be searched is "old man" or "young man" from the number of hit records. That is, the search target keyword candidates are narrowed down to two.

このような頻度分析攻撃を困難にするために、データ登録要求サーバ２００，３００は、図１５に示すように、変換集合内の分割キーワードの変換元のキーワードの組み合わせが同じとなる複数の変換集合が生じないようにする。 To make such a frequency analysis attack difficult, the data registration servers 200 and 300 ensure, as shown in FIG. 15, that no two conversion sets have the same combination of source keywords for the split keywords they contain.
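
A check for this constraint might look like the following Python sketch. The function name and data layout are assumptions. The first example follows the FIG. 17 description, with the exact placement of the "adult" and "child" split keywords in 48i and 48j assumed. In the second example, only the pairing of "elderly0" and "child0" in 232g comes from the text, and the remaining sets are invented.

```python
from collections import defaultdict


def duplicated_source_combinations(sets, source_of):
    """Group conversion sets by the combination of source keywords they contain."""
    seen = defaultdict(list)
    for name, elements in sets.items():
        combo = frozenset(source_of[e] for e in elements if e in source_of)
        if combo:  # ignore sets made only of dummy elements
            seen[combo].append(name)
    # Any combination shared by two or more sets leaks the sum of true frequencies.
    return {combo: names for combo, names in seen.items() if len(names) > 1}


source_of = {
    f"{kw}{i}": kw for kw in ("elderly", "adult", "youth", "child") for i in (0, 1)
}

fig17_sets = {
    "48g": ["elderly0", "youth0"],
    "48h": ["elderly1", "youth1"],
    "48i": ["adult0", "child0"],
    "48j": ["adult1", "child1"],
}
fig15_like_sets = {
    "232g": ["elderly0", "child0"],
    "232h": ["elderly1", "adult0"],  # invented
    "232i": ["youth0", "adult1"],    # invented
    "232j": ["youth1", "child1"],    # invented
}

bad = duplicated_source_combinations(fig17_sets, source_of)
good = duplicated_source_combinations(fig15_like_sets, source_of)
print(bad, good)
```

The FIG. 17 arrangement is flagged twice, once for the "elderly"/"youth" pair and once for the "adult"/"child" pair, while the second arrangement has no duplicated combination.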

次に、確率的に分割キーワードへの変換を行うことで頻度分析攻撃に対する安全性を向上させた秘密情報管理システムの各装置の機能について説明する。 Next, the functions of each device of the secret information management system, which improves security against frequency analysis attacks by probabilistic conversion into split keywords, will be described.
図１８は、秘密情報管理システムの機能を示すブロック図である。データ管理サーバ１００は、秘匿化ＤＢ１１０、データ登録部１２０、鍵提供依頼部１３０、および検索部１４０を有する。秘匿化ＤＢ１１０は、データ管理サーバ１００が有するメモリ１０２またはストレージ装置１０３によって実現される。データ登録部１２０、鍵提供依頼部１３０、および検索部１４０は、データ管理サーバ１００が有するプロセッサ１０１によって実現される。 FIG. 18 is a block diagram showing the functions of the secret information management system. The data management server 100 has an anonymization DB 110, a data registration unit 120, a key provision request unit 130, and a search unit 140. The anonymization DB 110 is implemented by the memory 102 or the storage device 103 of the data management server 100. The data registration unit 120, the key provision request unit 130, and the search unit 140 are implemented by the processor 101 of the data management server 100.

秘匿化ＤＢ１１０は、データ登録サーバ２００，３００から収集した暗号文の患者データを、暗号文のまま管理するＤＢである。 The anonymization DB 110 is a DB that manages ciphertext patient data collected from the data registration servers 200 and 300 in ciphertext form.
データ登録部１２０は、データ登録サーバ２００，３００からのデータ登録要求に応じて、暗号文の患者データを秘匿化ＤＢ１１０に登録する。 The data registration unit 120 registers ciphertext patient data in the anonymization DB 110 in response to data registration requests from the data registration servers 200 and 300.

鍵提供依頼部１３０は、端末装置４００，５００からの鍵取得要求を受信すると、端末装置４００，５００への鍵提供依頼を、データ登録サーバ２００，３００に送信する。 Upon receiving a key acquisition request from the terminal devices 400 and 500, the key provision request unit 130 transmits a request to provide a key to the terminal devices 400 and 500 to the data registration servers 200 and 300.
検索部１４０は、端末装置４００，５００からの暗号化された検索キーワードを含むデータ検索クエリに応じて、秘匿化ＤＢ１１０に登録された患者データを検索する。この際、検索部１４０は、患者データと検索キーワードとを暗号文のまま照合し、検索キーワードに合致するレコードを、秘匿化ＤＢ１１０から抽出する。そして検索部１４０は、抽出したレコードを、検索クエリの送信元の端末装置４００，５００に送信する。 The search unit 140 searches the patient data registered in the anonymization DB 110 in response to data search queries from the terminal devices 400 and 500 that include encrypted search keywords. At this time, the search unit 140 compares the patient data and the search keyword while both remain in ciphertext, and extracts the records matching the search keyword from the anonymization DB 110. The search unit 140 then transmits the extracted records to the terminal device 400 or 500 that sent the search query.

データ登録サーバ２００は、ＤＢ２１０、鍵記憶部２２０、変換情報記憶部２３０、鍵生成部２４０、データ登録要求部２５０、および鍵提供部２６０を有する。ＤＢ２１０、鍵記憶部２２０、および変換情報記憶部２３０は、データ登録サーバ２００が有するメモリまたはストレージ装置によって実現される。また鍵生成部２４０、データ登録要求部２５０、および鍵提供部２６０は、データ登録サーバ２００が有するプロセッサによって実現される。 The data registration server 200 has a DB 210 , a key storage section 220 , a conversion information storage section 230 , a key generation section 240 , a data registration request section 250 and a key provision section 260 . The DB 210, the key storage unit 220, and the conversion information storage unit 230 are implemented by a memory or storage device that the data registration server 200 has. Key generation unit 240 , data registration request unit 250 , and key provision unit 260 are implemented by a processor included in data registration server 200 .

ＤＢ２１０は、患者データを平文で格納するＤＢである。 The DB 210 is a DB that stores patient data in plaintext.
鍵記憶部２２０は、データ管理サーバ１００に登録する患者データの暗号化に使用する暗号鍵を記憶する。暗号鍵は、データ管理サーバ１００からアクセスできないように管理される。 The key storage unit 220 stores the encryption key used for encrypting the patient data to be registered in the data management server 100. The encryption key is managed so that it cannot be accessed from the data management server 100.

変換情報記憶部２３０は、真のデータのレコード群に示されるキーワードを、ダミーレコードに登録するダミー値に変換するために使用する情報を記憶する。例えば変換集合一覧などの情報が変換情報記憶部２３０に格納される。 The conversion information storage unit 230 stores information used to convert the keywords indicated in the true data record group into dummy values to be registered in the dummy records. For example, information such as a conversion set list is stored in the conversion information storage unit 230 .

鍵生成部２４０は、暗号鍵を生成する。鍵生成部２４０は、生成した暗号鍵を鍵記憶部２２０に格納する。 The key generation unit 240 generates an encryption key. The key generation unit 240 stores the generated encryption key in the key storage unit 220.
データ登録要求部２５０は、データ管理サーバ１００への登録対象の患者データの暗号文を含むデータ登録要求を、データ管理サーバ１００に送信する。例えばデータ登録要求部２５０は、まず登録対象の患者データをＤＢ２１０から取得し、秘匿化ＤＢ１１０のフォーマットに合わせて、患者データを加工する。この際、データ登録要求部２５０は、送信するデータ登録要求にダミーデータを含める。ダミーデータは複数のダミーレコードを含む。ダミーレコード内にはダミー値が登録されている。データ登録要求部２５０は、変換情報記憶部２３０に格納されている情報を用いて、真のデータ内の値をダミー値に変換し、ダミーレコードに登録する。 The data registration request unit 250 transmits to the data management server 100 a data registration request including the ciphertext of the patient data to be registered in the data management server 100. For example, the data registration request unit 250 first acquires the patient data to be registered from the DB 210 and processes the patient data according to the format of the anonymization DB 110. At this time, the data registration request unit 250 includes dummy data in the data registration request to be transmitted. The dummy data contains a plurality of dummy records, and dummy values are registered in the dummy records. The data registration request unit 250 uses the information stored in the conversion information storage unit 230 to convert the values in the true data into dummy values and registers them in the dummy records.

さらにデータ登録要求部２５０は、暗号鍵を用いて、秘匿化ＤＢ１１０に登録する項目値ごとに、患者データに含まれる値を暗号化する。そしてデータ登録要求部２５０は、項目値ごとに暗号化された、暗号文の患者データを含むデータ登録要求を、データ管理サーバ１００に送信する。 Furthermore, the data registration requesting unit 250 uses the encryption key to encrypt the values included in the patient data for each item value to be registered in the anonymization DB 110 . Then, the data registration request unit 250 transmits to the data management server 100 a data registration request including patient data in ciphertext encrypted for each item value.

鍵提供部２６０は、データ管理サーバ１００からの鍵提供依頼に応じて、登録した患者データの利用を許可する製薬企業の端末装置４００，５００へ、暗号鍵を送信する。なお鍵提供部２６０は、暗号鍵を、データ管理サーバ１００を経由せずに端末装置４００，５００に送信する。データ管理サーバ１００を経由せずに暗号鍵を送信することで、暗号鍵がデータ管理サーバ１００から隔離される。その結果、データ管理サーバ１００の管理者による、秘匿化ＤＢ１１０内のデータの復号が抑止される。 In response to a key provision request from the data management server 100, the key provision unit 260 transmits the encryption key to the terminal devices 400 and 500 of the pharmaceutical company that permit use of the registered patient data. Note that the key providing unit 260 transmits the encryption key to the terminal devices 400 and 500 without going through the data management server 100 . By transmitting the encryption key without going through the data management server 100 , the encryption key is isolated from the data management server 100 . As a result, the administrator of the data management server 100 is prevented from decrypting the data in the anonymization DB 110 .

以上、データ登録サーバ２００が有する機能を説明したが、データ登録サーバ３００もデータ登録サーバ２００と同様の機能を有する。 The functions of the data registration server 200 have been described above; the data registration server 300 has the same functions as the data registration server 200.
端末装置４００は、鍵記憶部４１０、変換情報記憶部４２０、鍵取得部４３０、および検索要求部４４０を有する。鍵記憶部４１０と変換情報記憶部４２０は、端末装置４００が有するメモリまたはストレージ装置によって実現される。また鍵取得部４３０、および検索要求部４４０は、端末装置４００が有するプロセッサによって実現される。 The terminal device 400 has a key storage unit 410, a conversion information storage unit 420, a key acquisition unit 430, and a search request unit 440. The key storage unit 410 and the conversion information storage unit 420 are implemented by a memory or storage device of the terminal device 400. The key acquisition unit 430 and the search request unit 440 are implemented by a processor of the terminal device 400.

鍵記憶部４１０は、検索クエリに含める検索キーワードの暗号化に使用する暗号鍵を記憶する。暗号鍵は、データ管理サーバ１００からアクセスできないように管理される。 The key storage unit 410 stores the encryption key used for encrypting the search keywords to be included in search queries. The encryption key is managed so that it cannot be accessed from the data management server 100.
変換情報記憶部４２０は、検索者が指定した検索条件に示されるキーワードをダミー値に変換するために使用する情報を記憶する。変換情報記憶部４２０に格納されている情報は、データ登録サーバ２００の変換情報記憶部２３０に格納されている情報と同じである。 The conversion information storage unit 420 stores the information used to convert a keyword indicated in a search condition specified by a searcher into a dummy value. The information stored in the conversion information storage unit 420 is the same as the information stored in the conversion information storage unit 230 of the data registration server 200.

鍵取得部４３０は、データ登録サーバ２００，３００で提供される暗号鍵を取得する。例えば鍵取得部４３０は、データ管理サーバ１００に、鍵取得要求を送信する。するとデータ管理サーバ１００の鍵提供依頼部１３０により、データ登録サーバ２００，３００に鍵提供依頼が送信される。鍵提供依頼に応じて、例えばデータ登録サーバ２００の鍵提供部２６０が、暗号鍵を端末装置４００に送信する。そして鍵取得部４３０は、端末装置４００から送信された暗号鍵を取得する。鍵取得部４３０は、取得した暗号鍵を、鍵記憶部４１０に格納する。 Key acquisition unit 430 acquires the encryption key provided by data registration servers 200 and 300 . For example, the key acquisition unit 430 transmits a key acquisition request to the data management server 100 . Then, the key provision request unit 130 of the data management server 100 transmits a key provision request to the data registration servers 200 and 300 . In response to the key provision request, for example, the key provision unit 260 of the data registration server 200 transmits the encryption key to the terminal device 400 . The key acquisition unit 430 then acquires the encryption key transmitted from the terminal device 400 . Key acquisition unit 430 stores the acquired encryption key in key storage unit 410 .

検索要求部４４０は、患者データの利用者（検索者）が入力した検索キーワードを取得する。次に検索要求部４４０は、取得した検索キーワードを、暗号鍵を用いて暗号化し、暗号文の検索キーワードを含む検索クエリをデータ管理サーバ１００に送信する。検索要求部４４０は、データ管理サーバ１００から検索結果を受信すると、検索結果の内容（例えば検索キーワードに合致した真のデータのレコード数）を表示する。 The search request unit 440 acquires a search keyword input by a patient data user (searcher). Next, the search request unit 440 encrypts the acquired search keyword using an encryption key, and transmits a search query including the encrypted search keyword to the data management server 100 . Upon receiving search results from the data management server 100, the search request unit 440 displays the contents of the search results (for example, the number of true data records that match the search keyword).

なお検索要求部４４０は、ダミーデータを検索対象とする検索クエリを送信することもできる。その場合、検索要求部４４０は、変換情報記憶部４２０に格納されている情報を用いて、検索キーワードをダミー値に変換し、暗号化したダミー値を含む検索クエリを送信する。この場合、検索要求部４４０は、検索結果に示されるレコードから所定のダミーレコードを抽出し、そのダミーレコード内のダミー値を、変換情報記憶部４２０に格納されている情報を用いて、真のデータに設定されていた値に変換する。そして検索要求部４４０は、変換後の値を有するレコードの内容を検索結果として表示する。 Note that the search request unit 440 can also transmit a search query that targets dummy data. In that case, the search request unit 440 converts the search keyword into a dummy value using the information stored in the conversion information storage unit 420, and transmits a search query including the encrypted dummy value. The search request unit 440 then extracts the predetermined dummy records from the records in the search result and, using the information stored in the conversion information storage unit 420, converts the dummy value in each dummy record back into the value that was set in the true data. Finally, the search request unit 440 displays the content of the records having the converted values as the search result.

以上、端末装置４００が有する機能を説明したが、端末装置５００も端末装置４００と同様の機能を有する。
図１８に示した機能により、データ管理サーバ１００の管理者に対しても患者データと検索クエリの内容を秘匿したまま、データ管理サーバ１００において患者データを管理すると共に、製薬企業１５，１６による患者データの利用を可能とすることができる。なお、図１８に示した各要素の機能は、例えば、その要素に対応するプログラムモジュールをコンピュータに実行させることで実現することができる。 Although the functions of the terminal device 400 have been described above, the terminal device 500 also has the same functions as the terminal device 400 .
With the functions shown in FIG. 18, the data management server 100 can manage the patient data while the patient data and the content of search queries are kept secret even from the administrator of the data management server 100, and the pharmaceutical companies 15 and 16 can be allowed to use the patient data. The function of each element shown in FIG. 18 can be realized, for example, by causing a computer to execute a program module corresponding to that element.

次に、図１８に示したシステムによる出現頻度攪乱処理の概要について説明する。
図１９は、ダミーデータを用いた出現頻度の攪乱処理の一例を示す図である。データ登録サーバ２００は、データ５４の使用を許可する製薬企業（例えば製薬企業１５）の端末装置４００へ、鍵生成部２４０が生成した暗号鍵５１を送信する（ステップＳ１１）。例えば端末装置４００の鍵取得部４３０がデータ管理サーバ１００に鍵取得要求を送信する。データ管理サーバ１００では、鍵提供依頼部１３０が、データ登録サーバ２００に暗号鍵５１の提供を依頼する。データ登録サーバ２００の鍵提供部２６０は、暗号鍵５１の提供依頼を受信すると、管理者による暗号鍵５１の提供の許可を示す入力を受け付ける。鍵提供部２６０は、暗号鍵５１の提供を許可する旨の入力が行われると、暗号鍵５１を鍵記憶部２２０から取得し、取得した暗号鍵５１と同じ暗号鍵５２を、データ管理サーバ１００を経由させずに端末装置４００に送信する。端末装置４００では、鍵取得部４３０が受信した暗号鍵５２を鍵記憶部４１０に格納する。これにより、データ登録サーバ２００と端末装置４００とで、暗号鍵の共有化が図られる。 Next, an overview of appearance frequency disturbance processing by the system shown in FIG. 18 will be described.
FIG. 19 is a diagram showing an example of the appearance frequency disturbance process using dummy data. The data registration server 200 transmits the encryption key 51 generated by the key generation unit 240 to the terminal device 400 of a pharmaceutical company (for example, the pharmaceutical company 15) that is permitted to use the data 54 (step S11). For example, the key acquisition unit 430 of the terminal device 400 transmits a key acquisition request to the data management server 100. In the data management server 100, the key provision request unit 130 requests the data registration server 200 to provide the encryption key 51. Upon receiving the request, the key provision unit 260 of the data registration server 200 accepts an input from the administrator indicating whether provision of the encryption key 51 is permitted. When an input permitting provision is made, the key provision unit 260 acquires the encryption key 51 from the key storage unit 220 and transmits an encryption key 52, identical to the acquired encryption key 51, to the terminal device 400 without routing it through the data management server 100. In the terminal device 400, the key acquisition unit 430 stores the received encryption key 52 in the key storage unit 410. As a result, the encryption key is shared between the data registration server 200 and the terminal device 400.

その後、データ登録サーバ２００は、ＤＢ２１０内のデータ２１０ａに対して、ダミーデータ５５を追加する（ステップＳ１２）。例えばデータ登録要求部２５０は、ＤＢ２１０から取得した真のデータ５４に含まれるレコードの数のＧ－１倍のダミーレコードを、ダミーデータ５５として追加する。この際、データ登録要求部２５０は、追加したダミーレコードの項目値（ダミー値）として、真のデータ５４に設定されている項目値を用い、各項目値の出現頻度の偏りを減少させる。さらにデータ登録要求部２５０は、各レコードに、真のデータ５４のレコードなのかダミーレコードなのか、ダミーレコードであればどのダミーレコード群に属するのかを識別するためのフラグ５６を付与する。 After that, the data registration server 200 adds dummy data 55 to the data 210a in the DB 210 (step S12). For example, the data registration requesting unit 250 adds as dummy data 55 G-1 times as many dummy records as the number of records included in the true data 54 acquired from the DB 210 . At this time, the data registration requesting unit 250 uses the item values set in the true data 54 as the item values (dummy values) of the added dummy record to reduce the bias in appearance frequency of each item value. Furthermore, the data registration requesting unit 250 gives each record a flag 56 for identifying whether it is a record of the true data 54 or a dummy record, and if it is a dummy record, to which dummy record group it belongs.

データ登録要求部２５０は、真のデータ５４とダミーデータ５５との各レコード内の項目値（フラグを含む）を暗号鍵５１で暗号化して、登録データ５３を生成する（ステップＳ１３）。そしてデータ登録要求部２５０は、登録データ５３を含むデータ登録要求を、データ管理サーバ１００に送信する（ステップＳ１４）。データ管理サーバ１００では、データ登録部１２０が、登録データ５３を受信し、受信した登録データ５３を秘匿化ＤＢ１１０に格納する。 The data registration request unit 250 encrypts the item values (including flags) in each record of the true data 54 and the dummy data 55 with the encryption key 51 to generate the registration data 53 (step S13). The data registration request unit 250 then transmits a data registration request including the registration data 53 to the data management server 100 (step S14). In the data management server 100 , the data registration unit 120 receives the registration data 53 and stores the received registration data 53 in the anonymization DB 110 .

製薬企業１５の担当者がデータ５４を利用する場合、担当者は、端末装置４００に検索キーワードを入力する。すると検索要求部４４０は、入力された検索キーワードを暗号鍵５２で暗号化して、暗号文の検索キーワードを含む検索クエリ５７を生成する（ステップＳ１５）。なお検索要求部４４０は、いずれかのダミーレコード群を検索対象とする場合、入力された検索キーワードを、ダミーレコード群におけるその検索キーワードに対応するダミー値に変換する。そして検索要求部４４０は、変化で得られたダミー値を暗号化した値を含む検索クエリ５７を生成する。そして検索要求部４４０は、検索クエリ５７をデータ管理サーバ１００に送信する（ステップＳ１６）。 When a person in charge at the pharmaceutical company 15 uses the data 54, that person enters a search keyword into the terminal device 400. The search request unit 440 then encrypts the entered search keyword with the encryption key 52 and generates a search query 57 including the encrypted search keyword (step S15). If one of the dummy record groups is the search target, the search request unit 440 first converts the entered search keyword into the dummy value that corresponds to that keyword in the dummy record group, and generates a search query 57 containing the encrypted form of the dummy value obtained by the conversion. The search request unit 440 then transmits the search query 57 to the data management server 100 (step S16).

データ管理サーバ１００では、検索部１４０が、データを秘匿化したままで、登録データ５３と検索クエリ５７とを照合する（ステップＳ１７）。そして検索部１４０は、検索クエリ５７による検索にヒットしたレコードを、検索結果５８として端末装置４００に送信する（ステップＳ１８）。検索結果５８には、真のデータ５４のレコードとダミーデータ５５のダミーレコードとが含まれる。 In the data management server 100, the search unit 140 collates the registered data 53 and the search query 57 while keeping the data confidential (step S17). Then, the search unit 140 transmits the records hit by the search by the search query 57 to the terminal device 400 as the search results 58 (step S18). The search results 58 include records of true data 54 and dummy records of dummy data 55 .

端末装置４００では検索要求部４４０が検索結果５８を受信する。検索要求部４４０は、真のデータを検索対象とした場合には、例えばフラグに基づいて、検索結果５８からダミーレコードを破棄する（ステップＳ１９）。そして検索要求部４４０は、検索結果５８内の真のデータ５４のレコードのみを含む真の結果５９を、モニタなどに表示する。 In terminal device 400 , search request unit 440 receives search result 58 . The search request unit 440 discards the dummy record from the search result 58 based on the flag, for example, when the true data is to be searched (step S19). Then, the search request unit 440 displays on a monitor or the like a true result 59 containing only records of the true data 54 in the search result 58 .

なお検索要求部４４０は、ダミーレコード群を検索対象とした場合には、例えばフラグに基づいて、検索結果５８から検索対象のダミーレコード群に属するダミーレコードを抽出し、その他のレコードを破棄する。検索要求部４４０は、ダミーレコード内のダミー値を、そのダミー値の変換元であったキーワードに変換する。そして検索要求部４４０は、元のキーワードに変換された値を含むレコードを真の結果５９として表示する。 When the dummy record group is set as a search target, the search request unit 440 extracts dummy records belonging to the dummy record group to be searched from the search result 58 based on flags, for example, and discards other records. The search request unit 440 converts the dummy value in the dummy record into the keyword from which the dummy value was converted. Search requester 440 then displays as true results 59 records containing values converted to the original keywords.

このようにダミーデータ５５を追加することで、各項目値の頻度攪乱が可能となる。端末装置４００は、フラグを用いてダミーデータ５５と真のデータ５４とを識別して、真の結果５９を得ることができる。 By adding the dummy data 55 in this way, it becomes possible to disturb the frequency of each item value. The terminal device 400 can use the flag to distinguish between the dummy data 55 and the true data 54 to obtain the true result 59 .
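
The flag-based filtering on the terminal side can be sketched as follows. This is only a minimal illustration, assuming the search hits have already been decrypted and that each flag is available as a pair of source ID and group number (the flag layout given later in the description of FIG. 25). The names `Hit`, `select_group`, `restore_keywords` and the `dummy_to_keyword` mapping are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass

TRUE_GROUP = 0  # group number assumed for true-data records


@dataclass(frozen=True)
class Hit:
    """One decrypted search hit (hypothetical layout)."""
    values: dict    # item name -> item value
    source_id: int  # flag, first half: ID of the originating record
    group: int      # flag, second half: group number of the record


def select_group(hits, target_group=TRUE_GROUP):
    """Keep only the hits whose flag names the searched record group."""
    return [h for h in hits if h.group == target_group]


def restore_keywords(hits, dummy_to_keyword):
    """Map dummy values back to the keywords they were converted from."""
    return [
        {item: dummy_to_keyword.get(value, value) for item, value in h.values.items()}
        for h in hits
    ]
```

Calling `select_group(hits)` corresponds to discarding dummy records when the true data is searched; calling `select_group(hits, g)` followed by `restore_keywords` corresponds to a search against dummy record group `g`.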

次に、データ登録サーバ２００，３００が有する平文の患者データのＤＢ２１０について説明する。
図２０は、平文の患者データのＤＢの一例を示す図である。ＤＢ２１０には、真のデータ２１０ａが平文のまま格納されている。真のデータ２１０ａには、例えば患者ごとのレコードが、レコードの識別子（ＩＤ）に対応付けて登録されている。各レコードには、項目ごとの列に、その項目に対応するキーワードが設定されている。図２０の例では、項目として「診療科」、「性別」、「年齢層」がある。ＤＢ２１０に登録されている各レコード内の値は、例えば平文の文字コードである。 Next, the plaintext patient data DB 210 of the data registration servers 200 and 300 will be described.
FIG. 20 is a diagram showing an example of a plaintext patient data DB. The DB 210 stores true data 210a in plaintext. In the true data 210a, for example, a record for each patient is registered in association with a record identifier (ID). In each record, the column for each item holds the keyword corresponding to that item. In the example of FIG. 20, the items are "medical department", "gender", and "age group". A value in each record registered in the DB 210 is, for example, a plaintext character code.

データ登録要求部２５０は、患者のレコードをデータ管理サーバ１００に登録する場合、そのレコードに設定された値（平文）を、確定的暗号化技術により暗号化する。そして暗号化されたレコードが、データ管理サーバ１００の秘匿化ＤＢ１１０に登録される。その際、データ登録要求部２５０は、分割キーワード選択テーブル２３３（図１４参照）に基づいてキーワードを確率的に分割キーワードに変換する。さらにデータ登録要求部２５０は、変換情報記憶部２３０を参照して、頻度分析攻撃に対する攪乱のためにダミーデータを生成する。 When registering a patient's record in the data management server 100, the data registration requesting unit 250 encrypts the value (plaintext) set in the record using deterministic encryption technology. The encrypted record is registered in the anonymization DB 110 of the data management server 100 . At this time, the data registration request unit 250 probabilistically converts the keywords into divided keywords based on the divided keyword selection table 233 (see FIG. 14). Furthermore, the data registration requesting unit 250 refers to the conversion information storage unit 230 and generates dummy data for disturbing the frequency analysis attack.

図２１は、データ登録サーバ内の変換情報記憶部に格納される情報の一例を示す図である。変換情報記憶部２３０には、キーワード一覧２３１、変換集合一覧２３２、および分割キーワード選択テーブル２３３が記憶されている。キーワード一覧２３１は、項目ごとに、その項目に設定可能なキーワードのリストが示されたデータである。変換集合一覧２３２は、変換集合の内容を示すデータである。分割キーワード選択テーブル２３３は、キーワードを分割して秘匿化ＤＢ１１０に登録する場合における分割後のキーワード（分割キーワード）のリストを示すデータである。なお変換集合一覧２３２と分割キーワード選択テーブル２３３とは、キーワード一覧２３１に基づいて、データ登録要求部２５０によって生成されるデータである。変換集合一覧２３２と分割キーワード選択テーブル２３３とは、データ登録サーバ２００の外部からアクセスできないように秘密に管理される。 FIG. 21 is a diagram showing an example of information stored in a conversion information storage unit within the data registration server. A keyword list 231 , a conversion set list 232 , and a split keyword selection table 233 are stored in the conversion information storage unit 230 . The keyword list 231 is data showing a list of keywords that can be set for each item. The conversion set list 232 is data indicating the contents of conversion sets. The split keyword selection table 233 is data showing a list of keywords after splitting (split keywords) when a keyword is split and registered in the anonymization DB 110 . Note that the conversion set list 232 and the split keyword selection table 233 are data generated by the data registration requesting unit 250 based on the keyword list 231 . The conversion set list 232 and the split keyword selection table 233 are secretly managed so that they cannot be accessed from outside the data registration server 200 .

図２２は、キーワード一覧の一例を示す図である。キーワード一覧２３１には、ＤＢ２１０の項目ごとに、該当項目に設定可能なキーワードのリストが登録されている。図２２の例では、「診療科」の項目に登録できるキーワードは３個であり、「性別」の項目に登録できるキーワードは２個であり、「年齢層」の項目に登録できるキーワードは４個である。 FIG. 22 is a diagram showing an example of the keyword list. In the keyword list 231, a list of the keywords that can be set is registered for each item of the DB 210. In the example of FIG. 22, three keywords can be registered in the item "medical department", two in the item "gender", and four in the item "age group".

図２３は、変換集合の第１の生成例を示す図である。キーワードリスト２３１ａには、年齢層に設定可能な４個のキーワードが示されている。またＧ＝２であり、各変換集合２３２ｇ～２３２ｊには２つずつの分割キーワードが含まれる。図１４に示した分割キーワード選択テーブル２３３に基づいて分割キーワードを生成する場合、年齢層に関する分割キーワードは８個となる。 FIG. 23 is a diagram illustrating a first generation example of a transformation set. The keyword list 231a shows four keywords that can be set for age groups. Also, G=2, and each conversion set 232g to 232j includes two divided keywords. When the divided keywords are generated based on the divided keyword selection table 233 shown in FIG. 14, there are eight divided keywords related to age groups.

データ登録要求部２５０は、まず４個の変換集合２３２ｇ～２３２ｊの要素の格納領域を生成する。またデータ登録要求部２５０は、変換集合２３２ｇ～２３２ｊの識別子をそれぞれ「２－０」～「２－３」とする。識別子の左側の数値は「年齢層」に対応する値であり、右側の数値は「年齢層」の変換集合２３２ｇ～２３２ｊに対する通し番号である。 The data registration request unit 250 first creates storage areas for the elements of the four conversion sets 232g to 232j. The data registration requesting unit 250 sets the identifiers of the conversion sets 232g to 232j to "2-0" to "2-3", respectively. The numerical value on the left side of the identifier is the value corresponding to the "age group", and the numerical value on the right side is the serial number for the conversion sets 232g to 232j of the "age group".

データ登録要求部２５０は、例えばキーワードリスト２３１ａから、所定の順番あるいはランダムな順番ですべてのキーワードを１回ずつ選択する。図２３の例では、キーワードリスト２３１ａの左から順にキーワードを選択するものとする。 The data registration request unit 250 selects all keywords once from the keyword list 231a, for example, in a predetermined order or in random order. In the example of FIG. 23, the keywords are selected in order from the left of the keyword list 231a.

データ登録要求部２５０は、選択したキーワードの分割キーワードを番号の小さい変換集合から順に、変換集合の格納領域に格納していく。このとき、データ登録要求部２５０は、同じキーワードに対応する分割キーワードは異なる変換集合に格納する。またデータ登録要求部２５０は、変換集合内の分割キーワードの変換元のキーワードの組み合わせが同じとなる複数の変換集合が生じないようにする。 The data registration request unit 250 stores the divided keywords of the selected keyword in the conversion set storage area in ascending order of number. At this time, the data registration requesting unit 250 stores divided keywords corresponding to the same keyword in different conversion sets. The data registration requesting unit 250 also prevents the occurrence of a plurality of conversion sets in which the combination of the conversion source keywords of the divided keywords in the conversion set is the same.

例えばデータ登録要求部２５０は、最初に選択したキーワードの１つ目の分割キーワードを先頭の変換集合２３２ｇに登録する。次にデータ登録要求部２５０は、そのキーワードの２つ目の分割キーワードを２番目の変換集合２３２ｈに登録する。例えば「老人」が最初に選択された場合、「老人０」が変換集合２３２ｇに登録され、「老人１」が変換集合２３２ｈに登録される。 For example, the data registration requesting unit 250 registers the first divided keyword of the first selected keyword in the first conversion set 232g. Next, the data registration request unit 250 registers the second divided keyword of the keyword in the second conversion set 232h. For example, if "old man" is selected first, "old man 0" is registered in the conversion set 232g and "old man 1" is registered in the conversion set 232h.

データ登録要求部２５０は、２番目に選択したキーワードの１つ目の分割キーワードを、直前に選択したキーワードの２つ目の分割キーワードと同じ変換集合２３２ｈに登録する。次にデータ登録要求部２５０は、そのキーワードの２つ目の分割キーワードを次の変換集合２３２ｉに登録する。例えば「成人」が２番目に選択された場合、「成人０」が変換集合２３２ｈに登録され、「成人１」が変換集合２３２ｉに登録される。 The data registration requesting unit 250 registers the first divided keyword of the second selected keyword in the same conversion set 232h as the second divided keyword of the previously selected keyword. Next, the data registration request unit 250 registers the second divided keyword of the keyword in the next conversion set 232i. For example, when "adult" is selected second, "adult 0" is registered in the conversion set 232h and "adult 1" is registered in the conversion set 232i.

データ登録要求部２５０は、３番目以降に選択した各キーワードの分割キーワードについても、２番目に選択したキーワードと同様の手順で変換集合に登録する。例えば「青年」が３番目に選択された場合、「青年０」が変換集合２３２ｉに登録され、「青年１」が変換集合２３２ｊに登録される。 The data registration requesting unit 250 also registers the divided keywords of the third and subsequent keywords in the conversion set in the same procedure as the second selected keyword. For example, if "young man" is selected third, "young man 0" is registered in conversion set 232i and "young man 1" is registered in conversion set 232j.

データ登録要求部２５０は、最後に選択したキーワードの分割キーワードについては、格納領域が空いている変換集合に登録する。例えば「児童」が最後に選択された場合、「児童０」が変換集合２３２ｇに登録され、「児童１」が変換集合２３２ｊに登録される。 The data registration requesting unit 250 registers the divided keyword of the last selected keyword in a conversion set with a free storage area. For example, if "child" is selected last, "child 0" is registered in transform set 232g and "child 1" is registered in transform set 232j.

このような手順で変換集合２３２ｇ～２３２ｊが生成される。変換集合２３２ｇ～２３２ｊそれぞれは、属する要素のうちの少なくとも１つが分割キーワードであり、かつその分割キーワードと変換元のキーワードが共通の他の分割キーワードが他の変換集合に含まれる。そのため変換集合の生成条件を満たしている。また変換集合２３２ｇ～２３２ｊ内の分割キーワードの変換元のキーワードの組み合わせが同じとなる複数の変換集合は生じていない。 Transformation sets 232g to 232j are generated by such a procedure. Each of the conversion sets 232g to 232j has at least one of the elements belonging to it being a split keyword, and another split keyword having a common conversion source keyword with the split keyword is included in the other conversion set. Therefore, it satisfies the transformation set generation condition. In addition, there are no multiple conversion sets in which the combination of the conversion source keywords of the divided keywords in the conversion sets 232g to 232j is the same.

図２３には年齢層の変換集合２３２ｇ～２３２ｊの生成例を示したが、診療科および性別についても同様の手順でそれぞれの変換集合２３２ａ～２３２ｆを生成することができる。その結果、図１５に示したような変換集合一覧２３２が生成される。 FIG. 23 shows an example of generation of transformation sets 232g to 232j for age groups, but transformation sets 232a to 232f can also be generated for departments and genders using the same procedure. As a result, a transformation set list 232 as shown in FIG. 15 is generated.

図２３に示したのは、Ｇ＝２，Ｌ＝２，Ｘ_j＝４，Ｍ_j（ｊ番目の項目の変換集合数）＝４の場合の例であるが、これらのパラメータの値が別の値であっても、同様に適切な変換集合を生成することができる。 FIG. 23 shows an example for G=2, L=2, X_j=4, and M_j (the number of conversion sets of the j-th item)=4, but appropriate conversion sets can be generated in the same way for other values of these parameters.

図２４は、変換集合の第２の生成例を示す図である。図２４には、Ｇ＝３，Ｌ＝４，Ｘ_j＝４，Ｍ_j＝６の場合における変換集合４９ａ～４９ｆの生成例が示されている。
例えばデータ登録要求部２５０は、最初に選択したキーワードの１つ目の分割キーワードを先頭の変換集合４９ａの要素番号「０」の要素に登録する。次にデータ登録要求部２５０は、そのキーワードの２つ目以降の分割キーワードを、変換集合番号と要素番号を１ずつ加算しながら、該当する番号の要素に登録していく。なおデータ登録要求部２５０は、加算後の要素番号がＬとなる場合、要素番号を「０」に戻す。 FIG. 24 is a diagram illustrating a second generation example of a transformation set. FIG. 24 shows an example of generation of transformation sets 49a to 49f when G=3, L=4, X _j =4 and M _j =6.
For example, the data registration requesting unit 250 registers the first divided keyword of the first selected keyword in the element with the element number "0" of the first conversion set 49a. Next, the data registration requesting unit 250 registers the second and subsequent divided keywords of the keyword in the elements of the corresponding numbers while adding 1 to the conversion set number and the element number. When the element number after addition becomes L, the data registration requesting unit 250 resets the element number to "0".

例えば「老人」が最初に選択された場合、「老人０」が変換集合４９ａの要素番号「０」の要素に登録される。続けて、「老人１」が変換集合４９ｂの要素番号「１」の要素に登録され、「老人２」が変換集合４９ｃの要素番号「２」の要素に登録され、「老人３」が変換集合４９ｄの要素番号「０」の要素に登録される。 For example, when "old man" is selected first, "old man 0" is registered in the element with the element number "0" of the conversion set 49a. Subsequently, "old man 1" is registered in the element with element number "1" in the conversion set 49b, "old man 2" is registered in the element with element number "2" in the conversion set 49c, and "old man 3" is registered in the conversion set. It is registered in the element of element number "0" of 49d.

データ登録要求部２５０は、２つ目以降に選択されたキーワードの１つ目の分割キーワードを、直前に選択されたキーワードの１つ目の分割キーワードを格納した変換集合に対して変換集合番号で次の変換集合に登録する。次にデータ登録要求部２５０は、そのキーワードの２つ目以降の分割キーワードを、変換集合番号と要素番号を１ずつ加算しながら、該当する番号の要素に登録していく。なおデータ登録要求部２５０は、加算後の要素番号がＬとなる場合、要素番号を「０」に戻す。 For each keyword selected second or later, the data registration requesting unit 250 registers its first divided keyword in the conversion set whose conversion set number immediately follows that of the set holding the first divided keyword of the previously selected keyword. Next, the data registration requesting unit 250 registers the second and subsequent divided keywords of the keyword in the elements of the corresponding numbers while adding 1 to both the conversion set number and the element number. When the element number after addition becomes L, the data registration requesting unit 250 resets the element number to "0".

例えば２つ目のキーワードとして「成人」が選択されると、「成人０」が変換集合４９ｂの要素番号「０」の要素に登録される。続けて、「成人１」が変換集合４９ｃの要素番号「１」の要素に登録され、「成人２」が変換集合４９ｄの要素番号「２」の要素に登録され、「成人３」が変換集合４９ｅの要素番号「０」の要素に登録される。 For example, when "adult" is selected as the second keyword, "adult 0" is registered in the element with the element number "0" of the conversion set 49b. Subsequently, "adult 1" is registered in the element with element number "1" of the conversion set 49c, "adult 2" is registered in the element with element number "2" of the conversion set 49d, and "adult 3" is registered in the conversion set. It is registered in the element of element number "0" of 49e.

次に「青年」が選択されると、「青年０」が変換集合４９ｃの要素番号「０」の要素に登録される。続けて、「青年１」が変換集合４９ｄの要素番号「１」の要素に登録され、「青年２」が変換集合４９ｅの要素番号「２」の要素に登録され、「青年３」が変換集合４９ｆの要素番号「０」の要素に登録される。 Next, when "Youth" is selected, "Youth 0" is registered in the element with the element number "0" of the conversion set 49c. Subsequently, "young man 1" is registered in the element with element number "1" in the conversion set 49d, "young man 2" is registered in the element with element number "2" in the conversion set 49e, and "young man 3" is registered in the conversion set. It is registered in the element of element number "0" of 49f.

データ登録要求部２５０は、最後に選択したキーワードの分割キーワードについては、格納領域が空いている変換集合に登録する。例えば「児童」が最後に選択された場合、「児童０」が変換集合４９ａの要素番号「１」の要素に登録される。続けて「児童１」が変換集合４９ｂの要素番号「２」の要素に登録され、「児童２」が変換集合４９ｅの要素番号「１」の要素に登録され、「児童３」が変換集合４９ｆの要素番号「２」の要素に登録される。 The data registration requesting unit 250 registers the divided keyword of the last selected keyword in a conversion set with a free storage area. For example, when "child" is selected last, "child 0" is registered in the element with the element number "1" of the conversion set 49a. Subsequently, "child 1" is registered in the element of element number "2" of the conversion set 49b, "child 2" is registered in the element of element number "1" of the conversion set 49e, and "child 3" is registered in the conversion set 49f. is registered in the element with the element number "2".

データ登録要求部２５０は、すべての分割キーワードを登録後に空いている要素にダミー要素を登録する。例えば変換集合４９ｆの要素番号「１」の要素に「Ｄ０」が登録され、変換集合４９ａの要素番号「２」の要素に「Ｄ１」が登録される。 The data registration request unit 250 registers dummy elements in empty elements after registering all divided keywords. For example, "D0" is registered as the element with the element number "1" in the conversion set 49f, and "D1" is registered as the element with the element number "2" in the conversion set 49a.
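
The placement walked through for FIG. 23 and FIG. 24 can be sketched roughly as below. This is not the procedure of FIG. 28, which is not reproduced here; it is a simplified sketch that happens to yield the same set membership as the two worked examples. It assumes each conversion set holds G elements, ignores the element numbers inside a set, starts keyword i at set i, and lets the last keyword take the remaining free slots from the first set onward. The function name, the tuple representation and the dummy numbering are assumptions for illustration (the dummy labels in FIG. 24 are numbered differently).

```python
from math import ceil


def build_conversion_sets(keywords, G, L):
    """Distribute the L divided keywords of each keyword over conversion sets.

    Each set holds G elements, and divided keywords of the same keyword
    always land in different sets. Elements are (keyword, index) tuples;
    dummy elements are ("D", n).
    """
    M = max(ceil(L * len(keywords) / G), 3)  # number of conversion sets
    sets = [[] for _ in range(M)]
    for i, kw in enumerate(keywords):
        is_last = i == len(keywords) - 1
        pos = 0 if is_last else i  # the last keyword fills the free slots
        for k in range(L):
            for _ in range(M):  # try each set at most once per divided keyword
                s = sets[pos % M]
                pos += 1
                if len(s) < G and all(src != kw for src, _ in s):
                    s.append((kw, k))
                    break
            else:
                raise ValueError(f"no free conversion set for {kw}{k}")
    # Pad the remaining free slots with dummy elements.
    n = 0
    for s in sets:
        while len(s) < G:
            s.append(("D", n))
            n += 1
    return sets
```

With the four age-group keywords, `build_conversion_sets(keywords, G=2, L=2)` gives four sets and `build_conversion_sets(keywords, G=3, L=4)` gives six sets with two dummy elements, as in the two examples.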

このようにＧ、Ｌ、Ｘ_j、Ｍ_jの値が増えても、適切な変換集合を生成することができる（変換集合生成手順の詳細は図２８参照）。データ登録要求部２５０は、変換集合一覧２３２を生成後、図１４に示すような分割キーワード選択テーブル２３３を生成する。データ登録要求部２５０は、生成した変換集合一覧２３２と分割キーワード選択テーブル２３３とを変換情報記憶部２３０に格納する。そしてデータ登録要求部２５０は、分割キーワード選択テーブル２３３に基づいてＤＢ２１０内のキーワードを確率的に分割キーワードに変換後、秘匿化ＤＢ１１０に登録するための登録データを生成する。 Even if the values of G, L, X _j , and M _j increase in this way, an appropriate transformation set can be generated (see FIG. 28 for details of the transformation set generation procedure). After generating the conversion set list 232, the data registration requesting unit 250 generates a split keyword selection table 233 as shown in FIG. The data registration request unit 250 stores the generated conversion set list 232 and split keyword selection table 233 in the conversion information storage unit 230 . The data registration requesting unit 250 then probabilistically converts the keywords in the DB 210 into divided keywords based on the divided keyword selection table 233 , and then generates registration data for registration in the anonymization DB 110 .

図２５は、分割キーワードを用いて生成した登録データの一例を示す図である。ＤＢ２１０に格納されていた真のデータに示される各キーワードが分割キーワード選択テーブル２３３に示される選択確率で確率的に分割キーワードに変換され、分割済データ２１１が生成される。そして、分割済データ２１１に基づいて登録データ６０が生成される。 FIG. 25 is a diagram showing an example of registration data generated using split keywords. Each keyword shown in the true data stored in the DB 210 is stochastically converted into a split keyword with a selection probability shown in the split keyword selection table 233, and the split data 211 is generated. Registration data 60 is generated based on the divided data 211 .

登録データ６０には、分割済データ２１１のレコードの集合である真のデータのレコード群６０ａと、分割済データ２１１のレコードの各分割キーワードを変換して得られたダミーレコード群６０ｂが含まれている。真のデータのレコード群６０ａの群番号は「０」であり、ダミーレコード群６０ｂの群番号は「１」である。 The registration data 60 includes a true data record group 60a, which is the set of records of the divided data 211, and a dummy record group 60b obtained by converting each divided keyword in the records of the divided data 211. The group number of the true data record group 60a is "0", and the group number of the dummy record group 60b is "1".

各レコードにはランダムにＩＤが付与されている。登録データ６０内の各レコードにはフラグが付与されている。そして登録データ６０に含まれるキーワード（フラグ値も含む）が暗号化される。なおフラグ値は、変換元のレコードの分割済データ２１１内でのＩＤと、属する群の群番号との組である。 Each record is assigned an ID at random. Each record in the registration data 60 is given a flag. Keywords (including flag values) included in the registration data 60 are then encrypted. Note that the flag value is a set of the ID in the divided data 211 of the conversion source record and the group number of the group to which it belongs.

データ登録要求部２５０は、登録データ６０の各項目値を暗号化し、登録データ６０のレコードをＩＤでソートした後、データ管理サーバ１００の秘匿化ＤＢ１１０に登録する。 The data registration request unit 250 encrypts each item value of the registration data 60 , sorts the records of the registration data 60 by ID, and registers them in the anonymization DB 110 of the data management server 100 .

図２６は、秘匿化ＤＢの一例を示す図である。秘匿化ＤＢ１１０に登録されたレコードはＩＤによってソートされており、真のデータのレコードとダミーレコードが混在して登録されている。 FIG. 26 is a diagram illustrating an example of the anonymization DB. The records registered in the anonymization DB 110 are sorted by ID, so that true data records and dummy records are interleaved.

次に、データ登録処理の手順について詳細に説明する。
図２７は、データ登録処理の手順の一例を示すフローチャートである。以下、図２７に示す処理をステップ番号に沿って説明する。 Next, the procedure of data registration processing will be described in detail.
FIG. 27 is a flowchart illustrating an example of the procedure of data registration processing. The processing shown in FIG. 27 will be described below along with the step numbers.

［ステップＳ１０１］データ登録要求部２５０は、群数Ｇとキーワード分割数Ｌの設定入力を受け付ける。ＧとＬは共に２以上の整数である。ＧとＬの値が大きいほど安全性が向上するが登録するダミーレコード数も増加する。そこで、ＧとＬの値は、秘匿化ＤＢ１１０に求められる頻度分析攻撃に対する安全性の度合いと、秘匿化ＤＢ１１０に許容されるダミーレコード数とを勘案して、データ登録サーバ２００の管理者が決定する。 [Step S101] The data registration requesting unit 250 receives setting inputs for the number of groups G and the keyword division number L. Both G and L are integers of 2 or more. Larger values of G and L improve security but also increase the number of dummy records to be registered. The administrator of the data registration server 200 therefore determines the values of G and L by weighing the degree of security against frequency analysis attacks required of the anonymization DB 110 against the number of dummy records the anonymization DB 110 can tolerate.

［ステップＳ１０２］データ登録要求部２５０は、項目ごとの変換集合数を決定する。例えばｊ番目の項目の変換集合数Ｍ_jは、天井関数を用いて以下の式（３）で表される。 [Step S102] The data registration request unit 250 determines the number of conversion sets for each item. For example, the transformation set number M _j of the j-th item is expressed by the following equation (3) using a ceiling function.
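
Equation (3) itself does not survive in this text. Reconstructed from the verbal description of it (the larger of "3" and the ceiling of the number of divided keywords divided by the number of groups), it would presumably read as follows; the exact typesetting in the original publication is an assumption.

```latex
M_j = \max\left( \left\lceil \frac{L \times X_j}{G} \right\rceil,\ 3 \right) \tag{3}
```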

ｊ番目の項目のキーワードの種類数Ｘ_jにキーワード分割数Ｌを乗算した値（Ｌ×Ｘ_j）が、その項目の分割キーワード数である。分割キーワード数を群数Ｇで除算した結果の天井関数（除算結果以上の最小の整数）と「３」とのうちの大きい方の値が、変換集合数となる。 The value (L×X _j ) obtained by multiplying the keyword type number X _j of the j-th item by the keyword division number L is the number of divided keywords of the item. The larger one of the ceiling function (minimum integer greater than or equal to the division result) obtained by dividing the number of divided keywords by the number of groups G and "3" is the number of transformation sets.

変換集合数が「３」以上となるようにしたことで、Ｇ＝２の場合において、性別のように２種類のキーワードしか存在しない項目についても、３つの変換集合が生成される。これにより性別についても、異なる変換集合において、それぞれの分割キーワードの変換元のキーワードの組み合わせが同じとなることを抑止することができる。 By setting the number of conversion sets to be "3" or more, in the case of G=2, three conversion sets are generated even for an item such as gender that has only two types of keywords. As a result, it is possible to prevent the combination of the conversion source keywords of the divided keywords from becoming the same in different conversion sets for gender as well.
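
As a quick check of this rule, the number of conversion sets can be computed as in the following sketch. The function name `num_conversion_sets` is hypothetical; the arithmetic follows the description of equation (3) above.

```python
from math import ceil


def num_conversion_sets(G, L, X_j):
    """Number of conversion sets M_j for an item with X_j keyword types."""
    divided_keywords = L * X_j  # number of divided keywords for the item
    return max(ceil(divided_keywords / G), 3)


# Parameters taken from the examples in the text.
age_group_fig23 = num_conversion_sets(G=2, L=2, X_j=4)  # FIG. 23 states M_j = 4
age_group_fig24 = num_conversion_sets(G=3, L=4, X_j=4)  # FIG. 24 states M_j = 6
gender = num_conversion_sets(G=2, L=2, X_j=2)           # raised to the minimum of 3
```

For "gender" the ceiling term alone would give 2, so the lower bound of 3 is what produces the third conversion set.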

［ステップＳ１０３］データ登録要求部２５０は、すべての項目それぞれについて、ステップＳ１０２で決定した変換集合数分の変換集合を生成する。なお、データ登録要求部２５０は、変換集合の格納領域の数よりも分割キーワード数が少ない場合には、ダミー要素を変換集合の要素として登録する。生成される各変換集合は少なくとも１つの分割キーワードを含み、その分割キーワードと変換元のキーワードが共通の他の分割キーワードが他の変換集合に含まれる。これにより、図１５の変換集合一覧２３２に示すような変換集合２３２ａ～２３２ｊが生成される。なお、変換集合生成手順の詳細は後述する（図２８参照）。 [Step S103] The data registration requesting unit 250 generates conversion sets for each of the items as many as the number of conversion sets determined in step S102. Note that the data registration requesting unit 250 registers dummy elements as elements of the conversion set when the number of divided keywords is smaller than the number of storage areas of the conversion set. Each conversion set to be generated includes at least one split keyword, and other split keywords that share the same split keyword and conversion source keyword are included in other conversion sets. As a result, conversion sets 232a to 232j as shown in the conversion set list 232 of FIG. 15 are generated. Details of the transformation set generation procedure will be described later (see FIG. 28).

［ステップＳ１０４］データ登録要求部２５０は、分割キーワード選択テーブル２３３を生成する。例えばデータ登録要求部２５０は、キーワード一覧２３１に示されるキーワードごとに「０」～「１」のＬ－１個の乱数を生成する。そしてデータ登録要求部２５０は、生成した乱数を境界値とし、「０」～「１」の数値範囲を境界値で分割して得られる複数の数値範囲それぞれの大きさを、分割キーワードのそれぞれの選択確率とする。 [Step S104] The data registration requesting unit 250 generates the divided keyword selection table 233. For example, the data registration requesting unit 250 generates L−1 random numbers between "0" and "1" for each keyword in the keyword list 231. Using the generated random numbers as boundary values, it divides the numerical range "0" to "1" at those boundaries and takes the sizes of the resulting sub-ranges as the selection probabilities of the respective divided keywords.

例えばデータ登録要求部２５０は、Ｌ＝２の場合、１つのキーワードについて１個の乱数を生成する。例えば図１４に示した分割キーワード選択テーブル２３３では、「小児科」について乱数「０．７２」が生成されている。そこでデータ登録要求部２５０は、「０」から乱数までの数値範囲の大きさ「０．７２」を、１つ目の分割キーワード「小児科０」の選択確率として設定している。またデータ登録要求部２５０は、乱数から「１」までの数値範囲の大きさ「０．２８」を、２つ目の分割キーワード「小児科１」の選択確率として設定している。 For example, when L=2, the data registration requesting unit 250 generates one random number per keyword. In the divided keyword selection table 233 shown in FIG. 14, the random number "0.72" was generated for "pediatrics". The data registration requesting unit 250 therefore sets "0.72", the size of the range from "0" to the random number, as the selection probability of the first divided keyword "pediatrics 0", and sets "0.28", the size of the range from the random number to "1", as the selection probability of the second divided keyword "pediatrics 1".

Because random numbers are generated separately for each keyword, the selection probabilities of the split keywords are not uniform. The data registration request unit 250 sets each selection probability in the split keyword selection table 233 in association with its split keyword. As a result, the split keyword selection table 233 shown in FIG. 14 is generated.
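The boundary-value construction of step S104 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `selection_probabilities`, the dictionary layout, and the naming of split keywords by appending an index are assumptions made for the example.

```python
import random

def selection_probabilities(keyword, num_splits, rng):
    """Cut the range [0, 1] at num_splits - 1 random boundary values.

    The width of each resulting interval is used as the selection
    probability of one split keyword (hypothetical naming: keyword + index).
    """
    boundaries = sorted(rng.random() for _ in range(num_splits - 1))
    edges = [0.0] + boundaries + [1.0]
    return {
        f"{keyword}{i}": edges[i + 1] - edges[i]
        for i in range(num_splits)
    }

# One row of a table like the split keyword selection table, for L = 2.
table = selection_probabilities("pediatrics", 2, random.Random(0))
```

Because the boundaries are drawn independently per keyword, calling the function for each keyword yields a different probability split every time, which is the non-uniformity the text describes.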

[Step S105] The data registration request unit 250 reads plaintext data from the DB 210.
[Step S106] The data registration request unit 250 probabilistically converts each keyword registered in the records of the read plaintext data into a split keyword, using the selection probabilities shown in the split keyword selection table 233. For example, the data registration request unit 250 selects the keywords included in the plaintext data one at a time. Next, the data registration request unit 250 assigns to each split keyword of the selected keyword a numerical range within "0" to "1" whose size corresponds to that split keyword's selection probability. For example, when "pediatrics" is selected with the split keyword selection table 233 shown in FIG. 14, the data registration request unit 250 assigns "pediatrics 0" to the numerical range from "0" to "0.72" and assigns "pediatrics 1" to the numerical range from "0.72" to "1". The data registration request unit 250 then generates a random number within the range "0" to "1" and converts the selected keyword into the split keyword assigned to the numerical range that contains the generated random number.
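The range lookup of step S106 can be illustrated with a short sketch. The function name `to_split_keyword` and the dictionary representation of one table row are assumptions for this example; the random draw is passed in as `u` so that the behavior is reproducible.

```python
from bisect import bisect_right
from itertools import accumulate

def to_split_keyword(probabilities, u):
    """Return the split keyword whose sub-range of [0, 1] contains u.

    probabilities: dict mapping split keyword -> selection probability
    (assumed to sum to 1); u: a random number in [0, 1].
    """
    names = list(probabilities)
    upper_bounds = list(accumulate(probabilities.values()))
    # Clamp the index so rounding error near 1.0 cannot run past the end.
    index = min(bisect_right(upper_bounds, u), len(names) - 1)
    return names[index]

row = {"pediatrics0": 0.72, "pediatrics1": 0.28}
low = to_split_keyword(row, 0.50)   # falls in the range 0 to 0.72
high = to_split_keyword(row, 0.90)  # falls in the range 0.72 to 1
```

In practice `u` would come from a random number generator, for example `random.random()`, once per keyword occurrence.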

[Step S107] The data registration request unit 250 generates dummy data consisting of G−1 dummy record groups, where G is the number of groups. Details of the dummy data generation processing will be described later (see FIG. 29).

[Step S108] The data registration request unit 250 randomly assigns an ID to each record of the true data and the dummy data.
[Step S109] The data registration request unit 250 attaches a flag to each record. For example, for each record, the data registration request unit 250 generates as a flag value the pair consisting of the ID, within the DB 210, of the record from which that record was converted and the group number of the group to which the record belongs. Using the pair of the original record ID and the group number as the flag value gives every record a unique flag value.

[Step S110] The data registration request unit 250 sorts the records by ID.
[Step S111] The data registration request unit 250 encrypts the sorted record group and registers it in the anonymized DB 110. For example, the data registration request unit 250 encrypts each item value in each record and transmits the record group holding the encrypted values to the data management server 100 as registration data. In the data management server 100, the data registration unit 120 receives the registration data and stores it in the anonymized DB 110.

Next, the conversion set generation processing will be described in detail.
FIG. 28 is a flowchart illustrating an example of the procedure of the conversion set generation processing. The processing shown in FIG. 28 is described below in order of step number.

[Step S121] The data registration request unit 250 selects, as the processing target item, one of the items for which conversion set generation has not yet been performed.
[Step S122] The data registration request unit 250 selects from the keyword list 231 one keyword of the processing target item that has not yet been set in a conversion set.

[Step S123] The data registration request unit 250 sets the split keyword number l, which indicates the split keyword to be set, to the initial value "0" (l = 0).
[Step S124] The data registration request unit 250 sets the destination element number g to the smallest element number among the elements of the conversion sets whose values are not yet set.

[Step S125] Among the elements with element number g whose values are not yet set, the data registration request unit 250 takes the one belonging to the conversion set with the smallest conversion set number and sets that conversion set number as the destination conversion set number m.

[Step S126] The data registration request unit 250 sets the l-th split keyword in the element x_{m,g}.
[Step S127] The data registration request unit 250 determines whether the split keyword number l has reached L−1 (l = L−1?). If L−1 has not been reached, the data registration request unit 250 advances the processing to step S128. If L−1 has been reached, the data registration request unit 250 advances the processing to step S130.

[Step S128] The data registration request unit 250 updates the destination element number g and the destination conversion set number m. For example, the data registration request unit 250 updates the destination element number to "g = mod(g+1, G)" and updates the destination conversion set number to "m = mod(m+1, M)". The data registration request unit 250 also adds 1 to the split keyword number l (l = l+1).

[Step S129] The data registration request unit 250 determines whether the element x_{m,g} has no value set. If no value is set, the data registration request unit 250 advances the processing to step S126. If a value is already set, the data registration request unit 250 advances the processing to step S124.

[Step S130] The data registration request unit 250 determines whether any keyword of the processing target item remains unset. If an unset keyword remains, the data registration request unit 250 advances the processing to step S122. If no unset keyword remains, the data registration request unit 250 advances the processing to step S131.

After the split keywords corresponding to all keywords of the selected item have been set as elements of the conversion sets, if the conversion sets for that item still contain areas with no value set, the data registration request unit 250 sets dummy elements in those areas.

[Step S131] The data registration request unit 250 determines whether any unprocessed item remains. If conversion sets have been generated for all items, the data registration request unit 250 ends the conversion set generation processing. If an unprocessed item remains, the data registration request unit 250 advances the processing to step S121.
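For a single item, the placement loop of steps S122 to S130 can be sketched as below. This is an illustrative reading of the flowchart, not the patented implementation: the function name, the `"dummy"` placeholder string, the naming of split keywords by appending an index, and the explicit capacity check (M × G cells must hold every split keyword) are assumptions introduced for the example.

```python
def build_conversion_sets(keywords, num_splits, num_sets, num_groups):
    """Place split keywords into num_sets (M) sets of num_groups (G) elements."""
    if len(keywords) * num_splits > num_sets * num_groups:
        raise ValueError("not enough cells for all split keywords")
    x = [[None] * num_groups for _ in range(num_sets)]  # x[m][g]
    for keyword in keywords:                             # S122
        l, search = 0, True                              # S123
        while True:
            if search:
                # S124: smallest element number that still has an unset cell.
                g = min(j for j in range(num_groups)
                        if any(x[i][j] is None for i in range(num_sets)))
                # S125: smallest set number whose element g is unset.
                m = min(i for i in range(num_sets) if x[i][g] is None)
            x[m][g] = f"{keyword}{l}"                    # S126
            if l == num_splits - 1:                      # S127
                break
            g, m, l = (g + 1) % num_groups, (m + 1) % num_sets, l + 1  # S128
            search = x[m][g] is not None                 # S129
    # Remaining unset cells receive dummy elements.
    return [[v if v is not None else "dummy" for v in row] for row in x]

sets = build_conversion_sets(["child", "adult", "elderly"], 2, 2, 3)
```

With these inputs the two split keywords of each source keyword land in different conversion sets, which is the property step S103 requires.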

Generating the conversion sets by this procedure yields appropriate conversion sets. Next, the dummy data generation processing will be described in detail.
FIG. 29 is a flowchart illustrating an example of the procedure of the dummy data generation processing. The processing shown in FIG. 29 is described below in order of step number.

[Step S141] The data registration request unit 250 acquires the number of groups G calculated in step S102.
[Step S142] The data registration request unit 250 makes G−1 copies of the true data, thereby generating G−1 dummy record groups.

[Step S143] The data registration request unit 250 executes the processing of steps S144 to S146 for each item of the true data.
[Step S144] The data registration request unit 250 executes the processing of step S145 for each dummy record.

[Step S145] The data registration request unit 250 converts the item value of the dummy record. For example, from the conversion sets corresponding to the processing target item, the data registration request unit 250 selects the conversion set that contains the dummy record's value for that item (the conversion target item value). Next, the data registration request unit 250 acquires the group number of the dummy record group to which the dummy record being processed belongs. From the selected conversion set, the data registration request unit 250 acquires the element located as many positions to the right, cyclically, as the group number, counted from the element corresponding to the conversion target item value. The data registration request unit 250 then converts the conversion target item value in the dummy record being processed into the value of the acquired element (the dummy value).

[Step S146] When the processing of step S145 has been completed for every dummy record, the data registration request unit 250 advances the processing to step S147.
[Step S147] When the processing of steps S144 to S146 has been completed for every item, the data registration request unit 250 ends the dummy data generation processing.

In this way, dummy data is generated by replacing the item values of the true data with other elements of the conversion sets. The data registration request unit 250 then encrypts the registration data, which includes the generated dummy data and the flag values, and registers it in the anonymized DB 110.
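The cyclic shift of step S145 and the copying of step S142 can be sketched together as follows. The function names, the record layout (a dict from item name to split keyword), and the example conversion sets are hypothetical and chosen only to illustrate the shift.

```python
def to_dummy_value(conversion_sets, value, group_number):
    """Shift value cyclically to the right by group_number positions."""
    for elements in conversion_sets:
        if value in elements:
            position = elements.index(value)
            return elements[(position + group_number) % len(elements)]
    raise KeyError(value)

def make_dummy_groups(true_records, sets_by_item, num_groups):
    """Build dummy record groups 1 .. G-1 from copies of the true data."""
    return {
        group_number: [
            {item: to_dummy_value(sets_by_item[item], value, group_number)
             for item, value in record.items()}
            for record in true_records
        ]
        for group_number in range(1, num_groups)
    }

# Hypothetical conversion sets for one item, with G = 3 elements each.
sets_by_item = {"age": [["child0", "adult1", "elderly0"],
                        ["adult0", "child1", "elderly1"]]}
true_records = [{"age": "elderly0"}, {"age": "adult0"}]
dummy_groups = make_dummy_groups(true_records, sets_by_item, 3)
```

Since every group applies the same shift to all records, each dummy record group keeps the frequency distribution of the true data while holding different values.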

A searcher who wants to search the data in the anonymized DB 110 enters a search keyword into the terminal device 400, for example via a search condition input screen of the terminal device 400.
FIG. 30 is a diagram showing an example of the search condition input screen. On the search condition input screen 61, for example, the name of each item in the DB 210 is set as a search variable. A specified value input area 62 is provided in association with each search variable, for entering a specified value of that search variable as a search keyword. When the searcher enters, in the specified value input area 62, a specified value for an item shown as a search variable, the terminal device 400 performs a search using the entered specified value as the search keyword.

To perform an anonymized search, the terminal device 400 uses the information stored in the conversion information storage unit 420 for processing such as converting the search keyword into split keywords.
FIG. 31 is a diagram showing an example of the information stored in the conversion information storage unit of the terminal device. The conversion information storage unit 420 stores a keyword list 421, a conversion set list 422, and a split keyword list 423. Of these, the keyword list 421 and the conversion set list 422 have the same contents as the keyword list 231 and the conversion set list 232 held by the data registration server 200, respectively.

The split keyword list 423 is data indicating the correspondence between each keyword that can be registered in the DB 210 and its split keywords. The contents of the split keyword list 423 are the same as those of the split keyword selection table 233 held by the data registration server 200 with the selection probability information removed. The terminal device 400 converts search keywords into split keywords based on the split keyword list 423.

FIG. 32 is a diagram showing an example of conversion into split keywords. The terminal device 400 splits the plaintext search query 63, which indicates the entered search condition. For example, assume that the terminal device 400 has received from the searcher the search condition "age group = elderly" (the left side is the item name and the right side is the keyword, that is, the item value). The terminal device 400 acquires the split keywords associated with the keyword "elderly" from the split keyword list 423. In the example of FIG. 32, "elderly 0" and "elderly 1" are acquired as the split keywords. The terminal device 400 splits the search keyword "elderly" into the split keywords "elderly 0" and "elderly 1" and generates a split query 64 expressing their logical sum.

The terminal device 400 generates a perturbed query from the split query 64 based on the conversion set list 422.
FIG. 33 is a diagram showing an example of the generation of a perturbed query. The conversion set list 422 held by the terminal device 400 has the same contents as the conversion set list 232 and includes conversion sets 422a to 422j corresponding to the items of the DB 210.

The terminal device 400 randomly determines the search target group for each of the split keywords "elderly 0" and "elderly 1" included in the split query 64. For example, assume that the search target group of "elderly 0" is determined to be "1" and the search target group of "elderly 1" is determined to be "0".

The terminal device 400 refers to the conversion set list 422 and converts the split keywords in the split query 64 based on the determined groups, thereby generating a perturbed query 65. For example, "elderly 0" is converted, according to the conversion set 422g, into "child 0", the element for group "1". "Elderly 1" is not converted, because its search target group is "0" (the record group of the true data).

The terminal device 400 then encrypts the keywords in the perturbed query 65. This generates an anonymized search query 66 containing the keywords as ciphertext. The terminal device 400 transmits a search request including the anonymized search query 66 to the data management server 100.
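The splitting and perturbation described with FIGS. 32 and 33 can be sketched as follows, stopping before encryption. The function name, the example conversion sets, and the assumption that each conversion set has exactly G elements (one per group) are introduced for this illustration only.

```python
import random

def perturb_query(keyword, num_splits, conversion_sets, rng):
    """Split a keyword, pick a random group per split keyword, and shift.

    Returns (value_to_send, group_number) pairs; the pairs are combined by
    logical sum (OR). Assumes every conversion set has G elements.
    """
    terms = []
    for l in range(num_splits):
        split_keyword = f"{keyword}{l}"
        elements = next(s for s in conversion_sets if split_keyword in s)
        group = rng.randrange(len(elements))       # random search target group
        position = elements.index(split_keyword)
        sent = elements[(position + group) % len(elements)]
        terms.append((sent, group))                # group 0 leaves it unchanged
    return terms

age_sets = [["child0", "adult1", "elderly0"],
            ["adult0", "child1", "elderly1"]]
terms = perturb_query("elderly", 2, age_sets, random.Random(1))
```

The terminal would keep the chosen group number for each term locally, since it is needed later to filter and restore the returned records.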

FIG. 34 is a diagram showing an example of the search processing. Upon receiving the anonymized search query 66, the data management server 100 searches the anonymized DB 110 for records that satisfy the conditions indicated in the anonymized search query 66. For example, the records satisfying the condition "H(age group = child 0)" are the records with IDs "9" and "10". The records satisfying the condition "H(age group = elderly 1)" are the records with IDs "3" and "8". Because the anonymized search query 66 is the logical sum of "H(age group = child 0)" and "H(age group = elderly 1)", records that satisfy either condition are included in the anonymized search result 67. The data management server 100 transmits the anonymized search result 67 to the terminal device 400.

Upon receiving the anonymized search result 67, the terminal device 400 decrypts the item values in the anonymized search result 67. By decrypting them, the terminal device 400 recognizes that the third and fourth records in the anonymized search result 67 satisfy "age group = child 0" and that the first and second records satisfy "age group = elderly 1".

Based on the flag of each decrypted record, the terminal device 400 determines the group number of the group to which the record belongs. The terminal device 400 then determines whether the group to which each record belongs is the search group. For example, because the search group for "age group = child 0" is "1", only the records with group number "1" among the records satisfying "age group = child 0" are search target records. Likewise, because the search group for "age group = elderly 1" is "0", only the records with group number "0" among the records satisfying "age group = elderly 1" are search target records.

The terminal device 400 removes, from the records included in the anonymized search result 67, the records that do not belong to the search group, and restores the remaining records. No restoration is needed for records that belonged to the record group of the true data (group number "0"). For the item values of records that belonged to the dummy record group (group number "1"), the terminal device 400 applies the inverse of the conversion used when generating the perturbed query 65, based on the conversion set list 422 (see FIG. 33), which has the same contents as the conversion set list 232 (see FIG. 15) held by the data registration server 200. As a result, in a dummy record belonging to group number "1", "pediatrics 1" is converted to "gynecology 0", "male 1" is converted to "female 0", and "child 0" is converted to "elderly 0".

The terminal device 400 generates the search result 68 by converting the split keywords set as item values in each record back into the original keywords and removing the flags. In the example of FIG. 34, a split keyword can be converted back into the original keyword by deleting its trailing digit.
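The client-side post-processing described with FIG. 34 (filter by search group, shift back, strip the split index, drop the flag) can be sketched as below. The record layout, the `"flag"` key holding an (original ID, group number) pair, and all function names are assumptions for this illustration; decryption is taken to have happened already.

```python
import re

def shift_left(conversion_sets, value, group_number):
    """Undo the cyclic right shift applied for the given group."""
    for elements in conversion_sets:
        if value in elements:
            position = elements.index(value)
            return elements[(position - group_number) % len(elements)]
    raise KeyError(value)

def restore_results(hits, item, search_group_of, sets_by_item):
    """Keep hits whose group matches the queried group, then restore them."""
    results = []
    for record in hits:
        _original_id, group = record["flag"]
        if search_group_of.get(record[item]) != group:
            continue  # matched by value, but not in the searched group
        restored = {}
        for name, value in record.items():
            if name == "flag":
                continue
            split_keyword = shift_left(sets_by_item[name], value, group)
            restored[name] = re.sub(r"\d+$", "", split_keyword)
        results.append(restored)
    return results

sets_by_item = {"age": [["child0", "adult1", "elderly0"],
                        ["adult0", "child1", "elderly1"]]}
hits = [{"flag": (3, 0), "age": "elderly1"}, {"flag": (5, 1), "age": "child0"},
        {"flag": (7, 0), "age": "child0"}, {"flag": (2, 1), "age": "elderly1"}]
found = restore_results(hits, "age", {"child0": 1, "elderly1": 0}, sets_by_item)
```

Here `search_group_of` plays the role of the group choices the terminal made when it built the perturbed query.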

FIG. 35 is a flowchart illustrating an example of the procedure of the search processing. The processing shown in FIG. 35 is described below in order of step number.
[Step S201] The search request unit 440 acquires, as a plaintext search query, the search keyword entered by the user as a search condition and the item corresponding to that search keyword.

[Step S202] The search request unit 440 generates the conversion set list 422 and the split keyword list 423. For example, the search request unit 440 performs the same per-item conversion set generation processing as the data registration request unit 250 to generate the conversion set list 422. The conversion set list 422 generated by the search request unit 440 is identical to the conversion set list 232 generated by the data registration request unit 250.

If the data registration request unit 250 used random numbers to determine the keyword selection order in step S122 (see FIG. 28), the search request unit 440 acquires from the data registration server 200 the random number seed that the data registration request unit 250 used to generate those random numbers. Based on the acquired seed, the search request unit 440 generates the same random numbers that the data registration request unit 250 used for the conversion sets and determines the keyword selection order. As a result, a conversion set list 422 with the same contents as the conversion set list 232 held by the data registration server 200 can be generated. Alternatively, the search request unit 440 may acquire the conversion set list 232 from the data registration server 200 and store it in the conversion information storage unit 420 as the conversion set list 422 of the terminal device 400.

The split keyword list 423 generated by the search request unit 440 is the split keyword selection table 233 of the data registration server 200 with the selection probability information removed. For example, the search request unit 440 generates split keywords with the same algorithm as the data registration server 200. For example, the search request unit 440 generates the split keywords by appending to each keyword shown in the keyword list 421 a number, ascending from 0, that identifies the split keyword.

[Step S203] The search request unit 440 splits the acquired plaintext search query. For example, the search request unit 440 splits the search keyword included in the plaintext search query into split keywords. The search request unit 440 then generates a split query for each split keyword obtained by the splitting. When the split keyword list 423 shown in FIG. 32 has been generated, the search keyword "elderly" is split into "elderly 0" and "elderly 1". A split query containing the search keyword "elderly 0" and a split query containing the search keyword "elderly 1" are then generated. The logical sum of the search results of the split queries generated by the splitting is the search result of the acquired plaintext search query.

The acquired plaintext search query may contain multiple search keywords. For a logical sum (OR) search over multiple search keywords, the search request unit 440 splits each search keyword into split keywords and generates a split query for each split keyword. The logical sum of the search results of the generated split queries is the search result of the acquired plaintext search query.

For a logical product (AND) search over multiple search keywords, the search request unit 440 generates all combinations of split keywords taken across the different items. For a logical product of search keywords on three or more items, the search request unit 440 generates every combination of split keywords that can be formed by selecting one split keyword from each item. The search request unit 440 then generates, for each generated combination of split keywords, a plaintext query expressing the logical product of that combination. In this case as well, the logical sum of the search results of the generated split queries is the search result of the acquired plaintext search query.

For example, assume that the acquired plaintext search query is "A∧B", that the search keyword "A" is split into "A0" and "A1", and that the search keyword "B" is split into "B0" and "B1". In this case, the search request unit 440 generates "A0∧B0", "A0∧B1", "A1∧B0", and "A1∧B1" as the split queries.
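The expansion of a logical product into split queries is a Cartesian product over the split keywords of each term, which can be sketched as follows. The function name and the tuple representation of a conjunction are assumptions for this example.

```python
from itertools import product

def split_conjunction(split_keywords_per_term):
    """Expand an AND query into one conjunction per combination.

    split_keywords_per_term: one list of split keywords per search keyword.
    Each returned tuple is a conjunction; the tuples are combined by OR.
    """
    return list(product(*split_keywords_per_term))

queries = split_conjunction([["A0", "A1"], ["B0", "B1"]])
```

With L split keywords per term and n terms, the expansion produces L to the power n conjunctions, so the number of split queries grows quickly as terms are added.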

[Step S204] The search request unit 440 selects one of the generated split queries that has not yet been selected.
[Step S205] The search request unit 440 probabilistically selects one group from among all groups, that is, the record group of the true data and the dummy record groups. The search request unit 440 will perform the search against the selected record group. Every dummy record group in the anonymized DB 110 has the same frequency distribution as the record group of the true data, although its values differ. Therefore, even when a dummy record group is the search target, the correct search result can be obtained by inversely converting, according to the conversion sets, the dummy values in the dummy records returned as the search result.

When the split query is converted (perturbed) according to the conversion sets in this way, searching any group yields the same result as searching the plaintext data group. The search queries are therefore perturbed without increasing their number. In addition, because the search request unit 440 determines the search target group at random, the attacker 33 is prevented from identifying the group number of each record in the anonymized DB from a bias in which groups are searched.

[Step S206] The search request unit 440 determines whether a dummy record group was selected. If a dummy record group was selected, the search request unit 440 advances the processing to step S207. If the record group of the true data was selected, the search request unit 440 advances the processing to step S212.

[Step S207] The search request unit 440 perturbs the generated split query based on the conversion sets. For example, when the split query specifies a search target item and a split keyword, the search request unit 440 first identifies, from the one or more conversion sets corresponding to the search target item, the conversion set that contains the element corresponding to the specified split keyword. Next, the search request unit 440 acquires the group number of the selected dummy record group. From that conversion set, the search request unit 440 acquires the element located as many positions to the right, cyclically, as the group number, counted from the element corresponding to the split keyword being searched. The search request unit 440 then converts the split keyword in the split query into the value of the acquired element (the dummy value).

[Step S208] The search request unit 440 encrypts the split query converted in step S207 to generate an anonymized search query. The search request unit 440 transmits the generated anonymized search query to the data management server 100.

[Step S209] The search request unit 440 acquires the search result of the anonymized search (the anonymized search result) from the data management server 100. From the anonymized search result, the search request unit 440 extracts only the dummy records belonging to the dummy record group selected in step S206. The search request unit 440 can identify the dummy records belonging to the selected dummy record group based on the flag value of each record.

[Step S210] The search request unit 440 decrypts the item values (dummy values) in the extracted dummy records included in the anonymized search results, using the encryption key acquired in advance from the data registration server 200.

[Step S211] Based on the conversion sets, the search request unit 440 inversely converts the decrypted dummy values of the dummy records belonging to the dummy record group selected in step S206, thereby restoring search results that contain the true plaintext values.

FIG. 36 is a diagram showing an example of the inverse conversion of dummy values. For example, the search request unit 440 identifies, from among the one or more conversion sets corresponding to the item to which a dummy value belongs, the conversion set that contains the element corresponding to the dummy value. In the example of FIG. 36, it is assumed that the conversion set 69 has been identified. Next, the search request unit 440 acquires the group number of the dummy record group selected in step S206. From the conversion set 69, the search request unit 440 acquires the element located cyclically to the left of the dummy value's element by as many positions as the group number. The search request unit 440 then converts the decrypted dummy value into the value of the acquired element (the split keyword of the true data). For example, if the dummy value of a dummy record belonging to the dummy record group with group number "2" is "child 1", that dummy value is inversely converted into "adult 1", the element two positions to the left of "child 1" in the conversion set 69.

図３６に示した逆変換の関係は、データ登録サーバ２００がデータ登録時に行った項目値の変換に用いた写像の逆写像となる全単射関係である。選択されたダミーレコード群に属するダミーレコード内のすべてのダミー値に対してこのような逆変換を行うことで、ダミー値に対応する分割キーワードが得られる。検索要求部４４０は、分割キーワード一覧に基づいて、分割キーワードを元のキーワード（真の値）に変換する。これにより真の検索結果が生成される。その後、検索要求部４４０は、処理をステップＳ２１５に進める。 The inverse transformation relationship shown in FIG. 36 is a bijective relationship that is the inverse mapping of the mapping used for the conversion of the item values performed by the data registration server 200 at the time of data registration. By performing such inverse transformation on all dummy values in the dummy records belonging to the selected dummy record group, split keywords corresponding to the dummy values are obtained. The search request unit 440 converts the split keywords into the original keywords (true values) based on the list of split keywords. This produces true search results. After that, search request unit 440 advances the process to step S215.
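The cyclic shift used in step S207 and its inverse in step S211 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names and the contents of `conversion_set_69` are assumptions chosen only so that "adult 1" sits two positions to the left of "child 1", as in the FIG. 36 example.

```python
def shift_element(conversion_set, value, offset):
    """Return the element `offset` positions cyclically to the right of `value`.

    A negative offset moves cyclically to the left.
    """
    index = conversion_set.index(value)
    return conversion_set[(index + offset) % len(conversion_set)]


def to_dummy(conversion_set, split_keyword, group_number):
    # Step S207: move right by the group number of the selected dummy record group.
    return shift_element(conversion_set, split_keyword, group_number)


def from_dummy(conversion_set, dummy_value, group_number):
    # Step S211: move left by the same group number to undo the conversion.
    return shift_element(conversion_set, dummy_value, -group_number)


# Hypothetical contents; the real conversion set 69 is defined in FIG. 36.
conversion_set_69 = ["adult 1", "youth 0", "child 1"]

restored = from_dummy(conversion_set_69, "child 1", group_number=2)
print(restored)
```

Because both directions use the same shift modulo the set size, the two functions are inverses of each other for any group number, which is the bijective relationship the text relies on.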

Hereinafter, the description returns to FIG. 35.
[Step S212] The search request unit 440 adds the flag value corresponding to the record group of true data to the search condition, encrypts the resulting search query to generate an anonymized search query, and transmits the anonymized search query to the data management server 100.

[Step S213] The search request unit 440 acquires, from the data management server 100, the anonymized search results obtained with the anonymized search query.
[Step S214] The search request unit 440 decrypts the item values in the records of the true-data record group included in the anonymized search results. As in step S210, the search request unit 440 identifies the records of the selected group (the records of true data). The search request unit 440 converts the split keywords in the records of true data into the original keywords based on the split keyword list, and obtains search results that contain the true plaintext values.

[Step S215] The search request unit 440 outputs the plaintext search results.
[Step S216] The search request unit 440 determines whether all split queries have been selected. If there is an unselected split query, the search request unit 440 advances the process to step S204. If all split queries have been selected and the searches using the corresponding anonymized search queries have finished, the search request unit 440 ends the search process.

The search results are displayed, for example, on a search result display screen.
FIG. 37 is a diagram showing an example of the search result display screen. The search result display screen 70 displays, for example, the search conditions relating to patients and the number of patients who match those conditions. The search result display screen 70 also displays the plaintext data of the records hit by the search.

このような検索により、Ｇ個の分割キーワードの集合ごとに項目値の出現頻度が均等化される。その結果、項目値ごとの出現頻度の偏りが抑止され、頻度分析攻撃に対する安全性が向上する。しかも分割キーワードの出現頻度は秘密の選択確率によって決まっているため、頻度攪乱後のあるキーワードの出現頻度に基づいて、そのキーワードの候補を絞り込むことは困難である。 Such a search equalizes the frequency of appearance of item values for each set of G divided keywords. As a result, bias in appearance frequency for each item value is suppressed, and security against frequency analysis attacks is improved. Moreover, since the appearance frequency of the divided keywords is determined by the secret selection probability, it is difficult to narrow down the keyword candidates based on the appearance frequency of a certain keyword after frequency disturbance.

The difficulty of a frequency analysis attack on the anonymized DB 110 is described below with reference to FIGS. 38 and 39.
FIG. 38 is a diagram showing an example of frequency disturbance using split keywords. For example, suppose that for the age group item in the DB 210, the appearance frequency of "elderly" is "98", that of "adult" is "35", that of "youth" is "91", and that of "child" is "63". Each keyword is split into two, and the item values of the dummy records of group "1" are converted into dummy values by the conversion sets 232g to 232j shown in FIG. 15, which disturbs the appearance frequencies.

秘匿化ＤＢ１１０には頻度攪乱後のデータが登録され、変換集合ごとに分割キーワードの出現頻度が均等化されている。図中、実線の矩形内の数字は群「０」（真のデータのレコード群）における出現頻度であり、破線の矩形内の数字は群「１」（ダミーレコード群）における出現頻度である。 Data after frequency disturbance is registered in the anonymization DB 110, and the frequency of appearance of divided keywords is equalized for each conversion set. In the figure, the numbers in solid line rectangles are the frequencies of occurrence in group "0" (true data record group), and the numbers in dashed line rectangles are the appearance frequencies in group "1" (dummy record group).

図３８の例では、「Ｈ（老人０）」と「Ｈ（児童０）」の出現頻度は共に「６５」である。「Ｈ（老人１）」と「Ｈ（成人０）」の出現頻度は共に「８１」である。「Ｈ（成人１）」と「Ｈ（青年０）」の出現頻度は共に「８０」である。「Ｈ（青年１）」と「Ｈ（児童１）」の出現頻度は共に「６１」である。 In the example of FIG. 38, the frequency of appearance of both "H (old man 0)" and "H (child 0)" is "65". The frequency of appearance of both "H (old man 1)" and "H (adult 0)" is "81". The frequency of occurrence of both "H (adult 1)" and "H (youth 0)" is "80". The frequency of appearance of both "H (youth 1)" and "H (child 1)" is "61".
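The equalization within each conversion set can be checked with a short sketch. The set membership below follows the pairs listed for FIG. 38, but the individual split frequencies in `true_counts` are assumed values (the figure's actual numbers are not given in the text); they were chosen only so that they sum to the stated keyword frequencies 98, 35, 91, and 63.

```python
from collections import Counter

# Conversion sets for the age group item, following the pairs described for FIG. 38.
conversion_sets = [
    ["elderly 0", "child 0"],
    ["elderly 1", "adult 0"],
    ["adult 1", "youth 0"],
    ["youth 1", "child 1"],
]

# Assumed split frequencies in the true data (group "0"); not taken from the figure.
true_counts = {
    "elderly 0": 30, "elderly 1": 68,   # elderly: 98
    "adult 0": 13, "adult 1": 22,       # adult:   35
    "youth 0": 58, "youth 1": 33,       # youth:   91
    "child 0": 35, "child 1": 28,       # child:   63
}


def disturbed_counts(conversion_sets, true_counts):
    """Count each value after adding one dummy group per cyclic shift.

    Group g stores, for every true record, the element g positions to the
    right of the true split keyword inside its conversion set.
    """
    counts = Counter()
    for elements in conversion_sets:
        size = len(elements)
        for index, split_keyword in enumerate(elements):
            for group in range(size):  # group 0 is the true data itself
                counts[elements[(index + group) % size]] += true_counts[split_keyword]
    return counts


for value, count in disturbed_counts(conversion_sets, true_counts).items():
    print(value, count)
```

With these assumed inputs, both elements of each conversion set end up with the same count, and the four counts are 65, 81, 80, and 61, matching the pattern described for FIG. 38.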

FIG. 39 is a diagram showing an example of a frequency analysis attack. For example, suppose that the attacker 33, by eavesdropping on searches of the anonymized DB 110, has learned from the numbers of records hit by the searches that there are four post-disturbance frequencies: "65, 81, 80, 61".

攻撃者３３がＤＢ２１０における各キーワードの出現頻度を知っているとき、攻撃者３３がそれらの出現頻度を組み合わせても攪乱後頻度とはならない。そこで攻撃者３３は、分割頻度と変換集合の要素の組み合わせを総当たりで、組み合わせごとの攪乱後頻度を算出することが考えられる。 When the attacker 33 knows the appearance frequency of each keyword in the DB 210, even if the attacker 33 combines those appearance frequencies, the post-disturbance frequency is not obtained. Therefore, it is conceivable for the attacker 33 to brute-force combinations of division frequencies and conversion set elements to calculate post-disturbance frequencies for each combination.

For example, for "elderly", there are 99 possible ratios of split frequencies between "elderly 0" and "elderly 1": "0:98", "1:97", "2:96", ..., "98:0". Similarly, there are 36 possible split frequency ratios for "adult", 92 for "youth", and 64 for "child".
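The counts above follow from the fact that a keyword with frequency f can be split into two non-negative parts in f + 1 ways. The sketch below reproduces them; the function name and the generalization to more than two parts are illustrative assumptions, not part of the source.

```python
from math import comb, prod

# Keyword frequencies in the DB 210, as stated for FIG. 38.
frequencies = {"elderly": 98, "adult": 35, "youth": 91, "child": 63}


def split_ratio_count(frequency, parts=2):
    """Number of ways to divide `frequency` records among `parts` split keywords.

    This is the stars-and-bars count; for parts=2 it reduces to frequency + 1.
    """
    return comb(frequency + parts - 1, parts - 1)


counts = {keyword: split_ratio_count(f) for keyword, f in frequencies.items()}
print(counts)

# Joint number of split-frequency candidates before any other constraint is applied.
print(prod(counts.values()))
```

The product is the number of candidates an attacker would have to consider if the four keywords were treated independently, before also enumerating the possible conversion sets.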

There are also many possible combinations of split keywords that form the elements of one conversion set. For example, the split keyword that shares a conversion set with "elderly 0" could be "adult 0" or "youth 0" rather than "child 0". Even if the algorithm for generating conversion sets is known, randomly reordering the keywords before generating the conversion sets makes it possible to keep secret which combinations the generated conversion sets contain.

The attacker 33 must consider all of these possible combinations in order to find the search target keyword behind each of the post-disturbance frequencies "65, 81, 80, 61". However, many patterns of split frequencies and conversion sets yield the post-disturbance frequencies "65, 81, 80, 61". For example, even if the conversion sets are exactly the conversion sets 232g to 232j shown in FIG. 15, there are 35 combinations of split frequencies for which the post-disturbance frequencies of the split keywords of the conversion sets 232g to 232j become "65, 81, 80, 61", as shown in FIG. 39.

There is also the case in which the post-disturbance frequencies of the split keywords of the conversion sets 232g to 232j are "61, 80, 81, 65" (appearance frequency example 72). Appearance frequency example 72 cannot be distinguished from the case in which those post-disturbance frequencies are "65, 81, 80, 61" (appearance frequency example 71). For appearance frequency example 72 as well, many combinations of split frequencies yield those post-disturbance frequencies.

そのため攻撃者３３は、検索が実行されたときにヒットしたレコード数から検索対象のキーワードを絞り込もうとしても、いずれのキーワードもあり得ることとなり、絞り込みは困難である。例えば検索にヒットしたレコード数が「８１」であったとしても、出現頻度例７１または出現頻度例７２があり得ることを考慮すると、検索対象のキーワードとしてはいずれのキーワードもあり得ることとなる。このように頻度分析攻撃が困難となっている。 Therefore, even if the attacker 33 tries to narrow down the keywords to be searched based on the number of hit records when the search is executed, any keyword can be included, and narrowing down is difficult. For example, even if the number of records hit by the search is "81", considering that there may be an appearance frequency example 71 or an appearance frequency example 72, any keyword can be used as a keyword to be searched. This makes frequency analysis attacks difficult.

In addition, even when the terminal device 400 acquires dummy records as search results, it can obtain the item values of the true data that match the input search condition from the dummy values of those dummy records. Therefore, even if the attacker 33 obtains the search results transmitted from the data management server 100, the attacker 33 cannot determine whether the item values in the search results are item values of true data or dummy values. As a result, the confidentiality of the true data is improved.

Note that the conversion set list 232 is confidential information; only the data registration server 200, which registers data in the anonymized DB 110, and the terminal devices 400 permitted to search the anonymized DB 110 hold the conversion set list 232. If records were added or deleted one at a time, the added or deleted ciphertext could be associated with its plaintext. Because this would allow the conversion set to be identified row by row, the data registration server 200 does not add or delete individual records. Instead, the data registration server 200 adds records to or deletes records from the DB 210 and the anonymized DB 110 in batches of at least a predetermined number of records.

[Third Embodiment]
Next, a third embodiment will be described. The third embodiment performs frequency disturbance without using dummy records.

In the second embodiment described above, dummy data containing multiple dummy records is added, so the number of records in the anonymized DB 110 is an integer multiple of the number of records in the original data. In the third embodiment, instead of generating dummy records, the data registration server 200 groups the split keywords into sets of a predetermined size and performs frequency disturbance by converting the split keywords that belong to the same set into a common value (a shared keyword). Because multiple values are converted into a common value, the appearance frequency of each value in the anonymized DB differs from the appearance frequency of each value in the true data. At this time, the data registration server 200 sets, as a flag in each record, information indicating the split keyword from which the shared keyword was converted.

図４０は、共通キーワードへの変換の一例を示す図である。第３の実施の形態では、データ登録サーバ２００は、第２の実施の形態における変換集合一覧に代えて、共有集合一覧２３４を生成する。 FIG. 40 is a diagram showing an example of conversion to common keywords. In the third embodiment, the data registration server 200 generates a shared set list 234 instead of the conversion set list in the second embodiment.

The shared set list 234 includes one or more shared sets 234a to 234j for each item. The condition for generating the shared sets 234a to 234j is that at least one element is a split keyword and that another split keyword derived from the same source keyword is contained in a different set. In other words, each of the shared sets 234a to 234j contains at most n-1 of the n split keywords corresponding to one keyword.

In the example of FIG. 40, the shared set list 234 includes three shared sets 234a to 234c for the clinical department, three shared sets 234d to 234f for gender, and four shared sets 234g to 234j for the age group. The shared sets 234d and 234f each contain one element, and the other shared sets 234a to 234c, 234e, and 234g to 234j each contain two elements. All the elements contained in the shared sets 234a to 234j are split keywords. If there is a keyword that is not split, that keyword also becomes an element of one of the shared sets 234a to 234j.

データ登録サーバ２００は、第２の実施の形態と同様に、分割キーワード選択テーブル２３３（図１４参照）を参照して、平文のＤＢ２１０に格納されていたキーワードを、そのキーワードに対応する複数の分割キーワードのうちの１つに確率的に変換する。そしてデータ登録サーバ２００は、分割済データ２１１を生成する。 As in the second embodiment, the data registration server 200 refers to the divided keyword selection table 233 (see FIG. 14), and divides the keyword stored in the plaintext DB 210 into a plurality of divisions corresponding to the keyword. Convert probabilistically to one of the keywords. The data registration server 200 then generates the divided data 211 .

さらにデータ登録サーバ２００は、分割済データ２１１内の分割キーワードを、共有集合一覧２３４に基づいて共有キーワードへ変換する。例えばデータ登録サーバ２００は、１つの共有集合内の要素それぞれに対応する分割キーワードは、同じ共有キーワードに変換する。図４０の例では、共有集合２３４ａ～２３４ｊに含まれる要素を列挙した文字列が共有キーワードとなっている。 Furthermore, the data registration server 200 converts the divided keywords in the divided data 211 into shared keywords based on the shared set list 234 . For example, the data registration server 200 converts split keywords corresponding to elements in one shared set into the same shared keyword. In the example of FIG. 40, the shared keyword is a character string listing the elements included in the shared sets 234a to 234j.

Note that in the shared sets 234a to 234j in FIG. 40, the number to the left of ":" is the element number, and the character string to the right of ":" is the split keyword set as the element.
For example, the shared set 234a contains "pediatrics 0" and "internal medicine 0" as elements. Therefore, the split keywords "pediatrics 0" and "internal medicine 0" in the divided data 211 are both converted into the shared keyword "pediatrics 0, internal medicine 0". The shared set 234d contains only the element "male 0". Therefore, the shared keyword corresponding to the shared set 234d is "male 0", the same as the split keyword.

分割済データ２１１内の各レコードに対応するレコードが、登録データ８１として生成される。そしてデータ登録サーバ２００は、登録データ８１内に各レコードにフラグを付与する。登録データ８１は、第１の実施の形態に示した攪乱レコード群６の一例である。 A record corresponding to each record in the divided data 211 is generated as registration data 81 . The data registration server 200 then assigns a flag to each record within the registration data 81 . Registration data 81 is an example of disturbance record group 6 shown in the first embodiment.

The flag contains the element number of the element corresponding to each source split keyword and the ID of the source record. For example, the record with ID "0" in the divided data 211 contains the clinical department "internal medicine 0", the gender "male 1", and the age group "elderly 1". "Internal medicine 0" is converted into "pediatrics 0, internal medicine 0"; the element number of "internal medicine 0" in the shared set 234a is "1". "Male 1" is converted into "male 1, female 0"; the element number of "male 1" in the shared set 234e is "0". "Elderly 1" is converted into "elderly 1, adult 0"; the element number of "elderly 1" is "0". The converted record is therefore given the flag "0 (1, 0, 0)", which contains the record ID "0", the element number "1" of "internal medicine 0", the element number "0" of "male 1", and the element number "0" of "elderly 1".
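The conversion to shared keywords and the construction of the flag can be sketched as below. This is an illustration under assumptions: the dictionary layout, the function names, and the flag string format are hypothetical, and only the shared sets needed for the record with ID "0" are listed.

```python
# Hypothetical excerpt of the shared set list; element order defines the element number.
shared_sets = {
    "clinical department": [["pediatrics 0", "internal medicine 0"]],
    "gender": [["male 0"], ["male 1", "female 0"]],
    "age group": [["elderly 1", "adult 0"]],
}


def to_shared(item, split_keyword):
    """Return (shared keyword, element number) for a split keyword of an item."""
    for elements in shared_sets[item]:
        if split_keyword in elements:
            return ",".join(elements), elements.index(split_keyword)
    raise KeyError(f"{split_keyword!r} is not in any shared set of {item!r}")


def make_registration_record(record_id, divided_record):
    """Convert every item value to its shared keyword and build the flag."""
    converted = {}
    element_numbers = []
    for item, split_keyword in divided_record.items():
        shared_keyword, number = to_shared(item, split_keyword)
        converted[item] = shared_keyword
        element_numbers.append(str(number))
    flag = f"{record_id}({','.join(element_numbers)})"
    return converted, flag


record, flag = make_registration_record(
    0,
    {
        "clinical department": "internal medicine 0",
        "gender": "male 1",
        "age group": "elderly 1",
    },
)
print(record)
print(flag)
```

In the scheme described here each item value and the flag would then be encrypted separately before registration; the sketch stops at the plaintext registration record.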

データ登録サーバ２００は、登録データ８１を項目値ごと（フラグも含め）に暗号化して、秘匿化ＤＢ１１０に登録する。その際、データ登録サーバ２００は、登録データ８１の各レコードにランダムなＩＤを付与し、ＩＤでソートしてもよい。 The data registration server 200 encrypts the registration data 81 for each item value (including the flag) and registers it in the anonymization DB 110 . At that time, the data registration server 200 may assign a random ID to each record of the registration data 81 and sort by the ID.

端末装置４００は、図３２に示す分割キーワード一覧４２３を有すると共に、データ登録サーバ２００が有する共有集合一覧２３４と同じ内容の共有集合一覧を有する。そして端末装置４００は、秘匿化ＤＢ１１０を検索する場合、検索キーワードを対応する分割キーワードそれぞれに変換後、さらに分割キーワードを共有キーワードに変換する。そして端末装置４００は、共有キーワードを含む秘匿化検索クエリをデータ管理サーバ１００に送信する。 The terminal device 400 has a divided keyword list 423 shown in FIG. 32 and a shared set list having the same content as the shared set list 234 that the data registration server 200 has. When searching the anonymization DB 110, the terminal device 400 converts the search keyword into corresponding divided keywords, and then converts the divided keywords into shared keywords. The terminal device 400 then transmits an anonymized search query including the shared keyword to the data management server 100 .

Note that the shared set list 234 is confidential information; only the data registration server 200, which registers data in the anonymized DB 110, and the terminal devices 400 permitted to search the anonymized DB 110 hold the shared set list 234. If records were added or deleted one at a time, the added or deleted ciphertext could be associated with its plaintext. Because this would allow the shared set to be identified row by row, the data registration server 200 does not add or delete individual records. Instead, the data registration server 200 adds records to or deletes records from the DB 210 and the anonymized DB 110 in batches of at least a predetermined number of records.

データ管理サーバ１００は、秘匿化検索クエリに基づいて、秘匿化ＤＢ１１０から、該当する共有キーワードを含むレコードを検索する。秘匿化ＤＢ１１０内の共有キーワードの出現頻度は、共有キーワードの変換元となった各分割キーワードの出現頻度の合計である。また、分割キーワードは確率的に生成されている。そのため攻撃者３３が共有キーワードの出現頻度を取得したとしても、元の検索キーワードを推定するのは困難である。 Based on the anonymized search query, the data management server 100 searches the anonymized DB 110 for records containing the relevant shared keyword. The appearance frequency of the shared keyword in the anonymization DB 110 is the total appearance frequency of each divided keyword that is the conversion source of the shared keyword. Also, the split keywords are generated stochastically. Therefore, even if the attacker 33 obtains the appearance frequency of the shared keyword, it is difficult to estimate the original search keyword.

端末装置４００は、秘匿化検索クエリにヒットしたレコードをデータ管理サーバ１００から取得すると、そのレコードのフラグに基づいて、共通キーワードを、変換元の分割キーワードに変換することができる。そして端末装置４００は、分割キーワード一覧４２３に基づいて分割キーワードを元のキーワードに変換する。これにより端末装置４００は、検索キーワードを含むレコードを取得することができる。 When the terminal device 400 acquires a record hit by the anonymized search query from the data management server 100, the terminal device 400 can convert the common keyword into the split keyword of the conversion source based on the flag of the record. Then, the terminal device 400 converts the split keywords into the original keywords based on the split keyword list 423 . Thereby, the terminal device 400 can acquire a record including the search keyword.
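The recovery performed on the terminal side can be sketched as follows. The flag format "ID(n1,n2,...)", the comma-joined shared keywords, and the contents of `split_keyword_list` are assumptions made for illustration; the actual split keyword list 423 is the one shown in FIG. 32.

```python
import re

# Hypothetical excerpt of the split keyword list: split keyword -> original keyword.
split_keyword_list = {
    "pediatrics 0": "pediatrics",
    "internal medicine 0": "internal medicine",
    "male 1": "male",
    "female 0": "female",
    "elderly 1": "elderly",
    "adult 0": "adult",
}


def parse_flag(flag):
    """Split a decrypted flag such as '0(1,0,0)' into a record ID and element numbers."""
    match = re.fullmatch(r"(\d+)\((\d+(?:,\d+)*)\)", flag)
    if match is None:
        raise ValueError(f"malformed flag: {flag!r}")
    return int(match.group(1)), [int(n) for n in match.group(2).split(",")]


def restore_record(shared_record, flag):
    """Undo the shared-keyword conversion, then map split keywords to originals."""
    record_id, element_numbers = parse_flag(flag)
    restored = {}
    for (item, shared_keyword), number in zip(shared_record.items(), element_numbers):
        split_keyword = shared_keyword.split(",")[number]
        restored[item] = split_keyword_list[split_keyword]
    return record_id, restored


record_id, plain = restore_record(
    {
        "clinical department": "pediatrics 0,internal medicine 0",
        "gender": "male 1,female 0",
        "age group": "elderly 1,adult 0",
    },
    "0(1,0,0)",
)
print(record_id, plain)
```

A hit on a shared keyword can correspond to a different original keyword than the one searched for, so a terminal would presumably filter the restored records against the original search condition after this step.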

またフラグ値にはレコードのＩＤが含まれているため、フラグ値を暗号化したときの暗号文がレコードごとにユニークな値となる。これにより、フラグ値の頻度分析攻撃を防ぐことができる。 Also, since the flag value includes the ID of the record, the ciphertext when the flag value is encrypted becomes a unique value for each record. This can prevent flag value frequency analysis attacks.

FIG. 40 shows an example in which every keyword that can be registered in the DB 210 is converted into split keywords. However, as long as all the shared sets can satisfy the generation condition, some keywords need not be split.

FIG. 41 is a first diagram showing another example of generating shared sets. The shared set list 82 shown in FIG. 41 includes three shared sets 82a to 82c for the clinical department, two shared sets 82d and 82e for gender, and three shared sets 82f to 82h for the age group. In the example of FIG. 41, the clinical department keyword "internal medicine", the gender keyword "female", and the age group keywords "youth" and "child" are not converted into split keywords. These keywords are therefore elements of shared sets as they are. The shared sets 82c and 82e each contain one element. Unlike a conversion set in the second embodiment, a shared set may contain only one element.

共有集合一覧８２に示す各共有集合８２ａ～８２ｈは、共有集合の生成条件を満たしている。ただし共有集合一覧８２を用いて共有キーワードへの変換を行うと、図４０に示した共有集合一覧２３４を用いた場合に比べて、平文推定の難易度が下がる。例えば共有集合一覧８２を用いて共有キーワードへの変換を行った秘匿化ＤＢ１１０に対して、攻撃者３３は以下のような攻撃が可能である。 Each sharing set 82a to 82h shown in the sharing set list 82 satisfies the sharing set generation conditions. However, if conversion to a shared keyword is performed using the shared set list 82, the degree of difficulty of plaintext estimation is lowered compared to the case where the shared set list 234 shown in FIG. 40 is used. For example, the attacker 33 can make the following attacks against the anonymization DB 110 that has been converted into shared keywords using the shared set list 82 .

For example, suppose the attacker 33 tries to identify the target of a search on gender. The attacker 33 obtains the post-disturbance frequency of the shared keyword of each of the gender shared sets 82d and 82e. For example, by acquiring the number of records hit when a searcher searches for records with gender "female", the attacker 33 can obtain the post-disturbance frequency of the shared keyword corresponding to the shared set 82d (the sum of the frequency of "male 0" and the frequency of "female"). The attacker 33 subtracts the plaintext frequency of "female" from that post-disturbance frequency, which yields the frequency of "male 0". The attacker 33 then adds the post-disturbance frequency of the shared keyword corresponding to the other shared set 82e to the result of the subtraction. The resulting frequency is the sum of the frequency of "male 0" and the frequency of "male 1".

If the attacker 33 confirms that the result of the addition equals the frequency of "male", the attacker 33 can conclude that, when a search yields the post-disturbance frequency of the shared keyword corresponding to the shared set 82d, that shared set 82d contains "female". Thereafter, whenever a searcher performs only a search that yields the post-disturbance frequency of the shared keyword corresponding to the shared set 82d, the attacker 33 can determine that "female" was searched for.

このように、共有集合一覧８２に示すような共有集合８２ａ～８２ｈでは、性別のように登録可能なキーワード数が少ない項目がある場合に、頻度分析攻撃に対する安全性が低下してしまう。ただし、各項目の登録可能なキーワード数が多ければ、攻撃者３３が行う攻撃で用いるキーワード（攻撃において該当キーワードの頻度の加減算を行う）の組み合わせ数が膨大となる。その結果、共有集合一覧８２に示すように一部の要素が、ＤＢ２１０に登録されるキーワードのまま（分割キーワードに変換されていない）であっても十分に安全となる。 In this way, in the shared sets 82a to 82h shown in the shared set list 82, if there is an item such as gender that has a small number of keywords that can be registered, the security against frequency analysis attacks is lowered. However, if the number of keywords that can be registered for each item is large, the number of combinations of keywords used in attacks by the attacker 33 (addition and subtraction of the frequency of the relevant keywords in the attacks) becomes enormous. As a result, as shown in the shared set list 82, some elements are sufficiently secure even if the keywords registered in the DB 210 remain as they are (not converted into split keywords).

In the third embodiment as well, as with the conversion sets in the second embodiment, the data becomes vulnerable to frequency analysis attacks if there are multiple shared sets whose split keywords derive from the same combination of source keywords. The data registration request unit 250 therefore generates the shared sets so that the combinations of source keywords of the split keywords they contain differ from one another.
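The two requirements on shared sets (no set holds every split keyword of a keyword that has been split, and no two sets share the same combination of source keywords) can be expressed as a small check. This is a sketch under assumptions: split keywords are modeled as hypothetical `(keyword, index)` pairs, and the function is a validator, not the generation procedure used by the data registration request unit 250.

```python
def source_keywords(shared_set):
    """Return the set of original keywords that the elements were derived from."""
    return frozenset(keyword for keyword, _ in shared_set)


def satisfies_conditions(shared_sets, split_counts):
    """Check the generation condition and the distinct-combination requirement.

    shared_sets: list of lists of (keyword, index) pairs.
    split_counts: number n of split keywords generated for each keyword.
    """
    seen_combinations = set()
    for shared_set in shared_sets:
        # A set may hold at most n-1 of the n split keywords of a split keyword.
        for keyword in source_keywords(shared_set):
            held = sum(1 for k, _ in shared_set if k == keyword)
            n = split_counts[keyword]
            if n > 1 and held > n - 1:
                return False
        # No two sets may derive from the same combination of source keywords.
        combination = source_keywords(shared_set)
        if combination in seen_combinations:
            return False
        seen_combinations.add(combination)
    return True


split_counts = {"elderly": 2, "adult": 2, "youth": 2, "child": 2}

age_group_sets = [
    [("elderly", 0), ("child", 0)],
    [("elderly", 1), ("adult", 0)],
    [("adult", 1), ("youth", 0)],
    [("youth", 1), ("child", 1)],
]
print(satisfies_conditions(age_group_sets, split_counts))

duplicated_combination = [
    [("elderly", 0), ("adult", 0)],
    [("elderly", 1), ("adult", 1)],
    [("youth", 0), ("child", 0)],
    [("youth", 1), ("child", 1)],
]
print(satisfies_conditions(duplicated_combination, split_counts))
```

The first list mirrors the age group pairing described for the second embodiment and passes; the second pairs the same two source keywords twice and is rejected.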

第３の実施の形態を実現するために各装置が有する要素は、図１８に示した第２の実施の形態の各装置の要素と同じである。ただし第３の実施の形態では、各要素の処理内容の一部または記憶する情報の一部が第２の実施の形態と異なる。 Elements of each device for realizing the third embodiment are the same as those of each device of the second embodiment shown in FIG. However, in the third embodiment, part of the processing contents of each element or part of the information to be stored differs from the second embodiment.

In the third embodiment, the information stored in the conversion information storage unit 230 of the data registration server 200 differs from that in the second embodiment.
FIG. 42 is a diagram showing an example of the information stored in the conversion information storage unit of the data registration server in the third embodiment. The conversion information storage unit 230 stores a keyword list 231, a shared set list 234, and a split keyword selection table 233. Compared with the conversion information storage unit 230 of the second embodiment shown in FIG. 21, the conversion set list 232 of the second embodiment is replaced by the shared set list 234 in the third embodiment.

共有集合一覧２３４は、データ登録要求部２５０によって生成される。データ登録要求部２５０は、例えば第２の実施の形態における変換集合生成手順と同じ手順で共有集合を生成することができる。 Shared set list 234 is generated by data registration request unit 250 . The data registration request unit 250 can generate a shared set by the same procedure as the transformation set generation procedure in the second embodiment, for example.

図４３は、共有集合の生成例を示す図である。キーワードリスト２３１ａには、年齢層に設定可能な４個のキーワードが示されている。またＧ＝２であり、各共有集合２３４ｇ～２３４ｊには２つずつの分割キーワードが含まれる。図１４に示した分割キーワード選択テーブル２３３に基づいて分割キーワードを生成する場合、年齢層に関する分割キーワードは８個となる。 FIG. 43 is a diagram illustrating an example of generating shared sets. The keyword list 231a shows the four keywords that can be set for the age group item. Here G=2, so each of the shared sets 234g to 234j contains two split keywords. When split keywords are generated based on the split keyword selection table 233 shown in FIG. 14, there are eight split keywords for the age group item.

データ登録要求部２５０は、まず４個の共有集合２３４ｇ～２３４ｊの要素の格納領域を生成する。またデータ登録要求部２５０は、共有集合２３４ｇ～２３４ｊの識別子をそれぞれ「２－０」～「２－３」とする。識別子の左側の数値は「年齢層」に対応する値であり、右側の数値は「年齢層」の共有集合２３４ｇ～２３４ｊに対する通し番号である。 The data registration request unit 250 first creates storage areas for the elements of the four shared sets 234g to 234j. The data registration requesting unit 250 sets the identifiers of the shared sets 234g to 234j to "2-0" to "2-3", respectively. The numerical value on the left side of the identifier is the value corresponding to the "age group", and the numerical value on the right side is the serial number for the shared set 234g-234j of the "age group".

データ登録要求部２５０は、例えばキーワードリスト２３１ａから、所定の順番あるいはランダムな順番ですべてのキーワードを１回ずつ選択する。図４３の例では、キーワードリスト２３１ａの左から順にキーワードを選択するものとする。 The data registration request unit 250 selects all keywords once from the keyword list 231a, for example, in a predetermined order or in random order. In the example of FIG. 43, the keywords are selected in order from the left of the keyword list 231a.

データ登録要求部２５０は、選択したキーワードの分割キーワードを番号の小さい共有集合から順に、共有集合の格納領域に格納していく。このとき、データ登録要求部２５０は、同じキーワードに対応する分割キーワードは異なる共有集合に格納する。またデータ登録要求部２５０は、共有集合内の分割キーワードの変換元のキーワードの組み合わせが同じとなる複数の共有集合が生じないようにする。 The data registration requesting unit 250 stores the divided keywords of the selected keyword in the shared set storage area in ascending order of number. At this time, the data registration requesting unit 250 stores divided keywords corresponding to the same keyword in different shared sets. The data registration requesting unit 250 also prevents the occurrence of a plurality of shared sets in which the combination of the conversion source keywords of the divided keywords in the shared set is the same.

例えばデータ登録要求部２５０は、最初に選択したキーワードの１つ目の分割キーワードを先頭の共有集合２３４ｇに登録する。次にデータ登録要求部２５０は、そのキーワードの２つ目の分割キーワードを２番目の共有集合２３４ｈに登録する。例えば「老人」が最初に選択された場合、「老人０」が共有集合２３４ｇに登録され、「老人１」が共有集合２３４ｈに登録される。 For example, the data registration requesting unit 250 registers the first divided keyword of the first selected keyword in the leading shared set 234g. Next, the data registration request unit 250 registers the second divided keyword of the keyword in the second shared set 234h. For example, if "old man" is selected first, "old man 0" is registered in shared set 234g and "old man 1" is registered in shared set 234h.

データ登録要求部２５０は、２番目に選択したキーワードの１つ目の分割キーワードを、直前に選択したキーワードの２つ目の分割キーワードと同じ共有集合２３４ｈに登録する。次にデータ登録要求部２５０は、そのキーワードの２つ目の分割キーワードを次の共有集合２３４ｉに登録する。例えば「成人」が２番目に選択された場合、「成人０」が共有集合２３４ｈに登録され、「成人１」が共有集合２３４ｉに登録される。 The data registration requesting unit 250 registers the first divided keyword of the second selected keyword in the same shared set 234h as the second divided keyword of the previously selected keyword. Next, the data registration request unit 250 registers the second divided keyword of the keyword in the next shared set 234i. For example, if "adult" is selected second, "adult 0" is registered in shared set 234h and "adult 1" is registered in shared set 234i.

データ登録要求部２５０は、３番目以降に選択した各キーワードの分割キーワードについても、２番目に選択したキーワードと同様の手順で共有集合に登録する。例えば「青年」が３番目に選択された場合、「青年０」が共有集合２３４ｉに登録され、「青年１」が共有集合２３４ｊに登録される。 The data registration requesting unit 250 also registers the divided keywords of the third and subsequent keywords in the shared set in the same procedure as the second selected keyword. For example, if "Youth" is selected third, "Youth 0" is registered in shared set 234i and "Youth 1" is registered in shared set 234j.

データ登録要求部２５０は、最後に選択したキーワードの分割キーワードについては、格納領域が空いている共有集合に登録する。例えば「児童」が最後に選択された場合、「児童０」が共有集合２３４ｇに登録され、「児童１」が共有集合２３４ｊに登録される。 The data registration requesting unit 250 registers the divided keyword of the last selected keyword in a shared set with a free storage area. For example, if "Child" is selected last, then "Child 0" is registered in shared set 234g and "Child 1" is registered in shared set 234j.

このような手順で共有集合２３４ｇ～２３４ｊが生成される。共有集合２３４ｇ～２３４ｊそれぞれは、属する要素のうちの少なくとも１つが分割キーワードであり、かつその分割キーワードと変換元のキーワードが共通の他の分割キーワードが他の共有集合に含まれる。そのため共有集合の生成条件を満たしている。また共有集合２３４ｇ～２３４ｊ内の分割キーワードの変換元のキーワードの組み合わせが同じとなる複数の共有集合は生じていない。 The shared sets 234g to 234j are generated by this procedure. In each of the shared sets 234g to 234j, at least one element is a split keyword, and another split keyword derived from the same source keyword as that split keyword is contained in a different shared set. The conditions for generating shared sets are therefore satisfied. In addition, no two of the shared sets 234g to 234j contain split keywords derived from the same combination of source keywords.
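The placement procedure walked through above for FIG. 43 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described in the patent: the function names `build_shared_sets` and `combos_are_distinct` are hypothetical, English labels such as `elderly0` stand in for the split keywords (「老人０」 and so on), and the sketch assumes that every storage slot ends up filled (L×X_j = G×M'_j, as in FIG. 43) and that the number of splits does not exceed the number of sets.

```python
def build_shared_sets(keywords, num_sets, group_size=2, num_splits=2):
    """Distribute split keywords over shared sets, following the FIG. 43 walkthrough.

    Hypothetical helper. Assumes num_splits * len(keywords) == group_size * num_sets
    (no blank slots) and num_splits <= num_sets.
    """
    sets = [[] for _ in range(num_sets)]
    start = 0  # set that receives the first split keyword of the next keyword
    for pos, kw in enumerate(keywords):
        if pos == len(keywords) - 1:
            # Last keyword: use the sets that still have a free slot, in ascending order.
            targets = [i for i, s in enumerate(sets) if len(s) < group_size][:num_splits]
        else:
            # Other keywords: consecutive sets, so splits of one keyword never share a set.
            targets = [(start + k) % num_sets for k in range(num_splits)]
            start = targets[-1]
        if len(targets) < num_splits or any(len(sets[i]) >= group_size for i in targets):
            raise ValueError("not enough free slots for keyword: " + kw)
        for k, idx in enumerate(targets):
            sets[idx].append(f"{kw}{k}")
    return sets


def combos_are_distinct(sets):
    """Return True if no two shared sets draw on the same combination of source keywords."""
    # Strip the one-digit split index (assumes num_splits <= 10) to recover the source keyword.
    combos = [frozenset(element[:-1] for element in s) for s in sets]
    return len(set(combos)) == len(combos)


age_sets = build_shared_sets(["elderly", "adult", "youth", "child"], num_sets=4)
```

With the four age-group keywords selected in that order, `age_sets` holds `["elderly0", "child0"]`, `["elderly1", "adult0"]`, `["adult1", "youth0"]` and `["youth1", "child1"]`, which corresponds to the shared sets 234g to 234j described above. Items that leave some slots blank, such as gender with three shared sets, would need a placement rule that this sketch does not cover.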

図４３には年齢層の共有集合２３４ｇ～２３４ｊの生成例を示したが、診療科および性別についても同様の手順でそれぞれの共有集合２３４ａ～２３４ｆを生成することができる。その結果、図４０に示したような共有集合一覧２３４が生成される。 Although FIG. 43 shows an example of generation of shared sets 234g to 234j for age groups, shared sets 234a to 234f can also be generated for clinical departments and genders in a similar manner. As a result, a shared set list 234 as shown in FIG. 40 is generated.

ｊ番目の項目のキーワードの種類数をＸ_j、ｊ番目の項目の共有集合数をＭ’_jとしたとき、図４３に示したのは、Ｇ＝２，Ｌ＝２，Ｘ_j＝４，Ｍ’_j＝４の場合の例である。これらのパラメータの値が別の値であっても、同様に適切な共有集合を生成することができる。 Let X_j be the number of keyword types of the j-th item and M'_j be the number of shared sets of the j-th item. FIG. 43 shows the case where G=2, L=2, X_j=4, and M'_j=4. Appropriate shared sets can be generated in the same way for other values of these parameters.

データ登録サーバ２００は、共有集合一覧２３４を用いて分割キーワードを共有キーワードに変換することで生成された登録データ８１の各レコードにランダムなＩＤを付与し、ＩＤでソートした後、各レコードを秘匿化ＤＢ１１０に格納する。 The data registration server 200 assigns a random ID to each record of the registration data 81, which is generated by converting split keywords into shared keywords using the shared set list 234, sorts the records by ID, and then stores each record in the anonymization DB 110.

図４４は、共有キーワードが格納された秘匿化ＤＢの一例を示す図である。秘匿化ＤＢ１１０に登録されたレコードはＩＤによってソートされており、真のデータにおける変換元のレコードとは異なる順番で登録されている。 FIG. 44 is a diagram showing an example of an anonymization DB storing shared keywords. The records registered in the anonymization DB 110 are sorted by ID, and are registered in an order different from the conversion source records in the true data.

次にデータ登録処理の手順について説明する。 Next, the procedure of the data registration processing will be described.
図４５は、データ登録処理の手順の一例を示すフローチャートである。以下、図４５に示す処理をステップ番号に沿って説明する。 FIG. 45 is a flowchart showing an example of the procedure of the data registration processing. The processing shown in FIG. 45 is described below in order of step number.

［ステップＳ３０１］データ登録要求部２５０は、群数Ｇとキーワード分割数Ｌの設定入力を受け付ける。ＧとＬは共に２以上の整数である。ＧとＬの値が大きいほど安全性が向上するが登録するダミーレコード数も増加する。そこで、ＧとＬの値は、秘匿化ＤＢ１１０に求められる頻度分析攻撃に対する安全性の度合いと、秘匿化ＤＢ１１０に許容されるダミーレコード数とを勘案して、データ登録サーバ２００の管理者が決定する。 [Step S301] The data registration request unit 250 receives input of the group count G and the keyword division count L. Both G and L are integers of 2 or more. Larger values of G and L improve security but also increase the number of dummy records to be registered. The administrator of the data registration server 200 therefore decides the values of G and L by weighing the degree of security against frequency analysis attacks required of the anonymization DB 110 against the number of dummy records the anonymization DB 110 can tolerate.

［ステップＳ３０２］データ登録要求部２５０は、項目ごとの共有集合数を決定する。例えばｊ番目の項目の共有集合数Ｍ’_jは、天井関数を用いて以下の式（４）で表される。 [Step S302] The data registration request unit 250 determines the number of shared sets for each item. For example, the number of shared sets M'_j of the j-th item is expressed by the following equation (4) using the ceiling function.

$$M'_j = \max\left(\left\lceil \frac{L \times X_j}{G} \right\rceil,\ 3\right) \qquad (4)$$

ｊ番目の項目のキーワードの種類数Ｘ_jにキーワード分割数Ｌを乗算した値（Ｌ×Ｘ_j）が、その項目の分割キーワード数である。分割キーワード数を群数Ｇで除算した結果の天井関数（除算結果以上の最小の整数）と「３」とのうちの大きい方の値が、共有集合数となる。 The value (L×X _j ) obtained by multiplying the keyword type number X _j of the j-th item by the keyword division number L is the number of divided keywords of the item. The larger one of the ceiling function (minimum integer equal to or greater than the division result) obtained by dividing the number of divided keywords by the number of groups G and "3" is the number of shared sets.

共有集合数が「３」以上となるようにしたことで、Ｇ＝２の場合において、性別のように２種類のキーワードしか存在しない項目についても、３つの共有集合が生成される。これにより性別についても、異なる共有集合において、それぞれの分割キーワードの変換元のキーワードの組み合わせが同じとなることを抑止することができる。 By setting the number of shared sets to be "3" or more, in the case of G=2, three shared sets are generated even for an item such as gender that has only two types of keywords. As a result, it is possible to prevent the same combination of the conversion source keywords of the divided keywords in different shared sets for gender.
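The rule described above for the number of shared sets can be checked with a short Python sketch. The function name `shared_set_count` and its parameter names are hypothetical; only the rule itself (the ceiling of L×X_j divided by G, with a lower bound of 3) comes from the text.

```python
def shared_set_count(num_keywords, num_splits, group_size):
    """Return M'_j = max(ceil(L * X_j / G), 3) using integer arithmetic.

    num_keywords is X_j, num_splits is L, group_size is G.
    """
    split_keywords = num_splits * num_keywords   # L * X_j, the number of split keywords
    ceil_div = -(-split_keywords // group_size)  # ceiling of the division by G
    return max(ceil_div, 3)


# Age group (X_j = 4), clinical department (X_j = 3), gender (X_j = 2), all with L = G = 2.
counts = [shared_set_count(x, 2, 2) for x in (4, 3, 2)]
```

For G = L = 2 this gives four shared sets for the age group and three each for the clinical department and gender, which is consistent with the shared sets 234a to 234j referred to in the text.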

［ステップＳ３０３］データ登録要求部２５０は、すべての項目それぞれについて、ステップＳ３０２で決定した共有集合数分の共有集合を生成する。なお、データ登録要求部２５０は、共有集合の格納領域の数よりも分割キーワード数が少ない場合には、共有集合の要素の格納領域の一部を空欄のままとする。生成される各共有集合は少なくとも１つの分割キーワードを含み、その分割キーワードと変換元のキーワードが共通の他の分割キーワードが他の共有集合に含まれる。これにより、図４０の共有集合一覧２３４に示すような共有集合２３４ａ～２３４ｊが生成される。 [Step S303] The data registration requesting unit 250 generates shared sets for each of the items as many as the number of shared sets determined in step S302. If the number of divided keywords is smaller than the number of shared set storage areas, the data registration request unit 250 leaves part of the shared set element storage areas blank. Each shared set that is generated includes at least one split keyword, and other split keywords that share the split keyword and the conversion source keyword are included in the other shared sets. As a result, shared sets 234a to 234j as shown in the shared set list 234 of FIG. 40 are generated.

［ステップＳ３０４］データ登録要求部２５０は、分割キーワード選択テーブル２３３を生成する。この処理の詳細は、図２７に示したステップＳ１０４の処理と同様である。 [Step S304] The data registration request unit 250 generates the split keyword selection table 233. The details of this process are the same as those of step S104 shown in FIG. 27.
［ステップＳ３０５］データ登録要求部２５０は、ＤＢ２１０から平文のデータを読み込む。 [Step S305] The data registration request unit 250 reads plaintext data from the DB 210.

［ステップＳ３０６］データ登録要求部２５０は、読み込んだ平文のデータのレコードに登録されているキーワードを、分割キーワード選択テーブル２３３に示される選択確率で確率的に分割キーワードに変換する。この処理の詳細は、図２７に示したステップＳ１０６の処理と同様である。 [Step S306 ] The data registration requesting unit 250 probabilistically converts the keywords registered in the record of the read plaintext data into divided keywords with the selection probabilities shown in the divided keyword selection table 233 . The details of this process are the same as the process of step S106 shown in FIG.

［ステップＳ３０７］データ登録要求部２５０は、各レコードの分割キーワードを共有キーワードに変換する。なお、共有キーワードへ変換する処理の詳細は後述する（図４６参照）。 [Step S307] The data registration request unit 250 converts the divided keywords of each record into shared keywords. Details of the process of converting to a shared keyword will be described later (see FIG. 46).

［ステップＳ３０８］データ登録要求部２５０は、共有キーワードを含むレコードそれぞれに、ランダムにＩＤを付与する。 [Step S308] The data registration request unit 250 randomly assigns an ID to each record containing shared keywords.
［ステップＳ３０９］データ登録要求部２５０は、各レコードにフラグ値を付与する。例えばデータ登録要求部２５０は、各レコードについて、そのレコードの変換元となった平文のＤＢ２１０内のレコードのＩＤと、各項目の共有キーワードへの変換元となった分割キーワードの共有集合内での要素番号との組をフラグ値として生成する。変換元のレコードのＩＤを含むことにより、すべてのフラグ値をユニークな値にすることができ、フラグ値の頻度分析攻撃を防ぐことができる。また変換元の分割キーワードの共有集合での要素番号がフラグ値に含まれていることにより、フラグ値を参照すれば、共有キーワードを元の分割キーワードに戻すことが可能となる。 [Step S309] The data registration request unit 250 assigns a flag value to each record. For example, for each record, the data registration request unit 250 generates as the flag value a pair consisting of the ID of the plaintext record in the DB 210 from which the record was converted and, for each item, the element number within its shared set of the split keyword from which the shared keyword was converted. Including the ID of the source record makes every flag value unique, which prevents frequency analysis attacks on the flag values. Because the flag value also contains the element number of the source split keyword within its shared set, the shared keyword can be restored to the original split keyword by referring to the flag value.

［ステップＳ３１０］データ登録要求部２５０は、各レコードをＩＤでソートする。 [Step S310] The data registration request unit 250 sorts the records by ID.
［ステップＳ３１１］データ登録要求部２５０は、ソートされたレコード群を暗号化して、秘匿化ＤＢ１１０に登録する。例えばデータ登録要求部２５０は、レコード内の項目値ごとに暗号化し、暗号化された値を有するレコード群を、登録データとしてデータ管理サーバ１００に送信する。データ管理サーバ１００では、データ登録部１２０が登録データを受信し、受信した登録データを秘匿化ＤＢ１１０に格納する。 [Step S311] The data registration request unit 250 encrypts the sorted record group and registers it in the anonymization DB 110. For example, the data registration request unit 250 encrypts each item value in the records and transmits the group of records holding the encrypted values to the data management server 100 as registration data. In the data management server 100, the data registration unit 120 receives the registration data and stores it in the anonymization DB 110.
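Steps S308 to S310 can be pictured with the following Python sketch. It is only an illustration under assumptions that the text does not fix: the function name `assign_ids_and_flags` and the record layout are hypothetical, the random IDs are assumed to be a random permutation (so that they are unique), element numbers are assumed to be counted from 0, and the encryption of step S311 is left out entirely.

```python
import random


def assign_ids_and_flags(records, rng=random):
    """Attach a random ID and a flag value to each record, then sort by ID.

    records: list of dicts with 'src_id' (ID of the plaintext record),
    'shared' (shared keyword per item) and 'elements' (element number per item).
    """
    new_ids = list(range(len(records)))
    rng.shuffle(new_ids)  # assumption: random IDs form a permutation, hence unique
    out = []
    for rec, new_id in zip(records, new_ids):
        # Flag value: (source record ID, element numbers); unique because src_id is unique.
        flag = (rec["src_id"], tuple(rec["elements"]))
        out.append({"id": new_id, "shared": list(rec["shared"]), "flag": flag})
    out.sort(key=lambda r: r["id"])  # step S310: sort by the random ID
    return out


example = [
    {"src_id": 0, "shared": ["pediatrics0,internal0", "male1,female0", "elderly1,adult0"],
     "elements": [1, 0, 0]},
    {"src_id": 1, "shared": ["pediatrics1,gynecology0", "male1,female0", "adult1,youth0"],
     "elements": [0, 1, 1]},
]
registered = assign_ids_and_flags(example, random.Random(0))
```

Sorting by the random ID is what decouples the stored order from the order of the source records, while the flag keeps enough information to undo the keyword conversion after decryption.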

次に分割キーワードを共有キーワードへ変換する処理について詳細に説明する。 Next, the process of converting split keywords into shared keywords will be described in detail.
図４６は、共有キーワードへ変換する処理手順の詳細を示すフローチャートである。以下、図４６に示す処理をステップ番号に沿って説明する。 FIG. 46 is a flowchart showing the details of the procedure for converting to shared keywords. The processing shown in FIG. 46 is described below in order of step number.

［ステップＳ３２１］データ登録要求部２５０は、項目値が分割キーワードに変換された分割済データ２１１（図４０参照）に含まれる分割キーワードを１つ選択する。 [Step S321] The data registration request unit 250 selects one split keyword contained in the split data 211 (see FIG. 40), whose item values have been converted into split keywords.
［ステップＳ３２２］データ登録要求部２５０は、共有集合一覧２３４から、選択した分割キーワードが属する項目の共有集合を特定し、特定した共有集合から選択した分割キーワードを含む共有集合を検索する。 [Step S322] From the shared set list 234, the data registration request unit 250 identifies the shared sets of the item to which the selected split keyword belongs, and searches those shared sets for the one that contains the selected split keyword.

［ステップＳ３２３］データ登録要求部２５０は、選択した分割キーワードを、その分割キーワードを含む共有集合に対応する共有キーワードに置き換える。例えばデータ登録要求部２５０は、選択した分割キーワードが属する共有集合のすべての要素を文字列結合し、共有キーワードとして出力する。またデータ登録要求部２５０は、予め共有集合ごとに特定の共有キーワードを設定しておき、選択した分割キーワードが属する共有集合に設定された共有キーワードに、分割キーワードを変換してもよい。 [Step S323] The data registration requesting unit 250 replaces the selected divided keyword with a shared keyword corresponding to the shared set containing the divided keyword. For example, the data registration requesting unit 250 concatenates all elements of the shared set to which the selected divided keyword belongs, and outputs the result as a shared keyword. The data registration requesting unit 250 may set a specific shared keyword in advance for each shared set, and convert the divided keyword into the shared keyword set for the shared set to which the selected divided keyword belongs.

［ステップＳ３２４］データ登録要求部２５０は、分割済データ２１１内に未選択の分割キーワードがあるか否かを判断する。データ登録要求部２５０は、未選択の分割キーワードがある場合、処理をステップＳ３２１に進める。またデータ登録要求部２５０は、すべての分割キーワードの共有キーワードへの置き換えが完了した場合、共有キーワードへ変換する処理を終了する。 [Step S324] The data registration request unit 250 determines whether or not there is an unselected split keyword in the split data 211. If there is an unselected split keyword, the data registration requesting unit 250 advances the process to step S321. Further, when the replacement of all the divided keywords with the shared keywords is completed, the data registration requesting unit 250 ends the process of converting to the shared keywords.

図４７は、分割キーワードから共有キーワードへの変換の一例を示す図である。例えば分割キーワード「老人１」が選択されたものとする。「老人１」は、共有集合２３４ｈに含まれている。共有集合２３４ｈの要素は、「老人１」と「成人０」である。そこで「老人１」と「成人０」の文字列をコンマを挟んで結合した「老人１，成人０」が共有キーワードとして出力される。 FIG. 47 is a diagram showing an example of conversion from divided keywords to shared keywords. For example, it is assumed that the divided keyword "old man 1" is selected. "Old man 1" is included in the shared set 234h. The elements of the shared set 234h are "elderly 1" and "adult 0". Therefore, "old man 1, adult 0" is output as a shared keyword by combining the character strings of "old man 1" and "adult 0" with a comma in between.
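The conversion of steps S322 and S323, in the variant that joins all elements of the shared set, can be sketched in Python as follows. The function name `to_shared_keyword` is hypothetical, English labels stand in for the split keywords, and returning a 0-based element number alongside the shared keyword is an assumption made here for illustration.

```python
# Shared sets for the age group item, keyed by the identifiers "2-0" to "2-3".
AGE_SHARED_SETS = {
    "2-0": ["elderly0", "child0"],
    "2-1": ["elderly1", "adult0"],
    "2-2": ["adult1", "youth0"],
    "2-3": ["youth1", "child1"],
}


def to_shared_keyword(split_keyword, shared_sets):
    """Return (shared keyword, element number) for one split keyword.

    The shared keyword is the comma-joined list of all elements of the shared
    set that contains the split keyword; the element number is its 0-based position.
    """
    for members in shared_sets.values():
        if split_keyword in members:
            return ",".join(members), members.index(split_keyword)
    raise KeyError(split_keyword)


shared, element = to_shared_keyword("elderly1", AGE_SHARED_SETS)
```

Here `shared` is `"elderly1,adult0"`, the counterpart of 「老人１，成人０」 in FIG. 47, and `element` is the position that a flag value would need to record so the conversion can be undone.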

分割済データ２１１（図４０参照）内の各レコードの分割キーワードが共有キーワードに置き換えられることにより、各レコードは登録データ８１（図４０参照）のレコードとなる。 By replacing the divided keyword of each record in the divided data 211 (see FIG. 40) with the shared keyword, each record becomes a record of the registration data 81 (see FIG. 40).

図４８は、登録データのレコードの一例を示す図である。分割キーワードが設定されたレコード８３は、ＩＤ「０」であり、診療科「内科０」、性別「男１」、年齢層「老人１」を有する。分割キーワード「内科０」は、「小児科０」と「内科０」とを含む共有集合２３４ａの要素であるため、共有キーワード「小児科０，内科０」に変換される。分割キーワード「男１」は、「男１」と「女０」とを含む共有集合２３４ｅの要素であるため、共有キーワード「男１，女０」に変換される。分割キーワード「老人１」は、「老人１」と「成人０」とを含む共有集合２３４ｈの要素であるため、共有キーワード「老人１，成人０」に変換される。 FIG. 48 is a diagram showing an example of a record of registration data. A record 83 in which a division keyword is set has an ID of "0", a clinical department of "internal medicine 0", a sex of "male 1", and an age group of "elderly 1". Since the divided keyword "internal medicine 0" is an element of the shared set 234a including "pediatrics 0" and "internal medicine 0", it is converted to the shared keyword "pediatrics 0, internal medicine 0". The divided keyword "male 1" is an element of the shared set 234e including "male 1" and "female 0", so it is converted to the shared keyword "male 1, female 0". The divided keyword "old man 1" is an element of the shared set 234h including "old man 1" and "adult 0", so it is converted to the shared keyword "old man 1, adult 0".

共有キーワードを有するレコード８４には、ランダムにＩＤ「５」が付与されている。またレコード８４のフラグ値には、レコード８３のＩＤ「０」が含まれる。またレコード８４のフラグ値には、レコード８３内の分割キーワードについての、その分割キーワードが属する共有集合における要素番号が含まれる。 The record 84, which holds shared keywords, has been randomly assigned the ID "5". The flag value of the record 84 includes the ID "0" of the record 83. The flag value of the record 84 also includes, for each split keyword in the record 83, its element number within the shared set to which it belongs.

このようにして生成されたレコード８４は、項目値とフラグ値とが暗号化され、秘匿化ＤＢ１１０に登録される。秘匿化ＤＢ１１０に対する検索は、共有キーワードを用いて行われる。 The record 84 generated in this manner is registered in the anonymization DB 110 with the item values and flag values encrypted. A search for the anonymization DB 110 is performed using a shared keyword.

第３の実施の形態では端末装置４００の変換情報記憶部４２０に格納される情報が、第２の実施の形態と異なる。 In the third embodiment, the information stored in the conversion information storage unit 420 of the terminal device 400 differs from that in the second embodiment.
図４９は、第３の実施の形態における端末装置の変換情報記憶部に格納される情報の一例を示す図である。変換情報記憶部４２０は、キーワード一覧４２１、共有集合一覧４２４、および分割キーワード一覧４２３を記憶する。図３１に示した第２の実施の形態の変換情報記憶部４２０と比較すると、第２の実施の形態における変換集合一覧４２２が第３の実施の形態では共有集合一覧４２４となっている。 FIG. 49 is a diagram illustrating an example of information stored in the conversion information storage unit of the terminal device according to the third embodiment. The conversion information storage unit 420 stores a keyword list 421, a shared set list 424, and a split keyword list 423. Compared with the conversion information storage unit 420 of the second embodiment shown in FIG. 31, the conversion set list 422 of the second embodiment is replaced by the shared set list 424 in the third embodiment.

共有集合一覧４２４は、例えば検索要求部４４０によって、データ登録要求部２５０による共有集合一覧２３４の生成手順と同じ手順で生成される。また検索要求部４４０は、データ登録サーバ２００から共有集合一覧２３４を取得し、自身の共有集合一覧４２４として変換情報記憶部４２０に格納してもよい。 The shared set list 424 is generated, for example, by the search request unit 440 using the same procedure by which the data registration request unit 250 generates the shared set list 234. Alternatively, the search request unit 440 may acquire the shared set list 234 from the data registration server 200 and store it in the conversion information storage unit 420 as its own shared set list 424.

検索要求部４４０は、検索要求が入力されると分割キーワード一覧４２３と共有集合一覧４２４とを用いて、秘匿化検索クエリを生成する。 When a search request is input, the search request unit 440 uses the split keyword list 423 and the shared set list 424 to generate an anonymized search query.
図５０は、秘匿化検索クエリの一例を示す図である。検索要求部４４０は、検索条件が入力されると、検索条件に示される検索キーワードを含む平文検索クエリ９１を生成する。次に検索要求部４４０は、分割キーワード一覧４２３を参照して、検索キーワードを、対応する複数の分割キーワードそれぞれに分割し、複数の分割キーワードの論理和検索を行う分割クエリ９２を生成する。例えば検索キーワードが「老人」の場合、「老人０」と「老人１」の論理和検索を行う分割クエリ９２が生成される。 FIG. 50 is a diagram showing an example of an anonymized search query. When a search condition is input, the search request unit 440 generates a plaintext search query 91 containing the search keyword given in the search condition. Next, the search request unit 440 refers to the split keyword list 423, splits the search keyword into its corresponding split keywords, and generates a split query 92 that performs a logical sum (OR) search over those split keywords. For example, if the search keyword is "old man", a split query 92 is generated that performs an OR search of "old man 0" and "old man 1".

さらに検索要求部４４０は、共有集合一覧４２４を参照し、分割キーワードを、その分割キーワードが属する共有集合に対応する共有キーワードに変換することで攪乱クエリ９３を生成する。例えば分割キーワード「老人０」の共有キーワード「老人０，児童０」と分割キーワード「老人１」の共有キーワード「老人１，成人０」との論理和検索を行う攪乱クエリ９３が生成される。 Further, the search request unit 440 refers to the shared set list 424 and converts the divided keyword into a shared keyword corresponding to the shared set to which the divided keyword belongs, thereby generating the disturbance query 93 . For example, a disturbance query 93 is generated that performs a logical sum search of the shared keyword "old man 0, child 0" of the divided keyword "old man 0" and the shared keyword "old man 1, adult 0" of the divided keyword "old man 1".
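The two conversions described for FIG. 50 (search keyword to split keywords, then split keywords to shared keywords) can be sketched in Python as follows. The function name `disturbance_terms` and the data layout are hypothetical, English labels stand in for the keywords, and encryption of the resulting shared keywords is not shown.

```python
# Counterpart of the split keyword list 423 (hypothetical layout).
SPLIT_LIST = {
    "elderly": ["elderly0", "elderly1"],
    "adult": ["adult0", "adult1"],
    "youth": ["youth0", "youth1"],
    "child": ["child0", "child1"],
}

# Counterpart of the age-group part of the shared set list 424.
SHARED_SETS = [
    ["elderly0", "child0"],
    ["elderly1", "adult0"],
    ["adult1", "youth0"],
    ["youth1", "child1"],
]


def disturbance_terms(keyword, split_list, shared_sets):
    """Return the shared keywords that are OR-ed together for one search keyword."""
    terms = []
    for split in split_list[keyword]:      # split query: OR of the split keywords
        for members in shared_sets:        # find the shared set holding this split keyword
            if split in members:
                terms.append(",".join(members))
                break
    return terms


terms = disturbance_terms("elderly", SPLIT_LIST, SHARED_SETS)
```

For the search keyword "elderly" the result is `["elderly0,child0", "elderly1,adult0"]`, the two shared keywords whose OR search forms the disturbance query in the example above. Note that such a query also hits records whose true keyword is "child" or "adult", which is why the terminal filters the results afterwards.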

そして検索要求部４４０は、攪乱クエリ９３内の共有キーワードを暗号化した秘匿化検索クエリ９４を生成する。検索要求部４４０は、秘匿化検索クエリ９４をデータ管理サーバ１００に送信する。するとデータ管理サーバ１００において、秘匿化検索クエリ９４に応じた秘匿化検索が行われる。 The search request unit 440 then generates an anonymized search query 94 by encrypting the shared keywords in the disturbance query 93. The search request unit 440 transmits the anonymized search query 94 to the data management server 100. The data management server 100 then performs an anonymized search according to the anonymized search query 94.

図５１は、共有キーワードを用いた秘匿化検索の一例を示す図である。秘匿化検索クエリ９４を取得したデータ管理サーバ１００では、検索部１４０が秘匿化検索クエリ９４に示されている共有キーワードの暗号文を含むレコードを秘匿化ＤＢ１１０から検索する。そして検索部１４０は、該当するレコードを含む秘匿化検索結果９５を端末装置４００に送信する。 FIG. 51 is a diagram showing an example of anonymous search using a shared keyword. In the data management server 100 that has acquired the anonymized search query 94 , the search unit 140 searches the anonymized DB 110 for a record containing the encrypted text of the shared keyword indicated in the anonymized search query 94 . The search unit 140 then transmits the anonymized search result 95 including the corresponding record to the terminal device 400 .

端末装置４００の検索要求部４４０は、秘匿化検索結果９５内の項目値を復号して、攪乱検索結果９６を生成する。検索要求部４４０は、攪乱検索結果９６の各レコードのフラグに基づいて、該当レコード内の共有キーワードの変換元の分割キーワードを判断する。そして検索要求部４４０は、復号された秘匿化検索結果９５から、分割クエリ９２に含まれる分割キーワードに基づいて変換された共有キーワードを有するレコードを検索する。 The search request unit 440 of the terminal device 400 decodes the item values in the anonymized search result 95 to generate the disturbed search result 96 . The search request unit 440 determines the split keyword from which the shared keyword in the record is converted based on the flag of each record of the disturbed search result 96 . Then, the search request unit 440 searches the decrypted anonymized search result 95 for a record having the shared keyword converted based on the split keyword included in the split query 92 .

検索要求部４４０は、分割クエリ９２に示される分割キーワードを有するレコードの共有キーワードを分割キーワードに逆変換し、さらにその分割キーワードをＤＢ２１０に登録可能なキーワードに変換する。そして検索要求部４４０は、フラグを除去した後、ＤＢ２１０に登録可能なキーワードを有するレコードを検索結果９７として出力する。 The search request unit 440 inversely converts the shared keyword of the record having the split keyword indicated in the split query 92 into the split keyword, and further converts the split keyword into a keyword that can be registered in the DB 210 . After removing the flag, the search request unit 440 outputs records having keywords that can be registered in the DB 210 as search results 97 .

図５２は、検索処理の手順の一例を示すフローチャートである。以下、図５２に示す処理をステップ番号に沿って説明する。 FIG. 52 is a flowchart illustrating an example of the search processing procedure. The processing shown in FIG. 52 is described below in order of step number.
［ステップＳ４０１］検索要求部４４０は、ユーザからの検索条件として入力された検索キーワードと、その検索キーワードに対応する項目を、平文検索クエリとして取得する。 [Step S401] The search request unit 440 acquires, as a plaintext search query, the search keyword input by the user as a search condition and the item corresponding to that search keyword.

［ステップＳ４０２］検索要求部４４０は、共有集合一覧４２４と分割キーワード一覧４２３を生成する。例えば検索要求部４４０は、データ登録要求部２５０における項目ごとの共有集合生成処理と同様の処理を行い、共有集合一覧４２４を生成する。検索要求部４４０によって生成される共有集合一覧は、データ登録要求部２５０で生成された共有集合一覧と同じものとなる。 [Step S402] The search request unit 440 generates the shared set list 424 and the split keyword list 423. For example, the search request unit 440 performs the same per-item shared set generation processing as the data registration request unit 250 to generate the shared set list 424. The shared set list generated by the search request unit 440 is identical to the one generated by the data registration request unit 250.

［ステップＳ４０３］検索要求部４４０は、取得した平文検索クエリを分割する。例えば検索要求部４４０は、平文検索クエリに含まれる検索キーワードを分割キーワードに分割する。そして検索要求部４４０は、分割によって得られた分割キーワードの論理和を示す分割クエリを生成する。 [Step S403] The search request unit 440 divides the obtained plaintext search query. For example, the search request unit 440 divides the search keyword included in the plaintext search query into divided keywords. The search request unit 440 then generates a split query indicating the logical sum of the split keywords obtained by splitting.

取得した平文検索クエリには、複数の検索キーワードが含まれる場合がある。複数の検索キーワードの論理和検索の場合、検索要求部４４０は、各検索キーワードを分割キーワードに分解し、すべての分割キーワードの論理和を示す分割クエリを生成する。また複数の検索キーワードの論理積検索の場合、検索要求部４４０は、項目が異なる分割キーワード間のすべての組み合わせを生成する。３つ以上の項目それぞれの検索キーワードの論理積の場合であれば、検索要求部４４０は、各項目から１ずつ分割キーワードを選択することで生成可能な分割キーワードのすべての組み合わせを生成する。そして検索要求部４４０は、生成した分割キーワードの組み合わせごとの論理積を示す論理式を生成する。さらに検索要求部４４０は、分割キーワードの組み合わせごとに生成した論理積の論理式間の論理和を示す分割クエリを生成する。 The obtained plaintext search query may contain multiple search keywords. In the case of a logical sum search of a plurality of search keywords, the search request unit 440 decomposes each search keyword into split keywords and generates a split query indicating the logical sum of all split keywords. Also, in the case of a logical product search of a plurality of search keywords, the search request unit 440 generates all combinations of divided keywords with different items. In the case of a logical product of search keywords for three or more items, the search request unit 440 selects one split keyword from each item to generate all possible combinations of split keywords. Then, the search request unit 440 generates a logical expression indicating the logical AND for each combination of the generated split keywords. Furthermore, the search request unit 440 generates a split query indicating the logical sum between the logical expressions of the logical product generated for each combination of the split keywords.

例えば取得した平文検索クエリが「Ａ∧Ｂ」であり、検索キーワード「Ａ」は「Ａ０」と「Ａ１」に分割され、検索キーワード「Ｂ」は「Ｂ０」と「Ｂ１」に分割されるものとする。この場合、検索要求部４４０は、分割クエリとして、「（Ａ０∧Ｂ０）∨（Ａ０∧Ｂ１）∨（Ａ１∧Ｂ０）∨（Ａ１∧Ｂ１）」を生成する。 For example, suppose that the acquired plaintext search query is "A∧B", that the search keyword "A" is split into "A0" and "A1", and that the search keyword "B" is split into "B0" and "B1". In this case, the search request unit 440 generates "(A0∧B0)∨(A0∧B1)∨(A1∧B0)∨(A1∧B1)" as the split query.
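The expansion of a logical product (AND) query into a split query is a Cartesian product over the split keywords of each item, which the following Python sketch illustrates. The function names `split_and_query` and `render` are hypothetical and chosen only for this example.

```python
from itertools import product


def split_and_query(split_lists):
    """Return every combination that takes one split keyword from each item.

    split_lists: one list of split keywords per AND-ed search keyword.
    """
    return [tuple(combo) for combo in product(*split_lists)]


def render(clauses):
    """Render the clauses as an OR of ANDs, e.g. '(A0 ∧ B0) ∨ (A0 ∧ B1)'."""
    return " ∨ ".join("(" + " ∧ ".join(clause) + ")" for clause in clauses)


clauses = split_and_query([["A0", "A1"], ["B0", "B1"]])
query_text = render(clauses)
```

For the query "A∧B" this yields the four clauses of the example above. With L splits per keyword and k AND-ed items the split query has L^k clauses, so the query grows quickly as more items are combined.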

［ステップＳ４０４］検索要求部４４０は、生成した分割クエリを、共有集合に基づいて攪乱クエリに変換する。例えば検索要求部４４０は、分割クエリにおいて、検索対象の項目と分割キーワードが指定されている場合、まず検索対象の項目に対応する１以上の共有集合の中から、指定された分割キーワードに対応する要素を含む共有集合を特定する。次に検索要求部４４０は、特定した共有集合に対応する共有キーワードに、分割クエリ内の変換対象の分割キーワードを変換する。 [Step S404] The search request unit 440 converts the generated split query into a disturbance query based on the shared sets. For example, when the split query specifies a search target item and a split keyword, the search request unit 440 first identifies, among the one or more shared sets corresponding to the search target item, the shared set that contains the element corresponding to the specified split keyword. The search request unit 440 then converts that split keyword in the split query into the shared keyword corresponding to the identified shared set.

［ステップＳ４０５］検索要求部４４０は、ステップＳ４０４で変換された後の共有キーワードを暗号化して秘匿化検索クエリを生成する。検索要求部４４０は、生成した秘匿化検索クエリをデータ管理サーバ１００に送信する。 [Step S405] The search request unit 440 encrypts the shared keyword converted in step S404 to generate an anonymized search query. The search request unit 440 transmits the generated anonymous search query to the data management server 100 .

［ステップＳ４０６］検索要求部４４０は、データ管理サーバ１００から秘匿化検索の検索結果（秘匿化検索結果）を取得する。 [Step S406] The search request unit 440 acquires the result of the anonymized search (the anonymized search result) from the data management server 100.
［ステップＳ４０７］検索要求部４４０は、秘匿化検索結果に含まれる項目値（暗号文）を予めデータ登録サーバ２００から取得した暗号鍵を用いて復号し、平文の共有キーワードを含む攪乱検索結果を生成する。 [Step S407] The search request unit 440 decrypts the item values (ciphertext) contained in the anonymized search result using the encryption key obtained in advance from the data registration server 200, and generates a disturbed search result containing plaintext shared keywords.

［ステップＳ４０８］検索要求部４４０は、攪乱検索結果から、分割クエリに含まれていた分割キーワードに基づいて変換された共有キーワードを含むレコードを検索する。例えば検索要求部４４０は、復号して得られた共有キーワードを、その共有キーワードに対応する共有集合の要素である分割キーワードに分割する。検索要求部４４０は、分割して得られた分割キーワードのうち、フラグ値に示される要素番号に対応する分割キーワードを抽出する。検索要求部４４０は、抽出した分割キーワードが、分割クエリに示される分割キーワードのいずれかと一致するか否かを判断する。検索要求部４４０は、一致した場合、その分割キーワードを含むレコードを、真の検索結果として抽出する。また検索要求部４４０は、抽出した分割キーワードが、分割クエリに示される分割キーワードのいずれとも一致しない場合、抽出した分割キーワードの抽出元のレコードを削除する。 [Step S408] The search request unit 440 searches for a record containing a shared keyword converted based on the split keyword included in the split query from the disturbed search results. For example, the search request unit 440 divides the decrypted shared keyword into divided keywords that are elements of the shared set corresponding to the shared keyword. The search request unit 440 extracts the split keywords corresponding to the element numbers indicated by the flag values from among the split keywords obtained by splitting. The search request unit 440 determines whether the extracted split keyword matches any of the split keywords indicated in the split query. If they match, the search request unit 440 extracts the record containing the split keyword as a true search result. If the extracted split keyword does not match any of the split keywords indicated in the split query, the search request unit 440 deletes the record from which the extracted split keyword is extracted.

［ステップＳ４０９］検索要求部４４０は、真の検索結果として抽出したレコードの共有キーワードそれぞれを、ＤＢ２１０に設定されていたキーワードに復元する。例えば検索要求部４４０は、各共有キーワードを、その共有キーワードに対応する共有集合内の、フラグ値で示される要素番号の分割キーワードに変換する。そして検索要求部４４０は、分割キーワードを、分割キーワード一覧４２３においてその分割キーワードに対応するキーワードに変換する。 [Step S409] The search request unit 440 restores each shared keyword of the records extracted as true search results to the keyword that was set in the DB 210. For example, the search request unit 440 converts each shared keyword into the split keyword that has the element number indicated by the flag value within the shared set corresponding to that shared keyword. The search request unit 440 then converts the split keyword into the keyword associated with it in the split keyword list 423.
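Steps S408 and S409 can be sketched in Python as follows, restricted to a single item for brevity. The function name `filter_and_restore` and the record layout are hypothetical, the records are assumed to be already decrypted, shared keywords are assumed to be comma-joined element lists, and element numbers are assumed to be 0-based.

```python
def filter_and_restore(records, split_query, split_to_keyword):
    """Keep records whose true split keyword is in the split query and restore the keyword.

    records: decrypted records, each a dict with 'shared' (comma-joined shared
    keyword) and 'element' (element number taken from the flag value).
    """
    results = []
    for rec in records:
        members = rec["shared"].split(",")   # split the shared keyword into its elements
        split = members[rec["element"]]      # the flag value selects the true split keyword
        if split in split_query:             # step S408: keep only true hits
            results.append(split_to_keyword[split])  # step S409: restore the original keyword
    return results


SPLIT_TO_KEYWORD = {"elderly0": "elderly", "elderly1": "elderly",
                    "child0": "child", "adult0": "adult"}
decrypted = [
    {"shared": "elderly0,child0", "element": 0},  # true keyword: elderly
    {"shared": "elderly0,child0", "element": 1},  # true keyword: child (discarded)
    {"shared": "elderly1,adult0", "element": 0},  # true keyword: elderly
    {"shared": "elderly1,adult0", "element": 1},  # true keyword: adult (discarded)
]
hits = filter_and_restore(decrypted, {"elderly0", "elderly1"}, SPLIT_TO_KEYWORD)
```

Of the four records returned for the two shared keywords, only the two whose flag points at an "elderly" split keyword survive, so `hits` is `["elderly", "elderly"]`.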

［ステップＳ４１０］検索要求部４４０は、平文の検索結果を出力する。 [Step S410] The search request unit 440 outputs the plaintext search results.
このようにして、端末装置４００を用いて秘匿化ＤＢ１１０に対する検索結果を得ることができる。この際、データ管理サーバ１００に対して送信される秘匿化検索クエリでは共有キーワードが指定されている。そのため、検索条件として入力された検索キーワードの平文のＤＢ２１０内での出現頻度と、秘匿化検索クエリにヒットするレコード数（秘匿化ＤＢ１１０内での共有キーワードの出現頻度）とは異なり、攻撃者３３による頻度分析攻撃が困難となっている。 In this way, search results for the anonymization DB 110 can be obtained using the terminal device 400. The anonymized search query sent to the data management server 100 specifies shared keywords. Consequently, the frequency with which the search keyword entered as the search condition appears in the plaintext DB 210 differs from the number of records hit by the anonymized search query (the frequency of the shared keyword in the anonymization DB 110), which makes a frequency analysis attack by the attacker 33 difficult.

図５３は、共有キーワードを用いた場合の頻度分析攻撃の困難性を示す図である。例えばＤＢ２１０において、「小児科」の出現頻度は「２７」、「婦人科」の出現頻度は「８４」、「内科」の出現頻度は「９５」であるものとする。「小児科」は１９個の分割キーワード「小児科０」と８個の分割キーワード「小児科１」とに確率的に変換されている。「婦人科」は５０個の分割キーワード「婦人科０」と３４個の分割キーワード「婦人科１」とに確率的に変換されている。「内科」は５２個の分割キーワード「内科０」と４３個の分割キーワード「内科１」とに確率的に変換されている。 FIG. 53 is a diagram showing the difficulty of frequency analysis attacks when using shared keywords. For example, in the DB 210, the appearance frequency of "pediatrics" is "27", the appearance frequency of "gynecology" is "84", and the appearance frequency of "internal medicine" is "95". "Pediatrics" is stochastically converted into 19 split keywords "Pediatrics 0" and 8 split keywords "Pediatrics 1". "Gynecology" is stochastically converted into 50 split keywords "Gynecology 0" and 34 split keywords "Gynecology 1". "Internal medicine" is stochastically converted into 52 divided keywords "internal medicine 0" and 43 divided keywords "internal medicine 1".

分割キーワード数は６個である。分割キーワードを２個ずつ含む共有集合が３個生成され、その共有集合に基づいて共有キーワードが生成され、秘匿化ＤＢ１１０に暗号化された共有キーワードが登録されている。秘匿化ＤＢ１１０では、共有キーワード「小児科０，内科０」の出現頻度が「７１」、共有キーワード「小児科１，婦人科０」の出現頻度が「５８」、共有キーワード「婦人科１，内科１」の出現頻度が「７７」となっている。 The number of split keywords is six. Three shared sets, each containing two split keywords, are generated; shared keywords are generated from those shared sets; and the encrypted shared keywords are registered in the anonymization DB 110. In the anonymization DB 110, the appearance frequency of the shared keyword "pediatrics 0, internal medicine 0" is "71", that of the shared keyword "pediatrics 1, gynecology 0" is "58", and that of the shared keyword "gynecology 1, internal medicine 1" is "77".
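The frequencies in this example can be checked by summing the split keyword counts of each shared set. The short sketch below only reproduces the arithmetic of FIG. 53 as described in the text; the dictionary keys and variable names are illustrative.

```python
# Split keyword counts from the FIG. 53 example.
split_counts = {
    "pediatrics0": 19, "pediatrics1": 8,
    "gynecology0": 50, "gynecology1": 34,
    "internal0": 52, "internal1": 43,
}

# The three shared sets, each holding two split keywords.
shared_sets = [
    ("pediatrics0", "internal0"),
    ("pediatrics1", "gynecology0"),
    ("gynecology1", "internal1"),
]

# A shared keyword is hit by every record holding any of its split keywords,
# so its frequency is the sum of the member counts.
shared_freq = {
    ",".join(members): sum(split_counts[m] for m in members)
    for members in shared_sets
}

# Plaintext frequencies, for comparison with the shared keyword frequencies.
plain_freq = {
    "pediatrics": split_counts["pediatrics0"] + split_counts["pediatrics1"],
    "gynecology": split_counts["gynecology0"] + split_counts["gynecology1"],
    "internal": split_counts["internal0"] + split_counts["internal1"],
}
```

The plaintext frequencies (27, 84, 95) and the shared keyword frequencies (71, 58, 77) have no value in common, which is what prevents a direct frequency match.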

攻撃者３３は、キーワード数が３個であり、共有集合数が３個であり、各キーワードが２つずつに分割され共有集合内の分割キーワード数が２個であることを知っているものとする。しかし各キーワードがどのような比率で分割キーワードに分割されたのかは攻撃者３３には分からない。そのため攻撃者３３が各共有キーワードの出現頻度がそれぞれ「７１」、「５８」、「７７」であることを知ったとしても、その共有キーワードに含まれる分割キーワードを絞り込むことはできない。そのため頻度分析攻撃は困難である。 Assume that the attacker 33 knows that there are three keywords and three shared sets, that each keyword is split into two split keywords, and that each shared set contains two split keywords. The attacker 33 does not know, however, the ratio at which each keyword was split into its split keywords. Therefore, even if the attacker 33 learns that the appearance frequencies of the shared keywords are "71", "58", and "77", the attacker cannot narrow down which split keywords each shared keyword contains. This makes a frequency analysis attack difficult.

〔第４の実施の形態〕 [Fourth Embodiment]

次に第４の実施の形態について説明する。第４の実施の形態は、第３の実施の形態を改良し、数値範囲の検索に対する頻度分析攻撃の困難性を向上させたものである。 Next, a fourth embodiment will be described. The fourth embodiment improves on the third embodiment by making frequency analysis attacks against numerical range searches more difficult.

データ管理サーバ１００は、データを暗号化したままで検索を行う。暗号化したままでの検索は、原則として完全一致検索である。完全一致検索の場合、数値の大小関係を比較することはできない。そこである範囲内の数値を有するレコードの検索を行う場合、端末装置４００は、その範囲内に存在し得るすべての数値の論理和を検索キーワードとする。 The data management server 100 performs searches while the data remains encrypted. A search over encrypted data is, in principle, an exact match search, and an exact match search cannot compare the magnitudes of numerical values. Therefore, when searching for records whose values fall within a certain range, the terminal device 400 uses the logical sum (OR) of all numerical values that can exist within that range as the search keyword.

図５４は、数値範囲検索の一例を示す図である。データ登録サーバ２００のＤＢ２１０には、患者に関するレコードが登録されており、各レコードは診療科、性別、年齢の項目値を有している。各項目値は確率的に分割キーワードに変換され、さらに共有キーワードに変換されて秘匿化ＤＢ１１０に登録される。分割キーワードは、各年齢の数値の後に「＿０」、「＿１」を追加した文字列であるものとする。例えば年齢「１８」の分割キーワードは「１８＿０」と「１８＿１」となる。 FIG. 54 is a diagram showing an example of numerical range search. Records relating to patients are registered in the DB 210 of the data registration server 200, and each record has item values of clinical department, sex, and age. Each item value is probabilistically converted into a split keyword, further converted into a shared keyword, and registered in the anonymization DB 110 . It is assumed that the segmented keyword is a character string in which "_0" and "_1" are added after each numerical value of age. For example, the divided keywords for age "18" are "18_0" and "18_1".

端末装置４００は、年齢の範囲を指定した検索条件６０１が入力されると、その範囲内の全年齢の分割キーワードの論理和を示す分割クエリ６０２を生成する。端末装置４００は、分割クエリ６０２内の分割キーワードを共有キーワードに変換することで攪乱クエリ６０３を生成し、共有キーワードを暗号化することで秘匿化検索クエリ６０４を生成する。そして端末装置４００が秘匿化検索クエリ６０４をデータ管理サーバ１００に送信すると、データ管理サーバ１００が秘匿化ＤＢ１１０の検索を行う。 When a search condition 601 specifying an age range is input, the terminal device 400 generates a split query 602 indicating the logical sum of split keywords for all ages within that range. The terminal device 400 generates a disturbed query 603 by converting a divided keyword in the divided query 602 into a shared keyword, and generates an anonymized search query 604 by encrypting the shared keyword. When the terminal device 400 transmits the anonymized search query 604 to the data management server 100 , the data management server 100 searches the anonymized DB 110 .
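The query expansion described above can be sketched as follows. This is an illustration under stated assumptions: the function names `build_split_query` and `build_disturbed_query` are hypothetical, the OR condition is modeled as a plain list of terms, and the final encryption step that yields the anonymized search query 604 is omitted. The small mapping used in the example follows the comparative example of FIG. 55 as described in the text.

```python
def build_split_query(low, high, n_splits=2):
    """Expand an inclusive age range into the OR-list of split keywords.

    Split keywords use the "<age>_<index>" form from the text, e.g. "18_0".
    """
    return [f"{age}_{i}" for age in range(low, high + 1) for i in range(n_splits)]


def build_disturbed_query(split_query, split_to_shared):
    """Replace each split keyword by its shared keyword, dropping duplicates."""
    seen = set()
    shared_terms = []
    for split_keyword in split_query:
        shared = split_to_shared[split_keyword]
        if shared not in seen:  # two split keywords may share one shared keyword
            seen.add(shared)
            shared_terms.append(shared)
    return shared_terms


# Part of the split-to-shared mapping from the FIG. 55 comparative example.
split_to_shared = {
    "0_0": "0_0,100_0",
    "0_1": "0_1,1_0",
    "1_0": "0_1,1_0",
    "1_1": "1_1,2_0",
}

split_query = build_split_query(0, 1)
disturbed_query = build_disturbed_query(split_query, split_to_shared)
```

For the range "0 to 1" the sketch yields the three shared keywords "0_0,100_0", "0_1,1_0", and "1_1,2_0"; each would then be encrypted before being sent to the data management server 100.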

図５５は、年齢の共有集合の生成例（比較例）を示す図である。年齢のキーワードリスト６１１には、「０～１００」の１０１個の整数が設定されている。各キーワードは、２個ずつの分割キーワードを有する。したがって分割キーワード数は２０２となる。要素を２個（Ｇ＝２）ずつ含む共有集合を生成した場合、１０１個の共有集合１－１～１－３，・・・，１－５１～１－５３，・・・，１－９９～１－１０１が生成される。図５５の例では、第３の実施の形態において図４３に示した手順で共有集合１－１～１－３，・・・，１－５１～１－５３，・・・，１－９９～１－１０１が生成されている。 FIG. 55 is a diagram illustrating an example (comparative example) of generating shared sets for age. The age keyword list 611 contains the 101 integers "0 to 100". Each keyword has two split keywords, so the number of split keywords is 202. When shared sets containing two elements each (G = 2) are generated, 101 shared sets 1-1 to 1-3, ..., 1-51 to 1-53, ..., 1-99 to 1-101 are generated. In the example of FIG. 55, the shared sets 1-1 to 1-3, ..., 1-51 to 1-53, ..., 1-99 to 1-101 are generated by the procedure shown in FIG. 43 for the third embodiment.

共有集合１－１～１－３，・・・，１－５１～１－５３，・・・，１－９９～１－１０１それぞれに要素として含まれる分割キーワードは、その共有集合に対応する共有キーワードに変換される。例えばキーワード「０」の分割キーワード「０＿０」は共有キーワード「０＿０，１００＿０」に変換され、分割キーワード「０＿１」は共有キーワード「０＿１，１＿０」に変換される。共有キーワードが暗号化され、秘匿化ＤＢ１１０に格納される。そして暗号文の共有キーワードに対する秘匿化検索が行われる。 The split keywords included as elements in each of the shared sets 1-1 to 1-3, ..., 1-51 to 1-53, ..., 1-99 to 1-101 are converted into the shared keyword corresponding to that shared set. For example, the split keyword "0_0" of the keyword "0" is converted into the shared keyword "0_0,100_0", and the split keyword "0_1" is converted into the shared keyword "0_1,1_0". The shared keywords are encrypted and stored in the anonymization DB 110. Anonymized searches are then performed on the encrypted shared keywords.

年齢を指定した検索では、数値範囲を指定した検索が可能となる。数値範囲の検索は、図５４に示すように、該当範囲内の数値の論理和検索に置き換えられる。すると攻撃者３３は、発行される秘匿化検索クエリの論理和の項が多いことにより、数値の項目に対する検索であることが推定できる。攻撃者３３は、秘匿化検索クエリに同時に含まれる暗号文は連番であると仮定することで、共有キーワードの暗号文の前後関係を特定できる。 A search on age can specify a numerical range. As shown in FIG. 54, a numerical range search is replaced by a logical sum (OR) search over the numerical values within the range. The attacker 33 can then infer, from the large number of OR terms in the issued anonymized search query, that the search targets a numerical item. By assuming that the ciphertexts appearing together in one anonymized search query correspond to consecutive numbers, the attacker 33 can determine the ordering relationship among the ciphertexts of the shared keywords.

例えば「０～１」の数値範囲の検索が行われると、共有キーワード「０＿０，１００＿０」、「０＿１，１＿０」、「１＿１，２＿０」のいずれかの暗号文を有するレコードがヒットする。攻撃者３３は、これらのレコードに含まれる共有キーワードの変換元の分割キーワードには連番の数値が含まれると推定できる。そこで攻撃者３３は、共有キーワード「０＿０，１００＿０」、「０＿１，１＿０」、「１＿１，２＿０」の暗号文が連続するように並べる。攻撃者３３は、様々な数値範囲の検索が行われるごとに、ヒットしたレコードに含まれる共有キーワードの暗号文が連続するように暗号文を並べていく。攻撃者３３は、最終的には、図５５の共有キーワードの並びと同じ順に、共有キーワードの暗号文を並べることができる。 For example, when a search is performed in the numerical range of "0 to 1", a record having a ciphertext of one of the shared keywords "0_0, 100_0", "0_1, 1_0", and "1_1, 2_0" is hit. The attacker 33 can presume that the split keywords that are the conversion sources of the shared keywords included in these records include serial numbers. Therefore, the attacker 33 arranges the ciphertexts of the shared keywords "0_0, 100_0", "0_1, 1_0", and "1_1, 2_0" so as to be continuous. The attacker 33 arranges the ciphertexts so that the ciphertexts of the shared keywords included in the hit records are consecutive each time the search is performed in various numerical ranges. The attacker 33 can finally arrange the ciphertexts of the shared keywords in the same order as the arrangement of the shared keywords in FIG.

なお図５５の例では、分割キーワード「１００＿０」を除き、上位の共有集合ほど、小さい数値のキーワードの分割キーワードが含まれている。このとき攻撃者３３は共有キーワード間の連続関係が分かるのみであり、共有キーワードの暗号文を並べても、その配列の先頭と後方のうちのどちらの数値が小さく、どちらの数値が大きいのかは判別できない。 In the example of FIG. 55, except for the split keyword "100_0", the higher a shared set is ranked, the smaller the numerical keywords whose split keywords it contains. Here the attacker 33 learns only the adjacency relationship between shared keywords; even after ordering the ciphertexts of the shared keywords, the attacker cannot tell which end of the sequence corresponds to the smaller values and which to the larger values.

しかし共有キーワードの暗号文の中盤の値を含むレコードが検索された場合、攻撃者３３は、全体の数値範囲「０～１００」の内の中盤の数値が検索されたことを認識できる。すなわち検索条件の絞り込みが可能となってしまう。 However, if a record containing the middle value of the ciphertext of the shared keyword is retrieved, the attacker 33 can recognize that the middle numerical value within the entire numerical range "0 to 100" has been retrieved. That is, it becomes possible to narrow down the search conditions.

そこで第４の実施の形態では、データ登録サーバ２００のデータ登録要求部２５０は、キーワードリストに数値が設定されている場合、ある数値範囲の連続する数値を含む共有集合に、範囲が重複しない他の数値範囲の連続する数値範囲が含まれるように共有集合を生成する。 Therefore, in the fourth embodiment, when numerical values are set in the keyword list, the data registration requesting unit 250 of the data registration server 200 generates the shared sets so that the shared sets containing consecutive numerical values of one numerical range also contain consecutive numerical values of another, non-overlapping numerical range.

図５６は、年齢の共有集合の生成例を示す図である。図５６には、要素を２個（Ｇ＝２）ずつ含む１０２個の共有集合２－１～２－６，・・・，２－９７～２－１０２が示されている。 FIG. 56 is a diagram illustrating an example of generating shared sets for age. FIG. 56 shows 102 shared sets 2-1 to 2-6, ..., 2-97 to 2-102, each containing two elements (G = 2).

データ登録要求部２５０は、例えばキーワードリスト６１１から値の小さい順に数値を選択する。データ登録要求部２５０は、選択した数値の分割キーワードを最上位の共有集合２－１から順に下位に向かって１つずつ設定する。２つ目以降に選択した数値の分割キーワードは、前に選択した数値を設定した共有集合の次の共有集合に設定する。 The data registration requesting unit 250 selects numerical values from the keyword list 611 in ascending order, for example. The data registration requesting unit 250 sets the split keywords of the selected values one at a time, starting from the highest shared set 2-1 and proceeding downward. The split keywords of the second and subsequent selected values are set starting from the shared set that follows the one in which the previously selected value was set.

データ登録要求部２５０は、すべての共有集合２－１～２－１０２に１つずつの分割キーワードが設定されると、各共有集合に対する２つ目の要素を設定する。ただし最上位の共有集合２－１と最下位の共有集合２－１０２は、２つ目の分割キーワードの設定対象外とされる。そこでデータ登録要求部２５０は、選択した数値の分割キーワードを下位から２番目の共有集合２－１０１から順に上位に向かって１つずつ設定する。 The data registration requesting unit 250 sets a second element for each shared set when one divided keyword is set for each of the shared sets 2-1 to 2-102. However, the highest shared set 2-1 and the lowest shared set 2-102 are excluded from setting of the second divided keyword. Therefore, the data registration requesting unit 250 sequentially sets the selected numeric divided keywords one by one from the second lowest shared set 2-101 to the upper ones.

このように共有集合を生成することで、ある数値範囲の連続する数値を含む共有集合には、範囲が重複しない別の数値範囲の連続する数値も含まれることとなる。その結果、攻撃者３３が、数値の連続性に基づいて共有集合２－１～２－１０２に対応する共有キーワードの暗号文を並べることができたとしても、ある暗号文を含むレコードが検索されたときの検索された数値範囲の絞り込みは困難となる。例えば攻撃者３３は、「０～２」付近の検索、「９８～１００」付近の検索、「４８～５０」付近の検索、「５１～５３」付近の検索、「４８～５３」付近の検索を見分けることが困難である。 By generating the shared sets in this way, the shared sets containing consecutive numerical values of one numerical range also contain consecutive numerical values of another, non-overlapping numerical range. As a result, even if the attacker 33 manages to order the ciphertexts of the shared keywords corresponding to the shared sets 2-1 to 2-102 based on numerical continuity, it is difficult to narrow down the numerical range that was searched when a record containing a given ciphertext is retrieved. For example, it is difficult for the attacker 33 to distinguish among a search around "0 to 2", a search around "98 to 100", a search around "48 to 50", a search around "51 to 53", and a search around "48 to 53".
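The placement order described for FIG. 56 amounts to a zigzag walk over the shared sets: downward from the highest set, then upward from the second-lowest set. The sketch below is one way to express that walk; it is an illustration, not the implementation of the embodiment. The names `zigzag` and `build_shared_sets` are hypothetical, the number of shared sets (102) is taken from the text as a parameter rather than derived, and the exact layout of FIG. 56 is not reproduced.

```python
def zigzag(num_sets, count):
    """Yield `count` set indices 0..N-1, N-2..0, 1..N-1, ... in zigzag order.

    The set at which the walk turns around is not visited twice in a row.
    """
    idx, step = 0, 1
    for _ in range(count):
        yield idx
        if num_sets > 1:
            if not 0 <= idx + step < num_sets:
                step = -step  # turn around at either end
            idx += step


def build_shared_sets(values, n_splits, num_sets):
    """Distribute split keywords "<value>_<i>" over shared sets in zigzag order."""
    # Values are taken in ascending order, as in the text.
    split_keywords = [f"{v}_{i}" for v in sorted(values) for i in range(n_splits)]
    shared_sets = [[] for _ in range(num_sets)]
    for set_index, keyword in zip(zigzag(num_sets, len(split_keywords)), split_keywords):
        shared_sets[set_index].append(keyword)
    return shared_sets


# Ages 0..100, two split keywords each, 102 shared sets (G = 2).
sets_g2 = build_shared_sets(range(101), n_splits=2, num_sets=102)
# The same walk with 68 shared sets gives up to three elements per set (G = 3).
sets_g3 = build_shared_sets(range(101), n_splits=2, num_sets=68)
```

Under these assumptions the second set holds "0_1" together with "100_1", so the sets near the top pair ages around 0 with ages around 100, while the sets near the bottom pair ages around 50 with ages around 51. Running the same walk with 68 sets continues into a third pass and yields sets of up to three elements.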

群数Ｇ（共有集合の最大要素数）を増やすと、検索にヒットする件数が他の共有キーワードの影響で増加する。年齢のキーワード数が多い項目は、各キーワードの頻度がキーワード数に応じた減少が見込まれる。そこで項目値が数値の場合における該当項目に対応する群数を増やしてもよい。 Increasing the number of groups G (the maximum number of elements in a shared set) increases the number of search hits, because of the other split keywords that share the same shared keyword. For an item with many keywords, such as age, the frequency of each keyword is expected to decrease as the number of keywords grows. Therefore, when the item values are numerical, the number of groups for that item may be increased.

図５７は、群数を３とした場合の共有集合の生成例を示す図である。図５７には、要素を３個（Ｇ＝３）ずつ含む６８個の共有集合３－１～３－６，・・・，３－６３～３－６８が示されている。 FIG. 57 is a diagram illustrating an example of generating shared sets when the number of groups is three. FIG. 57 shows 68 shared sets 3-1 to 3-6, ..., 3-63 to 3-68, each containing three elements (G = 3).

データ登録要求部２５０は、例えばキーワードリスト６１１から値の小さい順に数値を選択する。データ登録要求部２５０は、選択した数値の分割キーワードを最上位の共有集合３－１から順に下位に向かって１つずつ設定する。データ登録要求部２５０は、すべての共有集合３－１～３－６，・・・，３－６３～３－６８に１つずつの分割キーワードが設定されると、各共有集合に対する２つ目の要素を設定する。ただし最下位の共有集合３－６８は、２つ目の分割キーワードの設定対象外とされる。そこでデータ登録要求部２５０は、選択した数値の分割キーワードを下位から２番目の共有集合３－６７から順に上位に向かって１つずつ設定する。データ登録要求部２５０は、すべての共有集合３－１～３－６，・・・，３－６３～３－６８に２つずつの分割キーワードが設定されると、各共有集合に対する３つ目の要素を設定する。ただし最上位の共有集合３－１は、３つ目の分割キーワードの設定対象外とされる。そこでデータ登録要求部２５０は、選択した数値の分割キーワードを上位から２番目の共有集合３－２から順に下位に向かって１つずつ設定する。 The data registration requesting unit 250 selects numerical values from the keyword list 611 in ascending order, for example. The data registration requesting unit 250 sets the split keywords of the selected values one at a time, starting from the highest shared set 3-1 and proceeding downward. Once one split keyword has been set in every one of the shared sets 3-1 to 3-6, ..., 3-63 to 3-68, the data registration requesting unit 250 sets a second element in each shared set. The lowest shared set 3-68, however, is excluded from receiving a second split keyword. The data registration requesting unit 250 therefore sets the split keywords of the selected values one at a time, starting from the second-lowest shared set 3-67 and proceeding upward. Once two split keywords have been set in the shared sets 3-1 to 3-6, ..., 3-63 to 3-68, the data registration requesting unit 250 sets a third element in each shared set. The highest shared set 3-1, however, is excluded from receiving a third split keyword. The data registration requesting unit 250 therefore sets the split keywords of the selected values one at a time, starting from the second-highest shared set 3-2 and proceeding downward.
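As a consistency check on the set counts given for FIG. 56 and FIG. 57, the number of slots filled in each pass can be added up. This only restates the numbers in the text (101 keywords with two split keywords each, 102 sets for G = 2, 68 sets for G = 3) together with the exclusions described above.

```latex
\begin{align*}
\text{split keywords:}\quad & 101 \times 2 = 202 \\
G = 2:\quad & \underbrace{102}_{\text{sets 2-1 to 2-102}} + \underbrace{100}_{\text{sets 2-101 to 2-2}} = 202 \\
G = 3:\quad & \underbrace{68}_{\text{sets 3-1 to 3-68}} + \underbrace{67}_{\text{sets 3-67 to 3-1}} + \underbrace{67}_{\text{sets 3-2 to 3-68}} = 202
\end{align*}
```

In both cases every split keyword receives exactly one slot, which matches the statement that two shared sets hold one element fewer than G.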

このように群数Ｇを増やすことで、数値範囲の検索が行われても、ヒットしたレコードに含まれる数値範囲の数が多数となり、検索対象の絞り込みがより困難となる。 By increasing the number of groups G in this way, even when a numerical range search is performed, the hit records cover a larger number of numerical ranges, making it more difficult to narrow down the search target.

〔その他の実施の形態〕 [Other Embodiments]

第２～第４の実施の形態では、病院が有するデータに対する秘匿検索の例を示したが、他の分野でも利用可能である。 The second to fourth embodiments show examples of confidential search over data held by a hospital, but the technique can also be used in other fields.

また第２～第４の実施の形態ではデータ登録サーバ２００，３００とデータ管理サーバ１００とを分けているが、データ登録サーバ２００，３００がデータ管理サーバ１００の機能を有していてもよい。 In addition, although the data registration servers 200 and 300 and the data management server 100 are separated in the second to fourth embodiments, the data registration servers 200 and 300 may have the functions of the data management server 100 .

以上、実施の形態を例示したが、実施の形態で示した各部の構成は同様の機能を有する他のものに置換することができる。また、他の任意の構成物や工程が付加されてもよい。さらに、前述した実施の形態のうちの任意の２以上の構成（特徴）を組み合わせたものであってもよい。 Although the embodiment has been exemplified above, the configuration of each part shown in the embodiment can be replaced with another one having the same function. Also, any other components or steps may be added. Furthermore, any two or more configurations (features) of the above-described embodiments may be combined.

１データ登録装置 1 data registration device
１ａ記憶部 1a storage unit
１ｂ処理部 1b processing unit
２サーバ 2 server
２ａ秘匿化ＤＢ 2a anonymization DB
３データ利用装置 3 data utilization device
４秘匿レコード群 4 secret record group
５ａ～５ｃ項目値集合 5a-5c item value sets
６攪乱レコード群 6 disturbance record group

Claims

A confidential information management program that causes a computer to execute processing comprising:
associating n second item values (n is a natural number) with each of a plurality of first item values that can be set in one item of a confidential record in a confidential record group including one or more confidential records to be kept confidential;
classifying each of the second item values into one of a plurality of item value sets such that each of the plurality of item value sets includes any number from 1 to n-1 of the two or more second item values associated with the same first item value;
probabilistically converting the first item value set in the one item of the confidential record into one of the associated second item values;
generating, based on the confidential record group and by adding dummy second item values, a disturbance record group in which the number of search hits is equalized across the second item values belonging to the same item value set;
giving each record in the disturbance record group a flag indicating whether the second item value included in the record is true or false; and
encrypting the disturbance record group.

In the conversion of the first item value to the second item value, the selection probability of the second item value is randomly determined, and among the second item values associated with the first item value, selecting one with a probability according to the selection probability and converting the first item value to the selected second item value;
The secret information management program according to claim 1.

In the classification of the second item values, two or more of the item value sets are prevented from having the same combination of the first item values corresponding to the second item values belonging to them.
3. The confidential information management program according to claim 1 or 2.

In the generation of the disturbance record group,
generating, according to a bijective relationship in which each of the second item values belonging to an item value set is mapped to a different second item value belonging to the same item value set, a dummy value corresponding to the second item value set in the one item of each of the confidential records,
generating a dummy record group having the same number of dummy records as the confidential records, each dummy record having the dummy value set in the one item;
generating the disturbance record group including the confidential record group and the dummy record group;
4. The confidential information management program according to any one of claims 1 to 3.

In the generation of the dummy record group, when there are a plurality of the dummy record groups, the second item value set in the one item of the confidential record is converted to a different second item value for each dummy record group. using the bijective relationship that is different for each of the dummy record groups,
5. The confidential information management program according to claim 4.

In the giving of the flag, a first flag indicating true is given to the confidential record, and a second flag indicating false is given to the dummy record,
In the encryption, the second item value set in the one item of each of the confidential record and the dummy record, the first flag given to the confidential record, and the second flag given to the dummy record are encrypted,
6. The confidential information management program according to claim 4 or 5.

In assigning the second flag, the second flag having a value different for each of the dummy record groups generated one or more and different from the first flag is set to the dummy records included in each of the dummy record groups. ,
7. The confidential information management program according to claim 6.

In the setting of the first flag, the first flag including the identifier of the confidential record and the value indicating the group number of the confidential record group is given to the confidential record,
In setting the second flag, the second flag including the identifier of the dummy record and the group number of the dummy record group to which it belongs is set in the dummy record.
The secret information management program according to claim 7.

In the generation of the disturbance record group,
generating a disturbance record by converting the pre-conversion second item value set in the one item of the confidential record into a third item value that is associated with the item value set to which the pre-conversion second item value belongs and that is hit by a search for any of the second item values belonging to the same item value set, and generating the disturbance record group including the disturbance record;
4. The confidential information management program according to any one of claims 1 to 3.

In the addition of the flag, a third flag including information indicating the second item value before conversion is added to the disturbance record;
In the encryption, encrypting the third item value set in the one item of the disturbance record and the given third flag;
10. The confidential information management program according to claim 9.

In the addition of the third flag, the third flag containing the identifier of the confidential record and the element number in the item value set of the element corresponding to the second item value before conversion is added to the confidential record Give,
11. The confidential information management program according to claim 10.

In the classification, when the plurality of first item values are numerical values, the second item values are classified into the plurality of item value sets such that the item value sets including the second item values corresponding to consecutive numerical values within a first numerical range also include the second item values corresponding to consecutive numerical values within a second numerical range that does not overlap the first numerical range;
12. The confidential information management program according to any one of claims 1 to 11.

A confidential information management method in which a computer executes processing comprising:
associating n second item values (n is a natural number) with each of a plurality of first item values that can be set in one item of a confidential record in a confidential record group including one or more confidential records to be kept confidential;
classifying each of the second item values into one of a plurality of item value sets such that each of the plurality of item value sets includes any number from 1 to n-1 of the two or more second item values associated with the same first item value;
probabilistically converting the first item value set in the one item of the confidential record into one of the associated second item values;
generating, based on the confidential record group and by adding dummy second item values, a disturbance record group in which the number of search hits is equalized across the second item values belonging to the same item value set;
giving each record in the disturbance record group a flag indicating whether the second item value included in the record is true or false; and
encrypting the disturbance record group.

A data registration device comprising:
a storage unit that stores a confidential record group including one or more confidential records to be kept confidential; and
a processing unit that
associates n second item values (n is a natural number) with each of a plurality of first item values that can be set in one item of a confidential record in the confidential record group,
classifies each of the second item values into one of a plurality of item value sets such that each of the plurality of item value sets includes any number from 1 to n-1 of the two or more second item values associated with the same first item value,
probabilistically converts the first item value set in the one item of the confidential record into one of the associated second item values,
generates, based on the confidential record group and by adding dummy second item values, a disturbance record group in which the number of search hits is equalized across the second item values belonging to the same item value set,
gives each record in the disturbance record group a flag indicating whether the second item value included in the record is true or false, and
encrypts the disturbance record group.

A confidential information management system comprising:
a server having a database;
a data registration device that
associates n second item values (n is a natural number) with each of a plurality of first item values that can be set in one item of a confidential record in a confidential record group including one or more confidential records to be kept confidential,
classifies each of the second item values into one of a plurality of item value sets such that each of the plurality of item value sets includes any number from 1 to n-1 of the two or more second item values associated with the same first item value,
probabilistically converts the first item value set in the one item of the confidential record into one of the associated second item values,
generates, based on the confidential record group and by adding dummy second item values, a disturbance record group in which the number of search hits is equalized across the second item values belonging to the same item value set,
gives each record in the disturbance record group a flag indicating whether the second item value included in the record is true or false,
encrypts the disturbance record group, and
stores the disturbance record group in the database; and
a data utilization device that
converts a search item value of the one item indicated in a search condition into one or more of the second item values corresponding to the first item value having the same value as the search item value,
transmits to the server a search query for searching, while still in ciphertext, for the second item value obtained by the conversion,
obtains from the server a detected record that hits the search query in the database, and
determines, based on the flag given to the detected record, whether the value in the detected record that hit the search for the second item value is true or false.

The data registration device
In the generation of the disturbance record group,
set in the one item of each of the confidential records according to a bijective relationship in which each of the second item values belonging to the item value set is bijected to a different second item value belonging to the same item value set; generating a dummy value corresponding to each of the second item values,
generating a dummy record group having the same number of dummy records as the confidential records, in which the dummy value is set in the one item;
generating the disturbance record group including the confidential record group and the dummy record group;
In the generation of the disturbance record group,
Giving a first flag indicating true to the confidential record,
Giving a second flag indicating false to the dummy record,
In said encryption,
encrypting the second item value set in the one item of each of the secret record and the dummy record, the first flag given to the secret record, and the second flag given to the dummy record; ,
storing the confidential record group and the dummy record group in the database of the server;
The data utilization device is
Sending to the server a search query containing the ciphertext of each of the second item values obtained by the conversion;
obtaining search results from the search query in the database from the server;
acquiring the confidential record satisfying the search condition from the search result based on the first flag or the second flag set in each of the confidential record and the dummy record included in the search result;
16. The confidential information management system according to claim 15.

The data registration device
In generating the disturbance record group, the second item value before conversion set in the one item in the confidential record is associated with an item value set to which the second item value before conversion belongs, generating a disturbance record by converting it into a third item value that hits a search for any of the second item values belonging to the same item value set, and generating the disturbance record group including the disturbance record;
The data utilization device is
converting each of the second item values obtained by the conversion into the third item value corresponding to the second item value, and transmitting a search query including encrypted text of the third item value to the server;
obtaining search results from the search query in the database from the server;
decoding the third item value and the third flag set in each of the disturbance records included in the search result;
determining the second item value from which the third item value is converted based on the third flag;
extracting from the search result the disturbance record storing the encrypted text of the second item value corresponding to the first item value satisfying the search condition;
16. The confidential information management system according to claim 15.