JP2012248940A

JP2012248940A - Data generation device, data generation method, data generation program and database system

Info

Publication number: JP2012248940A
Application number: JP2011116962A
Authority: JP
Inventors: Mamoru Kato; 守加藤; Hideya Shibata; 秀哉柴田
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2011-05-25
Filing date: 2011-05-25
Publication date: 2012-12-13

Abstract

PROBLEM TO BE SOLVED: To enhance security by making frequency analysis in secret search more difficult, even when the variation in the possible values of attribute data is limited.
SOLUTION: A number of dummy data items corresponding to the number of possible values of the input data is generated, and a predetermined probability is assigned to each generated dummy data item. A predetermined number of dummy data items is then selected from the generated dummy data so that each item is selected with its assigned probability. An input tag is generated by performing a predetermined calculation on the input data, and a dummy tag is generated by performing the same calculation on each selected dummy data item. The encrypted data of the input data is stored in association with the input tag and the dummy tags.

Description

The present invention relates to a secret search technique that enables data to be searched while it remains encrypted. In particular, it relates to a secret search technique that makes frequency analysis difficult and thereby improves security.

In recent years, a form of computer use called cloud computing has become widespread. In cloud computing, users can use computer processing such as data storage and management as a service over a network.
In this form of use, the service user who owns the data differs from the service provider who manages it. It is therefore becoming common to encrypt the data that a service user stores, so that the user's confidential information does not leak to the service provider. If a secret search service that can retrieve desired data while the data remains encrypted can be realized, user convenience improves greatly.

One way to search data while it remains encrypted is to use deterministic encryption or a hash. With deterministic encryption or a hash, the encrypted data for a given input is uniquely determined, so values can be compared while still encrypted, and both sequential search and index-based search can be realized.
Patent Document 1 discloses a method that speeds up narrowing during a search by using hash values associated with encrypted data. With such methods based on deterministic encryption or hash values, the content of the input data does not leak directly, but the appearance frequency of the data can still be analyzed, which lowers confidentiality.
To address this problem, Patent Document 2 discloses a method that disturbs the appearance frequency of data by mixing dummy data into the hash values.
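The frequency leak described above can be illustrated with a minimal sketch. The column contents and the choice of SHA-256 are assumptions made for illustration only; the point is that a deterministic tag preserves the frequency distribution of the plaintext values.

```python
import hashlib
from collections import Counter


def tag(value: str) -> str:
    """Deterministic tag: the same plaintext always yields the same tag."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical attribute column with a skewed distribution.
prefectures = ["Tokyo"] * 5 + ["Kanagawa"] * 3 + ["Chiba"] * 1

# Frequency profile of the plaintext values and of their tags.
plain_counts = sorted(Counter(prefectures).values(), reverse=True)
tag_counts = sorted(Counter(tag(p) for p in prefectures).values(), reverse=True)

print(plain_counts, tag_counts)
```

Both profiles are identical, so an observer who knows the rough popularity of each value could guess which tag stands for which value without inverting the hash.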

Patent Document 1: JP 2005-242740 A
Patent Document 2: JP 2005-72917 A

Patent Document 2 aims to disturb the appearance frequency of search-target character strings, such as words, in a full-text search system. It replaces character strings that are not used for searching (such as punctuation marks) with character strings whose bias approximates the appearance frequency of the character strings in the input document. The method of Patent Document 2 therefore assumes that the input data can take many different values and that an unspecified number of data items is associated with one record (one document).

Some databases hold attribute data (for example, the prefecture or municipality name in an address) whose possible values have limited variation and where each record takes exactly one value. Applying the invention described in Patent Document 2 to such a database raises the following problems (1) and (2).
(1) If the appearance frequency of the added data (dummy data) is matched to the appearance frequency of the original data, frequency analysis remains possible because the variation in possible values is limited, and confidentiality is reduced.
(2) There is no data in the input that is unused for searching and could be replaced with dummy data, so the dummy data must be newly added.

An object of the present invention is to make frequency analysis in secret search more difficult and to enhance security, even when the variation in the values that attribute data can take is limited.

The data generation device according to the present invention comprises:
an input unit that inputs input data;
an encryption unit that encrypts the input data input by the input unit to generate encrypted data;
an element number measurement unit that measures, as the number of elements, the number of values that the input data can take;
a dummy data generation unit that generates a number of dummy data items according to the number of elements measured by the element number measurement unit, and assigns a predetermined probability to each generated dummy data item;
a dummy data selection unit that selects a predetermined number of dummy data items from the dummy data generated by the dummy data generation unit, so that each item is selected with the probability assigned by the dummy data generation unit;
a registration tag generation unit that generates an input tag by performing a predetermined calculation on data included in the input data, and generates a dummy tag by performing the predetermined calculation on the dummy data selected by the dummy data selection unit; and
a storage unit that stores the input tag and the dummy tag generated by the registration tag generation unit in a storage device in association with the encrypted data generated by the encryption unit.

In the data generation device according to the present invention, the number of dummy data items is determined according to the number of possible values (the number of elements), and the frequency with which each dummy data item is selected is determined arbitrarily. As a result, the appearance frequency can be randomized while reproducing a data bias close to that of the original data, which makes frequency analysis in secret search more difficult and improves security.

FIG. 1 is a diagram showing an example of the configuration of the concealment database system 100 according to Embodiment 1.
FIG. 2 is a flowchart showing the operation of the concealment database system 100 according to Embodiment 1 at the time of data registration.
FIG. 3 is a flowchart showing the operation of the concealment database system 100 according to Embodiment 1 at the time of data search.
FIG. 4 is a diagram showing an example of the configuration of the dummy generation unit 230 according to Embodiment 1.
FIG. 5 is a flowchart showing the operation of the dummy generation unit 230 according to Embodiment 1 at the time of data registration.
FIG. 6 is a diagram showing an example of the table configuration of the concealment database 330 according to Embodiment 1.
FIG. 7 is a diagram showing an example, different from FIG. 6, of the table configuration of the concealment database 330 according to Embodiment 1.
FIG. 8 is a diagram showing an example of the configuration of the concealment database system 100 according to Embodiment 2.
FIG. 9 is a flowchart showing the operation of the concealment database system 100 according to Embodiment 2 at the time of data search.
FIG. 10 is a flowchart showing the operation of the dummy generation unit 230 according to Embodiment 2 at the time of data search.
FIG. 11 is a diagram showing an example of the hardware configuration of the client 200 and the server 300.

Embodiment 1.
FIG. 1 is a diagram showing an example of the configuration of the concealment database system 100 according to Embodiment 1.
The concealment database system 100 includes a client 200 (data generation device) and a server 300 (database server). The client 200 and the server 300 are connected via a network 400. The network 400 consists of the Internet, an intranet, a LAN, a telephone network, or the like. The client 200 and the server 300 may also reside on a single computer, with the computer's internal bus or memory hierarchy serving as the network 400.
The input data 500 is the data to be registered in the server 300 and is input to the client 200 at the time of registration. The search key 600 is the data used as the key of a search and is input to the client 200 at the time of search. The search result data 700 is the data output from the client 200 as the result of a search with the search key 600.

The client 200 includes an encryption unit 210, a registration data hash unit 220 (registration tag generation unit), a dummy generation unit 230, a search key hash unit 240 (search tag generation unit), a collation unit 250 (extraction unit), a decryption unit 260, a user interface unit 270 (input unit), and a client communication unit 280 (storage unit).
The server 300 includes a registration unit 310, a search unit 320, a concealment database 330, and a server communication unit 340. The concealment database 330 is provided with areas for a hash data field 331 and an encrypted data field 332.

FIG. 2 is a flowchart showing the operation of the concealment database system 100 according to Embodiment 1 at the time of data registration.
First, processing is executed on the client 200 side.
In (S101), the user interface unit 270 receives the input data 500 through an input device. In (S102), the encryption unit 210 encrypts the input data 500 with a processing device to generate encrypted data. In (S103), the dummy generation unit 230 generates dummy data with the processing device based on the input data 500. In (S104), the registration data hash unit 220 generates, with the processing device, hash data (an input tag) obtained by hashing the input data 500 and additional dummy data (dummy tags) obtained by hashing the dummy data. In (S105), the client communication unit 280 transmits the encrypted data, the hash data, and the additional dummy data to the server communication unit 340 via the network 400. The hash data and the additional dummy data are randomly reordered before transmission. The hash data and the additional dummy data are thereby mixed and become indistinguishable, which increases the security strength.
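The client-side registration steps S104 and S105 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the use of SHA-256, and the dictionary layout of the registration record are all assumptions, and the ciphertext is treated as an opaque byte string produced elsewhere.

```python
import hashlib
import random


def tag(value: str) -> str:
    """Hypothetical tag function; the same function must be used for searching."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_registration(ciphertext: bytes, input_value: str,
                       dummies: list[str], rng: random.Random) -> dict:
    """Hash the input and the selected dummies (S104), then shuffle the tags (S105)."""
    tags = [tag(input_value)] + [tag(d) for d in dummies]
    rng.shuffle(tags)  # the position of the real tag is randomized
    return {"encrypted": ciphertext, "tags": tags}


record = build_registration(b"opaque-ciphertext", "Kanagawa",
                            ["dummy-1", "dummy-2"], random.Random(0))
print(len(record["tags"]))
```

With n = 2 dummies the record carries n + 1 = 3 tags, and nothing in the record marks which of them is the input tag.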

Next, processing is executed on the server 300 side.
In (S106), the registration unit 310 registers the encrypted data, the hash data, and the additional dummy data in the concealment database 330. The encrypted data generated by the encryption unit 210 is registered in the encrypted data field 332, and the data generated by the registration data hash unit 220 (the hash data and the additional dummy data) is registered in the hash data field 331 in association with the corresponding encrypted data.

FIG. 3 is a flowchart showing the operation of the concealment database system 100 according to Embodiment 1 at the time of data search.
First, processing is executed on the client 200 side.
In (S201), the user interface unit 270 receives the search key 600 through the input device. In (S202), the search key hash unit 240 generates, with the processing device, hash data (a search tag) obtained by hashing the search key 600. The same hash function is used in S104 and S202. In (S203), the client communication unit 280 transmits the hash data of the search key to the server communication unit 340 via the network 400.

Next, processing is executed on the server 300 side.
In (S204), the search unit 320 searches the hash data field 331 with the processing device, using the hash data of the search key as the key. In (S205), the search unit 320 determines with the processing device whether any hash data matched the search. If there was no hit (NO in S205), the server communication unit 340 transmits the search result (zero hits) to the client communication unit 280 in (S206). If there was a hit (YES in S205), the search unit 320 extracts, with the processing device, the encrypted data associated with the matching hash data from the encrypted data field 332 in (S207). In (S208), the server communication unit 340 transmits the encrypted data of the search result to the client communication unit 280 via the network 400.

Processing is then executed on the client 200 side again.
In (S209), the decryption unit 260 decrypts the encrypted data with the processing device. In (S210), the collation unit 250 collates the decrypted data against the search key 600 with the processing device and takes the matching data as the search result. In (S211), the user interface unit 270 outputs the search result of S206 or the search result of S210 as the search result data 700.
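The search flow S202 to S210 can be sketched end to end with an in-memory stand-in for the server. All names here are hypothetical. In particular, `toy_encrypt` and `toy_decrypt` are base64 placeholders that provide no confidentiality; they only mark where the encryption unit 210 and the decryption unit 260 would act. The third row is contrived so that it is over-detected by the server and then dropped by the client-side collation.

```python
import base64
import hashlib


def tag(value: str) -> str:
    """Hypothetical tag function shared by registration and search."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def toy_encrypt(plaintext: str) -> bytes:
    """Placeholder only: base64 is NOT encryption."""
    return base64.b64encode(plaintext.encode("utf-8"))


def toy_decrypt(ciphertext: bytes) -> str:
    return base64.b64decode(ciphertext).decode("utf-8")


def server_search(db: list[dict], search_tag: str) -> list[bytes]:
    """S204-S208: return ciphertexts whose tag set contains the search tag."""
    return [row["encrypted"] for row in db if search_tag in row["tags"]]


def client_search(db: list[dict], search_key: str) -> list[str]:
    """S202-S203, then S209-S210: hash the key, decrypt hits, collate."""
    hits = server_search(db, tag(search_key))
    decrypted = [toy_decrypt(c) for c in hits]
    return [p for p in decrypted if p == search_key]


db = [
    {"encrypted": toy_encrypt("Kanagawa"), "tags": [tag("dummy-a"), tag("Kanagawa")]},
    {"encrypted": toy_encrypt("Tokyo"), "tags": [tag("Tokyo"), tag("dummy-b")]},
    # Contrived over-detected row: it carries the searched tag but a different plaintext.
    {"encrypted": toy_encrypt("Chiba"), "tags": [tag("Kanagawa"), tag("Chiba")]},
]

print(client_search(db, "Kanagawa"))
```

The server only ever sees tags and ciphertexts; the comparison against the plaintext search key happens on the client after decryption.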

As described above, in the concealment database system 100, data encryption, decryption, and hashing are executed on the client 200 side. The input data 500 and the search key 600 are therefore never sent to the server 300 in their original (unencrypted) form, and confidentiality is maintained against eavesdropping on the server 300 side.
In addition, because the additional dummy data is mixed with the hash data, the security strength against frequency analysis attacks on the hash data field is increased.

FIG. 4 is a diagram showing an example of the configuration of the dummy generation unit 230 according to Embodiment 1.
The dummy generation unit 230 includes an element number measurement unit 231, an element table 232, a dummy data generation unit 233, a dummy data table 234, an additional dummy data number setting unit 235, and a dummy data selection unit 236.

FIG. 5 is a flowchart showing the operation of the dummy generation unit 230 according to Embodiment 1 at the time of data registration.
In (S301), each time the input data 500 is input to the dummy generation unit 230, the element number measurement unit 231 refers to the element table 232 and measures the number of unique elements that appear in the field in which the input data 500 is registered. For example, when the input data 500 is registered in a prefecture name field, the element number measurement unit 231 counts the number of distinct prefectures registered in that field. The element number measurement unit 231 checks whether the input data 500 already exists in the column of the element table 232 that corresponds to the field, and adds it to the element table if it does not. It then takes the number of entries in that column of the element table as the number of elements y.
In (S302), the element number measurement unit 231 determines with the processing device whether the number of elements y has changed. If it has changed (YES in S302), the process proceeds to S303 and executes S303 to S306. If it has not changed (NO in S302), S303 to S306 are skipped and the process proceeds to S307.
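The element counting of S301 and the change check of S302 can be sketched with a small class. The class name, method name, and the use of one Python set per field are assumptions for illustration; the source only states that an element table holds the unique values per field.

```python
class ElementTable:
    """Hypothetical element table: one set of seen values per field."""

    def __init__(self) -> None:
        self._values: dict[str, set[str]] = {}

    def observe(self, field: str, value: str) -> tuple[int, bool]:
        """Record a value (S301) and report (y, changed) for the S302 branch."""
        seen = self._values.setdefault(field, set())
        before = len(seen)
        seen.add(value)  # no effect if the value is already present
        y = len(seen)
        return y, y != before


table = ElementTable()
first = table.observe("prefecture", "Tokyo")
repeat = table.observe("prefecture", "Tokyo")
second = table.observe("prefecture", "Kanagawa")
print(first, repeat, second)
```

Only a call that reports `changed` as true would trigger the regeneration steps S303 to S306; a repeated value goes straight to selection in S307.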

In (S303), the dummy data generation unit 233 calculates the number of used dummy data items k with the processing device using Formula (1).
<Formula (1)>
k = (y + α) × n
Here, α is a constant integer of 0 or more. α may normally be 0. However, when the number of elements y is small (for example, 10 or less), setting α to 1 or more increases the variety of used dummy data and raises the security strength. n is the number of additional dummy data items set by the additional dummy data number setting unit 235. The additional dummy data number setting unit 235 has the user set this number through the input device, and it is decided per table or per column when the database table is defined. A larger number of additional dummy data items gives higher security strength, but it also enlarges the hash data field to be searched, so the number is made configurable according to the user's requirements.
In (S304), the dummy data generation unit 233 generates the missing used dummy data with the processing device. Used dummy data may be generated by performing a specific operation (for example, adding a constant) on the input data, or it may be generated randomly using pseudo-random numbers.
In (S305), the dummy data generation unit 233 randomly assigns, with the processing device, an appearance probability p(i) to each of the k used dummy data items x(i), i = 1, ..., k, such that $\sum_{i=1}^{k} p(i) = 1$. The appearance probability p(i) is the probability that the used dummy data item x(i) is selected and used by the dummy data selection unit 236.
In (S306), the dummy data generation unit 233 updates the dummy data table 234. That is, it adds entries for the used dummy data newly generated in S304 and updates the appearance probability p(i) of every entry to the value generated in S305. The dummy data table 234 stores each used dummy data item x(i) in association with its appearance probability p(i). It can be realized simply as a two-dimensional array, but other data structures may be used.
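Steps S303 to S305 can be sketched as three small functions. The function names and the dummy string format are hypothetical, and the way the random probabilities are produced (normalizing independent uniform draws) is an assumption, since the source only requires that the p(i) be random and sum to 1.

```python
import random


def used_dummy_count(y: int, n: int, alpha: int = 0) -> int:
    """Formula (1): k = (y + alpha) * n."""
    return (y + alpha) * n


def extend_dummies(existing: list[str], k: int, rng: random.Random) -> list[str]:
    """S304: generate only the missing dummy values, keeping existing ones."""
    dummies = list(existing)
    seen = set(dummies)
    while len(dummies) < k:
        candidate = f"dummy-{rng.getrandbits(64):016x}"  # pseudo-random dummy value
        if candidate not in seen:
            seen.add(candidate)
            dummies.append(candidate)
    return dummies


def assign_probabilities(k: int, rng: random.Random) -> list[float]:
    """S305: random p(i) that sum to 1 (assumed scheme: normalized uniform draws)."""
    weights = [1.0 - rng.random() for _ in range(k)]  # each weight lies in (0, 1]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(42)
k = used_dummy_count(y=3, n=2)
dummies = extend_dummies([], k, rng)
probs = assign_probabilities(k, rng)
dummy_table = list(zip(dummies, probs))  # S306: pairs of (x(i), p(i))
print(k, len(dummy_table))
```

For y = 3 and n = 2 with α = 0, Formula (1) gives k = 6, and setting α = 1 would raise it to 8.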

In (S307), the dummy data selection unit 236 refers to the dummy data table 234 and, based on the appearance probabilities p(i), selects and outputs n used dummy data items from the k used dummy data items x(i). That is, the dummy data selection unit 236 selects n items from the k items so that the probability of each x(i) being selected equals its appearance probability p(i).
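The selection in S307 can be sketched with weighted sampling. The source does not say whether the n items are drawn with or without replacement; this sketch assumes independent draws with replacement, because then each draw picks x(i) with probability exactly p(i). The function name and sample values are hypothetical.

```python
import random


def select_dummies(dummies: list[str], probabilities: list[float],
                   n: int, rng: random.Random) -> list[str]:
    """Draw n dummy values; each draw returns x(i) with probability p(i)."""
    return rng.choices(dummies, weights=probabilities, k=n)


rng = random.Random(0)
dummies = ["d1", "d2", "d3", "d4"]
probabilities = [0.4, 0.3, 0.2, 0.1]

picked = select_dummies(dummies, probabilities, 2, rng)
# Degenerate case: all probability mass on the first item.
always_first = select_dummies(dummies, [1.0, 0.0, 0.0, 0.0], 3, rng)
print(len(picked), always_first)
```

If duplicates within one record are undesirable, a without-replacement scheme would be needed, at the cost of the per-item selection probability no longer being exactly p(i).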

As described above, the dummy generation unit 230 determines the number of used dummy data items x(i) according to the number of elements. The bias of the used dummy data x(i) thereby becomes close to that of the original data. Furthermore, randomly assigning the appearance probabilities p(i) of the used dummy data x(i) makes frequency analysis of the hash data field 331 more difficult and improves the security strength.

FIG. 6 is a diagram showing an example of the table configuration of the concealment database 330 according to Embodiment 1. FIG. 6(a) shows the user table 510, and FIG. 6(b) shows the concealment table 333.
The user table 510 is an example of a table as seen by a user who accesses the database through the client 200. The concealment table 333 is an example of the table stored in the concealment database 330 that corresponds to the user table 510. In the concealment table 333, the encrypted data and hash data are shown as alphanumeric strings, but they may be arbitrary binary data.
The data obtained by encrypting the column 520 of the user table 510 is registered in the encrypted data field 332 of the concealment table 333. A hash data field 331 related to the column 520 is added to the concealment table 333. The hash data field 331 consists of n + 1 columns: n columns for the additional dummy data plus one column for the hash data obtained by hashing the input data 500. Whether to encrypt and whether to add hash data can be set for each column of the user table. When hash data is added for r columns, r × (n + 1) columns are added to the concealment table 333.
When a record with ID = 0001 is added to the user table 510 through the user interface unit 270, the input data 500 is, for example, encrypted into encrypted data 501, and hash data 502 and 503 are generated for it (an example with n = 1). One of the hash data 502 and 503 is the additional dummy data and the other is the hash data of the input data 500. However, because the order is randomized as described above, it cannot be determined which one is the additional dummy data. The security strength therefore improves as the value of n increases.
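The single-table layout of FIG. 6 can be sketched with SQLite for n = 1. The hash column names C020 and C021 and the tag value 8dfk3e3 appear in the source; the table name, the encrypted column name c010, and every other value are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One encrypted column plus n + 1 = 2 tag columns for the concealed attribute.
conn.execute(
    "CREATE TABLE concealed (id TEXT PRIMARY KEY, c010 BLOB, c020 TEXT, c021 TEXT)"
)
conn.execute("CREATE INDEX idx_c020 ON concealed (c020)")
conn.execute("CREATE INDEX idx_c021 ON concealed (c021)")

rows = [
    ("0001", b"\x01\x02", "8dfk3e3", "a71c0de"),  # real tag happens to be in c020
    ("0002", b"\x03\x04", "f00dbab", "8dfk3e3"),  # real tag happens to be in c021
    ("0003", b"\x05\x06", "1234abc", "9z9z9z9"),  # unrelated record
]
conn.executemany("INSERT INTO concealed VALUES (?, ?, ?, ?)", rows)

# Because tag order is randomized, every tag column must be checked.
hits = conn.execute(
    "SELECT id FROM concealed WHERE c020 = ? OR c021 = ? ORDER BY id",
    ("8dfk3e3", "8dfk3e3"),
).fetchall()
print(hits)
```

Indexing each tag column keeps the OR lookup fast, at the cost of n + 1 indexes per concealed attribute.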

Concealing the column names of the concealment table 333 as well makes frequency analysis even harder. The correspondence between the column names of the user table 510 and those of the concealment table 333 is managed by the user interface unit 270 and is converted appropriately in both directions when the user accesses the data. The concealment table 333 can be implemented with a widely used RDB (relational database), but it may also be implemented with an object-oriented database, a column-oriented database, or the like. With an RDB, creating an index on the hash data field 331 enables high-speed search.

FIG. 7 is a diagram showing an example, different from FIG. 6, of the table configuration of the concealment database 330 according to Embodiment 1. FIG. 7(a) shows the user table 510, FIG. 7(b) shows the table for the encrypted data field 332 portion of the concealment table 333, and FIG. 7(c) shows the table for the hash data field 331 portion of the concealment table 333.
The difference from FIG. 6 is that the encrypted data field and the hash data field of the concealment table 333 are implemented as separate tables. The hash data field 331 related to the column 520 is implemented in a table separate from the encrypted data field, so that one column of the user table 510 corresponds to a single hash data column (the H001 column), and the ID is held in H002 for association. Accordingly, for n additional dummy data items, n + 1 rows correspond to one record. As before, the order of the hash data 502 and 503 is randomized. At search time, the two tables are linked (JOINed) so that the encrypted data can be extracted from the hash data search results.
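The two-table layout of FIG. 7 and its JOIN can be sketched with SQLite. The column names H001 (tag) and H002 (record ID) come from the source; the table names, the encrypted column name c010, and all values except the tag 8dfk3e3 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enc (id TEXT PRIMARY KEY, c010 BLOB)")
conn.execute("CREATE TABLE tags (h001 TEXT, h002 TEXT)")  # h001: tag, h002: record ID
conn.execute("CREATE INDEX idx_h001 ON tags (h001)")

conn.executemany("INSERT INTO enc VALUES (?, ?)",
                 [("0001", b"\x01\x02"), ("0002", b"\x03\x04")])
# n = 1, so each record contributes n + 1 = 2 rows in shuffled order.
conn.executemany("INSERT INTO tags VALUES (?, ?)", [
    ("a71c0de", "0001"), ("8dfk3e3", "0001"),
    ("1234abc", "0002"), ("9z9z9z9", "0002"),
])

# A single equality condition suffices; the JOIN recovers the ciphertext.
hits = conn.execute(
    "SELECT DISTINCT e.id, e.c010 FROM tags AS t "
    "JOIN enc AS e ON e.id = t.h002 WHERE t.h001 = ? ORDER BY e.id",
    ("8dfk3e3",),
).fetchall()
print(hits)
```

Compared with the single-table layout, the number of tag columns no longer depends on n, so n can be changed without altering the schema.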

The user interface unit 270 can improve user convenience by implementing a commonly used database interface. For example, the user interface unit 270 can be implemented in conformance with the standard SQL interface.
In an example of searching the table shown in FIG. 6, the user creates the following query and executes the search.
<Query entered by the user>
SELECT * FROM "user table 510" WHERE "address (prefecture)" = 'Kanagawa';
The query is then converted into the following query against the concealment database 330.
<Query after conversion>
SELECT * FROM "concealment table 333" WHERE "C020" = '8dfk3e3' OR "C021" = '8dfk3e3';
That is, the column name "address (prefecture)" is converted into the column names C020 and C021, which refer to the hash data fields used for searching, and the search key "Kanagawa" is hashed into "8dfk3e3". Because the hash data field consists of multiple columns, the condition is converted into an OR of multiple conditions. When the search result is returned, the column data from C020 onward is discarded because it is used only for searching, and the encrypted data is decrypted and output. This conversion gives the user transparent access to the concealment database.
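The query conversion can be sketched as a small rewriting function. The function name and its parameters are hypothetical, and it handles only a single equality condition; it emits parameter placeholders instead of inlining the tag, which is a design choice of this sketch rather than something the source specifies.

```python
def rewrite_equality_query(concealed_table: str, hash_columns: list[str],
                           search_tag: str) -> tuple[str, tuple[str, ...]]:
    """Turn one equality condition into an OR over all tag columns."""
    where = " OR ".join(f'"{column}" = ?' for column in hash_columns)
    sql = f'SELECT * FROM "{concealed_table}" WHERE {where};'
    params = tuple(search_tag for _ in hash_columns)
    return sql, params


sql, params = rewrite_equality_query(
    "concealment table 333", ["C020", "C021"], "8dfk3e3"
)
print(sql)
print(params)
```

The caller would first hash the user's search key to obtain `search_tag` and look up the tag column names from the mapping kept by the user interface unit 270.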

The registration data hash unit 220 and the search key hash unit 240 may use any commonly used hash function. Because the search is performed by matching hash data, any function suffices as long as the hash data is uniquely determined once the original data is determined. Preferably, security strength can be increased by using a cryptographic hash function such as MD5 or SHA-1. Deterministic encryption, in which the encrypted data is uniquely determined from the original data, may also be used.
If a hash function in which the original data and the hash data correspond one-to-one is used, no hash collisions occur, so the search result for one search key can be uniquely identified. As a result, the collation by the collation unit 250 can be omitted, which improves search speed. If a hash function in which the original data and the hash data correspond n-to-1 is used, security strength against frequency analysis of the hash data improves. However, because over-detection occurs at search time, the collation unit 250 must collate the decrypted data against the search key.
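
The trade-off between the one-to-one and n-to-1 cases can be illustrated with a short Python sketch. It assumes SHA-1 from the standard library as the hash function and models the n-to-1 case by truncating the digest, which is an assumption of this example and not a method stated in the text. The function names make_tag and collate are hypothetical.

```python
import hashlib


def make_tag(value: str, hex_digits: int = 40) -> str:
    """Deterministic tag: the same value always yields the same tag.

    Truncating to fewer hex digits (an assumption made here) turns the
    mapping into an n-to-1 one, so different values may share a tag.
    """
    return hashlib.sha1(value.encode("utf-8")).hexdigest()[:hex_digits]


def collate(decrypted_rows, search_key):
    """Role of the collation unit 250: drop rows that only matched by collision."""
    return [row for row in decrypted_rows if row == search_key]


values = [f"value-{i}" for i in range(17)]
short_tags = {make_tag(v, hex_digits=1) for v in values}
# 17 values cannot fit into 16 one-digit tags without at least one collision.
print(len(short_tags))
```

With one-digit tags a search necessarily returns rows for other values as well, which is why collate must run on the decrypted results in the n-to-1 case.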

The encryption unit 210 and the decryption unit 260 may use any commonly used encryption algorithm. Common key encryption such as DES, AES, or MISTY (registered trademark) may be used, or public key encryption such as RSA may be used. However, it is necessary to use probabilistic encryption, in which the encrypted data cannot be uniquely determined from the original data. The encryption key can be managed by the encryption unit 210 and the decryption unit 260, or by a key management server that the client can access. The encryption key may be set in any unit, such as per user, per database, per table, or per column.

Embodiment 2.
FIG. 8 is a diagram illustrating an example of the configuration of the concealment database system 100 according to the second embodiment.
The difference between the concealment database system 100 shown in FIG. 8 and the concealment database system 100 shown in FIG. 1 is that dummy data is also used at search time. In FIG. 8, an arrow from the dummy generation unit 230 to the search key hash unit 240 is added.

FIG. 9 is a flowchart showing the operation of the concealment database system 100 according to the second embodiment at the time of data search.
The differences from the operation described with reference to FIG. 3 are as follows. S401 shown in FIG. 9 is the same as S201 shown in FIG. 3, and S406 to S412 shown in FIG. 9 are the same as S205 to S211 shown in FIG. 3.
In (S402), the dummy generation unit 230 generates dummy data using the processing device. In (S403), the search key hash unit 240 uses the processing device to generate hash data (a search tag) by hashing the search key 600, and additional dummy data (dummy tags) by hashing the dummy data generated in S402. In (S404), the client communication unit 280 mixes the hash data of the search key and the additional dummy data by randomizing their order and transmits them to the server communication unit 340. In (S405), the search unit 320 searches the hash data field 331 using the hash data of the search key and the additional dummy data as keys, on the condition that any one of them is hit.
Note that the over-detection added to the search result by searching with the hash data of the dummy data is removed in S411, so the final search result is correct.
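
The search flow of S403 to S405 and the removal of over-detection in S411 can be sketched in Python as follows. This is a toy illustration under stated assumptions: SHA-1 stands in for the hash function, reversing a string stands in for real encryption and decryption, the in-memory list rows stands in for the concealment table, and all function names are hypothetical.

```python
import hashlib
import random


def make_tag(value: str) -> str:
    """Deterministic tag (assumed here: SHA-1 hex digest)."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


def build_search_tags(search_key, dummies, rng):
    """S403-S404: hash the key and the dummies, then randomize the order."""
    tags = [make_tag(search_key)] + [make_tag(d) for d in dummies]
    rng.shuffle(tags)
    return tags


def server_search(rows, tags):
    """S405: return every row whose tag set hits any of the received tags."""
    wanted = set(tags)
    return [row for row in rows if wanted & row["tags"]]


def client_filter(hits, search_key, decrypt):
    """S411: decrypt and discard rows that matched only through a dummy tag."""
    return [p for p in (decrypt(r["cipher"]) for r in hits) if p == search_key]


# Toy store: "cipher" is the plaintext reversed, a placeholder for real encryption.
rows = [
    {"tags": {make_tag("Kanagawa"), make_tag("dummy-3")}, "cipher": "awaganaK"},
    {"tags": {make_tag("Tokyo"), make_tag("dummy-1")}, "cipher": "oykoT"},
    {"tags": {make_tag("Osaka"), make_tag("dummy-7")}, "cipher": "akasO"},
]
tags = build_search_tags("Kanagawa", ["dummy-1"], random.Random(0))
hits = server_search(rows, tags)
result = client_filter(hits, "Kanagawa", lambda c: c[::-1])
print(len(hits), result)
```

In this toy data the server returns one extra row because of the dummy tag, and only the client, which can decrypt, is able to tell which hit was genuine.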

FIG. 10 is a flowchart showing the operation of the dummy generation unit 230 according to the second embodiment at the time of data search.
Unlike at data registration, shown in FIG. 5, the dummy data table 234 is not updated at data search. Therefore, in (S501), the dummy data selection unit 236 simply refers to the dummy data table 234 and, based on the appearance probability p(i), selects and outputs n pieces of use dummy data x(i) from among the k pieces of use dummy data x(i). Note that a value n' dedicated to search, different from the n used at registration, may be set instead.
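
The selection in S501 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the table is built with k = (y + α) × n entries and random probabilities that sum to 1, as stated in the claims; α is assumed to be an integer; the values "dummy-i" are placeholders; and drawing with replacement via random.choices is an assumption, since the text does not say whether the same dummy data may be selected twice. The function names are hypothetical.

```python
import random


def build_dummy_table(y: int, n: int, alpha: int, rng: random.Random):
    """Create k = (y + alpha) * n dummy entries with random probabilities p(i)."""
    k = (y + alpha) * n
    weights = [rng.random() for _ in range(k)]
    total = sum(weights)
    # Normalize so that the probabilities p(i) sum to 1.
    return [(f"dummy-{i}", w / total) for i, w in enumerate(weights, start=1)]


def select_dummies(table, n: int, rng: random.Random):
    """Pick n dummy values according to p(i); the table itself is not modified."""
    values = [x for x, _ in table]
    probs = [p for _, p in table]
    return rng.choices(values, weights=probs, k=n)


table = build_dummy_table(y=3, n=2, alpha=1, rng=random.Random(0))
picked = select_dummies(table, n=2, rng=random.Random(1))
print(len(table), picked)
```

Passing a different count to select_dummies corresponds to using a search-specific n' without rebuilding the table.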

As described above, in the concealment database system 100 according to the second embodiment, using dummy data at search time as well increases security strength against frequency analysis of the search keys in search conditions.

FIG. 11 is a diagram illustrating an example of the hardware configuration of the client 200 and the server 300.
As shown in FIG. 11, the client 200 and the server 300 each include a CPU 911 (Central Processing Unit; also referred to as a central processing device, a processing device, an arithmetic device, a microprocessor, a microcomputer, or a processor) that executes programs. The CPU 911 is connected via the bus 912 to the ROM 913, the RAM 914, the LCD 901 (Liquid Crystal Display), the keyboard 902 (K/B), the communication board 915, and the magnetic disk device 920, and controls these hardware devices. Instead of the magnetic disk device 920 (fixed disk device), a storage device such as an optical disk device or a memory card read/write device may be used. The magnetic disk device 920 is connected via a predetermined fixed disk interface.

The magnetic disk device 920, the ROM 913, or the like stores an operating system 921 (OS), a window system 922, a program group 923, and a file group 924. The programs in the program group 923 are executed by the CPU 911, the operating system 921, and the window system 922.

The program group 923 stores software, programs, and other programs that execute the functions described above as the “encryption unit 210”, “registration data hash unit 220”, “dummy generation unit 230”, “search key hash unit 240”, “collation unit 250”, “decryption unit 260”, “user interface unit 270”, “client communication unit 280”, “registration unit 310”, “search unit 320”, “concealment database 330”, “server communication unit 340”, and the like. The programs are read and executed by the CPU 911.
The file group 924 stores, as items of a “file” or a “database”, the information, data, signal values, variable values, and parameters described above as “input data 500”, “search key 600”, “search result data 700”, “hash data”, “additional dummy data”, “use dummy data”, and the like. The “file” and the “database” are stored in a recording medium such as a disk or a memory. The information, data, signal values, variable values, and parameters stored in a storage medium such as a disk or a memory are read into the main memory or the cache memory by the CPU 911 via a read/write circuit and are used for operations of the CPU 911 such as extraction, search, reference, comparison, calculation, processing, output, printing, and display. During these operations of the CPU 911, the information, data, signal values, variable values, and parameters are temporarily stored in the main memory, the cache memory, or the buffer memory.

In the above description, the arrows in the flowcharts mainly indicate input and output of data and signals, and the data and signal values are recorded in the memory of the RAM 914, in other recording media such as an optical disk, or in an IC chip. Data and signals are transmitted online via the bus 912, signal lines, cables, other transmission media, or radio waves.
In the above description, what is described as a “... unit” may be a “... circuit”, “... device”, “... equipment”, “... means”, or “... function”, and may also be a “... step”, “... procedure”, or “... process”. Likewise, what is described as a “... device” may be a “... circuit”, “... equipment”, “... means”, or “... function”, and may also be a “... step”, “... procedure”, or “... process”. Furthermore, what is described as a “... process” may be a “... step”. That is, what is described as a “... unit” may be realized by firmware stored in the ROM 913. Alternatively, it may be implemented by software only, by hardware only such as elements, devices, substrates, and wiring, by a combination of software and hardware, or further in combination with firmware. Firmware and software are stored as programs in a recording medium such as the ROM 913. The programs are read and executed by the CPU 911. That is, a program causes a computer or the like to function as the “... unit” described above, or causes a computer or the like to execute the procedure or method of the “... unit” described above.

100 concealment database system, 200 client, 210 encryption unit, 220 registration data hash unit, 230 dummy generation unit, 231 element number measurement unit, 232 element table, 233 dummy data generation unit, 234 dummy data table, 235 additional dummy data number setting unit, 236 dummy data selection unit, 240 search key hash unit, 250 collation unit, 260 decryption unit, 270 user interface unit, 280 client communication unit, 300 server, 310 registration unit, 320 search unit, 330 concealment database, 331 hash data field, 332 encrypted data field, 333 concealment table, 340 server communication unit, 400 network, 500 input data, 510 user table, 600 search key, 700 search result data.

Claims

An input unit for inputting input data;
An encryption unit that encrypts input data input by the input unit to generate encrypted data;
An element number measuring unit that measures the number of values that the input data can take as the number of elements;
A dummy data generating unit that generates dummy data of a number corresponding to the number of elements measured by the element number measuring unit;
A dummy data selection unit that selects n pieces of dummy data (n is an integer of 1 or more) determined in advance from dummy data generated by the dummy data generation unit;
A registration tag generation unit that generates an input tag by performing a predetermined calculation on the input data, and generates a dummy tag by performing the predetermined calculation on the dummy data selected by the dummy data selection unit;
A data generation device comprising: a storage unit that stores the input tag and the dummy tag generated by the registration tag generation unit in a storage device in association with the encrypted data generated by the encryption unit.

The dummy data generation unit generates dummy data and assigns a predetermined probability to each generated dummy data,
The dummy data selection unit selects n dummy data set in advance from the dummy data generated by the dummy data generation unit so that the probability of selection becomes the probability assigned by the dummy data generation unit. The data generation device according to claim 1.

The dummy data generation unit generates k pieces of dummy data, where k is calculated as k = (y + α) × n from the n, the number of elements y measured by the element number measuring unit, and a predetermined value α, and randomly assigns to each piece of dummy data i (i = 1, ..., k) a probability p(i) of being selected such that Σ_{i=1}^{k} p(i) = 1,
The data generation device according to claim 2, wherein the dummy data selection unit selects n dummy data from the dummy data generated by the dummy data generation unit.

The data generation device according to claim 1, wherein the storage unit arranges the input tag and the dummy data in a random order and transmits them, in association with the encrypted data, to be stored in the storage device.

The data generation device according to claim 1, further comprising:
A search key input unit for inputting a search key;
A search tag generation unit that generates a search tag by performing the predetermined calculation on the search key input by the search key input unit; and
An acquisition unit that acquires, from the storage device, encrypted data associated with the search tag generated by the search tag generation unit.

The dummy data selection unit selects a predetermined number n' (n' is an integer of 1 or more) of dummy data from the dummy data generated by the dummy data generation unit,
The search tag generation unit generates the search tag and generates a dummy tag by performing the predetermined calculation on the n' pieces of dummy data selected by the dummy data selection unit,
The acquisition unit transmits the search tag and the dummy tag to the storage device and acquires, from the storage device, encrypted data associated with any one of the search tag and the dummy data,
The data generation device according to claim 1, further comprising: an extraction unit that extracts the encrypted data associated with the search tag from the encrypted data acquired by the acquisition unit.

The data generation apparatus according to claim 6, wherein the dummy data generation unit arranges the search tag and the dummy data in a random order and transmits them to the storage device.

An input step in which an input device inputs input data;
An encryption step in which a processing device generates encrypted data by encrypting the input data input in the input step;
An element number measurement step in which the processing device measures the number of values that the input data can take as the number of elements;
A dummy data generation step in which the processing device generates dummy data of a number corresponding to the number of elements measured in the element number measurement step;
A dummy data selection step in which the processing device selects a predetermined number n (n is an integer of 1 or more) of dummy data from the dummy data generated in the dummy data generation step;
A registration tag generation step in which the processing device generates an input tag by performing a predetermined calculation on the input data, and generates a dummy tag by performing the predetermined calculation on the dummy data selected in the dummy data selection step;
A data generation method comprising: a storage step in which the processing device stores the input tag and the dummy tag generated in the registration tag generation step in a storage device in association with the encrypted data generated in the encryption step.

Input processing to input input data;
An encryption process for encrypting the input data input in the input process to generate encrypted data;
Element number measurement processing for measuring the number of values that the input data can take as the number of elements;
Dummy data generation processing for generating a number of dummy data according to the number of elements measured in the element number measurement processing;
A dummy data selection process for selecting a predetermined number n of dummy data from the dummy data generated in the dummy data generation process;
A registration tag generation process for generating an input tag by performing a predetermined calculation on the input data, and generating a dummy tag by performing the predetermined calculation on the dummy data selected in the dummy data selection process;
A data generation program that causes a computer to execute a storage process in which an input tag and a dummy tag generated in the registration tag generation process are associated with encrypted data generated in the encryption process and stored in a storage device.

A database system comprising a data generation device that generates data, and a database server that stores data generated by the data generation device,
The data generation device includes:
An input unit for inputting input data;
An encryption unit that encrypts input data input by the input unit to generate encrypted data;
An element number measuring unit that measures the number of values that the input data can take as the number of elements;
A dummy data generating unit that generates dummy data of a number corresponding to the number of elements measured by the element number measuring unit;
A dummy data selection unit that selects a predetermined number of dummy data from the dummy data generated by the dummy data generation unit;
A registration tag generation unit that generates an input tag by performing a predetermined calculation on the input data, and generates a dummy tag by performing the predetermined calculation on the dummy data selected by the dummy data selection unit;
A storage unit that transmits, to the database server, the input tag and the dummy tag generated by the registration tag generation unit and the encrypted data generated by the encryption unit,
The database server includes:
A registration unit that associates the input tag, the dummy data, and the encrypted data transmitted by the storage unit with one another and registers them in a database.