JPWO2016203555A1

JPWO2016203555A1 - Concealed similarity search system and similarity concealment search method

Info

Publication number: JPWO2016203555A1
Application number: JP2017524187A
Authority: JP
Inventors: 尚宜佐藤
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2015-06-16
Filing date: 2015-06-16
Publication date: 2018-02-15
Anticipated expiration: 2035-06-16
Also published as: JP6557338B2; WO2016203555A1

Abstract

A similarity concealment search system searches, in response to a search request, for data to be searched that is similar to concealed search target data. A user terminal includes: an input unit that accepts input of search target data containing qualitative data or quantitative data; an encryption processing unit that encrypts the qualitative data contained in the search target data using an encryption scheme that allows match/mismatch determination; a similarity determination processing unit that calculates, based on the quantitative data contained in the search target data and the quantitative data contained in the data to be searched, a condition under which the two have at least a certain degree of similarity; and a communication unit that transmits the encrypted qualitative data and the condition to a similarity search server that searches the data to be searched. The similarity search server includes: a server communication unit that receives the qualitative data and the condition from the user terminal, or transmits to the user terminal the data to be searched that has been determined to be similar to the search target data; and a similarity determination calculation processing unit that determines whether the qualitative data received from the user terminal and the qualitative data contained in the data to be searched, stored in advance in a storage unit, are similar and, when they are determined to be similar, outputs, from among the data to be searched so determined, the data containing quantitative data that satisfies the condition as data similar to the search target data.

Description

The present invention relates to a similarity concealment search system and a similarity concealment search method for searching plaintext or encrypted data stored on a server for data similar to data held by the searching party.

In recent years, big data analysis, which extracts previously unknown and useful knowledge from vast amounts of data, has attracted attention. Companies increasingly recognize the various kinds of information analysis as an important activity: they collect not only their own data but also external data and use it for marketing and operational efficiency. At the same time, inadequate management of such data has led to frequent incidents and accidents in which large amounts of information, including customers' personal information, are leaked, and this has become a social problem.

In such a leak, the company responsible bears a large amount of compensation, and in some cases its very survival is at stake. There is also concern about adverse effects on unrelated companies: individuals and other information holders become hesitant to provide information, which makes information harder to collect and effective analysis results harder to obtain.

In response, legislation for collecting and using information properly is being developed. Because the sensitivity and privacy of the information handled, personal information in particular, differ from industry to industry, industry-specific guidelines are also being prepared. These are, however, legal measures only; they help deter information leakage but are not in themselves countermeasures against it.

Techniques conventionally considered for preventing information leakage and the resulting privacy infringement include access control, which restricts who can access information, and encryption, which is effective against events such as theft of equipment. In many recent leakage incidents, however, the cause is abuse of authority by an outside contractor entrusted with managing computer resources. Such a contractor is given legitimate authority to access databases and the like in order to manage information, and can therefore view even information that is unnecessary for the management work, which is then misused. Conventional access control is powerless against this. Encryption is equally powerless when, as with storage encryption, data is decrypted whenever it is read from the recording medium for use, because the administrator can still view the decrypted data through legitimate procedures.

It is also possible to operate without placing the decryption key with or near the outside contractor: only a limited set of legitimate in-house users hold the decryption key, and they pull the ciphertext back to their own side and decrypt it at the time of use. With conventional encryption technology, however, data stored in a database cannot be processed at all while encrypted; even basic operations such as search, or aggregation in the case of numerical data, must be given up. The database then serves as nothing more than a ciphertext vault, and its database functions are wasted.

Research has therefore recently focused on techniques that allow some processing without decryption while still using highly secure encryption. For example, searchable encryption, which can determine whether plaintexts match without decrypting them, is being actively studied. It targets plaintexts that are character strings (names, addresses, and so on), and it is difficult to use it to determine order relations such as magnitude comparisons of numerical data. Comparing numerical data, and in particular determining whether data are similar under some measure, is an important operation that appears frequently in routine work as well as in all kinds of analysis. When numerical data consisting of multiple items is regarded as a vector, the measure is in many cases the ordinary (Euclidean) distance in the vector space, and can therefore be expressed using inner products of vectors. Consequently, determining similarity under such a measure while keeping the data (vectors) concealed requires a method that computes inner products with the vectors still concealed, for example using the technique disclosed in Non-Patent Document 1.
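The relationship between Euclidean distance and the inner product mentioned above can be checked numerically. The following is a minimal plaintext sketch only; the helper names (`inner`, `squared_distance`, `normalize`) and the sample vectors are illustrative assumptions and do not appear in the specification.

```python
import math

def inner(a, b):
    """Ordinary inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def normalize(v):
    """Scale a non-zero vector to unit length (an assumption made for this sketch)."""
    norm = math.sqrt(inner(v, v))
    return [x / norm for x in v]

# Hypothetical sample vectors, e.g. two records of three numerical items each.
a = normalize([3.0, 4.0, 0.0])
b = normalize([4.0, 3.0, 1.0])

# |a - b|^2 = <a,a> + <b,b> - 2<a,b>; for unit vectors this is 2 - 2<a,b>,
# so a larger inner product means a smaller distance.
lhs = squared_distance(a, b)
rhs = inner(a, a) + inner(b, b) - 2 * inner(a, b)
```

Because the distance is determined by inner products alone, a protocol that computes inner products on concealed vectors is sufficient to judge this kind of similarity.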

Kana Shimizu, "Privacy preserving search for chemical compound libraries", JSBi-55, CBI/JSBi2011 (2011).

With the technique of Non-Patent Document 1, homomorphic encryption makes it possible to compute inner products while the data remains concealed. However, homomorphic encryption is slow, which makes it difficult to process large amounts of data. Moreover, consider computing the inner products of one vector (A) with a large number of vectors (B) held by another party. If the party holding the vectors (B) could obtain the inner product results, the properties of the inner product would allow that party to reconstruct the vector (A), and concealment could not be maintained. The inner product values themselves must therefore not be disclosed to the party holding the vectors (B); that party must instead compute ciphertexts of the inner product values.
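The reconstruction risk described above can be illustrated with a small plaintext sketch. It assumes, purely for illustration, that the party holding the vectors (B) can choose its vectors freely and sees each inner product in the clear; the names `recover_from_inner_products` and `oracle` are hypothetical.

```python
def inner(a, b):
    """Ordinary inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def recover_from_inner_products(dim, oracle):
    """Rebuild a hidden vector from the inner products it yields.

    `oracle(probe)` stands for whatever lets the (B) side learn <A, probe>.
    Probing with each standard basis vector returns one component of A.
    """
    recovered = []
    for i in range(dim):
        probe = [0] * dim
        probe[i] = 1
        recovered.append(oracle(probe))
    return recovered

# Hypothetical secret vector (A) held by the searching party.
secret_a = [5, -2, 7]

def oracle(probe):
    # Models disclosure of the plain inner product value to the (B) side.
    return inner(secret_a, probe)

leaked = recover_from_inner_products(len(secret_a), oracle)
```

More generally, inner products with any set of linearly independent vectors spanning the space determine (A), which is why only ciphertexts of the inner product values should reach the (B) side.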

However, a scheme in which ciphertexts corresponding to all of the many vectors (B) are sent to the party holding the vector (A) is impractical, because the communication volume grows and the processing becomes enormous on the side of the vector (A) as well as on the side of the vectors (B). It is therefore necessary to narrow down the vectors (B) to some extent before transmitting and receiving data and performing decryption.

In other words, when a user searches for and retrieves similar data from a large amount of numerical data stored in external storage or the like while keeping the user's own numerical data concealed, inner products are often used to determine similarity, and a concealed inner product computation protocol such as that of Non-Patent Document 1 is required. Determining similarity against a large amount of data, however, requires an enormous amount of communication and enormous computation on both the user side and the external device side, and is therefore inefficient.

The present invention has been made in view of the above, and its object is to provide a similarity concealment search system and a similarity concealment search method that can determine the similarity of data efficiently while maintaining concealment and with a lower processing load than the prior art.

To solve the above problem and achieve the object, the similarity concealment search system according to the present invention is a similarity concealment search system that searches, in response to a search request, for data to be searched that is similar to concealed search target data. A user terminal includes: an input unit that accepts input of the search target data containing qualitative data or quantitative data; an encryption processing unit that encrypts the qualitative data contained in the search target data using an encryption scheme that allows match/mismatch determination; a similarity determination processing unit that calculates, based on the quantitative data contained in the search target data and the quantitative data contained in the data to be searched, a condition under which the two have at least a certain degree of similarity; and a communication unit that transmits the encrypted qualitative data and the condition to a similarity search server for searching the data to be searched. The similarity search server includes: a server communication unit that receives the qualitative data and the condition from the user terminal, or transmits to the user terminal the data to be searched that has been determined to be similar to the search target data; and a similarity determination calculation processing unit that determines whether the qualitative data received from the user terminal and the qualitative data contained in the data to be searched, stored in advance in a storage unit, are similar and, when they are determined to be similar, outputs, from among the data to be searched so determined, the data containing quantitative data that satisfies the condition as data similar to the search target data.

The present invention can also be understood as a similarity concealment search method performed by the above similarity concealment search system.

According to the present invention, the similarity of data can be determined efficiently while maintaining concealment and with a lower processing load than the prior art.

A diagram showing a configuration example of the similarity concealment search scheme according to the present embodiment.
A diagram showing a configuration example of the data storage and similarity search server according to the first embodiment.
A diagram showing a configuration example of the user terminal according to the first embodiment.
A table showing the classification of data according to the first embodiment.
A flowchart showing the processing of each user and of the data storage and similarity search server according to the first embodiment.
A flowchart showing the procedure by which a user generates the data used for similarity determination according to the first embodiment.
A flowchart showing the procedure, according to the second embodiment, by which a user encrypts and sends data and the data storage and similarity search server stores it.
A flowchart showing the procedure, according to the second embodiment, by which a user encrypts data and generates and sends a determination condition, and the data storage and similarity search server narrows down similar data.

Next, modes for carrying out the present invention (referred to as "embodiments") are described in detail with reference to the drawings as appropriate. In the embodiments below, identical components are in principle given identical reference numerals, and repeated description is omitted.

<<System configuration>>
FIG. 1 is a diagram showing a configuration example of a similarity concealment search system 1000 to which the similarity concealment search system and the similarity concealment search method according to the present invention are applied. As shown in FIG. 1, the similarity concealment search system 1000 has a configuration in which a plurality of user terminals 300 to 500 and a similarity search server 200 are connected to one another via a network 100. In one form of service provision, for example, the similarity search server 200 is installed at a cloud provider and consists of a server device that is a general-purpose computer. The user terminal 300, operated by a user of the cloud provider's service, consists of an information processing device that is a general-purpose computer such as a PC, a mobile phone, or a smartphone. Although FIG. 1 shows the system with three user terminals 300 to 500, the number of terminals is arbitrary.

<<First embodiment>>
Next, a first embodiment of the similarity concealment search system 1000 is described with reference to FIGS. 2 to 8.

(Similarity search server)
FIG. 2 is a diagram showing a configuration example of the similarity search server 200 according to the first embodiment. As illustrated, the similarity search server 200 includes a control unit 210 that processes data, a storage unit 220 that stores data, an input unit 201 that accepts input of information, an output unit 202 that outputs information, and a communication unit 203 that transmits data to and receives data from external devices. The control unit 210 consists of, for example, an arithmetic device such as a CPU (Central Processing Unit); the input unit 201 consists of an input device such as a keyboard; the output unit 202 consists of a display device such as an LCD (Liquid Crystal Display); and the communication unit 203 consists of a communication device such as a NIC (Network Interface Controller). The control unit 210 includes an overall processing unit 211 and a similarity determination calculation processing unit 212.

The overall processing unit 211 performs overall control of the processing in the similarity search server 200, stores information accepted via the input unit 201 in the storage unit 220, and controls the transmission and reception of information to and from the user terminals 300 to 500. The overall processing unit 211 also displays data on the output unit 202, and reads data stored in the storage unit 220 and transmits it to the user terminals 300 to 500 via the communication unit 203.

The similarity determination calculation processing unit 212 determines the similarity between the search data received from the user terminals 300 to 500 and the data to be searched that is stored in advance in the similarity search server 200, and extracts and outputs, from the data to be searched, the data whose similarity is at or above a certain threshold. Specifically, the similarity determination calculation processing unit 212 determines the similarity between the items indicating qualitative data in the search data and the items indicating qualitative data in the data to be searched. For example, when 70% or more of the qualitative items in the search data match the qualitative items in the data to be searched, the two are determined to be similar. This determination is performed while the items remain encrypted with the encryption key for qualitative items.
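The 70% match rule for qualitative items can be sketched as follows. This is only an illustration under stated assumptions: a keyed HMAC is used here as a simple deterministic stand-in that permits equality checks on concealed values, and it is not the searchable encryption scheme the specification refers to. The function names, field names, and key are hypothetical.

```python
import hashlib
import hmac

def token(key: bytes, field: str, value: str) -> str:
    """Deterministic concealed token for one qualitative item (stand-in only)."""
    message = f"{field}={value}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def conceal(key: bytes, record: dict) -> dict:
    """Conceal every qualitative item of a record, field by field."""
    return {field: token(key, field, value) for field, value in record.items()}

def match_ratio(query: dict, stored: dict) -> float:
    """Fraction of the query's concealed items that equal the stored ones."""
    if not query:
        return 0.0
    hits = sum(
        1 for field, tok in query.items()
        if hmac.compare_digest(tok, stored.get(field, ""))
    )
    return hits / len(query)

def is_similar(query: dict, stored: dict, threshold: float = 0.7) -> bool:
    """Apply the example rule: similar when at least 70% of items match."""
    return match_ratio(query, stored) >= threshold

key = b"demo-key"  # hypothetical key held by the user terminals
query = conceal(key, {"sex": "F", "disease": "diabetes",
                      "prefecture": "Tokyo", "occupation": "nurse"})
close = conceal(key, {"sex": "F", "disease": "diabetes",
                      "prefecture": "Tokyo", "occupation": "teacher"})
far = conceal(key, {"sex": "M", "disease": "diabetes",
                    "prefecture": "Osaka", "occupation": "nurse"})
```

The server-side functions `match_ratio` and `is_similar` only compare tokens and never see the underlying values, which mirrors the idea of judging qualitative similarity while the items stay encrypted.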

The similarity determination calculation processing unit 212 further narrows down the data to be searched to the data determined to be similar. It then computes the inner product between the items indicating quantitative data in the narrowed-down data and the items indicating quantitative data in the search data, the latter being encrypted with the encryption key for quantitative data, and extracts from the narrowed-down data the data whose inner product is at or above a certain threshold. For example, when the inner product between the quantitative items in the search data and the quantitative items in the narrowed-down data is 0.7 or more, that data is extracted from the narrowed-down data. This determination is performed while the items remain encrypted with the encryption key for quantitative data. The extracted data is transmitted by the overall processing unit 211 via the communication unit 203 as search result data 3303.
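The second narrowing-down step can be sketched in plaintext as a threshold filter on inner products. Encryption is deliberately omitted here, so this shows only the selection logic, not the concealed computation; the record identifiers, vectors, and the function name `filter_by_inner_product` are illustrative assumptions.

```python
def inner(a, b):
    """Ordinary inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def filter_by_inner_product(query_vec, candidates, threshold=0.7):
    """Keep the candidates whose inner product with the query meets the threshold.

    `candidates` is a list of (record_id, vector) pairs that already passed
    the qualitative-item check.
    """
    return [
        record_id
        for record_id, vec in candidates
        if inner(query_vec, vec) >= threshold
    ]

# Hypothetical unit-length quantitative vectors.
query_vec = [0.6, 0.8]
candidates = [
    ("p1", [0.8, 0.6]),  # inner product about 0.96
    ("p2", [1.0, 0.0]),  # inner product 0.6
    ("p3", [0.0, 1.0]),  # inner product 0.8
]

selected = filter_by_inner_product(query_vec, candidates)
```

Running the qualitative check first keeps this inner product step, which is the expensive one when performed on ciphertexts, confined to a small candidate set.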

The data storage unit 230 of the storage unit 220 stores the data 231 that the similarity determination calculation processing unit 212 extracts from the data to be searched. The temporary information storage unit 240 stores information that is temporarily needed in the processing executed by the control unit 210. The processing performed by each unit of the similarity search server 200 is described later with reference to flowcharts.

(User terminal)
FIG. 3 is a diagram showing a configuration example of the user terminal 300 according to the first embodiment. Because all user terminals have the same configuration, the configuration of the user terminal 300 is described below as an example. A user terminal that does not perform encryption need not include the units required for encryption and decryption, such as the encryption processing unit 312, the decryption processing unit 313, the similarity determination processing unit 314, and the encryption/decryption key storage unit 340.

As illustrated, the user terminal 300 includes a control unit 310 that processes data, a storage unit 320 that stores data, an input unit 301 that accepts input of information, an output unit 302 that outputs information, and a communication unit 303 that transmits data to and receives data from external devices. Each of these units consists of a device with the same function as the corresponding unit of the similarity search server 200.

The control unit 310 includes an overall processing unit 311, an encryption processing unit 312, a decryption processing unit 313, and a similarity determination processing unit 314.

The overall processing unit 311 performs overall control of the processing in the user terminal 300, stores information accepted via the input unit 301 in the storage unit 320, and controls the transmission and reception of information to and from the similarity search server 200. The overall processing unit 311 also displays data on the output unit 302, and reads data stored in the storage unit 320 and has the encryption processing unit 312 encrypt data or the decryption processing unit 313 decrypt encrypted data. In addition, based on the information accepted by the input unit 301, the overall processing unit 311 generates search data for searching the data to be searched for data similar to the search target data selected in advance by the user. This search data is the pre-encryption data 3301. The overall processing unit 311 then transmits the post-encryption data 3302, obtained when the encryption processing unit 312 encrypts the pre-encryption data 3301, to the similarity search server 200 via the communication unit 303.

The encryption processing unit 312 encrypts the items indicating qualitative data in the pre-encryption data 3301 with the encryption key 341 for qualitative items, and generates the post-encryption data 3302. The encryption processing unit 312 also encrypts, with the encryption key 341 for quantitative items, the set of basis vectors obtained by the similarity determination processing unit 314. These basis vectors constitute a target vector, obtained by vectorizing the items indicating quantitative data, that satisfies a threshold used to determine that its inner product with a candidate vector, obtained by vectorizing the items indicating quantitative data in the data to be searched, is at or above a certain value. Satisfying this threshold is the condition for the two to have at least a certain degree of similarity.

The decryption processing unit 313 decrypts, with the decryption key 342, the items indicating qualitative data and the items indicating quantitative data in the search result data 3303 received from the similarity search server 200, and generates decrypted data 3304.

The similarity determination processing unit 314 executes linear decomposition processing, which obtains the set of basis vectors constituting a target vector that satisfies the above threshold, and outputs the set of basis vectors together with the set of scalar values that are the coefficients of those basis vectors.
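The idea of splitting a target vector into basis vectors and scalar coefficients can be sketched as follows. The specification does not state in this passage which basis is used, so this sketch assumes the standard basis purely for illustration; `linear_decompose`, `recombine`, and `inner_via_decomposition` are hypothetical names.

```python
def inner(a, b):
    """Ordinary inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def linear_decompose(vector):
    """Split a vector into (basis vectors, scalar coefficients).

    Assumption for this sketch: the basis is the standard basis, and
    components equal to zero are skipped.
    """
    dim = len(vector)
    basis, coeffs = [], []
    for i, c in enumerate(vector):
        if c == 0:
            continue
        e = [0] * dim
        e[i] = 1
        basis.append(e)
        coeffs.append(c)
    return basis, coeffs

def recombine(basis, coeffs, dim):
    """Rebuild the vector as the sum of coefficient * basis vector."""
    out = [0] * dim
    for e, c in zip(basis, coeffs):
        out = [o + c * x for o, x in zip(out, e)]
    return out

def inner_via_decomposition(basis, coeffs, other):
    """Use linearity: <sum c_i e_i, b> = sum c_i <e_i, b>."""
    return sum(c * inner(e, other) for e, c in zip(basis, coeffs))

target = [2, 0, 5]    # hypothetical target vector
other = [1, 4, 3]     # hypothetical candidate vector
basis, coeffs = linear_decompose(target)
```

By linearity, the inner product with the target vector can be assembled from inner products with the individual basis vectors and the scalar coefficients, which is what makes the decomposed form usable for the threshold check.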

The storage unit 320 includes a data storage unit 330, an encryption/decryption key storage unit 340, and a temporary information storage unit 350. The data storage unit 330 stores the pre-encryption data 3301, the post-encryption data 3302, the search result data 3303, which is the data found by the similarity search server 200 to be similar to the pre-encryption data 3301, and the decrypted data 3304, which is the search result data 3303 decrypted with the decryption key 342.

The encryption/decryption key storage unit 340 stores the encryption key for encrypting the pre-encryption data 3301 and the decryption key 342 for decrypting the search result data 3303. Separate encryption keys may be stored for qualitative data and for quantitative data, or a single encryption key may be stored for both. Likewise, separate decryption keys may be stored for qualitative data and for quantitative data, or a single decryption key may be stored for both. As in the similarity search server 200, the temporary information storage unit 350 stores information that is temporarily needed in the processing executed by the control unit 310.

In the following, it is assumed that the user terminals 300 to 500 have transmitted the data to be searched to the similarity search server 200 in advance, and that the similarity search server 200 has stored it in the data storage unit 230 as the data 231. In the first embodiment, the description assumes that, in the data 231, qualitative data is recorded as plaintext or ciphertext and quantitative data is recorded as plaintext.

(Data classification)
FIG. 4 is a table classifying the search target data for which the user terminal 300 searches the data to be searched. The search target data handled by this system contains items indicating qualitative data and items indicating quantitative data, so the two categories are explained below. Qualitative data is data such as sex, address, or occupation that serves only to distinguish classes or types and cannot be operated on numerically. Quantitative data is data such as numerical results of hospital tests, counts, or amounts of money, for which the magnitude of the value is meaningful and which can be operated on numerically.

In this system, as shown in FIG. 4, when the data can be disclosed to others (for example, persons other than the patient receiving treatment and the patient's family), both the items indicating qualitative data and the items indicating quantitative data in the search target data are used without encryption as search data for narrowing down the data to be searched (S401, S402).

On the other hand, when the data may not be disclosed to others, items representing qualitative data are encrypted with an encryption scheme that allows match/mismatch determination between two values, and are used as search data (S403). A searchable encryption technique (Japanese Patent No. 5412414), for example, can be used as such a scheme. Items representing quantitative data are encrypted with an encryption scheme that allows the similarity of two values to be determined while they remain concealed, and are used as search data (S404). Any of various known homomorphic encryption techniques, for example, can be used as such a scheme. Whether each item in the search data may be disclosed can be decided by disclosure criteria, for example whether the consent of the patient or the patient's family could not be obtained, or whether the item is itself highly confidential for research purposes.
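The four cases of FIG. 4 can be summarized as a small lookup. The following is only an illustrative sketch: the names `Kind` and `handling_for` and the returned label strings are hypothetical and do not appear in the specification.

```python
from enum import Enum


class Kind(Enum):
    QUALITATIVE = "qualitative"
    QUANTITATIVE = "quantitative"


def handling_for(kind, disclosable):
    """Return a label describing how one item is prepared as search data.

    The labels are illustrative; only the step numbers come from FIG. 4.
    """
    if disclosable:
        # Disclosable items are used as plaintext search data (S401, S402).
        return "plaintext (S401, S402)"
    if kind is Kind.QUALITATIVE:
        # Non-disclosable qualitative items: match/mismatch-testable encryption.
        return "match-testable encryption (S403)"
    # Non-disclosable quantitative items: similarity can be judged while concealed.
    return "similarity-testable encryption (S404)"
```

For instance, `handling_for(Kind.QUALITATIVE, False)` selects the S403 branch, while any disclosable item falls through to the plaintext case regardless of its kind.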

(Narrowing example)
FIG. 5 is a flowchart showing the procedure by which the user terminal 300 uses search target data to search the stored data to be searched for similar data, and the similarity search server 200 narrows down the results. In the example below, the test data of a certain patient is the search target data, and the test data of patients similar to that patient is the data to be searched. Both the search target data and the data to be searched include one or more items representing qualitative data (for example, the patient's name, address, sex, and disease name) and one or more items representing quantitative data (for example, the patient's age and test values).

First, in the user terminal 300, the overall processing unit 311 extracts the items representing qualitative data and quantitative data from the search target data to generate the search data, and stores the generated search data as pre-encryption data 3301 (S501). If the extracted items do not need to be encrypted, the process proceeds to S505.

Next, the encryption processing unit 312 encrypts the items representing qualitative data in the pre-encryption data 3301 with the encryption key 341 to generate post-encryption data 3302 (S502). If the search data contains no item representing quantitative data, the process proceeds to S505.

Next, the similarity determination processing unit 314 executes the linear decomposition processing and outputs a set of basis vectors and a set of scalar values that are the coefficients of those basis vectors (S503). The details of the linear decomposition processing are described later with reference to FIG. 6.

The encryption processing unit 312 encrypts, with the encryption key 341, the set of basis vectors and the set of scalar values output by the linear decomposition processing (S504).

The overall processing unit 311 transmits the data generated in S501 to S504 to the similarity search server 200 (S505). Steps S501 to S504 need not all be executed: when the search data is plaintext only, contains only items representing qualitative data, or contains only items representing quantitative data, the steps can be selected as described above. S502 and S503 may be processed in either order.
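The step selection on the user terminal can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `client_steps` and its two boolean parameters are hypothetical, and the sketch fixes the order S502 before S503 even though either order is permitted.

```python
def client_steps(encrypt_qualitative, has_concealed_quantitative):
    """List the client-side steps of FIG. 5 that run for one query.

    encrypt_qualitative: True if qualitative items must be encrypted (S502).
    has_concealed_quantitative: True if quantitative items must be
    decomposed and encrypted (S503, S504).
    """
    steps = ["S501"]  # build the search data and store it as pre-encryption data
    if encrypt_qualitative:
        steps.append("S502")
    if has_concealed_quantitative:
        steps.extend(["S503", "S504"])
    steps.append("S505")  # send whatever was generated to the server
    return steps
```

A plaintext-only query therefore runs only S501 and S505, and a query with both concealed qualitative and concealed quantitative items runs all five steps.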

In the similarity search server 200, the similarity determination calculation processing unit 212 narrows down the data to be searched using the data received from the user terminal 300 as a key (S506). For example, when the received data includes items representing qualitative data, the unit determines that the two are similar if those items match the items representing qualitative data in the data to be searched by 70% or more, and narrows down the data to be searched accordingly. For character string data, for example, the narrowing uses exact match, partial match, or synonyms. In the synonym case, a dictionary for looking up synonyms may be stored in the similarity search server 200 in advance, and the two may be determined to be similar when the degree of match between the dictionary and the received data is about the same as the degree of match between the dictionary and the data to be searched.
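One possible reading of the 70% criterion is "at least 70% of the query's qualitative items match the stored record exactly". The sketch below implements that reading only; it is an assumption, as are the names `match_ratio` and `narrow_by_qualitative` and the record layout. Because only equality is tested, the compared values could equally be plaintext strings or match-testable ciphertext tokens.

```python
def match_ratio(query_items, record_items):
    """Fraction of the query's qualitative items that the record matches exactly."""
    if not query_items:
        return 0.0
    hits = sum(1 for key, value in query_items.items()
               if record_items.get(key) == value)
    return hits / len(query_items)


def narrow_by_qualitative(query_items, records, min_ratio=0.7):
    """Keep records whose qualitative items match the query at or above min_ratio."""
    return [record for record in records
            if match_ratio(query_items, record["qualitative"]) >= min_ratio]
```

With three query items, a record matching two of them scores about 0.67 and is dropped under the default ratio, while a record matching all three is kept. Partial-match and synonym-based matching are not covered by this sketch.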

The similarity determination calculation processing unit 212 calculates the inner product of the items representing quantitative data in the data to be searched narrowed down in S506 and the items representing quantitative data in the received data, and extracts, from the narrowed-down data, the data whose inner product is at or above a threshold (S507). For example, it extracts all data for which the inner product is 0.7 or more.
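The extraction in S507 can be illustrated on plaintext vectors. The sketch assumes that the vectors are scaled to unit length before the inner product is taken, so that a threshold such as 0.7 is meaningful; the specification does not state this normalization, and the names `normalize`, `inner`, and `extract_similar` are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (assumed preprocessing, not from the source)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("zero vector cannot be normalized")
    return [x / norm for x in v]


def inner(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def extract_similar(query_vec, candidates, threshold=0.7):
    """Keep candidates whose normalized inner product with the query meets the threshold."""
    q = normalize(query_vec)
    return [c for c in candidates
            if inner(q, normalize(c["quantitative"])) >= threshold]
```

Under this assumption the score is the cosine of the angle between the two vectors, so a candidate at 45 degrees to the query (about 0.707) is kept and an orthogonal one is dropped.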

The overall processing unit 211 then transmits the extracted data as search result data 3303 via the communication unit 203 (S508).

The decryption processing unit 313 of the user terminal 300 decrypts the items representing qualitative data and the items representing quantitative data in the search result data 3303 received from the similarity search server 200 with the decryption key 342 to generate decrypted data 3304 (S509). The overall processing unit 311 of the user terminal 300 then outputs the decrypted data 3304 to the output unit 302, and the user evaluates the result (S510).

(Linear decomposition processing)
FIG. 6 is a flowchart showing the procedure of the linear decomposition processing in S503 of FIG. 5. The user terminal 300 holds a vector X as the quantitative data to be concealed (S404). The similarity determination processing unit 314 of the user terminal 300 first determines natural numbers m and t (S601), and then randomly generates t linear decompositions of the vector X, each using m basis vectors (S602); that is, for each i = 1, ..., t, X is expressed as a linear combination of basis vectors G_i1, ..., G_im with scalar coefficients a_i1, ..., a_im. Some of the scalars a_ij may be 0.

Next, the similarity determination processing unit 314 calculates, from the threshold that the inner product with the vector X must satisfy, the basis vectors G_ij making up X and the associated scalar values, that is, the inner-product condition that satisfies the threshold (S603).

The similarity determination processing unit 314 takes the set of vectors {G_ij} generated in S602 and the scalar values derived in S603 as the data output in S503 of FIG. 5, and stores the scalars {a_ij} in the encryption/decryption key storage unit 340 in secret, without disclosing them to others (S604). When S604 finishes, the linear decomposition processing ends.
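The random decomposition of S602 can be sketched as follows. The specification does not say how the decompositions are drawn, so the construction here is an assumption: the first m-1 basis vectors and all scalars are random, and the last basis vector is chosen to absorb the remainder. For simplicity every scalar is nonzero, although the text allows zeros. All function names are hypothetical, and the sketch does not attempt the condition derivation of S603; it only shows that the inner product with X can be recovered from inner products with the basis vectors.

```python
import random


def inner(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def random_decomposition(x, m, rng):
    """Return (a, G) such that sum_j a[j] * G[j] equals x up to rounding."""
    if m < 1:
        raise ValueError("m must be a natural number")
    n = len(x)
    # Random nonzero scalars (a simplification; the source allows zeros).
    a = [rng.uniform(0.5, 2.0) * rng.choice((-1, 1)) for _ in range(m)]
    # m-1 random basis vectors.
    G = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(m - 1)]
    # The last basis vector absorbs whatever the others leave over.
    last = [(x[k] - sum(a[j] * G[j][k] for j in range(m - 1))) / a[m - 1]
            for k in range(n)]
    G.append(last)
    return a, G


def decompose(x, m, t, seed=None):
    """Generate t independent decompositions of x, each with m basis vectors."""
    rng = random.Random(seed)
    return [random_decomposition(x, m, rng) for _ in range(t)]


def inner_via_decomposition(a, G, y):
    """Recover <x, y> from the per-basis inner products <G[j], y>."""
    return sum(a_j * inner(g, y) for a_j, g in zip(a, G))
```

By linearity, `inner_via_decomposition(a, G, y)` agrees with `inner(x, y)` for every decomposition, which is why a party holding only the basis vectors cannot complete the computation without the secret scalars.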

(Summary of the first embodiment)
According to the first embodiment, similarity is determined between the qualitative data in the search target data and the qualitative data in the data to be searched while both remain encrypted, and data at or above a predetermined threshold is extracted. The extracted data is then narrowed down to data whose quantitative data has an inner product with the quantitative data of the search target data at or above a predetermined threshold, and the result is returned to the client. A user (or several users) can therefore deposit plaintext or encrypted data with the similarity search server, and a given user (user terminal 300) can run an efficient similarity search while disclosing part of their own data and keeping the rest concealed. In other words, similar data can be searched for efficiently in a large volume of data deposited with a third party such as a cloud service, while the user's own data stays concealed.

Parts that can be disclosed in plaintext are narrowed down with conventional similarity determination as-is. For parts that cannot be disclosed, qualitative data is narrowed down with an encryption scheme that allows match/mismatch determination on encrypted data, and quantitative data is narrowed down by sending the additional data needed for similarity determination. The first embodiment described a method that uses a random linear decomposition of the data to be concealed as that additional data, but other methods may be used.

<< Second Embodiment >>
Next, a second embodiment of the similarity concealment search system 1000 is described with reference to FIGS. 7 and 8.

The second embodiment adds two procedures to the first embodiment: one in which the user terminal 300 (which may include the user terminals 400 and 500; the same applies below) encrypts the quantitative portion it wants concealed of the data deposited in advance, and one in which the vector X, the data for the similarity search, is encrypted and sent and the similarity search server 200 executes the similarity search processing.

In the user terminal 300, the similarity determination processing unit 314 randomly generates, for the plural quantitative data to be deposited with the similarity search server 200, an orthogonal transformation T (S701), a scalar r (S702), and a vector W (S703). These may be generated in any order, and W may be the zero vector.

Next, the similarity determination processing unit 314 regards the quantitative portions {Y_i} to be concealed of the data to be deposited as vectors, applies the affine transformation Y'_i = rTY_i + W to each, and transmits {Y'_i} together with the other data to the similarity search server 200 for deposit (S705). The similarity determination calculation processing unit 212 of the similarity search server 200 stores the data received from the user terminal 300 in the storage unit 220 (S706).
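The deposit-side transformation Y'_i = rTY_i + W can be sketched in plain Python. The specification does not say how T, r, and W are drawn, so the choices here are assumptions: T is built as a product of Givens rotations with random angles (orthogonal, though not uniformly distributed over all orthogonal matrices), r is a positive scalar, and W has entries in [-1, 1]. The names `random_orthogonal`, `make_key`, and `conceal` are hypothetical.

```python
import math
import random


def random_orthogonal(n, rng):
    """Build an n x n orthogonal matrix as a product of random Givens rotations."""
    T = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for p in range(n):
        for q in range(p + 1, n):
            theta = rng.uniform(0.0, 2.0 * math.pi)
            c, s = math.cos(theta), math.sin(theta)
            # Left-multiply T by a rotation in the (p, q) plane.
            for col in range(n):
                tp, tq = T[p][col], T[q][col]
                T[p][col] = c * tp - s * tq
                T[q][col] = s * tp + c * tq
    return T


def mat_vec(T, v):
    """Matrix-vector product."""
    return [sum(t * x for t, x in zip(row, v)) for row in T]


def make_key(n, seed=None):
    """Draw the secret (T, r, W) corresponding to S701 to S703."""
    rng = random.Random(seed)
    T = random_orthogonal(n, rng)
    r = rng.uniform(0.5, 2.0)                       # assumed positive, nonzero
    W = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # may also be all zeros
    return T, r, W


def conceal(y, key):
    """Apply Y' = r * T * Y + W to one quantitative vector."""
    T, r, W = key
    return [r * ty + w for ty, w in zip(mat_vec(T, y), W)]
```

Because T is orthogonal, distances between deposited vectors are preserved up to the factor r, which is one reason similarity can still be judged on the transformed data.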

When the user terminal 300 conceals, as a vector X, the quantitative data it wants kept secret for a similarity search against the data to be searched, the similarity determination processing unit 314 of the user terminal 300 in the second embodiment first randomly generates a scalar s (S801), applies the orthogonal transformation and affine transformation to calculate X' = sTX (S802), and calculates, from the threshold that the inner product with the vector X must satisfy, each Y'_i and scalar value, that is, the inner-product condition that satisfies the threshold (S803). The user terminal 300 transmits X' and the condition obtained in S803 to the similarity search server 200 (S804). In other words, the similarity determination processing unit 314 regards the quantitative data included in the search target data as a vector, applies the orthogonal and affine transformations of S802 to it, and calculates, as the condition, the transformed vector and the inner product value that the transformed vector must satisfy for the search target data and the data to be searched to be determined similar.

Then, as described below, the similarity determination calculation processing unit 212 of the similarity search server 200 outputs, as the similar data, the data to be searched that includes quantitative data satisfying the inner product value, from among the data to be searched determined to be similar on the basis of the qualitative data.

The similarity determination calculation processing unit 212 of the similarity search server 200 calculates the inner product of each recorded Y'_i with X', extracts those that meet the condition calculated in S803 to narrow down the data to be searched, and generates a list of them (S805). It then executes the same processing as S507 onward in FIG. 5 and transmits the extracted data (S806; S507 and S508 in FIG. 5).
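The specification does not spell out the condition of S803. One way such a condition could be obtained, assuming the same T is used for the deposit and for the query, that rs > 0, and writing \theta (a symbol introduced here) for the threshold on the plaintext inner product, follows from the orthogonality of T:

```latex
\begin{align*}
\langle Y'_i, X' \rangle
  &= \langle rTY_i + W,\; sTX \rangle \\
  &= rs\,\langle TY_i, TX \rangle + s\,\langle W, TX \rangle \\
  &= rs\,\langle Y_i, X \rangle + s\,\langle W, TX \rangle
     && \text{since } T^{\top}T = I, \\[4pt]
\langle Y_i, X \rangle \ge \theta
  &\iff \langle Y'_i, X' \rangle \ge rs\,\theta + s\,\langle W, TX \rangle
     && \text{for } rs > 0 .
\end{align*}
```

Under these assumptions the right-hand side is a single number that the user terminal can compute from its secrets, so the server can test each stored Y'_i against it without learning r, s, T, W, or X.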

(Summary of the second embodiment)
According to the second embodiment, the portion of the quantitative data that the user wants concealed can also be deposited in encrypted form in advance and narrowed down while still concealed, which enables an efficient similarity search.

100: Network
200: Similarity search server
201: Input unit
202: Output unit
203: Communication unit
210: Control unit
211: Overall processing unit
212: Similarity determination calculation processing unit
220: Storage unit
230: Data storage unit
231: Data
240: Temporary information storage unit
300, 400, 500: User terminal
301: Input unit
302: Output unit
303: Communication unit
310: Control unit
311: Overall processing unit
312: Encryption processing unit
313: Decryption processing unit
314: Similarity determination processing unit
320: Storage unit
330: Data storage unit
340: Encryption/decryption key storage unit
341: Encryption key
342: Decryption key
350: Temporary information storage unit

Claims

A similarity concealment search system that searches, in accordance with a search request, for data to be searched that is similar to concealed search target data, wherein
a user terminal comprises:
an input unit that receives input of the search target data including qualitative data or quantitative data;
an encryption processing unit that encrypts the qualitative data included in the search target data using an encryption scheme that allows match/mismatch determination;
a similarity determination processing unit that calculates, based on the quantitative data included in the search target data and the quantitative data included in the data to be searched, a condition for the two to have at least a certain similarity; and
a communication unit that transmits the encrypted qualitative data and the condition to a similarity search server for searching the data to be searched, and
the similarity search server comprises:
a server communication unit that receives the qualitative data and the condition from the user terminal, or transmits to the user terminal the data to be searched determined to be similar to the search target data; and
a similarity determination calculation processing unit that determines, based on the qualitative data received from the user terminal and the qualitative data included in the data to be searched stored in advance in a storage unit, whether the two are similar and, when the two are determined to be similar, outputs, as data similar to the search target data, the data to be searched that includes quantitative data satisfying the condition from among the data to be searched determined to be similar.

The similarity concealment search system according to claim 1, wherein
the similarity determination processing unit of the user terminal regards the quantitative data as a vector, generates a set of one or more basis vectors and scalar values constituting the vector, and calculates, as the condition, the generated basis vectors and the inner product value that the basis vectors must satisfy for the search target data and the data to be searched to be determined similar, and
the similarity determination calculation processing unit of the similarity search server outputs, as the data, the data to be searched that includes the quantitative data satisfying the inner product value, from among the data to be searched determined to be similar based on the qualitative data.

The similarity concealment search system according to claim 2, wherein
the similarity determination processing unit of the user terminal regards the quantitative data included in the search target data as a vector, applies an orthogonal transformation and an affine transformation to the vector, and calculates, as the condition, the transformed vector and the inner product value that the transformed vector must satisfy for the search target data and the data to be searched to be determined similar, and
the similarity determination calculation processing unit of the similarity search server outputs, as the data, the data to be searched that includes the quantitative data satisfying the inner product value, from among the data to be searched determined to be similar based on the qualitative data.

The similarity concealment search system according to claim 1, wherein
the input unit receives input of the qualitative data or the quantitative data including plaintext that is not concealed,
the encryption processing unit performs no processing when the qualitative data is the plaintext, and
the similarity determination processing unit calculates the condition based on the quantitative data included in the search target data and the quantitative data included in the data to be searched.

A similarity concealment search method for searching, in accordance with a search request, for data to be searched that is similar to concealed search target data, the method comprising:
an input step of receiving input of the search target data including qualitative data or quantitative data from a user terminal;
an encryption processing step of encrypting the qualitative data included in the search target data using an encryption scheme that allows match/mismatch determination;
a similarity determination processing step of calculating, based on the quantitative data included in the search target data and the quantitative data included in the data to be searched, a condition for the two to have at least a certain similarity;
a step of transmitting the encrypted qualitative data and the condition to a similarity search server for searching the data to be searched;
a step of receiving the qualitative data and the condition from the user terminal;
a determination step of determining, based on the qualitative data received from the user terminal and the qualitative data included in the data to be searched stored in advance in a storage unit, whether the two are similar;
a similarity determination calculation processing step of outputting, when the two are determined to be similar, as data similar to the search target data, the data to be searched that includes quantitative data satisfying the condition from among the data to be searched determined to be similar; and
a server transmission step of transmitting to the user terminal the data to be searched determined to be similar to the search target data.

The similarity concealment search method according to claim 5, wherein
in the similarity determination processing step, the quantitative data is regarded as a vector, a set of one or more basis vectors and scalar values constituting the vector is generated, and the generated basis vectors and the inner product value that the basis vectors must satisfy for the search target data and the data to be searched to be determined similar are calculated as the condition, and
in the similarity determination calculation processing step, the data to be searched that includes the quantitative data satisfying the inner product value, from among the data to be searched determined to be similar based on the qualitative data, is output as the data.

The similarity concealment search method according to claim 6, wherein
in the similarity determination processing step, the quantitative data included in the search target data is regarded as a vector, an orthogonal transformation and an affine transformation are applied to the vector, and the transformed vector and the inner product value that the transformed vector must satisfy for the search target data and the data to be searched to be determined similar are calculated as the condition, and
in the similarity determination calculation processing step, the data to be searched that includes the quantitative data satisfying the inner product value, from among the data to be searched determined to be similar based on the qualitative data, is output as the data.

The similarity concealment search method according to claim 5, wherein
in the input step, input of the qualitative data or the quantitative data including plaintext that is not concealed is received,
in the encryption processing step, no processing is performed when the qualitative data is the plaintext, and
in the similarity determination processing step, the condition for the two to have at least a certain similarity is calculated based on the plaintext quantitative data included in the search target data and the quantitative data included in the data to be searched.