JP2013210933A

JP2013210933A - Recommendation support method, recommendation support device and program

Info

Publication number: JP2013210933A
Application number: JP2012081872A
Authority: JP
Inventors: Takayasu Yamaguchi; 高康山口; Masayuki Terada; 雅之寺田
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2012-03-30
Filing date: 2012-03-30
Publication date: 2013-10-10

Abstract

PROBLEM TO BE SOLVED: To rapidly provide highly accurate recommendations while preventing leakage of information.
SOLUTION: A server 10 extracts the user identifiers for each attribute and encrypts them (a first arithmetic result). A recommendation support device 30 extracts the user identifiers for each option and encrypts them (a second arithmetic result). The server 10 encrypts the second arithmetic result (a third arithmetic result). The recommendation support device 30 encrypts the first arithmetic result (a fourth arithmetic result). For all combinations of attributes and options, the recommendation support device 30 obtains a first summary value by tallying the number of ciphertexts that match between the third and fourth arithmetic results. It also generates multiple sets by randomly excluding one or more ciphertexts from the ciphertexts that match between the third and fourth arithmetic results and, for each of the sets, obtains a second summary value by tallying the number of ciphertexts that match the fourth arithmetic result. It then calculates a parameter for obtaining a product recommendation value on the basis of the first and second summary values.

Description

The present invention relates to a technique for making recommendations according to the attributes of a user.

Providing information on products compactly and quickly is important in selling products. In recent years, with faster computers, larger storage devices, and advances in statistical processing techniques, attention has turned to the usefulness of product recommendations based on statistical analysis of customer attributes and purchasing behavior (hereinafter referred to as recommendations).
However, small stores do not always hold enough sales records to make appropriate recommendations. When sales records are scarce, a store easily falls into the situation known as the cold start problem, in which recommendation accuracy drops. In addition, stores are visited by new customers for whom no sales record exists.

In such an environment, it is considered useful for a store to make recommendations in cooperation with an organization that holds a large amount of information. For example, when a business operator that manages personal attribute information (gender, address, age, etc.), such as a telecommunications carrier providing mobile phone services or an issuer of electronic money, cooperates with a store that holds sales records of products, recommendations can be made according to the attributes of visitors to the store. However, with the growing demand for privacy protection, personal attribute information must be handled with care.
Therefore, privacy-preserving data mining (PPDM) techniques using encryption, statistical disclosure control, and the like have been proposed. Appropriate use of PPDM is expected to make it possible to provide recommendations safely across organizational boundaries. For example, Non-Patent Documents 1 and 2 propose techniques for performing classification by a Bayesian approach over vertically partitioned databases without the databases disclosing their data to one another.

Vaidya, J., Kantarcioglu, M. and Clifton, C.: Privacy-preserving Naive Bayes classification, The VLDB Journal, Vol. 17, pp. 879-898 (2008).
Hiroaki Kikuchi, Daisuke Kagawa, Kazuhiko Ishii, Masayuki Terada, Nobuyuki Hongo: Consideration of Inter-organizational Privacy Protection Data Mining, IEICE Symposium on Cryptography and Information Security 2010, pp. 1-5 (2010).
Thomas P. Minka: Estimating a Dirichlet distribution, 2000 (revised 2003, 2009), http://research.microsoft.com/en-us/um/people/minka/papers/dirichlet/

However, it is not easy in practice to make recommendations to visitors using the above conventional techniques. The main reasons are the problem of protecting visitor privacy, the problem of accuracy, and the problem of the computational cost of learning and classification. Specifically: 1) the conventional techniques prevent disclosure of data between the provider and the store, but do not sufficiently protect the privacy of the visitor who receives the recommendation; 2) methods for improving classification accuracy, such as smoothing optimization, are difficult to apply, so a drop in recommendation accuracy due to overfitting and the like is unavoidable; and 3) the store must perform computationally expensive multi-party computation with the provider every time it makes a recommendation to a visitor.
Therefore, an object of the present invention is to provide recommendations quickly and with high accuracy while preventing leakage of information.

The present invention provides a recommendation support method executed by a server that stores user information in which a user identifier for identifying a user of a terminal is associated with attribute information representing an attribute of that user, a recommendation support device that stores a selection history in which an option selected by the user is associated with the user identifier corresponding to that user, and a terminal that stores attribute information representing an attribute of its own user. The method comprises: a first calculation step in which the server extracts, for each attribute, the user identifiers associated with that attribute from the user information and transmits to the recommendation support device a first calculation result obtained by encrypting those user identifiers with a first encryption method; a second calculation step in which the recommendation support device extracts, for each option, the user identifiers associated with that option from the selection history and transmits to the server a second calculation result obtained by encrypting those user identifiers with the first encryption method; a third calculation step in which the server transmits to the recommendation support device a third calculation result obtained by encrypting, with the first encryption method, the second calculation result received from the recommendation support device; a fourth calculation step in which the recommendation support device calculates a fourth calculation result by encrypting, with the first encryption method, the first calculation result received from the server; a first totaling step in which the recommendation support device obtains a first total value by counting, for all combinations of the attributes and the options, the number of ciphertexts that match between the third calculation result and the fourth calculation result; a second totaling step in which the recommendation support device, for all combinations of the attributes and the options, creates a plurality of sets by randomly removing one or more ciphertexts from the ciphertexts that match between the third calculation result and the fourth calculation result, and obtains a second total value by counting, for each of the plurality of sets, the number of ciphertexts that match the fourth calculation result; and a parameter calculation step in which the recommendation support device calculates, on the basis of the first total value and the second total value, a parameter for obtaining a recommendation value corresponding to the product.
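The two totaling steps can be illustrated with a minimal Python sketch. It treats the doubly encrypted values as opaque strings and is not the patented implementation. The function names, the sample data, and in particular the reading of the second totaling step (removal applied per option, then re-counting against every attribute) are assumptions made for illustration; the source does not spell out these details.

```python
import random

def first_total(third, fourth):
    """First total value: number of matching ciphertexts per (attribute, option)."""
    return {(a, c): len(fourth[a] & third[c]) for a in fourth for c in third}

def second_totals(third, fourth, n_sets, rng):
    """Second total values: re-count after randomly removing matched ciphertexts.

    Assumed reading: removal is applied to each option's matched ciphertexts,
    and every thinned set is compared against each attribute's fourth result.
    """
    known = set().union(*fourth.values())
    totals = []
    for _ in range(n_sets):
        counts = {}
        for c, ciphertexts in third.items():
            matched = sorted(ciphertexts & known)
            # Remove at least one ciphertext (zero only if nothing matched).
            n_drop = rng.randint(1, max(1, len(matched) - 1)) if matched else 0
            kept = set(rng.sample(matched, len(matched) - n_drop))
            for a in fourth:
                counts[(a, c)] = len(fourth[a] & kept)
        totals.append(counts)
    return totals

# Hypothetical doubly encrypted identifiers, shown as opaque strings.
fourth = {"20s": {"c1", "c2", "c3"}, "30s": {"c4", "c5"},
          "male": {"c1", "c4"}, "female": {"c2", "c3", "c5"}}
third = {"book A": {"c1", "c2", "c4"}, "book B": {"c3", "c5", "c9"}}

first = first_total(third, fourth)
second = second_totals(third, fourth, n_sets=3, rng=random.Random(0))
print(first[("20s", "book A")], first[("male", "book B")])
```

Because only equality of ciphertexts is used, the recommendation support device can count overlaps without learning which identifier lies behind any ciphertext.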

The recommendation support method may further comprise: an attribute information encryption step in which the terminal transmits to the recommendation support device encrypted attribute information obtained by encrypting the attribute information stored in the terminal with a second encryption method that satisfies additive homomorphism; a recommendation value encryption step in which the recommendation support device receives the encrypted attribute information from the terminal, calculates, on the basis of the parameter calculated in the parameter calculation step and the encrypted attribute information, an encrypted recommendation value in which the recommendation value is encrypted with a third encryption method that satisfies additive homomorphism, and transmits the encrypted recommendation value to the terminal; and a decryption step in which the terminal receives the encrypted recommendation value from the recommendation support device and decrypts it to obtain the recommendation value.
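The exchange between the terminal and the recommendation support device can be sketched as follows. The source only requires an additively homomorphic scheme; the Paillier cryptosystem is used here as an assumed stand-in. The key size, the integer weights, the bias, and all function names are hypothetical, and the toy primes are far too small for real use.

```python
import math
import secrets

# Toy Paillier key built from two known Mersenne primes (illustration only).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                                              # public key
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)      # private key (terminal)
MU = pow(LAM, -1, N)                                   # private key (terminal)

def encrypt(m: int) -> int:
    """Enc(m) = (1 + m*N) * r^N mod N^2, i.e. Paillier with generator N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + (m % N) * N) % N2 * pow(r, N, N2) % N2

def decrypt(c: int) -> int:
    """Terminal side: recover the plaintext as a signed integer."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m

def encrypted_score(enc_attrs, weights, bias):
    """Device side: Enc(bias + sum_j w_j * x_j) without decrypting any x_j."""
    acc = encrypt(bias)
    for c, w in zip(enc_attrs, weights):
        acc = acc * pow(c, w % N, N2) % N2   # Enc(x)^w = Enc(w * x)
    return acc

# Terminal: indicator vector over (20s, 30s, male, female), sent encrypted.
enc_attrs = [encrypt(x) for x in (1, 0, 0, 1)]
# Device: hypothetical integer-scaled parameters for one product.
weights, bias = [-511, -1204, -357, -916], -693
score = decrypt(encrypted_score(enc_attrs, weights, bias))
print(score)
```

In this sketch the device never sees the visitor's attributes in the clear, and only the terminal, which holds the private key, can read the resulting score.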

In the recommendation support method, the server and the recommendation support device may perform encryption using modular exponentiation as the first encryption method.

In the first calculation step, the server may apply a one-way function to the user identifier and then perform modular exponentiation, and in the second calculation step, the recommendation support device may apply a one-way function to the user identifier and then perform modular exponentiation.
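A minimal sketch of why hashing followed by modular exponentiation suits the four calculation steps: exponentiation with two secret keys commutes, so a value encrypted by the server and then by the device equals the same value encrypted in the opposite order. SHA-256 as the one-way function, the modulus 2^127 - 1, the identifiers, and the function names are all assumptions for illustration; a real deployment would need a much larger, carefully chosen group.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a known Mersenne prime; demo-sized only

def hash_to_group(user_id: str) -> int:
    """One-way function: map an identifier to an integer in [1, P - 1]."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def make_key() -> int:
    """Pick a secret exponent coprime to P - 1 so that encryption is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def encrypt(value: int, key: int) -> int:
    """Modular exponentiation: value^key mod P."""
    return pow(value, key, P)

server_key, store_key = make_key(), make_key()
users_with_attribute = ["u1", "u2", "u3"]   # hypothetical, held by the server
buyers_of_option = ["u2", "u3", "u7"]       # hypothetical, held by the device

first = {encrypt(hash_to_group(u), server_key) for u in users_with_attribute}
second = {encrypt(hash_to_group(u), store_key) for u in buyers_of_option}
third = {encrypt(c, server_key) for c in second}   # server re-encrypts
fourth = {encrypt(c, store_key) for c in first}    # device re-encrypts
matches = len(third & fourth)
print(matches)
```

The count of matching ciphertexts equals the number of users common to both lists, while neither party ever handles the other's identifiers in the clear.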

In the first calculation step, the server may reorder the ciphertexts included in the first calculation result before transmitting them to the recommendation support device, and in the third calculation step, the server may reorder the ciphertexts included in the third calculation result before transmitting them to the recommendation support device.
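The reordering can be sketched as below. A plausible rationale, which the source does not state, is that without reordering the receiver could link each returned ciphertext to the identifier it submitted simply by position. The toy modulus, the key, and the ciphertext values are hypothetical.

```python
import secrets

P = 1019          # toy prime modulus, for illustration only
SERVER_KEY = 5    # toy exponent with gcd(5, P - 1) == 1

def reencrypt_and_reorder(received):
    """Re-encrypt each received ciphertext, then return them in random order."""
    result = [pow(c, SERVER_KEY, P) for c in received]
    secrets.SystemRandom().shuffle(result)
    return result

second_result = [12, 345, 678, 901]   # hypothetical ciphertexts from the device
third_result = reencrypt_and_reorder(second_result)
```

Only the order changes: the set of ciphertexts, and therefore every match count computed from it, is unaffected.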

In the recommendation value encryption step, the recommendation support device may encrypt the recommendation value using, as the third encryption method, modular exponentiation that satisfies additive homomorphism.

The present invention also provides a recommendation support device comprising: communication means for communicating with a server and with a terminal, the server having user information storage means for storing user information in which a user identifier for identifying a user of a terminal is associated with attribute information representing an attribute of that user, first calculation means for extracting, for each attribute, the user identifiers associated with that attribute from the user information and transmitting to the outside a first calculation result obtained by encrypting those user identifiers with a first encryption method, and third calculation means for transmitting to the outside a third calculation result obtained by encrypting, with the first encryption method, a second calculation result received from the outside, and the terminal having attribute information storage means for storing attribute information representing an attribute of its own user, attribute information encryption means for transmitting to the outside encrypted attribute information obtained by encrypting the attribute information with a second encryption method that satisfies additive homomorphism, and decryption means for receiving an encrypted recommendation value from the outside and decrypting it to obtain a recommendation value; selection history storage means for storing a selection history in which an option selected by the user is associated with the user identifier corresponding to that user; second calculation means for extracting, for each option, the user identifiers associated with that option from the selection history and transmitting to the server the second calculation result obtained by encrypting those user identifiers with the first encryption method; fourth calculation means for calculating a fourth calculation result by encrypting, with the first encryption method, the first calculation result received from the server; first totaling means for obtaining a first total value by counting, for all combinations of the attributes and the options, the number of ciphertexts that match between the third calculation result and the fourth calculation result; second totaling means for creating, for all combinations of the attributes and the options, a plurality of sets by randomly removing one or more ciphertexts from the ciphertexts that match between the third calculation result and the fourth calculation result, and obtaining a second total value by counting, for each of the plurality of sets, the number of ciphertexts that match the fourth calculation result; and parameter calculation means for calculating, on the basis of the first total value and the second total value, a parameter for obtaining a recommendation value corresponding to the product.
The recommendation support device may further comprise recommendation value encryption means for receiving the encrypted attribute information from the terminal, calculating, on the basis of the parameter calculated by the parameter calculation means and the encrypted attribute information, an encrypted recommendation value in which the recommendation value is encrypted with a third encryption method that satisfies additive homomorphism, and transmitting the encrypted recommendation value to the terminal.

The present invention also provides a program for causing a computer having communication means for communicating with a server and with a terminal, the server having user information storage means for storing user information in which a user identifier for identifying a user of a terminal is associated with attribute information representing an attribute of that user, first calculation means for extracting, for each attribute, the user identifiers associated with that attribute from the user information and transmitting to the outside a first calculation result obtained by encrypting those user identifiers with a first encryption method, and third calculation means for transmitting to the outside a third calculation result obtained by encrypting, with the first encryption method, a second calculation result received from the outside, and the terminal having attribute information storage means for storing attribute information representing an attribute of its own user, attribute information encryption means for transmitting to the outside encrypted attribute information obtained by encrypting the attribute information with a second encryption method that satisfies additive homomorphism, and decryption means for receiving an encrypted recommendation value from the outside and decrypting it to obtain a recommendation value, to function as: selection history storage means for storing a selection history in which an option selected by the user is associated with the user identifier corresponding to that user; second calculation means for extracting, for each option, the user identifiers associated with that option from the selection history and transmitting to the server the second calculation result obtained by encrypting those user identifiers with the first encryption method; fourth calculation means for calculating a fourth calculation result by encrypting, with the first encryption method, the first calculation result received from the server; first totaling means for obtaining a first total value by counting, for all combinations of the attributes and the options, the number of ciphertexts that match between the third calculation result and the fourth calculation result; second totaling means for creating, for all combinations of the attributes and the options, a plurality of sets by randomly removing one or more ciphertexts from the ciphertexts that match between the third calculation result and the fourth calculation result, and obtaining a second total value by counting, for each of the plurality of sets, the number of ciphertexts that match the fourth calculation result; and parameter calculation means for calculating, on the basis of the first total value and the second total value, a parameter for obtaining a recommendation value corresponding to the product.

According to the present invention, recommendations can be provided quickly and with high accuracy while preventing leakage of information.

A diagram showing the configuration of the recommendation support system 1.
A diagram showing the hardware configuration of the recommendation support device 30.
A diagram showing the hardware configuration of the server 10.
A diagram showing the hardware configuration of the terminal 50.
A sequence diagram showing the operation of the embodiment.
A diagram showing the relationship between the number of purchasers per month and the processing time.
A diagram showing the relationship between the number of PCs available to the store and the recommendation delay.

(1) Configuration of Embodiment
FIG. 1 is a diagram showing the configuration of a recommendation support system 1 according to an embodiment of the present invention. The recommendation support system 1 includes a server 10, a recommendation support device 30, and a terminal 50.
FIG. 2 is a diagram showing the hardware configuration of the recommendation support device 30. The recommendation support device 30 includes a control unit 301 having an arithmetic device such as a CPU (Central Processing Unit) and storage devices such as a ROM (Read Only Memory) and a RAM (Random Access Memory), a storage unit 302 having a storage device such as a hard disk device, and a communication unit 303 having a communication interface. The storage unit 302 stores an OS (Operating System) and application programs, and the control unit 301 controls the operation of the recommendation support device 30 by executing these programs. The control unit 301 communicates with the server 10 and the terminal 50 by controlling the communication unit 303.

FIG. 3 is a diagram showing the hardware configuration of the server 10. Like the recommendation support device 30, the server 10 includes a control unit 101, a storage unit 102, and a communication unit 103. The control unit 101 controls the operation of the server 10 by executing programs stored in the storage unit 102. The control unit 101 communicates with the recommendation support device 30 by controlling the communication unit 103.

FIG. 4 is a diagram showing the hardware configuration of the terminal 50. The terminal 50 includes a control unit 501 having an arithmetic device such as a CPU and storage devices such as a ROM and a RAM, a storage unit 502 having a storage device such as an SRAM (Static Random Access Memory), a communication unit 503 having an antenna and a wireless communication interface, a voice input/output unit 504 having a speaker, a microphone, and a voice processing circuit, an operation unit 505 having operators such as a plurality of keys and a touch screen, and a display unit 506 having a liquid crystal panel and a liquid crystal driving circuit. The storage unit 502 stores an OS and application programs, and the control unit 501 controls the operation of the terminal 50 by executing these programs. The control unit 501 also communicates with the recommendation support device 30 by controlling the communication unit 503 in accordance with user operations received by the operation unit 505.

This embodiment assumes three parties: a provider, a store, and a visitor. The provider is, for example, a business operator that manages information related to payment services such as prepaid or postpaid electronic money and credit cards, or a telecommunications carrier that provides communication services such as mobile phone services, and it operates the server 10. The store operates the recommendation support device 30. The recommendation support device 30 may be installed in a physical store, or may be a server that runs the website of a virtual store in electronic commerce. The visitor is a customer who has come to the store carrying the terminal 50, or a customer who has accessed the website of the virtual store from the terminal 50.
The information held by each of the three parties is organized as follows. The specific contents of the information are merely examples.

The provider holds user information (Table 1). The user information associates a user identifier, which identifies a user who has concluded a contract with the provider for the use of electronic money, a credit card, a communication service, or the like, with attribute information representing the attributes of that user (age group, gender, address, etc.).

The store holds sales records (Table 2). A sales record associates product information representing a product purchased by a user with the user identifier corresponding to that user. In this example, the products are books, and there are two kinds of product information, book A and book B.
The sales records are an example of the selection history according to the present invention. The selection history according to the present invention is information in which an option selected by a user is associated with the user identifier corresponding to that user. An option may be, for example, a product purchased by the user, a product the user selected as a candidate for purchase, or a sample the user selected from among several kinds of samples. Options are not limited to goods and may be tangible or intangible services, such as a communication service provided by a telecommunications carrier or a tour sold by a travel agency. In other words, an option according to the present invention is a concept covering anything a store provides to a user, including goods, information, and services, and the option selected by the user may be provided for a fee or free of charge.

The visitor holds attribute information (Table 3) representing his or her own attributes. The terminal 50 may be a terminal under a usage contract with the provider that operates the server 10, a terminal under a usage contract with another provider, or a terminal under no usage contract with any provider. In other words, the provider that operates the server 10 need not hold user information corresponding to the visitor's user identifier; as long as the terminal 50 holds the user's attribute information, it does not matter whether a usage contract with the provider exists.

(2) Conditions Required for Recommendations
If no countermeasures against information leakage were needed, recommendations based on visitor attributes could be achieved as follows.
<Procedure 1> The store or the provider associates the attribute information held by the provider with the product information held by the store by means of the user identifier.
<Procedure 2> The store statistically estimates the relationship between user attributes and the products users purchased from the attribute information associated with the product information.
<Procedure 3> The visitor presents his or her attributes to the store when visiting.
<Procedure 4> From the relationship between attributes and products estimated in Procedure 2 and the attributes presented by the visitor, the store estimates the products the visitor is likely to purchase and recommends them to the visitor.
<Procedure 5> If the visitor is a user of the provider and purchases a product, the store receives the user identifier from the visitor and adds a sales record.
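Procedures 1 to 4 of this unprotected baseline can be sketched in a few lines of Python. The source says only that the relationship is estimated statistically; the smoothed naive Bayes estimate used here is an assumption, as are the sample rows, the smoothing constant alpha, the two-values-per-attribute denominator, and the function names.

```python
from collections import Counter

# Hypothetical provider-side user information: identifier -> attributes.
user_info = {
    "u1": {"age": "20s", "gender": "male"},
    "u2": {"age": "20s", "gender": "female"},
    "u3": {"age": "20s", "gender": "female"},
    "u4": {"age": "30s", "gender": "male"},
    "u5": {"age": "30s", "gender": "female"},
}
# Hypothetical store-side sales records: (identifier, product).
sales = [("u1", "book A"), ("u2", "book A"), ("u4", "book A"),
         ("u3", "book B"), ("u5", "book B"), ("u9", "book B")]

def learn(user_info, sales):
    """Procedures 1-2: join on the user identifier and count attribute/product pairs."""
    pair_counts, product_counts = Counter(), Counter()
    for user_id, product in sales:
        attrs = user_info.get(user_id)
        if attrs is None:          # buyer unknown to the provider
            continue
        product_counts[product] += 1
        for value in attrs.values():
            pair_counts[(value, product)] += 1
    return pair_counts, product_counts

def recommend(visitor_attrs, pair_counts, product_counts, alpha=1.0):
    """Procedures 3-4: score each product for the attributes the visitor presents."""
    total = sum(product_counts.values())
    scores = {}
    for product, n in product_counts.items():
        score = n / total
        for value in visitor_attrs.values():
            # Each attribute has two possible values in this example.
            score *= (pair_counts[(value, product)] + alpha) / (n + 2 * alpha)
        scores[product] = score
    return max(scores, key=scores.get), scores

pair_counts, product_counts = learn(user_info, sales)
best, scores = recommend({"age": "30s", "gender": "female"},
                         pair_counts, product_counts)
print(best)
```

Every step here requires one party to see the other's raw data, which is exactly what the conditions described next rule out.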

However, because of their different positions, the provider, the store, and the visitor each place the following demands on recommendations. The provider wants to prevent leakage of the user information entrusted to it by its users. The store wants to provide accurate recommendations but also wants to prevent leakage of its sales records. The visitor wants to receive accurate and prompt recommendations even at a store visited for the first time, but wants to prevent leakage of his or her own attributes. In addition, the provider and the store want to keep computational costs down even when the number of users or sales records grows. Table 7 summarizes these requirements.

（３）従来方式
ここで、プライバシ保護データマイニングの従来方式の問題点について説明する。
表７で示した要求条件を満たした上でリコメンドを実現するためには、垂直分割データベースにおいて各データベースが保持する値を互いに開示することなくリコメンド値を計算する必要がある。その実現方法としては、決定木に基づく方法、協調フィルタリングに基づく方法、k-meanクラスタリングに基づく方法などが挙げられるが、ここでは、計算コストやコールドスタート問題（売上記録が少ない状況や来訪客が新規の客であった場合などに、リコメンドの精度が低下してしまったり、リコメンド値がそもそも計算できなくなってしまったりするという問題）の回避などの観点から、ベイズアプローチに基づく方法である垂直分割プライバシ保護ナイーブベイズ識別器に着目する。
垂直分割プライバシ保護ナイーブベイズ識別器は、垂直分割された複数のデータベース

の元でベイズアプローチによる識別を行う。その構成法は、Vaidyaら（非特許文献１）、および菊池ら（非特許文献２）によってそれぞれ提案されている。以下、両方式による識別器の構成について説明する。 (3) Conventional Method Here, problems of the conventional method of privacy protection data mining will be described.
To realize recommendation while satisfying the requirements shown in Table 7, the recommendation value must be calculated over a vertically partitioned database without the parties disclosing to each other the values that each database holds. Possible approaches include methods based on decision trees, collaborative filtering, and k-means clustering. Here, from the viewpoint of calculation cost and of avoiding the cold start problem (the problem that recommendation accuracy drops, or that the recommendation value cannot be calculated at all, when there are few sales records or the visitor is a new customer), attention is focused on a method based on the Bayesian approach: the vertically partitioned privacy-preserving naive Bayes classifier.
The vertically partitioned privacy-preserving naive Bayes classifier performs classification by the Bayesian approach over multiple vertically partitioned databases. Constructions have been proposed by Vaidya et al. (Non-Patent Document 1) and by Kikuchi et al. (Non-Patent Document 2). The configuration of the classifier under each of the two methods is described below.

非特許文献２で提案された方式（以下、文献２方式という。）では、文献１方式と同様に、式（１）、式（２）のモデルに従って識別器を構成する。ただし、文献１方式の識別

つきセキュアマッチングを用いてこれを計算する。チャフつきセキュアマッチングは、セキュアマッチング（たとえば、Agrawal, R., Evfimievski, A. and Srikant, R.: Information sharing across private databases, Proceedings of the 2003 ACM SIGMOD international conference on Management of data, SIGMOD '03, New York, NY, USA, ACM, pp. 86-97 (2003).）の変種であり、送信暗号文を確率的にダミーデータ（チャフ）に挿げ替えて送信することにより、マッチング結果に人為的なノイズを与える。チャフつきセキュアマッチングの演算結果はチャフによる確率的なバイアスを含むため、マルチパーティ計算を用いてこのバイアスを復元する。これにより、プロバイダが保持する顧客の集合と商店が保持する顧客の集合が一致しない（双方のデータベースが同期していない）場合や、それぞれのデータベースに欠損値がある場合などの効率を改善している。 In the method proposed in Non-Patent Document 2 (hereinafter referred to as the Document 2 method), the classifier is configured according to the models of Equation (1) and Equation (2), as in the Document 1 method. However, the classifier of the Document 1 method

This is calculated using secure matching with chaff. Secure matching with chaff is a variant of secure matching (for example, Agrawal, R., Evfimievski, A. and Srikant, R.: Information sharing across private databases, Proceedings of the 2003 ACM SIGMOD international conference on Management of data, SIGMOD '03, New York, NY, USA, ACM, pp. 86-97 (2003)) in which transmitted ciphertexts are probabilistically replaced with dummy data (chaff) before transmission, thereby adding artificial noise to the matching result. Because the result of secure matching with chaff contains a probabilistic bias caused by the chaff, multi-party calculation is used to correct this bias. This improves efficiency in cases where the set of customers held by the provider does not match the set held by the store (the two databases are not synchronized) or where either database has missing values.
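
The matching step can be sketched with a commutative cipher of the form H(id)^k mod p, in the spirit of the secure matching cited above. This is a toy illustration only: the modulus, the hash-to-group mapping, the use of `random.Random`, and the names `match_count` and `add_chaff` are assumptions, and the bias correction by multi-party calculation is not shown. The variable names `first` to `fourth` follow the four arithmetic results named in the abstract.

```python
import hashlib
import math
import random

P = 2**127 - 1  # Mersenne prime used as a toy modulus (not a production choice)

def keygen(rng):
    """Pick a secret exponent coprime to P - 1 so that x -> x^k mod P is a bijection."""
    while True:
        k = rng.randrange(2, P - 1)
        if math.gcd(k, P - 1) == 1:
            return k

def hash_to_group(identifier):
    """Map a user identifier to an integer in [1, P - 1]."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def encrypt(values, key):
    return [pow(v, key, P) for v in values]

def add_chaff(ciphertexts, rate, rng):
    """Replace each ciphertext with a random dummy (chaff) with probability `rate`."""
    return [rng.randrange(1, P) if rng.random() < rate else c for c in ciphertexts]

def match_count(provider_ids, store_ids, chaff_rate=0.0, seed=0):
    rng = random.Random(seed)              # not cryptographically secure; sketch only
    a, b = keygen(rng), keygen(rng)        # provider key, store key
    first = encrypt([hash_to_group(i) for i in provider_ids], a)
    second = add_chaff(encrypt([hash_to_group(i) for i in store_ids], b), chaff_rate, rng)
    third = encrypt(second, a)             # provider re-encrypts the store's ciphertexts
    fourth = encrypt(first, b)             # store re-encrypts the provider's ciphertexts
    return len(set(third) & set(fourth))   # identifiers present on both sides
```

Because (H^a)^b and (H^b)^a are equal modulo P, doubly encrypted identifiers coincide exactly when the underlying identifiers do, while neither side sees the other's plaintext identifiers; chaff lowers the observed count in proportion to `chaff_rate`.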

ただし、この仮定が成立しない状況においても、商店の売上記録に対して、プロバイダのユーザのうち商店の顧客以外ユーザのレコードをダミーレコードとして追加する事前処理によりレコード数を仮想的に一致させれば、文献１方式を適用することが可能である。
なお、ダミーレコードを追加する代わりにプロバイダのユーザ情報から商店の顧客以外のユーザのレコードを除去する方法でもレコード数を一致させることが可能であり、且つ、処理対象のデータ量を大幅に削減することが可能である。しかし、これは安全性の面で問題がある。すなわち、プロバイダのユーザ情報から不要なレコードを削除するために、プロバイダに対して商店の顧客リストを開示する必要がある。すなわち、プロバイダに対し、どのユーザがその商店で買い物をしたことがあるかという情報が開示される。そのため、商店の営業上の秘密がプロバイダに対して開示されるのみならず、商店の顧客のプライバシも保護されなくなる。 However, even in a situation where this assumption does not hold, if the number of records is virtually matched by pre-processing for adding records of the provider users other than the customers of the store as dummy records to the store sales records, It is possible to apply the document 1 method.
Instead of adding dummy records, the record counts can also be matched by removing from the provider's user information the records of users who are not customers of the store, which also greatly reduces the amount of data to be processed. However, this approach is problematic in terms of safety: to delete the unnecessary records from the provider's user information, the store's customer list must be disclosed to the provider. In other words, the provider learns which users have shopped at the store. As a result, not only is the store's business secret disclosed to the provider, but the privacy of the store's customers is no longer protected either.

そのため、これ以降の文献１方式に関する説明においては、あらかじめ商店の売上記録に対して、プロバイダのユーザのうち商店の顧客以外のユーザのレコードをダミーレコードとして追加する事前処理が施されているものとする。
なお、文献２方式は秘匿内積計算ではなく、（チャフつき）セキュアマッチングに基づくことから、データベースの同期の問題は生じない。 For this reason, the following description of the Document 1 method assumes that pre-processing has already been applied to the store's sales records, adding the records of provider users who are not customers of the store as dummy records.
Note that the document 2 method is based on secure matching (with chaff), not on the secret inner product calculation, so that there is no database synchronization problem.

（４．２）安全性
次に、プロバイダ、商店、来訪客のそれぞれに対する安全性について説明する。なお、以下の説明においては、離散対数問題や準同型暗号などの暗号プリミティブは十分に安全であるものとする。
まず、文献１方式は、前述のデータベースの同期が保たれているという仮定のもとでは、商店による来訪客のリコメンド値算出の過程において、商店が保有する売上記録がプロバイダに開示されることはない。また、プロバイダが保有するユーザ情報について、来訪客へのリコメンド結果自体より多くの情報が商店や来訪客に対して開示されることはない。以上のことから、プロバイダおよび商店の情報は十分に安全に保護されると考えられる。
しかし、その一方で、来訪客のプライバシはプロバイダや商店に対して保護されない。式（１）のマルチパーティ計算は、プロバイダと商店との間で実施される。来訪客の属性に応じたリコメンドを行うためには、マルチパーティ計算の入力とするためにをプロバイダか商店のいずれかに開示する必要がある。 (4.2) Safety Next, the safety for each of the provider, the store, and the visitor will be described. In the following description, it is assumed that cryptographic primitives such as the discrete logarithm problem and homomorphic encryption are sufficiently safe.
First, in the Document 1 method, under the assumption that the databases are kept synchronized as described above, the sales records held by the store are never disclosed to the provider while the store calculates the recommendation value for a visitor. Likewise, no information about the user information held by the provider, beyond the recommendation result itself, is disclosed to the store or the visitor. From the above, the information of the provider and the store is considered to be sufficiently protected.
On the other hand, the visitor's privacy is not protected from the provider or the store. The multi-party calculation of Equation (1) is performed between the provider and the store. To make a recommendation according to the visitor's attributes, the visitor's attribute information must be disclosed to either the provider or the store as input to the multi-party calculation.

文献２方式は、プロバイダと商店間でチャフつきセキュアマッチングを実施した後に、マルチパーティ計算を用いてリコメンド値を算出する。チャフつきセキュアマッチングの結果は、最終的なリコメンド結果より多くの情報を含む（チャフの割合に依存する。チャフの割合が多ければ開示される情報量は少なくなるが、精度に悪影響を与える。）ため、商店に開示されるプロバイダの情報は、文献１方式よりも多くなる。そのため、文献１方式と比較して、商店がユーザ情報の内容を統計的に推定しやすくなると考えられる。
しかし、実際に扱われる情報を考慮すると、商店が入手可能なのは、たかだか（仮にチャフの割合を０としても）ある商品を購入した男性は何人かなどの集計値から構成される、Ｖ×Ｌの集計表である。集計値がごく小数になるなどの特殊な場合を除き、集計表から個人の情報を特定することは困難であると言われている（たとえば、瀧敦弘集計表におけるセル秘匿問題とその研究動向，統計数理， Vol. 51, pp. 337-350 (2003))。
なお、来訪客のプライバシに関しては、文献２方式も文献１方式と同様に保護されない。リコメンド値の算出にあたり、来訪客の属性情報はマルチパーティ計算のためにプロバイダもしくは商店に開示されることになる。 In the document 2 method, after performing secure matching with chaff between a provider and a store, a recommendation value is calculated using multi-party calculation. The result of secure matching with chaff includes more information than the final recommendation result (it depends on the ratio of chaff. If the ratio of chaff is large, the amount of information disclosed will be small, but the accuracy will be adversely affected). Therefore, the provider information disclosed to the store is larger than that in the document 1 method. Therefore, it is considered that it becomes easier for the store to statistically estimate the contents of the user information as compared with the document 1 method.
However, considering the information actually handled, what the store can obtain is at most (even if the chaff ratio were 0) a V × L summary table made up of aggregate values such as the number of men who purchased a certain product. It is said to be difficult to identify an individual's information from a summary table, except in special cases such as when an aggregate value is very small (for example, Taki, A.: The cell suppression problem in summary tables and its research trends, Tokei Suri (Proceedings of the Institute of Statistical Mathematics), Vol. 51, pp. 337-350 (2003)).
As for visitor privacy, the Document 2 method provides no more protection than the Document 1 method. To calculate the recommendation value, the visitor's attribute information is disclosed to the provider or the store for the multi-party calculation.

（４．３）精度
次に、精度に関して説明する。文献１方式、文献２方式のいずれも、条件付き確率を用いた最尤推定である。Play Tennisデータセットを用いて実験を行ったところ、提案方式では79%の正解率が得られたが、文献１、２の方式では36%の正解率しか得られなかった。なお、文献２は、実際には安全性を確保するためにチャフを混入するため、さらに精度が低下する懸念がある（精度低下は混入させるチャフの量による）。 (4.3) Accuracy Next, accuracy will be described. Both the literature 1 method and the literature 2 method are maximum likelihood estimations using conditional probabilities. An experiment using the Play Tennis data set revealed that the proposed method achieved a 79% accuracy rate, but the literature 1 and 2 methods only achieved a 36% accuracy rate. In Reference 2, since the chaff is actually mixed in order to ensure safety, there is a concern that the accuracy further decreases (the accuracy decrease depends on the amount of chaff mixed).

文献１方式は、プロバイダと商店が、ユーザの人数Ｎだけの秘匿内積計算を、属性の区分の総数Ｖと商品の種類の数Ｌだけ繰り返す必要がある。これは合計で２ＮＶＬ回の冪剰余計算を必要とする。一方、文献２方式は、合計でＮＷ＋ＭＧ＋ＭＧＶ＋ＮＷＬ回の冪剰余計算を必要とする。すなわち、プロバイダはＮ人Ｗ項目のユーザ情報についてＮＷ回の冪剰余計算を行い、商店はＭ人Ｇ種類の売上記録についてＭＧ回の冪剰余計算を行い、互いに計算結果を交換する。その後、プロバイダは商店から受け取ったデータをＶ回繰り返してＭＧＶ回の冪剰余計算を行い、商店はプロバイダから受け取ったデータをＬ回繰り返してＮＷＬ回の冪剰余計算を行う。例えば、１００万人（Ｎ）の会員を有するプロバイダと、１００種類（Ｌ）の商品からリコメンドを提供したい商店との間で学習を行うとすると、１回の冪剰余計算に要する時間が１．５ｍｓの計算機を用いた場合、文献１方式では１００日以上かかってしまうが、文献２方式では１０日以内で済む。 In the Document 1 method, the provider and the store must repeat a secret inner product calculation over the N users for each of the V attribute categories and each of the L product types. This requires 2NVL modular exponentiations in total. The Document 2 method, in contrast, requires NW + MG + MGV + NWL modular exponentiations in total. That is, the provider performs NW modular exponentiations on the user information of N users with W items, the store performs MG modular exponentiations on the sales records of M customers with G product types each, and the two exchange their results. The provider then processes the data received from the store V times, for MGV modular exponentiations, and the store processes the data received from the provider L times, for NWL modular exponentiations. For example, when learning is performed between a provider with 1 million members (N) and a store that wants to recommend from among 100 product types (L), on a computer that takes 1.5 ms per modular exponentiation, the Document 1 method takes 100 days or more, whereas the Document 2 method finishes within 10 days.
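
The two operation counts above can be checked with a few lines of arithmetic. In this sketch N, L, and the 1.5 ms figure come from the example in the text, V = 57, W = 3, and G = 10 are the values assumed later in section (7.3), and M = 1000 is an arbitrary illustrative choice; the function names are hypothetical.

```python
def modexp_count_doc1(N, V, L):
    """Modular exponentiations for the Document 1 method: 2NVL."""
    return 2 * N * V * L

def modexp_count_doc2(N, W, M, G, V, L):
    """Modular exponentiations for the Document 2 method: NW + MG + MGV + NWL."""
    return N * W + M * G + M * G * V + N * W * L

def days(count, seconds_per_modexp=1.5e-3):
    """Convert an operation count into days of sequential computation."""
    return count * seconds_per_modexp / 86400.0

doc1_days = days(modexp_count_doc1(N=10**6, V=57, L=100))
doc2_days = days(modexp_count_doc2(N=10**6, W=3, M=1000, G=10, V=57, L=100))
```

Under these assumptions `doc1_days` comes to roughly 198 and `doc2_days` to roughly 5, consistent with the "100 days or more" and "within 10 days" figures; the NWL term dominates the Document 2 count.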

式（１）によるリコメンド値の算出は、来訪客が商店に来店するたびに実施される。そのため、この計算は来訪客の来店後、可能な限り速やかに行う必要がある。識別の処理に何分も要するようでは、推薦商品を提示する頃には来訪客は既に買い物を終えてしまっており、リコメンドが全く意味をなさなくなる可能性がある。しかし、学習結果と来訪客の属性情報を用いた識別過程では、文献１方式も文献２方式も来訪客からプロバイダまで含めてのマルチパーティ計算が必要である。来訪客の端末にリコメンドを提示する度にマルチパーティ計算を行うのは、レスポンスの観点から現実的であるとは言えない。 The calculation of the recommendation value by the equation (1) is performed every time the visitor visits the store. Therefore, it is necessary to perform this calculation as soon as possible after the visitor visits the store. If the identification process takes many minutes, the visitor has already finished shopping by the time the recommended product is presented, and the recommendation may not make any sense. However, in the identification process using the learning result and the visitor attribute information, both the document 1 method and the document 2 method require multi-party calculation including visitors to providers. It is not realistic from the viewpoint of response to perform multi-party calculation every time a recommendation is presented to a visitor's terminal.

（５）実施形態の計算モデル
本実施形態の計算モデルについて説明する。
第１に、精度の問題に対しては、経験ベイズ法の導入により精度の向上を図る。しかし、経験ベイズ法で一般に用いられる交差検定は、或る特定のレコードを除去した差分データを用いて処理を行うため、差分データを用いた残差開示により個人データが特定され得るという問題が発生する。そこで、本実施形態では、セキュアマッチングで得られる秘匿されたデータからパラメータ学習を行うことによって、差分データを用いずに経験ベイズ法に基づくベイズ識別器を構築する。
第２に、来訪客のプライバシ保護の問題に対しては、来訪客の属性情報を商店やプロバイダに開示することなくリコメンドを実施するために、秘匿内積計算を導入する。 (5) Calculation Model of Embodiment The calculation model of this embodiment is described.
First, the accuracy problem is addressed by introducing the empirical Bayes method. However, the cross-validation commonly used with the empirical Bayes method processes difference data from which a specific record has been removed, which raises the problem that personal data may be identified through residual disclosure based on that difference data. In this embodiment, therefore, a Bayes classifier based on the empirical Bayes method is constructed without using difference data, by learning the parameters from the concealed data obtained by secure matching.
Second, to address the protection of visitor privacy, a secret inner product calculation is introduced so that recommendation can be carried out without disclosing the visitor's attribute information to the store or the provider.

（５．１）計算モデルの概要
最初に、経験ベイズ法に基づくベイズ識別器の計算モデルについて、その概要を説明する。
経験ベイズ法は、ベイズ推定における事前確率の問題を回避する方法の一つである。経験ベイズ法は、パラメトリックなベイズモデルにおいて、その事前確率を客観的に設定するために、モデルのパラメータに加えて、ハイパーパラメータと呼ぶ未知変数を導入する。経験ベイズ法では、このハイパーパラメータにも、観測したデータから客観的に適切な値を設定することで高い精度が得られる。 (5.1) Outline of Calculation Model First, an outline of the calculation model of the Bayes classifier based on the empirical Bayes method is given.
The empirical Bayes method is one way of avoiding the problem of choosing the prior probability in Bayesian estimation. To set the prior objectively in a parametric Bayesian model, it introduces unknown variables called hyperparameters in addition to the model parameters. High accuracy is obtained by also setting these hyperparameters to objectively appropriate values derived from the observed data.
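
The idea of choosing a hyperparameter from the observed data can be sketched as follows. The model of this embodiment is not reproduced in this passage, so the symmetric Dirichlet prior over category counts, the grid search, and the names `log_marginal_likelihood`, `fit_alpha`, and `posterior_mean` are all assumptions chosen only to illustrate the general technique.

```python
import math

def log_marginal_likelihood(counts, alpha):
    """Log evidence of an observation sequence with these category counts
    under a symmetric Dirichlet(alpha) prior (Dirichlet-multinomial model)."""
    k, n = len(counts), sum(counts)
    value = math.lgamma(k * alpha) - math.lgamma(n + k * alpha)
    for c in counts:
        value += math.lgamma(c + alpha) - math.lgamma(alpha)
    return value

def fit_alpha(count_vectors, grid):
    """Empirical Bayes: pick the alpha on the grid that maximizes the total log evidence."""
    return max(grid, key=lambda a: sum(log_marginal_likelihood(c, a) for c in count_vectors))

def posterior_mean(counts, alpha):
    """Smoothed category probabilities under the fitted prior."""
    n, k = sum(counts), len(counts)
    return [(c + alpha) / (n + k * alpha) for c in counts]
```

With a fitted `alpha`, a category that was never observed still receives a non-zero probability, which is what keeps a classifier usable when sales records are scarce.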

（５．３）リコメンド値の算出式
最後に、リコメンド値の算出式について説明する。
本実施形態では、来訪客のプライバシを保護するために、リコメンド支援装置３０と端末５０との間で秘匿内積計算を行うが、冪剰余計算が必要になるため、リコメンドに遅延が生じる。そこで、端末５０で行う冪剰余計算は事前に済ませることによって、遅延を抑える。本実施形態では、プロバイダのユーザ数や売上記録の増加によらず、１つの商品の周辺尤度を１回の冪剰余計算で算出する。 (5.3) Recommendation Value Calculation Formula Finally, the recommendation value calculation formula will be described.
In this embodiment, a secret inner product calculation is performed between the recommendation support apparatus 30 and the terminal 50 in order to protect the visitor's privacy; because this requires modular exponentiation, the recommendation is delayed. The delay is therefore suppressed by performing the modular exponentiations on the terminal 50 in advance. In this embodiment, the marginal likelihood of one product is calculated with a single modular exponentiation, regardless of increases in the number of provider users or sales records.
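
A secret inner product with precomputation on the terminal side can be sketched with an additively homomorphic cipher. The concrete scheme used by the embodiment is not given in this passage; the toy Paillier construction below, its very small primes, the integer weights, and the function names are assumptions for illustration only.

```python
import math
import random

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key from two known Mersenne primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                                  # valid because g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    """Terminal side; can be done in advance, before the visitor reaches the store."""
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (1 + n * m) % (n * n) * pow(r, n, n * n) % (n * n)

def decrypt(n, secret, c):
    lam, mu = secret
    return (pow(c, lam, n * n) - 1) // n * mu % n

def store_side(n, encrypted_x, weights):
    """Compute Enc(sum_i w_i * x_i) from ciphertexts only, as the product of c_i ** w_i."""
    acc = 1
    for c, w in zip(encrypted_x, weights):
        acc = acc * pow(c, w, n * n) % (n * n)
    return acc
```

In this sketch the terminal encrypts its 0/1 attribute vector ahead of time, the store combines the ciphertexts with its own non-negative integer weights without learning the attributes, and only the terminal can decrypt the resulting inner product.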

（６）実施形態の動作
図５は、実施形態の動作を示すシーケンス図である。以下で説明する動作は、サーバ１０の制御部１０１とリコメンド支援装置３０の制御部３０１と端末５０の制御部５０１がそれぞれプログラムに記述された手順に従って実行する動作であるが、ここでは、図１に示した機能ブロックを動作の主体として説明する。
以下で説明する動作は、リコメンド支援装置３０に対して、リコメンド値算出の指示が入力されたことを契機として実行される。具体的には、当該指示が入力されたならば、リコメンド支援装置３０がサーバ１０に対してステップＳ０１の処理の開始を指示するとともに、リコメンド支援装置３０がステップＳ０２の処理を開始する。 (6) Operation of Embodiment FIG. 5 is a sequence diagram showing the operation of the embodiment. The operations described below are executed by the control unit 101 of the server 10, the control unit 301 of the recommendation support apparatus 30, and the control unit 501 of the terminal 50, each following the procedure described in its program; here, however, the functional blocks shown in FIG. 1 are described as the agents of the operations.
The operation described below is executed when a recommendation value calculation instruction is input to the recommendation support apparatus 30. Specifically, when the instruction is input, the recommendation support apparatus 30 instructs the server 10 to start the process of step S01, and the recommendation support apparatus 30 starts the process of step S02.

なお、パラメータの算出（ステップＳ０１からＳ０７までの処理）は、リコメンド値算出の度に実行してもよいが、例えば1ヶ月といった間隔で定期的に実行してもよい。あるいは、第１集計値及び第２集計値に有意な変化が現れることが予想されるタイミングで、商店側の判断により随時、パラメータの算出を実行するようにしてもよい。例えば、新商品の発売後に或る程度の期間が経過した後に実行してもよいし、実際に特定の商品の売れ行きに変化が現れた場合に実行してもよい。このように定期的又は随時にパラメータを算出した場合、算出されたパラメータをリコメンド支援装置３０の記憶部３０２に記憶させておき、リコメンド支援装置３０に対してリコメンド値算出の指示が入力されたならば、記憶部３０２からパラメータを読み出して、ステップＳ０８以降の処理を実行する。
なお、ステップＳ０１からＳ０５までの内容は説明済みであるから、ここでは説明を省略する。 The parameter calculation (steps S01 to S07) may be executed every time a recommendation value is calculated, or it may be executed periodically, for example once a month. Alternatively, the parameters may be calculated as needed, at the store's discretion, at a time when a significant change is expected in the first and second aggregate values. For example, the calculation may be executed after a certain period has elapsed since the release of a new product, or when the sales of a specific product have actually changed. When the parameters are calculated periodically or as needed in this way, the calculated parameters are stored in the storage unit 302 of the recommendation support apparatus 30; when a recommendation value calculation instruction is input to the recommendation support apparatus 30, the parameters are read from the storage unit 302 and the processing from step S08 onward is executed.
Since steps S01 to S05 have already been described, their description is omitted here.

（７）具体例
（７．１）Book-Crossingデータセット
Book-Crossingデータセット（以下、ＢＣＤＳという。）は、Book-Crossing communityにおける本の評価のデータセットである。このデータセットは、Cai-Nicolas Zieglerによって２００４年８月から９月のうちの４週間クローリングされたものであり、271,379冊の本に対する1,149,780件の評価値を含む。
ＢＣＤＳのユーザは匿名化されているが、年齢と居住地の情報は有している。本実験では、年齢については８０歳を超えるデータは取り除き、１０歳刻みの年代にして用いる。また、居住地については、国際標準化機構（ISO）によって公表されている２４９ヶ国の国名にして用いる。国名を割り当てる際は、まず、ISO3166-1における英語名（アメリカ合衆国の場合であればUnited States）を含むデータに国名を割り当て、もしもISO3166-1における英語名を含まない場合は、ISO3166-1 alpha-3（アメリカ合衆国の場合であればUSA）を含むデータに国名を割り当てる。 (7) Specific example (7.1) Book-Crossing data set
The Book-Crossing data set (hereinafter referred to as BCDS) is a data set of book ratings from the Book-Crossing community. It was crawled by Cai-Nicolas Ziegler over four weeks in August and September 2004 and contains 1,149,780 ratings for 271,379 books.
BCDS users are anonymized, but age and place of residence are available. In this experiment, records with an age over 80 are removed, and ages are grouped into 10-year bands. Places of residence are mapped to the names of the 249 countries published by the International Organization for Standardization (ISO). A country is assigned first to records containing its English name in ISO3166-1 (United States in the case of the United States of America); if a record contains no ISO3166-1 English name, a country is assigned to records containing its ISO3166-1 alpha-3 code (USA in the case of the United States of America).
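
The attribute preprocessing described above can be sketched as follows. The two lookup tables are a three-country excerpt invented for illustration (a full run would need all 249 ISO3166-1 entries), the comma-separated location format is an assumption about the raw data, and matching alpha-3 codes against whole comma-separated fields rather than arbitrary substrings is a design choice of this sketch, not something stated in the text.

```python
# Tiny illustrative excerpt; keys are ISO3166-1 English names / alpha-3 codes.
ISO_NAMES = {"United States": "US", "Germany": "DE", "Japan": "JP"}
ISO_ALPHA3 = {"USA": "US", "DEU": "DE", "JPN": "JP"}

def age_group(age):
    """Return the 10-year band (0, 10, 20, ...) or None if the age is missing or over 80."""
    if age is None or age > 80:
        return None
    return int(age) // 10 * 10

def country_of(location):
    """Assign a country by English name first, then fall back to the alpha-3 code."""
    text = location.lower()
    for name, code in ISO_NAMES.items():        # first pass: English names
        if name.lower() in text:
            return code
    fields = {field.strip().upper() for field in location.split(",")}
    for alpha3, code in ISO_ALPHA3.items():     # fallback: alpha-3 codes
        if alpha3 in fields:
            return code
    return None
```

For example, a location such as "nyc, new york, usa" contains no English country name, so it is resolved through the alpha-3 fallback.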

ＢＣＤＳの評価値は１０点満点で付与されている。本具体例では、ユーザが５点を超える評価値を与えた本を、ユーザが満足した本であるとみなす。
本具体例では、それぞれの本ごとに満足したユーザの属性を学習しておき、未知のユーザの属性を元に、そのユーザが満足できる本を推薦できるかを計測する。本具体例では、一般的な三交差検定を用いて学習に用いるユーザとテストに用いるユーザとを分けて、リコメンドの精度を測定する。ＢＣＤＳには多くの評価値が得られているベストセラーの本もあれば、そうではない本もある。交差検定に用いるデータ数を確保するため、本具体例では、評価値が１０件を超える本（4,518種類）を対象とする。 The BCDS evaluation value is assigned with a maximum score of 10 points. In this specific example, a book that the user gave an evaluation value of more than 5 points is regarded as a book that the user is satisfied with.
In this specific example, the attributes of the users satisfied with each book are learned per book, and it is measured whether books that an unknown user would be satisfied with can be recommended from that user's attributes. The recommendation accuracy is measured with standard three-fold cross-validation, which separates the users used for learning from the users used for testing. The BCDS contains best-selling books with many ratings as well as books with few. To secure enough data for cross-validation, this specific example targets books with more than 10 ratings (4,518 titles).

表８は、ＢＣＤＳに対して文献１方式と本実施形態でリコメンドの実験を行った結果である。本実施形態は文献１方式より平均で1.5点高い評価値を得られた。

Table 8 shows the results of a recommendation experiment performed on BCDS using the Document 1 method and this embodiment. In this embodiment, an evaluation value that is 1.5 points higher on average than the literature 1 method was obtained.

（７．２）Play-Tennisデータセット
表９に、Play-Tennisデータセットを示す（縦二重線の左側）。Play-Tennisデータセット（以下、ＰＴＤＳという。）は、１４日間のそれぞれの日付の属性の条件下において、テニスを行ったか休んだかを記したものである。

(7.2) Play-Tennis Data Set Table 9 shows the Play-Tennis data set (left side of the vertical double line). The Play-Tennis data set (hereinafter referred to as PTDS) describes whether tennis was played or rested under the conditions of the attribute of each date for 14 days.

このＰＴＤＳを用いて、それぞれの属性の条件下においてテニスを行ったか休んだかを適切に識別できるか、本実施形態と文献１方式による精度を定量的に明らかにする。本具体例での精度の測定は、少ないデータから精度良く精度測定を行うため、1日ずつ順に抜き出したデータをテストデータとし、残りを学習データとする、Leave One Outを用いる。 Using this PTDS, the accuracy of this embodiment and of the Document 1 method is evaluated quantitatively, that is, how well each can identify whether tennis was played or not under the conditions of each attribute. Because the data set is small, accuracy in this specific example is measured by Leave One Out: each day in turn is taken out as the test data and the remaining days are used as the learning data.
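
The Leave One Out procedure can be sketched generically as follows. The `train` and `predict` callables are placeholders supplied by the caller; they are hypothetical names and stand in for whichever classifier is being evaluated.

```python
def leave_one_out(records):
    """Yield (training_records, test_record), holding out each record exactly once."""
    for i, test in enumerate(records):
        yield records[:i] + records[i + 1:], test

def loo_accuracy(records, train, predict):
    """records: list of (features, label) pairs.
    train(training_records) returns a model; predict(model, features) returns a label."""
    correct = 0
    for training, (features, label) in leave_one_out(records):
        model = train(training)
        correct += predict(model, features) == label
    return correct / len(records)
```

With 14 records, every training set has 13 records and the resulting accuracy is a multiple of 1/14.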

ここで、ＰＴＤＳの日付を実施形態におけるユーザ識別子とみなし、属性（空模様、気温、湿度、風）を実施形態における属性情報Ｘとみなし、テニスを行ったか休んだかを実施形態における商品情報Ｙとみなす。すなわち、属性情報Ｘの項目は、空模様、気温、湿度、風の４種類であるから、属性情報Ｘの項目の数Ｗは４である。属性情報Ｘの区分は、空模様が３種類、気温が３種類、湿度が２種類、風が２種類の合計１０種類であるから、属性情報Ｘの区分の総数Ｖは１０である。すなわち、プロバイダは、Ｎ＝１４人、Ｗ＝４項目からなるユーザ情報を保有する。プロバイダが保有するユーザ識別子ｔは、長さＮ＝１４のベクトルである。プロバイダが保有する属性情報Ｘは、Ｎ＝１４行、Ｖ＝１０列のマトリクスである。
商店は、学習データに相当するデータを保有する。商店は、売上１人あたりＧ＝１種類の商品を含むＭ＝１３件の売上記録を保有する（Ｎ≠Ｍより、プロバイダと商店のデータは非同期である）。商店が保有するユーザ識別子ｕは、長さＭ＝１３のベクトルである。商店が保有する商品情報Ｙは、Ｍ＝１３行、Ｌ＝２列のマトリクスである。

Here, the dates of the PTDS are regarded as the user identifiers of the embodiment, the attributes (outlook, temperature, humidity, wind) as the attribute information X, and whether tennis was played or not as the product information Y. The attribute information X has four items (outlook, temperature, humidity, wind), so the number of items W is 4. Its categories are three for outlook, three for temperature, two for humidity, and two for wind, so the total number of categories V is 10. That is, the provider holds user information consisting of N = 14 users and W = 4 items. The user identifier t held by the provider is a vector of length N = 14, and the attribute information X held by the provider is a matrix of N = 14 rows and V = 10 columns.
The store holds the data corresponding to the learning data: M = 13 sales records, each containing G = 1 product type per customer (since N ≠ M, the provider's and the store's data are not synchronized). The user identifier u held by the store is a vector of length M = 13, and the product information Y held by the store is a matrix of M = 13 rows and L = 2 columns.
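
The mapping from W = 4 attribute items to V = 10 category columns can be sketched as a one-hot encoding. The category labels below are assumed names, since Table 9 is not reproduced in this passage; only the counts per item (3, 3, 2, 2) come from the text.

```python
# Item names and category labels are illustrative assumptions.
CATEGORIES = [
    ("outlook", ["sunny", "overcast", "rain"]),
    ("temperature", ["hot", "mild", "cool"]),
    ("humidity", ["high", "normal"]),
    ("wind", ["weak", "strong"]),
]
COLUMNS = [(item, value) for item, values in CATEGORIES for value in values]

def one_hot(row):
    """Map one day's W = 4 attribute values to a 0/1 vector of length V = 10."""
    return [1 if row[item] == value else 0 for item, value in COLUMNS]

def attribute_matrix(rows):
    """Stack the encoded days into the matrix X (one row per user identifier)."""
    return [one_hot(row) for row in rows]
```

Each encoded row contains exactly W ones, one per attribute item, so 14 days yield the 14 × 10 matrix X described above.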

ＰＴＤＳに対する文献１方式と本実施形態による識別結果を表９（縦二重線の右側）に示す。ＢＣＤＳのときと同じく、文献１方式は、識別結果に偏りが見られる結果となった。この結果から、精度を定量的に示す正解率を算出する。識別結果については、識別が正解でリコメンドが「行った」である場合をTrue Positive、識別が不正解でリコメンドが「休んだ」である場合をFalse Negative、識別が不正解でリコメンドが「行った」である場合をFalse Positive、識別が正解でリコメンドが「休んだ」である場合をTrue Negativeとそれぞれ数える。正解率は、（True Positive + True Negative）／（True Positive＋False Negative＋False Positive＋True Negative）で求められる。表１０に、正解率の算出結果を示す。本実施形態は、文献１方式より１５％高い正解率を得られた。

Table 9 (right of the vertical double line) shows the identification results of the Document 1 method and of this embodiment for the PTDS. As with the BCDS, the results of the Document 1 method are biased. From these results, the accuracy rate, which expresses accuracy quantitatively, is calculated. Each result is counted as a True Positive when the identification is correct and the recommendation is "played", as a False Negative when the identification is incorrect and the recommendation is "rested", as a False Positive when the identification is incorrect and the recommendation is "played", and as a True Negative when the identification is correct and the recommendation is "rested". The accuracy rate is given by (True Positive + True Negative) / (True Positive + False Negative + False Positive + True Negative). Table 10 shows the calculated accuracy rates. This embodiment achieved an accuracy rate 15% higher than the Document 1 method.
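
The counting rule and the accuracy formula can be written out as follows. The label strings "played" and "rested" and the function names are assumptions for illustration; no values from Table 9 or Table 10 are reproduced.

```python
def confusion_counts(actual, predicted, positive="played"):
    """Count (TP, FN, FP, TN), treating `positive` as the positive class."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1      # correct, recommendation is "played"
            else:
                fp += 1      # incorrect, recommendation is "played"
        else:
            if a == positive:
                fn += 1      # incorrect, recommendation is "rested"
            else:
                tn += 1      # correct, recommendation is "rested"
    return tp, fn, fp, tn

def accuracy(tp, fn, fp, tn):
    """(TP + TN) / (TP + FN + FP + TN)."""
    return (tp + tn) / (tp + fn + fp + tn)
```

A classifier whose output is biased toward one class shows up here as a zero in either the TN or the TP count, even if its accuracy looks moderate.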

（７．３）計算コスト
本実施形態で想定しているデータの規模について、最近の電子マネー利用サービスの動向を鑑みて具体化する。ただし、電子マネーには、先払い式電子マネーと後払い式電子マネーがある。先払い式電子マネーは属性が正しく登録されているとは限らないことから、ここでは、利用者と支払者が一致すると考えられる後払い式電子マネーに着目する。 (7.3) Calculation Cost The scale of data assumed in this embodiment is made concrete in light of recent trends in electronic money services. Electronic money comes in prepaid and postpaid forms. Because the attributes registered for prepaid electronic money are not necessarily correct, attention is paid here to postpaid electronic money, for which the user and the payer are considered to be the same person.

まず、ユーザ情報について、２０１０年３月末時点のユーザ数はＡ社が１４２０万人、Ｂ社が４８６万人、Ｃ社が１１０万人である。これより、ユーザ情報の数Ｎは、１００万〜１０００万人を想定する。ユーザの属性については、年代と性別と住所を想定し、属性情報Ｘの項目数Ｗは、３項目を想定する。また、年代は８区分、性別は２区分、住所は４７区分（都道府県）とし、属性情報Ｘの区分の総数Ｖは、５７種類を想定する。 First, regarding user information, the number of users as of the end of March 2010 is 14.2 million for Company A, 4.86 million for Company B, and 1.1 million for Company C. From this, the number N of user information assumes 1 million-10 million people. As for user attributes, age, gender, and address are assumed, and the number of items W of the attribute information X is assumed to be three items. Further, it is assumed that the age is 8 categories, the sex is 2 categories, the address is 47 categories (prefectures), and the total number V of attribute information X categories is 57 types.

次に、売上記録について、全国で１ヶ月に７８万６千箇所で１６８万件の電子マネー決済が行われていることから、１ヶ月１箇所あたりの売上情報の記録数の平均は２１８件である。この値は商店の規模、所在地、業種、来訪客のリピート率などにより大きく異なると考えられるため、売上記録に含まれる人数Ｍは、１００〜１０万人を想定する。商品の種類数については、飲食店のメニューはせいぜい１００種類も道程すれば十分かと考えられるが、コンビニエンスストアの商品は２５００種類ある。商店の規模、所在地、業種などにより大きく異なると考えられるため、商品の種類数Ｌは、１００〜１万種類を想定する。また、全国で１回あたりの決済金額は８３０円であるので、一度の来店で購入する商品は数種類であると考えられる。売上記録を月単位に締めるとすると、来訪客が毎週来店したとしても１ヶ月にユーザが購入する商品は十数種類であると考えられるので、売上記録に含まれる１人あたりの商品の種類数の平均Ｇは、１０種類を想定する。 Next, regarding sales records, 1,680,000 electronic money payments are made at 786,000 locations nationwide per month, so the average number of sales records per location per month is 218. Because this value is considered to vary greatly with the store's size, location, type of business, visitor repeat rate, and so on, the number of people M included in the sales records is assumed to be 100 to 100,000. As for the number of product types, about 100 is considered sufficient for a restaurant menu, whereas a convenience store carries 2,500 product types. Because this too varies greatly with the store's size, location, type of business, and so on, the number of product types L is assumed to be 100 to 10,000. In addition, the nationwide average payment per transaction is 830 yen, so a visitor is considered to purchase a few product types per visit. If sales records are closed monthly, then even a visitor who comes every week is considered to purchase only ten-odd product types per month, so the average number G of product types per person in the sales records is assumed to be 10.

次に、上記の規模を踏まえて、シミュレーションにより従来方式と本実施形態の計算コストを明らかにする。なお、プロバイダと商店はそれぞれ４コアのＣＰＵを備えた計算機を利用でき、来訪客は１コアのＣＰＵを備えた端末を利用できると仮定し、１回の冪剰余にかかる計算時間は６ｍｓ／コアと仮定する。 Next, based on the above scale, the calculation costs of the conventional methods and of this embodiment are clarified by simulation. It is assumed that the provider and the store can each use a computer with a 4-core CPU, that the visitor can use a terminal with a 1-core CPU, and that one modular exponentiation takes 6 ms per core.

図６は、１ヶ月あたりの売上人数と処理時間との関係を示す図である。学習にかかる計算コストとして、１ヶ月あたりの売上人数（単位：人）に対して処理時間（単位：日）をプロットすると、プロバイダのユーザ数が１００万人の場合は図６（ａ）のようになり、１，０００万人の場合は図６（ｂ）のようになる。実験の結果、本実施形態は文献１方式に対して１０倍以上速いという結果が得られた。プロバイダのユーザ数が１００万人で商品が１００種類の場合は、文献１方式では１００日以上かかってしまうが、本実施形態であれば１０日以内で計算できるという結果となった。 FIG. 6 shows the relationship between the number of customers per month and the processing time. Plotting the processing time (unit: days) against the number of customers per month (unit: people) as the calculation cost of learning gives FIG. 6(a) when the provider has 1 million users and FIG. 6(b) when it has 10 million users. The experiment showed that this embodiment is at least 10 times faster than the Document 1 method. With 1 million provider users and 100 product types, the Document 1 method takes 100 days or more, whereas this embodiment completes the calculation within 10 days.

図７は、商店が利用できる計算機の台数と、来訪客へのリコメンドの遅延との関係を示す図である。飲食店など商品の種類数Ｌが１００種類程度であれば、商店が利用できる計算機の台数が１台の場合に１０秒弱の遅延となる。計算機の台数を増やせば、商店が行うＶＬ回の冪剰余を並列実行できるので、遅延は減少する。しかし、来訪客の端末が行うＬ回の冪剰余は並列実行できないので、商店が計算機の台数を増やしても、遅延は１００×６×１０⁻³＝０．６秒に漸近するだけである。商品の種類Ｌを１００種類から１，０００種類、１０，０００種類へと増やすと、遅延も約１０倍、約１００倍へと増える結果となった。 FIG. 7 shows the relationship between the number of computers available to the store and the delay of the recommendation to the visitor. When the number of product types L is about 100, as in a restaurant, the delay is a little under 10 seconds if the store has one computer. Increasing the number of computers reduces the delay, because the VL modular exponentiations performed by the store can be executed in parallel. However, the L modular exponentiations performed by the visitor's terminal cannot be parallelized, so no matter how many computers the store adds, the delay only approaches 100 × 6 × 10⁻³ = 0.6 seconds. Increasing the number of product types L from 100 to 1,000 and 10,000 increased the delay about 10-fold and about 100-fold, respectively.
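
The delay figures above are consistent with a simple two-term model, sketched here as an assumption inferred from the text rather than the simulation actually used: the store's VL modular exponentiations are spread evenly over all of its cores, and the terminal's L modular exponentiations run sequentially on one core. V = 57 is the category count assumed in section (7.3); the function name is hypothetical.

```python
def recommendation_delay(L, V=57, machines=1, cores_per_machine=4, seconds_per_modexp=6e-3):
    """Estimated delay in seconds for one recommendation."""
    store = V * L * seconds_per_modexp / (machines * cores_per_machine)  # parallelizable part
    terminal = L * seconds_per_modexp                                    # sequential part
    return store + terminal

one_machine = recommendation_delay(L=100)                    # about 9.15 s
many_machines = recommendation_delay(L=100, machines=10**6)  # approaches 0.6 s
```

Because both terms are linear in L, the model also reproduces the roughly 10-fold and 100-fold increases when L grows from 100 to 1,000 and 10,000.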

（８）本実施形態と従来方式との比較
（８．１）安全性
本実施形態の学習過程の安全性、識別仮定の安全性について説明する。なお、以下の説明において、離散対数問題や準同型暗号などの暗号プリミティブは十分に安全であるとする。
本実施形態の学習過程は、以下の２つの過程により構成される。
学習過程１プロバイダと商店でのセキュアマッチングによる集計値の計算
学習過程２商店におけるパラメータおよびハイパーパラメータの計算 (8) Comparison between this embodiment and the conventional methods (8.1) Safety The safety of the learning process and of the identification process of this embodiment is described. In the following description, cryptographic primitives such as the discrete logarithm problem and homomorphic encryption are assumed to be sufficiently secure.
The learning process of this embodiment consists of the following two processes.
Learning process 1: Calculation of aggregate values by secure matching between the provider and the store
Learning process 2: Calculation of parameters and hyperparameters at the store

The security of learning process 1 depends on the security of secure matching. This embodiment uses one-way secure matching, that is, a protocol in which only the store obtains the aggregate values. The provider therefore obtains no information other than the number of people M in the store's sales records, the number of product types L, and the number of products G included in the sales per person. The store obtains no information other than the output of secure matching […]. M, L, G, N, W, and V are all public parameters, so the provider and the store learning them does not infringe on the users' privacy.
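The secure matching that learning process 1 relies on can be illustrated with a commutative-encryption sketch that follows the first to fourth calculation results described in the abstract. This is a toy under stated assumptions, not the embodiment's implementation: the prime P, the use of SHA-256 as the one-way function, the fixed keys, and all function names are hypothetical, both parties are simulated in one function, and the shuffling of ciphertexts is omitted.

```python
import hashlib

P = 2**127 - 1  # toy-sized Mersenne prime (assumption); real use needs a much larger group


def hash_to_group(user_id: str) -> int:
    """One-way function mapping a user identifier to an integer mod P."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % P


def encrypt(value: int, key: int) -> int:
    """Modular exponentiation; commutative: (x^a)^b == (x^b)^a mod P."""
    return pow(value, key, P)


def count_matches(ids_by_attribute, ids_by_option, provider_key, store_key):
    """Simulate both parties and return the match count per (attribute, option)."""
    # First result: provider encrypts its identifiers per attribute.
    first = {a: [encrypt(hash_to_group(u), provider_key) for u in ids]
             for a, ids in ids_by_attribute.items()}
    # Second result: store encrypts its identifiers per option.
    second = {o: [encrypt(hash_to_group(u), store_key) for u in ids]
              for o, ids in ids_by_option.items()}
    # Third result: provider re-encrypts the store's ciphertexts.
    third = {o: {encrypt(c, provider_key) for c in cts} for o, cts in second.items()}
    # Fourth result: store re-encrypts the provider's ciphertexts.
    fourth = {a: {encrypt(c, store_key) for c in cts} for a, cts in first.items()}
    # Doubly encrypted values coincide exactly when the identifiers coincide.
    return {(a, o): len(fourth[a] & third[o]) for a in fourth for o in third}
```

Because only doubly encrypted values are compared, the store learns the counts without seeing the provider's identifiers in the clear, which is the one-way property the paragraph above depends on.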

In learning process 2, the store optimizes the hyperparameters using only the aggregate values output by learning process 1 and the ciphertexts obtained in learning process 1. The store therefore obtains no new information about the provider's user information.
The identification process of this embodiment consists of a secure inner product calculation between the store and the visitor, and the store obtains no new information about the visitor's privacy. The visitor obtains only the marginal likelihood of each product, which is the output of the secure inner product calculation, and the number of product types.
In summary, apart from the global parameters M, L, G, N, W, and V, the provider obtains no information from the execution of this embodiment. The store obtains the aggregate values, and the visitor obtains the marginal likelihoods. The aggregate values bear on the protection of the provider's user information, and the marginal likelihoods bear on the protection of the store's sales records.

Regarding protection of the store's sales records, the Document 1 and Document 2 methods disclose only the recommended product to the visitor, whereas this embodiment discloses the marginal likelihood of every product. This is a trade-off against protecting the visitor's privacy. In practice, however, it is difficult to estimate a store's sales records from the marginal likelihoods for a single user's attributes. Even if attackers colluded and collected the marginal likelihoods for the attributes of many visitors, they could at most infer the proportion of sales of each product per attribute; they could not infer which product any individual visitor purchased. When a mobile phone is used as the visitor's terminal, the recommended product could also be derived inside the SIM/UIM, the IC card in the phone. In that case, as with the Document 1 and Document 2 methods, no information other than the recommended product is disclosed to the visitor.
As described above, this embodiment fully protects the visitor's privacy. In the Document 1 and Document 2 methods, by contrast, the visitor's attributes are disclosed to the store, as noted earlier.

(8.2) Accuracy
This section describes the accuracy of this embodiment based on the results of experiments using BCDS and PTDS.
In the experiment using BCDS, as shown in Table 8, users' ratings of the recommended books averaged 6.0 points for this embodiment and 1.6 points for the Document 1 method. Taking the standard deviation of the ratings into account, this embodiment generally (within ±1σ) obtained ratings of 5 to 7 points. Because the experiment trained on books that users had rated above 5 points before making recommendations, this result indicates that users were generally satisfied. By the same measure, the Document 1 and Document 2 methods generally obtained ratings of 0 to 3.2 points, so it is hard to say that they satisfied many users.

The two approaches also differed in the variety of books recommended, not only in the ratings. This embodiment recommended 337 different books, whereas the Document 1 and Document 2 methods recommended only 60, yielding skewed recommendations. This embodiment tends to recommend many different books according to the user's attributes. Varied recommendations tailored to attributes are arguably preferable to monotonous recommendations that offer everyone the same product. The BCDS experiment thus confirmed that this embodiment tends to recommend more products than the Document 1 method, and with higher accuracy.
In the experiment using PTDS, the Document 1 and Document 2 methods achieved an accuracy rate of only 36%, whereas this embodiment achieved 79%.

(8.3) Computational cost
The computational cost of this embodiment is described separately for the learning process and the identification process.
For the learning process, the Document 1 method requires 2NVL modular exponentiations, whereas the Document 2 method and this embodiment require only NW + MG + MGV + NWL. In the Document 2 method and this embodiment, the NWL operations performed at the store dominate, so the processing time varies mainly with the provider's number of users N and the store's number of product types L. Under the same conditions as in (4.4), a provider holding user information for 1 million people and a store handling 100 product types can complete learning within 10 days. This computation is highly parallel, so its running time is easily shortened by parallel processing. When scaling out on a cloud with the computing capacity of 1,000 machines, a provider holding user information for 10 million people and a store handling 10,000 product types can complete learning within 10 days.
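The two operation counts for the learning process can be compared directly. The sketch below only evaluates the formulas 2NVL and NW + MG + MGV + NWL; the function names and the sample values in the usage lines are hypothetical and are not the experimental conditions of (4.4).

```python
def learning_ops_document1(n, v, l):
    """Modular exponentiations for the Document 1 method: 2NVL."""
    return 2 * n * v * l


def learning_ops_embodiment(n, w, m, g, v, l):
    """Modular exponentiations for this embodiment: NW + MG + MGV + NWL."""
    return n * w + m * g + m * g * v + n * w * l


def store_share(n, w, m, g, v, l):
    """Fraction of the embodiment's operations taken by the store's NWL term."""
    return (n * w * l) / learning_ops_embodiment(n, w, m, g, v, l)


if __name__ == "__main__":
    # Illustrative values only (assumptions).
    params = dict(n=1000, w=3, m=50, g=2, v=10, l=100)
    print(learning_ops_document1(params["n"], params["v"], params["l"]))
    print(learning_ops_embodiment(**params))
    print(round(store_share(**params), 4))
```

With values of this kind the NWL term accounts for nearly all of the work, which is why the processing time tracks N and L.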

For the identification process, the Document 1 and Document 2 methods require multi-party computation, whereas this embodiment requires only VL + L modular exponentiations and is unaffected by the provider's number of users or the store's number of sales records. Under the same conditions as in (4.4), a store handling 100 product types can deliver a recommendation to a visitor with a delay of about 10 seconds. However, a visitor cannot scale out as easily as a store, so the L operations performed on the visitor's terminal become the bottleneck, and the processing time varies mainly with the number of products L. Even if the store uses a cloud with the computing capacity of 1,000 machines, a store handling 10,000 product types needs about 100 seconds. This can be mitigated when presenting recommendations, for example by not waiting for the recommendation values of all products to be computed and instead displaying, during the computation, the product with the highest marginal likelihood so far and replacing it as better candidates appear.
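The VL + L count follows from how a secure inner product works over an additively homomorphic scheme. The source does not name the scheme in this section, so the sketch below substitutes a toy Paillier cryptosystem as an assumption; the tiny primes, the integer-valued weights, and all function names are hypothetical and unsuitable for real use.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy Paillier key pair (assumed scheme); real keys need far larger primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Visitor side: encrypt integer m (0 <= m < n)."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) * pow(r, n, n2) % n2


def decrypt(n, private_key, c):
    """Visitor side: one modular exponentiation per product."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def encrypted_scores(n, enc_attrs, weights_by_product):
    """Store side: one modular exponentiation per (attribute, product) pair."""
    n2 = n * n
    scores = {}
    for product, weights in weights_by_product.items():
        acc = 1  # decrypts to 0
        for c, w in zip(enc_attrs, weights):
            # Enc(x)^w is Enc(w*x); multiplying ciphertexts adds plaintexts.
            acc = acc * pow(c, w, n2) % n2
        scores[product] = acc
    return scores
```

The store never sees the visitor's attributes in the clear, and the visitor recovers only one score per product, so the store's work scales with VL and the terminal's with L.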

(9) Modifications
The embodiment may be modified as in the following modifications. Multiple modifications may be combined, and the embodiment may be combined with the modifications.
(9.1) Modification 1
The embodiment described an example in which the second aggregate value is obtained by removing, in turn, the W ciphertexts corresponding to one person's data during secure sampling. Alternatively, the second aggregate value may be obtained by removing, in turn, the nW ciphertexts corresponding to the data of n (2 < n) people. This lowers accuracy compared with removing one person at a time, but it is considered more secure, because even a successful differential attack can narrow the users down only to a group of several people.
In short, the second aggregate value may be obtained as follows: for every combination of attribute and option, create multiple sets by randomly removing one or more ciphertexts from the ciphertexts that match between the third and fourth calculation results, and for each of those sets count the number of ciphertexts that match the fourth calculation result.
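One way to read the summary of the second aggregate value is as a leave-some-out recount. The sketch below is an interpretation under assumptions, not the embodiment's procedure: ciphertexts are represented as plain integers, one option is handled at a time, and the function name, parameters, and the use of a seeded random generator are hypothetical.

```python
import random


def second_aggregate_values(matched, fourth_by_attribute,
                            num_sets, num_removed, seed=None):
    """Recount matches after randomly removing ciphertexts.

    matched: set of ciphertexts that matched between the third and
             fourth calculation results (for one option).
    fourth_by_attribute: attribute -> set of fourth-result ciphertexts.
    Returns one {attribute: count} dict per generated set.
    """
    rng = random.Random(seed)
    pool = sorted(matched)  # fixed order so a seed reproduces the draw
    results = []
    for _ in range(num_sets):
        removed = set(rng.sample(pool, num_removed))
        kept = matched - removed
        results.append({attr: len(kept & cts)
                        for attr, cts in fourth_by_attribute.items()})
    return results
```

Raising `num_removed` corresponds to removing the data of more people per set, which is the accuracy-for-security trade described in Modification 1.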

(9.3) Modification 3
The terminal 50 may be a portable terminal such as a mobile phone, or a stationary information processing device such as a personal computer.

(9.4) Modification 4
The embodiment described an example in which the control units of the server 10, the recommendation support device 30, and the terminal 50 operate by executing programs, but functions equivalent to those of the embodiment may instead be implemented in hardware in each device. The program may also be provided on a computer-readable recording medium, such as an optical recording medium or a semiconductor memory, and read from the recording medium into the storage unit of each device. The program may also be provided via a telecommunication line.

DESCRIPTION OF SYMBOLS
10 ... server, 101 ... control unit, 102 ... storage unit, 103 ... communication unit, 11 ... user information storage unit, 12 ... first calculation unit, 13 ... third calculation unit, 30 ... recommendation support device, 301 ... control unit, 302 ... storage unit, 303 ... communication unit, 31 ... sales record storage unit, 32 ... second calculation unit, 33 ... fourth calculation unit, 34 ... first aggregation unit, 35 ... second aggregation unit, 36 ... parameter calculation unit, 37 ... recommendation value encryption unit, 50 ... terminal, 501 ... control unit, 502 ... storage unit, 503 ... communication unit, 51 ... attribute information storage unit, 52 ... attribute information encryption unit, 53 ... decryption unit

Claims

A recommendation support method executed by a server that stores user information in which a user identifier identifying a user of a terminal is associated with attribute information representing an attribute of the user, a recommendation support device that stores a selection history in which an option selected by the user is associated with the user identifier corresponding to the user, and a terminal that stores attribute information representing an attribute of the user of the terminal itself, the method comprising:
a first calculation step in which the server extracts, for each attribute, the user identifiers associated with the attribute from the user information and transmits to the recommendation support device a first calculation result obtained by encrypting the user identifiers with a first encryption method;
a second calculation step in which the recommendation support device extracts, for each option, the user identifiers associated with the option from the selection history and transmits to the server a second calculation result obtained by encrypting the user identifiers with the first encryption method;
a third calculation step in which the server transmits to the recommendation support device a third calculation result obtained by encrypting, with the first encryption method, the second calculation result received from the recommendation support device;
a fourth calculation step in which the recommendation support device calculates a fourth calculation result by encrypting, with the first encryption method, the first calculation result received from the server;
a first aggregation step in which the recommendation support device obtains a first aggregate value by counting, for all combinations of the attribute and the option, the number of ciphertexts that match between the third calculation result and the fourth calculation result;
a second aggregation step in which the recommendation support device creates, for all combinations of the attribute and the option, a plurality of sets by randomly removing one or more ciphertexts from the ciphertexts that match between the third calculation result and the fourth calculation result, and obtains a second aggregate value by counting, for each of the plurality of sets, the number of ciphertexts that match the fourth calculation result; and
a parameter calculation step of calculating, based on the first aggregate value and the second aggregate value, a parameter for obtaining a recommendation value corresponding to the product.

The recommendation support method according to claim 1, further comprising:
an attribute information encryption step in which the terminal transmits to the recommendation support device encrypted attribute information obtained by encrypting the attribute information stored in the terminal with a second encryption method satisfying additive homomorphism;
a recommendation value encryption step in which the recommendation support device receives the encrypted attribute information from the terminal, calculates, based on the parameter calculated in the parameter calculation step and the encrypted attribute information, an encrypted recommendation value in which the recommendation value is encrypted with a third encryption method satisfying additive homomorphism, and transmits the encrypted recommendation value to the terminal; and
a decryption step in which the terminal receives the encrypted recommendation value from the recommendation support device and decrypts it to calculate the recommendation value.

The recommendation support method according to claim 1, wherein the server and the recommendation support device perform encryption using a remainder calculation as the first encryption method.

The recommendation support method according to claim 3, wherein, in the first calculation step, the server performs the remainder calculation after applying a one-way function to the user identifier, and, in the second calculation step, the recommendation support device performs the remainder calculation after applying a one-way function to the user identifier.

The recommendation support method according to claim 1, wherein, in the first calculation step, the server rearranges the ciphertexts included in the first calculation result before transmitting them to the recommendation support device, and, in the third calculation step, the server rearranges the ciphertexts included in the third calculation result before transmitting them to the recommendation support device.

The recommendation support method according to any one of claims […], wherein, in the recommendation value encryption step, the recommendation support device encrypts the recommendation value using a remainder calculation satisfying additive homomorphism as the third encryption method.

A recommendation support device comprising:
communication means for communicating with a server and a terminal, the server having user information storage means for storing user information in which a user identifier identifying a user of the terminal is associated with attribute information representing an attribute of the user, first calculation means for extracting, for each attribute, the user identifiers associated with the attribute from the user information and transmitting externally a first calculation result obtained by encrypting the user identifiers with a first encryption method, and third calculation means for transmitting externally a third calculation result obtained by encrypting, with the first encryption method, a second calculation result received from outside, and the terminal having attribute information storage means for storing attribute information representing an attribute of the user of the terminal itself, attribute information encryption means for transmitting externally encrypted attribute information obtained by encrypting the attribute information with a second encryption method satisfying additive homomorphism, and decryption means for receiving an encrypted recommendation value from outside and decrypting it to calculate a recommendation value;
selection history storage means for storing a selection history in which an option selected by the user is associated with the user identifier corresponding to the user;
second calculation means for extracting, for each option, the user identifiers associated with the option from the selection history and transmitting to the server a second calculation result obtained by encrypting the user identifiers with the first encryption method;
fourth calculation means for calculating a fourth calculation result by encrypting, with the first encryption method, the first calculation result received from the server;
first aggregation means for obtaining a first aggregate value by counting, for all combinations of the attribute and the option, the number of ciphertexts that match between the third calculation result and the fourth calculation result;
second aggregation means for creating, for all combinations of the attribute and the option, a plurality of sets by randomly removing one or more ciphertexts from the ciphertexts that match between the third calculation result and the fourth calculation result, and obtaining a second aggregate value by counting, for each of the plurality of sets, the number of ciphertexts that match the fourth calculation result; and
parameter calculation means for calculating, based on the first aggregate value and the second aggregate value, a parameter for obtaining a recommendation value corresponding to the product.

The recommendation support device according to claim 7, further comprising recommendation value encryption means for receiving the encrypted attribute information from the terminal, calculating, based on the parameter calculated by the parameter calculation means and the encrypted attribute information, an encrypted recommendation value in which the recommendation value is encrypted with a third encryption method satisfying additive homomorphism, and transmitting the encrypted recommendation value to the terminal.

A program for causing a computer, the computer having communication means for communicating with a server and a terminal, the server having user information storage means for storing user information in which a user identifier identifying a user of the terminal is associated with attribute information representing an attribute of the user, first calculation means for extracting, for each attribute, the user identifiers associated with the attribute from the user information and transmitting externally a first calculation result obtained by encrypting the user identifiers with a first encryption method, and third calculation means for transmitting externally a third calculation result obtained by encrypting, with the first encryption method, a second calculation result received from outside, and the terminal having attribute information storage means for storing attribute information representing an attribute of the user of the terminal itself, attribute information encryption means for transmitting externally encrypted attribute information obtained by encrypting the attribute information with a second encryption method satisfying additive homomorphism, and decryption means for receiving an encrypted recommendation value from outside and decrypting it to calculate a recommendation value, to function as:
selection history storage means for storing a selection history in which an option selected by the user is associated with the user identifier corresponding to the user;
second calculation means for extracting, for each option, the user identifiers associated with the option from the selection history and transmitting to the server a second calculation result obtained by encrypting the user identifiers with the first encryption method;
fourth calculation means for calculating a fourth calculation result by encrypting, with the first encryption method, the first calculation result received from the server;
first aggregation means for obtaining a first aggregate value by counting, for all combinations of the attribute and the option, the number of ciphertexts that match between the third calculation result and the fourth calculation result;
second aggregation means for creating, for all combinations of the attribute and the option, a plurality of sets by randomly removing one or more ciphertexts from the ciphertexts that match between the third calculation result and the fourth calculation result, and obtaining a second aggregate value by counting, for each of the plurality of sets, the number of ciphertexts that match the fourth calculation result; and
parameter calculation means for calculating, based on the first aggregate value and the second aggregate value, a parameter for obtaining a recommendation value corresponding to the product.