JP2016510913A

JP2016510913A - Privacy protection recommendation method and system based on matrix factorization and ridge regression

Info

Publication number: JP2016510913A
Application number: JP2015561771A
Authority: JP
Inventors: ヨアニディス，エフストラティオス; ヴァインスベルグ，エフード; タフト，ニーナ，アン; ジョイエ，マルク; ニコラエンコ，ヴァレリア
Original assignee: Thomson Licensing SAS
Current assignee: Thomson Licensing SAS
Priority date: 2013-08-09
Filing date: 2014-05-01
Publication date: 2016-04-11
Also published as: KR20160041028A; CN105144625A; EP3031165A2; JP2016510912A; JP2016517069A; CN105009505A; CN105103487A

Abstract

A method and system for generating privacy-preserving recommendations begin by receiving as input, from a first set of users, a first set of records containing tokens and items. A first garbled circuit is designed and evaluated based on matrix factorization of the first set of records, and a second garbled circuit is designed and evaluated based on ridge regression on a second record from a requesting user. Recommendations about at least one item are thereby generated in a privacy-preserving way: the records and any information extracted from them are kept secret from every party other than their source, and the recommendations are known only to the requesting user. The system also includes a crypto-service provider that designs the garbled circuits and a recommender that evaluates the first circuit. The requesting user evaluates the second circuit and is not included in the first set of users.

Description

The present principles relate to privacy-preserving recommendation systems and secure multi-party computation, and in particular to providing recommendations to rating-contributing users and non-contributing users, based on matrix factorization and ridge regression, in a privacy-preserving and blind fashion.

Over the last decade, a great deal of research and commercial activity has led to the widespread use of recommendation systems. Such systems offer users personalized recommendations for many kinds of items, such as movies, TV shows, music, books, hotels, and restaurants. FIG. 1 illustrates the components of a general recommendation system 100: a number of users 110, representing a source, and a recommender system (RecSys) 130 that processes the users' inputs 120 and outputs recommendations 140. To receive useful recommendations, users supply substantial personal information about their preferences (the user inputs), trusting that the recommender will manage this data appropriately.

Nevertheless, earlier studies, such as Non-Patent Document 1 and Non-Patent Document 2, have identified multiple ways in which recommenders can abuse such information or expose users to privacy threats. Recommenders are often motivated to resell data for profit, and to extract information beyond what the user intentionally revealed. For example, even records of user preferences that are not typically perceived as sensitive, such as movie ratings or a person's TV viewing history, can be used to infer a user's political affiliation, gender, and so on. The private information that can be inferred from the data in a recommendation system keeps evolving as new data mining and inference methods are developed. In the extreme, records of user preferences can even be used to uniquely identify a user: Non-Patent Document 3 strikingly demonstrated this by de-anonymizing the Netflix dataset. As such, even if the recommender is not malicious, an unintentional leak of such data exposes users to linkage attacks, that is, attacks that use one database as auxiliary information to compromise privacy in a different database.

Because future inference threats, accidental information leakage, and insider threats (purposeful leakage) cannot always be foreseen, it is of interest to build a recommendation system in which users never reveal their personal data in unencrypted form. A co-pending application by the present inventors, filed on the same date as this application and entitled "A METHOD AND SYSTEM FOR PRIVACY PRESERVING MATRIX FACTORIZATION", describes a privacy-preserving recommendation system based on matrix factorization. It operates on ratings that users submit to the recommender system, and it profiles the items without learning the ratings of individual users or the items they have rated. It assumes that users consent to the recommender learning the item profiles.
The present principles propose a recommendation system with stronger privacy protection, in which the recommender system learns nothing about the users' ratings or about which items the users have rated, and learns nothing about the item profiles or any statistical information extracted from user data. Hence the recommendation system provides recommendations to the users who contributed ratings, while remaining completely blind to the recommendations it provides. In addition, by using ridge regression, the recommendation system can provide recommendations to new users who did not participate in the original matrix factorization operation.
[Cross-reference to related applications]
This application claims the benefit of and priority to the following U.S. provisional patent applications, filed on August 9, 2013: No. 61/864,088, entitled "A METHOD AND SYSTEM FOR PRIVACY PRESERVING MATRIX FACTORIZATION"; No. 61/864,085, entitled "A METHOD AND SYSTEM FOR PRIVACY PRESERVING COUNTING"; No. 61/864,094, entitled "A METHOD AND SYSTEM FOR PRIVACY-PRESERVING RECOMMENDATION TO RATING CONTRIBUTING USERS BASED ON MATRIX FACTORIZATION"; and No. 61/864,098, entitled "A METHOD AND SYSTEM FOR PRIVACY-PRESERVING RECOMMENDATION BASED ON MATRIX FACTORIZATION AND RIDGE REGRESSION". This application also claims the benefit of and priority to the following international application, filed on December 19, 2013: No. PCT/US13/76353, entitled "A METHOD AND SYSTEM FOR PRIVACY PRESERVING COUNTING". It further claims the benefit of and priority to the following U.S. provisional patent application, filed on March 4, 2013: No. 61/772,404, entitled "PRIVACY-PRESERVING LINEAR AND RIDGE REGRESSION". These provisional and international applications are incorporated herein by reference for all purposes.

Non-Patent Document 1: B. Mobasher, R. Burke, R. Bhaumik, and C. Williams, "Toward trustworthy recommender systems: An analysis of attack models and algorithm robustness", ACM Trans. Internet Techn., 7(4), 2007.
Non-Patent Document 2: E. Aïmeur, G. Brassard, J. M. Fernandez, and F. S. M. Onana, "ALAMBIC: A privacy-preserving recommender system for electronic commerce", Int. Journal Inf. Sec., 7(5), 2008.
Non-Patent Document 3: A. Narayanan and V. Shmatikov, "Robust de-anonymization of large sparse datasets", IEEE S&P, 2008.

The present principles propose a method for securely providing recommendations, in a privacy-preserving way, based on the collaborative filtering technique known as matrix factorization. Specifically, the method receives as input the ratings that users gave to items (e.g., movies, books) and generates a profile for each item and each user. These profiles can later be used to predict what rating a user would give to each item. Under the present principles, a recommender system based on matrix factorization can perform this task without learning the users' ratings, which items the users have rated, the item profiles, or any statistical information extracted from user data. In particular, the recommender system provides recommendations to the users who contributed ratings, in the form of predictions of how they would rate items they have not yet rated, while remaining completely blind to the recommendations it provides. In addition, by using ridge regression, the recommendation system can provide recommendations to new users who did not participate in the original matrix factorization operation.

According to one aspect of the present principles, a method is provided for securely generating recommendations through matrix factorization and ridge regression. The method includes: receiving a first set of records (220), wherein each record is received from a respective user in a first set of users (210), includes a set of tokens and a set of items, and is kept secret from parties other than the respective user (315); evaluating (230) the first set of records in a recommender (RecSys) using a first garbled circuit based on matrix factorization, wherein the output of the first garbled circuit (355) includes masked item profiles for all the items in the first set of records; receiving (330) a recommendation request from a requesting user for at least one item; and evaluating, by the requesting user, a second record and the masked item profiles using a second garbled circuit based on ridge regression, wherein the output of the second garbled circuit includes recommendations about the at least one item and the recommendations are known only to the requesting user (385). The method further includes: designing (340), in a crypto-service provider (CSP), the first garbled circuit to perform matrix factorization on the first set of records, wherein the output of the first garbled circuit includes masked item profiles for all the items in the first set of records; transferring the first garbled circuit to the RecSys (345); designing (365), in the CSP, the second garbled circuit to perform ridge regression on the second record and the masked item profiles, wherein the output of the second garbled circuit includes recommendations about the at least one item; and transferring the second garbled circuit to the requesting user (370). The designing steps include designing the matrix factorization operation as a Boolean circuit (3402) and designing the ridge regression operation as a Boolean circuit (3652). Designing the matrix factorization circuit includes constructing an array of the first set of records and performing on the array the operations of sorting (420, 440, 470, 490), copying (430, 450), updating (470, 480), comparing (480), and computing gradient contributions (460). The method further includes receiving, by the CSP, a set of parameters for the design of the garbled circuits, the parameters being sent by the RecSys (335, 360).

According to one aspect of the present principles, the method further includes encrypting the first set of records to create encrypted records (315), the encryption being performed before the first set of records is received. The method further includes generating public encryption keys in the CSP and sending the keys to the respective users (310). The encryption scheme may be a partially homomorphic encryption (310), and the method further includes masking the encrypted records in the RecSys to create masked records (320), and decrypting the masked records in the CSP to create masked-decrypted records (325). The designing step (340) further includes unmasking the masked-decrypted records inside the first garbled circuit before processing them. The method further includes performing an oblivious transfer (350) between the CSP and the RecSys (3502), wherein the RecSys receives the garbled values of the masked-decrypted records and the records are kept secret from the RecSys and the CSP.
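The encrypt, mask, decrypt, and unmask sequence above can be sketched in a few lines. This is only an illustration under stated assumptions: the text says the scheme may be partially homomorphic but does not name one, so the sketch assumes an additively homomorphic scheme and uses a toy Paillier construction with tiny fixed primes (insecure, for readability only). All function names are hypothetical, and the final unmasking is done in the clear here, whereas the text performs it inside the first garbled circuit.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier key generation (assumed scheme; tiny primes, NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    # mu is the inverse of L(g^lam mod n^2) modulo n, with L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

def add_plain(pub, c, k):
    """Homomorphically add the plaintext k to the value encrypted in c."""
    n, g = pub
    return (c * pow(g, k, n * n)) % (n * n)

# The CSP generates the key pair and publishes the public key (cf. step 310).
pub, priv = keygen()
n = pub[0]

# A user encrypts a rating before sending it to the RecSys (cf. step 315).
rating = 4
c = encrypt(pub, rating)

# The RecSys adds a random mask under encryption (cf. step 320).
mask = random.randrange(n)
c_masked = add_plain(pub, c, mask)

# The CSP decrypts and sees only the masked value (cf. step 325).
masked_plain = decrypt(pub, priv, c_masked)

# Unmasking; in the text this happens inside the first garbled circuit.
recovered = (masked_plain - mask) % n
```

In this sketch the RecSys only ever handles ciphertexts, and the CSP only ever sees `masked_plain`, which is uniformly distributed modulo `n` because the mask is.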

According to one aspect of the present principles, designing the ridge regression circuit (365) includes: receiving the masked item profiles and the second record from the requesting user (3653); unmasking the masked item profiles and creating an array of tuples of tokens, items, and item profiles, wherein the corresponding item profile is added to each token and item from the second record (3654); performing ridge regression on the array of tuples to generate a requesting user profile (3656); and computing recommendations from the requesting user profile and at least one item profile (3658). The step of creating the array for the ridge regression operation may be performed using a sorting network (3654). The method further includes performing a proxy oblivious transfer (380) among the requesting user, the CSP, and the RecSys (3802), wherein the requesting user receives the garbled values of the masked item profiles and the masked item profiles are kept secret from the requesting user and the CSP.
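The arithmetic behind steps 3656 and 3658 can be illustrated in the clear. The sketch below is not the garbled-circuit design described in the text. It assumes the standard closed-form ridge solution u = (VᵀV + λI)⁻¹ Vᵀr, where the rows of V are the profiles of the items the requesting user rated, and it assumes that a recommendation score is the inner product of the user profile with an item profile. The function names, the toy profiles, and the regularization value are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    # Work on an augmented copy [A | b] so the inputs are not modified.
    M = [list(A[i]) + [b[i]] for i in range(d)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, d):
            f = M[r][col] / M[col][col]
            for c in range(col, d + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    x = [0.0] * d
    for i in reversed(range(d)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, d))
        x[i] = (M[i][d] - s) / M[i][i]
    return x

def ridge_user_profile(item_profiles, ratings, lam):
    """Return u minimizing sum_j (r_j - <u, v_j>)^2 + lam * ||u||^2."""
    d = len(item_profiles[0])
    # A = V^T V + lam * I and b = V^T r (the ridge normal equations).
    A = [[sum(v[a] * v[c] for v in item_profiles) + (lam if a == c else 0.0)
          for c in range(d)] for a in range(d)]
    b = [sum(v[a] * r for v, r in zip(item_profiles, ratings)) for a in range(d)]
    return solve(A, b)

def predict(u, v):
    """Predicted rating: inner product of user and item profiles (assumption)."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Toy data: profiles of three items the requesting user rated, and the ratings.
rated_profiles = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ratings = [4.0, 2.0, 6.0]

u = ridge_user_profile(rated_profiles, ratings, lam=0.1)  # requesting user profile
score = predict(u, [0.5, 0.5])                            # score for an unrated item
```

Only item profiles and the requesting user's own ratings enter the computation, which is why a user who took no part in the original factorization can still be served.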

According to one aspect of the present principles, the method further includes receiving the number of tokens and items of each record (220, 305, 330). The method also includes padding each record with null entries when the number of tokens of the record is smaller than a value representing a maximum, so as to create records whose number of tokens equals that value (3052). The source of the first set of records is a database, and the source of the second record is a database.
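A minimal sketch of the padding step (3052), assuming each record is a list of (item, token) pairs and that the maximum length has been agreed in advance. The `NULL_ENTRY` placeholder, the function name, and the sample data are hypothetical; the text does not specify how a null entry is encoded.

```python
NULL_ENTRY = (None, None)  # hypothetical encoding of a null (item, token) pair

def pad_record(record, max_tokens):
    """Pad a list of (item, token) pairs with null entries up to max_tokens."""
    if len(record) > max_tokens:
        raise ValueError("record exceeds the agreed maximum length")
    return record + [NULL_ENTRY] * (max_tokens - len(record))

records = {
    "user_a": [("movie_1", 5), ("movie_7", 3)],
    "user_b": [("movie_2", 4)],
}
padded = {user: pad_record(rec, max_tokens=3) for user, rec in records.items()}
```

After padding, every record has the same length, so the length of a submitted record no longer reveals how many items its user rated.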

According to one aspect of the present principles, a system is provided for securely generating recommendations through matrix factorization and ridge regression. The system includes a first set of users that provides a first set of records; a crypto-service provider (CSP) that provides secure matrix factorization and ridge regression circuits; a RecSys that evaluates the matrix factorization circuit; and a requesting user that provides a second record and evaluates the ridge regression circuit, such that each record is kept secret from parties other than its user. The users, the CSP, and the RecSys each include a processor (602) that receives at least one input/output (604), and at least one memory (606, 608) in signal communication with the processor. The RecSys processor may be configured to: receive the first set of records from the first set of users, wherein each record includes a set of tokens and a set of items and is kept secret from parties other than the respective user; receive a request from the requesting user for at least one item; and evaluate the first set of records using a first garbled circuit based on matrix factorization, wherein the output of the first garbled circuit includes masked item profiles for all the items in the first set of records. The requesting user's processor may be configured to evaluate the second record and the masked item profiles using a second garbled circuit based on ridge regression, wherein the output of the second garbled circuit includes recommendations about the at least one item and the recommendations are known only to the requesting user. The CSP processor is configured to: design the first garbled circuit to perform matrix factorization on the first set of records, wherein the output of the first garbled circuit includes masked item profiles for all the items in the first set of records; transfer the first garbled circuit to the RecSys; design the second garbled circuit to perform ridge regression on the second record and the masked item profiles, wherein the output of the second garbled circuit includes recommendations about the at least one item; and transfer the second garbled circuit to the requesting user. The CSP processor may be configured to design the garbled circuits by being configured to design the matrix factorization operation as a Boolean circuit and the ridge regression operation as a Boolean circuit. The CSP processor may be configured to design the matrix factorization circuit by being configured to construct an array of the first set of records and to perform on the array the operations of sorting, copying, updating, comparing, and computing gradient contributions. The CSP processor is further configured to receive a set of parameters for the design of the garbled circuits, the parameters having been sent by the RecSys.

According to one aspect of the present principles, each user processor of the first set of users may be configured to encrypt its record, creating an encrypted record, before providing the record. The CSP processor may be configured to generate public encryption keys in the CSP and send the keys to the first set of users. The encryption scheme may be a partially homomorphic encryption; the RecSys processor may be further configured to mask the encrypted records to create masked records, and the CSP processor may be further configured to decrypt the masked records to create masked-decrypted records. The CSP processor may be configured to design the first garbled circuit by being further configured to unmask the masked-decrypted records inside the first garbled circuit before processing them. The RecSys processor and the CSP processor may be further configured to perform an oblivious transfer, wherein the RecSys receives the garbled values of the masked-decrypted records and the records are kept secret from the RecSys and the CSP. The CSP processor may be configured to design the second garbled circuit by being configured to: receive the masked item profiles and the second record from the requesting user; unmask the masked item profiles and create an array of tuples of tokens, items, and item profiles, wherein the corresponding item profile is added to each token and item from the second record; perform ridge regression on the array of tuples to generate a requesting user profile; and compute recommendations from the requesting user profile and at least one item profile. The CSP processor may be configured to create the array for the ridge regression operation by being configured to design a sorting network. The requesting user processor, the RecSys processor, and the CSP processor may be further configured to perform a proxy oblivious transfer, wherein the requesting user receives the garbled values of the masked item profiles and the masked item profiles are kept secret from the requesting user and the CSP.

According to one aspect of the present principles, the RecSys processor is further configured to receive the number of tokens of each record, the number of tokens having been sent by the source of each record. Each processor of the first set of users may be configured to pad its record with null entries when the number of tokens of the record is smaller than a value representing a maximum, so as to create records whose number of tokens equals that value. The source of the first set of records is a database, and the source of the second record is a database.

Other features and advantages of the present principles will become apparent from the following detailed description of embodiments, which refers to the accompanying drawings.

The present principles may be better understood with reference to the following drawings, briefly described below.
FIG. 1 is a diagram showing the components of a prior-art recommendation system.
A diagram showing the components of a recommendation system according to the present principles.
Four flowchart drawings showing a privacy-preserving recommendation method according to the present principles.
Three drawings showing a matrix factorization algorithm according to the present principles.
FIGS. 5A and 5B are diagrams showing a data structure S constructed by the matrix factorization algorithm according to the present principles.
A block diagram showing a computing environment used for implementing the present principles.

According to the present principles, a method is provided for securely making recommendations, in a privacy-preserving and blind fashion, based on the collaborative filtering technique known as matrix factorization.

The method of the present principles can serve as a service that makes recommendations about items in a corpus of records. Each record includes a set of tokens and items. The set of records includes more than one record, and the set of tokens includes at least one token. A person skilled in the art will recognize that, in the example above, a record may represent a user, and a token may be the user's rating of the corresponding item in the record. Tokens may also represent ranks, weights, or measures associated with items, and items may represent persons, tasks, or jobs. For example, the ranks, weights, or measures could relate to the health of individuals, with a researcher trying to correlate health measures across a population. They could also relate to the productivity of individuals, with a company trying to predict schedules for certain jobs from prior history. However, to ensure the privacy of the individuals involved, the service wishes to do so blindly, without learning the contents of each record, the item profiles it produces, or any statistical information extracted from user data (records). Specifically, the service should not learn (a) in which records each token/item appears or, a fortiori, (b) which tokens/items appear in each record, (c) the values of the tokens, or (d) the item profiles or any statistical information extracted from user data. In addition, by using ridge regression, the service can provide recommendations to new users who did not participate in the original matrix factorization operation. In the following, terms such as "privacy-preserving", "private", and "secure" are used interchangeably to indicate that the information a user regards as private (the record) is known only to that user; the term "blind" indicates that parties other than the user are also blind to the recommendations.

Several challenges are associated with performing matrix factorization in a privacy-preserving way. First, to address the privacy concerns, matrix factorization must be performed without the recommender ever learning the users' ratings, or even which items they have rated. The latter requirement is key: earlier studies have shown that merely knowing which movies a user has rated can be enough to infer, for example, the user's gender. Second, such a privacy-preserving algorithm must be efficient and scale gracefully (e.g., linearly) with the number of ratings submitted by users. The privacy requirements imply that the matrix factorization algorithm must be data-oblivious, that is, its execution must not depend on the user input. Moreover, the operations performed by matrix factorization are non-linear. It is therefore not clear a priori how to implement matrix factorization efficiently under both of these constraints. Finally, in a practical, real-world scenario, users have limited communication and computation resources and should not be expected to remain online after they have supplied their data. Instead, a "send and forget" type of solution is desirable, one that can operate while users move back and forth between being online and offline with respect to the recommendation service.
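To make the data-oblivious requirement concrete, the sketch below sorts with a fixed schedule of compare-exchange operations, so the sequence of positions touched depends only on the input length and never on the values. It uses odd-even transposition sort, chosen here only for brevity; it is an assumption of this sketch and not necessarily the sorting network used by the present principles. The function names are hypothetical.

```python
def compare_exchange(values, i, j):
    """Place the smaller of values[i], values[j] at i and the larger at j."""
    lo, hi = min(values[i], values[j]), max(values[i], values[j])
    values[i], values[j] = lo, hi

def network_schedule(n):
    """Fixed list of (i, j) pairs for odd-even transposition sort on n inputs."""
    return [(i, i + 1)
            for rnd in range(n)
            for i in range(rnd % 2, n - 1, 2)]

def oblivious_sort(values):
    """Sort by applying the fixed schedule; the schedule ignores the data."""
    out = list(values)
    for i, j in network_schedule(len(out)):
        compare_exchange(out, i, j)
    return out

sorted_ratings = oblivious_sort([3, 1, 2, 5, 4])
```

Because `network_schedule` takes only the length as its argument, two inputs of the same length produce the same access pattern, which is the property that lets a computation be laid out as a fixed Boolean circuit. This particular network uses a quadratic number of comparators, so it illustrates the property rather than the efficiency goal stated above.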

As an overview of matrix factorization, in the standard "collaborative filtering" setting, $n$ users rate a subset of $m$ possible items (e.g., movies). Let $[n] := \{1, \ldots, n\}$ be the set of users, $[m] := \{1, \ldots, m\}$ the set of items, and $\mathcal{M} \subseteq [n] \times [m]$ the set of user/item pairs for which a rating has been generated, with $M = |\mathcal{M}|$ the total number of ratings. Finally, for $(i, j) \in \mathcal{M}$, let $r_{i,j} \in \mathbb{R}$ denote the rating generated by user $i$ for item $j$. In a practical setting, both $n$ and $m$ are large numbers, typically in the range between $10^4$ and $10^6$. In addition, the ratings provided are sparse, that is, $M = O(n + m)$, which is much smaller than the total number of potential ratings $n \times m$. This is consistent with typical user behavior, as each user rates only a finite number of items (independent of the "catalog" size $m$).

Given the ratings in $\mathcal{M}$, a recommender system wishes to predict the ratings for the user/item pairs in $[n] \times [m] \setminus \mathcal{M}$. Matrix factorization performs this task by fitting a bilinear model to the existing ratings. Specifically, for some small dimension $d \in \mathbb{N}$, it is assumed that there exist vectors $u_i \in \mathbb{R}^d$, $i \in [n]$, and $v_j \in \mathbb{R}^d$, $j \in [m]$, such that

$$r_{i,j} = \langle u_i, v_j \rangle + \varepsilon_{i,j} \tag{1}$$

where the $\varepsilon_{i,j}$ are independent and identically distributed (i.i.d.) Gaussian random variables. The vectors $u_i$ and $v_j$ are called the user and item profiles, respectively, and $\langle u_i, v_j \rangle$ is the inner product of the vectors. In the notation used, $U = [u_i^T]_{i \in [n]} \in \mathbb{R}^{n \times d}$ is the $n \times d$ matrix whose $i$-th row contains the profile of user $i$, and $V = [v_j^T]_{j \in [m]} \in \mathbb{R}^{m \times d}$ is the $m \times d$ matrix whose $j$-th row contains the profile of item $j$.

Given the ratings $\{r_{i,j} : (i, j) \in \mathcal{M}\}$, the recommender typically computes the profiles $U$ and $V$ by performing the following regularized least squares minimization, for some $\lambda, \mu > 0$:

$$\min_{U, V} \; \frac{1}{M} \sum_{(i,j) \in \mathcal{M}} \left( r_{i,j} - \langle u_i, v_j \rangle \right)^2 + \lambda \sum_{i \in [n]} \|u_i\|_2^2 + \mu \sum_{j \in [m]} \|v_j\|_2^2 \tag{2}$$

As those skilled in the art will recognize, assuming Gaussian priors on the profiles $U$ and $V$, the minimization in equation (2) corresponds to maximum likelihood estimation of $U$ and $V$. Note that, given the user and item profiles, the recommender can predict the rating $\hat{r}_{i,j}$ for user $i$ and item $j$ as

$$\hat{r}_{i,j} = \langle u_i, v_j \rangle, \quad i \in [n], \; j \in [m] \tag{3}$$
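By way of non-limiting illustration, the regularized objective and the inner-product prediction described above can be sketched in a few lines of Python. The sketch assumes that the objective is the mean squared residual over the rated pairs plus the penalties $\lambda \sum_i \|u_i\|^2$ and $\mu \sum_j \|v_j\|^2$. The function names, the dictionary layout of the ratings, and the toy values are hypothetical and are not taken from the present principles.

```python
from typing import Dict, List, Tuple

Ratings = Dict[Tuple[int, int], float]  # (user i, item j) -> rating r_ij
Matrix = List[List[float]]              # one profile per row


def inner(u: List[float], v: List[float]) -> float:
    """Inner product <u, v> of two profiles."""
    return sum(a * b for a, b in zip(u, v))


def predict(U: Matrix, V: Matrix, i: int, j: int) -> float:
    """Predicted rating of user i for item j: the inner product of their profiles."""
    return inner(U[i], V[j])


def objective(ratings: Ratings, U: Matrix, V: Matrix, lam: float, mu: float) -> float:
    """Regularized mean square error F(U, V) (assumed form, see lead-in)."""
    fit = sum((r - predict(U, V, i, j)) ** 2 for (i, j), r in ratings.items())
    fit /= len(ratings)
    reg_users = lam * sum(inner(u, u) for u in U)
    reg_items = mu * sum(inner(v, v) for v in V)
    return fit + reg_users + reg_items


# Hypothetical toy data: n = 2 users, m = 3 items, d = 2.
ratings = {(0, 0): 5.0, (0, 1): 1.0, (1, 1): 2.0, (1, 2): 4.0}
U = [[1.0, 2.0], [0.5, 1.5]]
V = [[1.0, 2.0], [1.0, 0.0], [0.0, 2.0]]
```

Only the rated pairs enter the fit term, which is why the cost of evaluating the objective grows with $M$ rather than with $n \times m$.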

The regularized mean square error in equation (2) is not a convex function. Several methods for performing this minimization have been proposed in the literature. The present principles focus on gradient descent, a popular method, which is described below. Denoting by $F(U, V)$ the regularized mean square error in equation (2), gradient descent operates by iteratively adapting the profiles $U$ and $V$ through the adaptation rule

$$u_i(t) = u_i(t-1) - \gamma \nabla_{u_i} F\big(U(t-1), V(t-1)\big), \qquad v_j(t) = v_j(t-1) - \gamma \nabla_{v_j} F\big(U(t-1), V(t-1)\big) \tag{4}$$

where $\gamma > 0$ is a small gain factor and

$$\nabla_{u_i} F(U, V) = -\frac{2}{M} \sum_{j : (i,j) \in \mathcal{M}} v_j \left( r_{i,j} - \langle u_i, v_j \rangle \right) + 2 \lambda u_i, \qquad \nabla_{v_j} F(U, V) = -\frac{2}{M} \sum_{i : (i,j) \in \mathcal{M}} u_i \left( r_{i,j} - \langle u_i, v_j \rangle \right) + 2 \mu v_j \tag{5}$$

where $U(0)$ and $V(0)$ consist of uniformly random norm-1 rows (i.e., the profiles are selected uniformly at random from the norm-1 ball).
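The gradient descent iteration described above can be sketched in plain (non-private) Python as follows. This is only an illustrative sketch under stated assumptions: the objective is taken to be the mean squared residual plus the penalties $\lambda \sum_i \|u_i\|^2$ and $\mu \sum_j \|v_j\|^2$, the initial rows are drawn as Gaussian vectors rescaled to norm 1, and the names `gradient_step` and `factorize` as well as the default parameter values are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Ratings = Dict[Tuple[int, int], float]  # (user i, item j) -> rating
Matrix = List[List[float]]


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def random_unit_rows(rows: int, d: int, rng: random.Random) -> Matrix:
    """Rows drawn as Gaussian vectors and rescaled to Euclidean norm 1."""
    out = []
    for _ in range(rows):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in g))
        out.append([x / norm for x in g])
    return out


def loss(ratings: Ratings, U: Matrix, V: Matrix, lam: float, mu: float) -> float:
    """Assumed regularized mean square error F(U, V)."""
    fit = sum((r - dot(U[i], V[j])) ** 2 for (i, j), r in ratings.items()) / len(ratings)
    return fit + lam * sum(dot(u, u) for u in U) + mu * sum(dot(v, v) for v in V)


def gradient_step(ratings, U, V, lam, mu, gamma):
    """One simultaneous update of U and V, both gradients taken at the old profiles."""
    M = len(ratings)
    grad_U = [[2.0 * lam * x for x in u] for u in U]
    grad_V = [[2.0 * mu * x for x in v] for v in V]
    for (i, j), r in ratings.items():
        err = r - dot(U[i], V[j])
        for k in range(len(U[i])):
            grad_U[i][k] -= 2.0 / M * err * V[j][k]
            grad_V[j][k] -= 2.0 / M * err * U[i][k]
    new_U = [[x - gamma * g for x, g in zip(u, gu)] for u, gu in zip(U, grad_U)]
    new_V = [[x - gamma * g for x, g in zip(v, gv)] for v, gv in zip(V, grad_V)]
    return new_U, new_V


def factorize(ratings, n, m, d, lam=0.01, mu=0.01, gamma=0.02, steps=1000, seed=0):
    """Run a fixed number of gradient steps from random norm-1 profiles."""
    rng = random.Random(seed)
    U, V = random_unit_rows(n, d, rng), random_unit_rows(m, d, rng)
    for _ in range(steps):
        U, V = gradient_step(ratings, U, V, lam, mu, gamma)
    return U, V
```

The number of iterations is fixed in advance rather than chosen by a convergence test, which is the kind of input-independent control flow that a circuit implementation would require.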

Another aspect of the present principles is to propose a secure multi-party computation (MPC) algorithm for matrix factorization based on sorting networks and Yao's garbled circuits. Secure multi-party computation was first proposed by A. Chi-Chih Yao in the 1980s. Yao's protocol (also known as garbled circuits) is a generic method for secure multi-party computation. In a variant thereof, adapted from "Privacy-preserving Ridge Regression on Hundreds of Millions of Records" by V. Nikolaenko, U. Weinsberg, S. Ioannidis, M. Joye, D. Boneh, and N. Taft (IEEE S&P, 2013), the protocol is run between a set of $n$ input owners, where $a_i$ denotes the private input of user $i$, $1 \le i \le n$, an evaluator that wishes to evaluate $f(a_1, \ldots, a_n)$, and a third party, the cryptographic service provider (CSP). At the end of the protocol, the evaluator learns the value of $f(a_1, \ldots, a_n)$, but no party learns more than what is revealed by this output value. The protocol requires that the function $f$ can be expressed as a Boolean circuit, e.g., as a graph of OR, AND, NOT, and XOR gates, and that the evaluator and the CSP do not collude.
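To make the notion of a garbled gate concrete, the following toy Python sketch garbles and evaluates a single AND gate. It is a teaching sketch only and is not the construction of the present principles: SHA-256 is used as a stand-in key derivation function, a block of zero bytes marks the row that decrypts correctly, and the input labels are handed to the evaluator directly, whereas in a real protocol they would be obtained through oblivious transfer. All names are hypothetical.

```python
import hashlib
import random
import secrets

LABEL_BYTES = 16
TAG = b"\x00" * 16  # padding that lets the evaluator recognise the valid row


def _pad(label_a: bytes, label_b: bytes) -> bytes:
    """32-byte one-time pad derived from the two input labels."""
    return hashlib.sha256(label_a + label_b).digest()


def _xor(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))


def garble_and_gate():
    """Garbler side: two random labels per wire and a shuffled encrypted truth table."""
    labels = {
        wire: (secrets.token_bytes(LABEL_BYTES), secrets.token_bytes(LABEL_BYTES))
        for wire in ("a", "b", "out")
    }
    table = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            out_label = labels["out"][bit_a & bit_b]
            pad = _pad(labels["a"][bit_a], labels["b"][bit_b])
            table.append(_xor(pad, out_label + TAG))
    random.SystemRandom().shuffle(table)  # hide which row belongs to which inputs
    return labels, table


def evaluate(table, label_a: bytes, label_b: bytes) -> bytes:
    """Evaluator side: only the row matching the held labels decrypts to label + TAG."""
    pad = _pad(label_a, label_b)
    for row in table:
        plain = _xor(pad, row)
        if plain.endswith(TAG):
            return plain[:LABEL_BYTES]
    raise ValueError("no row decrypted correctly")
```

The evaluator holds one label per input wire and recovers one output label, without learning which bit any of the labels encodes. Larger circuits chain gates by feeding output labels into the next table.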

There are now many frameworks that implement Yao's garbled circuits. A different approach to general-purpose MPC is based on secret-sharing schemes, and another is based on fully homomorphic encryption (FHE). Secret-sharing schemes have been proposed for a variety of linear algebra operations, such as solving a linear system, linear regression, and auctions. Secret sharing requires at least three non-colluding online authorities that share the workload of the computation equally and communicate over multiple rounds; the computation is secure as long as no two of them collude. Garbled circuits assume only two non-colluding authorities and far less communication, which makes them better suited to a scenario where the evaluator is a cloud service and the cryptographic service provider (CSP) is implemented in a trusted hardware component.
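As a hedged illustration of the secret-sharing alternative mentioned above, the sketch below implements a toy 2-out-of-3 Shamir sharing over a prime field. It is offered only to show why three authorities suffice as long as no two collude; it is not part of the present principles, and the modulus and function names are hypothetical choices.

```python
import secrets

PRIME = 2**61 - 1  # hypothetical field modulus (a Mersenne prime)


def share(secret: int, parties: int = 3):
    """Split a secret into points (x, f(x)) of a random line f with f(0) = secret."""
    slope = secrets.randbelow(PRIME)
    return [(x, (secret + slope * x) % PRIME) for x in range(1, parties + 1)]


def reconstruct(two_shares) -> int:
    """Recover f(0) from any two shares by interpolating the line."""
    (x1, y1), (x2, y2) = two_shares
    inverse = pow((x2 - x1) % PRIME, -1, PRIME)  # modular inverse, Python 3.8+
    return (y1 * x2 - y2 * x1) * inverse % PRIME


def add_shared(x_shares, y_shares):
    """Each authority adds its own two shares locally; no communication is needed."""
    return [(xa, (ya + yb) % PRIME) for (xa, ya), (_, yb) in zip(x_shares, y_shares)]
```

A single share is a uniformly random field element and reveals nothing, whereas any two shares determine the secret, which matches the non-collusion assumption stated above. Additions are local, while multiplications require interaction between the authorities, which is where the multiple communication rounds arise.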

Regardless of the cryptographic primitive used, the main challenge in building an efficient algorithm for secure multi-party computation is to implement the algorithm in a data-oblivious fashion, i.e., so that the execution path does not depend on the input. In general, any RAM program executable in bounded time $T$ can be converted to a Turing machine (TM) running in $O(T^3)$ time. A Turing machine is a theoretical computing machine, devised by Alan Turing, that serves as an idealized model of mathematical calculation, and $O(T^3)$ means that the complexity grows at most proportionally to $T^3$. In addition, any bounded $T$-time TM can be converted to a circuit of size $O(T \log T)$, which is data-oblivious. This implies that any bounded $T$-time executable RAM program can be converted to a data-oblivious circuit of complexity $O(T^3 \log T)$. Such complexity is prohibitively high for most applications. A survey of algorithms for which no efficient data-oblivious implementation is known appears in W. Du and M. J. Atallah, "Secure multi-party computation problems and their applications: A review and open problems" (New Security Paradigms Workshop, 2001). The matrix factorization problem broadly falls into the category of data mining summarization problems.
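The meaning of "data-oblivious" can be illustrated with a small Python sketch that computes a maximum without any input-dependent branch. The function names are hypothetical, and the sketch illustrates only the structure of such a computation: the Python interpreter itself makes no constant-time guarantee.

```python
from typing import Sequence


def oblivious_select(bit: int, x: int, y: int) -> int:
    """Multiplexer: returns x if bit == 1 and y if bit == 0, using arithmetic only."""
    return bit * x + (1 - bit) * y


def oblivious_max(values: Sequence[int]) -> int:
    """Maximum computed with the same sequence of operations for every input."""
    best = values[0]
    for v in values[1:]:
        greater = int(v > best)  # comparison result used as data, not as a branch
        best = oblivious_select(greater, v, best)
    return best
```

Because the loop always performs one comparison and one selection per element, the computation maps directly onto a circuit with a fixed number of comparator and multiplexer gates.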

Sorting networks were originally developed to enable sorting parallelization as well as efficient hardware implementation. These networks are circuits that sort an input sequence $(a_1, a_2, \ldots, a_n)$ into a monotonically increasing sequence $(a'_1, a'_2, \ldots, a'_n)$. They are constructed by wiring together compare-and-swap circuits, their main building block. Several works exploit the data-obliviousness of sorting networks for cryptographic purposes. However, encryption is not always enough to ensure privacy. An adversary who can observe the access patterns to encrypted storage can still learn sensitive information about what an application is doing. Oblivious RAM solves this problem by continuously shuffling memory as it is accessed, thereby completely hiding which data is being accessed or even when it was previously accessed. In oblivious RAM, sorting is used as a means of generating a data-oblivious random permutation. More recently, oblivious RAM has been used to perform data-oblivious computations of a convex hull, all nearest neighbors, and weighted set intersection.
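A sorting network can be sketched as a fixed schedule of compare-and-swap operations. The Python sketch below uses the odd-even transposition network, chosen here only because it is short; it needs on the order of $n^2$ comparators, and more economical networks exist. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def comparator_schedule(n: int) -> List[Tuple[int, int]]:
    """Odd-even transposition network: the wire pairs depend only on n, never on the data."""
    return [(i, i + 1) for rnd in range(n) for i in range(rnd % 2, n - 1, 2)]


def compare_and_swap(a: List[int], i: int, j: int) -> None:
    """Place the smaller value on wire i and the larger on wire j."""
    lo, hi = min(a[i], a[j]), max(a[i], a[j])
    a[i], a[j] = lo, hi


def sort_with_network(values: Sequence[int]) -> List[int]:
    a = list(values)
    for i, j in comparator_schedule(len(a)):
        compare_and_swap(a, i, j)
    return a
```

The schedule is computed from the length of the input alone, so an observer who sees which wires are compared learns nothing about the values being sorted.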

Another aspect of the present principles is that the recommendation system uses ridge regression to provide recommendations to new users who did not originally participate in the matrix factorization operation. Ridge regression is an algorithm that takes a large number of data points as input and finds the best-fit curve through these points. The algorithm is a building block for many machine-learning algorithms. As described in US Provisional Patent Application No. 61/772,404, given a set of $n$ input variables $x_i \in \mathbb{R}^d$ and a set of output variables $y_i \in \mathbb{R}$, the problem of learning a function $f: \mathbb{R}^d \to \mathbb{R}$ such that $y_i \approx f(x_i)$ is known as regression.

Linear regression is based on the premise that $f$ is well approximated by a linear map, i.e., that for some $\beta \in \mathbb{R}^d$,

$$y_i \approx \beta^T x_i, \quad i \in [n] \tag{6}$$

where $(\cdot)^T$ denotes the transpose operation.

Besides its use for prediction, the vector $\beta = (\beta_k)_{k=1,\ldots,d}$ is of interest because it reveals how $y$ depends on the input variables. In particular, the sign of a coefficient $\beta_k$ indicates positive or negative correlation with the output, while its magnitude captures its relative importance. To ensure that these coefficients are comparable, and also for numerical stability, the inputs $x_i$ are rescaled to the same finite domain (e.g., $[-1, 1]$).

To compute the vector $\beta \in \mathbb{R}^d$, the latter is fit to the data by minimizing the following quadratic function over $\mathbb{R}^d$:

$$F(\beta) = \sum_{i=1}^{n} \left( y_i - \beta^T x_i \right)^2 + \lambda \|\beta\|_2^2 \tag{7}$$

The procedure of minimizing equation (7) is called ridge regression. The objective function $F(\beta)$ includes the penalty term $\lambda \|\beta\|_2^2$, which favors parsimonious solutions. Intuitively, for $\lambda = 0$, the minimization corresponds to solving a simple least squares problem. For $\lambda > 0$, the penalty term penalizes solutions with a large norm: between two solutions that fit the data equally well, the one with fewer large coefficients is preferred.
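The effect of the penalty term can be seen in the one-dimensional case, where the ridge objective (sum of squared residuals plus $\lambda \beta^2$) has the closed-form minimizer $\beta = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda)$. The following Python sketch, with hypothetical function names and toy data, also shows one possible way of rescaling inputs to $[-1, 1]$ as mentioned earlier.

```python
from typing import List, Sequence


def rescale(values: Sequence[float]) -> List[float]:
    """Map values linearly onto [-1, 1]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def ridge_1d(xs: Sequence[float], ys: Sequence[float], lam: float) -> float:
    """Minimizer of sum_i (y_i - beta * x_i)^2 + lam * beta^2 for scalar inputs."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


# Hypothetical data lying exactly on the line y = 2x after rescaling.
xs = rescale([0.0, 5.0, 10.0])   # becomes [-1.0, 0.0, 1.0]
ys = [-2.0, 0.0, 2.0]
```

With these data, $\lambda = 0$ recovers the least squares slope of 2, while $\lambda = 2$ shrinks the coefficient to 1, showing how the penalty trades fit for a smaller norm.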

The present principles propose a method based on secure multi-party sorting that is close to weighted set intersection but incorporates garbled circuits. FIG. 2 illustrates the operation of a privacy-preserving recommendation system according to the present principles. The operation involves the following parties:
I. The recommendation system (RecSys) 230, an apparatus that performs a blind, privacy-preserving matrix factorization operation. Specifically, the RecSys blindly computes the item profiles $V$ extracted from a matrix factorization over the users' ratings, without learning anything useful about the users, including which movies they rated, what ratings they gave, or any statistical information extracted from user data (averages, item profiles, etc.), including the recommendations.
II. The cryptographic service provider (CSP) 250, which enables the secure computation without learning anything useful about the users, including which movies they rated, what ratings they gave, or any statistical information extracted from user data (averages, item profiles, etc.), including the recommendations.
III. A source A consisting of one or more users 210, comprising a set A of users, each having a set of ratings for a set of items 220. Each user $i \in [n]$ consents to the profiling of items based on their ratings $r_{i,j} : (i, j) \in \mathcal{M}$ through matrix factorization, but does not wish to reveal anything to the recommender, including their ratings, which items they have rated, and any statistical information extracted from user data (averages, item profiles, etc.). Some of these users may wish to receive recommendations and others may not. For example, the recommendation system may pay users for their data. Equivalently, source A may represent a database containing the data of one or more users A.
IV. A source B consisting of one or more users 210, comprising a set of users B 2104, each having a set of ratings for a set of items and wishing to receive recommendations in the form of predictions of how they would rate other items. Each user does not wish to reveal anything to the recommender, including their ratings, which items they have rated, and any statistical information extracted from user data (averages, item profiles, etc.). Set B may or may not overlap with set A; that is, a user who wishes to obtain recommendations may or may not have participated in the matrix factorization operation. Hence, set A and set B may or may not be disjoint. Equivalently, source B may represent a database containing the data of one or more users B.

According to the present principles, a protocol is proposed that allows the RecSys to perform matrix factorization while neither the RecSys nor the CSP learns anything useful about the users, including the recommendations $R$. Specifically, neither should learn a user's ratings or which items the user has rated, and neither should learn the item profiles $V$, the user profiles $U$, the recommendations, or any statistical information extracted from user data. As those skilled in the art will appreciate, a protocol that allows the recommender to learn both the user and the item profiles reveals too much: in such a design, the recommender can trivially infer a user's ratings from the inner product in equation (3). The present principles therefore propose a privacy-preserving protocol in which the recommender and the CSP learn neither the user profiles, nor the item profiles, nor any statistical information extracted from user data. In short, they perform their operations completely blind and learn nothing useful about the users or extracted from user data.

An item profile can be regarded as a metric that defines an item as a function of the ratings of a set of users/records. Similarly, a user profile can be regarded as a metric that defines a user as a function of the ratings of a set of users/records. In this sense, an item profile is a measure of approval/disapproval of an item. A user profile is a measure of the user's likes and dislikes, that is, a reflection of the user's personality. When computed over a large set of users/records, an item or user profile can be regarded as an independent measure of the item or user, respectively. Those skilled in the art will recognize the utility of learning the item profiles alone. First, the embedding of items in $\mathbb{R}^d$ through matrix factorization allows the recommender to infer (and encode) similarity: items whose profiles have a small Euclidean distance are items that are rated similarly by users. As such, the task of learning the item profiles is of interest to the recommender beyond the actual task of recommendation. In particular, users may not need or wish to receive recommendations, as may be the case when the source is a database. Second, having obtained the item profiles, there is a trivial way for the recommender to use them to provide relevant recommendations without any additional data revelation by users. The recommender can send $V$ to a user (or release it publicly); knowing their own ratings per item, user $i$ can infer their (private) profile $u_i$ by solving equation (2) with respect to $u_i$ for the given $V$. For fixed $V$ this problem is separable across users, so each user can obtain their profile by performing ridge regression over their own ratings. With $u_i$ and $V$, the user can locally predict their ratings for other items through equation (3).
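The local computation described above, in which a user derives a private profile from the item profiles and then predicts ratings for unrated items, can be sketched in plain Python. The sketch assumes that the user minimizes the sum of squared residuals over their own ratings plus $\lambda \|u\|^2$, which leads to the linear system $(\sum_j v_j v_j^T + \lambda I)\, u = \sum_j r_j v_j$. The function names and toy values are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def solve(A: List[Vector], b: Vector) -> Vector:
    """Gauss-Jordan elimination with partial pivoting for the system A x = b."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def user_profile(rated: Dict[int, float], V: List[Vector], lam: float) -> Vector:
    """Ridge regression of the user's ratings on the profiles of the rated items."""
    d = len(V[0])
    A = [[lam if a == b else 0.0 for b in range(d)] for a in range(d)]
    rhs = [0.0] * d
    for j, r in rated.items():
        for a in range(d):
            rhs[a] += r * V[j][a]
            for b in range(d):
                A[a][b] += V[j][a] * V[j][b]
    return solve(A, rhs)


def predict(u: Vector, v: Vector) -> float:
    """Predicted rating: inner product of the user and item profiles."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical item profiles (d = 2) and one user's ratings of items 0 and 1.
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rated = {0: 4.0, 1: 2.0}
```

With these toy values and $\lambda = 1$, the user obtains the profile $(2, 1)$ and a predicted rating of 3 for the unrated item 2, all computed locally without revealing any rating.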

Both of the above scenarios presume that neither the recommender nor the users object to the public release of $V$. For simplicity, and because of the utility of such a protocol to the recommender, the co-pending application by the present inventors, filed on the same date as this application and entitled "A METHOD AND SYSTEM FOR PRIVACY PRESERVING MATRIX FACTORIZATION", allows the recommender to learn the item profiles. The present principles extend that design so that the recommender operates blindly and learns nothing useful about the users (not even $V$), while users learn their predicted ratings, and so that users who did not contribute ratings to the matrix factorization can also obtain recommendations.

According to the present principles, the security guarantees hold under the honest-but-curious threat model. In other words, the RecSys and the CSP follow the protocol as described above; however, these curious parties may elect to analyze the protocol transcripts, even offline, in order to infer additional information. It is further assumed that the recommender and the CSP do not collude.

本原理の好ましい実施形態は、図３のフローチャート３００を満たし、次のステップにより記述されるプロトコルを有する：
Ｐ１．ソースＡはＲｅｃＳｙｓに、トークン（レーティング）とアイテムのいくつのペアが各参加レコード３０５に送られるか、レポートする。レコードのセットは二以上のレコードを含み、レコードごとのトークンのセットは少なくとも１つのトークンを含む。ソースがユーザのセットであるとき、各ユーザは自分のトークンとアイテムの数をＲｅｃＳｙｓに個別にレポートする。
Ｐ２．ＣＳＰは部分的準同型スキームξの公開暗号鍵を生成し、それをすべてのユーザ（ソースＡ）に送る（ステップ３１０）。当業者には言うまでもなく、準同型暗号は、暗号の一形式であり、これにより、暗号文に対してあるタイプの計算を実行して暗号化された結果を求め、平文に対して行った演算の結果と一致させることができる。例えば、どちらの人も個別の数字を見いだすことができなくても、ある人は２つの暗号化数字を加え、他の人はその結果を復号できる。部分的準同型暗号は平文に対する１つの演算（加算又は乗算）に関しては準同型である。部分的準同型暗号はスカラーへの加算及び乗算に関しては準同型である。ソースＡがユーザのセットであるとき、各ユーザは自分のトークンとアイテムの数をＲｅｃＳｙｓに個別にレポートする。
Ｐ３．セットＡ中の各ユーザは自分の鍵を用いて自分のデータを暗号化する（３１５）。具体的に、すべてのペア（ｊ，ｒ_ｉ，ｊ）について、ここでｊはアイテムｉｄであり、ｒ_ｉ，ｊはユーザｉがｊに与えたレーティングであり、ユーザは公開暗号鍵を用いてこのペアを暗号化する。セットＡ中の各ユーザは、自分の暗号化データをＲｅｃＳｙｓに送る。
Ｐ４．ＲｅｃＳｙｓは、暗号化データにマスクηを加え、暗号化されマスクされたデータをＣＳＰに送る（３２０）。当業者には言うまでもなく、マスクはデータ難読化の一形式であり、乱数生成器やシャッフリングと同じくらい単純でよい。
Ｐ５．ＣＳＰは暗号化しマスクしたデータを復号する（３２５）。
Ｐ６．ＲｅｃＳｙｓは、すべてのアイテムのコーパス中の少なくとも一アイテムについて、少なくとも一要求ユーザからリコメンデーション要求を受け取る（３３０）。各要求ユーザはセットＢに属し、ステップＰ１の貢献レコードを有しても有しなくてもよい。リコメンデーションを要求する要求ユーザがセットＡからのものであれば、本願と同日に出願され、発明の名称が「ＡＭＥＴＨＯＤＡＮＤＳＹＳＴＥＭＦＯＲＰＲＩＶＡＣＹ−ＰＲＥＳＥＲＶＩＮＧＲＥＣＯＭＭＥＮＤＡＴＩＯＮＴＯＲＡＴＩＮＧＣＯＮＴＲＩＢＵＴＩＮＧＵＳＥＲＳＢＡＳＥＤＯＮＭＡＴＲＩＸＦＡＣＴＯＲＩＺＡＴＩＯＮ」である本発明者による同時係属中の出願に記載したように別のプロトコルを使える。各要求ユーザは、いくつのアイテムをレーティングしたか、ＲｅｃＳｙｓにレポートする。
Ｐ７．ＲｅｃＳｙｓは、ＣＳＰに、ユーザとアイテムプロファイルの大きさ（すなわち、パラメータｄ）、レーティングの総数（すなわち、パラメータＭ）、セットＡ中のユーザの総数とアイテムの総数、及びｇａｒｂｌｅｄｃｉｒｃｕｉｔ中の実数の整数及び少数部分を表すのに用いられるビット数を含む、第１のｇａｒｂｌｅｄｃｉｒｃｕｉｔを構成するのに必要な完全な仕様を送る（３３５）。
Ｐ８．ＣＳＰは、レコードに対して行列因子分解を行う、当業者にｇａｒｂｌｅｄｃｉｒｃｕｉｔとして知られたものを準備する（３４０）。ｇａｒｂｌｅｄであるために、回路はまずブーリアン回路として書かれる（３４０２）。回路への入力は、ＲｅｃＳｙｓがユーザデータをマスクするのに用いたマスクを含む。回路内において、マスクはデータをアンマスク（ｕｎｍａｓｋ）し、次いで行列因子分解するのに用いられる。回路の出力はＶ、すなわちアイテムプロファイルである。ＣＳＰは、アイテムｊごとに１つ、ランダムマスクρ_ｊを選択する。これらは各アイテムｊのプロファイルを隠すのに用いられる。回路は、平文にアイテムプロファイルＶを出力するのではなく、ＣＳＰにより構成された回路はマスクｐ_ｊでマスクされたアイテムプロファイルＶｊを出力する。個別レコードの内容及びレコードから抽出された情報に関する知識は得られない。
Ｐ９．ＣＳＰはＲｅｃＳｙｓに、行列因子分解のためｇａｒｂｌｅｄｃｉｒｃｕｉｔを送る（３４５）。具体的に、ＣＳＰは、ｇａｒｂｌｅｄテーブルへのゲートを処理して、それを回路構造により決まる順番でＲｅｃＳｙｓに送信する。
Ｐ１０．ＲｅｃＳｙｓとＣＳＰとの間の（３５０２）ｏｂｌｉｖｉｏｕｓｔｒａｎｓｆｅｒ（３５０）により、ＲｅｃＳｙｓは自機またはＣＳＰが実際の値を学習せずに、暗号化されマスクされたレコードのｇａｒｂｌｅｄ値を学習する。当業者には言うまでもなく、普通の（ｐｌａｉｎ）ｏｂｌｉｖｉｏｕｓｔｒａｎｓｆｅｒは、送信者が潜在的に多くの情報のうちの一つを（どの情報が転送されたかについてｏｂｌｉｖｉｏｕｓでいる）受信者に転送する一種の転送である。プロキシ（ｐｒｏｘｙ）ｏｂｌｉｖｉｏｕｓｔｒａｎｓｆｅｒは４以上のパーティが関与するｏｂｌｉｖｉｏｕｓｔｒａｎｓｆｅｒである。
Ｐ１１．ＲｅｃＳｙｓは、マスクされたアイテムプロファイルを出力し、それをＣＳＰに送るｇａｒｂｌｅｄｃｉｒｃｕｉｔを評価する（３５５）。
Ｐ１２．ＲｅｃＳｙｓはＣＳＰに数Ｍｊを知らせ、第２のｇａｒｂｌｅｄｃｉｒｃｕｉｔの仕様を与える。ほとんどのパラメータは、ユーザ及びアイテムプロファイルの大きさ（すなわち、パラメータｄ）、ｇａｒｂｌｅｄｃｉｒｃｕｉｔ中の実数の整数及び小数部を表すのに用いるビット数を含む、第１のｇａｒｂｌｅｄｃｉｒｃｕｉｔ中のパラメータを複製する（３６０）。
Ｐ１３．ＣＳＰは、要求ユーザのレーティングとマスクされたアイテムプロファイルにリッジ回帰を行いユーザが関心を有するアイテムのリコメンデーションを生成する第２のｇａｒｂｌｅｄｃｉｒｃｕｉｔを準備する（３６５）。ｇａｒｂｌｅｄであるために、回路はまずブーリアン回路として書かれる（３６５２）。この回路は次のタスクを実行する：
ａ．ユーザによりレーティングされた各アイテムｗについて、マスクされたアイテムプロファイルν_ｊ＋ρ_ｊを入力として受け取り、要求ユーザｉからＭ_ｊ個のレーティング（ｗ，ｒ_ｉ，ｗ）を受け取る（３６５２）。
ｂ．アイテムプロファイルをアンマスクし、ユーザによりレーティングされた各アイテムｗについて、それをユーザｉのＭ_ｊ個のペア（ｗ，ｒ_ｊ，ｗ）のタプル（ｗ，ｒ_ｊ，ｗ，ν_ｗ）のアレイに入れる（３６５４）。これは次のステップにより実行される：
ｉ．ユーザによりレーティングされた各アイテムｗについて、ユーザｉのＭ_ｊ個のすべてのペア（ｗ，ｒ_ｊ，ｗ）に次いで、アンマスクされたすべてのアイテムプロファイルν_ｊをアレイに入れる。
ｉｉ．ソーティングネットワークを用いて、アイテムプロファイルに関してこのアレイをソートする。その際、ソーティングの終わりに、各ペア（ｗ，ｒ_ｊ，ｗ）の次に、それが対応するプロファイルν_ｗが来るようにする。
ｉｉｉ．回路は、右から左にリニアパスを行い、各アイテムのアンマスクされたプロファイルν_ｗをそれが対応するタプル（ｗ，ｒ_ｊ，ｗ）にコピーする。
ｉｖ．ソーティングネットワークを用いて、回路は、これらのレーティングタプルをアイテムプロファイルから分離し、レーティングが、コピーされたアイテムプロファイルと共に、アレイの第１のＭ_ｊ個のポジションを占めるようにする。
ｃ．回路は、次いで、レーティングとそのアイテムプロファイルに対してリッジ回帰を行い、必要な代入をすることにより式（７）から導かれる

の解であるユーザプロファイルｕ_ｉを計算する（３６５６）。これは、米国仮特許出願第６１／７７２４０４に記載されたように、リッジ回帰を行う回路を用いて計算できる。
ｄ．このプロファイルｕ_ｉとアンマスクされたアイテムプロファイルν_ｊとを用いて、回路は、すべての関心アイテムｊの予測レーティング
［外６］

を計算し、これらの予測を出力する（３６５８）。
Ｐ１４．ＣＳＰは、セットＢの要求ユーザｉにこの回路を転送する（３７０）。
Ｐ１５．要求ユーザｉとＣＳＰとの間のｏｂｌｉｖｉｏｕｓｔｒａｎｓｆｅｒ３７５により（３７５２）、ユーザは自分の入力（ｊ，ｒ_ｉ，ｊ）に対するｇａｒｂｌｅｄ値を取得する。
Ｐ１６．要求ユーザｉ、ＲｅｃＳｙｓ、及びＣＳＰの間のプロキシｏｂｌｉｖｉｏｕｓｔｒａｎｓｆｅｒ３８０により、ユーザは、マスクされたアイテムプロファイルν_ｊ＋ρ_ｊに対応するｇａｒｂｌｅｄ値を取得する。具体的に、このプロキシｏｂｌｉｖｉｏｕｓｔｒａｎｓｆｅｒにおいて、ＲｅｃＳｙｓはマスクされたアイテムプロファイルを提供し、要求ユーザはマスクされたアイテムプロファイルのｇａｒｂｌｅｄ値を受け取り、ＣＳＰはプロキシとして動作する一方、どのパーティもアイテムプロファイルを学習せず、ＲｅｃＳｙｓのみがマスクされたアイテムプロファイルを知っている。
Ｐ１７．要求ユーザは回路を評価し、すべての関心アイテムの予測レーティングを出力として取得する（３８５）。 A preferred embodiment of the present principles satisfies the flowchart 300 of FIG. 3 and has a protocol described by the following steps:
P1. Source A reports to RecSys how many token (rating) and item pairs are sent to each participation record 305. The set of records includes two or more records, and the set of tokens for each record includes at least one token. When the source is a set of users, each user reports his token and number of items individually to RecSys.
P2. The CSP generates a public encryption key for the partially homomorphic scheme ξ and sends it to all users (source A) (step 310). It goes without saying to those skilled in the art that homomorphic cryptography is a form of cryptography, whereby an encrypted result is obtained by performing a certain type of computation on the ciphertext and an operation performed on the plaintext. Can be matched with the result. For example, if neither person can find an individual number, one can add two encrypted numbers and the other can decrypt the result. Partially homomorphic encryption is homomorphic for one operation (addition or multiplication) on plaintext. Partially homomorphic encryption is homomorphic with respect to addition and multiplication to scalars. When Source A is a set of users, each user reports his token and the number of items individually to RecSys.
P3. Each user in the set A encrypts his / her data using his / her key (315). Specifically, for all pairs (j, ri _{, j} ), where j is the item id, ri _{, j} is the rating given to _j by user i, and the user uses the public encryption key Encrypt this pair. Each user in set A sends his encrypted data to RecSys.
P4. RecSys adds the mask η to the encrypted data and sends the encrypted and masked data to the CSP (320). It goes without saying to those skilled in the art that a mask is a form of data obfuscation and can be as simple as a random number generator or shuffling.
P5. The CSP decrypts the encrypted and masked data (325).
P6. RecSys receives a recommendation request from at least one requesting user for at least one item in the corpus of all items (330). Each requesting user belongs to set B and may or may not have the contribution record of step P1. If the requesting user requesting the recommendation is from Set A, it will be filed on the same day as the present application, and the title of the invention will be “A METHOD AND AND SYSTEM FOR PRIVACY-PRESVERVING RECOMMENDATION TO RATING CONTRIBUTING USERS BASED FACTED ON MATRIZ Other protocols can be used as described in the copending application by the inventor. Each requesting user reports to RecSys how many items have been rated.
P7. RecSys tells the CSP the size of the user and item profile (ie, parameter d), the total number of ratings (ie, parameter M), the total number of users in set A and the total number of items, and a real integer in a garbled circuit. And the full specification necessary to construct the first garbled circuit, including the number of bits used to represent the fractional part (335).
P8. The CSP prepares (340) what is known to those skilled in the art as a garbled circuit that performs matrix factorization on the records. Because it is garbled, the circuit is first written as a Boolean circuit (3402). The input to the circuit includes the mask that RecSys used to mask the user data. Within the circuit, the mask is used to unmask the data and then to matrix factorization. The output of the circuit is V, ie the item profile. The CSP selects one random mask ρ _j for each item j. These are used to hide the profile of each item j. Circuit, instead of outputting the item profile V in plaintext, circuit constituted by CSP outputs the item profile Vj masked by the mask p _j. Knowledge about the contents of the individual records and the information extracted from the records cannot be obtained.
P9. The CSP sends a garbled circuit to RecSys for matrix factorization (345). Specifically, the CSP processes the gate to the garbled table and sends it to RecSys in the order determined by the circuit structure.
P10. By (3502) obligious transfer (350) between RecSys and CSP, RecSys learns the garbled value of the encrypted and masked record without learning the actual value by itself or the CSP. It goes without saying to those skilled in the art that a plain obligate transfer is a type in which the sender forwards one of a lot of potentially information to the receiver (which is oblivious as to which information has been forwarded). It is a transfer. A proxy obligatory transfer is an obligate transfer involving four or more parties.
P11. RecSys outputs a masked item profile and evaluates the garbled circuit that sends it to the CSP (355).
P12. RecSys informs the CSP of the number Mj and gives the specifications for the second garbled circuit. Most parameters duplicate the parameters in the first garbled circuit, including the size of the user and item profile (ie parameter d), the real integer in the garbled circuit and the number of bits used to represent the fractional part. (360).
P13. The CSP prepares a second garbled circuit that performs ridge regression on the requesting user's rating and the masked item profile to generate recommendations for items of interest to the user (365). To be garbled, the circuit is first written as a Boolean circuit (3652). This circuit performs the following tasks:
a. Receives as input the masked item profiles ν_j + ρ_j and, for each item w rated by the requesting user i, the M_j ratings (w, r_{i,w}) (3652).
b. Unmasks the item profiles and, for each item w rated by the user, attaches the profile to the rating, turning the M_j pairs (w, r_{i,w}) of user i into an array of tuples (w, r_{i,w}, v_w) (3654). This is done by the following steps:
i. All M_j pairs (w, r_{i,w}) of user i are placed in an array together with all unmasked item profiles v_j.
ii. The array is sorted by item using a sorting network, so that at the end of sorting each pair (w, r_{i,w}) is followed by the profile v_w corresponding to it.
iii. The circuit performs a linear pass from right to left and copies the unmasked profile v_w of each item into the tuple (w, r_{i,w}) to which it corresponds.
iv. Using a sorting network, the circuit separates the rating tuples from the item profiles so that the ratings, along with their copied item profiles, occupy the first M_j positions of the array.
c. Performs ridge regression on the ratings and their item profiles, computing the user profile u_i that solves the problem derived from equation (7) after the necessary substitutions [Equation image] (3656). This can be computed using a circuit that performs ridge regression, as described in US Provisional Patent Application No. 61/777,404.
d. Using this profile u_i and the unmasked item profiles ν_j, computes the predicted ratings [Equation image 6] of all items of interest j and outputs these predictions (3658).
P14. The CSP forwards the circuit to the requesting user i of set B (370).
P15. Through an oblivious transfer 375 between the requesting user i and the CSP (3752), the user obtains the garbled values of his inputs (j, r_{i,j}).
P16. Through a proxy oblivious transfer 380 among the requesting user i, RecSys, and the CSP, the user obtains the garbled values corresponding to the masked item profiles ν_j + ρ_j. Specifically, in this proxy oblivious transfer, RecSys provides the masked item profiles, the requesting user receives the garbled values of the masked item profiles, and the CSP acts as the proxy. No party learns the item profiles, and only RecSys knows the masked item profiles.
P17. The requesting user evaluates the circuit and obtains the predicted rating of all items of interest as output (385).
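The unmasking, ridge regression, and prediction tasks of step P13 can be illustrated in the clear with the following minimal Python sketch. It is not the garbled circuit itself. It assumes the standard regularized least-squares form of ridge regression (equation (7) is not reproduced in this section), and the names `solve`, `user_profile`, `masks`, and the regularization weight `lam` are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def user_profile(ratings, masked, masks, lam=0.1):
    """ratings: {w: r_iw}; masked: {j: v_j + rho_j}; masks: {j: rho_j} (assumed inputs)."""
    # Task b: remove the masks to recover the item profiles v_j.
    V = {j: [a - b for a, b in zip(masked[j], masks[j])] for j in masked}
    d = len(next(iter(V.values())))
    # Task c: ridge regression, i.e. solve (sum_w v_w v_w^T + lam I) u = sum_w r_iw v_w.
    A = [[lam * (p == q) for q in range(d)] for p in range(d)]
    b = [0.0] * d
    for w, r in ratings.items():
        v = V[w]
        for p in range(d):
            b[p] += r * v[p]
            for q in range(d):
                A[p][q] += v[p] * v[q]
    u = solve(A, b)
    # Task d: predicted rating of every item as the inner product of u and v_j.
    preds = {j: sum(x * y for x, y in zip(u, v)) for j, v in V.items()}
    return u, preds
```

In the protocol these computations run inside the second garbled circuit, so the requesting user sees only the predictions; the sketch only shows the arithmetic involved.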

The above configuration works for the users of set B, who may or may not be included in set A; that is, they may or may not have submitted their ratings for the matrix factorization operation.

Technically, this protocol leaks the number of tokens provided by each user. This can be remedied by a simple protocol modification, for example by padding the submitted records with appropriate "null" entries up to a predetermined maximum number. For simplicity, the protocol has been described without this padding operation.
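The padding modification mentioned above can be sketched as follows. This is a minimal illustration only: the `NULL` marker, the function name `pad_records`, and the choice of maximum length are assumptions, and a real deployment would also need the null entries to be indistinguishable from real records once encrypted.

```python
NULL = None  # hypothetical marker for a "null" entry


def pad_records(records, max_len):
    """Pad a user's list of (item, rating) records with null entries up to max_len."""
    if len(records) > max_len:
        raise ValueError("more records than the agreed maximum")
    return list(records) + [NULL] * (max_len - len(records))
```

With every user submitting exactly `max_len` entries, the number of submissions no longer reveals how many items a user actually rated.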

Because a garbled circuit can be used only once, any subsequent computation on the same ratings would require the users to resubmit their data through proxy oblivious transfer. For this reason, the protocol of the present principles adopts a hybrid approach that combines public-key encryption with garbled circuits.

In the present principles, public-key encryption is used as follows. Each user i encrypts each of his inputs (j, r_{i,j}) under the CSP's public key pk_CSP using the encryption algorithm ξ_{pk_CSP}. For each rated item j, the user submits to RecSys a pair (i, c), where c = ξ_{pk_CSP}(j, r_{i,j}). In total, M ratings are submitted. Users who have submitted their ratings can go offline.

The CSP public-key encryption algorithm is partially homomorphic: a constant can be applied to an encrypted message without knowledge of the corresponding decryption key. Clearly, an additively homomorphic scheme such as Paillier or Regev can be used to add a constant, but hash-ElGamal, which is only partially homomorphic, suffices and can be implemented more efficiently in this case.
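The partial homomorphism of hash-ElGamal can be illustrated with a toy Python sketch: XOR-ing a mask into the second ciphertext component yields an encryption of the masked plaintext, without the decryption key. The modulus `P`, base `G`, the byte encoding of (j, r), and all function names are assumptions chosen for illustration; the parameters are not secure and this is not the implementation described in the source.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus (assumption; not a secure parameter choice)
G = 3           # toy base (assumption)


def keygen():
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G, sk, P)


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _pad(shared, n):
    # Hash the shared group element into an n-byte one-time pad (n <= 32).
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()[:n]


def encrypt(pk, msg):
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), xor(msg, _pad(pow(pk, r, P), len(msg)))


def decrypt(sk, ct):
    c1, c2 = ct
    return xor(c2, _pad(pow(c1, sk, P), len(c2)))


def mask(ct, eta):
    # Partial homomorphism: XOR a constant into the ciphertext without the key.
    c1, c2 = ct
    return c1, xor(c2, eta)


def encode(j, r):
    # Hypothetical encoding of (item id, integer rating in 0..255) as 5 bytes.
    return j.to_bytes(4, "big") + bytes([r])
```

Here RecSys would call `mask` on each submitted ciphertext; whoever holds the secret key then decrypts to the masked value only, and the mask is removed later inside the garbled circuit.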

Upon receiving the M ratings from the users, and recalling that the encryption is partially homomorphic, RecSys obscures each ciphertext c with a random mask η, obtaining c ⊕ η, where η is a random or pseudo-random variable and ⊕ is the XOR operation. RecSys sends the masked ciphertexts to the CSP together with the complete specifications needed to build a garbled circuit. Specifically, RecSys specifies the dimension of the user and item profiles (i.e., parameter d), the total number of ratings (i.e., parameter M), the total number of users and items, and the number of bits used to represent the integer and fractional parts of real numbers in the garbled circuit.

Whenever RecSys wishes to perform matrix factorization over the M accumulated ratings, it reports M to the CSP. The CSP provides RecSys with a garbled circuit that decrypts the inputs and then performs matrix factorization. In V. Nikolaenko, U. Weinsberg, S. Ioannidis, M. Joye, D. Boneh, and N. Taft, "Privacy-preserving ridge regression on hundreds of millions of records" (IEEE S&P, 2013), decryption within the circuit is avoided by using masks and homomorphic encryption. The present principles apply this idea to matrix factorization, but require only a partially homomorphic encryption scheme.

Upon receiving the ciphertexts, the CSP decrypts them and obtains the masked values (j, r_{i,j}) ⊕ η. Then, using matrix factorization as a blueprint, the CSP prepares a Yao garbled circuit that:
(a) takes as input the garbled values corresponding to the masks η;
(b) removes the masks η to recover the corresponding tuples (i, j, r_{i,j});
(c) performs matrix factorization; and
(d) outputs the item profiles masked with ρ_j, i.e., the masked item profiles ν_j + ρ_j.

The computation of matrix factorization by gradient descent, as outlined in equations (4) and (5), involves additions, subtractions, and multiplications of real numbers. These operations can be implemented efficiently in a circuit. The K iterations of gradient descent (4) correspond to K circuit "layers", each of which computes the new values of the profiles from their values in the previous layer. The output of the circuit is the item profiles V; the user profiles are discarded.
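Real-number arithmetic in a circuit is typically carried out on fixed-point integers, with a fixed number of bits for the fractional part. The following minimal Python sketch shows such an encoding; the number of fractional bits `FRAC_BITS` and the helper names are assumptions, not values taken from the source.

```python
FRAC_BITS = 16  # hypothetical number of bits for the fractional part


def to_fixed(x, f=FRAC_BITS):
    """Encode a real number as an integer scaled by 2**f."""
    return int(round(x * (1 << f)))


def from_fixed(a, f=FRAC_BITS):
    """Decode a fixed-point integer back to a float."""
    return a / (1 << f)


def fx_add(a, b):
    # Addition and subtraction act directly on the scaled integers.
    return a + b


def fx_mul(a, b, f=FRAC_BITS):
    # The product carries 2f fractional bits; shifting truncates back to f bits.
    return (a * b) >> f
```

Fewer fractional bits give smaller adders and multipliers (and hence a smaller circuit) at the price of larger truncation error.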

As those skilled in the art will appreciate, when the operations are performed in the clear (for example, in the RAM model), the time complexity of computing each iteration of gradient descent is O(M). Each gradient computation (5) involves adding 2M terms, and the profile updates (4) can be performed in O(n + m) = O(M).
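The O(M) cost of one iteration in the clear can be seen in the following Python sketch, which touches each rating once. Equations (4) and (5) are not reproduced in this section, so the sketch assumes the standard regularized squared-error objective with a single regularization weight `lam` and step size `gamma`; both names and the function name `gd_iteration` are hypothetical.

```python
def gd_iteration(ratings, U, V, gamma=0.01, lam=0.1):
    """One plaintext gradient descent step. ratings: {(i, j): r_ij}."""
    d = len(U[0])
    # Regularization part of the gradients: O(n + m) work.
    gU = [[2 * lam * x for x in u] for u in U]
    gV = [[2 * lam * x for x in v] for v in V]
    # Data part: each of the M ratings contributes one term to a user
    # gradient and one term to an item gradient (2M terms in total).
    for (i, j), r in ratings.items():
        err = r - sum(a * b for a, b in zip(U[i], V[j]))
        for t in range(d):
            gU[i][t] -= 2 * err * V[j][t]
            gV[j][t] -= 2 * err * U[i][t]
    # Profile update: step against the gradient.
    new_U = [[x - gamma * g for x, g in zip(u, gu)] for u, gu in zip(U, gU)]
    new_V = [[x - gamma * g for x, g in zip(v, gv)] for v, gv in zip(V, gV)]
    return new_U, new_V
```

The loop indexes directly into `U[i]` and `V[j]`, which is exactly what a circuit cannot do without knowing at design time which pairs (i, j) are rated.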

The main challenge in implementing gradient descent as a circuit lies in doing so efficiently. To illustrate this, consider the following naive implementation:
Q1. For each pair (i, j) ∈ [n] × [m], generate from the input the indicator δ_{i,j} = 1_{(i,j)∈M} (1 if i rated j, 0 otherwise).
Q2. At each iteration, use the outputs of these circuits to compute each item and user gradient as a summation over m and n products, respectively, as given by [Equation image].

Unfortunately, this implementation is inefficient: the circuit complexity of each iteration of gradient descent is O(n × m). When M ≪ n × m, which is usually the case in practice, the above circuit is drastically less efficient than gradient descent in the clear. In fact, for most datasets, the quadratic cost O(n × m) is prohibitive.
The inefficiency of the naive implementation arises from the inability to identify, at circuit design time, which users have rated an item and which items have been rated by a user. This prevents the circuit from exploiting the inherent sparsity of the data.

In contrast, a preferred embodiment of the present principles provides a circuit implementation based on sorting networks whose complexity is Θ((n + m + M) log²(n + m + M)), i.e., within a polylogarithmic factor of the implementation in the clear. In summary, the input data, corresponding to the tuples (i, j, r_{i,j}), and placeholders for both user and item profiles are stored together in an array. Through appropriate sorting operations, user or item profiles can be placed close to the inputs with which they share an identifier. Linear passes through the data then allow the gradients to be computed and the profiles to be updated. When sorting, a placeholder is treated as +∞, i.e., larger than any other number.
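To get a rough feeling for the gap between the two per-iteration costs, the asymptotic expressions can be evaluated for a given problem size. The Python sketch below ignores all constant factors and gate-level details, so the numbers are only indicative; the function names are hypothetical.

```python
import math


def naive_cost(n, m):
    """Order of the per-iteration cost of the naive circuit: n * m."""
    return n * m


def sorting_cost(n, m, M):
    """Order of the per-iteration cost of the sorting-based circuit."""
    N = n + m + M
    return N * math.log2(N) ** 2
```

For sparse data, where M is much smaller than n × m, `sorting_cost(n, m, M)` grows far more slowly than `naive_cost(n, m)` as n and m increase.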

A matrix factorization algorithm following flowchart 400 of FIG. 4, according to a preferred embodiment of the present principles, may be described by the following steps:
C1. Initialize the matrix S (410).
The algorithm receives as input the sets [Equation image 11] or, equivalently, the tuples [Equation image 12], and constructs an array S of n + m + M tuples. The first n and m tuples of S serve as placeholders for the user and item profiles, respectively, while the remaining M tuples store the inputs L_i. More specifically, for each user i ∈ [n], the algorithm constructs a tuple [Equation image 13], where u_i ∈ R^d is the initial profile of user i, selected at random. For each item j ∈ [m], the algorithm constructs a tuple [Equation image 14], where ν_j ∈ R^d is the initial profile of item j, also selected at random. Finally, for each pair (i, j) ∈ M, the algorithm constructs the corresponding tuple [Equation image 15], where r_{i,j} is the rating of user i for item j. The resulting array is shown in FIG. 5(A). Denoting by s_{l,k} the l-th element of the k-th tuple, these elements play the following roles:
(a) s_{1,k}: the user identifier in [n];
(b) s_{2,k}: the item identifier in [m];
(c) s_{3,k}: a binary flag indicating whether the tuple is a "profile" or an "input" tuple;
(d) s_{4,k}: the rating in "input" tuples;
(e) s_{5,k}: a user profile in R^d;
(f) s_{6,k}: an item profile in R^d.
C2. Sort the tuples in ascending order with respect to user ID, i.e., with respect to rows 1 and 3 (420). If two IDs are equal, ties are broken by comparing the tuple flags, i.e., the third element of each tuple. Thus, after sorting, each "user profile" tuple is immediately followed by the "input" tuples with the same ID.
C3. Copy the user profiles (left pass) (430):
[Equation image 16]
C4. Sort the tuples in ascending order with respect to item ID, i.e., with respect to rows 2 and 3 (440). If two IDs are equal, ties are broken by comparing the tuple flags, i.e., the third element of each tuple.
C5. Copy the item profiles (left pass) (450):
[Equation image 17]
C6. Compute the gradient contributions for all k < M (460):
[Equation image 18]
C7. Update the item profiles (right pass) (470):
[Equation image 19]
C8. Sort the tuples with respect to rows 1 and 3 (475).
C9. Update the user profiles (right pass) (480):
[Equation image 20]
C10. If the number of iterations is less than K, go to C3 (485).
C11. Sort the tuples with respect to rows 3 and 2 (490).
C12. Output the item profiles s_{6,k}, k = 1, ..., m (495). The output may be restricted to at least one item profile.

Each gradient descent iteration comprises the following three main steps:
A. Copying profiles: At each iteration, the profiles u_i and ν_j of each user i and each item j are copied to the corresponding elements s_{5,k} and s_{6,k} of every "input" tuple in which i and j appear. This is implemented in steps C2 to C5 of the algorithm. For example, to copy the user profiles, S is sorted using the user ID (i.e., s_{1,k}) as the primary index and the flag (i.e., s_{3,k}) as the secondary index. An example of such a sort applied to the initial state of S is shown in FIG. 5(B). The user profiles are then copied by traversing the array from left to right (the "left pass"), as formally described in step C3 of the algorithm. This copies s_{5,k} from each "profile" tuple to its adjacent "input" tuples; item profiles are copied similarly.
B. Computing gradient contributions: After the profiles have been copied, each "input" tuple corresponding to, e.g., (i, j) stores the rating r_{i,j} (in s_{4,k}) and the profiles u_i and v_j (in s_{5,k} and s_{6,k}, respectively) as computed in the last iteration. From these, the following quantities are computed:
[Equation image 21]
These can be regarded as the "contribution" of the tuple to the gradients with respect to u_i and v_j, as given by equation (5). They replace the elements s_{5,k} and s_{6,k} of the tuple, as indicated in step C6 of the algorithm. Through appropriate use of flags, this operation affects only "input" tuples and leaves "profile" tuples unchanged.
C. Updating profiles: Finally, the user and item profiles are updated, as shown in steps C7 to C9 of the algorithm. Through appropriate sorting, each "profile" tuple is made adjacent to the "input" tuples with which it shares an ID. The updated profiles are computed through a right-to-left traversal of the array (the "right pass"). This operation accumulates the gradient contributions as it traverses "input" tuples. Upon encountering a "profile" tuple, the accumulated gradient contributions are added to the profile, appropriately scaled. After passing a profile, the accumulation of gradient contributions restarts from zero through appropriate use of the flags s_{3,k}, s_{3,k+1}.

The above operations are repeated K times, i.e., the desired number of gradient descent iterations. Finally, at the end of the last iteration, the array is sorted using the flag (i.e., s_{3,k}) as the primary index and the item ID (i.e., s_{2,k}) as the secondary index. This brings all item profile tuples to the first m positions of the array, from which the item profiles can be output. Additionally, to obtain the user profiles, the array is sorted at the end of the last iteration using the flag (i.e., s_{3,k}) as the primary index and the user ID (i.e., s_{1,k}) as the secondary index. This brings all user profile tuples to the first n positions of the array, from which the user profiles can be output.
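Steps C1 to C12 can be sketched in the clear as the following Python function, which only ever sorts the array and sweeps it linearly. Several points are assumptions rather than details from the source: the exact update formulas (the corresponding equation images are not reproduced), the use of zero vectors as profile placeholders, the flag values (0 for "profile", 1 for "input"), the step size `gamma`, the regularization weight `lam`, and Python's built-in sort standing in for a sorting network.

```python
INF = float("inf")  # placeholder ID; sorts after every real identifier


def factorize(ratings, U, V, K=10, gamma=0.01, lam=0.1):
    """ratings: {(i, j): r_ij}; U, V: initial user/item profiles (lists of d floats)."""
    d = len(U[0])
    # C1: tuples [user, item, flag, rating, user_profile, item_profile].
    S = [[i, INF, 0, 0.0, list(u), [0.0] * d] for i, u in enumerate(U)]
    S += [[INF, j, 0, 0.0, [0.0] * d, list(v)] for j, v in enumerate(V)]
    S += [[i, j, 1, r, [0.0] * d, [0.0] * d] for (i, j), r in ratings.items()]

    def copy_pass(col):
        # Left pass (C3, C5): each "input" tuple takes the profile to its left.
        for k in range(1, len(S)):
            if S[k][2] == 1:
                S[k][col] = list(S[k - 1][col])

    def update_pass(col):
        # Right pass (C7, C9): accumulate contributions, apply them at profiles.
        acc = [0.0] * d
        for k in range(len(S) - 1, -1, -1):
            if S[k][2] == 1:
                acc = [a + g for a, g in zip(acc, S[k][col])]
            else:
                S[k][col] = [(1 - 2 * lam * gamma) * p + a
                             for p, a in zip(S[k][col], acc)]
                acc = [0.0] * d

    S.sort(key=lambda t: (t[0], t[2]))           # C2: by user ID, then flag
    for _ in range(K):                           # C10: K iterations
        copy_pass(4)                             # C3
        S.sort(key=lambda t: (t[1], t[2]))       # C4: by item ID, then flag
        copy_pass(5)                             # C5
        for t in S:                              # C6: gradient contributions
            if t[2] == 1:
                u, v = t[4], t[5]
                err = t[3] - sum(a * b for a, b in zip(u, v))
                t[4] = [2 * gamma * err * x for x in v]
                t[5] = [2 * gamma * err * x for x in u]
        update_pass(5)                           # C7
        S.sort(key=lambda t: (t[0], t[2]))       # C8
        update_pass(4)                           # C9
    S.sort(key=lambda t: (t[2], t[1]))           # C11: by flag, then item ID
    return [t[5] for t in S[:len(V)]]            # C12: item profiles
```

Because the access pattern depends only on n, m, and M and never on which pairs are rated, replacing the built-in sort with a sorting network makes every step expressible as a fixed circuit.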

As those skilled in the art will appreciate, each of the above operations is data-oblivious and can be implemented as a circuit. Copying and updating profiles requires (n + m + M) gates, so the overall complexity is determined by sorting, which, e.g., using Batcher's circuits, incurs a cost of O((n + m + M) log²(n + m + M)). Sorting and the gradient computation in step C6 of the algorithm are the most computationally intensive operations; fortunately, both are parallelizable. Sorting can be further optimized by reusing previously computed comparisons at each iteration. In particular, this circuit can be implemented as a Boolean circuit (e.g., as a graph of OR, AND, NOT, and XOR gates), which allows the implementation to be garbled, as previously described.
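A sorting network fixes the sequence of compare-and-swap positions in advance, independently of the data, which is what makes it usable inside a circuit. The Python sketch below generates the comparators of Batcher's odd-even merge sort for a power-of-two length and pads shorter inputs with +∞; the function names are hypothetical, and this is an illustration rather than the circuit used in the source.

```python
INF = float("inf")


def batcher_network(n):
    """Comparator pairs (a, b) of Batcher's odd-even merge sort; n is a power of two."""
    pairs = []
    p = 1
    while p < n:
        k = p
        while k >= 1:
            for j in range(k % p, n - k, 2 * k):
                for i in range(min(k, n - j - k)):
                    if (i + j) // (2 * p) == (i + j + k) // (2 * p):
                        pairs.append((i + j, i + j + k))
            k //= 2
        p *= 2
    return pairs


def network_sort(values):
    """Sort by running the fixed comparator sequence; pad with +inf to a power of two."""
    n = 1
    while n < len(values):
        n *= 2
    padded = list(values) + [INF] * (n - len(values))
    for a, b in batcher_network(n):
        # Compare-and-swap: the positions (a, b) never depend on the data.
        if padded[a] > padded[b]:
            padded[a], padded[b] = padded[b], padded[a]
    return padded[:len(values)]
```

The number of comparators grows as N log² N, matching the sorting cost quoted above, and comparators that touch disjoint positions can be evaluated in parallel.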

According to the present principles, implementing the above matrix factorization algorithm together with the protocol described earlier provides a novel method for privacy-preserving recommendation. Moreover, by using sorting networks, this solution yields a circuit whose complexity is within a polylogarithmic factor of that of matrix factorization performed in the clear. An additional advantage of this implementation is that the garbling and execution of this circuit are highly parallelizable.

In an implementation of a system according to the present principles, the garbled circuit construction was based on FastGC, a publicly available garbled circuit framework. FastGC is a Java-based open-source framework that enables circuits to be defined using elementary XOR, OR, and AND gates. Once a circuit is constructed, the framework handles garbling, oblivious transfer, and the complete evaluation of the garbled circuit. However, before garbling and executing a circuit, FastGC represents the entire ungarbled circuit in memory as a set of Java objects. These objects incur a significant memory overhead relative to the memory footprint that the garbled circuit itself would require, since only a subset of the gates is garbled and/or executed at any point in time. Moreover, although FastGC performs garbling in parallel with the execution process, as described above, both operations proceed sequentially: gates are processed one at a time, once their inputs are ready. As those skilled in the art will appreciate, this implementation is not amenable to parallelization.

Consequently, the framework was modified to address these two issues, reducing the memory footprint of FastGC while also enabling parallelized garbling and computation across multiple processors. Specifically, the ability to partition a circuit horizontally into sequential "layers" was introduced, each layer comprising a set of vertical "slices" that can be executed in parallel. A layer is created in memory only when all of its inputs are ready. Once it has been garbled and evaluated, the entire layer is removed from memory and the next layer can be constructed, so that the memory footprint is bounded by the size of the largest layer. Layers are executed using a scheduler that assigns slices to threads, enabling them to run in parallel. Although parallelization was implemented on a single machine with multiple cores, the implementation can readily be extended to run across multiple machines, since no shared state between slices is assumed. Finally, to implement the numerical operations outlined in the algorithm, FastGC was extended to support addition and multiplication of real numbers in fixed-point representation, as well as sorting. For sorting, Batcher's sorting network was used. The fixed-point representation introduced a trade-off between the accuracy loss resulting from truncation and the size of the circuit.
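The layer-and-slice execution model can be illustrated with a small Python sketch using a thread pool. It is only an analogy to the modified Java framework: the function names, the representation of a layer as a function returning zero-argument callables, and the example layer `pair_sum` are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def run_layers(layers, inputs, workers=4):
    """layers: functions mapping the previous outputs to a list of slice callables."""
    values = list(inputs)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for make_layer in layers:
            slices = make_layer(values)                 # built only once inputs are ready
            futures = [pool.submit(s) for s in slices]  # slices run in parallel
            values = [f.result() for f in futures]      # slices share no state
            del slices, futures                         # drop the layer before the next
    return values


def pair_sum(values):
    """Example layer: one slice per adjacent pair, each adding its two inputs."""
    return [lambda a=a, b=b: a + b for a, b in zip(values[0::2], values[1::2])]
```

For example, `run_layers([pair_sum, pair_sum], [1, 2, 3, 4])` evaluates two layers in sequence, with the slices inside each layer dispatched to worker threads; at most one layer's slices exist in memory at a time.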

In addition, the implementation of the algorithm was optimized in several ways. In particular:
(a) The cost of sorting was reduced by reusing comparisons computed at the beginning of the circuit's execution:
The basic building block of a sorting network is the compare-and-swap circuit, which compares two items and swaps them if necessary, so that the output pair is ordered. The sorting operations of the matrix factorization algorithm (lines C4 and C8) perform identical comparisons between tuples in each of the K gradient descent iterations, using exactly the same inputs at every iteration. In fact, each sort permutes the tuples in the array S in exactly the same manner at each iteration. This property is exploited by performing the comparison operations of each of these sorts only once. In particular, sorts of tuples of the form (i, j, flag, rating) are performed at the beginning of the computation (without the payload of user or item profiles), e.g., first with respect to i and the flag, then j and the flag, and then i and the flag again. Subsequently, the outputs of the comparison circuits in each of these sorts are reused as inputs to the swap circuits used during gradient descent. As a result, the "sorting" network applied at each iteration performs no comparisons and simply permutes the tuples (i.e., it is a "permutation" network).
(b) The size of the array S was reduced:
Precomputing all comparisons allows the size of the tuples in S to be reduced drastically. To begin with, as those skilled in the art will appreciate, the rows corresponding to user or item IDs are used in the matrix factorization algorithm only as inputs to comparisons during sorting. Flags and ratings are used during the copy and update phases, but their relative positions are identical at every iteration. Moreover, these positions can be computed as outputs of the sorting of the tuples (i, j, flag, rating) at the beginning of the computation. As such, the "permutation" operations performed at each iteration need only be applied to user and item profiles; all other rows can be removed from the array S. A further improvement reduces the cost of permutations by another factor of 2: e.g., the set of user profiles is fixed and only the item profiles are permuted. The item profiles then rotate between two states, each reachable from the other through a permutation: one in which they are aligned with the user profiles and partial gradients are computed, and one in which the item profiles are updated and copied.
(c) Swap operations were optimized using XOR:
Since XOR operations can be performed "for free", comparison, swap, update, and copy operations are optimized by using XOR wherever possible. As those skilled in the art will appreciate, free-XOR gates can be garbled without an associated garbled table and the corresponding hashing or symmetric-key operations, a marked improvement in both computation and communication.
(d) Computations were parallelized:
Sorting and gradient computations account for the bulk of the computation in the matrix factorization circuit (copying and updating contribute less than 3% of the execution time and 0.4% of the non-XOR gates), and these operations are parallelized through this extension of FastGC. Gradient computations are clearly parallelizable; sorting networks are also highly parallelizable (parallelization being the main motivation behind their development). Moreover, since many of the parallel slices in each sort are identical, the same FastGC objects defining a circuit slice are reused with different inputs, significantly reducing the need to repeatedly create and destroy objects in memory.
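Optimizations (a) and (c) can be illustrated together in a short Python sketch: the comparison bits are recorded once on the sort keys, and later passes replay them on a payload using a conditional swap built from XOR and a single AND mask. For brevity, an odd-even transposition network stands in for Batcher's network, payloads are assumed to be non-negative integers, and all function names are hypothetical.

```python
def transposition_network(n):
    """A simple (non-Batcher) sorting network: n rounds of neighbor comparators."""
    return [(i, i + 1) for r in range(n) for i in range(r % 2, n - 1, 2)]


def record_swaps(keys, network):
    """Run the network once on the sort keys and record each comparator's outcome."""
    keys = list(keys)
    bits = []
    for a, b in network:
        swap = int(keys[a] > keys[b])
        bits.append(swap)
        if swap:
            keys[a], keys[b] = keys[b], keys[a]
    return bits


def xor_cswap(x, y, bit):
    """Conditional swap using XOR: t is x ^ y when bit is 1, and 0 when bit is 0."""
    t = (x ^ y) & -bit
    return x ^ t, y ^ t


def replay(payload, network, bits):
    """Apply the recorded permutation to a payload without any new comparisons."""
    payload = list(payload)
    for (a, b), bit in zip(network, bits):
        payload[a], payload[b] = xor_cswap(payload[a], payload[b], bit)
    return payload
```

Since the keys (IDs and flags) do not change between iterations, `record_swaps` needs to run only once per sort order, and each iteration then calls `replay` on the profile data alone.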

It is to be understood that the present principles may be implemented in various forms of hardware, software, firmware, special-purpose processors, or a combination thereof. Preferably, the present principles are implemented as a combination of hardware and software. Moreover, the software may be implemented as an application program tangibly embodied on a program storage device. The application program may be uploaded to, and executed by, a machine having any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (CPU), a random access memory (RAM), and input/output (I/O) interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be part of the microinstruction code or part of the application program (or a combination thereof) executed via the operating system. In addition, various other peripheral devices, such as an additional data storage device and a printing device, may be connected to the computer platform.

FIG. 6 is a block diagram of a minimal computing environment 600 used to implement the present principles. The computing environment 600 includes a processor 610 and at least one (preferably more than one) I/O interface 620. The I/O interface may be wired or wireless and, in the wireless embodiment, is pre-configured with the appropriate wireless communication protocols to allow the computing environment 600 to operate on a global network (e.g., the Internet) and communicate with other computers or services (e.g., cloud-based computing or storage services), so that the present principles can be provided, for example, as a Software as a Service (SaaS) feature remotely provided to end users. One or more memories 630 and/or storage devices (HDD) 640 are also provided within the computing environment 600. The computing environment 600 may be used to implement the protocol P1-P17 (FIG. 3) for the matrix factorization C1-C12 (FIG. 4) according to one embodiment of the present principles. In particular, in an embodiment of the present principles, one computing environment 600 may implement RecSys 230, and a separate computing environment 600 may implement the CSP 250. The source may comprise one or more computing environments 600, each associated with a distinct user 210, including but not limited to desktop computers, cellular phones, smartphones, phone watches, tablet computers, personal digital assistants (PDAs), notebooks, and laptop computers, used to communicate with RecSys 230 and the CSP 250. In addition, the CSP 250 may be included in the source or, equivalently, in the computing environment of each user 210 of the source.

It is to be further understood that, because some of the system components and method steps depicted in the accompanying drawings may be implemented in software, the actual connections between the system components (or the method steps) may differ depending on the manner in which the present principles are programmed. Given the teachings disclosed herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present principles.

While illustrative embodiments have been described with reference to the accompanying drawings, it is to be understood that the present principles are not limited to those embodiments, and that those skilled in the art may make various changes and modifications without departing from the scope or spirit of the present principles. All such changes and modifications are intended to be included within the scope of the present principles as set forth in the appended claims.

Claims

A method for securely generating recommendations through matrix factorization and ridge regression, the method comprising:
Receiving a first record set, wherein each record is received from a respective user in a first user set, includes a token set and an item set, and is kept secret from parties other than the respective user;
Evaluating, in a recommender (RecSys), the first record set based on matrix factorization using a first garbled circuit, wherein the output of the first garbled circuit includes a masked item profile for all items of the first record set;
Receiving a recommendation request from a requesting user for at least one item; and
Evaluating, by the requesting user, a second record and the masked item profile based on ridge regression using a second garbled circuit, wherein the output of the second garbled circuit includes a recommendation for the at least one item, the recommendation being known only by the requesting user.

Designing the first garbled circuit in a cryptographic service provider (CSP) to perform matrix factorization on the first record set, wherein the output of the first garbled circuit includes a masked item profile for all items of the first record set;
Transferring the first garbled circuit to the RecSys;
Designing the second garbled circuit in the CSP to perform ridge regression on the second record and the masked item profile, wherein the output of the second garbled circuit includes the recommendation for the at least one item; and
Transferring the second garbled circuit to the requesting user;
The method of claim 1.

The designing steps include:
Designing the matrix factorization operation as a Boolean circuit; and
Designing the ridge regression operation as a Boolean circuit;
The method of claim 2.

The step of designing the matrix factorization circuit includes:
Constructing an array of the first record set; and
Performing the operations of sorting, copying, updating, comparing, and computing gradient contributions on the array;
The method of claim 3.
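For orientation only, the gradient contributions named in this claim can be illustrated in the clear, without any garbled circuit or sorting-based array layout. The sketch below assumes the usual regularized squared-error objective for matrix factorization; the function names, the `(user, item, rating)` tuple layout, and the parameters `gamma`, `lam`, and `mu` are illustrative assumptions, not taken from the claims.

```python
import random

def factorize(ratings, n_users, n_items, d=2, gamma=0.01, lam=0.1, mu=0.1,
              iters=500, seed=0):
    """Plain gradient descent on regularized squared error (no cryptography)."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n_users)]
    V = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n_items)]
    for _ in range(iters):
        # Each gradient starts from its regularization term.
        gU = [[2 * lam * x for x in u] for u in U]
        gV = [[2 * mu * x for x in v] for v in V]
        # Every (user, item, rating) tuple adds one gradient contribution
        # to its user profile and one to its item profile.
        for i, j, r in ratings:
            err = sum(a * b for a, b in zip(U[i], V[j])) - r
            for k in range(d):
                gU[i][k] += 2 * err * V[j][k]
                gV[j][k] += 2 * err * U[i][k]
        # Update step: move every profile against its accumulated gradient.
        for profiles, grads in ((U, gU), (V, gV)):
            for p, g in zip(profiles, grads):
                for k in range(d):
                    p[k] -= gamma * g[k]
    return U, V

def squared_error(ratings, U, V):
    """Sum of squared prediction errors over the observed ratings."""
    return sum((sum(a * b for a, b in zip(U[i], V[j])) - r) ** 2
               for i, j, r in ratings)

# Hypothetical toy data: (user index, item index, rating).
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0),
           (1, 2, 2.0), (2, 1, 1.0), (2, 2, 3.0)]
U0, V0 = factorize(ratings, 3, 3, iters=0)   # initial random profiles
U, V = factorize(ratings, 3, 3)              # profiles after training
```

In the claimed method these accumulation and update steps run inside a Boolean circuit, which is why the claim expresses them as data-independent array operations rather than as the indexed loops used here.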

Encrypting the first record set to generate encrypted records, wherein the encryption is performed before the first record set is received;
The method of claim 2.

Generating a public encryption key in the CSP; and
Sending the key to each user;
The method of claim 5.

The encryption scheme is a partially homomorphic encryption scheme, and the method further includes:
Masking the encrypted records in the RecSys to generate masked records; and
Decrypting the masked records in the CSP to generate masked decrypted records;
The method of claim 5.
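The masking step relies on the ability to add a value to an encrypted record without decrypting it. The claim does not name a specific scheme; the sketch below uses textbook Paillier encryption as an assumed stand-in, with toy primes that offer no security. All function names are hypothetical.

```python
import math
import random

def keygen(p=1789, q=1867):
    """Toy Paillier key pair; these primes are far too small for real use."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    """Encrypt plaintext m under public modulus n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c with the private key."""
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

def add_plain(n, c, k):
    """Homomorphically add plaintext k to the value hidden in ciphertext c."""
    return (c * pow(n + 1, k, n * n)) % (n * n)

rng = random.Random(0)
n, priv = keygen()                          # key pair generated by the CSP
rating = 4                                  # hypothetical record entry
c = encrypt(n, rating, rng)                 # user encrypts under the CSP's public key
mask = rng.randrange(n)                     # RecSys picks a random mask
masked_c = add_plain(n, c, mask)            # RecSys masks without decrypting
masked_plain = decrypt(n, priv, masked_c)   # CSP sees only rating + mask (mod n)
unmasked = (masked_plain - mask) % n        # unmasking (in the claims, inside the garbled circuit)
```

Under this assumed scheme the RecSys never sees the rating, because it handles only ciphertexts, and the CSP never sees it either, because the value it decrypts is offset by a mask it does not know.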

The designing step includes:
Unmasking the masked decrypted records inside the first garbled circuit before processing them;
The method of claim 7.

Performing an oblivious transfer between the CSP and the RecSys, wherein the RecSys receives the garbled values of the masked decrypted records, and the records are kept secret from the RecSys and the CSP;
The method of claim 7.

The step of designing the ridge regression circuit includes:
Receiving the masked item profile and the second record from the requesting user;
Unmasking the masked item profile and generating a tuple array including tokens, items, and item profiles, wherein a corresponding item profile is added to each token and item from the second record;
Performing ridge regression on the tuple array to generate a requesting user profile; and
Calculating a recommendation from the requesting user profile and at least one item profile;
The method of claim 3.
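As a plain illustration of the last two steps, ridge regression over the tuple array amounts to solving a small regularized least-squares system for the user profile and then taking an inner product with an item profile. The sketch below works in the clear and assumes each tuple carries an identifier, a numeric rating, and the item's profile vector; that mapping onto the claim's tokens and items, along with every name and the parameter `lam`, is an assumption made for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ridge_user_profile(tuples, lam=1.0):
    """Solve (sum v v^T + lam I) u = sum rating * v for the user profile u."""
    d = len(tuples[0][2])
    A = [[lam if p == q else 0.0 for q in range(d)] for p in range(d)]
    b = [0.0] * d
    for _ident, rating, v in tuples:
        for p in range(d):
            b[p] += rating * v[p]
            for q in range(d):
                A[p][q] += v[p] * v[q]
    return solve(A, b)

def predict(user_profile, item_profile):
    """Predicted rating: inner product of the user and item profiles."""
    return sum(a * b for a, b in zip(user_profile, item_profile))

# Hypothetical tuple array: (identifier, rating, item profile).
tuples = [("item_a", 2.0, [1.0, 0.0]), ("item_b", 3.0, [0.0, 1.0])]
u = ridge_user_profile(tuples, lam=1.0)
score = predict(u, [1.0, 1.0])
```

The regularization term `lam` keeps the system solvable even when the requesting user has rated fewer items than the profile has dimensions.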

The step of generating the array is performed using a sorting network.
The method of claim 10.
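A sorting network suits circuit-based evaluation because its sequence of compare-exchange operations is fixed in advance and does not depend on the data being sorted. The sketch below uses an odd-even transposition network, chosen here only for brevity; the claim does not specify which network is used, and the function names are hypothetical.

```python
def network(n):
    """Comparator list of an odd-even transposition network on n wires.

    The list depends only on n, never on the values being sorted.
    """
    return [(i, i + 1)
            for rnd in range(n)
            for i in range(rnd % 2, n - 1, 2)]

def compare_exchange(a, i, j):
    """Place the smaller of a[i], a[j] at position i and the larger at j."""
    if a[i] > a[j]:
        a[i], a[j] = a[j], a[i]

def sort_with_network(values):
    """Sort by applying the fixed comparator sequence to a copy of values."""
    a = list(values)
    for i, j in network(len(a)):
        compare_exchange(a, i, j)
    return a

result = sort_with_network([5, 2, 9, 1, 7])
```

This network uses a number of comparators quadratic in the input size; networks with fewer comparators exist, but the data-independent structure shown here is the property that matters for a Boolean circuit.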

Performing a proxy oblivious transfer among the requesting user, the CSP, and the RecSys, wherein the requesting user receives the garbled values of the masked item profile, and the masked item profile is kept secret from the requesting user and the CSP;
The method of claim 1.

Receiving the number of tokens and items for each record;
The method of claim 1.

The method of claim 1, further comprising padding each record with null entries when the number of tokens in the record is smaller than a value representing a maximum, so as to generate a record in which the number of tokens equals that value.
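Padding of this kind could be sketched as follows. The representation of a null entry and the names `NULL_ENTRY`, `pad_record`, and `max_tokens` are illustrative assumptions; the claim does not prescribe an encoding.

```python
NULL_ENTRY = (None, None)  # hypothetical placeholder for a (token, item) pair

def pad_record(record, max_tokens):
    """Return a copy of record extended with null entries up to max_tokens."""
    if len(record) > max_tokens:
        raise ValueError("record exceeds the declared maximum")
    return list(record) + [NULL_ENTRY] * (max_tokens - len(record))

# Hypothetical record with two (token, item) pairs, padded to four entries.
record = [("movie_17", 4), ("movie_42", 2)]
padded = pad_record(record, max_tokens=4)
```

Padding every record to the same length means the record size reveals nothing about how many entries a user actually contributed.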

The method of claim 1, wherein the source of the first record set is a database and the source of the second record is a database.

The method of claim 2, further comprising the CSP receiving a set of parameters for the design of the garbled circuits, wherein the parameters are sent by the RecSys.

A system for securely generating recommendations through matrix factorization and ridge regression, the system comprising:
A first user set that provides a first record set;
A cryptographic service provider (CSP) that provides secure matrix factorization and ridge regression circuits as first and second garbled circuits;
A RecSys that evaluates the matrix factorization circuit; and
A requesting user that provides a second record and evaluates the ridge regression circuit, each record being kept secret from parties other than its user;
Wherein the users, the CSP, and the RecSys each include:
A processor that receives at least one input/output; and
At least one memory in signal communication with the processor;
The RecSys processor is configured to:
Receive the first record set from the first user set, each record including a token set and an item set and being kept secret from parties other than the respective user;
Receive a request from the requesting user for at least one item; and
Evaluate the first record set based on matrix factorization using the first garbled circuit,
The output of the first garbled circuit including a masked item profile for all items of the first record set;
The requesting user's processor is configured to:
Evaluate the second record and the masked item profile based on ridge regression using the second garbled circuit,
The output of the second garbled circuit including a recommendation for the at least one item, the recommendation being known only by the requesting user.

The CSP processor is configured to:
Design the first garbled circuit to perform matrix factorization on the first record set, the output of the first garbled circuit including a masked item profile for all items of the first record set;
Transfer the first garbled circuit to the RecSys;
Design the second garbled circuit to perform ridge regression on the second record and the masked item profile, the output of the second garbled circuit including the recommendation for the at least one item; and
Transfer the second garbled circuit to the requesting user;
The system of claim 17.

The CSP processor is configured to design the garbled circuits by being configured to:
Design the matrix factorization operation as a Boolean circuit; and
Design the ridge regression operation as a Boolean circuit;
The system of claim 18.

The CSP processor is configured to design the matrix factorization circuit by being configured to:
Construct an array of the first record set; and
Perform the operations of sorting, copying, updating, comparing, and computing gradient contributions on the array;
The system of claim 19.

Each user processor of the first user set is configured to encrypt each record and generate an encrypted record before providing the record.
The system of claim 18.

The CSP processor is further configured to:
Generate a public encryption key in the CSP; and
Send the key to the first user set;
The system of claim 21.

The encryption scheme is a partially homomorphic encryption scheme,
The RecSys processor is further configured to mask the encrypted records and generate masked records, and
The CSP processor is further configured to decrypt the masked records and generate masked decrypted records;
The system of claim 21.

The CSP processor is further configured to design the first garbled circuit so that the masked decrypted records are unmasked inside the first garbled circuit before processing;
The system of claim 23.

The RecSys processor and the CSP processor are further configured to perform an oblivious transfer,
Wherein the RecSys receives the garbled values of the masked decrypted records, and the records are kept secret from the RecSys and the CSP;
The system of claim 23.

The CSP processor is configured to design the second garbled circuit by being configured to:
Receive the masked item profile and the second record from the requesting user;
Unmask the masked item profile and generate a tuple array including tokens, items, and item profiles, a corresponding item profile being added to each token and item from the second record;
Perform ridge regression on the tuple array to generate a requesting user profile; and
Calculate a recommendation from the requesting user profile and at least one item profile;
The system of claim 19.

The CSP processor is configured to generate the array by being configured to design a sorting network;
The system of claim 26.

The requesting user's processor, the RecSys processor, and the CSP processor are further configured to perform a proxy oblivious transfer,
Wherein the requesting user receives the garbled values of the masked item profile, and the masked item profile is kept secret from the requesting user and the CSP;
The system of claim 17.

The RecSys processor is further configured to receive a token count for each record, the token count being sent by the source of that record;
The system of claim 17.

Each processor of the first user set is configured to pad each record with null entries when the number of tokens in the record is smaller than a value representing a maximum, so as to generate a record in which the number of tokens equals that value;
The system of claim 17.

The source of the first record set is a database and the source of the second record set is a database;
The system of claim 17.

The CSP processor is further configured to receive a set of parameters for the design of the garbled circuits, the parameters being sent by the RecSys;
The system of claim 18.