KR20160041028A

KR20160041028A - A method and system for privacy preserving matrix factorization

Info

Publication number: KR20160041028A
Application number: KR1020157023839A
Authority: KR
Inventors: Efstratios Ioannidis; Ehud Weinsberg; Nina Anne Taft; Marc Joye; Valeria Nikolaenko
Original assignee: Thomson Licensing
Priority date: 2013-08-09
Filing date: 2014-05-01
Publication date: 2016-04-15
Also published as: CN105103487A; JP2016510913A; JP2016517069A; CN105144625A; CN105009505A; JP2016510912A; EP3031165A2

Abstract

A method and system for securely profiling items through matrix factorization, for use in recommender systems, begin by receiving as input a set of records comprising tokens and items, without learning the content of any individual record. They then design and evaluate a garbled circuit based on matrix factorization over the set of records. This generates item profiles for at least one item in a privacy-preserving manner, without learning the content of any individual record or any information extracted from the records other than the item profiles. The system comprises three parties: a database or a plurality of users, representing the source of the records; a crypto-service provider, which designs the garbled circuit; and a recommender system, which evaluates the circuit. Consequently, the records, and any information extracted from them other than the item profiles, are kept secret from all parties other than their source.

Description

METHOD AND SYSTEM FOR PRIVACY PRESERVING MATRIX FACTORIZATION

Cross-reference to related applications

This application claims the benefit of, and priority to, the following U.S. provisional patent applications, filed on August 9, 2013:
- Serial No. 61/864088, entitled "A METHOD AND SYSTEM FOR PRIVACY PRESERVING MATRIX FACTORIZATION";
- Serial No. 61/864085, entitled "A METHOD AND SYSTEM FOR PRIVACY PRESERVING COUNTING";
- Serial No. 61/864094, entitled "A METHOD AND SYSTEM FOR PRIVACY-PRESERVING RECOMMENDATION TO RATING CONTRIBUTING USERS BASED ON MATRIX FACTORIZATION";
- Serial No. 61/864098, entitled "A METHOD AND SYSTEM FOR PRIVACY-PRESERVING RECOMMENDATION BASED ON MATRIX FACTORIZATION AND RIDGE REGRESSION".

This application also claims the benefit of, and priority to, two further applications. The first is PCT Patent Application No. PCT/US13/76353, entitled "A METHOD AND SYSTEM FOR PRIVACY PRESERVING COUNTING", filed on December 19, 2013. The second is U.S. Provisional Patent Application No. 61/772404, entitled "PRIVACY-PRESERVING LINEAR AND RIDGE REGRESSION", filed on March 4, 2013. The provisional applications and the PCT application are expressly incorporated herein by reference in their entirety for all purposes.

Technical field

The present principles relate to privacy-preserving recommender systems and secure multi-party computation. In particular, they relate to securely performing the collaborative filtering technique known as matrix factorization in a privacy-preserving fashion, in order to profile items.

Over the past decades, tremendous research and commercial activity has led to the widespread use of recommender systems. Such systems offer users personalized recommendations for many kinds of items, such as movies, TV shows, music, books, hotels, and restaurants. Figure 1 illustrates the components of a typical recommender system 100. A number of users 110 represent the source, and a recommender system (RecSys) 130 processes the users' inputs 120 and outputs recommendations 140. To receive useful recommendations, users supply substantial personal information about their preferences (the users' inputs), trusting that the recommender will manage this data appropriately.

Nevertheless, early studies have identified multiple ways in which recommenders can abuse this information or expose users to privacy threats. Examples include B. Mobasher, R. Burke, R. Bhaumik, and C. Williams ("Toward trustworthy recommender systems: An analysis of attack models and algorithm robustness", ACM Trans. Internet Techn., 7(4), 2007), and E. Aïmeur, G. Brassard, J. M. Fernandez, and F. S. M. Onana ("ALAMBIC: A privacy-preserving recommender system for electronic commerce", Int. Journal Inf. Sec., 7(5), 2008).

Recommenders are often motivated to resell data for profit. They are also motivated to extract more information than users intentionally reveal. For example, records of user preferences that are not usually perceived as sensitive, such as movie ratings or a person's TV viewing history, can be used to infer a user's political affiliation, gender, and so on. The private information that can be inferred from the data in a recommender system keeps growing as new data mining and inference methods are developed, for malicious or benign purposes.

In the extreme, records of user preferences can even be used to uniquely identify a user. A. Narayanan and V. Shmatikov demonstrated this strikingly by de-anonymizing the Netflix dataset ("Robust de-anonymization of large sparse datasets", in IEEE S&P, 2008). Therefore, even if the recommender is not malicious, an unintentional leak of such data exposes users to linkage attacks. A linkage attack uses one database as auxiliary information to compromise privacy in a different database.

Future inference threats, accidental information leaks, and insider threats (intentional leaks) cannot always be anticipated. It is therefore of interest to build a recommender system to which users never disclose their personal data in the clear. No practical recommender systems operating on encrypted data exist today. Moreover, it is of interest to build a recommender that can profile items without ever learning the ratings users provide, or even which items users have rated. The present principles propose such a secure recommender system.

The present principles propose a method for securely performing the collaborative filtering technique known as matrix factorization in a privacy-preserving fashion, in order to profile items. The method receives as input the ratings that users have given to items (e.g., movies, books). It produces a profile for each item, which can subsequently be used to predict the rating a user would give to that item. The present principles enable the recommender system to perform this task, based on matrix factorization, without ever learning the users' ratings, or even which items each user has rated.

According to one aspect of the present principles, a method for securely profiling items through matrix factorization is provided. The method comprises:
- receiving a set of records (220) from a source, each record comprising a set of tokens and a set of items, and each record being kept secret from parties other than the source;
- receiving (360) at least one separate item; and
- evaluating (395), in a recommender (RecSys) (230), the set of records and the at least one separate item using a garbled circuit based on matrix factorization, the output of the garbled circuit being item profiles for the at least one separate item.

The method may further comprise designing (360), in a Crypto-System Provider (CSP), a garbled circuit to perform matrix factorization on the set of records (380) and the at least one separate item. The garbled circuit outputs the item profiles of the at least one separate item. The method may then comprise transferring (385) the garbled circuit to the RecSys.

The designing step may comprise designing (382) the matrix factorization operation as a Boolean circuit. Designing the matrix factorization circuit may comprise constructing (410) an array of the set of records. It may then comprise performing the following operations on the array: sorting (420, 440, 470, 490), copying (430, 450), updating (470, 480), comparison (480), and gradient contribution computation (460).

The method may further comprise receiving, by the CSP, a set of parameters for the design of the garbled circuit, the parameters having been sent by the RecSys (330).

According to one aspect of the present principles, the method may further comprise encrypting the set of records to produce encrypted records (330), the encrypting being performed before the set of records is received. The method may further comprise generating public encryption keys in the CSP and sending the keys to the source (320).

The encryption scheme may be partially homomorphic encryption (330). In that case, the method may further comprise masking the encrypted records in the RecSys to produce masked records (340). It may then comprise decrypting the masked records in the CSP to produce decrypted-masked records (350). The designing step (380) may comprise unmasking the decrypted-masked records inside the garbled circuit before processing them.

The method may further comprise performing (392) oblivious transfers (390) between the CSP and the RecSys, whereby the RecSys receives the garbled values of the decrypted-masked records. The records are kept private from both the RecSys and the CSP.
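The masking exchange can be illustrated with a minimal sketch. It assumes Paillier encryption as the partially homomorphic scheme, which the text does not name, and uses deliberately tiny, insecure key sizes. All function names are hypothetical. In the described method, the final unmasking happens inside the garbled circuit, and the garbled values are exchanged through oblivious transfer. Here both steps are replaced by a single subtraction in the clear.

```python
import math
import secrets

# Toy Paillier key (additively homomorphic). 1009 and 1013 are primes that are
# far too small for real use; they only keep the arithmetic readable.
P, Q = 1009, 1013
N = P * Q                      # public key (generator g = N + 1)
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)   # private key, held by the CSP
MU = pow(LAM, -1, N)           # inverse of LAM mod N, valid because g = N + 1


def encrypt(m: int) -> int:
    """Encrypt m (mod N) under the public key, as the source would."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Decrypt with the private key, as the CSP would."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Homomorphic addition: Enc(a) * Enc(b) decrypts to a + b (mod N)."""
    return c1 * c2 % N2


def masked_round_trip(x: int) -> int:
    c = encrypt(x)                            # source -> RecSys: Enc(x)
    eta = secrets.randbelow(N)                # RecSys draws a uniform mask
    masked = add_encrypted(c, encrypt(eta))   # RecSys -> CSP: Enc(x + eta)
    seen_by_csp = decrypt(masked)             # CSP sees only x + eta mod N
    # Stand-in for the garbled circuit, which would remove the mask
    # obliviously; here it is a plain subtraction.
    return (seen_by_csp - eta) % N
```

Because eta is uniform modulo N, the value x + eta that the CSP decrypts is itself uniformly distributed and reveals nothing about x.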

According to one aspect of the present principles, the method may further comprise receiving the number of tokens and items in each record (220, 310). The method may also comprise padding (312) each record with null entries when its number of tokens is smaller than a value representing the maximum. This produces records whose number of tokens equals that value. The source of the set of records may be either a database or a set of users (210). When the source is a set of users, each user is the source of one record, and each record is kept secret from parties other than its corresponding user.
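Padding can be sketched as follows. The sketch assumes a record is a list of (item, token) pairs and that `None` stands in for a null entry; both are hypothetical representation choices. It also assumes that records longer than the maximum are rejected and, in `pad_all`, that the maximum is the length of the longest record. After padding, every record has the same length, so a circuit can be built for a known, fixed input size.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a record is a list of (item, token) pairs,
# e.g. (movie id, rating); None stands in for a null entry.
Entry = Optional[Tuple[int, float]]


def pad_record(record: List[Tuple[int, float]], max_tokens: int) -> List[Entry]:
    """Pad a record with null entries up to max_tokens entries."""
    if len(record) > max_tokens:
        # Assumed policy: records longer than the agreed maximum are rejected.
        raise ValueError("record exceeds the maximum number of tokens")
    return list(record) + [None] * (max_tokens - len(record))


def pad_all(records: List[List[Tuple[int, float]]]) -> List[List[Entry]]:
    """Pad every record to the length of the longest one (assumed maximum)."""
    longest = max(len(r) for r in records)
    return [pad_record(r, longest) for r in records]
```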

According to one aspect of the present principles, a system for securely profiling items through matrix factorization is provided. The system comprises three parties:
- a source, which provides a set of records;
- a crypto-service provider (CSP), which provides a secure matrix factorization circuit; and
- a RecSys, which evaluates the records such that they are kept private from parties other than the source.

The source, the CSP, and the RecSys each comprise a processor (602) for receiving at least one input/output (604), and at least one memory (606, 608) in signal communication with the processor.

The processor of the RecSys is configured to receive a set of records, each record comprising a set of tokens and a set of items and each record being kept secret. It is further configured to receive at least one separate item, and to evaluate the set of records and the at least one separate item using a garbled circuit based on matrix factorization. The output of the garbled circuit is item profiles for the at least one separate item.

The processor of the CSP may be configured to design a garbled circuit that performs matrix factorization of the set of records and the at least one separate item. This garbled circuit outputs item profiles for the at least one separate item. The processor of the CSP may then transfer the garbled circuit to the RecSys. It may design the garbled circuit by designing the matrix factorization operation as a Boolean circuit. It may design the matrix factorization circuit by constructing an array of the set of records and performing sorting, copying, updating, comparison, and gradient contribution computation operations on the array. The processor of the CSP may further be configured to receive a set of parameters for the design of the garbled circuit, the parameters having been sent by the RecSys.

According to one aspect of the present principles, the source processor may be configured to encrypt the set of records, producing encrypted records before providing them. The processor of the CSP may further be configured to generate public encryption keys and send the keys to the source.

The encryption scheme may be partially homomorphic encryption. In that case, the processor of the RecSys may further be configured to mask the encrypted records to produce masked records. The processor of the CSP may further be configured to decrypt the masked records to produce decrypted-masked records. The processor of the CSP may design the garbled circuit so that it unmasks the decrypted-masked records before processing them.

The processors of the RecSys and the CSP may further be configured to perform oblivious transfers, whereby the RecSys receives the garbled values of the decrypted-masked records. The records are kept private from both the RecSys and the CSP.

According to one aspect of the present principles, the processor of the RecSys may further be configured to receive the number of tokens in each record, as sent by the source. The source processor may be configured to pad each record with null entries when its number of tokens is smaller than a value representing the maximum. This produces records whose number of tokens equals that value. The source of the set of records may be either a database or a set of users. When the source is a set of users, each user comprises a processor (602) for receiving at least one input/output (604), and at least one memory (606, 608). Each user is the source of one record, and each record is kept secret from parties other than its corresponding user.

Additional features and advantages of the present principles will become apparent from the following detailed description of illustrative embodiments, which refers to the accompanying figures.

The present principles may be better understood with reference to the following exemplary figures, briefly described below:
Figure 1 illustrates the components of a prior-art recommender system.
Figure 2 illustrates the components of a recommender system according to the present principles.
Figures 3A, 3B, and 3C illustrate a flowchart of a privacy-preserving method for profiling items through matrix factorization according to the present principles.
Figures 4A, 4B, and 4C illustrate a flowchart of a matrix factorization algorithm according to the present principles.
Figures 5(A) and 5(B) illustrate a data structure S constructed by the matrix factorization algorithm according to the present principles.
Figure 6 illustrates a block diagram of a computing environment used to implement the present principles.

According to the present principles, a method is provided for securely performing the collaborative filtering technique known as matrix factorization in a privacy-preserving fashion, in order to profile items.

The method of the present principles can act as a service for profiling at least one item in a corpus of records, each record comprising a set of tokens and items. The set of records comprises more than one record, and the set of tokens comprises at least one token. The skilled person will recognize from the example above that a record may represent a user, and the tokens may be that user's ratings of the corresponding items in the record.

Tokens may also represent ranks, weights, or measurements associated with items, and items may represent people, tasks, or jobs. For example, the ranks, weights, or measurements may relate to individuals' health, with a researcher attempting to correlate health measurements across a population. Alternatively, they may relate to individuals' productivity, with a company attempting to predict the schedule of a particular job from past history.

However, to guarantee the privacy of the individuals involved, the service should do so without learning the content of any record. It should also not learn any information extracted from the records other than the item profiles. In particular, the service should not learn:
(a) in which records each token/item appeared; or, more strongly,
(b) which tokens/items appear in each record; and
(c) the values of the tokens.

In what follows, terms such as "privacy-preserving", "private", and "secure" are used interchangeably. They indicate that information regarded as private by a user (record) is known only to that user.

Performing matrix factorization in a privacy-preserving manner raises several challenges:
- To address privacy concerns, matrix factorization must be performed without the recommender ever learning the users' ratings, or even which items they rated. The latter requirement is key: earlier studies show that even knowing which movies a user has rated can be used, for example, to infer their gender.
- Such a privacy-preserving algorithm must be efficient and must scale gracefully (e.g., linearly) with the number of ratings submitted by users.
- The privacy requirements imply that the matrix factorization algorithm must be data-oblivious, i.e., its execution must not depend on the user input. Moreover, the operations performed by matrix factorization are non-linear. It is therefore not clear a priori how to implement matrix factorization efficiently under all of these constraints.
- In a practical, real-world scenario, users have limited communication and computational resources, and should not be expected to stay online after providing their data. Instead, it is desirable to have a "send and forget" type of solution that keeps working while users move back and forth between being online and offline with respect to the recommender service.

As an overview of matrix factorization: in the standard "collaborative filtering" setting, n users rate a subset of m possible items (e.g., movies). Let $[n] := \{1, \dots, n\}$ be the set of users and $[m] := \{1, \dots, m\}$ the set of items. Denote by $\mathcal{M} \subseteq [n] \times [m]$ the user/item pairs for which a rating has been generated, and by $M = |\mathcal{M}|$ the total number of ratings. Finally, for $(i, j) \in \mathcal{M}$, denote by $r_{ij} \in \mathbb{R}$ the rating generated by user i for item j.

In a practical setting, both n and m are large, typically ranging between $10^4$ and $10^6$. In addition, the ratings provided are sparse, that is, $M = O(n + m)$, which is much smaller than $n \times m$, the total number of potential ratings. This is consistent with typical user behavior: each user can rate only a finite number of items, and that number does not depend on m, the size of the "catalogue".

Given the ratings in $\mathcal{M}$, the recommender system wishes to predict the ratings for the user/item pairs in $([n] \times [m]) \setminus \mathcal{M}$. Matrix factorization does this by fitting a bi-linear model to the existing ratings. For some small dimension $d \in \mathbb{N}$, it is assumed that there exist vectors $u_i \in \mathbb{R}^d$, $i \in [n]$, and $v_j \in \mathbb{R}^d$, $j \in [m]$, such that

$$r_{ij} = \langle u_i, v_j \rangle + \varepsilon_{ij}, \qquad (1)$$

where the $\varepsilon_{ij}$ are i.i.d. (independent and identically distributed) Gaussian random variables. The vectors $u_i$ and $v_j$ are called the user profile and the item profile, respectively, and $\langle u_i, v_j \rangle = u_i^{\top} v_j$ is their inner product. Two matrices collect the profiles:
- $U = [u_i^{\top}]_{i \in [n]} \in \mathbb{R}^{n \times d}$ is the n×d matrix whose i-th row is the profile of user i;
- $V = [v_j^{\top}]_{j \in [m]} \in \mathbb{R}^{m \times d}$ is the m×d matrix whose j-th row is the profile of item j.
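The bi-linear model in (1) can be simulated by drawing synthetic profiles and noisy ratings. Drawing profile entries from a standard Gaussian, the noise level, and the function names are assumptions for illustration; the text only fixes the form of the model.

```python
import random


def inner(u, v):
    """Inner product <u, v> of two profile vectors."""
    return sum(a * b for a, b in zip(u, v))


def sample_ratings(n, m, d, pairs, noise_std=0.1, seed=0):
    """Draw profiles and ratings r_ij = <u_i, v_j> + eps_ij for the given pairs.

    Assumption: profile entries are standard Gaussian; eps_ij ~ N(0, noise_std^2).
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    V = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    ratings = {(i, j): inner(U[i], V[j]) + rng.gauss(0.0, noise_std)
               for (i, j) in pairs}
    return U, V, ratings
```

With `noise_std=0.0`, every sampled rating equals the inner product of the corresponding profiles exactly.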

Given the ratings $\{r_{ij} : (i, j) \in \mathcal{M}\}$, the recommender typically computes the profiles U and V by solving the following regularized least squares minimization, for some $\lambda, \mu > 0$:

$$\min_{U, V} \; \frac{1}{M} \sum_{(i,j) \in \mathcal{M}} \left(r_{ij} - \langle u_i, v_j \rangle\right)^2 + \lambda \sum_{i \in [n]} \|u_i\|_2^2 + \mu \sum_{j \in [m]} \|v_j\|_2^2 \qquad (2)$$
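For reference, the objective in (2) can be evaluated directly in plain, non-private Python. The function below is a hypothetical helper. It stores profiles as lists of rows and ratings as a dictionary keyed by (i, j).

```python
def objective(U, V, ratings, lam, mu):
    """Regularized mean square error F(U, V) from equation (2)."""
    M = len(ratings)
    fit = sum((r - sum(a * b for a, b in zip(U[i], V[j]))) ** 2
              for (i, j), r in ratings.items()) / M
    reg_u = lam * sum(x * x for u in U for x in u)   # lambda * sum ||u_i||^2
    reg_v = mu * sum(x * x for v in V for x in v)    # mu * sum ||v_j||^2
    return fit + reg_u + reg_v
```

For example, take a single rating $r_{00} = 3$ with $u_0 = (1, 0)$, $v_0 = (2, 0)$, λ = 0.5 and μ = 0.25. The three terms are 1, 0.5 and 1, so F = 2.5.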

The skilled person will recognize that, assuming Gaussian priors on the profiles U and V, the minimization in (2) corresponds to maximum likelihood estimation of U and V. Given the user and item profiles, the recommender can subsequently predict the rating of user i for item j as $\hat{r}_{ij} = \langle u_i, v_j \rangle$.

The regularized mean square error in (2) is not a convex function, and the literature has proposed several methods for performing this minimization. The present principles focus on gradient descent, a popular method in practice, described as follows. Denote by F(U, V) the regularized mean square error in (2). Gradient descent iteratively adjusts the profiles U and V through the adaptation rule

$$U(t) = U(t-1) - \gamma \nabla_U F\big(U(t-1), V(t-1)\big), \qquad V(t) = V(t-1) - \gamma \nabla_V F\big(U(t-1), V(t-1)\big), \qquad (3)$$

where $\gamma > 0$ is a small gain factor and

$$\nabla_{u_i} F(U, V) = -\frac{2}{M} \sum_{j : (i,j) \in \mathcal{M}} v_j \left(r_{ij} - \langle u_i, v_j \rangle\right) + 2\lambda u_i, \qquad \nabla_{v_j} F(U, V) = -\frac{2}{M} \sum_{i : (i,j) \in \mathcal{M}} u_i \left(r_{ij} - \langle u_i, v_j \rangle\right) + 2\mu v_j. \qquad (4)$$

The initial matrices U(0) and V(0) consist of uniformly random norm-1 rows, i.e., the profiles are selected u.a.r. (uniformly at random) from the norm-1 ball.
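The adaptation rule can be sketched as plain, centralized gradient descent, which sees the ratings in the clear. Under the present principles, the same kind of computation would instead be carried out privately. The gradient includes the 1/M factor from the mean in (2). Several choices here are illustrative assumptions: drawing norm-1 initial rows by normalizing a Gaussian vector, the step size, the iteration count, and the function names.

```python
import random


def predict(U, V, i, j):
    """Predicted rating r_hat_ij = <u_i, v_j>."""
    return sum(a * b for a, b in zip(U[i], V[j]))


def mse(U, V, ratings):
    """Mean square error over the rated pairs (the data term of F)."""
    return sum((r - predict(U, V, i, j)) ** 2
               for (i, j), r in ratings.items()) / len(ratings)


def gradient_descent(ratings, n, m, d, lam=0.01, mu=0.01, gamma=0.05,
                     iters=500, seed=0):
    rng = random.Random(seed)

    def unit_row():
        # Assumed initialization: a Gaussian vector rescaled to norm 1.
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        s = sum(t * t for t in x) ** 0.5
        return [t / s for t in x]

    U = [unit_row() for _ in range(n)]
    V = [unit_row() for _ in range(m)]
    M = len(ratings)
    for _ in range(iters):
        # Gradients at (U(t-1), V(t-1)): regularization part first ...
        gU = [[2 * lam * x for x in u] for u in U]
        gV = [[2 * mu * x for x in v] for v in V]
        # ... then the data part, one contribution per rated pair.
        for (i, j), r in ratings.items():
            err = r - predict(U, V, i, j)
            for k in range(d):
                gU[i][k] -= 2 * V[j][k] * err / M
                gV[j][k] -= 2 * U[i][k] * err / M
        # Both matrices are updated from the same previous iterate.
        U = [[x - gamma * g for x, g in zip(u, gu)] for u, gu in zip(U, gU)]
        V = [[x - gamma * g for x, g in zip(v, gv)] for v, gv in zip(V, gV)]
    return U, V
```

Calling it with `iters=0` returns the random initialization. With λ = μ = 0 and a small gain factor, each step lowers the mean square error.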

Another aspect of the present principles proposes a secure multi-party computation (MPC) algorithm for matrix factorization based on sorting networks and Yao's garbled circuits. Secure multi-party computation was initially proposed by A. Chi-Chih Yao in the 1980s. Yao's protocol (i.e., garbled circuits) is a generic method for secure multi-party computation. In the variant adapted from V. Nikolaenko, U. Weinsberg, S. Ioannidis, M. Joye, D. Boneh, and N. Taft ("Privacy-preserving ridge regression on hundreds of millions of records", in IEEE S&P, 2013), the protocol is run between a set of n input owners, an evaluator who wishes to evaluate a function $f(a_1, \ldots, a_n)$, where $a_i$ denotes the private input of user i and $1 \le i \le n$, and a third party, the Crypto-Service Provider (CSP). At the end of the protocol, the evaluator learns the value of $f(a_1, \ldots, a_n)$, but no party learns more than what is revealed by this output value. The protocol requires that the function f be expressed as a Boolean circuit, e.g., as a graph of OR, AND, NOT and XOR gates, and that the evaluator and the CSP do not collude.
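To make the garbled-circuit idea concrete, the following toy sketch garbles a single AND gate and evaluates it. It assumes SHA-256 as the key-derivation function and recognizes the correct table row by a block of zero bytes, a simple variant chosen for brevity (practical schemes such as point-and-permute avoid trial decryption). The function names are hypothetical, and how the evaluator obtains its input labels (by oblivious transfer in the protocol) is not shown.

```python
import hashlib
import os
import random

LABEL_LEN = 16  # bytes per wire label (illustrative choice)

def _xor(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))

def _pad(label_a: bytes, label_b: bytes) -> bytes:
    """32-byte one-time pad derived from one label on each input wire."""
    return hashlib.sha256(label_a + label_b).digest()

def garble_and_gate():
    """Garbler (CSP role): random labels for bits 0/1 on each wire plus a shuffled table."""
    wires = {w: (os.urandom(LABEL_LEN), os.urandom(LABEL_LEN)) for w in ("a", "b", "out")}
    table = []
    for x in (0, 1):
        for y in (0, 1):
            # Encrypt the output label for x AND y, followed by LABEL_LEN zero bytes.
            plaintext = wires["out"][x & y] + bytes(LABEL_LEN)
            table.append(_xor(_pad(wires["a"][x], wires["b"][y]), plaintext))
    random.SystemRandom().shuffle(table)  # row order must not reveal (x, y)
    return wires, table

def evaluate(table, label_a: bytes, label_b: bytes) -> bytes:
    """Evaluator (RecSys role): with one label per input wire, recover one output label."""
    pad = _pad(label_a, label_b)
    for row in table:
        candidate = _xor(pad, row)
        if candidate[LABEL_LEN:] == bytes(LABEL_LEN):
            return candidate[:LABEL_LEN]
    raise ValueError("no table row decrypted correctly")
```

The evaluator ends up with a random-looking output label, not the bit it encodes; only a party holding the label-to-bit mapping (here `wires`) can decode it.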

Several frameworks implementing Yao's garbled circuits currently exist. A different approach to general-purpose MPC is based on secret-sharing schemes, and yet another is based on fully homomorphic encryption (FHE). Secret-sharing schemes have been proposed for a variety of linear-algebra operations, such as solving linear systems, linear regression, and auctions. Secret sharing requires at least three non-colluding online authorities that equally share the computational workload and communicate over multiple rounds; the computation is secure as long as no two of them collude. Garbled circuits assume only two non-colluding authorities and far less communication, which is better suited to the scenario in which the evaluator is a cloud service and the Crypto-Service Provider (CSP) is implemented in a trusted hardware component.

Regardless of the cryptographic primitive used, the main challenge in building an efficient algorithm for secure multi-party computation lies in implementing the algorithm in a data-oblivious fashion, i.e., so that the execution path does not depend on the input. In general, any RAM program executable in bounded time T can be converted to an O(T³) Turing machine (TM), the theoretical computing machine invented by Alan Turing to serve as an idealized model for mathematical calculation; here O(T³) means that the complexity grows at most proportionally to T³. Moreover, any bounded T-time TM can be converted to a data-oblivious circuit of size O(T log T). This implies that any bounded T-time RAM program can be converted into a data-oblivious circuit of complexity O(T³ log T). Such complexity is too high and is prohibitive in most applications. A survey of algorithms for which efficient data-oblivious implementations are not known can be found in W. Du and M. J. Atallah ("Secure multi-party computation problems and their applications: A review and open problems", in New Security Paradigms Workshop, 2001); the matrix factorization problem broadly falls into the category of data-mining summarization problems.

Sorting networks were originally developed to enable the parallelization of sorting as well as efficient hardware implementation. These networks are circuits that sort an input sequence $(a_1, a_2, \ldots, a_n)$ into a monotonically increasing sequence $(a'_1, a'_2, \ldots, a'_n)$. They are constructed by wiring together compare-and-swap circuits, their main building block. Several works exploit the data-obliviousness of sorting networks for cryptographic purposes. However, encryption is not always sufficient to ensure privacy: if an adversary can observe your access patterns to encrypted storage, they can still learn sensitive information about what your applications are doing. Oblivious RAM solves this problem by continuously shuffling memory as it is being accessed, thereby completely hiding which data is being accessed and even when it was previously accessed. In oblivious RAM, sorting is used as a means of generating a data-oblivious random permutation. More recently, sorting has been used to perform data-oblivious computations of convex hulls, all-nearest neighbors, and weighted set intersection.
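The data-obliviousness of sorting networks can be seen in a bitonic sorter, one classic construction chosen here for illustration (the text does not prescribe a particular network). The list of compare-and-swap positions depends only on the input length, never on the values, and each compare-and-swap is written with min/max instead of a branch. The sketch assumes a power-of-two input length; shorter inputs could be padded with `math.inf`, mirroring how the placeholder ⊥ is treated as +∞.

```python
def bitonic_network(n):
    """Fixed list of (i, j, ascending) comparators for n = 2**p inputs."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    comparators = []
    k = 2
    while k <= n:                 # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:              # distance between compared positions
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    comparators.append((i, partner, (i & k) == 0))
            j //= 2
        k *= 2
    return comparators

def oblivious_sort(values):
    """Apply the network: the same comparisons run for every input of this length."""
    a = list(values)
    for i, j, ascending in bitonic_network(len(a)):
        lo, hi = min(a[i], a[j]), max(a[i], a[j])   # compare-and-swap without branching on data
        a[i], a[j] = (lo, hi) if ascending else (hi, lo)
    return a
```

For n inputs the network has (n/2)·p(p+1)/2 comparators with p = log₂ n, i.e., O(n log² n), the cost that reappears in the complexity of the matrix factorization circuit.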

The present principles propose a method based on secure multi-party sorting, close to the approach used for weighted set intersection but incorporating garbled circuits. FIG. 2 illustrates the actors, or parties, in a privacy-preserving matrix factorization system according to the present principles. These are:

I. A recommender system (RecSys) 230, the entity that performs the privacy-preserving matrix factorization operation. In particular, RecSys wishes to learn the item profiles V (240), as extracted from matrix factorization of the user ratings, without learning anything useful about the users or anything extracted from user data other than the item profiles.

II. A Crypto-Service Provider (CSP) 250, which enables the secure computation without learning anything useful about the users or extracted from user data.

III. A source consisting of one or more users 210, each having a set of ratings for a set of items 220. Each user $i \in [n]$ consents to the profiling of items based on their ratings $\{(j, r_{ij}) : (i, j) \in \mathcal{M}\}$ through matrix factorization, but does not wish to reveal to the recommender their ratings, or even which items they have rated. Equivalently, the source may represent a database containing the data of one or more users.

According to the present principles, a protocol is proposed that lets RecSys execute matrix factorization to produce item profiles while neither RecSys nor the CSP learns anything other than the item profiles, i.e., V, the sole output of RecSys in FIG. 2. In particular, neither should learn a user's ratings, or even which items the user actually rated. A person skilled in the art will readily recognize that a protocol allowing the recommender to learn both user profiles and item profiles reveals too much: in such a design, the recommender could trivially infer a user's ratings from the inner products in equation (3). Therefore, the present principles propose a privacy-preserving protocol in which the recommender learns only the item profiles.

An item profile can be viewed as a metric that defines an item as a function of the ratings of a set of users/records. Similarly, a user profile can be viewed as a metric that defines a user as a function of the ratings of the set of users/records. In this light, an item profile is a measure of the approval/disapproval of the item, i.e., a reflection of the item's features or characteristics, while a user profile is a measure of the user's likes/dislikes, i.e., a reflection of the user's personality. When computed over a large set of users/records, an item or user profile can be viewed as an independent measure of the respective item or user. A person skilled in the art will recognize that learning the item profiles alone has utility. First, embedding the items in $\mathbb{R}^d$ through matrix factorization allows the recommender to infer (and encode) similarity: items whose profiles have a small Euclidean distance are items that users rated similarly. Thus, the task of learning item profiles is of interest to the recommender beyond the actual task of recommendation. In particular, users may not need or want to receive recommendations, as may be the case when the source is a database. Second, once the item profiles are obtained, there is a trivial way for the recommender to use them to provide relevant recommendations without any additional data disclosure by users. The recommender can send V to a user (or release it publicly); knowing their ratings per item, user i can infer their (private) profile $u_i$ by solving equation (2) with respect to $u_i$ for the given V (this is a separate problem), i.e., each user can obtain their profile by performing ridge regression over their ratings. Having $u_i$ and V, the user can locally predict all of their ratings for other items by computing the inner products $\langle u_i, v_j \rangle$ of equation (3). This is the subject of a co-pending application by the inventors, filed on the same day as this application, entitled "A METHOD AND SYSTEM FOR PRIVACY-PRESERVING RECOMMENDATION BASED ON MATRIX FACTORIZATION AND RIDGE REGRESSION".
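The user's side of this computation can be sketched as follows: given the released item profiles V and their own ratings, user i solves the ridge-regression normal equations $\big(\sum_j v_j v_j^\top + \rho I\big)\, u_i = \sum_j r_{ij} v_j$ over the items j they rated, then predicts $\langle u_i, v_j \rangle$ for every item. This is a minimal pure-Python sketch; the regularization weight `rho` is an assumption (its relation to λ depends on the normalization of equation (2)), and the helper names are hypothetical.

```python
def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def user_profile(V, my_ratings, rho):
    """Ridge regression: argmin_u sum_j (r_j - <u, v_j>)^2 + rho * ||u||^2."""
    d = len(V[0])
    A = [[rho if a == b else 0.0 for b in range(d)] for a in range(d)]
    rhs = [0.0] * d
    for j, r in my_ratings:
        for a in range(d):
            rhs[a] += r * V[j][a]
            for b in range(d):
                A[a][b] += V[j][a] * V[j][b]
    return solve(A, rhs)

def predict(u, V):
    """Predicted ratings <u, v_j> for every item j."""
    return [sum(x * y for x, y in zip(u, v)) for v in V]
```

Everything here runs on the user's device: only V travels from the recommender, and the ratings and $u_i$ never leave the user.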

Both scenarios discussed above assume that neither the recommender nor the users object to the public release of V. For simplicity, and because of the utility of such a protocol to the recommender, the present principles allow the recommender to learn the item profiles. However, as described in co-pending applications by the inventors filed on the same day as this application, entitled "A METHOD AND SYSTEM FOR PRIVACY-PRESERVING RECOMMENDATION TO RATING CONTRIBUTING USERS BASED ON MATRIX FACTORIZATION" and "A METHOD AND SYSTEM FOR PRIVACY-PRESERVING RECOMMENDATION BASED ON MATRIX FACTORIZATION AND RIDGE REGRESSION", this design can be extended so that users learn their predicted ratings while the recommender learns nothing useful about the users, or extracted from user data, not even V.

A person skilled in the art will understand that, in general, releasing the item profiles V, or releasing rating predictions to a user, may reveal something about other users' ratings. For example, in pathological cases where there are only two users, either disclosure may allow each user to discover the other's ratings. The present principles do not focus on such cases. When the privacy implications of disclosing item profiles or individual predictions are not tolerable, techniques such as differential privacy can be used to add noise to these outputs and protect against such leakage.

Under the present principles, the security guarantees are assumed to hold under the honest-but-curious threat model. In other words, RecSys and the CSP follow the protocol as specified, but these curious parties may choose to analyze protocol transcripts, even offline, to infer additional information. It is further assumed that the recommender and the CSP do not collude.

A preferred embodiment of the present principles comprises a protocol that follows the flowchart 300 of FIG. 3 and is described by the following steps:

P1. The source reports to RecSys how many token (rating) and item pairs will be submitted for each participating record (310). The set of records contains more than one record, and the set of tokens per record contains at least one token.

P2. The CSP generates a public encryption key ξ for a partially homomorphic scheme and sends it to all users (the source) (320). A person skilled in the art will understand that homomorphic encryption is a form of encryption that allows certain types of computations to be carried out on ciphertexts such that the decrypted result matches the result of the same operations performed on the plaintexts. For example, one party can add two encrypted numbers without anyone learning the individual values, and another party can then decrypt the result. Partially homomorphic encryption is homomorphic with respect to one operation (addition or multiplication) on plaintexts. Partially homomorphic encryption may also be homomorphic with respect to addition and to multiplication by a scalar.
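The additive case can be illustrated with Paillier encryption, one well-known additively homomorphic scheme used here only as an example (the text does not mandate it). In this toy sketch, multiplying two ciphertexts adds the underlying plaintexts, and raising a ciphertext to a public constant multiplies the plaintext by that constant, so a holder of the public key can compute on values it cannot read. The tiny primes and function names are illustrative assumptions; real deployments need moduli of thousands of bits.

```python
import math
import secrets

def keygen(p=1009, q=1013):
    """Toy Paillier key pair (n, (lam, mu)) with generator g = n + 1. Insecure sizes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                              # inverse exists because g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """E(m) = (1 + n)^m * r^n mod n^2, using (1 + n)^m = 1 + m*n mod n^2."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) * pow(r, n, n2) % n2

def decrypt(n, sk, c):
    lam, mu = sk
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n      # L(x) = (x - 1) / n, then multiply by mu

def add(n, c1, c2):
    """Homomorphic addition: E(m1) * E(m2) = E(m1 + m2)."""
    return c1 * c2 % (n * n)

def mul_const(n, c, k):
    """Homomorphic multiplication by a public constant: E(m)^k = E(k * m)."""
    return pow(c, k, n * n)
```

Note that `pow` with exponent -1 (a modular inverse) requires Python 3.8 or later.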

P3. Each user encrypts their data using the key and sends the encrypted data to RecSys (330). Specifically, for every pair $(j, r_{ij})$, where j is an item id and $r_{ij}$ is the rating that user i gave to j, the user encrypts the pair using the public encryption key.

P4. RecSys adds a mask η to the encrypted data and sends the masked, encrypted data to the CSP (340). A person skilled in the art will understand that a mask is a form of data obfuscation and can be as simple as adding the output of a random number generator or shuffling with random numbers.

P5. The CSP decrypts the masked data (350).

P6. RecSys receives or determines a separate set of items for which to compute the matrix factorization (360). This set may include all items in the corpus, a subset of all items, or even items that are not present in the records.

P7. RecSys sends to the CSP the complete specifications needed to build the garbled circuit (370), including the dimension of the user and item profiles (i.e., parameter d) (372), the total number of ratings (i.e., parameter M) (374), the total numbers of users and items (376), and the number of bits used to represent the integer and fractional parts of real numbers in the garbled circuit (378). If not all items are present in the records, the separate set of items is included in these parameters.

P8. The CSP prepares what a person skilled in the art knows as a garbled circuit, which performs matrix factorization on the records with respect to the separate set of items (380). To garble it, the circuit is first written as a Boolean circuit (382). The inputs to the circuit include the masks that RecSys used to mask the user data. Inside the circuit, the masks are used to unmask the data, after which matrix factorization is performed. The output of the circuit is V, the item profiles. No knowledge is gained about the contents of any individual record, or of any information extracted from the records other than the item profiles.

P9. The CSP sends the garbled circuit for matrix factorization to RecSys (385). Specifically, the CSP processes the gates into garbled tables and transmits them to RecSys in the order defined by the circuit structure.

P10. Through oblivious transfer (390) between RecSys and the CSP (392), RecSys learns the garbled values of the decrypted, masked records, without either RecSys or the CSP learning the actual values. A person skilled in the art will understand that oblivious transfer is a type of transfer in which a sender transfers one of potentially many pieces of information to a receiver but remains oblivious as to which piece (if any) has been transferred.

P11. RecSys evaluates the garbled circuit, which computes the item profiles V, and outputs the item profiles V (395).
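Step P10 relies on oblivious transfer. The sketch below is a toy 1-out-of-2 oblivious transfer in the style of the Diffie-Hellman-based "simplest OT" construction, a swapped-in illustration rather than necessarily the construction used by the protocol: the sender holds two messages (for instance the two wire labels of one input bit), and the receiver obtains exactly the one matching its choice bit. The group (prime 2¹²⁷ − 1 with base 3), the SHA-256 key derivation and the class names are illustrative assumptions, not vetted parameters.

```python
import hashlib
import secrets

P = 2**127 - 1   # toy prime modulus (illustrative, not a vetted group)
G = 3

def _key(x: int) -> bytes:
    return hashlib.sha256(x.to_bytes(16, "big")).digest()

def _xor(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))

class Sender:
    """Holds two messages (at most 32 bytes each); learns nothing about the choice."""
    def __init__(self):
        self.a = secrets.randbelow(P - 2) + 1
        self.A = pow(G, self.a, P)

    def respond(self, B, m0, m1):
        k0 = _key(pow(B, self.a, P))                            # (g^b)^a if choice was 0
        k1 = _key(pow(B * pow(self.A, -1, P) % P, self.a, P))   # (B / A)^a if choice was 1
        return _xor(m0, k0), _xor(m1, k1)

class Receiver:
    """Learns only the message selected by `choice`."""
    def __init__(self, choice, A):
        self.choice, self.A = choice, A
        self.b = secrets.randbelow(P - 2) + 1
        gb = pow(G, self.b, P)
        self.B = gb if choice == 0 else A * gb % P   # B looks random either way

    def open(self, e0, e1):
        k = _key(pow(self.A, self.b, P))             # A^b = g^(ab)
        return _xor(e1 if self.choice else e0, k)
```

The receiver can derive only the key $g^{ab}$; the other key would require $g^{a(b-a)}$ or $g^{a(a+b)}$, which it cannot compute without knowing a.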

Technically, beyond V, this protocol also leaks the number of tokens provided by each user. This can be rectified through a simple protocol modification, for example by "padding" the submitted records with appropriate "null" entries until a pre-set maximum number is reached (312). For simplicity, the protocol has been described without this "padding" operation.
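The padding fix can be sketched as follows, assuming a hypothetical pre-set maximum `max_tokens` and representing a null entry by `None`. In the actual protocol, null entries would have to be encrypted like real pairs so that they are indistinguishable from them, and the circuit would have to ignore them; neither is shown here.

```python
NULL_ENTRY = None  # hypothetical encoding of a "null" entry

def pad_record(pairs, max_tokens):
    """Pad a user's list of (item, rating) pairs with null entries up to max_tokens."""
    if len(pairs) > max_tokens:
        raise ValueError("record exceeds the pre-set maximum number of tokens")
    return list(pairs) + [NULL_ENTRY] * (max_tokens - len(pairs))
```

After padding, every record has exactly `max_tokens` entries, so the number of submissions no longer depends on how many items a user actually rated.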

Because a garbled circuit can be used only once, any future computation on the same ratings would require users to re-submit their data through proxy oblivious transfer, i.e., oblivious transfer involving three or more parties. For this reason, the protocol of the present principles adopts a hybrid approach that combines public-key encryption with garbled circuits.

In the present principles, public-key encryption is used as follows: each user i encrypts their respective inputs $(j, r_{ij})$ under the public key $pk_{CSP}$ provided by the CSP for a semantically secure encryption algorithm $\mathcal{E}_{pk_{CSP}}$ and, for each item j rated, submits the pair $(i, c)$, with $c = \mathcal{E}_{pk_{CSP}}(j, r_{ij})$, to RecSys, so that M ratings are submitted in total. A user who has submitted their ratings can then go offline.

The CSP public-key encryption algorithm is partially homomorphic, i.e., a constant can be applied to an encrypted message without knowledge of the corresponding decryption key. An additively homomorphic scheme such as Paillier or Regev could also be used to add a constant, but hash-ElGamal, which is only partially homomorphic, suffices and can be implemented more efficiently in this case.

Upon receiving the M ratings from users, and recalling that the encryption is partially homomorphic, RecSys obscures them with random masks, $c \oplus \eta$, where η is a random or pseudo-random variable and ⊕ is the XOR operation. RecSys sends the masked ciphertexts to the CSP together with the complete specifications needed to build the garbled circuit. In particular, RecSys specifies the dimension of the user and item profiles (i.e., parameter d), the total number of ratings (i.e., parameter M), and the total numbers of users and of items, as well as the number of bits used to represent the integer and fractional parts of real numbers in the garbled circuit.

Whenever RecSys wishes to perform matrix factorization over M accumulated ratings, it reports M to the CSP. The CSP can then provide RecSys with a garbled circuit that (a) decrypts the inputs and then (b) performs matrix factorization. In V. Nikolaenko, U. Weinsberg, S. Ioannidis, M. Joye, D. Boneh, and N. Taft ("Privacy-preserving ridge regression on hundreds of millions of records", in IEEE S&P, 2013), decryption within the circuit is avoided by using masks and homomorphic encryption. The present principles apply this idea to matrix factorization, but require only a partially homomorphic encryption scheme.

Upon receiving the encryptions, the CSP decrypts them to obtain the masked values $(i, (j, r_{ij}) \oplus \eta)$. Then, using matrix factorization as a blueprint, the CSP prepares a Yao garbled circuit that:

(a) takes as input the garbled values corresponding to the masks η;

(b) removes the masks η to recover the corresponding tuples $(i, j, r_{ij})$;

(c) performs matrix factorization; and

(d) outputs the item profiles V.

Computing the matrix factorization through the gradient descent operations outlined in equations (4) and (5) involves addition, subtraction and multiplication of real numbers, and these operations can be implemented efficiently in a circuit. The K iterations of gradient descent (4) correspond to K circuit "layers", each of which computes new profile values from the values of the previous layer. The outputs of the circuit are the item profiles V, while the user profiles are discarded.

A person skilled in the art will observe that, when the operations are performed in the clear, e.g., in the RAM model, the time complexity of each gradient descent iteration is O(M), assuming every user and every item has at least one rating so that n + m ≤ 2M. Computing each gradient (5) involves adding 2M terms, and the profile updates (4) can be performed in Θ(n + m).

The main challenge in implementing gradient descent as a circuit lies in doing so efficiently. To illustrate this, consider the following naive implementation:

Q1. For each pair $(i, j) \in [n] \times [m]$, generate a circuit that computes from the input the indicator $\delta_{ij}$, which is 1 if i rated j and 0 otherwise.

Q2. At each iteration, using the outputs of these circuits, compute each user and item gradient as a summation over m and n products, respectively:

$$\nabla_{u_i} F = -\frac{2}{M} \sum_{j=1}^{m} \delta_{ij}\, v_j \big(r_{ij} - \langle u_i, v_j \rangle\big) + 2\lambda u_i, \qquad \nabla_{v_j} F = -\frac{2}{M} \sum_{i=1}^{n} \delta_{ij}\, u_i \big(r_{ij} - \langle u_i, v_j \rangle\big) + 2\mu v_j.$$

Unfortunately, this implementation is inefficient: every iteration of the gradient descent algorithm would have a circuit complexity of Θ(n × m). When M ≪ n × m, as is generally the case in practice, the above circuit is clearly much less efficient than gradient descent in the clear. In fact, the quadratic cost Θ(n × m) is prohibitive for most datasets. The inefficiency of the naive implementation arises from the inability to identify, at circuit design time, which users rated an item and which items were rated by a user, which prevents the circuit from leveraging the inherent sparsity of the data.
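The naive construction Q1 to Q2 can be written in the clear to show where the quadratic cost comes from: every user-item pair is visited whether or not it was rated, and the indicator δ_ij zeroes out the unrated ones. This sketch uses dense Python lists as a stand-in for circuit wires; the argument layout (`R`, `delta`) is an illustrative assumption, and the 2/M factor assumes the squared-error term of equation (2) is normalized by M.

```python
def naive_gradients(U, V, R, delta, lam, mu, M):
    """Dense gradients: the loop structure is identical for every input, so each
    iteration costs n * m pair evaluations regardless of how many ratings exist."""
    n, m, d = len(U), len(V), len(U[0])
    gU = [[2 * lam * x for x in u] for u in U]
    gV = [[2 * mu * x for x in v] for v in V]
    for i in range(n):
        for j in range(m):
            # delta[i][j] is 1 if user i rated item j and 0 otherwise (step Q1).
            err = delta[i][j] * (R[i][j] - sum(U[i][k] * V[j][k] for k in range(d)))
            for k in range(d):
                gU[i][k] -= 2 * err * V[j][k] / M
                gV[j][k] -= 2 * err * U[i][k] / M
    return gU, gV
```

The work is data-oblivious, which is what a circuit needs, but it is paid for with n × m inner products per iteration instead of M.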

In contrast, according to a preferred embodiment of the present principles, the circuit implementation is based on sorting networks and has complexity O((n + m + M) log²(n + m + M)), i.e., it is within a polylogarithmic factor of the implementation in the clear. In summary, the input data corresponding to the tuples $(i, j, r_{ij})$, together with placeholders ⊥ for both the user and item profiles, are stored in a single array. Through appropriate sorting operations, user or item profiles can be placed next to the inputs with which they share an identifier. Linear passes over the data then allow the computation of the gradients as well as the updates of the profiles. When sorting, the placeholder is treated as +∞, i.e., as greater than any other number.

A matrix factorization algorithm according to a preferred embodiment of the present principles, following the flowchart 400 of FIG. 4, can be described by the following steps:

C1. Initialize the matrix S (410).

The algorithm receives as input the sets $L_i = \{(j, r_{ij}) : (i, j) \in \mathcal{M}\}$, $1 \le i \le n$, or, equivalently, the tuples $\{(i, j, r_{ij}) : (i, j) \in \mathcal{M}\}$, and constructs an array S of n + m + M tuples. The first n and m tuples of S serve as placeholders for the user and item profiles, respectively, while the remaining M tuples store the inputs $L_i$. More specifically, for each user $i \in [n]$, the algorithm constructs the tuple $(i, \bot, 1, 0, u_i, 0)$, where $u_i \in \mathbb{R}^d$ is the randomly selected initial profile of user i. For each item $j \in [m]$, the algorithm constructs the tuple $(\bot, j, 1, 0, 0, v_j)$, where $v_j \in \mathbb{R}^d$ is the randomly selected initial profile of item j. Finally, for each pair $(i, j) \in \mathcal{M}$, the algorithm constructs the corresponding tuple $(i, j, 0, r_{ij}, 0, 0)$, where $r_{ij}$ is user i's rating of item j. The resulting array is shown in FIG. 5(A).

Denoting by $s_{l,k}$ the l-th element of the k-th tuple, these elements serve the following roles:

(a) $s_{1,k}$: user identifiers in [n]

(b) $s_{2,k}$: item identifiers in [m]

(c) $s_{3,k}$: a binary flag indicating whether the tuple is a "profile" tuple or an "input" tuple

(d) $s_{4,k}$: ratings in "input" tuples

(e) $s_{5,k}$: user profiles in $\mathbb{R}^d$

(f) $s_{6,k}$: item profiles in $\mathbb{R}^d$

C2. Sort the tuples in increasing order of user id (rows 1 and 3) (420). If two ids are equal, break ties by comparing the tuple flags, i.e., the third element of each tuple, so that the "profile" tuple comes first. Hence, after sorting, each "user profile" tuple is followed by the "input" tuples with the same id.

C3. Copy the user profiles (left pass) (430):

for k = 2, …, n + m + M: $s_{5,k} \leftarrow s_{3,k}\, s_{5,k} + (1 - s_{3,k})\, s_{5,k-1}$

C4. Sort the tuples in increasing order of item id (rows 2 and 3) (440). If two ids are equal, break ties by comparing the tuple flags, i.e., the third element of each tuple.

C5. Copy the item profiles (left pass) (450):

for k = 2, …, n + m + M: $s_{6,k} \leftarrow s_{3,k}\, s_{6,k} + (1 - s_{3,k})\, s_{6,k-1}$

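C6. Compute the gradient contributions (460): for all k ∈ {1, …, n + m + M}, with $e_k = s_{4,k} - \langle s_{5,k}, s_{6,k} \rangle$, update simultaneously

$$s_{5,k} \leftarrow s_{3,k}\,(1 - 2\gamma\lambda)\, s_{5,k} + (1 - s_{3,k})\,\frac{2\gamma}{M}\, e_k\, s_{6,k}, \qquad s_{6,k} \leftarrow s_{3,k}\,(1 - 2\gamma\mu)\, s_{6,k} + (1 - s_{3,k})\,\frac{2\gamma}{M}\, e_k\, s_{5,k}$$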

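C7. Update item profiles (right pass) (470):

for $k = n+m+M, \dots, 1$ (right to left): add $s_{6,k}$ of each "input" tuple to a running sum; on reaching a "profile" tuple, add the running sum, scaled appropriately, to its item profile $s_{6,k}$ and reset the sum to zero.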

C8. Sort the tuples with respect to rows 1 and 3 (475).

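C9. Update user profiles (right pass) (480):

for $k = n+m+M, \dots, 1$ (right to left): add $s_{5,k}$ of each "input" tuple to a running sum; on reaching a "profile" tuple, add the running sum, scaled appropriately, to its user profile $s_{5,k}$ and reset the sum to zero.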

C10. If the iteration count is less than K, go to C3 (485).

C11. Sort the tuples with respect to rows 3 and 2 (490).

C12. Output the item profiles $s_{6,k}$ for $k = 1, \dots, m$ (495); the output may be limited to at least one item profile.
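To make the array layout concrete, the following Python sketch builds the array S of step C1 and performs the sorts of steps C2 and C4 in the clear. It is an illustration under stated assumptions, not the circuit itself. Tuples are plain lists [user id, item id, is_profile, rating, u, v]. The boolean is_profile stands in for the binary flag $s_{3,k}$. A hypothetical placeholder id NONE, larger than every real id, fills the unused id field. Python's sorted() replaces the sorting network used in the circuit.

```python
import random

NONE = float("inf")  # hypothetical placeholder id; sorts after every real id


def build_array(n, m, ratings, d, seed=0):
    """Step C1 sketch: n user-profile, m item-profile and M input tuples.

    ratings maps (user i, item j) -> r_ij, with ids starting at 1.
    Each tuple is [user id, item id, is_profile, rating, u, v].
    """
    rng = random.Random(seed)

    def rand_vec():
        return [rng.random() for _ in range(d)]

    S = [[i, NONE, True, 0.0, rand_vec(), [0.0] * d] for i in range(1, n + 1)]
    S += [[NONE, j, True, 0.0, [0.0] * d, rand_vec()] for j in range(1, m + 1)]
    S += [[i, j, False, r, [0.0] * d, [0.0] * d] for (i, j), r in ratings.items()]
    return S


def sort_by_user(S):
    # Step C2: ascending user id; on ties the profile tuple (is_profile=True)
    # comes first, so each user profile precedes that user's input tuples.
    return sorted(S, key=lambda t: (t[0], not t[2]))


def sort_by_item(S):
    # Step C4: the same ordering with respect to item ids.
    return sorted(S, key=lambda t: (t[1], not t[2]))
```

In the circuit these sorts must be data-oblivious. This is why a sorting network such as Batcher's is used there, rather than a data-dependent comparison sort like Python's.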

The gradient descent iterations comprise the following three main steps:

A. Copy profiles: At each iteration, the profiles $u_i$ and $v_j$ of each user i and each item j are copied to the corresponding elements ($s_{5,k}$ and $s_{6,k}$) of every "input" tuple in which i and j appear. This is implemented in steps C2 to C5 of the algorithm. For example, to copy user profiles, S is sorted using the user id (i.e., $s_{1,k}$) as the primary index and the flag (i.e., $s_{3,k}$) as the secondary index. An example of this sorting applied to the initial state of S is shown in Fig. 5(B). The user profiles are then copied by traversing the array from left to right (the "left pass"), as formally described in step C3 of the algorithm. This copies $s_{5,k}$ from each "profile" tuple to its adjacent "input" tuples. Item profiles are copied in the same way.

B. Compute gradient contributions: After the profiles have been copied, each "input" tuple stores the rating $r_{ij}$ (in $s_{4,k}$) and the profiles $u_i$ and $v_j$ (in $s_{5,k}$ and $s_{6,k}$, respectively), as computed in the last iteration. Here (i, j) is the pair to which the tuple corresponds. From these values, the quantities $v_j\,(r_{ij} - \langle u_i, v_j \rangle)$ and $u_i\,(r_{ij} - \langle u_i, v_j \rangle)$ are computed. They can be seen as the tuple's "contribution" to the gradients with respect to $u_i$ and $v_j$, respectively, given by Equation (5). As indicated by step C6 of the algorithm, these quantities replace the $s_{5,k}$ and $s_{6,k}$ elements of the tuple. Through proper use of the flags, this operation affects only the "input" tuples and leaves the "profile" tuples unchanged.

C. Update profiles: Finally, the user and item profiles are updated, as shown in steps C7 to C9 of the algorithm. Through appropriate sorting, the "profile" tuples are again placed adjacent to the "input" tuples with which they share ids. The updated profiles are computed through a right-to-left traversal of the array (the "right pass"). As this pass traverses the "input" tuples, it accumulates their gradient contributions. Upon encountering a "profile" tuple, it adds the accumulated contributions to the profile, scaled appropriately. Once the pass moves past the profile, the accumulation restarts from zero, through appropriate use of the flags $s_{3,k}$ and $s_{3,k+1}$.
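The three steps can be traced on plaintext values with the Python sketch below. It is illustrative only. The boolean is_profile stands in for the flag $s_{3,k}$, profiles are plain lists of floats, and nothing is garbled. The right-pass update assumes a squared-error objective with L2 regularization λ (lam) and step size γ (gamma), i.e., $p \leftarrow (1 - 2\gamma\lambda)\,p + 2\gamma \sum \text{contributions}$; the exact constants of Equation (5) may differ. Selections are written as flag-weighted sums rather than branches, which mirrors how the flags drive the circuit.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def left_pass(is_profile, vals):
    """Step A (C3/C5): copy each profile into the "input" tuples after it."""
    out = [list(v) for v in vals]
    for k in range(len(out) - 1):
        f = int(is_profile[k + 1])
        # Flag-driven select: keep own value (profile) or take the left one (input).
        out[k + 1] = [f * own + (1 - f) * left for own, left in zip(out[k + 1], out[k])]
    return out


def contributions(is_profile, ratings, us, vs):
    """Step B (C6): input tuples get v*e and u*e, where e = r - <u, v>."""
    new_us, new_vs = [], []
    for f, r, u, v in zip(is_profile, ratings, us, vs):
        f = int(f)
        e = r - dot(u, v)  # computed for every tuple, discarded for profiles
        # Both contributions use the tuple's old u and v.
        new_us.append([f * a + (1 - f) * b * e for a, b in zip(u, v)])
        new_vs.append([f * b + (1 - f) * a * e for a, b in zip(u, v)])
    return new_us, new_vs


def right_pass(is_profile, vals, gamma, lam):
    """Step C (C7/C9): sum contributions right to left into the profile.

    Assumed update: p <- (1 - 2*gamma*lam) * p + 2*gamma * sum(contributions).
    """
    out = [list(v) for v in vals]
    acc = [0.0] * len(out[0])
    for k in range(len(out) - 1, -1, -1):
        f = int(is_profile[k])
        upd = [(1 - 2 * gamma * lam) * p + 2 * gamma * a for p, a in zip(out[k], acc)]
        new_k = [f * x + (1 - f) * y for x, y in zip(upd, out[k])]
        # Input tuples add to the running sum; a profile tuple resets it.
        acc = [(1 - f) * (a + c) for a, c in zip(acc, out[k])]
        out[k] = new_k
    return out
```

Consider a single user with profile 1.0 who rated two items, with already-copied item profiles 2.0 and 1.0, as 3 and 2. Both prediction errors are then 1. With γ = 0.25 and λ = 0, the right pass moves the user profile to 1 + 0.5·(2 + 1) = 2.5. In the full algorithm, the item contributions in new_vs would be summed by a right pass over the item-sorted order.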

The above operations are repeated K times, i.e., the desired number of gradient descent iterations. At the end of the last iteration, the array is sorted with respect to the flags (i.e., $s_{3,k}$) as the primary index and the item ids (i.e., $s_{2,k}$) as the secondary index. This brings all item profile tuples to the first m positions of the array, from which the item profiles can be output. Similarly, to obtain user profiles, the array is sorted at the end of the last iteration with respect to the flags (i.e., $s_{3,k}$) as the primary index and the user ids (i.e., $s_{1,k}$) as the secondary index. This brings all user profile tuples to the first n positions of the array, from which the user profiles can be output.

Those of ordinary skill in the art will recognize that each of the above operations is data-oblivious and can be implemented as a circuit. Copying and updating the profiles requires O(n + m + M) gates. The overall complexity is therefore determined by sorting, which costs O((n + m + M) log²(n + m + M)) using, for example, Batcher's sorting network. Sorting and the gradient computation in step C6 of the algorithm are the most computationally intensive operations; fortunately, both are highly parallelizable. Sorting can also be optimized further by reusing, in each iteration, comparisons computed previously. In particular, this circuit can be implemented as a Boolean circuit (e.g., as a graph of OR, AND, NOT and XOR gates), which, as previously described, allows the implementation to be garbled.
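Batcher's network can be laid out as a fixed circuit because it is data-independent. The minimal Python sketch below generates its comparator list using the well-known iterative formulation of odd-even merge sort and evaluates it in the clear. It assumes the number of tuples has been padded to a power of two; the padding step is not shown. The function names are illustrative.

```python
def batcher_network(n):
    """Comparators (i, j), i < j, of Batcher's odd-even merge sort.

    n is assumed to be a power of two; a real array would be padded.
    """
    pairs = []
    p = 1
    while p < n:
        k = p
        while k >= 1:
            for j in range(k % p, n - k, 2 * k):
                for i in range(min(k, n - j - k)):
                    # Only compare elements inside the same block of size 2p.
                    if (i + j) // (2 * p) == (i + j + k) // (2 * p):
                        pairs.append((i + j, i + j + k))
            k //= 2
        p *= 2
    return pairs


def apply_network(pairs, xs):
    """Evaluate the network in the clear: each comparator is a compare-and-swap."""
    xs = list(xs)
    for i, j in pairs:
        if xs[i] > xs[j]:
            xs[i], xs[j] = xs[j], xs[i]
    return xs
```

For 8 elements, the generator yields the standard 19-comparator network. In general, the comparator count grows as O(N log² N), so sorting, rather than the linear copy and update passes, dominates the circuit size.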

According to the present principles, implementing the matrix factorization algorithm described above together with the previously described protocol provides a novel method for performing matrix factorization in a privacy-preserving manner. By using sorting networks, this solution yields a circuit whose complexity is within a polylogarithmic factor of matrix factorization performed in the clear. A further advantage of this implementation is that garbling and executing the circuit are highly parallelizable.

In an implementation of the system according to the present principles, the garbled circuit construction was based on FastGC, a publicly available garbled circuit framework. FastGC is a Java-based open-source framework that enables circuits to be defined using elementary XOR, OR and AND gates. Once the circuits are constructed, the framework handles garbling, oblivious transfer and the complete evaluation of the garbled circuit. However, before garbling and executing the circuit, FastGC represents the entire ungarbled circuit in memory as a set of Java objects. Only a subset of the gates is garbled and/or executed at any point in time, so these objects incur significant memory overhead relative to the footprint the ungarbled circuit actually requires. In addition, FastGC performs garbling in parallel with execution as described above, but both operations proceed sequentially: gates are processed one at a time, as soon as their inputs are ready. Those of ordinary skill in the art will recognize that this implementation is not amenable to parallelization.

The framework was therefore modified to address these two issues. The modifications reduce FastGC's memory footprint and enable parallel garbling and computation across multiple processors. In particular, we introduced the ability to partition a circuit horizontally into sequential "layers", each containing a set of vertical "slices" that can be executed in parallel. A layer is created in memory only when all of its inputs are ready. Once it has been garbled and evaluated, the entire layer is removed from memory and the next layer can be constructed. This limits the memory footprint to the size of the largest layer. A layer is executed by a scheduler that assigns its slices to threads so that they run in parallel. Parallelization was implemented on a single machine with multiple cores. However, no state is assumed to be shared between slices, so the implementation can be extended in a straightforward manner to run across different machines.
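The following Python sketch illustrates layer-and-slice scheduling. It only evaluates plain Boolean gates in the clear, with no garbling or oblivious transfer. The function run_layers, the representation of slices as functions, and the dict of wire values are assumptions made for illustration; they are not FastGC's API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_layers(layers, inputs, workers=4):
    """Evaluate a circuit split into sequential layers of independent slices.

    layers: an iterable of layers; each layer is a list of slices, and each
    slice maps the current wire values (a dict) to a dict of new wire values.
    If layers is a generator, a layer is only built when the loop reaches it,
    and it can be garbage-collected once its outputs have been merged.
    """
    wires = dict(inputs)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for layer in layers:
            # Slices in one layer share no state, so they run concurrently.
            outputs = list(pool.map(lambda s: s(wires), layer))
            for out in outputs:
                wires.update(out)
    return wires
```

Passing the layers as a generator reflects the rule that a layer exists in memory only once its inputs are ready. The slice functions could equally be shipped to other machines, since they read only the wires they are given.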

Finally, to implement the numerical operations outlined in the algorithm, FastGC was extended to support sorting, as well as addition and multiplication of real values through a fixed-point number representation. Batcher's sorting network was used for sorting. The fixed-point representation introduces a trade-off between circuit size and the loss of accuracy caused by truncation.
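The accuracy/size trade-off of the fixed-point representation can be seen in a minimal Python sketch. Reals are stored as integers with an assumed number of fractional bits frac_bits. Products are truncated back to that scale by a right shift, in the way a circuit would drop the low-order bits. The function names are hypothetical.

```python
def to_fixed(x, frac_bits):
    """Encode a real number as an integer with frac_bits fractional bits."""
    return int(round(x * (1 << frac_bits)))


def from_fixed(a, frac_bits):
    """Decode a fixed-point integer back to a float."""
    return a / (1 << frac_bits)


def fx_add(a, b):
    # Both operands share the same scale, so addition is plain integer addition.
    return a + b


def fx_mul(a, b, frac_bits):
    # The exact product carries 2 * frac_bits fractional bits; shifting right
    # truncates it back (Python's >> rounds toward minus infinity).
    return (a * b) >> frac_bits
```

With 4 fractional bits, 0.1 is stored as 0.125 and 0.1 × 0.1 truncates to 0. With 16 bits, the same product lands within about 1e-5 of 0.01. Every extra fractional bit, however, widens each adder and multiplier in the circuit.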

The implementation of the algorithm was also optimized in several ways, in particular:

(a) It reduces the cost of sorting by reusing comparisons computed at the start of circuit execution:

The basic building block of a sorting network is a compare-and-swap circuit, which compares two items and swaps them if necessary so that the output pair is ordered. The sorting operations of the matrix factorization algorithm (steps C4 and C8) perform the same comparisons between tuples in each of the K gradient descent iterations, because their inputs are exactly the same in every iteration. In effect, each sort permutes the tuples of array S in exactly the same way in every iteration. This property is exploited by performing the comparison operations for each of these sorts only once. In particular, the tuples of the form (i, j, flag, rating) are sorted at the beginning of the computation, without the payload of user or item profiles: for example, first with respect to i and the flag, then j and the flag, and then again i and the flag. The outputs of the comparison circuits are subsequently reused in each of these sorts as inputs to the swap circuits used during gradient descent. As a result, the "sorting" network applied in each iteration performs no comparisons and simply permutes the tuples (i.e., it is a "permutation" network);

(b) It reduces the size of the array S:

Precomputing all comparisons also allows us to greatly reduce the size of the tuples in S. First, observe that the rows holding user or item ids are used by the matrix factorization algorithm only as inputs to comparisons during sorting. The flags and ratings are used during the copy and update steps, but their relative positions are the same in every iteration. These positions can also be computed as the output of sorting the tuples (i, j, flag, rating) at the start of the computation. Thus, the "permutation" operations performed in each iteration need only be applied to the user and item profiles, and all other rows can be removed from array S. A further improvement fixes one set of profiles, e.g., the user profiles, and permutes only the item profiles, which reduces the cost of the permutations by another factor of two. The item profiles then alternate between two states, each reachable from the other through a permutation. In the first state they are aligned with the user profiles and the partial gradients are computed; in the second, the item profiles are updated and copied.

(c) It optimizes swap operations by using XORs:

Since XOR operations can be executed "for free", the comparison, swap, update and copy operations are optimized by using XORs wherever possible. Those of ordinary skill in the art will understand that free-XOR gates can be garbled without garbled tables and without the corresponding hashing or symmetric-key operations. This yields a marked improvement in both computation and communication.

(d) It parallelizes the computations:

Sorting and gradient computations make up most of the computation in the matrix factorization circuit: copying and updating account for at most 3% of the execution time and 0.4% of the non-XOR gates. These operations are parallelized through the FastGC extension described above. Gradient computations are clearly parallelizable, and sorting networks are also highly parallelizable (parallelism is the main motivation behind their development). In addition, many of the parallel slices in each sort are identical. The same FastGC objects defining those circuit slices can therefore be reused with different inputs, which significantly reduces the need to repeatedly create and destroy objects in memory.
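Optimizations (a) and (c) can be illustrated together in a plaintext Python sketch. Comparisons are run once on the keys, represented here as (id, flag order) pairs, and the resulting swap bits are stored. Every later "sort" replays these bits as a pure permutation, using a conditional swap built from XORs and one masking AND per bit. Odd-even transposition sort is used as a simple data-independent stand-in for Batcher's network to keep the example short. All function names are hypothetical.

```python
def transposition_network(n):
    """Comparator schedule of odd-even transposition sort (n rounds).

    A simple data-independent stand-in for Batcher's network.
    """
    return [(i, i + 1) for r in range(n) for i in range(r % 2, n - 1, 2)]


def precompute_swap_bits(pairs, keys):
    """Optimization (a): run every comparison once, on the keys only."""
    keys = list(keys)
    bits = []
    for i, j in pairs:
        b = int(keys[i] > keys[j])
        bits.append(b)
        if b:
            keys[i], keys[j] = keys[j], keys[i]
    return bits


def xor_swap(b, x, y):
    """Optimization (c): conditional swap of non-negative integers.

    (-b) is all ones when b == 1 and zero when b == 0, so in a circuit the
    masking costs one AND per bit while the three XORs are free.
    """
    d = (-b) & (x ^ y)
    return x ^ d, y ^ d


def permute(pairs, bits, payload):
    """Replay the stored swap bits on a payload: a pure permutation network."""
    payload = list(payload)
    for (i, j), b in zip(pairs, bits):
        payload[i], payload[j] = xor_swap(b, payload[i], payload[j])
    return payload
```

The ids and flags never change, so the same bits can be replayed in every one of the K iterations on that iteration's (fixed-point encoded) profiles.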

It is to be understood that the present principles may be implemented in various forms of hardware, software, firmware, special-purpose processors, or a combination thereof. Preferably, the present principles are implemented as a combination of hardware and software. The software is preferably implemented as an application program tangibly embodied on a program storage device. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (CPUs), a random access memory (RAM), and input/output (I/O) interface(s). The computer platform also includes an operating system and microinstruction code. The various processes and functions described herein may be part of the microinstruction code or part of the application program (or a combination thereof), which is executed via the operating system. In addition, various other peripheral devices, such as an additional data storage device and a printing device, may be connected to the computer platform.

FIG. 6 illustrates a block diagram of a minimal computing environment 600 used to implement the present principles. The computing environment 600 includes a processor 610 and at least one (and preferably more than one) I/O interface 620. The I/O interface may be wired or wireless. In a wireless implementation, it is pre-configured with appropriate wireless communication protocols that allow the computing environment 600 to operate on a global network (e.g., the Internet) and to communicate with other computers or servers (e.g., cloud-based computing or storage servers). The present principles can thus be provided, for example, as a Software as a Service (SaaS) feature delivered remotely to end users. One or more memories 630 and/or storage devices (HDD) 640 are also provided within the computing environment 600. The computing environment 600, or a plurality of computing environments 600, may implement the protocol (P1-P11) (FIG. 3) for matrix factorization (C1-C12) (FIG. 4) according to an embodiment of the present principles.
In particular, in an embodiment of the present principles, a computing environment 600 may implement RecSys 230, and a separate computing environment 600 may implement CSP 250. The source may comprise one or a plurality of computing environments 600, each associated with a different user 210 and used to communicate with RecSys 230 and CSP 250. These include, but are not limited to, desktop computers, cellular phones, smartphones, phone watches, tablet computers, personal digital assistants (PDAs), netbooks and laptop computers. In addition, CSP 250 may be included in the source or, equivalently, in the computing environment of each user 210 of the source.

Some of the constituent system components and method steps depicted in the accompanying drawings are preferably implemented in software. It is therefore to be further understood that the actual connections between the system components (or the process steps) may differ depending on the manner in which the present principles are programmed. Given the teachings herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present principles.

Although illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the present principles are not limited to those precise embodiments. Various changes and modifications may be effected therein by one of ordinary skill in the related art without departing from the scope or spirit of the present principles. All such changes and modifications are intended to be included within the scope of the present principles as set forth in the appended claims.

Claims

A method for securely profiling items through matrix factorization, the method comprising:
receiving (220) a set of records from a source, each record comprising a set of tokens and a set of items, and each record being kept secret from parties other than the source;
receiving (360) at least one separate item; and
evaluating (395), in a recommender system (RecSys) (230), the set of records and the at least one separate item using a garbled circuit based on matrix factorization, the output of the garbled circuit comprising item profiles for the at least one separate item.

The method according to claim 1, further comprising:
designing (380), in a Crypto-System Provider (CSP), the garbled circuit to perform matrix factorization on the set of records and the at least one separate item (360), the output of the garbled circuit comprising item profiles for the at least one separate item; and
transmitting (385) the garbled circuit to the RecSys.

The method of claim 2,
wherein the designing step comprises designing (382) the matrix factorization operation as a Boolean circuit.

The method of claim 3,
wherein designing the matrix factorization circuit comprises:
constructing (410) an array of the set of records; and
performing, on the array, sorting operations (420, 440, 470, 490), copy operations (430, 450), update operations (470, 480), a comparison operation (480) and a computation of gradient contributions (460).

The method of claim 2,
further comprising encrypting (330) the set of records to generate encrypted records, wherein the encrypting is performed prior to the step of receiving the set of records.

The method of claim 5, further comprising:
generating public encryption keys in the CSP; and
transmitting (320) the keys to the source.

The method of claim 5,
wherein the encryption is partially homomorphic encryption (320), the method further comprising:
masking the encrypted records in the RecSys to generate (340) masked records; and
decrypting the masked records in the CSP to generate (350) decrypted-masked records.

The method of claim 7,
wherein the designing step (380) comprises:
unmasking the decrypted-masked records within the garbled circuit prior to processing the decrypted-masked records.

The method of claim 7,
further comprising performing (392) oblivious transfers (390) between the CSP and the RecSys, wherein the RecSys receives garbled values of the decrypted-masked records, and the records are kept private from the RecSys and the CSP.

The method according to claim 1,
further comprising receiving (220, 310) a plurality of tokens and items of each record.

The method according to claim 1,
further comprising padding (312) each record with null entries when the number of tokens in the record is less than a value representing a maximum, to produce records having a number of tokens equal to that value.

The method according to claim 1,
wherein the source of the set of records is one of a database and a set of users (210), each user being a source of one record, and the one record being kept secret from parties other than the respective user.

The method of claim 2,
further comprising receiving (370), by the CSP, a set of parameters for the design of the garbled circuit, wherein the parameters are transmitted by the RecSys.

A system for securely profiling items through matrix factorization,
the system comprising a source for providing a set of records, a cryptographic service provider (CSP) for providing a secure matrix factorization circuit, and a recommender system (RecSys) for evaluating the records, such that the records are kept private from parties other than the source, wherein the source, the CSP and the RecSys each comprise:
a processor (602) for receiving at least one input/output (604); and
at least one memory (606, 608) in signal communication with the processor,
wherein the processor of the RecSys is configured to:
receive a set of records, each record comprising a set of tokens and a set of items, and each record being kept secret;
receive at least one separate item; and
evaluate the set of records and the at least one separate item using a garbled circuit based on matrix factorization, the output of the garbled circuit comprising item profiles for the at least one separate item.

The system of claim 14,
wherein the processor of the CSP is configured to:
design the garbled circuit to perform matrix factorization on the set of records and the at least one separate item, the output of the garbled circuit comprising item profiles for the at least one separate item; and
deliver the garbled circuit to the RecSys.

The system of claim 15,
wherein the processor of the CSP is configured to design the matrix factorization operation as a Boolean circuit.

The system of claim 16,
wherein the processor of the CSP is configured to:
construct an array of the set of records; and
design the matrix factorization circuit to perform, on the array, a sorting operation, a copy operation, an update operation, a comparison operation and a computation of gradient contributions.

The system of claim 15,
wherein the processor of the source is configured to:
encrypt the set of records, prior to providing the set of records, to generate encrypted records.

The system of claim 18,
wherein the processor of the CSP is further configured to:
generate public encryption keys; and
transmit the keys to the source.

The system of claim 18,
wherein the encryption is partially homomorphic encryption, and the processor of the RecSys is further configured to mask the encrypted records to generate masked records; and
wherein the processor of the CSP is further configured to decrypt the masked records to generate decrypted-masked records.

The system of claim 20,
wherein the processor of the CSP is further configured to unmask the decrypted-masked records within the garbled circuit prior to processing the decrypted-masked records.

The system of claim 20,
wherein the processor of the RecSys and the processor of the CSP are further configured to perform oblivious transfers, the RecSys receiving garbled values of the decrypted-masked records, and the records being kept private from the RecSys and the CSP.

The system of claim 14,
wherein the processor of the RecSys is further configured to receive a plurality of tokens of each record, the plurality of tokens being transmitted by the source.

The system of claim 14,
wherein the processor of the source is configured to pad each record with null entries when the number of tokens in the record is less than a value representing a maximum, to produce records having a number of tokens equal to that value.

The system of claim 14,
wherein the source of the set of records is one of a database and a set of users and, when the source is a set of users, each user comprises a processor (602) for receiving at least one input/output (604) and at least one memory (606, 608), each user being a source of one record, and the one record being kept secret from parties other than the respective user.

The system of claim 15,
wherein the processor of the CSP is further configured to receive a set of parameters for the design of the garbled circuit, the parameters being transmitted by the RecSys.