CN107766742B - Multi-correlation differential privacy matrix decomposition method under non-independent same-distribution environment - Google Patents


Info

Publication number
CN107766742B
Authority
CN
China
Prior art keywords
matrix
user
item
correlation
factor matrix
Prior art date
Legal status
Active
Application number
CN201711065040.4A
Other languages
Chinese (zh)
Other versions
CN107766742A (en)
Inventor
李先贤
傅星珵
王利娥
刘鹏
褚宏光
Current Assignee
Guangxi Normal University
Original Assignee
Guangxi Normal University
Priority date
Filing date
Publication date
Application filed by Guangxi Normal University filed Critical Guangxi Normal University
Priority to CN201711065040.4A
Publication of CN107766742A
Application granted
Publication of CN107766742B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 Protecting data
    • G06F 21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218 Protecting access to data via a platform, e.g. using keys or access control rules, to a system of files or objects, e.g. local or distributed file system or database
    • G06F 21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F 17/15 Correlation function computation including computation of convolution operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Algebra (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-correlation differential privacy matrix decomposition method for the non-independent and identically distributed (non-i.i.d.) environment. The method takes into account the multiple correlations among other attributes of the data and uses a correlation target perturbation mechanism to introduce these correlation properties into the model objective function, ensuring both the security and the effectiveness of the prediction result. The method mainly comprises two parts: computation of a correlation noise matrix, such that the generated random noise matrix guarantees that the prediction result satisfies differential privacy under the non-i.i.d. assumption, and a correlation differential privacy matrix decomposition training process that introduces the multi-correlation of the other attributes and adds the random noise matrices. Under the condition of guaranteeing data privacy, the invention improves prediction accuracy as much as possible in order to offset the accuracy loss caused by privacy protection.

Description

Multi-correlation differential privacy matrix decomposition method under non-independent same-distribution environment
Technical Field
The invention relates to the technical field of data privacy protection, in particular to a multi-correlation differential privacy matrix decomposition method in a non-independent and same-distribution environment.
Background
The recommendation system is widely used in today's society, especially in the Internet industry, and matrix decomposition is a popular collaborative filtering method for building recommendation systems. In a collaborative filtering recommendation system, users' scores on items may reveal personal privacy: personal preferences (scoring data) can be exploited to infer a user's health condition, political tendency, or even true identity. The scores in the raw scoring data are therefore sensitive, the scoring matrix contains the users' private information, and using it without protection carries a risk of privacy leakage, a fact that is now well recognized by researchers in the field.
Researchers have by now proposed many anonymity-based protection models and, by combining them with the differential privacy model, differential privacy matrix decomposition models for both trusted and untrusted recommendation systems. However, both matrix decomposition and differential privacy are built on the assumption that the data set is independently and identically distributed, whereas data in real scenarios are often correlated. On real data, matrix decomposition therefore suffers a loss of recommendation accuracy, and the correlations between data cause the original privacy protection capability to be lost.
Since non-independently and identically distributed (non-i.i.d.) data with correlation characteristics are closer to reality and of greater research value, research on correlated data is a current hot topic. Most existing privacy protection research is based on the assumption of independent and identically distributed data and does not take the associations between individuals into account; compared with i.i.d. data, non-i.i.d. data with complex associations are both more valuable and more challenging. For matrix decomposition on non-i.i.d. data, the main problems are the following:
(1) Correlations exist between users and between items; when the traditional differential privacy model is applied to such non-independently distributed scoring data, it adds too much noise, which greatly reduces data utility;
(2) The correlation properties between users and between items strengthen an attacker's background knowledge, but they can also be supplied to matrix decomposition as auxiliary information to improve prediction accuracy. Traditional matrix decomposition methods, however, do not take these correlation properties into account;
(3) Once the respective correlations of users and items are introduced to improve the utility of matrix decomposition, the traditional differential privacy mechanism can no longer guarantee privacy, so a new differential privacy mechanism is needed to ensure that privacy is not leaked.
Disclosure of Invention
The invention aims to solve the problem that the conventional differential privacy matrix decomposition loses the original privacy protection capability when the non-independent and identically distributed data are faced, and provides a multi-correlation differential privacy matrix decomposition method under the non-independent and identically distributed environment.
In order to solve the problems, the invention is realized by the following technical scheme:
the method for decomposing the multi-correlation differential privacy matrix in the non-independent same-distribution environment specifically comprises the following steps:
step 1, preprocessing the attribute spaces of the users and the items, and calculating a user correlation coefficient matrix and an item correlation coefficient matrix respectively;
step 2, based on the differential privacy model, generating random noise matrices that obey the Laplace distribution for the objective function of the matrix decomposition into which the multiple correlations are introduced; namely:
step 2.1, calculating the value ranges of the user correlation coefficients, the item correlation coefficients and the scoring data, namely the difference between the maximum and minimum values, and calculating the sensitivity of the user factor matrix and the sensitivity of the item factor matrix from these ranges;
step 2.2, calculating random numbers that obey the Laplace distribution from the sensitivity of the user factor matrix and the sensitivity of the item factor matrix respectively, and uniformly and randomly generating a group of random numbers such that the L1 norm of the group, viewed as a vector, is exactly equal to the Laplace-distributed random number just obtained, thereby obtaining a user random noise matrix and an item random noise matrix;
step 3, training the objective function with the stochastic gradient descent method to realize the correlation differential privacy matrix decomposition;
step 3.1, uniformly and randomly selecting vectors of random numbers from the L1 norm sphere to construct a user factor matrix and an item factor matrix, wherein the user factor matrix is a d × n matrix, the item factor matrix is a d × m matrix, n is the number of users, m is the number of items, and d is the decomposition dimension;
step 3.2, judging whether the iteration is finished, namely whether the current iteration count has reached the set maximum number of iterations; if not, continuing downwards; if so, executing step 3.6;
step 3.3, calculating the Error matrix of this iteration:
Error = R - U^T * V
wherein R represents the users' item scoring matrix, U represents the current user factor matrix, V represents the current item factor matrix, and T denotes transposition;
step 3.4, traversing each row of the scoring matrix R, calculating the partial derivative of the objective function for each row with respect to the current user factor matrix U, and updating the user factor matrix U by adding to each user column of U the partial derivative of the corresponding row;
step 3.5, traversing each column of the scoring matrix R, calculating the partial derivative of the objective function for each column with respect to the current item factor matrix V, and updating the item factor matrix V by adding to each item column of V the partial derivative of the corresponding column;
step 3.6, repeating steps 3.2 to 3.5 until the iteration is finished; when the iteration is finished, calculating and outputting the prediction matrix R':
R' = U^T * V
wherein U represents the current user factor matrix, V represents the current item factor matrix, and T denotes transposition.
In step 1, the correlation coefficient Jaccard(X, Y) of 2 users or of 2 items is calculated using the Jaccard similarity distance as:
Jaccard(X, Y) = |X ∩ Y| / |X ∪ Y|
where |X ∩ Y| represents the number of common attributes of the 2 users or items, and |X ∪ Y| represents the number of all attributes of the 2 users or items.
In step 2.1, the sensitivity USens of the user factor matrix and the sensitivity VSens of the item factor matrix are given by the two sensitivity formulas of the correlation target perturbation mechanism (equation images, not reproduced in this text), in which RRange denotes the value range of the scoring data, URange denotes the value range of the user correlation coefficients, VRange denotes the value range of the item correlation coefficients, Δ^U_{i,o} denotes the correlation coefficient between user i and user o, Δ^V_{j,w} denotes the correlation coefficient between item j and item w, o ∈ [n]_{-i} indicates that user o belongs to the set of users 1 to n excluding user i, w ∈ [m]_{-j} indicates that item w belongs to the set of items 1 to m excluding item j, n is the number of users, and m is the number of items.
In step 2.2, the i-th column vector n^U_i of the user random noise matrix is drawn as:
n^U_i ~ Lap(USens / ε)
and the j-th column vector n^V_j of the item random noise matrix is drawn as:
n^V_j ~ Lap(VSens / ε)
where USens represents the sensitivity of the user factor matrix, VSens represents the sensitivity of the item factor matrix, ε represents the set privacy budget, and Lap(·) represents the probability density function of the Laplace distribution.
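For reference, the Laplace probability density used here has the standard form below, with the scale parameter set to the sensitivity divided by the privacy budget (USens/ε for the user noise and VSens/ε for the item noise):

```latex
\mathrm{Lap}(x \mid b) \;=\; \frac{1}{2b}\,\exp\!\Bigl(-\frac{|x|}{b}\Bigr),
\qquad b = \frac{\mathrm{USens}}{\epsilon} \ \ \text{or}\ \ b = \frac{\mathrm{VSens}}{\epsilon}.
```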
In step 3.4, the partial derivative of the objective function with respect to the i-th column of the user factor matrix U is given by the gradient formula of the method (equation image, not reproduced in this text), in which v_j represents the column vector of the j-th item in the item factor matrix V, u_i represents the column vector of the i-th user in the user factor matrix U, r_ij represents the score of the i-th user for the j-th item, λ is the user regularization parameter, u_l represents the column vector of the l-th user in the user factor matrix U, Δ^U_{i,l} represents the correlation coefficient between the i-th user and the l-th user, and n^U_i represents the i-th column vector of the user random noise matrix; the formula also involves the set of (user, item) pairs that have scores in the scoring matrix R, the number M of such pairs, and the set of items j scored by the i-th user; l ∈ [n]_{-i} indicates that user l belongs to the set of users 1 to n excluding user i, and T indicates transposition.
In step 3.5, the partial derivative of the objective function with respect to the j-th column of the item factor matrix V is given by the corresponding gradient formula (equation image, not reproduced in this text), in which v_j represents the column vector of the j-th item in the item factor matrix V, u_i represents the column vector of the i-th user in the user factor matrix U, r_ij represents the score of the i-th user for the j-th item, μ is the item regularization parameter, v_k represents the column vector of the k-th item in the item factor matrix V, Δ^V_{j,k} represents the correlation coefficient between the j-th item and the k-th item, and n^V_j represents the j-th column vector of the item random noise matrix; the formula also involves the set of (user, item) pairs that have scores in the scoring matrix R, the number M of such pairs, and the set of users i who scored item j; k ∈ [m]_{-j} indicates that item k belongs to the set of items 1 to m excluding item j, and T indicates transposition.
Compared with the prior art, and aiming at the application background of real data in recommendation systems, the invention improves the original privacy protection model according to the correlations present in non-independently and identically distributed data. The improved privacy protection model has the following characteristics:
(1) Because of the correlations among non-i.i.d. data, the improved privacy protection model must introduce the correlations between data as a factor in the recommendation system. Therefore, according to the data characteristics, the correlation matrices of the data are calculated and constructed, and a correlation-coefficient error regularization term is computed and introduced into the objective function of the matrix decomposition, which improves the prediction accuracy of the model.
(2) The invention protects data privacy with a differential privacy method and designs a new perturbation algorithm, the correlation target perturbation mechanism, to ensure that the new model still satisfies the differential privacy model while the data correlations are introduced into the matrix decomposition training process.
(3) Centering on correlation as auxiliary background knowledge, and fully considering both that an attacker can use this background knowledge to increase the probability of a successful attack and that matrix decomposition can use the correlations to improve prediction accuracy, the invention provides a multi-correlation differential privacy matrix decomposition method, so that, under the condition of guaranteeing data privacy, the accuracy loss caused by privacy protection is offset by improving the prediction accuracy as much as possible.
Drawings
FIG. 1 is a diagram of the model structure;
FIG. 2 is a flow chart of the correlation random noise matrix calculation;
FIG. 3 is a flow chart of the correlation differential privacy matrix decomposition training process.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings in conjunction with specific examples.
A method for decomposing a multi-correlation differential privacy matrix in a non-independent same distribution environment specifically comprises the following steps:
step 1, preprocessing data and calculating parameters needed by training a model.
The factor matrices U and V are first initialized, i.e. the column vectors of U and V are chosen uniformly at random from the L1 norm sphere. Then the correlation coefficient matrices Δ_U and Δ_V of the users and the items are calculated from the attribute spaces of the users and the items, respectively.
Step 2, calculating the noise matrices that must be added for the new model to satisfy differential privacy.
The scoring data used by the invention are the users' scores on items; besides the scores themselves, the attacker's background knowledge may include the degree of correlation between items and other auxiliary information.
First, the sensitivities USens and VSens of the user factor matrix and the item factor matrix are calculated from the original scoring matrix R, the user correlation coefficient matrix Δ_U and the item correlation coefficient matrix Δ_V, according to the correlation target perturbation mechanism and the privacy budget ε. Then random numbers obeying the Laplace distributions Lap(USens/ε) and Lap(VSens/ε) are selected uniformly at random and used as the L1 norms of the column vectors of the noise matrices for the two factor matrices U and V, respectively; for each column, a group of random numbers with exactly that L1 norm is generated uniformly at random and added to the noise matrix as a column vector, yielding the user random noise matrix N_U and the item random noise matrix N_V.
Step 3, the training process of the model.
Because the original scoring data R are sparse, the stochastic gradient descent method is adopted for training. In each iteration, for every element of the original scoring data that has a value, the error is calculated from the objective function of the model, the correlation coefficient matrices obtained in the previous step and the noise matrices, and U and V are updated by the gradient formulas. Finally, the inner-product matrix R' of U and V is computed and the result R' is output.
The objective function of the multi-correlation differential privacy matrix decomposition is:
(equation image, not reproduced in this text)
wherein Δ^U_{i,l} is the correlation coefficient of the i-th user and the l-th user, and Δ^V_{j,k} is the correlation coefficient of the j-th item (movie) and the k-th item (movie).
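A plausible rendering of that objective is sketched below in LaTeX. It assumes a squared-error loss over the set Ω of observed (user, item) scores, correlation regularizers weighted by λ and μ as used in steps 3.4 and 3.5, and linear noise terms scaled by 1/M; the exact weighting and normalization of the patent's equation are not confirmed by this text.

```latex
\min_{U,V}\;
\sum_{(i,j)\in\Omega}\bigl(r_{ij}-u_i^{\mathsf T}v_j\bigr)^{2}
\;+\;\lambda\sum_{i=1}^{n}\sum_{l\in[n]_{-i}}\Delta^{U}_{i,l}\,\lVert u_i-u_l\rVert^{2}
\;+\;\mu\sum_{j=1}^{m}\sum_{k\in[m]_{-j}}\Delta^{V}_{j,k}\,\lVert v_j-v_k\rVert^{2}
\;+\;\frac{1}{M}\Bigl(\sum_{i=1}^{n}\bigl(n^{U}_{i}\bigr)^{\mathsf T}u_i+\sum_{j=1}^{m}\bigl(n^{V}_{j}\bigr)^{\mathsf T}v_j\Bigr)
```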
the key steps and principles of the process of the invention are described in further detail below:
model structure
As shown in FIG. 1, the model structure of the multi-correlation differential privacy matrix decomposition based on non-independently and identically distributed data is as follows:
(1) The model is composed of two modules: a data preprocessing module and a correlation target perturbation mechanism module.
(2) The data preprocessing module preprocesses the original scores R and the attribute spaces of the users and items, and calculates the user and item correlation coefficient matrices Δ_U and Δ_V, respectively.
(3) The correlation target perturbation mechanism module comprises two sub-modules: correlation random noise matrix generation and correlation differential privacy matrix decomposition training. The user and item random noise matrices N_U and N_V are calculated from the original scores R and the correlation coefficient matrices Δ_U and Δ_V, respectively, and noise is then added during the matrix decomposition training process according to these random noise matrices.
Second, data preprocessing
The data preprocessing mainly calculates the correlation coefficient matrices of the users and the items. The correlation coefficients of users and of items are calculated from the data in the user and item attribute spaces; common calculation methods include the Jaccard similarity distance, the Pearson correlation coefficient, and the like.
TABLE 1 Users' ratings of movies (table given as an image, not reproduced in this text)
TABLE 2 attribute space of users in movie rating data
User Sex Age Occupation Zip-code
Alice F Under 18 K-12 student 48267
Bob M 56+ self-employed 70072
Cindy M 25-34 scientist 55117
Dale M 45-49 executive/managerial 02460
Eric F 50-55 homemaker 55117
TABLE 3 Attribute space for movies in movie rating data
Movie Genres
Toy Story Animation|Children's|Comedy
Jumanji Adventure|Children's|Fantasy
Grumpier Old Men Comedy|Romance
Waiting to Exhale Comedy|Drama
Father of the Bride Part II Comedy
Heat Thriller
The correlation coefficient matrix is calculated from the attribute values in the attribute space; since most attributes are non-numerical, the Jaccard similarity coefficient is used, with the following formula:
Jaccard(X, Y) = |X ∩ Y| / |X ∪ Y|
where X and Y represent the attribute sets of user 1 and user 2, respectively; the Jaccard similarity coefficient is the ratio of the number of attributes common to both users to the number of attributes owned by the two users together. As shown in Table 2, Alice and Bob share no attribute value in the user attribute space, so their Jaccard similarity coefficient is 0; Cindy and Eric share only the zip code 55117, so their Jaccard similarity coefficient is 1/(4 + 4 - 1) = 1/7.
In the movie attribute space of Table 3 there is only one attribute, and it is set-valued, so the set of its values is used as the attribute set in the calculation; e.g. the Jaccard similarity coefficient of Toy Story and Jumanji, which share only the genre Children's, is 1/(3 + 3 - 1) = 1/5.
From the above formulas, the pairwise user-user and movie-movie Jaccard similarity coefficients can be obtained, i.e. the user-user correlation coefficient matrix and the item-item correlation coefficient matrix.
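As a concrete illustration, the following minimal Python sketch assembles the pairwise coefficients into the correlation coefficient matrices Δ_U and Δ_V. The dictionaries simply mirror Tables 2 and 3, and all function and variable names are illustrative, not taken from the patent.

```python
# Minimal sketch: Jaccard correlation coefficient matrices from attribute sets.
import numpy as np

users = {
    "Alice": {"F", "Under 18", "K-12 student", "48267"},
    "Bob":   {"M", "56+", "self-employed", "70072"},
    "Cindy": {"M", "25-34", "scientist", "55117"},
    "Dale":  {"M", "45-49", "executive/managerial", "02460"},
    "Eric":  {"F", "50-55", "homemaker", "55117"},
}
movies = {
    "Toy Story":                   {"Animation", "Children's", "Comedy"},
    "Jumanji":                     {"Adventure", "Children's", "Fantasy"},
    "Grumpier Old Men":            {"Comedy", "Romance"},
    "Waiting to Exhale":           {"Comedy", "Drama"},
    "Father of the Bride Part II": {"Comedy"},
    "Heat":                        {"Thriller"},
}

def jaccard(x: set, y: set) -> float:
    """|X ∩ Y| / |X ∪ Y|: shared attributes over all attributes of the pair."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0

def correlation_matrix(entities: dict) -> np.ndarray:
    """Pairwise Jaccard coefficients for every pair of users (or items)."""
    keys = list(entities)
    delta = np.zeros((len(keys), len(keys)))
    for a, ka in enumerate(keys):
        for b, kb in enumerate(keys):
            delta[a, b] = jaccard(entities[ka], entities[kb])
    return delta

delta_U = correlation_matrix(users)    # user correlation coefficient matrix
delta_V = correlation_matrix(movies)   # item (movie) correlation coefficient matrix
print(delta_U[2, 4])                   # Cindy vs Eric: one shared attribute -> 1/7
```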
Third, correlation target perturbation mechanism
Based on non-independently and identically distributed data, the invention provides a differential privacy matrix decomposition method that considers the multiple correlations among other attributes of the data; a correlation target perturbation mechanism introduces these correlation properties of the data into the model objective function, ensuring both the security and the effectiveness of the prediction result. The method mainly comprises two parts: computation of a correlation noise matrix, such that the generated random noise matrix guarantees that the prediction result satisfies differential privacy under the non-i.i.d. assumption, and a correlation differential privacy matrix decomposition training process that introduces the multi-correlation of the other attributes and adds the random noise matrices.
(1) Correlation random noise matrix
The correlation target perturbation mechanism is based on the differential privacy model and generates random noise matrices obeying the Laplace distribution for the objective function of the matrix decomposition into which the multiple correlations are introduced. Referring to FIG. 2, the detailed steps are as follows:
step 1, calculating the value ranges of the grading data, the user correlation coefficient and the project correlation coefficient respectively, namely the difference between the maximum value and the minimum value, and recording as RRange, URange and VRange.
Step 2, calculating the sensitivities USens and VSens of the user factor matrix and the item factor matrix respectively.
The sensitivity USens of the user factor matrix and the sensitivity VSens of the item factor matrix are given by the two sensitivity formulas of the correlation target perturbation mechanism (equation images, not reproduced in this text), in which RRange denotes the value range of the scoring data, URange denotes the value range of the user correlation coefficients, VRange denotes the value range of the item correlation coefficients, Δ^U_{i,o} denotes the correlation coefficient between user i and user o, Δ^V_{j,w} denotes the correlation coefficient between item j and item w, o ∈ [n]_{-i} indicates that user o belongs to the set of users 1 to n excluding user i, w ∈ [m]_{-j} indicates that item w belongs to the set of items 1 to m excluding item j, n is the number of users, and m is the number of items.
Step 3, calculating random numbers obeying the Laplace distribution from the obtained sensitivities, and uniformly and randomly generating a group of random numbers such that the L1 norm of the group, viewed as a vector, is exactly equal to the Laplace-distributed random number just obtained. The Laplace-distributed random vectors are drawn as follows: the i-th column vector n^U_i of the user random noise matrix is drawn as
n^U_i ~ Lap(USens / ε)
and the j-th column vector n^V_j of the item random noise matrix is drawn as
n^V_j ~ Lap(VSens / ε)
where USens represents the sensitivity of the user factor matrix, VSens represents the sensitivity of the item factor matrix, ε represents the set privacy budget, Lap(·) represents the probability density function of the Laplace distribution, and ~ indicates that the value of the random vector is proportional to that probability density function.
Step 4, returning the user random noise matrix N_U and the item random noise matrix N_V.
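A minimal sketch of this noise-matrix construction follows (Python with NumPy). It assumes the column magnitude is the absolute value of a Laplace(sensitivity/ε) draw and the direction is drawn uniformly from the unit L1 sphere; both details are assumptions about what the equation images above do not spell out in this text, and all names are illustrative.

```python
# Hypothetical sketch of the correlation random noise matrix generation.
import numpy as np

rng = np.random.default_rng(0)

def sample_l1_sphere(d: int) -> np.ndarray:
    """Draw a d-dimensional vector uniformly at random with L1 norm exactly 1."""
    e = rng.exponential(scale=1.0, size=d)
    signs = rng.choice([-1.0, 1.0], size=d)
    return signs * e / e.sum()

def noise_matrix(d: int, cols: int, sensitivity: float, epsilon: float) -> np.ndarray:
    """Each column's L1 norm is a Laplace(sensitivity/epsilon)-distributed magnitude."""
    N = np.empty((d, cols))
    for c in range(cols):
        magnitude = abs(rng.laplace(loc=0.0, scale=sensitivity / epsilon))
        N[:, c] = magnitude * sample_l1_sphere(d)
    return N

# e.g. N_U = noise_matrix(d, n, USens, eps) and N_V = noise_matrix(d, m, VSens, eps)
```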
(2) Correlation differential privacy matrix decomposition
The correlation differential privacy matrix decomposition is the training stage of the multi-correlation differential privacy matrix decomposition method; its objective function is optimized with the stochastic gradient descent method. In this stage, the correlation coefficient matrices and the random noise matrices calculated above are used so that the requirement of the differential privacy protection model is satisfied, while the correlations are used to improve prediction accuracy and thus offset the accuracy loss caused by protecting privacy. Referring to FIG. 3, the detailed steps of the training process are as follows:
step 1, uniformly and randomly selecting a vector formed by random numbers from an L1 norm sphere, and constructing factor matrixes U and V, wherein a user factor matrix is a matrix with the size of d multiplied by n, a project factor matrix is a matrix with the size of d multiplied by m, n is the number of users, m is the number of projects, and d is a decomposition dimension.
Step 2, judging whether the iteration is finished; if not, continuing downwards; if so, going to step 7.
Step 3, calculating the error matrix of this iteration by the formula:
Error = R - U^T * V
where R represents the original n × m user-item scoring matrix, U represents the user factor matrix, V represents the item factor matrix, and T represents the transpose.
Step 4, traversing each row of the original scoring matrix R and calculating the partial derivative of the objective function for that row with respect to U. The partial derivative for the i-th row is given by the gradient formula of the method (equation image, not reproduced in this text), in which v_j represents the column vector of the j-th item in the item factor matrix V, u_i represents the column vector of the i-th user in the user factor matrix U, r_ij represents the score of the i-th user for the j-th item, λ is the user regularization parameter, u_l represents the column vector of the l-th user in the user factor matrix U, Δ^U_{i,l} represents the correlation coefficient between the i-th user and the l-th user, and n^U_i represents the i-th column vector of the user random noise matrix; the formula also involves the set of (user, item) pairs that have scores in the scoring matrix R, the number M of such pairs, and the set of items j scored by the i-th user; l ∈ [n]_{-i} indicates that user l belongs to the set of users 1 to n excluding user i, and T represents the transpose.
Step 5, traversing each column of the original scoring matrix R and calculating the partial derivative of the objective function for that column with respect to V. The partial derivative for the j-th column is given by the corresponding gradient formula (equation image, not reproduced in this text), in which v_j represents the column vector of the j-th item in the item factor matrix V, u_i represents the column vector of the i-th user in the user factor matrix U, r_ij represents the score of the i-th user for the j-th item, μ is the item regularization parameter, v_k represents the column vector of the k-th item in the item factor matrix V, Δ^V_{j,k} represents the correlation coefficient between the j-th item and the k-th item, and n^V_j represents the j-th column vector of the item random noise matrix; the formula also involves the set of (user, item) pairs that have scores in the scoring matrix R, the number M of such pairs, and the set of users i who scored item j; k ∈ [m]_{-j} indicates that item k belongs to the set of items 1 to m excluding item j, and T represents the transpose.
Step 6, updating the corresponding columns of U and V with the partial derivatives obtained in steps 4 and 5, i.e. adding to each column u_i its partial derivative and to each column v_j its partial derivative (the update formulas are given as equation images, not reproduced in this text), where i ∈ [n] and j ∈ [m].
Step 7, repeating steps 2 to 6 until the iteration is completed, then calculating the prediction matrix R' = U^T * V and outputting R'.
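A schematic sketch of this training loop follows (Python with NumPy). The gradient expressions in the patent are given only as equation images, so the update below assumes the squared-error objective with correlation regularizers and linear noise terms sketched earlier, plus an explicit learning rate eta that the patent does not name; it illustrates the structure of steps 1 to 7 under those assumptions rather than reproducing the patent's exact formulas.

```python
# Schematic sketch of the correlation differential privacy matrix decomposition training.
import numpy as np

def train(R, mask, delta_U, delta_V, N_U, N_V, d=10,
          lam=0.1, mu=0.1, eta=0.01, iters=100, seed=0):
    """R: n x m score matrix with unobserved entries set to 0; mask: 0/1 indicator of observed entries."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    M = int(mask.sum())                                  # number of observed scores

    def sample_l1_sphere(dim):
        e = rng.exponential(size=dim)
        return rng.choice([-1.0, 1.0], size=dim) * e / e.sum()

    # Step 1: columns of U (d x n) and V (d x m) drawn from the L1 norm sphere.
    U = np.column_stack([sample_l1_sphere(d) for _ in range(n_users)])
    V = np.column_stack([sample_l1_sphere(d) for _ in range(n_items)])

    for _ in range(iters):                               # Step 2: fixed iteration budget
        E = mask * (R - U.T @ V)                         # Step 3: error on observed entries
        for i in range(n_users):                         # Step 4: user-side updates
            grad = (-2.0 * (V @ E[i, :])
                    + 2.0 * lam * (delta_U[i, :].sum() * U[:, i] - U @ delta_U[i, :])
                    + N_U[:, i] / M)                     # assumed gradient form
            U[:, i] -= eta * grad
        for j in range(n_items):                         # Step 5: item-side updates
            grad = (-2.0 * (U @ E[:, j])
                    + 2.0 * mu * (delta_V[j, :].sum() * V[:, j] - V @ delta_V[j, :])
                    + N_V[:, j] / M)                     # assumed gradient form
            V[:, j] -= eta * grad
    return U.T @ V                                       # Step 7: prediction matrix R'
```

With R an n × m scoring matrix and mask its 0/1 observation indicator, the returned matrix plays the role of the output R' = U^T * V described above.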
The invention realizes matrix decomposition that satisfies the differential privacy model on non-independently and identically distributed data, and can effectively improve prediction accuracy while guaranteeing security. When the recommendation system makes recommendations, the users' scoring data are protected and the recommendation accuracy of the system is improved.
It should be noted that, although the above-mentioned embodiments of the present invention are illustrative, the present invention is not limited thereto, and thus the present invention is not limited to the above-mentioned embodiments. Other embodiments, which can be made by those skilled in the art in light of the teachings of the present invention, are considered to be within the scope of the present invention without departing from its principles.

Claims (6)

1. A multi-correlation differential privacy matrix decomposition method in a non-independent and identically distributed environment, characterized by comprising the following steps:
step 1, preprocessing the attribute spaces of the users and the items, and calculating a user correlation coefficient matrix and an item correlation coefficient matrix respectively;
step 2, based on the differential privacy model, generating a user random noise matrix and an item random noise matrix that obey the Laplace distribution for the objective function of the matrix decomposition into which the multiple correlations are introduced; namely:
step 2.1, calculating the value ranges of the user correlation coefficients, the item correlation coefficients and the scoring data, namely the difference between the maximum and minimum values, and calculating the sensitivity of the user factor matrix and the sensitivity of the item factor matrix from these ranges;
step 2.2, calculating random numbers that obey the Laplace distribution from the sensitivity of the user factor matrix and the sensitivity of the item factor matrix respectively, and uniformly and randomly generating a group of random numbers such that the L1 norm of the group, viewed as a vector, is exactly equal to the Laplace-distributed random number just obtained, thereby obtaining a user random noise matrix and an item random noise matrix;
step 3, training the objective function with the stochastic gradient descent method to realize the correlation differential privacy matrix decomposition;
step 3.1, uniformly and randomly selecting vectors of random numbers from the L1 norm sphere to construct a user factor matrix and an item factor matrix, wherein the user factor matrix is a d × n matrix, the item factor matrix is a d × m matrix, n is the number of users, m is the number of items, and d is the decomposition dimension;
step 3.2, judging whether the iteration is finished, namely whether the current iteration count has reached the set maximum number of iterations; if not, continuing downwards; if so, executing step 3.6;
step 3.3, calculating the Error matrix of this iteration:
Error = R - U^T * V
wherein R represents the users' item scoring matrix, U represents the current user factor matrix, V represents the current item factor matrix, and T denotes transposition;
step 3.4, traversing each row of the scoring matrix R, calculating the partial derivative of the objective function for each row with respect to the current user factor matrix U, and updating the user factor matrix U by adding to each user column of U the partial derivative of the corresponding row;
step 3.5, traversing each column of the scoring matrix R, calculating the partial derivative of the objective function for each column with respect to the current item factor matrix V, and updating the item factor matrix V by adding to each item column of V the partial derivative of the corresponding column;
step 3.6, repeating steps 3.2 to 3.5 until the iteration is finished; when the iteration is finished, calculating and outputting the prediction matrix R':
R' = U^T * V
wherein U represents the current user factor matrix, V represents the current item factor matrix, and T denotes transposition.
2. The method of claim 1, wherein in step 1, Jaccard's similarity distance is used to calculate the correlation coefficient Jaccard (X, Y) of 2 users or items as:
Jaccard(X, Y) = |X ∩ Y| / |X ∪ Y|
where |X ∩ Y| represents the number of common attributes of the 2 users or items, and |X ∪ Y| represents the number of all attributes of the 2 users or items.
3. The method according to claim 1, wherein in step 2.1 the sensitivity USens of the user factor matrix and the sensitivity VSens of the item factor matrix are given by the two sensitivity formulas of the correlation target perturbation mechanism (equation images, not reproduced in this text), in which RRange represents the value range of the scoring data, URange represents the value range of the user correlation coefficients, VRange represents the value range of the item correlation coefficients, Δ^U_{i,o} represents the correlation coefficient between user i and user o, Δ^V_{j,w} represents the correlation coefficient between item j and item w, o ∈ [n]_{-i} indicates that user o belongs to the set of users 1 to n excluding user i, w ∈ [m]_{-j} indicates that item w belongs to the set of items 1 to m excluding item j, n is the number of users, m is the number of items, and d is the decomposition dimension.
4. The method according to claim 1, wherein in step 2.2 the i-th column vector n^U_i of the user random noise matrix is drawn as
n^U_i ~ Lap(USens / ε)
and the j-th column vector n^V_j of the item random noise matrix is drawn as
n^V_j ~ Lap(VSens / ε)
where USens represents the sensitivity of the user factor matrix, VSens represents the sensitivity of the item factor matrix, ε represents the set privacy budget, Lap(·) represents the probability density function of the Laplace distribution, and ~ indicates that the value of the random vector is proportional to that probability density function.
5. The method according to claim 1, wherein in step 3.4 the partial derivative of the objective function with respect to the i-th column of the user factor matrix U is given by the gradient formula of the method (equation image, not reproduced in this text), in which v_j represents the column vector of the j-th item in the item factor matrix V, u_i represents the column vector of the i-th user in the user factor matrix U, r_ij represents the score of the i-th user for the j-th item, λ is the user regularization parameter, u_l represents the column vector of the l-th user in the user factor matrix U, Δ^U_{i,l} represents the correlation coefficient between the i-th user and the l-th user, and n^U_i represents the i-th column vector of the user random noise matrix; the formula also involves the set of (user, item) pairs that have scores in the scoring matrix R, the number M of such pairs, and the set of items j scored by the i-th user; l ∈ [n]_{-i} indicates that user l belongs to the set of users 1 to n excluding user i, and T indicates transposition.
6. The method according to claim 1, wherein in step 3.5 the partial derivative of the objective function with respect to the j-th column of the item factor matrix V is given by the corresponding gradient formula (equation image, not reproduced in this text), in which v_j represents the column vector of the j-th item in the item factor matrix V, u_i represents the column vector of the i-th user in the user factor matrix U, r_ij represents the score of the i-th user for the j-th item, μ is the item regularization parameter, v_k represents the column vector of the k-th item in the item factor matrix V, Δ^V_{j,k} represents the correlation coefficient between the j-th item and the k-th item, and n^V_j represents the j-th column vector of the item random noise matrix; the formula also involves the set of (user, item) pairs that have scores in the scoring matrix R, the number M of such pairs, and the set of users i who scored item j; k ∈ [m]_{-j} indicates that item k belongs to the set of items 1 to m excluding item j, and T indicates transposition.
CN201711065040.4A 2017-11-02 2017-11-02 Multi-correlation differential privacy matrix decomposition method under non-independent same-distribution environment Active CN107766742B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711065040.4A CN107766742B (en) 2017-11-02 2017-11-02 Multi-correlation differential privacy matrix decomposition method under non-independent same-distribution environment


Publications (2)

Publication Number Publication Date
CN107766742A CN107766742A (en) 2018-03-06
CN107766742B true CN107766742B (en) 2021-02-19

Family

ID=61272434

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711065040.4A Active CN107766742B (en) 2017-11-02 2017-11-02 Multi-correlation differential privacy matrix decomposition method under non-independent same-distribution environment

Country Status (1)

Country Link
CN (1) CN107766742B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110443061B (en) * 2018-05-03 2023-06-20 创新先进技术有限公司 Data encryption method and device
CN111079177B (en) * 2019-12-04 2023-01-13 湖南大学 Privacy protection method based on wavelet transformation and used for time correlation in track data
CN111177781A (en) * 2019-12-30 2020-05-19 北京航空航天大学 Differential privacy recommendation method based on heterogeneous information network embedding
CN112668044B (en) * 2020-12-21 2022-04-12 中国科学院信息工程研究所 Privacy protection method and device for federal learning
CN113204793A (en) * 2021-06-09 2021-08-03 辽宁工程技术大学 Recommendation method based on personalized differential privacy protection
CN113821732B (en) * 2021-11-24 2022-02-18 阿里巴巴达摩院(杭州)科技有限公司 Item recommendation method and equipment for protecting user privacy and learning system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170091824A1 (en) * 2015-09-25 2017-03-30 The Provost, Fellows, Foundation Scholars, And The Other Members Of The Board Method and system for providing item recommendations in a privacy-enhanced manner
CN106557654B (en) * 2016-11-16 2020-03-17 中山大学 Collaborative filtering method based on differential privacy technology
CN106651549B (en) * 2017-01-09 2019-10-01 山东大学 A kind of personalized automobile recommended method and system merging supply and demand chain

Also Published As

Publication number Publication date
CN107766742A (en) 2018-03-06

Similar Documents

Publication Publication Date Title
CN107766742B (en) Multi-correlation differential privacy matrix decomposition method under non-independent same-distribution environment
Zhu et al. Fairness-aware tensor-based recommendation
Tutz et al. A penalty approach to differential item functioning in Rasch models
Chen et al. Privacy-preserving dynamic personalized pricing with demand learning
Garreta et al. Learning scikit-learn: machine learning in python
CN106909607B (en) A kind of collaborative filtering group recommending method based on random perturbation technology
CN107766745B (en) Hierarchical privacy protection method in hierarchical data release
Niu et al. A relaxed gradient based algorithm for solving Sylvester equations
CN111177781A (en) Differential privacy recommendation method based on heterogeneous information network embedding
CN108280217A (en) A kind of matrix decomposition recommendation method based on difference secret protection
CN107862219B (en) Method for protecting privacy requirements in social network
CN111400612B (en) Personalized recommendation method integrating social influence and project association
CN110837603B (en) Integrated recommendation method based on differential privacy protection
Chen et al. Deep tensor factorization for multi-criteria recommender systems
CN113918832B (en) Graph convolution collaborative filtering recommendation system based on social relationship
Wang et al. DNN-DP: Differential privacy enabled deep neural network learning framework for sensitive crowdsourcing data
CN113918834B (en) Graph convolution collaborative filtering recommendation method fusing social relations
CN108470052A (en) A kind of anti-support attack proposed algorithm based on matrix completion
CN108628955A (en) The personalized method for secret protection and system of commending system
CN110490002A (en) A kind of multidimensional crowdsourcing data true value discovery method based on localization difference privacy
Kasap et al. A polynomial modeling based algorithm in top-N recommendation
Feng et al. Edge–cloud-aided differentially private tucker decomposition for cyber–physical–social systems
Chiang A Note on the⊤‐Stein Matrix Equation
EP4291314A1 (en) Automatic detection of prohibited gaming content
CN113342994A (en) Recommendation system based on non-sampling cooperative knowledge graph network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant