CN107766742A - Multi-correlation differential privacy matrix decomposition method in a non-independent and identically distributed environment - Google Patents
- Publication number: CN107766742A
- Application number: CN201711065040.4A
- Authority: CN (China)
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06F21/6245: Protecting personal data, e.g. for financial or medical purposes
- G06F17/14: Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/15: Correlation function computation including computation of convolution operations
- G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
Abstract
The present invention discloses a multi-correlation differential privacy matrix decomposition method for a non-independent and identically distributed (non-IID) environment. The method takes into account the multiple correlations among the other attributes of the data and uses a correlation target perturbation mechanism to incorporate these correlation properties into the model's objective function, thereby ensuring both the security and the utility of the prediction results. It consists of two main parts: the correlated noise matrix computation, in which the generated random noise matrix ensures that the prediction results satisfy differential privacy under the non-IID assumption; and the correlation differential privacy matrix decomposition training process, which introduces the multi-correlation of other attributes and adds the random noise matrix. While guaranteeing data privacy, the invention improves prediction accuracy as far as possible to offset the accuracy loss caused by privacy protection.
Description
Technical Field
The invention relates to the technical field of data privacy protection, and in particular to a multi-correlation differential privacy matrix decomposition method in a non-independent and identically distributed (non-IID) environment.
Background
Recommender systems are now widely used, particularly in the Internet industry. Matrix factorization is a popular collaborative filtering method for building recommender systems. In a collaborative filtering recommender system, users' ratings of items may reveal personal privacy: for example, personal preferences (rating data) can be used to infer a user's health condition, political leanings, or even true identity. The ratings in the raw rating data are therefore sensitive, the rating matrix contains users' private information, and improper use of the rating matrix creates a risk of privacy disclosure. Researchers in the field have now recognized this problem.
Researchers have proposed many anonymity-based protection models. Some have combined matrix decomposition with the differential privacy model and proposed differential privacy matrix decomposition models for both trusted and untrusted recommender systems. However, both matrix decomposition and differential privacy assume that the data set is independent and identically distributed (IID), whereas data in real scenarios are often correlated. On real data, therefore, matrix decomposition suffers in recommendation accuracy, and differential privacy loses its original protection capability because of the correlation among the data.
Because non-IID data with correlation characteristics are closer to reality and have greater research value, research on correlated data is currently a hot topic. Most existing privacy protection research assumes IID data and does not take the associations between individuals into account. Compared with IID data, non-IID data with complex associations are more valuable and more challenging. For non-IID matrix decomposition, the main problems are as follows:
(1) correlations exist between users and between items, and the traditional differential privacy model adds too much noise when applied to non-IID rating data, which greatly reduces data utility;
(2) the correlation properties of users and of items strengthen an attacker's background knowledge, but they can also be supplied to matrix decomposition as auxiliary information to improve prediction accuracy. Conventional matrix decomposition methods, however, do not take these correlation properties into account;
(3) once the correlation properties of users and items are introduced to improve the utility of matrix decomposition, the traditional differential privacy mechanism can no longer guarantee privacy, so a new differential privacy mechanism is needed to ensure that privacy is not leaked.
Disclosure of Invention
The invention aims to solve the problem that conventional differential privacy matrix decomposition loses its original privacy protection capability when faced with non-IID data, and provides a multi-correlation differential privacy matrix decomposition method for a non-IID environment.
To solve this problem, the invention is realized by the following technical scheme:
The multi-correlation differential privacy matrix decomposition method in a non-IID environment specifically comprises the following steps:
step 1, preprocessing the attribute spaces of users and items, and calculating a user correlation coefficient matrix and an item correlation coefficient matrix, respectively;
step 2, based on the differential privacy model, generating random noise matrices that obey the Laplace distribution according to the objective function of the matrix decomposition into which multi-correlation has been introduced; namely:
step 2.1, calculating the value ranges of the user correlation coefficients, the item correlation coefficients, and the rating data, i.e. the difference between the maximum and minimum values, and calculating the sensitivity of the user factor matrix and the sensitivity of the item factor matrix from these ranges;
step 2.2, drawing random numbers that obey the Laplace distribution according to the sensitivity of the user factor matrix and the sensitivity of the item factor matrix, respectively, and uniformly at random generating a group of random numbers whose L1 norm, taken as a vector, is exactly equal to the drawn Laplace random number, thereby obtaining a user random noise matrix and an item random noise matrix;
step 3, training the objective function by stochastic gradient descent to realize the correlation differential privacy matrix decomposition;
step 3.1, uniformly at random selecting vectors of random numbers from the L1 norm sphere and constructing a user factor matrix and an item factor matrix, wherein the user factor matrix is a d × n matrix, the item factor matrix is a d × m matrix, n is the number of users, m is the number of items, and d is the decomposition dimension;
step 3.2, judging whether the iteration is finished, i.e. whether the current iteration count has reached the set maximum number of iterations; if not, continuing downwards; if so, executing step 3.6;
step 3.3, calculating the error matrix Error of this iteration:
$\mathrm{Error} = R - U^{T} V$
wherein R represents the user-item rating matrix, U represents the current user factor matrix, V represents the current item factor matrix, and T represents transposition;
step 3.4, traversing each row of the rating matrix R, calculating the partial derivative of the objective function with respect to the current user factor matrix U for each row, and updating each user's vector in U using the partial derivative of the corresponding row;
step 3.5, traversing each column of the original rating matrix R, calculating the partial derivative of the objective function with respect to the current item factor matrix V for each column, and updating each item's vector in V using the partial derivative of the corresponding column;
step 3.6, repeating steps 3.2 to 3.5 until the iteration is finished, and then calculating and outputting the prediction matrix R′:
$R' = U^{T} V$
wherein U represents the current user factor matrix, V represents the current item factor matrix, and T represents transposition.
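As a shape check on steps 3.3 and 3.6, the following minimal NumPy sketch computes the error matrix and the prediction matrix for toy data. The toy ratings, the random initial factors, the convention that 0 marks an unrated entry, and the `observed` mask that restricts the error to rated entries are illustrative assumptions, not part of the method as stated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy rating matrix R (n users x m items); 0 marks "not rated" (assumption).
R = np.array([[5.0, 0.0, 3.0, 0.0],
              [0.0, 1.0, 4.0, 0.0],
              [2.0, 0.0, 0.0, 5.0]])
n, m = R.shape
d = 2  # decomposition dimension

# Factor matrices with the shapes given in step 3.1: U is d x n, V is d x m.
U = rng.normal(size=(d, n))
V = rng.normal(size=(d, m))

# Step 3.6: prediction matrix R' = U^T V, which is n x m like R.
R_pred = U.T @ V

# Step 3.3: Error = R - U^T V, kept only where a rating exists (assumption).
observed = R > 0
error = np.where(observed, R - R_pred, 0.0)

print(error.shape, R_pred.shape)
```

Because U is d × n and V is d × m, the product U^T V has the same n × m shape as R, which is what makes the subtraction in step 3.3 well defined.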
In step 1, the correlation coefficient Jaccard(X, Y) of two users or two items is calculated with the Jaccard similarity coefficient as follows:
$$\mathrm{Jaccard}(X, Y) = \frac{|X \cap Y|}{|X \cup Y|}$$
where $|X \cap Y|$ is the number of attributes the two users or items have in common, and $|X \cup Y|$ is the number of all attributes of the two users or items.
In step 2.1, the sensitivity USens of the user factor matrix and the sensitivity VSens of the item factor matrix are each given by a formula that is not preserved in this text. In those formulas, RRange denotes the value range of the rating data, URange the value range of the user correlation coefficients, and VRange the value range of the item correlation coefficients. The formulas also use the correlation coefficient between user i and user o and the correlation coefficient between item j and item w, where $o \in [n]_{-i}$ indicates that user o belongs to the set of users 1 to n excluding user i, $w \in [m]_{-j}$ indicates that item w belongs to the set of items 1 to m excluding item j, n is the number of users, and m is the number of items.
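The three value ranges that feed the sensitivities are simple to compute. The sketch below shows only that part, since the sensitivity formulas themselves are not reproduced here. The toy matrices, the use of `np.nan` for missing ratings, and the choice to take the range over every entry of each correlation matrix (diagonal included) are assumptions made for illustration.

```python
import numpy as np

def value_range(values):
    """Difference between the maximum and minimum of the given values."""
    values = np.asarray(values, dtype=float)
    return float(values.max() - values.min())

# Hypothetical toy data: np.nan marks a missing rating.
R = np.array([[5.0, np.nan, 3.0],
              [np.nan, 1.0, 4.0]])
delta_u = np.array([[1.0, 0.25],
                    [0.25, 1.0]])
delta_v = np.array([[1.0, 0.2, 0.0],
                    [0.2, 1.0, 0.5],
                    [0.0, 0.5, 1.0]])

r_range = value_range(R[~np.isnan(R)])  # RRange: only observed ratings count
u_range = value_range(delta_u)          # URange
v_range = value_range(delta_v)          # VRange

print(r_range, u_range, v_range)
```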
In step 2.2, the i-th column vector of the user random noise matrix and the j-th column vector of the item random noise matrix are each defined by a formula that is not preserved in this text. In those formulas, USens denotes the sensitivity of the user factor matrix, VSens denotes the sensitivity of the item factor matrix, $\epsilon$ denotes the set privacy budget, and Lap(·) denotes the probability density function of the Laplace distribution.
In step 3.4, the partial derivative of the objective function with respect to the i-th user's vector in the user factor matrix U is given by a formula that is not preserved in this text. In that formula, $v_j$ denotes the column vector of the j-th item in the item factor matrix V, $u_i$ the column vector of the i-th user in the user factor matrix U, $r_{ij}$ the rating of the j-th item by the i-th user, $\lambda$ the user regularization parameter, $u_l$ the column vector of the l-th user in U, M the number of user-item pairs that have a rating in the rating matrix R, and T transposition. The formula also uses the correlation coefficient between the i-th user and the l-th user, the i-th column vector of the user random noise matrix, the set of user-item pairs that have a rating in R, and the set of item indices j rated by the i-th user; $l \in [n]_{-i}$ indicates that user l belongs to the set of users 1 to n excluding user i.
In step 3.5, the partial derivative of the objective function with respect to the j-th item's vector in the item factor matrix V is given by a formula that is not preserved in this text. In that formula, $v_j$ denotes the column vector of the j-th item in the item factor matrix V, $u_i$ the column vector of the i-th user in the user factor matrix U, $r_{ij}$ the rating of the j-th item by the i-th user, $\mu$ the item regularization parameter, $v_k$ the column vector of the k-th item in V, M the number of user-item pairs that have a rating in the rating matrix R, and T transposition. The formula also uses the correlation coefficient between the j-th item and the k-th item, the j-th column vector of the item random noise matrix, the set of user-item pairs that have a rating in R, and the set of user indices i that rated item j; $k \in [m]_{-j}$ indicates that item k belongs to the set of items 1 to m excluding item j.
Compared with the prior art, the invention targets real data in recommender systems and improves the original privacy protection model by exploiting the correlations in non-IID data. The improved privacy protection model has the following characteristics:
(1) Because non-IID data are correlated, the improved privacy protection model must introduce the correlation among the data into the recommender system as a factor. Accordingly, the correlation matrices of the data are calculated from the characteristics of the data, and a correlation coefficient error regularization term is computed and introduced into the objective function of the matrix decomposition, which improves the prediction accuracy of the model.
(2) The method protects data privacy with differential privacy. To ensure that the new model still satisfies the differential privacy model once data correlation is introduced into the matrix decomposition training process, a new perturbation mechanism algorithm is designed, namely the correlation target perturbation mechanism.
(3) Correlation is auxiliary background knowledge that cuts both ways: an attacker can use it to increase the probability of a successful attack, and matrix decomposition can use it to improve prediction accuracy. The invention takes both effects fully into account and provides a multi-correlation differential privacy matrix decomposition method that, while guaranteeing data privacy, improves prediction accuracy as far as possible to offset the accuracy loss caused by privacy protection.
Drawings
FIG. 1 is a diagram of the model structure;
FIG. 2 is a flow chart of the correlation random noise matrix calculation;
FIG. 3 is a flow chart of the correlation differential privacy matrix decomposition training process.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and specific examples.
A multi-correlation differential privacy matrix decomposition method in a non-IID environment specifically comprises the following steps:
Step 1, preprocess the data and calculate the parameters needed to train the model.
The factor matrices U and V are first initialized, i.e. the column vectors of U and V are chosen uniformly at random from the L1 norm sphere. Then the correlation coefficient matrices $\Delta_U$ and $\Delta_V$ of the users and the items are calculated from the attribute spaces of the users and the items, respectively.
Step 2, calculate the noise matrices that must be added to the new model to satisfy differential privacy.
The rating data used by the invention are users' ratings of items; besides the ratings, the attacker's background knowledge may include other information such as the degree of correlation between two items.
First, the sensitivities USens and VSens of the user factor matrix and the item factor matrix are calculated from the original rating matrix R, the user correlation coefficient matrix $\Delta_U$, the item correlation coefficient matrix $\Delta_V$, and the privacy budget $\epsilon$, according to the correlation target perturbation mechanism. Then random numbers obeying Laplace(USens/$\epsilon$) and Laplace(VSens/$\epsilon$) are drawn to serve as the L1 norms of the noise vectors for each column of the U and V factor matrices, respectively. For each such value, a group of random numbers whose L1 norm equals that value is generated uniformly at random and added to the noise matrix as a column vector, yielding the user random noise matrix $N_U$ and the item random noise matrix $N_V$.
Step 3, carry out the training process of the model.
Because the original rating data R are sparse, stochastic gradient descent is used for training. In each iteration, for the elements of the original rating data that have values, the error is calculated according to the objective function of the model, the correlation coefficient matrices obtained in the previous steps, and the noise matrices, and U and V are updated through the gradient formulas. Finally, the inner product matrix R′ of U and V is computed and output as the result.
The objective function of the multi-correlation differential privacy matrix decomposition is given by a formula that is not preserved in this text; it involves the correlation coefficient between the i-th user and the l-th user and the correlation coefficient between the j-th item (movie) and the k-th item (movie).
The key steps and principles of the method of the invention are described in further detail below:
First, model structure
As shown in FIG. 1, the model structure of the multi-correlation differential privacy matrix decomposition based on non-IID data is as follows:
(1) The model is composed of two parts: a data preprocessing module and a correlation target perturbation mechanism module.
(2) The data preprocessing module preprocesses the original ratings R and the attribute spaces of the users and items, and calculates the correlation coefficient matrices $\Delta_U$ and $\Delta_V$ of the users and the items, respectively.
(3) The correlation target perturbation mechanism module comprises two sub-modules: correlation random noise matrix generation and correlation differential privacy matrix decomposition training. From the original ratings R and the correlation coefficient matrices $\Delta_U$ and $\Delta_V$, it calculates the user and item random noise matrices $N_U$ and $N_V$, and then adds noise during the matrix decomposition training process according to these random noise matrices.
Second, data preprocessing
Data preprocessing mainly calculates the correlation coefficient matrices of the users and the items. The correlation coefficients are calculated from the data in the attribute spaces of the users and items; common calculation methods include the Jaccard similarity coefficient, the Pearson correlation coefficient, and so on.
Table 1. User ratings of movies
Table 2. Attribute space of users in the movie rating data
User | Sex | Age | Occupation | Zip-code
Alice | F | Under 18 | K-12 student | 48267
Bob | M | 56+ | self-employed | 70072
Cindy | M | 25-34 | scientist | 55117
Dale | M | 45-49 | executive/managerial | 02460
Eric | F | 50-55 | homemaker | 55117
Table 3. Attribute space of movies in the movie rating data
Movie | Genres
Toy Story | Animation, Children's, Comedy
Jumanji | Adventure, Children's, Fantasy
Grumpier Old Men | Comedy, Romance
Waiting to Exhale | Comedy, Drama
Father of the Bride Part II | Comedy
Heat | Thriller
The correlation coefficient matrix is calculated from the attribute values in the attribute space. Since most attributes are non-numerical, the Jaccard similarity coefficient is used:
$$\mathrm{Jaccard}(X, Y) = \frac{|X \cap Y|}{|X \cup Y|}$$
where X and Y represent the attribute vectors of user 1 and user 2, respectively. The Jaccard similarity coefficient is the ratio of the number of attributes common to both users to the number of all attributes owned by the two users. As shown in Table 2, Alice and Bob share no attribute value in the user attribute space, so their Jaccard similarity coefficient is 0. Cindy and Eric have the same zip code, so their Jaccard similarity coefficient is nonzero (the value is not preserved in this text). In the movie attribute space of Table 3, the only attribute is set-valued, so the set of genre values is itself treated as the attribute set; for example, Toy Story and Jumanji share one genre (Children's) out of five distinct genres, so their Jaccard similarity coefficient is $1/5$. In the same way, the pairwise Jaccard similarity coefficients between users and between movies can be obtained, giving the user-user correlation coefficient matrix and the item-item correlation coefficient matrix.
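The pairwise computation can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the patent's own procedure: each user is represented as a set of (attribute, value) pairs so that equal values in different attributes never collide, and exact fractions are used to make the results easy to read. Under that representation Cindy and Eric come out at 1/7; a different way of counting attributes would give a different value.

```python
from fractions import Fraction

def jaccard(x, y):
    """Jaccard similarity |X n Y| / |X u Y| of two attribute sets."""
    union = x | y
    if not union:
        return Fraction(0)
    return Fraction(len(x & y), len(union))

def correlation_matrix(attrs):
    """Pairwise Jaccard coefficients, keyed by (name_a, name_b)."""
    names = list(attrs)
    return {(a, b): jaccard(attrs[a], attrs[b]) for a in names for b in names}

# Rows of Table 2 as sets of (attribute, value) pairs (representation is an assumption).
users = {
    "Alice": {("sex", "F"), ("age", "Under 18"), ("occupation", "K-12 student"), ("zip", "48267")},
    "Bob": {("sex", "M"), ("age", "56+"), ("occupation", "self-employed"), ("zip", "70072")},
    "Cindy": {("sex", "M"), ("age", "25-34"), ("occupation", "scientist"), ("zip", "55117")},
    "Eric": {("sex", "F"), ("age", "50-55"), ("occupation", "homemaker"), ("zip", "55117")},
}

# Rows of Table 3: the genre set is used directly as the attribute set.
movies = {
    "Toy Story": {"Animation", "Children's", "Comedy"},
    "Jumanji": {"Adventure", "Children's", "Fantasy"},
}

delta_u = correlation_matrix(users)
delta_v = correlation_matrix(movies)

print(delta_u[("Alice", "Bob")])          # no shared attribute value
print(delta_u[("Cindy", "Eric")])         # shared zip code only
print(delta_v[("Toy Story", "Jumanji")])  # shared genre: Children's
```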
Third, correlation target perturbation mechanism
Based on non-IID data, the invention provides a differential privacy matrix decomposition method that considers the multi-correlation of the other attributes of the data. A correlation target perturbation mechanism introduces the correlation properties of the data into the model's objective function, ensuring both the security and the utility of the prediction results. The method consists of two main parts: the correlated noise matrix computation, in which the generated random noise matrix ensures that the prediction results satisfy differential privacy under the non-IID assumption; and the correlation differential privacy matrix decomposition training process, which introduces the multi-correlation of other attributes and adds the random noise matrix.
(1) Correlation random noise matrix
The correlation target perturbation mechanism is based on the differential privacy model and generates random noise matrices that obey the Laplace distribution according to the objective function of the matrix decomposition into which multi-correlation has been introduced. Referring to FIG. 2, the detailed steps are as follows:
Step 1, calculate the value ranges of the rating data, the user correlation coefficients, and the item correlation coefficients, i.e. the difference between the maximum and minimum values, denoted RRange, URange, and VRange, respectively.
Step 2, calculate the sensitivities USens and VSens of the user factor matrix and the item factor matrix, respectively.
The formulas for USens and VSens are not preserved in this text. In those formulas, RRange denotes the value range of the rating data, URange the value range of the user correlation coefficients, and VRange the value range of the item correlation coefficients. The formulas also use the correlation coefficient between user i and user o and the correlation coefficient between item j and item w, where $o \in [n]_{-i}$ indicates that user o belongs to the set of users 1 to n excluding user i, $w \in [m]_{-j}$ indicates that item w belongs to the set of items 1 to m excluding item j, n is the number of users, and m is the number of items.
Step 3, draw random numbers obeying the Laplace distribution according to the obtained sensitivities, and uniformly at random generate a group of random numbers whose L1 norm, taken as a vector, is exactly equal to the previously drawn Laplace random number.
The formulas for the i-th column vector of the user random noise matrix and the j-th column vector of the item random noise matrix are not preserved in this text. In those formulas, USens denotes the sensitivity of the user factor matrix, VSens denotes the sensitivity of the item factor matrix, $\epsilon$ denotes the set privacy budget, Lap(·) denotes the probability density function of the Laplace distribution, and ~ indicates that the value of the random vector is proportional to the probability density function.
Step 4, return the user random noise matrix $N_U$ and the item random noise matrix $N_V$.
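Steps 3 and 4 above can be sketched with NumPy as follows. This is a minimal illustration, not the patent's exact procedure: the sensitivities are passed in as plain numbers because their formulas are not reproduced here, the absolute value of the Laplace draw is used as the norm (an assumption, since a norm cannot be negative), and the uniform direction on the L1 sphere is obtained by normalizing exponential magnitudes and attaching random signs. The function names are hypothetical.

```python
import numpy as np

def sample_l1_sphere(d, radius, rng):
    """Draw a vector uniformly from the L1 sphere of the given radius in R^d."""
    # Normalized i.i.d. exponentials are uniform on the probability simplex.
    magnitudes = rng.exponential(size=d)
    magnitudes /= magnitudes.sum()
    signs = rng.choice([-1.0, 1.0], size=d)
    return radius * signs * magnitudes

def noise_matrix(d, count, sensitivity, epsilon, rng):
    """Build a d x count noise matrix, one column per user or item."""
    columns = []
    for _ in range(count):
        # L1 norm of the column: magnitude of a Laplace(sensitivity / epsilon) draw.
        radius = abs(rng.laplace(loc=0.0, scale=sensitivity / epsilon))
        columns.append(sample_l1_sphere(d, radius, rng))
    return np.column_stack(columns)

rng = np.random.default_rng(0)
# Hypothetical sensitivities and privacy budget, chosen only for the demo.
N_U = noise_matrix(d=3, count=5, sensitivity=2.0, epsilon=0.5, rng=rng)
N_V = noise_matrix(d=3, count=4, sensitivity=1.5, epsilon=0.5, rng=rng)

print(N_U.shape, N_V.shape)
print(np.abs(N_U).sum(axis=0))  # per-column L1 norms
```

A smaller privacy budget gives a larger Laplace scale and therefore, on average, noise columns with larger L1 norms.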
(2) Correlation differential privacy matrix decomposition
The correlation differential privacy matrix decomposition is the training stage of the multi-correlation differential privacy matrix decomposition method; it applies stochastic gradient descent to the objective function of the method. In this stage, the correlation coefficient matrices and the random noise matrices calculated above are used to satisfy the requirements of the differential privacy protection model, while the correlation is used to improve prediction accuracy so as to offset the accuracy loss caused by protecting privacy. Referring to FIG. 3, the detailed steps of the training process are as follows:
Step 1, uniformly at random select vectors of random numbers from the L1 norm sphere and construct the factor matrices U and V, where the user factor matrix is a d × n matrix, the item factor matrix is a d × m matrix, n is the number of users, m is the number of items, and d is the decomposition dimension.
Step 2, judge whether the iteration is finished; if not, continue downwards. If so, go to step 7.
Step 3, calculate the error matrix of this iteration:
$\mathrm{Error} = R - U^{T} V$
where R represents the original n × m user-item rating matrix, U represents the user factor matrix, V represents the item factor matrix, and T represents transposition.
Step 4, traverse each row of the original rating matrix R and calculate the partial derivative of the objective function with respect to U for each row. The formula for the partial derivative with respect to the i-th user's vector is not preserved in this text. In that formula, $v_j$ denotes the column vector of the j-th item in the item factor matrix V, $u_i$ the column vector of the i-th user in the user factor matrix U, $r_{ij}$ the rating of the j-th item by the i-th user, $\lambda$ the user regularization parameter, $u_l$ the column vector of the l-th user in U, M the number of user-item pairs that have a rating in the rating matrix R, and T transposition. The formula also uses the correlation coefficient between the i-th user and the l-th user, the i-th column vector of the user random noise matrix, the set of user-item pairs that have a rating in R, and the set of item indices j rated by the i-th user; $l \in [n]_{-i}$ indicates that user l belongs to the set of users 1 to n excluding user i.
Step 5, traverse each column of the original rating matrix R and calculate the partial derivative of the objective function with respect to V for each column. The formula for the partial derivative with respect to the j-th item's vector is not preserved in this text. In that formula, $v_j$ denotes the column vector of the j-th item in the item factor matrix V, $u_i$ the column vector of the i-th user in the user factor matrix U, $r_{ij}$ the rating of the j-th item by the i-th user, $\mu$ the item regularization parameter, $v_k$ the column vector of the k-th item in V, M the number of user-item pairs that have a rating in the rating matrix R, and T transposition. The formula also uses the correlation coefficient between the j-th item and the k-th item, the j-th column vector of the item random noise matrix, the set of user-item pairs that have a rating in R, and the set of user indices i that rated item j; $k \in [m]_{-j}$ indicates that item k belongs to the set of items 1 to m excluding item j.
Step 6, update the corresponding U vectors and V vectors using the partial derivatives obtained in steps 4 and 5, respectively. The update formula is not preserved in this text; in it, $i \in [n]$ and $j \in [m]$.
Step 7, repeat steps 2 to 6 until the iteration is complete, then calculate the prediction matrix $R' = U^{T} V$ and output R′.
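The overall control flow of steps 1 to 7 can be sketched as below. Because the gradient and update formulas are not reproduced here, the gradients in this sketch are a simplified stand-in built from the named ingredients (rating error, correlation-weighted regularization, and a noise column scaled by 1/M), and the update is an ordinary full-batch descent step with a hypothetical learning rate `lr`. The function name, the default parameters, and the toy data are all assumptions; the sketch shows the loop structure and matrix shapes, not the patented formulas.

```python
import numpy as np

def l1_sphere_columns(d, count, rng):
    """d x count matrix whose columns are uniform on the unit L1 sphere."""
    mags = rng.exponential(size=(d, count))
    mags /= mags.sum(axis=0, keepdims=True)
    return mags * rng.choice([-1.0, 1.0], size=(d, count))

def train(R, observed, delta_u, delta_v, N_u, N_v, d=2, max_iter=300,
          lr=0.01, lam=0.01, mu=0.01, seed=0):
    """Return (U, V, R_pred) after max_iter update rounds."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    M = observed.sum()                  # number of rated (user, item) pairs
    U = l1_sphere_columns(d, n, rng)    # step 1: U is d x n
    V = l1_sphere_columns(d, m, rng)    #         V is d x m
    # Zero the diagonals so the correlation sums skip l == i and k == j.
    Du = delta_u - np.diag(np.diag(delta_u))
    Dv = delta_v - np.diag(np.diag(delta_v))
    for _ in range(max_iter):           # step 2: stop at the iteration limit
        # Step 3: Error = R - U^T V on rated entries only.
        error = np.where(observed, R - U.T @ V, 0.0)
        # Steps 4 and 5: stand-in gradients (NOT the patent's formulas).
        grad_U = (-2.0 * V @ error.T + N_u) / M + 2.0 * lam * (U - U @ Du.T)
        grad_V = (-2.0 * U @ error + N_v) / M + 2.0 * mu * (V - V @ Dv.T)
        # Step 6: update both factor matrices.
        U = U - lr * grad_U
        V = V - lr * grad_V
    return U, V, U.T @ V                # step 7: R' = U^T V

# Toy data: 0 marks an unrated entry; zero noise keeps the demo deterministic.
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [0.0, 2.0, 5.0]])
observed = R > 0
delta_u, delta_v = np.eye(3), np.eye(3)
N_u, N_v = np.zeros((2, 3)), np.zeros((2, 3))

U, V, R_pred = train(R, observed, delta_u, delta_v, N_u, N_v)
print(np.round(R_pred, 2))
```

In a real run, `N_u` and `N_v` would be the noise matrices from the previous stage and `delta_u`, `delta_v` the Jaccard correlation matrices; identity matrices and zero noise are used here only to keep the example small.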
The invention realizes matrix decomposition that satisfies the differential privacy model on non-IID data, and can effectively improve prediction accuracy while ensuring security. When the recommender system makes recommendations, the users' rating data are protected and the recommendation accuracy of the system is improved.
It should be noted that the above embodiments are illustrative only and the present invention is not limited to them. Other embodiments that those skilled in the art can derive from the teachings of the present invention without departing from its principles are considered to be within the scope of the present invention.
Claims (6)
1. A multi-correlation differential privacy matrix decomposition method for a non-independent and identically distributed environment, characterized by comprising the following steps:
step 1, preprocessing the attribute spaces of users and items, and calculating a user correlation coefficient matrix and an item correlation coefficient matrix respectively;
step 2, based on the differential privacy model, introducing the objective function of multi-correlation matrix decomposition and generating random noise matrices that obey the Laplace distribution, namely:
step 2.1, calculating the value ranges (the difference between the maximum value and the minimum value) of the user correlation coefficients, the item correlation coefficients and the rating data, and from them calculating the sensitivity of the user factor matrix and the sensitivity of the item factor matrix;
step 2.2, drawing random numbers that obey the Laplace distribution according to the sensitivity of the user factor matrix and the sensitivity of the item factor matrix respectively, and uniformly at random generating a group of random numbers whose L1 norm, taken as a vector, is exactly equal to the Laplace-distributed random number obtained, thereby obtaining a user random noise matrix and an item random noise matrix;
step 3, training the objective function by stochastic gradient descent to realize correlated differential privacy matrix decomposition;
step 3.1, uniformly at random selecting vectors of random numbers from the L1-norm sphere to construct a user factor matrix and an item factor matrix, wherein the user factor matrix is a d × n matrix, the item factor matrix is a d × m matrix, n is the number of users, m is the number of items, and d is the decomposition dimension;
step 3.2, judging whether the iteration is finished, i.e. whether the current iteration count has reached the set maximum number of iterations; if not, continuing downward; if so, executing step 3.6;
step 3.3, calculating the error matrix Error of the current iteration:
$$Error = R - U^{T} V$$
wherein R is the user-item rating matrix, U is the current user factor matrix, V is the current item factor matrix, and T denotes transposition;
step 3.4, traversing each row of the rating matrix R, calculating for each row the partial derivative of the objective function with respect to the current user factor matrix U, and updating the user factor matrix U by adding to each user of the current user factor matrix U the partial derivative of the corresponding row;
step 3.5, traversing each row of the original rating matrix R, calculating for each row the partial derivative of the objective function with respect to the current item factor matrix V, and updating the item factor matrix V by adding to each item of the current item factor matrix V the partial derivative of the corresponding row;
step 3.6, repeating steps 3.2 to 3.5 until the iteration is finished, and then calculating and outputting the prediction matrix R′:
$$R' = U^{T} V$$
wherein U is the current user factor matrix, V is the current item factor matrix, and T denotes transposition.
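The error matrix of step 3.3 can be illustrated with a short sketch. It assumes matrices stored as lists of rows and, as an assumption not stated in the claim, marks unrated entries of R with `None` and leaves them out of the error; the function name `error_matrix` is hypothetical.

```python
from typing import List, Optional

Matrix = List[List[float]]


def error_matrix(R: List[List[Optional[float]]],
                 U: Matrix, V: Matrix) -> List[List[Optional[float]]]:
    """Return Error = R - U^T V for R (n x m), U (d x n), V (d x m).

    Entries of R that are None (assumed to mean "no rating") stay None.
    """
    d = len(U)
    n, m = len(R), len(R[0])
    err: List[List[Optional[float]]] = []
    for i in range(n):
        row: List[Optional[float]] = []
        for j in range(m):
            if R[i][j] is None:
                row.append(None)
            else:
                # Predicted rating is the inner product of user i and item j.
                pred = sum(U[k][i] * V[k][j] for k in range(d))
                row.append(R[i][j] - pred)
        err.append(row)
    return err
```

With d = 1, `U = [[1.0, 2.0]]` and `V = [[2.0, 1.0]]`, the product $U^{T} V$ is `[[2, 1], [4, 2]]`, so a rating of 5 for the first user and first item leaves an error of 3.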
2. The multi-correlation differential privacy matrix decomposition method in a non-independent and identically distributed environment according to claim 1, wherein in step 1 the Jaccard similarity is used to calculate the correlation coefficient Jaccard(X, Y) of two users or two items as:
$$\mathrm{Jaccard}(X, Y) = \frac{|X \cap Y|}{|X \cup Y|}$$
where $|X \cap Y|$ is the number of attributes the two users or items have in common, and $|X \cup Y|$ is the number of all attributes of the two users or items.
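A minimal sketch of the Jaccard correlation coefficient follows, assuming each user or item is described by a set of attribute labels. The function names and the convention of returning 0.0 for two empty attribute sets are assumptions made for the example.

```python
from typing import List, Set


def jaccard(x: Set[str], y: Set[str]) -> float:
    """Return |X ∩ Y| / |X ∪ Y|; 0.0 when both sets are empty (assumed convention)."""
    union = x | y
    if not union:
        return 0.0
    return len(x & y) / len(union)


def correlation_matrix(attrs: List[Set[str]]) -> List[List[float]]:
    """Pairwise Jaccard coefficients for a list of attribute sets."""
    return [[jaccard(a, b) for b in attrs] for a in attrs]
```

For example, two items with attributes `{"action", "comedy"}` and `{"comedy", "drama"}` share one attribute out of three in total, giving a coefficient of 1/3.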
3. The multi-correlation differential privacy matrix decomposition method in a non-independent and identically distributed environment according to claim 1, wherein in step 2.1,
the sensitivity USens of the user factor matrix is:
$$USens = 2 \cdot \max \sum_{o \in [n]^{-i}} \Delta^{U}_{io} \cdot (RRange + d \cdot URange)$$
the sensitivity VSens of the item factor matrix is:
$$VSens = 2 \cdot \max \sum_{w \in [m]^{-j}} \Delta^{V}_{jw} \cdot (RRange + d \cdot VRange)$$
where RRange is the value range of the rating data, URange is the value range of the user correlation coefficients, VRange is the value range of the item correlation coefficients, $\Delta^{U}_{io}$ is the correlation coefficient between user i and user o, $\Delta^{V}_{jw}$ is the correlation coefficient between item j and item w, $o \in [n]^{-i}$ indicates that user o belongs to the set of users 1 to n excluding user i, $w \in [m]^{-j}$ indicates that item w belongs to the set of items 1 to m excluding item j, n is the number of users, m is the number of items, and d is the decomposition dimension.
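The sensitivity formulas can be sketched as one helper that serves both USens and VSens. The sketch assumes the maximum is taken over the index i (or j) of the row whose off-diagonal correlation coefficients are summed; the function and parameter names are hypothetical.

```python
from typing import Sequence


def factor_sensitivity(corr: Sequence[Sequence[float]],
                       rating_range: float,
                       d: int,
                       corr_range: float) -> float:
    """Return 2 * max_i sum_{o != i} corr[i][o] * (rating_range + d * corr_range).

    corr is a square correlation coefficient matrix; the diagonal entry of
    each row is excluded from that row's sum.
    """
    row_sums = [
        sum(value for other, value in enumerate(row) if other != idx)
        for idx, row in enumerate(corr)
    ]
    return 2.0 * max(row_sums) * (rating_range + d * corr_range)
```

Passing the user correlation matrix with URange yields USens, and passing the item correlation matrix with VRange yields VSens. With ratings on a 1 to 5 scale (RRange = 4), d = 2, a correlation range of 1 and a largest off-diagonal row sum of 0.7, the value is 2 × 0.7 × (4 + 2 × 1) = 8.4.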
4. The multi-correlation differential privacy matrix decomposition method in a non-independent and identically distributed environment according to claim 1, wherein in step 2.2,
the i-th column vector $N^{U}_{i}$ of the user random noise matrix is:
$$N^{U}_{i} \sim \mathrm{Lap}\left(\frac{USens}{\epsilon / 2}\right)$$
the j-th column vector $N^{V}_{j}$ of the item random noise matrix is:
$$N^{V}_{j} \sim \mathrm{Lap}\left(\frac{VSens}{\epsilon / 2}\right)$$
where USens is the sensitivity of the user factor matrix, VSens is the sensitivity of the item factor matrix, $\epsilon$ is the set privacy budget, and Lap(·) denotes the probability density function of the Laplace distribution.
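The noise generation of step 2.2 can be sketched with the standard library alone. The sketch makes several assumptions that go beyond the claim text: the Laplace draw is taken in absolute value so that it can serve as an L1 norm, the Laplace variate is produced as the difference of two exponential variates, and a uniform point on the L1 sphere is obtained by normalizing exponential magnitudes and attaching random signs. All function names are hypothetical.

```python
import random
from typing import List


def sample_laplace(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sample_l1_sphere(dim: int, radius: float, rng: random.Random) -> List[float]:
    """Draw a vector uniformly from the L1 sphere of the given radius."""
    mags = [rng.expovariate(1.0) for _ in range(dim)]
    total = sum(mags)
    # Normalized exponentials are uniform on the simplex; random signs
    # spread them over all orthants of the L1 sphere.
    return [radius * rng.choice((-1.0, 1.0)) * m / total for m in mags]


def noise_matrix(dim: int, count: int, sensitivity: float,
                 epsilon: float, seed: int = 0) -> List[List[float]]:
    """Return `count` noise column vectors of length `dim`."""
    rng = random.Random(seed)
    scale = sensitivity / (epsilon / 2.0)
    columns = []
    for _ in range(count):
        radius = abs(sample_laplace(scale, rng))  # assumed: magnitude only
        columns.append(sample_l1_sphere(dim, radius, rng))
    return columns
```

Calling `noise_matrix(d, n, USens, epsilon)` gives one noise column per user, and `noise_matrix(d, m, VSens, epsilon)` gives one per item.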
5. The multi-correlation differential privacy matrix decomposition method in a non-independent and identically distributed environment according to claim 1, wherein in step 3.4 the partial derivative for the i-th row with respect to the user factor matrix U is:
where $v_j$ is the column vector of the j-th item in the item factor matrix V; $u_i$ is the column vector of the i-th user in the user factor matrix U; $r_{ij}$ is the rating given by the i-th user to the j-th item; $\lambda$ is the user regularization parameter; $u_l$ is the column vector of the l-th user in the user factor matrix U; $\Delta^{U}_{il}$ is the correlation coefficient between the i-th user and the l-th user; $N^{U}_{i}$ is the i-th column vector of the user random noise matrix; M is the number of user-item pairs that have a rating in the rating matrix R; the sum over j runs over the items rated by the i-th user; $l \in [n]^{-i}$ indicates that user l belongs to the set of users 1 to n excluding user i; and T denotes transposition.
6. The multi-correlation differential privacy matrix decomposition method in a non-independent and identically distributed environment according to claim 1, wherein in step 3.5 the partial derivative for the j-th row with respect to the item factor matrix V is:
where $v_j$ is the column vector of the j-th item in the item factor matrix V; $u_i$ is the column vector of the i-th user in the user factor matrix U; $r_{ij}$ is the rating given by the i-th user to the j-th item; $\mu$ is the item regularization parameter; $v_k$ is the column vector of the k-th item in the item factor matrix V; $\Delta^{V}_{jk}$ is the correlation coefficient between the j-th item and the k-th item; $N^{V}_{j}$ is the j-th column vector of the item random noise matrix; M is the number of user-item pairs that have a rating in the rating matrix R; the sum over i runs over the users who have rated item j; $k \in [m]^{-j}$ indicates that item k belongs to the set of items 1 to m excluding item j; and T denotes transposition.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711065040.4A CN107766742B (en) | 2017-11-02 | 2017-11-02 | Multi-correlation differential privacy matrix decomposition method under non-independent same-distribution environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107766742A | 2018-03-06 |
CN107766742B | 2021-02-19 |
Family
ID=61272434
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711065040.4A Active CN107766742B (en) | 2017-11-02 | 2017-11-02 | Multi-correlation differential privacy matrix decomposition method under non-independent same-distribution environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107766742B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170091824A1 (en) * | 2015-09-25 | 2017-03-30 | The Provost, Fellows, Foundation Scholars, And The Other Members Of The Board | Method and system for providing item recommendations in a privacy-enhanced manner |
CN106557654A (en) * | 2016-11-16 | 2017-04-05 | 中山大学 | A kind of collaborative filtering based on difference privacy technology |
CN106651549A (en) * | 2017-01-09 | 2017-05-10 | 山东大学 | Individualized automobile recommendation method and system fusing supply-demand chain |
Non-Patent Citations (2)
Title |
---|
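TIANQING ZHU et al.: "Correlated Differential Privacy: Hiding Information in Non-IID Data Set", IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY * |
HE MING et al.: "A Collaborative Filtering Recommendation Method Based on Differential Privacy Protection", Journal of Computer Research and Development * |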
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110443061A (en) * | 2018-05-03 | 2019-11-12 | 阿里巴巴集团控股有限公司 | A kind of data ciphering method and device |
CN111079177A (en) * | 2019-12-04 | 2020-04-28 | 湖南大学 | Wavelet transform-based privacy protection method for time correlation in track data |
CN111079177B (en) * | 2019-12-04 | 2023-01-13 | 湖南大学 | Privacy protection method based on wavelet transformation and used for time correlation in track data |
CN111177781A (en) * | 2019-12-30 | 2020-05-19 | 北京航空航天大学 | Differential privacy recommendation method based on heterogeneous information network embedding |
CN112668044A (en) * | 2020-12-21 | 2021-04-16 | 中国科学院信息工程研究所 | Privacy protection method and device for federal learning |
CN112668044B (en) * | 2020-12-21 | 2022-04-12 | 中国科学院信息工程研究所 | Privacy protection method and device for federal learning |
CN113204793A (en) * | 2021-06-09 | 2021-08-03 | 辽宁工程技术大学 | Recommendation method based on personalized differential privacy protection |
CN113821732A (en) * | 2021-11-24 | 2021-12-21 | 阿里巴巴达摩院(杭州)科技有限公司 | Item recommendation method and equipment for protecting user privacy and learning system |
Also Published As
Publication number | Publication date |
---|---|
CN107766742B (en) | 2021-02-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107766742B (en) | Multi-correlation differential privacy matrix decomposition method under non-independent same-distribution environment | |
Garreta et al. | Learning scikit-learn: machine learning in python | |
CN107766745B (en) | Hierarchical privacy protection method in hierarchical data release | |
CN111177781A (en) | Differential privacy recommendation method based on heterogeneous information network embedding | |
CN111400612B (en) | Personalized recommendation method integrating social influence and project association | |
CN111125517B (en) | Implicit matrix decomposition recommendation method based on differential privacy and time perception | |
CN113918832B (en) | Graph convolution collaborative filtering recommendation system based on social relationship | |
CN110837603B (en) | Integrated recommendation method based on differential privacy protection | |
Chen et al. | Deep tensor factorization for multi-criteria recommender systems | |
Wang et al. | DNN-DP: Differential privacy enabled deep neural network learning framework for sensitive crowdsourcing data | |
CN112883070B (en) | Generation type countermeasure network recommendation method with differential privacy | |
CN113569286B (en) | Frequent item set mining method based on localized differential privacy | |
CN110490002B (en) | Multidimensional crowdsourcing data truth value discovery method based on localized differential privacy | |
CN108470052A (en) | A kind of anti-support attack proposed algorithm based on matrix completion | |
CN112800207B (en) | Commodity information recommendation method and device and storage medium | |
CN113918833A (en) | Product recommendation method realized through graph convolution collaborative filtering of social network relationship | |
Do et al. | Unveiling hidden implicit similarities for cross-domain recommendation | |
CN113918834A (en) | Graph convolution collaborative filtering recommendation method fusing social relations | |
Kasap et al. | A polynomial modeling based algorithm in top-N recommendation | |
CN109033453A (en) | RBM and differential privacy protection based clustering movie recommendation method and system | |
CN106203165A (en) | The big data analysis method for supporting of information based on credible cloud computing | |
CN114065016A (en) | Recommendation method, device, equipment and computer readable storage medium | |
Chiang | A Note on the⊤‐Stein Matrix Equation | |
CN113342994B (en) | Recommendation system based on non-sampling cooperative knowledge graph network | |
Chen et al. | Sparse general non-negative matrix factorization based on left semi-tensor product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |