CN115718927A

CN115718927A - Differential privacy hybrid recommendation method based on an untrusted server

Info

Publication number: CN115718927A
Application number: CN202211295662.7A
Authority: CN
Inventors: 杨昌松; 唐紫薇; 丁勇; 柳悦玲; 刘洋
Original assignee: Guilin University of Electronic Technology
Current assignee: Guilin University of Electronic Technology
Priority date: 2022-10-21
Filing date: 2022-10-21
Publication date: 2023-02-28

Abstract

The invention belongs to the technical field of privacy protection and discloses a differential privacy hybrid recommendation method based on an untrusted server. The method constructs a new privacy-preserving recommendation system framework that uses the implicit feedback behavior data of the user. At the client, the original data is perturbed with the LCF-VDP mechanism, which accounts for differences in value sensitivity and for the data distribution, and the perturbed data is uploaded to the server. The server mixes the similarities produced by two algorithms, selects the top-k mixed similarities and sends them to each user device, where prediction scores are calculated and recommendations are made. Considering that the recommendation system is deployed on an untrusted server, and considering the differences in value sensitivity and the distribution of the original data, the invention proposes the LCF-VDP mechanism and, on that basis, designs a new framework in which client and server cooperate to protect privacy. The framework mixes two recommendation algorithms, so that the shortcomings of each algorithm are compensated and the advantages of each are better exploited.

Description

Differential privacy hybrid recommendation method based on an untrusted server

Technical Field

The invention belongs to the technical field of privacy protection, and particularly relates to a differential privacy hybrid recommendation method based on an untrusted server.

Background

At present, the rapid development of technologies such as the internet, big data and cloud computing has brought convenience to people's lives but has also caused the problem of "information overload". Recommendation systems emerged in response. By mining the binary relation between users and items, a recommendation system helps the user filter out, from massive data, the items the user is unlikely to interact with, and generates personalized recommendations.

Nowadays, recommendation systems have become indispensable tools in fields such as social networking sites, movie entertainment and electronic commerce. However, a recommendation system needs to collect a large amount of user information and user behavior, and the collected information often reveals the privacy of the user. For privacy and security reasons, users may be reluctant to have their data recorded and stored by the recommendation system. Moreover, in November 2021 China's regulation on the administration of algorithmic recommendation in internet information services explicitly addressed data privacy protection, so this problem urgently needs to be solved in recommendation system research.

Existing privacy technologies are largely centered on anonymization, encryption and perturbation. Anonymization protects user privacy by generalizing user identifiers into equivalence classes, as in k-anonymity, but k-anonymity is easily defeated by an attacker with background knowledge and therefore cannot guarantee user privacy. Encryption encodes plaintext data into ciphertext that only authorized parties can decode, which guarantees confidentiality during storage and transmission. Applied to a recommendation system, however, encryption involves key transmission and encrypted computation over a large amount of user data, and the resulting communication and computation overhead makes it difficult to deploy in practical application scenarios.

In a recommendation system, differential privacy can be achieved simply through a noise-adding mechanism, without significant extra computation overhead, and it has therefore attracted much attention. The differential privacy protection technology proposed by Dwork et al. solves the background knowledge attack problem that affects most privacy protection technologies. Compared with the traditional cryptographic security model, differential privacy quantifies the degree of privacy protection through the privacy budget, so that the degree of data security is comparable across different privacy protection models. McSherry et al. add noise satisfying differential privacy when building the item similarity covariance matrix, and then submit the perturbed matrix to the recommendation system to make recommendations, thereby achieving privacy protection. Chen et al. first divide the data set into classes of suitable size, then use the exponential mechanism to select a set of neighbors from the target class, and finally perform the recommendation calculation based on that set of neighbors. Zhang et al. improve the similarity function of Chen et al. by weighting several similarities into a mixed similarity and improve the clustering algorithm, which effectively improves recommendation accuracy.

However, most existing recommendation system models based on differential privacy treat the server as trusted, which does not match real scenarios. Local Differential Privacy (LDP) is a robust privacy protection model that followed centralized differential privacy; it fully considers the possibility that the data collector steals or reveals user privacy during data collection. RAPPOR is a representative local differential privacy technique, but each RAPPOR user must transmit a vector of length h to the data collector, so the transmission cost between the user and the data collector is high. To address the high communication cost, in the S-Hist method each user encodes a character string, randomly selects one bit of it, perturbs that bit using the randomized response technique and sends it to the data collector, which reduces the transmission cost. Wang et al. analyzed the characteristics of existing LDP techniques, proposed a "pure" protocol framework, and introduced aggregation and decoding techniques applicable to all "pure" protocols.

In short, a local differential privacy mechanism makes it impossible for an attacker to deduce, from an output of the privacy algorithm, which record was the input. However, it treats all user data as equally sensitive and introduces more noise than traditional centralized differential privacy, which can seriously reduce the usability of the algorithm. Meanwhile, most existing privacy-preserving recommendation algorithms are single recommendation algorithms designed for explicit feedback behavior data and are suitable only for trusted-server scenarios, whereas most recommendation service providers are not trusted, which poses a significant privacy risk. Therefore, it is desirable to design a new differential privacy hybrid recommendation method and system.

Through the above analysis, the problems and defects of the prior art are as follows:

(1) Among existing privacy technologies, anonymization is easily defeated by attackers with background knowledge and cannot guarantee user privacy; encryption applied to a recommendation system involves key transmission and encrypted computation over a large amount of user data, and the resulting communication and computation overhead makes it difficult to deploy in practical application scenarios.

(2) Most existing recommendation system models based on differential privacy treat the server as trusted, which does not match real scenarios; the local differential privacy mechanism treats all user data as equally sensitive, and the noise it introduces, which is larger than that of the traditional centralized differential privacy mechanism, seriously reduces the usability of the algorithm.

(3) Conventional privacy-preserving recommendation algorithms are single recommendation algorithms designed for explicit feedback behavior data and are suitable only for trusted-server scenarios; however, most recommendation service providers are not trusted, which causes a significant privacy risk.

Disclosure of Invention

In view of the problems in the prior art, the invention provides a differential privacy hybrid recommendation method based on an untrusted server, and in particular relates to a differential privacy hybrid recommendation method, system, medium, device and terminal.

The invention is realized as follows. A differential privacy hybrid recommendation method comprises: constructing a new privacy-preserving recommendation system framework that uses the implicit feedback behavior data of the user; at the client, perturbing the original data with the LCF-VDP mechanism, which accounts for differences in value sensitivity and for the data distribution, and uploading the perturbed data to the server; and, at the server, mixing the similarities of the two algorithms, selecting the top-k mixed similarities and sending them to each user device, where prediction scores are calculated and recommendations are made.

Further, the hybrid recommendation system framework with client-server cooperative protection based on differential privacy comprises the following steps:

Step one: user data is privacy-processed at the client, which guarantees that the privacy leakage from the data uploaded to the server is bounded by the privacy budget defined by differential privacy.

Step two: the item-item similarity is mixed at the server. Specifically, content-based recommendation is added to the item-based collaborative filtering recommendation algorithm, which addresses the item cold-start problem. Mixing the two recommendation algorithms compensates for the shortcomings of each and better exploits the advantages of each. Because no additional user information is introduced when the content-based item-item similarity is calculated on the server, this step raises no user privacy problem.

Step three: the prediction score is calculated and the recommendation is made locally at the client, which prevents user privacy from being leaked through sensitive information inferred from recommendation results calculated by the server.

Further, in step one, a data perturbation technology LCF-VDP suitable for recommendation systems is designed and used to perturb the user's implicit data, from which the item-item similarity is later calculated. Specifically:

The LCF-VDP data perturbation technology takes the data distribution and the value sensitivity into account, so that the probability that a 0 is reported truthfully differs from the probability that a 1 is reported truthfully.

Privacy in the recommendation system scenario is defined as follows. Given m items, let X = [x_1, x_2, ..., x_m] and Y = [y_1, y_2, ..., y_m] denote the real interaction data and the perturbed interaction data of a user, respectively, where x_i and y_i indicate whether the real interaction data and the perturbed interaction data, respectively, contain item I_i. According to the definition of differential privacy

Wherein

Then:

A new perturbation mechanism is designed: when the value is 1, the original value is returned with probability p and set to 0 with probability 1-p; when the value is 0, it is set to 1 with probability q and the original value is returned with probability 1-q. When p/q ≤ e^ε, the mechanism satisfies ε-differential privacy, where p ∈ [1/2, 1]; when p = 1-q, it reduces to the conventional perturbation mechanism. In the extreme case p = 1, only data with the value 1 is privacy-protected: a perturbed value of 0 implies that the original value is also 0, so the value 0 receives no privacy protection, whereas a perturbed value of 1 may come from an original value of either 1 or 0. The degree of privacy protection is controlled by adjusting p and q.
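The constraint p/q ≤ e^ε can be turned into a small parameter-selection helper. The following is a minimal sketch, not the patented procedure: the function names `min_q_for` and `conventional_p` are hypothetical, and the only constraint enforced is the p/q ≤ e^ε bound stated above.

```python
import math


def min_q_for(p: float, epsilon: float) -> float:
    """Smallest q that keeps p / q <= e^epsilon for a given p (hypothetical helper)."""
    if not 0.5 <= p <= 1.0:
        raise ValueError("p is expected to lie in [1/2, 1]")
    return p * math.exp(-epsilon)


def conventional_p(epsilon: float) -> float:
    """p for the conventional case p = 1 - q with p / q = e^epsilon."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


if __name__ == "__main__":
    eps = 1.0
    p = conventional_p(eps)
    print("conventional p, q:", p, 1.0 - p)
    print("smallest q for p = 1:", min_q_for(1.0, eps))
```

In the conventional case the two probabilities are tied together by p = 1-q, whereas here q can be chosen independently of p as long as the bound holds, which is what allows the mechanism to adapt to sparse data.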

In addition, the item-item similarity is mixed at the server in step two as follows:

The similarity sim_jac(I_i, I_j) is calculated from the perturbed user-item interaction matrix as the Jaccard similarity:

sim_jac(I_i, I_j) = intersection(Y_i, Y_j) / union(Y_i, Y_j)

where intersection(Y_i, Y_j) is the number of users who, according to the perturbed user-item matrix, interacted with both item i and item j, and union(Y_i, Y_j) is the number of users who, according to the perturbed user-item matrix, interacted with at least one of item i and item j.

The similarity sim_tag(I_i, I_j) is calculated from the item tags, and the two similarities are combined by weighting in the form:

sim(I_i, I_j) = a*sim_jac(I_i, I_j) + b*sim_tag(I_i, I_j).

Finally, the item prediction score is calculated at the client in step three as follows:

The item prediction score is the weighted sum of the user's locally stored interaction records over the neighbor items, where the weight sim(I_i, I_j) is the mixed similarity.

Another object of the present invention is to provide a differential privacy hybrid recommendation system applying the differential privacy hybrid recommendation method, including:

a user data privacy processing module, which perturbs the interaction data with the LCF-VDP technology and uploads the perturbed data to the server, so as to ensure the privacy and security of the user's data;

a hybrid recommendation construction module, which, considering the untrusted-server scenario, provides a hybrid recommendation framework with client-server cooperative protection based on differential privacy;

and a client-side score calculation and recommendation module, which receives the mixed similarity data, calculates prediction scores from the locally stored real interaction data, and finally takes the top-N items by prediction score and recommends them to the user.

It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the differential privacy hybrid recommendation method.

It is a further object of the present invention to provide a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the differential privacy hybrid recommendation method.

Another object of the present invention is to provide an information data processing terminal for implementing the differential privacy hybrid recommendation system.

In combination with the above technical scheme and the technical problems to be solved, the technical scheme claimed by the invention has the following advantages and positive effects:

Most existing privacy-preserving recommendation algorithms are single recommendation algorithms designed for explicit feedback behavior data and are applicable only to trusted-server scenarios. Most recommendation service providers, however, are not trusted, which causes a significant privacy risk. The invention therefore provides a new privacy-preserving recommendation system framework. The framework uses the implicit feedback behavior data of the user; when data is collected, the client accounts for differences in value sensitivity and for the data distribution, perturbs the original data with the LCF-VDP (local collaborative filtering-value differential privacy) mechanism and uploads the perturbed data to the server. The server mixes the similarities of the two algorithms, selects the top-k mixed similarities and sends them to each user device, where prediction scores are calculated and recommendations are made. Experimental results show that, compared with the conventional perturbation method, the new perturbation mechanism improves recommendation accuracy. The invention is the first hybrid recommendation system framework with client-server cooperative protection, based on the differential privacy definition, for the implicit feedback behavior data of users. The parameters of the invention are highly adjustable: for different data sets, suitable perturbation parameters can be selected according to the data distribution, and the content-based recommendation module can adopt whichever item-similarity extraction algorithm best suits the application scenario.

The differential privacy hybrid recommendation method provided by the invention considers that the recommendation system is deployed on an untrusted server, and considers the differences in value sensitivity and the distribution of the original data. It provides the LCF-VDP (local collaborative filtering-value differential privacy) mechanism and, on that basis, designs a new framework in which client and server cooperate to protect privacy. The framework mixes two recommendation algorithms, so that the shortcomings of each algorithm are compensated and the advantages of each are better exploited.

The technical scheme of the invention fills a technical gap in the industry at home and abroad: the invention provides the first hybrid recommendation system framework with client-server cooperative protection, based on the differential privacy definition, for the implicit feedback behavior data of users. For different data sets, suitable perturbation parameters can be selected according to the data distribution, and the content-based recommendation module can adopt whichever item-similarity extraction algorithm best suits the application scenario.

The technical scheme of the invention solves a technical problem that has long awaited a solution: recommendation algorithms based on traditional cryptography are difficult to deploy in practical application scenarios because of the communication and computation overhead caused by key transmission and encrypted computation over a large amount of user data. The security model based on differential privacy can be realized simply by adding noise, and the degree of privacy protection can be controlled through the privacy budget, which addresses the privacy and security of user data in recommendation system application scenarios.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flowchart of a hybrid recommendation method for differential privacy according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a centralized differential privacy data processing framework provided by embodiments of the present invention;

FIG. 3 is a schematic diagram of a data processing framework for local differential privacy provided by an embodiment of the present invention;

FIG. 4 is a schematic diagram of a framework of a hybrid recommendation system based on differential privacy protection according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a perturbation mechanism provided by an embodiment of the present invention to satisfy a differential privacy definition;

FIG. 6 is a schematic diagram illustrating the comparison of the accuracy of two perturbation mechanisms provided by the embodiment of the present invention under different privacy budgets;

FIG. 7 is a graph illustrating the comparison of the accuracy of two perturbation mechanisms provided by the embodiment of the present invention under different Jaccard similarity weights;

FIG. 8 is a schematic diagram of the accuracy of two perturbation mechanisms provided by the embodiment of the present invention under different topk similarity matrices;

FIG. 9 is a schematic diagram of the accuracy of two perturbation mechanisms provided by the embodiment of the present invention under different topN recommendation results.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In view of the problems in the prior art, the present invention provides a differential privacy hybrid recommendation method, system, medium, device and terminal, which are described in detail below with reference to the accompanying drawings.

This section is an explanatory embodiment that expands on the claims so that those skilled in the art can fully understand how the present invention is implemented.

As shown in fig. 1, the differential privacy hybrid recommendation method provided in the embodiment of the present invention includes the following steps:

S101, designing a data perturbation technology LCF-VDP suitable for recommendation systems; storing the real interaction records at the user's client; taking the user data distribution and the differences in value sensitivity into account, privacy-processing the user data with LCF-VDP and uploading the processed data to the server;

S102, defining the mixing mode at the server; considering the untrusted-server scenario, using a simple weighted combination to mix the similarity extracted from the privacy-processed data with the similarity extracted from the content;

S103, designing the prediction score calculation and recommendation method; calculating item prediction scores locally at the client from the stored real interaction records and the top-k mixed similarities, and recommending the top-N items by prediction score to the user.

As a preferred embodiment, the differential privacy mixed recommendation method provided in the embodiment of the present invention specifically includes the following steps:

1. Privacy-preserving recommendation algorithm

1.1 Privacy-preserving item-based collaborative filtering

The idea of the item-based collaborative filtering algorithm is as follows: similarity between items is established through the users who interacted with them, the n most similar items are selected as the neighbors of each item, and, when recommending to a target user, items similar to those the target user previously liked are recommended. Item-based collaborative filtering is defined here in three aspects: similarity calculation, prediction score calculation and recommendation method.

Suppose there are n users U = {U_1, U_2, ..., U_n} and m items I = {I_1, I_2, ..., I_m}. The historical interaction data of user U_u ∈ U is recorded as X_u, with X_u ⊆ I.

First, the item-item similarity matrix is calculated with the Jaccard similarity. Because the item-based collaborative filtering recommendation algorithm here takes user privacy protection into account, the matrix is calculated from the perturbed user-item interaction matrix:

sim_jac(I_i, I_j) = intersection(I_i, I_j) / union(I_i, I_j)

where intersection(I_i, I_j) is the number of users who interacted with both item i and item j, and union(I_i, I_j) is the number of users who interacted with at least one of item i and item j.
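The Jaccard calculation described above can be sketched as follows. This is an illustrative sketch only: the function name `jaccard_item_similarity` and the list-of-rows matrix layout are assumptions, and pairs for which no user reported an interaction are given similarity 0 by convention.

```python
from itertools import combinations


def jaccard_item_similarity(Y):
    """Item-item Jaccard similarity from a binary user-item matrix.

    Y is a list of rows (one per user); each row is a list of 0/1 flags
    (one per item), e.g. the perturbed data received by the server.
    Returns a dict mapping (i, j) with i < j to the similarity.
    """
    n_items = len(Y[0]) if Y else 0
    # users_of[i] = set of users whose row contains a 1 for item i
    users_of = [{u for u, row in enumerate(Y) if row[i] == 1}
                for i in range(n_items)]
    sim = {}
    for i, j in combinations(range(n_items), 2):
        union = users_of[i] | users_of[j]
        inter = users_of[i] & users_of[j]
        # Assumed convention: similarity 0 when neither item has any user.
        sim[(i, j)] = len(inter) / len(union) if union else 0.0
    return sim


if __name__ == "__main__":
    Y = [[1, 1, 0],
         [1, 0, 0],
         [0, 1, 1]]
    print(jaccard_item_similarity(Y))
```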

Second, the prediction score of user U_u for item I_i is calculated as the weighted sum of the user's interaction records on the neighbor items:

where sim(I_i, I_j) denotes the similarity between item i and item j, and X_u denotes the historical interaction record of user u for item j.
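A weighted-sum prediction score of this kind can be sketched as below. The exact formula is not reproduced in this text, so the sketch assumes an unnormalized sum of similarities over the neighbor items the user has interacted with; the function name `predict_score` and the nested-dict similarity layout are likewise hypothetical.

```python
def predict_score(item, user_items, sim):
    """Score `item` for one user as a weighted sum over neighbor items.

    user_items: set of item ids the user truly interacted with (x_uj = 1).
    sim: dict mapping item id -> dict of neighbor id -> similarity.
    Assumption: no normalization term is applied.
    """
    neighbors = sim.get(item, {})
    return sum(s for j, s in neighbors.items()
               if j in user_items and j != item)


if __name__ == "__main__":
    sim = {"a": {"b": 0.5, "c": 0.2, "d": 0.9}}
    # The user interacted with b and c, so only those neighbors contribute.
    print(predict_score("a", {"b", "c"}, sim))
```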

1.2 Content-based recommendation algorithm

The idea of the content-based recommendation algorithm is to infer the user's preferences from the items the user liked over a past period of time and to recommend similar items to the user. The present invention calculates the item-item similarity from tags extracted from the items:

sim_tag(I_i, I_j) = intersection_tags(I_i, I_j) / union_tags(I_i, I_j)

where intersection_tags(I_i, I_j) is the number of tags in the intersection of the tag sets of item i and item j, and union_tags(I_i, I_j) is the number of tags in the union of the tag sets of item i and item j.
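The tag-based similarity uses only item metadata, so it can be computed without any user data. A minimal sketch, assuming each item's tags are available as a collection of strings (the function name `tag_similarity` and the example tags are hypothetical):

```python
def tag_similarity(tags_i, tags_j):
    """Jaccard similarity between the tag sets of two items."""
    a, b = set(tags_i), set(tags_j)
    union = a | b
    # Assumed convention: two items without any tags have similarity 0.
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    print(tag_similarity(["action", "sci-fi"], ["sci-fi", "drama"]))
```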

1.3 Privacy-preserving hybrid recommendation algorithm

Item-based collaborative filtering can recommend new information and can discover latent interests that the target users themselves have not yet noticed. As a form of collective intelligence it effectively shares the experience of other users and avoids the incompleteness and inaccuracy of content analysis. However, it needs a large amount of historical user behavior data to predict the future behaviors and interests of users, that is, it suffers from the cold-start problem. The content-based recommendation algorithm does not need a large number of users, but it cannot mine the latent interests of users. The invention therefore mixes the two recommendation algorithms, which offsets the shortcomings of each single algorithm and better exploits the advantages of both.

There are many ways to mix similarities. The invention uses a simple weighted combination, in the following form:

mixed_sim(I_i, I_j) = a*sim_jac(I_i, I_j) + b*sim_tag(I_i, I_j) (4)

where a is the weight of sim_jac(I_i, I_j) and b is the weight of sim_tag(I_i, I_j). When a = 1 and b = 0, the hybrid algorithm degenerates to the item-based collaborative filtering algorithm. Note that the item-item similarity extracted from the implicit user data is calculated from the perturbed data, that is, the data after privacy processing.
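Equation (4) can be sketched as a small function over two similarity tables. The dict-of-pairs layout, the function name `mixed_similarity` and the default weights a = b = 0.5 are assumptions made for illustration; the source does not prescribe particular weights.

```python
def mixed_similarity(sim_jac, sim_tag, a=0.5, b=0.5):
    """a * sim_jac + b * sim_tag for every item pair present in either input.

    sim_jac, sim_tag: dicts mapping an item pair (i, j) to a similarity.
    A pair missing from one input is treated as similarity 0 there.
    """
    pairs = set(sim_jac) | set(sim_tag)
    return {pair: a * sim_jac.get(pair, 0.0) + b * sim_tag.get(pair, 0.0)
            for pair in pairs}


if __name__ == "__main__":
    sim_jac = {(0, 1): 0.4}                # item 2 is new: no interactions yet
    sim_tag = {(0, 1): 0.8, (0, 2): 1.0}   # but its tags are known
    print(mixed_similarity(sim_jac, sim_tag))
```

In this example the pair (0, 2) still receives a nonzero similarity from the tags alone, which is how the hybrid form lets a newly added item be recommended.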

2. Privacy definition

2.1 Differential privacy

Differential privacy is a privacy protection technique based on data distortion. The data is distorted by adding noise to the query or analysis result, which ensures that the influence of inserting or deleting one record in a data set on the output of any query is bounded by a given privacy budget, thereby achieving privacy protection. Differential privacy is formally defined as follows:

Definition 1 (ε-differential privacy). For any pair of adjacent data sets D and D' that differ in only one record, and any possible set of query results S, a privacy protection mechanism M satisfies ε-differential privacy if its query results on the adjacent data sets satisfy equation (5):

Pr[M(D) ∈ S] ≤ e^ε * Pr[M(D') ∈ S] (5)

where ε is the privacy budget, which balances privacy against data availability. The smaller ε is, the better the privacy and the worse the availability of the data, and vice versa. Fig. 2 shows the data processing framework of the centralized (traditional) differential privacy protection technique.

2.2 Local differential privacy

Local differential privacy is a model that fully considers the possibility that the data collector steals or discloses user privacy during data collection. In this model, each user first privacy-processes the data and then sends the processed data to the data collector, which computes statistics over the collected data to obtain a useful analysis result. Local differential privacy is formally defined as follows:

Definition 2 (ε-local differential privacy). Given n users, each holding one record, and a privacy algorithm M with domain Dom(M) and range Ran(M): if, for any two records X and X' (both in Dom(M)) and any output result y in Ran(M), the probabilities that M produces y satisfy equation (6), then the algorithm M satisfies ε-local differential privacy.

Pr[M(X) = y] ≤ e^ε * Pr[M(X') = y] (6)

where ε is the privacy budget, which balances privacy against data availability. The smaller ε is, the better the privacy and the worse the availability of the data, and vice versa. Fig. 3 shows the data processing framework of the local differential privacy protection technique.
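As a worked check of the local differential privacy definition, the sketch below evaluates the conventional (symmetric) randomized response on a single bit. It is illustrative only: the function names are hypothetical, and the keep probability e^ε / (1 + e^ε) is the standard choice for binary randomized response rather than something specified in this document.

```python
import math


def rr_keep_probability(epsilon):
    """Probability of reporting the true bit in symmetric randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def worst_case_ratio(p_keep):
    """Largest Pr[M(x) = y] / Pr[M(x') = y] over inputs x, x' and outputs y."""
    # channel[x][y] = probability of reporting y when the true bit is x
    channel = [[p_keep, 1.0 - p_keep],
               [1.0 - p_keep, p_keep]]
    return max(channel[x][y] / channel[xp][y]
               for x in (0, 1) for xp in (0, 1) for y in (0, 1))


if __name__ == "__main__":
    eps = 1.0
    p = rr_keep_probability(eps)
    # The worst-case ratio should match e^eps up to rounding.
    print(worst_case_ratio(p), math.exp(eps))
```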

3. System model definition

Considering the scenario of an untrusted server, the framework of the client-server cooperative protection recommendation system based on differential privacy provided by the invention is as follows:

a) Privacy processing of client user data. The user's client stores the real interaction data. To ensure the privacy and security of the user's data, the interaction data is perturbed with the LCF-VDP technology and then uploaded to the server.

b) The server calculates and mixes the similarities. The server receives the perturbed interaction data and calculates the item-item similarity from the perturbed data of all users. To address the cold-start problem, the server stores information about the items, selects the keywords that best represent each item as its tags, and then calculates the similarity between items from the tags, which ensures that a newly added item can be recommended. Finally, the server combines the item-item similarity matrices generated in these two ways by weighting into a new item-item similarity matrix, selects the top-k entries from it and sends them to the client. Because the similarity calculated from tags extracted from the items themselves is an inherent attribute of the items and does not involve user interaction behavior, calculating the similarity from tags at the server does not reveal user privacy.

c) The client calculates the prediction scores and makes recommendations. Traditionally, the recommendation calculation is handed to the server. However, recommendations are predictions of the user's future behavior and can be used to infer sensitive information about the user. The invention treats this as a possible privacy leak, so the proposed method calculates the prediction scores and makes recommendations locally at the user's client. The client receives the mixed similarity data and calculates the prediction scores from the locally stored real interaction data. Fig. 4 shows the hybrid recommendation system framework with client-server cooperative protection, based on the differential privacy definition, for implicit feedback behavior data proposed by the present invention.
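The division of work in b) and c) can be sketched as two small functions: the server keeps only the top-k neighbors of each item, and the client ranks items using its local, unperturbed interactions. This is a sketch under assumptions, not the patented implementation: the function names, the nested-dict layout, the unnormalized score, and the choice to skip items the user has already interacted with are all hypothetical.

```python
import heapq


def top_k_neighbors(sim_row, k):
    """Server side: keep the k most similar neighbors of one item.

    sim_row: dict mapping neighbor id -> mixed similarity.
    """
    return dict(heapq.nlargest(k, sim_row.items(), key=lambda kv: kv[1]))


def recommend_top_n(user_items, topk_sim, n):
    """Client side: score items from local true interactions, return the best n.

    user_items: set of item ids the user truly interacted with (kept local).
    topk_sim: dict item id -> dict neighbor id -> similarity, from the server.
    """
    scores = {}
    for item, neighbors in topk_sim.items():
        if item in user_items:
            continue  # assumption: do not recommend already-seen items
        scores[item] = sum(s for j, s in neighbors.items() if j in user_items)
    best = heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])
    return [item for item, _ in best]


if __name__ == "__main__":
    full = {
        "a": {"b": 0.9, "c": 0.1, "d": 0.5},
        "b": {"a": 0.9, "c": 0.3, "d": 0.2},
        "c": {"a": 0.1, "b": 0.3, "d": 0.8},
        "d": {"a": 0.5, "b": 0.2, "c": 0.8},
    }
    topk = {item: top_k_neighbors(row, 2) for item, row in full.items()}
    print(recommend_top_n({"a"}, topk, 2))
```

Only the similarity table crosses the network in this arrangement; the true interaction set never leaves the client.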

4. Specific methods

4.1 data perturbation

The conventional randomized response technique simply responds with the true value with probability p and with the false value with probability 1-p, regardless of the distribution of the original data. In other words, a 0 and a 1 in the original data are answered truthfully with the same probability. In a recommendation system, however, the user-item interaction matrix is sparse, and a 1 in that matrix is far more sensitive than a 0. The invention therefore designs a new perturbation technique suited to recommendation systems, named LCF-VDP, which takes data distribution and value sensitivity into account so that the probability of a 0 being answered truthfully differs from the probability of a 1 being answered truthfully.

The definition in the recommendation system scenario is as follows. Assume there are m items, and let X = [x_1, x_2, ..., x_m] and Y = [y_1, y_2, ..., y_m] denote a user's real interaction data and perturbed interaction data, respectively, where x_i and y_i indicate whether the user's real and perturbed interaction data contain item I_i. According to the definition of differential privacy in Section 2.1:

where

we have:

A new perturbation mechanism is designed as follows. When the value is 1, the original value is returned with probability p and 0 is returned with probability 1-p. When the value is 0, 1 is returned with probability q and the original value is returned with probability 1-q. It is easy to verify that this mechanism satisfies ε-differential privacy when p/q ≤ e^ε, where p ∈ [1/2, 1]. When p = 1-q, it reduces to the conventional perturbation mechanism. In the extreme case p = 1, privacy protection is applied only to data with value 1: if a perturbed value of 0 is received, the original value is certainly 0, so the value 0 receives no privacy protection; if a perturbed value of 1 is received, the original value may still be 0. The degree of privacy protection can be controlled through p and q. Fig. 5 shows the conventional perturbation method (a) and the LCF-VDP mechanism (b).
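As an illustration only, the perturbation rule described above can be sketched in Python with NumPy. The function names `lcf_vdp_perturb` and `min_epsilon` are hypothetical and do not appear in the original text. The sketch assumes a binary interaction vector and applies the two probabilities p and q entry by entry.

```python
import numpy as np

def lcf_vdp_perturb(x, p, q, rng=None):
    """Perturb a 0/1 interaction vector with value-dependent probabilities.

    A true 1 is reported as 1 with probability p (as 0 with probability 1 - p).
    A true 0 is reported as 1 with probability q (as 0 with probability 1 - q).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=np.int8)
    # Probability that the reported bit is 1, chosen per entry from its true value.
    prob_one = np.where(x == 1, p, q)
    # rng.random draws from [0, 1), so prob 1.0 always reports 1 and prob 0.0 never does.
    return (rng.random(x.shape) < prob_one).astype(np.int8)

def min_epsilon(p, q):
    """Smallest epsilon for which the stated condition p / q <= e^epsilon holds."""
    return float(np.log(p / q))

# Example: the extreme case p = 1 with q = e^(-1), i.e. epsilon = 1.
x = np.array([1, 0, 0, 1, 0, 0, 0, 0])
y = lcf_vdp_perturb(x, p=1.0, q=np.exp(-1.0), rng=np.random.default_rng(0))
```

With p = 1 every true 1 is reported as 1 and only the 0 entries are randomized, which corresponds to the extreme case discussed in the text.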

4.2 similarity calculation

a) Compute the similarity sim_jac(I_i, I_j) from the perturbed user-item interaction matrix:

sim_jac(I_i, I_j) = intersection(Y_i, Y_j) / union(Y_i, Y_j)

where intersection(Y_i, Y_j) is the number of users who, according to the perturbed user-item matrix, interacted with both item i and item j, and union(Y_i, Y_j) is the number of users who, according to the perturbed user-item matrix, interacted with at least one of item i and item j.
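The Jaccard computation over the perturbed matrix can be sketched as follows. This is a minimal example that assumes the server holds the perturbed data as a users-by-items 0/1 NumPy array; the function name `jaccard_item_similarity` is hypothetical.

```python
import numpy as np

def jaccard_item_similarity(Y):
    """Item-item Jaccard similarity from a (users x items) 0/1 matrix Y."""
    Y = np.asarray(Y, dtype=np.int64)
    inter = Y.T @ Y                 # inter[i, j]: users who interacted with both i and j
    counts = np.diag(inter)         # counts[i]: users who interacted with item i
    union = counts[:, None] + counts[None, :] - inter
    sim = np.zeros(inter.shape, dtype=float)
    # Leave similarity at 0 where no user interacted with either item.
    np.divide(inter, union, out=sim, where=union > 0)
    return sim

# Three users, three items (already perturbed).
Y = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1]])
sim_jac = jaccard_item_similarity(Y)
```

In this toy matrix, items 0 and 1 share one user out of three who touched either item, so their similarity is 1/3.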

b) The content-based recommendation algorithm is described in detail in Section 1.2. The similarity sim_tag(I_i, I_j) is computed from the item tags.
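A minimal sketch of a tag-based similarity, assuming it takes the same intersection-over-union form as the interaction-based similarity (the claims describe it through the intersection and union of the two items' tags). The function name and the example tags are hypothetical.

```python
def tag_similarity(tags_i, tags_j):
    """Size of the tag intersection divided by the size of the tag union."""
    a, b = set(tags_i), set(tags_j)
    union = a | b
    # Two items without any tags are treated as having similarity 0 (assumption).
    return len(a & b) / len(union) if union else 0.0

# Example with movie genres used as tags.
s = tag_similarity({"Action", "Comedy"}, {"Comedy", "Drama"})
```

Because this similarity uses only item attributes, it can be computed for a newly added item that has no interaction records yet.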

c) The hybrid recommendation algorithm without privacy protection is described in detail in Section 1.3. The similarities are combined by weighting as follows:

sim(I_i, I_j) = a*sim_jac(I_i, I_j) + b*sim_tag(I_i, I_j)  (9)
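The weighted combination in formula (9), followed by a top-k selection, might look like the sketch below. It assumes that "top-k" means the k most similar other items for each item and that both similarity matrices are square NumPy arrays of the same shape; the function name `mix_and_select_topk` is hypothetical.

```python
import numpy as np

def mix_and_select_topk(sim_jac, sim_tag, a, b, k):
    """Return, per item, the indices and mixed similarities of its k nearest items."""
    mixed = a * np.asarray(sim_jac, dtype=float) + b * np.asarray(sim_tag, dtype=float)
    np.fill_diagonal(mixed, -np.inf)      # an item is not its own neighbour
    k = min(k, mixed.shape[0] - 1)
    neighbours = np.argsort(-mixed, axis=1, kind="stable")[:, :k]
    weights = np.take_along_axis(mixed, neighbours, axis=1)
    return neighbours, weights

sim_jac = np.array([[1.0, 0.2, 0.6], [0.2, 1.0, 0.4], [0.6, 0.4, 1.0]])
sim_tag = np.array([[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
neighbours, weights = mix_and_select_topk(sim_jac, sim_tag, a=0.5, b=0.5, k=1)
```

Setting a = 1 and b = 0 reduces the mixed similarity to the interaction-based Jaccard similarity alone.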

4.3 item prediction score calculation and recommendation

The calculation of item prediction scores is described in detail in Section 1.1. The same form is used here, except that sim(I_i, I_j) is now the mixed similarity.

Finally, the top-N items by prediction score are recommended to the user.
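The client-side step can be sketched as below. Two points are assumptions rather than statements from the original text: the score is taken as an unnormalized weighted sum of the user's real interactions with each item's top-k neighbours, and items the user has already interacted with are excluded from the recommendation list. The function name `recommend_top_n` is hypothetical.

```python
import numpy as np

def recommend_top_n(x_u, neighbours, weights, n):
    """Score every item locally from the user's real 0/1 vector and return the top n."""
    x_u = np.asarray(x_u, dtype=float)
    neighbours = np.asarray(neighbours, dtype=np.int64)
    weights = np.asarray(weights, dtype=float)
    # Weighted sum of the user's interactions with each item's neighbours.
    scores = (weights * x_u[neighbours]).sum(axis=1)
    scores[x_u > 0] = -np.inf             # assumption: skip items already interacted with
    order = np.argsort(-scores, kind="stable")
    return [int(i) for i in order[:n] if np.isfinite(scores[i])]

# The real vector x_u never leaves the client; only the similarities are received.
top = recommend_top_n(
    x_u=[1, 0, 0],
    neighbours=[[1, 2], [0, 2], [0, 1]],
    weights=[[0.6, 0.3], [0.6, 0.2], [0.3, 0.2]],
    n=2,
)
```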

The differential privacy hybrid recommendation system provided by the embodiment of the invention comprises:

a hybrid recommendation construction module, configured to provide a differential-privacy-based hybrid recommendation framework with client-server cooperative protection for the untrusted-server scenario;

and a client-side score calculation and recommendation module, configured to receive the mixed similarity data, compute prediction scores from the locally stored real interaction data, and recommend the top-N items by prediction score to the user.

The embodiment of the invention has shown positive effects during research, development, and use, and has clear advantages over the prior art. These are described below with reference to the data and figures from the experiments.

1. Experimental methods

1.1 data set

The invention uses the MovieLens-100k dataset, which is widely used in the field of recommendation systems. It contains 943 users, 1682 movies, and 100,000 ratings (1-5), together with movie information and user attribute information.

Because the recommendation algorithm is based on users' implicit feedback behavior data and focuses on protecting users' historical behavior rather than the rating values, the ratings are converted into binary behavior data, where 1 means the user interacted with the item and 0 means no interaction. In addition, the movie genres in the dataset are processed as tags in preparation for the subsequent extraction of item-item similarities.
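The conversion to binary behavior data can be illustrated with a short sketch. It assumes the ratings are available as (user id, item id, rating) triples with ids starting at 1, as in the MovieLens-100k rating file; the function name `binarize_ratings` and the sample triples are hypothetical.

```python
import numpy as np

def binarize_ratings(triples, n_users, n_items):
    """Build a (users x items) 0/1 matrix: 1 if the user rated the item, else 0."""
    X = np.zeros((n_users, n_items), dtype=np.int8)
    for user_id, item_id, _rating in triples:
        # The rating value itself is discarded; only the interaction is kept.
        X[user_id - 1, item_id - 1] = 1
    return X

# Tiny made-up sample; the real dataset would use n_users=943, n_items=1682.
X = binarize_ratings([(1, 2, 5), (1, 3, 1), (2, 1, 4)], n_users=2, n_items=3)
```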

1.2 comparative method

In the untrusted-server scenario, there is little research on item-based collaborative filtering recommendation algorithms with differential privacy protection. To evaluate the performance of the new perturbation method, which takes different value sensitivities and data distributions into account, the invention compares it with the conventional perturbation method, which does not distinguish between values, used in the document "Guo T, Luo J, Dong K, et al. Locally differentially private item-based collaborative filtering [J]. Information Sciences, 2019, 502". The performance of the method is examined on both a single recommendation algorithm and the hybrid recommendation algorithm.

1.3 evaluation index

For comparison with the perturbation method in the document "Guo T, Luo J, Dong K, et al. Locally differentially private item-based collaborative filtering [J]. Information Sciences, 2019, 502", the evaluation index for recommendation accuracy defined in that document is used.

Given a target user U_u ∈ U, a recommendation algorithm without privacy protection always recommends the same N items, denoted R_u. A recommendation algorithm with differential privacy protection, however, may recommend any N items to the target user, because its prediction scores deviate from those computed without privacy protection. Based on this deviation, the recommendation accuracy is defined as follows:

where the list compared with R_u is the recommended item list generated by the recommendation algorithm with differential privacy protection, and len() is a function that returns the size of a list.
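The exact accuracy formula is not recoverable from the text here. The sketch below shows one plausible overlap-based reading, stated as an assumption: for each user, the fraction of the non-private list R_u that also appears in the differentially private list, averaged over users. The function name and the sample lists are hypothetical.

```python
def recommendation_accuracy(baseline, private):
    """Average overlap between non-private and private recommendation lists.

    baseline, private: dicts mapping a user id to a list of recommended item ids.
    """
    ratios = []
    for user, r_u in baseline.items():
        if len(r_u) == 0:
            continue  # nothing to compare for this user
        r_private = private.get(user, [])
        ratios.append(len(set(r_u) & set(r_private)) / len(r_u))
    return sum(ratios) / len(ratios) if ratios else 0.0

acc = recommendation_accuracy(
    baseline={1: [1, 2, 3, 4], 2: [5, 6]},
    private={1: [1, 2, 9, 8], 2: [5, 6]},
)
```

Under this reading, an accuracy of 1 means the private algorithm reproduces the non-private recommendation lists exactly.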

1.4 parameter settings

For the conventional mechanism, the experiment sets the parameter as in the document "Guo T, Luo J, Dong K, et al. Locally differentially private item-based collaborative filtering [J]. Information Sciences, 2019, 502": p = e^ε/(e^ε + 1). In theory, p of the LCF-VDP mechanism can take any value in [1/2, 1], so the best p and q can be found for each dataset according to its data distribution. Here the invention considers the extreme case p = 1, with the corresponding q = p/e^ε, which satisfies differential privacy.
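The parameter choices above can be checked numerically with a few lines of Python. The function names are hypothetical; the formulas are the ones stated in the text, with q = 1 - p for the conventional mechanism.

```python
import math

def conventional_params(eps):
    """Conventional randomized response: p = e^eps / (e^eps + 1), q = 1 - p."""
    p = math.exp(eps) / (math.exp(eps) + 1.0)
    return p, 1.0 - p

def lcf_vdp_extreme_params(eps):
    """Extreme LCF-VDP setting from the text: p = 1, q = p / e^eps."""
    return 1.0, math.exp(-eps)

for eps in (0.5, 1.0, 2.0):
    p_c, q_c = conventional_params(eps)
    p_v, q_v = lcf_vdp_extreme_params(eps)
    # In both settings the ratio p / q equals e^eps.
    print(f"eps={eps}: conventional p={p_c:.3f}, q={q_c:.3f}; LCF-VDP p={p_v:.3f}, q={q_v:.3f}")
```

For ε = 1, for example, the conventional mechanism keeps each bit with probability of about 0.731, while the extreme LCF-VDP setting keeps every 1 and flips a 0 to 1 with probability of about 0.368.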

2. Results and analysis of the experiments

Experiment 1: effect of different privacy budget on recommendation results

This experiment investigates the impact of the privacy budget on the recommendation results. To verify the impact of different privacy budgets on the utility of the perturbation mechanism, N = 10, k = 20, a = 1, and b = 0 are set on the MovieLens-100k dataset, so that the hybrid recommendation algorithm degenerates into the item-based collaborative filtering recommendation algorithm. The recommendation accuracy of the algorithm perturbed by the LCF-VDP mechanism is computed with ε = 1, k = 20, and N = 10, and is compared with that of the same algorithm randomized by the conventional perturbation method. The results of the comparison are shown in Fig. 6.

Fig. 6 shows the accuracy of the two perturbation mechanisms under different privacy budgets. As the privacy budget ε increases, the recommendation accuracy of both algorithms increases, which indicates better data availability. This agrees with the theory of differential privacy: the larger the allowed ratio between the probabilities that the perturbation mechanism outputs the same result, the less noise is added and the higher the data availability. In practice, an appropriate privacy budget can be chosen according to the accuracy requirements of the recommendation system.

Experiment 2: influence of disturbance mechanism on recommendation result under different similarity weights

This experiment examines the utility of the perturbation mechanisms in the hybrid algorithm under arbitrary weighted combinations of the two similarities. The invention sets ε = 1, k = 20, and N = 10, and gradually increases the weight of the Jaccard similarity from 0.1 to 1 while the weight of the tag-based item-item similarity decreases correspondingly. The accuracy obtained on data perturbed by the LCF-VDP mechanism is always higher than that obtained on data randomized by the conventional perturbation method. This shows that the LCF-VDP mechanism, which takes the differences in value sensitivity and the data distribution into account, has better utility. The accuracy of the two perturbation mechanisms under different Jaccard similarity weights is shown in Fig. 7.

Experiment 3: influence of different similarity numbers k on recommendation results

To verify the influence of the number of similarities k on the recommendation results, the experiment uses the MovieLens-100k dataset with privacy budget ε = 1, a Jaccard similarity weight of 0.5, and a fixed recommendation list size N = 10. The number of similarities k is set to each value in {10, 20, 30, 40, 50}, the recommendation accuracy after perturbation by the LCF-VDP mechanism is computed, and it is compared with the conventional perturbation mechanism in the same experimental environment. The results are shown in Fig. 8.

The experimental results show that a larger k gives better results. Because of the influence of the weight ratio and the hyperparameter N, the extreme setting of the LCF-VDP mechanism, p = 1, is not necessarily optimal. However, because the overall user-item matrix is sparse and the 0 entries far outnumber the 1 entries, most of the privacy budget is allocated to the 0 entries, and this choice of probabilities performs better than the conventional perturbation. It is worth noting that the privacy protection recommendation system framework designed by the invention computes the prediction scores and makes recommendations locally at the user's client, which requires selecting the top-k mixed similarities, so an appropriate value of k strongly affects the performance of the algorithm.

Experiment 4: influence of different recommendation list number N on recommendation result

To verify the influence of the recommendation list size N on the recommendation results, the experiment uses the MovieLens-100k dataset with privacy budget ε = 1, a Jaccard similarity weight of 0.5, and a fixed number of similarities k = 50. The recommendation list size N is set to each value in {10, 20, 30, 40, 50}, the recommendation accuracy after perturbation by the LCF-VDP mechanism is computed, and it is compared with the conventional perturbation mechanism in the same experimental environment. The results are shown in Fig. 9.

The experimental results show that a larger N gives better results, because a recommendation algorithm with differential privacy protection may recommend any N items. The LCF-VDP mechanism is nevertheless always more effective than the conventional perturbation method, which further verifies that the proposed LCF-VDP mechanism, by taking the differences in value sensitivity and the data distribution into account, achieves higher data availability than the conventional perturbation method.

It should be noted that embodiments of the present invention can be realized in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD-or DVD-ROM, programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier, for example. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, etc., or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.

The above description is only for the purpose of illustrating the embodiments of the present invention, and the scope of the present invention should not be limited thereto, and any modifications, equivalents and improvements made by those skilled in the art within the technical scope of the present invention as disclosed in the present invention should be covered by the scope of the present invention.

Claims

1. A differential privacy hybrid recommendation method based on an untrusted server, characterized by comprising the following steps: constructing a new privacy protection recommendation system framework; using the user's implicit feedback behavior data and, at the client, taking different value sensitivities and the data distribution into account, perturbing the original data with the LCF-VDP mechanism and uploading the perturbed data to the server; mixing, at the server, the similarities of the two algorithms and selecting the top-k mixed similarities to send to each user device; and performing prediction score calculation and recommendation on each user device.

2. The untrusted server based differential privacy hybrid recommendation method according to claim 1, wherein the untrusted server based differential privacy hybrid recommendation method comprises the steps of:

designing a data perturbation technology LCF-VDP suitable for a recommendation system, storing a real interaction record at a client, carrying out user data privacy processing by adopting the LCF-VDP and uploading the processed data to a server;

defining a mixing mode of a server side, and mixing the similarity extracted from the data after privacy processing with the similarity extracted from the content by adopting a simple weighted combination mode;

designing a prediction score calculation and recommendation method; and computing the item prediction scores locally at the client from the stored real interaction records and the top-k mixed similarities, and recommending the top-N items by prediction score to the user.

3. The untrusted server based mixed recommendation method of differential privacy of claim 2, wherein the non-privacy preserving item-based collaborative filtering in step one comprises:

defining an adopted project-based collaborative filtering method from three aspects of similarity calculation, prediction score calculation and a recommendation method;

when there are n users U = {U_1, U_2, ..., U_n} and m items I = {I_1, I_2, ..., I_m}, the historical interaction data of a user U_u ∈ U is denoted X_u, with X_u ⊆ I;

calculating an item-item similarity matrix from the user-item interaction matrix by using the Jaccard similarity:

sim_jac(I_i, I_j) = intersection(I_i, I_j) / union(I_i, I_j);

wherein intersection(I_i, I_j) represents the number of users who interacted with both item i and item j, and union(I_i, I_j) represents the number of users who interacted with at least one of item i and item j;

computing the prediction score of user U_u for item I_i as a weighted sum of the interaction records of neighbor items;

wherein sim(I_i, I_j) indicates the similarity of item i and item j, and X_u represents the historical interaction record of user u on item j;

the calculating of the item-item similarity according to the tags extracted from the items comprises:

sim_tag(I_i, I_j) = intersection_tags(I_i, I_j) / union_tags(I_i, I_j);

wherein intersection_tags(I_i, I_j) represents the number of tags in the intersection of the tags of item i and item j, and union_tags(I_i, I_j) represents the number of tags in the union of the tags of item i and item j;

the similarities are mixed by simple weighted combination in the following form:

mixed_sim(I_i, I_j) = a*sim_jac(I_i, I_j) + b*sim_tag(I_i, I_j);

wherein a is the weight of sim_jac(I_i, I_j) and b is the weight of sim_tag(I_i, I_j); when a = 1 and b = 0, the hybrid algorithm degenerates into the item-based collaborative filtering algorithm.

4. The untrusted server based differential privacy hybrid recommendation method according to claim 2, wherein the formalization of the differential privacy in the second step is defined as follows:

ε-differential privacy: for any pair of adjacent datasets D and D' that differ in only one record and any possible query result S, if a privacy protection mechanism M makes the query results on the adjacent datasets satisfy the following formula, then the algorithm M satisfies ε-differential privacy:

Pr[M(D) ∈ S] ≤ e^ε * Pr[M(D') ∈ S];

wherein ε is the privacy budget, used to balance privacy against data availability; the smaller ε is, the better the privacy and the worse the data availability, and vice versa;

the formalization of the local differential privacy is defined as follows:

local differential privacy: given n users, each holding one record, and a privacy algorithm M with domain Dom(M) and range Ran(M), if the probabilities that M produces the same output result t* (t* in Ran(M)) on any two records X and X' (both in Dom(M)) satisfy the following formula, then the algorithm M satisfies ε-local differential privacy:

Pr[M(X) = t*] ≤ e^ε * Pr[M(X') = t*];

wherein ε is the privacy budget, used to balance privacy against data availability; the smaller ε is, the better the privacy and the worse the data availability, and vice versa.

5. The untrusted server based differential privacy hybrid recommendation method according to claim 2, wherein the client-server based differential privacy collaborative protection recommendation system framework in the second step is as follows:

(1) Privacy processing of user data at the client: the user's client stores the real interaction data; to protect the user's data privacy, the interaction data is perturbed with the LCF-VDP technique and the perturbed interaction data is uploaded to the server;

(2) The server computes and mixes the similarities: the server receives the perturbed interaction data and computes the item-item similarity from the perturbed data of all users; the server stores item information, selects the keywords that best represent each item as its tags, and computes the similarity between items from the tags; the server combines the item-item similarity matrices generated in these two ways by weighting into a new item-item similarity matrix, selects the top-k similarities from it, and sends them to the client;

(3) The client computes the prediction scores and makes recommendations: the prediction scores are computed and the recommendations are made locally at the user's client; the client receives the mixed similarity data and computes the prediction scores from the locally stored real interaction data.

6. The untrusted server based differential privacy hybrid recommendation method according to claim 2, wherein designing the data perturbation technique LCF-VDP suitable for the recommendation system, calculating the similarities, and performing the item prediction score calculation and recommendation in step three comprise:

(1) Data perturbation

using the designed data perturbation technique LCF-VDP suitable for the recommendation system, which takes data distribution and value sensitivity into account, so that the probability of a 0 being answered truthfully is not equal to the probability of a 1 being answered truthfully;

the definition in the recommendation system scenario is: when there are m items, let X = [x_1, x_2, ..., x_m] and Y = [y_1, y_2, ..., y_m] denote a user's real interaction data and perturbed interaction data, respectively, where x_i and y_i indicate whether the user's real and perturbed interaction data contain item I_i; according to the definition of differential privacy:

where

then:

designing a new perturbation mechanism: when the value is 1, the original value is returned with probability p and 0 is returned with probability 1-p; when the value is 0, 1 is returned with probability q and the original value is returned with probability 1-q; when p/q ≤ e^ε, the mechanism satisfies ε-differential privacy, wherein p ∈ [1/2, 1]; when p = 1-q, it is the conventional perturbation mechanism; in the extreme case p = 1, privacy protection is applied only to data with value 1: when a perturbed value of 0 is received, the original value is also 0, and the value 0 receives no privacy protection, whereas when a perturbed value of 1 is received, the original value may still be 0; the degree of privacy protection is controlled through p and q;

(2) Similarity calculation

Calculating the similarity sim_jac(I_i, I_j) from the perturbed user-item interaction matrix:

sim_jac(I_i, I_j) = intersection(Y_i, Y_j) / union(Y_i, Y_j);

wherein intersection(Y_i, Y_j) represents the number of users who, according to the perturbed user-item matrix, interacted with both item i and item j, and union(Y_i, Y_j) represents the number of users who, according to the perturbed user-item matrix, interacted with at least one of item i and item j;

sim(I_i, I_j) = a*sim_jac(I_i, I_j) + b*sim_tag(I_i, I_j);

(3) Item prediction score calculation and recommendation

The item prediction score takes the following form:

wherein sim(I_i, I_j) is the mixed similarity.

7. A differential privacy hybrid recommendation system applying the untrusted server based differential privacy hybrid recommendation method according to any one of claims 1 to 6, characterized in that the differential privacy hybrid recommendation system comprises:

8. A computer arrangement, characterized in that the computer arrangement comprises a memory and a processor, the memory storing a computer program, which, when executed by the processor, causes the processor to carry out the steps of the untrusted server based differential privacy mixing recommendation method according to any one of claims 1 to 6.

9. A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the untrusted server based differential privacy mixing recommendation method according to any one of claims 1 to 6.

10. An information data processing terminal characterized by being configured to implement the differential privacy hybrid recommendation system according to claim 7.