CN107679415A - Model-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction - Google Patents

Model-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction

Info

Publication number
CN107679415A
CN107679415A
Authority
CN
China
Prior art keywords
qos
user
service
value
privacy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710875789.9A
Other languages
Chinese (zh)
Inventor
毛睿
李荣华
陆敏华
王毅
罗秋明
商烁
刘刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN201710875789.9A priority Critical patent/CN107679415A/en
Priority to PCT/CN2017/113485 priority patent/WO2019056572A1/en
Publication of CN107679415A publication Critical patent/CN107679415A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6263Protecting personal data, e.g. for financial or medical purposes during internet communication, e.g. revealing personal data from cookies
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5003Managing SLA; Interaction between SLA and QoS
    • H04L41/5009Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/04Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/142Network analysis or design using statistical or mathematical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a model-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction, comprising the following steps: first, data collection: each user locally collects quality-of-service (QoS) values; second, data masquerading: the QoS values are disguised; third, model-based collaborative filtering is applied to the masqueraded QoS values; fourth, prediction: results are predicted from the collaboratively filtered QoS values. The invention introduces differential privacy into the collaborative Web service QoS prediction framework for the first time, allowing users to obtain maximum privacy protection while preserving the availability of the data. Experimental results show that the method provides secure and accurate collaborative Web service QoS prediction, and that the model-based collaborative filtering method has an advantage in capturing the latent structure of QoS data.

Description

Model-based collaborative filtering method for privacy protection collaborative Web service quality prediction
Technical Field
The invention belongs to the field of computers, relates to a collaborative filtering method for privacy protection, and in particular to a model-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction.
Background
Quality of service (QoS) is widely used to describe the non-functional characteristics of Web services. QoS-based selection, composition, and recommendation of Web services have been widely discussed in recent papers. The premise of these methods is that accurate QoS values of the Web services are always available, but obtaining an accurate QoS value is not an easy task. On the one hand, the QoS values published by service providers or third-party communities are inaccurate for service users because they are affected by the uncertain Internet environment. On the other hand, it is impractical for a service user to directly evaluate the QoS of all available services due to time, cost and other resource constraints. To address this issue, a promising approach is personalized collaborative Web service QoS prediction. The basic idea is that users with similar characteristics tend to observe similar QoS values for the same service; therefore, when the QoS value a particular user would observe for a Web service needs to be predicted, the values observed by users with similar characteristics can be substituted.
In this way, different users are also typically given different QoS prediction values for the same service, and the final prediction value is actually dependent on their particular context. Based on these provided QoS values, various techniques have been employed to improve quality, particularly the accuracy of the predictions.
Collaborative Web services QoS prediction has become an important tool for generating accurate personalized QoS. Although much effort has been made in research to improve the accuracy of collaborative QoS predictions, there is insufficient effort to protect user privacy in this process. In fact, the observed QoS values may be sensitive information, and thus users may be reluctant to share them with others. For example, the observed response time fed back by a user is typically dependent on her location, indicating that the user's location can be inferred from the QoS information she provides. Therefore, one problem is whether the recommendation system can make an accurate personalized QoS prediction for the user on the premise of protecting the privacy of the user.
Homomorphic encryption, which allows computation on ciphertext, is a straightforward way to achieve privacy. However, all these operations not only require significant computational costs, but also require continuous communication between the parties, even without considering the difficulties of applying some complex computations into the cryptographic domain. Therefore, it is not feasible to deal with our problem by using homomorphic encryption.
Another technique, random perturbation, proposed by Polat et al., claims that accurate recommendations can still be obtained while randomness drawn from a specific distribution is added to the original data to prevent information leakage. However, the range of the randomness α is chosen empirically and there is no provable privacy guarantee. Moreover, for applications that cluster the perturbed data, an adversary can infer the user's private data with up to 70% accuracy.
Thus, while the privacy preserving method of random perturbation is insecure, it motivates us to design a lightweight and provable random perturbation. Specifically, a privacy protection QoS prediction model and a differential privacy model are developed for a user, the model can powerfully protect privacy data and has provable privacy guarantee, and the model is the most advanced privacy data state protection technology at present. Differential privacy has attracted extensive attention because it is intended to provide an effective method to minimize noise added to the original data.
Despite the widespread interest in differential privacy, its application to QoS prediction remains quite limited. Two notable works are McSherry and Mironov [reference 1: F. McSherry and I. Mironov. Differentially private recommender systems. KDD 2009] and Machanavajjhala et al. [reference 2]. Machanavajjhala et al. studied privacy protection for personalized social recommendations, which are based entirely on the user's social graph. With differential privacy, sensitive links in a social graph can be effectively protected, meaning that an attacker cannot infer the presence of a single link in the graph by passively observing the recommendations. However, quality recommendations can only be achieved with weak privacy parameters, or only for a small fraction of users. McSherry and Mironov [reference 1] applied differential privacy to collaborative filtering [R. M. Bell and Y. Koren. Scalable collaborative filtering with jointly derived neighborhood interpolation weights. ICDM 2007], which is a common solution for recommendation systems. They split the recommendation algorithm into two parts: a learning stage, executed under a differential privacy guarantee, and an individual recommendation stage, which uses the learning result for individual prediction. Unlike the work in references 1 and 2, which focuses on privacy guarantees for data distribution rather than knowledge learning, the present invention explores methods other than those studied in reference 1, such as a latent factor model.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a model-based collaborative filtering method for privacy protection collaborative Web service quality prediction. Differential privacy is introduced into a collaborative Web service QoS prediction framework for the first time, and a user can obtain the maximum privacy protection by ensuring the availability of data. Experimental results show that the method provides safe and accurate QoS prediction of the collaborative Web service, and the model-based collaborative filtering method has superiority in capturing the potential structure of QoS data.
In order to solve the technical problem, the invention provides a model-based collaborative filtering method for privacy protection collaborative Web service quality prediction, which comprises the following steps:
the first step, data collection: each user locally collects a quality of service value, i.e. a QoS value;
step two, data disguising: disguising a quality of service value;
thirdly, cooperatively filtering the service quality value after disguising based on the model;
fourthly, predicting the result: and predicting the result according to the service quality value after collaborative filtering.
As a preferred technical solution of the present invention, in the second step, the data masquerading is performed as follows:
Let r_ui denote the quality-of-service (QoS) value collected by user u for web service i, and let r_u denote the entire vector of QoS values evaluated by user u; similarly, I_ui and I_u denote the binary element and vector indicating whether a QoS value is present, and R_u denotes the disguised data. ε-differential privacy for each user u is achieved by the following equation:
R_ui = r_ui + Laplace(Δf/ε)
where ε is the privacy parameter controlling the privacy level, and Δf is defined as the maximum difference between QoS values, i.e.:
Δf = max(r_ui − r_uj)
where r_ui is the QoS value collected by user u for web service i and r_uj is the QoS value collected by user u for web service j.
The meaning of Laplace() is given as follows. If the probability density function of a random variable x is
f(x | μ, b) = (1/(2b)) · exp(−|x − μ|/b),
then x has a Laplace(μ, b) distribution, where μ and b are the location parameter and the scale parameter, respectively. Letting μ = 0, the distribution can be regarded as a symmetric exponential distribution with standard deviation √2·b. To add noise that obeys the Laplace distribution, let b = Δf/ε; the generated noise is denoted Laplace(Δf/ε).
As a preferred technical scheme of the invention, in the second step, the data masquerading is achieved by randomly perturbing the original data; the randomness should ensure that sensitive information, including the QoS values of each individual user, cannot be derived from the perturbed data; when the number of users is very large, the aggregated information of the users can still be estimated with high accuracy.
As a preferred solution of the invention, in the second step, the privacy parameter ε is given by each user; by using differential privacy, the random number added to the observed QoS value is the minimum value that maintains a reasonable accuracy for the specified privacy level.
As a preferred technical solution of the present invention, in the third step, the model-based collaborative filtering method is specifically as follows: matrix factorization (MF) is used. Assume a sparse matrix Q_{n×m} represents the observed QoS values of n users and m services, where each element q_ij reflects the QoS value of user i using service j. Using the input matrix Q_{n×m}, MF aims to factorize Q_{n×m} into two matrices of lower dimension d: a user factor matrix U_{n×d} and a service factor matrix V_{m×d}. Then, the null elements in Q_{n×m} can be approximated by the product of U and V, i.e., the unknown QoS value q'_ij is estimated by q'_ij = U_i · V_j^T.
MF is usually converted into an optimization problem, and a locally optimal solution is obtained by iteration. The objective function (or loss function) of MF is defined as:
L = Σ_{I_ij=1} (q_ij − U_i · V_j^T)^2 + λ (||U_i||^2 + ||V_j||^2)
The first part, Σ (q_ij − U_i · V_j^T)^2, is the squared difference between the existing QoS matrix and the prediction matrix, computed only over the elements that have been evaluated by users; the second part, λ (||U_i||^2 + ||V_j||^2), is a regularization term added to handle overfitting caused by input sparsity. By solving this optimization, the user factor matrix U_{n×d} and the service factor matrix V_{m×d} are finally obtained. The problem is solved using stochastic gradient descent (SGD), whose iterative equations are as follows:
e_ij = q_ij − U_i · V_j^T
U_i ← U_i + γ (e_ij · V_j − λ' · U_i)
V_j ← V_j + γ (e_ij · U_i − λ' · V_j)
where γ is the learning rate and λ' is the regularization coefficient. The choice of these two parameters significantly affects the result: when γ is large, the iteration diverges rather than converges. To obtain convergence, γ is empirically set to 0.001 and λ' is empirically set to 0.01, although longer training time is required. The iteration terminates when the value of the objective function falls below a certain threshold.
As a preferred technical solution of the present invention, in the fourth step, the prediction is performed as follows: after the QoS value of a certain service is obtained through collaborative filtering, the QoS values of other users for the same service are retrieved and the user with the closest value is selected; the two users are considered to have similar interests, a similar recommendation is made based on that user, and the corresponding value of that similar user is adopted as the prediction result for the current user.
Compared with the prior art, the invention has the following beneficial effects. The invention discloses a model-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction, which provides a privacy-preserving collaborative QoS prediction framework that protects the private data of users while retaining the ability to generate accurate QoS predictions. The invention introduces differential privacy, a rigorous and provable privacy-protection technique, as preprocessing for QoS data prediction. The proposed method is implemented on the basis of a general technique known as the Laplace mechanism, and extensive experiments are performed to study its performance on a real dataset. The privacy-accuracy trade-off is evaluated under different conditions, and the results show that, under some constraints, the method can achieve better performance than the baseline. The invention mainly has the following advantages:
1. for the method proposed by the present invention, the privacy-preserving algorithm can be parameterized and used to match the prediction to its non-private analogues. Although there are some specialized analytical requirements, the method itself is relatively straightforward and readily available.
2. By integrating privacy protection into the application, users retain unrestricted access to their own original data, while what is released at the output is far less than the entire data set and meets the privacy criteria.
3. The present invention tests the method with a real dataset. The result shows that the prediction accuracy of the disguised data of the invention is very close to that of the private data of the user.
4. Experimental results prove that the model-based collaborative filtering method for privacy protection collaborative Web service quality prediction has superiority in capturing the potential structure of QoS data.
Drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a schematic flow diagram of a model-based collaborative filtering method for privacy-preserving collaborative Web quality of service prediction according to the present invention.
Fig. 2 is a schematic diagram of a privacy preserving collaborative QoS prediction model.
FIG. 3 is a schematic diagram illustrating the comparison of privacy and accuracy between the QoS prediction based on differential privacy and the original method under different privacy in the experiment of the present invention; fig. 3 (a) represents the response time, and fig. 3 (b) represents the total time.
FIG. 4 is a schematic diagram of the comparison of the impact of service between differential privacy based QoS prediction and the original method under different privacy in the experiment of the present invention; fig. 4 (a) represents the response time, and fig. 4 (b) represents the total time.
FIG. 5 is a schematic diagram showing a comparison of user impact between differential privacy based QoS prediction and original methods under different privacy in the experiment of the present invention; fig. 5 (a) represents the response time, and fig. 5 (b) represents the total time.
FIG. 6 is a graph showing the results of the comparison of the precision at different densities between the QoS prediction based on differential privacy and the original method under different privacy conditions in the experiment of the present invention; fig. 6 (a) represents the response time, and fig. 6 (b) represents the total time.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic views illustrating only the basic structure of the present invention in a schematic manner, and thus show only the constitution related to the present invention.
1. System model and problem definition
1. Differential privacy
It is necessary to distinguish between differential privacy and conventional cryptographic systems. Differential privacy gives a strictly quantitative definition of privacy leakage under a very strict attack model, and demonstrates that: based on the idea of differential privacy, users can maximally obtain privacy protection and ensure the usability of data. The method has the following advantages: although the data is distorted, the noise required for the disturbance is independent of the data size. We can achieve a high level of privacy protection by adding a very small amount of noise. Although many privacy preserving methods have been proposed, such as k-anonymity and l-diversity, differential privacy is still considered to be the most stringent and robust privacy preserving model on its solid mathematical basis.
2.1 Security definition under differential privacy
There are two prerequisites for differential privacy. One is that the output of any computation (e.g., SUM) should not be affected by operations like inserting or deleting records. Another is that it gives a strictly quantitative definition of the privacy disclosure under a very strict attack model: an attacker cannot distinguish records with a probability greater than epsilon even if she knows the entire data set except the target. The formula is defined as follows:
definition 1: (epsilon-differential privacy) if D1 and D2 differ by at most one element for all datasets and all S e Range (K), then the random function K gives epsilon-differential privacy,
d is a database of rows, D1 is a subset of D2, and the larger data set D2 contains exactly one additional row. The probability space Pr [ ] in any case is on the coin flip of K. Privacy parameters ε >0 are public, with smaller ε yielding stronger privacy guarantees.
Since differential privacy is defined probabilistically, any method to achieve it must be randomized. Some of these methods rely on the addition of controlled noise, such as the Laplace mechanism [C. Dwork, F. McSherry, K. Nissim and A. Smith. Calibrating noise to sensitivity in private data analysis. TCC 2006: 265-284]. Others, such as the exponential mechanism and posterior sampling, sample from a problem-dependent distribution. We explain the construction in detail in the following section.
2.2 Laplace mechanism and global sensitivity
In addition to the definition of differential privacy, Dwork et al. [C. Dwork, F. McSherry, K. Nissim and A. Smith. Calibrating noise to sensitivity in private data analysis. TCC 2006: 265-284] showed that differential privacy can be achieved by adding random noise that obeys a Laplace distribution. If the probability density function of a random variable is
f(x | μ, b) = (1/(2b)) · exp(−|x − μ|/b),
the random variable has a Laplace(μ, b) distribution, where μ and b are the location parameter and the scale parameter, respectively. For simplicity, we assume μ = 0, so the distribution can be regarded as a symmetric exponential distribution with standard deviation √2·b.
To add noise that obeys the Laplace distribution, let b = Δf/ε; the generated noise is denoted
Laplace(Δf/ε)
Here, Δf is the global sensitivity, defined below, and ε is the privacy parameter controlling the privacy level. As the equation shows, the added noise is proportional to Δf and inversely proportional to ε.
Definition 2 (global sensitivity): For f: D → R^d, the L_k-sensitivity of f is defined as
Δf = max_{D1,D2} ||f(D1) − f(D2)||_k
for all D1, D2 differing in at most one element, where ||·||_k denotes the L_k norm.
3.1 System model
[S. Zhang, J. Ford and F. Makedon. Deriving Private Information from Randomly Perturbed Ratings. SDM 2006] has shown that random perturbation is unsafe because private data can be inferred by clustering techniques, but the system model proposed in [J. Zhu, P. He, Z. Zheng and M. R. Lyu. A Privacy-Preserving QoS Prediction Framework for Web Service Recommendation. ICWS 2015] is mature and applicable to many scenarios; therefore, this model is applied here. As shown in Fig. 2, each user (USER1, USER2, ..., etc.) invokes services, collects QoS values locally, and masquerades the QoS values she observes, and then sends all the masqueraded QoS values to the server. The QoS values can then be uploaded securely because the server cannot derive any personally sensitive information from the masqueraded data. However, the data masquerading scheme should still allow the server to perform collaborative filtering (neighborhood-based or model-based) on the masqueraded data. Based on the predicted QoS values (QoS prediction), the server may run various applications, such as QoS-based selection, composition, and recommendation.
Data masquerading is a key component of privacy protection collaborative Web service QoS prediction. The basic idea of data masquerading is to randomly disturb the original data among these attributes:
a) The randomness should ensure that sensitive information cannot be derived from the perturbed data (e.g., the QoS values of each individual user);
b) Although personal information is limited, when the number of users is very large, the aggregate information of the users can be evaluated with high accuracy.
This property is useful for calculations based on aggregated information. Without knowing the exact values of the individual data items, we can still get meaningful results, since the required aggregated information can be estimated from the perturbed data.
Another important point of our approach is the trade-off between accuracy and privacy. The more randomness is added, the larger the gap between the masqueraded data and the original data, which provides a higher level of privacy protection. Conversely, the less randomness, the more apparent the data characteristics; for context-based computations this means the results are more accurate. Balancing accuracy and privacy is an open question. In the present invention, privacy is parameterized as ε and is given by each user. By using differential privacy, the random number added to the observed QoS value is the minimum value that maintains a reasonable accuracy for the specified privacy level.
2. The invention relates to a collaborative filtering method based on a model for privacy protection collaborative Web service quality prediction
As shown in fig. 1, the model-based collaborative filtering method for privacy protection collaborative Web service quality prediction according to the present invention includes the following steps:
first step, data collection: each user locally collects a quality of service value, i.e. a QoS value;
step two, data disguising: disguising a quality of service value;
thirdly, cooperatively filtering the service quality value after disguising based on the model;
fourthly, predicting the result: and predicting the result according to the service quality value after collaborative filtering.
The data disguising in the second step adopts the following method:
Let r_ui denote the quality-of-service (QoS) value collected by user u for web service i, and let r_u denote the entire vector of QoS values evaluated by user u; similarly, I_ui and I_u denote the binary element and vector indicating whether a QoS value is present, and R_u denotes the disguised data. ε-differential privacy for each user u is achieved by the following equation:
R_ui = r_ui + Laplace(Δf/ε)
where ε is the privacy parameter controlling the privacy level, and Δf is defined as the maximum difference between QoS values, i.e.:
Δf = max(r_ui − r_uj)
where r_ui is the QoS value collected by user u for web service i and r_uj is the QoS value collected by user u for web service j.
The meaning of Laplace() is given as follows. If the probability density function of a random variable x is
f(x | μ, b) = (1/(2b)) · exp(−|x − μ|/b),
then x has a Laplace(μ, b) distribution, where μ and b are the location parameter and the scale parameter, respectively. Letting μ = 0, the distribution can be regarded as a symmetric exponential distribution with standard deviation √2·b. To add noise that obeys the Laplace distribution, let b = Δf/ε; the generated noise is denoted Laplace(Δf/ε).
The privacy parameter ε is given by each user; by using differential privacy, the random number added to the observed QoS value is the minimum value that maintains a reasonable accuracy for the specified privacy level.
The data masquerading is achieved by randomly perturbing the original data. The randomness should ensure that sensitive information, including the QoS values of each individual user, cannot be derived from the perturbed data; when the number of users is very large, the aggregated information of the users can still be estimated with high accuracy.
Differential privacy based on data masquerading:
we use r ui To represent the QoS value, r, collected by user u for web service i u Represents the entire vector of QoS values evaluated by user u, and similarly, I ui And I u Respectively, a binary element and a vector indicating whether a QoS value exists. c. C u =|I u Is the number of QoS values evaluated by user u. In our discussion, differential privacy is a key technology for data masquerading. Laplace mechanism [ c.dwork, f.mcsherry, k.nissm and a.smith.calibration noise to sensitivity in private data analysis. Tcc 2006]Epsilon-differential privacy is obtained by increasing the noise of the laplacian distribution.
Definition 3: (Laplace mechanism [ C.Dwork.Diemential privacy. Encyclopedia of Cryptography and Security.2011:338-340.]) Giving a function: g = D → R d The following calculation maintains epsilon-differential privacy
X=g(x)+Laplace(Δf/ε)
Where epsilon is a privacy parameter for exploiting privacy, and a smaller epsilon provides a stronger privacy guarantee. Δ f is the de-global sensitivity. Here, we use L 1 Norm calculation Δ f:
for simplicity, ε -differential privacy for each user u is achieved by the following equation:
R ui =r ui +Laplace(Δf/ε)
where Δ f is defined as the maximum difference between the QoS values, i.e.:
Δf=max(r ui -r uj )
after masquerading, all users send the masquerading QoS value R to the server u Random retention of the original data r ui Of the sensitive information. However, we can still estimate the aggregate information of the users. Thus, R can be directly accessed independently ui To perform QoS prediction.
The third step is the model-based collaborative filtering. Collaborative filtering (CF) is a well-established technique employed by most modern recommendation systems. In QoS prediction for collaborative Web services, each user provides the observed QoS values of the services she has used to the recommendation system. Based on the collected QoS values, the recommendation system can predict the QoS of all available services for the user through suitable algorithms. The more QoS values a user provides, the higher the prediction accuracy. In the invention, model-based collaborative filtering is adopted; the specific method is as follows:
matrix Factorization (MF) [ Z.Zheng, H.Ma, M.R.Lyu and I.King.QoS-aware web service communication by collaborative filtering. TSC 2011,4 (2): 140-152] is a typical solution for model-based collaborative filtering, and the accuracy of prediction can be effectively improved by researching potential factors of a model.
Assume a sparse matrix Q_{n×m} represents the observed QoS values of n users and m services, where each element q_ij reflects the QoS value of user i using service j. Using the input matrix Q_{n×m}, MF aims to factorize Q_{n×m} into two matrices of lower dimension d: a user factor matrix U_{n×d} and a service factor matrix V_{m×d}. Then, the null elements in Q_{n×m} can be approximated by the product of U and V, i.e., the unknown QoS value q'_ij is estimated by q'_ij = U_i · V_j^T.
MF is usually converted into an optimization problem, and a locally optimal solution is obtained by iteration. The objective function (or loss function) of MF is defined as:
L = Σ_{I_ij=1} (q_ij − U_i · V_j^T)^2 + λ (||U_i||^2 + ||V_j||^2)
The first part, Σ (q_ij − U_i · V_j^T)^2, is the squared difference between the existing QoS matrix and the prediction matrix, computed only over the elements that have been evaluated by users. The second part, λ (||U_i||^2 + ||V_j||^2), is a regularization term added to handle overfitting caused by input sparsity. By solving this optimization, we finally obtain the user factor matrix U_{n×d} and the service factor matrix V_{m×d}.
Alternating Least Squares (ALS) and Stochastic Gradient Descent (SGD) are two common methods for solving this optimization problem. Since ALS is more involved and requires computing matrix inverses, we use SGD. The iterative equations of SGD are as follows:
e_ij = q_ij − U_i · V_j^T
U_i ← U_i + γ (e_ij · V_j − λ' · U_i)
V_j ← V_j + γ (e_ij · U_i − λ' · V_j)
where γ is the learning rate and λ' is the regularization coefficient. The choice of these two parameters significantly affects the result: when γ is large, the result diverges rather than converges. To achieve convergence, we empirically set γ to 0.001, although longer training time is required; likewise, λ' is empirically set to 0.01.
In the first iteration, U and V are set randomly, but an appropriate initialization makes the subsequent computation converge faster. Therefore, we initialize U and V around the average of all observed QoS values. The iteration terminates when the value of the objective function falls below a certain threshold.
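A compact sketch of the model-based collaborative filtering step (matrix factorization trained with SGD) as just described; the helper name, the stopping rule on the change of the objective, and the small random jitter in the initialization are illustrative assumptions:

```python
import numpy as np

def mf_sgd(Q, d=20, gamma=0.001, lam=0.01, max_iter=200, tol=1e-4):
    """Factorize a sparse QoS matrix Q (np.nan = missing) into U (n x d) and V (m x d)."""
    n, m = Q.shape
    mask = ~np.isnan(Q)
    mean = np.nanmean(Q)
    # Initialize U and V around the average of the observed QoS values,
    # so that U[i] @ V[j] is roughly `mean` at the start.
    init = np.sqrt(mean / d)
    U = init + 0.01 * np.random.randn(n, d)
    V = init + 0.01 * np.random.randn(m, d)
    rows, cols = np.where(mask)
    prev_loss = np.inf
    for _ in range(max_iter):
        for i, j in zip(rows, cols):
            e_ij = Q[i, j] - U[i] @ V[j]                # prediction error for one entry
            U[i] += gamma * (e_ij * V[j] - lam * U[i])  # SGD update of user factors
            V[j] += gamma * (e_ij * U[i] - lam * V[j])  # SGD update of service factors
        pred = U @ V.T
        loss = np.sum((Q[mask] - pred[mask]) ** 2) + lam * (np.sum(U ** 2) + np.sum(V ** 2))
        if abs(prev_loss - loss) < tol:                 # simple stopping rule
            break
        prev_loss = loss
    return U, V

# Unknown QoS values are then estimated as q'_ij ~ U[i] @ V[j].
```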
In the fourth step, after the QoS value of a certain service is obtained through collaborative filtering, the QoS values of other users for the same service are retrieved and the user with the closest value is selected, indicating that the two users have similar interests; a similar recommendation is made based on that user, and the corresponding value of that similar user is adopted as the prediction result for the current user.
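An illustrative reading of this prediction step in code (the function and argument names are assumptions; P is the filled-in QoS matrix U·Vᵀ produced by the previous step):

```python
import numpy as np

def predict_from_closest_user(P, u, ref_service, target_service):
    """Predict user u's QoS for target_service via the user closest to u on ref_service."""
    diffs = np.abs(P[:, ref_service] - P[u, ref_service])
    diffs[u] = np.inf                   # exclude user u herself from the search
    closest = int(np.argmin(diffs))     # the user with the most similar QoS value
    return P[closest, target_service]   # adopt that user's value as the prediction
```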
3. Experiment of
In this section, we performed three series of experiments on real datasets to evaluate our privacy preserving QoS prediction framework. The first series of experiments investigated the balance between privacy and accuracy when using the proposed method. Two additional series of experiments investigated some important data features, including the effect of size and density on the performance of our method.
TABLE 1 data set statistics
3.1 Experimental configuration
We first note that [Z. Zheng, Y. Zhang and M. R. Lyu. Investigating QoS of Real-World Web Services. TSC 2014, 7(1): 32-39; Z. Zheng, Y. Zhang and M. R. Lyu. Distributed QoS Evaluation for Real-World Web Services. ICWS 2010: 83-90] introduces a real Web service QoS dataset comprising the QoS values of 5,825 real Web services observed by 339 users. This dataset is very useful for studying the accuracy of QoS prediction. From the dataset, we focus on two representative QoS attributes: response time (RT) and total time (TP). Table 1 describes the statistics of the dataset; AVE and STD are the mean and standard deviation, respectively, and density refers to the ratio of observed data to all data. More details of the dataset may be found in the above references.
We use cross-validation to train and evaluate QoS predictions. The data set is relatively complete, but in practice, due to limited time and resources, users usually call only a few services, and the data density is generally below 10%. To simulate this sparsity in our experiments, we randomly removed entries from the complete data set, leaving only a smaller density of historical QoS values as our training set. The deleted data is used as a test set for accuracy evaluation.
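A small sketch of this sparsity simulation, assuming the complete data is held in a NumPy matrix; the helper name and the fixed seed are assumptions for reproducibility:

```python
import numpy as np

def split_by_density(Q_full, density, seed=0):
    """Keep a `density` fraction of entries as the training matrix (np.nan elsewhere);
    the removed entries form the test set used for accuracy evaluation."""
    rng = np.random.default_rng(seed)
    keep = rng.random(Q_full.shape) < density   # True = entry stays in the training set
    Q_train = np.where(keep, Q_full, np.nan)
    test_idx = np.where(~keep)                  # positions of the held-out entries
    return Q_train, test_idx, Q_full[test_idx]
```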
Then, we run the QoS prediction algorithms on the training set and predict the test set. We implemented and evaluated four algorithms. UIPCC, proposed in [Z. Zheng, H. Ma, M. R. Lyu and I. King. WSRec: A Collaborative Filtering Based Web Service Recommender System. ICWS 2009], is a representative implementation of neighborhood-based collaborative filtering, and MF, introduced in [Z. Zheng, H. Ma, M. R. Lyu and I. King. QoS-aware web service recommendation by collaborative filtering. TSC 2011, 4(2): 140-152], is an implementation of model-based collaborative filtering. LUIPCC and LMF are the two corresponding privacy-integrated approaches implemented with the Laplace mechanism.
To quantify the accuracy of QoS prediction, we take the Root Mean Square Error (RMSE), a widely used metric in related work (e.g. [A. Berlioz, A. Friedman, M. A. Kaafar, R. Boreli and S. Berkovsky. Applying Differential Privacy to Matrix Factorization. RecSys 2015]):
RMSE = sqrt( (1/|R|) · Σ_{q_ui ∈ R} (q'_ui − q_ui)^2 )
where R consists of all values in the test set that need to be predicted, and |R| is the number of elements in R; q'_ui is the predicted value and q_ui is the corresponding value in the test set. In general, a smaller RMSE indicates a better prediction result.
Note that the default parameter settings are shown in Table 2. We empirically select the parameters of UIPCC and MF. By default, ε is set to 0.5, which provides sufficient privacy.
TABLE 2 Parameter settings
UIPCC: k = 20, λ = 0.1
MF: d = 20, γ = 0.001, λ' = 0.01
Laplace: ε = 0.5
3.2 privacy and accuracy
Fig. 3 compares our differential-privacy-based QoS prediction with the original methods under different privacy levels, for RT and TP respectively. By introducing differential privacy into QoS prediction, a user can obtain privacy protection, but users adopting our approach do need to consider the balance between privacy and accuracy. On the one hand, a user can obtain stronger privacy protection by adding more Laplace noise, which inevitably reduces the utility of the data. At the other extreme, a user can achieve 100% accuracy without adding any Laplace noise. To study how the accuracy varies, we ran the QoS prediction algorithms on the training set and predicted the test set. The privacy parameter ε is incremented in steps of 0.5 over the range 0.5 to 4. We can observe that the RMSE of both LUIPCC and LMF drops as ε increases: a larger ε implies a looser privacy constraint, the utility of the data is less limited, and the user therefore obtains better accuracy. It is also worth noting that when ε becomes large in Fig. 3 (e.g., greater than 2.0), our privacy-preserving methods LUIPCC and LMF achieve almost the same or even higher accuracy than UIPCC. In particular, when ε is greater than 4, the prediction accuracy of LMF is better than that of UIPCC. Furthermore, we also find that MF is superior to UIPCC, which demonstrates the superiority of the model-based approach in capturing the underlying structure of the QoS data. Another fact worth noting is that although a recent work [J. Zhu, P. He, Z. Zheng and M. R. Lyu. A Privacy-Preserving QoS Prediction Framework for Web Service Recommendation. ICWS 2015] claims better performance than the original algorithms (UIPCC and MF), the randomness it adds to prevent information leakage is not large enough, and an adversary can accurately infer the user's private data by applying clustering [S. Zhang, J. Ford and F. Makedon. Deriving Private Information from Randomly Perturbed Ratings. SDM 2006].
In summary, our differential-privacy-based algorithms can provide privacy-preserving QoS prediction with parameterized privacy. The results show that, under loose privacy constraints, predictions made from the masqueraded data are very close to those made from the users' private data.
3.3 Impact of data size
To evaluate the impact of data size, we designed experiments that vary the number of services and the number of users, respectively. In Fig. 4, the number of users is fixed at 339 and the number of services varies from 1000 to 5000 in steps of 1000, where the services are randomly selected from the original dataset. The other parameter settings of the experiment are shown in Table 2. We performed the same experimental setup in Fig. 5, where all 5825 services are included and the number of users varies.
It is clear that both the number of services and the number of users have a positive impact on the accuracy of the algorithm, which means that the more data is given, the better the prediction is. In other words, with more data we can provide better accuracy.
Another finding is that the trends of the original algorithms and of our differential-privacy-based algorithms are the same, e.g., the trend of UIPCC matches that of LUIPCC and the trend of MF matches that of LMF, although the accuracy differs considerably across data sizes. This means that the noise required for data masquerading is independent of the data size, so the user can obtain a high level of privacy protection by adding a very small amount of noise.
3.4 Effect of Density
In addition to data size, the density, denoted θ, is also a major factor in algorithm performance. Figure 6 shows the accuracy comparison at different densities. Although the effect of density on the original algorithms is not significant, it does have a significant effect on our differential-privacy-based algorithms: higher-density datasets perform better. This result means that density is also a key factor determining the performance of the differential privacy method. More importantly, as the density becomes larger, the gap between the traditional approach and our differential-privacy-based approach becomes smaller and smaller. More specifically, when the density is set to 5% in Fig. 6, the gap between LUIPCC and UIPCC is 5; as the density increases to 30%, the gap between LUIPCC and UIPCC decreases to 1. Users are therefore advised to use a higher-density dataset to make the prediction closer to the original result.
5. Conclusion
The invention introduces differential privacy into a collaborative Web service QoS prediction framework for the first time. Differential privacy gives a strict quantitative definition of privacy leakage under very strict constraints. Based on the idea of differential privacy, the user can obtain maximum privacy protection by ensuring the availability of data. Experimental results show that the system and the method provide safe and accurate QoS prediction of the cooperative Web service.
In light of the foregoing description of the preferred embodiment of the present invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification, and must be determined according to the scope of the claims.

Claims (6)

1. A collaborative filtering method based on a model for privacy protection collaborative Web service quality prediction is characterized by comprising the following steps:
the first step, data collection: each user locally collects quality of service values, i.e. QoS values;
step two, data disguising: disguising a quality of service value;
thirdly, cooperatively filtering the service quality value after disguising based on the model;
fourthly, predicting the result: and predicting the result according to the service quality value after collaborative filtering.
2. The method of claim 1, wherein in the second step, the data masquerading is performed as follows:
Let r_ui denote the quality-of-service (QoS) value collected by user u for web service i, and let r_u denote the entire vector of QoS values evaluated by user u; similarly, I_ui and I_u denote the binary element and vector indicating whether a QoS value is present, and R_u denotes the disguised data; ε-differential privacy for each user u is achieved by the following equation:
R_ui = r_ui + Laplace(Δf/ε)
where ε is the privacy parameter controlling the privacy level, and Δf is defined as the maximum difference between QoS values, i.e.:
Δf = max(r_ui − r_uj)
where r_ui is the QoS value collected by user u for web service i and r_uj is the QoS value collected by user u for web service j;
and the meaning of Laplace() is given as follows: if the probability density function of a random variable x is
f(x | μ, b) = (1/(2b)) · exp(−|x − μ|/b),
then x has a Laplace(μ, b) distribution, where μ and b are the location parameter and the scale parameter, respectively; letting μ = 0, the distribution can be regarded as a symmetric exponential distribution with standard deviation √2·b; to add noise that obeys the Laplace distribution, let b = Δf/ε, and the generated noise is denoted Laplace(Δf/ε).
3. The method of claim 1, wherein in the second step, the data masquerading achieves the purpose of masquerading data by randomly disturbing the original data; randomness should ensure that sensitive information cannot be derived from perturbed data, including quality of service values for each individual user; when the number of users is very large, the aggregated information of the users can still be evaluated with high accuracy.
4. A method according to claim 2, characterized in that in the second step, the privacy parameter ε is given by each user, and by using differential privacy, the random number added to the observed QoS value is the minimum value that maintains a reasonable accuracy for the specified privacy level.
5. The method according to claim 1, wherein in the third step, the model-based collaborative filtering method is specifically as follows: matrix factorization (MF) is used; assume a sparse matrix Q_{n×m} represents the observed QoS values of n users and m services, where each element q_ij reflects the QoS value of user i using service j; using the input matrix Q_{n×m}, MF aims to factorize Q_{n×m} into two matrices of lower dimension d: a user factor matrix U_{n×d} and a service factor matrix V_{m×d}; then, the null elements in Q_{n×m} can be approximated by the product of U and V, i.e., the unknown QoS value q'_ij is estimated by q'_ij = U_i · V_j^T;
MF is usually converted into an optimization problem, and a locally optimal solution is obtained by iteration; the objective function or loss function of MF is defined as:
L = Σ_{I_ij=1} (q_ij − U_i · V_j^T)^2 + λ (||U_i||^2 + ||V_j||^2)
the first part is the squared difference between the existing QoS matrix and the prediction matrix, computed only over the elements that have been evaluated by users; the second part, λ (||U_i||^2 + ||V_j||^2), is a regularization term added to handle overfitting caused by input sparsity; by solving the optimization, the user factor matrix U_{n×d} and the service factor matrix V_{m×d} are finally obtained; the problem is solved using stochastic gradient descent (SGD), whose iterative equations are as follows:
e_ij = q_ij − U_i · V_j^T
U_i ← U_i + γ (e_ij · V_j − λ' · U_i)
V_j ← V_j + γ (e_ij · U_i − λ' · V_j)
where γ is the learning rate and λ' is the regularization coefficient; the choice of the two parameters significantly affects the result, which diverges rather than converges when γ is large; to obtain convergence, γ is empirically set to 0.001 and likewise λ' is empirically set to 0.01, although longer training time is required; the iteration terminates when the value of the objective function falls below a certain threshold.
6. The method according to claim 1, wherein in the fourth step, the prediction is specifically: after the QoS value of a certain service is obtained through collaborative filtering, the QoS values of other users for the same service are retrieved and the user with the closest value is selected, the two users being considered to have similar interests; a similar recommendation is made based on that user, and the corresponding value of that similar user is adopted as the prediction result for the current user.
CN201710875789.9A 2017-09-25 2017-09-25 Model-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction Pending CN107679415A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710875789.9A CN107679415A (en) 2017-09-25 2017-09-25 Model-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction
PCT/CN2017/113485 WO2019056572A1 (en) 2017-09-25 2017-11-29 Model-based collaborative filtering method for collaborative web quality-of-service prediction for privacy protection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710875789.9A CN107679415A (en) 2017-09-25 2017-09-25 Model-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction

Publications (1)

Publication Number Publication Date
CN107679415A true CN107679415A (en) 2018-02-09

Family

ID=61137965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710875789.9A Pending CN107679415A (en) 2017-09-25 2017-09-25 Model-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction

Country Status (2)

Country Link
CN (1) CN107679415A (en)
WO (1) WO2019056572A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109257217A (en) * 2018-09-19 2019-01-22 河海大学 Web service QoS prediction technique based on secret protection under mobile peripheral surroundings
CN109376549A (en) * 2018-10-25 2019-02-22 广州电力交易中心有限责任公司 A kind of electricity transaction big data dissemination method based on difference secret protection
CN110232151A (en) * 2019-05-22 2019-09-13 温州大学 A kind of construction method of the QoS prediction model of mixing probability distribution detection
CN112328912A (en) * 2020-11-03 2021-02-05 重庆大学 QoS prediction method utilizing location awareness
CN112700067A (en) * 2021-01-14 2021-04-23 安徽师范大学 Method and system for predicting service quality under unreliable mobile edge environment
CN113364621A (en) * 2021-06-04 2021-09-07 浙江大学 Service quality prediction method under service network environment
CN114462707A (en) * 2022-02-10 2022-05-10 湖南科技大学 Web service multidimensional QoS (quality of service) joint prediction method based on feature depth fusion
CN116595254A (en) * 2023-05-18 2023-08-15 杭州绿城信息技术有限公司 Data privacy and service recommendation method in smart city

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554201A (en) * 2020-04-23 2021-10-26 山东大学 Grading prediction system and method based on hyper-network and federal learning
CN111783032B (en) * 2020-06-30 2023-07-04 北京百度网讯科技有限公司 Method, device, equipment and storage medium for obtaining target space-time observation data
CN112948364A (en) * 2021-02-08 2021-06-11 国网上海市电力公司 Electric energy metering data recovery method based on collaborative fitting
CN112926088B (en) * 2021-03-18 2024-03-19 之江实验室 Federal learning privacy policy selection method based on game theory
CN113420421B (en) * 2021-05-28 2023-07-25 西安邮电大学 QoS prediction method based on time sequence regularized tensor decomposition in mobile edge calculation
CN113486257B (en) * 2021-07-01 2023-07-11 湖北工业大学 Coordinated filtering convolutional neural network recommendation system and method based on countermeasure matrix decomposition
CN113536145A (en) * 2021-07-05 2021-10-22 安徽大学 Transformer collaborative filtering method based on spatial perception
CN113674066A (en) * 2021-09-06 2021-11-19 中国银行股份有限公司 Recommendation method and system for mobile banking financing products
CN113674036A (en) * 2021-09-06 2021-11-19 中国银行股份有限公司 Recommendation method and system for mobile banking advertisements
CN114117203B (en) * 2021-11-08 2024-06-11 长沙理工大学 Short video recommendation method, system and equipment
CN114398538B (en) * 2021-12-08 2024-02-06 西安电子科技大学 Cross-domain recommendation method and system for privacy protection, storage medium and computer equipment
CN115277521B (en) * 2022-06-06 2023-10-03 浙大城市学院 Multi-view-based dynamic prediction method for QoS (quality of service) of Internet of things
CN115190474B (en) * 2022-07-21 2024-05-03 河海大学 Privacy perception computing task unloading method for marine observation monitoring sensor network
CN115828312B (en) * 2023-02-17 2023-06-16 浙江浙能数字科技有限公司 Privacy protection method and system for social network of power user
CN116299172B (en) * 2023-02-27 2023-12-01 河南大学 Accurate positioning method and device based on privacy protection quality assessment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103684850A (en) * 2013-11-25 2014-03-26 浙江大学 Service neighborhood based Web Service quality prediction method
CN103840985A (en) * 2014-02-28 2014-06-04 浙江大学 Web service quality prediction method and device based on user neighborhoods

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014031551A1 (en) * 2012-08-20 2014-02-27 Thomson Licensing A method and apparatus for privacy-preserving data mapping under a privacy-accuracy trade-off
CN104050267B (en) * 2014-06-23 2017-10-03 中国科学院软件研究所 The personalized recommendation method and system of privacy of user protection are met based on correlation rule
CN107092837A (en) * 2017-04-25 2017-08-25 华中科技大学 A kind of Mining Frequent Itemsets and system for supporting difference privacy

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103684850A (en) * 2013-11-25 2014-03-26 浙江大学 Service neighborhood based Web Service quality prediction method
CN103840985A (en) * 2014-02-28 2014-06-04 浙江大学 Web service quality prediction method and device based on user neighborhoods

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王彤 (Wang Tong): "Research on collaborative filtering recommendation technology based on differential privacy protection", China Master's Theses Full-text Database, Information Science and Technology (Monthly) *
鲜征征 等 (Xian Zhengzheng et al.): "Research on the application of differential privacy protection in recommender systems", Application Research of Computers *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109257217A (en) * 2018-09-19 2019-01-22 河海大学 Web service QoS prediction technique based on secret protection under mobile peripheral surroundings
CN109376549A (en) * 2018-10-25 2019-02-22 广州电力交易中心有限责任公司 A kind of electricity transaction big data dissemination method based on difference secret protection
CN109376549B (en) * 2018-10-25 2021-09-10 广州电力交易中心有限责任公司 Electric power transaction big data publishing method based on differential privacy protection
CN110232151B (en) * 2019-05-22 2022-07-15 温州大学 Construction method of QoS (quality of service) prediction model for mixed probability distribution detection
CN110232151A (en) * 2019-05-22 2019-09-13 温州大学 A kind of construction method of the QoS prediction model of mixing probability distribution detection
CN112328912A (en) * 2020-11-03 2021-02-05 重庆大学 QoS prediction method utilizing location awareness
CN112328912B (en) * 2020-11-03 2023-05-19 重庆大学 QoS prediction method using location awareness
CN112700067A (en) * 2021-01-14 2021-04-23 安徽师范大学 Method and system for predicting service quality under unreliable mobile edge environment
CN112700067B (en) * 2021-01-14 2024-05-24 安徽师范大学 Method and system for predicting service quality in unreliable mobile edge environment
CN113364621B (en) * 2021-06-04 2022-07-26 浙江大学 Service quality prediction method under service network environment
WO2022252251A1 (en) * 2021-06-04 2022-12-08 浙江大学 Method for predicting quality of service in service network environment
CN113364621A (en) * 2021-06-04 2021-09-07 浙江大学 Service quality prediction method under service network environment
US11985042B2 (en) 2021-06-04 2024-05-14 Zhejiang University Method for predicting Quality of Service in service network environment
CN114462707A (en) * 2022-02-10 2022-05-10 湖南科技大学 Web service multidimensional QoS (quality of service) joint prediction method based on feature depth fusion
CN114462707B (en) * 2022-02-10 2023-11-17 湖南科技大学 Web service multidimensional QoS joint prediction method based on feature depth fusion
CN116595254A (en) * 2023-05-18 2023-08-15 杭州绿城信息技术有限公司 Data privacy and service recommendation method in smart city
CN116595254B (en) * 2023-05-18 2023-12-12 杭州绿城信息技术有限公司 Data privacy and service recommendation method in smart city

Also Published As

Publication number Publication date
WO2019056572A1 (en) 2019-03-28

Similar Documents

Publication Publication Date Title
CN107679415A (en) Model-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction
CN107659444A (en) Differential privacy prediction system and method for privacy-preserving collaborative Web service quality
CN107609421A (en) Neighborhood-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction
Sun et al. LDP-FL: Practical private aggregation in federated learning with local differential privacy
Zhao et al. Privacy-preserving blockchain-based federated learning for IoT devices
CN111866869B (en) Federal learning indoor positioning privacy protection method for edge calculation
Ruzafa-Alcázar et al. Intrusion detection based on privacy-preserving federated learning for the industrial IoT
Ganti et al. PoolView: stream privacy for grassroots participatory sensing
Liu et al. Differential private collaborative Web services QoS prediction
Singh et al. Differentially-private federated neural architecture search
Chen et al. Privacy and fairness in Federated learning: on the perspective of Tradeoff
JP2016511891A (en) Privacy against sabotage attacks on large data
Liu et al. Hybrid differential privacy based federated learning for Internet of Things
Liu et al. Face image publication based on differential privacy
Fu et al. On the practicality of differential privacy in federated learning by tuning iteration times
Wei et al. Dpis: An enhanced mechanism for differentially private sgd with importance sampling
Mireshghallah et al. A principled approach to learning stochastic representations for privacy in deep neural inference
Firdaus et al. A secure federated learning framework using blockchain and differential privacy
Cui et al. Boosting accuracy of differentially private federated learning in industrial IoT with sparse responses
Galli et al. Group privacy for personalized federated learning
Wang et al. Protecting data privacy in federated learning combining differential privacy and weak encryption
Jiang et al. Differential privacy in privacy-preserving big data and learning: Challenge and opportunity
Liu et al. Privacy-preserving collaborative web services QoS prediction via differential privacy
Pàmies-Estrems et al. Lifelogging protection scheme for internet-based personal assistants
Zhou et al. Exploring the Practicality of Differentially Private Federated Learning: A Local Iteration Tuning Approach

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180209

WD01 Invention patent application deemed withdrawn after publication