WO2019056571A1

WO2019056571A1 - Method for predicting quality of web service

Info

Publication number: WO2019056571A1
Application number: PCT/CN2017/113484
Authority: WO
Inventors: 毛睿; 李荣华; 陆敏华; 王毅; 罗秋明; 商烁; 刘刚
Original assignee: 深圳大学
Priority date: 2017-09-25
Filing date: 2017-11-29
Publication date: 2019-03-28
Also published as: CN107609421A

Abstract

Provided is a method for predicting the quality of service (QoS) of a Web service. The method comprises the following steps: step one, data collection, involving: each user locally collecting quality-of-service values, i.e. QoS values; step two, normalization, involving: executing z-score normalization on the QoS values; step three, data disguising, involving: disguising the QoS values; step four, collaborative filtering, involving: using neighbourhood-based collaborative filtering to process the disguised QoS values; and step five, result prediction, involving: predicting a result according to the QoS values subjected to collaborative filtering. The method can protect the privacy of a user and ensure the availability of data.

Description

[Name of invention made by ISA according to Rule 37.2] Method for predicting the quality of WEB service

Technical field

The invention belongs to the field of computers, and particularly relates to a collaborative filtering method for privacy protection, in particular to a neighborhood-based collaborative filtering method for privacy protection collaborative Web service quality prediction.

Background technique

Quality of Service (QoS) is widely used to describe the non-functional properties of Web services. QoS-based Web service selection, composition and recommendation techniques have been extensively discussed in recent papers. These methods presume that accurate QoS values of the Web services are always available, but obtaining accurate QoS values is not an easy task. On the one hand, QoS values published by service providers or third-party communities are inaccurate for service users because they are susceptible to the uncertain Internet environment. On the other hand, it is impractical for service users to directly assess the QoS of all available services because of time, cost and other resource constraints. Personalized collaborative Web service QoS prediction addresses this problem. The basic idea is that users with similar characteristics tend to observe similar QoS values for the same service, so when the QoS value that a particular user would observe for a Web service needs to be predicted, it can be estimated from the values observed by similar users.

In this way, different users of the same service are usually given different QoS predictions, and the final predicted value depends on the user's specific context. Based on the QoS values that users provide, various techniques have been employed to improve prediction quality, particularly prediction accuracy.

Collaborative Web Services QoS prediction has become an important tool for generating accurate personalized QoS. Although many achievements have been made in research to improve the accuracy of collaborative QoS prediction, the work done to protect user privacy in this process is not enough. In fact, the observed QoS values may be sensitive information, so users may be reluctant to share them with others. For example, the observed response time fed back by the user typically depends on her location, which indicates that the location of the user can be inferred from the QoS information she provides. Therefore, one question is whether the recommendation system can accurately and personally predict QoS for users while preserving user privacy.

Homomorphic encryption, which allows calculations on ciphertext, is a direct way to achieve privacy. However, these operations incur not only a large computational cost but also continuous communication between the parties, and some complex calculations are difficult to carry out in the encrypted domain. Homomorphic encryption is therefore not a feasible solution to this problem.

Another technique is the random perturbation proposed by Polat et al., who claim that accurate predictions can still be obtained when randomness drawn from a particular distribution is added to the raw data to prevent information leakage. However, the range of randomness α is chosen empirically and there is no provable privacy guarantee. Moreover, by applying clustering to the perturbed data, an adversary can infer the user's private data with an accuracy of up to 70%.

Therefore, although the random perturbation method is not secure, it inspires us to design a lightweight random perturbation with a provable guarantee. Specifically, we developed a privacy-preserving QoS prediction model based on differential privacy, a model that strongly protects private data and has provable privacy guarantees. It is the most advanced privacy protection technique currently available. Differential privacy has attracted widespread attention because it aims to provide an effective way to minimize the noise added to the original data.

Although differential privacy has received widespread attention, its application to QoS prediction is still quite limited. Reference 1 [F. McSherry and I. Mironov. Differentially private recommender systems: building privacy into the net. SIGKDD 2009: 627-636] and Reference 2 [A. Machanavajjhala, A. Korolova and A. D. Sarma. Personalized social recommendations: accurate or private. PVLDB 2011 4(7): 440-450] describe two recommendation systems based on differential privacy, which are the work most relevant to our problem. Machanavajjhala et al. [Reference 2] studied privacy protection for personalized social recommendations, which are based entirely on the user's social graph. With differential privacy, sensitive links in the social graph can be effectively protected, which means that an attacker cannot infer the existence of a single link in the graph by passively observing the recommendation results. However, high-quality recommendations can only be achieved with weak privacy parameters, or only for a small number of users. McSherry and Mironov [Reference 1] apply differential privacy to collaborative filtering [R. M. Bell and Y. Koren. Scalable collaborative filtering with jointly derived neighborhood interpolation weights. ICDM 2007: 43-52], which is a general solution for recommender systems. They divide the recommendation algorithm into two parts: a learning phase and a separate recommendation phase. The learning phase is performed with differential privacy guarantees, and the recommendation phase uses the learning results for individual predictions. Unlike Reference 1 and Reference 2, the present invention focuses on privacy assurance for data release rather than for knowledge learning, and it explores methods, such as latent factor models, other than those studied in Reference 1.

Summary of the invention

The technical problem to be solved by the present invention is to provide a neighborhood-based collaborative filtering method for privacy protection collaborative Web service quality prediction, and to introduce differential privacy into a collaborative Web service QoS prediction framework for the first time, so that users obtain maximum privacy protection while data availability is ensured. Experimental results show that the method of the present invention provides secure and accurate QoS prediction for collaborative Web services.

To solve the above technical problem, the present invention provides a neighborhood-based collaborative filtering method for privacy protection collaborative Web service quality prediction, which includes the following steps:

The first step, data collection: each user locally collects quality of service values, i.e. QoS values;

The second step, normalization: z-score normalization is performed on the QoS values;

The third step, data disguising: the QoS values are disguised;

The fourth step, collaborative filtering: neighborhood-based collaborative filtering is applied to the disguised QoS values;

The fifth step, result prediction: the result is predicted from the collaboratively filtered QoS values.

As a preferred technical solution of the present invention, in the second step, the z-score normalization of the QoS values uses the following equation:

$$q_{ui} = \frac{r_{ui} - \bar{r}_u}{\omega_u}$$

where r_ui represents the QoS value collected by user u for Web service i, $\bar{r}_u$ is the average of the QoS vector r_u, and ω_u is the standard deviation of the QoS vector r_u; after normalization, the QoS data has zero mean and unit variance.
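The normalization step can be illustrated with a minimal Python sketch. The function name `z_score_normalize` and the sample values are hypothetical, and the use of the population standard deviation (dividing by n) is an assumption, since the text does not say which estimator is used.

```python
import math


def z_score_normalize(r_u):
    """Normalize one user's observed QoS values to zero mean and unit variance.

    Returns the normalized values together with the mean and the standard
    deviation, which are needed later to map predictions back to QoS units.
    """
    n = len(r_u)
    mean = sum(r_u) / n
    # Assumption: population standard deviation (divide by n, not n - 1).
    omega = math.sqrt(sum((r - mean) ** 2 for r in r_u) / n)
    if omega == 0:
        # All observations are identical, so there is no spread to rescale.
        return [0.0] * n, mean, omega
    return [(r - mean) / omega for r in r_u], mean, omega


# Hypothetical response times (seconds) observed by one user.
q_u, mean_u, omega_u = z_score_normalize([0.2, 0.4, 0.9, 1.5])
```

The mean and standard deviation stay on the user's side in this sketch; only the normalized values would go on to the disguising step.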

As a preferred technical solution of the present invention, in the third step, the normalized QoS value is disguised according to the following formula:

Q_ui = q_ui + Laplace(Δf/ε)

where ε is a privacy parameter set by user u, and Δf is defined according to the distribution of QoS values, i.e. Δf = max(r_ui - r_uj);

r_ui represents the QoS value collected by user u for Web service i, and r_uj represents the QoS value collected by user u for Web service j;

The meaning of Laplace() is given as follows:

If the probability density function of a random variable x is

$$f(x \mid \mu, b) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right)$$

then the random variable x has a Laplace(μ, b) distribution, where μ and b are the location parameter and the scale parameter, respectively; let μ = 0, so that the distribution can be regarded as a symmetric exponential distribution with standard deviation $\sqrt{2}b$; in order to add noise obeying the Laplace distribution, let b = Δf/ε, and the generated noise is denoted Laplace(Δf/ε).
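As an illustration of the Laplace(0, b) noise described above, the following sketch samples it with the Python standard library only. It relies on the fact that the difference of two independent exponential variables with mean b is Laplace(0, b). The helper names and the scale b = 2.0 are hypothetical choices for the example.

```python
import math
import random


def laplace_pdf(x, mu=0.0, b=1.0):
    """Density of the Laplace(mu, b) distribution."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


def sample_laplace(b, rng=random):
    """Draw one Laplace(0, b) sample.

    The difference of two i.i.d. exponential variables with mean b
    (rate 1/b) follows a Laplace distribution with location 0 and scale b.
    """
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


rng = random.Random(0)
b = 2.0  # hypothetical scale, playing the role of delta_f / epsilon
samples = [sample_laplace(b, rng) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)
sample_std = math.sqrt(sum((s - sample_mean) ** 2 for s in samples) / len(samples))
# sample_mean should be near 0 and sample_std near sqrt(2) * b.
```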

As a preferred technical solution of the present invention, in the third step, after the data is disguised, the user sends the disguised value Q_ui to the server, and the randomness hides the sensitive information of the original data q_ui.

As a preferred technical solution of the present invention, in the third step, the privacy parameter ε is given by each user, and by using differential privacy, the random noise added to the observed QoS values is the minimum required for the specified privacy level while keeping the data reasonably accurate.

As a preferred technical solution of the present invention, in the third step, the data is disguised by randomly perturbing the original data; the randomness should ensure that sensitive information, including the QoS values of each individual user, cannot be derived from the perturbed data; when the number of users is very large, the aggregated information of these users can still be estimated with high accuracy.

As a preferred technical solution of the present invention, in the fourth step, the neighborhood-based collaborative filtering is performed as follows: the similarity between two users u and v is calculated from the services that both have invoked, using the following equation:

$$Sim(u,v) = \frac{\sum_{i \in S}(r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in S}(r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in S}(r_{v,i} - \bar{r}_v)^2}}$$

where S = S_u ∩ S_v is the set of services invoked by both user u and user v, r_u,i is the QoS value of service i observed by user u, and $\bar{r}_u$ is the average QoS value of all services observed by user u;

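Q_ui is used to approximate the similarity values as follows:

According to the z-score normalization,

$$q_{ui} = \frac{r_{ui} - \bar{r}_u}{\omega_u}$$

and by substituting this into the similarity calculation, the similarity is calculated as

$$Sim(u,v) = \frac{\sum_{i \in S} q_{ui}\, q_{vi}}{\sqrt{\sum_{i \in S} q_{ui}^2}\,\sqrt{\sum_{i \in S} q_{vi}^2}}$$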

Moreover, from the z-score normalization,

it is easy to obtain

It can be shown that, despite the data disguising, the scalar product between the two vectors is approximately preserved;

The range of Sim(u,v) is [-1,1]. The larger the value, the more similar the two users or services are; based on the above similarity values, the QoS value of service i observed by user u can be directly predicted from the similar users of user u by the following equation:

As a preferred technical solution of the present invention, in the fifth step, the result prediction is specifically: after collaborative filtering yields the QoS values of a certain service, the QoS values of other users for the same service are searched and the user with the closest values is selected, which indicates that the two users have similar interests; on this basis a similar recommendation is made, and the corresponding value of the latter user is used as the prediction result for the former user.

Compared with the prior art, the present invention has the following beneficial effects: the neighborhood-based collaborative filtering method for privacy protection collaborative Web service quality prediction according to the present invention proposes a privacy-preserving collaborative QoS prediction framework, which can protect users' private data while retaining the ability to generate accurate QoS predictions. The present invention introduces differential privacy as a preprocessing step for QoS prediction for the first time. Differential privacy is a rigorous and provable privacy protection technique, with which users obtain maximum privacy protection while data availability is ensured. The present invention implements the proposed method on the basis of a general technique called the Laplace mechanism and conducts extensive experiments on real data sets to study its performance. The trade-off between privacy and accuracy was evaluated under different conditions, and the results show that, under some constraints, the present invention achieves better performance than the baseline. The present invention has the following main advantages:

1. In the method proposed by the present invention, the privacy protection algorithm can be parameterized so that its predictions match those of its non-private counterpart. Although some specialized analysis is required, the method itself is relatively straightforward and readily applicable.

2. By integrating privacy protection into the application, unrestricted access to the original data can be provided, as long as the final output reveals substantially less than the entire data set and meets the privacy criteria.

3. The present invention tests the method with a real data set. The results show that the prediction accuracy obtained with the disguised data is very close to that obtained with the users' private data.

DRAWINGS

The invention will now be further described with reference to the drawings and embodiments.

FIG. 1 is a schematic flow chart of the neighborhood-based collaborative filtering method for privacy protection collaborative Web service quality prediction according to the present invention.

FIG. 2 is a schematic diagram of the privacy protection collaborative QoS prediction model.

FIG. 3 is a schematic diagram comparing privacy and accuracy between the QoS prediction based on differential privacy and the original methods under different privacy levels in the experiment of the present invention; FIG. 3(a) shows response time, and FIG. 3(b) shows throughput.

FIG. 4 is a schematic diagram comparing the impact of the number of services between the QoS prediction based on differential privacy and the original methods under different privacy levels in the experiment of the present invention; FIG. 4(a) shows response time, and FIG. 4(b) shows throughput.

FIG. 5 is a schematic diagram comparing the impact of the number of users between the QoS prediction based on differential privacy and the original methods under different privacy levels in the experiment of the present invention; FIG. 5(a) shows response time, and FIG. 5(b) shows throughput.

FIG. 6 is a schematic diagram showing the accuracy comparison between the QoS prediction based on differential privacy and the original methods under different privacy levels in the experiment of the present invention; FIG. 6(a) shows response time, and FIG. 6(b) shows throughput.

Detailed description

The invention will now be described in further detail with reference to the drawings. These drawings are simplified schematic diagrams, and only the basic structure of the present invention is illustrated in a schematic manner, and thus only the configurations related to the present invention are shown.

First, the system model and problem definition

Differential privacy

It is necessary to distinguish differential privacy from traditional cryptosystems. Differential privacy gives a rigorous quantitative definition of privacy leakage under a very strict attack model, and it has been shown that, based on the idea of differential privacy, users can maximize privacy protection while ensuring data availability. The biggest advantage of this method is that, although the data are distorted, the noise required for the perturbation is independent of the data size, so a high level of privacy protection can be achieved by adding a very small amount of noise. Although many privacy protection methods have been proposed, such as k-anonymity and l-diversity, differential privacy is still considered the most rigorous and robust privacy protection model because of its solid mathematical foundation.

2.1 Security definition under differential privacy

Differential privacy has two premises. One is that the output of any computation (such as SUM) should not be noticeably affected by operations such as inserting or deleting a record. The other is that it gives a strict quantitative definition of privacy leakage under a very strict attack model: an attacker cannot distinguish between records with a probability greater than ε, even if she knows the entire data set except the target. The formal definition is as follows:

Definition 1: (ε-differential privacy) A randomized function K gives ε-differential privacy if, for all data sets D1 and D2 differing on at most one element and all S ⊆ Range(K),

$$\Pr[K(D_1) \in S] \le \exp(\varepsilon) \cdot \Pr[K(D_2) \in S]$$

Here D is a database of rows, D1 is a subset of D2, and the larger data set D2 contains exactly one additional row. The probability Pr[·] is taken over the coin flips of K. The privacy parameter ε > 0 is public, and a smaller ε gives a stronger privacy guarantee.

Since differential privacy is defined probabilistically, any method that achieves it must be randomized. Some of these methods rely on the addition of controlled noise, such as the Laplace mechanism [C. Dwork, F. McSherry, K. Nissim and A. Smith. Calibrating noise to sensitivity in private data analysis. TCC 2006: 265-284]. Others, such as the exponential mechanism and posterior sampling, sample from a problem-dependent distribution. The construction is explained in detail in the following sections.

2.2 Laplace mechanism of global sensitivity

In addition to the definition of differential privacy, Dwork et al. [C. Dwork, F. McSherry, K. Nissim and A. Smith. Calibrating noise to sensitivity in private data analysis. TCC 2006: 265-284] also show that differential privacy can be achieved by adding random noise drawn from the Laplace distribution. If the probability density function of a random variable x is

$$f(x \mid \mu, b) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right)$$

then the random variable has a Laplace(μ, b) distribution, where μ and b are the location and scale parameters, respectively. For the sake of simplicity, we set μ = 0, so the distribution can be regarded as a symmetric exponential distribution with standard deviation $\sqrt{2}b$.

In order to add noise obeying the Laplace distribution, let b = Δf/ε and denote the generated noise by

Laplace(Δf/ε)

Here, Δf is the global sensitivity, defined below, and ε is the privacy parameter that controls the privacy level. As the expression shows, the scale of the added noise is proportional to Δf and inversely proportional to ε.

Definition 2: (global sensitivity) For f: D → R^d, the L_k-sensitivity of f is defined as

$$\Delta f = \max_{D_1, D_2} \| f(D_1) - f(D_2) \|_k$$

for all D1, D2 differing on at most one element, where ||·||_k denotes the L_k norm.
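To connect the two definitions, the following sketch checks numerically that Laplace noise with scale b = Δf/ε keeps the ratio of output densities for two neighbouring data sets within exp(ε), which is the condition of Definition 1 for a scalar query. The query values 3.0 and 4.0, the sensitivity of 1.0 and the evaluation grid are hypothetical; this is a spot check on a grid, not a proof.

```python
import math


def laplace_pdf(x, mu, b):
    """Density of the Laplace(mu, b) distribution."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


def max_density_ratio(f_d1, f_d2, sensitivity, epsilon, grid):
    """Largest ratio pdf(s | f(D1)) / pdf(s | f(D2)) over the output grid.

    f_d1 and f_d2 are the true query answers on two neighbouring data sets,
    so |f_d1 - f_d2| must not exceed the sensitivity.
    """
    b = sensitivity / epsilon
    return max(laplace_pdf(s, f_d1, b) / laplace_pdf(s, f_d2, b) for s in grid)


epsilon = 0.5
sensitivity = 1.0                            # hypothetical global sensitivity
grid = [x / 10.0 for x in range(-200, 201)]  # candidate outputs from -20 to 20
ratio = max_density_ratio(3.0, 4.0, sensitivity, epsilon, grid)
bound = math.exp(epsilon)
```

In this example `ratio` reaches the bound for outputs on the far side of the first answer, which shows that the scale Δf/ε is exactly large enough for the chosen ε.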

3.1 system model

Random perturbation has been shown to be unsafe [S. Zhang, J. Ford and F. Makedon. Deriving Private Information from Randomly Perturbed Ratings. SDM 2006: 59-69] because the original data can be inferred by clustering techniques, but the system model proposed in [J. Zhu, P. He, Z. Zheng and M. R. Lyu. A Privacy-Preserving QoS Prediction Framework for Web Service Recommendation. ICWS 2015: 241-248] is mature and suitable for many scenarios, so that model is adopted here. Specifically, as shown in FIG. 2, each user (USER1, USER2, ..., USERn) locally invokes services, collects the QoS values she observes and disguises them, and then sends all disguised QoS values to the server (SERVER). The QoS values can be uploaded safely because the server cannot derive any sensitive personal information from the disguised data. However, the data disguising scheme must still allow the server to perform collaborative filtering (neighborhood-based or model-based) on the disguised data. Based on the predicted QoS values (QoS Prediction), the server can run various applications, such as QoS-based selection, composition and recommendation.

Data disguising is a key component of QoS prediction for privacy-protected collaborative Web services. The basic idea of data disguising is to randomly perturb the raw data such that the following properties hold:

a) the randomness should ensure that sensitive information (e.g. the QoS values of each individual user) cannot be derived from the perturbed data;

b) although the information about each individual is limited, when the number of users is very large, the aggregated information of these users can still be estimated with high accuracy.

This property is useful for calculations based on aggregated information. Without knowing the exact value of a single data item, we can still produce meaningful results because the aggregated information needed can be estimated from the perturbed data.

Another focus of our approach is the trade-off between accuracy and privacy. The more randomness is added, the greater the gap between the disguised data and the original data, which provides a higher level of privacy protection. Conversely, the less randomness is added, the more the characteristics of the original data are preserved, so computations based on these data give more accurate results. Balancing accuracy and privacy is an open question. In the present invention, privacy is parameterized as ε, which is given by each user. By exploiting differential privacy, the random noise added to the observed QoS values is the minimum required for the specified privacy level while keeping the data reasonably accurate.
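The trade-off can be made concrete with a small sketch that tabulates the standard deviation of Laplace(Δf/ε) noise, which equals sqrt(2)·Δf/ε, over the ε values used later in the experiments (0.5 to 4 in steps of 0.5). The sensitivity of 1.0 is a hypothetical placeholder, and `laplace_noise_std` is an illustrative helper name.

```python
import math


def laplace_noise_std(sensitivity, epsilon):
    """Standard deviation of Laplace noise with scale b = sensitivity / epsilon."""
    b = sensitivity / epsilon
    return math.sqrt(2.0) * b


sensitivity = 1.0  # hypothetical value of delta_f
epsilons = [0.5 * k for k in range(1, 9)]  # 0.5, 1.0, ..., 4.0
noise_levels = [laplace_noise_std(sensitivity, eps) for eps in epsilons]

for eps, std in zip(epsilons, noise_levels):
    # Smaller epsilon means stronger privacy and therefore more noise.
    print(f"epsilon={eps:.1f}  noise std={std:.3f}")
```

Halving ε doubles the noise level, so a user who asks for stronger privacy accepts a proportionally larger distortion of each reported value.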

Second, the neighborhood-based collaborative filtering method for privacy protection collaborative Web service quality prediction

As shown in FIG. 1 , a neighborhood-based collaborative filtering method for privacy protection collaborative Web service quality prediction according to the present invention includes the following steps:

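The first step, data collection: each user locally collects quality of service values, i.e. QoS values;

The second step, normalization: z-score normalization is performed on the QoS values;

The third step, data disguising: the QoS values are disguised;

The fourth step, collaborative filtering: neighborhood-based collaborative filtering is applied to the disguised QoS values;

The fifth step, result prediction: the result is predicted from the collaboratively filtered QoS values.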

In the second step, normalization, the user performs z-score normalization on the observed QoS data in order to eliminate differences between users' data and improve accuracy, using the following equation:

$$q_{ui} = \frac{r_{ui} - \bar{r}_u}{\omega_u}$$

In the third step, data disguising, the normalized QoS value is disguised according to the following formula:

Q_ui = q_ui + Laplace(Δf/ε)

where ε is a privacy parameter given by each user u; by using differential privacy, the random noise added to the observed QoS values is the minimum required for the specified privacy level while keeping the data reasonably accurate. Δf is defined according to the distribution of QoS values, that is, Δf = max(r_ui - r_uj), where r_ui represents the QoS value collected by user u for Web service i, and r_uj represents the QoS value collected by user u for Web service j.

The meaning of Laplace() is given as follows:

If the probability density function of a random variable x is

$$f(x \mid \mu, b) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right)$$

then the random variable x has a Laplace(μ, b) distribution, where μ and b are the location parameter and the scale parameter, respectively; let μ = 0, so that the distribution can be regarded as a symmetric exponential distribution with standard deviation $\sqrt{2}b$; in order to add noise obeying the Laplace distribution, let b = Δf/ε, and the generated noise is denoted Laplace(Δf/ε).

After disguising, the user sends the disguised value Q_ui to the server, and the randomness hides the sensitive information of the original data q_ui. However, it is still possible to estimate aggregated information about the users. Therefore, QoS can be predicted by directly using Q_ui.

The data is disguised by randomly perturbing the original data; the randomness should ensure that sensitive information, including the QoS values of each individual user, cannot be derived from the perturbed data; when the number of users is very large, the aggregated information of these users can still be estimated with high accuracy.

Data disguising based on differential privacy: we use r_ui to denote the QoS value collected by user u for Web service i, and r_u to denote the entire vector of QoS values observed by user u; similarly, I_ui and I_u denote, respectively, the binary indicator element and the indicator vector for the existence of QoS values, and c_u = |I_u| is the number of QoS values observed by user u. In our discussion, differential privacy is the key technology for data disguising. The Laplace mechanism [C. Dwork, F. McSherry, K. Nissim and A. Smith. Calibrating noise to sensitivity in private data analysis. TCC 2006: 265-284] achieves ε-differential privacy by adding noise drawn from the Laplace distribution.

Definition 3: (Laplace mechanism [C. Dwork. Differential privacy. Encyclopedia of Cryptography and Security. 2011: 338-340]) Given a function g: D → R^d, the following computation maintains ε-differential privacy:

X = g(x) + Laplace(Δf/ε)

where ε is the privacy parameter that controls the privacy level, and a smaller ε provides a stronger privacy guarantee. Δf is the global sensitivity.

Here, we calculate Δf using the L_1-norm:

$$\Delta f = \max_{D_1, D_2} \| g(D_1) - g(D_2) \|_1$$

For the sake of simplicity, the ε-differential privacy of each user u is achieved by the following equation:

R_ui = r_ui + Laplace(Δf/ε)

where Δf is defined as the maximum difference between the QoS values of user u, i.e.:

Δf = max(r_ui - r_uj)
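The user-side disguising described above can be sketched as follows. The function name `disguise` and the sample values are hypothetical, and the Laplace noise is drawn as the difference of two exponential variables, which is one standard way to sample it rather than a method stated in the text. Since the maximum of r_ui - r_uj over all pairs equals the largest value minus the smallest, Δf is computed that way.

```python
import random


def disguise(values, epsilon, rng=random):
    """Add Laplace(delta_f / epsilon) noise to each of one user's QoS values.

    delta_f is the maximum difference between the user's own values,
    i.e. max(values) - min(values). Returns the disguised values and delta_f.
    """
    delta_f = max(values) - min(values)
    b = delta_f / epsilon
    if b == 0:
        # All values identical: the sensitivity is zero and no noise is added.
        return list(values), delta_f
    noisy = [
        v + rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
        for v in values
    ]
    return noisy, delta_f


# Hypothetical QoS values of one user, disguised with epsilon = 0.5.
observed = [0.2, 0.4, 0.9, 1.5]
disguised, delta_f = disguise(observed, epsilon=0.5, rng=random.Random(1))
```

Only `disguised` would be sent to the server in this sketch; `observed` never leaves the user's machine.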

The fourth step is based on neighborhood-based collaborative filtering. Collaborative filtering (CF) is a mature technology adopted by most modern recommendation systems. In the QoS prediction of collaborative Web services, the user is required to provide the observed QoS value of the service used by the user to the recommendation system. Based on the collected QoS values, the recommendation system can predict the QoS of all available services for the user through some high quality algorithms. The more QoS values provided by the user, the higher the prediction accuracy. In the present invention, we use neighborhood-based collaborative filtering by:

Two types of similarity are calculated to improve prediction accuracy: user similarity and service similarity. In particular, the similarity between two users u and v is calculated from the services that both have invoked, using the following equation:

$$Sim(u,v) = \frac{\sum_{i \in S}(r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in S}(r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in S}(r_{v,i} - \bar{r}_v)^2}}$$

where S = S_u ∩ S_v is the set of services invoked by both user u and user v, r_u,i is the QoS value of service i observed by user u, and $\bar{r}_u$ is the average QoS value of all services observed by user u.
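A minimal sketch of this user similarity follows, assuming each user's observations are stored as a dictionary from a service identifier to a QoS value (a hypothetical data layout). Following the definitions above, the sums run over the co-invoked services S while each mean is taken over all services the user observed. Returning 0.0 when fewer than two services are shared or a denominator is zero is an assumption made for the example.

```python
import math


def pcc_similarity(r_u, r_v):
    """Pearson-style similarity between two users over co-invoked services.

    r_u and r_v map service identifiers to observed QoS values.
    """
    common = set(r_u) & set(r_v)
    if len(common) < 2:
        return 0.0  # assumption: too little overlap to judge similarity
    mean_u = sum(r_u.values()) / len(r_u)  # mean over all services seen by u
    mean_v = sum(r_v.values()) / len(r_v)
    num = sum((r_u[i] - mean_u) * (r_v[i] - mean_v) for i in common)
    den_u = math.sqrt(sum((r_u[i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((r_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0
    return num / (den_u * den_v)


# Hypothetical observations: v sees exactly twice the values u sees.
user_u = {"s1": 1.0, "s2": 2.0, "s3": 3.0}
user_v = {"s1": 2.0, "s2": 4.0, "s3": 6.0}
sim_uv = pcc_similarity(user_u, user_v)
```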

However, because the QoS values are disguised, the server has only the disguised QoS value Q_ui instead of the true value q_ui. Therefore, we consider using Q_ui to approximate the similarity values as follows.

According to the z-score normalization,

$$q_{ui} = \frac{r_{ui} - \bar{r}_u}{\omega_u}$$

and by substituting this into the similarity calculation, the similarity can be calculated as

$$Sim(u,v) = \frac{\sum_{i \in S} q_{ui}\, q_{vi}}{\sqrt{\sum_{i \in S} q_{ui}^2}\,\sqrt{\sum_{i \in S} q_{vi}^2}}$$

Again, we observed that during z normalization,

Then it is easy to obtain

Next, we show that, despite the data disguising, the scalar product between two vectors is approximately preserved. For the sake of clarity, we denote the two vectors as a = (a_1, a_2, ..., a_n) and b = (b_1, b_2, ..., b_n). After disguising, the two vectors become A = (A_1, A_2, ..., A_n) and B = (B_1, B_2, ..., B_n). We have

$$A \cdot B = \sum_i A_i B_i = \sum_i \left(a_i + Laplace(\Delta f_a/\varepsilon_a)\right)\left(b_i + Laplace(\Delta f_b/\varepsilon_b)\right)$$

$$= \sum_i a_i b_i + \sum_i a_i\, Laplace(\Delta f_b/\varepsilon_b) + \sum_i b_i\, Laplace(\Delta f_a/\varepsilon_a) + \sum_i Laplace(\Delta f_a/\varepsilon_a)\, Laplace(\Delta f_b/\varepsilon_b)$$

Since a_i and Laplace(Δf_b/ε_b) are independent, and Laplace(Δf_b/ε_b) follows a symmetric exponential distribution with μ = 0, we can derive ∑a_i Laplace(Δf_b/ε_b) ≈ 0. Similarly, ∑b_i Laplace(Δf_a/ε_a) ≈ 0, and ∑Laplace(Δf_a/ε_a)Laplace(Δf_b/ε_b) ≈ 0 can also be obtained because the two noise terms are independent. Therefore, we derive the following equation:

A·B ≈ ∑a_i b_i = a·b
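The approximation can be checked empirically with a small simulation. Everything here is illustrative: the vectors are synthetic, the noise scale of 2.0 stands in for Δf/ε, and the products are averaged over n entries because the cross terms cancel only relative to n (the sum itself grows like n, while its noise-induced error grows like the square root of n).

```python
import random


def laplace(b, rng):
    """One Laplace(0, b) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


rng = random.Random(42)
n = 100_000
scale = 2.0  # hypothetical delta_f / epsilon, the same for both users

# Two synthetic zero-mean vectors with correlation of about 0.6.
a = [rng.gauss(0.0, 1.0) for _ in range(n)]
b_vec = [0.6 * x + 0.8 * rng.gauss(0.0, 1.0) for x in a]

# Each side adds its own independent Laplace noise.
A = [x + laplace(scale, rng) for x in a]
B = [y + laplace(scale, rng) for y in b_vec]

# Per-entry average of the scalar products, before and after disguising.
dot_plain = sum(x * y for x, y in zip(a, b_vec)) / n
dot_noisy = sum(x * y for x, y in zip(A, B)) / n
```

With short vectors the same comparison shows a much larger gap, which matches the remark that aggregated information is accurate only when a large amount of data is combined.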

In addition, we can also obtain:

Note that the range of Sim(u,v) is [-1,1], and the larger the value, the more similar the two users (or services). Based on the above similarity values, the QoS value of service i observed by user u can be directly predicted from the similar users of user u by the following equation:

As with user-based QoS prediction, item-based (service-based) QoS prediction can be computed in the same way, and the two approaches can be combined to improve the accuracy of QoS prediction.
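The prediction equation itself is not reproduced in this text, so the following sketch uses a common neighborhood-based form as an assumption: the normalized value for user u on service i is the similarity-weighted average of the disguised values reported by the top-k most similar users with positive similarity, and the result is mapped back to QoS units with the user's own mean and standard deviation. The function names and sample numbers are hypothetical, and k = 20 mirrors the UIPCC setting in Table 2.

```python
def predict_normalized(sims, neighbor_values, k=20):
    """Similarity-weighted average over the top-k positively similar users.

    sims maps a neighbour id to Sim(u, v); neighbor_values maps a neighbour
    id to that neighbour's (disguised, normalized) value for the service.
    """
    candidates = [
        (s, neighbor_values[v])
        for v, s in sims.items()
        if s > 0 and v in neighbor_values
    ]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    top = candidates[:k]
    total = sum(s for s, _ in top)
    if total == 0:
        # No usable neighbour: fall back to the user's mean, which is 0
        # in normalized units.
        return 0.0
    return sum(s * q for s, q in top) / total


def denormalize(q_hat, mean_u, omega_u):
    """Map a normalized prediction back to the user's original QoS scale."""
    return mean_u + omega_u * q_hat


# Hypothetical similarities and neighbour values for one target service.
sims = {"v1": 0.5, "v2": 0.5, "v3": -0.9}
values = {"v1": 1.0, "v2": 3.0, "v3": 100.0}
q_hat = predict_normalized(sims, values)
r_hat = denormalize(q_hat, mean_u=1.0, omega_u=0.5)
```

Negative similarities are ignored here, so the dissimilar neighbour `v3` does not influence the result; a service-based predictor could be written symmetrically and blended with this one.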

In the fifth step, result prediction, after collaborative filtering yields the QoS values of a certain service, the QoS values of other users for the same service are retrieved and the user with the closest values is selected, which indicates that the two users have similar interests. To make a similar recommendation, the corresponding value of the latter user is used as the prediction result for the former user.

Third, the experiment

In this section, we conducted three series of experiments on real data sets to evaluate our privacy-protected QoS prediction framework. The first series of experiments investigated the balance between privacy and accuracy when using the proposed method. The other two series of experiments examined some important data characteristics, including the effect of size and density on the performance of our method.

Table 1, data set statistics

3.1 experimental configuration

We first note that [Z. Zheng, Y. Zhang and M. R. Lyu. Investigating QoS of Real-World Web Services. TSC 2014 7(1): 32-39; Z. Zheng, Y. Zhang and M. R. Lyu. Distributed QoS Evaluation for Real-World Web Services. ICWS 2010: 83-90] introduced a real Web service QoS data set, which includes the QoS values of 5,825 real Web services observed by 339 users. This data set is very useful for studying the accuracy of QoS prediction. Based on the data set, we focus on two representative QoS attributes: response time (RT) and throughput (TP). Table 1 gives the statistics of the data set, where AVE and STD are the mean and standard deviation, respectively, and density is the ratio of observed data to all data. More details of the data set can be found in the two references cited above.

We use cross-validation to train and evaluate QoS predictions. The data set here is relatively complete, but in practice, due to limited time and resources, users usually only call a small number of services, and the data density is generally below 10%. To simulate this sparsity in our experiments, we randomly removed entries from the complete data set, leaving only the historical QoS values of smaller density as our training set. The deleted data is used as a test set for accuracy assessment.
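The sparsification step can be sketched as a random split of the observed entries. The matrix layout (a dictionary keyed by user and service index), the function name and the 10% density are illustrative assumptions; the small synthetic matrix only stands in for the real data set.

```python
import random


def split_by_density(matrix, density, rng):
    """Keep a random `density` fraction of entries for training.

    matrix maps (user, service) pairs to QoS values. The removed entries
    form the test set used for the accuracy assessment.
    """
    keys = list(matrix)
    rng.shuffle(keys)
    n_train = int(round(density * len(keys)))
    train = {key: matrix[key] for key in keys[:n_train]}
    test = {key: matrix[key] for key in keys[n_train:]}
    return train, test


# Small synthetic stand-in for the user-by-service QoS matrix.
full = {(u, s): float(u + s) for u in range(20) for s in range(50)}
train, test = split_by_density(full, density=0.1, rng=random.Random(0))
```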

Then we run the QoS prediction algorithms on the training set and predict the test set. We implemented and evaluated four algorithms. UIPCC, proposed in [Z. Zheng, H. Ma, M. R. Lyu and I. King. WSRec: A Collaborative Filtering Based Web Service Recommender System. ICWS 2009: 437-444], is a representative implementation of neighborhood-based collaborative filtering. MF [Z. Zheng, H. Ma, M. R. Lyu and I. King. QoS-aware web service recommendation by collaborative filtering. TSC 2011, 4(2): 140-152] is the implementation of model-based collaborative filtering. LUIPCC and LMF are the two corresponding methods that integrate differential privacy through the Laplace mechanism.

To quantify the accuracy of QoS prediction, we use the root mean square error (RMSE), a metric widely used in related work (e.g. [A. Berlioz, A. Friedman, M. A. Kaafar, R. Boreli and S. Berkovsky. Applying differential privacy to matrix factorization. RECSYS 2015: 107-114; F. McSherry and I. Mironov. Private recommender systems: building privacy into the net. SIGKDD 2009: 627-636]):

$$\mathrm{RMSE} = \sqrt{\frac{1}{|R|} \sum_{(u,i) \in R} \left(q'_{ui} - q_{ui}\right)^2}$$

R consists of all the entries that need to be predicted, that is, the entries removed from the training set, and |R| is the number of elements in R. q'_ui is the predicted value for an entry of R, and q_ui is the corresponding value in the test set. In general, the smaller the RMSE, the better the prediction.
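The RMSE metric described above can be computed with a few lines of NumPy. The sketch below assumes that entries outside R are marked with NaN in the test matrix; that convention and the function name `rmse` are illustrative choices, not part of the original method.

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean square error over the entries of R.
    Entries of `actual` that are NaN are treated as outside R."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    in_r = ~np.isnan(actual)                  # membership in R
    diff = predicted[in_r] - actual[in_r]     # q'_ui - q_ui
    return float(np.sqrt(np.mean(diff ** 2)))
```

Because the error is squared before averaging, a few large mispredictions raise the RMSE more than many small ones.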

The default parameter settings are shown in Table 2. We choose the parameters of UIPCC and MF based on experience. By default, ε is set to 0.5, which provides sufficient privacy protection.

Table 2. Parameter settings

UIPCC	k=20	λ=0.1	--
MF	d=20	γ=0.001	λ'=0.01
Laplace	ε=0.5	--	--

3.2 Privacy and accuracy

Figure 3 compares, for RT and TP, our differential-privacy-based QoS prediction with the original methods under different privacy levels. By introducing differential privacy into QoS prediction, users obtain privacy protection, but users who adopt our approach need to balance privacy against accuracy. At one extreme, users can get more privacy protection by adding more Laplacian noise, which necessarily reduces the utility of the data. At the other extreme, users keep the full accuracy of the original method by adding no Laplacian noise at all. To study how accuracy changes, we ran the QoS prediction algorithms on the training set and predicted the test set while increasing the privacy parameter ε from 0.5 to 4 in steps of 0.5. We observe that the RMSE of both LUIPCC and LMF falls as ε increases. A larger ε means a more relaxed privacy constraint, so the utility of the data is less restricted and users get better accuracy. It is also worth noting that when ε is large in Figure 3 (for example, greater than 2.0), our privacy-preserving methods LUIPCC and LMF achieve almost the same accuracy as UIPCC, or even higher. In particular, when ε is greater than 4, the prediction accuracy of LMF is better than that of UIPCC. In addition, we found that MF is better than UIPCC, which demonstrates the superiority of the model-based approach in capturing the latent structure of QoS data.

Another fact that deserves attention is that although a recent work [J. Zhu, P. He, Z. Zheng and M. R. Lyu. A Privacy-Preserving QoS Prediction Framework for Web Service Recommendation. ICWS 2015: 241-248] claims better performance than the original algorithms (UIPCC and MF), the randomness it adds to prevent information leakage is not large enough: by applying clustering [S. Zhang, J. Ford and F. Makedon. Deriving Private Information from Randomly Perturbed Ratings. SDM 2006: 59-69], an adversary can accurately infer a user's private data.
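The trade-off between ε and the amount of noise can be illustrated with a small sketch. It normalizes one user's QoS vector, adds Laplace noise with scale Δf/ε, and records the average noise magnitude over the same ε range as the experiment. The synthetic response times, the placeholder value `delta_f = 1.0`, and the helper names `z_score` and `disguise` are assumptions made for illustration only.

```python
import numpy as np

def z_score(r):
    """Normalize one user's QoS vector to zero mean and unit variance."""
    return (r - r.mean()) / r.std()

def disguise(q, delta_f, epsilon, rng):
    """Add Laplace noise with scale b = delta_f / epsilon to normalized values."""
    return q + rng.laplace(loc=0.0, scale=delta_f / epsilon, size=q.shape)

rng = np.random.default_rng(0)
r = rng.gamma(shape=2.0, scale=0.5, size=5000)   # synthetic response times, not the real data
q = z_score(r)
delta_f = 1.0                                     # placeholder sensitivity (assumption)
epsilons = np.arange(0.5, 4.5, 0.5)               # 0.5, 1.0, ..., 4.0

# Average absolute noise per entry for each epsilon; expected to shrink roughly as 1/epsilon.
noise_level = [float(np.mean(np.abs(disguise(q, delta_f, e, rng) - q))) for e in epsilons]
```

Since the mean absolute value of Laplace noise equals its scale b = Δf/ε, doubling ε roughly halves the perturbation applied to each value, which fits the lower RMSE observed for larger ε.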

In summary, our differential-privacy-based algorithms provide privacy-preserving QoS prediction with a tunable privacy parameter. The results show that, under loose privacy constraints, predictions from our disguised user data are very close to those obtained from the users' private data.

3.3 Impact of data size

To assess the impact of data size, we designed experiments that vary the number of services and the number of users. In Figure 4, the number of users is fixed at 339 and the number of services is varied from 1000 to 5000 in steps of 1000, with the services randomly selected from the original data set. The other parameter settings for the experiment are shown in Table 2. Figure 5 uses the same experimental setup, with the number of services fixed at 5,825 and the number of users varied.

The number of services and the number of users both have a positive impact on the accuracy of the algorithms: the more data available, the better the prediction.
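One factor that may contribute to this trend for the disguised variants is that zero-mean noise tends to average out when many values are aggregated. The sketch below is an illustrative simulation, not part of the reported experiments. It compares the averaged scalar product of two normalized vectors before and after Laplace disguising, for a small and a large number of co-invoked services. The function name, the noise scale, and the standard normal stand-in data are assumptions.

```python
import numpy as np

def mean_product_error(n_services, scale, trials, rng):
    """Average absolute difference between the mean scalar product of the
    disguised vectors (Q_u, Q_v) and that of the original vectors (q_u, q_v)."""
    errors = []
    for _ in range(trials):
        q_u = rng.standard_normal(n_services)            # stand-in normalized QoS values
        q_v = rng.standard_normal(n_services)
        Q_u = q_u + rng.laplace(0.0, scale, n_services)  # disguised values
        Q_v = q_v + rng.laplace(0.0, scale, n_services)
        errors.append(abs(np.mean(Q_u * Q_v) - np.mean(q_u * q_v)))
    return float(np.mean(errors))

rng = np.random.default_rng(0)
err_small = mean_product_error(100, 2.0, 200, rng)     # few co-invoked services
err_large = mean_product_error(10000, 2.0, 200, rng)   # many co-invoked services
```

In this simulation the error falls as the number of services grows, because the noise cross terms have zero mean and shrink when averaged over more entries.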

Another finding is that although accuracy varies widely across data sizes, the original algorithms and our differential privacy algorithms follow the same trend, for example UIPCC and LUIPCC, or MF and LMF. This means that the noise required for data disguising is independent of the data size, so users can achieve a high level of privacy protection by adding a very small amount of noise.

3.4 Effect of density

In addition to the data size, the density, denoted θ, is also a major factor in the performance of the algorithms. Figure 6 compares accuracy at different densities. Although the effect of density on the original algorithms is not obvious, it has a significant impact on our differential privacy algorithms: data sets with higher density yield better accuracy. This means that density is also a key factor in determining the performance of the differential privacy methods. More importantly, as the density grows, the gap between the traditional methods and our differential privacy methods becomes smaller. Specifically, when the density is set to 5 in Figure 6, the gap between LUIPCC and UIPCC is 5; when the density increases to 30, the gap is reduced to 1. Users are therefore advised to use a higher-density data set to bring the prediction closer to the original result.

Fourth, the conclusion

The present invention is the first to introduce differential privacy into a collaborative Web service QoS prediction framework. Differential privacy gives a rigorous, quantitative definition of privacy leakage under very strict constraints. Based on the idea of differential privacy, users obtain the greatest privacy protection while the availability of the data is still ensured. Experimental results show that the system and method of the present invention provide secure and accurate QoS prediction for collaborative Web services.

In view of the above-described embodiments of the present invention, various changes and modifications may be made by those skilled in the art without departing from the scope of the invention. The technical scope of the present invention is not limited to the contents of the specification, and the technical scope thereof must be determined according to the scope of the claims.

Claims

A neighborhood-based collaborative filtering method for privacy protection collaborative Web service quality prediction, comprising the following steps:

The first step, data collection: each user collects the quality of service value, that is, the QoS value locally;

The second step, normalization: performing z-score normalization on the quality of service values;

The third step, data masquerading: masquerade the quality of service values;

The fourth step, collaborative filtering: apply neighborhood-based collaborative filtering to the masqueraded quality of service values;

The fifth step, result prediction: predict the result according to the collaboratively filtered quality of service values.
The method of claim 1, wherein in the second step, the z-score normalization of the quality of service values uses the following equation:

$$q_{ui} = \frac{r_{ui} - \bar{r}_u}{\omega_u}$$

where r_ui represents the quality of service (QoS) value collected by user u for Web service i, r̄_u is the average of the QoS vector r_u, and ω_u is the standard deviation of the QoS vector r_u; after normalization, the QoS data has zero mean and unit variance.
The method of claim 2, wherein in the third step, the data masquerading masquerades the normalized QoS value according to the following formula:

Q ui =q ui +Laplace(Δf/ε)

where ε is a privacy parameter set by user u, and Δf is defined according to the distribution of the QoS values, i.e.

$$\Delta f = \max_{i,j} \left| r_{ui} - r_{uj} \right|$$

where r_ui represents the QoS value collected by user u for Web service i, and r_uj represents the QoS value collected by user u for Web service j;

The meaning of Laplace() is given as follows:

If the probability density function of a random variable x is

$$f(x \mid \mu, b) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right)$$

then the random variable x has a Laplace(μ, b) distribution, where μ and b are the location parameter and the scale parameter, respectively; let μ = 0, so that the distribution can be regarded as a symmetric exponential distribution with standard deviation √2·b; in order to add noise obeying the Laplace distribution, let b = Δf/ε, and the generated noise is denoted Laplace(Δf/ε).
The method according to claim 3, wherein in the third step, after the data is masqueraded, the user sends the masqueraded value Q ui to the server, and randomly stores the sensitive information of the original data q ui .
The method of claim 3, wherein in the third step, the privacy parameter ε is given by each user, and by using differential privacy, the random noise added to the observed QoS values is minimal relative to the specified privacy level while considerable accuracy is maintained.
The method according to claim 1, wherein in the third step, the data masquerading achieves the purpose of masquerading data by randomly perturbing the original data; the randomness should ensure that sensitive information, including the quality of service values of each individual user, cannot be derived from the perturbed data; and when the number of users is very large, the aggregated information of these users can still be estimated with high accuracy.
The method according to claim 1, wherein in the fourth step, the neighborhood-based collaborative filtering method is as follows: the similarity between two users u and v is calculated from the services they have both invoked, using the following equation:

$$\mathrm{Sim}(u,v) = \frac{\sum_{i \in S} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in S} (r_{u,i} - \bar{r}_u)^2} \, \sqrt{\sum_{i \in S} (r_{v,i} - \bar{r}_v)^2}}$$

where S = S_u ∩ S_v is the set of services invoked by both user u and user v, r_{u,i} is the QoS value of service i observed by user u, and r̄_u is the average QoS value of all services observed by user u;

Use Q ui to approximate the similarity values as follows:

According to z normalization,
And by substituting the formula into the calculation, the similarity is calculated as

During the z normalization,

it can easily be shown that, despite the use of data masquerading, the scalar product property between the two vectors remains the same;

The range of Sim(u,v) is [-1,1]; the larger the value, the more similar the two users or services are; based on the above similarity values, the QoS value of service i observed by user u can be predicted directly; the equation uses the users similar to user u:
The method according to claim 1, wherein in the fifth step, the result prediction is specifically: after the QoS value of a service is obtained through collaborative filtering, the QoS values of other users for the same service are retrieved and the user whose value is closest is selected; this indicates that the two users have similar interests, and the relevant value of the latter user is used as the prediction result for the former user.