WO2019056571A1 - Method for predicting quality of web service - Google Patents

Method for predicting quality of web service

Info

Publication number
WO2019056571A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
qos
service
value
data
Prior art date
Application number
PCT/CN2017/113484
Other languages
French (fr)
Chinese (zh)
Inventor
毛睿
李荣华
陆敏华
王毅
罗秋明
商烁
刘刚
Original Assignee
深圳大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳大学
Publication of WO2019056571A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules

Definitions

  • the invention belongs to the field of computers and relates to a privacy-preserving collaborative filtering method, in particular to a neighborhood-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction.
  • QoS: Quality of Service
  • Quality of Service is widely used to describe the non-functional characteristics of web services.
  • QoS-based techniques for selecting, composing and recommending web services have been discussed extensively in recent papers. These methods presuppose that accurate QoS values for a Web service are always available, but obtaining accurate QoS values is not easy.
  • QoS values published by service providers or third-party communities are inaccurate for service users because they are easily affected by the unpredictable Internet environment.
  • the key idea is personalized collaborative QoS prediction for Web services.
  • the basic idea is that users with similar characteristics tend to observe similar QoS values for the same service, so when the QoS value that a particular user would observe for a web service must be predicted, the values observed by similar users can be used in its place.
  • Homomorphic encryption, which allows computation on ciphertext, is a direct way to achieve privacy.
  • all of these operations not only incur a large computational cost but also require continuous communication between the parties, not to mention the difficulty of carrying out some complex computations in the encrypted domain. Using homomorphic encryption is therefore not feasible for our problem.
  • although random perturbation as a privacy protection method is not secure, it inspires us to design a lightweight and provable form of random perturbation.
  • a privacy-preserving QoS prediction model for users, based on differential privacy, strongly protects private data and has a provable privacy guarantee; it is currently the state of the art in privacy-preserving data techniques. Differential privacy has attracted wide attention because it aims to provide effective ways to minimize the noise added to the original data.
  • McSherry and Mironov [Reference 1] applied differential privacy to collaborative filtering [R. M. Bell and Y. Koren. Scalable collaborative filtering with jointly derived neighborhood interpolation weights. ICDM 2007: 43-52], a common approach in recommender systems. They split the recommendation algorithm into two parts: a learning phase, which is carried out with a differential privacy guarantee, and an individual recommendation phase, which uses the learned results to make individual predictions. Unlike References 1 and 2, the present invention focuses on privacy guarantees for data publishing rather than for knowledge learning, and it explores methods other than those studied in Reference 1, such as latent factor models.
  • the technical problem to be solved by the present invention is to provide a neighborhood-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction. It introduces differential privacy into a collaborative Web service QoS prediction framework for the first time, so that users obtain maximum privacy protection while data availability is ensured.
  • Experimental results show that the method of the present invention provides secure and accurate QoS prediction for collaborative Web services.
  • the present invention provides a neighborhood-based collaborative filtering method for privacy protection collaborative Web service quality prediction, which includes the following steps:
  • the first step, data collection: each user locally collects quality of service (QoS) values;
  • the second step, normalization: z-score normalization is performed on the QoS values;
  • the third step, data disguising: the QoS values are disguised;
  • the fourth step: neighborhood-based collaborative filtering is applied to the disguised QoS values;
  • the fifth step, result prediction: the result is predicted from the collaboratively filtered QoS values.
  • z-score normalization of the QoS values uses the following equation: $q_{ui} = (r_{ui} - \bar{r}_u) / \omega_u$, where $\bar{r}_u$ is the mean of the QoS vector $r_u$
  • $r_{ui}$ is the QoS value collected by user u for web service i
  • $\omega_u$ is the standard deviation of the QoS vector $r_u$; after normalization, the QoS data has zero mean and unit variance.
  • the normalized QoS values are disguised according to the following formula: $Q_{ui} = q_{ui} + \mathrm{Laplace}(\Delta f / \varepsilon)$
  • $\varepsilon$ is a privacy parameter, set by user u
  • $r_{ui}$ is the QoS value collected by user u for web service i
  • $r_{uj}$ is the QoS value collected by user u for web service j;
  • with $\mu = 0$, the Laplace distribution is a symmetric exponential distribution; to add noise that follows the Laplace distribution, let $b = \Delta f / \varepsilon$; generating this noise is written laplace($\Delta f / \varepsilon$).
  • the user sends the disguised value $Q_{ui}$ to the server; the randomization hides the sensitive information in the original data $q_{ui}$.
  • the privacy parameter $\varepsilon$ is chosen by each user; with differential privacy, the random noise added to an observed QoS value is calibrated to the chosen privacy level.
  • data disguising works by randomly perturbing the original data; the randomness should ensure that sensitive information, including the QoS values of each individual user, cannot be derived from the perturbed data.
  • when the number of users is very large, the aggregate information of these users can still be estimated with high accuracy.
  • neighborhood-based collaborative filtering proceeds as follows: the similarity between two users u and v is calculated over the services that both have invoked, using the following equation:
  • $\bar{r}_u$ is the average QoS value of all services observed by user u;
  • the range of Sim(u,v) is [-1,1]; the larger the value, the more similar the two users or services are. Based on these similarity values, the QoS value of service i as observed by user u can be predicted directly, using the users similar to user u:
  • result prediction is as follows: after collaborative filtering yields the QoS value of a service, the QoS values that other users observed for the same service are retrieved and the user with the closest value is selected. A close value indicates that the two users have similar interests, so, as in similarity-based recommendation, the relevant value of the latter user is used as the prediction for the former.
  • Compared with the prior art, the present invention has the following beneficial effects: the neighborhood-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction proposes a privacy-preserving collaborative QoS prediction framework that protects users' private data while retaining the ability to generate accurate QoS predictions.
  • the present invention introduces differential privacy as a preprocessing step for QoS prediction for the first time. Differential privacy is a rigorous and provable privacy protection technique, with which users obtain maximum privacy protection while data availability is ensured.
  • the present invention implements the proposed method based on a general method called Laplace mechanism, and conducts extensive experiments to study its performance on real data sets. The privacy accuracy of the experiment was evaluated under different conditions, and the results show that under some constraints, the present invention can achieve better performance than the baseline.
  • the present invention has the following main advantages:
  • the privacy protection algorithm can be parameterized and used to match the prediction to its non-private analog. Although there are some specialized analytical requirements, the method itself is relatively straightforward and readily available.
  • unconstrained access to the original data can be provided to the user in the event that its final output is substantially less than the entire data set that meets the privacy criteria.
  • the present invention tests the method on a real data set. The results show that the prediction accuracy obtained from the disguised data is very close to that obtained from the users' private data.
  • FIG. 1 is a schematic flow chart of a neighborhood-based collaborative filtering method for privacy protection collaborative Web service quality prediction according to the present invention.
  • FIG. 2 is a schematic diagram of a privacy protection collaborative QoS prediction model.
  • FIG. 3 compares the privacy and accuracy of the differential-privacy-based QoS prediction and the original method under different privacy settings in the experiments of the present invention
  • FIG. 3(a) shows response time (RT)
  • FIG. 3(b) shows throughput (TP).
  • FIG. 4 compares the impact of the number of services on the differential-privacy-based QoS prediction and the original method under different privacy settings in the experiments of the present invention
  • FIG. 4(a) shows response time (RT)
  • FIG. 4(b) shows throughput (TP).
  • FIG. 5 compares the impact of the number of users on the differential-privacy-based QoS prediction and the original method under different privacy settings in the experiments of the present invention
  • FIG. 5(a) shows response time (RT)
  • FIG. 5(b) shows throughput (TP).
  • FIG. 6 compares the accuracy of the differential-privacy-based QoS prediction and the original method under different privacy settings in the experiments of the present invention
  • FIG. 6(a) shows response time (RT)
  • FIG. 6(b) shows throughput (TP).
  • differential privacy gives a rigorous quantitative definition of privacy leakage under a very strict attack model; based on the idea of differential privacy, users can obtain maximum privacy protection while data availability is ensured.
  • the biggest advantage of this method is that, although the data is distorted, the noise required for the perturbation is independent of the data size.
  • although many privacy protection methods have been proposed, such as k-anonymity and l-diversity, differential privacy is still considered the most rigorous and robust privacy protection model because of its solid mathematical foundation.
  • Definition 1 ($\varepsilon$-differential privacy): a randomized function K gives $\varepsilon$-differential privacy if, for all data sets D1 and D2 differing in at most one element and all $S \subseteq \mathrm{Range}(K)$, $\Pr[K(D_1) \in S] \le \exp(\varepsilon) \cdot \Pr[K(D_2) \in S]$
  • D is the database of rows
  • D1 is a subset of D2
  • the larger dataset D2 contains exactly one additional row.
  • the probability Pr[·] is taken over the coin flips of K.
  • the privacy parameter $\varepsilon > 0$ is public, and a smaller $\varepsilon$ gives a stronger privacy guarantee.
  • the random variable has a Laplace($\mu$, b) distribution.
  • $\mu$ and b are the location and scale parameters, respectively.
  • with $\mu = 0$, the distribution is a symmetric exponential distribution
  • $\Delta f$ is the global sensitivity, and the definition is given below.
  • $\varepsilon$ is the privacy parameter that controls the level of privacy. As the equation shows, the added noise is proportional to $\Delta f$ and inversely proportional to $\varepsilon$.
  • the global sensitivity of a function f is $\Delta f = \max_{D_1, D_2} \lVert f(D_1) - f(D_2) \rVert_k$, where D1 and D2 differ in at most one element, and
  • $\lVert \cdot \rVert_k$ denotes the $L_k$ norm.
  • each user (USER1, USER2, ..., USERn) invokes services locally, collects the QoS values she observes, disguises them, and then sends all disguised QoS values to the server (SERVER).
  • the QoS values can then be uploaded safely, because the server cannot derive any personal sensitive information from the disguised data.
  • the data disguising scheme should still allow the server to perform collaborative filtering (neighborhood-based or model-based) on the disguised data.
  • the server can run various applications, such as selection, combining and recommendation based on QoS values.
  • Data masquerading is a key component of QoS prediction for privacy-protected collaborative Web services.
  • the basic idea of data masquerading is to randomly perturb the raw data so that the following properties hold:
  • a) randomness should ensure that sensitive information (eg QoS values for each individual user) cannot be derived from the perturbed data;
  • This property is useful for calculations based on aggregated information. Without knowing the exact value of a single data item, we can still produce meaningful results because the aggregated information needed can be estimated from the perturbed data.
  • a neighborhood-based collaborative filtering method for privacy protection collaborative Web service quality prediction includes the following steps:
  • the first step, data collection: each user locally collects quality of service (QoS) values;
  • the second step, normalization: z-score normalization is performed on the QoS values;
  • the third step, data disguising: the QoS values are disguised;
  • the fourth step: neighborhood-based collaborative filtering is applied to the disguised QoS values;
  • the fifth step, result prediction: the result is predicted from the collaboratively filtered QoS values.
  • $r_{ui}$ is the QoS value collected by user u for web service i
  • $\omega_u$ is the standard deviation of the QoS vector $r_u$; after normalization, the QoS data has zero mean and unit variance.
  • the normalized QoS values are disguised according to the following formula: $Q_{ui} = q_{ui} + \mathrm{Laplace}(\Delta f / \varepsilon)$
  • $\varepsilon$ is a privacy parameter set by user u
  • the privacy parameter $\varepsilon$ is chosen by each user; with differential privacy, the random noise added to an observed QoS value is calibrated to the chosen privacy level.
  • $r_{ui}$ is the QoS value collected by user u for web service i
  • $r_{uj}$ is the QoS value collected by user u for web service j.
  • after disguising, the user sends the disguised value $Q_{ui}$ to the server; the randomization hides the sensitive information in the original data $q_{ui}$. It is still possible, however, to estimate the users' aggregate information, so QoS can be predicted directly from $Q_{ui}$.
  • data disguising works by randomly perturbing the original data; the randomness should ensure that sensitive information, including the QoS values of each individual user, cannot be derived from the perturbed data; when the number of users is very large, the aggregate information of these users can still be estimated with high accuracy.
  • $\varepsilon$ is the privacy parameter that controls the level of privacy; a smaller $\varepsilon$ provides a stronger privacy guarantee.
  • $\Delta f$ is the global sensitivity.
  • $\Delta f$ is defined as the maximum difference between QoS values, i.e.: $\Delta f = \max(r_{ui} - r_{uj})$
  • the fourth step is based on neighborhood-based collaborative filtering.
  • Collaborative filtering (CF) is a mature technology adopted by most modern recommendation systems.
  • the user is required to provide the observed QoS value of the service used by the user to the recommendation system.
  • the recommendation system can predict the QoS of all available services for the user through some high quality algorithms. The more QoS values provided by the user, the higher the prediction accuracy.
  • we use neighborhood-based collaborative filtering by:
  • $r_{u,i}$ is the QoS value of service i observed by user u
  • item-based (service-based) QoS prediction can be calculated in the same way, and the two approaches can be combined to improve the accuracy of QoS prediction.
  • the fifth step, result prediction: after collaborative filtering yields the QoS value of a service, the QoS values that other users observed for the same service are retrieved and the user with the closest value is selected. A close value indicates that the two users have similar interests, so, as in similarity-based recommendation, the relevant value of the latter user is used as the prediction for the former.
  • Table 1 describes the statistics of the data set; AVE and STD are the mean and standard deviation, respectively, and density is the ratio of observed data to all data. More details of the data set can be found in [Z. Zheng, Y. Zhang and M. R. Lyu. Investigating QoS of Real-World Web Services. TSC 2014 7(1): 32-39; Z. Zheng, Y. Zhang and M. R. Lyu. Distributed QoS Evaluation for Real-World Web Services. ICWS 2010: 83-90].
  • RMSE root mean square error
  • R consists of all the values that need to be predicted in the training set, and
  • q' ui is the predicted value of set R, and q ui is the corresponding value in the test set. In general, the smaller the RMSE, the better the prediction.
  • Figure 3 is a comparison of RT and TP between our QoS prediction based on differential privacy and the original method under different privacy.
  • users can obtain privacy protection, but those who adopt our approach need to balance privacy against accuracy. At one extreme, users get more privacy protection by adding more Laplace noise, which inevitably reduces the utility of the data. At the other extreme, users keep the accuracy of the original method by adding no Laplace noise at all.
  • the privacy parameter $\varepsilon$ is increased from 0.5 to 4 in steps of 0.5.
  • our differential privacy based algorithm can provide privacy-protected QoS prediction with parameterized privacy.
  • the results show that, under loose privacy constraints, predictions made from our disguised user data are very close to those made from the users' private data.
  • the number of users is set to 339 and the number of services is varied from 1000 to 5000 in steps of 1000, with the services randomly selected from the original data set.
  • the other parameter settings for the experiment are shown in Table 2.
  • the data density is also a major factor in the performance of the algorithm.
  • Figure 6 shows the results of the accuracy comparison at different densities.
  • density is also a key factor in the performance of the differential privacy method. More importantly, as the number of services grows, the gap between the traditional method and our differential-privacy-based approach becomes smaller. More specifically, when the density is set to 5 in FIG. 6, the gap between LUIPCC and UIPCC is 5; when the density increases to 30, the gap is reduced to 1. Users are therefore advised to use a data set with higher density to bring the prediction closer to the original result.
  • the present invention is the first to introduce differential privacy into a collaborative Web services QoS prediction framework.
  • Differential privacy gives a strict quantitative definition of privacy leakage under very strict constraints.
  • Based on the idea of differential privacy, users can obtain maximum privacy protection while data availability is ensured.
  • Experimental results show that the system and method of the present invention provides secure and accurate QoS prediction for collaborative Web services.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided is a method for predicting the quality of a Web service (QoS). The method comprises the following steps: step one, data collection, involving: each user locally collecting a quality of service (QoS) value; step two, normalization, involving: executing z-score normalization on the QoS value; step three, data disguising, involving: disguising the QoS value; step four, using neighbourhood-based collaborative filtering to process the disguised QoS value; and step five, result prediction, involving: predicting a result according to the QoS value subjected to collaborative filtering. The method can protect the privacy of a user and ensure the availability of data.

Description

[Title of invention established by the ISA under Rule 37.2] Method for predicting the quality of a Web service
Technical field
The invention belongs to the field of computers and relates to a privacy-preserving collaborative filtering method, in particular to a neighborhood-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction.
Background art
Quality of Service (QoS) is widely used to describe the non-functional characteristics of web services. QoS-based techniques for selecting, composing and recommending web services have been widely discussed in recent papers. These methods presuppose that accurate QoS values for a Web service are always available, but obtaining accurate QoS values is not easy. On the one hand, QoS values published by service providers or third-party communities are inaccurate for service users, because they are easily affected by the unpredictable Internet environment. On the other hand, it is impractical for service users to evaluate the QoS of all available services directly, given limits on time, cost and other resources. To address this, the key idea is personalized collaborative QoS prediction for Web services. The basic idea is that users with similar characteristics tend to observe similar QoS values for the same service, so when the QoS value that a particular user would observe for a web service must be predicted, the values observed by similar users can be used in its place.
With this approach, different users are usually given different predicted QoS values for the same service, and the final prediction depends on each user's specific context. Based on these contributed QoS values, various techniques have been used to improve quality, in particular prediction accuracy.
Collaborative Web service QoS prediction has become an important tool for generating accurate personalized QoS values. Although much has been achieved in improving the accuracy of collaborative QoS prediction, not enough has been done to protect user privacy in the process. Observed QoS values may in fact be sensitive, so users may be unwilling to share them with others. For example, the response time reported by a user usually depends on her location, which means that the user's location can be inferred from the QoS information she provides. The question, therefore, is whether a recommender system can make accurate personalized QoS predictions for users while protecting their privacy.
Homomorphic encryption, which allows computation on ciphertext, is a direct way to achieve privacy. However, these operations not only incur a large computational cost but also require continuous communication between the parties, not to mention the difficulty of carrying out some complex computations in the encrypted domain. Using homomorphic encryption is therefore not feasible for our problem.
Another technique is random perturbation, proposed by Polat et al., who claim that accurate recommendations can still be obtained when randomness drawn from a specific distribution is added to the original data to prevent information leakage. However, the range of the randomness α is chosen empirically, and there is no provable privacy guarantee. Moreover, by applying clustering to the perturbed data, an adversary can infer users' private data with an accuracy of up to 70%.
Thus, although random perturbation as a privacy protection method is not secure, it inspires us to design a lightweight and provable form of random perturbation. Specifically, we developed a privacy-preserving QoS prediction model for users based on differential privacy, which strongly protects private data and has a provable privacy guarantee; it is currently the state of the art in privacy-preserving data techniques. Differential privacy has attracted wide research attention because it aims to provide effective ways to minimize the noise added to the original data.
Although differential privacy has received wide attention, its application to QoS prediction is still quite limited. Reference 1 [F. McSherry and I. Mironov. Differentially private recommender systems: building privacy into the net. SIGKDD 2009: 627-636] and Reference 2 [A. Machanavajjhala, A. Korolova and A. D. Sarma. Personalized social recommendations: accurate or private. PVLDB 2011 4(7): 440-450] describe two privacy-preserving recommender systems based on differential privacy, and are the work most relevant to our problem. Machanavajjhala et al. [Reference 2] studied privacy protection for personalized social recommendations, which rely entirely on the user's social graph. With differential privacy, sensitive links in the social graph can be protected effectively, meaning that an attacker cannot infer the existence of a single link in the graph by passively observing the recommendation results. However, good recommendations can be achieved only under weak privacy parameters, or only for a small fraction of users. McSherry and Mironov [Reference 1] applied differential privacy to collaborative filtering [R. M. Bell and Y. Koren. Scalable collaborative filtering with jointly derived neighborhood interpolation weights. ICDM 2007: 43-52], a common approach in recommender systems. They split the recommendation algorithm into two parts: a learning phase, which is carried out with a differential privacy guarantee, and an individual recommendation phase, which uses the learned results to make individual predictions. Unlike References 1 and 2, the present invention focuses on privacy guarantees for data publishing rather than for knowledge learning, and it explores methods other than those studied in Reference 1, such as latent factor models.
Summary of the invention
The technical problem to be solved by the present invention is to provide a neighborhood-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction. It introduces differential privacy into a collaborative Web service QoS prediction framework for the first time, so that users obtain maximum privacy protection while data availability is ensured. Experimental results show that the method provides secure and accurate QoS prediction for collaborative Web services.
To solve the above technical problem, the present invention provides a neighborhood-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction, comprising the following steps:
The first step, data collection: each user locally collects quality of service (QoS) values;
The second step, normalization: z-score normalization is performed on the QoS values;
The third step, data disguising: the QoS values are disguised;
The fourth step: neighborhood-based collaborative filtering is applied to the disguised QoS values;
The fifth step, result prediction: the result is predicted from the collaboratively filtered QoS values.
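The fourth and fifth steps can be illustrated with a minimal Python sketch of user-based neighborhood collaborative filtering. It assumes Pearson correlation over co-invoked services as the similarity measure and a similarity-weighted deviation-from-mean predictor; the function names, the dictionary layout and the `k` parameter are illustrative assumptions, not details confirmed by the patent text.

```python
import math


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def pcc_similarity(q_u, q_v):
    """Pearson correlation of two users over the services both have invoked.

    q_u, q_v: dicts mapping a service id to a (disguised) QoS value.
    Each user's mean is taken over all services that user observed.
    """
    common = set(q_u) & set(q_v)
    if len(common) < 2:
        return 0.0  # assumption: too little overlap counts as "not similar"
    mean_u, mean_v = mean(q_u.values()), mean(q_v.values())
    num = sum((q_u[i] - mean_u) * (q_v[i] - mean_v) for i in common)
    den_u = math.sqrt(sum((q_u[i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((q_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0
    return num / (den_u * den_v)


def predict(target, service, users, k=2):
    """Predict the QoS value of `service` for user `target` from the top-k similar users."""
    q_t = users[target]
    neighbors = []
    for name, q_v in users.items():
        if name == target or service not in q_v:
            continue
        sim = pcc_similarity(q_t, q_v)
        if sim > 0:  # keep positively correlated users only
            neighbors.append((sim, q_v))
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbors[:k]
    if not top:
        return mean(q_t.values())  # fallback (assumption): the user's own mean
    weighted = sum(sim * (q_v[service] - mean(q_v.values())) for sim, q_v in top)
    return mean(q_t.values()) + weighted / sum(sim for sim, _ in top)
```

Under these assumptions the server only ever sees the disguised values, so `users` would hold the values uploaded after the third step rather than the raw observations.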
As a preferred technical solution of the present invention, in the second step, z-score normalization of the QoS values uses the following equation:
$$q_{ui} = \frac{r_{ui} - \bar{r}_u}{\omega_u}$$
where $r_{ui}$ denotes the QoS value collected by user $u$ for web service $i$, $\bar{r}_u$ is the mean of the QoS vector $r_u$, and $\omega_u$ is the standard deviation of $r_u$. After normalization, the QoS data have zero mean and unit variance.
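The normalization step can be illustrated with a minimal Python sketch. The function name `z_normalize`, the sample values, and the handling of a zero standard deviation are assumptions made for illustration; the source specifies only the z-score formula itself.

```python
import numpy as np

def z_normalize(r_u):
    """Z-score normalize one user's observed QoS vector r_u.

    Returns the normalized vector q_u together with the mean and the
    standard deviation, which are needed to map predictions back.
    """
    r_u = np.asarray(r_u, dtype=float)
    mean_u = r_u.mean()      # \bar{r}_u
    omega_u = r_u.std()      # population standard deviation \omega_u
    if omega_u == 0.0:
        # Assumption: if all observations are identical, return zeros
        # instead of dividing by zero.
        return np.zeros_like(r_u), mean_u, omega_u
    return (r_u - mean_u) / omega_u, mean_u, omega_u

# Hypothetical response times (in seconds) observed by one user.
r_u = [0.5, 1.5, 2.5, 3.5]
q_u, mean_u, omega_u = z_normalize(r_u)
print(q_u, q_u.mean(), q_u.std())
```

The normalized vector has zero mean and unit variance, as stated in the text; the original value can be recovered as `mean_u + omega_u * q_u`.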
As a preferred technical solution of the present invention, in the third step, the data disguise perturbs the normalized QoS values according to the following formula:
$$Q_{ui} = q_{ui} + \mathrm{Laplace}(\Delta f / \varepsilon)$$
where $\varepsilon$ is the privacy parameter set by user $u$, and $\Delta f$ is defined from the distribution of the QoS values, namely $\Delta f = \max(r_{ui} - r_{uj})$; $r_{ui}$ and $r_{uj}$ denote the QoS values collected by user $u$ for web services $i$ and $j$, respectively.
The meaning of $\mathrm{Laplace}(\cdot)$ is as follows. If the probability density function of a random variable $x$ is
$$f(x \mid \mu, b) = \frac{1}{2b}\exp\left(-\frac{|x - \mu|}{b}\right)$$
then $x$ has a Laplace$(\mu, b)$ distribution, where $\mu$ and $b$ are the location parameter and the scale parameter, respectively. Setting $\mu = 0$, the distribution can be regarded as a symmetric exponential distribution with standard deviation $\sqrt{2}\,b$. To add noise that follows the Laplace distribution, set $b = \Delta f / \varepsilon$; the generation of this noise is denoted $\mathrm{Laplace}(\Delta f / \varepsilon)$.
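A minimal sketch of the disguise step follows, using NumPy's Laplace sampler. The function name `disguise`, the sample values, and the choice of ε are hypothetical; following the formulas in the text, Δf is taken from the user's raw values while the noise is added to the normalized values.

```python
import numpy as np

def disguise(q_u, delta_f, epsilon, rng):
    """Return Q_u = q_u + Laplace(delta_f / epsilon), one noise draw per value."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = delta_f / epsilon                      # scale parameter b
    noise = rng.laplace(loc=0.0, scale=scale, size=len(q_u))
    return np.asarray(q_u, dtype=float) + noise

rng = np.random.default_rng(0)
r_u = np.array([0.5, 1.5, 2.5, 3.5])               # hypothetical raw QoS values
delta_f = r_u.max() - r_u.min()                    # max(r_ui - r_uj)
q_u = (r_u - r_u.mean()) / r_u.std()               # z-score normalization
Q_u = disguise(q_u, delta_f, epsilon=1.0, rng=rng)
print(Q_u)
```

Only `Q_u` would leave the user's machine in this sketch; `r_u` and `q_u` stay local.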
As a preferred technical solution of the present invention, in the third step, after the data are disguised, each user sends the disguised values $Q_{ui}$ to the server, and the randomness protects the sensitive information in the original data $q_{ui}$.
As a preferred technical solution of the present invention, in the third step, the privacy parameter ε is given by each user; by using differential privacy, the random noise added to the observed QoS values is the minimum needed to reach the specified privacy level while maintaining reasonable accuracy.
As a preferred technical solution of the present invention, in the third step, the data disguise works by randomly perturbing the original data. The randomness should guarantee that sensitive information, including the QoS values of each individual user, cannot be derived from the perturbed data, while the aggregate information of the users can still be estimated with high accuracy when the number of users is very large.
As a preferred technical solution of the present invention, in the fourth step, the neighborhood-based collaborative filtering is as follows. The similarity between two users $u$ and $v$ is computed over the services they have both invoked, using the following equation:
$$\mathrm{Sim}(u,v) = \frac{\sum_{i \in S}(r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in S}(r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in S}(r_{v,i} - \bar{r}_v)^2}}$$
where $S = S_u \cap S_v$ is the set of services invoked by both user $u$ and user $v$, $r_{u,i}$ is the QoS value of service $i$ observed by user $u$, and $\bar{r}_u$ is the average QoS value of all services observed by user $u$.
The similarity is approximated using $Q_{ui}$ as follows. According to the z-score normalization, $r_{u,i} - \bar{r}_u = \omega_u q_{ui}$; substituting this into the equation, the similarity is computed as
$$\mathrm{Sim}(u,v) = \frac{\sum_{i \in S} q_{ui}\, q_{vi}}{\sqrt{\sum_{i \in S} q_{ui}^2}\,\sqrt{\sum_{i \in S} q_{vi}^2}}$$
During z-score normalization,
Figure PCTCN2017113484-appb-000010
from which it follows readily that
Figure PCTCN2017113484-appb-000011
It can be shown that the scalar product of two vectors is preserved despite the data disguise; therefore,
Figure PCTCN2017113484-appb-000012
$\mathrm{Sim}(u,v)$ ranges over $[-1, 1]$, and a larger value indicates that the two users (or services) are more similar. Based on these similarity values, the QoS value of service $i$ for user $u$ can be predicted directly from the users similar to $u$, using the following equation:
Figure PCTCN2017113484-appb-000013
As a preferred technical solution of the present invention, in the fifth step, the result is predicted as follows: after collaborative filtering yields the QoS value of a given service, the QoS values of other users for the same service are retrieved and the user with the closest value is selected. This indicates that the two users have similar interests; a similarity-based recommendation is made on this basis, with the corresponding value of the latter user adopted as the prediction for the former user.
Compared with the prior art, the present invention has the following beneficial effects. The neighborhood-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction provides a privacy-preserving collaborative QoS prediction framework that protects users' private data while retaining the ability to generate accurate QoS predictions. The present invention introduces differential privacy as a preprocessing step for QoS prediction for the first time. Differential privacy is a rigorous and provable privacy protection technique with which users can obtain the greatest possible privacy protection while data availability is ensured. The present invention implements the proposed method on the basis of a general approach called the Laplace mechanism and conducts extensive experiments to study its performance on real-world data sets. The trade-off between privacy and accuracy is evaluated under different conditions, and the results show that, under some constraints, the present invention can outperform the baseline. The present invention has the following main advantages:
1. In the proposed method, the privacy-preserving algorithm can be parameterized and used to match the predictions of its non-private counterpart. Although some specialized analysis is required, the method itself is relatively straightforward and accessible.
2. By integrating privacy protection into the application, unrestricted access to the raw data can be provided to the user, on the condition that the final output reveals substantially less than the entire data set and meets the privacy criteria.
3. The present invention tests the method on real data sets. The results show that the prediction accuracy obtained on the disguised data is very close to that obtained on the users' private data.
Brief Description of the Drawings
The present invention is further described below with reference to the drawings and embodiments.
FIG. 1 is a schematic flowchart of the neighborhood-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction according to the present invention.
FIG. 2 is a schematic diagram of the privacy-preserving collaborative QoS prediction model.
FIG. 3 compares privacy against accuracy for differential-privacy-based QoS prediction and the original methods under different privacy levels in the experiments of the present invention; FIG. 3(a) shows response time and FIG. 3(b) shows throughput.
FIG. 4 compares the impact of services on differential-privacy-based QoS prediction and the original methods under different privacy levels in the experiments of the present invention; FIG. 4(a) shows response time and FIG. 4(b) shows throughput.
FIG. 5 compares the impact of users on differential-privacy-based QoS prediction and the original methods under different privacy levels in the experiments of the present invention; FIG. 5(a) shows response time and FIG. 5(b) shows throughput.
FIG. 6 compares the accuracy of differential-privacy-based QoS prediction and the original methods at different densities under different privacy levels in the experiments of the present invention; FIG. 6(a) shows response time and FIG. 6(b) shows throughput.
Detailed Description
The present invention is now described in further detail with reference to the drawings. The drawings are simplified schematic diagrams that illustrate only the basic structure of the present invention, and therefore show only the components relevant to the present invention.
I. System model and problem definition
1. Differential privacy
It is necessary to distinguish differential privacy from traditional cryptosystems. Differential privacy gives a rigorous, quantitative definition of privacy leakage under a very strict attack model, and it has been shown that, with differential privacy, users can obtain the greatest possible privacy protection while data availability is ensured. The main advantage of this approach is that, although the data are distorted, the noise required for the perturbation is independent of the size of the data, so a high level of privacy protection can be achieved by adding a very small amount of noise. Although many privacy protection methods have been proposed, such as k-anonymity and l-diversity, differential privacy, with its solid mathematical foundation, is still regarded as the most rigorous and robust privacy protection model.
2.1 Security definition under differential privacy
Differential privacy rests on two premises. First, the output of any computation (e.g., SUM) should not be significantly affected by operations such as inserting or deleting a record. Second, it gives a rigorous, quantitative definition of privacy leakage under a very strict attack model: an attacker cannot distinguish a record with a probability greater than that bounded by ε, even if she knows the entire data set except the target. The formal definition is as follows:
Definition 1 (ε-differential privacy): A randomized function $K$ gives ε-differential privacy if, for all data sets $D_1$ and $D_2$ differing on at most one element, and all $S \subseteq \mathrm{Range}(K)$,
$$\Pr[K(D_1) \in S] \le \exp(\varepsilon) \times \Pr[K(D_2) \in S]$$
Here $D$ is a database of rows, $D_1$ is a subset of $D_2$, and the larger data set $D_2$ contains exactly one additional row. The probability $\Pr[\cdot]$ is in each case taken over the coin flips (the internal randomness) of $K$. The privacy parameter $\varepsilon > 0$ is public, and a smaller ε yields a stronger privacy guarantee.
Since differential privacy is defined in terms of probability, any mechanism that achieves it is necessarily randomized. Some of these mechanisms rely on adding controlled noise, such as the Laplace mechanism [C. Dwork, F. McSherry, K. Nissim and A. Smith. Calibrating noise to sensitivity in private data analysis. TCC 2006: 265-284]. Others, such as the exponential mechanism and posterior sampling, sample from a problem-dependent distribution. The construction is detailed in the following sections.
2.2 Laplace mechanism with global sensitivity
In addition to defining differential privacy, Dwork et al. [C. Dwork, F. McSherry, K. Nissim and A. Smith. Calibrating noise to sensitivity in private data analysis. TCC 2006: 265-284] also show that differential privacy can be achieved by adding random noise that follows the Laplace distribution. If the probability density function of a random variable $x$ is
$$f(x \mid \mu, b) = \frac{1}{2b}\exp\left(-\frac{|x - \mu|}{b}\right)$$
then the random variable has a Laplace$(\mu, b)$ distribution, where $\mu$ and $b$ are the location parameter and the scale parameter, respectively. For simplicity, we set $\mu = 0$, so the distribution can be regarded as a symmetric exponential distribution with standard deviation $\sqrt{2}\,b$.
To add noise that follows the Laplace distribution, set $b = \Delta f / \varepsilon$; the generation of the noise is denoted
$$\mathrm{Laplace}(\Delta f / \varepsilon)$$
Here, $\Delta f$ is the global sensitivity, defined below, and $\varepsilon$ is the privacy parameter that controls the level of privacy. As the expression shows, the scale of the added noise is proportional to $\Delta f$ and inversely proportional to $\varepsilon$.
Definition 2 (global sensitivity): For $f: D \to \mathbb{R}^d$, the $L_k$-sensitivity of $f$ is defined as
$$\Delta f = \max_{D_1, D_2} \lVert f(D_1) - f(D_2) \rVert_k$$
for all $D_1$, $D_2$ differing on at most one element, where $\lVert \cdot \rVert_k$ denotes the $L_k$ norm.
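The link between the scale $b = \Delta f / \varepsilon$ and the ε-differential-privacy bound can be checked numerically for a scalar query. The sketch below is illustrative only: the query values `f_d1` and `f_d2`, the sensitivity, and the grid of outputs are hypothetical, and the check covers the density ratio at sampled points rather than constituting a proof.

```python
import math

def laplace_pdf(x, mu, b):
    """Density of the Laplace(mu, b) distribution at x."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)

delta_f = 2.0              # assumed global sensitivity of the query
epsilon = 0.5              # privacy parameter
b = delta_f / epsilon      # noise scale used by the Laplace mechanism

# Query answers on two neighbouring data sets; they differ by exactly delta_f.
f_d1, f_d2 = 10.0, 12.0

# Ratio of output densities over a grid of possible noisy outputs.
outputs = [x / 10.0 for x in range(-200, 401)]
max_ratio = max(laplace_pdf(x, f_d1, b) / laplace_pdf(x, f_d2, b) for x in outputs)

print(max_ratio, math.exp(epsilon))
```

The largest ratio on the grid equals $\exp(\varepsilon)$ up to rounding, which is the bound required by Definition 1; a larger ε or a smaller Δf shrinks the noise scale accordingly.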
3.1 System model
[S. Zhang, J. Ford and F. Makedon. Deriving Private Information from Randomly Perturbed Ratings. SDM 2006: 59-69] has shown that random perturbation is not secure, because the original data can be inferred through clustering techniques. However, the system model proposed in [J. Zhu, P. He, Z. Zheng and M. R. Lyu. A Privacy-Preserving QoS Prediction Framework for Web Service Recommendation. ICWS 2015: 241-248] is mature and suitable for many scenarios, so that model is adopted here. As shown in FIG. 2, each user (USER1, USER2, ..., USERn) invokes services and collects QoS values locally, disguises the QoS values she observes, and then sends them to the server (SERVER), which holds all the disguised QoS values. The QoS values can then be uploaded safely, because the server cannot derive any individual's sensitive information from the disguised data. The data disguise scheme should nevertheless still allow the server to perform collaborative filtering (neighborhood-based or model-based) on the disguised data. Based on the predicted QoS values (QoS Prediction), the server can run various applications, such as QoS-based service selection, composition, and recommendation.
Data disguise is the key component of privacy-preserving collaborative Web service QoS prediction. Its basic idea is to perturb the original data randomly such that the following properties hold:
a) the randomness should guarantee that sensitive information (e.g., the QoS values of each individual user) cannot be derived from the perturbed data;
b) although information about individuals is limited, the aggregate information of the users can still be estimated with high accuracy when the number of users is very large.
These properties are useful for computations based on aggregate information. Without knowing the exact value of any individual data item, meaningful results can still be obtained, because the required aggregate information can be estimated from the perturbed data.
Another focus of our method is the trade-off between accuracy and privacy. The larger the random noise, the greater the gap between the disguised data and the original data, which provides a higher level of privacy protection. Conversely, the smaller the noise, the more clearly the characteristics of the data are retained, which for context-based computation means more accurate results. Balancing accuracy against privacy is an open problem. In the present invention, privacy is parameterized by ε, which is given by each user. By using differential privacy, the random noise added to the observed QoS values is the minimum needed to reach the specified privacy level while maintaining reasonable accuracy.
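Properties a) and b) can be illustrated with a small simulation: individual disguised values are far from the originals, yet their average stays close to the true average when many users contribute. The uniform data, the number of users, and the values of Δf and ε are all hypothetical choices made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(42)
n_users = 100_000

# Hypothetical QoS values of one service, one value per user.
true_values = rng.uniform(0.0, 2.0, size=n_users)

# Laplace noise with scale delta_f / epsilon (delta_f = 2.0, epsilon = 0.5).
scale = 2.0 / 0.5
disguised = true_values + rng.laplace(0.0, scale, size=n_users)

# Average distortion of a single user's value (large, so individuals are hidden).
individual_error = np.abs(disguised - true_values).mean()

# Error of the aggregate (small, so the server can still estimate the mean).
aggregate_error = abs(disguised.mean() - true_values.mean())

print(individual_error, aggregate_error)
```

The individual distortion is on the order of the noise scale, whereas the error of the mean shrinks roughly with the square root of the number of users.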
II. Neighborhood-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction according to the present invention
As shown in FIG. 1, the neighborhood-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction according to the present invention comprises the following steps:
Step 1, data collection: each user locally collects quality-of-service values, i.e., QoS values;
Step 2, normalization: z-score normalization is performed on the QoS values;
Step 3, data disguise: the QoS values are disguised;
Step 4, neighborhood-based collaborative filtering is performed on the disguised QoS values;
Step 5, result prediction: the result is predicted from the collaboratively filtered QoS values.
In the normalization of Step 2, to eliminate differences between users' data and improve accuracy, each user performs z-score normalization on the observed QoS data, using the following equation:
$$q_{ui} = \frac{r_{ui} - \bar{r}_u}{\omega_u}$$
where $r_{ui}$ denotes the QoS value collected by user $u$ for web service $i$, $\bar{r}_u$ is the mean of the QoS vector $r_u$, and $\omega_u$ is the standard deviation of $r_u$. After normalization, the QoS data have zero mean and unit variance.
In the data disguise of Step 3, the normalized QoS values are disguised according to the following formula:
$$Q_{ui} = q_{ui} + \mathrm{Laplace}(\Delta f / \varepsilon)$$
where $\varepsilon$ is the privacy parameter set by user $u$. The privacy parameter is given by each user; by using differential privacy, the random noise added to the observed QoS values is the minimum needed to reach the specified privacy level while maintaining reasonable accuracy. $\Delta f$ is defined from the distribution of the QoS values, namely $\Delta f = \max(r_{ui} - r_{uj})$, where $r_{ui}$ and $r_{uj}$ denote the QoS values collected by user $u$ for web services $i$ and $j$, respectively.
The meaning of $\mathrm{Laplace}(\cdot)$ is as follows. If the probability density function of a random variable $x$ is
$$f(x \mid \mu, b) = \frac{1}{2b}\exp\left(-\frac{|x - \mu|}{b}\right)$$
then $x$ has a Laplace$(\mu, b)$ distribution, where $\mu$ and $b$ are the location parameter and the scale parameter, respectively. Setting $\mu = 0$, the distribution can be regarded as a symmetric exponential distribution with standard deviation $\sqrt{2}\,b$. To add noise that follows the Laplace distribution, set $b = \Delta f / \varepsilon$; the generation of this noise is denoted $\mathrm{Laplace}(\Delta f / \varepsilon)$.
After the disguise, each user sends her disguised values $Q_{ui}$ to the server, and the randomness protects the sensitive information in the original data $q_{ui}$. The aggregate information of the users can nevertheless still be estimated, so QoS predictions can be made by working directly on $Q_{ui}$.
The data disguise works by randomly perturbing the original data. The randomness should guarantee that sensitive information, including the QoS values of each individual user, cannot be derived from the perturbed data, while the aggregate information of the users can still be estimated with high accuracy when the number of users is very large.
Data disguise based on differential privacy: we use $r_{ui}$ to denote the QoS value collected by user $u$ for web service $i$ and $r_u$ to denote the entire vector of QoS values observed by user $u$; similarly, $I_{ui}$ and $I_u$ denote the binary element and the binary vector indicating whether a QoS value is present. $c_u = |I_u|$ is the number of QoS values observed by user $u$. In our approach, differential privacy is the key technique for data disguise. The Laplace mechanism [C. Dwork, F. McSherry, K. Nissim and A. Smith. Calibrating noise to sensitivity in private data analysis. TCC 2006: 265-284] achieves ε-differential privacy by adding noise drawn from the Laplace distribution.
Definition 3 (Laplace mechanism [C. Dwork. Differential privacy. Encyclopedia of Cryptography and Security. 2011: 338-340]): Given a function $g: D \to \mathbb{R}^d$, the following computation maintains ε-differential privacy:
$$X = g(x) + \mathrm{Laplace}(\Delta f / \varepsilon)$$
where $\varepsilon$ is the privacy parameter that controls the level of privacy, a smaller $\varepsilon$ providing a stronger privacy guarantee, and $\Delta f$ is the global sensitivity.
Here, $\Delta f$ is computed using the $L_1$ norm:
$$\Delta f = \max_{D_1, D_2} \lVert g(D_1) - g(D_2) \rVert_1$$
For simplicity, ε-differential privacy for each user $u$ is achieved by the following equation:
$$R_{ui} = r_{ui} + \mathrm{Laplace}(\Delta f / \varepsilon)$$
where $\Delta f$ is defined as the maximum difference between QoS values, i.e.,
$$\Delta f = \max(r_{ui} - r_{uj})$$
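As a worked instance of these definitions, consider a hypothetical user who has observed four response times and chooses $\varepsilon = 0.5$. The numbers are invented for illustration; only the formulas come from the text.

```latex
\begin{aligned}
r_u &= (0.5,\ 1.5,\ 2.5,\ 3.5), \qquad \varepsilon = 0.5 \\
\Delta f &= \max_{i,j}\,(r_{ui} - r_{uj}) = 3.5 - 0.5 = 3 \\
b &= \Delta f / \varepsilon = 3 / 0.5 = 6 \\
\sigma &= \sqrt{2}\, b \approx 8.49
\end{aligned}
```

With a weaker privacy setting of $\varepsilon = 3$, the same user would obtain $b = 1$ and $\sigma \approx 1.41$, which shows how the noise level falls as $\varepsilon$ grows.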
In Step 4, neighborhood-based collaborative filtering is performed. Collaborative filtering (CF) is a mature technique adopted by most modern recommender systems. In QoS prediction for collaborative Web services, users are required to provide the recommender system with the observed QoS values of the services they have used. Based on the collected QoS values, the recommender system can predict, by means of suitable algorithms, the QoS of all services available to a user. The more QoS values users provide, the higher the prediction accuracy. The present invention adopts neighborhood-based collaborative filtering, as follows.
Two types of similarity are computed to improve prediction accuracy: user similarity and service similarity. In particular, the similarity between two users $u$ and $v$ is computed over the services they have both invoked, using the following equation:
$$\mathrm{Sim}(u,v) = \frac{\sum_{i \in S}(r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in S}(r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in S}(r_{v,i} - \bar{r}_v)^2}}$$
where $S = S_u \cap S_v$ is the set of services invoked by both user $u$ and user $v$, $r_{u,i}$ is the QoS value of service $i$ observed by user $u$, and $\bar{r}_u$ is the average QoS value of all services observed by user $u$.
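The user similarity can be sketched in Python as follows. The function name `pcc_similarity`, the use of NaN to mark services a user has not invoked, and the fallback value of 0.0 when fewer than two services are shared are assumptions made for this illustration.

```python
import numpy as np

def pcc_similarity(r_u, r_v):
    """Pearson-style similarity of two users over the services both have invoked.

    r_u, r_v: QoS vectors of equal length; NaN marks an unobserved service.
    Each user's mean is taken over all services that user observed.
    """
    r_u = np.asarray(r_u, dtype=float)
    r_v = np.asarray(r_v, dtype=float)
    both = ~np.isnan(r_u) & ~np.isnan(r_v)      # S = S_u ∩ S_v
    if both.sum() < 2:
        return 0.0                               # assumption: too little overlap
    du = r_u[both] - np.nanmean(r_u)             # r_{u,i} - mean of user u
    dv = r_v[both] - np.nanmean(r_v)             # r_{v,i} - mean of user v
    denom = np.sqrt((du ** 2).sum()) * np.sqrt((dv ** 2).sum())
    return float((du * dv).sum() / denom) if denom > 0 else 0.0

# Hypothetical users: u has not invoked the fourth service.
u = [1.0, 2.0, 3.0, np.nan]
v = [2.0, 4.0, 6.0, 8.0]
sim_uv = pcc_similarity(u, v)
print(sim_uv)
```

On the server side the same function could be applied to the disguised vectors instead of the raw ones, which is the approximation the text goes on to justify.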
However, because the QoS values are disguised, only the disguised values $Q_{ui}$ are available on the server side, not the true values $q_{ui}$. We therefore use $Q_{ui}$ to approximate the similarity as follows.
According to the z-score normalization, $r_{u,i} - \bar{r}_u = \omega_u q_{ui}$; substituting this into the similarity equation, the similarity can be computed as
$$\mathrm{Sim}(u,v) = \frac{\sum_{i \in S} q_{ui}\, q_{vi}}{\sqrt{\sum_{i \in S} q_{ui}^2}\,\sqrt{\sum_{i \in S} q_{vi}^2}}$$
Likewise, we observe that during z-score normalization,
Figure PCTCN2017113484-appb-000028
from which it follows readily that
Figure PCTCN2017113484-appb-000029
Next, we show that the scalar product of two vectors is approximately preserved despite the data disguise. For clarity, denote the two vectors as $a = (a_1, a_2, \ldots, a_n)$ and $b = (b_1, b_2, \ldots, b_n)$. After the disguise, the two vectors become $A = (A_1, A_2, \ldots, A_n)$ and $B = (B_1, B_2, \ldots, B_n)$. We have
$$A \cdot B = \sum_i \bigl(a_i + \mathrm{Laplace}(\Delta f_a / \varepsilon_a)\bigr)\bigl(b_i + \mathrm{Laplace}(\Delta f_b / \varepsilon_b)\bigr) = \sum_i a_i b_i + \sum_i a_i\,\mathrm{Laplace}(\Delta f_b / \varepsilon_b) + \sum_i b_i\,\mathrm{Laplace}(\Delta f_a / \varepsilon_a) + \sum_i \mathrm{Laplace}(\Delta f_a / \varepsilon_a)\,\mathrm{Laplace}(\Delta f_b / \varepsilon_b)$$
Because $a_i$ and $\mathrm{Laplace}(\Delta f_b / \varepsilon_b)$ are independent, and $\mathrm{Laplace}(\Delta f_b / \varepsilon_b)$ follows a symmetric exponential distribution with $\mu = 0$, we obtain $\sum_i a_i\,\mathrm{Laplace}(\Delta f_b / \varepsilon_b) \approx 0$. Similarly, $\sum_i b_i\,\mathrm{Laplace}(\Delta f_a / \varepsilon_a) \approx 0$ and $\sum_i \mathrm{Laplace}(\Delta f_a / \varepsilon_a)\,\mathrm{Laplace}(\Delta f_b / \varepsilon_b) \approx 0$. Therefore, we obtain the following equation:
$$A \cdot B \approx \sum_i a_i b_i = a \cdot b$$
In addition, we can also derive:
Figure PCTCN2017113484-appb-000031
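The approximate preservation of the scalar product can be checked empirically. In the sketch below the vectors, their length, and the noise scales are hypothetical; the two vectors are deliberately correlated so that the true scalar product is large compared with the zero-mean cross terms.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

# Two hypothetical, positively correlated vectors of normalized QoS values.
a = rng.normal(size=n)
b = a + 0.5 * rng.normal(size=n)

# Independent Laplace noise for each vector (scale = delta_f / epsilon).
scale_a = 1.0
scale_b = 1.0
A = a + rng.laplace(0.0, scale_a, size=n)
B = b + rng.laplace(0.0, scale_b, size=n)

true_dot = float(a @ b)
noisy_dot = float(A @ B)
rel_error = abs(noisy_dot - true_dot) / abs(true_dot)

# Product of a vector with itself: the squared noise does not average out.
inflation_per_item = (float(A @ A) - float(a @ a)) / n

print(rel_error, inflation_per_item)
```

The cross terms between independent quantities average out, so `rel_error` is small for long vectors. The product of a disguised vector with itself is a different case: it grows by about $2b^2$ per element, the variance of the Laplace noise, so squared norms computed from disguised data need separate treatment.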
Note that $\mathrm{Sim}(u,v)$ ranges over $[-1, 1]$, and a larger value indicates that the two users (or services) are more similar. Based on these similarity values, the QoS value of service $i$ for user $u$ can be predicted directly from the users similar to $u$, using the following equation:
Figure PCTCN2017113484-appb-000032
Item-based QoS prediction can be computed in the same way as user-based QoS prediction, and the two can be combined to improve the accuracy of QoS prediction.
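A user-based prediction of this kind can be sketched as follows. The sketch uses a common textbook formulation, a similarity-weighted average of the neighbours' deviations from their own means, which is an assumption and not necessarily the exact equation shown in the figure. The function name `predict_user_based`, the neighbourhood size `k`, and the sample data are hypothetical.

```python
import numpy as np

def predict_user_based(Q, sim, u, i, k=2):
    """Predict the QoS of service i for user u from the k most similar users.

    Q:   users x services array of (disguised) QoS values, NaN = unobserved.
    sim: users x users similarity matrix.
    """
    # Neighbours must have observed service i and be positively similar to u.
    candidates = [v for v in range(Q.shape[0])
                  if v != u and not np.isnan(Q[v, i]) and sim[u, v] > 0]
    neighbours = sorted(candidates, key=lambda v: sim[u, v], reverse=True)[:k]
    mean_u = float(np.nanmean(Q[u]))
    if not neighbours:
        return mean_u                      # assumption: fall back to the user mean
    weights = np.array([sim[u, v] for v in neighbours])
    deviations = np.array([Q[v, i] - np.nanmean(Q[v]) for v in neighbours])
    return float(mean_u + (weights * deviations).sum() / weights.sum())

Q = np.array([[1.0, 2.0, np.nan],
              [1.0, 2.0, 3.0],
              [2.0, 1.0, 0.0]])
sim = np.array([[1.0, 0.9, -0.5],
                [0.9, 1.0, -0.4],
                [-0.5, -0.4, 1.0]])
pred = predict_user_based(Q, sim, u=0, i=2)
print(pred)
```

An item-based variant would transpose the roles of users and services, and a combined predictor could average the two results with a weight chosen on held-out data.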
In the result prediction of Step 5, after collaborative filtering yields the QoS value of a given service, the QoS values of other users for the same service are retrieved and the user with the closest value is selected. This indicates that the two users have similar interests; a similarity-based recommendation is made on this basis, with the corresponding value of the latter user adopted as the prediction for the former user.
III. Experiments
In this section, we conduct three series of experiments on real data sets to evaluate our privacy-preserving QoS prediction framework. The first series studies the trade-off between privacy and accuracy when the proposed method is used. The other two series study how important data characteristics, including size and density, affect the performance of our method.
Table 1. Data set statistics
Figure PCTCN2017113484-appb-000033
3.1实验配置3.1 experimental configuration
我们首先注意到[Z.Zheng,Y.Zhang and M.R.Lyu.Investigating QoS of Real-World Web Services.TSC 2014 7(1):32-39;Z.Zheng,Y.Zhang and M.R.Lyu.Distributed QoS Evaluation for Real-World Web Services.ICWS 2010:83-90]中引入了一个真正的 Web服务QoS数据集,其中包括339个用户观察到的5,825个真实Web服务的QoS值。该数据集在研究QoS预测的准确性时非常有用。根据数据集,我们关注两个代表性的QoS属性:响应时间(RT)和全部时间(TP)。表1描述了数据集的统计,AVE和STD分别是平均值和标准差,密度是指观察数据与所有数据的比率。数据集的更多细节可以在[Z.Zheng,Y.Zhang and M.R.Lyu.Investigating QoS of Real-World Web Services.TSC 2014 7(1):32-39;Z.Zheng,Y.Zhang and M.R.Lyu.Distributed QoS Evaluation for Real-World Web Services.ICWS 2010:83-90]中找到。We first notice [Z.Zheng, Y.Zhang and MRLyu.Investigating QoS of Real-World Web Services.TSC 2014 7(1):32-39;Z.Zheng,Y.Zhang and MRLyu.Distributed QoS Evaluation For Real-World Web Services.ICWS 2010:83-90] introduced a real The Web Services QoS data set, which includes the QoS values of 5,825 real Web services observed by 339 users. This data set is very useful when studying the accuracy of QoS predictions. Based on the data set, we focus on two representative QoS attributes: response time (RT) and full time (TP). Table 1 describes the statistics of the data set, AVE and STD are the mean and standard deviation, respectively, and density is the ratio of observed data to all data. More details of the data set can be found in [Z. Zheng, Y. Zhang and MRLyu. Investigating QoS of Real-World Web Services. TSC 2014 7(1): 32-39; Z. Zheng, Y. Zhang and MRLyu Found in .Distributed QoS Evaluation for Real-World Web Services.ICWS 2010:83-90].
We use cross-validation to train and evaluate QoS prediction. The dataset here is nearly complete, but in practice users invoke only a small number of services because of limited time and resources, so the data density is generally below 10%. To simulate this sparsity in our experiments, we randomly remove entries from the complete dataset and keep only a low-density subset of historical QoS values as the training set. The removed entries serve as the test set for accuracy evaluation.
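The sparsity simulation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_by_density`, the toy 4 x 5 matrix, and the fixed seed are all assumptions made for the example, and density is taken as the fraction of observed entries kept for training.

```python
import random


def split_by_density(matrix, density, seed=0):
    """Keep a random `density` fraction of observed entries as the training set.

    `matrix` is a list of rows (users) of QoS values (services); None marks an
    unobserved entry. The entries that are not kept form the test set.
    """
    rng = random.Random(seed)
    observed = [(u, i)
                for u, row in enumerate(matrix)
                for i, value in enumerate(row)
                if value is not None]
    rng.shuffle(observed)
    n_train = round(density * len(observed))
    train = {(u, i): matrix[u][i] for (u, i) in observed[:n_train]}
    test = {(u, i): matrix[u][i] for (u, i) in observed[n_train:]}
    return train, test


# Hypothetical toy data: 4 users x 5 services, fully observed.
qos = [[0.1 * (u + 1) * (i + 1) for i in range(5)] for u in range(4)]
train, test = split_by_density(qos, density=0.25)
print(len(train), len(test))  # 5 15
```

Fixing the seed keeps the split reproducible across runs, which matters when several algorithms are compared on the same training and test sets.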
We then run the QoS prediction algorithms on the training set and predict the values in the test set. We implemented and evaluated four algorithms. UIPCC, proposed in [Z. Zheng, H. Ma, M. R. Lyu and I. King. WSRec: A Collaborative Filtering Based Web Service Recommender System. ICWS 2009:437-444], is a representative neighborhood-based collaborative filtering method. MF, introduced in [Z. Zheng, H. Ma, M. R. Lyu and I. King. QoS-aware web service recommendation by collaborative filtering. TSC 2011, 4(2):140-152], is a model-based collaborative filtering method. LUIPCC and LMF are two methods that integrate differential privacy through the Laplace mechanism.
To quantify the accuracy of QoS prediction, we adopt the root mean square error (RMSE), a metric widely used in related work (e.g., [A. Berlioz, A. Friedman, M. A. Kaafar, R. Boreli and S. Berkovsky. Applying differential privacy to matrix factorization. RECSYS 2015:107-114; F. McSherry and I. Mironov. Private recommender systems: building privacy into the net. SIGKDD 2009:627-636]):
$$\mathrm{RMSE} = \sqrt{\frac{1}{|R|} \sum_{(u,i) \in R} \left(q'_{ui} - q_{ui}\right)^2}$$
Here R is the set of entries to be predicted (those missing from the training set), and |R| is the number of elements in R; $q'_{ui}$ is the predicted value for an entry of R, and $q_{ui}$ is the corresponding value in the test set. In general, a smaller RMSE indicates a better prediction.
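As a small worked example of the metric, the sketch below computes RMSE over a set of held-out entries. The dictionary layout keyed by (user, service) pairs and the sample numbers are assumptions chosen for illustration only.

```python
import math


def rmse(predicted, actual):
    """Root mean square error over the entries present in `actual`.

    Both arguments map (user, service) pairs to QoS values; `actual` plays the
    role of the test set and defines the set R of entries to be predicted.
    """
    if not actual:
        raise ValueError("the test set must contain at least one entry")
    squared_error = sum((predicted[key] - actual[key]) ** 2 for key in actual)
    return math.sqrt(squared_error / len(actual))


# Hypothetical values: errors are 0, 2 and -2, so RMSE = sqrt(8 / 3).
actual = {(0, 1): 1.0, (0, 2): 2.0, (1, 0): 3.0}
predicted = {(0, 1): 1.0, (0, 2): 4.0, (1, 0): 1.0}
print(rmse(predicted, actual))  # approximately 1.633
```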
The default parameter settings are listed in Table 2. The parameters of UIPCC and MF are chosen empirically. By default, ε is set to 0.5, which provides sufficient privacy protection.
Table 2. Parameter settings
UIPCC | k = 20 | λ = 0.1 | --
MF | d = 20 | γ = 0.001 | λ' = 0.01
Laplace | ε = 0.5 | -- | --
3.2 Privacy and accuracy
Figure 3 compares our differential-privacy-based QoS prediction with the original methods on RT and TP under different privacy levels. Introducing differential privacy into QoS prediction lets users protect their privacy, but users who adopt our method must weigh privacy against accuracy. At one extreme, a user can obtain stronger privacy protection by adding more Laplace noise, which necessarily reduces the utility of the data. At the other extreme, a user can retain 100% of the accuracy by adding no Laplace noise at all. To study performance as the privacy level varies, we run the QoS prediction algorithms on the training set and predict the values in the test set, increasing the privacy parameter ε from 0.5 to 4 in steps of 0.5. We observe that the RMSE of both LUIPCC and LMF decreases as ε increases. A larger ε means a looser privacy constraint, so the utility of the data is less restricted and users obtain better accuracy. It is also worth noting that when ε becomes large in Figure 3 (for example, greater than 2.0), our privacy-preserving methods LUIPCC and LMF achieve almost the same accuracy as UIPCC, or even better. In particular, when ε is greater than 4, LMF predicts more accurately than UIPCC. We also find that MF outperforms UIPCC, which shows the advantage of model-based methods in capturing the latent structure of QoS data. Finally, although a recent work [J. Zhu, P. He, Z. Zheng and M. R. Lyu. A Privacy-Preserving QoS Prediction Framework for Web Service Recommendation. ICWS 2015:241-248] claims better performance than both original algorithms (UIPCC and MF), the randomness it adds to prevent information leakage is not large enough: by applying clustering [S. Zhang, J. Ford and F. Makedon. Deriving Private Information from Randomly Perturbed Ratings. SDM 2006:59-69], an adversary can accurately infer users' private data.
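To make the trade-off concrete, the sketch below tabulates how the Laplace noise scale b = Δf/ε shrinks over the ε sweep from 0.5 to 4 used in this experiment. The sensitivity value `delta_f = 1.0` and the helper name `laplace_scale` are hypothetical; the standard deviation of Laplace(0, b) is √2·b.

```python
import math


def laplace_scale(sensitivity, epsilon):
    """Scale parameter b = sensitivity / epsilon of the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


# Privacy parameter swept from 0.5 to 4.0 in steps of 0.5, as in the experiment.
epsilons = [0.5 * k for k in range(1, 9)]
delta_f = 1.0  # hypothetical sensitivity, chosen only for illustration

for eps in epsilons:
    b = laplace_scale(delta_f, eps)
    # A larger epsilon gives a smaller scale, hence less noise and better accuracy.
    print(f"epsilon={eps:.1f}  scale={b:.3f}  std={math.sqrt(2) * b:.3f}")
```

Under this assumption the noise standard deviation falls by a factor of eight between ε = 0.5 and ε = 4, which is in line with the falling RMSE reported for LUIPCC and LMF.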
In summary, our differential-privacy-based algorithms provide privacy-preserving QoS prediction with a tunable privacy level. The results show that, under a loose privacy constraint, our disguised user data stays very close to the users' private data.
3.3 Effect of data size
To evaluate the effect of data size, we design experiments that vary the number of services and the number of users separately. In Figure 4, the number of users is fixed at 339 and the number of services is increased from 1000 to 5000 in steps of 1000, with the services selected at random from the original dataset. The other parameters are set as in Table 2. Figure 5 uses the same experimental setup with all 5,825 services included.
Both the number of services and the number of users clearly have a positive effect on accuracy: the more data is given, the better the prediction.
Another finding is that, although accuracy differs considerably across data sizes, the original algorithms and our differential-privacy-based algorithms follow the same trends, as seen for UIPCC and LUIPCC or for MF and LMF. This means that the noise required for data disguising is independent of the data size, so users can achieve a high level of privacy protection by adding only a very small amount of noise.
3.4 Effect of density
Besides data size, density, denoted θ, is also a major factor in algorithm performance. Figure 6 compares accuracy at different densities. Although density has little visible effect on the original algorithms, it has a significant effect on our differential-privacy-based algorithms: denser datasets perform better. This result means that density is also a key factor in the performance of differential privacy methods. More importantly, as the number of services grows, the gap between the traditional methods and our differential-privacy-based methods shrinks. More precisely, in Figure 6 the gap between LUIPCC and UIPCC is 5 when the density is set to 5, but falls to 1 when the density increases to 30. Users are therefore advised to use denser datasets to bring predictions closer to the original results.
4. Conclusion
The present invention is the first to introduce differential privacy into a collaborative Web service QoS prediction framework. Differential privacy gives a rigorous, quantitative definition of privacy leakage under very strict constraints. Based on differential privacy, users can obtain maximum privacy protection while preserving the utility of their data. Experimental results show that the system and method of the present invention provide secure and accurate QoS prediction for collaborative Web services.
Taking the above preferred embodiments of the present invention as guidance, those skilled in the art can make various changes and modifications based on the above description without departing from the technical idea of the invention. The technical scope of the invention is not limited to the content of the specification and must be determined according to the scope of the claims.

Claims (8)

  1. A neighborhood-based collaborative filtering method for privacy-preserving collaborative Web service quality prediction, characterized by comprising the following steps:
    Step 1, data collection: each user locally collects quality-of-service (QoS) values;
    Step 2, normalization: z-score normalization is performed on the QoS values;
    Step 3, data disguising: the QoS values are disguised;
    Step 4, neighborhood-based collaborative filtering is performed on the disguised QoS values;
    Step 5, prediction: the result is predicted from the collaboratively filtered QoS values.
  2. The method of claim 1, characterized in that, in step 2, the z-score normalization of the QoS values uses the following equation:
    $$q_{ui} = \frac{r_{ui} - \bar{r}_u}{\omega_u}$$
    where $r_{ui}$ is the QoS value collected by user u for Web service i, $\bar{r}_u$ is the mean of the QoS vector $r_u$, and $\omega_u$ is the standard deviation of the QoS vector $r_u$; after normalization, the QoS data have zero mean and unit variance.
  3. The method of claim 2, characterized in that, in step 3, the data disguising disguises the normalized QoS values according to the following formula:
    $$Q_{ui} = q_{ui} + \mathrm{Laplace}(\Delta f / \varepsilon)$$
    where ε is the privacy parameter, set by user u, and Δf is defined according to the distribution of the QoS values, i.e.
    Figure PCTCN2017113484-appb-100003
    where $r_{ui}$ and $r_{uj}$ are the QoS values collected by user u for Web services i and j, respectively;
    the meaning of Laplace() is given as follows: if the probability density function of a random variable x is
    $$f(x \mid \mu, b) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right)$$
    then x follows the Laplace(μ, b) distribution, where μ and b are the location and scale parameters, respectively; with μ = 0, the distribution can be regarded as a symmetric exponential distribution with standard deviation $\sqrt{2}\,b$; to add noise that follows the Laplace distribution, b is set to Δf/ε, and the generated noise is denoted Laplace(Δf/ε).
  4. The method of claim 3, characterized in that, in step 3, after the data disguising, the user sends the disguised value $Q_{ui}$ to the server, and the randomness protects the sensitive information in the original data $q_{ui}$.
  5. The method of claim 3, characterized in that, in step 3, the privacy parameter ε is given by each user, and, by using differential privacy, the random noise added to the observed QoS values is the minimum required for the given privacy level, so that considerable accuracy is maintained.
  6. The method of claim 1, characterized in that, in step 3, the data disguising achieves its purpose by randomly perturbing the original data; the randomness must ensure that sensitive information, including each individual user's QoS values, cannot be derived from the perturbed data; and when the number of users is very large, the aggregate information of these users can still be estimated with high accuracy.
  7. The method of claim 1, characterized in that, in step 4, the neighborhood-based collaborative filtering is as follows: the similarity between two users u and v is computed from the services they have both invoked, using the following equation:
    $$\mathrm{Sim}(u,v) = \frac{\sum_{i \in S} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in S} (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in S} (r_{v,i} - \bar{r}_v)^2}}$$
    where $S = S_u \cap S_v$ is the set of services invoked by both user u and user v, $r_{u,i}$ is the QoS value of service i observed by user u, and $\bar{r}_u$ is the mean QoS value of all services observed by user u;
    the similarity is approximated using $Q_{ui}$ as follows:
    according to the z-score normalization,
    Figure PCTCN2017113484-appb-100008
    and substituting this formula into the calculation gives the similarity
    Figure PCTCN2017113484-appb-100009
    during the z-score normalization,
    Figure PCTCN2017113484-appb-100010
    so it is easy to obtain
    Figure PCTCN2017113484-appb-100011
    it can be shown that the scalar product of two vectors is preserved despite the data disguising; therefore,
    Figure PCTCN2017113484-appb-100012
    Sim(u, v) ranges over [-1, 1], and a larger value indicates that the two users or services are more similar; based on these similarity values, the QoS value of service i for user u can be predicted directly from the users similar to user u, using the following equation:
    Figure PCTCN2017113484-appb-100013
  8. The method of claim 1, characterized in that, in step 5, the prediction is specifically as follows: after the QoS value of a service is obtained by collaborative filtering, the QoS values of other users for the same service are retrieved and the user with the closest value is selected, which indicates that the two users have similar interests; a similarity-based recommendation is made on this basis, using the latter user's corresponding values as the prediction for the former user.
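The first steps recited in the claims above (z-score normalization, Laplace disguising, and similarity computed on the disguised values) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: all function names are hypothetical, `delta_f=1.0` is a placeholder because the definition of Δf appears only as a formula image, the noise is drawn as the difference of two exponential samples (one standard way to sample a Laplace distribution), and the similarity is taken to be the cosine of the two vectors over the same services.

```python
import math
import random


def z_normalize(values):
    """Z-score normalize one user's QoS vector; returns (normalized, mean, std)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        raise ValueError("a constant QoS vector cannot be z-normalized")
    return [(v - mean) / std for v in values], mean, std


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def disguise(q_values, delta_f, epsilon, rng):
    """Add Laplace(delta_f / epsilon) noise to each normalized QoS value."""
    scale = delta_f / epsilon
    return [q + laplace_noise(scale, rng) for q in q_values]


def similarity(a, b):
    """Cosine similarity of two users' vectors over the same services (assumed form)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical QoS observations of two users for the same five services.
rng = random.Random(42)
raw_u = [0.2, 0.5, 1.4, 0.9, 3.1]
raw_v = [0.3, 0.6, 1.2, 1.0, 2.8]

q_u, mean_u, std_u = z_normalize(raw_u)
q_v, mean_v, std_v = z_normalize(raw_v)

# delta_f is a placeholder sensitivity; epsilon is the user's privacy parameter.
Q_u = disguise(q_u, delta_f=1.0, epsilon=4.0, rng=rng)
Q_v = disguise(q_v, delta_f=1.0, epsilon=4.0, rng=rng)

print("similarity on private values:  ", similarity(q_u, q_v))
print("similarity on disguised values:", similarity(Q_u, Q_v))
```

Only the disguised vectors `Q_u` and `Q_v` would leave the users' machines in this sketch; the server-side similarity never touches the private values. With only five services the disguised similarity can deviate noticeably from the private one, and the gap would be expected to narrow as more services are shared.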
PCT/CN2017/113484 2017-09-25 2017-11-29 Method for predicting quality of web service WO2019056571A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710874203.7 2017-09-25
CN201710874203.7A CN107609421A (en) 2017-09-25 2017-09-25 Secret protection cooperates with the collaborative filtering method based on neighborhood of Web service prediction of quality

Publications (1)

Publication Number Publication Date
WO2019056571A1 true WO2019056571A1 (en) 2019-03-28

Family

ID=61057941

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/113484 WO2019056571A1 (en) 2017-09-25 2017-11-29 Method for predicting quality of web service

Country Status (2)

Country Link
CN (1) CN107609421A (en)
WO (1) WO2019056571A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276016A (en) * 2019-06-28 2019-09-24 中国科学技术大学 A kind of socialization recommended method based on difference privacy
CN113364621A (en) * 2021-06-04 2021-09-07 浙江大学 Service quality prediction method under service network environment
CN113423058A (en) * 2021-06-08 2021-09-21 山东浪潮科学研究院有限公司 Privacy protection method based on location-based service
CN114760657A (en) * 2022-03-11 2022-07-15 河海大学 Active QoS monitoring method and system based on LSTM-BSPM under mobile edge environment
CN116132347A (en) * 2023-04-06 2023-05-16 湖南工商大学 Bi-LSTM-based service QoS prediction method in computing network convergence environment

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427891B (en) * 2018-03-12 2022-11-04 南京理工大学 Neighborhood recommendation method based on differential privacy protection
CN108763954B (en) * 2018-05-17 2022-03-01 西安电子科技大学 Linear regression model multidimensional Gaussian difference privacy protection method and information security system
CN109257217B (en) * 2018-09-19 2021-08-10 河海大学 Privacy protection-based Web service QoS prediction method under mobile edge environment
CN109376549B (en) * 2018-10-25 2021-09-10 广州电力交易中心有限责任公司 Electric power transaction big data publishing method based on differential privacy protection
CN111881345B (en) * 2020-07-13 2023-06-09 汕头大学 Neural collaborative filtering service quality prediction method based on position context awareness
CN116595254B (en) * 2023-05-18 2023-12-12 杭州绿城信息技术有限公司 Data privacy and service recommendation method in smart city

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102170449A (en) * 2011-04-28 2011-08-31 浙江大学 Web service QoS prediction method based on collaborative filtering
CN103377250A (en) * 2012-04-27 2013-10-30 杭州载言网络技术有限公司 Top-k recommendation method based on neighborhood

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103684850B (en) * 2013-11-25 2017-02-22 浙江大学 Service neighborhood based Web Service quality prediction method
CN103840985A (en) * 2014-02-28 2014-06-04 浙江大学 Web service quality prediction method and device based on user neighborhoods

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HE, MING ET AL.: "A Collaborative Filtering Recommendation Method Based on Differential Privacy", JOURNAL OF COMPUTER RESEARCH AND DEVELOPMENT, vol. 54, no. 7, 15 July 2017 (2017-07-15), pages 1142 - 1145 *
ZHU, TIANQING ET AL.: "Differential Privacy for Neighborhood-Based Collaborative Filtering '' 2013", IEEE /ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING, 10 April 2014 (2014-04-10), pages 752 - 759, XP032586089 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276016A (en) * 2019-06-28 2019-09-24 中国科学技术大学 A kind of socialization recommended method based on difference privacy
CN110276016B (en) * 2019-06-28 2022-10-28 中国科学技术大学 Social recommendation method based on differential privacy
CN113364621A (en) * 2021-06-04 2021-09-07 浙江大学 Service quality prediction method under service network environment
CN113423058A (en) * 2021-06-08 2021-09-21 山东浪潮科学研究院有限公司 Privacy protection method based on location-based service
CN114760657A (en) * 2022-03-11 2022-07-15 河海大学 Active QoS monitoring method and system based on LSTM-BSPM under mobile edge environment
CN114760657B (en) * 2022-03-11 2024-05-03 河海大学 Active QoS monitoring method and system based on LSTM-BSPM in mobile edge environment
CN116132347A (en) * 2023-04-06 2023-05-16 湖南工商大学 Bi-LSTM-based service QoS prediction method in computing network convergence environment
CN116132347B (en) * 2023-04-06 2023-06-27 湖南工商大学 Bi-LSTM-based service QoS prediction method in computing network convergence environment

Also Published As

Publication number Publication date
CN107609421A (en) 2018-01-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17926332

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23/09/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 17926332

Country of ref document: EP

Kind code of ref document: A1