CN110059501A - A secure outsourced machine learning method based on differential privacy - Google Patents

A secure outsourced machine learning method based on differential privacy

Info

Publication number
CN110059501A
Authority
CN
China
Prior art keywords
machine learning
data
differential privacy
noise
provider
Prior art date
Legal status
Granted
Application number
CN201910302716.XA
Other languages
Chinese (zh)
Other versions
CN110059501B (en)
Inventor
李进 (Li Jin)
雷震光 (Lei Zhenguang)
李同 (Li Tong)
姜冲 (Jiang Chong)
Current Assignee
Guangzhou University
Original Assignee
Guangzhou University
Priority date: 2019-04-16
Filing date: 2019-04-16
Publication date: 2019-07-26
Application filed by Guangzhou University
Priority to CN201910302716.XA (2019-04-16)
Publication of CN110059501A (2019-07-26)
Application granted; publication of CN110059501B (2021-02-02)
Legal status: Active (Current)


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60: Protecting data
    • G06F 21/602: Providing cryptographic facilities or services
    • G06F 21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218: Protecting access to data via a platform, to a system of files or objects, e.g. local or distributed file system or database
    • G06F 21/6245: Protecting personal data, e.g. for financial or medical purposes


Abstract

The invention discloses a secure outsourced machine learning method based on differential privacy, belonging to the field of cyberspace security. The method allows a data provider to process its data with additively homomorphic encryption and upload it to a cloud server without disclosing sensitive data to any third party. The cloud server stores the encrypted data, adds noise to it, and obtains a query function through interaction with the machine learning model provider so that machine learning can be carried out. By effectively combining outsourced computation with differential privacy, the method not only guarantees the security and privacy of machine learning but also greatly reduces computational overhead and cost and improves efficiency, effectively alleviating the inefficiency and security problems of traditional outsourced machine learning methods.

Description

A secure outsourced machine learning method based on differential privacy

Technical Field

The invention belongs to the field of cyberspace security and specifically relates to a secure outsourced machine learning method based on differential privacy.

Background Art

With the development of the Internet and information technology, ever more data is being generated and used. According to statistics, global data is currently growing at roughly 40% per year, and the global big data industry is expected to grow strongly over the next five years. Facing this surge of massive data, cloud computing, as a new model of data computation and storage, can largely satisfy the associated storage and processing requirements. Through the storage and computation outsourcing capabilities of cloud computing, users can shift local computing and storage needs to the cloud and exploit the powerful computing and storage capacity of cloud servers to process data more efficiently. Cloud computing, with its powerful computing capacity, has therefore become a natural partner of big data technology.

At the same time, machine learning, built on efficient learning algorithms, rich and massive data, and powerful computing environments, exploits the large amounts of data accumulated by humans and is widely applied in pattern recognition, computer vision, data mining and other scenarios. Driven by scientific research and industrial development, the fields and applications of machine learning keep expanding, especially in medicine, finance and business. In medical diagnosis, for example, a machine learning model trained on a large collection of case data can accurately estimate the probability that a patient has a certain disease.

Although cloud outsourcing services use their powerful storage and computing capabilities to relieve users of difficult computations, the cloud is a third party that is not fully trusted, so personal sensitive information faces many new security challenges, including the security and privacy of outsourced data storage and computation. For example, on March 17, 2018 The New York Times reported that the political consulting firm Cambridge Analytica had obtained access to the data of more than 50 million Facebook users, plunging Facebook, with its two billion users, into the largest personal information leak in its history.

To address these privacy challenges, the traditional approach is for data providers to protect their data with encryption, but the results achieved in practice are far from ideal. Differential privacy, one of the most popular privacy-preserving techniques, has been widely applied and studied. Its main idea is that for two data sets differing in only a single record, the probabilities of obtaining any particular query result are nearly indistinguishable. The most common realization is to randomize query results by adding noise drawn from a suitable distribution. As an alternative to pure encryption, differential privacy not only protects the privacy of the data but also improves the efficiency of data processing. A data provider can therefore outsource its data to a cloud server, and the cloud server can interact with a machine learning model provider to complete machine learning tasks securely and effectively.

Research into existing methods shows that traditional approaches have at least the following problems:

1) To accommodate different applications and privacy budgets, different types of noise must be added to the data for different query tasks, which inevitably increases computational overhead and interaction and raises the computational cost.

2) When data providers publish their data, a public entity, namely the cloud server, must be able to store all the different types of data sets with all the different types of noise, which places a heavy burden on the cloud server's storage space.

Summary of the Invention

To solve the inefficiency caused by adding different types of noise to data sets in traditional schemes, the present invention provides a secure outsourced machine learning method based on differential privacy. By combining cloud computing with differential privacy and outsourcing the complex computation and storage tasks, it not only guarantees the security and privacy of machine learning but also greatly reduces computational overhead and cost and improves efficiency, effectively alleviating the inefficiency and security problems faced by traditional outsourced machine learning methods.

The present invention is realized with the following technical solution. The differential-privacy-based secure outsourced machine learning method comprises the steps of:

S1. The data provider selects the Paillier encryption algorithm, which is additively homomorphic, and negotiates with the machine learning model provider DE to generate a key pair (sk, pk), where the data provider holds the public key pk and the machine learning model provider DE holds the private key sk.

S2. Before uploading, the data provider preprocesses its data according to attributes, then encrypts the preprocessed data M = (m_1, m_2, ..., m_n) with the public key pk and sends the ciphertexts ||m_1||_pk, ||m_2||_pk, ..., ||m_n||_pk to the cloud server CSP.

S3. The cloud server CSP receives the uploaded ciphertexts, obtains the query function F from the machine learning model provider DE, computes noise η that satisfies ε-differential privacy, adds it to the ciphertexts of step S2 with the Add(||M||, ||η||) algorithm, and sends the noised data ||F(M)+η||_pk to the machine learning model provider DE.

S4. The machine learning model provider DE receives the noised data, decrypts it with Dec(||m_1||_pk, ||m_2||_pk, ..., ||m_n||_pk, sk) to obtain the noisy data (F(M)+η), and uses it as input to a machine learning algorithm, thereby completing the machine learning task.

Compared with traditional fully homomorphic encryption, the present invention does not consume large amounts of cloud storage and computation; the homomorphic property allows the cloud server in the third step to add noise to the encrypted data securely, which solves the data security problem. Compared with existing methods, the main beneficial effects are as follows:

1) The data provider does not need to add noise locally; the noise is added by a powerful cloud server using cloud computing technology.

2) By means of additively homomorphic encryption, ciphertexts can be added without affecting data integrity, which guarantees that the data is not leaked between the cloud server and the machine learning model provider during computation and storage. Compared with fully homomorphic encryption, it also greatly reduces communication complexity, cuts down the interactions needed during encryption, lowers computational overhead, and improves efficiency. At the same time, differential privacy is applied to protect sensitive data by adding noise to it.

3) The security of outsourced machine learning is guaranteed: machine learning is carried out without disclosing the private data to untrusted third parties.

Brief Description of the Drawings

Fig. 1 is a flow chart of the outsourced machine learning method of the present invention.

Fig. 2 compares the results of performing the same machine learning tasks on data processed with the method of the present invention and on the original data without added noise.

Detailed Description of the Embodiments

Cloud-based data computation, as a new model of data computation and storage, offers very powerful data processing capabilities and larger storage space. In the present invention, a large number of local computing operations (including adding noise with differential privacy) are carried out on a cloud server by means of cloud computing technology, and machine learning tasks are completed through the interaction between the cloud server and the machine learning model provider, thereby realizing secure and efficient outsourced machine learning. To help those skilled in the art understand the present invention, it is described in detail below with reference to the accompanying drawings and embodiments, but the embodiments of the present invention are not limited thereto.

Some basic concepts involved in the present invention are as follows:

1) Paillier homomorphic encryption: like ordinary encryption, homomorphic encryption encrypts the sender's message, but it additionally allows computations to be carried out on ciphertexts without decrypting them, so that various computations over the plaintext data can be performed while meeting the security requirements of privacy protection. Homomorphic encryption also has a natural property that ordinary encryption lacks: computing directly on ordinarily encrypted data destroys the corresponding plaintext, whereas ciphertexts under homomorphic encryption can be operated on directly without destroying the integrity or confidentiality of the corresponding plaintext information. In short, homomorphic encryption is a form of encryption that permits specific types of computation on ciphertexts such that the decrypted result matches the result of performing the corresponding operations on the plaintexts. Paillier homomorphic encryption is additively homomorphic, works well for computations in the ciphertext space, and is well suited to the method of the present invention.

2) ε-differential privacy: a framework for formalizing privacy in statistical databases and a technique for preventing de-anonymization. Under this definition, the result of a computation over a database is insensitive to any single record: whether a given record is in the data set or not has only a marginal effect on the output. Because differential privacy is a probabilistic notion, any differentially private mechanism is necessarily randomized. In this method we adopt the Laplace mechanism, which perturbs the data by adding Laplace noise calibrated to the sensitivity ΔF and the privacy budget ε.
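For reference, the standard textbook definitions behind the Laplace mechanism used here are as follows (the notation K, D1, D2, S is ours, not additional disclosure from the patent):

```latex
% A randomized mechanism K is \epsilon-differentially private if, for all
% data sets D_1, D_2 differing in a single record and all output sets S,
\Pr[K(D_1)\in S] \le e^{\epsilon}\,\Pr[K(D_2)\in S].
% Global sensitivity of a query F:
\Delta F = \max_{D_1,D_2}\lVert F(D_1)-F(D_2)\rVert_1 .
% Laplace mechanism: publish F(D)+\eta with noise scale b=\Delta F/\epsilon,
\eta \sim \mathrm{Lap}(b),\qquad
p(\eta)=\frac{1}{2b}\exp\!\left(-\frac{|\eta|}{b}\right).
```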

3) Outsourced computation: a technique for delegating expensive and computationally complex tasks to untrusted servers; it allows resource-constrained data providers to offload their computational load to cloud servers with virtually unlimited computing resources.

4) Machine learning: Arthur Samuel, an American pioneer of artificial intelligence, described machine learning as the field of study that gives computers the ability to learn without being explicitly programmed for the problem. The field is broadly divided into three subfields: supervised learning, unsupervised learning, and reinforcement learning. Machine learning is also offered as a service: large Internet companies now provide machine learning on their cloud platforms, for example the Google Prediction API, Amazon Machine Learning (Amazon ML), and Microsoft Azure Machine Learning (Azure ML). Machine learning tasks can therefore be completed using machine learning applications on cloud platforms.

As shown in Fig. 1, three entities take part in this method: the user (data provider), the cloud server (CSP), and the machine learning model provider, the Data Evaluator (DE). The user owns the data and provides it for machine learning; the CSP interacts with the user to provide cloud storage and outsourced computation services and adds noise to the data; the DE interacts with the CSP to obtain the noisy data and perform the corresponding machine learning tasks. The specific steps are as follows:

S1. The user selects the Paillier encryption algorithm, which is additively homomorphic, and negotiates with DE to generate a key pair (sk, pk). Two large primes p and q are chosen at random; let n = pq and λ(n) = lcm(p-1, q-1), and define the function L(u) = (u-1)/n. Then g ∈ (Z/n^2 Z)* is chosen at random such that gcd(L(g^λ(n) mod n^2), n) = 1, where lcm and gcd denote the least common multiple and the greatest common divisor respectively. The public key is pk = (n, g) and the private key is sk = (p, λ); the user holds the public key pk and DE holds the private key sk.
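A minimal Python sketch of this key generation, assuming the common choice g = n + 1 and toy-sized primes purely for illustration (a real deployment would use large random primes; the helper names are ours, not from the patent). The sketch stores (λ, μ) as the private key, which is equivalent to the (p, λ) form above since μ is derived from λ:

```python
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def L(u, n):
    # L(u) = (u - 1) / n; exact integer division because u = 1 (mod n) for the inputs used here
    return (u - 1) // n

def paillier_keygen(p, q):
    """Toy Paillier key generation; p and q must be large random primes in practice."""
    n = p * q
    lam = lcm(p - 1, q - 1)
    # common choice of g; the gcd condition from step S1 then reduces to gcd(lam, n) = 1,
    # which holds for the primes used below
    g = n + 1
    mu = pow(L(pow(g, lam, n * n), n), -1, n)   # mu = (L(g^lam mod n^2))^(-1) mod n
    return (n, g), (lam, mu)                    # pk, sk

# toy primes for illustration only
pk, sk = paillier_keygen(1_000_003, 1_000_033)
```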

S2. Before uploading, the user preprocesses its data according to attributes. The preprocessed data M = (m_1, m_2, ..., m_n) is then encrypted with the public key pk, and the ciphertexts ||m_1||_pk, ||m_2||_pk, ..., ||m_n||_pk are sent to the cloud server.

The encryption procedure is as follows: for a plaintext M ∈ Z_n, choose a random number r < n and compute the ciphertext C = g^M · r^n mod n^2.
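Continuing the sketch above (same assumed helper names and keys), encryption of each preprocessed value then looks like this; r is additionally required to be coprime to n:

```python
import secrets
from math import gcd

def paillier_encrypt(pk, m):
    """Paillier encryption: C = g^m * r^n mod n^2 for a random r in Z_n*."""
    n, g = pk
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1        # 1 <= r <= n - 1
        if gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

# ||m_1||_pk, ..., ||m_n||_pk for a toy preprocessed data set
ciphertexts = [paillier_encrypt(pk, m_i) for m_i in (12, 7, 30)]
```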

S3. The cloud server CSP receives the ciphertexts uploaded in step S2, obtains the query function F from the machine learning model provider DE, uses the Laplace mechanism to compute noise η that satisfies ε-differential privacy, adds it to the ciphertexts of step S2 with the Add(||M||, ||η||) algorithm, and sends the noised data ||F(M)+η||_pk to DE.

Because the additively homomorphic Paillier algorithm is used, the data remain private while passing through the cloud server after being encrypted by the user. At the same time, the additive homomorphism E(x+y) = Eval(E(x), E(y)), where Eval is simply the product of the two ciphertexts modulo n^2, allows the cloud server to operate directly on ciphertexts without destroying the integrity or confidentiality of the corresponding plaintext information.
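As a sketch under the same assumptions as above, the Eval operation is a single modular multiplication of ciphertexts; this is what lets the CSP realize Add(||M||, ||η||): it encrypts the noise under pk and multiplies the ciphertexts without ever seeing the plaintext data.

```python
def paillier_add(pk, c1, c2):
    """Eval(E(x), E(y)) = c1 * c2 mod n^2, which decrypts to x + y mod n."""
    n, _ = pk
    return (c1 * c2) % (n * n)
```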

The steps by which the cloud server CSP uses the Laplace mechanism to compute noise η satisfying ε-differential privacy are as follows (a sampling sketch is given after the steps):

S31. The cloud server CSP first interacts with the machine learning model provider DE to obtain the query function F and computes its sensitivity ΔF.

S32. From the configured privacy budget parameter ε, compute the noise scale b = ΔF/ε.

S33. Generate the Laplace noise η.
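A minimal sampling sketch for steps S31 to S33, assuming ΔF and ε are already known (the sensitivity computation itself depends on the concrete query F). It uses the fact that the difference of two i.i.d. exponential variables with rate 1/b is Laplace-distributed with scale b:

```python
import random

def laplace_noise(delta_f, epsilon):
    """Sample eta ~ Lap(0, b) with b = delta_f / epsilon (steps S31-S33)."""
    b = delta_f / epsilon
    return random.expovariate(1.0 / b) - random.expovariate(1.0 / b)

eta = laplace_noise(delta_f=1.0, epsilon=0.5)   # illustrative parameter values
```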

S4. DE receives the noised data uploaded in step S3, decrypts it with Dec(||m_1||_pk, ||m_2||_pk, ..., ||m_n||_pk, sk) to obtain the noisy data (F(M)+η), and uses it as input to machine learning algorithms for classification, regression and other analyses, thereby completing the machine learning task. Because the data is noised, the machine learning model provider cannot learn the original data, which achieves privacy protection throughout the machine learning process.

The decryption procedure is: compute F(M)+η = L(C^λ(n) mod n^2) · μ mod n, where C is the ciphertext and μ = (L(g^λ(n) mod n^2))^(-1) mod n.
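A matching decryption sketch under the same assumptions, reusing the L helper from the key-generation sketch. Note that Paillier works over Z_n, so real-valued noise has to be encoded as an integer (e.g. by rounding or fixed-point scaling) before the CSP encrypts it; that encoding step is our simplification and is not spelled out in the patent:

```python
def paillier_decrypt(pk, sk, c):
    """Paillier decryption: m = L(c^lambda mod n^2) * mu mod n."""
    n, _ = pk
    lam, mu = sk
    return (L(pow(c, lam, n * n), n) * mu) % n
```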

As can be seen from the implementation above, the ε-differential-privacy-based secure outsourced machine learning method of the present invention combines cloud computing with differential privacy and exploits the homomorphic property of the encryption to outsource the noise-adding operation to the cloud server. Through the interaction between the cloud server and the machine learning model provider, the data can participate in machine learning securely: the user's privacy is protected, no private information is leaked, and the computational efficiency is greatly improved compared with traditional schemes. The method is highly practical and can be widely applied in scenarios that require privacy-preserving machine learning.

In this embodiment the user is a hospital. As a data provider, the hospital wishes to use patients' personal case records, i.e. private data, with machine learning algorithms to realize intelligent diagnosis. It must both provide useful data for machine learning and protect patient privacy. The other two entities, the cloud server and the machine learning model provider, are not fully trusted; with the method provided by the present invention, machine learning can be carried out without revealing the private data to either of them.

First, the hospital, as the user and data provider, generates a key pair (sk, pk) with the additively homomorphic Paillier encryption algorithm. The hospital holds the public key pk and uses it to encrypt the sensitive data M = (m_1, m_2, ..., m_n) used to train the machine learning model, producing the ciphertexts; the private key sk is entrusted to the machine learning model provider.

The hospital then uploads the ciphertexts to the cloud server. The cloud server receives them, obtains the query function F from the machine learning model provider DE, uses the Laplace mechanism to compute noise η that satisfies ε-differential privacy, adds it to the ciphertexts with the Add(||M||, ||η||) algorithm, and sends the noised data ||F(M)+η||_pk to the machine learning model provider.

The machine learning model provider decrypts the data with the private key sk to obtain the noisy data and can then perform machine learning tasks according to the hospital's requirements, such as the diagnosis of a specific disease.
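Putting the sketches above together, a single record would flow through the three parties roughly as follows. The toy values, the integer-rounded noise, and all helper names are our illustrative assumptions; negative noise values appear modulo n and would be mapped back to signed integers by DE:

```python
record = 42                                            # one preprocessed value m_i held by the hospital
c = paillier_encrypt(pk, record)                       # hospital -> CSP: ciphertext only
eta = laplace_noise(delta_f=1.0, epsilon=0.5)          # CSP: noise for the agreed query F
eta_enc = paillier_encrypt(pk, int(round(eta)) % pk[0])
c_noisy = paillier_add(pk, c, eta_enc)                 # Add(||M||, ||eta||) in the encrypted domain
noisy_value = paillier_decrypt(pk, sk, c_noisy)        # DE sees F(M) + eta, never the raw record
```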

As shown in Fig. 2, experiments comparing against the original data without added noise show that, for classification tasks with different machine learning algorithms, adding noise to the original data with this method has only a small effect. The method therefore does not noticeably harm the accuracy of machine learning while guaranteeing security and privacy, and the experiments demonstrate that it is fully effective.

The embodiments described in this patent are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Claims (5)

1. A secure outsourced machine learning method based on differential privacy, characterized by comprising the steps of:

S1. The data provider selects the Paillier encryption algorithm, which is additively homomorphic, and negotiates with the machine learning model provider DE to generate a key pair (sk, pk), where the data provider holds the public key pk and the machine learning model provider DE holds the private key sk.

S2. Before uploading, the data provider preprocesses its data according to attributes, then encrypts the preprocessed data M = (m_1, m_2, ..., m_n) with the public key pk and sends the ciphertexts ||m_1||_pk, ||m_2||_pk, ..., ||m_n||_pk to the cloud server CSP.

S3. The cloud server CSP receives the uploaded ciphertexts, obtains the query function F from the machine learning model provider DE, computes noise η that satisfies ε-differential privacy, adds it to the ciphertexts of step S2 with the Add(||M||, ||η||) algorithm, and sends the noised data ||F(M)+η||_pk to the machine learning model provider DE.

S4. The machine learning model provider DE receives the noised data, decrypts it with Dec(||m_1||_pk, ||m_2||_pk, ..., ||m_n||_pk, sk) to obtain the noisy data (F(M)+η), and uses it as input to a machine learning algorithm, thereby completing the machine learning task.

2. The secure outsourced machine learning method based on differential privacy according to claim 1, characterized in that in step S3 the cloud server CSP uses the Laplace mechanism to compute the noise η satisfying ε-differential privacy, with the following steps:

S31. The cloud server CSP first interacts with the machine learning model provider DE to obtain the query function F and computes the sensitivity ΔF.

S32. From the configured privacy budget parameter, compute b = ΔF/ε.

S33. Generate the noise η.

3. The secure outsourced machine learning method based on differential privacy according to claim 1, characterized in that the key pair (sk, pk) is generated as follows: randomly choose large primes p and q, let n = pq and λ(n) = lcm(p-1, q-1), and define the function L(u) = (u-1)/n; then randomly choose g ∈ (Z/n^2 Z)* such that gcd(L(g^λ(n) mod n^2), n) = 1, where lcm and gcd denote the least common multiple and the greatest common divisor respectively; the public key is pk = (n, g) and the private key is sk = (p, λ).

4. The secure outsourced machine learning method based on differential privacy according to claim 3, characterized in that the encryption in step S2 is: the plaintext to be encrypted is M ∈ Z_n; choose a random number r < n; then compute the ciphertext C = g^M · r^n mod n^2.

5. The secure outsourced machine learning method based on differential privacy according to claim 3 or 4, characterized in that the decryption in step S4 is: compute F(M)+η = L(C^λ(n) mod n^2) · μ mod n, where C is the ciphertext.

Priority Applications (1)

CN201910302716.XA (priority date 2019-04-16; filing date 2019-04-16; granted as CN110059501B): Safe outsourcing machine learning method based on differential privacy

Applications Claiming Priority (1)

CN201910302716.XA (priority date 2019-04-16; filing date 2019-04-16): Safe outsourcing machine learning method based on differential privacy

Publications (2)

Publication Number Publication Date
CN110059501A true CN110059501A (en) 2019-07-26
CN110059501B CN110059501B (en) 2021-02-02

Family

ID=67319169

Family Applications (1)

CN201910302716.XA (Active; granted as CN110059501B): Safe outsourcing machine learning method based on differential privacy

Country Status (1)

Country Link
CN (1) CN110059501B (en)



Also Published As

Publication number Publication date
CN110059501B (en) 2021-02-02


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant