WO2024098897A1 - Prediction model training method, system, device and medium based on homomorphic encryption - Google Patents

Prediction model training method, system, device and medium based on homomorphic encryption Download PDF

Info

Publication number
WO2024098897A1
WO2024098897A1 (PCT/CN2023/115580)
Authority
WO
WIPO (PCT)
Prior art keywords
prediction model
training
local
computing node
local prediction
Prior art date
Application number
PCT/CN2023/115580
Other languages
English (en)
French (fr)
Inventor
张旭
吴睿振
王小伟
孙华锦
王凛
Original Assignee
苏州元脑智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州元脑智能科技有限公司 filed Critical 苏州元脑智能科技有限公司
Publication of WO2024098897A1

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols

Definitions

  • the present application belongs to the field of artificial intelligence, and specifically relates to a prediction model training method, system, device and non-volatile readable storage medium based on homomorphic encryption.
  • Gaussian process regression is a non-parametric statistical probability model. Given training data and test inputs, the prediction of Gaussian process regression is divided into two steps, inference and prediction, and no optimization problem needs to be solved.
  • the inference process assumes that the function to be learned follows a Gaussian process, gives the Gaussian prior probability distribution of the model, and then uses the observed values and Bayes' rule to find the Gaussian posterior probability distribution of the model.
  • After completing the local model prediction, each computing node sends the obtained local prediction (expectation and variance) to the server, allowing the server to complete the calculation of the global model, for example, using the average aggregation algorithm to obtain the global model.
  • during transmission of the local models, attackers may snoop on and steal the transmitted local prediction values, which threatens the privacy of the local models.
  • this application proposes a prediction model training method based on homomorphic encryption, including:
  • the local prediction model is trained on the computing node based on the training data, the local prediction model is encrypted using a homomorphic encryption algorithm, and the encrypted local prediction model is sent to the server;
  • the server side calculates the encrypted global prediction model using a predetermined calculation method based on the received encrypted local prediction model, and sends the global prediction model to the computing node;
  • the encrypted global prediction model is decrypted at the computing node using a homomorphic encryption algorithm, and the decrypted global prediction model is fused with the local prediction model.
  • training a local prediction model based on training data at a computing node includes:
  • a training subset is obtained based on the projection of the training set at the computing node, and a local prediction model is trained based on the Gaussian process regression algorithm using the training subset.
  • obtaining a training subset based on projecting the training set at a computing node includes:
  • a local projection set is determined through the projection set and based on the training data of the computing nodes, and a training subset of the computing nodes is determined according to the projection set.
  • determining a training subset of computing nodes according to the projection set further includes:
  • each projection point is taken from the local projection set, and training data within a neighborhood of a predetermined size is selected based on each projection point to construct a training subset.
  • the method further comprises:
  • the neighborhood range, the distance between data points, and the size of the projection set are determined based on the computing power of the computing nodes.
  • encrypting the local prediction model using a homomorphic encryption algorithm includes:
  • Public and private keys are constructed on the computing node based on the homomorphic encryption algorithm and the local prediction model is encrypted using the public key.
  • calculating the global prediction model by a predetermined calculation method using the received encrypted local prediction model at the server side includes:
  • the multiple encrypted local prediction models are multiplied to obtain the encrypted global prediction model.
  • decrypting the global prediction model by a homomorphic encryption algorithm at a computing node, and fusing the decrypted global prediction model with the local prediction model includes:
  • the received encrypted global prediction model is decrypted with the private key to obtain an intermediate global prediction model; the average of the intermediate global prediction model is then taken to obtain the global prediction model, and the global prediction model is fused with the local prediction model on the computing node.
  • Another aspect of the present application further proposes a prediction model training system based on homomorphic encryption, comprising:
  • a local prediction model training module is configured to train a local prediction model based on training data on a computing node, encrypt the local prediction model using a homomorphic encryption algorithm, and send the encrypted local prediction model to a server;
  • a global prediction model generation module is configured to calculate the encrypted global prediction model of the received encrypted local prediction model by a predetermined calculation method at the server end, and send the global prediction model to the computing node;
  • the local prediction model optimization module is configured to decrypt the encrypted global prediction model through a homomorphic encryption algorithm at the computing node, and fuse the decrypted global prediction model with the local prediction model.
  • the local prediction model training module is further configured to:
  • a training subset is obtained based on the projection of the training set at the computing node, and a local prediction model is trained based on the Gaussian process regression algorithm using the training subset.
  • the local prediction model training module is further configured to:
  • a local projection set is determined through the projection set and based on the training data of the computing nodes, and a training subset of the computing nodes is determined according to the projection set.
  • the local prediction model training module is further configured to:
  • each projection point is taken from the local projection set, and training data within a neighborhood of a predetermined size is selected based on each projection point to construct a training subset.
  • the local prediction model training module is further configured to:
  • the neighborhood range, the distance between data points, and the size of the projection set are determined based on the computing power of the computing nodes.
  • the local prediction model training module is further configured to:
  • Public and private keys are constructed on the computing node based on the homomorphic encryption algorithm and the local prediction model is encrypted using the public key.
  • the global prediction model generation module is further configured to:
  • the multiple encrypted local prediction models are multiplied to obtain the encrypted global prediction model.
  • the local prediction model optimization module is further configured to:
  • the received encrypted global prediction model is decrypted with the private key to obtain an intermediate global prediction model; the average of the intermediate global prediction model is then taken to obtain the global prediction model, and the global prediction model is fused with the local prediction model on the computing node.
  • Another aspect of the present application also provides a computer device, comprising:
  • the memory stores computer instructions executable on the processor, and when the instructions are executed by the processor, the steps of any one of the methods in the above-mentioned implementation manner are implemented.
  • Another aspect of the present application further provides a computer non-volatile readable storage medium, which stores a computer program.
  • a computer program When the computer program is executed by a processor, the steps of any one of the methods in the above-mentioned embodiments are implemented.
  • the local prediction model obtained by training on each computing node in distributed learning is encrypted on the computing node according to the homomorphic encryption scheme, and the encrypted local prediction model is sent to the server.
  • the server directly multiplies the encrypted local prediction model according to the homomorphic encryption characteristics to obtain the encrypted global prediction model and feeds it back to the computing node.
  • the computing node optimizes and integrates its own local prediction model.
  • only the encrypted local prediction model and the ciphertext-based global prediction model are transmitted, which has extremely high security. At the same time, it has extremely high tolerance for data transmission bandwidth and transmission delay of distributed training.
  • FIG. 1 is a flow chart of a prediction model training method based on homomorphic encryption provided in an embodiment of the present application;
  • FIG. 2 is a schematic diagram of the structure of a prediction model training system based on homomorphic encryption provided in an embodiment of the present application;
  • FIG. 3 is a schematic diagram of the structure of a computer device provided in an embodiment of the present application;
  • FIG. 4 is a schematic diagram of the structure of a computer non-volatile readable storage medium provided in an embodiment of the present application.
  • This application aims to solve the problem, in existing federated or distributed learning, of exchanging model data between distributed computing nodes during model training.
  • the advantage of federated learning lies in privacy protection.
  • Each computing node only shares the model, while its own training data is private data and cannot be shared.
  • the security and privacy of the model are also taken seriously.
  • the trained model can, to a certain extent, reflect the characteristics of the computing node's training data, or the state of the equipment or related data that the training data corresponds to in its field. When the model is obtained by others, the state of the related field represented by the model can be inferred from the model's output.
  • sharing the model may therefore also carry security risks, or the risk of related information being reverse-engineered by others.
  • Commonly used traditional encryption methods include the chaotic encryption algorithm proposed in the 1980s.
  • chaotic encryption cannot guarantee that ciphertext operations can be performed directly on the server side.
  • instead, the ciphertexts of the locally transmitted model prediction expectation and variance must be decrypted on the server side, and the decrypted model parameters, such as expectation and variance, aggregated to obtain the global prediction expectation and global variance. If one does not want the server to know the local models, yet still wants addition, subtraction, multiplication and division to be performed on the ciphertexts, that is, federated learning in which even the server cannot obtain the model data of users or of the corresponding computing nodes, this becomes a difficult problem.
  • the present application proposes a prediction model training method based on homomorphic encryption, including:
  • Step S1 training a local prediction model based on training data at a computing node, encrypting the local prediction model using a homomorphic encryption algorithm, and sending the encrypted local prediction model to a server;
  • Step S2 The server side calculates the encrypted global prediction model using a predetermined calculation method based on the received encrypted local prediction model, and sends the global prediction model to the computing node;
  • Step S3 Decrypt the encrypted global prediction model at the computing node using a homomorphic encryption algorithm, and merge the decrypted global prediction model with the local prediction model.
  • the computing node refers to a computer for training a local prediction model.
  • the local prediction model is obtained by training on the data collected by the node itself, used as training data, according to the corresponding business model; the local prediction model trained on the computing node's training data is then encrypted with a homomorphic encryption algorithm, and the encrypted local prediction model is sent to the server.
  • the homomorphic encryption algorithm is the Paillier encryption algorithm.
  • the content of the Paillier algorithm is as follows:
  • lcm(a,b) represents the least common multiple of a and b, and gcd(a,b) represents the greatest common divisor of a and b.
  • For the set S_n = {u < n² | u ≡ 1 (mod n)}, the function L on S_n is defined as L(u) = (u − 1)/n.
  • in step S1, at each computing node, the local prediction model trained by the node itself is encrypted, yielding at each computing node the corresponding encrypted local prediction model for the plaintexts m_1, m_2, …, m_k (assuming there are k computing nodes).
  • in step S2, the server directly multiplies the received encrypted local prediction model ciphertexts to obtain the encrypted global prediction model E(m_1)E(m_2)…E(m_k), which encrypts the sum m_1 + … + m_k. The global prediction model is then sent to each computing node.
  • in step S3, after receiving the encrypted global prediction model E(m_1)E(m_2)…E(m_k), the computing node decrypts it according to the decryption method of the homomorphic encryption algorithm to obtain m_1 + … + m_k.
  • the average of m_1 + … + m_k is calculated to obtain the corresponding global prediction model.
  • the global prediction model is then fused and optimized with the local prediction model of the computing node itself.
  • training a local prediction model based on training data at a computing node includes:
  • a training subset is obtained based on the projection of the training set at the computing node, and a local prediction model is trained based on the Gaussian process regression algorithm using the training subset.
  • obtaining a training subset based on projecting the training set at a computing node includes:
  • determining a training subset of computing nodes according to the projection set further includes:
  • each projection point is taken from the local projection set, and training data within a neighborhood of a predetermined size is selected based on each projection point to construct a training subset.
  • the projection method is used to obtain a subset of training data for training the local prediction model. Specifically, the distance between two training data points x and x′ is defined as d(x, x′) = ‖x − x′‖, the distance from a point x to a set A as d(x, A) = min_{x′∈A} d(x, x′), and the projection set of x onto A as proj_A(x) = {x′ ∈ A : d(x, x′) = d(x, A)}.
  • Step 1: consider each computing node i and its local training data set D_i(t). For a test datum x_*, calculate the projection of x_* onto the training inputs X_i(t), denoted proj_{X_i(t)}(x_*).
  • at each time t, this local projection set contains the corresponding projection data.
  • Step 2: for each computing node i and its projection set, take out each projection point, denoted x̄_j, where the subscript j represents the j-th projection point. Then for each projection point x̄_j, find a neighborhood B_j of training data with x̄_j ∈ B_j. It should be noted here that the number of neighbors is adjustable and can be chosen fixed.
  • Step 3: for each computing node i, construct the new training set D̃_i(t) = ∪_j B_j as the training subset for training the local prediction model.
  • the method further comprises:
  • the neighborhood range, the distance between data points, and the size of the projection set are determined based on the computing power of the computing nodes.
  • the neighborhood, the distance between data points and the projection set that determine the training subset can all be set flexibly according to the computing performance of the computing node. When the performance of the computing node is good, the size of the training subset can be appropriately increased.
  • encrypting the local prediction model using a homomorphic encryption algorithm includes:
  • Public and private keys are constructed on the computing node based on the homomorphic encryption algorithm and the local prediction model is encrypted using the public key.
  • calculating the global prediction model by a predetermined calculation method using the received encrypted local prediction model at the server side includes:
  • the multiple encrypted local prediction models are multiplied to obtain the encrypted global prediction model.
  • decrypting the global prediction model by a homomorphic encryption algorithm at a computing node, and fusing the decrypted global prediction model with the local prediction model, includes:
  • the received encrypted global prediction model is decrypted with the private key to obtain an intermediate global prediction model; the global prediction model is obtained by averaging the intermediate global prediction model according to the number of local prediction models participating in the calculation of the global prediction model, and the global prediction model is fused with the local prediction model on the computing node.
  • the present application uses the Gaussian process regression algorithm as the model training algorithm in combination with the homomorphic encryption training method of the present application for illustration.
  • define the objective function f: X → R, where X ⊆ R^{n_x} is the n_x-dimensional input space.
  • without loss of generality, the output is one-dimensional, that is, y ∈ R. At time t (federated learning is always in dynamic learning), given x the corresponding output is:
  • y = f(x) + ε    formula (1),
  • where ε is Gaussian noise with mean 0 and variance σ_e². The goal of Gaussian process regression is to use the training set to approximate the function f on the test data set.
  • define a symmetric positive semidefinite kernel function k: R^{n_x} × R^{n_x} → R, that is: ∫∫ k(x,x′) f(x) f(x′) dν(x) dν(x′) ≥ 0;
  • Gaussian process regression uses the training set to predict the output on the test data set. This output still obeys a normal distribution, with posterior expectation and covariance given by the standard Gaussian process regression formulas.
  • each computing node will send its trained local prediction to the server.
  • the kernel function is chosen as a symmetric positive semidefinite kernel as defined above.
  • the Gaussian posterior probability distribution is then calculated on the new training set.
  • the following is a local model transmission scheme based on the Paillier homomorphic encryption algorithm and a server-side ciphertext average aggregation algorithm.
  • This encryption algorithm has been proven to be semantically secure.
  • n is the number of encrypted local prediction models, that is, the total number of computing nodes.
  • Each computing node uses the global predictions m_ave,1(t) and m_ave,2(t) and its own local predictions m_i,1(t) and m_i,2(t) to optimize its model prediction.
  • the local prediction model trained on each computing node in distributed learning is encrypted on the computing node according to the homomorphic encryption model, and the encrypted local prediction model is sent to the server.
  • the server directly multiplies the encrypted local prediction model according to the homomorphic encryption characteristics to obtain the encrypted global prediction model and feeds it back to the computing node.
  • the computing node optimizes and integrates its own local prediction model.
  • only the encrypted local prediction model and the ciphertext-based global prediction model are transmitted, which has extremely high security. At the same time, it has extremely high tolerance for data transmission bandwidth and transmission delay of distributed training.
  • the method provided by this application can be widely used in various fields of federated learning, and can fully take into account the needs of different users in various fields for privacy data security.
  • the user's data and local models will not be obtained by the federated learning provider (i.e., the server side), fully guaranteeing the data privacy and model privacy of each participant under the federated learning framework. Even if the data on the transmission link is intercepted, the corresponding model cannot be obtained, which effectively prevents model theft over the network. This provides security for shared model training in fields such as personal home assistants: personal private data, and the local prediction models trained on it, remain under protection, making it easier for federated learning providers to gain the trust of users.
  • another aspect of the present application further proposes a prediction model training system based on homomorphic encryption, comprising:
  • the local prediction model training module 1 is configured to train a local prediction model based on training data at a computing node, encrypt the local prediction model using a homomorphic encryption algorithm, and send the encrypted local prediction model to a server;
  • the global prediction model generation module 2 is configured to calculate the encrypted global prediction model of the received encrypted local prediction model by a predetermined calculation method at the server end, and send the global prediction model to the computing node;
  • the local prediction model optimization module 3 is configured to decrypt the encrypted global prediction model through a homomorphic encryption algorithm at the computing node, and fuse the decrypted global prediction model with the local prediction model.
  • the local prediction model training module is further configured to:
  • a training subset is obtained based on the projection of the training set at the computing node, and a local prediction model is trained based on the Gaussian process regression algorithm using the training subset.
  • the local prediction model training module 1 is further configured to:
  • a local projection set is determined through the projection set and based on the training data of the computing nodes, and a training subset of the computing nodes is determined according to the projection set.
  • the local prediction model training module 1 is further configured to:
  • each projection point is taken from the local projection set, and training data within a neighborhood of a predetermined size is selected based on each projection point to construct a training subset.
  • the local prediction model training module 1 is further configured to:
  • the neighborhood range, the distance between data points, and the size of the projection set are determined based on the computing power of the computing nodes.
  • the local prediction model training module 1 is further configured to:
  • Public and private keys are constructed on the computing node based on the homomorphic encryption algorithm and the local prediction model is encrypted using the public key.
  • the global prediction model generation module 2 is further configured to:
  • the multiple encrypted local prediction models are multiplied to obtain the encrypted global prediction model.
  • the local prediction model optimization module 3 is further configured to:
  • the global prediction model is obtained by averaging the intermediate global prediction models according to the number of local prediction models participating in the calculation of the global prediction model, and the global prediction model is fused with the local prediction model on the computing node.
  • as shown in FIG. 3, another aspect of the present application further provides a computer device, comprising:
  • the memory 22 stores computer instructions 23 that can be executed on the processor 21.
  • the instructions 23 are executed by the processor 21, the steps of any one of the methods in the above-mentioned implementation manner are implemented.
  • as shown in FIG. 4, another aspect of the present application further proposes a computer non-volatile readable storage medium 401, which stores a computer program 402, and when the computer program 402 is executed by a processor, the steps of any one of the methods in the above-mentioned embodiments are implemented.
  • the software module can reside in a RAM memory, a flash memory, a ROM memory, an EPROM memory, an EEPROM memory, a register, a hard disk, a removable disk, a CD-ROM, or a storage medium of any other form known in the art.
  • An exemplary storage medium is coupled to the processor so that the processor can read information from the storage medium or write information to the storage medium.
  • the storage medium can be integrated with the processor.
  • the processor and the storage medium can reside in an ASIC.
  • the ASIC can reside in a user terminal.
  • the processor and the storage medium can reside in a user terminal as discrete components.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

This application belongs to the field of artificial intelligence, and specifically relates to a prediction model training method, system, device and non-volatile readable storage medium based on homomorphic encryption. The method includes: training a local prediction model on a computing node based on training data, encrypting the local prediction model with a homomorphic encryption algorithm, and sending the encrypted local prediction model to a server; at the server, computing an encrypted global prediction model from the received encrypted local prediction models by a predetermined calculation, and sending the global prediction model to the computing nodes; at a computing node, decrypting the encrypted global prediction model with the homomorphic encryption algorithm, and fusing the decrypted global prediction model with the local prediction model. With the prediction model training method based on homomorphic encryption proposed in this application, only the encrypted local prediction models and the ciphertext-based global prediction model are transmitted during the whole distributed training process, which provides extremely high security.

Description

Prediction model training method, system, device and medium based on homomorphic encryption
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese patent application No. 202211401730.3, filed with the China National Intellectual Property Administration on November 10, 2022 and entitled "Prediction model training method, system, device and medium based on homomorphic encryption", the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
This application belongs to the field of artificial intelligence, and specifically relates to a prediction model training method, system, device and non-volatile readable storage medium based on homomorphic encryption.
BACKGROUND
Distributed machine learning generally uses deep neural networks as the machine learning model. By the central limit theorem, if the weights of a neural network follow a Gaussian distribution, then as the width of the network tends to infinity such a network is equivalent to Gaussian process regression. Gaussian process regression is a non-parametric statistical probability model: given training data and test inputs, its prediction is divided into two steps, inference and prediction, and no optimization problem needs to be solved. The inference step assumes that the function to be learned follows a Gaussian process, gives the Gaussian prior probability distribution of the model, and then uses the observations and Bayes' rule to find the Gaussian posterior probability distribution of the model. After local model prediction is completed, each computing node sends its local prediction (expectation and variance) to the server, which completes the computation of the global model, for example using an average aggregation algorithm. During the transmission of the local models, however, attackers may snoop on and steal the transmitted local prediction values, so that the privacy of the local models is threatened.
When a computing node finishes model prediction and sends the prediction results to the server, these results are undoubtedly fragile and sensitive, and vulnerable to snooping and tampering by attackers. For example, an attacker may alter data in an image data set so that the trained model deviates from the true model to a certain extent; this affects the application of an accurate model and may even cause economic losses. To ensure that model predictions are not stolen in transit, encryption is a good choice.
However, traditional encryption methods all require mutual encryption and decryption between the server and the computing nodes, with model computation performed on the decrypted plaintext. If one does not want the server to know the local prediction model, that is, does not want the local prediction model to be obtainable (including after decryption) on the server side, the current traditional approaches cannot achieve this.
SUMMARY
To solve the above problems, this application proposes a prediction model training method based on homomorphic encryption, including:
training a local prediction model on a computing node based on training data, encrypting the local prediction model with a homomorphic encryption algorithm, and sending the encrypted local prediction model to a server;
at the server, computing an encrypted global prediction model from the received encrypted local prediction models by a predetermined calculation, and sending the global prediction model to the computing nodes;
at a computing node, decrypting the encrypted global prediction model with the homomorphic encryption algorithm, and fusing the decrypted global prediction model with the local prediction model.
In some embodiments of this application, training the local prediction model on the computing node based on the training data includes:
obtaining a training subset by projecting onto the training set at the computing node, and training the local prediction model on the training subset with a Gaussian process regression algorithm.
In some embodiments of this application, obtaining the training subset by projecting onto the training set at the computing node includes:
defining a distance between training data points, and defining, based on the distance, the projection set of a data point onto the training data set;
determining a local projection set from the projection set based on the training data of the computing node, and determining the training subset of the computing node according to the projection set.
In some embodiments of this application, determining the training subset of the computing node according to the projection set further includes:
in response to determining the local projection set at the computing node, taking each projection point from the local projection set, and selecting the training data within a neighborhood of predetermined size around each projection point to construct the training subset.
In some embodiments of this application, the method further includes:
setting the neighborhood range, the distance between data points and the size of the projection set according to the computing power of the computing node.
In some embodiments of this application, encrypting the local prediction model with the homomorphic encryption algorithm includes:
constructing a public key and a private key on the computing node based on the homomorphic encryption algorithm, and encrypting the local prediction model with the public key.
In some embodiments of this application, computing the global prediction model from the received encrypted local prediction models at the server by the predetermined calculation includes:
in response to receiving multiple encrypted local prediction models, multiplying the multiple encrypted local prediction models, according to the operational correspondence between ciphertext and plaintext of the homomorphic encryption algorithm, to obtain the encrypted global prediction model.
In some embodiments of this application, decrypting the global prediction model with the homomorphic encryption algorithm at the computing node and fusing the decrypted global prediction model with the local prediction model includes:
decrypting the received encrypted global prediction model on the computing node with the private key to obtain an intermediate global prediction model;
averaging the intermediate global prediction model according to the number of local prediction models participating in the computation of the global prediction model to obtain the global prediction model, and fusing the global prediction model with the local prediction model on the computing node.
Another aspect of this application further proposes a prediction model training system based on homomorphic encryption, including:
a local prediction model training module, configured to train a local prediction model on a computing node based on training data, encrypt the local prediction model with a homomorphic encryption algorithm, and send the encrypted local prediction model to a server;
a global prediction model generation module, configured to compute, at the server, an encrypted global prediction model from the received encrypted local prediction models by a predetermined calculation, and send the global prediction model to the computing nodes;
a local prediction model optimization module, configured to decrypt, at a computing node, the encrypted global prediction model with the homomorphic encryption algorithm, and fuse the decrypted global prediction model with the local prediction model.
In some embodiments of this application, the local prediction model training module is further configured to:
obtain a training subset by projecting onto the training set at the computing node, and train the local prediction model on the training subset with a Gaussian process regression algorithm.
In some embodiments of this application, the local prediction model training module is further configured to:
define a distance between training data points, and define, based on the distance, the projection set of a data point onto the training data set;
determine a local projection set from the projection set based on the training data of the computing node, and determine the training subset of the computing node according to the projection set.
In some embodiments of this application, the local prediction model training module is further configured to:
in response to determining the local projection set at the computing node, take each projection point from the local projection set, and select the training data within a neighborhood of predetermined size around each projection point to construct the training subset.
In some embodiments of this application, the local prediction model training module is further configured to:
set the neighborhood range, the distance between data points and the size of the projection set according to the computing power of the computing node.
In some embodiments of this application, the local prediction model training module is further configured to:
construct a public key and a private key on the computing node based on the homomorphic encryption algorithm, and encrypt the local prediction model with the public key.
In some embodiments of this application, the global prediction model generation module is further configured to:
in response to receiving multiple encrypted local prediction models, multiply the multiple encrypted local prediction models, according to the operational correspondence between ciphertext and plaintext of the homomorphic encryption algorithm, to obtain the encrypted global prediction model.
In some embodiments of this application, the local prediction model optimization module is further configured to:
decrypt the received encrypted global prediction model on the computing node with the private key to obtain an intermediate global prediction model;
average the intermediate global prediction model according to the number of local prediction models participating in the computation of the global prediction model to obtain the global prediction model, and fuse the global prediction model with the local prediction model on the computing node.
A further aspect of this application also proposes a computer device, including:
at least one processor; and
a memory, the memory storing computer instructions executable on the processor, the instructions, when executed by the processor, implementing the steps of any one of the methods in the above embodiments.
A further aspect of this application also proposes a computer non-volatile readable storage medium storing a computer program which, when executed by a processor, implements the steps of any one of the methods in the above embodiments.
With the prediction model training method based on homomorphic encryption proposed in this application, the local prediction model trained on each computing node in distributed learning is encrypted on the computing node according to the homomorphic encryption scheme, and the encrypted local prediction model is sent to the server. The server directly multiplies the encrypted local prediction models according to the homomorphic property to obtain the encrypted global prediction model and feeds it back to the computing nodes; after decryption, each computing node optimizes and fuses its own local prediction model. Throughout the distributed training process, only the encrypted local prediction models and the ciphertext-based global prediction model are transmitted, which provides extremely high security, as well as a high tolerance for the data transmission bandwidth and latency of distributed training.
BRIEF DESCRIPTION OF THE DRAWINGS
To explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flow chart of a prediction model training method based on homomorphic encryption provided in an embodiment of this application;
FIG. 2 is a schematic structural diagram of a prediction model training system based on homomorphic encryption provided in an embodiment of this application;
FIG. 3 is a schematic structural diagram of a computer device provided in an embodiment of this application;
FIG. 4 is a schematic structural diagram of a computer non-volatile readable storage medium provided in an embodiment of this application.
DETAILED DESCRIPTION
To make the objectives, technical solutions and advantages of this application clearer, the embodiments of this application are described in further detail below with reference to specific embodiments and the accompanying drawings.
This application aims to solve the problem, in existing federated or distributed learning, of exchanging model data between distributed computing nodes during model training. The advantage of federated learning lies in privacy protection: each computing node shares only its model, while its own training data is private and cannot be shared. But out of the need for data security and privacy, the security and privacy of the model itself also receive attention. For a computing node, the trained model can to a certain extent reflect the characteristics of that node's training data, or the state of the equipment or related data that the training data corresponds to in its field; when the model is obtained by others, the state of the related field represented by the model can be inferred from the model's output. Sharing the model may therefore also carry security risks, or the risk of related information being reverse-engineered by others. Among current methods that, for security, encrypt model data transmitted between distributed computing nodes, a commonly used traditional method is the chaotic encryption algorithm proposed in the 1980s. Chaotic encryption, however, cannot guarantee that ciphertext operations can be performed directly on the server; instead, the server must decrypt the transmitted ciphertexts of the local model prediction expectation and variance, and then aggregate the decrypted model parameters, such as expectation and variance, to obtain the global prediction expectation and global variance. If one does not want the server to know the local models, yet still wants addition, subtraction, multiplication and division to be performed on the ciphertexts, that is, federated learning in which even the server cannot obtain the model data of users or of the corresponding computing nodes, this becomes a difficult problem.
As shown in FIG. 1, to solve the above problems, this application proposes a prediction model training method based on homomorphic encryption, including:
Step S1: training a local prediction model on a computing node based on training data, encrypting the local prediction model with a homomorphic encryption algorithm, and sending the encrypted local prediction model to a server;
Step S2: at the server, computing an encrypted global prediction model from the received encrypted local prediction models by a predetermined calculation, and sending the global prediction model to the computing nodes;
Step S3: at a computing node, decrypting the encrypted global prediction model with the homomorphic encryption algorithm, and fusing the decrypted global prediction model with the local prediction model.
In this embodiment, in step S1, a computing node is a computer that trains a local prediction model; generally, a federated learning system contains multiple computing nodes. The node uses the data it has collected as training data and trains a local prediction model according to the corresponding business model; the local prediction model trained on the node's training data is then encrypted with a homomorphic encryption algorithm, and the encrypted local prediction model is sent to the server.
The homomorphic encryption algorithm is the Paillier encryption algorithm. The content of the Paillier algorithm is as follows:
(1) Key generation: choose two large primes p and q, let n = pq and λ = lcm(p − 1, q − 1), satisfying gcd(λ, n) = 1. Choose g such that:
gcd(L(g^λ mod n²), n) = 1;
take (n, g) as the public key and λ as the private key.
(2) Encryption: for any plaintext m ∈ Z_n, choose a random number r ∈ Z*_n; the ciphertext is:
c = g^m · r^n mod n².
(3) Decryption: for any ciphertext c ∈ Z*_{n²}, the plaintext is:
m = L(c^λ mod n²) / L(g^λ mod n²) mod n.
Notation: lcm(a, b) denotes the least common multiple of a and b, and gcd(a, b) the greatest common divisor of a and b. Z_n denotes the set {0, 1, …, n − 1}, and Z*_{n²} the set of elements of Z_{n²} coprime to n². For the set S_n = {u < n² | u ≡ 1 (mod n)}, the function L on S_n is defined as:
L(u) = (u − 1)/n.
R denotes the real numbers and R^n the n-dimensional Euclidean space; |x| denotes the absolute value of x.
E(m) denotes the encryption of a plaintext m, and D(c) the decryption of a ciphertext c. The above algorithm is homomorphic, that is, for any m1, m2 ∈ Z_n:
D(E(m1)E(m2)) = m1 + m2 mod n,
so if m1 + m2 < n, then D(E(m1)E(m2)) = m1 + m2. This property readily extends to the sum of several values: if m1 + … + mk < n, then D(E(m1)…E(mk)) = m1 + … + mk. This encryption algorithm has been proven semantically secure. Therefore, in step S1, each computing node encrypts the local prediction model it has trained itself, yielding at the respective computing nodes the corresponding encrypted local prediction models for the plaintexts m1, m2, …, mk (assuming there are k computing nodes).
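For illustration only, the following is a minimal Python sketch of the Paillier scheme described above. The parameter and function names (keygen, encrypt, decrypt) are illustrative assumptions, not from this application, and a real deployment would use a vetted cryptographic library with keys of 2048 bits or more:

```python
import math
import random

def lcm(a, b):
    return a * b // math.gcd(a, b)

def keygen(p, q):
    # Public key (n, g), private key lam, as in the scheme above.
    n = p * q
    lam = lcm(p - 1, q - 1)
    g = n + 1  # a standard valid choice: gcd(L(g^lam mod n^2), n) = 1
    return (n, g), lam

def L(u, n):
    # L is defined on S_n = {u < n^2 | u = 1 mod n} as L(u) = (u - 1) / n.
    return (u - 1) // n

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    # c = g^m * r^n mod n^2
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, lam, c):
    n, g = pub
    n2 = n * n
    # m = L(c^lam mod n^2) / L(g^lam mod n^2) mod n
    mu = pow(L(pow(g, lam, n2), n), -1, n)  # requires Python 3.8+
    return (L(pow(c, lam, n2), n) * mu) % n
```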
Based on the above principle, in step S2, the server treats the received data of the encrypted local prediction models as an encryption of m1 + … + mk: multiplying the ciphertexts of the encrypted local models directly yields the encrypted global prediction model E(m1)E(m2)…E(mk). The global prediction model is then sent to every computing node.
In step S3, after receiving the encrypted global prediction model E(m1)E(m2)…E(mk), a computing node decrypts E(m1)E(m2)…E(mk) according to the decryption method of the homomorphic encryption algorithm to obtain m1 + … + mk, and takes the average of m1 + … + mk to obtain the corresponding global prediction model. The global prediction model is then fused and optimized with the computing node's own local prediction model.
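Continuing the sketch above, the server-side aggregation of steps S2 and S3 reduces to a ciphertext product followed by a decrypt-then-average at each node; here the model values are assumed to be already integer-encoded, and the toy primes are for readability only:

```python
pub, lam = keygen(1000003, 999983)  # toy primes; real keys are far larger
locals_ = [12, 7, 23]               # integer-encoded local model values m_1..m_k

# Each node encrypts its local model and sends only the ciphertext.
cts = [encrypt(pub, m) for m in locals_]

# Server: multiply ciphertexts; it never sees any plaintext model.
n2 = pub[0] ** 2
agg = 1
for c in cts:
    agg = (agg * c) % n2

# Node: decrypt the product and average to obtain the global model.
total = decrypt(pub, lam, agg)
assert total == sum(locals_)        # D(E(m1)...E(mk)) = m1 + ... + mk
global_model = total / len(locals_)
```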
In some embodiments of this application, training the local prediction model on the computing node based on the training data includes:
obtaining a training subset by projecting onto the training set at the computing node, and training the local prediction model on the training subset with a Gaussian process regression algorithm.
In some embodiments of this application, obtaining the training subset by projecting onto the training set at the computing node includes:
defining a distance between training data points, and defining, based on the distance, the projection set of a data point onto the training data set;
determining a local projection set from the projection set based on the training data of the computing node, and determining the training subset of the computing node according to the projection set.
In some embodiments of this application, determining the training subset of the computing node according to the projection set further includes:
in response to determining the local projection set at the computing node, taking each projection point from the local projection set, and selecting the training data within a neighborhood of predetermined size around each projection point to construct the training subset.
In this embodiment, in some cases the training data collected by a computing node is large and keeps growing over time, so the training data needs further reduction; hence this embodiment uses the projection method to obtain the subset of training data used to train the local prediction model. Specifically:
Define the distance between two training data points x and x′ as d(x, x′) = ‖x − x′‖, and the distance from a data point x to a set A as d(x, A) = min_{x′∈A} d(x, x′). Define the projection set of a data point x onto the set A as proj_A(x) = {x′ ∈ A : d(x, x′) = d(x, A)}.
Step 1: consider each computing node i and its local training data set D_i(t). For a test datum x_*, compute the projection of x_* onto the training inputs X_i(t), denoted proj_{X_i(t)}(x_*).
At each time t, this local projection set contains the corresponding projection data.
Step 2: for each computing node i and its projection set, take out each projection point, denoted x̄_j, where the subscript j denotes the j-th projection point. Then for each projection point x̄_j, find one of its neighborhoods B_j such that x̄_j ∈ B_j. Note that the number of neighbors is adjustable and can be chosen fixed.
Step 3: for each computing node i, construct the new training set D̃_i(t) = ∪_j B_j as the training subset for training the local prediction model.
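A small Python sketch of this projection-based subset construction, under stated assumptions (Euclidean distance and a fixed neighbor count K; the function names are illustrative, not from this application):

```python
import numpy as np

def projection_set(x_star, X, tol=1e-12):
    # proj_X(x*) = { x' in X : ||x* - x'|| equals the minimum distance to X }
    d = np.linalg.norm(X - x_star, axis=1)
    return X[np.abs(d - d.min()) <= tol]

def training_subset(x_star, X, y, K=5):
    # For each projection point, keep the K nearest training pairs,
    # then union the neighborhoods into the reduced training set.
    keep = set()
    for p in projection_set(x_star, X):
        idx = np.argsort(np.linalg.norm(X - p, axis=1))[:K]
        keep.update(idx.tolist())
    keep = sorted(keep)
    return X[keep], y[keep]
```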
In some embodiments of this application, the method further includes:
setting the neighborhood range, the distance between data points and the size of the projection set according to the computing power of the computing node.
In this embodiment, the neighborhood, the distance between data points and the projection set that determine the training subset can all be set flexibly according to the computing performance of the node; when the node's performance is good, the size of the training subset can be increased appropriately.
In some embodiments of this application, encrypting the local prediction model with the homomorphic encryption algorithm includes:
constructing a public key and a private key on the computing node based on the homomorphic encryption algorithm, and encrypting the local prediction model with the public key.
In some embodiments of this application, computing the global prediction model from the received encrypted local prediction models at the server by the predetermined calculation includes:
in response to receiving multiple encrypted local prediction models, multiplying the multiple encrypted local prediction models, according to the operational correspondence between ciphertext and plaintext of the homomorphic encryption algorithm, to obtain the encrypted global prediction model.
In some embodiments of this application, decrypting the global prediction model with the homomorphic encryption algorithm at the computing node and fusing the decrypted global prediction model with the local prediction model includes:
decrypting the received encrypted global prediction model on the computing node with the private key to obtain an intermediate global prediction model;
averaging the intermediate global prediction model according to the number of local prediction models participating in the computation of the global prediction model to obtain the global prediction model, and fusing the global prediction model with the local prediction model on the computing node.
Embodiment:
In this embodiment, the Gaussian process regression algorithm is used as the model training algorithm, combined with the homomorphic encryption training method of this application, for illustration.
First, define the objective function f: X → R, where X ⊆ R^{n_x} is the n_x-dimensional input space. Without loss of generality, we assume the output is one-dimensional, i.e. y ∈ R, at time t (federated learning is always in dynamic learning). Given x, the corresponding output is:
y = f(x) + ε    formula (1).
Here ε is Gaussian noise following a Gaussian distribution with mean 0 and variance σ_e², i.e. ε ~ N(0, σ_e²). Define a training set of the form D(t) = {X(t), y}, where X(t) = {x^{(1)}, …, x^{(n_s)}} is the set of input data and y = [y^{(1)}, y^{(2)}, …, y^{(n_s)}]^T is the column vector aggregating the outputs. The goal of Gaussian process regression is to use the training set to approximate the function f on a test data set X_*.
Define a symmetric positive semidefinite kernel function k: R^{n_x} × R^{n_x} → R, that is:
∫∫ k(x, x′) f(x) f(x′) dν(x) dν(x′) ≥ 0;
where ν is a measure. Let f(X) return a column vector whose i-th element equals f(x^{(i)}). Assume the function f is a sample from a Gaussian process prior distribution whose mean function is μ and whose kernel is k. Then the training outputs and test outputs follow the joint probability distribution:
[y; f(X_*)] ~ N( [μ(X); μ(X_*)], [k(X, X) + σ_e² I, k(X, X_*); k(X_*, X), k(X_*, X_*)] ),
where μ(X) returns the vector composed of the μ(x^{(i)}), μ(X_*) the vector of the μ(x_*^{(i)}), and k(X, X_*) returns the matrix whose entry in row i and column j is k(x^{(i)}, x_*^{(j)}).
Using the properties of Gaussian processes, Gaussian process regression predicts the output on the test data set from the training set. This output f_* still follows a normal distribution, i.e. f_* ~ N(E[f_*], cov(f_*)), where:
E[f_*] = μ(X_*) + k(X_*, X)(k(X, X) + σ_e² I)^{-1}(y − μ(X)),
cov(f_*) = k(X_*, X_*) − k(X_*, X)(k(X, X) + σ_e² I)^{-1} k(X, X_*).
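The posterior above translates directly into a few lines of linear algebra. The sketch below assumes a squared-exponential kernel and a zero prior mean; both are assumptions for illustration, since this application does not reproduce a specific kernel formula here:

```python
import numpy as np

def se_kernel(A, B, length=1.0, sigma_f=1.0):
    # Squared-exponential kernel k(x, x') = sigma_f^2 exp(-||x - x'||^2 / (2 l^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f ** 2 * np.exp(-0.5 * d2 / length ** 2)

def gp_posterior(X, y, X_star, sigma_e=0.1):
    # E[f*]   = k(X*, X)(k(X, X) + sigma_e^2 I)^{-1} y          (zero prior mean)
    # cov(f*) = k(X*, X*) - k(X*, X)(k(X, X) + sigma_e^2 I)^{-1} k(X, X*)
    K = se_kernel(X, X) + sigma_e ** 2 * np.eye(len(X))
    K_s = se_kernel(X_star, X)
    alpha = np.linalg.solve(K, y)
    mean = K_s @ alpha
    cov = se_kernel(X_star, X_star) - K_s @ np.linalg.solve(K, K_s.T)
    return mean, cov
```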
In distributed machine learning, consider a network with n computing nodes, and define this set as V = {1, …, n}. At each time t, each computing node i uses its local training data to predict the output of the function for the test inputs, where y_i(t) = [y_i(1), …, y_i(t)]. The local prediction trained by each computing node consists of the local posterior expectation and covariance obtained from its own training data by the formulas above.
Under the federated learning framework, every computing node sends its trained local prediction to the server.
1. Construction of the training subset based on projection of the training set:
Define the distance between two training data points x and x′ as d(x, x′) = ‖x − x′‖, and the distance from a data point x to a set A as d(x, A) = min_{x′∈A} d(x, x′). Define the projection set of a data point x onto the set A as proj_A(x) = {x′ ∈ A : d(x, x′) = d(x, A)}.
Step 1: consider each computing node i and its local training data set D_i(t). For a test datum x_*, compute the projection of x_* onto the training inputs, denoted proj_{X_i(t)}(x_*).
At each time t, this local projection set contains the corresponding projection data.
Step 2: for each computing node i and its projection set, take out each projection point, denoted x̄_j, where the subscript j denotes the j-th projection point. Then for each projection point x̄_j, find one of its neighborhoods B_j such that x̄_j ∈ B_j. Note that the number of neighbors is adjustable and can be chosen fixed.
Step 3: for each computing node i, construct the new training set D̃_i(t) = ∪_j B_j.
2. Choose the kernel function:
In general, the kernel function is chosen as a symmetric positive semidefinite kernel as defined above.
3. For each computing node, compute the Gaussian posterior probability distribution on the new training set, that is:
evaluate the posterior expectation and covariance formulas above on the training subset D̃_i(t); this is formula (7).
On the training subset, formula (7) yields the local prediction E_i[f_*] and cov_i(f_*). This local prediction is then sent to the server. The server aggregates the local prediction values with an aggregation algorithm and gives the global prediction.
The local model transmission scheme based on the Paillier homomorphic encryption algorithm and the server-side ciphertext average aggregation algorithm are given below.
(1) Paillier-based encryption of the local model prediction:
The content of the Paillier algorithm is as follows:
1) Key generation: choose two large primes p and q, let n = pq and λ = lcm(p − 1, q − 1), satisfying:
gcd(λ, n) = 1. Choose g such that:
gcd(L(g^λ mod n²), n) = 1,
and we take (n, g) as the public key and λ as the private key.
2) Encryption: for any plaintext m ∈ Z_n, choose a random number r ∈ Z*_n; the ciphertext is:
c = g^m · r^n mod n².
3) Decryption: for any ciphertext c ∈ Z*_{n²}, the plaintext is:
m = L(c^λ mod n²) / L(g^λ mod n²) mod n.
Notation: lcm(a, b) denotes the least common multiple of a and b, and gcd(a, b) the greatest common divisor of a and b. Z_n denotes the set {0, 1, …, n − 1}, and Z*_{n²} the set of elements of Z_{n²} coprime to n². For the set S_n = {u < n² | u ≡ 1 (mod n)}, the function L on S_n is defined as L(u) = (u − 1)/n.
R denotes the real numbers and R^n the n-dimensional Euclidean space; |x| denotes the absolute value of x.
We use E(m) to denote the encryption of a plaintext m and D(c) the decryption of a ciphertext c. The above algorithm is homomorphic, that is, for any m1, m2 ∈ Z_n:
D(E(m1)E(m2)) = m1 + m2 mod n.
So if m1 + m2 < n, then D(E(m1)E(m2)) = m1 + m2. From this, the property extends to the sum of several values: if m1 + … + mk < n, then D(E(m1)…E(mk)) = m1 + … + mk.
This encryption algorithm has been proven semantically secure.
Define the data (plaintexts) to be encrypted as m_i,1(t) and m_i,2(t), where m_i,1(t) corresponds to the local prediction expectation (at time t it is a constant) and m_i,2(t) = cov_i(f_*, t). Encrypt m_i,1(t) and m_i,2(t) with the Paillier homomorphic encryption algorithm; the corresponding ciphertexts are y_i,1(t) and y_i,2(t).
(2) Ciphertext-based average aggregation algorithm:
At time t, after the server receives the encrypted prediction expectation y_i,1(t) and variance y_i,2(t) sent by computing node i, we perform the ciphertext multiplications y_1,1(t)…y_n,1(t) and y_1,2(t)…y_n,2(t). The ciphertext products are then sent back to each computing node.
(3) Decrypting the global prediction ciphertext with the Paillier decryption algorithm:
The decryption is given by the following formulas:
D(y_1,1(t)…y_n,1(t)) = m_1,1(t) + … + m_n,1(t);
D(y_1,2(t)…y_n,2(t)) = m_1,2(t) + … + m_n,2(t);
Then we take the averages:
m_ave,1(t) = (1/n)(m_1,1(t) + … + m_n,1(t)),  m_ave,2(t) = (1/n)(m_1,2(t) + … + m_n,2(t)).
Here m_ave,1(t) is the decrypted global model prediction expectation and m_ave,2(t) is the decrypted global model prediction variance; n is the number of encrypted local prediction models, that is, the total number of computing nodes. Each computing node uses the global predictions m_ave,1(t) and m_ave,2(t) together with its own local predictions m_i,1(t) and m_i,2(t) to optimize its model prediction.
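Since Paillier operates on integer plaintexts while the Gaussian process expectation and variance are reals, one way to realize the m_i,1(t) and m_i,2(t) encoding above is fixed-point scaling. This is a sketch under assumptions: the scale factor and helper names are illustrative, as this application only notes that a constant is involved:

```python
SCALE = 10 ** 6  # fixed-point precision; an assumption, not from this application

def encode(value):
    # Map a real prediction to a non-negative integer plaintext.
    # (Negative values would additionally need an offset or a modular
    # interpretation, which is not discussed here.)
    return int(round(value * SCALE))

def decode(total, n_nodes):
    # Average after decryption, then undo the fixed-point scaling.
    return total / (n_nodes * SCALE)

# One round for the expectations (variances are handled the same way):
#   node i:  y_i1 = encrypt(pub, encode(mean_i))
#   server:  prod_y1 = y_11 * ... * y_n1 mod n^2
#   node:    m_ave1 = decode(decrypt(pub, lam, prod_y1), n)
print(decode(encode(0.731) + encode(0.512), 2))  # plaintext check: 0.6215
```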
With the prediction model training method based on homomorphic encryption proposed in this application, the local prediction model trained on each computing node in distributed learning is encrypted on the computing node according to the homomorphic encryption scheme, and the encrypted local prediction model is sent to the server. The server directly multiplies the encrypted local prediction models according to the homomorphic property to obtain the encrypted global prediction model and feeds it back to the computing nodes; after decryption, each computing node optimizes and fuses its own local prediction model. Throughout the distributed training process, only the encrypted local prediction models and the ciphertext-based global prediction model are transmitted, which provides extremely high security, as well as a high tolerance for the data transmission bandwidth and latency of distributed training.
Moreover, the method provided by this application can be widely applied in all fields of federated learning, and can fully accommodate the needs of different users in those fields for private data security. Neither the users' data nor their local models can be obtained by the federated learning provider (i.e., the server side), fully guaranteeing the data privacy and model privacy of every participant under the federated learning framework. Even if data on the transmission link is intercepted, the corresponding model cannot be obtained, which effectively prevents model theft over the network. This provides security for shared model training in fields such as personal home assistants: personal private data, and the local prediction models trained on it, remain under protection, making it easier for federated learning providers to gain the trust of users.
As shown in FIG. 2, another aspect of this application further proposes a prediction model training system based on homomorphic encryption, including:
a local prediction model training module 1, configured to train a local prediction model on a computing node based on training data, encrypt the local prediction model with a homomorphic encryption algorithm, and send the encrypted local prediction model to a server;
a global prediction model generation module 2, configured to compute, at the server, an encrypted global prediction model from the received encrypted local prediction models by a predetermined calculation, and send the global prediction model to the computing nodes;
a local prediction model optimization module 3, configured to decrypt, at a computing node, the encrypted global prediction model with the homomorphic encryption algorithm, and fuse the decrypted global prediction model with the local prediction model.
In some embodiments of this application, the local prediction model training module is further configured to:
obtain a training subset by projecting onto the training set at the computing node, and train the local prediction model on the training subset with a Gaussian process regression algorithm.
In some embodiments of this application, the local prediction model training module 1 is further configured to:
define a distance between training data points, and define, based on the distance, the projection set of a data point onto the training data set;
determine a local projection set from the projection set based on the training data of the computing node, and determine the training subset of the computing node according to the projection set.
In some embodiments of this application, the local prediction model training module 1 is further configured to:
in response to determining the local projection set at the computing node, take each projection point from the local projection set, and select the training data within a neighborhood of predetermined size around each projection point to construct the training subset.
In some embodiments of this application, the local prediction model training module 1 is further configured to:
set the neighborhood range, the distance between data points and the size of the projection set according to the computing power of the computing node.
In some embodiments of this application, the local prediction model training module 1 is further configured to:
construct a public key and a private key on the computing node based on the homomorphic encryption algorithm, and encrypt the local prediction model with the public key.
In some embodiments of this application, the global prediction model generation module 2 is further configured to:
in response to receiving multiple encrypted local prediction models, multiply the multiple encrypted local prediction models, according to the operational correspondence between ciphertext and plaintext of the homomorphic encryption algorithm, to obtain the encrypted global prediction model.
In some embodiments of this application, the local prediction model optimization module 3 is further configured to:
decrypt the received encrypted global prediction model on the computing node with the private key to obtain an intermediate global prediction model;
average the intermediate global prediction model according to the number of local prediction models participating in the computation of the global prediction model to obtain the global prediction model, and fuse the global prediction model with the local prediction model on the computing node.
As shown in FIG. 3, a further aspect of this application also proposes a computer device, including:
at least one processor 21; and
a memory 22, the memory 22 storing computer instructions 23 executable on the processor 21, the instructions 23, when executed by the processor 21, implementing the steps of any one of the methods in the above embodiments.
As shown in FIG. 4, a further aspect of this application also proposes a computer non-volatile readable storage medium 401 storing a computer program 402 which, when executed by a processor, implements the steps of any one of the methods in the above embodiments.
The above are exemplary embodiments disclosed in this application, but it should be noted that various changes and modifications can be made without departing from the scope of the disclosure of the embodiments of this application as defined by the claims. The functions, steps and/or actions of the method claims according to the disclosed embodiments described herein need not be performed in any particular order. In addition, although elements disclosed in the embodiments of this application may be described or claimed in the singular, they may also be understood as plural unless explicitly limited to the singular.
The serial numbers of the embodiments disclosed above are for description only and do not represent the merits of the embodiments.
Those of ordinary skill in the art will understand that the steps of the methods or algorithms described in connection with the disclosure herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integrated with the processor. The processor and the storage medium may reside in an ASIC, and the ASIC may reside in a user terminal. Alternatively, the processor and the storage medium may reside as discrete components in a user terminal.
Those of ordinary skill in the art should understand that the discussion of any embodiment above is merely exemplary and is not intended to imply that the scope of the disclosure of the embodiments of this application (including the claims) is limited to these examples. Under the idea of the embodiments of this application, the technical features of the above embodiments or of different embodiments may also be combined, and many other variations of the different aspects of the embodiments of this application as described above exist, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the embodiments of this application shall be included within the protection scope of the embodiments of this application.

Claims (31)

  1. A prediction model training method based on homomorphic encryption, characterized by comprising:
    training a local prediction model on a computing node based on training data, encrypting the local prediction model with a homomorphic encryption algorithm, and sending the encrypted local prediction model to a server;
    at the server, computing an encrypted global prediction model from the received encrypted local prediction models by a predetermined calculation, and sending the global prediction model to the computing nodes;
    at a computing node, decrypting the encrypted global prediction model with the homomorphic encryption algorithm, and fusing the decrypted global prediction model with the local prediction model.
  2. The method according to claim 1, characterized in that training the local prediction model on the computing node based on the training data comprises:
    simplifying a training set at the computing node, and training the local prediction model on the simplified training subset with a Gaussian process regression algorithm.
  3. The method according to claim 1, characterized in that simplifying the training set at the computing node and training the local prediction model on the simplified training subset with the Gaussian process regression algorithm comprises:
    when the training set is larger than a preset training set size, simplifying the training set at the computing node, and training the local prediction model on the simplified training subset with the Gaussian process regression algorithm.
  4. The method according to claim 3, characterized in that when the training set is larger than the preset training set size, simplifying the training set at the computing node and training the local prediction model on the simplified training subset with the Gaussian process regression algorithm comprises:
    when the training set is larger than the preset training set size, obtaining a training subset by projecting onto the training set at the computing node, and training the local prediction model on the training subset with the Gaussian process regression algorithm.
  5. The method according to claim 4, characterized in that obtaining the training subset by projecting onto the training set at the computing node comprises:
    defining a distance between training data points, and defining, based on the distance, a projection set of a data point onto the training data set;
    determining a local projection set from the projection set based on the training data of the computing node, and determining the training subset of the computing node according to the projection set.
  6. The method according to claim 5, characterized in that determining the training subset of the computing node according to the projection set further comprises:
    in response to determining the local projection set at the computing node, taking each projection point from the local projection set, and selecting the training data within a neighborhood range of predetermined size around each projection point to construct the training subset.
  7. The method according to claim 6, characterized by further comprising:
    setting the neighborhood range, the distance between data points and the size of the projection set according to the computing power of the computing node.
  8. The method according to claim 1, characterized in that encrypting the local prediction model with the homomorphic encryption algorithm comprises:
    constructing a public key and a private key on the computing node based on the homomorphic encryption algorithm, and encrypting the local prediction model with the public key.
  9. The method according to claim 1, characterized in that computing the global prediction model from the received encrypted local prediction models at the server by the predetermined calculation comprises:
    in response to receiving multiple encrypted local prediction models, multiplying the multiple encrypted local prediction models, according to the operational correspondence between ciphertext and plaintext of the homomorphic encryption algorithm, to obtain the encrypted global prediction model.
  10. The method according to claim 1, characterized in that decrypting the global prediction model with the homomorphic encryption algorithm at the computing node and fusing the decrypted global prediction model with the local prediction model comprises:
    decrypting the received encrypted global prediction model on the computing node with the private key to obtain an intermediate global prediction model; averaging the intermediate global prediction model according to the number of local prediction models participating in the computation of the global prediction model to obtain the global prediction model, and fusing the global prediction model with the local prediction model on the computing node.
  11. The method according to claim 1, characterized in that the computing node is a computer that trains local prediction models in a federated learning system.
  12. The method according to claim 1, characterized in that the computing node is a computer that trains local prediction models in a distributed system.
  13. The method according to claim 1, characterized in that the homomorphic encryption algorithm comprises the Paillier encryption algorithm.
  14. The method according to claim 13, characterized in that encrypting the local prediction model with the homomorphic encryption algorithm comprises:
    encrypting the local prediction model with the Paillier encryption algorithm.
  15. The method according to claim 2, characterized in that obtaining the training subset by projecting onto the training set at the computing node and training the local prediction model on the training subset with the Gaussian process regression algorithm comprises:
    collecting training data at the computing node itself as the training set;
    obtaining the training subset by projecting onto the training set at the computing node, and training the local prediction model on the training subset with the Gaussian process regression algorithm according to the business model corresponding to the computing node.
  16. The method according to claim 7, characterized in that setting the neighborhood range, the distance between data points and the size of the projection set according to the computing power of the computing node comprises:
    when the computing power of the computing node is higher than a preset computing power, increasing the neighborhood range, the distance between data points and the size of the projection set.
  17. The method according to claim 7, characterized in that setting the neighborhood range, the distance between data points and the size of the projection set according to the computing power of the computing node comprises:
    when the computing power of the computing node is lower than the preset computing power, decreasing the neighborhood range, the distance between data points and the size of the projection set.
  18. The method according to claim 6, characterized in that the neighborhood range, the distance between data points and the size of the projection set are fixed.
  19. A prediction model training system based on homomorphic encryption, characterized by comprising:
    a local prediction model training module, which trains a local prediction model on a computing node based on training data, encrypts the local prediction model with a homomorphic encryption algorithm, and sends the encrypted local prediction model to a server;
    a global prediction model generation module, configured to compute, at the server, an encrypted global prediction model from the received encrypted local prediction models by a predetermined calculation, and send the global prediction model to the computing nodes;
    a local prediction model optimization module, configured to decrypt, at a computing node, the encrypted global prediction model with the homomorphic encryption algorithm, and fuse the decrypted global prediction model with the local prediction model.
  20. The system according to claim 19, characterized in that the local prediction model training module is further configured to:
    obtain a training subset by projecting onto the training set at the computing node, and train the local prediction model on the training subset with a Gaussian process regression algorithm.
  21. The system according to claim 20, characterized in that the local prediction model training module is further configured to:
    define a distance between training data points, and define, based on the distance, a projection set of a data point onto the training data set.
  22. The system according to claim 21, characterized in that:
    a local projection set is determined from the projection set based on the training data of the computing node, and the training subset of the computing node is determined according to the projection set.
  23. The system according to claim 22, characterized in that the local prediction model training module is further configured to:
    in response to determining the local projection set at the computing node, take each projection point from the local projection set.
  24. The system according to claim 23, characterized in that the training data within a neighborhood range of predetermined size around each projection point is selected to construct the training subset.
  25. The system according to claim 24, characterized in that the local prediction model training module is further configured to:
    set the neighborhood range, the distance between data points and the size of the projection set according to the computing power of the computing node.
  26. The system according to claim 19, characterized in that the local prediction model training module is further configured to:
    construct a public key and a private key on the computing node based on the homomorphic encryption algorithm, and encrypt the local prediction model with the public key.
  27. The system according to claim 19, characterized in that the global prediction model generation module is further configured to:
    in response to receiving multiple encrypted local prediction models, multiply the multiple encrypted local prediction models, according to the operational correspondence between ciphertext and plaintext of the homomorphic encryption algorithm, to obtain the encrypted global prediction model.
  28. The system according to claim 19, characterized in that the local prediction model optimization module is further configured to:
    decrypt the received encrypted global prediction model on the computing node with the private key to obtain an intermediate global prediction model.
  29. The system according to claim 28, characterized in that the intermediate global prediction model is averaged according to the number of local prediction models participating in the computation of the global prediction model to obtain the global prediction model, and the global prediction model is fused with the local prediction model on the computing node.
  30. A computer device, characterized by comprising:
    at least one processor; and
    a memory, the memory storing computer instructions executable on the processor, the instructions, when executed by the processor, implementing the steps of the method according to any one of claims 1-18.
  31. A computer non-volatile readable storage medium, the computer non-volatile readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1-18.
PCT/CN2023/115580 2022-11-10 2023-08-29 Prediction model training method, system, device and medium based on homomorphic encryption WO2024098897A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211401730.3A CN115664632B (zh) 2022-11-10 2022-11-10 Prediction model training method, system, device and medium based on homomorphic encryption
CN202211401730.3 2022-11-10

Publications (1)

Publication Number Publication Date
WO2024098897A1 true WO2024098897A1 (zh) 2024-05-16

Family

ID=85015340

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/115580 WO2024098897A1 (zh) 2022-11-10 2023-08-29 Prediction model training method, system, device and medium based on homomorphic encryption

Country Status (2)

Country Link
CN (1) CN115664632B (zh)
WO (1) WO2024098897A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115664632B (zh) * 2022-11-10 2023-03-21 苏州浪潮智能科技有限公司 Prediction model training method, system, device and medium based on homomorphic encryption
CN117938355B (zh) * 2024-03-21 2024-06-25 中国信息通信研究院 Blockchain-based joint prediction method, medium and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10241992B1 (en) * 2018-04-27 2019-03-26 Open Text Sa Ulc Table item information extraction with continuous machine learning through local and global models
CN113239404A (zh) * 2021-06-04 2021-08-10 南开大学 Federated learning method based on differential privacy and chaotic encryption
CN113810168A (zh) * 2020-12-30 2021-12-17 京东科技控股股份有限公司 Machine learning model training method, server and computer device
CN115174191A (zh) * 2022-06-30 2022-10-11 山东云海国创云计算装备产业创新中心有限公司 Secure transmission method for local prediction values, computer device and storage medium
CN115664632A (zh) * 2022-11-10 2023-01-31 苏州浪潮智能科技有限公司 Prediction model training method, system, device and medium based on homomorphic encryption

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114817958B (zh) * 2022-04-24 2024-03-29 山东云海国创云计算装备产业创新中心有限公司 Model training method, apparatus, device and medium based on federated learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10241992B1 (en) * 2018-04-27 2019-03-26 Open Text Sa Ulc Table item information extraction with continuous machine learning through local and global models
CN113810168A (zh) * 2020-12-30 2021-12-17 京东科技控股股份有限公司 Machine learning model training method, server and computer device
CN113239404A (zh) * 2021-06-04 2021-08-10 南开大学 Federated learning method based on differential privacy and chaotic encryption
CN115174191A (zh) * 2022-06-30 2022-10-11 山东云海国创云计算装备产业创新中心有限公司 Secure transmission method for local prediction values, computer device and storage medium
CN115664632A (zh) * 2022-11-10 2023-01-31 苏州浪潮智能科技有限公司 Prediction model training method, system, device and medium based on homomorphic encryption

Also Published As

Publication number Publication date
CN115664632A (zh) 2023-01-31
CN115664632B (zh) 2023-03-21

Similar Documents

Publication Publication Date Title
WO2024098897A1 (zh) Prediction model training method, system, device and medium based on homomorphic encryption
Li et al. Privacy-preserving distributed profile matching in proximity-based mobile social networks
US20160020898A1 (en) Privacy-preserving ridge regression
Rao et al. Privacy techniques for edge computing systems
Saputro et al. On preserving user privacy in smart grid advanced metering infrastructure applications
WO2021190452A1 (zh) 用于云雾协助物联网的轻量级属性基签密方法
Chenam et al. A designated cloud server-based multi-user certificateless public key authenticated encryption with conjunctive keyword search against IKGA
CN110730064B (zh) 一种群智感知网络中基于隐私保护的数据融合方法
Liang et al. A ciphertext‐policy attribute‐based proxy re‐encryption scheme for data sharing in public clouds
Wang et al. A new proxy re-encryption scheme for protecting critical information systems
Zhou et al. Leakage-resilient CCA2-secure certificateless public-key encryption scheme without bilinear pairing
Cheng et al. Lightweight noninteractive membership authentication and group key establishment for WSNs
Fu et al. Offline/Online lattice-based ciphertext policy attribute-based encryption
Zhang et al. Privacy-preserving multikey computing framework for encrypted data in the cloud
Xiong et al. Optimizing rewards allocation for privacy-preserving spatial crowdsourcing
Niu et al. Achieving secure friend discovery in social strength-aware pmsns
Wu et al. Symmetric-bivariate-polynomial-based lightweight authenticated group key agreement for industrial internet of things
Ramezanian et al. Privacy preserving shortest path queries on directed graph
Wang et al. Access control encryption without sanitizers for Internet of Energy
Hsu et al. Non‐interactive integrated membership authentication and group arithmetic computation output for 5G sensor networks
Wu et al. Cross-domain identity-based matchmaking encryption
Hu et al. A countermeasure against cryptographic key leakage in cloud: public-key encryption with continuous leakage and tampering resilience
Shen et al. Verifiable privacy-preserving federated learning under multiple encrypted keys
Zhang et al. Privacy‐friendly weighted‐reputation aggregation protocols against malicious adversaries in cloud services
Arij et al. SAMAFog: service-aware mutual authentication fog-based protocol

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23887592

Country of ref document: EP

Kind code of ref document: A1