CN116667996A - Verifiable federated learning method based on hybrid homomorphic encryption
- Publication number
- CN116667996A (application CN202310620541.3A)
- Authority
- CN
- China
- Prior art keywords
- model
- aggregation
- encryption
- pasta
- bfv
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/098—Distributed learning, e.g. federated learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/64—Protecting data integrity, e.g. using checksums, certificates or signatures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
- H04L63/0435—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply symmetric encryption, i.e. same key used for encryption and decryption
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/008—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0816—Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
- H04L9/0838—Key agreement, i.e. key establishment technique in which a shared key is derived by parties as a function of information contributed by, or associated with, each of these
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Computer Hardware Design (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Artificial Intelligence (AREA)
- Bioethics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a verifiable federated learning method based on hybrid homomorphic encryption. Built on the client-server architecture of centralized federated learning, it encrypts data with a hybrid homomorphic encryption technique that supports SIMD operations. While preserving data privacy and correct aggregation, it exploits the simple computation and absence of ciphertext expansion of symmetric encryption to offset the complex computation and severe ciphertext expansion of public-key homomorphic encryption schemes. Verification codes are constructed by Lagrange interpolation so that clients can check the aggregation result. The method comprises system initialization, model training and data encryption, aggregation, and aggregation-result verification and model update. Compared with existing technical solutions, the invention ensures the confidentiality of user gradients, the correctness of aggregation, and the integrity of the aggregation result during federated aggregation, while reducing the number of public-key encryptions and the computation and communication burden, which yields a substantial efficiency gain.
Description
Technical Field
The invention relates to homomorphic encryption within cryptography and to privacy protection in machine learning, and in particular to a verifiable federated learning method based on hybrid homomorphic encryption.
Background
With the arrival of the fourth industrial revolution, new-generation technologies such as artificial intelligence and big data have created opportunities for the intelligent transformation of traditional industries. Machine learning, one of the main methods of big data analysis in recent years, covers a range of intelligent algorithms such as deep neural networks and regression algorithms, and has been applied successfully in healthcare, autonomous driving, finance, industrial manufacturing, and many other fields. A high-quality machine learning model usually needs a large amount of reliable training data, so the performance of AI-based services depends heavily on the quality of the training data, and large-scale data collection is essential. In most industries, however, data is limited and of poor quality, which often leaves trained models overfitted and unreliable and is insufficient to support artificial intelligence applications. An intuitive remedy is data sharing: pooling data across industries and across enterprises within the same industry. In traditional centralized training, the server requires all participants to upload their local data to the cloud. The server then initializes a deep neural network in the cloud and trains it on the training samples until optimal parameters are obtained. Finally, the cloud server publishes a prediction service interface or returns the optimal parameters to the participants. This centralized approach raises a series of data privacy and security problems: users' local data may contain sensitive private information; in a healthcare system, for example, patients may be unwilling to share their medical data with third-party service providers such as cloud servers. In addition, barriers that are hard to break often exist between data sources, and in most industries data exists in isolated silos. Directly integrating data scattered across industries and regions is therefore hard to achieve, or is extremely costly and harmful to privacy.
To solve the "data silo" problem, the concept of federated learning was proposed: during machine learning, each participant can build a joint model with the help of the other parties' data. The parties do not need to share data resources; that is, they train jointly and build a shared machine learning model while the data never leaves its local site. Depending on the communication architecture, federated learning is either centralized or decentralized. Centralized federated learning is based on a client-server architecture, whereas decentralized federated learning is based on a peer-to-peer network architecture. Compared with the decentralized form, centralized federated learning is usually simple and effective, and its scalability and stability make it more widely used. In a centralized scheme, each participant trains a local model on its private data. In each round of model updating, the user uploads the update parameters (gradients) of the local model to the aggregation server. The aggregation server aggregates the update parameters of all users and returns the aggregated value to every user for the model update. Research has shown, however, that an adversary can still obtain sensitive information indirectly from the shared gradients. This gave rise to the research branch of privacy-preserving federated learning.
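The centralized round described above can be sketched without any encryption as follows. This is a minimal illustration only: the source says the server "aggregates" the updates without fixing the rule, so element-wise averaging and the gradient-descent update are assumptions, as are the function names and sample values.

```python
from typing import List


def aggregate(updates: List[List[float]]) -> List[float]:
    """Server side: element-wise mean of the client updates (assumed rule)."""
    n = len(updates)
    return [sum(column) / n for column in zip(*updates)]


def apply_update(model: List[float], agg: List[float], lr: float) -> List[float]:
    """Client side: one gradient-descent step with the aggregated gradient."""
    return [w - lr * g for w, g in zip(model, agg)]


# Three clients upload gradients for a two-parameter model (made-up values).
client_grads = [[0.2, -0.4], [0.4, 0.0], [0.0, 0.1]]
global_model = [1.0, 1.0]

agg = aggregate(client_grads)
new_model = apply_update(global_model, agg, lr=0.5)
```

In this plain form the server sees every client's gradient, which is the leakage that privacy-preserving variants aim to remove.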
Many strategies have been proposed to protect participants' privacy in federated learning; the most widely used are secure multi-party computation, differential privacy, and homomorphic encryption. Secure multi-party computation lets multiple parties jointly compute an agreed function over their private data in such a way that each party learns nothing beyond its own input and output. Differential privacy injects noise into a data set to blur sensitive information, so that a third party cannot distinguish individuals, thereby protecting the privacy of each individual sample. Homomorphic encryption allows certain computations to be performed directly on ciphertexts without decrypting them first. Compared with secure multi-party computation and differential privacy, homomorphic encryption offers stronger privacy guarantees and has little effect on the accuracy of the trained model. However, the commonly used homomorphic encryption schemes are public-key cryptosystems with long keys and complex mathematical operations, which leads to expensive computation and ciphertext expansion. The development of federated learning systems based on homomorphic encryption is therefore limited by a bottleneck of high computation and communication overhead. In cryptographic research, scholars have recently proposed the concept of hybrid homomorphic encryption, which combines symmetric encryption with public-key homomorphic encryption, along with a feasible instance that supports SIMD (Single Instruction Multiple Data) operations: the combined use of the Pasta symmetric cipher (a family of symmetric stream ciphers) and the BFV homomorphic cipher (a fully homomorphic encryption scheme based on the RLWE (Ring Learning With Errors) problem), hereinafter the "Pasta+BFV" scheme. The computational efficiency of symmetric ciphers complements homomorphic encryption and offers a new approach to privacy-preserving federated learning.
Federated learning also faces a data integrity problem. The aggregation server is a third party and can easily become a single point of failure. Without integrity guarantees, an adversary who compromises and controls the server can manipulate the global model. A malicious server can forge aggregation results in order to reverse-engineer the participants' private data or to corrupt the users' local models, leading to wrong classification results.
In summary, protecting user privacy and data integrity are two fundamental problems in federated learning training. Moreover, in federated learning schemes based on homomorphic encryption, it remains an open problem how to achieve strong privacy guarantees while reducing the number of public-key encryptions and the computation and communication burden, that is, how to balance security and efficiency. Designing a secure and efficient verifiable federated learning method is therefore both urgent and meaningful.
Summary of the Invention
To solve the privacy, data integrity, and efficiency problems of federated learning described in the background, the object of the invention is to provide a verifiable federated learning method based on hybrid homomorphic encryption. The method applies to centralized federated learning under a client-server architecture: the parties rely on an aggregation server for joint training while the source data never leaves its local site, which solves the "data silo" problem. For privacy and efficiency, the invention uses the SIMD-capable "Pasta+BFV" hybrid homomorphic encryption technique and introduces the idea of "symmetric encryption temporarily standing in for homomorphic encryption", which moves the complex computation from the client to the server to lighten the users' burden and relieves the communication pressure caused by ciphertext expansion. BFV homomorphic encryption allows correct aggregation over encrypted data, protecting the privacy of the users' source data. SIMD packing allows several data items to be encrypted at once, reducing the number of encryptions. For data integrity, verification codes are constructed by Lagrange interpolation so that clients can check the aggregation result.
The specific technical solution that achieves the object of the invention is as follows.
A verifiable federated learning method based on hybrid homomorphic encryption involves the following entities: a key generation authority PKG (Public Key Generator), n clients, and an aggregation server. It is used for federated learning under a client-server architecture and is characterized by the following steps:
Step A: Initialization
The n clients first negotiate according to business needs and agree on a training model. The PKG generates the initial global model, the keys, and the public parameters, and distributes them to the clients and the aggregation server as required.
Step B: Model training and data encryption
In each round of model updating, each client trains its local model on local data and computes the model update parameters for this iteration. It then preprocesses the update parameters, constructs a verification code, and assembles the plaintext. Finally, it encrypts the plaintext with the Pasta symmetric cipher and sends the Pasta symmetric ciphertext to the aggregation server.
Step C: Aggregation
After receiving the Pasta symmetric ciphertexts from all clients, the aggregation server first converts them into BFV homomorphic ciphertexts. It then aggregates all BFV homomorphic ciphertexts and sends the aggregation result to all clients.
Step D: Aggregation-result verification and model update
After receiving the aggregation result from the aggregation server, each client decrypts it homomorphically. It then checks the aggregation result by Lagrange interpolation; if verification passes, the client updates its model with the aggregation result, and otherwise it discards the result. The client then enters the next iteration, until the model converges or the maximum number of federated rounds is reached.
Step A specifically comprises:
A1: Model initialization
The n clients first agree on a training model according to the training objective. Based on this agreement, the PKG generates the initial global model, the learning rate, the gradient quantization precision, the finite field used for mapping, and the maximum number of federated rounds, and distributes them to all clients; the initial global model serves as each client's initial local model.
A2: Key initialization
The PKG sets up the Pasta encryption scheme, the BFV encryption scheme, and the security parameter λ. It then generates a Pasta key for each client and distributes it to that client. Next, it generates one common set of BFV keys comprising a public key, a private key, and an evaluation key; the public key and the evaluation key are disclosed to all participants (the n clients and the aggregation server), while the private key is shared by all clients and kept secret from the aggregation server. Finally, it encrypts each client's Pasta key in turn under the BFV public key, assembles the results into a user list, and sends the list to the aggregation server.
A3: Public parameter initialization
The PKG generates a parameter sequence for Lagrange interpolation and sends it to all clients.
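The key distribution in step A2 is easiest to follow as a table of who holds what. The sketch below only models that bookkeeping: random bytes stand in for real Pasta and BFV key material, the encrypted Pasta keys are placeholders rather than actual BFV ciphertexts, and `pkg_setup` and all field names are hypothetical. It also assumes each client receives its own Pasta key, which is how the per-client user list in A2 reads.

```python
import secrets
from typing import Dict, Tuple


def pkg_setup(n_clients: int, key_len: int = 16) -> Tuple[Dict[int, dict], dict]:
    """Return (client views, server view) after a mock key initialization."""
    # Random bytes are placeholders for real BFV key material.
    bfv_public = secrets.token_bytes(key_len)
    bfv_evaluation = secrets.token_bytes(key_len)
    bfv_secret = secrets.token_bytes(key_len)
    # One symmetric key per client (assumption, see lead-in).
    pasta_keys = {cid: secrets.token_bytes(key_len) for cid in range(n_clients)}

    clients = {
        cid: {
            "pasta_key": pasta_keys[cid],
            "bfv_public": bfv_public,
            "bfv_evaluation": bfv_evaluation,
            "bfv_secret": bfv_secret,  # shared by all clients
        }
        for cid in range(n_clients)
    }
    server = {
        "bfv_public": bfv_public,
        "bfv_evaluation": bfv_evaluation,
        # Placeholder entries: a real system would store each client's
        # Pasta key encrypted under the BFV public key here.
        "user_list": {cid: ("bfv-encrypted-pasta-key", cid) for cid in range(n_clients)},
    }
    return clients, server


clients, server = pkg_setup(3)
```

The point of the layout is that the server never holds the BFV private key or any Pasta key in the clear, so it can convert and add ciphertexts but cannot read them.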
Step B specifically comprises:
B1: Model training
Each client trains its local model on its local private data set and computes the loss function for this round of training and the gradient used for the update.
B2: Data preprocessing
Before encryption, the gradient is preprocessed into a form suitable for "Pasta+BFV" hybrid homomorphic encryption. Specifically, the floating-point gradient is first quantized into integers; the quantized gradient is then mapped onto the finite field to fit the encryption algorithm; finally, all data to be encrypted are grouped according to the threshold parameter of the SIMD operation supported by Pasta.
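The three preprocessing steps in B2 can be sketched as follows. The source does not fix the quantization rule, the field, or the padding, so the fixed-point scale `SCALE`, the prime `P`, the group size `T`, and zero padding of the last group are all illustrative assumptions.

```python
from typing import List

P = 65537     # hypothetical prime defining the finite field
SCALE = 1000  # hypothetical quantization precision (three decimal places)
T = 4         # hypothetical SIMD threshold, i.e. values per group


def quantize(grads: List[float], scale: int) -> List[int]:
    """Fixed-point quantization: float gradient -> signed integer."""
    return [round(g * scale) for g in grads]


def to_field(values: List[int], p: int) -> List[int]:
    """Map signed integers into [0, p); a negative v becomes p - |v|."""
    return [v % p for v in values]


def group(values: List[int], t: int, pad: int = 0) -> List[List[int]]:
    """Split into groups of t values, zero-padding the last group."""
    groups = [values[i:i + t] for i in range(0, len(values), t)]
    if groups and len(groups[-1]) < t:
        groups[-1] = groups[-1] + [pad] * (t - len(groups[-1]))
    return groups


grads = [0.125, -0.5, 0.031, 0.0, 0.25]
quantized = quantize(grads, SCALE)
field_values = to_field(quantized, P)
groups = group(field_values, T)
```

With this encoding the field must be large enough that the sum over all clients still fits in the signed range around zero, otherwise the aggregated value wraps around and cannot be decoded.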
B3: Verification code construction
For each preprocessed gradient group, a verification code is constructed by Lagrange interpolation.
B4: Encryption
A plaintext vector for one encryption is assembled from a preprocessed gradient group and its verification code. The plaintext vector is encrypted with the Pasta cipher, and the Pasta symmetric ciphertext is sent to the aggregation server.
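The text does not spell out how the verification code is derived, so the following is one possible construction rather than the patented one: treat the group values as evaluations of a polynomial at points `xs` (standing in for the PKG's parameter sequence), and take the interpolated polynomial's value at an extra point `x0` as the code. Because interpolation is linear in the values, the sum of the clients' codes equals the code of the summed group, which is what lets a client re-check an aggregated result. The names `lagrange_tag`, `xs`, `x0`, and the choice to append the code as the last slot are all assumptions.

```python
from typing import List

P = 65537  # hypothetical prime field, shared with the preprocessing step


def lagrange_tag(values: List[int], xs: List[int], x0: int, p: int) -> int:
    """Evaluate at x0 the polynomial through (xs[i], values[i]), modulo p."""
    tag = 0
    for i, (xi, mi) in enumerate(zip(xs, values)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x0 - xj) % p
                den = den * (xi - xj) % p
        # pow(den, -1, p) is the modular inverse (Python 3.8+).
        tag = (tag + mi * num * pow(den, -1, p)) % p
    return tag


xs = [1, 2, 3, 4]  # stand-in for the PKG parameter sequence
x0 = 0             # hypothetical evaluation point for the code

group_a = [125, 65037, 31, 0]
group_b = [7, 20, 65500, 3]

# Plaintext vector for one encryption: the group followed by its code.
plaintext_a = group_a + [lagrange_tag(group_a, xs, x0, P)]
plaintext_b = group_b + [lagrange_tag(group_b, xs, x0, P)]

# Slot-wise sum, as the server would compute it under encryption.
aggregated = [(a + b) % P for a, b in zip(plaintext_a, plaintext_b)]
recomputed = lagrange_tag(aggregated[:-1], xs, x0, P)
```

Here `recomputed` matches `aggregated[-1]` for an honest sum and, in general, differs when individual slots are altered without adjusting the code consistently.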
Step C specifically comprises:
C1: Ciphertext conversion
After receiving a Pasta symmetric ciphertext from a client, the aggregation server first retrieves that user's BFV-encrypted Pasta key from the user list it received from the PKG during initialization. It then uses the fully homomorphic property of BFV to decrypt the symmetric ciphertext homomorphically, which converts it into a BFV homomorphic ciphertext.
C2: Aggregation
After converting all received Pasta symmetric ciphertexts into BFV homomorphic ciphertexts, the aggregation server aggregates the BFV homomorphic ciphertexts and sends the aggregation result to each client.
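The data flow of steps C1 and C2 can be traced with a deliberately insecure mock. Nothing below is Pasta or BFV: the symmetric cipher is a toy additive stream cipher over a prime field with a SHAKE-256 keystream, and `MockCt` and `MockEncKey` hold their contents in the clear purely to show which operations the server performs. In a real system the keystream would be computed homomorphically from the BFV-encrypted key, and the server could not read any of these values.

```python
import hashlib
from typing import List

P = 65537  # hypothetical prime plaintext modulus


def keystream(key: bytes, nonce: bytes, n: int) -> List[int]:
    """Toy keystream of n field elements (not the Pasta permutation)."""
    raw = hashlib.shake_256(key + nonce).digest(4 * n)
    return [int.from_bytes(raw[4 * i:4 * i + 4], "big") % P for i in range(n)]


def sym_encrypt(key: bytes, nonce: bytes, msg: List[int]) -> List[int]:
    """Client side: ciphertext has exactly as many elements as the message."""
    return [(m + k) % P for m, k in zip(msg, keystream(key, nonce, len(msg)))]


class MockCt:
    """Stand-in for a packed BFV ciphertext. Provides NO confidentiality."""

    def __init__(self, slots: List[int]):
        self.slots = [s % P for s in slots]

    def __add__(self, other: "MockCt") -> "MockCt":
        return MockCt([a + b for a, b in zip(self.slots, other.slots)])

    def __sub__(self, other: "MockCt") -> "MockCt":
        return MockCt([a - b for a, b in zip(self.slots, other.slots)])


class MockEncKey:
    """Stand-in for a symmetric key encrypted under the BFV public key."""

    def __init__(self, key: bytes):
        self.key = key


def transcipher(sym_ct: List[int], enc_key: MockEncKey, nonce: bytes) -> MockCt:
    """C1: 'homomorphically' remove the keystream from the symmetric ciphertext."""
    enc_keystream = MockCt(keystream(enc_key.key, nonce, len(sym_ct)))
    return MockCt(sym_ct) - enc_keystream


def aggregate(cts: List[MockCt]) -> MockCt:
    """C2: slot-wise sum of the converted ciphertexts."""
    total = cts[0]
    for ct in cts[1:]:
        total = total + ct
    return total


keys = {0: b"k0" * 8, 1: b"k1" * 8}
nonce = b"round-1"
messages = {0: [1, 2, 3], 1: [10, 20, 30]}

uploads = {cid: sym_encrypt(keys[cid], nonce, messages[cid]) for cid in keys}
user_list = {cid: MockEncKey(keys[cid]) for cid in keys}
converted = [transcipher(uploads[cid], user_list[cid], nonce) for cid in sorted(uploads)]
agg = aggregate(converted)
```

After a client "decrypts" `agg`, the slots hold the element-wise sum of the clients' messages, while each upload was only as long as its plaintext.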
Step D specifically comprises:
D1: Verification
After receiving the aggregation result from the aggregation server, the client decrypts it and splits it into the aggregated gradient group and the aggregated verification code. It then constructs a check code from the aggregated gradient group by Lagrange interpolation and tests whether the check code equals the aggregated verification code. If they are equal, verification succeeds; otherwise, verification fails.
D2: Model update
If verification succeeds, the user post-processes the aggregated gradient by applying the inverse of the data preprocessing used in the model training and data encryption step, and then updates the model with the recovered aggregated gradient. If verification fails, the aggregated value is discarded.
D3: Iteration
The next round of training begins; that is, steps B to D are repeated until the model converges or the maximum number of federated rounds is reached.
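Steps D1 and D2 on the client can be sketched as a single function: split the decrypted vector, recompute the code, and only then undo the preprocessing. The layout (code in the last slot), the centered lift used to recover negative values, and the parameters `P` and `SCALE` are assumptions. The `make_tag` argument is any function that rebuilds the verification code from a group; the weighted checksum used in the example is only a linear stand-in so the snippet stays self-contained, not the Lagrange construction itself.

```python
from typing import Callable, List, Optional

P = 65537     # hypothetical prime field
SCALE = 1000  # hypothetical quantization precision


def from_field(v: int, p: int) -> int:
    """Centered lift: values above p // 2 represent negative integers."""
    return v - p if v > p // 2 else v


def verify_and_recover(
    agg_vector: List[int],
    make_tag: Callable[[List[int]], int],
    p: int,
    scale: int,
) -> Optional[List[float]]:
    """Return the dequantized aggregated gradient, or None if the check fails."""
    *agg_group, agg_tag = agg_vector
    if make_tag(agg_group) % p != agg_tag % p:
        return None  # D1 failed: discard the aggregation result
    return [from_field(v, p) / scale for v in agg_group]  # D2: inverse preprocessing


def stand_in_tag(values: List[int]) -> int:
    """Linear checksum standing in for the Lagrange-based code."""
    return sum((i + 1) * v for i, v in enumerate(values)) % P


# Field-encoded sum of two clients' quantized gradients [125, -500] and [100, 200].
agg_group = [225, (-300) % P]
honest = agg_group + [stand_in_tag(agg_group)]
forged = [226] + honest[1:]

recovered = verify_and_recover(honest, stand_in_tag, P, SCALE)
rejected = verify_and_recover(forged, stand_in_tag, P, SCALE)
```

The recovered values are sums over clients; whether to divide by the number of clients before the update depends on the aggregation rule the parties agreed on.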
Compared with existing privacy-preserving federated learning methods, the invention has the following beneficial effects:
(1) The invention applies the concept of hybrid homomorphic encryption to federated learning, exploiting the simple computation and absence of ciphertext expansion of symmetric encryption to offset the complex computation and severe ciphertext expansion of public-key homomorphic encryption. In existing federated learning methods based on homomorphic encryption, the client usually encrypts the gradient directly with a public-key homomorphic encryption scheme and sends the homomorphic ciphertext to the aggregation server, which computes the aggregation result. However, the ciphertext produced by a public-key encryption scheme is usually much longer than the plaintext, with an expansion factor that depends on the scheme's security parameter, and that parameter must be large enough to keep the scheme secure. Studies have shown that such methods increase the volume of transmitted data by more than 150 times compared with no encryption. In addition, public-key homomorphic encryption requires complex cryptographic operations (such as modular multiplication and exponentiation), which puts heavy computational pressure on clients with limited computing power. In the invention, the client encrypts the gradient with a symmetric encryption algorithm and sends the symmetric ciphertext to the aggregation server, which computes the aggregation result. The symmetric scheme is simple to compute, and its plaintext and ciphertext have equal length, that is, the ciphertext expansion factor is 1. This moves the computational burden from resource-limited clients to the aggregation server, reduces the clients' computation overhead, and relieves the communication pressure caused by the ciphertext expansion of public-key encryption.
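The expansion argument can be made concrete with a little arithmetic. A freshly encrypted BFV ciphertext consists of two ring elements, each with N coefficients modulo q, whereas a symmetric ciphertext over the plaintext field is as long as the plaintext. The parameter values below (N, log q, log p) are hypothetical and are not taken from the patent or from the 150-fold figure it cites.

```python
def bfv_ciphertext_bits(n: int, log_q: int) -> int:
    """Size of a fresh BFV ciphertext: two polynomials of n coefficients mod q."""
    return 2 * n * log_q


def field_vector_bits(num_values: int, log_p: int) -> int:
    """Size of num_values field elements (plaintext or symmetric ciphertext)."""
    return num_values * log_p


N = 8192     # hypothetical ring dimension (also the number of SIMD slots)
LOG_Q = 218  # hypothetical ciphertext modulus size in bits
LOG_P = 17   # hypothetical plaintext modulus size in bits

plaintext_bits = field_vector_bits(N, LOG_P)
symmetric_bits = field_vector_bits(N, LOG_P)   # expansion factor 1
bfv_bits = bfv_ciphertext_bits(N, LOG_Q)

packed_expansion = bfv_bits / plaintext_bits                  # all N slots used
unpacked_expansion = bfv_bits / field_vector_bits(1, LOG_P)   # one value per ciphertext
```

Even with every slot in use the public-key ciphertext is many times larger than the data it carries, and without packing the gap grows by a further factor of N; uploading the symmetric ciphertext avoids that cost on the client's link.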
(2) The hybrid homomorphic encryption scheme used by the invention is the "Pasta+BFV" scheme, which supports SIMD. The BFV homomorphic encryption scheme supports polynomial packing: a plaintext vector is encoded as a single polynomial, so encrypting the vector becomes encrypting one polynomial, and a homomorphic operation on the ciphertext is equivalent to an element-wise operation on the vector. To retain BFV's packing advantage in the hybrid scheme, the invention selects the Pasta symmetric encryption scheme, which supports SIMD. A Pasta scheme with a given SIMD operation threshold can encrypt that many plaintexts and then convert them into a single BFV homomorphic ciphertext in one step. Compared with encrypting one plaintext at a time and converting the ciphertexts one by one, this cuts the number of client-side encryptions and decryptions and of server-side ciphertext conversions accordingly, improving computational efficiency.
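A quick count shows how packing changes the number of operations. The gradient length `D` and the SIMD threshold `T` below are hypothetical values chosen only for illustration.

```python
import math


def num_blocks(num_values: int, threshold: int) -> int:
    """Number of encryptions (and ciphertext conversions) with packing."""
    return math.ceil(num_values / threshold)


D = 100_000  # hypothetical number of gradient entries
T = 128      # hypothetical SIMD operation threshold

without_packing = num_blocks(D, 1)
with_packing = num_blocks(D, T)
```

Each packed block costs one symmetric encryption on the client and one ciphertext conversion on the server, so both counts shrink by roughly the threshold.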
(3) The present invention verifies the aggregation result with Lagrange interpolation. This guarantees data integrity, and the computation is simpler and more efficient than the common approaches based on homomorphic hashing or bilinear pairings.
Description of the drawings
Fig. 1 is an architecture diagram of the present invention;
Fig. 2 is a flowchart of the present invention;
Fig. 3 is a schematic diagram of the gradient grouping method of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the following specific embodiments and the accompanying drawings. Except for what is specifically mentioned below, the procedures, conditions, and experimental methods for carrying out the invention are common general knowledge in the art, and the invention places no particular limitation on them. Note that like reference numerals and letters denote like items in the drawings, so once an item is defined in one drawing it need not be defined or explained again in subsequent drawings.
Explanation of terms:
(1) Symmetric key encryption (SKE): Cryptosystems are divided into symmetric and asymmetric cryptosystems according to whether encryption and decryption use the same key; asymmetric cryptosystems are also called public-key cryptosystems. A symmetric cipher uses the same key for encryption and decryption: the key and the encryption algorithm turn plaintext into ciphertext, and the same key with the decryption algorithm recovers the plaintext from the ciphertext.
A symmetric encryption scheme SKE consists of three probabilistic polynomial-time algorithms:
SKE = (SKE.keyGen, SKE.Enc, SKE.Dec).
Here SKE.keyGen is the key generation algorithm, SKE.Enc is the encryption algorithm, and SKE.Dec is the decryption algorithm.
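The three-algorithm interface can be illustrated with a minimal sketch. It is not Pasta: it is a toy additive one-time pad over a small prime field, chosen only because it shows key generation, encryption, decryption, and a ciphertext expansion factor of 1. The modulus `Q` and the function names are assumptions made for this example.

```python
import secrets

Q = 65537  # small prime standing in for the field size q (assumption)

def ske_keygen(length):
    """SKE.keyGen: one random field element per plaintext element."""
    return [secrets.randbelow(Q) for _ in range(length)]

def ske_enc(key, plaintext):
    """SKE.Enc: add the key element-wise modulo Q."""
    return [(m + k) % Q for m, k in zip(plaintext, key)]

def ske_dec(key, ciphertext):
    """SKE.Dec: subtract the key element-wise modulo Q."""
    return [(c - k) % Q for c, k in zip(ciphertext, key)]

key = ske_keygen(4)
msg = [12, 345, 6789, 65536]
ct = ske_enc(key, msg)
recovered = ske_dec(key, ct)
```

The ciphertext has exactly as many field elements as the plaintext, which is the expansion-factor-1 property the text relies on. A key in this toy must be used only once.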
(2) Homomorphic encryption (HE): Homomorphic encryption allows certain computations to be performed directly on ciphertexts without decrypting them first. The commonly used homomorphic encryption schemes are all public-key cryptosystems. Let $(m_1)^{HE}$ and $(m_2)^{HE}$ denote the ciphertexts of the plaintexts $m_1$ and $m_2$, and let $\oplus$ and $\otimes$ denote addition and multiplication on ciphertexts. The homomorphic property is then expressed as:
Additive homomorphism: $(m_1)^{HE} \oplus (m_2)^{HE} = (m_1 + m_2)^{HE}$
Multiplicative homomorphism: $(m_1)^{HE} \otimes (m_2)^{HE} = (m_1 \cdot m_2)^{HE}$
Depending on the type and number of operations supported, homomorphic encryption is classified as partially homomorphic, somewhat homomorphic, or fully homomorphic. Partially homomorphic encryption supports only addition or only multiplication. Somewhat homomorphic encryption supports a limited number of additions and multiplications. Fully homomorphic encryption supports arbitrary computation on ciphertexts with no limit on the number of operations. The BFV scheme used in the present invention is a fully homomorphic encryption scheme.
A public-key homomorphic encryption scheme HE consists of four probabilistic polynomial-time algorithms:
HE = (HE.keyGen, HE.Enc, HE.Dec, HE.Eval).
Here HE.keyGen is the key generation algorithm, which outputs the keys of the public-key homomorphic scheme: a public key, a private key, and an evaluation key. HE.Enc and HE.Dec are the encryption and decryption algorithms. HE.Eval is the homomorphic evaluation algorithm: given the evaluation key, a target function, and ciphertexts, it outputs a ciphertext whose decryption equals the result of applying the target function directly to the original plaintexts.
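To make the four-algorithm interface concrete, the sketch below uses the Paillier cryptosystem, which is additively homomorphic only. It is not BFV, it needs no evaluation key, and the key size is far too small for real use; the primes and function names are assumptions made for the example.

```python
import math
import secrets

# Two Mersenne primes keep the toy key readable; real keys are far larger.
P, Q = 2**31 - 1, 2**61 - 1

def he_keygen():
    """HE.keyGen: public key n, private key (lam, mu)."""
    n = P * Q
    lam = math.lcm(P - 1, Q - 1)
    mu = pow(lam, -1, n)          # modular inverse of lam mod n
    return n, (lam, mu)

def he_enc(n, m):
    """HE.Enc: c = (1 + n)^m * r^n mod n^2 with random r coprime to n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2

def he_dec(n, sk, c):
    """HE.Dec: m = L(c^lam mod n^2) * mu mod n, with L(u) = (u - 1) // n."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

def he_eval_add(n, c1, c2):
    """HE.Eval for addition: multiplying ciphertexts adds the plaintexts."""
    return c1 * c2 % (n * n)

n, sk = he_keygen()
c_sum = he_eval_add(n, he_enc(n, 20), he_enc(n, 22))
total = he_dec(n, sk, c_sum)
```

The party calling `he_eval_add` never sees 20, 22, or their sum; only the holder of `sk` can decrypt. The sketch requires Python 3.9 or later for `math.lcm`.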
(3) Hybrid homomorphic encryption (HHE): The commonly used homomorphic encryption techniques are public-key cryptosystems with long keys and complex mathematical operations, which leads to high computational cost and ciphertext expansion. Researchers have therefore proposed hybrid homomorphic encryption. Its main idea is to encrypt the data with a symmetric cipher whose ciphertext expansion factor is 1 instead of with homomorphic encryption, whose expansion factor is large, then to encrypt the symmetric key with the homomorphic scheme and send it to the cloud service provider together with the symmetric ciphertext. The cloud service provider first evaluates the symmetric decryption circuit homomorphically, which converts the symmetric ciphertext into a homomorphic ciphertext, and then carries out the required computation. Hybrid homomorphic encryption is defined below from a public-key homomorphic encryption scheme and a symmetric encryption scheme:
From a public-key homomorphic encryption scheme HE and a symmetric encryption scheme SKE, a hybrid homomorphic encryption scheme HHE can be constructed. It consists of five probabilistic polynomial-time algorithms:
HHE = (HHE.keyGen, HHE.Enc, HHE.Decomp, HHE.Eval, HHE.Dec).
The key generation algorithm HHE.keyGen calls HE.keyGen and SKE.keyGen. The encryption algorithm HHE.Enc calls SKE.Enc to encrypt the plaintext and HE.Enc to encrypt the symmetric key. The ciphertext conversion algorithm HHE.Decomp calls HE.Eval to evaluate the decryption circuit of the symmetric cipher homomorphically, which converts the symmetric ciphertext into a homomorphic ciphertext. The homomorphic evaluation algorithm HHE.Eval and the decryption algorithm HHE.Dec call HE.Eval and HE.Dec directly.
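The data flow between the five algorithms can be sketched as follows. This is only a structural illustration: `MockHE` is a hypothetical stand-in that hides nothing and has no keys, and the symmetric cipher is a toy additive pad rather than Pasta. For brevity the HE encryption of the symmetric key is produced in `hhe_keygen`. The point is the order of calls, in particular that `hhe_decomp` evaluates the symmetric decryption on an encrypted key.

```python
import secrets
from dataclasses import dataclass

Q = 65537  # toy field size (assumption)

@dataclass(frozen=True)
class HECiphertext:
    """Stand-in for a homomorphic ciphertext. A mock: it offers no secrecy."""
    hidden: tuple

class MockHE:
    @staticmethod
    def enc(values):
        return HECiphertext(tuple(v % Q for v in values))

    @staticmethod
    def dec(ct):
        return list(ct.hidden)

    @staticmethod
    def eval(func, *cts):
        """Apply func slot by slot, as a real scheme would under encryption."""
        slots = zip(*(c.hidden for c in cts))
        return HECiphertext(tuple(func(*s) % Q for s in slots))

def hhe_keygen(length):
    sym_key = [secrets.randbelow(Q) for _ in range(length)]   # SKE.keyGen
    return sym_key, MockHE.enc(sym_key)                       # HE.Enc of the key

def hhe_enc(sym_key, plaintext):
    return [(m + k) % Q for m, k in zip(plaintext, sym_key)]  # SKE.Enc

def hhe_decomp(enc_key, sym_ct):
    # Evaluate the symmetric decryption circuit (c - k mod Q) on the encrypted key.
    return MockHE.eval(lambda c, k: c - k, MockHE.enc(sym_ct), enc_key)

def hhe_eval(func, *cts):
    return MockHE.eval(func, *cts)

def hhe_dec(ct):
    return MockHE.dec(ct)

# Two clients encrypt symmetrically; the server converts and adds.
key1, enc_key1 = hhe_keygen(3)
key2, enc_key2 = hhe_keygen(3)
ct1 = hhe_decomp(enc_key1, hhe_enc(key1, [1, 2, 3]))
ct2 = hhe_decomp(enc_key2, hhe_enc(key2, [10, 20, 30]))
total = hhe_dec(hhe_eval(lambda a, b: a + b, ct1, ct2))
```

In a real deployment the mock would be replaced by an actual homomorphic scheme, so that the server performing `hhe_decomp` and `hhe_eval` learns neither the symmetric keys nor the plaintexts.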
A finite field, also called a Galois field, is a field with finitely many elements; $F_q$ denotes the finite field with $q$ elements. Homomorphic encryption schemes used in federated learning generally take the finite field $F_q$ as the plaintext space, and many popular schemes (such as BFV) support SIMD operation: a plaintext vector is encoded as one polynomial, encryption of the vector becomes encryption of the polynomial, and a homomorphic operation on the ciphertext is then equivalent to an element-wise operation on the vector. To preserve this property in hybrid homomorphic encryption, researchers have studied symmetric ciphers that meet the corresponding requirements, with results such as Pasta. To gain further efficiency from SIMD, the present invention therefore builds its hybrid homomorphic encryption scheme from the Pasta symmetric cipher and the BFV homomorphic encryption scheme. Pasta is a family of symmetric stream ciphers over $F_q^t$, where $F_q^t$ is the $t$-dimensional vector space over the finite field $F_q$, $q$ is generally chosen as a large prime with $2^{16} < q < 2^{60}$, and $t$ is the threshold for SIMD operation, that is, for batch conversion. A suitable $t$ can be computed from the given $q$ and the required security level. In the resulting "Pasta+BFV" scheme, $t$ plaintexts over $F_q$ are encrypted and then converted into a single BFV homomorphic ciphertext in one step.
Note that the above description of the algorithms of a hybrid homomorphic encryption scheme only serves to explain how such a scheme is constructed and executed. Under that definition, every encryption of a plaintext is accompanied by a homomorphic encryption of the symmetric key, whereas in practice these operations can be simplified and separated. The following description of the embodiments therefore does not use the formal definition of a hybrid homomorphic encryption scheme directly; it follows the idea of hybrid homomorphic encryption and keeps the terminology of the underlying homomorphic and symmetric encryption schemes.
(4) Lagrange interpolation: Lagrange interpolation is a polynomial interpolation method. Given $n+1$ points with distinct abscissas, it yields a polynomial of degree at most $n$ that passes exactly through these $n+1$ points. The idea is to compute a basis function for each given node and then form the linear combination of the basis functions whose coefficients are the function values at the nodes, which gives the interpolation polynomial.
The computation is as follows:
Given $n+1$ distinct interpolation points $x_i$ ($i = 0, 1, \dots, n$) and the corresponding values $f(x_i)$, first compute the interpolation basis functions:
$$l_i(x) = \prod_{j=0,\, j \neq i}^{n} \frac{x - x_j}{x_i - x_j}, \quad i = 0, 1, \dots, n.$$
Each $l_i(x)$ is a polynomial of degree $n$ and satisfies
$$l_i(x_j) = \begin{cases} 1, & j = i \\ 0, & j \neq i. \end{cases}$$
Then form the linear combination of the basis functions:
$$L_n(x) = \sum_{i=0}^{n} f(x_i)\, l_i(x).$$
This gives a polynomial $L_n(x)$ of degree at most $n$ that satisfies $L_n(x_i) = f(x_i)$; it is the Lagrange interpolation polynomial through the $n+1$ given interpolation points.
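The two-stage computation above (basis functions, then their linear combination) can be sketched directly. The example uses exact rational arithmetic; the sample points are illustrative values taken from $f(x) = x^2 + x + 1$, not data from the invention.

```python
from fractions import Fraction

def lagrange_basis(xs, i, x):
    """Evaluate the i-th basis polynomial l_i at x."""
    result = Fraction(1)
    for j, xj in enumerate(xs):
        if j != i:
            result *= Fraction(x - xj, xs[i] - xj)
    return result

def lagrange_eval(xs, ys, x):
    """Evaluate L_n(x) = sum_i f(x_i) * l_i(x)."""
    return sum(y * lagrange_basis(xs, i, x) for i, y in enumerate(ys))

xs = [0, 1, 2]
ys = [1, 3, 7]                      # samples of f(x) = x^2 + x + 1
value_at_3 = lagrange_eval(xs, ys, 3)
```

Evaluating at a node returns the given value, and evaluating elsewhere extrapolates the unique polynomial of degree at most 2 through the three points.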
Embodiment
Referring to Fig. 1, the present invention adopts a centralized federated learning structure based on a client-server architecture with three types of entities: a key generation authority PKG, $n$ clients, and an aggregation server. The PKG is responsible for parameter initialization and key distribution, and takes no part in the process once initialization is complete. Each client $P_i$ ($i \in N$, $N = \{1, 2, \dots, n\}$) holds a private dataset $D_i = \{\langle x_j, y_j \rangle \mid j = 1, 2, \dots, T\}$, where $x_j$ is an input, $y_j$ is its label, and $T = |D_i|$ is the size of the dataset. With this dataset the client trains a local model $f(x, M)$, where $x$ is the input and $M$ denotes the model parameters; the goal of training is to find the model parameters that minimize the loss function, that is, to reach model convergence. In each round of model update, $P_i$ therefore selects a random subset of $D_i$, trains the model locally, and computes the gradient $W_i$ of the loss function. To speed up convergence and compensate for limited local data, a client does not update with its local gradient $W_i$ directly; it updates with the global gradient, which is the aggregate of its own gradient and the gradients of the other clients. Each client thus encrypts its local gradient together with a verification code to obtain the ciphertext $c_i$, uploads it, and asks the aggregation server to aggregate the local gradients of all clients. After receiving the ciphertexts from all clients, the aggregation server aggregates them into $C$ and returns $C$ to every client. Each client unpacks the aggregated ciphertext to obtain the aggregated verification code and the global gradient $W_{global}$. Depending on the verification result, the client decides whether to update its model with $W_{global}$, and then enters the next iteration, until the model converges or the agreed maximum number of federated rounds is reached.
Referring to Fig. 2, the present invention provides a verifiable federated learning method based on hybrid homomorphic encryption, comprising the following steps:
Step A: Initialization
The $n$ clients negotiate according to business requirements and agree on the model to be trained; the key generation authority PKG generates the initial global model, the keys, and the public parameters, and distributes them to the clients and the aggregation server as required;
Step B: Model training and data encryption
In each round of model update, each client trains its local model on local data and computes the model update parameters of the current iteration; it then preprocesses the update parameters, constructs a verification code, and assembles the plaintext; finally, it encrypts the plaintext with the Pasta symmetric cipher and sends the Pasta symmetric ciphertext to the aggregation server;
Step C: Aggregation
After receiving the Pasta symmetric ciphertexts from all clients, the aggregation server first converts them into BFV homomorphic ciphertexts; it then aggregates all BFV homomorphic ciphertexts and sends the aggregation result to all clients;
Step D: Aggregation result verification and model update
After receiving the aggregation result from the aggregation server, each client decrypts it homomorphically; it then checks the aggregation result by Lagrange interpolation, updates the model with the aggregation result if verification passes, and discards the aggregation result otherwise; it then enters the next iteration, that is, steps B to D are repeated until the model converges or the maximum number of federated rounds is reached.
Step A specifically comprises:
Step A1: Model initialization. All clients agree on the model to be trained according to the training objective. Based on this agreement, the PKG generates the initial global model $f(x, M)$, the learning rate $\eta$, the gradient quantization precision $l_w$, the finite field $F_q$ used for mapping, and the maximum number of federated rounds $r_{max}$, and distributes them to all clients. The initial global model serves as the local model in the first round of model update.
Step A2: Key initialization. From the finite field $F_q$ and the required security level, the PKG computes the SIMD operation threshold $t$ and generates the Pasta encryption scheme over $F_q^t$, the BFV encryption scheme, and the security parameters. For each client $P_i$ it then generates a Pasta key $k_i$ that is kept secret from the other clients $P_j$ ($j \in N$, $j \neq i$) and from the aggregation server. Next, it generates a common BFV key set $(pk, sk, evk)$: the public key $pk$ and the evaluation key $evk$ are public to all parties (the $n$ clients and the aggregation server), while the private key $sk$ is shared among all clients and kept secret from the aggregation server. Finally, it encrypts each client's Pasta key under the BFV public key $pk$, that is, for each client $P_i$, $i \in N$, it computes the BFV ciphertext $(k_i)^{HE}$ of $k_i$, and it sends the resulting user list of encrypted Pasta keys to the aggregation server.
Step A3: Public parameter initialization. The PKG generates a parameter sequence $\{a_1, a_2, \dots, a_t\}$ for Lagrange interpolation and sends it to all clients, where $t$ is the SIMD operation threshold of the Pasta encryption scheme.
Step B specifically comprises:
Step B1: Model training. In each round of model update, client $P_i$ randomly selects a subset of its local private dataset to train the local model, and computes the loss function of the current round and the gradient $W_i$ used for the update, which is obtained by applying the gradient operator $\nabla$ to the loss function. Write $W_i = (w_i^{(1)}, w_i^{(2)}, \dots, w_i^{(n_{gn})})$, where $n_{gn} = |W_i|$ is the length of $W_i$.
Step B2: Data preprocessing. Before encryption, the gradient is preprocessed into a form suitable for "Pasta+BFV" hybrid homomorphic encryption. The floating-point gradient vector $W_i$ is first quantized into an integer vector, and the quantized gradient is then mapped onto the finite field $F_q$. Concretely, each component $w$ of the gradient vector is replaced by $\psi((w)_{round})$, where $(\cdot)_{round}$ is the quantization function and $\psi(\cdot)$ is the finite-field mapping function.
In the quantization function, $\lfloor x \rfloor$ denotes the largest integer less than or equal to $x$.
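The exact definitions of the quantization and mapping functions are not recoverable from this text, so the following is a sketch under stated assumptions: quantization is taken to be fixed-point scaling by $2^{l_w}$ followed by the floor, and the field mapping is taken to send a negative integer $x$ to $q + x$. The values of `Q` and `L_W` and all function names are illustrative.

```python
import math

Q = 2**31 - 1   # example prime for the field F_q (assumption)
L_W = 16        # example quantization precision l_w, in bits (assumption)

def quantize(w):
    """Assumed quantization: floor(w * 2^l_w)."""
    return math.floor(w * 2**L_W)

def to_field(x):
    """Assumed field mapping: negatives wrap into the upper half of F_q."""
    return x % Q

def from_field(x):
    """Inverse mapping: elements above q/2 are read back as negatives."""
    return x - Q if x > Q // 2 else x

def dequantize(x):
    """Inverse of the assumed quantization, up to rounding error."""
    return x / 2**L_W

encoded = to_field(quantize(-0.75))
decoded = dequantize(from_field(encoded))
# Addition in the field matches addition of the original values.
field_sum = (to_field(quantize(0.5)) + to_field(quantize(-0.75))) % Q
decoded_sum = dequantize(from_field(field_sum))
```

Under this encoding the aggregate of many gradients decodes correctly only while its magnitude stays below $q/2$, which is one reason a large $q$ is chosen.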
Finally, referring to Fig. 3, all data to be encrypted are grouped according to the SIMD operation threshold $t$ supported by Pasta: every $t-1$ consecutive components form one group, and if the last group has fewer than $t-1$ components it is padded with zeros. The total number of groups is therefore $n_{group} = \lceil n_{gn} / (t-1) \rceil$, where $\lceil x \rceil$ denotes the smallest integer greater than or equal to $x$. The $j$-th group of client $P_i$ is denoted $G_i^{(j)}$, $j \in N_{group}$, with $N_{group} = \{1, 2, \dots, n_{group}\}$, so that the preprocessed gradient is the sequence of groups $G_i^{(1)}, \dots, G_i^{(n_{group})}$.
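The grouping rule can be sketched in a few lines. The function name and the sample values are illustrative; the group size $t-1$ leaves one slot per group free for the verification code.

```python
def group_gradient(values, t):
    """Split values into groups of t-1 components, zero-padding the last group."""
    size = t - 1
    n_group = -(-len(values) // size)            # ceiling division
    padded = values + [0] * (n_group * size - len(values))
    return [padded[k * size:(k + 1) * size] for k in range(n_group)]

# Seven components with threshold t = 4 give ceil(7 / 3) = 3 groups.
groups = group_gradient([5, 6, 7, 8, 9, 10, 11], t=4)
```

Each resulting group later receives one verification code, so every plaintext vector has exactly $t$ elements and fills one SIMD batch.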
Step B3: Constructing the verification code. For each preprocessed gradient group $G_i^{(j)} = (g_1, g_2, \dots, g_{t-1})$, a verification code is constructed by Lagrange interpolation: from the $t-1$ interpolation points $(a_1, g_1), \dots, (a_{t-1}, g_{t-1})$, compute the Lagrange polynomial $L_{t-2}(x)$ of degree $t-2$, then substitute $a_t$ to obtain the verification code $v_i^{(j)} = L_{t-2}(a_t)$.
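A minimal sketch of the verification-code construction, with all arithmetic carried out in $F_q$, might look as follows. The prime `Q`, the public sequence `a`, and the function names are assumptions for the example; the interpolation points pair $a_1, \dots, a_{t-1}$ with the group components, and the code is the value at $a_t$.

```python
Q = 2**31 - 1   # example prime field size (assumption)

def lagrange_eval_mod(xs, ys, x, q=Q):
    """Evaluate at x the polynomial through (xs[i], ys[i]), working in F_q."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % q
                den = den * (xi - xj) % q
        # pow(den, -1, q) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, q)) % q
    return total

def verification_code(group, a, q=Q):
    """Code for one group of t-1 components, given the public a_1..a_t."""
    return lagrange_eval_mod(a[:-1], group, a[-1], q)

a = [1, 2, 3, 4]     # public parameters a_1..a_t with t = 4 (example values)
group = [5, 6, 7]    # one preprocessed gradient group of t-1 components
code = verification_code(group, a)
```

The three example points lie on the line $y = x + 4$, so the code is the value of that line at $a_4 = 4$.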
Step B4: Encryption. Each group $G_i^{(j)}$ and its verification code $v_i^{(j)}$ form a plaintext vector $m_i^{(j)} = G_i^{(j)} \| v_i^{(j)}$ of length $t$. The client encrypts this plaintext vector with the Pasta cipher under its key $k_i$ to obtain a symmetric ciphertext, and sends all of its Pasta symmetric ciphertexts, one for each $j \in N_{group}$, to the aggregation server.
Step C specifically comprises:
Step C1: Ciphertext conversion. After receiving the Pasta symmetric ciphertexts from the clients, the aggregation server retrieves, for each client $P_i$, that client's BFV-encrypted Pasta key from the user list it received from the PKG in step A2. Relying on the fully homomorphic property of BFV, it then decrypts the symmetric ciphertext homomorphically, which converts it into a BFV homomorphic ciphertext of the same plaintext.
Step C2: Aggregation. After converting all received Pasta symmetric ciphertexts into BFV ciphertexts, the aggregation server aggregates the BFV ciphertexts group by group with the aggregation function $h$, obtaining $(C^{(j)})^{HE}$ for each $j \in N_{group}$. It then sends the aggregation results $(C^{(j)})^{HE}$, $j \in N_{group}$, to every client.
Step D specifically comprises:
Step D1: Verification. After receiving the aggregation result from the aggregation server, the client decrypts it, $m^{(j)} = \mathrm{BFV.Dec}((C^{(j)})^{HE})$, $j \in N_{group}$, and splits $m^{(j)} = G^{(j)} \| v^{(j)}$ into the aggregated gradient group $G^{(j)}$ and the aggregated verification code $v^{(j)}$. It then recomputes a check code from the aggregated gradient group by Lagrange interpolation: for each $j \in N_{group}$, it constructs the Lagrange polynomial $(L_{t-2}(x))^{(j)}$ from the $t-1$ interpolation points formed by $a_1, \dots, a_{t-1}$ and the components of $G^{(j)}$, and substitutes $a_t$ to obtain the check code $(L_{t-2}(a_t))^{(j)}$. Finally, it tests whether the check code $(L_{t-2}(a_t))^{(j)}$ equals the aggregated verification code $v^{(j)}$. If they are equal, verification succeeds; otherwise, verification fails.
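The check in step D1 can be sketched on already-decrypted data. The example assumes that the aggregation function is the component-wise sum in $F_q$ and skips encryption entirely; `Q`, `a`, and the function names are illustrative. It shows that an honest sum passes the check and that altering a single gradient component makes it fail.

```python
Q = 2**31 - 1   # example prime field size (assumption)

def interpolate_at(xs, ys, x, q=Q):
    """Value at x of the Lagrange polynomial through (xs[i], ys[i]) in F_q."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % q
                den = den * (xi - xj) % q
        total = (total + yi * num * pow(den, -1, q)) % q
    return total

def make_plaintext(group, a):
    """Client side: append the verification code to the group (G || v)."""
    return group + [interpolate_at(a[:-1], group, a[-1])]

def verify(aggregated, a):
    """Client side after decryption: recompute the code and compare."""
    group, code = aggregated[:-1], aggregated[-1]
    return interpolate_at(a[:-1], group, a[-1]) == code

a = [1, 2, 3, 4]                              # public a_1..a_t, t = 4
m1 = make_plaintext([5, 6, 7], a)             # client 1
m2 = make_plaintext([10, 0, 3], a)            # client 2
aggregated = [(x + y) % Q for x, y in zip(m1, m2)]   # honest server sum

tampered = aggregated.copy()
tampered[0] = (tampered[0] + 1) % Q           # one component altered

honest_ok = verify(aggregated, a)
tampered_ok = verify(tampered, a)
```

The honest aggregate passes because interpolation is linear in the values, so the sum of the clients' codes equals the code of the summed group.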
Step D2: Model update. If verification succeeds, the aggregated gradient is assembled from all aggregated groups. By the additive homomorphism of BFV, the aggregated gradient can be shown to equal the aggregate of the clients' preprocessed gradients, which establishes the correctness of the aggregation process. The client then restores the aggregated gradient by applying to each component the inverse of the data preprocessing in step B2, and updates its model with the restored aggregated gradient. If verification fails, the aggregated value is discarded.
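The reason an honestly aggregated verification code matches the recomputed check code can be written out as a short derivation. It assumes that the aggregation function $h$ is the component-wise sum over $F_q$; the symbols $g_{i,k}$ (the $k$-th component of client $P_i$'s group), $v_i$ (that client's verification code), and $l_k$ (the Lagrange basis on $a_1, \dots, a_{t-1}$) are introduced here for illustration only.

```latex
% Assumption: h is the component-wise sum over F_q.
% Each client's verification code is a fixed linear form in its group:
v_i = \sum_{k=1}^{t-1} g_{i,k}\, l_k(a_t), \qquad
l_k(x) = \prod_{\substack{m=1 \\ m \neq k}}^{t-1} \frac{x - a_m}{a_k - a_m}

% Summing over the n clients and exchanging the order of summation:
\sum_{i=1}^{n} v_i
  = \sum_{i=1}^{n} \sum_{k=1}^{t-1} g_{i,k}\, l_k(a_t)
  = \sum_{k=1}^{t-1} \Bigl( \sum_{i=1}^{n} g_{i,k} \Bigr) l_k(a_t)
```

The right-hand side is exactly the check code that a client recomputes from the aggregated group, since its $k$-th component is $\sum_i g_{i,k}$. The coefficients $l_k(a_t)$ depend only on the public sequence, so the equality holds for any honest sum.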
Step D3: Iteration. Enter the next round of training, that is, repeat steps B1 to D3, until the model converges or the maximum number of federated rounds $r_{max}$ is reached.
In the embodiment, client $P_i$ ($i \in N$, $N = \{1, 2, \dots, n\}$) is used as an example to describe the client-side operations of the method. Note that $P_i$ does not denote one particular client; it stands for every client, all of which perform the same operations.
The embodiments above only illustrate the technical solutions of the present invention and do not limit them. Although the invention has been described in detail with reference to these embodiments, those of ordinary skill in the art should understand that the technical solutions described in the embodiments may still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310620541.3A CN116667996A (en) | 2023-05-30 | 2023-05-30 | Verifiable federal learning method based on mixed homomorphic encryption |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116667996A true CN116667996A (en) | 2023-08-29 |
Family
ID=87709182
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310620541.3A Pending CN116667996A (en) | 2023-05-30 | 2023-05-30 | Verifiable federal learning method based on mixed homomorphic encryption |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116667996A (en) |
- 2023-05-30 CN CN202310620541.3A patent/CN116667996A/en active Pending
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI846601B (en) * | 2023-09-19 | 2024-06-21 | 英業達股份有限公司 | Operating system and method for a fully homomorphic encryption neural network model |
CN117196017A (en) * | 2023-09-28 | 2023-12-08 | 数力聚(北京)科技有限公司 | Federal learning method, system, equipment and medium for lightweight privacy protection and integrity verification |
CN117560229A (en) * | 2024-01-11 | 2024-02-13 | 吉林大学 | Federal non-intrusive load monitoring user verification method |
CN117560229B (en) * | 2024-01-11 | 2024-04-05 | 吉林大学 | A federated non-intrusive load monitoring user verification method |
CN117811722A (en) * | 2024-03-01 | 2024-04-02 | 山东云海国创云计算装备产业创新中心有限公司 | Global parameter model construction method, secret key generation method, device and server |
CN117811722B (en) * | 2024-03-01 | 2024-05-24 | 山东云海国创云计算装备产业创新中心有限公司 | Global parameter model construction method, secret key generation method, device and server |
CN118317296A (en) * | 2024-04-08 | 2024-07-09 | 湖北长江北斗数字产业有限公司 | A data encryption method for Beidou short message transmission |
CN119338029A (en) * | 2024-12-20 | 2025-01-21 | 中电科大数据研究院有限公司 | Decentralized hierarchical federation learning method and system and edge server |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||