CN111915294A - Secure, privacy-preserving, and tradable distributed machine learning framework based on blockchain technology - Google Patents

Secure, privacy-preserving, and tradable distributed machine learning framework based on blockchain technology

Info

Publication number
CN111915294A
CN111915294A (application CN202010496847.9A; granted as CN111915294B)
Authority
CN
China
Prior art keywords
node
model
nodes
machine learning
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010496847.9A
Other languages
Chinese (zh)
Other versions
CN111915294B (en)
Inventor
曹向辉
梁伦
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University
Priority to CN202010496847.9A
Publication of CN111915294A
Application granted
Publication of CN111915294B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 20/00 Payment architectures, schemes or protocols
    • G06Q 20/38 Payment protocols; Details thereof
    • G06Q 20/389 Keeping log of transactions for guaranteeing non-repudiation of a transaction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L 63/0823 Network architectures or network communication protocols for network security for authentication of entities using certificates
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L 63/1441 Countermeasures against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer And Data Communications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a secure, privacy-preserving, and tradable distributed machine learning framework based on blockchain technology. The framework comprises the following parts: a certificate authority (CA) issues and revokes digital certificates for blockchain nodes and manages node permissions; blockchain nodes maintain the machine learning model and participate in model transactions; smart contracts specify the operating rules of distributed machine learning and divide revenue among nodes according to their model contributions; the distributed ledger records the model data produced during training and the model transaction data; data providers collect local data and upload it to the blockchain node servers.

Description

A secure, privacy-preserving, and tradable distributed machine learning framework based on blockchain technology

Technical Field

The invention relates to a secure, privacy-preserving, and tradable distributed machine learning framework based on blockchain technology, and in particular to a framework that uses blockchain (consortium chain) technology to counter Byzantine attacks in distributed machine learning, uses differential privacy to protect each participant's dataset, and supports machine learning model transactions. It belongs to the fields of artificial intelligence, blockchain, and information security.

Background

In the parameter server framework commonly used in distributed machine learning, multiple worker nodes train local models from their local data and the current global model and send them to a parameter server, which aggregates all the local models to update the global model. This process is vulnerable: both the worker nodes and the parameter server may suffer Byzantine attacks. Specifically, a Byzantine worker node sends a wrong local gradient to the parameter server, degrading the final trained model; a Byzantine parameter server aggregates a wrong global model, wasting all prior training. In recent years, because blockchains offer tamper resistance, traceability, distributed storage, and public maintenance, researchers have begun applying them to the Internet of Things, healthcare, finance, and other fields to address security, transaction, and related problems.

To date, research on Byzantine attacks in distributed machine learning has produced some results, but the following problems remain: 1) existing distributed machine learning algorithms do not consider Byzantine attacks on the parameter server during model aggregation; 2) how to handle detected Byzantine nodes so that they cannot interfere with model training; 3) how to implement an incentive mechanism in a blockchain system combined with distributed machine learning so that the system runs more efficiently. A new solution to these technical problems is therefore urgently needed.

Summary of the Invention

The problem addressed by the present invention is to provide, for distributed machine learning, an algorithm that withstands Byzantine attacks on both worker nodes and parameter server nodes. Introducing blockchain technology further requires solving the consensus problem within the blockchain and providing an effective incentive mechanism so that the blockchain system can run effectively over the long term.

To solve the above technical problems, the present invention provides a secure, privacy-preserving, and tradable distributed machine learning framework based on blockchain technology, composed as follows. Part 1: a multi-certificate authority (CA) issues and revokes digital certificates for blockchain nodes and manages node permissions. Part 2: blockchain nodes consist of user nodes and transaction nodes, which respectively maintain the machine learning model and participate in model transactions. Part 3: smart contracts consist of a machine learning smart contract (MLSC) and a model contribution smart contract (MCSC), which respectively specify the operating rules of distributed machine learning and divide revenue among nodes according to their model contributions. Part 4: the distributed ledger records the model data produced during training (including local and global model information) and the model transaction data. Part 5: data providers collect local data and upload it to the blockchain node servers.
In this scheme, the CA reviews, supervises, and manages the permissions of every node that wants to join the system, which to some extent prevents malicious nodes from joining and thus protects the system. Both transaction nodes and later-joining user nodes must pay an entry fee (the model transaction fee). A transaction node exits the system after synchronizing the block information. A user node identified as malicious is likewise removed from the system; it cannot recover its entry fee and receives no share of later model transaction fees, which serves as its punishment. The rules of the smart contracts are open to all user nodes, and their content is hard for malicious nodes to tamper with. The distributed ledger records the model data and model transaction data produced during training, guaranteeing traceability: all malicious data is recorded, which protects the system to a certain extent. If the nodes do not require dataset privacy, Gaussian noise need not be added to the local gradients; likewise, many dataset privacy-protection methods exist, and a more suitable one may be substituted.

A method for operating the secure, privacy-preserving, and tradable distributed machine learning framework based on blockchain technology, comprising the following steps:

Step 1, consortium chain initialization: the CA server issues digital certificates to the initial nodes of the consortium chain; all participants establish connections and reach some initial consensus.

Step 2, parameter initialization: all user nodes reach consensus on the neural network model and synchronize the system's test set data.

Step 3, local gradient computation: the user nodes take turns, in increasing order of node id, serving as the primary node, with the remaining nodes acting as endorsing nodes. Each node then computes a local gradient from its local data and the current model, adds Gaussian noise so that the local gradient satisfies differential privacy, and finally sends the local gradient to the primary node and the endorsing nodes.

Step 4, global model update: the primary node computes the global gradient from the nodes' local gradients using a Byzantine-fault-tolerant gradient aggregation algorithm; the system then runs the IPBFT consensus algorithm. If the global gradient reaches system consensus, the global model is updated and its information is written into a block.

Step 5, training termination: once the trained model meets the expected requirements, the system stops training; its subsequent role is to maintain model transactions.
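Purely as an illustration of how steps 2 through 5 fit together (and not the patented protocol itself), the loop below runs a toy 1-D least-squares task across five simulated nodes. Plain-mean aggregation stands in for the Byzantine-fault-tolerant rule, the consensus step is assumed to always accept, and no privacy noise is added; all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = 3.0
data = []
for _ in range(5):                                   # five simulated user nodes
    x = rng.normal(size=20)
    data.append((x, true_w * x + 0.01 * rng.normal(size=20)))

w, eta = 0.0, 0.1                                    # step 2: agreed initial model
for t in range(500):
    # step 3: each node computes a local gradient of mean((w*x - y)^2)
    grads = [np.mean(2 * (w * x - y) * x) for x, y in data]
    g = float(np.mean(grads))                        # step 4: aggregate (mean here)
    w -= eta * g                                     # consensus assumed accepted
    if abs(g) < 1e-8:                                # step 5: stop when converged
        break

print(round(w, 3))                                   # w converges near true_w = 3.0
```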

As an improvement of the present invention, step 1, the consortium chain initialization phase, proceeds as follows:

The CA server issues digital certificates to the initial nodes of the consortium chain, and all participants establish connections and reach some initial consensus: a. a common standard for constructing each participant's dataset; b. a common standard for model transaction fees; c. common rules for selecting the primary node and the endorsing nodes.

As an improvement of the present invention, step 2, the parameter initialization phase, proceeds as follows. All user nodes reach consensus on the neural network model, including its network structure and parameters such as the batch size B, the number of training iterations T, the learning rate η_t, the initial weights w_0, the clipping threshold C, and the noise scale σ. Meanwhile, the blockchain nodes send the dataset standard to the data providers, who collect the training set and upload it to the blockchain nodes. Once the neural network model and the datasets are ready, all user nodes contribute test data to form the system's unified test set, after which the whole system can begin training the neural network model.

As an improvement of the present invention, step 3, the local gradient computation phase, proceeds as follows:

First, all user nodes in the blockchain determine the primary node and the endorsing nodes: if the primary node's id is i, the endorsing nodes' ids are i+1, i+2, ..., i+m. Each node then computes a local gradient from its own dataset and the current model, adds differential-privacy noise to it, and sends it to the primary node and the endorsing nodes.

The computation proceeds as follows. Suppose that in iteration $t$, the $B$ training samples drawn at node $k$ are $\{(x_i, y_i)\}_{i=1}^{B}$, the global model weights are $w_t$, the clipping threshold is $C$, and the noise scale is $\sigma$.

In iteration $t$, the local gradient of worker node $k$ for each sample is

$$g_k(x_i) = \nabla_{w_t}\, l\big(\hat{y}_i, y_i\big),$$

where $\hat{y}_i$ is the model's prediction for $x_i$ under $w_t$ and $l(\cdot)$ is the loss function.

The per-sample gradients are then clipped and Gaussian noise is added, giving the local gradient $g_k(w_t)$ of node $k$:

$$g_k(w_t) = \frac{1}{B}\left(\sum_{i=1}^{B} \frac{g_k(x_i)}{\max\big(1,\ \|g_k(x_i)\|_2/C\big)} + \mathcal{N}\big(0,\ \sigma^2 C^2 \mathbf{I}\big)\right).$$

Finally, each node sends its local gradient to the primary node and the endorsing nodes.
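The clipping-and-noising step can be sketched as follows. This is a minimal illustration of the per-sample clip, Gaussian noise, and batch average described above, not the patent's implementation; the function name and call convention are our own.

```python
import numpy as np

def dp_local_gradient(per_sample_grads, C, sigma, rng):
    """Clip each per-sample gradient to L2 norm at most C, add Gaussian
    noise N(0, (sigma*C)^2 I) to the sum, and average over the batch."""
    B = len(per_sample_grads)
    clipped = [g / max(1.0, np.linalg.norm(g) / C) for g in per_sample_grads]
    noise = rng.normal(0.0, sigma * C, size=per_sample_grads[0].shape)
    return (np.sum(clipped, axis=0) + noise) / B

rng = np.random.default_rng(0)
grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]     # norms 5.0 and 0.5
g = dp_local_gradient(grads, C=1.0, sigma=0.0, rng=rng)  # sigma=0: pure clipping
print(g)  # the first gradient is rescaled to unit norm before averaging
```

With `sigma > 0` the added noise makes the released gradient differentially private, at the cost of gradient accuracy; the trade-off is controlled by C and σ.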

As an improvement of the present invention, step 4, the global model update phase, proceeds as follows. After receiving the local gradients of the nodes, the primary node runs a Byzantine-fault-tolerant gradient aggregation algorithm to aggregate them into a global gradient and updates the model, while using the moments accountant to track the privacy loss. The system then runs the IPBFT consensus algorithm: the primary node first writes the aggregation results (including the primary node id, the aggregated gradient, the differential privacy loss, and the selected node ids and local gradient information) into block block_t, then sends block_t to the endorsing nodes for verification; if verification passes, block_t is broadcast to all blockchain nodes, and the block is successfully added to the blockchain.
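Purely as an illustration of what such a block record could contain, the fields the primary node writes into block_t can be modeled as a small data structure. The field names and the SHA-256 linking are our own assumptions, not the patent's specification.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import List

@dataclass
class ModelBlock:
    master_id: int              # id of the primary node that aggregated
    global_gradient: List[float]
    privacy_loss: float         # as tracked by the moments accountant
    selected_ids: List[int]     # nodes whose local gradients were used
    prev_hash: str              # link to the previous block

    def block_hash(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

b = ModelBlock(3, [0.1, -0.2], 0.5, [1, 2, 4], "00" * 32)
print(len(b.block_hash()))      # 64 (hex digits of SHA-256)
```

Hash-chaining each block to its predecessor is what makes the recorded training history tamper-evident: altering any field changes the hash and breaks the chain.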

In step 4, the IPBFT blockchain consensus algorithm can effectively verify the gradient aggregation result and identify malicious nodes. The algorithm is designed for consortium chains: compared with public chain consensus algorithms (e.g., PoW, PoS, PoET), it confirms transactions in less time and has lower communication complexity.

Compared with the prior art, the present invention has the following advantages: 1) the blockchain-based distributed machine learning framework is highly practical and applies to all distributed machine learning algorithms based on gradient descent; 2) the invention uses a CA to manage the permissions of blockchain nodes (both transaction nodes and user nodes) effectively: for transaction nodes, the CA collects the machine learning model transaction fee and controls the validity period of their permissions, while for malicious nodes, the CA revokes user permissions so they cannot damage the machine learning model; 3) the proposed IPBFT consensus algorithm withstands Byzantine attacks on the parameter-server aggregation process while identifying and removing malicious nodes, making the system increasingly secure; 4) the invention implements an effective incentive mechanism on the blockchain, deploying smart contracts to distribute model transaction fees fairly; 5) the invention adds differential privacy to distributed machine learning, effectively protecting the dataset privacy of system participants.

Description of the Drawings

Fig. 1 is the blockchain-based distributed machine learning framework proposed by the present invention;

Fig. 2 is the CA architecture of the present invention;

Fig. 3 is the operating flowchart of the present invention;

Fig. 4 is a schematic diagram of the normal case;

Fig. 5 compares the test-set accuracy of models obtained with different aggregation methods in Example 2, when local gradient computation is performed without differential privacy and 8 of the 20 blockchain nodes are under Byzantine attack.

Fig. 6 compares the test-set accuracy of models obtained with different aggregation methods in Example 3, when local gradient computation is performed with differential privacy and 8 of the 20 blockchain nodes are under Byzantine attack.

Fig. 7 is a schematic diagram of the extremely malicious case;

Fig. 8 compares, for Example 2, how the number of nodes changes with the number of iterations when 20 of the 100 blockchain nodes are under Byzantine attack during gradient aggregation, running the IPBFT algorithm versus the PoW algorithm.

Fig. 9 compares the test-set accuracy of models obtained with different aggregation methods in Example 2, when local gradient computation is performed without differential privacy and 8 of the 20 blockchain nodes are under Byzantine attack.

Fig. 10 compares the test-set accuracy of models obtained with different aggregation methods in Example 2, when local gradient computation is performed with differential privacy and 8 of the 20 blockchain nodes are under Byzantine attack.

Detailed Description

The embodiments of the present invention are described in detail below with reference to the accompanying drawings and examples, so that the process by which the invention applies technical means to solve technical problems and achieve technical effects can be fully understood and implemented. Note that, as long as no conflict arises, the embodiments of the present invention and the features within them may be combined with one another, and all resulting technical solutions fall within the protection scope of the present invention.

Embodiment 1: Fig. 1 shows the secure, tradable distributed machine learning framework based on blockchain technology proposed by the present invention. The components of the framework are described in detail below with reference to Fig. 1.

A secure, privacy-preserving, and tradable distributed machine learning framework based on blockchain technology, comprising the following parts:

Part 1: the certificate authority (CA);

The CA issues and revokes digital certificates for blockchain nodes and manages node permissions. It must be trusted by all blockchain nodes and also supervised by them. Its structure is shown in Fig. 2. For security, the CA adopts the common root-CA/intermediate-CA certificate chain design: the root CA does not issue certificates to servers directly, but generates two intermediate CAs (a user CA and a trader CA) that issue certificates to clients on the root CA's behalf, reducing the root CA's management burden.

Part 2: blockchain nodes;

In the system framework of the present invention, there are two types of blockchain nodes: transaction nodes and user nodes.

A transaction node is a temporary node through which an external user who wants to obtain the trained model joins the blockchain network. After a transaction node is permitted by the CA to join the blockchain, it performs one block synchronization; once the synchronization completes, its digital certificate is revoked and the node exits the network.

User nodes are the main components of the blockchain network. Their role is to maintain and train the machine learning model and to package data into the distributed ledger on the blockchain. Each user node performs local gradient computation, global model aggregation, bookkeeping, and block verification.

Part 3: smart contracts;

The invented system framework contains two smart contracts: the Machine Learning Smart Contract (MLSC) and the Model Contribution Smart Contract (MCSC).

The MLSC specifies the operating rules of distributed machine learning, including local gradient computation, global model computation, the IPBFT consensus mechanism, and so on.

The MCSC computes each node's model contribution by inspecting the ledger on the blockchain and divides the model transaction fee according to contribution; in addition, the bookkeeping node that writes the transaction into the blockchain receives a bookkeeping fee.

The contribution $C_i$ of node $i$ is computed as

$$C_i = c_1 l_i + c_2 g_i,$$

where $l_i$ is the number of times node $i$ participated in global gradient computation, $g_i$ is the number of times it contributed a local gradient, and $c_1$ and $c_2$ are the contribution coefficients for global and local gradient computation.

Since the model transaction fee $F$ equals the bookkeeping fee $r$ plus the sum of the nodes' model contribution revenues $R_i$, each node's revenue $R_i$ is computed as

$$R_i = (F - r)\,\frac{C_i}{\sum_{j=1}^{K} C_j},$$

where $K$ is the total number of user nodes.
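The fee split can be sketched directly from these definitions. This is a minimal illustration assuming the fee F, minus the bookkeeping fee r, is divided in proportion to each node's contribution C_i; function names are ours.

```python
def contribution(l_i, g_i, c1, c2):
    """C_i = c1 * l_i + c2 * g_i."""
    return c1 * l_i + c2 * g_i

def split_fee(F, r, contribs):
    """Divide the model transaction fee F, minus the bookkeeping fee r,
    among the nodes in proportion to their contributions."""
    total = sum(contribs)
    return [(F - r) * c / total for c in contribs]

# Three nodes with (global participations, local contributions):
contribs = [contribution(l, g, c1=2.0, c2=1.0) for l, g in [(3, 10), (1, 10), (0, 5)]]
print(contribs)                           # [16.0, 12.0, 5.0]
print(split_fee(100.0, 10.0, contribs))   # shares sum to F - r = 90
```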

Part 4: the distributed ledger;

The distributed ledger records the model data (including local and global model information) and the model transaction data produced during machine learning model training. It guarantees traceability: all malicious data is recorded, which protects the system to a certain extent.

Part 5: data providers;

Data providers collect data and upload it to the local server.

Embodiment 2: a method for operating the secure, privacy-preserving, and tradable distributed machine learning framework based on blockchain technology, comprising the following steps.

Fig. 3 is the operating flowchart of the framework. Each phase of system operation is described in detail below with reference to Fig. 3.

Step 1: consortium chain initialization;

The CA server issues digital certificates to the initial nodes of the consortium chain, and all participants establish connections and reach some initial consensus: a. a common standard for constructing each participant's dataset (for example, all images must follow the MNIST handwritten-digit dataset standard); b. a common standard for model transaction fees; c. common rules for selecting the primary node and the endorsing nodes (here, nodes take turns as primary in increasing order of node id, and the m nodes whose ids follow the primary node's id serve as endorsing nodes; if fewer than m nodes have larger ids than the primary node, the remainder are filled starting from the smallest id).
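The rotation rule in item c can be sketched as follows. This is one illustrative reading of the rule (round-robin by sorted id with wraparound), not the patent's code; the function name is ours.

```python
def select_roles(node_ids, round_t, m):
    """Primary rotates through the sorted ids round by round; the next m
    ids (wrapping around to the smallest) serve as endorsing nodes."""
    ids = sorted(node_ids)
    i = round_t % len(ids)
    primary = ids[i]
    endorsers = [ids[(i + k) % len(ids)] for k in range(1, m + 1)]
    return primary, endorsers

print(select_roles([1, 2, 3, 4, 5], round_t=0, m=2))  # (1, [2, 3])
print(select_roles([1, 2, 3, 4, 5], round_t=4, m=2))  # (5, [1, 2]) -- wraps around
```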

Step 2: parameter initialization;

In the parameter initialization phase, all user nodes reach consensus on the neural network model, including its network structure and parameters such as the batch size, the number of training iterations T, the learning rate η_t, the initial weights w_0, the clipping threshold C, and the noise scale σ. Meanwhile, the blockchain nodes send the dataset standard to the data providers, who collect the training set and upload it to the blockchain nodes.

Once the neural network model and datasets are ready, all user nodes contribute test data to form a unified system test set. The whole system can then begin training the neural network model.

Step 3: local gradient computation stage;

First, all user nodes in the blockchain determine the master node and the endorsement nodes: if the master node's id is i, the endorsement nodes' ids are i+1, i+2, ..., i+m. Each node then computes a local gradient from its own dataset and the current model, adds Gaussian noise to the gradient so that it satisfies the differential privacy mechanism, and finally sends the noisy local gradient to the master node and the endorsement nodes.

The computation proceeds as follows. Suppose that in iteration t, the k-th node samples a mini-batch of B training examples {(x_i^k, y_i^k)}, i = 1, ..., B, the global model weights are w_t, the clipping threshold is C, and the noise scale is σ.

At iteration t, the per-sample local gradient at the k-th worker node is

g_t^k(x_i^k) = ∇_{w_t} l(f(x_i^k; w_t), y_i^k),

where f(x_i^k; w_t) is the model prediction and l(·) is the loss function.

Each per-sample gradient is then clipped to norm C and Gaussian noise is added, yielding the k-th node's local gradient

g_k(w_t) = (1/B) ( Σ_{i=1}^{B} g_t^k(x_i^k) / max(1, ||g_t^k(x_i^k)||_2 / C) + N(0, σ²C²·I) ).
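The clip-then-noise step described above follows the standard DP-SGD recipe. A minimal sketch, assuming NumPy; the function name `dp_local_gradient` and its signature are illustrative, not from the patent.

```python
import numpy as np

# Sketch of the per-node DP gradient: clip each per-sample gradient
# to norm C, sum, add Gaussian noise N(0, (sigma*C)^2 I), average.
def dp_local_gradient(per_sample_grads, C, sigma, rng=None):
    rng = rng or np.random.default_rng()
    B = len(per_sample_grads)
    clipped = [g / max(1.0, np.linalg.norm(g) / C) for g in per_sample_grads]
    noise = rng.normal(0.0, sigma * C, size=per_sample_grads[0].shape)
    return (np.sum(clipped, axis=0) + noise) / B
```

With σ = 0 this reduces to the clipped mini-batch average, which is a quick sanity check on the clipping behaviour.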

Step 4: global model update stage;

After receiving the local gradients from the individual nodes, the master node runs a Byzantine fault tolerant gradient aggregation algorithm (for example multi-Krum or l-nearest aggregation) to aggregate them into a global gradient and update the model, while using the moments accountant to track the privacy loss. The system then runs the IPBFT consensus algorithm: the master node first writes the aggregation result (including the master node id, the aggregated gradient, the differential privacy loss, the selected node ids and the local gradient information) into block block_t, and then sends block_t to the endorsement nodes for verification. If verification passes, block_t is broadcast to all blockchain nodes and is successfully appended to the blockchain.
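multi-Krum is named above as one admissible Byzantine-tolerant aggregator. A compact sketch under its usual definition (each gradient scored by the sum of squared distances to its K−f−2 nearest neighbors, then the best-scoring ones averaged); parameter names are illustrative.

```python
import numpy as np

# Sketch of multi-Krum aggregation: f is the assumed number of
# Byzantine workers, m_sel the number of gradients kept.
def multi_krum(grads, f, m_sel):
    K = len(grads)
    d2 = np.array([[np.sum((g - h) ** 2) for h in grads] for g in grads])
    scores = []
    for i in range(K):
        # sorted row starts with the 0 self-distance; keep the
        # K - f - 2 closest other gradients
        nearest = np.sort(d2[i])[1:K - f - 1]
        scores.append(nearest.sum())
    chosen = np.argsort(scores)[:m_sel]
    return np.mean([grads[i] for i in chosen], axis=0)
```

Gradients far from the majority cluster receive large scores and are excluded from the average, which is what lets the master tolerate Byzantine local gradients.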

IPBFT: As shown in FIG. 4, FIG. 5, FIG. 6 and FIG. 7, the consensus process of the IPBFT algorithm consists of 8 stages: request-1 (R-1), pre-prepare-1 (Pp-1), prepare-1 (P-1), commit-1 (C-1), request-2 (R-2), pre-prepare-2 (Pp-2), prepare-2 (P-2) and commit-2 (C-2). All user nodes are divided into the master node (L), endorsement nodes (E) and general nodes (G). In the normal case shown in FIG. 4, the system reaches consensus after only the four steps R-1, Pp-1, P-1 and C-1. FIG. 5 and FIG. 6 show abnormal cases, in which the system additionally executes the four steps R-2, Pp-2, P-2 and C-2. The moment the system starts running IPBFT is defined as time 0. If the system reaches consensus before time t1, a new master node is selected and the next consensus round begins; otherwise, IPBFT judges whether the master node is malicious. If the system still has not reached consensus by time t2, the master node of this round is considered malicious and is removed from the system. FIG. 7 shows an extremely abnormal case in which a wrong aggregation result is agreed upon; in our system, however, malicious nodes are continuously eliminated, and in the consortium chain the probability of a node misbehaving is very low because of the CA, so this extremely malicious case is a small-probability event that is almost impossible to occur. Moreover, even if such a wrong aggregation result is introduced early in training, it does not affect the final trained model.

As shown in FIG. 4, in the normal case the master node is honest and the number of honest endorsement nodes is at least 2m/3; the IPBFT consensus process is then as follows:

1) R-1: Each user node sends its local gradient to the master node and the endorsement nodes.

2) Pp-1: After computing the aggregation, the master node sends block_t to the endorsement nodes for verification.

3) P-1: If block_t passes verification by endorsement node E_i, that endorsement node sends a valid approval credential Vote(block_t, E_i) to the master node.

4) C-1: In this case, the master node receives at least 2m/3 approval credentials and then generates the block certificate Cert(block_t). The master node then sends block_t and Cert(block_t) to the remaining user nodes for block synchronization.
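The C-1 quorum check can be sketched as below. Note that the 2/3 quorum formula is our assumption: the exact threshold appears only as an equation image in the source, and the function and dictionary fields are illustrative.

```python
from math import ceil

# Toy sketch of the C-1 step: the master counts valid approval
# credentials from the m endorsers and issues a block certificate
# once the (assumed 2/3) quorum is met.
def try_certify(votes, m):
    quorum = ceil(2 * m / 3)
    valid = [v for v in votes if v.get("valid")]
    if len(valid) >= quorum:
        return {"cert": True, "endorsers": [v["node_id"] for v in valid]}
    return {"cert": False, "endorsers": []}
```

Falling short of the quorum leaves the master without a certificate, which is exactly what triggers the R-2/Pp-2/P-2/C-2 fallback phases described in the abnormal cases below.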

As shown in FIG. 5, in this abnormal case the master node is malicious but the number of honest endorsement nodes is at least 2m/3; the IPBFT consensus process is then as follows:

1) R-1: Each user node sends its local gradient to the master node and the endorsement nodes.

2) Pp-1: After computing the aggregation, the master node sends block_t to the endorsement nodes for verification.

3) P-1: Because the malicious master's block_t does not pass verification at the honest endorsement nodes, those nodes do not send approval credentials to the master. The master node therefore receives fewer than 2m/3 approval credentials and cannot generate the block certificate Cert(block_t).

4) R-2: In this abnormal case, the system has not reached consensus on block_t before time t1, so every user node sends its local gradient to all other user nodes.

5) Pp-2: The master node broadcasts block_t to all other user nodes for verification. In this abnormal case, however, the number of approval credentials received by the master node is less than 2K/3, where K is the number of user nodes, so the system does not reach consensus on block_t. Since the system also fails to reach consensus before time t2, the master node is judged malicious and is removed from the system.
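The t1/t2 timeout rule that governs these outcomes can be sketched as follows; the function name and return labels are our illustration of the three outcomes described above, not terminology from the patent.

```python
# Toy sketch of the IPBFT timing rule: consensus before t1 ends the
# round normally; consensus between t1 and t2 means the fallback
# phases succeeded; no consensus by t2 flags the master as malicious.
def judge_round(consensus_time, t1, t2):
    if consensus_time is not None and consensus_time <= t1:
        return "consensus"          # new master elected, next round starts
    if consensus_time is None or consensus_time > t2:
        return "master_malicious"   # master removed from the system
    return "fallback_consensus"     # reached via R-2/Pp-2/P-2/C-2
```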

As shown in FIG. 6, in this abnormal case the master node is honest but the number of honest endorsement nodes is less than 2m/3; the IPBFT consensus process is then as follows:

1) R-1: Each user node sends its local gradient to the master node and the endorsement nodes.

2) Pp-1: After computing the aggregation, the master node sends block_t to the endorsement nodes for verification.

3) P-1: If block_t passes verification by endorsement node E_i, that node sends a valid approval credential Vote(block_t, E_i) to the master node. In this case, however, the number of valid approval credentials is less than 2m/3, so the master node cannot generate a block certificate.

4) R-2: In this abnormal case, the system has not reached consensus on block_t before time t1, so every user node sends its local gradient to all other user nodes.

5) Pp-2: The master node broadcasts block_t to all other user nodes for verification.

6) P-2: If block_t passes verification by user node P_i, that user node sends a valid approval credential Vote(block_t, P_i) to the master node.

7) C-2: In this case, the master node receives at least 2K/3 approval credentials and can therefore generate the block certificate Cert(block_t). The master node then sends block_t and Cert(block_t) to the remaining user nodes for block synchronization.

As shown in FIG. 7, in this extremely malicious case the master node is malicious and the number of endorsement nodes that are malicious and collude with the master is at least 2m/3; the IPBFT consensus process is then as follows:

1) R-1: Each user node sends its local gradient to the master node and the endorsement nodes.

2) Pp-1: The malicious master node produces a wrong aggregation result and sends block_t to the endorsement nodes for verification.

3) P-1: In this case, block_t passes verification at the endorsement nodes E_i that are malicious and collude with the master, and those nodes send approval credentials Vote(block_t, E_i) to the master node.

4) C-1: In this case, the master node receives at least 2m/3 approval credentials and generates the block certificate Cert(block_t); it then sends block_t and Cert(block_t) to the remaining user nodes for block synchronization.

It can be seen that in the extremely abnormal case of FIG. 7, the master node and some endorsement nodes are malicious and colluding. The probability of this happening in our system is extremely small: as training proceeds, the system gradually eliminates malicious nodes, and in the consortium chain the CA makes node misbehavior highly unlikely.

Table 1 compares the performance of related consensus algorithms when applied to the distributed machine learning framework proposed in the present invention. The proposed IPBFT consensus algorithm can identify malicious nodes, whereas PBFT and PoW cannot. Moreover, PBFT and PoW require all nodes to exchange local gradients with one another, so their communication complexity is O(K²), where K is the number of user nodes. Under IPBFT, once malicious nodes have been gradually eliminated as training proceeds, each user node only needs to send its local gradient to 1 master node and m endorsement nodes, so the communication complexity is O(mK) in the normal case; only in the two abnormal cases of FIG. 5 and FIG. 6 does it rise to O(K²). The communication complexity of IPBFT is therefore better than that of PBFT and PoW.
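The O(mK) versus O(K²) comparison can be checked with a back-of-the-envelope message count (gradient-exchange messages only; the helper is illustrative, not a formula from the patent):

```python
# Gradient-exchange message counts: in the normal IPBFT case each of
# the K nodes sends to 1 master + m endorsers; in the fallback (or
# PBFT/PoW) case every node sends to every other node.
def gradient_messages(K, m, abnormal=False):
    if abnormal:
        return K * (K - 1)   # all-to-all exchange, O(K^2)
    return K * (m + 1)       # send to master + m endorsers, O(mK)
```

For K = 100 nodes and m = 5 endorsers this gives 600 messages per round in the normal case versus 9900 in the all-to-all case, which is the gap the table summarizes asymptotically.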

Table 1: Comparison of related consensus algorithms

(Table 1 is provided as an image in the original document.)

Step 5: training termination stage;

When the trained model meets the expected requirements (the model accuracy reaches the target, or the model's privacy loss is about to exceed the privacy budget), the system stops training. Thereafter, the main role of the blockchain is to maintain transactions of the machine learning model; if new data arrive or the model algorithm needs improvement, the machine learning training process can be restarted.

Example 2:

FIG. 8 compares, for Example 2 of the present invention, how the number of nodes changes with the number of iterations when 20 of the 100 blockchain nodes are under Byzantine attack during gradient aggregation, running the IPBFT algorithm and the PoW algorithm respectively.

FIG. 9 compares, for Example 2 of the present invention, the test-set accuracy of models obtained with different aggregation methods when 8 of the 20 blockchain nodes are under Byzantine attack during local gradient computation (without differential privacy).

FIG. 10 compares, for Example 2 of the present invention, the test-set accuracy of models obtained with different aggregation methods when 8 of the 20 blockchain nodes are under Byzantine attack during local gradient computation (with differential privacy).

As FIG. 8 shows, as the system runs, the IPBFT algorithm identifies the 20 malicious nodes and removes them from the system, whereas under the PoW algorithm the malicious nodes remain throughout.

As FIG. 9 shows, without differential privacy, when nodes suffer a Byzantine attack (random-gradient attack), the multi-Krum algorithm aggregates better than the median algorithm and comes closer to the ideal case.

As FIG. 10 shows, with differential privacy, when nodes suffer a Byzantine attack (random-gradient attack), the median algorithm aggregates better than the multi-Krum algorithm and comes closer to the ideal case.

These experimental results show that the proposed framework can effectively handle the case in distributed machine learning where both the parameter server and the worker nodes are under Byzantine attack, while rewarding contributing nodes and eliminating malicious ones so that the system keeps running well. In addition, the framework can employ other Byzantine-tolerant aggregation algorithms to obtain the best model performance.

Although the embodiments of the present invention are disclosed above, the content described is provided only to facilitate understanding of the present invention and is not intended to limit it. Any person skilled in the art to which the present invention belongs may make modifications and changes in form and detail without departing from the spirit and scope disclosed herein, but the scope of patent protection of the present invention shall still be defined by the appended claims.

Claims (6)

1. A secure, privacy-preserving, tradable distributed machine learning framework based on blockchain technology, comprising:
part 1, a certificate authority (CA), responsible for issuing and revoking digital certificates for blockchain nodes and managing node permissions;
part 2, the blockchain nodes, consisting of user nodes and transaction nodes, respectively responsible for maintaining the machine learning model and participating in machine learning model transactions;
part 3, the smart contracts, consisting of a machine learning smart contract (MLMC) and a model contribution smart contract (MCMC), which respectively define the operating rules of the distributed machine learning and divide profits among nodes according to model contribution;
part 4, the distributed ledger, which records model data (including local-model and global-model states) and model transaction data during machine learning model training;
and part 5, the data providers, responsible for collecting local data and uploading them to the blockchain node servers.
2. A method for operating the secure, privacy-preserving, tradable distributed machine learning framework based on blockchain technology of claim 1, the method comprising the steps of:
step 1, consortium chain initialization stage: the CA server issues digital certificates to the initial nodes of the consortium chain, and all participants establish connections and reach some initial consensus;
step 2, parameter initialization stage: all user nodes reach consensus on the neural network model and synchronize the system's test set data;
step 3, local gradient computation stage: all user nodes elect master nodes in round-robin order of increasing id, with the m nodes following the master node's id serving as endorsement nodes; each node then computes a local gradient from its local data and the current model, adds Gaussian noise to the gradient so that it satisfies the differential privacy mechanism, and finally sends the local gradient to the master node and the endorsement nodes;
step 4, global model update stage: the master node computes a global gradient from the nodes' local gradients using a Byzantine fault tolerant gradient aggregation algorithm; the system then runs the IPBFT consensus algorithm, and if the global gradient achieves system consensus, the global model is updated and its related information is written into the block;
step 5, training termination stage: when the trained model meets the expected requirements, the system stops training the model and thereafter maintains model transactions.
3. The method of claim 2 for operating the secure, privacy-preserving, tradable distributed machine learning framework based on blockchain technology, wherein step 1, the consortium chain initialization stage, specifically comprises:
the CA server issues digital certificates to the initial nodes of the consortium chain, all participants establish connections, and some initial consensus is reached: a. a unified standard for building each participant's dataset; b. a unified standard for the model transaction fee, which may increase as the model improves; c. unified selection rules for the master node and the endorsement nodes, whereby master nodes are elected in round-robin order of increasing node id, and the m nodes following the master node's id are the endorsement nodes.
4. The method of claim 2 for operating the secure, privacy-preserving, tradable distributed machine learning framework based on blockchain technology, wherein step 2, the parameter initialization stage, is as follows: all user nodes reach consensus on the neural network model, including determining the network structure, the batch size B, the number of training iterations T, the learning rate η_t, the initial weights w_0, the clipping threshold C, and the noise scale σ; meanwhile, the blockchain nodes issue the dataset standard to the data providers, who collect training sets and upload them to the blockchain nodes; once the neural network model and datasets are ready, all user nodes contribute test data to unify the system test set, after which the whole system can begin neural network model training.
5. The method of claim 2 for operating the secure, privacy-preserving, tradable distributed machine learning framework based on blockchain technology, wherein step 3, the local gradient computation stage, is as follows:
firstly, all user nodes in the blockchain determine the master node and the endorsement nodes: if the master node's id is i, the endorsement nodes' ids are i+1, i+2, ..., i+m; each node then obtains a local gradient from its own dataset and the current model, adds Gaussian noise to the local gradient so that it satisfies the differential privacy mechanism, and finally sends the local gradient to the master node and the endorsement nodes;
the specific calculation process is as follows: suppose that in the t-th iteration, the B training examples sampled at the k-th node are {(x_i^k, y_i^k)}, i = 1, ..., B, the global model weights are w_t, the clipping threshold is C, and the noise scale is σ;
in the t-th iteration, the per-sample local gradient at the k-th working node is
g_t^k(x_i^k) = ∇_{w_t} l(f(x_i^k; w_t), y_i^k),
wherein f(x_i^k; w_t) is the model prediction and l(·) is the loss function;
the local gradient is then clipped, Gaussian noise is added, and the local gradient g_k(w_t) of the k-th node is finally obtained as
g_k(w_t) = (1/B) ( Σ_{i=1}^{B} g_t^k(x_i^k) / max(1, ||g_t^k(x_i^k)||_2 / C) + N(0, σ²C²·I) ).
6. The method of claim 2 for operating the secure, privacy-preserving, tradable distributed machine learning framework based on blockchain technology, wherein step 4, the global model update stage, specifically includes: after receiving the local gradients of the nodes, the master node runs a Byzantine fault tolerant gradient aggregation algorithm to aggregate them into a global gradient and update the model, while using the moments accountant to track the privacy loss; the system then runs the IPBFT consensus algorithm: the master node writes the aggregation result into block block_t and sends block_t to the endorsement nodes for verification; if verification passes, block_t is broadcast to all blockchain nodes and the block is successfully added to the blockchain.
CN202010496847.9A 2020-06-03 2020-06-03 Safe, privacy-preserving and tradable distributed machine learning framework operation method based on blockchain technology Active CN111915294B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010496847.9A CN111915294B (en) 2020-06-03 2020-06-03 Safe, privacy-preserving and tradable distributed machine learning framework operation method based on blockchain technology


Publications (2)

Publication Number Publication Date
CN111915294A true CN111915294A (en) 2020-11-10
CN111915294B CN111915294B (en) 2023-11-28

Family

ID=73237547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010496847.9A Active CN111915294B (en) 2020-06-03 2020-06-03 Safe, privacy-preserving and tradable distributed machine learning framework operation method based on blockchain technology

Country Status (1)

Country Link
CN (1) CN111915294B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112819177A (en) * 2021-01-26 2021-05-18 支付宝(杭州)信息技术有限公司 Personalized privacy protection learning method, device and equipment
CN113434873A (en) * 2021-06-01 2021-09-24 内蒙古大学 Federal learning privacy protection method based on homomorphic encryption
CN113806764A (en) * 2021-08-04 2021-12-17 北京工业大学 Distributed support vector machine based on block chain and privacy protection and optimization method thereof
CN113822758A (en) * 2021-08-04 2021-12-21 北京工业大学 An Adaptive Distributed Machine Learning Approach Based on Blockchain and Privacy
CN114118438A (en) * 2021-10-18 2022-03-01 华北电力大学 A privacy-preserving machine learning training and reasoning method and system based on blockchain
CN114510351A (en) * 2022-02-14 2022-05-17 北京中量质子网络信息科技有限公司 Super-large scale distributed machine learning framework, method, equipment and storage medium
CN114595836A (en) * 2022-02-28 2022-06-07 山东大学 Decentralized Machine Learning Method Based on Byzantine Fault Tolerance and Privacy Protection
CN116094732A (en) * 2023-01-30 2023-05-09 山东大学 Block chain consensus protocol privacy protection method and system based on rights and interests proving

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107864198A (en) * 2017-11-07 2018-03-30 济南浪潮高新科技投资发展有限公司 A kind of block chain common recognition method based on deep learning training mission
US20190236559A1 (en) * 2018-01-31 2019-08-01 Salesforce.Com, Inc. Systems, methods, and apparatuses for implementing smart flow contracts using distributed ledger technologies in a cloud based computing environment
WO2019222993A1 (en) * 2018-05-25 2019-11-28 北京大学深圳研究生院 Blockchain consensus method based on trust relationship
CN110599261A (en) * 2019-09-21 2019-12-20 江西理工大学 Electric automobile safety electric power transaction and excitation system based on energy source block chain
CN110738375A (en) * 2019-10-16 2020-01-31 国网湖北省电力有限公司电力科学研究院 Active power distribution network power transaction main body optimization decision method based on alliance chain framework


Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112819177A (en) * 2021-01-26 2021-05-18 支付宝(杭州)信息技术有限公司 Personalized privacy protection learning method, device and equipment
CN113434873A (en) * 2021-06-01 2021-09-24 内蒙古大学 Federal learning privacy protection method based on homomorphic encryption
CN113822758B (en) * 2021-08-04 2023-10-13 北京工业大学 An adaptive distributed machine learning method based on blockchain and privacy
CN113822758A (en) * 2021-08-04 2021-12-21 北京工业大学 An Adaptive Distributed Machine Learning Approach Based on Blockchain and Privacy
CN113806764A (en) * 2021-08-04 2021-12-17 北京工业大学 Distributed support vector machine based on block chain and privacy protection and optimization method thereof
CN113806764B (en) * 2021-08-04 2023-11-10 北京工业大学 Distributed support vector machine based on blockchain and privacy protection and optimization method thereof
CN114118438A (en) * 2021-10-18 2022-03-01 华北电力大学 A privacy-preserving machine learning training and reasoning method and system based on blockchain
CN114118438B (en) * 2021-10-18 2023-07-21 华北电力大学 A blockchain-based privacy-preserving machine learning training and reasoning method and system
CN114510351A (en) * 2022-02-14 2022-05-17 北京中量质子网络信息科技有限公司 Super-large scale distributed machine learning framework, method, equipment and storage medium
CN114510351B (en) * 2022-02-14 2024-10-22 北京中量质子网络信息科技有限公司 Super-large-scale distributed machine learning device
CN114595836A (en) * 2022-02-28 2022-06-07 山东大学 Decentralized Machine Learning Method Based on Byzantine Fault Tolerance and Privacy Protection
CN116094732A (en) * 2023-01-30 2023-05-09 山东大学 Block chain consensus protocol privacy protection method and system based on rights and interests proving
CN116094732B (en) * 2023-01-30 2024-09-20 山东大学 Block chain consensus protocol privacy protection method and system based on rights and interests proving

Also Published As

Publication number Publication date
CN111915294B (en) 2023-11-28

Similar Documents

Publication Publication Date Title
CN111915294B (en) Safe, privacy-preserving and tradable distributed machine learning framework operation method based on blockchain technology
CN113794675B (en) Distributed Internet of things intrusion detection method and system based on block chain and federal learning
US11669811B2 (en) Blockchain-based digital token utilization
CN109104413B (en) Method for solving intersection of private data for secure multi-party computation and verification method
Wang et al. Inter-bank payment system on enterprise blockchain platform
CN112685766B (en) Enterprise credit investigation management method and device based on block chain, computer equipment and storage medium
CN110020860A (en) A method, system and computer-readable storage medium for cross-chain asset transfer
Sun et al. A decentralized cross-chain service protocol based on notary schemes and hash-locking
CN112907252A (en) Block chain transaction method and system based on multi-person down-chain channel
CN110610421B (en) Margin management method and device under sharding framework
CN117171786A (en) A decentralized federated learning method to resist poisoning attacks
CN115952532A (en) A privacy protection method based on alliance chain federated learning
CN118916878A (en) Combined audit security defense method for federal learning multi-element poisoning attack
You et al. Accuracy degrading: toward participation-fair federated learning
CN114861211B (en) A data privacy protection method, system, and storage medium for metaverse scenarios
CN110598007A (en) Bill file processing method, device, medium and electronic equipment
Zhao et al. Blockchain-based decentralized federated learning: A secure and privacy-preserving system
CN114172661B (en) Bidirectional cross-link method, system and device for digital asset
Chen et al. Efficient and Non-Repudiable Data Trading Scheme Based on State Channels and Stackelberg Game
CN116796830A (en) Bycibe-hormonarch robust federal learning method and system based on block chain
CN111464539B (en) Blockchain accounting method and accounting node
Russell et al. The Philos Trust Algorithm: Preventing Exploitation of Distributed Trust
Sheng et al. Proof of diligence: Cryptoeconomic security for rollups
CN117312441B (en) Agricultural product yield estimation method and system based on blockchain
CN116823272B (en) Intelligent contract management system based on block chain consensus mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant