CN111915294B - Safe, privacy-preserving and tradable distributed machine learning framework operation method based on blockchain technology - Google Patents
- Publication number
- CN111915294B CN111915294B CN202010496847.9A CN202010496847A CN111915294B CN 111915294 B CN111915294 B CN 111915294B CN 202010496847 A CN202010496847 A CN 202010496847A CN 111915294 B CN111915294 B CN 111915294B
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/38—Payment protocols; Details thereof
- G06Q20/389—Keeping log of transactions for guaranteeing non-repudiation of a transaction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/08—Network architectures or network communication protocols for network security for authentication of entities
- H04L63/0823—Network architectures or network communication protocols for network security for authentication of entities using certificates
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/20—Network architectures or network communication protocols for network security for managing network security; network security policies in general
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
Abstract
The invention discloses a safe, privacy-preserving and tradable distributed machine learning framework based on blockchain technology, which comprises the following parts: the certificate authority (CA) is responsible for issuing and revoking digital certificates for blockchain nodes and managing node permissions; the blockchain nodes are responsible for maintaining the machine learning model and participating in machine learning model transactions; the smart contracts prescribe the running rules of distributed machine learning and divide profits among nodes according to model contribution degree; the distributed ledger records model data and model transaction data produced during machine learning model training; and the data provider is responsible for collecting local data and uploading it to the blockchain node server.
Description
Technical Field
The invention relates to a safe, privacy-preserving and tradable distributed machine learning framework based on blockchain technology, and in particular to a framework that uses blockchain (consortium chain) technology to solve the Byzantine attack problem in distributed machine learning, uses differential privacy technology to protect the dataset privacy of each participant, and supports machine learning model transactions. It belongs to the fields of artificial intelligence, blockchain and information security.
Background
In the parameter server framework commonly used in distributed machine learning, multiple working nodes train with local data and the current global model to obtain local models and send them to a parameter server, which aggregates all local models and updates the global model. However, this process has security problems: both the working nodes and the parameter server node may be subject to Byzantine attacks. Specifically, a working node under Byzantine attack will send an erroneous local gradient to the parameter server, degrading the final trained model; a parameter server node under Byzantine attack will aggregate an erroneous global model, wasting all previous training effort. In recent years, because blockchains offer tamper resistance, traceability, distributed storage and public maintenance, researchers have tried to apply them in fields such as the Internet of Things, medical care and finance to address security, transaction and related problems.
To date, the Byzantine attack problem in distributed machine learning has been studied to some extent. However, the following problems remain: 1) existing distributed machine learning algorithms do not consider the case where the parameter server itself is subject to a Byzantine attack while aggregating models; 2) how detected Byzantine nodes should be handled to prevent them from interfering with model training; 3) how to implement an incentive mechanism in a blockchain system combined with distributed machine learning so that the system operates more efficiently. A new solution is therefore urgently needed to solve the above technical problems.
Disclosure of Invention
The invention aims to solve the above problems in distributed machine learning. It provides an algorithm to handle Byzantine attacks on both the working nodes and the parameter server node; since blockchain technology is introduced, the consensus problem in the blockchain must also be solved, and an effective incentive mechanism is provided so that the blockchain system operates effectively and durably.
In order to solve the above technical problems, the invention provides a safe, privacy-preserving and tradable distributed machine learning framework based on blockchain technology, comprising the following components: Part 1, a multi-level certificate authority (CA), responsible for issuing and revoking digital certificates for blockchain nodes and managing node permissions; Part 2, the blockchain nodes, composed of user nodes and transaction nodes, respectively responsible for maintaining the machine learning model and participating in machine learning model transactions; Part 3, the smart contracts, composed of a machine learning smart contract (MLSC) and a model contribution smart contract (MCSC), which respectively prescribe the running rules of distributed machine learning and divide profits among nodes according to model contribution degree; Part 4, the distributed ledger, which records model data (including local and global model status) and model transaction data produced during machine learning model training; Part 5, the data provider, responsible for collecting local data and uploading it to the blockchain node server. In this scheme, the certificate authority CA audits, supervises and manages the permissions of all nodes wishing to join the system, which avoids the admission of malicious nodes to a certain extent and guarantees system security. Both transaction nodes and user nodes that join later must pay an on-chain handling fee (the model transaction fee) when joining. After synchronizing the block information, a transaction node exits the system.
If a user node is identified as malicious, it can be expelled from the system: its previously paid joining fee is not refunded and it receives no subsequent model transaction fees, which punishes malicious nodes. The rules of the smart contracts are open to all user nodes, and the contract content is difficult for malicious nodes to tamper with. The distributed ledger records model data and model transaction data produced during machine learning model training, ensuring data traceability; any malicious behavior is recorded, which guarantees system security to a certain extent. If the nodes of the system do not need dataset privacy protection, Gaussian noise need not be added to the local gradients; likewise, there are many methods for protecting dataset privacy, and if a more suitable method exists, it can be substituted.
A method of operating a secure, privacy-preserving, tradable distributed machine learning framework based on blockchain technology, the operating method comprising the following steps:
Step 1, consortium chain initialization stage: the CA server issues digital certificates to the initial nodes of the consortium chain, all participants establish connections, and some initial consensus is reached;
Step 2, parameter initialization stage: all user nodes reach consensus on the neural network model and synchronize the system's test set data;
Step 3, local gradient calculation stage: the user nodes take turns as master node in ascending order of id, with designated remaining nodes serving as endorsement nodes; each node calculates a local gradient using its local data and the current model, adds Gaussian noise to the gradient so that it satisfies differential privacy, and finally sends the local gradient to the master node and the endorsement nodes;
Step 4, global model updating stage: the master node calculates the global gradient from the nodes' local gradients using a Byzantine-fault-tolerant gradient aggregation algorithm; the system then runs the IPBFT consensus algorithm, and if the global gradient reaches system consensus, the global model is updated and the relevant information of the global model is written into the block;
Step 5, training termination stage: when the trained model meets the expected requirements, the system stops training, and its subsequent function is to support model trading.
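As a rough illustration of how steps 3 and 4 fit together, the following Python sketch models one training round under strong simplifying assumptions: the per-node gradient functions, the aggregation rule and the consensus check are all placeholders supplied by the caller, not the invention's actual implementations.

```python
def training_round(grad_fns, aggregate, consensus_ok, model, lr):
    """One simplified round: every node computes a local gradient (step 3),
    the master aggregates them (step 4), and the model is updated only if
    the aggregate reaches consensus; otherwise the old model is kept."""
    local_grads = [f(model) for f in grad_fns]   # one gradient per user node
    g = aggregate(local_grads)                   # Byzantine-tolerant in the real system
    return model - lr * g if consensus_ok(g) else model
```

For example, with three identical nodes each reporting gradient 2w, mean aggregation and consensus always succeeding, one round at learning rate 0.1 moves w = 1.0 to 0.8; if consensus fails, the model is unchanged.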
As an improvement of the present invention, step 1, the consortium chain initialization stage, is specifically as follows:
the CA server issues digital certificates to the initial nodes of the consortium chain, all participants establish connections, and some initial consensus is reached: a. unify the standard for everyone's data sets; b. unify the standard for model transaction fees; c. unify the selection rules for the master node and the endorsement nodes.
As an improvement of the present invention, step 2, the parameter initialization stage, is specifically as follows: all user nodes reach consensus on the neural network model, including determining the network structure of the neural network model, the batch size B, the number of training iterations T, the learning rate η_t, and the initial weight w_0. When the neural network model and the data sets are prepared, all user nodes contribute to a test set and unify the system's test set data. The entire system can then begin neural network model training.
As an improvement of the present invention, step 3, the local gradient calculation stage, is specifically as follows:
First, all user nodes in the blockchain determine the master node and the endorsement nodes; if the id of the master node is i, the ids of the endorsement nodes are i+1, i+2, …, i+m. Each node then obtains a local gradient from its data set and the current model, applies differential privacy to the local gradient, and sends it to the master node and the endorsement nodes.
The specific calculation process is as follows. Assume that in the t-th iteration, the batch of B training samples drawn at the k-th node is {x_1, x_2, …, x_B}, the global model weight is w_t, the clipping threshold is C, and the noise scale is σ.
At the t-th iteration, the local gradient of each sample x_i at the k-th working node is

    g_t(x_i) = ∇_{w_t} l(w_t, x_i),

where l(·) is the loss function measuring the model prediction against the label.
The local gradients are then clipped and Gaussian noise is added, and the final local gradient g_k(w_t) of the k-th node is

    ḡ_t(x_i) = g_t(x_i) / max(1, ‖g_t(x_i)‖₂ / C),
    g_k(w_t) = (1/B) · ( Σ_{i=1}^{B} ḡ_t(x_i) + N(0, σ²C²·I) ).

Finally, each node sends its own local gradient to the master node and the endorsement nodes.
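The clipping-and-noising computation can be sketched in Python with NumPy. This is an illustrative sketch of per-sample clipping to norm C followed by Gaussian noise of scale σC; the function name and array-based interface are choices made here, not part of the patent.

```python
import numpy as np

def dp_local_gradient(per_sample_grads, C, sigma, rng=None):
    """Clip each per-sample gradient to L2 norm at most C, add Gaussian
    noise N(0, (sigma*C)^2 I) to the sum, and average over the batch."""
    rng = rng or np.random.default_rng()
    clipped = [g / max(1.0, np.linalg.norm(g) / C) for g in per_sample_grads]
    noise = rng.normal(0.0, sigma * C, size=per_sample_grads[0].shape)
    return (np.sum(clipped, axis=0) + noise) / len(per_sample_grads)
```

With σ = 0 the result is simply the average of the clipped gradients, which makes the clipping step easy to verify in isolation.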
As an improvement of the present invention, step 4, the global model updating stage, is specifically as follows: after the master node receives the local gradients of all nodes, it runs a Byzantine-fault-tolerant gradient aggregation algorithm to aggregate the local gradients into a global gradient and update the model, while using the moments accountant to track the privacy loss. The system then runs the IPBFT consensus algorithm: the master node first writes the aggregation result (including the master node id, the aggregated gradient, the differential privacy loss, the selected node ids and the local gradient information) into block_t, then sends block_t to the endorsement nodes for verification; if verification passes, block_t is broadcast to all blockchain nodes and the block is successfully added to the blockchain.
In step 4, the blockchain consensus algorithm IPBFT can effectively verify the gradient aggregation result and effectively identify malicious nodes. The algorithm is suited to consortium chains; compared with public chain consensus algorithms (such as PoW, PoS, PoET, etc.), it completes transaction confirmation in a shorter time and has lower communication complexity.
Compared with the prior art, the invention has the following advantages: 1) the distributed machine learning framework based on blockchain technology is highly practical and can be used with all distributed machine learning algorithms based on gradient descent; 2) the invention uses a CA to achieve effective permission management of blockchain nodes (including transaction nodes and user nodes): for transaction nodes, the CA can collect their machine learning model transaction fees and control the validity period of their permissions; for malicious nodes, the CA can revoke their user permissions, preventing them from damaging the machine learning model; 3) the IPBFT consensus algorithm proposed by the invention can effectively resist Byzantine attacks on the parameter server's aggregation process while identifying and rejecting malicious nodes, making the system increasingly secure; 4) the invention effectively implements an incentive mechanism on the blockchain; specifically, smart contracts deployed on the blockchain achieve a reasonable distribution of model transaction fees; 5) by adding differential privacy to the distributed machine learning, the invention can effectively protect the dataset privacy of the system participants.
Drawings
FIG. 1 is a block chain technology based distributed machine learning framework in accordance with the present invention;
FIG. 2 is a CA frame diagram of the present invention;
FIG. 3 is a flow chart of the operation of the present invention;
FIG. 4 is a schematic diagram of the consensus process of IPBFT under normal conditions;
Fig. 5 is a comparison of the test set accuracy of models obtained by different aggregation methods after 8 of the 20 blockchain nodes are subjected to a Byzantine attack during local gradient calculation (without differential privacy) in example 2 of the present invention.
Fig. 6 is a comparison of the test set accuracy of models obtained by different aggregation methods after 8 of the 20 blockchain nodes are subjected to a Byzantine attack during local gradient calculation (with differential privacy) in example 3 of the present invention.
FIG. 7 is a schematic diagram of the consensus process of IPBFT in an extremely malicious situation;
FIG. 8 is a schematic diagram showing the IPBFT algorithm identifying 20 malicious nodes and eliminating them from the system, while the malicious nodes in the system running the PoW algorithm remain present throughout;
fig. 9 shows that the multi-Krum algorithm has better aggregation effect than the media algorithm and is closer to the ideal condition after the node is subjected to the bayer attack (random gradient attack) without introducing differential privacy.
Fig. 10 shows that under the condition of introducing differential privacy, after the node is subjected to the bayer attack (random gradient attack), the media algorithm has better aggregation effect than the multi-Krum algorithm and is closer to the ideal condition.
Detailed Description
The following describes embodiments of the present invention in detail with reference to the drawings and examples, so that how the invention applies technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented. It should be noted that, as long as no conflict arises, the embodiments of the present invention and the features of each embodiment may be combined with one another, and the resulting technical solutions all fall within the protection scope of the present invention.
Example 1: FIG. 1 shows the secure, tradable distributed machine learning framework based on blockchain technology according to the present invention. The individual components of the framework are described in detail below with reference to Fig. 1.
A blockchain technology-based secure, privacy preserving, tradable, distributed machine learning framework, the framework comprising:
part 1: certificate authority CA;
the CA is responsible for issuing and revoking digital certificates for the blockchain nodes and performing authority management on the nodes. It needs to be trusted by all block link points, as well as supervised by all block link points. The structure of which is shown in figure 2. For security, our CA employs the root certificate chain implementation of the more common root and intermediate CAs. The root CA does not issue a certificate directly for the server, it generates two intermediate CAs (user CA and transactor CA) for itself, the intermediate CA acts as a proxy for the root CA to apply visas for the client, and the intermediate CA can reduce the management burden of the root CA.
Part 2: a blockchain node;
in the system framework of the present invention, there are two types of blockchain nodes: transaction nodes and user nodes.
A transaction node is a temporary node created when an external user wishes to obtain the trained model and joins the blockchain network. After obtaining the CA's permission to join the blockchain, the transaction node performs block synchronization once; after synchronization, its digital certificate is revoked and the node exits the network.
The user nodes are the main components of the blockchain network; they maintain and train the machine learning model and write data blocks into the distributed ledger of the blockchain. Each user node has functions such as local gradient calculation, global model aggregation, bookkeeping and block information verification.
Part 3: an intelligent contract;
in the system framework of the invention, there are two smart contracts, distributed as machine learning smart contracts (Machine Learning Smart Contract, MLSC) and model contribution smart contracts (Model Contribution Smart Contract, MCSC).
The MLSC specifies the running rules of distributed machine learning, including local gradient computation, global model computation, IPBFT consensus mechanisms, and so forth.
The MCSC calculates the model contribution degree of each node by examining the ledger information in the blockchain and divides the model transaction fee according to contribution degree; at the same time, the bookkeeping node that writes the transaction information into the blockchain receives a bookkeeping commission.
The contribution C_i of the i-th node is calculated as

    C_i = c_1 · l_i + c_2 · g_i,

where l_i is the number of times node i participates in the global gradient computation, g_i is the number of times node i contributes a local gradient, and c_1 and c_2 are the contribution coefficients of global gradient computation and local gradient computation, respectively.
Since the model transaction fee F equals the bookkeeping commission r plus the model contribution revenues R_i of all nodes, the model contribution revenue R_i of each node is calculated as

    R_i = (F − r) · C_i / Σ_{j=1}^{K} C_j,

where K is the total number of user nodes.
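The contribution and revenue-split calculations can be combined into a small Python sketch (the function names are hypothetical): each node's contribution is computed first, and the fee pool F − r is then split proportionally.

```python
def contribution(l_i, g_i, c1, c2):
    """Contribution C_i = c1*l_i + c2*g_i of one node."""
    return c1 * l_i + c2 * g_i

def revenue_shares(F, r, contribs):
    """Split the model transaction fee F, minus the bookkeeping
    commission r, proportionally to the nodes' contributions."""
    total = sum(contribs)
    return [(F - r) * c / total for c in contribs]
```

For example, with contributions [1, 1, 2], F = 10 and r = 2, the shares are [2, 2, 4], which sum to F − r = 8.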
Part 4: a distributed ledger;
the distributed ledger records model data (including local and global model cases) and model transaction data during machine learning model training. The method ensures the traceability of the data, and all the wrought data can be recorded, thereby ensuring the safety of the system to a certain extent.
Part 5: a data provider;
the data provider is responsible for collecting the data and uploading it to the local server.
Example 2: a method of operating a secure, privacy preserving, tradable distributed machine learning framework based on blockchain technology, the method of operating comprising the steps of:
fig. 3 is a flow chart illustrating the operation of the framework of the present invention, and each stage of the system operation is described in detail below with reference to fig. 3.
Step 1: a alliance chain initialization stage;
the CA server issues a digital certificate to the initial node of the alliance chain, all participants establish connection, and some initial consensus is achieved: a. unifying standards established by a data set of a whole family (such as pictures must be MNIST handwriting data set standards); b. unifying standards of model transaction fees; c. the selection rules of the main node and the endorsement node are unified (here, the main node is selected in a circulation mode from small to large according to the node id, m nodes with the node id behind the main node id select the endorsement node, if the nodes with the node id larger than the main node id are less than m nodes, the nodes are sequentially complemented from the beginning with the smallest id).
Step 2: a parameter initialization stage;
in the parameter initialization stage, all user nodes reach the consistency consensus of the neural network modelComprises determining the network structure of a neural network model, the batch size, the training iteration times T and the learning rate eta t Initial weight w 0 And the clipping threshold is C, and the noise size sigma and other parameters. At the same time, the block link issues the data set criteria to the data provider. The data provider collects the training set and uploads it to the blockchain node.
After the neural network model and the data set are prepared, all user nodes contribute to the test set and unify the test set data of the system. The entire system can then begin neural network model training.
Step 3: a local gradient calculation stage;
first, all user nodes in a block chain determine a main node and an endorsement node, if the id of the main node is i, the id of the endorsement node is i+1, i+2, …, i+m. And each node obtains a local gradient according to the data set and the current model, gaussian noise is added to the local gradient to enable the local gradient to meet a differential privacy mechanism, and finally the local gradient is sent to the main node and the endorsement node.
The specific calculation process is as follows: assume that in the t-th iteration, the B training data sets acquired in the kth node areThe global model weight is w t The clipping threshold is C, the noise level sigma.
At the t-th iteration, the local gradient of each sample of the kth working node is that
Wherein the model prediction result is thatl () is a loss function.
Then cutting the local gradient, adding Gaussian noise, and finally obtaining the local gradient g of the kth node k (w t ) Is that
Step 4: a global model updating stage;
after receiving the local gradients of each node, the master node runs a gradient aggregation algorithm (e.g., multi-Krum, l-nearest aggregation, etc.) with bayer fault tolerance to aggregate the local gradients to obtain global gradients and update the model, while employing moments accountant to track privacy loss. The system then runs the IPBFT consensus algorithm: the master node will first write the aggregate computation results (including master node id, aggregate gradient, differential privacy loss, selected node id and local gradient information) into the block t In (2) then block t Sending the block to an endorsement node for verification, and if the block passes the verification t Broadcast to all blockchain nodes and the block is successfully added to the blockchain.
IPBFT: the consensus process of the IPBFT algorithm, as shown in FIGS. 4, 5, 6 and 7, consists of 8 stages, and the distribution is request-1 (R-1), pre-preparation-1 (Pp-1), preparation-1 (P-2), commit-1 (C-1), request-2 (R-2), pre-preparation-2 (Pp-2), preparation-2 (P-2) and commit-2 (C-2). All user nodes are divided into a master node (L), an endorsement node (E), and a generic node (G). Normally, as shown in FIG. 4, the system can reach consensus by only executing 4 steps of R-1, pp-1, P-1 and C-1. While FIGS. 5 and 6 are in an abnormal situation, the system performs 4 steps R-2, pp-2, P-2 and C-2 more than in a normal situation. The moment when the system starts to operate IPBFT is defined as 0 moment, if the system is at t 1 When consensus is reached before the moment, a new main node is selected and the next consensus process is started; otherwise, the IPBFT will determine if the master node is a malicious node. If the system is at t 2 The moment still does not reach the consensus, the master node in the consensus process is considered as a malicious node and is removed from the system. FIG. 7 is an extremely exceptional case where the false aggregate result is consensus, but in our system malicious nodes are continuously culled, while in the federated chain, the likelihood of node aversion is low due to CA joining, becauseThis extremely malicious situation is a small probability event, which is almost impossible to occur. And even if the false aggregation result is introduced in the initial stage of training, the false aggregation result does not affect the final training model.
As shown in Fig. 4, in the normal case, the master node is honest and the number of honest endorsement nodes is not less than the required quorum. The consensus process of IPBFT is then as follows:
1) R-1: each user node sends its own local gradient to the master node and the endorsement nodes.
2) Pp-1: the master node computes block_t and sends it to the endorsement nodes for verification.
3) P-1: if block_t passes verification at endorsement node E_i, that endorsement node sends a valid endorsement credential Vote(block_t, E_i) to the master node.
4) C-1: in this case, the master node receives at least the threshold number of endorsement credentials and then generates a block certificate Cert(block_t). The master node then sends block_t and the block certificate Cert(block_t) to the other user nodes for block synchronization.
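The normal-case round above can be sketched as follows. This is a hypothetical illustration: the endorsement threshold (a strict majority of the endorsement nodes) and the simple-average aggregation are assumptions, not the patent's exact bounds.

```python
# Hypothetical sketch of the normal-case IPBFT round (R-1 .. C-1).
# The endorsement threshold (a strict majority of the endorsement nodes)
# and the simple-average aggregation are illustrative assumptions.

def run_normal_round(local_gradients, endorsers, verify):
    """Master aggregates the gradients into block_t, endorsers vote on it,
    and a block certificate is issued once enough valid votes arrive."""
    block = {"agg": sum(local_gradients) / len(local_gradients)}   # R-1 / Pp-1
    votes = [e for e in endorsers if verify(block, e)]             # P-1
    threshold = len(endorsers) // 2 + 1                            # assumed bound
    if len(votes) >= threshold:                                    # C-1
        return block, {"cert": votes}   # broadcast block_t + Cert(block_t)
    return None, None                   # no certificate: fall back to R-2 .. C-2

# Example: 3 honest endorsers all approve, so the round succeeds.
block, cert = run_normal_round([1.0, 2.0, 3.0], ["E1", "E2", "E3"],
                               verify=lambda b, e: True)
```

When the certificate cannot be formed (as in the abnormal cases below), the round falls through to the second phase instead of committing.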
As shown in FIG. 5, in this abnormal situation the master node is malicious and the number of honest endorsement nodes is not less than the endorsement threshold. The consensus process of IPBFT is then as follows:
1) R-1: each user node sends its own local gradient to the master node and the endorsement node.
2) Pp-1: the master node computes block_t and sends it to the endorsement nodes for verification.
3) P-1: because the false block_t will not pass verification by the honest endorsement nodes, they will not send approval credentials to the master node. Thus, the number of approval credentials received by the master node falls below the threshold, and the master node cannot generate the block certificate Cert(block_t).
4) R-2: in this abnormal situation, the system does not reach consensus on block_t before moment t_1, and all user nodes send their own local gradients to all the other user nodes.
5) Pp-2: the master node broadcasts block_t to all other user nodes for verification. But in this abnormal situation the number of approval credentials received by the master node remains below the required fraction of K (K being the number of user nodes), so the system does not reach consensus on block_t. Since the system still has not reached consensus by moment t_2, the master node is considered malicious and is removed from the system.
As shown in FIG. 6, in this abnormal case the master node is honest and the number of honest endorsement nodes is below the endorsement threshold. The consensus process of IPBFT is then as follows:
1) R-1: each user node sends its own local gradient to the master node and the endorsement node.
2) Pp-1: the master node computes block_t and sends it to the endorsement nodes for verification.
3) P-1: if block_t passes verification by endorsement node E_i, that endorsement node sends a valid endorsement credential Vote(block_t, E_i) to the master node. However, in this case the number of valid endorsement credentials is below the threshold, and the master node cannot generate the block certificate.
4) R-2: in this abnormal case, the system does not reach consensus on block_t before moment t_1, and all user nodes send their own local gradients to all the other user nodes.
5) Pp-2: the master node broadcasts block_t to all other user nodes for verification.
6) P-2: if block_t passes verification by user node P_i, that user node sends a valid approval credential Vote(block_t, P_i) to the master node.
7) C-2: in this case, the number of approval credentials received by the master node is not less than the required threshold, so it can generate the block certificate Cert(block_t). The master node then sends block_t and Cert(block_t) to the other user nodes for block synchronization.
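The two-deadline rule that governs the fallback path can be sketched as a small decision function. The classification below (normal path, fallback path, master removal) follows the description; the deadline values themselves are illustrative.

```python
# Sketch of the two-deadline rule: consensus before t_1 follows the normal
# path; between t_1 and t_2 the fallback phase (R-2 .. C-2) may still reach
# consensus; with no consensus by t_2 the master node is judged malicious
# and removed. The concrete deadline values are illustrative assumptions.

def ipbft_outcome(consensus_time, t1, t2):
    """consensus_time is None if consensus was never reached."""
    if consensus_time is not None and consensus_time <= t1:
        return "consensus"                   # normal case (FIG. 4)
    if consensus_time is not None and consensus_time <= t2:
        return "consensus-after-fallback"    # honest master, fallback (FIG. 6)
    return "master-removed"                  # malicious master (FIG. 5)
```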
As shown in FIG. 7, in this extremely malicious case the master node is malicious and the number of endorsement nodes that are malicious and collude with the master node is not less than the endorsement threshold. The consensus process of IPBFT is then as follows:
1) R-1: each user node sends its own local gradient to the master node and the endorsement node.
2) Pp-1: the malicious master node obtains a false aggregation result, writes it into block_t, and sends block_t to the endorsement nodes for verification.
3) P-1: in this case, block_t passes the verification of the endorsement nodes E_i that are malicious and collude with the master node, and these endorsement nodes send approval credentials Vote(block_t, E_i) to the master node.
4) C-1: in this case, the master node receives at least the threshold number of endorsement credentials and generates the block certificate Cert(block_t). The master node then sends block_t and Cert(block_t) to the other user nodes for block synchronization.
It can be seen that in the extremely abnormal case of FIG. 7, the master node is malicious and colludes with some endorsement nodes; this has a very small probability of occurring in our system, because our system progressively culls malicious nodes as training proceeds, and because of the CA the probability of a node turning malicious in the consortium chain is minimal.
Table 1 compares the performance of related consensus algorithms applied in the distributed machine learning framework presented in the present invention. It can be seen that the consensus algorithm IPBFT provided by the present invention can identify malicious nodes, while PBFT and PoW cannot. In addition, PBFT and PoW require all nodes to exchange local gradients with one another, so their communication complexity is O(K^2), where K is the number of user nodes. When running IPBFT, as malicious nodes are gradually removed during training, each user node only needs to send its local gradient to 1 master node and m endorsement nodes, so the communication complexity is O(mK) in the general case; only in the two malicious cases of FIGS. 5 and 6 is the communication complexity O(K^2). Thus, the communication complexity of IPBFT is better than that of PBFT and PoW.
Table 1 comparison of related consensus algorithms
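The complexity claim above can be illustrated by counting messages per round for a hypothetical network; the sizes K = 20 and m = 3 below are assumptions for the example, not figures from the patent.

```python
# Illustrative message counts behind the complexity comparison: with K user
# nodes, exchanging local gradients all-to-all (as under PBFT/PoW) costs
# O(K^2) messages per round, while sending only to 1 master node and m
# endorsement nodes costs O(mK). The network sizes below are hypothetical.

def all_to_all(K):
    return K * (K - 1)       # every node sends its gradient to every other node

def ipbft_normal(K, m):
    return K * (m + 1)       # every node sends to the master + m endorsers

K, m = 20, 3                 # e.g. 20 user nodes, 3 endorsement nodes
```

With these values the all-to-all scheme uses 380 messages per round versus 80 for the normal IPBFT case, and the gap widens quadratically as K grows.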
Step 5: a training termination stage;
when the training model meets the expected requirements (the model accuracy meets the requirements, or the privacy loss of the model is about to exceed the privacy budget), the system stops training. Subsequently, the main function of the blockchain is to maintain trading of the machine learning model; if new data is added or the model algorithm needs to be improved, the machine learning training flow can be restarted.
Example 2:
FIG. 8 compares the number of malicious nodes against the number of iterations when nodes among the 20 blockchain nodes launch a Byzantine attack during gradient aggregation, running the IPBFT algorithm and the PoW algorithm respectively.
FIG. 9 compares the test-set accuracy obtained with different aggregation methods after 8 of the 20 blockchain nodes are subjected to a Byzantine attack during local gradient calculation (without differential privacy) in the second embodiment of the present invention.
FIG. 10 compares the test-set accuracy obtained with different aggregation methods after 8 of the 20 blockchain nodes are subjected to a Byzantine attack during local gradient calculation (with differential privacy) in the second embodiment of the present invention.
As can be seen from FIG. 8, as the system runs, the IPBFT algorithm finds the malicious nodes among the 20 nodes and removes them from the system, whereas under the PoW algorithm the malicious nodes remain in the system throughout.
As can be seen from FIG. 9, without differential privacy the Multi-Krum algorithm aggregates better than the Median algorithm and comes closer to the ideal case after nodes are subjected to a Byzantine attack (random gradient attack).
As can be seen from FIG. 10, with differential privacy introduced, the Median algorithm aggregates better than the Multi-Krum algorithm and comes closer to the ideal case after nodes are subjected to a Byzantine attack (random gradient attack).
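The two aggregators compared in FIGS. 9 and 10 can be sketched as follows. This is a simplified Multi-Krum (each gradient is scored by its summed squared distances to its n-f-2 nearest neighbours, and the n-f best-scoring gradients are averaged); the parameter f, the assumed number of Byzantine nodes, is chosen by the caller, and the formulation is illustrative rather than the patent's exact one.

```python
import numpy as np

# Sketches of the two Byzantine-tolerant aggregators compared in FIGS. 9-10:
# the coordinate-wise Median, and a simplified Multi-Krum in which each
# gradient is scored by summed squared distance to its n-f-2 nearest
# neighbours and the n-f best-scoring gradients are averaged. The parameter
# f (assumed number of Byzantine nodes) is supplied by the caller.

def coordinate_median(grads):
    return np.median(np.stack(grads), axis=0)

def multi_krum(grads, f):
    g = np.stack(grads)
    n = len(g)
    d = ((g[:, None, :] - g[None, :, :]) ** 2).sum(-1)   # pairwise sq. distances
    scores = np.sort(d, axis=1)[:, 1:n - f - 1].sum(1)   # drop self-distance
    keep = np.argsort(scores)[: n - f]                   # lowest scores win
    return g[keep].mean(axis=0)
```

For instance, with three honest gradients near [1, 1] and one Byzantine gradient at [100, 100], Multi-Krum with f = 1 discards the outlier entirely, while the coordinate-wise median merely dampens it.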
The experimental results show that the framework provided by this method can effectively handle the problem of both the parameter server and the working nodes being subjected to Byzantine attacks in distributed machine learning; at the same time, the framework rewards contributing nodes and rejects malicious nodes, ensuring that the system runs well. In addition, the framework can also apply other Byzantine-tolerant aggregation algorithms to optimize the model's performance.
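The per-node differentially private gradient step used throughout the experiments (L2 clipping to threshold C, then Gaussian noise) can be sketched as follows. The noise placement follows the standard DP-SGD recipe and is an assumption rather than a quotation of the patent's formula.

```python
import numpy as np

# Sketch of the per-node differentially private gradient step: per-sample
# gradients are L2-clipped to the threshold C and Gaussian noise of scale
# sigma * C is added before averaging over the batch. Standard DP-SGD
# recipe; the exact noise placement is an assumption, not the patent's
# verbatim formula.

def dp_local_gradient(per_sample_grads, C, sigma, rng):
    clipped = [g / max(1.0, np.linalg.norm(g) / C) for g in per_sample_grads]
    noise = rng.normal(0.0, sigma * C, size=clipped[0].shape)
    return (np.sum(clipped, axis=0) + noise) / len(per_sample_grads)
```

With sigma = 0 the function reduces to plain averaging of clipped gradients, which makes the clipping behaviour easy to check in isolation.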
Although embodiments of the present invention are described above, they are only intended to facilitate understanding of the present invention and not to limit it. Any person skilled in the art may make modifications and variations in form and detail without departing from the spirit and scope of the present disclosure, but the scope of protection remains subject to the appended claims.
Claims (5)
1. An operation method of a secure, privacy-preserving and tradable distributed machine learning framework based on blockchain technology, implemented with a distributed machine learning system, the system comprising the following parts:
part 1, a certificate authority (CA), responsible for issuing and revoking digital certificates for blockchain nodes and managing node permissions;
part 2, the blockchain nodes, consisting of user nodes and transaction nodes, responsible respectively for maintaining the machine learning model and participating in machine learning model transactions;
part 3, the smart contracts, consisting of the machine learning smart contract (MLMC) and the model contribution smart contract (MCMC), which respectively specify the operating rules of the distributed machine learning and divide profits among nodes according to model contribution degree;
part 4, the distributed ledger, which records model data during the training of the machine learning model, including the local and global model states and model transaction data;
part 5, the data providers, responsible for collecting local data and uploading it to the blockchain node servers;
the operation method is characterized by comprising the following steps:
step 1, consortium chain initialization stage: the CA server issues digital certificates to the initial nodes of the consortium chain, all participants establish connections, and some initial consensus is reached;
step 2, parameter initialization stage: all user nodes reach consensus on the neural network model and synchronize the system's test set data;
step 3, local gradient calculation stage: all user nodes select the master node cyclically in order of increasing id, and the m nodes following the master node's id are the endorsement nodes; each node then calculates a local gradient using its local data and the current model, adds Gaussian noise to the gradient so that it satisfies a differential privacy mechanism, and finally sends the local gradient to the master node and the endorsement nodes;
step 4, global model update stage: the master node calculates the global gradient from the nodes' local gradients using a Byzantine-fault-tolerant gradient aggregation algorithm; the system then runs the IPBFT consensus algorithm, and if the global gradient reaches system consensus, the global model is updated and the relevant information of the global model is written into the block;
step 5, training termination stage: when the training model meets the expected requirements, the system stops training the model, and its subsequent function is to maintain model trading.
2. The operation method of the secure, privacy-preserving and tradable distributed machine learning framework based on blockchain technology according to claim 1, wherein step 1, the consortium chain initialization stage, is specifically as follows:
the CA server issues digital certificates to the initial nodes of the consortium chain, all participants establish connections, and some initial consensus is reached: a. unify the standard for all parties' data sets; b. unify the standard for the model transaction fee, where the fee increases with the degree of perfection of the model; c. unify the selection rules for the master node and the endorsement nodes: the master node is selected cyclically in order of increasing node id, and the m nodes following the master node's id are the endorsement nodes.
3. The operation method of the secure, privacy-preserving and tradable distributed machine learning framework based on blockchain technology according to claim 2, wherein step 2, the parameter initialization stage, is specifically as follows: all user nodes reach consensus on the neural network model, including determining the network structure of the neural network model, the batch size B, the number of training iterations T, the learning rate η_t, the initial weight w_0, the clipping threshold C and the noise scale σ; at the same time, the blockchain nodes transmit the data set standard to the data providers, and the data providers collect training sets and upload them to the blockchain nodes; after the neural network model and the data sets are prepared, all user nodes contribute to the test set and the system's test set data are unified, after which the whole system starts training the neural network model.
4. The operation method of the secure, privacy-preserving and tradable distributed machine learning framework based on blockchain technology according to claim 2, wherein step 3, the local gradient calculation stage, is specifically as follows:
first, all user nodes in the blockchain determine the master node and the endorsement nodes: if the id of the master node is i, the ids of the endorsement nodes are i+1, i+2, ..., i+m; each node then obtains a local gradient from its data set and the current model, adds Gaussian noise so that the local gradient satisfies a differential privacy mechanism, and finally sends the local gradient to the master node and the endorsement nodes;
the specific calculation process is as follows: assume that in the t-th iteration, the B training data sets acquired in the kth node areThe global model weight is w t The clipping threshold is C, and the noise size sigma;
at the t-th iteration, the local gradient of each sample of the kth working node is that
Wherein the model prediction result is thatl () is a loss function;
then cutting the local gradient, adding Gaussian noise, and finally obtaining the local gradient g of the kth node k (w t ) Is that
5. The operation method of the secure, privacy-preserving and tradable distributed machine learning framework based on blockchain technology according to claim 2, wherein step 4, the global model update stage, is specifically as follows: after the master node receives the local gradients of all nodes, it runs a Byzantine-fault-tolerant gradient aggregation algorithm to aggregate the local gradients into a global gradient and update the model, while the moments accountant method is used to track the privacy loss; the system then runs the IPBFT consensus algorithm: the master node writes the aggregation result into block_t, then sends block_t to the endorsement nodes for verification; if block_t passes verification, it is broadcast to all blockchain nodes and the block is successfully added to the blockchain.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010496847.9A CN111915294B (en) | 2020-06-03 | 2020-06-03 | Safe, privacy-preserving and tradable distributed machine learning framework operation method based on blockchain technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111915294A CN111915294A (en) | 2020-11-10 |
CN111915294B true CN111915294B (en) | 2023-11-28 |
Family
ID=73237547
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010496847.9A Active CN111915294B (en) | 2020-06-03 | 2020-06-03 | Safe, privacy-preserving and tradable distributed machine learning framework operation method based on blockchain technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111915294B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112819177B (en) * | 2021-01-26 | 2022-07-12 | 支付宝(杭州)信息技术有限公司 | Personalized privacy protection learning method, device and equipment |
CN113434873A (en) * | 2021-06-01 | 2021-09-24 | 内蒙古大学 | Federal learning privacy protection method based on homomorphic encryption |
CN113822758B (en) * | 2021-08-04 | 2023-10-13 | 北京工业大学 | Self-adaptive distributed machine learning method based on blockchain and privacy |
CN113806764B (en) * | 2021-08-04 | 2023-11-10 | 北京工业大学 | Distributed support vector machine based on blockchain and privacy protection and optimization method thereof |
CN114118438B (en) * | 2021-10-18 | 2023-07-21 | 华北电力大学 | Privacy protection machine learning training and reasoning method and system based on blockchain |
CN116094732A (en) * | 2023-01-30 | 2023-05-09 | 山东大学 | Block chain consensus protocol privacy protection method and system based on rights and interests proving |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107864198A (en) * | 2017-11-07 | 2018-03-30 | 济南浪潮高新科技投资发展有限公司 | A kind of block chain common recognition method based on deep learning training mission |
WO2019222993A1 (en) * | 2018-05-25 | 2019-11-28 | 北京大学深圳研究生院 | Blockchain consensus method based on trust relationship |
CN110599261A (en) * | 2019-09-21 | 2019-12-20 | 江西理工大学 | Electric automobile safety electric power transaction and excitation system based on energy source block chain |
CN110738375A (en) * | 2019-10-16 | 2020-01-31 | 国网湖北省电力有限公司电力科学研究院 | Active power distribution network power transaction main body optimization decision method based on alliance chain framework |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190236559A1 (en) * | 2018-01-31 | 2019-08-01 | Salesforce.Com, Inc. | Systems, methods, and apparatuses for implementing smart flow contracts using distributed ledger technologies in a cloud based computing environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||