CN111915294B - Safe, privacy-preserving and tradable distributed machine learning framework operation method based on blockchain technology

Info

Publication number: CN111915294B
Application number: CN202010496847.9A
Authority: CN (China)
Prior art keywords: node, model, nodes, machine learning, data
Inventors: 曹向辉 (Cao Xianghui), 梁伦 (Liang Lun)
Assignee (current and original): Southeast University
Priority and filing date: 2020-06-03
Grant publication date: 2023-11-28
Other versions: CN111915294A (en)
Legal status: Active (granted)

Classifications

    • G06Q 20/389: Payment architectures, schemes or protocols; keeping log of transactions for guaranteeing non-repudiation of a transaction
    • G06N 20/00: Machine learning
    • G06N 3/045: Neural networks; combinations of networks
    • H04L 63/0823: Network security; authentication of entities using certificates
    • H04L 63/1441: Network security; countermeasures against malicious traffic
    • H04L 63/20: Network security; managing network security policies in general
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network


Abstract

The invention discloses a secure, privacy-preserving and tradable distributed machine learning framework based on blockchain technology, comprising the following parts: the certificate authority (CA), responsible for issuing and revoking digital certificates for blockchain nodes and managing node permissions; the blockchain nodes, responsible for maintaining the machine learning model and participating in machine learning model transactions; the smart contracts, which prescribe the operating rules of the distributed machine learning and divide profits among nodes according to their model contribution; the distributed ledger, which records the model data and model transaction data generated during machine learning model training; and the data provider, responsible for collecting local data and uploading it to the blockchain node server.

Description

Safe, privacy-preserving and tradable distributed machine learning framework operation method based on blockchain technology
Technical Field
The invention relates to a secure, privacy-preserving and tradable distributed machine learning framework based on blockchain technology, and in particular to a framework that uses blockchain (consortium chain) technology to solve the Byzantine attack problem in distributed machine learning, uses differential privacy to protect the privacy of each participant's dataset, and supports machine learning model transactions. It belongs to the fields of artificial intelligence, blockchain and information security.
Background
In the parameter server framework commonly used in distributed machine learning, multiple working nodes train on local data against the current global model to obtain local models, send them to a parameter server, and the parameter server aggregates all the local models and updates the global model. This process has security problems: both the working nodes and the parameter server node may be subjected to Byzantine attacks. Specifically, a working node under Byzantine attack sends an erroneous local gradient to the parameter server, degrading the finally trained model; a parameter server node under Byzantine attack aggregates an erroneous global model, wasting the preceding training effort. In recent years, because blockchains are tamper-resistant, traceable, distributedly stored and publicly maintained, researchers have tried to apply them in the Internet of Things, medical, financial and other fields to solve security, transaction and similar problems.
To date, the Byzantine attack problem in distributed machine learning has been studied to some extent. However, the following problems remain: 1) existing distributed machine learning algorithms do not consider the case where the parameter server itself is subjected to a Byzantine attack while aggregating models; 2) how a detected Byzantine node should be handled to prevent it from interfering with model training; 3) how to implement an incentive mechanism in a blockchain system combined with distributed machine learning so that the system operates more efficiently. A new solution to these technical problems is therefore urgently needed.
Disclosure of Invention
The invention aims to solve the above problems in distributed machine learning. It provides an algorithm that resists Byzantine attacks on both the working nodes and the parameter server node; since blockchain technology is introduced, the consensus problem in the blockchain must also be solved, and an effective incentive mechanism is provided so that the blockchain system operates effectively and durably.
In order to solve the above technical problems, the invention provides a secure, privacy-preserving and tradable distributed machine learning framework based on blockchain technology, comprising the following parts. Part 1: a multi-level certificate authority (CA) is responsible for issuing and revoking digital certificates for blockchain nodes and managing node permissions. Part 2: the blockchain nodes consist of user nodes and transaction nodes, responsible respectively for maintaining the machine learning model and for participating in machine learning model transactions. Part 3: the smart contracts consist of a machine learning smart contract (MLSC) and a model contribution smart contract (MCSC), which respectively prescribe the operating rules of the distributed machine learning and divide profits among nodes according to their model contribution. Part 4: the distributed ledger records the model data (including the local models and the global model) and the model transaction data generated during machine learning model training. Part 5: the data provider is responsible for collecting local data and uploading it to the blockchain node server.
In this scheme, the certificate authority CA audits, supervises and manages the permissions of every node that wants to join the system, which prevents malicious nodes from joining to a certain extent and safeguards the system. Both transaction nodes and later-joining user nodes must pay an on-chain fee (the model transaction fee). After synchronizing the block information, a transaction node exits the system. If a user node is identified as malicious, it is expelled from the system: its previously paid joining fee is not refunded and it receives no subsequent model transaction fees, which punishes malicious behavior. The rules of the smart contracts are open to all user nodes, so their content is difficult for a malicious node to tamper with. The distributed ledger records the model data and model transaction data generated during training, ensuring traceability: any falsified data is recorded, which safeguards the system to a certain extent. If the nodes of the system do not need dataset privacy protection, Gaussian noise need not be added to the local gradients; likewise, many other dataset privacy protection methods exist, and a more suitable method may be substituted.
A method of operating a secure, privacy preserving, tradable distributed machine learning framework based on blockchain technology, the method of operating comprising the steps of:
step 1, consortium chain initialization stage: the CA server issues digital certificates to the initial nodes of the consortium chain, all participants establish connections, and some initial consensus is reached;
step 2, parameter initialization stage: all user nodes reach consensus on the neural network model and synchronize the system's test set data;
step 3, local gradient calculation stage: all user nodes select the master node in round-robin order of ascending id, with the m nodes following the master node's id serving as endorsement nodes; each node then computes a local gradient using its local data and the current model, adds Gaussian noise to the gradient so that it satisfies differential privacy, and finally sends the local gradient to the master node and the endorsement nodes;
step 4, global model update stage: the master node computes the global gradient from the nodes' local gradients using a Byzantine-fault-tolerant gradient aggregation algorithm; the system then runs the IPBFT consensus algorithm, and if the global gradient obtains system consensus, the global model is updated and the relevant global model information is written into the block;
step 5, training termination stage: once the trained model meets the expected requirements, the system stops training; its subsequent function is to maintain model trading.
As an improvement of the present invention, step 1, the consortium chain initialization stage, is specifically as follows:
the CA server issues digital certificates to the initial nodes of the consortium chain, all participants establish connections, and the following initial consensus is reached: a. a unified standard for dataset construction; b. a unified standard for model transaction fees; c. unified selection rules for the master node and the endorsement nodes.
As an improvement of the present invention, step 2, the parameter initialization stage, is specifically as follows: all user nodes reach consensus on the neural network model, including determining the network structure of the neural network model, the batch size B, the number of training iterations T, the learning rate η_t, the initial weights w_0, the clipping threshold C and the noise level σ. Once the neural network model and the datasets are prepared, all user nodes contribute to a test set and the system's test set data is unified. The entire system can then begin neural network model training.
As an improvement of the present invention, step 3, the local gradient calculation stage, is specifically as follows:
first, all user nodes in the blockchain determine the master node and the endorsement nodes: if the master node's id is i, the endorsement nodes' ids are i+1, i+2, ..., i+m. Each node then obtains a local gradient from its dataset and the current model, applies differential privacy to the local gradient, and sends it to the master node and the endorsement nodes.
The specific calculation process is as follows. Assume that in the t-th iteration, the batch of B training samples drawn by the k-th node is {(x_1, y_1), ..., (x_B, y_B)}, the global model weights are w_t, the clipping threshold is C, and the noise level is σ.
At the t-th iteration, the local gradient of the k-th working node for each sample i is
g_k^{(i)}(w_t) = ∇_{w_t} l(ŷ_i, y_i),
where ŷ_i = f(x_i; w_t) is the model prediction and l(·) is the loss function.
The per-sample gradients are then clipped and Gaussian noise is added, giving the local gradient g_k(w_t) of the k-th node:
g_k(w_t) = (1/B) ( Σ_{i=1}^{B} g_k^{(i)}(w_t) / max(1, ||g_k^{(i)}(w_t)||_2 / C) + N(0, σ²C²I) ).
Finally, each node sends its own local gradient to the master node and the endorsement nodes.
As an improvement of the present invention, step 4, the global model update stage, is specifically as follows: after the master node receives the local gradients of all nodes, it runs a Byzantine-fault-tolerant gradient aggregation algorithm to aggregate the local gradients into a global gradient and update the model, while using the moments accountant to track privacy loss. The system then runs the IPBFT consensus algorithm: the master node first writes the aggregate computation results (including the master node id, the aggregated gradient, the differential privacy loss, the selected node ids and the local gradient information) into block_t, then sends block_t to the endorsement nodes for verification; if verification passes, block_t is broadcast to all blockchain nodes and the block is successfully added to the blockchain.
In step 4, the blockchain consensus algorithm IPBFT can effectively verify the gradient aggregation result and effectively identify malicious nodes; the algorithm is suited to consortium chains and, compared with public-chain consensus algorithms (such as PoW, PoS and PoET), completes transaction confirmation in a shorter time with lower communication complexity.
Compared with the prior art, the invention has the following advantages: 1) the distributed machine learning framework based on blockchain technology is highly practical and can be used with any gradient-descent-based distributed machine learning algorithm; 2) the invention uses a CA to manage the permissions of blockchain nodes (including transaction nodes and user nodes) effectively: for a transaction node, the CA collects its machine learning model transaction fee and limits the validity period of its permissions; for a malicious node, the CA revokes its user permissions, preventing it from damaging the machine learning model; 3) the IPBFT consensus algorithm proposed by the invention effectively resists Byzantine attacks on the parameter server's aggregation process while identifying and expelling malicious nodes, making the system more secure; 4) the invention implements an effective incentive mechanism on the blockchain; specifically, smart contracts deployed on the blockchain distribute model transaction fees reasonably; 5) by adding differential privacy to the distributed machine learning, the invention effectively protects the dataset privacy of the system's participants.
Drawings
FIG. 1 is the blockchain-technology-based distributed machine learning framework of the present invention;
FIG. 2 is the CA framework diagram of the present invention;
FIG. 3 is the operation flow chart of the present invention;
FIG. 4 is a schematic diagram of the IPBFT consensus process under normal conditions;
FIG. 5 is a comparison of test-set accuracy of models obtained with different aggregation methods after 8 of the blockchain's 20 nodes are subjected to a Byzantine attack during local gradient calculation (without differential privacy) in example 2 of the present invention;
FIG. 6 is a comparison of test-set accuracy of models obtained with different aggregation methods after 8 of the blockchain's 20 nodes are subjected to a Byzantine attack during local gradient calculation (with differential privacy) in example 3 of the present invention;
FIG. 7 is a schematic diagram of the IPBFT consensus process in an extremely malicious situation;
FIG. 8 shows the IPBFT algorithm finding the 20 malicious nodes and expelling them from the system, while the malicious nodes in a system running the PoW algorithm persist;
FIG. 9 shows that, without differential privacy, after nodes are subjected to a Byzantine attack (random gradient attack), the Multi-Krum algorithm aggregates better than the Median algorithm and comes closer to the ideal case;
FIG. 10 shows that, with differential privacy, after nodes are subjected to a Byzantine attack (random gradient attack), the Median algorithm aggregates better than the Multi-Krum algorithm and comes closer to the ideal case.
Detailed Description
Embodiments of the present invention are described in detail below with reference to the drawings and examples, so that how the invention applies technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented. It should be noted that, as long as no conflict arises, the embodiments of the present invention and the features of each embodiment may be combined with one another, and the resulting technical solutions all fall within the protection scope of the present invention.
Example 1: FIG. 1 shows the secure, privacy-preserving and tradable distributed machine learning framework based on blockchain technology of the present invention. The individual components of the framework are described in detail below with reference to FIG. 1.
A blockchain technology-based secure, privacy preserving, tradable, distributed machine learning framework, the framework comprising:
part 1: certificate authority CA;
the CA is responsible for issuing and revoking digital certificates for the blockchain nodes and performing authority management on the nodes. It needs to be trusted by all block link points, as well as supervised by all block link points. The structure of which is shown in figure 2. For security, our CA employs the root certificate chain implementation of the more common root and intermediate CAs. The root CA does not issue a certificate directly for the server, it generates two intermediate CAs (user CA and transactor CA) for itself, the intermediate CA acts as a proxy for the root CA to apply visas for the client, and the intermediate CA can reduce the management burden of the root CA.
Part 2: a blockchain node;
in the system framework of the present invention, there are two types of blockchain nodes: transaction nodes and user nodes.
A transaction node is a temporary node: an external user who wishes to obtain the trained model joins the blockchain network as one. After obtaining the CA's permission to join the blockchain, the transaction node performs block synchronization once; after synchronization completes, its digital certificate is revoked and the node exits the network.
The user nodes are the main components of the blockchain network; they maintain and train the machine learning model and write data blocks into the blockchain's distributed ledger. Each user node performs local gradient calculation, global model aggregation, bookkeeping, block information verification and other functions.
Part 3: smart contracts;
in the system framework of the invention, there are two smart contracts, distributed as machine learning smart contracts (Machine Learning Smart Contract, MLSC) and model contribution smart contracts (Model Contribution Smart Contract, MCSC).
The MLSC specifies the running rules of distributed machine learning, including local gradient computation, global model computation, IPBFT consensus mechanisms, and so forth.
The MCSC calculates each node's model contribution by inspecting the ledger information in the blockchain and divides model transaction fees according to contribution; at the same time, the bookkeeping node that writes the transaction information into the blockchain receives a bookkeeping commission.
The contribution C_i of the i-th node is computed as
C_i = c_1 · l_i + c_2 · g_i,
where l_i is the number of times the node participated in global gradient computation, g_i is the number of times the node contributed a local gradient, and c_1 and c_2 are the contribution coefficients of global gradient computation and local gradient computation.
Since the model transaction fee F equals the bookkeeping commission r plus the model contribution revenues R_i of all nodes, the model contribution revenue R_i of node i is
R_i = (F − r) · C_i / Σ_{k=1}^{K} C_k,
where K is the total number of user nodes.
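For illustration, the contribution score and revenue split can be sketched in Python as follows (a minimal sketch of the reconstructed formulas above; function and variable names are illustrative, not part of the MCSC):

```python
def contribution(l_i: int, g_i: int, c1: float, c2: float) -> float:
    """C_i = c1 * l_i + c2 * g_i: l_i counts participations in global
    gradient computation, g_i counts contributed local gradients."""
    return c1 * l_i + c2 * g_i

def revenue_split(F: float, r: float, contributions: list[float]) -> list[float]:
    """Divide the model transaction fee F: the bookkeeping node keeps
    commission r, and the remainder (F - r) is split in proportion
    to each node's contribution C_i."""
    total = sum(contributions)
    return [(F - r) * C_i / total for C_i in contributions]
```

For example, with F = 100, r = 10 and contributions [3, 1, 1], the three nodes would receive 54, 18 and 18 respectively.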
Part 4: a distributed ledger;
the distributed ledger records model data (including local and global model cases) and model transaction data during machine learning model training. The method ensures the traceability of the data, and all the wrought data can be recorded, thereby ensuring the safety of the system to a certain extent.
Part 5: a data provider;
the data provider is responsible for collecting the data and uploading it to the local server.
Example 2: a method of operating a secure, privacy preserving, tradable distributed machine learning framework based on blockchain technology, the method of operating comprising the steps of:
fig. 3 is a flow chart illustrating the operation of the framework of the present invention, and each stage of the system operation is described in detail below with reference to fig. 3.
Step 1: consortium chain initialization stage;
The CA server issues digital certificates to the initial nodes of the consortium chain, all participants establish connections, and the following initial consensus is reached: a. a unified standard for dataset construction (for example, images must follow the MNIST handwritten-digit dataset standard); b. a unified standard for model transaction fees; c. unified selection rules for the master node and the endorsement nodes (here, the master node is selected in round-robin order of ascending node id; the m nodes whose ids follow the master node's id serve as endorsement nodes, and if fewer than m nodes have ids greater than the master node's id, the remainder are taken in order starting from the smallest id).
Step 2: parameter initialization stage;
In the parameter initialization stage, all user nodes reach consensus on the neural network model, including determining the network structure of the neural network model, the batch size B, the number of training iterations T, the learning rate η_t, the initial weights w_0, the clipping threshold C, the noise level σ and other parameters. At the same time, the blockchain nodes issue the dataset standard to the data providers; the data providers collect training sets and upload them to the blockchain nodes.
Once the neural network model and the datasets are prepared, all user nodes contribute to the test set and the system's test set data is unified. The entire system can then begin neural network model training.
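For concreteness, the agreed parameters can be pictured as one shared record; all concrete values below are illustrative assumptions, not values prescribed by the patent:

```python
# Hypothetical record of the initialization consensus (values are assumptions).
init_consensus = {
    "network_structure": "2-layer CNN",  # agreed neural network architecture
    "batch_size_B": 64,
    "iterations_T": 10000,
    "learning_rate_eta_t": 0.01,         # eta_t; may be decayed over iterations t
    "initial_weights_w0_seed": 42,       # shared seed so all nodes start from the same w_0
    "clipping_threshold_C": 4.0,
    "noise_level_sigma": 1.0,
}
```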
Step 3: local gradient calculation stage;
First, all user nodes in the blockchain determine the master node and the endorsement nodes: if the master node's id is i, the endorsement nodes' ids are i+1, i+2, ..., i+m. Each node then obtains a local gradient from its dataset and the current model, adds Gaussian noise to the local gradient so that it satisfies the differential privacy mechanism, and finally sends the local gradient to the master node and the endorsement nodes.
The specific calculation process is as follows. Assume that in the t-th iteration, the batch of B training samples drawn by the k-th node is {(x_1, y_1), ..., (x_B, y_B)}, the global model weights are w_t, the clipping threshold is C, and the noise level is σ.
At the t-th iteration, the local gradient of the k-th working node for each sample i is
g_k^{(i)}(w_t) = ∇_{w_t} l(ŷ_i, y_i),
where ŷ_i = f(x_i; w_t) is the model prediction and l(·) is the loss function.
The per-sample gradients are then clipped and Gaussian noise is added, giving the local gradient g_k(w_t) of the k-th node:
g_k(w_t) = (1/B) ( Σ_{i=1}^{B} g_k^{(i)}(w_t) / max(1, ||g_k^{(i)}(w_t)||_2 / C) + N(0, σ²C²I) ).
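This clipping-and-noise step follows the standard DP-SGD recipe (clip each per-sample gradient to L2 norm C, add Gaussian noise N(0, σ²C²I), average over the batch); a minimal NumPy sketch with illustrative names:

```python
import numpy as np

def noisy_local_gradient(per_sample_grads: list[np.ndarray],
                         C: float, sigma: float) -> np.ndarray:
    """Compute the differentially private local gradient g_k(w_t):
    clip each per-sample gradient to L2 norm at most C, sum them,
    add Gaussian noise with std sigma * C per coordinate, divide by B."""
    B = len(per_sample_grads)
    clipped = [g / max(1.0, float(np.linalg.norm(g)) / C) for g in per_sample_grads]
    noise = np.random.normal(0.0, sigma * C, size=per_sample_grads[0].shape)
    return (np.sum(clipped, axis=0) + noise) / B
```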
Step 4: a global model updating stage;
after receiving the local gradients of each node, the master node runs a gradient aggregation algorithm (e.g., multi-Krum, l-nearest aggregation, etc.) with bayer fault tolerance to aggregate the local gradients to obtain global gradients and update the model, while employing moments accountant to track privacy loss. The system then runs the IPBFT consensus algorithm: the master node will first write the aggregate computation results (including master node id, aggregate gradient, differential privacy loss, selected node id and local gradient information) into the block t In (2) then block t Sending the block to an endorsement node for verification, and if the block passes the verification t Broadcast to all blockchain nodes and the block is successfully added to the blockchain.
IPBFT: the consensus process of the IPBFT algorithm, shown in FIGS. 4, 5, 6 and 7, consists of 8 stages, namely request-1 (R-1), pre-prepare-1 (Pp-1), prepare-1 (P-1), commit-1 (C-1), request-2 (R-2), pre-prepare-2 (Pp-2), prepare-2 (P-2) and commit-2 (C-2). All user nodes are divided into the master node (L), endorsement nodes (E) and general nodes (G). Under normal conditions, as shown in FIG. 4, the system reaches consensus after executing only the 4 steps R-1, Pp-1, P-1 and C-1; in the abnormal situations of FIGS. 5 and 6, the system executes the 4 additional steps R-2, Pp-2, P-2 and C-2. Define the moment the system starts running IPBFT as time 0. If the system reaches consensus before time t_1, a new master node is selected and the next consensus round begins; otherwise, IPBFT determines whether the master node is malicious. If the system still has not reached consensus by time t_2, the master node of this consensus round is deemed malicious and is expelled from the system. FIG. 7 shows the extremely abnormal case in which an erroneous aggregation result reaches consensus; but in our system malicious nodes are continuously expelled, and in the consortium chain the likelihood of a node turning malicious is low because of the CA's admission control, so this extremely malicious situation is a small-probability event that almost never occurs. Even if an erroneous aggregation result is introduced in the initial stage of training, it does not affect the final trained model.
As shown in FIG. 4, under normal conditions the master node is honest and the number of honest endorsement nodes is not less than the required quorum. The IPBFT consensus process is then:
1) R-1: each user node sends its own local gradient to the master node and the endorsement nodes.
2) Pp-1: the master node computes block_t and sends it to the endorsement nodes for verification.
3) P-1: if block_t passes verification by endorsement node E_i, that endorsement node sends a valid endorsement credential Vote(block_t, E_i) to the master node.
4) C-1: in this case, the master node receives at least the quorum of endorsement credentials and generates the block certificate Cert(block_t). The master node then sends block_t and the block certificate Cert(block_t) to the other user nodes for block synchronization.
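The credential-counting step can be sketched as follows; note that the patent's exact quorum thresholds are given by formulas not reproduced in this text, so the 2/3 fraction below is only an assumed placeholder:

```python
import math

def has_quorum(valid_votes: int, voters: int, frac: float = 2 / 3) -> bool:
    """Return True when enough valid credentials Vote(block_t, .) have
    been collected to generate Cert(block_t). `voters` is m in round 1
    (endorsement nodes) or the number of other user nodes in round 2;
    the quorum fraction is an assumption, not the patent's formula."""
    return valid_votes >= math.ceil(frac * voters)
```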
As shown in FIG. 5, in this abnormal situation the master node is malicious and the number of honest endorsement nodes is not less than the required quorum. The IPBFT consensus process is then:
1) R-1: each user node sends its own local gradient to the master node and the endorsement nodes.
2) Pp-1: the master node computes block_t and sends it to the endorsement nodes for verification.
3) P-1: because block_t fails verification by the honest endorsement nodes, they do not send endorsement credentials to the master node. The number of endorsement credentials received by the master node therefore falls short of the quorum, and the master node cannot generate the block certificate Cert(block_t).
4) R-2: in this abnormal situation, the system has not reached consensus on block_t before time t_1, so all user nodes send their own local gradients to all other user nodes.
5) Pp-2: the master node broadcasts block_t to all other user nodes for verification. But in this abnormal situation the number of approval credentials received by the master node still falls short of the quorum over the K user nodes, so the system does not reach consensus on block_t. The system also fails to reach consensus before time t_2, so the master node is deemed malicious and is expelled from the system.
As shown in FIG. 6, in this abnormal situation the master node is honest but the number of honest endorsement nodes is below the required quorum. The IPBFT consensus process is then:
1) R-1: each user node sends its own local gradient to the master node and the endorsement nodes.
2) Pp-1: the master node computes block_t and sends it to the endorsement nodes for verification.
3) P-1: if block_t passes verification by endorsement node E_i, that endorsement node sends a valid endorsement credential Vote(block_t, E_i) to the master node. In this situation, however, the number of valid endorsement credentials falls short of the quorum, so the master node cannot generate the block certificate.
4) R-2: in this abnormal situation, the system has not reached consensus on block_t before time t_1, so all user nodes send their own local gradients to all other user nodes.
5) Pp-2: the master node broadcasts block_t to all other user nodes for verification.
6) P-2: if block_t passes verification by user node P_i, that user node sends a valid approval credential Vote(block_t, P_i) to the master node.
7) C-2: in this case, the number of approval credentials received by the master node is not less than the quorum, so it can generate the block certificate Cert(block_t). The master node then sends block_t and the block certificate Cert(block_t) to the other user nodes for block synchronization.
As shown in FIG. 7, in this extremely malicious case the master node is malicious and the number of endorsement nodes that are malicious and colluding with the master node is not less than the quorum. The IPBFT consensus process is then:
1) R-1: each user node sends its own local gradient to the master node and the endorsement nodes.
2) Pp-1: the malicious master node computes an erroneous aggregation result and sends block_t to the endorsement nodes for verification.
3) P-1: in this case, block_t passes "verification" by the endorsement nodes E_i that are malicious and colluding with the master node, and those endorsement nodes send endorsement credentials Vote(block_t, E_i) to the master node.
4) C-1: in this case, the master node receives at least the quorum of endorsement credentials and generates the block certificate Cert(block_t); the master node then sends block_t and the block certificate Cert(block_t) to the other user nodes for block synchronization.
It can be seen that the extremely abnormal case of FIG. 7, in which the master node is malicious and colludes with some endorsement nodes, occurs with very small probability in our system: the system progressively expels malicious nodes as training proceeds, and in the consortium chain the probability of a node turning malicious is minimal because of the CA's admission control.
Table 1 compares the performance of related consensus algorithms applied in the distributed machine learning framework proposed in the present invention. It can be seen that the consensus algorithm IPBFT provided by the present invention can identify malicious nodes, while PBFT and PoW cannot. In addition, PBFT and PoW require all nodes to exchange local gradients with one another, so their communication complexity is O(K²), where K is the number of user nodes. Under IPBFT, once malicious nodes are progressively expelled as training proceeds, a user node only needs to send its local gradient to 1 master node and m endorsement nodes, so the communication complexity is O(mK) in the general case; only in the two malicious cases of FIGS. 5 and 6 does it rise to O(K²). Thus, the communication complexity of IPBFT is better than that of PBFT and PoW.
Table 1 comparison of related consensus algorithms
Step 5: a training termination stage;
when the training model meets the expected requirements (the model accuracy meets the requirements or the privacy loss of the model is about to exceed the privacy budget requirements), the system does not begin training any more. Subsequently, the main function of the blockchain is to maintain the trade of the machine learning model, and if new data is added or the model algorithm needs to be improved, the flow of the machine learning training can be restarted.
Example 3:
FIG. 8 compares the number of malicious nodes against the number of iterations when 20 nodes of the blockchain are subjected to a Byzantine attack during gradient aggregation, under the IPBFT algorithm and the PoW algorithm respectively.
FIG. 9 compares the test-set accuracy of models obtained with different aggregation methods after 8 of the blockchain's 20 nodes are subjected to a Byzantine attack during local gradient calculation (without differential privacy) in this example.
FIG. 10 compares the test-set accuracy of models obtained with different aggregation methods after 8 of the blockchain's 20 nodes are subjected to a Byzantine attack during local gradient calculation (with differential privacy) in this example.
As can be seen from FIG. 8, as the system runs, the IPBFT algorithm finds the 20 malicious nodes and expels them from the system, while the malicious nodes in the system running the PoW algorithm persist throughout.
As can be seen from FIG. 9, without differential privacy, after nodes are subjected to a Byzantine attack (random gradient attack), the Multi-Krum algorithm aggregates better than the Median algorithm and comes closer to the ideal case.
As can be seen from FIG. 10, with differential privacy, after nodes are subjected to a Byzantine attack (random gradient attack), the Median algorithm aggregates better than the Multi-Krum algorithm and comes closer to the ideal case.
The experimental results show that the proposed framework effectively solves the problem of Byzantine attacks against both the parameter server and the working nodes in distributed machine learning; at the same time, the framework rewards contributing nodes and expels malicious nodes, ensuring that the system runs well. In addition, the framework can apply other Byzantine-tolerant aggregation algorithms to optimize the model.
Although embodiments of the present invention are described above, the embodiments are only intended to aid understanding of the invention, not to limit it. Any person skilled in the art may make modifications and variations in form and detail without departing from the spirit and scope of the present disclosure, but the protection scope of the present invention is still subject to the scope defined by the appended claims.

Claims (5)

1. A method of operating a secure, privacy-preserving and tradable distributed machine learning framework based on blockchain technology, implemented with a distributed machine learning system, the system comprising the following parts:
part 1, a certificate authority (CA), responsible for issuing and revoking digital certificates for blockchain nodes and managing node permissions;
part 2, the blockchain nodes, consisting of user nodes and transaction nodes, responsible respectively for maintaining the machine learning model and for participating in machine learning model transactions;
part 3, the smart contracts, consisting of a machine learning smart contract MLSC and a model contribution smart contract MCSC, which respectively prescribe the operating rules of the distributed machine learning and divide profits among nodes according to model contribution;
part 4, the distributed ledger, which records the model data generated during machine learning model training, including the local models and the global model, and the model transaction data;
part 5, the data provider, responsible for collecting local data and uploading it to the blockchain node server;
the operation method is characterized by comprising the following steps:
step 1, consortium chain initialization stage: the CA server issues digital certificates to the initial nodes of the consortium chain, all participants establish connections, and some initial consensus is reached;
step 2, parameter initialization stage: all user nodes reach consensus on the neural network model and synchronize the system's test set data;
step 3, local gradient calculation stage: all user nodes select the master node in round-robin order of ascending id, with the m nodes following the master node's id serving as endorsement nodes; each node then computes a local gradient using its local data and the current model, adds Gaussian noise to the gradient so that it satisfies the differential privacy mechanism, and finally sends the local gradient to the master node and the endorsement nodes;
step 4, global model update stage: the master node computes the global gradient from the nodes' local gradients using a Byzantine-fault-tolerant gradient aggregation algorithm; the system then runs the IPBFT consensus algorithm, and if the global gradient obtains system consensus, the global model is updated and the relevant global model information is written into the block;
step 5, training termination stage: once the trained model meets the expected requirements, the system stops training; its subsequent function is to maintain model trading.
2. The method of operating a secure, privacy-preserving and tradable distributed machine learning framework based on blockchain technology of claim 1, wherein step 1, the consortium chain initialization stage, is specifically as follows:
the CA server issues digital certificates to the initial nodes of the consortium chain, all participants establish connections, and the following initial consensus is reached: a. a unified standard for dataset construction; b. a unified standard for model transaction fees, where the transaction fee increases with the maturity of the model; c. unified selection rules for the master node and the endorsement nodes: the master node is selected in round-robin order of ascending node id, and the m nodes following the master node's id serve as endorsement nodes.
3. The method of operating a secure, privacy-preserving and tradable distributed machine learning framework based on blockchain technology of claim 2, wherein step 2, the parameter initialization stage, is specifically as follows: all user nodes reach consensus on the neural network model, including determining the network structure of the neural network model, the batch size B, the number of training iterations T, the learning rate η_t, the initial weights w_0, the clipping threshold C and the noise level σ; at the same time, the blockchain nodes issue the dataset standard to the data providers, and the data providers collect training sets and upload them to the blockchain nodes; once the neural network model and the datasets are prepared, all user nodes contribute to the test set and the system's test set data is unified, after which the entire system begins neural network model training.
4. The method of operating a secure, privacy-preserving and tradable distributed machine learning framework based on blockchain technology of claim 2, wherein step 3, the local gradient calculation stage, is specifically as follows:
first, all user nodes in the blockchain determine the master node and the endorsement nodes: if the master node's id is i, the endorsement nodes' ids are i+1, i+2, ..., i+m; each node then obtains a local gradient from its dataset and the current model, adds Gaussian noise to the local gradient so that it satisfies the differential privacy mechanism, and finally sends the local gradient to the master node and the endorsement nodes;
the specific calculation process is as follows: assume that in the t-th iteration, the batch of B training samples drawn by the k-th node is {(x_1, y_1), ..., (x_B, y_B)}, the global model weights are w_t, the clipping threshold is C, and the noise level is σ;
at the t-th iteration, the local gradient of the k-th working node for each sample i is
g_k^{(i)}(w_t) = ∇_{w_t} l(ŷ_i, y_i),
where ŷ_i = f(x_i; w_t) is the model prediction and l(·) is the loss function;
the per-sample gradients are then clipped and Gaussian noise is added, giving the local gradient g_k(w_t) of the k-th node:
g_k(w_t) = (1/B) ( Σ_{i=1}^{B} g_k^{(i)}(w_t) / max(1, ||g_k^{(i)}(w_t)||_2 / C) + N(0, σ²C²I) ).
5. The method of operating a secure, privacy-preserving and tradable distributed machine learning framework based on blockchain technology of claim 2, wherein step 4, the global model update stage, is specifically as follows: after the master node receives the local gradients of all nodes, it runs a Byzantine-fault-tolerant gradient aggregation algorithm to aggregate the local gradients into a global gradient and update the model, while using the moments accountant method to track privacy loss; the system then runs the IPBFT consensus algorithm: the master node writes the aggregate computation results into block_t, then sends block_t to the endorsement nodes for verification; if verification passes, block_t is broadcast to all blockchain nodes and the block is successfully added to the blockchain.
CN202010496847.9A (priority and filing date 2020-06-03) Safe, privacy-preserving and tradable distributed machine learning framework operation method based on blockchain technology. Status: Active. Granted publication: CN111915294B (en)


Publications (2)

Publication Number Publication Date
CN111915294A CN111915294A (en) 2020-11-10
CN111915294B (en) 2023-11-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant