CN111915294B - Safe, privacy-preserving and tradable distributed machine learning framework operation method based on blockchain technology - Google Patents
- Publication number
- CN111915294B CN111915294B CN202010496847.9A CN202010496847A CN111915294B CN 111915294 B CN111915294 B CN 111915294B CN 202010496847 A CN202010496847 A CN 202010496847A CN 111915294 B CN111915294 B CN 111915294B
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/38—Payment protocols; Details thereof
- G06Q20/389—Keeping log of transactions for guaranteeing non-repudiation of a transaction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/08—Network architectures or network communication protocols for network security for authentication of entities
- H04L63/0823—Network architectures or network communication protocols for network security for authentication of entities using certificates
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/20—Network architectures or network communication protocols for network security for managing network security; network security policies in general
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
Abstract
The invention discloses a safe, privacy-preserving and tradable distributed machine learning framework based on blockchain technology, which comprises the following parts: the certificate authority (CA) is responsible for issuing and revoking digital certificates for blockchain nodes and managing node permissions; the blockchain nodes are responsible for maintaining the machine learning model and participating in machine learning model transactions; the smart contracts prescribe the running rules of distributed machine learning and divide profits among nodes according to model contribution degree; the distributed ledger records model data and model transaction data produced during machine learning model training; and the data provider is responsible for collecting local data and uploading it to the blockchain node server.
Description
Technical Field
The invention relates to a safe, privacy-preserving and tradable distributed machine learning framework based on blockchain technology, and in particular to a framework that uses blockchain (consortium chain) technology to solve the Byzantine attack problem in distributed machine learning, uses differential privacy technology to protect the dataset privacy of each participant, and supports machine learning model transactions. It belongs to the fields of artificial intelligence, blockchain and information security.
Background
In the parameter server framework commonly used in distributed machine learning, multiple working nodes train with local data and the current global model to obtain local models and send them to a parameter server, which aggregates all local models and updates the global model. However, this process has security problems: both the working nodes and the parameter server node may be subject to Byzantine attacks. Specifically, a working node under Byzantine attack will send an erroneous local gradient to the parameter server, degrading the final trained model; a parameter server node under Byzantine attack will aggregate an erroneous global model, wasting all previous training effort. In recent years, because blockchains offer tamper resistance, traceability, distributed storage and public maintenance, researchers have tried to apply them in fields such as the Internet of Things, medical care and finance to address security, transaction and related problems.
To date, the Byzantine attack problem in distributed machine learning has been studied to some extent. However, the following problems remain: 1) existing distributed machine learning algorithms do not consider the case where the parameter server itself is subject to a Byzantine attack while aggregating models; 2) how detected Byzantine nodes should be handled to prevent them from interfering with model training; 3) how to implement an incentive mechanism in a blockchain system combined with distributed machine learning so that the system operates more efficiently. A new solution is therefore urgently needed to solve the above technical problems.
Disclosure of Invention
The invention aims to solve the above problems in distributed machine learning. It provides an algorithm to handle Byzantine attacks on both the working nodes and the parameter server node; since blockchain technology is introduced, the consensus problem in the blockchain must also be solved, and an effective incentive mechanism is provided so that the blockchain system operates effectively and durably.
In order to solve the above technical problems, the invention provides a safe, privacy-preserving and tradable distributed machine learning framework based on blockchain technology, comprising the following components: Part 1, a multi-level certificate authority (CA), responsible for issuing and revoking digital certificates for blockchain nodes and managing node permissions; Part 2, the blockchain nodes, composed of user nodes and transaction nodes, respectively responsible for maintaining the machine learning model and participating in machine learning model transactions; Part 3, the smart contracts, composed of a machine learning smart contract (MLSC) and a model contribution smart contract (MCSC), which respectively prescribe the running rules of distributed machine learning and divide profits among nodes according to model contribution degree; Part 4, the distributed ledger, which records model data (including local and global model status) and model transaction data produced during machine learning model training; Part 5, the data provider, responsible for collecting local data and uploading it to the blockchain node server. In this scheme, the certificate authority CA audits, supervises and manages the permissions of all nodes wishing to join the system, which avoids the admission of malicious nodes to a certain extent and guarantees system security. Both transaction nodes and user nodes that join later must pay an on-chain handling fee (the model transaction fee) when joining. After synchronizing the block information, a transaction node exits the system.
If a user node is identified as malicious, it can be expelled from the system: its previously paid joining fee is not refunded and it receives no subsequent model transaction fees, which punishes malicious nodes. The rules of the smart contracts are open to all user nodes, and the contract content is difficult for malicious nodes to tamper with. The distributed ledger records model data and model transaction data produced during machine learning model training, ensuring data traceability; any malicious behavior is recorded, which guarantees system security to a certain extent. If the nodes of the system do not need dataset privacy protection, Gaussian noise need not be added to the local gradients; likewise, there are many methods for protecting dataset privacy, and if a more suitable method exists, it can be substituted.
A method of operating a secure, privacy-preserving, tradable distributed machine learning framework based on blockchain technology, the operating method comprising the following steps:
Step 1, consortium chain initialization stage: the CA server issues digital certificates to the initial nodes of the consortium chain, all participants establish connections, and some initial consensus is reached;
Step 2, parameter initialization stage: all user nodes reach consensus on the neural network model and synchronize the system's test set data;
Step 3, local gradient calculation stage: the user nodes take turns as master node in ascending order of id, with designated remaining nodes serving as endorsement nodes; each node calculates a local gradient using its local data and the current model, adds Gaussian noise to the gradient so that it satisfies differential privacy, and finally sends the local gradient to the master node and the endorsement nodes;
Step 4, global model updating stage: the master node calculates the global gradient from the nodes' local gradients using a Byzantine-fault-tolerant gradient aggregation algorithm; the system then runs the IPBFT consensus algorithm, and if the global gradient reaches system consensus, the global model is updated and the relevant information of the global model is written into the block;
Step 5, training termination stage: when the trained model meets the expected requirements, the system stops training, and its subsequent function is to support model trading.
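As a rough illustration of how steps 3 and 4 fit together, the following Python sketch models one training round under strong simplifying assumptions: the per-node gradient functions, the aggregation rule and the consensus check are all placeholders supplied by the caller, not the invention's actual implementations.

```python
def training_round(grad_fns, aggregate, consensus_ok, model, lr):
    """One simplified round: every node computes a local gradient (step 3),
    the master aggregates them (step 4), and the model is updated only if
    the aggregate reaches consensus; otherwise the old model is kept."""
    local_grads = [f(model) for f in grad_fns]   # one gradient per user node
    g = aggregate(local_grads)                   # Byzantine-tolerant in the real system
    return model - lr * g if consensus_ok(g) else model
```

For example, with three identical nodes each reporting gradient 2w, mean aggregation and consensus always succeeding, one round at learning rate 0.1 moves w = 1.0 to 0.8; if consensus fails, the model is unchanged.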
As an improvement of the present invention, step 1, the consortium chain initialization stage, is specifically as follows:
the CA server issues digital certificates to the initial nodes of the consortium chain, all participants establish connections, and some initial consensus is reached: a. unify the standard for everyone's data sets; b. unify the standard for model transaction fees; c. unify the selection rules for the master node and the endorsement nodes.
As an improvement of the present invention, step 2, the parameter initialization stage, is specifically as follows: all user nodes reach consensus on the neural network model, including determining the network structure of the neural network model, the batch size B, the number of training iterations T, the learning rate η_t, and the initial weight w_0. When the neural network model and the data sets are prepared, all user nodes contribute to a test set and unify the system's test set data. The entire system can then begin neural network model training.
As an improvement of the present invention, step 3, the local gradient calculation stage, is specifically as follows:
First, all user nodes in the blockchain determine the master node and the endorsement nodes; if the id of the master node is i, the ids of the endorsement nodes are i+1, i+2, …, i+m. Each node then obtains a local gradient from its data set and the current model, applies differential privacy to the local gradient, and sends it to the master node and the endorsement nodes.
The specific calculation process is as follows. Assume that in the t-th iteration, the batch of B training samples drawn at the k-th node is {x_1, x_2, …, x_B}, the global model weight is w_t, the clipping threshold is C, and the noise scale is σ.
At the t-th iteration, the local gradient of each sample x_i at the k-th working node is

    g_t(x_i) = ∇_{w_t} l(w_t, x_i),

where l(·) is the loss function measuring the model prediction against the label.
The local gradients are then clipped and Gaussian noise is added, and the final local gradient g_k(w_t) of the k-th node is

    ḡ_t(x_i) = g_t(x_i) / max(1, ‖g_t(x_i)‖₂ / C),
    g_k(w_t) = (1/B) · ( Σ_{i=1}^{B} ḡ_t(x_i) + N(0, σ²C²·I) ).

Finally, each node sends its own local gradient to the master node and the endorsement nodes.
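The clipping-and-noising computation can be sketched in Python with NumPy. This is an illustrative sketch of per-sample clipping to norm C followed by Gaussian noise of scale σC; the function name and array-based interface are choices made here, not part of the patent.

```python
import numpy as np

def dp_local_gradient(per_sample_grads, C, sigma, rng=None):
    """Clip each per-sample gradient to L2 norm at most C, add Gaussian
    noise N(0, (sigma*C)^2 I) to the sum, and average over the batch."""
    rng = rng or np.random.default_rng()
    clipped = [g / max(1.0, np.linalg.norm(g) / C) for g in per_sample_grads]
    noise = rng.normal(0.0, sigma * C, size=per_sample_grads[0].shape)
    return (np.sum(clipped, axis=0) + noise) / len(per_sample_grads)
```

With σ = 0 the result is simply the average of the clipped gradients, which makes the clipping step easy to verify in isolation.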
As an improvement of the present invention, step 4, the global model updating stage, is specifically as follows: after the master node receives the local gradients of all nodes, it runs a Byzantine-fault-tolerant gradient aggregation algorithm to aggregate the local gradients into a global gradient and update the model, while using the moments accountant to track the privacy loss. The system then runs the IPBFT consensus algorithm: the master node first writes the aggregation result (including the master node id, the aggregated gradient, the differential privacy loss, the selected node ids and the local gradient information) into block_t, then sends block_t to the endorsement nodes for verification; if verification passes, block_t is broadcast to all blockchain nodes and the block is successfully added to the blockchain.
In step 4, the blockchain consensus algorithm IPBFT can effectively verify the gradient aggregation result and effectively identify malicious nodes. The algorithm is suited to consortium chains; compared with public chain consensus algorithms (such as PoW, PoS, PoET, etc.), it completes transaction confirmation in a shorter time and has lower communication complexity.
Compared with the prior art, the invention has the following advantages: 1) the distributed machine learning framework based on blockchain technology is highly practical and can be used with all distributed machine learning algorithms based on gradient descent; 2) the invention uses a CA to achieve effective permission management of blockchain nodes (including transaction nodes and user nodes): for transaction nodes, the CA can collect their machine learning model transaction fees and control the validity period of their permissions; for malicious nodes, the CA can revoke their user permissions, preventing them from damaging the machine learning model; 3) the IPBFT consensus algorithm proposed by the invention can effectively resist Byzantine attacks on the parameter server's aggregation process while identifying and rejecting malicious nodes, making the system increasingly secure; 4) the invention effectively implements an incentive mechanism on the blockchain; specifically, smart contracts deployed on the blockchain achieve a reasonable distribution of model transaction fees; 5) by adding differential privacy to the distributed machine learning, the invention can effectively protect the dataset privacy of the system participants.
Drawings
FIG. 1 is a block chain technology based distributed machine learning framework in accordance with the present invention;
FIG. 2 is a CA frame diagram of the present invention;
FIG. 3 is a flow chart of the operation of the present invention;
FIG. 4 is a schematic diagram of the consensus process of IPBFT under normal conditions;
Fig. 5 is a comparison of the test set accuracy of models obtained by different aggregation methods after 8 of the 20 blockchain nodes are subjected to a Byzantine attack during local gradient calculation (without differential privacy) in example 2 of the present invention.
Fig. 6 is a comparison of the test set accuracy of models obtained by different aggregation methods after 8 of the 20 blockchain nodes are subjected to a Byzantine attack during local gradient calculation (with differential privacy) in example 3 of the present invention.
FIG. 7 is a schematic diagram of the consensus process of IPBFT in an extremely malicious situation;
FIG. 8 is a schematic diagram showing the IPBFT algorithm identifying 20 malicious nodes and eliminating them from the system, while the malicious nodes in the system running the PoW algorithm remain present throughout;
fig. 9 shows that the multi-Krum algorithm has better aggregation effect than the media algorithm and is closer to the ideal condition after the node is subjected to the bayer attack (random gradient attack) without introducing differential privacy.
Fig. 10 shows that under the condition of introducing differential privacy, after the node is subjected to the bayer attack (random gradient attack), the media algorithm has better aggregation effect than the multi-Krum algorithm and is closer to the ideal condition.
Detailed Description
The following describes embodiments of the present invention in detail with reference to the drawings and examples, so that how the invention applies technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented. It should be noted that, as long as no conflict arises, the embodiments of the present invention and the features of each embodiment may be combined with one another, and the resulting technical solutions all fall within the protection scope of the present invention.
Example 1: FIG. 1 shows the secure, tradable distributed machine learning framework based on blockchain technology according to the present invention. The individual components of the framework are described in detail below with reference to Fig. 1.
A blockchain technology-based secure, privacy preserving, tradable, distributed machine learning framework, the framework comprising:
part 1: certificate authority CA;
the CA is responsible for issuing and revoking digital certificates for the blockchain nodes and performing authority management on the nodes. It needs to be trusted by all block link points, as well as supervised by all block link points. The structure of which is shown in figure 2. For security, our CA employs the root certificate chain implementation of the more common root and intermediate CAs. The root CA does not issue a certificate directly for the server, it generates two intermediate CAs (user CA and transactor CA) for itself, the intermediate CA acts as a proxy for the root CA to apply visas for the client, and the intermediate CA can reduce the management burden of the root CA.
Part 2: a blockchain node;
in the system framework of the present invention, there are two types of blockchain nodes: transaction nodes and user nodes.
A transaction node is a temporary node created when an external user wishes to obtain the trained model and joins the blockchain network. After obtaining the CA's permission to join the blockchain, the transaction node performs block synchronization once; after synchronization, its digital certificate is revoked and the node exits the network.
The user nodes are the main components of the blockchain network; they maintain and train the machine learning model and write data blocks into the distributed ledger of the blockchain. Each user node has functions such as local gradient calculation, global model aggregation, bookkeeping and block information verification.
Part 3: an intelligent contract;
in the system framework of the invention, there are two smart contracts, distributed as machine learning smart contracts (Machine Learning Smart Contract, MLSC) and model contribution smart contracts (Model Contribution Smart Contract, MCSC).
The MLSC specifies the running rules of distributed machine learning, including local gradient computation, global model computation, IPBFT consensus mechanisms, and so forth.
The MCSC calculates the model contribution degree of each node by examining the ledger information in the blockchain and divides the model transaction fee according to contribution degree; at the same time, the bookkeeping node that writes the transaction information into the blockchain receives a bookkeeping commission.
The contribution C_i of the i-th node is calculated as

    C_i = c_1 · l_i + c_2 · g_i,

where l_i is the number of times node i participates in the global gradient computation, g_i is the number of times node i contributes a local gradient, and c_1 and c_2 are the contribution coefficients of global gradient computation and local gradient computation, respectively.
Since the model transaction fee F equals the bookkeeping commission r plus the model contribution revenues R_i of all nodes, the model contribution revenue R_i of each node is calculated as

    R_i = (F − r) · C_i / Σ_{j=1}^{K} C_j,

where K is the total number of user nodes.
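The contribution and revenue-split calculations can be combined into a small Python sketch (the function names are hypothetical): each node's contribution is computed first, and the fee pool F − r is then split proportionally.

```python
def contribution(l_i, g_i, c1, c2):
    """Contribution C_i = c1*l_i + c2*g_i of one node."""
    return c1 * l_i + c2 * g_i

def revenue_shares(F, r, contribs):
    """Split the model transaction fee F, minus the bookkeeping
    commission r, proportionally to the nodes' contributions."""
    total = sum(contribs)
    return [(F - r) * c / total for c in contribs]
```

For example, with contributions [1, 1, 2], F = 10 and r = 2, the shares are [2, 2, 4], which sum to F − r = 8.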
Part 4: a distributed ledger;
the distributed ledger records model data (including local and global model cases) and model transaction data during machine learning model training. The method ensures the traceability of the data, and all the wrought data can be recorded, thereby ensuring the safety of the system to a certain extent.
Part 5: a data provider;
the data provider is responsible for collecting the data and uploading it to the local server.
Example 2: a method of operating a secure, privacy preserving, tradable distributed machine learning framework based on blockchain technology, the method of operating comprising the steps of:
fig. 3 is a flow chart illustrating the operation of the framework of the present invention, and each stage of the system operation is described in detail below with reference to fig. 3.
Step 1: a alliance chain initialization stage;
the CA server issues a digital certificate to the initial node of the alliance chain, all participants establish connection, and some initial consensus is achieved: a. unifying standards established by a data set of a whole family (such as pictures must be MNIST handwriting data set standards); b. unifying standards of model transaction fees; c. the selection rules of the main node and the endorsement node are unified (here, the main node is selected in a circulation mode from small to large according to the node id, m nodes with the node id behind the main node id select the endorsement node, if the nodes with the node id larger than the main node id are less than m nodes, the nodes are sequentially complemented from the beginning with the smallest id).
Step 2: a parameter initialization stage;
in the parameter initialization stage, all user nodes reach the consistency consensus of the neural network modelComprises determining the network structure of a neural network model, the batch size, the training iteration times T and the learning rate eta t Initial weight w 0 And the clipping threshold is C, and the noise size sigma and other parameters. At the same time, the block link issues the data set criteria to the data provider. The data provider collects the training set and uploads it to the blockchain node.
After the neural network model and the data set are prepared, all user nodes contribute to the test set and unify the test set data of the system. The entire system can then begin neural network model training.
Step 3: a local gradient calculation stage;
first, all user nodes in a block chain determine a main node and an endorsement node, if the id of the main node is i, the id of the endorsement node is i+1, i+2, …, i+m. And each node obtains a local gradient according to the data set and the current model, gaussian noise is added to the local gradient to enable the local gradient to meet a differential privacy mechanism, and finally the local gradient is sent to the main node and the endorsement node.
The specific calculation process is as follows: assume that in the t-th iteration, the B training data sets acquired in the kth node areThe global model weight is w t The clipping threshold is C, the noise level sigma.
At the t-th iteration, the local gradient of each sample of the kth working node is that
Wherein the model prediction result is thatl () is a loss function.
Then cutting the local gradient, adding Gaussian noise, and finally obtaining the local gradient g of the kth node k (w t ) Is that
Step 4: a global model updating stage;
after receiving the local gradients of each node, the master node runs a gradient aggregation algorithm (e.g., multi-Krum, l-nearest aggregation, etc.) with bayer fault tolerance to aggregate the local gradients to obtain global gradients and update the model, while employing moments accountant to track privacy loss. The system then runs the IPBFT consensus algorithm: the master node will first write the aggregate computation results (including master node id, aggregate gradient, differential privacy loss, selected node id and local gradient information) into the block t In (2) then block t Sending the block to an endorsement node for verification, and if the block passes the verification t Broadcast to all blockchain nodes and the block is successfully added to the blockchain.
IPBFT: the consensus process of the IPBFT algorithm, as shown in FIGS. 4, 5, 6 and 7, consists of 8 stages, and the distribution is request-1 (R-1), pre-preparation-1 (Pp-1), preparation-1 (P-2), commit-1 (C-1), request-2 (R-2), pre-preparation-2 (Pp-2), preparation-2 (P-2) and commit-2 (C-2). All user nodes are divided into a master node (L), an endorsement node (E), and a generic node (G). Normally, as shown in FIG. 4, the system can reach consensus by only executing 4 steps of R-1, pp-1, P-1 and C-1. While FIGS. 5 and 6 are in an abnormal situation, the system performs 4 steps R-2, pp-2, P-2 and C-2 more than in a normal situation. The moment when the system starts to operate IPBFT is defined as 0 moment, if the system is at t 1 When consensus is reached before the moment, a new main node is selected and the next consensus process is started; otherwise, the IPBFT will determine if the master node is a malicious node. If the system is at t 2 The moment still does not reach the consensus, the master node in the consensus process is considered as a malicious node and is removed from the system. FIG. 7 is an extremely exceptional case where the false aggregate result is consensus, but in our system malicious nodes are continuously culled, while in the federated chain, the likelihood of node aversion is low due to CA joining, becauseThis extremely malicious situation is a small probability event, which is almost impossible to occur. And even if the false aggregation result is introduced in the initial stage of training, the false aggregation result does not affect the final training model.
As shown in Fig. 4, in the normal case, the master node is honest and the number of honest endorsement nodes is not less than the required quorum. The consensus process of IPBFT is then as follows:
1) R-1: each user node sends its own local gradient to the master node and the endorsement nodes.
2) Pp-1: the master node computes block_t and sends it to the endorsement nodes for verification.
3) P-1: if block_t passes verification at endorsement node E_i, that endorsement node sends a valid endorsement credential Vote(block_t, E_i) to the master node.
4) C-1: in this case, the master node receives at least the threshold number of endorsement credentials and then generates a block certificate Cert(block_t). The master node then sends block_t and the block certificate Cert(block_t) to the other user nodes for block synchronization.
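The normal-case round above can be sketched as follows. This is a hypothetical illustration: the endorsement threshold (a strict majority of the endorsement nodes) and the simple-average aggregation are assumptions, not the patent's exact bounds.

```python
# Hypothetical sketch of the normal-case IPBFT round (R-1 .. C-1).
# The endorsement threshold (a strict majority of the endorsement nodes)
# and the simple-average aggregation are illustrative assumptions.

def run_normal_round(local_gradients, endorsers, verify):
    """Master aggregates the gradients into block_t, endorsers vote on it,
    and a block certificate is issued once enough valid votes arrive."""
    block = {"agg": sum(local_gradients) / len(local_gradients)}   # R-1 / Pp-1
    votes = [e for e in endorsers if verify(block, e)]             # P-1
    threshold = len(endorsers) // 2 + 1                            # assumed bound
    if len(votes) >= threshold:                                    # C-1
        return block, {"cert": votes}   # broadcast block_t + Cert(block_t)
    return None, None                   # no certificate: fall back to R-2 .. C-2

# Example: 3 honest endorsers all approve, so the round succeeds.
block, cert = run_normal_round([1.0, 2.0, 3.0], ["E1", "E2", "E3"],
                               verify=lambda b, e: True)
```

When the certificate cannot be formed (as in the abnormal cases below), the round falls through to the second phase instead of committing.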
As shown in FIG. 5, in this abnormal situation the master node is malicious and the number of honest endorsement nodes is not less than the endorsement threshold. The consensus process of IPBFT is then as follows:
1) R-1: each user node sends its own local gradient to the master node and the endorsement node.
2) Pp-1: the master node computes block_t and sends it to the endorsement nodes for verification.
3) P-1: because the false block_t will not pass verification by the honest endorsement nodes, they will not send approval credentials to the master node. Thus, the number of approval credentials received by the master node falls below the threshold, and the master node cannot generate the block certificate Cert(block_t).
4) R-2: in this abnormal situation, the system does not reach consensus on block_t before moment t_1, and all user nodes send their own local gradients to all the other user nodes.
5) Pp-2: the master node broadcasts block_t to all other user nodes for verification. But in this abnormal situation the number of approval credentials received by the master node remains below the required fraction of K (K being the number of user nodes), so the system does not reach consensus on block_t. Since the system still has not reached consensus by moment t_2, the master node is considered malicious and is removed from the system.
As shown in FIG. 6, in this abnormal case the master node is honest and the number of honest endorsement nodes is below the endorsement threshold. The consensus process of IPBFT is then as follows:
1) R-1: each user node sends its own local gradient to the master node and the endorsement node.
2) Pp-1: the master node computes block_t and sends it to the endorsement nodes for verification.
3) P-1: if block_t passes verification by endorsement node E_i, that endorsement node sends a valid endorsement credential Vote(block_t, E_i) to the master node. However, in this case the number of valid endorsement credentials is below the threshold, and the master node cannot generate the block certificate.
4) R-2: in this abnormal case, the system does not reach consensus on block_t before moment t_1, and all user nodes send their own local gradients to all the other user nodes.
5) Pp-2: the master node broadcasts block_t to all other user nodes for verification.
6) P-2: if block_t passes verification by user node P_i, that user node sends a valid approval credential Vote(block_t, P_i) to the master node.
7) C-2: in this case, the number of approval credentials received by the master node is not less than the required threshold, so it can generate the block certificate Cert(block_t). The master node then sends block_t and Cert(block_t) to the other user nodes for block synchronization.
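The two-deadline rule that governs the fallback path can be sketched as a small decision function. The classification below (normal path, fallback path, master removal) follows the description; the deadline values themselves are illustrative.

```python
# Sketch of the two-deadline rule: consensus before t_1 follows the normal
# path; between t_1 and t_2 the fallback phase (R-2 .. C-2) may still reach
# consensus; with no consensus by t_2 the master node is judged malicious
# and removed. The concrete deadline values are illustrative assumptions.

def ipbft_outcome(consensus_time, t1, t2):
    """consensus_time is None if consensus was never reached."""
    if consensus_time is not None and consensus_time <= t1:
        return "consensus"                   # normal case (FIG. 4)
    if consensus_time is not None and consensus_time <= t2:
        return "consensus-after-fallback"    # honest master, fallback (FIG. 6)
    return "master-removed"                  # malicious master (FIG. 5)
```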
As shown in FIG. 7, in this extremely malicious case the master node is malicious and the number of endorsement nodes that are malicious and collude with the master node is not less than the endorsement threshold. The consensus process of IPBFT is then as follows:
1) R-1: each user node sends its own local gradient to the master node and the endorsement node.
2) Pp-1: the malicious master node obtains a false aggregation result, writes it into block_t, and sends block_t to the endorsement nodes for verification.
3) P-1: in this case, block_t passes the verification of the endorsement nodes E_i that are malicious and collude with the master node, and these endorsement nodes send approval credentials Vote(block_t, E_i) to the master node.
4) C-1: in this case, the master node receives at least the threshold number of endorsement credentials and generates the block certificate Cert(block_t). The master node then sends block_t and Cert(block_t) to the other user nodes for block synchronization.
It can be seen that in the extremely abnormal case of FIG. 7, the master node is malicious and colludes with some endorsement nodes; this has a very small probability of occurring in our system, because our system progressively culls malicious nodes as training proceeds, and because of the CA the probability of a node turning malicious in the consortium chain is minimal.
Table 1 compares the performance of related consensus algorithms applied in the distributed machine learning framework presented in the present invention. It can be seen that the consensus algorithm IPBFT provided by the present invention can identify malicious nodes, while PBFT and PoW cannot. In addition, PBFT and PoW require all nodes to exchange local gradients with one another, so their communication complexity is O(K^2), where K is the number of user nodes. When running IPBFT, as malicious nodes are gradually removed during training, each user node only needs to send its local gradient to 1 master node and m endorsement nodes, so the communication complexity is O(mK) in the general case; only in the two malicious cases of FIGS. 5 and 6 is the communication complexity O(K^2). Thus, the communication complexity of IPBFT is better than that of PBFT and PoW.
Table 1 comparison of related consensus algorithms
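The complexity claim above can be illustrated by counting messages per round for a hypothetical network; the sizes K = 20 and m = 3 below are assumptions for the example, not figures from the patent.

```python
# Illustrative message counts behind the complexity comparison: with K user
# nodes, exchanging local gradients all-to-all (as under PBFT/PoW) costs
# O(K^2) messages per round, while sending only to 1 master node and m
# endorsement nodes costs O(mK). The network sizes below are hypothetical.

def all_to_all(K):
    return K * (K - 1)       # every node sends its gradient to every other node

def ipbft_normal(K, m):
    return K * (m + 1)       # every node sends to the master + m endorsers

K, m = 20, 3                 # e.g. 20 user nodes, 3 endorsement nodes
```

With these values the all-to-all scheme uses 380 messages per round versus 80 for the normal IPBFT case, and the gap widens quadratically as K grows.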
Step 5: a training termination stage;
when the training model meets the expected requirements (the model accuracy meets the requirements, or the privacy loss of the model is about to exceed the privacy budget), the system stops training. Subsequently, the main function of the blockchain is to maintain trading of the machine learning model; if new data is added or the model algorithm needs to be improved, the machine learning training flow can be restarted.
Example 2:
FIG. 8 compares the number of malicious nodes against the number of iterations when nodes among the 20 blockchain nodes launch a Byzantine attack during gradient aggregation, running the IPBFT algorithm and the PoW algorithm respectively.
FIG. 9 compares the test-set accuracy obtained with different aggregation methods after 8 of the 20 blockchain nodes are subjected to a Byzantine attack during local gradient calculation (without differential privacy) in the second embodiment of the present invention.
FIG. 10 compares the test-set accuracy obtained with different aggregation methods after 8 of the 20 blockchain nodes are subjected to a Byzantine attack during local gradient calculation (with differential privacy) in the second embodiment of the present invention.
As can be seen from FIG. 8, as the system runs, the IPBFT algorithm finds the malicious nodes among the 20 nodes and removes them from the system, whereas under the PoW algorithm the malicious nodes remain in the system throughout.
As can be seen from FIG. 9, without differential privacy the Multi-Krum algorithm aggregates better than the Median algorithm and comes closer to the ideal case after nodes are subjected to a Byzantine attack (random gradient attack).
As can be seen from FIG. 10, with differential privacy introduced, the Median algorithm aggregates better than the Multi-Krum algorithm and comes closer to the ideal case after nodes are subjected to a Byzantine attack (random gradient attack).
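The two aggregators compared in FIGS. 9 and 10 can be sketched as follows. This is a simplified Multi-Krum (each gradient is scored by its summed squared distances to its n-f-2 nearest neighbours, and the n-f best-scoring gradients are averaged); the parameter f, the assumed number of Byzantine nodes, is chosen by the caller, and the formulation is illustrative rather than the patent's exact one.

```python
import numpy as np

# Sketches of the two Byzantine-tolerant aggregators compared in FIGS. 9-10:
# the coordinate-wise Median, and a simplified Multi-Krum in which each
# gradient is scored by summed squared distance to its n-f-2 nearest
# neighbours and the n-f best-scoring gradients are averaged. The parameter
# f (assumed number of Byzantine nodes) is supplied by the caller.

def coordinate_median(grads):
    return np.median(np.stack(grads), axis=0)

def multi_krum(grads, f):
    g = np.stack(grads)
    n = len(g)
    d = ((g[:, None, :] - g[None, :, :]) ** 2).sum(-1)   # pairwise sq. distances
    scores = np.sort(d, axis=1)[:, 1:n - f - 1].sum(1)   # drop self-distance
    keep = np.argsort(scores)[: n - f]                   # lowest scores win
    return g[keep].mean(axis=0)
```

For instance, with three honest gradients near [1, 1] and one Byzantine gradient at [100, 100], Multi-Krum with f = 1 discards the outlier entirely, while the coordinate-wise median merely dampens it.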
The experimental results show that the framework provided by this method can effectively handle the problem of both the parameter server and the working nodes being subjected to Byzantine attacks in distributed machine learning; at the same time, the framework rewards contributing nodes and rejects malicious nodes, ensuring that the system runs well. In addition, the framework can also apply other Byzantine-tolerant aggregation algorithms to optimize the model's performance.
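The per-node differentially private gradient step used throughout the experiments (L2 clipping to threshold C, then Gaussian noise) can be sketched as follows. The noise placement follows the standard DP-SGD recipe and is an assumption rather than a quotation of the patent's formula.

```python
import numpy as np

# Sketch of the per-node differentially private gradient step: per-sample
# gradients are L2-clipped to the threshold C and Gaussian noise of scale
# sigma * C is added before averaging over the batch. Standard DP-SGD
# recipe; the exact noise placement is an assumption, not the patent's
# verbatim formula.

def dp_local_gradient(per_sample_grads, C, sigma, rng):
    clipped = [g / max(1.0, np.linalg.norm(g) / C) for g in per_sample_grads]
    noise = rng.normal(0.0, sigma * C, size=clipped[0].shape)
    return (np.sum(clipped, axis=0) + noise) / len(per_sample_grads)
```

With sigma = 0 the function reduces to plain averaging of clipped gradients, which makes the clipping behaviour easy to check in isolation.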
Although embodiments of the present invention are described above, they are only intended to facilitate understanding of the present invention and not to limit it. Any person skilled in the art may make modifications and variations in form and detail without departing from the spirit and scope of the present disclosure, but the scope of protection remains subject to the appended claims.
Claims (5)
1. An operation method of a secure, privacy-preserving and tradable distributed machine learning framework based on blockchain technology, implemented with a distributed machine learning system, the system comprising the following parts:
part 1, a certificate authority (CA), responsible for issuing and revoking digital certificates for blockchain nodes and managing node permissions;
part 2, the blockchain nodes, consisting of user nodes and transaction nodes, responsible respectively for maintaining the machine learning model and participating in machine learning model transactions;
part 3, the smart contracts, consisting of the machine learning smart contract (MLMC) and the model contribution smart contract (MCMC), which respectively specify the operating rules of the distributed machine learning and divide profits among nodes according to model contribution degree;
part 4, the distributed ledger, which records model data during the training of the machine learning model, including the local and global model states and model transaction data;
part 5, the data providers, responsible for collecting local data and uploading it to the blockchain node servers;
the operation method is characterized by comprising the following steps:
step 1, consortium chain initialization stage: the CA server issues digital certificates to the initial nodes of the consortium chain, all participants establish connections, and some initial consensus is reached;
step 2, parameter initialization stage: all user nodes reach consensus on the neural network model and synchronize the system's test set data;
step 3, local gradient calculation stage: all user nodes select the master node cyclically in order of increasing id, and the m nodes following the master node's id are the endorsement nodes; each node then calculates a local gradient using its local data and the current model, adds Gaussian noise to the gradient so that it satisfies a differential privacy mechanism, and finally sends the local gradient to the master node and the endorsement nodes;
step 4, global model update stage: the master node calculates the global gradient from the nodes' local gradients using a Byzantine-fault-tolerant gradient aggregation algorithm; the system then runs the IPBFT consensus algorithm, and if the global gradient reaches system consensus, the global model is updated and the relevant information of the global model is written into the block;
step 5, training termination stage: when the training model meets the expected requirements, the system stops training the model, and its subsequent function is to maintain model trading.
2. The operation method of the secure, privacy-preserving and tradable distributed machine learning framework based on blockchain technology according to claim 1, wherein step 1, the consortium chain initialization stage, is specifically as follows:
the CA server issues digital certificates to the initial nodes of the consortium chain, all participants establish connections, and some initial consensus is reached: a. unify the standard for all parties' data sets; b. unify the standard for the model transaction fee, where the fee increases with the degree of perfection of the model; c. unify the selection rules for the master node and the endorsement nodes: the master node is selected cyclically in order of increasing node id, and the m nodes following the master node's id are the endorsement nodes.
3. The operation method of the secure, privacy-preserving and tradable distributed machine learning framework based on blockchain technology according to claim 2, wherein step 2, the parameter initialization stage, is specifically as follows: all user nodes reach consensus on the neural network model, including determining the network structure of the neural network model, the batch size B, the number of training iterations T, the learning rate η_t, the initial weight w_0, the clipping threshold C and the noise scale σ; at the same time, the blockchain nodes transmit the data set standard to the data providers, and the data providers collect training sets and upload them to the blockchain nodes; after the neural network model and the data sets are prepared, all user nodes contribute to the test set and the system's test set data are unified, after which the whole system starts training the neural network model.
4. The operation method of the secure, privacy-preserving and tradable distributed machine learning framework based on blockchain technology according to claim 2, wherein step 3, the local gradient calculation stage, is specifically as follows:
first, all user nodes in the blockchain determine the master node and the endorsement nodes: if the id of the master node is i, the ids of the endorsement nodes are i+1, i+2, ..., i+m; each node then obtains a local gradient from its data set and the current model, adds Gaussian noise so that the local gradient satisfies a differential privacy mechanism, and finally sends the local gradient to the master node and the endorsement nodes;
the specific calculation process is as follows: assume that in the t-th iteration, the B training data sets acquired in the kth node areThe global model weight is w t The clipping threshold is C, and the noise size sigma;
at the t-th iteration, the local gradient of each sample of the kth working node is that
Wherein the model prediction result is thatl () is a loss function;
then cutting the local gradient, adding Gaussian noise, and finally obtaining the local gradient g of the kth node k (w t ) Is that
5. The operation method of the secure, privacy-preserving and tradable distributed machine learning framework based on blockchain technology according to claim 2, wherein step 4, the global model update stage, is specifically as follows: after the master node receives the local gradients of all nodes, it runs a Byzantine-fault-tolerant gradient aggregation algorithm to aggregate the local gradients into a global gradient and update the model, while the moments accountant method is used to track the privacy loss; the system then runs the IPBFT consensus algorithm: the master node writes the aggregation result into block_t, then sends block_t to the endorsement nodes for verification; if block_t passes verification, it is broadcast to all blockchain nodes and the block is successfully added to the blockchain.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010496847.9A CN111915294B (en) | 2020-06-03 | 2020-06-03 | Safe, privacy-preserving and tradable distributed machine learning framework operation method based on blockchain technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111915294A CN111915294A (en) | 2020-11-10 |
CN111915294B true CN111915294B (en) | 2023-11-28 |
Family
ID=73237547
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010496847.9A Active CN111915294B (en) | 2020-06-03 | 2020-06-03 | Safe, privacy-preserving and tradable distributed machine learning framework operation method based on blockchain technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111915294B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112819177B (en) * | 2021-01-26 | 2022-07-12 | 支付宝(杭州)信息技术有限公司 | Personalized privacy protection learning method, device and equipment |
CN113434873A (en) * | 2021-06-01 | 2021-09-24 | 内蒙古大学 | Federal learning privacy protection method based on homomorphic encryption |
CN113822758B (en) * | 2021-08-04 | 2023-10-13 | 北京工业大学 | Self-adaptive distributed machine learning method based on blockchain and privacy |
CN113806764B (en) * | 2021-08-04 | 2023-11-10 | 北京工业大学 | Distributed support vector machine based on blockchain and privacy protection and optimization method thereof |
CN114118438B (en) * | 2021-10-18 | 2023-07-21 | 华北电力大学 | Privacy protection machine learning training and reasoning method and system based on blockchain |
CN116094732A (en) * | 2023-01-30 | 2023-05-09 | 山东大学 | Block chain consensus protocol privacy protection method and system based on rights and interests proving |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107864198A (en) * | 2017-11-07 | 2018-03-30 | 济南浪潮高新科技投资发展有限公司 | A kind of block chain common recognition method based on deep learning training mission |
WO2019222993A1 (en) * | 2018-05-25 | 2019-11-28 | 北京大学深圳研究生院 | Blockchain consensus method based on trust relationship |
CN110599261A (en) * | 2019-09-21 | 2019-12-20 | 江西理工大学 | Electric automobile safety electric power transaction and excitation system based on energy source block chain |
CN110738375A (en) * | 2019-10-16 | 2020-01-31 | 国网湖北省电力有限公司电力科学研究院 | Active power distribution network power transaction main body optimization decision method based on alliance chain framework |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190236559A1 (en) * | 2018-01-31 | 2019-08-01 | Salesforce.Com, Inc. | Systems, methods, and apparatuses for implementing smart flow contracts using distributed ledger technologies in a cloud based computing environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||