CN113822758B - Self-adaptive distributed machine learning method based on blockchain and privacy - Google Patents
Self-adaptive distributed machine learning method based on blockchain and privacy Download PDFInfo
- Publication number
- CN113822758B (application CN202110889794.1A)
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/64—Protecting data integrity, e.g. using checksums, certificates or signatures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses an adaptive distributed machine learning method based on blockchain and privacy, comprising the following steps: establishing a blockchain-based distributed machine learning system model with privacy protection, and completing the interaction process between nodes according to blockchain consensus. By analyzing in detail the computational complexity of the local nodes in the training process and the consensus process, an optimization method for computing-resource allocation is derived under an energy-consumption constraint, and on this basis an adaptive aggregation method based on resource-allocation optimization is proposed. Simulation results show that the method carries out a privacy-preserving training process between nodes based on distributed consensus; on the one hand it optimizes the computing-resource allocation on the nodes under the energy-consumption constraint, and on the other hand it adaptively adjusts the global aggregation frequency, thereby improving the utilization of the total system energy and in turn the convergence performance of the distributed learning process.
Description
Technical Field
The invention belongs to the field of aggregation frequency and resource allocation in distributed machine learning, in particular to a computing-resource optimization method for distributed machine learning based on blockchain consensus and privacy protection, and further relates to an adaptive aggregation method based on computing-resource-allocation optimization.
Background
Currently, people and internet devices are producing data at an unprecedented scale. Machine learning, as a method of data analysis from which systems can learn, identify and make decisions, is an important component of artificial intelligence. To fully exploit the value of the data, the most straightforward approach is to collect and store the data in a central server and then process it centrally. However, data is typically generated by multiple parties and stored in a geographically distributed manner, making it difficult to collect large-scale geographically distributed data in a single data store. As a result, distributed machine learning, namely distributing the learning workload to the data owners, is receiving increasing attention as an alternative to the centralized architecture.
Although distributed machine learning can learn without sharing data, the interaction and messaging between decentralized local nodes (datasets) can still compromise the security and privacy of the data. In addition, each local update and global aggregation consumes computing resources of the network. The amount of resources consumed may vary over time, and there is a complex relationship between resource allocation, the frequency of global aggregation, and the convergence performance of the model.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an adaptive distributed machine learning method based on blockchain and privacy. While protecting privacy and ensuring security in the distributed machine learning process, the method combines an energy-consumption formula to give an optimization strategy for computing-resource allocation across the distributed nodes, and continuously adjusts the frequency of global aggregation under a fixed total system energy, thereby maximizing the utilization of system energy and obtaining the best learning effect.
In order to solve the problems, the invention adopts the following technical scheme:
an adaptive distributed machine learning method based on blockchain and privacy includes the steps of:
step 1, establishing a distributed machine learning system model with privacy protection based on block chain
The computing party C and the participants P construct a distributed environment among nodes by means of a blockchain network; the local-update and global-aggregation processes of distributed machine learning are completed by means of linear regression and gradient descent, and a partially homomorphic encryption technique is introduced to protect the model parameters during training. A consensus process is introduced to verify the correctness of the model parameters. Finally, the distributed nodes interact by means of the consensus process, and only ciphertext parameters of the model can be received during the interaction.
Step 2, combining intelligent contracts among nodes to complete distributed consensus process
In order to ensure the credibility of the learning process and confirm the correctness of the learning parameters, a distributed consensus process is formed between the computing party and the participants, comprising five transaction processes: ELW, ELP, EGW, EGP and CGP. The training parameters are transferred in the form of transactions by means of smart contracts and recorded in blocks.
Step 3, performance analysis of training process and consensus process
Step 3.1, training procedure
Step 3.2, consensus Process
Step 4, self-adaptive global aggregation based on resource allocation optimization
Two processes are involved in the blockchain- and privacy-protection-based distributed machine learning system model: a distributed machine learning process and a blockchain consensus process. Under this system model, we consider N+1 nodes: N local nodes, representing the participants, and one computing node, representing the computing party. The computing power of each node is denoted f_i (CPU cycles per second).
In addition, μ_1 represents the average CPU cycles required to complete a one-step ciphertext computation, and μ_2 represents the average CPU cycles required to complete a one-step plaintext computation. Under PBFT consensus, at most f = (N-1)/3 faulty nodes exist; generating or verifying a signature requires β CPU cycles, generating or verifying a MAC requires θ CPU cycles, and driving the computational task required for smart-contract verification requires α CPU cycles.
According to the invention, a blockchain-based distributed machine learning model with privacy protection is established; the computational complexity of each node in the different processes is analyzed in detail; resource-allocation optimization of the nodes is carried out in combination with an energy-consumption formula; and the constraint conditions of the optimization function are formulated by introducing the energy formula, thereby yielding the final objective function of adaptive aggregation under energy-allocation optimization.
Simulation results show that the proposed algorithm outperforms the conventional algorithm (fixed aggregation interval τ and equal allocation of computing resources).
Drawings
FIG. 1 is a system model;
FIG. 2 is a flow chart of a PBFT consensus protocol;
FIG. 3 shows the trend of the loss-function value with the total system energy (N = 3, 4, 5);
FIG. 4 shows the trend of the loss-function value with the number of nodes (E = 0.5×10^5 and 1.5×10^5, τ = 10).
Detailed Description
The invention is further described below with reference to the drawings and examples.
Step 1, establishing a distributed machine learning system model with privacy protection based on block chain
FIG. 1 illustrates the system model of the invention. The blockchain-based distributed machine learning process with privacy protection can be described as follows: the local nodes and the computing node are deployed in the blockchain to form a secure distributed environment; the local nodes are responsible for the local-update tasks of the training process, the computing node is responsible for the global-aggregation tasks, and each local node uses a gradient-descent algorithm to complete the linear-regression learning process. To ensure the privacy of the model parameters, homomorphic encryption is introduced into the training process, and each node completes its parameter updates in the ciphertext state by exploiting the homomorphic property. In addition, to ensure the credibility of the training process, a distributed consensus based on the blockchain network is introduced between nodes during global aggregation, so that ciphertext model parameters are transmitted and updated between nodes in the form of transactions by means of smart contracts.
For an input vector x_j and output y_j in the machine learning model, the best-fit equation of the linear regression can be expressed as:
y_j = w_0 + w_1·x_{j,1} + w_2·x_{j,2} + … = w^T·x_j
its corresponding loss function F (w) is the mean square error, with the aim of solving the optimal parameter w that minimizes F (w).
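As a plain illustration of this learning task (the data, step size and iteration count below are illustrative assumptions, not values from the patent), gradient descent on the mean-squared-error loss of a simple linear model can be sketched as:

```python
# Plain-Python sketch of the plaintext learning task: gradient descent on the
# mean-squared-error loss F(w) of the linear model y_hat = w0 + w1 * x.
# Data, step size and iteration count are illustrative assumptions.

def mse_loss(w, data):
    """F(w): mean squared error over the dataset."""
    return sum((w[0] + w[1] * x - y) ** 2 for x, y in data) / (2 * len(data))

def gradient_step(w, data, eta):
    """One gradient-descent step: w <- w - eta * grad F(w)."""
    m = len(data)
    g0 = sum(w[0] + w[1] * x - y for x, y in data) / m
    g1 = sum((w[0] + w[1] * x - y) * x for x, y in data) / m
    return [w[0] - eta * g0, w[1] - eta * g1]

# Synthetic data generated from y = 2 + 3x, so the optimal w is [2, 3].
data = [(x / 10, 2 + 3 * (x / 10)) for x in range(11)]
w = [0.0, 0.0]
for _ in range(20000):
    w = gradient_step(w, data, eta=0.3)
```

In the distributed setting described next, the same update is carried out per participant on its own sub-dataset.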
In the distributed network, the node set consists of the participants P and the computing party C, i.e., I = {P, C}, |I| = N+1. The set of N participants is denoted P = {P_1, P_2, …, P_N}, where P_i (i = 1, 2, …, N) represents a participant holding a sub-dataset and D_i denotes the sub-dataset owned by participant P_i; the total dataset is then D = {D_1, D_2, …, D_N}, and (x_{ij}, y_{ij}) ∈ D_i is the j-th data sample in D_i.
In the system model proposed by the invention, a local update occurs in each iteration t = 1, 2, …; a global aggregation occurs only when the iteration t = Γτ, Γ = 1, 2, ….
In addition, the computing party C holds the key pair used to protect the model parameters, so it can encrypt and decrypt the model parameters at any time during operation, whereas the participants P always hold only the ciphertext model parameters. The homomorphic-encryption-based distributed machine learning process can be described as follows:
1. Issuing global parameters: C issues the ciphertext parameter Enc(w_g(t)) after each global aggregation. The participants see only the ciphertext and cannot learn w_g(t), ensuring the privacy of the global model parameters. With w̃_i(t) denoting the possibly globally aggregated local model parameters, the interaction of local and global parameters can be described as: w_i(t) = w_g(t) when t = Γτ, and w_i(t) = w̃_i(t) otherwise.
2. Local parameter update: the participants complete the local update in the ciphertext state according to the homomorphic property of the encryption algorithm. For the i-th participant, the local parameter update can be expressed as:

w̃_{i,k}(t) = w_{i,k}(t-1) - η·∇_k F_i(w_i(t-1))

where w_{i,k}(t) denotes the k-th element of the local model parameter w_i(t), ∇F_i denotes the local gradient computed by P_i at iteration t, defined from the gradients of the single data samples (x_{ij}, y_{ij}) ∈ D_i, and x_{ij,k} denotes the k-th element of the input vector, so that:

∇_k F_i(w) = (1/|D_i|) · Σ_j (w^T·x_{ij} - y_{ij})·x_{ij,k}
3. Global parameter update: P_i submits its updated local ciphertext parameters Enc(w̃_i(Γτ)) after every τ local updates; C obtains the submissions, performs global aggregation over the local parameters in the ciphertext state according to the data-weighted average

Enc(w_g(Γτ)) = Σ_i (|D_i|/|D|) · Enc(w̃_i(Γτ))

and updates the global parameters.
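The three steps above rely only on additive homomorphism: ciphertexts of the local parameters can be combined into an encryption of their sum, which only the computing party C (the key holder) can decrypt. A from-scratch toy Paillier scheme can illustrate this; the class name, the tiny prime parameters and the integer-encoded values are illustrative assumptions and are far too small to be secure:

```python
import random
from math import gcd

class ToyPaillier:
    """Minimal additively homomorphic Paillier scheme (toy key size, insecure)."""
    def __init__(self, p=2003, q=2011):          # tiny illustrative primes
        self.n = p * q
        self.n2 = self.n * self.n
        self.lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
        self.mu = pow(self.lam, -1, self.n)      # decryption constant for g = n + 1

    def encrypt(self, m):
        r = random.randrange(2, self.n - 1)
        while gcd(r, self.n) != 1:
            r = random.randrange(2, self.n - 1)
        return (pow(self.n + 1, m % self.n, self.n2) * pow(r, self.n, self.n2)) % self.n2

    def decrypt(self, c):
        u = pow(c, self.lam, self.n2)
        return ((u - 1) // self.n * self.mu) % self.n

    def add(self, c1, c2):
        """Enc(m1) * Enc(m2) mod n^2 decrypts to m1 + m2: the additive property."""
        return (c1 * c2) % self.n2

he = ToyPaillier()
# Each participant submits an encrypted (integer-encoded) local parameter.
locals_ = [123, 456, 789]
cts = [he.encrypt(v) for v in locals_]
agg = cts[0]
for c in cts[1:]:
    agg = he.add(agg, c)          # aggregation entirely in the ciphertext state
total = he.decrypt(agg)           # only the computing party C holds the key
avg = total / len(locals_)
```

In practice the parameters would be fixed-point encoded and a production library with full-size keys would be used.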
Step 2, combining intelligent contracts among nodes to complete distributed consensus process
In the blockchain- and privacy-protection-based distributed machine learning model, the PBFT consensus protocol is used to form a distributed consensus among the nodes, ensuring the reliability of the learning process and confirming the correctness of the model parameters. The model parameters are updated and exchanged between nodes in the form of transactions through smart contracts, and are authenticated on-chain. The workflow of the PBFT consensus protocol is shown in FIG. 2.
The consensus process provided by the invention comprises five transaction types, namely: the computing party's issuance of the global-parameter ciphertext (EGW); a participant's feedback of its local-parameter ciphertext (ELW); a participant's submission of the ciphertext of the local intermediate parameters it computed (ELP); the computing party's ciphertext transaction computed from the local and global parameters (EGP); and the computing party's plaintext transaction (CGP), computed from the decrypted ELP and EGP to obtain the optimization parameters. During consensus, smart contracts are used to drive the transactions and to perform block verification and chaining. Immediately upon receiving an EGW transaction, a participant performs the local parameter-update computation on its local dataset in the ciphertext state before aggregation, and submits the ELW and ELP transactions upon completion; the computing party performs ciphertext operations on the received ELW transactions to obtain an EGW transaction, further obtains an EGP transaction in the ciphertext state, then decrypts the ELP and EGP transactions and, after computation, obtains the CGP transaction.
In the transaction process, the computing party, acting as the master node, packs the transactions into a block and performs consensus verification: each participant, acting as a slave node, verifies the transaction process according to the public keys, including the signatures, the MACs and the computational relations of the transactions submitted by each node.
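Purely to fix the data flow of step 2, the five transaction types and their order within one aggregation round can be sketched as follows (the senders and ordering are inferred from the description above; payload contents are elided):

```python
from dataclasses import dataclass
from enum import Enum

class TxType(Enum):
    """The five transaction types named in the description."""
    EGW = "computing party issues encrypted global parameters"
    ELW = "participant feeds back encrypted local parameters"
    ELP = "participant submits encrypted local intermediate parameters"
    EGP = "computing party's encrypted global intermediate parameters"
    CGP = "computing party's plaintext optimization parameters"

@dataclass
class Tx:
    kind: TxType
    sender: str

def round_transactions(n_participants):
    """Transactions generated in one global-aggregation round (a sketch)."""
    txs = [Tx(TxType.EGW, "C")]                   # C publishes the global ciphertext
    for i in range(1, n_participants + 1):        # each participant answers
        txs.append(Tx(TxType.ELW, f"P{i}"))
        txs.append(Tx(TxType.ELP, f"P{i}"))
    txs.append(Tx(TxType.EGP, "C"))               # ciphertext aggregation by C
    txs.append(Tx(TxType.CGP, "C"))               # plaintext optimization parameters
    return txs
```
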
Step 3, performance analysis of training process and consensus process
In the system model provided by the invention, the participant nodes and the computing node jointly complete the local updates and global aggregations of the training process using the blockchain network, and a consensus process is introduced into the global aggregation to ensure the correctness of the model parameters. The training process and the consensus process contain five transaction types, whose correspondence is shown in the following table:
the performance of the training process and the consensus process is respectively analyzed, and the method comprises the following steps:
step 3.1 training procedure
The training process consists of local updates and global aggregations and contains the five transactions driven by smart contracts. The computational cost, measured in algorithmic complexity, and the computation time correspond to the computational process of each transaction:
(1) Local update: the local node P_{i'} (i' ∈ I, i' = 1, …, N) updates the local ciphertext parameters according to the global ciphertext parameters issued by the computing node, and delivers them in the blockchain in the form of an ELW transaction. In the local-update step the computation cost is O(|w|·(2|D_{i'}| + 1)); the corresponding computation cost of P_{i'} follows by charging μ_1 CPU cycles per ciphertext operation, and the computation time is this cost divided by f_{i'}^l, where f_{i'}^l (l = 1, 2) represents the computing power obtained by the node during the training process.
(2) Global aggregation: first, the local node P_{i'} (i' ∈ I, i' = 1, …, N) computes the intermediate-parameter ELP transaction for obtaining the local ciphertext variables using the updated local ciphertext parameters, at cost O(|w|·|D_{i'}|). Then the computing node C (C = i'' ∈ I, i'' = N+1) gathers the ELW transactions from the P_{i'} and updates the global model parameters (EGW transaction) in the ciphertext state, at cost O(N·|w|). At the same time, the computing node collects the ELP transactions from the participants and updates the intermediate-parameter EGP transaction for computing the global model variables in the ciphertext state, at cost O(Σ_{i'}(2|D_{i'}| + |w|·|D_{i'}|)). Finally, since homomorphic encryption cannot handle ciphertext multiplication, in order to obtain the parameters ρ and δ that are ultimately used to optimize the model, the computing node collects the ELP transactions from the participants, decrypts the ciphertext parameters with the private key in combination with the EGP transactions, and computes the optimization parameters in the plaintext state, at cost O(N). In the global-aggregation step, the computation cost and computation time of P_{i'} and of C follow from these bounds in the same way, charging μ_1 cycles per ciphertext operation and μ_2 per plaintext operation and dividing by the computing power allocated to the respective node.
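The complexity bounds above can be turned into concrete cycle and time estimates. The helper functions below are a hedged sketch: they assume, as the simulation section later does, that each ciphertext operation costs μ_1 CPU cycles, and that computation time equals cycles divided by the computing power f allocated to the node:

```python
# Hedged sketch: per-transaction computation cost (CPU cycles) derived from the
# complexity bounds in the text. MU1 is the per-ciphertext-operation cycle count
# used later in the simulation setup (0.1 M cycles).
MU1 = 0.1e6

def local_update_cycles(w_dim, d_i, mu1=MU1):
    """Cost O(|w| * (2|D_i| + 1)) of one local ELW update, in CPU cycles."""
    return mu1 * w_dim * (2 * d_i + 1)

def elp_cycles(w_dim, d_i, mu1=MU1):
    """Cost O(|w| * |D_i|) of computing the ELP intermediate parameters."""
    return mu1 * w_dim * d_i

def egw_cycles(n_participants, w_dim, mu1=MU1):
    """Cost O(N * |w|) of aggregating the EGW global parameters."""
    return mu1 * n_participants * w_dim

def compute_time(cycles, f_hz):
    """Wall-clock time when the node devotes f CPU cycles per second."""
    return cycles / f_hz
```
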
Step 3.2 consensus procedure
For the consensus process introduced in the global aggregation, the PBFT consensus protocol comprises five steps:
(1) Request and Pre-prepare: the computing party C acts as the master node i' (i' = N+1), verifies the signatures and MACs of all transactions within the aggregation interval, and packs the transactions into a new block. The computation cost of this step follows from the per-signature (β), per-MAC (θ) and contract-verification (α) cycle counts, and the computation time is this cost divided by f^s, where f^s (s = 1, …, 5) represents the computing power of the node in the consensus process.
(2) Pre-prepare: each local node receives the new block with the Pre-prepare message as a verification node i'' ≠ i' (i'' = 1, …, N); it first verifies the signature and MAC of the block, then the signature and MAC of each transaction, and finally verifies the results according to the transaction computations in the smart contract. The computation cost and computation time of this step follow in the same way.
(3) Prepare and Commit: each node receives and checks the Prepare message to ensure that it is consistent with the Pre-prepare message. When 2f Prepare messages have been received from other nodes, the node sends a Commit message to all other nodes. The computation cost and computation time of this step follow analogously.
(4) Commit and Reply: each node receives and checks the Commit message to ensure that it is consistent with the Prepare message. Once a node has received 2f Commit messages from other nodes, it passes a Reply message to the master node. The computation cost and computation time of this step follow analogously.
(5) Reply and chaining: the master node receives and checks the Reply messages. When the master node has received 2f Reply messages, the new block takes effect and is added to the blockchain. The computation cost and computation time of this step follow analogously.
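The fault bound and the 2f quorum rule that drive steps (3)-(5) can be sketched as follows (message contents and cryptographic checks are elided; only the counting logic is shown):

```python
def max_faulty(n_nodes):
    """PBFT tolerates at most f = floor((N - 1) / 3) faulty replicas."""
    return (n_nodes - 1) // 3

class PBFTNode:
    """Quorum counting for the Prepare/Commit phases of one block."""
    def __init__(self, n_nodes):
        self.f = max_faulty(n_nodes)
        self.prepares = 0
        self.commits = 0

    def on_prepare(self):
        """Returns True once 2f Prepare messages justify broadcasting Commit."""
        self.prepares += 1
        return self.prepares >= 2 * self.f

    def on_commit(self):
        """Returns True once 2f Commit messages justify replying to the master."""
        self.commits += 1
        return self.commits >= 2 * self.f
```
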
Step 4, self-adaptive global aggregation based on resource allocation optimization
The invention obtains the model parameters w(T, τ) by performing a global aggregation every τ iterations during distributed learning over T iterations. Introducing the ideal loss function F(w*) (with w* denoting the ideal model parameters obtainable by training on the full data), the objective of minimizing the achievable loss is equivalent to minimizing the gap F(w(T, τ)) - F(w*).
the objective function may be initially defined as follows:
s.t.C1:F e (f,T,τ)≤E
c1 limits total energy consumption; c2, C3 limit computational resources; c4 limits training time; c5 limits consensus time; e is the total energy provided by the system; t (T) time To provide a time limit. Constraints C4 and C5 will keep the training process and consensus process synchronized.
For the energy consumption, the energy of the training process and of the consensus process is modeled through the computation cycles of each sub-process and the computing power allocated to it, where γ is a constant related to the hardware architecture, and the indicator vector δ_i ∈ {0, 1}^7 records whether node i (i ∈ I) participates in each sub-process. In the invention, δ_{i'} = [0, 1, 1, 0, 1, 1, 1] represents the participation of the computing node in the training and consensus processes, and δ_{i'' ≠ i'} = [1, 1, 0, 1, 1, 1, 0] represents the participation of a local node. The energy cost of the system is thus expressed as the sum, over all nodes and sub-processes, of the energy incurred by the local-update process and the energy incurred by the global-aggregation process, with f denoting the overall computing-resource allocation.
In addition, each parameter satisfies conditions 1)-7), among which:

1) ||F_i(w) - F_i(w')|| ≤ ρ·||w - w'||

4) F(w(T, τ)) - F(w*) ≥ ε
the objective function is set to:
due to denominatorSince the value is constant, the optimum value of T is established when equation C1 takes the equal sign. Will->Substituting, the objective function can be rewritten as:
and finally, solving by using a convex optimization function algorithm.
The settings of the simulation parameters, the simulation results and their analysis are given below:
the MATLAB is utilized for simulation, and a system model is established.
The present invention uses the Boston House Price Dataset to experiment on and analyze the results of the proposed algorithm. Some parameters in the simulation process are set to l = 2, s = 5, η = 1×10^-6, μ_1 = 0.1 M cycles, μ_2 = 0.05 M cycles, α = 0.2 M cycles, β = 0.8 M cycles, θ = 0.005 M cycles, γ = 1×10^-5, and T_time = 300 s.
FIG. 3 shows the trend of the loss-function value with the total system energy when the number of local nodes N is 3, 4 and 5, respectively. The figure shows that as the total system energy increases, the loss-function value decreases, and at the same system energy, the smaller the number of local nodes, the smaller the loss-function value. In addition, as the total system energy varies, the smaller the number of local nodes, the less total system energy is required for the loss-function value to reach convergence.
FIG. 4 shows the trend of the loss-function value with the number of local nodes when the total system energy is E = 0.5×10^5 and E = 1.5×10^5, respectively. The figure shows that the loss function grows as the number of local nodes participating in the distributed machine learning process increases. Compared with the conventional algorithm (equal allocation of computing resources, τ = 10), the invention on the one hand allocates the computing resources rationally by analyzing the computation cost generated by each node in the transaction processes, thereby making full use of the total system energy: when the total system energy is small (i.e., the system energy is insufficient), the gap between the loss-function value under the conventional algorithm and that under the optimization algorithm is larger, showing that the proposed resource-allocation algorithm effectively improves performance. On the other hand, using the optimization parameters, the resource-allocation-based algorithm continuously adjusts the value of τ, so that at the same system energy and the same number of nodes a smaller loss-function value is obtained.
The above embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, the scope of which is defined by the claims. Various modifications and equivalent arrangements of this invention will occur to those skilled in the art, and are intended to be within the spirit and scope of the invention.
Claims (5)
1. An adaptive distributed machine learning method based on blockchain and privacy, comprising the steps of:
step 1, establishing a distributed machine learning system model with privacy protection based on block chain
The computing party C and the participants P construct a distributed environment among nodes by means of a blockchain network; the local-update and global-aggregation processes of distributed machine learning are completed by means of linear regression and gradient descent, and a partially homomorphic encryption technique is introduced to protect the model parameters during training; a consensus process is introduced to verify the correctness of the model parameters; finally, the distributed nodes interact by means of the consensus process, and only ciphertext parameters of the model can be received during the interaction;
step 2, combining intelligent contracts among nodes to complete distributed consensus process
In order to ensure the credibility of the learning process and confirm the correctness of the learning parameters, a distributed consensus process is formed between the computing party and the participants, comprising five transaction processes: ELW, ELP, EGW, EGP and CGP; the training parameters are transmitted in the form of transactions by means of smart contracts and recorded in blocks;
step 3, performance analysis of training process and consensus process
In the system model, the participant nodes and the computing node jointly complete the local updates and global aggregations of the training process using the blockchain network, and a consensus process is introduced into the global aggregation to ensure the correctness of the model parameters; the training process and the consensus process contain five transaction types, whose correspondence is shown in the following table:
the performance of the training process and the consensus process is analyzed separately, as follows:
step 3.1 training procedure
The training process consists of local updates and global aggregation and contains five transactions driven by smart contracts; the computational cost, measured by algorithmic complexity, and the computation time correspond to the computational process of each transaction:
(1) Local update: each local node $P_{i'}$ ($i' \in I$, $i' = 1,\dots,N$) updates its local ciphertext parameters according to the global ciphertext parameters issued by the computing node and delivers them to the blockchain in the form of an ELW transaction; in the local update step the computational cost is $O(|w|(2|D_{i'}|+1))$, so the computational cost $c_{i'}^{\mathrm{lu}}$ and computation time $t_{i'}^{\mathrm{lu}}$ of $P_{i'}$ are
$$c_{i'}^{\mathrm{lu}} = O\bigl(|w|(2|D_{i'}|+1)\bigr), \qquad t_{i'}^{\mathrm{lu}} = \frac{\mu_1\, c_{i'}^{\mathrm{lu}}}{f_{i'}^{\mathrm{tr}}},$$
where $f_{i'}^{\mathrm{tr}}$ denotes the computing power allocated by the node to the training process, $0 < f_{i'}^{\mathrm{tr}} \le f_{i'}$;
(2) Global aggregation: first, each local node $P_{i'}$ ($i' \in I$, $i' = 1,\dots,N$) computes, from its updated local ciphertext parameters, the intermediate parameters of the local ciphertext variable (ELP transaction) at cost $O(|w||D_{i'}|)$; then the computing node C ($C = i'' \in I$, $i'' = N+1$) aggregates the ELW transactions from the $P_{i'}$ and updates the global model parameters in the ciphertext state (EGW transaction) at cost $O(N|w|)$; at the same time, the computing node collects the ELP transactions from the participants and updates, in the ciphertext state, the intermediate parameters used to compute the global model variable (EGP transaction) at cost $O(\sum_{i'}(2|D_{i'}| + |w||D_{i'}|))$; finally, since homomorphic encryption cannot handle ciphertext multiplication, in order to obtain the parameters ρ and δ that are ultimately used to optimize the model, the computing node collects the ELP transactions from the participants, decrypts the ciphertext parameters with its secret key in combination with the EGP transaction, and computes the optimization parameters in the plaintext state at cost $O(N)$; in the global aggregation step, the computational cost $c_{i'}^{\mathrm{ga}}$ and computation time $t_{i'}^{\mathrm{ga}}$ of $P_{i'}$ are
$$c_{i'}^{\mathrm{ga}} = O\bigl(|w||D_{i'}|\bigr), \qquad t_{i'}^{\mathrm{ga}} = \frac{\mu_1\, c_{i'}^{\mathrm{ga}}}{f_{i'}^{\mathrm{tr}}};$$
the computational cost $c_{C}^{\mathrm{ga}}$ and computation time $t_{C}^{\mathrm{ga}}$ of C are
$$c_{C}^{\mathrm{ga}} = O\Bigl(N|w| + \sum_{i'}\bigl(2|D_{i'}| + |w||D_{i'}|\bigr) + N\Bigr), \qquad t_{C}^{\mathrm{ga}} = \frac{\mu_1\, c_{C}^{\mathrm{ct}} + \mu_2\, c_{C}^{\mathrm{pt}}}{f_{i''}^{\mathrm{tr}}},$$
where $c_{C}^{\mathrm{ct}}$ and $c_{C}^{\mathrm{pt}}$ denote the ciphertext and plaintext portions of the cost, respectively;
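The cost/time accounting of step 3.1 can be sketched as follows, under the stated assumption that one ciphertext operation takes μ1 CPU cycles and a node runs at f cycles per second; the numeric values are illustrative only:

```python
def local_update_cost(w_len, d_size):
    """Ciphertext-operation count O(|w|(2|D_i'| + 1)) for one local update (ELW)."""
    return w_len * (2 * d_size + 1)

def step_time(cost, mu1, f):
    """Seconds for a ciphertext step: mu1 cycles per operation at f cycles per second."""
    return mu1 * cost / f

# Illustrative numbers: |w| = 10, |D_i'| = 100, mu1 = 1e4 cycles/op, f = 1e9 cycles/s
t_lu = step_time(local_update_cost(10, 100), 1e4, 1e9)  # 0.0201 s
```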
Step 3.2 consensus procedure
For the consensus process introduced in the global aggregation, the PBFT consensus protocol comprises five steps:
(1) Request and Pre-prepare: the computing party C acts as the master node $i''$, $i'' = N+1$; it verifies the signatures and MACs of all transactions to be aggregated and packages the transactions into a new block; in this step, the computational cost is $c_{i''}^{(1)}$ and the computation time is
$$t_{i''}^{(1)} = \frac{c_{i''}^{(1)}}{f_{i''}^{\mathrm{co}}},$$
where β denotes the number of CPU cycles a node requires to generate or verify a signature, θ the number of CPU cycles to generate or verify a MAC, α the number of CPU cycles required to perform the verification computation by driving the smart contract, and $f_{i}^{\mathrm{co}}$ the computing power of node i in the consensus process, $0 < f_{i}^{\mathrm{co}} \le f_{i}$;
(2) Pre-prepare: each local node acts as a verification node $i' \ne i''$, $i' = 1,\dots,N$, and receives the new block carrying the Pre-prepare message; it first verifies the signature and MAC of the block, then the signature and MAC of each transaction, and finally checks the results according to the transaction computation rules in the smart contract; in this step, the computational cost is $c_{i'}^{(2)}$ and the computation time is
$$t_{i'}^{(2)} = \frac{c_{i'}^{(2)}}{f_{i'}^{\mathrm{co}}};$$
(3) Prepare and Commit: each node receives and checks the Prepare messages to ensure that they are consistent with the Pre-prepare message; upon receiving 2f Prepare messages from other nodes, a node sends a Commit message to all other nodes; in this step, the computational cost is $c_{i}^{(3)}$ and the computation time is
$$t_{i}^{(3)} = \frac{c_{i}^{(3)}}{f_{i}^{\mathrm{co}}};$$
(4) Commit and Reply: each node receives and checks the Commit messages to ensure that they are consistent with the Prepare message; once a node has received 2f Commit messages from other nodes, it transmits a Reply message to the master node; in this step, the computational cost is $c_{i}^{(4)}$ and the computation time is
$$t_{i}^{(4)} = \frac{c_{i}^{(4)}}{f_{i}^{\mathrm{co}}};$$
(5) Reply and chain update: the master node receives and checks the Reply messages; once it has received 2f Reply messages, the new block takes effect and is added to the blockchain; in this step, the computational cost is $c_{i''}^{(5)}$ and the computation time is
$$t_{i''}^{(5)} = \frac{c_{i''}^{(5)}}{f_{i''}^{\mathrm{co}}};$$
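The quorum logic common to the Prepare, Commit and Reply steps above can be sketched as follows; `max_faulty` reflects the usual PBFT bound f = ⌊(N − 1)/3⌋, and `quorum` the 2f matching messages required to advance (the function names are illustrative):

```python
def max_faulty(n_nodes):
    """Maximum tolerated faulty nodes in PBFT: f = (N - 1) // 3."""
    return (n_nodes - 1) // 3

def quorum(n_nodes):
    """Matching Prepare/Commit/Reply messages needed from other nodes: 2f."""
    return 2 * max_faulty(n_nodes)
```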
Step 4, self-adaptive global aggregation based on resource allocation optimization
In distributed learning with T iterations and global aggregation every τ iterations, the model parameters obtained are $w(T,\tau)$; with $F(w^{*})$ denoting the ideal loss function, where $w^{*}$ represents the ideal model parameters obtainable by training on the full data, the goal of minimizing the achievable loss is equivalent to
$$\min_{\tau,\,f}\; F\bigl(w(T,\tau)\bigr) - F(w^{*}) \quad \text{s.t. } C1\text{--}C5,$$
which is solved with a convex-optimization algorithm.
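Since the claim leaves the convex-optimization solver unspecified, the following hypothetical sketch shows the shape of the resource-constrained choice of τ: a brute-force search over aggregation intervals that fit a time budget, with a caller-supplied loss-gap estimate standing in for $F(w(T,\tau)) - F(w^{*})$ (all names and the budget model are assumptions):

```python
def choose_tau(T, time_budget, t_local, t_agg, loss_gap):
    """Grid search: among aggregation intervals tau whose total time
    T*t_local + (T//tau)*t_agg fits the budget, return the tau with the
    smallest (caller-supplied) loss-gap estimate; None if none fits."""
    best = None
    for tau in range(1, T + 1):
        total_time = T * t_local + (T // tau) * t_agg
        if total_time <= time_budget:
            gap = loss_gap(tau)
            if best is None or gap < best[1]:
                best = (tau, gap)
    return None if best is None else best[0]
```

With a loss gap that grows with τ, the search returns the smallest τ that still fits the budget.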
2. The adaptive distributed machine learning method based on blockchain and privacy according to claim 1, wherein in the distributed network the node set consists of the participants P and the computing party C, $I = \{P, C\}$, $|I| = N+1$; the set of N participants is denoted $P = \{P_1, P_2, \dots, P_N\}$, $P_i$ ($i = 1, 2, \dots, N$) denotes the participant owning the sub-dataset $D_i$, and the total dataset is $D = \{D_1, D_2, \dots, D_N\}$; $(x_{ij}, y_{ij}) \in D_i$ is the j-th data sample in $D_i$; the system model comprises three steps: global parameter issuing, local gradient updating and global parameter updating, where local updates occur in every iteration, i.e. $t = 1, 2, \dots, T$, and global aggregation occurs only when the iteration index satisfies $t = \Gamma\tau$, $\Gamma = 1, 2, \dots$.
3. The blockchain and privacy-based adaptive distributed machine learning method of claim 2, wherein the homomorphic encryption-based distributed machine learning process can be described as:
1) Global parameter issuing: after each global aggregation, C issues the ciphertext parameters $[\![w_g(t)]\!]$; the participants see only the ciphertext and cannot learn $w_g(t)$, which ensures the privacy of the global model parameters; with $\hat w(t)$ denoting the model parameters of a possible global aggregation, the interaction between local and global parameters is described as
$$w_i(t) = \begin{cases}\hat w(t), & (t \bmod \tau) = 0,\\[2pt] \tilde w_i(t), & \text{otherwise},\end{cases}$$
where $\tilde w_i(t)$ denotes the result of the local update at iteration t;
2) Local parameter updating: according to the homomorphic property of the homomorphic encryption algorithm, the participants complete the local update process in the ciphertext state; for the i-th participant, the local parameter update is expressed as
$$w_{i,k}(t) = w_{i,k}(t-1) - \eta\, g_{i,k}(t-1),$$
where $w_{i,k}(t)$ denotes the k-th element of the local model parameters $w_i(t)$, η is the learning rate, and $g_{i,k}(t)$ denotes the k-th element of the local gradient computed by $P_i$ at iteration t, defined from the gradient of a single data sample $(x_{ij}, y_{ij})$; with $x_{ij,k}$ denoting the k-th element of the input vector,
$$g_{i,k}(t) = \frac{1}{|D_i|}\sum_{j=1}^{|D_i|}\bigl(w_i(t)^{\mathsf T} x_{ij} - y_{ij}\bigr)\, x_{ij,k};$$
3) Global parameter updating: $P_i$ submits its local ciphertext parameters $[\![w_i(t)]\!]$ after every τ local updates; after obtaining the $[\![w_i(t)]\!]$, C aggregates the local parameters globally in the ciphertext state and updates the global parameters according to
$$[\![\hat w_k(t)]\!] = \sum_{i=1}^{N} \frac{|D_i|}{|D|}\,[\![w_{i,k}(t)]\!],$$
where the summation over ciphertexts is realized by the additive homomorphism of the encryption scheme.
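A toy illustration of ciphertext-state aggregation with an additively homomorphic scheme (a deliberately insecure, small-prime Paillier sketch; the claim does not fix the concrete cryptosystem): the product of ciphertexts decrypts to the sum of plaintexts, so C can aggregate without seeing the individual $w_i(t)$:

```python
import math, random

def keygen(p=2357, q=2551):
    """Toy Paillier keypair with fixed small primes (illustration only, NOT secure)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return n, lam, mu

def encrypt(n, m):
    """Encrypt m < n as (n+1)^m * r^n mod n^2."""
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    n2 = n * n
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(n, lam, mu, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    return (pow(c, lam, n * n) - 1) // n * mu % n

def aggregate(n, ciphertexts):
    """Additive homomorphism: multiplying ciphertexts adds the plaintexts."""
    out = 1
    for c in ciphertexts:
        out = out * c % (n * n)
    return out
```

Scaling by a plaintext weight such as $|D_i|$ corresponds to `pow(c, weight, n*n)`, which is what lets C form a weighted aggregate entirely in the ciphertext state.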
4. The blockchain and privacy-based adaptive distributed machine learning method of claim 1, wherein the blockchain-based verifiable computing system model comprises two processes: a distributed machine learning process and a blockchain consensus process; under this system model, N+1 nodes are considered, comprising N local nodes, which represent the participants, and one computing node, which represents the computing party; the computing power of each node is denoted by $f_i$;
in addition, $\mu_1$ denotes the average number of CPU cycles required to complete a one-step ciphertext computation and $\mu_2$ the average number of CPU cycles required to complete a one-step plaintext computation; under PBFT consensus there are at most $f = (N-1)/3$ faulty nodes; each node requires β CPU cycles to generate or verify a signature, θ CPU cycles to generate or verify a MAC, and α CPU cycles to perform the computational task required for verification by driving the smart contract.
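Under the β/θ/α cycle accounting just defined, a hypothetical per-block verification cost might be tallied as below; the per-operation cycle counts are invented placeholders, and the assumption that each transaction needs one signature check, one MAC check, and one contract execution is illustrative, not from the claims:

```python
# Invented placeholder cycle counts for beta (signature), theta (MAC), alpha (contract run)
BETA, THETA, ALPHA = 2_000_000, 50_000, 1_000_000

def verify_block_cycles(num_tx):
    """Assumed accounting: per transaction, one signature check (BETA),
    one MAC check (THETA), and one re-execution via the smart contract (ALPHA)."""
    return num_tx * (BETA + THETA + ALPHA)

def verify_time(num_tx, f_consensus):
    """Seconds to verify a block at f_consensus CPU cycles per second."""
    return verify_block_cycles(num_tx) / f_consensus
```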
5. The blockchain and privacy-based adaptive distributed machine learning method of claim 1, wherein in the objective function that is set, constraint C1 limits the total energy consumption; C2 and C3 limit the computational resources; C4 limits the training time; C5 limits the consensus time; E is the total energy provided by the system and $T_{\mathrm{time}}$ the time limit provided; constraints C4 and C5 keep the training process and the consensus process synchronized;
for energy consumption, the training process can be expressed asRepresented in the consensus process asWhere γ is a constant related to the hardware architecture; />And is also provided withRepresenting whether node I (I e I) participates in each process; delta i' =[0,1,1,0,1,1,1]Representing participation of a computing node in a process v; delta i”≠i' =[1,1,0,1,1,1,0]Representing participation of the local node in the process v; the energy cost of the system is expressed as
where $f_i = f_i^{\mathrm{tr}} + f_i^{\mathrm{co}}$ represents the overall computing resources of node i, $E_i^{\mathrm{lu}}$ represents the energy cost generated in the local update process, and $E_i^{\mathrm{ga}}$ represents the energy cost generated in the global aggregation process.
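A minimal sketch of the system energy accounting in claim 5, assuming the common energy model E = γ·cycles·f² and the seven-process participation vectors given above (γ's value and the cycle counts are illustrative placeholders):

```python
GAMMA = 1e-28  # hardware-architecture constant (illustrative value)

# Assumed participation vectors over the seven processes:
# [local update, global aggregation, pre-prepare (master), pre-prepare (replica),
#  prepare, commit, reply/chain]
DELTA_COMPUTING = [0, 1, 1, 0, 1, 1, 1]
DELTA_LOCAL     = [1, 1, 0, 1, 1, 1, 0]

def process_energy(cycles, f):
    """Common CMOS-style model: energy = gamma * cycles * f^2."""
    return GAMMA * cycles * f ** 2

def node_energy(delta, cycles_per_process, f):
    """Sum the energy of only the processes a node actually participates in."""
    return sum(d * process_energy(c, f) for d, c in zip(delta, cycles_per_process))
```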
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110889794.1A CN113822758B (en) | 2021-08-04 | 2021-08-04 | Self-adaptive distributed machine learning method based on blockchain and privacy |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113822758A CN113822758A (en) | 2021-12-21 |
CN113822758B true CN113822758B (en) | 2023-10-13 |
Family
ID=78912826
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110889794.1A Active CN113822758B (en) | 2021-08-04 | 2021-08-04 | Self-adaptive distributed machine learning method based on blockchain and privacy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113822758B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114915429B (en) * | 2022-07-19 | 2022-10-11 | 北京邮电大学 | Communication perception calculation integrated network distributed credible perception method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111800274A (en) * | 2020-07-03 | 2020-10-20 | 北京工业大学 | Verifiable calculation energy consumption optimization method based on block chain |
CN111915294A (en) * | 2020-06-03 | 2020-11-10 | 东南大学 | Safety, privacy protection and tradable distributed machine learning framework based on block chain technology |
CN113114496A (en) * | 2021-04-06 | 2021-07-13 | 北京工业大学 | Block chain expandability problem solution based on fragmentation technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||