CN113822758A - Self-adaptive distributed machine learning method based on block chain and privacy - Google Patents
- Publication number
- CN113822758A (application number CN202110889794.1A)
- Authority
- CN
- China
- Prior art keywords
- local
- node
- parameters
- global
- nodes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/64—Protecting data integrity, e.g. using checksums, certificates or signatures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses an adaptive distributed machine learning method based on blockchain and privacy, comprising the following steps: establishing a blockchain-based distributed machine learning system model with privacy protection, and completing the interaction process between nodes according to blockchain consensus. By analyzing in detail the computational complexity of the local nodes in the training process and the consensus process, and taking energy consumption into account, an optimization method for computing-resource allocation is proposed, and on this basis an adaptive aggregation method based on resource-allocation optimization is given. Simulation results show that the proposed method performs a privacy-preserving training process among nodes based on distributed consensus; on the one hand it optimizes the allocation of computing resources on the nodes under an energy-consumption constraint, and on the other hand it adaptively adjusts the global aggregation frequency, so that the utilization of the total system energy is improved and the convergence performance of the distributed learning process is further improved.
Description
Technical Field
The invention belongs to the field of aggregation frequency and resource allocation in distributed machine learning, and in particular relates to a computing-resource optimization method for distributed machine learning based on blockchain consensus and privacy protection, and further to an adaptive aggregation method based on computing-resource allocation optimization.
Background
Currently, people and internet devices are generating unprecedented amounts of data. Machine learning, an important component of artificial intelligence, is a data-analysis method that can learn from data, recognize patterns, and make decisions. To fully mine the value of the data, the most direct approach is to collect and store the data in a central server and process it centrally. However, data is typically generated by multiple parties and stored in a geographically distributed manner, which makes it difficult to gather large-scale geographically distributed data into a single data store. Distributed machine learning, which distributes the learning workload to the data owners, is therefore receiving increasing attention as an alternative to the centralized architecture.
Although distributed machine learning can learn without sharing data, the interaction and messaging between scattered local nodes (data sets) can compromise data security and privacy. In addition, each local update and global aggregation consumes computational resources of the network; the amount of resources consumed may vary over time, and there is a complex relationship between resource allocation, the frequency of global aggregation, and the convergence performance of the model.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an adaptive distributed machine learning method based on blockchain and privacy. While protecting privacy and guaranteeing security during distributed machine learning, the method gives an optimization strategy for allocating the computing resources of the distributed nodes in combination with an energy-consumption formulation, and continuously adjusts the global aggregation frequency under a fixed total system energy, thereby maximizing the utilization of the system energy and obtaining the best learning effect.
In order to solve the problems, the invention adopts the following technical scheme:
A blockchain- and privacy-based adaptive distributed machine learning method comprises the following steps:
Step 1, establishing a blockchain-based distributed machine learning system model with privacy protection: the calculator C and the participants P construct a distributed environment among nodes by means of a blockchain network; the local-update and global-aggregation processes of distributed machine learning are completed using linear regression and gradient descent; a partially homomorphic encryption technique is introduced to protect the model parameters during training; and a consensus process is introduced to verify the correctness of the model parameters. Finally, the distributed nodes interact by means of the consensus process, and only ciphertext parameters of the model can be received during the interaction.
Step 2, completing the distributed consensus process among nodes in combination with smart contracts: to guarantee the credibility of the learning process and confirm the correctness of the learning parameters, a distributed consensus process is formed between the calculator and the participants, comprising five transaction processes: ELW, ELP, EGW, EGP, and CGP. The training parameters are communicated in the form of transactions by means of smart contracts and recorded in blocks.
Step 3, performance analysis of the training process and the consensus process:
Step 3.1, training process
Step 3.2, consensus process
The blockchain-based distributed machine learning system model with privacy protection comprises two processes: the distributed machine learning process and the blockchain consensus process. Under this system model, N+1 nodes are considered, comprising N local nodes representing the participants and one computing node representing the calculator; the computing power of each node is denoted f_i (CPU cycles per second).
In addition, μ1 denotes the average number of CPU cycles required to complete one ciphertext operation, and μ2 the average number of CPU cycles required to complete one plaintext operation. Under PBFT consensus, at most f = ⌊(N−1)/3⌋ faulty nodes can be tolerated; generating or verifying a signature costs β CPU cycles, generating or verifying a MAC costs θ CPU cycles, and the computing task required to drive smart-contract verification costs α CPU cycles.
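The fault-tolerance bound above fixes how many faulty nodes PBFT can absorb. A minimal sketch (the function name and example node counts are illustrative) computing f = ⌊(N_total − 1)/3⌋ and the 2f + 1 quorum used in the later protocol steps:

```python
# Sketch: PBFT fault tolerance and quorum size for the N+1-node system
# (N local nodes + 1 computing node). Names are illustrative, not from the patent.
def pbft_limits(total_nodes: int):
    """Return (f, quorum): max faulty nodes tolerated and the 2f+1 quorum."""
    f = (total_nodes - 1) // 3      # at most floor((N_total - 1)/3) faults
    return f, 2 * f + 1

# e.g. N = 3 participants + 1 calculator = 4 nodes -> tolerates 1 fault
print(pbft_limits(4))   # -> (1, 3)
```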
According to the method, by establishing a blockchain-based distributed machine learning model with privacy protection, the computational complexity of each node in the different processes is analyzed in detail; resource-allocation optimization of the nodes is carried out in combination with an energy-consumption formula, and an energy formula is introduced to formulate the constraint condition of the optimization function, so that the final objective function of adaptive aggregation under energy-allocation optimization is given.
Simulation results show that, compared with the traditional algorithm (fixed aggregation interval τ and evenly distributed computing resources), the proposed algorithm achieves better performance.
Drawings
FIG. 1 is the system model;
FIG. 2 is a flow chart of the PBFT consensus protocol;
FIG. 3 shows the variation of the loss function value with the total system energy (N = 3, 4, and 5);
FIG. 4 shows the variation of the loss function value with the number of nodes (E = 0.5×10^5 and 1.5×10^5, τ = 10).
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
Fig. 1 shows the system model of the invention. The blockchain-based distributed machine learning process with privacy protection can be described as follows: local nodes and a computing node are deployed in a blockchain to form a secure distributed environment; the local nodes are responsible for the local-update tasks of the training process, the computing node is responsible for the global-aggregation tasks, and each local node completes a linear-regression learning process using a gradient-descent algorithm. To ensure the privacy of the model parameters, a homomorphic encryption technique is introduced into the training process, and its homomorphic property allows each node to complete the update of every parameter in the ciphertext state. In addition, to ensure the credibility of the training process, distributed consensus based on the blockchain network is introduced among the nodes during global aggregation, so that the ciphertext model parameters are transmitted and updated among the nodes in the form of transactions by means of smart contracts.
Given the input vector x_j and output y_j of the machine learning model, the best-fit equation of linear regression can be expressed as:

y_j = w_0 + w_1·x_{j,1} + w_2·x_{j,2} + ... = w^T x_j

Its corresponding loss function F(w) is the mean square error, and the goal is to solve for the optimal parameter w that minimizes F(w).
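The per-node learning task can be sketched as plain gradient descent on the mean-square-error loss; the data, step size, and iteration count below are illustrative, not from the patent:

```python
# Minimal sketch of the per-node task: linear regression y_j = w^T x_j
# fitted by gradient descent on the mean-square-error loss F(w).
def grad_step(w, X, y, eta):
    # gradient of MSE: (2/|D|) * sum_j (w^T x_j - y_j) * x_j, plain lists
    n, d = len(X), len(w)
    g = [0.0] * d
    for xj, yj in zip(X, y):
        err = sum(wk * xk for wk, xk in zip(w, xj)) - yj
        for k in range(d):
            g[k] += 2.0 * err * xj[k] / n
    return [wk - eta * gk for wk, gk in zip(w, g)]

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]   # leading 1.0 plays the role of w0
y = [3.0, 5.0, 7.0]                        # generated by w = (1, 2)
w = [0.0, 0.0]
for _ in range(2000):
    w = grad_step(w, X, y, eta=0.05)
print(w)   # approaches (1.0, 2.0)
```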
In the distributed network, the node set consists of the participants P and the calculator C, i.e., I = {P, C}, |I| = N+1. The set of N participants is denoted P = {P_1, P_2, ..., P_N}, where P_i (i = 1, 2, ..., N) denotes a participant owning a sub-dataset D_i; the total dataset is then denoted D = {D_1, D_2, ..., D_N}, and (x_ij, y_ij) ∈ D_i is the j-th sample in D_i.
In the system model provided by the invention, a local update occurs at every iteration t = 1, 2, ..., T, where T is the total number of iterations of the system model; global aggregation occurs only at iterations t = gτ, g = 1, 2, ..., G, where G is the total number of aggregations of the system model and T = Gτ.
In addition, the calculator C holds a key pair for protecting the model parameters, so it can encrypt and decrypt the model parameters at any time during operation, whereas the participants P only ever hold ciphertext model parameters. The homomorphic-encryption-based distributed machine learning process can be described as follows:
1. Issuing global parameters: after each global aggregation, C issues the ciphertext parameter Enc(w_g(t)). The participants see only the ciphertext and cannot learn w_g(t), which guarantees the privacy of the global model parameters. Let ŵ(t) denote the globally aggregated model parameters that would be obtained at iteration t; the interaction between local and global parameters can then be described as: w_i(t) takes the value ŵ(t) at every aggregation step t = gτ, and is otherwise produced by the local update.
2. Local parameter update: according to the homomorphic property of the encryption algorithm, a participant completes the local update in the ciphertext state. For the i-th participant, the local parameter update can be expressed as:

Enc(w_{i,k}(t)) = Enc(w_{i,k}(t−1)) ⊕ ((−η) ⊗ Enc(∇_k F_i(t−1)))

where w_{i,k}(t) is the k-th element of the local model parameter w_i(t), η is the learning rate, and ∇_k F_i(t−1) denotes P_i's local gradient at iteration round t−1, defined as the sum over the single data samples (x_ij, y_ij) of the per-sample gradient, with x_{ij,k} denoting the k-th element of the input vector; that is:

∇_k F_i(t−1) = Σ_j 2·( w_i(t−1)^T x_ij − y_ij )·x_{ij,k}
3. Global parameter update: P_i submits Enc(w_i(t)) after every τ local updates. After C obtains the local ciphertext parameters, it performs global aggregation on them in the ciphertext state according to the following weighted form and updates the global parameters:

Enc(w_g(t)) = ⊕_{i=1..N} ( (|D_i|/|D|) ⊗ Enc(w_i(t)) )
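The ciphertext-state aggregation relies on additive homomorphism: combining ciphertexts yields the encryption of the sum. A toy sketch with an exponential scheme (Enc(m) = g^m mod p; deterministic and brute-force-decrypted, so illustration only, not the partially homomorphic scheme the patent assumes, e.g. Paillier):

```python
# Toy additively homomorphic scheme (exponential-ElGamal style, illustration only):
# Enc(a) * Enc(b) mod p = Enc(a + b). Decryption brute-forces a discrete log,
# which is only feasible for small integers; no randomness, so not semantically secure.
p = 2**61 - 1          # a Mersenne prime modulus (toy parameter, an assumption)
g = 3

def enc(m): return pow(g, m, p)

def agg(cts):          # ciphertext-state aggregation: multiply = add plaintexts
    out = 1
    for c in cts:
        out = out * c % p
    return out

def dec(ct, bound=10000):   # brute-force small discrete log
    acc, m = 1, 0
    while m <= bound:
        if acc == ct:
            return m
        acc = acc * g % p
        m += 1
    raise ValueError("out of range")

local = [7, 11, 5]     # quantized local parameters from 3 participants
assert dec(agg(enc(v) for v in local)) == sum(local)
```

The calculator never sees individual plaintexts during `agg`; only the decrypted aggregate is recovered, mirroring the role of C in the scheme above.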
In the distributed machine learning model based on blockchain and privacy protection, the PBFT consensus protocol is used to form distributed consensus among the nodes, thereby guaranteeing the credibility of the learning process and confirming the correctness of the model parameters. Through smart contracts, the model parameters complete their update and interaction among nodes in the form of transactions and are authenticated on the chain. The workflow of the PBFT consensus protocol is shown in Fig. 2.
The consensus process provided by the invention includes five transaction types: the calculator issues the ciphertext of the global parameters (EGW); a participant feeds back the ciphertext of its local parameters (ELW); a participant submits the ciphertext of the local intermediate variables (ELP); the calculator computes, from the local and global parameters, the ciphertext of the global intermediate variables (EGP); and the calculator computes the plaintext of the optimized parameters from the decrypted ELP and EGP (CGP). During consensus, smart contracts drive the transactions, and blocks are verified before entering the chain. After receiving an EGW transaction, a participant immediately performs the local parameter-update computation on its local dataset in the ciphertext state, updates the local parameters before aggregation, and then submits its ELW and ELP transactions. The calculator performs ciphertext operations on the received ELW transactions to obtain the EGW transaction, further obtains the EGP transaction in the ciphertext state, and then decrypts and operates on the ELP and EGP transactions to obtain the CGP transaction.
During the transaction process, the calculator acts as the master node, packs the transactions into a block, and performs consensus verification; that is, each participant, acting as a secondary node, verifies the transaction process according to the public key, covering the signature and MAC of the transaction submitted by each node and the computational relationships among the transactions.
In the system model provided by the invention, the participant nodes and the calculator node jointly complete the local update and global aggregation of the training process using the blockchain network, and a consensus process is introduced into global aggregation to guarantee the correctness of the model parameters. The training and consensus processes involve five transaction types; the correspondence of the above contents is shown in the following table:
The performance of the training process and the consensus process is analyzed respectively, as follows:
step 3.1 training procedure
The training process consists of local updates and global aggregation and contains five transactions driven by smart contracts. The computational cost (measured in algorithmic complexity) and the computation time corresponding to the computational process of each transaction are as follows:
partial updating: local node Pi'And (I 'belongs to the I, I' is 1.., N) updating the local ciphertext parameters according to the global ciphertext parameters issued by the computing nodes, and transmitting the local ciphertext parameters in the block chain in an ELW transaction mode. In the local update step, the cost of computation is O (| w | (2| D)i'L +1)), then Pi'Is calculated cost ofAnd calculating the timeIs composed of
Wherein, in order 1,2 represents the computational power gained by the node during the training process,
global aggregation: first, a local node Pi'(I' e.i ═ 1.., N) using the updated local cipher text parameters to compute an intermediate parameter ELP transaction for obtaining local cipher text variables at the cost of O (| w | | D)i'I)); then, compute node C (C ═ I ∈ I, I ═ N +1) collects data from Pi'Updating the EGW transaction of the global model parameters in a ciphertext state, and calculating the cost to be O (N | w |); meanwhile, the computing node collects ELP transactions from the participants and updates the EGP transactions of the intermediate parameters for computing the global model variables in a ciphertext state, and the computation cost is O (sigma)i'(2|Di'|+|w||Di'|); finally, since homomorphic encryption cannot handle the problem of ciphertext multiplication, to obtain the parameters that will ultimately be used to optimize the modelRho and delta, the computing node collects ELP transactions from the participants, and simultaneously, in combination with the EGP transactions, the computing node decrypts the ciphertext parameters by using a private key and computes optimized parameters in a plaintext state, wherein the computation cost is O (N). In the global polymerization step, Pi'Is calculated cost ofAnd calculating the timeIs composed of
Step 3.2 consensus Process
For the consensus process introduced in the global aggregation, the PBFT consensus protocol comprises five steps:
① Request → Pre-Prepare: the calculator C, acting as the master node i' (i' = N+1), verifies the signatures and MACs of all transactions in the aggregation and packs them into a new block. The computational cost of this step is determined by the β cycles per signature and θ cycles per MAC; in general the computation time of step s is T^s_i = C^s_i / f^s_i, where f^s_i, s = 1, ..., 5, denotes the computing power allocated to the node in step s of the consensus process.

② Pre-Prepare → Prepare: each local node, acting as a verification node i ≠ i' (i = 1, ..., N), receives the new block with a Pre-Prepare message; it first verifies the signature and MAC of the block, then verifies the signature and MAC of each transaction, and finally verifies the results through the transaction computations specified in the smart contract (α cycles). Its cost C^2_i and time T^2_i follow in the same way.

③ Prepare → Commit: each node receives and checks the Prepare messages to ensure consistency with the Pre-Prepare message. Once 2f Prepare messages have been received from other nodes, the node sends a Commit message to all other nodes. Its cost C^3_i and time T^3_i follow in the same way.

④ Commit → Reply: each node receives and checks the Commit messages to ensure consistency with the Prepare messages. Once the node receives 2f Commit messages from other nodes, it sends a Reply message to the master node. Its cost C^4_i and time T^4_i follow in the same way.

⑤ Reply → block added to the chain: the master node receives and checks the Reply messages. Once the master node has received 2f Reply messages, the new block takes effect and is added to the blockchain. Its cost C^5_{i'} and time T^5_{i'} follow in the same way.
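The 2f counting rule in steps ③ and ④ can be sketched as a single predicate; the message payloads below are stand-ins, and only the counting rule comes from the protocol description:

```python
# Sketch of the 2f-quorum logic: a node moves from Prepare to Commit after
# 2f matching Prepare messages, and a block is accepted once the master
# has seen 2f Reply messages.
def quorum_reached(received: int, f: int) -> bool:
    return received >= 2 * f

f = 1                       # tolerated faults in a 4-node network
prepares = ["P2", "P3"]     # Prepare messages seen from other nodes
assert quorum_reached(len(prepares), f)          # -> send Commit
replies = ["P2"]
assert not quorum_reached(len(replies), f)       # master keeps waiting
```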
The method obtains the model parameters w(T, τ) by running T rounds of distributed-learning iterations with global aggregation every τ iterations. Introducing the ideal loss function F(w*), where w* denotes the ideal model parameters that could be obtained by training on all the data, the objective of minimizing the reachable loss function is equivalent to minimizing F(w(T, τ)) − F(w*).
the objective function can be initially defined in the form:
s.t.C1:Fe(f,T,τ)≤E
c1 limits total energy consumption; c2, C3 limit computing resources; c4 limits training time; c5 limits consensus time; e is the total energy of the system; t istimeIs the time limit provided. Constraints C4 and C5 will keep the training process and consensus process in sync.
For the energy consumption, the energy of the training process can be expressed as E^l_i = γ·C^l_i·(f^l_i)², and that of the consensus process as E^s_i = γ·C^s_i·(f^s_i)², where γ is a constant related to the hardware architecture, and δ_i ∈ {0, 1} indicates whether node i (i ∈ I) participates in each of the seven sub-processes (two training sub-processes and five consensus steps). In the present invention, δ_{i'} = [0, 1, 1, 0, 1, 1, 1] describes the participation of the computing node in the training and consensus processes, and δ_{i''≠i'} = [1, 1, 0, 1, 1, 1, 0] describes the participation of a local node. Thus the energy cost of the system is expressed as

F_e(f, T, τ) = T·Σ_i δ^1_i E^1_i + (T/τ)·Σ_i ( δ^2_i E^2_i + Σ_{s=1..5} δ^s_i E^s_i )
where f denotes the overall computing-resource allocation, E^local denotes the energy cost generated during the local updates, and E^global denotes the energy cost incurred in the global-aggregation process.
In addition, the parameters satisfy the following conditions:

1) ||F_i(w) − F_i(w')|| ≤ ρ||w − w'||

4) F(w(T, τ)) − F(w*) ≥ ε
the objective function is set to:
due to the denominatorSince T is always positive, the optimal value of T is satisfied when equation C1 takes an equal sign. Then will beSubstituting, the objective function can be rewritten as:
and finally, solving by using a convex optimization function algorithm.
The simulation parameter settings, simulation results, and analysis are given below.
MATLAB is used for the simulation, in which the system model is built.
The invention uses the Boston House Price Dataset for the experiments and results analysis of the proposed algorithm. Some parameters in the simulation are set as l = 2, s = 5, η = 1×10⁻⁶, μ1 = 0.1 M cycles, μ2 = 0.05 M cycles, α = 0.2 M cycles, β = 0.8 M cycles, θ = 0.005 M cycles, γ = 1×10⁻⁵, and T_time = 300 s.
Fig. 3 shows the variation of the loss function value with the total system energy when the number of local nodes N is 3, 4, and 5, respectively. The figure shows that the loss function value decreases as the total system energy increases, and that, at the same system energy, the loss function value is smaller for a smaller number of local nodes. In addition, the smaller the number of local nodes, the less total system energy is required for the loss function value to converge.
Fig. 4 shows the variation of the loss function value with the number of local nodes when the total system energy is E = 0.5×10⁵ and E = 1.5×10⁵. The figure shows that the loss function increases as the number of local nodes participating in the distributed machine learning process increases. Compared with the traditional algorithm (evenly distributed computing resources, τ = 10): on the one hand, the proposed method allocates the computing resources reasonably by analyzing the computational cost generated by each node during the transaction process, so that the total system energy is fully utilized; the smaller the total system energy (i.e., the scarcer the energy), the larger the gap between the loss function value under the traditional algorithm and that under the optimization algorithm, so the resource-allocation algorithm effectively improves performance over the traditional algorithm. On the other hand, the value of τ is continuously adjusted using the optimization parameters based on the resource-allocation algorithm, so that the loss function value obtained at the same node under the same system energy is smaller.
The above embodiments are only exemplary embodiments of the invention and are not intended to limit it; the scope of the invention is defined by the claims. Those skilled in the art may make various modifications and equivalents within the spirit and scope of the invention, and such modifications and equivalents should also be considered to fall within its scope.
Claims (5)
1. An adaptive distributed machine learning method based on blockchain and privacy, characterized by comprising the following steps:
step 1, establishing a blockchain-based distributed machine learning system model with privacy protection
The calculator C and the participants P construct a distributed environment among nodes by means of a blockchain network; the local-update and global-aggregation processes of distributed machine learning are completed using linear regression and gradient descent; a partially homomorphic encryption technique is introduced to protect the model parameters during training; a consensus process is introduced to verify the correctness of the model parameters; finally, the distributed nodes interact by means of the consensus process, and only ciphertext parameters of the model can be received during the interaction;
step 2, completing the distributed consensus process among nodes in combination with smart contracts
In order to ensure the credibility of the learning process and confirm the correctness of the learning parameters, a distributed consensus process is formed between the calculator and the participants, comprising five transaction processes: ELW, ELP, EGW, EGP, and CGP; the training parameters are transmitted in the form of transactions by means of smart contracts and recorded in blocks;
step 3, performance analysis of training process and consensus process
In the system model provided by the invention, the participant nodes and the calculator node jointly complete the local update and global aggregation of the training process using the blockchain network, and a consensus process is introduced into global aggregation to guarantee the correctness of the model parameters; the training and consensus processes involve five transaction types, and the correspondence of the above contents is shown in the following table:
respectively analyzing the performance of the training process and the consensus process, comprising the following steps:
step 3.1 training procedure
For the training process, the system consists of local update and global aggregation and comprises five transactions driven by smart contracts; the computation cost and computation time, measured by algorithmic complexity, correspond to the computation process of each transaction:
Local update: the local node P_i′ (i′ ∈ I, i′ = 1, ..., N) updates its local ciphertext parameters according to the global ciphertext parameters issued by the computing node, and transmits them on the blockchain as an ELW transaction; in the local update step the computation cost is O(|w|(2|D_i′| + 1)), so the computation cost C_i′^local and computation time T_i′^local of P_i′ are C_i′^local = μ1·|w|(2|D_i′| + 1) and T_i′^local = C_i′^local / f_i′, where f_i′ represents the computing power of the node during the training process;
Global aggregation: first, each local node P_i′ (i′ ∈ I, i′ = 1, ..., N) uses its updated local ciphertext parameters to compute the intermediate parameters of an ELP transaction for obtaining the local ciphertext variables, at a cost of O(|w||D_i′|); then the computing node C (i″ ∈ I, i″ = N + 1) collects the ELW transactions from the P_i′ and updates the global model parameters in the ciphertext state as an EGW transaction, at a cost of O(N|w|); meanwhile, the computing node collects the ELP transactions from the participants and updates the intermediate parameters used for computing the global model variables in the ciphertext state as an EGP transaction, at a cost of O(Σ_i′(2|D_i′| + |w||D_i′|)); finally, since homomorphic encryption cannot handle ciphertext multiplication, to obtain the parameters ρ and δ that are ultimately used to optimize the model, the computing node collects the ELP transactions from the participants and, combining them with the EGP transaction, decrypts the ciphertext parameters with its private key and computes the optimization parameters in the plaintext state, at a cost of O(N); in the global aggregation step, the computation cost C_i′^global and computation time T_i′^global of P_i′ are obtained analogously, by multiplying the algorithmic cost by μ1 (ciphertext operations) or μ2 (plaintext operations) and dividing by f_i′;
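The cost-and-time accounting above can be sketched numerically. The numeric values of μ1 and the node frequency are illustrative assumptions, and the relation time = cycles / computing power follows the definitions used in the claims.

```python
# Hypothetical cost/time accounting for one participant, assuming
# time = (algorithmic cost × avg cycles per ciphertext op mu1) / f_i.
# The numeric values of mu1 and f_i are illustrative assumptions.
def local_update_cost(w_len, d_i, mu1):
    return mu1 * w_len * (2 * d_i + 1)   # O(|w|(2|D_i| + 1)) ciphertext ops

def elp_cost(w_len, d_i, mu1):
    return mu1 * w_len * d_i             # O(|w||D_i|) ciphertext ops

def step_time(cycles, f_i):
    return cycles / f_i                  # seconds when f_i is in cycles/s

mu1 = 5_000                              # assumed cycles per ciphertext op
f_i = 2_000_000_000                      # assumed 2 GHz computing power
c_local = local_update_cost(w_len=10, d_i=100, mu1=mu1)
t_local = step_time(c_local, f_i)        # ELW update time for this node
```

With these stand-in numbers, a 10-element model over 100 local samples costs about 10 million cycles per ELW update, i.e. a few milliseconds at 2 GHz.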
Step 3.2 Consensus process
For the consensus process introduced in the global aggregation, the PBFT consensus protocol comprises five steps:
① Request → Pre-prepare: the calculator C acts as the master node i″ (i″ = N + 1), verifies the signatures and MACs of all transactions in the aggregation, and packs the transactions into a new block; the computation cost of this step is determined by the β and θ CPU cycles per signature and MAC operation, and the computation time is this cost divided by the node's computing power;
② Pre-prepare → Prepare: each local node, acting as a verification node i′ ≠ i″ (i′ = 1, ..., N), receives the new block with the Pre-prepare message; it first verifies the signature and MAC of the block, then the signature and MAC of each transaction, and finally verifies the results according to the transaction computation rules specified in the smart contract; the computation cost of this step additionally includes the α CPU cycles needed to drive the smart-contract verification, and the computation time is again the cost divided by the node's computing power;
③ Prepare → Commit: each node receives and checks the Prepare messages to ensure they are consistent with the Pre-prepare message; upon receiving 2f Prepare messages from other nodes, the node sends a Commit message to all other nodes; the cost and time of this step follow from generating and verifying the Prepare-phase signatures and MACs;
④ Commit → Reply: each node receives and checks the Commit messages to ensure they are consistent with the Prepare messages; once a node receives 2f Commit messages from other nodes, it transmits a Reply message to the master node; the cost and time of this step follow from the Commit-phase signature and MAC operations;
⑤ Reply → block added to the chain: the master node receives and checks the Reply messages; when the master node has received 2f Reply messages, the new block takes effect and is added to the blockchain; the cost and time of this step follow from verifying the 2f Reply signatures and MACs;
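The quorum logic of steps ③ to ⑤ can be sketched as follows. The 7-node network is hypothetical; `max_faulty` encodes the PBFT bound of at most ⌊(N − 1)/3⌋ faulty nodes used in the claims.

```python
# PBFT quorum sketch for steps ③-⑤: with at most f = (N-1)//3 faulty
# nodes, a node acts once it has 2f matching messages from other nodes.
# The 7-node network (N local nodes plus the computing node) is hypothetical.
def max_faulty(n_nodes):
    return (n_nodes - 1) // 3

def quorum_reached(msgs_from_others, n_nodes):
    return msgs_from_others >= 2 * max_faulty(n_nodes)

N_NODES = 7                          # e.g. 6 local nodes + 1 computing node
f = max_faulty(N_NODES)              # tolerates 2 faulty nodes here
assert quorum_reached(2 * f, N_NODES)         # 4 Prepare msgs -> send Commit
assert not quorum_reached(2 * f - 1, N_NODES) # 3 msgs are not yet a quorum
```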
Step 4, self-adaptive global aggregation based on resource allocation optimization
The model parameters obtained after T iteration rounds of distributed learning with a global aggregation every τ rounds are denoted w(T, τ); an ideal loss function F(w*) is introduced, where w* represents the ideal model parameters trainable on all data; the objective of minimizing the reachable loss function is then equivalent to minimizing F(w(T, τ)) − F(w*) subject to the constraints C1–C5, which is solved using a convex optimization algorithm.
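Under stated assumptions, the adaptive choice of the aggregation interval τ can be sketched as a constrained search. The surrogate loss-gap function, the timing constants, and the budget are illustrative stand-ins for the invention's actual objective and constraints C1–C5, not its real formulation.

```python
# Hypothetical search for the aggregation interval tau: minimise a surrogate
# loss gap F(w(T, tau)) - F(w*) under a wall-clock budget (stand-in for
# constraints C1-C5). The surrogate and timing constants are assumptions.
def surrogate_gap(tau, T=100, c1=1.0, c2=0.05):
    # assumed monotone surrogate: rarer aggregation widens the gap
    return c1 / T + c2 * (tau - 1)

def wall_clock(tau, T=100, t_local=0.01, t_agg=0.5):
    # T local rounds plus one aggregation (with consensus) every tau rounds
    return T * t_local + (T // tau) * t_agg

def best_tau(T=100, budget=20.0):
    feasible = [tau for tau in range(1, T + 1)
                if wall_clock(tau, T) <= budget]
    return min(feasible, key=lambda tau: surrogate_gap(tau, T))

tau_star = best_tau()   # smallest feasible tau wins under this surrogate
```

Because the surrogate gap grows with τ while the time budget forbids very frequent aggregation, the search settles on the smallest τ that still meets the budget.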
2. The adaptive distributed machine learning method based on blockchain and privacy of claim 1, characterized in that, in the distributed network, the node set consists of the participants P and the calculator C, i.e., I = {P, C}, |I| = N + 1; the set of N participants is denoted P = {P_1, P_2, ..., P_N}, where P_i (i = 1, 2, ..., N) represents a participant owning a sub-dataset and D_i represents the sub-dataset owned by participant P_i; the total dataset is then denoted D = {D_1, D_2, ..., D_N}; (x_ij, y_ij) ∈ D_i is the j-th data item in D_i; the system model comprises three steps, namely global parameter issuing, local gradient update and global parameter update, where a local update is performed in each iteration t = 1, 2, ..., T; a global aggregation occurs only when the iteration index t = Γτ, Γ = 1, 2, ..., G, where G is the total number of aggregations of the system model and T = Gτ.
3. The blockchain and privacy based adaptive distributed machine learning method of claim 2, wherein the homomorphic encryption based distributed machine learning process can be described as:
1) global parameter issuing: C issues the ciphertext of the global parameter w_g(t) after each global aggregation; a participant can only see the ciphertext and cannot learn w_g(t), which guarantees the privacy of the global model parameters; the interaction process between the local parameters and the possibly globally aggregated model parameters can then be described accordingly;
2) local parameter update: the participant completes the local update process in the ciphertext state according to the homomorphic property of the homomorphic encryption algorithm; for the i-th participant, the local parameter update can be expressed elementwise,
where w_i,k(t) is the k-th element of the local model parameters w_i(t), and ∇_i,k(t) represents the local gradient of P_i computed at iteration round t, defined as the sum over D_i of the gradients of the single data samples (x_ij, y_ij), with x_ij,k denoting the k-th element of the input vector; for the linear regression model this gives ∇_i,k(t) = Σ_j (w_i(t)·x_ij − y_ij)·x_ij,k;
3) global parameter update: P_i submits its encrypted local parameters after every τ local updates; after C obtains them, it performs the global aggregation of the local parameters in the ciphertext state and thereby updates the global parameters.
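A plaintext sketch of the claim-3 local update for the linear-regression setting follows. The step size η and the toy sub-dataset are hypothetical, and the real scheme performs this update in the ciphertext state rather than on plaintext values.

```python
# Plaintext sketch of the claim-3 local update for linear regression:
# grad_{i,k} = sum_j (w · x_ij - y_ij) * x_ij[k] over the sub-dataset D_i.
# The step size eta and the toy data are hypothetical; the real scheme
# performs this update in the ciphertext state.
def local_gradient(w, data):
    grad = [0.0] * len(w)
    for x, y in data:                                  # (x_ij, y_ij) in D_i
        err = sum(wk * xk for wk, xk in zip(w, x)) - y
        for k, xk in enumerate(x):
            grad[k] += err * xk
    return grad

def local_update(w, data, eta=0.1):
    return [wk - eta * gk for wk, gk in zip(w, local_gradient(w, data))]

D_i = [([1.0, 2.0], 5.0), ([2.0, 1.0], 4.0)]           # toy sub-dataset
w_next = local_update([0.0, 0.0], D_i)                 # one local iteration
```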
4. The blockchain and privacy based adaptive distributed machine learning method of claim 1, wherein the blockchain-based verifiable computing system model includes two processes: a distributed machine learning process and a blockchain consensus process; under this system model, N + 1 nodes are considered, comprising N local nodes representing the participants and one computing node representing the calculator; the computing power of each node is denoted f_i;
in addition, μ1 denotes the average CPU cycles required to complete a one-step ciphertext computation, and μ2 the average CPU cycles required to complete a one-step plaintext computation; under PBFT consensus, at most f = ⌊(N − 1)/3⌋ faulty nodes are tolerated; generating or verifying a signature and a MAC at each node requires β and θ CPU cycles respectively, and α CPU cycles are needed for the computation tasks required to drive the smart-contract verification.
5. The adaptive distributed machine learning method based on blockchain and privacy of claim 1, wherein, in the set objective function, C1 limits the total energy consumption; C2 and C3 limit the computing resources; C4 limits the training time; C5 limits the consensus time; E is the total energy of the system; T_time is the given time limit; constraints C4 and C5 keep the training process and the consensus process in synchronization;
regarding energy consumption, the energy of the training process can be expressed in the form E^train = γ·f_i²·C^train and that of the consensus process as E^con = γ·f_i²·C^con, where γ is a constant related to the hardware architecture and C denotes the CPU cycles consumed; δ_i^v ∈ {0, 1} expresses whether node i (i ∈ I) participates in process v; in the present invention, δ_i′ = [0, 1, 1, 0, 1, 1, 1] represents the participation of the computing node in the processes v, and δ_i″≠i′ = [1, 1, 0, 1, 1, 1, 0] represents the participation of the local nodes in the processes v; the energy cost of the system is then expressed as the sum of these per-process energies weighted by the participation indicators.
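A sketch of the energy accounting follows, assuming the common model E = γ·f²·C (hardware constant × frequency squared × CPU cycles) together with the participation vectors stated in the claim; the cycle counts, γ, frequency, and node count are illustrative assumptions.

```python
# Energy-cost sketch for claim 5, assuming E = gamma * f**2 * C per process,
# weighted by the participation vectors delta over the 7 processes
# (2 training steps + 5 consensus steps). All numeric values are illustrative.
def node_energy(gamma, f, cycles_per_process, delta):
    return sum(d * gamma * f ** 2 * c
               for d, c in zip(delta, cycles_per_process))

DELTA_COMPUTE = [0, 1, 1, 0, 1, 1, 1]    # computing node's participation
DELTA_LOCAL   = [1, 1, 0, 1, 1, 1, 0]    # each local node's participation
cycles = [10, 20, 30, 40, 50, 60, 70]    # assumed cycles of the 7 processes
gamma, f, n_local = 1e-27, 2e9, 4        # assumed chip constant, 2 GHz, 4 nodes

E_total = (node_energy(gamma, f, cycles, DELTA_COMPUTE)
           + n_local * node_energy(gamma, f, cycles, DELTA_LOCAL))
```

The total is the computing node's weighted energy plus the identical per-node energy of each local node, matching the claim's participation-indicator formulation.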
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110889794.1A CN113822758B (en) | 2021-08-04 | 2021-08-04 | Self-adaptive distributed machine learning method based on blockchain and privacy |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113822758A true CN113822758A (en) | 2021-12-21 |
CN113822758B CN113822758B (en) | 2023-10-13 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114915429A (en) * | 2022-07-19 | 2022-08-16 | 北京邮电大学 | Communication perception calculation integrated network distributed credible perception method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111800274A (en) * | 2020-07-03 | 2020-10-20 | 北京工业大学 | Verifiable calculation energy consumption optimization method based on block chain |
CN111915294A (en) * | 2020-06-03 | 2020-11-10 | 东南大学 | Safety, privacy protection and tradable distributed machine learning framework based on block chain technology |
CN113114496A (en) * | 2021-04-06 | 2021-07-13 | 北京工业大学 | Block chain expandability problem solution based on fragmentation technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||