CN113822758A - Self-adaptive distributed machine learning method based on block chain and privacy - Google Patents

Self-adaptive distributed machine learning method based on block chain and privacy

Info

Publication number
CN113822758A
CN113822758A (application CN202110889794.1A)
Authority
CN
China
Prior art keywords
local
node
parameters
global
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110889794.1A
Other languages
Chinese (zh)
Other versions
CN113822758B (en)
Inventor
张延华
赵学慧
杨睿哲
李萌
司鹏搏
于非
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2021-08-04
Publication date: 2021-12-21
Application filed by Beijing University of Technology
Priority to CN202110889794.1A
Publication of CN113822758A
Application granted
Publication of CN113822758B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/602Providing cryptographic facilities or services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/64Protecting data integrity, e.g. using checksums, certificates or signatures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Business, Economics & Management (AREA)
  • Computer Security & Cryptography (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Finance (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Accounting & Taxation (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Development Economics (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an adaptive distributed machine learning method based on blockchain and privacy, which comprises the following steps: a blockchain-based distributed machine learning system model with privacy protection is established, and the interaction between nodes is completed according to blockchain consensus. By analyzing in detail the computational complexity of the local nodes in the training process and the consensus process, and taking energy consumption into account, an optimization method for computing-resource allocation is provided, on which an adaptive aggregation method based on resource-allocation optimization is built. Simulation results show that the proposed method carries out a privacy-protected training process among nodes based on distributed consensus; on the one hand it optimizes the allocation of computing resources on the nodes under an energy-consumption constraint, and on the other hand it adaptively adjusts the global aggregation frequency, so that the utilization of the total system energy is improved and the convergence performance of the distributed learning process is further improved.

Description

Self-adaptive distributed machine learning method based on block chain and privacy
Technical Field
The invention belongs to the field of aggregation frequency and resource allocation in distributed machine learning, and particularly relates to a computing-resource optimization method for distributed machine learning based on blockchain consensus and privacy protection, and further to an adaptive aggregation method based on computing-resource allocation optimization.
Background
Currently, people and internet devices are generating unprecedented amounts of data. Machine learning, a method of data analysis by which data can be learned from, recognized and used for decision-making, is an important component of artificial intelligence. To fully mine the value of the data, the most direct method is to collect and store the data on a central server and then process it centrally. However, data is typically generated by multiple parties and stored in a geographically distributed manner, making it difficult to collect large-scale geographically distributed data in a single data store. Therefore, distributed machine learning, which distributes the learning workload to the data owners, is receiving increasing attention as an alternative to the centralized structure.
Although distributed machine learning can learn without sharing data, interactions and messaging between scattered local nodes (data sets) can compromise data security and privacy. In addition, each local update and global aggregation consumes computational resources of the network. The amount of resources consumed may vary over time and there is a complex relationship between resource allocation, frequency of global aggregation and convergence performance of the model.
Disclosure of Invention
The invention aims to solve the technical problem of providing an adaptive distributed machine learning method based on blockchain and privacy. While protecting privacy and guaranteeing security in the distributed machine learning process, the method gives an optimization strategy for the allocation of computing resources across the distributed nodes by combining an energy-consumption formula, and continuously adjusts the frequency of global aggregation under a fixed system energy budget, so that the utilization of the system energy is improved to the maximum extent and the best learning effect is obtained.
In order to solve the problems, the invention adopts the following technical scheme:
a block chain and privacy based adaptive distributed machine learning method comprises the following steps:
step 1, establishing a distributed machine learning system model with privacy protection based on a block chain
The calculator C and the participants P construct a distributed environment among the nodes by means of a blockchain network; the local update and global aggregation processes of distributed machine learning are completed using linear regression and gradient descent, and a partially homomorphic encryption technique is introduced to protect the model parameters in the training process; a consensus process is introduced to verify the correctness of the model parameters. Finally, the distributed nodes interact by means of the consensus process, and only the ciphertext parameters of the model can be received during the interaction.
Step 2, combining intelligent contracts among nodes to complete distributed consensus process
In order to guarantee the credibility of the learning process and confirm the correctness of the learning parameters, a distributed consensus process is formed between the calculating party and the participating party, wherein the distributed consensus process comprises five transaction processes of ELW, ELP, EGW, EGP and CGP. The training parameters are communicated in the form of transactions by means of smart contracts and recorded in blocks.
Step 3, performance analysis of training process and consensus process
Step 3.1, training procedure
Step 3.2 consensus Process
Step 4, self-adaptive global aggregation based on resource allocation optimization
Two processes are included in the distributed machine learning system model based on blockchain and privacy protection: a distributed machine learning process and a blockchain consensus process. Under the system model, N + 1 nodes are considered, comprising N local nodes, which represent the participants, and one computing node, which represents the calculator; the computing power of each node is denoted by f_i (in CPU cycles per second).
In addition, μ1 denotes the average number of CPU cycles required to complete a one-step ciphertext computation, and μ2 the average number of CPU cycles required to complete a one-step plaintext computation. Under PBFT consensus, at most f = ⌊(N − 1)/3⌋ faulty nodes are tolerated; generating or verifying a signature or a MAC (message authentication code) costs β and θ CPU cycles respectively, and the computing task required to drive smart-contract verification costs α CPU cycles.
According to the method, the calculation complexity of each node in different processes is analyzed in detail by establishing a block chain-based distributed machine learning model with privacy protection, resource allocation optimization of the nodes is carried out by combining an energy consumption formula, and meanwhile, an energy formula is introduced to formulate a constraint condition of an optimization function, so that a final objective function of self-adaptive aggregation under energy allocation optimization is given.
Simulation results show that, compared with the traditional algorithm (fixed aggregation interval τ and evenly distributed computing resources), the proposed algorithm achieves better performance.
Drawings
FIG. 1 is a system model;
FIG. 2 is a flow chart of a PBFT consensus protocol;
FIG. 3 shows the variation of the loss function value with the total system energy (N = 3, 4 and 5).
FIG. 4 shows the variation of the loss function value with the number of nodes (E = 0.5×10⁵ and 1.5×10⁵, τ = 10).
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
Step 1, establishing a distributed machine learning system model with privacy protection based on a block chain
FIG. 1 shows the system model of the invention. The blockchain-based distributed machine learning process with privacy protection can be described as follows: local nodes and a computing node are deployed in a blockchain to form a secure distributed environment; the local nodes are responsible for the local update tasks in the training process, the computing node is responsible for the global aggregation tasks, and each local node completes a linear regression learning process using the gradient descent algorithm. To guarantee the privacy of the model parameters, homomorphic encryption is introduced into the training process, and its homomorphic property allows each node to complete the update of each parameter in the ciphertext state. In addition, to guarantee the credibility of the training process, distributed consensus over the blockchain network is introduced among the nodes at each global aggregation, so that the ciphertext model parameters are transmitted and updated among the nodes in the form of transactions by means of smart contracts.
On the input vector x_j and output y_j of the machine learning model, the best-fit equation of linear regression can be expressed as:

y_j = w_0 + w_1·x_{j,1} + w_2·x_{j,2} + ... = w^T x_j

Its corresponding loss function F(w) is the mean square error, and the goal is to solve for the optimal parameter w that minimizes F(w):

F(w) = (1 / 2|D|) · Σ_{(x_j, y_j) ∈ D} (w^T x_j − y_j)²
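As a concrete illustration, the following minimal sketch (not part of the patent; it assumes the 1/(2|D|)-scaled mean-square-error convention shown above, and the data are toys) shows the loss and the plaintext gradient step that each local node performs:

```python
import numpy as np

def loss(w, X, y):
    # F(w) = (1/(2|D|)) * sum_j (w^T x_j - y_j)^2
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)

def gradient(w, X, y):
    # dF/dw = (1/|D|) * sum_j (w^T x_j - y_j) * x_j
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 2))])  # column of 1s -> intercept w0
y = X @ np.array([1.0, 2.0, -0.5]) + 0.01 * rng.normal(size=100)
w = np.zeros(3)
eta = 0.1                       # learning rate
for _ in range(1000):           # plain gradient descent
    w = w - eta * gradient(w, X, y)
print(np.round(w, 2))           # approaches [ 1.  2. -0.5]
```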
In the distributed network, the node set consists of the participants P and the calculator C, i.e., I = {P, C} with |I| = N + 1. The set of N participants is denoted P = {P_1, P_2, ..., P_N}, where P_i (i = 1, 2, ..., N) represents a participant owning the sub-dataset D_i; the total dataset is then denoted D = {D_1, D_2, ..., D_N}, and (x_ij, y_ij) ∈ D_i is the j-th data sample in D_i.
In the system model provided by the invention, a local update occurs in every iteration t = 1, 2, ..., T, where T is the total number of iterations of the system model; a global aggregation occurs only at the iterations t = Γτ, Γ = 1, 2, ..., G, where G is the total number of aggregations of the system model, so that T = Gτ.
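This schedule can be sketched as follows (a toy skeleton with placeholder steps, not the patent's implementation):

```python
tau, T = 10, 100                      # aggregation interval and total iterations, T = G*tau
def local_update(t): pass             # placeholder: one ciphertext gradient step per node
def global_aggregate(t): pass         # placeholder: consensus + ciphertext aggregation
for t in range(1, T + 1):
    local_update(t)                   # every iteration t
    if t % tau == 0:                  # t = Γτ, Γ = 1, ..., G
        global_aggregate(t)
```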
In addition, the calculator C holds a key pair for protecting the model parameters, so it can encrypt and decrypt at any time during operation, while the participants P only ever hold ciphertext model parameters. Writing [[·]] for encryption under C's public key, the distributed machine learning process based on homomorphic encryption can be described as follows:

1. Global parameter publishing: after each global aggregation, C publishes the ciphertext parameter [[w_g(t)]]. The participants can only see the ciphertext and cannot learn w_g(t), which guarantees the privacy of the global model parameters. Letting w̃_i(t) represent the possibly globally aggregated model parameter, the interaction between the local and global parameters can be described as:

[[w̃_i(t)]] = [[w_i(t)]], if t mod τ ≠ 0;  [[w̃_i(t)]] = [[w_g(t)]], if t mod τ = 0.

2. Local parameter update: each participant completes the local update in the ciphertext state according to the homomorphic property of the homomorphic encryption algorithm. For the i-th participant, the update of the k-th element w_{i,k}(t) of the local model parameter w_i(t) can be expressed as:

[[w_{i,k}(t)]] = [[w̃_{i,k}(t − 1)]] ⊕ [[−η ∇_k F_i(t)]],

where ⊕ denotes homomorphic addition, η is the learning rate, and ∇_k F_i(t), the k-th component of P_i's local gradient at iteration round t, is defined as the average of the gradients of the single data samples (x_ij, y_ij) ∈ D_i, with x_{ij,k} the k-th element of the input vector:

∇_k F_i(t) = (1 / |D_i|) · Σ_{(x_ij, y_ij) ∈ D_i} (w̃_i(t − 1)^T x_ij − y_ij) · x_{ij,k}

3. Global parameter update: P_i submits [[w_i(t)]] after every τ local updates. After obtaining all [[w_i(t)]], C performs the global aggregation on the local parameters in the ciphertext state according to the following formula and updates the global parameters:

[[w_g(t)]] = Σ_{i=1}^{N} (|D_i| / |D|) ⊗ [[w_i(t)]],

where ⊗ denotes homomorphic multiplication of a ciphertext by a plaintext scalar.
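The aggregation step can be illustrated with the python-paillier library (phe). This is a hedged sketch: the patent specifies only a partially homomorphic encryption technique without naming a scheme, so Paillier is an assumed instantiation, and all data values are toys:

```python
from phe import paillier  # pip install phe (python-paillier)

pub, priv = paillier.generate_paillier_keypair(n_length=1024)

w_local = {1: [0.5, 1.2], 2: [0.7, 0.8]}   # two participants' plaintext parameters
size = {1: 60, 2: 40}                      # |D_1|, |D_2|; |D| = 100
total = sum(size.values())

# Each participant P_i submits the ciphertext parameters [[w_i(t)]] (ELW payload).
enc = {i: [pub.encrypt(v) for v in w] for i, w in w_local.items()}

# C aggregates in the ciphertext state: [[w_g]] = sum_i (|D_i|/|D|) * [[w_i]].
# Paillier supports ciphertext + ciphertext addition and plaintext-scalar multiplication.
enc_wg = [sum((size[i] / total) * enc[i][k] for i in enc) for k in range(2)]

# Only C holds the private key and can recover the aggregated plaintext.
print([round(priv.decrypt(c), 4) for c in enc_wg])   # [0.58, 1.04]
```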
Step 2, combining intelligent contracts among nodes to complete distributed consensus process
In the distributed machine learning model based on blockchain and privacy protection, the PBFT consensus protocol is used to form distributed consensus among the nodes, which guarantees the credibility of the learning process and confirms the correctness of the model parameters. Through smart contracts, the model parameters complete their update and interaction among the nodes in the form of transactions and are authenticated on the chain. The workflow of the PBFT consensus protocol is shown in FIG. 2.
The consensus process provided by the invention includes five transaction types: EGW, in which the calculator publishes the ciphertext of the global parameters; ELW, in which a participant feeds back the ciphertext of its local parameters; ELP, in which a participant submits the ciphertext of the intermediate parameters of its local variables; EGP, which carries the ciphertext of the intermediate parameters of the global variables computed from the local and global parameters; and CGP, in which the calculator computes the plaintext of the optimization parameters from the decrypted ELP and EGP. In the consensus process, smart contracts drive the transactions, and the blocks are verified before entering the chain. After receiving an EGW transaction, a participant immediately performs the local parameter update on its local dataset in the ciphertext state, then submits its ELW and ELP transactions; the calculator performs ciphertext operations on each received ELW transaction to obtain the EGW transaction, further obtains the EGP transaction in the ciphertext state, and then decrypts and operates on the ELP and EGP transactions to obtain the CGP transaction.
During the transaction process, the calculator acts as the master node, packs the transactions into a block and initiates consensus verification; that is, each participant, serving as a replica node, verifies the transaction process according to the public key, including the signatures and MACs of the transactions submitted by each node and the operational relationships among their ciphertext parameters.
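The transaction vocabulary can be summarized in a short sketch (the field layout and the per-round transaction count are illustrative assumptions; only the type names and issuers come from the text above):

```python
from dataclasses import dataclass

@dataclass
class Tx:
    kind: str        # 'EGW', 'ELW', 'ELP', 'EGP' or 'CGP'
    sender: int      # node index (N + 1 is the calculator C)
    payload: bytes   # ciphertext (EGW/ELW/ELP/EGP) or plaintext (CGP)
    signature: bytes # verified at cost beta cycles per node
    mac: bytes       # verified at cost theta cycles per node

def round_transactions(N: int) -> list[Tx]:
    """One aggregation round (assumed): 1 EGW, then N ELW + N ELP, then EGP and CGP."""
    txs = [Tx('EGW', N + 1, b'', b'', b'')]
    for i in range(1, N + 1):
        txs += [Tx('ELW', i, b'', b'', b''), Tx('ELP', i, b'', b'', b'')]
    txs += [Tx('EGP', N + 1, b'', b'', b''), Tx('CGP', N + 1, b'', b'', b'')]
    return txs

print(len(round_transactions(3)))  # 2N + 3 = 9 transactions per round
```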
Step 3, performance analysis of training process and consensus process
In the system model provided by the invention, the participant nodes and the calculator node use the blockchain network to jointly complete the local updates and global aggregations of the training process, and a consensus process is introduced into the global aggregation to guarantee the correctness of the model parameters. The training process and the consensus process involve five transaction types; their correspondence is shown in the following table:

Transaction | Issuer          | Content
EGW         | calculator C    | ciphertext of the global model parameters
ELW         | participant P_i | ciphertext of the local model parameters
ELP         | participant P_i | ciphertext of the intermediate parameters of the local variables
EGP         | calculator C    | ciphertext of the intermediate parameters of the global variables
CGP         | calculator C    | plaintext of the optimization parameters
The performance of the training process and of the consensus process is analyzed separately, as follows:
step 3.1 training procedure
For the training process, which consists of local updates and global aggregations and contains the five transactions driven by smart contracts, the computation cost (measured by algorithm complexity) and the computation time corresponding to the computation process of each transaction are as follows.

Local update: the local node P_{i'} (i' ∈ I, i' = 1, ..., N) updates its local ciphertext parameters according to the global ciphertext parameters published by the computing node, and transmits the local ciphertext parameters in the blockchain in the form of an ELW transaction. In the local update step, the computation cost is O(|w|(2|D_{i'}| + 1)) ciphertext operations, so the computation cost C_{i'}^{lu} and computation time T_{i'}^{lu} of P_{i'} are

C_{i'}^{lu} = μ1 |w| (2|D_{i'}| + 1),  T_{i'}^{lu} = C_{i'}^{lu} / f_{i'}^{(1)},

where f_{i'}^{(v)}, v = 1, 2, represents the computing power allocated to the node in the training process (v = 1 for the local update, v = 2 for the global aggregation).

Global aggregation: first, the local node P_{i'} (i' ∈ I, i' = 1, ..., N) uses the updated local ciphertext parameters to compute the ELP transaction carrying the intermediate parameters of the local ciphertext variables, at cost O(|w||D_{i'}|). Then, the computing node C (C ∈ I, i = N + 1) collects the ELW transactions from the P_{i'} and updates the global model parameters in the ciphertext state as the EGW transaction, at cost O(N|w|). Meanwhile, the computing node collects the ELP transactions from the participants and updates the EGP transaction of the intermediate parameters for computing the global model variables in the ciphertext state, at cost O(Σ_{i'}(2|D_{i'}| + |w||D_{i'}|)). Finally, since homomorphic encryption cannot handle ciphertext multiplication, to obtain the parameters ρ and δ that will ultimately be used to optimize the model, the computing node collects the ELP transactions from the participants and, combining them with the EGP transaction, decrypts the ciphertext parameters with its private key and computes the optimization parameters in the plaintext state, at cost O(N). In the global aggregation step, the computation cost C_{i'}^{ga} and computation time T_{i'}^{ga} of P_{i'} are

C_{i'}^{ga} = μ1 |w| |D_{i'}|,  T_{i'}^{ga} = C_{i'}^{ga} / f_{i'}^{(2)},

and the computation cost C_C^{ga} and computation time T_C^{ga} of C are

C_C^{ga} = μ1 (N|w| + Σ_{i'}(2|D_{i'}| + |w| |D_{i'}|)) + μ2 N,  T_C^{ga} = C_C^{ga} / f_C^{(2)}.
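Under the reconstruction above (cycle counts obtained from the big-O operation counts times μ1/μ2, and time equal to cycles divided by the allocated frequency — an assumed but natural reading of the cost analysis), the training-process cost model can be sketched as:

```python
MU1, MU2 = 0.1e6, 0.05e6                 # cycles per ciphertext / plaintext step (simulation settings)

def local_update_cost(w_dim, d_i):
    """C_lu = mu1 * |w| * (2|D_i| + 1) cycles for one local update."""
    return MU1 * w_dim * (2 * d_i + 1)

def participant_agg_cost(w_dim, d_i):
    """C_ga(P_i) = mu1 * |w| * |D_i| cycles for the ELP computation."""
    return MU1 * w_dim * d_i

def calculator_agg_cost(w_dim, sizes):
    """C_ga(C): EGW + EGP ciphertext work plus O(N) plaintext work for CGP."""
    n = len(sizes)
    egw = MU1 * n * w_dim
    egp = MU1 * sum(2 * d + w_dim * d for d in sizes)
    cgp = MU2 * n
    return egw + egp + cgp

f_alloc = 2.0e9                          # allocated computing power, cycles/s
print(local_update_cost(14, 100) / f_alloc)   # computation time T = C / f, in seconds
```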
Step 3.2 consensus Process
For the consensus process introduced in the global aggregation, the PBFT consensus protocol comprises five steps. In each step s = 1, ..., 5, a node's computation cost C^{(s)} is accumulated from the signature (β), MAC (θ) and smart-contract verification (α) cycle counts of the operations it performs, and the corresponding computation time is T^{(s)} = C^{(s)} / f^{(s)}, where f^{(s)} (s = 1, ..., 5) represents the computing power allocated to the node in the consensus process.

1. Request & Pre-prepare: the calculator C, acting as the master node i' (i' = N + 1), verifies the signatures and MACs of all transactions in the aggregation and packs the transactions into a new block.

2. Pre-prepare & Prepare: each local node, acting as a verification node i ≠ i' (i = 1, ..., N), receives the new block with the Pre-prepare message; it first verifies the signature and MAC of the block, then verifies the signature and MAC of each transaction, and finally verifies the results according to the transaction computations defined for verification in the smart contract.

3. Prepare & Commit: each node receives and checks the Prepare messages to ensure they are consistent with the Pre-prepare message. When 2f Prepare messages have been received from other nodes, the node sends a Commit message to all other nodes.

4. Commit & Reply: each node receives and checks the Commit messages to ensure they are consistent with the Prepare messages. Once a node has received 2f Commit messages from other nodes, it sends a Reply message to the master node.

5. Reply & chaining: the master node receives and checks the Reply messages. When the master node has received 2f Reply messages, the new block takes effect and is added to the blockchain.
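The following is an illustrative PBFT accounting under the stated per-operation costs; the patent's exact per-step expressions are not reproduced in this text, so the operation counts below are assumptions for demonstration only:

```python
BETA, THETA, ALPHA = 0.8e6, 0.005e6, 0.2e6  # cycles: signature / MAC / contract verification

def fault_tolerance(N):
    """PBFT tolerates at most f = floor((N - 1) / 3) faulty nodes."""
    return (N - 1) // 3

def master_preprepare_cost(n_tx, N):
    # Assumed count: verify signature + MAC of every transaction, then sign the
    # new block and attach a MAC for each of the N verification nodes.
    return n_tx * (BETA + THETA) + BETA + N * THETA

N, n_tx = 4, 11                       # e.g. 2N + 3 transactions per round
print(fault_tolerance(N))             # 1
print(master_preprepare_cost(n_tx, N) / 2.0e9)  # time in seconds at f = 2 GHz
```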
Step 4, self-adaptive global aggregation based on resource allocation optimization
The method obtains the model parameters w(T, τ) by running T iteration rounds of distributed learning with a global aggregation every τ iterations, and introduces the ideal loss function F(w*), where w* represents the ideal model parameter that could be obtained by training on all the data. The objective of minimizing the achievable loss function is then equivalent to

min_{f, T, τ} F(w(T, τ)) − F(w*),

and the objective function can be initially defined in the form:

min_{f, T, τ} F(w(T, τ)) − F(w*)
s.t. C1: F_e(f, T, τ) ≤ E
     C2, C3: bounds on the computing resources f_i^{(v)} and f_i^{(s)} allocated to the training and consensus processes
     C4: the total training time does not exceed T_time
     C5: the total consensus time does not exceed T_time
C1 limits the total energy consumption; C2 and C3 limit the computing resources; C4 limits the training time; C5 limits the consensus time; E is the total energy of the system; T_time is the given time limit. Constraints C4 and C5 keep the training process and the consensus process synchronized.
For the energy consumption, with the dynamic-power model E = γ·C·f², the training process of node i can be expressed as

E_i^{train} = γ Σ_{v=1,2} δ_{i,v} C_i^{(v)} (f_i^{(v)})²,

and the consensus process as

E_i^{cons} = γ Σ_{s=1,...,5} δ_{i,s} C_i^{(s)} (f_i^{(s)})²,

where γ is a constant related to the hardware architecture and the indicator vector δ_i ∈ {0, 1}⁷ expresses whether node i (i ∈ I) participates in each of the seven processes (two training steps and five consensus steps). In the present invention, δ_{i'} = [0, 1, 1, 0, 1, 1, 1] represents the participation of the computing node in the training and consensus processes, and δ_{i'' ≠ i'} = [1, 1, 0, 1, 1, 1, 0] represents the participation of a local node. Thus, the energy cost of the system is expressed as

F_e(f, T, τ) = T Σ_{i ∈ I} E_i^{lu} + (T / τ) Σ_{i ∈ I} E_i^{ga},

where f represents the overall allocation of computing resources, E_i^{lu} represents the energy cost generated during a local update, and E_i^{ga} (including the consensus steps) represents the energy cost incurred in the global aggregation process.
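As an illustration of this energy accounting, a minimal sketch follows (the dynamic-power form E = γ·C·f² and the function names are assumptions made above, not the patent's code):

```python
GAMMA = 1e-5    # hardware-architecture constant gamma from the simulation settings

def process_energy(cycles, f, delta=1):
    """Assumed dynamic-energy model: E = delta * gamma * C * f^2."""
    return delta * GAMMA * cycles * f ** 2

def system_energy(T, tau, e_lu, e_ga):
    """F_e = T * sum_i E_i^lu + (T / tau) * sum_i E_i^ga, where e_lu and e_ga
    are the per-iteration and per-aggregation energies already summed over nodes."""
    return T * e_lu + (T / tau) * e_ga
```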
In addition, each parameter satisfies the following conditions:
1) ||F_i(w) − F_i(w')|| ≤ ρ ||w − w'||, i.e., F_i(w) is ρ-Lipschitz;
2) the gradients of F_i(w) are smooth, and the divergence between the local gradient ∇F_i(w) and the global gradient ∇F(w) is bounded by δ;
3) the learning rate η and the auxiliary gap function h(τ) satisfy the bounds required by the convergence analysis;
4) F(w(T, τ)) − F(w*) ≥ ε.
The objective function is then set to minimize the convergence upper bound of F(w(T, τ)) − F(w*) determined by T, τ and the parameters ρ, δ, η and h(τ) above, subject to constraints C1–C5. Since the bound decreases as T grows while every term of F_e(f, T, τ) is positive, the optimal value of T is attained when constraint C1 holds with equality. Substituting

T = τE / (τ Σ_{i ∈ I} E_i^{lu} + Σ_{i ∈ I} E_i^{ga}),

i.e., the number of iterations affordable under the energy budget E, the objective function can be rewritten as a function of the computing-resource allocation f and the aggregation interval τ only, with constraints C2–C5 unchanged.
Finally, the problem is solved with a convex optimization algorithm.
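A hedged sketch of this final step follows, with a toy surrogate objective standing in for the convergence bound (every function, constant and bound here is illustrative, not the patent's formulation):

```python
import numpy as np
from scipy.optimize import minimize

E_BUDGET = 1.5e5

def surrogate(f, tau):
    """Toy stand-in for the convergence bound after substituting T from C1."""
    per_iter = np.sum(f ** 2) * 1e-3 + 1.0 / tau   # toy per-iteration energy
    T = E_BUDGET / per_iter                        # C1 taken with equality
    return 1.0 / T + 0.01 * tau                    # smaller is better (toy form)

best = None
for tau in range(1, 31):                           # outer search over the integer tau
    res = minimize(lambda f: surrogate(f, tau), x0=np.ones(4),
                   bounds=[(0.1, 2.0)] * 4)        # C2/C3-style box constraints on f
    if best is None or res.fun < best[0]:
        best = (res.fun, tau, res.x)

print("tau* =", best[1])                           # selected aggregation interval
```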
The simulation parameter settings, simulation results and analysis are given below.
MATLAB is used for the simulation, and the system model is built accordingly.
The invention uses the Boston House Price Dataset for the experiments and result analysis of the proposed algorithm. Some parameters in the simulation are set as l = 2, s = 5, η = 1×10⁻⁶, μ1 = 0.1 M cycles, μ2 = 0.05 M cycles, α = 0.2 M cycles, β = 0.8 M cycles, θ = 0.005 M cycles, γ = 1×10⁻⁵ and T_time = 300 s.
FIG. 3 shows the variation of the loss function value with the total system energy when the number of local nodes N is 3, 4 and 5, respectively. The figure shows that the loss function value decreases as the total system energy increases and, for the same system energy, is smaller when the number of local nodes is smaller. In addition, the smaller the number of local nodes, the less total system energy is required for the loss function value to converge.
FIG. 4 shows the variation of the loss function value with the number of local nodes when the total system energy is E = 0.5×10⁵ and E = 1.5×10⁵. The figure shows that the loss function value increases as the number of local nodes participating in the distributed machine learning process increases. Compared with the traditional algorithm (evenly distributed computing resources, τ = 10): on the one hand, the computing resources are allocated rationally by analyzing the computation cost generated by each node in the transaction process, so that the total system energy is fully utilized; the gap between the loss function value under the traditional algorithm and under the optimization algorithm is larger when the total system energy is smaller (i.e., when system energy is scarce), so the resource-allocation algorithm effectively improves performance over the traditional algorithm. On the other hand, the value of τ is continuously adjusted using the optimization parameters based on the resource-allocation algorithm, so that the loss function value obtained for the same node count under the same system energy is smaller.
The above embodiments are only exemplary embodiments of the present invention, and are not intended to limit the present invention, and the scope of the present invention is defined by the claims. Various modifications and equivalents may be made by those skilled in the art within the spirit and scope of the present invention, and such modifications and equivalents should also be considered as falling within the scope of the present invention.

Claims (5)

1. An adaptive distributed machine learning method based on block chains and privacy is characterized by comprising the following steps:
step 1, establishing a distributed machine learning system model with privacy protection based on a block chain
the calculator C and the participants P construct a distributed environment among the nodes by means of a blockchain network; the local update and global aggregation processes of distributed machine learning are completed using linear regression and gradient descent, and a partially homomorphic encryption technique is introduced to protect the model parameters in the training process; a consensus process is introduced to verify the correctness of the model parameters; finally, the distributed nodes interact by means of the consensus process and can only receive the ciphertext parameters of the model during the interaction;
step 2, combining intelligent contracts among nodes to complete distributed consensus process
in order to guarantee the credibility of the learning process and confirm the correctness of the learning parameters, a distributed consensus process is formed between the calculator and the participants, comprising the five transaction processes ELW, ELP, EGW, EGP and CGP; the training parameters are transmitted in the form of transactions by means of smart contracts and are recorded in blocks;
step 3, performance analysis of training process and consensus process
in the system model provided by the invention, the participant nodes and the calculator node use the blockchain network to jointly complete the local updates and global aggregations of the training process, and a consensus process is introduced into the global aggregation to guarantee the correctness of the model parameters; the training process and the consensus process involve five transaction types, whose correspondence is shown in the following table:

Transaction | Issuer          | Content
EGW         | calculator C    | ciphertext of the global model parameters
ELW         | participant P_i | ciphertext of the local model parameters
ELP         | participant P_i | ciphertext of the intermediate parameters of the local variables
EGP         | calculator C    | ciphertext of the intermediate parameters of the global variables
CGP         | calculator C    | plaintext of the optimization parameters
the performance of the training process and of the consensus process is analyzed separately, as follows:
step 3.1 training procedure
for the training process, which consists of local updates and global aggregations and contains the five transactions driven by smart contracts, the computation cost (measured by algorithm complexity) and the computation time corresponding to the computation process of each transaction are as follows:

local update: the local node P_{i'} (i' ∈ I, i' = 1, ..., N) updates its local ciphertext parameters according to the global ciphertext parameters published by the computing node, and transmits the local ciphertext parameters in the blockchain in the form of an ELW transaction; in the local update step, the computation cost is O(|w|(2|D_{i'}| + 1)), so the computation cost C_{i'}^{lu} and computation time T_{i'}^{lu} of P_{i'} are

C_{i'}^{lu} = μ1 |w| (2|D_{i'}| + 1),  T_{i'}^{lu} = C_{i'}^{lu} / f_{i'}^{(1)},

where f_{i'}^{(v)}, v = 1, 2, represents the computing power allocated to the node in the training process;

global aggregation: first, the local node P_{i'} (i' ∈ I, i' = 1, ..., N) uses the updated local ciphertext parameters to compute the ELP transaction carrying the intermediate parameters of the local ciphertext variables, at cost O(|w||D_{i'}|); then, the computing node C (C ∈ I, i = N + 1) collects the ELW transactions from the P_{i'} and updates the global model parameters in the ciphertext state as the EGW transaction, at cost O(N|w|); meanwhile, the computing node collects the ELP transactions from the participants and updates the EGP transaction of the intermediate parameters for computing the global model variables in the ciphertext state, at cost O(Σ_{i'}(2|D_{i'}| + |w||D_{i'}|)); finally, since homomorphic encryption cannot handle ciphertext multiplication, to obtain the parameters ρ and δ that will ultimately be used to optimize the model, the computing node collects the ELP transactions from the participants and, combining them with the EGP transaction, decrypts the ciphertext parameters with its private key and computes the optimization parameters in the plaintext state, at cost O(N); in the global aggregation step, the computation cost C_{i'}^{ga} and computation time T_{i'}^{ga} of P_{i'} are

C_{i'}^{ga} = μ1 |w| |D_{i'}|,  T_{i'}^{ga} = C_{i'}^{ga} / f_{i'}^{(2)},

and the computation cost C_C^{ga} and computation time T_C^{ga} of C are

C_C^{ga} = μ1 (N|w| + Σ_{i'}(2|D_{i'}| + |w| |D_{i'}|)) + μ2 N,  T_C^{ga} = C_C^{ga} / f_C^{(2)};
Step 3.2 consensus Process
for the consensus process introduced in the global aggregation, the PBFT consensus protocol comprises five steps, where in each step s = 1, ..., 5 a node's computation cost C^{(s)} is accumulated from the signature (β), MAC (θ) and smart-contract verification (α) cycle counts of the operations it performs, the corresponding computation time is T^{(s)} = C^{(s)} / f^{(s)}, and f^{(s)} (s = 1, ..., 5) represents the computing power allocated to the node in the consensus process:

1) Request & Pre-prepare: the calculator C, acting as the master node i' (i' = N + 1), verifies the signatures and MACs of all transactions in the aggregation and packs the transactions into a new block;

2) Pre-prepare & Prepare: each local node, acting as a verification node i ≠ i' (i = 1, ..., N), receives the new block with the Pre-prepare message, first verifies the signature and MAC of the block, then verifies the signature and MAC of each transaction, and finally verifies the results according to the transaction computations defined for verification in the smart contract;

3) Prepare & Commit: each node receives and checks the Prepare messages to ensure they are consistent with the Pre-prepare message; when 2f Prepare messages have been received from other nodes, the node sends a Commit message to all other nodes;

4) Commit & Reply: each node receives and checks the Commit messages to ensure they are consistent with the Prepare messages; once a node has received 2f Commit messages from other nodes, it sends a Reply message to the master node;

5) Reply & chaining: the master node receives and checks the Reply messages; when the master node has received 2f Reply messages, the new block takes effect and is added to the blockchain;
Step 4, self-adaptive global aggregation based on resource allocation optimization
model parameters w(T, τ) are obtained by running T iteration rounds of distributed learning with a global aggregation every τ iterations, and the ideal loss function F(w*) is introduced, where w* represents the ideal model parameter obtainable by training on all the data; the objective of minimizing the achievable loss function is then equivalent to

min_{f, T, τ} F(w(T, τ)) − F(w*),

which is solved with a convex optimization algorithm.
2. The adaptive distributed machine learning method based on blockchain and privacy of claim 1, wherein in the distributed network the node set consists of the participants P and the calculator C, i.e., I = {P, C} with |I| = N + 1; the set of N participants is denoted P = {P_1, P_2, ..., P_N}, where P_i (i = 1, 2, ..., N) represents a participant owning the sub-dataset D_i, and the total dataset is denoted D = {D_1, D_2, ..., D_N}; (x_ij, y_ij) ∈ D_i is the j-th data sample in D_i; the system model comprises the three steps of global parameter publishing, local gradient update and global parameter update, wherein a local update occurs in every iteration t = 1, 2, ..., T, with T the total number of iterations of the system model, and a global aggregation occurs only at the iterations t = Γτ, Γ = 1, 2, ..., G, with G the total number of aggregations, so that T = Gτ.
3. The blockchain and privacy based adaptive distributed machine learning method of claim 2, wherein the homomorphic encryption based distributed machine learning process can be described as:
1) global parameter publishing: after each global aggregation, C publishes the ciphertext parameter [[w_g(t)]]; the participants can only see the ciphertext and cannot learn w_g(t), which guarantees the privacy of the global model parameters; letting w̃_i(t) represent the possibly globally aggregated model parameter, the interaction between the local and global parameters can be described as:

[[w̃_i(t)]] = [[w_i(t)]], if t mod τ ≠ 0;  [[w̃_i(t)]] = [[w_g(t)]], if t mod τ = 0;

2) local parameter update: each participant completes the local update in the ciphertext state according to the homomorphic property of the homomorphic encryption algorithm; for the i-th participant, the update of the k-th element w_{i,k}(t) of the local model parameter w_i(t) can be expressed as:

[[w_{i,k}(t)]] = [[w̃_{i,k}(t − 1)]] ⊕ [[−η ∇_k F_i(t)]],

where ⊕ denotes homomorphic addition, η is the learning rate, and ∇_k F_i(t), the k-th component of P_i's local gradient at iteration round t, is defined as the average of the gradients of the single data samples (x_ij, y_ij) ∈ D_i, with x_{ij,k} the k-th element of the input vector:

∇_k F_i(t) = (1 / |D_i|) · Σ_{(x_ij, y_ij) ∈ D_i} (w̃_i(t − 1)^T x_ij − y_ij) · x_{ij,k};

3) global parameter update: P_i submits [[w_i(t)]] after every τ local updates; after obtaining all [[w_i(t)]], C performs the global aggregation on the local parameters in the ciphertext state according to the following formula and updates the global parameters:

[[w_g(t)]] = Σ_{i=1}^{N} (|D_i| / |D|) ⊗ [[w_i(t)]],

where ⊗ denotes homomorphic multiplication of a ciphertext by a plaintext scalar.
4. The adaptive distributed machine learning method based on blockchain and privacy of claim 1, wherein the blockchain-based verifiable computing system model includes two processes: a distributed machine learning process and a blockchain consensus process; under the system model, N + 1 nodes are considered, comprising N local nodes, which represent the participants, and one computing node, which represents the calculator; the computing power of each node is denoted by f_i;
in addition, μ1 represents the average number of CPU cycles required to complete a one-step ciphertext computation, and μ2 the average number of CPU cycles required to complete a one-step plaintext computation; under PBFT consensus, at most f = ⌊(N − 1)/3⌋ faulty nodes are tolerated; generating or verifying a signature or a MAC (message authentication code) costs β and θ CPU cycles respectively, and the computing task required to drive smart-contract verification costs α CPU cycles.
5. The adaptive distributed machine learning method based on blockchain and privacy of claim 1, wherein in the set objective function C1 limits the total energy consumption; C2 and C3 limit the computing resources; C4 limits the training time; C5 limits the consensus time; E is the total energy of the system; T_time is the given time limit; constraints C4 and C5 keep the training process and the consensus process synchronized;

for the energy consumption, with the dynamic-power model E = γ·C·f², the training process of node i can be expressed as

E_i^{train} = γ Σ_{v=1,2} δ_{i,v} C_i^{(v)} (f_i^{(v)})²,

and the consensus process as

E_i^{cons} = γ Σ_{s=1,...,5} δ_{i,s} C_i^{(s)} (f_i^{(s)})²,

where γ is a constant related to the hardware architecture and the indicator vector δ_i ∈ {0, 1}⁷ expresses whether node i (i ∈ I) participates in each process; δ_{i'} = [0, 1, 1, 0, 1, 1, 1] represents the participation of the computing node in each process v, and δ_{i'' ≠ i'} = [1, 1, 0, 1, 1, 1, 0] represents the participation of a local node in each process v; the energy cost of the system is expressed as

F_e(f, T, τ) = T Σ_{i ∈ I} E_i^{lu} + (T / τ) Σ_{i ∈ I} E_i^{ga},

where f represents the overall allocation of computing resources, E_i^{lu} represents the energy cost generated during a local update, and E_i^{ga} represents the energy cost incurred in the global aggregation process.
CN202110889794.1A 2021-08-04 2021-08-04 Self-adaptive distributed machine learning method based on blockchain and privacy Active CN113822758B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110889794.1A CN113822758B (en) 2021-08-04 2021-08-04 Self-adaptive distributed machine learning method based on blockchain and privacy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110889794.1A CN113822758B (en) 2021-08-04 2021-08-04 Self-adaptive distributed machine learning method based on blockchain and privacy

Publications (2)

Publication Number Publication Date
CN113822758A true CN113822758A (en) 2021-12-21
CN113822758B CN113822758B (en) 2023-10-13

Family

ID=78912826

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110889794.1A Active CN113822758B (en) 2021-08-04 2021-08-04 Self-adaptive distributed machine learning method based on blockchain and privacy

Country Status (1)

Country Link
CN (1) CN113822758B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111800274A (en) * 2020-07-03 2020-10-20 北京工业大学 Verifiable calculation energy consumption optimization method based on block chain
CN111915294A (en) * 2020-06-03 2020-11-10 东南大学 Safety, privacy protection and tradable distributed machine learning framework based on block chain technology
CN113114496A (en) * 2021-04-06 2021-07-13 北京工业大学 Block chain expandability problem solution based on fragmentation technology


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114915429A (en) * 2022-07-19 2022-08-16 北京邮电大学 Communication perception calculation integrated network distributed credible perception method and system
CN114915429B (en) * 2022-07-19 2022-10-11 北京邮电大学 Communication perception calculation integrated network distributed credible perception method and system

Also Published As

Publication number Publication date
CN113822758B (en) 2023-10-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant