CN116389478A - Four-network fusion data sharing method based on blockchain and federated learning - Google Patents
- Publication number
- CN116389478A (application number CN202310342414.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- equipment
- global model
- blockchain
- local
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6227—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/64—Protecting data integrity, e.g. using checksums, certificates or signatures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/08—Network architectures or network communication protocols for network security for authentication of entities
- H04L63/0823—Network architectures or network communication protocols for network security for authentication of entities using certificates
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/104—Peer-to-peer [P2P] networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/32—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
- H04L9/3247—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials involving digital signatures
Abstract
The invention discloses a four-network fusion data sharing system based on blockchain and federated learning, comprising data devices, a certificate authority server, and a blockchain network. Operating in a federated learning mode, the blockchain network updates the global model of regular devices from the self-aggregation result using a stochastic gradient algorithm and records sharing behavior; it also calls the latest-model smart contract to obtain the most recent aggregation result and the latest global model, which are used to update the global model of temporary devices. By integrating federated learning into the blockchain network, the invention does not require participants to stay online at all times, overcoming that requirement of conventional federated learning schemes. It therefore fits real deployments better, can improve the efficiency and security of data sharing and fusion, and makes transaction processing intelligent.
Description
Technical Field
The invention relates to the technical field of four-network fusion data sharing, and in particular to a four-network fusion data sharing method based on blockchain and federated learning.
Background
Four-network integration of rail transit means integrating the trunk railway network, the intercity railway network, the urban-area (suburban) railway network, and the urban rail transit network so as to build metropolitan areas "on rails" and thereby promote their development. Building a modern rail transit system with well-defined functional roles, a clear network hierarchy, and efficient connections requires more accurate prediction of passenger travel trajectories and spatio-temporal analysis of traffic flow, so that service quality and passenger experience across the four networks can be improved through sound transfer planning, train dispatching, and similar measures. However, the data held by the four networks is highly private and in some cases confidential, so it is difficult to share directly. The result is data silos that prevent the networks from working together.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a four-network fusion data sharing method based on blockchain and federated learning.
To achieve this aim, the invention adopts the following technical scheme:
a four-network fusion data sharing system based on blockchain and federated learning, comprising:
data devices, configured to send a device registration request to the certificate authority server to obtain the certificate, public key, and private key issued by that server; to join the blockchain network using the obtained certificate and receive the initial global model sent by the blockchain network; and to train a local model with differential privacy on a local data set and call the smart contract deployed in the blockchain network to upload the local gradient;
a certificate authority server, configured to respond to the device registration request sent by a data device, verify the identity of the data device, and issue a certificate, a public key, and a private key to each data device that passes verification;
a blockchain network, composed of the data devices as nodes and responsible for sending the initial global model to newly joined devices; under the combined blockchain and federated learning scheme, it dynamically selects contributors according to the online state of the data devices and triggers the self-aggregation smart contract and the contribution evaluation smart contract on the local gradients uploaded by the data devices; it updates the global model of regular devices from the self-aggregation result using a stochastic gradient algorithm and records sharing behavior; and it calls the latest-model smart contract to obtain the most recent aggregation result and the latest global model, which are used to update the global model of temporary devices.
Optionally, the data devices are divided into data requesting devices and data providing devices according to their sharing behavior, wherein a device that requests data from the sharing system is a data requesting device and a device that responds to a data request is a data providing device;
the data requesting device is configured to send a data request to the sharing system, the data request comprising a device ID, a request validity period, a transaction budget, an encrypted initial global model, and request status information; and, based on the response information from the data providing devices, to select a subset of the proposed transactions according to the transaction budget and send a Fabric channel configuration to the selected data providing devices, wherein the Fabric channel configuration allows those devices to join the same task channel and form a temporary collaboration group;
the data providing device is configured to verify a data request received from a data requesting device, look up the history of the data requesting device in the records on the blockchain network, send response information to the data requesting device, and propose a transaction through the response information; the response information comprises the device ID, the data size, the running time of the device, its computing power, and its public key information.
Optionally, the data device is specifically configured such that:
the data providing device decrypts the initial global model sent by the blockchain network using the public key of the data requesting device, verifies the initial global model by comparing its hash value, and trains the initial global model once it passes verification;
and the data providing device adds Gaussian differential privacy to the training parameters, trains the local model on its local data set using a federated averaging algorithm based on Gaussian differential privacy, and trains the federated learning model using an adaptive clipping method.
Optionally, the data device is specifically configured such that:
the data providing device calculates the local gradient using the following formula:
$g_t = \nabla L(W_{i,t}, D_i)$
wherein $g_t$ represents the local gradient of the t-th round, $L(W_{i,t}, D_i)$ represents the loss function, and $W_{i,t}$ represents the local model parameters of device $e_i$ trained on the local data set $D_i$ in the t-th round;
and the data providing device calls the smart contract deployed in the blockchain network, attaching a digital signature, to upload the local gradient;
after receiving the gradient transaction, the data requesting device verifies the legitimacy of the data providing device by checking the digital signature and, once verification passes, packages the transaction into a block and broadcasts it to the other nodes.
Optionally, the data device is specifically configured such that:
according to the online state of the data devices, the data requesting device selects a subset of the data devices that uploaded their local gradient successfully in the current training round and have remained online, and aggregates the local gradients uploaded by the selected data devices in the current round as follows:
$\bar{g}_t = \frac{1}{|S_t|} \sum_{i \in S_t} g(W_{i,t,\sigma})$
wherein $\bar{g}_t$ represents the average gradient of the t-th round, $S_t$ represents the set of device IDs of the data devices selected by the data requesting device in the t-th round, and $g(W_{i,t,\sigma})$ represents the local gradient of the t-th round with Gaussian differential privacy added.
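As a rough illustration of the aggregation step, the following Python sketch averages the noised gradients of the devices in $S_t$. The function name, the dictionary keyed by device ID, and the use of plain lists are assumptions made for illustration, and an unweighted mean is assumed; the source does not describe how the self-aggregation smart contract is implemented.

```python
from typing import Dict, List, Set


def aggregate_gradients(noisy_grads: Dict[str, List[float]],
                        selected: Set[str]) -> List[float]:
    """Unweighted mean of the noised gradients uploaded by the devices in S_t.

    noisy_grads maps a device ID to its gradient g(W_{i,t,sigma});
    selected is S_t, the IDs chosen by the data requesting device.
    """
    chosen = [noisy_grads[d] for d in sorted(selected) if d in noisy_grads]
    if not chosen:
        raise ValueError("no selected device uploaded a gradient this round")
    dim = len(chosen[0])
    if any(len(g) != dim for g in chosen):
        raise ValueError("gradients must all have the same length")
    return [sum(g[k] for g in chosen) / len(chosen) for k in range(dim)]


# Device e3 uploaded a gradient but was not selected, so it is ignored.
grads = {"e1": [1.0, 2.0], "e2": [3.0, 4.0], "e3": [100.0, 100.0]}
avg = aggregate_gradients(grads, {"e1", "e2"})
```

Restricting the mean to the selected set is what lets devices that drop offline mid-round be skipped without stalling the round.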
Optionally, the blockchain network is specifically configured such that:
the contribution of each data device selected by the data requesting device is evaluated using a contribution evaluation model based on gradient entropy,
wherein $C_{i,t}$ represents the contribution evaluation value of the data device in the t-th round, $E_{i,t}(g(W_{i,t,\sigma}))$ represents the information content of the local gradient $g(W_{i,t,\sigma})$ uploaded by the data device in the t-th round, and $S_t$ represents the set of device IDs of the data devices selected by the data requesting device in the t-th round.
Optionally, the blockchain network is specifically configured such that:
the global model of regular devices is updated from the self-aggregation result using a stochastic gradient algorithm, expressed as:
$W_{t+1} = W_t - \eta\,\bar{g}_t$
wherein $W_{t+1}$ represents the updated global model parameters, $W_t$ represents the current global model parameters, $\eta$ represents the learning rate of the current round, and $\bar{g}_t$ represents the average gradient of the t-th round.
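The update for regular devices can be sketched in a few lines of Python. This is only an illustration under the assumption that model parameters are flat lists of floats and that the update is a plain gradient step; the function name is hypothetical and not taken from the source.

```python
from typing import List


def sgd_update(w: List[float], avg_grad: List[float], lr: float) -> List[float]:
    """One stochastic-gradient step on the global parameters.

    w is W_t, avg_grad is the round's average gradient, lr is eta.
    """
    if len(w) != len(avg_grad):
        raise ValueError("parameters and gradient must have the same length")
    return [wi - lr * gi for wi, gi in zip(w, avg_grad)]


w_next = sgd_update([1.0, 1.0], [2.0, 4.0], lr=0.5)
```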
Optionally, the blockchain network is specifically configured such that:
the latest-model smart contract is called to obtain the most recent aggregation result and the latest global model, which are used to update the global model of a temporary device, expressed as:
$W_{t+1} = (1 - \alpha_i(t))\,W_{i,\mathrm{old}} + \alpha_i(t)\,W_{\mathrm{new}}$
wherein $W_{t+1}$ represents the updated global model parameters, $\alpha_i(t)$ represents the current weight coefficient, $W_{i,\mathrm{old}}$ represents the most recent global model parameters stored by the temporary device, and $W_{\mathrm{new}}$ represents the latest global model parameters of the current round.
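The catch-up rule for a temporary device is a convex combination of its stale parameters and the latest global parameters. A minimal Python sketch follows; the function name is hypothetical, and the weight $\alpha_i(t)$ is passed in as a number because the source does not state how it is computed.

```python
from typing import List


def catch_up(w_old: List[float], w_new: List[float], alpha: float) -> List[float]:
    """Blend a temporary device's stored model with the latest global model.

    Implements W_{t+1} = (1 - alpha) * W_{i,old} + alpha * W_new.
    alpha = 0 keeps the stale model; alpha = 1 adopts the latest one.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(w_old) != len(w_new):
        raise ValueError("parameter vectors must have the same length")
    return [(1.0 - alpha) * o + alpha * n for o, n in zip(w_old, w_new)]


blended = catch_up([0.0, 2.0], [4.0, 2.0], alpha=0.25)
```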
A four-network fusion data sharing method based on blockchain and federated learning, applied to the above system, comprises the following steps:
S1, a data device sends a device registration request to the certificate authority server;
S2, the certificate authority server responds to the device registration request sent by the data device, verifies the identity of the data device, and issues a certificate, a public key, and a private key to each data device that passes verification;
S3, the data device joins the blockchain network using the obtained certificate;
S4, the blockchain network sends the initial global model to the newly joined device;
S5, the data device trains a local model with differential privacy on its local data set and calls the smart contract deployed in the blockchain network to upload the local gradient;
S6, under the combined blockchain and federated learning scheme, the blockchain network dynamically selects contributors according to the online state of the data devices and triggers the self-aggregation smart contract and the contribution evaluation smart contract on the local gradients uploaded by the data devices; it updates the global model of regular devices from the self-aggregation result using a stochastic gradient algorithm and records sharing behavior; and it calls the latest-model smart contract to obtain the most recent aggregation result and the latest global model, which are used to update the global model of temporary devices.
The invention has the following beneficial effects:
By integrating federated learning into the blockchain network, the invention does not require participants to stay online at all times, overcoming the requirement of conventional federated learning that devices remain online throughout. It therefore fits real deployments better, can improve the efficiency and security of data sharing and fusion, and makes transaction processing intelligent.
Drawings
FIG. 1 is a schematic diagram of a four-network fusion data sharing system based on blockchain and federated learning in an embodiment of the present invention;
FIG. 2 is a timing diagram of a four-network fusion data sharing system based on blockchain and federated learning in an embodiment of the present invention;
FIG. 3 is a schematic diagram of the workflow of a four-network fusion data sharing system based on blockchain and federated learning in an embodiment of the present invention;
FIG. 4 is a flow chart of a four-network fusion data sharing method based on blockchain and federated learning in an embodiment of the present invention.
Detailed Description
The following description of specific embodiments is provided to help those skilled in the art understand the invention. The invention is not limited to the scope of these embodiments: to those skilled in the art, any variation that falls within the spirit and scope of the invention as defined by the appended claims is apparent, and every creation that makes use of the inventive concept is protected.
Federated learning is an emerging foundational artificial-intelligence technology for carrying out efficient machine learning among multiple participants or computing nodes while ensuring information security during large-scale data exchange, protecting the privacy of terminal and personal data, and ensuring legal compliance.
A blockchain is a decentralized, encrypted, tamper-resistant distributed shared database. It can provide confidentiality for the data exchanged in federated learning, protecting user privacy and data security among the participants; it can keep the data contributed by multiple participants for model training consistent; and its value-driven incentive mechanism can increase participants' willingness to provide data and update network model parameters.
Combining federated learning with blockchain can meet the needs of some scenarios in which the departments of the four networks must cooperate and share data, such as predicting passenger travel patterns or diagnosing railway equipment faults. The combination breaks down data silos, avoids the poor model generalization that results from insufficient data features, and does not leak private data, which is of great significance for interconnecting the four networks' data and using it safely and efficiently.
The invention integrates federated learning into the blockchain and does not require participants to stay online at all times, overcoming the requirement of conventional federated learning that devices remain online throughout. It therefore fits real deployments better, can improve the efficiency and security of data sharing and fusion, and makes transaction processing intelligent.
Example 1
As shown in FIGS. 1 to 3, an embodiment of the present invention provides a four-network fusion data sharing system based on blockchain and federated learning, comprising:
data devices, configured to send a device registration request to the certificate authority server to obtain the certificate, public key, and private key issued by that server; to join the blockchain network using the obtained certificate and receive the initial global model sent by the blockchain network; and to train a local model with differential privacy on a local data set and call the smart contract deployed in the blockchain network to upload the local gradient;
a certificate authority server, configured to respond to the device registration request sent by a data device, verify the identity of the data device, and issue a certificate, a public key, and a private key to each data device that passes verification;
a blockchain network, composed of the data devices as nodes and responsible for sending the initial global model to newly joined devices; under the combined blockchain and federated learning scheme, it dynamically selects contributors according to the online state of the data devices and triggers the self-aggregation smart contract and the contribution evaluation smart contract on the local gradients uploaded by the data devices; it updates the global model of regular devices from the self-aggregation result using a stochastic gradient algorithm and records sharing behavior; and it calls the latest-model smart contract to obtain the most recent aggregation result and the latest global model, which are used to update the global model of temporary devices.
In an optional embodiment of the present invention, the data devices are divided into data requesting devices and data providing devices according to their sharing behavior, wherein a device that requests data from the sharing system is a data requesting device and a device that responds to a data request is a data providing device;
the data requesting device is configured to send a data request to the sharing system and, based on the response information from the data providing devices, to select a subset of the proposed transactions according to the transaction budget and send a Fabric channel configuration to the selected data providing devices, wherein the Fabric channel configuration allows those devices to join the same task channel and form a temporary collaboration group;
the data providing device is configured to verify a data request received from a data requesting device, look up the history of the data requesting device in the records on the blockchain network, send response information to the data requesting device, and propose a transaction through the response information.
The data request includes a device ID, a request validity period, a transaction budget, an encrypted initial global model, and request status information.
The response information includes the device ID, the data size, the running time of the device, its computing power, and its public key information.
Specifically, a data device in this embodiment is any device deployed in the network that has some data processing and storage capability and can request data from other users or provide data to them. For example, if the system is used for equipment risk and fault prediction, the device may be a sensor; if it is used for predicting passenger travel patterns, the device may be a data server. Each device is both a component of the blockchain network and a user of the system. Devices rich in computing resources may be selected as full nodes of the blockchain network, and resource-constrained devices as client nodes.
To ensure security, the system is designed on a consortium blockchain, which only permitted users can join. Before joining the system, a device $e_i$ must apply to the CA server for a certificate. The CA server is responsible for verifying the applicant's identity and issuing certificates. A device that passes verification can join the blockchain network. After joining, $e_i$ chooses, according to its own computing power, whether to act as a full node or as a light node (that is, a client). Only full nodes are entitled to package and verify blocks.
A device must stake a certain number of tokens when it registers. If it behaves maliciously, the staked tokens are forfeited. Each device should also hold local data, referred to as its local data set.
Every entity that successfully joins and registers with the blockchain is a system user and can take different roles according to its behavior. A user that requests data from the system is called a data requester. A data requester $e_j$ issues a data request R to the system; R comprises the device ID, the request validity period, the transaction budget, the encrypted initial global model, the request status, and so on. After verification by the consensus nodes, the request is recorded in the blockchain as a transaction, and the data request is stored in the state database.
A user that responds to the data request is called a data provider. On receiving the data request R, a data provider first validates the request and looks up the requester's history in the records on the blockchain. It then sends response information, through which it proposes a transaction. The response information includes the device ID, data size, the device's running time, computing power, public key, and so on.
On receiving the responses, the data requester selects a subset of the proposed transactions according to its budget and sends the selected providers a Fabric channel configuration that lets them join the same task channel and form a temporary collaboration group. A Fabric channel is essentially a private atomic broadcast channel whose purpose is to keep transaction information from being disclosed to unauthorized nodes: entities outside the channel cannot access data inside it, which improves the privacy and security of transactions.
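The request and response exchange described above can be pictured with a small Python sketch. The response fields mirror those listed in the text; the `price` field, the greedy ranking by data size per unit price, and all names are hypothetical, since the source says only that the requester selects a subset of transactions according to its budget and does not state the selection rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Response:
    device_id: str
    data_size: int        # number of local samples offered
    uptime_hours: float   # running time of the device
    compute_power: float  # relative computing power
    public_key: str
    price: float          # hypothetical: cost of the proposed transaction


def select_within_budget(responses: List[Response], budget: float) -> List[Response]:
    """Greedy choice (an assumption): most data per unit price first."""
    valid = [r for r in responses if r.price > 0]
    ranked = sorted(valid, key=lambda r: r.data_size / r.price, reverse=True)
    chosen, spent = [], 0.0
    for r in ranked:
        if spent + r.price <= budget:
            chosen.append(r)
            spent += r.price
    return chosen


responses = [
    Response("e1", 1000, 72.0, 1.0, "pk1", price=10.0),
    Response("e2", 500, 24.0, 0.5, "pk2", price=2.0),
    Response("e3", 2000, 96.0, 2.0, "pk3", price=50.0),
]
group = select_within_budget(responses, budget=15.0)
chosen_ids = [r.device_id for r in group]
```

In a deployment, the devices in `group` would be the ones that receive the Fabric channel configuration and form the temporary collaboration group.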
Assuming that the number of malicious nodes does not exceed the Byzantine fault tolerance limit, that is, that the system is secure, each task channel selects some nodes with good contribution scores as leaders responsible for consensus verification.
In an alternative embodiment of the invention, the data device is specifically configured such that:
the data providing device decrypts the initial global model sent by the blockchain network using the public key of the data requesting device, verifies the initial global model by comparing its hash value, and trains the initial global model once it passes verification;
and the data providing device adds Gaussian differential privacy to the training parameters, trains the local model on its local data set using a federated averaging algorithm based on Gaussian differential privacy, and trains the federated learning model using an adaptive clipping method.
Specifically, suppose that a set E of N devices responds to a data sharing request R, where e= { E 1 ,e 2 ,…,e N }. Is provided withStandby e i With a local data set D i Wherein each data sample is represented by d k And (3) representing. Let W be i,t Is device e i Is based on the local data set D at the t-th round i Is a local model parameter of W t Is a global model parameter. The goal of the local training is to minimize the loss function of the round to obtain a more excellent model.
The data providing devices of this embodiment first decrypt the initial global model using the public key of the requester and verify it by comparing Hash values. They then train the initial global model on their own local data sets and, before uploading the local gradient, add Gaussian differential privacy noise to the training parameters. If the noise parameter is $\sigma$, the noised parameters are written $W_{i,t,\sigma}$, with $W_{i,t,\sigma} = W_{i,t} + \text{noise}$, which protects the private data in models and parameters from leakage.
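The hash comparison in the verification step above can be sketched as follows. This is a minimal illustration only: the text says just "Hash value", so the choice of SHA-256, the byte serialization of the model and the function names are assumptions, and the decryption step is omitted.

```python
import hashlib
import hmac


def model_digest(serialized_model: bytes) -> str:
    """Hex digest of the serialized model (SHA-256 is an assumed choice)."""
    return hashlib.sha256(serialized_model).hexdigest()


def verify_model(serialized_model: bytes, expected_digest: str) -> bool:
    """Accept the model only if its digest matches the published one.

    hmac.compare_digest performs the comparison in constant time.
    """
    return hmac.compare_digest(model_digest(serialized_model), expected_digest)


# Hypothetical payload standing in for the decrypted initial global model.
model_bytes = b"initial-global-model-weights"
published = model_digest(model_bytes)
print(verify_model(model_bytes, published))
print(verify_model(model_bytes + b"tampered", published))
```

A provider would start local training only when `verify_model` returns `True`.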
For Gaussian noise generation, we use a zero-mean Gaussian (normal) distribution, expressed as
$$\mathcal{N}(0, \sigma^2).$$
The Gaussian mechanism with parameter $\sigma$ adds noise scaled to $\mathcal{N}(0, \sigma^2)$ to each of the $k$ components of the output.
Let $\epsilon \in (0, 1)$ be arbitrary. For $c^2 > 2\ln(1.25/\delta)$, the Gaussian mechanism with parameter $\sigma \ge c\Delta_2/\epsilon$ is $(\epsilon, \delta)$-differentially private, where $\Delta_2$ is the $\ell_2$-sensitivity of the output.
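A minimal sketch of how the noise scale could be calibrated and applied is given below. It assumes the standard Gaussian-mechanism calibration $\sigma = c\Delta_2/\epsilon$ with $c = \sqrt{2\ln(1.25/\delta)}$; the function names and the flat list representation of the parameters are hypothetical.

```python
import math
import random
from typing import List, Optional


def gaussian_sigma(epsilon: float, delta: float, l2_sensitivity: float) -> float:
    """Noise scale sigma = c * Delta_2 / epsilon with c = sqrt(2 ln(1.25/delta)).

    This is the boundary value of the bound; a slightly larger c satisfies
    the strict inequality c^2 > 2 ln(1.25/delta).
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie in (0, 1)")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    c = math.sqrt(2.0 * math.log(1.25 / delta))
    return c * l2_sensitivity / epsilon


def add_gaussian_noise(params: List[float], sigma: float,
                       rng: Optional[random.Random] = None) -> List[float]:
    """Return W_{i,t,sigma} = W_{i,t} + noise, with i.i.d. N(0, sigma^2) noise."""
    if rng is None:
        rng = random.Random()
    return [w + rng.gauss(0.0, sigma) for w in params]


sigma = gaussian_sigma(epsilon=0.5, delta=1e-5, l2_sensitivity=1.0)
noisy = add_gaussian_noise([0.1, -0.2, 0.3], sigma, random.Random(0))
print(sigma, noisy)
```

The sensitivity passed in is normally the clipping norm applied to each update, which is why clipping and noise calibration are usually configured together.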
Differential privacy may prevent privacy attacks at the model prediction stage. By adding noise to the model parameters, an attacker cannot obtain an accurate output result by querying the model, so that training data cannot be recovered, and whether a specific sample belongs to the training data of the model cannot be inferred from the output result of the model.
Existing work on training models with differential privacy in federated learning bounds the contribution of each user's model update by a fixed clipping norm. However, no single clipping norm works well across learning settings and tasks. To avoid the drawbacks of a fixed clipping norm, we use an adaptive clipping method that automatically adjusts the clipping threshold according to the model architecture and loss, the amount of data on each device, the client learning rate, and possibly other parameters. In summary, to provide privacy protection, we train the federated learning model with a differential-privacy-enabled FedAvg algorithm based on the Gaussian mechanism, combined with adaptive clipping.
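The text does not state the adaptation rule, so the sketch below is only one plausible reading: it assumes a quantile-tracking scheme in which the clipping norm is nudged geometrically so that a target fraction of update norms falls below it. The names `clip_update` and `adapt_clip_norm`, the target quantile and the adaptation rate are all hypothetical, and the noise that a private implementation would add to the quantile estimate is omitted.

```python
import math
from typing import List, Sequence


def l2_norm(update: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in update))


def clip_update(update: Sequence[float], clip_norm: float) -> List[float]:
    """Scale the update so that its L2 norm is at most clip_norm."""
    norm = l2_norm(update)
    scale = min(1.0, clip_norm / norm) if norm > 0.0 else 1.0
    return [x * scale for x in update]


def adapt_clip_norm(clip_norm: float, updates: Sequence[Sequence[float]],
                    target_quantile: float = 0.5, lr: float = 0.2) -> float:
    """Geometric update C <- C * exp(-lr * (b - target_quantile)).

    b is the fraction of updates whose norm is already within clip_norm:
    if too many updates are unclipped the threshold shrinks, otherwise it grows.
    """
    below = sum(1 for u in updates if l2_norm(u) <= clip_norm) / len(updates)
    return clip_norm * math.exp(-lr * (below - target_quantile))


updates = [[3.0, 4.0], [6.0, 8.0]]  # hypothetical client updates, norms 5 and 10
clip = 1.0
clipped = [clip_update(u, clip) for u in updates]
clip = adapt_clip_norm(clip, updates)
print(clipped, clip)
```

In this example both updates exceed the initial threshold, so the threshold grows for the next round.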
In an alternative embodiment of the present invention, the data providing devices of the present embodiment upload their local model updates by invoking an upload smart contract with a digital signature. That is, device $e_i$ calculates the local gradient by:
$$g_t = \nabla L(W_{i,t}, D_i)$$
where $g_t$ denotes the local gradient of the $t$-th round, $L(W_{i,t}, D_i)$ denotes the loss function, and $W_{i,t}$ denotes the local model parameters of device $e_i$ trained on the local data set $D_i$ in the $t$-th round;
then $e_i$ submits a transaction by invoking the upload smart contract, and the transaction waits to be verified.
Once the nodes receive the transaction for the gradient, they first verify the legitimacy of the sender by verifying the digital signature. If the verification passes, the transaction is packaged into blocks and broadcast to other nodes.
In an alternative embodiment of the invention, the data device is specifically configured to:
the data requesting device selects, according to the online state of the data devices, a subset of the data devices that successfully uploaded a local gradient in the current round of training and remained online, and aggregates the local gradients uploaded by the selected data devices in the current round using the following formula:
$$\bar{g}_t = \frac{1}{|S_t|} \sum_{i \in S_t} g(W_{i,t,\sigma})$$
where $\bar{g}_t$ denotes the average gradient of the $t$-th round, $S_t$ denotes the set of device IDs of the data devices selected by the data requesting device in the $t$-th round, and $g(W_{i,t,\sigma})$ denotes the local gradient of the $t$-th round with Gaussian differential privacy added.
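The averaging over the selected devices described above can be sketched as follows, assuming an unweighted mean over the selected set $S_t$. Gradients are represented as flat lists of floats, and the function and variable names are hypothetical.

```python
from typing import Dict, List, Set


def aggregate_gradients(uploaded: Dict[str, List[float]],
                        selected: Set[str]) -> List[float]:
    """Average the noised local gradients of the devices in S_t."""
    if not selected:
        raise ValueError("no devices selected for aggregation")
    # A KeyError here means a selected device never uploaded a gradient.
    chosen = [uploaded[device_id] for device_id in sorted(selected)]
    dim = len(chosen[0])
    if any(len(g) != dim for g in chosen):
        raise ValueError("gradients must have the same dimension")
    return [sum(g[k] for g in chosen) / len(chosen) for k in range(dim)]


# Hypothetical uploads; e3 uploaded a gradient but was not selected this round.
uploaded = {"e1": [1.0, 2.0], "e2": [3.0, 4.0], "e3": [100.0, 100.0]}
avg_gradient = aggregate_gradients(uploaded, {"e1", "e2"})
print(avg_gradient)
```

Only gradients from selected devices enter the mean, so a device outside $S_t$ cannot influence the round.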
Specifically, to handle data devices that go offline, before each aggregation the data requesting device selects, from the non-temporary data devices, a subset whose uploaded gradients will be aggregated in the current round. The non-temporary devices are called data contribution devices: devices that successfully uploaded a gradient in the current round of training and remained online throughout. The entities selected from the data contribution devices are called candidate devices.
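The two-stage filtering described above (contribution devices first, then candidates) might look like the sketch below. The text does not say how the requester chooses among the contribution devices, so uniform random sampling is an assumption, as are the `DeviceStatus` record and the function names.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DeviceStatus:
    device_id: str
    uploaded: bool            # uploaded a gradient in the current round
    online_whole_round: bool  # never dropped offline during the round


def contribution_devices(statuses: List[DeviceStatus]) -> List[str]:
    """Non-temporary devices: uploaded successfully and stayed online."""
    return [s.device_id for s in statuses if s.uploaded and s.online_whole_round]


def select_candidates(statuses: List[DeviceStatus], k: int,
                      rng: Optional[random.Random] = None) -> List[str]:
    """Pick up to k candidate devices (uniform sampling is an assumed policy)."""
    pool = contribution_devices(statuses)
    if rng is None:
        rng = random.Random()
    return sorted(rng.sample(pool, min(k, len(pool))))


statuses = [
    DeviceStatus("e1", True, True),
    DeviceStatus("e2", True, False),   # went offline: temporary device
    DeviceStatus("e3", False, True),   # no upload this round
    DeviceStatus("e4", True, True),
]
print(contribution_devices(statuses))
print(select_candidates(statuses, k=1, rng=random.Random(0)))
```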
After consensus is reached, the candidate list is delivered to the blockchain as a trigger for the self-aggregation smart contract and the contribution evaluation smart contract.
All devices in the same task channel invoke and trigger the corresponding smart contracts. Only the data contribution devices execute the self-aggregation smart contract, with the candidate device list as input. Unlike traditional federated learning, in which a central server performs the aggregation, here the devices aggregate by invoking the aggregation smart contract, whose input parameter is the list of participant IDs selected by the data requesting device. Gradient aggregation is then completed automatically and the aggregation result is uploaded to the blockchain for storage; operations that exceed a certain time limit are discarded. The aggregation formula is:
$$\bar{g}_t = \frac{1}{|S_t|} \sum_{i \in S_t} g(W_{i,t,\sigma})$$
All online devices can call the evaluation smart contract. Contribution evaluation first considers how much effective information a gradient contains, since more informative gradients speed up convergence; the contribution is then determined from the amount of effective information in the gradient. It is therefore necessary to find a suitable way to quantify this information content and map it to a reasonable contribution function, which can also serve as a basis for pricing if transactions are involved. To this end, the invention designs a novel contribution evaluation model based on gradient entropy. Entropy is a measure of the uncertainty of a variable; according to information entropy theory, the larger the entropy, the more information is contained, and the uploaded local gradients carry considerable uncertainty. On this principle, the contribution of each data device selected by the data requesting device is evaluated using the gradient-entropy-based contribution evaluation model, expressed as:
where $C_{i,t}$ denotes the contribution evaluation value of the data device in the $t$-th round, $E_{i,t}(g(W_{i,t,\sigma}))$ denotes the amount of information in the local gradient $g(W_{i,t,\sigma})$ uploaded by the data device in the $t$-th round, and $S_t$ denotes the set of device IDs of the data devices selected by the data requesting device in the $t$-th round.
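Because the formula itself did not survive in this text, the sketch below is only a guess at what an entropy-based contribution score could look like. It assumes that the information amount $E_{i,t}$ is the Shannon entropy of a histogram of the gradient's components, and that $C_{i,t}$ is each device's share of the total entropy over $S_t$. Both choices, the bin count and the function names are hypothetical.

```python
import math
from typing import Dict, List


def gradient_entropy(gradient: List[float], bins: int = 10) -> float:
    """Shannon entropy (in bits) of a histogram of the gradient components."""
    lo, hi = min(gradient), max(gradient)
    if hi == lo:
        return 0.0  # all components identical: no uncertainty
    counts = [0] * bins
    for x in gradient:
        idx = min(int((x - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(gradient)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def contributions(gradients: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed normalization: each device's entropy divided by the total."""
    entropy = {device: gradient_entropy(g) for device, g in gradients.items()}
    total = sum(entropy.values())
    if total == 0.0:
        return {device: 1.0 / len(entropy) for device in entropy}
    return {device: e / total for device, e in entropy.items()}


# Hypothetical noised gradients from two selected devices.
grads = {"e1": [0.0, 1.0], "e2": [0.5, 0.5]}
print(contributions(grads))
```

Token rewards could then be made proportional to these scores, although the text does not fix the reward rule.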
Based on the evaluation, the data requesting device awards a certain number of tokens to the candidate device.
In the present invention, the evaluation process is programmed into the contribution evaluation smart contract, which enhances the fairness and security of the system owing to the nature of smart contracts.
In an alternative embodiment of the present invention, the blockchain network consists of nodes, which are mainly the various devices, and smart contracts. System users can issue task requests to one another through the network, which is also responsible for recording internal transactions, training models and updating models. The blockchain network deploys a number of smart contracts, including basic contracts for registration, request and response operations, as well as aggregation contracts, update contracts and the like that support federated learning. Federated learning comprises local federated learning and global federated learning: the local federated learning module is embedded in the device, and the global federated learning module consists of the self-aggregation smart contract, the evaluation smart contract, the latest-model smart contract and the like.
Device $e_i$ maintains the global model locally and updates it using a stochastic gradient algorithm (SGA). However, a device may suddenly go offline in the $t$-th round for some reason, fail to upload its gradient, and then resume communication at a random later time; such a device is referred to as a temporary device. The mismatch between the latest global model stored on the blockchain and the gradients such devices computed while offline can dilute the learning. Traditional federated learning simply excludes these participants to ensure the reliability of training, but this results in insufficient training data.
To solve this problem, the present invention designs a dynamic learning mechanism through equation (6). For a normal device $e_i$ in the $t$-th round, the global model is updated by the SGA using the following equation:
$$W_{t+1} = W_t - \eta\,\bar{g}_t$$
where $W_{t+1}$ denotes the updated global model parameters, $W_t$ denotes the current global model parameters, $\eta$ denotes the learning rate of the current round, and $\bar{g}_t$ denotes the average gradient of the $t$-th round.
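The update for normal devices can be sketched as below, assuming the usual descent form $W_{t+1} = W_t - \eta\,\bar{g}_t$ (the minus sign is an assumption consistent with minimizing the loss). Parameters are flat lists and the function name is hypothetical.

```python
from typing import List


def sga_update(global_params: List[float], avg_gradient: List[float],
               learning_rate: float) -> List[float]:
    """One stochastic-gradient step on the locally maintained global model."""
    if len(global_params) != len(avg_gradient):
        raise ValueError("parameter and gradient dimensions differ")
    return [w - learning_rate * g for w, g in zip(global_params, avg_gradient)]


# Hypothetical values for W_t, the averaged gradient and the learning rate.
w_next = sga_update([1.0, 2.0], [0.5, -1.0], learning_rate=0.1)
print(w_next)
```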
For a temporary device $e_i$ that contributed no gradient in the $t$-th round, in order to keep the last global model stored locally consistent while still following the federated learning update process, the device first invokes the latest-model smart contract to obtain the latest aggregation result of round $t$ and then updates its global model using the following equation:
$$W_{t+1} = (1 - \alpha_i(t))\,W_{i,\text{old}} + \alpha_i(t)\,W_{\text{new}}$$
where $W_{t+1}$ denotes the updated global model parameters, $\alpha_i(t)$ denotes the latest weight coefficient, $\Delta t$ denotes the difference between the last update time and the last training time, $W_{i,\text{old}}$ denotes the latest global model parameters stored by the temporary device, and $W_{\text{new}}$ denotes the latest global model parameters of the current round.
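The interpolation for temporary devices can be sketched as follows. How $\alpha_i(t)$ depends on $\Delta t$ is not given in this text, so the weight is simply passed in as an argument; the function name and the example values are hypothetical.

```python
from typing import List


def temporary_device_update(w_old: List[float], w_new: List[float],
                            alpha: float) -> List[float]:
    """W_{t+1} = (1 - alpha) * W_old + alpha * W_new, element by element."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(w_old) != len(w_new):
        raise ValueError("model dimensions differ")
    return [(1.0 - alpha) * o + alpha * n for o, n in zip(w_old, w_new)]


# Hypothetical stale local copy and latest on-chain model.
stale = [0.0, 2.0]
latest = [1.0, 4.0]
print(temporary_device_update(stale, latest, alpha=0.5))
```

Setting `alpha` close to 1 makes the device adopt the on-chain model almost entirely, while a small `alpha` keeps more of the stale local copy.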
Based on the dynamic learning mechanism designed by the invention, the data contribution devices update their global models from the self-aggregation result using the SGA. The new global model is then sent to the blockchain to reach consensus and serves as the training model for the next iteration. Temporary devices first invoke the latest-model smart contract to download the latest global model delivered by the contributors and then update their own global model, reducing the mismatch by trading off between the latest model and the local model.
In an alternative embodiment of the invention, the data sharing event between the data requesting device and the data providing device is generated in the form of a transaction and broadcast in the blockchain network.
Example 2
As shown in fig. 4, the embodiment of the present invention further provides a four-network fusion data sharing method based on blockchain and federal learning, which is applied to the system described in embodiment 1, and includes the following steps:
S1, sending, by the data device, a device registration request to the certificate authority server;
S2, responding, by the certificate authority server, to the device registration request sent by the data device, verifying the identity of the data device, and distributing a certificate, a public key and a private key to each data device that passes verification;
S3, joining the data device to the blockchain network according to the acquired certificate;
S4, transmitting, by the blockchain network, an initial global model to the newly joined device;
S5, training, by the data device, a local model with differential privacy added using its local data set, and invoking a smart contract deployed in the blockchain network to upload the local gradient;
S6, by the blockchain network, based on a scheme combining blockchain and federated learning: dynamically selecting contributors according to the online state of the data devices, and triggering the self-aggregation smart contract and the contribution evaluation smart contract according to the local gradients uploaded by the data devices; updating the global model of normal devices using a stochastic gradient algorithm according to the self-aggregation result, and recording sharing behaviors; and invoking the latest-model smart contract to obtain the latest aggregation result and the latest global model so as to update the global model of temporary devices.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principles and embodiments of the present invention have been described in detail with reference to specific examples, which are provided only to facilitate understanding of the method and core ideas of the invention. Because those skilled in the art may vary the specific embodiments and the scope of application in accordance with the ideas of the invention, this description should not be construed as limiting the invention.
Those of ordinary skill in the art will recognize that the embodiments described herein are for the purpose of aiding the reader in understanding the principles of the present invention and should be understood that the scope of the invention is not limited to such specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations from the teachings of the present disclosure without departing from the spirit thereof, and such modifications and combinations remain within the scope of the present disclosure.
Claims (9)
1. A four-network converged data sharing system based on blockchain and federal learning, comprising:
a data device, configured to send a device registration request to the certificate authority server so as to acquire the certificate, public key and private key distributed by the certificate authority server; to join the blockchain network according to the acquired certificate and receive the initial global model sent by the blockchain network; and to train a local model with differential privacy added using a local data set and invoke a smart contract deployed in the blockchain network to upload the local gradient;
a certificate authority server, configured to respond to the device registration request sent by the data device, verify the identity of the data device, and distribute a certificate, a public key and a private key to each data device that passes verification;
a blockchain network, which consists of the data devices as nodes and is configured to transmit the initial global model to newly joined devices; to dynamically select contributors according to the online state of the data devices, based on a scheme combining blockchain and federated learning, and trigger the self-aggregation smart contract and the contribution evaluation smart contract according to the local gradients uploaded by the data devices; to update the global model of normal devices using a stochastic gradient algorithm according to the self-aggregation result and record sharing behaviors; and to invoke the latest-model smart contract to obtain the latest aggregation result and the latest global model so as to update the global model of temporary devices.
2. The four-network converged data sharing system based on blockchain and federal learning of claim 1, wherein the data devices are divided into data requesting devices and data providing devices according to sharing behaviors, wherein the devices requesting data from the sharing system are data requesting devices, and the devices responding to the data requests are data providing devices;
the data request device is used for sending a data request to the sharing system, wherein the data request comprises a device ID, a request effective duration, a transaction budget, an encrypted initial global model and request state information; selecting a part of transactions according to the transaction budget based on response information of the data providing devices, and sending a Fabric channel configuration to the responding data providing devices, wherein the Fabric channel configuration allows all responding data providing devices to join the same task channel to form a temporary collaboration group;
the data providing device is used for verifying the data request sent by the received data request device, searching the historical record of the data request device according to the record on the blockchain network, sending response information to the data request device, and proposing a transaction through the response information; the response information includes a device ID, a data size, a duration of operation of the device, computing power, and public key information of the device.
3. The four-network converged data sharing system based on blockchain and federal learning of claim 2, wherein the data device is specifically configured to:
the data providing device decrypts the initial global model sent by the blockchain network by utilizing the public key of the data requesting device, verifies the initial global model by comparing the Hash value of the initial global model, and trains the initial global model passing the verification;
and adding Gaussian differential privacy to the training parameters using the local data set of the data providing device, training the local model with a federated averaging (FedAvg) algorithm based on Gaussian differential privacy, and training the federated learning model with an adaptive clipping method.
4. The four-network converged data sharing system based on blockchain and federal learning of claim 2, wherein the data device is specifically configured to:
the data providing device calculates the local gradient using the following formula:
$$g_t = \nabla L(W_{i,t}, D_i)$$
where $g_t$ denotes the local gradient of the $t$-th round, $L(W_{i,t}, D_i)$ denotes the loss function, and $W_{i,t}$ denotes the local model parameters of device $e_i$ trained on the local data set $D_i$ in the $t$-th round;
and invoking intelligent contracts with digital signatures deployed in a blockchain network to upload local gradients;
the data requesting device verifies the legitimacy of the data providing device by verifying the digital signature after receiving the transaction for the gradient, and packages the transaction into blocks and broadcasts to other nodes after the verification is passed.
5. The four-network converged data sharing system based on blockchain and federal learning of claim 2, wherein the data device is specifically configured to:
the data requesting device selects, according to the online state of the data devices, a subset of the data devices that successfully uploaded a local gradient in the current round of training and remained online, and aggregates the local gradients uploaded by the selected data devices in the current round using the following formula:
$$\bar{g}_t = \frac{1}{|S_t|} \sum_{i \in S_t} g(W_{i,t,\sigma})$$
6. The four-network converged data sharing system based on blockchain and federal learning of claim 2, wherein the blockchain network is specifically configured to:
evaluating the contribution of the data device selected by the data requesting device using a gradient entropy based contribution evaluation model, expressed as:
where $C_{i,t}$ denotes the contribution evaluation value of the data device in the $t$-th round, $E_{i,t}(g(W_{i,t,\sigma}))$ denotes the amount of information in the local gradient $g(W_{i,t,\sigma})$ uploaded by the data device in the $t$-th round, and $S_t$ denotes the set of device IDs of the data devices selected by the data requesting device in the $t$-th round.
7. The four-network converged data sharing system based on blockchain and federal learning of claim 2, wherein the blockchain network is specifically configured to:
updating the global model of normal devices using a stochastic gradient algorithm according to the self-aggregation result, expressed as:
$$W_{t+1} = W_t - \eta\,\bar{g}_t$$
8. The four-network converged data sharing system based on blockchain and federal learning of claim 2, wherein the blockchain network is specifically configured to:
invoking the latest-model smart contract to obtain the latest aggregation result and the latest global model so as to update the global model of temporary devices, expressed as:
$$W_{t+1} = (1 - \alpha_i(t))\,W_{i,\text{old}} + \alpha_i(t)\,W_{\text{new}}$$
where $W_{t+1}$ denotes the updated global model parameters, $\alpha_i(t)$ denotes the latest weight coefficient, $W_{i,\text{old}}$ denotes the latest global model parameters stored by the temporary device, and $W_{\text{new}}$ denotes the latest global model parameters of the current round.
9. A four-network fusion data sharing method based on blockchain and federal learning applied to the system of claim 1, comprising the steps of:
S1, sending, by the data device, a device registration request to the certificate authority server;
S2, responding, by the certificate authority server, to the device registration request sent by the data device, verifying the identity of the data device, and distributing a certificate, a public key and a private key to each data device that passes verification;
S3, joining the data device to the blockchain network according to the acquired certificate;
S4, transmitting, by the blockchain network, an initial global model to the newly joined device;
S5, training, by the data device, a local model with differential privacy added using its local data set, and invoking a smart contract deployed in the blockchain network to upload the local gradient;
S6, by the blockchain network, based on a scheme combining blockchain and federated learning: dynamically selecting contributors according to the online state of the data devices, and triggering the self-aggregation smart contract and the contribution evaluation smart contract according to the local gradients uploaded by the data devices; updating the global model of normal devices using a stochastic gradient algorithm according to the self-aggregation result, and recording sharing behaviors; and invoking the latest-model smart contract to obtain the latest aggregation result and the latest global model so as to update the global model of temporary devices.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310342414.1A CN116389478A (en) | 2023-03-31 | 2023-03-31 | Four-network fusion data sharing method based on blockchain and federal learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310342414.1A CN116389478A (en) | 2023-03-31 | 2023-03-31 | Four-network fusion data sharing method based on blockchain and federal learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116389478A true CN116389478A (en) | 2023-07-04 |
Family
ID=86966997
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310342414.1A Pending CN116389478A (en) | 2023-03-31 | 2023-03-31 | Four-network fusion data sharing method based on blockchain and federal learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116389478A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117472866A (en) * | 2023-12-27 | 2024-01-30 | 齐鲁工业大学(山东省科学院) | Federal learning data sharing method under block chain supervision and excitation |
CN117472866B (en) * | 2023-12-27 | 2024-03-19 | 齐鲁工业大学(山东省科学院) | Federal learning data sharing method under block chain supervision and excitation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||