CN116389478A - Four-network fusion data sharing method based on blockchain and federal learning - Google Patents

Info

Publication number
CN116389478A
Authority
CN
China
Prior art keywords
data
equipment
global model
blockchain
local
Prior art date
Legal status
Pending
Application number
CN202310342414.1A
Other languages
Chinese (zh)
Inventor
冯卫东
王爱丽
刘宇
耿欣
黎琳
常晓琳
鲁放
Current Assignee
China Railway Information Technology Group Co ltd
China State Railway Group Co Ltd
Original Assignee
China Railway Information Technology Group Co ltd
China State Railway Group Co Ltd
Priority date
Filing date
Publication date
Application filed by China Railway Information Technology Group Co ltd and China State Railway Group Co Ltd
Priority to CN202310342414.1A
Publication of CN116389478A
Legal status: Pending (current)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0823 Network architectures or network communication protocols for network security for authentication of entities using certificates
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3247 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials involving digital signatures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Medical Informatics (AREA)
  • Finance (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Accounting & Taxation (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Strategic Management (AREA)
  • Evolutionary Biology (AREA)
  • Technology Law (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a four-network fusion data sharing system based on blockchain and federated learning, comprising data devices, a certificate authority server, and a blockchain network. Using federated learning, the blockchain network updates the global model of normal devices with a stochastic gradient algorithm according to the self-aggregation result and records sharing behaviors; it also calls the latest-model smart contract to obtain the latest aggregation result and the latest global model, which are used to update the global model of temporary devices. Because federated learning is integrated into the blockchain network, participants are not required to stay online at all times, which overcomes a limitation of traditional federated learning schemes. The method and system therefore fit practical scenarios better, can improve the efficiency and security of data sharing and fusion, and make transaction processing intelligent.

Description

Four-network fusion data sharing method based on blockchain and federal learning
Technical Field
The invention relates to the technical field of four-network fusion data sharing, and in particular to a four-network fusion data sharing method based on blockchain and federated learning.
Background
Four-network integration of rail transit means integrating the trunk railway network, the intercity railway network, the urban (suburban) railway network, and the urban rail transit network, so as to build metropolitan areas "on rails" and promote their development. To improve the connectivity of rail transit and build a modern system with precise functional positioning, a clear network hierarchy, and efficient connections, more accurate prediction of passenger travel trajectories and spatio-temporal analysis of passenger flow are required, so that the service quality and passenger travel experience of the trunk railway network, intercity railway network, urban (suburban) railway network, and urban rail transit can be improved through reasonable transfer planning, train dispatching, and similar measures. However, the data of these four networks is highly privacy-sensitive, and some of it is even confidential. Because such data is difficult to share directly, data silos form and the networks cannot work cooperatively.
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides a four-network fusion data sharing method based on blockchain and federated learning.
To achieve this objective, the invention adopts the following technical solution:
A four-network fusion data sharing system based on blockchain and federated learning, comprising:
a data device, configured to send a device registration request to a certificate authority server in order to obtain the certificate, public key, and private key issued by that server; to join the blockchain network using the obtained certificate and receive the initial global model sent by the blockchain network; and to train, on its local data set, a local model with differential privacy added, and call a smart contract deployed in the blockchain network to upload its local gradient;
the certificate authority server, configured to respond to the device registration request sent by the data device, verify the identity of the data device, and issue a certificate, a public key, and a private key to each data device that passes verification;
a blockchain network, in which each data device is a node, responsible for sending the initial global model to newly joined devices. Based on a scheme combining blockchain and federated learning, the blockchain network dynamically selects contributors according to the online status of the data devices and triggers the self-aggregation smart contract and the contribution evaluation smart contract according to the local gradients uploaded by the data devices; updates the global model of normal devices with a stochastic gradient algorithm according to the self-aggregation result and records the sharing behaviors; and calls the latest-model smart contract to obtain the latest aggregation result and the latest global model, which are used to update the global model of temporary devices.
Optionally, the data devices are divided by sharing behavior into data requesting devices and data providing devices: a device that requests data from the sharing system is a data requesting device, and a device that responds to a data request is a data providing device;
the data requesting device is configured to send a data request to the sharing system, the data request comprising a device ID, a request validity duration, a transaction budget, an encrypted initial global model, and request status information; and, based on the response information from the data providing devices, to select some of the proposed transactions within the transaction budget and send a Fabric channel configuration to the selected data providing devices, the configuration allowing all of them to join the same task channel and form a temporary collaboration group;
the data providing device is configured to verify a data request received from a data requesting device, look up the history of that data requesting device in the records on the blockchain network, and send response information to the data requesting device, thereby proposing a transaction; the response information includes the device ID, data size, device operating duration, computing power, and public key information of the device.
Optionally, the data device is specifically configured to:
the data providing device decrypts the initial global model sent by the blockchain network using the public key of the data requesting device, verifies the model by comparing its hash value, and trains the initial global model that passes verification;
using its local data set, the data providing device adds Gaussian differential privacy to the training parameters, trains the local model with a federated averaging algorithm based on Gaussian differential privacy, and trains the federated learning model with an adaptive clipping method.
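The hash comparison in the first step can be sketched as follows. This is a minimal illustration, not the patented implementation: the patent names neither a hash function nor a serialization format, so SHA-256 over a canonical JSON encoding and the helper names `model_digest` and `verify_initial_model` are assumptions, and the decryption step is omitted.

```python
import hashlib
import hmac
import json


def model_digest(params: list[float]) -> str:
    """Hypothetical helper: SHA-256 over a canonical JSON encoding of the parameters."""
    encoded = json.dumps(params, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def verify_initial_model(params: list[float], expected_digest: str) -> bool:
    """Accept the received model only if its digest matches the expected one."""
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(model_digest(params), expected_digest)


received = [0.1, -0.2, 0.3]
published = model_digest(received)  # assumed to be published with the request
print(verify_initial_model(received, published))           # True
print(verify_initial_model([0.1, -0.2, 0.31], published))  # False
```

Only a model whose digest matches is passed on to local training; any change to a single parameter produces a different digest.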
Optionally, the data device is specifically configured to:
the data providing device calculates the local gradient using the following formula:
$$g_t = \nabla L(W_{i,t}, D_i)$$
where $g_t$ denotes the local gradient of round $t$, $L(W_{i,t}, D_i)$ denotes the loss function, and $W_{i,t}$ denotes the local model parameters of device $e_i$ trained on its local data set $D_i$ in round $t$;
and invokes a smart contract with a digital signature deployed in the blockchain network to upload the local gradient;
after receiving the gradient transaction, the data requesting device verifies the legitimacy of the data providing device by checking the digital signature; once verification passes, it packages the transaction into a block and broadcasts it to the other nodes.
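To make $g_t = \nabla L(W_{i,t}, D_i)$ concrete, the sketch below evaluates it for one assumed choice of model and loss: a linear model with mean-squared-error loss, whose gradient has a closed form. The patent does not fix the model or the loss, so `local_gradient` and the data layout are illustrative assumptions.

```python
def local_gradient(w: list[float], dataset: list[tuple[list[float], float]]) -> list[float]:
    """g_t = grad L(W_{i,t}, D_i) with L = mean squared error of a linear model (assumed).

    For L(w) = (1/n) * sum_k (w . x_k - y_k)^2 the gradient is
    (2/n) * sum_k (w . x_k - y_k) * x_k.
    """
    if not dataset:
        raise ValueError("local data set D_i is empty")
    grad = [0.0] * len(w)
    for x, y in dataset:
        residual = sum(wj * xj for wj, xj in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * residual * xj
    n = len(dataset)
    return [gj / n for gj in grad]


w_it = [0.0, 0.0]
d_i = [([1.0, 2.0], 1.0), ([2.0, 0.0], 2.0)]
print(local_gradient(w_it, d_i))  # [-5.0, -2.0]
```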
Optionally, the data device is specifically configured to:
according to the online status of the data devices, the data requesting device selects a subset of the data devices that successfully uploaded a local gradient in the current training round and have remained online, and aggregates the local gradients uploaded by the selected devices in the current round as follows:
$$\bar{g}_t = \frac{1}{|S_t|} \sum_{i \in S_t} g(W_{i,t,\sigma})$$
where $\bar{g}_t$ denotes the average gradient of round $t$, $S_t$ denotes the set of device IDs of the data devices selected by the data requesting device in round $t$, and $g(W_{i,t,\sigma})$ denotes the local gradient of round $t$ with Gaussian differential privacy added.
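The selection and averaging step can be sketched as below. The sketch assumes equal weights, $\bar{g}_t = \frac{1}{|S_t|}\sum_{i \in S_t} g(W_{i,t,\sigma})$; the text does not say how the requester picks the subset, so the random sampling in `select_contributors` (and its `max_selected` parameter) is a hypothetical policy.

```python
import random


def select_contributors(uploaded: list[str], online: list[str],
                        max_selected: int, rng: random.Random) -> set[str]:
    """Hypothetical policy: sample S_t from devices that uploaded this round and stayed online."""
    eligible = sorted(set(uploaded) & set(online))
    k = min(max_selected, len(eligible))
    return set(rng.sample(eligible, k))


def aggregate(local_grads: dict[str, list[float]], selected: set[str]) -> list[float]:
    """Equal-weight average of the noisy local gradients g(W_{i,t,sigma}) over S_t."""
    if not selected:
        raise ValueError("S_t is empty")
    chosen = [local_grads[d] for d in sorted(selected)]
    dim = len(chosen[0])
    return [sum(g[j] for g in chosen) / len(chosen) for j in range(dim)]


grads = {"e1": [1.0, 2.0], "e2": [3.0, 4.0], "e3": [9.0, 9.0]}
s_t = select_contributors(["e1", "e2", "e3"], ["e1", "e2"], 5, random.Random(0))
print(sorted(s_t), aggregate(grads, s_t))  # ['e1', 'e2'] [2.0, 3.0]
```

Device `e3` uploaded a gradient but is not online, so it is excluded from $S_t$ and does not affect the average.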
Optionally, the blockchain network is specifically configured to:
evaluate the contribution of the data devices selected by the data requesting device using a contribution evaluation model based on gradient entropy, expressed as:
Figure BDA0004158416780000042
where $C_{i,t}$ denotes the contribution evaluation value of the data device in round $t$, $E_{i,t}(g(W_{i,t,\sigma}))$ denotes the information content of the local gradient $g(W_{i,t,\sigma})$ uploaded by the data device in round $t$, and $S_t$ denotes the set of device IDs of the data devices selected by the data requesting device in round $t$.
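The contribution formula itself survives only as an image reference, so the sketch below is an assumption and not the patent's definition: it measures the information content $E_{i,t}$ as the Shannon entropy (in bits) of a histogram of the gradient's components, and normalizes each device's entropy by the total over $S_t$. The bin count and the helper names are hypothetical.

```python
import math


def gradient_entropy(grad: list[float], bins: int = 10) -> float:
    """Assumed E_{i,t}: Shannon entropy (bits) of a histogram of the gradient components."""
    lo, hi = min(grad), max(grad)
    if hi == lo:
        return 0.0  # all components fall into a single bin
    width = (hi - lo) / bins
    counts = [0] * bins
    for g in grad:
        counts[min(int((g - lo) / width), bins - 1)] += 1
    n = len(grad)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


def contributions(grads: dict[str, list[float]], selected: set[str],
                  bins: int = 10) -> dict[str, float]:
    """Assumed C_{i,t}: each device's entropy divided by the total entropy over S_t."""
    if not selected:
        return {}
    ent = {d: gradient_entropy(grads[d], bins) for d in sorted(selected)}
    total = sum(ent.values())
    if total == 0.0:
        return {d: 1.0 / len(ent) for d in ent}  # no information anywhere: split evenly
    return {d: e / total for d, e in ent.items()}


g = {"e1": [0.0, 0.25, 0.5, 1.0], "e2": [2.0, 2.0, 2.0, 2.0]}
print(contributions(g, {"e1", "e2"}, bins=4))  # {'e1': 1.0, 'e2': 0.0}
```

Normalizing makes the values sum to 1 within a round, which is convenient if they are later used for leader selection or reward sharing.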
Optionally, the blockchain network is specifically configured to:
update the global model of normal devices with a stochastic gradient algorithm according to the self-aggregation result, expressed as:
$$W_{t+1} = W_t - \eta \, \bar{g}_t$$
where $W_{t+1}$ denotes the updated global model parameters, $W_t$ denotes the current global model parameters, $\eta$ denotes the learning rate of the current round, and $\bar{g}_t$ denotes the average gradient of round $t$.
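Given these definitions, the update is one gradient-descent step $W_{t+1} = W_t - \eta\,\bar{g}_t$ on the aggregated gradient. A minimal sketch with plain Python lists (the function name `sgd_update` is illustrative):

```python
def sgd_update(w_t: list[float], g_bar: list[float], eta: float) -> list[float]:
    """W_{t+1} = W_t - eta * g_bar_t, applied component-wise."""
    if len(w_t) != len(g_bar):
        raise ValueError("parameter and gradient dimensions differ")
    return [w - eta * g for w, g in zip(w_t, g_bar)]


print(sgd_update([1.0, 2.0], [0.5, -1.0], eta=0.5))  # [0.75, 2.5]
```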
Optionally, the blockchain network is specifically configured to:
call the latest-model smart contract to obtain the latest aggregation result and the latest global model, and update the global model of a temporary device as follows:
$$W_{t+1} = (1 - \alpha_i(t))\, W_{i,\mathrm{old}} + \alpha_i(t)\, W_{\mathrm{new}}$$
where $W_{t+1}$ denotes the updated global model parameters, $\alpha_i(t)$ denotes the latest weight coefficient, $W_{i,\mathrm{old}}$ denotes the most recent global model parameters stored by the temporary device, and $W_{\mathrm{new}}$ denotes the latest global model parameters of the current round.
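A sketch of the temporary-device update follows. The patent does not say how $\alpha_i(t)$ is computed; `staleness_weight`, which shifts more weight onto $W_{\mathrm{new}}$ the longer a device has been offline, is a hypothetical schedule included only to show where such a rule would plug in.

```python
def staleness_weight(rounds_offline: int, decay: float = 0.5) -> float:
    """Hypothetical alpha_i(t) = 1 - decay**rounds_offline: longer absences trust W_new more."""
    if rounds_offline < 0 or not 0.0 < decay < 1.0:
        raise ValueError("need rounds_offline >= 0 and 0 < decay < 1")
    return 1.0 - decay ** rounds_offline


def catch_up(w_old: list[float], w_new: list[float], alpha: float) -> list[float]:
    """W_{t+1} = (1 - alpha) * W_{i,old} + alpha * W_new, applied component-wise."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * o + alpha * n for o, n in zip(w_old, w_new)]


alpha = staleness_weight(2)                      # 0.75 after two missed rounds
print(catch_up([0.0, 4.0], [2.0, 0.0], alpha))   # [1.5, 1.0]
```

Because the result is a convex combination, the temporary device never overshoots either its stored model or the latest global model.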
A four-network fusion data sharing method based on blockchain and federated learning, applied to the above system, comprises the following steps:
S1, a data device sends a device registration request to the certificate authority server;
S2, the certificate authority server responds to the device registration request sent by the data device, verifies the identity of the data device, and issues a certificate, a public key, and a private key to each data device that passes verification;
S3, the data device joins the blockchain network using the obtained certificate;
S4, the blockchain network sends the initial global model to the newly joined device;
S5, the data device trains, on its local data set, a local model with differential privacy added, and calls a smart contract deployed in the blockchain network to upload its local gradient;
S6, based on the scheme combining blockchain and federated learning, the blockchain network dynamically selects contributors according to the online status of the data devices and triggers the self-aggregation smart contract and the contribution evaluation smart contract according to the uploaded local gradients; it updates the global model of normal devices with a stochastic gradient algorithm according to the self-aggregation result and records the sharing behaviors; and it calls the latest-model smart contract to obtain the latest aggregation result and the latest global model, which are used to update the global model of temporary devices.
The invention has the following beneficial effects:
By integrating federated learning into the blockchain network, the invention does not require participants to stay online at all times, overcoming the requirement of traditional federated learning that devices remain online throughout. The method and system therefore fit practical scenarios better, can improve the efficiency and security of data sharing and fusion, and make transaction processing intelligent.
Drawings
FIG. 1 is a schematic diagram of a four-network fusion data sharing system based on blockchain and federated learning according to an embodiment of the present invention;
FIG. 2 is a timing diagram of a four-network fusion data sharing system based on blockchain and federated learning according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the workflow of a four-network fusion data sharing system based on blockchain and federated learning according to an embodiment of the present invention;
FIG. 4 is a flow chart of a four-network fusion data sharing method based on blockchain and federated learning according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below to help those skilled in the art understand the invention. It should be understood, however, that the invention is not limited to the scope of these embodiments: to those of ordinary skill in the art, any variation that makes use of the inventive concept falls within the spirit and scope of the invention as defined by the appended claims.
Federated learning is an emerging foundational AI technology for carrying out efficient machine learning among multiple participants or computing nodes while ensuring information security during big-data exchange, protecting terminal data and personal privacy, and ensuring legal compliance.
As a decentralized, encrypted, tamper-resistant distributed shared database, a blockchain can provide confidentiality for the data exchanged in federated learning, protecting user privacy and data security among all participants. It can also guarantee the consistency of the data that multiple participants contribute to model training, and its value-driven incentive mechanism can motivate participants to provide data and update the network model parameters.
Combining federated learning with blockchain meets the needs of certain scenarios that require cooperative work and data sharing among the departments of the four networks, such as predicting passenger travel patterns and diagnosing railway equipment faults. It breaks down data silos, avoids the poor model generalization that results from insufficient data features, and does not leak private data, which is significant for the interconnection of four-network data and its safe and efficient use.
The invention integrates federated learning into the blockchain and does not require participants to stay online at all times, overcoming the requirement of traditional federated learning that devices remain online throughout. It thus fits practical scenarios better, can improve the efficiency and security of data sharing and fusion, and makes transaction processing intelligent.
Example 1
As shown in FIG. 1 to FIG. 3, an embodiment of the present invention provides a four-network fusion data sharing system based on blockchain and federated learning, comprising:
a data device, configured to send a device registration request to a certificate authority server in order to obtain the certificate, public key, and private key issued by that server; to join the blockchain network using the obtained certificate and receive the initial global model sent by the blockchain network; and to train, on its local data set, a local model with differential privacy added, and call a smart contract deployed in the blockchain network to upload its local gradient;
the certificate authority server, configured to respond to the device registration request sent by the data device, verify the identity of the data device, and issue a certificate, a public key, and a private key to each data device that passes verification;
a blockchain network, in which each data device is a node, responsible for sending the initial global model to newly joined devices. Based on a scheme combining blockchain and federated learning, the blockchain network dynamically selects contributors according to the online status of the data devices and triggers the self-aggregation smart contract and the contribution evaluation smart contract according to the local gradients uploaded by the data devices; updates the global model of normal devices with a stochastic gradient algorithm according to the self-aggregation result and records the sharing behaviors; and calls the latest-model smart contract to obtain the latest aggregation result and the latest global model, which are used to update the global model of temporary devices.
In an optional embodiment of the present invention, the data devices are divided by sharing behavior into data requesting devices and data providing devices: a device that requests data from the sharing system is a data requesting device, and a device that responds to a data request is a data providing device;
the data requesting device is configured to send a data request to the sharing system; and, based on the response information from the data providing devices, to select some of the proposed transactions within the transaction budget and send a Fabric channel configuration to the selected data providing devices, allowing all of them to join the same task channel and form a temporary collaboration group;
the data providing device is configured to verify a data request received from a data requesting device, look up the history of that data requesting device in the records on the blockchain network, and send response information to the data requesting device, thereby proposing a transaction.
The data request includes a device ID, a request validity duration, a transaction budget, an encrypted initial global model, and request status information.
The response information includes the device ID, data size, device operating duration, computing power, and public key information of the device.
Specifically, a data device in this embodiment is any device deployed in the network that has some data processing and storage capability and can request data from, or provide data to, other users. For example, if the system is used for equipment risk and fault prediction, the devices may be sensors; if it is used for predicting passenger travel patterns, the devices may be data servers. Each device is both a component of the blockchain network and a user of the system. Devices with abundant computing resources may be selected as full nodes, and devices with limited computing resources as client nodes of the blockchain network.
To ensure security, the system is built on a consortium (permissioned) blockchain, which only authorized users can join. Before joining the system, device $e_i$ must apply for a certificate from the CA server, which is responsible for verifying the applicant's identity and issuing certificates. A device that passes verification can join the blockchain network. After joining, $e_i$ chooses to be either a full node or a light node (i.e. a client) according to its own computing power. Only full nodes have the right to package and verify blocks.
When registering, a device must stake a certain number of tokens as collateral; if it engages in malicious behavior, the staked tokens are forfeited. Each device should also hold local data, referred to as its local data set.
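The admission rules above (permissioned membership, a forfeitable token stake, and role selection by computing power) can be modeled in a few lines. This is a toy sketch under stated assumptions: an allowlist stands in for the CA's identity check, certificates and keys are not modeled, and `Registry`, `min_stake`, and `full_node_threshold` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Member:
    device_id: str
    role: str   # "full" (may package and verify blocks) or "light" (client)
    stake: int  # tokens deposited at registration


class Registry:
    """Toy model of consortium-chain admission; the allowlist stands in for the CA check."""

    def __init__(self, allowed_ids: set[str], min_stake: int, full_node_threshold: float):
        self.allowed_ids = set(allowed_ids)
        self.min_stake = min_stake
        self.full_node_threshold = full_node_threshold
        self.members: dict[str, Member] = {}

    def register(self, device_id: str, stake: int, compute_power: float) -> Member:
        if device_id not in self.allowed_ids:
            raise PermissionError(f"{device_id} failed identity verification")
        if stake < self.min_stake:
            raise ValueError("stake below the required minimum")
        role = "full" if compute_power >= self.full_node_threshold else "light"
        member = Member(device_id, role, stake)
        self.members[device_id] = member
        return member

    def slash(self, device_id: str) -> int:
        """Forfeit the whole stake of a device caught misbehaving; return the amount."""
        member = self.members[device_id]
        forfeited, member.stake = member.stake, 0
        return forfeited

    def can_package_blocks(self, device_id: str) -> bool:
        return self.members[device_id].role == "full"


reg = Registry({"e1", "e2"}, min_stake=10, full_node_threshold=5.0)
print(reg.register("e1", stake=10, compute_power=8.0).role)  # full
print(reg.register("e2", stake=20, compute_power=1.0).role)  # light
```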
Every entity that successfully joins and registers with the blockchain is a system user and can take on different roles according to its behavior. For example, a user that requests data from the system is called a data requester. Data requester $e_j$ prepares to issue a data request $R$ to the system; $R$ contains the device ID, the request validity duration, the transaction budget, the encrypted initial global model, the request status, and so on. After verification by the consensus nodes, the request is recorded in the blockchain as a transaction and stored in the state database.
A user that responds to a data request is called a data provider. Upon receiving data request $R$, a data provider first validates the request and searches the requester's history in the records on the blockchain. It then sends response information, through which it proposes a transaction. The response information includes the device ID, data size, device operating duration, computing power, public key, and so on.
After receiving the responses, the data requester selects some of the proposed transactions within its budget and sends the corresponding providers a Fabric channel configuration that lets them join the same task channel, forming a temporary collaboration group. A Fabric channel is essentially a private atomic broadcast channel; it keeps transaction information from being disclosed to unauthorized nodes, so entities outside the channel cannot access the data inside it, which improves the privacy and security of transactions.
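The request and response messages, and the requester's budget-constrained choice of providers, might look as follows. The field lists follow the text above; the `price` field, the greedy ranking by data size per unit price, and all class and function names are hypothetical, since the patent does not say how transactions are priced or ranked. Creating the Fabric task channel itself is done with Hyperledger Fabric's own tooling and is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class DataRequest:
    device_id: str
    valid_seconds: int        # request validity duration
    budget: float             # transaction budget
    encrypted_model: bytes    # encrypted initial global model
    status: str = "open"      # request status


@dataclass
class DataResponse:
    device_id: str
    data_size: int
    uptime_hours: float       # device operating duration
    compute_power: float
    public_key: str
    price: float              # hypothetical: cost proposed in the transaction (> 0)


def select_providers(request: DataRequest, responses: list[DataResponse]) -> list[str]:
    """Greedy, hypothetical policy: best data size per unit price first, within budget."""
    ranked = sorted(responses, key=lambda r: r.data_size / r.price, reverse=True)
    chosen, spent = [], 0.0
    for r in ranked:
        if spent + r.price <= request.budget:
            chosen.append(r.device_id)
            spent += r.price
    return chosen


req = DataRequest("e9", valid_seconds=600, budget=10.0, encrypted_model=b"ciphertext-placeholder")
offers = [
    DataResponse("A", 100, 120.0, 2.0, "pkA", price=5.0),
    DataResponse("B", 300, 48.0, 4.0, "pkB", price=6.0),
    DataResponse("C", 50, 300.0, 1.0, "pkC", price=4.0),
]
print(select_providers(req, offers))  # ['B', 'C']
```

Greedy selection is simple but not optimal in general (it is a knapsack heuristic); the selected providers are the ones that would receive the channel configuration.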
Assuming the number of malicious nodes does not exceed the maximum tolerated under Byzantine fault tolerance, i.e. that the system is secure, each task channel selects some nodes with high contribution as leaders, which are responsible for consensus verification.
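The security assumption is the classical Byzantine fault tolerance bound: with $n$ consensus nodes, at most $f = \lfloor (n-1)/3 \rfloor$ may be malicious (equivalently $n \ge 3f + 1$). The sketch checks that bound and picks leaders by contribution; ranking by the contribution values and breaking ties by device ID are assumptions, since the text says only that nodes with high contribution are chosen.

```python
def max_faulty(n: int) -> int:
    """Largest f with n >= 3f + 1, the classical BFT tolerance bound."""
    if n < 1:
        raise ValueError("need at least one node")
    return (n - 1) // 3


def is_secure(n: int, malicious: int) -> bool:
    return malicious <= max_faulty(n)


def pick_leaders(contribution: dict[str, float], k: int) -> list[str]:
    """Hypothetical rule: top-k devices by contribution, ties broken by device ID."""
    ranked = sorted(contribution.items(), key=lambda kv: (-kv[1], kv[0]))
    return [device for device, _ in ranked[:k]]


print(max_faulty(7), is_secure(7, 2), is_secure(7, 3))     # 2 True False
print(pick_leaders({"e1": 0.2, "e2": 0.5, "e3": 0.3}, 2))  # ['e2', 'e3']
```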
In an optional embodiment of the present invention, the data device is specifically configured as follows:
the data providing device decrypts the initial global model sent by the blockchain network using the public key of the data requesting device, verifies the model by comparing its hash value, and trains the initial global model that passes verification;
using its local data set, the data providing device adds Gaussian differential privacy to the training parameters, trains the local model with a federated averaging algorithm based on Gaussian differential privacy, and trains the federated learning model with an adaptive clipping method.
Specifically, suppose that a set $E$ of $N$ devices responds to data sharing request $R$, where $E = \{e_1, e_2, \ldots, e_N\}$. Device $e_i$ holds a local data set $D_i$, in which each data sample is denoted $d_k$. Let $W_{i,t}$ be the local model parameters of device $e_i$ trained on its local data set $D_i$ in round $t$, and let $W_t$ be the global model parameters. The goal of local training is to minimize the loss function of the current round and obtain a better model.
In this embodiment, each data providing device first decrypts the initial global model using the requester's public key and verifies it by comparing its hash value. Each provider then trains the initialized global model on its own local data set and adds Gaussian differential privacy to the training parameters before uploading its local gradient. With noise parameter $\sigma$, the noised parameters are $W_{i,t,\sigma} = W_{i,t} + \text{noise}$, which protects the private data in the model and its parameters from leakage.
For Gaussian-mechanism noise generation, we use the zero-mean Gaussian distribution, also called the normal distribution. It is denoted $\mathcal{N}(0, \sigma^2)$ and has probability density
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right).$$
Let $f: \mathbb{N}^{|\mathcal{X}|} \to \mathbb{R}^K$ be any $K$-dimensional function, and define its $\ell_2$ sensitivity as
$$\Delta_2 f = \max_{\|x - y\|_1 = 1} \|f(x) - f(y)\|_2 .$$
The Gaussian mechanism with parameter $\sigma$ adds noise drawn from $\mathcal{N}(0, \sigma^2)$ to each of the $K$ components of the output.
Let $\epsilon \in (0, 1)$ be arbitrary. For $c^2 > 2\ln(1.25/\delta)$, the Gaussian mechanism with parameter $\sigma \ge c\,\Delta_2 f / \epsilon$ is $(\epsilon, \delta)$-differentially private.
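The calibration rule $\sigma \ge c\,\Delta_2 f/\epsilon$ with $c^2 > 2\ln(1.25/\delta)$ can be computed directly. The sketch below is an illustration only. The function name and the small `margin` used to keep the inequality strict are assumptions, not part of the source.

```python
import math

def gaussian_sigma(l2_sensitivity: float, epsilon: float, delta: float,
                   margin: float = 1e-6) -> float:
    """Noise scale for the classic (epsilon, delta) Gaussian-mechanism bound.

    The theorem requires c^2 > 2 ln(1.25 / delta) strictly, so c is taken
    slightly above that threshold (by the factor 1 + margin).
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("the classic bound is stated only for epsilon in (0, 1)")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if margin <= 0.0:
        raise ValueError("margin must be positive to keep the inequality strict")
    c = math.sqrt(2.0 * math.log(1.25 / delta)) * (1.0 + margin)
    return c * l2_sensitivity / epsilon

sigma = gaussian_sigma(l2_sensitivity=1.0, epsilon=0.5, delta=1e-5)
print(round(sigma, 2))  # 9.69
```

The noise scale grows linearly with the sensitivity and with $1/\epsilon$, but only logarithmically as $\delta$ shrinks.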
Differential privacy can help prevent privacy attacks at the model prediction stage. Noise is added to the model parameters, so an attacker cannot obtain exact outputs by querying the model. The attacker therefore cannot recover the training data. Nor can the attacker infer from the model's outputs whether a particular sample belongs to the training data.
Existing work that trains federated models with differential privacy bounds the contribution of each user's model update with a fixed clipping value. However, no clipping norm is well defined across learning settings and tasks. To avoid the drawbacks of a fixed clipping norm, we use an adaptive clipping method that adjusts the clipping threshold automatically. The adjustment depends on the model architecture and loss, the amount of data on each device, the client learning rate, and possibly other parameters. In summary, to provide privacy protection, we train the federated learning model with a differentially private FedAvg algorithm under the Gaussian mechanism, using adaptive clipping.
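The patent does not specify its adaptive clipping rule. The sketch below therefore assumes a quantile-tracking geometric update of the clipping norm: $C \leftarrow C \exp(-\eta_C(\bar b - \gamma))$, where $\bar b$ is the fraction of client updates whose norm is at most $C$ and $\gamma$ is a target quantile. It pairs this update with one DP-FedAvg aggregation step. For brevity, $\bar b$ is computed without its own privacy noise, which a full private implementation would add. All names are hypothetical.

```python
import math
import random

Vector = list[float]

def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))

def clip(update: Vector, clip_norm: float) -> Vector:
    """Scale the update down so that its L2 norm is at most clip_norm."""
    norm = l2_norm(update)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]

def dp_fedavg_round(updates: list[Vector], clip_norm: float, noise_multiplier: float,
                    target_quantile: float, clip_lr: float,
                    rng: random.Random) -> tuple[Vector, float]:
    """One DP-FedAvg aggregation with an assumed adaptive clipping rule.

    Returns the noised average update and the clipping norm for the next round.
    """
    n, dim = len(updates), len(updates[0])
    clipped = [clip(u, clip_norm) for u in updates]
    # Gaussian noise is scaled to the clipping norm, which bounds each client's influence.
    sigma = noise_multiplier * clip_norm
    noisy_sum = [sum(c[j] for c in clipped) + rng.gauss(0.0, sigma) for j in range(dim)]
    avg = [s / n for s in noisy_sum]
    # Fraction of raw updates that already fit under the current bound (not privatized here).
    frac_below = sum(1 for u in updates if l2_norm(u) <= clip_norm) / n
    # Shrink C when more updates fit than targeted; grow it when fewer do.
    next_clip = clip_norm * math.exp(-clip_lr * (frac_below - target_quantile))
    return avg, next_clip

avg, next_c = dp_fedavg_round([[3.0, 4.0], [0.0, 1.0]], clip_norm=1.0, noise_multiplier=1.0,
                              target_quantile=0.5, clip_lr=0.2, rng=random.Random(42))
```

The geometric update keeps the clipping norm positive and converges toward the chosen quantile of update norms. This replaces a hand-tuned fixed threshold.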
In an alternative embodiment of the present invention, the data providing devices upload their local model updates by invoking the upload smart contract with a digital signature. Device $e_i$ computes its local gradient as:
$$g_t = \nabla L(W_{i,t}, D_i)$$
where $g_t$ denotes the local gradient in round $t$, and $L(W_{i,t}, D_i)$ denotes the loss of $W_{i,t}$ on $D_i$. Here $W_{i,t}$ denotes the local model parameters of device $e_i$ trained on its local data set $D_i$ in round $t$;
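The source does not fix a model or loss. To make $g_t = \nabla L(W_{i,t}, D_i)$ concrete, the sketch assumes a linear model with mean-squared-error loss, $L(W, D) = \frac{1}{|D|}\sum_{(x,y)\in D}(W\cdot x - y)^2$, whose gradient is $\frac{2}{|D|}\sum (W\cdot x - y)\,x$. The function name is hypothetical.

```python
def local_gradient(w: list[float], data: list[tuple[list[float], float]]) -> list[float]:
    """g_t = gradient of the mean squared error of a linear model w.x over D_i (assumed loss)."""
    grad = [0.0] * len(w)
    for x, y in data:
        residual = sum(wj * xj for wj, xj in zip(w, x)) - y   # prediction error for one sample
        for j, xj in enumerate(x):
            grad[j] += 2.0 * residual * xj
    return [g / len(data) for g in grad]

print(local_gradient([0.0, 0.0], [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0)]))  # [-1.0, -2.0]
```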
Device $e_i$ then submits a transaction by invoking the upload smart contract, after which the transaction is verified.
Once the nodes receive the transaction for the gradient, they first verify the legitimacy of the sender by verifying the digital signature. If the verification passes, the transaction is packaged into blocks and broadcast to other nodes.
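The sign-then-verify flow for a gradient upload can be sketched as below. In the described scheme, nodes check a digital signature using the sender's public key issued by the certificate authority. This sketch instead substitutes HMAC-SHA256 with a per-device shared key so that it runs on the standard library alone. HMAC is a symmetric message authentication code, not a public-key signature. The transaction fields and function names are hypothetical.

```python
import hashlib
import hmac
import json

def sign_gradient(device_id: str, round_t: int, gradient: list[float], key: bytes) -> dict:
    """Build an upload transaction; HMAC-SHA256 stands in for a public-key signature."""
    body = {"device": device_id, "round": round_t, "gradient": gradient}
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    body["sig"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return body

def verify_and_package(tx: dict, keys: dict[str, bytes], pending_block: list[dict]) -> bool:
    """Node side: check the sender's tag, then append the transaction to the pending block."""
    key = keys.get(tx.get("device"))
    if key is None:
        return False                                  # unknown sender
    body = {k: v for k, v in tx.items() if k != "sig"}
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tx.get("sig", "")):
        return False                                  # tampered or forged transaction
    pending_block.append(tx)                          # broadcasting to peers is out of scope
    return True

keys = {"e1": b"device-e1-key"}
block: list[dict] = []
tx = sign_gradient("e1", 1, [0.1, -0.2], keys["e1"])
print(verify_and_package(tx, keys, block))            # True
```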
In an alternative embodiment of the invention, the data device is specifically configured to:
the data requesting device, according to the online status of the data devices, selects some of the data devices that successfully uploaded a local gradient in the current training round and stayed online throughout. It aggregates the local gradients uploaded by the selected devices in the current round as follows:
$$\bar{g}_t = \frac{1}{|S_t|} \sum_{i \in S_t} g(W_{i,t,\sigma})$$
where $\bar{g}_t$ denotes the average gradient of round $t$, and $S_t$ denotes the set of IDs of the data devices selected by the data requesting device in round $t$. $g(W_{i,t,\sigma})$ denotes the round-$t$ local gradient with Gaussian differential privacy added.
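The selection and averaging steps can be sketched as follows. The patent does not state how the data requesting device chooses among eligible devices. The uniform random choice of `k` devices below is an assumption, and all names are hypothetical.

```python
import random

def select_candidates(uploaded: set[str], always_online: set[str], k: int,
                      rng: random.Random) -> list[str]:
    """Eligible devices uploaded this round AND stayed online; pick up to k (assumed policy)."""
    eligible = sorted(uploaded & always_online)
    return rng.sample(eligible, min(k, len(eligible)))

def average_gradient(noisy_grads: dict[str, list[float]], selected: list[str]) -> list[float]:
    """bar g_t = (1 / |S_t|) * sum over i in S_t of g(W_{i,t,sigma})."""
    if not selected:
        raise ValueError("S_t is empty")
    total = [0.0] * len(noisy_grads[selected[0]])
    for dev in selected:
        for j, g in enumerate(noisy_grads[dev]):
            total[j] += g
    return [s / len(selected) for s in total]

s_t = select_candidates({"a", "b", "c"}, {"a", "b", "d"}, k=2, rng=random.Random(7))
print(average_gradient({"a": [1.0, 2.0], "b": [3.0, 4.0]}, s_t))  # [2.0, 3.0]
```

Device `c` is excluded because it went offline, and `d` is excluded because it uploaded nothing. This mirrors the "uploaded and always online" condition.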
Specifically, to handle data devices that go offline, the data requesting device aggregates only gradients from non-temporary devices. Devices that successfully uploaded a gradient in the current training round and stayed online throughout are called data contribution devices. Before each aggregation, the data requesting device selects some of the data contribution devices, called candidate devices, whose uploaded gradients will be aggregated in the current round.
Once consensus is reached, the candidate list is delivered to the blockchain. There it triggers the self-aggregation smart contract and the contribution evaluation smart contract.
All devices in the same task channel invoke and trigger the corresponding smart contracts. Only the data contribution devices run the self-aggregation smart contract, taking the candidate device list as input. Traditional federated learning aggregates through a central server. Here, instead, the devices aggregate by invoking the aggregation smart contract, whose input parameter is the list of participant IDs selected by the data requesting device. Gradient aggregation then completes automatically and the result is uploaded to blockchain storage. Operations that exceed a set time limit are discarded. The aggregation formula is:
$$\bar{g}_t = \frac{1}{|S_t|} \sum_{i \in S_t} g(W_{i,t,\sigma})$$
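A toy, in-memory model of the self-aggregation contract with a time limit is sketched below. It is not Fabric chaincode. The class, its fields, and the list standing in for blockchain storage are hypothetical. One assumption deserves attention: the contract averages only over candidates that submitted before the deadline, because late operations are discarded.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SelfAggregationContract:
    """Toy self-aggregation contract; names and storage are hypothetical."""
    candidates: list[str]                     # S_t, delivered after consensus
    deadline: float                           # submissions after this time are discarded
    submissions: dict[str, list[float]] = field(default_factory=dict)
    ledger: list[dict] = field(default_factory=list)    # stands in for blockchain storage

    def submit(self, device_id: str, grad: list[float], now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if device_id not in self.candidates or now > self.deadline:
            return False                      # not in S_t, or past the time limit
        self.submissions[device_id] = grad
        return True

    def aggregate(self, round_t: int) -> list[float]:
        """Average the in-time submissions and record the result on the ledger."""
        received = [self.submissions[d] for d in self.candidates if d in self.submissions]
        if not received:
            raise ValueError("no valid submissions before the deadline")
        dim = len(received[0])
        avg = [sum(g[j] for g in received) / len(received) for j in range(dim)]
        self.ledger.append({"round": round_t, "avg_grad": avg})
        return avg

contract = SelfAggregationContract(candidates=["a", "b"], deadline=100.0)
contract.submit("a", [1.0, 2.0], now=50.0)
contract.submit("b", [9.0, 9.0], now=150.0)    # rejected: too late
print(contract.aggregate(round_t=1))           # [1.0, 2.0]
```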
All online devices can invoke the evaluation smart contract. Contribution evaluation first considers how much useful information a gradient contains, since more informative gradients speed up convergence. The contribution is then determined from the amount of useful information in the gradient. A suitable method is therefore needed to quantify the information content of a dataset and map it to a reasonable contribution function. This function can also serve as a basis for pricing if transactions are involved. To this end, the invention designs a novel contribution evaluation model based on gradient entropy. Entropy measures the uncertainty of a variable. According to information entropy theory, the larger the entropy, the more information is contained. Uploaded local gradients carry considerable uncertainty, so their entropy serves as a measure of their information content. On this principle, the contribution of each data device selected by the data requesting device is evaluated with the gradient-entropy-based contribution evaluation model, expressed as:
[Contribution formula for $C_{i,t}$: image not recoverable from the source]
where $C_{i,t}$ denotes the contribution evaluation value of the data device in round $t$. $E_{i,t}(g(W_{i,t,\sigma}))$ denotes the information content of the local gradient $g(W_{i,t,\sigma})$ uploaded by the data device in round $t$. $S_t$ denotes the set of IDs of the data devices selected by the data requesting device in round $t$.
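The exact contribution formula is not recoverable from the source. The sketch below therefore rests on two assumptions. First, the information content $E_{i,t}$ is estimated as the Shannon entropy (in bits) of a histogram of the gradient's entries. Second, each contribution is that entropy normalized over the selected set $S_t$. The token split in proportion to contribution is likewise a hypothetical reading of the reward step that follows. None of these choices is confirmed by the patent.

```python
import math
from collections import Counter

def gradient_entropy(grad: list[float], bins: int = 10) -> float:
    """Shannon entropy (bits) of a histogram of gradient entries; the binning is an assumption."""
    lo, hi = min(grad), max(grad)
    if hi == lo:
        return 0.0                                   # all entries equal: no spread
    width = (hi - lo) / bins
    counts = Counter(min(int((g - lo) / width), bins - 1) for g in grad)
    n = len(grad)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def contributions(grads: dict[str, list[float]], selected: list[str]) -> dict[str, float]:
    """Assumed form: C_{i,t} = E_{i,t} / (sum of E_{j,t} over j in S_t)."""
    ent = {d: gradient_entropy(grads[d]) for d in selected}
    total = sum(ent.values())
    if total == 0:
        return {d: 1.0 / len(selected) for d in selected}   # fall back to an equal split
    return {d: e / total for d, e in ent.items()}

def token_rewards(contrib: dict[str, float], budget: float) -> dict[str, float]:
    """Hypothetical payout: split the round's token budget in proportion to contribution."""
    return {d: budget * c for d, c in contrib.items()}

contrib = contributions({"a": [0.0, 1.0], "b": [0.0, 0.0]}, ["a", "b"])
print(token_rewards(contrib, budget=10.0))   # {'a': 10.0, 'b': 0.0}
```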
Based on the evaluation, the data requesting device awards a certain number of tokens to the candidate device.
In the present invention, the evaluation process is programmed into the contribution evaluation smart contract. Owing to the properties of smart contracts, this enhances the fairness and security of the system.
In an alternative embodiment of the present invention, the blockchain network consists mainly of nodes (the various devices) and smart contracts. Through it, system users can issue task requests to one another. The blockchain network is also responsible for recording internal transactions, training models and updating models. It deploys several smart contracts. These include basic contracts for registration, request and response operations, and aggregation contracts, update contracts and others that support federated learning. Federated learning comprises local federated learning and global federated learning. The local federated learning module is embedded in each device. The global federated learning module consists of the self-aggregation smart contract, the evaluation smart contract, the latest-model smart contract, and others.
Device $e_i$ maintains the global model locally and updates it with a stochastic gradient algorithm (SGA). However, a device may suddenly go offline in round $t$ for some reason and fail to upload its gradient. It may then resume communication at a random time; such a device is called a temporary device. The latest global model stored on the blockchain may not match the gradients these devices computed while offline, and this mismatch can dilute learning. Traditional federated learning simply excludes such participants to keep training reliable, but this leaves too little training data.
To solve this problem, the present invention designs a dynamic learning mechanism through equation (6). For a normal device $e_i$ in round $t$, the global model is updated by SGA as follows:
$$W_{t+1} = W_t - \eta\,\bar{g}_t$$
where $W_{t+1}$ denotes the updated global model parameters and $W_t$ the current global model parameters. $\eta$ denotes the learning rate of the current round, and $\bar{g}_t$ the average gradient of round $t$.
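The normal-device update $W_{t+1} = W_t - \eta\,\bar{g}_t$ is a single gradient step. The following is a minimal sketch; the function name is hypothetical, and parameters are assumed to be flat lists of floats.

```python
def sga_update(w_t: list[float], avg_grad: list[float], eta: float) -> list[float]:
    """Normal-device update: W_{t+1} = W_t - eta * bar_g_t (one gradient step)."""
    if len(w_t) != len(avg_grad):
        raise ValueError("parameter and gradient dimensions differ")
    return [w - eta * g for w, g in zip(w_t, avg_grad)]

w_next = sga_update([1.0, 2.0], [0.5, -1.0], eta=0.1)
```

The step moves against the averaged gradient, so a negative gradient component increases the corresponding parameter.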
Temporary device e without contribution gradient in the t-th round i To maintain memory consistency of the last global model stored locally, while following the federal learning update process, they first invoke the latest model intelligence contract to obtain the latest aggregate results for the t rounds and update their global model using the following equation:
$$W_{t+1} = (1 - \alpha_i(t))\, W_{i,old} + \alpha_i(t)\, W_{new}$$
where $W_{t+1}$ denotes the updated global model parameters and $\alpha_i(t)$ the latest weight coefficient.
[Formulas defining $\alpha_i(t)$ in terms of $\Delta t$: images not recoverable from the source]
$\Delta t$ denotes the difference between the last update time and the last training time. $W_{i,old}$ denotes the latest global model parameters stored by the temporary device, and $W_{new}$ the latest global model parameters of the current round.
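The mixing rule $W_{t+1} = (1-\alpha_i(t))W_{i,old} + \alpha_i(t)W_{new}$ is sketched below. The patent's formula for $\alpha_i(t)$ is not recoverable. The weight used here, $\alpha = 1 - (1+\Delta t)^{-a}$, is purely a hypothetical stand-in. It gives zero weight to the new model when the device is current ($\Delta t = 0$) and approaches 1 as the local copy grows staler.

```python
def staleness_weight(delta_t: float, a: float = 0.5) -> float:
    """Hypothetical alpha_i(t): grows from 0 toward 1 as the local model gets staler."""
    if delta_t < 0:
        raise ValueError("delta_t must be non-negative")
    return 1.0 - (delta_t + 1.0) ** (-a)

def temporary_update(w_old: list[float], w_new: list[float], alpha: float) -> list[float]:
    """W_{t+1} = (1 - alpha) * W_{i,old} + alpha * W_new."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * wo + alpha * wn for wo, wn in zip(w_old, w_new)]

alpha = staleness_weight(delta_t=3.0)                 # 1 - 4 ** -0.5 = 0.5
print(temporary_update([0.0, 0.0], [2.0, 4.0], alpha))  # [1.0, 2.0]
```

A convex combination keeps the result between the stale local model and the latest global model. This is the trade-off the mechanism describes.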
Under the dynamic learning mechanism designed by the invention, data contribution devices use the SGA algorithm to update their global models from the self-aggregation result. The new global model is then sent to the blockchain to reach consensus and serves as the model for the next training iteration. Temporary devices first invoke the latest-model smart contract to download the latest global model delivered by the contributors, and then update their own global model. Weighting between the latest model and their local model reduces the degree of mismatch.
In an alternative embodiment of the invention, the data sharing event between the data requesting device and the data providing device is generated in the form of a transaction and broadcast in the blockchain network.
Example 2
As shown in Fig. 4, an embodiment of the present invention further provides a four-network fusion data sharing method based on blockchain and federal learning. The method is applied to the system described in Example 1 and comprises the following steps:
S1: the data device sends a device registration request to the certificate authority server;
S2: the certificate authority server responds to the device registration request, verifies the identity of the data device, and issues a certificate, public key and private key to each data device that passes verification;
S3: the data device joins the blockchain network using the acquired certificate;
S4: the blockchain network sends an initial global model to the newly joined device;
S5: the data device trains a local model with differential privacy on its local data set, and invokes a smart contract deployed in the blockchain network to upload its local gradient;
S6: the blockchain network, using a scheme that combines blockchain and federated learning, dynamically selects contributors according to the online status of the data devices. It triggers the self-aggregation smart contract and the contribution evaluation smart contract based on the local gradients uploaded by the data devices. According to the self-aggregation result, the global model of normal devices is updated with a stochastic gradient algorithm, and the sharing behavior is recorded. The latest-model smart contract is invoked to obtain the latest aggregation result and the latest global model, which are used to update the global model of temporary devices.
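Registration and admission (steps S1 to S4) can be sketched as a toy protocol. In practice the certificate authority would issue public-key certificates; the claims mention Fabric channels, and Hyperledger Fabric manages identities through certificate-based membership services. This sketch instead uses an HMAC tag over the device ID as a stand-in "certificate", and a random byte string as a stand-in for key material. Neither is real public-key cryptography. The allow-list identity check and all class and field names are hypothetical.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CertificateAuthority:
    """Toy CA for S1-S2: checks identity against an allow-list and issues credentials."""
    known_devices: set[str]
    _ca_key: bytes = field(default_factory=lambda: secrets.token_bytes(32), repr=False)

    def _tag(self, device_id: str) -> str:
        return hmac.new(self._ca_key, device_id.encode("utf-8"), hashlib.sha256).hexdigest()

    def register(self, device_id: str) -> Optional[dict]:
        if device_id not in self.known_devices:
            return None                                   # identity verification failed
        # Stand-in key material: a random secret, not a real key pair.
        return {"device": device_id, "cert": self._tag(device_id),
                "device_key": secrets.token_bytes(32)}

    def check(self, device_id: str, cert: str) -> bool:
        return hmac.compare_digest(self._tag(device_id), cert)

@dataclass
class Network:
    """Toy blockchain network for S3-S4: admits certified devices and hands out W_0."""
    ca: CertificateAuthority
    initial_model: list[float]
    members: set[str] = field(default_factory=set)

    def join(self, device_id: str, cert: str) -> Optional[list[float]]:
        if not self.ca.check(device_id, cert):
            return None
        self.members.add(device_id)
        return list(self.initial_model)                   # S4: copy of the initial model

ca = CertificateAuthority(known_devices={"e1"})
creds = ca.register("e1")
net = Network(ca=ca, initial_model=[0.0, 0.0])
w0 = net.join("e1", creds["cert"]) if creds else None
```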
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principles and embodiments of the present invention have been described in detail with reference to specific examples, which are provided to facilitate understanding of the method and core ideas of the present invention; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in accordance with the ideas of the present invention, the present description should not be construed as limiting the present invention in view of the above.
Those of ordinary skill in the art will recognize that the embodiments described herein are for the purpose of aiding the reader in understanding the principles of the present invention and should be understood that the scope of the invention is not limited to such specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations from the teachings of the present disclosure without departing from the spirit thereof, and such modifications and combinations remain within the scope of the present disclosure.

Claims (9)

1. A four-network converged data sharing system based on blockchain and federal learning, comprising:
a data device, configured to: send a device registration request to the certificate authority server to obtain the certificate, public key and private key issued by that server; join the blockchain network with the acquired certificate and receive an initial global model sent by the blockchain network; and train a local model with differential privacy on its local data set and invoke smart contracts deployed in the blockchain network to upload its local gradient;
a certificate authority server, configured to respond to the device registration request sent by the data device, verify the identity of the data device, and issue a certificate, public key and private key to each data device that passes verification;
a blockchain network, in which each data device serves as a node, configured to: send the initial global model to newly joined devices; under a scheme combining blockchain and federated learning, dynamically select contributors according to the online status of the data devices, and trigger the self-aggregation smart contract and the contribution evaluation smart contract based on the local gradients uploaded by the data devices; according to the self-aggregation result, update the global model of normal devices with a stochastic gradient algorithm and record the sharing behavior; and invoke the latest-model smart contract to obtain the latest aggregation result and the latest global model, and update the global model of temporary devices.
2. The four-network converged data sharing system based on blockchain and federal learning of claim 1, wherein the data devices are divided by sharing behavior into data requesting devices and data providing devices. A device that requests data from the sharing system is a data requesting device, and a device that responds to a data request is a data providing device;
the data requesting device is configured to send a data request to the sharing system. The data request comprises a device ID, a request validity period, a transaction budget, an encrypted initial global model and request status information. Based on the response information of the data providing devices, the data requesting device selects some of the proposed transactions within the transaction budget. It sends a Fabric channel configuration to the responding data providing devices; this configuration allows all responding data providing devices to join the same task channel and form a temporary collaboration group;
the data providing device is configured to verify the data request received from the data requesting device and look up the history of the data requesting device from the records on the blockchain network. It then sends response information to the data requesting device, proposing a transaction through that response. The response information includes the device ID, data size, device operating duration, computing power and public key information of the device.
3. The four-network converged data sharing system based on blockchain and federal learning of claim 2, wherein the data device is specifically configured to:
the data providing device decrypts the initial global model sent by the blockchain network using the public key of the data requesting device, verifies the model by comparing its hash value, and trains the initial global model once it passes verification;
and, using its local data set, the data providing device adds Gaussian differential privacy to the training parameters, trains the local model with a federated averaging algorithm based on Gaussian differential privacy, and trains the federated learning model with an adaptive clipping method.
4. The four-network converged data sharing system based on blockchain and federal learning of claim 2, wherein the data device is specifically configured to:
the data providing device calculates the local gradient using the following formula:
$$g_t = \nabla L(W_{i,t}, D_i)$$
where $g_t$ denotes the local gradient in round $t$, and $L(W_{i,t}, D_i)$ denotes the loss of $W_{i,t}$ on $D_i$. Here $W_{i,t}$ denotes the local model parameters of device $e_i$ trained on its local data set $D_i$ in round $t$;
and invokes a smart contract with a digital signature, deployed in the blockchain network, to upload the local gradient;
after receiving the gradient transaction, the data requesting device verifies the legitimacy of the data providing device by checking the digital signature. Once verification passes, it packages the transaction into a block and broadcasts it to the other nodes.
5. The four-network converged data sharing system based on blockchain and federal learning of claim 2, wherein the data device is specifically configured to:
the data requesting device, according to the online status of the data devices, selects some of the data devices that successfully uploaded a local gradient in the current training round and stayed online throughout. It aggregates the local gradients uploaded by the selected devices in the current round as follows:
$$\bar{g}_t = \frac{1}{|S_t|} \sum_{i \in S_t} g(W_{i,t,\sigma})$$
where $\bar{g}_t$ denotes the average gradient of round $t$, and $S_t$ denotes the set of IDs of the data devices selected by the data requesting device in round $t$. $g(W_{i,t,\sigma})$ denotes the round-$t$ local gradient with Gaussian differential privacy added.
6. The four-network converged data sharing system based on blockchain and federal learning of claim 2, wherein the blockchain network is specifically configured to:
evaluating the contribution of the data device selected by the data requesting device using a gradient entropy based contribution evaluation model, expressed as:
[Contribution formula for $C_{i,t}$: image not recoverable from the source]
where $C_{i,t}$ denotes the contribution evaluation value of the data device in round $t$. $E_{i,t}(g(W_{i,t,\sigma}))$ denotes the information content of the local gradient $g(W_{i,t,\sigma})$ uploaded by the data device in round $t$. $S_t$ denotes the set of IDs of the data devices selected by the data requesting device in round $t$.
7. The four-network converged data sharing system based on blockchain and federal learning of claim 2, wherein the blockchain network is specifically configured to:
update the global model of a normal device with a stochastic gradient algorithm according to the self-aggregation result, as follows:
$$W_{t+1} = W_t - \eta\,\bar{g}_t$$
where $W_{t+1}$ denotes the updated global model parameters and $W_t$ the current global model parameters. $\eta$ denotes the learning rate of the current round, and $\bar{g}_t$ the average gradient of round $t$.
8. The four-network converged data sharing system based on blockchain and federal learning of claim 2, wherein the blockchain network is specifically configured to:
invoke the latest-model smart contract to obtain the latest aggregation result and the latest global model, and update the global model of the temporary device as follows:
$$W_{t+1} = (1 - \alpha_i(t))\, W_{i,old} + \alpha_i(t)\, W_{new}$$
where $W_{t+1}$ denotes the updated global model parameters and $\alpha_i(t)$ the latest weight coefficient. $W_{i,old}$ denotes the latest global model parameters stored by the temporary device, and $W_{new}$ the latest global model parameters of the current round.
9. A four-network fusion data sharing method based on blockchain and federal learning applied to the system of claim 1, comprising the steps of:
S1: the data device sends a device registration request to the certificate authority server;
S2: the certificate authority server responds to the device registration request, verifies the identity of the data device, and issues a certificate, public key and private key to each data device that passes verification;
S3: the data device joins the blockchain network using the acquired certificate;
S4: the blockchain network sends an initial global model to the newly joined device;
S5: the data device trains a local model with differential privacy on its local data set, and invokes a smart contract deployed in the blockchain network to upload its local gradient;
S6: the blockchain network, using a scheme that combines blockchain and federated learning, dynamically selects contributors according to the online status of the data devices. It triggers the self-aggregation smart contract and the contribution evaluation smart contract based on the local gradients uploaded by the data devices. According to the self-aggregation result, the global model of normal devices is updated with a stochastic gradient algorithm, and the sharing behavior is recorded. The latest-model smart contract is invoked to obtain the latest aggregation result and the latest global model, which are used to update the global model of temporary devices.
CN202310342414.1A 2023-03-31 2023-03-31 Four-network fusion data sharing method based on blockchain and federal learning Pending CN116389478A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310342414.1A CN116389478A (en) 2023-03-31 2023-03-31 Four-network fusion data sharing method based on blockchain and federal learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310342414.1A CN116389478A (en) 2023-03-31 2023-03-31 Four-network fusion data sharing method based on blockchain and federal learning

Publications (1)

Publication Number Publication Date
CN116389478A true CN116389478A (en) 2023-07-04

Family

ID=86966997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310342414.1A Pending CN116389478A (en) 2023-03-31 2023-03-31 Four-network fusion data sharing method based on blockchain and federal learning

Country Status (1)

Country Link
CN (1) CN116389478A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117472866A (en) * 2023-12-27 2024-01-30 齐鲁工业大学(山东省科学院) Federal learning data sharing method under block chain supervision and excitation
CN117472866B (en) * 2023-12-27 2024-03-19 齐鲁工业大学(山东省科学院) Federal learning data sharing method under block chain supervision and excitation

Similar Documents

Publication Publication Date Title
CN110428351B (en) Semi-distributed vehicle violation reporting method based on block chain
Wei et al. A privacy-preserving fog computing framework for vehicular crowdsensing networks
CN110825810B (en) Block chain-based crowd sensing dual privacy protection method
Terzi et al. Securing emission data of smart vehicles with blockchain and self-sovereign identities
CN112929333B (en) Vehicle networking data safe storage and sharing method based on hybrid architecture
CN111047316A (en) Tamper-resistant intelligent block chain system and implementation method
CN113992526B (en) Coalition chain cross-chain data fusion method based on credibility calculation
Zhang et al. Smart contract for secure billing in ride-hailing service via blockchain
CN115270145A (en) User electricity stealing behavior detection method and system based on alliance chain and federal learning
CN116389478A (en) Four-network fusion data sharing method based on blockchain and federal learning
CN114978530B (en) Distance calculation and privacy protection method for distributed space crowdsourcing in space information network
CN115499129A (en) Multimode trust cross-chain consensus method, system, medium, equipment and terminal
CN116595094A (en) Federal learning incentive method, device, equipment and storage medium based on block chain
CN113360951B (en) Electronic evidence preservation method based on partitioned block chain
CN116996521B (en) Relay committee cross-chain interaction system and method based on trust evaluation model
CN112688775B (en) Management method and device of alliance chain intelligent contract, electronic equipment and medium
Guo et al. Vehicloak: A blockchain-enabled privacy-preserving payment scheme for location-based vehicular services
CN117202203A (en) Multi-factor comprehensive trust evaluation method in Internet of vehicles environment
Bai et al. Blockchain-based Authentication and Proof-of-Reputation Mechanism for Trust Data Sharing in Internet of Vehicles.
Hegde et al. Hash based integrity verification for vehicular cloud environment
CN114172661B (en) Bidirectional cross-link method, system and device for digital asset
Alam et al. Functionality, privacy, security and rewarding based on fog assisted cloud computing techniques in Internet of Vehicles
Sun et al. An efficient and secure trading framework for shared charging service based on multiple consortium blockchains
Das et al. Design of a Trust-Based Authentication Scheme for Blockchain-Enabled IoV System
CN111222057B (en) Information processing method, device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination