CN114417398A - Data sharing method based on block chain and federal learning - Google Patents

Info

Publication number
CN114417398A
Authority
CN
China
Prior art keywords
node
data
nodes
team
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111543907.9A
Other languages
Chinese (zh)
Inventor
范新民
妙秦阳
汪晓丁
张灵杰
林晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Normal University
Original Assignee
Fujian Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Normal University
Priority to CN202111543907.9A
Publication of CN114417398A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218: Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245: Protecting personal data, e.g. for financial or medical purposes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/64: Protecting data integrity, e.g. using checksums, certificates or signatures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03: Credit; Loans; Processing thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Accounting & Taxation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Business, Economics & Management (AREA)
  • Mathematical Physics (AREA)
  • Technology Law (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data sharing method based on blockchain and federated learning. Blockchain nodes that trust each other are organized into teams, and when a request task is received, a team that meets the credit rating requirement is selected to respond. After a data sharing task is received, the nodes in that team train a verification model until it reaches a preset accuracy or the maximum training time, so that a model is shared instead of raw data and the privacy of the data provider is protected. The model training process is packaged and stored locally, consensus among blockchain nodes is reached through a consensus algorithm based on node contribution, and the team is rewarded with credit. Every training process in the data sharing process is thereby recorded, which ensures that data providers supply high-quality data; credit is awarded once consensus is reached and the credit rating is updated promptly, which keeps the credit rating reliable and alleviates the privacy protection problem of data in the Internet of Things.

Description

Data sharing method based on block chain and federal learning
Technical Field
The invention relates to the technical field of data sharing in the Internet of Things, and in particular to a data sharing method based on blockchain and federated learning.
Background
With the development of Internet technology, the Internet of Things (IoT) is widely used across industries. Sensors are an important component of the IoT and the most important data source of an IoT system. The sensing data collected by a single sensor often cannot meet user needs; the real value of the IoT lies in the comprehensive use and sharing of many kinds of data and information. For example, in healthcare, data sharing can provide valuable health records, including treatment and physical examination information, which support targeted treatment for patients. In tourism, analyzing shared data makes it possible to learn tourists' preferences accurately and to predict future tourism hotspots, which improves service quality. However, data sharing in the IoT faces the following problems. First, it is difficult for organizations to establish mutual trust, so they are unlikely to share reliable local data. Second, data privacy has become a major obstacle to data sharing, because data owners risk privacy disclosure. Achieving efficient data sharing therefore remains a challenge as long as these two problems are unsolved.
Machine learning techniques are widely used for data sharing. Traditional machine learning first collects data and then focuses on model training. However, large-scale data collection is often hard to achieve because data owners are concerned about privacy disclosure. Federated learning is a distributed machine learning framework. By aggregating the locally trained models of data owners instead of their raw data, it reduces the computational burden on centralized equipment and also protects the data owners' privacy. A blockchain serves as a distributed shared ledger and database. It is decentralized, tamper-resistant, traceable, collectively maintained, open, and transparent, and can therefore provide reliable technical support for privacy protection in data sharing. For example, the blockchain can record the sharing behavior of each participant that provides a data model, which pushes participants to provide reliable data models.
Secure data sharing in the Internet of Things is receiving increasing attention, and many data sharing mechanisms based on blockchain and federated learning have been proposed. Gao et al. (Blockchain Based Secure IoT Data Sharing Framework for SDN-Enabled Smart Communities) propose a secure data sharing framework using blockchain and proxy re-encryption; Xu et al. (BDSS-FA: A Blockchain-Based Data Security Sharing Platform With Fine-Grained Access Control) propose a new encryption algorithm based on hierarchical attributes, with attribute assignment handled by a blockchain-based authorization center, to realize secure data sharing; Makhdoom et al. (PrivySharing: A Blockchain-Based Framework for Privacy-Preserving and Secure Data Sharing in Smart Cities) embed access control rules in smart contracts to control user access to data, and divide the blockchain into multiple channels to protect data privacy and security; K. P. Yu et al. (Blockchain-Enhanced Data Sharing With Traceable and Direct Revocation in IIoT) propose an efficient and secure data sharing model based on attribute-based encryption that can resist various attacks; Hao et al. (Efficient and Privacy-Enhanced Federated Learning for Industrial Artificial Intelligence) propose an efficient federated learning scheme that guarantees data privacy, resists collusion attacks in a distributed environment, and prevents personal data from leaking; Sattler et al. (Robust and Communication-Efficient Federated Learning From Non-i.i.d. Data) propose a sparse compression framework suited to bandwidth-limited environments to reduce the communication overhead of model training; A. Imteaj and M. H. Amini (Distributed Sensing Using Smart End-User Devices: Pathway to Federated Learning for Autonomous IoT) improve federated learning by evaluating the model feedback of participants and by a method for updating participant weights; Lu et al. (Blockchain and Federated Learning for Privacy-Preserved Data Sharing in Industrial IoT) combine data sharing, machine learning, blockchain, and federated learning to address privacy protection in data sharing; L. Yin et al. (A Privacy-Preserving Federated Learning for Multiparty Data Sharing in Social IoTs) protect the data privacy of data sharing participants in the social Internet of Things by combining federated learning and cryptography, and improve data transmission and storage efficiency by using sparse differential gradients; Chai et al. (A Hierarchical Blockchain-Enabled Federated Learning Algorithm for Knowledge Sharing in Internet of Vehicles) construct a secure hierarchical federated learning scheme to protect the privacy of local data models and to solve the security problem of resource sharing in the Internet of Vehicles.
Although the above work has contributed positively to privacy protection, further research is needed to ensure the reliability of the data sharing process. Therefore, to realize safe and reliable data sharing among parties that do not trust each other, a data sharing mechanism based on federated learning is proposed.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a data sharing method based on blockchain and federated learning that can effectively alleviate the privacy protection problem of data in the Internet of Things.
To solve this technical problem, the invention adopts the following technical solution:
A data sharing method based on blockchain and federated learning comprises the following steps:
organizing mutually trusted blockchain nodes into a team;
receiving a request task, and selecting a team that meets the credit rating requirement to respond to the request task;
receiving a data sharing task, and using the nodes in the team that meets the credit rating requirement to train a verification model until the verification model reaches a preset accuracy or the maximum training time;
packaging the model training process and storing it locally, reaching consensus among blockchain nodes through a consensus algorithm based on node contribution, and rewarding the team that meets the credit rating requirement with credit.
The beneficial effects of the invention are as follows. Mutually trusted blockchain nodes are organized into teams, and after a request task is received, a team that meets the credit rating requirement is selected to respond to it. After a data sharing task is received, the nodes in that team train a verification model until it reaches a preset accuracy or the maximum training time, so that a model is shared instead of raw data and the privacy of the data provider is protected. The model training process is packaged and stored locally, consensus is reached among blockchain nodes through a consensus algorithm based on node contribution, and the team is rewarded with credit. Each training process in the data sharing process is thus recorded, which ensures that data providers supply high-quality data; credit is awarded after consensus is reached and the credit rating is updated promptly, which keeps the credit rating reliable and effectively alleviates the privacy protection problem of data in the Internet of Things.
Drawings
FIG. 1 is a general flow chart of a federated learning-based data sharing strategy according to an embodiment of the present invention;
FIG. 2 is a block diagram of a federated learning-based data sharing strategy according to an embodiment of the present invention;
FIG. 3 is a detailed flowchart of a federated learning-based data sharing strategy according to an embodiment of the present invention.
Detailed Description
To explain the technical content, objectives, and effects of the present invention in detail, the following description is given with reference to the embodiments and the accompanying drawings.
Referring to FIG. 1 to FIG. 3, an embodiment of the present invention provides a data sharing method based on blockchain and federated learning, comprising the steps of:
organizing mutually trusted blockchain nodes into a team;
receiving a request task, and selecting a team that meets the credit rating requirement to respond to the request task;
receiving a data sharing task, and using the nodes in the team that meets the credit rating requirement to train a verification model until the verification model reaches a preset accuracy or the maximum training time;
packaging the model training process and storing it locally, reaching consensus among blockchain nodes through a consensus algorithm based on node contribution, and rewarding the team that meets the credit rating requirement with credit.
From the above description, the beneficial effects of the present invention are as follows. Mutually trusted blockchain nodes are organized into teams, and after a request task is received, a team that meets the credit rating requirement is selected to respond to it. After a data sharing task is received, the nodes in that team train a verification model until it reaches a preset accuracy or the maximum training time, so that a model is shared instead of raw data and the privacy of the data provider is protected. The model training process is packaged and stored locally, consensus is reached among blockchain nodes through a consensus algorithm based on node contribution, and the team is rewarded with credit. Each training process in the data sharing process is thus recorded, which ensures that data providers supply high-quality data; credit is awarded after consensus is reached and the credit rating is updated promptly, which keeps the credit rating reliable and effectively alleviates the privacy protection problem of data in the Internet of Things.
Further, organizing mutually trusted blockchain nodes into a team comprises:
providing a preset amount of mortgage when the team is formed, and calculating the penalty coefficient k of a node with bad behavior:
Figure BDA0003415034900000041
where v is the total number of work rounds in which the node completes the cooperative task, p is the number of times the node temporarily exits, and q is the number of times the node is lazy;
calculating the penalty value of the nodes in the team:
Figure BDA0003415034900000042
where
Figure BDA0003415034900000043
is the mortgage of node i;
calculating the compensation value $C_1$ of the nodes without bad behavior in the team:
Figure BDA0003415034900000051
where N is the total number of nodes in the team.
As described above, a preset amount of mortgage is provided when a team is formed; the penalty coefficient of each node with bad behavior is calculated, the penalty value of the nodes in the team is calculated from that coefficient, and the compensation value of each node in the team is then calculated. A "mortgage-penalty" team management mechanism is thus designed for possible bad behavior by nodes: it makes up for the losses of the other nodes and lets each team further manage and supervise its members so that they complete the data sharing task efficiently and reliably.
Further, organizing mutually trusted blockchain nodes into a team further comprises:
setting the original credit of the leader node and of each member node in the team to zero;
calculating the reward value of the leader node in the team:
Figure BDA0003415034900000052
where Credit denotes the credit reward provided by the task publisher, and $W_k$ denotes the weight ratio of the contribution of data node k to the global model;
calculating the reward value of each member node:
Figure BDA0003415034900000053
calculating the credit value $C_2$ of each node:
$C_2 = C_{base} + C_{obtain}$
where $C_{base}$ denotes the node's original accumulated credit value.
As can be seen from the above description, to encourage nodes to train honestly and effectively, a credit rating mechanism is introduced that rewards or punishes nodes according to their contributions, which keeps the credit rating accurate.
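The credit update above can be illustrated with a minimal sketch. Only the relation C2 = Cbase + Cobtain and the zero starting credit come from the text. The `Node` class, the rule that a team's rating is the mean credit of its members, and the `required` threshold are hypothetical choices made for illustration, and the obtained reward is passed in as a plain number because the reward formulas appear only as figures.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    credit: float = 0.0  # original credit starts at zero, as stated in the text


def update_credit(node: Node, obtained: float) -> float:
    """Apply C2 = C_base + C_obtain and store the result on the node."""
    node.credit = node.credit + obtained
    return node.credit


def team_credit(members) -> float:
    """Assumption: a team's rating is the mean credit of its members."""
    return sum(m.credit for m in members) / len(members)


def eligible_teams(teams, required: float):
    """Return the names of the teams whose rating meets the requirement."""
    return [name for name, members in teams.items()
            if team_credit(members) >= required]
```

A negative `obtained` value models a punishment, so one function covers both rewards and penalties.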
Further, the data sharing task includes the ID of the task requester, the requested task category, a timestamp, and a task level.
As can be seen from the above description, including the requester's ID, the requested task category, the timestamp, and the task level in the data sharing task allows the responding team to handle the task properly and to train the corresponding data model.
Further, selecting a team that meets the credit rating requirement to respond to the request task comprises:
verifying the identifier of the request task at the node connected to the task requester;
judging from the identifier whether the request task has already been processed; if so, directly returning the processing result found in the blockchain; otherwise, broadcasting the request task on the blockchain and selecting a team that meets the credit rating requirement to respond to it.
As can be seen from the above description, after a request task is received, its identifier is verified at the node connected to the task requester. If the identifier is found in the blockchain, the request task has already been processed and the stored result is returned; otherwise, the request task is broadcast and a team that meets the credit rating requirement is selected to respond, which ensures that a request task is not executed twice.
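The lookup-then-broadcast flow can be sketched as follows. This is an illustrative outline only: the `RequestTask` and `Ledger` classes, the way the identifier is derived from the task fields, and the per-team credit dictionary are assumptions, and a plain in-memory dictionary stands in for the blockchain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestTask:
    requester_id: str
    category: str
    timestamp: int
    level: int

    @property
    def identifier(self) -> str:
        # Hypothetical identifier built from requester, category and level.
        return f"{self.requester_id}:{self.category}:{self.level}"


class Ledger:
    """In-memory stand-in for the blockchain used in this sketch."""

    def __init__(self):
        self.results = {}     # identifier -> stored processing result
        self.broadcasts = []  # tasks that were broadcast to the teams

    def handle(self, task, team_credits, required_credit):
        # Already processed: return the stored result, do not run it again.
        if task.identifier in self.results:
            return ("cached", self.results[task.identifier])
        # New request: broadcast it and pick the teams that qualify.
        self.broadcasts.append(task)
        responders = sorted(team for team, credit in team_credits.items()
                            if credit >= required_credit)
        return ("broadcast", responders)

    def record_result(self, task, result):
        self.results[task.identifier] = result
```

Submitting the same task a second time after `record_result` returns the cached entry and adds nothing to `broadcasts`.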
Further, using the nodes in the team that meets the credit rating requirement to train a verification model until the verification model reaches a preset accuracy or the maximum training time comprises:
training a verification model locally on one node in the team that meets the credit rating requirement, and signing the model parameters of the verification model with a private key;
sending the signed model parameters to an unused node in the team, which updates the model parameters;
sending the updated model parameters to another unused node in the team, until the verification model reaches the preset accuracy or the maximum training time.
As can be seen from the above description, a verification model is generated when the data model is trained, and its model parameters are signed with a private key. The signed model parameters are sent to a randomly chosen unused node, which updates them using its local data and then sends the updated parameters to another randomly chosen unused node; the process repeats until the verification model reaches the preset accuracy or the maximum training time. The nodes in the team thus train the global model through federated learning, which reduces the computational burden on centralized equipment and protects the data privacy of data owners.
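The relay-style training loop can be sketched with a toy model. Everything concrete here is an assumption made for illustration: the one-parameter model y = w * x, mean squared error on a validation set as a stand-in for "accuracy", a hop limit as a stand-in for the maximum training time, and restarting the pool of unused nodes once every node has been visited. Private-key signing of the parameters is omitted.

```python
import random


def local_update(w, data, lr=0.05, steps=20):
    """Gradient descent on mean squared error for the model y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def relay_training(node_data, validation, target_mse, max_hops, seed=0):
    """Pass the parameter from one randomly chosen unused node to the next."""
    rng = random.Random(seed)
    w, hops = 0.0, 0
    unused = list(node_data)
    visited = []
    while hops < max_hops:
        if not unused:
            # Assumption: start a new pass once every node has been used.
            unused = list(node_data)
        node = unused.pop(rng.randrange(len(unused)))
        w = local_update(w, node_data[node])  # update with local data only
        visited.append(node)
        hops += 1
        if mse(w, validation) <= target_mse:  # preset accuracy reached
            break
    return w, visited
```

Only the parameter `w` moves between nodes in this sketch; each node's `(x, y)` pairs never leave `local_update`, which is the model-sharing idea the text describes.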
Further, training a verification model using the nodes in the team that meets the credit rating requirement further comprises:
given a randomized algorithm G and two adjacent data sets D and D′ that differ in at most one record, bounding the probability that G produces a result in any output set O on the two data sets:
$\Pr[G(D) \in O] \le \exp(\varepsilon) \cdot \Pr[G(D') \in O]$;
where G denotes the randomized algorithm, ε denotes the privacy budget, usually a small constant, D and D′ denote the adjacent data sets, and O denotes a set of possible outputs;
calculating the sensitivity:
$\Delta f = \max_{D, D'} \lVert G(D) - G(D') \rVert$;
applying the Laplace mechanism to the global model according to the sensitivity:
$G = G_m + \mathrm{Lap}(\Delta f / \varepsilon)$;
where $G_m$ is the trained global model.
As can be seen from the above description, adding Laplace noise to the global data sharing model applies differential privacy to data sharing, which prevents inference attacks launched by data requesters and provides further privacy protection for the data.
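The Laplace mechanism G = Gm + Lap(Δf/ε) can be sketched in a few lines. The function names and the representation of the global model as a flat list of floats are assumptions. The noise is drawn as the difference of two exponential samples, which follows a Laplace distribution with the given scale.

```python
import random


def laplace_noise(scale, rng):
    """Sample Lap(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(params, sensitivity, epsilon, seed=None):
    """Return G_m plus Laplace noise with scale = sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    # Each parameter receives its own independent noise sample.
    return [p + laplace_noise(scale, rng) for p in params]
```

A smaller ε (a tighter privacy budget) gives a larger scale and therefore noisier parameters, which is the usual trade-off between privacy and model accuracy.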
Further, packaging the model training process and storing it locally comprises:
treating all sharing records between the data requester and the data nodes as sharing transactions;
packaging the sharing transactions into blocks at the transaction recording node and storing the blocks locally.
As can be seen from the above description, all sharing records between the data requester and the data nodes are packaged into blocks by the transaction recording node. A blockchain is thereby introduced into the data sharing process, and each training process is recorded to ensure that data providers supply high-quality data.
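Packaging sharing transactions into hash-linked blocks can be sketched as follows. The block layout (`index`, `prev_hash`, `transactions`, `hash`), the use of SHA-256 over a JSON serialization, and the transaction fields in the usage note are illustrative assumptions; the source does not specify a block format.

```python
import hashlib
import json


def block_hash(body):
    """Hash the block body deterministically (keys sorted before hashing)."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def pack_block(transactions, prev_hash, index):
    """Package a list of sharing transactions into one block."""
    body = {"index": index, "prev_hash": prev_hash,
            "transactions": transactions}
    return {**body, "hash": block_hash(body)}


def verify_chain(chain):
    """Check every block's own hash and its link to the previous block."""
    for i, block in enumerate(chain):
        body = {k: block[k] for k in ("index", "prev_hash", "transactions")}
        if block["hash"] != block_hash(body):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True
```

For example, after building a genesis block with `pack_block([], "0" * 64, 0)` and a second block holding one record such as `{"requester": "r1", "node": "n1", "round": 1}`, changing that record afterwards makes `verify_chain` return `False`, which is what makes the recorded training history auditable.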
Further, reaching consensus among blockchain nodes through the consensus algorithm based on node contribution and rewarding the team that meets the credit rating requirement with credit comprises:
executing the consensus process on the nodes that execute the data sharing task, with each node competing through a work contribution mechanism for the opportunity to write transaction records into a block;
broadcasting, by the node that obtains this right, the corresponding block to the other nodes for verification, and adding the block to the blockchain for auditing after verification passes.
As can be seen from the above description, in the consensus process each node competes through the work contribution mechanism for the opportunity to write transaction records into a block; granting the write according to contribution reduces the computational burden on the devices.
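The write-then-verify round can be sketched as follows. The source does not say how the competition is decided or how many verifiers must agree, so two assumptions are made here: the node with the largest contribution wins the right to write (ties broken by name), and a block is accepted when a strict majority of the other nodes approve it. The validator callables are hypothetical.

```python
def select_writer(contributions):
    """Return the node with the largest contribution (ties broken by name)."""
    return max(sorted(contributions), key=lambda name: contributions[name])


def commit_block(chain, block, writer, validators):
    """Append the block if a strict majority of the other nodes approve it.

    validators maps a node name to a callable that takes the block and
    returns True when the block passes that node's checks.
    """
    votes = [check(block) for name, check in validators.items()
             if name != writer]
    if votes and sum(votes) * 2 > len(votes):
        chain.append(block)
        return True
    return False
```

Because the writer is chosen from values the nodes already computed during training, no additional puzzle has to be solved, which matches the reduced computational burden the text claims.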
Further, the consensus algorithm based on node contribution comprises the steps of:
calculating the contribution of each node according to cosine similarity:
Figure BDA0003415034900000071
where
Figure BDA0003415034900000072
denotes the actual update gradient of node k,
Figure BDA0003415034900000081
denotes the local update gradient of the k-th node,
Figure BDA0003415034900000082
denotes the gradient of the model before data node k is updated, and
Figure BDA0003415034900000083
denotes the gradient of the global model;
executing a reward mechanism based on the contribution weight ratio;
calculating the contribution value through the mapping function:
Figure BDA0003415034900000084
calculating the weight ratio of each node's contribution to the global model using the softmax function;
calculating the softmax function value:
Figure BDA0003415034900000085
As described above, the contribution value of each node is calculated from the cosine value, and the consensus algorithm based on data node contribution is used to reach consensus among the blockchain nodes, which makes the subsequent credit reward straightforward.
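A minimal sketch of turning cosine similarities into softmax weights follows. The exact formulas appear only as figures in the source, so the forms below are assumptions: each node's score is the cosine similarity between its update vector and the global update vector, the mapping function is treated as the identity, and a zero-length update scores 0.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero update contributes no direction
    return dot / (norm_a * norm_b)


def softmax(values):
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def contribution_weights(local_updates, global_update):
    """Map each node to its share of the total contribution."""
    names = sorted(local_updates)
    scores = [cosine_similarity(local_updates[n], global_update)
              for n in names]
    return dict(zip(names, softmax(scores)))
```

Under these assumptions, a node that forwards the parameters unchanged has a zero update and receives the same weight as a node whose update is orthogonal to the global direction, so its share is lower than that of nodes that moved the model in the global direction.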
The data sharing method based on blockchain and federated learning disclosed by the invention is suited to the Internet of Things setting, where peer-to-peer federated learning is used to share models and thereby protect the privacy of data providers. Specific embodiments are described below:
Example One
Referring to FIG. 1 to FIG. 3, a data sharing method based on blockchain and federated learning includes the steps of:
S1. Organizing mutually trusted blockchain nodes into a team.
Each team has a team leader who is responsible for receiving data sharing tasks, supervising the federated learning process during data sharing, and sending the global model with differential privacy to the task publisher.
Because data nodes in step S1 may behave selfishly, an internal team management mechanism based on "mortgage-penalty" is designed to address the problem. It comprises the following steps:
S11. Providing a preset amount of mortgage when the team is formed, and calculating the penalty coefficient k of a node with bad behavior:
Figure BDA0003415034900000086
where v is the total number of work rounds in which the node completes the cooperative task, p is the number of times the node temporarily exits, and q is the number of times the node is lazy.
S12. Calculating the penalty value of the nodes in the team:
Figure BDA0003415034900000087
where
Figure BDA0003415034900000091
is the mortgage of node i.
S13. Calculating the compensation value $C_1$ of the nodes without bad behavior in the team:
Figure BDA0003415034900000092
where N is the total number of nodes in the team.
Further, to encourage the data nodes in step S1 to train honestly and effectively, a credit rating mechanism is introduced that rewards or punishes data nodes according to their contributions. It comprises the following steps:
S14. Setting the original credit of the leader node and of each member node in the team to zero.
S15. Calculating the reward value of the leader node in the team:
Figure BDA0003415034900000093
where Credit denotes the credit reward provided by the task publisher, and $W_k$ denotes the weight ratio of the contribution of data node k to the global model.
S16. Calculating the reward value of each member node:
Figure BDA0003415034900000094
S17. Calculating the credit value $C_2$ of each node:
$C_2 = C_{base} + C_{obtain}$
where $C_{base}$ denotes the node's original accumulated credit value.
S2. Receiving the request task, and selecting a team that meets the credit rating requirement to respond to the request task.
Specifically, federated learning is a distributed machine learning framework. By aggregating the locally trained models of data owners (rather than raw data), it reduces the computational burden on centralized devices and also protects the data owners' privacy. Step S2 further includes:
S21. Initiating a data sharing request task: the data requester initiates a data sharing request. The task contains the requester's ID, the requested task category, a timestamp, and the task level, and is signed with the requester's private key.
S22. Team response: after a data requester issues a request task, the node connected to the requester first verifies its identity and then searches the blockchain to determine whether the request has been processed before. If a cached record exists, the query result is returned directly. If the request is new, the task is broadcast on the blockchain, and the data teams that meet the credit requirement respond to it.
S3. Receiving a data sharing task, and using the nodes in the team that meets the credit rating requirement to train a verification model until the verification model reaches the preset accuracy or the maximum training time.
The data sharing task comprises the ID of the task requester, the requested task category, a timestamp, and a task level.
S31. Training a verification model locally on one node in the team that meets the credit rating requirement, and signing the model parameters of the verification model with a private key.
S32. Sending the signed model parameters to an unused node in the team, which updates the model parameters.
S33. Sending the updated model parameters to another unused node in the team, until the verification model reaches the preset accuracy or the maximum training time.
S4. Packaging the model training process and storing it locally, reaching consensus among blockchain nodes through a consensus algorithm based on node contribution, and rewarding the team that meets the credit rating requirement with credit.
Specifically, the blockchain serves as a distributed shared ledger and database. It is decentralized, tamper-resistant, traceable, collectively maintained, open, and transparent, and can therefore provide reliable technical support for privacy protection in data sharing.
S41. Treating all sharing records between the data requester and the data nodes as sharing transactions.
S42. Packaging the sharing transactions into blocks at the transaction recording node and storing the blocks locally.
S43. Executing the consensus process on the nodes that execute the data sharing task, with each node competing through the work contribution mechanism for the opportunity to write transaction records into a block.
S44. Broadcasting, by the node that obtains this right, the corresponding block to the other nodes for verification, and adding the block to the blockchain for auditing after verification passes.
Therefore, in this embodiment, combining blockchain and federated learning not only addresses the privacy and security problems of data sharing in a distributed scenario but also improves the quality of the shared data. The sharing records of each participant can be traced, which enables security audits.
Example Two
Referring to FIG. 1 to FIG. 3, the present embodiment differs from the first embodiment in that differential privacy is further applied to data sharing. Specifically, because a malicious data requester may launch an attack, the team leader should add perturbation to the model using a model protection method based on differential privacy, which comprises the following steps:
given a randomized algorithm G and two adjacent data sets D and D′ that differ in at most one record;
bounding, according to equation (7), the probability that G produces a result in any output set O on the two data sets:
$\Pr[G(D) \in O] \le \exp(\varepsilon) \cdot \Pr[G(D') \in O]$;
where G denotes the randomized algorithm, ε denotes the privacy budget, usually a small constant, D and D′ denote the adjacent data sets, and O denotes a set of possible outputs;
calculating the sensitivity:
$\Delta f = \max_{D, D'} \lVert G(D) - G(D') \rVert$;
applying the Laplace mechanism to the global model according to the sensitivity:
$G = G_m + \mathrm{Lap}(\Delta f / \varepsilon)$;
where $G_m$ is the trained global model.
Example three
Referring to fig. 1 to fig. 3, the difference between the present embodiment and the first and second embodiments is that steps of a consensus algorithm based on node contribution are further defined, specifically:
calculating the contribution of each node according to the cosine similarity:
Figure BDA0003415034900000111
wherein
Figure BDA0003415034900000112
represents the actual update gradient of node k,
Figure BDA0003415034900000113
represents the local update gradient of the k-th node,
Figure BDA0003415034900000114
represents the gradient of the model before data node k updates it,
Figure BDA0003415034900000115
represents the gradient of the global model;
performing a reward mechanism based on the contribution weight ratio;
the contribution value is calculated by the mapping function:
Figure BDA0003415034900000116
calculating the weight ratio of the node contribution to the global model by using a soft-max function;
calculating the function value of soft-max:
Figure BDA0003415034900000121
Therefore, the consensus mechanism in this embodiment has the advantage of discouraging lazy behavior by nodes: in multi-party collaborative model training, a lazy node might otherwise copy the previous model parameters directly to the next data node without training.
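The contribution-based weighting can be sketched as follows. The formulas of this embodiment are given as figures that are not reproduced in this text, so the sketch relies on assumptions: the actual update of node k is taken as its local update minus the model it received, the contribution is the cosine similarity between that actual update and the global gradient, and the similarity is passed directly to soft-max without the mapping function. All names are hypothetical.

```python
import math


def cosine_similarity(u, v) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero update (e.g. a copied model) has no direction
    return dot / (norm_u * norm_v)


def softmax(values):
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in values]
    total = sum(exps)
    return [e / total for e in exps]


def contribution_weights(local_updates, previous_models, global_gradient):
    """Weight ratio of each node's contribution to the global model."""
    sims = []
    for local, prev in zip(local_updates, previous_models):
        # Assumed definition: actual update = local update - received model.
        actual = [l - p for l, p in zip(local, prev)]
        sims.append(cosine_similarity(actual, global_gradient))
    return softmax(sims)


global_gradient = [1.0, 1.0]
previous = [[0.0, 0.0], [0.0, 0.0], [0.5, 0.5]]
local = [[1.0, 0.9], [1.0, -1.0], [0.5, 0.5]]  # third node copied its input
weights = contribution_weights(local, previous, global_gradient)
```

In this example the third node forwards the parameters it received unchanged, so its actual update is zero and it receives a lower weight than the first node, whose update is aligned with the global gradient.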
In summary, the data sharing method based on blockchain and federated learning provided by the invention has the following beneficial effects. (1) From the data sharing perspective: the invention provides a data sharing mechanism based on federated learning that converts the data sharing problem into a model sharing problem and realizes team-based data sharing. A reward and punishment mechanism is also introduced: the data requester rewards or penalizes each team according to the result of data sharing, which motivates team members to complete data sharing with high quality and high reliability. To further penalize members that provide unreliable data, a "mortgage-penalty" mechanism is introduced, so that each team can manage and supervise its members and the data sharing task is completed efficiently and reliably. (2) From the privacy protection perspective: Laplace noise is added to the global data sharing model, that is, differential privacy is applied to data sharing, which prevents inference attacks launched by a data requester and provides further privacy protection for the data.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent changes made by using the contents of the present specification and the drawings, or applied directly or indirectly to the related technical fields, are included in the scope of the present invention.

Claims (10)

1. A data sharing method based on block chain and federal learning is characterized by comprising the following steps:
building mutually trusted blockchain nodes into a team;
receiving a request task, and selecting a team meeting the credit rating requirement to respond to the request task;
receiving a data sharing task, and using the nodes in the team meeting the credit rating requirement to train a verification model until the verification model reaches preset accuracy or maximum training time;
packaging the model training process into blocks stored locally, reaching consensus among the blockchain nodes based on a consensus algorithm of node contribution, and awarding credit to the team meeting the credit rating requirement.
2. The method for data sharing based on blockchain and federal learning of claim 1, wherein building mutually trusted blockchain nodes into a team comprises:
providing a preset amount of mortgage when a team is built, and calculating a penalty coefficient k of a bad behavior node:
Figure FDA0003415034890000011
in the formula, v represents the total number of work rounds of the nodes for completing the cooperative task, p represents the temporary exit frequency of the nodes, and q represents the lazy frequency of the nodes;
calculating the penalty value of the nodes in the team:
Figure FDA0003415034890000012
where
Figure FDA0003415034890000013
is the mortgage of node i;
computing the compensation value C_1 of the nodes without bad behavior in the team:
Figure FDA0003415034890000014
where N represents the total number of nodes in the team.
3. The method for data sharing based on blockchain and federal learning of claim 2, wherein building mutually trusted blockchain nodes into a team further comprises:
setting the original credit of the leader node or each member node in the team to be zero;
calculating the reward value of the leader node in the team:
Figure FDA0003415034890000015
where Credit denotes the credit reward provided by the task publisher, and W_k denotes the weight ratio of the data node's contribution to the global model;
calculating a reward value for each member node:
Figure FDA0003415034890000021
calculating the credit value C_2 of each node:
C_2 = C_base + C_obtain
where C_base represents the node's previously accumulated credit value.
4. The data sharing method based on the block chain and the federal learning of claim 1, wherein the data sharing task comprises an ID of a task requester, a requested task category, a time stamp and a task level.
5. The method of claim 1, wherein selecting teams meeting credit rating requirements to respond to the requested task comprises:
verifying the identity of the requesting task according to the nodes connected to the requesting task;
and judging, according to the identification, whether the request task has already been processed; if so, directly returning the processing result queried from the blockchain; otherwise, broadcasting the request task on the blockchain and selecting a team meeting the credit rating requirement to respond to the request task.
6. The method of claim 1, wherein training a verification model using nodes in the team meeting the credit rating requirement until the verification model reaches a preset accuracy or a maximum training time comprises:
training a verification model locally by using one node in the team meeting the credit rating requirement, and carrying out private key signature on model parameters of the verification model;
sending the signed model parameter to an unused node in the team meeting the credit rating requirement, and updating the model parameter of the unused node;
sending the updated model parameters to another unused node in the team meeting the credit rating requirement until the verification model reaches a preset accuracy or a maximum training time.
7. The method of claim 1, wherein training a verification model using nodes in the team meeting the credit rating requirement further comprises:
given a random algorithm G and two adjacent data sets D and D′ that differ in at most one record, bounding, for any output set O, the probability that the random algorithm produces a result in O:
Pr[G(D) ∈ O] ≤ exp(ε)·Pr[G(D′) ∈ O];
where G denotes the random algorithm, ε denotes the privacy budget, usually a small constant, and D and D′ denote the adjacent data sets;
calculating the sensitivity:
Δf = max_{D,D′} ||G(D) - G(D′)||;
applying the Laplace mechanism to the global model according to the sensitivity:
G = G_m + Lap(Δf/ε);
where G_m is the trained global model.
8. The data sharing method based on blockchain and federal learning of claim 1, wherein the packing of the model training process to local comprises:
all sharing records between the data requester and the data nodes are used as sharing transactions;
and packaging the shared transaction into blocks through the transaction recording node and storing the blocks to the local.
9. The method for data sharing based on block chain and federal learning as claimed in claim 1, wherein reaching consensus among the blockchain nodes based on the consensus algorithm of node contribution and awarding credit to the team meeting the credit rating requirement comprises:
executing a consensus process by nodes executing a data sharing task, each node competing for the opportunity to write a transaction record into a block by a work contribution mechanism;
and broadcasting the corresponding block to other nodes by the node with the authority for verification, and adding the corresponding block to the block chain for auditing after the verification is passed.
10. The method for sharing data based on blockchain and federal learning as claimed in claim 1, wherein said consensus algorithm based on node contribution comprises the steps of:
calculating the contribution of each node according to the cosine similarity:
Figure FDA0003415034890000031
wherein
Figure FDA0003415034890000032
Figure FDA0003415034890000033
represents the actual update gradient of node k,
Figure FDA0003415034890000034
represents the local update gradient of the k-th node,
Figure FDA0003415034890000035
represents the gradient of the model before data node k updates it,
Figure FDA0003415034890000036
represents the gradient of the global model;
performing a reward mechanism based on the contribution weight ratio;
the contribution value is calculated by the mapping function:
Figure FDA0003415034890000041
calculating the weight ratio of the node contribution to the global model by using a soft-max function;
calculating the function value of soft-max:
Figure FDA0003415034890000042
CN202111543907.9A 2021-12-16 2021-12-16 Data sharing method based on block chain and federal learning Pending CN114417398A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111543907.9A CN114417398A (en) 2021-12-16 2021-12-16 Data sharing method based on block chain and federal learning

Publications (1)

Publication Number Publication Date
CN114417398A true CN114417398A (en) 2022-04-29

Family

ID=81267799

Country Status (1)

Country Link
CN (1) CN114417398A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115174404A (en) * 2022-05-17 2022-10-11 南京大学 Multi-device federal learning system based on SDN networking
CN114726551A (en) * 2022-06-06 2022-07-08 广州优刻谷科技有限公司 Meta-universe credit assessment method and device based on federal management
CN114726551B (en) * 2022-06-06 2022-08-16 广州优刻谷科技有限公司 Meta-universe credit assessment method and device based on federal management
CN115510494A (en) * 2022-10-13 2022-12-23 贵州大学 Multi-party safety data sharing method based on block chain and federal learning
CN115510494B (en) * 2022-10-13 2023-11-21 贵州大学 Multiparty safety data sharing method based on block chain and federal learning
CN116029370A (en) * 2023-03-17 2023-04-28 杭州海康威视数字技术股份有限公司 Data sharing excitation method, device and equipment based on federal learning of block chain
CN116029370B (en) * 2023-03-17 2023-07-25 杭州海康威视数字技术股份有限公司 Data sharing excitation method, device and equipment based on federal learning of block chain
CN117472866A (en) * 2023-12-27 2024-01-30 齐鲁工业大学(山东省科学院) Federal learning data sharing method under block chain supervision and excitation
CN117472866B (en) * 2023-12-27 2024-03-19 齐鲁工业大学(山东省科学院) Federal learning data sharing method under block chain supervision and excitation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination