CN114417398A - Data sharing method based on blockchain and federated learning

Info

Publication number: CN114417398A
Application number: CN202111543907.9A
Authority: CN
Inventors: 范新民; 妙秦阳; 汪晓丁; 张灵杰; 林晖
Original assignee: Fujian Normal University
Current assignee: Fujian Normal University
Priority date: 2021-12-16
Filing date: 2021-12-16
Publication date: 2022-04-29

Abstract

The invention discloses a data sharing method based on blockchain and federated learning. Blockchain nodes that trust each other are organized into teams, and when a request task is received, a team meeting the credit rating requirement is selected to respond to it. After a data sharing task is received, the nodes in the selected team train a verification model until it reaches a preset accuracy or the maximum training time, so that a model rather than raw data is shared and the privacy of the data provider is protected. The model training process is packaged into blocks and stored locally, consensus is reached among the blockchain nodes through a consensus algorithm based on node contribution, and credit is awarded to the team meeting the credit rating requirement. Every training process in the data sharing process is thereby recorded, which encourages data providers to supply high-quality data. Because credit is awarded after consensus is reached, credit ratings are updated promptly and remain reliable, and the privacy protection problem of data in the Internet of Things is alleviated.

Description

Data sharing method based on blockchain and federated learning

Technical Field

The invention relates to the technical field of data sharing in the Internet of Things, and in particular to a data sharing method based on blockchain and federated learning.

Background

With the development of Internet technology, the Internet of Things (IoT) is widely used in various industries. Sensors are an important component of the IoT and the most important data source of an IoT system. The sensing data collected by a single sensor often cannot meet users' requirements; the real value of the IoT lies in the comprehensive use and sharing of diverse data and information. For example, in healthcare, data sharing can make valuable health records available, including treatment and physical examination information, which supports targeted treatment of patients. In tourism, analyzing shared data makes it possible to learn tourists' preferences accurately and to predict future tourism hotspots, thereby improving service quality. However, data sharing in the IoT faces the following problems: first, it is difficult for organizations to establish mutual trust, so they are unlikely to share reliable local data; second, data privacy has become a major obstacle to data sharing, because data owners risk privacy disclosure. Achieving efficient data sharing therefore remains a challenge as long as these two problems are unsolved.

Machine learning techniques are widely used for data sharing. Traditional machine learning first collects data and then focuses on model training. However, large-scale data collection is often difficult because data owners are concerned about privacy disclosure. Federated learning is a distributed machine learning framework. By aggregating the locally trained models of data owners instead of their raw data, it both reduces the computational burden on centralized equipment and protects the data privacy of data owners. A blockchain serves as a distributed shared ledger and database; it is decentralized, tamper-resistant, traceable, collectively maintained, open, and transparent, and can provide reliable technical support for privacy protection in data sharing. For example, the blockchain can record the sharing behavior of each participant that provides a data model, which compels participants to provide reliable data models.

Secure data sharing in the Internet of Things is receiving more and more attention, and a large number of data sharing mechanisms based on blockchain and federated learning have been proposed: Gao et al. (Blockchain based secure IoT data sharing framework for SDN-enabled smart communities) propose a secure data sharing framework using blockchain and proxy re-encryption techniques; Xu et al. (BDSS-FA: A Blockchain-Based Data Security Sharing Platform With Fine-Grained Access Control) propose a new encryption algorithm based on hierarchical attributes, in which attributes are allocated by a blockchain-based authorization center to realize secure data sharing; Makhdoom et al. (PrivySharing: A blockchain-based framework for privacy-preserving and secure data sharing in smart cities) embed access control rules in smart contracts to control user access to data, and divide the blockchain into multiple channels to protect the privacy and security of the data; K. Yu et al. (Blockchain-Enhanced Data Sharing with Traceable and Direct Revocation in IIoT) propose an efficient and secure data sharing model based on attribute-based encryption that can resist various attacks; Hao et al. (Efficient and Privacy-Enhanced Federated Learning for Industrial Artificial Intelligence) propose an efficient federated learning scheme that guarantees data privacy, resists collusion attacks in a distributed environment, and prevents personal data from being disclosed; Sattler et al. (Robust and Communication-Efficient Federated Learning From Non-i.i.d. Data) propose a sparse compression framework suited to bandwidth-limited environments to reduce the communication overhead of model training; A. Imteaj and M. H. Amini (Distributed Sensing Using Smart End-User Devices: Pathway to Federated Learning for Autonomous IoT) improve federated learning by evaluating participants' model feedback and by a method for updating participant weights; Lu et al. (Blockchain and Federated Learning for Privacy-Preserved Data Sharing in Industrial IoT) combine data sharing, machine learning, blockchain, and federated learning to address privacy protection in data sharing; L. Yin et al. (A Privacy-Preserving Federated Learning for Multiparty Data Sharing in Social IoTs) protect the data privacy of data sharing participants in the social Internet of Things by combining federated learning with cryptography, and improve data transmission and storage efficiency by using sparse differential gradients; Chai et al. (A Hierarchical Blockchain-Enabled Federated Learning Algorithm for Knowledge Sharing in Internet of Vehicles) construct a secure hierarchical federated learning scheme to protect the privacy of local data models and address the security of resource sharing in the Internet of Vehicles.

Although the above work has contributed to privacy protection, further research is needed to ensure the reliability of the data sharing process. Therefore, to achieve secure and reliable data sharing among parties that do not trust one another, a data sharing mechanism based on federated learning is proposed.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a data sharing method based on blockchain and federated learning that can effectively alleviate the privacy protection problem of data in the Internet of Things.

In order to solve the above technical problem, the invention adopts the following technical solution:

a data sharing method based on blockchain and federated learning, comprising the following steps:

grouping mutually trusted blockchain nodes into a team;

receiving a request task, and selecting a team meeting the credit rating requirement to respond to the request task;

receiving a data sharing task, and using the nodes in the team meeting the credit rating requirement to train a verification model until the verification model reaches a preset accuracy or the maximum training time;

packaging the model training process and storing it locally, reaching consensus among the blockchain nodes through a consensus algorithm based on node contribution, and awarding credit to the team meeting the credit rating requirement.

The invention has the following beneficial effects: mutually trusted blockchain nodes are grouped into teams, and after a request task is received, a team meeting the credit rating requirement is selected to respond to it; after a data sharing task is received, the nodes in that team train a verification model until it reaches a preset accuracy or the maximum training time, so that a model is shared and the privacy of the data provider is protected; the model training process is packaged and stored locally, consensus is reached among the blockchain nodes through a consensus algorithm based on node contribution, and credit is awarded to the team meeting the credit rating requirement. Every training process in the data sharing process is thus recorded, which encourages data providers to supply high-quality data. Credit is awarded after consensus is reached and credit ratings are updated promptly, which keeps the credit ratings reliable and effectively alleviates the privacy protection problem of data in the Internet of Things.

Drawings

FIG. 1 is a general flow chart of a federated learning-based data sharing strategy according to an embodiment of the present invention;

FIG. 2 is a block diagram of a federated learning-based data sharing strategy according to an embodiment of the present invention;

FIG. 3 is a detailed flowchart of a federated learning-based data sharing strategy according to an embodiment of the present invention.

Detailed Description

In order to explain technical contents, achieved objects, and effects of the present invention in detail, the following description is made with reference to the accompanying drawings in combination with the embodiments.

Referring to FIG. 1 to FIG. 3, an embodiment of the present invention provides a data sharing method based on blockchain and federated learning, including the steps of:

grouping mutually trusted blockchain nodes into a team;

From the above description, the beneficial effects of the present invention are as follows: mutually trusted blockchain nodes are grouped into teams, and after a request task is received, a team meeting the credit rating requirement is selected to respond to it; after a data sharing task is received, the nodes in that team train a verification model until it reaches a preset accuracy or the maximum training time, so that a model is shared and the privacy of the data provider is protected; the model training process is packaged and stored locally, consensus is reached among the blockchain nodes through a consensus algorithm based on node contribution, and credit is awarded to the team meeting the credit rating requirement. Every training process in the data sharing process is thus recorded, which encourages data providers to supply high-quality data. Credit is awarded after consensus is reached and credit ratings are updated promptly, which keeps the credit ratings reliable and effectively alleviates the privacy protection problem of data in the Internet of Things.

Further, grouping mutually trusted blockchain nodes into a team comprises:

providing a preset amount of mortgage (deposit) when the team is built, and calculating the penalty coefficient k of a badly behaved node:

in the formula, v represents the total number of work rounds in which the node completes the cooperative task, p represents the number of times the node temporarily exits, and q represents the number of times the node is lazy;

calculating the penalty value of each node in the team:

in the formula, the mortgage symbol (not reproduced here) denotes the mortgage of node i;

calculating the compensation value C₁ of the nodes without bad behavior in the team:

where N represents the total number of nodes in the team.

As can be seen from the above description, a preset amount of mortgage (deposit) is provided when a team is built; the penalty coefficient of each badly behaved node is calculated, the penalty value of each node in the team is calculated from its penalty coefficient, and the compensation value of each node in the team is then calculated. A team management mechanism based on "mortgage-penalty" is thus designed to deal with possible bad behavior of nodes: it makes up for the losses of the other nodes and lets each team further manage and supervise its members so that the data sharing task is completed efficiently and reliably.
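The flow of the "mortgage-penalty" mechanism can be illustrated with a minimal Python sketch. The formulas for k, the penalty value, and C₁ are not reproduced in this text, so the forms used below are assumptions made only for illustration: k = (p + q) / v capped at 1, penalty = k × mortgage, and forfeited mortgages split equally among the nodes without bad behavior. The names `Node`, `penalty_coefficient`, and `settle` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    mortgage: float   # deposit provided when the team was built
    rounds: int       # v: total work rounds for the cooperative task
    exits: int = 0    # p: number of temporary exits
    lazy: int = 0     # q: number of lazy rounds


def penalty_coefficient(node: Node) -> float:
    """ASSUMED form k = (p + q) / v, capped at 1; not the patent's formula."""
    if node.rounds <= 0:
        return 0.0
    return min(1.0, (node.exits + node.lazy) / node.rounds)


def settle(team: list[Node]) -> tuple[dict[str, float], float]:
    """Return the penalty per node and the compensation C1 per well-behaved node."""
    penalties = {n.name: penalty_coefficient(n) * n.mortgage for n in team}
    good = [n for n in team if penalties[n.name] == 0.0]
    # ASSUMPTION: forfeited mortgages are shared equally by well-behaved nodes.
    c1 = sum(penalties.values()) / len(good) if good else 0.0
    return penalties, c1


team = [
    Node("a", mortgage=10.0, rounds=20),
    Node("b", mortgage=10.0, rounds=20, exits=2, lazy=3),
    Node("c", mortgage=10.0, rounds=20),
]
penalties, c1 = settle(team)
```

Under these assumed forms, node b forfeits a quarter of its mortgage and the two well-behaved nodes share that amount; any other monotone choice of k would follow the same flow.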

Further, grouping mutually trusted blockchain nodes into a team further comprises:

setting the original credit of the leader node and of each member node in the team to zero;

calculating the reward value of the leader node in the team:

in the formula, C_redit represents the credit reward provided by the task publisher, and W_k represents the weight ratio of the contribution of data node k to the global model;

calculating the reward value of each member node:

calculating the credit value C₂ of each node:

$$C_2 = C_{base} + C_{obtain}$$

in the formula, C_base represents the original accumulated credit of the node.

As can be seen from the above description, to encourage nodes to train honestly and effectively, a credit rating mechanism is introduced: nodes are rewarded or punished according to their contributions, which keeps the credit ratings accurate.

Further, the data sharing task includes an ID of the task requester, a requested task category, a timestamp, and a task level.

As can be seen from the above description, the data sharing task carries the ID of the task requester, the requested task category, a timestamp, and the task level, so that the responding team can handle the task and train the corresponding data model.

Further, selecting a team meeting the credit rating requirement to respond to the request task comprises:

verifying the identifier of the request task by the node connected to the task requester;

determining from the identifier whether the request task has already been processed; if so, directly returning the processing result found in the blockchain; otherwise, broadcasting the request task on the blockchain and selecting a team meeting the credit rating requirement to respond to the request task.

As can be seen from the above description, after a request task is received, the node connected to the task requester verifies the identifier of the request task. If the identifier is found in the blockchain, the request task has already been processed and the stored result is returned; otherwise the request task is broadcast and a team meeting the credit rating requirement is selected to respond to it, which ensures that a request task is not executed repeatedly.
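The request handling described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class `SharingTask`, the function `handle_request`, the way the identifier is derived, and the use of the task level as the credit threshold are all assumptions, and a plain dictionary stands in for the lookup in the blockchain.

```python
import hashlib
import json
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class SharingTask:
    requester_id: str
    category: str
    timestamp: float
    level: int  # ASSUMED to double as the minimum credit a team needs

    def identifier(self) -> str:
        # ASSUMPTION: a task is identified by requester, category and level,
        # so a repeated request maps to the same key whatever its timestamp.
        payload = json.dumps([self.requester_id, self.category, self.level])
        return hashlib.sha256(payload.encode()).hexdigest()


def handle_request(task: SharingTask, processed: dict, team_credit: dict):
    """Return ("cached", result) or ("broadcast", names of eligible teams)."""
    key = task.identifier()
    if key in processed:  # already processed: return the recorded result
        return "cached", processed[key]
    eligible = sorted(t for t, c in team_credit.items() if c >= task.level)
    return "broadcast", eligible


processed = {}  # stand-in for results recorded on the blockchain
team_credit = {"team-1": 5.0, "team-2": 1.0}
task = SharingTask("req-42", "traffic-forecast", time.time(), level=3)

status, teams = handle_request(task, processed, team_credit)    # new request
processed[task.identifier()] = "model-v1"                       # record result
status2, result = handle_request(task, processed, team_credit)  # repeat request
```

The second call returns the recorded result without selecting a team again, which is the de-duplication behavior the description attributes to the blockchain lookup.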

Further, using the nodes in the team meeting the credit rating requirement to train a verification model until the verification model reaches a preset accuracy or a maximum training time comprises:

training a verification model locally by using one node in the team meeting the credit rating requirement, and carrying out private key signature on model parameters of the verification model;

sending the signed model parameter to an unused node in the team meeting the credit rating requirement, and updating the model parameter of the unused node;

sending the updated model parameters to another unused node in the team meeting the credit rating requirement until the verification model reaches a preset accuracy or a maximum training time.

As can be seen from the above description, a verification model is generated when the data model is trained, and its model parameters are signed with a private key. The signed model parameters are sent to a randomly chosen unused node, which updates them with its local data and then sends the updated parameters to another randomly chosen unused node; this process repeats until the verification model reaches the preset accuracy or the maximum training time. The nodes in the team thus train the global model through federated learning, which reduces the computational burden on centralized equipment and protects the data privacy of data owners.
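The node-to-node training loop can be sketched with a toy model. Everything specific here is an assumption for illustration: the model is the one-parameter regression y = w·x, HMAC-SHA256 stands in for a private-key signature (the Python standard library has no asymmetric signatures), "accuracy" is measured as mean squared error on a validation set, and a new pass over the nodes starts once every node has been used.

```python
import hashlib
import hmac
import json
import random
import time


def sign(params: dict, key: bytes) -> str:
    # Stand-in for a private-key signature over the serialized parameters.
    return hmac.new(key, json.dumps(params).encode(), hashlib.sha256).hexdigest()


def local_update(w: float, data, lr: float = 0.05) -> float:
    # One gradient-descent step for y = w * x using this node's data only.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad


def mse(w: float, data) -> float:
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train_in_team(nodes, validation, target_mse=1e-3, max_seconds=2.0, seed=0):
    """nodes: {name: (key, local_data)}. Returns (w, [(node, signature), ...])."""
    rng = random.Random(seed)
    deadline = time.monotonic() + max_seconds
    w, history, unused = 0.0, [], []
    # Stop at the preset accuracy or at the maximum training time.
    while mse(w, validation) > target_mse and time.monotonic() < deadline:
        if not unused:  # ASSUMPTION: start a new pass once all nodes were used
            unused = list(nodes)
            rng.shuffle(unused)
        name = unused.pop()  # random node not yet used in this pass
        key, data = nodes[name]
        w = local_update(w, data)  # raw data never leaves the node
        history.append((name, sign({"w": w}, key)))  # signed params move on
    return w, history


nodes = {
    "n1": (b"k1", [(1.0, 3.0), (2.0, 6.0)]),
    "n2": (b"k2", [(1.5, 4.5)]),
    "n3": (b"k3", [(0.5, 1.5), (2.0, 6.0)]),
}
validation = [(1.0, 3.0), (2.0, 6.0)]
w, history = train_in_team(nodes, validation)
```

Only the parameter `w` and its signature travel between nodes; each node's `(x, y)` pairs stay local, which is the privacy property the description relies on.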

Further, training a verification model using the nodes in the team meeting the credit rating requirement further comprises:

given a randomized algorithm G and two adjacent data sets D and D′ that differ in at most one record (for example, by the removal of a single record), bounding the probabilities that G produces an output in the same set O on the two data sets:

$$\Pr[G(D) \in O] \le \exp(\varepsilon) \cdot \Pr[G(D') \in O]$$

where G denotes the randomized algorithm, ε denotes the privacy budget, usually a small constant, D and D′ denote the adjacent data sets, and O denotes any set of possible outputs;

calculating the sensitivity:

$$\Delta f = \max_{D, D'} \lVert G(D) - G(D') \rVert$$

applying the Laplace mechanism to the global model according to the sensitivity:

$$G = G_m + \mathrm{Lap}(\Delta f / \varepsilon)$$

where G_m is the trained global model.

As can be seen from the above description, Laplace noise is added to the global data sharing model so that differential privacy is applied to data sharing, which prevents inference attacks initiated by data requesters and provides further privacy protection for the data.
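The Laplace mechanism G = G_m + Lap(Δf/ε) can be sketched in a few lines of Python. This is an illustrative sketch under assumptions: the global model is represented as a flat list of parameters, the same noise scale Δf/ε is applied to every parameter, and the function names `laplace_noise` and `privatize` are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent Exp(1) draws follows Laplace(0, 1);
    # multiplying by `scale` gives Laplace(0, scale).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def privatize(global_model, sensitivity: float, epsilon: float, seed=None):
    """Return G = G_m + Lap(sensitivity / epsilon), applied per parameter."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [w + laplace_noise(scale, rng) for w in global_model]


# Hypothetical trained parameters G_m, sensitivity and privacy budget.
released = privatize([0.8, -1.2, 0.3], sensitivity=0.1, epsilon=0.5, seed=7)
```

A smaller ε gives a larger noise scale and therefore stronger privacy at the cost of model accuracy, so the team leader would tune ε against the accuracy the requester needs.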

Further, packaging the model training process and storing it locally comprises:

treating all sharing records between the data requester and the data nodes as sharing transactions;

packaging the sharing transactions into blocks by the transaction recording node and storing the blocks locally.

As can be seen from the above description, all shared records between the data requester and the data node are packaged into blocks by the transaction record node, so a blockchain is introduced into the data sharing process, and each training process is recorded to ensure that the data provider provides high-quality data.
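Packaging sharing transactions into hash-linked blocks can be sketched as follows. The block layout (fields `prev_hash`, `recorder`, `timestamp`, `transactions`, `hash`) and the function names are assumptions chosen for illustration; the patent text does not specify a block format.

```python
import hashlib
import json
import time


def block_hash(block: dict) -> str:
    # Hash every field except the stored hash itself.
    body = {k: v for k, v in block.items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def pack_block(transactions, prev_hash: str, recorder: str) -> dict:
    """Package sharing transactions into a block linked to its predecessor."""
    block = {
        "prev_hash": prev_hash,
        "recorder": recorder,  # the transaction recording node
        "timestamp": time.time(),
        "transactions": list(transactions),
    }
    block["hash"] = block_hash(block)
    return block


def verify_chain(chain) -> bool:
    """Check each block's own hash and its link to the previous block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


genesis = pack_block([], "0" * 64, "recorder-1")
record = {"requester": "req-42", "node": "n1", "round": 1}  # hypothetical record
chain = [genesis, pack_block([record], genesis["hash"], "recorder-1")]
```

Because each block stores the hash of its predecessor, altering a recorded training step changes the block hash and is detected by `verify_chain`, which is what makes the recorded sharing behavior auditable.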

Further, reaching consensus among the blockchain nodes through the consensus algorithm based on node contribution and awarding credit to the team meeting the credit rating requirement comprises:

executing a consensus process by the nodes executing the data sharing task, each node competing through a work contribution mechanism for the opportunity to write a transaction record into a block;

broadcasting, by the node that obtains the write permission, the corresponding block to the other nodes for verification, and adding the block to the blockchain for auditing after the verification passes.

As can be seen from the above description, in the consensus process each node competes through the work contribution mechanism for the opportunity to write the transaction record into a block; writing transactions according to contribution reduces the computational burden on the devices.

Further, the consensus algorithm based on node contribution comprises the steps of:

calculating the contribution of each node according to the cosine similarity:

where the symbols of the formula (not reproduced here) denote, respectively, the actual update gradient of node k, the local update gradient of the k-th node, the gradient of the model before data node k updates it, and the gradient of the global model;

performing a reward mechanism based on the contribution weight ratio;

calculating the contribution value through a mapping function:

calculating the weight ratio of each node's contribution to the global model using the softmax function;

calculating the function value of softmax:

As can be seen from the above description, the contribution of each node is calculated from the cosine similarity, and consensus is reached among the blockchain nodes through the consensus algorithm based on data node contribution, which facilitates the subsequent credit reward.
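The contribution-based selection can be illustrated with a short Python sketch. The contribution formula and the mapping function are not reproduced in this text, so the sketch assumes that a node's contribution is the cosine similarity between its update gradient and the global gradient, that the mapping clips negative similarity to zero, and that the node with the largest softmax weight obtains the right to write the block. All function names are hypothetical.

```python
import math


def cosine_similarity(u, v) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0  # a zero update contributes nothing


def softmax(values):
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in values]
    total = sum(exps)
    return [e / total for e in exps]


def contribution_weights(updates: dict, global_gradient) -> dict:
    """Map each node's update gradient to its softmax weight ratio."""
    names = list(updates)
    # ASSUMED mapping function: clip negative similarity to zero.
    scores = [max(0.0, cosine_similarity(updates[n], global_gradient))
              for n in names]
    return dict(zip(names, softmax(scores)))


updates = {
    "n1": [1.0, 0.0],
    "n2": [0.6, 0.8],
    "lazy": [0.0, 0.0],  # forwarded the previous parameters unchanged
}
weights = contribution_weights(updates, global_gradient=[1.0, 0.0])
writer = max(weights, key=weights.get)  # ASSUMPTION: top weight writes the block
```

A node that merely forwards the parameters it received has a zero update gradient and therefore the lowest weight, so under these assumptions it is the least likely to win the write permission or the associated credit.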

The data sharing method based on blockchain and federated learning of the invention is suitable for sharing models through peer-to-peer federated learning in the Internet of Things setting so as to protect the privacy of data providers. It is described below through specific embodiments:

Example One

Referring to FIG. 1 to FIG. 3, a data sharing method based on blockchain and federated learning includes the steps of:

S1, grouping mutually trusted blockchain nodes into a team.

Each team has a team leader who is responsible for receiving data sharing tasks, supervising the federated learning process during data sharing, and sending the global model protected by differential privacy to the task publisher.

The data nodes in step S1 may behave selfishly. To address this, an internal team management mechanism based on "mortgage-penalty" is designed, which specifically includes the following steps:

S11, providing a preset amount of mortgage (deposit) when the team is built, and calculating the penalty coefficient k of a badly behaved node:

in the formula, v represents the total number of work rounds in which the node completes the cooperative task, p represents the number of times the node temporarily exits, and q represents the number of times the node is lazy.

S12, calculating the penalty value of each node in the team:

in the formula, the mortgage symbol (not reproduced here) denotes the mortgage of node i.

S13, calculating the compensation value C₁ of the nodes without bad behavior in the team:

where N represents the total number of nodes in the team.

Further, to encourage the data nodes in step S1 to train honestly and effectively, a credit rating mechanism is introduced, under which nodes are rewarded or punished according to their contributions. The specific steps are as follows:

S14, setting the original credit of the leader node and of each member node in the team to zero.

S15, calculating the reward value of the leader node in the team:

in the formula, C_redit represents the credit reward provided by the task publisher, and W_k represents the weight ratio of the contribution of data node k to the global model.

S16, calculating the reward value of each member node:

S17, calculating the credit value C₂ of each node:

$$C_2 = C_{base} + C_{obtain}$$

S2, receiving a request task, and selecting a team meeting the credit rating requirement to respond to the request task.

Specifically, step S2 builds on federated learning, a distributed machine learning framework. By aggregating the locally trained models of data owners (rather than their raw data), it both reduces the computational burden on centralized devices and protects the data privacy of data owners.

S21, initiating a data sharing request task: the data requester initiates a data sharing request. The task contains the requester's ID, the requested task category, a timestamp, and the task level, and is signed with the requester's private key.

S22, team response to the task: after a data requester issues a request task, the node connected to the requester first verifies its identity and then searches the blockchain to determine whether the request has been processed before. If a cached record exists, the query result is returned directly. If it is a new request, the task is broadcast on the blockchain and the data teams meeting the credit requirement respond to the task.

S3, receiving a data sharing task, and training a verification model using the nodes in the team meeting the credit rating requirement until the verification model reaches the preset accuracy or the maximum training time.

The data sharing task comprises an ID of a task requester, a requested task category, a time stamp and a task level.

S31, training a verification model locally by using one node in the team meeting the credit rating requirement, and carrying out private key signature on model parameters of the verification model.

S32, sending the signed model parameters to an unused node in the team meeting the credit rating requirement, and updating the model parameters of the unused node.

S33, sending the updated model parameters to another unused node in the team meeting the credit rating requirement until the verification model reaches the preset accuracy or the maximum training time.

S4, packaging the model training process and storing it locally, reaching consensus among the blockchain nodes through the consensus algorithm based on node contribution, and awarding credit to the team meeting the credit rating requirement.

Specifically, the blockchain serves as a distributed shared ledger and database; it is decentralized, tamper-resistant, traceable, collectively maintained, open, and transparent, and can provide reliable technical support for privacy protection in data sharing.

S41, treating all sharing records between the data requester and the data nodes as sharing transactions.

S42, packaging the sharing transactions into blocks by the transaction recording node and storing the blocks locally.

S43, executing the consensus process by the nodes executing the data sharing task, wherein each node competes through the work contribution mechanism for the opportunity to write the transaction record into a block.

S44, the node that obtains the write permission broadcasts the corresponding block to the other nodes for verification, and the block is added to the blockchain for auditing after the verification passes.

Therefore, in this embodiment, the combination of blockchain and federated learning not only addresses the privacy and security problems of data sharing in a distributed scenario, but also improves the quality of the shared data. The sharing records of each participant can be traced, which enables security audits.

Example Two

Referring to FIG. 1 to FIG. 3, this embodiment differs from the first embodiment in that it further specifies how differential privacy is applied to data sharing. Specifically, because a malicious data requester may launch an attack, the team leader should add perturbation to the model using a model protection method based on differential privacy, which includes the following steps:

given a randomized algorithm G and two adjacent data sets D and D′ that differ in at most one record;

for two such data sets (for example, differing by the removal of a single record), the probabilities that the randomized algorithm G produces an output in the same set O are bounded according to equation (7):

$$\Pr[G(D) \in O] \le \exp(\varepsilon) \cdot \Pr[G(D') \in O]$$

calculating the sensitivity:

$$\Delta f = \max_{D, D'} \lVert G(D) - G(D') \rVert$$

applying the Laplace mechanism to the global model according to the sensitivity:

$$G = G_m + \mathrm{Lap}(\Delta f / \varepsilon)$$

where G_m is the trained global model.
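As a worked illustration of how the privacy budget sets the noise level, take the hypothetical values Δf = 0.1 and ε = 0.5 (neither value comes from the patent). The Laplace distribution with scale b has density p(x) and standard deviation √2·b, so:

```latex
b = \frac{\Delta f}{\varepsilon} = \frac{0.1}{0.5} = 0.2, \qquad
p(x) = \frac{1}{2b}\exp\!\left(-\frac{|x|}{b}\right), \qquad
\sigma = \sqrt{2}\,b \approx 0.283
```

Halving the privacy budget to ε = 0.25 doubles the scale to b = 0.4 (σ ≈ 0.566): stronger privacy is paid for with proportionally more perturbation of the released global model.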

Example Three

Referring to FIG. 1 to FIG. 3, this embodiment differs from the first and second embodiments in that the steps of the consensus algorithm based on node contribution are further specified, specifically:

where the symbols of the formula (not reproduced here) denote, respectively, the actual update gradient of node k, the local update gradient of the k-th node, the gradient of the model before data node k updates it, and the gradient of the global model;

performing a reward mechanism based on the contribution weight ratio;

calculating the contribution value through a mapping function:

calculating the function value of softmax:

Therefore, the consensus mechanism in this embodiment has the advantage that it can prevent lazy behavior by nodes, because when multiple parties cooperatively train a model, some lazy nodes may simply copy the previous model parameters to the next data node without training.

In summary, the data sharing method based on blockchain and federated learning provided by the invention has the following beneficial effects. (1) From a data sharing perspective: the invention provides a data sharing mechanism based on federated learning that converts the data sharing problem into a model sharing problem and realizes team-based data sharing. In addition, a reward and punishment mechanism is introduced: the data requester rewards or punishes each team according to the result of data sharing, so that team members complete data sharing with high quality and high reliability. Furthermore, to penalize members that provide unreliable data, a "mortgage-penalty" mechanism is introduced, so that each team can further manage and supervise its members and they complete the data sharing task efficiently and reliably. (2) From a privacy protection perspective: Laplace noise is added to the global data sharing model so that differential privacy is applied to data sharing, which prevents inference attacks initiated by data requesters and provides further privacy protection for the data.

The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent changes made by using the contents of the present specification and the drawings, or applied directly or indirectly to the related technical fields, are included in the scope of the present invention.

Claims

1. A data sharing method based on blockchain and federated learning, characterized by comprising the following steps:

grouping mutually trusted blockchain nodes into a team;

2. The data sharing method based on blockchain and federated learning of claim 1, wherein grouping mutually trusted blockchain nodes into a team comprises:

calculating the penalty value of each node in the team:

in the formula, the mortgage symbol (not reproduced here) denotes the mortgage of node i;

calculating the compensation value C₁ of the nodes without bad behavior in the team:

where N represents the total number of nodes in the team.

3. The data sharing method based on blockchain and federated learning of claim 2, wherein grouping mutually trusted blockchain nodes into a team further comprises:

calculating the reward value of the leader node in the team:

calculating the reward value of each member node:

calculating the credit value C₂ of each node:

$$C_2 = C_{base} + C_{obtain}$$

4. The data sharing method based on blockchain and federated learning of claim 1, wherein the data sharing task comprises an ID of a task requester, a requested task category, a timestamp, and a task level.

5. The method of claim 1, wherein selecting teams meeting credit rating requirements to respond to the requested task comprises:

6. The method of claim 1, wherein training a verification model using nodes in the team meeting the credit rating requirement until the verification model reaches a preset accuracy or a maximum training time comprises:

7. The method of claim 1, wherein training a verification model using nodes in the team meeting the credit rating requirement further comprises:

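$$\Pr[G(D) \in O] \le \exp(\varepsilon) \cdot \Pr[G(D') \in O]$$

calculating the sensitivity:

$$\Delta f = \max_{D, D'} \lVert G(D) - G(D') \rVert$$

applying the Laplace mechanism to the global model according to the sensitivity:

$$G = G_m + \mathrm{Lap}(\Delta f / \varepsilon)$$

where G_m is the trained global model.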

8. The data sharing method based on blockchain and federated learning of claim 1, wherein packaging the model training process and storing it locally comprises:

9. The data sharing method based on blockchain and federated learning of claim 1, wherein reaching consensus among the blockchain nodes through the consensus algorithm based on node contribution and awarding credit to the team meeting the credit rating requirement comprises:

10. The data sharing method based on blockchain and federated learning of claim 1, wherein the consensus algorithm based on node contribution comprises the steps of:

where the symbols of the formula (not reproduced here) denote, respectively, the actual update gradient of node k, the local update gradient of the k-th node, the gradient of the model before data node k updates it, and the gradient of the global model;

performing a reward mechanism based on the contribution weight ratio;

calculating the contribution value through a mapping function:

calculating the function value of softmax: