CN114496274A - Byzantine robust federated learning method based on block chain and application - Google Patents

Byzantine robust federated learning method based on block chain and application

Info

Publication number
CN114496274A
Authority
CN
China
Prior art keywords
model
update
client
byzantine
block chain
Prior art date
Legal status
Pending
Application number
CN202111489012.1A
Other languages
Chinese (zh)
Inventor
Zhang Yannan (张延楠)
Shang Xuan (尚璇)
Zhang Shuai (张帅)
Xie Yijun (谢逸俊)
Li Wei (李伟)
Current Assignee
Hangzhou Qulian Technology Co Ltd
Original Assignee
Hangzhou Qulian Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Qulian Technology Co Ltd
Priority to CN202111489012.1A
Publication of CN114496274A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F 21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 Protecting data
    • G06F 21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218 Protecting access to data via a platform, e.g. using keys or access control rules, to a system of files or objects, e.g. a local or distributed file system or database
    • G06F 21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 30/00 ICT specially adapted for the handling or processing of medical images
    • G16H 30/40 ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Public Health (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Bioethics (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a Byzantine robust federated learning method based on a blockchain, and an application thereof. The method comprises the following steps: each client registers on the blockchain, and each client participating in learning executes the federated learning task issued by the task publisher in the blockchain to update the model; aggregators selected from among the clients verify the update direction of the model updates uploaded by each client; the aggregators eliminate abnormal model updates according to the update-direction verification results, and perform scale normalization and aggregation on the remaining model updates to obtain a new global model; the new global model is broadcast into the blockchain for the next round of federated learning. The method guarantees the data security of the clients and the robustness of the federated learning model within the federated learning framework.

Description

Byzantine robust federated learning method based on block chain and application
Technical Field
The invention belongs to the field of data security, and particularly relates to a blockchain-based Byzantine robust federated learning method and an application thereof.
Background
Learning models from real-world health data through machine learning techniques has proven effective in a variety of medical applications. Since healthcare data is typically isolated across data repositories, traditional centralized machine learning algorithms must aggregate the distributed healthcare data into one central repository in order to train a model. This poses practical challenges, such as regulatory limits on sharing patient-level sensitive data, the high resources required to transmit and aggregate the data, and a high risk of equipment failure. Federated learning, which allows client data to remain local and shares only training parameters, is a promising solution to this problem.
In the medical federated learning framework, the system consists of a central server that maintains a global model and clients that participate in training. In each iteration, the server sends the current version of the global model to the clients, and each client trains on its local data to create a local update. The server then aggregates the users' local updates and updates the global model for the next iteration.
However, because of the distributed nature of the federated learning framework and the server's limited background knowledge, federated learning is susceptible to adversarial attacks. In particular, clients may elaborately craft batches of malicious updates, seriously degrading the convergence of the global model and making it impossible to jointly train a complete medical model. For example, an attacker may deliberately mislabel medical data so that training produces erroneous updates, severely affecting the overall training result. Such behavior by client nodes is known as a Byzantine attack, which has proven severely harmful to medical federated learning systems and must be defended against.
To defend against Byzantine attacks under medical federated learning, previous studies have proposed strategies that fall into detection-based methods and validation-based methods. Detection-based methods select the most honest subset of models: they assume that gradient updates from benign clients are distributed around the true gradient while updates from Byzantine clients may be arbitrary, and therefore aggregate client updates with robust estimation techniques. However, to make the Byzantine model harder to single out, an attacker can learn to improve its attack method or mount attacks by continuously probing the honest models.
Validation-based methods validate each submitted local model separately in order to identify attackers. However, such methods have proven fragile: as the number of attackers grows, not only does the validation fail, but the server may also crash under the heavy overhead and become a target for hijacking and control.
Disclosure of Invention
In view of the above, the first object of the present invention is to provide a blockchain-based Byzantine robust federated learning method that ensures the data security of clients in the federated learning framework and the robustness of the federated learning model.
To achieve the first object, an embodiment provides a blockchain-based Byzantine robust federated learning method, comprising the following steps:
each client registers on the blockchain, and each client participating in learning executes the federated learning task issued by the task publisher in the blockchain to update the model;
aggregators selected from among the clients verify the update direction of the model updates uploaded by each client;
the aggregators eliminate abnormal model updates according to the update-direction verification results, and perform scale normalization and aggregation on the remaining model updates to obtain a new global model;
the new global model is broadcast into the blockchain for the next round of federated learning.
In one embodiment, the registration of the clients on the blockchain and the execution of the federated learning task issued by the task publisher include:
when a client registers on the blockchain, it receives a private key and a dedicated channel bound to that key, and the dedicated channel is used for communication between the client and the blockchain;
the task publisher creates a new block in the blockchain and stores the federated learning task information in it, the information comprising the model structure, initialization parameters, and training hyper-parameters;
clients participating in federated learning download and store the federated learning task information from the new block, train the model with their local data to produce model updates, and upload the model updates to the blockchain through their dedicated channels.
In one embodiment, the verification by the aggregators of the update direction of the model updates uploaded by the clients includes:
a cache pool of the blockchain receives model updates within a preset time window, and at least 2 clients are randomly selected from all clients participating in federated learning to serve as aggregators;
all model updates in the cache pool are randomly divided into several ledgers and distributed to the aggregators; each aggregator determines the update direction of each model update by comparing the model updates in its received ledger with the previous round's global model, sorts the update directions to form an update-direction verification result, and uploads the ledger containing the verification result to the blockchain.
In one embodiment, aggregators are re-selected in each round of training using a roulette-wheel algorithm.
In one embodiment, when all model updates in the cache pool are randomly divided into multiple ledgers, each model update is guaranteed to be placed in at least 2 ledgers.
In one embodiment, the aggregator determines the update direction of each model update by comparing the model updates in its received ledger with the previous round's global model, including:
calculating a first total update between the client's current-round model update and its first-round model update, calculating a second total update between the current-round global model and the first-round global model, and measuring the update direction of each client's model update by the cosine similarity between the first and second total updates.
In one embodiment, the aggregator eliminates abnormal model updates according to the update-direction verification results and performs scale normalization and aggregation on the remaining model updates to obtain a new global model, including:
each aggregator combines the update-direction verification results of all aggregators to eliminate from its ledger the bottom-ranked model updates and the corresponding clients; the remaining model updates are normalized, taking the scale of the previous round's global model as the standard, so that they have the same scale as the previous round's global model; the normalized model updates are then aggregated to obtain the new global model of the current round.
In one embodiment, the remaining model updates are normalized to the same scale as the previous round's global model using the following formula:

$$\tilde{g}_j^T = \frac{\| g_{T-1} \|}{\| g_j^T \|} \, g_j^T$$

where $g_j^T$ denotes the model update of the $j$-th client in round $T$, $g_{T-1}$ denotes the global model of round $T-1$, $\| \cdot \|$ denotes the vector $\ell_2$ norm, and $\tilde{g}_j^T$ denotes the normalized model update.
In one embodiment, broadcasting the new global model into the blockchain for the next round of federated learning includes:
comparing the new global models aggregated by the different aggregators to check for and eliminate any Byzantine client among the aggregators; after the new global model is verified to be correct, a new block is created in the blockchain, the aggregator uploads the hash proof and the new global model to the new block and broadcasts it, and clients participating in the next round of federated learning download the new global model from the new block.
The technical concept of the invention is as follows: the objective of a Byzantine attack under federated learning is to prevent convergence of the model, which inherently conflicts with the training objective of federated learning. This conflict is reflected intuitively in the direction of the gradient updates: the angle between a Byzantine model update and the global model will be larger than the angle between a normal update and the global model. Although a Byzantine attacker can limit the angle between its model update and the global model within a single round, its final (cumulative) update direction will still form a markedly larger angle with the global update than normal updates do. The invention therefore introduces blockchain technology to make federated learning robust to Byzantine attacks: all updates are recorded in the blockchain ledger, and the final update direction of each client is computed from a global perspective, so that Byzantine updates can be excluded.
The second object of the invention is to provide a method for constructing an object-of-interest classification model applied to the medical field. Under a medical federated learning framework, Byzantine-attacking clients and their model updates are screened out by exploiting the fact that the update direction of a Byzantine attack differs from that of normal training; meanwhile, blockchain technology is used to record the model updates and prevent them from being tampered with, ensuring that the medical federated learning framework trains more accurately and improving the robustness of the object-of-interest classification model.
To achieve the second object, an embodiment provides a method for constructing an object-of-interest classification model applied in the medical field, including the following steps:
the federated learning task issued by the task publisher is an object-of-interest classification task in the medical field, the clients hold medical image data usable for object-of-interest classification, the model constructed by the federated learning task is an object-of-interest classification model, and the object-of-interest classification model is constructed by the above Byzantine robust federated learning method.
The technical solutions serving the two objects above provide at least the following beneficial effects:
introducing blockchain technology makes the training process of the federated learning task clearer and more standardized; Byzantine updates are eliminated from a global perspective, which is robust and hard to cheat with a single round of updates; and setting multiple aggregators for cross-detection makes Byzantine attacks harder to mount, ensuring the robustness of the global model and of the object-of-interest classification model.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in their description are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the blockchain-based Byzantine robust federated learning method provided by an embodiment;
Fig. 2 is a schematic diagram of the directions of Byzantine updates, normal updates, and global updates under federated learning, provided by an embodiment.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the detailed description and specific examples are intended for illustration only and are not intended to limit the scope of the invention.
Federated learning in the medical field has difficulty defending against Byzantine attacks, so the object-of-interest classification models it trains have low robustness and classify inaccurately; moreover, the clients participating in federated learning risk leakage of healthcare data and other data security hazards.
To be robust against Byzantine attacks, federated learning requires a decentralized architecture, and because of the high privacy requirements of the medical field, the entire federated learning process needs to be recorded. A blockchain provides multiple decentralized servers while preventing the training information from being tampered with, which matches these requirements of federated learning well. This embodiment therefore provides a blockchain-based Byzantine robust federated learning method; an object-of-interest classification model for the medical field is constructed by this method, and the global model described below is that classification model.
Fig. 1 is a flowchart of the blockchain-based Byzantine robust federated learning method according to an embodiment. As shown in Fig. 1, the method includes the following steps:
step 1, establishing a federal learning task in a block chain, and starting the federal learning training.
Federated learning jointly trains a common model across multiple clients, so its goal can be formalized as follows. Assume a sample space Ω = X × Y, where X denotes the space of feature vectors and Y the space of corresponding labels. For j ∈ [n] = {1, ..., n}, let $D_j$ denote the local data distribution of client j. The overall loss function $f_j(\theta)$ of client j can then be expressed as:

$$f_j(\theta) = \mathbb{E}_{(x_j, y_j) \sim D_j} \, \ell(\theta; (x_j, y_j))$$

where $\ell(\theta; (x_j, y_j))$ is the prediction loss for model parameters $\theta$ on a sample $(x_j, y_j)$ drawn from $D_j$; a gradient estimate $\nabla f_j(\theta)$ of the loss function can then be derived.
For aggregation, federated learning collects the local gradient updates and, once they are aggregated, updates the parameters of the global model:

$$\theta_{t+1} = \theta_t - \mu_t \cdot \frac{1}{n} \sum_{j=1}^{n} \nabla f_j(\theta_t)$$

where n denotes the number of clients currently participating in federated learning, t denotes the index of the current round, and $\mu_t$ denotes the global learning rate of the current round. When the number of rounds reaches the set value, training completes and the final global model is obtained.
A Byzantine attacker wants to prevent convergence of the global model, so its goal can be understood as driving the loss of the global model toward a maximum. The attack objective of Byzantine client q can be expressed as:

$$\max_{\theta} \; \mathbb{E}_{(x_q, y_q) \sim D_q} \, \ell(\theta; (x_q, y_q))$$

where $\ell(\theta; (x_q, y_q))$ is the prediction loss for model parameters $\theta$ on a sample $(x_q, y_q)$ drawn from $D_q$; by maximizing this loss, the attacker prevents model convergence.
In this embodiment, establishing federated learning on the blockchain requires several initialization operations, specifically system initialization and task initialization; after initialization, the federated learning task proceeds.
For system initialization, a permissioned blockchain is used to control the network size and enable authentication. Before participating in federated training, clients must register in the system; after registration, each client has its own private key and a dedicated channel bound to that key. Through this dedicated channel, the client uploads each round of model updates.
For task initialization, the federated learning task is activated when the task publisher publishes a genesis block. The genesis block contains the information of the federated learning task to be trained, specifically the object-of-interest classification task and the corresponding model structure and initialization parameters, as well as training hyper-parameters such as the learning rate, optimizer, loss function, and the method for solving the model parameters. In an embodiment, the model structure may be a VGG-16 model. The object-of-interest classification task may be a local-feature classification task, such as a tracheal feature classification task or a retinal feature classification task.
The client holds image data for the object-of-interest classification task, such as X-ray images, and preprocesses the image data, including converting the images to grayscale and resizing them to match the model's input size. Clients participating in federated learning download the genesis block, use its information to build the training model and algorithm, and initialize their local image data in preparation for local training. To keep up with the latest training progress, a newly joined client must also download the latest block to update its model state before participating in the current round of federated training.
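A minimal sketch of this preprocessing, assuming a 224x224 grayscale input (the standard VGG-16 input side; the patent does not fix the exact size):

```python
import numpy as np
from PIL import Image

def preprocess_xray(path, size=224):
    """Grayscale an X-ray image and resize it to the model input size."""
    img = Image.open(path).convert("L")   # grayscale
    img = img.resize((size, size))        # match the model input size
    return np.asarray(img, dtype=np.float32) / 255.0  # scale pixels to [0, 1]
```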
During local training, the client trains the model with its local image data and uploads the resulting model update to the update storage block through its dedicated channel.
Step 2: the aggregators selected from the clients verify the update direction of the model updates uploaded by each client.
Model updates uploaded by clients are temporarily stored in a cache pool awaiting allocation. A maximum receiving time window is set for each training round; once it is exceeded, the cache pool closes and no further model updates are accepted, which prevents device downtime and keeps Byzantine clients from stalling the federated training.
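A sketch of such a time-windowed cache pool, under the assumption that updates arriving after the deadline are simply rejected (class and parameter names are ours):

```python
import time

class UpdateCachePool:
    """Cache pool that accepts model updates only within a fixed time window."""
    def __init__(self, window_s):
        self.deadline = time.monotonic() + window_s  # close time of the pool
        self.updates = []

    def submit(self, client_id, update):
        if time.monotonic() > self.deadline:
            return False                  # window closed: update rejected
        self.updates.append((client_id, update))
        return True
```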
After model-update reception ends, at least 2 aggregators are randomly chosen from the clients participating in the current round using a roulette-wheel algorithm. The clients' model updates are then randomly grouped; during grouping, each model update is guaranteed to be placed in at least 2 ledgers so that the aggregators can verify each other's trustworthiness. The randomly grouped model updates are published as ledgers, and each aggregator downloads its corresponding ledger.
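The selection and grouping steps could look like the following sketch; the patent does not specify the roulette-wheel fitness weights, so they are left as a parameter here:

```python
import random

def roulette_select(clients, weights, k=2):
    """Roulette-wheel selection of k distinct aggregators; each client's
    chance is proportional to its weight (uniform weights are a reasonable
    default in the absence of a specified scheme)."""
    pool, w = list(clients), list(weights)
    chosen = []
    for _ in range(k):
        r = random.uniform(0.0, sum(w))
        acc = 0.0
        for i, wi in enumerate(w):
            acc += wi
            if acc >= r:                  # this client's slice contains r
                chosen.append(pool.pop(i))
                w.pop(i)
                break
    return chosen

def split_into_ledgers(update_ids, n_ledgers):
    """Randomly place every model update into at least 2 ledgers so the
    aggregators can cross-check one another."""
    ledgers = [[] for _ in range(n_ledgers)]
    for uid in update_ids:
        for idx in random.sample(range(n_ledgers), 2):
            ledgers[idx].append(uid)
    return ledgers
```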
For each aggregator, taking round T as an example, the received ledger contains two kinds of information: the gradient information of the global model from round 0 to round T-1, and the model updates, i.e., gradient information, of the different clients from round 0 to round T. From this gradient information, the aggregator computes the final update directions of the global model and of the different clients.
Assume the gradient information of round 0 of the global model is denoted $g_0$ and that of round T-1 is denoted $g_{T-1}$; the total update of the global model is then $\Delta g = g_{T-1} - g_0$. Likewise, the total update of the j-th client can be computed as $\Delta g_j = g_j^T - g_j^0$. Based on these total updates, the cosine similarity can be used to measure the client's model update direction $c_j$:

$$c_j = \frac{\langle \Delta g_j, \Delta g \rangle}{\| \Delta g_j \| \, \| \Delta g \|}$$

The update direction $c_j$ takes values in $[-1, 1]$: $c_j = -1$ means the client's update direction is exactly opposite to the global model's update direction (an angle of 180°), and $c_j = 1$ means the client's direction coincides with the global model's update direction (an angle of 0°).
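This direction measure transcribes directly into code (names are illustrative):

```python
import numpy as np

def update_direction(g_j_T, g_j_0, g_T_minus_1, g_0):
    """Cosine similarity c_j between the client's total update
    (g_j_T - g_j_0) and the global total update (g_T_minus_1 - g_0);
    all arguments are flattened parameter vectors."""
    d_client = g_j_T - g_j_0
    d_global = g_T_minus_1 - g_0
    return float(np.dot(d_client, d_global)
                 / (np.linalg.norm(d_client) * np.linalg.norm(d_global)))
```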
Fig. 2 schematically shows the directions of Byzantine updates, normal updates, and global updates under federated learning. As shown in Fig. 2, $c_1$, $c_2$, and $c_3$ denote the angles between the global model's update direction and the update directions of the two normal clients and of the Byzantine client, respectively. Since the goal of a Byzantine attack is to prevent the global model from converging, it only takes effect when the angle between the Byzantine update and the global model exceeds 90°. The goal of a normal client's update is to make the global model converge, so the angle between a normal update and the global model is below 90°. The angle of a Byzantine update is therefore larger than that of a normal update, which makes Byzantine updates detectable.
After an aggregator finishes computing the update directions of all clients in its ledger, it sorts the clients from largest to smallest update direction to form the update-direction verification result, packs the result into a new ledger, and publishes it among the aggregators.
Each aggregator shares its update-direction verification result with the other aggregators but not the gradient updates themselves, which prevents a Byzantine client elected as aggregator from manipulating the verification results.
Step 3: the aggregators eliminate abnormal model updates according to the update-direction verification results, perform scale normalization and aggregation on the remaining model updates to obtain the new global model, and proceed to the next round of federated learning.
In this embodiment, the aggregators reach consensus on the ranking of the update directions and exclude the bottom-ranked clients and their model updates. It must also be considered that Byzantine clients may maliciously amplify their updates by a large factor so that they dominate the global model update; the size of each client's update therefore needs to be normalized. Because of the distributed nature of federated learning and the blockchain, deciding which size to standardize on is a challenge. This embodiment takes the previous round's global update as the criterion and normalizes each local model update to the same size, which amounts to rescaling the client updates onto the same hypersphere in vector space on which the global update lies. Specifically, the normalization is performed with the following formula:
$$\tilde{g}_j^T = \frac{\| g_{T-1} \|}{\| g_j^T \|} \, g_j^T$$

where $g_j^T$ denotes the model update of the $j$-th client in round $T$, $g_{T-1}$ denotes the global update of round $T-1$, and $\| \cdot \|$ denotes the vector $\ell_2$ norm.
This normalization ensures that no single client update has an outsized impact on the aggregated global model update. It also amplifies smaller client updates to the same size as the global update. Smaller client updates are more likely to come from benign clients, since Byzantine clients generally manipulate training by scaling their updates up; amplifying the small updates therefore helps reduce the relative impact of Byzantine client updates and yields a better global model.
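The normalization formula above transcribes into a one-line helper:

```python
import numpy as np

def normalize_update(g_j_T, g_prev_global):
    """Rescale a client update to the l2 norm of the previous round's
    global update, i.e. project it onto the same hypersphere."""
    return g_j_T * (np.linalg.norm(g_prev_global) / np.linalg.norm(g_j_T))
```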
Each aggregator then aggregates the normalized model updates to obtain a new global model. Note that every aggregator produces its own new global model; if all aggregators are normal clients, the global models they produce will be identical. Whether a Byzantine client is among the aggregators can therefore be checked by comparing the final global models, and any such client is removed.
After the new global model is checked and found free of errors, a new block is created that packs the aggregator's hash proof and the new global model. This new block is then broadcast for the next round of training.
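A sketch of the aggregator cross-check described above; comparing SHA-256 digests of the aggregated parameter vectors is our own choice of comparison mechanism, since the patent only requires that the models be compared:

```python
import hashlib
import numpy as np

def cross_check(global_models):
    """Compare the global models produced by the aggregators. Per the
    patent, honest aggregators produce identical models, so a dissenting
    result exposes a Byzantine aggregator."""
    digests = [hashlib.sha256(m.tobytes()).hexdigest() for m in global_models]
    majority = max(set(digests), key=digests.count)          # consensus digest
    agreed = next(m for m, d in zip(global_models, digests) if d == majority)
    byzantine = [i for i, d in enumerate(digests) if d != majority]
    return agreed, byzantine                                 # model + suspect indices
```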
The above embodiments describe the technical solutions and advantages of the present invention in detail. It should be understood that they are only preferred embodiments of the invention and are not intended to limit it; any modifications, additions, or equivalent substitutions made within the scope of the principles of the invention shall fall within the protection scope of the invention.

Claims (10)

1. A Byzantine robust federated learning method based on a blockchain, characterized by comprising the following steps:
each client registers on the blockchain, and each client participating in learning executes the federated learning task issued by the task publisher in the blockchain to update the model;
aggregators selected from among the clients verify the update direction of the model updates uploaded by each client;
the aggregators eliminate abnormal model updates according to the update-direction verification results, and perform scale normalization and aggregation on the remaining model updates to obtain a new global model;
the new global model is broadcast into the blockchain for the next round of federated learning.
2. The blockchain-based Byzantine robust federated learning method of claim 1, wherein the registration of the clients on the blockchain and the execution of the federated learning task issued by the task publisher include:
when a client registers on the blockchain, it receives a private key and a dedicated channel bound to that key, the dedicated channel being used for communication between the client and the blockchain;
the task publisher creates a new block in the blockchain and stores the federated learning task information in it, the information comprising a model structure, initialization parameters, and training hyper-parameters;
clients participating in federated learning download and store the federated learning task information from the new block, train the model with local data to produce model updates, and upload the model updates to the blockchain through their dedicated channels.
3. The blockchain-based Byzantine robust federated learning method of claim 1, wherein the verification by the aggregators of the update direction of the model updates uploaded by the clients comprises:
a cache pool of the blockchain receives model updates within a preset time window, and at least 2 clients are randomly selected from all clients participating in federated learning to serve as aggregators;
all model updates in the cache pool are randomly divided into several ledgers and distributed to the aggregators; each aggregator determines the update direction of each model update by comparing the model updates in its received ledger with the previous round's global model, sorts the update directions to form an update-direction verification result, and uploads the ledger containing the verification result to the blockchain.
4. The blockchain-based Byzantine robust federated learning method of claim 3, wherein aggregators are selected in each round of training using a roulette-wheel algorithm.
5. The blockchain-based Byzantine robust federated learning method of claim 3, wherein, when all model updates in the cache pool are randomly divided into multiple ledgers, each model update is guaranteed to be placed in at least 2 ledgers.
6. The blockchain-based Byzantine robust federated learning method of claim 3 or 5, wherein the aggregator determines the update direction of each model update by comparing the model updates in its received ledger with the previous round's global model, comprising:
calculating a first total update between the client's current-round model update and its first-round model update, calculating a second total update between the current-round global model and the first-round global model, and measuring the update direction of each client's model update by the cosine similarity between the first and second total updates.
7. The blockchain-based Byzantine robust federated learning method of claim 1, wherein the aggregator eliminates abnormal model updates according to the update-direction verification results and performs scale normalization and aggregation on the remaining model updates to obtain a new global model, comprising:
each aggregator combines the update-direction verification results of all aggregators to eliminate from its ledger the bottom-ranked model updates and the corresponding clients; the remaining model updates are normalized, taking the scale of the previous round's global model as the standard, so that they have the same scale as the previous round's global model; the normalized model updates are then aggregated to obtain the new global model of the current round.
8. The blockchain-based Byzantine robust federated learning method of claim 7, wherein the remaining model updates are normalized to the same scale as the previous round's global model using the following formula:

$$\tilde{g}_j^T = \frac{\| g_{T-1} \|}{\| g_j^T \|} \, g_j^T$$

where $g_j^T$ denotes the model update of the $j$-th client in round $T$, $g_{T-1}$ denotes the global model of round $T-1$, $\| \cdot \|$ denotes the vector $\ell_2$ norm, and $\tilde{g}_j^T$ denotes the normalized model update.
9. The blockchain-based Byzantine robust federated learning method of claim 1 or 7, wherein broadcasting the new global model into the blockchain for the next round of federated learning comprises:
comparing the new global models aggregated by the different aggregators to check for and eliminate any Byzantine client among the aggregators; after the new global model is verified to be correct, creating a new block in the blockchain, the aggregator uploading the hash proof and the new global model to the new block and broadcasting it, and the clients participating in the next round of federated learning downloading the new global model from the new block.
10. A method for constructing an object-of-interest classification model applied to the medical field, characterized by comprising the following steps:
the federated learning task issued by a task publisher is an object-of-interest classification task in the medical field, the clients hold medical image data usable for object-of-interest classification, the model constructed by the federated learning task is an object-of-interest classification model, and the object-of-interest classification model is constructed by the Byzantine robust federated learning method of any one of claims 1 to 9.
CN202111489012.1A 2021-12-08 2021-12-08 Byzantine robust federated learning method based on block chain and application Pending CN114496274A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111489012.1A CN114496274A (en) 2021-12-08 2021-12-08 Byzantine robust federated learning method based on block chain and application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111489012.1A CN114496274A (en) 2021-12-08 2021-12-08 Byzantine robust federated learning method based on block chain and application

Publications (1)

Publication Number Publication Date
CN114496274A (en) 2022-05-13

Family

ID=81492451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111489012.1A Pending CN114496274A (en) 2021-12-08 2021-12-08 Byzantine robust federated learning method based on block chain and application

Country Status (1)

Country Link
CN (1) CN114496274A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117560229A (en) * 2024-01-11 2024-02-13 吉林大学 Federal non-intrusive load monitoring user verification method
CN117560229B (en) * 2024-01-11 2024-04-05 吉林大学 Federal non-intrusive load monitoring user verification method

Similar Documents

Publication Publication Date Title
Chen et al. Dealing with label quality disparity in federated learning
Zhang et al. Enabling execution assurance of federated learning at untrusted participants
US20200097951A1 (en) Cryptocurrency system using body activity data
CN111460443A (en) Security defense method for data manipulation attack in federated learning
Boshmaf et al. Thwarting fake OSN accounts by predicting their victims
CN114363043B (en) Asynchronous federal learning method based on verifiable aggregation and differential privacy in peer-to-peer network
CN112799708A (en) Method and system for jointly updating business model
Qin et al. Privacy-preserving federated learning framework in multimedia courses recommendation
CN111243698A (en) Data security sharing method, storage medium and computing device
CN110874638B (en) Behavior analysis-oriented meta-knowledge federation method, device, electronic equipment and system
Nguyen et al. Backdoor attacks and defenses in federated learning: Survey, challenges and future research directions
CN114496274A (en) Byzantine robust federated learning method based on block chain and application
Yazdinejad et al. Hybrid privacy preserving federated learning against irregular users in Next-generation Internet of Things
Wang et al. Blockchain-based federated learning in mobile edge networks with application in internet of vehicles
Sun et al. Shapleyfl: Robust federated learning based on shapley value
Machhi et al. Feedback based trust management for cloud environment
Yaldiz et al. Secure federated learning against model poisoning attacks via client filtering
Zhang et al. Visual object detection for privacy-preserving federated learning
Whalen et al. Model aggregation for distributed content anomaly detection
Shi et al. Mitigation of a poisoning attack in federated learning by using historical distance detection
Yang et al. DeMAC: Towards detecting model poisoning attacks in federated learning system
Xie et al. PoisonedFL: Model Poisoning Attacks to Federated Learning via Multi-Round Consistency
Ghavamipour et al. Privacy-Preserving Aggregation for Decentralized Learning with Byzantine-Robustness
Xu et al. SVFLDetector: a decentralized client detection method for Byzantine problem in vertical federated learning
Chakraborty et al. Proof of federated training: accountable cross-network model training and inference

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination