WO2024027164A1 - Adaptive personalized federated learning method supporting heterogeneous model - Google Patents

Adaptive personalized federated learning method supporting heterogeneous model

Info

Publication number
WO2024027164A1
WO2024027164A1 (PCT/CN2023/082145)
Authority
WO
WIPO (PCT)
Prior art keywords
model
federated learning
global shared model
Prior art date
Application number
PCT/CN2023/082145
Other languages
French (fr)
Chinese (zh)
Inventor
邓水光 (Deng Shuiguang)
秦臻 (Qin Zhen)
Original Assignee
浙江大学 (Zhejiang University)
浙江大学中原研究院 (Zhongyuan Research Institute of Zhejiang University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学 (Zhejiang University) and 浙江大学中原研究院 (Zhongyuan Research Institute of Zhejiang University)
Publication of WO2024027164A1 publication Critical patent/WO2024027164A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects

Definitions

  • the invention belongs to the field of artificial intelligence technology, and specifically relates to an adaptive personalized federated learning method that supports heterogeneous models.
  • Deep Mutual Learning provides the technical basis for simultaneously training two different models on the same data.
  • building on this, researchers have proposed the Federated Mutual Learning method, in which each federated learning participant trains a private model and a global shared model at the same time.
  • the private model remains local, and neither its structure nor its parameters are shared.
  • the structure and parameters of the global shared model are consistent across participants; the central server periodically aggregates and redistributes it, so that it serves as the medium for knowledge sharing among participants.
  • each participant holds two different models: a private model and a global shared model.
  • a simple approach is to directly average the output predictions of the two models and use the average prediction result as the final result.
  • however, the two models perform differently on different data: when the data is highly heterogeneous, the private model learns the distribution of its participant's private data set well and therefore achieves good accuracy on that data set, while the global shared model, affected by data heterogeneity, is usually less accurate.
  • when the data tends toward homogeneity, the global shared model benefits from the knowledge shared by multiple participants and achieves better accuracy, while the private model relies mainly on its own participant's knowledge and is less accurate; in either case, directly ensembling the two models lets the low-accuracy model severely degrade the ensemble's accuracy.
  • in view of the above, the present invention provides an adaptive personalized federated learning method supporting heterogeneous models, so that adaptive personalized federated learning can be carried out even when participants' private model structures and parameters are unknown, and participants can benefit from federated learning under varying degrees of data heterogeneity.
  • An adaptive personalized federated learning method supporting heterogeneous models includes the following steps:
  • (1) the central server initializes the parameters of the global shared model;
  • (2) the central server distributes the global shared model parameters to each federated learning participant, and after receiving them each participant uses the parameters to update its own copy of the global shared model;
  • (3) each participant performs adaptive-weight learning to update the weight of its private model;
  • (4) each participant uses newly obtained private training data to train the private model and the global shared model simultaneously based on the stochastic gradient descent algorithm;
  • (5) each participant uploads the global shared model parameters obtained after one round of iterative training to the central server;
  • (6) after collecting enough global shared model parameters, the central server aggregates them to obtain new global shared model parameters and returns to step (2) to distribute the new parameters to each participant; the loop continues until the loss functions of all models converge or the maximum number of iterations is reached.
  • the global shared model is trained by the federated learning participants and aggregated by the central server; each participant holds a copy of the global shared model, which on the one hand is used by each participant for inference after federated training completes, and on the other hand serves as the medium through which participants share knowledge.
  • the private model is a model held by each federated learning participant whose structure and parameters are not disclosed.
  • the private model structures held by different participants may differ.
  • the participants are terminal devices in the federated learning system.
  • in order to benefit from the federated learning system, i.e., to obtain higher-accuracy model parameters, they upload model parameters to the central server and download the aggregated model parameters from it.
  • in step (3), the participant first splits off a small portion (for example, 5% of the training data) of its private training data as a validation set and runs the private model and the global shared model on the validation set, obtaining the private model's predicted output p_pri and the global shared model's predicted output p_sha; the participant then updates the weight of the private model by stochastic gradient descent, with the update expressed as:
  • λ′_i = λ_i − η·∇_{λ_i} L_CE(p_aen, y)
  • where λ_i is the weight of the private model before the update, λ′_i the weight after the update, η the learning rate, ∇_{λ_i} L_CE(p_aen, y) the gradient of L_CE(p_aen, y) with respect to λ_i, L_CE(p_aen, y) the cross entropy between p_aen and y, p_aen the weighted average of p_pri and p_sha, and y the ground-truth label.
  • in step (4), the loss function used to train the private model is L_pri = L_CE(p_pri, y) + D_KL(p_pri || p_sha) + L_CE(p_aen, y)
  • L_pri is the loss function of the private model
  • L_CE(p_pri, y) represents the cross entropy of p_pri and y
  • L_CE(p_aen, y) represents the cross entropy of p_aen and y
  • D_KL(p_pri || p_sha) represents the KL divergence of p_pri relative to p_sha
  • p_aen represents the weighted average of p_pri and p_sha
  • y is the ground-truth label
  • p_pri is the predicted output of the private model
  • p_sha is the predicted output of the global shared model.
  • the loss function used to train the global shared model is L_sha = L_CE(p_sha, y) + D_KL(p_sha || p_pri) + L_CE(p_aen, y)
  • L_sha is the loss function of the global shared model
  • L_CE(p_sha, y) represents the cross entropy of p_sha and y
  • L_CE(p_aen, y) represents the cross entropy of p_aen and y
  • D_KL(p_sha || p_pri) represents the KL divergence of p_sha relative to p_pri
  • p_aen represents the weighted average of p_pri and p_sha
  • y is the ground-truth label
  • p_pri is the predicted output of the private model
  • p_sha is the predicted output of the global shared model.
  • in step (6), after collecting enough global shared model parameters, the central server executes the federated averaging algorithm to aggregate them, and then distributes the new aggregated global shared model parameters to each participant.
  • the method of the present invention achieves high-accuracy personalized federated learning that adapts to data heterogeneity by learning dynamic weights for model ensembling and introducing an ensemble-oriented optimization objective into the training of model parameters.
  • this enables participants to benefit from federated learning in scenarios with varying degrees of data heterogeneity.
  • the adaptive personalized federated learning method of the present invention does not require the introduction of new hyperparameters and can be easily deployed in existing federated learning systems.
  • the present invention has the following beneficial technical effects:
  • the present invention enables federated learning that supports model heterogeneity: on top of protecting participants' private training data from leakage, it further protects the privacy of participants' model structures, achieving broader privacy protection.
  • the present invention enables an adaptive personalized federated learning method that lets federated learning participants benefit from federated learning in scenarios with different degrees of data heterogeneity (that is, obtain a higher-accuracy model than they would using only local private data).
  • the present invention solves the problem that existing personalized federated learning methods are effective only under a specific degree of data heterogeneity; compared with traditional personalized federated learning methods, it has stronger adaptability.
  • Figure 1 is a schematic diagram of the architecture of the adaptive personalized federated learning system of the present invention.
  • Figure 2 is a schematic flow chart of the adaptive personalized federated learning method of the present invention.
  • the system architecture of the adaptive personalized federated learning method that supports heterogeneous models is shown in Figure 1.
  • the system mainly includes two parts: a central server and participants.
  • the central server is responsible for coordinating the participants in running the federated learning method, including initializing the global shared model; receiving, aggregating, and distributing the global shared model; and checking whether the global shared model has converged or whether the adaptive personalized federated learning method has looped for a sufficient number of rounds, in order to decide whether to terminate.
  • each participant uses the method of the present invention to collaboratively train an image classification model, and uses the private model and global shared model obtained from training for subsequent inference.
  • the central server initializes the parameters of the selected global shared model.
  • the initialization algorithm can be agreed upon by the participants in advance, for example the Xavier initialization method or the Kaiming initialization method; this embodiment imposes no restriction.
  • each participant in federated learning holds a private training set consisting of private training samples, each of which is a labeled image.
  • each federated learning participant randomly samples 5% of its private training set as a validation set; each sample in the validation set is fed as input into both the private model and the global shared model for inference, yielding the private model's classification output p_pri and the global shared model's classification output p_sha, from which the weighted-average classification result p_aen is obtained as p_aen = λ_i·p_pri + (1 − λ_i)·p_sha.
  • the participant's private-model weight coefficient λ_i is then updated via stochastic gradient descent according to λ′_i = λ_i − η·∇_{λ_i} L_CE(p_aen, y).
  • y represents the label of the image.
  • mini-batch gradient descent is used to update λ_i: several images are packed into one batch and input into the two models at once to obtain the batch's classification results, and the weight λ_i is updated according to the above formula based on those results; after several rounds of iteration, λ_i converges to a suitable value, and the adaptive-weight learning step ends.
  • λ_i is iteratively updated on the validation set for several epochs; note that schemes that modify the number of iterative updates of λ_i remain within the scope of the present invention.
  • each participant runs this step independently; a participant uses its own private training data to train the private model and the global shared model simultaneously based on the stochastic gradient descent algorithm.
  • the goal of training the private model is to minimize the loss function L_pri defined as follows:
  • L_pri = L_CE(p_pri, y) + D_KL(p_pri || p_sha) + L_CE(p_aen, y)
  • L_CE(p, y) represents the cross-entropy loss computed from the image classification result p output by a model and the image's true label y
  • D_KL(p_pri || p_sha) represents the KL divergence of the private model's classification output p_pri relative to the global shared model's classification output p_sha; the goal of training the global shared model is to minimize the loss function L_sha defined as follows:
  • L_sha = L_CE(p_sha, y) + D_KL(p_sha || p_pri) + L_CE(p_aen, y)
  • this embodiment uses mini-batch gradient descent for this training; specifically, suppose the k-th batch of data is used at the t-th training step: first, the private model and the global shared model obtained after step t−1 take the k-th batch as input and produce the classification results p_pri and p_sha; the private model is then updated according to the definition of L_pri, and the global shared model according to the definition of L_sha; after repeating these steps for several epochs, the learning-integration step ends.
  • global shared model aggregation and distribution: after the central server has received enough global shared models, it performs federated averaging to aggregate them; since federated learning participants are usually not on the same local area network and their devices differ in performance, the central server sets a waiting window, within which received global shared models are used for aggregation; models arriving after the window closes are no longer accepted for the current round; after the window of the current round ends, the central server aggregates a new global shared model via the federated averaging algorithm.
  • the aggregation process is w_sha = (1/N)·Σ_{i=1}^{N} w_sha^(i)
  • where w_sha represents the new aggregated global shared model, w_sha^(i) represents the global shared model uploaded by the i-th participant, and N is the number of global shared models received within the window.
  • the central server then distributes the new aggregated global shared model to each participant; each time step (6) completes, the central server checks whether the number of method loops has reached the preset number of overall iteration rounds, or whether the model's accuracy has failed to improve over several consecutive rounds of aggregation; if either condition is met, the method terminates, otherwise execution resumes from step (3).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Multimedia (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Disclosed in the present invention is an adaptive personalized federated learning method supporting heterogeneous models. While allowing federated learning participants to use models of different structures, the method achieves high-accuracy personalized federated learning that adapts to data heterogeneity by learning a dynamic weight for model ensembling and introducing an ensemble-oriented optimization objective into the training of model parameters, so that participants can benefit from federated learning in scenarios with different degrees of data heterogeneity. The adaptive personalized federated learning method of the present invention requires no new hyperparameters and can be conveniently deployed in existing federated learning systems; compared with traditional personalized federated learning methods, the present invention is more adaptable.

Description

An adaptive personalized federated learning method supporting heterogeneous models

Technical Field
The invention belongs to the field of artificial intelligence and specifically relates to an adaptive personalized federated learning method that supports heterogeneous models.
Background Art
Artificial intelligence has become one of the key technologies driving social and economic development and is deeply integrated into every corner of people's lives. As core AI technologies represented by deep learning keep achieving new breakthroughs, AI increasingly relies on large amounts of data for model training, which has raised the problem of excessive collection and use of personal private data, and people's awareness of and concern about data privacy keep growing. The introduction of data-regulation policies and the emergence of related regulatory technologies have driven the development of privacy-preserving artificial intelligence and advanced federated learning, a computing paradigm in which multiple participants collaboratively train machine learning models while protecting data privacy.
However, existing federated learning methods face two problems: data heterogeneity and model heterogeneity. On the one hand, the non-independent and identically distributed (non-IID) nature of the training data spread across participating devices severely constrains the effectiveness of federated learning; many studies show that the traditional federated averaging method converges slowly, or even fails to converge, when the data distributions held by the participants differ. Although many researchers have proposed a variety of personalized federated learning methods to address data heterogeneity, for example methods based on regularization, local tuning, model interpolation, and multi-task learning, these methods suit only scenarios with particular degrees of data heterogeneity. In practice, because the training data is widely distributed across participating devices, the degree of data heterogeneity is usually unknown, making it hard to select an appropriate personalized federated learning method in a targeted way; this has created demand for adaptive personalized federated learning technology. On the other hand, existing personalized federated learning methods mostly target model-homogeneous scenarios, in which every participant must use a model of the same structure; when the participants come from different commercial organizations, each may prefer a model better suited to its own business data, and the model structure may be a commercial secret. Therefore, a federated learning method that enables differentiated model structures can further protect participants' privacy and provide a higher degree of personalization.
Deep Mutual Learning provides the technical basis for simultaneously training two different models on the same data. Building on this, researchers have proposed the Federated Mutual Learning method, in which each federated learning participant simultaneously trains a private model and a global shared model. The private model stays local, and neither its structure nor its parameters are shared, while the global shared model has the same structure and parameters across participants; the central server periodically aggregates and redistributes it, making it the medium of knowledge sharing among participants.
In a federated learning system, each participant holds two different models: a private model and a global shared model. To improve accuracy, a simple approach is to directly average the two models' output predictions and take the averaged prediction as the final result. However, the two models perform differently on different data. When data is highly heterogeneous, the private model fits the distribution of its participant's private data set well and therefore achieves good accuracy on it, while the global shared model, affected by data heterogeneity, is usually less accurate. When the data tends toward homogeneity, the global shared model benefits from the knowledge shared by many participants and achieves better accuracy, while the private model relies mainly on its own participant's knowledge and is less accurate. In either case, directly ensembling the two models lets the low-accuracy model severely degrade the accuracy of the ensemble.
Summary of the Invention
In view of the above, the present invention provides an adaptive personalized federated learning method supporting heterogeneous models, so that adaptive personalized federated learning can be carried out even when participants' private model structures and parameters are unknown, and participants can benefit from federated learning under varying degrees of data heterogeneity.
An adaptive personalized federated learning method supporting heterogeneous models, comprising the following steps:
(1) The central server initializes the parameters of the global shared model;
(2) The central server distributes the global shared model parameters to each federated learning participant; after receiving them, each participant uses the parameters to update its own copy of the global shared model;
(3) Each participant performs adaptive-weight learning to update the weight of its private model;
(4) Each participant uses newly obtained private training data to train the private model and the global shared model simultaneously based on the stochastic gradient descent algorithm;
(5) Each participant uploads the global shared model parameters obtained after one round of iterative training to the central server;
(6) After collecting enough global shared model parameters, the central server aggregates them to obtain new global shared model parameters and returns to step (2) to distribute the new parameters to each participant; the loop continues until the loss functions of all models converge or the maximum number of iterations is reached (a schematic end-to-end sketch of this loop is given below).
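Purely for orientation, a single-process Python sketch of this loop might look as follows. The helper names (learn_adaptive_weight, mutual_train_step, fed_average) are illustrative stand-ins, not part of the claimed method; per-step sketches of the first two, and of a window-based fed_average, appear with the corresponding steps in the detailed description below.

```python
import copy

def run_federated_learning(server_model, participants, max_rounds):
    """Schematic driver for steps (1)-(6); helper names are illustrative."""
    for _ in range(max_rounds):                              # until convergence or max iters
        uploads = []
        for p in participants:
            # step (2): distribute the current global shared parameters
            p.shared_model.load_state_dict(server_model.state_dict())
            # step (3): learn the ensemble weight on a validation split
            p.lam = learn_adaptive_weight(p.private_model, p.shared_model,
                                          p.val_loader)
            # step (4): mutual training of private and shared models
            for x, y in p.train_loader:
                mutual_train_step(p.private_model, p.shared_model,
                                  x, y, p.lam, p.opt_pri, p.opt_sha)
            # step (5): upload the locally trained shared model
            uploads.append(copy.deepcopy(p.shared_model.state_dict()))
        # step (6): federated averaging of the collected parameters
        server_model.load_state_dict(fed_average(uploads))
    return server_model
```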
Further, the global shared model is trained by the federated learning participants and aggregated by the central server; each participant holds a copy of the global shared model, which on the one hand is used by each participant for inference after federated training completes, and on the other hand serves as the medium through which the participants share knowledge.
Further, the private model is a model held by each federated learning participant whose structure and parameters are not disclosed; the private model structures held by different participants may differ.
Further, the participants are terminal devices in the federated learning system; in order to benefit from the system, i.e., to obtain higher-accuracy model parameters, they upload model parameters to the central server and download the aggregated model parameters from it.
Further, step (3) is implemented as follows: the participant first splits off a small portion (for example, 5% of the training data) of its private training data as a validation set and runs the private model and the global shared model on the validation set, obtaining the private model's predicted output p_pri and the global shared model's predicted output p_sha; the participant then updates the weight of the private model by stochastic gradient descent, with the update expressed as:

λ′_i = λ_i − η·∇_{λ_i} L_CE(p_aen, y)

where λ_i is the weight of the private model before the update, λ′_i the weight after the update, η the learning rate, ∇_{λ_i} L_CE(p_aen, y) the gradient of L_CE(p_aen, y) with respect to λ_i, L_CE(p_aen, y) the cross entropy between p_aen and y, p_aen the weighted average of p_pri and p_sha, and y the ground-truth label.
Further, the loss function used in step (4) to train the private model is:

L_pri = L_CE(p_pri, y) + D_KL(p_pri || p_sha) + L_CE(p_aen, y)

where L_pri is the loss function of the private model, L_CE(p_pri, y) the cross entropy between p_pri and y, L_CE(p_aen, y) the cross entropy between p_aen and y, D_KL(p_pri || p_sha) the KL divergence of p_pri relative to p_sha, p_aen the weighted average of p_pri and p_sha, y the ground-truth label, p_pri the predicted output of the private model, and p_sha the predicted output of the global shared model.
Further, the loss function used in step (4) to train the global shared model is:

L_sha = L_CE(p_sha, y) + D_KL(p_sha || p_pri) + L_CE(p_aen, y)

where L_sha is the loss function of the global shared model, L_CE(p_sha, y) the cross entropy between p_sha and y, L_CE(p_aen, y) the cross entropy between p_aen and y, D_KL(p_sha || p_pri) the KL divergence of p_sha relative to p_pri, p_aen the weighted average of p_pri and p_sha, y the ground-truth label, p_pri the predicted output of the private model, and p_sha the predicted output of the global shared model.
Further, in step (6), after collecting enough global shared model parameters, the central server executes the federated averaging algorithm to aggregate them, and then distributes the new aggregated global shared model parameters to each participant.
While allowing the participants of federated learning to use models of different structures, the method of the present invention achieves high-accuracy personalized federated learning that adapts to data heterogeneity, by learning dynamic weights for model ensembling and introducing an ensemble-oriented optimization objective into the training of model parameters; participants can thus benefit from federated learning in scenarios with varying degrees of data heterogeneity. In addition, the adaptive personalized federated learning method of the present invention introduces no new hyperparameters and can be conveniently deployed in existing federated learning systems. Specifically, the present invention has the following beneficial technical effects:
1. The present invention enables federated learning that supports model heterogeneity: on top of protecting participants' private training data from leakage, it further protects the privacy of participants' model structures, achieving broader privacy protection.
2. The present invention enables an adaptive personalized federated learning method that lets federated learning participants benefit from federated learning under different degrees of data heterogeneity (that is, obtain a higher-accuracy model than they would using only local private data).
3. The present invention solves the problem that existing personalized federated learning methods are effective only under a specific degree of data heterogeneity; compared with traditional personalized federated learning methods, the present invention has stronger adaptability.
Brief Description of the Drawings
Figure 1 is a schematic diagram of the architecture of the adaptive personalized federated learning system of the present invention.
Figure 2 is a schematic flow chart of the adaptive personalized federated learning method of the present invention.
Detailed Description of Embodiments
To describe the present invention more concretely, the technical solution of the present invention is explained in detail below with reference to the accompanying drawings and specific embodiments.
The system architecture on which the adaptive personalized federated learning method supporting heterogeneous models runs is shown in Figure 1. The system consists mainly of two parts, a central server and the participants. The central server coordinates the participants in running the federated learning method, including initializing the global shared model; receiving, aggregating, and distributing the global shared model; and checking whether the global shared model has converged or whether the adaptive personalized federated learning method has looped for a sufficient number of rounds, in order to decide whether to terminate.
In this embodiment, the participants use the method of the present invention to collaboratively train an image classification model and use the private model and global shared model obtained from training for subsequent inference.
First, the participants jointly select a model for image classification as the global shared model and agree on parameters such as the number of rounds of the method's overall iteration; then, coordinated by the central server, they run the following steps, as shown in Figure 2:
(1) Initialize the global shared model: the central server initializes the parameters of the selected global shared model. The initialization algorithm can be agreed upon by the participants in advance, for example the Xavier initialization method or the Kaiming initialization method; this embodiment imposes no restriction.
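As a sketch only, either agreed-upon scheme can be applied with standard PyTorch initializers; the helper below is illustrative, not part of the patented method:

```python
import torch.nn as nn

def init_shared_model(model: nn.Module, scheme: str = "xavier") -> nn.Module:
    """Initialize the selected global shared model with a pre-agreed scheme."""
    for m in model.modules():
        if isinstance(m, (nn.Linear, nn.Conv2d)):
            if scheme == "xavier":
                nn.init.xavier_uniform_(m.weight)      # Xavier initialization
            else:
                nn.init.kaiming_normal_(m.weight,      # Kaiming initialization
                                        nonlinearity="relu")
            if m.bias is not None:
                nn.init.zeros_(m.bias)
    return model
```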
(2) Global shared model distribution: after the parameters of the global shared model have been initialized, the central server sends them to each federated learning participant, and each participant updates its own copy of the global shared model upon receipt.
(3) Learning the adaptive weight: in this embodiment, each federated learning participant holds a private training set consisting of training samples, each of which is a labeled image. Each participant randomly samples 5% of its private training set as a validation set; each sample in the validation set is fed as input into both the private model and the global shared model for inference, yielding the private model's classification output p_pri and the global shared model's classification output p_sha, from which the weighted-average classification result p_aen is obtained according to the following formula:

p_aen = λ_i·p_pri + (1 − λ_i)·p_sha
The participant's private-model weight coefficient λ_i is then updated via the stochastic gradient descent algorithm, as shown in the following formula:

λ′_i = λ_i − η·∇_{λ_i} L_CE(p_aen, y)

where y denotes the label of the image.
In this embodiment, to improve the stability of learning λ_i, mini-batch gradient descent is used to update λ_i: several images are packed into one batch and input into the two models at once to obtain the classification results for the batch, and the weight λ_i is updated according to the formula above based on those results. After several rounds of iteration, λ_i converges to a suitable value, and the adaptive-weight learning step ends. In this embodiment, λ_i is updated iteratively on the validation set for several epochs; note that schemes that modify the number of iterative updates of λ_i remain within the protection scope of the present invention.
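As an illustration of this step only, the following PyTorch-style sketch learns λ_i on the validation split; the function name, the data-loader interface, and the clamping of λ_i to [0, 1] are assumptions of this sketch rather than requirements stated above:

```python
import torch
import torch.nn.functional as F

def learn_adaptive_weight(private_model, shared_model, val_loader,
                          lam_init=0.5, lr=0.01, epochs=3):
    """Learn the ensemble weight lambda_i on the held-out validation split."""
    lam = torch.tensor(lam_init, requires_grad=True)   # lambda_i
    opt = torch.optim.SGD([lam], lr=lr)                # lr plays the role of eta
    private_model.eval()
    shared_model.eval()
    for _ in range(epochs):                            # several epochs over the split
        for x, y in val_loader:
            with torch.no_grad():                      # both models stay frozen
                p_pri = F.softmax(private_model(x), dim=1)
                p_sha = F.softmax(shared_model(x), dim=1)
            # weighted-average ensemble prediction p_aen
            p_aen = lam * p_pri + (1.0 - lam) * p_sha
            # lambda'_i = lambda_i - eta * grad L_CE(p_aen, y)
            loss = F.nll_loss((p_aen + 1e-12).log(), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():
                lam.clamp_(0.0, 1.0)  # added safeguard; not specified in the text
    return lam.detach()
```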
(4) Learning integration: each participant runs this step independently. A participant uses its own private training data to train the private model and the global shared model simultaneously based on the stochastic gradient descent algorithm; the goal of training the private model is to minimize the loss function L_pri defined as follows:

L_pri = L_CE(p_pri, y) + D_KL(p_pri || p_sha) + L_CE(p_aen, y)

where L_CE(p, y) denotes the cross-entropy loss computed from the image classification result p output by a model and the image's true label y, and D_KL(p_pri || p_sha) denotes the KL divergence of the private model's classification output p_pri relative to the global shared model's classification output p_sha.

The goal of training the global shared model is to minimize the loss function L_sha defined as follows:

L_sha = L_CE(p_sha, y) + D_KL(p_sha || p_pri) + L_CE(p_aen, y)

To carry out this training, this embodiment uses mini-batch gradient descent. Specifically, suppose the k-th batch of data is used at the t-th training step: first, the private model and the global shared model obtained after step t−1 take the k-th batch as input and produce the classification results p_pri and p_sha; the private model is then updated according to the definition of L_pri, and the global shared model is updated according to the definition of L_sha. After these steps are repeated for several epochs, the learning-integration step ends.
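A minimal sketch of one such mini-batch update follows, assuming PyTorch models whose forward pass returns class logits and treating the λ_i learned in step (3) as a constant; the function and optimizer names are illustrative:

```python
import torch
import torch.nn.functional as F

def mutual_train_step(private_model, shared_model, x, y, lam, opt_pri, opt_sha):
    """One mini-batch update of both models under L_pri and L_sha."""
    eps = 1e-12  # numerical floor before taking logs

    # --- update the private model to minimize L_pri ---
    logits_pri = private_model(x)
    p_pri = F.softmax(logits_pri, dim=1)
    with torch.no_grad():                       # p_sha is a constant target here
        p_sha = F.softmax(shared_model(x), dim=1)
    p_aen = lam * p_pri + (1.0 - lam) * p_sha   # weighted-average ensemble output
    l_pri = (F.cross_entropy(logits_pri, y)                      # L_CE(p_pri, y)
             + F.kl_div((p_sha + eps).log(), p_pri,
                        reduction="batchmean")                   # D_KL(p_pri || p_sha)
             + F.nll_loss((p_aen + eps).log(), y))               # L_CE(p_aen, y)
    opt_pri.zero_grad()
    l_pri.backward()
    opt_pri.step()

    # --- update the global shared model to minimize L_sha ---
    logits_sha = shared_model(x)
    p_sha = F.softmax(logits_sha, dim=1)
    with torch.no_grad():                       # p_pri is a constant target here
        p_pri = F.softmax(private_model(x), dim=1)
    p_aen = lam * p_pri + (1.0 - lam) * p_sha
    l_sha = (F.cross_entropy(logits_sha, y)                      # L_CE(p_sha, y)
             + F.kl_div((p_pri + eps).log(), p_sha,
                        reduction="batchmean")                   # D_KL(p_sha || p_pri)
             + F.nll_loss((p_aen + eps).log(), y))               # L_CE(p_aen, y)
    opt_sha.zero_grad()
    l_sha.backward()
    opt_sha.step()
```

Note that F.kl_div takes log-probabilities as its first argument and computes the divergence of its second argument relative to the first, which is why the "other" model's log-output is passed first in each term.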
(5) Global shared model upload: after completing the training in steps (3) and (4), each federated learning participant uploads its trained global shared model to the central server, while keeping its private model local.
(6) Global shared model aggregation and distribution: after the central server has received enough global shared models, it performs federated averaging to aggregate them. Considering that federated learning participants are usually not on the same local area network and that device performance varies across participants, the central server sets a waiting window: global shared models received within the window are used for aggregation, and models are no longer accepted for the current round once the window closes. After the window of the current round ends, the central server aggregates a new global shared model via the federated averaging algorithm; the aggregation process is shown in the following formula:

w_sha = (1/N)·Σ_{i=1}^{N} w_sha^(i)

where w_sha denotes the new aggregated global shared model, w_sha^(i) denotes the global shared model uploaded by the i-th participant, and N is the number of global shared models received within the window.
The central server then distributes the new aggregated global shared model to each participant. Each time step (6) completes, the central server checks whether the number of method loops has reached the preset number of overall iteration rounds, or whether the model's accuracy has failed to improve over several consecutive rounds of aggregation; if either of these two conditions is met, the method terminates, otherwise execution resumes from step (3).
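A sketch of the window-based collection and averaging on the server side follows; recv_upload is an assumed non-blocking transport stub, not part of the patent:

```python
import time

def aggregate_window(recv_upload, window_seconds=60.0):
    """Average the shared-model state_dicts received within a waiting window.

    recv_upload is an illustrative non-blocking receive: it returns one
    participant's state_dict, or None if nothing has arrived yet.
    """
    received = []
    deadline = time.monotonic() + window_seconds
    while time.monotonic() < deadline:        # the waiting-time window
        upload = recv_upload()
        if upload is not None:
            received.append(upload)
        else:
            time.sleep(0.1)                   # avoid busy-waiting
    # uploads arriving after the deadline are ignored for this round
    if not received:
        return None
    n = len(received)
    # federated averaging: parameter-wise mean, w_sha = (1/N) * sum_i w_sha_i
    return {k: sum(s[k].float() for s in received) / n for k in received[0]}
```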
The above description of the embodiments is intended to help those of ordinary skill in the art understand and apply the present invention. Those skilled in the art can clearly make various modifications to the above embodiments and apply the general principles described here to other embodiments without inventive effort. Therefore, the present invention is not limited to the above embodiments; improvements and modifications made by those skilled in the art based on the disclosure of the present invention shall fall within the protection scope of the present invention.

Claims (8)

  1. An adaptive personalized federated learning method supporting heterogeneous models, comprising the following steps:
    (1) the central server initializes the parameters of the global shared model;
    (2) the central server distributes the global shared model parameters to each federated learning participant, and after receiving them each participant uses the parameters to update its own copy of the global shared model;
    (3) each participant performs adaptive-weight learning to update the weight of its private model;
    (4) each participant uses newly obtained private training data to train the private model and the global shared model simultaneously based on the stochastic gradient descent algorithm;
    (5) each participant uploads the global shared model parameters obtained after one round of iterative training to the central server;
    (6) after collecting enough global shared model parameters, the central server aggregates them to obtain new global shared model parameters and returns to step (2) to distribute the new parameters to each participant; the loop continues until the loss functions of all models converge or the maximum number of iterations is reached.
  2. The adaptive personalized federated learning method according to claim 1, wherein the global shared model is trained by the federated learning participants and aggregated by the central server; each participant holds a copy of the global shared model, which on the one hand is used by each participant for inference after federated training completes, and on the other hand serves as the medium through which the participants share knowledge.
  3. The adaptive personalized federated learning method according to claim 1, wherein the private model is a model held by each federated learning participant whose structure and parameters are not disclosed, and the private model structures held by different participants may differ.
  4. The adaptive personalized federated learning method according to claim 1, wherein the participants are terminal devices in the federated learning system that, in order to benefit from the federated learning system, i.e., to obtain higher-accuracy model parameters, upload model parameters to the central server and download the aggregated model parameters from it.
  5. The adaptive personalized federated learning method according to claim 1, wherein step (3) is implemented as follows: the participant first splits off a small portion of its private training data as a validation set and runs the private model and the global shared model on the validation set, obtaining the private model's predicted output p_pri and the global shared model's predicted output p_sha; the participant then updates the weight of the private model by stochastic gradient descent, with the update expressed as:
    λ′_i = λ_i − η·∇_{λ_i} L_CE(p_aen, y)
    where λ_i is the weight of the private model before the update, λ′_i the weight after the update, η the learning rate, ∇_{λ_i} L_CE(p_aen, y) the gradient of L_CE(p_aen, y) with respect to λ_i, L_CE(p_aen, y) the cross entropy between p_aen and y, p_aen the weighted average of p_pri and p_sha, and y the ground-truth label.
  6. The adaptive personalized federated learning method according to claim 1, wherein the loss function used in step (4) to train the private model is:
    L_pri = L_CE(p_pri, y) + D_KL(p_pri || p_sha) + L_CE(p_aen, y)
    where L_pri is the loss function of the private model, L_CE(p_pri, y) the cross entropy between p_pri and y, L_CE(p_aen, y) the cross entropy between p_aen and y, D_KL(p_pri || p_sha) the KL divergence of p_pri relative to p_sha, p_aen the weighted average of p_pri and p_sha, y the ground-truth label, p_pri the predicted output of the private model, and p_sha the predicted output of the global shared model.
  7. The adaptive personalized federated learning method according to claim 1, wherein the loss function used in step (4) to train the global shared model is:
    L_sha = L_CE(p_sha, y) + D_KL(p_sha || p_pri) + L_CE(p_aen, y)
    where L_sha is the loss function of the global shared model, L_CE(p_sha, y) the cross entropy between p_sha and y, L_CE(p_aen, y) the cross entropy between p_aen and y, D_KL(p_sha || p_pri) the KL divergence of p_sha relative to p_pri, p_aen the weighted average of p_pri and p_sha, y the ground-truth label, p_pri the predicted output of the private model, and p_sha the predicted output of the global shared model.
  8. The adaptive personalized federated learning method according to claim 1, wherein in step (6), after collecting enough global shared model parameters, the central server executes the federated averaging algorithm to aggregate them and then distributes the new aggregated global shared model parameters to each participant.
PCT/CN2023/082145 2022-08-01 2023-03-17 Adaptive personalized federated learning method supporting heterogeneous model WO2024027164A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210916817.8 2022-08-01
CN202210916817.8A CN115271099A (en) 2022-08-01 2022-08-01 Self-adaptive personalized federal learning method supporting heterogeneous model

Publications (1)

Publication Number Publication Date
WO2024027164A1 true WO2024027164A1 (en) 2024-02-08

Family

ID=83746862

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/082145 WO2024027164A1 (en) 2022-08-01 2023-03-17 Adaptive personalized federated learning method supporting heterogeneous model

Country Status (2)

Country Link
CN (1) CN115271099A (en)
WO (1) WO2024027164A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117808129A (en) * 2024-02-29 2024-04-02 浪潮电子信息产业股份有限公司 Heterogeneous distributed learning method, device, equipment, system and medium
CN117808128A (en) * 2024-02-29 2024-04-02 浪潮电子信息产业股份有限公司 Image processing method, federal learning method and device under heterogeneous data condition
CN117829274A (en) * 2024-02-29 2024-04-05 浪潮电子信息产业股份有限公司 Model fusion method, device, equipment, federal learning system and storage medium
CN117910600A (en) * 2024-03-15 2024-04-19 山东省计算中心(国家超级计算济南中心) Meta-continuous federal learning system and method based on fast learning and knowledge accumulation

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115271099A (en) * 2022-08-01 2022-11-01 浙江大学中原研究院 Self-adaptive personalized federal learning method supporting heterogeneous model
CN116361398B (en) * 2023-02-21 2023-12-26 北京大数据先进技术研究院 User credit assessment method, federal learning system, device and equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329940A (en) * 2020-11-02 2021-02-05 北京邮电大学 Personalized model training method and system combining federal learning and user portrait
CN114357067A (en) * 2021-12-15 2022-04-15 华南理工大学 Personalized federal meta-learning method for data isomerism
CN114429219A (en) * 2021-12-09 2022-05-03 之江实验室 Long-tail heterogeneous data-oriented federal learning method
CN115271099A (en) * 2022-08-01 2022-11-01 浙江大学中原研究院 Self-adaptive personalized federal learning method supporting heterogeneous model

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329940A (en) * 2020-11-02 2021-02-05 北京邮电大学 Personalized model training method and system combining federal learning and user portrait
CN114429219A (en) * 2021-12-09 2022-05-03 之江实验室 Long-tail heterogeneous data-oriented federal learning method
CN114357067A (en) * 2021-12-15 2022-04-15 华南理工大学 Personalized federal meta-learning method for data isomerism
CN115271099A (en) * 2022-08-01 2022-11-01 浙江大学中原研究院 Self-adaptive personalized federal learning method supporting heterogeneous model

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117808129A (en) * 2024-02-29 2024-04-02 浪潮电子信息产业股份有限公司 Heterogeneous distributed learning method, device, equipment, system and medium
CN117808128A (en) * 2024-02-29 2024-04-02 浪潮电子信息产业股份有限公司 Image processing method, federal learning method and device under heterogeneous data condition
CN117829274A (en) * 2024-02-29 2024-04-05 浪潮电子信息产业股份有限公司 Model fusion method, device, equipment, federal learning system and storage medium
CN117829274B (en) * 2024-02-29 2024-05-24 浪潮电子信息产业股份有限公司 Model fusion method, device, equipment, federal learning system and storage medium
CN117808129B (en) * 2024-02-29 2024-05-24 浪潮电子信息产业股份有限公司 Heterogeneous distributed learning method, device, equipment, system and medium
CN117808128B (en) * 2024-02-29 2024-05-28 浪潮电子信息产业股份有限公司 Image processing method and device under heterogeneous data condition
CN117910600A (en) * 2024-03-15 2024-04-19 山东省计算中心(国家超级计算济南中心) Meta-continuous federal learning system and method based on fast learning and knowledge accumulation
CN117910600B (en) * 2024-03-15 2024-05-28 山东省计算中心(国家超级计算济南中心) Meta-continuous federal learning system and method based on fast learning and knowledge accumulation

Also Published As

Publication number Publication date
CN115271099A (en) 2022-11-01

Similar Documents

Publication Publication Date Title
WO2024027164A1 (en) Adaptive personalized federated learning method supporting heterogeneous model
US11836615B2 (en) Bayesian nonparametric learning of neural networks
Khodak et al. Federated hyperparameter tuning: Challenges, baselines, and connections to weight-sharing
Yao et al. Safeguarded dynamic label regression for noisy supervision
Abdelmoniem et al. Refl: Resource-efficient federated learning
Yu et al. Training deep energy-based models with f-divergence minimization
US20190385019A1 (en) Systems and Methods for Conditional Generative Models
CN113850272A (en) Local differential privacy-based federal learning image classification method
US20220318412A1 (en) Privacy-aware pruning in machine learning
Wu et al. Federated unlearning: Guarantee the right of clients to forget
CN110689136B (en) Deep learning model obtaining method, device, equipment and storage medium
CN113626866B (en) Federal learning-oriented localization differential privacy protection method, system, computer equipment and storage medium
Mesquita et al. Embarrassingly parallel MCMC using deep invertible transformations
CN116627970A (en) Data sharing method and device based on blockchain and federal learning
CN112235062A (en) Federal learning method and system for resisting communication noise
Shen et al. Leveraging cross-network information for graph sparsification in influence maximization
Wang et al. Federated semi-supervised learning with class distribution mismatch
CN115879542A (en) Federal learning method oriented to non-independent same-distribution heterogeneous data
CN115359298A (en) Sparse neural network-based federal meta-learning image classification method
Hu et al. Federated one-class collaborative filtering via privacy-aware non-sampling matrix factorization
Yi et al. pFedLHNs: Personalized Federated Learning via Local Hypernetworks
Usmanova et al. Federated continual learning through distillation in pervasive computing
CN113705724B (en) Batch learning method of deep neural network based on self-adaptive L-BFGS algorithm
Lee et al. NanoBatch DPSGD: Exploring Differentially Private learning on ImageNet with low batch sizes on the IPU
Jankowiak et al. Neural likelihoods for multi-output Gaussian processes

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23848879

Country of ref document: EP

Kind code of ref document: A1