CN115115021A - Personalized federated learning method based on asynchronous updating of model parameters - Google Patents

Personalized federated learning method based on asynchronous updating of model parameters

Info

Publication number
CN115115021A
Authority
CN
China
Prior art keywords
model
layer
client
model parameters
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210050723.7A
Other languages
Chinese (zh)
Inventor
Wu Lan (吴兰)
Zhang Yake (张亚可)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University of Technology
Original Assignee
Henan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University of Technology filed Critical Henan University of Technology
Priority to CN202210050723.7A priority Critical patent/CN115115021A/en
Publication of CN115115021A publication Critical patent/CN115115021A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a federated learning method, and in particular to a personalized federated learning method based on asynchronous update of model parameters. First, a personalized federated learning framework is constructed. At the client the framework is divided into two parts: one part updates all model parameters and interacts with the server; the other part fixes the base-layer parameters, updates the personality layers, and builds a personalized model. Second, under this framework, a distance metric function is introduced to divide the base layers carrying common features from the personality layers carrying individual features. Finally, a new asynchronous model-parameter update strategy is designed: in the transmission from client to server, the base-layer parameters are transmitted in every training round while the personality-layer parameters are transmitted every t rounds, realizing asynchronous transmission of base-layer and personality-layer parameters. Simulations on a public dataset demonstrate the effectiveness of the method.

Description

Personalized federated learning method based on asynchronous updating of model parameters
Technical Field
The invention relates to a federated learning method, and in particular to a personalized federated learning method based on asynchronous update of model parameters.
Background
In recent years, with the continuous development of artificial intelligence technology, the protection of data privacy and security has become an important development trend. The emergence of federated learning provides an effective way to perform machine learning while preserving data privacy. Google first proposed federated learning in 2016 and elaborated on it in a pioneering paper (Konečný J, McMahan H B, Ramage D, et al., 2016). FIG. 1 is a schematic diagram of the horizontal federated framework, which consists of four steps: first, each client trains its local model; second, each client sends the updated model parameters to the server; third, the server aggregates the parameters sent by the clients; finally, the server sends the aggregated, updated parameters back to each client. Each client then starts the next iteration, and the loop is repeated until the whole training process converges.
Later, as research deepened, in terms of federated learning, McMahan H B, Moore E, Ramage D, et al. (Communication-Efficient Learning of Deep Networks from Decentralized Data, 2016) proposed the federated averaging algorithm (FedAvg), which aggregates the parameter updates uploaded by clients through model averaging at the server and constructs a global model from a subset of clients with non-independent, non-identically distributed data; Li X, Huang K, Yang W, et al. (On the Convergence of FedAvg on Non-IID Data, 2019) further proved the convergence of the federated averaging algorithm on non-IID data; Tian L, Sahu A K, Zaheer M, et al. (Federated Optimization in Heterogeneous Networks, 2019) proposed the FedProx algorithm, which adds a proximal term on the basis of FedAvg to optimize the global model while allowing local update differences among clients; Xie M, Long G, Shen T, et al. (Multi-Center Federated Learning, 2020) proposed a federated learning framework with multiple central servers, assigning clients with similar model parameters to the same central server according to Euclidean distance so as to address differences in data distribution.
In terms of reducing communication, Reisizadeh A, Mokhtari A, Hassani H, et al. (FedPAQ: A communication-efficient federated learning method with periodic averaging and quantization, International Conference on Artificial Intelligence and Statistics, PMLR, 2020) proposed periodic averaging and quantization to lower the communication load; Dai X, Yan X, Zhou K, et al. (Hyper-Sphere Quantization: Communication-Efficient SGD for Federated Learning, 2019) proposed a general framework that balances communication efficiency and gradient precision; Wang J, Joshi G (Cooperative SGD: A Unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms, 2018) proposed performing multiple rounds of local optimization before sending the local model to the server in order to reduce communication cost.
Although the above methods make major contributions to protecting data privacy and reducing communication, they ignore the performance of the local client model during the iterative process. Deng Y, Kamani M, Mahdavi M (Adaptive Personalized Federated Learning, 2020) proved that as the non-IID degree of the data increases, the generalization error of the global model on the local data of the clients also increases significantly, so that the trained global model can hardly adapt to the specific data task of each client. Accordingly, it is urgent for federated learning to perform personalized modeling locally at the client.
Disclosure of Invention
It is an object of the present invention to address the above-mentioned deficiencies in the background art by providing a personalized federated learning method based on asynchronous updating of model parameters.
The technical solution adopted by the invention is as follows: the model parameters are weighted and aggregated at the server according to the data size of each client, expressed as:

W_{t+1} = \sum_{k=1}^{K} (n_k / n) W_{t+1}^k

where n_k is the sample data size of the k-th client, n is the total number of training samples, and W_{t+1}^k denotes the model parameters of the k-th client at time t+1. The client updates all parameters of the local model by gradient descent:

W_{t+1}^k = W_t - \eta \nabla f_k(W_t)

The loss function is defined as:

f_k(w) = (1 / n_k) \sum_{i=1}^{n_k} \ell(x_i, y_i; w)

where f_k(w) is the loss function of the k-th client.
As a preferred technical solution of the invention: the global model of the server is denoted W_t and the local model of the client is denoted W_k.
As a preferred technical solution of the invention: for a network model with L layers, the model parameters W_t of the server are expanded layer by layer as {(W_t)_1, (W_t)_2, ..., (W_t)_L}.
As a preferred technical solution of the invention: the model parameters of the client are likewise expressed layer by layer as {(W_k)_1, (W_k)_2, ..., (W_k)_L}.
As a preferred technical solution of the invention: the method further comprises calculating the cosine similarity between the l-th layer of the server and that of the client, expressed as:

P = ((W_t)_l · (W_k)_l) / (\|(W_t)_l\| \|(W_k)_l\|)

where (W_t)_l is the model parameter of the l-th layer of the server and (W_k)_l is the model parameter of the l-th layer of the client; the closer P is to 1, the more similar the l-th-layer parameters of the server and the client.
As a preferred technical solution of the invention: for the K clients, the cosine similarity is calculated layer by layer as:

s_l^k = ((W_t)_l · (W_k)_l) / (\|(W_t)_l\| \|(W_k)_l\|),  k = 1, ..., K
as a preferred technical scheme of the invention: the method is as follows.
Further comprises calculating s l Mean for K clients:
Figure BDA0003474164690000033
and (3) judging:
Figure BDA0003474164690000034
comparing the change value, if the absolute value of the change of the layer l for l-1 is far larger than the absolute value of the change of the layer l-1 for l-2; at this time, output l, and model W at time t according to l t Partitioning into base layers
Figure BDA0003474164690000035
And personality layers
Figure BDA0003474164690000036
The invention has the beneficial effects that:
(1) A model-parameter layering method is proposed to effectively identify the base layers representing common features and the personality layers representing individual features in the model. A distance metric function is constructed to fully describe the difference between common features and individual features; meanwhile, a threshold is set according to the inter-layer distribution distance to effectively divide the base layers representing common features from the personality layers representing individual features. Experiments show that the method can effectively divide the base layers and the personality layers.
(2) To improve the accuracy of personalized modeling, an asynchronous model-parameter update strategy is proposed. First, the global model on the server is updated following the base-layer/personality-layer update principle: in the transmission of parameters from the server to the clients, the base layers are transmitted in every training round while the personality layers are transmitted every t rounds. Second, the local model of each client updates the personality layers with its private data to build a personalized model. Finally, compared with existing methods, the strategy not only improves the accuracy of the personalized model but also significantly reduces the communication cost.
Drawings
FIG. 1 is a schematic diagram of the horizontal federated framework of the present invention;
FIG. 2 is the personalized federated learning framework in a preferred embodiment of the present invention;
FIG. 3 is a diagram of the layer-wise asynchronous update strategy for the DNN model in a preferred embodiment of the present invention;
FIG. 4 is a schematic diagram of the dataset in a preferred embodiment of the present invention;
FIG. 5 is a diagram of the cosine similarity between the DNN global model and the client models in a preferred embodiment of the present invention;
FIG. 6 is a diagram of personalized-model accuracy under different numbers of personality layers for the DNN model in a preferred embodiment of the present invention;
FIG. 7 is a diagram of the cosine similarity between the CNN global model and the client models in a preferred embodiment of the present invention;
FIG. 8 is a diagram of personalized-model accuracy under different numbers of personality layers for the CNN model in a preferred embodiment of the present invention.
Detailed Description
It should be noted that the embodiments of the present application and the features in the embodiments may be combined with each other without conflict; the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. All other embodiments obtained by a person skilled in the art without creative effort based on the embodiments of the present invention fall within the protection scope of the present invention.
Referring to figs. 1 to 8, a preferred embodiment of the present invention provides a personalized federated learning method based on asynchronous update of model parameters. This embodiment first introduces the federated averaging algorithm, which was developed to obtain a central prediction model for Google applications and can be embedded in mobile phones to protect user privacy. The server weights and aggregates the model parameters according to the data size of each client, expressed as:
W_{t+1} = \sum_{k=1}^{K} (n_k / n) W_{t+1}^k

where n_k is the sample data size of the k-th client, n is the total number of training samples, and W_{t+1}^k denotes the model parameters of the k-th client at time t+1. The client updates all parameters of the local model by gradient descent:

W_{t+1}^k = W_t - \eta \nabla f_k(W_t)

The loss function is defined as:

f_k(w) = (1 / n_k) \sum_{i=1}^{n_k} \ell(x_i, y_i; w)

where f_k(w) is the loss function of the k-th client.
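To make the two update rules above concrete, the following Python sketch shows one possible implementation under simplifying assumptions: each model is a list of NumPy arrays (one per layer) and the caller supplies a gradient_fn returning per-layer gradients of f_k; the function names are illustrative and not taken from the original text.

```python
import numpy as np

def server_aggregate(client_params, client_sizes):
    """Weighted FedAvg aggregation: W_{t+1} = sum_k (n_k / n) * W_{t+1}^k."""
    n = float(sum(client_sizes))
    num_layers = len(client_params[0])
    return [sum((n_k / n) * params[l] for params, n_k in zip(client_params, client_sizes))
            for l in range(num_layers)]

def client_update(params, gradient_fn, lr=0.01, local_steps=1):
    """Local update of all parameters by gradient descent: W <- W - eta * grad f_k(W)."""
    for _ in range(local_steps):
        grads = gradient_fn(params)               # per-layer gradients of the local loss f_k
        params = [w - lr * g for w, g in zip(params, grads)]
    return params
```

In each round, every client would call client_update on its copy of the global model and the server would call server_aggregate on the returned parameter lists, matching the four-step loop of fig. 1.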
Transfer learning uses knowledge obtained in solving one problem to solve another related problem rather than learning from scratch. Inspired by model transfer: the base layers of a model capture the common features of different tasks, meaning that part of the model parameters (the base-layer parameters) represent common features across different datasets, while the personality layers learn individual features associated with a particular dataset. Furthermore, transfer learning enables personalized customization in federated learning: the server sends part of the global model parameters to the client, and the client freezes the parameters transmitted by the server and trains the remaining model parameters with local data to achieve personalization. Based on the above, this embodiment proposes a model layering algorithm and uses model transfer to implement personalized customization locally at the client.
The personalized federated learning framework of this embodiment is shown in fig. 2, where N_m denotes the m-th client, (x_i, y_i)_m denotes the private data of the m-th client, and W_m denotes its model parameters, in which W_m^B denotes the base-layer model parameters of the m-th client and W_m^P denotes its personality-layer model parameters; each client has a different data distribution.
Taking client N_m as an example, the personalized federated learning framework of this embodiment is described in detail. As shown in fig. 2, client N_m has two parts: one part freezes the base-layer parameters W_m^B and updates the personality-layer parameters W_m^P; the other part does not freeze the base-layer parameters but updates the whole parameter set W_m and decides, according to the asynchronous model-parameter update strategy, whether all model parameters are sent to the server at the current moment. The server aggregates the model parameters sent by the clients and then sends them back to the clients, and the process is repeated.
In this embodiment, the global model of the server is W_t and the local model of the client is W_k. For a network model with L layers, the server model parameters W_t are expanded layer by layer as {(W_t)_1, (W_t)_2, ..., (W_t)_L}, and the client model parameters are likewise expressed as {(W_k)_1, (W_k)_2, ..., (W_k)_L}. The cosine similarity between the l-th layer of the server and that of the client is calculated as:

P = ((W_t)_l · (W_k)_l) / (\|(W_t)_l\| \|(W_k)_l\|)

In the above formula, (W_t)_l is the model parameter of the l-th layer of the server and (W_k)_l is the model parameter of the l-th layer of the client. The closer P is to 1, the more similar the l-th-layer parameters of the server and the client. For the K clients, the cosine similarity is calculated layer by layer:

s_l^k = ((W_t)_l · (W_k)_l) / (\|(W_t)_l\| \|(W_k)_l\|),  k = 1, ..., K

The mean of s_l over the K clients is calculated:

\bar{s}_l = (1 / K) \sum_{k=1}^{K} s_l^k

and the judgment is made:

|\bar{s}_l - \bar{s}_{l-1}| \gg |\bar{s}_{l-1} - \bar{s}_{l-2}|

The change values are compared; if the absolute change of layer l with respect to layer l-1 is much larger than the absolute change of layer l-1 with respect to layer l-2, then l is output, and the model W_t at time t is partitioned according to l into the base layers W_t^B and the personality layers W_t^P.
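A minimal sketch of this layer-splitting rule follows, assuming each layer's parameters are available as a NumPy array; the factor used in the "much larger" test is an assumed illustrative threshold, since the text does not state a numeric value, and all names are hypothetical.

```python
import numpy as np

def layer_cosine(server_layer, client_layer):
    """Cosine similarity P between one layer of the server model and of a client model."""
    a, b = server_layer.ravel(), client_layer.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_split_layer(server_model, client_models, factor=10.0):
    """Return the layer index l separating base layers [0, l) from personality layers [l, L)."""
    L = len(server_model)
    # mean cosine similarity of each layer over the K clients
    s_bar = [float(np.mean([layer_cosine(server_model[l], cm[l]) for cm in client_models]))
             for l in range(L)]
    for l in range(2, L):
        if abs(s_bar[l] - s_bar[l - 1]) > factor * abs(s_bar[l - 1] - s_bar[l - 2]):
            return l                              # similarity drops sharply at layer l
    return L                                      # no sharp drop found: treat all layers as base
```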
As shown in fig. 3, the server and client models are DNNs. By layer, the model can be divided into base layers W_B that learn the common features of the tasks and personality layers W_P that learn the individual features of the tasks, where the parameter size of the base layers is denoted S_B and the parameter size of the personality layers is denoted S_P. Fig. 3 also shows a federated learning example with four clients {A, B, C, D} and one central server to illustrate the asynchronous model-parameter update strategy of this embodiment. The abscissa is the number of communication rounds, and the ordinate represents the four clients {A, B, C, D} and the central server. The gray shaded portion at the bottom represents the model parameters participating in each round of updating, and the model parameters outside the gray box are the personality layers of the four clients. In the five communication rounds (t-4, ..., t-1, t), the model parameters of the base layers and the personality layers are updated together only in the last two rounds; for convenience, this is denoted freq = 2/5. Compared with traditional federated learning, this reduces the traffic by 3·S_P. Therefore, for neural network models with very large numbers of parameters, the method can significantly reduce the communication cost.
The client fixes the base-layer parameters sent by the server, updates the personality-layer model parameters with its private data, and builds the personalized model locally with the following formula:

W_{t+1}^{P,k} = W_t^{P,k} - \eta \nabla f_k(W_t^B, W_t^{P,k}; D_k, B)

where W_t^{P,k} is the personality layer of the k-th client at time t, W_t^B is the base-layer parameters exchanged between each client and the server at time t, D_k is the local data of the k-th client, and B is the batch size.
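As an illustration of this personalization step, the PyTorch sketch below freezes the base-layer parameters received from the server and runs mini-batch gradient descent only on the personality layers; base_param_names, the data loader, and the hyperparameters are assumptions made for the sake of the example.

```python
import torch
from torch import nn

def personalize(model, base_param_names, loader, lr=0.01, epochs=1):
    """Fix the base layers W^B sent by the server and update only the personality layers W^P
    on the client's private data D_k, drawn in batches of size B by the loader."""
    for name, p in model.named_parameters():
        p.requires_grad = name not in base_param_names      # freeze base-layer parameters
    optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
    return model
```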
The algorithm is mainly divided into two parts; one part is executed by the server, as in Algorithm 1.
In Algorithm 1, the server initializes the network model parameters W_0. The meaning of sets ← freq is that when freq = 2/10, sets = {9, 10}, and when freq = 4/10, sets = {7, 8, 9, 10}; flag is defined according to the round number. When t = 1, W_t^k ← ClientUpdate(k, W_t) is performed first; second, the server model is updated by weighting according to the data size of each client using the aggregation formula above; then the cosine similarity between the server model and the client models is calculated layer by layer, and the boundary between the base layers and the personality layers is distinguished according to the judgment criterion; finally, the distinguished base-layer model W_t^B is returned to the clients. When t ≠ 1, only the base layers uploaded by the clients are updated, and the updated base layers are returned to the clients for the next-round update and for personalized customization of the local client models.
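Since the pseudocode figure for Algorithm 1 is not reproduced in this text, the sketch below reconstructs the described server-side flow under assumptions: ClientUpdate, server_aggregate, and find_split_layer stand for the client call and the helpers sketched earlier, and the handling of sets/flag follows the description above rather than the original listing.

```python
def run_server(clients, init_model, rounds=10, freq_sets=(9, 10)):
    """Illustrative server loop: full exchange in rounds listed in freq_sets (and in round 1),
    base-layer-only exchange otherwise."""
    global_model, split = init_model, None
    for t in range(1, rounds + 1):
        flag = t in freq_sets
        if t == 1 or flag:
            updates = [c.ClientUpdate(global_model) for c in clients]            # all layers
            global_model = server_aggregate(updates, [c.n_k for c in clients])
            if t == 1:
                split = find_split_layer(global_model, updates)                  # base/personality boundary
        else:
            base_updates = [c.ClientUpdate(global_model[:split]) for c in clients]  # base layers only
            base = server_aggregate(base_updates, [c.n_k for c in clients])
            global_model = base + global_model[split:]
        for c in clients:
            c.receive_base(global_model[:split])                                 # return base layers
    return global_model
```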
The other part is executed by the client, as in Algorithm 2.
In Algorithm 2, when t = 1 or flag is true, the client updates the model parameters with local data and sends all parameters to the server. When t ≠ 1, in order to cooperate with the server, the client updates the base layers with local data and uploads the updated base layers to the server; meanwhile, the client fixes the base layers sent by the server and trains the personality layers with its local data to generate a personalized model locally.
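Correspondingly, a hedged sketch of the client-side flow of Algorithm 2 follows; train_fn and personalize_fn denote the full-model and personality-layer training routines (the latter sketched earlier), and the list-of-layers interface is an assumption, not the original listing.

```python
def client_round(t, flag, local_model, server_base, split, train_fn, personalize_fn):
    """Illustrative client step: adopt the server's base layers, upload all layers when
    t == 1 or flag is true, otherwise upload only the base layers, and always keep a
    locally personalized model (base layers fixed, personality layers trained)."""
    if server_base is not None:
        local_model[:split] = server_base                 # adopt base layers from the server
    if t == 1 or flag:
        local_model = train_fn(local_model)               # update all model parameters
        upload = local_model                              # send every layer to the server
    else:
        local_model = train_fn(local_model, layers=range(split))   # update base layers only
        upload = local_model[:split]                      # send base layers only
    personal_model = personalize_fn(local_model, split)   # fix base layers, train personality layers
    return upload, personal_model
```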
The experimental data is the MNIST dataset, a training set of 60000 handwritten-digit images, each a 28 × 28 pixel grayscale image, as shown in fig. 4. To generate non-independent, non-identically distributed data on the clients, this embodiment first sorts the data by digit label (0, 1, ..., 9), then divides the sorted data into groups of 300, yielding two hundred groups. Two groups are randomly selected as the local data of each client, i.e., each client holds 600 local samples.
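The non-IID partition just described (sort by label, slice into 200 shards of 300 samples, assign two random shards per client) can be written compactly; the sketch below assumes the MNIST labels are available as a NumPy array, and the parameter names are illustrative.

```python
import numpy as np

def partition_mnist_non_iid(labels, num_clients=10, shard_size=300, shards_per_client=2, seed=0):
    """Sort sample indices by digit label, cut them into shards of `shard_size`,
    and give each client `shards_per_client` randomly chosen shards (600 samples each)."""
    rng = np.random.default_rng(seed)
    order = np.argsort(labels)                            # indices sorted by label 0..9
    shards = order[:len(order) // shard_size * shard_size].reshape(-1, shard_size)
    shard_ids = rng.permutation(len(shards))
    return [np.concatenate(shards[shard_ids[i * shards_per_client:(i + 1) * shards_per_client]])
            for i in range(num_clients)]
```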
(1) First, this embodiment uses the traditional FedAvg algorithm to obtain the accuracy of the global model on the different data tasks of the clients. (2) Compared with experiment (1), the effectiveness of the personalized models obtained with the proposed layering strategy is verified through analysis. (3) Meanwhile, in order to preserve the accuracy of the global model while reducing the communication cost, this embodiment also performs an asynchronous model-parameter update experiment and compares the communication cost with that of the FedAvg algorithm. The experimental parameters are shown in Table 1.
TABLE 1 Experimental parameters
This embodiment employs two network structures: a 6-layer DNN and a 10-layer convolutional neural network (CNN). DNNs and CNNs are common model structures in image-classification experiments, and all experiments in this embodiment are performed on these two basic structures, which guarantees the fairness and comparability of the experiments.
In the DNN network, the cosine similarity between the model parameters of the ten clients and the global model parameters is compared layer by layer, with the result shown in fig. 5; then the accuracy of the personalized models under different numbers of base/personality layers is compared to verify the proposed cosine-similarity-based layering strategy, with the result shown in fig. 6; finally, as shown in Table 2, comparison experiments on model accuracy and communication cost under asynchronous updates with different frequencies are performed in the DNN network.
TABLE 2 Comparative analysis of model accuracy and communication cost of the DNN under asynchronous updates with different frequencies
freq = 2/10 indicates that, in 10 federated learning rounds, all layers are uploaded in the last two rounds; freq = 4/10 is interpreted analogously. The communication cost is measured by the amount of parameters transmitted between the server and the clients, and this embodiment uses the traditional FedAvg algorithm as the baseline of communication cost.
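As an illustrative back-of-the-envelope check (not the original accounting), the relative communication cost versus FedAvg can be estimated by counting transmitted parameters per round: full rounds carry S_B + S_P, the remaining rounds carry only S_B. The fractions in the example are assumed values.

```python
def relative_comm_cost(s_base, s_pers, rounds=10, full_rounds=2):
    """Traffic of the asynchronous strategy divided by FedAvg traffic,
    where FedAvg sends all S_B + S_P parameters in every round."""
    async_traffic = full_rounds * (s_base + s_pers) + (rounds - full_rounds) * s_base
    fedavg_traffic = rounds * (s_base + s_pers)
    return async_traffic / fedavg_traffic

# Example: if the personality layers held 35% of the parameters and freq = 2/10,
# the estimated cost ratio would be 1 - (8/10) * 0.35 = 0.72, i.e. roughly a 28% saving.
print(relative_comm_cost(s_base=0.65, s_pers=0.35))
```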
Similarly, in the CNN network, the cosine similarity between the client model parameters and the global model parameters is compared layer by layer, with the result shown in fig. 7; then the accuracy of the personalized models under different numbers of base/personality layers is compared, further verifying the effectiveness of the proposed cosine-similarity-based layering strategy, with the result shown in fig. 8; finally, as shown in Table 3, comparison experiments on model accuracy and communication cost under asynchronous updates with different frequencies are performed in the CNN network, demonstrating the extensibility of the algorithm.
TABLE 3 Comparative analysis of model accuracy and communication cost of the CNN under asynchronous updates with different frequencies
(1) Comparative analysis of personalized modeling
In the DNN network structure, as shown in fig. 5, the ten curves represent the ten clients, and the cosine similarity drops significantly between layers 3 and 4, so the first three layers are taken as the base layers (i.e., the number of personality layers in Table 2 is 2). As shown in fig. 6, when the number of personality layers is 2, the average accuracy of the personalized model reaches a maximum of 90.7%, which is 21.5% higher than that of FedAvg (0 personality layers), 7.2% higher than that with 1 personality layer, and 12.3% higher than that with 3 personality layers. Similarly, in the CNN network structure, as shown in fig. 7, the cosine similarity drops significantly between layers 7 and 8, so the first seven layers are taken as the base layers; as shown in fig. 8, the accuracy of the personalized model reaches 94.8% at most. Simulation experiments on two different network structures verify the effectiveness of the proposed layering strategy.
(2) Comparative analysis of the global model and the communication cost
This embodiment uses the conventional federated averaging algorithm as the baseline of communication cost. As shown in Tables 2 and 3, after 10 federated rounds, the global-model accuracy obtained by the federated averaging algorithm is 70.44% under the DNN network and 77.72% under the CNN network, with a communication cost of 1. In the DNN network, when freq = 2/10, the accuracy of the global model is 71.30% and the communication cost is reduced by 28% relative to the federated averaging algorithm; when freq = 4/10, the accuracy of the global model is 73.15% and the communication cost is reduced by 20%. In the CNN network, when freq = 2/10, the accuracy of the global model is 79.50% and the communication cost is reduced by 11%; when freq = 4/10, the accuracy of the global model is 82.59% and the communication cost is reduced by 15%. The number of parameters in a network model is usually very large; experiments on two different network structures verify that the proposed asynchronous update strategy effectively reduces the communication cost while preserving the accuracy of the global model.
Federated learning often ignores the influence of the characteristic parameters of different layers of the model on personalized modeling, resulting in low model accuracy and high communication cost. To address this, this embodiment provides a personalized federated learning method based on asynchronous update of model parameters. By constructing a distance metric function and comparing the layer-to-layer change of cosine similarity against a set threshold, the base layers representing common features and the personality layers representing individual features are divided; a high-quality personalized model is then built locally at the client according to the asynchronous model-parameter update strategy. Multiple simulation experiments yield good results and prove the effectiveness of the method.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted only for clarity. Those skilled in the art should take the specification as a whole, and the technical solutions in the embodiments may be appropriately combined to form other embodiments understood by those skilled in the art.

Claims (7)

1. A personalized federated learning method based on asynchronous update of model parameters, characterized in that: model parameters are weighted and aggregated at the server according to the data size of each client, expressed as:

W_{t+1} = \sum_{k=1}^{K} (n_k / n) W_{t+1}^k

where n_k is the sample data size of the k-th client, n is the total number of training samples, and W_{t+1}^k denotes the model parameters of the k-th client at time t+1; the client updates all parameters of the local model by gradient descent:

W_{t+1}^k = W_t - \eta \nabla f_k(W_t)

and the loss function is defined as:

f_k(w) = (1 / n_k) \sum_{i=1}^{n_k} \ell(x_i, y_i; w)

where f_k(w) is the loss function of the k-th client.
2. The personalized federated learning method based on asynchronous update of model parameters according to claim 1, characterized in that: the global model of the server is W_t and the local model of the client is W_k.
3. The personalized federated learning method based on asynchronous update of model parameters according to claim 2, characterized in that: for a network model with L layers, the model parameters W_t of the server are expanded layer by layer as {(W_t)_1, (W_t)_2, ..., (W_t)_L}.
4. The personalized federated learning method based on asynchronous update of model parameters according to claim 3, characterized in that: the model parameters of the client are likewise expressed layer by layer as {(W_k)_1, (W_k)_2, ..., (W_k)_L}.
5. The personalized federated learning method based on asynchronous update of model parameters according to claim 4, characterized in that: the method further comprises calculating the cosine similarity between the l-th layer of the server and that of the client, expressed as:

P = ((W_t)_l · (W_k)_l) / (\|(W_t)_l\| \|(W_k)_l\|)

where (W_t)_l is the model parameter of the l-th layer of the server and (W_k)_l is the model parameter of the l-th layer of the client; the closer P is to 1, the more similar the l-th-layer parameters of the server and the client.
6. The personalized federated learning method based on asynchronous update of model parameters according to claim 5, characterized in that: for the K clients, the cosine similarity is calculated layer by layer as:

s_l^k = ((W_t)_l · (W_k)_l) / (\|(W_t)_l\| \|(W_k)_l\|),  k = 1, ..., K
7. The personalized federated learning method based on asynchronous update of model parameters according to claim 6, characterized in that: the method further comprises calculating the mean of s_l over the K clients:

\bar{s}_l = (1 / K) \sum_{k=1}^{K} s_l^k

and making the judgment:

|\bar{s}_l - \bar{s}_{l-1}| \gg |\bar{s}_{l-1} - \bar{s}_{l-2}|

The change values are compared; if the absolute change of layer l with respect to layer l-1 is much larger than the absolute change of layer l-1 with respect to layer l-2, then l is output, and the model W_t at time t is partitioned according to l into the base layers W_t^B and the personality layers W_t^P.
CN202210050723.7A 2022-01-17 2022-01-17 Personalized federated learning method based on asynchronous updating of model parameters Pending CN115115021A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210050723.7A CN115115021A (en) Personalized federated learning method based on asynchronous updating of model parameters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210050723.7A CN115115021A (en) Personalized federated learning method based on asynchronous updating of model parameters

Publications (1)

Publication Number Publication Date
CN115115021A true CN115115021A (en) 2022-09-27

Family

ID=83324705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210050723.7A Pending CN115115021A (en) Personalized federated learning method based on asynchronous updating of model parameters

Country Status (1)

Country Link
CN (1) CN115115021A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117313901A (en) * 2023-11-28 2023-12-29 北京邮电大学 Model training method and device based on multitask clustering federal personalized learning
CN117892805A (en) * 2024-03-18 2024-04-16 清华大学 Personalized federal learning method based on supernetwork and hierarchy collaborative graph aggregation
CN118153666A (en) * 2024-05-11 2024-06-07 山东第二医科大学 Personalized federal knowledge distillation model construction method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113379066A (en) * 2021-06-10 2021-09-10 重庆邮电大学 Federal learning method based on fog calculation
CN113902021A (en) * 2021-10-13 2022-01-07 北京邮电大学 High-energy-efficiency clustering federal edge learning strategy generation method and device
CN113919508A (en) * 2021-10-15 2022-01-11 河南工业大学 Mobile server-based federal learning system and method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113379066A (en) * 2021-06-10 2021-09-10 重庆邮电大学 Federal learning method based on fog calculation
CN113902021A (en) * 2021-10-13 2022-01-07 北京邮电大学 High-energy-efficiency clustering federal edge learning strategy generation method and device
CN113919508A (en) * 2021-10-15 2022-01-11 河南工业大学 Mobile server-based federal learning system and method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LI RUONAN 等: "FNCS: Federated Learning Strategy Based on Cosine Similarity under Resource Constraints", 《2021 IEEE 23RD INT CONF ON HIGH PERFORMANCE COMPUTING & COMMUNICATIONS; 7TH INT CONF ON DATA SCIENCE & SYSTEMS; 19TH INT CONF ON SMART CITY; 7TH INT CONF ON DEPENDABILITY IN SENSOR, CLOUD & BIG DATA SYSTEMS & APPLICATION (HPCC/DSS/SMARTCITY/DEPENDSY, 22 December 2021 (2021-12-22), pages 693 - 698, XP034128834, DOI: 10.1109/HPCC-DSS-SmartCity-DependSys53884.2021.00114 *
MCMAHAN B et al.: "Communication-efficient learning of deep networks from decentralized data", 《ARTIFICIAL INTELLIGENCE AND STATISTICS》, 20 April 2017 (2017-04-20), pages 1273 - 1282 *
WU JINZE et al.: "Hierarchical Personalized Federated Learning for User Modeling", 《PROCEEDINGS OF THE WEB CONFERENCE》, 3 June 2021 (2021-06-03), pages 957 - 968 *
ZHOU JUN; SHEN HUAJIE; LIN ZHONGYUN; CAO ZHENFU; DONG XIAOLEI: "Research Progress on Privacy Protection in Edge Computing", Journal of Computer Research and Development, no. 10, 9 October 2020 (2020-10-09)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117313901A (en) * 2023-11-28 2023-12-29 北京邮电大学 Model training method and device based on multitask clustering federal personalized learning
CN117313901B (en) * 2023-11-28 2024-04-02 北京邮电大学 Model training method and device based on multitask clustering federal personalized learning
CN117892805A (en) * 2024-03-18 2024-04-16 清华大学 Personalized federal learning method based on supernetwork and hierarchy collaborative graph aggregation
CN117892805B (en) * 2024-03-18 2024-05-28 清华大学 Personalized federal learning method based on supernetwork and hierarchy collaborative graph aggregation
CN118153666A (en) * 2024-05-11 2024-06-07 山东第二医科大学 Personalized federal knowledge distillation model construction method

Similar Documents

Publication Publication Date Title
CN115115021A (en) Personalized federated learning method based on asynchronous updating of model parameters
CN113191484B (en) Federal learning client intelligent selection method and system based on deep reinforcement learning
CN109948029B (en) Neural network self-adaptive depth Hash image searching method
CN113222179B (en) Federal learning model compression method based on model sparsification and weight quantification
CN112598150B (en) Method for improving fire detection effect based on federal learning in intelligent power plant
CN107180426B (en) Migratable multi-model integration-based computer-aided lung nodule classification device
CN112181666A (en) Method, system, equipment and readable storage medium for equipment evaluation and federal learning importance aggregation based on edge intelligence
CN112181971A (en) Edge-based federated learning model cleaning and equipment clustering method, system, equipment and readable storage medium
CN110968426A (en) Edge cloud collaborative k-means clustering model optimization method based on online learning
CN113518007B (en) Multi-internet-of-things equipment heterogeneous model efficient mutual learning method based on federal learning
CN116523079A (en) Reinforced learning-based federal learning optimization method and system
CN114077901B (en) User position prediction method based on clustering graph federation learning
CN112836822B (en) Federal learning strategy optimization method and device based on width learning
CN115374853A (en) Asynchronous federal learning method and system based on T-Step polymerization algorithm
CN115392481A (en) Federal learning efficient communication method based on real-time response time balancing
CN116862022A (en) Efficient privacy protection personalized federal learning method for communication
Masood et al. Content caching in HAP-assisted multi-UAV networks using hierarchical federated learning
CN113392731A (en) Modulated signal classification method and system based on graph neural network
CN115359298A (en) Sparse neural network-based federal meta-learning image classification method
CN117290721A (en) Digital twin modeling method, device, equipment and medium
CN113516163B (en) Vehicle classification model compression method, device and storage medium based on network pruning
CN117078312B (en) Advertisement putting management method and system based on artificial intelligence
CN116723547A (en) Collaborative caching method based on localized federal reinforcement learning in fog wireless access network
CN115131605A (en) Structure perception graph comparison learning method based on self-adaptive sub-graph
CN113033653A (en) Edge-cloud collaborative deep neural network model training method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination