CN112560991A - Personalized federated learning method based on a mixture-of-experts model - Google Patents

Personalized federated learning method based on a mixture-of-experts model

Info

Publication number
CN112560991A
Authority
CN
China
Prior art keywords
model
parameters
federated learning
global
personalized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011567011.XA
Other languages
Chinese (zh)
Other versions
CN112560991B (en)
Inventor
郭斌彬
肖丹阳
吴维刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202011567011.XA
Publication of CN112560991A
Application granted
Publication of CN112560991B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/214: Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/045: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N3/047: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; probabilistic or stochastic networks
    • G06N3/084: Computing arrangements based on biological models; neural networks; learning methods; backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a personalized federated learning method based on a mixture-of-experts model, aiming at overcoming the difficulty of fully training private models in a large-scale, stateless mobile federated environment. All clients jointly train a global model through federated learning, yielding global model parameters θ_G. Each client downloads θ_G from the server, uses it to initialize its feature extraction layer and personalized classification layer, and obtains the personalized classification layer parameters by fine-tuning with the base layers fixed. At this point client i holds θ_G, comprising the feature extraction layer and global classification layer parameters, together with the personalized classification layer parameters θ^i_{C,P}; initialized with these three, the feature extraction layer, global classification layer and personalized classification layer jointly train the gating model, yielding the gating model parameters θ^i_{Gate}. The client finally holds the parameters of the feature extraction layer, global classification layer, personalized classification layer and gating model, completing personalized federated learning.

Description

Personalized federated learning method based on a mixture-of-experts model
Technical Field
The invention relates to the field of federated learning, and in particular to a personalized federated learning method based on a mixture-of-experts model.
Background
In deep learning, the quantity and quality of training data largely determine how well a deep neural network can be trained. When user privacy must be preserved, user data cannot be collected into a data center, yet it is difficult to train an effective model on the isolated data of a single client. Federated learning, which keeps data on the local client, is an effective solution: it computes model updates using the data and computing resources dispersed across clients and aggregates these updates into a global model, thereby exploiting the global data while protecting user privacy. In personalized federated learning, each client derives from the global model an independent personalized model suited to its local data distribution, which better matches the optimization objective seen from the client's perspective. However, as a client personalizes, it tends to forget global knowledge; how to balance the two is a current research focus of personalized federated learning.
A mixture-of-experts (MoE) model is an ensemble learning method that effectively exploits multiple learners: a complex task is decomposed across individual expert models, and a gating model then combines the experts. Domain-adaptive private federated learning is inspired by the MoE framework: a local private model and the model participating in global updates serve as the local and global experts, and a gating model determines the ratio in which their outputs are mixed, achieving local-global domain adaptation. For example, publication CN111738440A (published 2020-10-02) proposes a model training method based on domain adaptation and federated learning.
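As general background (the display below is the standard MoE formulation, not a formula taken from the patent text): with K experts f_k and a gating function g,

    \hat{y}(x) = \sum_{k=1}^{K} g_k(x)\, f_k(x), \qquad g_k(x) \ge 0, \qquad \sum_{k=1}^{K} g_k(x) = 1.

In the method described below K = 2: the global and personalized classification layers are the two experts, so the gate reduces to a single scalar gateout ∈ [0, 1] weighting the pair.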
However, domain-adaptive private federated learning requires all clients to participate synchronously, and every client must retain intermediate state, including its local gating model and private model, throughout federated training. This restricts the approach to cross-silo federated learning between institutions and makes it difficult to fully train private models in a large-scale, stateless mobile federated environment. Moreover, because the gating model is a single-layer linear neural network, it cannot effectively balance the private and global models directly from high-dimensional inputs in federated training tasks based on high-dimensional data such as images.
Disclosure of Invention
The invention provides a personalized federated learning method based on a mixture-of-experts model, aiming at overcoming the prior-art difficulty of fully training private models in a large-scale, stateless mobile federated environment.
In order to solve the technical problems, the technical scheme of the invention is as follows:
the personalized federal learning method based on the hybrid expert model comprises the following steps:
s1: all clients participate in the training of the global model together by adding federal learning to obtain a global model parameter thetaG
S2: each client i downloads the global model parameter theta from the server respectivelyGInitializing parameters of a personality classification layer, a feature extraction layer and a global classification layer in the client i by using the parameters, and performing random initialization on a gating model;
s3: the client i carries out personalized federal learning, local data are input into the feature extraction layer to obtain an activation value, then the activation value is respectively input into the global classification layer, the personalized classification layer and the gating model, and the personalized classification layer and the gating model are respectively input into the global classification layer, the personalized classification layer and the gating model according to the global model parameter thetaGPerforming personalized federal learning, training to obtain personalized classification layer parameters
Figure BDA0002860951700000021
And gating model parameters
Figure BDA0002860951700000022
S4: judging whether a preset training round number is reached, if so, obtaining an individual classification layer parameter
Figure BDA0002860951700000023
And gating model parameters
Figure BDA0002860951700000024
The client i finishes personalized federal learning; if not, the step S3 is executed.
In this technical scheme there are multiple clients, indexed by i, each holding its own private local data; each client receives the global model parameters issued by the server and applies them to personalized federated learning. The client training model comprises a personalized classification layer, a gating model and a global model, where the global model consists of a global classification layer and a feature extraction layer. Specifically, the output of the feature extraction layer is connected to the inputs of the personalized classification layer, the gating model and the global classification layer; the outputs of these three are connected to the input of an aggregation weighting layer, whose output is the output of the client training model.
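As an illustration of this architecture, the following is a minimal PyTorch sketch of the client training model. It is a sketch under stated assumptions rather than the patent's implementation: the single-linear-layer feature extractor and the sizes in_dim, feat_dim and n_classes are illustrative, and a sigmoid gate is one simple way to realize a scalar gateout in [0, 1].

    import torch
    import torch.nn as nn

    class ClientModel(nn.Module):
        # Feature extraction layer M_E feeding three heads: the global
        # classification layer M_C, the personalized classification layer
        # M_{C,P}, and the gating model M_Gate, whose outputs are combined
        # by the aggregation weighting layer.
        def __init__(self, in_dim=784, feat_dim=128, n_classes=10):  # illustrative sizes
            super().__init__()
            self.feature = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())  # theta_E
            self.global_head = nn.Linear(feat_dim, n_classes)                     # theta_{C,G}
            self.personal_head = nn.Linear(feat_dim, n_classes)                   # theta^i_{C,P}
            self.gate = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())       # theta^i_Gate

        def forward(self, x):
            a = self.feature(x)            # activation value A_{E,x}
            g = self.gate(a)               # gateout in [0, 1]
            # aggregation weighting layer: mix the global and personalized experts
            return g * self.global_head(a) + (1 - g) * self.personal_head(a)

Feeding the gate the extracted features A_{E,x} rather than the raw input x is the design point stressed above: the single-layer gate then operates on a low-dimensional representation instead of high-dimensional data such as images.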
Preferably, step S1 further comprises the following: client i performs local training by gradient descent starting from the global model parameters θ_G = (θ_E, θ_{C,G}) issued by the server, obtains the updated global model parameters θ_G = (θ_E, θ_{C,G}), and uploads them to the server, where θ_E denotes the feature extraction layer parameters and θ_{C,G} the global classification layer parameters.
Preferably, in step S3, based on the global model parameters θ_G = (θ_E, θ_{C,G}) received by the client, the personalized classification layer parameters θ^i_{C,P} are fine-tuned for personalized learning while the feature extraction layer parameters θ_E are kept fixed.
Preferably, the specific steps by which the personalized classification layer performs personalized learning from the global model parameters are as follows:
S3.1: initialize the personalized classification layer parameters θ^i_{C,P} and set the local fine-tuning hyperparameters;
S3.2: draw a mini-batch (x, y) ∈ D_i from the local data of client i and feed it into the feature extraction layer to obtain the activation value A_{E,x};
S3.3: feed the activation value A_{E,x} into the personalized classification layer to output the predicted label ŷ_P;
S3.4: compute the cross-entropy loss1 between the predicted label ŷ_P and the true label y, and obtain the gradient ∇θ^i_{C,P} of the personalized classification layer parameters by backpropagation;
S3.5: update the personalized classification layer parameters θ^i_{C,P} according to the gradient ∇θ^i_{C,P}; judge whether the preset number of training rounds has been reached; if not, return to step S3.2; if so, output the personalized classification layer parameters θ^i_{C,P}, completing the personalized learning of the personalized classification layer.
Preferably, the personalized classification layer performs personalized learning from the global model parameters according to the following expressions:

A_{E,x} = M_E(x, θ_E)
ŷ_P = M_{C,P}(A_{E,x}, θ^i_{C,P})
loss1 = CEL(ŷ_P, y)
∇θ^i_{C,P} = ∂loss1 / ∂θ^i_{C,P}
θ^i_{C,P} ← θ^i_{C,P} − α ∇θ^i_{C,P}

where M_E(·) denotes the feature extraction layer, M_{C,P}(·) the personalized classification layer, CEL(·) the cross-entropy loss function, α the learning rate of the personalized classification layer's personalized learning, and D_i the local data set of the client.
Preferably, in step S3, based on the global model parameters θ_G, the gating model parameters θ^i_{Gate} are fine-tuned for personalized learning while the feature extraction layer parameters θ_E and the personalized classification layer parameters θ^i_{C,P} are kept fixed.
Preferably, the steps by which the gating model performs personalized learning from the global model parameters are as follows:
S3.6: randomly initialize the gating model parameters θ^i_{Gate} and set the local fine-tuning hyperparameters;
S3.7: draw a mini-batch (x, y) ∈ D_i from the local data of client i and feed it into the feature extraction layer to obtain the activation value A_{E,x};
S3.8: feed the activation value A_{E,x} into the personalized classification layer, the global classification layer and the gating model respectively, and aggregate their outputs to complete the forward pass, obtaining the predicted label ŷ;
S3.9: compute the cross-entropy loss2 between the predicted label ŷ and the true label y, and obtain the gradient ∇θ^i_{Gate} of the gating model parameters by backpropagation;
S3.10: update the gating model parameters θ^i_{Gate} according to the gradient ∇θ^i_{Gate}; judge whether the preset number of training rounds has been reached; if not, return to step S3.7; if so, output the gating model parameters θ^i_{Gate}, completing the personalized learning of the gating model.
Preferably, the gating model performs personalized learning from the global model parameters according to the following expressions:

A_{E,x} = M_E(x, θ_E)
gateout = M_{Gate}(A_{E,x}, θ^i_{Gate})
ŷ = gateout · M_C(A_{E,x}, θ_{C,G}) + (1 − gateout) · M_{C,P}(A_{E,x}, θ^i_{C,P})
loss2 = CEL(ŷ, y)
∇θ^i_{Gate} = ∂loss2 / ∂θ^i_{Gate}
θ^i_{Gate} ← θ^i_{Gate} − β ∇θ^i_{Gate}

where gateout is the mixing ratio, produced by feeding the activation value into the gating model, between the global model output and the personalized classification layer output, with value range [0, 1]; M_Gate(·) denotes the gating model, M_C(·) the global classification layer, and β the learning rate of the gating model's personalized learning.
Preferably, the hyper-parameters comprise a learning rate and a number of training rounds.
Preferably, the personalized federated learning method further comprises the following step: client i takes the task to be classified as input data x′ and computes the predicted probability prob of each class, used to evaluate the personalized learning result, according to the following expressions:

A_{E,x′} = M_E(x′, θ_E)
gateout = M_{Gate}(A_{E,x′}, θ^i_{Gate})
y′ = gateout · M_C(A_{E,x′}, θ_{C,G}) + (1 − gateout) · M_{C,P}(A_{E,x′}, θ^i_{C,P})
prob = softmax(y′)

where y′ denotes the predicted label output for the input data x′ and softmax(·) denotes the softmax function.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects. The parameters of the personalized classification layer and the gating model are fine-tuned from the global model parameters obtained by federated learning, and the gating model is then trained separately using the global model and the personalized classification layer, mixing the two so that personalization ability is improved while global knowledge is retained. The invention treats the personalized classification layer and the global model as local and global experts forming a mixture-of-experts model, combines the experts with a gating model, and uses the output of the feature extraction layer as the input of the gating model, so that the gating model can partition the input data more effectively.
Drawings
Fig. 1 is a flowchart of the personalized federated learning method based on the mixture-of-experts model according to Embodiment 1.
Fig. 2 is a flowchart of the personalized federated learning method based on the mixture-of-experts model according to Embodiment 1.
Fig. 3 is a schematic structural diagram of the client training model in Embodiment 1.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Embodiment 1
This embodiment provides a personalized federated learning method based on a mixture-of-experts model; flowcharts of the method are shown in Figs. 1-2.
The personalized federated learning method based on the mixture-of-experts model provided by this embodiment comprises the following steps:
S1: all clients jointly train the global model through federated learning to obtain the global model parameters θ_G.
S2: each client i downloads the global model parameters θ_G from the server, uses them to initialize the parameters of the personalized classification layer, the feature extraction layer and the global classification layer in client i, and randomly initializes the gating model.
Further, step S1 comprises the following: client i performs local training by gradient descent starting from the global model parameters θ_G = (θ_E, θ_{C,G}) issued by the server, obtains the updated global model parameters θ_G = (θ_E, θ_{C,G}), and uploads them to the server; this is repeated until federated training is complete. Here θ_E denotes the feature extraction layer parameters and θ_{C,G} the global classification layer parameters. A sketch of one such round is given below.
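The patent text does not fix the server's aggregation rule, so the sketch below assumes a FedAvg-style average over the uploaded parameters; it reuses the ClientModel sketch given earlier, and the function names and hyperparameters (local_update, fedavg, lr, epochs) are illustrative.

    def local_update(model, loader, lr=0.01, epochs=1):
        # Client side of S1: gradient descent on the global model only,
        # i.e. the feature extraction layer and the global classification
        # layer; the personalized head and the gate take no part in S1.
        params = list(model.feature.parameters()) + list(model.global_head.parameters())
        opt = torch.optim.SGD(params, lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for x, y in loader:
                opt.zero_grad()
                loss_fn(model.global_head(model.feature(x)), y).backward()
                opt.step()
        # upload theta_G = (theta_E, theta_{C,G})
        return {k: v.detach().clone() for k, v in model.state_dict().items()
                if k.startswith(('feature', 'global_head'))}

    def fedavg(client_states):
        # Server side of S1: aggregate the uploaded global model parameters
        # into the latest theta_G (unweighted average for simplicity).
        return {k: torch.stack([s[k] for s in client_states]).mean(0)
                for k in client_states[0]}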
S3: client i performs personalized learning. The local data x are fed into the feature extraction layer M_E to obtain the activation value A_{E,x}, which is then fed into the global classification layer, the personalized classification layer and the gating model respectively; the personalized classification layer and the gating model each perform personalized learning based on the global model parameters θ_G, training to obtain the personalized classification layer parameters θ^i_{C,P} and the gating model parameters θ^i_{Gate}, where i denotes the client index.
For the personalized learning of the personalized classification layer, the global model parameters θ_G = (θ_E, θ_{C,G}) received by client i are used: the personalized classification layer parameters θ^i_{C,P} are fine-tuned while the feature extraction layer parameters θ_E are kept fixed. The specific steps are as follows:
S3.1: initialize the personalized classification layer parameters θ^i_{C,P} and set the local fine-tuning hyperparameters;
S3.2: draw a mini-batch (x, y) ∈ D_i from the local data of client i and feed it into the feature extraction layer to obtain the activation value A_{E,x}; the expression is:
A_{E,x} = M_E(x, θ_E)
S3.3: feed the activation value A_{E,x} into the personalized classification layer and output the predicted label ŷ_P; the expression is:
ŷ_P = M_{C,P}(A_{E,x}, θ^i_{C,P})
S3.4: compute the cross-entropy loss1 between the predicted label ŷ_P and the true label y, and obtain the gradient ∇θ^i_{C,P} of the personalized classification layer parameters by backpropagation; the expressions are:
loss1 = CEL(ŷ_P, y)
∇θ^i_{C,P} = ∂loss1 / ∂θ^i_{C,P}
S3.5: update the personalized classification layer parameters θ^i_{C,P} according to the gradient ∇θ^i_{C,P}; the expression is:
θ^i_{C,P} ← θ^i_{C,P} − α ∇θ^i_{C,P}
Then judge whether the preset number of training rounds has been reached; if not, return to step S3.2; if so, output the personalized classification layer parameters θ^i_{C,P}, completing the personalized learning of the personalized classification layer.
In these expressions, M_E(·) denotes the feature extraction layer, M_{C,P}(·) the personalized classification layer, CEL(·) the cross-entropy loss function, α the learning rate of the personalized classification layer's personalized learning, and D_i the local data set of client i.
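Under the same assumptions as before (PyTorch, the ClientModel sketch above), steps S3.1-S3.5 can be sketched as follows; α and the round count are illustrative, and initializing the personalized head from θ_{C,G} reflects step S2.

    def finetune_personal_head(model, loader, alpha=0.01, rounds=5):
        # S3.1: initialize theta^i_{C,P} (from theta_{C,G}, per S2) and
        # set the local fine-tuning hyperparameters.
        model.personal_head.load_state_dict(model.global_head.state_dict())
        opt = torch.optim.SGD(model.personal_head.parameters(), lr=alpha)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(rounds):
            for x, y in loader:                      # S3.2: mini-batch (x, y) in D_i
                with torch.no_grad():
                    a = model.feature(x)             # A_{E,x}; theta_E stays fixed
                y_hat = model.personal_head(a)       # S3.3: predicted label
                loss1 = loss_fn(y_hat, y)            # S3.4: CEL(y_hat, y)
                opt.zero_grad()
                loss1.backward()                     # gradient of theta^i_{C,P}
                opt.step()                           # S3.5: theta <- theta - alpha * grad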
For the personalized learning of the gating model, the global model parameters θ_G are used: the gating model parameters θ^i_{Gate} are fine-tuned while the feature extraction layer parameters θ_E and the personalized classification layer parameters θ^i_{C,P} are kept fixed. The specific steps are as follows:
S3.6: randomly initialize the gating model parameters θ^i_{Gate} and set the local fine-tuning hyperparameters;
S3.7: draw a mini-batch (x, y) ∈ D_i from the local data of client i and feed it into the feature extraction layer to obtain the activation value A_{E,x}; the expression is:
A_{E,x} = M_E(x, θ_E)
S3.8: feed the activation value A_{E,x} into the personalized classification layer, the global classification layer and the gating model respectively, and aggregate their outputs to complete the forward pass, obtaining the predicted label ŷ; the expressions are:
gateout = M_{Gate}(A_{E,x}, θ^i_{Gate})
ŷ = gateout · M_C(A_{E,x}, θ_{C,G}) + (1 − gateout) · M_{C,P}(A_{E,x}, θ^i_{C,P})
S3.9: compute the cross-entropy loss2 between the predicted label ŷ and the true label y, and obtain the gradient ∇θ^i_{Gate} of the gating model parameters by backpropagation; the expressions are:
loss2 = CEL(ŷ, y)
∇θ^i_{Gate} = ∂loss2 / ∂θ^i_{Gate}
S3.10: update the gating model parameters θ^i_{Gate} according to the gradient ∇θ^i_{Gate}; the expression is:
θ^i_{Gate} ← θ^i_{Gate} − β ∇θ^i_{Gate}
Then judge whether the preset number of training rounds has been reached; if not, return to step S3.7; if so, output the gating model parameters θ^i_{Gate}, completing the personalized learning of the gating model.
Here gateout is the mixing ratio, obtained by feeding the activation value into the gating model, between the global model output and the personalized classification layer output, with value range [0, 1]; M_Gate(·) denotes the gating model, M_C(·) the global classification layer, and β the learning rate of the gating model's personalized learning.
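A matching sketch of steps S3.6-S3.10 under the same assumptions: both experts and the extractor are frozen, and only the gate parameters θ^i_{Gate} receive gradients through the mixed prediction; β and the round count are illustrative.

    def train_gate(model, loader, beta=0.01, rounds=5):
        # S3.6: the gate was randomly initialized in S2; set the local
        # fine-tuning hyperparameters.
        opt = torch.optim.SGD(model.gate.parameters(), lr=beta)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(rounds):
            for x, y in loader:                      # S3.7: mini-batch (x, y) in D_i
                with torch.no_grad():
                    a = model.feature(x)             # A_{E,x}
                    out_g = model.global_head(a)     # frozen global expert
                    out_p = model.personal_head(a)   # frozen personalized expert
                gateout = model.gate(a)              # mixing ratio in [0, 1]
                y_hat = gateout * out_g + (1 - gateout) * out_p   # S3.8: forward pass
                loss2 = loss_fn(y_hat, y)            # S3.9: CEL(y_hat, y)
                opt.zero_grad()
                loss2.backward()                     # gradient of theta^i_Gate only
                opt.step()                           # S3.10: theta <- theta - beta * grad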
In addition, the hyperparameters preset for the personalized classification layer or the gating model in this step include the learning rate and the number of training rounds of the personalized learning.
S4: judging whether a preset training round number is reached, if so, completing personalized federal learning by the client; if not, the step S2 is executed.
Further, the personalized federated learning method also comprises the following step: client i takes the task to be classified as input data x′ and computes the predicted probability prob of each class, used to evaluate the personalized learning result, according to the following expressions:
A_{E,x′} = M_E(x′, θ_E)
gateout = M_{Gate}(A_{E,x′}, θ^i_{Gate})
y′ = gateout · M_C(A_{E,x′}, θ_{C,G}) + (1 − gateout) · M_{C,P}(A_{E,x′}, θ^i_{C,P})
prob = softmax(y′)
where y′ denotes the predicted label output for the input data x′ and softmax(·) denotes the softmax function.
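Continuing the same sketch, the inference step then reads:

    def predict(model, x_new):
        # Mix the two experts with the trained gate and return the
        # per-class probabilities prob = softmax(y').
        model.eval()
        with torch.no_grad():
            a = model.feature(x_new)                 # A_{E,x'}
            gateout = model.gate(a)
            y_prime = (gateout * model.global_head(a)
                       + (1 - gateout) * model.personal_head(a))
            return torch.softmax(y_prime, dim=-1)    # prob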
In this embodiment there are multiple clients, each holding its own private local data; each client receives the global model parameters issued by the server and applies them to personalized federated learning. The server in this embodiment is a single federated server responsible for assigning the federated learning task, coordinating the available clients, receiving the global model parameters uploaded by the clients, aggregating them into the latest global model parameters, and issuing these back to the clients.
In this embodiment, the client training model comprises a personalized classification layer, a gating model and a global model, where the global model consists of a global classification layer and a feature extraction layer. Specifically, the output of the feature extraction layer is connected to the inputs of the personalized classification layer, the gating model and the global classification layer; the outputs of these three are connected to the input of an aggregation weighting layer, whose output is the output of the client training model. Fig. 3 is a schematic structural diagram of the client training model of this embodiment.
In this embodiment, the personalized classification layer and the global model serve as the local and global experts forming a mixture-of-experts model, and a gating model is used to combine the experts. Furthermore, the output of the feature extraction layer is used as the input of the gating model, so that the gating model can partition the input data more effectively.
In this embodiment, after the client and the server complete federated learning, the client further fine-tunes the parameters of the personalized classification layer and the gating model, and then trains the gating model separately using the global model and the personalized classification layer, realizing the mixing of the personalized classification layer and the global model and retaining global knowledge while improving personalization ability.
The same or similar reference numerals correspond to the same or similar parts;
the terms describing positional relationships in the drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (10)

1. A personalized federated learning method based on a mixture-of-experts model, characterized by comprising the following steps:
S1: all clients jointly train the global model through federated learning to obtain the global model parameters θ_G;
S2: each client downloads the global model parameters θ_G from the server, uses them to initialize the parameters of the personalized classification layer, the feature extraction layer and the global classification layer in client i, and randomly initializes the gating model;
S3: client i performs personalized learning: the local data are fed into the feature extraction layer to obtain an activation value, which is then fed into the global classification layer, the personalized classification layer and the gating model respectively; the personalized classification layer and the gating model each perform personalized learning based on the global model parameters θ_G, training to obtain the personalized classification layer parameters θ^i_{C,P} and the gating model parameters θ^i_{Gate};
S4: judging whether the preset number of training rounds has been reached; if so, the personalized classification layer parameters θ^i_{C,P} and the gating model parameters θ^i_{Gate} are obtained and client i has completed personalized federated learning; if not, returning to step S3.
2. The personalized federated learning method as claimed in claim 1, wherein step S1 further comprises: client i performs local training by gradient descent starting from the global model parameters θ_G = (θ_E, θ_{C,G}) issued by the server, obtains the updated global model parameters θ_G = (θ_E, θ_{C,G}), and uploads them to the server, where θ_E denotes the feature extraction layer parameters and θ_{C,G} the global classification layer parameters.
3. The personalized federated learning method as claimed in claim 2, wherein in step S3, based on the global model parameters θ_G = (θ_E, θ_{C,G}) received by client i, the personalized classification layer parameters θ^i_{C,P} are fine-tuned for personalized learning while the feature extraction layer parameters θ_E are kept fixed.
4. The personalized federated learning method as claimed in claim 3, wherein the personalized classification layer performs personalized learning from the global model parameters by the following specific steps:
S3.1: initializing the personalized classification layer parameters θ^i_{C,P} and setting the local fine-tuning hyperparameters;
S3.2: drawing a mini-batch (x, y) ∈ D_i from the local data of client i and feeding it into the feature extraction layer to obtain the activation value A_{E,x};
S3.3: feeding the activation value A_{E,x} into the personalized classification layer to output the predicted label ŷ_P;
S3.4: computing the cross-entropy loss1 between the predicted label ŷ_P and the true label y, and obtaining the gradient ∇θ^i_{C,P} of the personalized classification layer parameters by backpropagation;
S3.5: updating the personalized classification layer parameters θ^i_{C,P} according to the gradient ∇θ^i_{C,P}; judging whether the preset number of training rounds has been reached; if not, returning to step S3.2; if so, outputting the personalized classification layer parameters θ^i_{C,P}, completing the personalized learning of the personalized classification layer.
5. The personalized federated learning method as claimed in claim 4, wherein the personalized classification layer performs personalized learning from the global model parameters according to the following expressions:

A_{E,x} = M_E(x, θ_E)
ŷ_P = M_{C,P}(A_{E,x}, θ^i_{C,P})
loss1 = CEL(ŷ_P, y)
∇θ^i_{C,P} = ∂loss1 / ∂θ^i_{C,P}
θ^i_{C,P} ← θ^i_{C,P} − α ∇θ^i_{C,P}

where M_E(·) denotes the feature extraction layer, M_{C,P}(·) the personalized classification layer, CEL(·) the cross-entropy loss function, α the learning rate of the personalized classification layer's personalized learning, and D_i the local data set of the client.
6. The personalized federated learning method as claimed in claim 5, wherein in step S3, based on the global model parameters θ_G, the gating model parameters θ^i_{Gate} are fine-tuned for personalized learning while the feature extraction layer parameters θ_E and the personalized classification layer parameters θ^i_{C,P} are kept fixed.
7. The personalized federated learning method as claimed in claim 6, wherein the gating model performs personalized learning from the global model parameters by the following steps:
S3.6: randomly initializing the gating model parameters θ^i_{Gate} and setting the local fine-tuning hyperparameters;
S3.7: drawing a mini-batch (x, y) ∈ D_i from the local data of client i and feeding it into the feature extraction layer to obtain the activation value A_{E,x};
S3.8: feeding the activation value A_{E,x} into the personalized classification layer, the global classification layer and the gating model respectively, and aggregating their outputs to complete the forward pass, obtaining the predicted label ŷ;
S3.9: computing the cross-entropy loss2 between the predicted label ŷ and the true label y, and obtaining the gradient ∇θ^i_{Gate} of the gating model parameters by backpropagation;
S3.10: updating the gating model parameters θ^i_{Gate} according to the gradient ∇θ^i_{Gate}; judging whether the preset number of training rounds has been reached; if not, returning to step S3.7; if so, outputting the gating model parameters θ^i_{Gate}, completing the personalized learning of the gating model.
8. The personalized federated learning method as claimed in claim 7, wherein the gating model performs personalized learning from the global model parameters according to the following expressions:

A_{E,x} = M_E(x, θ_E)
gateout = M_{Gate}(A_{E,x}, θ^i_{Gate})
ŷ = gateout · M_C(A_{E,x}, θ_{C,G}) + (1 − gateout) · M_{C,P}(A_{E,x}, θ^i_{C,P})
loss2 = CEL(ŷ, y)
∇θ^i_{Gate} = ∂loss2 / ∂θ^i_{Gate}
θ^i_{Gate} ← θ^i_{Gate} − β ∇θ^i_{Gate}

where gateout is the mixing ratio, produced by feeding the activation value into the gating model, between the global model output and the personalized classification layer output, with value range [0, 1]; M_Gate(·) denotes the gating model, M_C(·) the global classification layer, and β the learning rate of the gating model's personalized learning.
9. The personalized federated learning method as claimed in claim 4 or 7, wherein the hyperparameters comprise a learning rate and a number of training rounds.
10. The personalized federated learning method as claimed in claim 8, further comprising the following step: client i takes the task to be classified as input data x′ and computes the predicted probability prob of each class according to the following expressions:

A_{E,x′} = M_E(x′, θ_E)
gateout = M_{Gate}(A_{E,x′}, θ^i_{Gate})
y′ = gateout · M_C(A_{E,x′}, θ_{C,G}) + (1 − gateout) · M_{C,P}(A_{E,x′}, θ^i_{C,P})
prob = softmax(y′)

where y′ denotes the predicted label output for the input data x′ and softmax(·) denotes the softmax function.
CN202011567011.XA | 2020-12-25 | 2020-12-25 | Personalized federated learning method based on a mixture-of-experts model | Active | granted as CN112560991B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202011567011.XA | 2020-12-25 | 2020-12-25 | Personalized federated learning method based on a mixture-of-experts model (granted as CN112560991B)

Publications (2)

Publication Number | Publication Date
CN112560991A | 2021-03-26
CN112560991B | 2023-07-07

Family ID: 75033035

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202011567011.XA | Active; granted as CN112560991B | 2020-12-25 | 2020-12-25

Country Status (1)

Country | Link
CN | CN112560991B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111275207A * | 2020-02-10 | 2020-06-12 | Shenzhen Qianhai WeBank Co., Ltd. | Semi-supervision-based horizontal federated learning optimization method, device and storage medium
CN111291897A * | 2020-02-10 | 2020-06-16 | Shenzhen Qianhai WeBank Co., Ltd. | Semi-supervision-based horizontal federated learning optimization method, device and storage medium
CN111310938A * | 2020-02-10 | 2020-06-19 | Shenzhen Qianhai WeBank Co., Ltd. | Semi-supervision-based horizontal federated learning optimization method, device and storage medium


Cited By (11)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113095238A * | 2021-04-15 | 2021-07-09 | Shandong Institute of Artificial Intelligence | Personalized electrocardiosignal monitoring method based on federated learning
CN113095238B * | 2021-04-15 | 2021-12-28 | Shandong Institute of Artificial Intelligence | Personalized electrocardiosignal monitoring method based on federated learning
CN113537509A * | 2021-06-28 | 2021-10-22 | Southern University of Science and Technology | Collaborative model training method and device
CN113688862A * | 2021-07-09 | 2021-11-23 | Shenzhen University | Brain image classification method based on semi-supervised federated learning, and terminal device
CN113688862B * | 2021-07-09 | 2023-07-04 | Shenzhen University | Brain image classification method based on semi-supervised federated learning, and terminal device
CN114357067A * | 2021-12-15 | 2022-04-15 | South China University of Technology | Personalized federated meta-learning method for data heterogeneity
CN114357067B * | 2021-12-15 | 2024-06-25 | South China University of Technology | Personalized federated meta-learning method for data heterogeneity
CN114429195A * | 2022-01-21 | 2022-05-03 | Tsinghua University | Performance optimization method and device for mixture-of-experts model training
CN114429195B * | 2022-01-21 | 2024-07-19 | Tsinghua University | Performance optimization method and device for mixture-of-experts model training
CN114818996A * | 2022-06-28 | 2022-07-29 | Shandong University | Mechanical fault diagnosis method and system based on federated domain generalization
CN118410851A * | 2024-07-03 | 2024-07-30 | Inspur Electronic Information Industry Co., Ltd. | Mixture-of-experts model routing network optimization method, product, device and medium

Also Published As

Publication number Publication date
CN112560991B (en) 2023-07-07

Similar Documents

Publication Publication Date Title
CN112560991A Personalized federated learning method based on a mixture-of-experts model
Zhang et al. A novel federated learning scheme for generative adversarial networks
Petzka et al. On the regularization of wasserstein gans
CN106779087B A general-purpose machine learning data analysis platform
Liu et al. Ensemble learning via negative correlation
Cao et al. PerFED-GAN: Personalized federated learning via generative adversarial networks
CN106067042B (en) Polarization SAR classification method based on semi-supervised depth sparseness filtering network
CN113191484A (en) Federal learning client intelligent selection method and system based on deep reinforcement learning
CN109983480A (en) Use cluster loss training neural network
CN109754068A (en) Transfer learning method and terminal device based on deep learning pre-training model
CN107145860B (en) Classification of Polarimetric SAR Image method based on spatial information and deep learning
CN109829049A (en) The method for solving video question-answering task using the progressive space-time attention network of knowledge base
CN115907001B (en) Knowledge distillation-based federal graph learning method and automatic driving method
US20220318412A1 (en) Privacy-aware pruning in machine learning
CN116523079A (en) Reinforced learning-based federal learning optimization method and system
Jin et al. Image generation method based on improved condition GAN
CN116348881A (en) Combined mixing model
Cai et al. Multi-granularity weighted federated learning in heterogeneous mobile edge computing systems
Hihn et al. Bounded rational decision-making with adaptive neural network priors
CN106250928A (en) Parallel logic homing method based on Graphics Processing Unit and system
CN116719607A (en) Model updating method and system based on federal learning
CN113449867B (en) Deep reinforcement learning multi-agent cooperation method based on knowledge distillation
Tao et al. Communication efficient federated learning via channel-wise dynamic pruning
Zhang et al. FedCR: Personalized federated learning based on across-client common representation with conditional mutual information regularization
Nawaz et al. K-DUMBs IoRT: Knowledge Driven Unified Model Block Sharing in the Internet of Robotic Things

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant