CN114357067A - Personalized federated meta-learning method for data heterogeneity - Google Patents

Personalized federated meta-learning method for data heterogeneity

Info

Publication number
CN114357067A
CN114357067A (application CN202111535626.9A)
Authority
CN
China
Prior art keywords
model
meta
client
local
data
Prior art date
Legal status
Granted
Application number
CN202111535626.9A
Other languages
Chinese (zh)
Other versions
CN114357067B (en)
Inventor
杨磊
黄家明
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology (SCUT)
Priority to CN202111535626.9A priority Critical patent/CN114357067B/en
Priority claimed from CN202111535626.9A external-priority patent/CN114357067B/en
Publication of CN114357067A publication Critical patent/CN114357067A/en
Application granted
Publication of CN114357067B publication Critical patent/CN114357067B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a personalized federated meta-learning method for data heterogeneity, which comprises the following steps: determining, for each client, the autoencoder structure used in the initialization stage and the meta-model structure used in the personalization stage; initializing the parameters of the federated training stage; grouping the clients according to the local data distribution vectors uploaded by the clients; aggregating the client models within each group and sending the aggregated model back to the clients in that group for the next iteration; and, after the federated training is finished, having each client fine-tune its group's meta-model on its local data to generate a personalized model. While the clients participate in the federated training, clients with similar data distributions are dynamically placed in the same group according to the local data distribution vectors uploaded in each round, and a corresponding meta-model is maintained for each group, which alleviates the slow model convergence and low accuracy caused by highly heterogeneous data.

Description

Personalized federated meta-learning method for data heterogeneity
Technical Field
The invention relates to the field of distributed machine learning under data heterogeneity, and in particular to a personalized federated meta-learning method for data heterogeneity.
Background
The popularity of edge devices in modern society, such as mobile phones and wearable devices, has led to rapid growth of the distributed private data that people produce. Although these abundant data provide great opportunities for machine learning applications, social concern about data privacy is increasing, as reflected by regulations such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA). This has made federated learning increasingly popular: it is a new distributed machine learning paradigm that enables machine learning models to be developed and trained on data islands in a cooperative and privacy-preserving manner. The primary motivation for an individual user to participate in federated learning is to exploit the shared knowledge of the other users in the federation, because individual users often face data-level limitations, such as data scarcity, low-quality data, and unseen label classes, which limit their ability to train well-performing local models.
Federated learning is a framework that enables multiple users, called clients, to collaboratively train a shared global model on their combined data without moving the data off their local devices. A central server coordinates the whole process, which proceeds over multiple rounds. At the beginning of each round, the server sends the current global model to the participating clients. Each client trains the model on its local data and passes its model update back to the server. The server collects these updates from all clients and applies a single update to the global model, ending the round. Federated learning addresses the privacy problem described above by eliminating the need to gather all data on a single device. Since the primary motivation for clients to participate in federated learning is to obtain better models, clients that do not have enough private data to develop accurate local models benefit the most. For clients that do have enough private data to train accurate local models, however, the benefit of participating in federated learning is debatable, because the accuracy of the shared global model may be lower than that of their locally trained models. Furthermore, in many applications the data distribution across clients is highly non-independent and identically distributed (non-IID). This statistical heterogeneity makes it difficult for federated learning to train a single model that works well for all clients.
While the initial goal of federated learning was to find a single global model that could be deployed on every client, a single model may not serve all clients well, since the data distributions of the clients may vary greatly between devices. Data heterogeneity has therefore become one of the major challenges in building an effective federated learning model. Several personalized federated learning approaches have been proposed to deal with data heterogeneity; some of them use different local models to fit client-specific local data while still extracting common knowledge from the data of other devices. To cope with the challenges posed by the statistical heterogeneity of data, the global model must be personalized. For example, when a next-word prediction task runs on the clients, users in different regions clearly produce different completions for the sentence "I live in ...", so the model needs to predict different answers for different users. Most personalization techniques involve two separate steps: first, a global model is built collaboratively; second, a personalized model is built for each client using the client's private data. Generally speaking, optimizing purely for global accuracy yields models that are difficult to personalize. For personalized federated learning to work in practice, the following three objectives must be addressed simultaneously rather than independently: (1) developing improved personalized models that benefit most clients; (2) developing an accurate global model from which clients with limited local data can benefit; and (3) achieving fast model convergence within a small number of training rounds.
In recent years, personalized federated learning has become one of the most promising approaches to the statistical challenge of non-IID data in federated learning and has attracted increasing attention. Jiang et al. (Yihan Jiang, Jakub Konečný, Keith Rush, and Sreeram Kannan. 2019. Improving Federated Learning Personalization via Model Agnostic Meta Learning.) explored the link between the MAML algorithm (Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In ICML, 1126-1135.) and federated learning. They treat the global meta-model of MAML as the global model of federated learning and the tasks as the local models of the clients. They also show that existing optimization-based meta-learning algorithms (such as MAML) can be integrated into federated learning to achieve personalization. In Fallah et al. (Alireza Fallah, Aryan Mokhtari, and Asuman Ozdaglar. 2020. Personalized Federated Learning with Theoretical Guarantees: A Model-Agnostic Meta-Learning Approach. In NeurIPS.), the authors propose Per-FedAvg, a personalized, MAML-based variant of the federated averaging algorithm, which obtains a personalized model by training a good initial global model and adapting it on the client's local data. In contrast to MAML-type methods, Khodak et al. (Mikhail Khodak, Maria-Florina Balcan, and Ameet S. Talwalkar. 2019. Adaptive Gradient-Based Meta-Learning Methods. In NeurIPS.) study adaptive gradient-based meta-learning methods.
Although these personalized federated learning methods perform better than traditional federated learning methods (especially in terms of accuracy), the current art still overlooks a potential drawback of the statistical heterogeneity of client data. If the local data distributions are highly diverse in feature space, the personalized models may have multiple generalization directions. In this case, relying on only one global model for guidance easily degrades the overall performance of the personalized models because of negative transfer during generalization. To address this situation, the present invention alleviates the resulting negative-transfer problem by providing different global models for clients with different generalization directions.
Disclosure of Invention
In view of the defects in the prior art, the invention provides a personalized federated meta-learning method for data heterogeneity. Before a client formally participates in the federated training, it trains an autoencoder to provide a vector describing its local data distribution; the server then divides all participating clients into several groups according to the data distribution vectors uploaded by the clients and maintains a corresponding number of generalized models on the server side to guide the personalization process of each group separately, thereby addressing the problems in the prior art.
The invention is realized by at least one of the following technical schemes.
A personalized federated meta-learning method for data heterogeneity comprises the following steps:
S1, determining, for each client, the autoencoder structure used in the initialization stage and the meta-model structure used in the personalization stage;
S2, performing the initialization stage to obtain the center points of the different data distributions;
S3, the clients participate in the federated training and are divided into several groups according to the data distribution vectors uploaded in each round;
S4, aggregating the client models within each group and sending the aggregated model back to the clients in that group for the next iteration;
S5, after the federated training is finished, each client fine-tunes its group's meta-model on its local data to generate a personalized model.
Further, before participating in the federated learning, a client needs to download from the server the unified model structures of the autoencoder and the meta-model. The autoencoder used in the initialization stage is a type of neural network; it is used to extract the statistical characteristics of the client's local data distribution and represent them as a vector. The meta-model used in the personalization stage is a model in the meta-learning sense, i.e., a learning model that can adapt to a new task with a small number of training samples; here it is used to adapt to the client's local data to generate a personalized model.
Further, acquiring the center points in step S2 comprises the following steps (a minimal code sketch is given after the list):
S201, let D_i denote the local data set of client i, C_k denote a cluster center point, and E_{w_i} denote the encoder part of the autoencoder;
S202, each client i trains an autoencoder on its local data set D_i to obtain the encoder E_{w_i};
S203, each client i uses the encoder E_{w_i} to obtain an embedding vector h_i^x = E_{w_i}(x) for each data sample x ∈ D_i, then averages the embedding vectors of all samples to obtain the local data distribution vector H_i = (1/|D_i|) Σ_{x∈D_i} h_i^x, which is uploaded to the server;
S204, the server collects the client data distribution vectors {H_i} and runs the K-means algorithm on them to obtain K cluster center points C_k.
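The following is a minimal sketch of this initialization stage; the function names (local_distribution_vector, cluster_centers) and the use of scikit-learn's KMeans are illustrative assumptions, and the trained encoder is assumed to be available as a callable encode mapping one sample to a d_h-dimensional vector.

```python
# Sketch of steps S202-S204 under the assumptions stated above.
import numpy as np
from sklearn.cluster import KMeans

def local_distribution_vector(samples, encode):
    """Client side (S203): average the encoder embeddings of all local samples."""
    embeddings = np.stack([encode(x) for x in samples])  # shape (|D_i|, d_h)
    return embeddings.mean(axis=0)                       # H_i

def cluster_centers(distribution_vectors, num_groups):
    """Server side (S204): run K-means over the uploaded vectors {H_i}."""
    H = np.stack(distribution_vectors)                   # shape (N, d_h)
    km = KMeans(n_clusters=num_groups, n_init=10).fit(H)
    return km.cluster_centers_                           # C_1, ..., C_K
```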
Further, the model structure of the autoencoder is one of a stacked autoencoder, a convolutional autoencoder, and a recurrent autoencoder.
Further, the federated training follows the federated averaging algorithm, specifically: suppose there are N clients, each of which has a fixed local data set D_i. At the beginning of each round, the server randomly selects a subset of the clients and sends the current global algorithm state to each of them; each selected client performs local computation based on the global state and its local data set and then sends the updated state back to the server. The server aggregates the updated states to generate a new global state, and the process repeats. The following steps are carried out under this federated training framework.
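For reference, one round of the federated averaging procedure described above can be sketched as follows; the client_update callback, the flat-vector model representation, and the sample-count weighting are assumptions made for illustration rather than details fixed by the patent.

```python
# Schematic single round of federated averaging under the stated assumptions.
import random
import numpy as np

def fedavg_round(global_state, clients, num_selected, client_update):
    selected = random.sample(clients, num_selected)           # server picks a subset
    states, weights = [], []
    for c in selected:
        new_state, n_samples = client_update(global_state, c) # local computation
        states.append(new_state)
        weights.append(n_samples)
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()
    # aggregate the returned states into the new global state
    return sum(w * s for w, s in zip(weights, states))
```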
Further, step S3 comprises the following steps (a sketch of the client-side update follows the list):
S301, let φ_k denote the meta-model of the k-th group, θ_i the local personalized model of client i, and R the total number of communication rounds; |S| clients are selected in each round from all clients participating in the federated training; T is the number of local updates performed by a client; the local data set of client i is denoted D_i and contains |D_i| data samples x; for client i, the local data set is split into two parts, D_i^{train} for training and D_i^{per} for personalization;
S302, the server randomly selects |S| clients and sends the corresponding meta-model φ_k to each selected client;
S303, when client i receives the meta-model φ_k from the server, it performs local updates on φ_k with its local data D_i; in local round t ∈ T, the update is computed as

θ_{k,i}^{r,t} = φ_{k,i}^{r,t} − α ∇L(φ_{k,i}^{r,t}; D̃_i^{t}),

where φ_{k,i}^{r,t} denotes the meta-model φ_k after t local update rounds on client i in communication round r; α is the learning rate of the local model; L(·) is the loss function used during model training, whose value depends on the current parameters (commonly used loss functions include the 0-1 loss, the cross-entropy loss, the softmax loss, etc.); ∇L denotes the gradient of the loss obtained by back-propagation through the neural network; and D̃_i^{t} is a batch of |D̃_i^{t}| data samples drawn at random from D_i^{train}. With the updated local model θ_{k,i}^{r,t}, the meta-model is then updated as

φ_{k,i}^{r,t+1} = φ_{k,i}^{r,t} − β ∇_φ L(θ_{k,i}^{r,t}),

where φ_{k,i}^{r,t+1} denotes the meta-model used in the next local round t+1 after the t-th local update on client i in communication round r; the loss here is evaluated at θ_{k,i}^{r,t}; β is the learning rate of the meta-model and is normally set so that β ≤ α. Step S303 is repeated until T rounds of local updates are completed;
S304, let D̃_i denote the union of the data samples drawn over the T rounds of training, with sampled size |D̃_i|; the local data distribution vector of client i in communication round r ∈ R is then obtained as

h_i = (1/|D̃_i|) Σ_{x∈D̃_i} E_{w_i}(x);

S305, after the T rounds of local updates are finished, the client sends the updated meta-model φ_{k,i}^{r,T} together with the local data distribution vector h_i from step S304 to the server.
Further, the meta-model update algorithm is Model-Agnostic Meta-Learning (MAML). In step S303, when the meta-model is updated, the gradient obtained by back-propagation is

∇_φ L(θ_{k,i}^{r,t}) = (I − α ∇²_φ L(φ_{k,i}^{r,t})) ∇_θ L(θ_{k,i}^{r,t}).

If the first-order version is used for the update, the second-order term is ignored and the corresponding gradient update becomes ∇_φ L(θ_{k,i}^{r,t}) ≈ ∇_θ L(θ_{k,i}^{r,t}).
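For completeness, the chain-rule identity behind this approximation can be written out as follows; this is the standard MAML derivation, included here as an explanatory note rather than quoted from the patent.

```latex
% One inner step: \theta = \phi - \alpha \nabla_{\phi} L(\phi).
% Differentiating through the inner step gives the meta-gradient:
\nabla_{\phi} L(\theta)
  = \frac{\partial \theta}{\partial \phi}\, \nabla_{\theta} L(\theta)
  = \bigl(I - \alpha \nabla^{2}_{\phi} L(\phi)\bigr)\, \nabla_{\theta} L(\theta)
  \;\approx\; \nabla_{\theta} L(\theta)
  \quad\text{(first-order approximation: drop } \nabla^{2}_{\phi} L \text{).}
```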
Further, step S4 is specifically as follows (a server-side sketch is given after the list):
S401, the server receives the updated meta-models {φ_{k,i}^{r,T}} uploaded by the clients in the selected list S together with the corresponding local data distribution vectors {h_i}; the server also stores the K cluster center points C_k;
S402, the similarity between the local data distribution vector h_i uploaded by each client and each of the K cluster center points is computed as cos(h_i, C_k), where cos denotes the cosine similarity and h_i denotes the local data distribution vector, and client i is assigned to the group whose cluster center point has the largest similarity:

g_i = argmax_k cos(h_i, C_k),

where g_i denotes the group number to which client i is assigned; when all clients i ∈ S have been grouped, the grouping result is obtained; the grouping result is defined as {G_k}, k ∈ [1, K], and each group contains the identification numbers of its clients;
S403, for each group G_k, model aggregation is performed within the group to generate the global meta-model φ_k^{r+1} of the next round:

φ_k^{r+1} = Σ_{i∈G_k} w_i φ_i^{r,T},  with  w_i = |D̃_i| / Σ_{j∈G_k} |D̃_j|,

where φ_k^{r+1} is the new k-th group meta-model used in the next communication round r+1 after the r-th round finishes; the weight w_i is related to the amount of data sampled during the local updates of client i, |D̃_i| being the number of samples drawn from the training set D_i^{train} of client i over the T local update rounds;
S404, the server sends each group's updated meta-model to the corresponding clients in the group (clients that were not selected do not receive an updated meta-model); steps S3 and S4 are repeated until the models converge, and the server stores the meta-model φ_k of each group.
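The server-side grouping and in-group aggregation (steps S402 and S403) can be sketched as follows; the data layout (uploads as a dict mapping a client id to a (meta_model, h, n_sampled) triple) and the function names are illustrative assumptions.

```python
# Sketch of steps S402-S403: assign each reporting client to the closest cluster
# center by cosine similarity, then average the uploaded meta-models per group.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def group_and_aggregate(uploads, centers):
    groups = {k: [] for k in range(len(centers))}
    for cid, (model, h, n) in uploads.items():
        k = max(range(len(centers)), key=lambda j: cosine(h, centers[j]))
        groups[k].append((model, n))                   # g_i = argmax_k cos(h_i, C_k)
    new_meta = {}
    for k, members in groups.items():
        if not members:
            continue                                   # no client reported for this group
        total = sum(n for _, n in members)
        new_meta[k] = sum((n / total) * m for m, n in members)  # weighted average
    return new_meta, groups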
Further, the additional computational complexity introduced by the grouping operation is O(K·|S|·d_h), where K is the number of groups, |S| is the number of clients randomly selected by the server in each round, and d_h is the dimensionality of the local data distribution vector uploaded by a client; the number of groups satisfies K ≪ N. As for space complexity, since K meta-models need to be stored, the additional storage space is O(K·d_θ), where d_θ denotes the parameter size of the meta-model.
Further, step S5 is specifically as follows (a short personalization sketch is given after the list):
S501, for the personalization process, every client i computes the data distribution vector of its personalization data set D_i^{per} as h_i = (1/|D_i^{per}|) Σ_{x∈D_i^{per}} E_{w_i}(x), where E_{w_i} denotes the encoder part of the autoencoder, and uploads it to the server;
S502, the server groups all clients according to step S402 and sends the trained meta-model of each group to the clients in that group;
S503, each client performs several steps of gradient descent on the received meta-model with its local data set D_i^{per} to obtain the personalized model

θ_i = φ_k − α ∇L(φ_k; D_i^{per}),

where φ_k denotes the meta-model of the k-th group, α the learning rate of the local model, and ∇L(φ_k; D_i^{per}) the gradient obtained by back-propagating the loss function.
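The personalization step S503 then reduces to a few gradient-descent steps on the group meta-model, as in this small sketch; grad_loss again stands in for back-propagation through the client's model and is not a name used in the patent.

```python
# Sketch of step S503: fine-tune the group meta-model on the personalization split.
def personalize(phi_k, personal_data, grad_loss, alpha, steps=1):
    theta = phi_k
    for _ in range(steps):                            # one step by default
        theta = theta - alpha * grad_loss(theta, personal_data)
    return theta                                      # personalized model theta_i
```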
Compared with the prior art, the invention has the following advantages and beneficial effects:
according to the invention, the clients with similar data distribution are adaptively divided into the same group in the federal training stage, and different global models are adopted in each group to respectively guide the clients to generate the personalized models, so that the problem of negative migration of a single global model to part of the clients is avoided, the cooperative training among the similar clients is promoted, the convergence rate is accelerated, and the accuracy of the personalized models is improved.
Drawings
FIG. 1 is a flow chart of the personalized federated meta-learning method for data heterogeneity according to the present invention;
FIG. 2 is a schematic illustration of the initialization stage of the present invention;
FIG. 3 is a schematic illustration of the federated training stage of the present invention.
Detailed description of the invention
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Personalized federated learning has broad application prospects in many fields, such as e-commerce, finance, healthcare, education, urban computing, smart cities, edge computing, the Internet of Things, and mobile networks. The following takes the mobile network field as an example to describe how personalized federated meta-learning is performed.
As more and more users use smartphones, reliable and fast mobile input methods are becoming increasingly important. Next-word prediction is a basic function of an input method: for example, when the user types "today", words such as "evening" and "afternoon" appear in the candidate box for the user to select. Because different users have different input habits, the distributions of their local data samples differ greatly, so personalized prediction models need to be built for different users. In addition, during training, users with similar language habits are placed in the same group for collaborative training, which speeds up the training process and improves the accuracy of the personalized prediction models. To obtain a better next-word prediction model, the following describes how to use personalized federated meta-learning to collaboratively train models on the users' local historical data.
Example 1
The personalized federated meta-learning method for data heterogeneity shown in fig. 1 comprises the following steps:
Firstly, determining, for each client, the autoencoder structure used in the initialization stage and the meta-model structure used in the personalization stage;
Before participating in the federated learning, each user's mobile device first needs to download the unified model structures of the autoencoder and the meta-model from the cloud server. The autoencoder used in the initialization stage is a type of neural network, generally used for dimensionality reduction or feature learning, and is used here to represent the distribution of the user's local language data. The meta-model used in the personalization stage is a model in the meta-learning sense, i.e., a learning model that can adapt to a new task with a small number of training samples; for next-word prediction, a common language model such as an LSTM (Long Short-Term Memory) language model is adopted;
Secondly, performing the initialization stage to obtain the center points of the different data distributions;
Before participating in formal federated learning, the users need to be preliminarily grouped according to their local data distributions, because some users have similar input habits and grouping such users together helps improve model performance. Specifically, the method comprises the following steps:
S201, let D_i denote the local language data on a user's mobile device, C_k the center point of the data distribution of each group (users are grouped according to their similarity to each group's center point), and E_{w_i} the encoder part of the autoencoder;
S202, on the user's mobile device, an autoencoder is trained on the local language data D_i to obtain the encoder part E_{w_i};
S203, on the mobile device of user i, each piece of local language data x ∈ D_i is fed into the encoder E_{w_i} to obtain the corresponding embedding vector h_i^x = E_{w_i}(x); the embedding vectors of all samples are then averaged to obtain the local language data distribution vector H_i = (1/|D_i|) Σ_{x∈D_i} h_i^x, which is uploaded to the cloud server;
S204, the cloud server collects the users' local data distribution vectors {H_i} and runs the K-means algorithm on them to obtain K cluster center points C_k.
In this embodiment, the model structure of the autoencoder is a stacked autoencoder, and the size of the compressed vector is assumed to be d_h = 25; a minimal PyTorch sketch of such an autoencoder is given below.
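One possible PyTorch definition of such a stacked autoencoder is sketched here; the hidden width of 128 and the flattened-input assumption are illustrative choices not specified in the embodiment, while the latent size of 25 matches d_h above.

```python
# Illustrative stacked autoencoder with a 25-dimensional latent code (d_h = 25).
import torch
import torch.nn as nn

class StackedAutoencoder(nn.Module):
    def __init__(self, input_dim, hidden_dim=128, latent_dim=25):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)            # d_h-dimensional embedding used for H_i
        return self.decoder(z), z
```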
As a preferred embodiment, as shown in fig. 2, the autoencoder consists of an encoder part and a decoder part, and the hidden-layer features are obtained by compressing the original data with the encoder part. For user 1, assume there are 100 local language data samples {x_1, ..., x_100} in total and that the encoder part obtained after training is E_{w_1}; the local data distribution vector (the averaged encoder output) computed by client 1 is then H_1 = (1/100) Σ_{j=1}^{100} E_{w_1}(x_j), where E_{w_1} denotes the encoder part of the autoencoder. On the server, after the local data distribution vectors uploaded by all clients have been received, they are clustered with the K-means algorithm; if the clustering parameter is set to K = 2, the server stores the 2 center points output by K-means, which are used for the adaptive grouping during federated training.
Thirdly, the client side participates in federal training, and the client side is divided into a plurality of groups according to the data distribution vector uploaded in each round;
Specifically, the federated training procedure follows the federated averaging algorithm: assume N users participate, each with a fixed local data set D_i; at the beginning of each round, the server randomly selects a subset of users and sends the current global model to each of them; each client performs local computation based on the global model and its local data set and then sends the updated model back to the server; the server aggregates the updated models to generate a new global model, and the process repeats. Under this federated training framework, as shown in fig. 3, the specific process is as follows:
S301, since the users are divided into two groups, the set of meta-models is {φ_1, φ_2}, and the users in a group personalize their local prediction models with that group's meta-model; θ_i denotes the model parameters of user i's local personalized prediction model; the total number of communication rounds is R, which can be set to 500; 3 users are selected from all users participating in the federated training in each round; the number of local updates on a user's mobile device is T = 5; the local data sets are denoted D_i, with the local data set sizes of the 5 users being 100, 200, 20, 400 and 30 respectively, written |D_i|; for user i, the local data set is split into two parts, D_i^{train} for training on the user's mobile device and D_i^{per} for personalization on the user's mobile device; the size of the batch sampled in each local update round on the user's mobile device is denoted |D̃_i^{t}|;
S302, the cloud server randomly selects 3 users {1, 3, 5} and sends the corresponding meta-model φ_k to these users;
S303, when user 1 receives the meta-model φ_1 from the cloud server, it performs local updates on φ_1 with its local language data D_1; in the local update of round t = 3, the parameters of the local prediction model are updated by the back-propagation algorithm as

θ_{1,1}^{111,3} = φ_{1,1}^{111,3} − α ∇L(φ_{1,1}^{111,3}; D̃_1^{3}),

where φ_{1,1}^{111,3} denotes the meta-model φ_1 after the 3rd local update round on user 1 in communication round r = 111; α is the learning rate of the local model and can be set to one of {0.001, 0.01, 0.1}; L(·) is the loss function used during model training (commonly used loss functions include the 0-1 loss, the cross-entropy loss, the softmax loss, etc.); ∇L is the gradient of the loss obtained by back-propagation through the neural network; and D̃_1^{3} is a batch of samples drawn at random from D_1^{train}. With the updated local prediction model θ_{1,1}^{111,3}, the meta-model is then updated by running the back-propagation algorithm as

φ_{1,1}^{111,4} = φ_{1,1}^{111,3} − β ∇_φ L(θ_{1,1}^{111,3}),

where φ_{1,1}^{111,4} denotes the meta-model used in local round 4 after the 3rd local update on user 1 in communication round 111, and the loss is evaluated at θ_{1,1}^{111,3}; β is the learning rate of the meta-model, normally set so that β ≤ α, and can be set to one of {0.0005, 0.005, 0.05}. Step S303 is then repeated until the T = 5 rounds of local updates are completed;
S304, let D̃_1 denote the union of the data samples drawn over the 5 rounds of training, with sampled size |D̃_1|; the local data distribution vector of the user in communication round r = 111 is then obtained as h_1 = (1/|D̃_1|) Σ_{x∈D̃_1} E_{w_1}(x);
S305, after the 5 rounds of local updates are completed, the user's mobile device sends the updated meta-model φ_{1,1}^{111,5} together with the local data distribution vector h_1 from step S304 to the cloud server.
Specifically, the meta-model update algorithm is Model-Agnostic Meta-Learning (MAML); when the meta-model is updated in step S303, the gradient obtained by back-propagation is ∇_φ L(θ) = (I − α ∇²_φ L(φ)) ∇_θ L(θ). Using the first-order version for the update, the second-order term is ignored, which reduces the computational load on the mobile device, and the corresponding gradient update becomes ∇_φ L(θ) ≈ ∇_θ L(θ).
Fourthly, aggregating the client models within each group and sending the aggregated model back to the clients in that group for the next iteration, specifically:
S401, the cloud server receives the updated meta-models {φ_{k,i}^{111,5}} uploaded by the selected user list {1, 3, 5} and the corresponding local data distribution vectors {h_i}; in addition, the server stores the 2 cluster center points {C_1, C_2};
S402, the similarity between each uploaded local data distribution vector {h_1, h_3, h_5} and each of the 2 cluster center points is computed as cos(h_i, C_k), where cos denotes the cosine similarity. User 1 is then assigned to the group whose cluster center point has the largest similarity, computed as g_1 = argmax_k cos(h_1, C_k), where g_1 denotes the group number to which user 1 is assigned. When all users {1, 3, 5} have been grouped, the grouping result is obtained; the grouping result is defined as {G_k}, k ∈ [1, 2], and each group contains the identification numbers of its users;
S403, for each group G_k, model aggregation is performed within the group to generate the global meta-model φ_k^{112} of the next round:

φ_k^{112} = Σ_{i∈G_k} w_i φ_i^{111,5},  with  w_i = |D̃_i| / Σ_{j∈G_k} |D̃_j|,

where φ_k^{112} is the new k-th group meta-model used in the next communication round 112 after the 111th round finishes; the weight w_i is related to the amount of data sampled during the local updates of client i, |D̃_i| being the number of samples drawn from the training set D_i^{train} of user i over the T local update rounds;
S404, the cloud server sends each group's updated meta-model to the corresponding users in the group; users that were not selected do not receive an updated meta-model. Steps S3 and S4 are repeated until the models converge, and the cloud server stores the meta-model φ_k of each group.
Fifthly, after the federal training is finished, the user finely adjusts the meta-models in the user group and the local data thereof to generate personalized models, specifically:
s501, all users {1, 2, 3, 4, 5} utilize local language data set
Figure BDA00034130634300001111
Calculating its data distribution vector
Figure BDA00034130634300001112
Uploading the data to a cloud server;
s502, the cloud server completes grouping of all users according to the step S402 and issues the trained meta-model in each group to the users in the group;
s503, combining the local data set by the user according to the received meta-model
Figure BDA00034130634300001113
Executing gradient descent for several times to obtain personalized model
Figure BDA00034130634300001114
φkRepresenting the meta-models in the kth group, alpha representing the learning rate of the local model,
Figure BDA00034130634300001115
for the reverse propagation of the time-to-loss function
Figure BDA00034130634300001116
The resulting gradient is calculated.
Specifically, the number of gradient-descent steps used to obtain the personalized model defaults to one, and different users may set the number of steps appropriately according to model performance. After the final personalization is finished, each user obtains a personalized language prediction model fitted to its local data distribution, {θ_1, θ_2, θ_3, θ_4, θ_5}.
Example 2
As shown in fig. 3, a total of 5 users participate in the federated training and are divided into 2 groups on the cloud server. In an arbitrary communication round, assume the grouping result of the previous round is {1, 2, 3} in the first group and {4, 5} in the second group, and the cloud server selects users {1, 3, 5} to participate in this round of federated training; then users {1, 3} each receive the meta-model φ_1 and user {5} receives the meta-model φ_2.
For user 1, assume it has 100 local data samples and performs T = 5 local updates, with a batch of size 10 drawn at random in each update; after the local updates are completed, the total number of sampled data points is min(10 × 5, 100) = 50. The distribution vector h_1 of this sampled data is then computed and uploaded to the cloud server together with the locally updated meta-model. Users {3, 5} carry out the same process.
The cloud server receives the data distribution vectors and updated meta-models uploaded by users {1, 3, 5}. It first traverses the uploaded list of local data distribution vectors {h_1, h_3, h_5} and compares each with the 2 stored cluster center points to complete the grouping; assume the grouping result is that users {1, 5} form one group and user {3} forms the other. Model aggregation is then carried out: if the numbers of samples drawn by users {1, 3, 5} are 50, 20 and 30 respectively, the weighting factors are w_1 = 50/(50+30) = 0.625, w_3 = 20/20 = 1.0 and w_5 = 30/(50+30) = 0.375; the weighted meta-model of the first group is then the weighted sum of the meta-models uploaded by users 1 and 5 with weights 0.625 and 0.375, and the weighted meta-model of the second group is the meta-model uploaded by user 3. Steps S3 and S4 are repeated until the preset stopping condition of R = 500 communication rounds is reached.
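The weight arithmetic in this example can be checked with a few lines of Python; the dictionary layout below is just for illustration.

```python
# Worked check of the aggregation weights in Example 2:
# users 1 and 5 form group 1, user 3 forms group 2.
sampled = {1: 50, 3: 20, 5: 30}        # samples drawn by users 1, 3, 5
group1, group2 = [1, 5], [3]

w = {i: sampled[i] / sum(sampled[j] for j in group1) for i in group1}
w.update({i: sampled[i] / sum(sampled[j] for j in group2) for i in group2})
print(w)   # {1: 0.625, 5: 0.375, 3: 1.0}
```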
Furthermore, compared with the original federated averaging algorithm, the above grouping operation adds a computational complexity of O(K·|S|·d_h), where K = 2 is the number of groups, |S| = 5 is the number of clients randomly selected by the server in each round, and d_h = 25 is the dimensionality of the local data distribution vector uploaded by a user; the number of groups satisfies K ≪ N. As for space complexity, since K = 2 meta-models need to be stored, the additional storage space is O(K·d_θ), where d_θ denotes the parameter size of the meta-model.
Example 3
In another embodiment, the model structure of the autoencoder may instead be one of a convolutional autoencoder and a recurrent autoencoder.
the above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (10)

1. A personalized federated meta-learning method for data heterogeneity, characterized by comprising the following steps:
S1, determining, for each client, the autoencoder structure used in the initialization stage and the meta-model structure used in the personalization stage;
S2, performing the initialization stage to obtain the center points of the different data distributions;
S3, the clients participate in the federated training and are divided into several groups according to the data distribution vectors uploaded in each round;
S4, aggregating the client models within each group and sending the aggregated model back to the clients in that group for the next iteration;
S5, after the federated training is finished, each client fine-tunes its group's meta-model on its local data to generate a personalized model.
2. The personalized federated meta-learning method for data heterogeneity according to claim 1, wherein before participating in the federated learning the client needs to download the unified model structures of the autoencoder and the meta-model from the server; the autoencoder used in the initialization stage is a type of neural network, used to extract the statistical characteristics of the client's local data distribution and represent them as a vector; the meta-model used in the personalization stage is a model in the meta-learning sense, i.e., a learning model that can adapt to a new task with a small number of training samples, and is used to adapt to the client's local data to generate a personalized model.
3. The personalized federated meta-learning method for data heterogeneity according to claim 1, wherein acquiring the center points in step S2 comprises the following steps:
S201, let D_i denote the local data set of client i, C_k denote a cluster center point, and E_{w_i} denote the encoder part of the autoencoder;
S202, each client i trains an autoencoder on its local data set D_i to obtain the encoder E_{w_i};
S203, each client i uses the encoder E_{w_i} to obtain an embedding vector h_i^x = E_{w_i}(x) for each data sample x ∈ D_i, then averages the embedding vectors of all samples to obtain the local data distribution vector H_i = (1/|D_i|) Σ_{x∈D_i} h_i^x, which is uploaded to the server;
S204, the server collects the client data distribution vectors {H_i} and runs the K-means algorithm on them to obtain K cluster center points C_k.
4. The personalized federated meta-learning method for data heterogeneity according to claim 1, wherein the model structure of the autoencoder is one of a stacked autoencoder, a convolutional autoencoder, and a recurrent autoencoder.
5. The personalized federated meta-learning method for data heterogeneity according to claim 1, wherein the federated training follows the federated averaging algorithm, specifically: suppose there are N clients, each with a fixed local data set D_i; at the beginning of each round, the server randomly selects a subset of the clients and sends the current global algorithm state to each of them; each client performs local computation based on the global state and its local data set and then sends the updated state back to the server; the server aggregates the updated states to generate a new global state, and the process repeats under the federated training framework.
6. The personalized federated meta-learning method for data heterogeneity according to claim 1, wherein step S3 comprises the following steps:
S301, let φ_k denote the meta-model of the k-th group, θ_i the local personalized model of client i, and R the total number of communication rounds; |S| clients are selected in each round from all clients participating in the federated training; T is the number of local updates performed by a client; the local data set of client i is denoted D_i and contains |D_i| data samples x; for client i, the local data set is split into two parts, D_i^{train} for training and D_i^{per} for personalization;
S302, the server randomly selects |S| clients and sends the corresponding meta-model φ_k to each selected client;
S303, when client i receives the meta-model φ_k from the server, it performs local updates on φ_k with its local data D_i; in local round t ∈ T, the update is computed as θ_{k,i}^{r,t} = φ_{k,i}^{r,t} − α ∇L(φ_{k,i}^{r,t}; D̃_i^{t}), where φ_{k,i}^{r,t} denotes the meta-model φ_k after t local update rounds on client i in communication round r, α is the learning rate of the local model, L(·) is the loss function used during model training (commonly used loss functions include the 0-1 loss, the cross-entropy loss, the softmax loss, etc.), ∇L denotes the gradient of the loss obtained by back-propagation through the neural network, and D̃_i^{t} is a batch of |D̃_i^{t}| data samples drawn at random from D_i^{train}; with the updated local model θ_{k,i}^{r,t}, the meta-model is then updated as φ_{k,i}^{r,t+1} = φ_{k,i}^{r,t} − β ∇_φ L(θ_{k,i}^{r,t}), where φ_{k,i}^{r,t+1} denotes the meta-model used in the next local round t+1 after the t-th local update on client i in communication round r, the loss is evaluated at θ_{k,i}^{r,t}, and β is the learning rate of the meta-model, normally set so that β ≤ α; step S303 is repeated until T rounds of local updates are completed;
S304, let D̃_i denote the union of the data samples drawn over the T rounds of training, with sampled size |D̃_i|; the local data distribution vector of client i in communication round r ∈ R is obtained as h_i = (1/|D̃_i|) Σ_{x∈D̃_i} E_{w_i}(x);
S305, after the T rounds of local updates are finished, the client sends the updated meta-model φ_{k,i}^{r,T} together with the local data distribution vector h_i from step S304 to the server.
7. The personalized federated meta-learning method for data heterogeneity according to claim 6, wherein the meta-model update algorithm is model-agnostic meta-learning, and in step S303, when the meta-model is updated, the gradient obtained by back-propagation is ∇_φ L(θ_{k,i}^{r,t}) = (I − α ∇²_φ L(φ_{k,i}^{r,t})) ∇_θ L(θ_{k,i}^{r,t}); if the first-order version is used for the update, the second-order term is ignored and the corresponding gradient update becomes ∇_φ L(θ_{k,i}^{r,t}) ≈ ∇_θ L(θ_{k,i}^{r,t}).
8. The personalized federated meta-learning method for data heterogeneity according to claim 1, wherein step S4 is specifically:
S401, the server receives the updated meta-models {φ_{k,i}^{r,T}} uploaded by the clients in the selected list S together with the corresponding local data distribution vectors {h_i}; the server also stores the K cluster center points C_k;
S402, the similarity between the local data distribution vector h_i uploaded by each client and each of the K cluster center points is computed as cos(h_i, C_k), where cos denotes the cosine similarity and h_i denotes the local data distribution vector, and client i is assigned to the group whose cluster center point has the largest similarity, g_i = argmax_k cos(h_i, C_k), where g_i denotes the group number to which client i is assigned; when all clients i ∈ S have been grouped, the grouping result is obtained; the grouping result is defined as {G_k}, k ∈ [1, K], and each group contains the identification numbers of its clients;
S403, for each group G_k, model aggregation is performed within the group to generate the global meta-model φ_k^{r+1} of the next round: φ_k^{r+1} = Σ_{i∈G_k} w_i φ_i^{r,T}, with w_i = |D̃_i| / Σ_{j∈G_k} |D̃_j|, where φ_k^{r+1} is the new k-th group meta-model used in the next communication round r+1 after the r-th round finishes, the weight w_i is related to the amount of data sampled during the local updates of client i, and |D̃_i| is the number of samples drawn from the training set D_i^{train} of client i over the T local update rounds;
S404, the server sends each group's updated meta-model to the corresponding clients in the group, and clients that were not selected do not receive an updated meta-model; steps S3 and S4 are repeated until the models converge, and the server stores the meta-model φ_k of each group.
9. The personalized federated meta-learning method for data heterogeneity according to claim 8, wherein the additional computational complexity introduced by the grouping operation is O(K·|S|·d_h), where K is the number of groups, |S| is the number of clients randomly selected by the server in each round, and d_h is the dimensionality of the local data distribution vector uploaded by a client, the number of groups satisfying K ≪ N; as for space complexity, since K meta-models need to be stored, the additional storage space is O(K·d_θ), where d_θ denotes the parameter size of the meta-model.
10. The personalized federated meta-learning method for data heterogeneity according to any one of claims 1 to 9, wherein step S5 is specifically:
S501, for the personalization process, every client computes the data distribution vector of its personalization data set D_i^{per} as h_i = (1/|D_i^{per}|) Σ_{x∈D_i^{per}} E_{w_i}(x), where E_{w_i} denotes the encoder part of the autoencoder, and uploads it to the server;
S502, the server groups all clients according to step S402 and sends the trained meta-model of each group to the clients in that group;
S503, each client performs several steps of gradient descent on the received meta-model with its local data set D_i^{per} to obtain the personalized model θ_i = φ_k − α ∇L(φ_k; D_i^{per}), where φ_k denotes the meta-model of the k-th group, α the learning rate of the local model, and ∇L(φ_k; D_i^{per}) the gradient obtained by back-propagating the loss function.
CN202111535626.9A 2021-12-15 Personalized federated meta-learning method for data heterogeneity Active CN114357067B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111535626.9A CN114357067B (en) 2021-12-15 Personalized federated meta-learning method for data heterogeneity


Publications (2)

Publication Number Publication Date
CN114357067A (en) 2022-04-15
CN114357067B (en) 2024-06-25




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant