CN114357067A - Personalized federated meta-learning method for data heterogeneity - Google Patents
- Publication number: CN114357067A (application number CN202111535626.9A)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses a personalized federated meta-learning method for data heterogeneity, comprising the following steps: determining, for each client, the autoencoder structure used in the initialization stage and the meta-model structure used in the personalization stage; initializing the parameters of the federated training stage; grouping the clients according to the local data distribution vectors they upload; aggregating the client models within each group and sending the aggregated model back to the clients in the group for the next iteration; and, after federated training ends, having each client fine-tune its group's meta-model on its local data to generate its personalized model. During federated training, clients with similar data distributions are dynamically placed in the same group according to the distribution vectors uploaded in each round, and a separate meta-model is maintained for each group, which alleviates the slow model convergence and low accuracy caused by highly heterogeneous data.
Description
Technical Field
The invention relates to the field of distributed machine learning under data heterogeneity, and in particular to a personalized federated meta-learning method for data heterogeneity.
Background
The popularity of edge devices in modern society, such as mobile phones and wearable devices, has led to rapid growth in the distributed private data that people produce. Although these abundant data provide great opportunities for machine learning applications, social concern about data privacy is increasing with the advent of regulations such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA). This has made federated learning increasingly popular: a new distributed machine learning paradigm that enables machine learning models to be developed and trained on data silos in a cooperative and privacy-preserving manner. The primary motivation for individual users to participate in federated learning is to exploit the shared knowledge of the other participants, because single users often face data-level limitations, such as data scarcity, low-quality data, and unseen label classes, which limit their ability to train well-performing local models.
Federated learning is a framework that enables multiple users, called clients, to collaboratively train a shared global model on their combined data without moving the data off their local devices. A central server coordinates the whole process, which runs over multiple rounds. At the beginning of each round, the server sends the current global model to the participating clients. Each client trains the model on its local data and passes its model update back to the server. The server collects these updates from all clients and applies one update to the global model, ending the round. Federated learning sidesteps the privacy problem described above by eliminating the need to aggregate all data on a single device. Since the primary motivation for clients to participate in federated learning is to obtain better models, those clients that lack enough private data to develop accurate local models benefit the most. For clients that have enough private data to train accurate local models, however, the benefit of participating is debatable, as the accuracy of the shared global model may be lower than that of their locally trained models. Furthermore, in many applications the distribution of data across clients is highly non-independent and identically distributed (Non-IID). This statistical heterogeneity makes it difficult for federated learning to train a single model that works well for all clients.
While the initial goal of federated learning was to find a single global model that could be deployed on every client, a single model may not serve all clients well, since the data distributions of the clients may vary greatly across devices. The heterogeneity of data has therefore become one of the major challenges in building an effective federated learning model. Several personalized federated learning approaches have been proposed to deal with data heterogeneity; some use different local models to fit client-specific local data while still extracting common knowledge from the data of other devices. To handle the challenges posed by the statistical heterogeneity of data, the global model must be personalized. For example, when a next-word prediction task runs on the client, users in different areas clearly produce different completions for the sentence "I live in … …", so the model needs to predict a different answer for each user. Most personalization techniques involve two discrete steps. The first step builds a global model in a collaborative way. The second step builds a personalized model for each client using that client's private data. Generally speaking, optimizing purely for global accuracy yields models that are difficult to personalize. For personalized federated learning to work in practice, the following three objectives must be addressed simultaneously, not independently: (1) developing improved personalized models that benefit most clients; (2) developing an accurate global model, so that clients with limited local data benefit from it; (3) achieving fast model convergence within a small number of training rounds.
In recent years, personalized federated learning has become one of the most promising approaches to the statistical challenge of non-IID data in federated learning, and has attracted increasing attention. Jiang et al. (Yihan Jiang, Jakub Konečný, Keith Rush, and Sreeram Kannan. 2019. Improving Federated Learning Personalization via Model Agnostic Meta Learning.) explored the link between the MAML algorithm (Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In ICML, 1126–1135.) and federated learning. They treat the global meta-model of MAML as the global model of federated learning and the tasks as the local models of the clients, and show that existing optimization-based meta-learning algorithms (such as MAML) can be integrated into federated learning to achieve personalization. In the literature (Alireza Fallah, Aryan Mokhtari, and Asuman E. Ozdaglar. 2020. Personalized Federated Learning with Theoretical Guarantees: A Model-Agnostic Meta-Learning Approach. In NeurIPS.), the authors propose Per-FedAvg, a personalized, MAML-based variant of the federated averaging algorithm, which obtains the personalized model by training a good initial global model and adapting it on the client's local data. As an alternative to MAML-type methods, Khodak et al. (Mikhail Khodak, Maria-Florina Balcan, and Ameet S. Talwalkar. 2019. Adaptive Gradient-Based Meta-Learning Methods. In NeurIPS, 5915–5926.) proposed an adaptive gradient-based meta-learning framework that can likewise be applied in the federated setting.
Although these personalized federated learning methods perform better (especially in accuracy) than traditional federated learning methods, the current art still overlooks a potential drawback of the statistical heterogeneity of client data. If the feature space varies greatly across the local data distributions, then the personalized models may have multiple generalization directions. In this case, if only one global model is relied on for guidance, the overall performance of the personalized models easily degrades due to negative transfer during generalization. The present invention alleviates this negative-transfer problem by providing different global models for clients with different generalization directions.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a personalized federated meta-learning method for data heterogeneity. Before a client formally participates in federated training, it trains an autoencoder that yields a vector describing its local data distribution; the server then divides all participating clients into several groups according to the uploaded data distribution vectors and maintains a corresponding number of generalized models, each guiding the personalization process of one group, thereby solving the problems of the prior art.
The invention is realized by at least one of the following technical solutions.
A personalized federated meta-learning method for data heterogeneity comprises the following steps:
S1, determining, for each client, the autoencoder structure used in the initialization stage and the meta-model structure used in the personalization stage;
S2, performing the initialization stage to obtain the center points of the different data distributions;
S3, the clients participate in federated training and are divided into several groups according to the data distribution vectors uploaded in each round;
S4, aggregating the client models within each group and sending the aggregated model back to the clients in the group for the next iteration;
and S5, after federated training ends, each client fine-tunes its group's meta-model on its local data to generate its personalized model.
Further, before participating in federated learning, each client needs to download from the server a unified autoencoder structure and meta-model structure. The autoencoder used in the initialization stage is a type of neural network; it extracts the statistical characteristics of the client's local data distribution and represents them as a vector. The meta-model used in the personalization stage is a model in the meta-learning sense: a learning model that can adapt to a new task after training on a small number of samples, used here to adapt to the client's local data and generate the personalized model.
Further, acquiring the center points in step S2 comprises the following steps:
S201, let $D_i$ denote the local data set of client $i$, $C_k$ the cluster center points, and $E(\cdot)$ the encoder part of the autoencoder;
S202, each client $i$ trains the autoencoder on its local data $D_i$ to obtain the encoder part $E(\cdot)$;
S203, each client $i$ feeds every data sample $x \in D_i$ into the encoder to obtain its embedded vector $h = E(x)$, then averages the embedded vectors of all samples to obtain the local data distribution vector $H_i = \frac{1}{|D_i|}\sum_{x \in D_i} E(x)$ and uploads it to the server;
S204, the server collects the client data distribution vectors $\{H_i\}$ and runs the K-means algorithm on them to obtain $K$ cluster center points $C_k$.
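The initialization stage of steps S201–S204 can be sketched as follows. This is a minimal illustration, not the invention's actual networks: a fixed random projection stands in for the trained encoder $E(\cdot)$, the client data are synthetic, and all names (`encoder`, `distribution_vector`, `kmeans`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x):
    # Stand-in for the trained encoder E(.): a fixed linear projection
    # from 10-d raw samples down to 4-d embeddings.
    W = np.linspace(-1.0, 1.0, 40).reshape(10, 4)
    return x @ W

def distribution_vector(local_data):
    # H_i: mean of the embedded vectors over all samples x in D_i (step S203).
    return encoder(local_data).mean(axis=0)

def kmeans(vectors, k, iters=20):
    # Minimal K-means over the uploaded distribution vectors {H_i},
    # returning the K cluster center points C_k kept by the server (step S204).
    centers = vectors[:k].copy()
    labels = np.zeros(len(vectors), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = vectors[labels == j].mean(axis=0)
    return centers, labels

# Four synthetic clients; two pairs with clearly different local data sets D_i.
H = np.stack([distribution_vector(rng.normal(loc=m, size=(50, 10)))
              for m in (-2.0, -2.1, 2.0, 2.2)])
centers, labels = kmeans(H, k=2)
```

Clients with similar local distributions end up with the same label, which is exactly the grouping signal the server reuses during federated training.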
Further, the model structure of the autoencoder is one of a stacked autoencoder, a convolutional autoencoder, or a recurrent autoencoder.
Further, the federated training follows the federated averaging (FedAvg) algorithm. Suppose there are N clients, each with a fixed local data set Di. At the beginning of each round, the server randomly selects a subset of the clients and sends the current global algorithm state to each of them; each selected client performs local computation based on the global state and its local data set and then sends its updated state back to the server; the server aggregates the updated states into a new global state, and the process repeats.
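A minimal sketch of such a FedAvg round, under stated assumptions: a linear least-squares model stands in for the neural network, the client data are synthetic, and the names `client_update` and `server_round` are hypothetical.

```python
import numpy as np

def client_update(global_model, local_data, lr=0.1, steps=5):
    # Local computation: a few gradient steps of least-squares
    # regression on the client's fixed local data set D_i.
    w = global_model.copy()
    X, y = local_data
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def server_round(global_model, clients):
    # One communication round: send the global state, collect the
    # updated states, aggregate them weighted by data set size.
    updates = [client_update(global_model, d) for d in clients]
    sizes = np.array([len(d[1]) for d in clients], dtype=float)
    return sum(s * u for s, u in zip(sizes / sizes.sum(), updates))

rng = np.random.default_rng(1)
true_w = np.array([1.0, -2.0])
clients = []
for n in (30, 60):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w))  # noiseless labels, shared optimum

w = np.zeros(2)
for _ in range(50):
    w = server_round(w, clients)
```

After enough rounds the aggregated state approaches the shared optimum; with Non-IID clients (different optima) this single global model is exactly what the invention replaces with per-group meta-models.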
Further, step S3 includes the steps of:
S301, let $\phi_k$ denote the meta-model of the $k$-th group, $\theta_i$ the local personalized model of client $i$, and $R$ the total number of communication rounds; in each round, $|S|$ clients are selected from all clients participating in federated training; $T$ is the number of local updates performed by a client; $D_i$ denotes the local data set of client $i$, which owns $|D_i|$ data samples $x$; the local data set of client $i$ is divided into two parts, $D_i^{train}$ for the client's training and $D_i^{per}$ for the client's personalization;
S302, the server randomly selects $|S|$ clients and sends the corresponding meta-model $\phi_k$ to each selected client;
S303, on receiving the meta-model $\phi_k$ from the server, the client performs local updates on $\phi_k$ with its local data $D_i$; in local round $t \in [1, T]$, the update is computed as:

$\tilde{\theta}_i^{r,t} = \phi_{k,i}^{r,t} - \alpha \, \nabla L(\phi_{k,i}^{r,t}; B_i^{r,t})$

where $\phi_{k,i}^{r,t}$ denotes the meta-model $\phi_k$ after $t-1$ local update rounds on client $i$ within communication round $r$; $\alpha$ is the learning rate of the local model; $L$ is the loss function of the model training process, whose value depends on the current parameters and sampled batch (commonly used loss functions include the 0-1 loss, the cross-entropy loss, the softmax loss, etc.); $\nabla L$ is the gradient of the loss function obtained by back-propagation through the neural network; and $B_i^{r,t}$ is a batch of size $|B_i^{r,t}|$ sampled at random from $D_i^{train}$. With the updated local model $\tilde{\theta}_i^{r,t}$, the meta-model is then updated as:

$\phi_{k,i}^{r,t+1} = \phi_{k,i}^{r,t} - \beta \, \nabla_{\phi} L(\tilde{\theta}_i^{r,t}; B_i'^{r,t})$

where $\phi_{k,i}^{r,t+1}$ is the meta-model used in local round $t+1$ on client $i$ within communication round $r$; $L(\tilde{\theta}_i^{r,t}; B_i'^{r,t})$ is the loss of the updated local model on a second batch $B_i'^{r,t}$ sampled from $D_i^{train}$; and $\beta$ is the learning rate of the meta-model, normally set so that $\beta \le \alpha$. Step S303 is then repeated until the $T$ rounds of local updates are complete;
S304, let $\bar{D}_i^r$ denote the set of data samples drawn over the $T$ rounds of training, with total size $|\bar{D}_i^r| = \sum_{t=1}^{T} |B_i^{r,t}|$; the local data distribution vector of the client in communication round $r \in [1, R]$ is then obtained as:

$h_i^r = \frac{1}{|\bar{D}_i^r|} \sum_{x \in \bar{D}_i^r} E(x)$
S305, after the $T$ rounds of local updates are finished, the client sends the updated meta-model $\phi_{k,i}^{r,T+1}$ together with the local data distribution vector $h_i^r$ of step S304 to the server.
Further, the meta-model update algorithm is Model-Agnostic Meta-Learning (MAML). In step S303, the full gradient obtained by back-propagation when updating the meta-model is:

$\nabla_{\phi} L(\tilde{\theta}_i^{r,t}; B_i'^{r,t}) = \left(I - \alpha \, \nabla^2 L(\phi_{k,i}^{r,t}; B_i^{r,t})\right) \nabla L(\tilde{\theta}_i^{r,t}; B_i'^{r,t}).$

The first-order gradient version is used for the update: the second-order (Hessian) term is ignored, and the corresponding gradient simplifies to $\nabla L(\tilde{\theta}_i^{r,t}; B_i'^{r,t})$.
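A sketch of the first-order version of the step-S303 update: an inner step with learning rate alpha produces the adapted model, and the outer step moves the meta-model with beta <= alpha using the gradient at the adapted point, dropping the Hessian term. A least-squares loss stands in for the network loss, and the names `loss_grad` and `local_meta_update` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def loss_grad(w, batch):
    # Gradient of a least-squares loss L(w; batch), standing in for the
    # back-propagated gradient of the patent's neural-network loss.
    X, y = batch
    return 2 * X.T @ (X @ w - y) / len(y)

def local_meta_update(phi, train_set, alpha=0.05, beta=0.02, T=5, m=8):
    # T rounds of the first-order update of step S303:
    #   inner: theta = phi - alpha * grad L(phi; batch)
    #   outer: phi   = phi - beta  * grad L(theta; batch')
    # (second-order term dropped, as in the first-order version).
    X, y = train_set
    for _ in range(T):
        idx = rng.choice(len(y), size=m, replace=False)
        theta = phi - alpha * loss_grad(phi, (X[idx], y[idx]))
        idx2 = rng.choice(len(y), size=m, replace=False)
        phi = phi - beta * loss_grad(theta, (X[idx2], y[idx2]))
    return phi

X = rng.normal(size=(40, 3))
true_w = np.array([0.5, 1.0, -1.5])
phi = np.zeros(3)
for _ in range(200):                       # simulated communication rounds
    phi = local_meta_update(phi, (X, X @ true_w))
```

The first-order variant needs only one extra forward/backward pass per round, which is why the patent adopts it on resource-limited clients.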
Further, the step S4 is specifically:
S401, the server receives the updated meta-models $\{\phi_{k,i}^{r,T+1}\}$ uploaded by the selected client list $S$ and the corresponding local data distribution vectors $\{h_i^r\}$; the server also stores the $K$ cluster center points $C_k$;
S402, the similarity between the local data distribution vector $h_i^r$ uploaded by each client and each of the $K$ cluster center points is computed as:

$s_{i,k} = \cos(h_i^r, C_k) = \frac{h_i^r \cdot C_k}{\|h_i^r\| \, \|C_k\|}$

where $\cos$ denotes cosine similarity and $h_i^r$ is the local data distribution vector. Client $i$ is then assigned to the group whose cluster center point has the largest similarity:

$g_i = \arg\max_{k \in [1, K]} s_{i,k}$

where $g_i$ denotes the group number to which client $i$ is assigned;
when all clients $i \in S$ have been grouped, the grouping result is obtained; define the grouping result as $\{G_k\}, k \in [1, K]$, where each group contains the identification numbers of its clients;
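The grouping rule of step S402 amounts to a nearest-center assignment under cosine similarity. A minimal sketch with made-up two-dimensional vectors; the helper names `cos` and `assign_group` are hypothetical.

```python
import numpy as np

def cos(a, b):
    # Cosine similarity between two distribution vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def assign_group(h_i, centers):
    # Step S402: compare the uploaded vector h_i with the K cluster
    # center points and return the index of the most similar one.
    sims = [cos(h_i, c) for c in centers]
    return int(np.argmax(sims))

centers = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # C_1, C_2
uploads = {1: np.array([0.9, 0.1]),   # selected clients' h_i vectors
           3: np.array([0.2, 0.8]),
           5: np.array([0.7, 0.3])}

groups = {0: [], 1: []}
for client_id, h in uploads.items():
    groups[assign_group(h, centers)].append(client_id)
```

Because the assignment is recomputed from the vectors uploaded in each round, a client whose sampled data drifts can move between groups across rounds.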
S403, for each group $G_k$, model aggregation is performed within the group to generate the global meta-model of the next round, $\phi_k^{r+1}$. The aggregation is:

$\phi_k^{r+1} = \sum_{i \in G_k} \frac{|\bar{D}_i^r|}{\sum_{j \in G_k} |\bar{D}_j^r|} \, \phi_{k,i}^{r,T+1}$

where $\phi_k^{r+1}$ is the new meta-model of the $k$-th group generated after the $r$-th communication round and used in communication round $r+1$; the weight of client $i$ is proportional to the amount of data $|\bar{D}_i^r| = \sum_{t=1}^{T} |B_i^{r,t}|$ it sampled during its local updates, where $|B_i^{r,t}|$ is the sampled batch size, $T$ is the number of local updates, $D_i^{train}$ is the training data set of client $i$, and $\bar{D}_i^r$ is the set of samples drawn from it over the $T$ training rounds.
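The intra-group aggregation of step S403 is a weighted average over the group's uploaded meta-models, weighted by how much data each client sampled. A small sketch with toy two-parameter "models"; the helper name `aggregate_group` is hypothetical.

```python
import numpy as np

def aggregate_group(models, sample_sizes):
    # Step S403: weighted average of the meta-models uploaded by the
    # clients of one group; the weight of each client is proportional
    # to the amount of data it sampled during its T local update rounds.
    weights = np.array(sample_sizes, dtype=float)
    weights /= weights.sum()
    return sum(w * m for w, m in zip(weights, models))

# Two clients in the same group: toy parameter vectors and sample counts.
models = [np.array([1.0, 2.0]), np.array([3.0, 6.0])]
phi_next = aggregate_group(models, sample_sizes=[100, 300])  # weights 0.25, 0.75
```

With sample sizes 100 and 300 the result is 0.25·[1, 2] + 0.75·[3, 6] = [2.5, 5.0], i.e. the client that contributed more sampled data pulls the group meta-model harder.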
S404, the server sends each group's updated meta-model to the clients in that group (clients that were not selected do not receive the updated meta-model); steps S3 and S4 are repeated until the models converge, and the server saves the meta-model of each group, $\phi_k^R$.
Further, the extra computational complexity added by the grouping operation is:

$O(K \cdot |S| \cdot d_h)$

where $K$ is the number of groups, $|S|$ is the number of clients randomly selected by the server in each round, and $d_h$ is the dimensionality of the local data distribution vector uploaded by a client; the number of groups satisfies $K \ll N$.
As for space complexity, since $K$ meta-models need to be stored instead of one, the extra storage required is $(K-1) \cdot d_\theta$, where $d_\theta$ is the parameter size of the meta-model.
Further, the step S5 is specifically:
S501, for the personalization process, every client uses its local data set $D_i^{per}$ to compute the data distribution vector $h_i = \frac{1}{|D_i^{per}|} \sum_{x \in D_i^{per}} E(x)$, where $E(\cdot)$ is the encoder part of the autoencoder, and uploads it to the server;
S502, the server groups all the clients according to step S402 and sends the trained meta-model of each group to the clients in that group;
S503, given the received meta-model, the client performs several gradient-descent steps on its local data set $D_i^{per}$ to obtain the personalized model $\theta_i = \phi_k - \alpha \, \nabla L(\phi_k; D_i^{per})$, where $\phi_k$ is the meta-model of the $k$-th group, $\alpha$ is the learning rate of the local model, and $\nabla L(\phi_k; D_i^{per})$ is the gradient of the loss function obtained by back-propagation.
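The personalization of step S503 reduces to a few gradient steps starting from the group meta-model. A sketch under the same assumptions as before: a least-squares loss stands in for the network loss, the data are synthetic, and the name `personalize` is hypothetical.

```python
import numpy as np

def personalize(phi_k, local_data, alpha=0.1, steps=30):
    # Step S503: start from the group meta-model phi_k and run a few
    # gradient-descent steps on the client's personalization split.
    X, y = local_data
    theta = phi_k.copy()
    for _ in range(steps):
        theta -= alpha * 2 * X.T @ (X @ theta - y) / len(y)
    return theta

rng = np.random.default_rng(3)
X = rng.normal(size=(25, 2))
client_w = np.array([2.0, -1.0])       # this client's own optimum
phi_k = np.array([1.5, -0.5])          # nearby group meta-model
theta = personalize(phi_k, (X, X @ client_w))
```

The point of per-group meta-models is visible here: fine-tuning converges quickly precisely because the starting point phi_k is already close to the client's own optimum.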
Compared with the prior art, the invention has the following advantages and beneficial effects:
In the federated training stage, the invention adaptively places clients with similar data distributions in the same group and uses a different global model in each group to guide the clients in generating their personalized models. This avoids the negative transfer that a single global model causes for some clients, promotes collaborative training among similar clients, accelerates convergence, and improves the accuracy of the personalized models.
Drawings
FIG. 1 is a flow diagram of the personalized federated meta-learning method for data heterogeneity according to the present invention;
FIG. 2 is a schematic illustration of the initialization phase of the present invention;
FIG. 3 is a schematic representation of the federated training phase of the present invention.
Detailed description of the invention
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Personalized federated learning has broad application prospects in many fields, such as e-commerce, finance, medical care, education, urban computing, smart cities, edge computing, the Internet of Things, and mobile networks. Taking the mobile-network field as an example, the following describes how personalized federated meta-learning is carried out.
As more and more users use smartphones, reliable and fast mobile input methods become increasingly important. Next-word prediction is a basic function of an input method: for example, when the user types "today", words such as "evening" or "afternoon" appear in the input method's suggestion box for the user to select. Because different users have different input habits, the distributions of their local data samples differ greatly, so personalized prediction models need to be built for different users. Moreover, during training, users with similar language habits are placed in the same group for collaborative training, which accelerates training and improves the accuracy of the personalized prediction models. To obtain a better next-word prediction model, the following describes how to collaboratively train the model with personalized federated meta-learning on the users' local historical data.
Example 1
The personalized federated meta-learning method for data heterogeneity shown in fig. 1 comprises the following steps:
firstly, determining, for each client, the autoencoder structure of the initialization stage and the meta-model structure of the personalization stage;
before participating in federated learning, each user's mobile device first needs to download from the cloud server a unified autoencoder structure and meta-model structure; the autoencoder of the initialization stage is a type of neural network generally used for dimensionality reduction or feature learning, and is used here to represent the distribution of the user's local language data; the meta-model of the personalization stage is a model in the meta-learning sense, able to adapt to a new task after training on a small number of samples; for next-word prediction, a common language model such as an LSTM (Long Short-Term Memory) language model is adopted;
secondly, performing an initialization stage to obtain central points of different data distributions;
Before taking part in formal federated learning, the users need to be preliminarily grouped according to their local data distributions, because some users have similar input habits and placing such users in the same group helps improve model performance. Specifically, the method comprises the following steps:
S201, let $D_i$ denote the local language data on the mobile device of user $i$, $C_k$ the center point of the data distribution of each group (users are grouped according to their similarity to each group's center point), and $E(\cdot)$ the encoder part of the autoencoder;
S202, on the user's mobile device, an autoencoder is trained on the local language data $D_i$ to obtain the encoder part $E(\cdot)$;
S203, on the mobile device of user $i$, each piece of local language data $x \in D_i$ is fed into the encoder $E(\cdot)$ to obtain the corresponding embedded vector $h = E(x)$; the embedded vectors of all samples are then averaged to obtain the local language data distribution vector $H_i = \frac{1}{|D_i|} \sum_{x \in D_i} E(x)$, which is uploaded to the cloud server;
S204, the cloud server collects the users' local data distribution vectors $\{H_i\}$ and runs the K-means algorithm on them to obtain $K$ cluster center points $C_k$.
In this embodiment, the model structure of the autoencoder is a stacked autoencoder, and the size of the compressed vector is assumed to be $d_h = 25$.
As a preferred embodiment, as shown in fig. 2, the autoencoder consists of an encoder part and a decoder part, and the features of the hidden layer are obtained by compressing the original data through the encoder part. For user 1, assume a total of 100 local language data samples $\{x_1, \ldots, x_{100}\}$ and let the encoder part after training be $E(\cdot)$. The local data distribution vector (the feature output by the encoder) computed by client 1 is then $H_1 = \frac{1}{100} \sum_{j=1}^{100} E(x_j)$. On the server, after the local data distribution vectors uploaded by all clients have been received, they are clustered with the K-means algorithm; if the clustering parameter is set to $K = 2$, the server stores the 2 center points output by K-means, and these are used for the adaptive grouping during federated training.
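A forward-pass sketch of such a stacked autoencoder for user 1. Random weights stand in for the locally trained ones, and the layer sizes 100→50→25 are illustrative assumptions except for the compressed dimension d_h = 25 from this embodiment.

```python
import numpy as np

rng = np.random.default_rng(4)

def layer(n_in, n_out):
    # Random weight matrix standing in for a trained dense layer.
    return rng.normal(scale=0.1, size=(n_in, n_out))

# Two-layer (stacked) encoder compressing a 100-d input to d_h = 25,
# with a mirrored two-layer decoder.
W1, W2 = layer(100, 50), layer(50, 25)
V1, V2 = layer(25, 50), layer(50, 100)

def encode(x):
    return np.tanh(np.tanh(x @ W1) @ W2)

def decode(h):
    return np.tanh(h @ V1) @ V2

samples = rng.normal(size=(100, 100))   # the 100 local samples of user 1
H1 = encode(samples).mean(axis=0)       # 25-d local distribution vector
recon = decode(encode(samples))         # reconstruction used by the training loss
```

Only the 25-dimensional mean embedding H1 leaves the device, which is what keeps the grouping step privacy-preserving relative to uploading raw language data.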
Thirdly, the client side participates in federal training, and the client side is divided into a plurality of groups according to the data distribution vector uploaded in each round;
specifically, the federated training procedure follows the federated averaging algorithm: assuming $N$ users participate, each with a fixed local data set $D_i$, at the beginning of each round the server randomly selects a portion of the users and sends them the current global model; each client performs local computation based on the global model and its local data set and then sends the updated model back to the server; the server aggregates the updated models into a new global model, and the process repeats. Under the federated training framework, as shown in fig. 3, the specific process is as follows:
S301, since the users are divided into two groups, the meta-model set is $\{\phi_1, \phi_2\}$, and the users of each group personalize their local prediction models with that group's meta-model; $\theta_i$ denotes the model parameters of user $i$'s local personalized prediction model; the total number of communication rounds $R$ can be set to 500; $|S| = 3$ users are selected from all participating users in each round; the number of local updates on a user's mobile device is $T = 5$; the local data set sizes of the 5 users, denoted $|D_i|$, are 100, 200, 20, 400, and 30, respectively; for user $i$, the local data set is divided into two parts, $D_i^{train}$ for training on the user's mobile device and $D_i^{per}$ for personalization; the batch size sampled in each local update round on the user's mobile device is $|B_i^{r,t}|$;
S302, the cloud server randomly selects 3 users $\{1, 3, 5\}$ and sends the corresponding meta-models $\phi_k$ to these users;
S303, when user 1 receives the meta-model $\phi_1$ from the cloud server, it performs local updates on $\phi_1$ with its local language data $D_1$; in local update round $t = 3$, the parameters of the local prediction model are updated with the back-propagation algorithm as:

$\tilde{\theta}_1^{111,3} = \phi_{1,1}^{111,3} - \alpha \, \nabla L(\phi_{1,1}^{111,3}; B_1^{111,3})$

where $\phi_{1,1}^{111,3}$ denotes the meta-model $\phi_1$ after 2 local update rounds on user 1 within communication round $r = 111$; $\alpha$ is the learning rate of the local model and can be set from $\{0.001, 0.01, 0.1\}$; $L$ is the loss function of the model training process (commonly used loss functions include the 0-1 loss, the cross-entropy loss, the softmax loss, etc.); $\nabla L$ is the gradient of the loss function obtained by back-propagation through the neural network; and $B_1^{111,3}$ is a batch of size $|B_1^{111,3}|$ sampled at random from $D_1^{train}$. With the updated local prediction model $\tilde{\theta}_1^{111,3}$, the meta-model is updated by running the back-propagation algorithm as follows:

$\phi_{1,1}^{111,4} = \phi_{1,1}^{111,3} - \beta \, \nabla_{\phi} L(\tilde{\theta}_1^{111,3}; B_1'^{111,3})$

where $\phi_{1,1}^{111,4}$ is the meta-model used in local round 4 on user 1 within communication round 111; $L(\tilde{\theta}_1^{111,3}; B_1'^{111,3})$ is the loss of the updated local model on a second sampled batch $B_1'^{111,3}$; and $\beta$ is the learning rate of the meta-model, generally set so that $\beta \le \alpha$, e.g. from $\{0.0005, 0.005, 0.05\}$; step S303 is then repeated until the $T = 5$ rounds of local updates are complete;
S304, let $\bar{D}_1^{111}$ denote the set of data samples drawn over the 5 rounds of training, with total size $|\bar{D}_1^{111}| = \sum_{t=1}^{5} |B_1^{111,t}|$; the local data distribution vector of the user in communication round $r = 111$ is then obtained as $h_1^{111} = \frac{1}{|\bar{D}_1^{111}|} \sum_{x \in \bar{D}_1^{111}} E(x)$;
S305, after the 5 rounds of local updates are completed, the user's mobile device sends the updated meta-model $\phi_{1,1}^{111,6}$ together with the local data distribution vector $h_1^{111}$ of step S304 to the cloud server.
Specifically, the meta-model update algorithm is Model-Agnostic Meta-Learning (MAML), and in step S303 the full gradient obtained by back-propagation when updating the meta-model is:

$\nabla_{\phi} L(\tilde{\theta}_1^{111,t}; B_1'^{111,t}) = \left(I - \alpha \, \nabla^2 L(\phi_{1,1}^{111,t}; B_1^{111,t})\right) \nabla L(\tilde{\theta}_1^{111,t}; B_1'^{111,t}).$

The first-order gradient version is used for the update: the second-order term is omitted, which reduces the computational load on the mobile device, and the corresponding gradient becomes $\nabla L(\tilde{\theta}_1^{111,t}; B_1'^{111,t})$.
fourthly, the client models within each group are aggregated and sent back to the clients in the group for the next iteration, specifically:
S401, the cloud server receives the updated meta-models $\{\phi_{k,i}^{111,6}\}$ uploaded by the selected user list $\{1, 3, 5\}$ and the corresponding local data distribution vectors $\{h_i^{111}\}$; in addition, the server also stores the 2 cluster center points $\{C_1, C_2\}$;
S402, respectively calculating a local data distribution vector { h) uploaded by each user1,h3,h5The similarity between the cluster center points and 2 cluster center points is calculated as follows:
where cos represents the cosine similarity. Subsequently, the user 1 is assigned to the group in which the cluster center point with the largest similarity is located, and the calculation method is as follows:
where k̂₁ denotes the group number to which user 1 is assigned.
When all users {1, 3, 5} have been grouped, the grouping result is obtained. Define the grouping result as {G_k}, k ∈ [1, 2], where each group contains the identification numbers of its users;
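The cosine-similarity grouping described above can be sketched as follows; this is a minimal illustration under assumed inputs, not the patent's implementation, and all names are hypothetical:

```python
import numpy as np

def assign_groups(h_vectors, centers):
    """Assign each client to the group whose cluster center point has the
    highest cosine similarity with its local data distribution vector.

    h_vectors : dict {client_id: 1-D distribution vector h_i}
    centers   : list of K cluster center points C_k
    Returns {group_index: [client ids]}.
    """
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    groups = {k: [] for k in range(len(centers))}
    for cid, h in h_vectors.items():
        # k_hat = argmax_k cos(h_i, C_k)
        k_hat = max(range(len(centers)), key=lambda k: cos(h, centers[k]))
        groups[k_hat].append(cid)
    return groups
```

Cosine similarity is used (rather than Euclidean distance) so that the assignment depends on the shape of the distribution vector, not its magnitude.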
S403, for each group G_k, model aggregation is performed within the group to generate the global meta-model of the next round φ_k^(r+1); the model aggregation is: φ_k^(r+1) = Σ_{i ∈ G_k} w_i φ_i, where w_i = |D̃_i| / Σ_{j ∈ G_k} |D̃_j|;
where φ_k^(r+1) is the new k-th meta-model generated after the 111th communication round ends, to be used in the next communication round r + 1 = 112; the weight w_i is determined by the amount of data sampled during the local updates of client i: |D̃_i| is the sampled data size, T is the number of local updates, D_i^train is the training data set of user i, and D̃_i denotes the data samples drawn from D_i^train over the T rounds of training.
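The in-group weighted aggregation can be sketched in a few lines; an illustrative sketch with hypothetical names, assuming each meta-model is a flat parameter vector:

```python
import numpy as np

def aggregate_group(metas, sampled_sizes):
    """In-group weighted aggregation: phi_k = sum_i w_i * phi_i,
    with w_i = |D~_i| / sum_j |D~_j| over the clients of one group.

    metas         : list of meta-model parameter vectors phi_i
    sampled_sizes : list of sampled data sizes |D~_i|, same order
    """
    metas = np.asarray(metas, dtype=float)
    weights = np.asarray(sampled_sizes, dtype=float)
    weights = weights / weights.sum()          # normalize to the w_i above
    return (weights[:, None] * metas).sum(axis=0)
```

Weighting by the sampled data size (rather than the full local data size) matches the description: clients that contributed more gradient information in this round get more influence on the group model.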
S404, the cloud server issues each group's updated meta-model to the corresponding users in that group; users that were not selected do not receive an updated meta-model. Steps S3 and S4 are repeated until the models converge, and the cloud server saves the meta-model of each group {φ₁, φ₂}.
Fifthly, after the federated training is finished, each user fine-tunes the meta-model of its group on its local data to generate a personalized model, specifically:
S501, all users {1, 2, 3, 4, 5} use their local language data sets D_i^p to compute their data distribution vectors h_i and upload them to the cloud server;
S502, the cloud server completes the grouping of all users according to step S402 and issues the trained meta-model of each group to the users in that group;
S503, based on the received meta-model and the local data set D_i^p, each user executes several gradient-descent steps to obtain the personalized model θ_i = φ_k − α∇L(φ_k), where φ_k denotes the meta-model of the k-th group, α denotes the learning rate of the local model, and ∇L(φ_k) is the gradient obtained by back-propagating the loss function on D_i^p.
Specifically, the number of gradient-descent steps for obtaining the personalized model defaults to one; different users may adjust this number according to model performance. After the final personalized training is finished, the users obtain personalized language prediction models {θ₁, θ₂, θ₃, θ₄, θ₅} fitted to their local data distributions.
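The personalization step above can be sketched as a short fine-tuning loop; an illustrative sketch with an assumed linear model and squared loss, not the patent's implementation:

```python
import numpy as np

def personalize(phi_k, Xp, yp, alpha=0.05, steps=1):
    """Fine-tune the group meta-model phi_k on the client's personalization
    data D_i^p with a few gradient-descent steps (default: one step)."""
    theta = phi_k.copy()
    for _ in range(steps):
        # gradient of the squared loss on the personalization data
        grad = 2 * Xp.T @ (Xp @ theta - yp) / len(yp)
        theta -= alpha * grad
    return theta
```

Because the meta-model was trained to be one gradient step away from a good task-specific model, a single step is the natural default; more steps can help when the local distribution is far from the group's.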
Example 2
As shown in fig. 3, a total of 5 users participate in the federated training and are divided into 2 groups on the cloud server. In any communication round, assume the grouping result of the previous round is {1, 2, 3} in the first group and {4, 5} in the second group, and the cloud server selects users {1, 3, 5} to participate in the federated training process; then users {1, 3} each receive the meta-model φ₁, and user {5} receives the meta-model φ₂.
For user 1, assume it has 100 local data samples and performs T = 5 local updates, each update randomly sampling a batch of size 10; after the local updates are completed, the total sampled data size is min(10 × 5, 100) = 50. The distribution vector h₁ of this sampled data is then computed and uploaded to the cloud server together with the locally updated meta-model φ₁. Users {3, 5} perform the same process.
The cloud server receives the data distribution vectors and the updated meta-models uploaded by users {1, 3, 5}. It first traverses the uploaded list of local data distribution vectors {h₁, h₃, h₅} and compares each with the 2 stored cluster center points by similarity to complete the grouping; assume the result is: users {1, 5} form one group and user {3} forms the other. Model aggregation is then performed: if the sampled data sizes of users {1, 3, 5} are 50, 20, and 30 respectively, the weighting factors are {w₁ = 50/(50+30) = 0.625, w₃ = 20/20 = 1.0, w₅ = 30/(50+30) = 0.375}; the weighted meta-model of the first group is then w₁φ₁ + w₅φ₅, and that of the second group is w₃φ₃.
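The weighting factors in this worked example can be checked numerically; the dictionary layout below is only an illustration of the arithmetic:

```python
# sampled data sizes |D~_i| from the example, and the assumed grouping
sampled = {1: 50, 3: 20, 5: 30}
groups = {0: [1, 5], 1: [3]}

# w_i = |D~_i| / sum of |D~_j| over the client's group
weights = {}
for members in groups.values():
    total = sum(sampled[i] for i in members)
    for i in members:
        weights[i] = sampled[i] / total

print(weights)  # {1: 0.625, 5: 0.375, 3: 1.0}
```

Note that the weights are normalized within each group, which is why user 3, alone in its group, gets weight 1.0.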
Steps S3 and S4 are repeated until the preset stopping condition of R = 500 communication rounds is reached.
Furthermore, compared with the original federated averaging algorithm, the above grouping operation adds extra computational complexity O(K·|S|·d_h):
where K = 2 is the number of groups, |S| = 5 is the number of clients randomly selected by the server in each round, d_h = 25 is the dimensionality of the local data distribution vector uploaded by a user, and the number of groups satisfies K ≪ N;
For space complexity, since the K = 2 meta-models must be stored, the extra storage space is O(K·d_θ), where d_θ denotes the parameter size of the meta-model.
Example 3
In one embodiment, the model structure of the autoencoder may be one of a convolutional autoencoder and a recurrent autoencoder.
the above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
Claims (10)
1. A personalized federated meta-learning method for data heterogeneity, characterized by comprising the following steps:
S1, determining the structure of the autoencoder for the initialization stage and the structure of the meta-model for the personalization stage of each client;
S2, performing the initialization stage to obtain center points of different data distributions;
S3, the clients participating in federated training and being divided into a plurality of groups according to the data distribution vectors uploaded in each round;
S4, aggregating the client models within each group and issuing the aggregated models to the clients in the group for the next iteration;
S5, after the federated training is finished, the clients fine-tuning the meta-models of their groups on their local data to generate the personalized models.
2. The personalized federated meta-learning method for data heterogeneity according to claim 1, characterized in that the client needs to download the unified autoencoder and meta-model structures from the server before participating in federated learning; the autoencoder used in the initialization stage is a neural network that extracts the statistical characteristics of the client's local data distribution and represents them in vector form; the meta-model used in the personalization stage is a model under meta-learning that can adapt to a new task through training on a small number of samples, and is used to adapt to the client's local data to generate the personalized model.
3. The personalized federated meta-learning method for data heterogeneity according to claim 1, characterized in that the process of obtaining the center points in step S2 comprises the following steps:
S201, let D_i denote the local data set of a client, C_k denote a cluster center point, and E denote the encoder part of the autoencoder;
S203, each client i uses the encoder E to obtain the embedded vector E(x) of each data sample x ∈ D_i, then averages the embedded vectors of all samples to obtain the local data distribution vector H_i = (1/|D_i|) Σ_{x ∈ D_i} E(x), and uploads it to the server;
S204, the server collects the client data distribution vectors {H_i} and runs the K-means algorithm on them to obtain K cluster center points {C_k}.
4. The method of claim 1, wherein the model structure of the autoencoder is one of a stacked autoencoder, a convolutional autoencoder, and a recurrent autoencoder.
5. The method of claim 1, wherein the federated training follows the federated averaging algorithm, specifically: suppose there are N clients, each with a fixed local data set D_i; at the beginning of each round, the server randomly selects a subset of the clients and sends the current global algorithm state to each of them; each selected client performs local computation based on the global state and its local data set and sends the updated state back to the server; the server aggregates the updated states to generate a new global state, and the process repeats under the federated training framework.
6. The personalized federated meta-learning method for data heterogeneity according to claim 1, characterized in that step S3 comprises the following steps:
S301, let φ_k denote the meta-model of the k-th group, θ_i the local personalized model of a client, and R the total number of communication rounds; |S| clients are selected from all clients participating in the federated training in each round; the number of local updates performed by a client is T; the local data set is denoted D_i, and each client owns |D_i| data samples x; for client i, the local data set is divided into two parts, D_i^train and D_i^p: D_i^train is used for client training and D_i^p for client personalization;
S302, the server randomly selects |S| clients and sends the corresponding meta-model φ_k to each selected client;
S303, upon receiving the meta-model φ_k from the server, a client performs local updates on φ_k with its local data D_i; at the t-th local round, t ∈ [1, T], the update is computed as: θ_i^t = φ_k^t − α∇L(φ_k^t),
where θ_i^t denotes the local model obtained from the meta-model φ_k after the t-th local update on client i in the r-th communication round; α denotes the learning rate of the local model; L(·) denotes the loss function of the model training process, whose value depends on the sampled batch; commonly used loss functions include the 0-1 loss, the cross-entropy loss, the softmax loss, etc.; ∇L denotes the gradient of the loss function obtained by back-propagation through the neural network; the batch is sampled randomly from D_i^train. With the updated local model θ_i^t, the meta-model is updated as: φ_k^(t+1) = φ_k^t − β∇L(θ_i^t),
where φ_k^(t+1) denotes the meta-model φ_k after the t-th local update on client i in the r-th communication round, i.e. the meta-model used in the next local round t + 1; L(·) denotes the loss function, whose value depends on the data sampled for the update; β denotes the learning rate of the meta-model and is normally set so that β ≤ α; step S303 is then repeated until the T rounds of local updates are completed;
S304, let D̃_i denote the data samples sampled over the T rounds of training, with sampled size |D̃_i|; the local data distribution vector of the client in communication round r ∈ R is obtained as: h_i = (1/|D̃_i|) Σ_{x ∈ D̃_i} E(x).
7. The personalized federated meta-learning method for data heterogeneity according to claim 6, characterized in that the meta-model update algorithm is Model-Agnostic Meta-Learning (MAML), and in step S303, when the meta-model is updated, the gradient obtained by back-propagation is specifically: ∇_φ L(θ) = (I − α∇²L(φ)) ∇_θ L(θ).
8. The personalized federated meta-learning method for data heterogeneity according to claim 1, characterized in that step S4 specifically comprises:
S401, the server receives the updated meta-models {φ_i} uploaded by the selected client list S and the corresponding local data distribution vectors {h_i}; the server stores the K cluster center points {C_k};
S402, the similarity between each uploaded local data distribution vector {h_i} and the K cluster center points is computed: sim(h_i, C_k) = cos(h_i, C_k),
where cos denotes cosine similarity and h_i denotes a local data distribution vector;
and client i is assigned to the group whose cluster center point has the maximum similarity: k̂_i = argmax_k cos(h_i, C_k),
where k̂_i denotes the group number to which client i is assigned;
when all clients i ∈ S have completed grouping, the grouping result is obtained; define the grouping result as {G_k}, k ∈ [1, K], where each group contains the identification numbers of its clients;
S403, for each group G_k, model aggregation is performed within the group to generate the global meta-model of the next round φ_k^(r+1); the model aggregation is: φ_k^(r+1) = Σ_{i ∈ G_k} w_i φ_i, where w_i = |D̃_i| / Σ_{j ∈ G_k} |D̃_j|,
where φ_k^(r+1) is the new k-th meta-model generated after the r-th communication round ends, to be used in the following communication round r + 1; the weight w_i is determined by the amount of data sampled during the local updates of client i: |D̃_i| is the sampled data size, T is the number of local updates, D_i^train is the training data set of client i, and D̃_i denotes the data samples drawn from D_i^train over the T rounds of training;
9. The method of claim 8, characterized in that the grouping operation adds additional computational complexity O(K·|S|·d_h):
where K represents the number of groups, |S| represents the number of clients randomly selected by the server in each round, d_h represents the dimensionality of the local data distribution vector uploaded by a client, and the number of groups satisfies K ≪ N;
10. The personalized federated meta-learning method for data heterogeneity according to any one of claims 1 to 9, characterized in that step S5 is specifically:
S501, for the personalization process, all clients use their local data sets D_i^p to compute their data distribution vectors h_i with the encoder part E of the autoencoder and upload them to the server;
S502, the server completes the grouping of all clients according to step S402 and issues the trained meta-model of each group to the clients in that group;
S503, based on the received meta-model and the local data set D_i^p, each client executes several gradient-descent steps to obtain the personalized model θ_i = φ_k − α∇L(φ_k), where φ_k denotes the meta-model of the k-th group, α denotes the learning rate of the local model, and ∇L(φ_k) is the gradient obtained by back-propagating the loss function on D_i^p.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111535626.9A CN114357067B (en) | 2021-12-15 | Personalized federal element learning method aiming at data isomerism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114357067A true CN114357067A (en) | 2022-04-15 |
CN114357067B CN114357067B (en) | 2024-06-25 |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114863169A (en) * | 2022-04-27 | 2022-08-05 | 电子科技大学 | Image classification method combining parallel ensemble learning and federal learning |
CN114863499A (en) * | 2022-06-30 | 2022-08-05 | 广州脉泽科技有限公司 | Finger vein and palm vein identification method based on federal learning |
CN115018085A (en) * | 2022-05-23 | 2022-09-06 | 郑州大学 | Data heterogeneity-oriented federated learning participation equipment selection method |
CN115018019A (en) * | 2022-08-05 | 2022-09-06 | 深圳前海环融联易信息科技服务有限公司 | Model training method and system based on federal learning and storage medium |
CN115115064A (en) * | 2022-07-11 | 2022-09-27 | 山东大学 | Semi-asynchronous federal learning method and system |
CN115860116A (en) * | 2022-12-02 | 2023-03-28 | 广州图灵科技有限公司 | Federal learning method based on generative model and deep transfer learning |
CN116306986A (en) * | 2022-12-08 | 2023-06-23 | 哈尔滨工业大学(深圳) | Federal learning method based on dynamic affinity aggregation and related equipment |
CN117077817A (en) * | 2023-10-13 | 2023-11-17 | 之江实验室 | Personalized federal learning model training method and device based on label distribution |
WO2024027164A1 (en) * | 2022-08-01 | 2024-02-08 | 浙江大学 | Adaptive personalized federated learning method supporting heterogeneous model |
CN117973507A (en) * | 2024-03-29 | 2024-05-03 | 山东省计算中心(国家超级计算济南中心) | Group federation element learning method based on data enhancement and privacy enhancement |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112560991A (en) * | 2020-12-25 | 2021-03-26 | 中山大学 | Personalized federal learning method based on hybrid expert model |
WO2021115480A1 (en) * | 2020-06-30 | 2021-06-17 | 平安科技(深圳)有限公司 | Federated learning method, device, equipment, and storage medium |
CN113420888A (en) * | 2021-06-03 | 2021-09-21 | 中国石油大学(华东) | Unsupervised federal learning method based on generalization domain self-adaptation |
CN113705823A (en) * | 2020-05-22 | 2021-11-26 | 华为技术有限公司 | Model training method based on federal learning and electronic equipment |
WO2021247944A1 (en) * | 2020-06-03 | 2021-12-09 | Qualcomm Technologies, Inc. | Federated mixture models |
Non-Patent Citations (1)
Title |
---|
LI Jian et al., "Federated Learning and Its Applications in the Telecommunications Industry", Information and Communications Technology and Policy, no. 09, 15 September 2020 (2020-09-15), pages 39-45 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |