CN114357067A - Personalized federated meta-learning method for data heterogeneity - Google Patents
- Publication number: CN114357067A (application number CN202111535626.9A)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses a personalized federated meta-learning method for data heterogeneity, comprising the following steps: determining, for each client, the autoencoder structure used in the initialization stage and the meta-model structure used in the personalization stage; initializing the parameters of the federated training stage; grouping the clients according to the local data distribution vectors they upload; aggregating the client models within each group and sending the aggregated model back to the clients in the group for the next iteration; and, after federated training ends, having each client fine-tune its group's meta-model on its local data to generate its personalized model. During federated training, clients with similar data distributions are dynamically placed in the same group according to the distribution vectors uploaded in each round, and a separate meta-model is maintained for each group, which alleviates the slow model convergence and low accuracy caused by highly heterogeneous data.
Description
Technical Field
The invention relates to the field of distributed machine learning under data heterogeneity, and in particular to a personalized federated meta-learning method for data heterogeneity.
Background
The popularity of edge devices in modern society, such as mobile phones and wearable devices, has led to rapid growth in the distributed private data that people produce. Although these abundant data provide great opportunities for machine learning applications, social concern about data privacy is increasing with the advent of regulations such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA). This has made federated learning increasingly popular: a new distributed machine learning paradigm that enables machine learning models to be developed and trained on data silos in a cooperative and privacy-preserving manner. The primary motivation for individual users to participate in federated learning is to exploit the shared knowledge of the other participants, because single users often face data-level limitations, such as data scarcity, low-quality data, and unseen label classes, which limit their ability to train well-performing local models.
Federated learning is a framework that enables multiple users, called clients, to collaboratively train a shared global model on their combined data without moving the data off their local devices. A central server coordinates the whole process, which runs over multiple rounds. At the beginning of each round, the server sends the current global model to the participating clients. Each client trains the model on its local data and passes its model update back to the server. The server collects these updates from all clients and applies one update to the global model, ending the round. Federated learning sidesteps the privacy problem described above by eliminating the need to aggregate all data on a single device. Since the primary motivation for clients to participate in federated learning is to obtain better models, those clients that lack enough private data to develop accurate local models benefit the most. For clients that have enough private data to train accurate local models, however, the benefit of participating is debatable, as the accuracy of the shared global model may be lower than that of their locally trained models. Furthermore, in many applications the distribution of data across clients is highly non-independent and identically distributed (Non-IID). This statistical heterogeneity makes it difficult for federated learning to train a single model that works well for all clients.
While the initial goal of federated learning was to find a single global model that could be deployed on every client, a single model may not serve all clients well, since the data distributions of the clients may vary greatly across devices. The heterogeneity of data has therefore become one of the major challenges in building an effective federated learning model. Several personalized federated learning approaches have been proposed to deal with data heterogeneity; some use different local models to fit client-specific local data while still extracting common knowledge from the data of other devices. To handle the challenges posed by the statistical heterogeneity of data, the global model must be personalized. For example, when a next-word prediction task runs on the client, users in different areas clearly produce different completions for the sentence "I live in … …", so the model needs to predict a different answer for each user. Most personalization techniques involve two discrete steps. The first step builds a global model in a collaborative way. The second step builds a personalized model for each client using that client's private data. Generally speaking, optimizing purely for global accuracy yields models that are difficult to personalize. For personalized federated learning to work in practice, the following three objectives must be addressed simultaneously, not independently: (1) developing improved personalized models that benefit most clients; (2) developing an accurate global model, so that clients with limited local data benefit from it; (3) achieving fast model convergence within a small number of training rounds.
In recent years, personalized federated learning has become one of the most promising approaches to the statistical challenge of non-IID data in federated learning, and has attracted increasing attention. Jiang et al. (Yihan Jiang, Jakub Konečný, Keith Rush, and Sreeram Kannan. 2019. Improving Federated Learning Personalization via Model Agnostic Meta Learning.) explored the link between the MAML algorithm (Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In ICML, 1126–1135.) and federated learning. They treat the global meta-model of MAML as the global model of federated learning and the tasks as the local models of the clients, and show that existing optimization-based meta-learning algorithms (such as MAML) can be integrated into federated learning to achieve personalization. In the literature (Alireza Fallah, Aryan Mokhtari, and Asuman E. Ozdaglar. 2020. Personalized Federated Learning with Theoretical Guarantees: A Model-Agnostic Meta-Learning Approach. In NeurIPS.), the authors propose Per-FedAvg, a personalized, MAML-based variant of the federated averaging algorithm, which obtains the personalized model by training a good initial global model and adapting it on the client's local data. As an alternative to MAML-type methods, Khodak et al. (Mikhail Khodak, Maria-Florina Balcan, and Ameet S. Talwalkar. 2019. Adaptive Gradient-Based Meta-Learning Methods. In NeurIPS, 5915–5926.) proposed an adaptive gradient-based meta-learning framework that can likewise be applied in the federated setting.
Although these personalized federated learning methods perform better (especially in accuracy) than traditional federated learning methods, the current art still overlooks a potential drawback of the statistical heterogeneity of client data. If the feature space varies greatly across the local data distributions, then the personalized models may have multiple generalization directions. In this case, if only one global model is relied on for guidance, the overall performance of the personalized models easily degrades due to negative transfer during generalization. The present invention alleviates this negative-transfer problem by providing different global models for clients with different generalization directions.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a personalized federated meta-learning method for data heterogeneity. Before a client formally participates in federated training, it trains an autoencoder that yields a vector describing its local data distribution; the server then divides all participating clients into several groups according to the uploaded data distribution vectors and maintains a corresponding number of generalized models, each guiding the personalization process of one group, thereby solving the problems of the prior art.
The invention is realized by at least one of the following technical solutions.
A personalized federated meta-learning method for data heterogeneity comprises the following steps:
S1, determining, for each client, the autoencoder structure used in the initialization stage and the meta-model structure used in the personalization stage;
S2, performing the initialization stage to obtain the center points of the different data distributions;
S3, the clients participate in federated training and are divided into several groups according to the data distribution vectors uploaded in each round;
S4, aggregating the client models within each group and sending the aggregated model back to the clients in the group for the next iteration;
and S5, after federated training ends, each client fine-tunes its group's meta-model on its local data to generate its personalized model.
Further, before participating in federated learning, each client needs to download from the server a unified autoencoder structure and meta-model structure. The autoencoder used in the initialization stage is a type of neural network; it extracts the statistical characteristics of the client's local data distribution and represents them as a vector. The meta-model used in the personalization stage is a model in the meta-learning sense: a learning model that can adapt to a new task after training on a small number of samples, used here to adapt to the client's local data and generate the personalized model.
Further, acquiring the center points in step S2 comprises the following steps:
S201, let $D_i$ denote the local data set of client $i$, $C_k$ the cluster center points, and $E(\cdot)$ the encoder part of the autoencoder;
S202, each client $i$ trains the autoencoder on its local data $D_i$ to obtain the encoder part $E(\cdot)$;
S203, each client $i$ feeds every data sample $x \in D_i$ into the encoder to obtain its embedded vector $h = E(x)$, then averages the embedded vectors of all samples to obtain the local data distribution vector $H_i = \frac{1}{|D_i|}\sum_{x \in D_i} E(x)$ and uploads it to the server;
S204, the server collects the client data distribution vectors $\{H_i\}$ and runs the K-means algorithm on them to obtain $K$ cluster center points $C_k$.
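The initialization stage of steps S201–S204 can be sketched as follows. This is a minimal illustration, not the invention's actual networks: a fixed random projection stands in for the trained encoder $E(\cdot)$, the client data are synthetic, and all names (`encoder`, `distribution_vector`, `kmeans`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x):
    # Stand-in for the trained encoder E(.): a fixed linear projection
    # from 10-d raw samples down to 4-d embeddings.
    W = np.linspace(-1.0, 1.0, 40).reshape(10, 4)
    return x @ W

def distribution_vector(local_data):
    # H_i: mean of the embedded vectors over all samples x in D_i (step S203).
    return encoder(local_data).mean(axis=0)

def kmeans(vectors, k, iters=20):
    # Minimal K-means over the uploaded distribution vectors {H_i},
    # returning the K cluster center points C_k kept by the server (step S204).
    centers = vectors[:k].copy()
    labels = np.zeros(len(vectors), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = vectors[labels == j].mean(axis=0)
    return centers, labels

# Four synthetic clients; two pairs with clearly different local data sets D_i.
H = np.stack([distribution_vector(rng.normal(loc=m, size=(50, 10)))
              for m in (-2.0, -2.1, 2.0, 2.2)])
centers, labels = kmeans(H, k=2)
```

Clients with similar local distributions end up with the same label, which is exactly the grouping signal the server reuses during federated training.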
Further, the model structure of the autoencoder is one of a stacked autoencoder, a convolutional autoencoder, or a recurrent autoencoder.
Further, the federated training follows the federated averaging (FedAvg) algorithm. Suppose there are N clients, each with a fixed local data set Di. At the beginning of each round, the server randomly selects a subset of the clients and sends the current global algorithm state to each of them; each selected client performs local computation based on the global state and its local data set and then sends its updated state back to the server; the server aggregates the updated states into a new global state, and the process repeats.
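A minimal sketch of such a FedAvg round, under stated assumptions: a linear least-squares model stands in for the neural network, the client data are synthetic, and the names `client_update` and `server_round` are hypothetical.

```python
import numpy as np

def client_update(global_model, local_data, lr=0.1, steps=5):
    # Local computation: a few gradient steps of least-squares
    # regression on the client's fixed local data set D_i.
    w = global_model.copy()
    X, y = local_data
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def server_round(global_model, clients):
    # One communication round: send the global state, collect the
    # updated states, aggregate them weighted by data set size.
    updates = [client_update(global_model, d) for d in clients]
    sizes = np.array([len(d[1]) for d in clients], dtype=float)
    return sum(s * u for s, u in zip(sizes / sizes.sum(), updates))

rng = np.random.default_rng(1)
true_w = np.array([1.0, -2.0])
clients = []
for n in (30, 60):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w))  # noiseless labels, shared optimum

w = np.zeros(2)
for _ in range(50):
    w = server_round(w, clients)
```

After enough rounds the aggregated state approaches the shared optimum; with Non-IID clients (different optima) this single global model is exactly what the invention replaces with per-group meta-models.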
Further, step S3 includes the steps of:
S301, let $\phi_k$ denote the meta-model of the $k$-th group, $\theta_i$ the local personalized model of client $i$, and $R$ the total number of communication rounds; in each round, $|S|$ clients are selected from all clients participating in federated training; $T$ is the number of local updates performed by a client; $D_i$ denotes the local data set of client $i$, which owns $|D_i|$ data samples $x$; the local data set of client $i$ is divided into two parts, $D_i^{train}$ for the client's training and $D_i^{per}$ for the client's personalization;
S302, the server randomly selects $|S|$ clients and sends the corresponding meta-model $\phi_k$ to each selected client;
S303, on receiving the meta-model $\phi_k$ from the server, the client performs local updates on $\phi_k$ with its local data $D_i$; in local round $t \in [1, T]$, the update is computed as:

$\tilde{\theta}_i^{r,t} = \phi_{k,i}^{r,t} - \alpha \, \nabla L(\phi_{k,i}^{r,t}; B_i^{r,t})$

where $\phi_{k,i}^{r,t}$ denotes the meta-model $\phi_k$ after $t-1$ local update rounds on client $i$ within communication round $r$; $\alpha$ is the learning rate of the local model; $L$ is the loss function of the model training process, whose value depends on the current parameters and sampled batch (commonly used loss functions include the 0-1 loss, the cross-entropy loss, the softmax loss, etc.); $\nabla L$ is the gradient of the loss function obtained by back-propagation through the neural network; and $B_i^{r,t}$ is a batch of size $|B_i^{r,t}|$ sampled at random from $D_i^{train}$. With the updated local model $\tilde{\theta}_i^{r,t}$, the meta-model is then updated as:

$\phi_{k,i}^{r,t+1} = \phi_{k,i}^{r,t} - \beta \, \nabla_{\phi} L(\tilde{\theta}_i^{r,t}; B_i'^{r,t})$

where $\phi_{k,i}^{r,t+1}$ is the meta-model used in local round $t+1$ on client $i$ within communication round $r$; $L(\tilde{\theta}_i^{r,t}; B_i'^{r,t})$ is the loss of the updated local model on a second batch $B_i'^{r,t}$ sampled from $D_i^{train}$; and $\beta$ is the learning rate of the meta-model, normally set so that $\beta \le \alpha$. Step S303 is then repeated until the $T$ rounds of local updates are complete;
S304, let $\bar{D}_i^r$ denote the set of data samples drawn over the $T$ rounds of training, with total size $|\bar{D}_i^r| = \sum_{t=1}^{T} |B_i^{r,t}|$; the local data distribution vector of the client in communication round $r \in [1, R]$ is then obtained as:

$h_i^r = \frac{1}{|\bar{D}_i^r|} \sum_{x \in \bar{D}_i^r} E(x)$
S305, after the $T$ rounds of local updates are finished, the client sends the updated meta-model $\phi_{k,i}^{r,T+1}$ together with the local data distribution vector $h_i^r$ of step S304 to the server.
Further, the meta-model update algorithm is Model-Agnostic Meta-Learning (MAML). In step S303, the full gradient obtained by back-propagation when updating the meta-model is:

$\nabla_{\phi} L(\tilde{\theta}_i^{r,t}; B_i'^{r,t}) = \left(I - \alpha \, \nabla^2 L(\phi_{k,i}^{r,t}; B_i^{r,t})\right) \nabla L(\tilde{\theta}_i^{r,t}; B_i'^{r,t}).$

The first-order gradient version is used for the update: the second-order (Hessian) term is ignored, and the corresponding gradient simplifies to $\nabla L(\tilde{\theta}_i^{r,t}; B_i'^{r,t})$.
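A sketch of the first-order version of the step-S303 update: an inner step with learning rate alpha produces the adapted model, and the outer step moves the meta-model with beta <= alpha using the gradient at the adapted point, dropping the Hessian term. A least-squares loss stands in for the network loss, and the names `loss_grad` and `local_meta_update` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def loss_grad(w, batch):
    # Gradient of a least-squares loss L(w; batch), standing in for the
    # back-propagated gradient of the patent's neural-network loss.
    X, y = batch
    return 2 * X.T @ (X @ w - y) / len(y)

def local_meta_update(phi, train_set, alpha=0.05, beta=0.02, T=5, m=8):
    # T rounds of the first-order update of step S303:
    #   inner: theta = phi - alpha * grad L(phi; batch)
    #   outer: phi   = phi - beta  * grad L(theta; batch')
    # (second-order term dropped, as in the first-order version).
    X, y = train_set
    for _ in range(T):
        idx = rng.choice(len(y), size=m, replace=False)
        theta = phi - alpha * loss_grad(phi, (X[idx], y[idx]))
        idx2 = rng.choice(len(y), size=m, replace=False)
        phi = phi - beta * loss_grad(theta, (X[idx2], y[idx2]))
    return phi

X = rng.normal(size=(40, 3))
true_w = np.array([0.5, 1.0, -1.5])
phi = np.zeros(3)
for _ in range(200):                       # simulated communication rounds
    phi = local_meta_update(phi, (X, X @ true_w))
```

The first-order variant needs only one extra forward/backward pass per round, which is why the patent adopts it on resource-limited clients.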
Further, the step S4 is specifically:
S401, the server receives the updated meta-models $\{\phi_{k,i}^{r,T+1}\}$ uploaded by the selected client list $S$ and the corresponding local data distribution vectors $\{h_i^r\}$; the server also stores the $K$ cluster center points $C_k$;
S402, the similarity between the local data distribution vector $h_i^r$ uploaded by each client and each of the $K$ cluster center points is computed as:

$s_{i,k} = \cos(h_i^r, C_k) = \frac{h_i^r \cdot C_k}{\|h_i^r\| \, \|C_k\|}$

where $\cos$ denotes cosine similarity and $h_i^r$ is the local data distribution vector. Client $i$ is then assigned to the group whose cluster center point has the largest similarity:

$g_i = \arg\max_{k \in [1, K]} s_{i,k}$

where $g_i$ denotes the group number to which client $i$ is assigned;
when all clients $i \in S$ have been grouped, the grouping result is obtained; define the grouping result as $\{G_k\}, k \in [1, K]$, where each group contains the identification numbers of its clients;
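The grouping rule of step S402 amounts to a nearest-center assignment under cosine similarity. A minimal sketch with made-up two-dimensional vectors; the helper names `cos` and `assign_group` are hypothetical.

```python
import numpy as np

def cos(a, b):
    # Cosine similarity between two distribution vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def assign_group(h_i, centers):
    # Step S402: compare the uploaded vector h_i with the K cluster
    # center points and return the index of the most similar one.
    sims = [cos(h_i, c) for c in centers]
    return int(np.argmax(sims))

centers = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # C_1, C_2
uploads = {1: np.array([0.9, 0.1]),   # selected clients' h_i vectors
           3: np.array([0.2, 0.8]),
           5: np.array([0.7, 0.3])}

groups = {0: [], 1: []}
for client_id, h in uploads.items():
    groups[assign_group(h, centers)].append(client_id)
```

Because the assignment is recomputed from the vectors uploaded in each round, a client whose sampled data drifts can move between groups across rounds.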
S403, for each group $G_k$, model aggregation is performed within the group to generate the global meta-model of the next round, $\phi_k^{r+1}$. The aggregation is:

$\phi_k^{r+1} = \sum_{i \in G_k} \frac{|\bar{D}_i^r|}{\sum_{j \in G_k} |\bar{D}_j^r|} \, \phi_{k,i}^{r,T+1}$

where $\phi_k^{r+1}$ is the new meta-model of the $k$-th group generated after the $r$-th communication round and used in communication round $r+1$; the weight of client $i$ is proportional to the amount of data $|\bar{D}_i^r| = \sum_{t=1}^{T} |B_i^{r,t}|$ it sampled during its local updates, where $|B_i^{r,t}|$ is the sampled batch size, $T$ is the number of local updates, $D_i^{train}$ is the training data set of client $i$, and $\bar{D}_i^r$ is the set of samples drawn from it over the $T$ training rounds.
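The intra-group aggregation of step S403 is a weighted average over the group's uploaded meta-models, weighted by how much data each client sampled. A small sketch with toy two-parameter "models"; the helper name `aggregate_group` is hypothetical.

```python
import numpy as np

def aggregate_group(models, sample_sizes):
    # Step S403: weighted average of the meta-models uploaded by the
    # clients of one group; the weight of each client is proportional
    # to the amount of data it sampled during its T local update rounds.
    weights = np.array(sample_sizes, dtype=float)
    weights /= weights.sum()
    return sum(w * m for w, m in zip(weights, models))

# Two clients in the same group: toy parameter vectors and sample counts.
models = [np.array([1.0, 2.0]), np.array([3.0, 6.0])]
phi_next = aggregate_group(models, sample_sizes=[100, 300])  # weights 0.25, 0.75
```

With sample sizes 100 and 300 the result is 0.25·[1, 2] + 0.75·[3, 6] = [2.5, 5.0], i.e. the client that contributed more sampled data pulls the group meta-model harder.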
S404, the server sends each group's updated meta-model to the clients in that group (clients that were not selected do not receive the updated meta-model); steps S3 and S4 are repeated until the models converge, and the server saves the meta-model of each group, $\phi_k^R$.
Further, the extra computational complexity added by the grouping operation is:

$O(K \cdot |S| \cdot d_h)$

where $K$ is the number of groups, $|S|$ is the number of clients randomly selected by the server in each round, and $d_h$ is the dimensionality of the local data distribution vector uploaded by a client; the number of groups satisfies $K \ll N$.
As for space complexity, since $K$ meta-models need to be stored instead of one, the extra storage required is $(K-1) \cdot d_\theta$, where $d_\theta$ is the parameter size of the meta-model.
Further, the step S5 is specifically:
S501, for the personalization process, every client uses its local data set $D_i^{per}$ to compute the data distribution vector $h_i = \frac{1}{|D_i^{per}|} \sum_{x \in D_i^{per}} E(x)$, where $E(\cdot)$ is the encoder part of the autoencoder, and uploads it to the server;
S502, the server groups all the clients according to step S402 and sends the trained meta-model of each group to the clients in that group;
S503, given the received meta-model, the client performs several gradient-descent steps on its local data set $D_i^{per}$ to obtain the personalized model $\theta_i = \phi_k - \alpha \, \nabla L(\phi_k; D_i^{per})$, where $\phi_k$ is the meta-model of the $k$-th group, $\alpha$ is the learning rate of the local model, and $\nabla L(\phi_k; D_i^{per})$ is the gradient of the loss function obtained by back-propagation.
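The personalization of step S503 reduces to a few gradient steps starting from the group meta-model. A sketch under the same assumptions as before: a least-squares loss stands in for the network loss, the data are synthetic, and the name `personalize` is hypothetical.

```python
import numpy as np

def personalize(phi_k, local_data, alpha=0.1, steps=30):
    # Step S503: start from the group meta-model phi_k and run a few
    # gradient-descent steps on the client's personalization split.
    X, y = local_data
    theta = phi_k.copy()
    for _ in range(steps):
        theta -= alpha * 2 * X.T @ (X @ theta - y) / len(y)
    return theta

rng = np.random.default_rng(3)
X = rng.normal(size=(25, 2))
client_w = np.array([2.0, -1.0])       # this client's own optimum
phi_k = np.array([1.5, -0.5])          # nearby group meta-model
theta = personalize(phi_k, (X, X @ client_w))
```

The point of per-group meta-models is visible here: fine-tuning converges quickly precisely because the starting point phi_k is already close to the client's own optimum.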
Compared with the prior art, the invention has the following advantages and beneficial effects:
In the federated training stage, the invention adaptively places clients with similar data distributions in the same group and uses a different global model in each group to guide the clients in generating their personalized models. This avoids the negative transfer that a single global model causes for some clients, promotes collaborative training among similar clients, accelerates convergence, and improves the accuracy of the personalized models.
Drawings
FIG. 1 is a flow diagram of the personalized federated meta-learning method for data heterogeneity according to the present invention;
FIG. 2 is a schematic illustration of the initialization phase of the present invention;
FIG. 3 is a schematic representation of the federated training phase of the present invention.
Detailed description of the invention
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Personalized federated learning has broad application prospects in many fields, such as e-commerce, finance, medical care, education, urban computing, smart cities, edge computing, the Internet of Things, and mobile networks. Taking the mobile-network field as an example, the following describes how personalized federated meta-learning is carried out.
As more and more users use smartphones, reliable and fast mobile input methods become increasingly important. Next-word prediction is a basic function of an input method: for example, when the user types "today", words such as "evening" or "afternoon" appear in the input method's suggestion box for the user to select. Because different users have different input habits, the distributions of their local data samples differ greatly, so personalized prediction models need to be built for different users. Moreover, during training, users with similar language habits are placed in the same group for collaborative training, which accelerates training and improves the accuracy of the personalized prediction models. To obtain a better next-word prediction model, the following describes how to collaboratively train the model with personalized federated meta-learning on the users' local historical data.
Example 1
The personalized federated meta-learning method for data heterogeneity shown in fig. 1 comprises the following steps:
firstly, determining, for each client, the autoencoder structure of the initialization stage and the meta-model structure of the personalization stage;
before participating in federated learning, each user's mobile device first needs to download from the cloud server a unified autoencoder structure and meta-model structure; the autoencoder of the initialization stage is a type of neural network generally used for dimensionality reduction or feature learning, and is used here to represent the distribution of the user's local language data; the meta-model of the personalization stage is a model in the meta-learning sense, able to adapt to a new task after training on a small number of samples; for next-word prediction, a common language model such as an LSTM (Long Short-Term Memory) language model is adopted;
secondly, performing an initialization stage to obtain central points of different data distributions;
Before taking part in formal federated learning, the users need to be preliminarily grouped according to their local data distributions, because some users have similar input habits and placing such users in the same group helps improve model performance. Specifically, the method comprises the following steps:
S201, let $D_i$ denote the local language data on the mobile device of user $i$, $C_k$ the center point of the data distribution of each group (users are grouped according to their similarity to each group's center point), and $E(\cdot)$ the encoder part of the autoencoder;
S202, on the user's mobile device, an autoencoder is trained on the local language data $D_i$ to obtain the encoder part $E(\cdot)$;
S203, on the mobile device of user $i$, each piece of local language data $x \in D_i$ is fed into the encoder $E(\cdot)$ to obtain the corresponding embedded vector $h = E(x)$; the embedded vectors of all samples are then averaged to obtain the local language data distribution vector $H_i = \frac{1}{|D_i|} \sum_{x \in D_i} E(x)$, which is uploaded to the cloud server;
S204, the cloud server collects the users' local data distribution vectors $\{H_i\}$ and runs the K-means algorithm on them to obtain $K$ cluster center points $C_k$.
In this embodiment, the model structure of the autoencoder is a stacked autoencoder, and the size of the compressed vector is assumed to be $d_h = 25$.
As a preferred embodiment, as shown in fig. 2, the autoencoder consists of an encoder part and a decoder part, and the features of the hidden layer are obtained by compressing the original data through the encoder part. For user 1, assume a total of 100 local language data samples $\{x_1, \ldots, x_{100}\}$ and let the encoder part after training be $E(\cdot)$. The local data distribution vector (the feature output by the encoder) computed by client 1 is then $H_1 = \frac{1}{100} \sum_{j=1}^{100} E(x_j)$. On the server, after the local data distribution vectors uploaded by all clients have been received, they are clustered with the K-means algorithm; if the clustering parameter is set to $K = 2$, the server stores the 2 center points output by K-means, and these are used for the adaptive grouping during federated training.
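A forward-pass sketch of such a stacked autoencoder for user 1. Random weights stand in for the locally trained ones, and the layer sizes 100→50→25 are illustrative assumptions except for the compressed dimension d_h = 25 from this embodiment.

```python
import numpy as np

rng = np.random.default_rng(4)

def layer(n_in, n_out):
    # Random weight matrix standing in for a trained dense layer.
    return rng.normal(scale=0.1, size=(n_in, n_out))

# Two-layer (stacked) encoder compressing a 100-d input to d_h = 25,
# with a mirrored two-layer decoder.
W1, W2 = layer(100, 50), layer(50, 25)
V1, V2 = layer(25, 50), layer(50, 100)

def encode(x):
    return np.tanh(np.tanh(x @ W1) @ W2)

def decode(h):
    return np.tanh(h @ V1) @ V2

samples = rng.normal(size=(100, 100))   # the 100 local samples of user 1
H1 = encode(samples).mean(axis=0)       # 25-d local distribution vector
recon = decode(encode(samples))         # reconstruction used by the training loss
```

Only the 25-dimensional mean embedding H1 leaves the device, which is what keeps the grouping step privacy-preserving relative to uploading raw language data.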
Thirdly, the client side participates in federal training, and the client side is divided into a plurality of groups according to the data distribution vector uploaded in each round;
specifically, the federated training procedure follows the federated averaging algorithm: assuming $N$ users participate, each with a fixed local data set $D_i$, at the beginning of each round the server randomly selects a portion of the users and sends them the current global model; each client performs local computation based on the global model and its local data set and then sends the updated model back to the server; the server aggregates the updated models into a new global model, and the process repeats. Under the federated training framework, as shown in fig. 3, the specific process is as follows:
S301, since the users are divided into two groups, the meta-model set is $\{\phi_1, \phi_2\}$, and the users of each group personalize their local prediction models with that group's meta-model; $\theta_i$ denotes the model parameters of user $i$'s local personalized prediction model; the total number of communication rounds $R$ can be set to 500; $|S| = 3$ users are selected from all participating users in each round; the number of local updates on a user's mobile device is $T = 5$; the local data set sizes of the 5 users, denoted $|D_i|$, are 100, 200, 20, 400, and 30, respectively; for user $i$, the local data set is divided into two parts, $D_i^{train}$ for training on the user's mobile device and $D_i^{per}$ for personalization; the batch size sampled in each local update round on the user's mobile device is $|B_i^{r,t}|$;
S302, the cloud server randomly selects 3 users $\{1, 3, 5\}$ and sends the corresponding meta-models $\phi_k$ to these users;
S303, when user 1 receives the meta-model $\phi_1$ from the cloud server, it performs local updates on $\phi_1$ with its local language data $D_1$; in local update round $t = 3$, the parameters of the local prediction model are updated with the back-propagation algorithm as:

$\tilde{\theta}_1^{111,3} = \phi_{1,1}^{111,3} - \alpha \, \nabla L(\phi_{1,1}^{111,3}; B_1^{111,3})$

where $\phi_{1,1}^{111,3}$ denotes the meta-model $\phi_1$ after 2 local update rounds on user 1 within communication round $r = 111$; $\alpha$ is the learning rate of the local model and can be set from $\{0.001, 0.01, 0.1\}$; $L$ is the loss function of the model training process (commonly used loss functions include the 0-1 loss, the cross-entropy loss, the softmax loss, etc.); $\nabla L$ is the gradient of the loss function obtained by back-propagation through the neural network; and $B_1^{111,3}$ is a batch of size $|B_1^{111,3}|$ sampled at random from $D_1^{train}$. With the updated local prediction model $\tilde{\theta}_1^{111,3}$, the meta-model is updated by running the back-propagation algorithm as follows:

$\phi_{1,1}^{111,4} = \phi_{1,1}^{111,3} - \beta \, \nabla_{\phi} L(\tilde{\theta}_1^{111,3}; B_1'^{111,3})$

where $\phi_{1,1}^{111,4}$ is the meta-model used in local round 4 on user 1 within communication round 111; $L(\tilde{\theta}_1^{111,3}; B_1'^{111,3})$ is the loss of the updated local model on a second sampled batch $B_1'^{111,3}$; and $\beta$ is the learning rate of the meta-model, generally set so that $\beta \le \alpha$, e.g. from $\{0.0005, 0.005, 0.05\}$; step S303 is then repeated until the $T = 5$ rounds of local updates are complete;
S304, let $\bar{D}_1^{111}$ denote the set of data samples drawn over the 5 rounds of training, with total size $|\bar{D}_1^{111}| = \sum_{t=1}^{5} |B_1^{111,t}|$; the local data distribution vector of the user in communication round $r = 111$ is then obtained as $h_1^{111} = \frac{1}{|\bar{D}_1^{111}|} \sum_{x \in \bar{D}_1^{111}} E(x)$;
S305, after the 5 rounds of local updates are completed, the user's mobile device sends the updated meta-model $\phi_{1,1}^{111,6}$ together with the local data distribution vector $h_1^{111}$ of step S304 to the cloud server.
Specifically, the meta-model update algorithm is Model-Agnostic Meta-Learning (MAML), and in step S303 the full gradient obtained by back-propagation when updating the meta-model is:

$\nabla_{\phi} L(\tilde{\theta}_1^{111,t}; B_1'^{111,t}) = \left(I - \alpha \, \nabla^2 L(\phi_{1,1}^{111,t}; B_1^{111,t})\right) \nabla L(\tilde{\theta}_1^{111,t}; B_1'^{111,t}).$

The first-order gradient version is used for the update: the second-order term is omitted, which reduces the computational load on the mobile device, and the corresponding gradient becomes $\nabla L(\tilde{\theta}_1^{111,t}; B_1'^{111,t})$.
fourthly, the client models within each group are aggregated and sent back to the clients in the group for the next iteration, specifically:
S401, the cloud server receives the updated meta-models $\{\phi_{k,i}^{111,6}\}$ uploaded by the selected user list $\{1, 3, 5\}$ and the corresponding local data distribution vectors $\{h_i^{111}\}$; in addition, the server also stores the 2 cluster center points $\{C_1, C_2\}$;
S402, respectively calculating a local data distribution vector { h) uploaded by each user1,h3,h5The similarity between the cluster center points and 2 cluster center points is calculated as follows:
where cos represents the cosine similarity. Subsequently, the user 1 is assigned to the group in which the cluster center point with the largest similarity is located, and the calculation method is as follows:
where k̂₁ denotes the group number to which user 1 is assigned.
When all users {1, 3, 5} have been grouped, the grouping result is obtained. Define the grouping result as {G_k}, k ∈ [1, 2], where each group contains the identification numbers of its users;
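The cosine-similarity grouping described above can be sketched as follows; this is a minimal illustration under assumed inputs, not the patent's implementation, and all names are hypothetical:

```python
import numpy as np

def assign_groups(h_vectors, centers):
    """Assign each client to the group whose cluster center point has the
    highest cosine similarity with its local data distribution vector.

    h_vectors : dict {client_id: 1-D distribution vector h_i}
    centers   : list of K cluster center points C_k
    Returns {group_index: [client ids]}.
    """
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    groups = {k: [] for k in range(len(centers))}
    for cid, h in h_vectors.items():
        # k_hat = argmax_k cos(h_i, C_k)
        k_hat = max(range(len(centers)), key=lambda k: cos(h, centers[k]))
        groups[k_hat].append(cid)
    return groups
```

Cosine similarity is used (rather than Euclidean distance) so that the assignment depends on the shape of the distribution vector, not its magnitude.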
S403, for each group G_k, model aggregation is performed within the group to generate the global meta-model of the next round φ_k^(r+1); the model aggregation is: φ_k^(r+1) = Σ_{i ∈ G_k} w_i φ_i, where w_i = |D̃_i| / Σ_{j ∈ G_k} |D̃_j|;
where φ_k^(r+1) is the new k-th meta-model generated after the 111th communication round ends, to be used in the next communication round r + 1 = 112; the weight w_i is determined by the amount of data sampled during the local updates of client i: |D̃_i| is the sampled data size, T is the number of local updates, D_i^train is the training data set of user i, and D̃_i denotes the data samples drawn from D_i^train over the T rounds of training.
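The in-group weighted aggregation can be sketched in a few lines; an illustrative sketch with hypothetical names, assuming each meta-model is a flat parameter vector:

```python
import numpy as np

def aggregate_group(metas, sampled_sizes):
    """In-group weighted aggregation: phi_k = sum_i w_i * phi_i,
    with w_i = |D~_i| / sum_j |D~_j| over the clients of one group.

    metas         : list of meta-model parameter vectors phi_i
    sampled_sizes : list of sampled data sizes |D~_i|, same order
    """
    metas = np.asarray(metas, dtype=float)
    weights = np.asarray(sampled_sizes, dtype=float)
    weights = weights / weights.sum()          # normalize to the w_i above
    return (weights[:, None] * metas).sum(axis=0)
```

Weighting by the sampled data size (rather than the full local data size) matches the description: clients that contributed more gradient information in this round get more influence on the group model.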
S404, the cloud server issues each group's updated meta-model to the corresponding users in that group; users that were not selected do not receive an updated meta-model. Steps S3 and S4 are repeated until the models converge, and the cloud server saves the meta-model of each group {φ₁, φ₂}.
Fifthly, after the federated training is finished, each user fine-tunes the meta-model of its group on its local data to generate a personalized model, specifically:
S501, all users {1, 2, 3, 4, 5} use their local language data sets D_i^p to compute their data distribution vectors h_i and upload them to the cloud server;
S502, the cloud server completes the grouping of all users according to step S402 and issues the trained meta-model of each group to the users in that group;
S503, based on the received meta-model and the local data set D_i^p, each user executes several gradient-descent steps to obtain the personalized model θ_i = φ_k − α∇L(φ_k), where φ_k denotes the meta-model of the k-th group, α denotes the learning rate of the local model, and ∇L(φ_k) is the gradient obtained by back-propagating the loss function on D_i^p.
Specifically, the number of gradient-descent steps for obtaining the personalized model defaults to one; different users may adjust this number according to model performance. After the final personalized training is finished, the users obtain personalized language prediction models {θ₁, θ₂, θ₃, θ₄, θ₅} fitted to their local data distributions.
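The personalization step above can be sketched as a short fine-tuning loop; an illustrative sketch with an assumed linear model and squared loss, not the patent's implementation:

```python
import numpy as np

def personalize(phi_k, Xp, yp, alpha=0.05, steps=1):
    """Fine-tune the group meta-model phi_k on the client's personalization
    data D_i^p with a few gradient-descent steps (default: one step)."""
    theta = phi_k.copy()
    for _ in range(steps):
        # gradient of the squared loss on the personalization data
        grad = 2 * Xp.T @ (Xp @ theta - yp) / len(yp)
        theta -= alpha * grad
    return theta
```

Because the meta-model was trained to be one gradient step away from a good task-specific model, a single step is the natural default; more steps can help when the local distribution is far from the group's.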
Example 2
As shown in fig. 3, a total of 5 users participate in the federated training and are divided into 2 groups on the cloud server. In any communication round, assume the grouping result of the previous round is {1, 2, 3} in the first group and {4, 5} in the second group, and the cloud server selects users {1, 3, 5} to participate in the federated training process; then users {1, 3} each receive the meta-model φ₁, and user {5} receives the meta-model φ₂.
For user 1, assume it has 100 local data samples and performs T = 5 local updates, each update randomly sampling a batch of size 10; after the local updates are completed, the total sampled data size is min(10 × 5, 100) = 50. The distribution vector h₁ of this sampled data is then computed and uploaded to the cloud server together with the locally updated meta-model φ₁. Users {3, 5} perform the same process.
The cloud server receives the data distribution vectors and the updated meta-models uploaded by users {1, 3, 5}. It first traverses the uploaded list of local data distribution vectors {h₁, h₃, h₅} and compares each with the 2 stored cluster center points by similarity to complete the grouping; assume the result is: users {1, 5} form one group and user {3} forms the other. Model aggregation is then performed: if the sampled data sizes of users {1, 3, 5} are 50, 20, and 30 respectively, the weighting factors are {w₁ = 50/(50+30) = 0.625, w₃ = 20/20 = 1.0, w₅ = 30/(50+30) = 0.375}; the weighted meta-model of the first group is then w₁φ₁ + w₅φ₅, and that of the second group is w₃φ₃.
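The weighting factors in this worked example can be checked numerically; the dictionary layout below is only an illustration of the arithmetic:

```python
# sampled data sizes |D~_i| from the example, and the assumed grouping
sampled = {1: 50, 3: 20, 5: 30}
groups = {0: [1, 5], 1: [3]}

# w_i = |D~_i| / sum of |D~_j| over the client's group
weights = {}
for members in groups.values():
    total = sum(sampled[i] for i in members)
    for i in members:
        weights[i] = sampled[i] / total

print(weights)  # {1: 0.625, 5: 0.375, 3: 1.0}
```

Note that the weights are normalized within each group, which is why user 3, alone in its group, gets weight 1.0.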
Steps S3 and S4 are repeated until the preset stopping condition of R = 500 communication rounds is reached.
Furthermore, compared with the original federated averaging algorithm, the above grouping operation adds extra computational complexity O(K·|S|·d_h):
where K = 2 is the number of groups, |S| = 5 is the number of clients randomly selected by the server in each round, d_h = 25 is the dimensionality of the local data distribution vector uploaded by a user, and the number of groups satisfies K ≪ N;
For space complexity, since the K = 2 meta-models must be stored, the extra storage space is O(K·d_θ), where d_θ denotes the parameter size of the meta-model.
Example 3
In one embodiment, the model structure of the autoencoder may be one of a convolutional autoencoder and a recurrent autoencoder.
the above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
Claims (10)
1. A personalized federated meta-learning method for data heterogeneity, characterized by comprising the following steps:
S1, determining the structure of the autoencoder for the initialization stage and the structure of the meta-model for the personalization stage of each client;
S2, performing the initialization stage to obtain center points of different data distributions;
S3, the clients participating in federated training and being divided into a plurality of groups according to the data distribution vectors uploaded in each round;
S4, aggregating the client models within each group and issuing the aggregated models to the clients in the group for the next iteration;
S5, after the federated training is finished, the clients fine-tuning the meta-models of their groups on their local data to generate the personalized models.
2. The personalized federated meta-learning method for data heterogeneity according to claim 1, characterized in that the client needs to download the unified autoencoder and meta-model structures from the server before participating in federated learning; the autoencoder used in the initialization stage is a neural network that extracts the statistical characteristics of the client's local data distribution and represents them in vector form; the meta-model used in the personalization stage is a model under meta-learning that can adapt to a new task through training on a small number of samples, and is used to adapt to the client's local data to generate the personalized model.
3. The personalized federated meta-learning method for data heterogeneity according to claim 1, characterized in that the process of obtaining the center points in step S2 comprises the following steps:
S201, let D_i denote the local data set of a client, C_k denote a cluster center point, and E denote the encoder part of the autoencoder;
S203, each client i uses the encoder E to obtain the embedded vector E(x) of each data sample x ∈ D_i, then averages the embedded vectors of all samples to obtain the local data distribution vector H_i = (1/|D_i|) Σ_{x ∈ D_i} E(x), and uploads it to the server;
S204, the server collects the client data distribution vectors {H_i} and runs the K-means algorithm on them to obtain K cluster center points {C_k}.
4. The method of claim 1, wherein the model structure of the autoencoder is one of a stacked autoencoder, a convolutional autoencoder, and a recurrent autoencoder.
5. The method of claim 1, wherein the federated training follows the federated averaging algorithm, specifically: suppose there are N clients, each with a fixed local data set D_i; at the beginning of each round, the server randomly selects a subset of the clients and sends the current global algorithm state to each of them; each selected client performs local computation based on the global state and its local data set and sends the updated state back to the server; the server aggregates the updated states to generate a new global state, and the process repeats under the federated training framework.
6. The personalized federated meta-learning method for data heterogeneity according to claim 1, characterized in that step S3 comprises the following steps:
S301, let φ_k denote the meta-model of the k-th group, θ_i the local personalized model of a client, and R the total number of communication rounds; |S| clients are selected from all clients participating in the federated training in each round; the number of local updates performed by a client is T; the local data set is denoted D_i, and each client owns |D_i| data samples x; for client i, the local data set is divided into two parts, D_i^train and D_i^p: D_i^train is used for client training and D_i^p for client personalization;
S302, the server randomly selects |S| clients and sends the corresponding meta-model φ_k to each selected client;
S303, upon receiving the meta-model φ_k from the server, a client performs local updates on φ_k with its local data D_i; at the t-th local round, t ∈ [1, T], the update is computed as: θ_i^t = φ_k^t − α∇L(φ_k^t),
where θ_i^t denotes the local model obtained from the meta-model φ_k after the t-th local update on client i in the r-th communication round; α denotes the learning rate of the local model; L(·) denotes the loss function of the model training process, whose value depends on the sampled batch; commonly used loss functions include the 0-1 loss, the cross-entropy loss, the softmax loss, etc.; ∇L denotes the gradient of the loss function obtained by back-propagation through the neural network; the batch is sampled randomly from D_i^train. With the updated local model θ_i^t, the meta-model is updated as: φ_k^(t+1) = φ_k^t − β∇L(θ_i^t),
where φ_k^(t+1) denotes the meta-model φ_k after the t-th local update on client i in the r-th communication round, i.e. the meta-model used in the next local round t + 1; L(·) denotes the loss function, whose value depends on the data sampled for the update; β denotes the learning rate of the meta-model and is normally set so that β ≤ α; step S303 is then repeated until the T rounds of local updates are completed;
S304, let D̃_i denote the data samples sampled over the T rounds of training, with sampled size |D̃_i|; the local data distribution vector of the client in communication round r ∈ R is obtained as: h_i = (1/|D̃_i|) Σ_{x ∈ D̃_i} E(x).
7. The personalized federated meta-learning method for data heterogeneity according to claim 6, characterized in that the meta-model update algorithm is Model-Agnostic Meta-Learning (MAML), and in step S303, when the meta-model is updated, the gradient obtained by back-propagation is specifically: ∇_φ L(θ) = (I − α∇²L(φ)) ∇_θ L(θ).
8. The personalized federated meta-learning method for data heterogeneity according to claim 1, characterized in that step S4 specifically comprises:
S401, the server receives the updated meta-models {φ_i} uploaded by the selected client list S and the corresponding local data distribution vectors {h_i}; the server stores the K cluster center points {C_k};
S402, the similarity between each uploaded local data distribution vector {h_i} and the K cluster center points is computed: sim(h_i, C_k) = cos(h_i, C_k),
where cos denotes cosine similarity and h_i denotes a local data distribution vector;
and client i is assigned to the group whose cluster center point has the maximum similarity: k̂_i = argmax_k cos(h_i, C_k),
where k̂_i denotes the group number to which client i is assigned;
when all clients i ∈ S have completed grouping, the grouping result is obtained; define the grouping result as {G_k}, k ∈ [1, K], where each group contains the identification numbers of its clients;
S403, for each group G_k, model aggregation is performed within the group to generate the global meta-model of the next round φ_k^(r+1); the model aggregation is: φ_k^(r+1) = Σ_{i ∈ G_k} w_i φ_i, where w_i = |D̃_i| / Σ_{j ∈ G_k} |D̃_j|,
where φ_k^(r+1) is the new k-th meta-model generated after the r-th communication round ends, to be used in the following communication round r + 1; the weight w_i is determined by the amount of data sampled during the local updates of client i: |D̃_i| is the sampled data size, T is the number of local updates, D_i^train is the training data set of client i, and D̃_i denotes the data samples drawn from D_i^train over the T rounds of training;
9. The method of claim 8, characterized in that the grouping operation adds additional computational complexity O(K·|S|·d_h):
where K represents the number of groups, |S| represents the number of clients randomly selected by the server in each round, d_h represents the dimensionality of the local data distribution vector uploaded by a client, and the number of groups satisfies K ≪ N;
10. The personalized federated meta-learning method for data heterogeneity according to any one of claims 1 to 9, characterized in that step S5 is specifically:
S501, for the personalization process, all clients use their local data sets D_i^p to compute their data distribution vectors h_i with the encoder part E of the autoencoder and upload them to the server;
S502, the server completes the grouping of all clients according to step S402 and issues the trained meta-model of each group to the clients in that group;
S503, based on the received meta-model and the local data set D_i^p, each client executes several gradient-descent steps to obtain the personalized model θ_i = φ_k − α∇L(φ_k), where φ_k denotes the meta-model of the k-th group, α denotes the learning rate of the local model, and ∇L(φ_k) is the gradient obtained by back-propagating the loss function on D_i^p.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111535626.9A CN114357067B (en) | 2021-12-15 | Personalized federal element learning method aiming at data isomerism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114357067A true CN114357067A (en) | 2022-04-15 |
CN114357067B CN114357067B (en) | 2024-06-25 |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114863169A (en) * | 2022-04-27 | 2022-08-05 | 电子科技大学 | Image classification method combining parallel ensemble learning and federal learning |
CN114863499A (en) * | 2022-06-30 | 2022-08-05 | 广州脉泽科技有限公司 | Finger vein and palm vein identification method based on federal learning |
CN115018085A (en) * | 2022-05-23 | 2022-09-06 | 郑州大学 | Data heterogeneity-oriented federated learning participation equipment selection method |
CN115018019A (en) * | 2022-08-05 | 2022-09-06 | 深圳前海环融联易信息科技服务有限公司 | Model training method and system based on federal learning and storage medium |
CN115115064A (en) * | 2022-07-11 | 2022-09-27 | 山东大学 | Semi-asynchronous federal learning method and system |
CN115860116A (en) * | 2022-12-02 | 2023-03-28 | 广州图灵科技有限公司 | Federal learning method based on generative model and deep transfer learning |
CN116306986A (en) * | 2022-12-08 | 2023-06-23 | 哈尔滨工业大学(深圳) | Federal learning method based on dynamic affinity aggregation and related equipment |
CN117077817A (en) * | 2023-10-13 | 2023-11-17 | 之江实验室 | Personalized federal learning model training method and device based on label distribution |
WO2024027164A1 (en) * | 2022-08-01 | 2024-02-08 | 浙江大学 | Adaptive personalized federated learning method supporting heterogeneous model |
CN117973507A (en) * | 2024-03-29 | 2024-05-03 | 山东省计算中心(国家超级计算济南中心) | Group federation element learning method based on data enhancement and privacy enhancement |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112560991A (en) * | 2020-12-25 | 2021-03-26 | 中山大学 | Personalized federal learning method based on hybrid expert model |
WO2021115480A1 (en) * | 2020-06-30 | 2021-06-17 | 平安科技(深圳)有限公司 | Federated learning method, device, equipment, and storage medium |
CN113420888A (en) * | 2021-06-03 | 2021-09-21 | 中国石油大学(华东) | Unsupervised federal learning method based on generalization domain self-adaptation |
CN113705823A (en) * | 2020-05-22 | 2021-11-26 | 华为技术有限公司 | Model training method based on federal learning and electronic equipment |
WO2021247944A1 (en) * | 2020-06-03 | 2021-12-09 | Qualcomm Technologies, Inc. | Federated mixture models |
Non-Patent Citations (1)
Title |
---|
LI Jian et al., "Federated Learning and Its Applications in the Telecommunications Industry", Information and Communications Technology and Policy, no. 09, 15 September 2020 (2020-09-15), pages 39-45 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |