CN117910600B - Meta-continuous federated learning system and method based on fast learning and knowledge accumulation - Google Patents

Meta-continuous federated learning system and method based on fast learning and knowledge accumulation

Info

Publication number
CN117910600B
CN117910600B
Authority
CN
China
Prior art keywords
training
learning
task
model
continuous
Prior art date
Legal status
Active
Application number
CN202410296683.3A
Other languages
Chinese (zh)
Other versions
CN117910600A (en)
Inventor
高龙翔
李秉泽
曲悠扬
顾树俊
崔磊
王宝海
Current Assignee
Qilu University of Technology
Shandong Computer Science Center National Super Computing Center in Jinan
Original Assignee
Qilu University of Technology
Shandong Computer Science Center National Super Computing Center in Jinan
Priority date
Filing date
Publication date
Application filed by Qilu University of Technology and Shandong Computer Science Center National Super Computing Center in Jinan
Priority to CN202410296683.3A
Publication of CN117910600A
Application granted
Publication of CN117910600B

Abstract

The invention provides a meta-continuous federated learning system and method based on fast learning and knowledge accumulation, belonging to the technical field of artificial intelligence. By combining federated meta-learning with federated continual learning, the scheme solves the problem that edge devices struggle to learn effective models because insufficient computing resources and limited sample data make locally trained models prone to overfitting. It also solves the problems that edge devices are slow to adapt to new tasks and that a model catastrophically forgets old tasks when learning new ones, improving both training efficiency and model accuracy.

Description

Meta-continuous federated learning system and method based on fast learning and knowledge accumulation
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a meta-continuous federated learning system and method based on fast learning and knowledge accumulation.
Background
In conventional centralized machine learning, data typically needs to be gathered on a central server for training, which risks leaking users' sensitive information. In many applications, data is distributed across multiple geographic locations or devices, such as mobile devices, sensors, and edge devices. In this setting, conventional centralized training can suffer from high latency, heavy data transmission, and the hazards of data centralization. Moreover, in some applications the model must continuously adapt to new data and new tasks rather than being trained statically once. To address these problems, federated learning, continual learning, federated meta-learning, or federated continual learning is commonly adopted at present, but each of these methods still has limitations.
The inventors find that a single edge device struggles to learn an effective model: with insufficient computing resources and limited sample data, a locally trained model easily overfits. Introducing federated learning solves this problem, but while it improves the generalization ability of the model, it weakens the personalization of the terminal devices. Meanwhile, as training tasks keep arriving, the model forgets old tasks when learning new ones.
Disclosure of Invention
To overcome the above shortcomings of the prior art, the invention provides a meta-continuous federated learning system and method based on fast learning and knowledge accumulation. By combining federated meta-learning with federated continual learning, the scheme solves the problem that edge devices struggle to learn effective models because insufficient computing resources and limited sample data make locally trained models prone to overfitting; it also solves the problems that edge devices are slow to adapt to new tasks and that a model catastrophically forgets old tasks when learning new ones, improving both training efficiency and model accuracy.
According to a first aspect of an embodiment of the present invention, there is provided a meta-continuous federated learning system based on fast learning and knowledge accumulation, comprising:
a server side, configured to send the global model to be trained to a plurality of clients participating in federated learning, receive the model parameters trained by the clients, aggregate the model parameters of the plurality of clients into an updated global model, and execute the next round of training until the iteration requirement is met;
a client side, configured to receive the global model from the server, train the model on a local data set, and return the trained model parameters to the server. Model training comprises pre-training based on federated meta-learning and continual training based on federated continual learning; the continual training is initialized with the model parameters obtained by pre-training. During continual training, if no task drift occurs, the current training data are added to a local buffer; if task drift occurs, the data in the local buffer are used as training data for the continual training process, and the local buffer is then reset.
Further, the pre-training based on federated meta-learning comprises a first training task: the first training task is constructed from sampled training data, and the model to be trained is trained on the training data set corresponding to the first training task.
Further, in the first training task, the loss of the current training task is calculated from the trained model parameters, together with the gradient of this loss with respect to the model parameters; the initial model parameters are then updated from this gradient and the learning rate to obtain updated model parameters.
Further, the pre-training based on federated meta-learning further comprises a second training task: the second training task is constructed from resampled training data, the model parameters obtained from the first training task are taken as the initial model parameters of the current task, and the model to be trained is trained on the training data set corresponding to the second training task.
Further, in the second training task, the loss of the current training task is calculated from the trained model parameters, together with the gradient of this loss with respect to the model parameters; the initial model parameters are updated from this gradient and the learning rate to obtain updated model parameters; the first and second training tasks are repeated until the iteration-ending condition is met.
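Written out explicitly (the notation below is assumed for illustration, since the published text does not fix symbols: $\theta$ denotes the meta-network initial parameters, $\alpha$ the task-level learning rate, $\beta$ the meta learning rate, and $\mathcal{L}_m$, $\mathcal{L}_n$ the Query-Set losses of the two tasks), one first-order pre-training cycle of the kind described above can be sketched as

$$\phi_m = \theta - \alpha\,\nabla_{\theta}\mathcal{L}_m(\theta), \qquad \theta' = \theta - \beta\,\nabla_{\phi_m}\mathcal{L}_m(\phi_m),$$

$$\phi_n = \theta' - \alpha\,\nabla_{\theta'}\mathcal{L}_n(\theta'), \qquad \theta'' = \theta' - \beta\,\nabla_{\phi_n}\mathcal{L}_n(\phi_n),$$

with the cycle repeated over freshly sampled task pairs until the iteration-ending condition is met.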
Further, the continual training based on federated continual learning is specifically: the model is initialized with the model parameters obtained by pre-training, trained on the data set corresponding to the continual training to obtain the corresponding model parameters, and these parameters are sent to the server side.
Further, whether task drift occurs is judged by comparing the loss computed in the current round of the continual learning stage with the loss computed in the previous round of training: if the difference between the two is smaller than a preset threshold, no task drift is judged to have occurred; if it is not smaller than the preset threshold, task drift is judged to have occurred.
Further, when no task drift occurs, the model parameters are fine-tuned and the current training data are added to the client's local buffer; when task drift occurs, data are taken from the local buffer as training data for continual training, the model parameters are updated with these data, and the local buffer is reset.
Furthermore, the federated continual learning specifically adopts the Continual-MAML method.
According to a second aspect of the embodiments of the present invention, there is provided a meta-continuous federated learning method based on fast learning and knowledge accumulation, which is based on the above meta-continuous federated learning system and comprises:
sending the global model to be trained from the server side to a plurality of clients participating in federated learning;
receiving, at each client, the global model from the server, training the model on a local data set, and returning the trained model parameters to the server, where model training comprises pre-training based on federated meta-learning and continual training based on federated continual learning, the continual training is initialized with the model parameters obtained by pre-training, and during continual training the current training data are added to a local buffer if no task drift occurs, while if task drift occurs the data in the local buffer are used as training data for the continual training process and the local buffer is then reset;
receiving, at the server, the model parameters trained by the clients, aggregating the model parameters of the plurality of clients into an updated global model, and performing the next round of training until the iteration requirement is met.
One or more of the above technical solutions have the following beneficial effects:
(1) The invention provides a meta-continuous federated learning system and method based on fast learning and knowledge accumulation. By combining federated meta-learning with federated continual learning, the scheme solves the problem that edge devices struggle to learn effective models because insufficient computing resources and limited sample data make locally trained models prone to overfitting.
(2) The scheme solves the problems that edge devices are slow to adapt to new tasks and that the model catastrophically forgets old tasks when learning new ones, improving both training efficiency and model accuracy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a schematic diagram of the overall structure of the meta-continuous federated learning method based on fast learning and knowledge accumulation according to an embodiment of the present invention;
FIG. 2 is a flowchart of the meta-continuous federated learning method based on fast learning and knowledge accumulation according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the pre-training stage according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the continual training stage according to an embodiment of the present invention.
Detailed Description
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In order that the invention may be readily understood, a further description of the invention will be rendered by reference to specific embodiments that are illustrated in the appended drawings and are not to be construed as limiting embodiments of the invention.
It will be appreciated by those skilled in the art that the drawings are merely schematic representations of examples and that the elements of the drawings are not necessarily required to practice the invention.
Example 1
It is an object of this embodiment to provide a meta-continuous federated learning system based on fast learning and knowledge accumulation.
A meta-continuous federated learning system based on fast learning and knowledge accumulation, comprising:
a server side, configured to send the global model to be trained to a plurality of clients participating in federated learning, receive the model parameters trained by the clients, aggregate the model parameters of the plurality of clients into an updated global model, and execute the next round of training until the iteration requirement is met;
a client side, configured to receive the global model from the server, train the model on a local data set, and return the trained model parameters to the server. Model training comprises pre-training based on federated meta-learning and continual training based on federated continual learning; the continual training is initialized with the model parameters obtained by pre-training. During continual training, if no task drift occurs, the current training data are added to a local buffer; if task drift occurs, the data in the local buffer are used as training data for the continual training process, and the local buffer is then reset.
In a specific implementation, the pre-training based on federated meta-learning comprises a first training task: the first training task is constructed from sampled training data, and the model to be trained is trained on the training data set corresponding to the first training task.
In a specific implementation, in the first training task, the loss of the current training task is calculated from the trained model parameters, together with the gradient of this loss with respect to the model parameters; the initial model parameters are updated from this gradient and the learning rate to obtain updated model parameters.
In a specific implementation, the pre-training based on federated meta-learning further comprises a second training task: the second training task is constructed from resampled training data, the model parameters obtained from the first training task are taken as the initial model parameters of the current task, and the model to be trained is trained on the training data set corresponding to the second training task.
In a specific implementation, in the second training task, the loss of the current training task is calculated from the trained model parameters, together with the gradient of this loss with respect to the model parameters; the initial model parameters are updated from this gradient and the learning rate to obtain updated model parameters; the first and second training tasks are repeated until the iteration-ending condition is met.
In a specific implementation, the continual training based on federated continual learning is specifically: the model is initialized with the model parameters obtained by pre-training, trained on the data set corresponding to the continual training to obtain the corresponding model parameters, and these parameters are sent to the server side.
In a specific implementation, whether task drift occurs is judged by comparing the loss computed in the current round of the continual learning stage with the loss computed in the previous round of training: if the difference between the two is smaller than a preset threshold, no task drift is judged to have occurred; if it is not smaller than the preset threshold, task drift is judged to have occurred.
In a specific implementation, when no task drift occurs, the model parameters are fine-tuned and the current training data are added to the client's local buffer; when task drift occurs, data are taken from the local buffer as training data for continual training, the model parameters are updated with these data, and the local buffer is reset.
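As a minimal sketch of this drift test (the threshold name gamma and the function name are illustrative assumptions, not taken from the patent):

```python
def task_drift_occurred(curr_loss: float, prev_loss: float, gamma: float) -> bool:
    """Task drift is declared when the loss of the current round exceeds
    the previous round's loss by at least the preset threshold."""
    return (curr_loss - prev_loss) >= gamma
```

When this returns False, the client fine-tunes the model and buffers the current data; when it returns True, the client retrains from the buffered data and then resets the buffer.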
In a specific implementation, the federated continual learning specifically employs the Continual-MAML method. Continual-MAML is an existing method and is therefore not described in detail here; see Massimo Caccia, Pau Rodríguez, Oleksiy Ostapenko, Fabrice Normandin, Min Lin, Lucas Caccia, Issam Laradji, Irina Rish, Alexandre Lacoste, David Vazquez, and Laurent Charlin, "Online Fast Adaptation and Knowledge Accumulation (OSAKA): A New Approach to Continual Learning," in Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS 2020), Curran Associates Inc., Red Hook, NY, USA, Article 1387, pp. 16532–16545.
For easy understanding, the following detailed description of the embodiments will be given with reference to the accompanying drawings:
Background knowledge:
Federated learning: a distributed cooperative learning technique designed to preserve the privacy of terminal data. The data never has to leave the user's device or data center; the model is updated by aggregating model parameters, and only model parameters, never raw data, are transmitted between devices. However, because data distributions differ across clients, aggregating the clients' models improves the generalization ability of the model while reducing client personalization.
Continual learning: continual learning, also called lifelong learning, aims to let neural network learners draw on experience from previous tasks to adapt to new tasks faster and more efficiently, while overcoming catastrophic forgetting and retaining previously acquired knowledge. Continual learning has made significant progress across task settings, including continual supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. However, applying continual learning in the federated setting to solve the federated optimization problem remains a relatively new research area.
Federated meta-learning: meta-learning is a machine learning approach whose main objective is to let models learn new tasks or domains faster. Its core idea is that the model should learn not only how to perform a specific task but also how to learn; this capability lets models benefit from previous learning experience when faced with new tasks, adapting and generalizing to them more quickly. A collaborative learning framework known as federated meta-learning is used to realize real-time edge intelligence in Internet of Things applications: a model is trained across multiple source edge nodes with a federated meta-learning method, transferred to a target edge node with a small number of samples so as to adapt quickly to a new task, and fine-tuned with a small amount of data so that it better fits the client's data, enhancing the personalization ability of the model. However, performance on previous tasks may degrade as new tasks are learned, because traditional machine learning models tend to overwrite old knowledge with new knowledge.
Federated continual learning: Caccia et al. propose a framework named continual federated learning with distillation (CFeD) to mitigate catastrophic forgetting in the federated setting. CFeD performs knowledge distillation on both clients and the server using an unlabeled proxy dataset to mitigate forgetting; in addition, CFeD assigns different learning objectives to different clients to improve the learning ability of the model. Le et al. propose a federated continual learning scheme, FCL-BL (Federated Continuous Learning with Broad Network Architecture), to support continual learning in the federated learning (FL) environment: the network is broadened rather than deepened, and only output-related parameters are adjusted via linear regression, so no deep architecture is needed; this simplifies parameter adjustment and retraining and allows faster training. Yoon et al. propose a framework called Federated Weighted Inter-client Transfer (FedWeIT), in which each client learns a sequence of tasks from a private local data stream. The main challenge there is to use the knowledge of other clients while preventing interference from irrelevant knowledge and minimizing communication cost. FedWeIT decomposes the network weights into global federated parameters and sparse task-specific parameters, allowing each client to receive selective knowledge from other clients through a weighted combination of their task-specific parameters; this minimizes interference between incompatible tasks and enables positive knowledge transfer between clients. While the above methods alleviate knowledge forgetting, they cannot adapt quickly to new tasks as new tasks keep arriving.
FIG. 1 is a schematic diagram of the overall framework of the solution of this embodiment, and FIG. 2 shows its flow. This embodiment provides a meta-continuous federated learning method based on fast learning and knowledge accumulation, which comprises the following two stages:
(1) Model aggregation phase of the server:
At the start of model training, the server first initializes a global model and sends it to all clients; after client training is completed, the model parameters are uploaded to the server, and the server aggregates the model parameters and distributes them to the clients for the next round of training.
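A minimal sketch of this aggregation loop, assuming PyTorch-style models and plain FedAvg-style parameter averaging (the patent does not prescribe a particular aggregation rule, and all names here are illustrative):

```python
import copy

def run_federated_training(global_model, clients, num_rounds):
    """Server loop: broadcast the global model, collect client-trained
    parameters, and aggregate them by parameter-wise averaging."""
    for _ in range(num_rounds):
        # Broadcast the current global parameters; each client runs its local
        # pre-training / continual training and returns trained parameters.
        client_states = [client.train_locally(copy.deepcopy(global_model.state_dict()))
                         for client in clients]
        # Aggregate: average each parameter tensor across the returned models.
        aggregated = {key: sum(state[key] for state in client_states) / len(client_states)
                      for key in client_states[0]}
        global_model.load_state_dict(aggregated)  # starting point of the next round
    return global_model
```

In practice the average may be weighted by each client's data size, but the patent text leaves the aggregation rule unspecified.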
(2) Model learning stage of the client:
It comprises two learning phases:
1) Pre-training stage: the pre-training phase aims to obtain good initial parameters in order to accelerate the downstream training process, as follows:
a) Load the pre-training data set;
The pre-training data set:
the model is pre-trained on the first 1,000 character classes of the Omniglot dataset.
The continual-training data sets:
the model is trained with the complete Omniglot dataset and two out-of-distribution datasets: MNIST and FashionMNIST.
b) The constructed tasks are divided into training tasks (Train Tasks) and test tasks (Test Tasks). Specifically, each task includes its own training data and test data, called the Support Set and the Query Set in meta-learning, respectively. N training tasks are prepared, each corresponding to a Support Set and a Query Set, and several test tasks are prepared for evaluating the parameters learned by meta-learning. Both training tasks and test tasks are generated from the pre-training dataset;
c) Define the network structure, e.g., a CNN, and use the global model parameters received by the client as the initialization parameters $\theta$ of the meta network. The meta network is the network ultimately applied to new test tasks; it stores the "prior knowledge";
d) Iterative "pre-training" is then performed, as shown in FIG. 3:
A. Sample 1 training task m (or a batch of several training tasks; FIG. 3 shows a single sampled training task). Assign the meta-network parameters $\theta$ to a network specific to task m, obtaining $\phi_m$ (initially $\phi_m = \theta$);
Here a batch is the set of samples used for one update of the model parameters during training; a small batch (typically of size 32, 64, 128, etc.) of samples is used each time the model parameters are updated.
B. Using the Support Set of task m, perform 1 optimization update on $\phi_m$ based on task m's learning rate $\alpha$;
C. Based on the once-updated $\phi_m$, calculate task m's loss $\mathcal{L}_m(\phi_m)$ using the Query Set, and calculate the gradient of $\mathcal{L}_m(\phi_m)$ with respect to $\phi_m$;
Here the loss is the cross-entropy $\mathcal{L}_m(\phi_m) = -\frac{1}{N}\sum_{i=1}^{N} y_i^{\top}\log \hat{y}_i$, where $\hat{y}_i$ is the class probability distribution obtained by passing input $x_i$ through the model with task m's parameters $\phi_m$ and a softmax function; $y_i$ is the one-hot encoded label vector corresponding to input $x_i$; $N$ is the number of samples in the Query Set; and $\log$ denotes the natural logarithm.
The gradient of $\mathcal{L}_m(\phi_m)$ with respect to $\phi_m$ is typically calculated with the backpropagation algorithm: given the loss $\mathcal{L}_m(\phi_m)$ as a function of the model parameters $\phi_m$, backpropagation applies the chain rule to obtain $g = \partial \mathcal{L}_m(\phi_m) / \partial \phi_m$.
D. Multiply the gradient by the meta-network learning rate $\beta$ and update $\theta$, obtaining $\theta' = \theta - \beta g$. (Note that in FIG. 3 the arrow with the largest gray value is parallel to the arrow with the next-largest gray value, meaning that the update of $\theta$ is consistent with the direction of the gradient with respect to $\phi_m$.)
E. Sample 1 task n and assign the parameters $\theta'$ to task n, obtaining $\phi_n$ (initially $\phi_n = \theta'$);
F. Using the training data of task n, perform 1 optimization update on $\phi_n$ based on task n's learning rate $\alpha$;
G. Based on the once-updated $\phi_n$, calculate task n's loss $\mathcal{L}_n(\phi_n)$ using the Query Set and the gradient of $\mathcal{L}_n(\phi_n)$ with respect to $\phi_n$; the process is consistent with step C and is not repeated here.
H. Multiply the gradient by the meta-network learning rate $\beta$ and update $\theta'$, obtaining $\theta''$;
I. Repeat the process of steps A–H over the training tasks.
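Steps A–I can be condensed into the following PyTorch-style sketch (a first-order variant is assumed, consistent with the parallel-arrow remark in step D; sample_task, task.loss, and the hyperparameter names are illustrative assumptions rather than the patent's API):

```python
import torch

def meta_pretrain(theta, sample_task, alpha, beta, num_iterations):
    """First-order meta-pretraining: one inner Support-Set update per task,
    then a meta update of theta along the task's Query-Set gradient."""
    for _ in range(num_iterations):
        for _ in range(2):                       # the two sampled tasks m and n
            task = sample_task()                 # steps A / E: sample a task
            phi = [p.detach().clone().requires_grad_(True) for p in theta]  # phi := theta
            # Steps B / F: one optimization update on the Support Set.
            support_loss = task.loss(phi, task.support_set)
            grads = torch.autograd.grad(support_loss, phi)
            phi = [p - alpha * g for p, g in zip(phi, grads)]
            # Steps C / G: Query-Set (cross-entropy) loss and its gradient
            # with respect to phi, computed by backpropagation.
            query_loss = task.loss(phi, task.query_set)
            query_grads = torch.autograd.grad(query_loss, phi)
            # Steps D / H: move theta along the task gradient, scaled by beta.
            with torch.no_grad():
                theta = [t - beta * g for t, g in zip(theta, query_grads)]
    return theta                                 # stores the "prior knowledge"
```

Here `task.loss(params, data)` is assumed to run the network functionally with the given parameter list and return a scalar loss tensor.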
2) Continual training phase, as shown in FIG. 4:
a) Load the data set of the continual learning stage;
b) The constructed tasks are divided into training tasks (Train Tasks) and test tasks (Test Tasks). Specifically, each task contains its own training data and test data; both training and test tasks are generated from the data set loaded in the continual learning stage.
c) The network structure of the continual learning stage is the same as that of the pre-training stage, and the parameters obtained by the client in the pre-training stage are used as the initialization parameters $\theta$ of the meta network;
d) Input training data, calculate its loss $\mathcal{L}_t$, and compare it with the previous loss $\mathcal{L}_{t-1}$;
e) If $\mathcal{L}_t - \mathcal{L}_{t-1} < \gamma$ (the preset threshold), no task drift has occurred: the fast-adaptation parameters $\phi$ are further fine-tuned and the training data are added to the local buffer. Otherwise, task drift has occurred: data are first taken out of the local buffer to serve as the training data and test data of the continual learning stage. The training data are used to update the fast-adaptation parameters $\phi$; the test data are used to recompute the learning rate, with which the local meta parameters $\theta$ are updated. The local buffer is then reset, and the fast-adaptation parameters $\phi$ are re-initialized as in step c);
f) Repeat steps a)–e) over the training tasks;
g) Finally, send the trained parameters $\theta$ to the server.
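A sketch of this continual phase under the same assumptions (loss_fn(params, batch) returning a scalar tensor, a fixed meta learning rate beta standing in for the recomputed learning rate of step e), and an even split of the buffer into replayed training and test data are all illustrative simplifications):

```python
import torch

def continual_train(theta, data_stream, loss_fn, alpha, beta, gamma):
    """Client continual phase: fine-tune phi while the task is stationary;
    on task drift, replay the buffer, update theta, and reset."""
    def sgd_step(params, batches, lr):
        loss = sum(loss_fn(params, b) for b in batches)
        grads = torch.autograd.grad(loss, params)
        return [(p - lr * g).detach().requires_grad_(True)
                for p, g in zip(params, grads)]

    phi = [t.detach().clone().requires_grad_(True) for t in theta]  # fast params
    buffer, prev_loss = [], None
    for batch in data_stream:
        curr_loss = loss_fn(phi, batch).item()             # step d): current loss
        if prev_loss is None or curr_loss - prev_loss < gamma:   # no task drift
            phi = sgd_step(phi, [batch], alpha)            # fine-tune fast params
            buffer.append(batch)                           # buffer current data
        else:                                              # task drift detected
            replay_train, replay_test = buffer[0::2], buffer[1::2]
            if replay_train:
                phi = sgd_step(phi, replay_train, alpha)   # update fast params
            if replay_test:                                # update local meta params
                meta_loss = sum(loss_fn(phi, b) for b in replay_test)
                meta_grads = torch.autograd.grad(meta_loss, phi)
                theta = [t - beta * g for t, g in zip(theta, meta_grads)]
            buffer = []                                    # reset the local buffer
            phi = [t.detach().clone().requires_grad_(True) for t in theta]  # re-init phi
        prev_loss = curr_loss
    return theta                                           # sent to the server
```

The structure mirrors steps d)–g): the loss-jump test, buffer accumulation while the task is stationary, buffer replay and meta-parameter update on drift, and the final return of the parameters that are uploaded to the server.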
Example two
It is an object of this embodiment to provide a meta-continuous federated learning method based on fast learning and knowledge accumulation.
The meta-continuous federated learning method based on fast learning and knowledge accumulation is built on the meta-continuous federated learning system based on fast learning and knowledge accumulation described above, and comprises:
sending the global model to be trained from the server side to a plurality of clients participating in federated learning;
receiving, at each client, the global model from the server, training the model on a local data set, and returning the trained model parameters to the server, where model training comprises pre-training based on federated meta-learning and continual training based on federated continual learning, the continual training is initialized with the model parameters obtained by pre-training, and during continual training the current training data are added to a local buffer if no task drift occurs, while if task drift occurs the data in the local buffer are used as training data for the continual training process and the local buffer is then reset;
receiving, at the server, the model parameters trained by the clients, aggregating the model parameters of the plurality of clients into an updated global model, and performing the next round of training until the iteration requirement is met.
The meta-continuous federated learning system and method based on fast learning and knowledge accumulation provided by the above embodiments are readily realizable and have broad application prospects.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it is intended to cover all modifications or variations within the scope of the invention as defined by the claims of the present invention.

Claims (5)

1. A meta-continuous federated learning system based on fast learning and knowledge accumulation, comprising:
a server side, configured to send the global model to be trained to a plurality of clients participating in federated learning, receive the model parameters trained by the clients, aggregate the model parameters of the plurality of clients into an updated global model, and execute the next round of training until the iteration requirement is met; and
a client side, configured to receive the global model from the server, train the model on a local data set, and return the trained model parameters to the server, wherein model training comprises pre-training based on federated meta-learning and continual training based on federated continual learning, the continual training is initialized with the model parameters obtained by pre-training, and during continual training the current training data are added to a local buffer if no task drift occurs, while if task drift occurs the data in the local buffer are used as training data for the continual training process and the local buffer is then reset;
wherein the pre-training based on federated meta-learning comprises a first training task, the first training task being constructed from sampled training data, with the model to be trained being trained on the training data set corresponding to the first training task; in the first training task, the loss of the current training task is calculated from the trained model parameters, together with the gradient of this loss with respect to the model parameters, and the initial model parameters are updated from this gradient and the learning rate to obtain updated model parameters;
the pre-training based on federated meta-learning further comprises a second training task: the second training task is constructed from resampled training data, the model parameters obtained from the first training task are taken as the initial model parameters of the current task, and the model to be trained is trained on the training data set corresponding to the second training task; in the second training task, the loss of the current training task is calculated from the trained model parameters, together with the gradient of this loss with respect to the model parameters, the initial model parameters are updated from this gradient and the learning rate to obtain updated model parameters, and the first and second training tasks are repeated until the iteration-ending condition is met; and
the continual training based on federated continual learning is specifically: the model is initialized with the model parameters obtained by pre-training, trained on the data set corresponding to the continual training to obtain the corresponding model parameters, and the corresponding model parameters are sent to the server side.
2. The meta-continuous federated learning system based on fast learning and knowledge accumulation according to claim 1, wherein whether task drift occurs is judged by comparing the loss computed in the current round of the continual learning stage with the loss computed in the previous round of training: if the difference between the two is smaller than a preset threshold, no task drift is judged to have occurred; if it is not smaller than the preset threshold, task drift is judged to have occurred.
3. The meta-continuous federated learning system based on fast learning and knowledge accumulation according to claim 1, wherein when no task drift occurs, the model parameters are fine-tuned and the current training data are added to the client's local buffer; when task drift occurs, data are taken from the local buffer as training data for continual training, the model parameters are updated with these data, and the local buffer is reset.
4. The meta-continuous federated learning system based on fast learning and knowledge accumulation according to claim 1, wherein the federated continual learning specifically adopts the Continual-MAML method.
5. A meta-continuous federated learning method based on fast learning and knowledge accumulation, based on the meta-continuous federated learning system based on fast learning and knowledge accumulation according to any one of claims 1 to 4, comprising:
sending the global model to be trained from the server side to a plurality of clients participating in federated learning;
receiving, at each client, the global model from the server, training the model on a local data set, and returning the trained model parameters to the server, wherein model training comprises pre-training based on federated meta-learning and continual training based on federated continual learning, the continual training is initialized with the model parameters obtained by pre-training, and during continual training the current training data are added to a local buffer if no task drift occurs, while if task drift occurs the data in the local buffer are used as training data for the continual training process and the local buffer is then reset; and
receiving, at the server, the model parameters trained by the clients, aggregating the model parameters of the plurality of clients into an updated global model, and performing the next round of training until the iteration requirement is met.
CN202410296683.3A 2024-03-15 2024-03-15 Meta-continuous federated learning system and method based on fast learning and knowledge accumulation Active CN117910600B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410296683.3A CN117910600B (en) 2024-03-15 2024-03-15 Meta-continuous federated learning system and method based on fast learning and knowledge accumulation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410296683.3A CN117910600B (en) 2024-03-15 2024-03-15 Meta-continuous federated learning system and method based on fast learning and knowledge accumulation

Publications (2)

Publication Number Publication Date
CN117910600A (en) 2024-04-19
CN117910600B (en) 2024-05-28

Family

ID=90684152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410296683.3A Active CN117910600B (en) Meta-continuous federated learning system and method based on fast learning and knowledge accumulation

Country Status (1)

Country Link
CN (1) CN117910600B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023168824A1 (en) * 2022-03-07 2023-09-14 北京工业大学 Mobile edge cache optimization method based on federated learning
WO2024027164A1 (en) * 2022-08-01 2024-02-08 浙江大学 Adaptive personalized federated learning method supporting heterogeneous model
CN116167084A (en) * 2023-02-24 2023-05-26 北京工业大学 Federal learning model training privacy protection method and system based on hybrid strategy
CN117292221A (en) * 2023-09-26 2023-12-26 山东省计算中心(国家超级计算济南中心) Image recognition method and system based on federal element learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ji Su Yoon; "Privacy-preserving continuous learning for MobileSAM via Federated Learning"; IEEE; 2023-11-20; full text *
Li Jian, Shao Yunfeng, Lu Yi, Wu Jun; "Federated learning and its applications in the telecommunications industry" (联邦学习及其在电信行业的应用); Information and Communications Technology and Policy (信息通信技术与政策); 2020-09-15 (No. 09); full text *

Also Published As

Publication number Publication date
CN117910600A (en) 2024-04-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant