CN117875453A - Client asynchronous federated learning method with adaptive partial training - Google Patents
Client asynchronous federated learning method with adaptive partial training
- Publication number
- CN117875453A (application CN202410121394.XA)
- Authority
- CN
- China
- Prior art keywords
- client
- training
- server
- model
- time consumption
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention provides a client asynchronous federated learning method with adaptive partial training, and relates to the technical field of distributed machine learning. The server randomly selects n clients and broadcasts the global model; each client executes complete model training on one data batch to estimate its time consumption and compute its loss value, and transmits the results to the server. The server normalizes the time consumption and loss values, calculates a comprehensive score for each client, selects the maximum time consumption among the top-k scoring clients, and thereby determines the aggregation interval. The partial training ratio and local training period of each client are then determined from the aggregation interval. After local training, each client uploads its model parameters to the server, and the server aggregates the client model parameters received within the aggregation interval into a new global model. These operations are repeated until the accuracy of the global model reaches the preset requirement. The invention can make full use of computing resources, ensure high participation of important clients, and accelerate aggregation of the global model.
Description
Technical Field
The invention relates to the technical field of distributed machine learning, in particular to a client asynchronous federated learning method with adaptive partial training.
Background
Currently, federated learning (FL) has become a promising paradigm for privacy-preserving distributed machine learning. The gist of FL is to keep each client's private data on the device and perform local model training on each client. A central server collects these locally trained models to update the global model, which is then broadcast to the clients for the next round of training. Federated learning not only preserves the data privacy of each client but can also create enormous social and economic value.
While federated learning algorithms perform well, most existing federated learning protocols are based on synchronous federated learning (SyncFL), meaning that in each round of training all clients (or a selected set of clients) update their local models based on the latest update broadcast by the server at the beginning of the round. However, due to communication imbalance and differences in hardware capabilities and training data distribution, the time consumption of local updates may differ significantly between devices, and some clients may even be temporarily disconnected during training. This leaves the server with two unsatisfactory choices: either wait for all participating clients in each round to complete local training and contribute to model aggregation (which causes significant delays because of stragglers), or wait only for the faster clients (which discards all work and contributions from the slower clients). These key challenges largely hinder the scalability of synchronous federated learning and make it difficult to apply in large-scale cross-device scenarios.
To address these challenges, recent research has proposed asynchronous federated learning (AsyncFL), which allows slower clients to continue training locally and to contribute to future aggregation rounds. AsyncFL decouples the clients' local training from the aggregation/updating of the global model, and only a specific client obtains the update from the cloud server at any given time, thereby reducing the influence of stragglers. The recent AsyncFL work FedBuff has the server perform gradient aggregation once the number of received local updates reaches a required threshold (an adjustable parameter called the aggregation goal) to generate the global model. Even if slower clients upload their updates later (as long as local training is completed), their updates are based on outdated information and may not be incorporated into the final model.
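For context, the buffered aggregation rule described above can be sketched as follows. This is an illustrative Python sketch only, not the FedBuff reference implementation; the buffer size K, the update format, and the simple averaging rule are assumptions made for the example.

```python
# Illustrative sketch of buffered asynchronous aggregation in the style of FedBuff:
# the server accumulates client updates and aggregates once the buffer holds K of them.
from typing import Dict, Iterable, List, Tuple

def buffered_async_server(global_model: Dict[str, float],
                          update_stream: Iterable[Dict[str, float]],
                          K: int = 10) -> Tuple[Dict[str, float], int]:
    buffer: List[Dict[str, float]] = []
    rounds = 0
    for update in update_stream:              # updates arrive asynchronously from clients
        buffer.append(update)                 # late (possibly stale) updates are still buffered
        if len(buffer) >= K:                  # aggregation goal reached
            for name in global_model:         # apply the mean of the buffered deltas
                global_model[name] += sum(u[name] for u in buffer) / len(buffer)
            buffer.clear()
            rounds += 1                       # a new global model is produced
    return global_model, rounds
```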
Since FedBuff accepts only a fixed number of local updates in each communication round to contribute to the global model, parallel computing efficiency is reduced: other completed local updates are blocked from entering the global aggregation and become stale updates, because they are deferred to the next round of global updates. In addition, the server's aggregation favors faster devices, which contribute in more training rounds, while slower devices cannot contribute at the same frequency. Even when slow devices do participate in global training, they occasionally send stale updates, which may affect the convergence of the global loss.
Partial model training can be viewed as an effective way to reduce the communication and computational load on clients in a federated learning system. For example, FedPrun proposes a method that prunes the global model for each client based on device performance, providing a smaller model for slow clients while fast clients train with a larger model. FedPT uses a partially trainable neural network on the client, reduces communication cost, and achieves faster training with less memory usage and little impact on model performance. Other studies also indicate that partial model training can save communication cost and memory usage in cross-device federated learning. However, these studies all keep each client's partial-model proportion fixed throughout the federated learning process, ignoring the unstable availability of each device during training.
Previous studies have indicated that selecting clients with high loss values to participate in federated learning training helps accelerate the convergence of the model, since high loss values may reflect how a client's data differ from the data of other clients. If the data distribution of some clients differs significantly from that of the others, their data may contain information that is particularly beneficial for improving the model. Selecting these clients for complete model training therefore integrates the information specific to their data more quickly, helping the model adapt to different data distributions and thus accelerating global model convergence.
It is therefore necessary to explore an asynchronous federated learning method with adaptive partial training based on each client's local loss value and local computing power, so that the computing resources of the clients are fully utilized while clients with high loss values complete full model training as far as possible, thereby increasing the convergence speed of the global model.
Disclosure of Invention
To overcome the deficiencies of the prior art, the invention provides a client asynchronous federated learning method with adaptive partial training. The partial training ratio of each client and the aggregation interval of the clients selected by the server in each round are determined according to the clients' local loss values and computing capabilities, so that computing resources are fully utilized, high participation of important clients is ensured, and global model aggregation is accelerated.
In order to solve the technical problems, the invention adopts the following technical scheme:
A client asynchronous federated learning method with adaptive partial training comprises the following specific steps:
step 1: the server randomly selects n clients and broadcasts the global model; each selected client performs complete model training on one data batch to estimate its time consumption and compute its loss value, and transmits the time consumption and loss value to the server;
step 2: a regularization method is adopted so that the time consumption and the loss value fall in the same value range;
step 3: the time consumption and local loss value of each client are normalized;
step 4: the server calculates a comprehensive score for each client, selects the maximum time consumption among the top-k scoring clients, and thereby determines the aggregation interval of the current global training round;
step 5: the partial training ratio and local training period of each client are determined according to the aggregation interval;
step 6: after local training, each client transmits its model parameters to the server, and the server aggregates the client model parameters received within the aggregation interval into a new global model; steps 1 to 6 are then repeated until the accuracy of the global model reaches the preset requirement (an illustrative sketch of this loop follows).
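For illustration only, the overall loop of steps 1 to 6 can be sketched in Python as follows; every helper passed in is an assumed callable supplied by the surrounding system, not part of the claimed method.

```python
# Structural sketch of steps 1-6; each argument is an assumed callable supplied
# by the surrounding system (selection, profiling, normalization, interval choice,
# partial-training planning, asynchronous local training, aggregation).
def adaptive_partial_async_fl(select_clients, profile_one_batch, normalize,
                              choose_interval, plan_partial_training,
                              train_and_upload, aggregate_within,
                              global_accuracy, target_accuracy):
    while global_accuracy() < target_accuracy:              # repeat until the preset accuracy
        clients = select_clients()                           # step 1: randomly select n clients
        profiles = [profile_one_batch(c) for c in clients]   # one-batch time + loss per client
        norm_profiles = normalize(profiles)                  # steps 2-3: bring values into one range
        T_k = choose_interval(norm_profiles)                 # step 4: aggregation interval T_k
        for client, p in zip(clients, norm_profiles):
            alpha, epochs = plan_partial_training(p, T_k)    # step 5: ratio and local epochs
            train_and_upload(client, alpha, epochs)          # step 6: clients train asynchronously
        aggregate_within(T_k)                                # merge updates received within T_k
```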
Further, the specific method of the step 1 is as follows:
step 101: the server selects n clients through a random selection algorithm and broadcasts a global model;
step 102: each client executes complete model training on one data batch and records the real-time computation time t_cmp; the unit computation time is t_unit = t_cmp / β, where β is the ratio of the number of batches trained to the total number of batches; the local communication time is t_com = M / Bw, where M is the file size of the model and Bw is the real-time network bandwidth of the device; the local loss value Loss(x) is computed from the cross-entropy loss function; Loss(x), t_unit and t_com are then sent to the server.
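A minimal sketch of the step-102 measurements, using the formulas as reconstructed above (t_unit = t_cmp / β and t_com = M / Bw); the callable train_one_batch and the argument names are illustrative assumptions.

```python
import time

def profile_client(train_one_batch, num_total_batches, model_size_bytes, bandwidth_bytes_per_s):
    """Step-102 measurements: one-batch training time, unit computation time
    t_unit = t_cmp / beta, communication time t_com = M / Bw, and the batch loss."""
    start = time.time()
    loss = train_one_batch()                           # one complete forward/backward pass
    t_cmp = time.time() - start                        # real-time computation time for one batch
    beta = 1.0 / num_total_batches                     # batches trained / total number of batches
    t_unit = t_cmp / beta                              # unit computation time (one full pass)
    t_com = model_size_bytes / bandwidth_bytes_per_s   # local communication time
    return t_unit, t_com, float(loss)                  # sent to the server together
```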
Further, in the step 2, a regularization method is adopted so that the time consumption and the loss value fall in the same range; the regularization formula is the min-max normalization

x_norm = (x - min_val) / (max_val - min_val)

wherein max_val represents the maximum value and min_val the minimum value of the corresponding index over the clients;
further, in the step 3, the server calculates the time consumption of each clientAnd normalize it to +.>Normalize the loss value to +.>
Further, in the step 4, since clients with high loss values help accelerate convergence of the global model, a score F_i is defined for client i according to its time consumption and loss value, as follows:

F_i = λ1 · t_i^norm + λ2 · Loss_i^norm, with λ1 + λ2 = 1;

wherein λ1 and λ2 are importance weights for the time consumption and the loss value, respectively;

the average of all client scores is calculated, the client scores F_i are sorted in ascending order and compared with the average in turn, and the local time consumption of the client k whose score is closest to the average is set as T_k.
Further, the specific method in the step 5 is as follows:
step 501: if the time consumption of client i is greater than T_k, the server reduces its partial training ratio α_i so that the time consumption of clients exceeding T_k is reduced and such clients can still upload their model parameters in time; the partial training ratio α_i of each such client is calculated as

α_i = (T_k - t_i^com) / t_i^unit

wherein t_i^unit is the unit computation time of the i-th client and t_i^com is the communication time of the i-th client;
step 502: if the time consumption of the client is less than T_k, the server sets the partial training ratio α_i to 1 and the client performs as many local training periods E_i as possible; E_i is calculated as

E_i = floor((T_k - t_i^com) / t_i^unit).
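An illustrative sketch of steps 501 and 502, using the reconstructed (assumed) formulas for α_i and E_i; clamping α_i to be non-negative and the minimum of one training period are added assumptions.

```python
import math

def partial_plan(T_k, t_unit, t_com):
    """Steps 501-502: if a client cannot finish a full pass within T_k, shrink its
    partial training ratio alpha = (T_k - t_com) / t_unit; otherwise train the full
    model for E = floor((T_k - t_com) / t_unit) local periods."""
    budget = T_k - t_com                       # time left for computation
    if budget < t_unit:                        # slower than T_k: train only part of the model
        alpha = max(budget / t_unit, 0.0)
        epochs = 1
    else:                                      # faster than T_k: full model, several periods
        alpha = 1.0
        epochs = max(int(math.floor(budget / t_unit)), 1)
    return alpha, epochs
```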
further, in the step 6, the client with the local training ratio smaller than 1 trains the partial model, freezes the parameters of the partial layer of the global model, and the frozen parameters are only used for forward propagation and not for backward propagation; the client with training ratio equal to 1, according to the local training period E in step 502 i The computing resources of each client are utilized as much as possible; each client then sends the locally trained model parameters to the server.
The beneficial effects of the above technical solution are as follows. In the client asynchronous federated learning method with adaptive partial training, the partial training ratio of each client and the aggregation interval of the clients selected by the server in each round are determined according to the clients' local loss values and computing capabilities, so that computing resources are fully utilized, high participation of important clients is ensured, and global model aggregation is accelerated. Determining the partial training ratio and local training period of each client from the aggregation time interval alleviates the low participation of clients with weak computing capability. Computing each client's score from its loss value and time consumption to determine the time interval ensures that clients with high loss values, which help accelerate convergence of the global model, complete model training as far as possible, thereby shortening the global model aggregation time.
Drawings
FIG. 1 is a flowchart of the client asynchronous federated learning method with adaptive partial training provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of the client asynchronous federated learning method with adaptive partial training according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the partial model trained by each client according to an embodiment of the present invention.
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
As shown in fig. 1 and 2, the method of the present embodiment is as follows.
Step 1: the server randomly selects n clients and broadcasts the global model; each selected client performs complete model training on one data batch to estimate its time consumption and compute its loss value, and transmits the time consumption and loss value to the server.
Step 101: the server selects n clients by a random selection algorithm and broadcasts the global model.
Step 102: each client executes complete model training on one data batch and records the real-time computation time t_cmp; the unit computation time is t_unit = t_cmp / β, where β is the ratio of the number of batches trained to the total number of batches; the local communication time is t_com = M / Bw, where M is the file size of the model and Bw is the real-time network bandwidth of the device; the local loss value Loss(x) is computed from the cross-entropy loss function; Loss(x), t_unit and t_com are then sent to the server.
Step 2: after receiving the unit computation time, local communication time and loss value of each client, the server adopts a regularization method so that the time consumption and the loss value fall in the same range; the regularization formula is the min-max normalization

x_norm = (x - min_val) / (max_val - min_val)

wherein max_val represents the maximum value and min_val the minimum value of the corresponding index over the clients.
Step 3: the server normalizes the time consumption and local loss value of each client: it calculates the total time consumption of each client, t_i = t_i^unit + t_i^com, normalizes it to t_i^norm, and normalizes the loss value to Loss_i^norm, for later use in computing each client's score.
step 4: the server calculates a comprehensive score for each client, selects the maximum time consumption among the top-k scoring clients, and thereby determines the aggregation interval of the current global training round. Since clients with high loss values can accelerate global model convergence, a score F_i is defined for client i according to its time consumption and loss value, as follows:

F_i = λ1 · t_i^norm + λ2 · Loss_i^norm, with λ1 + λ2 = 1;

wherein λ1 and λ2 are importance weights for the time consumption and the loss value, respectively.

The average of all client scores is calculated, the scores F_i are sorted in ascending order and compared with the average in turn, and the local time consumption of the client k whose score is closest to the average is set as T_k.
Step 5: as shown in fig. 3, the server determines the partial training ratio and local training period of each client according to its time consumption and the aggregation time interval T_k.
Step 501: if the time consumption of the i-th client is greater than T_k, the server reduces its partial training ratio α_i so that the time consumption of clients exceeding T_k is reduced and such clients can still upload their model parameters in time; the partial training ratio α_i of each such client is calculated as

α_i = (T_k - t_i^com) / t_i^unit

wherein t_i^unit is the unit computation time of the i-th client and t_i^com is the communication time of the i-th client.
Step 502: if the time consumption of the client is less than T_k, the server sets the partial training ratio α_i to 1 and the client performs as many local training periods E_i as possible; E_i is calculated as

E_i = floor((T_k - t_i^com) / t_i^unit).
step 6: after local training, each client transmits its model parameters to the server, and the server aggregates the client model parameters received within the aggregation interval into a new global model; steps 1 to 6 are then repeated until the accuracy of the global model reaches the preset requirement.
A client whose partial training ratio is less than 1 trains only part of the model: the parameters of part of the layers of the global model are frozen, and the frozen parameters are used only for forward propagation, not for backward propagation, thereby reducing time consumption. A client whose training ratio equals 1 trains for the local training periods E_i determined in step 502, utilizing its computing resources as fully as possible. Each client then sends its locally trained model parameters to the server, and the server aggregates the client model parameters received within the aggregation interval into a new global model. Steps 1 to 6 are then repeated until the accuracy of the global model reaches the preset requirement.
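An illustrative sketch of the server-side aggregation over one interval, assuming a FedAvg-style unweighted mean of the parameters received within the interval; any staleness or data-size weighting is omitted.

```python
def aggregate_within_interval(global_state, received_states):
    """Average the model parameters received within one aggregation interval
    into a new global model (FedAvg-style unweighted mean)."""
    if not received_states:
        return dict(global_state)              # nothing arrived: keep the old global model
    return {name: sum(s[name] for s in received_states) / len(received_states)
            for name in global_state}
```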
The CIFAR-10 dataset contains 60000 color images in 10 categories, comprising 50000 training images and 10000 test images. To simulate realistic non-independent and identically distributed data in a federated learning scenario, the data were partitioned across 128 devices/clients using a Dirichlet distribution with α equal to 0.1. ResNet-20 is a model suited to image classification tasks, particularly on smaller image datasets such as CIFAR-10, and was used for the evaluation. Compared with the conventional method FedBuff, when the FedAvg aggregation function is used, the method of this embodiment takes about 5.5 hours for the global model to reach 60% accuracy, whereas FedBuff takes about 7.8 hours; the method of this embodiment thus saves about 30% of the total time to reach 60% global model accuracy, a significant improvement over the conventional method.
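A sketch of the non-IID partitioning described above (Dirichlet distribution with α = 0.1 over 128 clients); the per-class splitting procedure and the random seed are illustrative assumptions.

```python
import numpy as np

def dirichlet_partition(labels: np.ndarray, num_clients: int = 128,
                        alpha: float = 0.1, seed: int = 0):
    """Split sample indices across clients: for every class, draw client
    proportions from Dirichlet(alpha) and assign that class's samples accordingly."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions) * len(idx)).astype(int)[:-1]
        for client_id, part in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(part.tolist())
    return client_indices
```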
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions, which are defined by the scope of the appended claims.
Claims (7)
1. A client asynchronous federated learning method with adaptive partial training, characterized in that the method comprises the following specific steps:
step 1: the server randomly selects n clients and broadcasts the global model; each selected client performs complete model training on one data batch to estimate its time consumption and compute its loss value, and transmits the time consumption and loss value to the server;
step 2: a regularization method is adopted so that the time consumption and the loss value fall in the same value range;
step 3: the time consumption and local loss value of each client are normalized;
step 4: the server calculates a comprehensive score for each client, selects the maximum time consumption among the top-k scoring clients, and thereby determines the aggregation interval of the current global training round;
step 5: the partial training ratio and local training period of each client are determined according to the aggregation interval;
step 6: after local training, each client transmits its model parameters to the server, and the server aggregates the client model parameters received within the aggregation interval into a new global model; steps 1 to 6 are then repeated until the accuracy of the global model reaches the preset requirement.
2. The client asynchronous federated learning method with adaptive partial training of claim 1, wherein the specific method of the step 1 is as follows:
step 101: the server selects n clients through a random selection algorithm and broadcasts a global model;
step 102: each client executes complete model training on one data batch and records the real-time computation time t_cmp; the unit computation time is t_unit = t_cmp / β, where β is the ratio of the number of batches trained to the total number of batches; the local communication time is t_com = M / Bw, where M is the file size of the model and Bw is the real-time network bandwidth of the device; the local loss value Loss(x) is computed from the cross-entropy loss function; Loss(x), t_unit and t_com are then sent to the server.
3. The client asynchronous federated learning method with adaptive partial training of claim 1, wherein: in the step 2, a regularization method is adopted so that the time consumption and the loss value fall in the same range, the regularization formula being the min-max normalization

x_norm = (x - min_val) / (max_val - min_val)

wherein max_val represents the maximum value and min_val the minimum value of the corresponding index over the clients.
4. The client asynchronous federated learning method with adaptive partial training of claim 2, wherein: in the step 3, the server calculates the total time consumption of each client, t_i = t_i^unit + t_i^com, normalizes it to t_i^norm, and normalizes the loss value to Loss_i^norm.
5. The client asynchronous federated learning method with adaptive partial training of claim 4, wherein: in the step 4, since clients with high loss values help accelerate convergence of the global model, a score F_i is defined for client i according to its time consumption and loss value, as follows:

F_i = λ1 · t_i^norm + λ2 · Loss_i^norm, with λ1 + λ2 = 1;

wherein λ1 and λ2 are importance weights for the time consumption and the loss value, respectively;

the average of all client scores is calculated, the scores F_i are sorted in ascending order and compared with the average in turn, and the local time consumption of the client k whose score is closest to the average is set as T_k.
6. The client asynchronous federated learning method with adaptive partial training of claim 5, wherein the specific method of the step 5 is as follows:

step 501: if the time consumption of client i is greater than T_k, the server reduces its partial training ratio α_i so that the time consumption of clients exceeding T_k is reduced and such clients can still upload their model parameters in time; the partial training ratio α_i of each such client is calculated as

α_i = (T_k - t_i^com) / t_i^unit

wherein t_i^unit is the unit computation time of the i-th client and t_i^com is the communication time of the i-th client;

step 502: if the time consumption of the client is less than T_k, the server sets the partial training ratio α_i to 1 and the client performs as many local training periods E_i as possible; E_i is calculated as

E_i = floor((T_k - t_i^com) / t_i^unit).
7. The client asynchronous federated learning method with adaptive partial training of claim 6, wherein: in the step 6, a client whose partial training ratio is less than 1 trains only part of the model: the parameters of part of the layers of the global model are frozen, and the frozen parameters are used only for forward propagation, not for backward propagation; a client whose training ratio equals 1 trains for the local training periods E_i determined in step 502, so that the computing resources of each client are utilized as fully as possible; each client then sends its locally trained model parameters to the server.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410121394.XA (CN117875453A) | 2024-01-29 | 2024-01-29 | Client asynchronous federated learning method with adaptive partial training |
Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410121394.XA (CN117875453A) | 2024-01-29 | 2024-01-29 | Client asynchronous federated learning method with adaptive partial training |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN117875453A (en) | 2024-04-12 |
Family
ID=90590168
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410121394.XA (CN117875453A, Pending) | Client asynchronous federated learning method with adaptive partial training | 2024-01-29 | 2024-01-29 |
Country Status (1)

| Country | Link |
|---|---|
| CN | CN117875453A (en) |
- 2024-01-29: CN application CN202410121394.XA (publication CN117875453A (en)), status Pending
Similar Documents

| Publication | Title |
|---|---|
| CN113112027B | Federal learning method based on dynamic adjustment of model aggregation weight |
| CN111160525B | Task unloading intelligent decision-making method based on unmanned aerial vehicle group in edge computing environment |
| CN111382844B | Training method and device for deep learning model |
| CN113206887A | Method for accelerating federal learning aiming at data and equipment isomerism under edge calculation |
| CN114584581A | Federal learning system and federal learning training method for smart city Internet of things and letter fusion |
| CN115587633A | Personalized federal learning method based on parameter layering |
| CN114169543B | Federal learning method based on model staleness and user participation perception |
| CN115374853A | Asynchronous federal learning method and system based on T-Step polymerization algorithm |
| CN116363449A | Image recognition method based on hierarchical federal learning |
| CN116471286A | Internet of things data sharing method based on block chain and federal learning |
| CN113691594A | Method for solving data imbalance problem in federal learning based on second derivative |
| CN117994635B | Federal element learning image recognition method and system with enhanced noise robustness |
| CN115544873A | Training efficiency and personalized effect quantitative evaluation method for personalized federal learning |
| CN115130683A | Asynchronous federal learning method and system based on multi-agent model |
| Zhou et al. | Fedaca: An adaptive communication-efficient asynchronous framework for federated learning |
| Yan et al. | Federated learning model training method based on data features perception aggregation |
| CN109413746B | Optimized energy distribution method in communication system powered by hybrid energy |
| CN117875453A (en) | Client asynchronous federated learning method with adaptive partial training |
| CN116776997A | Federal learning model construction method under non-independent co-distribution environment |
| CN114401192B | Multi-SDN controller cooperative training method |
| Zhang et al. | Optimizing federated edge learning on Non-IID data via neural architecture search |
| CN115695429A | Non-IID scene-oriented federal learning client selection method |
| CN114118444A | Method for reducing equipment idle running time in federal learning by using heuristic algorithm |
| CN117811846B | Network security detection method, system, equipment and medium based on distributed system |
| CN117812564B | Federal learning method, device, equipment and medium applied to Internet of vehicles |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |