CN117875453A - Client asynchronous federated learning method with adaptive partial training - Google Patents

Client asynchronous federated learning method with adaptive partial training

Info

Publication number
CN117875453A
Authority
CN
China
Prior art keywords
client
training
server
model
time consumption
Prior art date
Legal status
Pending
Application number
CN202410121394.XA
Other languages
Chinese (zh)
Inventor
周起航
孙倩文
乔建忠
Original Assignee
东北大学
Priority date
Filing date
Publication date
Application filed by 东北大学 (Northeastern University)
Priority to CN202410121394.XA
Publication of CN117875453A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning


Abstract

The invention provides a client asynchronous federated learning method with adaptive partial training and relates to the technical field of distributed machine learning. The server randomly selects n clients and broadcasts the global model; each client executes full model training on one data batch to estimate its time consumption and compute its loss value, and sends both to the server. The server normalizes the time consumption and loss values, computes a composite score for each client, takes the maximum time consumption among the top-k scored clients, and thereby determines the aggregation interval. The partial training ratio and the local training period of each client are then determined from the aggregation interval. After local training, each client uploads its model parameters to the server, and the server aggregates the client parameters received within the aggregation interval into a new global model. These operations are repeated until the accuracy of the global model reaches the preset requirement. The invention makes full use of client computing resources, ensures high participation of important clients, and accelerates aggregation of the global model.

Description

Client asynchronous federated learning method with adaptive partial training
Technical Field
The invention relates to the technical field of distributed machine learning, and in particular to a client asynchronous federated learning method with adaptive partial training.
Background
Currently, federated learning (FL) has become a promising privacy-preserving paradigm for distributed machine learning. The gist of FL is to keep each client's private data on its own device and to perform model training locally on each client. A central server collects these locally trained models to update the global model, which is then broadcast for the next round of training. Federated learning not only protects the data privacy of each client but can also create enormous social and economic value.
Although federated learning algorithms perform well, most existing federated learning protocols are based on synchronous federated learning (SyncFL), meaning that in each training round all clients (or a selected set of clients) update their local models from the latest global model broadcast by the server at the start of the round. However, owing to communication imbalance and differences in hardware capability and training data distribution, the time consumed by local updates can differ greatly across devices, and some clients may even be temporarily disconnected during training. This leaves the server with two unattractive choices: wait for all participating clients in each round to finish local training and contribute to model aggregation, which causes significant delays because of stragglers; or wait only for a portion of the faster clients, which discards all the work and contributions of the slower clients. These key challenges severely limit the scalability of synchronous federated learning and make it difficult to apply in large-scale cross-device scenarios.
To address these challenges, recent research has proposed asynchronous federated learning (AsyncFL), which allows slower clients to continue training locally and to contribute to future aggregation rounds. AsyncFL decouples the clients' local training from the aggregation and updating of the global model, and only specific clients fetch the latest update from the cloud server at any given time, which reduces the influence of stragglers. A recent AsyncFL method, FedBuff, has the server perform gradient aggregation to generate a new global model once the number of received local updates reaches a required threshold (an adjustable parameter called the aggregation goal). Even when slower clients eventually upload their updates after finishing local training, those updates are based on outdated information and may not be incorporated into the final model.
Because FedBuff accepts only a fixed number of local updates per communication round, it reduces parallel computing efficiency: other completed local updates are blocked from the current global aggregation and become stale, since they are deferred to the next round of global updates. In addition, the server aggregation favors faster devices, which contribute more rounds of training, while slower devices cannot contribute at the same frequency. Even when slow devices do participate in global training, they occasionally submit outdated updates, which can hinder the convergence of the global loss.
Partial model training can be viewed as an effective way to reduce the communication and computation load on the clients of a federated learning system. For example, FedPrun prunes the global model for each client according to device performance, providing a smaller model for slow clients while fast clients train a larger model. FedPT uses a partially trainable neural network on the client, reducing communication cost and achieving faster training with less memory usage and little impact on model performance. Other studies likewise show that partial model training saves communication cost and memory usage in cross-device federated learning. However, all of these works keep each client's partial-model proportion fixed throughout the federated learning process, ignoring the unstable availability of each device during training.
Previous studies have indicated that selecting clients with high loss values to participate in federated learning training helps accelerate model convergence, because a high loss value may reflect how a client's data differs from the data of other clients. If the data distribution of some clients differs significantly from the others, that data may contain information that is particularly valuable for improving the model. Selecting these clients for complete model training therefore integrates the information specific to their data more quickly, helping the model adapt to different data distributions and thereby speeding up global model convergence.
Therefore, it is necessary to explore an asynchronous federated learning method with adaptive partial training based on the local loss value and local computing power of each client, so that the computing resources of the clients are fully utilized while clients with high loss values complete full model training as far as possible, increasing the convergence speed of the global model.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a client asynchronous federated learning method with adaptive partial training, in which the partial training ratio of each client and the aggregation interval of the clients selected by the server in each round are determined according to the clients' local loss values and computing capabilities, so that computing resources are fully utilized, high participation of important clients is ensured, and global model aggregation is accelerated.
In order to solve the technical problems, the invention adopts the following technical scheme:
a client asynchronous federation learning method with self-adaptive partial training comprises the following specific steps:
step 1: the server randomly selects n clients and broadcasts a global model; each selected client performs full model training on one data batch to estimate its time consumption and compute the loss value of that batch, and transmits the time consumption and loss value to the server together;
step 2: a normalization method is adopted so that the time consumption and the loss value lie in the same range;
step 3: the time consumption and the local loss value of each client are normalized;
step 4: the server computes a composite score for each client, selects the maximum time consumption among the top-k scored clients, and thereby determines the aggregation interval of the current global training round;
step 5: the partial training ratio and the local training period of each client are determined according to the aggregation interval;
step 6: after local training, each client uploads its model parameters to the server, and the server aggregates the client model parameters received within the aggregation interval into a new global model; the operations of steps 1 to 6 are then repeated until the accuracy of the global model reaches the preset requirement.
Further, the specific method of the step 1 is as follows:
step 101: the server selects n clients through a random selection algorithm and broadcasts a global model;
step 102: each selected client collects the real-time computation time t_cmp of full model training on one data batch and computes the unit computation time t_unit from t_cmp and β, where β is the ratio of the number of batches trained to the total number of batches; it computes the local communication time t_com = M/Bw, where M is the file size of the model and Bw is the real-time network bandwidth of the device; it computes the local loss value Loss(x) with the cross-entropy loss function; and it sends Loss(x), t_unit and t_com to the server.
Further, in step 2, a normalization method is adopted so that the time consumption and the loss value lie in the same range; the normalization formula is val_norm = (val - min_val) / (max_val - min_val), where max_val represents the maximum value and min_val the minimum value of the corresponding metric across the clients;
further, in the step 3, the server calculates the time consumption of each clientAnd normalize it to +.>Normalize the loss value to +.>
Further, in step 4, since clients with high loss values help accelerate global model convergence, a score F_i is defined for client i from its time consumption and loss value as a weighted combination of the normalized time consumption and the normalized loss value, the weights satisfying
λ_1 + λ_2 = 1,
where λ_1 and λ_2 are importance weights for the time consumption and the loss value respectively;
the average of all client scores is computed, the scores F_i are sorted from small to large and compared in order with the average, and the local time consumption of the client k whose score is closest to the average is set as T_k.
Further, the specific method in the step 5 is as follows:
step 501: if the time consumption of client i is greater than T_k, the server reduces its partial training ratio α_i so that the reduced time consumption fits within T_k and the client can still upload its model parameters in time; α_i is computed from T_k, the unit computation time of the i-th client and the communication time of the i-th client;
step 502: if the time consumption of client i is less than T_k, the server sets its partial training ratio α_i to 1 and has it perform as many local training periods E_i as possible; E_i is computed from T_k, the unit computation time and the communication time of the i-th client.
further, in the step 6, the client with the local training ratio smaller than 1 trains the partial model, freezes the parameters of the partial layer of the global model, and the frozen parameters are only used for forward propagation and not for backward propagation; the client with training ratio equal to 1, according to the local training period E in step 502 i The computing resources of each client are utilized as much as possible; each client then sends the locally trained model parameters to the server.
The beneficial effects of the above technical scheme are as follows: the client asynchronous federated learning method with adaptive partial training determines each client's partial training ratio and the aggregation interval of the clients selected by the server in each round according to the clients' local loss values and computing capabilities, so that computing resources are fully utilized, high participation of important clients is ensured, and global model aggregation is accelerated. Determining the partial training ratio and the local training period of each client from the aggregation interval alleviates the low participation of devices with weak computing capability. Computing each client's score from its loss value and time consumption to determine the interval ensures that high-loss clients, which help the global model converge, complete model training as far as possible, thereby shortening the time to global model aggregation.
Drawings
FIG. 1 is a flowchart of a client asynchronous federated learning method with adaptive partial training provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of a client asynchronous federated learning method with adaptive partial training according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the partial model trained by each client according to an embodiment of the present invention.
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
As shown in fig. 1 and 2, the method of the present embodiment is as follows.
Step 1: the server randomly selects n clients and broadcasts a global model; each selected client performs full model training on one data batch to estimate its time consumption and compute its loss value, and transmits both to the server.
Step 101: the server selects n clients by a random selection algorithm and broadcasts the global model.
Step 102: each selected client collects the real-time computation time t_cmp of full model training on one data batch and computes the unit computation time t_unit from t_cmp and β, where β is the ratio of the number of batches trained to the total number of batches; it computes the local communication time t_com = M/Bw, where M is the file size of the model and Bw is the real-time network bandwidth of the device; it computes the local loss value Loss(x) with the cross-entropy loss function; and it sends Loss(x), t_unit and t_com to the server.
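As an illustration of step 102, the quantities each client reports can be collected as in the following sketch. The exact expression for the unit computation time (taken here as t_cmp/β) is an assumption made for this example, as are the function and argument names; only t_com = M/Bw and the cross-entropy loss follow directly from the description.

```python
import time
import torch.nn.functional as F

def profile_one_batch(model, batch_x, batch_y, optimizer,
                      beta, model_file_bytes, bandwidth_bytes_per_s):
    """Full model training on one data batch, reporting the quantities of step 102.

    beta is the ratio of the number of batches trained to the total number of
    batches; estimating the unit computation time as t_cmp / beta is an assumption.
    """
    start = time.time()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(batch_x), batch_y)   # local loss value Loss(x)
    loss.backward()
    optimizer.step()
    t_cmp = time.time() - start                       # real-time computation time

    t_unit = t_cmp / beta                             # unit computation time (assumed form)
    t_com = model_file_bytes / bandwidth_bytes_per_s  # local communication time M / Bw
    return {"loss": loss.item(), "t_unit": t_unit, "t_com": t_com}
```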
Step 2: after receiving the unit computation time, local communication time and loss value of each client, the server applies a normalization method so that the time consumption and the loss value lie in the same range; the normalization formula is val_norm = (val - min_val) / (max_val - min_val), where max_val represents the maximum value and min_val the minimum value of the corresponding metric across the clients.
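The normalization of step 2 is ordinary min-max scaling and can be written as a small helper; the helper name and the handling of the degenerate case where all clients report the same value are added assumptions.

```python
def min_max_normalize(values):
    """Scale a list of client metrics into [0, 1] via (x - min_val) / (max_val - min_val)."""
    min_val, max_val = min(values), max(values)
    if max_val == min_val:           # all clients identical: assumed to map to 0
        return [0.0 for _ in values]
    return [(v - min_val) / (max_val - min_val) for v in values]

# Example: time consumption and loss values are normalized separately.
times_norm  = min_max_normalize([12.4, 30.1, 18.7, 25.0])
losses_norm = min_max_normalize([2.31, 1.05, 1.80, 2.90])
```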
Step 3: the server computes the time consumption of each client from its unit computation time and communication time, normalizes it, and normalizes the local loss value in the same way, for later use in computing each client's score.
step 4: the server calculates the comprehensive scores of all the clients, so that the maximum time consumption in the clients with the scores of k ranked top is selected, and the aggregation interval of the global training current turn is determined. Since the client has a high loss value, global model convergence can be accelerated, the client with a high loss value defines a score F for client i according to the client time consumption and the loss value i The following formula:
λ 12 =1;
wherein lambda is 1 、λ 2 An importance index representing the time consumption and loss values.
Calculating the average value of all the client scores, and grading F of each client i Sorting from small to large, comparing the scores sequentially with the average value according to the score sorting, and setting the local time consumption of the client k with the score closest to the average value in each client as T k
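A sketch of step 4 follows, using the variant described here in which the aggregation interval T_k is the local time consumption of the client whose score is closest to the average score. The concrete score, a convex combination λ_1 times the normalized time plus λ_2 times the normalized loss, and the tie-breaking are assumptions made for this example.

```python
def aggregation_interval(times_norm, losses_norm, times_raw,
                         lambda1=0.5, lambda2=0.5):
    """Score every client and return T_k, the local time consumption of the
    client whose score is closest to the average score (step 4)."""
    assert abs(lambda1 + lambda2 - 1.0) < 1e-9           # lambda_1 + lambda_2 = 1
    scores = [lambda1 * t + lambda2 * l                  # assumed convex combination
              for t, l in zip(times_norm, losses_norm)]
    avg = sum(scores) / len(scores)
    # Sort client indices by score from small to large, then pick the client
    # whose score is closest to the average, as in the description.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = min(order, key=lambda i: abs(scores[i] - avg))
    return times_raw[k]

T_k = aggregation_interval(times_norm=[0.00, 1.00, 0.36, 0.71],
                           losses_norm=[0.68, 0.00, 0.41, 1.00],
                           times_raw=[12.4, 30.1, 18.7, 25.0])
```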
Step 5: as shown in fig. 3, the server determines each client's partial training ratio and local training period according to its time consumption and the aggregation interval T_k.
Step 501: if the time consumption of the i-th client is greater than T_k, the server reduces its partial training ratio α_i so that the reduced time consumption fits within T_k and the client can still upload its model parameters in time; α_i is computed from T_k, the unit computation time of the i-th client and the communication time of the i-th client.
Step 502: if the time consumption of the i-th client is less than T_k, the server sets its partial training ratio α_i to 1 and has it perform as many local training periods E_i as possible; E_i is computed from T_k, the unit computation time and the communication time of the i-th client.
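One plausible instantiation of steps 501 and 502 is sketched below: the budget T_k is assumed to cover both computation and upload time, t_unit is read as the time of one full local training pass, α_i is chosen so that the reduced training still fits the budget, and E_i is the number of whole passes a fast client can fit within it. The exact expressions and names are assumptions for this sketch.

```python
import math

def plan_local_training(t_unit, t_com, T_k):
    """Return (alpha_i, E_i) for one client (steps 501 and 502), assuming t_unit is
    the time of one full local training pass and T_k must also cover the upload t_com."""
    if t_unit + t_com > T_k:
        # Step 501: slow client -> shrink the partial training ratio so that
        # alpha_i * t_unit + t_com still fits inside the aggregation interval.
        alpha_i = max(0.0, min(1.0, (T_k - t_com) / t_unit))
        E_i = 1
    else:
        # Step 502: fast client -> train the full model for as many passes as fit.
        alpha_i = 1.0
        E_i = max(1, math.floor((T_k - t_com) / t_unit))
    return alpha_i, E_i

# A slow client gets a reduced ratio; a fast client gets extra local passes.
print(plan_local_training(t_unit=40.0, t_com=5.0, T_k=30.0))   # (0.625, 1)
print(plan_local_training(t_unit=8.0,  t_com=2.0, T_k=30.0))   # (1.0, 3)
```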
step 6: after the client-side performs local training, transmitting model parameters to a server-side, and aggregating the model parameters of the client-side received in the aggregation interval time into a new global model by the server-side; and then repeating the operations from the step 1 to the step 6 until the precision of the global model reaches the preset requirement.
A client whose local training ratio is less than 1 trains only part of the model: the parameters of some layers of the global model are frozen, and the frozen parameters are used only in forward propagation, not in backward propagation, which reduces time consumption. A client whose training ratio equals 1 performs the E_i local training periods determined in step 502, using its computing resources as fully as possible. Each client then sends its locally trained model parameters to the server, and the server aggregates the client model parameters received within the aggregation interval into a new global model. The operations of steps 1 to 6 are then repeated until the accuracy of the global model reaches the preset requirement.
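One way to realize the partial training and the aggregation of step 6 in PyTorch is sketched below: the first (1 - α_i) fraction of the model's parameter tensors is frozen so that it participates only in forward propagation, and the server averages whichever client states arrive within the aggregation interval. The choice of which layers to freeze and the unweighted average are assumptions of this sketch, not fixed by the description.

```python
import torch

def freeze_for_partial_training(model, alpha_i):
    """Freeze the first (1 - alpha_i) fraction of parameter tensors so that they
    are used only in forward propagation (an assumed layer-selection policy)."""
    params = list(model.parameters())
    n_frozen = int(round((1.0 - alpha_i) * len(params)))
    for i, p in enumerate(params):
        p.requires_grad = i >= n_frozen   # frozen tensors receive no gradients

def aggregate(global_model, client_state_dicts):
    """Average the client model parameters received within the aggregation
    interval into a new global model (a simple unweighted mean, an assumption)."""
    new_state = {}
    for name, param in global_model.state_dict().items():
        stacked = torch.stack([sd[name].float() for sd in client_state_dicts])
        new_state[name] = stacked.mean(dim=0).to(param.dtype)
    global_model.load_state_dict(new_state)
    return global_model
```

On the client side, the optimizer can simply be built over the parameters that still require gradients after freezing, so the frozen layers are skipped during backward propagation.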
The CIFAR-10 dataset contains 60000 color images in 10 categories, split into 50000 training images and 10000 test images. To imitate realistic non-independent and identically distributed data in a federated learning scenario, the dataset is partitioned across 128 devices/clients using a Dirichlet distribution with α equal to 0.1. ResNet-20 is a model suited to image classification tasks on smaller image datasets such as CIFAR-10, and it is used to evaluate the dataset. Compared with the conventional method FedBuff, when the FedAvg aggregation function is used, the method of this embodiment takes about 5.5 hours to reach 60% global model accuracy, whereas FedBuff takes about 7.8 hours. The method of this embodiment therefore saves about 30% of the total elapsed time at 60% global model accuracy, a clear improvement over the conventional method.
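The non-IID partition used in this evaluation (128 clients, Dirichlet concentration parameter α = 0.1) can be reproduced along the following lines; the use of numpy and torchvision and the exact splitting routine are tooling assumptions rather than part of the described method.

```python
import numpy as np
from torchvision import datasets

def dirichlet_partition(labels, num_clients=128, alpha=0.1, seed=0):
    """Split sample indices across clients with a per-class Dirichlet distribution."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Cumulative proportions give the split points of this class across clients.
        splits = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, part in enumerate(np.split(idx, splits)):
            client_indices[client_id].extend(part.tolist())
    return client_indices

train_set = datasets.CIFAR10(root="./data", train=True, download=True)
partition = dirichlet_partition(train_set.targets, num_clients=128, alpha=0.1)
```

Each client then trains only on its own index list, which yields the highly skewed label distributions implied by α = 0.1.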
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions, which are defined by the scope of the appended claims.

Claims (7)

1. A client asynchronous federated learning method with adaptive partial training, characterized in that the method comprises the following specific steps:
step 1: the server randomly selects n clients and broadcasts a global model; each selected client performs full model training on one data batch to estimate its time consumption and compute the loss value of that batch, and transmits the time consumption and loss value to the server together;
step 2: a normalization method is adopted so that the time consumption and the loss value lie in the same range;
step 3: the time consumption and the local loss value of each client are normalized;
step 4: the server computes a composite score for each client, selects the maximum time consumption among the top-k scored clients, and thereby determines the aggregation interval of the current global training round;
step 5: the partial training ratio and the local training period of each client are determined according to the aggregation interval;
step 6: after local training, each client uploads its model parameters to the server, and the server aggregates the client model parameters received within the aggregation interval into a new global model; the operations of steps 1 to 6 are then repeated until the accuracy of the global model reaches the preset requirement.
2. The client asynchronous federated learning method with adaptive partial training of claim 1, wherein the specific method of step 1 is as follows:
step 101: the server selects n clients through a random selection algorithm and broadcasts a global model;
step 102: each selected client collects the real-time computation time t_cmp of full model training on one data batch and computes the unit computation time t_unit from t_cmp and β, where β is the ratio of the number of batches trained to the total number of batches; it computes the local communication time t_com = M/Bw, where M is the file size of the model and Bw is the real-time network bandwidth of the device; it computes the local loss value Loss(x) with the cross-entropy loss function; and it sends Loss(x), t_unit and t_com to the server.
3. The client asynchronous federated learning method with adaptive partial training of claim 1, wherein in step 2 a normalization method is adopted so that the time consumption and the loss value lie in the same range, the normalization formula being val_norm = (val - min_val) / (max_val - min_val), where max_val represents the maximum value and min_val the minimum value of the corresponding metric across the clients.
4. The client asynchronous federated learning method with adaptive partial training of claim 2, wherein in step 3 the server computes the time consumption of each client from its unit computation time and communication time, normalizes it, and normalizes the loss value in the same way.
5. The client asynchronous federated learning method with adaptive partial training of claim 4, wherein in step 4, since clients with high loss values help accelerate global model convergence, a score F_i is defined for client i from its time consumption and loss value as a weighted combination of the normalized time consumption and the normalized loss value, the weights satisfying λ_1 + λ_2 = 1, where λ_1 and λ_2 are importance weights for the time consumption and the loss value respectively; the average of all client scores is computed, the scores F_i are sorted from small to large and compared in order with the average, and the local time consumption of the client k whose score is closest to the average is set as T_k.
6. The client asynchronous federated learning method with adaptive partial training of claim 5, wherein the specific method of step 5 is as follows:
step 501: if the time consumption of client i is greater than T_k, the server reduces its partial training ratio α_i so that the reduced time consumption fits within T_k and the client can still upload its model parameters in time; α_i is computed from T_k, the unit computation time of the i-th client and the communication time of the i-th client;
step 502: if the time consumption of client i is less than T_k, the server sets its partial training ratio α_i to 1 and has it perform as many local training periods E_i as possible; E_i is computed from T_k, the unit computation time and the communication time of the i-th client.
7. The client asynchronous federated learning method with adaptive partial training of claim 6, wherein in step 6 a client whose local training ratio is less than 1 trains only part of the model: the parameters of some layers of the global model are frozen, and the frozen parameters are used only in forward propagation, not in backward propagation; a client whose training ratio equals 1 performs the E_i local training periods determined in step 502, so that the computing resources of each client are used as fully as possible; each client then sends its locally trained model parameters to the server.
CN202410121394.XA 2024-01-29 2024-01-29 Client asynchronous federated learning method with adaptive partial training Pending CN117875453A (en)

Priority Applications (1)

CN202410121394.XA (priority date 2024-01-29, filed 2024-01-29): Client asynchronous federated learning method with adaptive partial training

Publications (1)

CN117875453A, published 2024-04-12

Family

ID=90590168

Country Status (1)

CN: CN117875453A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination