CN117221122B - Asynchronous layered joint learning training method based on bandwidth pre-allocation - Google Patents
Asynchronous layered joint learning training method based on bandwidth pre-allocation
- Publication number: CN117221122B (application CN202311172306.0A)
- Authority: CN (China)
- Prior art keywords: client, training, clients, server, edge server
- Prior art date: 2023-09-12
- Legal status: Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention discloses an asynchronous hierarchical joint learning training method based on bandwidth pre-allocation, which comprises the following steps: when training starts, the cloud server selects a corresponding number of clients within the range of each edge server and distributes the latest model parameters to them; each client performs multiple iterations on the model parameters using its own data, and when training finishes it selects, according to its current position, the nearest edge server that still has residual bandwidth, uploads its model parameters to it, and resources are allocated according to the bandwidth conditions to accelerate the upload; each edge server carries out one round of edge aggregation after every time interval $T_{aggregation}$ elapses and uploads the aggregated parameters to the cloud server for one round of cloud aggregation; after cloud aggregation, the cloud server selects the next round of clients. The invention not only adapts to the change of data distribution among the participants in a dynamic scene, but also fully utilizes limited communication resources, thereby improving the training effect.
Description
Technical Field
The invention relates to an asynchronous layered joint learning training method based on bandwidth pre-allocation, and belongs to the technical field of federated learning model training.
Background
The advent of digital technology has driven significant advances in revolutionary technologies such as big data and artificial intelligence, and machine-learning-driven mobile applications are transforming many aspects of modern life. However, machine learning training tasks typically require large amounts of data from heterogeneous terminals with different computing capabilities. The traditional approach uploads the data to a remote cloud server for processing, but this faces challenges such as privacy violations, network congestion and transmission delay, which prevent full utilization of the data.
In 2016, Google proposed the concept of federated learning, aiming to relieve network bandwidth constraints and address the vulnerability of data privacy. Federated learning is a collaborative training and sharing method that eliminates the need to access raw data: it conforms to the principles of decentralized collection and data minimization, and only model updates are shared with a central server or coordinator. The raw data remains securely stored on the individual devices and cannot be accessed directly; federated learning enables local model training on each device while only aggregated updates or model parameters are uploaded to a central server.
Hierarchical federated learning is a federated learning framework in which edge servers, typically deployed on base stations, act as intermediate stations between mobile devices and cloud servers. These edge servers aggregate the local models received from nearby devices. By enabling the cloud server to effectively process data from more terminals, hierarchical federated learning addresses the difficulty of uploading data from massive numbers of devices to the cloud.
However, the standard hierarchical federated learning framework employs synchronous aggregation of the global model, in which the server waits for all client parameters to be uploaded before performing global aggregation. This synchronization causes a "straggler effect": the delay of the global aggregation is determined by the client that uploads its parameters slowest, increasing the delay of the overall training process. In addition, delayed aggregation of client parameters hinders convergence of the global model and may affect the accuracy and performance of the trained model.
Therefore, there is a need for an asynchronous hierarchical federated learning framework in which servers do not wait for all client parameters before performing global aggregation, so as to shorten the federated learning training delay.
Disclosure of Invention
To solve the above problems, the invention discloses an asynchronous hierarchical joint learning training method based on bandwidth pre-allocation, with the following specific technical scheme:
an asynchronous hierarchical joint learning training method based on bandwidth pre-allocation comprises the following steps:
step 1: cloud server selecting client
In this step, the cloud server is responsible for coordinating and managing the joint learning task. The cloud server selects a corresponding number of clients within the range of each edge server according to a strategy; if more clients are selectable than required, the best clients are selected to participate in the next round of training according to the quality of their local training data and their computing power, and the model parameters are sent to these clients;
step2: client local training
After a selected client receives the model parameters, it performs model training locally. Each client executes an optimization algorithm according to the task type and model architecture to update the parameters of its local model; this process is iterated multiple times, with the model parameters updated in each iteration;
step3: local parameter upload
After iterating the local model to the preset precision, the client selects, according to its current geographic position, the nearest edge server that still has residual bandwidth for association, and then uploads its local model parameters to that edge server;
step4: edge aggregation
Model parameter aggregation is a federated learning method that produces a better global model by averaging, weighted averaging or other aggregation strategies over parameters from different clients. Unlike other hierarchical federated learning frameworks, the edge server does not need to collect the model parameters sent by all associated clients before aggregating; instead, every time a period $T_{aggregation}$ elapses, it aggregates the model parameters already collected. Considering that collected model parameters may be outdated data produced by a previous round of training, the model parameters are weighted according to the time at which they started to iterate locally, with earlier parameters receiving smaller weights. After edge aggregation, each edge server sends the aggregated model parameters to the cloud server;
step 5: cloud aggregation
Every time a period $T_{aggregation}$ elapses, the cloud server receives the model parameters sent by all edge servers at approximately the same time and aggregates them. If the aggregated model reaches the specified precision, the asynchronous hierarchical joint learning training based on bandwidth pre-allocation stops; otherwise the method returns to step 1.
Further, in step 1, since clients may move on a large scale during training, the movement of the clients selected by the cloud server in each round cannot be predicted in advance, and the clients end up associated with different edge servers when training finishes. Because the bandwidth of each edge server is limited, too many associated clients would prevent some clients from successfully uploading their model parameters, while too few would degrade the quality of the edge aggregation; therefore, a corresponding number of clients must be selected within the range of each edge server when clients are selected before training;
the specific method for selecting the client comprises the following steps:
the movement of the client in the training process can find a historical statistical rule, for example, from the client of a residential area under the scope of an edge server A, the situation that the client stays in the place, goes to a business area under the scope of an edge server B or goes to a company working under the scope of an edge server C in the training process can find corresponding probability distribution, in the situation, the transition probability from each edge server to another edge server in the training process can be represented by a matrix, the matrix of the client transition situation in the scope of a cloud server is obtained by actual investigation, and the number of clients which are initially selected under each edge server by the cloud server is calculated according to the matrix and the number of the clients which need to be associated with each edge server, so that the number of the clients which are associated with each edge server when the training is finished is uniform;
Let $M$ denote this transition matrix, whose element $e_{jl}$ in row $j$ and column $l$ represents the probability that a client transitions from the range of edge server $j$ to the range of edge server $l$ during training.

Each client participating in training is given an initial bandwidth allocation ratio $\beta_{device}$, so the residual bandwidth ratio of each edge server $j$ determines how many clients it can accept. Let the numbers of clients that the $|K|$ edge servers need to have associated in the next round be $(n_1, n_2, n_3, \ldots, n_{|K|})$, let the cloud server initially allocate $X = (x_1, x_2, x_3, \ldots, x_{|K|})$ clients under the $|K|$ edge servers, and let $XM = (y_1, y_2, y_3, \ldots, y_{|K|})$. The following optimization problem is obtained:

$$\min_X f(X) = \sum_{l=1}^{|K|} (y_l - n_l)^2 \quad \text{s.t.} \quad \sum_{j=1}^{|K|} x_j = \sum_{j=1}^{|K|} n_j, \qquad 0 \le x_j \le N_j \ \text{for } j = 1, \ldots, |K|,$$

where $N_j$ is the number of clients remaining under server $j$.
as a general constraint problem, a solution is made by the multiplier method-PHR algorithm, given an initial point (n 1 ,n 2 ,n 3 ,...,n |K| ) Penalty factor sigma, amplification factor c 1 >1、Control error ε > 0, constant θ ε (0, 1), let k=1, solve the process as follows:
step 1 at x k-1 For an initial point, solve the unconstrained problem:
obtaining the optimal solution x k ;
Step2 ifThen x k Stopping for the optimal solution; otherwise, turning to Step 3;
step3: if it isTurning to Step 4; no make sigma k+1 =cσ k Turning to Step 4;
step4, correcting the multiplier vector:
(λ k+1 ) 1 =(λ k ) 1 -σc 1 (x k )
(λ k+1 ) i =max[0,(λ k ) i -σc i (x k )],i=2,3,...,|2K+1|,
let k=k+1, turn Step 2;
c i (x) For the ith constraint condition, the 1 st constraint is the equality constraint, the 2 nd to 2|K |+1 nd constraint is the inequality constraint, iteration is carried out according to the algorithm, the approximate optimal solution is obtained, the precision epsilon is not required to be very high, and the solved X is rounded to an integer; the optimal solution is the optimal number of clients which need to be selected in the range of each edge server by the cloud server.
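As an illustration of this pre-allocation step, the following sketch solves the same constrained problem numerically. It uses scipy's SLSQP solver as a stand-in for the hand-rolled PHR multiplier iteration described above, and the transition matrix, target association numbers and remaining-client caps are made-up values, not from the patent.

```python
# Sketch of the client pre-allocation problem: choose how many clients to
# select under each edge server so that, after clients move according to the
# transition matrix M, the expected number associated with each server matches
# the target n. Solved with scipy's SLSQP in place of the PHR multiplier loop;
# M, n and the remaining-client caps are illustrative.
import numpy as np
from scipy.optimize import minimize

M = np.array([[0.7, 0.2, 0.1],      # M[j, l]: P(move from server j to server l)
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])
n = np.array([10.0, 10.0, 10.0])    # clients each server should end up with
caps = np.array([40, 25, 30])       # clients currently available under each server

objective = lambda x: np.sum((x @ M - n) ** 2)          # f(X) = sum (y_l - n_l)^2
constraints = [{"type": "eq", "fun": lambda x: x.sum() - n.sum()}]
bounds = [(0, cap) for cap in caps]                     # 0 <= x_j <= N_j

res = minimize(objective, x0=n.copy(), bounds=bounds,
               constraints=constraints, method="SLSQP")
x = np.rint(res.x).astype(int)      # round to integers, as the text prescribes
print("initial selection per server:", x)               # expected roughly [ 9  7 14]
```

Rounding the continuous optimum to integers, as the text prescribes, gives the per-server selection counts for the next round.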
Further, in step 2, a client participating in training receives the model parameters transmitted from the cloud server. Client $i$ uses its own data set $D_i$ to solve for the model parameters $\omega$ that minimize its local loss function, which on client $i$ is the empirical loss $F_i(\omega) = \frac{1}{|D_i|} \sum_{k \in D_i} f(\omega; x_k, y_k)$; the objective is to find the optimal model parameters $\omega^* = \arg\min[F_i(\omega)]$.

This problem is hardly solvable in closed form, so the client performs gradient descent over multiple iterations to gradually approach the optimal solution. To reach a predetermined local accuracy $\theta \in (0,1)$, the client needs $L(\theta) = \mu \log(1/\theta)$ rounds of local iteration, where the constant $\mu$ depends on the size of the training task. The $n$-th local iteration is $\omega^{(n+1)} = \omega^{(n)} - \eta \nabla F_i(\omega^{(n)})$, iterated until the local accuracy $\theta$ is reached, at which point local training is complete; here $\eta$ is the learning rate.

The computation delay of model training on a client is determined by the number of iterations $L(\theta)$ required to reach the accuracy, the computing capability $P_i$ of client $i$, and the size $|D_i|$ of its training data set, and is expressed as $t_i^{cmp} = L(\theta)\,|D_i| / P_i$, where the computing power $P$ is the number of samples a client processes in a period of time. Since the size of the training data set can be set in advance, selecting clients with strong computing capability effectively reduces the computation delay.
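A minimal sketch of this local-training step and the delay estimate, assuming a logistic-regression model and illustrative values for $\theta$, $\mu$, $\eta$ and $P_i$ (none of these numbers come from the patent):

```python
# Local gradient descent for L(theta) = mu*log(1/theta) iterations, plus the
# computation-delay estimate t = L(theta) * |D_i| / P_i. Model and numbers
# are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)); y = (X[:, 0] > 0).astype(float)   # client data D_i
w = np.zeros(5); eta = 0.1                                        # learning rate

def loss_and_grad(w):
    p = 1 / (1 + np.exp(-X @ w))                                  # sigmoid predictions
    loss = -np.mean(y*np.log(p+1e-9) + (1-y)*np.log(1-p+1e-9))
    return loss, X.T @ (p - y) / len(y)

theta, mu = 0.01, 3.0                    # local accuracy and task-size constant
L = int(np.ceil(mu * np.log(1/theta)))   # L(theta) local iterations
for _ in range(L):
    _, g = loss_and_grad(w)
    w -= eta * g                         # omega <- omega - eta * grad F_i(omega)

P_i = 5000.0                             # samples/second the client can process
t_comp = L * len(X) / P_i                # computation delay of local training
print(f"{L} local iterations, estimated compute delay {t_comp:.3f} s")
```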
Further, in step 3, after local training reaches the predetermined accuracy, the client immediately uploads its model parameters to an edge server. Under the bandwidth-pre-allocation-based asynchronous hierarchical federated learning framework, client $i$ is not associated with an edge server in advance; instead, when training finishes it calculates, from its current position, the distance $s_{ij}$ to each edge server $j$, and among the servers that still have residual bandwidth it associates with the closest one, i.e. the edge server $j$ with $\min s_{ij}$, and uploads its model parameters to that server.

The associated edge server allocates an initial bandwidth ratio $\beta_{ji}$ to the client, and the parameter upload rate of client $i$ is $r_{ij} = \beta_{ji} B_j \log_2\!\left(1 + p_{ij} h_i / N_0\right)$, where $B_j$ is the bandwidth of edge server $j$, $h_i$ is the channel gain of client $i$, $N_0$ is the noise power, and $p_{ij}$ is the power received by edge server $j$ from client $i$, which decreases with the distance $s_{ij}$ according to free-space path loss; here $p_j$ is the maximum receiving power of edge server $j$ and $c$ is a constant. The upload delay of the client's parameters is then $t_{ij}^{up} = |d_i| / r_{ij}$, where $|d_i|$ is the size of the model parameters to be uploaded.
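The following sketch illustrates server association and the upload-delay calculation. The Shannon-style rate and the free-space-loss SNR model (signal-to-noise ratio $5/s^2$, echoing the experiment section) are assumptions, as are all the numbers:

```python
# A finished client picks the nearest edge server that still has residual
# bandwidth; its upload delay follows from the allocated bandwidth and a
# Shannon-style rate. Bandwidths, gains and the path-loss model are assumed.
import math

servers = [  # (normalized distance s_ij, residual bandwidth ratio, bandwidth Hz)
    {"id": "A", "s": 0.9, "resid": 0.0, "B": 20e6},
    {"id": "B", "s": 0.4, "resid": 0.3, "B": 20e6},
    {"id": "C", "s": 0.6, "resid": 0.5, "B": 20e6},
]
candidates = [e for e in servers if e["resid"] > 0]   # residual bandwidth only
edge = min(candidates, key=lambda e: e["s"])          # nearest -> min s_ij
beta = 0.1                                            # allocated ratio beta_ji
snr = 5.0 / edge["s"] ** 2          # assumed free-space-loss SNR (5 at s = 1)
rate = beta * edge["B"] * math.log2(1 + snr)          # upload rate, bits/s
d_i = 8 * 2**20                                       # |d_i|: 1 MiB of parameters
print(edge["id"], "upload delay:", d_i / rate, "s")   # picks server B here
```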
Further, regarding step 4: in a synchronous aggregation algorithm, the cloud server must receive the parameters of all clients participating in training before performing a round of global aggregation, so the delay of global aggregation is determined by the last client to finish training and upload its parameters, causing a "straggler effect". The clients differ greatly in local data set size and computing performance, and the clients selected by the cloud server initially are not associated with particular edge servers, so synchronous aggregation at the edge servers could leave an edge server waiting indefinitely for uploads from its expected clients, seriously affecting training progress. Asynchronous edge aggregation is therefore adopted: every time a period $T_{aggregation}$ elapses, each edge server aggregates the model parameters it has collected and uploads the aggregated parameters to the cloud server.
Further, when the asynchronous aggregation mode is adopted, the model parameters uploaded by a client may be stale, so a staleness function is attached to each client in edge aggregation; it depends on the number of cloud aggregation rounds $n_{round}$ that had already been performed when the client received the model and on the latest cloud aggregation round number $n_{CurrentRound}$. The staleness function of the parameters uploaded by client $i$ is thus expressed as $s_i = \lambda^{\,n_{CurrentRound} - n_{round}}$, where $\lambda \in (0,1)$ is a given decay coefficient, and the parameter update at edge server $j$ is the staleness-weighted aggregation of the parameters uploaded by $S_j$, the set of clients participating in this edge aggregation.
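One plausible realization of this staleness-weighted edge aggregation is sketched below; the discount $\lambda^{\,n_{CurrentRound}-n_{round}}$ follows the text, while the normalization by data size is an assumption, since the exact weighting formula appears only as an image in the source:

```python
# Staleness-weighted edge aggregation: each client's parameters are discounted
# by lambda^(current_round - start_round) and combined in a normalized weighted
# average over S_j. Data-size weighting is an assumption; values illustrative.
import numpy as np

lam = 0.5                                # decay coefficient lambda in (0,1)
current_round = 10                       # latest cloud aggregation round
S_j = [                                  # clients in this edge aggregation
    {"w": np.array([1.0, 2.0]), "n_round": 10, "D": 100},  # fresh parameters
    {"w": np.array([0.0, 1.0]), "n_round": 7,  "D": 300},  # 3 rounds stale
]
stale = [lam ** (current_round - c["n_round"]) for c in S_j]
weights = np.array([s * c["D"] for s, c in zip(stale, S_j)])
weights /= weights.sum()                 # normalize the combined weights
omega_j = sum(wt * c["w"] for wt, c in zip(weights, S_j))
print("edge-aggregated parameters:", omega_j)
```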
Further, in step 5, considering that the edge servers and the cloud server have strong communication capability, the communication delay of uploading parameters from an edge server to the cloud server is negligible, and the cloud server performs a round of cloud aggregation immediately after receiving parameters uploaded by an edge server;
also, because the quality of the parameters received by the cloud aggregation is irregular, the obsolete function needs to be introduced as a super parameter to reduce the influence of the obsolete model on the global model training, and the obsolete function of each edge server uploading parameter is related to the overall obsolete degree of the participating clients when the edge servers are aggregated in a round, the method can be simply set as:the updating of the model parameters on the cloud server is expressed as: />Wherein->Model parameters, D, represented as the latest upload by edge server j s Aggregation of data for clients that have participated in trainingAnd (3) aggregating the model parameters, stopping asynchronous hierarchical joint learning training based on bandwidth pre-allocation if the aggregated model reaches the specified precision, otherwise, returning to the step (1).
Furthermore, when the communication capability of the edge servers and the cloud server is strong enough, an edge server may skip edge aggregation after receiving the parameters uploaded by a client and instead send the parameters directly to the cloud server for cloud aggregation, so that the influence of the staleness function on the model parameter weights is more accurate.
Further, when the cloud server selects clients, the computing performance of a client must be considered to reduce the local computation delay, and the data quality of the data set on the client determines how good the local training effect can be, because representative and diverse data improves model training. For this purpose the following technical method is adopted: when more clients can participate in training than are required in a round, the computing performance of the clients and the entropy weight of their data are considered together, and the best clients within the range of each edge server are selected to participate in the next round of training;
the data quality of the data set on the client is defined by entropy weighting, m samples are extracted from client i,1,2,…,k i the attribute feature index 1,2, …, m representing the client i is the sample index, +.>Values normalized for data attributes, +.>The entropy weight assigned to the data attribute,information entropy for each data attribute;
Under each edge server $j$, clients are selected according to computing power $P$ and local model quality $Q$. Since $P$ and $Q$ of the same client change little across multiple training rounds, the overall strategy is that each edge server preferentially selects, from the set $N_{selected}$ of clients that have already participated in training, clients with larger comprehensive capacity $\Phi = \gamma P + (1-\gamma) Q$, and selects the remaining needed clients at random from the total client set $N_{sum}$ within the range of edge server $j$. The specific steps, with parameter $\delta$, are as follows (see the sketch after this list):

s1: update $|N_{selected}|$; if $|N_{selected}|$ is greater than the set threshold $\theta_{client}$, add the $\delta |N_{selected}|$ clients in $N_{selected}$ with the largest $\Phi$ values to $N_{best}$, then select from $N_{best}$ the clients within the range of edge server $j$;

s2: if the required number has not been reached, randomly select the remaining clients from $N_{sum} - N_{selected}$ among those satisfying $s_{ji} < R_j$ (i.e. within range of edge server $j$), compute their $\Phi$, and add them to $N_{selected}$. All clients in $N_{selected}$ are the clients participating in the next round of training.
Further, after the cloud server has selected the clients and an edge server has performed edge aggregation and parameter uploading, that edge server is certain to have residual bandwidth, and other edge servers may also have residual bandwidth, because some clients with stronger computing power that were closer to an edge server have already completed local training, uploaded their parameters and released their bandwidth; this bandwidth is likewise distributed by the cloud server to the clients participating in the next round of training;
After the cloud server selects the clients, the residual bandwidth ratio of all edge servers is set to 0. Considering that some clients may fail for a long time to complete training and release their bandwidth due to an abnormality, and that the model parameters of such clients would in any case be outdated by the time their training completes, bandwidth that has been allocated but not released is automatically released after $n_{overtime}$ rounds of cloud aggregation, for use in the parameter uploading of the next round of clients.
The beneficial effects of the invention are as follows:
Compared with the prior art: after a client completes training, it is associated with the nearest edge server that still has residual bandwidth according to its current position, shortening the model upload delay; to avoid uneven client allocation, before each round of training the number of clients initially allocated within the range of each edge server is calculated from the residual bandwidth of each edge server and the client transition matrix; when selecting clients, those that are helpful to model training and have strong computing power are selected using the entropy-weight calculation, shortening the local training delay; and during uploading, the upload is accelerated according to the residual bandwidth of the edge server, further shortening the upload delay.
Drawings
Figure 1 is a training flow chart of the present invention,
figure 2 is a cloud interaction diagram of the present invention,
figure 3 is a graph showing the model accuracy on the test set for the comparative experiments in the embodiment,
figure 4 is a graph showing the model loss on the test set for the comparative experiments in the embodiment,
FIG. 5 is a comparison of the time required for a user to upload data in the greedy algorithm and the hierarchical federal learning algorithm in an embodiment.
Detailed Description
The invention is further elucidated below in connection with the drawings and the detailed description. It should be understood that the following detailed description is merely illustrative of the invention and is not intended to limit the scope of the invention.
As can be seen with reference to fig. 1-2, the process of the present invention is:
1) Cloud server selecting client
A corresponding number of clients is selected under each edge server and the latest model parameters are distributed to them; each client then carries out local training using its local data set.
2) Client local training
In the step, the client participating in training receives the model parameters transmitted by the cloud server, and the client utilizes a local data set to solve the optimal model parameters.
3) Local parameter upload
The client uploads its model parameters to the edge server immediately after local training reaches the preset precision.
4) Edge aggregation
In a synchronous aggregation algorithm, the server must receive the parameters of all clients participating in training before performing a round of global aggregation, so the delay of global aggregation is determined by the last client to finish training and upload its parameters, causing a "straggler effect". In the scenario of the invention, clients differ greatly in local data set size and computing performance, and the clients selected by the cloud server initially are not associated with particular edge servers, so synchronous aggregation at the edge servers could leave a server waiting indefinitely for uploads from its expected clients, seriously affecting training progress. Therefore, unlike other hierarchical federated learning frameworks, the edge server in the invention does not collect the model parameters of all associated clients before aggregating; instead, every period $T_{aggregation}$, it aggregates the model parameters already collected. Considering that collected model parameters may be outdated data produced by a previous round of training, the parameters are weighted according to the time at which they started to iterate locally, with earlier parameters receiving smaller weights. After edge aggregation, each edge server sends the aggregated model parameters to the cloud server;
In the asynchronous aggregation mode, the model parameters uploaded by a client may be stale, so the staleness function of each client in edge aggregation depends on the number of cloud aggregation rounds $n_{round}$ already performed when the client received the model and on the latest cloud aggregation round number $n_{CurrentRound}$; the staleness function of the parameters uploaded by client $i$ is expressed as $s_i = \lambda^{\,n_{CurrentRound} - n_{round}}$, where $\lambda \in (0,1)$ is a given decay coefficient, and the parameter update at edge server $j$ is the staleness-weighted aggregation of the parameters uploaded by $S_j$, the set of clients participating in this edge aggregation.
5) Cloud aggregation
Considering that the edge servers and the cloud server have strong communication capability, the communication delay of uploading parameters from an edge server to the cloud server is negligible, and the cloud server performs a round of cloud aggregation immediately after receiving parameters uploaded by an edge server. Also, because the quality of the parameters received in cloud aggregation is irregular, a staleness function is introduced as a hyperparameter to reduce the influence of outdated models on global model training. The staleness function of each edge server's uploaded parameters is related to the overall staleness of the clients that participated in that edge server's round of aggregation, and can simply be set to the average staleness of those clients. The model parameters on the cloud server are then updated as a weighted combination of the latest parameters $\omega_j$ uploaded by each edge server $j$, weighted by this staleness function and the amount of client data each edge server represents, where $D_s$ is the aggregate data set of the clients that have participated in training. If the aggregated model reaches the specified precision, the asynchronous hierarchical joint learning training based on bandwidth pre-allocation stops; otherwise the cloud server selects the next round of clients and distributes the latest parameters to them.
If the communication capability of the edge servers and the cloud server is strong enough, an edge server may skip edge aggregation after receiving the parameters uploaded by a client and send the parameters directly to the cloud server for cloud aggregation, so that the influence of the staleness function on the model parameter weights is more accurate.
In addition, after the cloud server has selected the clients and an edge server has performed edge aggregation and parameter uploading, that edge server is certain to have residual bandwidth, and other edge servers may also have residual bandwidth, because some clients with strong computing power that were close to an edge server have already completed local training, uploaded their parameters and released their bandwidth; this bandwidth is likewise distributed by the cloud server to the clients participating in the next round of training. After the cloud server selects the clients, the residual bandwidth ratio of all edge servers is set to 0; considering that some clients may fail for a long time to complete training and release their bandwidth due to an abnormality, and that their model parameters would be outdated by completion anyway, bandwidth that has been allocated but not released within a period of time is automatically released for the parameter uploading of the next round of clients.
To verify the application of this patent, specific experiments of this patent are given below:
experimental environment:
the experiment considered training under a distributed framework consisting of one cloud server, 5 edge servers and 250 clients to be trained. Each edge server comprises 50 clients, 10 clients under each edge are randomly selected to participate in training in the federal learning of the edge level, and parameters of each edge server in the federal learning of the cloud level participate in cloud aggregation.
In local training, a LeNet-style convolutional neural network is used as the model and verified on the MNIST data set. The model has two convolutional layers, one dropout layer and two fully connected layers. The first convolutional layer has 3 input channels, 10 output channels and a convolution kernel size of 5. The second convolutional layer has 10 input channels, 20 output channels and a kernel size of 5. The first fully connected layer has 320 input nodes and 50 output nodes; the second fully connected layer has 50 input nodes and 10 output nodes. The input data passes through the first convolutional layer and then undergoes max pooling and ReLU activation; it then passes through the second convolutional layer and the dropout layer, again with max pooling and ReLU activation. The data is then flattened and passed through the two fully connected layers, returning the output result.
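Rendered in PyTorch, the described architecture looks as follows; this is a reconstruction from the prose above, and with the stated 3-channel 28x28 inputs the flattened feature size indeed works out to 20*4*4 = 320, matching the first fully connected layer:

```python
# LeNet-style model as described: two conv layers (3->10->20 channels, kernel 5),
# dropout after conv2, max pooling + ReLU after each conv, then 320->50->10.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LeNetVariant(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)     # 20 channels * 4 * 4 = 320 inputs
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))                    # conv1 -> pool -> ReLU
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))   # conv2 -> dropout -> pool -> ReLU
        x = x.view(x.size(0), -1)                                     # flatten to 320
        x = F.relu(self.fc1(x))
        return self.fc2(x)

print(LeNetVariant()(torch.randn(1, 3, 28, 28)).shape)   # torch.Size([1, 10])
```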
In this embodiment, 1000 labeled samples are randomly selected from the 60000-sample data set and allocated to the cloud, 5000 samples are divided equally into 5 parts and allocated to the 5 edge servers for testing the accuracy of the model at the edge and the cloud, and the remaining 54000 samples are allocated equally among the 250 clients. The samples are distributed in an independent and identically distributed manner, uniformly at random across the edge servers and clients.
The embodiment simulates the time to upload model parameters to the edge server after local training completes. The distances from all clients to the edge server are normalized to (0, 1); signal strength is assumed to depend only on free-space path loss, and the signal-to-noise ratio of the furthest client within the range of an edge server is set to 5. The upload delay then follows from the distance and the allocated bandwidth as in step 3.
The learning rate on the MNIST data set in the experiment is 0.01, decaying by a factor of 0.995 each round, and the model is iterated for 40 rounds in each round of local training.
Comparison experiment setting:
Hierarchical federated learning (control): before training starts and after each round of cloud aggregation, 10 clients within the range of each of the 5 edge servers are randomly selected to participate in training. Clients may move during training; when local training completes, a client uploads its model parameters to its associated edge server if it is still within that server's range. An edge server performs a round of edge aggregation once it has collected from all its clients, or once the maximum waiting time is exceeded, and uploads the aggregated model parameters to the cloud server. The cloud server performs a round of cloud aggregation after collecting the parameters transmitted by all edge servers.
Client associates with the nearest server when uploading model parameters (control): before training starts and after each round of cloud aggregation, 10 clients are randomly selected within the range of each edge server for training. Clients may move during training; after local training finishes, a client selects the edge server closest to it, associates with it and uploads its model parameters. After a certain time, each edge server performs edge aggregation on the model parameters already uploaded; parameters not uploaded in time are placed in a buffer in the experiment and participate in subsequent edge aggregations according to their upload delay, with the parameters in the buffer corrected each round according to the staleness function. The cloud server receives the model parameters uploaded by all edge servers and the numbers of participating clients almost simultaneously and performs cloud aggregation.
Asynchronous hierarchical federated learning based on bandwidth pre-allocation: before training starts and after each round of cloud aggregation, the number of clients to select within the range of each edge server is calculated from the residual bandwidth of each edge server and the transition matrix, and the residual bandwidths of all edge servers are then set to 0. Clients may move during training; after local training finishes, the upload delay is calculated from the amount of residual bandwidth and the distance within the range of each edge server, and the edge server with the smallest delay is selected for association and upload of the model parameters (allocated bandwidth = min(self-defined maximum allocatable bandwidth, residual bandwidth + 1); the edge server's residual bandwidth increases by 1 after the upload completes). If a client fails to upload to its pre-allocated edge server within a period of time, the edge server automatically releases the bandwidth. After a certain time, each edge server performs edge aggregation on the model parameters already uploaded and uploads them to the cloud server. The cloud server receives the model parameters uploaded by all edge servers and the numbers of participating clients, and performs cloud aggregation.
Conclusion analysis:
The experimental results are shown in fig. 3 and fig. 4. Since the duration of each aggregation round in hierarchical federated learning is not fixed, the experiment uses training time as the abscissa, with the duration of one round of cloud aggregation in the algorithm of this embodiment as 1 time unit; the ordinates are the accuracy and the loss value of the model on the test set after cloud aggregation at the corresponding time.
As can be seen from the figures, although the algorithm of this embodiment adopts asynchronous aggregation, it is not inferior to the synchronous scheme in model convergence and stability. In this scenario, reaching an accuracy of 0.9 requires 113 time units for hierarchical federated learning but only 33 time units for the algorithm of this embodiment, which converges quickly and reaches an accuracy of 0.95 at 108 time units, achieving the better result. The experiment in this embodiment does not consider the delay of local iteration, so in practice the advantage would be less pronounced.
In this scenario, the hierarchical federated learning algorithm cannot upload model parameters to the corresponding edge servers on time because clients move during training, and with a maximum waiting time set, some clients end up not participating in the aggregation of the edge models at all, causing the instability of the curves and the poor final aggregation effect.
Associating the client with the nearest server when uploading model parameters greatly shortens the upload delay within the hierarchical federated learning framework, as shown in fig. 5, where the abscissa is the time a client needs to upload its model to its originally associated edge server, the ordinate is the time needed to upload to the nearest edge server, and the maximum value is normalized to 1. However, this method can cause the initially allocated clients to cluster at a subset of the edge servers by the time training completes, so those edge servers must receive extra clients; without bandwidth pre-allocation the clients then become blocked in the uploading stage, which ultimately affects model convergence and makes the curve unstable.
Asynchronous hierarchical federated learning based on bandwidth pre-allocation handles the allocation of the initial clients on top of the greedy algorithm, so the numbers of clients under the edge servers are relatively uniform when local training completes and the model convergence process is more stable. After collecting the parameters transmitted by clients with short upload delays, an edge server allocates more bandwidth to clients whose uploads are delayed, accelerating the upload of their model parameters so that the parameters of more clients can participate in edge aggregation in time.
The technical means disclosed in the solution of the invention are not limited to the technical means disclosed in the above embodiments, but also include technical solutions formed by any combination of the above technical features.
With the above preferred embodiments according to the invention as illustration, those skilled in the relevant art can make various changes and modifications without departing from the scope of the technical idea of the invention. The technical scope of the invention is not limited to the description and must be determined according to the scope of the claims.
Claims (7)
1. An asynchronous hierarchical joint learning training method based on bandwidth pre-allocation is characterized by comprising the following steps:
step 1: cloud server selecting client
In this step, the cloud server is responsible for coordinating and managing the joint learning task; the cloud server selects a corresponding number of clients from the range of each edge server according to a strategy, selects, when the number of selectable clients is more than the number of clients required, the clients participating in the next round of training according to the quality of the clients' local training data and their computing power, sends model parameters to the clients participating in the next round of training, and selects the clients participating in the next round of training in the edge server based on the computing performance and the data entropy weight of the clients;
the specific method for selecting the client comprises the following steps:
the transition probability from each edge server to another edge server in the training process can be represented by a matrix, the matrix of the client transition condition in the cloud server coverage area is obtained through actual investigation, and the number of clients which are initially selected by the cloud server under each edge server is calculated according to the matrix and the number of clients which need to be associated with each edge server, so that the number of the clients which are associated with each edge server when the training is finished is uniform;
let $M$ denote this transition matrix, whose element $e_{jl}$ in row $j$ and column $l$ represents the probability that a client transitions from the range of edge server $j$ to the range of edge server $l$ during the training process;

each client device participating in training is given an initial bandwidth allocation ratio $\beta_{device}$, so the residual bandwidth ratio of each edge server $j$ determines how many clients it can accept; the numbers of clients that the $|K|$ edge servers need to have associated in the next round are set as $(n_1, n_2, n_3, \ldots, n_{|K|})$, the cloud server initially allocates $X = (x_1, x_2, x_3, \ldots, x_{|K|})$ clients under the $|K|$ edge servers, and $XM = (y_1, y_2, y_3, \ldots, y_{|K|})$; the following optimization problem is obtained:

$$\min_X f(X) = \sum_{l=1}^{|K|} (y_l - n_l)^2 \quad \text{s.t.} \quad \sum_{j=1}^{|K|} x_j = \sum_{j=1}^{|K|} n_j, \qquad 0 \le x_j \le N_j \ \text{for } j = 1, \ldots, |K|,$$

where $N_j$ is the number of clients remaining under server $j$;
this is a general constrained problem solved by the multiplier method (PHR algorithm): given an initial point $x^0 = (n_1, n_2, n_3, \ldots, n_{|K|})$, a penalty factor $\sigma$, an amplification factor $c > 1$, a control error $\varepsilon > 0$ and a constant $\theta \in (0,1)$, set $k = 1$; the solution process is as follows:

Step 1: with $x^{k-1}$ as the initial point, solve the unconstrained problem $\min_x L_{\sigma_k}(x, \lambda^k)$ (the augmented Lagrangian), obtaining the optimal solution $x^k$;

Step 2: if the constraint violation satisfies $\|c(x^k)\| \le \varepsilon$, stop with $x^k$ as the optimal solution; otherwise go to Step 3;

Step 3: if $\|c(x^k)\| / \|c(x^{k-1})\| \ge \theta$, set $\sigma_{k+1} = c\,\sigma_k$; then go to Step 4;

Step 4: correct the multiplier vector:

$$(\lambda^{k+1})_1 = (\lambda^k)_1 - \sigma\, c_1(x^k),$$
$$(\lambda^{k+1})_i = \max\left[0, (\lambda^k)_i - \sigma\, c_i(x^k)\right], \quad i = 2, 3, \ldots, 2|K|+1,$$

set $k = k+1$ and go to Step 2;

$c_i(x)$ is the $i$-th constraint condition: the 1st constraint is the equality constraint and the 2nd through $(2|K|+1)$-th constraints are the inequality constraints; iterating according to this algorithm yields an approximate optimal solution, the precision $\varepsilon$ need not be very high, and the solved $X$ is rounded to integers; this solution is the optimal number of clients that the cloud server needs to select within the range of each edge server;
step2: client local training
After the selected client receives the model parameters, model training is locally executed, and each client executes an optimization algorithm according to the task type and the model architecture so as to update the parameters of the local model, wherein the process can be iterated for a plurality of times, and the model parameters are updated in each iteration;
step3: local parameter upload
After the client iterates the local model to the preset precision, carrying out association between the client and the edge server based on the distance between the geographic position of the client and the edge server containing the residual bandwidth;
after the client's local training reaches the predetermined accuracy, it uploads its model parameters to an edge server; under the framework of asynchronous hierarchical federated learning based on bandwidth pre-allocation, client $i$ is not initially associated with a corresponding edge server, but calculates the distance $s_{ij}$ to each edge server $j$ according to its current position when training finishes and, among the servers that still have residual bandwidth, associates with the closest one, i.e. the edge server $j$ with $\min s_{ij}$, and uploads the model parameters to that edge server;

the associated edge server allocates an initial bandwidth ratio $\beta_{ji}$ to the client, and the parameter upload rate of client $i$ is $r_{ij} = \beta_{ji} B_j \log_2\!\left(1 + p_{ij} h_i / N_0\right)$, where $B_j$ is the bandwidth of edge server $j$, $h_i$ is the channel gain of client $i$, $N_0$ is the noise power, and $p_{ij}$ is the power received by edge server $j$ from client $i$, which decreases with the distance $s_{ij}$ according to free-space path loss, $p_j$ being the maximum receiving power of edge server $j$ and $c$ a constant; the upload delay of the client's parameters is $t_{ij}^{up} = |d_i| / r_{ij}$, where $|d_i|$ is the size of the model parameters to be uploaded;
step4: edge aggregation
Model parameter aggregation is a federated learning method that produces a better global model by averaging, weighted averaging or other aggregation strategies over the parameters from different clients; unlike other hierarchical federated learning frameworks, the edge server does not need to collect the model parameters sent by all associated clients before aggregating, but rather, every time a period $T_{aggregation}$ passes, aggregates the model parameters already collected; based on the fact that collected model parameters may be outdated data generated by a previous round of training, the parameters are weighted by their local iteration start time, with earlier parameters receiving smaller weights; after edge aggregation, each edge server sends the aggregated model parameters to the cloud server;
step 5: cloud aggregation
Every time a period $T_{aggregation}$ elapses, the cloud server receives the model parameters sent by all edge servers at the same time and aggregates the parameter models sent by the edge servers; when the aggregated model reaches the preset precision, the asynchronous hierarchical joint learning training based on bandwidth pre-allocation stops, otherwise the method returns to step 1.
2. The method for training asynchronous hierarchical joint learning based on bandwidth pre-allocation according to claim 1, wherein in step 2 the client participating in training receives the model parameters transmitted from the cloud server, and client $i$ uses its own data set $D_i$ to solve for the model parameters $\omega$ that minimize its loss function, the loss function on client $i$ being the empirical loss $F_i(\omega) = \frac{1}{|D_i|} \sum_{k \in D_i} f(\omega; x_k, y_k)$; the client performs gradient descent over multiple iterations to gradually approach the optimized model parameters $\omega^* = \arg\min[F_i(\omega)]$ that minimize the loss function;

to achieve a predetermined local accuracy $\theta \in (0,1)$, the client needs to perform $L(\theta) = \mu \log(1/\theta)$ local iterations, where the constant $\mu$ depends on the size of the training task; the $n$-th local iteration is expressed as $\omega^{(n+1)} = \omega^{(n)} - \eta \nabla F_i(\omega^{(n)})$, iterated until the local accuracy $\theta$ is reached, at which point local training is completed, where $\eta$ is the learning rate;

the computation delay of model training on the client is determined by the number of iterations $L(\theta)$ required to reach the accuracy, the computing capability $P_i$ of the client, and the size $|D_i|$ of the training data set; the computation delay of client $i$ is expressed as $t_i^{cmp} = L(\theta)\,|D_i| / P_i$, where the computing capability $P$ is the number of samples processed by the client in a period of time; the size of the training data set is set in advance, and the computation delay is effectively reduced when a client with strong computing capability is selected.
3. The method for asynchronous hierarchical joint learning training based on bandwidth pre-allocation according to claim 1, wherein when the asynchronous aggregation mode is adopted, a staleness function is introduced as a hyperparameter to reduce the influence of outdated models on global model training, and the cloud server distributes the latest model to the clients participating in the next round of training; in edge aggregation, the staleness function of each client is related to the number of cloud aggregation rounds $n_{round}$ already performed when the client received the model and to the latest cloud aggregation round number $n_{CurrentRound}$, so that the staleness function of the parameters uploaded by client $i$ is expressed as $s_i = \lambda^{\,n_{CurrentRound} - n_{round}}$, where $\lambda \in (0,1)$ is a given decay coefficient, and the parameter update at edge server $j$ is the staleness-weighted aggregation of the parameters uploaded by $S_j$, the set of clients participating in this edge aggregation.
4. The method for training asynchronous hierarchical joint learning based on bandwidth pre-allocation according to claim 1, wherein in step 5, considering that the edge server and the cloud server have strong communication capability, the communication delay from the uploading parameters of the edge server to the cloud server is negligible, and the cloud server performs a round of cloud aggregation after receiving the parameters uploaded by the edge server;
also, because the quality of the parameters received in cloud aggregation is irregular, a staleness function needs to be introduced as a hyperparameter to reduce the influence of outdated models on global model training; the staleness function of each edge server's uploaded parameters is related to the overall staleness of the participating clients in that edge server's round of aggregation and is set to the average staleness of those clients; the model parameters on the cloud server are then updated as a weighted combination of the latest parameters $\omega_j$ uploaded by each edge server $j$, weighted by this staleness function and the amount of client data each edge server represents, where $D_s$ is the aggregate data set of the clients that have participated in training; the model parameters are aggregated until the aggregated model reaches the preset precision, whereupon the asynchronous hierarchical joint learning training based on bandwidth pre-allocation stops, otherwise the method returns to step 1.
5. The method for asynchronous hierarchical joint learning training based on bandwidth pre-allocation according to claim 4, wherein after the edge server receives the parameters uploaded by the client, the edge server sends the parameters to the cloud server for cloud aggregation.
6. The asynchronous hierarchical joint learning training method based on bandwidth pre-allocation according to claim 1, wherein the clients participating in the next round of training are selected in the edge server based on the computing performance and the data entropy weight of the clients;

the data quality of the data set on a client is defined by entropy weighting: extract $m$ samples from client $i$; let $k = 1, 2, \ldots, k_i$ index the attribute features on client $i$ and $j = 1, 2, \ldots, m$ index the samples, and let $\tilde{x}_{jk}$ be the normalized value of data attribute $k$ for sample $j$; the information entropy of each data attribute is $E_k = -\frac{1}{\ln m} \sum_{j=1}^{m} p_{jk} \ln p_{jk}$ with $p_{jk} = \tilde{x}_{jk} / \sum_{j=1}^{m} \tilde{x}_{jk}$, and the entropy weight assigned to data attribute $k$ is $w_k = (1 - E_k) / \sum_{k=1}^{k_i} (1 - E_k)$, the data quality $Q$ being the entropy-weighted score of the normalized data attributes;

under each server $j$, clients are selected based on computing power $P$ and local model quality $Q$; the $P$ and $Q$ of the same client are taken to change little over multiple training rounds, and the overall strategy is that each edge server preferentially selects, from the set $N_{selected}$ of clients that have participated in training, clients with larger comprehensive capacity $\Phi = \gamma P + (1-\gamma) Q$, where $\gamma$ is a weight coefficient, and selects the remaining needed clients from the total client set $N_{sum}$ within the range of edge server $j$; the specific steps, with parameter $\delta$, are as follows:

s1: update $|N_{selected}|$; if $|N_{selected}|$ is greater than the set threshold $\theta_{client}$, add the $\delta |N_{selected}|$ clients in $N_{selected}$ with the largest $\Phi$ values to $N_{best}$, then select from $N_{best}$ the clients within the range of edge server $j$;

s2: if the required number has not been reached, randomly select the remaining clients from $N_{sum} - N_{selected}$ among those satisfying the condition $s_{ji} < R_j$, compute their $\Phi$, and add them to $N_{selected}$; all clients in $N_{selected}$ are the clients participating in the next round of training.
7. The method for asynchronous hierarchical joint learning training based on bandwidth pre-allocation according to claim 6, wherein after the cloud server has selected the clients and an edge server has performed edge aggregation and parameter uploading, the residual bandwidth of the other edge servers containing residual bandwidth is distributed by the cloud server to the clients participating in the next round of training;

after the cloud server selects the clients, the residual bandwidth ratio of all edge servers is set to 0, and after a set $t_{overtime}$ rounds of cloud aggregation, bandwidth that has been allocated but not released is automatically released and used for the parameter uploading of the next round of clients.
Priority Application (1)
- CN202311172306.0A, filed 2023-09-12, granted as CN117221122B: Asynchronous layered joint learning training method based on bandwidth pre-allocation (status: Active)

Publications (2)
- CN117221122A, published 2023-12-12
- CN117221122B, granted 2024-02-09
Patent Citations (7)
- CN111447083A (2020-07-24): Federal learning framework under dynamic bandwidth and unreliable network and compression algorithm thereof
- CN113469325A (2021-10-01): Layered federated learning method, computer equipment and storage medium for edge aggregation interval adaptive control
- CN115526333A (2022-12-27): Federal learning method for dynamic weight under edge scene
- CN115766475A (2023-03-07): Semi-asynchronous power federal learning network based on communication efficiency and communication method thereof
- CN116484976A (2023-07-25): Asynchronous federal learning method in wireless network
- CN116681126A (2023-09-01): Asynchronous weighted federation learning method capable of adapting to waiting time
- CN116702881A (2023-09-05): Multilayer federal learning scheme based on sampling aggregation optimization

Family Cites Families (4)
- US11244242B2 (2022-02-08, Intel Corporation): Technologies for distributing gradient descent computation in heterogeneous multi-access edge computing (MEC) networks
- US20210073639A1 (2021-03-11, Google LLC): Federated Learning with Adaptive Optimization
- US20230196092A1 (2023-06-22, Beijing Wodong Tianjun Information Technology Co., Ltd.): System and method for asynchronous multi-aspect weighted federated learning
- US20230214642A1 (2023-07-06, Google LLC): Federated Learning with Partially Trainable Networks

Non-Patent Citations (1)
- Zhou Heng, Li Lijun, Dong Zengshou. Joint optimization of multi-task resources in edge computing based on asynchronous-reward deep deterministic policy gradient. Application Research of Computers, Vol. 40, No. 5, pp. 1491-1496.
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant