CN117221122A - Asynchronous layered joint learning training method based on bandwidth pre-allocation - Google Patents

Asynchronous layered joint learning training method based on bandwidth pre-allocation

Info

Publication number
CN117221122A
Authority
CN
China
Prior art keywords
client
training
clients
edge
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311172306.0A
Other languages
Chinese (zh)
Other versions
CN117221122B (en)
Inventor
杨健
周焱
夏友旭
张世召
李飞扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202311172306.0A priority Critical patent/CN117221122B/en
Publication of CN117221122A publication Critical patent/CN117221122A/en
Application granted granted Critical
Publication of CN117221122B publication Critical patent/CN117221122B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention discloses an asynchronous hierarchical joint learning training method based on bandwidth pre-allocation, which comprises the following steps: when training starts, the cloud server selects a corresponding number of clients within the range of each edge server and distributes the latest model parameters to them; each client iterates on the model parameters multiple times using its own data, and when its training ends selects, according to its current position, the nearest edge server that still has residual bandwidth and uploads its model parameters to it, with resources allocated reasonably according to the bandwidth situation to accelerate uploading; each edge server performs a round of edge aggregation at fixed time intervals T_aggregation and uploads the aggregated parameters to the cloud server for a round of cloud aggregation; after cloud aggregation, the cloud server performs the selection of the next round of clients. The invention not only adapts to changes in the data distribution among participants in dynamic scenarios, but also makes full use of limited communication resources, thereby improving the training effect.

Description

Asynchronous layered joint learning training method based on bandwidth pre-allocation
Technical Field
The invention relates to an asynchronous layered joint learning training method based on bandwidth pre-allocation, and belongs to the technical field of federated learning model training.
Background
The advent of digital technology has driven significant advances in a variety of revolutionary technologies such as big data and artificial intelligence, and machine-learning-driven mobile applications are reshaping many aspects of modern life. However, machine learning training tasks typically require large amounts of data from various terminals with different computing capabilities. The traditional approach uploads the data to a remote cloud server for processing, but this faces challenges such as privacy invasion, network congestion, and transmission delay, preventing full utilization of the data.
In 2016, Google proposed the concept of federated learning, aiming to alleviate network bandwidth constraints and address the vulnerability of data privacy. Federated learning is a collaborative training and sharing method that eliminates the need to access raw data and conforms to the principles of decentralized collection and data minimization: only updates are shared with a central server or coordinator, while the raw data remains securely stored on the individual devices and cannot be accessed directly. Federated learning enables local model training on each device, uploading only aggregated updates or model parameters to the central server.
Hierarchical federated learning is a federated learning framework in which edge servers, typically deployed on base stations, act as intermediate stations between mobile devices and cloud servers. The edge servers aggregate the local models received from neighboring devices. By enabling the cloud server to effectively handle data from more terminals, hierarchical federated learning successfully addresses the challenges of uploading data to the cloud.
However, the standard hierarchical federated learning framework employs synchronous aggregation of the global model, in which the server waits for all client parameters to be uploaded before proceeding to global aggregation. This synchronous approach produces a "straggler effect": the delay of the global aggregation is determined by the slowest client to upload its parameters, increasing the delay of the overall training process. In addition, the delayed aggregation of client parameters hinders the convergence of the global model, possibly affecting the accuracy and performance of the trained model.
Therefore, there is a need for an asynchronous hierarchical federated learning framework in which servers do not wait for all client parameters before global aggregation, so as to shorten the federated learning training delay.
Disclosure of Invention
In order to solve the problems, the invention discloses an asynchronous hierarchical joint learning training method based on bandwidth pre-allocation, which has the following specific technical scheme:
an asynchronous hierarchical joint learning training method based on bandwidth pre-allocation comprises the following steps:
step1: cloud server selecting client
In this step, the cloud server is responsible for coordinating and managing the joint learning task. The cloud server selects a corresponding number of clients within the range of each edge server according to a strategy; if the number of selectable clients exceeds the number required, the best clients are selected to participate in the next round of training according to the quality of their local training data and their computing power, and the model parameters are sent to these clients;
step2: client local training
After the selected clients receive the model parameters, model training is executed locally: each client runs an optimization algorithm suited to its task type and model architecture to update the parameters of its local model; this process can be iterated many times, with the model parameters updated in each iteration;
step3: local parameter upload
After iterating the local model to the preset precision, the client selects, according to its current geographic position, the nearest edge server that still has residual bandwidth to associate with, and then uploads its local model parameters to that edge server;
step4: edge aggregation
Model parameter aggregation is a federated learning method that produces a better global model by averaging, weighted averaging, or other aggregation strategies over the parameters from different clients. Unlike other hierarchical federated learning frameworks, the edge server does not need to collect the model parameters sent by all associated clients before aggregating; instead, every time a period T_aggregation elapses, it aggregates the model parameters collected so far. Considering that the collected model parameters may be outdated data produced by earlier training, the model parameters are weighted according to the time at which they began to be iterated locally, with earlier parameters given smaller weights. After edge aggregation, each edge server sends the aggregated model parameters to the cloud server;
step 5: cloud aggregation
Every time a period T_aggregation elapses, the cloud server receives the model parameters sent by all edge servers at almost the same time and aggregates them; if the aggregated model reaches the specified precision, the asynchronous hierarchical joint learning training based on bandwidth pre-allocation stops, otherwise the method returns to step 1.
Further, in step 1: because clients move on a large scale during training, the movement of the clients selected by the cloud server in each round cannot be known in advance, and the clients end up associated with different edge servers when training finishes. Because the bandwidth of each edge server is limited, too many associated clients would prevent some clients from successfully uploading their model parameters, while too few would degrade the quality of edge model aggregation; therefore a corresponding number of clients must be selected within the range of each edge server when clients are selected before training;
the specific method for selecting the clients comprises the following steps:
the movement of clients during training follows historical statistical rules. For example, for a client starting in a residential area within the range of edge server A, the probabilities that during training it stays in place, goes to a business district within the range of edge server B, or goes to work at a company within the range of edge server C can all be found as a probability distribution. The transition probability from the range of each edge server to that of every other edge server during training can therefore be represented by a matrix; the matrix of client transitions within the cloud server's scope is obtained by actual survey. From this matrix and the number of clients that need to be associated with each edge server, the cloud server calculates the number of clients to select initially under each edge server, so that the numbers of clients associated with the edge servers are uniform when training ends;
in the transition matrix M, the element in row j, column l represents the probability that a client transitions from the range of edge server j to the range of edge server l during training;
each client participating in training has an initial bandwidth allocation ratio β_device. According to the residual bandwidth allocation ratio of each edge server j, the cloud server determines the numbers (n_1, n_2, n_3, ..., n_|K|) of clients to be associated with the |K| edge servers in the next round. Let x = (x_1, x_2, x_3, ..., x_|K|) be the numbers of clients the cloud server initially allocates under the |K| edge servers, and let xM = (y_1, y_2, y_3, ..., y_|K|). The following optimization problem is obtained:
min over x of Σ_{j=1..|K|} (y_j − n_j)²
s.t. Σ_{j=1..|K|} x_j = Σ_{j=1..|K|} n_j and 0 ≤ x_j ≤ N_j for j = 1, ..., |K|, where N_j is the number of clients remaining under server j;
as a general constrained problem, this is solved by the multiplier method (PHR algorithm): given an initial point x_0 = (n_1, n_2, n_3, ..., n_|K|), a penalty factor σ, an amplification factor c > 1, a control error ε > 0, and a constant θ ∈ (0, 1), let k = 1 and proceed as follows:
Step 1: with x_{k−1} as the initial point, solve the unconstrained problem of minimizing the augmented Lagrangian function φ(x, λ_k, σ_k), obtaining the optimal solution x_k;
Step 2: if ‖c(x_k)‖ ≤ ε, stop; x_k is the optimal solution; otherwise go to Step 3;
Step 3: if ‖c(x_k)‖ / ‖c(x_{k−1})‖ ≤ θ, go to Step 4; otherwise let σ_{k+1} = c·σ_k and go to Step 4;
Step 4: correct the multiplier vector:
(λ_{k+1})_1 = (λ_k)_1 − σ·c_1(x_k)
(λ_{k+1})_i = max[0, (λ_k)_i − σ·c_i(x_k)], i = 2, 3, ..., 2|K|+1,
then let k = k + 1 and go to Step 2;
here c_i(x) is the i-th constraint: the 1st is the equality constraint and the 2nd through (2|K|+1)-th are the inequality constraints. Iterating the algorithm yields an approximate optimal solution; the precision ε need not be very high, and the solved x is rounded to integers. This optimal solution is the optimal number of clients the cloud server needs to select within the range of each edge server.
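For illustration only, the following Python sketch runs this allocation step end to end. The objective min Σ_j ((xM)_j − n_j)², the equality constraint Σ_j x_j = Σ_j n_j, and the bounds 0 ≤ x_j ≤ N_j are taken as written above; the transition matrix M, the target counts n, the remaining counts N_remaining, and the use of BFGS as the inner unconstrained solver are all assumptions of the sketch.

```python
# Minimal PHR (augmented Lagrangian) sketch for the client-allocation problem.
import numpy as np
from scipy.optimize import minimize

M = np.array([[0.7, 0.2, 0.1],        # hypothetical client transition matrix:
              [0.1, 0.8, 0.1],        # row j, column l = P(move from j to l)
              [0.2, 0.2, 0.6]])
n = np.array([10.0, 10.0, 10.0])      # clients each server should end up with
N_remaining = np.array([25.0, 30.0, 20.0])  # clients still available per server

def f(x):                             # objective: match expected final counts
    y = x @ M
    return np.sum((y - n) ** 2)

def constraints(x):
    # c_1 is the equality constraint; c_2 .. c_{2|K|+1} are inequalities (>= 0)
    eq = np.array([x.sum() - n.sum()])
    ineq = np.concatenate([x, N_remaining - x])
    return eq, ineq

def phr(x0, sigma=2.0, c=2.0, eps=1e-3, theta=0.5, max_outer=50):
    K = len(x0)
    lam = np.zeros(1 + 2 * K)         # multipliers: 1 equality + 2K inequalities
    x, viol_prev = np.asarray(x0, float), np.inf
    for _ in range(max_outer):
        def aug_lagrangian(z):
            eq, ineq = constraints(z)
            # PHR form: quadratic penalty on the equality term and on the
            # clipped inequality terms
            le = -lam[0] * eq[0] + 0.5 * sigma * eq[0] ** 2
            li = np.sum(np.maximum(0.0, lam[1:] - sigma * ineq) ** 2
                        - lam[1:] ** 2) / (2.0 * sigma)
            return f(z) + le + li
        x = minimize(aug_lagrangian, x, method="BFGS").x   # Step 1
        eq, ineq = constraints(x)
        viol = np.sqrt(eq[0] ** 2 + np.sum(np.minimum(ineq, 0.0) ** 2))
        if viol <= eps:               # Step 2: feasible enough, stop
            break
        if viol > theta * viol_prev:  # Step 3: slow progress, amplify penalty
            sigma *= c
        lam[0] -= sigma * eq[0]                            # Step 4 (equality)
        lam[1:] = np.maximum(0.0, lam[1:] - sigma * ineq)  # Step 4 (inequalities)
        viol_prev = viol
    return np.rint(x).astype(int)     # round to integer client counts

print(phr(n.copy()))
```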
Further, in step 2, the clients participating in training receive the model parameters delivered by the cloud server. Client i uses its own data set D_i to solve for the optimal model parameters ω that minimize the loss function; the loss function on client i is expressed as F_i(ω) = (1/|D_i|) Σ_{(x_n, y_n)∈D_i} f(ω; x_n, y_n), where f(·) is the loss on a single sample, and the objective is to find the optimal model parameters ω* = argmin[F_i(ω)];
this problem can hardly be solved in closed form, so the client performs gradient descent over multiple iterations to approach the optimal solution gradually. To reach a predetermined local accuracy θ ∈ (0, 1), the client needs L(θ) = μ·log(1/θ) rounds of local iteration, where the constant μ depends on the size of the training task; the n-th round of local iteration is expressed as ω^(n+1) = ω^(n) − η·∇F_i(ω^(n)), iterating until the local accuracy θ is reached, at which point local training is completed, where η is the learning rate;
the computation delay of model training on a client is related to the number of iterations L(θ) required to reach the accuracy, the computing capability P_i of the client, and the size |D_i| of the training data set, and is expressed as t_i^cmp = L(θ)·|D_i| / P_i, where the computing power P is the number of samples processed by the client per unit time. The size of the training data set can be set in advance, so selecting clients with strong computing capability effectively reduces the computation delay.
Further, in step 3, after a client's local training reaches the predetermined accuracy, the model parameters are uploaded to an edge server immediately. Under the framework of asynchronous hierarchical federated learning based on bandwidth pre-allocation, client i is not associated with a corresponding edge server at the start; instead, when training ends it calculates its distance s_ij to each edge server j from its current position, and among the servers that still have residual bandwidth it associates with the closest one, the edge server j minimizing s_ij, and uploads its model parameters to that edge server;
the initial bandwidth allocation ratio of the associated edge server to client i is β_ji, and the parameter upload rate of client i is r_i = β_ji·B_j·log2(1 + p_ji·h_i / N_0), where B_j is the bandwidth of edge server j, h_i is the channel gain of client i, N_0 is the noise power, and p_ji is the received power of client i at edge server j, expressed as p_ji = c·p_j / s_ij², where p_j is the maximum receiving power of edge server j and c is a constant. The upload delay of the client's parameters is t_i^up = |d_i| / r_i, where |d_i| is the size of the model parameters to be uploaded.
Further, in step 4: in a synchronous aggregation algorithm, the cloud server must receive the parameters of all clients participating in training before performing a round of global aggregation, so the delay of global aggregation is determined by the last client to finish training and upload its parameters, causing the "straggler effect". The local data set sizes and computing performance of the clients differ greatly, and the clients initially selected by the cloud server are not associated with edge servers, so a synchronous edge aggregation scheme might wait indefinitely for clients that never upload to the corresponding server, seriously affecting the model training progress. Therefore each edge server adopts an edge asynchronous aggregation mode: every time a period T_aggregation elapses, it aggregates the model parameters collected so far and uploads the aggregated parameters to the cloud server.
Further, when the asynchronous aggregation mode is adopted, the model parameters uploaded by clients may be outdated. The staleness of each client in edge aggregation is related to the number of cloud aggregation rounds n_round already performed when the client received the model and to the latest number of cloud aggregation rounds n_CurrentRound, so the staleness function of the parameters uploaded by client i is expressed as s_i = λ^(n_CurrentRound − n_round), where λ ∈ (0, 1) is a given decay coefficient. The parameter update at edge server j is expressed as ω_j = Σ_{i∈S_j} s_i·|D_i|·ω_i / Σ_{i∈S_j} s_i·|D_i|, where S_j is the set of clients participating in this edge aggregation.
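A minimal sketch of this staleness-weighted edge aggregation follows, with the staleness-times-data-size weighting as written above; the client records are illustrative.

```python
# Staleness-weighted edge aggregation over the clients collected so far.
import numpy as np

lam = 0.6                 # decay coefficient lambda in (0, 1)
n_current_round = 7       # latest cloud aggregation round

# each entry: (model params, local data size |D_i|, round n_round when received)
S_j = [
    (np.array([1.0, 2.0]), 120, 7),   # fresh parameters, weight lam**0
    (np.array([0.5, 1.5]), 200, 5),   # two rounds stale, weight lam**2
]

def edge_aggregate(clients):
    weights = np.array([lam ** (n_current_round - n_round) * size
                        for _, size, n_round in clients])
    weights = weights / weights.sum()             # normalize the weights
    params = np.stack([w for w, _, _ in clients])
    return weights @ params                       # weighted average

print(edge_aggregate(S_j))
```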
Further, in step 5, considering that the edge servers and the cloud server have strong communication capability, the communication delay of an edge server uploading parameters to the cloud server is negligible, and the cloud server performs a round of cloud aggregation immediately after receiving the parameters uploaded by the edge servers;
also, because the quality of the parameters received in cloud aggregation is uneven, the staleness function needs to be introduced as a hyperparameter to reduce the influence of outdated models on global model training. The staleness function of the parameters uploaded by each edge server is related to the overall staleness of the clients participating in that edge server's round of aggregation, and can simply be set as the average τ_j = (1/|S_j|) Σ_{i∈S_j} s_i. The update of the model parameters on the cloud server is expressed as ω = Σ_j τ_j·(|D_j|/|D_s|)·ω_j, where ω_j denotes the model parameters most recently uploaded by edge server j, |D_j| is the total data size under edge server j, and D_s is the aggregate data set of the clients participating in training. If the aggregated model reaches the specified precision, the asynchronous hierarchical joint learning training based on bandwidth pre-allocation stops; otherwise, return to step 1.
Furthermore, when the communication capacity of the edge server and the cloud server is strong, the edge server does not perform edge aggregation after receiving the parameters uploaded by the client, but directly sends the parameters to the cloud server for cloud aggregation, so that the influence of the staleness function on the model parameter weight is more accurate.
Further, when selecting clients the cloud server needs to consider the computing performance of the clients to reduce the local computation delay, and the data quality of the data sets on the clients, which lets the model achieve a better local training effect, since representative and diverse data improve model training. To this end the following technical method is adopted: when the number of clients able to participate in training exceeds the number required in a round, the computing performance of the clients and the entropy weights of their data are considered together, and the best clients within the range of each edge server are selected to participate in the next round of training;
the data quality of the data set on a client is defined by entropy weighting: m samples are extracted from client i; k = 1, 2, ..., k_i indexes the attribute features of client i and l = 1, 2, ..., m indexes the samples; x̃_lk is the normalized value of data attribute k on sample l; w_k = (1 − e_k) / Σ_k (1 − e_k) is the entropy weight given to data attribute k; and e_k = −(1/ln m)·Σ_{l=1..m} p_lk·ln p_lk, with p_lk = x̃_lk / Σ_l x̃_lk, is the information entropy of each data attribute;
under each server j, the required number of clients with the best computing power P and local model quality Q are chosen; for a client that has already participated in many training rounds, P and Q are discounted. The overall strategy is that each edge server preferentially selects, from the set N_selected of clients that have already trained, clients with larger comprehensive capability Φ = γ·P + (1 − γ)·Q, where γ is a weight coefficient, and selects the remaining required clients from the total client set N_sum within the range of edge server j. With a parameter δ, the specific steps are as follows (see also the sketch after this list):
S1: update |N_selected|; if |N_selected| is greater than the set threshold θ_client, add the δ·|N_selected| clients with the largest Φ values in N_selected to N_best, then select from N_best all clients within the range of edge server j, up to the required number;
S2: if the required number for edge server j has not been reached, then from N_sum − N_selected, among the clients satisfying s_ji < R_j, i.e., within the range of edge server j, randomly select the remaining number of clients, compute their Φ, and add them to N_selected. All clients in N_selected are the clients participating in the next round of training.
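A minimal sketch of the entropy-weight quality score and the Φ ranking follows; the entropy-weight formulas are the standard entropy weight method, the rescaling of P to be comparable with Q is an assumption of the sketch, and all client data is synthetic.

```python
# Entropy-weight data quality Q and comprehensive capability Phi ranking.
import numpy as np

def data_quality(samples):
    """samples: (m, k) matrix of m samples by k attribute features; returns Q."""
    lo, hi = samples.min(0), samples.max(0)
    x = (samples - lo) / (hi - lo + 1e-12)             # normalized values x~_lk
    p = x / (x.sum(0) + 1e-12)                         # per-attribute distribution
    m = len(samples)
    e = -(p * np.log(p + 1e-12)).sum(0) / np.log(m)    # information entropy e_k
    w = (1 - e) / (1 - e).sum()                        # entropy weights w_k
    return float((x @ w).mean())                       # aggregate quality score Q

rng = np.random.default_rng(1)
clients = [{"id": i,
            "P": rng.uniform(0.2, 1.0),                # computing power, rescaled
            "Q": data_quality(rng.normal(size=(50, 4)))}
           for i in range(20)]

gamma = 0.5                                            # weight coefficient gamma
phi = lambda c: gamma * c["P"] + (1 - gamma) * c["Q"]  # comprehensive capability
best = sorted(clients, key=phi, reverse=True)[:10]     # pick 10 for this server
print([c["id"] for c in best])
```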
Further, after the cloud server has selected the client counts and an edge server has performed edge aggregation and parameter uploading, that edge server is certain to have residual bandwidth, and other edge servers may also have residual bandwidth, because some clients with stronger computing power that are closer to the edge server have already completed local training, uploaded their parameters, and released bandwidth; this bandwidth is also distributed by the cloud server to the clients participating in the next round of training;
after the cloud server selects the clients, the residual bandwidth ratio of all edge servers is set to 0. Considering that some clients may fail for a long time to complete training and release bandwidth due to anomalies, and that their model parameters would be outdated by the time they finish, it is set that after n_overtime rounds of cloud aggregation, bandwidth that was allocated but never released is automatically released for the parameter uploading of the next round of clients.
The beneficial effects of the invention are as follows:
Compared with the prior art, after a client completes training it associates, according to its current position, with the nearest edge server that still has residual bandwidth, shortening the model upload delay; to avoid uneven client allocation, before each round of training the number of clients initially allocated within the range of each edge server is calculated from each edge server's residual bandwidth and the client transition matrix; when selecting clients, those whose data (scored by entropy weight) help model training and whose computing power is strong are selected, shortening the local training delay; and during uploading, uploads are accelerated using the residual bandwidth of the edge servers to further shorten the upload delay.
Drawings
Figure 1 is a training flow chart of the present invention,
figure 2 is a cloud interaction diagram of the present invention,
figure 3 is a graph showing model accuracy in test set for comparative experiments in the examples,
figure 4 is a graph showing the model loss rate in the test set for the comparative experiment in the example,
FIG. 5 is a comparison of the time required for a user to upload data in the greedy algorithm and the hierarchical federal learning algorithm in an embodiment.
Detailed Description
The invention is further elucidated below in connection with the drawings and the detailed description. It should be understood that the following detailed description is merely illustrative of the invention and is not intended to limit the scope of the invention.
As can be seen with reference to fig. 1-2, the process of the present invention is:
1) Cloud server selecting client
A corresponding number of clients are selected under each edge server and the latest model parameters are distributed to them; the clients then carry out local training using their local data sets.
2) Client local training
In this step, the clients participating in training receive the model parameters delivered by the cloud server and use their local data sets to solve for the optimal model parameters.
3) Local parameter upload
The client uploads its model parameters to an edge server immediately after its local training reaches the predetermined precision.
4) Edge aggregation
In a synchronous aggregation algorithm, the server must receive the parameters of all clients participating in training before performing a round of global aggregation, so the delay of global aggregation is determined by the last client to finish training and upload its parameters, causing the "straggler effect". In the scenario of the invention the clients' local data set sizes and computing performance differ greatly, and the clients initially selected by the cloud server are not associated with edge servers, so a synchronous edge aggregation scheme might wait indefinitely for clients that never upload to the corresponding server, seriously affecting the model training progress. Therefore, unlike other hierarchical federated learning frameworks, the edge server in the invention does not need to collect the model parameters sent by all associated clients before aggregating; instead, every time a period T_aggregation elapses it aggregates the model parameters collected so far. Considering that the collected model parameters may be outdated data produced by earlier training, the model parameters are weighted according to the time at which they began to be iterated locally, with earlier parameters given smaller weights. After edge aggregation, each edge server sends the aggregated model parameters to the cloud server;
when the asynchronous aggregation mode is adopted, model parameters uploaded by the client sides can be outdated, so that the old function of each client side when edge aggregation is carried out and the number n of cloud aggregation rounds which are carried out when the client side receives the model are adopted round And the latest number of rounds of cloud aggregation n CurrentRound Correlation, so that the corresponding stale function of the parameter uploaded by client i is expressed asWhere λε (0, 1) is the given decay coefficient, the parameter update at edge server j is expressed as: />Wherein S is j Represented as a set of clients participating in this edge aggregation.
5) Cloud aggregation
Considering that the edge servers and the cloud server have strong communication capability, the communication delay of an edge server uploading parameters to the cloud server is negligible, and the cloud server performs a round of cloud aggregation immediately after receiving the parameters uploaded by the edge servers. Also, because the quality of the parameters received in cloud aggregation is uneven, a staleness function is introduced as a hyperparameter to reduce the influence of outdated models on global model training. The staleness function of the parameters uploaded by each edge server is related to the overall staleness of the clients participating in that edge server's round of aggregation, and can simply be set as the average τ_j = (1/|S_j|) Σ_{i∈S_j} s_i. The update of the model parameters on the cloud server is expressed as ω = Σ_j τ_j·(|D_j|/|D_s|)·ω_j, where ω_j denotes the model parameters most recently uploaded by edge server j, |D_j| is the total data size under edge server j, and D_s is the aggregate data set of the clients participating in training. If the aggregated model reaches the specified precision, the asynchronous hierarchical joint learning training based on bandwidth pre-allocation stops; otherwise, the cloud server selects the next round of clients and distributes the latest parameters to them.
If the communication capacity of the edge servers and the cloud server is strong enough, the edge server may skip edge aggregation after receiving the parameters uploaded by the clients and directly send them to the cloud server for cloud aggregation, so that the influence of the staleness function on the model parameter weights is more accurate.
In addition, after the cloud server has selected the client counts and an edge server has performed edge aggregation and parameter uploading, that edge server is certain to have residual bandwidth, and other edge servers may also have residual bandwidth, because some clients with high computing power that are close to the edge server have already finished local training, uploaded their parameters, and released bandwidth; this bandwidth is also distributed by the cloud server to the clients participating in the next round of training. The residual bandwidth ratio of all edge servers is set to 0 after the cloud server selects the clients. Considering that some clients may fail for a long time to complete training and release bandwidth due to anomalies, and that their model parameters would be outdated once they finish, bandwidth that was allocated but not released after a period of time is automatically released for the parameter uploading of the next round of clients.
To verify the application of this patent, specific experiments are given below:
experimental environment:
the experiment considered training under a distributed framework consisting of one cloud server, 5 edge servers and 250 clients to be trained. Each edge server comprises 50 clients, 10 clients under each edge are randomly selected to participate in training in the federal learning of the edge level, and parameters of each edge server in the federal learning of the cloud level participate in cloud aggregation.
In local training, a LeNet-style convolutional neural network is used as the model, verified on the MNIST data set. The model initializes two convolutional layers, one dropout layer, and two fully connected layers. The first convolutional layer has 3 input channels, 10 output channels, and a convolution kernel of size 5. The second convolutional layer has 10 input channels, 20 output channels, and a kernel of size 5. The first fully connected layer has 320 input nodes and 50 output nodes; the second has 50 input nodes and 10 output nodes. The input data passes through the first convolutional layer, then max pooling and ReLU activation; it then passes through the second convolutional layer and the dropout layer, again with max pooling and ReLU activation. The data is then flattened and passed through the two fully connected layers, returning the output.
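For concreteness, a sketch of this model in PyTorch is given below (an illustration, not the patent's code). Note that the description specifies 3 input channels even though MNIST images are single-channel; with 28x28 inputs, the flattened feature size is exactly the 320 expected by the first fully connected layer.

```python
# LeNet-style model exactly as described in the text above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LeNetVariant(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.dropout = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))            # conv1 -> pool -> ReLU
        x = F.relu(F.max_pool2d(self.dropout(self.conv2(x)), 2))
        x = x.flatten(1)                                      # -> (batch, 320)
        return self.fc2(F.relu(self.fc1(x)))

out = LeNetVariant()(torch.randn(4, 3, 28, 28))               # 28x28 inputs
print(out.shape)                                              # torch.Size([4, 10])
```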
In this embodiment, 1000 labeled samples are randomly selected from the 60000 training samples and allocated to the cloud; 5000 samples are divided equally into 5 parts and allocated to the 5 edge servers, used to test the accuracy of the model at the edge and at the cloud; and the remaining 54000 samples are allocated equally to the 250 clients. The samples are distributed in an independent and identically distributed manner, uniformly at random, when the edge servers and clients receive their shares.
The embodiment simulates the time to upload model parameters to the edge server after local training completes. The distances from all clients to the edge servers are normalized to (0, 1); considering that the signal strength depends only on free-space path loss, and setting the signal-to-noise ratio of the furthest client within an edge server's range to 5, the upload delay can be expressed as t^up = |d_i| / (β_ji·B·log2(1 + 5 / s_ij²)).
In the experiment, the learning rate on the MNIST data set is 0.01 and decays each round to 0.995 times its previous value; the model is iterated for 40 rounds in each round of local training.
Comparison experiment setting:
Hierarchical federated learning (control): before training starts and after each round of cloud aggregation, 10 clients within the range of each of the 5 edge servers are randomly selected to participate in training. Clients may move during training; when local training completes, a client uploads its model parameters to its associated edge server if it is still within that server's range. The edge server performs a round of edge aggregation once it has collected from all clients, or once the maximum waiting time is exceeded, and uploads the aggregated model parameters to the cloud server. The cloud server performs a round of cloud aggregation after collecting the parameters sent by all edge servers.
Client associates with the nearest server when uploading model parameters (control): before training starts and after each round of cloud aggregation, 10 clients are randomly selected within the range of each edge server for training. Clients may move during training; after local training ends, a client selects the closest edge server to associate with and uploads its model parameters. After a fixed time, each edge server performs edge aggregation on the model parameters already uploaded; in the experiment, parameters not uploaded in time are put into a buffer and join subsequent edge aggregations according to their upload delay, with the parameters in the buffer corrected each round by the staleness function. The cloud server receives the model parameters uploaded by all edge servers, together with the numbers of participating clients, almost simultaneously, and performs cloud aggregation.
Asynchronous hierarchical federated learning based on bandwidth pre-allocation: before training starts and after each round of cloud aggregation, the number of clients to select within the range of each edge server is calculated from each edge server's residual bandwidth and the transfer matrix, after which the residual bandwidth of all edge servers is set to 0. Clients may move during training; after local training ends, the upload delay is calculated from the residual bandwidth and distance of each edge server, and the edge server with the smallest delay is selected for association and upload of the model parameters (allocated bandwidth = min(self-defined maximum allocatable bandwidth, residual bandwidth + 1); after an upload completes, the edge server's residual bandwidth increases by 1). If a client fails to upload to its pre-allocated edge server within a period of time, the edge server automatically releases the bandwidth. After a certain time, each edge server performs edge aggregation on the model parameters already uploaded and uploads them to the cloud server. The cloud server receives the model parameters uploaded by all edge servers, together with the numbers of participating clients, and performs cloud aggregation.
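The following sketch shows one possible reading of the bandwidth bookkeeping quoted above: a finishing client is granted min(maximum allocatable bandwidth, residual + 1) units, its own pre-allocated unit counting as the +1, and the units return to the pool when its upload completes or times out. These pool semantics are an interpretation, not a mechanism quoted from the text.

```python
# Illustrative bandwidth accounting for one edge server (units are abstract).
MAX_ALLOC = 3

def grant(residual, max_alloc=MAX_ALLOC):
    # the client's own pre-allocated unit counts as the "+1"; any extra
    # units are taken out of the server's residual pool
    units = min(max_alloc, residual + 1)
    return units, residual - (units - 1)

def release(residual, units):
    # the upload finished (or timed out): all granted units return to the pool
    return residual + units

residual = 2
units, residual = grant(residual)
print(units, residual)     # 3 units granted, pool drops to 0
residual = release(residual, units)
print(residual)            # pool is 3 once the upload completes
```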
Conclusion analysis:
The experimental results are shown in fig. 3 and fig. 4. Since the duration of each aggregation round in hierarchical federated learning is not fixed, the experiment uses training time as the abscissa, taking the duration of one round of cloud aggregation in the algorithm of this embodiment as 1 time unit; the ordinates are, respectively, the accuracy and the loss value of the model on the test set after cloud aggregation at the corresponding time.
As can be seen from the figures, although the algorithm of this embodiment adopts asynchronous aggregation, it is not inferior to the synchronous scheme in model convergence and stability. In this scenario, reaching an accuracy of 0.9 takes hierarchical federated learning 113 time units, while the algorithm of this embodiment needs only 33 time units, converges quickly, and reaches an accuracy of 0.95 at 108 time units, achieving a better result. The experiment in this embodiment does not take the delay of local iteration into account, so the actual effect would not be as pronounced.
Under this scenario, because clients move during training, the hierarchical federated learning algorithm cannot upload model parameters to the corresponding edge servers on time; with a maximum waiting time set, some clients end up not participating in the aggregation of the edge models, which causes the unstable curves and the poor final aggregation effect.
Having the client associate with the nearest server when uploading model parameters greatly shortens the upload delay within the hierarchical federated learning framework, as shown in fig. 5, where the abscissa is the time required for a client to upload its model to its associated edge server, the ordinate is the time required to upload to the nearest edge server, and the maximum value is normalized to 1. However, this method causes the initially allocated clients to cluster at a subset of the edge servers after training completes, so those edge servers must receive extra clients; without bandwidth pre-allocation the clients are blocked at the uploading stage, which ultimately affects the model convergence effect and makes the curve unstable.
Asynchronous hierarchical federated learning based on bandwidth pre-allocation adjusts the initial client allocation on top of the greedy algorithm, so that the number of clients under each edge server is relatively uniform when local training completes, making the model convergence process more stable. After collecting the parameters sent by clients with short upload delays, the edge server allocates more bandwidth to clients whose uploads run long, accelerating the uploading of model parameters so that the model parameters of more clients can participate in edge aggregation in time.
The technical means disclosed in the solution of the invention are not limited to those disclosed in the above embodiments, but also include technical solutions formed by any combination of the above technical features.
Taking the above preferred embodiments of the present invention as an illustration, persons skilled in the relevant art can make various changes and modifications without departing from the scope of the technical idea of the present invention. The technical scope of the present invention is not limited to the above description, but must be determined according to the scope of the claims.

Claims (10)

1. An asynchronous hierarchical joint learning training method based on bandwidth pre-allocation is characterized by comprising the following steps:
step1: cloud server selecting client
In this step, the cloud server is responsible for coordinating and managing the joint learning task. The cloud server selects a corresponding number of clients within the range of each edge server according to a strategy; if the number of selectable clients exceeds the number required, the best clients are selected to participate in the next round of training according to the quality of their local training data and their computing power, and the model parameters are sent to these clients;
step2: client local training
After the selected clients receive the model parameters, model training is executed locally: each client runs an optimization algorithm suited to its task type and model architecture to update the parameters of its local model; this process can be iterated many times, with the model parameters updated in each iteration;
step3: local parameter upload
After iterating the local model to the preset precision, the client selects, according to its current geographic position, the nearest edge server that still has residual bandwidth to associate with, and then uploads its local model parameters to that edge server;
step4: edge aggregation
Model parameter aggregation is a federated learning method that produces a better global model by averaging, weighted averaging, or other aggregation strategies over the parameters from different clients. Unlike other hierarchical federated learning frameworks, the edge server does not need to collect the model parameters sent by all associated clients before aggregating; instead, every time a period T_aggregation elapses, it aggregates the model parameters collected so far. Considering that the collected model parameters may be outdated data produced by earlier training, the model parameters are weighted according to the time at which they began to be iterated locally, with earlier parameters given smaller weights. After edge aggregation, each edge server sends the aggregated model parameters to the cloud server;
step 5: cloud aggregation
Every time a period T_aggregation elapses, the cloud server receives the model parameters sent by all edge servers at almost the same time and aggregates them; if the aggregated model reaches the specified precision, the asynchronous hierarchical joint learning training based on bandwidth pre-allocation stops, otherwise the method returns to step 1.
2. The method for asynchronous hierarchical joint learning training based on bandwidth pre-allocation according to claim 1, wherein in step 1, because clients move on a large scale during training, the movement of the clients selected by the cloud server in each round cannot be known in advance, and the clients end up associated with different edge servers when training finishes; because the bandwidth of each edge server is limited, too many associated clients would prevent some clients from successfully uploading their model parameters, while too few would degrade the quality of edge model aggregation, so a corresponding number of clients must be selected within the range of each edge server when clients are selected before training;
the specific method for selecting the clients comprises the following steps:
the movement of clients during training follows historical statistical rules: for example, for a client starting in a residential area within the range of edge server A, the probabilities that during training it stays in place, goes to a business district within the range of edge server B, or goes to work at a company within the range of edge server C can all be found as a probability distribution; the transition probability from the range of each edge server to that of every other edge server during training is therefore represented by a matrix, the matrix of client transitions within the cloud server's scope being obtained by actual survey; from this matrix and the number of clients that need to be associated with each edge server, the cloud server calculates the number of clients to select initially under each edge server, so that the numbers of clients associated with the edge servers are uniform when training ends;
in the transition matrix M, the element e_jl in row j, column l represents the probability that a client transitions from the range of edge server j to the range of edge server l during training;
each client participating in training has an initial bandwidth allocation ratio β_device; according to the residual bandwidth allocation ratio of each edge server j, the cloud server determines the numbers (n_1, n_2, n_3, ..., n_|K|) of clients to be associated with the |K| edge servers in the next round; let x = (x_1, x_2, x_3, ..., x_|K|) be the numbers of clients the cloud server initially allocates under the |K| edge servers, and let xM = (y_1, y_2, y_3, ..., y_|K|); the following optimization problem is obtained:
min over x of Σ_{j=1..|K|} (y_j − n_j)²
s.t. Σ_{j=1..|K|} x_j = Σ_{j=1..|K|} n_j and 0 ≤ x_j ≤ N_j for j = 1, ..., |K|, where N_j is the number of clients remaining under server j;
as a general constrained problem, this is solved by the multiplier method (PHR algorithm): given an initial point x_0 = (n_1, n_2, n_3, ..., n_|K|), a penalty factor σ, an amplification factor c > 1, a control error ε > 0, and a constant θ ∈ (0, 1), let k = 1 and proceed as follows:
Step 1: with x_{k−1} as the initial point, solve the unconstrained problem of minimizing the augmented Lagrangian function φ(x, λ_k, σ_k), obtaining the optimal solution x_k;
Step 2: if ‖c(x_k)‖ ≤ ε, stop; x_k is the optimal solution; otherwise go to Step 3;
Step 3: if ‖c(x_k)‖ / ‖c(x_{k−1})‖ ≤ θ, go to Step 4; otherwise let σ_{k+1} = c·σ_k and go to Step 4;
Step 4: correct the multiplier vector:
(λ_{k+1})_1 = (λ_k)_1 − σ·c_1(x_k)
(λ_{k+1})_i = max[0, (λ_k)_i − σ·c_i(x_k)], i = 2, 3, ..., 2|K|+1,
then let k = k + 1 and go to Step 2;
here c_i(x) is the i-th constraint: the 1st is the equality constraint and the 2nd through (2|K|+1)-th are the inequality constraints; iterating the algorithm yields an approximate optimal solution, the precision ε need not be very high, and the solved x is rounded to integers; this optimal solution is the optimal number of clients the cloud server needs to select within the range of each edge server.
3. The method for asynchronous hierarchical joint learning training based on bandwidth pre-allocation according to claim 1, wherein in step 2, the clients participating in training receive the model parameters delivered by the cloud server; client i uses its own data set D_i to solve for the optimal model parameters ω that minimize the loss function, the loss function on client i being expressed as F_i(ω) = (1/|D_i|) Σ_{(x_n, y_n)∈D_i} f(ω; x_n, y_n), where f(·) is the loss on a single sample; the objective is to find the optimal model parameters ω* = argmin[F_i(ω)];
this problem can hardly be solved in closed form, so the client performs gradient descent over multiple iterations to approach the optimal solution gradually; to reach a predetermined local accuracy θ ∈ (0, 1), the client needs L(θ) = μ·log(1/θ) rounds of local iteration, where the constant μ depends on the size of the training task; the n-th round of local iteration is expressed as ω^(n+1) = ω^(n) − η·∇F_i(ω^(n)), iterating until the local accuracy θ is reached, at which point local training is completed, where η is the learning rate;
the computation delay of model training on a client is related to the number of iterations L(θ) required to reach the accuracy, the computing capability P_i of the client, and the size |D_i| of the training data set; the computing delay of client i is expressed as t_i^cmp = L(θ)·|D_i| / P_i, where the computing power P is the number of samples processed by the client per unit time; the size of the training data set is set in advance, and selecting clients with strong computing capability effectively reduces the computation delay.
4. The method for asynchronous hierarchical joint learning training based on bandwidth pre-allocation according to claim 1, wherein in step 3, after a client's local training reaches the predetermined accuracy, the model parameters are uploaded to an edge server immediately; under the framework of asynchronous hierarchical federated learning based on bandwidth pre-allocation, client i is not associated with a corresponding edge server at the start, but when training ends calculates its distance s_ij to each edge server j from its current position, and among the servers that still have residual bandwidth associates with the closest one, the edge server j minimizing s_ij, and uploads its model parameters to that edge server;
the initial bandwidth allocation ratio of the associated edge server to client i is β_ji, and the parameter upload rate of client i is r_i = β_ji·B_j·log2(1 + p_ji·h_i / N_0), where B_j is the bandwidth of edge server j, h_i is the channel gain of client i, N_0 is the noise power, and p_ji is the received power of client i at edge server j, expressed as p_ji = c·p_j / s_ij², where p_j is the maximum receiving power of edge server j and c is a constant; the upload delay of the client's parameters is t_i^up = |d_i| / r_i, where |d_i| is the size of the model parameters to be uploaded.
5. The method for asynchronous hierarchical joint learning training based on bandwidth pre-allocation according to claim 1, wherein in step 4, in a synchronous aggregation algorithm the cloud server must receive the parameters of all clients participating in training before performing a round of global aggregation, so the delay of global aggregation is determined by the last client to finish training and upload its parameters, causing the "straggler effect"; the local data set sizes and computing performance of the clients differ greatly, and the clients initially selected by the cloud server are not associated with edge servers, so a synchronous edge aggregation scheme might wait indefinitely for clients that never upload to the corresponding server, seriously affecting the model training progress; therefore each edge server adopts an edge asynchronous aggregation mode: every time a period T_aggregation elapses, each edge server aggregates the model parameters collected so far and uploads the aggregated parameters to the cloud server.
6. The method for asynchronous hierarchical joint learning training based on bandwidth pre-allocation according to claim 5, wherein when the asynchronous aggregation mode is adopted, the model parameters uploaded by clients may be outdated, so a staleness function is introduced as a hyperparameter to reduce the influence of stale models on the global model; because the cloud server always distributes the latest model to the clients participating in the next round, the staleness of each client in edge aggregation is related to the number of cloud aggregation rounds n_round already performed when the client received the model and to the latest number of cloud aggregation rounds n_CurrentRound; the staleness function of the parameters uploaded by client i is expressed as s_i = λ^(n_CurrentRound − n_round), where λ ∈ (0, 1) is a given decay coefficient, and the parameter update at edge server j is expressed as ω_j = Σ_{i∈S_j} s_i·|D_i|·ω_i / Σ_{i∈S_j} s_i·|D_i|, where S_j is the set of clients participating in this edge aggregation.
7. The method for asynchronous hierarchical joint learning training based on bandwidth pre-allocation according to claim 1, wherein in step 5, considering that the edge servers and the cloud server have strong communication capability, the communication delay of an edge server uploading parameters to the cloud server is negligible, and the cloud server performs a round of cloud aggregation immediately after receiving the parameters uploaded by the edge servers;
also, because the quality of the parameters received in cloud aggregation is uneven, the staleness function needs to be introduced as a hyperparameter to reduce the influence of outdated models on global model training; the staleness function of the parameters uploaded by each edge server is related to the overall staleness of the clients participating in that edge server's round of aggregation, and can simply be set as the average τ_j = (1/|S_j|) Σ_{i∈S_j} s_i; the update of the model parameters on the cloud server is expressed as ω = Σ_j τ_j·(|D_j|/|D_s|)·ω_j, where ω_j denotes the model parameters most recently uploaded by edge server j, |D_j| is the total data size under edge server j, and D_s is the aggregate data set of the clients participating in training; if the aggregated model reaches the specified precision, the asynchronous hierarchical joint learning training based on bandwidth pre-allocation stops; otherwise, return to step 1.
8. The method for asynchronous hierarchical joint learning training based on bandwidth pre-allocation according to claim 7, wherein when the communication capacity of the edge server and the cloud server is strong, the edge server does not perform edge aggregation after receiving the parameters uploaded by the client, but directly sends the parameters to the cloud server for cloud aggregation, so that the influence of the staleness function on the model parameter weight is more accurate.
9. The method for asynchronous hierarchical joint learning training based on bandwidth pre-allocation according to claim 1, wherein when selecting clients the cloud server needs to consider the computing performance of the clients to reduce the local computation delay, and the data quality of the data sets on the clients, which lets the model achieve a better local training effect, since representative and diverse data improve model training; to this end the following technical method is adopted: when the number of clients able to participate in training exceeds the number required in a round, the computing performance of the clients and the entropy weights of their data are considered together, and the best clients within the range of each edge server are selected to participate in the next round of training;
the data quality of the data set on a client is defined by entropy weighting: m samples are extracted from client i; k = 1, 2, ..., k_i indexes the attribute features of client i and l = 1, 2, ..., m indexes the samples; x̃_lk is the normalized value of data attribute k on sample l; w_k = (1 − e_k) / Σ_k (1 − e_k) is the entropy weight given to data attribute k; and e_k = −(1/ln m)·Σ_{l=1..m} p_lk·ln p_lk, with p_lk = x̃_lk / Σ_l x̃_lk, is the information entropy of each data attribute;
under each server j, the required number of clients with the best computing power P and local model quality Q are chosen; for a client that has already participated in many training rounds, P and Q are discounted; the overall strategy is that each edge server preferentially selects, from the set N_selected of clients that have already trained, clients with larger comprehensive capability Φ = γ·P + (1 − γ)·Q, where γ is a weight coefficient, and selects the remaining required clients from the total client set N_sum within the range of edge server j; with a parameter δ, the specific steps are as follows:
S1: update |N_selected|; if |N_selected| is greater than the set threshold θ_client, add the δ·|N_selected| clients with the largest Φ values in N_selected to N_best, then select from N_best all clients within the range of edge server j, up to the required number;
S2: if the required number for edge server j has not been reached, then from N_sum − N_selected, among the clients satisfying s_ji < R_j, i.e., within the range of edge server j, randomly select the remaining number of clients, compute their Φ, and add them to N_selected; all clients in N_selected are the clients participating in the next round of training.
10. The method for asynchronous hierarchical joint learning training based on bandwidth pre-allocation according to claim 9, wherein after the cloud server has selected the client counts and an edge server has performed edge aggregation and parameter uploading, that edge server is certain to have residual bandwidth, and other edge servers may also have residual bandwidth, because some clients with stronger computing power that are closer to the edge server have already completed local training, uploaded their parameters, and released bandwidth; this bandwidth is also distributed by the cloud server to the clients participating in the next round of training;
after the cloud server selects the clients, the residual bandwidth ratio of all edge servers is set to 0; considering that some clients may fail for a long time to complete training and release bandwidth due to anomalies, and that their model parameters would be outdated by the time they finish, it is set that after n_overtime rounds of cloud aggregation, bandwidth that was allocated but never released is automatically released for the parameter uploading of the next round of clients.
CN202311172306.0A 2023-09-12 2023-09-12 Asynchronous layered joint learning training method based on bandwidth pre-allocation Active CN117221122B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311172306.0A CN117221122B (en) 2023-09-12 2023-09-12 Asynchronous layered joint learning training method based on bandwidth pre-allocation


Publications (2)

Publication Number Publication Date
CN117221122A true CN117221122A (en) 2023-12-12
CN117221122B CN117221122B (en) 2024-02-09

Family

ID=89034669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311172306.0A Active CN117221122B (en) 2023-09-12 2023-09-12 Asynchronous layered joint learning training method based on bandwidth pre-allocation

Country Status (1)

Country Link
CN (1) CN117221122B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190138934A1 (en) * 2018-09-07 2019-05-09 Saurav Prakash Technologies for distributing gradient descent computation in a heterogeneous multi-access edge computing (mec) networks
US20210073639A1 (en) * 2018-12-04 2021-03-11 Google Llc Federated Learning with Adaptive Optimization
CN111447083A (en) * 2020-03-10 2020-07-24 中国人民解放军国防科技大学 Federal learning framework under dynamic bandwidth and unreliable network and compression algorithm thereof
CN113469325A (en) * 2021-06-09 2021-10-01 南京邮电大学 Layered federated learning method, computer equipment and storage medium for edge aggregation interval adaptive control
US20230196092A1 (en) * 2021-12-21 2023-06-22 Beijing Wodong Tianjun Information Technology Co., Ltd. System and method for asynchronous multi-aspect weighted federated learning
US20230214642A1 (en) * 2022-01-05 2023-07-06 Google Llc Federated Learning with Partially Trainable Networks
CN115526333A (en) * 2022-08-31 2022-12-27 电子科技大学 Federal learning method for dynamic weight under edge scene
CN115766475A (en) * 2022-10-31 2023-03-07 国网湖北省电力有限公司信息通信公司 Semi-asynchronous power federal learning network based on communication efficiency and communication method thereof
CN116484976A (en) * 2023-04-26 2023-07-25 吉林大学 Asynchronous federal learning method in wireless network
CN116702881A (en) * 2023-05-15 2023-09-05 湖南科技大学 Multilayer federal learning scheme based on sampling aggregation optimization
CN116681126A (en) * 2023-06-06 2023-09-01 重庆邮电大学空间通信研究院 Asynchronous weighted federation learning method capable of adapting to waiting time

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
周恒; 李丽君; 董增寿: "Joint optimization of multi-task resources in edge computing based on asynchronous-reward deep deterministic policy gradient", Application Research of Computers (计算机应用研究), vol. 40, no. 05, pages 1491-1496 *

Also Published As

Publication number Publication date
CN117221122B (en) 2024-02-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant