CN117221122B

CN117221122B - Asynchronous layered joint learning training method based on bandwidth pre-allocation

Info

Publication number: CN117221122B
Application number: CN202311172306.0A
Authority: CN
Inventors: 杨健; 周焱; 夏友旭; 张世召; 李飞扬
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2023-09-12
Filing date: 2023-09-12
Publication date: 2024-02-09
Anticipated expiration: 2043-09-12
Also published as: CN117221122A

Abstract

The invention discloses an asynchronous hierarchical joint learning training method based on bandwidth pre-allocation, which comprises the following steps: when training is started, the cloud server selects a corresponding number of clients in the range of each edge server, and distributes the latest model parameters to the clients; the client performs multiple iterations on the model parameters by utilizing own data, selects the nearest edge server with the residual bandwidth according to the current position of the client to upload the model parameters when training is finished, and reasonably distributes resources according to the bandwidth condition to accelerate the uploading speed; each edge server at the lapse of a certain time interval T _aggregation Then carrying out one round of edge aggregation, and uploading the aggregated parameters to a cloud server for one round of cloud aggregation; after cloud aggregation, the cloud server performs selection of the next round of clients. The invention not only adapts to the change of data distribution among the participants in a dynamic scene, but also fully utilizes limited communication resources, thereby improving the training effect.

Description

An asynchronous hierarchical joint learning training method based on bandwidth pre-allocation

技术领域Technical field

本发明涉及基于带宽预分配的异步分层联合学习训练方法，属于联邦学习模型训练的技术领域。The invention relates to an asynchronous hierarchical federated learning training method based on bandwidth pre-allocation, and belongs to the technical field of federated learning model training.

背景技术Background technique

数字技术的出现推动了各种变革性技术(如大数据和人工智能)的显著进步。机器学习驱动的移动应用正在革命化现代生活的各个方面。然而，机器学习训练任务通常需要来自各种终端的大量数据，这些终端具有不同的计算能力。传统方法是将数据上传到远程云服务器进行处理，但这种方法存在隐私侵犯、网络拥塞和传输延迟等挑战，阻碍了对数据的充分利用。The emergence of digital technology has driven significant progress in various transformative technologies such as big data and artificial intelligence. Machine learning-powered mobile apps are revolutionizing every aspect of modern life. However, machine learning training tasks often require large amounts of data from various terminals with different computing capabilities. The traditional method is to upload data to a remote cloud server for processing, but this method has challenges such as privacy violations, network congestion, and transmission delays, which hinder the full utilization of the data.

2016年，谷歌研究院提出了联邦学习的概念，旨在缓解网络带宽约束并解决数据隐私的漏洞。联邦学习是一种协作训练和分享方法，它消除了访问原始数据的必要性，符合分散收集和数据最小化原则，更新与中央服务器或协调员共享。这种分散的方法遵循数据最小化原则，消除了集中式数据收集的需求。原始数据仍然安全地存储在个体设备上，无法直接访问。联邦学习在每个设备上实现本地模型训练，同时仅上传聚合的更新或模型参数到中央服务器。In 2016, Google Research proposed the concept of federated learning, aiming to alleviate network bandwidth constraints and solve data privacy vulnerabilities. Federated learning is a collaborative training and sharing method that eliminates the need to access raw data, conforms to the principles of decentralized collection and data minimization, and updates are shared with a central server or coordinator. This decentralized approach follows data minimization principles and eliminates the need for centralized data collection. The raw data remains securely stored on the individual device and cannot be accessed directly. Federated learning enables local model training on each device while only uploading aggregated updates or model parameters to a central server.

分层联邦学习是一种联邦学习框架，通常在基站上部署边缘服务器。在这个框架中，边缘服务器被部署在基站上，用作移动设备和云服务器之间的中间站。这些边缘服务器促进了从边缘接收的邻近设备的本地模型的聚合。通过使云服务器能够有效处理来自更多终端的数据，分层联邦学习成功解决了上传数据到云的挑战。Hierarchical federated learning is a federated learning framework that typically deploys edge servers on base stations. In this framework, edge servers are deployed on base stations and serve as intermediate stations between mobile devices and cloud servers. These edge servers facilitate the aggregation of local models of neighboring devices received from the edge. Hierarchical federated learning successfully solves the challenge of uploading data to the cloud by enabling cloud servers to efficiently process data from more terminals.

然而，标准的分层联邦学习框架采用了全局模型的同步聚合，其中服务器在进行全局聚合之前等待所有客户端参数上传完毕。这种同步方法导致了“掉队者效应”，即全局聚合的延迟由最慢上传参数的客户端决定，导致整个训练过程延迟增加。此外，客户端参数的延迟聚合阻碍了全局模型的收敛，可能影响训练模型的准确性和性能。However, standard hierarchical federated learning frameworks employ synchronous aggregation of global models, where the server waits for all client parameters to be uploaded before performing global aggregation. This synchronization method leads to a "straggler effect", where the latency of global aggregation is determined by the client with the slowest upload parameters, resulting in increased latency throughout the training process. Furthermore, delayed aggregation of client-side parameters hinders the convergence of the global model and may affect the accuracy and performance of the trained model.

因此，有必要提出了一种异步分层联邦学习框架，消除服务器在进行全局聚合之前等待所有客户端参数被上传的需求，来缩短联邦学习的训练时延。Therefore, it is necessary to propose an asynchronous hierarchical federated learning framework that eliminates the need for the server to wait for all client parameters to be uploaded before global aggregation, thereby shortening the training delay of federated learning.

发明内容Contents of the invention

为了解决上述存在的问题，本发明公开了一种基于带宽预分配的异步分层联合学习训练方法，其具体技术方案如下：In order to solve the above existing problems, the present invention discloses an asynchronous hierarchical joint learning training method based on bandwidth pre-allocation. The specific technical solution is as follows:

一种基于带宽预分配的异步分层联合学习训练方法，包括以下步骤：An asynchronous hierarchical joint learning training method based on bandwidth pre-allocation, including the following steps:

步骤1：云服务器选取客户端Step 1: Cloud server selects client

在这一步骤中，云服务器负责协调和管理联合学习任务，云服务器根据策略从每个边缘服务器范围内选择相应数量的客户端，如果可选客户端数量多于需求的客户端，则根据客户端的本地训练数据的质量和计算能力来选择最优秀的客户端参与下一轮的训练，并将模型参数发送给这些客户端；In this step, the cloud server is responsible for coordinating and managing joint learning tasks. The cloud server selects a corresponding number of clients from each edge server according to the policy. If the number of optional clients is more than the required clients, the cloud server selects a corresponding number of clients based on the client Based on the quality and computing power of local training data of the client, the best clients are selected to participate in the next round of training and the model parameters are sent to these clients;

步骤2：客户端本地训练Step 2: Client local training

选定的客户端接收到模型参数后会在本地执行模型训练，每个客户端根据任务类型和模型架构执行优化算法，以更新其本地模型的参数，这个过程可进行多轮迭代，每轮迭代中都会更新模型参数；After receiving the model parameters, the selected client will perform model training locally. Each client executes an optimization algorithm according to the task type and model architecture to update the parameters of its local model. This process can carry out multiple rounds of iterations, each iteration The model parameters will be updated;

步骤3：本地参数上传Step 3: Upload local parameters

客户端迭代本地模型到预设的精度后，客户端根据目前的地理位置，选取最近的且有剩余带宽的边缘服务器进行关联，接着将本地的模型参数上传至边缘服务器；After the client iterates the local model to the preset accuracy, the client selects the nearest edge server with remaining bandwidth to associate based on the current geographical location, and then uploads the local model parameters to the edge server;

步骤4：边缘聚合Step 4: Edge aggregation

模型参数聚合是一种联邦学习的方法，通过对来自不同客户端的参数进行平均、加权平均或其他聚合策略，生成一个更好的全局模型；有别于其它分层联邦学习框架，边缘服务器不需要收集齐所有关联客户端发送过来的模型参数再进行聚合，而是每经过一段时间T_aggregation，对已经收集的模型参数进行聚合；考虑到收集到的模型参数可能是上一轮训练产生的过时数据，因此会根据参数开始在本地迭代的时间赋予权重，越早的参数权重越小；每个边缘服务器在边缘聚合后，将聚合后的模型参数发送给云服务器；Model parameter aggregation is a federated learning method that generates a better global model by averaging, weighted averaging or other aggregation strategies on parameters from different clients; unlike other hierarchical federated learning frameworks, edge servers do not require Collect all model parameters sent by the associated clients and then aggregate them. Instead, aggregate the collected model parameters after a period of Tag _aggregation ; considering that the collected model parameters may be outdated data generated by the previous round of training , so the weight will be given based on the time when the parameters start to iterate locally, and the earlier the parameter, the smaller the weight; each edge server will send the aggregated model parameters to the cloud server after edge aggregation;

步骤5：云聚合Step 5: Cloud Aggregation

每经过时间T_aggregation，云服务器同时收到所有边缘服务器发送过来的模型参数，对这些模型参数进行聚合，如果聚合后模型到达指定的精度，则停止基于带宽预分配的异步分层联合学习训练，否则返回步骤1。Every time T _aggregation elapses, the cloud server receives the model parameters sent by all edge servers at the same time, and aggregates these model parameters. If the aggregation model reaches the specified accuracy, the asynchronous hierarchical joint learning training based on bandwidth pre-allocation is stopped. Otherwise return to step 1.

进一步的，所述步骤1中，客户端在训练过程中有大范围的移动的情形，则每轮云服务器选取的客户端在训练过程中的移动情况无法事先预知，这些客户端在结束训练时会关联至不同的边缘服务器；由于每个边缘服务器的带宽有限，关联的客户端太多会导致一些客户端无法顺利上传模型参数，较少则会导致边缘模型聚合的质量不佳，因此训练前选取客户端时需要在每个边缘服务器范围内选取相应数量的客户端；Furthermore, in step 1, if the client moves in a wide range during the training process, the movement of the clients selected by the cloud server in each round during the training process cannot be predicted in advance. When the training ends, these clients will Will be associated to different edge servers; due to the limited bandwidth of each edge server, too many associated clients will cause some clients to be unable to upload model parameters smoothly, and fewer clients will lead to poor quality of edge model aggregation, so before training When selecting clients, you need to select a corresponding number of clients within each edge server;

客户端选取的具体方法为：The specific method of client selection is:

训练过程中客户端的移动可以找到历史统计规律，比如从A边缘服务器范围下住宅区的客户端，训练过程中留在该地、前往B边缘服务器范围下的商业区或者前往C边缘服务器范围下的公司上班的情况都能找当相应的概率分布，在这种情况下，训练过程中每个边缘服务器到另一个边缘服务器的转移概率可由矩阵来表示，实际调查获取云服务器覆盖范围内客户端转移情况的矩阵，根据这个矩阵和每个边缘服务器需要关联的客户端数，计算出云服务器初始在每个边缘服务器下选取的客户端数量，使得结束训练时每个边缘服务器关联的客户端数均匀；Historical statistical patterns can be found in the movement of clients during the training process. For example, a client from a residential area under the scope of edge server A will stay there during the training process, go to a business area under the scope of edge server B, or go to a client under the scope of edge server C. The corresponding probability distribution can be found for the company's work situation. In this case, the transfer probability of each edge server to another edge server during the training process can be represented by a matrix. The actual investigation obtains the client transfer within the coverage of the cloud server. The matrix of the situation, based on this matrix and the number of clients that need to be associated with each edge server, calculate the number of clients that the cloud server initially selects under each edge server, so that the number of clients associated with each edge server is even when the training ends;

如矩阵所示，第j行l列的元素表示为客户端在训练过程中从边缘服务器j范围转移至边缘服务器l范围的概率，As shown in the matrix, the elements in the jth row and column l represent the probability that the client transfers from the edge server j range to the edge server l range during the training process.

每个参与训练的客户端初始带宽分配比β_device，云服务器根据每个边缘服务器j的剩余带宽分配比关联/>个客户端，在训练结束时上传至该服务器，设|K|个边缘服务器下一轮应关联的客户端数量为(n₁,n₂,n₃,...,n_|K|)，云服务器初始在|K|个边缘服务器下分配的客户端数X＝(x₁,x₂,x₃,…,x_|K|)，使xM＝(y₁,y₂,y₃,...,y_|K|)，得到以下优化问题：The initial bandwidth allocation ratio of each client participating in training is β _device , and the cloud server is based on the remaining bandwidth allocation ratio of each edge server j Related/> Clients are uploaded to the server at the end of training. Assume that the number of clients that |K| edge servers should be associated with in the next round is (n ₁ ,n ₂ ,n ₃ ,...,n _|K| ), The number of clients X=(x ₁ ,x ₂ ,x ₃ ,…,x _|K| ) that the cloud server initially allocates under |K| edge servers makes xM=(y ₁ ,y ₂ ,y ₃ ,.. .,y _|K| ), the following optimization problem is obtained:

S.t.对于/>为服务器j下剩余客户端数，St for/> is the number of remaining clients under server j,

作为一般约束问题，由乘子法--PHR算法求解，给定初始点(n₁,n₂,n₃,...,n_|K|)，罚因子σ、放大系数c₁＞1、控制误差ε＞0、常数θ∈(0,1)，令k＝1，求解过程如下：As a general constraint problem, it is solved by the multiplier method-PHR algorithm. Given the initial point (n ₁ ,n ₂ ,n ₃ ,...,n _|K| ), the penalty factor σ, the amplification coefficient c ₁ >1, Control error ε>0, constant θ∈(0,1), let k=1, the solution process is as follows:

Step 1:以x_k-1为初始点，求解无约束问题：Step 1: Taking x _k-1 as the initial point, solve the unconstrained problem:

得到最优解x_k； Obtain the optimal solution x _k ;

Step2:若则x_k为最优解，停止；否则，转Step 3；Step2:If Then x _k is the optimal solution and stop; otherwise, go to Step 3;

Step3：若转Step 4；否则令σ_k+1＝cσ_k，转Step 4；Step3: If Go to Step 4; otherwise let σ _k+1 = cσ _k and go to Step 4;

Step4:修正乘子向量：Step4: Correct the multiplier vector:

(λ_k+1)₁＝(λ_k)₁-σc₁(x_k)(λ _k+1 ) ₁ = (λ _k ) ₁ -σc ₁ (x _k )

(λ_k+1)_i＝max[0,(λ_k)_i-σc_i(x_k)],i＝2,3,...，|2K+1|,(λ _k+1 ) _i =max[0,(λ _k ) _i -σc _i (x _k )],i＝2,3,...,|2K+1|,

令k＝k+1，转Step 2；Let k=k+1, go to Step 2;

c_i(x)为第i条约束条件，第1条为等式约束，第2条至第2|K|+1条为不等式约束，按该算法进行迭代，获取近似最优解即可，精度ε不需要很高，求出来的X四舍五入至整数；该最优解即为云服务器在每个边缘服务器范围中需要选取的最佳客户端数量。c _i (x) is the i-th constraint, the 1st is an equality constraint, the 2nd to 2|K|+1 are inequality constraints, iterate according to this algorithm and obtain the approximate optimal solution, The precision ε does not need to be very high, and the calculated X is rounded to an integer; the optimal solution is the optimal number of clients that the cloud server needs to select in each edge server range.

进一步的，所述步骤2中，参与训练的客户端接收到了云服务器传来的模型参数，客户端i利用自身的数据集D_i来求解最优模型参数ω来表示使得损失函数最小，则客户端i上的损失函数表示为：目标是找到最小化损失函数的最优化模型参量ω^*＝argmin[F_i(ω)]；Further, in step 2, the client participating in the training receives the model parameters from the cloud server, and the client i uses its own data set D _i to solve for the optimal model parameters ω to represent so that the loss function Minimum, then the loss function on client i is expressed as: The goal is to find the optimal model parameters ω ^* =argmin[F _i (ω)] that minimize the loss function;

该问题几乎不可解，客户端需要在多次迭代中执行梯度下降，以逐步接近最优解，为了达到预定的本地精度θ∈(0,1)，客户端需要进行多轮本地迭代L(θ)＝μlog(1/θ)，其中常数μ取决于训练任务的规模大小，第n_{LocalTraining}轮本地迭代表示为：迭代直到/>时，本地训练完成，其中η为学习率；This problem is almost unsolvable. The client needs to perform gradient descent in multiple iterations to gradually approach the optimal solution. In order to achieve the predetermined local accuracy θ∈(0,1), the client needs to perform multiple rounds of local iterations L(θ) =μlog(1/θ), where the constant μ depends on the size of the training task. The nth _{LocalTraining} round of local iteration is expressed as: Iterate until/> When, local training is completed, where eta is the learning rate;

客户端上模型训练的计算时延与到达精度需要的迭代次数L(θ)，客户端的计算能力P_i和训练数据集的大小|D_i|相关，表示为：其中计算能力P是客户端一段时间内处理样本的数量。训练数据集的大小可以提前设定，在选取计算能力强的客户端可以有效降低计算时延。The computing delay of model training on the client is related to the number of iterations L(θ) required to achieve accuracy, the computing power _Pi of the client and the size of the training data set |D _i |, expressed as: The computing power P is the number of samples processed by the client within a period of time. The size of the training data set can be set in advance, and the calculation delay can be effectively reduced by selecting a client with strong computing power.

进一步的，所述步骤3中，客户端进行本地训练达到预定精度后，立即将模型参数上传至边缘服务器，在基于带宽预分配的异步分层联邦学习的框架下，客户端i初始未关联相应的边缘服务器，而是在训练结束时，根据当前的位置计算出与每个边缘服务器j的距离s_ij，从仍有剩余带宽，即的服务器中选择距离最近，即mins_ij的边缘服务器j进行关联，并将模型参数上传至该边缘服务器，Further, in step 3, after the client performs local training and reaches the predetermined accuracy, it immediately uploads the model parameters to the edge server. Under the framework of asynchronous hierarchical federated learning based on bandwidth pre-allocation, client i is not initially associated with the corresponding edge server, but at the end of training, the distance s _ij to each edge server j is calculated based on the current location, so that there is still remaining bandwidth, that is Select the edge server j with the closest distance, that is, mins _ij , to associate among the servers, and upload the model parameters to the edge server.

关联的边缘服务器给所有客户端的初始带宽分配比为β_ji，则客户端i的参数上传速率为：其中h_i是客户端i的信道增益，N₀为噪声，/>为边缘服务器j对客户端i的接收功率，表示为/>其中p_j为边缘服务器j的最大接收功率，c为常数，则客户端参数上传时延/>其中|d_i|为需要上传模型参数的大小。The initial bandwidth allocation ratio of the associated edge server to all clients is β _ji , then the parameter upload rate of client i is: where h _i is the channel gain of client i, N ₀ is the noise,/> is the receiving power of edge server j to client i, expressed as/> Where p _j is the maximum received power of edge server j, and c is a constant, then the client parameter upload delay/> Where |d _i | is the size of the model parameters that need to be uploaded.

进一步的，所述步骤4中，在同步聚合的算法中，云端服务器需要接收所有参与训练的客户端参数后再进行一轮全局聚合，导致全局聚合的时延由最后一个完成训练和上传参数的客户端来决定造成“掉队者效应”，客户端的本地数据集大小和计算性能差异较大，初始云服务器选取的客户端未关联边缘服务器，因此边缘服务器同步聚合的方式可能导致迟迟未等到相应的客户端上传，严重影响了模型训练进度，因此采用边缘异步聚合的方式，每个边缘服务器每经过一段时间T_aggregation，对已经收集的模型参数进行聚合，并将聚合后的参数上传至云服务器。Furthermore, in step 4, in the synchronous aggregation algorithm, the cloud server needs to receive the parameters of all clients participating in the training before performing a round of global aggregation, resulting in a global aggregation delay caused by the last one completing training and uploading parameters. The client makes the decision, resulting in a "straggler effect". The client's local data set size and computing performance vary greatly. The client selected by the initial cloud server is not associated with the edge server. Therefore, the edge server's synchronization and aggregation method may cause a delay in waiting for the corresponding response. The client upload seriously affects the model training progress, so the edge asynchronous aggregation method is adopted. Each edge server aggregates the collected model parameters after a period of Tag _aggregation and uploads the aggregated parameters to the cloud server. .

进一步的，所述采用异步聚合方式时，客户端上传的模型参数可能是过时的，因此进行边缘聚合时每个客户端的陈旧函数与该客户端接收到模型时已经进行的云聚合轮数n_round和云聚合的最新轮数n_CurrentRound相关，因此客户端i上传的参数相应的陈旧函数表示为其中λ∈(0,1)为给定的衰减系数，在边缘服务器j上的参数更新表示为：其中S_j表示为参与本次边缘聚合的客户端集。Furthermore, when the asynchronous aggregation method is used, the model parameters uploaded by the client may be out of date. Therefore, the stale function of each client when performing edge aggregation is different from the number n _round of cloud aggregation rounds that have been performed when the client receives the model. It is related to the latest round number n _CurrentRound of cloud aggregation, so the stale function corresponding to the parameters uploaded by client i is expressed as Where λ∈(0,1) is the given attenuation coefficient, and the parameter update on edge server j is expressed as: Where S _j represents the set of clients participating in this edge aggregation.

进一步的，所述步骤5中，考虑到边缘服务器和云服务器有强大的通信能力，因此边缘服务器上传参数至云服务器的通信时延可以忽略不计，云服务器再接收到边缘服务器上传的参数后，立即进行一轮云聚合；Furthermore, in step 5, considering that the edge server and the cloud server have strong communication capabilities, the communication delay for the edge server to upload parameters to the cloud server is negligible. After the cloud server receives the parameters uploaded by the edge server, Immediately perform a round of cloud aggregation;

同样因为云聚合接收的参数质量参差不齐，需要引入陈旧性函数作为超参数以减少过时模型对于全局模型训练的影响，每个边缘服务器上传参数的陈旧函数与该边缘服务器上轮聚合时参与客户端的整体陈旧度相关，可简单设为：云服务器上模型参数的更新表示为：/>其中/>表示为边缘服务器j最新上传的模型参数，D_s为已参与训练的客户端的数据集总集合，对这些模型参数进行聚合，如果聚合后模型到达指定的精度，则停止基于带宽预分配的异步分层联合学习训练，否则返回步骤1。Also because the quality of parameters received by cloud aggregation is uneven, it is necessary to introduce the stale function as a hyperparameter to reduce the impact of outdated models on global model training. The stale function of parameters uploaded by each edge server is consistent with the participating customers in the last round of aggregation of the edge server. It is related to the overall staleness of the terminal, which can be simply set as: The update of model parameters on the cloud server is expressed as:/> Among them/> Represented as the latest model parameters uploaded by edge server j, D _s is the total set of data sets of clients that have participated in training. These model parameters are aggregated. If the aggregation model reaches the specified accuracy, the asynchronous analysis based on bandwidth pre-allocation is stopped. Layer joint learning training, otherwise return to step 1.

进一步的，当边缘服务器和云服务器的通信能力比较强大，则使边缘服务器接收客户端上传的参数后，不进行边缘聚合，而是把这些参数直接发送给云服务器进行云聚合，这样陈旧性函数对于模型参数权重的影响更加准确。Furthermore, when the communication capabilities of the edge server and the cloud server are relatively strong, the edge server does not perform edge aggregation after receiving the parameters uploaded by the client, but directly sends these parameters to the cloud server for cloud aggregation. In this way, the staleness function The impact of model parameter weights is more accurate.

进一步的，所述云服务器选取客户端时需要考虑到客户端的计算性能来降低本地计算时延；客户端上数据集的数据质量来使模型达到更好的局部训练效果，因为具有代表性和多样性的数据可以提升模型训练效果，为此采用以下技术方法：在可参与训练的客户端数量多于一轮训练中所需客户端时，综合考虑客户端的计算性能和数据的熵权值，在每个边缘服务器范围下选取最优的客户端参与下一轮的训练；Furthermore, when selecting a client, the cloud server needs to consider the computing performance of the client to reduce local computing latency; the data quality of the data set on the client can enable the model to achieve better local training effects because it is representative and diverse. Specific data can improve the model training effect. To this end, the following technical methods are adopted: when the number of clients that can participate in training is more than the clients required in a round of training, the computing performance of the client and the entropy weight of the data are comprehensively considered. Select the best client under each edge server to participate in the next round of training;

客户端上数据集的数据质量用熵权法来定义，从客户端i中抽取m个样本，1,2,…,k_i表示客户端i上的属性特征索引1,2,…,m为样本索引，/>为数据属性标准化后的值，/>为数据属性赋予的熵权值，为每个数据属性的信息熵；The data quality of the data set on the client is defined using the entropy weight method. m samples are extracted from client i, 1,2,…,k _i represents the attribute feature index on client i. 1,2,…,m is the sample index,/> Is the normalized value of the data attribute,/> The entropy weight assigned to the data attribute, is the information entropy of each data attribute;

在每个服务器j下，选取个计算能力P和本地模型质量Q最优的客户端，设同一客户端下P和Q在多轮训练中变化较小，总体策略是每个边缘服务器下优先从参与过训练的客户端集N_selected中选取综合能力φ＝γP+(1-γ)Q值较大的客户端，剩余需要的客户端从训练客户端总集N_sum中选取，设在边缘服务器j的范围中选取/>个客户端，参数δ，则具体步骤如下：Under each server j, select A client with optimal computing power P and local model quality Q. Assume that P and Q change little in multiple rounds of training under the same client. The overall strategy is to give priority to each edge server from the client set N that has participated in training. Select clients with larger comprehensive capabilities φ = γP + (1-γ) Q value from _selected , and the remaining required clients are selected from the total set of training clients N _sum , and are selected from the range of edge server j/> For a client and parameter δ, the specific steps are as follows:

S2：若则在N_sum-N_selected中，且符合s_ji＜R_j条件，即在边缘服务器j范围内随机选取/>个客户端，将这些客户端计算φ值并加入N_selected，N_selected中的所有客户端即为下一轮参与训练的客户端。S2: If Then it is in N _sum -N _selected , and meets the condition of s _ji <R _j , that is, it is randomly selected within the range of edge server j/> clients, calculate the φ values of these clients and add them to N _selected . All clients in N _selected are the clients participating in the next round of training.

进一步的，云服务器选取客户端数量的操作在一个边缘服务器进行边缘聚合并参数上传之后，这个边缘服务器是肯定有剩余带宽的，其他边缘服务器可能也有剩余带宽，因为一部分计算能力较强，与边缘服务器距离较近的客户端已经完成本地训练、上传参数的操作并且释放了带宽，这部分带宽也会由云服务器分配给下一轮参与训练的客户端；Furthermore, the cloud server selects the number of clients after an edge server performs edge aggregation and uploads parameters. This edge server definitely has remaining bandwidth. Other edge servers may also have remaining bandwidth, because some of them have strong computing capabilities and are not related to the edge servers. Clients that are closer to the server have completed local training, uploaded parameters, and released bandwidth. This bandwidth will also be allocated by the cloud server to the clients participating in the next round of training;

云服务器选取客户端后，所有边缘服务器的剩余带宽比设为0，考虑到一些客户端因为异常原因长时间无法完成训练并释放带宽，这部分客户端完成训练后模型参数也是过时的，因此在设定的n_overtime轮云聚合后，已分配但未被释放的带宽将自动释放，用于下一轮客户端的参数上传。After the cloud server selects a client, the remaining bandwidth ratio of all edge servers is set to 0. Considering that some clients cannot complete training and release bandwidth for a long time due to abnormal reasons, the model parameters of these clients will also be outdated after completing training, so in After the set n _overtime round of cloud aggregation, the allocated but not released bandwidth will be automatically released for the next round of client parameter uploads.

本发明的有益效果是：The beneficial effects of the present invention are:

本发明与现有技术相比的有益效果是：本发明在客户端训练完成后再根据当前位置关联至有剩余带宽且最近的边缘服务器，以缩短模型上传时延；为了避免客户端分配不均的情况，每轮训练之前先根据各个边缘服务器的剩余带宽和客户端转移矩阵来计算出本轮初始在每个边缘服务器范围内分配客户端的数量；选取客户端时按熵权值计算挑选出对模型训练有帮助且计算能力强的客户端，以缩短本地训练时延；上传过程中根据边缘服务器的剩余带宽进行上传加速来进一步缩短上传时延。Compared with the existing technology, the beneficial effects of the present invention are: after the client training is completed, the present invention associates it with the nearest edge server with remaining bandwidth according to the current location to shorten the model upload delay; in order to avoid uneven distribution of clients Before each round of training, the number of clients initially assigned to each edge server in this round is calculated based on the remaining bandwidth of each edge server and the client transfer matrix; when selecting clients, the pairs are selected based on the entropy weight calculation. Model training is helpful and the client with strong computing power is used to shorten the local training delay; during the upload process, the upload is accelerated according to the remaining bandwidth of the edge server to further shorten the upload delay.

附图说明Description of drawings

图1是本发明的训练流程图，Figure 1 is a training flow chart of the present invention,

图2是本发明的云端交互图，Figure 2 is a cloud interaction diagram of the present invention,

图3是实施例中对比实验中模型精度在测试集中的表现图，Figure 3 is a diagram showing the performance of the model accuracy in the test set in the comparative experiment in the embodiment.

图4是实施例中对比实验中模型损失率在测试集中的表现图，Figure 4 is a diagram showing the performance of the model loss rate in the test set in the comparative experiment in the embodiment.

图5是实施例中贪心算法和分层联邦学习算法中用户上传数据所需时间的对比。Figure 5 is a comparison of the time required for users to upload data in the greedy algorithm and the hierarchical federated learning algorithm in the embodiment.

具体实施方式Detailed ways

下面结合附图和具体实施方式，进一步阐明本发明。应理解下述具体实施方式仅用于说明本发明而不用于限制本发明的范围。The present invention will be further elucidated below in conjunction with the accompanying drawings and specific embodiments. It should be understood that the following specific embodiments are only used to illustrate the present invention and are not intended to limit the scope of the present invention.

结合附图1-2可见，本发明过程为：It can be seen with reference to the accompanying drawings 1-2 that the process of the present invention is:

1)云服务器选取客户端1) Cloud server selects client

在每个边缘服务器下选取相应数量客户端，分发最新的模型参数在这些客户端中利用本地数据集进行本地训练。Select a corresponding number of clients under each edge server, distribute the latest model parameters to these clients, and use local data sets for local training.

2)客户端本地训练2)Client local training

这一步骤中，参与训练的客户端接收到了云服务器传来的模型参数，客户端利用本地的数据集来求解最优模型参数。In this step, the client participating in the training receives the model parameters from the cloud server, and the client uses the local data set to solve the optimal model parameters.

3)本地参数上传3)Local parameter upload

客户端进行本地训练达到预定精度后，立即将模型参数上传至边缘服务器。After the client performs local training and reaches the predetermined accuracy, it immediately uploads the model parameters to the edge server.

4)边缘聚合4) Edge aggregation

在同步聚合的算法中，服务器需要接收所有参与训练的客户端参数后再进行一轮全局聚合，导致全局聚合的时延由最后一个完成训练和上传参数的客户端来决定造成“掉队者效应”。本发明场景下客户端的本地数据集大小和计算性能差异较大，初始云服务器选取的客户端未关联边缘服务器，因此边缘服务器同步聚合的方式可能导致迟迟未等到相应的客户端上传，严重影响了模型训练进度。因此本发明有别于其它分层联邦学习框架，边缘服务器不需要收集齐所有关联客户端发送过来的模型参数再进行聚合，而是每经过一段时间T_aggregation，对已经收集的模型参数进行聚合；考虑到收集到的模型参数可能是上一轮训练产生的过时数据，因此会根据参数开始在本地迭代的时间赋予权重，越早的参数权重越小；每个边缘服务器在边缘聚合后，将聚合后的模型参数发送给云服务器；In the synchronous aggregation algorithm, the server needs to receive the parameters of all clients participating in the training before performing a round of global aggregation. As a result, the delay of global aggregation is determined by the last client that completes training and uploads parameters, resulting in a "straggler effect" . In the scenario of the present invention, the client's local data set size and computing performance vary greatly. The client selected by the initial cloud server is not associated with the edge server. Therefore, the synchronous aggregation method of the edge server may cause a delay in waiting for the corresponding client to upload, which seriously affects model training progress. Therefore, the present invention is different from other hierarchical federated learning frameworks. The edge server does not need to collect all the model parameters sent by the associated clients and then aggregate them. Instead, it aggregates the collected model parameters after a period of _taggration ; Considering that the collected model parameters may be outdated data generated by the previous round of training, weights will be given based on the time when the parameters start to iterate locally. The earlier the parameters, the smaller the weight; after each edge server is aggregated at the edge, it will be aggregated The final model parameters are sent to the cloud server;

采用异步聚合方式时，客户端上传的模型参数可能是过时的，因此进行边缘聚合时每个客户端的陈旧函数与该客户端接收到模型时已经进行的云聚合轮数n_round和云聚合的最新轮数n_CurrentRound相关，因此客户端i上传的参数相应的陈旧函数表示为其中λ∈(0,1)为给定的衰减系数，在边缘服务器j上的参数更新表示为：其中S_j表示为参与本次边缘聚合的客户端集。When using asynchronous aggregation, the model parameters uploaded by the client may be out of date. Therefore, the stale function of each client when performing edge aggregation is different from the number n _round of cloud aggregation that has been performed when the client receives the model and the latest cloud aggregation. The number of rounds n is related to _CurrentRound , so the stale function corresponding to the parameters uploaded by client i is expressed as Where λ∈(0,1) is the given attenuation coefficient, and the parameter update on edge server j is expressed as: Where S _j represents the set of clients participating in this edge aggregation.

5)云聚合5) Cloud aggregation

考虑到边缘服务器和云服务器有强大的通信能力，因此边缘服务器上传参数至云服务器的通信时延可以忽略不计。云服务器再接收到边缘服务器上传的参数后，立即进行一轮云聚合。同样因为云聚合接收的参数质量参差不齐，需要引入陈旧性函数作为超参数以减少过时模型对于全局模型训练的影响。每个边缘服务器上传参数的陈旧函数与该边缘服务器上轮聚合时参与客户端的整体陈旧度相关，可简单设为：云服务器上模型参数的更新表示为：/>其中/>表示为边缘服务器j最新上传的模型参数，D_s为已参与训练的客户端的数据集总集合，对这些模型参数进行聚合，如果聚合后模型到达指定的精度，则停止基于带宽预分配的异步分层联合学习训练，否则云服务器再进行下一轮客户端的选取，并将最新的参数分发到这些客户端上。Considering that the edge server and the cloud server have strong communication capabilities, the communication delay for the edge server to upload parameters to the cloud server is negligible. After the cloud server receives the parameters uploaded by the edge server, it immediately performs a round of cloud aggregation. Also because the quality of parameters received by cloud aggregation varies, the staleness function needs to be introduced as a hyperparameter to reduce the impact of outdated models on global model training. The staleness function of each edge server's upload parameters is related to the overall staleness of the participating clients in the last round of aggregation of the edge server, and can be simply set as: The update of model parameters on the cloud server is expressed as:/> Among them/> Represented as the latest model parameters uploaded by edge server j, D _s is the total set of data sets of clients that have participated in training. These model parameters are aggregated. If the aggregation model reaches the specified accuracy, the asynchronous analysis based on bandwidth pre-allocation is stopped. Layer joint learning training, otherwise the cloud server will select the next round of clients and distribute the latest parameters to these clients.

如果边缘服务器和云服务器的通信能力比较强大，则可以使边缘服务器接收客户端上传的参数后，不进行边缘聚合，而是把这些参数直接发送给云服务器进行云聚合，这样陈旧性函数对于模型参数权重的影响更加准确。If the communication capabilities of the edge server and the cloud server are relatively strong, the edge server can not perform edge aggregation after receiving the parameters uploaded by the client, but directly send these parameters to the cloud server for cloud aggregation. In this way, the staleness function will be useful for the model. The influence of parameter weights is more accurate.

此外，云服务器选取客户端数量的操作在一个边缘服务器进行边缘聚合并参数上传之后，这个边缘服务器是肯定有剩余带宽的，其他边缘服务器可能也有剩余带宽，因为一部分计算能力较强，与边缘服务器距离较近的客户端已经完成本地训练、上传参数的操作并且释放了带宽，这部分带宽也会由云服务器分配给下一轮参与训练的客户端。云服务器选取客户端后所有边缘服务器的剩余带宽比设为0，考虑到一些客户端因为异常原因长时间无法完成训练并释放带宽，这部分客户端完成训练后模型参数也是过时的，因此在一段时间后分配且未被释放的带宽将自动释放，用于下一轮客户端的参数上传。In addition, after the cloud server selects the number of clients, after an edge server performs edge aggregation and uploads parameters, this edge server definitely has remaining bandwidth. Other edge servers may also have remaining bandwidth, because some of them have strong computing power and are different from the edge servers. Clients that are closer have completed local training, uploaded parameters, and released bandwidth. This bandwidth will also be allocated by the cloud server to the clients participating in the next round of training. After the cloud server selects the client, the remaining bandwidth ratio of all edge servers is set to 0. Considering that some clients cannot complete training and release bandwidth for a long time due to abnormal reasons, the model parameters of these clients are also outdated after completing training, so within a period of time The bandwidth allocated after the time and not released will be automatically released for the next round of client parameter uploads.

为验证本专利的应用，下面给出本专利的具体实验：In order to verify the application of this patent, the specific experiments of this patent are given below:

实验环境：lab environment:

实验考虑在由一个云服务器，5个边缘服务器和250个待参与训练的客户端构成的分布式框架下进行训练。每个边缘服务器范围内包含50个客户端，在边缘层次的联邦学习中,每一个边缘下将有10个客户端被随机选择参与训练，云端层次的联邦学习中每个边缘服务器的参数都参与云聚合。The experiment considers training under a distributed framework consisting of a cloud server, 5 edge servers and 250 clients to participate in training. Each edge server contains 50 clients. In edge-level federated learning, 10 clients under each edge will be randomly selected to participate in training. In cloud-level federated learning, the parameters of each edge server are involved. Cloud aggregation.

本地训练中，考虑了以Lenet卷积神经网络为模型，在minst数据集上进行验证。模型初始化了两个卷积层，一个dropout层和两个全连接层。第一个卷积层的输入通道数为3，输出通道数为10，卷积核大小为5。第二个卷积层的输入通道数为10，输出通道数为20，卷积核大小为5。第一个全连接层的输入节点数为320，输出节点数为50。第二个全连接层的输入节点数为50，输出节点数为10。输入数据通过第一个卷积层，然后进行最大池化和ReLU激活。接着，数据通过第二个卷积层和dropout层，再进行最大池化和ReLU激活。然后，将数据展平并通过两个全连接层，返回输出结果。In local training, the Lenet convolutional neural network was considered as a model and verified on the minst data set. The model is initialized with two convolutional layers, one dropout layer and two fully connected layers. The number of input channels of the first convolutional layer is 3, the number of output channels is 10, and the convolution kernel size is 5. The number of input channels of the second convolutional layer is 10, the number of output channels is 20, and the convolution kernel size is 5. The number of input nodes of the first fully connected layer is 320, and the number of output nodes is 50. The number of input nodes of the second fully connected layer is 50, and the number of output nodes is 10. The input data is passed through the first convolutional layer, followed by max pooling and ReLU activation. Then, the data passes through the second convolution layer and dropout layer, and then undergoes max pooling and ReLU activation. The data is then flattened and passed through two fully connected layers, and the output is returned.

本实施例从60000张带有标签的数据集中随机选择1000张分配给云端，5000张平均分成5份分配给5个边缘服务器，用于测试模型在边缘和云端的精度，其余54000张平均分配给250个客户端。本实施例采用独立同分布的方式进行分配，边缘服务器和客户端分配样本的过程中都进行均匀随机的分配。This example randomly selects 1,000 images from the 60,000 labeled data set and distributes them to the cloud. The 5,000 images are evenly divided into 5 parts and distributed to 5 edge servers for testing the accuracy of the model at the edge and the cloud. The remaining 54,000 images are evenly distributed to 250 clients. In this embodiment, the independent and identically distributed method is used for allocation. The edge server and the client allocate uniformly and randomly during the process of allocating samples.

本实施例模拟了本地训练完成后，模型参数上传至边缘服务器的时间。标准化所有客户端到边缘服务器的距离为(0,1)间，考虑信号强度考虑仅和自由路径衰弱有关，设边缘服务器范围内最远的客户端信噪比为5，则根据信道中最大信息传送速率公式，上传时延可以表示为 This embodiment simulates the time it takes for the model parameters to be uploaded to the edge server after local training is completed. The distance from all clients to the edge server is standardized to be between (0,1). Considering that the signal strength is only related to the weakening of the free path, assuming that the signal-to-noise ratio of the farthest client within the range of the edge server is 5, then according to the maximum information in the channel Transmission rate formula, upload delay can be expressed as

实验中mnist数据集的学习率为0.01，实验过程中每轮学习率衰减为之前的0.995倍，每一轮本地训练过程中，模型在本地迭代40轮。In the experiment, the learning rate of the mnist data set was 0.01. During the experiment, the learning rate in each round was attenuated to 0.995 times the previous one. During each round of local training, the model was iterated locally for 40 rounds.

对比实验设置：Compare experimental settings:

·分层联邦学习(对照)：训练开始前以及每轮云聚合过后，5个边缘服务器随机选择范围内10个客户端参与训练，客户端在训练过程中可能移动，完成本地训练时，若在关联边缘服务器范围内，则将模型参数上传至相应边缘服务器。边缘服务器收集满所有客户端，或者超过最大等待时间时，进行一轮边缘聚合，并将聚合后的模型参数上传至云服务器。云服务器收集齐所有边缘服务器传来的参数后进行一轮云聚合。·Hierarchical federated learning (control): Before the start of training and after each round of cloud aggregation, 5 edge servers randomly select 10 clients within the range to participate in training. The clients may move during the training process. When completing local training, if Within the scope of the associated edge server, the model parameters are uploaded to the corresponding edge server. When the edge server collects all clients or exceeds the maximum waiting time, it performs a round of edge aggregation and uploads the aggregated model parameters to the cloud server. The cloud server collects the parameters from all edge servers and performs a round of cloud aggregation.

·上传模型参数时客户端关联最近服务器(对照)：在训练开始前和每轮云聚合过后，在每个边缘服务器范围内随机选择10个客户端进行训练。客户端在训练过程中可能移动，在本地训练结束后，选择与其距离最近的边缘服务器进行关联并上传模型参数。每个边缘服务器经过一定时间后，对已经上传完成的模型参数进行边缘聚合，未能及时上传的参数在实验中被放入缓冲区，这些参数根据上传时延在之后的边缘聚合中参与，缓冲区中的参数每轮都会根据陈旧函数进行修正。云服务器几乎能同时收到所有边缘服务器上传的模型参数和参与客户端的数量，进行云聚合。·When uploading model parameters, the client associates with the nearest server (control): Before the start of training and after each round of cloud aggregation, 10 clients are randomly selected for training within the range of each edge server. The client may move during the training process. After the local training is completed, it selects the edge server closest to it to associate with it and upload the model parameters. After a certain period of time, each edge server performs edge aggregation on the model parameters that have been uploaded. Parameters that failed to be uploaded in time were put into the buffer during the experiment. These parameters will participate in the subsequent edge aggregation based on the upload delay. Buffering The parameters in the zone are corrected each round based on the stale function. The cloud server can receive the model parameters and the number of participating clients uploaded by all edge servers almost simultaneously for cloud aggregation.

·基于带宽预分配的异步分层联邦学习：在训练开始前和每轮云聚合过后，根据每个边缘服务器的剩余带宽和转移矩阵计算出每个边缘服务器范围内选取的客户端数量，并将所有边缘服务器剩余带宽设为0。客户端在训练过程中可能移动，在本地训练结束后，根据其所处每个边缘服务器范围内的剩余带宽数和距离，计算上传时延，选择时延最小的边缘服务器进行关联和上传模型参数(带宽·Asynchronous hierarchical federated learning based on bandwidth pre-allocation: Before the start of training and after each round of cloud aggregation, the number of clients selected within each edge server is calculated based on the remaining bandwidth and transfer matrix of each edge server, and The remaining bandwidth of all edge servers is set to 0. The client may move during the training process. After the local training is completed, the upload delay is calculated based on the remaining bandwidth and distance within each edge server where it is located, and the edge server with the smallest delay is selected to associate and upload the model parameters. (bandwidth

＝min(自定最多可分配的带宽,剩余带宽+1),上传完成后该边缘服务器剩余带宽+1。若客户端一段时间内未能上传至预分配的边缘服务器，该边缘服务器自动释放带宽。每个边缘服务器经过一定时间后，对已经上传完成的模型参数进行边缘聚合，并上传至云服务器。云服务器收到所有边缘服务器上传的模型参数和参与客户端的数量，进行云聚合。=min (customize the maximum allocated bandwidth, remaining bandwidth + 1), after the upload is completed, the remaining bandwidth of the edge server + 1. If the client fails to upload to the pre-allocated edge server within a period of time, the edge server automatically releases the bandwidth. After a certain period of time, each edge server performs edge aggregation on the uploaded model parameters and uploads them to the cloud server. The cloud server receives the model parameters and the number of participating clients uploaded by all edge servers and performs cloud aggregation.

结论分析：Conclusion analysis:

实验结果图3和图4所示，由于分层联邦学习每轮聚合的时间不固定，实验以训练时间为横坐标，以本实施例算法中每轮云聚合的时间作为1个时间单位。纵坐标分别为对应时间下，云聚合过后的模型在测试集上的准确率和loss值。The experimental results are shown in Figures 3 and 4. Since the time of each round of aggregation in hierarchical federated learning is not fixed, the experiment uses the training time as the abscissa and the time of each round of cloud aggregation in the algorithm of this embodiment as a time unit. The vertical coordinates are the accuracy and loss values of the model after cloud aggregation on the test set at the corresponding time.

由图可以看出，本实施例算法虽然采用异步聚合算法，在模型收敛和稳定性方面不如同步算法。但在此场景下，模型达到0.9的准确度，分层联邦学习需要113个时间单位，本实施例算法只需要33个时间单位，能快速收敛，并在108个时间单位时达到0.95的准确率，达到较好的效果。本实施例实验未考虑本地迭代的时延，实际效果会没这么明显。It can be seen from the figure that although the algorithm of this embodiment uses an asynchronous aggregation algorithm, it is not as good as the synchronous algorithm in terms of model convergence and stability. However, in this scenario, the model reaches an accuracy of 0.9. Hierarchical federated learning requires 113 time units. The algorithm in this embodiment only requires 33 time units, can converge quickly, and reaches an accuracy of 0.95 in 108 time units. , achieve better results. The experiment of this embodiment does not consider the delay of local iteration, and the actual effect will not be so obvious.

分层联邦学习算法在该场景下，由于客户端在训练过程时的移动，最后可能导致无法准时上传模型参数至相应的边缘服务器，在设置了最大时间等待的情况下，部分客户端最后没参与边缘模型的聚合，导致曲线的不稳定和最终聚合效果变差。In this scenario of the hierarchical federated learning algorithm, due to the movement of clients during the training process, the model parameters may not be uploaded to the corresponding edge server on time. When the maximum waiting time is set, some clients may not participate in the end. The aggregation of edge models leads to instability of the curve and poor final aggregation effect.

上传模型参数时客户端关联最近服务器能在分层联邦学习的框架中大幅缩短模型参数上传的时延，如图5所示，图中横坐标为客户端上传模型至关联的边缘服务器所需的时间，纵坐标为上传至最近边缘服务器所需时间，并将最大值标准化为1。但是该方法可能导致初始分配的客户端在训练完成后聚集于部分边缘服务器范围内的现象，导致这些边缘服务器需要接收额外的客户端，若未能分配带宽，则客户端阻塞于上传阶段，最后影响了模型收敛效果，导致曲线不稳定。When uploading model parameters, the client associates with the nearest server, which can greatly shorten the delay in uploading model parameters in the framework of hierarchical federated learning. As shown in Figure 5, the abscissa in the figure is the time required for the client to upload the model to the associated edge server. Time, the ordinate is the time required to upload to the nearest edge server, and the maximum value is normalized to 1. However, this method may cause the initially allocated clients to gather within the range of some edge servers after training is completed, causing these edge servers to need to receive additional clients. If bandwidth cannot be allocated, the clients will be blocked in the upload stage. Finally, It affects the convergence effect of the model and causes the curve to be unstable.

基于带宽预分配的异步分层联邦学习在贪心算法的基础上，对初始客户端的分配进行处理，使得本地训练完成时，每个边缘服务器下客户端的数量相对均匀，使得模型收敛过程更加稳定。收集上传时延短的客户端传来的参数后，边缘服务器会将更多带宽分配给上传时延长的客户端，加快模型参数的上传，使得更多客户端的模型参数能及时参与边缘聚合。Asynchronous hierarchical federated learning based on bandwidth pre-allocation processes the allocation of initial clients based on the greedy algorithm, so that when local training is completed, the number of clients under each edge server is relatively even, making the model convergence process more stable. After collecting parameters from clients with short upload delays, the edge server will allocate more bandwidth to clients with longer upload delays, speeding up the upload of model parameters, so that the model parameters of more clients can participate in edge aggregation in a timely manner.

本发明方案所公开的技术手段不仅限于上述技术手段所公开的技术手段，还包括由以上技术特征任意组合所组成的技术方案。The technical means disclosed in the solution of the present invention are not limited to the technical means disclosed in the above technical means, but also include technical solutions composed of any combination of the above technical features.

以上述依据本发明的理想实施例为启示，通过上述的说明内容，相关工作人员完全可以在不偏离本项发明技术思想的范围内，进行多样的变更以及修改。本项发明的技术性范围并不局限于说明书上的内容，必须要根据权利要求范围来确定其技术性范围。Taking the above-mentioned ideal embodiments of the present invention as inspiration and through the above description, relevant workers can make various changes and modifications without departing from the scope of the technical idea of the present invention. The technical scope of the present invention is not limited to the content in the description, and must be determined based on the scope of the claims.

Claims

1. An asynchronous hierarchical joint learning training method based on bandwidth pre-allocation, which is characterized by including the following steps:

Step 1: Cloud server selects client

In this step, the cloud server is responsible for coordinating and managing the joint learning task. The cloud server selects a corresponding number of clients from the range of each edge server according to the policy. Based on the situation that the number of optional clients exceeds the required clients, the cloud server selects a corresponding number of clients based on the policy. Select the client to participate in the next round of training based on the quality and computing power of the client's local training data, and send the model parameters to the client to participate in the next round of training. Based on the client's computing performance and data entropy weight, in Select clients from the edge server to participate in the next round of training;

The specific method of client selection is:

During the training process, the transfer probability of each edge server to another edge server can be represented by a matrix. Actual investigation obtains the matrix of client transfer situations within the coverage of the cloud server. Based on this matrix and the number of clients that need to be associated with each edge server, calculate The Izumo server initially selects the number of clients under each edge server so that the number of clients associated with each edge server is even at the end of training;

As shown in the matrix, the element e in the jth row and column l represents the probability that the client transfers from the edge server j range to the edge server l range during the training process.

The initial bandwidth allocation ratio of each client participating in training is β _device , and the cloud server is based on the remaining bandwidth allocation ratio of edge server j Related/> Clients are uploaded to the server at the end of training. Assume that the number of clients that |K| edge servers should be associated with in the next round is (n ₁ ,n ₂ ,n ₃ ,...,n _K ). The cloud server _The number of clients initially allocated under |K| edge servers X=(x ₁ _, x ₂ ,x ₃ ,..., _x _|K| ), so that .,y _|K| ), the following optimization problem is obtained:

for/> is the number of remaining clients under server j,

As a general constraint problem, it is solved by the multiplier method-PHR algorithm. Given the initial point (n ₁ ,n ₂ ,n ₃ ,...,n _|K| ), the penalty factor σ, the amplification coefficient c ₁ >1, Control error ε>0, constant θ∈(0,1), let k=1, the solution process is as follows:

Step 1: Taking x _k-1 as the initial point, solve the unconstrained problem:

Obtain the optimal solution x _k ;

Step2:If Then x _k is the optimal solution and stop; otherwise, go to Step3;

Step3: If Go to Step 4; otherwise let σ _k+1 = cσ _k and go to Step 4;

Step4: Correct the multiplier vector:

(λ _k+1 ) ₁ = (λ _k ) ₁ -σc ₁ (x _k )

(λ _k+1 ) _i =max[0,(λ _k ) _i -σc _i (x _k )],i=2,3,...,2|K|+1,

Let k=k+1, go to Step 2;

c _i (x) is the i-th constraint, the 1st is an equality constraint, the 2nd to 2|K|+1 are inequality constraints, iterate according to this algorithm and obtain the approximate optimal solution, The accuracy ε does not need to be very high, and the calculated x is rounded to an integer; the optimal solution is the optimal number of clients that the cloud server needs to select in each edge server range;

Step 2: Client local training

After receiving the model parameters, the selected client will perform model training locally. Each client executes an optimization algorithm according to the task type and model architecture to update the parameters of its local model. This process can carry out multiple rounds of iterations, each Model parameters will be updated in every iteration;

Step 3: Upload local parameters

After the client iterates the local model to a preset accuracy, it associates the client with the edge server based on the distance between the client's geographical location and the edge server containing the remaining bandwidth;

After the client performs local training to achieve a predetermined accuracy, it uploads the model parameters to the edge server. Under the framework of asynchronous hierarchical federated learning based on bandwidth pre-allocation, client i is not initially associated with the corresponding edge server, but is training At the end, the distance s _ij to each edge server j is calculated based on the current location, so that there is still remaining bandwidth, that is Select the edge server j with the closest distance, that is, mins _ij , to associate among the servers, and upload the model parameters to the edge server.

The initial bandwidth allocation ratio of the associated edge server to all clients is β _ji , then the parameter upload rate of client i is: where h _i is the channel gain of client i, N ₀ is the noise,/> is the receiving power of edge server j to client i, expressed as/> Where p _j is the maximum received power of edge server j, and c is a constant, then the client parameter upload delay/> Where |d _i | is the size of the model parameters that need to be uploaded;

Step 4: Edge aggregation

The model parameter aggregation is a federated learning method that generates a better global model by averaging, weighted averaging or other aggregation strategies on parameters from different clients; different from other hierarchical federated learning frameworks, edge servers There is no need to collect all the model parameters sent by the associated clients and then aggregate them. Instead, T _aggregation will aggregate the collected model parameters after a period of time; based on the fact that the collected model parameters are outdated data generated by the previous round of training In the case of , weight is given to the local iteration time, and the earlier parameters have smaller weights; each edge server sends the aggregated model parameters to the cloud server after aggregation at the edge;

Step 5: Cloud Aggregation

Every time T _aggregation elapses, the cloud server simultaneously receives the model parameters sent by all edge servers, aggregates the parameter models sent by the edge servers, and stops asynchronous stratification based on bandwidth pre-allocation until the aggregated model reaches the predetermined accuracy. Joint learning training, otherwise return to step 1.

2. The asynchronous hierarchical joint learning training method based on bandwidth pre-allocation according to claim 1, characterized in that in step 2, the client participating in the training receives the model parameters transmitted from the cloud server, and the client i Use its own data set D _i to solve for the optimal model parameters ω to represent making the loss function/> Minimum, then the loss function on client i is expressed as:/> The client performs gradient descent in multiple iterations to gradually approach the optimal model parameters ω ^* =argmin[F _i (ω)] that minimize the loss function;

In order to achieve the predetermined local accuracy θ∈(0,1), the client needs to perform multiple rounds of local iterations L(θ)=μlog(1/θ), where the constant μ depends on the size of the training task. The nth _{LocalTraining} round local The iteration is expressed as: Iterate until/> When, local training is completed, where eta is the learning rate;

The computing delay of model training on the client is related to the number of iterations L(θ) required to achieve accuracy, the computing power of the client _Pi and the size of the training data set |D _i |. The computing delay of client i is expressed as : The computing power P is the number of samples processed by the client within a period of time. The size of the training data set is set in advance, and the calculation delay is effectively reduced by selecting a client with strong computing power.

3. The asynchronous hierarchical joint learning training method based on bandwidth pre-allocation according to claim 1, characterized in that "when using the asynchronous aggregation method, the staleness function is introduced as a hyperparameter to reduce the impact of the stale model on the global model. Impact, the cloud server distributes the latest model to the clients participating in the next round of training; when performing edge aggregation, the stale function of each client is related to the number of cloud aggregation rounds that have been performed when each client receives the model. n _round is related to the latest round number n _CurrentRound of cloud aggregation, so the stale function corresponding to the parameters uploaded by client i is expressed as Where λ∈(0,1) is the given attenuation coefficient, and the parameter update on edge server j is expressed as: Where S _j represents the set of clients participating in this edge aggregation.

4. The asynchronous hierarchical joint learning training method based on bandwidth pre-allocation according to claim 1, characterized in that in step 5, considering that the edge server and the cloud server have strong communication capabilities, the edge server The communication delay in uploading parameters to the cloud server is negligible. After the cloud server receives the parameters uploaded by the edge server, it performs a round of cloud aggregation;

Also because the quality of parameters received by cloud aggregation is uneven, it is necessary to introduce the stale function as a hyperparameter to reduce the impact of outdated models on global model training. The stale function of parameters uploaded by each edge server is consistent with the participating customers in the last round of aggregation of the edge server. It is related to the overall staleness of the end, and the staleness function is set to: The update of model parameters on the cloud server is expressed as:/> Among them/> Represented as the latest model parameters uploaded by edge server j, D _s is the total collection of data sets of clients that have participated in training. These model parameters are aggregated until the asynchronous analysis based on bandwidth pre-allocation is stopped until the aggregated model reaches the predetermined accuracy. layer joint learning training, otherwise return to step 1.

5. The asynchronous hierarchical joint learning training method based on bandwidth pre-allocation according to claim 4, characterized in that, after receiving the parameters uploaded by the client, the edge server sends the parameters to the cloud server for cloud aggregation.

6. The asynchronous hierarchical joint learning training method based on bandwidth pre-allocation according to claim 1, characterized in that, "based on the computing performance and data entropy weight of the client, select the clients participating in the next round of training in the edge server. end;

The data quality of the data set on the client is defined using the entropy weight method. m samples are extracted from client i, 1,2,…,k _i represents the attribute feature index on client i, 1,2,…,m is the sample index,/> Is the normalized value of the data attribute,/> The entropy weight assigned to the data attribute, is the information entropy of each data attribute;

Under each server j, select based on computing power P and local model quality Q Clients, assuming that P and Q under the same client change little in multiple rounds of training, the overall strategy is for each edge server to _first select the comprehensive capability φ=γP+(1- γ) Clients with a large Q value, where γ is the weight coefficient, and the remaining required clients are selected from the total set of training clients N _sum , and are selected from the range of edge server j/> For a client and parameter δ, the specific steps are as follows:

S2: If Then it is in N _sum -N _selected , and meets the condition of s _ji <R _j , that is, it is randomly selected within the range of edge server j/> clients, calculate the φ values of these clients and add them to N _selected . All clients in N _selected are the clients participating in the next round of training.

7. The asynchronous hierarchical joint learning training method based on bandwidth pre-allocation according to claim 6, characterized in that the cloud server selects the number of clients after one edge server performs edge aggregation and uploads parameters, and the other includes The remaining bandwidth of the edge server with remaining bandwidth will be allocated by the cloud server to the clients participating in the next round of training;

After the cloud server selects the client, the remaining bandwidth ratio of all edge servers is set to 0. After the set t _overtime cloud aggregation, the allocated but not released bandwidth is automatically released, and the allocated but not released bandwidth is automatically released. The released bandwidth is used for the next round of client parameter uploads.