CN113902021A

CN113902021A - High-energy-efficiency clustering federal edge learning strategy generation method and device

Info

Publication number: CN113902021A
Application number: CN202111191599.8A
Authority: CN
Inventors: 秦晓琦; 李艺璇; 韩凯峰; 马楠
Original assignee: Beijing University of Posts and Telecommunications; China Academy of Information and Communications Technology CAICT
Current assignee: Beijing University of Posts and Telecommunications; China Academy of Information and Communications Technology CAICT
Priority date: 2021-10-13
Filing date: 2021-10-13
Publication date: 2022-01-07
Anticipated expiration: 2041-10-13
Also published as: CN113902021B

Abstract

The invention discloses a clustering federal edge learning strategy generation method and a clustering federal edge learning strategy generation device with high energy efficiency, wherein the method comprises the following steps: s1, initializing an edge access strategy by the cloud center; s2, the edge base station solves the bandwidth resource allocation strategy of the access equipment and sends the initialization model to the access equipment; s3, calculating the precision of the received global model by the equipment, training the local model by adopting a layered migration strategy according to the global model and the local data, calculating the energy spent on uploading the local model, taking the difference value between the test precision and the energy consumption as the local profit, and uploading the local model and the local profit to the accessed edge base station; s4, the edge base station hierarchically aggregates the local model, calculates edge income by averaging local income of all access devices, and uploads the edge income to the cloud center; s5, the cloud center calculates the system profit according to the received feedback information of the edge base station, and adjusts an edge access strategy by adopting a deep reinforcement learning algorithm; and S6, repeating the above processes until convergence.

Description

An energy-efficient clustering federated edge learning strategy generation method and device

技术领域technical field

本发明涉及数据处理技术领域，尤其涉及一种高能效的聚类联邦边缘学习策略生成方法和装置。The invention relates to the technical field of data processing, in particular to a method and device for generating a clustering federated edge learning strategy with high energy efficiency.

背景技术Background technique

数据安全已成为人工智能技术持续发展所面临的关键问题。传统的机器学习技术是集中式的，其收集设备的数据到处理中心，进行集中训练，然而这可能会导致用户数据隐私泄露。Data security has become a key issue facing the sustainable development of artificial intelligence technology. The traditional machine learning technology is centralized, which collects the data of the device to the processing center for centralized training, however, this may lead to the leakage of user data privacy.

联邦学习是一种有前途的分布式机器学习架构，随着设备计算能力的提升，设备可以在本地使用采集的数据训练本地模型，之后只需要把本地模型上传到处理中心进行模型聚合，避免了原始数据直接上传，极大地保护了数据隐私。Federated learning is a promising distributed machine learning architecture. With the improvement of the computing power of the device, the device can use the collected data to train the local model locally, and then only need to upload the local model to the processing center for model aggregation, avoiding the need for The raw data is uploaded directly, which greatly protects data privacy.

在现实生活中，设备间的数据可能呈现非独立同分布特性，这对于联邦学习训练一个统一的全局模型带来了挑战。因此研究联邦学习如何适应每一个设备上的数据是非常有意义的。目前，已经有一些个性化联邦学习的研究被提出。In real life, the data between devices may exhibit non-IID characteristics, which brings challenges for federated learning to train a unified global model. Therefore, it is very interesting to study how federated learning adapts to the data on each device. At present, some studies on personalized federated learning have been proposed.

个性化联邦学习包括联邦迁移学习与联邦元学习，其本质都是先获得一个所有设备共享的基础全局模型，之后再在每一个设备上根据本地数据，微调全局模型以适应个性化数据特征。不同的个性化联邦学习策略各有弊端。联邦迁移学习和联邦元学习由于首先需要得到一个涵盖绝大多数特征的全局模型，然后再进行个性化，故其只适用于异构性较弱的数据，无法处理拥有强异构数据系统的个性化问题。Personalized federated learning includes federated transfer learning and federated meta-learning. Its essence is to first obtain a basic global model shared by all devices, and then fine-tune the global model on each device based on local data to adapt to personalized data features. Different personalized federated learning strategies have their own drawbacks. Federated transfer learning and federated meta-learning are only suitable for data with weak heterogeneity because they need to obtain a global model covering most of the features first, and then personalize it, so they cannot deal with the personality of a system with strong heterogeneous data. ization problem.

多任务联邦学习也是解决联邦学习个性化的有效方法，其通过计算相关矩阵来量化不同设备模型的相似性，然后把异构的数据作为不同的学习目标，从而进行多任务学习。联邦多任务学习只适用于凸问题或双凸问题，很难拓广到如常见的神经网络等非凸问题中，具有局限性。此外，这些个性化方法大多适用于输出标签不同的数据，如每个设备只拥有所有标签的子集，无法适用于一些条件分布不同且有明显聚类结构的数据。Multi-task federated learning is also an effective method to solve the personalization of federated learning. It calculates the correlation matrix to quantify the similarity of different device models, and then uses heterogeneous data as different learning goals to perform multi-task learning. Federated multi-task learning is only suitable for convex or biconvex problems, and it is difficult to extend to non-convex problems such as common neural networks, which has limitations. In addition, most of these personalized methods are suitable for outputting data with different labels. For example, each device only has a subset of all labels, and cannot be applied to some data with different conditional distributions and obvious clustering structure.

聚类联邦学习可以有效解决上述问题，其可以捕捉数据间的聚类结构，从而根据数据分布聚合多个模型来满足设备间异构的数据特征，极大地提高了学习准确度。由于联邦学习的隐私性，设备上的数据分布未知，这给聚类带来了极大的挑战。由理论分析可得，当学习模型之间的距离越小时，两者的数据分布越接近，故在不上传原始数据的情况下，聚类联邦学习大多采用模型距离来衡量不同设备上的数据相似性。衡量模型距离常用的指标为欧氏距离和余弦距离等。然而一些技术可以从本地模型中推断设备处的数据信息，从而造成数据隐私泄露。模型非线性加密方法能够很好地解决这个问题，但非线性加密之后的模型之间的距离与原模型距离可能不成正比，故使用本地模型距离聚类虽然计算复杂度较低，但在此情况下，无法通过加密之后的模型来判断数据间的相似性，导致聚类方法失效，所以这并不是一种可以广泛适用的方法。此外，现有的聚类联邦学习大多只考虑数据的统计异构性，忽略了系统的资源受限与通信瓶颈问题。同时，这些研究只考虑单基站场景，缺乏对多基站的拓展。对于能量有限的设备来说，通信开销是不可忽略的，单基站提供的频谱资源有限，对于信道状态不好的设备，上传本地模型会消耗大量的设备能量，从而在训练成本预算下降低学习性能。Cluster federated learning can effectively solve the above problems. It can capture the clustering structure between data, so as to aggregate multiple models according to the data distribution to meet the heterogeneous data characteristics between devices, which greatly improves the learning accuracy. Due to the privacy of federated learning, the data distribution on the device is unknown, which brings great challenges to clustering. It can be seen from theoretical analysis that when the distance between the learning models is smaller, the data distribution of the two is closer. Therefore, without uploading the original data, cluster federated learning mostly uses the model distance to measure the similarity of data on different devices. sex. Commonly used indicators to measure model distance are Euclidean distance and cosine distance. However, some techniques can infer data information at the device from the local model, resulting in data privacy leakage. The model nonlinear encryption method can solve this problem well, but the distance between the models after nonlinear encryption may not be proportional to the original model distance, so the use of local model distance clustering is less computationally complex, but in this case However, the similarity between the data cannot be judged by the encrypted model, which leads to the failure of the clustering method, so this is not a method that can be widely applied. In addition, most of the existing cluster federated learning only considers the statistical heterogeneity of data, ignoring the resource constraints and communication bottlenecks of the system. At the same time, these studies only consider the single base station scenario and lack the expansion to multiple base stations. For devices with limited energy, the communication overhead cannot be ignored. The spectrum resources provided by a single base station are limited. For devices with poor channel status, uploading the local model will consume a lot of device energy, thus reducing the learning performance under the training cost budget. .

传统的联邦学习需要设备通过广域网将本地模型上传到云端进行聚合，而设备的电池容量往往是有限的，联邦学习的多轮通信迭代与每一轮中巨大的通信开销将消耗大量的传输能量，从而在给定的能量预算下降低学习性能。多接入边缘计算(MEC)技术是一个有前途的分布式计算框架，其能够支持许多低时延低能耗应用的需求，MEC将延迟敏感和计算密集型任务卸载到边缘，实现了实时性和高能效。联邦边缘学习利用了MEC的优势，在云与设备间加入多个基站来进一步辅助训练，设备将本地模型上传到边缘基站处聚合。这极大地减少了设备与云之间通过广域网传输的通信开销，此外，通过对边缘基站和设备的整体协调，在数据非独立同分布的情况下，使系统实现高能效和高精度。在多基站联邦学习的架构中，大多只考虑训练成本，如时间和能耗等，没有考虑统计异构性给多基站场景带来的机遇与挑战，缺乏针对训练成本与学习性能的联合优化研究。Traditional federated learning requires the device to upload the local model to the cloud for aggregation through the WAN, and the battery capacity of the device is often limited. The multiple rounds of communication iterations of federated learning and the huge communication overhead in each round will consume a lot of transmission energy. Thereby reducing the learning performance for a given energy budget. Multi-access edge computing (MEC) technology is a promising distributed computing framework that can support the needs of many low-latency and low-energy applications. MEC offloads delay-sensitive and computationally intensive tasks to the edge, enabling real-time and High energy efficiency. Federated edge learning takes advantage of MEC, adding multiple base stations between the cloud and the device to further assist training, and the device uploads the local model to the edge base station for aggregation. This greatly reduces the communication overhead between the device and the cloud through the WAN transmission. In addition, through the overall coordination of edge base stations and devices, the system achieves high energy efficiency and high precision in the case of non-IID data. In the multi-base station federated learning architecture, most of them only consider the training cost, such as time and energy consumption, but do not consider the opportunities and challenges brought by statistical heterogeneity to multi-base station scenarios, and lack joint optimization research on training cost and learning performance. .

发明内容SUMMARY OF THE INVENTION

本发明针对上述现有技术的不足，在多边缘基站场景下，联合考虑设备的数据分布和能耗成本，找到统计异构与通信瓶颈的交叉点，从系统收益的角度出发，设计高能效和高精度的边缘接入策略与资源分配策略，提出一种高能效的聚类联邦边缘学习策略生成方法和装置。Aiming at the above-mentioned shortcomings of the prior art, the present invention, in the multi-edge base station scenario, jointly considers the data distribution and energy consumption cost of the equipment, finds the intersection of statistical heterogeneity and communication bottleneck, and designs high energy efficiency and High-precision edge access strategy and resource allocation strategy, and an energy-efficient clustering federated edge learning strategy generation method and device are proposed.

为了实现上述目的，本发明提供如下技术方案：In order to achieve the above object, the present invention provides the following technical solutions:

第一方面，本发明提供了一种高能效的聚类联邦边缘学习策略生成方法，包括以下步骤：In a first aspect, the present invention provides an energy-efficient method for generating a clustering federated edge learning strategy, comprising the following steps:

S1、云中心初始化边缘接入策略；S1. The cloud center initializes the edge access policy;

S2、边缘基站使用凸优化方法求解其接入设备的带宽资源分配策略，并将其初始化模型发送给接入设备；S2, the edge base station uses the convex optimization method to solve the bandwidth resource allocation strategy of its access device, and sends its initialization model to the access device;

S3、设备计算接收到的全局模型在本地测试数据集上精度，并根据全局模型和本地训练数据采用分层联邦迁移方法训练本地模型，计算上传本地模型所花费的能量，将测试精度与能耗的差值作为本地收益，然后将本地模型和本地收益上传到所接入的边缘基站；S3. The device calculates the accuracy of the received global model on the local test data set, and uses the hierarchical federated migration method to train the local model according to the global model and local training data, calculates the energy spent on uploading the local model, and compares the test accuracy and energy consumption. The difference is used as the local revenue, and then the local model and local revenue are uploaded to the connected edge base station;

S4、边缘基站分层聚合本地模型，同时通过平均所有接入设备的本地收益，计算边缘收益，将边缘收益上传到云中心；S4. The edge base station aggregates the local model hierarchically, and at the same time calculates the edge revenue by averaging the local revenue of all access devices, and uploads the edge revenue to the cloud center;

S5、云中心根据收到的边缘基站的反馈信息，计算系统收益，并采用深度强化学习算法调整边缘接入策略；S5. The cloud center calculates the system revenue according to the feedback information received from the edge base station, and uses the deep reinforcement learning algorithm to adjust the edge access strategy;

S6、重复上述过程直至收敛。S6. Repeat the above process until convergence.

进一步地，步骤S1中，设备与边缘基站的接入策略a_ij是一个二进制变量，即若设备i与边缘基站j通信，则a_ij＝1，否则，a_ij＝0，每个设备接入一个边缘基站。Further, in step S1, the access policy a _ij between the device and the edge base station is a binary variable, that is, if the device i communicates with the edge base station j, then a _ij =1, otherwise, a _ij =0, each device accesses an edge base station.

进一步地，步骤S2中的凸优化方法具体为：对于边缘基站j及其接入设备簇

给定边缘接入策略时，资源分配子问题的最优带宽分配β_ij计算公式如下：Further, the convex optimization method in step S2 is specifically: for edge base station j and its access device cluster

When the edge access strategy is given, the calculation formula of the optimal bandwidth allocation β _ij for the resource allocation sub-problem is as follows:

其中，h_ij表示设备i和边缘服务器j之间的信道增益，p_i表示设备i的模型上传功率，N₀表示高斯噪声的功率谱密度，β_ijB_j为接入边缘基站j的设备i所分得的带宽资源，接入边缘基站j的设备共享用带宽为B_j的公共频谱进行通信，

a_ij表示设备与边缘基站的接入策略，β_ij表示分配给设备i的带宽比例。where h _ij represents the channel gain between device i and edge server j, pi represents the model upload power of device _i , N ₀ represents the power spectral density of Gaussian noise, and β _ij B _j represents device i accessing edge base station j The allocated bandwidth resources, the devices accessing the edge base station j share the public spectrum with the bandwidth B _j for communication,

a _ij represents the access policy between the device and the edge base station, and β _ij represents the bandwidth ratio allocated to device i.

进一步地，设备根据收到的全局模型θ_j，利用本地数据

进行训练，设备i的损失函数公式为：Further, the device utilizes local data according to the received global model θ _j

For training, the loss function formula of device i is:

设备使用梯度下降法更新本地模型ω_i，公式如下：The device uses gradient descent to update the local model ω _i with the following formula:

其中，η为学习步长，且η≥0；Among them, η is the learning step size, and η≥0;

步骤S3采用分层联邦迁移学习策略训练本地模型，将神经网络分为基础特征层和个性特征层，分层联邦迁移学习策略具体过程为：Step S3 adopts the hierarchical federated transfer learning strategy to train the local model, and divides the neural network into a basic feature layer and a personality feature layer. The specific process of the hierarchical federated transfer learning strategy is as follows:

S301、根据如下公式计算每个边缘基站经过一定轮次后的平均学习精度：S301. Calculate the average learning accuracy of each edge base station after a certain round of times according to the following formula:

S302、将高于平均精度的设备基础特征层模型

和个性特征层模型

上传到接入的边缘基站，低于平均精度的设备上传基础特征层模型，个性特征层模型在设备本地更新，公式如下：S302. Set the equipment basic feature layer model with higher than average precision

and personality feature layer model

Uploaded to the connected edge base station, the device with lower than average precision uploads the basic feature layer model, and the individual feature layer model is updated locally on the device. The formula is as follows:

其中，

为设备i的本地个性特征层模型，

为迁移设备集。in,

is the local personality feature layer model of device i,

For the migration device set.

S303、边缘基站聚合所有设备的基础特征层模型并且聚合非迁移设备的个性特征层模型，边缘基站将聚合后的基础层模型下发到所有接入设备，而将个性层模型下发到非迁移设备，设备根据收到的模型再进行上述更新，如此迭代直到收敛。S303, the edge base station aggregates the basic feature layer models of all devices and aggregates the personality feature layer models of the non-migration devices, the edge base station delivers the aggregated base layer model to all access devices, and delivers the personality layer model to the non-migration devices The device, the device performs the above update according to the received model, and so on until convergence.

进一步地，步骤S3中，将全局模型在本地测试数据集上的学习精度g_ij作为衡量边缘基站j上的全局模型性能的指标，系统的学习性能增益G是所有设备的平均精度，公式所示：Further, in step S3, the learning accuracy g _ij of the global model on the local test data set is used as an index to measure the performance of the global model on the edge base station j, and the learning performance gain G of the system is the average accuracy of all devices, as shown in the formula. :

进一步地，步骤S3中设备i上传本地模型所消耗的能量E_ij公式如下：Further, in step S3, the formula for the energy _Eij consumed by the device i uploading the local model is as follows:

T_ij为设备i将本地模型上传到边缘基站的传输时延，公式如下：T _ij is the transmission delay for device i to upload the local model to the edge base station. The formula is as follows:

S表示本地模型的大小，r_ij为设备i上传模型的传输速率，公式如下：S represents the size of the local model, ri _ij is the transmission rate of the model uploaded by device i, and the formula is as follows:

h_ij表示设备i和边缘服务器j之间的信道增益，p_i表示设备i的模型上传功率，N₀表示高斯噪声的功率谱密度，β_ijB_j为接入边缘基站j的设备i所分得的带宽资源，接入边缘基站j的设备共享用带宽为B_j的公共频谱进行通信，

a_ij表示设备与边缘基站的接入策略，β_ij表示分配给设备i的带宽比例。进一步地，步骤S4中，在分层聚合之前，边缘基站聚合所有收到的本地模型，公式如下：h _ij represents the channel gain between device i and edge server j, pi represents the model upload power of device _i , N ₀ represents the power spectral density of Gaussian noise, β _ij B _j is the distribution of device i accessing edge base station j The obtained bandwidth resources, the devices accessing the edge base station j share the public spectrum with the bandwidth B _j for communication,

a _ij represents the access policy between the device and the edge base station, and β _ij represents the bandwidth ratio allocated to device i. Further, in step S4, before hierarchical aggregation, the edge base station aggregates all received local models, and the formula is as follows:

其中，

为所有接入边缘基站j的设备簇，ω_i为本地模型。in,

is all the device clusters accessing edge base station j, and ω _i is the local model.

在训练一定轮次之后，执行分层联邦迁移学习策略，边缘基站分层聚合收到的本地模型，具体方法为：边缘基站聚合所有设备的基础特征层模型以保证模型的泛化性能，并且聚合非迁移设备的个性特征层模型以消除设备间非独立同分布数据的影响，公式如下：After a certain round of training, the hierarchical federated transfer learning strategy is executed, and the edge base station aggregates the received local models hierarchically. The specific method is as follows: the edge base station aggregates the basic feature layer models of all devices to ensure the generalization performance of the model, and aggregates The personality feature layer model of non-migrated devices to eliminate the influence of non-IID data between devices, the formula is as follows:

其中，

为接入边缘基站j的设备的基础特征层全局模型，为所有设备共享，

为个性特征层全局模型，为非迁移设备共享，

为接入边缘基站j的非迁移设备簇。in,

is the global model of the basic feature layer of the device accessing the edge base station j, shared by all devices,

is a global model of the personality feature layer, shared for non-migration devices,

is a cluster of non-migrated devices accessing edge base station j.

进一步地，步骤S5中系统收益函数的公式如下：Further, the formula of the system revenue function in step S5 is as follows:

其中μ为连续变量，且μ∈[0,1]，用于调整学习性能和传输能耗之间的权衡关系，G_max与E_max为系统可达的最高精度与最大能耗。where μ is a continuous variable, and μ∈[0,1], which is used to adjust the trade-off relationship between learning performance and transmission energy consumption, and G _max and E _max are the highest accuracy and maximum energy consumption that can be achieved by the system.

进一步地，步骤S5中，采用深度强化学习调整边缘接入策略，深度强化学习的具体过程为：Further, in step S5, deep reinforcement learning is used to adjust the edge access strategy, and the specific process of deep reinforcement learning is:

S501、将边缘关联问题描述为马尔科夫过程

具体细节如下：S501. Describe the edge correlation problem as a Markov process

The specific details are as follows:

(1)状态

在第k轮，状态定义为S(k)＝{S₁(k),S₂(k),…,S_N(k)}，每一项S_i(k)定义为：(1) Status

In the kth round, the state is defined as S(k)={S ₁ (k), S ₂ (k),...,S _N (k)}, and each term S _i (k) is defined as:

S_i(k)＝{A_i(k-1),β_ij(k),Δ_i(k)}S _i (k)={A _i (k-1),β _ij (k),Δ _i (k)}

其中，Δ_i(k)表示相比于k-1轮，学习精度是否提升，即Δ_i(k)＝1代表精度有所提升，反之，Δ_i(k)＝0；Among them, Δ _i (k) indicates whether the learning accuracy is improved compared to the k-1 round, that is, Δ _i (k)=1 means that the accuracy is improved, otherwise, Δ _i (k)=0;

(2)动作

在第k轮，动作为每一个设备的边缘关联策略：(2) Action

In the kth round, the action is the edge association policy for each device:

A(k)＝{A₁(k)，A₂(k)，…，A_N(k)}A(k)={A ₁ (k), A ₂ (k), ..., A _N (k)}

其中每一项A_i(k)可表示为：Each of them A _i (k) can be expressed as:

A_i(k)＝{a_ij(k)}A _i (k)={a _ij (k)}

(3)奖励

把奖励设置为目标函数：(3) Rewards

Set the reward as the objective function:

S502、选择DQN作为基础框架，并且结合dueling DQN和double DQN来优化我们的算法，使用D3QN来解决边缘接入问题，Q值函数Q(S，A；θ)由参数为θ的神经网络来逼近，代表环境与动作之间的映射关系，神经网络的输出通过贝尔曼方程得到：S502, select DQN as the basic framework, and combine dueling DQN and double DQN to optimize our algorithm, use D3QN to solve the edge access problem, the Q value function Q(S, A; θ) is approximated by a neural network with a parameter of θ , represents the mapping relationship between the environment and the action, and the output of the neural network is obtained through the Bellman equation:

其中，S'，A'，θ'分别为下一时隙的状态，动作和对应参数；Among them, S', A', θ' are the state, action and corresponding parameters of the next time slot respectively;

在DQN中使用两个结构相同但参数不同的Q网络来提高算法的稳定性，一个是具有最新参数的当前Q网络，用于评估当前状态-动作的价值函数，另一个是带有过去轮次参数的目标Q网络，并在一段时间内保持Q值不变，把当前Q网络的Q值作为神经网络的输入，DQN的目标是最小化两个Q网络之间的差异，并将其定义为DQN的损失函数：Two Q-networks with the same structure but different parameters are used in DQN to improve the stability of the algorithm, one is the current Q-network with the latest parameters to evaluate the value function of the current state-action, and the other is the one with the past rounds The goal of the parameter Q network, and keep the Q value unchanged for a period of time, taking the Q value of the current Q network as the input of the neural network, the goal of DQN is to minimize the difference between the two Q networks, and it is defined as The loss function of DQN:

L(θ)＝E[(y-Q(S，A；θ))²]L(θ)=E[(yQ(S, A; θ)) ² ]

S503、采用DDQN算法，选择当前Q网络中最大Q值对应的动作：S503, using the DDQN algorithm, select the action corresponding to the maximum Q value in the current Q network:

再将选择的动作带入目标Q网络计算Q值：Then bring the selected action into the target Q network to calculate the Q value:

y＝R(S，A)+γQ'(φ(S'),A^max(S'；θ)；θ')y=R(S,A)+γQ'(φ(S'), ^Amax (S';θ);θ')

S504、使用dueling DQN来优化网络结构，并将网络分成两部分，分别为只与状态有关的值函数V(S,θ,α)和与状态和动作都相关的势函数A(S,A,θ，β)，其中θ为两个网络的公共参数，α为值函数独有参数，β为势函数独有参数，Q值为这两个函数的和：S504. Use dueling DQN to optimize the network structure, and divide the network into two parts, which are the value function V(S, θ, α) related only to the state and the potential function A(S, A, α) related to both the state and the action. θ, β), where θ is the common parameter of the two networks, α is the unique parameter of the value function, β is the unique parameter of the potential function, and the Q value is the sum of these two functions:

Q(S,A,θ,α,β)＝V(S,θ,α)+A(S,A,θ,β)。Q(S,A,θ,α,β)=V(S,θ,α)+A(S,A,θ,β).

第二方面，本发明提供了一种高能效的聚类联邦边缘学习策略生成装置，包括计算机存储器、计算机处理器以及存储在计算机存储器中并可在计算机处理器上执行的计算机程序，其特征在于，计算机处理器执行所述计算机程序时实现上述的高能效的聚类联邦边缘学习策略生成方法。In a second aspect, the present invention provides an energy-efficient clustering federated edge learning strategy generation device, comprising a computer memory, a computer processor, and a computer program stored in the computer memory and executable on the computer processor, characterized in that , when the computer processor executes the computer program, the above energy-efficient clustering federated edge learning strategy generation method is implemented.

与现有技术相比，本发明的有益效果为：Compared with the prior art, the beneficial effects of the present invention are:

1、本发明联合考虑了现实中执行联邦学习的实际瓶颈，即设备数据的非独立同分布特性和设备的通信、能量限制，而大多数研究只考虑单一问题。1. The present invention jointly considers the actual bottleneck of implementing federated learning in reality, that is, the non-IID characteristics of device data and the communication and energy constraints of devices, while most studies only consider a single problem.

2、本发明考虑到设备的通信开销，本发明将传统单基站的联邦学习拓广到了多基站场景。不同于多基站场景中只解决通信瓶颈问题，本发明从系统收益的角度出发，联合考虑数据分布的异构性和信道状态，设计了高精度和高能效的边缘接入策略和资源分配策略。2. The present invention takes into account the communication overhead of the device, and the present invention extends the traditional federated learning of a single base station to a multi-base station scenario. Different from only solving the communication bottleneck problem in the multi-base station scenario, the present invention designs a high-precision and energy-efficient edge access strategy and resource allocation strategy by jointly considering the heterogeneity of data distribution and channel state from the perspective of system benefits.

3、为了增加算法的普适性，本发明考虑到联邦学习的数据隐私问题，特别是一些技术可以从设备上传的模型中推断出设备端的数据，非线性隐私加密是进一步保护数据隐私的算法，故常用的模型距离聚类的方法失效。本发明设计了深度强化学习，根据边缘反馈信息来自适应地探索边缘接入策略，保护数据隐私。同时为了增加算法的可拓展性，降低算法复杂度，本发明把资源分配问题解耦到边缘基站上独立求解。3. In order to increase the universality of the algorithm, the present invention takes into account the data privacy problem of federated learning, especially some technologies can infer the data on the device side from the model uploaded by the device. Non-linear privacy encryption is an algorithm to further protect data privacy, Therefore, the commonly used method of model distance clustering is invalid. The invention designs deep reinforcement learning, adaptively explores edge access strategies according to edge feedback information, and protects data privacy. At the same time, in order to increase the scalability of the algorithm and reduce the complexity of the algorithm, the present invention decouples the resource allocation problem to the edge base station to solve it independently.

4、本发明考虑到会出现拥有不一致数据分布的设备接入同一边缘基站的情况，为此设计了分层迁移学习来进一步提高学习性能。分析可得，本发明设计的分层迁移策略并不会额外消耗能量。4. The present invention considers the situation that devices with inconsistent data distribution may access the same edge base station, and for this purpose, hierarchical migration learning is designed to further improve the learning performance. The analysis shows that the hierarchical migration strategy designed in the present invention does not consume extra energy.

附图说明Description of drawings

为了更清楚地说明本申请实施例或现有技术中的技术方案，下面将对实施例中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明中记载的一些实施例，对于本领域普通技术人员来讲，还可以根据这些附图获得其他的附图。In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the accompanying drawings required in the embodiments will be briefly introduced below. Obviously, the accompanying drawings in the following description are only described in the present invention. For some of the embodiments, those of ordinary skill in the art can also obtain other drawings according to these drawings.

图1为本发明实施例提供的聚类联邦边缘学习系统架构。FIG. 1 is an architecture of a cluster federated edge learning system provided by an embodiment of the present invention.

具体实施方式Detailed ways

为了更好地理解本技术方案，下面结合附图对本发明的方法做详细的说明。In order to better understand the technical solution, the method of the present invention will be described in detail below with reference to the accompanying drawings.

本发明提供了一种高能效的聚类联邦边缘学习策略生成方法，包括以下步骤：The present invention provides an energy-efficient method for generating a clustering federated edge learning strategy, comprising the following steps:

S3、设备计算接收到的全局模型在本地测试数据集上的精度，并根据全局模型和本地训练数据采用分层联邦迁移方法训练本地模型，计算上传本地模型所花费的能量，将测试精度与能耗的差值作为本地收益，然后将本地模型和本地收益上传到所接入的边缘基站；S3. The device calculates the accuracy of the received global model on the local test data set, and uses the hierarchical federated migration method to train the local model according to the global model and local training data, calculates the energy spent on uploading the local model, and compares the test accuracy with the performance. The difference in consumption is used as the local revenue, and then the local model and local revenue are uploaded to the connected edge base station;

其中，本发明考虑一个多基站场景下的聚类联邦边缘学习架构，如图1所示，它由一个云中心S，M个边缘基站和N个设备组成。在网络中，以

作为边缘基站集合，

为边缘基站数量，

为设备集合，

为设备数量。对于每一个设备

采集并储存训练数据集

其中x_in为设备i的第n个储存样本，y_in为x_in的对应标签，

为设备i的训练数据量。不同设备的训练数据采集自不同的数据源，故联邦学习的训练数据非独立同分布。Among them, the present invention considers a cluster federated edge learning architecture in a multi-base station scenario, as shown in FIG. 1 , which consists of a cloud center S, M edge base stations and N devices. in the network, with

As a set of edge base stations,

is the number of edge base stations,

for the device collection,

is the number of devices. for each device

Collect and store training datasets

where x _in is the nth storage sample of device i, y _in is the corresponding label of x _in ,

is the amount of training data for device i. The training data of different devices is collected from different data sources, so the training data of federated learning is not independent and identically distributed.

在聚类联邦边缘学习中，系统的目标是学习多个模型来满足设备上的异构数据。联邦学习训练过程包括以下几个步骤：In cluster federated edge learning, the goal of the system is to learn multiple models to satisfy heterogeneous data on devices. The federated learning training process includes the following steps:

边缘基站向设备发送初始全局模型；The edge base station sends the initial global model to the device;

设备根据收到的全局模型θ_j，利用本地数据

进行训练。设备i的损失函数定义为：The device utilizes local data according to the received global model θ _j

to train. The loss function for device i is defined as:

设备使用梯度下降法更新本地模型ω_i，如下所示：The device updates the local model ω _i using gradient descent as follows:

其中，η为学习步长，且η≥0。Among them, η is the learning step size, and η≥0.

并将更新后的本地模型通过无线链路上传到接入的边缘基站；And upload the updated local model to the connected edge base station through the wireless link;

边缘基站聚合所有收到的本地模型，如下所示：The edge base station aggregates all received local models as follows:

其中，

为所有接入边缘基站j的设备簇。in,

is all the device clusters accessing edge base station j.

重复上述过程直到模型收敛。Repeat the above process until the model converges.

设备与边缘基站的接入策略a_ij是一个二进制变量，即若设备i与边缘基站j通信，则a_ij＝1，否则，a_ij＝0。每个设备只可以接入一个边缘基站，所以本发明有：The access policy a _ij between the device and the edge base station is a binary variable, that is, if the device i communicates with the edge base station j, a _ij =1; otherwise, a _ij =0. Each device can only access one edge base station, so the present invention has:

本发明将全局模型在本地数据集上的学习精度g_ij作为衡量边缘基站j上的全局模型性能的指标，系统的学习性能增益G可以看作是所有设备的平均精度，如下所示：In the present invention, the learning accuracy g _ij of the global model on the local data set is used as an index to measure the performance of the global model on the edge base station j, and the learning performance gain G of the system can be regarded as the average accuracy of all devices, as shown below:

值得注意的是，拥有非独立同分布训练数据的设备接入同一边缘基站对学习性能有消极的影响，因此统计异构性是设计边缘接入策略时不可忽视的关键问题。It is worth noting that the access of devices with non-IID training data to the same edge base station has a negative impact on the learning performance, so statistical heterogeneity is a key issue that cannot be ignored when designing edge access strategies.

对于本地模型的上传过程，本发明采用正交频分多址接入(orthogonalfrequency division multiple access，OFDMA)通信制式，这也很容易拓展到其他通信制式。所有接入边缘基站j的设备共享可用带宽为B_j的公共频谱进行通信，β_ij表示分配给设备i的带宽比例。那么本发明有：For the uploading process of the local model, the present invention adopts an orthogonal frequency division multiple access (orthogonal frequency division multiple access, OFDMA) communication standard, which is also easily extended to other communication standard. All devices accessing edge base station j share a common spectrum with available bandwidth B _j for communication, and β _ij represents the bandwidth ratio allocated to device i. Then the present invention has:

通过上述分析，接入边缘基站j的设备i所分得的带宽资源为β_ijB_j。设备i上传模型的传输速率可表示如下：Through the above analysis, the bandwidth resource allocated by the device i accessing the edge base station j is β _ij B _j . The transfer rate of the model uploaded by device i can be expressed as follows:

其中，h_ij表示设备i和边缘服务器j之间的信道增益，p_i表示设备i的模型上传功率，N₀表示高斯噪声的功率谱密度。令S表示本地模型的大小，设备i将本地模型上传到边缘基站的传输时延可表示如下：where h _ij represents the channel gain between device _i and edge server j, pi represents the model upload power of device i, and N ₀ represents the power spectral density of Gaussian noise. Let S denote the size of the local model, and the transmission delay for device i to upload the local model to the edge base station can be expressed as follows:

那么，设备i上传本地模型所消耗的能量可表示如下：Then, the energy consumed by device i to upload the local model can be expressed as follows:

本发明把所有设备的平均传输能耗作为联邦学习系统的通信成本，显然，通信成本也很容易拓广到其他资源，如训练时延等。系统的通信成本可表示如下：The present invention takes the average transmission energy consumption of all devices as the communication cost of the federated learning system. Obviously, the communication cost can also be easily extended to other resources, such as training delay. The communication cost of the system can be expressed as follows:

从上述分析可得，边缘接入策略与带宽资源分配策略都会影响设备能量消耗。因此，在设计边缘接入策略时也应该考虑通信成本。From the above analysis, it can be seen that both the edge access strategy and the bandwidth resource allocation strategy will affect the energy consumption of the device. Therefore, communication costs should also be considered when designing edge access strategies.

为了实现节省通信成本的同时提高学习精度，本发明以系统收益去量化联邦学习整体性能。本发明将系统收益定义如下：In order to save the communication cost and improve the learning accuracy, the present invention uses the system benefit to quantify the overall performance of the federated learning. The present invention defines system benefits as follows:

其中μ为连续变量，且μ∈[0,1]，用于调整学习性能和传输能耗之间的权衡关系。G_max与E_max为系统可达的最高精度与最大能耗。正则化的目的是减轻两者不同数量级对策略的影响。where μ is a continuous variable, and μ∈[0,1], which is used to adjust the trade-off between learning performance and transmission energy consumption. G _max and E _max are the highest precision and energy consumption that the system can achieve. The purpose of regularization is to mitigate the influence of the two different orders of magnitude on the policy.

本发明的目标是要找到边缘接入策略与资源分配策略来最大化系统收益。该优化问题可表述如下：The goal of the present invention is to find edge access strategies and resource allocation strategies to maximize system benefits. The optimization problem can be formulated as follows:

max Pmax P

s.t.

st

在目标函数中，a_ij为二进制变量，β_ij为连续变量。这个优化问题可以表述为混合整数非线性规划问题(MINLP)。In the objective function, a _ij is a binary variable, and β _ij is a continuous variable. This optimization problem can be formulated as a mixed integer nonlinear programming problem (MINLP).

由于联邦学习的隐私性，设备数据的统计分布不可获得，故直接得到全局最优解是十分困难的。同时为了防止从设备上传的本地模型参数中得到原始数据信息，联邦学习常与非线性隐私加密方法联用。考虑到这一问题，同时也为了增加所提算法的普适性，本发明使用深度强化学习来自适应地根据边缘反馈信息，探索一种多基站场景下的边缘接入策略，并且能够在没有数据交换的情况下，以一种保护数据隐私的方式最大化系统利润。Due to the privacy of federated learning, the statistical distribution of device data cannot be obtained, so it is very difficult to directly obtain the global optimal solution. At the same time, in order to prevent the original data information from the local model parameters uploaded by the device, federated learning is often used in conjunction with nonlinear privacy encryption methods. Considering this problem, and also in order to increase the universality of the proposed algorithm, the present invention uses deep reinforcement learning to adaptively explore an edge access strategy in a multi-base station scenario based on edge feedback information, and can be used without data. In the case of exchange, maximize the profit of the system in a way that preserves data privacy.

深度强化学习可以通过离散化连续变量或者连续化离散变量等方式把不同类型的变量转化为同一类型进行统一求解。然而，随着求解变量的增多，深度强化学习很容易陷入局部最优解，从而得到不令人满意的结果。所以本发明把原问题解耦为两个子问题进行求解，分别为：边缘关联问题与给定边缘接入策略下的资源分配问题。对于边缘关联子问题，本发明在云端部署深度强化学习来自适应调整边缘基站与设备间接入策略。而资源分配子问题与边缘接入问题相关，故本发明在给定边缘接入策略的情况下，把资源分配策略解耦到每一个边缘基站上单独求解，这降低了算法的复杂性，同时也增加了算法的可拓展性。Deep reinforcement learning can transform different types of variables into the same type for unified solution by discretizing continuous variables or continuous discrete variables. However, as the number of solving variables increases, deep reinforcement learning can easily get trapped in local optimal solutions, resulting in unsatisfactory results. Therefore, the present invention decouples the original problem into two sub-problems to solve, which are: the edge association problem and the resource allocation problem under a given edge access strategy. For the edge correlation sub-problem, the present invention deploys deep reinforcement learning in the cloud to adaptively adjust the access strategy between the edge base station and the device. The resource allocation sub-problem is related to the edge access problem, so the present invention decouples the resource allocation strategy to each edge base station and solves it separately under the condition of a given edge access strategy, which reduces the complexity of the algorithm and at the same time It also increases the scalability of the algorithm.

本发明观察到，当边缘接入策略固定时，系统的学习性能便随之确定，那么优化问题便可简化为如何分配通信资源使得上传能耗最小的问题。而每一个基站的带宽资源由其独立决定，与其他边缘基站无关。所以，多边缘基站的资源分配问题又可以分解为M个子问题，分别在每个边缘基站上单独求解。对于每个边缘基站，需要求解如下问题：The present invention observes that when the edge access strategy is fixed, the learning performance of the system is determined accordingly, and the optimization problem can be simplified to the problem of how to allocate communication resources to minimize uploading energy consumption. The bandwidth resources of each base station are independently determined by it and have nothing to do with other edge base stations. Therefore, the resource allocation problem of multi-edge base stations can be decomposed into M sub-problems, which are solved separately on each edge base station. For each edge base station, the following problems need to be solved:

其中，

为接入边缘基站j的设备簇，N_j为设备组

中的设备数量。in,

is the device cluster accessing edge base station j, and N _j is the device group

number of devices in .

很明显，上述问题是凸问题。因为变量β_ij在可行域是凸的，且所有约束条件都是仿射的。Obviously, the above problem is convex. Because the variable β _ij is convex in the feasible region, and all constraints are affine.

本发明使用常用的Karush-Kuhn-Tucker(KKT)条件去获得带宽分配的解析解，本发明有如下定理。The present invention uses the commonly used Karush-Kuhn-Tucker (KKT) condition to obtain the analytical solution of bandwidth allocation, and the present invention has the following theorem.

定理1：对于边缘基站j及其训练设备簇

给定边缘接入策略时，资源分配子问题的最优带宽分配β_ij可以表示如下：Theorem 1: For edge base station j and its training device cluster

Given an edge access strategy, the optimal bandwidth allocation β _ij for the resource allocation sub-problem can be expressed as follows:

定理1的证明过程如下：The proof process of Theorem 1 is as follows:

上述凸问题可以用拉格朗日乘子法求解，子问题目标函数的拉格朗日方程可表示如下：The above convex problem can be solved by the Lagrangian multiplier method, and the Lagrangian equation of the objective function of the subproblem can be expressed as follows:

其中，λ为凸问题约束条件的拉格朗日乘子。为了求解拉格朗日方程，本发明计算其KKT条件：where λ is the Lagrange multiplier of the constraints of the convex problem. In order to solve the Lagrange equation, the present invention calculates its KKT condition:

通过求解上式，可得：By solving the above equation, we can get:

基于此，本发明可以得到带宽分配和拉格朗日乘子的表达式，那么本发明有：Based on this, the present invention can obtain the expression of bandwidth allocation and Lagrange multiplier, then the present invention has:

同时，根据KKT条件，本发明有：Meanwhile, according to KKT conditions, the present invention has:

因此可得：So we get:

通过上式，本发明便可以求解出带宽资源分配变量，可表示为：Through the above formula, the present invention can solve the bandwidth resource allocation variable, which can be expressed as:

通过定理1，本发明可以有效地求解通信资源分配问题，并且对于给定的边缘接入策略，本发明都有在此情况下最优的带宽资源分配策略，并构成了一一对应关系，降低了原问题的求解难度。Through Theorem 1, the present invention can effectively solve the problem of communication resource allocation, and for a given edge access strategy, the present invention has the optimal bandwidth resource allocation strategy in this case, and forms a one-to-one correspondence, reducing the difficulty of solving the original problem.

对于边缘接入问题，传统的方法需要获得全部信息再进行求解，然而由于联邦学习的隐私性，这是不可能的。深度强化学习是一种通过不断探索环境且不需要任何先验信息的算法。本发明设计了一种能够根据边缘基站的反馈信息，自适应地调整边缘接入策略的深度强化学习方法。边缘关联问题可以描述为马尔科夫过程

具体细节如下：For the edge access problem, the traditional method needs to obtain all the information before solving it, but this is impossible due to the privacy of federated learning. Deep reinforcement learning is an algorithm that continuously explores the environment without any prior information. The present invention designs a deep reinforcement learning method capable of adaptively adjusting the edge access strategy according to the feedback information of the edge base station. The edge correlation problem can be described as a Markov process

The specific details are as follows:

(1)状态

在第k轮，云端只能观察到来自边缘基站对上一轮边缘接入策略的反馈信息，故本发明把状态定义为S(k)＝{S₁(k),S₂(k),…,S_N(k)},每一项S_i(k)可定义为：(1) Status

In the kth round, the cloud can only observe the feedback information from the edge base station on the edge access strategy of the previous round, so the present invention defines the state as S(k)={S ₁ (k), S ₂ (k), ...,S _N (k)}, each term S _i (k) can be defined as:

S_i(k)＝{A_i(k-1),β_ij(k),Δ_i(k)}S _i (k)={A _i (k-1),β _ij (k),Δ _i (k)}

其中，Δ_i(k)表示相比于k-1轮，学习精度是否提升，即Δ_i(k)＝1代表精度有所提升，反之，Δ_i(k)＝0。Among them, Δ _i (k) indicates whether the learning accuracy is improved compared to the k-1 round, that is, Δ _i (k)=1 means that the accuracy is improved, otherwise, Δ _i (k)=0.

(2)动作

在第k轮，动作为每一个设备的边缘关联策略:(2) Action

In the kth round, the action is the edge association policy for each device:

A(k)＝{A₁(k),A₂(k),…,A_N(k)}A(k)={A ₁ (k),A ₂ (k),...,A _N (k)}

其中每一项A_i(k)可表示为：Each of them A _i (k) can be expressed as:

A_i(k)＝{a_ij(k)}A _i (k)={a _ij (k)}

(3)奖励

奖励是策略的指引方向，故本发明把奖励设置为目标函数：(3) Rewards

The reward is the guiding direction of the strategy, so the present invention sets the reward as the objective function:

由于边缘基站不知道所有可能的后续状态和最优动作，因此本发明使用无模型深度强化学习范式来更新边缘接入策略。同时，为了处理大状态空间和离散类型动作，本发明选择深度Q网络(DQN)作为基础框架，并且结合dueling DQN和double DQN来优化本发明的算法，使用D3QN来解决边缘接入问题。Since the edge base station does not know all possible subsequent states and optimal actions, the present invention uses a model-free deep reinforcement learning paradigm to update the edge access policy. At the same time, in order to deal with large state space and discrete type actions, the present invention selects Deep Q Network (DQN) as the basic framework, and combines dueling DQN and double DQN to optimize the algorithm of the present invention, and uses D3QN to solve the edge access problem.

DQN是基于值的算法，其Q值函数Q(S,A；θ)由参数为θ的神经网络来逼近，代表环境与动作之间的映射关系，神经网络的输出可以通过贝尔曼方程得到，本发明有：DQN is a value-based algorithm. Its Q-value function Q(S, A; θ) is approximated by a neural network with a parameter of θ, representing the mapping relationship between the environment and actions. The output of the neural network can be obtained through the Bellman equation, The present invention has:

其中，S'，A'，θ'分别为下一时隙的状态，动作和对应参数。Among them, S', A', θ' are the state, action and corresponding parameters of the next time slot, respectively.

在DQN中使用两个结构相同但参数不同的Q网络来提高算法的稳定性。一个是具有最新参数的当前Q网络，用于评估当前状态-动作的价值函数。另一个是带有过去轮次参数的目标Q网络，并在一段时间内保持Q值不变。本发明把当前Q网络的Q值作为神经网络的输入。显然，DQN的目标是最小化两个Q网络之间的差异，并将其定义为DQN的损失函数。本发明有:Two Q-networks with the same structure but different parameters are used in DQN to improve the stability of the algorithm. One is the current Q-network with up-to-date parameters to evaluate the current state-action value function. The other is a target Q network with parameters from past rounds and keeps the Q value constant for a period of time. The present invention takes the Q value of the current Q network as the input of the neural network. Obviously, the goal of DQN is to minimize the difference between two Q-networks and define it as the loss function of DQN. The present invention has:

L(θ)＝E[(y-Q(S,A；θ))²]L(θ)=E[(yQ(S,A;θ)) ² ]

为了满足马尔科夫过程中数据的非独立同分布特性，DQN采用经验回放策略来降低样本间的时间相关性，保证算法的稳定性。然而，DQN的目标值都是通过贪婪方法直接获得的，这导致了过度估计和较大的偏差。为了解决这个问题，本发明引入了DDQN算法，通过解耦目标动作的选择和对当前状态的评估来避免过度估计。与DQN中选择目标Q网络中最大Q值对应的动作不同，DDQN选择当前Q网络中最大Q值对应的动作，本发明有:In order to meet the non-IID characteristics of the data in the Markov process, DQN adopts an experience playback strategy to reduce the time correlation between samples and ensure the stability of the algorithm. However, the target values of DQN are all directly obtained by greedy methods, which lead to overestimation and large bias. To solve this problem, the present invention introduces the DDQN algorithm to avoid overestimation by decoupling the selection of target actions and the evaluation of the current state. Different from the action corresponding to the maximum Q value in selecting the target Q network in the DQN, the DDQN selects the action corresponding to the maximum Q value in the current Q network, and the present invention has:

将选择的动作带入目标Q网络计算Q值，那么本发明有:Bring the selected action into the target Q network to calculate the Q value, then the present invention has:

同时，为了更快地收敛，本发明使用dueling DQN来优化网络结构，并将网络分成两部分,分别为只与状态有关的值函数V(S,θ,α)和与状态和动作都相关的势函数A(S，A，θ，β)，其中θ为两个网络的公共参数，α为值函数独有参数，β为势函数独有参数。Q值可以看做这两个函数的和，本发明有：At the same time, in order to converge faster, the present invention uses dueling DQN to optimize the network structure, and divides the network into two parts, which are the value function V(S, θ, α) only related to the state and the value function V(S, θ, α) related to both state and action. Potential function A(S, A, θ, β), where θ is the common parameter of the two networks, α is the unique parameter of the value function, and β is the unique parameter of the potential function. The Q value can be regarded as the sum of these two functions, and the present invention has:

Q(S，A，θ,α，β)＝V(S，θ,α)+A(S，A，θ，β)Q(S, A, θ, α, β) = V(S, θ, α) + A(S, A, θ, β)

Dueling DQN可以更好地对策略进行评价，从而加速网络的收敛。Dueling DQN can better evaluate the policy, thereby speeding up the convergence of the network.

值得注意的是，云中心获得的边缘接入策略直接改变了边缘基站与设备的接入关系，进而指导边缘基站上的通信资源分配策略，从而影响系统学习性能和设备能耗。It is worth noting that the edge access strategy obtained by the cloud center directly changes the access relationship between the edge base station and the device, which in turn guides the communication resource allocation strategy on the edge base station, thereby affecting the system learning performance and device energy consumption.

考虑到系统可能会出现由于能耗的权衡，具有不同数据分布的设备接入相同的边缘基站，本发明利用了迁移学习的优势，设计了一种分层联邦迁移学习策略。本发明可以把神经网络分为基础特征层和个性特征层。基础特征层拥有大多数数据的共同特征，个性特征层捕捉不同数据独有的性质。本发明的分层联邦迁移学习具体描述如下：Considering that due to the trade-off of energy consumption in the system, devices with different data distributions are connected to the same edge base station, the present invention utilizes the advantages of transfer learning to design a hierarchical federated transfer learning strategy. The present invention can divide the neural network into basic feature layer and individual feature layer. The basic feature layer has common features of most data, and the personality feature layer captures the unique properties of different data. The hierarchical federated transfer learning of the present invention is specifically described as follows:

(1)识别迁移设备：本发明计算每个边缘基站经过一定轮次后的平均学习精度：(1) Identifying and migrating equipment: The present invention calculates the average learning accuracy of each edge base station after a certain round:

本发明把低于平均精度的设备视为需要进一步提升精度的设备，显然，边缘基站的设备簇中与大多数数据分布不同的设备精度一定低于平均精度。为了方便，本发明统称为迁移设备，其他设备称为非迁移设备。The present invention regards the equipment with lower than average precision as the equipment whose precision needs to be further improved. Obviously, the equipment with different data distribution in the equipment cluster of the edge base station must have lower precision than the average precision. For convenience, the present invention is collectively referred to as migration devices, and other devices are referred to as non-migration devices.

(2)分层联邦迁移学习：非迁移设备将自己的基础特征层模型

和个性特征层模型

上传到接入的边缘基站。迁移设备只上传基础特征层模型，个性特征层模型在设备本地更新，本发明有：(2) Hierarchical federated transfer learning: non-transferred devices transfer their own basic feature layer model

and personality feature layer model

Upload to the connected edge base station. The migration device only uploads the basic feature layer model, and the individual feature layer model is updated locally on the device. The present invention includes:

边缘基站聚合所有设备的基础特征层模型以保证模型的泛化性能，并且聚合非迁移设备的个性特征层模型以消除非独立同分布设备的影响。那么本发明有：The edge base station aggregates the basic feature layer models of all devices to ensure the generalization performance of the model, and aggregates the individual feature layer models of non-migrating devices to eliminate the influence of non-IID devices. Then the present invention has:

(3)边缘基站将聚合后的基础层模型下发到所有接入设备，而将个性层模型下发到非迁移设备。设备根据收到的模型再进行上述更新，如此迭代直到收敛。(3) The edge base station delivers the aggregated base layer model to all access devices, and delivers the personality layer model to non-migrated devices. The device performs the above update again according to the received model, and so on until convergence.

本发明提出的分层迁移学习策略并不会消耗额外的能量，原因是与传统联邦学习相同，设备在本地训练时更新每一层模型，不同的是，在上传时，非迁移设备需要上传每一层模型，而迁移设备只需要将个性特征层模型上传到接入的边缘基站，这反而减小了上传模型的大小。但是基础层模型占所有层的大多数，故本发明在计算能耗时忽略减小的这部分能耗。The layered migration learning strategy proposed by the present invention does not consume extra energy, because the same as the traditional federated learning, the device updates each layer model during local training, the difference is that when uploading, the non-migration device needs to upload each layer One layer model, and the migration device only needs to upload the personality feature layer model to the connected edge base station, which reduces the size of the uploaded model. However, the base layer model accounts for the majority of all layers, so the present invention ignores the reduced energy consumption when calculating the energy consumption.

综上，本发明提供一种高能效的聚类联邦边缘学习策略生成方法：In summary, the present invention provides an energy-efficient method for generating a clustering federated edge learning strategy:

首先，为达到联邦系统高效率学习的目的，本发明以学习性能作为系统收获，通信能耗作为系统成本，从而得到系统收益函数。为了研究在聚类联邦边缘学习网络中的系统收益优化问题，本发明联合考虑了通信条件和数据的异构特性，在保证学习性能的同时实现高能效，并把该问题量化为一个混合整数非线性规划(MINLP)问题。First, in order to achieve the purpose of high-efficiency learning of the federated system, the present invention takes the learning performance as the system gain, and the communication energy consumption as the system cost, so as to obtain the system gain function. In order to study the system revenue optimization problem in the cluster federated edge learning network, the present invention jointly considers the communication conditions and the heterogeneous characteristics of the data, achieves high energy efficiency while ensuring the learning performance, and quantifies the problem as a mixed integer non- Linear programming (MINLP) problem.

其次，为了有效地解决系统收益最大化问题，本发明观察到在确定边缘接入策略后，原问题可看作以高能效为目的的资源分配问题，故将其分解为两个子问题:边缘接入问题和给定边缘接入策略的资源分配问题，并据此设计了一个有效的迭代优化算法。对于边缘接入子问题，为了加强联邦学习数据的隐私性，同时能够更好地适用于模型非线性加密算法，本发明利用深度强化学习来探索边缘接入策略。在资源分配子问题中，为了降低算法的复杂度，采用凸优化算法来求解资源分配策略。Secondly, in order to effectively solve the problem of maximizing system revenue, the present invention observes that after determining the edge access strategy, the original problem can be regarded as a resource allocation problem aiming at high energy efficiency, so it is decomposed into two sub-problems: edge access Based on the access problem and the resource allocation problem for a given edge access strategy, an effective iterative optimization algorithm is designed accordingly. For the edge access sub-problem, in order to strengthen the privacy of the federated learning data and be better applicable to the model nonlinear encryption algorithm, the present invention utilizes deep reinforcement learning to explore the edge access strategy. In the resource allocation sub-problem, in order to reduce the complexity of the algorithm, a convex optimization algorithm is used to solve the resource allocation strategy.

最后，由于能耗的权衡，数据分布不同的设备可能会接入同一个基站共同训练，考虑到这种情况，本发明提出了一种分层联邦迁移学习策略，在不额外消耗能量的情况下，进一步提高学习精度。Finally, due to the trade-off of energy consumption, devices with different data distributions may access the same base station for common training. Considering this situation, the present invention proposes a hierarchical federated transfer learning strategy, which does not consume additional energy. , to further improve the learning accuracy.

以上实施例仅用以说明本发明的技术方案，而非对其限制；尽管参照前述实施例对本发明进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分技术特征进行等同替换，但这些修改或者替换，并不使相应技术方案的本质脱离本发明各实施例技术方案的精神和范围。The above embodiments are only used to illustrate the technical solutions of the present invention, but not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that: The recorded technical solutions are modified, or some technical features thereof are equivalently replaced, but these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. an energy-efficient clustering federated edge learning strategy generation method, is characterized in that, comprises the following steps:

S1. The cloud center initializes the edge access policy;

S2, the edge base station uses the convex optimization method to solve the bandwidth resource allocation strategy of its access device, and sends its initialization model to the access device;

S3. The device calculates the accuracy of the received global model on the local test data set, and uses the hierarchical federated migration method to train the local model according to the global model and local training data, calculates the energy spent on uploading the local model, and compares the test accuracy and energy consumption. The difference is used as the local revenue, and then the local model and local revenue are uploaded to the connected edge base station;

S4. The edge base station aggregates the local model hierarchically, and at the same time calculates the edge revenue by averaging the local revenue of all access devices, and uploads the edge revenue to the cloud center;

S5. The cloud center calculates the system revenue according to the feedback information received from the edge base station, and uses the deep reinforcement learning algorithm to adjust the edge access strategy;

S6. Repeat the above process until convergence.

2. energy-efficient clustering federated edge learning strategy generation method according to claim 1, is characterized in that, in step S1, the access strategy a _ij of equipment and edge base station is a binary variable, namely if equipment i and edge When base station j communicates, a _ij =1, otherwise, a _ij =0, and each device accesses an edge base station.

3. The energy-efficient clustering federated edge learning strategy generation method according to claim 1, wherein the convex optimization method in step S2 is specifically: for edge base station j and its access device cluster

where h _ij represents the channel gain between device i and edge server j, pi represents the model upload power of device _i , N ₀ represents the power spectral density of Gaussian noise, and β _ij B _j represents device i accessing edge base station j The allocated bandwidth resources, the devices accessing the edge base station j share the public spectrum with the bandwidth B _j for communication,

4. The energy-efficient clustering federated edge learning strategy generation method according to claim 1, wherein in step S3, the device utilizes local data according to the received global model θ _j

For training, the loss function formula of device i is:

The device uses gradient descent to update the local model ω _i with the following formula:

Among them, η is the learning step size, and η≥0;

In step S3, after a certain round of training, the local model is trained by the hierarchical federated transfer learning strategy, and the neural network is divided into a basic feature layer and a personality feature layer. The specific process of the hierarchical federated transfer learning strategy is as follows:

S301. Calculate the average learning accuracy of each edge base station after a certain round of times according to the following formula:

S302. Set the equipment basic feature layer model with higher than average precision

and personality feature layer model

in,

is the local personality feature layer model of device i,

For the migration device set.

S303, the edge base station aggregates the basic feature layer models of all devices and aggregates the personality feature layer models of the non-migration devices, the edge base station delivers the aggregated base layer model to all access devices, and delivers the personality layer model to the non-migration devices The device, the device performs the above update according to the received model, and so on until convergence.

5. The energy-efficient clustering federated edge learning strategy generation method according to claim 1, characterized in that, in step S3, the learning accuracy gij of the global model on the local test data set is used to measure the value of the edge base station _j . An indicator of global model performance, the learning performance gain G of the system is the average accuracy of all devices, as shown in the formula:

6. The energy-efficient clustering federated edge learning strategy generation method according to claim 1, characterized in that, in step S3, the energy _Eij that is consumed by the device i uploading the local model is as follows:

T _ij is the transmission delay for device i to upload the local model to the edge base station. The formula is as follows:

S represents the size of the local model, ri _ij is the transmission rate of the model uploaded by device i, and the formula is as follows:

h _ij represents the channel gain between device i and edge server j, pi represents the model upload power of device _i , N ₀ represents the power spectral density of Gaussian noise, β _ij B _j is the distribution of device i accessing edge base station j The obtained bandwidth resources, the devices accessing the edge base station j share the public spectrum with the bandwidth B _j for communication,

7. The energy-efficient clustering federated edge learning strategy generation method according to claim 1, wherein in step S4, before hierarchical aggregation, the edge base station aggregates all received local models, and the formula is as follows:

in,

After a certain round of training, the hierarchical federated transfer learning strategy is executed, and the edge base station aggregates the received local models hierarchically. The specific method is as follows: the edge base station aggregates the basic feature layer models of all devices to ensure the generalization performance of the model, and aggregates The personality feature layer model of non-migrated devices to eliminate the influence of non-IID data between devices, the formula is as follows:

in,

is a cluster of non-migrated devices accessing edge base station j.

8. energy-efficient clustering federated edge learning strategy generation method according to claim 1, is characterized in that, the formula of system revenue function in step S5 is as follows:

where μ is a continuous variable, and μ∈[0, 1], which is used to adjust the trade-off relationship between learning performance and transmission energy consumption, and G _max and E _max are the highest accuracy and maximum energy consumption that can be achieved by the system.

9. energy-efficient clustering federated edge learning strategy generation method according to claim 1, is characterized in that, in step S5, adopts deep reinforcement learning to adjust edge access strategy, and the concrete process of deep reinforcement learning is:

S501. Describe the edge correlation problem as a Markov process

The specific details are as follows:

(1) Status

In the kth round, the state is defined as S(k)={S ₁ (k), S ₂ (k), ..., S _N (k)}, and each term S _i (k) is defined as:

S _i (k)={A _i (k-1), β _ij (k), Δ _i (k)}

Among them, Δ _i (k) indicates whether the learning accuracy is improved compared to the k-1 round, that is, Δ _i (k)=1 means that the accuracy is improved, otherwise, Δ _i (k)=0;

(2) Action

In the kth round, the action is the edge association policy for each device:

A(k)={A ₁ (k), A ₂ (k), ..., A _N (k)}

Each of them A _i (k) can be expressed as:

A _i (k)={a _ij (k)}

(3) Rewards

Set the reward as the objective function:

S502. Select DQN as the basic framework, optimize the algorithm by combining dueling DQN and double DQN, and use D3QN to solve the edge access problem. The Q-value function Q(S, A; θ) is approximated by a neural network with a parameter of θ, representing The mapping relationship between the environment and the action, the output of the neural network is obtained through the Bellman equation:

Among them, S', A', θ' are the state, action and corresponding parameters of the next time slot respectively;

Two Q-networks with the same structure but different parameters are used in DQN to improve the stability of the algorithm, one is the current Q-network with the latest parameters to evaluate the value function of the current state-action, and the other is the one with the past rounds The goal of the parameter Q network, and keep the Q value unchanged for a period of time, taking the Q value of the current Q network as the input of the neural network, the goal of DQN is to minimize the difference between the two Q networks, and it is defined as The loss function of DQN:

L(θ)=E[(yQ(S, A; θ)) ² ]

S503, using the DDQN algorithm, select the action corresponding to the maximum Q value in the current Q network:

Then bring the selected action into the target Q network to calculate the Q value:

y=R(S,A)+γQ′(φ(S′), ^Amax (S′;θ);θ′)

S504. Use dueling DQN to optimize the network structure, and divide the network into two parts, namely the value function V(S, θ, α) related only to the state and the potential function A(S, A, θ, β), where θ is the common parameter of the two networks, α is the unique parameter of the value function, β is the unique parameter of the potential function, and the Q value is the sum of these two functions:

Q(S, A, θ, α, β)=V(S, θ, α)+A(S, A, θ, β).

10. An energy-efficient clustering federated edge learning strategy generation device, comprising a computer memory, a computer processor, and a computer program stored in the computer memory and executable on the computer processor, characterized in that the computer processor executes the When the computer program is described, the energy-efficient clustering federated edge learning strategy generation method described in any one of claims 1-9 is realized.