CN114828095A

CN114828095A - Efficient data perception layered federated learning method based on task unloading

Info

Publication number: CN114828095A
Application number: CN202210293352.5A
Authority: CN
Inventors: 马牧雷; 吴连涛; 杨旸
Original assignee: ShanghaiTech University
Current assignee: ShanghaiTech University
Priority date: 2022-03-24
Filing date: 2022-03-24
Publication date: 2022-07-29

Abstract

The technical scheme of the invention is to provide an efficient data perception layering federal learning method based on task unloading. The invention considers the data distribution in the cost function for the first time, and can improve the quality of the edge data set while reducing the system cost. In addition, the invention designs a TO and RA method based on a multi-intelligence depth deterministic strategy gradient model for reducing an action space. A large number of experiments prove that the algorithm provided by the invention can effectively improve the accuracy of the aggregation model, effectively reduce the unloading cost, improve the training precision of the light-weight data perception HFEL algorithm and reduce the system cost.

Description

Efficient data perception layered federated learning method based on task unloading

Technical Field

The invention relates to joint task unloading, resource allocation and participant selection problems under layered federal edge learning (hereinafter referred to as HFEL) so as to reduce system cost and improve training precision of the federal learning (hereinafter referred to as FL).

Background

In the era of data intelligence, billions of devices produce large amounts of data in marginal scenes. Uploading personal data to third party cloud server computing can cause a number of problems, including privacy disclosure. As an effective coping method, FL has become a promising machine learning paradigm. And the FL uploads the trained gradient or weight, and a plurality of weights are aggregated to finally obtain the global model. FL has been applied to multiple access edge computing (MEC for short hereinafter) scenarios for distributed model training to protect data privacy.

The traditional FL is dominated by a two-layer cloud federal learning (hereinafter referred to as C-FL) architecture, and comprises a parameter server and an edge working node (hereinafter referred to as worker) in the cloud. In classical FL algorithms, such as Federated Averaging (FedAvg), worker performs several rounds of local updates and uploads weights to the cloud for global aggregation. However, communication resources in a wide area network (hereinafter referred to as WAN) in a two-tier C-FL framework are limited and expensive. Network congestion can be exacerbated when a large number of devices communicate with the cloud through the backbone network.

To alleviate the above problems, HFEL frames have received attention. There are two layers of aggregation in this framework, Edge-aggregation in local area networks (hereinafter abbreviated as LANs) and Cloud-aggregation in wide area networks. In an edge scenario, a user device (hereinafter abbreviated UD) offloads data to an edge server (hereinafter abbreviated ES) for training. An edge parameter server (hereinafter EPS) serves as an intermediary between the ES and the cloud. Cloud-aggregation is performed in the Cloud to aggregate the weight of the EPS. The HEFL can be applied in many industrial or Internet scenarios for machine learning based services. For example: one cell has multiple ESs and UDs that upload tasks to the ESs for computation, which involves a task offload. Several such areas (e.g., branches and government agencies) are separated by tens of kilometers and FL is implemented to break data islands.

Disclosure of Invention

The purpose of the invention is: reducing system cost and improving FL training accuracy.

In order to achieve the above object, the technical solution of the present invention is to provide a high-efficiency data-aware hierarchical federated learning method based on task offloading, which is characterized in that the method includes the following steps of defining a FEL-MTMH scenario, in which a training model is located in an edge server, a plurality of user devices can offload data to the plurality of edge servers, and an edge calculation scenario can be abstracted to a Multi-task Multi-Helper scenario, wherein the user devices are only responsible for data collection and offloading, and the processes of model training and parameter aggregation are performed in the edge server and an edge parameter server;

in a FEL-MTMH scenario, there are U user equipments and S edge servers, the U-th user equipment unloads a task U to an edge server through an uplink channel, the S-th edge server is further defined as a server S, and then, for the task U unloaded to the server S: use of h _us Representing the channel gain between task u and server s; order to

Representing the bandwidth allocation of task u; use of

To represent the bandwidth allocation policy of server s, where U _s Representing a set of tasks to be offloaded to a server s, B _s Represents the transmission bandwidth of the server s; with binary offload policies

To express an offload policy, let m _us To represent

M if task u can be offloaded to server s _us 1, otherwise, m _us When the value is 0, then:

offload delay for server s

The device consists of a transmission part and a calculation part, and is shown as the following formula:

in the formula, R _u Uplink transmission rate R of task u _u And e represents the calculated density, d _u Data size, f, representing task u _s Representing the computing power of the server s where the task u is located;

off-load energy consumption of servers s

Comprises the following steps:

in the formula, alpha represents

And

q is an energy parameter depending on the architecture of the s-chip of the server;

will cost function J ₁ Defined to minimize the maximum cost among all edge servers, the following is defined:

in the formula (I), the compound is shown in the specification,

and

in order to be a weight parameter, the weight parameter,

and is

Information entropy is introduced into the system cost to represent the characteristics of data distribution, as shown in the following formula:

in the formula: d ^s Represents a data set collected by server s; c represents the number of categories; p _c (D ^s ) Data representing class c is in D ^s The ratio of (1);

maximization of J ₂ Entropy of (2), then the joint problem is defined as follows, denoted as the continuous discrete mixture MINLP problem P0:

s.t.m _us ∈0，1，

the hierarchical federated edge learning system is provided with a cloud server and K FEL-MTMH scenes, wherein each FEL-MTMH scene is provided with an edge parameter server and S edge servers; defining the data set, weight and loss function of the edge server as D _s 、w _s And F _s (ws); defining the data set, weight and loss function of the edge parameter server in the k-th FEL-MTMH scene as

And

respectively defining the data set, the weight and the loss function of the cloud server as D _global 、w _global And F _global (w _global )；

Setting the aggregation policy to X ═ X ₁ ，x ₂ ，...，x _s ，...，x _S ]，x _s 1 denotes the s-th edge server participating in edge aggregation, x _s 0 means that the s-th edge server does not participate in edge aggregation, then

And F _global (w _global ) Represented by the formula:

let each edge server be at κ ₁ Edge aggregation is performed after round of local training, with each edge parameter server at κ ₂ The secondary aggregation is followed by cloud aggregation, and the process is repeated until sufficient accuracy or communication thresholds are reached and the HFEL system model parameters are updated as:

wherein l represents the number of local training rounds, w _s (l) Represents the weight of the ES obtained in the first round of training, η represents the learning rate,

is represented by F _s (w _s (l-1)), wherein,

and D _global Is a virtual data set;

the continuous discrete hybrid MINLP problem P0 is decomposed into two subproblems, one each by fixing

Minimizing binary offload policies

Problem P1 resulting from the cost and problem P2 resulting from solving the bandwidth allocation policy, problem P1 is represented as:

in the formula (I), the compound is shown in the specification,

is a binary offload strategy

Is determined.

Problem P2 is expressed as:

in the formula (I), the compound is shown in the specification,

is a bandwidth allocation policy

Is determined.

For problem P1, a reduced action space multi-agent depth certainty policy gradient is utilized to interact with the environment to obtain a binary offload policy

For problem P2, a convex optimization method is used to determine the bandwidth allocation strategy

ES data set D _s Distribution P (D) _s ) Is defined as: p (D) _s )＝[P _c (D _s )|c∈C]，P _c (D _s ) Represents D _s A distribution belonging to class C, C being the number of classes;

is a virtual data set of edge parameter servers in the kth FEL-MTMH scenario, the distribution of which

Is shown as

To represent

A distribution belonging to class c; global virtual dataset D _global Is by polymerization

Obtaining;

introducing a KL divergence, defined as:

problem P3: the KL divergence minimization problem is shown by the following equation:

in the formula (I), the compound is shown in the specification,

represents an aggregated decision for server s;

order to

The KL divergence is then expressed as:

in the formula, P _c (D _global ) Represents D _global A distribution belonging to class c;

for problem P3, the KKT condition is used to get the optimal strategy

Preferably, the signal-to-interference-and-noise ratio SINRu of task u is represented as:

in the formula, p _u Represents the communication power allocated by user u;

σ ² which is indicative of the power of the background noise,

is the cumulative inter-cell interference for all tasks associated with the ESs other than the server s.

Uplink transmission rate R of task u _u Expressed as:

in particular, if the network in the reduced action space multi-agent depth certainty policy gradient outputs only offload decisions, then the reduced action space multi-agent depth certainty policy gradient is used to interact with the environment to obtain a binary offload policy

The method comprises the following steps:

a reward function and reduced motion space are introduced in the madpg model, which describes the evolution of HFEL systems using the following markov decision process:

(1) the state is as follows: state is s _t ＝[c _t ，d _t ，f _t ，h _t ]Wherein, in the step (A),

and

respectively representing the sample class and the data size,

indicating the computing resources available on the ES,

representing ambient channel fading;

(2) the actions are as follows: defining the offload policy generated by each agent as an operation

The representation represents the mapping relation between the user equipment and the edge server;

(3) rewarding: the implication of the reward is the amount of system weighted cost after the offload policy and bandwidth allocation policy are implemented according to the action. Thus, the reward is defined as the negative of the cost function, i.e. r-J ₁ *J ₂ Maximizing rewards means minimizing system costs;

given Actor and Critic network parameters are respectively theta ═ theta ₁ ，...，θ _n ]And ω ═ ω [ ω ] ₁ ，...，ω _n ]. The set of policies for each agent is: pi ═ pi ₁ ，...，π _n ]Let us assume that the deterministic policy set of N agents is μ ═ μ ₁ ，...，μ _N ]The deterministic policy gradient is then expressed as follows:

in the formula (I), the compound is shown in the specification,

as a strategic gradient formula, E _s，a []Indicates expectation of a _i Represents an action o _i Denotes observation, μ _i (a _i |o _i ) A deterministic policy is represented that is,

denotes a _i The gradient of (a) of (b) is,

a state-action function representing an ith agent concentration;

in the centralized training stage of the MADDPG model, Actor and Critic carry out centralized training;

in the distributed execution phase of the MADDPG model, the Actor only needs to know local observations, and the centralized criticc update is expressed as follows:

in the formula: l (theta) _i ) A loss function representing Critic; e _s,a,r，s ，[]Indicating a desire, s' indicates a next state, a indicates an action, r indicates a reward; y represents the sum of the reward and the discounted state-action function.

Problem P2 is rewritten as follows:

in the formula: m ₁ And M ₂ Is a constant associated with the task u,

M ₂ ＝-∑ _s∈S Entropy(D ^s ) (ii) a The SINR is the SINR _u ；

Deriving optimal policies using Lagrangian multipliers and KKT conditions

Wherein the content of the first and second substances,

ε _s representing the lagrange multiplier.

The invention considers the data distribution in the cost function for the first time, and can improve the quality of the edge data set while reducing the system cost. In addition, the invention designs a TO and RA method based on a multi-intelligence depth deterministic strategy gradient model (RAS-MADDPG for short) for reducing an action space. RAS-MADDPG is improved on the basis of a depth deterministic policy gradient (MADDPG) model, only local observation is adopted to give optimal action, and a dynamic model of the environment and special communication requirements are not required to be known. In order to overcome the influence of non-independent and identically distributed data, the invention provides a lightweight data perception HFEL algorithm, and KL divergence is adopted for participant selection (hereinafter referred to as PS) in the algorithm so as to improve the accuracy of a polymerization model. A large number of experiments prove that the algorithm provided by the invention can effectively improve the accuracy of the aggregation model, effectively reduce the unloading cost, improve the training precision of the lightweight data perception HFEL algorithm and reduce the system cost.

Drawings

FIG. 1 illustrates TO and RA based HEFL architecture in an MEC scenario;

FIG. 2 illustrates the decomposition of problem P0;

FIG. 3 illustrates the detailed algorithm of MADDPG;

FIG. 4 illustrates a Data-aware HFEL specific algorithm;

FIG. 5 illustrates a comparison of HFEL-MADDPG with a cloud-based FL (based on MNIST);

FIG. 6 illustrates a comparison of HFEL-MADDPG with a cloud-based FL (Cifar-based);

FIG. 7 illustrates different HEFL algorithm offload costs (κ) ₁ ＝10，κ ₂ ＝12)；

FIG. 8 illustrates different HEFL algorithm offload costs (κ) ₁ ＝30，κ ₂ ＝4)；

FIG. 9 illustrates the training performance (κ) of different HFEL algorithms ₁ ＝10，κ ₂ ＝12)；

FIG. 10 illustrates the training performance (κ) of different HFEL algorithms ₁ ＝30，κ ₂ ＝4)。

Detailed Description

The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.

1 System model

1.1 application scenarios

FL is an exploration of distributed machine learning that can be trained using scattered data. This concept adapts well to the characteristics of data fragmentation in MECs. Therefore, introduction of FL into MEC has great engineering utility value. Fig. 1 illustrates offloading (hereinafter referred TO as TO) and bandwidth allocation (hereinafter referred TO as RA) of HFELs in MECs. In the scenario shown in fig. 1, where the training model is located in an ES, multiple UDs may offload data to multiple ESs. The edge computation scenario may be abstracted as a Multi-TaskMulti-helper (MTMH) scenario. Since UDs are only responsible for the collection and offloading of data, the process of model training and parameter aggregation is performed in ES and EPS. The present invention defines the above scenario as a FEL-MTMH scenario.

1.2 TO and RA based on information entropy

Assume that in a FEL-MTMH scenario, there are U UDs and S ESs. The u-th UD offloads task u to the ES through the upstream channel. The communication multiple access scheme is based on OFDMA, and the UD communicates with the ES through orthogonal subbands in a single cell. Thus, the interference mainly comes from inter-cell communication. Defining the s-th ES further as server s, using h for task u offloaded to server s _us Representing the channel gain between task u and server s. Order to

Indicating the bandwidth allocation for task u. Use of

To represent the bandwidth allocation policy of server s, where U _s Representing a set of tasks to be offloaded to a server s, B _s Representing the communication bandwidth of the server s. With binary offload policies

To indicate an offloading policy. Let m _us To represent

M if task u can be offloaded to server s _us 1, otherwise, m _us 0. SINR of task u _u Expressed as:

in the formula, p _u Represents the allocated power of task u;

σ ² which is indicative of the power of the background noise,

Uplink transmission rate R of task u _u Expressed as:

the present invention ignores the delay of the downstream transmission because the upstream rate is much larger than the downstream rate and the size of the data returned by the task is usually very small. Offload delay for server s

wherein e represents the calculated density, d _u Data size (bits), f, representing task u _s Representing the computing power of the server s on which the task u is located.

Off-load energy consumption of servers s

Comprises the following steps:

in the formula, alpha represents

And

q is an energy parameter that depends on the architecture of the server s-chip. The system cost function may be expressed as a weighted sum of the offload delay and the offload energy consumption. In order to avoid the 'straggler' phenomenon and reduce the influence of unbalanced data, the cost function J is used in the invention ₁ Defined to minimize the maximum cost among all ESs, the following is defined:

in the formula (I), the compound is shown in the specification,

and

in order to be a weight parameter, the weight parameter,

and is

The weight parameter may be adjusted, e.g. increased, in dependence of the task property

To accommodate delay sensitive tasks. In order to reduce the influence of non-independent co-distributed data, the invention introduces information entropy in system cost to represent the characteristics of data distribution, as shown in the following formula:

in the formula: d ^s Represents a data set collected by server s; c represents the number of categories; p _c (D ^s ) Representing categoriesc data in D ^s The ratio of (1). By maximizing J ₂ The class of the data in the unloaded data set will be increased, and the FL model can be trained and simultaneously extract features from richer samples, so that the influence of Non-IID is reduced. Notably, J ₁ Is the MinMax problem, J ₂ Is the problem of maximizing entropy. Finally, the present invention defines the joint problem as the following, denoted as the continuous discrete mixture MINLP problem P0:

s.t.m _us ∈0，1，

1.3 HEFL model

The layered federated edge learning system is provided with a cloud server and K FEL-MTMH scenes. There is one EPS and S ESs per FEL-MTMH. Defining the data set, weight and loss function of ES as D _s 、w _s And F _s (w _s ) (ii) a Defining the data set, weight and loss function of EPS in the kth FEL-MTMH scene as

And

respectively defining the data set, the weight and the loss function of the cloud server as D _global 、w _global And F _global (w _global )。

Referred to as polymerization loss, a loss function F can be employed _s (w _s ) Is calculated as a weighted average. In FL in particular, the sampling of participant weights can affect the convergence and accuracy of the post-aggregation model. An appropriate sampling strategy may improve the representation of the data and reduce the variance. The present invention sets the aggregation strategy to be denoted X ═ X ₁ ，x ₂ ，...，x _s ，...，x _S ]，x _s 1 means that the s-th ES participates in Edge-Aggregation, x _s 0 means that the s-th ES does not participate in the polymerization. Then

And F _global (w _global ) Represented by the formula:

to reduce communication overhead, assume that each ES is at κ ₁ Edge-Aggregation is performed after round of local training, with each EPS at κ ₂ After the secondary polymerization, Cloud-Aggregation was performed. This process is repeated until sufficient accuracy or communication threshold is reached.

The HFEL system model parameters are updated as:

wherein l represents the number of local training rounds, w _s (l) Watch (A)Showing the weight of the ES obtained in the first round of training, eta shows the learning rate,

is represented by F _s (w _s (l-1)). To protect privacy, HFEL systems use parameter aggregation instead of data aggregation, so

And D _global Is a virtual data set.

2 Joint problem decoupling strategy

This is an NP-hard problem due to the continuous discrete blend MINLP characteristic of problem P0. The present invention proposes a two-step solution: reduction of action space multi-agent depth certainty policy gradients (hereinafter abbreviated RAS-madpg) and KL divergence based participant selection (hereinafter abbreviated PS-KL). The combined application of RAS-MADDPG and PS-KL processes unbalanced and Non-IID data, and can also improve the model training precision and reduce the system cost.

By fixing the variables, the continuous discrete blending MINLP problem P0 can decompose two sub-problems. As shown in FIG. 2, for problem P0, attention is first directed to fixing

Minimizing binary offload policies

The following problem P1 is obtained:

in the formula (I), the compound is shown in the specification,

is a binary offload strategy

Is determined.

Next, solving the bandwidth allocation policy results in a problem P2 as shown in the following formula:

in the formula (I), the compound is shown in the specification,

is a bandwidth allocation policy

Is determined.

For problem P1, RAS-MADDPG is used to interact with the environment to obtain a binary offload policy

For problem P2, the convex optimization method is applicable to the continuous RA problem. After determining the binary unloading strategy

And bandwidth allocation policy

Thereafter, the training data set for each ES is also determined. To improve the convergence speed and accuracy of FL, the present invention performs an optimized edge aggregation process at problem P3.

Problem solving strategy based on reinforcement learning

3.1 information entropy-based computational offload policy

The present invention describes the evolution of HFEL systems using the following Markov Decision Process (MDP):

and

respectively representing the sample class and the data size,

indicating the computing resources available on the ES,

indicating ambient channel fading.

And showing the mapping relation of UD and ES. Since the RA part is split, the motion space is greatly reduced here.

(3) Rewarding: in RAS-MADDPG, the implication of rewards is the amount of system weighted cost after the action executes the TO and RA policies. Thus, the reward is defined as the negative of the cost function, i.e. r-J ₁ *J ₂ Maximizing the reward means minimizing the system cost.

The TO framework based on RAS-MADDPG is shown in FIG. 1. The present invention introduces a novel reward function and reduced motion space in the maddppg model. The reward function may effectively incentivize the agent to find an optimal policy. The RA problem P2 is solved in the RA method block without putting in motion space. The network in RAS-MADDPG outputs only the offload decisions, so the action space and network complexity is greatly reduced.

The network parameters given to Actor and Critic are θ ═ θ, respectively ₁ ，...，θ _n ]And ω ═ ω [ ω ] ₁ ，...，ω _n ]. The set of policies for each agent is: pi ═ pi ₁ ，...，π _n ]. Suppose the deterministic policy set of N agents is μ ═ μ ₁ ，...，μ _N ]The deterministic policy gradient is then expressed as follows:

in the formula (I), the compound is shown in the specification,

denotes a _i The gradient of (a) of (b) is,

a state-action function representing the ith agent concentration.

Centralized training and distributed execution mechanism of madpg: during the training phase, Actor and Critic perform intensive training. During the execution phase, the Actor only needs to know the local observations. The centralized Critic update leverages the Time Difference (TD) in DQN and the idea of the target network, which is expressed as follows:

wherein L (θ) _i ) A loss function representing Critic; e _s,a,r，s′ []Indicating a desire, s' indicates a next state, a indicates an action, r indicates a reward; y represents the sum of the reward and the discounted state-action function,

wherein, gamma represents the discount rate,

denotes a target network,. mu.' _j (o _j ) ' parameter θ ' representing target strategy with hysteresis update ' _j 。

The detailed algorithmic representation of maddppg is shown in fig. 3.

3.2 resource Allocation Module

The problem P0 is difficult to solve due to continuous discrete mixing and non-convexity. Referring to the Tammer decoupling method, problem P2 can be rewritten as follows:

in the formula: m ₁ And M ₂ Is a constant associated with the task u,

M ₂ ＝-∑ _s∈S Entropy(D ^s ) (ii) a The SINR is the SINR _u . To the right of the equation is a Min-Max problem. And obtaining the second derivative from the supremum to know that the derivative is a convex function. Known from the theorem of sum of convex functions

For convex functions, the optimal strategy can be derived using the Lagrangian multiplier and the KKT condition

Wherein the content of the first and second substances,

ε _s representing lagrange multipliers.

3.3 data-aware HFEL

PS is applied for edge aggregation and reduces the impact of non-IID data. Due to FL environmentThe distributed nature of random selection of clients for aggregation can exacerbate the adverse effects of data heterogeneity. To facilitate the study of edge aggregation, the present invention converts the difference in aggregation weights into a difference in the distribution of the data set. ES data set D _s Distribution P (D) _s ) Is defined as: p (D) _s )＝[P _c (D _s )|c∈C]，P _c (D _s ) Data of the indicated category c is in D _s C is the number of classes.

Is a virtual data set of EPS in the kth FEL-MTMH scenario, the distribution of which

Is shown as

Represents class c data in

The ratio of (1). Global virtual dataset D _global Is by polymerization

And (4) obtaining the product. The invention concerns the similarity of data distribution after edge aggregation and cloud aggregation, introduces KL divergence and defines the following data distribution as follows:

based on the above, the present invention proposes problem P3: the KL divergence minimization problem is shown by the following equation:

in the formula (I), the compound is shown in the specification,

representing the aggregate decision of the server s.

Order to

The KL divergence is then expressed as:

in the formula, P _c (D _global ) Data in D representing category c _global The ratio of (1).

The second derivative on the right side of the equal sign is known to be a convex function, which means that the KKT condition can be used to obtain the optimal strategy

In particular, an optimal strategy

Is in the domain

Obtained as described above. According to

To a binary value. The specific Data-aware HFEL algorithm is shown in fig. 4.

4 Experimental Environment setup

The present invention employs two exemplary learning-based visual recognition tasks to construct a HFEL simulation system. These tasks are based on two data sets: MNIST digital data set and CIFAR-10 real object data set. To study the effect of non-IID data, the present invention also specifies the way data is distributed in the edge. To simulate unbalanced data, the number of samples on each UD is in accordance with a Gaussian distribution X N (100, 10). The invention shapes two data distributions: (1) IID, each data set is shuffled and then randomly partitioned into UDs. (2) Non-IID, data sets are classified into 10 classes according to label. Each UD is then assigned a sample randomly selected from two categories. Furthermore, training was performed on two models: (1) a multi-layer perceptron with two hidden layers and activated by a sigmoid function. (2) CNN of a 5 x 5 convolution kernel. The model is provided in the TensorFlow course and consists of two convolutional layers and two fully-connected layers.

For each FEL-MTMH scenario, consider a multi-element system where ES is located at the center of the hexagonal element. Assuming 10 cells per FEL-MTMH scenario, the ES computation power is from [6, 8, 10, 12 ]]And (4) randomly selecting in GHz. The background noise power σ and bandwidth B are set to-100 dBm and 5 MHz. The channel gain h follows the free space path loss model. On the UD side, the maximum transmission power is from [80, 100, 120 ]]Randomly selecting from mW. Default energy consumption and delay parameters

The invention realizes the comparison of three representative standards:

(1) HFEL-based RAS-MADDPG algorithm: the invention provides a method that applies RAS-MADDPG and data-aware HFEL. The TO, RA and PS problems in the MEC scene are jointly solved.

(2) Cloud-based FL algorithm (C-FL): the C-FL trains all UD data on distributed computing nodes, and parameters are aggregated through Cloud-based FL.

(3) An HFEL-based DRL (Actor-critical) offload algorithm (hereinafter abbreviated as HFEL-DRL): HFEL-DRL applies the latest DRL algorithm to obtain offloading decisions in HFEL. The algorithm adjusts the strategy through a heuristic method to achieve the performance close to the optimal performance.

(4) HFEL-based independent offload and Joint RA (hereinafter HFEL-IOJR): HFEL-IOJR randomly assigns each task to ES and employs joint RA.

5 influence of Key parameters

First, two key parameters (i.e., κ) of the HFEL-MADDPG algorithm were quantified ₁ And kappa ₂ ) The influence of (c). The training time and energy consumption for transmission and calculation are calculated in table 1.

TABLE 1 Critical parameter Effect

Table 1 shows that HFEL has less training time and energy consumption compared to C-FL. In addition, with κ ₁ The training time of the two data sets decreases monotonically. The energy consumption of the system is firstly reduced and then increased, which shows that ₁ The reduction of (b) may reduce the consumption of edge computations. However, too frequent edge aggregation may result in increased communication consumption. Reasonable setting kappa ₁ And kappa ₂ The system efficiency will be improved.

6 comparison with cloud-based FL

The performance of C-FL and HFEL-MADDPG were compared by convergence analysis. The results are shown in FIGS. 5 and 6. When the data is IID, the accuracy and convergence rate of HFEL-MADDPG are better than those of C-FL. However, when the data is Non-IID, the convergence rate of HFEL-MADDPG is significantly reduced. Thus, two conclusions can be drawn: (1) compared with C-FL, the data-aware HFEL-MADDPG has better performance in terms of training speed and accuracy. (2) non-IID data can greatly reduce the accuracy and convergence speed of the model.

7 comparison with HFEL baseline Algorithm

Next, the performance of the HFEL-MADDPG in the HFEL scenario is compared. In FIGS. 7 and 8, it can be seen that HFEL-MADDPG and HFEL-DRL can effectively reduce the cost of defluxing compared with HFEL-IOJR. In particular, the cost of HFEL-maddppg is minimal, reflecting the effectiveness of the offloading strategy and PS mechanism of the present invention.

The training process of HFEL-madpg was compared to other baseline algorithms with guaranteed communication performance. FIGS. 9 and 10 show the comparison of the training effects of the three HFEL algorithms in the Non-IID scenario. We can see that HFEL-maddppg has the best accuracy and convergence speed. In addition, compared with baselines, the HFEL-MADDPG greatly reduces communication cost, improves training efficiency and embodies the effectiveness of the algorithm.

Claims

1. A high-efficiency data perception layering federated learning method based on task unloading is characterized by comprising the following steps

Defining a FEL-MTMH scene, wherein a training model is positioned in an edge server in the FEL-MTMH scene, a plurality of user equipment can unload data to the plurality of edge servers, the edge calculation scene can be abstracted to a Multi-task Multi-Helper scene, wherein the user equipment is only responsible for collecting and unloading the data, and the process of model training and parameter aggregation is carried out in the edge server and the edge parameter server;

Representing the bandwidth allocation of task u; use of

To represent the bandwidth allocation policy of server s, where U _s Representing a set of tasks to be offloaded to a server s, B _s Represents the communication bandwidth of the server s; with binary offload policies

To express an offload policy, let m _us To represent

offload delay for server s

off-load energy consumption of servers s

Comprises the following steps:

in the formula, alpha represents

And

in the formula (I), the compound is shown in the specification,

and

in order to be a weight parameter, the weight parameter,

and is provided with

(P0)：

s.t.m _us ∈0，1，

the hierarchical federated edge learning system is provided with a cloud server and K FEL-MTMH scenes, wherein each FEL-MTMH scene is provided with an edge parameter server and S edge servers; defining the data set, weight and loss function of the edge server as D _s 、w _s And F _s (w _s ) (ii) a Defining the data set, weight and loss function of the edge parameter server in the k-th FEL-MTMH scene as

And

And F _global (w _global ) Represented by the formula:

let each edge server be at κ ₁ Edge aggregation is performed after round of local training, with each edge parameter server at κ ₂ Performing cloud aggregation after the secondary aggregation, and repeating the processUntil sufficient accuracy or communication threshold is reached, the HFEL system model parameters are updated as:

is represented by F _s (w _s (l-1)), wherein,

and D _global Is a virtual data set;

Minimizing binary offload policies

(P1)：

in the formula (I), the compound is shown in the specification,

is a binary offload policy

Is determined.

Problem P2 is expressed as:

(P2)：

in the formula (I), the compound is shown in the specification,

is a bandwidth allocation policy

The optimal function of (2).

Is shown as

To represent

Obtaining;

introducing a KL divergence, defined as:

in the formula (I), the compound is shown in the specification,

represents an aggregated decision for server s;

order to

The KL divergence is then expressed as:

for problem P3, the optimal strategy is derived using the KKT condition

2. The efficient data-aware hierarchical federated learning method based on task offloading of claim 1, wherein the task u's signal to interference and noise ratio SINR _u Expressed as:

in the formula, p _u Indicating the communication power allocated by the task u;

σ ² which is indicative of the power of the background noise,

3. The efficient data-aware hierarchical federated learning method based on task offloading as recited in claim 2, wherein the uplink transmission rate R of task u _u Expressed as:

4. the efficient data-aware hierarchical federated learning method based on task offloading as recited in claim 1, wherein if the network in the reduced action-space multi-agent depth certainty policy gradient outputs only offloading decisions, the reduced action-space multi-agent depth certainty policy gradient is utilized to interact with the environment to obtain a binary offloading policy

The method comprises the following steps:

and

respectively representing the sample class and the data size,

indicating the computing resources available on the ES,

representing ambient channel fading;

the parameters of the Actor and Critic networks are given as theta ═ theta respectively ₁ ，...，θ _n ]And ω ═ ω [ ω ] ₁ ，...，ω _n ]. The set of policies for each agent is: pi ═ pi ₁ ，...，π _n ]Let us assume that the deterministic policy set of N agents is μ ═ μ ₁ ，...，μ _N ]The deterministic policy gradient is then expressed as follows:

in the formula of the Chinese medicinal composition,

denotes a _i The gradient of (a) of (b) is,

a state-action function representing an ith agent concentration;

wherein, in the formula, L (theta) _i ) A loss function representing Critic; e _{s，a，r，s′} []Indicating a desire, s' indicates a next state, a indicates an action, r indicates a reward; y represents the sum of the reward and the discounted state-action function.

5. The efficient data-aware hierarchical federated learning method based on task offloading of claim 1, wherein problem P2 is rewritten as follows:

in the formula: m ₁ And M ₂ Is a constant associated with the task u,

M ₂ ＝-∑ _s∈S Entropy(D ^s ) (ii) a The SINR is the SINR _u ；

Deriving optimal policies using Lagrangian multipliers and KKT conditions

Wherein the content of the first and second substances,

ε _s representing the lagrange multiplier.