CN114828095A - Efficient data perception layered federated learning method based on task unloading - Google Patents

Efficient data perception layered federated learning method based on task unloading Download PDF

Info

Publication number
CN114828095A
CN114828095A CN202210293352.5A CN202210293352A CN114828095A CN 114828095 A CN114828095 A CN 114828095A CN 202210293352 A CN202210293352 A CN 202210293352A CN 114828095 A CN114828095 A CN 114828095A
Authority
CN
China
Prior art keywords
server
edge
task
formula
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210293352.5A
Other languages
Chinese (zh)
Inventor
马牧雷
吴连涛
杨旸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ShanghaiTech University
Original Assignee
ShanghaiTech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ShanghaiTech University filed Critical ShanghaiTech University
Priority to CN202210293352.5A priority Critical patent/CN114828095A/en
Publication of CN114828095A publication Critical patent/CN114828095A/en
Withdrawn legal-status Critical Current

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W28/00Network traffic management; Network resource management
    • H04W28/02Traffic management, e.g. flow control or congestion control
    • H04W28/08Load balancing or load distribution
    • H04W28/09Management thereof
    • H04W28/0958Management thereof based on metrics or performance parameters
    • H04W28/0967Quality of Service [QoS] parameters
    • H04W28/0983Quality of Service [QoS] parameters for optimizing bandwidth or throughput
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W28/00Network traffic management; Network resource management
    • H04W28/02Traffic management, e.g. flow control or congestion control
    • H04W28/08Load balancing or load distribution
    • H04W28/09Management thereof
    • H04W28/0925Management thereof using policies

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The technical scheme of the invention is to provide an efficient data perception layering federal learning method based on task unloading. The invention considers the data distribution in the cost function for the first time, and can improve the quality of the edge data set while reducing the system cost. In addition, the invention designs a TO and RA method based on a multi-intelligence depth deterministic strategy gradient model for reducing an action space. A large number of experiments prove that the algorithm provided by the invention can effectively improve the accuracy of the aggregation model, effectively reduce the unloading cost, improve the training precision of the light-weight data perception HFEL algorithm and reduce the system cost.

Description

Efficient data perception layered federated learning method based on task unloading
Technical Field
The invention relates to joint task unloading, resource allocation and participant selection problems under layered federal edge learning (hereinafter referred to as HFEL) so as to reduce system cost and improve training precision of the federal learning (hereinafter referred to as FL).
Background
In the era of data intelligence, billions of devices produce large amounts of data in marginal scenes. Uploading personal data to third party cloud server computing can cause a number of problems, including privacy disclosure. As an effective coping method, FL has become a promising machine learning paradigm. And the FL uploads the trained gradient or weight, and a plurality of weights are aggregated to finally obtain the global model. FL has been applied to multiple access edge computing (MEC for short hereinafter) scenarios for distributed model training to protect data privacy.
The traditional FL is dominated by a two-layer cloud federal learning (hereinafter referred to as C-FL) architecture, and comprises a parameter server and an edge working node (hereinafter referred to as worker) in the cloud. In classical FL algorithms, such as Federated Averaging (FedAvg), worker performs several rounds of local updates and uploads weights to the cloud for global aggregation. However, communication resources in a wide area network (hereinafter referred to as WAN) in a two-tier C-FL framework are limited and expensive. Network congestion can be exacerbated when a large number of devices communicate with the cloud through the backbone network.
To alleviate the above problems, HFEL frames have received attention. There are two layers of aggregation in this framework, Edge-aggregation in local area networks (hereinafter abbreviated as LANs) and Cloud-aggregation in wide area networks. In an edge scenario, a user device (hereinafter abbreviated UD) offloads data to an edge server (hereinafter abbreviated ES) for training. An edge parameter server (hereinafter EPS) serves as an intermediary between the ES and the cloud. Cloud-aggregation is performed in the Cloud to aggregate the weight of the EPS. The HEFL can be applied in many industrial or Internet scenarios for machine learning based services. For example: one cell has multiple ESs and UDs that upload tasks to the ESs for computation, which involves a task offload. Several such areas (e.g., branches and government agencies) are separated by tens of kilometers and FL is implemented to break data islands.
Disclosure of Invention
The purpose of the invention is: reducing system cost and improving FL training accuracy.
In order to achieve the above object, the technical solution of the present invention is to provide a high-efficiency data-aware hierarchical federated learning method based on task offloading, which is characterized in that the method includes the following steps of defining a FEL-MTMH scenario, in which a training model is located in an edge server, a plurality of user devices can offload data to the plurality of edge servers, and an edge calculation scenario can be abstracted to a Multi-task Multi-Helper scenario, wherein the user devices are only responsible for data collection and offloading, and the processes of model training and parameter aggregation are performed in the edge server and an edge parameter server;
in a FEL-MTMH scenario, there are U user equipments and S edge servers, the U-th user equipment unloads a task U to an edge server through an uplink channel, the S-th edge server is further defined as a server S, and then, for the task U unloaded to the server S: use of h us Representing the channel gain between task u and server s; order to
Figure BDA0003562336360000021
Representing the bandwidth allocation of task u; use of
Figure BDA0003562336360000022
To represent the bandwidth allocation policy of server s, where U s Representing a set of tasks to be offloaded to a server s, B s Represents the transmission bandwidth of the server s; with binary offload policies
Figure BDA0003562336360000023
To express an offload policy, let m us To represent
Figure BDA0003562336360000024
M if task u can be offloaded to server s us 1, otherwise, m us When the value is 0, then:
offload delay for server s
Figure BDA0003562336360000025
The device consists of a transmission part and a calculation part, and is shown as the following formula:
Figure BDA0003562336360000026
in the formula, R u Uplink transmission rate R of task u u And e represents the calculated density, d u Data size, f, representing task u s Representing the computing power of the server s where the task u is located;
off-load energy consumption of servers s
Figure BDA0003562336360000027
Comprises the following steps:
Figure BDA0003562336360000028
in the formula, alpha represents
Figure BDA0003562336360000029
And
Figure BDA00035623363600000210
q is an energy parameter depending on the architecture of the s-chip of the server;
will cost function J 1 Defined to minimize the maximum cost among all edge servers, the following is defined:
Figure BDA00035623363600000211
in the formula (I), the compound is shown in the specification,
Figure BDA00035623363600000212
and
Figure BDA00035623363600000213
in order to be a weight parameter, the weight parameter,
Figure BDA00035623363600000214
and is
Figure BDA00035623363600000215
Information entropy is introduced into the system cost to represent the characteristics of data distribution, as shown in the following formula:
Figure BDA00035623363600000216
Figure BDA00035623363600000217
in the formula: d s Represents a data set collected by server s; c represents the number of categories; p c (D s ) Data representing class c is in D s The ratio of (1);
maximization of J 2 Entropy of (2), then the joint problem is defined as follows, denoted as the continuous discrete mixture MINLP problem P0:
Figure BDA0003562336360000031
s.t.m us ∈0,1,
Figure BDA0003562336360000032
Figure BDA0003562336360000033
Figure BDA0003562336360000034
the hierarchical federated edge learning system is provided with a cloud server and K FEL-MTMH scenes, wherein each FEL-MTMH scene is provided with an edge parameter server and S edge servers; defining the data set, weight and loss function of the edge server as D s 、w s And F s (ws); defining the data set, weight and loss function of the edge parameter server in the k-th FEL-MTMH scene as
Figure BDA0003562336360000035
And
Figure BDA0003562336360000036
respectively defining the data set, the weight and the loss function of the cloud server as D global 、w global And F global (w global );
Setting the aggregation policy to X ═ X 1 ,x 2 ,...,x s ,...,x S ],x s 1 denotes the s-th edge server participating in edge aggregation, x s 0 means that the s-th edge server does not participate in edge aggregation, then
Figure BDA0003562336360000037
And F global (w global ) Represented by the formula:
Figure BDA0003562336360000038
Figure BDA0003562336360000039
let each edge server be at κ 1 Edge aggregation is performed after round of local training, with each edge parameter server at κ 2 The secondary aggregation is followed by cloud aggregation, and the process is repeated until sufficient accuracy or communication thresholds are reached and the HFEL system model parameters are updated as:
Figure BDA00035623363600000310
Figure BDA0003562336360000041
wherein l represents the number of local training rounds, w s (l) Represents the weight of the ES obtained in the first round of training, η represents the learning rate,
Figure BDA0003562336360000042
is represented by F s (w s (l-1)), wherein,
Figure BDA0003562336360000043
and D global Is a virtual data set;
the continuous discrete hybrid MINLP problem P0 is decomposed into two subproblems, one each by fixing
Figure BDA00035623363600000423
Minimizing binary offload policies
Figure BDA00035623363600000424
Problem P1 resulting from the cost and problem P2 resulting from solving the bandwidth allocation policy, problem P1 is represented as:
Figure BDA0003562336360000044
in the formula (I), the compound is shown in the specification,
Figure BDA0003562336360000045
is a binary offload strategy
Figure BDA0003562336360000046
Is determined.
Problem P2 is expressed as:
Figure BDA0003562336360000047
in the formula (I), the compound is shown in the specification,
Figure BDA0003562336360000048
is a bandwidth allocation policy
Figure BDA00035623363600000425
Is determined.
For problem P1, a reduced action space multi-agent depth certainty policy gradient is utilized to interact with the environment to obtain a binary offload policy
Figure BDA0003562336360000049
For problem P2, a convex optimization method is used to determine the bandwidth allocation strategy
Figure BDA00035623363600000410
ES data set D s Distribution P (D) s ) Is defined as: p (D) s )=[P c (D s )|c∈C],P c (D s ) Represents D s A distribution belonging to class C, C being the number of classes;
Figure BDA00035623363600000411
is a virtual data set of edge parameter servers in the kth FEL-MTMH scenario, the distribution of which
Figure BDA00035623363600000412
Is shown as
Figure BDA00035623363600000413
Figure BDA00035623363600000414
To represent
Figure BDA00035623363600000415
A distribution belonging to class c; global virtual dataset D global Is by polymerization
Figure BDA00035623363600000416
Obtaining;
introducing a KL divergence, defined as:
Figure BDA00035623363600000417
problem P3: the KL divergence minimization problem is shown by the following equation:
Figure BDA00035623363600000418
Figure BDA00035623363600000419
in the formula (I), the compound is shown in the specification,
Figure BDA00035623363600000420
represents an aggregated decision for server s;
order to
Figure BDA00035623363600000421
The KL divergence is then expressed as:
Figure BDA00035623363600000422
in the formula, P c (D global ) Represents D global A distribution belonging to class c;
for problem P3, the KKT condition is used to get the optimal strategy
Figure BDA0003562336360000051
Preferably, the signal-to-interference-and-noise ratio SINRu of task u is represented as:
Figure BDA0003562336360000052
in the formula, p u Represents the communication power allocated by user u;
Figure BDA0003562336360000053
σ 2 which is indicative of the power of the background noise,
Figure BDA0003562336360000054
is the cumulative inter-cell interference for all tasks associated with the ESs other than the server s.
Uplink transmission rate R of task u u Expressed as:
Figure BDA0003562336360000055
in particular, if the network in the reduced action space multi-agent depth certainty policy gradient outputs only offload decisions, then the reduced action space multi-agent depth certainty policy gradient is used to interact with the environment to obtain a binary offload policy
Figure BDA00035623363600000512
The method comprises the following steps:
a reward function and reduced motion space are introduced in the madpg model, which describes the evolution of HFEL systems using the following markov decision process:
(1) the state is as follows: state is s t =[c t ,d t ,f t ,h t ]Wherein, in the step (A),
Figure BDA0003562336360000056
and
Figure BDA0003562336360000057
respectively representing the sample class and the data size,
Figure BDA0003562336360000058
indicating the computing resources available on the ES,
Figure BDA0003562336360000059
representing ambient channel fading;
(2) the actions are as follows: defining the offload policy generated by each agent as an operation
Figure BDA00035623363600000510
The representation represents the mapping relation between the user equipment and the edge server;
(3) rewarding: the implication of the reward is the amount of system weighted cost after the offload policy and bandwidth allocation policy are implemented according to the action. Thus, the reward is defined as the negative of the cost function, i.e. r-J 1 *J 2 Maximizing rewards means minimizing system costs;
given Actor and Critic network parameters are respectively theta ═ theta 1 ,...,θ n ]And ω ═ ω [ ω ] 1 ,...,ω n ]. The set of policies for each agent is: pi ═ pi 1 ,...,π n ]Let us assume that the deterministic policy set of N agents is μ ═ μ 1 ,...,μ N ]The deterministic policy gradient is then expressed as follows:
Figure BDA00035623363600000511
in the formula (I), the compound is shown in the specification,
Figure BDA0003562336360000061
as a strategic gradient formula, E s,a []Indicates expectation of a i Represents an action o i Denotes observation, μ i (a i |o i ) A deterministic policy is represented that is,
Figure BDA0003562336360000062
denotes a i The gradient of (a) of (b) is,
Figure BDA0003562336360000063
a state-action function representing an ith agent concentration;
in the centralized training stage of the MADDPG model, Actor and Critic carry out centralized training;
in the distributed execution phase of the MADDPG model, the Actor only needs to know local observations, and the centralized criticc update is expressed as follows:
Figure BDA0003562336360000064
in the formula: l (theta) i ) A loss function representing Critic; e s,a,r,s ,[]Indicating a desire, s' indicates a next state, a indicates an action, r indicates a reward; y represents the sum of the reward and the discounted state-action function.
Problem P2 is rewritten as follows:
Figure BDA0003562336360000065
Figure BDA0003562336360000066
Figure BDA0003562336360000067
in the formula: m 1 And M 2 Is a constant associated with the task u,
Figure BDA0003562336360000068
M 2 =-∑ s∈S Entropy(D s ) (ii) a The SINR is the SINR u
Deriving optimal policies using Lagrangian multipliers and KKT conditions
Figure BDA0003562336360000069
Wherein the content of the first and second substances,
Figure BDA00035623363600000610
ε s representing the lagrange multiplier.
The invention considers the data distribution in the cost function for the first time, and can improve the quality of the edge data set while reducing the system cost. In addition, the invention designs a TO and RA method based on a multi-intelligence depth deterministic strategy gradient model (RAS-MADDPG for short) for reducing an action space. RAS-MADDPG is improved on the basis of a depth deterministic policy gradient (MADDPG) model, only local observation is adopted to give optimal action, and a dynamic model of the environment and special communication requirements are not required to be known. In order to overcome the influence of non-independent and identically distributed data, the invention provides a lightweight data perception HFEL algorithm, and KL divergence is adopted for participant selection (hereinafter referred to as PS) in the algorithm so as to improve the accuracy of a polymerization model. A large number of experiments prove that the algorithm provided by the invention can effectively improve the accuracy of the aggregation model, effectively reduce the unloading cost, improve the training precision of the lightweight data perception HFEL algorithm and reduce the system cost.
Drawings
FIG. 1 illustrates TO and RA based HEFL architecture in an MEC scenario;
FIG. 2 illustrates the decomposition of problem P0;
FIG. 3 illustrates the detailed algorithm of MADDPG;
FIG. 4 illustrates a Data-aware HFEL specific algorithm;
FIG. 5 illustrates a comparison of HFEL-MADDPG with a cloud-based FL (based on MNIST);
FIG. 6 illustrates a comparison of HFEL-MADDPG with a cloud-based FL (Cifar-based);
FIG. 7 illustrates different HEFL algorithm offload costs (κ) 1 =10,κ 2 =12);
FIG. 8 illustrates different HEFL algorithm offload costs (κ) 1 =30,κ 2 =4);
FIG. 9 illustrates the training performance (κ) of different HFEL algorithms 1 =10,κ 2 =12);
FIG. 10 illustrates the training performance (κ) of different HFEL algorithms 1 =30,κ 2 =4)。
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
1 System model
1.1 application scenarios
FL is an exploration of distributed machine learning that can be trained using scattered data. This concept adapts well to the characteristics of data fragmentation in MECs. Therefore, introduction of FL into MEC has great engineering utility value. Fig. 1 illustrates offloading (hereinafter referred TO as TO) and bandwidth allocation (hereinafter referred TO as RA) of HFELs in MECs. In the scenario shown in fig. 1, where the training model is located in an ES, multiple UDs may offload data to multiple ESs. The edge computation scenario may be abstracted as a Multi-TaskMulti-helper (MTMH) scenario. Since UDs are only responsible for the collection and offloading of data, the process of model training and parameter aggregation is performed in ES and EPS. The present invention defines the above scenario as a FEL-MTMH scenario.
1.2 TO and RA based on information entropy
Assume that in a FEL-MTMH scenario, there are U UDs and S ESs. The u-th UD offloads task u to the ES through the upstream channel. The communication multiple access scheme is based on OFDMA, and the UD communicates with the ES through orthogonal subbands in a single cell. Thus, the interference mainly comes from inter-cell communication. Defining the s-th ES further as server s, using h for task u offloaded to server s us Representing the channel gain between task u and server s. Order to
Figure BDA0003562336360000081
Indicating the bandwidth allocation for task u. Use of
Figure BDA0003562336360000082
To represent the bandwidth allocation policy of server s, where U s Representing a set of tasks to be offloaded to a server s, B s Representing the communication bandwidth of the server s. With binary offload policies
Figure BDA0003562336360000083
To indicate an offloading policy. Let m us To represent
Figure BDA0003562336360000084
M if task u can be offloaded to server s us 1, otherwise, m us 0. SINR of task u u Expressed as:
Figure BDA0003562336360000085
in the formula, p u Represents the allocated power of task u;
Figure BDA0003562336360000086
σ 2 which is indicative of the power of the background noise,
Figure BDA0003562336360000087
is the cumulative inter-cell interference for all tasks associated with the ESs other than the server s.
Uplink transmission rate R of task u u Expressed as:
Figure BDA0003562336360000088
the present invention ignores the delay of the downstream transmission because the upstream rate is much larger than the downstream rate and the size of the data returned by the task is usually very small. Offload delay for server s
Figure BDA0003562336360000089
The device consists of a transmission part and a calculation part, and is shown as the following formula:
Figure BDA00035623363600000810
wherein e represents the calculated density, d u Data size (bits), f, representing task u s Representing the computing power of the server s on which the task u is located.
Off-load energy consumption of servers s
Figure BDA00035623363600000811
Comprises the following steps:
Figure BDA00035623363600000812
in the formula, alpha represents
Figure BDA00035623363600000813
And
Figure BDA00035623363600000814
q is an energy parameter that depends on the architecture of the server s-chip. The system cost function may be expressed as a weighted sum of the offload delay and the offload energy consumption. In order to avoid the 'straggler' phenomenon and reduce the influence of unbalanced data, the cost function J is used in the invention 1 Defined to minimize the maximum cost among all ESs, the following is defined:
Figure BDA0003562336360000091
in the formula (I), the compound is shown in the specification,
Figure BDA0003562336360000092
and
Figure BDA0003562336360000093
in order to be a weight parameter, the weight parameter,
Figure BDA0003562336360000094
and is
Figure BDA0003562336360000095
The weight parameter may be adjusted, e.g. increased, in dependence of the task property
Figure BDA0003562336360000096
To accommodate delay sensitive tasks. In order to reduce the influence of non-independent co-distributed data, the invention introduces information entropy in system cost to represent the characteristics of data distribution, as shown in the following formula:
Figure BDA0003562336360000097
Figure BDA0003562336360000098
in the formula: d s Represents a data set collected by server s; c represents the number of categories; p c (D s ) Representing categoriesc data in D s The ratio of (1). By maximizing J 2 The class of the data in the unloaded data set will be increased, and the FL model can be trained and simultaneously extract features from richer samples, so that the influence of Non-IID is reduced. Notably, J 1 Is the MinMax problem, J 2 Is the problem of maximizing entropy. Finally, the present invention defines the joint problem as the following, denoted as the continuous discrete mixture MINLP problem P0:
Figure BDA0003562336360000099
s.t.m us ∈0,1,
Figure BDA00035623363600000910
Figure BDA00035623363600000911
Figure BDA00035623363600000912
1.3 HEFL model
The layered federated edge learning system is provided with a cloud server and K FEL-MTMH scenes. There is one EPS and S ESs per FEL-MTMH. Defining the data set, weight and loss function of ES as D s 、w s And F s (w s ) (ii) a Defining the data set, weight and loss function of EPS in the kth FEL-MTMH scene as
Figure BDA00035623363600000913
And
Figure BDA00035623363600000914
respectively defining the data set, the weight and the loss function of the cloud server as D global 、w global And F global (w global )。
Figure BDA00035623363600000915
Referred to as polymerization loss, a loss function F can be employed s (w s ) Is calculated as a weighted average. In FL in particular, the sampling of participant weights can affect the convergence and accuracy of the post-aggregation model. An appropriate sampling strategy may improve the representation of the data and reduce the variance. The present invention sets the aggregation strategy to be denoted X ═ X 1 ,x 2 ,...,x s ,...,x S ],x s 1 means that the s-th ES participates in Edge-Aggregation, x s 0 means that the s-th ES does not participate in the polymerization. Then
Figure BDA00035623363600000916
And F global (w global ) Represented by the formula:
Figure BDA0003562336360000101
Figure BDA0003562336360000102
to reduce communication overhead, assume that each ES is at κ 1 Edge-Aggregation is performed after round of local training, with each EPS at κ 2 After the secondary polymerization, Cloud-Aggregation was performed. This process is repeated until sufficient accuracy or communication threshold is reached.
The HFEL system model parameters are updated as:
Figure BDA0003562336360000103
Figure BDA0003562336360000104
wherein l represents the number of local training rounds, w s (l) Watch (A)Showing the weight of the ES obtained in the first round of training, eta shows the learning rate,
Figure BDA0003562336360000105
is represented by F s (w s (l-1)). To protect privacy, HFEL systems use parameter aggregation instead of data aggregation, so
Figure BDA0003562336360000106
And D global Is a virtual data set.
2 Joint problem decoupling strategy
This is an NP-hard problem due to the continuous discrete blend MINLP characteristic of problem P0. The present invention proposes a two-step solution: reduction of action space multi-agent depth certainty policy gradients (hereinafter abbreviated RAS-madpg) and KL divergence based participant selection (hereinafter abbreviated PS-KL). The combined application of RAS-MADDPG and PS-KL processes unbalanced and Non-IID data, and can also improve the model training precision and reduce the system cost.
By fixing the variables, the continuous discrete blending MINLP problem P0 can decompose two sub-problems. As shown in FIG. 2, for problem P0, attention is first directed to fixing
Figure BDA00035623363600001010
Minimizing binary offload policies
Figure BDA00035623363600001011
The following problem P1 is obtained:
Figure BDA0003562336360000107
in the formula (I), the compound is shown in the specification,
Figure BDA0003562336360000108
is a binary offload strategy
Figure BDA0003562336360000109
Is determined.
Next, solving the bandwidth allocation policy results in a problem P2 as shown in the following formula:
Figure BDA0003562336360000111
in the formula (I), the compound is shown in the specification,
Figure BDA0003562336360000112
is a bandwidth allocation policy
Figure BDA0003562336360000113
Is determined.
For problem P1, RAS-MADDPG is used to interact with the environment to obtain a binary offload policy
Figure BDA0003562336360000114
For problem P2, the convex optimization method is applicable to the continuous RA problem. After determining the binary unloading strategy
Figure BDA0003562336360000115
And bandwidth allocation policy
Figure BDA00035623363600001115
Thereafter, the training data set for each ES is also determined. To improve the convergence speed and accuracy of FL, the present invention performs an optimized edge aggregation process at problem P3.
Problem solving strategy based on reinforcement learning
3.1 information entropy-based computational offload policy
The present invention describes the evolution of HFEL systems using the following Markov Decision Process (MDP):
(1) the state is as follows: state is s t =[c t ,d t ,f t ,h t ]Wherein, in the step (A),
Figure BDA0003562336360000116
and
Figure BDA0003562336360000117
respectively representing the sample class and the data size,
Figure BDA0003562336360000118
indicating the computing resources available on the ES,
Figure BDA0003562336360000119
indicating ambient channel fading.
(2) The actions are as follows: defining the offload policy generated by each agent as an operation
Figure BDA00035623363600001110
And showing the mapping relation of UD and ES. Since the RA part is split, the motion space is greatly reduced here.
(3) Rewarding: in RAS-MADDPG, the implication of rewards is the amount of system weighted cost after the action executes the TO and RA policies. Thus, the reward is defined as the negative of the cost function, i.e. r-J 1 *J 2 Maximizing the reward means minimizing the system cost.
The TO framework based on RAS-MADDPG is shown in FIG. 1. The present invention introduces a novel reward function and reduced motion space in the maddppg model. The reward function may effectively incentivize the agent to find an optimal policy. The RA problem P2 is solved in the RA method block without putting in motion space. The network in RAS-MADDPG outputs only the offload decisions, so the action space and network complexity is greatly reduced.
The network parameters given to Actor and Critic are θ ═ θ, respectively 1 ,...,θ n ]And ω ═ ω [ ω ] 1 ,...,ω n ]. The set of policies for each agent is: pi ═ pi 1 ,...,π n ]. Suppose the deterministic policy set of N agents is μ ═ μ 1 ,...,μ N ]The deterministic policy gradient is then expressed as follows:
Figure BDA00035623363600001111
in the formula (I), the compound is shown in the specification,
Figure BDA00035623363600001112
as a strategic gradient formula, E s,a []Indicates expectation of a i Represents an action o i Denotes observation, μ i (a i |o i ) A deterministic policy is represented that is,
Figure BDA00035623363600001113
denotes a i The gradient of (a) of (b) is,
Figure BDA00035623363600001114
a state-action function representing the ith agent concentration.
Centralized training and distributed execution mechanism of madpg: during the training phase, Actor and Critic perform intensive training. During the execution phase, the Actor only needs to know the local observations. The centralized Critic update leverages the Time Difference (TD) in DQN and the idea of the target network, which is expressed as follows:
Figure BDA0003562336360000121
wherein L (θ) i ) A loss function representing Critic; e s,a,r,s′ []Indicating a desire, s' indicates a next state, a indicates an action, r indicates a reward; y represents the sum of the reward and the discounted state-action function,
Figure BDA0003562336360000122
Figure BDA0003562336360000123
wherein, gamma represents the discount rate,
Figure BDA0003562336360000124
denotes a target network,. mu.' j (o j ) ' parameter θ ' representing target strategy with hysteresis update ' j
The detailed algorithmic representation of maddppg is shown in fig. 3.
3.2 resource Allocation Module
The problem P0 is difficult to solve due to continuous discrete mixing and non-convexity. Referring to the Tammer decoupling method, problem P2 can be rewritten as follows:
Figure BDA0003562336360000125
Figure BDA0003562336360000126
Figure BDA0003562336360000127
in the formula: m 1 And M 2 Is a constant associated with the task u,
Figure BDA0003562336360000128
M 2 =-∑ s∈S Entropy(D s ) (ii) a The SINR is the SINR u . To the right of the equation is a Min-Max problem. And obtaining the second derivative from the supremum to know that the derivative is a convex function. Known from the theorem of sum of convex functions
Figure BDA0003562336360000129
For convex functions, the optimal strategy can be derived using the Lagrangian multiplier and the KKT condition
Figure BDA00035623363600001210
Wherein the content of the first and second substances,
Figure BDA00035623363600001211
Figure BDA00035623363600001212
ε s representing lagrange multipliers.
3.3 data-aware HFEL
PS is applied for edge aggregation and reduces the impact of non-IID data. Due to FL environmentThe distributed nature of random selection of clients for aggregation can exacerbate the adverse effects of data heterogeneity. To facilitate the study of edge aggregation, the present invention converts the difference in aggregation weights into a difference in the distribution of the data set. ES data set D s Distribution P (D) s ) Is defined as: p (D) s )=[P c (D s )|c∈C],P c (D s ) Data of the indicated category c is in D s C is the number of classes.
Figure BDA00035623363600001213
Is a virtual data set of EPS in the kth FEL-MTMH scenario, the distribution of which
Figure BDA00035623363600001214
Is shown as
Figure BDA00035623363600001215
Figure BDA00035623363600001216
Represents class c data in
Figure BDA00035623363600001217
The ratio of (1). Global virtual dataset D global Is by polymerization
Figure BDA0003562336360000131
And (4) obtaining the product. The invention concerns the similarity of data distribution after edge aggregation and cloud aggregation, introduces KL divergence and defines the following data distribution as follows:
Figure BDA0003562336360000132
based on the above, the present invention proposes problem P3: the KL divergence minimization problem is shown by the following equation:
Figure BDA0003562336360000133
Figure BDA0003562336360000134
in the formula (I), the compound is shown in the specification,
Figure BDA0003562336360000135
representing the aggregate decision of the server s.
Order to
Figure BDA0003562336360000136
The KL divergence is then expressed as:
Figure BDA0003562336360000137
in the formula, P c (D global ) Data in D representing category c global The ratio of (1).
The second derivative on the right side of the equal sign is known to be a convex function, which means that the KKT condition can be used to obtain the optimal strategy
Figure BDA0003562336360000139
In particular, an optimal strategy
Figure BDA00035623363600001310
Is in the domain
Figure BDA00035623363600001311
Obtained as described above. According to
Figure BDA00035623363600001312
To a binary value. The specific Data-aware HFEL algorithm is shown in fig. 4.
4 Experimental Environment setup
The present invention employs two exemplary learning-based visual recognition tasks to construct a HFEL simulation system. These tasks are based on two data sets: MNIST digital data set and CIFAR-10 real object data set. To study the effect of non-IID data, the present invention also specifies the way data is distributed in the edge. To simulate unbalanced data, the number of samples on each UD is in accordance with a Gaussian distribution X N (100, 10). The invention shapes two data distributions: (1) IID, each data set is shuffled and then randomly partitioned into UDs. (2) Non-IID, data sets are classified into 10 classes according to label. Each UD is then assigned a sample randomly selected from two categories. Furthermore, training was performed on two models: (1) a multi-layer perceptron with two hidden layers and activated by a sigmoid function. (2) CNN of a 5 x 5 convolution kernel. The model is provided in the TensorFlow course and consists of two convolutional layers and two fully-connected layers.
For each FEL-MTMH scenario, consider a multi-element system where ES is located at the center of the hexagonal element. Assuming 10 cells per FEL-MTMH scenario, the ES computation power is from [6, 8, 10, 12 ]]And (4) randomly selecting in GHz. The background noise power σ and bandwidth B are set to-100 dBm and 5 MHz. The channel gain h follows the free space path loss model. On the UD side, the maximum transmission power is from [80, 100, 120 ]]Randomly selecting from mW. Default energy consumption and delay parameters
Figure BDA0003562336360000138
The invention realizes the comparison of three representative standards:
(1) HFEL-based RAS-MADDPG algorithm: the invention provides a method that applies RAS-MADDPG and data-aware HFEL. The TO, RA and PS problems in the MEC scene are jointly solved.
(2) Cloud-based FL algorithm (C-FL): the C-FL trains all UD data on distributed computing nodes, and parameters are aggregated through Cloud-based FL.
(3) An HFEL-based DRL (Actor-critical) offload algorithm (hereinafter abbreviated as HFEL-DRL): HFEL-DRL applies the latest DRL algorithm to obtain offloading decisions in HFEL. The algorithm adjusts the strategy through a heuristic method to achieve the performance close to the optimal performance.
(4) HFEL-based independent offload and Joint RA (hereinafter HFEL-IOJR): HFEL-IOJR randomly assigns each task to ES and employs joint RA.
5 influence of Key parameters
First, two key parameters (i.e., κ) of the HFEL-MADDPG algorithm were quantified 1 And kappa 2 ) The influence of (c). The training time and energy consumption for transmission and calculation are calculated in table 1.
Figure BDA0003562336360000141
TABLE 1 Critical parameter Effect
Table 1 shows that HFEL has less training time and energy consumption compared to C-FL. In addition, with κ 1 The training time of the two data sets decreases monotonically. The energy consumption of the system is firstly reduced and then increased, which shows that 1 The reduction of (b) may reduce the consumption of edge computations. However, too frequent edge aggregation may result in increased communication consumption. Reasonable setting kappa 1 And kappa 2 The system efficiency will be improved.
6 comparison with cloud-based FL
The performance of C-FL and HFEL-MADDPG were compared by convergence analysis. The results are shown in FIGS. 5 and 6. When the data is IID, the accuracy and convergence rate of HFEL-MADDPG are better than those of C-FL. However, when the data is Non-IID, the convergence rate of HFEL-MADDPG is significantly reduced. Thus, two conclusions can be drawn: (1) compared with C-FL, the data-aware HFEL-MADDPG has better performance in terms of training speed and accuracy. (2) non-IID data can greatly reduce the accuracy and convergence speed of the model.
7 comparison with HFEL baseline Algorithm
Next, the performance of the HFEL-MADDPG in the HFEL scenario is compared. In FIGS. 7 and 8, it can be seen that HFEL-MADDPG and HFEL-DRL can effectively reduce the cost of defluxing compared with HFEL-IOJR. In particular, the cost of HFEL-maddppg is minimal, reflecting the effectiveness of the offloading strategy and PS mechanism of the present invention.
The training process of HFEL-madpg was compared to other baseline algorithms with guaranteed communication performance. FIGS. 9 and 10 show the comparison of the training effects of the three HFEL algorithms in the Non-IID scenario. We can see that HFEL-maddppg has the best accuracy and convergence speed. In addition, compared with baselines, the HFEL-MADDPG greatly reduces communication cost, improves training efficiency and embodies the effectiveness of the algorithm.

Claims (5)

1. A high-efficiency data perception layering federated learning method based on task unloading is characterized by comprising the following steps
Defining a FEL-MTMH scene, wherein a training model is positioned in an edge server in the FEL-MTMH scene, a plurality of user equipment can unload data to the plurality of edge servers, the edge calculation scene can be abstracted to a Multi-task Multi-Helper scene, wherein the user equipment is only responsible for collecting and unloading the data, and the process of model training and parameter aggregation is carried out in the edge server and the edge parameter server;
in a FEL-MTMH scenario, there are U user equipments and S edge servers, the U-th user equipment unloads a task U to an edge server through an uplink channel, the S-th edge server is further defined as a server S, and then, for the task U unloaded to the server S: use of h us Representing the channel gain between task u and server s; order to
Figure FDA00035623363500000115
Representing the bandwidth allocation of task u; use of
Figure FDA0003562336350000011
To represent the bandwidth allocation policy of server s, where U s Representing a set of tasks to be offloaded to a server s, B s Represents the communication bandwidth of the server s; with binary offload policies
Figure FDA0003562336350000012
To express an offload policy, let m us To represent
Figure FDA0003562336350000013
M if task u can be offloaded to server s us 1, otherwise, m us When the value is 0, then:
offload delay for server s
Figure FDA0003562336350000014
The device consists of a transmission part and a calculation part, and is shown as the following formula:
Figure FDA0003562336350000015
in the formula, R u Uplink transmission rate R of task u u And e represents the calculated density, d u Data size, f, representing task u s Representing the computing power of the server s where the task u is located;
off-load energy consumption of servers s
Figure FDA0003562336350000016
Comprises the following steps:
Figure FDA0003562336350000017
in the formula, alpha represents
Figure FDA0003562336350000018
And
Figure FDA0003562336350000019
q is an energy parameter depending on the architecture of the s-chip of the server;
will cost function J 1 Defined to minimize the maximum cost among all edge servers, the following is defined:
Figure FDA00035623363500000110
in the formula (I), the compound is shown in the specification,
Figure FDA00035623363500000111
and
Figure FDA00035623363500000112
in order to be a weight parameter, the weight parameter,
Figure FDA00035623363500000113
and is provided with
Figure FDA00035623363500000114
Information entropy is introduced into the system cost to represent the characteristics of data distribution, as shown in the following formula:
Figure FDA0003562336350000021
Figure FDA0003562336350000022
in the formula: d s Represents a data set collected by server s; c represents the number of categories; p c (D s ) Data representing class c is in D s The ratio of (1);
maximization of J 2 Entropy of (2), then the joint problem is defined as follows, denoted as the continuous discrete mixture MINLP problem P0:
(P0):
Figure FDA0003562336350000023
s.t.m us ∈0,1,
Figure FDA0003562336350000024
Figure FDA0003562336350000025
Figure FDA0003562336350000026
the hierarchical federated edge learning system is provided with a cloud server and K FEL-MTMH scenes, wherein each FEL-MTMH scene is provided with an edge parameter server and S edge servers; defining the data set, weight and loss function of the edge server as D s 、w s And F s (w s ) (ii) a Defining the data set, weight and loss function of the edge parameter server in the k-th FEL-MTMH scene as
Figure FDA0003562336350000027
And
Figure FDA0003562336350000028
respectively defining the data set, the weight and the loss function of the cloud server as D global 、w global And F global (w global );
Setting the aggregation policy to X ═ X 1 ,x 2 ,...,x s ,...,x S ],x s 1 denotes the s-th edge server participating in edge aggregation, x s 0 means that the s-th edge server does not participate in edge aggregation, then
Figure FDA0003562336350000029
And F global (w global ) Represented by the formula:
Figure FDA00035623363500000210
Figure FDA00035623363500000211
let each edge server be at κ 1 Edge aggregation is performed after round of local training, with each edge parameter server at κ 2 Performing cloud aggregation after the secondary aggregation, and repeating the processUntil sufficient accuracy or communication threshold is reached, the HFEL system model parameters are updated as:
Figure FDA0003562336350000031
Figure FDA0003562336350000032
wherein l represents the number of local training rounds, w s (l) Represents the weight of the ES obtained in the first round of training, η represents the learning rate,
Figure FDA0003562336350000033
is represented by F s (w s (l-1)), wherein,
Figure FDA0003562336350000034
and D global Is a virtual data set;
the continuous discrete hybrid MINLP problem P0 is decomposed into two subproblems, one each by fixing
Figure FDA00035623363500000321
Minimizing binary offload policies
Figure FDA00035623363500000322
Problem P1 resulting from the cost and problem P2 resulting from solving the bandwidth allocation policy, problem P1 is represented as:
(P1):
Figure FDA0003562336350000035
in the formula (I), the compound is shown in the specification,
Figure FDA0003562336350000036
is a binary offload policy
Figure FDA0003562336350000037
Is determined.
Problem P2 is expressed as:
(P2):
Figure FDA0003562336350000038
in the formula (I), the compound is shown in the specification,
Figure FDA0003562336350000039
is a bandwidth allocation policy
Figure FDA00035623363500000323
The optimal function of (2).
For problem P1, a reduced action space multi-agent depth certainty policy gradient is utilized to interact with the environment to obtain a binary offload policy
Figure FDA00035623363500000310
For problem P2, a convex optimization method is used to determine the bandwidth allocation strategy
Figure FDA00035623363500000311
ES data set D s Distribution P (D) s ) Is defined as: p (D) s )=[P c (D s )|c∈C],P c (D s ) Represents D s A distribution belonging to class C, C being the number of classes;
Figure FDA00035623363500000312
is a virtual data set of edge parameter servers in the kth FEL-MTMH scenario, the distribution of which
Figure FDA00035623363500000313
Is shown as
Figure FDA00035623363500000314
To represent
Figure FDA00035623363500000315
A distribution belonging to class c; global virtual dataset D global Is by polymerization
Figure FDA00035623363500000316
Obtaining;
introducing a KL divergence, defined as:
Figure FDA00035623363500000317
problem P3: the KL divergence minimization problem is shown by the following equation:
Figure FDA00035623363500000318
Figure FDA00035623363500000319
in the formula (I), the compound is shown in the specification,
Figure FDA00035623363500000320
represents an aggregated decision for server s;
order to
Figure FDA0003562336350000041
The KL divergence is then expressed as:
Figure FDA0003562336350000042
in the formula, P c (D global ) Represents D global A distribution belonging to class c;
for problem P3, the optimal strategy is derived using the KKT condition
Figure FDA0003562336350000043
2. The efficient data-aware hierarchical federated learning method based on task offloading of claim 1, wherein the task u's signal to interference and noise ratio SINR u Expressed as:
Figure FDA0003562336350000044
in the formula, p u Indicating the communication power allocated by the task u;
Figure FDA0003562336350000045
σ 2 which is indicative of the power of the background noise,
Figure FDA0003562336350000046
is the cumulative inter-cell interference for all tasks associated with the ESs other than the server s.
3. The efficient data-aware hierarchical federated learning method based on task offloading as recited in claim 2, wherein the uplink transmission rate R of task u u Expressed as:
Figure FDA0003562336350000047
4. the efficient data-aware hierarchical federated learning method based on task offloading as recited in claim 1, wherein if the network in the reduced action-space multi-agent depth certainty policy gradient outputs only offloading decisions, the reduced action-space multi-agent depth certainty policy gradient is utilized to interact with the environment to obtain a binary offloading policy
Figure FDA0003562336350000048
The method comprises the following steps:
a reward function and reduced motion space are introduced in the madpg model, which describes the evolution of HFEL systems using the following markov decision process:
(1) the state is as follows: state is s t =[c t ,d t ,f t ,h t ]Wherein, in the step (A),
Figure FDA0003562336350000049
and
Figure FDA00035623363500000410
respectively representing the sample class and the data size,
Figure FDA00035623363500000411
indicating the computing resources available on the ES,
Figure FDA00035623363500000412
representing ambient channel fading;
(2) the actions are as follows: defining the offload policy generated by each agent as an operation
Figure FDA00035623363500000413
The representation represents the mapping relation between the user equipment and the edge server;
(3) rewarding: the implication of the reward is the amount of system weighted cost after the offload policy and bandwidth allocation policy are implemented according to the action. Thus, the reward is defined as the negative of the cost function, i.e. r-J 1 *J 2 Maximizing rewards means minimizing system costs;
the parameters of the Actor and Critic networks are given as theta ═ theta respectively 1 ,...,θ n ]And ω ═ ω [ ω ] 1 ,...,ω n ]. The set of policies for each agent is: pi ═ pi 1 ,...,π n ]Let us assume that the deterministic policy set of N agents is μ ═ μ 1 ,...,μ N ]The deterministic policy gradient is then expressed as follows:
Figure FDA0003562336350000051
in the formula of the Chinese medicinal composition,
Figure FDA0003562336350000052
as a strategic gradient formula, E s,a []Indicates expectation of a i Represents an action o i Denotes observation, μ i (a i |o i ) A deterministic policy is represented that is,
Figure FDA0003562336350000053
denotes a i The gradient of (a) of (b) is,
Figure FDA0003562336350000054
a state-action function representing an ith agent concentration;
in the centralized training stage of the MADDPG model, Actor and Critic carry out centralized training;
in the distributed execution phase of the MADDPG model, the Actor only needs to know local observations, and the centralized criticc update is expressed as follows:
Figure FDA0003562336350000055
wherein, in the formula, L (theta) i ) A loss function representing Critic; e s,a,r,s′ []Indicating a desire, s' indicates a next state, a indicates an action, r indicates a reward; y represents the sum of the reward and the discounted state-action function.
5. The efficient data-aware hierarchical federated learning method based on task offloading of claim 1, wherein problem P2 is rewritten as follows:
Figure FDA0003562336350000056
Figure FDA0003562336350000057
Figure FDA0003562336350000058
in the formula: m 1 And M 2 Is a constant associated with the task u,
Figure FDA0003562336350000059
M 2 =-∑ s∈S Entropy(D s ) (ii) a The SINR is the SINR u
Deriving optimal policies using Lagrangian multipliers and KKT conditions
Figure FDA00035623363500000510
Wherein the content of the first and second substances,
Figure FDA00035623363500000511
ε s representing the lagrange multiplier.
CN202210293352.5A 2022-03-24 2022-03-24 Efficient data perception layered federated learning method based on task unloading Withdrawn CN114828095A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210293352.5A CN114828095A (en) 2022-03-24 2022-03-24 Efficient data perception layered federated learning method based on task unloading

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210293352.5A CN114828095A (en) 2022-03-24 2022-03-24 Efficient data perception layered federated learning method based on task unloading

Publications (1)

Publication Number Publication Date
CN114828095A true CN114828095A (en) 2022-07-29

Family

ID=82531707

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210293352.5A Withdrawn CN114828095A (en) 2022-03-24 2022-03-24 Efficient data perception layered federated learning method based on task unloading

Country Status (1)

Country Link
CN (1) CN114828095A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117591888A (en) * 2024-01-17 2024-02-23 北京交通大学 Cluster autonomous learning fault diagnosis method for key parts of train
WO2024087573A1 (en) * 2022-10-29 2024-05-02 华为技术有限公司 Federated learning method and apparatus

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024087573A1 (en) * 2022-10-29 2024-05-02 华为技术有限公司 Federated learning method and apparatus
CN117591888A (en) * 2024-01-17 2024-02-23 北京交通大学 Cluster autonomous learning fault diagnosis method for key parts of train
CN117591888B (en) * 2024-01-17 2024-04-12 北京交通大学 Cluster autonomous learning fault diagnosis method for key parts of train

Similar Documents

Publication Publication Date Title
Tang et al. Computational intelligence and deep learning for next-generation edge-enabled industrial IoT
Wei et al. Joint optimization of caching, computing, and radio resources for fog-enabled IoT using natural actor–critic deep reinforcement learning
CN111800828B (en) Mobile edge computing resource allocation method for ultra-dense network
CN114828095A (en) Efficient data perception layered federated learning method based on task unloading
Jiang et al. Distributed resource scheduling for large-scale MEC systems: A multiagent ensemble deep reinforcement learning with imitation acceleration
CN112598150B (en) Method for improving fire detection effect based on federal learning in intelligent power plant
CN111132074B (en) Multi-access edge computing unloading and frame time slot resource allocation method in Internet of vehicles environment
CN110233755B (en) Computing resource and frequency spectrum resource allocation method for fog computing in Internet of things
CN112637883A (en) Federal learning method with robustness to wireless environment change in power Internet of things
CN113613301B (en) Air-ground integrated network intelligent switching method based on DQN
Yang et al. Deep reinforcement learning based wireless network optimization: A comparative study
CN116541106B (en) Computing task unloading method, computing device and storage medium
Elbir et al. A hybrid architecture for federated and centralized learning
CN111224905A (en) Multi-user detection method based on convolution residual error network in large-scale Internet of things
CN115034390A (en) Deep learning model reasoning acceleration method based on cloud edge-side cooperation
CN114528987A (en) Neural network edge-cloud collaborative computing segmentation deployment method
Zhou et al. Dynamic channel allocation for multi-UAVs: A deep reinforcement learning approach
He et al. Computation offloading and resource allocation based on DT-MEC-assisted federated learning framework
Qu et al. Stochastic cumulative DNN inference with RL-aided adaptive IoT device-edge collaboration
Ma et al. Quality-aware video offloading in mobile edge computing: A data-driven two-stage stochastic optimization
CN114629769B (en) Traffic map generation method of self-organizing network
CN113157344B (en) DRL-based energy consumption perception task unloading method in mobile edge computing environment
CN116360996A (en) Reliable edge acceleration reasoning task allocation method in Internet of vehicles environment
CN116321181A (en) Online track and resource optimization method for multi-unmanned aerial vehicle auxiliary edge calculation
Wu et al. Model-heterogeneous Federated Learning with Partial Model Training

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20220729