CN113902021B

CN113902021B - Energy-efficient clustered federated edge learning strategy generation method and device

Info

Publication number: CN113902021B
Application number: CN202111191599.8A
Authority: CN
Inventors: 秦晓琦; 李艺璇; 韩凯峰; 马楠
Original assignee: Beijing University of Posts and Telecommunications; China Academy of Information and Communications Technology CAICT
Current assignee: Beijing University of Posts and Telecommunications; China Academy of Information and Communications Technology CAICT
Priority date: 2021-10-13
Filing date: 2021-10-13
Publication date: 2024-06-21
Anticipated expiration: 2041-10-13
Also published as: CN113902021A

Abstract

The invention discloses an energy-efficient clustered federated edge learning strategy generation method and device. The method comprises the following steps: S1, a cloud center initializes an edge access strategy; S2, each edge base station solves the bandwidth resource allocation strategy for its access devices and sends an initial model to them; S3, each device calculates the accuracy of the received global model, trains a local model from the global model and its local data using a layered transfer strategy, calculates the energy spent uploading the local model, takes the difference between the test accuracy and the energy consumption as its local benefit, and uploads the local model and the local benefit to the edge base station it accesses; S4, each edge base station aggregates the local models layer by layer, calculates an edge benefit by averaging the local benefits of all its access devices, and uploads the edge benefit to the cloud center; S5, the cloud center calculates the system benefit from the feedback received from the edge base stations and adjusts the edge access strategy using a deep reinforcement learning algorithm; S6, the above process is repeated until convergence.

Description

Energy-efficient clustered federated edge learning strategy generation method and device

Technical Field

The invention relates to the technical field of data processing, and in particular to an energy-efficient clustered federated edge learning strategy generation method and device.

Background

Data security has become a key issue for the continued development of artificial intelligence technology. Traditional machine learning is centralized: data are collected from devices and sent to a processing center for centralized training, which may expose private user data.

Federated learning is a promising distributed machine learning architecture. As the computing power of devices improves, devices can train local models on the data they collect and upload only those local models to a processing center for model aggregation. Raw data are never uploaded directly, which greatly protects data privacy.

In practice, data across devices may be non-independent and identically distributed (non-IID), which makes it challenging for federated learning to train a single unified global model. It is therefore worthwhile to study how federated learning can fit the data on each device, and some research on personalized federated learning has been proposed.

Personalized federated learning includes federated transfer learning and federated meta-learning. In essence, a basic global model shared by all devices is obtained first, and the global model is then fine-tuned on each device with local data to adapt to the characteristics of the personalized data. Each of these strategies has drawbacks. Because federated transfer learning and federated meta-learning must first obtain a global model that covers most features before personalizing it, they are suitable only for data with weak heterogeneity and cannot handle personalization in systems with strongly heterogeneous data.

Federated multi-task learning is another effective approach to the personalized federated learning problem. It quantifies the similarity of different device models by calculating a correlation matrix and treats heterogeneous data as different learning targets, thereby performing multi-task learning. However, federated multi-task learning applies only to convex or biconvex problems and is difficult to extend to non-convex problems such as common neural networks. In addition, most personalization methods target data whose labels differ across devices, for example where each device holds only a subset of all labels, and do not apply to data with different conditional distributions and an obvious cluster structure.

Clustered federated learning can effectively address these problems. It captures the cluster structure among the data and aggregates multiple models according to data distribution, which accommodates heterogeneous data across devices and greatly improves learning accuracy. Because of the privacy requirements of federated learning, the data distribution on each device is unknown, which makes clustering very challenging. Theoretical analysis shows that the smaller the distance between learned models, the closer their data distributions, so clustered federated learning mostly uses model distance to measure data similarity across devices without uploading raw data. Common model distance metrics include Euclidean distance and cosine distance. However, some techniques can infer information about the data on a device from its local model, which compromises data privacy. Nonlinear model encryption addresses this problem well, but the distance between nonlinearly encrypted models is probably not proportional to the distance between the original models. At low computational complexity, the similarity between data can then no longer be judged from the encrypted models, so clustering by the distance between local models fails and is not widely applicable. In addition, most existing work on clustered federated learning considers only the statistical heterogeneity of the data and ignores the resource limits and communication bottlenecks of the system. These studies also consider only single-base-station scenarios and have not been extended to multiple base stations. For energy-limited devices, the communication overhead is not negligible and the spectrum resources provided by a single base station are limited; for devices with poor channel conditions, uploading a local model consumes a large amount of device energy, which degrades learning performance under a training cost budget.

Traditional federated learning requires devices to upload local models to the cloud over a wide area network for aggregation. Device battery capacity is often limited, and the many communication rounds of federated learning, together with the large communication overhead in each round, consume a large amount of transmission energy and reduce learning performance under a given energy budget. Multi-access edge computing (MEC) is a promising distributed computing framework that supports many low-latency, low-energy applications by offloading delay-sensitive and computation-intensive tasks to the edge, achieving real-time operation and energy efficiency. Federated edge learning takes advantage of MEC by adding multiple base stations between the cloud and the devices to assist training: devices upload their local models to an edge base station for aggregation. This greatly reduces the communication overhead of transmitting between devices and the cloud over the wide area network, and coordination between the edge base stations and the devices allows the system to achieve high energy efficiency and high accuracy on non-IID data. However, work on multi-base-station federated learning architectures mostly considers training cost, such as time and energy consumption. It does not consider the opportunities and challenges that statistical heterogeneity brings to multi-base-station scenarios, and joint optimization of training cost and learning performance is lacking.

Disclosure of Invention

To address the shortcomings of the prior art, the invention jointly considers data distribution and device energy cost in a scenario with multiple edge base stations, and finds the common ground between statistical heterogeneity and the communication bottleneck. From the perspective of system benefit, it designs an edge access strategy and a resource allocation strategy with high energy efficiency and high accuracy, and provides an energy-efficient clustered federated edge learning strategy generation method and device.

In order to achieve the above object, the present invention provides the following technical solutions:

In a first aspect, the present invention provides an energy-efficient clustered federated edge learning strategy generation method, including the following steps:

S1, the cloud center initializes an edge access strategy;

S2, each edge base station uses a convex optimization method to solve the bandwidth resource allocation strategy for its access devices and sends an initial model to the access devices;

S3, each device calculates the accuracy of the received global model on its local test data set, trains a local model from the global model and its local training data using a layered federated transfer learning method, calculates the energy spent uploading the local model, takes the difference between the test accuracy and the energy consumption as its local benefit, and then uploads the local model and the local benefit to the edge base station it accesses;

S4, each edge base station aggregates the local models layer by layer, calculates an edge benefit by averaging the local benefits of all its access devices, and uploads the edge benefit to the cloud center;

S5, the cloud center calculates the system benefit from the feedback information received from the edge base stations and adjusts the edge access strategy using a deep reinforcement learning algorithm;

S6, the above process is repeated until convergence.
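The benefit bookkeeping in steps S3 and S4 can be illustrated with a minimal sketch. The function names and the sample readings are hypothetical, and any scaling of accuracy against energy is left out because the steps above do not specify one.

```python
from statistics import mean


def local_benefit(test_accuracy, upload_energy):
    """Step S3: a device reports test accuracy minus upload energy."""
    return test_accuracy - upload_energy


def edge_benefit(local_benefits):
    """Step S4: an edge base station averages the benefits of its access devices."""
    return mean(local_benefits)


# Hypothetical reports from three devices attached to one edge base station:
# (accuracy on the local test set, energy spent uploading the local model).
reports = [(0.82, 0.05), (0.75, 0.10), (0.90, 0.02)]
benefits = [local_benefit(acc, energy) for acc, energy in reports]
print(edge_benefit(benefits))
```

The cloud center in step S5 would receive one such edge benefit per edge base station.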

Further, in step S1, the access policy a_ij between device i and edge base station j is a binary variable: a_ij = 1 if device i communicates with edge base station j, and a_ij = 0 otherwise; each device accesses one edge base station.

Further, the convex optimization method in step S2 is specifically as follows: for edge base station j and its cluster of access devices, given an edge access policy, the optimal bandwidth allocation β_ij of the resource allocation sub-problem is calculated as follows:

where h_ij denotes the channel gain between device i and edge base station j, p_i denotes the model upload power of device i, N₀ denotes the power spectral density of the Gaussian noise, β_ij B_j is the bandwidth allocated to device i accessing edge base station j, the devices accessing edge base station j communicate over a common spectrum of bandwidth B_j, a_ij denotes the access policy between the device and the edge base station, and β_ij denotes the proportion of bandwidth allocated to device i.

Further, the device trains on its local data starting from the received global model θ_j, and the loss function of device i is as follows:

The device updates the local model ω_i using gradient descent, with the formula:

where η is the learning step size and η ≥ 0;

In step S3, the local model is trained with a layered federated transfer learning strategy, in which the neural network is divided into a basic feature layer and a personalized feature layer. The strategy proceeds as follows:

S301, after a certain number of rounds, the average learning accuracy of each edge base station is calculated according to the following formula:

S302, devices with higher average accuracy upload both their basic feature layer model and their personalized feature layer model to the accessed edge base station, while devices with lower average accuracy upload only their basic feature layer model and update their personalized feature layer model locally, with the formula as follows:

where the formula refers to the local personalized feature layer model of device i and to the set of migration devices.

S303, the edge base station aggregates the basic feature layer models of all devices and the personalized feature layer models of the non-migration devices. It then sends the aggregated basic feature layer model to all access devices and the aggregated personalized feature layer model to the non-migration devices. Each device updates its basic feature layer model according to the received model, and the process iterates until convergence.

Further, in step S3, the learning accuracy G_ij of the global model on the local test data set is used as the index of the global model's performance at edge base station j, and the learning performance gain G of the system is the average accuracy over all devices, with the formula as follows:

Further, the energy E_ij consumed by device i to upload the local model in step S3 is given by the following formula:

T_ij is the transmission delay for device i to upload the local model to the edge base station, with the formula as follows:

S denotes the size of the local model, and r_ij is the transmission rate at which device i uploads the model, with the formula as follows:

h_ij denotes the channel gain between device i and edge base station j, p_i denotes the model upload power of device i, N₀ denotes the power spectral density of the Gaussian noise, β_ij B_j is the bandwidth allocated to device i accessing edge base station j, the devices accessing edge base station j communicate over a common spectrum of bandwidth B_j, a_ij denotes the access policy between the device and the edge base station, and β_ij denotes the proportion of bandwidth allocated to device i.

Further, in step S4, before layered aggregation, the edge base station aggregates all received local models, with the formula as follows:

where the aggregation is taken over the cluster of all devices accessing edge base station j, and ω_i is the local model of device i.

After a certain number of training rounds, the layered federated transfer learning strategy is executed and the edge base station aggregates the received local models layer by layer. Specifically, the edge base station aggregates the basic feature layer models of all devices to ensure the generalization performance of the model, and aggregates the personalized feature layer models of the non-migration devices to eliminate the influence of non-IID data across devices, with the formula as follows:

where the formula uses the global basic feature layer model for the devices accessing edge base station j, which is shared by all devices; the global personalized feature layer model, which is shared by the non-migration devices; and the cluster of non-migration devices accessing edge base station j.
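The layered aggregation described above can be sketched as follows. This is a minimal illustration, not the patent's exact formula: it assumes an unweighted element-wise average, represents each layer as a flat list of floats, and uses hypothetical dictionary keys (`base`, `personal`, `migrating`).

```python
def average(models):
    """Element-wise mean of equally sized parameter vectors."""
    n = len(models)
    return [sum(values) / n for values in zip(*models)]


def layered_aggregate(devices):
    """Aggregate base layers over all devices, personalized layers over non-migration devices."""
    base_global = average([d["base"] for d in devices])
    non_migrating = [d["personal"] for d in devices if not d["migrating"]]
    personal_global = average(non_migrating) if non_migrating else None
    return base_global, personal_global


def broadcast(devices, base_global, personal_global):
    """Send the base layer to every device and the personalized layer to non-migration devices only."""
    for d in devices:
        d["base"] = list(base_global)
        if not d["migrating"] and personal_global is not None:
            d["personal"] = list(personal_global)


# Hypothetical devices attached to one edge base station.
devices = [
    {"base": [1.0, 2.0], "personal": [0.0], "migrating": False},
    {"base": [3.0, 4.0], "personal": [2.0], "migrating": False},
    {"base": [5.0, 6.0], "personal": [9.0], "migrating": True},
]
base_global, personal_global = layered_aggregate(devices)
broadcast(devices, base_global, personal_global)
```

The migration device keeps the personalized layer it trained locally, while all three devices end up with the same basic feature layer.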

Further, the formula of the system benefit function in step S5 is as follows:

where μ ∈ [0,1] is a continuous variable used to adjust the trade-off between learning performance and transmission energy consumption, and G_max and E_max are the highest accuracy and the maximum energy consumption that the system can reach.

Further, in step S5, the edge access policy is adjusted by deep reinforcement learning, and the specific process of the deep reinforcement learning is as follows:

S501, the edge association problem is described as a Markov process, specified as follows:

(1) State: in the kth round, the state is defined as S(k) = {S₁(k), S₂(k), …, S_N(k)}, and each term S_i(k) is defined as:

S_i(k)＝{A_i(k-1),β_ij(k),Δ_i(k)}

where Δ_i(k) indicates whether the learning accuracy has improved compared with round k-1: Δ_i(k) = 1 if the accuracy has improved, and Δ_i(k) = 0 otherwise;

(2) Action: in the kth round, the action is the edge association policy of every device:

A(k)＝{A₁(k),A₂(k),…,A_N(k)}

where each term A_i(k) can be expressed as:

A_i(k)＝{a_ij(k)}

(3) Reward: the reward is set to the objective function:

S502, DQN is selected as the basic framework and is combined with dueling DQN and double DQN to optimize the algorithm; the resulting D3QN is used to solve the edge access problem. The Q-value function Q(S, A; θ) is approximated by a neural network with parameters θ and represents the mapping between the environment and the action. The output of the neural network is obtained from the Bellman equation:

where S′, A′, and θ′ are the state, the action, and the corresponding parameters of the next time slot, respectively;

DQN uses two Q networks with the same structure but different parameters to improve the stability of the algorithm. One is the current Q network, which holds the latest parameters and evaluates the value function of the current state-action pair. The other is the target Q network, which holds parameters from past rounds and keeps its Q values unchanged for a period of time. The Q value of the current Q network is taken as the input to the neural network, and the goal of DQN is to minimize the difference between the two Q networks, which defines the loss function of DQN:

L(θ)＝E[(y-Q(S，A；θ))²]

S503, the double DQN (DDQN) algorithm selects the action corresponding to the maximum Q value in the current Q network:

The selected action is then substituted into the target Q network to calculate the Q value:

y＝R(S，A)+γQ'(φ(S'),A^max(S'；θ)；θ')

S504, the network structure is optimized using dueling DQN, which divides the network into two parts: a value function V(S, θ, α) that depends only on the state, and an advantage function A(S, A, θ, β) that depends on both the state and the action. Here θ denotes the parameters shared by the two parts, α the parameters unique to the value function, and β the parameters unique to the advantage function. Q is the sum of the two functions:

Q(S,A,θ,α,β)＝V(S,θ,α)+A(S,A,θ,β)。
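Steps S502 to S504 can be illustrated with a small numeric sketch of the dueling combination, the double-DQN target, and the squared loss term. It works on plain lists of Q values rather than neural networks; all function names and numbers are hypothetical. The dueling step follows the sum Q = V + A as written above, whereas many published implementations also subtract the mean advantage.

```python
def dueling_q(value, advantages):
    """Combine the state value V(S) and the advantages A(S, a) as Q = V + A."""
    return [value + a for a in advantages]


def double_dqn_target(reward, gamma, q_current_next, q_target_next):
    """y = R + gamma * Q'(S', argmax_a Q(S', a; theta); theta')."""
    # The current network chooses the action, the target network evaluates it.
    best_action = max(range(len(q_current_next)), key=q_current_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


def squared_td_error(y, q_current):
    """One sample of the loss L(theta) = E[(y - Q(S, A; theta))^2]."""
    return (y - q_current) ** 2


# Hypothetical outputs of the two networks for the next state S'.
q_current_next = dueling_q(1.0, [0.2, 0.5, -0.1])  # current Q network
q_target_next = dueling_q(0.8, [0.4, 0.1, 0.3])    # target Q network

y = double_dqn_target(reward=0.5, gamma=0.9,
                      q_current_next=q_current_next,
                      q_target_next=q_target_next)
loss = squared_td_error(y, q_current=1.0)
```

In this example the current network prefers the second action, so the target is built from the target network's value for that action rather than from the target network's own maximum, which is the point of the double-DQN correction.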

In a second aspect, the present invention provides an energy-efficient clustered federated edge learning strategy generation device, including a computer memory, a computer processor, and a computer program stored in the computer memory and executable on the computer processor, wherein the computer processor implements the above energy-efficient clustered federated edge learning strategy generation method when executing the computer program.

Compared with the prior art, the invention has the beneficial effects that:

1. The invention jointly considers two practical bottlenecks of federated learning, namely the non-IID nature of device data and the communication and energy limits of devices, whereas most studies consider only one of them.

2. The invention takes device communication overhead into account and extends traditional single-base-station federated learning to a multi-base-station scenario. Unlike multi-base-station work that addresses only the communication bottleneck, the invention jointly considers the heterogeneity of data distribution and the channel state and, from the perspective of system benefit, designs an edge access strategy and a resource allocation strategy with high accuracy and high energy efficiency.

3. To make the algorithm more widely applicable, the invention considers the data privacy problem of federated learning. Some techniques can infer device-side data from the model a device uploads, and nonlinear privacy encryption, which further protects data privacy, causes the conventional model distance clustering method to fail. The invention therefore designs a deep reinforcement learning method that adaptively explores the edge access strategy from edge feedback information while protecting data privacy. To improve scalability and reduce complexity, the invention also decouples the resource allocation problem and solves it independently at each edge base station.

4. The invention considers the case in which devices with inconsistent data distributions access the same edge base station, and designs layered transfer learning to further improve learning performance. Analysis shows that the layered transfer strategy designed by the invention consumes no additional energy.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments described in the present application, and other drawings may be obtained according to these drawings for a person having ordinary skill in the art.

Fig. 1 shows the clustered federated edge learning system architecture provided in an embodiment of the present invention.

Detailed Description

For a better understanding of the present technical solution, the method of the present invention is described in detail below with reference to the accompanying drawings.

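The invention provides an energy-efficient clustered federated edge learning strategy generation method, which comprises the following steps:

S1, the cloud center initializes an edge access strategy;

S2, each edge base station uses a convex optimization method to solve the bandwidth resource allocation strategy for its access devices and sends an initial model to the access devices;

S3, each device calculates the accuracy of the received global model on its local test data set, trains a local model from the global model and its local training data using a layered federated transfer learning method, calculates the energy spent uploading the local model, takes the difference between the test accuracy and the energy consumption as its local benefit, and then uploads the local model and the local benefit to the edge base station it accesses;

S4, each edge base station aggregates the local models layer by layer, calculates an edge benefit by averaging the local benefits of all its access devices, and uploads the edge benefit to the cloud center;

S5, the cloud center calculates the system benefit from the feedback information received from the edge base stations and adjusts the edge access strategy using a deep reinforcement learning algorithm;

S6, the above process is repeated until convergence.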

The invention considers a clustered federated edge learning architecture in a multi-base-station scenario. As shown in Fig. 1, the architecture consists of a cloud center S, M edge base stations, and N devices. In the network, the edge base stations form a set whose size is the number of edge base stations M, and the devices form a set whose size is the number of devices N. Each device i collects and stores a training data set of pairs (x_in, y_in), where x_in is the nth stored sample of device i, y_in is the corresponding label of x_in, and the size of the set is the amount of training data of device i. Training data of different devices are collected from different data sources, so the training data in federated learning are non-IID.

In clustered federated edge learning, the goal of the system is to learn multiple models that fit the heterogeneous data on the devices. The federated learning training process includes the following steps:

The edge base station sends an initial global model to the devices;

Each device trains on its local data starting from the received global model θ_j. The loss function of device i is defined as:

The device updates the local model ω_i using gradient descent as follows:

where η is the learning step size and η ≥ 0.

The device uploads the updated local model to the accessed edge base station over a wireless link;

The edge base station aggregates all received local models as follows:

where the aggregation is taken over the cluster of all devices accessing edge base station j.

The above process is repeated until the model converges.
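The training loop above can be sketched for one edge base station as follows. This is an illustration under stated assumptions, not the patent's exact formulas: the model is a single scalar weight, the loss is a mean squared error, and the aggregation is weighted by the amount of local data. All names and data are hypothetical.

```python
def gradient(w, data):
    """Gradient of the mean squared error of y ~ w * x over one device's samples."""
    return sum(2.0 * (w * x - y) * x for x, y in data) / len(data)


def local_update(w_global, data, eta, steps=1):
    """Start from the received global model and take gradient descent steps."""
    w = w_global
    for _ in range(steps):
        w -= eta * gradient(w, data)
    return w


def aggregate(local_models, sizes):
    """Data-size-weighted average of the uploaded local models (assumed weighting)."""
    total = sum(sizes)
    return sum(w * n for w, n in zip(local_models, sizes)) / total


# Two hypothetical devices whose samples both follow y = 2 * x.
datasets = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(1.0, 2.0), (3.0, 6.0), (2.0, 4.0)],
]
theta = 0.0  # initial global model sent by the edge base station
for _ in range(50):
    local_models = [local_update(theta, data, eta=0.05) for data in datasets]
    theta = aggregate(local_models, [len(data) for data in datasets])
```

Because both devices hold consistent data, the global model approaches 2; with non-IID data the single aggregated model would instead settle between the devices' individual optima, which is the motivation for clustering.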

The access policy a_ij between device i and edge base station j is a binary variable: a_ij = 1 if device i communicates with edge base station j, and a_ij = 0 otherwise. Each device can access only one edge base station, so Σ_j a_ij = 1 holds for every device i.

The invention takes the learning accuracy G_ij of the global model on the local data set as the index of the global model's performance at edge base station j. The learning performance gain G of the system is the average accuracy over all devices, as follows:

Notably, when devices whose training data are non-IID access the same edge base station, learning performance suffers, so statistical heterogeneity is a key issue in designing the edge access strategy.

For uploading the local model, the invention adopts an orthogonal frequency division multiple access (OFDMA) communication system; the approach extends readily to other communication systems. All devices accessing edge base station j share a common spectrum with available bandwidth B_j, and β_ij denotes the proportion of bandwidth allocated to device i. This gives:

From the above, the bandwidth obtained by device i accessing edge base station j is β_ij B_j. The transmission rate at which device i uploads its model can be expressed as follows:

where h_ij denotes the channel gain between device i and edge base station j, p_i denotes the model upload power of device i, and N₀ denotes the power spectral density of the Gaussian noise. Let S denote the size of the local model; the transmission delay for device i to upload the local model to the edge base station can then be expressed as follows:

then the energy consumed by device i to upload the local model can be expressed as follows:

The invention takes the average transmission energy consumption of all devices as the communication cost of the federated learning system; the cost definition extends readily to other resources, such as training delay. The communication cost of the system can be expressed as follows:

From the above analysis, both the edge access policy and the bandwidth resource allocation policy affect the device energy consumption. Therefore, communication costs should also be considered when designing edge access policies.
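The rate, delay, and energy relations described above can be sketched as follows. The Shannon-capacity form of the rate is an assumption based on the listed quantities (channel gain, upload power, noise power spectral density, and the bandwidth share β_ij B_j); the delay S / r and the energy p · T follow the verbal definitions. All numeric values are hypothetical.

```python
import math


def upload_rate(beta, B, h, p, N0):
    """Assumed Shannon rate (bit/s) on the bandwidth share beta * B."""
    bandwidth = beta * B
    return bandwidth * math.log2(1.0 + h * p / (N0 * bandwidth))


def upload_delay(S, rate):
    """Transmission delay (s) for a local model of S bits."""
    return S / rate


def upload_energy(p, delay):
    """Energy (J) spent transmitting at power p for the given delay."""
    return p * delay


def communication_cost(energies):
    """System communication cost: average upload energy over all devices."""
    return sum(energies) / len(energies)


# Hypothetical link: 1 MHz shared spectrum, 1 Mbit model, 0.2 W upload power.
B, N0, S = 1e6, 4e-21, 1e6
h, p = 1e-10, 0.2

energy_half = upload_energy(p, upload_delay(S, upload_rate(0.5, B, h, p, N0)))
energy_full = upload_energy(p, upload_delay(S, upload_rate(1.0, B, h, p, N0)))
cost = communication_cost([energy_half, energy_full])
```

Under this model a larger bandwidth share always lowers the upload energy of a device, which is why the shares of the devices attached to one edge base station compete for the common spectrum.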

To both save communication cost and improve learning accuracy, the invention uses the system benefit to quantify the overall performance of federated learning. The invention defines the system benefit as follows:

where μ ∈ [0,1] is a continuous variable used to adjust the trade-off between learning performance and transmission energy consumption, and G_max and E_max are the highest accuracy and the maximum energy consumption that the system can reach. Normalizing by G_max and E_max keeps the difference in magnitude between accuracy and energy from skewing the strategy.
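A minimal sketch of such a benefit function follows. The weighted-difference form used here, μ · G / G_max − (1 − μ) · E / E_max, is an assumption consistent with the description (a trade-off weight μ and normalization by G_max and E_max); it is not quoted from the patent, and all numbers are hypothetical.

```python
def system_benefit(G, E, mu, G_max, E_max):
    """Assumed form: normalized accuracy gain minus normalized energy cost."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    return mu * G / G_max - (1.0 - mu) * E / E_max


# Hypothetical per-device accuracies and upload energies (joules).
accuracies = [0.8, 0.9, 0.7]
energies = [0.02, 0.04, 0.03]

G = sum(accuracies) / len(accuracies)  # average accuracy of all devices
E = sum(energies) / len(energies)      # average upload energy of all devices
P = system_benefit(G, E, mu=0.5, G_max=1.0, E_max=0.1)
```

Setting μ close to 1 favors accuracy, and setting it close to 0 favors energy saving.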

The object of the present invention is to find the edge access policy and the resource allocation policy that maximize the system benefit. The optimization problem can be expressed as follows:

max P

s.t.

In this problem, a_ij is a binary variable and β_ij is a continuous variable, so the optimization problem is a mixed-integer nonlinear programming (MINLP) problem.

Because of the privacy requirements of federated learning, the statistical distribution of device data is not available, so it is very difficult to obtain a globally optimal solution directly. Moreover, to prevent raw data from being recovered from the local model parameters uploaded by devices, federated learning is often combined with nonlinear privacy encryption. In view of this, and to make the proposed algorithm more widely applicable, the invention uses deep reinforcement learning to adaptively explore the edge access strategy in a multi-base-station scenario from edge feedback information, maximizing the system benefit without data exchange and thus while protecting data privacy.

Deep reinforcement learning can convert variables of different types into a single type, for example by discretizing continuous variables or relaxing discrete variables into continuous ones, and then solve for them jointly. However, as the number of decision variables grows, deep reinforcement learning easily becomes trapped in locally optimal solutions and gives unsatisfactory results. The invention therefore decouples the original problem into two sub-problems: the edge association problem, and the resource allocation problem under a given edge access strategy. For the edge association sub-problem, the invention deploys deep reinforcement learning at the cloud to adaptively adjust the association between edge base stations and devices. Because the resource allocation sub-problem depends on the edge access decision, the resource allocation strategy is decoupled to each edge base station and solved independently under the given edge access strategy, which reduces the complexity of the algorithm and improves its scalability.

The invention observes that once the edge access strategy is fixed, the learning performance of the system is determined accordingly, and the optimization problem reduces to allocating communication resources so as to minimize upload energy consumption. The bandwidth resources of each base station are determined by that base station alone, independently of the other edge base stations. The resource allocation problem for multiple edge base stations can therefore be decomposed into M sub-problems, each solved separately at its edge base station. Each edge base station needs to solve the following problem:

where the problem is posed over the cluster of devices accessing edge base station j, and N_j is the number of devices in that cluster.

The above problem is convex, because the objective function is convex in β_ij over the feasible region and all constraints are affine.

The invention uses the Karush-Kuhn-Tucker (KKT) conditions to obtain an analytical solution for the bandwidth allocation, which gives the following theorem.

Theorem 1: for edge base station j and its cluster of training devices, given an edge access policy, the optimal bandwidth allocation β_ij of the resource allocation sub-problem can be expressed as follows:

The proof of Theorem 1 is as follows:

The convex problem can be solved by the method of Lagrange multipliers, and the Lagrangian of the sub-problem can be expressed as follows:

where λ is the Lagrange multiplier for the constraint of the convex problem. To solve it, the invention computes the KKT conditions:

By solving the above equation, it is possible to obtain:

On this basis, the relation between the bandwidth allocation and the Lagrange multiplier is obtained:

Meanwhile, according to the KKT conditions:

Thus:

From the above formula, the bandwidth resource allocation variable can be solved and expressed as:

With Theorem 1, the invention solves the communication resource allocation problem efficiently. Each edge access strategy has a corresponding optimal bandwidth resource allocation strategy, forming a one-to-one correspondence, which reduces the difficulty of solving the original problem.
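The per-base-station sub-problem can also be checked numerically. The sketch below is not the closed-form expression of Theorem 1: it assumes a Shannon-type rate, takes the objective to be the total upload energy of the attached devices subject to the shares summing to 1, and solves the KKT stationarity condition by bisection on the Lagrange multiplier. All names and numbers are hypothetical.

```python
import math


def rate(beta, B, h, p, N0):
    """Assumed Shannon rate (bit/s) on the bandwidth share beta * B."""
    return beta * B * math.log2(1.0 + h * p / (N0 * beta * B))


def energy(beta, B, h, p, N0, S):
    """Upload energy p * S / r of one device; convex and decreasing in beta."""
    return p * S / rate(beta, B, h, p, N0)


def neg_slope(beta, B, h, p, N0, S):
    """-dE/dbeta, which is positive and decreasing in beta."""
    snr = h * p / (N0 * beta * B)
    r = rate(beta, B, h, p, N0)
    dr = B * (math.log2(1.0 + snr) - snr / ((1.0 + snr) * math.log(2.0)))
    return p * S * dr / (r * r)


def share_for_multiplier(lam, dev, B, N0, S, eps=1e-9):
    """Solve the stationarity condition -dE/dbeta = lam for one device."""
    h, p = dev
    if neg_slope(1.0, B, h, p, N0, S) >= lam:
        return 1.0  # even the full band is not enough to reach the slope lam
    lo, hi = eps, 1.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if neg_slope(mid, B, h, p, N0, S) > lam:
            lo = mid  # slope still too steep: the share must grow
        else:
            hi = mid
    return 0.5 * (lo + hi)


def allocate_bandwidth(devices, B, N0, S):
    """Bisect on the multiplier until the bandwidth shares sum to 1."""
    lo, hi = 1e-12, 1e12
    for _ in range(200):
        lam = math.sqrt(lo * hi)
        total = sum(share_for_multiplier(lam, d, B, N0, S) for d in devices)
        if total > 1.0:
            lo = lam  # shares too large: raise the multiplier
        else:
            hi = lam
    lam = math.sqrt(lo * hi)
    return [share_for_multiplier(lam, d, B, N0, S) for d in devices]


# Two hypothetical devices (channel gain, upload power) on a 1 MHz band.
devices = [(1e-10, 0.2), (1e-11, 0.2)]
B, N0, S = 1e6, 4e-21, 1e6
betas = allocate_bandwidth(devices, B, N0, S)
```

In this example the device with the weaker channel receives the larger share, and the resulting total energy is no higher than under an equal split.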

For the edge access problem, traditional methods need to collect all information before solving, which is not possible because of the privacy requirements of federal learning. Deep reinforcement learning continuously explores the environment and does not require any a priori information. The invention therefore designs a deep reinforcement learning method that adaptively adjusts the edge access strategy according to the feedback information of the edge base stations. The edge association problem can be described as a Markov decision process, detailed as follows:

(1) State: In the k-th round, the cloud can only observe the feedback information from the edge base stations on the edge access policy of the previous round, so the invention defines the state as S(k) = {S₁(k), S₂(k), …, S_N(k)}, where each item S_i(k) is defined as:

S_i(k)＝{A_i(k-1),β_ij(k),Δ_i(k)}

Here, Δ_i(k) indicates whether the learning accuracy has improved compared with round k-1: Δ_i(k) = 1 if the accuracy has improved, and Δ_i(k) = 0 otherwise.

(2) Action: In the k-th round, the action is the edge association policy of every device:

A(k)＝{A₁(k),A₂(k),…,A_N(k)}

where each term A_i(k) can be expressed as:

A_i(k)＝{a_ij(k)}

(3) Reward: The reward guides the direction in which the policy is updated, so the invention sets the reward to the objective function:

Since the edge base station is unaware of all possible subsequent states and optimal actions, the invention uses a model-free deep reinforcement learning paradigm to update the edge access policy. To handle the large state space and the discrete actions, the invention selects the deep Q network (DQN) as the basic framework and combines dueling DQN and double DQN into D3QN, which is used to solve the edge access problem.

DQN is a value-based algorithm. Its Q-value function Q(S, A; θ) is approximated by a neural network with parameters θ and represents the mapping between the environment and the actions. The output of the neural network can be obtained through the Bellman equation as follows:

where S', A' and θ' are the state, the action and the corresponding parameters of the next time slot, respectively.
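For orientation, the standard DQN target obtained from the Bellman equation can be written in this section's notation as shown below. This is the textbook form and is only assumed to match the expression referred to above; the preprocessing map φ and the discount factor γ are taken from the DDQN target given further on in this section.

```latex
y = R(S, A) + \gamma \max_{A'} Q'\bigl(\varphi(S'), A'; \theta'\bigr)
```

Here Q' denotes the target Q network with parameters θ'.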

DQN uses two Q networks with identical structure but different parameters to improve the stability of the algorithm. One is the current Q network, which holds the latest parameters and evaluates the value function of the current state-action pair. The other is the target Q network, which holds parameters from past rounds and keeps its Q values unchanged for a period of time. The invention takes the Q value of the current Q network as the input of the neural network. The goal of DQN training is to minimize the difference between the two Q networks, which defines the loss function of DQN:

L(θ)＝E[(y-Q(S,A；θ))²]

Because the samples generated by a Markov process are not independent and identically distributed, DQN adopts an experience replay strategy to reduce the temporal correlation among samples and keep the algorithm stable. However, the target values of DQN are obtained directly by a greedy operation, which leads to overestimation and large bias. To solve this problem, the invention introduces the double DQN (DDQN) algorithm, which avoids overestimation by decoupling the selection of the target action from its evaluation. Whereas DQN selects the action with the maximum Q value in the target Q network, DDQN selects the action with the maximum Q value in the current Q network:

The selected action is then substituted into the target Q network to calculate the Q value:

y＝R(S，A)+γQ'(φ(S'),A^max(S'；θ)；θ')
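The difference between the two targets can be made concrete with a small sketch. It assumes the Q values of the next state have already been computed by the current and target networks and are passed in as plain lists; the function names and the default γ = 0.9 are illustrative choices, not values from the source.

```python
def dqn_target(reward, next_q_target, gamma=0.9):
    """Plain DQN target: the target network both selects and evaluates the action."""
    return reward + gamma * max(next_q_target)

def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.9):
    """Double DQN target y for one transition.

    next_q_online: Q(S', a; theta) for every action a, from the current network.
    next_q_target: Q'(S', a; theta') for every action a, from the target network.
    """
    # The action is selected with the current (online) network ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... and its value is read from the target network.
    return reward + gamma * next_q_target[best_action]

def squared_td_loss(targets, predictions):
    """Sample estimate of L(theta) = E[(y - Q(S, A; theta))^2] over a mini-batch."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)
```

When the target network overrates an action that the current network does not prefer, `double_dqn_target` returns a smaller value than `dqn_target`, which is the overestimation that the decoupling is meant to suppress.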

Meanwhile, to converge more quickly, the invention uses dueling DQN to optimize the network structure and divides the network into two parts: a value function V(S, θ, α) that depends only on the state, and an advantage function A(S, A, θ, β) that depends on both the state and the action, where θ denotes the parameters shared by the two parts, α the parameters unique to the value function, and β the parameters unique to the advantage function. The Q value is the sum of these two functions:

Q(S，A，θ,α，β)＝V(S，θ,α)+A(S，A，θ，β)

Dueling DQN can better evaluate policies to speed up convergence of the network.
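A minimal sketch of the dueling combination follows, assuming the state value and the per-action term A(S, A, θ, β) have already been produced by the two network heads. The function name and the `subtract_mean` option are illustrative; subtracting the mean of the action-dependent term is common practice in dueling architectures to make the decomposition identifiable, but it is not stated in the source, so it is off by default.

```python
def dueling_q_values(state_value, advantages, subtract_mean=False):
    """Combine V(S) with the per-action term A(S, a) into Q(S, a) = V(S) + A(S, a)."""
    if subtract_mean:
        # Optional centering (assumption, not from the source): it shifts all
        # Q values by the same amount, so the greedy action does not change.
        mean_adv = sum(advantages) / len(advantages)
        advantages = [a - mean_adv for a in advantages]
    return [state_value + a for a in advantages]
```

Since centering shifts every Q value equally, the greedy edge access action is the same with or without it.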

Notably, the edge access policy obtained by the cloud center directly changes the access relation between the edge base station and the equipment, so as to guide the communication resource allocation policy on the edge base station, thereby affecting the learning performance of the system and the energy consumption of the equipment.

Because the system trades learning performance off against energy consumption, devices with different data distributions may access the same edge base station. To handle this, the invention designs a hierarchical federal migration learning strategy that exploits the advantages of transfer (migration) learning. The neural network is divided into a basic feature layer and a personalized feature layer. The basic feature layer holds the features common to most data, and the personalized feature layer captures the properties unique to different data. The hierarchical federal migration learning of the invention is described as follows:

(1) Identifying migration devices: after a certain number of rounds, the invention calculates the average learning accuracy at each edge base station:

The invention regards devices whose accuracy is below the average as devices whose accuracy can be further improved; a device whose data distribution differs from that of most devices in the cluster of its edge base station will necessarily have below-average accuracy. For convenience, these devices are collectively referred to as migration devices, and the other devices are referred to as non-migration devices.

(2) Hierarchical federal migration learning: a non-migration device uploads both its basic feature layer model and its personalized feature layer model to the accessed edge base station. A migration device uploads only its basic feature layer model, and its personalized feature layer model is updated locally on the device, as follows:

The edge base station aggregates the basic feature layer models of all devices to preserve the generalization performance of the model, and aggregates the personalized feature layer models of the non-migration devices only, to eliminate the influence of devices whose data are not independent and identically distributed (non-IID):


(3) The edge base station sends the aggregated basic feature layer model to all access devices and sends the aggregated personalized feature layer model to the non-migration devices. Each device updates its model according to what it receives, and the process iterates until convergence.
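Steps (1) to (3) can be sketched for a single edge base station as follows. The sketch assumes each model is a dictionary with a flat `"base"` and a flat `"personal"` parameter list, and that aggregation is weighted by each device's training data amount D_i, as in the plain aggregation of the claims; the weighting used for the layered aggregation, the data layout and the function names are assumptions made for illustration.

```python
def find_migration_devices(accuracy):
    """Step (1): devices whose accuracy is below the cluster average."""
    mean_acc = sum(accuracy.values()) / len(accuracy)
    return {dev for dev, acc in accuracy.items() if acc < mean_acc}

def weighted_average(vectors, weights):
    """Element-wise average of parameter vectors, weighted by data amount."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[k] for v, w in zip(vectors, weights)) / total
            for k in range(dim)]

def aggregate_layers(models, data_sizes, accuracy):
    """Step (2): aggregate base layers over all devices and personalized
    layers over non-migration devices only.

    models[dev] = {"base": [...], "personal": [...]}
    Returns (base_model, personal_model, migration_devices).
    """
    migration = find_migration_devices(accuracy)
    devices = list(models)
    base = weighted_average([models[d]["base"] for d in devices],
                            [data_sizes[d] for d in devices])
    # Guard against floating-point rounding flagging every device.
    keep = [d for d in devices if d not in migration] or devices
    personal = weighted_average([models[d]["personal"] for d in keep],
                                [data_sizes[d] for d in keep])
    return base, personal, migration
```

In step (3) the returned base model would be sent to every device, while the personalized model would be sent only to devices outside the returned migration set; migration devices keep the personalized layers they trained locally.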

The hierarchical transfer learning strategy proposed by the invention consumes no extra energy. As in traditional federal learning, each device updates every layer of its model during local training; the only difference is at upload time, when a non-migration device uploads every layer, whereas a migration device uploads only the basic feature layer model to the accessed edge base station, which reduces the size of the uploaded model. However, because the basic feature layers account for most of the model, the invention ignores this reduction when calculating energy consumption.

In summary, the invention provides a high-energy-efficiency clustered federal edge learning strategy generation method:

First, to achieve energy-efficient learning in the federal system, the invention takes the learning performance as the system gain and the communication energy consumption as the system cost, and thereby obtains a system benefit function. To study the system benefit optimization problem in the clustered federal edge learning network, the invention jointly considers the heterogeneity of communication conditions and of data, achieves high energy efficiency while guaranteeing learning performance, and formulates the problem as a mixed-integer nonlinear programming (MINLP) problem.

Second, to solve the system benefit maximization problem effectively, the invention observes that once the edge access strategy is determined, the original problem becomes an energy-efficient resource allocation problem. The problem is therefore decomposed into two sub-problems, the edge access problem and the resource allocation problem under a given edge access strategy, and an effective iterative optimization algorithm is designed accordingly. For the edge access sub-problem, the invention uses deep reinforcement learning to explore the edge access strategy, which strengthens the privacy of federal learning data and makes the method better suited to nonlinear model encryption algorithms. For the resource allocation sub-problem, a convex optimization algorithm is used to solve the resource allocation strategy in order to reduce the complexity of the algorithm.

Finally, because of the trade-off on energy consumption, devices with different data distributions may access the same base station for joint training. For this situation, the invention provides a hierarchical federal migration learning strategy that further improves the learning accuracy without extra energy consumption.

The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may be modified or some technical features may be replaced with others, which may not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. The energy-efficient clustered federal edge learning strategy generation method is characterized by comprising the following steps of:

S1, initializing an edge access strategy by a cloud center;

S6, repeating the process until convergence.

2. The method of claim 1, wherein in step S1 the access policy a_ij of a device to an edge base station is a binary variable, i.e. a_ij = 1 if device i communicates with edge base station j and a_ij = 0 otherwise, and each device accesses one edge base station.

3. The method for generating the clustered federal edge learning strategy with high energy efficiency according to claim 1, wherein the convex optimization method in step S2 specifically comprises: for edge base station j and its cluster of access devices, given an edge access policy, the optimal bandwidth allocation β_ij of the resource allocation sub-problem is calculated as follows:

where h_ij denotes the channel gain between device i and edge server j, p_i denotes the model upload power of device i, N₀ denotes the power spectral density of the Gaussian noise, β_ij B_j is the bandwidth allocated to device i when it accesses edge base station j, the devices accessing edge base station j communicate over a shared spectrum of bandwidth B_j, a_ij represents the access policy between device i and edge base station j, β_ij represents the fraction of bandwidth allocated to device i, S represents the size of the local model, and N_j is the number of devices accessing edge server j.

4. The method for generating an energy-efficient clustered federal edge learning strategy according to claim 1, wherein in step S3 the device performs training using its local data according to the received global model θ_j, and the loss function of device i is as follows:

where x_in is the n-th stored sample of device i, y_in is the corresponding label of x_in, and D_i is the amount of training data of device i;

where η is the learning step size and η ≥ 0;

in step S3, after training for a certain number of rounds, the local model is trained with a hierarchical federal migration learning strategy in which the neural network is divided into a basic feature layer and a personalized feature layer, and the specific process of the hierarchical federal migration learning strategy is as follows:

where a_ij is the access policy between device i and edge base station j, and g_ij is the learning accuracy of the global model on the local test data set;

The symbols above denote the local personalized feature layer model of device i and the set of migration devices, respectively;

5. The method for generating the clustered federal edge learning strategy with high energy efficiency according to claim 1, wherein in step S3 the learning accuracy g_ij of the global model on the local test data set is used as the index for measuring the performance of the global model at edge base station j, and the learning performance gain G of the system is the average accuracy over all devices, given by the following formula:

where a_ij is the access policy between device i and edge base station j, g_ij is the learning accuracy of the global model on the local test data set, N is the number of devices, and M is the number of edge base stations.

6. The method for generating an energy-efficient clustered federal edge learning strategy according to claim 1, wherein the energy E_ij consumed by device i to upload the local model in step S3 is as follows:

where h_ij denotes the channel gain between device i and edge server j, p_i denotes the model upload power of device i, N₀ denotes the power spectral density of the Gaussian noise, β_ij B_j is the bandwidth allocated to device i when it accesses edge base station j, the devices accessing edge base station j communicate over a shared spectrum of bandwidth B_j, a_ij represents the access policy between device i and edge base station j, and β_ij represents the fraction of bandwidth allocated to device i.

7. The method for generating an energy-efficient clustered federal edge learning strategy according to claim 1, wherein in step S4, the edge base station aggregates all received local models prior to hierarchical aggregation, as follows:

Here, the device cluster is the set of all devices accessing edge base station j, ω_i is the local model of device i, and D_i is the training data amount of device i;

8. The method for generating an energy-efficient clustered federal edge learning strategy according to claim 1, wherein the formula of the system benefit function in step S5 is as follows:

9. The method for generating the clustered federal edge learning strategy with high energy efficiency according to claim 1, wherein in step S5, the edge access strategy is adjusted by deep reinforcement learning, and the specific process of the deep reinforcement learning is as follows:

S_i(k)＝{A_i(k-1),β_ij(k),Δ_i(k)}

A(k)＝{A₁(k),A₂(k),…,A_N(k)}

where each term A_i(k) can be expressed as:

A_i(k)＝{a_ij(k)}

(3) Reward: the reward is set to the objective function:

S502, selecting DQN as the basic framework, combining dueling DQN and double DQN to optimize the algorithm, and using D3QN to solve the edge access problem, wherein the Q-value function Q(S, A; θ) is approximated by a neural network with parameters θ and represents the mapping between the environment and the actions, and the output of the neural network is obtained from the Bellman equation:

L(θ)＝E[(y-Q(S,A；θ))²]

y＝R(S,A)+γQ'(φ(S'),A^max(S'；θ)；θ')

Q(S,A,θ,α,β)＝V(S,θ,α)+A(S,A,θ,β)。

10. An energy-efficient clustered federal edge learning strategy generation device, comprising a computer memory, a computer processor, and a computer program stored in the computer memory and executable on the computer processor, wherein the computer processor implements the energy-efficient clustered federal edge learning strategy generation method of any one of claims 1-9 when executing the computer program.