CN117252253A - Client selection and personalized privacy protection method in asynchronous federated edge learning
- Publication number
- CN117252253A (application CN202310864644.4A)
- Authority
- CN
- China
- Prior art keywords
- client
- model
- global
- local
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 80
- 230000002776 aggregation Effects 0.000 claims abstract description 69
- 238000004220 aggregation Methods 0.000 claims abstract description 69
- 230000006870 function Effects 0.000 claims abstract description 54
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 41
- 230000002787 reinforcement Effects 0.000 claims abstract description 12
- 238000012549 training Methods 0.000 claims description 83
- 230000009471 action Effects 0.000 claims description 79
- 230000006854 communication Effects 0.000 claims description 62
- 238000004891 communication Methods 0.000 claims description 61
- 230000008569 process Effects 0.000 claims description 52
- 239000003795 chemical substances by application Substances 0.000 claims description 39
- 230000000875 corresponding effect Effects 0.000 claims description 12
- 238000013461 design Methods 0.000 claims description 10
- 238000009826 distribution Methods 0.000 claims description 10
- 238000013528 artificial neural network Methods 0.000 claims description 9
- 238000004364 calculation method Methods 0.000 claims description 9
- 239000000872 buffer Substances 0.000 claims description 8
- 230000001186 cumulative effect Effects 0.000 claims description 6
- 230000008901 benefit Effects 0.000 claims description 5
- 238000013527 convolutional neural network Methods 0.000 claims description 5
- 238000004458 analytical method Methods 0.000 claims description 4
- 230000005540 biological transmission Effects 0.000 claims description 4
- 230000008859 change Effects 0.000 claims description 4
- 238000003860 storage Methods 0.000 claims description 4
- 230000007613 environmental effect Effects 0.000 claims description 3
- 238000003062 neural network model Methods 0.000 claims description 3
- 238000002360 preparation method Methods 0.000 claims description 3
- 238000005457 optimization Methods 0.000 abstract description 8
- 238000010801 machine learning Methods 0.000 abstract description 2
- 230000001360 synchronised effect Effects 0.000 abstract description 2
- 238000012360 testing method Methods 0.000 description 13
- 238000002474 experimental method Methods 0.000 description 9
- 230000007246 mechanism Effects 0.000 description 7
- 230000000694 effects Effects 0.000 description 4
- 230000003139 buffering effect Effects 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000011156 evaluation Methods 0.000 description 3
- 238000012937 correction Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 230000014509 gene expression Effects 0.000 description 2
- 238000012544 monitoring process Methods 0.000 description 2
- 230000035945 sensitivity Effects 0.000 description 2
- 230000004913 activation Effects 0.000 description 1
- 230000032683 aging Effects 0.000 description 1
- 238000012885 constant function Methods 0.000 description 1
- 238000005520 cutting process Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012805 post-processing Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000004088 simulation Methods 0.000 description 1
- 230000001960 triggered effect Effects 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/098—Distributed learning, e.g. federated learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0407—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the identity of one or more communicating identities is hidden
- H04L63/0414—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the identity of one or more communicating identities is hidden during transmission, i.e. party's identity is protected against eavesdropping, e.g. by using temporary identifiers, but is known to the other party or parties involved in the communication
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention discloses a client selection and personalized privacy protection method in asynchronous federated edge learning, relating to the field of distributed machine learning. If a client's model staleness is greater than the set threshold, the client is forcibly synchronized with the current global model parameters. In constructing the optimization objective problem, the objective is model convergence, and the constraints include differential privacy guarantees, resource limitations, and a model staleness threshold. When solving the optimization problem, a deep reinforcement learning algorithm is used to construct a reward-penalty function, and the optimal number of clients participating in aggregation and the noise added at the clients in each round are obtained through repeated feedback.
Description
Technical Field
The invention relates to the field of distributed machine learning, and in particular to a client selection and personalized privacy protection method in asynchronous federated edge learning.
Background
As research on federated learning in edge computing environments deepens, researchers refer to this setting as federated edge learning: local clients cooperatively train a global model while keeping their data local, achieving the goal of "the data stays put while the model moves; the data is usable but not visible." In addition, since the importance of data keeps growing and data owners' awareness of privacy protection is steadily increasing, privacy-preserving computation has been strongly promoted and developed in recent years, and federated edge learning provides strong support for privacy-computing scenarios.
There are many challenges in federated edge learning application scenarios; the present invention mainly considers and solves the following problems:
The straggler problem. Adopting a synchronous federated learning training process leads to inconsistent completion times due to edge heterogeneity, so some clients become stragglers, which increases the waiting time of the edge server and reduces system efficiency.
Resource limitation. Most local training clients are mobile or Internet-of-Things devices with limited energy; moreover, since wireless communication is mostly used, bandwidth resources may be insufficient if too many clients upload data at the same time. Besides energy and bandwidth, other resources such as storage and computation are also limited.
The model parameter leakage problem. The model parameters may suffer internal collusion attacks, external malicious attacks, or other attacks during transmission, through which privacy information related to the client can be obtained.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a client selection and personalized privacy protection method in asynchronous federated edge learning.
The technical scheme adopted to solve the technical problems is as follows: the client selection and personalized privacy protection method in asynchronous federated edge learning is characterized by comprising an edge server and N clients, wherein the client set is V = {v_1, v_2, ..., v_N}, and each client v_i (i ∈ {1, 2, ..., N}) holds a personal private data set D_i of size |D_i|;
in the federated learning process, the local learning model of each client is F_i(w), and the private data set D_i is used to perform local training and local updates;
the global update process of the edge server adopts an asynchronous aggregation scheme; clients that do not participate in aggregation in time produce stale models, so each client has a staleness parameter τ_i, i.e., the difference between the current communication round and the communication round in which the global parameters were last received, which is calculated and stored by the edge server before each global aggregation;
each communication round starts with sharing the global model parameters, so at t = 0 the edge server initializes the global parameters w_0 and a model staleness list with the model staleness of each client τ_i = 0, then broadcasts the global parameters to all clients for initial synchronization and informs the clients to start local training;
after receiving the global model, a client performs a local update using the SGD algorithm to obtain local model parameters w_i^t; to prevent the parameters from becoming too large, they are clipped with upper bound C, and Gaussian noise is added to obtain the noisy parameters, so that the parameters cannot be stolen during transmission and used to infer the original data information;
after the client completes the local computation, the result is immediately uploaded to the edge server over a wireless link.
Preferably, the edge server first performs pre-aggregation preparation and applies delay compensation to the received local noisy model parameters,
where β_i = Z(τ_i);
when the number of model parameters temporarily stored in the buffer reaches M, global aggregation is carried out immediately,
and when the data volume of each client is the same, the aggregation reduces to a simple average;
after the aggregation is completed, the model staleness list needs to be updated; if a client did not participate in the current round of aggregation, τ_i = τ_i + 1;
if the model staleness is greater than the threshold, the client needs to synchronize to the current global model parameters;
because system resources and the privacy budget are limited, the consumed privacy budget and resources must be calculated in every communication round; if they exceed the set thresholds, training is terminated;
if resources remain, model training continues, and the current global model parameters are shared with the aggregated clients and with the clients exceeding the staleness threshold;
then the next communication round is entered for iteration.
Preferably, for client selection and personalized privacy protection, the A3C algorithm is selected to design the policy that executes the corresponding action according to the obtained state, so that the proposed framework can make the global model converge quickly while protecting the privacy of the model parameters.
Preferably, the original federated edge learning system consists essentially of two parts, a local update process and a global update process;
a DRL-based learning system adaptively selects two parameter values: the client participation proportion α_t (t ∈ {1, ..., T}) and the personalized privacy budget ε_i of a single client.
Preferably, federated edge learning is combined with a deep reinforcement learning framework: a global network is deployed at the edge server and sub-networks are deployed at the clients, each network containing an Actor-Critic (AC) network realized as a neural network model; each sub-network interacts with the environment to obtain the environment state and then executes the corresponding action according to the policy to obtain the reward fed back by the environment, while the global network does not interact with the environment directly.
Preferably, the privacy budget decision algorithm:
at each client, Gaussian noise is added after local model training to achieve privacy protection of the model parameters;
because the total privacy budget is limited, the noise added after each training round is also limited;
parameters that contribute more to model training should have more noise added; when making decisions using DRL, the corresponding parameters need to be designed according to the objectives.
Preferably, the DRL model:
the environment of the DRL system refers to the designed federated edge learning framework, comprising the global learning model and the local learning models;
the agent represents each client, which interacts with the environment by performing actions;
the state s_i^t is a feature vector describing the state of the agent at time t;
the action a_i^t represents the privacy budget value that client v_i decides to consume at time t; given the current state, the DRL agent performs an action based on a policy, denoted π(a_i^t | s_i^t);
the reward r_i^t indicates the feedback the agent receives from the environment after action a_i^t is executed at time t, and is used to judge whether the action is good or bad;
at each communication round, the policy network receives the state s_{t-1} of the previous moment and outputs the probability of each action; this mapping from the state space S to the action space A is called the policy π, typically realized with a convolutional neural network whose output layer is a softmax;
an action a_i^t is then selected from the action space according to the policy π;
next, the agent observes the state at the current time s_i^t and the reward value r_i^t;
the goal of the agent is to maximize the expected return by selecting the best action according to the policy.
Through analysis of the designed modules, the DRL network structure is realized with the asynchronous advantage actor-critic algorithm; at each client, a composite neural network is deployed that takes the client's current state as input and outputs the policy π and the state value function V(s); the actor network decides the privacy budget the client uses, and the critic network evaluates the benefit of taking the current action.
Preferably, the sub-network states and rewards:
after a client finishes local training, the local sub-network agent acquires the current state s_i^t by interacting with the local environment, where t represents the current communication round, w^t represents the global model parameters of the current communication round, w_i^t represents the local model parameters after training with the local private data, ε_t represents the privacy budget remaining in the current communication round, and D_t represents the remaining resource budget of the current communication round;
the action to be performed at the client is to select an appropriate privacy budget as needed, assuming ε_i ∈ {ε_j}, j ∈ [1, J], where J is the number of actions in the local action space, set to discrete values; at time t of each communication round the client carries out the privacy budget decision process, executing the actions in the current state to obtain the corresponding rewards, with the initial assignment made at j = 1;
the agent selects actions according to a policy, expressed as π(a_{i,j} | s_i^t), which is a probability distribution over actions; a neural network with policy parameter θ is used to represent the policy, so the policy can be expressed as π_θ(a_{i,j} | s_i^t), denoting the probability of executing action a_{i,j} in state s_i^t;
after the critic network observes that action a_{i,j} has been executed, it calculates the feedback reward value r_{i,j}; the reward indicates whether the current state is good or bad, so the designed reward is related to the change of the model parameters, the resource consumption and the privacy budget consumption,
where the first part indicates the difference between the local model parameters before and after the update performed by action a_{i,j}; the smaller the difference, the greater the benefit brought by the current action and the larger the feedback reward; the second part represents the environmental impact of the change in resource consumption: the more resources g_{i,j} the local computation consumes, the smaller the feedback reward; the goal of the local update is to obtain the current action that maximizes the cumulative return and then upload the local update parameters to the global network, where the cumulative return is calculated as follows:
with discount factor γ ∈ (0, 1), where q is the time-step index value at which the current time step j reaches the terminal state.
Preferably, model training:
since the A3C framework is used for model training in the system, a master agent is created to manage the global network, several sub-agents manage the local networks, and asynchronous parallel training is carried out among the sub-agents; in the local update, the actor network selects actions according to the policy π_θ(a_{i,j} | s_i^t), and the critic network estimates the state value function V(s_i^t; θ_v), where θ is the policy parameter and θ_v is the state value function parameter; the state value function is estimated by the neural network as follows,
the policy and value functions are updated after J actions have been taken or when a terminal state is reached; the local update process updates the policy function and the estimated state value function using the advantage function A(s_i^t, a_{i,j});
thus, the loss functions for the model update can be obtained: the value function loss is the mean squared error of the advantage function,
and the policy function loss is
where H is the entropy of the policy distribution; the accumulated gradients are then used to update the policy parameter θ and the value function parameter θ_v.
Preferably, the client quantity decision algorithm:
after the asynchronous update of a local client is completed, the updated parameters are uploaded to the edge server for global updating to obtain new global parameters; if a new client starts training, it acquires parameters from the global network for its next local sub-network update; before the next communication round starts, the edge server decides the number of clients participating in the next round of aggregation;
an A3C global network is deployed on the edge server side, and the master agent acquires the current global state s_t, which comprises the current communication round t, the local model parameters uploaded by each client, and the remaining resource budget D_t;
at time t, the global network in the edge server selects an action according to the policy; the action to be executed is denoted a_j, where j is the time-step index within one stage, in order to select a proper number of clients to participate in the global aggregation;
the critic network calculates the reward according to the executed action; the reward is set in relation to the convergence of the current global model, where F* represents the optimal value when the global model converges and ΔF_t represents the difference between the current loss value and the optimal loss value F*, i.e., ΔF_t = F* − F_t;
the local model parameters uploaded by each client at this moment are the parameters with Gaussian noise added; the corrected noisy parameters are used when global aggregation is performed;
the policy network parameter of the global network is θ′ and the state value network parameter is θ′_v; during client selection, the initial parameters are set to the aggregated global network parameters; after the global network is set up, a global update is performed, with an update procedure similar to that of the local network; finally, the number of clients that maximizes the expected return is selected to carry out the global aggregation of the asynchronous federated learning system.
The invention has the following advantages:
a new resource-constrained, privacy-preserving asynchronous aggregation mechanism is provided that offers personalized privacy protection for clients in a resource-limited federated edge learning system so that the training model converges quickly; a new objective optimization problem is constructed with the goal of making the model converge quickly, limiting the noise added by each client so that the consumed privacy budget is less than the total budget and the consumed computing and communication resources are less than the total resource budget; the optimization problem is solved with the A3C algorithm by constructing an interactive environment and agent entities, setting state parameters and action objectives, and searching for the appropriate number of clients participating in aggregation in each round and the noise scale added for each client.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is an asynchronous federal edge learning system according to an embodiment of the present invention;
FIG. 2 is a block diagram of a system according to an embodiment of the present invention;
FIG. 3 is a model aging decay function of an embodiment of the present invention;
FIG. 4 is a deep reinforcement learning asynchronous federation architecture according to an embodiment of the present invention;
fig. 5 shows DRL training results according to an embodiment of the present invention: (a) loss versus training period; (b) reward versus training period;
fig. 6 illustrates the aggregate client number impact without privacy protection according to an embodiment of the present invention: (a) testing accuracy; (b) test loss;
FIG. 7 illustrates the aggregate client quantity impact of an embodiment of the present invention: (a) testing accuracy; (b) test loss;
FIG. 8 is a comparison of different algorithms according to an embodiment of the present invention: (a) testing accuracy; (b) testing for loss.
Detailed Description
The present invention will be further described in detail with reference to the drawings and examples, which are only for the purpose of illustrating the invention and are not to be construed as limiting the scope of the invention.
Examples
The symbols and definitions used in this scheme are shown in table 1.1.
1 System model
1.1 System frame
Assume an asynchronous federated edge learning system as shown in FIG. 1, with one edge server and N clients; the client set is V = {v_1, v_2, ..., v_N}. Each client v_i (i ∈ {1, 2, ..., N}) holds a personal private data set D_i of size |D_i|. In the federated learning process, the local learning model of each client is F_i(w), and the private data set D_i is used for local training and local updates. The global update process of the edge server adopts an asynchronous aggregation scheme; clients that do not participate in aggregation in time produce stale models, so each client has a staleness parameter τ_i, i.e., the difference between the current communication round and the communication round in which the global parameters were last received, which is calculated and saved by the edge server before each global aggregation.
Table 1.1 list of symbols mainly used
Viewed modularly, the differentially private federated edge system mainly comprises five modules: system initialization, client selection, asynchronous federated training, personalized differential privacy protection, and resource monitoring. The model composition is shown in Figure 1.
1) System initialization
The initialization of the federated edge system is mainly completed by the edge server. First, the global model is initialized and the global model parameters w_0 are broadcast to all N clients. Then the global privacy budget ε and the resource budget D_k of the k-th class of resources are set; during the learning process, if the consumed resources exceed the resource budget, the system terminates. In addition, the parameter clipping threshold C is also broadcast to all clients. The edge server also sets an initial model staleness threshold τ_0; if the model staleness of a client exceeds this threshold, the global parameters updated in the current round are sent to that client for forced synchronization, and a new communication round is entered for retraining.
2) Client selection
To reduce the delay at the edge server when a large number of clients participate in global aggregation, and to reduce the resource consumption caused by frequent communication when a single client performs global aggregation, it is very important to select an appropriate number of clients to participate in global aggregation. This scheme adopts a buffered asynchronous aggregation mode: the server sets up a buffer and caches the received local parameters before global aggregation until the aggregation condition is triggered and a global update is performed. Assume the buffer length is L (l ∈ [1, L]); each communication round selects an appropriate number of clients M (M ≤ L) or client proportion α_t (α_t = M/N) according to the model training results.
3) Asynchronous federal training
The asynchronous federated training module comprises the local asynchronous parallel training of the clients and the global buffered asynchronous aggregation process of the edge server. Client v_i uses its local private data set D_i for local training according to the stochastic gradient descent (SGD) algorithm to obtain the local model parameters w_i^t; the update process is as follows:
where η is the learning rate and the index denotes the local iteration step; after the specified number of local iterations is completed, w_i^t is uploaded to the edge server over a wireless link.
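As an illustration of the local update step described above, the following is a minimal PyTorch sketch of one client's local training; the loop structure, names, and the number of local steps are assumptions for illustration, not the claimed implementation.

```python
import torch

def local_sgd_update(model, loss_fn, data_loader, lr, local_steps):
    """Sketch of client v_i's local update: plain SGD on the private data set D_i."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)    # eta: learning rate
    step = 0
    for x, y in data_loader:                                  # mini-batches of D_i
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()                       # gradient of F_i(w)
        optimizer.step()                                      # w <- w - eta * grad
        step += 1
        if step >= local_steps:                               # finish the local iterations
            break
    return [p.detach().clone() for p in model.parameters()]   # w_i^t, to be uploaded
```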
After the edge server receives the local parameters, it calculates the model staleness of the client; if the model is stale, it is corrected according to the delay compensation mechanism to obtain the corrected model parameters,
where β ∈ (0, 1] is a function of the model staleness and w^{t-1} denotes the saved global model parameters of the previous round. Details of the delay compensation mechanism are given in Section 1.2.
When the edge server buffer holds M local parameters, global aggregation is carried out immediately, using the federated averaging method:
where p_i represents the proportion of the data volume of client v_i.
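The following sketch illustrates the server-side aggregation just described: each buffered local parameter vector is first attenuated towards the previous global parameters with weight β_i = Z(τ_i), and the buffer is then combined by federated averaging with data-size weights p_i. The exact compensation formula is not reproduced in the text above, so the convex-combination form used here is an assumption.

```python
import torch

def delay_compensate(w_local, w_global_prev, beta):
    """Assumed compensation: pull a stale update towards the last global model by (1 - beta)."""
    return beta * w_local + (1.0 - beta) * w_global_prev

def buffered_fedavg(buffer, w_global_prev, Z):
    """buffer: list of (w_i, tau_i, n_i) with noisy parameters, staleness and data size."""
    total = sum(n_i for _, _, n_i in buffer)
    w_new = torch.zeros_like(w_global_prev)
    for w_i, tau_i, n_i in buffer:
        w_corr = delay_compensate(w_i, w_global_prev, Z(tau_i))
        w_new += (n_i / total) * w_corr            # p_i = n_i / sum_j n_j
    return w_new
```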
4) Personalized differential privacy protection
Because the updated parameters are transmitted over wireless links in the federated edge learning system, there is still a risk of privacy leakage. Therefore, this scheme adopts differential privacy to protect the transmitted parameters; differential privacy has several useful properties, such as composability and post-processing immunity, and is well suited to the federated edge learning system in this scheme. We use (ε, δ)-DP, defined as follows:
Definition 1 ((ε, δ)-DP). A random mechanism M satisfies (ε, δ)-DP if, for any two adjacent databases D and D′ and any subset S of the output space,
Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ,
where ε represents the privacy budget and δ represents the probability of failing to meet strict differential privacy.
To ensure (ε, δ)-DP, the model parameters are perturbed by a Gaussian mechanism, mainly by adding Gaussian noise with mean 0 and standard deviation σ, i.e.,
The adopted Gaussian mechanism guarantees (ε, δ)-differential privacy when ε ∈ (0, 1) and σ ≥ c·Δ₂f/ε, with constant c ≥ √(2 ln(1.25/δ)), where Δ₂f is the L₂ sensitivity of the function f.
To protect the privacy of the model parameters in the system, the global privacy budget is set to ε. Assuming the clients exactly exhaust the privacy budget after T rounds of training, the relationship Σ_{t=1}^{T} ε_t = ε holds, where ε_t is the privacy budget consumed in each round. If the global privacy budget is evenly distributed, the average privacy budget consumed per round is ε/T, and the privacy budget consumed by each client in this case is given by formula (1.4).
According to formula (1.4), if client v_i has more valid data, it is allocated a lower privacy budget ε_i, so that v_i enjoys stronger privacy protection. In this scheme, assuming all clients consume different privacy budgets, the privacy budgets are set to {ε_i}. According to the composition theory of differential privacy, the privacy budget consumed in each communication round is ε_t = max{ε_i}. For each client v_i, if its parameters contribute more to the model, more noise is added to prevent leakage.
In addition, to prevent problems caused by excessively large parameters, the parameters are clipped each time local training is completed, with clipping threshold C, i.e., ‖w_i‖ ≤ C. According to the sensitivity definition, the standard deviation of the Gaussian noise is σ_i = c·Δ₂f/ε_i, i.e.,
For asynchronous federated training, an appropriate privacy budget (Gaussian noise standard deviation) must be allocated to each client for personalized privacy protection. However, during the training process, if the consumed privacy budget exceeds the total budget, training is terminated immediately.
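A small sketch tying together the clipping bound and the Gaussian mechanism above: the per-client noise standard deviation follows σ_i = c·Δ₂f/ε_i with c = √(2 ln(1.25/δ)). Treating the clipping threshold C as the L₂ sensitivity here is an assumption for illustration.

```python
import math
import torch

def noise_std(epsilon_i: float, delta: float, sensitivity: float) -> float:
    """sigma_i = c * Delta_2 f / epsilon_i, with c = sqrt(2 ln(1.25/delta)), for epsilon_i in (0, 1)."""
    c = math.sqrt(2.0 * math.log(1.25 / delta))
    return c * sensitivity / epsilon_i

def clip_and_perturb(params, clip_bound: float, sigma: float):
    """Clip each parameter tensor to L2 norm <= C, then add Gaussian noise with std sigma."""
    noisy = []
    for w in params:
        norm = w.norm(p=2)
        if norm > clip_bound:
            w = w * (clip_bound / norm)               # enforce ||w_i|| <= C
        noisy.append(w + torch.randn_like(w) * sigma)
    return noisy

# Example: a smaller per-client budget epsilon_i yields a larger noise standard deviation.
sigma_i = noise_std(epsilon_i=0.5, delta=1e-5, sensitivity=1.0)
```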
5) Resource monitoring
In an asynchronous federated edge learning system, the training process consumes many resources, such as computing resources and communication resources. Edge servers typically have sufficient resources, so the calculation of their resource consumption is ignored. Clients, however, each have different resources because of system heterogeneity; during the training process, if a client's resources are exhausted, it loses contact with the edge server.
Assume there are K types of resources (such as energy, network bandwidth, etc.) in the learning system. For each resource type k ∈ {1, 2, ..., K}, let g_k represent the resource consumption of a client in one local update, b_k the resource consumption of a single exchange of model parameters between the edge server and a client, and D_k the global resource budget. Thus, after T global aggregations, the resource consumption from local updates is g_k·N·T and the resource consumption from model exchange is 2·b_k·N·T.
During the federated training process, the edge server monitors resources at all times and checks the resource consumption after each communication round; if the consumed resources are less than the global resource budget, training continues. That is, the following condition must be satisfied during training:
(g_k + 2b_k)·N·t ≤ D_k (1.7)
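A direct rendering of the resource check in (1.7), assuming per-round consumption figures g_k and b_k are known for each resource type; the example resource types and values are illustrative.

```python
def within_resource_budget(t: int, n_clients: int, g: dict, b: dict, D: dict) -> bool:
    """Check (g_k + 2*b_k) * N * t <= D_k for every resource type k (energy, bandwidth, ...)."""
    return all((g[k] + 2 * b[k]) * n_clients * t <= D[k] for k in D)

# Example with two illustrative resource types.
ok = within_resource_budget(t=10, n_clients=10,
                            g={"energy": 0.5, "bandwidth": 0.1},
                            b={"energy": 0.2, "bandwidth": 0.3},
                            D={"energy": 500.0, "bandwidth": 400.0})
```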
1.2 delay Compensation mechanism
Because the federated edge learning system proposed in this scheme performs federated training asynchronously, clients may suffer from the model staleness problem. In the global aggregation process, the received local model parameters are corrected according to the model staleness of each client, in the manner shown in (1.2), where β = Z(·) is the delay attenuation coefficient, a function of the model staleness.
Assuming the client's model staleness is τ, when τ = 0 the aggregation process does not need to attenuate the model, i.e., Z(0) = 1; when τ is small the attenuation should be slow, and the larger τ is, the faster the attenuation.
Based on this property, we design a bell-shaped curve, shown in Fig. 3, whose functional expression is as follows,
as the model attenuation factor, where the hyperparameter adjusts the attenuation speed.
The remaining functional expressions in the figure are given below.
Some common staleness functions Z(τ) are listed below, where x, y > 0:
Constant function:
Z(τ) = 1 (1.9)
Polynomial function:
Z_x(τ) = (τ + 1)^(−x) (1.10)
Piecewise function
Exponential function:
Z_x(τ) = e^(−xτ) (1.12)
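The listed staleness functions can be sketched as follows; the bell-shaped curve of (1.8) and the piecewise form shown in the figure are not fully reproduced in the text above, so only the constant, polynomial, and exponential variants appear here.

```python
import math

def z_constant(tau: int) -> float:
    return 1.0                                    # Z(tau) = 1

def z_polynomial(tau: int, x: float) -> float:
    return (tau + 1) ** (-x)                      # Z_x(tau) = (tau + 1)^(-x)

def z_exponential(tau: int, x: float) -> float:
    return math.exp(-x * tau)                     # Z_x(tau) = e^(-x * tau)

# All variants satisfy Z(0) = 1 and decay as the staleness tau grows.
assert z_constant(0) == z_polynomial(0, 0.5) == z_exponential(0, 0.5) == 1.0
```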
After each communication round ends, the model staleness of each client is checked; if τ_i is greater than the set staleness threshold τ_0, the client is forced to synchronize: the current global model parameters are sent to the client, and training is then restarted, i.e.,
1.3 problem Structure
According to the asynchronous federated edge learning scheme proposed above, our main objective is to complete the privacy protection of the system's model parameters under resource constraints and finally make the global model converge, so the optimization problem is constructed as follows:
During model training, several constraints must be satisfied. The first constraint indicates that (ε, δ)-differential privacy can be guaranteed after noise is added to the local model parameters; the second constraint indicates that the privacy budget consumed during T rounds of communication is less than the set total budget; the third constraint is the resource constraint, meaning that the resources consumed by local computation and model exchange during T rounds of communication are less than the set total resource budget; the fourth constraint is the model staleness constraint, meaning that the model staleness of each client cannot be greater than the staleness threshold.
Among all the constraints, the privacy budget ε_i consumed by each client and the number of aggregated clients M_t in each communication round are not fixed, and these two parameters greatly affect the model convergence speed, so ε_i and M_t need to be estimated and optimized so that the model converges faster and achieves higher accuracy.
In general, finding the optimal solution to the above problem is NP-hard, so we consider using a deep reinforcement learning algorithm to solve the optimization problem.
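Based on the constraints described above, the optimization problem can plausibly be written as follows; this is a sketch assembled from the prose, since the exact notation of the original formulation is not reproduced in the text.

```latex
\begin{aligned}
\min_{\{\epsilon_i\},\,\{M_t\}}\quad & F(w^{T}) \\
\text{s.t.}\quad
 & \tilde{w}_i^{t}\ \text{satisfies}\ (\epsilon_i,\delta)\text{-DP}, && \forall i,\,t,\\
 & \textstyle\sum_{t=1}^{T}\epsilon_t \le \epsilon, && \epsilon_t=\max_i \epsilon_i,\\
 & (g_k+2b_k)\,N\,T \le D_k, && \forall k\in\{1,\dots,K\},\\
 & \tau_i \le \tau_0, && \forall i.
\end{aligned}
```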
2 asynchronous aggregation algorithm with limited resources and privacy protection
This scheme focuses on an asynchronous aggregation algorithm with limited resources and privacy protection. From the system model, the main computation is performed by the client and edge-server entities, and each communication round consists of two main steps: local training and global aggregation.
Each communication round starts with sharing the global model parameters, so at t = 0 the edge server initializes the global parameters w_0 and the model staleness list with τ_i = 0 for each client, then broadcasts the global parameters to all clients for initial synchronization and informs the clients to start local training.
After receiving the global model, a client performs a local update using the SGD algorithm to obtain local model parameters w_i^t; to prevent the parameters from becoming too large, they are clipped with upper bound C, and Gaussian noise is added to obtain the noisy parameters, preventing the parameters from being stolen during transmission and used to infer the original data information. After the client completes the local computation, the result is immediately uploaded to the edge server over a wireless link.
The edge server first performs pre-aggregation preparation and applies delay compensation to the received local noisy model parameters,
where β_i = Z(τ_i).
When the number of model parameters temporarily stored in the buffer reaches M, global aggregation is carried out immediately,
and when the data volume of each client is the same, the aggregation reduces to a simple average.
After the aggregation is completed, the model staleness list needs to be updated; if a client did not participate in the current round of aggregation, τ_i = τ_i + 1. If a client's model staleness is greater than the threshold, the client needs to synchronize to the current global model parameters. Because system resources and the privacy budget are limited, the consumed privacy budget and resources must be calculated in every communication round; if they exceed the set thresholds, training is terminated. If resources remain, model training continues, and the current global model parameters are shared with the aggregated clients and the clients exceeding the staleness threshold. Then the next communication round is entered for iteration.
The system training process is shown in Algorithm 1.1.
Algorithm 1.1: Asynchronous aggregation algorithm with limited resources and privacy protection
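Since the body of Algorithm 1.1 is not reproduced in the text, the following is a hedged sketch of the server-side training loop assembled from the steps described above (buffering, delay compensation, staleness bookkeeping, forced synchronization, and budget checks); helper names such as broadcast, receive_update, and aggregate are placeholders, not the claimed implementation.

```python
def asynchronous_aggregation(server, clients, M, tau_0, T_max, Z):
    """Sketch of Algorithm 1.1: resource-limited, privacy-preserving buffered asynchronous aggregation."""
    w_global = server.init_global_model()
    staleness = {c.id: 0 for c in clients}            # model staleness list, tau_i = 0
    server.broadcast(w_global, clients)                # initial synchronization, start local training
    for t in range(1, T_max + 1):
        buffer = []
        while len(buffer) < M:                         # wait until M local updates are buffered
            update = server.receive_update()           # clipped, noised local parameters
            beta = Z(staleness[update.client_id])      # delay attenuation coefficient
            buffer.append(server.compensate(update, w_global, beta))
        w_global = server.aggregate(buffer)            # federated averaging over the buffer
        aggregated = {u.client_id for u in buffer}
        for cid in staleness:                          # update the model staleness list
            staleness[cid] = 0 if cid in aggregated else staleness[cid] + 1
        forced = {cid for cid, tau in staleness.items() if tau > tau_0}
        if server.privacy_budget_exhausted(t) or server.resources_exhausted(t):
            break                                      # budgets exceeded: training is terminated
        server.send(w_global, aggregated | forced)     # share the model; force-sync stale clients
    return w_global
```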
3 client selection and personalized privacy protection
In this section, we briefly introduce the deep reinforcement learning (DRL) technique and then use it to solve the optimization problem described above. To fit the federated edge learning system, we choose the A3C algorithm to design the policy that executes the corresponding action according to the obtained state, so that the proposed framework can make the global model converge quickly while protecting the privacy of the model parameters.
3.1 Design approach
The original federated edge learning system mainly includes two parts, the local update process and the global update process, as shown in Fig. 1. We mainly address, in turn, two problems that occur in these two processes. 1) How much noise (i.e., how much privacy budget consumption) needs to be added in the local update so that the model parameters remain usable while user privacy is preserved? 2) How many clients' local model parameters need to be received for asynchronous aggregation in the global update process so that the model converges quickly while the waiting time of the edge server is reduced?
The two problems above correspond, in the system's module composition, to the client selection module and the personalized differential privacy protection module, so these two modules are designed next to solve them. 1) For the personalized differential privacy protection module: the data of different clients are distributed in a non-IID manner and the importance of each client's data differs, so it is necessary to design different noise privacy budgets to provide personalized privacy protection for the clients. 2) For the client selection module: the global update should have a short waiting time, allow the model to converge quickly, and at the same time reduce communication resource consumption, so different clients are adaptively selected to participate in global aggregation in each communication round of the training process. Through this module design, the federated edge system finally achieves fast model convergence under the constraints of limited resources and privacy protection without sacrificing too much accuracy. Through this analysis, we design a DRL-based learning system that adaptively selects two parameter values: the client participation proportion α_t (t ∈ {1, ..., T}) and the personalized privacy budget ε_i of a single client.
To solve the privacy budget decision and client number selection problems in the objective problem, federated edge learning is combined with a deep reinforcement learning framework: a global network is deployed at the edge server, sub-networks are deployed at the clients, each network contains an Actor-Critic (AC) network, and the network framework adopts a neural network model. The architecture is shown in Fig. 4: each sub-network interacts with the environment to obtain the environment state and then executes the corresponding action according to the policy to obtain the reward fed back by the environment, while the global network does not interact with the environment directly. The decision processes for the two problems are described separately below.
3.2 privacy budget decision Algorithm
At each client, Gaussian noise needs to be added after local model training to achieve privacy protection of the model parameters. Because the total privacy budget is limited, the noise added after each training round is also limited. In addition, since the client data are non-IID distributed, their contributions to the model update are inconsistent; we believe that parameters that contribute more to model training should have more noise added (a smaller privacy budget). When making decisions with DRL, the corresponding parameters need to be designed according to the objectives.
(1) DRL model
We first focus on the privacy budget decision algorithm in the local update process; the deep reinforcement learning framework is shown in Fig. 4. In the standard reinforcement learning model, an agent learns continuously during its interaction with the environment, according to the rewards or penalties obtained, in order to output the optimal actions. Several important concepts used in the DRL model are introduced below.
Environment. The environment of the DRL system is the designed federated edge learning framework, comprising the global learning model and the local learning models.
Agent. The agent represents each client, which interacts with the environment by performing actions.
State. The state s_i^t is a feature vector describing the state of the agent at time t.
Action. The action a_i^t represents the privacy budget value consumed by client v_i at time t. Given the current state, the DRL agent performs an action based on a policy, denoted π(a_i^t | s_i^t).
Reward. The reward r_i^t indicates the feedback the agent receives from the environment after action a_i^t is executed at time t, and is used to judge whether the action is good or bad.
At each communication round, the policy network receives the state s_{t-1} of the previous moment (characterized by, e.g., completion time, loss function, and resource consumption) and outputs the probability of each action; this mapping from the state space S to the action space A is called the policy π, typically realized with a convolutional neural network whose output layer is a softmax. An action a_i^t is then selected from the action space according to the policy π. Next, the agent observes the state at the current time s_i^t and the reward value r_i^t. The goal of the agent is to maximize the expected return by selecting the best action according to the policy.
Through analysis of the designed modules, we employ the asynchronous advantage actor-critic (A3C) algorithm to implement the DRL network structure. At each client, a composite neural network is deployed; it takes the client's current state as input and outputs the policy π and the state value function V(s). The actor network decides the privacy budget the client uses, and the critic network evaluates the benefit of taking the current action.
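A minimal PyTorch sketch of such a composite network: a shared body, an actor head that outputs the policy π over the J discrete privacy-budget actions, and a critic head that outputs the state value V(s). The layer sizes and the fully connected body are illustrative assumptions (the text above mentions convolutional policy networks; a small dense body is used here for brevity).

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Composite network: shared body, actor head (softmax over J actions), critic head V(s)."""
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, num_actions)     # actor: privacy-budget choice
        self.value_head = nn.Linear(hidden, 1)                 # critic: state value V(s)

    def forward(self, state: torch.Tensor):
        h = self.body(state)
        pi = torch.softmax(self.policy_head(h), dim=-1)        # policy pi(a | s)
        v = self.value_head(h).squeeze(-1)                     # V(s)
        return pi, v
```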
(2) Subnetwork status and rewards
After the client finishes local training, the local sub-network agent acquires the current state s_i^t by interacting with the local environment, where t represents the current communication round, w^t represents the global model parameters of the current communication round, w_i^t represents the local model parameters after training with the local private data, ε_t represents the privacy budget remaining in the current communication round, and D_t represents the remaining resource budget of the current communication round.
The action to be performed at the client is to select an appropriate privacy budget as needed, assuming ε_i ∈ {ε_j}, j ∈ [1, J], where J is the number of actions in the local action space, set to discrete values. At time t of each communication round the client carries out the privacy budget decision process, executing the actions in the current state s_i^t to obtain the corresponding rewards, with the initial assignment made at j = 1.
The agent selects actions according to a policy, expressed as π(a_{i,j} | s_i^t), which is a probability distribution over actions. A neural network with policy parameter θ is used to represent the policy, so the policy can be expressed as π_θ(a_{i,j} | s_i^t), denoting the probability of executing action a_{i,j} in state s_i^t.
After the critic network observes that action a_{i,j} has been executed, it calculates the feedback reward value r_{i,j}. The reward indicates whether the current state is good or bad, so the designed reward is related to the change of the model parameters, the resource consumption and the privacy budget consumption,
where the first part indicates the difference between the local model parameters before and after the update performed by action a_{i,j}; the smaller the difference, the greater the benefit brought by the current action and the larger the feedback reward. The second part represents the environmental impact of the change in resource consumption: the more resources g_{i,j} the local computation consumes, the smaller the feedback reward. The goal of the local update is to obtain the current action that maximizes the cumulative return and then upload the local update parameters to the global network. The cumulative return is calculated as follows:
with discount factor γ ∈ (0, 1), where q is the time-step index value at which the current time step j reaches the terminal state.
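The reward expression itself is given as an equation in the original and is not fully reproduced here, so the sketch below uses one plausible form consistent with the description: a term that grows as the parameter difference shrinks and a penalty that grows with the consumed resources, followed by the discounted cumulative return with factor γ ∈ (0, 1). The weighting factor lam is an assumed hyperparameter.

```python
import torch

def local_reward(w_before, w_after, resource_cost, lam=1.0):
    """Assumed reward shape: a smaller parameter change and lower resource use give a larger reward."""
    param_diff = torch.norm(w_after - w_before, p=2).item()
    return -param_diff - lam * resource_cost

def cumulative_return(rewards, gamma=0.9):
    """R_j = sum_k gamma^k * r_{j+k}, accumulated back from the terminal step q."""
    R, returns = 0.0, []
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    return list(reversed(returns))
```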
(3) Model training
Because the A3C framework is used for model training in the system, a master agent is created to manage the global network, several sub-agents manage the local networks, and asynchronous parallel training is carried out among the sub-agents. In the local update, the actor network selects actions according to the policy π_θ(a_{i,j} | s_i^t), and the critic network estimates the state value function V(s_i^t; θ_v), where θ is the policy parameter and θ_v is the state value function parameter. The state value function is estimated by the neural network function as follows,
The policy and value functions can be updated after J actions have been taken or when a terminal state is reached (e.g., the model converges or resources are exhausted). The local update process updates the policy function and the estimated state value function using the advantage function A(s_i^t, a_{i,j}).
Thus, the loss functions for the model update can be obtained: the value function loss is the mean squared error of the advantage function,
and the policy function loss is:
where H is the entropy of the policy distribution. The accumulated gradients are then used to update the policy parameter θ and the value function parameter θ_v.
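A minimal sketch of the A3C losses described above: the advantage is the difference between the observed return and the critic's value estimate, the value loss is the squared advantage, and the policy loss combines the advantage-weighted log-probability with an entropy bonus H. The entropy weight is an assumed hyperparameter.

```python
import torch

def a3c_losses(pi, value, action, observed_return, entropy_weight=0.01):
    """pi: action probabilities from the actor; value: V(s) from the critic, for one state."""
    advantage = observed_return - value                       # A(s, a) = R - V(s)
    value_loss = advantage.pow(2)                             # squared error of the advantage
    log_prob = torch.log(pi[action] + 1e-8)
    entropy = -(pi * torch.log(pi + 1e-8)).sum()              # H(pi), encourages exploration
    policy_loss = -log_prob * advantage.detach() - entropy_weight * entropy
    return policy_loss, value_loss
```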
3.3 client quantity decision Algorithm
After the asynchronous update of a local client is completed, the updated parameters are uploaded to the edge server for global updating to obtain new global parameters. If a new client starts training, it acquires parameters from the global network for its next local sub-network update. Before the next communication round starts, the edge server needs to decide the number of clients participating in the next round of aggregation. The privacy budget decision after a local update and the client number decision influence each other, and the two decisions are not necessarily completed in the same communication round.
An A3C global network is deployed on the edge server side, and the master agent acquires the current global state s_t, which comprises the current communication round t, the local model parameters uploaded by each client, and the remaining resource budget D_t. At time t, the global network in the edge server selects an action according to the policy; the action to be executed is denoted a_j, where j is the time-step index within one stage, in order to select a proper number of clients to participate in the global aggregation. The critic network calculates the reward according to the executed action; the reward is set in relation to the convergence of the current global model, where F* represents the optimal value when the global model converges and ΔF_t represents the difference between the current loss value and the optimal loss value F*, i.e., ΔF_t = F* − F_t. The local model parameters uploaded by each client at this moment are the parameters with Gaussian noise added, and the corrected noisy parameters are used when global aggregation is performed.
The policy network parameter of the global network is θ′ and the state value network parameter is θ′_v; during client selection, the initial parameters are set to the aggregated global network parameters. After the global network is set up, a global update is performed, with an update procedure similar to that of the local network; finally, the number of clients that maximizes the expected return is selected to carry out the global aggregation of the asynchronous federated learning system. The algorithm implementation is shown in Algorithm 1.2.
4 experiment and Performance evaluation
Next, the experiment and performance evaluation scheme is presented. The proposed scheme is compared under different parameter settings using a public data set. The experimental results show that, with the assistance of deep reinforcement learning, federated edge learning with limited resources and differential privacy can effectively protect the privacy of the local model parameters while allowing the model to converge quickly.
Algorithm 1.2: personalized privacy protection and client quantity selection algorithm
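Since the body of Algorithm 1.2 is likewise not reproduced, the sketch below illustrates, under stated assumptions, how the server-side A3C agent could map the global state to the number of aggregated clients and derive a convergence-related reward from ΔF_t = F* − F_t; the reward form and the action-to-count mapping are assumptions consistent with the description in Section 3.3, reusing the ActorCritic sketch above.

```python
import torch

def choose_num_clients(global_ac, state, max_clients):
    """Sample the action (number of clients to buffer before aggregation) from the global policy."""
    pi, value = global_ac(state)                         # policy over {1, ..., max_clients}
    action = torch.multinomial(pi, num_samples=1).item()
    return min(action + 1, max_clients), value           # map the action index to a client count

def global_reward(current_loss, optimal_loss):
    """Assumed reward shape: the smaller the gap |F* - F_t|, the larger the reward."""
    return -abs(optimal_loss - current_loss)
```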
4.1 Experimental Environment
For the privacy protection scheme proposed here, a local computer is selected for simulation verification, and the PyTorch framework is used to implement the differentially private buffered asynchronous aggregation process. The computer used for the experiments has an Intel Core i7-10700 CPU @ 2.90 GHz and 32 GB RAM.
The main data set evaluated is the handwritten digit recognition data set MNIST, consisting of 60,000 training samples and 10,000 test samples; each sample is a 28 × 28 pixel grayscale image representing a digit from 0 to 9. The batch size during training is set to 64 and the batch size of the test set is set to 1000. For the clients in the federated edge learning environment, a non-IID data partition is assumed: the digits 0-9 are divided among different clients, each client containing samples of only one or a few digit classes, so the data distributions differ while the data volumes are the same. The global model cooperatively trained by the clients is a convolutional neural network (CNN) with two convolutional layers and a fully connected layer; rectified linear units (ReLUs) are chosen as the activation function, and a dropout layer is used for regularization.
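A sketch of a CNN matching the description above (two convolutional layers with ReLU activations, dropout for regularization, and a fully connected output layer); the channel counts and kernel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MnistCNN(nn.Module):
    """Two conv layers + ReLU, dropout for regularization, one fully connected classifier."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.dropout = nn.Dropout(0.5)
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)                     # input: 1 x 28 x 28 grayscale image
        x = self.dropout(torch.flatten(x, 1))
        return self.classifier(x)                # logits for digits 0-9
```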
During the collaborative training and asynchronous parallel updates of the clients, the proposed algorithm is evaluated with the following metrics. (1) Test accuracy: the most commonly used performance metric of a classification training process, representing the ratio of correctly identified data samples in the test data set to the total number of test samples. (2) Test loss: the magnitude of the error between the predicted and actual values during training; cross-entropy loss and NLL loss are typically used as loss functions. (3) DRL reward: the reward calculated from the reward function during DRL training. (4) Communication rounds: the number of communication rounds needed for the global model to converge, where each communication round covers the whole process from the distribution of the global model parameters to the completion of the global update.
In the comparison experiments, related baseline algorithms are selected: the first is DP-SGD, used to compare the effect of protecting model parameter privacy by adding Gaussian noise during federated learning; the second is NbAFL, in which Gaussian noise is added to both the local model parameters and the global model parameters during federated learning training.
4.2 experimental results
(1) DRL training
The deep reinforcement learning training is mainly completed by the edge server and the clients: the deployed A3C network completes the client selection process at the edge server and the personalized privacy protection at the clients. During the experiments, DRL performance, including training loss and reward, is tested. The experiments in this section set the total number of clients N = 10. As can be seen from the figure, the loss value drops rapidly during the early training phase, mainly because the agent initially lacks information about the environment. After a period of training, the agent has obtained enough information about the environment and the loss value begins to stabilize, indicating that the DRL agent gradually adapts to the federated edge learning system. The reward values accumulate gradually during training; as the agent selects better strategies to complete its actions through continuous exploration, it correspondingly obtains better reward values. When the number of training periods reaches 200, the reward value changes only slightly.
(2) Parameter influence
For the proposed buffered asynchronous aggregation algorithm, Fig. 6 shows the test performance when the number of clients aggregated in a single communication round is varied: the fewer clients aggregated per round, the more communication rounds are needed to converge, and the lower the accuracy compared with aggregating more clients. Because of the non-IID data distribution, the fewer the clients participating in aggregation, the smaller the data sample size, so the global model accuracy fluctuates strongly during aggregation, and the samples of all clients are covered only through a large number of frequent communications before convergence is finally reached. It can be seen from the figure that if an asynchronous aggregation algorithm is to be adopted while taking resource consumption into account, an appropriate number of participating clients must be set.
To make the model converge quickly, we choose a dynamically changing number of participating clients for global aggregation; the number required for each round is determined by the A3C algorithm, and the decision takes the state of the current model into account.
Adjusting the number of clients during aggregation has a large effect on model convergence. Fig. 7 shows the influence of the changing number of clients on model performance, compared with a fixed number of clients M = 4. The accuracy curve fluctuates strongly due to the non-IID data distribution and asynchronous aggregation. When the adaptive number of clients given by the A3C decision algorithm is used, more clients are selected to participate in aggregation when the accuracy is low, which improves accuracy and accelerates model convergence.
(3) Algorithm comparison
Fig. 8 shows the comparison between the different algorithms, where DRL-DPAFL denotes our proposed scheme that uses DRL to decide the privacy budget consumption of each client and the number of aggregation clients for each communication round; in the experiments DP-SGD and NbAFL select 4 participants out of 10 clients for aggregation, and the data distribution is non-IID. The figure shows that our DRL-DPAFL achieves performance close to that of DP-SGD, with accuracy fluctuating around 90% when the differential privacy mechanism is adopted, and both are superior to NbAFL.
5 Conclusion
This work proposes a resource-constrained asynchronous federated learning privacy protection scheme, providing personalized privacy protection under limited local client resources. Specifically, during local training, each client receives a different degree of privacy protection according to the importance of its data, mainly by setting a personalized privacy budget and adding a different amount of noise: if the local data contributes more, a smaller privacy budget is allocated to the local model parameters, i.e., more noise is added, so that the degree of privacy protection is higher. After the local parameters reach the edge server, they are decay-corrected according to each client's model staleness and then temporarily stored in a buffer. When the number of cached updates equals the optimal value obtained by the deep reinforcement learning algorithm, global aggregation is performed. The global parameters are distributed to the clients participating in aggregation for the next round of training, and are also sent for forced synchronization to clients whose model staleness exceeds a set threshold. The proposed algorithm is verified by extensive experiments in the performance evaluation stage; the scheme can address the straggler (lag) problem, the limited-resource problem, and the model parameter leakage problem, and its performance is superior to comparison algorithms operating under more relaxed conditions.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction between the combinations of these technical features, they should be considered within the scope of this description.
The above examples merely represent several embodiments of the present application and are described in relative detail, but they are not to be construed as limiting the scope of the invention. It should be noted that various modifications and improvements can be made by those skilled in the art without departing from the spirit of the present application, all of which fall within the protection scope of the present application. Accordingly, the scope of protection of the present application shall be determined by the appended claims.
Claims (10)
1. A client selection and personalized privacy protection method in asynchronous federal edge learning, characterized by comprising an edge server and N clients, wherein the client set is {v_1, v_2, ..., v_N}; each client v_i (i ∈ {1, 2, ..., N}) holds its own personal private data set with a corresponding data size;
in the federal learning process, the local learning model of each client is F_i(w), and each client uses its private data set to perform local training and local updating;
the global updating process of the edge server adopts an asynchronous aggregation scheme; clients that do not participate in aggregation in time produce stale models, so each client has a staleness parameter τ_i, defined as the difference between the current communication round and the communication round in which the client last received the global parameters; it is calculated and stored by the edge server before each global aggregation;
each communication round starts with sharing the global model parameters, so at t = 0 the edge server initializes the global parameters w_0 and a model staleness list with τ_i = 0 for every client, then broadcasts the global parameters to all clients for initial synchronization and notifies the clients to start local training;
after receiving the global model, the client performs a local update using the SGD algorithm to obtain the local model parameters; to prevent the parameters from becoming too large, they are clipped with an upper bound C, and Gaussian noise is then added to obtain the noisy parameters, preventing the parameters from being stolen during transmission and used to infer the original data information;
and after the client completes the local computation, the noisy local parameters are immediately uploaded to the edge server over a wireless link.
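As an illustration of this local step, the sketch below clips the locally updated parameters to an L2 bound C and adds Gaussian noise. The noise multiplier sigma, and how it would be derived from the per-client privacy budget, are assumptions not fixed by the claim.

```python
import torch

def privatize_local_update(model, clip_bound_C, sigma):
    """Clip the flattened local parameters to L2 norm <= C and add Gaussian noise.
    In practice sigma would be derived from the client's personalized privacy
    budget epsilon_i (assumed mapping, not specified here)."""
    with torch.no_grad():
        flat = torch.cat([p.view(-1) for p in model.parameters()])
        scale = torch.clamp(clip_bound_C / (flat.norm(2) + 1e-12), max=1.0)  # clipping factor
        noisy = flat * scale + torch.randn_like(flat) * sigma * clip_bound_C
    return noisy  # noisy parameter vector uploaded to the edge server
```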
2. The method for client selection and personalized privacy protection in asynchronous federal edge learning of claim 1, wherein:
the edge server first performs pre-aggregation preparation and applies delay compensation to the received local noisy model parameters,
wherein β_i = Z(τ_i);
when the number of model parameters temporarily stored in the buffer reaches M, global aggregation is carried out immediately,
and when the data amount of each client is the same, the global aggregation reduces to a simple average of the buffered (delay-compensated) noisy parameters;
after the aggregation is completed, the model staleness list needs to be updated: if a client did not participate in the current round of aggregation, then τ_i = τ_i + 1;
If the model staleness is greater than the threshold value, the client needs to synchronize the current global model parameters;
because system resources are limited and the privacy budget is bounded, each communication round needs to calculate the consumed privacy budget and resources; if they exceed the set threshold, the system is interrupted;
if resources remain, model training continues, and the current global model parameters are shared with the aggregation clients and with the clients exceeding the staleness threshold;
and then proceeds to the next communication round for iteration.
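A server-side sketch of this buffered asynchronous aggregation loop is shown below. The staleness decay function Z, the way the attenuation correction combines the stale update with the current global parameters, and the equal-weight average are all assumptions, since the claim names these steps without fixing their exact form.

```python
def staleness_decay(tau):
    """Assumed form of Z(tau): the weight decays as the model grows staler."""
    return 1.0 / (1.0 + tau)

def on_client_update(buffer, global_w, noisy_w, tau_i, M):
    """Delay-compensate an incoming noisy local update and buffer it;
    aggregate once M updates have been cached (equal data sizes assumed)."""
    beta_i = staleness_decay(tau_i)
    # assumed correction: pull the stale update toward the current global model
    compensated = [g + beta_i * (w - g) for g, w in zip(global_w, noisy_w)]
    buffer.append(compensated)
    if len(buffer) >= M:
        # simple average, valid when all clients hold the same data volume
        new_global = [sum(ws) / len(buffer) for ws in zip(*buffer)]
        buffer.clear()
        return new_global   # becomes the next-round global parameters
    return None             # keep waiting for more client updates
```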
3. The method for client selection and personalized privacy protection in asynchronous federal edge learning of claim 1, wherein:
Client selection and personalized privacy protection: the A3C algorithm is selected to design the policy and execute corresponding actions according to the obtained state, so that the proposed framework converges the global model quickly while protecting the privacy of the model parameters.
4. The method for client selection and personalized privacy protection in asynchronous federal edge learning of claim 1, wherein:
the original federal edge learning system mainly comprises two parts, namely a local updating process and a global updating process;
the DRL-based learning system adaptively selects two parameter values: the client participation proportion α_t (t ∈ {1, ..., T}) and the personalized privacy budget ε_i of each individual client.
5. The method for client selection and personalized privacy protection in asynchronous federal edge learning of claim 1, wherein: combining federal edge learning with a deep reinforcement learning framework, deploying global networks at edge servers, deploying sub-networks at clients, each network comprising an Actor-Critic (AC) network, the network framework employing a neural network model; each sub-network interacts with the environment to obtain the environment state, and then executes corresponding actions through strategies to obtain rewards fed back from the environment, and the global network does not interact with the environment directly.
6. The method for client selection and personalized privacy protection in asynchronous federal edge learning of claim 1, wherein:
privacy budget decision algorithm:
at each client, Gaussian noise is added after local model training to achieve privacy protection of the model parameters;
because the total privacy budget is limited, the amount of noise that can be added after each training round is also limited;
parameters that contribute more to model training should have more noise added; when making decisions with DRL, the corresponding parameters need to be designed according to this objective.
7. The method for client selection and personalized privacy protection in asynchronous federal edge learning of claim 1, wherein:
DRL model
The environment of the DRL system refers to a designed federal edge learning framework, and comprises a global learning model and a local learning model;
the agent represents each client that interacts with the environment by performing actions;
state s_t^i is a feature vector describing the state of the agent at time t;
action a_t^i represents the privacy budget value consumed by client v_i at time t; given the current state, the DRL agent executes an action a_t^i according to a policy;
reward r_t^i denotes the feedback reward the agent receives from the environment after the action at time t is executed, and is used to judge whether the action is good or bad;
at each communication round, the policy network receives the state s_{t-1} of the previous moment and outputs the probability of each action; this mapping, called the policy π, maps the state space to the action space, typically using a convolutional neural network with a softmax output layer;
an action a_t is then selected from the action space according to the policy π;
next, the agent receives the state s_t at the current moment and the reward value r_t.
The goal of the agent is to maximize the expected return by strategically selecting the best action.
Through analysis of the designed modules, the DRL network structure is implemented with the asynchronous advantage actor-critic (A3C) algorithm; a composite neural network is deployed at each client, taking the client's current state as input and outputting the policy π and the state value function V(s); the actor network decides the privacy budget used by the client, and the critic network evaluates the benefit of taking the current action.
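An illustrative PyTorch sketch of such a client-side actor-critic network follows. The hidden size, the shared trunk, and the use of a simple MLP instead of the convolutional network mentioned above are assumptions made purely to keep the sketch small; the discrete action head corresponds to a set of candidate privacy budgets.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Client-side A3C network: shared trunk, a policy head over J discrete
    privacy-budget actions, and a state-value head V(s). Sizes are assumed."""
    def __init__(self, state_dim, num_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, num_actions)   # logits over actions
        self.value_head = nn.Linear(hidden, 1)              # V(s)

    def forward(self, state):
        h = self.trunk(state)
        return torch.softmax(self.policy_head(h), dim=-1), self.value_head(h)
```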
8. The method for client selection and personalized privacy protection in asynchronous federal edge learning of claim 1, wherein:
sub-network status and rewards:
after the client finishes local training, the local sub-network agent obtains the current state s_{i,t} = (t, w_t, w_{i,t}, ε_t, D_t) by interacting with the local environment, where t denotes the current communication round, w_t denotes the global model parameters of the current communication round, w_{i,t} denotes the local model parameters after training with the local private data, ε_t denotes the privacy budget remaining in the current communication round, and D_t denotes the resource budget remaining in the current communication round;
according to its state, the client needs to perform an action to select an appropriate privacy budget; assume ε_i ∈ {ε_j}, j ∈ [1, J], where J is the number of actions in the local action space, set as discrete values; at time t of each communication round, the client performs the privacy budget decision process and executes actions under the current state s_{i,t} to obtain the corresponding rewards. When j = 1, the assignment is
the agent selects actions according to a policy, expressed as π(a_{i,j} | s_{i,t}), which is a probability distribution over actions; here a neural network is used to represent the policy, with policy parameter θ, so our policy can be expressed as π_θ(a_{i,j} | s_{i,t}), i.e., the probability of executing action a_{i,j} in state s_{i,t};
when the critic network observes that action a_{i,j} has been executed, it calculates the feedback reward value r_{i,j}; the reward indicates how good the current state is, so the reward we set is related to the change in model parameters, the resource consumption, and the privacy budget consumption,
wherein the first part denotes the difference between the local model parameters before and after the update, with w_{i,j} denoting the local model parameters after executing action a_{i,j}; the smaller the difference, the greater the benefit brought by the current action and the larger the feedback reward; the second part denotes the environmental impact of the change in resource consumption: the more resources g_{i,j} are consumed by the local computation, the smaller the reward fed back; the goal of the local update is to obtain the current action that maximizes the cumulative return, after which the local update parameters are uploaded to the global network. The cumulative return is calculated as follows:
with discount factor γ ∈ (0, 1), where q is the time step index at which the episode containing the current time step j reaches the terminal state.
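The sketch below illustrates one way the local reward and the cumulative discounted return could be computed. The claim only fixes the sign relations (smaller parameter difference and smaller resource/budget consumption give a larger reward), so the weights a, b, c and the linear form are assumptions.

```python
def local_reward(param_diff, resource_cost, budget_cost, a=1.0, b=0.5, c=0.5):
    """Reward grows as the parameter change shrinks and as resource and privacy
    budget consumption shrink (signs follow the claim; weights a, b, c assumed)."""
    return -a * param_diff - b * resource_cost - c * budget_cost

def discounted_return(rewards, gamma=0.99):
    """Cumulative return R_j = sum_k gamma^k * r_{j+k}, accumulated back from the
    terminal step q with discount factor gamma in (0, 1)."""
    R, returns = 0.0, []
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    return list(reversed(returns))
```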
9. The method for client selection and personalized privacy protection in asynchronous federal edge learning of claim 1, wherein:
model training
Because the A3C framework is adopted for model training in the system, a master agent is created to manage the global network, several sub-agents manage the local networks, and the sub-agents are trained asynchronously in parallel. In the local update, the actor network selects actions according to the policy π_θ(a_{i,j} | s_{i,t}), and the critic network estimates the state value function V(s_{i,j}; θ_v), where θ is the policy parameter and θ_v is the state value function parameter; the state value function is estimated by the neural network as follows,
the policy and value function are updated after J actions are taken or when a terminal state is reached; the local update process updates the policy function and the estimated state value function, where A(s_{i,j}, a_{i,j}) is the advantage function;
thus, the loss function for the model update can be obtained; the value function loss is the mean squared error of the advantage function,
the policy function loss is
where H is the entropy of the policy distribution. The accumulated gradients are then used to update the policy parameter θ and the value function parameter θ_v.
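A compact sketch of these A3C losses is given below. It follows the standard A3C formulation: the advantage is the discounted return minus V(s), the value loss is its mean squared error, and the policy loss is the negative log-probability weighted by the advantage minus an entropy bonus. The entropy coefficient is an assumption, since the claim does not specify it.

```python
import torch

def a3c_losses(policy_probs, actions, values, returns, entropy_coef=0.01):
    """Value loss: MSE of the advantage; policy loss: -log pi(a|s) * advantage
    minus an entropy term H that encourages exploration (standard A3C form)."""
    advantage = returns - values.squeeze(-1)                 # A(s,a) = R - V(s)
    value_loss = advantage.pow(2).mean()                     # mean squared advantage
    chosen = policy_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    log_probs = torch.log(chosen + 1e-8)
    entropy = -(policy_probs * torch.log(policy_probs + 1e-8)).sum(dim=1).mean()
    policy_loss = -(log_probs * advantage.detach()).mean() - entropy_coef * entropy
    return policy_loss, value_loss
```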
10. The method for client selection and personalized privacy protection in asynchronous federal edge learning of claim 1, wherein:
client number decision algorithm
after the asynchronous update of a local client is completed, the updated parameters are uploaded to the edge server for a global update to obtain new global parameters; if a new client starts training, it obtains parameters from the global network for the next update of its local sub-network; before the next new communication round starts, the edge server determines the number of clients participating in the next round of aggregation;
an A3C global network is deployed at the edge server, where the master agent obtains the current state s_t; the current global state comprises the current communication round t, the local model parameters uploaded by each client, and the remaining resource budget D_t;
at time t, the global network in the edge server selects an action according to the policy; the action to be executed is denoted a_j, where j is the time step index within one stage, in order to select an appropriate number of clients to participate in the global aggregation;
the critic network calculates the reward according to the executed action; the reward is set to be related to the convergence of the current global model, where F* denotes the optimal value when the global model converges and ΔF_t denotes the difference between the current loss value and the optimal loss value F*, i.e., ΔF_t = F* − F_t;
the local model parameters uploaded by each client at this moment are parameters with Gaussian noise added; the corrected noisy parameters are used when global aggregation is performed;
the policy network parameter of the global network is θ' and the state value network parameter is θ'_v; when client selection is performed, the initial parameters are set to the aggregated global network parameters. After the global network is set up, the global update is performed; the update flow is similar to that of the local network, and finally the asynchronous federal learning system performs global aggregation with the number of clients that maximizes the expected return.
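To illustrate this server-side decision, the sketch below samples the number of aggregation clients from the global A3C policy (reusing the ActorCritic sketch above) and forms a convergence-based reward from ΔF_t. The candidate client counts and the exact reward form are assumptions not given by the claim.

```python
import torch

def decide_num_clients(global_actor_critic, state, candidate_counts=(2, 4, 6, 8, 10)):
    """Sample the number of clients for the next aggregation round from the global
    policy; candidate_counts is an assumed discrete action space."""
    probs, _ = global_actor_critic(state)
    idx = torch.multinomial(probs, num_samples=1).item()
    return candidate_counts[idx]

def global_reward(current_loss, optimal_loss):
    """Reward tied to global convergence via Delta F_t = F* - F_t as in the claim;
    the exact functional form is an assumption."""
    return optimal_loss - current_loss
```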