CN113947210A

CN113947210A - Cloud side end federal learning method in mobile edge computing

Info

Publication number: CN113947210A
Application number: CN202111169752.7A
Authority: CN
Inventors: 毕远国; 项天敖; 闫飞宏; 王博洋; 郭朋超
Original assignee: Northeastern University China
Current assignee: Northeastern University China
Priority date: 2021-10-08
Filing date: 2021-10-08
Publication date: 2022-01-18
Anticipated expiration: 2041-10-08
Also published as: CN113947210B

Abstract

The invention belongs to the technical field of edge computing, and provides a cloud edge side federal learning method in mobile edge computing, which is used for reducing communication turns and improving the reliability of a trained machine learning model in a three-layer edge computing framework. By having the edge server participate in the federal learning training process, the communication round between the edge server and the terminal device is minimized by dynamically adjusting the training round in the terminal device. The problem of model parameter loss caused by channel bandwidth and other factors in the process of transmitting the model is reduced by minimizing the communication turn between the edge server and the terminal equipment.

Description

Cloud side end federal learning method in mobile edge computing

Technical Field

The invention belongs to the technical field of edge computing, and relates to a cloud edge federal learning method in mobile edge computing.

Background

With the development of smart devices and communication technologies, over 1250 billions of devices were deployed in various fields such as industrial internet, smart car, health care, etc. by 2030 according to the prediction of IHS Markit. Applications in these areas often have strict requirements on response delay compared to traditional multimedia services, making deployment of such services at the edge of the network an on-going need.

Mobile Edge Computing (MEC) has become a promising technology to support the Quality of Service (QoS) requirements described above, which can provide real-time services with local Computing power, low latency, and large bandwidth communications. Meanwhile, as a distributed machine Learning paradigm, Federal Learning (FL) has a great potential in MEC to provide edge intelligence and privacy protection. And the terminal devices participating in the joint learning process cooperatively establish a global high-performance model, and simultaneously retain training data locally. One FL procedure consists of several rounds of training procedure, each round only selecting a part of the terminal devices to participate in the training procedure. The participating terminal equipment trains the model by using local data, and then the cloud server summarizes the received model to form a global model.

The primary advantage of FL is to decouple model training from direct access to the original training data. Without data exchange, the FL procedure preserves data privacy, reducing the risk of data leakage. Based on this feature, FL is widely used in various fields from business companies to academic organizations, such as google, Webank, IEEE, and the like. Meanwhile, FL integrated with MEC has been studied in many research areas, such as blockchain, deep reinforcement learning, and virtual network function auto scaling.

However, integrating the FL with the MEC presents many challenges due to limited bandwidth, unreliable wireless channels, and frequent handovers. On the one hand, many devices contend for access to the wireless channel for data transmission, resulting in severe interference or packet collisions. On the other hand, doppler effect and frequent handovers caused by mobility of the terminal equipment also greatly increase the probability of packet loss in the MEC. Since the terminal device participates in the FL process and exchanges parameters with the edge server, the unstable wireless connection reduces the reliability of the model and even makes the model unavailable, thereby degrading the quality of the global aggregation model.

Disclosure of Invention

In order to improve the reliability of the model, a positive correlation relationship is established between the possibility of model parameter loss and the communication turn between the terminal equipment and the edge server by utilizing the three-layer structure of the MEC. The existence of an optimal communication turn and the corresponding number of hours of duration that minimizes the possibility of parameter loss are demonstrated by theoretical analysis. The minimum communication turn and the corresponding estimation of the optimal local training turn are obtained by adjusting some key parameters, and then the edge server is used for dynamically controlling the training turn of the selected terminal equipment to reduce the communication turn.

The optimal communication round and corresponding duration are derived theoretically based on a positive correlation of communication round and packet loss, and further utilized to reduce the likelihood of parameter loss due to a fragile wireless connection. By expanding the optimization parameters, the optimal communication round and the corresponding local iteration round which is convenient to calculate are estimated, and an effective local iteration round adjustment algorithm is designed to improve the reliability of the model. Through extensive simulations, the results show that over 80% of the estimated hours are close to the optimal iteration round in practice, and the enhanced FL scheme effectively reduces the communication round to achieve the required accuracy.

The technical scheme of the invention is as follows:

a cloud edge federal learning method in mobile edge computing comprises the following steps:

firstly, a mobile terminal device estimates and transmits parameters required for calculating an optimal local iteration turn to a server;

the mobile terminal device estimates and transmits parameters required for calculating the optimal local iteration turn to the server, and the parameters are specifically as follows: and selecting the mobile terminal equipment participating in the FL process, and using the local data to train the initial machine learning model of the current round transmitted by the edge server, and controlling the local training round of the selected terminal equipment by the edge server. And the terminal equipment records the initialized machine learning model in the training process and collects data in the training process, estimates the Ripocez coefficient and the smooth coefficient of the model trained by the terminal equipment by utilizing the collected data and the initialized machine learning model after the training is finished, and returns the Ripocez coefficient and the smooth coefficient of the model to the edge server together with the model.

Secondly, the server dynamically adjusts the local training round based on the minimum communication round;

2-1, selecting control parameters; the method comprises the following specific steps:

first, a theoretically optimal local training round e is given^*：

Eta is a step length selected during training, L is a second-order RipShtz constant of a loss function, beta is a smooth coefficient of the loss function, ν is an upper bound of a reciprocal difference between a machine learning model weight and an optimal machine learning model weight obtained by a centralized random gradient descent method, μ is a lower bound of a difference between a machine learning loss function and an optimal loss function value trained by a federal mean algorithm, and δ is a gradient divergence generated by the model after the federation of the federal mean algorithm is aggregated, which is defined as follows;

wherein

Representing the first order gradient of the aggregated federal learning model,

representing the first order gradient, δ, of the Federal learning model through terminal i_iRepresenting the difference in the gradients of the aggregate model and the terminal i model, |, representing the potential of the set, n_iRepresenting a data set, n, owned by a terminal i_SumRepresenting the sum of the terminals in the edge server range,

wherein min_kMinimum value expressed in k-th round of training is equivalent to

And μ ═ min | | | F (w)^(c)(t))-F(w^*)||²：

Because v and mu in the formula (1) are difficult to obtain under the actual application scene, v mu in the formula (1) is converted into mu by utilizing the first-order Riposetz continuous condition²The scale conversion is as follows:

zeta represents the unknown parameter in equation (3) i.e.:

w^(c)(t) representing the machine learning model parameters of the centralized stochastic gradient descent method at the corresponding t moment; w is a^*Representing the optimal model parameters, i.e. the optimal solution of the problem, that minimize the model loss function; max_kRepresents the maximum value in the k-th round of training; zeta is smaller than 1, and the value is controlled to be close to the theoretical value when the optimal local iteration round is calculated.

2-2, estimating an optimal local training turn and a corresponding minimum communication turn;

the method comprises the following specific steps: from equations (1) and (4), the estimate of the optimal local iteration round corresponding to the minimum communication round is:

the minimum communication round corresponding to the theoretical optimal iteration number is:

wherein Δ is an upper bound of the difference between the optimal weight and the suboptimal weight of the desired machine learning model; e.g. of the type^*Representing the optimal local training round within the communication round.

And (3) combining the formula (5) and the formula (6) to obtain the estimation of the minimum communication turn:

when the algorithm runs beyond the estimate of the predicted minimum communication turn, the algorithm is forced to terminate to prevent the algorithm from stopping in the event of an inability to converge.

The invention has the beneficial effects that: the training round of the selected terminal equipment is dynamically adjusted through the minimum communication round estimated by the edge server and the corresponding optimal training round of the terminal equipment, so that the reliability of a transmission model in the FL process is improved, and the problem of model parameter loss caused by data packet loss caused by an unreliable wireless communication environment is solved.

Drawings

FIG. 1 is a FL overall system architecture in an MEC of the present invention;

fig. 2(a) is a flow chart of a mobile terminal device;

fig. 2(b) is a flow chart of the edge server.

FIG. 3(a) is a logistic regression training synthetic dataset.

FIG. 3(b) is a two-layer neural network regression training synthetic dataset.

Fig. 3(c) is a logistic regression training MNIST dataset.

Fig. 3(d) is a two-layer neural network training MNIST dataset.

In the figure:

Detailed Description

The technical solution of the present invention is described in detail below with reference to specific examples.

The invention provides a cloud side end federal learning method in mobile edge computing, which is used for reducing the problem of reliability reduction of a machine learning training model in a communication environment. In the implementation process, the purpose of reducing the communication round is achieved by collecting parameters to estimate the local optimal training round of the mobile terminal, so that the reliability of the machine learning model is guaranteed. The method combines the specific flow of a federal learning training machine learning model in a mobile edge computing environment and the current communication situation in a wireless network, and is divided into the following two parts.

(1) The mobile terminal device estimates and passes to the server the parameters needed to compute the optimal local iteration round, since the edge server needs to estimate some key parameters like β, L and δ, which the terminal device needs to estimate first in the local epoch, unlike the normal FL procedure. At the beginning of each round, the terminal device receives the initial model and the iteration rounds from the edge server. Then, the terminal device updates the model thereof and trains the model. After completing the local training round, the terminal device estimates beta_i、L_iAnd send the parameters and updated model to the edge server. The training process in the terminal device then stops and, unless selected again, it no longer continues to participate in the training process.

(2) Transmitting the model and the estimated value L at the terminal device_iAnd beta_iAfter the edge server, it collects the received models to obtain a region model. The aggregated model satisfies the stopping condition delta and the edge server will send to the cloud server. And sending the result to the cloud server as a final result, and finishing the training process. The edge server obtains by using the received parameters

And

and calculate

As the optimum calendar hours e^*(ii) an estimate of (d); will be provided with

And sending the model aggregated by the areas to the selected terminal equipment as the training turn and the initial model of the terminal equipment in the next training turn. According to equation (5), the edge server computes

First, estimate R_minBy using

Maximum value of as communication round R_limitThe upper limit of (3). Due to the fact that in the formula (5)

Estimation of optimal communication turns

Less than R_min. To avoid the problem of underestimating R_minAnd the number of communication rounds R is determined by using the parameter gamma > 1_limitTo γ R_limit. If number of communication rounds R_countGreater than γ R_limitThe edge server considers the algorithm to be difficult to converge and terminates the training process.

The following is introduced from three aspects of a training process of the mobile terminal device, an estimation of the optimal training round of the local device in the edge server, and a control mechanism of non-convergence.

1. Mobile terminal device training process

During federal learning, end devices generate and maintain data, while they are also responsible for training tasks. By utilizing data maintained by a single device, the trained model is not suitable for all devices. Therefore, the selected terminal deviceThe models are ready to be transmitted to the edge server for aggregation to alleviate this problem in the FL process. The mobile terminal device estimates and passes to the server the parameters needed to compute the optimal local iteration round, the edge server needs to estimate the key parameters, such as β, L and δ, which the terminal device estimates first in the local epoch. At the beginning of each round, the terminal device receives the initial model and the iteration round from the edge server. The terminal device then updates its model and trains the model. After completing the local training round, the terminal device estimates beta_i、L_iAnd send the parameters and updated model to the edge server. The training process in the terminal device then stops and, unless selected again, it no longer continues to participate in the training process.

(1) The mobile terminal device receives the initial machine learning model and the local training times sent by the edge server, records the initial model and initializes the local model with the initial model.

(2) And the mobile terminal device trains the model by using local data according to a random gradient descent method, and the training turn is determined by the local training turn estimated by the edge server received in the last step.

(3) When the local training round reaches the upper limit, estimating the RipSetz coefficient L of the loss function by using the loss function of the local model and the gradient thereof_iAnd the smoothing coefficient beta of the function_i。

(4) Ripocitz coefficient L estimated by mobile terminal device_iAnd a coefficient of smoothness beta_iAnd sending the trained model and the gradient of the loss function of the trained model to an edge server and stopping training.

2. Dynamic local device training turn adjustment mechanism in edge server

The aim of the present mechanism is to reduce the likelihood of model degradation due to packet loss in the communication between the terminal device and the edge server. In the FL procedure, the terminal deviceThe device maintains own data and uses the data to train the model without data exchange. The quality of the aggregated model depends on the updated model from the terminal device. Therefore, the loss of data packets during communication results in the loss of model parameters, thereby degrading the quality of the aggregation model in the edge server. Furthermore, model degradation in the edge server can slow down the entire FL process and even make the model unable to converge. The expected value of model loss depends on the communication round R and the number n of devices in the region_SumAnd the packet loss probability p of the device i_iWhere i ∈ {1,2.. n_Sum}. The expected time of packet loss can severely impact the quality of the aggregated model in the edge server, since loss of important parameters can reduce the reliability of the model. In order to reduce the possibility of parameter loss, improve model quality,

should be minimized.

Probability of packet loss p due to device i_iBeing independent and assumed to be fixed, increasing the reliability of the model means minimizing the number of communication rounds R of the objective function in equation (8).

The minimum communication round can be achieved by adjusting the local training round theoretically through analysis, and the corresponding local optimal training round is as follows:

wherein Δ is an upper bound of the difference between the optimal weight and the suboptimal weight of the desired machine learning model;

because v and mu in the formula (1) are difficult to obtain in the practical application scene, v mu in the formula (1) can be converted into v mu by utilizing the first-order Riposetz continuous condition²Scaling yields an estimate for the local optimal training round:

combining equation (5) with equation (6) yields an estimate of the minimum communication turn:

the mechanism comprises the following specific flows:

(1) the edge server receives the machine learning model, the gradient of the loss function, the Ripocez coefficient and the smoothing coefficient of the loss function transmitted from the selected terminal device.

(2) The models received from the terminal devices are weighted and aggregated according to a federal averaging algorithm and new machine learning models within the region are generated.

(3) If the generated model does not meet the requirement on the precision of the loss function, the edge server calculates and estimates the Ripocitz coefficient and the smooth coefficient of the loss function corresponding to the model in the edge server at the moment according to the gradient of the loss function received from the terminal equipment, the Ripocitz coefficient and the smooth coefficient of the loss function, and estimates the gradient divergence in the round of communication.

(4) The edge server calculates an optimal local training round corresponding to the minimum communication round according to formula (5), and estimates the minimum communication round according to the training round.

(5) And updating the communication round upper limit of the minimum communication round representing the maximum estimation in the training process, so as to prevent the condition that the training process cannot be terminated when the model cannot be converged.

(6) And sending the aggregated region model as a terminal equipment initialization model of the next round, and sending the estimated optimal local training round to the selected terminal equipment.

The dynamic adjustment mechanism was verified by two data sets, MNIST and composite data set, and two machine learning models, two-layer fully connected neural network and logistic model, with the results shown in fig. 3. The estimates of the optimal local training rounds in the figures are mainly distributed in the intervals [8,13 ] in fig. 3(a) and 3(b)]. In FIG. 3.(a), are distributed over [8,13 ]]The estimated rate of the interval was 86%, while in fig. 3(b), the estimated rate was 85.5%. The above simulation results show that the estimation in equation (5) is valid. The estimation results are dynamically adjusted to fit the best training round. Fig. 3.(c) and 3.(d) show that the estimates are mainly distributed in two intervals. In FIG. 3(c), 84% of the estimates are distributed over [12,14 ]]And [23,26 ]]In FIG. 3(d), are distributed in [8,9 ]]And [18,19 ]]The estimated value of the interval exceeds 89%. The reason for this estimated distribution is the delta fluctuation per round in the proposed algorithm. Since the partitioning of MNIST is more unbalanced compared to the synthetic dataset, δ is from 0.8 · 10^-7Fluctuating to 0.3. Fluctuations in the parameter δ lead to estimated transitions, which demonstrate that the gradient divergence δ can reflect the effect of the difference data in gradient divergence on the training process. Furthermore, such a sharp fluctuation of δ expands the estimated interval by only 10, which indicates that the estimation in equation (5) is stable.

Claims

1. A cloud side federal learning method in mobile edge computing is characterized by comprising the following steps:

and secondly, dynamically adjusting the server based on the local training turn for realizing the minimum communication turn.

2. The cloud edge federal learning method in mobile edge computing according to claim 1, wherein in the second step, the server dynamically adjusts based on a local training turn that realizes a minimum communication turn; the method specifically comprises the following steps:

2-1, selecting control parameters;

2-2 estimating an optimal local training turn and a corresponding minimum communication turn.

3. The method according to claim 1 or 2, wherein in the first step, the mobile terminal device estimates and transmits parameters required for calculating the optimal local iteration turn to the server, specifically: selecting the mobile terminal equipment participating in the FL process, and using local data to train the initial machine learning model of the current round sent by the edge server, and controlling the local training round of the selected terminal equipment by the edge server; and the terminal equipment records the initialized machine learning model in the training process and collects data in the training process, and estimates the Rippletz coefficient and the smooth coefficient of the self-trained model by utilizing the collected data and the initialized machine learning model after the training is finished and returns the Rippletz coefficient and the smooth coefficient together with the model to the edge server.

4. The method for learning the cloud side federation in the mobile edge computing according to claim 1 or 2, wherein the 2-1 selects the control parameters, and the specific steps are as follows:

first, a theoretically optimal local training round e is given^*：

wherein

Representing the first order gradient of the aggregated federal learning model,

wherein min_kMinimum value expressed in k-th round of training is equivalent to

And μ ═ min | | | F (w)^(c)(t))-F(w^*)||²：

zeta represents the unknown parameter in equation (3) i.e.:

5. The method for cloud edge federal learning in mobile edge computing according to claim 3, wherein the 2-1 selection of control parameters comprises the following specific steps:

first, a theoretically optimal local training round e is given^*：

wherein

Representing the first order gradient of the aggregated federal learning model,

wherein min_kMinimum value expressed in k-th round of training is equivalent to

And μ ═ min | | | F (w)^(c)(t))-F(w^*)||²：

zeta represents the unknown parameter in equation (3) i.e.:

6. The method for learning the cloud side federal in the mobile edge computing according to claim 1,2 or 5, wherein the 2-2 estimation optimal local training turns and the corresponding minimum communication turn comprise the following specific steps: from equations (1) and (4), the estimate of the optimal local iteration round corresponding to the minimum communication round is:

wherein Δ is an upper bound of the difference between the optimal weight and the suboptimal weight of the desired machine learning model; e.g. of the type^*The communication round is an optimal local training round;

7. The method for cloud edge federal learning in mobile edge computing according to claim 3, wherein the 2-2 estimated optimal local training rounds and the corresponding minimum communication rounds specifically comprise the following steps: from equations (1) and (4), the estimate of the optimal local iteration round corresponding to the minimum communication round is:

8. The method for cloud edge federal learning in mobile edge computing according to claim 4, wherein the 2-2 estimated optimal local training rounds and the corresponding minimum communication rounds specifically comprise the following steps: from equations (1) and (4), the estimate of the optimal local iteration round corresponding to the minimum communication round is: