CN115099420A

CN115099420A - Model aggregation weight dynamic distribution method for wireless federal learning

Info

Publication number: CN115099420A
Application number: CN202211032084.8A
Authority: CN
Inventors: 黄川�; 崔曙光; 郭玮
Original assignee: Chinese University of Hong Kong Shenzhen
Current assignee: Chinese University of Hong Kong Shenzhen
Priority date: 2022-08-26
Filing date: 2022-08-26
Publication date: 2022-09-23

Abstract

The invention discloses a model aggregation weight dynamic distribution method facing wireless federal learning, which comprises the following steps: s1. for having one data center andKthe data center of the wireless federal learning system of edge devices uses the model parameters

The wireless channel is broadcasted to all edge devices, and the edge devices estimate and update the received information to obtain model parameters

(ii) a S2, all edge devices update the model parameters through wireless uplink

Sending the data to a data center; and S3, constructing an objective function for influences caused by uplink and downlink wireless channel fading and additive noise in each iteration process, obtaining an optimization problem based on the minimized objective function and power constraint, and solving to obtain an optimal weight distribution scheme. The method can determine the weight of each edge device when the data center carries out model aggregation, and effectively ensures the accuracy of model aggregation in the wireless federal learning process.

Description

Model aggregation weight dynamic distribution method for wireless federal learning

Technical Field

The invention relates to wireless federal learning, in particular to a model aggregation weight dynamic distribution method facing wireless federal learning.

Background

A large number of wireless edge devices with ever-increasing computing and communication capabilities and the mass data generated by them can implement intelligent applications in wireless networks by cooperatively training machine learning models. Federal learning, a newly proposed promising distributed machine learning paradigm, allows all participating end devices to exchange model parameters only with the parameter server and save the raw data locally, thus protecting data privacy and security.

However, when the federal learning is applied in a wireless communication scenario, it consumes a lot of communication resources to serve the terminal devices, so that it is necessary to perform joint optimization design from the perspective of both communication and learning efficiency.

Disclosure of Invention

The invention aims to overcome the defects of the prior art, and provides a dynamic model aggregation weight distribution method for wireless federal learning, which can determine the weight of each edge device when a data center carries out model aggregation, and effectively ensures the accuracy of model aggregation in the wireless federal learning process.

The purpose of the invention is realized by the following technical scheme: a dynamic distribution method of model aggregation weights for wireless federal learning comprises the following steps:

s1. for a data center andKwireless federal learning system for edge devices, data center, and model parameters

Broadcasting the information to all edge devices through a wireless channel, and estimating and updating the edge devices according to the received information to obtain model parameters

；

S2, all edge devices update the model parameters through wireless uplink

Sending the data to a data center;

and S3, constructing an objective function for the influence caused by the uplink and downlink wireless channel fading and additive noise in each iteration process, and obtaining an optimization problem and solving the optimization problem to obtain an optimal weight distribution scheme based on the minimized objective function and the power constraint.

Further, the step S1 includes the following sub-steps:

s101, enabling the data center to use model parameters

Broadcasting to all edge devices through a wireless channel;

s102, arranging edge equipmentkThe received signal is

Wherein the content of the first and second substances,

representing data center to edge deviceskThe channel coefficient (c) of (a) is,

which represents the transmission power of the data center,

then representing a complex symmetric circular gaussian noise vector;

s103, edge equipmentkReceived signal

After that, the signal is divided by

Scaling is performed to estimate the original signal sent by the data center as a result of the estimation

；

At this time, the edge devicekWill estimate to obtain the result

As an initial result of the local training update, all edge devices pass throughEAfter the second local update, the updated model parameters are updated

Then sending the data back to the data center;

wherein, the local updating process is as follows:

wherein

It is indicated that the learning rate is,

is shown as

The number of the updating times is updated,

denotes the first

Secondary copyThe parameters of the model at the time of the update,

is shown as

Small batches of data randomly selected at the time of the second local update,

is shown in

Small batch gradients at secondary update; first, theESecond update, i.e.

Results obtained

I.e. the updated model parameters.

Further, the step S2 includes the following sub-steps:

s201. edge equipmentkPrecoding local model parameters, i.e. multiplying precoding factors

Wherein

Representing edge deviceskThe transmission power of the antenna is set to be,

representing edge deviceskThe channel coefficients to the data center are,

a complex symmetric circular gaussian noise vector is represented,

and

respectively representing conjugate transposition and modulus operation of complex numbers; k=1,2,…K;

s202, all edge devices transmit the local model parameters after pre-coding to the data center at the same time, and the signals received by the data center are calculated in the air

S203, the data center receives the signals

Multiplying by a scaling factor

The scaling factor being the inverse of the sum of all transmit powers, i.e.

(ii) a The final received signal of the data center is

。

Wherein

Is a dynamic model aggregation weight that satisfies

And is directly dependent on the upstream transmit power of the edge device.

Further, the step S3 includes the following sub-steps:

s301, calculating to obtain the influence caused by the uplink and downlink wireless channel fading and the additive noise in each iteration process as an objective function, and expressing the influence as follows:

wherein

A vector representing the components of the edge device transmit power,

the dimensions of the model are represented in a manner that,

the power of the gaussian complex noise is represented,

smooth coefficients representing loss functions, in the first term of the molecular part

Is expressed as

Wherein

Is expressed as

。

S302, optimizing the transmitting power of the edge devices by minimizing an objective function, wherein each edge device has independent power constraint, namely

Wherein

Is an edge device

The upper limit of the power of (a),

is expressed as

。

S303, based on the minimized objective function and the power constraint, obtaining the following original optimization problem

Obtaining the optimal power distribution vector by solving the original optimization problem

Thereby obtaining an optimal weight distribution scheme

。

The solving process of the original optimization problem comprises the following steps:

a1, introduction of auxiliary variables

Defining a new vector

Thereby converting the original optimization problem into the following problem

A2, carrying out variable replacement

The problem in step A1 translates into the following problem

Wherein

Are respectively represented as

；

A3, variable replacement is carried out again

The problem in step A2 is further converted into the following problem

The problem obtained by transformation in step A3 is a standard semi-definite relaxation problem, and the optimal solution is obtained by solving the problem

；

A4, solving to obtain an optimal solution

The optimal solution to the original optimization problem is later represented as

Wherein

，

Representation matrix

Delete the first

Column and first

Left after line

A matrix of sizes;

to obtain

Later, the optimal weight for model aggregation in the data center in each iteration process is

。

Preferably, the method for dynamically assigning weights further comprises:

at each round of iterative training

Edge device

Using the calculated transmit power after updating the local model

Apply the local model

Sending the data to a data center, and sending the data to equipment when the data center carries out model aggregation

Assigned weight of

Then the new global model obtained by the data center is

，

Wherein

Is an additive noise vector.

The invention has the beneficial effects that: the method can determine the weight of each edge device when the data center carries out model aggregation, and effectively ensures the accuracy of model aggregation in the wireless federal learning process. Moreover, the dynamic weight of the invention is obtained by directly optimizing the transmitting power of the equipment, so the weight not only balances the learning efficiency of wireless federal learning, but also balances the communication efficiency, and the invention is a simple and efficient learning-communication combined optimization design invention.

Drawings

FIG. 1 is a schematic diagram of a wireless federal learning system;

FIG. 2 is a flow chart of a method of the present invention;

FIG. 3 is a schematic diagram showing the variation of the test accuracy with the training times under independent and identically distributed data distribution;

FIG. 4 is a diagram showing the variation of the test accuracy with the training times under the non-independent and identically distributed data distribution.

Detailed Description

The technical solutions of the present invention are further described in detail below with reference to the accompanying drawings, but the scope of the present invention is not limited to the following.

Aiming at a federal learning algorithm deployed in a wireless communication system, the invention designs a dynamic distribution scheme of model aggregation weights. Which comprises the following steps: modeling uplink and downlink signal transmission between the data center and the edge equipment; designing the weight when the data center carries out model aggregation; and (4) an optimal dynamic weight distribution scheme. As shown in fig. 1, we consider that the federal learning algorithm is deployed in a wireless communication system, where the system includes a data center and a plurality of edge devices, the edge devices transmit a locally updated model to the data center through a wireless uplink for model aggregation, then the data center redistributes a global model obtained after aggregation to the edge devices through a wireless downlink for further update, and a global optimal model is obtained through training through multiple cooperative iterations between the data center and the edge devices;

consider having a data center andKa wireless federal learning system for individual edge devices as shown in fig. 1. In order to train and obtain a global machine learning model and protect the local data privacy of all edge devices, only model parameters are exchanged between a data center and the edge devices in the system through a wireless channel. The invention considers simulating wireless downlink and uplink transmission, wherein the wireless downlink transmission is used for global model distribution, the wireless uplink transmission is based on an air computing technology and used as the basis of weight design of model aggregation, and specifically:

as shown in fig. 2, a method for dynamically allocating model aggregation weights for wireless federal learning includes the following steps:

s1. for having one data center andKwireless federal learning system for edge devices, data center, and method for updating model parameters (after previous aggregation update)

Broadcast to all edge devices through wireless channels (i.e. wireless downlink transmission), and the edge devices estimate and update the model parameters according to the received information

：

S101, enabling the data center to use model parameters

Broadcasting to all edge devices through a wireless channel;

s102, arranging edge equipmentkThe received signal is

Wherein, the first and the second end of the pipe are connected with each other,

representing data center to edge deviceskThe channel coefficient (c) of (a) is,

which represents the transmission power of the data center,

then representing a complex symmetric circular gaussian noise vector;

s103, serving as edge equipmentkReceived signal

After that, the signal is divided by

Scaling to estimate the original signal sent by the data center with the result of the estimation

；

At this time, the edge devicekWill estimate to obtain the result

As an initial result of the local training update, all edge devices pass throughEAfter the local update, the updated model parameters are updated

Then sending the data back to the data center;

wherein, the local updating process is as follows:

wherein

It is indicated that the learning rate is,

is shown as

The number of the sub-update is updated,

is shown as

The model parameters at the time of the second local update,

is shown as

Small batches of data randomly selected at the time of the second local update,

is shown in

Small batch gradients at secondary update; first, theESecond update, i.e.

Results obtained

I.e. the updated model parameters.

The local updating strategy adopted by the invention is a small-batch random gradient descent method.

S2, all edge devices update the model parameters through wireless uplink

Sending to a data center (i.e., wireless uplink transmission):

s201. edge equipmentkPrecoding local model parameters, i.e. multiplying by a precoding factor

Wherein

Representing edge deviceskThe transmission power of the antenna is set to be,

representing edge deviceskThe channel coefficients to the data center are,

a complex symmetric circular gaussian noise vector is represented,

and

S203, the data center receives the signals

Multiplying by a scaling factor

The scaling factor being the inverse of the sum of all transmit powers,namely, it is

(ii) a The final received signal of the data center is

。

Wherein

Is a dynamic model aggregation weight that satisfies

And is directly dependent on the upstream transmit power of the edge device.

S3, constructing an objective function for the influence caused by the uplink and downlink wireless channel fading and additive noise in each iteration process, and obtaining an optimization problem and solving the optimization problem to obtain an optimal weight distribution scheme based on the minimized objective function and the power constraint:

and S301, due to fading of a wireless channel and the existence of additive noise, model parameters received by the edge device and the data center are inaccurate in the training process of the wireless federal learning system. Therefore, in order to reduce the influence of channel fading and additive noise on the training process, the invention designs the model aggregation weight which can be directly determined by the transmitting power of the edge device. Based on the weight, an optimal model aggregation weight distribution scheme can be obtained through further optimization. Due to the dynamic property of the wireless channel, the weight needs to be optimized in each round of training process, so the allocation scheme is a dynamic model aggregation weight allocation scheme.

Through convergence analysis, it can be deduced that the wireless federal learning system considered by the invention is passing throughTAfter the second iteration, the upper bound between the difference between the loss function value and the optimal loss function value can be used as a measure model in the iterationTAn indicator of the effectiveness of the training next time. We refer to this upper bound as the optimal spacing,the optimal interval is the minimum, which shows that the model effect obtained by training is better. The distribution of training set data, the variance of the random gradient, and channel fading and additive noise introduced by wireless communications of the wireless federal learning system considered by the present invention all affect the value of the optimal interval. In order to reduce the influence of wireless communication on model training, the part related to channel fading and additive noise in the optimal interval needs to be minimized, which can be expressed as the weighted sum of the influence caused by uplink and downlink wireless channel fading and additive noise in each iteration process. Therefore, the influence caused by the uplink and downlink wireless channel fading and the additive noise in each iteration process is only required to be minimized, and the part related to the wireless communication in the optimal interval can be minimized.

The influence caused by the uplink and downlink wireless channel fading and the additive noise in each iteration process is calculated and obtained as an objective function, and is expressed as:

wherein

A vector representing the components of the edge device transmit power,

the dimensions of the model are represented in a manner that,

which represents the power of the gaussian complex noise,

smooth coefficient representing loss function, in first term of molecular part

Is expressed as

Wherein

Is expressed as

。

Wherein

Is an edge device

The upper limit of the power of (a),

is expressed as

。

Obtaining an optimal power distribution vector by solving an original optimization problem

Thereby obtaining an optimal weight distribution scheme

。

a1, introduction of auxiliary variables

Defining a new vector

Thereby converting the original optimization problem into the following problem

A2, carrying out variable replacement

The problem in step A1 translates into the following problem

Wherein

Are respectively represented as

；

A3, variable replacement is carried out again

The problem in step a2 is further converted into the following problem

The problem obtained by conversion in step A3 is a standard semi-positive definite relaxation problem, and is solvedOptimal solution

；

A4, solving to obtain an optimal solution

Wherein

，

Representation matrix

Delete the first

Column and first

Left after line

A matrix of sizes;

to obtain

。

Preferably, the method for dynamically assigning weights further comprises:

at each round of iterative training

Edge device

Using the calculated transmit power after updating the local model

Apply the local model

Assigned weight of

Then the new global model obtained by the data center is

，

Wherein

Is an additive noise vector.

In the embodiment of the application, a simulation result is given to verify the model aggregation scheme of the invention. In addition to the model aggregation scheme proposed by the present invention, the federate learning algorithm and the truncated channel inversion algorithm under the ideal channel also serve as a comparison scheme. In the simulation, we trained the convolutional neural network to recognize the MNIST data set, and the criteria were evaluated but the accuracy was tested. The simulation parameters are set as follows:

，

the uplink and downlink channels are modeled as independent and identically distributed Rayleigh fading channels, namely complex symmetrical round Gaussian with zero mean unit varianceThe variables are the variables of the process,

，

，

，

. Consider that each edge device has 800 training data.

First, the effect of the proposed scheme of the present invention in the case of independent co-distributed data is examined, as shown in fig. 3. The result shows that with the increase of the iteration times, the test accuracy of the scheme provided by the invention gradually rises and finally converges, and the convergence process almost coincides with the federal learning algorithm under an ideal channel, so that the test accuracy can reach 92.75% finally, and the effectiveness of the dynamic aggregation weight distribution scheme provided by the invention can be demonstrated. In addition, the scheme provided by the invention has better test accuracy than the existing scheme. Then, the effect of the proposed scheme of the present invention in case of non-independent co-distributed data is examined, as shown in fig. 4. The results also show that the test accuracy of the proposed scheme gradually rises and finally converges as the number of iterations increases, but in the case of non-independent identically distributed data, the convergence curve of the proposed scheme is slightly lower than that of the ideal federal learning algorithm, but still significantly better than that of the other existing schemes.

The foregoing is a preferred embodiment of the present invention, and it is to be understood that the invention is not limited to the form disclosed herein, but is not intended to be foreclosed in other embodiments and may be used in other combinations, modifications, and environments and is capable of changes within the scope of the inventive concept as expressed herein, commensurate with the above teachings, or the skill or knowledge of the relevant art. And that modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A model aggregation weight dynamic distribution method facing wireless federal learning is characterized in that: the method comprises the following steps:

；

S2, all edge devices update the model parameters through wireless uplink

Sending the data to a data center;

2. The method for dynamically allocating model aggregation weights for wireless federal learning according to claim 1, wherein: the step S1 includes the following sub-steps:

s101, enabling the data center to use model parameters

Broadcasting to all edge devices through a wireless channel;

s102, arranging edge equipmentkThe received signal is

Wherein the content of the first and second substances,

representing data center to edge deviceskThe channel coefficients of (a) are determined,

which represents the transmission power of the data center,

then representing a complex symmetric circular gaussian noise vector;

s103, serving as edge equipmentkReceived signal

After that, the signal is divided by

；

At this time, the edge devicekWill estimate to obtain the result

Then sending the data back to the data center;

wherein, the local updating process is as follows:

wherein

It is indicated that the learning rate is,

is shown as

The number of the sub-update is updated,

is shown as

The model parameters at the time of the second local update,

is shown as

Small batches of data randomly selected at the time of the second local update,

is shown in

Small batch gradients at secondary update; first, theESecond update, i.e.

Results obtained

I.e. the updated model parameters.

3. The method for dynamically allocating the model aggregation weights for wireless federal learning according to claim 1, wherein: the step S2 includes the following sub-steps:

Wherein

Representing edge deviceskThe transmission power of the antenna is set to be,

representing edge deviceskChannel coefficients to the data center, n representing a complex symmetric circular gaussian noise vector,

and

S203, the data center receives the signals

Multiplying by a scaling factor

The scaling factor being the inverse of the sum of all transmit powers, i.e.

(ii) a The final received signal of the data center is

；

Wherein

Are dynamic model aggregation weights that satisfy

And is directly dependent on the upstream transmit power of the edge device.

4. The method for dynamically allocating the model aggregation weights for wireless federal learning according to claim 1, wherein: the step S3 includes the following sub-steps:

wherein

A vector representing the components of the edge device transmit power,

the dimensions of the model are represented in a manner that,

the power of the gaussian complex noise is represented,

smooth coefficient representing loss function, in first term of molecular part

Is expressed as

Wherein

Is expressed as

；

Wherein

Is an edge device

The upper limit of the power of (a),

is expressed as

；

Thereby obtaining an optimal weight distribution scheme

。

5. The method for dynamically allocating the model aggregation weights for wireless federal learning according to claim 4, wherein: the solving process of the original optimization problem comprises the following steps:

a1, introduction of auxiliary variables

Defining a new vector

Thereby converting the original optimization problem into the following one

A2, carrying out variable replacement

The problem in step A1 translates into the following problem

Wherein

Are respectively represented as

；

A3, carrying out variable replacement again

The problem in step A2 is further converted into the following problem

；

A4, solving to obtain an optimal solution

Wherein

，

Representation matrix

Delete the first

Column and first

Left after line

A matrix of sizes;

to obtain

。

6. The method for dynamically allocating the model aggregation weights for wireless federal learning according to claim 1, wherein: the dynamic weight distribution method further comprises the following steps:

at each round of iterative training

Edge device

Using the calculated transmit power after updating the local model

Apply the local model

Send to a data centerEquipment for model aggregation in data center

Assigned weight of

Then the new global model obtained by the data center is

，

Wherein

Is an additive noise vector.