CN114912705A - Optimization method for heterogeneous model fusion in federated learning

Info

Publication number
CN114912705A
Authority
CN
China
Prior art keywords
local
model
client
optimization
aggregation
Prior art date
Legal status
Pending
Application number
CN202210615296.2A
Other languages
Chinese (zh)
Inventor
邵雨蒙
李骏
马川
时龙
王喆
张�杰
Current Assignee
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202210615296.2A priority Critical patent/CN114912705A/en
Publication of CN114912705A publication Critical patent/CN114912705A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses an optimization method for heterogeneous model fusion in federated learning, which comprises the following steps: in the federated learning system, each client uploads its local model together with its own number of local iterations; the server then optimizes the heterogeneous-model aggregation process in federated learning according to the iteration counts uploaded by the clients: during global aggregation, the server optimizes the aggregation weight of each client according to the collected local iteration counts, where the optimization includes linear weight optimization, square weight optimization, and differential linear weight optimization. A heterogeneous model is a model that exhibits heterogeneity after local training in a federated learning system, owing to differences in computing power and data volume among clients; the number of local iterations refers to the total number of local training iterations a client performs within the specified training time. The invention improves the learning performance of the federated learning system.

Description

Optimization method for heterogeneous model fusion in federated learning
Technical Field
The invention relates to the technical field of machine learning, and in particular to an optimization method for heterogeneous model fusion in federated learning.
Background
In the field of artificial intelligence, data is the basis of machine learning. In most industries, data exists in isolated islands owing to industry competition, privacy and security concerns, complex administrative procedures, and similar problems. Even centralized integration of data among different departments of the same company faces significant resistance. In reality, integrating data scattered across places and organizations is almost impossible, or prohibitively expensive. With the further development of artificial intelligence, attaching importance to data privacy and security has become a worldwide trend.
Federated learning is a machine learning architecture that aims to help multiple organizations use data and build machine learning models while meeting the requirements of user privacy protection, data security, and government regulations, so as to achieve better learning effectiveness and privacy protection. Compared with the traditional distributed machine learning framework, federated learning exchanges trained models rather than the raw training data, so the privacy of private data is ensured without reducing the learning effect. Federated learning therefore enables users to participate in joint learning while keeping their data private, collaborating to complete a common target task.
However, since each user in federated learning has its own local data and local training computation capability, this heterogeneity of device resources leads to heterogeneity among the models trained locally within the same training time: some clients' models undergo more training and others less. Such heterogeneous models pose a new challenge for model fusion at the server end. Traditional federated learning fusion algorithms, such as the federated averaging algorithm proposed by Google, as well as the more recent federated approximate regularization algorithm and the federated asynchronous algorithm, do not consider the heterogeneous-model aggregation problem caused by heterogeneous device resources, and therefore have significant shortcomings when applied in real scenarios.
Disclosure of Invention
The invention aims to provide an optimization method for heterogeneous model fusion in federated learning that solves the problem of training-model heterogeneity caused by client resource heterogeneity in a federated learning architecture, so that the system can optimize its aggregation algorithm according to the local iteration counts uploaded by users, thereby improving the learning performance and efficiency of the system.
The technical solution for realizing the purpose of the invention is as follows: an optimization method for heterogeneous model fusion in federated learning, comprising the following steps:
in the federated learning system, each client uploads its local model together with its own number of local iterations;
the server optimizes the heterogeneous-model aggregation process in federated learning according to the iteration counts uploaded by the clients: during global aggregation, the server optimizes the aggregation weight of each client according to the collected local iteration counts, where the optimization includes linear weight optimization, square weight optimization, and differential linear weight optimization.
Further, the federated learning system includes centralized federated learning systems and decentralized federated learning systems.
Further, the heterogeneous model refers to: in the federated learning system, owing to differences in computing power and data volume among clients, the models generated after local training exhibit heterogeneity.
Further, the number of local iterations refers to the total number of local training iterations a client performs within the specified training time when training its local model, measured, for example, as the number of gradient-descent steps, the number of times the same sample is trained, or the number of passes over the entire local data set.
Further, the client uploading the local model together with its own local iteration count means: when the client communicates with the server and uploads its local model, it must simultaneously upload the value of its local training iteration count for the current communication round.
Further, heterogeneous-model aggregation means that the server performs model fusion to update the global model from the collected local models of all clients; aggregation is realized by an aggregation algorithm, such as the federated averaging algorithm, the federated approximate regularization algorithm, or the federated asynchronous algorithm, to which the proposed optimization applies.
Further, in the federated learning system, client local training is specifically as follows:

Suppose N clients participate in training in the federated learning system, where the model parameters uploaded by the i-th client before the t-th global aggregation are denoted $\omega_i^t$ and the number of local iterations is denoted $\tau_i^t$; the two parameters must be uploaded to the server together at the t-th aggregation.

For each client, the purpose of training is to obtain the optimal learning model on the basis of its own data, expressed as

$\omega_i^* = \arg\min_{\omega_i} F_i(\omega_i)$

where $\omega_i^*$ represents the optimal local model parameters, $F_i$ is the local objective function of the i-th client, and $\omega_i$ is the local model parameter of the i-th client.
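To make this procedure concrete, the following is a minimal Python sketch of one client's local round, assuming a PyTorch model and data loader; the function name local_train, the wall-clock budget time_budget_s, and counting one gradient-descent step as one iteration are illustrative assumptions, not the patent's prescribed implementation.

```python
# Hypothetical sketch of client-side local training: the client runs SGD for as
# many iterations as its resources allow within the specified training time,
# then uploads both the trained parameters (omega_i^t) and the iteration count
# it actually completed (tau_i^t).
import time
import torch

def local_train(model, data_loader, loss_fn, lr, time_budget_s):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    tau = 0                      # tau_i^t: local iterations completed this round
    start = time.monotonic()
    while time.monotonic() - start < time_budget_s:
        for x, y in data_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            tau += 1             # one gradient-descent step counts as one iteration
            if time.monotonic() - start >= time_budget_s:
                break
    # upload payload: local model omega_i^t together with tau_i^t
    return {"params": model.state_dict(), "iterations": tau}
```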
Further, the server optimizes the heterogeneous-model aggregation process in federated learning according to the iteration counts uploaded by the clients, specifically as follows:

After receiving $\omega_i^t$ and $\tau_i^t$ from all clients, the server performs model fusion to generate the global model, represented as

$\omega^t = \sum_{i=1}^{N} p_i^t \, \omega_i^t$

where $\omega^t$ represents the global model parameters generated by the t-th aggregation and $p_i^t$ represents the aggregation weight that the server adjusts at the t-th aggregation according to the $\tau_i^t$ uploaded by the i-th client; the weights satisfy the normalization criterion

$\sum_{i=1}^{N} p_i^t = 1$

The optimization of the aggregation weights includes linear weight optimization, square weight optimization, and differential linear weight optimization, so as to better optimize the global objective function.
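As an illustration of the fusion rule above, the following is a minimal sketch assuming the client upload format from the previous sketch; compute_weights is a hypothetical name standing for any of the three weight optimizations, sketched below.

```python
# Hypothetical sketch of the server-side fusion omega^t = sum_i p_i^t * omega_i^t,
# where the weights p_i^t are computed from the uploaded iteration counts tau_i^t
# and satisfy sum_i p_i^t = 1.
def aggregate(uploads, compute_weights):
    taus = [u["iterations"] for u in uploads]        # tau_i^t from each client
    weights = compute_weights(taus)                  # p_i^t, normalized to sum to 1
    keys = uploads[0]["params"].keys()
    global_params = {}
    for k in keys:
        global_params[k] = sum(
            w * u["params"][k].float() for w, u in zip(weights, uploads)
        )
    return global_params                             # omega^t
```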
Further, the optimization of the aggregation weight includes linear weight optimization, square weight optimization, and differential linear weight optimization, wherein:

the linear weight optimization is expressed as

$p_i^t = \frac{\tau_i^t}{\sum_{j=1}^{N} \tau_j^t}$

the square weight optimization is expressed as

$p_i^t = \frac{(\tau_i^t)^2}{\sum_{j=1}^{N} (\tau_j^t)^2}$

the differential linear optimization is expressed as

$p_i^t - p_j^t = C\,(\tau_i^t - \tau_j^t)$

where $p_i^t - p_j^t$ represents the weight difference between two different clients $i, j$, and $C$ represents the weight-optimization proportionality coefficient, chosen for different federated learning systems and different training targets.
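The three schemes can be sketched as follows. This is a minimal sketch under the reconstruction above: the normalized forms of the linear and square schemes follow from the normalization criterion, and for the differential linear scheme the sketch solves the pairwise difference constraints together with $\sum_i p_i^t = 1$, which gives $p_i^t = 1/N + C(\tau_i^t - \bar{\tau}^t)$; in this reading, $C$ must be small enough that every weight remains non-negative.

```python
# Hypothetical sketches of the three aggregation-weight optimizations. The
# normalized linear and square forms, and the closed form of the differential
# linear scheme, are reconstructions consistent with sum_i p_i^t = 1.

def linear_weights(taus):
    total = sum(taus)
    return [t / total for t in taus]            # p_i^t proportional to tau_i^t

def square_weights(taus):
    total = sum(t * t for t in taus)
    return [t * t / total for t in taus]        # p_i^t proportional to (tau_i^t)^2

def differential_linear_weights(taus, C):
    # Enforcing p_i - p_j = C * (tau_i - tau_j) for all pairs, together with
    # sum_i p_i = 1, yields p_i = 1/N + C * (tau_i - mean(tau)).
    n = len(taus)
    mean_tau = sum(taus) / n
    return [1.0 / n + C * (t - mean_tau) for t in taus]
```

For example, with iteration counts [2, 4, 6], linear_weights yields [1/6, 1/3, 1/2], so a client that completed more local iterations receives proportionally more weight in the fusion.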
Further, with the aggregation weights that the server adjusts for each client at the t-th aggregation, the global objective function is expressed as

$F(\omega) = \sum_{i=1}^{N} p_i^t \, F_i(\omega)$

where $F$ represents the global objective function of the server;

the server adjusts the aggregation weights according to the local iteration count of each client so as to obtain the optimal global learning model, expressed as

$\omega^* = \arg\min_{\omega} F(\omega)$

where $\omega^*$ represents the optimal global learning model parameters.
Compared with the prior art, the invention has the following notable advantages: (1) each user uploads its local iteration count together with its local model, so the server can optimize model fusion according to the collected iteration counts of all users, which improves the learning performance of federated learning and lets the system output a global model with better performance; (2) the method improves the test accuracy of the federated learning system in real scenarios and the overall performance of the system, and has very broad application prospects.
Drawings
FIG. 1 is a schematic diagram illustrating the optimization method of heterogeneous model fusion in federated learning according to the present invention.
FIG. 2 is a graphical illustration of the results on a public federated learning data set in an embodiment of the present invention.
Detailed Description
Aiming at the problem of training-model heterogeneity caused by client resource heterogeneity in the existing federated learning framework, the invention designs an optimization method for heterogeneous model fusion, so that the system can optimize its aggregation algorithm according to the local iteration counts uploaded by users, thereby improving the learning performance and efficiency of the system.
With reference to FIG. 1, the present invention provides an optimization method for heterogeneous model fusion in federated learning, which includes:
in the federated learning system, each client uploads its local model together with its own number of local iterations;
the server optimizes the heterogeneous-model aggregation process in federated learning according to the iteration counts uploaded by the clients: during global aggregation, the server optimizes the aggregation weight of each client according to the collected local iteration counts, where the optimization includes linear weight optimization, square weight optimization, and differential linear weight optimization.
As a specific example, the federated learning system includes centralized federated learning systems and decentralized federated learning systems. Federated learning is a machine learning framework that can effectively help multiple parties use data and build machine learning models while meeting the requirements of user privacy protection, data security, and government regulations; it includes vertical federated learning, horizontal federated learning, federated transfer learning, centralized federated learning, and decentralized federated learning. A user refers to a device capable of performing computation, such as a mobile terminal, a computer, or an edge router.
As a specific example, the heterogeneous model refers to: in the federated learning system, owing to differences in computing power and data volume among clients, the models generated after local training exhibit heterogeneity.
Furthermore, a model is a data-processing procedure, embodied mathematically as a function and physically as a section of code; after data passes through the operations of this code (such as addition, subtraction, multiplication, division, or other operation steps), a corresponding output is obtained. Examples include the model parameters of a support vector machine, of a multilayer perceptron, of a neural network, and of reinforcement learning.
As a specific example, the number of local iterations refers to the total number of local training iterations a client performs within the specified training time when training its local model, measured, for example, as the number of gradient-descent steps, the number of times the same sample is trained, or the number of passes over the entire local data set.
Further, the value of the local iteration count refers to the multiple iterations of machine learning model training that a user performs within the specified time using its own computing resources and data samples; the number of times all data samples are trained over is called the local iteration count, which can also be measured by the number of gradient-descent steps or the number of training passes the same data samples undergo.
As a specific example, the client uploading the local model together with its own local iteration count means: when the client communicates with the server and uploads its local model, it must simultaneously upload the value of its local training iteration count for the current communication round.
As a specific example, heterogeneous-model aggregation means that the server performs model fusion to update the global model from the collected local models of all clients; aggregation is realized by an aggregation algorithm, such as the federated averaging algorithm, the federated approximate regularization algorithm, or the federated asynchronous algorithm.
Further, model fusion means that the server processes the collected local models to generate a global model and then distributes the updated global model to every user, ensuring normal operation of the federated learning system. The server is the device that coordinates communication and model sharing among users in the federated learning system; it can be connected to the users through wireless or wired networks and includes base stations, edge servers, and cloud servers.
As a specific example, in the federated learning system, client local training is specifically as follows:

Suppose N clients participate in training in the federated learning system, where the model parameters uploaded by the i-th client before the t-th global aggregation are denoted $\omega_i^t$ and the number of local iterations is denoted $\tau_i^t$; the two parameters must be uploaded to the server together at the t-th aggregation.

For each client, training aims to obtain the optimal learning model based on its own data, expressed as

$\omega_i^* = \arg\min_{\omega_i} F_i(\omega_i)$

where $\omega_i^*$ represents the optimal local model parameters, $F_i$ is the local objective function of the i-th client, and $\omega_i$ is the local model parameter of the i-th client.
As a specific example, the server optimizes the heterogeneous-model aggregation process in federated learning according to the iteration counts uploaded by the clients, specifically as follows:

After the server receives $\omega_i^t$ and $\tau_i^t$ uploaded by all clients, model fusion is performed to generate the global model, represented as

$\omega^t = \sum_{i=1}^{N} p_i^t \, \omega_i^t$

where $\omega^t$ represents the global model parameters generated by the t-th aggregation and $p_i^t$ represents the aggregation weight that the server adjusts at the t-th aggregation according to the $\tau_i^t$ uploaded by the i-th client; the weights satisfy the normalization criterion

$\sum_{i=1}^{N} p_i^t = 1$

The optimization of the aggregation weights includes linear weight optimization, square weight optimization, and differential linear weight optimization, so as to better optimize the global objective function.
As a specific example, the optimization of the aggregation weight includes linear weight optimization, square weight optimization, and differential linear weight optimization, wherein:

the linear weight optimization is expressed as

$p_i^t = \frac{\tau_i^t}{\sum_{j=1}^{N} \tau_j^t}$

the square weight optimization is expressed as

$p_i^t = \frac{(\tau_i^t)^2}{\sum_{j=1}^{N} (\tau_j^t)^2}$

the differential linear optimization is expressed as

$p_i^t - p_j^t = C\,(\tau_i^t - \tau_j^t)$

where $p_i^t - p_j^t$ represents the weight difference between two different clients $i, j$, and $C$ represents the weight-optimization proportionality coefficient for different federated learning systems and different training targets.
As a specific example, with the aggregation weights that the server adjusts for each client at the t-th aggregation, the global objective function is expressed as

$F(\omega) = \sum_{i=1}^{N} p_i^t \, F_i(\omega)$

where $F$ represents the global objective function of the server;

the server adjusts the aggregation weights according to the local iteration count of each client so as to obtain the optimal global learning model, expressed as

$\omega^* = \arg\min_{\omega} F(\omega)$

where $\omega^*$ represents the optimal global learning model parameters.
Furthermore, the learning performance and efficiency of the system refer to the test accuracy of the global model finally output by the federated learning system on a standard test data set and the time and resources consumed to reach it, including the test accuracy on a handwritten digit data set, and the test accuracy, running time, and processor cycles consumed on an open data set.
The invention is described in further detail below with reference to the figures and the embodiments.
Example 1
This embodiment provides an optimization method for heterogeneous model fusion in federated learning, applied to a handwritten digit data set.
In a practical federated learning system, individual users have different computing power and data set sizes, and the computing resources and data available to each user may even change over time. As a result, users complete different numbers of training iterations within the same training time: for example, a user with greater computing power completes more iterations, while a user with a larger data set completes fewer passes over it. This ultimately causes heterogeneity among the local models the users upload. If the server wants to fuse these heterogeneous models and have the system output a global model with good performance, it must account for the model heterogeneity caused by the users' heterogeneous resources. Traditional model fusion schemes such as the federated averaging algorithm, the federated approximate regularization algorithm, and the federated asynchronous algorithm do not consider model heterogeneity, and their training results on real data sets are unsatisfactory. Addressing this, the invention requires each user to upload, together with its local model, the number of local iterations used to train that model, so that the server can use this information to optimize the fusion of heterogeneous models, selecting a suitable optimization method to adjust the aggregation weights according to the scenario and task requirements, thereby improving the learning performance and efficiency of the system.
The embodiment is divided into the following steps:
step 1, local training of client
This example takes training a handwritten digit recognizer as an example. First, each client participating in federated learning locally trains on its private handwritten digit data set according to its own computing capacity and data volume, using stochastic gradient descent with the same learning rate and batch size, and the same convolutional neural network (CNN) as the model base. The network has two convolutional layers: the first is a 10-channel convolutional layer with 5 × 5 kernels, and the second is a 20-channel convolutional layer with 5 × 5 kernels; each convolutional layer is activated by a ReLU function and followed by a 2 × 2 max-pooling layer. The convolutional layers are followed by two fully connected layers (320 × 10), and finally a softmax function outputs the 10-class prediction.
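The following is a minimal PyTorch sketch of this client network, assuming 1 × 28 × 28 handwritten-digit inputs; the patent states the fully connected stage only as (320 × 10), so the hidden width of 50 between the two fully connected layers is an illustrative assumption.

```python
# Hypothetical sketch of the client CNN described above. After two 5x5
# convolutions and two 2x2 max-pools, a 28x28 input yields 20 x 4 x 4 = 320
# features, matching the 320-dimensional fully connected stage.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DigitCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)   # 10-channel 5x5 conv
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)  # 20-channel 5x5 conv
        self.fc1 = nn.Linear(320, 50)   # hidden width 50 is an assumption
        self.fc2 = nn.Linear(50, 10)    # 10-class output

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # ReLU then 2x2 max pool
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(x.size(0), -1)                    # flatten to 320 features
        x = F.relu(self.fc1(x))
        return F.log_softmax(self.fc2(x), dim=1)     # 10-way prediction
```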
Step 2, uploading local information
After finishing local training, each client uploads its trained local model parameters $\omega_i^t$, together with the number of local iterations $\tau_i^t$ it ran by the time training completed, to the server. After the server has received the uploads from all clients, it can optimize the aggregation weights according to $\omega_i^t$ and $\tau_i^t$ and thereby output a global model.
Step 3, optimization of aggregation weight
After the server receives $\omega_i^t$ and $\tau_i^t$ uploaded by all clients, it can, based on $\tau_i^t$ and the normalization criterion for the aggregation weights $\sum_{i=1}^{N} p_i^t = 1$, select a suitable optimization method to adjust the aggregation weights so that the global objective function $F(\omega^t)$ becomes smaller and the final output approaches the theoretically optimal global model parameters $\omega^*$.

Optimization methods for adjusting the weights include, but are not limited to, linear weight optimization, square weight optimization, and differential linear weight optimization.
Wherein, the linear weight optimization can be expressed as:

$p_i^t = \frac{\tau_i^t}{\sum_{j=1}^{N} \tau_j^t}$

the square weight optimization can be expressed as:

$p_i^t = \frac{(\tau_i^t)^2}{\sum_{j=1}^{N} (\tau_j^t)^2}$

the differential linear optimization can be expressed as:

$p_i^t - p_j^t = C\,(\tau_i^t - \tau_j^t)$

where $p_i^t - p_j^t$ represents the weight difference between two different clients and $C$ represents the weight-optimization proportionality coefficient for different federated learning systems and training targets. From the differential linear optimization formula together with the aggregation-weight normalization criterion, the aggregation weight the server adjusts for each client at the t-th aggregation can be obtained.
Step 4, generating a global model and updating local models of all clients
After the server adjusts the aggregation weights, it generates the current global model $\omega^t$ according to the federated learning aggregation rule $\omega^t = \sum_{i=1}^{N} p_i^t \omega_i^t$, and then issues the generated global model to all clients so that each client updates its local model with the global model. Once the clients' local models have been updated, the next round of local training can begin.
Step 5, repeat steps 1 to 4 until the communication round reaches a preset value. The federated learning system then outputs the final global model as the training result of the entire system.
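Putting steps 1 to 5 together, one full training run can be sketched as follows, reusing the hypothetical local_train, aggregate, and linear_weights functions from the earlier sketches; this is an illustration of the flow, not the patent's reference implementation.

```python
# Hypothetical end-to-end loop for the federated system: each round, every
# client trains locally and uploads (omega_i^t, tau_i^t); the server optimizes
# the aggregation weights from the tau_i^t, fuses a global model, and
# redistributes it to all clients.
import copy
import torch.nn.functional as F

def run_federated(client_loaders, global_model, rounds, lr, time_budget_s):
    for t in range(rounds):
        uploads = []
        for loader in client_loaders:                # steps 1-2: local training + upload
            local_model = copy.deepcopy(global_model)
            uploads.append(local_train(local_model, loader, F.nll_loss,
                                       lr, time_budget_s))
        # steps 3-4: weight optimization and fusion (linear scheme shown)
        global_params = aggregate(uploads, linear_weights)
        global_model.load_state_dict(global_params)  # clients update next round
    return global_model                              # final global model output
```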
The experimental results of this embodiment are shown in FIG. 2. In FIG. 2, all experiments were run with each user's computing resources and data set size changing dynamically over time, and the resulting test accuracy was measured on a standard handwritten digit data set. The line marked with squares shows the test performance of the federated averaging algorithm on the system; the line marked with circles shows the test performance of the federated approximate regularization algorithm; the line marked with diamonds shows the test performance of the federated asynchronous algorithm; and the line marked with triangles shows the test performance of the system using the optimization algorithm of the present invention. The results show that the proposed method clearly improves the test accuracy of the federated learning system in a realistic scenario and the overall performance of the system, and has very broad application prospects.

Claims (10)

1. An optimization method for heterogeneous model fusion in federated learning, characterized by comprising the following steps:
in the federated learning system, each client uploads its local model together with its own number of local iterations;
the server optimizes the heterogeneous-model aggregation process in federated learning according to the iteration counts uploaded by the clients: during global aggregation, the server optimizes the aggregation weight of each client according to the collected local iteration counts, wherein the optimization includes linear weight optimization, square weight optimization, and differential linear weight optimization.
2. The method for optimizing heterogeneous model fusion in federated learning according to claim 1, wherein the federated learning system includes centralized federated learning systems and decentralized federated learning systems.
3. The method for optimizing heterogeneous model fusion in federated learning according to claim 1, wherein the heterogeneous model refers to: in the federated learning system, owing to differences in computing power and data volume among clients, the models generated after local training exhibit heterogeneity.
4. The method for optimizing heterogeneous model fusion in federated learning according to claim 1, wherein the number of local iterations refers to the total number of local training iterations a local client performs within the specified training time when training its local model, including the number of gradient-descent steps, the number of times the same sample is trained, and the number of passes over the entire local data set.
5. The optimization method for heterogeneous model fusion in federated learning according to claim 1, wherein the client uploading the local model together with its own local iteration count means: when the client communicates with the server and uploads the local model, it must simultaneously upload the value of its local training iteration count for the current communication round.
6. The method for optimizing heterogeneous model fusion in federated learning according to claim 1, wherein heterogeneous-model aggregation means that the server performs model fusion to update the global model from the collected local models of all clients, aggregation being realized by an aggregation algorithm, which includes the federated averaging algorithm, the federated approximate regularization algorithm, and the federated asynchronous algorithm.
7. The optimization method for heterogeneous model fusion in federated learning according to any one of claims 1 to 6, wherein, in the federated learning system, client local training is specifically as follows:
suppose N clients in total participate in training in the federated learning system, where the model parameters uploaded by the i-th client before the t-th global aggregation are $\omega_i^t$ and the number of local iterations is $\tau_i^t$, the two parameters being uploaded to the server together at the t-th aggregation;
for each client, the purpose of training is to obtain the optimal learning model on the basis of its own data, expressed as

$\omega_i^* = \arg\min_{\omega_i} F_i(\omega_i)$

where $\omega_i^*$ represents the optimal local model parameters, $F_i$ is the local objective function of the i-th client, and $\omega_i$ is the local model parameter of the i-th client.
8. The method for optimizing heterogeneous model fusion in federated learning according to claim 7, wherein the server optimizes the heterogeneous-model aggregation process in federated learning according to the iteration counts uploaded by the clients, specifically as follows:
after the server receives $\omega_i^t$ and $\tau_i^t$ uploaded by all clients, model fusion is performed to generate the global model, represented as

$\omega^t = \sum_{i=1}^{N} p_i^t \, \omega_i^t$

where $\omega^t$ represents the global model parameters generated by the t-th aggregation and $p_i^t$ represents the aggregation weight that the server adjusts at the t-th aggregation according to the $\tau_i^t$ uploaded by the i-th client, the weights satisfying the normalization criterion

$\sum_{i=1}^{N} p_i^t = 1$

and the optimization of the aggregation weights includes linear weight optimization, square weight optimization, and differential linear weight optimization, so as to better optimize the global objective function.
9. The method of optimizing heterogeneous model fusion in federated learning of claim 8, wherein the optimization of the aggregation weight includes linear weight optimization, square weight optimization, and differential linear weight optimization, wherein:
the linear weight optimization is expressed as

$p_i^t = \frac{\tau_i^t}{\sum_{j=1}^{N} \tau_j^t}$

the square weight optimization is expressed as

$p_i^t = \frac{(\tau_i^t)^2}{\sum_{j=1}^{N} (\tau_j^t)^2}$

the differential linear optimization is expressed as

$p_i^t - p_j^t = C\,(\tau_i^t - \tau_j^t)$

where $p_i^t - p_j^t$ represents the weight difference between two different clients $i, j$, and $C$ represents the weight-optimization proportionality coefficient for different federated learning systems and different training targets.
10. The method for optimizing heterogeneous model fusion in federated learning of claim 9, wherein, according to the aggregation weights the server adjusts for each client at the t-th aggregation, the global objective function is expressed as

$F(\omega) = \sum_{i=1}^{N} p_i^t \, F_i(\omega)$

where $F$ represents the global objective function of the server;

and the server adjusts the aggregation weights according to the local iteration count of each client to obtain the optimal global learning model, expressed as

$\omega^* = \arg\min_{\omega} F(\omega)$

where $\omega^*$ represents the optimal global learning model parameters.
CN202210615296.2A 2022-06-01 2022-06-01 Optimization method for heterogeneous model fusion in federated learning Pending CN114912705A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210615296.2A CN114912705A (en) 2022-06-01 2022-06-01 Optimization method for heterogeneous model fusion in federated learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210615296.2A CN114912705A (en) 2022-06-01 2022-06-01 Optimization method for heterogeneous model fusion in federated learning

Publications (1)

Publication Number Publication Date
CN114912705A true CN114912705A (en) 2022-08-16

Family

ID=82770316

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210615296.2A Pending CN114912705A (en) 2022-06-01 2022-06-01 Optimization method for heterogeneous model fusion in federated learning

Country Status (1)

Country Link
CN (1) CN114912705A (en)


Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115659212A (en) * 2022-09-27 2023-01-31 南京邮电大学 Federal learning efficiency evaluation method based on TDD communication under cross-domain heterogeneous scene
CN115659212B (en) * 2022-09-27 2024-04-09 南京邮电大学 Federal learning efficiency evaluation method based on TDD communication under cross-domain heterogeneous scene
CN115277264A (en) * 2022-09-28 2022-11-01 季华实验室 Subtitle generating method based on federal learning, electronic equipment and storage medium
CN115496204A (en) * 2022-10-09 2022-12-20 南京邮电大学 Evaluation method and device for federal learning in cross-domain heterogeneous scene
CN115496204B (en) * 2022-10-09 2024-02-02 南京邮电大学 Federal learning-oriented evaluation method and device under cross-domain heterogeneous scene
CN115526313B (en) * 2022-10-11 2023-10-24 南京邮电大学 Prediction precision-based cross-domain heterogeneous federal learning architecture operation efficiency evaluation method
CN115391734A (en) * 2022-10-11 2022-11-25 广州天维信息技术股份有限公司 Client satisfaction analysis system based on federal learning
CN115526313A (en) * 2022-10-11 2022-12-27 南京邮电大学 Method for evaluating operation efficiency of cross-domain heterogeneous federated learning architecture based on prediction precision
CN115391734B (en) * 2022-10-11 2023-03-10 广州天维信息技术股份有限公司 Client satisfaction analysis system based on federal learning
CN115618963A (en) * 2022-10-20 2023-01-17 重庆移通学院 Wireless federal learning asynchronous training method based on optimized direction guidance
CN115618963B (en) * 2022-10-20 2023-07-14 重庆移通学院 Wireless federal learning asynchronous training method based on optimized direction guidance
WO2024093561A1 (en) * 2022-11-04 2024-05-10 大唐移动通信设备有限公司 Model training method and apparatus, model testing method and apparatus, and storage medium
CN115660075B (en) * 2022-11-10 2023-06-20 中国石油大学(华东) Asynchronous federal reinforcement learning method, device and storage medium
CN115660075A (en) * 2022-11-10 2023-01-31 中国石油大学(华东) Asynchronous federal reinforcement learning method, equipment and storage medium
CN115761378B (en) * 2022-12-07 2023-08-01 东南大学 Power inspection image classification and detection method and system based on federal learning
CN115761378A (en) * 2022-12-07 2023-03-07 东南大学 Power inspection image classification and detection method and system based on federal learning
CN116502709A (en) * 2023-06-26 2023-07-28 浙江大学滨江研究院 Heterogeneous federal learning method and device
CN116522988A (en) * 2023-07-03 2023-08-01 粤港澳大湾区数字经济研究院(福田) Federal learning method, system, terminal and medium based on graph structure learning
CN116522988B (en) * 2023-07-03 2023-10-31 粤港澳大湾区数字经济研究院(福田) Federal learning method, system, terminal and medium based on graph structure learning

Similar Documents

Publication Publication Date Title
CN114912705A (en) Optimization method for heterogeneous model fusion in federated learning
CN109948029B (en) Neural network self-adaptive depth Hash image searching method
CN112966954B (en) Flood control scheduling scheme optimization method based on time convolution network
CN111242282A (en) Deep learning model training acceleration method based on end edge cloud cooperation
CN113191484A (en) Federal learning client intelligent selection method and system based on deep reinforcement learning
CN110175628A (en) A kind of compression algorithm based on automatic search with the neural networks pruning of knowledge distillation
CN110969250A (en) Neural network training method and device
EP4350572A1 (en) Method, apparatus and system for generating neural network model, devices, medium and program product
CN111158912A (en) Task unloading decision method based on deep learning in cloud and mist collaborative computing environment
CN108537366B (en) Reservoir scheduling method based on optimal convolution bidimensionalization
CN106408120B (en) Local area landslide prediction device and method
CN112364913A (en) Federal learning communication traffic optimization method and system based on core data set
CN116523079A (en) Reinforced learning-based federal learning optimization method and system
CN112100514B (en) Friend recommendation method based on global attention mechanism representation learning
CN109754122A (en) A kind of Numerical Predicting Method of the BP neural network based on random forest feature extraction
CN115587633A (en) Personalized federal learning method based on parameter layering
CN111832817A (en) Small world echo state network time sequence prediction method based on MCP penalty function
KR20210039921A (en) Operation method of deep neural network system configured to optimize neural network model
CN117236421B (en) Large model training method based on federal knowledge distillation
CN112200391B (en) Power distribution network edge side load prediction method based on k-nearest neighbor mutual information feature simplification
CN113516163B (en) Vehicle classification model compression method, device and storage medium based on network pruning
CN115345320A (en) Method for realizing personalized model under layered federal learning framework
CN115358418A (en) Federal learning classification model training method based on model disturbance
CN114861936A (en) Feature prototype-based federated incremental learning method
CN113743012A (en) Cloud-edge collaborative mode task unloading optimization method under multi-user scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination