CN115829027A - Comparative learning-based federated learning sparse training method and system


Info

Publication number
CN115829027A
Authority
CN
China
Prior art keywords: local, sparse, model, learning, global model
Prior art date: 2022-10-31
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application number: CN202211349843.3A
Other languages: Chinese (zh)
Inventors: 陈家辉 (Chen Jiahui), 李峥明 (Li Zhengming), 徐培明 (Xu Peiming)
Current Assignee: CSG Electric Power Research Institute; Guangdong University of Technology
Original Assignee: CSG Electric Power Research Institute; Guangdong University of Technology
Priority date: 2022-10-31
Filing date: 2022-10-31
Publication date: 2023-03-21
Application filed by CSG Electric Power Research Institute and Guangdong University of Technology


Abstract

The invention discloses a comparative learning-based federated learning sparse training method and system, relating to the intersection of federated learning frameworks, sparse neural network training, and contrastive learning. The method comprises the following steps: the server sends a global model and a mask to the local clients; each local client generates a local sparse model from the received global model and mask and trains the local sparse model on its local data set; the local client computes a contrastive loss function, updates its local loss function and local sparse model, and uploads the updated local sparse model to the server; the server aggregates the local sparse models updated by the local clients, updates the global model, sends the updated global model back to the local clients, and starts a new round of communication training until the global model converges. By introducing sparse training and contrastive learning into federated learning, the method significantly reduces computation and communication overhead and improves the performance of the global model.

Description

Comparative learning-based federated learning sparse training method and system
Technical Field
The invention relates to the technical field of distributed machine learning, in particular to the intersection of federated learning frameworks, sparse neural network training, and contrastive learning, and more particularly to a federated learning sparse training method and system based on contrastive learning.
Background
Data silos, caused by privacy protection requirements, limited computing resources, and other factors, prevent access to the large-scale data needed to train artificial intelligence models.
As a distributed machine learning technology, federated learning has become a way to overcome data silos: a machine learning model is trained jointly by multiple clients. Because clients collaborate by exchanging models rather than sending their data to others, data privacy is protected, and federated learning is widely applied to medical analysis, natural language processing, credit card fraud detection, and similar tasks.
However, federated learning currently still faces the following problems:
(1) Heterogeneity: data heterogeneity, i.e., data that are not independent and identically distributed, can make the local models deviate from the global model and degrade the performance of the aggregated global model;
(2) Computation and communication overhead: in practice, many local clients are small devices such as mobile phones or personal laptops that lack the computing power to train large models, and their communication with the server is limited by bandwidth.
When resources are limited, these problems greatly reduce the training accuracy of federated learning.
Disclosure of Invention
The invention provides a dynamic sparse training method for federated learning based on contrastive learning, which aims to reduce the communication overhead of federated learning while ensuring model accuracy.
In order to solve the above technical problems, the technical solution of the invention is as follows:
In a first aspect, a comparative learning-based federated learning sparse training method includes:
the server side sends a global model and a mask to the local client side; wherein the mask is generated based on sparsity to indicate whether global model parameters are retained;
the local client generates a local sparse model according to the received global model and the mask, and trains the local sparse model by using a local data set;
in each round of training, the local client computes a contrastive loss function, updates the local loss function and the local sparse model, and uploads the updated local sparse model to the server;
and the server side aggregates the local sparse model updated by the local client side, updates the global model, sends the updated global model to the local client side, and starts a new round of communication training until the global model converges.
In this technical solution, the sparse model is trained directly during federated learning, which effectively reduces the amount of computation during training, lowers the storage cost on devices, accelerates training, and thus significantly reduces the computation and communication overhead of federated learning. In addition, a contrastive learning method is introduced into the federated learning process: common features among similar examples are learned, and a contrastive loss function is used to maximize the similarity of the same target under different data augmentations while minimizing the similarity between different targets. This alleviates the problem of data heterogeneity and improves model accuracy while reducing the computation and communication overhead of federated learning.
As a preferred scheme, the server sending the global model and the mask to the local clients includes:
the server initializes the global model θ_g^t and generates, according to the sparsity S, a mask m^t indicating whether each parameter of the global model is retained; wherein t denotes the federated learning round, and the sparsity S is the ratio of the number of parameters pruned from the global model to the total number of parameters;
the server randomly selects the local clients participating in this round of federated learning, and sends the global model θ_g^t and the mask m^t to the selected local clients.
As a preferred scheme, the local client generates a local sparse model from the received global model and mask, specifically:
the local client receives the global model θ_g^t and the mask m^t, and computes the Hadamard (element-wise) product of the global model θ_g^t and the mask m^t to obtain the local sparse model θ_k^t = θ_g^t ⊙ m^t, where t denotes the federated learning round and k denotes the index of the local client.
As a preferred scheme, training the local sparse model with the local data set includes:
the local client feeds the local data set into the local sparse model θ_k^t; the local sparse model θ_k^t makes predictions and the supervised loss function ℓ_sup is computed, where t denotes the federated learning round and k denotes the index of the local client;
the local sparse model θ_k^t is then updated according to a preset learning rate η.
As a possible design of the preferred embodiment, the local sparse model θ_k^t is updated with the preset learning rate η using the following operation:
θ_k^t ← θ_k^t − η ∇ℓ_sup(θ_k^t).
As a preferred scheme, in each round of training, the local client computing the contrastive loss function and updating the local loss function and the local sparse model includes:
the local data set is fed separately into the round-t local sparse model θ_k^t, the round-(t−1) local sparse model θ_k^{t−1}, and the round-t global model θ_g^t, yielding the corresponding feature vectors z, z_last and z_glob, respectively;
the contrastive loss function ℓ_con is computed from these feature vectors as
ℓ_con = −log[ exp(sim(z, z_glob)/τ) / ( exp(sim(z, z_glob)/τ) + exp(sim(z, z_last)/τ) ) ],
where τ is a preset temperature hyper-parameter and sim(·,·) denotes the similarity (e.g., cosine similarity) between feature vectors;
the local loss function is updated as
ℓ = ℓ_sup + ℓ_con,
where ℓ_sup denotes the supervised loss function of the local sparse model θ_k^t;
the updated local loss function ℓ is then used to update the local sparse model θ_k^t.
As a preferred scheme, in each round of training, after the local client computes the contrastive loss function and updates the local loss function and the local sparse model, mask adjustment is performed in preset communication rounds so that the network structure of the local sparse model evolves dynamically, and the dynamically updated local sparse model is then uploaded to the server.
In this preferred scheme, adjusting the mask in specific rounds dynamically updates the local sparse network, which makes it possible to search for a better sparse structure. Compared with static sparse training, dynamic sparse training can improve the precision of the local sparse model under high sparsity, and thereby improve the accuracy of the overall federated learning model.
As a possible design of the preferred scheme, mask adjustment is performed in preset communication rounds and the network structure of the local sparse model is updated by dynamic evolution, specifically:
in specific rounds of communication between the local client and the server, connections between some neuron nodes of the local sparse model are removed, adjusting the local sparse model to a higher sparsity S + (1 − S)·α_t, where α_t is a dynamic adjustment parameter given by
α_t = (α/2)·(1 + cos(t·π / T_end)),
where α denotes the preset dynamic adjustment parameter α_1 of the first round, t denotes the federated learning round, and T_end denotes the last learning round;
then, according to the instantaneous gradient information of the local sparse model, the same number of connections with the largest gradient magnitudes are regrown, restoring the sparsity of the model to the original sparsity S.
As a preferred scheme, the server aggregating the local sparse models updated by the local clients and updating the global model includes:
the server receives the local sparse models θ_k^t uploaded by a plurality of local clients;
the server aggregates the local sparse models θ_k^t in a unified way based on the FedAvg scheme to generate the updated global model θ_g^{t+1}, the aggregation being expressed as
θ_g^{t+1} = Σ_{k=1}^{K} (|D_k| / |D|) · θ_k^t,
where K denotes the number of local clients c_k participating in training in round t, D_k denotes the local data set of client c_k, D denotes the data set of all local clients, and k denotes the index of local client c_k.
In a second aspect, a comparative learning-based federated learning sparse training system applies the comparative learning-based federated learning sparse training method provided by any technical scheme of the first aspect, and comprises a server and local clients, wherein the server is connected to the local clients;
the server is used for sending the global model and the mask to the local clients, aggregating the local sparse models uploaded by the local clients, and updating the global model; the mask is generated based on sparsity and indicates whether the global model parameters are retained;
the local clients are used for receiving the global model and the mask to generate local sparse models, training the local sparse models with their local data sets, computing the contrastive loss function, updating the local loss function and the local sparse model, and uploading the updated local sparse models to the server.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention adopts a sparse training method in the process of federal learning, obviously reduces the calculation communication overhead, simultaneously introduces a comparative learning method, corrects the local model based on the similarity between model representations, trains a global model with smaller deviation, solves the problem of data heterogeneity in federal learning and improves the performance of the global model.
Drawings
FIG. 1 is a flow chart of a federated learning sparse training method;
FIG. 2 is a flow diagram of a federated learning sparse training method including mask adjustment;
FIG. 3 is a schematic diagram of a learning process framework of the federated learning sparse training method in embodiment 2;
FIG. 4 is a comparison of the test accuracy of the comparative learning-based federated learning sparse training method and other federated learning methods on the MNIST data set in embodiment 2.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the present embodiments, certain elements of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described with reference to the drawings and the embodiments.
Example 1
The embodiment provides a comparative learning-based federated learning sparse training method, which, referring to fig. 1, includes:
the server side sends the global model and the mask to the local client side; wherein the mask is generated based on sparsity to indicate whether global model parameters are retained;
the local client generates a local sparse model according to the received global model and the mask, and trains the local sparse model by using a local data set;
in each round of training, the local client computes a contrastive loss function, updates the local loss function and the local sparse model, and uploads the updated local sparse model to the server;
and the server side aggregates the updated local sparse models of the local client side, updates the global model, sends the updated global model to the local client side, and starts a new round of communication training until the global model is converged.
In this embodiment, a sparse training method is introduced into the federated learning process: a local sparse model is generated at each local client using the mask and trained directly, which effectively reduces the amount of computation in federated learning, lowers the storage cost on devices, accelerates training, and significantly reduces the computation and communication overhead of federated learning. At the same time, a contrastive learning method is introduced, and the local model is corrected based on the similarity between model representations, alleviating the problem of data heterogeneity. Through the combination of federated learning, sparse training, and contrastive learning, the accuracy of the global model is improved while the computation and communication overhead of federated learning is reduced.
In a preferred embodiment, the server sending the global model and the mask to the local clients includes:
the server initializes the global model θ_g^t and generates, according to the sparsity S, a mask m^t indicating whether each parameter of the global model is retained; wherein t denotes the federated learning round, and the sparsity S is the ratio of the number of parameters pruned from the global model to the total number of parameters;
the server randomly selects the local clients participating in this round of federated learning, and sends the global model θ_g^t and the mask m^t to the selected local clients.
In the preferred embodiment, the sparsity S is the ratio of the number of parameters pruned from the global model to the total number of parameters, and the mask is generated based on sparsity, which represents the structure of the sparse network.
In an alternative embodiment, the mask is in binary form.
As a non-limiting example, the mask is generated using a pruning algorithm based on sparsity.
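By way of illustration only (not part of the patent disclosure), the following sketch generates such a binary mask for each parameter tensor of a PyTorch model at a given sparsity S. The random choice of which entries to prune is an assumption, since the embodiment leaves the concrete pruning criterion open; the function name and mask layout are likewise illustrative.
```python
import torch

def generate_mask(model: torch.nn.Module, sparsity: float) -> dict:
    """Generate a binary mask per parameter tensor; a fraction `sparsity`
    of entries is set to 0 (pruned) and the rest to 1 (kept).
    Random topology is an illustrative assumption."""
    mask = {}
    for name, param in model.named_parameters():
        num_pruned = int(sparsity * param.numel())
        flat = torch.ones(param.numel())
        # choose `num_pruned` positions uniformly at random and zero them out
        pruned_idx = torch.randperm(param.numel())[:num_pruned]
        flat[pruned_idx] = 0.0
        mask[name] = flat.view_as(param)
    return mask
```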
In a preferred embodiment, the local client generates a local sparse model from the received global model and mask, specifically:
the local client receives the global model θ_g^t and the mask m^t, and computes the Hadamard (element-wise) product of the global model θ_g^t and the mask m^t to obtain the local sparse model θ_k^t, where t denotes the federated learning round and k denotes the index of the local client. That is,
θ_k^t = θ_g^t ⊙ m^t,
where ⊙ denotes the Hadamard inner product.
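A minimal sketch of this step, assuming the mask is stored as a dictionary keyed by parameter name (as in the sketch above); the function name and data layout are illustrative assumptions.
```python
import copy
import torch

def build_local_sparse_model(global_model: torch.nn.Module, mask: dict) -> torch.nn.Module:
    """Return a copy of the global model whose parameters are multiplied
    element-wise (Hadamard product) by the binary mask: theta_k = theta_g ⊙ m."""
    local_model = copy.deepcopy(global_model)
    with torch.no_grad():
        for name, param in local_model.named_parameters():
            param.mul_(mask[name])  # zero out pruned weights
    return local_model
```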
In a preferred embodiment, training the local sparse model with the local data set includes:
the local client feeds the local data set into the local sparse model θ_k^t; the local sparse model θ_k^t makes predictions and the supervised loss function ℓ_sup is computed;
the local sparse model θ_k^t is then updated according to a preset learning rate η.
In an optional embodiment, the local sparse model θ_k^t is updated with the preset learning rate η using the following operation:
θ_k^t ← θ_k^t − η ∇ℓ_sup(θ_k^t).
In a preferred embodiment, in each round of training, the local client computing the contrastive loss function and updating the local loss function and the local sparse model includes:
the local data set is fed separately into the round-t local sparse model θ_k^t, the round-(t−1) local sparse model θ_k^{t−1}, and the round-t global model θ_g^t, yielding the corresponding feature vectors z, z_last and z_glob, respectively; here z denotes the feature vector of a sample output by the projection head of the feature representation network;
the contrastive loss function ℓ_con is computed from these feature vectors as
ℓ_con = −log[ exp(sim(z, z_glob)/τ) / ( exp(sim(z, z_glob)/τ) + exp(sim(z, z_last)/τ) ) ],
where τ is a preset temperature hyper-parameter and sim(·,·) denotes the similarity (e.g., cosine similarity) between feature vectors;
the local loss function is updated as
ℓ = ℓ_sup + ℓ_con,
where ℓ_sup denotes the supervised loss function of the local sparse model θ_k^t;
the updated local loss function ℓ is then used to update the local sparse model θ_k^t.
In a preferred embodiment, in each round of training, after the local client computes the contrastive loss function and updates the local loss function and the local sparse model, mask adjustment is performed in preset communication rounds so that the network structure of the local sparse model evolves dynamically, and the dynamically updated local sparse model is then uploaded to the server.
In a specific implementation process, a sparse network structure is randomly selected in an initial training stage, and mask adjustment is performed in a subsequent sparse training process. Because the mask represents the structure of the sparse network, the structure of the sparse network can be continuously changed through mask adjustment, so that the purpose of searching for a better sparse structure is achieved.
In an optional embodiment, referring to fig. 2, mask adjustment is performed in preset communication rounds and the network structure of the local sparse model is updated by dynamic evolution, specifically:
in specific rounds of communication between the local client and the server, connections between some neuron nodes of the local sparse model are removed, so that the local sparse model is adjusted to a higher sparsity S + (1 − S)·α_t, where α_t is a dynamic adjustment parameter given by
α_t = (α/2)·(1 + cos(t·π / T_end)),
where α denotes the preset dynamic adjustment parameter α_1 of the first round, t denotes the federated learning round, and T_end denotes the last learning round;
then, according to the instantaneous gradient information of the local sparse model, the same number of connections with the largest gradient magnitudes are regrown, restoring the sparsity of the model to the original sparsity S.
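A sketch of one prune-and-regrow adjustment for a single parameter tensor, under stated assumptions: the removed connections are those with the smallest weight magnitudes (the embodiment only says connections between some neuron nodes are removed, so the magnitude criterion is an assumption), and the regrown connections are those with the largest instantaneous gradient magnitudes.
```python
import math
import torch

def cosine_alpha(alpha, t, t_end):
    """Cosine-decayed adjustment fraction: alpha_t = (alpha/2) * (1 + cos(t*pi/t_end))."""
    return 0.5 * alpha * (1.0 + math.cos(t * math.pi / t_end))

def adjust_mask(param, grad, mask, sparsity, alpha_t):
    """Prune-and-regrow step for one parameter tensor: temporarily raise the
    sparsity to S + (1 - S) * alpha_t, then regrow the same number of
    connections where the gradient magnitude is largest, restoring S."""
    n = param.numel()
    n_drop = int((1 - sparsity) * alpha_t * n)
    if n_drop == 0:
        return mask
    flat_mask = mask.flatten().clone()
    # 1) prune: among kept weights, remove those with the smallest magnitude
    kept = flat_mask.nonzero(as_tuple=True)[0]
    magnitudes = param.flatten()[kept].abs()
    drop_idx = kept[torch.argsort(magnitudes)[:n_drop]]
    flat_mask[drop_idx] = 0.0
    # 2) regrow: among pruned weights, add those with the largest gradient
    pruned = (flat_mask == 0).nonzero(as_tuple=True)[0]
    grads = grad.flatten()[pruned].abs()
    grow_idx = pruned[torch.argsort(grads, descending=True)[:n_drop]]
    flat_mask[grow_idx] = 1.0
    return flat_mask.view_as(mask)
```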
In a preferred embodiment, the server aggregating the local sparse models updated by the local clients to update the global model includes:
the server receives the local sparse models θ_k^t uploaded by a plurality of local clients;
the server aggregates the local sparse models θ_k^t in a unified way based on the FedAvg scheme to generate the updated global model θ_g^{t+1}, the aggregation being expressed as
θ_g^{t+1} = Σ_{k=1}^{K} (|D_k| / |D|) · θ_k^t,
where K denotes the number of local clients c_k participating in training in round t, D_k denotes the local data set of client c_k, D denotes the data set of all local clients, and k denotes the index of local client c_k.
In a specific implementation, after the server completes aggregation of the local sparse models, the newly generated global model is sent to the selected local clients and a new round of communication training begins, until the global model converges.
Example 2
In this embodiment, an experiment is performed on the comparative learning-based federated learning sparse training method proposed in example 1 using the public MNIST data set, with reference to fig. 1 to 4.
The MNIST data set (Modified National Institute of Standards and Technology database) is a large database of handwritten digits collected and organized by the National Institute of Standards and Technology, containing a training set of 60,000 examples and a test set of 10,000 examples.
Consider a typical federated learning setup: the global model is a convolutional neural network comprising two 5×5 convolutional layers, two max-pooling layers, and four fully-connected layers; there are 100 local clients in total, 20 of which are randomly selected in each communication round to participate in training; in each round every selected local client performs 10 iterations on its local data set with an SGD optimizer, and 50 communication rounds with the server are performed in total.
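For concreteness, a network matching this description might look as follows; the channel widths and hidden-layer sizes are assumptions, since the embodiment specifies only the layer types and counts.
```python
import torch.nn as nn

class MNISTConvNet(nn.Module):
    """Two 5x5 conv layers, two max-pooling layers, four fully-connected
    layers; channel and hidden sizes are illustrative assumptions."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(32 * 4 * 4, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        x = self.features(x)          # 28x28 -> 4x4 feature maps
        return self.classifier(x.flatten(1))
```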
As a non-limiting example, in the training process the server sets the sparsity S = 0.5, initializes the global model θ_g^t, and generates the mask m^t according to the sparsity;
the server randomly selects 20 local clients and sends the global model and the mask to the selected local clients;
after receiving the global model and the mask, each local client generates its local sparse model θ_k^t = θ_g^t ⊙ m^t;
the local model is trained on the local data set: the local data x are fed into the local sparse model in mini-batches of 32 samples, the local sparse model makes predictions, and the loss function ℓ_sup is computed;
with the preset learning rate η = 0.01, the local sparse model is updated as
θ_k^t ← θ_k^t − η ∇ℓ_sup(θ_k^t).
The local data x are fed separately into the current-round local sparse model θ_k^t, the previous-round local sparse model θ_k^{t−1}, and the current-round global model θ_g^t, yielding the corresponding feature vectors z, z_last and z_glob, respectively; with the preset temperature hyper-parameter τ = 1, the contrastive loss function is computed as
ℓ_con = −log[ exp(sim(z, z_glob)/τ) / ( exp(sim(z, z_glob)/τ) + exp(sim(z, z_last)/τ) ) ];
the local loss function is updated as
ℓ = ℓ_sup + ℓ_con,
and the updated local loss function ℓ is used to update the local sparse model θ_k^t.
The local client is set to perform mask adjustment once every ten rounds, dynamically updating the structure of the sparse network. With α = 0.01, in the preset communication rounds between the local client and the server after local training is completed, the local client adjusts the local sparse model to a higher sparsity S + (1 − S)·α_t by removing connections between some neuron nodes of the local sparse model; then, according to the instantaneous gradient information of the local sparse model, the same number of connections with the largest gradient magnitudes are regrown, restoring the sparsity of the local sparse model to S. Here α_t is the dynamic adjustment parameter, updated according to the cosine decay schedule
α_t = (α/2)·(1 + cos(t·π / T_end)),
which controls how the sparsity adjustment changes over the rounds.
After the selected local clients of the round have finished training and updating their local sparse models, the local models are uploaded to the server. The server aggregates the uploaded local sparse models in the FedAvg manner and generates the updated global model θ_g^{t+1}, completing one round of communication learning. The aggregation is performed as
θ_g^{t+1} = Σ_{k=1}^{K} (|D_k| / |D|) · θ_k^t.
After the server finishes aggregating the local sparse models uploaded by the 20 local clients participating in training, the newly generated global model θ_g^{t+1} is sent to the selected local clients and a new round of communication training begins, until the global model converges.
In addition, in this embodiment a convolutional neural network with the same structure and the same settings as the global model is used to perform the MNIST classification prediction task. With 20 local clients selected out of 100 and a given sparsity of S = 0.5, each local client performs 10 iterations on its local data set with an SGD optimizer per round and communicates with the server for 50 rounds, and prediction is carried out after federated learning training; the accuracy of the prediction results is shown in fig. 4. Clearly, compared with fedstt, FedAvg and FedProx, the model obtained by the comparative learning-based federated learning sparse training method provided in this embodiment performs better and reaches higher accuracy with fewer communication rounds.
Example 3
The embodiment provides a comparative learning-based federated learning sparse training system, which is applied to the comparative learning-based federated learning sparse training method provided in embodiment 1, and the comparative learning-based federated learning sparse training system comprises a server and a local client, wherein the server is connected with the local client;
the server is used for sending a global model and a mask to the local client, aggregating the local sparse models uploaded by the local client and updating the global model; the mask is generated based on sparsity and is used for representing whether the global model parameters are reserved or not;
the local client is used for receiving the global model and the mask to generate a local sparse model, training the local sparse model by using the local data set, calculating a contrast loss function, updating the local loss function and the local sparse model, and uploading the updated local sparse model to the server.
The same or similar reference numerals correspond to the same or similar parts;
the terms describing positional relationships in the drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (10)

1. A comparative learning-based federated learning sparse training method is characterized by comprising the following steps:
the server side sends a global model and a mask to the local client side; wherein the mask is generated based on sparsity to indicate whether global model parameters are retained;
the local client generates a local sparse model according to the received global model and the mask, and trains the local sparse model by using a local data set;
in each round of training, the local client computes a contrastive loss function, updates the local loss function and the local sparse model, and uploads the updated local sparse model to the server;
and the server side aggregates the local sparse model updated by the local client side, updates the global model, sends the updated global model to the local client side, and starts a new round of communication training until the global model converges.
2. The comparative learning-based federated learning sparse training method according to claim 1, wherein the server sending the global model and the mask to the local client comprises:
the server initializes the global model θ_g^t and generates, according to the sparsity S, a mask m^t indicating whether each parameter of the global model is retained; wherein t denotes the federated learning round, and the sparsity S is the ratio of the number of parameters pruned from the global model to the total number of parameters;
the server randomly selects the local clients participating in this round of federated learning, and sends the global model θ_g^t and the mask m^t to the selected local clients.
3. The comparative learning-based federated learning sparse training method according to claim 1, wherein the local client generates a local sparse model from the received global model and mask, specifically:
the local client receives the global model θ_g^t and the mask m^t, and computes the Hadamard (element-wise) product of the global model θ_g^t and the mask m^t to obtain the local sparse model θ_k^t = θ_g^t ⊙ m^t, where t denotes the federated learning round and k denotes the index of the local client.
4. The comparative learning-based federated learning sparse training method of claim 1, wherein training the local sparse model with the local data set comprises:
the local client feeds the local data set into the local sparse model θ_k^t; the local sparse model θ_k^t makes predictions and the supervised loss function ℓ_sup is computed, where t denotes the federated learning round and k denotes the index of the local client;
the local sparse model θ_k^t is then updated according to a preset learning rate η.
5. The comparative learning-based federated learning sparse training method of claim 4, wherein the local sparse model θ_k^t is updated with the preset learning rate η using the following operation:
θ_k^t ← θ_k^t − η ∇ℓ_sup(θ_k^t).
6. The comparative learning-based federated learning sparse training method according to claim 1, wherein, in each round of training, the local client computing the contrastive loss function and updating the local loss function and the local sparse model comprises:
the local data set is fed separately into the round-t local sparse model θ_k^t, the round-(t−1) local sparse model θ_k^{t−1}, and the round-t global model θ_g^t, yielding the corresponding feature vectors z, z_last and z_glob, respectively;
the contrastive loss function ℓ_con is computed from these feature vectors as
ℓ_con = −log[ exp(sim(z, z_glob)/τ) / ( exp(sim(z, z_glob)/τ) + exp(sim(z, z_last)/τ) ) ],
where τ is a preset temperature hyper-parameter and sim(·,·) denotes the similarity between feature vectors;
the local loss function is updated as
ℓ = ℓ_sup + ℓ_con,
where ℓ_sup denotes the supervised loss function of the local sparse model θ_k^t;
the updated local loss function ℓ is then used to update the local sparse model θ_k^t.
7. The comparative learning-based federated learning sparse training method as claimed in claim 1, wherein, in each round of training, after the local client computes the contrastive loss function and updates the local loss function and the local sparse model, mask adjustment is performed in preset communication rounds so that the network structure of the local sparse model evolves dynamically, and the dynamically updated local sparse model is uploaded to the server.
8. The comparative learning-based federated learning sparse training method as claimed in claim 7, wherein mask adjustment is performed in preset communication rounds and the network structure of the local sparse model is updated by dynamic evolution, specifically:
in specific rounds of communication between the local client and the server, connections between some neuron nodes of the local sparse model are removed, so that the local sparse model is adjusted to a higher sparsity S + (1 − S)·α_t, where α_t is a dynamic adjustment parameter given by
α_t = (α/2)·(1 + cos(t·π / T_end)),
where α denotes the preset dynamic adjustment parameter α_1 of the first round, t denotes the federated learning round, and T_end denotes the last learning round;
according to the instantaneous gradient information of the local sparse model, the same number of connections with the largest gradient magnitudes are regrown, restoring the sparsity of the model to the original sparsity S.
9. The comparative learning-based federated learning sparse training method according to any one of claims 1 to 8, wherein the server aggregating the updated local sparse models of the local clients and updating the global model comprises:
the server receives the local sparse models θ_k^t uploaded by a plurality of local clients;
the server aggregates the local sparse models θ_k^t in a unified way based on the FedAvg scheme to generate the updated global model θ_g^{t+1}, the aggregation being expressed as
θ_g^{t+1} = Σ_{k=1}^{K} (|D_k| / |D|) · θ_k^t,
where K denotes the number of local clients c_k participating in training in round t, D_k denotes the local data set of client c_k, D denotes the data set of all local clients, and k denotes the index of local client c_k.
10. A comparative learning-based federated learning sparse training system, which applies the comparative learning-based federated learning sparse training method according to any one of claims 1 to 9, characterized by comprising a server and local clients, wherein the server is connected to the local clients;
the server is used for sending the global model and the mask to the local client, aggregating the local sparse models uploaded by the local client and updating the global model; the mask is generated based on sparsity and is used for representing whether the global model parameters are reserved or not;
the local client is used for receiving the global model and the mask to generate a local sparse model, training the local sparse model by using the local data set, calculating a contrast loss function, updating the local loss function and the local sparse model, and uploading the updated local sparse model to the server.
CN202211349843.3A 2022-10-31 2022-10-31 Comparative learning-based federated learning sparse training method and system Pending CN115829027A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211349843.3A CN115829027A (en) 2022-10-31 2022-10-31 Comparative learning-based federated learning sparse training method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211349843.3A CN115829027A (en) 2022-10-31 2022-10-31 Comparative learning-based federated learning sparse training method and system

Publications (1)

Publication Number Publication Date
CN115829027A true CN115829027A (en) 2023-03-21

Family

ID=85525940

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211349843.3A Pending CN115829027A (en) 2022-10-31 2022-10-31 Comparative learning-based federated learning sparse training method and system

Country Status (1)

Country Link
CN (1) CN115829027A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116341689A (en) * 2023-03-22 2023-06-27 深圳大学 Training method and device for machine learning model, electronic equipment and storage medium
CN116341689B (en) * 2023-03-22 2024-02-06 深圳大学 Training method and device for machine learning model, electronic equipment and storage medium
CN116578674A (en) * 2023-07-07 2023-08-11 北京邮电大学 Federal variation self-coding theme model training method, theme prediction method and device
CN116578674B (en) * 2023-07-07 2023-10-31 北京邮电大学 Federal variation self-coding theme model training method, theme prediction method and device
CN117196014A (en) * 2023-09-18 2023-12-08 深圳大学 Model training method and device based on federal learning, computer equipment and medium
CN117196014B (en) * 2023-09-18 2024-05-10 深圳大学 Model training method and device based on federal learning, computer equipment and medium
CN117391187A (en) * 2023-10-27 2024-01-12 广州恒沙数字科技有限公司 Neural network lossy transmission optimization method and system based on dynamic hierarchical mask

Similar Documents

Publication Publication Date Title
CN115829027A (en) Comparative learning-based federated learning sparse training method and system
Lin et al. Network pruning using adaptive exemplar filters
CN115081532A (en) Federal continuous learning training method based on memory replay and differential privacy
CN112115967A (en) Image increment learning method based on data protection
CN115331069A (en) Personalized image classification model training method based on federal learning
CN110781912A (en) Image classification method based on channel expansion inverse convolution neural network
CN111694977A (en) Vehicle image retrieval method based on data enhancement
Gil et al. Quantization-aware pruning criterion for industrial applications
CN115600686A (en) Personalized Transformer-based federal learning model training method and federal learning system
CN115359298A (en) Sparse neural network-based federal meta-learning image classification method
CN113987236B (en) Unsupervised training method and unsupervised training device for visual retrieval model based on graph convolution network
CN109948589B (en) Facial expression recognition method based on quantum depth belief network
CN115278709A (en) Communication optimization method based on federal learning
CN111401193A (en) Method and device for obtaining expression recognition model and expression recognition method and device
Zhang et al. Stochastic approximation approaches to group distributionally robust optimization
Du et al. CGaP: Continuous growth and pruning for efficient deep learning
CN112836822A (en) Federal learning strategy optimization method and device based on width learning
CN117217328A (en) Constraint factor-based federal learning client selection method
CN111414937A (en) Training method for improving robustness of multi-branch prediction single model in scene of Internet of things
CN116010832A (en) Federal clustering method, federal clustering device, central server, federal clustering system and electronic equipment
Zhang et al. Federated multi-task learning with non-stationary heterogeneous data
Zhao et al. Exploiting channel similarity for network pruning
CN116168197A (en) Image segmentation method based on Transformer segmentation network and regularization training
CN113256507B (en) Attention enhancement method for generating image aiming at binary flow data
CN113033653B (en) Edge-cloud cooperative deep neural network model training method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination