CN116167452A

CN116167452A - Cluster federation learning method based on model similarity

Info

Publication number: CN116167452A
Application number: CN202211625268.5A
Authority: CN
Inventors: 胡敏; 曾云川; 黄宏程
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Chongqing University of Post and Telecommunications
Priority date: 2022-12-13
Filing date: 2022-12-13
Publication date: 2023-05-26

Abstract

The invention relates to a cluster federation learning method based on model similarity, and belongs to the field of federation learning. The method comprises the following steps: s1: designing a local training adjustment strategy, and releasing a federal learning task by a server; after acquiring the federation learning task, the client sends a federation learning joining request containing identity information and data resource information to the server node; s2: after the server node verifies the identity and the data resource information of the client, the server broadcasts a global model; s3: adjusting the local training period of each client according to the data volume of each client; the obtained local model is used for calculating a model weight distance matrix; s4: designing a client self-adaptive clustering strategy, revealing the clustering relation among clients from a model similarity matrix, and self-adaptively dividing the clients with similar data distribution into the same cluster under the condition that the clustering quantity is not specified; s5: the FedCluster obtains a stable client cluster in communication between the client and the server.

Description

Cluster federation learning method based on model similarity

Technical Field

The invention belongs to the field of federal learning, and relates to a cluster federal learning method based on model similarity.

Background

Federal learning is a distributed machine learning framework that, in this highly data-sensitive era, is capable of cooperatively training a machine learning model involving multiple data warehouses in a manner that protects data privacy. The most widely used current is the "server-client" architecture, federal learning allows multiple users (called clients) to cooperatively train a shared global model without sharing data in the local device. And the central server coordinates and completes multiple rounds of federal learning to obtain a final global model. Wherein at the beginning of each round, the central server sends the current global model to clients participating in federal learning. Each client trains the received global model according to the local data, and returns the updated model to the central server after training.

Since federal learning focuses on obtaining a high quality global model by distributively learning the local data of all participating clients, it cannot capture personal information of each device, resulting in degraded performance of reasoning or classification. Furthermore, the accuracy of FedAvg is greatly reduced when learning on non-independent co-distributed (non I.I.D.) data. In the case where there are significant differences in the data distribution of individual clients, it is difficult for a single global model to cope with local distribution situations that are distinct from global distribution. For practical applications that often face non-independent co-distributed data sets, it is often not sufficient to have only a single model. Taking the mobile keyboard development language model as an example, users from different populations may have different usage patterns due to different international, linguistic and cultural nuances, e.g., certain words or emoticons may be used by a particular population of users. In this case, a more targeted prediction needs to be made for each user to meet the needs of the user.

In order to solve the data heterogeneity in federal learning, many people have made related studies. The FedAVg algorithm is improved by zhao et al, and the fact that when the data are in independent and same distribution, higher precision loss exists when the FedAVg algorithm is applied is found. They propose a computational weight divergence that can improve the accuracy of federal learning in non-IID data and a federal learning strategy for data sharing that improves the training effect on non-IID data by creating a small portion of data that is globally shared between all client devices at a central server. Muhammad et al, in order to improve the training efficiency of federal learning, combine federal learning and recommendation systems, put forward FedFast algorithm, this algorithm is an improved version of FedAvg algorithm, its basic flow is similar to federal average algorithm, mainly aiming at two key steps of federal learning, have improved customer end selection and model aggregation. It is well known that training models can be personalized to reduce heterogeneity and to have each model obtain a high quality personalized model, i.e. personalized federal learning.

Disclosure of Invention

Accordingly, the present invention is directed to a method for cluster federal learning based on model similarity, which can eliminate the influence of Non-IID and unbalanced data at the same time. To handle unbalanced data, the local training adjustment strategy adaptively adjusts the number of local training periods for each client. To further improve the accuracy and adaptability of clustering, a client clustering strategy based on weighted voting automatically groups each client into the appropriate cluster.

In order to achieve the above purpose, the present invention provides the following technical solutions:

a method of cluster federation learning based on model similarity, the method comprising the steps of:

s1: designing a local training adjustment strategy, and releasing a federal learning task by a server; after acquiring the federation learning task, the client sends a federation learning joining request containing identity information and data resource information to the server node;

s2: after the server node verifies the identity and the data resource information of the client, the server broadcasts a global model;

s3: after the client acquires the global model, the local training period of the client is adjusted according to the data volume of each client; different clients experience different numbers of local training stages, and the resulting local model is used to calculate a model weight distance matrix, i.e., a model similarity matrix;

s4: designing a client self-adaptive clustering strategy, revealing the clustering relation among clients from a model similarity matrix, and self-adaptively dividing the clients with similar data distribution into the same cluster under the condition that the clustering quantity is not specified;

s5: the FedCluster obtains a stable client cluster in communication between the client and the server.

Optionally, in the step S1, the local training adjustment policy specifically is:

the cumulative loss of the client reflects the change of the local loss in the multi-round communication, and after the t-round communication, the cumulative loss of the client m is as follows:

representing a local experience loss of client m in the ith round of communication;

based on accumulated losses

FedCluster calculates the local iteration number of client m in the t-th round of communication loop +.>

The method comprises the following steps:

s ^* indicating that the client has the most local samples,

and |D _m I represents the local number of samples s of the client, respectively ^* And m; the parameter alpha represents the iteration number increase rate controlling the iteration number increase rate, and the parameter ρ is definedThe meaning is as follows:

in equation (1), fedcmaster uses the number of local samples and the cumulative loss to calculate the number of local iterations per client; the number of local samples determines the maximum step size

And the local experience loss determines the actual increase step ρ based on the maximum step; resulting in a ratio->

I.e. the difference between the local experience loss of the customer and the reference will gradually decrease;

after the first round of communication, the cumulative loss of clients with fewer local samples is compared to the baseline client s ^* The gap between them becomes smaller and smaller; less than

They are->

The gap between the client ends is gradually increased, and the iteration times of the client ends are not increased; once the variance of the customer's cumulative loss is minimized, the FedCluster will stop the local training adjustment process and each customer will complete its local model training in this round.

Optionally, in S4, the client adaptive clustering policy is:

the minimum inter-cluster distance is not less than the maximum intra-cluster distance, i.e

min dist(G _i ·，G _j ·)≥max dist(G _i· ，G _i· ) (4)

dist(G _i· ，G _j· ) Representing the model distance between any two clients from two different clusters, dist (G _i· ，G _i· ) Watch (watch)Showing the model distance between any two clients within the same cluster;

selecting the weight closest to the output layer as a representative weight set of all model weights to calculate the similarity of the two models; the total weight of the model is represented by ω, and the selected partial weight is represented by ω';

the FedCluster groups the clients by using a model similarity matrix M, wherein the matrix is obtained by carrying out local training calculation on the regulated local models of all the clients; each element M [ M, n ]]＝dist(ω′ _m ，ω′ _n ) Using the formula

To measure the model distance between any two clients; the ω is the weight number of the two models; taking M as input, fedCluster performs client clustering according to three steps, namely cluster demarcation detection, weighted voting and voting-based clustering.

Optionally, the group demarcation detection is:

for client M, fedCluster will first send M [ M, ]]The model distance values in (1) are ordered according to ascending order to obtain M' [ M, ]]Wherein M' is [ M, n ]]≥M′[m，z]N is greater than z; fedCluster calculations were then stored at M' [ M, ]]The difference between the distance values of any two adjacent models is obtained, and the maximum distance difference t is obtained _m The method comprises the steps of carrying out a first treatment on the surface of the Let t _m Clustering boundaries of all clients taking the client m as a reference; based on t _m M' [ M, ]]All clients indexed are divided into two groups P _m，1 And P _m，2 ，P _m，1 The difference between any two adjacent values is less than t _m ，P _m，2 The difference between any two adjacent values is not less than t _m ；

P _m，1 The clients in (a) can be allocated to the same cluster, cannot be assigned to the group belonging to P _m，2 Clustering the clients of (1); adding weighted voting to adaptively determine a final client clustering result;

the weighted voting is as follows:

for each client M and its corresponding M' [ M, ]]At P _m，1 The guests having the largest number of local samples among the clients:

client _mmax ＝arg max{||D _n |，n∈P _m,1 } (5)

then FedCluster let P _m，1 Each client n in (2) is a client according to equation (5) _mmax Voting and updating client n pairs of clients _mmax Total score of votes of (2)

Each customer maintains a list of accumulated vote scores for other customers who have been selected as the customer with the most samples;

for a particular customer, more local samples are given greater voting weights;

the voting-based clustering is as follows:

after cluster demarcation detection and weighted voting are operated on all clients by traversing all rows, a final voting score list of each client is obtained; for a given client m, it is assigned to the representative client with the highest cumulative score in the voting score list for m _* In the same cluster G _* In, i.e

By scanning all clients and their list of voting scores, the enhanced CFL automatically selects some representative clients as cluster heads and assigns other clients to those clusters.

The invention has the beneficial effects that:

the proposed framework was evaluated on four open datasets and showed the advantages of fedclumer compared to FedAvg, fedProx and FeSEM, which improved the stability and accuracy of federal learning training compared to FeSEM.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.

Drawings

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in the following preferred detail with reference to the accompanying drawings, in which:

FIG. 1 is a workflow diagram of FedCluster.

Detailed Description

Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present invention with reference to specific examples. The invention may be practiced or carried out in other embodiments that depart from the specific details, and the details of the present description may be modified or varied from the spirit and scope of the present invention. It should be noted that the illustrations provided in the following embodiments merely illustrate the basic idea of the present invention by way of illustration, and the following embodiments and features in the embodiments may be combined with each other without conflict.

Wherein the drawings are for illustrative purposes only and are shown in schematic, non-physical, and not intended to limit the invention; for the purpose of better illustrating embodiments of the invention, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the size of the actual product; it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.

The same or similar reference numbers in the drawings of embodiments of the invention correspond to the same or similar components; in the description of the present invention, it should be understood that, if there are terms such as "upper", "lower", "left", "right", "front", "rear", etc., that indicate an azimuth or a positional relationship based on the azimuth or the positional relationship shown in the drawings, it is only for convenience of describing the present invention and simplifying the description, but not for indicating or suggesting that the referred device or element must have a specific azimuth, be constructed and operated in a specific azimuth, so that the terms describing the positional relationship in the drawings are merely for exemplary illustration and should not be construed as limiting the present invention, and that the specific meaning of the above terms may be understood by those of ordinary skill in the art according to the specific circumstances.

Similar to FedAvg, three main steps are included: (1) the server broadcasting the global model to the clients; (2) each client trains a model using its local data; (3) The server aggregates these updated local models from the client; based on typical FL training, fedCluster incorporates two key strategies to address the shortcomings of the existing CFL methods. A local training adjustment strategy is first designed that adjusts the number of local training iterations per client given the amount of data. Different clients may experience different numbers of local training times and their resulting local models are used to calculate a model weight distance matrix (i.e., model similarity matrix). Secondly, an adaptive client clustering strategy is proposed to reveal the clustering relation between clients in a model similarity matrix and adaptively divide clients with similar data distribution into the same cluster. A weighted voting mechanism is designed to further eliminate the effects of unbalanced data. In combination with the two novel strategies, fedCluster can derive stable client clusters in the recorded communication between the client and the server in an efficient and adaptive manner.

1. Local training adjustment strategy

Typically, the local empirical loss reflects the state of the local model and is affected by the number of iterations during FL training. In particular, more iterations tend to make experience loss of local model training smaller before model convergence. In general, one epoch consists of multiple iterations through all samples in the machine learning model training process. Thus, unlike FedAvg setting the same number of local iterations for all clients, fedCluster dynamically adjusts the number of iterations per client during local model training to control the experience loss of the client. In contrast, more local iterations are set for fewer clients of the sample, while for more clients of the sample, fewer local iterations are set. Essentially, experience loss for clients with similar data distribution should remain consistent after such local training adjustments are made.

However, it is difficult to keep local experience loss consistent for customers of different sample numbers in a round of communication. This is because increasing the number of local iterations of a client may result in a significant reduction in local experience loss or overfitting when the number of samples stored by the client is small. Thus, in turn, the cumulative loss of customers is made more consistent rather than the local experience loss of a round of communication. The cumulative loss of a customer reflects its variation in local loss in multiple rounds of communication, which is easier to control than the local experience loss of a single round of communication. After t rounds of communication, the accumulated loss of the client m is as follows:

representing the loss of experience local to client m in the ith round of communication.

Based on accumulated losses

The method comprises the following steps:

s ^* indicating that the client has the most local samples,

and |D _m I represents the local number of samples s of the client, respectively ^* And m. In addition, the parameter α represents the iteration number increase rate that controls the iteration number increase rate, and the parameter ρ is defined as:

on the one hand, the cumulative loss of customers with more local samples tends to be more reliable. On the other hand, increasing the number of iterations of the client would incur excessive local computational overhead. Thus, the local iteration number should be set for all clients within a reasonable range. In practice, the cumulative loss of the clients takes most local samples as a standard with the minimum loss in general, and the cumulative loss among the clients is more consistent by controlling the cumulative loss of other clients to approach the standard. In equation (1), fedcmaster uses the number of local samples and the cumulative loss to calculate the number of local iterations per client. In particular the number of local samples determines the maximum step size (e.g

) While the local empirical loss determines the actual increase step size (e.g., p) based on the maximum step size. The adjustment strategy is conservative in that the actual step size of the increase in the number of iterations is decreasing during the local training adjustment. A greater number of local training iterations will result in a faster reduction in local experience loss, resulting in a ratio

I.e. the difference between the local experience loss of the customer and the local experience loss of the benchmark will gradually decrease.

FedCluster can naturally stop local training adjustment. After the first round of communication, clients with fewer local samples typically train the local model through multiple iterations, thus having a faster drop in local experience loss. Cumulative loss of such clients and benchmark clients ^* The gap between them will be smaller and smaller. But once they are smallIn the following

They are->

The gap between the client and the client will gradually increase, because the iteration number of the client will not increase. In summary, once the variance of the customer's cumulative loss is minimized, the FedCluster will stop the local training adjustment process and each customer will complete its local model training in this round.

2. Client adaptive clustering strategy

Unlike CFL work that previously required the input of k clusters in a priori, fedclumer can accomplish client clustering by using model similarity matrix alone without knowing the number of clusters.

The motivation for adaptive client cluster design is an important criterion for evaluating the clustering results, i.e. the minimum inter-cluster distance should not be smaller than the maximum intra-cluster distance, i.e.

min dist(G _i· ，G _j· )≥max dist(G _i· ，G _i· ) (4)

dist(G _i· ，G _j· ) Representing the model distance between any two clients from two different clusters, dist (G _i· ，G _i· ) Representing the model distance between any two clients within the same cluster. Such preconditions may ensure that all clients may be categorized into the appropriate cluster. Based on this precondition, fedclumter can simply search for cluster separation conditions instead of optimizing inter-cluster distance and intra-cluster distance.

Using a large model is all weights to compute model similarity between clients can result in a significant computational overhead. To reduce the computational cost of deriving the model similarity matrix, fedcmaster uses partially carefully selected weights to compute the model similarity between any two clients. The chosen weights will better reflect the differences between the two models, which is theoretically supported by some previous work. For example, a.rozantsev and j.yosinki propose that model high-level weights are more task-dependent than these low-level weights. Also, m.luo reports that when non-IID data is present, the neural network model has a greater model difference between the weights of the classifier layers than the model trained on IID data. Inspired by these efforts, the weight closest to the output layer is selected as the representative weight set of all model weights to calculate the similarity of the two models. The total weight of the model is denoted by ω, and the selected partial weight is denoted by ω'.

FedCluster groups clients using a model similarity matrix M, which is calculated from the adjusted local training of the stable local model of all clients. Specifically, each element M [ M, n]＝dist(ω′ _m ，ω′ _n ) Using the formula

(|ω| is the weight number of the two models) to measure the model distance between any two clients. M < M > -M of M matrix]Representing model distance between a model of client M and all other client models, where M [ M, M]=0, and M [ M, n]> 0, n noteq.m. Taking M as input, fedCluster performs client clustering according to three key steps, namely cluster demarcation detection, weighted voting and voting-based clustering. The method comprises the following specific steps:

(1) Cluster demarcation detection

For client M, fedCluster will first send M [ M, ]]The model distance values in (1) are ordered according to ascending order to obtain M' [ M, ]]Wherein M' is [ M, n ]]≥M′[m，z]N > z. FedCluster calculations were then stored at M' [ M, ]]The difference between the distance values of any two adjacent models is obtained, and the maximum distance difference t is obtained _m . Considering that t is given in the precondition that "good" cluster is represented in equation (4) _m Clustering boundaries of all clients with reference to the client m. Based on t _m M' [ M, ]]All clients indexed are divided into two groups P _m，1 And P _m，2 ，P _m，1 The difference between any two adjacent values is less than t _m ，P _m，2 No difference between any two adjacent valuesLess than t _m 。

Intuitively, P _m，1 The clients in (a) can be assigned to the same cluster but not to the group P _m，2 Because their model differences are relatively large. Furthermore, the maximum distance difference t calculated for the customer with fewer samples _m Poor stability compared to more sample customers, and therefore cannot rely on t alone _m And clustering clients. To solve these problems, a weighted voting mechanism may be added to adaptively determine the final customer clustering results.

(2) Weight voting

client _mmax ＝arg max{|D _n |，n∈P _m，1 } (5)

Each customer maintains a list of accumulated vote scores for other customers who have been selected as having the most samples.

Generally, the voting weight is assigned according to the sample size of the client. Specifically, for a particular customer, more local samples are given greater voting weights. Weighted voting can further eliminate cluster instability caused by clients with unbalanced samples and reduce the chance of incorrect clustering.

(3) Voting-based clustering

By traversing all rows, after steps (1) and (2) are run on all clients, a final list of voting scores for each client can be derived. For a given client m, accumulated in the voting score list assigned to mHighest representative client _* In the same cluster G _* In, i.e

Figure 1 shows the workflow of fedcmaster,

It is similar to FedAvg and comprises three main steps: (1) the server broadcasting the global model to clients; (2) each client trains a model using its local data; (3) The server aggregates these updated local models from the clients.

FedCluster is improved on the basis of typical federal training, and two key strategies are added to solve the defects of the current cluster federal learning method.

1: the server issues federal learning tasks; after acquiring the federation learning task, the client sends a federation learning joining request containing identity information and data resource information to the server node;

2: after the server node verifies the identity and the data resource information of the client, the server broadcasts a global model;

3: after the client acquires the global model, the local training period of the client is adjusted according to the data volume of each client. Different customers may experience different numbers of local training phases and their resulting local models are used to calculate a model weight distance matrix (i.e., a model similarity matrix)

4: the clustering relation among the clients is revealed from the model similarity matrix, and the clients with similar data distribution are adaptively divided into the same cluster under the condition that the clustering quantity is not specified

5: fedCluster can obtain a stable client cluster in an efficient and adaptive manner in several rounds of communication between the client and the server.

Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.

Claims

1. A cluster federation learning method based on model similarity is characterized in that: the method comprises the following steps:

2. A method of cluster federal learning based on model similarity according to claim 1, wherein: in the step S1, the local training adjustment policy specifically includes:

based on accumulated losses

The method comprises the following steps:

s ^* indicating that the client has the most local samples,

and |D _m I represents the local number of samples s of the client, respectively ^* And m; the parameter α represents the iteration number increase rate controlling the iteration number increase rate, and the parameter ρ is defined as:

in equation (1), fedcmaster uses the number of local samples and the cumulative loss to calculate the number of local iterations per client; the number of local samples determines the maximum stepLong length

They are->

3. A method of cluster federal learning based on model similarity according to claim 2, wherein: in the step S4, the adaptive clustering strategy of the client is:

min dist(G _i· ，G _j· )≥max dist(G _i· ，G _i· ) (4)

dist(G _i· ，G _j· ) Representing the model distance between any two clients from two different clusters, dist (G _i· ，G _i· ) Representing a model distance between any two clients within the same cluster;

4. A method of clustered federal learning based on model similarity according to claim 3, wherein: the group demarcation detection is:

the weighted voting is as follows:

for each client M and its corresponding M' [ M; carrying out]At P _m，1 Is the guest having the largest number of local samples among the clients of (a)The method comprises the following steps:

client _mmax ＝arg max{|D _n |，n∈P _m，1 } (5)

for a particular customer, more local samples are given greater voting weights;

the voting-based clustering is as follows: