CN115840900A - Personalized federated learning method and system based on adaptive clustering hierarchy - Google Patents

Personalized federated learning method and system based on adaptive clustering hierarchy

Info

Publication number
CN115840900A
Authority
CN
China
Prior art keywords
model
group
client
parameter
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211129262.9A
Other languages
Chinese (zh)
Inventor
谢在鹏
刘尧
蒋俊辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU filed Critical Hohai University HHU
Priority to CN202211129262.9A priority Critical patent/CN115840900A/en
Publication of CN115840900A publication Critical patent/CN115840900A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a personalized federated learning method and system based on adaptive clustering hierarchy. The method comprises the following steps: the parameter server performs weighted averaging on the gradients of all clients and adjusts the global model parameters with the computed average gradient; it calculates the similarity between clients according to the gradients uploaded by all clients in the latest round, clusters all clients into groups according to the result, and generates layer-wise personalized weight vectors for each group; the parameter server sends the latest global model parameters to all group servers, the group servers iteratively perform personalized federated learning training within their groups and upload the latest intra-group model parameters to the parameter server; and the parameter server performs weighted average aggregation on the received latest intra-group model parameters sent by all client groups to obtain a new global model. The invention greatly improves the personalization performance of the clients' local models without damaging the generalization ability of the global model.

Description

Personalized federated learning method and system based on adaptive clustering hierarchy
Technical Field
The invention belongs to the technical field of distributed machine learning, and particularly relates to a personalized federated learning method and system based on adaptive clustering hierarchy.
Background
With the coming of the cloud era and the popularization of edge devices (such as smart phones and smart wearable devices), data are constantly generated and even grow explosively. These rich data provide great opportunities for machine learning applications such as speech recognition and computer vision, where deep neural networks can efficiently extract the desired information given a large amount of training data. However, as data privacy receives more and more attention from all sectors of society, the large amount of data generated in edge devices or organizations (such as hospitals, companies and courts) cannot be collected in a central server, which poses great challenges to deep learning.
Federated learning is a deep learning framework in which clients cooperatively train a shared global model on their own data under the coordination of a central server, while keeping the data private and reducing systematic privacy risk and communication cost. Most existing training methods are variants of federated averaging, and traditional federated learning focuses on obtaining a high-quality, universal global model by learning from the local data of the participating clients. However, in the presence of statistical heterogeneity of the data (e.g., non-independent, non-identically distributed or unbalanced data), federated learning has difficulty training a single model that fits all clients. Applying the global model directly to every client may result in poor local performance because the global model does not fit each client. This problem is further exacerbated as the differences between the local data of different clients grow larger.
To alleviate the degradation of federated learning performance caused by statistical heterogeneity of data, personalized federated learning has become a solution. Personalized federated learning aims at training a unique personalized model for each client, combining the generalization characteristics of the global model with the distribution-matching characteristics of the local model; the challenge is how to achieve a fine balance between the specific knowledge of the local models and the shared knowledge of the global model. In recent years, much work on personalized federated learning has focused on two possible solutions: clustering-based personalization and layer-based personalization. Clustering-based personalization methods group clients with similar data distributions into clusters and train a specialized model for each cluster. Layer-based personalization methods personalize some layers of the local model while the remaining layers are derived from the global model.
Although both approaches can improve federated learning performance through personalization, significant problems remain. Current clustering-based personalization methods pay little attention to model sharing among groups and may therefore harm the generalization performance of the global model. Meanwhile, existing layer-based personalization methods usually adopt a manually predefined layering scheme and lack flexibility and adaptability. As a result, they may end up with sub-optimal solutions, leading to an imbalance between the performance of the global model and the local models.
The invention with application number 202210511356.6 provides a federated learning method and system comprising the following steps: S1, an initial global model is sent to all clients, and the clients upload their initial local models to a central service system; S2, the clients are clustered according to the initial local models they upload, obtaining one or more client classes; S3, multiple rounds of iterative training are performed on the global model until an iteration stopping condition is reached, where the t-th round of iterative training includes: selecting at least one client from each client class to participate in training; judging whether gradient conflicts exist among the clients participating in the t-th round of iterative training based on the t-th round local models and loss function values returned by the clients, and accumulating model differences according to the gradient-conflict situation; and updating the t-th round global model with the accumulated model differences. That method divides the causes of model unfairness into external and internal contradictions and eliminates both, improves the representativeness and fairness of the selected clients, reduces the number of training rounds and the communication cost, and accelerates convergence. However, it cannot solve the problems of unbalanced performance between the global model and the local models and of poor generalization of the global model.
Disclosure of Invention
The technical problem to be solved is as follows: the invention provides a personalized federated learning method and system based on adaptive clustering hierarchy, which solve the problem of statistical data heterogeneity in federated learning and the problem of unbalanced performance between the global model and the local models in personalized federated learning. By integrating a client clustering method with an adaptive layer-wise fusion algorithm and using the performance feedback of the clients to perform clustering grouping and adaptive layer-wise fusion, the method can flexibly establish a personalization strategy for a specific federated learning task, greatly improving the personalization performance of the clients' local models without damaging the global generalization ability.
The technical scheme is as follows:
A personalized federated learning method based on adaptive clustering hierarchy comprises the following steps:
S1, each client prepares a training data set and a test data set for the prediction task, and the global parameter server randomly initializes the global model parameters;
S2, the global parameter server issues the global model parameters to the clients; each client uses the received global model parameters as the initial parameters of its local model, trains the model for the current round on its local training data set, evaluates the model's prediction effect on the test data set after training, calculates the gradient, and uploads the calculated gradient to the parameter server; the parameter server performs weighted averaging on the received gradients of all clients and adjusts the global model parameters with the computed average gradient;
S3, repeating step S2 until the training round reaches the maximum communication round of the first stage, then going to step S4;
S4, the parameter server calculates the similarity between clients according to the gradients uploaded by all clients in the latest round, clusters all clients into groups according to the calculation result, selects a group server for each client group, and generates a layer-wise personalized weight vector for each client group;
S5, the parameter server sends the latest global model parameters to all group servers; the group servers iteratively perform personalized federated learning training within their groups and upload the obtained latest intra-group model parameters to the parameter server; the parameter server performs weighted average aggregation on the received latest intra-group model parameters sent by all client groups to obtain a new global model;
S6, repeating step S5 until the training round reaches the maximum round or the model converges, then ending the process.
Further, in step S2, the process in which the global parameter server issues the global model parameters to the clients, each client uses the received global model parameters as the initial parameters of its local model, trains the model for the current round on the local training data set, evaluates the model's prediction effect on the test data set after training, calculates the gradient, and uploads the calculated gradient to the parameter server comprises the following steps:
S21, the global parameter server issues the t-th round parameters W_g^t of the model W_g to the K clients participating in the federated learning training; t ∈ [1, T_pre], where T_pre is the maximum communication round of the first stage, and the round-1 parameters W_g^1 are obtained by random initialization at the global parameter server;
S22, on each client that receives the t-th round parameters W_g^t of the model W_g, the following training steps are performed in parallel:
S221, the client takes the t-th round parameters W_g^t of the model W_g as its initial model parameters, denoted W_k^t, the t-th round initial model parameters of the k-th client's local model;
S222, based on the initial model parameters W_k^t and a training data set D_k^train consisting of N samples randomly drawn from the raw data held by the client, the client trains and optimizes the local model, performing E rounds of local iteration with stochastic gradient descent to obtain the optimized model parameters Ŵ_k^t;
S223, the client uses the optimized model parameters Ŵ_k^t to perform predictive inference on the test data set D_k^test, evaluates the prediction effect, and calculates the gradient g_k;
S224, the client sends the gradient g_k to the global parameter server.
Further, in step S222, the optimized model parameters Ŵ_k^t are calculated with the following formulas:

ℓ(W_k^t) = (1/N) · Σ_{(x,y)∈D_k^train} L(f(x; W_k^t), y)

Ŵ_k^t = W_k^t − η · ∇_{W_k^t} ℓ(W_k^t)

where N represents the number of samples in the sampled training data set D_k^train, ℓ(W_k^t) is the loss value, x and y respectively represent the features of a single sample in the data set and the corresponding label, L(f(x; W_k^t), y) represents the loss between the model output f(x; W_k^t) and the true value y, η represents the learning rate, and ∇_{W_k^t} ℓ(W_k^t) represents the gradient of ℓ(W_k^t) with respect to W_k^t.
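For illustration only (not part of the original disclosure), the local update of step S222 can be written in Python/NumPy as below. The linear model, the mean-squared-error loss, and the definition of the uploaded g_k as the difference between the initial and optimized parameters are assumptions made for this sketch; the patent text does not fix the model architecture, the loss, or the exact form of g_k.

```python
import numpy as np

def local_sgd_update(w_init, X, y, lr=0.01, epochs=5, batch_size=32, seed=0):
    """E rounds of local SGD starting from the received global parameters (step S222).

    w_init : initial model parameters W_k^t received from the server (1-D array)
    X, y   : the client's local training set D_k^train (features, labels)
    Returns the optimized parameters and the update direction g_k, here taken as
    the difference between the initial and optimized parameters (an assumption).
    """
    rng = np.random.default_rng(seed)
    w = w_init.copy()
    n = X.shape[0]
    for _ in range(epochs):                            # E local iterations
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            xb, yb = X[batch], y[batch]
            pred = xb @ w                               # model output f(x; w)
            grad = xb.T @ (pred - yb) / len(batch)      # gradient of the MSE loss
            w -= lr * grad                              # SGD step: w <- w - eta * grad
    g_k = w_init - w                                    # quantity uploaded to the server
    return w, g_k
```

Calling local_sgd_update(w0, X_train, y_train) returns the pair that steps S223-S224 then evaluate on the test data set and upload to the parameter server.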
Further, in step S2, the process in which the parameter server performs weighted averaging on the received gradients of all clients and adjusts the global model parameters with the computed average gradient comprises the following steps:
the global parameter server calculates the weight proportion of client k from the number of samples n_k in the client's training data set D_k^train: γ_k = n_k / Σ_{k∈K} n_k;
the federated averaging algorithm FedAvg is adopted to weight and aggregate the gradients of all K clients participating in the federated learning training, obtaining the model parameters of round t+1:

W_g^{t+1} = W_g^t − Σ_{k∈K} γ_k · g_k.
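A minimal NumPy sketch of this server-side step is given below; it assumes the uploaded g_k is applied directly as an update direction with weights γ_k (the text does not mention a separate server learning rate), and the function name is hypothetical.

```python
import numpy as np

def fedavg_gradient_step(w_global, client_grads, client_sizes):
    """Weighted-average the client gradients and adjust the global model (step S2).

    w_global     : current global parameters W_g^t (1-D array)
    client_grads : list of gradients g_k uploaded by the K clients
    client_sizes : list of n_k, the number of local training samples per client
    """
    sizes = np.asarray(client_sizes, dtype=float)
    gamma = sizes / sizes.sum()                  # gamma_k = n_k / sum_k n_k
    avg_grad = sum(gk * g for gk, g in zip(gamma, client_grads))
    return w_global - avg_grad                   # W_g^{t+1} = W_g^t - sum_k gamma_k * g_k
```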
Further, in step S4, the process in which the parameter server calculates the similarity between clients according to the gradients uploaded by all clients in the latest round, clusters all clients into groups according to the calculation result, and selects a group server for each client group comprises the following steps:
S41, according to the gradients {g_k}_{k∈K} uploaded by all clients in round T_pre, the parameter server obtains a similarity matrix ρ by calculating the pairwise cosine similarity S_C between client gradients, where for clients i, j ∈ K:

ρ_{i,j} = S_C(i,j),  S_C(i,j) = (g_i · g_j) / (‖g_i‖ · ‖g_j‖);

S42, based on the similarity matrix ρ, the K clients are clustered into M client groups using a top-down hierarchical clustering algorithm, denoted {C_m}_{m∈M};
S43, a group server is selected for each client group to coordinate the training of the clients in the group;
S44, the group server of each client group copies the T_pre-th round parameters W_g^{T_pre} of the global parameter server model W_g as the parameters W_m^{T_pre} of the client group server model W_m, where m ∈ {1, 2, …, M}.
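The similarity computation and grouping of steps S41-S42 can be sketched as follows (illustrative only). SciPy's agglomerative (bottom-up) hierarchical clustering is used here as a stand-in for the top-down hierarchical clustering named in the text, and the function name is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_clients_by_gradient(client_grads, num_groups):
    """Group clients whose last-round gradients point in similar directions (step S4).

    client_grads : list of K flattened gradient vectors g_k from round T_pre
    num_groups   : M, the desired number of client groups
    Returns an array of length K with a group index in {0, ..., M-1} per client.
    """
    G = np.stack(client_grads)                           # K x d matrix of gradients
    norms = np.linalg.norm(G, axis=1, keepdims=True)
    rho = (G @ G.T) / (norms * norms.T)                  # cosine-similarity matrix rho
    dist = 1.0 - rho                                     # turn similarity into a distance
    np.fill_diagonal(dist, 0.0)
    dist = np.clip(dist, 0.0, None)                      # guard against tiny negatives
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=num_groups, criterion="maxclust") - 1
    return labels
```

With M chosen in advance, the returned labels determine the client groups used in steps S43-S44.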
Further, the process of generating the layer-wise personalized weight vector for each client group in step S4 comprises the following steps:
the intra-group average gradient D̄_m is calculated from the gradients of the clients within the group:

D̄_m = (1/|C_m|) · Σ_{k∈C_m} g_k;

the average gradient D̄_m is split by model parameter layer and expressed as (D̄_m^1, D̄_m^2, …, D̄_m^l), where l is the total number of model parameter layers;
the Euclidean distance (ℓ2 norm) of the average gradient D̄_m is calculated layer by layer to obtain a 1×l-dimensional vector δ_m:

δ_m = (‖D̄_m^1‖_2, ‖D̄_m^2‖_2, …, ‖D̄_m^l‖_2);

a hyper-parameter β for adjusting the degree of personalization is defined; δ_m is normalized and multiplied by β to obtain the layer-wise personalized model weight ψ_m of the group:

ψ_m = β · δ_m / max(δ_m).
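An illustrative sketch of this weight-vector computation follows (not part of the original disclosure). Treating the layer-by-layer "Euclidean distance" as the Euclidean norm of each layer of the average gradient, and the intra-group average as an unweighted mean, are assumptions; the default β is arbitrary.

```python
import numpy as np

def layerwise_personalization_weights(group_grads, beta=0.8):
    """Compute the layer-wise personalized weight vector psi_m for one client group (step S4).

    group_grads : list over clients in the group; each element is a list of per-layer
                  gradient arrays [g_layer1, ..., g_layerl]
    beta        : hyper-parameter adjusting the degree of personalization
    Returns psi_m, a vector of length l with one weight per model parameter layer.
    """
    num_layers = len(group_grads[0])
    # Intra-group average gradient, kept layer by layer (assumed unweighted mean).
    avg_layers = [np.mean([g[layer] for g in group_grads], axis=0)
                  for layer in range(num_layers)]
    # Per-layer Euclidean norm -> 1 x l vector delta_m.
    delta = np.array([np.linalg.norm(layer_grad) for layer_grad in avg_layers])
    # psi_m = beta * delta_m / max(delta_m): layers whose average gradient is larger
    # receive larger personalization weights.
    return beta * delta / delta.max()
```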
Further, in step S5, the process in which the parameter server sends the latest global model parameters to all group servers, the group servers iteratively perform personalized federated learning training within their groups, and the obtained latest intra-group model parameters are uploaded to the parameter server comprises the following steps:
S51, the global parameter server sends the t-th round parameters W_g^t of the model W_g to the M client group servers; t ∈ (T_pre, T_total), where T_pre and T_total are respectively the maximum communication rounds of the first stage and the second stage;
S52, the following steps are executed in parallel on each client group server:
S521, the group server receives the t-th round parameters W_g^t of the global model W_g sent by the parameter server;
S522, the group server fuses the t-th round parameters W_g^t of the global model W_g and the parameters W_m^t of the group model W_m layer by layer, weighted by the layer-wise personalized model weights ψ_m, and updates the parameters of the group model W_m to W̃_m^t; the weighted fusion first decomposes W_g^t and W_m^t by layer into (W_g^{t,1}, …, W_g^{t,l}) and (W_m^{t,1}, …, W_m^{t,l}), and then fuses the parameters of each layer according to the following formula (an illustrative code sketch of this fusion is given after these steps):

W̃_m^{t,n} = ψ_m^n · W_m^{t,n} + (1 − ψ_m^n) · W_g^{t,n}

where W_m^{t,n} represents the parameters of the n-th layer of W_m^t, W_g^{t,n} represents the parameters of the n-th layer of W_g^t, and n ∈ {1, 2, …, l};
S523, the group server sends the parameters W̃_m^t of the group model W_m to the clients in the group;
S53, on each client that receives the t-th round parameters W̃_m^t of the group model W_m, the following training steps are performed in parallel:
S531, the client takes the t-th round parameters W̃_m^t of the group model W_m as its initial model parameters, denoted W_k^t, the t-th round initial model parameters of the k-th client's local model;
S532, based on the initial model parameters W_k^t and a training data set D_k^train consisting of N samples randomly drawn from the raw data held by the client, the client trains and optimizes the local model, performing E rounds of local iteration with stochastic gradient descent (SGD) to obtain the optimized model parameters Ŵ_k^t;
S534, the client uses the optimized model parameters Ŵ_k^t to perform predictive inference on the test data set D_k^test, evaluates the prediction effect, and calculates the gradient g_k;
S535, the client sends the gradient g_k to the corresponding group server;
S54, each group server calculates the weight proportion of client k within the group, γ_k^m = n_k / Σ_{k∈C_m} n_k, from the number of samples n_k in the training data sets D_k^train of the clients in the group, and uses the federated averaging algorithm FedAvg to weight and aggregate the gradients of the clients in the group, updating the parameters of the group server model W_m to Ŵ_m^t according to the following formula:

Ŵ_m^t = W̃_m^t − Σ_{k∈C_m} γ_k^m · g_k;

S55, it is judged whether the number of iterations has reached the maximum communication round of the second stage; if so, S56 is executed, otherwise S523 to S54 continue to be executed;
S56, the updated parameters of the group server model W_m are denoted Ŵ_m, and the group server sends the parameters Ŵ_m of the group server model W_m to the global parameter server.
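The layer-wise fusion of step S522 (referenced above) can be sketched as follows. Reading ψ_m^n as the weight placed on the group's own n-th layer, with 1 − ψ_m^n on the corresponding global layer, is an assumption made for this example, since the original formula image is not reproduced in the text.

```python
import numpy as np

def fuse_group_and_global(group_layers, global_layers, psi):
    """Layer-wise weighted fusion of the group model W_m^t with the global model W_g^t.

    group_layers  : list of per-layer parameter arrays of the group model
    global_layers : list of per-layer parameter arrays of the global model
    psi           : layer-wise personalized weights psi_m, one scalar per layer
    Returns the fused per-layer parameters used to start the intra-group training round.
    """
    fused = []
    for w_m, w_g, p in zip(group_layers, global_layers, psi):
        # Larger psi keeps more of the group-specific layer and less of the global layer.
        fused.append(p * w_m + (1.0 - p) * w_g)
    return fused
```

Under this reading, a larger β (and hence larger ψ_m^n for the layers whose average gradients differ most) shifts those layers toward the group-specific parameters, which is the personalization effect described above.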
Further, in step S5, the process in which the parameter server performs weighted average aggregation on the received latest intra-group model parameters sent by all client groups to obtain a new global model comprises the following steps:
the global parameter server uses the federated averaging algorithm FedAvg to weight and aggregate the model parameters Ŵ_m sent by all group servers, obtaining the t+1-th round parameters W_g^{t+1} of the model W_g, with the specific calculation formula:

W_g^{t+1} = Σ_{m∈M} γ_m · Ŵ_m

where γ_m is the aggregation weight of the m-th group (in FedAvg, proportional to the number of training samples held by the clients of the group).
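The second-stage global update aggregates parameters rather than gradients; a short illustrative sketch follows (not from the original disclosure), where the sample-count weights γ_m and the per-layer list representation are assumptions consistent with FedAvg, and the function name is hypothetical.

```python
import numpy as np

def aggregate_group_models(group_params, group_sizes):
    """Weighted average of the group servers' parameters into a new global model (step S5).

    group_params : list over the M groups; each element is a list of per-layer arrays
    group_sizes  : total number of training samples held by each group's clients
    """
    sizes = np.asarray(group_sizes, dtype=float)
    gamma = sizes / sizes.sum()                  # assumed gamma_m, one weight per group
    num_layers = len(group_params[0])
    # New global parameters W_g^{t+1}, computed layer by layer.
    return [sum(g * params[layer] for g, params in zip(gamma, group_params))
            for layer in range(num_layers)]
```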
The invention also discloses a personalized federated learning system based on adaptive clustering hierarchy, which comprises a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the above personalized federated learning method based on adaptive clustering hierarchy when executing the program.
Beneficial effects:
First, because the personalized federated learning method based on adaptive clustering hierarchy of the invention integrates a client clustering method, clients can be grouped according to the similarity of their data distributions without obtaining the clients' real data, and a client can join the corresponding personalized federated learning system to carry out model training and inference.
Secondly, because the personalized federated learning method based on adaptive clustering hierarchy integrates an adaptive layer-wise fusion scheme, each client can obtain the personalized model best suited to it, while the method maintains a global model with good generalization performance so that new clients can conveniently join or use the corresponding personalized federated learning system.
Thirdly, the personalized federated learning method based on adaptive clustering hierarchy solves the problem of statistical data heterogeneity in federated learning and the problem of unbalanced performance between the global model and the local models in personalized federated learning, greatly improving the personalization performance of the clients' local models without damaging the global generalization ability.
Drawings
Fig. 1 is a schematic diagram of the overall flow of the personalized federated learning method based on adaptive clustering hierarchy in an embodiment of the present invention.
Fig. 2 is a schematic diagram of the intra-group training process of the personalized federated learning method based on adaptive clustering hierarchy in an embodiment of the present invention.
Fig. 3 is a schematic diagram of the first-stage principle of the personalized federated learning method based on adaptive clustering hierarchy according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the second-stage principle of the personalized federated learning method based on adaptive clustering hierarchy according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of the personalized federated learning system based on adaptive clustering hierarchy in an embodiment of the present invention.
Detailed Description
The following examples are presented to enable one of ordinary skill in the art to more fully understand the present invention and are not intended to limit the invention in any way.
The embodiment discloses a personalized federated learning method based on adaptive clustering hierarchy, which comprises the following steps:
S1, each client prepares a training data set and a test data set for the prediction task, and the global parameter server randomly initializes the global model parameters;
S2, the global parameter server issues the global model parameters to the clients; each client uses the received global model parameters as the initial parameters of its local model, trains the model for the current round on its local training data set, evaluates the model's prediction effect on the test data set after training, calculates the gradient, and uploads the calculated gradient to the parameter server; the parameter server performs weighted averaging on the received gradients of all clients and adjusts the global model parameters with the computed average gradient;
S3, repeating step S2 until the training round reaches the maximum communication round of the first stage, then going to step S4;
S4, the parameter server calculates the similarity between clients according to the gradients uploaded by all clients in the latest round, clusters all clients into groups according to the calculation result, selects a group server for each client group, and generates a layer-wise personalized weight vector for each client group;
S5, the parameter server sends the latest global model parameters to all group servers; the group servers iteratively perform personalized federated learning training within their groups and upload the obtained latest intra-group model parameters to the parameter server; the parameter server performs weighted average aggregation on the received latest intra-group model parameters sent by all client groups to obtain a new global model;
S6, repeating step S5 until the training round reaches the maximum round or the model converges, then ending the process.
In another aspect, an embodiment of the present invention provides a personalized federated learning system based on adaptive clustering hierarchy, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the foregoing personalized federated learning method based on adaptive clustering hierarchy when executing the program.
The embodiment provides a personalized federated learning method and system based on adaptive clustering hierarchy, which solve the problem of statistical data heterogeneity in federated learning and the problem of unbalanced performance between the global model and the local models in personalized federated learning. By integrating a client clustering method with an adaptive layer-wise fusion algorithm, performing clustering grouping and adaptive layer-wise fusion with the performance feedback of the clients, and flexibly making a personalization strategy for the specific federated learning task, the personalization performance of the clients' local models is greatly improved without damaging the global generalization ability.
As shown in fig. 1, this embodiment provides a personalized federated learning method based on adaptive clustering hierarchy, which consists of two stages and specifically comprises the following steps.
The first stage (its schematic diagram is shown in fig. 3):
S100: all clients prepare training and testing data sets for the prediction task.
S110: the global parameter server randomly initializes the round-1 parameters W_g^1 of the model W_g.
S200: the global parameter server issues the t-th round parameters W_g^t of the model W_g to the K clients participating in the federated learning training; t ∈ [1, T_pre], where T_pre is the maximum communication round of the first stage.
S210: on each client that receives the t-th round parameters W_g^t of the model W_g, the following training steps are performed in parallel:
the client takes the t-th round parameters W_g^t of the model W_g as its initial model parameters, denoted W_k^t, the t-th round initial model parameters of the k-th client's local model;
based on the initial model parameters W_k^t and a training data set D_k^train consisting of N samples randomly drawn from the raw data held by the client, the client trains and optimizes the local model, performing E rounds of local iteration with stochastic gradient descent (SGD) to obtain the optimized model parameters Ŵ_k^t.
S220: the client uses the optimized model parameters Ŵ_k^t to perform predictive inference on the test data set D_k^test, evaluates the prediction effect, and calculates the gradient g_k.
S230: the client sends the gradient g_k to the global parameter server.
S240: the global parameter server calculates the weight proportion of client k, γ_k = n_k / Σ_{k∈K} n_k, from the number of samples n_k in the client's training data set D_k^train; then the gradients of all K clients participating in the federated learning training are weighted and aggregated with the federated averaging algorithm FedAvg to obtain the model parameters W_g^{t+1} of round t+1, with the specific calculation formula:

W_g^{t+1} = W_g^t − Σ_{k∈K} γ_k · g_k    (1)

S250: it is judged whether the number of iterations has reached the specified communication round; if so, S300 is executed, otherwise S200 to S250 continue to be executed.
S300: according to the gradients {g_k}_{k∈K} uploaded by all clients in round T_pre, the parameter server calculates the pairwise cosine similarity S_C between client gradients to obtain a similarity matrix ρ, where for clients i, j ∈ K:

ρ_{i,j} = S_C(i,j),  S_C(i,j) = (g_i · g_j) / (‖g_i‖ · ‖g_j‖)    (2)

Based on the similarity matrix ρ, the K clients are clustered into M client groups using a top-down hierarchical clustering algorithm, denoted {C_m}_{m∈M}; a group server is selected for each client group to coordinate the training of the clients in the group; each client group server copies the T_pre-th round parameters W_g^{T_pre} of the global parameter server model W_g as the parameters W_m^{T_pre} of the client group server model W_m, where m ∈ {1, 2, …, M}.
Further, in each client group, the group server performs the following calculations:
the intra-group average gradient D̄_m is computed from the gradients of the clients in the group:

D̄_m = (1/|C_m|) · Σ_{k∈C_m} g_k    (3)

the average gradient D̄_m is split by model parameter layer and expressed as (D̄_m^1, D̄_m^2, …, D̄_m^l), where l is the total number of model parameter layers;
the Euclidean distance (ℓ2 norm) of the average gradient D̄_m is calculated layer by layer to obtain a 1×l-dimensional vector δ_m, with the calculation formula:

δ_m = (‖D̄_m^1‖_2, ‖D̄_m^2‖_2, …, ‖D̄_m^l‖_2)    (4)

a hyper-parameter β for adjusting the degree of personalization is defined; δ_m is normalized and multiplied by β to obtain the layer-wise personalized model weight ψ_m of the group, with the specific calculation formula:

ψ_m = β · δ_m / max(δ_m)    (5)

The second stage (its schematic diagram is shown in fig. 4):
S400: the global parameter server sends the t-th round parameters W_g^t of the model W_g to the M client group servers; t ∈ (T_pre, T_total), where T_pre and T_total are respectively the maximum communication rounds of the first stage and the second stage.
S410: as shown in fig. 2, the following steps are performed in parallel on each client group server:
S411: the group server receives the t-th round parameters W_g^t of the global model W_g sent by the parameter server.
S412: the group server fuses the t-th round parameters W_g^t of the global model W_g and the parameters W_m^t of the group model W_m layer by layer, weighted by the layer-wise personalized model weights ψ_m, and updates the parameters of the group model W_m to W̃_m^t; the weighted fusion first decomposes W_g^t and W_m^t by layer into (W_g^{t,1}, …, W_g^{t,l}) and (W_m^{t,1}, …, W_m^{t,l}), and then fuses the parameters of each layer, with the specific formula:

W̃_m^{t,n} = ψ_m^n · W_m^{t,n} + (1 − ψ_m^n) · W_g^{t,n}    (6)
where W_m^{t,n} represents the parameters of the n-th layer of W_m^t, W_g^{t,n} represents the parameters of the n-th layer of W_g^t, and n ∈ {1, 2, …, l}.
S413: the group server sends the parameters W̃_m^t of the group model W_m to the clients in the group; on each client that receives the t-th round parameters W̃_m^t of the group model W_m, the following training steps are performed in parallel:
S414: the client takes the t-th round parameters W̃_m^t of the group model W_m as its initial model parameters, denoted W_k^t, the t-th round initial model parameters of the k-th client's local model; based on the initial model parameters W_k^t and a training data set D_k^train consisting of N samples randomly drawn from the raw data held by the client, the client trains and optimizes the local model, performing E rounds of local iteration with stochastic gradient descent (SGD) to obtain the optimized model parameters Ŵ_k^t.
S415: the client uses the optimized model parameters Ŵ_k^t to perform predictive inference on the test data set D_k^test, evaluates the prediction effect, and calculates the gradient g_k.
S416: the client sends the gradient g_k to the corresponding group server.
S417: each group server calculates the weight proportion of client k within the group, γ_k^m = n_k / Σ_{k∈C_m} n_k, from the number of samples n_k in the training data sets D_k^train of the clients in the group; then the gradients of the clients in the group are weighted and aggregated with the federated averaging algorithm FedAvg, and the parameters of the group server model W_m are updated to Ŵ_m^t, with the specific calculation formula:

Ŵ_m^t = W̃_m^t − Σ_{k∈C_m} γ_k^m · g_k    (7)

S418: it is judged whether the number of iterations has reached the specified communication round; if so, S420 is executed, otherwise S413 to S418 continue to be executed.
S420: the updated parameters of the group server model W_m are denoted Ŵ_m, and the group server sends the parameters Ŵ_m of the group server model W_m to the global parameter server.
S430: the global parameter server uses the federated averaging algorithm FedAvg to weight and aggregate the model parameters Ŵ_m sent by all group servers, obtaining the t+1-th round parameters W_g^{t+1} of the model W_g, with the specific calculation formula:

W_g^{t+1} = Σ_{m∈M} γ_m · Ŵ_m    (8)

where γ_m is the aggregation weight of the m-th group (in FedAvg, proportional to the number of training samples held by the clients of the group).
S440: it is judged whether the model has converged or the number of iterations has reached the specified communication round; if either condition is met, the training is finished and every client performs testing on its test data set D_k^test with the group model of the group it belongs to; otherwise, S400 to S440 continue to be executed.
In S210 and S414, the initial model parameters W_k^t and the training data set D_k^train are used to train and optimize the local model of the client, obtaining the optimized model parameters Ŵ_k^t, with the specific calculation formulas:

ℓ(W_k^t) = (1/N) · Σ_{(x,y)∈D_k^train} L(f(x; W_k^t), y)

Ŵ_k^t = W_k^t − η · ∇_{W_k^t} ℓ(W_k^t)

where N represents the number of samples in the sampled training data set D_k^train, ℓ(W_k^t) is the loss value, x and y respectively represent the features of a single sample in the data set and the corresponding label, L(f(x; W_k^t), y) represents the loss between the model output f(x; W_k^t) and the true value y, η represents the learning rate, and ∇_{W_k^t} ℓ(W_k^t) represents the gradient of ℓ(W_k^t) with respect to W_k^t.
One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:
Because the client clustering method is integrated, this embodiment can group clients according to the similarity of their data distributions without obtaining the clients' real data, and clients can join the system of this embodiment to train the model and perform inference; because the adaptive layer-wise fusion scheme is integrated, each client can obtain the personalized model best suited to it through this embodiment, while the embodiment maintains a global model with good generalization performance so that new clients can join or use the personalized federated learning system. This solves the problem of statistical data heterogeneity in federated learning and the problem of unbalanced performance between the global model and the local models in personalized federated learning, greatly improving the personalization performance of the clients' local models without damaging the global generalization ability.
The electronic device according to an embodiment of the present application is described below with reference to fig. 5. Based on the same inventive concept as the personalized federated learning method based on adaptive clustering hierarchy in the foregoing embodiments, an embodiment of the present application further provides a personalized federated learning system based on adaptive clustering hierarchy, including: a processor coupled to a memory, the memory being configured to store a program that, when executed by the processor, causes the system to perform the method described in any of the foregoing embodiments.
The electronic device 300 includes: processor 302, communication interface 303, memory 301. Optionally, the electronic device 300 may also include a bus architecture 304. Wherein, the communication interface 303, the processor 302 and the memory 301 may be connected to each other through a bus architecture 304; the bus architecture 304 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus architecture 304 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
Processor 302 may be a CPU, microprocessor, ASIC, or one or more integrated circuits for controlling the execution of programs in accordance with the teachings of the present application.
Communication interface 303, using any transceiver or like device, is used to communicate with other devices or communication networks, such as an ethernet, a Radio Access Network (RAN), a Wireless Local Area Network (WLAN), a wired access network, etc.
The memory 301 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may exist independently and be coupled to the processor through the bus architecture 304, or may be integrated with the processor. The memory 301 is used for storing computer-executable instructions for implementing the solution of the present application, and execution is controlled by the processor 302. The processor 302 is configured to execute the computer-executable instructions stored in the memory 301, so as to implement the personalized federated learning method based on adaptive clustering hierarchy provided in the foregoing embodiments of the present application.
The above are only preferred embodiments of the present invention, and the scope of the present invention is not limited to the above examples; all technical solutions that fall within the spirit of the present invention belong to the scope of protection of the present invention. It should be noted that modifications and refinements made by those skilled in the art without departing from the principle of the present invention shall also be regarded as falling within the scope of protection of the present invention.

Claims (9)

1. A personalized federated learning method based on adaptive clustering hierarchy, characterized in that the personalized federated learning method comprises the following steps:
S1, each client prepares a training data set and a test data set for the prediction task, and the global parameter server randomly initializes the global model parameters;
S2, the global parameter server issues the global model parameters to the clients; each client uses the received global model parameters as the initial parameters of its local model, trains the model for the current round on its local training data set, evaluates the model's prediction effect on the test data set after training, calculates the gradient, and uploads the calculated gradient to the parameter server; the parameter server performs weighted averaging on the received gradients of all clients and adjusts the global model parameters with the computed average gradient;
S3, repeating step S2 until the training round reaches the maximum communication round of the first stage, then going to step S4;
S4, the parameter server calculates the similarity between clients according to the gradients uploaded by all clients in the latest round, clusters all clients into groups according to the calculation result, selects a group server for each client group, and generates a layer-wise personalized weight vector for each client group;
S5, the parameter server sends the latest global model parameters to all group servers; the group servers iteratively perform personalized federated learning training within their groups and upload the obtained latest intra-group model parameters to the parameter server; the parameter server performs weighted average aggregation on the received latest intra-group model parameters sent by all client groups to obtain a new global model;
S6, repeating step S5 until the training round reaches the maximum round or the model converges, then ending the process.
2. The personalized federated learning method based on adaptive clustering hierarchy as claimed in claim 1, wherein in step S2, the process in which the global parameter server issues the global model parameters to the clients, each client uses the received global model parameters as the initial parameters of its local model, trains the model for the current round on the local training data set, evaluates the model's prediction effect on the test data set after training, calculates the gradient, and uploads the calculated gradient to the parameter server comprises the following steps:
S21, the global parameter server issues the t-th round parameters W_g^t of the model W_g to the K clients participating in the federated learning training; t ∈ [1, T_pre], where T_pre is the maximum communication round of the first stage, and the round-1 parameters W_g^1 are obtained by random initialization at the global parameter server;
S22, on each client that receives the t-th round parameters W_g^t of the model W_g, the following training steps are performed in parallel:
S221, the client takes the t-th round parameters W_g^t of the model W_g as its initial model parameters, denoted W_k^t, the t-th round initial model parameters of the k-th client's local model;
S222, based on the initial model parameters W_k^t and a training data set D_k^train consisting of N samples randomly extracted from the original data held by the client, the client trains and optimizes the local model, performing E rounds of local iteration with stochastic gradient descent to obtain the optimized model parameters Ŵ_k^t;
S223, the client uses the optimized model parameters Ŵ_k^t to perform predictive inference on the test data set D_k^test, evaluates the prediction effect, and calculates the gradient g_k;
S224, the client sends the gradient g_k to the global parameter server.
3. The personalized federated learning method based on adaptive clustering hierarchy as claimed in claim 2, wherein in step S222, the optimized model parameters Ŵ_k^t are calculated with the following formulas:

ℓ(W_k^t) = (1/N) · Σ_{(x,y)∈D_k^train} L(f(x; W_k^t), y)

Ŵ_k^t = W_k^t − η · ∇_{W_k^t} ℓ(W_k^t)

where N represents the number of samples in the sampled training data set D_k^train, ℓ(W_k^t) is the loss value, x and y respectively represent the features and the corresponding label of a single sample in the data set, L(f(x; W_k^t), y) represents the loss between the model output f(x; W_k^t) and the true value y, η represents the learning rate, and ∇_{W_k^t} ℓ(W_k^t) represents the gradient of ℓ(W_k^t) with respect to W_k^t.
4. The personalized federated learning method based on adaptive clustering hierarchy as claimed in claim 2, wherein in step S2, the process in which the parameter server performs weighted averaging on the received gradients of all clients and adjusts the global model parameters with the computed average gradient comprises the following steps:
the global parameter server calculates the weight proportion of client k from the number of samples n_k in the client's training data set D_k^train: γ_k = n_k / Σ_{k∈K} n_k;
the federated averaging algorithm FedAvg is adopted to weight and aggregate the gradients of all K clients participating in the federated learning training, obtaining the model parameters of round t+1:

W_g^{t+1} = W_g^t − Σ_{k∈K} γ_k · g_k.
5. The personalized federated learning method based on adaptive clustering hierarchy as claimed in claim 1, wherein in step S4, the process in which the parameter server calculates the similarity between clients according to the gradients uploaded by all clients in the latest round, clusters all clients into groups according to the calculation result, and selects a group server for each client group comprises the following steps:
S41, according to the gradients {g_k}_{k∈K} uploaded by all clients in round T_pre, the parameter server obtains a similarity matrix ρ by calculating the pairwise cosine similarity S_C between client gradients, where for clients i, j ∈ K: ρ_{i,j} = S_C(i,j), S_C(i,j) = (g_i · g_j) / (‖g_i‖ · ‖g_j‖);
S42, based on the similarity matrix ρ, the K clients are clustered into M client groups using a top-down hierarchical clustering algorithm, denoted {C_m}_{m∈M};
S43, a group server is selected for each client group to coordinate the training of the clients in the group;
S44, the group server of each client group copies the T_pre-th round parameters W_g^{T_pre} of the global parameter server model W_g as the parameters W_m^{T_pre} of the client group server model W_m, where m ∈ {1, 2, …, M}.
6. The personalized federated learning method based on adaptive clustering hierarchy as claimed in claim 5, wherein the process of generating the layer-wise personalized weight vector for each client group in step S4 comprises the following steps:
the intra-group average gradient D̄_m is calculated from the gradients of the clients within the group:

D̄_m = (1/|C_m|) · Σ_{k∈C_m} g_k;

the average gradient D̄_m is split by model parameter layer and expressed as (D̄_m^1, D̄_m^2, …, D̄_m^l), where l is the total number of model parameter layers;
the Euclidean distance (ℓ2 norm) of the average gradient D̄_m is calculated layer by layer to obtain a 1×l-dimensional vector δ_m:

δ_m = (‖D̄_m^1‖_2, ‖D̄_m^2‖_2, …, ‖D̄_m^l‖_2);

a hyper-parameter β for adjusting the degree of personalization is defined; δ_m is normalized and multiplied by β to obtain the layer-wise personalized model weight ψ_m of the group:

ψ_m = β · δ_m / max(δ_m).
7. The personalized federated learning method based on adaptive clustering hierarchy as claimed in claim 1, wherein in step S5, the process in which the parameter server sends the latest global model parameters to all group servers, the group servers iteratively perform personalized federated learning training within their groups, and the obtained latest intra-group model parameters are uploaded to the parameter server comprises the following steps:
S51, the global parameter server sends the t-th round parameters W_g^t of the model W_g to the M client group servers; t ∈ (T_pre, T_total), where T_pre and T_total are respectively the maximum communication rounds of the first stage and the second stage;
S52, the following steps are executed in parallel on each client group server:
S521, the group server receives the t-th round parameters W_g^t of the global model W_g sent by the parameter server;
S522, the group server fuses the t-th round parameters W_g^t of the global model W_g and the parameters W_m^t of the group model W_m layer by layer, weighted by the layer-wise personalized model weights ψ_m, and updates the parameters of the group model W_m to W̃_m^t; the weighted fusion first decomposes W_g^t and W_m^t by layer into (W_g^{t,1}, …, W_g^{t,l}) and (W_m^{t,1}, …, W_m^{t,l}), and then fuses the parameters of each layer according to the following formula:

W̃_m^{t,n} = ψ_m^n · W_m^{t,n} + (1 − ψ_m^n) · W_g^{t,n}

where W_m^{t,n} represents the parameters of the n-th layer of W_m^t, W_g^{t,n} represents the parameters of the n-th layer of W_g^t, and n ∈ {1, 2, …, l};
S523, the group server sends the parameters W̃_m^t of the group model W_m to the clients in the group;
S53, on each client that receives the t-th round parameters W̃_m^t of the group model W_m, the following training steps are performed in parallel:
S531, the client takes the t-th round parameters W̃_m^t of the group model W_m as its initial model parameters, denoted W_k^t, the t-th round initial model parameters of the k-th client's local model;
S532, based on the initial model parameters W_k^t and a training data set D_k^train consisting of N samples randomly extracted from the original data held by the client, the client trains and optimizes the local model, performing E rounds of local iteration with stochastic gradient descent (SGD) to obtain the optimized model parameters Ŵ_k^t;
S534, the client uses the optimized model parameters Ŵ_k^t to perform predictive inference on the test data set D_k^test, evaluates the prediction effect, and calculates the gradient g_k;
S535, the client sends the gradient g_k to the corresponding group server;
S54, each group server calculates the weight proportion of client k within the group, γ_k^m = n_k / Σ_{k∈C_m} n_k, from the number of samples n_k in the training data sets D_k^train of the clients in the group, and uses the federated averaging algorithm FedAvg to weight and aggregate the gradients of the clients in the group, updating the parameters of the group server model W_m to Ŵ_m^t according to the following formula:

Ŵ_m^t = W̃_m^t − Σ_{k∈C_m} γ_k^m · g_k;

S55, it is judged whether the number of iterations has reached the maximum communication round of the second stage; if so, S56 is executed, otherwise S523 to S54 continue to be executed;
S56, the updated parameters of the group server model W_m are denoted Ŵ_m, and the group server sends the parameters Ŵ_m of the group server model W_m to the global parameter server.
8. The personalized federated learning method based on adaptive clustering hierarchy as claimed in claim 7, wherein in step S5, the process in which the parameter server performs weighted average aggregation on the received latest intra-group model parameters sent by all client groups to obtain a new global model comprises the following steps:
the global parameter server uses the federated averaging algorithm FedAvg to weight and aggregate the model parameters Ŵ_m sent by all group servers, obtaining the t+1-th round parameters W_g^{t+1} of the model W_g, with the specific calculation formula:

W_g^{t+1} = Σ_{m∈M} γ_m · Ŵ_m

where γ_m is the aggregation weight of the m-th group (in FedAvg, proportional to the number of training samples held by the clients of the group).
9. an adaptive clustering hierarchy based personalized federal learning system, comprising a memory, a processor and a computer program stored on the memory and operable on the processor, wherein the processor, when executing the program, implements the steps of the adaptive clustering hierarchy based personalized federal learning method as claimed in any one of claims 1 to 8.
CN202211129262.9A 2022-09-16 2022-09-16 Personalized federal learning method and system based on self-adaptive clustering layering Pending CN115840900A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211129262.9A CN115840900A (en) 2022-09-16 2022-09-16 Personalized federal learning method and system based on self-adaptive clustering layering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211129262.9A CN115840900A (en) 2022-09-16 2022-09-16 Personalized federal learning method and system based on self-adaptive clustering layering

Publications (1)

Publication Number Publication Date
CN115840900A true CN115840900A (en) 2023-03-24

Family

ID=85574918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211129262.9A Pending CN115840900A (en) 2022-09-16 2022-09-16 Personalized federal learning method and system based on self-adaptive clustering layering

Country Status (1)

Country Link
CN (1) CN115840900A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116049862A (en) * 2023-03-13 2023-05-02 杭州海康威视数字技术股份有限公司 Data protection method, device and system based on asynchronous packet federation learning
CN116596065A (en) * 2023-07-12 2023-08-15 支付宝(杭州)信息技术有限公司 Gradient calculation method and device, storage medium, product and electronic equipment
CN116596065B (en) * 2023-07-12 2023-11-28 支付宝(杭州)信息技术有限公司 Gradient calculation method and device, storage medium, product and electronic equipment
CN117216596A (en) * 2023-08-16 2023-12-12 中国人民解放军总医院 Federal learning optimization communication method, system and storage medium based on gradient clustering
CN117216596B (en) * 2023-08-16 2024-04-30 中国人民解放军总医院 Federal learning optimization communication method, system and storage medium based on gradient clustering
CN117010484A (en) * 2023-10-07 2023-11-07 之江实验室 Personalized federal learning generalization method, device and application based on attention mechanism
CN117010484B (en) * 2023-10-07 2024-01-26 之江实验室 Personalized federal learning generalization method, device and application based on attention mechanism
CN117057442A (en) * 2023-10-09 2023-11-14 之江实验室 Model training method, device and equipment based on federal multitask learning
CN117350373A (en) * 2023-11-30 2024-01-05 艾迪恩(山东)科技有限公司 Personalized federal aggregation algorithm based on local self-attention mechanism
CN117350373B (en) * 2023-11-30 2024-03-01 艾迪恩(山东)科技有限公司 Personalized federal aggregation algorithm based on local self-attention mechanism

Similar Documents

Publication Publication Date Title
CN115840900A (en) Personalized federal learning method and system based on self-adaptive clustering layering
CN113191484A (en) Federal learning client intelligent selection method and system based on deep reinforcement learning
CN113326731A (en) Cross-domain pedestrian re-identification algorithm based on momentum network guidance
CN110263236B (en) Social network user multi-label classification method based on dynamic multi-view learning model
CN107194672B (en) Review distribution method integrating academic expertise and social network
CN112380433A (en) Recommendation meta-learning method for cold-start user
CN111814963B (en) Image recognition method based on deep neural network model parameter modulation
CN111353534B (en) Graph data category prediction method based on adaptive fractional order gradient
CN116503676A (en) Picture classification method and system based on knowledge distillation small sample increment learning
CN117391247A (en) Enterprise risk level prediction method and system based on deep learning
CN116226689A (en) Power distribution network typical operation scene generation method based on Gaussian mixture model
CN115359298A (en) Sparse neural network-based federal meta-learning image classification method
CN111475158A (en) Sub-domain dividing method and device, electronic equipment and computer readable storage medium
CN111192158A (en) Transformer substation daily load curve similarity matching method based on deep learning
Yang Combination forecast of economic chaos based on improved genetic algorithm
CN117078312B (en) Advertisement putting management method and system based on artificial intelligence
CN111353525A (en) Modeling and missing value filling method for unbalanced incomplete data set
CN116415177A (en) Classifier parameter identification method based on extreme learning machine
CN115690476A (en) Automatic data clustering method based on improved harmony search algorithm
CN108268898A (en) A kind of electronic invoice user clustering method based on K-Means
CN111027709B (en) Information recommendation method and device, server and storage medium
CN113377884A (en) Event corpus purification method based on multi-agent reinforcement learning
CN111814190A (en) Privacy protection method based on differential privacy distributed deep learning optimization
Khotimah et al. Adaptive SOMMI (Self Organizing Map Multiple Imputation) base on Variation Weight for Incomplete Data
CN117557870B (en) Classification model training method and system based on federal learning client selection

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination