CN115840900A - Personalized federated learning method and system based on adaptive clustering and layering - Google Patents
- Publication number
- CN115840900A CN115840900A CN202211129262.9A CN202211129262A CN115840900A CN 115840900 A CN115840900 A CN 115840900A CN 202211129262 A CN202211129262 A CN 202211129262A CN 115840900 A CN115840900 A CN 115840900A
- Authority
- CN
- China
- Prior art keywords
- model
- group
- client
- parameter
- server
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a personalized federated learning method and system based on adaptive clustering and layering. The method comprises the following steps: the parameter server performs weighted averaging on the gradients of all clients and adjusts the global model parameters with the computed average gradient; it then computes the similarity between clients from the gradients uploaded in the latest round, clusters all clients into groups accordingly, and generates layer-wise personalized weight vectors for each group; the parameter server sends the latest global model parameters to all group servers, the group servers iteratively perform personalized federated learning training within their groups, and the latest intra-group model parameters are uploaded to the parameter server; finally, the parameter server aggregates the received intra-group model parameters from all client groups by weighted averaging to obtain a new global model. The invention greatly improves the personalized performance of each client's local model without harming global generalization ability.
Description
Technical Field
The invention belongs to the technical field of distributed machine learning, and in particular relates to a personalized federated learning method and system based on adaptive clustering and layering.
Background
With the advent of the cloud era and the popularization of edge devices (such as smartphones and smart wearables), data are generated constantly, even explosively. These rich data provide great opportunities for machine learning applications such as speech recognition and computer vision, where deep neural networks can efficiently extract the desired information given large amounts of training data. However, as data privacy draws more and more attention across society, the large amounts of data generated on edge devices or in organizations (such as hospitals, companies, and courts) cannot be collected on a central server, which poses great challenges to deep learning.
Federated learning is a deep learning framework in which clients cooperatively train a shared global model over their data under the coordination of a central server, while keeping the data private and reducing systemic privacy risk and communication cost. Most existing training methods are variants of federated averaging; traditional federated learning focuses on obtaining a high-quality, general global model by learning from the local data of participating clients. However, in the presence of statistical heterogeneity of data (e.g., non-independent, non-identically distributed and unbalanced data), federated learning has difficulty training a single model applicable to all clients. Optimizing only the global model may result in poor performance of the local models, because the global model does not fit every client. The problem is further exacerbated as the differences between the local data of different clients grow.
To alleviate the degradation of federated learning performance caused by statistical data heterogeneity, personalized federated learning has become a solution. Personalized federated learning aims to train a unique personalized model for each client, combining the generalization of the global model with the distribution-matching ability of the local models; the challenge is how to strike a fine balance between the specific knowledge of the local models and the shared knowledge of the global model. In recent years, much work in personalized federated learning has focused on two possible solutions: cluster-based personalization and layer-based personalization. Cluster-based personalization methods group clients with similar data distributions into clusters and train a specialized model for each cluster. Layer-based personalization methods personalize some layers of the local model, while the other layers are derived from the global model.
Although both approaches can improve federated learning performance through personalization, significant problems remain. Current cluster-based personalization methods pay little attention to model sharing among groups, so they may harm the generalization performance of the global model. Meanwhile, existing layer-based personalization methods usually adopt a manually predefined layering scheme and lack flexibility and adaptability. They may therefore end up with sub-optimal solutions, resulting in an imbalance between the performance of the global model and that of the local models.
The invention with application number 202210511356.6 provides a federated learning method and system comprising the following steps. S1, an initial global model is sent to all clients, and the clients upload their initial local models to a central service system. S2, the clients are clustered according to the uploaded initial local models to obtain one or more client classes. S3, the global model undergoes multiple rounds of iterative training until a stopping condition is reached; in the t-th round, at least one client is selected from each client class to participate in training; based on the t-th-round local models and loss values returned by the participating clients, it is judged whether gradient conflicts exist between them, an accumulated model difference is obtained according to the conflict situation, and the t-th-round global model is updated with the accumulated model difference. That method attributes model unfairness to two causes, external and internal contradictions, and eliminates both, improving the representativeness and fairness of the selected clients, reducing training rounds and communication cost, and accelerating convergence. However, it cannot solve the problems of unbalanced performance between the global and local models and of poor generalization performance of the global model.
Disclosure of Invention
The technical problem to be solved is as follows: the invention provides a personalized federated learning method and system based on adaptive clustering and layering, which solve the problem of statistical data heterogeneity in federated learning and the problem of unbalanced performance between the global model and local models in personalized federated learning. By integrating a client clustering method with an adaptive layer-wise fusion algorithm, and by performing cluster grouping and adaptive layer-wise fusion using the performance feedback of the clients, the invention flexibly formulates a personalization strategy for a specific federated learning task, greatly improving the personalized performance of each client's local model without harming global generalization ability.
The technical scheme is as follows:
A personalized federated learning method based on adaptive clustering and layering comprises the following steps:
s1, a client prepares a training data set and a testing data set of a prediction task, and a global parameter server randomly initializes global model parameters;
s2, the global parameter server issues global model parameters to the client, the client uses the received global model parameters as initial parameters of a local model, a local training data set is adopted to train the model in the current round, a test data set is adopted to evaluate the model prediction effect after the training is finished, the gradient is calculated, and the calculated gradient is uploaded to the parameter server; the parameter server performs weighted average processing on the received gradients of all the clients, and adjusts global model parameters by adopting the average gradients obtained by calculation;
s3, repeatedly executing the step S2 until the training round reaches the maximum communication round of the first stage, and turning to the step S4;
s4, the parameter server calculates the similarity among the clients according to the gradients uploaded by all the clients in the latest round, performs clustering grouping on all the clients according to the calculation result, selects a group server for each client group, and generates hierarchical personalized weight vectors for each client group;
S5, the parameter server sends the latest global model parameters to all group servers, the group servers iteratively execute personalized federated learning training within the groups, and the obtained latest intra-group model parameters are uploaded to the parameter server; the parameter server performs weighted average aggregation on the received latest intra-group model parameters sent by all client groups to obtain a new global model;
and S6, repeatedly executing the step S5 until the training round reaches the maximum round or the model is converged, and ending the process.
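The two-stage flow of steps S1–S6 can be sketched end to end with scalar models; everything below (the data, the four clients, and the sign-based grouping rule standing in for the cosine-similarity clustering of step S4) is illustrative, not the patent's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: 4 clients each fit y = w * x. Clients 0-1 share w = +2,
# clients 2-3 share w = -2, so the first stage should expose two clusters.
true_w = [2.0, 2.0, -2.0, -2.0]
data = []
for w in true_w:
    x = rng.normal(size=20)
    data.append((x, w * x))
gamma = np.full(len(data), 1.0 / len(data))  # equal sample counts -> equal weights

def local_grad(w, x, y):
    # d/dw of 0.5 * mean((w*x - y)^2)
    return np.mean((w * x - y) * x)

# Stage one (S2-S3): plain FedAvg on the weighted average gradient.
w_g, lr = 0.0, 0.1
for _ in range(20):
    grads = [local_grad(w_g, x, y) for x, y in data]
    w_g -= lr * float(np.dot(gamma, grads))

# S4 stand-in: group clients by the sign of their last-round gradient.
groups = {}
for k, g in enumerate(grads):
    groups.setdefault(bool(g > 0), []).append(k)

# Stage two (S5): each group refines its own model starting from the global one.
group_models = {}
for key, members in groups.items():
    w_m = w_g
    for _ in range(100):
        w_m -= lr * np.mean([local_grad(w_m, *data[k]) for k in members])
    group_models[key] = w_m
```

On this toy problem the single global model is pulled toward 0 by the two opposing clusters, while each group model converges to its cluster's true coefficient, which is the imbalance the two-stage scheme is designed to resolve.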
Further, in step S2, the process of the global parameter server issuing the global model parameters to the client, making the client use the received global model parameters as initial parameters of the local model, performing the current round of training on the model by using the local training data set, evaluating the model prediction effect by using the test data set after the training is completed, calculating the gradient, and uploading the calculated gradient to the parameter server includes the following steps:
S21, the global parameter server issues the round-t parameters θ_g^t of the model W_g to the K clients participating in federated learning training; t ∈ [1, T_pre], where T_pre is the maximum communication round of the first stage, and the round-1 parameters θ_g^1 are obtained by random initialization at the global parameter server;
S22, on each client that receives the round-t parameters θ_g^t of the model W_g, the following training steps are performed in parallel:
S221, the client takes the round-t parameters θ_g^t as its initial model parameters, denoted θ_k^t, the round-t initial parameters of the k-th client's local model;
S222, based on the initial parameters θ_k^t and a training data set D_k^train of N samples randomly drawn from the raw data held by the client, the client trains and optimizes the local model, performing E rounds of local iteration with stochastic gradient descent to obtain the optimized parameters θ̂_k^t;
S223, the client uses the optimized parameters θ̂_k^t to perform predictive inference on the test data set D_k^test, evaluates the prediction effect, and computes the gradient g_k;
S224, the client sends the gradient g_k to the global parameter server.
Wherein D_k^train denotes the sampled training data set and |D_k^train| its number of samples, L(f(x;θ), y) is the loss value, x and y denote respectively the features and the corresponding label of a single sample in the data set, f(x;θ) denotes the model output whose loss is measured against the true value y, η denotes the learning rate, and ∇_θ L denotes the gradient of the loss with respect to the model parameters θ.
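As a concrete illustration of steps S221–S224, the sketch below runs E rounds of local gradient descent on a toy least-squares model and uses the parameter difference as the uploaded client gradient g_k; the loss, the model, and the pseudo-gradient definition are assumptions, since the patent gives its exact formulas only as images:

```python
import numpy as np

def local_train(theta, X, y, epochs=5, lr=0.1):
    """E rounds of local gradient descent on the least-squares model f(x) = X @ theta.

    Returns the optimized parameters and the client "gradient" g_k, here taken
    as the parameter difference theta - theta_hat (an assumed pseudo-gradient).
    """
    theta_hat = theta.copy()
    for _ in range(epochs):
        residual = X @ theta_hat - y
        grad = X.T @ residual / len(y)      # gradient of 0.5 * mean squared error
        theta_hat -= lr * grad
    g_k = theta - theta_hat                 # uploaded to the parameter server
    return theta_hat, g_k
```

With this pseudo-gradient convention, the server-side step θ − Σ_k γ_k·g_k reduces to a weighted average of the clients' optimized parameters, which is the usual FedAvg behaviour.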
Further, in step S2, the parameter server performs weighted average processing on the received gradients of all the clients, and the process of adjusting the global model parameter by using the calculated average gradient includes the following steps:
The global parameter server computes, from the number of samples n_k in each client's training data set D_k^train, the weight proportion of client k: γ_k = n_k / ∑_{k∈K} n_k;
the federated averaging algorithm FedAvg is then used to aggregate the gradients of all K clients participating in federated learning training by weight, obtaining the round-(t+1) model parameters θ_g^{t+1} = θ_g^t − ∑_{k∈K} γ_k·g_k.
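The weighting and aggregation just described (weights γ_k = n_k / ∑ n_k, then one FedAvg step on the averaged gradient) can be sketched as follows; the function and variable names are illustrative:

```python
import numpy as np

def fedavg_gradients(theta_g, gradients, sample_counts):
    """Weighted-average the client gradients and apply them to the global model."""
    n = np.asarray(sample_counts, dtype=float)
    gamma = n / n.sum()                               # gamma_k = n_k / sum_k n_k
    avg_g = sum(g * w for g, w in zip(gradients, gamma))
    return theta_g - avg_g, gamma
```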
Further, in step S4, the parameter server calculates the similarity between the clients according to the gradients uploaded by all the clients in the latest round, performs cluster grouping on all the clients according to the calculation result, and selects a group server for each client group, including the following steps:
S41, the parameter server, according to the gradients {g_k}_{k∈K} uploaded by all clients in round T_pre, computes the pairwise cosine similarity S_C between client gradients to obtain a similarity matrix ρ, where ρ_{i,j} = S_C(i,j) = (g_i·g_j)/(||g_i||·||g_j||);
S42, based on the similarity matrix ρ, the K clients are clustered into M client groups using a top-down hierarchical clustering algorithm, denoted {C_m}_{m=1}^M;
S43, a group server is selected for each client group to coordinate the training of the clients in the group;
S44, the group server of each client group copies the round-T_pre parameters θ_g^{T_pre} of the global parameter server model W_g as the initial parameters θ_m of the client group server model W_m, where m ∈ {1, 2, …, M}.
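Steps S41–S42 can be sketched as below: the cosine-similarity matrix ρ follows the formula above, while the grouping is a simplified stand-in (connected components of a thresholded similarity graph) for the top-down hierarchical clustering the patent specifies; the threshold τ is an assumption:

```python
import numpy as np

def cosine_similarity_matrix(grads):
    """rho[i, j] = (g_i . g_j) / (||g_i|| * ||g_j||), as in step S41."""
    G = np.stack([g.ravel() for g in grads])
    norms = np.linalg.norm(G, axis=1, keepdims=True)
    return (G @ G.T) / (norms @ norms.T)

def threshold_groups(rho, tau=0.5):
    """Simplified stand-in for the patent's top-down hierarchical clustering:
    clients whose pairwise similarity exceeds tau land in the same group
    (connected components of the thresholded similarity graph)."""
    K = rho.shape[0]
    labels = [-1] * K
    group = 0
    for i in range(K):
        if labels[i] != -1:
            continue
        stack = [i]
        labels[i] = group
        while stack:
            u = stack.pop()
            for v in range(K):
                if labels[v] == -1 and rho[u, v] > tau:
                    labels[v] = group
                    stack.append(v)
        group += 1
    return labels
```

Clients whose data distributions agree produce nearly parallel gradients (similarity close to 1), while conflicting distributions produce anti-parallel gradients (similarity close to −1), which is why the gradient direction is a usable proxy for distribution similarity.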
Further, the process of generating the layer-wise personalized weight vector for each client group in step S4 comprises the following steps:
the intra-group average gradient ḡ_m is expanded by model parameter layer, expressed as ḡ_m = [ḡ_m^(1), ḡ_m^(2), …, ḡ_m^(l)], where l is the total number of model parameter layers;
the Euclidean norm of the average gradient ḡ_m is computed layer by layer, yielding a 1 × l vector δ_m with δ_m^(n) = ||ḡ_m^(n)||_2;
a hyper-parameter β is defined to adjust the degree of personalization; δ_m is normalized and multiplied by β to obtain the layer-wise personalized model weights of the group:
ψ_m = β·δ_m / max(δ_m).
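A minimal sketch of this weight-vector computation: δ_m holds the layer-wise Euclidean norm of the group's average gradient, and ψ_m = β·δ_m / max(δ_m); representing each layer as a flat array is an illustrative simplification:

```python
import numpy as np

def personalization_weights(avg_grad_layers, beta=0.8):
    """delta_m: layer-wise Euclidean norm of the group's average gradient;
    psi_m = beta * delta_m / max(delta_m), so the layer whose gradient moved
    most gets the largest personalization weight (at most beta)."""
    delta = np.array([np.linalg.norm(layer) for layer in avg_grad_layers])
    return beta * delta / delta.max()
```

The intuition is that layers with large remaining gradient norm are the ones the shared global model fits worst for this group, so they are the ones worth personalizing.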
Further, in step S5, the parameter server sends the latest global model parameters to all group servers, the group servers iteratively execute intra-group personalized federated learning training, and the process of uploading the obtained latest intra-group model parameters to the parameter server includes the following steps:
S51, the global parameter server sends the round-t parameters θ_g^t of the model W_g to the M client group servers; t ∈ (T_pre, T_total], where T_pre and T_total are the maximum communication rounds of the first and second stages, respectively;
S52, the following steps are executed in parallel on each client group server:
S521, the group server receives the round-t parameters θ_g^t of the global model W_g sent by the parameter server;
S522, the group server fuses the round-t global model parameters θ_g^t and the group model parameters θ_m^t layer by layer, weighted by the group's layer-wise personalized model weights ψ_m, and updates the group model parameters to θ̂_m^t; the weighted fusion first decomposes θ_g^t and θ_m^t by layer into [θ_g^{t,(1)}, …, θ_g^{t,(l)}] and [θ_m^{t,(1)}, …, θ_m^{t,(l)}], and then fuses the parameters of each layer according to
θ̂_m^{t,(n)} = ψ_m^(n)·θ_m^{t,(n)} + (1 − ψ_m^(n))·θ_g^{t,(n)},
where θ_g^{t,(n)} and θ_m^{t,(n)} denote the n-th-layer parameters of θ_g^t and θ_m^t respectively, and n ∈ {1, 2, …, l};
S523, the group server sends the group model parameters θ̂_m^t to the clients in the group;
S53, on each client that receives the round-t group model parameters θ̂_m^t, the following training steps are performed in parallel:
S531, the client takes the round-t group model parameters θ̂_m^t as its initial model parameters, denoted θ_k^t, the round-t initial parameters of the k-th client's local model;
S532, based on the initial parameters θ_k^t and a training data set D_k^train of N samples randomly drawn from the raw data held by the client, the client trains and optimizes the local model, performing E rounds of local iteration with stochastic gradient descent (SGD) to obtain the optimized parameters θ̂_k^t;
S534, the client uses the optimized parameters θ̂_k^t to perform predictive inference on the test data set D_k^test, evaluates the prediction effect, and computes the gradient g_k;
S535, the client sends the gradient g_k to the corresponding group server;
S54, each group server computes, from the number of samples n_k in the training data sets of the clients in its group C_m, the weight proportion of client k within the group, γ_k^m = n_k / ∑_{k∈C_m} n_k, and updates the group server model parameters by weighted aggregation of the intra-group client gradients with the federated averaging algorithm FedAvg: θ_m^{t+1} = θ̂_m^t − ∑_{k∈C_m} γ_k^m·g_k;
S55, whether the number of iterations has reached the maximum communication round of the second stage is judged; if so, S56 is executed, otherwise S523 to S54 continue to be executed;
S56, the updated parameters of the group server model W_m are denoted θ̂_m, and the group server sends the parameters θ̂_m of the model W_m to the global parameter server.
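The layer-wise fusion of step S522 can be sketched as below, assuming a convex-combination form in which a layer with a large personalization weight ψ keeps more of the group model and takes less from the global model; since the patent's exact fusion formula survives only as an image, this form is an assumption, and each "layer" is reduced to a single scalar for brevity:

```python
def fuse_layers(theta_g_layers, theta_m_layers, psi):
    """Per-layer weighted fusion of global and group parameters (assumed form):
    fused[n] = psi[n] * theta_m[n] + (1 - psi[n]) * theta_g[n]."""
    return [p * tm + (1.0 - p) * tg
            for p, tg, tm in zip(psi, theta_g_layers, theta_m_layers)]
```

With ψ = 1 a layer is fully personalized (pure group model), with ψ = 0 it is fully shared (pure global model), and intermediate weights blend the two, which is what lets the scheme trade local fit against global generalization per layer.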
Further, in step S5, the process of the parameter server obtaining a new global model by weighted average aggregation of the received latest intra-group model parameters sent by all client groups includes the following steps:
The global parameter server uses the federated averaging algorithm FedAvg to aggregate, by weight, the model parameters θ̂_m sent by all group servers, obtaining the round-(t+1) parameters of the model W_g: θ_g^{t+1} = ∑_{m=1}^{M} γ_m·θ̂_m, where γ_m is the weight of group m, proportional to the total number of training samples held by its clients.
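This final aggregation can be sketched as a weighted average of the group servers' parameters; the weights below are assumed proportional to each group's total sample count, by analogy with the client weights γ_k:

```python
import numpy as np

def aggregate_group_models(group_params, group_sample_counts):
    """FedAvg over the parameters uploaded by the group servers."""
    n = np.asarray(group_sample_counts, dtype=float)
    weights = n / n.sum()                 # gamma_m, proportional to group size
    return sum(p * w for p, w in zip(group_params, weights))
```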
the invention also discloses an individualized federated learning system based on the adaptive clustering hierarchy, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the steps of the individualized federated learning method based on the adaptive clustering hierarchy when executing the program.
Beneficial effects:
First, because the personalized federated learning method based on adaptive clustering and layering of the invention integrates a client clustering method, it can group clients by the similarity of their data distributions without accessing their real data, and a client can join the corresponding online learning system to perform model training and inference.
Second, because the method integrates an adaptive layer-wise fusion scheme, each client can obtain the personalized model best suited to it, while the method maintains a global model with good generalization performance, making it easy for new clients to join or use the corresponding personalized federated learning system.
Third, the method solves the problem of statistical data heterogeneity in federated learning and the problem of unbalanced performance between the global model and local models in personalized federated learning, greatly improving the personalized performance of each client's local model without harming global generalization ability.
Drawings
Fig. 1 is a schematic view of an overall process of an individualized federated learning method based on adaptive clustering hierarchy in an embodiment of the present invention.
Fig. 2 is a schematic diagram of an intra-group training process of an individualized federated learning method based on adaptive clustering hierarchy in the embodiment of the present invention.
Fig. 3 is a schematic diagram of a first-stage principle of an individualized federated learning method based on adaptive clustering hierarchy according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a second-stage principle of an individualized federated learning method based on adaptive clustering hierarchy according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of an individualized federated learning system based on adaptive clustering hierarchy in an embodiment of the present invention.
Detailed Description
The following examples are presented to enable one of ordinary skill in the art to more fully understand the present invention and are not intended to limit the invention in any way.
The embodiment discloses an individualized federated learning method based on self-adaptive clustering hierarchy, which comprises the following steps:
s1, a client prepares a training data set and a testing data set of a prediction task, and a global parameter server randomly initializes global model parameters;
s2, the global parameter server issues global model parameters to the client, the client uses the received global model parameters as initial parameters of a local model, a local training data set is adopted to train the model in the current round, a test data set is adopted to evaluate the model prediction effect after the training is finished, the gradient is calculated, and the calculated gradient is uploaded to the parameter server; the parameter server performs weighted average processing on the received gradients of all the clients, and adjusts global model parameters by adopting the average gradients obtained by calculation;
s3, repeatedly executing the step S2 until the training round reaches the maximum communication round of the first stage, and turning to the step S4;
s4, the parameter server calculates the similarity among the clients according to the gradients uploaded by all the clients in the latest round, performs clustering grouping on all the clients according to the calculation result, selects a group server for each client group, and generates hierarchical personalized weight vectors for each client group;
S5, the parameter server sends the latest global model parameters to all group servers, the group servers iteratively execute personalized federated learning training within the groups, and the obtained latest intra-group model parameters are uploaded to the parameter server; the parameter server performs weighted average aggregation on the received latest intra-group model parameters sent by all client groups to obtain a new global model;
and S6, repeatedly executing the step S5 until the training round reaches the maximum round or the model is converged, and ending the process.
In another aspect, an embodiment of the present invention provides an adaptive clustering hierarchy-based personalized federal learning system, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the foregoing adaptive clustering hierarchy-based personalized federal learning method when executing the program.
The embodiment provides a personalized federated learning method and system based on adaptive clustering and layering. By integrating a client clustering method and an adaptive layer-wise fusion algorithm, performing cluster grouping and adaptive layer-wise fusion using the performance feedback of the clients, and flexibly formulating a personalization strategy for the specific federated learning task, it solves the problem of statistical data heterogeneity in federated learning and the problem of unbalanced performance between the global model and local models, greatly improving the personalized performance of each client's local model without harming global generalization ability.
As shown in fig. 1, the embodiment provides a personalized federated learning method based on adaptive clustering and layering, which comprises two stages with the following steps:
the first stage, the schematic diagram is shown in fig. 3:
s100: all clients prepare training and testing data sets for the prediction task.
S200: global parameter Server models W g Parameter of the t-th roundIssuing the data to K clients participating in federal learning training; t E [1,T pre ]Wherein, T pre To specify a communication round. />
S210: on each client that receives the round-t parameters θ_g^t of the model W_g, the following training steps are performed in parallel:
the client takes the round-t parameters θ_g^t as its initial model parameters, denoted θ_k^t, the round-t initial parameters of the k-th client's local model;
based on the initial parameters θ_k^t and a training data set D_k^train of N samples randomly drawn from the raw data held by the client, the client trains and optimizes the local model, performing E rounds of local iteration with stochastic gradient descent (SGD) to obtain the optimized parameters θ̂_k^t.
S220: the client uses the optimized parameters θ̂_k^t to perform predictive inference on the test data set D_k^test, evaluates the prediction effect, and computes the gradient g_k.
S230: the client sends the gradient g_k to the global parameter server.
S240: the global parameter server computes, from the number of samples n_k in each client's training data set D_k^train, the weight proportion of client k, γ_k = n_k / ∑_{k∈K} n_k; then the federated averaging algorithm FedAvg is used to aggregate the gradients of all K clients participating in federated learning training by weight, obtaining the round-(t+1) model parameters; the specific calculation formula is:
θ_g^{t+1} = θ_g^t − ∑_{k∈K} γ_k·g_k   (1).
S250: whether the number of iterations has reached the specified communication round is judged; if so, S300 is executed, otherwise S200 to S250 continue to be executed.
S300: parameter server according to Tth pre Gradient g uploaded by all clients in turn k } k∈K Calculating the cosine similarity S between every two client gradients C Obtaining a similarity matrix rho; whereinρ i,j =S C (i,j),
S C (i,j)=(g i ·g j )/(||g i ||·|g j ||) (2);
Based on the similarity matrix rho, clustering K clients into M client groups by using a top-down hierarchical clustering algorithm, and recording the client groups as M client groupsSelecting a group server for each client group to coordinate the training of clients in the group; each client group server replicates a copy of the global parameter server model W g T th (a) pre Wheel parameter->As a client group server model W m Is greater than or equal to>Where M is {1,2, …, M }.
Further, in each client group, the group server performs the following calculations:
the intra-group average gradient ḡ_m is computed from the gradients of the clients in the group:
ḡ_m = (1/|C_m|)·∑_{k∈C_m} g_k   (3);
the average gradient ḡ_m is expanded by model parameter layer, expressed as ḡ_m = [ḡ_m^(1), …, ḡ_m^(l)], where l is the total number of model parameter layers;
the Euclidean norm of the average gradient ḡ_m is computed layer by layer, yielding a 1 × l vector δ_m, with the calculation formula
δ_m^(n) = ||ḡ_m^(n)||_2, n ∈ {1, …, l}   (4);
a hyper-parameter β is defined to adjust the degree of personalization; δ_m is normalized and multiplied by β to obtain the layer-wise personalized model weights of the group, with the specific calculation formula
ψ_m = β·δ_m / max(δ_m)   (5).
in the second stage, the schematic diagram is shown in fig. 4:
s400: global parameter Server models W g Parameter of the tth roundSending to M customer group servers; te (T) pre ,T total ) Wherein, T pre ,T total To designate a communication turn.
S410: as shown in fig. 2, the following steps are performed in parallel on each client group server:
s411: the group server receives the global model W sent by the parameter server g Parameter of the tth round
S412: the group server sends the t round global model W g Parameter (d) ofAnd group model W m Is greater than or equal to>Personalized model weight psi using intra-group hierarchies m Weighted fusion layer by layer, and combining the model W m Is updated to->The weighted fusion process first withholds>And &>Decomposition into->And &>And then fusing the parameters of each layer, wherein the specific formula is as follows:
whereinRepresents->Parameter of the nth layer->Represents->The parameter of the nth layer in the drawing, n is epsilon {1,2, …, l }.
S413: on each client that receives the round-t group model parameters θ̂_m^t, the following training steps are performed in parallel:
S414: the client takes the round-t group model parameters θ̂_m^t as its initial model parameters, denoted θ_k^t, the round-t initial parameters of the k-th client's local model;
based on the initial parameters θ_k^t and a training data set D_k^train of N samples randomly drawn from the raw data held by the client, the client trains and optimizes the local model, performing E rounds of local iteration with stochastic gradient descent (SGD) to obtain the optimized parameters θ̂_k^t.
S415: the client uses the optimized parameters θ̂_k^t to perform predictive inference on the test data set D_k^test, evaluates the prediction effect, and computes the gradient g_k.
S416: the client sends the gradient g_k to the corresponding group server.
S417: each group server computes, from the number of samples n_k in the training data sets of the clients in its group C_m, the weight proportion of client k within the group, γ_k^m = n_k / ∑_{k∈C_m} n_k; then the gradients of the clients in the group are aggregated by weight using the federated averaging algorithm FedAvg to update the group server model parameters, with the specific calculation formula
θ_m^{t+1} = θ̂_m^t − ∑_{k∈C_m} γ_k^m·g_k   (7).
S418: whether the number of iterations has reached the specified communication round is judged; if so, S420 is executed, otherwise S413 to S418 continue to be executed.
S420: the updated parameters of the group server model W_m are denoted θ̂_m, and the group server sends the parameters θ̂_m of the model W_m to the global parameter server.
S430: the global parameter server uses the federated averaging algorithm FedAvg to aggregate, by weight, the model parameters θ̂_m sent by all group servers, obtaining the round-(t+1) parameters of the model W_g, with the specific calculation formula
θ_g^{t+1} = ∑_{m=1}^{M} γ_m·θ̂_m   (8).
s440: judging whether the model converges or whether the iteration times reach the specified communication turns, if a certain condition is met, finishing the training, and testing the data set by all the clients by using the corresponding group models of the clientsTesting is carried out; otherwise, execution continues with S400 to S440.
In S210 and S414, the client's local model is trained and optimized based on the initial model parameters θ_k^t and the training data set D_k^train to obtain the optimized model parameters θ̂_k^t; each local SGD iteration uses the specific calculation formula
θ ← θ − η·∇_θ (1/N)·∑_{(x,y)∈D_k^train} L(f(x;θ), y)   (9),
where N = |D_k^train| denotes the number of samples in the sampled training set, L(f(x;θ), y) is the loss value, x and y denote respectively the features and the corresponding label of a single sample in the data set, f(x;θ) denotes the model output whose loss is measured against the true value y, η denotes the learning rate, and ∇_θ denotes the gradient with respect to the model parameters.
One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:
Because a client clustering method is integrated, this embodiment can group clients by the similarity of their data distributions without accessing the clients' real data, and clients can join the system of this embodiment for model training and inference. Because an adaptive hierarchical fusion scheme is integrated, each client can obtain the personalized model best suited to it through this embodiment, while the embodiment also maintains a global model with good generalization performance so that new clients can join or use the personalized federated learning system. The method and device solve the problem of statistical data heterogeneity in federated learning and the problem of performance imbalance between the global model and local models in personalized federated learning, achieving the technical effect of greatly improving the personalized performance of the client's local model without harming global generalization capability.
The electronic device according to an embodiment of the present application is described below with reference to fig. 5. Based on the same inventive concept as the personalized federated learning method based on adaptive clustering hierarchy in the foregoing embodiments, an embodiment of the present application further provides a personalized federated learning system based on adaptive clustering hierarchy, including: a processor coupled to a memory, the memory storing a program that, when executed by the processor, causes the system to perform the method of any one of the first aspects.
The electronic device 300 includes: a processor 302, a communication interface 303, and a memory 301. Optionally, the electronic device 300 may also include a bus architecture 304. The communication interface 303, the processor 302, and the memory 301 may be interconnected through the bus architecture 304; the bus architecture 304 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus architecture 304 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in FIG. 5, but this does not mean there is only one bus or one type of bus.
The memory 301 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be self-contained and coupled to the processor through the bus architecture 304, or may be integral to the processor. The memory 301 is used for storing computer-executable instructions for implementing the present application, and execution is controlled by the processor 302. The processor 302 is configured to execute the computer-executable instructions stored in the memory 301, so as to implement the personalized federated learning method based on adaptive clustering hierarchy provided in the foregoing embodiments of the present application.
The above are only preferred embodiments of the present invention, and the scope of the present invention is not limited to the above examples; all technical solutions that fall within the spirit of the present invention belong to its scope of protection. It should be noted that those skilled in the art may make modifications and refinements without departing from the principle of the invention, and these also fall within the scope of protection of the invention.
Claims (9)
1. An individualized federated learning method based on adaptive clustering hierarchy is characterized in that the individualized federated learning method comprises the following steps:
s1, a client prepares a training data set and a testing data set of a prediction task, and a global parameter server randomly initializes global model parameters;
s2, the global parameter server issues global model parameters to the client, the client uses the received global model parameters as initial parameters of a local model, a local training data set is adopted to train the model in the current round, a test data set is adopted to evaluate the model prediction effect after the training is finished, the gradient is calculated, and the calculated gradient is uploaded to the parameter server; the parameter server performs weighted average processing on the received gradients of all the clients, and adjusts global model parameters by adopting the average gradients obtained by calculation;
s3, repeatedly executing the step S2 until the training round reaches the maximum communication round of the first stage, and turning to the step S4;
s4, the parameter server calculates the similarity among the clients according to the gradients uploaded by all the clients in the latest round, performs clustering grouping on all the clients according to the calculation result, selects a group server for each client group, and generates hierarchical personalized weight vectors for each client group;
s5, the parameter server sends the latest global model parameters to all group servers, the group servers iteratively execute individualized Federal learning training in the groups, and the obtained latest intra-group model parameters are uploaded to the parameter servers; the parameter server performs weighted average aggregation on the received parameters of the latest intra-group model sent by all the client groups to obtain a new global model;
and S6, repeatedly executing the step S5 until the training round reaches the maximum round or the model is converged, and ending the process.
2. The individualized federated learning method based on adaptive clustering hierarchy as claimed in claim 1, wherein in step S2, the global parameter server issues global model parameters to the client, and the client is made to use the received global model parameters as initial parameters of the local model, and uses the local training data set to perform the current round of training on the model, and uses the test data set to evaluate the model prediction effect after the training is completed, and calculates the gradient, and the process of uploading the calculated gradient to the parameter server includes the following steps:
S21, the global parameter server issues the parameter θ_g^t of model W_g for round t to the K clients participating in federated learning training; t ∈ [1, T_pre], where T_pre is the maximum communication round of the first stage, and the round-1 parameter θ_g^1 is obtained by the global parameter server randomly initializing the model;
S22, on each client that has received the parameter θ_g^t of model W_g for round t, the following training steps are performed in parallel:
S221, the client uses the parameter θ_g^t of model W_g for round t as its initial model parameters, recorded as θ_k^t, the round-t initial model parameters of the k-th client's local model;
s222, based on the initial model parametersAnd a training data set consisting of N samples randomly extracted from the original data held by the clientThe client trains and optimizes the local model, and E-round local iteration is performed by using a random gradient descent method to obtain an optimized modelForm parameter
S223, the client uses the optimized model parameters θ̂_k^t to perform predictive inference on the test data set D_k^test, evaluates the prediction effect, and computes the gradient g_k;
S224, the client sends the gradient g_k to the global parameter server.
3. The personalized federated learning method based on adaptive clustering hierarchy as claimed in claim 2, wherein in step S222 the optimized model parameters θ̂_k^t are calculated by the following formula:

θ̂_k^t = θ_k^t − η · ∇_{θ_k^t} (1/N) Σ_{(x,y)∈D_k} ℓ(f(x; θ_k^t), y)
where N represents the number of samples in the sampled training data set D_k, ℓ is the loss value, x and y represent the feature and corresponding label of a single sample in the data set, ℓ(f(x; θ_k^t), y) represents the loss between the model output f(x; θ_k^t) and the true value y, η represents the learning rate, and ∇_{θ_k^t} ℓ represents the gradient of ℓ with respect to θ_k^t.
4. The personalized federal learning method based on adaptive clustering hierarchy as claimed in claim 2, wherein in step S2, the parameter server performs weighted average processing on the received gradients of all the clients, and the process of adjusting global model parameters by using the average gradient obtained by calculation includes the following steps:
the global parameter server calculates the weight proportion of client k from the number of samples n_k in the client's training data set D_k: γ_k = n_k / Σ_{k∈K} n_k;
the federated averaging algorithm FedAvg is adopted to weight and aggregate the gradients of all K clients participating in federated learning training, obtaining the model parameters of round t+1:

θ_g^{t+1} = θ_g^t − Σ_{k∈K} γ_k · g_k
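The weighted gradient aggregation of claim 4 can be sketched as follows. The helper name is hypothetical, and applying the averaged gradient as a plain subtraction (with no separate server learning rate) is an assumption made for the example.

```python
import numpy as np

def fedavg_gradient_step(theta_g, client_grads, client_sizes):
    """Weight each client gradient by its sample share
    gamma_k = n_k / sum(n_k), then apply the averaged gradient to the
    global parameters (the plain subtraction is an assumption)."""
    n_total = sum(client_sizes)
    avg_grad = sum(g * (n / n_total)
                   for g, n in zip(client_grads, client_sizes))
    return theta_g - avg_grad

theta = np.array([1.0, 1.0])
grads = [np.array([0.2, 0.0]), np.array([0.0, 0.4])]
sizes = [30, 10]                          # gamma = 0.75 and 0.25
theta_next = fedavg_gradient_step(theta, grads, sizes)
# averaged gradient: 0.75*[0.2, 0] + 0.25*[0, 0.4] = [0.15, 0.1]
```

Clients holding more samples thus pull the global parameters proportionally harder, which is the standard FedAvg weighting.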
5. The personalized federal learning method based on adaptive clustering hierarchy as claimed in claim 1, wherein in step S4, the parameter server calculates the similarity between the clients according to the gradients uploaded by all the clients in the last round, and performs clustering grouping on all the clients according to the calculation result, and the process of selecting a group server for each client group includes the following steps:
S41, according to the gradients {g_k}_{k∈K} uploaded by all clients in round T_pre, the parameter server obtains a similarity matrix ρ by computing the pairwise cosine similarity S_C between client gradients, where ρ_{i,j} = S_C(i, j) and S_C(i, j) = (g_i · g_j) / (‖g_i‖ · ‖g_j‖);
S42, based on the similarity matrix ρ, the K clients are clustered into M client groups using a top-down hierarchical clustering algorithm, denoted {C_1, C_2, …, C_M};
S43, selecting a group server for each client group to coordinate the training of clients in the group;
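A minimal sketch of the similarity computation and grouping in S41–S42 follows. It uses a simple bottom-up (agglomerative) merge in place of the top-down hierarchical clustering algorithm named in the claim, so it should be read as an approximation of the grouping behaviour under that assumption, not as the claimed algorithm itself.

```python
import numpy as np

def cluster_clients(gradients, num_groups):
    """Group clients by cosine similarity of their last-round gradients.
    rho[i, j] = (g_i . g_j) / (||g_i|| * ||g_j||), then greedily merge
    the two groups with the highest average pairwise similarity until
    num_groups remain."""
    G = np.stack([g / np.linalg.norm(g) for g in gradients])
    rho = G @ G.T                        # cosine similarity matrix
    groups = [[i] for i in range(len(gradients))]
    while len(groups) > num_groups:
        best, pair = -np.inf, None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                s = np.mean([rho[i, j] for i in groups[a] for j in groups[b]])
                if s > best:
                    best, pair = s, (a, b)
        a, b = pair
        groups[a].extend(groups.pop(b))  # merge the most similar pair
    return groups

grads = [np.array([1.0, 0.0]), np.array([0.9, 0.1]),
         np.array([0.0, 1.0]), np.array([0.1, 0.9])]
groups = cluster_clients(grads, num_groups=2)
# clients 0 and 1 (and likewise 2 and 3) share a gradient direction
```

Clients whose gradients point in similar directions, i.e. whose data distributions induce similar updates, end up in the same group without the server ever seeing raw data.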
6. The personalized federal learning method based on adaptive clustering hierarchy as claimed in claim 5, wherein the process of generating the personalized weight vector of the intra-group hierarchy for each client group in step S4 comprises the following steps:
the global average gradient ḡ and the group average gradient ḡ_m are each expanded by model parameter layer, expressed as (ḡ^1, …, ḡ^L) and (ḡ_m^1, …, ḡ_m^L), where L is the total number of model parameter layers;
the Euclidean distance between the average gradients is calculated layer by layer to obtain an L-dimensional vector δ_m:

δ_m^n = ‖ḡ^n − ḡ_m^n‖_2, n = 1, …, L;
a hyperparameter β is defined to adjust the degree of personalization; δ_m is normalized and then multiplied by β to obtain the intra-group hierarchical personalized model weight ψ_m:
ψ_m = β · δ_m / max(δ_m).
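The weight generation of claim 6 can be sketched as below. Treating the two expanded gradients as a global and a group average gradient paired layer by layer is an inference from the surrounding text, and the function name is hypothetical.

```python
import numpy as np

def personalization_weights(global_layers, group_layers, beta=0.5):
    """Layer-wise weights psi_m = beta * delta_m / max(delta_m), where
    delta_m[n] is the Euclidean distance between the n-th layers of the
    two average gradients (the exact pairing is an assumption)."""
    delta = np.array([np.linalg.norm(g - h)
                      for g, h in zip(global_layers, group_layers)])
    return beta * delta / delta.max()

g_layers = [np.array([1.0, 0.0]), np.array([0.0, 0.0])]
m_layers = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
psi = personalization_weights(g_layers, m_layers, beta=0.5)
# delta = [0.0, 2.0]  ->  psi = [0.0, 0.5]
```

Layers whose group gradient diverges most from the global gradient receive the largest weight, i.e. they are treated as the most group-specific layers.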
7. The personalized federal learning method based on adaptive clustering hierarchy as claimed in claim 1, wherein in step S5, the parameter server sends the latest global model parameters to all group servers, the group servers iteratively perform intra-group personalized federal learning training, and the process of uploading the obtained latest intra-group model parameters to the parameter server comprises the following steps:
S51, the global parameter server sends the parameter θ_g^t of model W_g for round t to the M client group servers; t ∈ (T_pre, T_total), where T_pre and T_total are the maximum communication rounds of the first and second stages, respectively;
s52, executing the following steps in parallel on each client group server:
S521, the group server receives the parameter θ_g^t of the global model W_g for round t sent by the parameter server;
S522, the group server fuses the round-t parameter θ_g^t of the global model W_g with the parameter θ_m^t of the group model W_m layer by layer, weighted by the intra-group hierarchical personalized model weight ψ_m, and updates the parameter of model W_m to the fused result. The weighted fusion first decomposes θ_g^t and θ_m^t by layer into (θ_g^{t,1}, …, θ_g^{t,L}) and (θ_m^{t,1}, …, θ_m^{t,L}), and then fuses the parameters of each layer. The specific formula is:

θ_m^{t,n} ← ψ_m^n · θ_m^{t,n} + (1 − ψ_m^n) · θ_g^{t,n}
where θ_g^{t,n} represents the parameters of the n-th layer of θ_g^t, θ_m^{t,n} represents the parameters of the n-th layer of θ_m^t, and ψ_m^n is the n-th component of ψ_m;
S523, the group server sends the parameter θ_m^t of the group model W_m to the clients in the group;
S53, on each client that has received the parameter θ_m^t of the group model W_m for round t, the following training steps are performed in parallel:
S531, the client uses the parameter θ_m^t of the group model W_m for round t as its initial model parameters, recorded as θ_k^t, the round-t initial model parameters of the k-th client's local model;
S532, based on the initial model parameters θ_k^t and a training data set D_k consisting of N samples randomly drawn from the raw data held by the client, the client trains and optimizes its local model, performing E rounds of local iteration with stochastic gradient descent (SGD) to obtain the optimized model parameters θ̂_k^t;
S534, the client uses the optimized model parameters θ̂_k^t to perform predictive inference on the test data set D_k^test, evaluates the prediction effect, and computes the gradient g_k;
S535, the client sends the gradient g_k to the corresponding group server;
S54, each group server calculates the weight proportion of client k in the group, γ_k = n_k / Σ_{k∈C_m} n_k, from the number of samples n_k in the training data set of each client in the group, then weights and aggregates the intra-group client gradients using the federated averaging algorithm FedAvg and updates the parameter θ_m^{t+1} of the group server model W_m. The specific calculation formula is:

θ_m^{t+1} = θ_m^t − Σ_{k∈C_m} γ_k · g_k;
S55, determine whether the number of iterations has reached the maximum communication round of the second stage; if so, execute S56, otherwise continue executing S523 to S54;
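The layer-wise fusion of step S522 can be sketched as follows. The convex-combination form, where a layer with weight ψ keeps a ψ-share of the group parameters and a (1−ψ)-share of the global ones, is a reconstruction consistent with ψ_m being a per-layer personalization weight; the function name is hypothetical.

```python
import numpy as np

def fuse_layers(theta_global, theta_group, psi):
    """Layer-wise weighted fusion of global and group parameters.
    psi[n]*group + (1 - psi[n])*global: layers with a larger psi keep
    more of the group-specific parameters."""
    return [p * tm + (1.0 - p) * tg
            for p, tg, tm in zip(psi, theta_global, theta_group)]

theta_g = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
theta_m = [np.array([2.0, 2.0]), np.array([3.0, 3.0])]
fused = fuse_layers(theta_g, theta_m, psi=[0.0, 0.5])
# layer 0 keeps the global parameters; layer 1 is their midpoint
```

This lets shared (low-ψ) layers track the global model for generalization while group-specific (high-ψ) layers retain their personalized values.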
8. The personalized federal learning method based on adaptive clustering hierarchy as claimed in claim 7, wherein in step S5, the process of the parameter server aggregating the received latest intra-group model parameters sent by all client groups by weighted average to obtain a new global model comprises the following steps:
the global parameter server uses the federated averaging algorithm FedAvg to weight and aggregate the model parameters {θ_m^{t+1}}_{m=1}^M sent by all the group servers, obtaining the parameter θ_g^{t+1} of model W_g for round t+1. The specific calculation formula is:

θ_g^{t+1} = Σ_{m=1}^M γ_m · θ_m^{t+1}, with γ_m the aggregation weight of group m.
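The global aggregation of claim 8 reduces to a weighted average of parameter vectors. Weighting each group by its total sample count is an assumption made for the sketch, chosen to be consistent with the per-client weighting of claim 4; the function name is hypothetical.

```python
import numpy as np

def aggregate_group_models(group_params, group_sizes):
    """Weighted average of the group-server parameters into the new
    global parameters, with weights proportional to each group's total
    sample count (an assumption for this sketch)."""
    n_total = sum(group_sizes)
    return sum(p * (n / n_total) for p, n in zip(group_params, group_sizes))

params = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
sizes = [10, 30]                           # weights 0.25 and 0.75
theta_new = aggregate_group_models(params, sizes)
# 0.25*[1, 2] + 0.75*[3, 4] = [2.5, 3.5]
```

The resulting θ_g^{t+1} is the global model that new clients can adopt before being assigned to a group.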
9. an adaptive clustering hierarchy based personalized federal learning system, comprising a memory, a processor and a computer program stored on the memory and operable on the processor, wherein the processor, when executing the program, implements the steps of the adaptive clustering hierarchy based personalized federal learning method as claimed in any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211129262.9A CN115840900A (en) | 2022-09-16 | 2022-09-16 | Personalized federal learning method and system based on self-adaptive clustering layering |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115840900A true CN115840900A (en) | 2023-03-24 |
Family
ID=85574918
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211129262.9A Pending CN115840900A (en) | 2022-09-16 | 2022-09-16 | Personalized federal learning method and system based on self-adaptive clustering layering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115840900A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116049862A (en) * | 2023-03-13 | 2023-05-02 | 杭州海康威视数字技术股份有限公司 | Data protection method, device and system based on asynchronous packet federation learning |
CN116596065A (en) * | 2023-07-12 | 2023-08-15 | 支付宝(杭州)信息技术有限公司 | Gradient calculation method and device, storage medium, product and electronic equipment |
CN116596065B (en) * | 2023-07-12 | 2023-11-28 | 支付宝(杭州)信息技术有限公司 | Gradient calculation method and device, storage medium, product and electronic equipment |
CN117216596A (en) * | 2023-08-16 | 2023-12-12 | 中国人民解放军总医院 | Federal learning optimization communication method, system and storage medium based on gradient clustering |
CN117216596B (en) * | 2023-08-16 | 2024-04-30 | 中国人民解放军总医院 | Federal learning optimization communication method, system and storage medium based on gradient clustering |
CN117010484A (en) * | 2023-10-07 | 2023-11-07 | 之江实验室 | Personalized federal learning generalization method, device and application based on attention mechanism |
CN117010484B (en) * | 2023-10-07 | 2024-01-26 | 之江实验室 | Personalized federal learning generalization method, device and application based on attention mechanism |
CN117057442A (en) * | 2023-10-09 | 2023-11-14 | 之江实验室 | Model training method, device and equipment based on federal multitask learning |
CN117350373A (en) * | 2023-11-30 | 2024-01-05 | 艾迪恩(山东)科技有限公司 | Personalized federal aggregation algorithm based on local self-attention mechanism |
CN117350373B (en) * | 2023-11-30 | 2024-03-01 | 艾迪恩(山东)科技有限公司 | Personalized federal aggregation algorithm based on local self-attention mechanism |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115840900A (en) | Personalized federal learning method and system based on self-adaptive clustering layering | |
CN113191484A (en) | Federal learning client intelligent selection method and system based on deep reinforcement learning | |
CN113326731A (en) | Cross-domain pedestrian re-identification algorithm based on momentum network guidance | |
CN110263236B (en) | Social network user multi-label classification method based on dynamic multi-view learning model | |
CN107194672B (en) | Review distribution method integrating academic expertise and social network | |
CN112380433A (en) | Recommendation meta-learning method for cold-start user | |
CN111814963B (en) | Image recognition method based on deep neural network model parameter modulation | |
CN111353534B (en) | Graph data category prediction method based on adaptive fractional order gradient | |
CN116503676A (en) | Picture classification method and system based on knowledge distillation small sample increment learning | |
CN117391247A (en) | Enterprise risk level prediction method and system based on deep learning | |
CN116226689A (en) | Power distribution network typical operation scene generation method based on Gaussian mixture model | |
CN115359298A (en) | Sparse neural network-based federal meta-learning image classification method | |
CN111475158A (en) | Sub-domain dividing method and device, electronic equipment and computer readable storage medium | |
CN111192158A (en) | Transformer substation daily load curve similarity matching method based on deep learning | |
Yang | Combination forecast of economic chaos based on improved genetic algorithm | |
CN117078312B (en) | Advertisement putting management method and system based on artificial intelligence | |
CN111353525A (en) | Modeling and missing value filling method for unbalanced incomplete data set | |
CN116415177A (en) | Classifier parameter identification method based on extreme learning machine | |
CN115690476A (en) | Automatic data clustering method based on improved harmony search algorithm | |
CN108268898A (en) | A kind of electronic invoice user clustering method based on K-Means | |
CN111027709B (en) | Information recommendation method and device, server and storage medium | |
CN113377884A (en) | Event corpus purification method based on multi-agent reinforcement learning | |
CN111814190A (en) | Privacy protection method based on differential privacy distributed deep learning optimization | |
Khotimah et al. | Adaptive SOMMI (Self Organizing Map Multiple Imputation) base on Variation Weight for Incomplete Data | |
CN117557870B (en) | Classification model training method and system based on federal learning client selection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||