CN115840900A - Personalized federated learning method and system based on adaptive clustering hierarchy - Google Patents

Personalized federated learning method and system based on adaptive clustering hierarchy

Info

Publication number
CN115840900A
Authority
CN
China
Prior art keywords
model
group
client
parameter
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211129262.9A
Other languages
Chinese (zh)
Inventor
谢在鹏
刘尧
蒋俊辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU filed Critical Hohai University HHU
Priority to CN202211129262.9A priority Critical patent/CN115840900A/en
Publication of CN115840900A publication Critical patent/CN115840900A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a personalized federated learning method and system based on adaptive clustering hierarchy. The method comprises the following steps: the parameter server performs weighted averaging on the gradients of all clients and adjusts the global model parameters with the computed average gradient; it calculates the similarity between clients according to the gradients uploaded by all clients in the latest round, clusters all clients into groups according to the result, and generates layer-wise personalized weight vectors for each group; the parameter server sends the latest global model parameters to all group servers, the group servers iteratively perform personalized federated learning training within their groups and upload the latest intra-group model parameters to the parameter server; and the parameter server performs weighted average aggregation on the received latest intra-group model parameters sent by all client groups to obtain a new global model. The invention greatly improves the personalization performance of the clients' local models without damaging the generalization ability of the global model.

Description

Personalized federated learning method and system based on adaptive clustering hierarchy
Technical Field
The invention belongs to the technical field of distributed machine learning, and particularly relates to a personalized federated learning method and system based on adaptive clustering hierarchy.
Background
With the coming of the cloud era and the popularization of edge devices (such as smart phones and smart wearable devices), data are constantly generated and even grow explosively. These rich data provide great opportunities for machine learning applications such as speech recognition and computer vision, where deep neural networks can efficiently extract the desired information given a large amount of training data. However, as data privacy receives more and more attention from all sectors of society, the large amount of data generated in edge devices or organizations (such as hospitals, companies and courts) cannot be collected in a central server, which poses great challenges to deep learning.
Federated learning is a deep learning framework in which clients cooperatively train a shared global model on their own data under the coordination of a central server, while keeping the data private and reducing systematic privacy risk and communication cost. Most existing training methods are variants of federated averaging, and traditional federated learning focuses on obtaining a high-quality, universal global model by learning from the local data of the participating clients. However, in the presence of statistical heterogeneity of the data (e.g., non-independent, non-identically distributed or unbalanced data), federated learning has difficulty training a single model that fits all clients. Applying the global model directly to every client may result in poor local performance because the global model does not fit each client. This problem is further exacerbated as the differences between the local data of different clients grow larger.
To alleviate the degradation of federated learning performance caused by statistical heterogeneity of data, personalized federated learning has become a solution. Personalized federated learning aims at training a unique personalized model for each client, combining the generalization characteristics of the global model with the distribution-matching characteristics of the local model; the challenge is how to achieve a fine balance between the specific knowledge of the local models and the shared knowledge of the global model. In recent years, much work on personalized federated learning has focused on two possible solutions: clustering-based personalization and layer-based personalization. Clustering-based personalization methods group clients with similar data distributions into clusters and train a specialized model for each cluster. Layer-based personalization methods personalize some layers of the local model while the remaining layers are derived from the global model.
Although both approaches can improve federated learning performance through personalization, significant problems remain. Current clustering-based personalization methods pay little attention to model sharing among groups and may therefore harm the generalization performance of the global model. Meanwhile, existing layer-based personalization methods usually adopt a manually predefined layering scheme and lack flexibility and adaptability. As a result, they may end up with sub-optimal solutions, leading to an imbalance between the performance of the global model and the local models.
The invention with application number 202210511356.6 provides a federated learning method and system comprising the following steps: S1, an initial global model is sent to all clients, and the clients upload their initial local models to a central service system; S2, the clients are clustered according to the initial local models they upload, obtaining one or more client classes; S3, multiple rounds of iterative training are performed on the global model until an iteration stopping condition is reached, where the t-th round of iterative training includes: selecting at least one client from each client class to participate in training; judging whether gradient conflicts exist among the clients participating in the t-th round of iterative training based on the t-th round local models and loss function values returned by the clients, and accumulating model differences according to the gradient-conflict situation; and updating the t-th round global model with the accumulated model differences. That method divides the causes of model unfairness into external and internal contradictions and eliminates both, improves the representativeness and fairness of the selected clients, reduces the number of training rounds and the communication cost, and accelerates convergence. However, it cannot solve the problems of unbalanced performance between the global model and the local models and of poor generalization of the global model.
Disclosure of Invention
The technical problem to be solved is as follows: the invention provides a personalized federated learning method and system based on adaptive clustering hierarchy, which solve the problem of statistical data heterogeneity in federated learning and the problem of unbalanced performance between the global model and the local models in personalized federated learning. By integrating a client clustering method with an adaptive layer-wise fusion algorithm and using the performance feedback of the clients to perform clustering grouping and adaptive layer-wise fusion, the method can flexibly establish a personalization strategy for a specific federated learning task, greatly improving the personalization performance of the clients' local models without damaging the global generalization ability.
The technical scheme is as follows:
A personalized federated learning method based on adaptive clustering hierarchy comprises the following steps:
S1, each client prepares a training data set and a test data set for the prediction task, and the global parameter server randomly initializes the global model parameters;
S2, the global parameter server issues the global model parameters to the clients; each client uses the received global model parameters as the initial parameters of its local model, trains the model for the current round on its local training data set, evaluates the model's prediction effect on the test data set after training, calculates the gradient, and uploads the calculated gradient to the parameter server; the parameter server performs weighted averaging on the received gradients of all clients and adjusts the global model parameters with the computed average gradient;
S3, repeating step S2 until the training round reaches the maximum communication round of the first stage, then going to step S4;
S4, the parameter server calculates the similarity between clients according to the gradients uploaded by all clients in the latest round, clusters all clients into groups according to the calculation result, selects a group server for each client group, and generates a layer-wise personalized weight vector for each client group;
S5, the parameter server sends the latest global model parameters to all group servers; the group servers iteratively perform personalized federated learning training within their groups and upload the obtained latest intra-group model parameters to the parameter server; the parameter server performs weighted average aggregation on the received latest intra-group model parameters sent by all client groups to obtain a new global model;
S6, repeating step S5 until the training round reaches the maximum round or the model converges, then ending the process.
Further, in step S2, the process in which the global parameter server issues the global model parameters to the clients, each client uses the received global model parameters as the initial parameters of its local model, trains the model for the current round on the local training data set, evaluates the model's prediction effect on the test data set after training, calculates the gradient, and uploads the calculated gradient to the parameter server comprises the following steps:
S21, the global parameter server issues the t-th round parameters W_g^t of the model W_g to the K clients participating in the federated learning training; t ∈ [1, T_pre], where T_pre is the maximum communication round of the first stage, and the round-1 parameters W_g^1 are obtained by random initialization at the global parameter server;
S22, on each client that receives the t-th round parameters W_g^t of the model W_g, the following training steps are performed in parallel:
S221, the client takes the t-th round parameters W_g^t of the model W_g as its initial model parameters, denoted W_k^t, the t-th round initial model parameters of the k-th client's local model;
S222, based on the initial model parameters W_k^t and a training data set D_k^train consisting of N samples randomly drawn from the raw data held by the client, the client trains and optimizes the local model, performing E rounds of local iteration with stochastic gradient descent to obtain the optimized model parameters Ŵ_k^t;
S223, the client uses the optimized model parameters Ŵ_k^t to perform predictive inference on the test data set D_k^test, evaluates the prediction effect, and calculates the gradient g_k;
S224, the client sends the gradient g_k to the global parameter server.
Further, in step S222, the optimized model parameters Ŵ_k^t are calculated with the following formulas:

ℓ(W_k^t) = (1/N) · Σ_{(x,y)∈D_k^train} L(f(x; W_k^t), y)

Ŵ_k^t = W_k^t − η · ∇_{W_k^t} ℓ(W_k^t)

where N represents the number of samples in the sampled training data set D_k^train, ℓ(W_k^t) is the loss value, x and y respectively represent the features of a single sample in the data set and the corresponding label, L(f(x; W_k^t), y) represents the loss between the model output f(x; W_k^t) and the true value y, η represents the learning rate, and ∇_{W_k^t} ℓ(W_k^t) represents the gradient of ℓ(W_k^t) with respect to W_k^t.
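For illustration only (not part of the original disclosure), the local update of step S222 can be written in Python/NumPy as below. The linear model, the mean-squared-error loss, and the definition of the uploaded g_k as the difference between the initial and optimized parameters are assumptions made for this sketch; the patent text does not fix the model architecture, the loss, or the exact form of g_k.

```python
import numpy as np

def local_sgd_update(w_init, X, y, lr=0.01, epochs=5, batch_size=32, seed=0):
    """E rounds of local SGD starting from the received global parameters (step S222).

    w_init : initial model parameters W_k^t received from the server (1-D array)
    X, y   : the client's local training set D_k^train (features, labels)
    Returns the optimized parameters and the update direction g_k, here taken as
    the difference between the initial and optimized parameters (an assumption).
    """
    rng = np.random.default_rng(seed)
    w = w_init.copy()
    n = X.shape[0]
    for _ in range(epochs):                            # E local iterations
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            xb, yb = X[batch], y[batch]
            pred = xb @ w                               # model output f(x; w)
            grad = xb.T @ (pred - yb) / len(batch)      # gradient of the MSE loss
            w -= lr * grad                              # SGD step: w <- w - eta * grad
    g_k = w_init - w                                    # quantity uploaded to the server
    return w, g_k
```

Calling local_sgd_update(w0, X_train, y_train) returns the pair that steps S223-S224 then evaluate on the test data set and upload to the parameter server.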
Further, in step S2, the process in which the parameter server performs weighted averaging on the received gradients of all clients and adjusts the global model parameters with the computed average gradient comprises the following steps:
the global parameter server calculates the weight proportion of client k from the number of samples n_k in the client's training data set D_k^train: γ_k = n_k / Σ_{k∈K} n_k;
the federated averaging algorithm FedAvg is adopted to weight and aggregate the gradients of all K clients participating in the federated learning training, obtaining the model parameters of round t+1:

W_g^{t+1} = W_g^t − Σ_{k∈K} γ_k · g_k.
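A minimal NumPy sketch of this server-side step is given below; it assumes the uploaded g_k is applied directly as an update direction with weights γ_k (the text does not mention a separate server learning rate), and the function name is hypothetical.

```python
import numpy as np

def fedavg_gradient_step(w_global, client_grads, client_sizes):
    """Weighted-average the client gradients and adjust the global model (step S2).

    w_global     : current global parameters W_g^t (1-D array)
    client_grads : list of gradients g_k uploaded by the K clients
    client_sizes : list of n_k, the number of local training samples per client
    """
    sizes = np.asarray(client_sizes, dtype=float)
    gamma = sizes / sizes.sum()                  # gamma_k = n_k / sum_k n_k
    avg_grad = sum(gk * g for gk, g in zip(gamma, client_grads))
    return w_global - avg_grad                   # W_g^{t+1} = W_g^t - sum_k gamma_k * g_k
```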
Further, in step S4, the process in which the parameter server calculates the similarity between clients according to the gradients uploaded by all clients in the latest round, clusters all clients into groups according to the calculation result, and selects a group server for each client group comprises the following steps:
S41, according to the gradients {g_k}_{k∈K} uploaded by all clients in round T_pre, the parameter server obtains a similarity matrix ρ by calculating the pairwise cosine similarity S_C between client gradients, where for clients i, j ∈ K:

ρ_{i,j} = S_C(i,j),  S_C(i,j) = (g_i · g_j) / (‖g_i‖ · ‖g_j‖);

S42, based on the similarity matrix ρ, the K clients are clustered into M client groups using a top-down hierarchical clustering algorithm, denoted {C_m}_{m∈M};
S43, a group server is selected for each client group to coordinate the training of the clients in the group;
S44, the group server of each client group copies the T_pre-th round parameters W_g^{T_pre} of the global parameter server model W_g as the parameters W_m^{T_pre} of the client group server model W_m, where m ∈ {1, 2, …, M}.
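The similarity computation and grouping of steps S41-S42 can be sketched as follows (illustrative only). SciPy's agglomerative (bottom-up) hierarchical clustering is used here as a stand-in for the top-down hierarchical clustering named in the text, and the function name is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_clients_by_gradient(client_grads, num_groups):
    """Group clients whose last-round gradients point in similar directions (step S4).

    client_grads : list of K flattened gradient vectors g_k from round T_pre
    num_groups   : M, the desired number of client groups
    Returns an array of length K with a group index in {0, ..., M-1} per client.
    """
    G = np.stack(client_grads)                           # K x d matrix of gradients
    norms = np.linalg.norm(G, axis=1, keepdims=True)
    rho = (G @ G.T) / (norms * norms.T)                  # cosine-similarity matrix rho
    dist = 1.0 - rho                                     # turn similarity into a distance
    np.fill_diagonal(dist, 0.0)
    dist = np.clip(dist, 0.0, None)                      # guard against tiny negatives
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=num_groups, criterion="maxclust") - 1
    return labels
```

With M chosen in advance, the returned labels determine the client groups used in steps S43-S44.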
Further, the process of generating the layer-wise personalized weight vector for each client group in step S4 comprises the following steps:
the intra-group average gradient D̄_m is calculated from the gradients of the clients within the group:

D̄_m = (1/|C_m|) · Σ_{k∈C_m} g_k;

the average gradient D̄_m is split by model parameter layer and expressed as (D̄_m^1, D̄_m^2, …, D̄_m^l), where l is the total number of model parameter layers;
the Euclidean distance (ℓ2 norm) of the average gradient D̄_m is calculated layer by layer to obtain a 1×l-dimensional vector δ_m:

δ_m = (‖D̄_m^1‖_2, ‖D̄_m^2‖_2, …, ‖D̄_m^l‖_2);

a hyper-parameter β for adjusting the degree of personalization is defined; δ_m is normalized and multiplied by β to obtain the layer-wise personalized model weight ψ_m of the group:

ψ_m = β · δ_m / max(δ_m).
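An illustrative sketch of this weight-vector computation follows (not part of the original disclosure). Treating the layer-by-layer "Euclidean distance" as the Euclidean norm of each layer of the average gradient, and the intra-group average as an unweighted mean, are assumptions; the default β is arbitrary.

```python
import numpy as np

def layerwise_personalization_weights(group_grads, beta=0.8):
    """Compute the layer-wise personalized weight vector psi_m for one client group (step S4).

    group_grads : list over clients in the group; each element is a list of per-layer
                  gradient arrays [g_layer1, ..., g_layerl]
    beta        : hyper-parameter adjusting the degree of personalization
    Returns psi_m, a vector of length l with one weight per model parameter layer.
    """
    num_layers = len(group_grads[0])
    # Intra-group average gradient, kept layer by layer (assumed unweighted mean).
    avg_layers = [np.mean([g[layer] for g in group_grads], axis=0)
                  for layer in range(num_layers)]
    # Per-layer Euclidean norm -> 1 x l vector delta_m.
    delta = np.array([np.linalg.norm(layer_grad) for layer_grad in avg_layers])
    # psi_m = beta * delta_m / max(delta_m): layers whose average gradient is larger
    # receive larger personalization weights.
    return beta * delta / delta.max()
```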
Further, in step S5, the process in which the parameter server sends the latest global model parameters to all group servers, the group servers iteratively perform personalized federated learning training within their groups, and the obtained latest intra-group model parameters are uploaded to the parameter server comprises the following steps:
S51, the global parameter server sends the t-th round parameters W_g^t of the model W_g to the M client group servers; t ∈ (T_pre, T_total), where T_pre and T_total are respectively the maximum communication rounds of the first stage and the second stage;
S52, the following steps are executed in parallel on each client group server:
S521, the group server receives the t-th round parameters W_g^t of the global model W_g sent by the parameter server;
S522, the group server fuses the t-th round parameters W_g^t of the global model W_g and the parameters W_m^t of the group model W_m layer by layer, weighted by the layer-wise personalized model weights ψ_m, and updates the parameters of the group model W_m to W̃_m^t; the weighted fusion first decomposes W_g^t and W_m^t by layer into (W_g^{t,1}, …, W_g^{t,l}) and (W_m^{t,1}, …, W_m^{t,l}), and then fuses the parameters of each layer according to the following formula (an illustrative code sketch of this fusion is given after these steps):

W̃_m^{t,n} = ψ_m^n · W_m^{t,n} + (1 − ψ_m^n) · W_g^{t,n}

where W_m^{t,n} represents the parameters of the n-th layer of W_m^t, W_g^{t,n} represents the parameters of the n-th layer of W_g^t, and n ∈ {1, 2, …, l};
S523, the group server sends the parameters W̃_m^t of the group model W_m to the clients in the group;
S53, on each client that receives the t-th round parameters W̃_m^t of the group model W_m, the following training steps are performed in parallel:
S531, the client takes the t-th round parameters W̃_m^t of the group model W_m as its initial model parameters, denoted W_k^t, the t-th round initial model parameters of the k-th client's local model;
S532, based on the initial model parameters W_k^t and a training data set D_k^train consisting of N samples randomly drawn from the raw data held by the client, the client trains and optimizes the local model, performing E rounds of local iteration with stochastic gradient descent (SGD) to obtain the optimized model parameters Ŵ_k^t;
S534, the client uses the optimized model parameters Ŵ_k^t to perform predictive inference on the test data set D_k^test, evaluates the prediction effect, and calculates the gradient g_k;
S535, the client sends the gradient g_k to the corresponding group server;
S54, each group server calculates the weight proportion of client k within the group, γ_k^m = n_k / Σ_{k∈C_m} n_k, from the number of samples n_k in the training data sets D_k^train of the clients in the group, and uses the federated averaging algorithm FedAvg to weight and aggregate the gradients of the clients in the group, updating the parameters of the group server model W_m to Ŵ_m^t according to the following formula:

Ŵ_m^t = W̃_m^t − Σ_{k∈C_m} γ_k^m · g_k;

S55, it is judged whether the number of iterations has reached the maximum communication round of the second stage; if so, S56 is executed, otherwise S523 to S54 continue to be executed;
S56, the updated parameters of the group server model W_m are denoted Ŵ_m, and the group server sends the parameters Ŵ_m of the group server model W_m to the global parameter server.
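The layer-wise fusion of step S522 (referenced above) can be sketched as follows. Reading ψ_m^n as the weight placed on the group's own n-th layer, with 1 − ψ_m^n on the corresponding global layer, is an assumption made for this example, since the original formula image is not reproduced in the text.

```python
import numpy as np

def fuse_group_and_global(group_layers, global_layers, psi):
    """Layer-wise weighted fusion of the group model W_m^t with the global model W_g^t.

    group_layers  : list of per-layer parameter arrays of the group model
    global_layers : list of per-layer parameter arrays of the global model
    psi           : layer-wise personalized weights psi_m, one scalar per layer
    Returns the fused per-layer parameters used to start the intra-group training round.
    """
    fused = []
    for w_m, w_g, p in zip(group_layers, global_layers, psi):
        # Larger psi keeps more of the group-specific layer and less of the global layer.
        fused.append(p * w_m + (1.0 - p) * w_g)
    return fused
```

Under this reading, a larger β (and hence larger ψ_m^n for the layers whose average gradients differ most) shifts those layers toward the group-specific parameters, which is the personalization effect described above.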
Further, in step S5, the process in which the parameter server performs weighted average aggregation on the received latest intra-group model parameters sent by all client groups to obtain a new global model comprises the following steps:
the global parameter server uses the federated averaging algorithm FedAvg to weight and aggregate the model parameters Ŵ_m sent by all group servers, obtaining the t+1-th round parameters W_g^{t+1} of the model W_g, with the specific calculation formula:

W_g^{t+1} = Σ_{m∈M} γ_m · Ŵ_m

where γ_m is the aggregation weight of the m-th group (in FedAvg, proportional to the number of training samples held by the clients of the group).
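The second-stage global update aggregates parameters rather than gradients; a short illustrative sketch follows (not from the original disclosure), where the sample-count weights γ_m and the per-layer list representation are assumptions consistent with FedAvg, and the function name is hypothetical.

```python
import numpy as np

def aggregate_group_models(group_params, group_sizes):
    """Weighted average of the group servers' parameters into a new global model (step S5).

    group_params : list over the M groups; each element is a list of per-layer arrays
    group_sizes  : total number of training samples held by each group's clients
    """
    sizes = np.asarray(group_sizes, dtype=float)
    gamma = sizes / sizes.sum()                  # assumed gamma_m, one weight per group
    num_layers = len(group_params[0])
    # New global parameters W_g^{t+1}, computed layer by layer.
    return [sum(g * params[layer] for g, params in zip(gamma, group_params))
            for layer in range(num_layers)]
```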
The invention also discloses a personalized federated learning system based on adaptive clustering hierarchy, which comprises a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the above personalized federated learning method based on adaptive clustering hierarchy when executing the program.
Beneficial effects:
First, because the personalized federated learning method based on adaptive clustering hierarchy of the invention integrates a client clustering method, clients can be grouped according to the similarity of their data distributions without obtaining the clients' real data, and a client can join the corresponding personalized federated learning system to carry out model training and inference.
Secondly, because the personalized federated learning method based on adaptive clustering hierarchy integrates an adaptive layer-wise fusion scheme, each client can obtain the personalized model best suited to it, while the method maintains a global model with good generalization performance so that new clients can conveniently join or use the corresponding personalized federated learning system.
Thirdly, the personalized federated learning method based on adaptive clustering hierarchy solves the problem of statistical data heterogeneity in federated learning and the problem of unbalanced performance between the global model and the local models in personalized federated learning, greatly improving the personalization performance of the clients' local models without damaging the global generalization ability.
Drawings
Fig. 1 is a schematic diagram of the overall flow of the personalized federated learning method based on adaptive clustering hierarchy in an embodiment of the present invention.
Fig. 2 is a schematic diagram of the intra-group training process of the personalized federated learning method based on adaptive clustering hierarchy in an embodiment of the present invention.
Fig. 3 is a schematic diagram of the first-stage principle of the personalized federated learning method based on adaptive clustering hierarchy according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the second-stage principle of the personalized federated learning method based on adaptive clustering hierarchy according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of the personalized federated learning system based on adaptive clustering hierarchy in an embodiment of the present invention.
Detailed Description
The following examples are presented to enable one of ordinary skill in the art to more fully understand the present invention and are not intended to limit the invention in any way.
The embodiment discloses a personalized federated learning method based on adaptive clustering hierarchy, which comprises the following steps:
S1, each client prepares a training data set and a test data set for the prediction task, and the global parameter server randomly initializes the global model parameters;
S2, the global parameter server issues the global model parameters to the clients; each client uses the received global model parameters as the initial parameters of its local model, trains the model for the current round on its local training data set, evaluates the model's prediction effect on the test data set after training, calculates the gradient, and uploads the calculated gradient to the parameter server; the parameter server performs weighted averaging on the received gradients of all clients and adjusts the global model parameters with the computed average gradient;
S3, repeating step S2 until the training round reaches the maximum communication round of the first stage, then going to step S4;
S4, the parameter server calculates the similarity between clients according to the gradients uploaded by all clients in the latest round, clusters all clients into groups according to the calculation result, selects a group server for each client group, and generates a layer-wise personalized weight vector for each client group;
S5, the parameter server sends the latest global model parameters to all group servers; the group servers iteratively perform personalized federated learning training within their groups and upload the obtained latest intra-group model parameters to the parameter server; the parameter server performs weighted average aggregation on the received latest intra-group model parameters sent by all client groups to obtain a new global model;
S6, repeating step S5 until the training round reaches the maximum round or the model converges, then ending the process.
In another aspect, an embodiment of the present invention provides a personalized federated learning system based on adaptive clustering hierarchy, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the foregoing personalized federated learning method based on adaptive clustering hierarchy when executing the program.
The embodiment provides a personalized federated learning method and system based on adaptive clustering hierarchy, which solve the problem of statistical data heterogeneity in federated learning and the problem of unbalanced performance between the global model and the local models in personalized federated learning. By integrating a client clustering method with an adaptive layer-wise fusion algorithm, performing clustering grouping and adaptive layer-wise fusion with the performance feedback of the clients, and flexibly making a personalization strategy for the specific federated learning task, the personalization performance of the clients' local models is greatly improved without damaging the global generalization ability.
As shown in fig. 1, this embodiment provides a personalized federated learning method based on adaptive clustering hierarchy, which consists of two stages and specifically comprises the following steps.
The first stage (its schematic diagram is shown in fig. 3):
S100: all clients prepare training and testing data sets for the prediction task.
S110: the global parameter server randomly initializes the round-1 parameters W_g^1 of the model W_g.
S200: the global parameter server issues the t-th round parameters W_g^t of the model W_g to the K clients participating in the federated learning training; t ∈ [1, T_pre], where T_pre is the maximum communication round of the first stage.
S210: on each client that receives the t-th round parameters W_g^t of the model W_g, the following training steps are performed in parallel:
the client takes the t-th round parameters W_g^t of the model W_g as its initial model parameters, denoted W_k^t, the t-th round initial model parameters of the k-th client's local model;
based on the initial model parameters W_k^t and a training data set D_k^train consisting of N samples randomly drawn from the raw data held by the client, the client trains and optimizes the local model, performing E rounds of local iteration with stochastic gradient descent (SGD) to obtain the optimized model parameters Ŵ_k^t.
S220: the client uses the optimized model parameters Ŵ_k^t to perform predictive inference on the test data set D_k^test, evaluates the prediction effect, and calculates the gradient g_k.
S230: the client sends the gradient g_k to the global parameter server.
S240: the global parameter server calculates the weight proportion of client k, γ_k = n_k / Σ_{k∈K} n_k, from the number of samples n_k in the client's training data set D_k^train; then the gradients of all K clients participating in the federated learning training are weighted and aggregated with the federated averaging algorithm FedAvg to obtain the model parameters W_g^{t+1} of round t+1, with the specific calculation formula:

W_g^{t+1} = W_g^t − Σ_{k∈K} γ_k · g_k    (1)

S250: it is judged whether the number of iterations has reached the specified communication round; if so, S300 is executed, otherwise S200 to S250 continue to be executed.
S300: according to the gradients {g_k}_{k∈K} uploaded by all clients in round T_pre, the parameter server calculates the pairwise cosine similarity S_C between client gradients to obtain a similarity matrix ρ, where for clients i, j ∈ K:

ρ_{i,j} = S_C(i,j),  S_C(i,j) = (g_i · g_j) / (‖g_i‖ · ‖g_j‖)    (2)

Based on the similarity matrix ρ, the K clients are clustered into M client groups using a top-down hierarchical clustering algorithm, denoted {C_m}_{m∈M}; a group server is selected for each client group to coordinate the training of the clients in the group; each client group server copies the T_pre-th round parameters W_g^{T_pre} of the global parameter server model W_g as the parameters W_m^{T_pre} of the client group server model W_m, where m ∈ {1, 2, …, M}.
Further, in each client group, the group server performs the following calculations:
the intra-group average gradient D̄_m is computed from the gradients of the clients in the group:

D̄_m = (1/|C_m|) · Σ_{k∈C_m} g_k    (3)

the average gradient D̄_m is split by model parameter layer and expressed as (D̄_m^1, D̄_m^2, …, D̄_m^l), where l is the total number of model parameter layers;
the Euclidean distance (ℓ2 norm) of the average gradient D̄_m is calculated layer by layer to obtain a 1×l-dimensional vector δ_m, with the calculation formula:

δ_m = (‖D̄_m^1‖_2, ‖D̄_m^2‖_2, …, ‖D̄_m^l‖_2)    (4)

a hyper-parameter β for adjusting the degree of personalization is defined; δ_m is normalized and multiplied by β to obtain the layer-wise personalized model weight ψ_m of the group, with the specific calculation formula:

ψ_m = β · δ_m / max(δ_m)    (5)

The second stage (its schematic diagram is shown in fig. 4):
S400: the global parameter server sends the t-th round parameters W_g^t of the model W_g to the M client group servers; t ∈ (T_pre, T_total), where T_pre and T_total are respectively the maximum communication rounds of the first stage and the second stage.
S410: as shown in fig. 2, the following steps are performed in parallel on each client group server:
S411: the group server receives the t-th round parameters W_g^t of the global model W_g sent by the parameter server.
S412: the group server fuses the t-th round parameters W_g^t of the global model W_g and the parameters W_m^t of the group model W_m layer by layer, weighted by the layer-wise personalized model weights ψ_m, and updates the parameters of the group model W_m to W̃_m^t; the weighted fusion first decomposes W_g^t and W_m^t by layer into (W_g^{t,1}, …, W_g^{t,l}) and (W_m^{t,1}, …, W_m^{t,l}), and then fuses the parameters of each layer, with the specific formula:

W̃_m^{t,n} = ψ_m^n · W_m^{t,n} + (1 − ψ_m^n) · W_g^{t,n}    (6)
where W_m^{t,n} represents the parameters of the n-th layer of W_m^t, W_g^{t,n} represents the parameters of the n-th layer of W_g^t, and n ∈ {1, 2, …, l}.
S413: the group server sends the parameters W̃_m^t of the group model W_m to the clients in the group; on each client that receives the t-th round parameters W̃_m^t of the group model W_m, the following training steps are performed in parallel:
S414: the client takes the t-th round parameters W̃_m^t of the group model W_m as its initial model parameters, denoted W_k^t, the t-th round initial model parameters of the k-th client's local model; based on the initial model parameters W_k^t and a training data set D_k^train consisting of N samples randomly drawn from the raw data held by the client, the client trains and optimizes the local model, performing E rounds of local iteration with stochastic gradient descent (SGD) to obtain the optimized model parameters Ŵ_k^t.
S415: the client uses the optimized model parameters Ŵ_k^t to perform predictive inference on the test data set D_k^test, evaluates the prediction effect, and calculates the gradient g_k.
S416: the client sends the gradient g_k to the corresponding group server.
S417: each group server calculates the weight proportion of client k within the group, γ_k^m = n_k / Σ_{k∈C_m} n_k, from the number of samples n_k in the training data sets D_k^train of the clients in the group; then the gradients of the clients in the group are weighted and aggregated with the federated averaging algorithm FedAvg, and the parameters of the group server model W_m are updated to Ŵ_m^t, with the specific calculation formula:

Ŵ_m^t = W̃_m^t − Σ_{k∈C_m} γ_k^m · g_k    (7)

S418: it is judged whether the number of iterations has reached the specified communication round; if so, S420 is executed, otherwise S413 to S418 continue to be executed.
S420: the updated parameters of the group server model W_m are denoted Ŵ_m, and the group server sends the parameters Ŵ_m of the group server model W_m to the global parameter server.
S430: the global parameter server uses the federated averaging algorithm FedAvg to weight and aggregate the model parameters Ŵ_m sent by all group servers, obtaining the t+1-th round parameters W_g^{t+1} of the model W_g, with the specific calculation formula:

W_g^{t+1} = Σ_{m∈M} γ_m · Ŵ_m    (8)

where γ_m is the aggregation weight of the m-th group (in FedAvg, proportional to the number of training samples held by the clients of the group).
S440: it is judged whether the model has converged or the number of iterations has reached the specified communication round; if either condition is met, the training is finished and every client performs testing on its test data set D_k^test with the group model of the group it belongs to; otherwise, S400 to S440 continue to be executed.
In S210 and S414, the initial model parameters W_k^t and the training data set D_k^train are used to train and optimize the local model of the client, obtaining the optimized model parameters Ŵ_k^t, with the specific calculation formulas:

ℓ(W_k^t) = (1/N) · Σ_{(x,y)∈D_k^train} L(f(x; W_k^t), y)

Ŵ_k^t = W_k^t − η · ∇_{W_k^t} ℓ(W_k^t)

where N represents the number of samples in the sampled training data set D_k^train, ℓ(W_k^t) is the loss value, x and y respectively represent the features of a single sample in the data set and the corresponding label, L(f(x; W_k^t), y) represents the loss between the model output f(x; W_k^t) and the true value y, η represents the learning rate, and ∇_{W_k^t} ℓ(W_k^t) represents the gradient of ℓ(W_k^t) with respect to W_k^t.
One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:
Because the client clustering method is integrated, this embodiment can group clients according to the similarity of their data distributions without obtaining the clients' real data, and clients can join the system of this embodiment to train the model and perform inference; because the adaptive layer-wise fusion scheme is integrated, each client can obtain the personalized model best suited to it through this embodiment, while the embodiment maintains a global model with good generalization performance so that new clients can join or use the personalized federated learning system. This solves the problem of statistical data heterogeneity in federated learning and the problem of unbalanced performance between the global model and the local models in personalized federated learning, greatly improving the personalization performance of the clients' local models without damaging the global generalization ability.
The electronic device according to an embodiment of the present application is described below with reference to fig. 5. Based on the same inventive concept as the personalized federated learning method based on adaptive clustering hierarchy in the foregoing embodiments, an embodiment of the present application further provides a personalized federated learning system based on adaptive clustering hierarchy, including: a processor coupled to a memory, the memory being configured to store a program that, when executed by the processor, causes the system to perform the method described in any of the foregoing embodiments.
The electronic device 300 includes: processor 302, communication interface 303, memory 301. Optionally, the electronic device 300 may also include a bus architecture 304. Wherein, the communication interface 303, the processor 302 and the memory 301 may be connected to each other through a bus architecture 304; the bus architecture 304 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus architecture 304 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
Processor 302 may be a CPU, microprocessor, ASIC, or one or more integrated circuits for controlling the execution of programs in accordance with the teachings of the present application.
Communication interface 303, using any transceiver or like device, is used to communicate with other devices or communication networks, such as an ethernet, a Radio Access Network (RAN), a Wireless Local Area Network (WLAN), a wired access network, etc.
The memory 301 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may exist independently and be coupled to the processor through the bus architecture 304, or may be integrated with the processor. The memory 301 is used for storing computer-executable instructions for implementing the solution of the present application, and execution is controlled by the processor 302. The processor 302 is configured to execute the computer-executable instructions stored in the memory 301, so as to implement the personalized federated learning method based on adaptive clustering hierarchy provided in the foregoing embodiments of the present application.
The above are only preferred embodiments of the present invention, and the scope of the present invention is not limited to the above examples; all technical solutions that fall within the spirit of the present invention belong to the scope of protection of the present invention. It should be noted that modifications and refinements made by those skilled in the art without departing from the principle of the present invention shall also be regarded as falling within the scope of protection of the present invention.

Claims (9)

1. A personalized federated learning method based on adaptive clustering hierarchy, characterized in that the personalized federated learning method comprises the following steps:
S1, each client prepares a training data set and a test data set for the prediction task, and the global parameter server randomly initializes the global model parameters;
S2, the global parameter server issues the global model parameters to the clients; each client uses the received global model parameters as the initial parameters of its local model, trains the model for the current round on its local training data set, evaluates the model's prediction effect on the test data set after training, calculates the gradient, and uploads the calculated gradient to the parameter server; the parameter server performs weighted averaging on the received gradients of all clients and adjusts the global model parameters with the computed average gradient;
S3, repeating step S2 until the training round reaches the maximum communication round of the first stage, then going to step S4;
S4, the parameter server calculates the similarity between clients according to the gradients uploaded by all clients in the latest round, clusters all clients into groups according to the calculation result, selects a group server for each client group, and generates a layer-wise personalized weight vector for each client group;
S5, the parameter server sends the latest global model parameters to all group servers; the group servers iteratively perform personalized federated learning training within their groups and upload the obtained latest intra-group model parameters to the parameter server; the parameter server performs weighted average aggregation on the received latest intra-group model parameters sent by all client groups to obtain a new global model;
S6, repeating step S5 until the training round reaches the maximum round or the model converges, then ending the process.
2. The personalized federated learning method based on adaptive clustering hierarchy as claimed in claim 1, wherein in step S2, the process in which the global parameter server issues the global model parameters to the clients, each client uses the received global model parameters as the initial parameters of its local model, trains the model for the current round on the local training data set, evaluates the model's prediction effect on the test data set after training, calculates the gradient, and uploads the calculated gradient to the parameter server comprises the following steps:
S21, the global parameter server issues the t-th round parameters W_g^t of the model W_g to the K clients participating in the federated learning training; t ∈ [1, T_pre], where T_pre is the maximum communication round of the first stage, and the round-1 parameters W_g^1 are obtained by random initialization at the global parameter server;
S22, on each client that receives the t-th round parameters W_g^t of the model W_g, the following training steps are performed in parallel:
S221, the client takes the t-th round parameters W_g^t of the model W_g as its initial model parameters, denoted W_k^t, the t-th round initial model parameters of the k-th client's local model;
S222, based on the initial model parameters W_k^t and a training data set D_k^train consisting of N samples randomly extracted from the original data held by the client, the client trains and optimizes the local model, performing E rounds of local iteration with stochastic gradient descent to obtain the optimized model parameters Ŵ_k^t;
S223, the client uses the optimized model parameters Ŵ_k^t to perform predictive inference on the test data set D_k^test, evaluates the prediction effect, and calculates the gradient g_k;
S224, the client sends the gradient g_k to the global parameter server.
3. The personalized federated learning method based on adaptive clustering hierarchy as claimed in claim 2, wherein in step S222, the optimized model parameters Ŵ_k^t are calculated with the following formulas:

ℓ(W_k^t) = (1/N) · Σ_{(x,y)∈D_k^train} L(f(x; W_k^t), y)

Ŵ_k^t = W_k^t − η · ∇_{W_k^t} ℓ(W_k^t)

where N represents the number of samples in the sampled training data set D_k^train, ℓ(W_k^t) is the loss value, x and y respectively represent the features and the corresponding label of a single sample in the data set, L(f(x; W_k^t), y) represents the loss between the model output f(x; W_k^t) and the true value y, η represents the learning rate, and ∇_{W_k^t} ℓ(W_k^t) represents the gradient of ℓ(W_k^t) with respect to W_k^t.
4. The personalized federated learning method based on adaptive clustering hierarchy as claimed in claim 2, wherein in step S2, the process in which the parameter server performs weighted averaging on the received gradients of all clients and adjusts the global model parameters with the computed average gradient comprises the following steps:
the global parameter server calculates the weight proportion of client k from the number of samples n_k in the client's training data set D_k^train: γ_k = n_k / Σ_{k∈K} n_k;
the federated averaging algorithm FedAvg is adopted to weight and aggregate the gradients of all K clients participating in the federated learning training, obtaining the model parameters of round t+1:

W_g^{t+1} = W_g^t − Σ_{k∈K} γ_k · g_k.
5. The personalized federated learning method based on adaptive clustering hierarchy as claimed in claim 1, wherein in step S4, the process in which the parameter server calculates the similarity between clients according to the gradients uploaded by all clients in the latest round, clusters all clients into groups according to the calculation result, and selects a group server for each client group comprises the following steps:
S41, according to the gradients {g_k}_{k∈K} uploaded by all clients in round T_pre, the parameter server obtains a similarity matrix ρ by calculating the pairwise cosine similarity S_C between client gradients, where for clients i, j ∈ K: ρ_{i,j} = S_C(i,j), S_C(i,j) = (g_i · g_j) / (‖g_i‖ · ‖g_j‖);
S42, based on the similarity matrix ρ, the K clients are clustered into M client groups using a top-down hierarchical clustering algorithm, denoted {C_m}_{m∈M};
S43, a group server is selected for each client group to coordinate the training of the clients in the group;
S44, the group server of each client group copies the T_pre-th round parameters W_g^{T_pre} of the global parameter server model W_g as the parameters W_m^{T_pre} of the client group server model W_m, where m ∈ {1, 2, …, M}.
6. The personalized federated learning method based on adaptive clustering hierarchy as claimed in claim 5, wherein the process of generating the layer-wise personalized weight vector for each client group in step S4 comprises the following steps:
the intra-group average gradient D̄_m is calculated from the gradients of the clients within the group:

D̄_m = (1/|C_m|) · Σ_{k∈C_m} g_k;

the average gradient D̄_m is split by model parameter layer and expressed as (D̄_m^1, D̄_m^2, …, D̄_m^l), where l is the total number of model parameter layers;
the Euclidean distance (ℓ2 norm) of the average gradient D̄_m is calculated layer by layer to obtain a 1×l-dimensional vector δ_m:

δ_m = (‖D̄_m^1‖_2, ‖D̄_m^2‖_2, …, ‖D̄_m^l‖_2);

a hyper-parameter β for adjusting the degree of personalization is defined; δ_m is normalized and multiplied by β to obtain the layer-wise personalized model weight ψ_m of the group:

ψ_m = β · δ_m / max(δ_m).
7. The personalized federated learning method based on adaptive clustering hierarchy as claimed in claim 1, wherein in step S5, the process in which the parameter server sends the latest global model parameters to all group servers, the group servers iteratively perform personalized federated learning training within their groups, and the obtained latest intra-group model parameters are uploaded to the parameter server comprises the following steps:
S51, the global parameter server sends the t-th round parameters W_g^t of the model W_g to the M client group servers; t ∈ (T_pre, T_total), where T_pre and T_total are respectively the maximum communication rounds of the first stage and the second stage;
S52, the following steps are executed in parallel on each client group server:
S521, the group server receives the t-th round parameters W_g^t of the global model W_g sent by the parameter server;
S522, the group server fuses the t-th round parameters W_g^t of the global model W_g and the parameters W_m^t of the group model W_m layer by layer, weighted by the layer-wise personalized model weights ψ_m, and updates the parameters of the group model W_m to W̃_m^t; the weighted fusion first decomposes W_g^t and W_m^t by layer into (W_g^{t,1}, …, W_g^{t,l}) and (W_m^{t,1}, …, W_m^{t,l}), and then fuses the parameters of each layer according to the following formula:

W̃_m^{t,n} = ψ_m^n · W_m^{t,n} + (1 − ψ_m^n) · W_g^{t,n}

where W_m^{t,n} represents the parameters of the n-th layer of W_m^t, W_g^{t,n} represents the parameters of the n-th layer of W_g^t, and n ∈ {1, 2, …, l};
S523, the group server sends the parameters W̃_m^t of the group model W_m to the clients in the group;
S53, on each client that receives the t-th round parameters W̃_m^t of the group model W_m, the following training steps are performed in parallel:
S531, the client takes the t-th round parameters W̃_m^t of the group model W_m as its initial model parameters, denoted W_k^t, the t-th round initial model parameters of the k-th client's local model;
S532, based on the initial model parameters W_k^t and a training data set D_k^train consisting of N samples randomly extracted from the original data held by the client, the client trains and optimizes the local model, performing E rounds of local iteration with stochastic gradient descent (SGD) to obtain the optimized model parameters Ŵ_k^t;
S534, the client uses the optimized model parameters Ŵ_k^t to perform predictive inference on the test data set D_k^test, evaluates the prediction effect, and calculates the gradient g_k;
S535, the client sends the gradient g_k to the corresponding group server;
S54, each group server calculates the weight proportion of client k within the group, γ_k^m = n_k / Σ_{k∈C_m} n_k, from the number of samples n_k in the training data sets D_k^train of the clients in the group, and uses the federated averaging algorithm FedAvg to weight and aggregate the gradients of the clients in the group, updating the parameters of the group server model W_m to Ŵ_m^t according to the following formula:

Ŵ_m^t = W̃_m^t − Σ_{k∈C_m} γ_k^m · g_k;

S55, it is judged whether the number of iterations has reached the maximum communication round of the second stage; if so, S56 is executed, otherwise S523 to S54 continue to be executed;
S56, the updated parameters of the group server model W_m are denoted Ŵ_m, and the group server sends the parameters Ŵ_m of the group server model W_m to the global parameter server.
8. The personalized federated learning method based on adaptive clustering hierarchy as claimed in claim 7, wherein in step S5, the process in which the parameter server performs weighted average aggregation on the received latest intra-group model parameters sent by all client groups to obtain a new global model comprises the following steps:
the global parameter server uses the federated averaging algorithm FedAvg to weight and aggregate the model parameters Ŵ_m sent by all group servers, obtaining the t+1-th round parameters W_g^{t+1} of the model W_g, with the specific calculation formula:

W_g^{t+1} = Σ_{m∈M} γ_m · Ŵ_m

where γ_m is the aggregation weight of the m-th group (in FedAvg, proportional to the number of training samples held by the clients of the group).
9. an adaptive clustering hierarchy based personalized federal learning system, comprising a memory, a processor and a computer program stored on the memory and operable on the processor, wherein the processor, when executing the program, implements the steps of the adaptive clustering hierarchy based personalized federal learning method as claimed in any one of claims 1 to 8.
CN202211129262.9A 2022-09-16 2022-09-16 Personalized federal learning method and system based on self-adaptive clustering layering Pending CN115840900A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211129262.9A CN115840900A (en) 2022-09-16 2022-09-16 Personalized federal learning method and system based on self-adaptive clustering layering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211129262.9A CN115840900A (en) 2022-09-16 2022-09-16 Personalized federal learning method and system based on self-adaptive clustering layering

Publications (1)

Publication Number Publication Date
CN115840900A true CN115840900A (en) 2023-03-24

Family

ID=85574918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211129262.9A Pending CN115840900A (en) 2022-09-16 2022-09-16 Personalized federal learning method and system based on self-adaptive clustering layering

Country Status (1)

Country Link
CN (1) CN115840900A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116049862A (en) * 2023-03-13 2023-05-02 杭州海康威视数字技术股份有限公司 Data protection method, device and system based on asynchronous packet federation learning
CN116596065A (en) * 2023-07-12 2023-08-15 支付宝(杭州)信息技术有限公司 Gradient calculation method and device, storage medium, product and electronic equipment
CN116596065B (en) * 2023-07-12 2023-11-28 支付宝(杭州)信息技术有限公司 Gradient calculation method and device, storage medium, product and electronic equipment
CN117216596A (en) * 2023-08-16 2023-12-12 中国人民解放军总医院 Federal learning optimization communication method, system and storage medium based on gradient clustering
CN117216596B (en) * 2023-08-16 2024-04-30 中国人民解放军总医院 Federal learning optimization communication method, system and storage medium based on gradient clustering
CN117010484A (en) * 2023-10-07 2023-11-07 之江实验室 Personalized federal learning generalization method, device and application based on attention mechanism
CN117010484B (en) * 2023-10-07 2024-01-26 之江实验室 Personalized federal learning generalization method, device and application based on attention mechanism
CN117057442A (en) * 2023-10-09 2023-11-14 之江实验室 Model training method, device and equipment based on federal multitask learning
CN117350373A (en) * 2023-11-30 2024-01-05 艾迪恩(山东)科技有限公司 Personalized federal aggregation algorithm based on local self-attention mechanism
CN117350373B (en) * 2023-11-30 2024-03-01 艾迪恩(山东)科技有限公司 Personalized federal aggregation algorithm based on local self-attention mechanism

Similar Documents

Publication Publication Date Title
CN115840900A (en) Personalized federal learning method and system based on self-adaptive clustering layering
CN113191484A (en) Federal learning client intelligent selection method and system based on deep reinforcement learning
CN113326731A (en) Cross-domain pedestrian re-identification algorithm based on momentum network guidance
CN110263236B (en) Social network user multi-label classification method based on dynamic multi-view learning model
CN107194672B (en) Review distribution method integrating academic expertise and social network
CN112380433A (en) Recommendation meta-learning method for cold-start user
CN111814963B (en) Image recognition method based on deep neural network model parameter modulation
CN111353534B (en) Graph data category prediction method based on adaptive fractional order gradient
CN116503676A (en) Picture classification method and system based on knowledge distillation small sample increment learning
CN117391247A (en) Enterprise risk level prediction method and system based on deep learning
CN116226689A (en) Power distribution network typical operation scene generation method based on Gaussian mixture model
CN115359298A (en) Sparse neural network-based federal meta-learning image classification method
CN111475158A (en) Sub-domain dividing method and device, electronic equipment and computer readable storage medium
CN111192158A (en) Transformer substation daily load curve similarity matching method based on deep learning
Yang Combination forecast of economic chaos based on improved genetic algorithm
CN117078312B (en) Advertisement putting management method and system based on artificial intelligence
CN111353525A (en) Modeling and missing value filling method for unbalanced incomplete data set
CN116415177A (en) Classifier parameter identification method based on extreme learning machine
CN115690476A (en) Automatic data clustering method based on improved harmony search algorithm
CN108268898A (en) A kind of electronic invoice user clustering method based on K-Means
CN111027709B (en) Information recommendation method and device, server and storage medium
CN113377884A (en) Event corpus purification method based on multi-agent reinforcement learning
CN111814190A (en) Privacy protection method based on differential privacy distributed deep learning optimization
Khotimah et al. Adaptive SOMMI (Self Organizing Map Multiple Imputation) base on Variation Weight for Incomplete Data
CN117557870B (en) Classification model training method and system based on federal learning client selection

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination