CN117829307A - Federated learning method and system for data heterogeneity - Google Patents

Federated learning method and system for data heterogeneity

Info

Publication number
CN117829307A
Authority
CN
China
Prior art keywords
user
data
model
cluster
gradient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311769640.4A
Other languages
Chinese (zh)
Inventor
赵川
魏宇楠
赵圣楠
埃尔加内·阿米娜
林宇成
鞠雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Quancheng Provincial Laboratory
Original Assignee
Quancheng Provincial Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Quancheng Provincial Laboratory
Priority to CN202311769640.4A
Publication of CN117829307A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a federated learning method and system for data heterogeneity, belonging to the technical field of federated learning. The method comprises the following steps: the central server acquires user-side data, performs federated learning training to cluster the user sides, and is connected with the user sides through acceleration nodes; the user side acquires the global model gradient and the aggregated gradient through the acceleration nodes, performs local model training, updates the local model, and sends the local model gradient to the acceleration node; each acceleration node aggregates the model gradients of its user sides and sends the aggregated model gradient to the central server, each acceleration node being connected with the user sides in one user cluster; after receiving the user gradients sent by the acceleration nodes, the central server aggregates them to obtain the global model gradient, updates the global model, and then distributes the aggregated gradient to the user sides through the acceleration nodes. The invention guarantees the training accuracy of the model and improves the efficiency and performance of federated learning.

Description

Federated learning method and system for data heterogeneity
Technical Field
The invention relates to a federated learning method and system for data heterogeneity, and belongs to the technical field of federated learning.
Background
In recent years, with the rapid increase in the number of smart phones and smart wearable devices, a large amount of user-generated data is stored on these terminal devices, which have storage and data processing capabilities. As a new machine learning paradigm, federated learning (FL) can make full use of the computing, storage, and data resources of edge devices to implement collaborative model training across multiple devices while protecting the privacy of users' local data. FL has been used in a number of application areas, including human behavior prediction, emotion detection, natural language processing, and enterprise infrastructure.
McMahan et al. of Google first proposed the basic framework of FL and designed a parameter aggregation algorithm named FedAvg, but non-IID (non-independent and identically distributed) data affects the prediction accuracy of the global model, and as data heterogeneity increases, the accuracy of FedAvg drops significantly. To address the challenges presented by non-IID data, FedDyn and SCAFFOLD each use regularization methods to estimate global knowledge of the data distribution across all devices. However, this approach can produce large deviations when only a small number of devices participate in each round of training. Studies have shown that momentum can be used to improve the accuracy of FL and can be combined with other methods. Schemes such as CMFL, Oort, and Favor neutralize the effects of non-IID data and accelerate convergence by selecting a set of "excellent" devices to participate in each round of training, but these methods do not take full advantage of the valuable data stored on a small number of devices. FedRep and FedMD build different models for each device using transfer learning and knowledge distillation, respectively, which makes it difficult for new devices to select an appropriate model for initialization.
In a typical FL framework, the user devices participating in training first perform local training on their own data and then upload the updated model parameters to the central server. In each round, the central server aggregates the model parameters from different devices and then broadcasts the aggregated global model parameters back to the devices. Throughout the FL training process, the training data never leaves each device, so data privacy is protected. FL typically involves a large number of devices with highly heterogeneous hardware resources (CPU, memory, and network resources) and non-IID data. Existing FL frameworks can be divided into synchronous FL (e.g., FedAvg) and asynchronous FL (e.g., FedAsync). When large-scale heterogeneous devices perform federated learning, synchronous FL often causes idle effects, with devices entering an idle state; asynchronous FL can keep devices from falling idle, but the more powerful devices then communicate with the server over more rounds, possibly causing the server to crash, and the stale models uploaded by less powerful devices can also degrade the training results of the global model. In addition, in both frameworks, the non-IID data stored on the devices causes significant differences in the weights of the device updates, greatly affecting the training accuracy of the final model.
Disclosure of Invention
In order to solve the above problems, the invention provides a federated learning method and system for data heterogeneity, which can guarantee the training accuracy of the model and improve the efficiency and performance of federated learning.
The technical scheme adopted for solving the technical problems is as follows:
in a first aspect, the federated learning method for data heterogeneity provided by an embodiment of the present invention includes the following steps:
the central server acquires the data of the user terminal, performs federated learning training to cluster the user terminals, and is connected with the user terminals through acceleration nodes;
the user side acquires global model gradients and aggregation gradients through the acceleration nodes, performs local model training, updates the local model and sends the local model gradients to the acceleration nodes;
the acceleration nodes aggregate the model gradients of the user terminals and then send the aggregated model gradients to the central server, and each acceleration node is connected with the user terminal in one user cluster;
and after receiving the user gradient sent by the acceleration node, the central server aggregates the user gradient to obtain a global model gradient and updates the global model, and then distributes the aggregated gradient to the user terminal through the acceleration node.
As a possible implementation manner of this embodiment, the federated learning training process includes:
acquiring the data distribution condition of a user side, and constructing a data distribution matrix of the user side;
calculating the EMD distance between user-side data distributions to obtain a user similarity matrix;
calculating Euclidean distances to obtain user clusters;
and aggregating the client model gradients based on the cluster.
As a possible implementation manner of this embodiment, each row of the data distribution matrix of the user terminal corresponds to the number of data samples of different data labels owned by one user terminal.
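As a small illustration of such a matrix (using assumed label arrays and a 10-class setting such as MNIST; the function and variable names are not taken from the patent), one possible construction is:

```python
import numpy as np

def data_distribution_matrix(client_labels, num_classes=10):
    """Build the user-side data distribution matrix: one row per client,
    one column per data label, entries are sample counts."""
    mat = np.zeros((len(client_labels), num_classes), dtype=int)
    for row, labels in enumerate(client_labels):
        counts = np.bincount(np.asarray(labels), minlength=num_classes)
        mat[row] = counts[:num_classes]
    return mat

# toy example: three clients with skewed label holdings
clients = [np.array([0, 0, 0, 3, 3]), np.array([1, 1, 4]), np.array([7, 7, 9, 9])]
print(data_distribution_matrix(clients))
```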
As a possible implementation manner of this embodiment, the calculating of the EMD distance between user-side data distributions to obtain the user similarity matrix includes:
assume that the data probability distributions of user terminal $U_j$ and user terminal $U_l$ are $P=\{(p_1,w_{p_1}),(p_2,w_{p_2}),\dots,(p_m,w_{p_m})\}$ and $Q=\{(q_1,w_{q_1}),(q_2,w_{q_2}),\dots,(q_m,w_{q_m})\}$ respectively, wherein $w_{p_i}$ and $w_{q_j}$ denote the weight of a feature value under the corresponding distribution;
define the distance matrix between P and Q as $Dist=[d_{ij}]$, wherein $d_{ij}$ denotes the distance between $p_i$ and $q_j$, and find the flow matrix $F=[f_{ij}]$ that minimizes the total flow cost from $p_i$ to $q_j$, namely:

$$\min_{F}\ \sum_{i=1}^{m}\sum_{j=1}^{m} f_{ij}\,d_{ij}\quad \text{s.t.}\quad f_{ij}\ge 0,\quad \sum_{j=1}^{m} f_{ij}\le w_{p_i},\quad \sum_{i=1}^{m} f_{ij}\le w_{q_j},\quad \sum_{i=1}^{m}\sum_{j=1}^{m} f_{ij}=\min\Big(\sum_{i=1}^{m} w_{p_i},\ \sum_{j=1}^{m} w_{q_j}\Big),$$

wherein $f_{ij}$ denotes the amount of flow from $p_i$ to $q_j$;
after finding the optimal flow matrix F, the EMD distance between the two data distributions P and Q is:

$$EMD(P,Q)=\frac{\sum_{i=1}^{m}\sum_{j=1}^{m} f_{ij}\,d_{ij}}{\sum_{i=1}^{m}\sum_{j=1}^{m} f_{ij}},$$

wherein $1\le i\le m$ and $1\le j\le m$.
As a possible implementation manner of this embodiment, the calculating of Euclidean distances to obtain user clusters includes:
converting the distance matrix Dist into an undirected weighted graph $G=\langle V,E\rangle$, wherein each vertex represents a user and the weight of each edge, $w_{jl}$, is the distance between the two users $U_j$ and $U_l$ it connects, i.e. $w_{jl}=d_{jl}=EMD(U_j,U_l)$;
for each vertex $v_j\in V$, defining its degree $d_j$ as the sum of the weights of all edges connected to it, i.e. $d_j=\sum_{v_l\in V} w_{jl}$, and calculating the degree matrix of the distance matrix Dist:

$$D=\mathrm{diag}(d_1,d_2,\dots,d_n);$$

calculating the Laplacian matrix of graph G:

$$L=D-W,$$

wherein $W=[w_{jl}]$ is the weighted adjacency matrix;
letting the point sets of the k subgraphs of graph G be $A_1,A_2,\dots,A_k$, with $A_i\cap A_j=\varnothing$ for $i\ne j$ and $A_1\cup A_2\cup\dots\cup A_k=V$;
for $A,B\subset V$ with $A\cap B=\varnothing$, defining the graph-cut loss function between A and B as:

$$W(A,B)=\sum_{j\in A,\,l\in B} w_{jl};$$

the graph-cut loss function of the k subgraphs is defined as:

$$\mathrm{cut}(A_1,A_2,\dots,A_k)=\frac{1}{2}\sum_{i=1}^{k} W(A_i,\bar{A}_i);$$

for $A\subset V$, defining $\mathrm{vol}(A):=\sum_{j\in A} d_j$;
performing optimal subgraph cutting on G with the Normalized Cut method, wherein the graph-cut loss is defined as:

$$\mathrm{NCut}(A_1,A_2,\dots,A_k)=\sum_{i=1}^{k}\frac{W(A_i,\bar{A}_i)}{\mathrm{vol}(A_i)},$$

wherein $\bar{A}_j$ is the complement of $A_j$; introducing the indicator matrix $H=[h_1,h_2,\dots,h_k]\in\mathbb{R}^{n\times k}$;
for each vector $h_j$, defining:

$$h_{ij}=\begin{cases}\dfrac{1}{\sqrt{\mathrm{vol}(A_j)}}, & v_i\in A_j,\\ 0, & v_i\notin A_j;\end{cases}$$

for each $h_i$, it then holds that:

$$h_i^{\mathsf T}L\,h_i=\frac{W(A_i,\bar{A}_i)}{\mathrm{vol}(A_i)},$$

namely:

$$\mathrm{NCut}(A_1,A_2,\dots,A_k)=\sum_{i=1}^{k} h_i^{\mathsf T}L\,h_i=\mathrm{tr}(H^{\mathsf T}LH),\quad\text{s.t. } H^{\mathsf T}DH=I;$$

letting $T=D^{1/2}H$, there is:

$$\mathrm{tr}(H^{\mathsf T}LH)=\mathrm{tr}(T^{\mathsf T}D^{-1/2}LD^{-1/2}T),\quad\text{s.t. } T^{\mathsf T}T=I;$$

the optimization objective function is therefore:

$$\arg\min_{T}\ \mathrm{tr}(T^{\mathsf T}D^{-1/2}LD^{-1/2}T)\quad\text{s.t. } T^{\mathsf T}T=I,$$

wherein $T\in\mathbb{R}^{n\times k}$ is the feature matrix formed, after row normalization, by the eigenvectors corresponding to the k smallest eigenvalues of $D^{-1/2}LD^{-1/2}$;
each row of the feature matrix $T$ is treated as a sample; k cluster centers $E=\{e_1,e_2,\dots,e_k\}$ are initialized in the sample set, and in each iteration every sample computes its Euclidean distance to each cluster center and is assigned to the closest cluster $c_i$, until the maximum number of iterations is reached, obtaining the clustering result $C=\{c_i\mid i\in[1,k]\}$.
As a possible implementation manner of this embodiment, the cluster-based aggregation of client model gradients includes:
after completing local model training, user terminal $U_j$ in cluster $c_i$ periodically transmits its model gradient $W_j$ to acceleration node $AN_i$; $AN_i$ uploads the aggregated model gradient of the cluster to the central server at regular intervals, and the central server takes the distance from each user in each cluster to the cluster center as the corresponding weight for parameter aggregation when aggregating the gradients, wherein $l_{ij}$ denotes the distance between user $U_j$ and cluster center $e_i$, $L_i$ denotes the average distance between the users in cluster $c_i$ and the cluster center, and the aggregation result is the global model gradient after the (r+1)-th round of iteration.
As a possible implementation manner of this embodiment, the user side includes a smart phone, a PC, or a smart wearable device.
In a second aspect, the federated learning system for data heterogeneity provided in the embodiment of the present invention includes a client, an acceleration node, and a central server,
the central server acquires user-side data, performs federated learning training to cluster the user sides, and is connected with the user sides through acceleration nodes;
the user side acquires global model gradients and aggregation gradients through the acceleration nodes, performs local model training, updates the local model and sends the local model gradients to the acceleration nodes;
the acceleration nodes aggregate the model gradients of the user terminals and then send the aggregated model gradients to the central server, and each acceleration node is connected with the user terminal in one user cluster;
and after receiving the user gradient sent by the acceleration node, the central server aggregates the user gradient to obtain a global model gradient and updates the global model, and then distributes the aggregated gradient to the user terminal through the acceleration node.
As a possible implementation manner of this embodiment, the federated learning training process performed by the central server is:
acquiring the data distribution condition of a user side, and constructing a data distribution matrix of the user side;
calculating the EMD distance between user-side data distributions to obtain a user similarity matrix;
calculating Euclidean distances to obtain user clusters;
and aggregating the client model gradients based on the cluster.
As a possible implementation manner of this embodiment, the user side includes a smart phone, a PC, or a smart wearable device.
The technical scheme of the embodiment of the invention has the following beneficial effects:
the federal learning method for data isomerism comprises the following steps: the central server acquires the data of the user terminal, performs federal learning training to cluster the user terminal, and is connected with the user terminal through an acceleration node; the user side acquires global model gradients and aggregation gradients through the acceleration nodes, performs local model training, updates the local model and sends the local model gradients to the acceleration nodes; the acceleration nodes aggregate the model gradients of the user terminals and then send the aggregated model gradients to the central server, and each acceleration node is connected with the user terminal in one user cluster; and after receiving the user gradient sent by the acceleration node, the central server aggregates the user gradient to obtain a global model gradient and updates the global model, and then distributes the aggregated gradient to the user terminal through the acceleration node. The invention ensures that the heterogeneity of the user side data does not influence the model training effect, and can fully utilize the computing resources on the user side equipment, thereby improving the efficiency and performance of federal learning.
Aiming at the heterogeneity of data among users, the method groups users with similar data distributions based on spectral clustering and then carries out federated model training, thereby reducing the influence of data heterogeneity and effectively improving model accuracy. In the parameter aggregation stage, the method takes the distance between the users in a cluster and the cluster center as part of the aggregation weight, further reducing the influence of differences in user data distribution. The invention guarantees the training accuracy of the model and improves the efficiency and performance of federated learning.
Drawings
FIG. 1 is a flow chart illustrating a federated learning method for data heterogeneity according to an exemplary embodiment;
FIG. 2 is a block diagram of a federated learning system for data heterogeneity, according to an exemplary embodiment;
FIG. 3 is a diagram of a federated learning training framework, according to an exemplary embodiment;
fig. 4 is a schematic diagram illustrating a user data preprocessing process according to an exemplary embodiment.
Detailed Description
The invention is further illustrated by the following examples in conjunction with the accompanying drawings:
in order to clearly illustrate the technical features of the present solution, the present invention will be described in detail below with reference to the following detailed description and the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different structures of the invention. In order to simplify the present disclosure, components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and processes are omitted so as to not unnecessarily obscure the present invention.
As shown in fig. 1, the federal learning method for data isomerism provided by the embodiment of the invention includes the following steps:
the central server acquires the data of the user terminal, performs federated learning training to cluster the user terminals, and is connected with the user terminals through acceleration nodes;
the user side acquires global model gradients and aggregation gradients through the acceleration nodes, performs local model training, updates the local model and sends the local model gradients to the acceleration nodes;
the acceleration nodes aggregate the model gradients of the user terminals and then send the aggregated model gradients to the central server, and each acceleration node is connected with the user terminal in one user cluster;
and after receiving the user gradient sent by the acceleration node, the central server aggregates the user gradient to obtain a global model gradient and updates the global model, and then distributes the aggregated gradient to the user terminal through the acceleration node.
As a possible implementation manner of this embodiment, the federated learning training process includes:
acquiring the data distribution condition of a user side, and constructing a data distribution matrix of the user side;
calculating the EMD distance between user-side data distributions to obtain a user similarity matrix;
calculating Euclidean distances to obtain user clusters;
and aggregating the client model gradients based on the cluster.
As a possible implementation manner of this embodiment, each row of the data distribution matrix of the user terminal corresponds to the number of data samples of different data labels owned by one user terminal.
As a possible implementation manner of this embodiment, the calculating of the EMD distance between user-side data distributions to obtain the user similarity matrix includes:
assume that the data probability distributions of user terminal $U_j$ and user terminal $U_l$ are $P=\{(p_1,w_{p_1}),(p_2,w_{p_2}),\dots,(p_m,w_{p_m})\}$ and $Q=\{(q_1,w_{q_1}),(q_2,w_{q_2}),\dots,(q_m,w_{q_m})\}$ respectively, wherein $w_{p_i}$ and $w_{q_j}$ denote the weight of a feature value under the corresponding distribution;
define the distance matrix between P and Q as $Dist=[d_{ij}]$, wherein $d_{ij}$ denotes the distance between $p_i$ and $q_j$, and find the flow matrix $F=[f_{ij}]$ that minimizes the total flow cost from $p_i$ to $q_j$, namely:

$$\min_{F}\ \sum_{i=1}^{m}\sum_{j=1}^{m} f_{ij}\,d_{ij}\quad \text{s.t.}\quad f_{ij}\ge 0,\quad \sum_{j=1}^{m} f_{ij}\le w_{p_i},\quad \sum_{i=1}^{m} f_{ij}\le w_{q_j},\quad \sum_{i=1}^{m}\sum_{j=1}^{m} f_{ij}=\min\Big(\sum_{i=1}^{m} w_{p_i},\ \sum_{j=1}^{m} w_{q_j}\Big),$$

wherein $f_{ij}$ denotes the amount of flow from $p_i$ to $q_j$;
after finding the optimal flow matrix F, the EMD distance between the two data distributions P and Q is:

$$EMD(P,Q)=\frac{\sum_{i=1}^{m}\sum_{j=1}^{m} f_{ij}\,d_{ij}}{\sum_{i=1}^{m}\sum_{j=1}^{m} f_{ij}},$$

wherein $1\le i\le m$ and $1\le j\le m$.
As a possible implementation manner of this embodiment, the calculating of Euclidean distances to obtain user clusters includes:
converting the distance matrix Dist into an undirected weighted graph $G=\langle V,E\rangle$, wherein each vertex represents a user and the weight of each edge, $w_{jl}$, is the distance between the two users $U_j$ and $U_l$ it connects, i.e. $w_{jl}=d_{jl}=EMD(U_j,U_l)$;
for each vertex $v_j\in V$, defining its degree $d_j$ as the sum of the weights of all edges connected to it, i.e. $d_j=\sum_{v_l\in V} w_{jl}$, and calculating the degree matrix of the distance matrix Dist:

$$D=\mathrm{diag}(d_1,d_2,\dots,d_n);$$

calculating the Laplacian matrix of graph G:

$$L=D-W,$$

wherein $W=[w_{jl}]$ is the weighted adjacency matrix;
letting the point sets of the k subgraphs of graph G be $A_1,A_2,\dots,A_k$, with $A_i\cap A_j=\varnothing$ for $i\ne j$ and $A_1\cup A_2\cup\dots\cup A_k=V$;
for $A,B\subset V$ with $A\cap B=\varnothing$, defining the graph-cut loss function between A and B as:

$$W(A,B)=\sum_{j\in A,\,l\in B} w_{jl};$$

the graph-cut loss function of the k subgraphs is defined as:

$$\mathrm{cut}(A_1,A_2,\dots,A_k)=\frac{1}{2}\sum_{i=1}^{k} W(A_i,\bar{A}_i);$$

for $A\subset V$, defining $\mathrm{vol}(A):=\sum_{j\in A} d_j$;
performing optimal subgraph cutting on G with the Normalized Cut method, wherein the graph-cut loss is defined as:

$$\mathrm{NCut}(A_1,A_2,\dots,A_k)=\sum_{i=1}^{k}\frac{W(A_i,\bar{A}_i)}{\mathrm{vol}(A_i)},$$

wherein $\bar{A}_j$ is the complement of $A_j$; introducing the indicator matrix $H=[h_1,h_2,\dots,h_k]\in\mathbb{R}^{n\times k}$;
for each vector $h_j$, defining:

$$h_{ij}=\begin{cases}\dfrac{1}{\sqrt{\mathrm{vol}(A_j)}}, & v_i\in A_j,\\ 0, & v_i\notin A_j;\end{cases}$$

for each $h_i$, it then holds that:

$$h_i^{\mathsf T}L\,h_i=\frac{W(A_i,\bar{A}_i)}{\mathrm{vol}(A_i)},$$

namely:

$$\mathrm{NCut}(A_1,A_2,\dots,A_k)=\sum_{i=1}^{k} h_i^{\mathsf T}L\,h_i=\mathrm{tr}(H^{\mathsf T}LH),\quad\text{s.t. } H^{\mathsf T}DH=I;$$

letting $T=D^{1/2}H$, there is:

$$\mathrm{tr}(H^{\mathsf T}LH)=\mathrm{tr}(T^{\mathsf T}D^{-1/2}LD^{-1/2}T),\quad\text{s.t. } T^{\mathsf T}T=I;$$

the optimization objective function is therefore:

$$\arg\min_{T}\ \mathrm{tr}(T^{\mathsf T}D^{-1/2}LD^{-1/2}T)\quad\text{s.t. } T^{\mathsf T}T=I,$$

wherein $T\in\mathbb{R}^{n\times k}$ is the feature matrix formed, after row normalization, by the eigenvectors corresponding to the k smallest eigenvalues of $D^{-1/2}LD^{-1/2}$;
each row of the feature matrix $T$ is treated as a sample; k cluster centers $E=\{e_1,e_2,\dots,e_k\}$ are initialized in the sample set, and in each iteration every sample computes its Euclidean distance to each cluster center and is assigned to the closest cluster $c_i$, until the maximum number of iterations is reached, obtaining the clustering result $C=\{c_i\mid i\in[1,k]\}$.
As a possible implementation manner of this embodiment, the cluster-based aggregation of client model gradients includes:
after completing local model training, user terminal $U_j$ in cluster $c_i$ periodically transmits its model gradient $W_j$ to acceleration node $AN_i$; $AN_i$ uploads the aggregated model gradient of the cluster to the central server at regular intervals, and the central server takes the distance from each user in each cluster to the cluster center as the corresponding weight for parameter aggregation when aggregating the gradients, wherein $l_{ij}$ denotes the distance between user $U_j$ and cluster center $e_i$, $L_i$ denotes the average distance between the users in cluster $c_i$ and the cluster center, and the aggregation result is the global model gradient after the (r+1)-th round of iteration.
As a possible implementation manner of this embodiment, the user side includes a smart phone, a PC, or a smart wearable device.
As shown in fig. 2, the federated learning system for data heterogeneity provided in the embodiment of the present invention includes a client, an acceleration node and a central server,
the central server acquires user-side data, performs federated learning training to cluster the user sides, and is connected with the user sides through acceleration nodes;
the user side acquires global model gradients and aggregation gradients through the acceleration nodes, performs local model training, updates the local model and sends the local model gradients to the acceleration nodes;
the acceleration nodes aggregate the model gradients of the user terminals and then send the aggregated model gradients to the central server, and each acceleration node is connected with the user terminal in one user cluster;
and after receiving the user gradient sent by the acceleration node, the central server aggregates the user gradient to obtain a global model gradient and updates the global model, and then distributes the aggregated gradient to the user terminal through the acceleration node.
As a possible implementation manner of this embodiment, the federated learning training process performed by the central server is:
acquiring the data distribution condition of a user side, and constructing a data distribution matrix of the user side;
calculating the EMD distance between user-side data distributions to obtain a user similarity matrix;
calculating Euclidean distances to obtain user clusters;
and aggregating the client model gradients based on the cluster.
The invention provides a federated learning framework for data-heterogeneous conditions, aiming to solve the technical problem of user data heterogeneity in federated learning. The system model, as shown in fig. 3, mainly comprises three types of entities:
center Server (CS): the central server is a type of cloud computing equipment which is reliable, high in reliability and high in computing power and data processing power, and after receiving the user gradient sent by the acceleration nodes, the central server aggregates the user gradient to obtain a global model gradient and updates the global model, and then distributes the aggregated gradient to the user through each acceleration node.
Acceleration Node (AN): the accelerating node is a terminal device with the functions of calculation, storage and network routing, is closer to a user than the central server, has smaller communication delay and is mainly used for reducing the communication delay between the user and the central server and the communication bandwidth of the central server. After the central server clusters the users, each acceleration node is responsible for being connected with terminal equipment in one user cluster, periodically aggregates the model gradients of the users in the group and then sends the aggregated model gradients to the central server, and simultaneously distributes global model gradients to the user equipment.
User terminal: each user terminal holds a small local data set, and the data across user terminals are non-independently and identically distributed (non-IID). A user can communicate with the central server using terminal devices such as a smart phone, a PC, or a smart wearable device, so as to collaboratively train a more accurate and efficient machine learning model. In each round of FL, the user side performs local model training and uploads the local model gradient to an AN; the user side obtains the global model gradient from the CS to update the local model, and then performs a new round of training with its own data, until training is completed.
The federated learning training process of the system is shown in algorithm 1. Without loss of generality, it is assumed that there are m acceleration nodes, each connected to one user cluster, and that each cluster contains n user devices connected to the same acceleration node. It is also assumed that the central server knows the union of all user data features, but cannot distinguish the data features owned by any single user.
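For readers who want a concrete picture of this flow, the following is a minimal, illustrative sketch of one communication round in such a hierarchical setup, assuming the user clusters have already been formed; the function names, the simulated gradients, and the size-weighted server-side combination are assumptions of this sketch, not the patent's reference implementation.

```python
import numpy as np

def client_update(global_grad, rng, dim=10):
    """Stand-in for local training: each client perturbs the received global
    gradient to produce its own local model gradient."""
    return global_grad + rng.normal(scale=0.1, size=dim)

def acceleration_node_aggregate(client_grads):
    """Each acceleration node averages the gradients of its user cluster."""
    return np.mean(client_grads, axis=0)

def central_server_aggregate(node_grads, cluster_sizes):
    """The central server combines per-cluster gradients (size-weighted here)."""
    w = np.asarray(cluster_sizes, dtype=float)
    return np.average(np.stack(node_grads), axis=0, weights=w / w.sum())

rng = np.random.default_rng(0)
dim, clusters = 10, [4, 3, 5]            # three user clusters of different sizes
global_grad = np.zeros(dim)

for r in range(3):                        # a few communication rounds
    node_grads = []
    for n_users in clusters:              # each acceleration node serves one cluster
        grads = [client_update(global_grad, rng, dim) for _ in range(n_users)]
        node_grads.append(acceleration_node_aggregate(grads))
    global_grad = central_server_aggregate(node_grads, clusters)
    print(f"round {r}: ||global gradient|| = {np.linalg.norm(global_grad):.4f}")
```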
1.1 data preprocessing stage
For the user devices participating in the current round of training, the central server first counts the data distribution of each user, as shown in fig. 4 (a). Taking the MNIST data set as an example, the data labels fall into 10 categories, and each row of the matrix corresponds to the number of data samples of each data label owned by one device. The EMD distance between every two users is then calculated. Assume that the data probability distributions of users $U_j$ and $U_l$ are $P=\{(p_1,w_{p_1}),\dots,(p_m,w_{p_m})\}$ and $Q=\{(q_1,w_{q_1}),\dots,(q_m,w_{q_m})\}$, where $w_{p_i}$ and $w_{q_j}$ denote the weight of a feature value under the corresponding distribution (here the weight of each data feature is equal by default). Define the distance matrix between P and Q as $Dist=[d_{ij}]$, where $d_{ij}$ denotes the distance between $p_i$ and $q_j$ ($1\le i\le m$, $1\le j\le m$), and finally find a flow matrix $F=[f_{ij}]$ that minimizes the total flow cost from $p_i$ to $q_j$, namely:

$$\min_{F}\ \sum_{i=1}^{m}\sum_{j=1}^{m} f_{ij}\,d_{ij}\quad \text{s.t.}\quad f_{ij}\ge 0,\quad \sum_{j=1}^{m} f_{ij}\le w_{p_i},\quad \sum_{i=1}^{m} f_{ij}\le w_{q_j},\quad \sum_{i=1}^{m}\sum_{j=1}^{m} f_{ij}=\min\Big(\sum_{i=1}^{m} w_{p_i},\ \sum_{j=1}^{m} w_{q_j}\Big),$$

where $f_{ij}$ denotes the amount of flow from $p_i$ to $q_j$. This is an optimization problem that minimizes a linear function under linear constraints. After the optimal F is found, the EMD distance between the two data distributions is:

$$EMD(P,Q)=\frac{\sum_{i=1}^{m}\sum_{j=1}^{m} f_{ij}\,d_{ij}}{\sum_{i=1}^{m}\sum_{j=1}^{m} f_{ij}}.$$
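As an illustration of this EMD computation, the sketch below solves the transportation linear program with SciPy for two users' normalized label-count vectors. The ground-distance matrix between labels (|i - j| here) and all variable names are assumptions of this sketch, not values fixed by the patent.

```python
import numpy as np
from scipy.optimize import linprog

def emd(w_p, w_q, dist):
    """Earth Mover's Distance between two normalized label distributions.

    w_p, w_q : length-m weight vectors (here: normalized label counts)
    dist     : m x m ground-distance matrix d_ij between labels
    Solves  min sum_ij f_ij * d_ij  over the flow matrix F, then normalizes
    by the total flow. With both inputs normalized, the marginal constraints
    become equalities.
    """
    m = len(w_p)
    c = dist.reshape(-1)                      # objective coefficients
    A_eq = np.zeros((2 * m, m * m))
    for i in range(m):
        A_eq[i, i * m:(i + 1) * m] = 1.0      # sum_j f_ij = w_p[i]
    for j in range(m):
        A_eq[m + j, j::m] = 1.0               # sum_i f_ij = w_q[j]
    b_eq = np.concatenate([w_p, w_q])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    assert res.success
    flow = res.x
    return float(flow @ c / flow.sum())

# example: label-count rows of two users on a 10-class dataset (e.g. MNIST)
counts_u = np.array([600, 0, 0, 550, 0, 0, 0, 620, 0, 0], dtype=float)
counts_v = np.array([0, 580, 0, 0, 610, 0, 0, 0, 0, 590], dtype=float)
w_u, w_v = counts_u / counts_u.sum(), counts_v / counts_v.sum()
labels = np.arange(10, dtype=float)
ground = np.abs(labels[:, None] - labels[None, :])   # |i - j| as ground distance
print(f"EMD(U, V) = {emd(w_u, w_v, ground):.4f}")
```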
after the distance matrix Dist between users is obtained (shown in fig. 4 (b)), it is converted into an undirected weighted graph g=<V,E>As shown in FIG. 4 (c), where each vertex represents a user, the weight of each edgeFor any two users U j And U l Distance between, i.e.)> />
Thereafter, algorithm 2 is run forDefinition of its degree d j For the sum of the weights of all edges connected to it, i.e.Then calculating to obtain a degree matrix of the distance matrix Dist:
the laplacian matrix of graph G is further calculated:
let the point set of k subgraphs of graph G be A 1 ,A 2 ,…,A k WhereinAnd A is 1 ∪A 2 ∪…∪A k =v. For->B ε V, define the tangent pattern loss function between A and B as:
the tangent pattern loss function of the k-subgraph is defined as:
for the followingDefinition of vol (A): sigma j∈A d j . G was performed using the Normalized Cut mapping method
Cutting an optimal subgraph, and defining graph cutting loss as follows:
wherein the method comprises the steps ofIs A j Is a complement of (a). Introducing an indication matrix->For vector h j Definition: />
For the followingThe method comprises the following steps:
namely:
order theThen there are:
the optimization objective function is therefore:
wherein the method comprises the steps ofIs->The feature matrix is formed by the feature vectors corresponding to the first k minimum feature values according to line standardization. For feature matrix->Each row of which is considered as a sample, and k is initialized in the sample set as shown in algorithm 3 The clustering centers E= { E 1 ,e 2 ,…,e k In each iteration, each sample calculates the Euclidean distance from each cluster center and is integrated into cluster c with the shortest distance i Until the maximum iteration times are reached, obtaining a clustering result C= { C i |i∈[1,k]}。/>
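The spectral clustering steps above (algorithm 2 followed by the k-means of algorithm 3) can be illustrated with the following sketch. Because the original formula images are not reproduced here, the conversion of EMD distances into edge affinities (a Gaussian kernel below), the bandwidth parameter, and the toy distance matrix are assumptions of this illustration rather than the patent's exact definitions.

```python
import numpy as np
from scipy.linalg import eigh

def spectral_cluster(dist, k, sigma=0.5, n_iter=100, seed=0):
    """Cluster users from a pairwise EMD distance matrix via normalized cuts.

    dist  : n x n symmetric matrix of EMD distances between users
    k     : number of user clusters
    sigma : bandwidth of the (assumed) Gaussian kernel turning distances
            into edge affinities
    """
    n = dist.shape[0]
    W = np.exp(-(dist ** 2) / (2 * sigma ** 2))     # affinity matrix (assumption)
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)                               # vertex degrees
    L = np.diag(d) - W                              # Laplacian L = D - W
    L_sym = np.diag(d ** -0.5) @ L @ np.diag(d ** -0.5)   # D^-1/2 L D^-1/2
    _, eigvecs = eigh(L_sym)                        # eigenvalues in ascending order
    T = eigvecs[:, :k]                              # k smallest eigenvalues
    T = T / np.linalg.norm(T, axis=1, keepdims=True)      # row normalization

    # plain k-means on the rows of T (Euclidean distance to cluster centers)
    rng = np.random.default_rng(seed)
    centers = T[rng.choice(n, size=k, replace=False)]
    for _ in range(n_iter):
        assign = np.argmin(((T[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        centers = np.array([T[assign == c].mean(axis=0) if np.any(assign == c)
                            else centers[c] for c in range(k)])
    return assign

# toy EMD distance matrix for 6 users forming two groups with similar label mixes
dist = np.array([[0.0, 0.1, 0.1, 0.9, 0.9, 0.9],
                 [0.1, 0.0, 0.1, 0.9, 0.9, 0.9],
                 [0.1, 0.1, 0.0, 0.9, 0.9, 0.9],
                 [0.9, 0.9, 0.9, 0.0, 0.1, 0.1],
                 [0.9, 0.9, 0.9, 0.1, 0.0, 0.1],
                 [0.9, 0.9, 0.9, 0.1, 0.1, 0.0]])
print(spectral_cluster(dist, k=2))   # e.g. [0 0 0 1 1 1] (cluster labels may swap)
```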
1.2 Gradient aggregation stage
After completing local model training, user device $U_j$ in cluster $c_i$ periodically transmits its model gradient $W_j$ to acceleration node $AN_i$, and $AN_i$ periodically uploads the aggregated model gradient of the cluster to the CS. By default, the weight of each model uploaded to the server is unrelated to the amount of data on the device uploading it: every local model has the same weight, which avoids gradient aggregation errors caused by differences in the sizes of user data sets. When the resource heterogeneity of the devices is very high, the system instead uses the distance from each user in a cluster to the cluster center as the corresponding weight for parameter aggregation, where $l_{ij}$ denotes the distance between user $U_j$ and cluster center $e_i$, $L_i$ denotes the average distance between the users in cluster $c_i$ and the cluster center, and the aggregation result is the global model gradient after the (r+1)-th round of iteration. The CS distributes the aggregated model gradient to each $AN_i$, which then distributes it to each user device $U_j$.
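The exact weighting formula is given only as an image in the original text, so the sketch below uses one plausible choice, weights inversely proportional to each user's distance from its cluster center and normalized within the cluster, purely to illustrate the two-level aggregation; the helper names and the size-weighted combination at the central server are likewise assumptions.

```python
import numpy as np

def aggregate_cluster(grads, dists_to_center, eps=1e-8):
    """Aggregate one cluster's gradients, weighting each user by its distance
    to the cluster center. The inverse-distance weighting used here is an
    illustrative choice, not the patent's exact formula."""
    d = np.asarray(dists_to_center, dtype=float) + eps
    w = (1.0 / d) / (1.0 / d).sum()               # closer to center -> larger weight
    return np.tensordot(w, np.stack(grads), axes=1)

def aggregate_global(cluster_grads, cluster_sizes):
    """Central server combines the per-cluster aggregates into the global gradient."""
    w = np.asarray(cluster_sizes, dtype=float)
    return np.average(np.stack(cluster_grads), axis=0, weights=w / w.sum())

# two clusters, three users each, 4-dimensional gradients
rng = np.random.default_rng(2)
g1 = [rng.normal(size=4) for _ in range(3)]
g2 = [rng.normal(size=4) for _ in range(3)]
agg1 = aggregate_cluster(g1, dists_to_center=[0.2, 0.5, 0.9])
agg2 = aggregate_cluster(g2, dists_to_center=[0.1, 0.4, 0.4])
print(aggregate_global([agg1, agg2], cluster_sizes=[3, 3]))
```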
Aiming at the heterogeneity of data among users, the method groups users with similar data distributions based on spectral clustering and then carries out federated model training, thereby reducing the influence of data heterogeneity and effectively improving model accuracy.
In the parameter aggregation stage, the method takes the distance between the users in the cluster and the cluster center as a part of the aggregation weight consideration, and further reduces the influence caused by the user data distribution difference.
The invention guarantees the training accuracy of the model and improves the efficiency and performance of federated learning.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (10)

1. A federated learning method for data heterogeneity, characterized by comprising the following steps:
the central server acquires the data of the user terminal, performs federated learning training to cluster the user terminals, and is connected with the user terminals through acceleration nodes;
the user side acquires global model gradients and aggregation gradients through the acceleration nodes, performs local model training, updates the local model and sends the local model gradients to the acceleration nodes;
the acceleration nodes aggregate the model gradients of the user terminals and then send the aggregated model gradients to the central server, and each acceleration node is connected with the user terminal in one user cluster;
and after receiving the user gradient sent by the acceleration node, the central server aggregates the user gradient to obtain a global model gradient and updates the global model, and then distributes the aggregated gradient to the user terminal through the acceleration node.
2. The data heterogeneity-oriented federated learning method of claim 1, wherein the federated learning training process comprises:
acquiring the data distribution condition of a user side, and constructing a data distribution matrix of the user side;
calculating the EMD distance between user-side data distributions to obtain a user similarity matrix;
calculating Euclidean distances to obtain user clusters;
and aggregating the client model gradients based on the cluster.
3. The federated learning method for data heterogeneity according to claim 2, wherein each row of the client data distribution matrix corresponds to the number of data samples of different data labels owned by a client.
4. The federated learning method for data heterogeneity according to claim 3, wherein calculating the EMD distance between user-side data distributions to obtain the user similarity matrix comprises:
assuming that the data probability distributions of user terminal $U_j$ and user terminal $U_l$ are $P=\{(p_1,w_{p_1}),(p_2,w_{p_2}),\dots,(p_m,w_{p_m})\}$ and $Q=\{(q_1,w_{q_1}),(q_2,w_{q_2}),\dots,(q_m,w_{q_m})\}$ respectively, wherein $w_{p_i}$ and $w_{q_j}$ denote the weight of a feature value under the corresponding distribution;
defining the distance matrix between P and Q as $Dist=[d_{ij}]$, wherein $d_{ij}$ denotes the distance between $p_i$ and $q_j$, and finding the flow matrix $F=[f_{ij}]$ that minimizes the total flow cost from $p_i$ to $q_j$, namely:

$$\min_{F}\ \sum_{i=1}^{m}\sum_{j=1}^{m} f_{ij}\,d_{ij}\quad \text{s.t.}\quad f_{ij}\ge 0,\quad \sum_{j=1}^{m} f_{ij}\le w_{p_i},\quad \sum_{i=1}^{m} f_{ij}\le w_{q_j},\quad \sum_{i=1}^{m}\sum_{j=1}^{m} f_{ij}=\min\Big(\sum_{i=1}^{m} w_{p_i},\ \sum_{j=1}^{m} w_{q_j}\Big),$$

wherein $f_{ij}$ denotes the amount of flow from $p_i$ to $q_j$;
after finding the optimal flow matrix F, the EMD distance between the two data distributions P and Q is:

$$EMD(P,Q)=\frac{\sum_{i=1}^{m}\sum_{j=1}^{m} f_{ij}\,d_{ij}}{\sum_{i=1}^{m}\sum_{j=1}^{m} f_{ij}},$$

wherein $1\le i\le m$ and $1\le j\le m$.
5. The data heterogeneity-oriented federated learning method of claim 4, wherein the calculating of Euclidean distances to obtain user clusters comprises:
converting the distance matrix Dist into an undirected weighted graph $G=\langle V,E\rangle$, wherein each vertex represents a user and the weight of each edge, $w_{jl}$, is the distance between the two users $U_j$ and $U_l$ it connects, i.e. $w_{jl}=d_{jl}=EMD(U_j,U_l)$;
for each vertex $v_j\in V$, defining its degree $d_j$ as the sum of the weights of all edges connected to it, i.e. $d_j=\sum_{v_l\in V} w_{jl}$, and calculating the degree matrix of the distance matrix Dist:

$$D=\mathrm{diag}(d_1,d_2,\dots,d_n);$$

calculating the Laplacian matrix of graph G:

$$L=D-W,$$

wherein $W=[w_{jl}]$ is the weighted adjacency matrix;
letting the point sets of the k subgraphs of graph G be $A_1,A_2,\dots,A_k$, with $A_i\cap A_j=\varnothing$ for $i\ne j$ and $A_1\cup A_2\cup\dots\cup A_k=V$;
for $A,B\subset V$ with $A\cap B=\varnothing$, defining the graph-cut loss function between A and B as:

$$W(A,B)=\sum_{j\in A,\,l\in B} w_{jl};$$

the graph-cut loss function of the k subgraphs is defined as:

$$\mathrm{cut}(A_1,A_2,\dots,A_k)=\frac{1}{2}\sum_{i=1}^{k} W(A_i,\bar{A}_i);$$

for $A\subset V$, defining $\mathrm{vol}(A):=\sum_{j\in A} d_j$;
performing optimal subgraph cutting on G with the Normalized Cut method, wherein the graph-cut loss is defined as:

$$\mathrm{NCut}(A_1,A_2,\dots,A_k)=\sum_{i=1}^{k}\frac{W(A_i,\bar{A}_i)}{\mathrm{vol}(A_i)},$$

wherein $\bar{A}_j$ is the complement of $A_j$; introducing the indicator matrix $H=[h_1,h_2,\dots,h_k]\in\mathbb{R}^{n\times k}$;
for each vector $h_j$, defining:

$$h_{ij}=\begin{cases}\dfrac{1}{\sqrt{\mathrm{vol}(A_j)}}, & v_i\in A_j,\\ 0, & v_i\notin A_j;\end{cases}$$

for each $h_i$, it then holds that:

$$h_i^{\mathsf T}L\,h_i=\frac{W(A_i,\bar{A}_i)}{\mathrm{vol}(A_i)},$$

namely:

$$\mathrm{NCut}(A_1,A_2,\dots,A_k)=\sum_{i=1}^{k} h_i^{\mathsf T}L\,h_i=\mathrm{tr}(H^{\mathsf T}LH),\quad\text{s.t. } H^{\mathsf T}DH=I;$$

letting $T=D^{1/2}H$, there is:

$$\mathrm{tr}(H^{\mathsf T}LH)=\mathrm{tr}(T^{\mathsf T}D^{-1/2}LD^{-1/2}T),\quad\text{s.t. } T^{\mathsf T}T=I;$$

the optimization objective function is therefore:

$$\arg\min_{T}\ \mathrm{tr}(T^{\mathsf T}D^{-1/2}LD^{-1/2}T)\quad\text{s.t. } T^{\mathsf T}T=I,$$

wherein $T\in\mathbb{R}^{n\times k}$ is the feature matrix formed, after row normalization, by the eigenvectors corresponding to the k smallest eigenvalues of $D^{-1/2}LD^{-1/2}$;
each row of the feature matrix $T$ is treated as a sample; k cluster centers $E=\{e_1,e_2,\dots,e_k\}$ are initialized in the sample set, and in each iteration every sample computes its Euclidean distance to each cluster center and is assigned to the closest cluster $c_i$, until the maximum number of iterations is reached, obtaining the clustering result $C=\{c_i\mid i\in[1,k]\}$.
6. The data heterogeneity-oriented federated learning method of claim 5, wherein the cluster-based aggregation of client model gradients comprises:
after completing local model training, user terminal $U_j$ in cluster $c_i$ periodically transmits its model gradient $W_j$ to acceleration node $AN_i$; $AN_i$ uploads the aggregated model gradient of the cluster to the central server at regular intervals, and the central server takes the distance from each user in each cluster to the cluster center as the corresponding weight for parameter aggregation when aggregating the gradients, wherein $l_{ij}$ denotes the distance between user $U_j$ and cluster center $e_i$, $L_i$ denotes the average distance between the users in cluster $c_i$ and the cluster center, and the aggregation result is the global model gradient after the (r+1)-th round of iteration.
7. The federated learning method for data heterogeneity according to any one of claims 1-6, wherein the client comprises a smart phone, a PC, or a smart wearable device.
8. A federated learning system oriented to data heterogeneity, characterized by comprising a user side, an acceleration node and a central server,
the central server acquires user-side data, performs federated learning training to cluster the user sides, and is connected with the user sides through acceleration nodes;
the user side acquires global model gradients and aggregation gradients through the acceleration nodes, performs local model training, updates the local model and sends the local model gradients to the acceleration nodes;
the acceleration nodes aggregate the model gradients of the user terminals and then send the aggregated model gradients to the central server, and each acceleration node is connected with the user terminal in one user cluster;
and after receiving the user gradient sent by the acceleration node, the central server aggregates the user gradient to obtain a global model gradient and updates the global model, and then distributes the aggregated gradient to the user terminal through the acceleration node.
9. The data heterogeneity-oriented federated learning system of claim 8, wherein the federated learning training process performed by the central server is:
acquiring the data distribution condition of a user side, and constructing a data distribution matrix of the user side;
calculating the EMD distance between user-side data distributions to obtain a user similarity matrix;
calculating Euclidean distances to obtain user clusters;
and aggregating the client model gradients based on the cluster.
10. The data heterogeneity-oriented federated learning system according to claim 8 or 9, wherein the client comprises a smart phone, a PC, or a smart wearable device.
CN202311769640.4A 2023-12-20 2023-12-20 Federal learning method and system for data heterogeneity Pending CN117829307A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311769640.4A CN117829307A (en) 2023-12-20 2023-12-20 Federal learning method and system for data heterogeneity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311769640.4A CN117829307A (en) 2023-12-20 2023-12-20 Federal learning method and system for data heterogeneity

Publications (1)

Publication Number Publication Date
CN117829307A true CN117829307A (en) 2024-04-05

Family

ID=90522227

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311769640.4A Pending CN117829307A (en) 2023-12-20 2023-12-20 Federal learning method and system for data heterogeneity

Country Status (1)

Country Link
CN (1) CN117829307A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118171076A (en) * 2024-05-14 2024-06-11 中国矿业大学 Data feature extraction method, system and computer equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination