CN115952860A - Heterogeneous statistics-oriented clustering federated learning method - Google Patents

Heterogeneous statistics-oriented clustering federated learning method

Info

Publication number
CN115952860A
Authority
CN
China
Prior art keywords
node, clustering, model, edge, cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310060893.8A
Other languages
Chinese (zh)
Inventor
左方
高铭远
刘家萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University
Original Assignee
Henan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University filed Critical Henan University
Priority to CN202310060893.8A priority Critical patent/CN115952860A/en
Publication of CN115952860A publication Critical patent/CN115952860A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a heterogeneous statistics-oriented clustering federated learning method, which comprises the following steps: step 1, constructing an edge node distribution classifier; step 2, determining a metric for edge node clustering; step 3, determining a clustering method for the node clusters; step 4, clustering the edge nodes with the clustering method; step 5, the server initializes the global model and sends it to the head node of each node cluster; step 6, after receiving the model, an edge node trains it on its local dataset, updates it, and sends the updated model to the next node in the cluster for training, until all nodes in each cluster have completed training and the updated model is uploaded to the server; step 7, the server receives the updated models of all clusters, performs a weighted average, and updates the global model; step 8, steps 6 and 7 are repeated until the global model converges. Compared with traditional federated learning methods, the method is more efficient and more widely applicable.

Description

Heterogeneous statistics-oriented clustering federated learning method
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a heterogeneous statistics-oriented clustering federated learning method.
Background
Modern mobile and Internet of Things devices (e.g., smartphones, smart wearables, smart home devices) produce large amounts of data every day, which provides opportunities to build complex machine learning (ML) models for challenging artificial intelligence tasks. In conventional high-performance computing (HPC), all data is collected and concentrated in one place for processing by a supercomputer with hundreds to thousands of compute nodes. However, concerns about security and privacy have led to new legislation, such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA), that prevents data from being transmitted to a centralized location, making traditional high-performance computing difficult to apply to collecting and processing scattered data. Federated learning addresses these security and privacy challenges by using decentralized data: local models are trained on the local data of each client (data owner), and a central aggregator accumulates the learning gradients of the local models to train a global model, giving rise to an emerging high-performance computing paradigm. While the computing resources of a single client may be far less powerful than the compute nodes of a traditional supercomputer, the computing power of a large number of clients can be accumulated to form a very powerful "decentralized virtual supercomputer". Federated learning has proven successful in a range of applications, from consumer devices such as GBoard and keyword spotting to the pharmaceutical, medical research, financial, and manufacturing industries.
The data in federated learning is owned by the clients and may vary widely in quantity and content, resulting in severe data heterogeneity that does not typically occur in data-center distributed learning, where the data distribution is well controlled. In data-center distributed learning, the categories and characteristics of the training data are evenly distributed across all clients, i.e., independently and identically distributed (IID). In federated learning, however, the distribution of data classes and features depends on the data owner, resulting in a non-uniform data distribution referred to as non-IID data heterogeneity. This heterogeneity greatly affects training time and accuracy, and a technical solution for this situation is needed.
Disclosure of Invention
To address the problem that in federated learning the distribution of data classes and features depends on the data owners, making the data distribution non-uniform and thereby greatly affecting training time and accuracy, the invention provides a heterogeneous statistics-oriented clustering federated learning method for federated learning environments with statistical data heterogeneity, realizing a more efficient and more widely applicable federated learning method.
In order to achieve the purpose, the invention adopts the following technical scheme:
a heterogeneous statistics-oriented clustering federal learning method comprises the following steps:
step 1, constructing an edge node distribution classifier;
step 2, determining a measurement index of edge node clustering;
step 3, determining a clustering method of the node cluster;
step 4, clustering the edge nodes by using a clustering method;
step 5, the server initializes the global model and sends the model to the head node of each node cluster;
step 6, after receiving the model, the edge node performs local training on a local data set and updates the model, sends the updated model to the next node in the cluster for training until all the nodes in each cluster complete training, and uploads the updated model to the server;
step 7, the server receives the updated models of all clusters, then carries out weighted average and updates the global model;
step 8, repeating step 6 and step 7 until the global model converges.
Further, the step 1 comprises:
splitting the global model $f_\theta$ into a deep feature extractor $f_{\theta_{feat}}$ and a classifier $f_{\theta_{clf}}$, where $\theta = (\theta_{feat}, \theta_{clf})$ is the parameter set of the global model;
before federated learning formally starts, a pre-training phase is used to estimate the data distribution on the edge nodes participating in training, during which each edge node $k$ starts from the same random initialization $\theta^{0}$ and performs $e$ rounds of training on its local dataset, updating the model to $\theta_k^{e}$;
the edge node distribution classifier is constructed either from the parameters $\psi_{clf}$ of the local classifier $f_{\theta_{clf,k}^{e}}$ or from its predictions $\psi_{conf}$ on a public dataset $D_{pub}$ held at the server side;
at the server side, the classifier is applied to the updated model $\theta_k^{e}$ of each edge node to obtain an estimate $\hat{P}_k = \psi(\theta_k^{e})$ of the node's data distribution.
Further, the step 2 comprises:
starting from the approximation $\hat{P}_k$ of the data distribution of edge node $k$, similar node clusters are established out of nodes with different distributions, so that the distance between node clusters is minimized while the distance within each node cluster is maximized;
cosine and Euclidean distances are used to compare the weights $\psi_{clf}$ of the client classifiers, while the KL divergence is used as the metric for $\psi_{conf}$, which is given in the form of an actual probability distribution (a confidence vector).
Further, the clustering method comprises the following strategies:
strategy 1: the clients are randomly assigned to the node clusters until a defined stopping criterion is met;
strategy 2: first, $N_S$ homogeneous clusters are obtained with the K-means method; then all node clusters are formed by iteratively extracting one edge node at a time from each K-means cluster, until in each node cluster $S$ the number of samples satisfies $n_S \geq n_{S,min}$ and the number of edge nodes satisfies $K_S \leq K_{S,max}$;
strategy 3: an edge node $k_i$, $i \in [K]$, is randomly selected and assigned to the current node cluster $S$; then a second edge node $k_j$ is selected such that the distance between $k_i$ and $k_j$ is maximal, i.e. $k_j = \arg\max_j \tau(\hat{P}_i, \hat{P}_j)$; this process is repeated, each time adding the node that maximizes the total intra-cluster distance, until the set maximum number of edge nodes $K_{S,max}$ or the minimum number of samples $n_{S,min}$ is reached, where $\tau$ is the clustering metric.
Compared with the prior art, the invention has the following beneficial effects:
the method is suitable for the federal learning environment with data statistics heterogeneity, can be conveniently deployed under the traditional two-layer framework of the server-edge node, and can also be expanded and deployed under the three-layer framework of the cloud-edge server-edge node. Compared with the traditional federal learning method, the method is more efficient and has stronger applicability.
Drawings
Fig. 1 is a first schematic flow chart of a heterogeneous statistics-oriented clustering federated learning method according to an embodiment of the present invention;
Fig. 2 is a second schematic flow chart of a heterogeneous statistics-oriented clustering federated learning method according to an embodiment of the present invention.
Detailed Description
The invention is further illustrated by the following examples in conjunction with the accompanying drawings:
the goal of traditional federal learning is to learn a global model
Figure BDA0004061210170000035
Each edge node K ∈ [ K ]]Can be based on the local data set->
Figure BDA0004061210170000036
To obtain n k The FedAvg is a method based on T communication rounds of iteration and aims to solve the problem of ^ er>
Figure BDA0004061210170000037
Wherein
Figure BDA0004061210170000038
Is a local empirical risk,/ k Is the cross entropy loss, n = ∑ Σ k n k Is the total amount of data involved in the training. At each round T e [ T ∈ [ ]]The server will theta t Is sent to a randomly selected->
Figure BDA0004061210170000041
A part of the customer. Each client
Figure BDA0004061210170000042
Using D by minimizing local objects k Performing local gradient descent to reduce theta t Is updated to be->
Figure BDA0004061210170000043
And returns it to the server. The updated model is then summarized by the server to a new global model->
Figure BDA0004061210170000044
Is medium, i.e.>
Figure BDA0004061210170000045
However, in real-world scenarios, there is no guarantee that local datasets from different customers are independently extracted from the same underlying distribution.
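For illustration, the following is a minimal sketch of one FedAvg communication round in PyTorch. The function name, the structure of `clients`, and the assumption that all state-dict entries are floating-point tensors are simplifications for this example, not part of the invention.

```python
import copy
import torch


def fedavg_round(global_model, clients, lr=0.01, local_epochs=1):
    """One FedAvg round: broadcast theta^t, run local SGD, weighted-average."""
    states, sizes = [], []
    for loader in clients:  # each element: an iterable of (x, y) mini-batches
        model = copy.deepcopy(global_model)
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        loss_fn = torch.nn.CrossEntropyLoss()
        model.train()
        for _ in range(local_epochs):
            for x, y in loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
        states.append(model.state_dict())
        sizes.append(sum(len(y) for _, y in loader))  # n_k
    # theta^{t+1} = sum_k (n_k / n_{C^t}) * theta_k^{t+1}
    n = float(sum(sizes))
    avg = {key: sum(s[key].float() * (m / n) for s, m in zip(states, sizes))
           for key in states[0]}
    global_model.load_state_dict(avg)
    return global_model
```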
To address these problems, as shown in Fig. 1 and Fig. 2, the invention provides a heterogeneous statistics-oriented clustering federated learning method comprising the following steps:
step one, constructing an edge node distribution classifier psi. The invention integrates a global model f θ Separation into a depth feature extractor
Figure BDA0004061210170000046
And a classifier->
Figure BDA0004061210170000047
Wherein θ = (θ) featclf ) Is a set of parameters of the global model. The classification output is selected by>
Figure BDA00040612101700000421
It is given. Before the formal start of federal learning, a pre-training phase is used to estimate the data distribution on the edge nodes participating in training, during which each edge node k initializes θ from the same random 0 Initially, e rounds of training are performed on their local data sets to update the model to ≧>
Figure BDA0004061210170000048
The invention uses two strategies, based on local classifier @, respectively>
Figure BDA0004061210170000049
Radix Ginseng (radix Ginseng)Number psi clf Or its public data set at the server side->
Figure BDA00040612101700000410
Upper prediction psi conf . For strategy one, assume that the weight of the classifier can represent the local distribution of each client and directly feed it back to the clustering method φ (.) . For strategy two, at a common "feature set">
Figure BDA00040612101700000411
Up-test each pick>
Figure BDA00040612101700000412
Wherein->
Figure BDA00040612101700000413
Containing c e [ N [ ] C ]J samples of (a). Then according to class>
Figure BDA00040612101700000414
The predictions are averaged and a confidence vector for the kth client is defined as ≥>
Figure BDA00040612101700000415
On the server side, the classifier psi is used, and the updated model is combined with the updated edge node>
Figure BDA00040612101700000416
An estimate of the data distribution of the node is obtained>
Figure BDA00040612101700000417
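As a concrete illustration, the sketch below gives one plausible reading of the two strategies. The attribute name `model.classifier` and the reduction of the per-class averages to a single confidence per class are assumptions for this example rather than choices prescribed by the invention.

```python
import torch
import torch.nn.functional as F


def psi_clf(model):
    """Strategy one: flatten the classifier-head weights and use them as a
    proxy for the node's local data distribution."""
    head = model.classifier  # assumed attribute holding f_{theta_clf}
    return torch.cat([p.detach().flatten() for p in head.parameters()])


def psi_conf(model, public_sets):
    """Strategy two: average the softmax predictions per class on the public
    dataset D_pub; public_sets[c] is a tensor of J inputs of class c."""
    model.eval()
    per_class = []
    with torch.no_grad():
        for x_c in public_sets:
            probs = F.softmax(model(x_c), dim=1)  # shape (J, N_C)
            per_class.append(probs.mean(dim=0))   # mean prediction for class c
    # One plausible confidence vector: the model's mean confidence in the
    # true class c, yielding a length-N_C probability-like vector.
    return torch.stack([p[c] for c, p in enumerate(per_class)])
```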
Step two, determining the clustering metric $\tau$. Starting from the approximation $\hat{P}_k$ of the data distribution of edge node $k$, similar node clusters are established out of nodes with different distributions, in order to minimize the distance between node clusters while maximizing the distance within each node cluster. Given $\hat{P}_i$ and $\hat{P}_j$, a metric $\tau(\hat{P}_i, \hat{P}_j)$ for measuring the distance between two distribution estimates is required. Cosine and Euclidean distances are used to compare the client classifier weights $\psi_{clf}$, while the KL divergence is used as the metric for $\psi_{conf}$, which is given in the form of an actual probability distribution (a confidence vector).
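A minimal sketch of the metric $\tau$ follows; the function signature and the `kind` switch are assumptions made for this example.

```python
import torch
import torch.nn.functional as F


def tau(p_i, p_j, kind="kl"):
    """Distance between two distribution estimates produced by psi."""
    if kind == "cosine":      # for psi_clf weight vectors
        return 1.0 - F.cosine_similarity(p_i, p_j, dim=0)
    if kind == "euclidean":   # for psi_clf weight vectors
        return torch.dist(p_i, p_j, p=2)
    # KL divergence for psi_conf confidence vectors; eps guards against
    # log(0), and both vectors are renormalized to sum to one.
    eps = 1e-12
    p = (p_i + eps) / (p_i + eps).sum()
    q = (p_j + eps) / (p_j + eps).sum()
    return torch.sum(p * torch.log(p / q))
```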
Step three, determining the clustering method $\phi$ of the node clusters. First, define $D_S = \bigcup_{k \in S} D_k$ as the union of the data of the clients belonging to node cluster $S$. The aim is to find the maximum number of node clusters $N_S$ satisfying the constraints of a minimum number of samples $n_{S,min}$ and a maximum number of clients $K_{S,max}$ per cluster. Given the edge node distribution classifier $\psi(\cdot)$ and the clustering metric $\tau$, the invention introduces three strategies to find an approximation of this maximization problem. The first is the $\phi_{rand}$ strategy, a simple and practical method in which clients are randomly assigned to node clusters until a defined stopping criterion is met. The second is the $\phi_{kmeans}$ strategy, based on the K-means algorithm: first, $N_S$ homogeneous clusters are obtained with the K-means method; then all node clusters are formed by iteratively extracting one edge node at a time from each K-means cluster, until in each node cluster $S$ the number of samples satisfies $n_S \geq n_{S,min}$ and the number of edge nodes satisfies $K_S \leq K_{S,max}$. Finally, the $\phi_{greedy}$ strategy follows a greedy approach to generating node clusters: initially, an edge node $k_i$, $i \in [K]$, is randomly selected and assigned to the current node cluster $S$; then a second edge node $k_j$ is selected such that the distance between $k_i$ and $k_j$ is maximal, i.e. $k_j = \arg\max_j \tau(\hat{P}_i, \hat{P}_j)$; this process is repeated, each time adding the node that maximizes the total intra-cluster distance, until the preset maximum number of edge nodes $K_{S,max}$ or the minimum number of samples $n_{S,min}$ is reached.
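The sketch below illustrates $\phi_{greedy}$ under the stated constraints; the exact handling of the stopping conditions is a simplifying assumption for this example.

```python
import random


def phi_greedy(estimates, sizes, k_s_max, n_s_min, tau):
    """Greedily build node clusters that are internally as diverse as possible.

    estimates: dict node_id -> distribution estimate hat{P}_k from psi
    sizes:     dict node_id -> local sample count n_k
    """
    unassigned = set(estimates)
    clusters = []
    while unassigned:
        seed = random.choice(sorted(unassigned))  # randomly selected k_i
        unassigned.remove(seed)
        cluster, n_samples = [seed], sizes[seed]
        # Grow until the cluster reaches K_{S,max} nodes or n_{S,min} samples.
        while unassigned and len(cluster) < k_s_max and n_samples < n_s_min:
            # Add the node whose distribution is farthest from the current
            # cluster, maximizing intra-cluster diversity.
            best = max(unassigned,
                       key=lambda j: sum(float(tau(estimates[i], estimates[j]))
                                         for i in cluster))
            unassigned.remove(best)
            cluster.append(best)
            n_samples += sizes[best]
        clusters.append(cluster)
    return clusters
```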
Step four, according to the clustering method, after pre-training is finished all edge nodes participating in training are divided into $N_S$ node clusters, so that edge nodes with different distributions are combined in the same cluster while edge nodes with similar distributions are separated.
Step five, the server initializes the global model $\theta^{t}$, communicates with all edge nodes participating in training, and sends the model to the head node $k_{i,1}$ of each node cluster $S_i$, $i \in [N_S]$.
Step six, node $k_{i,1}$, upon receiving the model $\theta^{t}$, performs $E_k$ rounds of training on its local data $D_{k_{i,1}}$, updating the model to $\theta_{k_{i,1}}^{t}$; it then sends $\theta_{k_{i,1}}^{t}$ to the next edge node $k_{i,2}$ in the node cluster, and this process is repeated until the last client $k_{i,K_{S_i}}$ of the node cluster has received the model and completed local training. After $k_{i,K_{S_i}}$ finishes training, it sends the updated model $\theta_{S_i}^{t}$ to the head node $k_{i,1}$ of the cluster.
Step seven, the head node $k_{i,1}$ of each cluster, after receiving the model $\theta_{S_i}^{t}$, judges according to the training effect whether step six should be repeated for $E_S$ further sub-rounds; if this is not required, it sends $\theta_{S_i}^{t}$ to the server. After receiving the updated models returned by all node clusters, the server averages the model updates as $\theta^{t+1} = \sum_{i=1}^{N_S} \frac{n_{S_i}}{n}\, \theta_{S_i}^{t}$, where $n_{S_i}$ is the number of training samples in cluster $S_i$.
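Putting steps five to seven together, the following sketch simulates one global round on a single machine; in a real deployment the inner loop would be a model hand-off between nodes rather than a loop, and all names here are assumptions for illustration.

```python
import copy
import torch


def train_cluster_sequentially(global_model, cluster_loaders, lr=0.01, epochs=1):
    """Steps five and six: the model visits each node of one cluster in turn,
    starting from the head node k_{i,1}."""
    model = copy.deepcopy(global_model)
    loss_fn = torch.nn.CrossEntropyLoss()
    for loader in cluster_loaders:  # k_{i,1}, k_{i,2}, ... in order
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        model.train()
        for _ in range(epochs):     # E_k local rounds per node
            for x, y in loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
        # In deployment, the updated weights are now sent to the next node.
    return model.state_dict()


def global_round(global_model, clusters, cluster_sizes):
    """Step seven: weighted average of the models returned by all clusters."""
    states = [train_cluster_sequentially(global_model, c) for c in clusters]
    n = float(sum(cluster_sizes))
    avg = {key: sum(s[key].float() * (m / n)
                    for s, m in zip(states, cluster_sizes))
           for key in states[0]}
    global_model.load_state_dict(avg)
    return global_model
```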
Step eight, repeating step six and step seven until the global model converges.
It should be noted that if the method is applied to a three-layer cloud-edge server-edge node architecture, then in the edge server-edge node layer each edge server can simply be regarded as the server in the above steps, while in the server-edge server layer the edge servers can be regarded as the edge nodes in the above steps, with sequential training performed among the edge servers without clustering. The rationale is that merging models is only useful once the models have been trained on a larger dataset; statistically, after $N_S$ rounds each model may have been trained on the entire dataset, so that the performance of this strategy approaches that of a centralized strategy.
The above shows only the preferred embodiments of the present invention, and it should be noted that it is obvious to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also be considered as the protection scope of the present invention.

Claims (4)

1. A heterogeneous statistics-oriented clustering federated learning method is characterized by comprising the following steps:
step 1, constructing an edge node distribution classifier;
step 2, determining a measurement index of edge node clustering;
step 3, determining a clustering method of the node cluster;
step 4, clustering the edge nodes by using a clustering method;
step 5, the server initializes the global model and sends the model to the head node of each node cluster;
step 6, after receiving the model, the edge node performs local training on a local data set and updates the model, sends the updated model to the next node in the cluster for training until all the nodes in each cluster complete training, and uploads the updated model to the server;
step 7, the server receives the updated models of all clusters, then carries out weighted average and updates the global model;
step 8, repeating step 6 and step 7 until the global model converges.
2. The heterogeneous statistics-oriented clustering federated learning method according to claim 1, wherein the step 1 comprises:
splitting the global model $f_\theta$ into a deep feature extractor $f_{\theta_{feat}}$ and a classifier $f_{\theta_{clf}}$, where $\theta = (\theta_{feat}, \theta_{clf})$ is the parameter set of the global model;
before federated learning formally starts, using a pre-training phase to estimate the data distribution on the edge nodes participating in training, during which each edge node $k$ starts from the same random initialization $\theta^{0}$ and performs $e$ rounds of training on its local dataset, updating the model to $\theta_k^{e}$;
constructing the edge node distribution classifier either from the parameters $\psi_{clf}$ of the local classifier $f_{\theta_{clf,k}^{e}}$ or from its predictions $\psi_{conf}$ on a public dataset $D_{pub}$ held at the server side;
at the server side, applying the classifier to the updated model $\theta_k^{e}$ of each edge node to obtain an estimate $\hat{P}_k = \psi(\theta_k^{e})$ of the node's data distribution.
3. The heterogeneous statistics-oriented clustering federated learning method according to claim 2, wherein the step 2 comprises:
starting from the approximation $\hat{P}_k$ of the data distribution of edge node $k$, establishing similar node clusters out of nodes with different distributions, so that the distance between node clusters is minimized while the distance within each node cluster is maximized;
using cosine and Euclidean distances to compare the weights $\psi_{clf}$ of the client classifiers, and using the KL divergence as the metric for $\psi_{conf}$, which is given in the form of an actual probability distribution (a confidence vector).
4. The heterogeneous statistics-oriented clustering federated learning method according to claim 1, wherein the clustering method comprises:
strategy 1: randomly assigning the clients to the node clusters until a defined stopping criterion is met;
strategy 2: first obtaining $N_S$ homogeneous clusters with the K-means method, then forming all node clusters by iteratively extracting one edge node at a time from each K-means cluster, until in each node cluster $S$ the number of samples satisfies $n_S \geq n_{S,min}$ and the number of edge nodes satisfies $K_S \leq K_{S,max}$;
strategy 3: randomly selecting an edge node $k_i$, $i \in [K]$, and assigning it to the current node cluster $S$; then selecting a second edge node $k_j$ such that the distance between $k_i$ and $k_j$ is maximal, i.e. $k_j = \arg\max_j \tau(\hat{P}_i, \hat{P}_j)$; repeating this process, each time adding the node that maximizes the total intra-cluster distance, until the preset maximum number of edge nodes $K_{S,max}$ or the minimum number of samples $n_{S,min}$ is reached, where $\tau$ is the clustering metric.
CN202310060893.8A 2023-01-17 2023-01-17 Heterogeneous statistics-oriented clustering federated learning method Pending CN115952860A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310060893.8A CN115952860A (en) 2023-01-17 2023-01-17 Heterogeneous statistics-oriented clustering federated learning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310060893.8A CN115952860A (en) 2023-01-17 2023-01-17 Heterogeneous statistics-oriented clustering federated learning method

Publications (1)

Publication Number Publication Date
CN115952860A true CN115952860A (en) 2023-04-11

Family

ID=87282541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310060893.8A Pending CN115952860A (en) 2023-01-17 2023-01-17 Heterogeneous statistics-oriented clustering federated learning method

Country Status (1)

Country Link
CN (1) CN115952860A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117806838A (en) * 2024-02-29 2024-04-02 浪潮电子信息产业股份有限公司 Heterogeneous data-based device clustering method, apparatus, device, system and medium
CN117806838B (en) * 2024-02-29 2024-06-04 浪潮电子信息产业股份有限公司 Heterogeneous data-based device clustering method, apparatus, device, system and medium


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination