CN112364913A - Federated learning communication traffic optimization method and system based on a core data set - Google Patents

Federated learning communication traffic optimization method and system based on a core data set

Info

Publication number
CN112364913A
Authority
CN
China
Prior art keywords
model
global model
layer
global
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011240064.0A
Other languages
Chinese (zh)
Inventor
肖春华
李开菊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202011240064.0A priority Critical patent/CN112364913A/en
Publication of CN112364913A publication Critical patent/CN112364913A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Neurology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to the field of federated machine learning and discloses a core-data-set-based federated learning communication traffic optimization method and system. In the method, each end user first screens core data out of its local training data in parallel, the cloud center constructs a sparse global model according to a set sparsification proportion, and each end user trains a local model on its screened core data to obtain a local model update. Then, to make the global model better fit the local core data, the cloud center adjusts the network structure of the global model according to the global model update obtained by aggregating the local model updates; this adjustment consists of two steps, removing unimportant connections and adding important connections. Finally, the cloud center distributes the adjusted global model to each end user, and the above steps are iterated until the global model converges. By screening core data at the end users and deploying an adaptive sparse network model, the method reduces the model parameters uploaded between the end users and the cloud center, and thereby addresses at its root the problem of high communication cost caused by frequent transmission of high-dimensional update parameters between end users and the cloud center in federated learning.

Description

Federated learning communication traffic optimization method and system based on a core data set
Technical Field
The invention relates to the field of federated machine learning, and in particular to a core-data-set-based federated learning communication traffic optimization method and system, used to address the problem of high communication cost caused by frequent transmission of high-dimensional update parameters between end users/devices and the cloud center in federated learning.
Background
Machine learning, as an important branch of artificial intelligence, has been applied successfully and widely in fields such as pattern recognition, data mining and computer vision. Because terminal devices have limited computing resources, machine learning models are currently trained mostly in a cloud-based mode: the data collected by the terminal devices, such as pictures, videos or personal location information, must be uploaded in full to a cloud center, which trains the model centrally to obtain an inference model. However, uploading users' sensitive data can leak their private information, and as users' privacy awareness grows, more and more of them are unwilling to share private data to participate in model training. In the long run this severely hampers the development and application of machine learning techniques.
Federated learning therefore emerged to protect users' sensitive data without affecting the training of machine learning models. In a federated learning setting, an end user does not upload local sensitive data to the cloud center but only shares local model update parameters; the cloud center interacts with the end users over many rounds until the global model converges, so that the users' sensitive data are protected while a final usable model is still obtained.
In a federated learning environment, many rounds of interaction between the end users and the cloud center are needed to reach a global model of the target accuracy. However, for complex models such as deep learning models, each model update may contain millions of parameters, and this high dimensionality consumes a great deal of communication cost and can even become the bottleneck of model training. In addition, because of the heterogeneity of end users/devices, the unreliability of each device's network state, and the asymmetry of Internet connection speeds (upload speed is typically much lower than download speed), the delay incurred when end users upload local update parameters further worsens this bottleneck. To improve the training performance of federated learning, communication efficiency must therefore be improved.
To improve the communication efficiency of federated learning, researchers at home and abroad have carried out extensive studies and proposed many effective communication optimization methods. These methods mainly exploit the redundancy of the model parameters: they apply model compression operations such as sparsification, lightweight modeling or knowledge distillation to the local model update parameters, reduce the uploading of redundant parameters, and make the uploaded model updates more compact, thereby reducing communication traffic. However, all of these methods reduce traffic only from the viewpoint of the model parameters and do not solve the problem at its root. Model parameters are obtained by training on local data, and the data themselves are redundant; extracting the important data for model training can therefore reduce the uploading of redundant parameters at the source. Moreover, although existing methods sparsify or lighten the model parameters, they do not operate on the network model itself, which is also redundant. Hence, a method for reducing communication traffic should consider both the redundancy of the data and the redundancy of the model in order to substantially reduce the uploading of redundant parameters; existing methods consider neither of the two jointly.
Federated learning emerged to solve the problems of user sensitive-data leakage and model availability caused by cloud-based training. However, because of the high dimensionality of the model training parameters and the unreliability of the network in a federated learning environment, communication cost becomes a fundamental and important problem. Although existing research proposes many effective optimization methods for reducing communication traffic, they all work at the level of the model parameters and do not substantially reduce the uploading of redundant parameters from the perspective of the training data and the model. To better address the high communication cost of federated learning, the redundancy of both the training data and the model must be fully considered so that redundant parameter uploading is substantially reduced and communication traffic falls.
Against this background, the invention provides a simple and easy-to-implement federated learning communication traffic optimization method and system based on a core data set, laying a foundation for solving the problem of high communication cost in federated learning.
Disclosure of Invention
To effectively address the high communication cost of federated learning, the invention provides a core-data-set-based federated learning communication traffic optimization method and system. First, each end user screens core data out of its local training data in parallel, the cloud center constructs a sparse global model according to a set sparsification proportion, and each end user trains a local model on its screened core data to obtain a local model update. Then, the cloud center aggregates the local model updates into a global model update and adjusts the network structure of the global model accordingly. Finally, the cloud center distributes the adjusted global model to each end user, and these steps are iterated until the global model converges. The method screens core data from each end user's training data according to the redundancy of the training data and, from the model perspective, sparsifies the model structure according to the redundancy of the model, deploying a sparse network model adapted to the core data; uploading of model parameters between the end users and the cloud center is thereby substantially reduced, achieving the goal of reducing communication traffic. The core-data-set-based federated learning communication traffic optimization method provided by the invention comprises the following steps.
Step S1, core data set construction. Given a data set D, a weight set of all data points in D, a loss-function tolerance ε (0 < ε < 1), an error rate δ (0 < δ < 1) and a parameter R, a core data set M is selected from D so that the loss function satisfies |f(D) - f(M)| ≤ ε|f(D)|. This step comprises the following sub-steps:
Step S1-1, select K clustering centers from D and divide D into K clusters using the K-means clustering algorithm, where G_k denotes the clustering result of the k-th cluster and the weight of the j-th data point in the k-th cluster is taken from the given weight set;
Step S1-2, calculate the redundancy m_i of each data point i in D according to formula (1) (given as a figure in the original publication), which is expressed in terms of the weighted sum of all data points other than i in the k-th cluster and the weighted average distance between data point i and all data points in the k-th cluster.
Step S1-3, calculate the normalized redundancy P_i of each data point i in D by normalizing its redundancy m_i over all data points in D (formula (2), given as a figure in the original publication).
Step S1-4, calculate the average redundancy of all data points in D (formula (3), given as a figure in the original publication).
Step S1-5, calculate the size of the core data set M from the average redundancy, the loss-function tolerance ε and the error rate δ (formula (4), given as a figure in the original publication), where c is a fixed constant.
Step S1-6, screen the core data set M from D according to formula (2) and formula (4).
Step S1 is repeated: for a given set of end users {C_1, C_2, ..., C_r, ..., C_n} and corresponding data sets {D_1, D_2, ..., D_r, ..., D_n}, each end user screens out its core data set in parallel, yielding {M_1, M_2, ..., M_r, ..., M_n}.
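As an illustration of step S1, the following sketch (in Python, using NumPy and scikit-learn) shows one way the core data set could be screened. The function name select_core_dataset, the default core data set size and the concrete redundancy score are illustrative assumptions: formulas (1) and (4) are given only as figures in the original publication, so the redundancy is approximated here by the weighted average distance of a point to the other points of its cluster.

    # Sketch of step S1 (core data set construction); the redundancy and |M| below are
    # stand-ins, NOT the patented formulas (1) and (4), which are figures in the original.
    import numpy as np
    from sklearn.cluster import KMeans

    def select_core_dataset(D, weights, K=3, coreset_size=None, seed=0):
        """D: (N, d) data matrix; weights: (N,) point weights; returns the core data set."""
        rng = np.random.default_rng(seed)
        labels = KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(D)

        m = np.empty(len(D))                          # step S1-2: redundancy m_i (stand-in)
        for k in range(K):
            idx = np.where(labels == k)[0]
            Xk, wk = D[idx], weights[idx]
            dists = np.linalg.norm(Xk[:, None, :] - Xk[None, :, :], axis=-1)
            m[idx] = (dists * wk[None, :]).sum(axis=1) / wk.sum()

        P = m / m.sum()                               # step S1-3: normalized redundancy P_i
        size = coreset_size or max(1, len(D) // 10)   # step S1-5: |M| (formula (4) is a figure)
        chosen = rng.choice(len(D), size=size, replace=False, p=P)   # step S1-6: sampling by P_i
        return D[chosen], weights[chosen]

Each end user C_r would run such a routine in parallel on its own data set D_r to obtain M_r.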
Step S2, sparse model construction. A sparse global model is constructed according to a given model sparsification proportion α, comprising the following sub-steps:
Step S2-1, construct a fully connected network model with L layers, where layer l (l ≤ L) contains n_l neurons;
Step S2-2, sparsify the fully connected network model according to the model sparsification proportion α and the network connection probability (formula (5), given as a figure in the original publication), where n_l and n_{l-1} respectively denote the numbers of neurons in layers l and l-1 of the fully connected network model, and the connection probability refers to the probability that neuron i of layer l is connected to neuron j of layer l-1.
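A minimal sketch of step S2, assuming formula (5) follows an Erdős–Rényi-style connection probability of the kind used in sparse evolutionary training, p = α(n_l + n_{l-1})/(n_l · n_{l-1}) capped at 1; the actual formula (5) is given only as a figure in the original publication, so this expression is an assumption.

    # Sketch of step S2 (sparse global model construction) under the assumed formula (5).
    import numpy as np

    def build_sparse_masks(layer_sizes, alpha, seed=0):
        """Return one boolean connection mask per fully connected layer."""
        rng = np.random.default_rng(seed)
        masks = []
        for n_prev, n_cur in zip(layer_sizes[:-1], layer_sizes[1:]):
            p = min(1.0, alpha * (n_prev + n_cur) / (n_prev * n_cur))   # assumed formula (5)
            masks.append(rng.random((n_cur, n_prev)) < p)               # keep a connection with prob. p
        return masks

    # Example: the 784-200-10 perceptron of the embodiment, sparsification proportion alpha = 20.
    masks = build_sparse_masks([784, 200, 10], alpha=20)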
Step S3, cloud center initialization. According to the constructed sparse global model, initialize the global model parameters W_0, the global model update parameters U_0, the number of iteration rounds T and the total number of uploaded parameters Ω.
Step S4, local model training. Each end user C_r (r = 1, 2, ..., n) carries out local model training in parallel to obtain its local model update parameters H_r (r = 1, 2, ..., n) and uploads them to the cloud center, comprising the following sub-steps:
Step S4-1, each end user C_r (r = 1, 2, ..., n) performs local model training in parallel on its local core data set M_r, obtaining local model update parameters H_r (r = 1, 2, ..., n);
Step S4-2, each end user C_r (r = 1, 2, ..., n) uploads its local model update parameters H_r to the cloud center, and the total number of uploaded parameters Ω is updated accordingly (the update expression is given as a figure in the original publication).
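A minimal sketch of step S4 for a single end user C_r. It assumes the uploaded update H_r is the difference between the received global parameters and the locally trained parameters, which is consistent with W_t = W_{t-1} - U_t in step S5; the local optimizer and the number of local epochs are not specified in the text, so train_fn is left abstract.

    # Sketch of step S4 (local model training) for one end user; train_fn is a placeholder
    # for a few epochs of local training (e.g. SGD) on the core data set M_r.
    import numpy as np

    def local_update(global_params, core_dataset, train_fn):
        local_params = train_fn(np.copy(global_params), core_dataset)
        return global_params - local_params   # H_r, uploaded to the cloud center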
Step S5, global model update. According to the local model update parameters uploaded by the end users, the cloud center calculates the global model update U_t and the global model parameters W_t of the current iteration round t (t ≤ T), comprising the following sub-steps:
Step S5-1, calculate the global model update U_t of the current iteration round t (t ≤ T) by aggregating the uploaded local model update parameters (formula (6), given as a figure in the original publication), where U_t denotes the global model update parameters of the current iteration round t (t ≤ T).
Step S5-2, updating U according to the global modeltCalculating the global model parameter W of the current iteration round number T (T is less than or equal to T)tThe calculation formula is as follows:
Wt=Wt-1-Ut (7)
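A minimal sketch of step S5, assuming formula (6) is a plain (FedAvg-style) average of the uploaded local updates; the exact aggregation rule is given only as a figure in the original publication.

    # Sketch of step S5 (global model update) under the assumed formula (6).
    import numpy as np

    def global_update(global_params, local_updates):
        U_t = np.mean(local_updates, axis=0)   # assumed formula (6): average of the H_r
        W_t = global_params - U_t              # formula (7): W_t = W_{t-1} - U_t
        return W_t, U_t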
Step S6, global model structure adjustment. According to the global model parameters W_t and a set removal proportion β of unimportant connections, unimportant network connections in the global model are removed while important connections are added in the same proportion β, comprising the following sub-steps:
Step S6-1, evaluate the influence I_i of removing any connection i in the global model on the model loss function, calculated as follows:
I_i = |f(W_t) - f(W_t | w_i = 0)| (8)
where w_i denotes the i-th weight parameter of W_t.
Since evaluating the effect of removing every connection in the network one by one according to equation (8) is very time-consuming, equation (8) is approximated, via a first-order Taylor expansion of the loss function, as:
I_i = |g_i w_i| (9)
where g_i denotes the i-th component of the global model gradient.
Step S6-2, remove, layer by layer, the proportion β of neuron connections with the smallest I_i from the global model;
Step S6-3, screen important connection points layer by layer according to the neuron connection importance I_i. Suppose that in step S6-2 the connection formed by the i-th node of layer l and the j-th node of layer l-1 was removed; then screen out the set S of nodes in layer l-1 from which no unimportant connection was removed;
Step S6-4, randomly select a node from the set S and connect it with the i-th node of layer l.
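A minimal per-layer sketch of step S6. The importance score follows formula (9), I_i = |g_i w_i|; the reading of the set S (nodes of layer l-1 that lost no connection) and the rewiring rule are interpretations of steps S6-3 and S6-4, not a verbatim implementation.

    # Sketch of step S6 (global model structure adjustment) for one layer.
    import numpy as np

    def adjust_layer(weights, grads, mask, beta, seed=0):
        """weights, grads, mask: (n_l, n_{l-1}) arrays of one layer; returns the new mask."""
        rng = np.random.default_rng(seed)
        importance = np.abs(grads * weights) * mask          # formula (9) on existing connections
        rows, cols = np.nonzero(mask)
        drop = np.argsort(importance[rows, cols])[:int(beta * len(rows))]   # step S6-2
        new_mask = mask.copy()
        new_mask[rows[drop], cols[drop]] = False

        # Steps S6-3/S6-4 (assumed reading): S = nodes of layer l-1 that kept all connections;
        # every neuron of layer l that lost a connection is rewired to a random node of S.
        kept = np.where(new_mask.sum(axis=0) == mask.sum(axis=0))[0]
        if len(kept):
            for i in rows[drop]:
                new_mask[i, rng.choice(kept)] = True
        return new_mask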
Step S7, global model distribution. The adjusted global model and model parameters are distributed to each end user, and the next iteration begins.
Steps S4-S7 are repeated until T rounds of iteration are completed; model training then ends and the final total communication traffic Ω is obtained.
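The outer loop of steps S4-S7, together with the accounting of the total uploaded parameter count Ω, could then be organized as in the sketch below; it reuses the hypothetical helpers local_update and global_update sketched above, and the end-user objects with a core_dataset attribute are likewise assumptions.

    # Orchestration sketch of steps S4-S7 over T rounds; structure adjustment (step S6)
    # and redistribution (step S7) are indicated by comments only.
    def run_federated_training(W, users, T, train_fn):
        omega = 0
        for t in range(T):
            updates = [local_update(W, u.core_dataset, train_fn) for u in users]   # step S4
            omega += sum(h.size for h in updates)        # parameters uploaded this round
            W, U = global_update(W, updates)             # step S5
            # step S6: adjust the sparse structure of W;  step S7: distribute W to the users
        return W, omega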
Meanwhile, the present invention also provides a federated learning communication traffic optimization system based on a core data set, as shown in Fig. 3, comprising:
A core data set construction module, in which, given a set of end users {C_1, C_2, ..., C_r, ..., C_n} and corresponding data sets {D_1, D_2, ..., D_r, ..., D_n}, each end user screens out its core data set in parallel, yielding {M_1, M_2, ..., M_r, ..., M_n}; it contains the following sub-modules:
A clustering submodule for dividing a given data set D into K clusters, where G_k denotes the clustering result of the k-th cluster and the weight of the j-th data point in the k-th cluster is taken from the given weight set;
A weight-sum calculation submodule for calculating the weighted sum of all data points other than data point i in the k-th cluster;
A weighted-average-distance calculation submodule for calculating the weighted average distance between data point i and all data points in the k-th cluster;
A redundancy calculation submodule for calculating the redundancy of each data point i in D;
A normalization submodule for calculating the normalized redundancy of each data point i in D;
An average-redundancy calculation submodule for calculating the average redundancy of all data points in D;
A core data set size calculation submodule for calculating the size of the core data set;
A probability sampling submodule for screening the core data set M from D according to the core data set size |M| and the normalized redundancy P_i;
A parallel submodule for screening, in parallel, the core data set M_r from the data set D_r of each end user C_r (r = 1, 2, ..., n).
A sparse model construction module for sparsifying the constructed fully connected network model to obtain a sparse global model, comprising the following sub-modules:
A fully connected network construction submodule for constructing a fully connected network model according to the set number of network layers and the number of neurons in each layer;
A sparsification sampling probability calculation submodule for calculating, for each layer of the fully connected network model, the connection probability between neurons of adjacent layers;
A sparsification sampling submodule for sparsifying each layer of the fully connected network model so that the connections between neurons of adjacent layers satisfy the calculated connection probability.
A cloud center initialization module for initializing the global model parameters W_0, the global model update parameters U_0, the number of model iteration rounds T and the total number of uploaded parameters Ω.
A local model training module in which each end user C_r (r = 1, 2, ..., n) carries out local model training in parallel to obtain its local model update parameters H_r (r = 1, 2, ..., n), containing the following sub-modules:
A global model parameter acquisition submodule with which each end user C_r (r = 1, 2, ..., n) obtains the global model parameters W_t of the current iteration round t (t ≤ T) from the cloud center;
A local model update parameter calculation submodule with which each end user C_r (r = 1, 2, ..., n) trains the local model on its local core data set M_r and obtains the model update parameters H_r (r = 1, 2, ..., n) of the current iteration round t (t ≤ T).
A global model update module for calculating the global model update parameters and global model parameters of the current iteration round, comprising the following sub-modules:
A local model update parameter acquisition submodule for acquiring the model update parameters H_r (r = 1, 2, ..., n) of each end user C_r for the current iteration round t (t ≤ T);
A global model parameter calculation submodule for calculating the aggregation in formula (6) and W_t = W_{t-1} - U_t, obtaining the global model update parameters U_t and global model parameters W_t of the current iteration round t (t ≤ T).
A global model structure adjustment module for adjusting the structure of the global model according to the global model parameters W_t, comprising the following sub-modules:
A model connection importance calculation submodule for calculating I_i = |g_i w_i|, the influence of removing connection i from the global model on the model loss function;
An importance ranking submodule for sorting the values of I_i in ascending order;
An unimportant connection removal submodule for removing, layer by layer, the proportion β of neuron connections with the smallest I_i from the global model;
An important node screening submodule for screening the important connection node set S layer by layer according to the neuron connection importance I_i;
A randomization submodule for randomly choosing a node from the set S and connecting it with node i.
By jointly considering the redundancy of the training data and of the model, important training data are screened from the locally redundant data, the network model is sparsified according to these important data, and a sparse network model adapted to the core data is deployed. As a result, each time an end user interacts with the cloud center, only the training parameters of the sparse model need to be uploaded, substantially reducing the uploading of redundant parameters and addressing the problem of high communication cost caused by frequent transmission of high-dimensional update parameters between end users and the cloud center in federated learning. Compared with the prior art, the method has the following beneficial effects:
(1) The method and system provided by the invention consider the redundancy of both the data and the model, opening a new way of addressing the high communication cost of federated learning: the uploading of redundant parameters is substantially reduced, and communication traffic is therefore reduced;
(2) In the prior art, the model training stage requires complex network models to be deployed on the terminal devices, yet most terminals are resource-limited, so deploying complex models on them is unreasonable in a real federated learning scenario; the sparse global model deployed by the invention is better suited to such resource-limited devices;
(3) The global model structure adjustment uses a first-order Taylor expansion of the loss function to evaluate network connection importance, which simplifies the importance evaluation, reduces the computational complexity and allows an efficient implementation of the algorithm.
Drawings
Fig. 1 is a flowchart of an overall method provided by an embodiment of the invention.
Fig. 2 is a flowchart illustrating specific steps provided by an embodiment of the present invention.
Fig. 3 is a schematic diagram of the modules of the federated learning communication traffic optimization system based on a core data set according to an embodiment of the present invention.
Detailed Description
The conception, specific structure and technical effects of the present invention will be further described in conjunction with the accompanying drawings and embodiments, so that the objects, features and effects of the present invention can be fully understood.
The specific implementation steps of the invention are described below using the example of 100 end users jointly training a multi-layer perceptron (MLP) model on the MNIST data set, the goal being to reduce the total number of parameters uploaded by local users to the cloud center and thereby reduce communication traffic. The expression of the MLP model is given as a figure in the original publication, where N denotes the total number of samples, X_i is the feature vector of a sample, W_i are the model parameters, b is the bias, σ is the activation function, and y is the output of the model.
The method provided by the invention can be implemented as an automated process using computer software. Fig. 1 is the overall method flowchart of the embodiment; referring to Fig. 1 in combination with the detailed step flowchart of Fig. 2, the specific steps of the core-data-set-based embodiment are as follows:
Step S1, core data set construction. Given a data set D, a weight set of all data points in D, a loss-function tolerance ε (0 < ε < 1), an error rate δ (0 < δ < 1) and a parameter R, a core data set M is screened from D so that the loss function satisfies |f(D) - f(M)| ≤ ε|f(D)|.
In the embodiment, the MNIST data set is partitioned among the end users C_r (r = 1, 2, ..., 100) in a non-IID (not independent and identically distributed) manner, giving each end user a training data set D_r (r = 1, 2, ..., 100) together with a weight set over all of its data points. Given a loss-function tolerance ε = 0.08, an error rate δ = 0.05 and a parameter R = 3, the core data set M_r is screened in parallel from each data set D_r (r = 1, 2, ..., 100) according to these parameters, so that the loss function satisfies |f(D_r) - f(M_r)| ≤ 0.08|f(D_r)| (r = 1, 2, ..., 100). This is implemented as follows.
Step S1-1, select K clustering centers from D and divide D into K clusters using the K-means clustering algorithm, where G_k denotes the clustering result of the k-th cluster and the weight of the j-th data point in the k-th cluster is taken from the given weight set;
In the embodiment, 3 clustering centers are selected from each D_r (r = 1, 2, ..., 100), the K-means clustering algorithm divides D_r into 3 clusters, G_k (k = 1, 2, 3) denotes the clustering result of the k-th cluster, and the weight of the j-th data point in the k-th cluster is taken from the given weight set;
Step S1-2, calculate the redundancy m_i of each data point i in D according to formula (1) (given as a figure in the original publication), which is expressed in terms of the weighted sum of all data points other than i in the k-th cluster and the weighted average distance between data point i and all data points in the k-th cluster.
In the embodiment, for each data point i in D_r (r = 1, 2, ..., 100), the weighted sum of all other data points in its cluster is first computed; then the weighted average distance between data point i and all data points in the k-th cluster (k = 1, 2, 3) is computed; finally the redundancy m_i (i = 1, 2, ..., |D_r|) of data point i is computed according to formula (1).
Step S1-3, calculate the normalized redundancy P_i of each data point i in D by normalizing its redundancy m_i over all data points in D (formula (2), given as a figure in the original publication).
In the embodiment, this calculation yields the normalized redundancy P_i (i = 1, 2, ..., |D_r|) of all data points in D_r (r = 1, 2, ..., 100).
Step S1-4, calculating the redundancy average value of all data points in D
Figure BDA0002768072640000103
The calculation formula is as follows:
Figure BDA0002768072640000104
In the examples, the calculation
Figure BDA0002768072640000105
To obtain DrMean value of redundancy of all data in
Figure BDA0002768072640000106
Step S1-5, averaging according to the redundancy
Figure BDA0002768072640000107
The loss function tolerance epsilon and the error rate delta are calculated, and the size of the core data set M is calculated according to the following formula:
Figure BDA0002768072640000108
where c is a fixed constant.
In the embodiment, the redundancy average value is calculated according to the known parameters of 0.08 epsilon and 0.05 delta
Figure BDA0002768072640000109
Computing
Figure BDA00027680726400001010
Get size | M of core datasetr|。
Step S1-6, screen the core data set M from D according to formula (2) and formula (4).
In the embodiment, according to the core data set size |M_r| and the normalized redundancy P_i (i = 1, 2, ..., |D_r|), the |M_r| data points with the largest redundancy are screened from D_r to form the core data set M_r.
Step S1 is repeated: for a given set of end users {C_1, C_2, ..., C_r, ..., C_n} and corresponding data sets {D_1, D_2, ..., D_r, ..., D_n}, each end user screens out its core data set in parallel, yielding {M_1, M_2, ..., M_r, ..., M_n}.
In the embodiment, given the set of end users {C_1, C_2, ..., C_r, ..., C_100} and the corresponding data sets {D_1, D_2, ..., D_r, ..., D_100}, step S1 is repeated and each end user screens out its core data set in parallel, yielding {M_1, M_2, ..., M_r, ..., M_100}.
Step S2, sparse model construction. A sparse global model is constructed according to the given model sparsification proportion α.
In the embodiment, the fully connected global network model is sparsified with a model sparsification proportion α = 20 to obtain the sparse global model structure, realized as follows.
Step S2-1, construct a fully connected network model with L layers, where layer l (l ≤ L) contains n_l neurons;
In the embodiment, a multilayer perceptron (MLP) model consisting of an input layer, a hidden layer and an output layer is constructed, with 784, 200 and 10 neurons respectively, and each neuron of a layer is connected to the neurons of the previous layer.
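The exact expression of the MLP is given as a figure in the original publication; the sketch below shows an assumed sigmoid forward pass for the 784-200-10 perceptron of the embodiment, with the sparse masks of step S2 zeroing the pruned weights.

    # Sketch of the embodiment's 784-200-10 MLP forward pass (activation assumed sigmoid).
    import numpy as np

    def mlp_forward(x, params, masks):
        """x: (784,) input; params: [(W1, b1), (W2, b2)]; masks: per-layer boolean masks."""
        h = x
        for (W, b), m in zip(params, masks):
            h = 1.0 / (1.0 + np.exp(-((W * m) @ h + b)))   # sigma((W * mask) h + b)
        return h   # 10-dimensional output y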
Step S2-2, sparsify the fully connected network model according to the model sparsification proportion α and the network connection probability (formula (5), given as a figure in the original publication), where n_l and n_{l-1} respectively denote the numbers of neurons in layers l and l-1, and the connection probability refers to the probability that neuron i of layer l is connected to neuron j of layer l-1.
In the embodiment, each fully connected layer is sparsified so that the connection between neuron i of layer l (l ≤ 3) and neuron j of layer l-1 satisfies this probability, yielding the sparse global model.
Step S3, cloud center initialization. According to the constructed sparse global model, initialize the global model parameters W_0, the global model update parameters U_0, the number of iteration rounds T and the total number of uploaded parameters Ω.
In the embodiment, the global model parameters W_0 and global model update parameters U_0 are initialized according to the sparse global model, the number of iteration rounds is set to T = 10, and the total number of parameters is set to Ω = 0.
Step S4, local model training. Each end user C_r (r = 1, 2, ..., n) carries out local model training in parallel to obtain its local model update parameters H_r (r = 1, 2, ..., n) and uploads them to the cloud center.
In the embodiment, each end user trains its local model in parallel on its local core data set to obtain its local model update and then uploads it to the cloud center.
Step S4-1, each end user C_r (r = 1, 2, ..., n) performs local model training in parallel on its local core data set M_r, obtaining local model update parameters H_r (r = 1, 2, ..., n);
In the embodiment, each end user C_r (r = 1, 2, ..., 100) performs local model training in parallel on its local core data set M_r (r = 1, 2, ..., 100), obtaining its local model update parameters H_r (r = 1, 2, ..., 100);
Step S4-2, each end user C_r (r = 1, 2, ..., n) uploads its local model update parameters H_r to the cloud center, and the total number of uploaded parameters Ω is updated accordingly (the update expression is given as a figure in the original publication).
In the embodiment, each end user C_r (r = 1, 2, ..., 100) uploads its trained local model update parameters H_r (r = 1, 2, ..., 100) to the cloud center, and Ω is updated accordingly.
Step S5, global model update. According to the local model update parameters uploaded by the end users, the cloud center calculates the global model update U_t and the global model parameters W_t of the current iteration round t (t ≤ T).
In the embodiment, the cloud center gathers the uploaded local model update parameters and calculates the global model update U_t and global model parameters W_t of the current iteration round t (t ≤ T), realized as follows.
Step S5-1, calculate the global model update U_t of the current iteration round t (t ≤ T) by aggregating the uploaded local model update parameters (formula (6), given as a figure in the original publication), where U_t denotes the global model update parameters of the current iteration round t (t ≤ T).
In the embodiment, the cloud center aggregates the uploaded local model update parameters H_r (r = 1, 2, ..., 100) according to formula (6), obtaining the global model update U_t of the current iteration round t (t ≤ T).
Step S5-2, according to the global model update U_t, calculate the global model parameters W_t of the current iteration round t (t ≤ T) as follows:
W_t = W_{t-1} - U_t (7)
In the embodiment, the global model parameters W_t of the current iteration round t (t ≤ T) are calculated from the global model update parameters U_t according to formula (7).
Step S6, global model structure adjustment. According to the global model parameters W_t and a set removal proportion β of unimportant connections, unimportant network connections in the global model are removed while important connections are added in the same proportion β.
In the embodiment, based on the global model parameters W_t, the unimportant connections are removed from each layer of the global model with β = 0.3 while important connections are added in the same proportion β = 0.3, implemented as follows.
Step S6-1, evaluate the influence I_i of removing any connection i in the global model on the model loss function, calculated as follows:
I_i = |f(W_t) - f(W_t | w_i = 0)| (8)
where w_i denotes the i-th weight parameter of W_t.
Since evaluating the effect of removing every connection in the network one by one according to equation (8) is very time-consuming, equation (8) is approximated, via a first-order Taylor expansion of the loss function, as:
I_i = |g_i w_i| (9)
where g_i denotes the i-th component of the global model gradient.
In the embodiment, I_i = |g_i w_i| is computed to obtain the influence I_i of removing any connection i in the global model on the model loss function.
Step S6-2, remove, layer by layer, the proportion β of neuron connections with the smallest I_i from the global model;
In the embodiment, the 30% of neuron connections with the smallest I_i values are removed from each layer of the global model;
Step S6-3, screen important connection points layer by layer according to the neuron connection importance I_i. Suppose that in step S6-2 the connection formed by the i-th node of layer l and the j-th node of layer l-1 was removed; then screen out the set S of nodes in layer l-1 from which no unimportant connection was removed;
In the embodiment, according to the neuron connection importance I_i, the node set S from which no unimportant connection was removed is screened out in layer l-1 for each layer l (l ≤ 3) of the global model. Suppose that in step S6-2 the connection between node i = 8 of layer l = 3 and node j = 50 of layer l-1 = 2 was removed; then the node set S from which no unimportant connection was removed is screened out in layer l-1 = 2;
Step S6-4, randomly select a node from the set S and connect it with the i-th node of layer l.
In the embodiment, for each layer l (l ≤ 3) of the global model, a node is randomly selected from the set S and connected with the i-th node of layer l. Following the example of step S6-3, a node is randomly selected from the set S and connected with node i = 8 of layer l = 3.
Step S7, global model distribution. The adjusted global model and model parameters are distributed to each end user, and the next iteration begins.
In the embodiment, the adjusted global model and model parameters are distributed to each end user C_r (r = 1, 2, ..., 100), and the next iteration begins.
Steps S4-S7 are repeated until T rounds of iteration are completed; model training then ends and the final total communication traffic Ω is obtained.
In the embodiment, model training ends after T = 10 iteration rounds, and the final total communication traffic Ω is obtained.
The present invention provides a technical solution that can be implemented by those skilled in the art. The above embodiments are provided only for illustrating the present invention and not for limiting the present invention, and those skilled in the art can make various changes or modifications without departing from the spirit and scope of the present invention, and therefore all equivalent technical solutions are within the scope of the present invention.

Claims (5)

1. A federated learning communication traffic optimization method based on a core data set, characterized in that it comprises the following steps:
Step S1, core data set construction. Given a data set D, a weight set of all data points in D, a loss-function tolerance ε (0 < ε < 1), an error rate δ (0 < δ < 1) and a parameter R, a core data set M is screened from D so that the loss function satisfies |f(D) - f(M)| ≤ ε|f(D)|.
Step S2, sparse model construction. A sparse global model is constructed according to the given model sparsification proportion α.
Step S3, cloud center initialization. According to the constructed sparse global model, initialize the global model parameters W_0, the global model update parameters U_0, the number of iteration rounds T and the total number of uploaded parameters Ω.
Step S4, local model training. Each end user C_r (r = 1, 2, ..., n) carries out local model training in parallel to obtain its local model update parameters H_r (r = 1, 2, ..., n) and uploads them to the cloud center.
Step S5, global model update. According to the local model update parameters uploaded by the end users, the cloud center calculates the global model update U_t and the global model parameters W_t of the current iteration round t (t ≤ T).
Step S6, global model structure adjustment. According to the global model parameters W_t and a set removal proportion β of unimportant connections, unimportant network connections in the global model are removed while important connections are added in the same proportion β.
Step S7, global model distribution. The adjusted global model and model parameters are distributed to each end user, and the next iteration begins.
2. The method of claim 1, wherein step S1 comprises the following sub-steps:
Step S1-2, calculate the redundancy m_i of each data point i in D according to formula (1) (given as a figure in the original publication), which is expressed in terms of the weighted sum of all data points other than i in the k-th cluster and the weighted average distance between data point i and all data points in the k-th cluster.
Step S1-3, calculate the normalized redundancy P_i of each data point i in D by normalizing its redundancy m_i over all data points in D (formula (2), given as a figure in the original publication).
Step S1-4, calculate the average redundancy of all data points in D (formula (3), given as a figure in the original publication).
Step S1-5, calculate the size of the core data set M from the average redundancy, the loss-function tolerance ε and the error rate δ (formula (4), given as a figure in the original publication), where c is a fixed constant.
Step S1-6, screen the core data set M from D according to formula (2) and formula (4).
3. The core-data-set-based federated learning communication traffic optimization method of claim 1, wherein step S2 comprises the following sub-step:
Step S2-2, sparsify the fully connected network model according to the model sparsification proportion α and the network connection probability (formula (5), given as a figure in the original publication), where n_l and n_{l-1} respectively denote the numbers of neurons in layers l and l-1 of the fully connected network model, and the connection probability refers to the probability that neuron i of layer l is connected to neuron j of layer l-1.
4. The method of claim 1, wherein step S6 comprises the following sub-steps:
Step S6-1, evaluate the influence I_i of removing any connection i in the global model on the model loss function, calculated as follows:
I_i = |f(W_t) - f(W_t | w_i = 0)| (8)
where w_i denotes the i-th weight parameter of W_t.
Since evaluating the effect of removing every connection in the network one by one according to equation (8) is very time-consuming, equation (8) is approximated as:
I_i = |g_i w_i| (9)
where g_i denotes the i-th component of the global model gradient.
Step S6-2, remove, layer by layer, the proportion β of neuron connections with the smallest I_i from the global model;
Step S6-3, screen important connection points layer by layer according to the neuron connection importance I_i. Suppose that in step S6-2 the connection formed by the i-th node of layer l and the j-th node of layer l-1 was removed; then screen out the set S of nodes in layer l-1 from which no unimportant connection was removed;
Step S6-4, randomly select a node from the set S and connect it with the i-th node of layer l.
5. A federated learning communication traffic optimization system based on a core data set, characterized in that it comprises the following modules:
A core data set construction module, in which, given a set of end users {C_1, C_2, ..., C_r, ..., C_n} and corresponding data sets {D_1, D_2, ..., D_r, ..., D_n}, each end user screens out its core data set in parallel, yielding {M_1, M_2, ..., M_r, ..., M_n}; it contains the following sub-modules:
A clustering submodule for dividing a given data set D into K clusters, where G_k denotes the clustering result of the k-th cluster and the weight of the j-th data point in the k-th cluster is taken from the given weight set;
A weight-sum calculation submodule for calculating the weighted sum of all data points other than data point i in the k-th cluster;
A weighted-average-distance calculation submodule for calculating the weighted average distance between data point i and all data points in the k-th cluster;
A redundancy calculation submodule for calculating the redundancy of each data point i in D;
A normalization submodule for calculating the normalized redundancy of each data point i in D;
An average-redundancy calculation submodule for calculating the average redundancy of all data points in D;
A core data set size calculation submodule for calculating the size of the core data set;
A probability sampling submodule for screening the core data set M from D according to the core data set size |M| and the normalized redundancy P_i;
A parallel submodule for screening, in parallel, the core data set M_r from the data set D_r of each end user C_r (r = 1, 2, ..., n).
A sparse model construction module for sparsifying the constructed fully connected network model to obtain a sparse global model, comprising the following sub-modules:
A fully connected network construction submodule for constructing a fully connected network model according to the set number of network layers and the number of neurons in each layer;
A sparsification sampling probability calculation submodule for calculating, for each layer of the fully connected network model, the connection probability between neurons of adjacent layers;
A sparsification sampling submodule for sparsifying each layer of the fully connected network model so that the connections between neurons of adjacent layers satisfy the calculated connection probability.
A cloud center initialization module for initializing the global model parameters W_0, the global model update parameters U_0, the number of model iteration rounds T and the total number of uploaded parameters Ω.
A local model training module in which each end user C_r (r = 1, 2, ..., n) carries out local model training in parallel to obtain its local model update parameters H_r (r = 1, 2, ..., n), containing the following sub-modules:
A global model parameter acquisition submodule with which each end user C_r (r = 1, 2, ..., n) obtains the global model parameters W_t of the current iteration round t (t ≤ T) from the cloud center;
A local model update parameter calculation submodule with which each end user C_r (r = 1, 2, ..., n) trains the local model on its local core data set M_r and obtains the model update parameters H_r (r = 1, 2, ..., n) of the current iteration round t (t ≤ T).
A global model update module for calculating the global model update parameters and global model parameters of the current iteration round, comprising the following sub-modules:
A local model update parameter acquisition submodule for acquiring the model update parameters H_r (r = 1, 2, ..., n) of each end user C_r for the current iteration round t (t ≤ T);
A global model parameter calculation submodule for calculating the aggregation in formula (6) and W_t = W_{t-1} - U_t, obtaining the global model update parameters U_t and global model parameters W_t of the current iteration round t (t ≤ T).
A global model structure adjustment module for adjusting the structure of the global model according to the global model parameters W_t, comprising the following sub-modules:
A model connection importance calculation submodule for calculating I_i = |g_i w_i|, the influence of removing connection i from the global model on the model loss function;
An importance ranking submodule for sorting the values of I_i in ascending order;
An unimportant connection removal submodule for removing, layer by layer, the proportion β of neuron connections with the smallest I_i from the global model;
An important node screening submodule for screening the important connection node set S layer by layer according to the neuron connection importance I_i;
A randomization submodule for randomly choosing a node from the set S and connecting it with node i.
CN202011240064.0A 2020-11-09 2020-11-09 Federal learning communication traffic optimization method and system based on core data set Pending CN112364913A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011240064.0A CN112364913A (en) 2020-11-09 2020-11-09 Federal learning communication traffic optimization method and system based on core data set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011240064.0A CN112364913A (en) 2020-11-09 2020-11-09 Federal learning communication traffic optimization method and system based on core data set

Publications (1)

Publication Number Publication Date
CN112364913A true CN112364913A (en) 2021-02-12

Family

ID=74509330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011240064.0A Pending CN112364913A (en) 2020-11-09 2020-11-09 Federal learning communication traffic optimization method and system based on core data set

Country Status (1)

Country Link
CN (1) CN112364913A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113258935A (en) * 2021-05-25 2021-08-13 山东大学 Communication compression method based on model weight distribution in federated learning
CN113420888A (en) * 2021-06-03 2021-09-21 中国石油大学(华东) Unsupervised federal learning method based on generalization domain self-adaptation
CN113537509A (en) * 2021-06-28 2021-10-22 南方科技大学 Collaborative model training method and device
CN114819321A (en) * 2022-04-18 2022-07-29 郑州大学 Distributed machine learning-oriented parameter transmission communication optimization method
US11468370B1 (en) 2022-03-07 2022-10-11 Shandong University Communication compression method based on model weight distribution in federated learning
CN115913413A (en) * 2023-02-22 2023-04-04 西安电子科技大学 Intelligent spatial millimeter wave propagation characteristic analysis method
CN117149527A (en) * 2023-10-31 2023-12-01 江苏华鲲振宇智能科技有限责任公司 System and method for backing up and recovering server data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110766138A (en) * 2019-10-21 2020-02-07 中国科学院自动化研究所 Method and system for constructing self-adaptive neural network model based on brain development mechanism
CN111768457A (en) * 2020-05-14 2020-10-13 北京航空航天大学 Image data compression method, device, electronic equipment and storage medium
CN111882133A (en) * 2020-08-03 2020-11-03 重庆大学 Prediction-based federated learning communication optimization method and system
CN111901829A (en) * 2020-07-10 2020-11-06 江苏智能交通及智能驾驶研究院 Wireless federal learning method based on compressed sensing and quantitative coding

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110766138A (en) * 2019-10-21 2020-02-07 中国科学院自动化研究所 Method and system for constructing self-adaptive neural network model based on brain development mechanism
CN111768457A (en) * 2020-05-14 2020-10-13 北京航空航天大学 Image data compression method, device, electronic equipment and storage medium
CN111901829A (en) * 2020-07-10 2020-11-06 江苏智能交通及智能驾驶研究院 Wireless federal learning method based on compressed sensing and quantitative coding
CN111882133A (en) * 2020-08-03 2020-11-03 重庆大学 Prediction-based federated learning communication optimization method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SOLMAZ NIKNAM ET AL: "Federated Learning for Wireless Communications: Motivation, Opportunities and Challenges", https://arxiv.org/pdf/1908.06847.pdf *
杨庚 et al.: "Research progress on privacy protection in federated learning", Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113258935A (en) * 2021-05-25 2021-08-13 山东大学 Communication compression method based on model weight distribution in federated learning
CN113258935B (en) * 2021-05-25 2022-03-04 山东大学 Communication compression method based on model weight distribution in federated learning
CN113420888A (en) * 2021-06-03 2021-09-21 中国石油大学(华东) Unsupervised federal learning method based on generalization domain self-adaptation
CN113537509A (en) * 2021-06-28 2021-10-22 南方科技大学 Collaborative model training method and device
US11468370B1 (en) 2022-03-07 2022-10-11 Shandong University Communication compression method based on model weight distribution in federated learning
CN114819321A (en) * 2022-04-18 2022-07-29 郑州大学 Distributed machine learning-oriented parameter transmission communication optimization method
CN114819321B (en) * 2022-04-18 2023-04-07 郑州大学 Distributed machine learning-oriented parameter transmission communication optimization method
CN115913413A (en) * 2023-02-22 2023-04-04 西安电子科技大学 Intelligent spatial millimeter wave propagation characteristic analysis method
CN115913413B (en) * 2023-02-22 2023-07-14 西安电子科技大学 Intelligent space millimeter wave propagation characteristic analysis method
CN117149527A (en) * 2023-10-31 2023-12-01 江苏华鲲振宇智能科技有限责任公司 System and method for backing up and recovering server data
CN117149527B (en) * 2023-10-31 2024-03-08 江苏华鲲振宇智能科技有限责任公司 System and method for backing up and recovering server data

Similar Documents

Publication Publication Date Title
CN112364913A (en) Federal learning communication traffic optimization method and system based on core data set
CN109948029B (en) Neural network self-adaptive depth Hash image searching method
CN113053115B (en) Traffic prediction method based on multi-scale graph convolution network model
Kim et al. SplitNet: Learning to semantically split deep networks for parameter reduction and model parallelization
CN110263280B (en) Multi-view-based dynamic link prediction depth model and application
CN110175628A (en) A kind of compression algorithm based on automatic search with the neural networks pruning of knowledge distillation
CN114912705A (en) Optimization method for heterogeneous model fusion in federated learning
CN112215353B (en) Channel pruning method based on variational structure optimization network
CN112100514B (en) Friend recommendation method based on global attention mechanism representation learning
Navgaran et al. Evolutionary based matrix factorization method for collaborative filtering systems
Liu et al. A survey on computationally efficient neural architecture search
CN114091667A (en) Federal mutual learning model training method oriented to non-independent same distribution data
CN116362325A (en) Electric power image recognition model lightweight application method based on model compression
CN113469891A (en) Neural network architecture searching method, training method and image completion method
Napoli et al. A mathematical model for file fragment diffusion and a neural predictor to manage priority queues over BitTorrent
CN111898316A (en) Construction method and application of super-surface structure design model
Qi et al. FedAGCN: A traffic flow prediction framework based on federated learning and Asynchronous Graph Convolutional Network
CN111832637A (en) Distributed deep learning classification method based on alternative direction multiplier method ADMM
CN115587633A (en) Personalized federal learning method based on parameter layering
CN114065033A (en) Training method of graph neural network model for recommending Web service combination
Tanghatari et al. Federated learning by employing knowledge distillation on edge devices with limited hardware resources
CN117523291A (en) Image classification method based on federal knowledge distillation and ensemble learning
Zhang et al. Federated multi-task learning with non-stationary heterogeneous data
CN114662658A (en) On-chip optical network hot spot prediction method based on LSTM neural network
CN110381540B (en) Dynamic cache updating method for responding popularity of time-varying file in real time based on DNN

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210212

WD01 Invention patent application deemed withdrawn after publication