CN113191504B - Federated learning training acceleration method for computing resource heterogeneity - Google Patents

Federated learning training acceleration method for computing resource heterogeneity

Info

Publication number
CN113191504B
CN113191504B (application CN202110556962.5A)
Authority
CN
China
Prior art keywords
global model
parameters
gradient
updated
model parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN202110556962.5A
Other languages
Chinese (zh)
Other versions
CN113191504A (en)
Inventor
何耶肖
李欢
章小宁
吴昊
范晨昱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202110556962.5A priority Critical patent/CN113191504B/en
Publication of CN113191504A publication Critical patent/CN113191504A/en
Application granted granted Critical
Publication of CN113191504B publication Critical patent/CN113191504B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a federated learning training acceleration method for heterogeneous computing resources. Whether the difference in iteration count between the fastest device and the slowest device has reached a threshold is judged; if so, the fastest device does not need to wait. It updates its local model parameters directly with the gradient update values and performs an additional round of gradient computation, then downloads the latest global model parameters and obtains a copy of them. The copy is updated with the additional gradient update parameters, and its loss function value is compared with that of the latest global model parameters; if the loss function value of the updated copy is smaller, the latest global model parameters are replaced by the updated copy. The invention adapts the traditional SSP synchronous parallel mechanism accordingly and raises the utilization of computing resources, thereby improving training efficiency and shortening the overall training time.

Description

Federated learning training acceleration method for computing resource heterogeneity
Technical Field
The invention relates to the technical field of federated learning, and in particular to a federated learning training acceleration method for heterogeneous computing resources.
Background
In recent years, with the rapid development of machine learning, many artificial intelligence applications that are difficult to implement with conventional technologies have emerged in various fields of human life, such as data mining, image recognition, natural language processing, biometric recognition, search engines, medical diagnosis, credit card fraud detection, stock market analysis, voice and handwriting recognition, strategy games, and robotics. Machine learning is a data analysis method for automated analytical model construction that allows computers to learn autonomously without explicit programming. As a data-driven technique, machine learning requires a large amount of data for training to arrive at a high-performance model. Today, with the spread of mobile phones, tablets, and various wearable devices, billions of edge devices generate vast amounts of user data; according to a report by International Data Corporation (IDC), the data generated by edge devices will grow to 79.4 ZB by 2025. This is a valuable data resource for machine learning, and the related techniques and applications of machine learning can be expected to advance further if the data stored on edge devices can be utilized. However, to train a machine learning model, the traditional approach is to upload all raw data sets collected by the edge devices to a remote data center for centralized training. This approach requires a large amount of communication resources to transmit a large amount of data, resulting in unacceptably high costs. In addition, as people's privacy awareness increases, many users are reluctant to upload their data to a data center, and transmitting user data over a communication network raises problems of privacy disclosure and data security. Moreover, the centralized training mode cannot be applied to fields such as finance that are highly sensitive to data security.
In order to protect data privacy and reduce communication resource overhead, federated learning, a distributed training system, has been proposed in recent years to replace the centralized training system. Today, many banks, securities companies, medical equipment manufacturers, and technology companies are actively developing federated learning, whose safety and utility have been widely verified; the basic framework of federated learning is shown in Fig. 1.
A set of edge devices (also referred to as clients), such as smartphones, laptops, and tablets, participate in the distributed training process of the model using their locally stored data. Each edge device maintains a copy of the model as a local model. The server connecting all edge devices maintains a global model and aggregates the gradient updates from the various edge devices to update the global model. In each training iteration, each edge device uses its local data to compute gradient updates of the model parameters, uploads the gradient updates to the server, and then downloads the new global model parameters from the server as its new local model parameters. In this process, each edge device shares only the intermediate computation result (namely the gradient updates of the model parameters) with the server, without uploading the raw data stored locally, so the privacy and security of the data are protected. In addition, the transmitted gradients and model may be protected with various encryption methods to further enhance security.
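For illustration, this round trip can be condensed into a minimal sketch, assuming a single client, an in-memory server object, and a linear model with squared loss; the names Server and local_gradient are illustrative and not part of any existing framework.

```python
# Minimal sketch of one federated learning iteration for a single client.
# Assumptions: linear model, squared loss, in-memory "server"; illustrative only.
import numpy as np

class Server:
    """Holds the global model and applies gradient updates uploaded by clients."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)     # global model parameters
        self.lr = lr               # training step length (eta)

    def apply_gradient(self, grad):
        self.w -= self.lr * grad   # update the global model with an uploaded gradient

    def download(self):
        return self.w.copy()       # latest global model parameters

def local_gradient(w, X, y):
    """Gradient of the squared loss 0.5*||Xw - y||^2 averaged over the local data."""
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 5)), rng.normal(size=32)   # locally stored (private) data
server = Server(dim=5)

w_local = server.download()            # 1. download the global model parameters
grad = local_gradient(w_local, X, y)   # 2. compute the gradient update on local data
server.apply_gradient(grad)            # 3. upload only the gradient (raw data never leaves)
w_local = server.download()            # 4. download the new global parameters
```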
In an edge environment, devices have heterogeneous computing resources due to differences in their processor architectures, power consumption limits, operating systems, and so on. Because of this heterogeneity, the gradient computation times of different devices differ greatly. In order to keep the model parameters of the various devices consistent, federated learning requires a synchronous parallel mechanism. Under the traditional Bulk Synchronous Parallel (BSP) mechanism, a device with more computing resources that finishes its gradient update quickly must wait in every iteration for the devices with fewer computing resources so that synchronization is achieved. The faster devices waste a large amount of computing resources in this waiting process, while the slowest device largely determines the duration of the iteration; this is known as the straggler problem. The straggler problem reduces the training efficiency of federated learning and slows the convergence of the learning model. To improve training efficiency, the Stale Synchronous Parallel (SSP) mechanism was proposed. The strategy of SSP is to allow each device to proceed directly to the next iteration after completing one, without waiting, but to limit the difference in iteration count between the fastest device and the slowest device to within a threshold. Once this threshold is reached, the fastest device must wait until the slowest device catches up. Although SSP improves training efficiency to some extent, a large amount of computing resources is still inevitably wasted in the waiting process. Current machine learning algorithms need substantial computing resource support, and on resource-limited edge devices, inefficient training leads to long training times and makes it difficult to deploy algorithm applications.
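The SSP staleness bound amounts to a simple check; the sketch below assumes a hypothetical iteration_counts mapping from device identifier to completed iterations.

```python
# Minimal sketch of the SSP staleness check; `iteration_counts` is an assumed structure.
def fastest_must_wait(iteration_counts, threshold):
    """Under SSP, the fastest device must wait once its lead over the slowest
    device reaches the staleness threshold."""
    fastest = max(iteration_counts.values())
    slowest = min(iteration_counts.values())
    return fastest - slowest >= threshold

# Example: with a threshold of 3, a device that is 3 iterations ahead must wait.
print(fastest_must_wait({"dev_a": 10, "dev_b": 7, "dev_c": 8}, threshold=3))  # True
```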
Disclosure of Invention
Aiming at the above defects in the prior art, the invention provides a federated learning training acceleration method for computing resource heterogeneity.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that:
a federated learning training acceleration method for computing resource isomerism comprises the following steps:
S1, initializing local iteration times, continuous extra gradient calculation times, a continuous extra gradient calculation time threshold value and total iteration times, and downloading initial global model parameters from a server;
S2, adding 1 to the local iteration times, judging whether the local iteration times meet the total iteration times, if so, ending the training, otherwise, entering the step S3;
S3, storing the latest global model parameters downloaded from the server as local model parameters, performing gradient update by using a BP algorithm in combination with a local data set to obtain gradient update parameters, and uploading the gradient update parameters to the server;
S4, judging whether the continuous extra gradient calculation times meet the continuous extra gradient calculation time threshold, if yes, entering a step S10, otherwise, entering a step S5;
S5, updating the local model parameters by using the gradient update parameters obtained in the step S3 to obtain updated local model parameters, and performing an additional gradient update by using a BP algorithm in combination with the local data set to obtain additional gradient update parameters;
S6, receiving a signal which is issued by the server and allows the latest global model parameter to be downloaded, and judging whether the extra gradient calculation is completed, if so, entering a step S7, otherwise, entering a step S9;
S7, downloading the latest global model parameters from the server and copying to obtain a global model parameter copy, updating the global model parameter copy by using the extra gradient update parameters obtained in the step S5, and adding 1 to the calculation times of the continuous extra gradients;
S8, re-determining the latest global model parameters according to the latest global model parameters and the loss function values corresponding to the updated global model parameter copies, and returning to the step S2;
S9, immediately stopping extra gradient calculation, downloading the latest global model parameters from the server, initializing the calculation times of continuous extra gradient, and returning to the step S2;
S10, initializing continuous additional gradient calculation times, receiving a signal which is sent by the server and allows the latest global model parameter to be downloaded, downloading the latest global model parameter, and returning to the step S2.
The beneficial effects of this scheme are as follows:
After the threshold of the synchronous parallel mechanism SSP is reached, the device that would otherwise wait is allowed to perform an additional round of gradient computation. After the global model parameters are downloaded the next time, they are updated with the gradients obtained from the additional round of computation, on the premise that this reduces the model's loss function. The computing resources are thus further utilized and the waiting time of devices that finish gradient computation quickly is reduced, which raises the utilization of computing resources, improves training efficiency, and shortens the overall training time.
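For illustration only, the client-side steps S1 to S10 can be condensed into the following single-process sketch. It assumes one client, an in-memory server, a linear model with squared loss, and that the download signal is always granted and the additional gradient always completes in time (so steps S6 and S9 are not exercised); every name in it is illustrative rather than taken from the invention.

```python
# Single-process sketch of steps S1-S10 under simplifying assumptions (one client,
# in-memory server, linear model, squared loss). Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(64, 5)), rng.normal(size=64)   # local data set
eta, T, c = 0.05, 50, 2                                # step length, total iterations, threshold c

def loss(w):                       # objective function Q averaged over the local data
    return 0.5 * np.mean((X @ w - y) ** 2)

def grad(w):                       # gradient of Q (stand-in for the BP computation)
    return X.T @ (X @ w - y) / len(y)

class Server:
    def __init__(self, dim): self.w = np.zeros(dim)
    def upload(self, g): self.w -= eta * g             # server-side update with an uploaded gradient
    def download(self): return self.w.copy()

server = Server(dim=5)
t_p, alpha_p = 0, 0                                    # S1: iteration and extra-gradient counters
w_p = server.download()                                # S1: initial global model parameters

while True:
    t_p += 1                                           # S2
    if t_p > T:
        break
    g_p = grad(w_p)                                    # S3: gradient update on local data
    server.upload(g_p)                                 # S3: upload the gradient update

    if alpha_p >= c:                                   # S4 -> S10: too many extra rounds in a row
        alpha_p = 0
        w_p = server.download()
        continue

    w_tilde = w_p - eta * g_p                          # S5: update the local model
    g_tilde = grad(w_tilde)                            # S5: additional gradient update

    w_s = server.download()                            # S7: latest global parameters
    w_star = w_s - eta * g_tilde                       # S7: update the copy with the extra gradient
    alpha_p += 1

    if loss(w_star) < loss(w_s):                       # S8: keep whichever copy has the smaller loss
        server.w = w_star                              # the updated copy replaces the global parameters
        w_p = w_star
    else:
        w_p = w_s

print("final loss:", loss(server.download()))
```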
Further, the gradient update parameter is computed as:

$g_p^{t_p} = \nabla Q(w_p^{t_p}, z)$

where $g_p^{t_p}$ is the gradient update parameter, $\nabla Q$ denotes the derivative of the objective function with respect to the parameters, $w_p^{t_p}$ are the local model parameters, $z$ is a data sample of the local data set, and $Q$ is the objective function.
The beneficial effects of the further scheme are as follows:
and calculating to obtain gradient updating parameters, and updating global model parameters by combining the gradient updating parameters.
Further, the additional gradient update parameter is computed as:

$\tilde{g}_p^{t_p} = \nabla Q(\tilde{w}_p^{t_p}, z)$

where $\tilde{g}_p^{t_p}$ is the additional gradient update parameter, $\nabla Q$ denotes the derivative of the objective function with respect to the parameters, $\tilde{w}_p^{t_p}$ are the updated local model parameters, $z$ is a data sample of the local data set, and $Q$ is the objective function.
The beneficial effects of the further scheme are as follows:
and performing an additional round of gradient updating to obtain additional gradient updating parameters, and updating the global model parameter copy.
Further, the update formula for updating the local model parameters with the gradient update parameters in step S5 is:

$\tilde{w}_p^{t_p} = w_p^{t_p} - \eta\, g_p^{t_p}$

where $g_p^{t_p}$ is the gradient update parameter, $\eta$ is the training step length, $w_p^{t_p}$ are the local model parameters, and $\tilde{w}_p^{t_p}$ are the updated local model parameters.
The beneficial effects of the further scheme are as follows:
and the local model is updated, an additional round of gradient calculation is performed, and the utilization rate of calculation resources is improved.
Further, the update formula for updating the global model parameter copy with the additional gradient update parameters in step S7 is:

$w_s^* = w_s' - \eta\, \tilde{g}_p^{t_p}$

where $\tilde{g}_p^{t_p}$ is the additional gradient update parameter, $w_s'$ is the global model parameter copy, and $w_s^*$ is the updated global model parameter copy.
The beneficial effects of the further scheme are as follows:
preparation is made for re-determining the latest global model parameters for the loss function values corresponding to the latest global model parameters and the updated copy of the global model parameters in step S8.
Further, step S8 specifically comprises:
comparing the loss function values $loss(w_s)$ and $loss(w_s^*)$ respectively corresponding to the latest global model parameters $w_s$ and the updated global model parameter copy $w_s^*$; if $loss(w_s)$ is smaller than $loss(w_s^*)$, the updated copy $w_s^*$ is discarded; otherwise, the updated copy $w_s^*$ is taken as the latest global model parameters $w_s$; the method then returns to step S2.
The beneficial effects of the further scheme are as follows:
and updating the gradient obtained by the extra gradient calculation, updating the copy after obtaining the global model parameter by next downloading, replacing the original global model parameter with the copy if the updated copy has a smaller loss function value than the original global model parameter, and otherwise discarding the copy to enable the training model to be converged more quickly, thereby improving the training efficiency and shortening the training time.
Further, the receiving of the signal issued by the server that allows the latest global model parameters to be downloaded comprises the following steps:
s61, initializing global model parameters, an iteration number difference threshold value between the fastest device and the slowest device and a target loss function value;
s62, updating the initial global model parameters by using the gradient updating parameters uploaded to the server in the step S3 to obtain updated global model parameters;
s63, judging whether the loss function value corresponding to the updated global model parameter is smaller than the target loss function value, if so, stopping training, otherwise, entering the step S64;
s64, judging whether the iteration number difference between the fastest device and the slowest device meets an iteration number difference threshold value, if so, entering a step S65, otherwise, entering a step S66;
s65, sending a signal for allowing the latest global model parameter to be downloaded to other equipment except the fastest equipment, and returning to the step S62;
s66, a signal for allowing the latest global model parameter to be downloaded is sent to each device, and the process returns to step S62.
The beneficial effects of the further scheme are as follows:
After the SSP threshold is reached, the fastest device need not wait: it updates its local model directly with the locally computed gradient and then performs an additional round of gradient computation using the local model and data. This reduces the device's waiting time and improves the utilization of computing resources.
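The server-side steps S61 to S66 can be sketched as below, under the assumption of a hypothetical broadcast callback that delivers the download-allowed signal to a given set of devices; the class and method names are illustrative.

```python
# Sketch of the server loop in steps S61-S66; `broadcast` is an assumed callback.
import numpy as np

class FLServer:
    def __init__(self, dim, eta, m, target_loss, loss_fn, broadcast):
        self.w = np.zeros(dim)        # S61: initial global model parameters
        self.eta = eta                # training step length
        self.m = m                    # S61: iteration-gap threshold between fastest and slowest
        self.target_loss = target_loss
        self.loss_fn = loss_fn
        self.broadcast = broadcast    # broadcast(device_ids): send the download-allowed signal
        self.iters = {}               # completed iterations per device

    def on_gradient(self, device_id, grad):
        self.iters[device_id] = self.iters.get(device_id, 0) + 1
        self.w -= self.eta * grad                          # S62: update the global parameters
        if self.loss_fn(self.w) < self.target_loss:        # S63: target loss reached, stop training
            return "stop"
        fastest_id = max(self.iters, key=self.iters.get)
        gap = self.iters[fastest_id] - min(self.iters.values())
        if gap >= self.m:                                  # S64 -> S65: signal all but the fastest device
            self.broadcast([d for d in self.iters if d != fastest_id])
        else:                                              # S66: signal every device
            self.broadcast(list(self.iters))
        return "continue"

# Example wiring with a trivial loss and a print-based broadcast:
server = FLServer(dim=3, eta=0.1, m=3, target_loss=1e-6,
                  loss_fn=lambda w: float(np.sum(w ** 2)),
                  broadcast=lambda ids: print("download allowed for:", ids))
server.on_gradient("dev_a", np.array([0.2, -0.1, 0.0]))
```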
Further, the update formula for updating the initial global model parameters by using the gradient update parameters in step S62 is as follows:
Figure BDA0003077532780000071
wherein, the first and the second end of the pipe are connected with each other,
Figure BDA0003077532780000072
the parameters are updated for the purpose of the gradient,
Figure BDA0003077532780000073
is an initial global model parameter, eta is a training step length, wsIs the updated global model parameters.
The beneficial effects of the further scheme are as follows:
The global model parameters are updated, promoting the convergence of the global model and advancing the overall training progress.
Drawings
FIG. 1 is a prior art federated learning basic framework;
FIG. 2 is a schematic flow chart of a federated learning training acceleration method for computing resource heterogeneity provided in the present invention;
fig. 3 is a flowchart illustrating the substep of step S6.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding by those skilled in the art. It should be understood, however, that the invention is not limited to the scope of these embodiments; to those skilled in the art, various changes are apparent as long as they remain within the spirit and scope of the invention as defined in the appended claims, and all matter produced using the inventive concept is protected.
As shown in fig. 2, an embodiment of the present invention provides a federated learning training acceleration method for computing resource heterogeneity, including the following steps S1 to S10:
S1, initializing the local iteration times $t_p$, the continuous additional gradient calculation times $\alpha_p$, the continuous additional gradient calculation time threshold $c$, and the total iteration times $T$, and downloading the initial global model parameters $w_s^0$ from the server;

In this embodiment, all edge devices are connected to a single server through a physical channel, and acquire the initial global model parameters $w_s^0$ from the server through the physical channel.
S2, adding 1 to the local iteration times $t_p$ and entering the iteration process, and judging whether the local iteration times $t_p$ meet the total iteration times $T$; if so, training is finished, otherwise go to step S3;
S3, storing the latest global model parameters downloaded from the server as the local model parameters $w_p^{t_p}$, calculating the gradient update parameters with the BP algorithm in combination with the local data set, and uploading the gradient update parameters to the server;
The gradient update parameters are calculated through the BP algorithm according to the following formula:

$g_p^{t_p} = \nabla Q(w_p^{t_p}, z)$

where $g_p^{t_p}$ is the gradient update parameter, $\nabla Q$ denotes the derivative of the objective function with respect to the parameters, $w_p^{t_p}$ are the local model parameters, $z$ is a data sample of the local data set, and $Q$ is the objective function.
In this embodiment, the local data set is a data set generated at the edge device.
In this embodiment, the work flow of the BP algorithm includes the following steps:
First, an input example is provided to the input-layer neurons, and the signal is propagated forward layer by layer until the output layer produces a result; the error is then propagated backward to the hidden-layer neurons; the connection weights and thresholds are adjusted according to the errors of the hidden-layer neurons; finally, this process is iterated until a preset stopping condition is reached.
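As a concrete illustration of this workflow (not taken from the patent), the following sketch trains a one-hidden-layer network with sigmoid activation and squared loss by backpropagation; the network shape and the random data are arbitrary.

```python
# Backpropagation sketch for a one-hidden-layer network with sigmoid activation
# and squared loss; shapes and data are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 4))                 # input examples
y = rng.normal(size=(16, 1))                 # targets
W1, b1 = rng.normal(size=(4, 8)) * 0.1, np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)) * 0.1, np.zeros(1)
eta = 0.1

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

for step in range(100):                      # iterate until a stopping condition (here: fixed steps)
    # forward pass: propagate the signal layer by layer to the output
    h = sigmoid(X @ W1 + b1)
    out = h @ W2 + b2
    # backward pass: propagate the error back to the hidden layer
    d_out = (out - y) / len(y)               # gradient of 0.5 * mean squared error w.r.t. the output
    d_h = (d_out @ W2.T) * h * (1 - h)       # error at the hidden-layer neurons
    # adjust connection weights and thresholds (biases) from the errors
    W2 -= eta * h.T @ d_out;  b2 -= eta * d_out.sum(axis=0)
    W1 -= eta * X.T @ d_h;    b1 -= eta * d_h.sum(axis=0)

h = sigmoid(X @ W1 + b1)                     # final forward pass for reporting
print("final squared loss:", float(0.5 * np.mean((h @ W2 + b2 - y) ** 2)))
```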
S4, judging whether the continuous additional gradient calculation times $\alpha_p$ meet the continuous additional gradient calculation time threshold $c$; if so, go to step S10, otherwise go to step S5;
in this embodiment, the threshold c of the continuous extra-gradient computation time is a hyper-parameter, which is set manually and generally ranges from 1 to 10.
S5, updating the local model parameters $w_p^{t_p}$ with the gradient update parameters $g_p^{t_p}$ obtained in step S3 to obtain the updated local model parameters $\tilde{w}_p^{t_p}$; the update formula is:

$\tilde{w}_p^{t_p} = w_p^{t_p} - \eta\, g_p^{t_p}$

where $\eta$ is the training step length;

then calculating an additional round of gradient update with the BP algorithm in combination with the local data set to obtain the additional gradient update parameters $\tilde{g}_p^{t_p}$, computed as:

$\tilde{g}_p^{t_p} = \nabla Q(\tilde{w}_p^{t_p}, z)$

where $\tilde{g}_p^{t_p}$ is the additional gradient update parameter, $\nabla Q$ denotes the derivative of the objective function with respect to the parameters, $\tilde{w}_p^{t_p}$ are the updated local model parameters, $z$ is a data sample of the local data set, and $Q$ is the objective function.
S6, receiving a signal which is sent by the server and allows the latest global model parameter to be downloaded, judging whether the extra gradient calculation is completed, if so, entering the step S7, otherwise, entering the step S9;
in this embodiment, the signal for allowing the latest global model parameter to be downloaded and sent by the server is received through the physical channel.
As shown in fig. 3, the receiving of the signal for allowing the latest global model parameter to be downloaded from the server includes the following steps S61 to S66:
S61, initializing the global model parameters $w_s$, the threshold $m$ on the iteration number difference between the fastest device and the slowest device, and the target loss function value $loss_{inf}$;
In this embodiment, the threshold m of the difference between the iteration times of the fastest device and the slowest device is a hyper-parameter, which is set manually, and the range is generally 1 to 10.
S62, updating the initial global model parameters $w_s^0$ with the gradient update parameters $g_p^{t_p}$ uploaded to the server in step S3 to obtain the updated global model parameters $w_s$; the update formula is:

$w_s = w_s^0 - \eta\, g_p^{t_p}$

where $\eta$ is the training step length;
S63, judging whether the loss function value $loss(w_s)$ corresponding to the updated global model parameters $w_s$ is smaller than the target loss function value $loss_{inf}$; if so, training is stopped, otherwise go to step S64;
s64, judging whether the iteration number difference between the fastest device and the slowest device meets a preset iteration number difference threshold value m, if so, entering a step S65, otherwise, entering a step S66;
s65, sending a signal for allowing the latest global model parameter to be downloaded to other equipment except the fastest equipment, and returning to the step S62;
s66, a signal for allowing the latest global model parameter to be downloaded is sent to each device, and the process returns to step S62.
In this embodiment, the server is connected to all edge devices participating in federal learning through a physical channel.
S7, downloading the latest global model parameters $w_s$ from the server and copying them to obtain the global model parameter copy $w_s'$, updating the copy with the additional gradient update parameters $\tilde{g}_p^{t_p}$ obtained in step S5 to obtain $w_s^*$, and adding 1 to the continuous additional gradient calculation times $\alpha_p$; the update formula for updating the copy $w_s'$ with the additional gradient update parameters is:

$w_s^* = w_s' - \eta\, \tilde{g}_p^{t_p}$

where $\eta$ is the training step length;
S8, re-determining the latest global model parameters $w_s$ according to the loss function values corresponding to the latest global model parameters $w_s$ and the updated global model parameter copy $w_s^*$, and returning to step S2;

In this embodiment, the loss function values $loss(w_s)$ and $loss(w_s^*)$ respectively corresponding to the latest global model parameters $w_s$ and the updated copy $w_s^*$ are compared; if $loss(w_s)$ is smaller than $loss(w_s^*)$, the updated copy $w_s^*$ is discarded; otherwise, the updated copy $w_s^*$ is taken as the latest global model parameters $w_s$, and the method returns to step S2.
S9, immediately stopping the additional gradient calculation, downloading the latest global model parameters $w_s$ from the server, initializing the continuous additional gradient calculation times $\alpha_p$, and returning to step S2;

S10, initializing the continuous additional gradient calculation times $\alpha_p$, receiving the signal issued by the server that allows the latest global model parameters $w_s$ to be downloaded, downloading them, and returning to step S2.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principle and the implementation mode of the invention are explained by applying specific embodiments in the invention, and the description of the embodiments is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Those skilled in the art, having the benefit of this disclosure, may effect numerous modifications thereto and changes may be made without departing from the scope of the invention in its aspects.

Claims (8)

1. A federated learning training acceleration method for computing resource heterogeneity, characterized by comprising the following steps:
S1, initializing local iteration times, continuous extra gradient calculation times, a continuous extra gradient calculation time threshold value and total iteration times, and downloading initial global model parameters from a server;
S2, adding 1 to the local iteration times, judging whether the local iteration times meet the total iteration times, if so, ending the training, otherwise, entering the step S3;
S3, storing the latest global model parameter downloaded from the server as a local model parameter, performing gradient update by using a BP algorithm in combination with a local data set to obtain a gradient update parameter, and uploading the gradient update parameter to the server;
S4, judging whether the continuous extra gradient calculation times meet the continuous extra gradient calculation time threshold, if yes, entering a step S10, otherwise, entering a step S5;
S5, updating the local model parameters by using the gradient update parameters obtained in the step S3 to obtain updated local model parameters, and performing an additional gradient update by using a BP algorithm in combination with the local data set to obtain additional gradient update parameters;
S6, receiving a signal which is issued by the server and allows the latest global model parameter to be downloaded, and judging whether the extra gradient calculation is completed, if so, entering a step S7, otherwise, entering a step S9;
S7, downloading the latest global model parameters from the server and copying to obtain a global model parameter copy, updating the global model parameter copy by using the extra gradient update parameters obtained in the step S5, and adding 1 to the continuous extra gradient calculation times;
S8, re-determining the latest global model parameters according to the latest global model parameters and the loss function values corresponding to the updated global model parameter copies, and returning to the step S2;
S9, stopping the extra gradient calculation immediately, downloading the latest global model parameters from the server, initializing the calculation times of the continuous extra gradient, and returning to the step S2;
S10, initializing continuous additional gradient calculation times, receiving a signal which is issued by the server and allows the latest global model parameter to be downloaded, downloading the latest global model parameter, and returning to the step S2.
2. The method as claimed in claim 1, wherein the gradient update parameter is calculated according to the following formula:

$g_p^{t_p} = \nabla Q(w_p^{t_p}, z)$

where $g_p^{t_p}$ is the gradient update parameter, $\nabla Q$ denotes the derivative of the objective function with respect to the parameters, $w_p^{t_p}$ are the local model parameters, $z$ is a data sample of the local data set, and $Q$ is the objective function.
3. The method as claimed in claim 1, wherein the additional gradient update parameter is calculated according to the following formula:

$\tilde{g}_p^{t_p} = \nabla Q(\tilde{w}_p^{t_p}, z)$

where $\tilde{g}_p^{t_p}$ is the additional gradient update parameter, $\nabla Q$ denotes the derivative of the objective function with respect to the parameters, $\tilde{w}_p^{t_p}$ are the updated local model parameters, $z$ is a data sample of the local data set, and $Q$ is the objective function.
4. The method as claimed in claim 1, wherein the update formula for updating the local model parameters by using the gradient update parameters in step S5 is:

$\tilde{w}_p^{t_p} = w_p^{t_p} - \eta\, g_p^{t_p}$

where $g_p^{t_p}$ is the gradient update parameter, $\eta$ is the training step length, $w_p^{t_p}$ are the local model parameters, and $\tilde{w}_p^{t_p}$ are the updated local model parameters.
5. The method as claimed in claim 4, wherein the update formula for updating the global model parameter copy with the additional gradient update parameters in step S7 is:

$w_s^* = w_s' - \eta\, \tilde{g}_p^{t_p}$

where $\tilde{g}_p^{t_p}$ is the additional gradient update parameter, $w_s'$ is the global model parameter copy, and $w_s^*$ is the updated global model parameter copy.
6. The method as claimed in claim 2, wherein step S8 specifically comprises:
comparing the loss function values $loss(w_s)$ and $loss(w_s^*)$ respectively corresponding to the latest global model parameters $w_s$ and the updated global model parameter copy $w_s^*$; if $loss(w_s)$ is smaller than $loss(w_s^*)$, the updated copy $w_s^*$ is discarded; otherwise, the updated copy $w_s^*$ is taken as the latest global model parameters $w_s$, and the method returns to step S2.
7. The federated learning training acceleration method oriented to computing resource heterogeneity according to claim 1, wherein the receiving of the signal transmitted by the server for allowing the latest global model parameters to be downloaded comprises the following steps:
S61, initializing global model parameters, an iteration number difference threshold value between the fastest device and the slowest device and a target loss function value;
s62, updating the initial global model parameters by using the gradient update parameters uploaded to the server in the step S3 to obtain updated global model parameters;
s63, judging whether the loss function value corresponding to the updated global model parameter is smaller than the target loss function value, if yes, stopping training, otherwise, entering the step S64;
s64, judging whether the iteration number difference between the fastest device and the slowest device meets the iteration number difference threshold value, if yes, entering a step S65, otherwise, entering a step S66;
s65, sending a signal for allowing the latest global model parameter to be downloaded to other equipment except the fastest equipment, and returning to the step S62;
s66, a signal for allowing the latest global model parameter to be downloaded is sent to each device, and the process returns to step S62.
8. The method of claim 7, wherein the update formula for updating the initial global model parameters by using the gradient update parameters in step S62 is:

$w_s = w_s^0 - \eta\, g_p^{t_p}$

where $g_p^{t_p}$ is the gradient update parameter, $w_s^0$ are the initial global model parameters, $\eta$ is the training step length, and $w_s$ are the updated global model parameters.
CN202110556962.5A 2021-05-21 2021-05-21 Federated learning training acceleration method for computing resource heterogeneity Expired - Fee Related CN113191504B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110556962.5A CN113191504B (en) 2021-05-21 2021-05-21 Federated learning training acceleration method for computing resource isomerism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110556962.5A CN113191504B (en) 2021-05-21 2021-05-21 Federated learning training acceleration method for computing resource isomerism

Publications (2)

Publication Number Publication Date
CN113191504A CN113191504A (en) 2021-07-30
CN113191504B true CN113191504B (en) 2022-06-28

Family

ID=76984715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110556962.5A Expired - Fee Related CN113191504B (en) 2021-05-21 2021-05-21 Federated learning training acceleration method for computing resource isomerism

Country Status (1)

Country Link
CN (1) CN113191504B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113778966B (en) * 2021-09-15 2024-03-26 深圳技术大学 Cross-school information sharing method and related device for university teaching and course score
CN113902128B (en) * 2021-10-12 2022-09-16 中国人民解放军国防科技大学 Asynchronous federal learning method, device and medium for improving utilization efficiency of edge device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111611610A (en) * 2020-04-12 2020-09-01 西安电子科技大学 Federal learning information processing method, system, storage medium, program, and terminal
CN112181971A (en) * 2020-10-27 2021-01-05 华侨大学 Edge-based federated learning model cleaning and equipment clustering method, system, equipment and readable storage medium
CN112288100A (en) * 2020-12-29 2021-01-29 支付宝(杭州)信息技术有限公司 Method, system and device for updating model parameters based on federal learning
CN112817653A (en) * 2021-01-22 2021-05-18 西安交通大学 Cloud-side-based federated learning calculation unloading computing system and method
CN112818394A (en) * 2021-01-29 2021-05-18 西安交通大学 Self-adaptive asynchronous federal learning method with local privacy protection

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10546237B2 (en) * 2017-03-30 2020-01-28 Atomwise Inc. Systems and methods for correcting error in a first classifier by evaluating classifier output in parallel
US11836583B2 (en) * 2019-09-09 2023-12-05 Huawei Cloud Computing Technologies Co., Ltd. Method, apparatus and system for secure vertical federated learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111611610A (en) * 2020-04-12 2020-09-01 西安电子科技大学 Federal learning information processing method, system, storage medium, program, and terminal
CN112181971A (en) * 2020-10-27 2021-01-05 华侨大学 Edge-based federated learning model cleaning and equipment clustering method, system, equipment and readable storage medium
CN112288100A (en) * 2020-12-29 2021-01-29 支付宝(杭州)信息技术有限公司 Method, system and device for updating model parameters based on federal learning
CN112817653A (en) * 2021-01-22 2021-05-18 西安交通大学 Cloud-side-based federated learning calculation unloading computing system and method
CN112818394A (en) * 2021-01-29 2021-05-18 西安交通大学 Self-adaptive asynchronous federal learning method with local privacy protection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ClusterGrad: Adaptive Gradient Compression by Clustering in Federated Learning; Laizhong Cui; GLOBECOM 2020 - 2020 IEEE Global Communications Conference; 2021-01-25; full text *
Research on a fused federated learning mechanism for heterogeneous edge nodes (面向异构边缘节点的融合联邦学习机制研究); 廖钰盈; China Excellent Master's Theses Electronic Journal (中国优秀硕士电子期刊); 2021-05-15; full text *

Also Published As

Publication number Publication date
CN113191504A (en) 2021-07-30

Similar Documents

Publication Publication Date Title
CN111784002B (en) Distributed data processing method, device, computer equipment and storage medium
CN113191504B (en) Federated learning training acceleration method for computing resource heterogeneity
CN110889509B (en) Gradient momentum acceleration-based joint learning method and device
CN108416440A (en) A kind of training method of neural network, object identification method and device
EP3889846A1 (en) Deep learning model training method and system
CN113011568B (en) Model training method, data processing method and equipment
WO2023103864A1 (en) Node model updating method for resisting bias transfer in federated learning
CN112163601A (en) Image classification method, system, computer device and storage medium
CN111598213A (en) Network training method, data identification method, device, equipment and medium
WO2023231954A1 (en) Data denoising method and related device
WO2021169366A1 (en) Data enhancement method and apparatus
WO2023020613A1 (en) Model distillation method and related device
CN112446462B (en) Method and device for generating target neural network model
CN115526307A (en) Network model compression method and device, electronic equipment and storage medium
CN115238909A (en) Data value evaluation method based on federal learning and related equipment thereof
CN114358250A (en) Data processing method, data processing apparatus, computer device, medium, and program product
CN115907041A (en) Model training method and device
CN114758130B (en) Image processing and model training method, device, equipment and storage medium
CN115795025A (en) Abstract generation method and related equipment thereof
CN114723069A (en) Parameter updating method and device and electronic equipment
CN110782017B (en) Method and device for adaptively adjusting learning rate
CN113033422A (en) Face detection method, system, equipment and storage medium based on edge calculation
CN112379688B (en) Multi-robot finite time synchronization control method based on membrane calculation
WO2023143128A1 (en) Data processing method and related device
CN116419251A (en) Cell load adjusting method and related equipment thereof

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220628

CF01 Termination of patent right due to non-payment of annual fee