WO2024099109A1 - Federated learning model training method, apparatus, device and storage medium - Google Patents
Federated learning model training method, apparatus, device and storage medium
- Publication number
- WO2024099109A1 (PCT/CN2023/127265)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- round
- global
- local
- model parameters
- difference
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
Definitions
- the present application relates to the field of model training technology, and in particular to a federated learning model training method, apparatus, device and storage medium.
- Federated learning is a distributed framework that decouples data and models, which can solve the problems of data silos and privacy protection.
- when a model is trained based on federated learning, joint modeling by all participants can be achieved without the data leaving the participants' local environments.
- the trained federated learning model (also called the global model) can be shared and deployed among the participants.
- Federated learning has broad application prospects in the fields of smart healthcare, financial insurance, and smart Internet of Things.
- federated learning faces severe challenges brought by the problem of non-independent and identically distributed data.
- the problem of non-independent and identically distributed data means that the data distribution owned by each participant is inconsistent with the global distribution; this inconsistency may cause the model to converge too slowly and may damage the accuracy of the model.
- therefore, how to improve the convergence speed and accuracy of a federated learning model trained on non-independent and identically distributed data is a technical problem that urgently needs to be solved.
- the present application provides a federated learning model training method, apparatus, device and storage medium to improve the convergence speed and accuracy of the federated learning model.
- the present application provides a federated learning model training method, which is applied to a client, and the method includes performing at least the following steps in each round of iterative training of the federated learning model:
- receiving the current-round global model parameters and the current-round global parameter control variable of the federated learning model to be trained sent by the server;
- updating the local model parameters of the currently saved federated learning sub-model using the current-round global model parameters;
- determining a local parameter gradient based on the updated local model parameters and the current-round global parameter control variable, and determining the local model parameters output in this round based on the local parameter gradient; and determining a global sub-parameter gradient based on the updated local model parameters;
- determining a first difference between the local model parameters output in this round and the current-round global model parameters; determining a second difference between the global sub-parameter gradient and the currently saved previous-round local parameter control variable; and sending the first difference and the second difference to the server, so that the server determines the next-round global model parameters and the next-round global parameter control variable based on the first differences and second differences sent by the clients.
- in a possible implementation, determining the local parameter gradient based on the updated local model parameters and the current-round global parameter control variable includes:
- determining a loss value based on the updated local model parameters and sample data;
- correcting the loss value based on the currently saved previous-round local parameter control variable and the current-round global parameter control variable;
- determining the local parameter gradient based on the corrected loss value.
- in a possible implementation, correcting the loss value based on the currently saved previous-round local parameter control variable and the current-round global parameter control variable includes:
- determining a third difference between the previous-round local parameter control variable and the current-round global parameter control variable;
- correcting the loss value based on the third difference and a set loss-value adjustment rate.
- in a possible implementation, determining the local model parameters output in this round based on the local parameter gradient includes:
- correcting the local parameter gradient based on the currently saved previous-round local parameter control variable and the current-round global parameter control variable;
- determining the local model parameters output in this round based on the corrected local parameter gradient and the updated local model parameters.
- in a possible implementation, correcting the local parameter gradient based on the currently saved previous-round local parameter control variable and the current-round global parameter control variable includes:
- determining the third difference between the previous-round local parameter control variable and the current-round global parameter control variable;
- correcting the local parameter gradient based on the third difference and a set drift adjustment rate.
- in a possible implementation, determining the local model parameters output in this round based on the corrected local parameter gradient and the updated local model parameters includes:
- determining the product of the corrected local parameter gradient and a set local learning rate; and determining the local model parameters output in this round based on the product and the updated local model parameters.
- the method further includes:
- the global sub-parameter gradient is determined as the local parameter control variable of this round.
- the present application provides a federated learning model training method, which is applied to a server, and the method includes performing at least the following steps in each round of iterative training of the federated learning model:
- if a first difference, sent by each client, between the local model parameters output in the previous round and the previous-round global model parameters, and a second difference between each client's previous-round global sub-parameter gradient and that client's local parameter control variable of the round before last are received, determining the current-round global model parameters and the current-round global parameter control variable of the federated learning model to be trained based on each first difference and second difference;
- sending the current-round global model parameters and the current-round global parameter control variable to each client.
- in a possible implementation, determining the current-round global model parameters and the current-round global parameter control variable of the federated learning model to be trained based on each first difference and second difference includes:
- correcting the previous-round global model parameters based on each first difference and a set global learning rate to obtain the current-round global model parameters;
- correcting the previous-round global parameter control variable based on each second difference to obtain the current-round global parameter control variable.
- the present application provides a federated learning model training system, the system comprising:
- the server is used to perform at least the following steps in each round of iterative training of the federated learning model: if a first difference between the local model parameters output in the previous round and the previous-round global model parameters, and a second difference between each client's previous-round global sub-parameter gradient and that client's local parameter control variable of the round before last are received, determine the current-round global model parameters and the current-round global parameter control variable of the federated learning model to be trained based on the first differences and second differences; and send the current-round global model parameters and the current-round global parameter control variable to each client;
- Each client is used to perform at least the following steps during each round of iterative training of the federated learning model: receiving the global model parameters of this round and the global parameter control variables of this round sent by the server; using the global model parameters of this round to update the local model parameters of the currently saved federated learning sub-model; determining the local parameter gradient based on the updated local model parameters and the global parameter control variables of this round, and determining the local model parameters output in this round based on the local parameter gradient; and determining the global sub-parameter gradient based on the updated local model parameters; determining the first difference between the local model parameters output in this round and the global model parameters of this round; and determining the second difference between the global sub-parameter gradient and the currently saved local parameter control variables of the previous round; and sending the first difference and the second difference to the server.
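- for readability, the quantities recited in the two steps above can be written compactly; the symbols below are editorial shorthand introduced in this text, since the original formulas are rendered as images in the source: $x^t$ for the round-$t$ global model parameters, $c^t$ for the global parameter control variable, $y_i^t$ for client $i$'s output local model parameters, $h_i^t$ for its global sub-parameter gradient, $c_i^t$ for its local parameter control variable, $\eta_{global}$ for the set global learning rate, and $N$ for the number of clients:

```latex
\begin{aligned}
\text{first difference of client } i&:\quad y_i^{t-1} - x^{t-1}\\
\text{second difference of client } i&:\quad h_i^{t-1} - c_i^{t-2}\\
x^{t} &= x^{t-1} + \eta_{global}\cdot\frac{1}{N}\sum_{i=1}^{N}\bigl(y_i^{t-1} - x^{t-1}\bigr)\\
c^{t} &= c^{t-1} + \frac{1}{N}\sum_{i=1}^{N}\bigl(h_i^{t-1} - c_i^{t-2}\bigr)
\end{aligned}
```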
- the present application provides a federated learning model training device, the device comprising:
- a receiving module used for receiving the global model parameters of the federated learning model to be trained and the global parameter control variables of the current round sent by the server during each round of iterative training of the federated learning model;
- An updating module used to update the local model parameters of the currently saved federated learning sub-model using the global model parameters of this round;
- a first determination module used to determine the local parameter gradient based on the updated local model parameters and the current-round global parameter control variable, determine the local model parameters output in this round based on the local parameter gradient, and determine the global sub-parameter gradient based on the updated local model parameters;
- the first sending module is used to determine the first difference between the local model parameters output in this round and the global model parameters in this round; and determine the second difference between the global sub-parameter gradient and the currently saved local parameter control variables of the previous round; send the first difference and the second difference to the server, so that the server determines the global model parameters and the global parameter control variables of the next round based on the first difference and the second difference sent by each client.
- the first determining module is specifically configured to:
- determine a loss value based on the updated local model parameters and sample data; correct the loss value based on the currently saved previous-round local parameter control variable and the current-round global parameter control variable; and determine the local parameter gradient based on the corrected loss value.
- the first determining module is specifically configured to:
- determine the third difference between the previous-round local parameter control variable and the current-round global parameter control variable; and correct the loss value based on the third difference and the set loss-value adjustment rate.
- the first determining module is specifically configured to:
- correct the local parameter gradient based on the currently saved previous-round local parameter control variable and the current-round global parameter control variable; and determine the local model parameters output in this round based on the corrected local parameter gradient and the updated local model parameters.
- the first determining module is specifically configured to:
- determine the third difference between the previous-round local parameter control variable and the current-round global parameter control variable; and correct the local parameter gradient based on the third difference and the set drift adjustment rate.
- the first determining module is specifically configured to:
- determine the product of the corrected local parameter gradient and the set local learning rate; and determine the local model parameters output in this round based on the product and the updated local model parameters.
- the first determining module is further configured to:
- determine the global sub-parameter gradient as the local parameter control variable of this round.
- the present application provides a federated learning model training device, the device comprising:
- a second determination module used to determine, during each round of iterative training of the federated learning model, the current-round global model parameters and the current-round global parameter control variable of the federated learning model to be trained based on each first difference and second difference, if a first difference between the local model parameters output in the previous round and the previous-round global model parameters, and a second difference between each client's previous-round global sub-parameter gradient and that client's local parameter control variable of the round before last are received;
- the second sending module is used to send the current round global model parameters and the current round global parameter control variables to each client.
- the second determining module is specifically configured to:
- correct the previous-round global model parameters based on each first difference and the set global learning rate to obtain the current-round global model parameters; and correct the previous-round global parameter control variable based on each second difference to obtain the current-round global parameter control variable.
- the present application provides an electronic device, which includes at least a processor and a memory, and the processor is used to implement the steps of any of the above methods when executing a computer program stored in the memory.
- the present application provides a computer-readable storage medium storing a computer program, which implements the steps of any of the above methods when executed by a processor.
- the present application provides a computer program product, comprising: a computer program code, which, when executed on a computer, enables the computer to execute the steps of any of the above-described methods.
- the local parameter control variable can also be called the client model parameter update direction, or the federated learning sub-model parameter update direction;
- global parameter control variable can also be called federated learning model parameter update direction, or server model parameter update direction, that is, the server of this application can comprehensively determine the next round of model parameter update direction of the federated learning model based on the global sub-parameter gradient of this round sent by each client and the model parameter update direction of the client model in the previous round;
- each client can determine the client local parameter gradient based on the global parameter control variable of this round sent by the server, that is, each client can determine the client local parameter gradient based on the global model parameter update direction of this round sent by the server, and then determine the local model parameters output by the client in this round.
- the clients of this application can constrain each other, and each client can refer to the parameter update direction of other clients during the iterative training process, etc., to adjust the parameters of the client local federated learning sub-model, thereby effectively solving the client drift problem and improving the accuracy of the trained federated learning model.
- the federated learning sub-model in the client can be pulled back to the ideal update path for update during each update of the iterative training, which can significantly reduce the number of communications between the client and the server and the number of iterative training rounds of the federated learning model, and significantly improve the convergence speed of the federated learning model.
- FIG1 shows a schematic diagram of a first federated learning model training process provided by some embodiments
- FIG2 shows a schematic diagram of a second federated learning model training process provided by some embodiments
- FIG3 shows a schematic diagram of a third federated learning model training process provided by some embodiments.
- FIG4 shows a schematic diagram of a fourth federated learning model training process provided by some embodiments.
- FIG5 shows a schematic diagram of a federated learning model training system provided by some embodiments
- FIG6 shows a schematic diagram of a federated learning model training device provided by some embodiments
- FIG7 shows a schematic diagram of another federated learning model training device provided in some embodiments.
- FIG8 shows a schematic diagram of the structure of an electronic device provided in some embodiments.
- in order to improve the convergence speed and accuracy of the federated learning model, the present application provides a federated learning model training method, apparatus, device and storage medium.
- as used herein, a module refers to any known or later-developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functions associated with that element.
- FIG1 shows a schematic diagram of a first federated learning model training process provided by some embodiments, where the method is applied to a client, and exemplarily, the client may be an electronic device such as a PC or a mobile terminal. As shown in FIG1 , the client performs at least the following steps in each round of iterative training of the federated learning model:
- in each round of training, the server may determine the model parameters of the federated learning model in this round (for convenience of description, referred to as the current-round global model parameters, denoted here by $x^t$) and the parameter control variable in this round (for convenience of description, called the current-round global parameter control variable, denoted here by $c^t$), and send the current-round global model parameters $x^t$ and the current-round global parameter control variable $c^t$ to each client (also called participant) participating in this round of training.
- the federated learning model can also be called the server model, and the global parameter control variable can also be called the federated learning model parameter update direction, or the server model parameter update direction. It can also be called the parameter update direction of this round of federated learning model, or the parameter update direction of this round of server model.
- each client participating in this round of training can receive the current-round global model parameters $x^t$ and the current-round global parameter control variable $c^t$ sent by the server.
- the model stored in each client is called the federated learning sub-model; after receiving $x^t$ and $c^t$, the client can use the current-round global model parameters $x^t$ to update the parameters of the currently saved federated learning sub-model (for convenience of description, called the local model parameters, denoted here by $y_i$ for client $i$).
- that is, each client participating in this round of training can update the parameters (also called weights) of the federated learning sub-model stored locally on the client to the current-round global model parameters $x^t$.
- S103 Determine the local parameter gradient based on the updated local model parameters and the current round global parameter control variables, determine the local model parameters output in this round based on the local parameter gradient; and determine the global sub-parameter gradient based on the updated local model parameters.
- after the local model parameters have been updated to the current-round global model parameters $x^t$, the client determines the local parameter gradient based on the updated local model parameters and the current-round global parameter control variable $c^t$.
- the process may include: determining a loss value based on the updated local model parameters and sample data; correcting the loss value based on the currently saved previous-round local parameter control variable and the current-round global parameter control variable; and determining the local parameter gradient based on the corrected loss value.
- the sample data set used by the client to train the federated learning sub-model in this round of iterative training is denoted by $D_i$.
- the sample data set $D_i$ contains a number of sample data items $i$.
- when determining the loss value (also called the loss function value), the sample data $i$ may be input into the federated learning sub-model to obtain the recognition result of the federated learning sub-model, and the loss value may be determined based on, for example, the difference between the sample label corresponding to the sample data $i$ and the recognition result, which will not be described in detail here.
- each client may perform several (for convenience of description, referred to as K times) sub-trainings on the local federated learning sub-model based on the sample data, and may determine the loss value of this round of training based on the sum of the loss values corresponding to each sample data in the K sub-training processes.
- the loss value obtained in this round of training is denoted here by $F_i$.
- the previous-round local parameter control variable currently saved by the client (for ease of understanding, denoted here by $c_i^{t-1}$) and the current-round global parameter control variable $c^t$ can then be used to correct the loss value.
- when correcting the loss value based on the currently saved previous-round local parameter control variable and the current-round global parameter control variable, the client can first determine the difference between the previous-round local parameter control variable $c_i^{t-1}$ and the current-round global parameter control variable $c^t$.
- the federated learning sub-model in the client can also be called the client model, and the local parameter control variable can also be called the client model parameter update direction, or the federated learning sub-model parameter update direction. It can also be called the direction of the last round of client model parameter update, or the direction of the last round of federated learning sub-model parameter update.
- the difference between the previous-round local parameter control variable and the current-round global parameter control variable, $c_i^{t-1} - c^t$ (that is, the third difference), can be regarded as characterizing the client drift value.
- the loss value is corrected based on the client drift value, which in turn corrects the client's local parameter gradient based on the client drift value; in this way, each client can refer to the parameter update directions of other clients during iterative training to adjust the parameters of its local federated learning sub-model, thereby effectively alleviating the client drift problem, improving the accuracy of the trained federated learning model, and significantly improving the convergence speed of the federated learning model.
- the third difference and a set loss-value adjustment rate (called $\alpha$ for ease of understanding) are then used to correct the loss value.
- the difference between the fourth difference and the third difference (for convenience of description, called the fifth difference, $d_5$) can be determined.
- the loss value can be accurately corrected based on the product of the square of the fifth difference and the loss-value adjustment rate $\alpha$.
- the sum of the loss value before correction and this product can be determined as the corrected loss value.
- the corrected loss value (for ease of understanding referred to as the local update function, denoted here by $\widetilde{F}_i$) is therefore: $\widetilde{F}_i(y_i) = F_i(y_i) + \alpha\,\lVert d_5\rVert^2$
- the local parameter gradient can then be determined based on the corrected loss value:
- the local update function $\widetilde{F}_i$ can be differentiated with respect to the local model parameters $y_i$ to obtain the local parameter gradient $g_i$, that is: $g_i = \nabla_{y_i}\,\widetilde{F}_i(y_i)$
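- to make the gradient step concrete, the following is a minimal PyTorch-style sketch of computing the local update function and differentiating it; the function name, the `fifth_difference` callable, and the use of cross-entropy as the task loss are illustrative assumptions, not the patent's own code:

```python
import torch.nn.functional as F
from torch.nn.utils import parameters_to_vector

def corrected_local_gradient(model, batch, labels, fifth_difference, alpha):
    """Differentiate the local update function  F~_i = F_i + alpha * ||d5||^2.

    fifth_difference : callable mapping the flattened local model parameters
                       to the fifth difference d5 (its exact form depends on
                       the patent's fourth difference, not reproduced here)
    alpha            : the set loss-value adjustment rate
    """
    model.zero_grad()
    y = parameters_to_vector(model.parameters())          # local model parameters y_i
    base_loss = F.cross_entropy(model(batch), labels)     # F_i(y_i)
    d5 = fifth_difference(y)
    corrected_loss = base_loss + alpha * d5.pow(2).sum()  # F~_i(y_i)
    corrected_loss.backward()                             # g_i = dF~_i / dy_i
    return [p.grad.detach().clone() for p in model.parameters()]
```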
- after the client obtains the local parameter gradient $g_i$, when determining the local model parameters output in this round based on it, the client can first correct the local parameter gradient based on the currently saved previous-round local parameter control variable $c_i^{t-1}$ and the current-round global parameter control variable $c^t$.
- the third difference between the previous-round local parameter control variable and the current-round global parameter control variable can be determined first, that is, $c_i^{t-1} - c^t$; then, based on the third difference and the set drift adjustment rate $\beta$, the local parameter gradient is corrected.
- the product of the third difference and the drift adjustment rate $\beta$ can be determined, and the difference between the local parameter gradient and this product is used as the corrected local parameter gradient; that is, the corrected local parameter gradient is: $\tilde{g}_i = g_i - \beta\,(c_i^{t-1} - c^t)$
- the local model parameters output in this round can be determined based on the corrected local parameter gradient and the updated local model parameters.
- the product of the corrected local parameter gradient and the set local learning rate (for ease of understanding, represented by ⁇ local ) can be determined, and then the local model parameters output in this round are determined based on the product and the updated local model parameters.
- the difference between the updated local model parameters and this product can be determined as the local model parameters output in this round; that is, the local model parameters output in this round, $y_i^t$, can be: $y_i^t = y_i - \eta_{local}\,\tilde{g}_i$
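- putting the two corrections together, one local update can be sketched as below on flattened parameter tensors; the function and argument names are illustrative assumptions:

```python
def local_step(y, g, c_local_prev, c_global, beta, lr_local):
    """One drift-corrected local update (flattened parameter tensors).

    y            : updated local model parameters (after adopting x^t)
    g            : local parameter gradient from the corrected loss
    c_local_prev : previous-round local parameter control variable c_i^{t-1}
    c_global     : current-round global parameter control variable c^t
    """
    third_difference = c_local_prev - c_global   # third difference
    g_corrected = g - beta * third_difference    # corrected local parameter gradient
    return y - lr_local * g_corrected            # local model parameters output this round
```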
- the client can also determine the global sub-parameter gradient of this round based on the updated local model parameters (for ease of understanding, denoted here by $h_i$).
- that is, based on the updated local model parameters (the current-round global model parameters $x^t$ sent by the server) and the sample data, the client determines a loss value; the process of determining this loss value is the same as the process of determining the loss value in the above embodiment and is not repeated here.
- this loss value can be called the server-side update sub-function (denoted here by $G_i$); the server-side update sub-function $G_i$ can be differentiated with respect to the local model parameters to obtain the global sub-parameter gradient, that is: $h_i = \nabla_{y}\,G_i(y)\big|_{y = x^t}$
- the global sub-parameter gradient $h_i$ obtained in this round can be determined as the local parameter control variable of this round, $c_i^t = h_i$, for the client to use in the next round of iterative training, and is not elaborated further here.
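- a matching sketch for the global sub-parameter gradient: the uncorrected loss is evaluated at the parameters just received from the server and differentiated; names and the cross-entropy task loss are again illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def global_sub_parameter_gradient(model, batch, labels):
    """h_i: derivative of the server-side update sub-function G_i at y = x^t."""
    model.zero_grad()
    loss = F.cross_entropy(model(batch), labels)   # server-side update sub-function G_i
    loss.backward()
    # flatten the per-parameter gradients into one vector; stored as c_i^t next round
    return torch.cat([p.grad.flatten() for p in model.parameters()])
```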
- S104 Determine a first difference between the local model parameters output in this round and the global model parameters in this round; and determine a second difference between the global sub-parameter gradient and the currently saved local parameter control variables of the previous round; send the first difference and the second difference to the server, so that the server determines the global model parameters and the global parameter control variables of the next round based on the first difference and the second difference sent by each client.
- in order to improve the convergence speed and accuracy of the federated learning model, the client can determine the first difference between the local model parameters output in this round and the current-round global model parameters, $y_i^t - x^t$; in addition, the client can also determine the second difference between the global sub-parameter gradient and the saved previous-round local parameter control variable, $h_i - c_i^{t-1}$.
- the client can send the first difference and the second difference of the client to the server.
- after the server receives the first difference and the second difference sent by each client, if the federated learning model has not yet satisfied the training end condition, the server can determine the global model parameters for the next round of iterative training (for ease of understanding, referred to as the next-round global model parameters, denoted here by $x^{t+1}$) and the global parameter control variable (for ease of understanding, called the next-round global parameter control variable, denoted here by $c^{t+1}$) based on the first difference and the second difference sent by each client.
- the server may correct the current-round global model parameters $x^t$ based on each client's first difference and a set global learning rate (represented by $\eta_{global}$ for ease of understanding) to obtain the next-round global model parameters $x^{t+1}$.
- the average value of the first differences sent by the $N$ participating clients can be determined first: $\frac{1}{N}\sum_{i=1}^{N}(y_i^t - x^t)$; then the product of the global learning rate $\eta_{global}$ and this average is determined.
- the sum of the current-round global model parameters $x^t$ and this product is determined as the next-round global model parameters, namely: $x^{t+1} = x^t + \eta_{global}\cdot\frac{1}{N}\sum_{i=1}^{N}(y_i^t - x^t)$
- the server may also correct the current-round global parameter control variable $c^t$ based on the second differences sent by the clients to obtain the next-round global parameter control variable.
- the server may first determine the average value of the second differences sent by the clients, and then determine the sum of the current-round global parameter control variable $c^t$ and this average as the next-round global parameter control variable $c^{t+1}$, that is: $c^{t+1} = c^t + \frac{1}{N}\sum_{i=1}^{N}\bigl(h_i - c_i^{t-1}\bigr)$
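- on the server side, both corrections reduce to simple averaged updates over the received differences; a minimal sketch with flattened tensors and illustrative names:

```python
import torch

def server_update(x, c, first_diffs, second_diffs, lr_global):
    """Compute the next-round x^{t+1} and c^{t+1} from the clients' differences.

    first_diffs  : list of tensors, one (y_i^t - x^t) per client
    second_diffs : list of tensors, one (h_i - c_i^{t-1}) per client
    """
    x_next = x + lr_global * torch.stack(first_diffs).mean(dim=0)  # x^{t+1}
    c_next = c + torch.stack(second_diffs).mean(dim=0)             # c^{t+1}
    return x_next, c_next
```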
- the server of this application can comprehensively determine the next-round global parameter control variable based on, among other things, the differences between the current-round global sub-parameter gradients sent by the clients and the previous-round local parameter control variables. The local parameter control variable can also be called the client model parameter update direction, or the federated learning sub-model parameter update direction; the global parameter control variable can also be called the federated learning model parameter update direction, or the server model parameter update direction. That is, the server of this application can comprehensively determine the next-round model parameter update direction of the federated learning model based on the current-round global sub-parameter gradients sent by the clients and the previous-round update directions of the client models. In the present application, each client can determine its local parameter gradient based on the current-round global parameter control variable sent by the server, that is, based on the current-round global model parameter update direction sent by the server, and then determine the local model parameters output by the client in this round.
- the clients of the present application can constrain each other, and each client can refer to the parameter update direction of other clients during the iterative training process, etc., to adjust the parameters of the client local federated learning sub-model, thereby effectively solving the client drift problem and improving the accuracy of the trained federated learning model.
- the federated learning sub-model in the client can be pulled back to the ideal update path for update during each update of the iterative training, which can significantly reduce the number of communications between the client and the server and the number of iterative training rounds of the federated learning model, and significantly improve the convergence speed of the federated learning model.
- each client can refer to the parameter update direction information of other clients in the iterative training process to adjust the parameters of the local federated learning sub-model of the client, thereby effectively solving the client drift problem, improving the accuracy of the trained federated learning model, and significantly improving the convergence speed of the federated learning model.
- the present application can also correct the local parameter gradient of the client based on the set drift adjustment rate, which can effectively prevent improper adjustment caused by excessive adjustment or insufficient adjustment, etc., and can effectively solve the client drift problem, improve the accuracy of the trained federated learning model, and significantly improve the convergence speed of the federated learning model.
- the present application can also correct the loss value of the client based on the set loss-value adjustment rate, which can effectively prevent improper adjustment caused by over-correction or under-correction, effectively alleviate the client drift problem, improve the accuracy of the trained federated learning model, and significantly improve the convergence speed of the federated learning model.
- before training starts, the server obtains an initial value of the global model parameters (the initial value can be 0, etc.) and sets this initial value as the first-round global model parameters $x^1$.
- the server can also obtain an initial value of the global parameter control variable (the initial value can be 0, etc.) and set it as the first-round global parameter control variable $c^1$.
- each client can obtain an initial value of its local parameter control variable (the initial value can be 0, etc.) and set it as the previous-round local parameter control variable $c_i^0$.
- the server sends the first-round global model parameters $x^1$ and the first-round global parameter control variable $c^1$ to each client.
- the client updates the local model parameters of the currently saved federated learning sub-model to the first-round global model parameters $x^1$, determines the corrected loss value $\widetilde{F}_i$ based on the updated local model parameters and the sample data, and can differentiate $\widetilde{F}_i$ with respect to the local model parameters to obtain the local parameter gradient, that is: $g_i^1 = \nabla_{y}\,\widetilde{F}_i(y)\big|_{y = x^1}$
- the local model parameters output by the client in the first round, $y_i^1$, can be: $y_i^1 = x^1 - \eta_{local}\,\bigl(g_i^1 - \beta\,(c_i^0 - c^1)\bigr)$
- the global sub-parameter gradient of the client in the first round can also be obtained: $h_i^1 = \nabla_{y}\,G_i(y)\big|_{y = x^1}$
- the client sends both its first difference $y_i^1 - x^1$ and its second difference $h_i^1 - c_i^0$ to the server.
- the client can determine $h_i^1$ as the first-round local parameter control variable $c_i^1$.
- after receiving the first difference and the second difference sent by each client, the server can integrate the parameters of the clients' sub-models and, based on the first differences and second differences, determine the global model parameters $x^2$ and the global parameter control variable $c^2$ used for the second round of training of the federated learning model.
- the server sends the second-round global model parameters $x^2$ and the second-round global parameter control variable $c^2$ to each client.
- the client updates the local model parameters of the currently saved federated learning sub-model to the second-round global model parameters $x^2$, determines the corrected loss value based on the updated local model parameters and the sample data, and differentiates it with respect to the local model parameters to obtain the local parameter gradient: $g_i^2 = \nabla_{y}\,\widetilde{F}_i(y)\big|_{y = x^2}$
- the local model parameters output by the client in the second round, $y_i^2$, can be: $y_i^2 = x^2 - \eta_{local}\,\bigl(g_i^2 - \beta\,(c_i^1 - c^2)\bigr)$
- the global sub-parameter gradient of the client in the second round can also be obtained: $h_i^2 = \nabla_{y}\,G_i(y)\big|_{y = x^2}$
- the client sends both its first difference $y_i^2 - x^2$ and its second difference $h_i^2 - c_i^1$ to the server.
- the client can determine $h_i^2$ as the second-round local parameter control variable $c_i^2$.
- after receiving the first difference and the second difference sent by each client, the server can integrate the parameters of each client's sub-model and, based on the first differences and second differences, determine the global model parameters $x^3$ and the global parameter control variable $c^3$ used for the third round of training of the federated learning model. The process by which the server determines the third-round global model parameters $x^3$ and the third-round global parameter control variable $c^3$, and the training process after the clients receive them, are similar to the first and second rounds described above and are not repeated here.
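- tying the rounds together, the overall flow described above can be sketched as follows; the `Client` interface, the flattened-parameter representation, and the fixed round count are assumptions made for illustration (the patent keeps iterating until training ends rather than for a fixed number of rounds):

```python
import torch

def train(x, c, clients, num_rounds, lr_global, lr_local, beta):
    """x, c: first-round global model parameters x^1 and control variable c^1.

    Each client is assumed to hold its own sample data, its previous-round
    local parameter control variable c_local (initially 0), and the gradient
    helpers sketched earlier.
    """
    for t in range(num_rounds):
        first_diffs, second_diffs = [], []
        for cl in clients:
            y = x.clone()                          # update local params to x^t
            g = cl.gradient(y, c)                  # g_i from the corrected loss
            y_out = y - lr_local * (g - beta * (cl.c_local - c))
            h = cl.global_sub_gradient(y)          # h_i = dG_i/dy at x^t
            first_diffs.append(y_out - x)          # first difference
            second_diffs.append(h - cl.c_local)    # second difference
            cl.c_local = h                         # becomes c_i^t for next round
        x = x + lr_global * torch.stack(first_diffs).mean(dim=0)
        c = c + torch.stack(second_diffs).mean(dim=0)
    return x
```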
- the server can send the trained federated learning model to each client, and each client receives and uses the federated learning model.
- FIG. 2 shows a schematic diagram of a second federated learning model training process provided by some embodiments, the process comprising the following steps:
- S201 The server sends the current round global model parameters and the current round global parameter control variables to each client.
- Each client uses the global model parameters of this round to update the local model parameters of the currently saved federated learning sub-model.
- Each client determines a local parameter gradient based on the updated local model parameters and the current round global parameter control variables, and determines the local model parameters output in this round based on the local parameter gradient.
- each client can also determine a global sub-parameter gradient based on the updated local model parameters.
- Each client determines the first difference between the local model parameters output in this round and the global model parameters in this round; and determines the second difference between the global sub-parameter gradient and the currently saved local parameter control variables of the previous round; sends the first difference and the second difference to the server, so that the server determines the next round of global model parameters and the next round of global parameter control variables of the federated learning model to be trained based on the first difference and the second difference sent by each client, and returns to the loop to execute S201.
- FIG3 shows a schematic diagram of a third federated learning model training process provided by some embodiments, and the process includes the following steps:
- S301 The server sends the current round global model parameters and the current round global parameter control variables to each client.
- Each client uses the global model parameters of this round to update the local model parameters of the currently saved federated learning sub-model.
- Each client determines the loss value based on the updated local model parameters and sample data; determines the third difference between the currently saved local parameter control variables of the previous round and the global parameter control variables of this round; corrects the loss value based on the third difference and the set loss value adjustment rate; determines the local parameter gradient based on the corrected loss value.
- Each client determines the third difference between the local parameter control variable in the previous round and the global parameter control variable in this round; based on the third difference and the set drift adjustment rate, corrects the local parameter gradient; and determines the product of the corrected local parameter gradient and the set local learning rate; based on the product and the updated local model parameters, determines the local model parameters output in this round.
- Each client determines the global sub-parameter gradient based on the updated local model parameters; determines the first difference between the local model parameters output in this round and the global model parameters in this round; and determines the second difference between the global sub-parameter gradient and the currently saved local parameter control variables of the previous round; sends the first difference and the second difference to the server, so that the server determines the next round of global model parameters and the next round of global parameter control variables of the federated learning model to be trained based on the first difference and the second difference sent by each client, and returns to the loop to execute S301.
- FIG4 shows a schematic diagram of a fourth federated learning model training process provided by some embodiments. As shown in FIG4 , in each round of iterative training of the federated learning model, the process includes at least the following steps:
- S401 After the server receives, from each client, the first difference between the local model parameters output in the second round and the second-round global model parameters ($y_i^2 - x^2$), and the second difference between the second-round global sub-parameter gradient and the first-round client local parameter control variable ($h_i^2 - c_i^1$), the server can determine, based on the first differences and second differences sent by the clients, the global model parameters $x^3$ and the global parameter control variable $c^3$ used for the third round of training of the federated learning model.
- determining the current-round global model parameters and the current-round global parameter control variable of the federated learning model to be trained based on the first differences and second differences includes:
- correcting the previous-round global model parameters based on each first difference and the set global learning rate to obtain the current-round global model parameters;
- correcting the previous-round global parameter control variable based on each second difference to obtain the current-round global parameter control variable.
- the third-round global model parameters can be calculated using the following formula: $x^3 = x^2 + \eta_{global}\cdot\frac{1}{N}\sum_{i=1}^{N}(y_i^2 - x^2)$
- the third-round global parameter control variable can be calculated using the following formula: $c^3 = c^2 + \frac{1}{N}\sum_{i=1}^{N}(h_i^2 - c_i^1)$
- S402 Send the current round global model parameters and the current round global parameter control variables to each client.
- FIG5 shows a schematic diagram of a federated learning model training system provided by some embodiments. As shown in FIG5 , the system includes:
- the server 51 is used to perform at least the following steps in each round of iterative training of the federated learning model: if a first difference between the local model parameters output in the previous round and the previous-round global model parameters, and a second difference between each client's previous-round global sub-parameter gradient and that client's local parameter control variable of the round before last are received, determine the current-round global model parameters and the current-round global parameter control variable of the federated learning model to be trained based on each first difference and second difference; and send the current-round global model parameters and the current-round global parameter control variable to each client 52;
- the client 52 is used to perform at least the following steps during each round of iterative training of the federated learning model: receiving the global model parameters of this round and the global parameter control variables of this round sent by the server; using the global model parameters of this round to update the local model parameters of the currently saved federated learning sub-model; determining the local parameter gradient based on the updated local model parameters and the global parameter control variables of this round, and determining the local model parameters output in this round based on the local parameter gradient; and determining the global sub-parameter gradient based on the updated local model parameters; determining the first difference between the local model parameters output in this round and the global model parameters of this round; and determining the second difference between the global sub-parameter gradient and the currently saved local parameter control variables of the previous round; and sending the first difference and the second difference to the server 51.
- the server 51 is specifically configured to:
- correct the previous-round global model parameters based on each first difference and the set global learning rate to obtain the current-round global model parameters; and correct the previous-round global parameter control variable based on each second difference to obtain the current-round global parameter control variable.
- the client 52 is specifically configured to:
- determine a loss value based on the updated local model parameters and sample data; correct the loss value based on the currently saved previous-round local parameter control variable and the current-round global parameter control variable; and determine the local parameter gradient based on the corrected loss value.
- the client 52 is specifically configured to:
- determine the third difference between the previous-round local parameter control variable and the current-round global parameter control variable; and correct the loss value based on the third difference and the set loss-value adjustment rate.
- the client 52 is specifically configured to:
- correct the local parameter gradient based on the currently saved previous-round local parameter control variable and the current-round global parameter control variable; and determine the local model parameters output in this round based on the corrected local parameter gradient and the updated local model parameters.
- the client 52 is specifically configured to:
- determine the third difference between the previous-round local parameter control variable and the current-round global parameter control variable; and correct the local parameter gradient based on the third difference and the set drift adjustment rate.
- the client 52 is specifically configured to:
- determine the product of the corrected local parameter gradient and the set local learning rate; and determine the local model parameters output in this round based on the product and the updated local model parameters.
- the client 52 is further configured to:
- determine the global sub-parameter gradient as the local parameter control variable of this round.
- FIG6 shows a schematic diagram of a federated learning model training device provided by some embodiments. As shown in FIG6 , the device includes:
- a receiving module 61 is used to receive the global model parameters of the federated learning model to be trained and the global parameter control variables of the current round sent by the server during each round of iterative training of the federated learning model;
- An updating module 62 configured to update the local model parameters of the currently saved federated learning sub-model using the global model parameters of this round;
- a first determination module 63 is used to determine the local parameter gradient based on the updated local model parameters and the current round global parameter control variables, determine the local model parameters output in this round based on the local parameter gradient; and determine the global sub-parameter gradient based on the updated local model parameters;
- a first sending module 64 used to determine the first difference between the local model parameters output in this round and the current-round global model parameters; determine the second difference between the global sub-parameter gradient and the currently saved previous-round local parameter control variable; and send the first difference and the second difference to the server, so that the server determines the next-round global model parameters and the next-round global parameter control variable based on the first differences and second differences sent by the clients.
- the first determining module 63 is specifically configured to:
- determine a loss value based on the updated local model parameters and sample data; correct the loss value based on the currently saved previous-round local parameter control variable and the current-round global parameter control variable; and determine the local parameter gradient based on the corrected loss value.
- the first determining module 63 is specifically configured to:
- determine the third difference between the previous-round local parameter control variable and the current-round global parameter control variable; and correct the loss value based on the third difference and the set loss-value adjustment rate.
- the first determining module 63 is specifically configured to:
- correct the local parameter gradient based on the currently saved previous-round local parameter control variable and the current-round global parameter control variable; and determine the local model parameters output in this round based on the corrected local parameter gradient and the updated local model parameters.
- the first determining module 63 is specifically configured to:
- determine the third difference between the previous-round local parameter control variable and the current-round global parameter control variable; and correct the local parameter gradient based on the third difference and the set drift adjustment rate.
- the first determining module 63 is specifically configured to:
- determine the product of the corrected local parameter gradient and the set local learning rate; and determine the local model parameters output in this round based on the product and the updated local model parameters.
- the first determining module 63 is further configured to:
- determine the global sub-parameter gradient as the local parameter control variable of this round.
- FIG7 shows a schematic diagram of another federated learning model training device provided by some embodiments. As shown in FIG7, The device comprises:
- the second determination module 71 is used to determine, during each round of iterative training of the federated learning model, the current-round global model parameters and the current-round global parameter control variable of the federated learning model to be trained based on each first difference and second difference, if a first difference between the local model parameters output in the previous round and the previous-round global model parameters, and a second difference between each client's previous-round global sub-parameter gradient and that client's local parameter control variable of the round before last are received;
- the second sending module 72 is used to send the current round global model parameters and the current round global parameter control variables to each client.
- the second determining module 71 is specifically configured to:
- correct the previous-round global model parameters based on each first difference and the set global learning rate to obtain the current-round global model parameters; and correct the previous-round global parameter control variable based on each second difference to obtain the current-round global parameter control variable.
- FIG8 shows a schematic diagram of the structure of an electronic device provided by some embodiments.
- the electronic device includes: a processor 81, a communication interface 82, a memory 83 and a communication bus 84, wherein the processor 81, the communication interface 82, and the memory 83 communicate with each other through the communication bus 84;
- the memory 83 stores a computer program. When the program is executed by the processor 81, the processor 81 performs the following steps:
- during each round of iterative training of the federated learning model: receiving the current-round global model parameters and the current-round global parameter control variable of the federated learning model to be trained sent by the server; updating the local model parameters of the currently saved federated learning sub-model using the current-round global model parameters; determining the local parameter gradient based on the updated local model parameters and the current-round global parameter control variable, and determining the local model parameters output in this round based on the local parameter gradient; determining the global sub-parameter gradient based on the updated local model parameters; determining the first difference between the local model parameters output in this round and the current-round global model parameters; determining the second difference between the global sub-parameter gradient and the currently saved previous-round local parameter control variable; and sending the first difference and the second difference to the server.
- the processor 81 is specifically configured to:
- determine a loss value based on the updated local model parameters and sample data; correct the loss value based on the currently saved previous-round local parameter control variable and the current-round global parameter control variable; and determine the local parameter gradient based on the corrected loss value.
- the processor 81 is specifically configured to:
- determine the third difference between the previous-round local parameter control variable and the current-round global parameter control variable; and correct the loss value based on the third difference and the set loss-value adjustment rate.
- the processor 81 is specifically configured to:
- correct the local parameter gradient based on the currently saved previous-round local parameter control variable and the current-round global parameter control variable; and determine the local model parameters output in this round based on the corrected local parameter gradient and the updated local model parameters.
- the processor 81 is specifically configured to:
- determine the third difference between the previous-round local parameter control variable and the current-round global parameter control variable; and correct the local parameter gradient based on the third difference and the set drift adjustment rate.
- the processor 81 is specifically configured to:
- determine the product of the corrected local parameter gradient and the set local learning rate; and determine the local model parameters output in this round based on the product and the updated local model parameters.
- the processor 81 is further configured to:
- determine the global sub-parameter gradient as the local parameter control variable of this round.
- the present application also provides an electronic device, still referring to FIG. 8 , the electronic device includes: a processor 81, a communication interface 82, a memory 83 and a communication bus 84, wherein the processor 81, the communication interface 82, and the memory 83 communicate with each other through the communication bus 84;
- the memory 83 stores a computer program. When the program is executed by the processor 81, the processor 81 performs the following steps:
- if a first difference between the local model parameters output in the previous round and the previous-round global model parameters, and a second difference between each client's previous-round global sub-parameter gradient and that client's local parameter control variable of the round before last are received during each round of iterative training, the current-round global model parameters and the current-round global parameter control variable of the federated learning model to be trained are determined based on each first difference and second difference;
- the current round global model parameters and the current round global parameter control variables are sent to each client.
- the processor 81 is specifically configured to:
- correct the previous-round global model parameters based on each first difference and the set global learning rate to obtain the current-round global model parameters; and correct the previous-round global parameter control variable based on each second difference to obtain the current-round global parameter control variable.
- the communication bus mentioned in the above electronic device can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus.
- the communication bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
- the communication interface 82 is used for communication between the above electronic device and other devices.
- the memory may include a random access memory (RAM) or a non-volatile memory (NVM), such as at least one disk storage device.
- the memory may also be at least one storage device located away from the aforementioned processor.
- the processor can be a general-purpose processor, including a central processing unit, a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit, a field-programmable gate array or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
- an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program executable by an electronic device.
- when the program runs on the electronic device, the electronic device implements the following steps during each round of iterative training of the federated learning model: receiving the current-round global model parameters and the current-round global parameter control variable of the federated learning model to be trained sent by the server; updating the local model parameters of the currently saved federated learning sub-model using the current-round global model parameters; determining the local parameter gradient based on the updated local model parameters and the current-round global parameter control variable, and determining the local model parameters output in this round based on the local parameter gradient; determining the global sub-parameter gradient based on the updated local model parameters; determining the first difference between the local model parameters output in this round and the current-round global model parameters; determining the second difference between the global sub-parameter gradient and the currently saved previous-round local parameter control variable; and sending the first difference and the second difference to the server.
- determining the local parameter gradient based on the updated local model parameters and the current-round global parameter control variable includes:
- determining a loss value based on the updated local model parameters and sample data; correcting the loss value based on the currently saved previous-round local parameter control variable and the current-round global parameter control variable; and determining the local parameter gradient based on the corrected loss value.
- correcting the loss value based on the currently saved previous-round local parameter control variable and the current-round global parameter control variable includes:
- determining the third difference between the previous-round local parameter control variable and the current-round global parameter control variable; and correcting the loss value based on the third difference and the set loss-value adjustment rate.
- determining the local model parameters output in this round based on the local parameter gradient includes:
- correcting the local parameter gradient based on the currently saved previous-round local parameter control variable and the current-round global parameter control variable; and determining the local model parameters output in this round based on the corrected local parameter gradient and the updated local model parameters.
- correcting the local parameter gradient based on the currently saved previous-round local parameter control variable and the current-round global parameter control variable includes:
- determining the third difference between the previous-round local parameter control variable and the current-round global parameter control variable; and correcting the local parameter gradient based on the third difference and the set drift adjustment rate.
- determining the local model parameters output in this round based on the corrected local parameter gradient and the updated local model parameters includes:
- determining the product of the corrected local parameter gradient and the set local learning rate; and determining the local model parameters output in this round based on the product and the updated local model parameters.
- the method further includes:
- determining the global sub-parameter gradient as the local parameter control variable of this round.
- the present application also provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program executable by an electronic device.
- when the program runs on the electronic device, the electronic device implements the following steps:
- if a first difference between the local model parameters output in the previous round and the previous-round global model parameters, and a second difference between each client's previous-round global sub-parameter gradient and that client's local parameter control variable of the round before last are received during each round of iterative training, the current-round global model parameters and the current-round global parameter control variable of the federated learning model to be trained are determined based on each first difference and second difference;
- the current round global model parameters and the current round global parameter control variables are sent to each client.
- determining the current-round global model parameters and the current-round global parameter control variable of the federated learning model to be trained based on the first differences and second differences includes:
- correcting the previous-round global model parameters based on each first difference and the set global learning rate to obtain the current-round global model parameters;
- correcting the previous-round global parameter control variable based on each second difference to obtain the current-round global parameter control variable.
- the above-mentioned computer-readable storage medium can be any available medium or data storage device that can be accessed by the processor in the electronic device, including but not limited to magnetic storage such as floppy disks, hard disks, magnetic tapes, magneto-optical disks (MO), etc., optical storage such as CD, DVD, BD, HVD, etc., and semiconductor storage such as ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), solid-state drives (SSD), etc.
- the present application provides a computer program product, which includes: computer program code, when the computer program code runs on a computer, the computer implements the method described in any method embodiment applied to an electronic device.
- all or part of the embodiments may be implemented by software, hardware, firmware or any combination thereof, and all or part of the embodiments may be implemented in the form of a computer program product.
- the computer program product includes one or more computer instructions, and when the computer instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present application are generated.
- the embodiments of the present application may be provided as methods, systems, or computer program products; therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
- These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
- These computer program instructions may also be loaded onto a computer or other programmable data processing device so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
Abstract
The present application discloses a federated learning model training method, apparatus, device and storage medium, used to improve the convergence speed and accuracy of a federated learning model. In the present application, the server can comprehensively determine the next-round global parameter control variable based on, among other things, the differences between the current-round global sub-parameter gradients sent by the clients and the previous-round local parameter control variables. Each client can determine its local parameter gradient based on the current-round global parameter control variable sent by the server, and then determine the local model parameters it outputs in this round. On this basis, the clients of the present application can constrain each other, and each client can refer to the parameter update directions of other clients during iterative training to adjust the parameters of its local federated learning sub-model, thereby effectively solving the client drift problem, improving the accuracy of the trained federated learning model, and significantly improving the convergence speed of the federated learning model.
Description
Cross-reference to related applications
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on November 11, 2022, with application number 202211414446.X and entitled "Federated learning model training method, apparatus, device and storage medium", the entire contents of which are incorporated by reference in this application.
本申请涉及模型训练技术领域,尤其涉及一种联邦学习模型训练方法、装置、设备及存储介质。
联邦学习是一种将数据和模型解耦合的分布式框架,可以解决数据孤岛和隐私保护难题。基于联邦学习进行模型训练时,可以在数据不离开参与方本地的情况下,实现各参与方的联合建模。训练好的联邦学习模型(也可称为全局模型)可以在各参与方之间共享和部署。联邦学习在智慧医疗、金融保险和智能物联网等领域有广泛的应用前景。
然而,联邦学习面临着数据非独立同分布问题所带来的严峻挑战,数据非独立同分布问题即每个参与方所拥有的数据分布与全局分布并不一致,这种数据的不一致可能会导致模型收敛速度过慢,并可能会使得模型的精度受损。
因此,基于非独立同分布数据,如何提高联邦学习模型的收敛速度及精度是目前亟需解决的一个技术问题。
发明内容
本申请提供了一种联邦学习模型训练方法、装置、设备及存储介质,用以提高联邦学习模型的收敛速度及精度。
第一方面,本申请提供了一种联邦学习模型训练方法,应用于客户端,所述方法包括:
在参与对联邦学习模型的每轮迭代训练过程中,至少执行以下步骤:
接收服务器发送的待训练的联邦学习模型的本轮全局模型参数以及本轮全局参数控制变量;
采用所述本轮全局模型参数对当前保存的联邦学习子模型的本地模型参数进行更新;
基于更新后的本地模型参数及所述本轮全局参数控制变量,确定本地参数梯度,基于所述本地参数梯度,确定本轮输出的本地模型参数;并基于所述更新后的本地模型参数,确定全局子参数梯度;
确定所述本轮输出的本地模型参数与所述本轮全局模型参数之间的第一差值;并确定所述全局子参数梯度与当前保存的上一轮本地参数控制变量之间的第二差值;将所述第一差值及第二差值发送给所述服务器,使所述服务器基于各客户端发送的第一差值和第二差值,确定下一轮全局模型参数以及下一轮全局参数控制变量。
In a possible implementation, determining the local parameter gradient based on the updated local model parameters and the current-round global parameter control variate comprises:
determining a loss value based on the updated local model parameters and sample data;
correcting the loss value based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate;
determining the local parameter gradient based on the corrected loss value.
In a possible implementation, correcting the loss value based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate comprises:
determining a third difference between the previous-round local parameter control variate and the current-round global parameter control variate;
correcting the loss value based on the third difference and a set loss adjustment rate.
In a possible implementation, determining the local model parameters output in the current round based on the local parameter gradient comprises:
correcting the local parameter gradient based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate;
determining the local model parameters output in the current round based on the corrected local parameter gradient and the updated local model parameters.
In a possible implementation, correcting the local parameter gradient based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate comprises:
determining a third difference between the previous-round local parameter control variate and the current-round global parameter control variate;
correcting the local parameter gradient based on the third difference and a set drift adjustment rate.
In a possible implementation, determining the local model parameters output in the current round based on the corrected local parameter gradient and the updated local model parameters comprises:
determining the product of the corrected local parameter gradient and a set local learning rate; and determining the local model parameters output in the current round based on this product and the updated local model parameters.
In a possible implementation, the method further comprises:
determining the global sub-parameter gradient as the current-round local parameter control variate.
In a second aspect, the present application provides a federated learning model training method, applied to a server, the method comprising:
during each round of iterative training of a federated learning model, performing at least the following steps:
upon receiving, from each client, a first difference between the local model parameters output by the client in the previous round and the previous-round global model parameters, and a second difference between the client's previous-round global sub-parameter gradient and the client's local parameter control variate of the round before last, determining current-round global model parameters and a current-round global parameter control variate of the federated learning model to be trained, based on the first differences and second differences;
sending the current-round global model parameters and the current-round global parameter control variate to each client.
In a possible implementation, determining the current-round global model parameters and the current-round global parameter control variate of the federated learning model to be trained based on the first differences and second differences comprises:
correcting the previous-round global model parameters based on the first differences and a set global learning rate, to obtain the current-round global model parameters;
correcting the previous-round global parameter control variate based on the second differences, to obtain the current-round global parameter control variate.
In a third aspect, the present application provides a federated learning model training system, the system comprising:
a server, configured to perform at least the following steps during each round of iterative training of a federated learning model: upon receiving, from each client, a first difference between the local model parameters output by the client in the previous round and the previous-round global model parameters, and a second difference between the client's previous-round global sub-parameter gradient and the client's local parameter control variate of the round before last, determining current-round global model parameters and a current-round global parameter control variate of the federated learning model to be trained, based on the first differences and second differences; and sending the current-round global model parameters and the current-round global parameter control variate to each client;
each client, configured to perform at least the following steps during each round of iterative training of the federated learning model in which it participates: receiving the current-round global model parameters and the current-round global parameter control variate sent by the server; updating local model parameters of a currently saved federated learning sub-model with the current-round global model parameters; determining a local parameter gradient based on the updated local model parameters and the current-round global parameter control variate, and determining the local model parameters output in the current round based on the local parameter gradient; determining a global sub-parameter gradient based on the updated local model parameters; determining a first difference between the local model parameters output in the current round and the current-round global model parameters; determining a second difference between the global sub-parameter gradient and a currently saved previous-round local parameter control variate; and sending the first difference and the second difference to the server.
In a fourth aspect, the present application provides a federated learning model training apparatus, the apparatus comprising:
a receiving module, configured to receive, during each round of iterative training of a federated learning model in which the client participates, current-round global model parameters and a current-round global parameter control variate of the federated learning model to be trained, sent by a server;
an updating module, configured to update local model parameters of a currently saved federated learning sub-model with the current-round global model parameters;
a first determining module, configured to determine a local parameter gradient based on the updated local model parameters and the current-round global parameter control variate, determine the local model parameters output in the current round based on the local parameter gradient, and determine a global sub-parameter gradient based on the updated local model parameters;
a first sending module, configured to determine a first difference between the local model parameters output in the current round and the current-round global model parameters, determine a second difference between the global sub-parameter gradient and a currently saved previous-round local parameter control variate, and send the first difference and the second difference to the server, so that the server determines next-round global model parameters and a next-round global parameter control variate based on the first differences and second differences sent by the clients.
In a possible implementation, the first determining module is specifically configured to: determine a loss value based on the updated local model parameters and sample data; correct the loss value based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate; and determine the local parameter gradient based on the corrected loss value.
In a possible implementation, the first determining module is specifically configured to: determine a third difference between the previous-round local parameter control variate and the current-round global parameter control variate; and correct the loss value based on the third difference and a set loss adjustment rate.
In a possible implementation, the first determining module is specifically configured to: correct the local parameter gradient based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate; and determine the local model parameters output in the current round based on the corrected local parameter gradient and the updated local model parameters.
In a possible implementation, the first determining module is specifically configured to: determine a third difference between the previous-round local parameter control variate and the current-round global parameter control variate; and correct the local parameter gradient based on the third difference and a set drift adjustment rate.
In a possible implementation, the first determining module is specifically configured to: determine the product of the corrected local parameter gradient and a set local learning rate; and determine the local model parameters output in the current round based on this product and the updated local model parameters.
In a possible implementation, the first determining module is further configured to: determine the global sub-parameter gradient as the current-round local parameter control variate.
In a fifth aspect, the present application provides a federated learning model training apparatus, the apparatus comprising:
a second determining module, configured to, during each round of iterative training of a federated learning model, upon receiving from each client a first difference between the local model parameters output by the client in the previous round and the previous-round global model parameters, and a second difference between the client's previous-round global sub-parameter gradient and the client's local parameter control variate of the round before last, determine current-round global model parameters and a current-round global parameter control variate of the federated learning model to be trained, based on the first differences and second differences;
a second sending module, configured to send the current-round global model parameters and the current-round global parameter control variate to each client.
In a possible implementation, the second determining module is specifically configured to: correct the previous-round global model parameters based on the first differences and a set global learning rate, to obtain the current-round global model parameters; and correct the previous-round global parameter control variate based on the second differences, to obtain the current-round global parameter control variate.
In a sixth aspect, the present application provides an electronic device comprising at least a processor and a memory, the processor being configured to implement the steps of any of the above methods when executing a computer program stored in the memory.
In a seventh aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above methods.
In an eighth aspect, the present application provides a computer program product comprising computer program code which, when run on a computer, causes the computer to perform the steps of any of the above methods.
Since the server of the present application can comprehensively determine the next-round global parameter control variate based on the difference between the current-round global sub-parameter gradient and the previous-round local parameter control variate sent by each client, where the local parameter control variate may also be called the client model parameter update direction or the federated learning sub-model parameter update direction, and the global parameter control variate may also be called the federated learning model parameter update direction or the server model parameter update direction, the server can, in other words, comprehensively determine the next-round model parameter update direction of the federated learning model based on the current-round global sub-parameter gradients sent by the clients and the previous-round parameter update directions of the client models. In the present application, each client can determine its local parameter gradient based on the current-round global parameter control variate sent by the server, that is, based on the current-round global model parameter update direction sent by the server, and then determine the local model parameters output by the client in the current round. On this basis, compared with the related art in which the clients train their local federated learning sub-models independently of one another, so that the client-drift problem easily arises, the clients of the present application can constrain one another: each client can adjust the parameters of its local federated learning sub-model with reference to, among other things, the parameter update directions of the other clients during iterative training, which can effectively solve the client-drift problem and improve the accuracy of the trained federated learning model.
In addition, when federated learning training is performed based on, among other things, the current-round global parameter control variate, the federated learning sub-model in each client can be pulled back toward the ideal update path during the update of every round of iterative training, which can significantly reduce the number of communications between the clients and the server and the number of iterative training rounds of the federated learning model, significantly increasing the convergence speed of the federated learning model.
To describe the embodiments of the present application or the implementations in the related art more clearly, the accompanying drawings required for describing the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description are some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them.
FIG. 1 is a schematic diagram of a first federated learning model training process provided by some embodiments;
FIG. 2 is a schematic diagram of a second federated learning model training process provided by some embodiments;
FIG. 3 is a schematic diagram of a third federated learning model training process provided by some embodiments;
FIG. 4 is a schematic diagram of a fourth federated learning model training process provided by some embodiments;
FIG. 5 is a schematic diagram of a federated learning model training system provided by some embodiments;
FIG. 6 is a schematic diagram of a federated learning model training apparatus provided by some embodiments;
FIG. 7 is a schematic diagram of another federated learning model training apparatus provided by some embodiments;
FIG. 8 is a schematic structural diagram of an electronic device provided by some embodiments.
为了提高联邦学习模型的收敛速度及精度,本申请提供了一种联邦学习模型训练方法、装置、设备及介质。
为使本申请的目的和实施方式更加清楚,下面将结合本申请示例性实施例中的附图,对本申请示例性实施方式进行清楚、完整地描述,显然,描述的示例性实施例仅是本申请一部分实施例,而不是全部的实施例。
需要说明的是,本申请中对于术语的简要说明,仅是为了方便理解接下来描述的实施方式,而不是意图限定本申请的实施方式。除非另有说明,这些术语应当按照其普通和通常的含义理解。
本申请中说明书和权利要求书及上述附图中的术语“第一”、“第二”、“第三”等是用于区别类似或同类的对象或实体,而不必然意味着限定特定的顺序或先后次序,除非另外注明。应该理解这样使用的用语在适当情况下可以互换。
术语“包括”和“具有”以及他们的任何变形,意图在于覆盖但不排他的包含,例如,包含了一系列组件的产品或设备不必限于清楚地列出的所有组件,而是可包括没有清楚地列出的或对于这些产品或设备固有的其它组件。
术语“模块”是指任何已知或后来开发的硬件、软件、固件、人工智能、模糊逻辑或硬件或/和软件代码的组合,能够执行与该元件相关的功能。
FIG. 1 shows a schematic diagram of a first federated learning model training process provided by some embodiments. The method is applied to a client; for example, the client may be an electronic device such as a PC or a mobile terminal. As shown in FIG. 1, during each round of iterative training of the federated learning model in which the client participates, the client performs at least the following steps:
S101: receive the current-round global model parameters and the current-round global parameter control variate of the federated learning model to be trained, sent by the server.
In a possible implementation, to improve the convergence speed and accuracy of the federated learning model, during any round (e.g., the r-th round) of iterative training of the federated learning model, the server may determine the model parameters of the federated learning model for the current round (for ease of description, called the current-round global model parameters, denoted $w^r$) and the parameter control variate for the current round (for ease of description, called the current-round global parameter control variate, denoted $c^r$), and send the determined current-round global model parameters and current-round global parameter control variate to every client (also called participant) participating in the current round of training. How the server determines the current-round global model parameters and the current-round global parameter control variate is described below and is not elaborated here. The federated learning model may also be called the server model, and the global parameter control variate may also be called the federated learning model parameter update direction or the server model parameter update direction. Correspondingly, the current-round global parameter control variate may also be called the current-round federated learning model parameter update direction or the current-round server model parameter update direction.
Each client participating in the current round of training can receive the current-round global model parameters $w^r$ and the current-round global parameter control variate $c^r$ sent by the server.
S102: update the local model parameters of the currently saved federated learning sub-model with the current-round global model parameters.
For ease of description, the model saved in each client is called a federated learning sub-model. Any client, after receiving the current-round global model parameters $w^r$ and the current-round global parameter control variate $c^r$ sent by the server, may update the parameters of its currently saved federated learning sub-model (for ease of description, called the local model parameters, denoted $w_i$ for client $i$) with the current-round global model parameters. That is, every client participating in the current round of training may update the parameters (also called weights) of the federated learning sub-model saved locally on the client to the current-round global model parameters $w^r$.
S103: determine a local parameter gradient based on the updated local model parameters and the current-round global parameter control variate; determine the local model parameters output in the current round based on the local parameter gradient; and determine a global sub-parameter gradient based on the updated local model parameters.
In a possible implementation, for any client, after updating the local model parameters of the currently saved federated learning sub-model with the current-round global model parameters, the client may determine the local parameter gradient (for ease of understanding, denoted $g_i^r$) based on the updated local model parameters, i.e., the current-round global model parameters $w^r$, and the current-round global parameter control variate $c^r$.
In a possible implementation, the process of determining the local parameter gradient based on the updated local model parameters and the current-round global parameter control variate may comprise:
determining a loss value based on the updated local model parameters and sample data;
correcting the loss value based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate;
determining the local parameter gradient based on the corrected loss value.
Specifically, for any client $i$, denote by $D_i$ the sample data set the client uses to train the federated learning sub-model during the current round of iterative training; the sample data set contains a number of samples. A loss value (also called a loss function; for ease of understanding, denoted $L_i$) may be determined based on the updated local model parameters and the samples. For example, when determining the loss value, a sample may be input into the federated learning sub-model to obtain the recognition result of the sub-model, and the loss value may be determined based on, among other things, the deviation between the sample label corresponding to the sample and that recognition result; this is not elaborated here. In a possible implementation, during each round of iterative training of the federated learning model, each client may perform several passes (for ease of description, K passes) of sub-training of the local federated learning sub-model on the sample data, and the loss value of the current round of training may be determined based on the sum of the loss values of all samples over the K sub-training passes. For ease of description, the loss value obtained for the current round of training is denoted $L_i$.
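For ease of understanding, the accumulation of the loss over the K sub-training passes can be sketched as follows. This is a minimal NumPy sketch under assumptions introduced here purely for illustration (a linear model, a squared-error per-sample loss, and plain gradient sub-steps); the name `local_subtraining_loss` and its parameters are not from the original disclosure.

```python
import numpy as np

def local_subtraining_loss(w, X, y, K, lr):
    """Run K sub-training passes on the local data D_i and accumulate the loss.

    w: local model parameters (a NumPy vector), updated across the K passes.
    X, y: the client's local samples and labels (illustrative linear model).
    Returns the summed per-sample loss over the K passes and the updated w.
    """
    total_loss = 0.0
    for _ in range(K):
        residual = X @ w - y                          # model output minus labels
        total_loss += float(residual @ residual)      # sum of per-sample losses
        w = w - lr * 2.0 * (X.T @ residual) / len(y)  # one sub-training step
    return total_loss, w
```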
In a possible implementation, to effectively solve the client-drift problem and improve the convergence speed and accuracy of the federated learning model, for any client, the loss value may be corrected based on the previous-round local parameter control variate currently saved by the client (for ease of understanding, denoted $c_i^{r-1}$) and the current-round global parameter control variate $c^r$. In a possible implementation, when correcting the loss value based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate, the client may first determine the third difference between the previous-round local parameter control variate and the current-round global parameter control variate, i.e., $c_i^{r-1} - c^r$.
The federated learning sub-model in a client may also be called the client model, and the local parameter control variate may also be called the client model parameter update direction or the federated learning sub-model parameter update direction. Correspondingly, the previous-round local parameter control variate may also be called the previous-round client model parameter update direction or the previous-round federated learning sub-model parameter update direction. In a possible implementation, to effectively solve the client-drift problem, the difference between the previous-round local parameter control variate and the current-round global parameter control variate, i.e., the third difference $c_i^{r-1} - c^r$, may be taken as the client drift value, and the loss value may be corrected based on the client drift value, thereby correcting the client's local parameter gradient based on the client drift value. In this way, each client can adjust the parameters of its local federated learning sub-model with reference to, among other things, the parameter update directions of the other clients during iterative training, which can effectively solve the client-drift problem, improve the accuracy of the trained federated learning model, and significantly increase the convergence speed of the federated learning model.
In a possible implementation, when correcting the loss value based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate, the loss value may be corrected based on the third difference and a set loss adjustment rate (for ease of understanding, denoted β).
For example, when correcting the loss value based on the third difference and the set loss adjustment rate β, the client may first determine the difference (for ease of description, called the fourth difference) between the previous-round global model parameters received during the previous round of iterative training (for ease of understanding, denoted $w^{r-1}$) and the above third difference, i.e., $w^{r-1} - (c_i^{r-1} - c^r)$. In addition, the client may determine the difference (for ease of description, called the fifth difference) between the updated local model parameters and the fourth difference, i.e.: $w_i - (w^{r-1} - (c_i^{r-1} - c^r))$.
The product of the square of the fifth difference and the loss adjustment rate β is determined, i.e.: $\beta \cdot \| w_i - (w^{r-1} - (c_i^{r-1} - c^r)) \|^2$.
In a possible implementation, the loss value may be accurately corrected based on the product of the square of the fifth difference and the loss adjustment rate β. For example, the sum of the uncorrected loss value and this product may be determined as the corrected loss value (for ease of understanding, the corrected loss value is called the local update function, denoted $\tilde{L}_i$), where: $\tilde{L}_i = L_i + \beta \cdot \| w_i - (w^{r-1} - (c_i^{r-1} - c^r)) \|^2$.
In a possible implementation, the local parameter gradient $g_i^r$ may be determined based on the corrected loss value. In a possible implementation, the local parameter gradient may be obtained by taking the derivative of the local update function with respect to the local model parameters, i.e.: $g_i^r = \partial \tilde{L}_i / \partial w_i$.
For any client, after obtaining the local parameter gradient $g_i^r$, the client may determine the local model parameters output by the client in the current round based on the local parameter gradient. In a possible implementation, when determining the local model parameters output in the current round based on the local parameter gradient, the client may first correct the local parameter gradient based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate. For example, when correcting the local parameter gradient, the client may first determine the third difference between the previous-round local parameter control variate and the current-round global parameter control variate, i.e., $c_i^{r-1} - c^r$, and then correct the local parameter gradient based on this third difference and a set drift adjustment rate α. For example, the client may determine the product of the third difference and the drift adjustment rate α, and take the difference between the local parameter gradient and this product as the corrected local parameter gradient, i.e., the corrected local parameter gradient is: $\tilde{g}_i^r = g_i^r - \alpha \cdot (c_i^{r-1} - c^r)$.
In a possible implementation, the local model parameters output in the current round may be determined based on the corrected local parameter gradient and the updated local model parameters. For example, when determining the local model parameters output in the current round, the client may determine the product of the corrected local parameter gradient and a set local learning rate (for ease of understanding, denoted $\eta_{local}$), and then determine the local model parameters output in the current round based on this product and the updated local model parameters. For example, the client may determine the difference between the updated local model parameters and this product, and determine this difference as the local model parameters output in the current round, i.e., the local model parameters output in the current round (denoted $w_i^r$) may be: $w_i^r = w_i - \eta_{local} \cdot \tilde{g}_i^r$.
In a possible implementation, the client may also determine the current-round global sub-parameter gradient (for ease of understanding, denoted $h_i^r$) based on the updated local model parameters. For example, the client may determine a loss value based on the updated local model parameters, i.e., based on the current-round global model parameters $w^r$ sent by the server, where the process of determining this loss value is the same as the process of determining the loss value in the above embodiment and is not repeated here. The obtained loss value of the current round of training may be called the server-side update sub-function, denoted $F_i$, and the global sub-parameter gradient may be obtained by taking the derivative of the server-side update sub-function with respect to the local model parameters, i.e.: $h_i^r = \partial F_i(w^r) / \partial w$.
In a possible implementation, the global sub-parameter gradient obtained in the current round may be determined as the current-round local parameter control variate, i.e., $c_i^r = h_i^r$, for use by the client in the next round of iterative training; this is not elaborated here.
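Putting S102 and S103 together, one client-side round of the computation described above can be sketched as follows. This is a minimal NumPy sketch, assuming a linear model with mean squared-error loss and a single local step (K = 1); the function `client_round` and all of its parameter names are illustrative, not the disclosed implementation.

```python
import numpy as np

def client_round(w_glob, c_glob, w_glob_prev, c_loc_prev, X, y,
                 alpha, beta, lr_local):
    """One client round: corrected loss gradient, drift-corrected update,
    global sub-parameter gradient, and the two differences sent to the server.

    w_glob, c_glob: current-round global model parameters / control variate.
    w_glob_prev:    previous-round global model parameters.
    c_loc_prev:     this client's previous-round local control variate.
    """
    w = w_glob.copy()                    # S102: overwrite the local parameters
    d3 = c_loc_prev - c_glob             # third difference (client drift value)

    # Gradient of the corrected loss L + beta * ||w - (w_glob_prev - d3)||^2
    residual = X @ w - y
    grad_loss = 2.0 * (X.T @ residual) / len(y)   # gradient of the raw loss
    grad = grad_loss + 2.0 * beta * (w - (w_glob_prev - d3))

    g_corr = grad - alpha * d3           # gradient corrected by the drift value
    w_out = w - lr_local * g_corr        # local parameters output this round

    # Global sub-parameter gradient: raw-loss gradient at the global parameters
    h = 2.0 * (X.T @ (X @ w_glob - y)) / len(y)

    first_diff = w_out - w_glob          # first difference
    second_diff = h - c_loc_prev         # second difference
    c_loc_new = h                        # this round's local control variate
    return first_diff, second_diff, c_loc_new
```

With the drift adjustment rate α set to 1 and the loss adjustment rate β set to 0, the corrected gradient reduces to an update of the form $g_i - c_i + c$, reminiscent of the SCAFFOLD scheme cited in the non-patent literature of this publication; α and β are the knobs the present application describes for avoiding over- or under-adjustment.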
S104: determine the first difference between the local model parameters output in the current round and the current-round global model parameters; determine the second difference between the global sub-parameter gradient and the currently saved previous-round local parameter control variate; and send the first difference and the second difference to the server, so that the server determines the next-round global model parameters and the next-round global parameter control variate based on the first differences and second differences sent by the clients.
In a possible implementation, to improve the convergence speed and accuracy of the federated learning model, the client may determine the first difference between the local model parameters output in the current round and the current-round global model parameters, i.e., $w_i^r - w^r$. In addition, the client may determine the second difference between the global sub-parameter gradient and the currently saved previous-round local parameter control variate, i.e., $h_i^r - c_i^{r-1}$.
In a possible implementation, each client participating in the current round of iterative training may send its first difference and second difference to the server. After receiving the first difference and second difference sent by each client, if the federated learning model has not yet satisfied the set convergence condition, the server may determine, based on the first differences and second differences sent by the clients, the global model parameters for the next round of iterative training (for ease of understanding, called the next-round global model parameters, denoted $w^{r+1}$) and the parameter control variate for the next round (for ease of understanding, called the next-round global parameter control variate, denoted $c^{r+1}$).
In a possible implementation, when the server determines the next-round global model parameters and the next-round global parameter control variate based on the first differences and second differences sent by the clients, it may correct the current-round global model parameters based on the clients' first differences and a set global learning rate (for ease of understanding, denoted $\eta_{global}$), thereby obtaining the next-round global model parameters $w^{r+1}$.
For example, assume the set of clients participating in the current round of training is denoted $N_{trains}$ and the number of clients participating in the current round of training is denoted $|N_{clients}|$. The server may first determine the average of the first differences sent by the clients: $\frac{1}{|N_{clients}|} \sum_{i \in N_{trains}} (w_i^r - w^r)$, and then determine the product of the global learning rate $\eta_{global}$ and this average. Optionally, the sum of the current-round global model parameters and this product may be determined as the next-round global model parameters, i.e.: $w^{r+1} = w^r + \eta_{global} \cdot \frac{1}{|N_{clients}|} \sum_{i \in N_{trains}} (w_i^r - w^r)$.
In a possible implementation, the server may correct the current-round global parameter control variate based on the second differences sent by the clients, thereby obtaining the next-round global parameter control variate. For example, the server may first determine the average of the second differences sent by the clients: $\frac{1}{|N_{clients}|} \sum_{i \in N_{trains}} (h_i^r - c_i^{r-1})$, and then determine the sum of the current-round global parameter control variate and this average as the next-round global parameter control variate $c^{r+1}$, i.e.: $c^{r+1} = c^r + \frac{1}{|N_{clients}|} \sum_{i \in N_{trains}} (h_i^r - c_i^{r-1})$.
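The server-side aggregation just described admits an equally direct sketch. The following NumPy fragment assumes the per-client first and second differences arrive as equal-length vectors; `server_round` and its parameter names are illustrative.

```python
import numpy as np

def server_round(w_glob, c_glob, first_diffs, second_diffs, lr_global):
    """Fold the clients' differences into the next-round global state.

    first_diffs / second_diffs: one vector per participating client.
    """
    avg_dw = np.mean(first_diffs, axis=0)   # average of the first differences
    avg_dc = np.mean(second_diffs, axis=0)  # average of the second differences
    w_next = w_glob + lr_global * avg_dw    # next-round global model parameters
    c_next = c_glob + avg_dc                # next-round global control variate
    return w_next, c_next
```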
Since the server of the present application can comprehensively determine the next-round global parameter control variate based on the difference between the current-round global sub-parameter gradient and the previous-round local parameter control variate sent by each client, where the local parameter control variate may also be called the client model parameter update direction or the federated learning sub-model parameter update direction, and the global parameter control variate may also be called the federated learning model parameter update direction or the server model parameter update direction, the server can, in other words, comprehensively determine the next-round model parameter update direction of the federated learning model based on the current-round global sub-parameter gradients sent by the clients and the previous-round parameter update directions of the client models. In the present application, each client can determine its local parameter gradient based on the current-round global parameter control variate sent by the server, that is, based on the current-round global model parameter update direction sent by the server, and then determine the local model parameters output by the client in the current round. On this basis, compared with the related art in which the clients train their local federated learning sub-models independently of one another, so that the client-drift problem easily arises, the clients of the present application can constrain one another: each client can adjust the parameters of its local federated learning sub-model with reference to, among other things, the parameter update directions of the other clients during iterative training, which can effectively solve the client-drift problem and improve the accuracy of the trained federated learning model.
In addition, when federated learning training is performed based on, among other things, the current-round global parameter control variate, the federated learning sub-model in each client can be pulled back toward the ideal update path during the update of every round of iterative training, which can significantly reduce the number of communications between the clients and the server and the number of iterative training rounds of the federated learning model, significantly increasing the convergence speed of the federated learning model.
In addition, in the present application, the difference between the previous-round local parameter control variate and the current-round global parameter control variate may serve as the client drift value. When the client's local parameter gradient is corrected based on the client drift value, each client can adjust the parameters of its local federated learning sub-model with reference to, among other things, the parameter update direction information of the other clients during iterative training, which can effectively solve the client-drift problem, improve the accuracy of the trained federated learning model, and significantly increase the convergence speed of the federated learning model.
In addition, the present application may also correct the client's local parameter gradient based on the set drift adjustment rate, which can effectively prevent improper adjustment caused by over-adjustment or insufficient adjustment, effectively solve the client-drift problem, improve the accuracy of the trained federated learning model, and significantly increase the convergence speed of the federated learning model.
In addition, the present application may also correct the client's loss value based on the set loss adjustment rate, which can likewise effectively prevent improper adjustment caused by over-adjustment or insufficient adjustment, effectively solve the client-drift problem, improve the accuracy of the trained federated learning model, and significantly increase the convergence speed of the federated learning model.
For ease of understanding, the federated learning model training process provided by the present application is explained below through a specific embodiment.
At the start of training, the server obtains a set initial value of the global model parameters (the initial value may be 0, etc.) and takes it as the first-round global model parameters $w^1$. In addition, the server may obtain a set initial value of the global parameter control variate (the initial value may be 0, etc.) and take it as the first-round global parameter control variate $c^1$. Likewise, each client may obtain a set initial value of the local parameter control variate (the initial value may be 0, etc.) and take it as the previous-round local parameter control variate $c_i^0$.
During the first round of iterative training, the server sends the first-round global model parameters $w^1$ and the first-round global parameter control variate $c^1$ to every client. Each client updates the local model parameters of its currently saved federated learning sub-model to the first-round global model parameters $w^1$, and determines the corrected loss value $\tilde{L}_i$ based on, among other things, the updated local model parameters and the sample data, in the manner described above.
The local parameter gradient $g_i^1$ may be obtained by taking the derivative with respect to the local model parameters, i.e.: $g_i^1 = \partial \tilde{L}_i / \partial w_i$.
For each client, the local model parameters output by the client in the first round may be: $w_i^1 = w^1 - \eta_{local} \cdot (g_i^1 - \alpha \cdot (c_i^0 - c^1))$.
In addition, for each client, the client's first-round global sub-parameter gradient may also be obtained: $h_i^1 = \partial F_i(w^1) / \partial w$.
Each client sends both its first difference $w_i^1 - w^1$ and its second difference $h_i^1 - c_i^0$ to the server.
In addition, the client may determine $h_i^1$ as the first-round local parameter control variate $c_i^1$.
After receiving the first differences and second differences sent by the clients, the server may integrate, among other things, the parameters of the clients' sub-models and determine, based on the first differences and second differences, the global model parameters $w^2$ used for the second round of training of the federated learning model and the second-round global parameter control variate $c^2$,
where: $w^2 = w^1 + \eta_{global} \cdot \frac{1}{|N_{clients}|} \sum_{i \in N_{trains}} (w_i^1 - w^1)$ and $c^2 = c^1 + \frac{1}{|N_{clients}|} \sum_{i \in N_{trains}} (h_i^1 - c_i^0)$.
During the second round of iterative training, the server sends the second-round global model parameters $w^2$ and the second-round global parameter control variate $c^2$ to every client. Each client updates the local model parameters of its currently saved federated learning sub-model to the second-round global model parameters $w^2$, and determines the corrected loss value based on, among other things, the updated local model parameters and the sample data: $\tilde{L}_i = L_i + \beta \cdot \| w_i - (w^1 - (c_i^1 - c^2)) \|^2$. The local parameter gradient $g_i^2$ may be obtained by taking the derivative with respect to the local model parameters, i.e.: $g_i^2 = \partial \tilde{L}_i / \partial w_i$.
For each client, the local model parameters output by the client in the second round may be: $w_i^2 = w^2 - \eta_{local} \cdot (g_i^2 - \alpha \cdot (c_i^1 - c^2))$.
In addition, for each client, the client's second-round global sub-parameter gradient may also be obtained: $h_i^2 = \partial F_i(w^2) / \partial w$.
Each client sends both its first difference $w_i^2 - w^2$ and its second difference $h_i^2 - c_i^1$ to the server.
In addition, the client may determine $h_i^2$ as the second-round local parameter control variate $c_i^2$.
After receiving the first differences and second differences sent by the clients, the server may integrate, among other things, the parameters of the clients' sub-models and determine, based on the first differences and second differences, the global model parameters $w^3$ used for the third round of training of the federated learning model and the third-round global parameter control variate $c^3$. The process by which the server determines the third-round global model parameters and the third-round global parameter control variate, and the training process after the clients receive them, are similar to the first and second rounds of training described above and are not repeated here.
Assume that after N rounds of iterative training the federated learning model on the server satisfies the convergence condition; the server may then send the trained federated learning model to every client, and each client receives and uses the federated learning model.
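The walkthrough above can be tied together in a short driver loop. The sketch below reuses the illustrative `client_round` and `server_round` helpers from the earlier sketches; the synthetic data, the number of clients and rounds, and the hyperparameter values are assumptions chosen only to make the fragment runnable.

```python
import numpy as np

# End-to-end sketch of N rounds of the training process described above.
rng = np.random.default_rng(0)
dim, n_rounds = 4, 20
clients = [(rng.normal(size=(32, dim)), rng.normal(size=32)) for _ in range(3)]

w_glob = np.zeros(dim)                       # initial global model parameters (0)
c_glob = np.zeros(dim)                       # initial global control variate (0)
w_prev = w_glob.copy()                       # stands in for w^{r-1} in round 1
c_locals = [np.zeros(dim) for _ in clients]  # initial local control variates (0)

for r in range(n_rounds):
    first_diffs, second_diffs = [], []
    for i, (X, y) in enumerate(clients):
        dw, dc, c_locals[i] = client_round(
            w_glob, c_glob, w_prev, c_locals[i], X, y,
            alpha=0.1, beta=0.01, lr_local=0.05)
        first_diffs.append(dw)
        second_diffs.append(dc)
    w_prev = w_glob
    w_glob, c_glob = server_round(w_glob, c_glob, first_diffs,
                                  second_diffs, lr_global=1.0)
```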
For ease of understanding, the federated learning process provided by the present application is explained below through another specific embodiment. Referring to FIG. 2, FIG. 2 shows a schematic diagram of a second federated learning model training process provided by some embodiments; the process includes the following steps:
S201: the server sends the current-round global model parameters and the current-round global parameter control variate to each client.
S202: each client updates the local model parameters of its currently saved federated learning sub-model with the current-round global model parameters.
S203: each client determines a local parameter gradient based on the updated local model parameters and the current-round global parameter control variate, and determines the local model parameters output in the current round based on the local parameter gradient. In addition, each client may also determine a global sub-parameter gradient based on the updated local model parameters.
S204: each client determines the first difference between the local model parameters output in the current round and the current-round global model parameters, and the second difference between the global sub-parameter gradient and the currently saved previous-round local parameter control variate, and sends the first difference and the second difference to the server, so that the server determines the next-round global model parameters and the next-round global parameter control variate of the federated learning model to be trained based on the first differences and second differences sent by the clients, and the process loops back to S201.
For ease of understanding, the federated learning process provided by the present application is explained below through yet another specific embodiment. Referring to FIG. 3, FIG. 3 shows a schematic diagram of a third federated learning model training process provided by some embodiments; the process includes the following steps:
S301: the server sends the current-round global model parameters and the current-round global parameter control variate to each client.
S302: each client updates the local model parameters of its currently saved federated learning sub-model with the current-round global model parameters.
S303: each client determines a loss value based on the updated local model parameters and the sample data; determines the third difference between the currently saved previous-round local parameter control variate and the current-round global parameter control variate; corrects the loss value based on the third difference and the set loss adjustment rate; and determines the local parameter gradient based on the corrected loss value.
S304: each client determines the third difference between the previous-round local parameter control variate and the current-round global parameter control variate; corrects the local parameter gradient based on the third difference and the set drift adjustment rate; determines the product of the corrected local parameter gradient and the set local learning rate; and determines the local model parameters output in the current round based on this product and the updated local model parameters.
S305: each client determines a global sub-parameter gradient based on the updated local model parameters; determines the first difference between the local model parameters output in the current round and the current-round global model parameters, and the second difference between the global sub-parameter gradient and the currently saved previous-round local parameter control variate; and sends the first difference and the second difference to the server, so that the server determines the next-round global model parameters and the next-round global parameter control variate of the federated learning model to be trained based on the first differences and second differences sent by the clients, and the process loops back to S301.
Based on the same technical concept, the present application also provides a federated learning model training method applied to a server. FIG. 4 shows a schematic diagram of a fourth federated learning model training process provided by some embodiments. As shown in FIG. 4, during each round of iterative training of the federated learning model, the process includes at least the following steps:
S401: upon receiving, from each client, the first difference between the local model parameters output by the client in the previous round and the previous-round global model parameters, and the second difference between the client's previous-round global sub-parameter gradient and the client's local parameter control variate of the round before last, determine the current-round global model parameters and the current-round global parameter control variate of the federated learning model to be trained, based on the first differences and second differences.
In a possible implementation, taking the third round of iterative training as an example: after the server receives, from each client, the first difference $w_i^2 - w^2$ between the local model parameters output in the second round and the second-round global model parameters, and the second difference $h_i^2 - c_i^1$ between the client's second-round global sub-parameter gradient and the client's first-round local parameter control variate, the server may determine, based on the first differences and second differences sent by the clients, the global model parameters $w^3$ used for the third round of training of the federated learning model and the third-round global parameter control variate $c^3$.
In a possible implementation, determining the current-round global model parameters and the current-round global parameter control variate of the federated learning model to be trained based on the first differences and second differences comprises:
correcting the previous-round global model parameters based on the first differences and the set global learning rate, to obtain the current-round global model parameters;
correcting the previous-round global parameter control variate based on the second differences, to obtain the current-round global parameter control variate.
The process of determining the current-round global model parameters and the current-round global parameter control variate is the same as in the above embodiments. For example:
the third-round global model parameters may be calculated by the following formula: $w^3 = w^2 + \eta_{global} \cdot \frac{1}{|N_{clients}|} \sum_{i \in N_{trains}} (w_i^2 - w^2)$;
the third-round global parameter control variate may be calculated by the following formula: $c^3 = c^2 + \frac{1}{|N_{clients}|} \sum_{i \in N_{trains}} (h_i^2 - c_i^1)$.
Details are not repeated here.
S402: send the current-round global model parameters and the current-round global parameter control variate to each client.
Based on the same technical concept, the present application also provides a federated learning model training system. FIG. 5 shows a schematic diagram of a federated learning model training system provided by some embodiments. As shown in FIG. 5, the system includes:
a server 51, configured to perform at least the following steps during each round of iterative training of the federated learning model: upon receiving, from each client 52, the first difference between the local model parameters output in the previous round and the previous-round global model parameters, and the second difference between each client's previous-round global sub-parameter gradient and the client's local parameter control variate of the round before last, determining the current-round global model parameters and the current-round global parameter control variate of the federated learning model to be trained based on the first differences and second differences;
sending the current-round global model parameters and the current-round global parameter control variate to each client 52;
the client 52, configured to perform at least the following steps during each round of iterative training of the federated learning model in which it participates: receiving the current-round global model parameters and the current-round global parameter control variate sent by the server; updating the local model parameters of the currently saved federated learning sub-model with the current-round global model parameters; determining a local parameter gradient based on the updated local model parameters and the current-round global parameter control variate, and determining the local model parameters output in the current round based on the local parameter gradient; determining a global sub-parameter gradient based on the updated local model parameters; determining the first difference between the local model parameters output in the current round and the current-round global model parameters, and the second difference between the global sub-parameter gradient and the currently saved previous-round local parameter control variate; and sending the first difference and the second difference to the server 51.
In a possible implementation, the server 51 is specifically configured to: correct the previous-round global model parameters based on the first differences and the set global learning rate, to obtain the current-round global model parameters; and correct the previous-round global parameter control variate based on the second differences, to obtain the current-round global parameter control variate.
In a possible implementation, the client 52 is specifically configured to: determine a loss value based on the updated local model parameters and sample data; correct the loss value based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate; and determine the local parameter gradient based on the corrected loss value.
In a possible implementation, the client 52 is specifically configured to: determine the third difference between the previous-round local parameter control variate and the current-round global parameter control variate; and correct the loss value based on the third difference and the set loss adjustment rate.
In a possible implementation, the client 52 is specifically configured to: correct the local parameter gradient based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate; and determine the local model parameters output in the current round based on the corrected local parameter gradient and the updated local model parameters.
In a possible implementation, the client 52 is specifically configured to: determine the third difference between the previous-round local parameter control variate and the current-round global parameter control variate; and correct the local parameter gradient based on the third difference and the set drift adjustment rate.
In a possible implementation, the client 52 is specifically configured to: determine the product of the corrected local parameter gradient and the set local learning rate; and determine the local model parameters output in the current round based on this product and the updated local model parameters.
In a possible implementation, the client 52 is further configured to: determine the global sub-parameter gradient as the current-round local parameter control variate.
Based on the same technical concept, the present application provides a federated learning model training apparatus. FIG. 6 shows a schematic diagram of a federated learning model training apparatus provided by some embodiments. As shown in FIG. 6, the apparatus includes:
a receiving module 61, configured to receive, during each round of iterative training of the federated learning model in which the client participates, the current-round global model parameters and the current-round global parameter control variate of the federated learning model to be trained, sent by the server;
an updating module 62, configured to update the local model parameters of the currently saved federated learning sub-model with the current-round global model parameters;
a first determining module 63, configured to determine a local parameter gradient based on the updated local model parameters and the current-round global parameter control variate, determine the local model parameters output in the current round based on the local parameter gradient, and determine a global sub-parameter gradient based on the updated local model parameters;
a first sending module 64, configured to determine the first difference between the local model parameters output in the current round and the current-round global model parameters, determine the second difference between the global sub-parameter gradient and the currently saved previous-round local parameter control variate, and send the first difference and the second difference to the server, so that the server determines the next-round global model parameters and the next-round global parameter control variate based on the first differences and second differences sent by the clients.
In a possible implementation, the first determining module 63 is specifically configured to: determine a loss value based on the updated local model parameters and sample data; correct the loss value based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate; and determine the local parameter gradient based on the corrected loss value.
In a possible implementation, the first determining module 63 is specifically configured to: determine the third difference between the previous-round local parameter control variate and the current-round global parameter control variate; and correct the loss value based on the third difference and the set loss adjustment rate.
In a possible implementation, the first determining module 63 is specifically configured to: correct the local parameter gradient based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate; and determine the local model parameters output in the current round based on the corrected local parameter gradient and the updated local model parameters.
In a possible implementation, the first determining module 63 is specifically configured to: determine the third difference between the previous-round local parameter control variate and the current-round global parameter control variate; and correct the local parameter gradient based on the third difference and the set drift adjustment rate.
In a possible implementation, the first determining module 63 is specifically configured to: determine the product of the corrected local parameter gradient and the set local learning rate; and determine the local model parameters output in the current round based on this product and the updated local model parameters.
In a possible implementation, the first determining module 63 is further configured to: determine the global sub-parameter gradient as the current-round local parameter control variate.
Based on the same technical concept, the present application also provides another federated learning model training apparatus. FIG. 7 shows a schematic diagram of another federated learning model training apparatus provided by some embodiments. As shown in FIG. 7, the apparatus includes:
a second determining module 71, configured to, during each round of iterative training of the federated learning model, upon receiving from each client the first difference between the local model parameters output in the previous round and the previous-round global model parameters, and the second difference between each client's previous-round global sub-parameter gradient and the client's local parameter control variate of the round before last, determine the current-round global model parameters and the current-round global parameter control variate of the federated learning model to be trained based on the first differences and second differences;
a second sending module 72, configured to send the current-round global model parameters and the current-round global parameter control variate to each client.
In a possible implementation, the second determining module 71 is specifically configured to: correct the previous-round global model parameters based on the first differences and the set global learning rate, to obtain the current-round global model parameters; and correct the previous-round global parameter control variate based on the second differences, to obtain the current-round global parameter control variate.
Based on the same technical concept, the present application also provides an electronic device. FIG. 8 shows a schematic structural diagram of an electronic device provided by some embodiments. As shown in FIG. 8, the electronic device includes: a processor 81, a communication interface 82, a memory 83 and a communication bus 84, where the processor 81, the communication interface 82 and the memory 83 communicate with one another via the communication bus 84;
the memory 83 stores a computer program which, when executed by the processor 81, causes the processor 81 to perform the following steps:
during each round of iterative training of the federated learning model in which it participates, performing at least the following steps:
receiving the current-round global model parameters and the current-round global parameter control variate of the federated learning model to be trained, sent by the server;
updating the local model parameters of the currently saved federated learning sub-model with the current-round global model parameters;
determining a local parameter gradient based on the updated local model parameters and the current-round global parameter control variate, and determining the local model parameters output in the current round based on the local parameter gradient; and determining a global sub-parameter gradient based on the updated local model parameters;
determining the first difference between the local model parameters output in the current round and the current-round global model parameters; determining the second difference between the global sub-parameter gradient and the currently saved previous-round local parameter control variate; and sending the first difference and the second difference to the server, so that the server determines the next-round global model parameters and the next-round global parameter control variate based on the first differences and second differences sent by the clients.
In a possible implementation, the processor 81 is specifically configured to: determine a loss value based on the updated local model parameters and sample data; correct the loss value based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate; and determine the local parameter gradient based on the corrected loss value.
In a possible implementation, the processor 81 is specifically configured to: determine the third difference between the previous-round local parameter control variate and the current-round global parameter control variate; and correct the loss value based on the third difference and the set loss adjustment rate.
In a possible implementation, the processor 81 is specifically configured to: correct the local parameter gradient based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate; and determine the local model parameters output in the current round based on the corrected local parameter gradient and the updated local model parameters.
In a possible implementation, the processor 81 is specifically configured to: determine the third difference between the previous-round local parameter control variate and the current-round global parameter control variate; and correct the local parameter gradient based on the third difference and the set drift adjustment rate.
In a possible implementation, the processor 81 is specifically configured to: determine the product of the corrected local parameter gradient and the set local learning rate; and determine the local model parameters output in the current round based on this product and the updated local model parameters.
In a possible implementation, the processor 81 is further configured to: determine the global sub-parameter gradient as the current-round local parameter control variate.
Based on the same technical concept, the present application also provides an electronic device. Still referring to FIG. 8, the electronic device includes: a processor 81, a communication interface 82, a memory 83 and a communication bus 84, where the processor 81, the communication interface 82 and the memory 83 communicate with one another via the communication bus 84;
the memory 83 stores a computer program which, when executed by the processor 81, causes the processor 81 to perform the following steps:
during each round of iterative training of the federated learning model, performing at least the following steps:
upon receiving, from each client, the first difference between the local model parameters output by the client in the previous round and the previous-round global model parameters, and the second difference between the client's previous-round global sub-parameter gradient and the client's local parameter control variate of the round before last, determining the current-round global model parameters and the current-round global parameter control variate of the federated learning model to be trained, based on the first differences and second differences;
sending the current-round global model parameters and the current-round global parameter control variate to each client.
In a possible implementation, the processor 81 is specifically configured to: correct the previous-round global model parameters based on the first differences and the set global learning rate, to obtain the current-round global model parameters; and correct the previous-round global parameter control variate based on the second differences, to obtain the current-round global parameter control variate.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface 82 is used for communication between the above electronic device and other devices.
The memory may include a random access memory (RAM) or a non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage apparatus located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit, a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
Based on the same technical concept, an embodiment of the present application provides a computer-readable storage medium storing a computer program executable by an electronic device; when the program runs on the electronic device, the electronic device is caused to perform the following steps:
during each round of iterative training of the federated learning model in which it participates, performing at least the following steps:
receiving the current-round global model parameters and the current-round global parameter control variate of the federated learning model to be trained, sent by the server;
updating the local model parameters of the currently saved federated learning sub-model with the current-round global model parameters;
determining a local parameter gradient based on the updated local model parameters and the current-round global parameter control variate, and determining the local model parameters output in the current round based on the local parameter gradient; and determining a global sub-parameter gradient based on the updated local model parameters;
determining the first difference between the local model parameters output in the current round and the current-round global model parameters; determining the second difference between the global sub-parameter gradient and the currently saved previous-round local parameter control variate; and sending the first difference and the second difference to the server, so that the server determines the next-round global model parameters and the next-round global parameter control variate based on the first differences and second differences sent by the clients.
In a possible implementation, determining the local parameter gradient based on the updated local model parameters and the current-round global parameter control variate comprises:
determining a loss value based on the updated local model parameters and sample data;
correcting the loss value based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate;
determining the local parameter gradient based on the corrected loss value.
In a possible implementation, correcting the loss value based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate comprises:
determining the third difference between the previous-round local parameter control variate and the current-round global parameter control variate;
correcting the loss value based on the third difference and the set loss adjustment rate.
In a possible implementation, determining the local model parameters output in the current round based on the local parameter gradient comprises:
correcting the local parameter gradient based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate;
determining the local model parameters output in the current round based on the corrected local parameter gradient and the updated local model parameters.
In a possible implementation, correcting the local parameter gradient based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate comprises:
determining the third difference between the previous-round local parameter control variate and the current-round global parameter control variate;
correcting the local parameter gradient based on the third difference and the set drift adjustment rate.
In a possible implementation, determining the local model parameters output in the current round based on the corrected local parameter gradient and the updated local model parameters comprises:
determining the product of the corrected local parameter gradient and the set local learning rate; and determining the local model parameters output in the current round based on this product and the updated local model parameters.
In a possible implementation, the method further comprises:
determining the global sub-parameter gradient as the current-round local parameter control variate.
Based on the same technical concept, the present application also provides a computer-readable storage medium storing a computer program executable by an electronic device; when the program runs on the electronic device, the electronic device is caused to perform the following steps:
during each round of iterative training of the federated learning model, performing at least the following steps:
upon receiving, from each client, the first difference between the local model parameters output by the client in the previous round and the previous-round global model parameters, and the second difference between the client's previous-round global sub-parameter gradient and the client's local parameter control variate of the round before last, determining the current-round global model parameters and the current-round global parameter control variate of the federated learning model to be trained, based on the first differences and second differences;
sending the current-round global model parameters and the current-round global parameter control variate to each client.
In a possible implementation, determining the current-round global model parameters and the current-round global parameter control variate of the federated learning model to be trained based on the first differences and second differences comprises:
correcting the previous-round global model parameters based on the first differences and the set global learning rate, to obtain the current-round global model parameters;
correcting the previous-round global parameter control variate based on the second differences, to obtain the current-round global parameter control variate.
The above computer-readable storage medium may be any available medium or data storage device accessible to a processor in an electronic device, including but not limited to magnetic memories such as floppy disks, hard disks, magnetic tapes and magneto-optical disks (MO), optical memories such as CDs, DVDs, BDs and HVDs, and semiconductor memories such as ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH) and solid-state drives (SSD).
Based on the same technical concept, the present application provides a computer program product comprising computer program code which, when run on a computer, causes the computer to implement the method described in any of the above method embodiments applied to an electronic device.
In the above embodiments, implementation may be wholly or partly by software, hardware, firmware or any combination thereof, and may be wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions; when the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are wholly or partly produced.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the present application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features thereof may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.
Obviously, those skilled in the art can make various changes and variations to the present application without departing from the spirit and scope of the present application. Thus, if these modifications and variations of the present application fall within the scope of the claims of the present application and their equivalent technologies, the present application is also intended to encompass these changes and variations.
Claims (14)
- A federated learning model training method, applied to a client, the method comprising: during each round of iterative training of a federated learning model in which the client participates, performing at least the following steps: receiving current-round global model parameters and a current-round global parameter control variate of the federated learning model to be trained, sent by a server; updating local model parameters of a currently saved federated learning sub-model with the current-round global model parameters; determining a local parameter gradient based on the updated local model parameters and the current-round global parameter control variate, and determining the local model parameters output in the current round based on the local parameter gradient; and determining a global sub-parameter gradient based on the updated local model parameters; determining a first difference between the local model parameters output in the current round and the current-round global model parameters; determining a second difference between the global sub-parameter gradient and a currently saved previous-round local parameter control variate; and sending the first difference and the second difference to the server, so that the server determines next-round global model parameters and a next-round global parameter control variate based on the first differences and second differences sent by the clients.
- The method according to claim 1, wherein determining the local parameter gradient based on the updated local model parameters and the current-round global parameter control variate comprises: determining a loss value based on the updated local model parameters and sample data; correcting the loss value based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate; and determining the local parameter gradient based on the corrected loss value.
- The method according to claim 2, wherein correcting the loss value based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate comprises: determining a third difference between the previous-round local parameter control variate and the current-round global parameter control variate; and correcting the loss value based on the third difference and a set loss adjustment rate.
- The method according to claim 1, wherein determining the local model parameters output in the current round based on the local parameter gradient comprises: correcting the local parameter gradient based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate; and determining the local model parameters output in the current round based on the corrected local parameter gradient and the updated local model parameters.
- The method according to claim 4, wherein correcting the local parameter gradient based on the currently saved previous-round local parameter control variate and the current-round global parameter control variate comprises: determining a third difference between the previous-round local parameter control variate and the current-round global parameter control variate; and correcting the local parameter gradient based on the third difference and a set drift adjustment rate.
- The method according to claim 4, wherein determining the local model parameters output in the current round based on the corrected local parameter gradient and the updated local model parameters comprises: determining the product of the corrected local parameter gradient and a set local learning rate; and determining the local model parameters output in the current round based on this product and the updated local model parameters.
- The method according to claim 1, further comprising: determining the global sub-parameter gradient as the current-round local parameter control variate.
- A federated learning model training method, applied to a server, the method comprising: during each round of iterative training of a federated learning model, performing at least the following steps: upon receiving, from each client, a first difference between the local model parameters output by the client in the previous round and the previous-round global model parameters, and a second difference between the client's previous-round global sub-parameter gradient and the client's local parameter control variate of the round before last, determining current-round global model parameters and a current-round global parameter control variate of the federated learning model to be trained, based on the first differences and second differences; and sending the current-round global model parameters and the current-round global parameter control variate to each client.
- The method according to claim 8, wherein determining the current-round global model parameters and the current-round global parameter control variate of the federated learning model to be trained based on the first differences and second differences comprises: correcting the previous-round global model parameters based on the first differences and a set global learning rate, to obtain the current-round global model parameters; and correcting the previous-round global parameter control variate based on the second differences, to obtain the current-round global parameter control variate.
- A federated learning model training system, the system comprising: a server, configured to perform at least the following steps during each round of iterative training of a federated learning model: upon receiving, from each client, a first difference between the local model parameters output by the client in the previous round and the previous-round global model parameters, and a second difference between the client's previous-round global sub-parameter gradient and the client's local parameter control variate of the round before last, determining current-round global model parameters and a current-round global parameter control variate of the federated learning model to be trained, based on the first differences and second differences; and sending the current-round global model parameters and the current-round global parameter control variate to each client; each client, configured to perform at least the following steps during each round of iterative training of the federated learning model in which it participates: receiving the current-round global model parameters and the current-round global parameter control variate sent by the server; updating local model parameters of a currently saved federated learning sub-model with the current-round global model parameters; determining a local parameter gradient based on the updated local model parameters and the current-round global parameter control variate, and determining the local model parameters output in the current round based on the local parameter gradient; determining a global sub-parameter gradient based on the updated local model parameters; determining a first difference between the local model parameters output in the current round and the current-round global model parameters; determining a second difference between the global sub-parameter gradient and a currently saved previous-round local parameter control variate; and sending the first difference and the second difference to the server.
- A federated learning model training apparatus, the apparatus comprising: a receiving module, configured to receive, during each round of iterative training of a federated learning model in which the client participates, current-round global model parameters and a current-round global parameter control variate of the federated learning model to be trained, sent by a server; an updating module, configured to update local model parameters of a currently saved federated learning sub-model with the current-round global model parameters; a first determining module, configured to determine a local parameter gradient based on the updated local model parameters and the current-round global parameter control variate, determine the local model parameters output in the current round based on the local parameter gradient, and determine a global sub-parameter gradient based on the updated local model parameters; a first sending module, configured to determine a first difference between the local model parameters output in the current round and the current-round global model parameters, determine a second difference between the global sub-parameter gradient and a currently saved previous-round local parameter control variate, and send the first difference and the second difference to the server, so that the server determines next-round global model parameters and a next-round global parameter control variate based on the first differences and second differences sent by the clients.
- A federated learning model training apparatus, the apparatus comprising: a second determining module, configured to, during each round of iterative training of a federated learning model, upon receiving from each client a first difference between the local model parameters output by the client in the previous round and the previous-round global model parameters, and a second difference between the client's previous-round global sub-parameter gradient and the client's local parameter control variate of the round before last, determine current-round global model parameters and a current-round global parameter control variate of the federated learning model to be trained, based on the first differences and second differences; a second sending module, configured to send the current-round global model parameters and the current-round global parameter control variate to each client.
- An electronic device, comprising at least a processor and a memory, the processor being configured to implement the steps of the method according to any one of claims 1-9 when executing a computer program stored in the memory.
- A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1-9.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211414446.X | 2022-11-11 | ||
CN202211414446.XA CN115660115A (zh) | 2022-11-11 | 2022-11-11 | Federated learning model training method and apparatus, device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024099109A1 true WO2024099109A1 (zh) | 2024-05-16 |
Family
ID=85020422
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/127265 WO2024099109A1 (zh) | 2022-11-11 | 2023-10-27 | Federated learning model training method and apparatus, device and storage medium |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN115660115A (zh) |
TW (1) | TW202420136A (zh) |
WO (1) | WO2024099109A1 (zh) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115660115A (zh) * | 2022-11-11 | 2023-01-31 | China UnionPay Co., Ltd. | Federated learning model training method and apparatus, device and storage medium |
US20240054403A1 (en) * | 2023-03-17 | 2024-02-15 | Intel Corporation | Resource efficient federated edge learning with hyperdimensional computing |
CN116911403B (zh) * | 2023-06-06 | 2024-04-26 | Beijing University of Posts and Telecommunications | Integrated training method for federated learning server and client, and related device |
CN117390448B (zh) * | 2023-10-25 | 2024-04-26 | Xi'an Jiaotong University | Client model aggregation method for inter-cloud federated learning and related system |
- 2022-11-11 CN CN202211414446.XA patent/CN115660115A/zh active Pending
- 2023-10-27 WO PCT/CN2023/127265 patent/WO2024099109A1/zh unknown
- 2023-11-03 TW TW112142349A patent/TW202420136A/zh unknown
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210073639A1 (en) * | 2018-12-04 | 2021-03-11 | Google Llc | Federated Learning with Adaptive Optimization |
CN113011599A (zh) * | 2021-03-23 | 2021-06-22 | 上海嗨普智能信息科技股份有限公司 | Federated learning system based on heterogeneous data |
CN113435604A (zh) * | 2021-06-16 | 2021-09-24 | Tsinghua University | Federated learning optimization method and apparatus |
CN114528304A (zh) * | 2022-02-18 | 2022-05-24 | Anhui University of Technology | Federated learning method with adaptive client parameter updating, system and storage medium |
CN115660115A (zh) * | 2022-11-11 | 2023-01-31 | China UnionPay Co., Ltd. | Federated learning model training method and apparatus, device and storage medium |
CN116720592A (zh) * | 2023-07-24 | 2023-09-08 | China Telecom Corporation Limited | Federated learning model training method and apparatus, non-volatile storage medium and electronic device |
Non-Patent Citations (1)
Title |
---|
SAI PRANEETH KARIMIREDDY; SATYEN KALE; MEHRYAR MOHRI; SASHANK J. REDDI; SEBASTIAN U. STICH; ANANDA THEERTHA SURESH: "SCAFFOLD: Stochastic Controlled Averaging for Federated Learning", arXiv.org, Cornell University Library, 9 April 2021 (2021-04-09), XP081922657 *
Also Published As
Publication number | Publication date |
---|---|
TW202420136A (zh) | 2024-05-16 |
CN115660115A (zh) | 2023-01-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2024099109A1 (zh) | 2024-05-16 | Federated learning model training method and apparatus, device and storage medium |
US8904149B2 (en) | Parallelization of online learning algorithms | |
US20210089511A1 (en) | Management of snapshot in blockchain | |
US10540587B2 (en) | Parallelizing the training of convolutional neural networks | |
Avella-Medina et al. | Robust estimation of high-dimensional covariance and precision matrices | |
Shen et al. | Two updating schemes of iterative learning control for networked control systems with random data dropouts | |
US11587356B2 (en) | Method and device for age estimation | |
TWI736838B (zh) | 2021-08-11 | Apparatus and method for training a fully connected neural network |
US20170206450A1 (en) | Method and apparatus for machine learning | |
US20190244100A1 (en) | Cascaded computing for convolutional neural networks | |
US20210233068A1 (en) | Settlement system, settlement method, user device, and settlement program | |
WO2023124296A1 (zh) | 2023-07-06 | Knowledge-distillation-based joint learning training method and apparatus, device and medium |
US10762616B2 (en) | Method and system of analytics system balancing lead time and accuracy of edge analytics modules | |
JP6870508B2 (ja) | 2021-05-12 | Learning program, learning method and learning apparatus |
WO2017166155A1 (zh) | 2017-10-05 | Method and apparatus for training a neural network model, and electronic device |
EP3168754B1 (fr) | 2018-05-30 | Method for identifying an entity |
WO2021051556A1 (zh) | 2021-03-25 | Deep learning weight updating method and system, computer device and storage medium |
EP3779801A1 (en) | Method for optimizing neural network parameter appropriate for hardware implementation, neural network operation method, and apparatus therefor | |
CN113869293A (zh) | 2021-12-31 | Lane line recognition method and apparatus, electronic device and computer-readable medium |
US11528125B2 (en) | Electronic device for sorting homomorphic ciphertext using shell sorting and operating method thereof | |
CN105405052A (zh) | 2016-03-16 | Method and system for calculating insurance-related costs of insurance products |
WO2019037409A1 (zh) | 2019-02-28 | Neural network training system and method, and computer-readable storage medium |
CN116911403B (zh) | 2024-04-26 | Integrated training method for federated learning server and client, and related device |
WO2024152797A1 (zh) | 2024-07-25 | Video completion method and apparatus, medium and electronic device |
CN109388784A (zh) | 2019-02-26 | Minimum-entropy kernel density estimator generation method and apparatus, and computer-readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23887799; Country of ref document: EP; Kind code of ref document: A1