CN115907044A - Method, device and system for federated learning, storage medium and electronic equipment - Google Patents
Info
- Publication number
- CN115907044A (application CN202211468920.7A)
- Authority
- CN
- China
- Prior art keywords
- local
- global
- gradient
- iteration
- current
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method, a device and a system for federated learning, a storage medium and electronic equipment. The method comprises the following steps: in any global iteration process, receiving a correction gradient term for the current global iteration process; in any local iteration process within the current global iteration process, determining a local gradient term of the machine learning model for the current local iteration; correcting the local gradient term of the current local iteration based on the correction gradient term, updating the model parameters of the machine learning model based on the corrected target gradient term, and executing the next local iteration process based on the updated model parameters; and, upon completion of the local iteration processes, sending the model parameter change of the current global iteration process to the central server node. The embodiment of the invention prevents each computing node from falling into a local optimum during local training, and improves the generalization performance of the trained machine learning model.
Description
Technical Field
The invention relates to the technical field of federated learning, in particular to a method, a device and a system for federated learning, a storage medium and electronic equipment.
Background
Federated learning is a distributed machine learning framework in which a plurality of computing nodes cooperatively train the same model. During training, the computing nodes and the central server node exchange only model parameters, not raw data, so each computing node's raw data is never shared.
In implementing the invention, at least the following technical problem was found in the prior art: because the local training data of the computing nodes are heterogeneous, the locally optimal parameters of the computing nodes generally differ, and the average of these locally optimal parameters is generally not the globally optimal parameter. The resulting differences between the model parameters obtained by local iteration on different computing nodes hinder the overall training of the model.
Disclosure of Invention
The invention provides a method, a device and a system for federated learning, a storage medium and electronic equipment, which reduce the tendency of each computing node to become trapped in a local optimum during training.
According to one aspect of the invention, a federated learning method is provided, applied to a computing node, the method comprising:
in any global iteration process, receiving a correction gradient term for the current global iteration process;
in any local iteration process within the current global iteration process, determining a local gradient term of the machine learning model for the current local iteration;
correcting the local gradient term of the current local iteration based on the correction gradient term, updating the model parameters of the machine learning model based on the corrected target gradient term, and executing the next local iteration process based on the updated model parameters;
and, upon completion of the local iteration processes, sending the model parameter change of the current global iteration process to a central server node, where the central server node determines the correction gradient term required by the next global iteration process based on the model parameter change of the current global iteration process.
According to another aspect of the present invention, a federated learning method is provided, which is applied to a central server node, and the method includes:
in any global iteration process, receiving model parameter changes sent by each computing node;
globally updating the model parameters based on the model parameter changes sent by each computing node, and determining a correction gradient term for the next global iteration process;
and sending the updated model parameters and the correction gradient term for the next global iteration process to each computing node, where each computing node performs the next global iteration process.
According to another aspect of the present invention, there is provided a federated learning apparatus integrated with a computing node device, the apparatus including:
a global information acquisition module, used for receiving, in any global iteration process, the correction gradient term for the current global iteration process;
a correction gradient term determining module, used for determining, in any local iteration process within the current global iteration process, a local gradient term of the machine learning model for the current local iteration;
a local parameter updating module, used for correcting the local gradient term of the current local iteration based on the correction gradient term, updating the model parameters of the machine learning model based on the corrected target gradient term, and executing the next local iteration process based on the updated model parameters;
and an information sending module, used for sending, upon completion of the local iteration processes, the model parameter change of the current global iteration process to the central server node, where the central server node determines the correction gradient term required by the next global iteration process based on the model parameter change of the current global iteration process.
According to another aspect of the present invention, there is provided a federated learning apparatus integrated in a central server node device, the apparatus including:
the local information receiving module is used for receiving the model parameter change sent by each computing node in any global iteration process;
a global updating module, used for globally updating the model parameters based on the model parameter changes sent by each computing node and determining a correction gradient term for the next global iteration process;
and an information sending module, used for sending the updated model parameters and the correction gradient term for the next global iteration process to each computing node, where each computing node performs the next global iteration process.
According to another aspect of the present invention, there is provided a federated learning system, comprising a central server node and a plurality of compute nodes, wherein,
the central server node issues the correction gradient term, the global model parameters and the momentum parameters of the current global iteration process to each computing node;
each computing node receives the correction gradient term, the global model parameters and the momentum parameters, and, in any local iteration process within the current global iteration process, determines the local gradient term of the machine learning model for the current local iteration; corrects the local gradient term of the current local iteration based on the correction gradient term, updates the model parameters of the machine learning model based on the corrected target gradient term, and executes the next local iteration process based on the updated model parameters; and, upon completion of the local iteration processes, sends the model parameter change and the local momentum parameters of the current global iteration process to the central server node;
and the central server node determines the correction gradient term, the global model parameters and the global momentum parameters required by the next global iteration process based on the model parameter changes and the local momentum parameters of the current global iteration process, and issues them to each computing node, until the global iterations are completed and a trained machine learning model is obtained.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the federated learning method according to any embodiment of the present invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions which, when executed, cause a processor to implement the federated learning method according to any embodiment of the present invention.
According to the technical solution of this embodiment, during local iteration at the computing nodes, the local gradient term of each local iteration is corrected by the correction gradient term issued by the central server node. This reduces the model parameter differences caused by the differing local gradient terms obtained on different computing nodes, prevents each computing node from falling into a local optimum during local training, and improves the generalization performance of the trained machine learning model.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required for the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; other drawings can be obtained by those skilled in the art from these drawings without creative effort.
Fig. 1 is a flowchart of a federated learning method provided in an embodiment of the present invention;
Fig. 2 is a flowchart of a federated learning method provided in an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a federated learning apparatus according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a federated learning apparatus according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a federated learning system provided in an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The federated learning process includes: 1. each computing node trains the distributed model using its own data and uploads the trained model back to the central server node; 2. the central server node aggregates the uploaded models and then distributes the same global model to the computing nodes. These two steps are iterated until the model converges. In a federated learning system formed by a plurality of computing nodes and a central server node, the central server node may be configured in a central server device and the computing nodes in computing node devices, where different computing nodes may be configured in different computing node devices, or two or more computing nodes may be configured in the same computing node device, which is not limited herein.
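As a rough illustration of this two-step loop, the following Python sketch shows a generic federated averaging skeleton. The helper names (`aggregate`, `local_train`) and the plain parameter-averaging rule are illustrative assumptions, not the specific method claimed below.

```python
import numpy as np

def aggregate(local_params):
    """Central server step (sketch): average the parameters uploaded by the nodes."""
    return np.mean(np.stack(local_params, axis=0), axis=0)

def federated_training(init_params, node_datasets, local_train, num_rounds=10):
    """Generic two-step federated loop. `local_train(params, data)` is a
    hypothetical callback that trains on one node's data and returns parameters."""
    global_params = init_params
    for _ in range(num_rounds):
        # 1. each compute node trains on its own data; raw data never leaves the node
        local_params = [local_train(global_params.copy(), data) for data in node_datasets]
        # 2. the central server aggregates and redistributes the same global model
        global_params = aggregate(local_params)
    return global_params
```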
In this embodiment, the computing nodes and the central server node jointly train a machine learning model. The application scenario of federated learning is not limited here, that is, the type of the machine learning model and the function of the trained machine learning model are not limited. In some embodiments, the machine learning model may include, but is not limited to, a neural network model, a logistic regression model, etc., where the neural network model may be, but is not limited to, a convolutional neural network (CNN), a recurrent neural network (RNN), a long short-term memory network (LSTM), a residual network (e.g. ResNet50), etc. Federated learning can be applied to machine learning models such as an image classification model, an image segmentation model, an image feature extraction model, an image compression model, an image enhancement model, an image noise reduction model, an image tag generation model, a text classification model, a text translation model, a text abstract extraction model, a text prediction model, a keyword conversion model, a text semantic analysis model, a speech recognition model, an audio noise reduction model, an audio synthesis model, an audio equalizer conversion model, a weather prediction model, a commodity recommendation model, an article recommendation model, an action recognition model, a face recognition model, a facial expression recognition model, and the like. The above application scenarios are merely exemplary, and the application scenarios of the federated learning method are not limited in the present application.
In federated learning of machine learning models, the computing nodes locally train the machine learning model iteratively. Correspondingly, each computing node is preset with sample data, and iterative training is performed on the machine learning model based on the sample data. The training mode of the machine learning model is not limited; for example, supervised or unsupervised training may be performed to train the machine learning model and update its network parameters.
The sample data can be image data, and the prediction result of the machine learning model is an image processing result; or the sample data is text data, and the prediction result of the machine learning model is a text processing result; or the sample data is audio data, and the prediction result of the machine learning model is an audio processing result.
For example, if the sample data is image data, the machine learning model may be an image classification model, and the prediction result output by the machine learning model may be an image classification result; alternatively, the machine learning model may be an image segmentation model, and the prediction result may be an image segmentation result; alternatively, the machine learning model may be an image feature extraction model, and the prediction result may be an image feature extraction result; alternatively, the machine learning model may be an image compression model and the prediction result may be an image compression result; alternatively, the machine learning model may be an image enhancement model and the prediction result may be an image enhancement result; alternatively, the machine learning model may be an image denoising model, and the prediction result may be an image denoising result; alternatively, the machine learning model may be an image label generation model, the prediction result may be an image label, and so on. If the sample data is text data, the machine learning model can be a text classification model, and a prediction result output by the machine learning model can be a text classification result; alternatively, the machine learning model may be a text prediction model and the prediction result may be a text prediction result; alternatively, the machine learning model may be a text summarization extraction model, and the prediction result may be a text summarization extraction result; alternatively, the machine learning model may be a text translation model and the prediction result may be a text translation result; alternatively, the machine learning model may be a keyword conversion model, and the prediction result may be a keyword conversion result; alternatively, the machine learning model may be a text semantic analysis model, and the prediction result may be a text semantic analysis result, or the like. If the sample data is audio data, the machine learning model can be a speech recognition model, and the prediction result output by the machine learning model can be a speech recognition result; alternatively, the machine learning model may be an audio noise reduction model and the prediction result may be an audio noise reduction result; alternatively, the machine learning model may be an audio synthesis model and the prediction result may be an audio synthesis result; alternatively, the machine learning model may be an audio equalizer transition model, the prediction result may be an audio equalizer transition result, and so on.
The federated learning process comprises a plurality of global iteration processes. In each global iteration process, the central server node sends the model parameters of the machine learning model to each computing node; each computing node completes its model parameter update by performing a plurality of local iterations and uploads the trained model back to the central server node. The central server node then aggregates the uploaded models to complete one global iteration, and sends the aggregated model parameters, as the global model, to the computing nodes for the next global iteration process.
Because the distribution of the local training data differs across computing nodes (that is, the training data are heterogeneous), the model parameters obtained by training on different computing nodes differ and fall into local optima, which hinders the update of the overall model parameters in federated learning. In view of this technical problem, an embodiment of the present invention provides a federated learning method. Fig. 1 is a flowchart of the federated learning method provided in this embodiment, which is applicable to the case where a computing node performs multiple local iterations of training on a machine learning model. The method may be executed by a federated learning apparatus, which may be implemented in hardware and/or software and may be configured in a computing node device; the computing node device may be an electronic device such as a computer, a mobile phone or a PC terminal. As shown in fig. 1, the method includes:
S110, in any global iteration process, receiving the correction gradient term for the current global iteration process.
S120, in any local iteration process within the current global iteration process, determining a local gradient term of the machine learning model for the current local iteration.
S130, correcting the local gradient term of the current local iteration based on the correction gradient term, updating the model parameters of the machine learning model based on the corrected target gradient term, and executing the next local iteration process based on the updated model parameters.
S140, upon completion of the local iteration processes, sending the model parameter change of the current global iteration process to the central server node, where the central server node determines the correction gradient term required by the next global iteration process based on the model parameter change of the current global iteration process.
In any global iteration process, the computing node receives the model parameters issued by the central server node and performs multiple local iterations of optimization on the model parameters based on its local training data until the local iteration process is completed.
In this embodiment, the computing node also receives the correction gradient term issued by the central server node. The correction gradient term is determined by the central server node and sent synchronously to all computing nodes in the current global iteration process; it is obtained based on the model parameter change of the previous global iteration process, and may be set to zero in the first global iteration process.
In the current global iteration process, each computing node locally performs multiple local iterations, and in each local iteration the model parameters can be updated by a gradient descent method. In this embodiment, when updating the model parameters by gradient descent, each computing node corrects the local gradient term of each local iteration with the correction gradient term and updates the model parameters based on the corrected gradient, thereby reducing the difference between the model parameters obtained by training on different computing nodes.
Optionally, the receiving of the correction gradient term for the current global iteration process includes: receiving the correction gradient term, the initial model parameters and the initial momentum parameters of the current global iteration process issued by the central server node. The correction gradient term, the initial model parameters and the initial momentum parameters of the current global iteration process are determined based on the iteration result of the previous global iteration process. It should be noted that, in the first global iteration process, the correction gradient term, the initial model parameters and the initial momentum parameters are initialization data; for example, the correction gradient term may be zero, i.e. $g_{0,\alpha} = 0$, the initial momentum parameters may be preset values, and the initial model parameters may take an initial value $x_0$, which may be preset, for example 0 or 0.5, or may be obtained by an initialization operation on the model parameters, which is not limited herein.
The computing node locally performs K local iterations, where the number of local iterations may be the same or different in each global iteration process, which is not limited. Before local iteration, the computing node performs initialization based on the correction gradient term, the initial model parameters and the initial momentum parameters of the current global iteration process issued by the central server node, using the initial model parameters and the initial momentum parameters as the starting parameters of the local iterations.
The corresponding parameters in the machine learning model are set based on the initial model parameters to form the machine learning model to be trained, and multiple local iterations are performed on it. In each local iteration, the local momentum parameters of the current local iteration are determined, and a local gradient term is determined based on them; the local gradient term is used to update the model parameters in the current local iteration.
Optionally, the determining of the local gradient term of the machine learning model for the current local iteration includes: determining the local momentum parameters of the machine learning model for the current local iteration, where the local momentum parameters include a first-order momentum and a second-order momentum; and determining the local gradient term of the current local iteration based on the first-order and second-order momentum.
The determining of the local momentum parameters of the machine learning model for the current local iteration includes: determining the stochastic gradient of the machine learning model at the current local iteration; and determining the first-order momentum of the current local iteration based on the stochastic gradient and the first-order momentum of the previous local iteration, and the second-order momentum of the current local iteration based on the stochastic gradient and the second-order momentum of the previous local iteration. In each local iteration, the local training data are input into the machine learning model to be trained of the current local iteration to obtain a prediction result, and a loss function is evaluated on the prediction result to determine the stochastic gradient of the current local iteration. The loss function may be preset and is not limited here; it can be adapted to the application scenario. Illustratively, the loss function may be written as $f_i(x_{t,k-1,i}, \xi_{t,k,i})$, where $\xi_{t,k,i}$ is any training sample, expressed as $\xi_{t,k,i} = (\xi^{feature}_{t,k,i}, \xi^{label}_{t,k,i})$. Here $\xi^{feature}_{t,k,i}$ denotes the features of the training sample participating in the $k$-th local iteration on the $i$-th computing node device, $\xi^{label}_{t,k,i}$ denotes the label of that training sample, and $x_{t,k-1,i}$ are the model parameters before the current local iteration. The stochastic gradient is then determined from the loss function as $g_{t,k,i} = \nabla f_i(x_{t,k-1,i}, \xi_{t,k,i})$, where $t$ is the global iteration index, $k$ is the local iteration index, and $i$ identifies the computing node. In particular, the derivatives of the loss function with respect to the model parameters may be determined, where different network layers may correspond to derivatives of different orders; for example, if the machine learning model includes three network layers, the derivatives of the loss function with respect to the model parameters include first, second and third derivatives. The derivatives corresponding to the individual model parameters are combined, for example in matrix or vector form, to obtain the stochastic gradient of the machine learning model at the current iteration.
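As a hedged illustration of how such a stochastic gradient might be computed, the sketch below assumes a linear model with a squared loss; the patent leaves the model architecture and loss function open, so the function and variable names here are illustrative only.

```python
import numpy as np

def stochastic_gradient(params, features, labels):
    """Illustrative g_{t,k,i}: gradient of f_i(x_{t,k-1,i}, xi_{t,k,i}) w.r.t. params.
    Assumes, purely for illustration, a linear model with 0.5 * mean squared error."""
    preds = features @ params                     # model prediction on the sampled batch xi
    residual = preds - labels
    return features.T @ residual / len(labels)    # d/d(params) of 0.5 * mean((pred - label)^2)

# Example usage on a toy batch (shapes: params (d,), features (n, d), labels (n,)).
rng = np.random.default_rng(0)
x = rng.normal(size=5)
g = stochastic_gradient(x, rng.normal(size=(8, 5)), rng.normal(size=8))
```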
The computing node is preset with hyperparameters, which may include a first-order momentum hyperparameter $\beta_1$ and a second-order momentum hyperparameter $\beta_2$. Correspondingly, the first-order momentum of the current local iteration is calculated from the stochastic gradient, the first-order momentum of the previous local iteration and $\beta_1$. Specifically, the first-order momentum can be calculated as:

$$m_{t,k,i} = \beta_1 m_{t,k-1,i} + (1-\beta_1)\, g_{t,k,i}$$

where $m_{t,k-1,i}$ is the first-order momentum of the $(k-1)$-th local iteration in the $t$-th global iteration process, i.e. the first-order momentum of the previous local iteration, $m_{t,k,i}$ is the first-order momentum of the $k$-th local iteration in the $t$-th global iteration process, i.e. the first-order momentum of the current local iteration, and $g_{t,k,i}$ is the stochastic gradient of the current local iteration.
In some embodiments, the second-order momentum of the current local iteration is determined from the stochastic gradient, the second-order momentum of the previous local iteration and the second-order momentum hyperparameter $\beta_2$. Specifically, the second-order momentum can be calculated as:

$$v_{t,k,i} = \beta_2 v_{t,k-1,i} + (1-\beta_2)\, g_{t,k,i}^2$$

where $v_{t,k-1,i}$ is the second-order momentum of the $(k-1)$-th local iteration in the $t$-th global iteration process, i.e. the second-order momentum of the previous local iteration, and $v_{t,k,i}$ is the second-order momentum of the $k$-th local iteration in the $t$-th global iteration process, i.e. the second-order momentum of the current local iteration.
In some embodiments, the second-order momentum applied in the current local iteration is the maximum of the currently calculated second-order momentum and the second-order momentum applied in the previous local iteration. Correspondingly, determining the second-order momentum of the current local iteration based on the stochastic gradient and the second-order momentum of the previous local iteration includes: determining the current second-order momentum of the current local iteration based on the stochastic gradient and the second-order momentum of the previous local iteration; and taking the maximum of the second-order momentum of the previous local iteration and the current second-order momentum as the second-order momentum of the current local iteration. Illustratively, the target second-order momentum of the current local iteration is

$$\hat{v}_{t,k,i} = \max\left(\hat{v}_{t,k-1,i},\; v_{t,k,i}\right).$$

It should be noted that the first-order momentum, the second-order momentum and the target second-order momentum all have the same dimension as the stochastic gradient, i.e. $m_{t,k,i},\, v_{t,k,i},\, \hat{v}_{t,k,i} \in \mathbb{R}^d$.
The local gradient term of the current local iteration is then determined based on the first-order momentum and the second-order momentum; specifically, it is determined from the ratio of the first-order momentum to the target second-order momentum. For example, the local gradient term may be calculated as

$$\tilde{g}_{t,k,i} = \frac{m_{t,k,i}}{\sqrt{\hat{v}_{t,k,i}}}$$

where the ratio of the first-order momentum to (the square root of) the target second-order momentum is used as the local gradient term.
On the basis of the above embodiment, in each local iteration the local gradient term of the current local iteration is corrected by the correction gradient term to obtain a target gradient term, and the target gradient term is used to update the model parameters.
Optionally, determining the target gradient term of the current local iteration based on the local gradient term and the correction gradient term includes: performing a weighted fusion of the local gradient term and the correction gradient term based on preset weights to obtain the target gradient term of the current local iteration, where the weight of the correction gradient term is not zero. Illustratively, the weights of the local gradient term and the correction gradient term are set so that they sum to 1; correspondingly, the target gradient term can be expressed as

$$d_{t,k,i} = \alpha\, \tilde{g}_{t,k,i} + (1-\alpha)\, g_{t,\alpha}$$

where $g_{t,\alpha}$ is the correction gradient term of the current global iteration process, $\alpha$ is the weight of the local gradient term and $1-\alpha$ is the weight of the correction gradient term. The weight of the correction gradient term is required to be non-zero, i.e. $\alpha \neq 1$. In some embodiments, the weight $\alpha$ is a correction weight, which can be preset.
Based on the above embodiment, the model parameters of the current local iteration are updated based on the target gradient term; for example, the update of the model parameters may be implemented as

$$x_{t,k,i} = x_{t,k-1,i} - \eta_l\, d_{t,k,i}$$

where $x_{t,k-1,i}$ are the model parameters before the current local iteration, $x_{t,k,i}$ are the updated model parameters of the current local iteration, and $\eta_l$ is the local learning rate of the computing node.
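Putting the above updates together, the sketch below shows one corrected local iteration in Python. It is a sketch under assumptions: `grad_fn` stands in for the node's stochastic-gradient computation, the hyperparameter values are placeholders, and the small constant `eps` is added for numerical stability even though the description does not mention it.

```python
import numpy as np

def local_iteration_step(x, m, v, v_hat, g_corr, grad_fn, batch,
                         beta1=0.9, beta2=0.99, alpha=0.9, eta_l=0.01, eps=1e-8):
    """One corrected local iteration on a compute node (sketch).
    g_corr is the correction gradient term issued by the central server;
    alpha weights the local gradient term against it."""
    g = grad_fn(x, batch)                              # stochastic gradient g_{t,k,i}
    m = beta1 * m + (1.0 - beta1) * g                  # first-order momentum
    v = beta2 * v + (1.0 - beta2) * g * g              # second-order momentum
    v_hat = np.maximum(v_hat, v)                       # target second-order momentum (max trick)
    local_term = m / (np.sqrt(v_hat) + eps)            # local gradient term
    d = alpha * local_term + (1.0 - alpha) * g_corr    # weighted fusion -> target gradient term
    x = x - eta_l * d                                  # model parameter update
    return x, m, v, v_hat
```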
The computing node repeats the local iteration process until K local iterations are completed, obtaining the updated machine learning model for the current global iteration process. The computing node thus completes its local part of the current global iteration and sends the iteration result to the central server node. The iteration result may include the model parameters after iteration, or the model parameter change between the model parameters after iteration and those before iteration. In some embodiments, the iteration result includes the model parameter change $x_{t,0,i} - x_{t,K,i}$, where $x_{t,K,i}$ are the model parameters after the K local iterations and $x_{t,0,i}$ are the model parameters before the K local iterations, i.e. the initial model parameters of the current global iteration process.
Each computing node sends its iteration result to the central server node, so that the central server node can obtain the global model parameters of the current global iteration process and the correction gradient term for the next global iteration process based on the iteration results of the computing nodes, facilitating the next global iteration. The global model parameters used in the next global iteration process are determined based on the iteration results of the current global iteration process and the global model parameters before the current global iteration process.
In some embodiments, the local momentum parameters obtained in the last local iteration may also be included in the iteration result. Correspondingly, the local momentum parameters are sent to the central server node, so that the central server node can obtain the initial momentum parameters for the next global iteration process based on the local momentum parameters of the computing nodes and perform the next global iteration.
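For illustration only, the payload a computing node uploads at the end of a global round might be packaged as below; the field names are assumptions made for this sketch, not taken from the patent.

```python
def build_upload(x_initial, x_final, v_hat):
    """Payload sent to the central server after the K local iterations (sketch).
    Field names are illustrative assumptions."""
    return {
        "param_change": x_initial - x_final,    # x_{t,0,i} - x_{t,K,i}
        "second_order_momentum": v_hat,         # local momentum parameter after the last local iteration
    }
```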
According to the technical solution of this embodiment, during local iteration at the computing nodes, the local gradient term of each local iteration is corrected by the correction gradient term issued by the central server node. This reduces the model parameter differences caused by the differing local gradient terms obtained on different computing nodes, prevents each computing node from falling into a local optimum during local training, and improves the generalization performance of the trained machine learning model.
Fig. 2 is a flowchart of a federated learning method provided in an embodiment of the present invention. This embodiment is applicable to the case where a central server node processes the iteration results of the computing nodes. The method may be executed by a federated learning apparatus, which may be implemented in hardware and/or software and may be configured in a central server node device; the central server node device may be an electronic device such as a computer or a server. As shown in fig. 2, the method includes:
S210, in any global iteration process, receiving the model parameter changes sent by each computing node.
S220, globally updating the model parameters based on the model parameter changes sent by each computing node, and determining the correction gradient term for the next global iteration process.
S230, sending the updated model parameters and the correction gradient term for the next global iteration process to each computing node, where each computing node performs the next global iteration process.
In this embodiment, in each global iteration process the central server node sends the correction gradient term, the initial model parameters and the initial momentum parameters of the current global iteration process to the computing nodes participating in the current global iteration, so that each computing node performs local model training based on them and obtains its iteration result for the current global iteration. In this embodiment, the iteration result includes the model parameter change.
The central server node can obtain the mean model parameter change from the model parameter changes of the computing nodes, and update the global model parameters of the current iteration process based on the global learning rate and this mean change, thereby realizing the update of the global model parameters. Illustratively, this can be implemented as

$$x_{t+1} = x_t - \frac{\eta_g}{S} \sum_{i} \left(x_{t,0,i} - x_{t,K,i}\right)$$

where $x_{t,0,i} - x_{t,K,i}$ is the model parameter change sent by the $i$-th computing node, $S$ is the number of computing nodes participating in model training in the current global iteration process, $\eta_g$ is the global learning rate, $x_t$ are the global model parameters of the current iteration process, and $x_{t+1}$ are the updated global model parameters, i.e. the global model parameters of the next global iteration.
Further, the correction gradient term for the next global iteration process is determined based on the global model parameters $x_t$ before updating and the updated global model parameters $x_{t+1}$. Specifically, it can be calculated from the difference between $x_t$ and $x_{t+1}$, the number of local iterations $K$, the local learning rate and the global learning rate. Illustratively:

$$g_{t+1,\alpha} = \frac{x_t - x_{t+1}}{\eta_g\, \eta_l\, K}$$
In some embodiments, the iteration result of each computing node further includes a local momentum parameter, that is, the local momentum parameter obtained after the K local iterations. Correspondingly, the central server node is further configured to determine a global momentum parameter based on the local momentum parameters of the computing nodes, and the global momentum parameter is used as the initial momentum parameter of each computing node in the next global iteration process. For example, the global momentum parameter may be determined as the average of the local momentum parameters of the computing nodes:

$$\hat{v}_{t+1} = \frac{1}{S} \sum_{i} \hat{v}_{t,K,i}$$

where $\hat{v}_{t,K,i}$ is the local momentum parameter sent by the $i$-th computing node, i.e. its target second-order momentum.
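A hedged server-side sketch of these updates follows. It assumes the upload payload sketched earlier, and the exact normalization of the correction gradient term (dividing the parameter difference by $\eta_g \eta_l K$) is an assumption consistent with the description's reference to the parameter difference, K and the two learning rates.

```python
def server_global_update(x_t, uploads, eta_g=1.0, eta_l=0.01, K=10):
    """Central-server step for one global round (sketch).
    `uploads` is a list of per-node payloads as sketched above."""
    S = len(uploads)
    mean_change = sum(u["param_change"] for u in uploads) / S            # average of x_{t,0,i} - x_{t,K,i}
    x_next = x_t - eta_g * mean_change                                   # global model parameter update
    g_corr_next = (x_t - x_next) / (eta_g * eta_l * K)                   # correction gradient term for the next round
    v_hat_global = sum(u["second_order_momentum"] for u in uploads) / S  # global momentum parameter
    return x_next, g_corr_next, v_hat_global
```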
The central server node sends the updated model parameters and the correction gradient term for the next global iteration process to each computing node, where the updated model parameters are used as the initial model parameters for each computing node's next global iteration. In some embodiments, the global momentum parameter may also be sent to each computing node.
It should be noted that, in each global iteration process, only some of the computing nodes perform local training; the number of computing nodes performing local training is the same across global iteration processes, but the specific computing nodes may differ. The number or proportion of computing nodes is preset, and the computing nodes that perform local training in each global iteration process are determined by random extraction from all the computing nodes.
Correspondingly, sending the model parameters and the correction gradient term for the next global iteration process to the computing nodes in each global iteration comprises: extracting a preset proportion of the computing nodes as the computing nodes of the next global iteration; and sending the updated model parameters and the correction gradient term for the next global iteration process to the extracted computing nodes. By extracting a subset of the computing nodes for local training in each global iteration process, the number of computing nodes participating in each global iteration, and hence the communication and computation cost, is reduced, while the diversity of the training data used across global iteration processes is improved, which improves the generalization of the trained model.
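As a small illustration of this random extraction, the sketch below samples a preset fraction of nodes; the fraction value and function name are assumptions.

```python
import random

def select_nodes(all_nodes, fraction=0.1):
    """Randomly extract a preset proportion of compute nodes for the next global iteration (at least one)."""
    k = max(1, int(len(all_nodes) * fraction))
    return random.sample(all_nodes, k)
```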
According to the technical solution provided by this embodiment, the iteration results of the computing nodes are processed at the central server node to obtain the global model parameters, and at the same time the correction gradient term for the next global iteration process is determined based on the global model parameters. By issuing the correction gradient term to each computing node, the local gradient term of each computing node is corrected during its local iterations, so that the differences between the local gradient terms of different computing nodes are reduced, the differences between the model parameters obtained by iteration are further reduced, the accuracy of the global model parameters is improved, and the computing nodes are prevented from falling into local optima.
Fig. 3 is a schematic structural diagram of a federated learning apparatus provided in a third embodiment of the present invention, where the federated learning apparatus may be configured in a computing node device. As shown in fig. 3, the apparatus includes:
a global information obtaining module 310, configured to receive, in any global iteration process, the correction gradient term for the current global iteration process;
a correction gradient term determining module 320, configured to determine, in any local iteration process within the current global iteration process, the local gradient term of the machine learning model for the current local iteration;
a local parameter updating module 330, configured to correct the local gradient term of the current local iteration based on the correction gradient term, update the model parameters of the machine learning model based on the corrected target gradient term, and execute the next local iteration process based on the updated model parameters;
and an information sending module 340, configured to send, upon completion of the local iteration processes, the model parameter change of the current global iteration process to the central server node, where the central server node determines, based on the model parameter change of the current global iteration process, the correction gradient term and the global model parameters required by the next global iteration process.
On the basis of the above embodiment, optionally, the correction gradient term determining module 320 includes:
a local momentum parameter determining unit, used for determining the local momentum parameters of the machine learning model for the current local iteration, where the local momentum parameters include a first-order momentum and a second-order momentum;
and a local gradient term determining unit, used for determining the local gradient term of the current local iteration based on the first-order momentum and the second-order momentum.
Optionally, the local momentum parameter determining unit includes:
a stochastic gradient determining subunit, used for determining the stochastic gradient of the machine learning model at the current local iteration;
a first-order momentum determining subunit, used for determining the first-order momentum of the current local iteration based on the stochastic gradient and the first-order momentum of the previous local iteration;
and a second-order momentum determining subunit, used for determining the second-order momentum of the current local iteration based on the stochastic gradient and the second-order momentum of the previous local iteration.
Optionally, the second-order momentum determining subunit is configured to:
determine the current second-order momentum of the current local iteration based on the stochastic gradient and the second-order momentum of the previous local iteration;
and determine the maximum of the second-order momentum of the previous local iteration and the current second-order momentum as the second-order momentum of the current local iteration.
On the basis of the foregoing embodiment, optionally, the local parameter updating module 330 is configured to:
and performing a weighted fusion of the local gradient term and the correction gradient term based on preset weights to obtain the target gradient term of the current local iteration, where the weight of the correction gradient term is not zero.
On the basis of the foregoing embodiment, optionally, the global information obtaining module 310 is configured to:
and receiving the correction gradient term, the initial model parameters and the initial momentum parameters of the current global iteration process issued by the central server node.
The federated learning apparatus provided by this embodiment of the invention can execute the federated learning method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method.
Fig. 4 is a schematic structural diagram of a federated learning apparatus provided in an embodiment of the present invention, where the federated learning apparatus may be configured in a central server node device. As shown in fig. 4, the apparatus includes:
a local information receiving module 410, configured to receive, in any global iteration process, a model parameter change sent by each computing node;
a global update module 420, configured to globally update the model parameters based on the model parameter changes sent by each computing node, and to determine the correction gradient term for the next global iteration process;
and an information sending module 430, configured to send the updated model parameters and the correction gradient term for the next global iteration process to each computing node, where each computing node performs the next global iteration process.
On the basis of the foregoing embodiment, optionally, the information sending module 430 is configured to:
extracting a preset proportion of the computing nodes as the computing nodes of the next global iteration;
and sending the updated model parameters and the correction gradient term for the next global iteration process to the extracted computing nodes.
The federated learning apparatus provided by this embodiment of the invention can execute the federated learning method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method.
Fig. 5 is a schematic structural diagram of a federated learning system according to a third embodiment of the present invention. As shown in fig. 5, the system includes: a central server node 510 and a plurality of computing nodes 520,
the central server node 510 issues the correction gradient term, the global model parameters and the momentum parameters of the current global iteration process to each computing node;
the computing node 520 receives the correction gradient term, the global model parameters and the momentum parameters, and, in any local iteration process within the current global iteration process, determines the local gradient term of the machine learning model for the current local iteration; corrects the local gradient term of the current local iteration based on the correction gradient term, updates the model parameters of the machine learning model based on the corrected target gradient term, and executes the next local iteration process based on the updated model parameters; and, upon completion of the local iteration processes, sends the model parameter change and the local momentum parameters of the current global iteration process to the central server node 510;
and the central server node 510 determines the correction gradient term, the global model parameters and the global momentum parameters required by the next global iteration process based on the model parameter changes and the local momentum parameters of the current global iteration process, and issues them to each computing node, until the global iterations are completed and the trained machine learning model is obtained.
Before the first global iteration, the central server node sets the initial model parameters $x_0$ and the hyperparameters: the local learning rate $\eta_l$, the global learning rate $\eta_g$, the correction weight $\alpha$, the correction gradient term $g_{0,\alpha} = 0$, the number of devices per round $S$, and the momentum hyperparameters $\beta_1$ and $\beta_2$. The global iteration process is the synchronization process between the central server node and the computing nodes, while the local iteration process is the process in which each computing node trains the model locally. In each global iteration process, the central server node selects some computing nodes to carry out local computation, processes the collected model parameter changes and second-order momenta, and updates the stored state with the averaged model parameters and other quantities. For example, the second-order momentum for the next global iteration is obtained by $\hat{v}_{t+1} = \frac{1}{S}\sum_i \hat{v}_{t,K,i}$, the global model parameters for the next global iteration by $x_{t+1} = x_t - \frac{\eta_g}{S}\sum_i (x_{t,0,i} - x_{t,K,i})$, and the correction gradient term for the next global iteration by $g_{t+1,\alpha} = \frac{x_t - x_{t+1}}{\eta_g \eta_l K}$. The central server node then distributes the model parameters and other state to the computing nodes for the next round of computation.
In the local iteration process, the selected $S$ computing nodes compute their local iterations in parallel. The initial values are the model parameters, second-order momentum and correction gradient term distributed by the central server node in the previous round. Each computing node then uses its local data to compute a stochastic gradient. In the local iteration step, the AMSGrad method is adopted to update the model parameters. The gradient of the model parameters, $g_{t,k,i}$, is obtained from back-propagation of the loss function, where $t$ denotes the global synchronization round, $k$ denotes the local iteration round, and $i$ denotes the $i$-th computing node. From the gradient $g_{t,k,i}$ and the states $m_{t,k-1,i}$ and $v_{t,k-1,i}$ of the previous local iteration round, the current momentum $m_{t,k,i}$ and second-order momentum $v_{t,k,i}$ are calculated. The momentum and second-order momentum have the same dimension as the gradient, i.e. they belong to $\mathbb{R}^d$. The larger of the current second-order momentum $v_{t,k,i}$ and the second-order momentum $\hat{v}_{t,k-1,i}$ used in the previous iteration is adopted as the second-order momentum of the current iteration. The model parameters $x_{t,k,i}$ at step $k$ are obtained by updating the current $x_{t,k-1,i}$. After iterating for $K$ steps, the computing node uploads its current model parameters and other states to the central server node.
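For orientation only, the following sketch strings the earlier pieces together into one global round. It assumes the helper functions sketched above (`select_nodes`, `local_iteration_step`, `build_upload`, `server_global_update`) are in scope, and the `node.grad_fn` / `node.sample_batch()` interface plus the per-round re-initialization of the first-order momentum are assumptions, not details from the patent.

```python
import numpy as np

def run_round(x_t, g_corr, v_hat_global, nodes, K=10, fraction=0.1):
    """One global round: selected nodes run K corrected local iterations, then the
    server aggregates their uploads (sketch reusing the helpers defined earlier)."""
    uploads = []
    for node in select_nodes(nodes, fraction=fraction):
        x = x_t.copy()
        m = np.zeros_like(x_t)            # assumed: first-order momentum restarted each round
        v = v_hat_global.copy()
        v_hat = v_hat_global.copy()
        for _ in range(K):                # in practice the nodes run these K steps in parallel
            x, m, v, v_hat = local_iteration_step(
                x, m, v, v_hat, g_corr, node.grad_fn, node.sample_batch())
        uploads.append(build_upload(x_t, x, v_hat))
    return server_global_update(x_t, uploads, K=K)
```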
The learning process of the present embodiment, when (1) the continuity (smoothness) assumption, (2) the bounded-variance assumption, (3) the bounded-gradient assumption and (4) the bounded device gradient difference assumption are satisfied, and the global learning rate and the local learning rate are chosen appropriately, converges at a rate expressed in terms of S, K and T, where S is the number of selected working nodes, K is the number of local iterations, and T is the number of global rounds.
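For readers who want the four assumptions in symbols, the following are their standard textbook forms; the constants L, σ, G, D and the precise statements are illustrative and may differ from those used in the original analysis.

```latex
% Standard forms of the four assumptions (constants are illustrative).
\begin{align}
&\text{(1) } L\text{-smoothness: } \|\nabla F_i(x) - \nabla F_i(y)\| \le L\,\|x - y\| \quad \forall\, x, y;\\
&\text{(2) bounded variance: } \mathbb{E}\,\|\nabla f_i(x;\xi) - \nabla F_i(x)\|^2 \le \sigma^2;\\
&\text{(3) bounded gradient: } \|\nabla f_i(x;\xi)\| \le G;\\
&\text{(4) bounded device gradient difference: } \frac{1}{N}\sum_{i=1}^{N} \|\nabla F_i(x) - \nabla F(x)\|^2 \le D^2.
\end{align}
```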
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device 10 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 6, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from a storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 may also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
In some embodiments, the federated learning method may be implemented as a computer program that is tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When loaded into RAM 13 and executed by processor 11, the computer program may perform one or more of the steps of the federated learning method described above. Alternatively, in other embodiments, processor 11 may be configured to perform the federated learning method in any other suitable manner (e.g., by way of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the federated learning method of the present invention can be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
An embodiment of the present invention further provides a computer-readable storage medium storing computer instructions, where the computer instructions are used to enable a processor to execute a method for federated learning, the method comprising:
in any global iteration process, receiving a correction gradient term of the current global iteration process;
in any local iteration process within the current global iteration process, determining local momentum parameters of the machine learning model at the current local iteration;
updating the model parameters of the machine learning model based on the correction gradient term and the local momentum parameters of the current local iteration, and performing the next local iteration process based on the updated model parameters;
and when the local iteration process is completed, sending the model parameter change and the local momentum parameters of the current global iteration process to a central server node, wherein the central server node determines the correction gradient term required by the next global iteration process based on the model parameter change and the local momentum parameters of the current global iteration process.
And/or,
computer instructions for causing a processor to perform a method of federated learning, the method comprising:
in any global iteration process, receiving model parameter changes sent by each computing node;
globally updating the model parameters based on the model parameter changes sent by each computing node, and determining a correction gradient term for the next global iteration process;
and sending the updated model parameters and the correction gradient term of the next global iteration process to each computing node, wherein each computing node carries out the next global iteration process.
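Putting the compute-node and server roles together, a minimal sketch of one global round might look as follows; it reuses the hypothetical local_step and server_global_update helpers sketched earlier, and the node sampling, the data interface (sample_gradient) and the initial states are illustrative assumptions rather than details of the specification.

```python
import numpy as np

def run_global_round(x_global, v_global, g_corr, node_data, S=4, K=10,
                     eta_l=0.01, eta_g=1.0, alpha=0.1, seed=0):
    """One global round: sample S nodes, run K local steps each, then aggregate."""
    rng = np.random.default_rng(seed)
    selected = rng.choice(len(node_data), size=S, replace=False)
    deltas, v_locals = [], []
    for i in selected:
        # Each node starts from the state distributed by the central server node.
        x = x_global.copy()
        m = np.zeros_like(x_global)
        v = v_global.copy()
        v_hat = v_global.copy()
        for _ in range(K):
            g_stoch = node_data[i].sample_gradient(x)   # hypothetical data interface
            x, m, v, v_hat = local_step(x, m, v, v_hat, g_stoch, g_corr,
                                        eta_l=eta_l, alpha=alpha)
        deltas.append(x - x_global)      # model parameter change of this node
        v_locals.append(v_hat)           # local second-order momentum of this node
    # The server aggregates the uploads and prepares the next round's state.
    return server_global_update(x_global, deltas, v_locals,
                                eta_g=eta_g, K=K, eta_l=eta_l)
```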
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server (also called a cloud computing server or cloud host), a host product in a cloud computing service system, which overcomes the drawbacks of high management difficulty and weak service scalability found in traditional physical hosts and VPS services.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (13)
1. A method for federated learning, applied to a computing node, the method comprising:
in any global iteration process, receiving a correction gradient term of the current global iteration process;
in any local iteration process within the current global iteration process, determining a local gradient term of the machine learning model at the current local iteration;
correcting the local gradient term of the current local iteration based on the correction gradient term, updating the model parameters of the machine learning model based on the corrected target gradient term, and executing the next local iteration process based on the updated model parameters;
and when the local iteration process is completed, sending the model parameter change of the current global iteration process to a central server node, wherein the central server node determines a correction gradient term required by the next global iteration process based on the model parameter change of the current global iteration process.
2. The method of claim 1, wherein determining a local gradient term for the machine learning model at the current local iteration comprises:
determining local momentum parameters of a machine learning model at the current local iteration, wherein the local momentum parameters comprise first-order momentum and second-order momentum;
determining a local gradient term for the current local iteration based on the first and second order momentums.
3. The method of claim 2, wherein determining the local momentum parameters of the machine learning model at the current local iteration comprises:
determining the random gradient of the machine learning model in the current local iteration;
and determining the first-order momentum of the current local iteration based on the random gradient and the first-order momentum of the last local iteration, and determining the second-order momentum of the current local iteration based on the random gradient and the second-order momentum of the last local iteration.
4. The method of claim 3, wherein determining the second-order momentum for the current local iteration based on the random gradient and the second-order momentum for the previous local iteration comprises:
determining the current second-order momentum of the current local iteration based on the random gradient and the second-order momentum of the last local iteration;
and determining the maximum of the second-order momentum of the last local iteration and the current second-order momentum as the second-order momentum of the current local iteration.
5. The method of claim 1, wherein correcting the local gradient term of the current local iteration based on the correction gradient term comprises:
performing weighted fusion of the local gradient term and the correction gradient term based on a preset weight to obtain a target gradient term of the current local iteration, wherein the weight of the correction gradient term is not zero.
6. The method of claim 1, wherein receiving the correction gradient term of the current global iteration process comprises:
receiving the correction gradient term, the initial model parameters of the current global iteration process, and the initial momentum parameters issued by the central server node.
7. A method for federated learning, applied to a central server node, the method comprising:
in any global iteration process, receiving model parameter changes sent by each computing node;
globally updating the model parameters based on the model parameter changes sent by each computing node, and determining a correction gradient term for the next global iteration process;
and sending the updated model parameters and the correction gradient term of the next global iteration process to each computing node, wherein each computing node carries out the next global iteration process.
8. The method of claim 7, wherein sending the updated model parameters and the correction gradient term of the next global iteration process to each computing node comprises:
extracting a preset proportion of the computing nodes as the computing nodes of the next global iteration;
and sending the updated model parameters and the correction gradient term of the next global iteration process to the extracted computing nodes.
9. A federated learning apparatus, integrated in a computing node device, the apparatus comprising:
a global information acquisition module, configured to receive, in any global iteration process, the correction gradient term of the current global iteration process;
a local gradient term determination module, configured to determine, in any local iteration process within the current global iteration process, the local gradient term of the machine learning model at the current local iteration;
a local parameter updating module, configured to correct the local gradient term of the current local iteration based on the correction gradient term, update the model parameters of the machine learning model based on the corrected target gradient term, and execute the next local iteration process based on the updated model parameters;
and an information sending module, configured to send, when the local iteration process is completed, the model parameter change of the current global iteration process to the central server node, wherein the central server node determines the correction gradient term required by the next global iteration process based on the model parameter change of the current global iteration process.
10. A federated learning apparatus, integrated in a central server node device, the apparatus comprising:
a local information receiving module, configured to receive, in any global iteration process, the model parameter changes sent by each computing node;
a global updating module, configured to globally update the model parameters based on the model parameter changes sent by each computing node, and to determine a correction gradient term for the next global iteration process;
and an information sending module, configured to send the updated model parameters and the correction gradient term of the next global iteration process to each computing node, wherein each computing node carries out the next global iteration process.
11. A federated learning system is characterized by comprising a central server node and a plurality of computing nodes, wherein,
the central server node issues the correction gradient term, global model parameters and momentum parameters of the current global iteration process to each computing node;
the computing node receives the correction gradient term, the global model parameters and the momentum parameters, and, in any local iteration process of the current global iteration process, determines the local gradient term of the machine learning model at the current local iteration; corrects the local gradient term of the current local iteration based on the correction gradient term, updates the model parameters of the machine learning model based on the corrected target gradient term, and executes the next local iteration process based on the updated model parameters; and, when the local iteration process is completed, sends the model parameter change and the local momentum parameters of the current global iteration process to the central server node;
and the central server node determines the correction gradient term, global model parameters and global momentum parameters required by the next global iteration process based on the model parameter changes and local momentum parameters of the current global iteration process, and issues them to each computing node, until the global iteration is completed, so as to obtain a trained machine learning model.
12. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the federated learning method as claimed in any one of claims 1-6 and/or the federated learning method as claimed in any one of claims 7-8.
13. A computer-readable storage medium storing computer instructions, wherein the computer instructions, when executed, cause a processor to implement the federated learning method as claimed in any one of claims 1-6 and/or the federated learning method as claimed in any one of claims 7-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211468920.7A CN115907044A (en) | 2022-11-22 | 2022-11-22 | Method, device and system for federated learning, storage medium and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211468920.7A CN115907044A (en) | 2022-11-22 | 2022-11-22 | Method, device and system for federated learning, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115907044A true CN115907044A (en) | 2023-04-04 |
Family
ID=86470727
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211468920.7A Pending CN115907044A (en) | 2022-11-22 | 2022-11-22 | Method, device and system for federated learning, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115907044A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||