CN111723947A - Method and device for training federated learning model - Google Patents
- Publication number
- CN111723947A (application CN202010564409.1A)
- Authority
- CN
- China
- Prior art keywords
- model parameters
- iteration
- local
- kth
- global
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
Abstract
The invention discloses a method and a device for training a federated learning model. A client obtains the global model parameters of the (k-1)-th iteration broadcast by the server, where k is a positive integer, takes the global model parameters as local model parameters subject to a regularization constraint, and performs the k-th round of training on its local data to obtain the local model parameters of the k-th iteration. The regularization constraint is determined from the global model parameters of the server and the local model parameters of the client; it restrains the gradients during training, reducing the influence of extreme data on the training of the local model parameters and improving the accuracy of the local model parameters on non-independent and identically distributed (non-IID) data. The client then sends the local model parameters of the k-th iteration to the server, so that the server updates the global model parameters for the k-th iteration, likewise improving the accuracy of the global model parameters on non-IID data.
Description
Technical Field
The invention relates to the field of financial technology (Fintech), and in particular to a method and a device for training a federated learning model.
Background
With the development of computer technology, more and more technologies (such as blockchain, cloud computing and big data) are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology. Big data technology is no exception; however, the security and real-time requirements of the finance and payment industries place higher demands on it.
In the prior art, federated learning performs communication among nodes by transmitting parameters, and the information contributed by each node's data is integrated by parameter averaging during training. In each round, a node is selected at random, the global model is issued to that node, an iteration is performed on the node's data, and the resulting model parameters are sent back to the central server; the central server averages the model parameters trained by each node to determine the model for the next iteration.
However, in prior-art federated learning, a model trained on data that is not independent and identically distributed has low accuracy and performs poorly. A method is therefore needed to improve the accuracy of model parameters trained on non-independent and identically distributed (non-IID) data.
Disclosure of Invention
The embodiment of the invention provides a method and a device for training a federated learning model, which improve the accuracy of models trained on non-independent and identically distributed (non-IID) data.
In a first aspect, an embodiment of the present invention provides a method for training a federated learning model, including:
the client acquires the global model parameters of the (k-1)-th iteration broadcast by the server, where k is a positive integer;
the client takes the global model parameters as local model parameters subject to a regularization constraint and performs the k-th round of training on local data to obtain the local model parameters of the k-th iteration; the regularization constraint is determined from the global model parameters of the server and the local model parameters of the client;
and the client sends the local model parameters of the k-th iteration to the server, so that the server updates the global model parameters for the k-th iteration.
In this technical scheme, the client acquires the global model parameters of the (k-1)-th iteration (when k = 1, these are the initial global model parameters) and sets them, with a regularization constraint, as the local model parameters, so that the global and local model parameters remain linked by the constraint throughout the iterations. When the local model parameters are trained on local data, the constraint restrains the loss function and optimizes the gradient of the local model parameters, which reduces the influence of extreme data in the local data on the training result and improves the accuracy of the local model parameters on non-IID data. The local model parameters of the k-th iteration are then sent to the server so that the server can update the global model parameters of the k-th iteration, likewise improving the accuracy of the global model parameters on non-IID data.
Optionally, determining the regularization constraint from the global model parameters of the server and the local model parameters of the client includes:
computing the F-norm of the difference between the global model parameters and the local model parameters to obtain the regularization constraint.
In this technical scheme, the regularization constraint is obtained from the global model parameters and the local model parameters and is used to optimize the loss function of the local model parameters, improving their accuracy on non-IID data.
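As an illustrative sketch (not the patent's reference implementation), the F-norm constraint can be computed as follows. The use of the squared Frobenius norm is an assumption, chosen to be consistent with the gradient of the form (W^(i) - W^(0)) used in the later update step:

```python
import numpy as np

def regularization_constraint(w_global: np.ndarray, w_local: np.ndarray) -> float:
    # J_{T-S}(W^(i)) = ||W^(0) - W^(i)||_F^2 : squared Frobenius norm of the
    # difference between the global and local weight matrices (squaring is an
    # assumption consistent with the gradient used in the update formula).
    return float(np.linalg.norm(w_global - w_local, ord="fro") ** 2)
```

For identical global and local weights the constraint is zero, so it only penalizes the local model for drifting away from the broadcast global model.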
Optionally, the final loss function of the local model parameters is determined according to the following formula (1):

J~(W^(i)) = J(W^(i)) + β·J_{T-S}(W^(i))    (1)

where J~(W^(i)) is the final loss function of the local model parameters, J(W^(i)) is the original loss function of the local model parameters, J_{T-S}(W^(i)) is the regularization constraint, and β is the coefficient of the regularization constraint.
Optionally, the local model parameters of the k-th iteration are determined according to the following formula (2):

W_k^(i) = W_{k-1}^(i) - α_k·(∇J(W_{k-1}^(i)) + β·∇J_{T-S}(W_{k-1}^(i)))    (2)

where W_k^(i) is the local model parameter of the k-th iteration, W_{k-1}^(i) is the local model parameter of the (k-1)-th iteration, α_k is the learning rate of the k-th iteration, ∇J(W_{k-1}^(i)) is the gradient of the original loss function of the local model parameters, ∇J_{T-S}(W_{k-1}^(i)) is the gradient of the regularization constraint, and W_{k-1}^(0) is the global model parameter of the (k-1)-th iteration.
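A minimal sketch of the client-side update of formula (2), assuming the squared-F-norm constraint so that its gradient is proportional to (W^(i) - W^(0)); the function name and shapes are illustrative, not taken from the patent:

```python
import numpy as np

def local_update(w_local: np.ndarray, w_global: np.ndarray,
                 grad_loss: np.ndarray, lr: float, beta: float) -> np.ndarray:
    # Formula (2): one gradient step on the regularized (final) loss.
    # grad_loss is the gradient of the original local loss J(W^(i));
    # the constraint gradient assumes J_{T-S} = ||W^(0) - W^(i)||_F^2,
    # whose gradient w.r.t. W^(i) is 2*(W^(i) - W^(0)).
    grad_reg = 2.0 * (w_local - w_global)
    return w_local - lr * (grad_loss + beta * grad_reg)
```

When the local weights equal the global weights (as at the start of a round), the constraint contributes nothing; it only pulls the local model back as it drifts during training.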
In a second aspect, an embodiment of the present invention provides a method for training a federated learning model, including:
the server obtains the local model parameters of the k-th iteration sent by a plurality of clients; the local model parameters of the k-th iteration are obtained by each client training, on its local data, the global model parameters of the (k-1)-th iteration used as local model parameters subject to a regularization constraint; the regularization constraint is determined by the client from the global model parameters of the server and the local model parameters of the client;
the server determines the global model parameters of the k-th iteration from the local model parameters of the k-th iteration and the global model parameters of the (k-1)-th iteration;
and the server broadcasts the global model parameters of the k-th iteration to the plurality of clients, so that the plurality of clients perform the (k+1)-th round of training.
In this technical scheme, the server obtains the local model parameters of the k-th iteration sent by the plurality of clients and determines the global model parameters of the k-th iteration. Because training on local data improves the accuracy of the local model parameters on non-IID data, updating the global model parameters from those local model parameters likewise improves the accuracy of the global model parameters on non-IID data. The global model parameters of the k-th iteration are then broadcast to the plurality of clients for the next iteration.
Optionally, the global model parameters of the k-th iteration are determined according to the following formula (3):

W_k^(0) = W_{k-1}^(0) + α_k·Σ_i (W_k^(i) - W_{k-1}^(0))    (3)

where W_k^(0) is the global model parameter of the k-th iteration, W_{k-1}^(0) is the global model parameter of the (k-1)-th iteration, α_k is the learning rate of the k-th iteration, and W_k^(i) is the local model parameter of the k-th iteration of the i-th client.
In a third aspect, an embodiment of the present invention provides a training apparatus for a federated learning model, including:
an acquisition module, configured to acquire the global model parameters of the (k-1)-th iteration broadcast by the server, where k is a positive integer;
a processing module, configured to take the global model parameters as local model parameters subject to a regularization constraint and perform the k-th round of training on local data to obtain the local model parameters of the k-th iteration, the regularization constraint being determined from the global model parameters of the server and the local model parameters of the client;
and to send the local model parameters of the k-th iteration to the server, so that the server updates the global model parameters for the k-th iteration.
Optionally, the processing module is specifically configured to:
compute the F-norm of the difference between the global model parameters and the local model parameters to obtain the regularization constraint.
Optionally, the processing module is specifically configured to:
determine the final loss function of the local model parameters according to the following formula (1):

J~(W^(i)) = J(W^(i)) + β·J_{T-S}(W^(i))    (1)

where J~(W^(i)) is the final loss function of the local model parameters, J(W^(i)) is the original loss function of the local model parameters, J_{T-S}(W^(i)) is the regularization constraint, and β is the coefficient of the regularization constraint.
Optionally, the processing module is specifically configured to:
determine the local model parameters of the k-th iteration according to the following formula (2):

W_k^(i) = W_{k-1}^(i) - α_k·(∇J(W_{k-1}^(i)) + β·∇J_{T-S}(W_{k-1}^(i)))    (2)

where W_k^(i) is the local model parameter of the k-th iteration, W_{k-1}^(i) is the local model parameter of the (k-1)-th iteration, α_k is the learning rate of the k-th iteration, ∇J(W_{k-1}^(i)) is the gradient of the original loss function of the local model parameters, ∇J_{T-S}(W_{k-1}^(i)) is the gradient of the regularization constraint, and W_{k-1}^(0) is the global model parameter of the (k-1)-th iteration.
In a fourth aspect, an embodiment of the present invention provides a training apparatus for a federated learning model, including:
an acquisition unit, configured to acquire the local model parameters of the k-th iteration sent by a plurality of clients; the local model parameters of the k-th iteration are obtained by each client training, on its local data, the global model parameters of the (k-1)-th iteration used as local model parameters subject to a regularization constraint; the regularization constraint is determined by the client from the global model parameters of the server and the local model parameters of the client;
a processing unit, configured to determine the global model parameters of the k-th iteration from the local model parameters of the k-th iteration and the global model parameters of the (k-1)-th iteration;
and to broadcast the global model parameters of the k-th iteration to the plurality of clients, so that the plurality of clients perform the (k+1)-th round of training.
Optionally, the processing unit is specifically configured to:
determine the global model parameters of the k-th iteration according to the following formula (3):

W_k^(0) = W_{k-1}^(0) + α_k·Σ_i (W_k^(i) - W_{k-1}^(0))    (3)

where W_k^(0) is the global model parameter of the k-th iteration, W_{k-1}^(0) is the global model parameter of the (k-1)-th iteration, α_k is the learning rate of the k-th iteration, and W_k^(i) is the local model parameter of the k-th iteration of the i-th client.
In a fifth aspect, an embodiment of the present invention further provides a computing device, including:
a memory for storing program instructions;
and the processor is used for calling the program instructions stored in the memory and executing the training method of the federal learning model according to the obtained program.
In a sixth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where computer-executable instructions are stored, and the computer-executable instructions are configured to cause a computer to execute the above method for training the federal learning model.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a system architecture diagram according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a method for training a federated learning model according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a method for training a federated learning model according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a device for training a federated learning model according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a training device of a federated learning model according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 illustrates an exemplary system architecture to which an embodiment of the present invention is applicable, which includes a server 100 and a client 200.
The server 100 is configured to connect with the client 200 and send the global model parameters of the (k-1)-th iteration to the client 200. It should be noted that fig. 1 illustrates only one exemplary client 200; in practice there may be multiple clients 200, which is not limited herein.
The client 200 is configured to obtain the global model parameters of the (k-1)-th iteration sent by the server 100, take them as local model parameters subject to the regularization constraint, train on local data to obtain the local model parameters of the k-th iteration, and send the local model parameters of the k-th iteration to the server 100, so that the server 100 updates the global model parameters of the k-th iteration.
It should be noted that the structure shown in fig. 1 is only an example, and the embodiment of the present invention is not limited thereto.
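The round-trip between client 200 and server 100 can be sketched end to end as follows. The per-client least-squares loss, the single local step, and all names are illustrative assumptions for the sake of a runnable example; the patent does not prescribe a particular loss:

```python
import numpy as np

def federated_round(w_global, clients, lr=0.1, beta=0.1):
    # One round of the scheme in fig. 1: each client takes one regularized
    # local step (formula (2)) on a toy least-squares loss, then the server
    # sums the client deltas and steps along them (formula (3)).
    local_ws = []
    for X, y in clients:
        w = w_global.copy()
        grad_loss = 2.0 * X.T @ (X @ w - y) / len(y)  # d/dw mean ||Xw - y||^2
        grad_reg = 2.0 * beta * (w - w_global)        # F-norm constraint gradient
        local_ws.append(w - lr * (grad_loss + grad_reg))
    # Server side: W_k^(0) = W_{k-1}^(0) + lr * sum_i (W_k^(i) - W_{k-1}^(0))
    return w_global + lr * sum(w_i - w_global for w_i in local_ws)
```

Since each client starts the round from the broadcast global weights, the constraint gradient is zero on the first local step and only matters when clients take several local steps per round.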
Based on the above description, fig. 2 exemplarily shows a flow of a method for training a federal learning model according to an embodiment of the present invention, where the flow may be performed by a training apparatus of the federal learning model.
As shown in fig. 2, the process specifically includes:
In the embodiment of the invention, before acquiring the global model parameters of the (k-1)-th iteration broadcast by the server, the client sends the local model parameters of the (k-1)-th iteration to the server, so that the server can update the global model parameters of the (k-1)-th iteration and then broadcast them.
According to the embodiment of the invention, after the client acquires the global model parameters when k = 1, it takes the acquired global model parameters as the local model parameters and sets the regularization constraint, so that until the model converges the global and local model parameters remain linked by the constraint. The final loss function of the local model parameters is thereby obtained, and the influence of the client's extreme data on model training is reduced.
Further, determining the regularization constraint from the global model parameters of the server and the local model parameters of the client includes: computing the F-norm of the difference between the global model parameters and the local model parameters to obtain the regularization constraint.
The regularization constraint is computed from the weights in the global model parameters sent by the server and the weights in the local model parameters of the client. The F-norm (Frobenius norm) is a matrix norm equal to the square root of the sum of the squares of all elements of the matrix. Specifically, the regularization constraint is determined according to the following formula (4):

J_{T-S}(W^(i)) = ||W^(0) - W^(i)||_F^2    (4)

where J_{T-S}(W^(i)) is the regularization constraint, W^(0) is the weight in the global model parameters, and W^(i) is the weight in the local model parameters.
Further, the final loss function of the local model parameters is determined according to the following formula (1):

J~(W^(i)) = J(W^(i)) + β·J_{T-S}(W^(i))    (1)

where J~(W^(i)) is the final loss function of the local model, J(W^(i)) is the original loss function of the local model parameters, J_{T-S}(W^(i)) is the regularization constraint, and β is the coefficient of the regularization constraint.
After the client obtains the global model parameters of the (k-1)-th iteration, it computes the regularization constraint, takes the global model parameters, subject to the regularization constraint, as the local model parameters, and then performs model training to obtain the final loss function of the local model parameters and, from it, the updated local model parameters.
Under this algorithm, extreme data in the local distribution has a large effect on the original loss function J(W^(i)) of the local model parameters; however, by the regularization constraint, when W^(0) is unchanged and W^(i) drifts away from it, the constraint term J_{T-S}(W^(i)) grows and penalizes the drift, so the final loss function of the local model is pushed toward a balance and the influence of the extreme data on it is reduced.
The local model parameters of the k-th iteration are determined according to the following formula (2):

W_k^(i) = W_{k-1}^(i) - α_k·(∇J(W_{k-1}^(i)) + β·∇J_{T-S}(W_{k-1}^(i)))    (2)

where W_k^(i) is the local model parameter of the k-th iteration, W_{k-1}^(i) is the local model parameter of the (k-1)-th iteration, α_k is the learning rate of the k-th iteration, ∇J(W_{k-1}^(i)) is the gradient of the original loss function of the local model parameters, ∇J_{T-S}(W_{k-1}^(i)) is the gradient of the regularization constraint, and W_{k-1}^(0) is the global model parameter of the (k-1)-th iteration.
The updated local model parameters of the k-th iteration are obtained by differentiating the final loss function and taking a gradient step from the local model parameters of the (k-1)-th iteration.
To better explain the above technical solution, a specific example follows.
Example 1
Non-independent and identically distributed data is initialized as the training set for model training in either of the following two ways.
1. Sort the data by label and divide it into multiple shares (for example ten), each client holding data for only a few labels (for example two). With labels 1-10, client 1 might train on labels 1 and 8 and client 2 on labels 9 and 7, so that no client's data can serve as a representative of the global data distribution.
2. Divide a reference data set into ten unequal parts, so that the amount of data on each client differs greatly and no client's data can serve as a representative of the global data distribution.
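Mode 1 above (label-sorted shards) can be sketched as follows; the shard counts and function name are illustrative assumptions:

```python
import numpy as np

def label_sorted_partition(labels, n_clients=10, shards_per_client=2, seed=0):
    # Sort sample indices by label, cut them into equal shards, and hand each
    # client a couple of shards, so every client sees only a few labels and is
    # not representative of the global label distribution.
    order = np.argsort(labels, kind="stable")
    shards = np.array_split(order, n_clients * shards_per_client)
    rng = np.random.default_rng(seed)
    shard_ids = rng.permutation(n_clients * shards_per_client)
    return [
        np.concatenate([shards[s] for s in
                        shard_ids[c * shards_per_client:(c + 1) * shards_per_client]])
        for c in range(n_clients)
    ]
```

With 100 samples over 10 labels and two shards per client, each client ends up seeing at most two distinct labels, which is the non-IID setting the example describes.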
Select 10 clients to perform model training with the above data. Each of the 10 clients obtains the global model parameter W_{k-1}^(0) of the (k-1)-th iteration broadcast by the server and obtains the local model parameters of the k-th iteration by minimizing the final loss function:

W_k^(i) = W_{k-1}^(i) - α_k·(∇J(W_{k-1}^(i)) + β·∇J_{T-S}(W_{k-1}^(i)))

where W_k^(i) is the local model parameter of the k-th iteration of the i-th client, i ∈ {1, 2, ..., 10}, and α_k is the learning rate of the k-th iteration. The local model parameters of the k-th iteration of any single client cannot serve as a representative of the global model parameters.
In the embodiment of the invention, the plurality of clients send the local model parameters of the k-th iteration obtained after training to the server, so that the server obtains local model parameters of the k-th iteration corresponding to the whole data set, updates the global model parameters of the k-th iteration, and broadcasts them in the next round.
In the embodiment of the invention, the acquired global model parameters of the (k-1)-th iteration are set with a regularization constraint and used as the local model parameters, so that when the local model parameters are trained, the influence of extreme data in the local data on the training result is reduced and the accuracy of the local model parameters on non-IID data is improved. The local model parameters of the k-th iteration are then sent to the server, improving the accuracy of the global model parameters on non-IID data as well.
Fig. 3 exemplarily shows the flow of a method for training a federated learning model according to an embodiment of the present invention.
As shown in fig. 3, the specific process includes:
Step 301: the server obtains the local model parameters of the k-th iteration sent by a plurality of clients, where the local model parameters are subject to the regularization constraint.
Step 302: the server determines the global model parameters of the k-th iteration from the local model parameters of the k-th iteration and the global model parameters of the (k-1)-th iteration.
In the embodiment of the invention, the server computes the difference between each client's local model parameters of the k-th iteration and the global model parameters of the (k-1)-th iteration, sums these differences, and obtains the global model parameters of the k-th iteration from the learning rate and the global model parameters of the (k-1)-th iteration.
further, according to the following formula (3), determining global model parameters of the kth iteration;
wherein, Wk (0)The global model parameter, W, for the kth iterationk-1 (0)α for the global model parameters of the k-1 th iterationkLearning rate for the kth iteration, Wk (i)Local model parameters for the kth iteration for the ith client.
Compared with the traditional mean-value aggregation in federated learning, the embodiment of the invention sums the differences between the local model parameters of the k-th iteration of all clients and the global model parameters of the (k-1)-th iteration, and adds this sum, scaled by the learning rate, to the global model parameters of the (k-1)-th iteration as the updated global model parameters of the k-th iteration.
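The server-side step of formula (3) can be sketched as follows; the function name is illustrative:

```python
import numpy as np

def server_update(w_global_prev, local_ws, lr):
    # Formula (3): instead of averaging the local models (classic federated
    # averaging), sum each client's delta from the previous global model and
    # take that sum, scaled by the learning rate, as the global update step.
    delta = sum(w - w_global_prev for w in local_ws)
    return w_global_prev + lr * delta
```

With lr = 1 / n_clients this step coincides with plain averaging of the local models, so the learning rate controls how aggressively the summed deltas are applied.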
The above technical solution is described in the following specific example, continuing Example 1 of fig. 2 above.
Example 2
Obtain the local model parameters of the k-th iteration sent by the 10 clients, then sum the differences between each client's local model parameters of the k-th iteration and the global model parameters of the (k-1)-th iteration:

Δ_k = Σ_{i=1}^{10} (W_k^(i) - W_{k-1}^(0))

then multiply the sum by the learning rate and add it to the global model parameters of the (k-1)-th iteration, obtaining the global model parameters of the k-th iteration:

W_k^(0) = W_{k-1}^(0) + α_k·Δ_k
According to the embodiment of the invention, the server broadcasts the global model parameters of the k-th iteration to the plurality of clients; the clients do not need to reset the regularization constraint and directly perform the (k+1)-th round of training, and the server then obtains the local model parameters of the (k+1)-th iteration sent by the clients and updates the global model parameters of the (k+1)-th iteration.
Based on the same technical concept, fig. 4 exemplarily shows the structure of a training apparatus for a federated learning model provided in an embodiment of the present invention, and the apparatus may execute the flow of the training method for the federated learning model in fig. 2.
As shown in fig. 4, the apparatus specifically includes:
an obtaining module 401, configured to obtain global model parameters of a k-1 st iteration broadcasted by a server; k is a positive integer;
a processing module 402, configured to use the global model parameter as a parameter of a local model with regularization constraints, and perform a kth iterative training using local data to obtain a local model parameter of the kth iterative training; the regularization constraint is determined according to global model parameters of the server and local model parameters of the client;
and to send the local model parameters of the k-th iteration to the server, so that the server updates the global model parameters for the k-th iteration.
Optionally, the processing module 402 is specifically configured to:
and F norm calculation is carried out on the difference value of the global model parameter and the local model parameter to obtain the regularization constraint.
Optionally, the processing module 402 is specifically configured to:
determine the final loss function of the local model parameters according to the following formula (1):

J~(W^(i)) = J(W^(i)) + β·J_{T-S}(W^(i))    (1)

where J~(W^(i)) is the final loss function of the local model parameters, J(W^(i)) is the original loss function of the local model parameters, J_{T-S}(W^(i)) is the regularization constraint, and β is the coefficient of the regularization constraint.
Optionally, the processing module 402 is specifically configured to:
determine the local model parameters of the k-th iteration according to the following formula (2):

W_k^(i) = W_{k-1}^(i) - α_k·(∇J(W_{k-1}^(i)) + β·∇J_{T-S}(W_{k-1}^(i)))    (2)

where W_k^(i) is the local model parameter of the k-th iteration, W_{k-1}^(i) is the local model parameter of the (k-1)-th iteration, α_k is the learning rate of the k-th iteration, ∇J(W_{k-1}^(i)) is the gradient of the original loss function of the local model parameters, ∇J_{T-S}(W_{k-1}^(i)) is the gradient of the regularization constraint, and W_{k-1}^(0) is the global model parameter of the (k-1)-th iteration.
Fig. 5 exemplarily shows a structure of a training apparatus for a federal learning model according to an embodiment of the present invention, which may execute a flow of a training method for the federal learning model in fig. 3. As shown in fig. 5, the apparatus specifically includes:
an obtaining unit 501, configured to obtain the local model parameters of the k-th iteration sent by a plurality of clients; the local model parameters of the k-th iteration are obtained by each client training, on its local data, the global model parameters of the (k-1)-th iteration used as local model parameters subject to a regularization constraint; the regularization constraint is determined by the client from the global model parameters of the server and the local model parameters of the client;
a processing unit 502, configured to determine the global model parameters of the k-th iteration from the local model parameters of the k-th iteration and the global model parameters of the (k-1)-th iteration;
and to broadcast the global model parameters of the k-th iteration to the plurality of clients, so that the plurality of clients perform the (k+1)-th round of training.
Optionally, the processing unit 502 is specifically configured to:
determine the global model parameters of the k-th iteration according to the following formula (3):

W_k^(0) = W_{k-1}^(0) + α_k·Σ_i (W_k^(i) - W_{k-1}^(0))    (3)

where W_k^(0) is the global model parameter of the k-th iteration, W_{k-1}^(0) is the global model parameter of the (k-1)-th iteration, α_k is the learning rate of the k-th iteration, and W_k^(i) is the local model parameter of the k-th iteration of the i-th client.
Based on the same technical concept, an embodiment of the present invention further provides a computing device, including:
a memory for storing program instructions;
and the processor is used for calling the program instructions stored in the memory and executing the training method of the federal learning model according to the obtained program.
Based on the same technical concept, an embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to cause a computer to execute the above-mentioned training method of the federated learning model.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
Claims (10)
1. A method for training a federated learning model is characterized by comprising the following steps:
the client acquires global model parameters of the (k-1) th iteration broadcasted by the server; k is a positive integer;
the client initializes local model parameters with the global model parameters, applies the regularization constraint, and performs the kth iterative training using local data to obtain the local model parameters of the kth iterative training; the regularization constraint is determined according to the global model parameters of the server and the local model parameters of the client;
and the client sends the local model parameters of the kth iterative training to the server, so that the server updates the global model parameters of the kth iteration.
2. The method of claim 1, wherein the regularization constraint determined based on the global model parameters of the server and the local model parameters of the client comprises:
performing an F-norm (Frobenius norm) calculation on the difference between the global model parameters and the local model parameters to obtain the regularization constraint.
3. The method of claim 1, wherein a final loss function for the local model parameters is determined according to equation (1) below;
4. The method of claim 1, wherein the local model parameters for the kth iterative training are determined according to the following formula (2);
where W_k^(i) is the local model parameter of the kth iteration, W_{k-1}^(i) is the local model parameter of the (k-1)th iteration, α_k is the learning rate of the kth iteration, the two gradient terms are the gradient of the original loss function with respect to the local model parameters and the gradient of the regularization constraint, respectively, and W_{k-1}^(0) is the global model parameter of the (k-1)th iteration.
5. A method for training a federated learning model is characterized by comprising the following steps:
the server obtains the local model parameters of the kth iteration sent by a plurality of clients; the local model parameters of the kth iteration are obtained by each client training, with the regularization constraint, starting from the global model parameters of the (k-1)th iteration; the regularization constraint is determined by the client according to the global model parameters of the server and the local model parameters of the client;
the server determines the global model parameters of the kth iteration according to the local model parameters of the kth iteration and the global model parameters of the (k-1)th iteration;
and the server broadcasts the global model parameters of the kth iteration to the plurality of clients, so that the plurality of clients perform the (k+1)th iterative training.
6. The method of claim 5, wherein the global model parameters for the kth iteration are determined according to the following equation (3);
where W_k^(0) is the global model parameter of the kth iteration, W_{k-1}^(0) is the global model parameter of the (k-1)th iteration, α_k is the learning rate of the kth iteration, and W_k^(i) is the local model parameter of the kth iteration of the ith client.
7. A training apparatus for a federated learning model, characterized by comprising:
an acquisition module, configured to acquire the global model parameters of the (k-1)th iteration broadcasted by the server; k is a positive integer;
a processing module, configured to initialize local model parameters with the global model parameters, apply the regularization constraint, and perform the kth iterative training using local data to obtain the local model parameters of the kth iterative training; the regularization constraint is determined according to the global model parameters of the server and the local model parameters of the client;
and to send the local model parameters of the kth iterative training to the server, so that the server updates the global model parameters of the kth iteration.
8. A training apparatus for a federated learning model, characterized by comprising:
an obtaining unit, configured to obtain the local model parameters of the kth iteration sent by a plurality of clients; the local model parameters of the kth iteration are obtained by each client training, with the regularization constraint, starting from the global model parameters of the (k-1)th iteration; the regularization constraint is determined by the client according to the global model parameters of the server and the local model parameters of the client;
a processing unit, configured to determine the global model parameters of the kth iteration according to the local model parameters of the kth iteration and the global model parameters of the (k-1)th iteration;
and to broadcast the global model parameters of the kth iteration to the plurality of clients, so that the plurality of clients perform the (k+1)th iterative training.
9. A computing device, comprising:
a memory for storing program instructions;
a processor for calling program instructions stored in said memory to perform the method of any one of claims 1 to 4 or 5 to 6 in accordance with the obtained program.
10. A computer-readable storage medium having stored thereon computer-executable instructions for causing a computer to perform the method of any one of claims 1 to 4 or 5 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010564409.1A CN111723947A (en) | 2020-06-19 | 2020-06-19 | Method and device for training federated learning model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111723947A true CN111723947A (en) | 2020-09-29 |
Family
ID=72567617
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010564409.1A Pending CN111723947A (en) | 2020-06-19 | 2020-06-19 | Method and device for training federated learning model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111723947A (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112348063A (en) * | 2020-10-27 | 2021-02-09 | 广东电网有限责任公司电力调度控制中心 | Model training method and device based on federal transfer learning in Internet of things |
CN112348063B (en) * | 2020-10-27 | 2024-06-11 | 广东电网有限责任公司电力调度控制中心 | Model training method and device based on federal migration learning in Internet of things |
CN112532451A (en) * | 2020-11-30 | 2021-03-19 | 安徽工业大学 | Layered federal learning method and device based on asynchronous communication, terminal equipment and storage medium |
CN112668128A (en) * | 2020-12-21 | 2021-04-16 | 国网辽宁省电力有限公司物资分公司 | Method and device for selecting terminal equipment nodes in federated learning system |
CN112668128B (en) * | 2020-12-21 | 2024-05-28 | 国网辽宁省电力有限公司物资分公司 | Method and device for selecting terminal equipment nodes in federal learning system |
CN112906911A (en) * | 2021-02-03 | 2021-06-04 | 厦门大学 | Model training method for federal learning |
CN112906911B (en) * | 2021-02-03 | 2022-07-01 | 厦门大学 | Model training method for federal learning |
CN113139662B (en) * | 2021-04-23 | 2023-07-14 | 深圳市大数据研究院 | Global and local gradient processing method, device, equipment and medium for federal learning |
CN113139662A (en) * | 2021-04-23 | 2021-07-20 | 深圳市大数据研究院 | Global and local gradient processing method, device, equipment and medium for federal learning |
CN113095513A (en) * | 2021-04-25 | 2021-07-09 | 中山大学 | Double-layer fair federal learning method, device and storage medium |
CN113378994A (en) * | 2021-07-09 | 2021-09-10 | 浙江大学 | Image identification method, device, equipment and computer readable storage medium |
CN113837399B (en) * | 2021-10-26 | 2023-05-30 | 医渡云(北京)技术有限公司 | Training method, device, system, storage medium and equipment for federal learning model |
CN113837399A (en) * | 2021-10-26 | 2021-12-24 | 医渡云(北京)技术有限公司 | Federal learning model training method, device, system, storage medium and equipment |
WO2024022082A1 (en) * | 2022-07-29 | 2024-02-01 | 脸萌有限公司 | Information classification method and apparatus, device, and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111723947A (en) | Method and device for training federated learning model | |
CN112287982A (en) | Data prediction method and device and terminal equipment | |
CN110659678B (en) | User behavior classification method, system and storage medium | |
CN109889397B (en) | Lottery method, block generation method, equipment and storage medium | |
CN106325756B (en) | Data storage method, data calculation method and equipment | |
CN110689136B (en) | Deep learning model obtaining method, device, equipment and storage medium | |
CN111461164B (en) | Sample data set capacity expansion method and model training method | |
CN111695696A (en) | Method and device for model training based on federal learning | |
CN104008420A (en) | Distributed outlier detection method and system based on automatic coding machine | |
CN112100450A (en) | Graph calculation data segmentation method, terminal device and storage medium | |
CN111831855A (en) | Method, apparatus, electronic device, and medium for matching videos | |
CN109344268A (en) | Method, electronic equipment and the computer readable storage medium of graphic data base write-in | |
CN114138231B (en) | Method, circuit and SOC for executing matrix multiplication operation | |
CN111626311B (en) | Heterogeneous graph data processing method and device | |
CN110851247A (en) | Cost optimization scheduling method for constrained cloud workflow | |
CN111667018B (en) | Object clustering method and device, computer readable medium and electronic equipment | |
CN113011911B (en) | Data prediction method and device based on artificial intelligence, medium and electronic equipment | |
CN114332550A (en) | Model training method, system, storage medium and terminal equipment | |
CN106844024A (en) | The GPU/CPU dispatching methods and system of a kind of self study run time forecast model | |
CN114972695B (en) | Point cloud generation method and device, electronic equipment and storage medium | |
CN114092162B (en) | Recommendation quality determination method, and training method and device of recommendation quality determination model | |
CN109981726A (en) | A kind of distribution method of memory node, server and system | |
CN111984842B (en) | Bank customer data processing method and device | |
CN113656187A (en) | Public security big data computing power service system based on 5G | |
CN114492844A (en) | Method and device for constructing machine learning workflow, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||