CN111553484A - Method, device and system for federated learning

Method, device and system for federated learning

Info

Publication number
CN111553484A
Authority
CN
China
Prior art keywords
gradient
client
global
model
updated
Prior art date
Legal status
Granted
Application number
CN202010370086.2A
Other languages
Chinese (zh)
Other versions
CN111553484B (en)
Inventor
岑园园
孟丹
李宏宇
李晓林
Current Assignee
Tongdun Holdings Co Ltd
Original Assignee
Tongdun Holdings Co Ltd
Priority date
Filing date
Publication date
Application filed by Tongdun Holdings Co Ltd
Priority to CN202010370086.2A
Publication of CN111553484A
Application granted
Publication of CN111553484B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The system comprises a federated server and clients. The federated server uniformly sends an initial model to each client, and the client receives the initial model uniformly sent by the federated server and then trains the model based on its local data to obtain an updated gradient; the client sends the updated gradient to the federated server; after receiving the updated gradients sent by the clients, the federated server aggregates them to obtain a global update gradient and performs singular value decomposition on the global update gradient; and the client receives the singular-value-decomposed global update gradient sent by the federated server, calculates the global update gradient from it, and continues to train the model according to the global update gradient. The method, device and system aim to solve at least one of the problems of high network transmission overhead and data insecurity in conventional federated learning systems.

Description

Method, device and system for federated learning
Technical Field
The application relates to the technical field of machine learning, in particular to a method, a device and a system for federated learning.
Background
Federated learning is a learning paradigm in which data is distributed among different entities. In a federated learning system, data is distributed across different clients, and the federated server and the clients initialize the same model (e.g., a neural network model) with the same initial model parameters. Each client trains on its local data set to obtain the gradient of the model update (the gradient of the model parameters) and then sends that gradient to the federated server; the federated server collects the updated gradients of all client entities, aggregates them (by averaging, summing, etc.), and returns the resulting global gradient to each client entity so that each client entity can continue training its model.
In the process of applying the federated learning system, the inventors found the following problems: network transmission is required between the clients and the server, and as the number of clients keeps increasing, the overhead of this network transmission becomes a bottleneck for improving system performance. In addition, because the federated server collects the updated gradient of each client entity, this updated gradient is visible to the server, which creates a security problem: according to existing studies, the original data can be reconstructed from the gradient, causing leakage of the original data held by the client entities.
Disclosure of Invention
The main purpose of the present application is to provide a method, an apparatus, and a system for federated learning, so as to solve at least one of the problems of high network transmission overhead and unsafe data in the existing federated learning system.
To achieve the above object, according to a first aspect of the present application, a method of federated learning is provided.
The method of federal learning according to the present application includes:
implementing, with the federated server, training of the model based on a compressed update gradient.
Optionally, the compressed update gradient is specifically a first global update gradient, and implementing, with the federated server, training of the model based on the compressed update gradient includes:
the client receives an initial model uniformly sent by the federal server, obtains an updated gradient based on local data and initial model training, and sends the updated gradient to the federal server;
and the client receives a first global update gradient sent by the federated server, and continues to train the model according to the first global update gradient, wherein the first global update gradient is obtained by the federated server sequentially performing aggregation processing and singular value decomposition on the update gradients sent by the clients.
Optionally, the compressed update gradient is specifically a second global update gradient, and implementing, with the federated server, training of the model based on the compressed update gradient includes:
the client receives an initial model uniformly sent by the federal server, obtains an updated gradient based on local data and initial model training, and sends the updated gradient of the last k layers of the initial model to the federal server, wherein the initial model is of a p-layer structure, and k is more than or equal to 1 and is less than p;
the client receives the second global update gradient sent by the federal server, and calculates global update gradients of other layers layer by layer according to the second global update gradient, wherein the second global update gradient is obtained by aggregating the gradients of the last k layers of updates sent by each client by the federal server;
and the client side continues to train the model according to the second global updating gradient and the global updating gradients of other layers.
Optionally, the continuing training of the model according to the first global update gradient includes:
and after matrix recovery processing is carried out on the matrix of the first global update gradient, model training is continued.
Optionally, after continuing the training of the model according to the second global update gradient and the global update gradients of the other layers, the method further includes:
counting the number of consecutive times that the updated gradient of the last k layers of the initial model has been sent to the federated server, where each round of model training counts as one time;
if the number of consecutive times reaches a preset number, sending, once, the updated gradients of all layers obtained after the latest round of model training to the federated server, so that the federated server sequentially performs aggregation processing and singular value decomposition on the updated gradients of all layers sent by each client to obtain a third global update gradient;
and receiving the third global updating gradient, and continuing the training of the model according to the third global updating gradient.
To achieve the above object, according to a second aspect of the present application, there is also provided a method of federated learning.
The method of federal learning according to the present application includes:
implementing, with each client, training of the model based on a compressed update gradient.
Optionally, the compressed update gradient is specifically a first global update gradient, and implementing, with each client, training of the model based on the compressed update gradient includes:
the federal server uniformly sends the initial model to each client so that each client can obtain an updated gradient after training based on local data and the initial model;
receiving respective updated gradients sent by each client, and sequentially performing aggregation processing and singular value decomposition on the respective updated gradients sent by each client to obtain the first global update gradient;
and returning the first global updating gradient to each client so that each client can continue to train the model according to the first global updating gradient.
Optionally, the compressed update gradient is specifically a second global update gradient, and implementing, with each client, training of the model based on the compressed update gradient includes:
the federal server uniformly sends the initial model to each client so that each client can obtain an updated gradient after training based on local data and the initial model, wherein the initial model is of a p-layer structure;
the federal server receives the updated gradient of the last k layers of the initial models sent by the clients, wherein k is more than or equal to 1 and less than p;
aggregating the gradients of the last k-layer updates sent by each client to obtain a second global update gradient;
and sending the second global updating gradient to each client so that each client can calculate the global updating gradient of other layers according to the second global updating gradient layer by layer.
Optionally, the singular value decomposition includes:
performing singular value decomposition on the gradient matrix obtained after aggregating the updated gradients sent by each client, to obtain a matrix formed by the left singular vectors, a diagonal matrix and a matrix formed by the right singular vectors;
selecting the first n non-zero singular values according to the descending order of the non-zero singular values on the diagonal of the diagonal matrix;
and compressing the matrix formed by the left singular vectors, the diagonal matrix and the matrix formed by the right singular vectors according to the first n non-zero singular values respectively, and combining the compressed matrices to obtain the first global update gradient.
Optionally, after receiving the gradient of the last k-layer update sent by each client for a preset number of consecutive times, the method further includes:
receiving updated gradients of all layers sent by each client;
sequentially carrying out aggregation processing and singular value decomposition on the updated gradients of all layers sent by each client to obtain a third global update gradient;
and sending the third global update gradient to each client so that each client can continue to train the model according to the third global update gradient.
To achieve the above object, according to a third aspect of the present application, there is provided an apparatus for federated learning.
The apparatus for federal learning according to the present application includes:
a training unit, configured to implement, with the federated server, training of the model based on a compressed update gradient.
Optionally, the compressed update gradient is specifically a first global update gradient, and the training unit includes:
the first sending module is used for receiving, by the client, the initial model uniformly sent by the federated server, obtaining an updated gradient based on local data and initial model training, and sending the updated gradient to the federated server;
and the first training module is used for receiving a first global update gradient sent by the federated server by the client and continuing to train a model according to the first global update gradient, wherein the first global update gradient is obtained by sequentially performing aggregation processing and singular value decomposition on the update gradients sent by the clients by the federated server.
Optionally, the compressed update gradient is specifically a second global update gradient, and the training unit includes:
the second sending module is used for receiving the initial model uniformly sent by the federal server by the client, obtaining an updated gradient based on local data and initial model training, and sending the updated gradient of the last k layers of the initial model to the federal server, wherein the initial model is of a p-layer structure, and k is more than or equal to 1 and less than p;
the calculation module is used for receiving the second global update gradient sent by the federal server by the client, and calculating global update gradients of other layers layer by layer according to the second global update gradient, wherein the second global update gradient is obtained by aggregating the updated gradients of the last k layers sent by each client by the federal server;
and the second training module is used for continuing the training of the model by the client according to the second global update gradient and the global update gradients of other layers.
Optionally, the first training module is further configured to:
and after matrix recovery processing is carried out on the matrix of the first global update gradient, model training is continued.
Optionally, the apparatus further comprises:
the statistical module is used for counting the continuous times of sending the updated gradient of the last k layers of the initial model to the federal server after continuing the training of the model according to the second global update gradient and the global update gradients of other layers, wherein each round of model training corresponds to one time;
the execution module is used for executing one time of sending the updated gradients of all layers obtained after the last round of model training to the federal server if the continuous times reach the preset times, so that the federal server can sequentially perform aggregation processing and singular value decomposition on the updated gradients of all layers sent by each client to obtain a third global update gradient;
the second training module is further configured to receive the third global update gradient, and continue training the model according to the third global update gradient.
To achieve the above object, according to a fourth aspect of the present application, there is also provided an apparatus for federated learning.
The apparatus for federal learning according to the present application includes:
a training unit, configured to implement, with each client, training of the model based on a compressed update gradient.
Optionally, the compressed update gradient is specifically a first global update gradient, and the training unit includes:
the sending module is used for uniformly sending the initial model to each client by the federal server so that each client can obtain an updated gradient after training based on local data and the initial model;
the first processing module is used for receiving the respective updated gradients sent by the clients and sequentially performing aggregation processing and singular value decomposition on the respective updated gradients sent by the clients to obtain a first global update gradient;
and the first returning module is used for returning the first global updating gradient to each client so that each client can continue to train the model according to the first global updating gradient.
Optionally, the compressed update gradient is specifically a second global update gradient, and the training unit includes:
the sending module is used for uniformly sending the initial model to each client by the federal server so that each client can obtain an updated gradient after training based on local data and the initial model, and the initial model is of a p-layer structure;
the first receiving module is used for the federal server to receive the updated gradient of the last k layers of the initial models which are respectively and correspondingly sent by the clients, and k is more than or equal to 1 and is less than p;
the second processing module is used for performing aggregation processing on the gradient of the last k-layer update sent by each client to obtain a second global update gradient;
and the second returning module is used for sending the second global update gradient to each client so that each client can calculate the global update gradients of other layers according to the second global update gradient layer by layer.
Optionally, the first processing module is further configured to:
performing singular value decomposition on the gradient matrix obtained after aggregating the updated gradients sent by each client, to obtain a matrix formed by the left singular vectors, a diagonal matrix and a matrix formed by the right singular vectors;
selecting the first n non-zero singular values according to the descending order of the non-zero singular values on the diagonal of the diagonal matrix;
and compressing the matrix formed by the left singular vectors, the diagonal matrix and the matrix formed by the right singular vectors according to the first n non-zero singular values respectively, and combining the compressed matrices to obtain the first global update gradient.
Optionally, the apparatus further comprises:
the second receiving module is used for receiving the updated gradients of all the layers sent by each client after receiving the updated gradient of the last k layers sent by each client continuously for preset times;
the third processing module is used for sequentially carrying out aggregation processing and singular value decomposition on the updated gradients of all the layers sent by each client to obtain a third global update gradient;
and the third returning module is used for sending the third global update gradient to each client so that each client can continue to train the model according to the third global update gradient.
In order to achieve the above object, according to a fifth aspect of the present application, there is provided a system for federated learning, the system including clients, a federated server;
the clients are used for executing the method for federal learning of any one of the first aspect;
the federated server is configured to perform the method of federated learning of any one of the foregoing second aspects.
To achieve the above object, according to a sixth aspect of the present application, there is provided a computer-readable storage medium storing computer instructions for causing a computer to execute the method for federal learning in any one of the first and second aspects.
In the method, device and system for federated learning in the embodiments of the present application, each client and the federated server train the model based on a compressed update gradient. The compressed update gradient occupies fewer communication resources than the original update gradient, so transmitting the compressed update gradient instead of the original one during federated model training reduces the overhead of network transmission. This effectively alleviates the problem that the overhead of network transmission becomes a bottleneck for improving system performance as the number of clients keeps increasing.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, serve to provide a further understanding of the application and to enable other features, objects, and advantages of the application to be more apparent. The drawings and their description illustrate the embodiments of the invention and do not limit it. In the drawings:
FIG. 1 is a flowchart of a method for federated learning according to an embodiment of the present application;
FIG. 2 is a flow chart of another method for federated learning provided in accordance with an embodiment of the present application;
FIG. 3 is a flow chart of yet another method for federated learning provided in accordance with an embodiment of the present application;
FIG. 4 is a block diagram illustrating an apparatus for federated learning provided in accordance with an embodiment of the present application;
FIG. 5 is a block diagram illustrating components of another federated learning apparatus provided in accordance with an embodiment of the present application;
FIG. 6 is a block diagram of another apparatus for federated learning provided in accordance with an embodiment of the present application;
FIG. 7 is a block diagram illustrating another apparatus for federated learning provided in accordance with an embodiment of the present application;
FIG. 8 is a block diagram illustrating components of yet another federated learning apparatus provided in accordance with an embodiment of the present application;
FIG. 9 is a block diagram illustrating another apparatus for federated learning provided in accordance with an embodiment of the present application;
fig. 10 is a block diagram illustrating a system for federated learning according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances so that the embodiments of the application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
According to the embodiment of the application, a method for federated learning is provided, and the method comprises the steps that each client and a federated server achieve model training based on compressed update gradient.
The existing federal learning process is that, in an initial stage, a federal server sends an initial model to each client uniformly, each client performs model training based on local data respectively to obtain an updated gradient, and the updated gradient is a corresponding model gradient after the client trains the initial model. And then each client sends the updated gradient to the federal server, and the federal server collects the updated gradients of all client entities, and returns the obtained global gradient to each client after aggregation processing (averaging, summing and the like) so as to train the model of each client. And repeating the process until the model training is finished.
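To make this baseline flow concrete, a minimal sketch of one such round is given below (the client objects and their method names are illustrative assumptions, not part of the patent):

    import numpy as np

    def federated_round(clients):
        # One round of the existing flow described above: each client trains locally
        # and sends its updated gradient; the server averages the gradients layer by
        # layer and returns the global gradient to every client.
        client_grads = [c.train_and_get_gradient() for c in clients]  # list of per-layer arrays per client
        global_grad = [np.mean(layer, axis=0) for layer in zip(*client_grads)]
        for c in clients:
            c.apply_global_gradient(global_grad)  # each client continues training from the global gradient
        return global_grad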
In this embodiment, the compressed update gradient is a compressed global update gradient, and specifically, the global update gradient transmitted between the client and the federal server in the training process is compressed, and the problem that the network transmission overhead becomes a bottleneck of improving the system performance as the number of clients increases continuously is solved by reducing the network transmission overhead of the global update gradient.
It should be noted that the initial models and the corresponding parameters in each client and the federal server are unified, after each client is trained according to respective local data, the parameters of the models change, the gradients corresponding to the model parameters also change, and the gradients in this embodiment are the gradients of the model parameters. The model is any other algorithm model needing to be trained in a federal learning mode, such as a neural network model. For example, in the loan transaction, the credit score of the user needs to be evaluated, but the evaluation of the credit score needs to be obtained comprehensively by multiple organizations or platforms (social security organization, bank, online shopping platform, etc.), but the data of the multiple organizations or platforms is not shared, so that the evaluation model of the credit score can be obtained finally by solving the problem in a federal learning manner. The multi-party organization or platform is equivalent to each client, and the evaluation model of the credit score is equivalent to the model.
And aiming at the training mode of the update gradient implementation model based on compression of each client and the federated server, a plurality of specific implementation modes are provided, and different implementation modes correspond to different compressed update gradients.
In the first mode, the compressed update gradient is the first global update gradient, that is, the updated global gradient is compressed by singular value decomposition.
According to an embodiment of the present application, there is provided a method of federated learning. As shown in fig. 1, the method includes the following steps:
and S101, uniformly sending the initial model to each client by the federal server.
S102, the client receives the initial model uniformly sent by the federated server to each client, and trains the model based on the client's local data to obtain an updated gradient, which is the gradient of the trained model parameters.
In the initial stage, the federal server sends the initial models to all the clients uniformly, all the clients perform model training respectively based on local data to obtain updated gradients, and the updated gradients are corresponding to the model gradients after the clients perform training on the initial models. The initial models and corresponding parameters in the clients and the federal server are unified, after each client is trained according to respective local data, the parameters of the models change, the gradients corresponding to the model parameters also change, and the model gradients in this embodiment are the gradients of the model parameters. The model is any other algorithm model needing to be trained in a federal learning mode, such as a neural network model. For example, in the loan transaction, the credit score of the user needs to be evaluated, but the evaluation of the credit score needs to be obtained comprehensively by multiple organizations or platforms (social security organization, bank, online shopping platform, etc.), but the data of the multiple organizations or platforms is not shared, so that the evaluation model of the credit score can be obtained finally by solving the problem in a federal learning manner. The multi-party organization or platform is equivalent to each client, and the evaluation model of the credit score is equivalent to the model.
And S103, the client sends the updated gradient to the federal server.
And S104, the federal server receives the updated gradient sent by each client, and sequentially carries out aggregation processing and singular value decomposition on the updated gradient sent by each client to obtain a first global update gradient.
The aggregation processing in this embodiment is an averaging operation. The result of the aggregation processing is equivalent to the global update gradient in the prior art, and the first global update gradient is equivalent to that global update gradient compressed by means of singular value decomposition.
Singular Value Decomposition (SVD) is an important matrix decomposition in linear algebra, used to map a data set to a lower-dimensional space for the purposes of dimensionality reduction and compression. The singular values of the decomposed data set are ranked according to significance. In this embodiment, the gradient matrix obtained after aggregating the updated gradients sent by each client is processed by SVD layer by layer: the gradient matrix of each layer is decomposed as U S V, where U is the matrix formed by the left singular vectors, S is a diagonal matrix whose diagonal elements are the singular values arranged in descending order of magnitude, and V is the matrix formed by the right singular vectors. During compression, only the first n (two, three, etc., adjustable according to actual requirements) non-zero singular values and the corresponding singular vectors are kept, that is, only the first n columns of U (the first n left singular vectors), the first n rows of V (the first n right singular vectors) and the first n diagonal elements of S (the n largest singular values) are selected to complete the SVD processing. After the gradient matrices of all layers have been processed by SVD, the first global update gradient is obtained. Since the non-zero singular values are arranged by importance, taking only the first few reduces the amount of data transmitted without affecting the accuracy of the subsequent model training results.
A specific example illustrates the effect of the SVD processing: assume that the gradient matrix of a certain layer, obtained after aggregating the updated gradients sent by each client, is a 500 × 800 matrix. If it were transmitted normally without SVD processing, a matrix consisting of 400000 numbers would need to be transmitted. After SVD approximation with 2 singular values retained, only 2 left singular vectors of length 500, 2 right singular vectors of length 800 and 2 singular values need to be transmitted, (2 × 500 + 2 × 800 + 2) = 2602 numbers in total.
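A minimal sketch of this compression step using NumPy is given below (the array shapes and the choice of n = 2 follow the example above; the function and variable names are illustrative, not taken from the patent):

    import numpy as np

    def svd_compress(grad, n=2):
        # Keep only the n largest singular values of an aggregated gradient matrix.
        U, s, Vt = np.linalg.svd(grad, full_matrices=False)
        # First n left singular vectors, first n singular values, first n right singular vectors.
        return U[:, :n], s[:n], Vt[:n, :]

    # Example matching the 500 x 800 layer gradient in the text.
    aggregated_grad = np.random.randn(500, 800)  # stand-in for the averaged client gradients
    U_n, s_n, Vt_n = svd_compress(aggregated_grad, n=2)

    numbers_sent = U_n.size + s_n.size + Vt_n.size
    print(numbers_sent)  # 2*500 + 2*800 + 2 = 2602, versus 400000 for the full matrix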
And S105, the federal server returns the first global update gradient to each client.
And S106, the client receives the first global update gradient sent by the federal server, and continues to train the model according to the first global update gradient.
In singular value decomposition, the decomposed matrix can be approximately reconstructed from the non-zero singular values. Although the client only knows the few most important non-zero singular values rather than all of them, so that the reconstructed matrix differs somewhat from the matrix before decomposition, SVD preserves the main structure of the matrix (the matrix obtained by aggregating the updated gradients sent by each client). Actual verification shows that the difference is small and has almost no influence on the training result of the whole model.
Specifically, continuing to train the model according to the first global update gradient includes performing matrix recovery processing on the matrix of the first global update gradient and then continuing model training. The principle of the matrix recovery processing is to restore the matrix processed by SVD: assuming that u1 and u2 are the left singular vectors extracted by the SVD processing, s1 and s2 the extracted singular values, and v1 and v2 the extracted right singular vectors, the gradient matrix is approximately restored by the following formula:
s1*(u1×v1)+s2*(u2×v2)
where "*" denotes multiplication of a scalar by a matrix, "×" denotes the outer product (the outer product of two vectors is a matrix), and "+" denotes matrix addition.
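A sketch of this recovery step in NumPy follows (U_n, s_n and Vt_n denote the truncated factors as in the compression sketch above; the example values are placeholders):

    import numpy as np

    def svd_recover(U_n, s_n, Vt_n):
        # Approximately rebuild the gradient matrix from the truncated SVD factors,
        # i.e. s1*(u1 x v1) + s2*(u2 x v2) + ... as in the formula above.
        recovered = np.zeros((U_n.shape[0], Vt_n.shape[1]))
        for i in range(len(s_n)):
            recovered += s_n[i] * np.outer(U_n[:, i], Vt_n[i, :])
        return recovered  # equivalent one-liner: (U_n * s_n) @ Vt_n

    # Example with placeholder factors of the same shapes as in the compression sketch.
    U_n, s_n, Vt_n = np.random.randn(500, 2), np.array([3.0, 1.5]), np.random.randn(2, 800)
    approx_grad = svd_recover(U_n, s_n, Vt_n)  # the client continues training with this matrix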
And the client side continues to train the model according to the first global updating gradient to obtain a new round of updating gradient, and then continues to repeat the steps S103-S106 until the model training is finished.
From the above description, it can be seen that, in the method for federated learning in the embodiment of the present application, the federated server sends the initial model to each client; after receiving the initial model, the client performs model training based on its local data to obtain an updated gradient, which is the gradient of the trained model parameters; the client then sends the updated gradient to the federated server; after receiving the updated gradients sent by the clients, the federated server sequentially performs aggregation processing and singular value decomposition on them to obtain a first global update gradient; the client then receives the first global update gradient sent by the federated server and continues to train the model according to it. It can be seen that, in the federated learning manner of the application, the federated server does not directly return the aggregation result to each client, but performs singular value decomposition on it, that is, reduces its dimensionality, and then returns the singular-value-decomposed global update gradient to each client, which greatly reduces the overhead of network transmission. This effectively solves the problem that the overhead of network transmission becomes a bottleneck for improving system performance as the number of clients keeps increasing.
In the second mode, the compressed update gradient is specifically the second global update gradient, that is, the compressed global update gradient is obtained by aggregating the compressed update gradients transmitted by the clients (only the updated gradients of the last few layers).
According to an embodiment of the present application, there is provided a method of federated learning. As shown in fig. 2, the method includes the following steps:
S201, the federal server uniformly sends the initial model to each client.
S202, the client receives an initial model which is sent to each client by the federal server in a unified mode, and training of the model is carried out on the basis of local data of the client to obtain an updated gradient.
The implementation manners of steps S201 and S202 may refer to the implementation manners of steps S101 and S102 in fig. 1, and are not described herein again.
And S203, the client sends the updated gradient of the last k layers of the initial model to a federal server.
And sending the gradient of the last k-layer update of the initial model to the federal server, so that the federal server performs aggregation processing (averaging) on the gradient of the last k-layer update sent by each client to obtain a second global update gradient.
The initial model has a p-layer structure, where k is greater than or equal to 1 and less than p; according to actual test results, a preferred value of k is greater than or equal to 1 and less than or equal to 3.
Unlike the embodiment of fig. 1, this embodiment does not send all of the updated gradients to the federated server, but instead sends the last few layers of updated gradients to the federated server. In this way, network overhead when the client sends updated gradients to the federated server is reduced, and the second global update gradient aggregated by the federated server is also compressed compared to the original global update gradient (the global update gradient obtained by aggregating all the updated gradients). In addition, only the last layers of the updated gradient are transmitted, so that the risk of obtaining local original data of the client according to all the updated gradients can be effectively avoided, and the effect of keeping the updated gradient secret is achieved.
S204, the federated server receives the last k layers of gradients sent by each client, performs aggregation processing to obtain a second global update gradient, and sends the second global update gradient to each client.
And after receiving the last k-layer gradient sent by each client, the federal server performs aggregation processing, namely averaging processing, to obtain a second global update gradient. In addition, in practical applications, the averaging process may be direct averaging, or weighted averaging according to actual needs, and the weight value is determined according to the importance of the data of each client. For example, in the evaluation model of the user credit score, each organization or platform may set different weight values according to actual requirements.
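A minimal sketch of this aggregation step is given below (the per-client weights and the data layout are illustrative; plain averaging corresponds to leaving the weights unset):

    import numpy as np

    def aggregate_last_k_layers(client_grads, weights=None):
        # client_grads: one entry per client, each a list of k per-layer gradient arrays.
        # weights: optional per-client weights, e.g. reflecting the importance of each
        # client's data; None means direct averaging.
        n = len(client_grads)
        w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, float) / np.sum(weights)
        k = len(client_grads[0])
        return [sum(w[c] * client_grads[c][layer] for c in range(n)) for layer in range(k)]

    # Example: 3 clients, each sending the updated gradients of the last 2 layers.
    grads = [[np.ones((4, 4)) * c, np.ones(4) * c] for c in range(3)]
    second_global = aggregate_last_k_layers(grads, weights=[1, 2, 1])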
S205, the client receives the second global update gradient sent by the federated server, and continues to train the model according to the second global update gradient and the global update gradients of other layers.
The second global update gradient consists of the global update gradients of the last k layers, each obtained by aggregating the corresponding last-k-layer update gradients. The client can therefore calculate the global update gradients of the other layers layer by layer from the second global update gradient, obtaining the global update gradients of all layers, and then continue training the model according to the global update gradients of all layers to obtain a new round of updated gradients. Steps S203-S205 are repeated until model training is finished.
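The patent does not spell out how the remaining layers are back-calculated; the following is only a plausible sketch under the assumption of a fully connected model, where the client pushes the error signal implied by the aggregated last-layer gradient back through its own locally cached weights and activations (all names are hypothetical):

    def backcalc_other_layers(last_layer_delta, weights, layer_inputs, act_deriv):
        # last_layer_delta: error signal of the output layer implied by the aggregated
        #                   last-layer global gradient, shape [out_dim, batch].
        # weights[l]:       the client's local weight matrix of layer l.
        # layer_inputs[l]:  the client's cached input activations to layer l
        #                   (layer_inputs[0] is the raw input batch).
        # act_deriv(a):     derivative of the activation function, written in terms of
        #                   the activation output a (e.g. a * (1 - a) for a sigmoid).
        grads = {}
        delta = last_layer_delta
        for l in reversed(range(len(weights))):
            grads[l] = delta @ layer_inputs[l].T  # weight gradient of layer l (chain rule)
            if l > 0:
                # Push the error signal back through layer l to the previous layer.
                delta = (weights[l].T @ delta) * act_deriv(layer_inputs[l])
        return grads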
From the above description, it can be seen that in the method for federal learning in the embodiment of the application, a client receives an initial model uniformly sent by a federal server, obtains an updated gradient based on local data and initial model training, and sends the updated gradient of the last k layers of the initial model to the federal server, wherein the initial model is a p-layer structure, and k is greater than or equal to 1 and is less than p; the federal server carries out aggregation processing on the gradient of the last k-layer update sent by each client to obtain a second global update gradient; and the client receives the second global update gradient sent by the federal server, calculates the global update gradients of other layers layer by layer according to the second global update gradient, and continues to train the model according to the second global update gradient and the global update gradients of other layers. It can be seen that in this embodiment, the client sends only the last few layers of the updated gradient to the federated server. Therefore, network overhead when the client sends the updated gradient to the federal server is reduced, the second global update gradient obtained by the federal server is compressed compared with the original global update gradient (the global update gradient obtained by aggregating all the updated gradients), and compared with the prior art, the network transmission overhead of the global update gradient is also reduced. In addition, only the last layers of the updated gradient are transmitted, so that the risk of obtaining local original data of the client according to all the updated gradients can be effectively avoided, and the effect of keeping the updated gradient secret is achieved.
As a preferred mode of the above-mentioned embodiment of fig. 2, another federal learning method is provided, as shown in fig. 3, the method includes the following steps:
S301, the federal server uniformly sends the initial model to each client.
S302, the client receives an initial model which is sent to each client by the federal server in a unified mode, and training of the model is carried out based on local data of the client to obtain an updated gradient.
The implementation manners of steps S301 and S302 can refer to the implementation manners of steps S101 and S102 in fig. 1, and are not described herein again.
And S303, the client sends the updated gradient of the last layer of the initial model to the federal server.
And sending the updated gradient of the last layer of the initial model to the federal server, so that the federal server performs aggregation processing (averaging) on the updated gradient of the last layer sent by each client to obtain a second global update gradient.
This embodiment does not send all of the updated gradients to the federated server, but instead sends the last layer of updated gradients to the federated server. In this way, network overhead when the client sends updated gradients to the federated server is reduced, and the second global update gradient aggregated by the federated server is also compressed compared to the original global update gradient (the global update gradient obtained by aggregating all the updated gradients). In addition, only the last layer of the updated gradient is transmitted, so that the risk of obtaining local original data of the client according to all the updated gradients can be effectively avoided, and the effect of keeping the updated gradient secret is achieved.
Compared with the mode of only transmitting the last multiple layers, the mode of only transmitting the last layer further reduces the network overhead during transmission, and because the last layer is an output layer, the global updating gradient of other layers can be calculated layer by layer according to the last layer.
S304, the federated server receives the last layer of gradients sent by each client, performs aggregation processing to obtain a second global update gradient, and sends the second global update gradient to each client.
And after receiving the last layer of gradient sent by each client, the federated server performs aggregation processing, namely averaging processing, to obtain a second global update gradient. In addition, in practical applications, the averaging process may be direct averaging, or weighted averaging according to actual needs, and the weight value is determined according to the importance of the data of each client. For example, in the evaluation model of the user credit score, each organization or platform may set different weight values according to actual requirements.
S305, the client receives the second global update gradient sent by the federal server, and continues to train the model according to the second global update gradient and the global update gradients of other layers.
The implementation of step S305 can refer to the implementation of step S205 in fig. 2, and is not described herein again.
From the above description, it can be seen that in this embodiment, the client sends only the last layer of the updated gradient to the federated server. Therefore, network overhead when the client sends the updated gradient to the federal server is reduced, the second global update gradient obtained by the federal server is compressed compared with the original global update gradient (the global update gradient obtained by aggregating all the updated gradients), and compared with the prior art, the network transmission overhead of the global update gradient is also reduced. In addition, only the last layer of the updated gradient is transmitted, so that the risk of obtaining local original data of the client according to all the updated gradients can be effectively avoided, and the effect of keeping the updated gradient secret is achieved. Compared with the embodiment in fig. 2, the present embodiment can further reduce the network overhead during transmission.
A third mode is a combination of the first and second modes.
In practical applications, the second mode, in which only the updated gradients of the last few layers are transmitted, has a very small transmission amount but causes a data drift problem, so the whole model needs to be synchronized periodically. Analysis of the data drift problem: after receiving the global update gradients of the last few layers returned by the federated server, each client needs to calculate and update the gradients of the preceding layers layer by layer on its own, so the gradients obtained by different clients are not completely consistent. This embodiment solves the data drift problem by using the first and second modes in combination. The specific combination is as follows:
during model training, after a preset number of rounds have been executed according to the second mode, one round is executed according to the first mode, and this cycle repeats periodically until model training is finished.
Specifically, the method comprises the following steps: counting the number of consecutive times that the updated gradient of the last k layers of the initial model has been sent to the federated server, where each round of model training counts as one time; if the number of consecutive times reaches the preset number, sending, once, the updated gradients of all layers obtained after the latest round of model training to the federated server, so that the federated server sequentially performs aggregation processing and singular value decomposition on the updated gradients of all layers sent by each client to obtain a third global update gradient; and receiving the third global update gradient and continuing to train the model according to the third global update gradient.
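A sketch of the client-side control logic for this combined mode is given below (the round structure, the helper names and the preset threshold are illustrative assumptions, not taken from the patent):

    def client_training_loop(model, local_data, server, preset_times, k, num_rounds):
        # Send last-k-layer gradients for preset_times consecutive rounds, then send
        # all layers once so the server can return an SVD-compressed third global
        # update gradient, and repeat this cycle.
        consecutive = 0
        for _ in range(num_rounds):
            grads = model.train_one_round(local_data)  # updated gradients of all layers
            if consecutive < preset_times:
                # Second mode: transmit only the last k layers.
                second_global = server.aggregate_last_k(grads[-k:])
                all_global = model.backcalc_other_layers(second_global)
                consecutive += 1
            else:
                # First mode: transmit all layers; the server aggregates and SVD-compresses.
                third_global_svd = server.aggregate_and_svd(grads)
                all_global = model.recover_from_svd(third_global_svd)
                consecutive = 0  # restart the count
            model.apply_global_gradients(all_global)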
From the above description, it can be seen that, in the embodiment, the two modes are used in combination, so that network communication overhead is greatly reduced, the possibility of deducing original data from an original gradient is reduced, security is increased, system performance is also improved, and meanwhile, the accuracy of a learning model is not reduced.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
According to an embodiment of the present application, there is also provided an apparatus for implementing the above method, the apparatus including: a training unit, configured to implement, with the federated server, training of the model based on a compressed update gradient.
The existing federal learning process is that, in an initial stage, a federal server sends an initial model to each client uniformly, each client performs model training based on local data respectively to obtain an updated gradient, and the updated gradient is a corresponding model gradient after the client trains the initial model. And then each client sends the updated gradient to the federal server, and the federal server collects the updated gradients of all client entities, and returns the obtained global gradient to each client after aggregation processing (averaging, summing and the like) so as to train the model of each client. And repeating the process until the model training is finished.
In this embodiment, the compressed update gradient is a compressed global update gradient, and specifically, the global update gradient transmitted between the client and the federal server in the training process is compressed, and the problem that the network transmission overhead becomes a bottleneck of improving the system performance as the number of clients increases continuously is solved by reducing the network transmission overhead of the global update gradient.
It should be noted that the initial models and the corresponding parameters in each client and the federal server are unified, after each client is trained according to respective local data, the parameters of the models change, the gradients corresponding to the model parameters also change, and the gradients in this embodiment are the gradients of the model parameters. The model is any other algorithm model needing to be trained in a federal learning mode, such as a neural network model. For example, in the loan transaction, the credit score of the user needs to be evaluated, but the evaluation of the credit score needs to be obtained comprehensively by multiple organizations or platforms (social security organization, bank, online shopping platform, etc.), but the data of the multiple organizations or platforms is not shared, so that the evaluation model of the credit score can be obtained finally by solving the problem in a federal learning manner. The multi-party organization or platform is equivalent to each client, and the evaluation model of the credit score is equivalent to the model.
Further, according to an embodiment of the present application, there is also provided an apparatus for federal learning for implementing the method in fig. 1 to 3, where the apparatus is located on a client side, and the compressed update gradient is specifically a first global update gradient, as shown in fig. 4, and the training unit includes:
the first sending module 41 is used for receiving the initial model uniformly sent by the federal server by the client, obtaining an updated gradient based on local data and initial model training, and sending the updated gradient to the federal server;
and the first training module 42 is configured to receive a first global update gradient sent by the federated server by the client, and continue to train a model according to the first global update gradient, where the first global update gradient is obtained by sequentially performing aggregation processing and singular value decomposition on update gradients sent by the clients by the federated server.
From the above description, it can be seen that, in the device for federated learning in the embodiment of the present application, the federated server sends the initial model to each client; after receiving the initial model, the client performs model training based on its local data to obtain an updated gradient, which is the gradient of the trained model parameters; the client then sends the updated gradient to the federated server; after receiving the updated gradients sent by the clients, the federated server sequentially performs aggregation processing and singular value decomposition on them to obtain a first global update gradient; the client then receives the first global update gradient sent by the federated server and continues to train the model according to it. It can be seen that, in the federated learning manner of the application, the federated server does not directly return the aggregation result to each client, but performs singular value decomposition on it, that is, reduces its dimensionality, and then returns the singular-value-decomposed global update gradient to each client, which greatly reduces the overhead of network transmission. This effectively solves the problem that the overhead of network transmission becomes a bottleneck for improving system performance as the number of clients keeps increasing.
Further, the first training module 42 is further configured to:
and after matrix recovery processing is carried out on the matrix of the first global update gradient, model training is continued.
Specifically, the specific process of implementing the functions of each unit and module in the device in the embodiment of the present application may refer to the related description in the method embodiment, and is not described herein again.
Further, the compressed update gradient is specifically a second global update gradient, as shown in fig. 5, where the training unit includes:
a second sending module 43, configured to receive, by the client, the initial model uniformly sent by the federal server, obtain an updated gradient based on local data and initial model training, and send the updated gradient of the last k layers of the initial model to the federal server, where the initial model is a p-layer structure, and k is greater than or equal to 1 and less than p;
a calculating module 44, configured to receive the second global update gradient sent by the federation server by the client, and calculate global update gradients of other layers layer by layer according to the second global update gradient, where the second global update gradient is obtained by aggregating, by the federation server, the last k-layer update gradient sent by each client;
and a second training module 45, configured to continue training the model by the client according to the second global update gradient and the global update gradients of the other layers.
From the above description, it can be seen that in the device for federal learning in the embodiment of the present application, a client receives an initial model uniformly sent by a federal server, obtains an updated gradient based on local data and initial model training, and sends the updated gradient of the last k layers of the initial model to the federal server, wherein the initial model is a p-layer structure, and k is greater than or equal to 1 and is less than p; the federal server carries out aggregation processing on the gradient of the last k-layer update sent by each client to obtain a second global update gradient; and the client receives the second global update gradient sent by the federal server, calculates the global update gradients of other layers layer by layer according to the second global update gradient, and continues to train the model according to the second global update gradient and the global update gradients of other layers. It can be seen that in this embodiment, the client sends only the last few layers of the updated gradient to the federated server. Therefore, network overhead when the client sends the updated gradient to the federal server is reduced, the second global update gradient obtained by the federal server is compressed compared with the original global update gradient (the global update gradient obtained by aggregating all the updated gradients), and compared with the prior art, the network transmission overhead of the global update gradient is also reduced. In addition, only the last layers of the updated gradient are transmitted, so that the risk of obtaining local original data of the client according to all the updated gradients can be effectively avoided, and the effect of keeping the updated gradient secret is achieved.
Further, as shown in fig. 6, the apparatus further includes:
a statistics module 46, configured to count, after model training is continued according to the second global update gradient and the global update gradients of the other layers, the number of consecutive times that the updated gradient of the last k layers of the initial model has been sent to the federal server, where each round of model training corresponds to one time;
an executing module 47, configured to, if the number of consecutive times reaches a preset number, send once the updated gradients of all layers obtained after the latest round of model training to the federal server, so that the federal server sequentially performs aggregation processing and singular value decomposition on the updated gradients of all layers sent by each client to obtain a third global update gradient;
the second training module 45 is further configured to receive the third global update gradient and continue model training according to the third global update gradient.
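The interplay of the statistics module 46 and the executing module 47 can be pictured with the following sketch of a single client-side decision; the counter arithmetic and the names are assumptions for illustration, not the patent's wording.

    def gradients_to_upload(round_index, preset_times, all_layer_gradients, k):
        # In most rounds the client uploads only the last k layers; after it
        # has done so for preset_times consecutive rounds, it uploads the
        # updated gradients of all layers once, and the count starts again.
        consecutive = round_index % (preset_times + 1)
        if consecutive == preset_times:
            return all_layer_gradients      # full upload: server aggregates + SVD
        return all_layer_gradients[-k:]     # normal round: last k layers only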
Specifically, for the process by which each unit and module of the device in the embodiment of the present application implements its functions, reference may be made to the related description in the method embodiment, and details are not repeated here.
Further, according to an embodiment of the present application, there is provided a federal learning apparatus for implementing the methods described in fig. 1 to 3, where the apparatus is located on the federal server side and the compressed update gradient is specifically a first global update gradient; as shown in fig. 7, the training unit includes:
a sending module 51, configured to uniformly send the initial model to each client by the federal server, so that each client obtains an updated gradient after training based on local data and the initial model;
the first processing module 52 is configured to receive the respective updated gradients sent by each client, and sequentially perform aggregation processing and singular value decomposition on the respective updated gradients sent by each client to obtain the first global update gradient;
a first returning module 53, configured to return the first global update gradient to each client, so that each client continues to train the model according to the first global update gradient.
From the above description, it can be seen that, in the federal learning device of the embodiment of the present application, the federal server uniformly sends the initial model to each client; after receiving the initial model, each client performs model training based on its local data to obtain an updated gradient, where the updated gradient is the gradient of the trained model parameters; the client then sends the updated gradient to the federal server; after receiving the updated gradients sent by the clients, the federal server sequentially performs aggregation processing and singular value decomposition on them to obtain a first global update gradient; each client then receives the first global update gradient sent by the federal server and continues model training according to it. It can be seen that, in the federated learning approach of the present application, the federal server does not return the aggregation result to each client directly; instead, it performs singular value decomposition on the aggregation result, that is, reduces its dimensionality, and then returns the decomposed global update gradient to each client, which greatly reduces the network transmission overhead. This effectively solves the problem that, as the number of clients keeps growing, network transmission overhead becomes the bottleneck limiting system performance.
Further, the first processing module 52 is further configured to:
perform singular value decomposition on the gradient matrix obtained after aggregation processing of the updated gradients sent by each client, to obtain a matrix formed by the left singular vectors, a diagonal matrix of singular values, and a matrix formed by the right singular vectors;
select the first n non-zero singular values in descending order of the non-zero singular values on the diagonal of the diagonal matrix;
compress the matrix formed by the left singular vectors, the diagonal matrix and the matrix formed by the right singular vectors respectively according to the first n non-zero singular values, and combine the compressed matrices to obtain the first global update gradient.
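A minimal numpy sketch of this aggregation-plus-truncated-SVD step is given below. It assumes the aggregated gradient can be arranged as a two-dimensional matrix, uses an element-wise mean as the aggregation processing, and takes n as a value chosen by the operator; the names are illustrative, not the patent's.

    import numpy as np

    def compress_global_gradient(client_gradients, n):
        # Aggregation processing: element-wise average of the per-client
        # updated gradient matrices (all of identical shape).
        global_gradient = np.mean(client_gradients, axis=0)

        # Singular value decomposition of the aggregated gradient matrix.
        U, s, Vt = np.linalg.svd(global_gradient, full_matrices=False)

        # numpy returns the singular values in descending order, so the first
        # n entries are the n leading singular values (non-zero when the
        # matrix rank is at least n).
        U_n, s_n, Vt_n = U[:, :n], s[:n], Vt[:n, :]

        # The compressed factors are returned to the clients instead of the
        # full d x m gradient matrix.
        return U_n, s_n, Vt_n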
Specifically, for the process by which each unit and module of the device in the embodiment of the present application implements its functions, reference may be made to the related description in the method embodiment, and details are not repeated here.
Further, the compressed update gradient is specifically a second global update gradient, as shown in fig. 8, where the training unit includes:
the sending module 51 is configured to uniformly send the initial model to each client by the federal server, so that each client obtains an updated gradient after training based on local data and the initial model, where the initial model has a p-layer structure;
a first receiving module 54, configured to receive, by the federal server, the updated gradient of the last k layers of the initial model sent by each client, where k is greater than or equal to 1 and less than p;
a second processing module 55, configured to aggregate the last-k-layer updated gradients sent by each client to obtain the second global update gradient;
a second returning module 56, configured to send the second global update gradient to each client, so that each client calculates the global update gradients of its other layers layer by layer according to the second global update gradient.
From the above description, it can be seen that, in the federal learning device of the embodiment of the present application, the client receives the initial model uniformly sent by the federal server, obtains an updated gradient through training based on its local data and the initial model, and sends only the updated gradient of the last k layers of the initial model to the federal server, where the initial model has a p-layer structure and k is greater than or equal to 1 and less than p. The federal server aggregates the last-k-layer updated gradients sent by the clients to obtain a second global update gradient. The client receives the second global update gradient sent by the federal server, calculates the global update gradients of the other layers layer by layer according to it, and continues model training according to the second global update gradient and the global update gradients of the other layers. It can be seen that, in this embodiment, the client sends only the last few layers of the updated gradient to the federal server. This reduces the network overhead of uploading the updated gradient; moreover, the second global update gradient obtained by the federal server is compressed relative to the original global update gradient (the global update gradient obtained by aggregating the updated gradients of all layers), so the network transmission overhead of returning the global update gradient is also reduced compared with the prior art. In addition, since only the last few layers of the updated gradient are transmitted, the risk of recovering a client's local raw data from the complete updated gradient is effectively avoided, which keeps the updated gradient confidential.
Further, as shown in fig. 9, the apparatus further includes:
a second receiving module 57, configured to receive the updated gradients of all layers sent by each client after the updated gradient of the last k layers has been received from each client for a preset number of consecutive times;
a third processing module 58, configured to sequentially perform aggregation processing and singular value decomposition on the updated gradients of all layers sent by each client to obtain a third global update gradient;
a third returning module 59, configured to send the third global update gradient to each client, so that each client continues to train the model according to the third global update gradient.
Specifically, for the process by which each unit and module of the device in the embodiment of the present application implements its functions, reference may be made to the related description in the method embodiment, and details are not repeated here.
According to an embodiment of the present application, there is also provided a federal learning system for implementing the methods described in fig. 1 to 3. As shown in fig. 10, the system includes clients 61 and a federal server 62;
each client 61 is configured to perform the steps performed by the client in the federal learning method described in the embodiments of fig. 1 to 3;
the federal server 62 is configured to perform the steps performed by the federal server in the federal learning method described in the embodiments of fig. 1 to 3.
From the above description, it can be seen that, in the federal learning system of the embodiment of the present application, the federal server uniformly sends the initial model to each client; after receiving the initial model, each client performs model training based on its local data to obtain an updated gradient, where the updated gradient is the gradient of the trained model parameters; the client then sends the updated gradient to the federal server; after receiving the updated gradients sent by the clients, the federal server sequentially performs aggregation processing and singular value decomposition on them to obtain a first global update gradient; each client then receives the first global update gradient sent by the federal server and continues model training according to it. It can be seen that, in the federated learning approach of the present application, the federal server does not return the aggregation result to each client directly; instead, it performs singular value decomposition on the aggregation result, that is, reduces its dimensionality, and then returns the decomposed global update gradient to each client, which greatly reduces the network transmission overhead. This effectively solves the problem that, as the number of clients keeps growing, network transmission overhead becomes the bottleneck limiting system performance.
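To make the transmission saving concrete, the following back-of-the-envelope sketch compares the number of values sent for one d x m global update gradient with and without the truncated singular value decomposition; serialization overhead is ignored and the figures are purely illustrative.

    def payload_elements(d, m, n):
        # full: the aggregated gradient sent as-is (d * m values)
        # svd:  the truncated factors U_n (d x n), s_n (n) and Vt_n (n x m)
        full = d * m
        svd = n * (d + m + 1)
        return full, svd

    # Example: a 4096 x 1024 gradient with n = 32 kept singular values
    # shrinks from 4,194,304 to 163,872 transmitted values (about 3.9%).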
In addition, the client receives the initial model uniformly sent by the federal server, obtains an updated gradient through training based on its local data and the initial model, and sends only the updated gradient of the last k layers of the initial model to the federal server, where the initial model has a p-layer structure and k is greater than or equal to 1 and less than p. The federal server aggregates the last-k-layer updated gradients sent by the clients to obtain a second global update gradient. The client receives the second global update gradient sent by the federal server, calculates the global update gradients of the other layers layer by layer according to it, and continues model training according to the second global update gradient and the global update gradients of the other layers. It can be seen that, in this embodiment, the client sends only the last few layers of the updated gradient to the federal server. This reduces the network overhead of uploading the updated gradient; moreover, the second global update gradient obtained by the federal server is compressed relative to the original global update gradient (the global update gradient obtained by aggregating the updated gradients of all layers), so the network transmission overhead of returning the global update gradient is also reduced compared with the prior art. In addition, since only the last few layers of the updated gradient are transmitted, the risk of recovering a client's local raw data from the complete updated gradient is effectively avoided, which keeps the updated gradient confidential.
According to an embodiment of the present application, there is further provided a computer-readable storage medium, where the computer-readable storage medium stores computer instructions for causing a computer to execute the federal learning method in the above method embodiments.
According to an embodiment of the present application, there is also provided an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, causes the at least one processor to perform the federal learning method in the above method embodiments.
It will be apparent to those skilled in the art that the modules or steps of the present application described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by a computing device, or they may be fabricated separately as individual integrated circuit modules, or multiple of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A method of federated learning, the method comprising:
implementing, by a federal server, training of a model based on a compressed update gradient.
2. The method of federated learning according to claim 1, wherein the compressed update gradient is specifically a first global update gradient, and the implementing, by the federal server, training of the model based on the compressed update gradient comprises:
the client receives an initial model uniformly sent by the federal server, obtains an updated gradient through training based on local data and the initial model, and sends the updated gradient to the federal server;
the client receives the first global update gradient sent by the federal server and continues model training according to the first global update gradient, wherein the first global update gradient is obtained by the federal server sequentially performing aggregation processing and singular value decomposition on the updated gradients sent by the clients.
3. The method of federated learning according to claim 1, wherein the compressed update gradient is specifically a second global update gradient, and the implementing, by the federal server, training of the model based on the compressed update gradient comprises:
the client receives an initial model uniformly sent by the federal server, obtains an updated gradient through training based on local data and the initial model, and sends the updated gradient of the last k layers of the initial model to the federal server, wherein the initial model has a p-layer structure and k is greater than or equal to 1 and less than p;
the client receives the second global update gradient sent by the federal server and calculates global update gradients of other layers layer by layer according to the second global update gradient, wherein the second global update gradient is obtained by the federal server aggregating the updated gradients of the last k layers sent by each client;
the client continues model training according to the second global update gradient and the global update gradients of the other layers.
4. The method of federated learning according to claim 2, wherein the continuing model training according to the first global update gradient comprises:
performing matrix recovery processing on the matrix of the first global update gradient, and then continuing model training.
5. The method of federated learning according to claim 3, wherein after the continuing model training according to the second global update gradient and the global update gradients of other layers, the method further comprises:
counting the number of consecutive times that the updated gradient of the last k layers of the initial model is sent to the federal server, wherein each round of model training corresponds to one time;
if the number of consecutive times reaches a preset number, sending once the updated gradients of all layers obtained after the latest round of model training to the federal server, so that the federal server sequentially performs aggregation processing and singular value decomposition on the updated gradients of all layers sent by each client to obtain a third global update gradient;
receiving the third global update gradient, and continuing model training according to the third global update gradient.
6. A method of federated learning, the method comprising:
implementing, by a federal server together with each client, training of a model based on a compressed update gradient.
7. The method of federated learning according to claim 6, wherein the compressed update gradient is specifically a first global update gradient, and the implementing, by the federal server together with each client, training of the model based on the compressed update gradient comprises:
the federal server uniformly sends the initial model to each client so that each client can obtain an updated gradient after training based on local data and the initial model;
receiving respective updated gradients sent by each client, and sequentially performing aggregation processing and singular value decomposition on the respective updated gradients sent by each client to obtain the first global update gradient;
and returning the first global updating gradient to each client so that each client can continue to train the model according to the first global updating gradient.
8. The method of federated learning according to claim 6, wherein the compressed update gradient is specifically a second global update gradient, and the implementing, by the federal server together with each client, training of the model based on the compressed update gradient comprises:
the federal server uniformly sends the initial model to each client so that each client can obtain an updated gradient after training based on local data and the initial model, wherein the initial model is of a p-layer structure;
the federal server receives the updated gradient of the last k layers of the initial model sent by each client, wherein k is greater than or equal to 1 and less than p;
aggregating the updated gradients of the last k layers sent by each client to obtain the second global update gradient;
sending the second global update gradient to each client, so that each client calculates the global update gradients of the other layers layer by layer according to the second global update gradient.
9. The method of federal learning as in claim 7, wherein the singular value decomposition comprises:
performing singular value decomposition on the gradient matrix obtained after aggregation processing of the updated gradients sent by each client, to obtain a matrix formed by left singular vectors, a diagonal matrix of singular values, and a matrix formed by right singular vectors;
selecting the first n non-zero singular values in descending order of the non-zero singular values on the diagonal of the diagonal matrix;
compressing the matrix formed by the left singular vectors, the diagonal matrix and the matrix formed by the right singular vectors respectively according to the first n non-zero singular values, and combining the compressed matrices to obtain the first global update gradient.
10. The method of federated learning of claim 8, wherein after receiving a gradient of the last k-layer updates sent by each client a preset number of consecutive times, the method further comprises:
receiving updated gradients of all layers sent by each client;
sequentially carrying out aggregation processing and singular value decomposition on the updated gradients of all layers sent by each client to obtain a third global update gradient;
and sending the third global update gradient to each client so that each client can continue to train the model according to the third global update gradient.
11. A system for federated learning, wherein the system comprises clients and a federal server;
the clients are configured to perform the federal learning method of any one of claims 1 to 5;
the federal server is configured to perform the federal learning method of any one of claims 6 to 10.
12. A computer readable storage medium having computer instructions stored thereon for causing a computer to perform the method of federal learning as claimed in any of claims 1-10.
CN202010370086.2A 2020-04-30 2020-04-30 Federal learning method, device and system Active CN111553484B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010370086.2A CN111553484B (en) 2020-04-30 2020-04-30 Federal learning method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010370086.2A CN111553484B (en) 2020-04-30 2020-04-30 Federal learning method, device and system

Publications (2)

Publication Number Publication Date
CN111553484A true CN111553484A (en) 2020-08-18
CN111553484B CN111553484B (en) 2023-09-08

Family

ID=72004386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010370086.2A Active CN111553484B (en) 2020-04-30 2020-04-30 Federal learning method, device and system

Country Status (1)

Country Link
CN (1) CN111553484B (en)

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232518A (en) * 2020-10-15 2021-01-15 成都数融科技有限公司 Lightweight distributed federated learning system and method
CN112232519A (en) * 2020-10-15 2021-01-15 成都数融科技有限公司 Joint modeling method based on federal learning
CN112348192A (en) * 2020-09-18 2021-02-09 同盾控股有限公司 Knowledge reasoning method, system, device and medium based on knowledge federation
CN112348200A (en) * 2020-11-02 2021-02-09 中国科学院信息工程研究所 Controlled shared learning method and system based on federal learning
CN112532451A (en) * 2020-11-30 2021-03-19 安徽工业大学 Layered federal learning method and device based on asynchronous communication, terminal equipment and storage medium
CN112560088A (en) * 2020-12-11 2021-03-26 同盾控股有限公司 Knowledge federation-based data security exchange method and device and storage medium
CN112651592A (en) * 2020-11-27 2021-04-13 科技谷(厦门)信息技术有限公司 Enterprise credit assessment system based on multimodal transport
CN112668726A (en) * 2020-12-25 2021-04-16 中山大学 Personalized federal learning method with efficient communication and privacy protection
CN112732470A (en) * 2021-03-29 2021-04-30 南方电网数字电网研究院有限公司 Federal learning reliability assessment method and device for electric energy data
CN112817940A (en) * 2021-02-07 2021-05-18 上海嗨普智能信息科技股份有限公司 Gradient compression-based federated learning data processing system
CN112906903A (en) * 2021-01-11 2021-06-04 北京源堡科技有限公司 Network security risk prediction method and device, storage medium and computer equipment
CN113077056A (en) * 2021-03-29 2021-07-06 上海嗨普智能信息科技股份有限公司 Data processing system based on horizontal federal learning
CN113112027A (en) * 2021-04-06 2021-07-13 杭州电子科技大学 Federal learning method based on dynamic adjustment model aggregation weight
CN113139662A (en) * 2021-04-23 2021-07-20 深圳市大数据研究院 Global and local gradient processing method, device, equipment and medium for federal learning
CN113177674A (en) * 2021-05-28 2021-07-27 恒安嘉新(北京)科技股份公司 Phishing early warning method, device, equipment and medium
CN113240512A (en) * 2021-06-15 2021-08-10 中国银行股份有限公司 Method, device, readable medium and equipment for constructing risk prediction model
CN113435604A (en) * 2021-06-16 2021-09-24 清华大学 Method and device for optimizing federated learning
CN113469367A (en) * 2021-05-25 2021-10-01 华为技术有限公司 Method, device and system for federated learning
CN113887747A (en) * 2021-10-21 2022-01-04 新智我来网络科技有限公司 Data fusion method and device based on joint learning
WO2022039494A1 (en) * 2020-08-20 2022-02-24 삼성전자 주식회사 Server for updating model of terminal, and operating method therefor
WO2022116439A1 (en) * 2020-12-02 2022-06-09 平安科技(深圳)有限公司 Federated learning-based ct image detection method and related device
CN114819192A (en) * 2022-06-28 2022-07-29 医渡云(北京)技术有限公司 Federal learning method and device, computer readable storage medium and electronic equipment
CN114827289A (en) * 2022-06-01 2022-07-29 深圳大学 Communication compression method, system, electronic device and storage medium
US11455425B2 (en) 2020-10-27 2022-09-27 Alipay (Hangzhou) Information Technology Co., Ltd. Methods, apparatuses, and systems for updating service model based on privacy protection
CN115699207A (en) * 2021-11-01 2023-02-03 豪夫迈·罗氏有限公司 Federal learning of medical verification models
WO2023036184A1 (en) * 2021-09-08 2023-03-16 Huawei Cloud Computing Technologies Co., Ltd. Methods and systems for quantifying client contribution in federated learning
WO2024028196A1 (en) * 2022-08-02 2024-02-08 F. Hoffmann-La Roche Ag Methods for training models in a federated system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090216996A1 (en) * 2008-02-22 2009-08-27 Isis Innovation Limited Parallel Processing
CN107871160A (en) * 2016-09-26 2018-04-03 谷歌公司 Communication-efficient federated learning
CN110909865A (en) * 2019-11-18 2020-03-24 福州大学 Federated learning method based on hierarchical tensor decomposition in edge calculation
CN111079977A (en) * 2019-11-18 2020-04-28 中国矿业大学 Heterogeneous federated learning mine electromagnetic radiation trend tracking method based on SVD algorithm

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090216996A1 (en) * 2008-02-22 2009-08-27 Isis Innovation Limited Parallel Processing
CN107871160A (en) * 2016-09-26 2018-04-03 谷歌公司 Communication-efficient federated learning
CN110909865A (en) * 2019-11-18 2020-03-24 福州大学 Federated learning method based on hierarchical tensor decomposition in edge calculation
CN111079977A (en) * 2019-11-18 2020-04-28 中国矿业大学 Heterogeneous federated learning mine electromagnetic radiation trend tracking method based on SVD algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KAI YANG et al.: "Federated Learning via Over-the-Air Computation" *

Cited By (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022039494A1 (en) * 2020-08-20 2022-02-24 삼성전자 주식회사 Server for updating model of terminal, and operating method therefor
CN112348192A (en) * 2020-09-18 2021-02-09 同盾控股有限公司 Knowledge reasoning method, system, device and medium based on knowledge federation
CN112232519A (en) * 2020-10-15 2021-01-15 成都数融科技有限公司 Joint modeling method based on federal learning
CN112232518B (en) * 2020-10-15 2024-01-09 成都数融科技有限公司 Lightweight distributed federal learning system and method
CN112232518A (en) * 2020-10-15 2021-01-15 成都数融科技有限公司 Lightweight distributed federated learning system and method
CN112232519B (en) * 2020-10-15 2024-01-09 成都数融科技有限公司 Joint modeling method based on federal learning
US11455425B2 (en) 2020-10-27 2022-09-27 Alipay (Hangzhou) Information Technology Co., Ltd. Methods, apparatuses, and systems for updating service model based on privacy protection
CN112348200A (en) * 2020-11-02 2021-02-09 中国科学院信息工程研究所 Controlled shared learning method and system based on federal learning
CN112348200B (en) * 2020-11-02 2022-11-15 中国科学院信息工程研究所 Controlled shared learning method and system based on federal learning
CN112651592A (en) * 2020-11-27 2021-04-13 科技谷(厦门)信息技术有限公司 Enterprise credit assessment system based on multimodal transport
CN112651592B (en) * 2020-11-27 2022-05-06 科技谷(厦门)信息技术有限公司 Enterprise credit assessment system based on multimodal transport
CN112532451A (en) * 2020-11-30 2021-03-19 安徽工业大学 Layered federal learning method and device based on asynchronous communication, terminal equipment and storage medium
WO2022116439A1 (en) * 2020-12-02 2022-06-09 平安科技(深圳)有限公司 Federated learning-based ct image detection method and related device
CN112560088B (en) * 2020-12-11 2024-05-28 同盾控股有限公司 Knowledge federation-based data security exchange method, device and storage medium
CN112560088A (en) * 2020-12-11 2021-03-26 同盾控股有限公司 Knowledge federation-based data security exchange method and device and storage medium
CN112668726B (en) * 2020-12-25 2023-07-11 中山大学 Personalized federal learning method with efficient communication and privacy protection
CN112668726A (en) * 2020-12-25 2021-04-16 中山大学 Personalized federal learning method with efficient communication and privacy protection
CN112906903A (en) * 2021-01-11 2021-06-04 北京源堡科技有限公司 Network security risk prediction method and device, storage medium and computer equipment
CN112906903B (en) * 2021-01-11 2024-02-09 北京源堡科技有限公司 Network security risk prediction method and device, storage medium and computer equipment
CN112817940A (en) * 2021-02-07 2021-05-18 上海嗨普智能信息科技股份有限公司 Gradient compression-based federated learning data processing system
CN112732470B (en) * 2021-03-29 2021-07-06 南方电网数字电网研究院有限公司 Federal learning reliability assessment method and device for electric energy data
CN113077056A (en) * 2021-03-29 2021-07-06 上海嗨普智能信息科技股份有限公司 Data processing system based on horizontal federal learning
CN112732470A (en) * 2021-03-29 2021-04-30 南方电网数字电网研究院有限公司 Federal learning reliability assessment method and device for electric energy data
CN113112027B (en) * 2021-04-06 2024-07-16 杭州电子科技大学 Federal learning method based on dynamic adjustment of model aggregation weight
CN113112027A (en) * 2021-04-06 2021-07-13 杭州电子科技大学 Federal learning method based on dynamic adjustment model aggregation weight
CN113139662B (en) * 2021-04-23 2023-07-14 深圳市大数据研究院 Global and local gradient processing method, device, equipment and medium for federal learning
CN113139662A (en) * 2021-04-23 2021-07-20 深圳市大数据研究院 Global and local gradient processing method, device, equipment and medium for federal learning
CN113469367A (en) * 2021-05-25 2021-10-01 华为技术有限公司 Method, device and system for federated learning
CN113469367B (en) * 2021-05-25 2024-05-10 华为技术有限公司 Federal learning method, device and system
CN113177674A (en) * 2021-05-28 2021-07-27 恒安嘉新(北京)科技股份公司 Phishing early warning method, device, equipment and medium
CN113240512A (en) * 2021-06-15 2021-08-10 中国银行股份有限公司 Method, device, readable medium and equipment for constructing risk prediction model
CN113435604A (en) * 2021-06-16 2021-09-24 清华大学 Method and device for optimizing federated learning
CN113435604B (en) * 2021-06-16 2024-05-07 清华大学 Federal learning optimization method and device
WO2023036184A1 (en) * 2021-09-08 2023-03-16 Huawei Cloud Computing Technologies Co., Ltd. Methods and systems for quantifying client contribution in federated learning
CN113887747A (en) * 2021-10-21 2022-01-04 新智我来网络科技有限公司 Data fusion method and device based on joint learning
CN115699207A (en) * 2021-11-01 2023-02-03 豪夫迈·罗氏有限公司 Federal learning of medical verification models
CN115699207B (en) * 2021-11-01 2024-04-26 豪夫迈·罗氏有限公司 Federal learning of medical validation models
CN114827289A (en) * 2022-06-01 2022-07-29 深圳大学 Communication compression method, system, electronic device and storage medium
CN114819192A (en) * 2022-06-28 2022-07-29 医渡云(北京)技术有限公司 Federal learning method and device, computer readable storage medium and electronic equipment
WO2024028196A1 (en) * 2022-08-02 2024-02-08 F. Hoffmann-La Roche Ag Methods for training models in a federated system

Also Published As

Publication number Publication date
CN111553484B (en) 2023-09-08

Similar Documents

Publication Publication Date Title
CN111553484A (en) Method, device and system for federal learning
CN111553483B (en) Federal learning method, device and system based on gradient compression
Ozfatura et al. Speeding up distributed gradient descent by utilizing non-persistent stragglers
US11042358B2 (en) Secure computation system, secure computation method, secure computation apparatus, distribution information generation apparatus, and methods and programs therefor
CN113469373B (en) Model training method, system, equipment and storage medium based on federal learning
CN112712182B (en) Model training method and device based on federal learning and storage medium
CN113011581B (en) Neural network model compression method and device, electronic equipment and readable storage medium
CN112926897A (en) Client contribution calculation method and device based on federal learning
US20210056416A1 (en) Distributed Deep Learning System
CN108334945A (en) The acceleration of deep neural network and compression method and device
CN105245343B (en) A kind of online static signature system and method based on multivariable cryptographic technique
CN116739079B (en) Self-adaptive privacy protection federal learning method
CN112449009B (en) SVD-based communication compression method and device for Federal learning recommendation system
CN108710109B (en) Vehicle-mounted radar frequency band allocation method and system
Xu et al. Hybrid pruning: Thinner sparse networks for fast inference on edge devices
CN111723947A (en) Method and device for training federated learning model
CN112600697B (en) QoS prediction method and system based on federal learning, client and server
CN114118406A (en) Quantitative compression method of convolutional neural network
CN115758643A (en) Network flow prediction method and device based on temporal-spatial feature fusion and storage medium
Hidayat et al. Privacy-Preserving Federated Learning With Resource Adaptive Compression for Edge Devices
Wang et al. Coded alternating least squares for straggler mitigation in distributed recommendations
CN114862416B (en) Cross-platform credit evaluation method in federal learning environment
CN113254215B (en) Data processing method and device, storage medium and electronic equipment
CN115392491A (en) Model training method and device based on knowledge distillation and federal learning
CN115374863A (en) Sample generation method, sample generation device, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant