CN111553483A - Gradient compression-based federated learning method, device and system - Google Patents


Info

Publication number: CN111553483A (application CN202010370062.7A); granted as CN111553483B
Authority: CN (China)
Language: Chinese (zh)
Inventors: 岑园园, 李宏宇, 李晓林
Applicant and current assignee: Tongdun Holdings Co Ltd
Legal status: Granted; Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning


Abstract

The application discloses a gradient compression-based federated learning method, device, and system. In the system, the federated server sends the same initial model to each client; each client trains the initial model to obtain an updated gradient, quantizes the updated gradient to obtain its quantized gradient, and sends the quantized gradient to the federated server; the federated server counts, across all clients' quantized gradients, the number of occurrences of each distinct quantized value to obtain a statistical result, and returns the statistical result to each client; each client receives the statistical result sent by the federated server, calculates the global update gradient from it, and continues training the model with the global update gradient. The method and device aim to solve the problems of high network-transmission overhead and unsafe data in existing federated learning systems.

Description

Gradient compression-based federated learning method, device and system
Technical Field
The application relates to the technical field of machine learning, and in particular to a gradient compression-based federated learning method, device, and system.
Background
Federated learning is a learning paradigm in which data is distributed among different entities. In a federated learning system, data is distributed across clients; the federated server and the clients initialize the same model (e.g., a neural network) with the same initial parameters. Each client trains on its local data set to obtain the gradient of the model update (the gradient of the model parameters) and sends it to the federated server; the server collects the updated gradients from all clients, averages them, and returns the resulting global gradient to each client, which then uses it to train the model.
In applying the federated learning system, the inventors found the following problems: network transmission is required between clients and server, and as the number of clients grows, network-transmission overhead becomes the bottleneck for system performance. In addition, because the federated server collects each client's updated gradient, that gradient is visible to the server, creating a security problem: according to existing studies, the original data can be restored from the gradient, leaking a client's raw data.
Disclosure of Invention
The main purpose of the present application is to provide a gradient compression-based federated learning method, apparatus, and system, so as to solve the problems of high network-transmission overhead and unsafe data in existing federated learning systems.
To achieve the above object, according to a first aspect of the present application, there is provided a method of federated learning based on gradient compression.
The method for gradient compression-based federated learning according to the present application includes:
a client quantizing its updated gradient to obtain the quantized gradient of the updated gradient, where the updated gradient is the model gradient obtained after the client trains the initial model, and the initial model is the model the federated server sends to every client;
sending the quantized gradient to the federated server, so that the federated server can count, across all clients' quantized gradients, the number of occurrences of each distinct quantized value to obtain a statistical result;
and receiving the statistical result sent by the federated server, and calculating a global update gradient from the statistical result so as to continue model training with the global update gradient.
Optionally, before sending the quantized gradient to the federated server, the method further includes:
the client and the other clients determining a common encryption scheme;
and encrypting the quantized gradient according to the encryption scheme.
Optionally, the encryption scheme is a preset permutation;
the encrypting the quantized gradient according to the encryption scheme includes: rearranging the quantized values in the quantized gradient according to the preset permutation.
Optionally, before calculating the global update gradient from the statistical result, the method further includes:
decrypting the statistical result.
Optionally, the client quantizing the updated gradient to obtain the quantized gradient of the updated gradient includes:
the client sending a first maximum, i.e., the maximum absolute value of the components of its updated gradient, to the federated server, so that the federated server can determine a second maximum from the first maxima of all clients, the second maximum being the largest of all the first maxima;
and quantizing the updated gradient according to the second maximum returned by the federated server to obtain the quantized gradient of the updated gradient.
To achieve the above object, according to a second aspect of the present application, there is also provided a method of federated learning based on gradient compression.
The method for gradient compression-based federated learning according to the present application includes:
the federated server sending the same initial model to every client, so that each client can obtain its own updated gradient after training the initial model;
receiving the quantized gradient sent by each client, and counting, across all clients' quantized gradients, the number of occurrences of each distinct quantized value to obtain a statistical result, where a quantized gradient is obtained by a client quantizing its updated gradient;
and returning the statistical result to every client, so that each client can calculate the global update gradient from the statistical result.
Optionally, counting the number of occurrences of each distinct quantized value across all clients' quantized gradients to obtain a statistical result includes:
counting the number of each distinct quantized value separately at every component position, where a component position refers to the same position in every client's updated gradient.
Optionally, before receiving the quantized gradients sent by the clients, the method further includes:
receiving from each client a first maximum, i.e., the maximum absolute value of the components of that client's updated gradient;
determining a second maximum from all the first maxima, the second maximum being the largest of all the first maxima;
and returning the second maximum to every client, so that each client can quantize its updated gradient according to the second maximum.
To achieve the above object, according to a third aspect of the present application, there is provided an apparatus for gradient compression-based federal learning.
The device for gradient compression-based federal learning according to the application comprises:
a quantization unit, configured to quantize the updated gradient to obtain the quantized gradient of the updated gradient, where the updated gradient is the model gradient obtained after the client trains the initial model, and the initial model is the model the federated server sends to every client;
a sending unit, configured to send the quantized gradient to the federated server, so that the federated server can count, across all clients' quantized gradients, the number of occurrences of each distinct quantized value to obtain a statistical result;
and a training unit, configured to receive the statistical result sent by the federated server and calculate a global update gradient from the statistical result, so as to continue model training with the global update gradient.
Optionally, the apparatus further comprises:
a determining unit, configured to determine, together with the other clients, a common encryption scheme before the quantized gradient is sent to the federated server;
and an encryption unit, configured to encrypt the quantized gradient according to the encryption scheme.
Optionally, the encryption scheme is a preset permutation, and the encryption unit is configured to:
rearrange the quantized values in the quantized gradient according to the preset permutation.
Optionally, the apparatus further comprises:
and the decryption unit is used for decrypting the statistical result before calculating the global update gradient according to the statistical result.
Optionally, the quantization unit includes:
a sending module, configured to send, by the client, a first maximum, i.e., the maximum absolute value of the components of its updated gradient, to the federated server, so that the federated server determines a second maximum from the first maxima corresponding to all clients, the second maximum being the largest of all the first maxima;
and a quantization module, configured to quantize the updated gradient according to the second maximum returned by the federated server to obtain the quantized gradient of the updated gradient.
In order to achieve the above object, according to a fourth aspect of the present application, there is also provided an apparatus for gradient compression-based federal learning.
The device for gradient compression-based federal learning according to the application comprises:
a sending unit, configured for the federated server to send the same initial model to every client, so that each client can obtain its own updated gradient after training the initial model;
a statistical unit, configured to receive the quantized gradient sent by each client and count, across all clients' quantized gradients, the number of occurrences of each distinct quantized value to obtain a statistical result, where a quantized gradient is obtained by a client quantizing its updated gradient;
and a first returning unit, configured to return the statistical result to every client, so that each client can calculate the global update gradient from the statistical result.
Optionally, the statistical unit is configured to:
count the number of each distinct quantized value separately at every component position, where a component position refers to the same position in every client's updated gradient.
Optionally, the apparatus further comprises:
a receiving unit, configured to receive from each client, before receiving the quantized gradients sent by the clients, a first maximum, i.e., the maximum absolute value of the components of that client's updated gradient;
a determining unit, configured to determine a second maximum value according to all the first maximum values, where the second maximum value is a maximum value of all the first maximum values;
and the second returning unit is used for returning the second maximum value to each client so that each client can perform quantization processing on the updated gradient according to the second maximum value.
In order to achieve the above object, according to a fifth aspect of the present application, there is provided a gradient compression-based federated learning system, the system including clients and a federated server;
the clients are used for executing the method for gradient compression-based federated learning of any one of the first aspects;
the federated server is configured to execute the method for gradient compression-based federated learning according to any one of the foregoing second aspects.
In order to achieve the above object, according to a sixth aspect of the present application, there is provided a computer-readable storage medium storing computer instructions for causing the computer to execute the method for gradient compression-based federated learning according to any one of the first and second aspects.
In the gradient compression-based federated learning method, device, and system of the present application, the federated server sends the same initial model to each client; each client trains the initial model to obtain an updated gradient, quantizes the updated gradient, and sends the quantized gradient to the federated server; the federated server counts, across all clients' quantized gradients, the number of occurrences of each distinct quantized value to obtain a statistical result, and returns the statistical result to each client; each client receives the statistical result sent by the federated server, calculates the global update gradient from it, and continues training the model with the global update gradient. The model gradient transmitted between a client and the federated server is thus a quantized gradient, which occupies far less space, so the communication traffic between clients and the federated server is reduced and performance improves, solving the problem of high network-transmission overhead in existing federated learning systems. In addition, because the quantized model gradient rather than the original model gradient is transmitted between client and server, the risk of the original data being recovered from the original model gradient is effectively avoided, ensuring the safety of each client's local data.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, provide a further understanding of the application and make its other features, objects, and advantages more apparent. The drawings and their description illustrate embodiments of the application and do not limit it. In the drawings:
FIG. 1 is a flow chart of a method for gradient compression-based federated learning according to an embodiment of the present application;
FIG. 2 is a flow chart of another method for gradient compression-based federated learning provided in accordance with an embodiment of the present application;
FIG. 3 is a flow chart of yet another method for gradient compression-based federated learning provided in accordance with an embodiment of the present application;
FIG. 4 is a flow chart of yet another method for gradient compression-based federated learning provided in accordance with an embodiment of the present application;
FIG. 5 is a flow chart of yet another method for gradient compression-based federated learning provided in accordance with an embodiment of the present application;
FIG. 6 is a block diagram illustrating components of a gradient compression-based federated learning apparatus provided in accordance with an embodiment of the present application;
FIG. 7 is a block diagram illustrating components of another apparatus for gradient compression based federated learning provided in accordance with an embodiment of the present application;
FIG. 8 is a block diagram illustrating a gradient compression-based federated learning apparatus according to an embodiment of the present application;
FIG. 9 is a block diagram illustrating a further apparatus for gradient compression based federated learning provided in accordance with an embodiment of the present application;
fig. 10 is a block diagram of a system for gradient compression-based federated learning according to an embodiment of the present application.
Detailed Description
To make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the possible embodiments of the present application. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description, claims, and drawings of this application are used to distinguish similar elements and do not necessarily describe a particular sequence or chronological order. Data so designated may be interchanged where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion: a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements, but may include other steps or elements not expressly listed or inherent to it.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
According to the embodiment of the application, a method for gradient compression-based federated learning is provided, which is applied to a client side as shown in fig. 1, and the method comprises the following steps:
and S101, the client performs quantization processing on the updated gradient to obtain the quantization gradient of the updated gradient.
In the initial stage, the federated server sends the same initial model to every client; each client then trains the model on its local data to obtain its updated gradient, i.e., the model gradient after the client has trained the initial model. The initial model and its parameters are identical across the clients and the federated server; after each client trains on its own local data, the model parameters change, and so do the gradients of those parameters. The model gradient in this embodiment is the gradient of the model parameters. The model can be a neural network or any other model to be trained by federated learning. For example, in a loan service, a user's credit score must be evaluated, but the evaluation has to draw on multiple organizations or platforms whose data is not shared; federated learning solves this problem and ultimately yields a credit-score evaluation model. Here the organizations or platforms correspond to the clients, and the credit evaluation model corresponds to the model.
Quantization is the process of approximating the continuous values of a signal (or a large number of possible discrete values) by a finite, smaller set of discrete values. Quantizing the updated gradient in this embodiment means replacing the many distinct values of the original updated gradient with a small set of values. As a concrete example, suppose the client's updated gradient vector is [v1, v2, …, vN] and the small value set is {1, 0, -1}; quantization converts each of v1, v2, …, vN into one of 1, 0, -1. In practice the set {1, 0, -1} can be adapted to actual requirements; this example is only illustrative.
The updated gradient is quantized in order to compress the model gradient transmitted between the client and the federated server and thereby reduce communication traffic.
S102, the client sends the quantized gradient to the federated server.
The quantized gradients are sent to the federated server so that it can count, across all clients' quantized gradients, the number of occurrences of each distinct quantized value, obtain the statistical result, and return it to every client. Continuing the example above: each component of a quantized gradient vector is one of 1, 0, -1, and these are the quantized values. "Counting all clients' quantized gradients by distinct quantized value" means counting, at each component, the number of 1s, the number of 0s, and the number of -1s. Here "each component" refers to the component at the same position across clients; the gradient vectors of all clients have the same dimension. For instance, suppose there are 4 clients A, B, C, D whose updated gradient vectors are [v11, v21, …, vN1], [v12, v22, …, vN2], [v13, v23, …, vN3], and [v14, v24, …, vN4]; then the counts of 1, 0, -1 are taken over v11, v12, v13, v14; over v21, v22, v23, v24; …; and over vN1, vN2, vN3, vN4.
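The per-component counting described above can be sketched as follows. This is an illustrative reconstruction, not code from the patent; the client data and function name are hypothetical.

```python
from collections import Counter

def count_per_component(quantized_grads):
    """For equal-length client vectors with entries in {1, 0, -1},
    count at each component position how many clients sent each value."""
    return [Counter(column) for column in zip(*quantized_grads)]

# Four clients A, B, C, D with 3-component quantized gradients:
grads = [
    [1, 0, -1],   # client A
    [1, 1, -1],   # client B
    [0, 1, -1],   # client C
    [-1, 0, -1],  # client D
]
stats = count_per_component(grads)
# stats[0] holds the counts of 1, 0, -1 at the first component position
```

Note that a `Counter` returns 0 for a value that never occurred, so the server need not special-case components where some quantized value is absent.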
S103, the client receives the statistical result sent by the federated server and calculates the global update gradient from it, so as to continue model training with the global update gradient.
Continuing the example above: after receiving the statistical result, the client calculates the global update gradient from the number of 1s, the number of 0s, and the number of -1s in the result, together with the total number of clients. Every client obtains the same global update gradient.
After obtaining the global update gradient, the client continues training the model with it to obtain a new round of updated gradients, and the above steps repeat until model training is finished.
In addition, during communication between the client and the federated server, the quantized model gradient rather than the original model gradient is transmitted, which effectively avoids the risk of the original data being recovered from the original model gradient and thus ensures the safety of the client's local data.
From the above description it can be seen that in the gradient compression-based federated learning method of this embodiment, the federated server sends the same initial model to each client; each client trains the initial model to obtain an updated gradient, quantizes it, and sends the quantized gradient to the federated server; the federated server counts, across all clients' quantized gradients, the number of occurrences of each distinct quantized value to obtain a statistical result, and returns it to each client; each client receives the statistical result, calculates the global update gradient from it, and continues training the model with the global update gradient. The model gradient transmitted between a client and the federated server is thus a quantized gradient, which occupies far less space, so communication traffic is reduced and performance improves, solving the problem of high network-transmission overhead in existing federated learning systems. Moreover, because the quantized model gradient rather than the original model gradient is transmitted, the risk of the original data being recovered from the original model gradient is effectively avoided, ensuring the safety of each client's local data.
Further, as a supplement and refinement of the above embodiment, a gradient compression-based federated learning method applied on the client side is also provided, as shown in fig. 2, including the following steps:
S201, the client sends a first maximum, i.e., the maximum absolute value of the components of its updated gradient, to the federated server.
The updated gradient has the same meaning as in fig. 1 and is not described again here. The first maximum is sent so that the federated server can determine, from the first maxima of all clients (one per client; different clients may report different first maxima), a second maximum, the largest of them all. The second maximum is then used to quantize the updated gradient.
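The two-stage maximum can be sketched as follows; this is an illustrative reconstruction, and the function names and sample values are not from the patent.

```python
def client_first_max(gradient):
    """Each client reports the maximum absolute value of its gradient components."""
    return max(abs(v) for v in gradient)

def server_second_max(first_maxes):
    """The server takes the maximum over all clients' first maxima."""
    return max(first_maxes)

# Two hypothetical clients:
B = server_second_max([client_first_max([0.2, -0.9, 0.4]),
                       client_first_max([-0.5, 0.3, 0.1])])
# B == 0.9, the global bound used in the quantization step below
```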
S202, the client quantizes the updated gradient according to the second maximum returned by the federated server to obtain the quantized gradient of the updated gradient.
The quantization process in this step is described using the example from step S101 of fig. 1. Suppose the updated gradient obtained by the client is the vector [v1, v2, …, vN]; each component is converted into one of 1, 0, -1 according to the following rule, where for convenience the second maximum is denoted B:
If vi (i = 1, 2, …, N) is greater than 0, then after quantization vi becomes 1 with probability vi/B and 0 with probability 1 - vi/B (the closer the absolute value of vi is to B, the greater the probability of becoming 1). If vi is less than 0, then vi becomes -1 with probability -vi/B and 0 with probability 1 + vi/B (the closer the absolute value of vi is to B, the greater the probability of becoming -1). If vi equals 0, it remains 0 after quantization. After quantizing in this way, the client's quantized gradient vector is [u1, u2, …, uN], where each ui is one of 1, 0, -1. Each quantized component can be represented with 2 bits, greatly reducing the space occupied.
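A minimal sketch of this stochastic quantization rule follows; it is an illustrative reconstruction, and the function name and the use of Python's random module are assumptions, not part of the patent.

```python
import random

def quantize(gradient, B, rng=random):
    """Quantize each component to {1, 0, -1}:
    v > 0  -> 1 with probability v/B, else 0;
    v < 0  -> -1 with probability -v/B, else 0;
    v == 0 -> stays 0."""
    out = []
    for v in gradient:
        if v > 0:
            out.append(1 if rng.random() < v / B else 0)
        elif v < 0:
            out.append(-1 if rng.random() < -v / B else 0)
        else:
            out.append(0)
    return out

# Components at the bound always quantize to +/-1, since random() < 1.0 holds:
print(quantize([5.0, -5.0, 0.0], 5.0))  # [1, -1, 0]
```

This quantizer is unbiased: the expected value of each quantized component, scaled by B, equals the original component, which is why averaging the counts later recovers an unbiased global gradient.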
S203, the client and the other clients determine a common encryption scheme and encrypt the quantized gradient according to it.
The quantized gradient is encrypted to further secure the data. The encryption scheme is negotiated jointly by the clients training the model. It may be homomorphic encryption, or rearrangement according to a preset permutation. Homomorphic encryption is a cryptographic technique based on the computational complexity of mathematical problems: processing homomorphically encrypted data produces an output which, when decrypted, equals the output of applying the same processing to the unencrypted original data. Homomorphic encryption requires choosing a homomorphic encryption function, which may be additively homomorphic, multiplicatively homomorphic, and so on. A concrete preset permutation is to permute the values 0, 1, -1 in an agreed manner, that is, to rearrange the quantized values ui (i = 1, 2, …, N) of the obtained quantized gradient vector according to the agreed permutation.
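One way to realize the preset-permutation scheme is a value substitution over {1, 0, -1} agreed by all clients and unknown to the server. The sketch below is an illustrative assumption; the patent does not fix a concrete mapping, and the mapping and function name here are hypothetical.

```python
# Hypothetical substitution agreed by all clients (the server never learns it):
MAPPING = {1: 0, 0: -1, -1: 1}

def encrypt(quantized):
    """Replace every quantized value by its agreed substitute."""
    return [MAPPING[u] for u in quantized]

print(encrypt([1, 0, -1, 0]))  # [0, -1, 1, -1]
```

Because the server only counts occurrences of each value, it can still produce its statistics over the substituted values without knowing which plain value each one stands for.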
S204, the client sends the quantized gradient to the federated server.
Specifically, the encrypted quantized gradient is sent, so that the federated server can perform the count statistics on the encrypted quantized gradients sent by the clients and return the statistical result to every client.
The counting in this step is implemented as in step S103 of fig. 1, the only difference being that the quantized gradients are replaced by encrypted quantized gradients.
And S205, decrypting the statistical result returned by the federal server.
The encryption scheme may be homomorphic encryption or rearrangement according to a preset permutation, and the corresponding decryption must match the specific scheme used. For a statistical result obtained from homomorphically encrypted quantized gradients, the result is decrypted with the decryption operation corresponding to the encryption function, recovering the statistics before encryption. For the permutation-based scheme, the corresponding decryption applies the inverse permutation to the statistical result returned by the federated server, recovering the statistics before rearrangement.
And S206, calculating a global updating gradient according to the statistical result.
In this embodiment, the global update gradient is defined as the average of the clients' gradients. The principle of this step is explained with the foregoing example: the client decrypts the statistics to recover the counts before encryption, then computes the global update gradient from the number of '1's, '0's and '-1's at each component position together with the total number of clients; the global update gradient obtained by every client is the same. Suppose a component position has only k values equal to 1; then this component of the synthesized global update gradient (average gradient) is k × B/M, where M is the total number of clients. If there are only k values equal to -1, this component is -k × B/M; for a component with only 0 values, this component is also 0. If two or three of the values 1, -1, 0 occur at a component, with counts P1, P2, P3 respectively, then this component of the global update gradient is P1 × B/M + (-P2 × B/M) + P3 × 0 = (P1 - P2) × B/M.
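The per-component formula above can be sketched directly (a hypothetical helper name; p1, p2, p3 are the counts of 1, -1 and 0 at one component position):

```python
def global_update_component(p1, p2, p3, B, M):
    """Average-gradient component reconstructed from the server's counts."""
    assert p1 + p2 + p3 == M, "counts must cover all M clients"
    return p1 * B / M + (-p2 * B / M) + p3 * 0   # == (p1 - p2) * B / M

# e.g. 4 clients, B = 1.0: three 1s and one -1 average to (3 - 1) / 4 = 0.5
```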
The synthesized global update gradient differs little from directly averaging the clients' update gradients, and in practical verification it does not affect the final result of model training.
And after obtaining the global updating gradient, the client continues to train the model according to the global updating gradient to obtain a new round of updating gradient, and repeats the steps until the model training is finished.
According to the embodiment of the application, another method for gradient compression-based federal learning is provided, and is applied to the federal server side as shown in fig. 3, and the method comprises the following steps:
S301, the federal server uniformly sends the initial model to each client so that the client can obtain an updated gradient after training the initial model.
In the initial stage, the federated server sends the same initial model to every client, and each client performs model training on its local data to obtain an updated gradient, i.e. the model gradient after the client trains the initial model. The initial model and its parameters are identical across the clients and the federated server; after each client trains on its own local data, the model parameters change and so do the corresponding gradients. The model gradients in this embodiment are the gradients of the model parameters. The model can be any model that needs to be trained by federated learning, such as a neural network model. For example, in a loan service the user's credit score must be evaluated, and the evaluation draws on multiple organizations or platforms whose data cannot be shared; federated learning solves this problem and ultimately yields the credit-score evaluation model. Here the organizations or platforms correspond to the clients, and the credit evaluation model corresponds to the model.
The client quantifies the updated gradient so as to compress the model gradient transmitted between the client and the federal server and reduce the communication traffic.
Quantization is the process of approximating the continuous values of a signal (or a large number of possible discrete values) with a finite, smaller set of discrete values. In this embodiment, quantizing the updated gradient replaces the many original gradient values with a small set of values. As a concrete example, suppose the client's updated gradient vector is [v1, v2, …, vN] and the smaller set of values is {1, 0, -1}; quantization converts each of v1, v2, …, vN into one of 1, 0, -1. In practice the set {1, 0, -1} can be adapted to actual requirements; this example is only illustrative.
S302, receiving the quantization gradients sent by the clients, and performing quantity statistics according to the quantization gradients sent by the clients and different quantization values in the quantization gradients to obtain statistical results.
The process of quantity statistics is explained with the above example: every component of the quantized gradient vector is one of 1, 0, -1, which are the quantization values. "Performing quantity statistics on all clients' quantized gradients according to the different quantization values" means counting, at each component, the number of '1's, the number of '0's, and the number of '-1's. "Each component" refers to the component at the same position across clients; the gradient vectors of all clients have the same dimension. A concrete example: suppose there are 4 clients A, B, C, D with updated gradient vectors [v11, v21, …, vN1], [v12, v22, …, vN2], [v13, v23, …, vN3], [v14, v24, …, vN4]. The statistics count the numbers of 1, 0, -1 among v11, v12, v13, v14; among v21, v22, v23, v24; …; and among vN1, vN2, vN3, vN4.
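The counting in this step can be sketched as follows (a hypothetical helper, assuming NumPy):

```python
import numpy as np

def count_quantized(quantized_gradients):
    """Count the 1s, 0s and -1s at each component position.

    quantized_gradients: list of M ternary vectors of equal length N.
    Returns a dict mapping each quantization value to a length-N count array.
    """
    Q = np.stack(quantized_gradients)            # shape (M, N)
    return {v: (Q == v).sum(axis=0) for v in (1, 0, -1)}

# 4 clients, 3 components: counts[1][0] is the number of 1s at position 0
counts = count_quantized([[1, 0, -1], [1, 1, 0], [0, 1, -1], [1, 0, 0]])
```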
And S303, returning the statistical result to each client so that each client calculates a global update gradient according to the statistical result and continues to train the model according to the global update gradient.
The global update gradient in this embodiment is the average gradient over the clients. For the federated learning scheme here, computing the global update gradient from the statistical result means: computing it from the number of '1's, the number of '0's and the number of '-1's at each component position, together with the total number of clients. The global update gradient computed by every client is ultimately the same.
The calculation principle of the global update gradient is illustrated with a practical example: suppose a component position has only k values equal to 1; then this component of the synthesized global update gradient (average gradient) is k × B/M, where M is the total number of clients. If there are only k values equal to -1, this component is -k × B/M; for a component with only 0 values, this component is also 0. If two or three of the values 1, -1, 0 occur at a component, with counts P1, P2, P3 respectively, then this component of the global update gradient is P1 × B/M + (-P2 × B/M) + P3 × 0 = (P1 - P2) × B/M.
The synthesized global update gradient differs little from directly averaging the clients' update gradients, and in practical verification it does not affect the final result of model training.
And after obtaining the global updating gradient, the client continues to train the model according to the global updating gradient to obtain a new round of updating gradient, and repeats the contents in the previous steps until the model training is finished.
From the above description, it can be seen that in the method for gradient compression-based federated learning according to the embodiment of the present application, the federated server uniformly sends the initial model to each client; the client trains the initial model to obtain an updated gradient, and quantifies the updated gradient and sends the quantized updated gradient to the federal server; the federated server performs quantity statistics on all the quantitative gradients corresponding to the clients according to different quantitative values in the quantitative gradients to obtain statistical results, and returns the statistical results to the clients; and the client receives the statistical result sent by the federal server, calculates the global updating gradient according to the statistical result, and continues to train the model according to the global updating gradient. It can be seen that, in the application, the model gradient transmitted between the client and the federal server is a quantized gradient, and the quantized gradient greatly reduces the occupied space, so that the communication traffic of the client and the federal server can be reduced, the performance can be improved, and the problem of high network transmission overhead in the existing federal learning system is solved. In addition, in the communication process of the client and the federal server, the quantized model gradient is transmitted instead of the original model gradient, so that the risk that the original data can be leaked according to the original model gradient can be effectively avoided, and the safety of the local data of the client is ensured.
Further, as a further supplement and refinement to the above embodiment, there is also provided a method for federal learning based on gradient compression, which is applied to the federal server side, as shown in fig. 4, and includes the following steps:
S401, the federal server uniformly sends the initial model to each client.
In the initial stage, the federated server sends the same initial model to every client, and each client performs model training on its local data to obtain an updated gradient, i.e. the model gradient after the client trains the initial model.
The initial model and its parameters are identical across the clients and the federated server; after each client trains on its own local data, the model parameters change and so do the corresponding gradients. The model gradients in this embodiment are the gradients of the model parameters. The model can be any model that needs to be trained by federated learning, such as a neural network model. For example, in a loan service the user's credit score must be evaluated, and the evaluation draws on multiple organizations or platforms whose data cannot be shared; federated learning solves this problem and ultimately yields the credit-score evaluation model. Here the organizations or platforms correspond to the clients, and the credit evaluation model corresponds to the model.
S402, receiving a first maximum value of the component absolute value of the updated gradient sent by each client.
After obtaining its updated gradient, each client sends the first maximum, i.e. the maximum absolute value among the components of its updated gradient, to the federal server.
This step lays the groundwork for each client to quantize its updated gradient: once the gradient is quantized, it can be sent to the federal server, which ensures transmission security and greatly reduces communication traffic.
And S403, determining a second maximum value in all the first maximum values, and returning the second maximum value to each client.
The federal server records the first maxima received from all clients and selects the largest as the second maximum. It returns the second maximum to each client so that each client can quantize its updated gradient against it.
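The two-step maximum exchange can be sketched as follows (hypothetical helper names, assuming plain Python):

```python
def first_maximum(updated_gradient):
    """Client side: maximum absolute value among the gradient's components."""
    return max(abs(g) for g in updated_gradient)

def second_maximum(first_maxima):
    """Server side: the largest of all clients' reported first maxima."""
    return max(first_maxima)

# the second maximum B is then broadcast back, so every client quantizes
# against the same scale and the counts remain comparable across clients
```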
Quantization is the process of approximating the continuous values of a signal (or a large number of possible discrete values) with a finite, smaller set of discrete values. In this embodiment, quantizing the updated gradient replaces the many original gradient values with a small set of values. As a concrete example, suppose the client's updated gradient vector is [v1, v2, …, vN] and the smaller set of values is {1, 0, -1}; quantization converts each of v1, v2, …, vN into one of 1, 0, -1. In practice the set {1, 0, -1} can be adapted to actual requirements; this example is only illustrative.
A concrete example illustrates the quantization in this step. Suppose the updated gradient obtained by client training is the vector [v1, v2, …, vN]; each of v1, v2, …, vN is converted to 0, 1 or -1 according to the following rules (for convenience, denote the second maximum by B): if vi (i = 1, 2, …, N) is greater than 0, then after quantization vi takes the value 1 with probability vi/B and the value 0 with probability 1 - vi/B (the closer the absolute value of vi is to B, the more likely vi is quantized to 1); if vi is less than 0, vi takes the value -1 with probability -vi/B and the value 0 with probability 1 + vi/B (the closer the absolute value of vi is to B, the more likely vi is quantized to -1); if vi equals 0, it remains 0 after quantization. After quantizing in this way, the quantized gradient vector of the client's updated gradient is [u1, u2, …, uN], where each of u1, u2, …, uN takes one of the values 0, 1 and -1. Each quantized component can be represented with 2 bits, which greatly reduces the space occupied.
After quantization, to further ensure transmission security, each client encrypts the quantized gradient using a uniform encryption scheme agreed by all clients. The specific scheme may be homomorphic encryption or rearrangement according to a preset permutation. Homomorphic encryption is a cryptographic technique based on the computational complexity of mathematical problems: processing homomorphically encrypted data produces an output which, when decrypted, equals the output obtained by applying the same processing to the unencrypted original data. Homomorphic encryption requires setting a homomorphic encryption function and encrypting based on it; the function may be additively homomorphic, multiplicatively homomorphic, and so on. A specific preset permutation may rearrange the values 0, 1, -1 in a predetermined manner, i.e. rearrange the quantized updated gradient vector ui (i = 1, 2, …, N) according to a predetermined order.
S404, receiving the quantization gradients sent by the clients, and performing quantity statistics according to the quantization gradients sent by the clients and different quantization values in the quantization gradients to obtain statistical results.
The implementation of "performing quantity statistics according to the quantization gradients sent by the clients and the different quantization values in the quantization gradients to obtain the statistical result" in this step is similar to that of step S302 in fig. 3; the only difference is that the quantized gradients in this step are encrypted, while those in step S302 are not. For the specific implementation, refer to step S302, which is not repeated here.
And S405, returning the statistical result to each client.
After the statistical result is returned to each client, each client decrypts it and then computes the global update gradient. Specifically, each client decrypts the statistical result to recover the statistics before encryption (the statistics obtained over the unencrypted quantized gradients), and then computes the global update gradient from them. For the specific calculation, see the corresponding implementation of step S303 in fig. 3, which is not repeated here.
And after obtaining the global updating gradient, the client continues to train the model according to the global updating gradient to obtain a new round of updating gradient, and repeats the contents in the previous steps until the model training is finished.
According to an embodiment of the present application, there is also provided a method for gradient compression-based federal learning, including the following steps:
S501, the federal server uniformly sends the initial model to each client.
The implementation of this step is the same as the corresponding implementation of step S301 in fig. 3, and is not described here again.
S502, each client performs model training based on local data to obtain an updated gradient, and performs quantization processing on the updated gradient to obtain a quantized gradient of the updated gradient.
The updated gradient is the model gradient after the client has trained the initial model. The initial model and its parameters are identical across the clients and the federated server; after each client trains on its own local data, the model parameters change and so do the corresponding gradients. The model gradients in this embodiment are the gradients of the model parameters. The model can be any model that needs to be trained by federated learning, such as a neural network model. For example, in a loan service the user's credit score must be evaluated, and the evaluation draws on multiple organizations or platforms whose data cannot be shared; federated learning solves this problem and ultimately yields the credit-score evaluation model. Here the organizations or platforms correspond to the clients, and the credit evaluation model corresponds to the model.
The quantization processing of the updated gradient specifically includes:
1) the client sends the first maximum value of the absolute value of the component of the updated gradient to the federated server.
The implementation of this step is the same as the corresponding implementation of step S201 in fig. 2, and is not described here again.
2) And the federal server receives the first maximum value of the updated component absolute value of the gradient sent by each client, determines a second maximum value in all the first maximum values, and returns the second maximum value to each client.
3) And each client side carries out quantization processing on the updated gradient according to the second maximum value sent by the federal server to obtain the quantized gradient of the updated gradient.
The implementation of this step is the same as the corresponding implementation of step S202 in fig. 2, and is not described here again.
And S503, each client sends the quantization gradient to a federal server.
And S504, the federal server receives the quantization gradients sent by the clients, and performs quantity statistics according to the quantization gradients sent by the clients and different quantization values in the quantization gradients to obtain a statistical result.
The implementation of this step is the same as the corresponding implementation of step S302 in fig. 3, and is not described here again.
And S505, the federal server sends the statistical result to each client.
And S506, calculating a global updating gradient by each client according to the statistical result sent by the federal server.
The implementation of this step is the same as the corresponding implementation of step S206 in fig. 2, and is not described here again.
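The full round of steps S501–S506 can be sketched end to end as a small simulation (under assumed names; the quantization probabilities and the averaging formula follow the description above, and the encryption step is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(42)
M, N = 4, 6                                     # clients, gradient dimension
grads = [rng.normal(size=N) for _ in range(M)]  # stand-ins for local training

# S502 (first part): clients report first maxima; B is the second maximum
B = max(np.abs(g).max() for g in grads)

# S502 (second part): each client stochastically quantizes to {1, 0, -1}
def quantize(v):
    keep = rng.random(v.shape) < np.abs(v) / B
    return np.where(keep, np.sign(v), 0.0)

Q = np.stack([quantize(g) for g in grads])      # shape (M, N)

# S504: the server counts 1s and -1s at every component position
p1 = (Q == 1).sum(axis=0)
p2 = (Q == -1).sum(axis=0)

# S506: every client reconstructs the same global update gradient
global_grad = (p1 - p2) * B / M

# sanity check: the estimate is an unbiased stand-in for the true average
true_avg = np.mean(grads, axis=0)
```

Note that `(p1 - p2) / M` is exactly the mean of the ternary matrix Q along the client axis, so `global_grad` equals the mean quantized gradient rescaled by B, matching the (P1 - P2) × B/M formula in the text.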
From the above description, it can be seen that in the method for gradient compression-based federated learning according to the embodiment of the present application, the federated server uniformly sends the initial model to each client; the client trains the initial model to obtain an updated gradient, and quantifies the updated gradient and sends the quantized updated gradient to the federal server; the federated server performs quantity statistics on all the quantitative gradients corresponding to the clients according to different quantitative values in the quantitative gradients to obtain statistical results, and returns the statistical results to the clients; and the client receives the statistical result sent by the federal server, calculates the global updating gradient according to the statistical result, and continues to train the model according to the global updating gradient. It can be seen that, in the application, the model gradient transmitted between the client and the federal server is a quantized gradient, and the quantized gradient greatly reduces the occupied space, so that the communication traffic of the client and the federal server can be reduced, the performance can be improved, and the problem of high network transmission overhead in the existing federal learning system is solved. In addition, in the communication process of the client and the federal server, the quantized model gradient is transmitted instead of the original model gradient, so that the risk that the original data can be leaked according to the original model gradient can be effectively avoided, and the safety of the local data of the client is ensured.
Further, to ensure data security during transmission, before step S503 the client and the other clients agree on a uniform encryption scheme and encrypt the quantized gradient accordingly (for the encryption implementation, refer to step S203 in fig. 2, not repeated here). The statistical result obtained by the federal server is then computed over encrypted quantized gradients, so before the client computes the global update gradient in step S506 it must decrypt the statistical result (for the decryption implementation, refer to step S205 in fig. 2, not repeated here) to recover the statistics before encryption (the statistics over the unencrypted quantized gradients).
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
According to an embodiment of the present application, there is also provided an apparatus for gradient compression-based federal learning for implementing the method described in fig. 1 to 2, as shown in fig. 6, the apparatus includes:
the quantification unit 61 is configured to perform quantification processing on the updated gradient by the client to obtain a quantified gradient of the updated gradient, where the updated gradient is a model gradient corresponding to an initial model trained by the client, and the initial model is an initial model sent to each client by the federation server;
the sending unit 62 is configured to send the quantization gradients to the federal server, so that the federal server performs quantity statistics on all quantization gradients corresponding to each client according to different quantization values in the quantization gradients, and obtains a statistical result;
and the training unit 63 is configured to receive the statistical result sent by the federal server, and calculate a global update gradient according to the statistical result, so as to continue model training according to the global update gradient.
From the above description, it can be seen that in the device for gradient compression-based federal learning according to the embodiment of the present application, the federal server uniformly sends the initial model to each client; the client trains the initial model to obtain an updated gradient, and quantifies the updated gradient and sends the quantized updated gradient to the federal server; the federated server performs quantity statistics on all the quantitative gradients corresponding to the clients according to different quantitative values in the quantitative gradients to obtain statistical results, and returns the statistical results to the clients; and the client receives the statistical result sent by the federal server, calculates the global updating gradient according to the statistical result, and continues to train the model according to the global updating gradient. It can be seen that, in the application, the model gradient transmitted between the client and the federal server is a quantized gradient, and the quantized gradient greatly reduces the occupied space, so that the communication traffic of the client and the federal server can be reduced, the performance can be improved, and the problem of high network transmission overhead in the existing federal learning system is solved. In addition, in the communication process of the client and the federal server, the quantized model gradient is transmitted instead of the original model gradient, so that the risk that the original data can be leaked according to the original model gradient can be effectively avoided, and the safety of the local data of the client is ensured.
Further, as shown in fig. 7, the apparatus further includes:
a determining unit 64, configured to determine a uniform encryption manner between the client and another client before sending the quantization gradient to the federation server;
an encryption unit 65, configured to perform encryption processing on the quantization gradient in the encryption manner;
further, the encryption scheme is a preset permutation scheme, as shown in fig. 7, the encryption unit 65 is further configured to: and rearranging the quantization values in the quantization gradient according to the permutation and replacement mode.
Further, as shown in fig. 7, the apparatus further includes:
a decryption unit 66, configured to decrypt the statistical result before calculating the global update gradient according to the statistical result.
Further, as shown in fig. 7, the quantization unit 61 includes:
a sending module 611, configured to send, by the client, the first maximum value of the component absolute values of the updated gradient to the federated server, so that the federated server determines a second maximum value from the first maximum values corresponding to the clients, where the second maximum value is the largest of all the first maximum values;
a quantization module 612, configured to perform quantization processing on the updated gradient according to the second maximum value sent by the federation server, so as to obtain a quantization gradient of the updated gradient.
Specifically, the specific process of implementing the functions of each unit and module in the device in the embodiment of the present application may refer to the related description in the method embodiment, and is not described herein again.
According to an embodiment of the present application, there is also provided an apparatus for gradient compression-based federal learning for implementing the method described in fig. 3 to 4, as shown in fig. 8, the apparatus includes:
a sending unit 71, configured to uniformly send the initial model to each client by the federal server, so that each client obtains a corresponding updated gradient after training the initial model;
a statistical unit 72, configured to receive the quantization gradients sent by each client and corresponding to each client, and perform quantity statistics on all quantization gradients corresponding to each client according to different quantization values in the quantization gradients to obtain a statistical result, where the quantization gradients are obtained by performing quantization processing on respective corresponding updated gradients by each client;
the first returning unit 73 is configured to return the statistical result to each client, so that each client calculates a global update gradient according to the statistical result.
From the above description, it can be seen that in the device for gradient compression-based federal learning according to the embodiment of the present application, the federal server uniformly sends the initial model to each client; the client trains the initial model to obtain an updated gradient, and quantifies the updated gradient and sends the quantized updated gradient to the federal server; the federated server performs quantity statistics on all the quantitative gradients corresponding to the clients according to different quantitative values in the quantitative gradients to obtain statistical results, and returns the statistical results to the clients; and the client receives the statistical result sent by the federal server, calculates the global updating gradient according to the statistical result, and continues to train the model according to the global updating gradient. It can be seen that, in the application, the model gradient transmitted between the client and the federal server is a quantized gradient, and the quantized gradient greatly reduces the occupied space, so that the communication traffic of the client and the federal server can be reduced, the performance can be improved, and the problem of high network transmission overhead in the existing federal learning system is solved. In addition, in the communication process of the client and the federal server, the quantized model gradient is transmitted instead of the original model gradient, so that the risk that the original data can be leaked according to the original model gradient can be effectively avoided, and the safety of the local data of the client is ensured.
Further, the statistical unit 72 is configured to:
count, at each component position, the number of occurrences of each distinct quantization value, where each component position refers to the same component position across the updated gradients of all clients.
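The per-component counting performed by the statistical unit can be sketched as follows. This is an illustrative reading of the patent text, assuming each client's quantized gradient is a vector of the same length; names are hypothetical.

```python
# Hypothetical sketch of the server-side "quantity statistics":
# for each component position, count how many clients reported each
# quantization value at that position.
from collections import Counter

def count_quantized_values(client_gradients):
    """client_gradients: equal-length quantized gradients, one per
    client. Returns one Counter per component position."""
    num_components = len(client_gradients[0])
    stats = []
    for pos in range(num_components):
        # The same component position across every client's gradient.
        stats.append(Counter(g[pos] for g in client_gradients))
    return stats

stats = count_quantized_values([[1, -1, 0], [1, 0, 0], [-1, -1, 1]])
# stats[0] == Counter({1: 2, -1: 1})
```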
Further, as shown in fig. 9, the apparatus also includes:
a receiving unit 74, configured to receive from each client, before receiving that client's quantized gradient, a first maximum value, i.e. the largest absolute value among the components of that client's updated gradient;
a determining unit 75, configured to determine a second maximum value from all the first maximum values, the second maximum value being the largest of all the first maximum values;
a second returning unit 76, configured to return the second maximum value to each client, so that each client can quantize its updated gradient according to the second maximum value.
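The two-step maximum exchange handled by units 74 to 76, followed by client-side quantization against the returned global scale, might look like the sketch below. The stochastic rounding to {-1, 0, +1} is one common choice (an assumption for illustration; the patent text does not mandate it), chosen so that the quantized value is unbiased in expectation.

```python
# Illustrative sketch: each client reports the largest absolute
# component of its updated gradient (first maximum); the server takes
# the maximum over all clients (second maximum) and returns it; each
# client then quantizes against that shared scale.
import random

def first_maximum(gradient):
    return max(abs(x) for x in gradient)

def second_maximum(first_maxima):
    return max(first_maxima)

def quantize(gradient, scale):
    """Map each component to {-1, 0, +1} so that the expected value
    of scale * q equals the original component (stochastic rounding)."""
    quantized = []
    for x in gradient:
        p = abs(x) / scale          # probability of a nonzero symbol
        sign = 1 if x >= 0 else -1
        quantized.append(sign if random.random() < p else 0)
    return quantized

grads = [[0.2, -0.8], [0.1, 0.4]]
m = second_maximum([first_maximum(g) for g in grads])  # 0.8
q = quantize(grads[0], m)
# q[1] is deterministically -1 (|x| == scale); q[0] is +1 with
# probability 0.25 and 0 otherwise.
```

Using the maximum over all clients as the scale guarantees every component of every client's gradient falls inside the quantization range.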
For the specific processes by which the units and modules of the device implement their functions, reference may be made to the corresponding descriptions in the method embodiments; they are not repeated here.
According to an embodiment of the present application, there is also provided a gradient compression-based federated learning system for implementing the methods described above with reference to fig. 1 to 5. As shown in fig. 10, the system includes clients 81 and a federated server 82;
the clients 81 are configured to execute the gradient compression-based federated learning method described in the embodiments of fig. 1 to 2;
the federated server 82 is configured to execute the gradient compression-based federated learning method described in the embodiments of fig. 3 to 4.
From the above description it can be seen that, in the gradient compression-based federated learning system of this embodiment of the present application, the federated server sends the same initial model to each client; each client trains the initial model to obtain an updated gradient, quantizes that gradient, and sends the quantized gradient to the federated server; the federated server counts the occurrences of each distinct quantization value over all clients' quantized gradients to obtain a statistical result and returns it to the clients; each client then calculates the global update gradient from the statistical result and continues model training with it. Because only quantized gradients, which occupy far less space, are exchanged between the clients and the federated server, the communication traffic is reduced, performance is improved, and the problem of high network-transmission overhead in existing federated learning systems is alleviated. In addition, because the quantized model gradient rather than the original model gradient is transmitted, the risk of the original data being reconstructed from the original gradient is effectively avoided, ensuring the security of each client's local data.
Further, in the gradient compression-based federated learning system of this embodiment, to further protect the data, each client permutes its quantized updated gradient before sending it to the federated server. The permutation is known only to the clients and not to the federated server, which further secures the gradient transmission between the clients and the server.
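One way such a client-agreed permutation could work is sketched below: clients derive the same permutation of component positions from a shared secret seed, send permuted quantized gradients, and un-permute the server's per-position statistics locally. This works because per-position counting commutes with a fixed permutation of positions. The seed-based construction and all names here are illustrative assumptions, not taken from the patent text.

```python
# Hypothetical sketch of the permutation step known only to clients.
import random

def make_permutation(num_components, secret_seed):
    # Every client with the same seed derives the same permutation.
    perm = list(range(num_components))
    random.Random(secret_seed).shuffle(perm)
    return perm

def apply_permutation(values, perm):
    return [values[i] for i in perm]

def invert_permutation(values, perm):
    out = [None] * len(perm)
    for dst, src in enumerate(perm):
        out[src] = values[dst]
    return out

perm = make_permutation(4, secret_seed=42)
sent = apply_permutation([1, 0, -1, 1], perm)   # what the server sees
assert invert_permutation(sent, perm) == [1, 0, -1, 1]
```

The server still sees the multiset of quantized values, but no longer learns which value belongs to which model parameter.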
According to an embodiment of the present application, there is further provided a computer-readable storage medium storing computer instructions that cause a computer to execute the gradient compression-based federated learning method of the above method embodiments.
According to an embodiment of the present application, there is also provided an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor, where the memory stores a computer program that, when executed by the at least one processor, causes the at least one processor to perform the gradient compression-based federated learning method of the above method embodiments.
It will be apparent to those skilled in the art that the modules or steps of the present application described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of computing devices. They may alternatively be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device, or be fabricated as individual integrated circuit modules, or with multiple of the modules or steps fabricated as a single integrated circuit module. The present application is therefore not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present application and is not intended to limit it; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (10)

1. A method for gradient compression-based federated learning, the method comprising:
a client quantizing an updated gradient to obtain a quantized gradient, where the updated gradient is the model gradient obtained after the client trains an initial model, the initial model having been sent to each client by a federated server;
sending the quantized gradient to the federated server, so that the federated server counts, over the quantized gradients of all clients, the occurrences of each distinct quantization value to obtain a statistical result;
and receiving the statistical result sent by the federated server and calculating a global update gradient from the statistical result, so as to continue model training according to the global update gradient.
2. The method for gradient compression-based federated learning of claim 1, wherein, prior to sending the quantized gradient to the federated server, the method further comprises:
the client and the other clients agreeing on a common encryption scheme;
and encrypting the quantized gradient according to the encryption scheme.
3. The method for gradient compression-based federated learning of claim 2, wherein the encryption scheme is a preset permutation scheme;
and encrypting the quantized gradient according to the encryption scheme comprises: rearranging the quantization values in the quantized gradient according to the preset permutation scheme.
4. The method for gradient compression-based federated learning of claim 2, wherein, prior to calculating the global update gradient from the statistical result, the method further comprises:
decrypting the statistical result.
5. The method for gradient compression-based federated learning of any one of claims 1 to 4, wherein the client quantizing the updated gradient to obtain the quantized gradient comprises:
the client sending to the federated server a first maximum value, i.e. the largest absolute value among the components of its updated gradient, so that the federated server determines a second maximum value from the first maximum values of all clients, the second maximum value being the largest of all the first maximum values;
and quantizing the updated gradient according to the second maximum value returned by the federated server, to obtain the quantized gradient.
6. A method for gradient compression-based federated learning, the method comprising:
a federated server sending the same initial model to each client, so that each client obtains its own updated gradient after training the initial model;
receiving the quantized gradient sent by each client and counting, over the quantized gradients of all clients, the occurrences of each distinct quantization value to obtain a statistical result, where each quantized gradient is obtained by the corresponding client quantizing its own updated gradient;
and returning the statistical result to each client, so that each client can calculate a global update gradient from the statistical result.
7. The method for gradient compression-based federated learning of claim 6, wherein counting the occurrences of each distinct quantization value over the quantized gradients of all clients to obtain the statistical result comprises:
counting, at each component position, the number of occurrences of each distinct quantization value, where each component position refers to the same component position across the updated gradients of all clients.
8. The method for gradient compression-based federated learning of claim 6 or 7, wherein, prior to receiving the quantized gradient sent by each client, the method further comprises:
receiving from each client a first maximum value, i.e. the largest absolute value among the components of that client's updated gradient;
determining a second maximum value from all the first maximum values, the second maximum value being the largest of all the first maximum values;
and returning the second maximum value to each client, so that each client can quantize its updated gradient according to the second maximum value.
9. A system for gradient compression-based federated learning, characterized by comprising clients and a federated server;
the clients being configured to execute the method for gradient compression-based federated learning of any one of claims 1 to 5;
the federated server being configured to execute the method for gradient compression-based federated learning of any one of claims 6 to 8.
10. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method for gradient compression-based federated learning of any one of claims 1 to 8.
CN202010370062.7A 2020-04-30 2020-04-30 Federal learning method, device and system based on gradient compression Active CN111553483B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010370062.7A CN111553483B (en) 2020-04-30 2020-04-30 Federal learning method, device and system based on gradient compression


Publications (2)

Publication Number Publication Date
CN111553483A true CN111553483A (en) 2020-08-18
CN111553483B CN111553483B (en) 2024-03-29

Family

ID=72000433

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010370062.7A Active CN111553483B (en) 2020-04-30 2020-04-30 Federal learning method, device and system based on gradient compression

Country Status (1)

Country Link
CN (1) CN111553483B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871160A (en) * 2016-09-26 2018-04-03 谷歌公司 Communicate efficient joint study
CN110262819A (en) * 2019-06-04 2019-09-20 深圳前海微众银行股份有限公司 A kind of the model parameter update method and device of federal study
US20200027033A1 (en) * 2018-07-19 2020-01-23 Adobe Inc. Updating Machine Learning Models On Edge Servers
CN110874484A (en) * 2019-10-16 2020-03-10 众安信息技术服务有限公司 Data processing method and system based on neural network and federal learning
CN110909865A (en) * 2019-11-18 2020-03-24 福州大学 Federated learning method based on hierarchical tensor decomposition in edge calculation


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JINJIN XU et al.: "Ternary Compression for Communication-Efficient Federated Learning", pages 1-13 *
WEI WEN et al.: "TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning", pages 1508-1518 *
WU Qi et al.: "Edge Learning: Key Technologies, Applications and Challenges", vol. 46, no. 01, pages 6-25 *

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112231746A (en) * 2020-09-10 2021-01-15 杭州锘崴信息科技有限公司 Joint data analysis method, device and system and computer readable storage medium
CN112231746B (en) * 2020-09-10 2024-02-02 杭州锘崴信息科技有限公司 Joint data analysis method, device, system and computer readable storage medium
CN112348200B (en) * 2020-11-02 2022-11-15 中国科学院信息工程研究所 Controlled shared learning method and system based on federal learning
CN112348200A (en) * 2020-11-02 2021-02-09 中国科学院信息工程研究所 Controlled shared learning method and system based on federal learning
CN112465786A (en) * 2020-12-01 2021-03-09 平安科技(深圳)有限公司 Model training method, data processing method, device, client and storage medium
WO2022116502A1 (en) * 2020-12-01 2022-06-09 平安科技(深圳)有限公司 Model training method and device, data processing method and device, client and storage medium
CN112231742B (en) * 2020-12-14 2021-06-18 支付宝(杭州)信息技术有限公司 Model joint training method and device based on privacy protection
CN112231742A (en) * 2020-12-14 2021-01-15 支付宝(杭州)信息技术有限公司 Model joint training method and device based on privacy protection
CN112598127A (en) * 2020-12-16 2021-04-02 百度在线网络技术(北京)有限公司 Federal learning model training method and device, electronic equipment, medium and product
CN112598127B (en) * 2020-12-16 2023-07-25 百度在线网络技术(北京)有限公司 Federal learning model training method and device, electronic equipment, medium and product
CN112732297A (en) * 2020-12-31 2021-04-30 平安科技(深圳)有限公司 Method and device for updating federal learning model, electronic equipment and storage medium
WO2022141839A1 (en) * 2020-12-31 2022-07-07 平安科技(深圳)有限公司 Method and apparatus for updating federated learning model, and electronic device and storage medium
WO2022151654A1 (en) * 2021-01-14 2022-07-21 新智数字科技有限公司 Random greedy algorithm-based horizontal federated gradient boosted tree optimization method
CN112817940A (en) * 2021-02-07 2021-05-18 上海嗨普智能信息科技股份有限公司 Gradient compression-based federated learning data processing system
CN112906052A (en) * 2021-03-09 2021-06-04 西安电子科技大学 Aggregation method of multi-user gradient permutation in federated learning
CN112906052B (en) * 2021-03-09 2022-12-23 西安电子科技大学 Aggregation method of multi-user gradient permutation in federated learning
CN112733967B (en) * 2021-03-30 2021-06-29 腾讯科技(深圳)有限公司 Model training method, device, equipment and storage medium for federal learning
CN112733967A (en) * 2021-03-30 2021-04-30 腾讯科技(深圳)有限公司 Model training method, device, equipment and storage medium for federal learning
CN113098806A (en) * 2021-04-16 2021-07-09 华南理工大学 Method for compressing cooperative channel adaptability gradient of lower end in federated learning
CN113178191A (en) * 2021-04-25 2021-07-27 平安科技(深圳)有限公司 Federal learning-based speech characterization model training method, device, equipment and medium
CN113258935B (en) * 2021-05-25 2022-03-04 山东大学 Communication compression method based on model weight distribution in federated learning
CN113258935A (en) * 2021-05-25 2021-08-13 山东大学 Communication compression method based on model weight distribution in federated learning
CN113487036A (en) * 2021-06-24 2021-10-08 浙江大学 Distributed training method and device of machine learning model, electronic equipment and medium
CN114125070A (en) * 2021-11-10 2022-03-01 深圳大学 Communication method, system, electronic device and storage medium for quantization compression
CN114339252A (en) * 2021-12-31 2022-04-12 深圳大学 Data compression method and device
CN114339252B (en) * 2021-12-31 2023-10-31 深圳大学 Data compression method and device
US11468370B1 (en) 2022-03-07 2022-10-11 Shandong University Communication compression method based on model weight distribution in federated learning
CN114861790B (en) * 2022-04-29 2023-03-17 深圳大学 Method, system and device for optimizing federal learning compression communication
CN114861790A (en) * 2022-04-29 2022-08-05 深圳大学 Method, system and device for optimizing federal learning compression communication
CN114827289B (en) * 2022-06-01 2023-06-13 深圳大学 Communication compression method, system, electronic device and storage medium
CN114827289A (en) * 2022-06-01 2022-07-29 深圳大学 Communication compression method, system, electronic device and storage medium
WO2024050659A1 (en) * 2022-09-05 2024-03-14 华南理工大学 Federated learning lower-side cooperative channel adaptive gradient compression method
CN115643105B (en) * 2022-11-17 2023-03-10 杭州量安科技有限公司 Federal learning method and device based on homomorphic encryption and depth gradient compression
CN115643105A (en) * 2022-11-17 2023-01-24 杭州量安科技有限公司 Federal learning method and device based on homomorphic encryption and depth gradient compression

Also Published As

Publication number Publication date
CN111553483B (en) 2024-03-29

Similar Documents

Publication Publication Date Title
CN111553483B (en) Federal learning method, device and system based on gradient compression
US11017322B1 (en) Method and system for federated learning
US9900147B2 (en) Homomorphic encryption with optimized homomorphic operations
CN111553484A (en) Method, device and system for federal learning
EP3384628B1 (en) Adding privacy to standard credentials
US20230087864A1 (en) Secure multi-party computation method and apparatus, device, and storage medium
CN113515760B (en) Horizontal federal learning method, apparatus, computer device, and storage medium
US10083314B2 (en) Secret parallel processing device, secret parallel processing method, and program
CN111143862B (en) Data processing method, query method, device, electronic equipment and system
CN111931474A (en) Information table generation method and device, electronic equipment and computer readable medium
CN111553486A (en) Information transmission method, device, equipment and computer readable storage medium
JP6607257B2 (en) Secret calculation system, secret calculation device, and secret calculation method
CN113935050A (en) Feature extraction method and device based on federal learning, electronic device and medium
CN112801307B (en) Block chain-based federal learning method and device and computer equipment
CN112434317A (en) Data processing method, device, equipment and storage medium
CN114726524B (en) Target data sorting method and device, electronic equipment and storage medium
CN112668038A (en) Model training method and device and electronic equipment
KR102515682B1 (en) Training method, device, equipment and storage medium for distributed machine learning model
CN111159730A (en) Data processing method, query method, device, electronic equipment and system
CN112668016A (en) Model training method and device and electronic equipment
CN114143311B (en) Privacy protection scheme aggregation method and device based on block chain
CN113392412A (en) Data receiving method, data sending method and electronic equipment
CN112182593A (en) Data processing method and device and electronic equipment
EP3675088B1 (en) Share generating device, share converting device, secure computation system, share generation method, share conversion method, program, and recording medium
CN112668037B (en) Model training method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant