WO2023134065A1

WO2023134065A1 - Gradient compression method and apparatus, device, and storage medium

Info

Publication number: WO2023134065A1
Application number: PCT/CN2022/089866
Authority: WO
Inventors: 李泽远; 王健宗
Original assignee: 平安科技（深圳）有限公司
Priority date: 2022-01-14
Filing date: 2022-04-28
Publication date: 2023-07-20
Also published as: CN114386622A

Abstract

The present application relates to artificial intelligence, and provides a federated learning-based gradient compression method and apparatus, a device, and a storage medium. That is, gradient data in gradient data to be transmitted which has a gradient value of no less than a preset gradient threshold is used as first gradient data; gradient data in the gradient data to be transmitted other than the first gradient data is used as second gradient data, and each piece of gradient data in the second gradient data is compressed according to a 2-bit compression strategy or a 4-bit compression strategy; and the first gradient data and the compressed second gradient data is uploaded to a server. According to the present invention, first gradient data having a high degree of importance is screened according to the size of a gradient value, gradient data having a high degree of importance is completely uploaded, and second gradient data having a low degree of importance is compressed according to a corresponding compression strategy. While the modeling accuracy of a model is ensured, the data amount of transmission gradients is reduced, thereby increasing the gradient transmission efficiency.

Description

Gradient compression method, device, equipment and storage medium

This application claims the priority of the Chinese patent application with the application number 2022100442162 and the title of the invention "gradient compression method, device, equipment and storage medium" submitted to the China Patent Office on January 14, 2022, the entire contents of which are hereby incorporated by reference In this application.

technical field

The present invention relates to the technical field of data processing, in particular to a federated learning-based gradient compression method, device, equipment and computer-readable storage medium.

Background technique

Federated learning is a learning method in which data is distributed under different entities. In a federated learning system, the data is distributed on different clients, and the federated server and the client initialize the same model (such as a neural network model) and the same initial parameters of the model. The client first trains on the local data set to obtain the gradient of the model update (the gradient of the model parameters), and then each client entity sends the gradient to the federated server, and the federated server collects the updated gradients of all client entities, and after averaging Return the obtained global gradient to each client entity, so that each client entity can train the model.

The emergence of federated learning provides a new solution direction between data sharing requirements and privacy protection requirements, so it has received more and more attention. However, the inventor realized that in the joint modeling process of federated learning, as the number of clients increases, the amount of gradient data that needs to be transmitted between the federated server and the client also increases. Therefore, how to solve the problem of low gradient transmission efficiency caused by the large amount of gradient data transmitted during the federated learning modeling process has become an urgent technical problem to be solved.

Contents of the invention

The main purpose of the present invention is to provide a gradient compression method, device, device and computer-readable storage medium based on federated learning, aiming to solve the technical problem of low gradient transmission efficiency caused by the large amount of transmitted gradient data in the federated learning modeling process .

In order to achieve the above object, the present invention provides a gradient compression method based on federated learning, the method comprising: obtaining the gradient data to be transmitted, and using the gradient data whose gradient value is not less than the preset gradient threshold in the gradient data to be transmitted as The first gradient data; use gradient data other than the first gradient data in the gradient data to be transmitted as the second gradient data, and compress the second gradient according to a 2-bit compression strategy or a 4-bit compression strategy Each gradient data in the data is compressed; uploading the first gradient data and the compressed second gradient data to the server.

In addition, in order to achieve the above object, the present invention also provides a gradient compression device based on federated learning, the device includes: a gradient data acquisition module, used to acquire the gradient data to be transmitted, and store the gradient value in the gradient data to be transmitted The gradient data not less than the preset gradient threshold is used as the first gradient data; the gradient data compression module is used to use the gradient data in the gradient data to be transmitted except the first gradient data as the second gradient data, and according to A 2-bit compression strategy or a 4-bit compression strategy compresses each gradient data in the second gradient data; a gradient data upload module, configured to upload the first gradient data and the compressed second gradient data to the server.

In addition, in order to achieve the above object, the present invention also provides a gradient compression device based on federated learning, the gradient compression device based on federated learning includes a processor, a memory, and a The federated learning-based gradient compression program executed, wherein when the federated learning-based gradient compression program is executed by the processor, it realizes:

Obtaining the gradient data to be transmitted, and using the gradient data whose gradient value is not less than a preset gradient threshold in the gradient data to be transmitted as the first gradient data;

Taking gradient data other than the first gradient data in the gradient data to be transmitted as second gradient data, and compressing each of the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy Gradient data is compressed;

Upload the first gradient data and the compressed second gradient data to the server.

In addition, in order to achieve the above object, the present invention also provides a computer-readable storage medium, the computer-readable storage medium stores a gradient compression program based on federated learning, wherein the gradient compression program based on federated learning is processed by the processor When executed, achieve:

The present invention provides a gradient compression method based on federated learning, the method comprising: acquiring gradient data to be transmitted, and using gradient data whose gradient value is not less than a preset gradient threshold in the gradient data to be transmitted as the first gradient data; Taking gradient data other than the first gradient data in the gradient data to be transmitted as second gradient data, and compressing each of the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy Compressing the gradient data; uploading the first gradient data and the compressed second gradient data to the server. Through the above method, the present invention screens out the first gradient data with high importance according to the magnitude of the gradient value, then uploads the gradient data with high importance completely, and compresses the second gradient data with low importance according to 2 bits strategy or 4-bit compression strategy for compression. Therefore, while ensuring the accuracy of model modeling, the amount of transmitted gradient data is reduced, the gradient transmission efficiency is improved, and the existing technology of inefficient gradient transmission caused by the large amount of transmitted gradient data in the modeling process of federated learning is solved. question.

Description of drawings

Fig. 1 is a schematic diagram of the hardware structure of the federated learning-based gradient compression device involved in the solution of the embodiment of the present invention;

Fig. 2 is a schematic flow chart of the first embodiment of the federated learning-based gradient compression method of the present invention;

Fig. 3 is a schematic diagram of functional modules of the first embodiment of the federated learning-based gradient compression device of the present invention.

The realization of the purpose of the present invention, functional characteristics and advantages will be further described in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed ways

It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

The federated learning-based gradient compression method involved in the embodiment of the present invention is mainly applied to a federated learning-based gradient compression device, which may be a PC, a portable computer, a mobile terminal, and other devices with display and processing functions.

Referring to FIG. 1, FIG. 1 is a schematic diagram of the hardware structure of the federated learning-based gradient compression device involved in the solution of the embodiment of the present invention. In the embodiment of the present invention, the gradient compression device based on federated learning may include a processor 1001 (such as a CPU), a communication bus 1002 , a user interface 1003 , a network interface 1004 , and a memory 1005 . Wherein, the communication bus 1002 is used to realize the connection and communication between these components; the user interface 1003 can include a display screen (Display), an input unit such as a keyboard (Keyboard); the network interface 1004 can optionally include a standard wired interface and a wireless interface (such as WI-FI interface); the memory 1005 can be a high-speed RAM memory, or a stable memory (non-volatile memory), such as a disk memory, and the memory 1005 can optionally also be a storage device independent of the aforementioned processor 1001 .

Those skilled in the art can understand that the hardware structure shown in Figure 1 does not constitute a limitation on the gradient compression device based on federated learning, and may include more or less components than those shown in the figure, or combine certain components, or be different layout of the components.

Continuing to refer to FIG. 1 , the memory 1005 as a computer-readable storage medium in FIG. 1 may include an operating system, a network communication module, and a gradient compression program based on federated learning.

In Figure 1, the network communication module is mainly used to connect to the server and perform data communication with the server; and the processor 1001 can call the federated learning-based gradient compression program stored in the memory 1005, and execute the federated learning-based gradient compression program provided by the embodiment of the present invention. The gradient compression method.

An embodiment of the present invention provides a gradient compression method based on federated learning.

Referring to FIG. 2 , FIG. 2 is a schematic flowchart of the first embodiment of the federated learning-based gradient compression method of the present invention.

In this embodiment, the gradient compression method based on federated learning includes the following steps:

Step S10, acquiring the gradient data to be transmitted, and using the gradient data whose gradient value is not less than a preset gradient threshold among the gradient data to be transmitted as the first gradient data;

In order to solve the problem of low gradient transmission efficiency caused by the large amount of gradient data transmitted during the federated learning modeling process caused by uploading complete gradients, this embodiment screens out important gradient data based on the gradient value of the gradient data, that is, the first gradient data, and The important gradient data is completely uploaded, thereby improving the accuracy of the modeling, and the remaining non-important gradient data, that is, the second gradient data, is compressed according to the compression strategy, thereby reducing the amount of transmitted gradient data.

Specifically, from the gradient data corresponding to different models, or the gradient data corresponding to different network layers in the same model, the gradient data corresponding to a network layer of a model is obtained as the gradient data to be transmitted. Calculate the gradient value of each gradient data in the data to be transmitted, that is, the absolute value of the gradient data, and compare the gradient value of each gradient data with a preset gradient threshold. Wherein, the gradient threshold can be determined according to the gradient value corresponding to the important gradient data of the actual model. Among the gradient data to be transmitted, the gradient data whose gradient value is not less than a preset gradient threshold and important gradient data are used as the first gradient data. Thus, the important gradient data and the non-important gradient data are screened out through the gradient value of the gradient data.

Exemplarily, after the step of acquiring the gradient data to be transmitted, and using the gradient data whose gradient value is not less than the preset gradient threshold in the gradient data to be transmitted as the first gradient data, it further includes:

When the data volume of the first gradient data exceeds the data volume threshold, sort the gradient data in the gradient data to be transmitted according to the gradient value;

Gradient data of a target data amount is obtained from each sorted gradient data as updated first gradient data, wherein the target data amount is not greater than the data amount threshold.

In this embodiment, in order to prevent the data volume of the first gradient data from being too large, after the first gradient data is determined, the data volume of the data to be transmitted is compared with a preset data volume threshold. Wherein, the data volume threshold is set according to actual transmission resources. When the transmission resources are large, the data volume threshold can be set larger, and when the transmission resources are small, the data volume threshold can be set small. When the data volume of the first gradient exceeds the data volume threshold, each gradient data in the first gradient data may be sorted in descending order or ascending order of gradient values according to the Top-K method. When sorting the gradient data in the first gradient data in descending order of gradient values (that is, in descending order of gradient values), or sorting the gradient data in the first gradient data in ascending order of gradient values (that is, according to the order of gradient values from small to large), according to the amount of target data, obtain the k pieces of gradient data with the largest gradient value in each sorted gradient data, among which, the k pieces of gradient data with the largest gradient value The amount of data is not greater than the target amount of data. As a result, the first gradient data that needs to be completely uploaded is updated to avoid an increase in modeling overhead caused by an excessive amount of first gradient data. When the data volume of the first gradient data is less than the data volume threshold, there is no need to filter the gradient data, and the first gradient data can be directly uploaded in its entirety.

Step S20, using gradient data other than the first gradient data in the gradient data to be transmitted as second gradient data, and compressing the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy Compress each gradient data of

In this embodiment, gradient data other than the first gradient data in the gradient data to be transmitted is used as second gradient data, that is, non-important gradient data. According to the compression strategy, each gradient data in the first gradient data is compressed into 2 bits (bits) or 4 bits. Among them, the 2-bit compression strategy includes 3 thresholds, the 3 thresholds include 0, a set positive threshold and its opposite number, compress the gradient data less than the set threshold to 0, and compress the gradient data not less than the threshold Data is compressed to this threshold. The 4-bit compression strategy contains 15 thresholds, and the 15 thresholds include 0, 7 set positive thresholds and their opposites (for example, -7, -6, -5, -4, -3, -2, - 1, 0, 1, 2, 3, 4, 5, 6, 7), and compress each gradient data to the corresponding compression threshold. Thus, some accuracy of non-important gradient data is sacrificed to reduce communication overhead.

Exemplarily, the step of compressing each gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy specifically includes:

When the variance of the second gradient data is not less than a preset variance threshold, compress each gradient data in the second gradient data according to the 4-bit compression strategy;

When the variance of the second gradient data is smaller than the variance threshold, each gradient data in the second gradient data is compressed according to the 2-bit compression strategy.

In this embodiment, when the variance of the second gradient data is less than the variance threshold, it means that the volatility of the second gradient data is small, and a 2-bit compression strategy can be used to compress each gradient data in the second gradient data to compress. Divide the 2 bits into a sign bit and a value bit, the first bit is the sign bit, and the other bit is the value bit. 0 in the sign bit represents a positive number and 1 represents a negative number. When the variance of the second gradient data is not less than the preset variance threshold, it means that the volatility of the second gradient data is relatively large. The 2-bit compression strategy only includes 3 thresholds, compressing the second gradient data to 0 or setting the threshold, which greatly reduces the accuracy of the gradient data. Therefore, when the variance of the second gradient data is not less than the variance threshold, a 4-bit compression strategy (including 15 thresholds) is used to compress each gradient data in the second gradient data. Divide the 4 bits into a sign bit and a value bit, the first bit is a sign bit, and the rest are value bits. In the sign bit, 0 represents a positive number, 1 represents a negative number, and the 3-digit value bit stores the compressed value. The 4 bits are the compressed value after gradient compression.

Exemplarily, the step of compressing each gradient data in the second gradient data according to the 2-bit compression strategy includes:

Taking the average gradient value of the second gradient data as the first compression threshold, and compressing each gradient data in the second gradient data to 0, the first compression threshold, or the first compression threshold Negative numbers, completing the compression of each gradient data in the second gradient data.

In this embodiment, the variance of the second gradient data is less than the variance threshold, which means that the difference between the gradient data in the second gradient data is small, and the average gradient value of the second gradient data can be used as the compression threshold of the 2-bit compression strategy, That is, the first compression threshold. First determine whether the gradient data is a positive number or a negative number, if it is a positive number, compare the gradient data with the corresponding set positive threshold (the first compression threshold), if it is less than the first compression threshold, then compress the gradient data is 0, if not less than the first compression threshold, the gradient data is compressed to the first compression threshold. If it is a negative number, compare the gradient data with the opposite number of the corresponding set positive threshold (the opposite number of the first compression threshold), if it is less than the opposite number of the first compression threshold, then compress the gradient data into the first compression threshold If the inverse number of a compression threshold is not smaller than the inverse number of the first compression threshold, the gradient data is compressed to 0.

Exemplarily, the step of compressing each gradient data in the second gradient data according to the 4-bit compression strategy specifically includes:

determining a compression threshold group corresponding to the second gradient data according to the average gradient value of the second gradient data;

Each gradient data in the second gradient data is respectively compressed to a corresponding compression threshold in the compression threshold group, and the compression of each gradient data in the second gradient data is completed.

In this embodiment, the compression threshold group corresponding to the 4-bit compression strategy is determined before and after the average gradient value of the second gradient data, that is, 15 compression thresholds, and one gradient data in the second gradient data is respectively combined with 15 compression thresholds. The thresholds are compared, and the compression threshold closest to the gradient data is determined among the 15 compression thresholds. Wherein, the compression threshold closest to the gradient data may be the compression threshold with the smallest difference with the gradient data, or the minimum value of the compression thresholds greater than the gradient data, or less than the minimum value of the compression thresholds of the gradient data maximum value. By analogy, the compression of each gradient data in the second gradient data is thus completed.

Exemplarily, the step of determining the compression threshold group corresponding to the second gradient data according to the average gradient value of the second gradient data specifically includes:

determining respective second compression thresholds corresponding to the second gradient data according to the preset difference value and the average gradient value;

The set of compression thresholds is generated according to each second compression threshold and an inverse number corresponding to each second compression threshold.

In this embodiment, the variance of the second gradient data is not less than the variance threshold, which means that the gradient data in the second gradient data have a large gap. According to the average gradient value of the second gradient data and the The difference between the minimum gradient value, the average gradient value of the second gradient data and the maximum gradient value in the second gradient data determines the second compression threshold corresponding to the 4-bit compression strategy (i.e. 7 positive compression threshold and its inverse number, and 0, the 15 compression thresholds formed by the 7 positive compression thresholds and their inverse number) as the compression threshold group of the second gradient data.

Exemplarily, the step of compressing each gradient data in the second gradient data into corresponding compression thresholds in the compression threshold group, and completing the compression of each gradient data in the second gradient data includes:

using one gradient data in the second gradient data as target gradient data;

Comparing the target gradient data with each second compression threshold sequentially arranged in the compression threshold group in turn, and determining a target compression threshold corresponding to the target gradient data among each of the second compression thresholds;

Compressing the target gradient data to the target compression threshold, and obtaining the next gradient data in the second gradient data as the target gradient data, and performing: sequentially compressing the target gradient data with the compressed Comparing the second compression thresholds arranged in sequence in the threshold group, and determining the target compression threshold corresponding to the target gradient data in each of the second compression thresholds, until each gradient in the second gradient data is completed Data compression.

In this embodiment, the gradient data in the second gradient data is traversed. First, the positive and negative values of a gradient data are judged, and then the gradient data are compared with the second compression thresholds from small to large, and then the gradient data The data is compressed to the value at the smaller end, and the compressed 4-bit data is the corresponding compressed value of the gradient data.

Wherein, the compression process for completing each gradient data in the second gradient data is:

Obtain one gradient data in the second gradient data as the target gradient data, and perform the target gradient data with the second compression thresholds arranged in order (from large to small or from small to large) in the compression threshold group respectively For comparison, for example, the target gradient data is A, and the respective second compression thresholds are -X ₇ , -X ₆ , -X ₅ , -X ₄ , -X ₃ , -X ₂ , -X ₁ , 0, and X ₁ , X ₂ , X ₃ , X ₄ , X ₅ , X ₆ , X ₇ , wherein, X ₁ -X ₇ increase sequentially. If A>X ₃ and A<X ₄ , compress A to X ₃ .

Each gradient data in the second gradient data is sequentially acquired as target gradient data, and the above steps are repeated until the compression of each gradient data in the second gradient data is completed.

It can be understood that the first compression threshold and the second compression threshold can be preset by the user according to actual needs, and can also be the average gradient value of the system based on the second gradient data (the minimum gradient of the second gradient data can also be further combined value and the difference between the maximum gradient value and the average gradient value) are calculated and determined.

Step S30, uploading the first gradient data and the compressed second gradient data to the server.

In this embodiment, the first gradient data with greater importance is completely uploaded to the server, and then the second gradient data is gradient-compressed and uploaded to the server, thereby converting a gradient data that occupies a large memory into a gradient data that occupies a small Gradient data in memory, thereby reducing the communication overhead occupied by each gradient data transmission.

In addition, an embodiment of the present invention also provides a gradient compression device based on federated learning.

Referring to FIG. 3 , FIG. 3 is a schematic diagram of functional modules of the first embodiment of the federated learning-based gradient compression device of the present invention.

In this embodiment, the gradient compression device based on federated learning includes:

A gradient data acquisition module, configured to acquire gradient data to be transmitted, and use gradient data whose gradient value is not less than a preset gradient threshold among the gradient data to be transmitted as the first gradient data;

A gradient data compression module, configured to use gradient data other than the first gradient data in the gradient data to be transmitted as second gradient data, and compress the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy. Each gradient data in the second gradient data is compressed;

The gradient data uploading module is configured to upload the first gradient data and the compressed second gradient data to the server.

Further, the gradient data compression module specifically includes:

A 4-bit compression unit, configured to compress each gradient data in the second gradient data according to the 4-bit compression strategy when the variance of the second gradient data is not less than a preset variance threshold;

A 2-bit compression unit, configured to compress each gradient data in the second gradient data according to the 2-bit compression strategy when the variance of the second gradient data is smaller than the variance threshold.

Further, the 2-bit compression unit specifically includes:

A 2-bit compression subunit, configured to use the average gradient value of the second gradient data as a first compression threshold, and compress each gradient data in the second gradient data to 0, the first compression Threshold or the opposite number of the first compression threshold to complete the compression of each gradient data in the second gradient data.

Further, the 4-bit compression unit specifically includes:

A threshold group determining subunit, configured to determine a compression threshold group corresponding to the second gradient data according to the average gradient value of the second gradient data;

The gradient data compression subunit is configured to compress each gradient data in the second gradient data to a corresponding compression threshold in the compression threshold group, and complete the compression of each gradient data in the second gradient data.

Further, the threshold group determination subunit is also specifically used for:

Further, the gradient data compression subunit is also used for:

using one gradient data in the second gradient data as target gradient data;

Further, the gradient compression device based on federated learning also includes:

A gradient data sorting module, configured to sort each gradient data in the gradient data to be transmitted according to gradient values when the data volume of the first gradient data exceeds the data volume threshold;

A gradient data update module, configured to obtain gradient data of a target data amount from each sorted gradient data as updated first gradient data, wherein the target data amount is not greater than the data amount threshold.

Wherein, each module in the above-mentioned federated learning-based gradient compression device corresponds to each step in the above-mentioned federated learning-based gradient compression method embodiment, and its functions and implementation processes will not be repeated here.

In addition, an embodiment of the present invention also provides a computer-readable storage medium, and the computer-readable storage medium may be non-volatile or volatile.

The computer-readable storage medium of the present invention stores a federated learning-based gradient compression program, wherein when the federated learning-based gradient compression program is executed by a processor, the steps of the above-mentioned federated learning-based gradient compression method are implemented.

For the method implemented when the federated learning-based gradient compression program is executed, reference may be made to various embodiments of the federated learning-based gradient compression method of the present invention, which will not be repeated here.

The present invention provides a gradient compression method, device, device and computer-readable storage medium based on federated learning. The method includes: acquiring gradient data to be transmitted, and setting the gradient value in the gradient data to be transmitted not less than the preset gradient The gradient data of the threshold is used as the first gradient data; the gradient data other than the first gradient data in the gradient data to be transmitted is used as the second gradient data, and according to the 2-bit compression strategy or the 4-bit compression strategy, the Each gradient data in the second gradient data is compressed; uploading the first gradient data and the compressed second gradient data to the server. Through the above method, the present invention screens out the first gradient data with high importance according to the magnitude of the gradient value, then uploads the gradient data with high importance completely, and compresses the second gradient data with low importance according to 2 bits strategy or 4-bit compression strategy for compression. Therefore, while ensuring the accuracy of model modeling, the amount of transmitted gradient data is reduced, the gradient transmission efficiency is improved, and the existing technology of inefficient gradient transmission caused by the large amount of transmitted gradient data in the modeling process of federated learning is solved. question.

It should be noted that, as used herein, the term "comprises", "comprises" or any other variation thereof is intended to cover a non-exclusive inclusion such that a process, method, article or system comprising a set of elements includes not only those elements, It also includes other elements not expressly listed, or elements inherent in the process, method, article, or system. Without further limitations, an element defined by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article or system comprising that element.

The serial numbers of the above embodiments of the present invention are for description only, and do not represent the advantages and disadvantages of the embodiments.

Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is better implementation. Based on such an understanding, the technical solution of the present invention can be embodied in the form of a software product in essence or in other words, the part that contributes to the prior art, and the computer software product is stored in a storage medium (such as ROM/RAM) as described above. , magnetic disk, optical disk), including several instructions to enable a terminal device (which may be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods described in various embodiments of the present invention.

The above are only preferred embodiments of the present invention, and are not intended to limit the patent scope of the present invention. Any equivalent structure or equivalent process conversion made by using the description of the present invention and the contents of the accompanying drawings, or directly or indirectly used in other related technical fields , are all included in the scope of patent protection of the present invention in the same way.

Claims

A gradient compression method based on federated learning, wherein the gradient compression method based on federated learning comprises the following steps:

Obtaining the gradient data to be transmitted, and using the gradient data whose gradient value is not less than a preset gradient threshold in the gradient data to be transmitted as the first gradient data;

Taking gradient data other than the first gradient data in the gradient data to be transmitted as second gradient data, and compressing each of the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy Gradient data is compressed;

Upload the first gradient data and the compressed second gradient data to the server.
The gradient compression method based on federated learning according to claim 1, wherein the step of compressing each gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy specifically includes :

When the variance of the second gradient data is not less than a preset variance threshold, compress each gradient data in the second gradient data according to the 4-bit compression strategy;

When the variance of the second gradient data is smaller than the variance threshold, each gradient data in the second gradient data is compressed according to the 2-bit compression strategy.
The gradient compression method based on federated learning according to claim 2, wherein the step of compressing each gradient data in the second gradient data according to the 2-bit compression strategy comprises:

Taking the average gradient value of the second gradient data as the first compression threshold, and compressing each gradient data in the second gradient data to 0, the first compression threshold, or the first compression threshold Negative numbers, completing the compression of each gradient data in the second gradient data.
The gradient compression method based on federated learning according to claim 2, wherein the step of compressing each gradient data in the second gradient data according to the 4-bit compression strategy specifically comprises:

determining a compression threshold group corresponding to the second gradient data according to the average gradient value of the second gradient data;

Each gradient data in the second gradient data is respectively compressed to a corresponding compression threshold in the compression threshold group, and the compression of each gradient data in the second gradient data is completed.
The gradient compression method based on federated learning according to claim 4, wherein the step of determining the compression threshold group corresponding to the second gradient data according to the average gradient value of the second gradient data specifically comprises:

determining respective second compression thresholds corresponding to the second gradient data according to the preset difference value and the average gradient value;

The set of compression thresholds is generated according to each second compression threshold and an inverse number corresponding to each second compression threshold.
The gradient compression method based on federated learning according to claim 5, wherein said compressing each gradient data in said second gradient data into corresponding compression thresholds in said compression threshold group respectively completes said second The steps of compressing each gradient data in the gradient data include:

using one gradient data in the second gradient data as target gradient data;

Comparing the target gradient data with each second compression threshold sequentially arranged in the compression threshold group in turn, and determining a target compression threshold corresponding to the target gradient data among each of the second compression thresholds;

Compressing the target gradient data to the target compression threshold, and obtaining the next gradient data in the second gradient data as the target gradient data, and performing: sequentially compressing the target gradient data with the compressed Comparing the second compression thresholds arranged in sequence in the threshold group, and determining the target compression threshold corresponding to the target gradient data in each of the second compression thresholds, until each gradient in the second gradient data is completed Data compression.
The gradient compression method based on federated learning according to any one of claims 1 to 6, wherein the acquisition of the gradient data to be transmitted, and the gradient data whose gradient value is not less than the preset gradient threshold in the gradient data to be transmitted After the first gradient data step, also include:

When the data volume of the first gradient data exceeds the data volume threshold, sort each gradient data in the gradient data to be transmitted according to the gradient value;

Gradient data of a target data amount is obtained from each sorted gradient data as updated first gradient data, wherein the target data amount is not greater than the data amount threshold.
A gradient compression device based on federated learning, wherein the gradient compression device based on federated learning includes:

A gradient data acquisition module, configured to acquire gradient data to be transmitted, and use gradient data whose gradient value is not less than a preset gradient threshold among the gradient data to be transmitted as the first gradient data;

A gradient data compression module, configured to use gradient data other than the first gradient data in the gradient data to be transmitted as second gradient data, and compress the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy. Each gradient data in the second gradient data is compressed;

The gradient data uploading module is configured to upload the first gradient data and the compressed second gradient data to the server.
A federated learning-based gradient compression device, wherein the federated learning-based gradient compression device includes a processor, a memory, and a federated learning-based gradient compression program stored on the memory and executable by the processor , wherein when the gradient compression program based on federated learning is executed by the processor, it can realize:

Obtaining the gradient data to be transmitted, and using the gradient data whose gradient value is not less than a preset gradient threshold in the gradient data to be transmitted as the first gradient data;

Taking gradient data other than the first gradient data in the gradient data to be transmitted as second gradient data, and compressing each of the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy Gradient data is compressed;

Upload the first gradient data and the compressed second gradient data to the server.
The gradient compression device based on federated learning according to claim 9, wherein said compressing each gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy includes:

When the variance of the second gradient data is not less than a preset variance threshold, compress each gradient data in the second gradient data according to the 4-bit compression strategy;

When the variance of the second gradient data is smaller than the variance threshold, each gradient data in the second gradient data is compressed according to the 2-bit compression strategy.
The gradient compression device based on federated learning according to claim 10, wherein said compressing each gradient data in the second gradient data according to the 2-bit compression strategy comprises:

Taking the average gradient value of the second gradient data as the first compression threshold, and compressing each gradient data in the second gradient data to 0, the first compression threshold, or the first compression threshold Negative numbers, completing the compression of each gradient data in the second gradient data.
The gradient compression device based on federated learning according to claim 10, wherein said compressing each gradient data in the second gradient data according to the 4-bit compression strategy includes:

determining a compression threshold group corresponding to the second gradient data according to the average gradient value of the second gradient data;

Each gradient data in the second gradient data is respectively compressed to a corresponding compression threshold in the compression threshold group, and the compression of each gradient data in the second gradient data is completed.
The gradient compression device based on federated learning according to claim 12, wherein said determining the compression threshold group corresponding to the second gradient data according to the average gradient value of the second gradient data includes:

determining respective second compression thresholds corresponding to the second gradient data according to the preset difference value and the average gradient value;

The set of compression thresholds is generated according to each second compression threshold and an inverse number corresponding to each second compression threshold.
The gradient compression device based on federated learning according to claim 13, wherein said compressing each gradient data in said second gradient data into corresponding compression thresholds in said compression threshold group respectively completes said second The compression of each gradient data in the gradient data, including:

using one gradient data in the second gradient data as target gradient data;

Comparing the target gradient data with each second compression threshold sequentially arranged in the compression threshold group in turn, and determining a target compression threshold corresponding to the target gradient data among each of the second compression thresholds;

Compressing the target gradient data to the target compression threshold, and obtaining the next gradient data in the second gradient data as the target gradient data, and performing: sequentially compressing the target gradient data with the compressed Comparing the second compression thresholds arranged in sequence in the threshold group, and determining the target compression threshold corresponding to the target gradient data in each of the second compression thresholds, until each gradient in the second gradient data is completed Data compression.
The gradient compression device based on federated learning according to any one of claims 9-14, wherein the acquisition of the gradient data to be transmitted, and the gradient value of the gradient value in the gradient data to be transmitted are not less than the preset gradient threshold After the data is used as the first gradient data, it also includes:

When the data volume of the first gradient data exceeds the data volume threshold, sort the gradient data in the gradient data to be transmitted according to the gradient value;

Gradient data of a target data amount is obtained from each sorted gradient data as updated first gradient data, wherein the target data amount is not greater than the data amount threshold.
A computer-readable storage medium, wherein a federated learning-based gradient compression program is stored on the computer-readable storage medium, wherein when the federated learning-based gradient compression program is executed by a processor, it realizes:

Obtaining the gradient data to be transmitted, and using the gradient data whose gradient value is not less than a preset gradient threshold in the gradient data to be transmitted as the first gradient data;

Taking gradient data other than the first gradient data in the gradient data to be transmitted as second gradient data, and compressing each of the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy Gradient data is compressed;

Upload the first gradient data and the compressed second gradient data to the server.
The computer-readable storage medium according to claim 16, wherein said compressing each gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy comprises:

When the variance of the second gradient data is not less than a preset variance threshold, compress each gradient data in the second gradient data according to the 4-bit compression strategy;

When the variance of the second gradient data is smaller than the variance threshold, each gradient data in the second gradient data is compressed according to the 2-bit compression strategy.
The computer-readable storage medium according to claim 17, wherein the compressing each gradient data in the second gradient data according to the 2-bit compression strategy comprises:

Taking the average gradient value of the second gradient data as the first compression threshold, and compressing each gradient data in the second gradient data to 0, the first compression threshold, or the first compression threshold Negative numbers, completing the compression of each gradient data in the second gradient data.
The computer-readable storage medium according to claim 17, wherein said compressing each gradient data in the second gradient data according to the 4-bit compression strategy comprises:

determining a compression threshold group corresponding to the second gradient data according to the average gradient value of the second gradient data;

Each gradient data in the second gradient data is respectively compressed to a corresponding compression threshold in the compression threshold group, and the compression of each gradient data in the second gradient data is completed.
The computer-readable storage medium according to claim 19, wherein said determining the compression threshold group corresponding to the second gradient data according to the average gradient value of the second gradient data comprises:

determining respective second compression thresholds corresponding to the second gradient data according to the preset difference value and the average gradient value;

The set of compression thresholds is generated according to each second compression threshold and an inverse number corresponding to each second compression threshold.