WO2023134065A1 - Gradient compression method and apparatus, device, and storage medium - Google Patents

Gradient compression method and apparatus, device, and storage medium

Info

Publication number
WO2023134065A1
WO2023134065A1 (PCT/CN2022/089866)
Authority
WO
WIPO (PCT)
Prior art keywords
gradient
gradient data
compression
data
threshold
Prior art date
Application number
PCT/CN2022/089866
Other languages
French (fr)
Chinese (zh)
Inventor
李泽远 (Li Zeyuan)
王健宗 (Wang Jianzong)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2023134065A1 publication Critical patent/WO2023134065A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G06N 20/20: Ensemble learning

Definitions

  • The present invention relates to the technical field of data processing, and in particular to a federated-learning-based gradient compression method, apparatus, device, and computer-readable storage medium.
  • Federated learning is a learning paradigm in which data is distributed across different entities.
  • In one federated learning system, the data is distributed across different clients, and the federated server and the clients initialize the same model (for example, a neural network model) with the same initial model parameters.
  • Each client first trains on its local data set to obtain the gradients of the model update (the gradients of the model parameters) and then sends them to the federated server; the federated server collects the updated gradients from all clients, averages them, and returns the resulting global gradient to each client so that each client can train the model.
  • The main purpose of the present invention is to provide a federated-learning-based gradient compression method, apparatus, device, and computer-readable storage medium, aiming to solve the technical problem of low gradient transmission efficiency caused by the large amount of gradient data transmitted during federated learning modeling.
  • The present invention provides a federated-learning-based gradient compression method, the method comprising: acquiring gradient data to be transmitted, and taking the gradient data whose gradient value is not less than a preset gradient threshold among the gradient data to be transmitted as first gradient data; taking the gradient data other than the first gradient data among the gradient data to be transmitted as second gradient data, and compressing each piece of gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy; and uploading the first gradient data and the compressed second gradient data to a server.
  • The present invention also provides a federated-learning-based gradient compression apparatus.
  • The apparatus includes: a gradient data acquisition module, configured to acquire gradient data to be transmitted and take the gradient data whose gradient value is not less than a preset gradient threshold among the gradient data to be transmitted as first gradient data; a gradient data compression module, configured to take the gradient data other than the first gradient data among the gradient data to be transmitted as second gradient data and compress each piece of gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy; and a gradient data upload module, configured to upload the first gradient data and the compressed second gradient data to a server.
  • The present invention also provides a federated-learning-based gradient compression device.
  • The device includes a processor, a memory, and a federated-learning-based gradient compression program stored in the memory and executable by the processor, wherein when the gradient compression program is executed by the processor, it implements the steps of the above gradient compression method.
  • The present invention also provides a computer-readable storage medium storing a federated-learning-based gradient compression program, wherein when the gradient compression program is executed by a processor, it implements the steps of the above gradient compression method.
  • The present invention thus provides a federated-learning-based gradient compression method comprising: acquiring gradient data to be transmitted, and taking the gradient data whose gradient value is not less than a preset gradient threshold among the gradient data to be transmitted as first gradient data; taking the gradient data other than the first gradient data as second gradient data, and compressing each piece of gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy; and uploading the first gradient data and the compressed second gradient data to the server.
  • In this way, the present invention screens out the first gradient data of high importance according to the magnitude of the gradient value, uploads that highly important gradient data in full, and compresses the second gradient data of lower importance according to the 2-bit or 4-bit compression strategy. This reduces the amount of transmitted gradient data while preserving modeling accuracy, improves gradient transmission efficiency, and solves the existing technical problem of inefficient gradient transmission caused by the large amount of gradient data transmitted during federated learning modeling.
  • Fig. 1 is a schematic diagram of the hardware structure of the federated learning-based gradient compression device involved in the solution of the embodiment of the present invention
  • Fig. 2 is a schematic flow chart of the first embodiment of the federated learning-based gradient compression method of the present invention
  • Fig. 3 is a schematic diagram of functional modules of the first embodiment of the federated learning-based gradient compression device of the present invention.
  • The federated-learning-based gradient compression method involved in the embodiments of the present invention is mainly applied to a federated-learning-based gradient compression device, which may be a PC, a portable computer, a mobile terminal, or another device with display and processing functions.
  • FIG. 1 is a schematic diagram of the hardware structure of the federated learning-based gradient compression device involved in the solution of the embodiment of the present invention.
  • The federated-learning-based gradient compression device may include a processor 1001 (for example, a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
  • The communication bus 1002 is used to realize connection and communication among these components;
  • the user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard);
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface);
  • the memory 1005 may be a high-speed RAM memory or a stable non-volatile memory such as a disk memory, and may optionally be a storage device independent of the aforementioned processor 1001.
  • Fig. 1 does not constitute a limitation on the federated-learning-based gradient compression device, which may include more or fewer components than shown, or combine certain components, or arrange the components differently.
  • The memory 1005, as a computer-readable storage medium in Fig. 1, may include an operating system, a network communication module, and a federated-learning-based gradient compression program.
  • The network communication module is mainly used to connect to the server and perform data communication with it, and the processor 1001 can call the federated-learning-based gradient compression program stored in the memory 1005 and execute the federated-learning-based gradient compression method provided by the embodiments of the present invention.
  • An embodiment of the present invention provides a gradient compression method based on federated learning.
  • FIG. 2 is a schematic flowchart of the first embodiment of the federated learning-based gradient compression method of the present invention.
  • the gradient compression method based on federated learning includes the following steps:
  • Step S10: acquiring the gradient data to be transmitted, and taking the gradient data whose gradient value is not less than a preset gradient threshold among the gradient data to be transmitted as the first gradient data;
  • This embodiment screens out the important gradient data, i.e., the first gradient data, based on the gradient values, and uploads the important gradient data in full, thereby improving modeling accuracy; the remaining non-important gradient data, i.e., the second gradient data, is compressed according to a compression strategy, thereby reducing the amount of transmitted gradient data.
  • Among the gradient data corresponding to different models, or corresponding to different network layers of the same model, the gradient data corresponding to one network layer of one model is obtained as the gradient data to be transmitted.
  • The gradient threshold may be determined according to the gradient values of the important gradient data of the actual model.
  • Among the gradient data to be transmitted, the gradient data whose gradient value is not less than the preset gradient threshold, i.e., the important gradient data, is taken as the first gradient data.
  • The important and non-important gradient data are thus screened out by the gradient values of the gradient data.
  • After the step of acquiring the gradient data to be transmitted and taking the gradient data whose gradient value is not less than the preset gradient threshold as the first gradient data, the method further includes:
  • when the data volume of the first gradient data exceeds a data volume threshold, sorting the gradient data by gradient value, and obtaining gradient data of a target data amount from the sorted gradient data as the updated first gradient data, wherein the target data amount is not greater than the data volume threshold.
  • To prevent the data volume of the first gradient data from being too large, after the first gradient data is determined, its data volume is compared with a preset data volume threshold.
  • The data volume threshold is set according to the actual transmission resources: when transmission resources are plentiful, the threshold can be set larger; when they are scarce, it can be set smaller.
  • When the data volume of the first gradient data exceeds the data volume threshold, the gradient data in the first gradient data may be sorted in descending or ascending order of gradient value according to the Top-K method.
  • The first gradient data that needs to be uploaded in full is thereby updated, avoiding the increase in modeling overhead that an excessive amount of first gradient data would cause.
  • When the data volume of the first gradient data is less than the data volume threshold, no filtering is needed, and the first gradient data can be uploaded in full directly.
  • Step S20: taking the gradient data other than the first gradient data among the gradient data to be transmitted as second gradient data, and compressing each piece of gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy;
  • the gradient data other than the first gradient data among the gradient data to be transmitted is taken as the second gradient data, i.e., the non-important gradient data.
  • According to the compression strategy, each piece of gradient data in the second gradient data is compressed into 2 bits or 4 bits.
  • The 2-bit compression strategy involves 3 thresholds: 0, a set positive threshold, and its opposite. Gradient data smaller than the set threshold is compressed to 0, and gradient data not less than the threshold is compressed to the threshold.
  • The 4-bit compression strategy involves 15 thresholds: 0, 7 set positive thresholds, and their opposites (for example, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7); each piece of gradient data is compressed to its corresponding compression threshold.
  • Some precision of the non-important gradient data is thereby sacrificed to reduce the communication overhead.
  • The step of compressing each piece of gradient data in the second gradient data according to the 2-bit compression strategy or the 4-bit compression strategy specifically includes:
  • when the variance of the second gradient data is less than a preset variance threshold, compressing each piece of gradient data in the second gradient data according to the 2-bit compression strategy.
  • When the variance of the second gradient data is less than the variance threshold, the volatility of the second gradient data is small, and the 2-bit compression strategy can be used to compress each piece of gradient data in the second gradient data. The 2 bits are divided into a sign bit and a value bit: the first bit is the sign bit and the other bit is the value bit, where 0 in the sign bit represents a positive number and 1 a negative number.
  • The 2-bit compression strategy involves only 3 thresholds, compressing the second gradient data to 0 or the set threshold, which would greatly reduce the precision of gradient data that fluctuates widely.
  • Therefore, when the variance of the second gradient data is not less than the variance threshold, the 4-bit compression strategy (involving 15 thresholds) is used to compress each piece of gradient data in the second gradient data. The 4 bits are divided into a sign bit and value bits: the first bit is the sign bit and the remaining 3 bits are value bits, where 0 in the sign bit represents a positive number, 1 a negative number, and the 3 value bits store the compressed value. These 4 bits constitute the compressed value after gradient compression.
  • The step of compressing each piece of gradient data in the second gradient data according to the 2-bit compression strategy includes:
  • when the variance of the second gradient data is less than the variance threshold, the pieces of gradient data in the second gradient data differ little from one another, and the average gradient value of the second gradient data can be used as the compression threshold of the 2-bit compression strategy, i.e., the first compression threshold.
  • The step of compressing each piece of gradient data in the second gradient data according to the 4-bit compression strategy specifically includes:
  • compressing each piece of gradient data in the second gradient data to the corresponding compression threshold in the compression threshold group, thereby completing the compression of each piece of gradient data in the second gradient data.
  • The compression threshold group corresponding to the 4-bit compression strategy, i.e., 15 compression thresholds, is determined around the average gradient value of the second gradient data; each piece of gradient data in the second gradient data is compared with the 15 compression thresholds in turn, and the compression threshold closest to the gradient data is determined among them.
  • The compression threshold closest to the gradient data may be the compression threshold with the smallest difference from the gradient data, the minimum of the compression thresholds greater than the gradient data, or the maximum of the compression thresholds less than the gradient data.
  • The step of determining the compression threshold group corresponding to the second gradient data according to the average gradient value of the second gradient data specifically includes:
  • generating the compression threshold group according to the second compression thresholds and the opposite number corresponding to each second compression threshold.
  • When the variance of the second gradient data is not less than the variance threshold, the pieces of gradient data in the second gradient data differ widely from one another.
  • The differences between the average gradient value of the second gradient data and the minimum gradient value in the second gradient data, and between the average gradient value and the maximum gradient value, determine the second compression thresholds corresponding to the 4-bit compression strategy (i.e., 7 positive compression thresholds and their opposites, which, together with 0, form the 15 compression thresholds) as the compression threshold group of the second gradient data.
  • The step of compressing each piece of gradient data in the second gradient data to the corresponding compression threshold in the compression threshold group includes:
  • traversing the gradient data in the second gradient data: the sign of a piece of gradient data is judged first, the gradient data is then compared with the second compression thresholds from small to large, and the gradient data is compressed to the value at the smaller end; the compressed 4-bit data is the compressed value corresponding to that gradient data.
  • The compression process for each piece of gradient data in the second gradient data is as follows:
  • suppose the target gradient data is A,
  • and the second compression thresholds are -X7, -X6, -X5, -X4, -X3, -X2, -X1, 0, X1, X2, X3, X4, X5, X6, X7, where X1 to X7 increase monotonically. If A > X3 and A < X4, A is compressed to X3.
  • Each piece of gradient data in the second gradient data is obtained in turn as the target gradient data, and the above steps are repeated until the compression of each piece of gradient data in the second gradient data is completed.
  • It can be understood that the first compression threshold and the second compression thresholds may be preset by the user according to actual needs, or may be calculated by the system from the average gradient value of the second gradient data (optionally further combined with the differences between the minimum and maximum gradient values of the second gradient data and the average gradient value).
  • Step S30: uploading the first gradient data and the compressed second gradient data to the server.
  • The first gradient data, which has greater importance, is uploaded to the server in full, and the second gradient data is gradient-compressed before being uploaded, thereby converting gradient data that occupies a large amount of memory into gradient data that occupies a small amount and reducing the communication overhead of each gradient data transmission.
  • an embodiment of the present invention also provides a gradient compression device based on federated learning.
  • FIG. 3 is a schematic diagram of functional modules of the first embodiment of the federated learning-based gradient compression device of the present invention.
  • the gradient compression device based on federated learning includes:
  • a gradient data acquisition module configured to acquire gradient data to be transmitted, and use gradient data whose gradient value is not less than a preset gradient threshold among the gradient data to be transmitted as the first gradient data;
  • a gradient data compression module, configured to take the gradient data other than the first gradient data among the gradient data to be transmitted as second gradient data, and to compress each piece of gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy;
  • the gradient data uploading module is configured to upload the first gradient data and the compressed second gradient data to the server.
  • the gradient data compression module specifically includes:
  • a 4-bit compression unit, configured to compress each piece of gradient data in the second gradient data according to the 4-bit compression strategy when the variance of the second gradient data is not less than a preset variance threshold;
  • a 2-bit compression unit configured to compress each gradient data in the second gradient data according to the 2-bit compression strategy when the variance of the second gradient data is smaller than the variance threshold.
  • the 2-bit compression unit specifically includes:
  • a 2-bit compression subunit, configured to take the average gradient value of the second gradient data as a first compression threshold, and to compress each piece of gradient data in the second gradient data to 0, the first compression threshold, or the opposite of the first compression threshold, thereby completing the compression of each piece of gradient data in the second gradient data.
  • the 4-bit compression unit specifically includes:
  • a threshold group determining subunit configured to determine a compression threshold group corresponding to the second gradient data according to the average gradient value of the second gradient data
  • the gradient data compression subunit is configured to compress each gradient data in the second gradient data to a corresponding compression threshold in the compression threshold group, and complete the compression of each gradient data in the second gradient data.
  • The threshold group determination subunit is specifically further configured to: determine the second compression thresholds corresponding to the second gradient data according to a preset difference and the average gradient value; and
  • generate the compression threshold group according to the second compression thresholds and the opposite number corresponding to each second compression threshold.
  • The gradient data compression subunit is further configured to compare each piece of gradient data in turn with the sequentially arranged second compression thresholds in the compression threshold group, determine the corresponding target compression threshold, and compress the gradient data to that threshold.
  • the gradient compression device based on federated learning also includes:
  • a gradient data sorting module configured to sort each gradient data in the gradient data to be transmitted according to gradient values when the data volume of the first gradient data exceeds the data volume threshold;
  • a gradient data update module configured to obtain gradient data of a target data amount from each sorted gradient data as updated first gradient data, wherein the target data amount is not greater than the data amount threshold.
  • each module in the above-mentioned federated learning-based gradient compression device corresponds to each step in the above-mentioned federated learning-based gradient compression method embodiment, and its functions and implementation processes will not be repeated here.
  • an embodiment of the present invention also provides a computer-readable storage medium, and the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium of the present invention stores a federated learning-based gradient compression program, wherein when the federated learning-based gradient compression program is executed by a processor, the steps of the above-mentioned federated learning-based gradient compression method are implemented.
  • The present invention thus provides a federated-learning-based gradient compression method, apparatus, device, and computer-readable storage medium.
  • The method includes: acquiring gradient data to be transmitted, and taking the gradient data whose gradient value is not less than a preset gradient threshold among the gradient data to be transmitted as first gradient data; taking the gradient data other than the first gradient data as second gradient data, and compressing each piece of gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy; and uploading the first gradient data and the compressed second gradient data to the server.
  • The present invention screens out the first gradient data of high importance according to the magnitude of the gradient value, uploads that highly important gradient data in full, and compresses the second gradient data of lower importance according to the 2-bit or 4-bit compression strategy. This reduces the amount of transmitted gradient data while preserving modeling accuracy, improves gradient transmission efficiency, and solves the existing technical problem of inefficient gradient transmission caused by the large amount of gradient data transmitted during federated learning modeling.
  • The methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • In essence, the part of the technical solution of the present invention that contributes to the prior art can be embodied in the form of a software product; the computer software product is stored in a storage medium as described above (such as a ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions to enable a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the methods described in the various embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present application relates to artificial intelligence and provides a federated-learning-based gradient compression method and apparatus, a device, and a storage medium. Gradient data among the gradient data to be transmitted whose gradient value is not less than a preset gradient threshold is taken as first gradient data; the gradient data other than the first gradient data is taken as second gradient data, and each piece of gradient data in the second gradient data is compressed according to a 2-bit compression strategy or a 4-bit compression strategy; the first gradient data and the compressed second gradient data are then uploaded to a server. In this way, the first gradient data, which has a high degree of importance, is screened out according to the magnitude of the gradient value and uploaded in full, while the second gradient data, which has a low degree of importance, is compressed according to the corresponding compression strategy. While the modeling accuracy of the model is ensured, the amount of transmitted gradient data is reduced, thereby increasing the gradient transmission efficiency.

Description

Gradient Compression Method, Apparatus, Device, and Storage Medium
This application claims priority to the Chinese patent application No. 2022100442162, entitled "Gradient compression method, apparatus, device, and storage medium", filed with the China Patent Office on January 14, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the technical field of data processing, and in particular to a federated-learning-based gradient compression method, apparatus, device, and computer-readable storage medium.
Background
Federated learning is a learning paradigm in which data is distributed across different entities. In one federated learning system, the data is distributed across different clients, and the federated server and the clients initialize the same model (for example, a neural network model) with the same initial model parameters. Each client first trains on its local data set to obtain the gradients of the model update (the gradients of the model parameters) and then sends them to the federated server; the federated server collects the updated gradients from all clients, averages them, and returns the resulting global gradient to each client so that each client can continue training the model.
The emergence of federated learning offers a new direction for reconciling data-sharing needs with privacy-protection requirements, and it has therefore attracted increasing attention. However, the inventors realized that during the joint modeling of federated learning, as the number of clients increases, the amount of gradient data that must be transmitted between the federated server and the clients also increases. How to solve the problem of low gradient transmission efficiency caused by the large amount of gradient data transmitted during federated learning modeling has therefore become an urgent technical problem.
Summary of the Invention
The main purpose of the present invention is to provide a federated-learning-based gradient compression method, apparatus, device, and computer-readable storage medium, aiming to solve the technical problem of low gradient transmission efficiency caused by the large amount of gradient data transmitted during federated learning modeling.
To achieve the above object, the present invention provides a federated-learning-based gradient compression method, the method comprising: acquiring gradient data to be transmitted, and taking the gradient data whose gradient value is not less than a preset gradient threshold among the gradient data to be transmitted as first gradient data; taking the gradient data other than the first gradient data among the gradient data to be transmitted as second gradient data, and compressing each piece of gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy; and uploading the first gradient data and the compressed second gradient data to a server.
In addition, to achieve the above object, the present invention also provides a federated-learning-based gradient compression apparatus, the apparatus comprising: a gradient data acquisition module, configured to acquire gradient data to be transmitted and take the gradient data whose gradient value is not less than a preset gradient threshold among the gradient data to be transmitted as first gradient data; a gradient data compression module, configured to take the gradient data other than the first gradient data among the gradient data to be transmitted as second gradient data and compress each piece of gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy; and a gradient data upload module, configured to upload the first gradient data and the compressed second gradient data to a server.
In addition, to achieve the above object, the present invention also provides a federated-learning-based gradient compression device, the device comprising a processor, a memory, and a federated-learning-based gradient compression program stored in the memory and executable by the processor, wherein when the gradient compression program is executed by the processor, it implements:
acquiring gradient data to be transmitted, and taking the gradient data whose gradient value is not less than a preset gradient threshold among the gradient data to be transmitted as first gradient data;
taking the gradient data other than the first gradient data among the gradient data to be transmitted as second gradient data, and compressing each piece of gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy; and
uploading the first gradient data and the compressed second gradient data to a server.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium storing a federated-learning-based gradient compression program, wherein when the gradient compression program is executed by a processor, it implements:
acquiring gradient data to be transmitted, and taking the gradient data whose gradient value is not less than a preset gradient threshold among the gradient data to be transmitted as first gradient data;
taking the gradient data other than the first gradient data among the gradient data to be transmitted as second gradient data, and compressing each piece of gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy; and
uploading the first gradient data and the compressed second gradient data to a server.
The present invention provides a federated-learning-based gradient compression method, the method comprising: acquiring gradient data to be transmitted, and taking the gradient data whose gradient value is not less than a preset gradient threshold among the gradient data to be transmitted as first gradient data; taking the gradient data other than the first gradient data among the gradient data to be transmitted as second gradient data, and compressing each piece of gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy; and uploading the first gradient data and the compressed second gradient data to a server. In this way, the present invention screens out the first gradient data of high importance according to the magnitude of the gradient value, uploads that highly important gradient data in full, and compresses the second gradient data of lower importance according to the 2-bit or 4-bit compression strategy. This reduces the amount of transmitted gradient data while preserving modeling accuracy, improves gradient transmission efficiency, and solves the existing technical problem of inefficient gradient transmission caused by the large amount of gradient data transmitted during federated learning modeling.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the hardware structure of the federated-learning-based gradient compression device involved in the embodiments of the present invention;
Fig. 2 is a schematic flowchart of the first embodiment of the federated-learning-based gradient compression method of the present invention;
Fig. 3 is a schematic diagram of the functional modules of the first embodiment of the federated-learning-based gradient compression apparatus of the present invention.
The realization of the objects, functional features, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described here are only intended to explain the present invention and are not intended to limit it.
The federated-learning-based gradient compression method involved in the embodiments of the present invention is mainly applied to a federated-learning-based gradient compression device, which may be a PC, a portable computer, a mobile terminal, or another device with display and processing functions.
Referring to Fig. 1, Fig. 1 is a schematic diagram of the hardware structure of the federated-learning-based gradient compression device involved in the embodiments of the present invention. In the embodiment of the present invention, the device may include a processor 1001 (for example, a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to realize connection and communication among these components; the user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard); the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface); the memory 1005 may be a high-speed RAM memory or a stable non-volatile memory such as a disk memory, and may optionally be a storage device independent of the aforementioned processor 1001.
Those skilled in the art can understand that the hardware structure shown in Fig. 1 does not constitute a limitation on the federated-learning-based gradient compression device, which may include more or fewer components than shown, or combine certain components, or arrange the components differently.
Continuing to refer to Fig. 1, the memory 1005, as a computer-readable storage medium, may include an operating system, a network communication module, and a federated-learning-based gradient compression program.
In Fig. 1, the network communication module is mainly used to connect to the server and perform data communication with it, and the processor 1001 can call the federated-learning-based gradient compression program stored in the memory 1005 and execute the federated-learning-based gradient compression method provided by the embodiments of the present invention.
An embodiment of the present invention provides a gradient compression method based on federated learning.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of the first embodiment of the federated-learning-based gradient compression method of the present invention.
In this embodiment, the federated-learning-based gradient compression method includes the following steps:
Step S10: acquire the gradient data to be transmitted, and take the gradient data whose gradient value is not less than a preset gradient threshold among the gradient data to be transmitted as the first gradient data.
To solve the problem of low gradient transmission efficiency caused by the large amount of gradient data transmitted when complete gradients are uploaded during federated learning modeling, this embodiment screens out the important gradient data, i.e., the first gradient data, based on the gradient values, and uploads the important gradient data in full, thereby improving modeling accuracy; the remaining non-important gradient data, i.e., the second gradient data, is compressed according to a compression strategy, thereby reducing the amount of transmitted gradient data.
Specifically, among the gradient data corresponding to different models, or the gradient data corresponding to different network layers of the same model, the gradient data corresponding to one network layer of one model is obtained as the gradient data to be transmitted. The gradient value of each piece of gradient data to be transmitted, i.e., the absolute value of the gradient data, is calculated and compared with the preset gradient threshold. The gradient threshold may be determined according to the gradient values of the important gradient data of the actual model. Among the gradient data to be transmitted, the gradient data whose gradient value is not less than the preset gradient threshold, i.e., the important gradient data, is taken as the first gradient data. In this way, the important and non-important gradient data are screened out by the gradient values of the gradient data.
Exemplarily, after the step of acquiring the gradient data to be transmitted and taking the gradient data whose gradient value is not less than the preset gradient threshold as the first gradient data, the method further includes:
when the data volume of the first gradient data exceeds a data volume threshold, sorting the gradient data in the gradient data to be transmitted by gradient value;
obtaining gradient data of a target data amount from the sorted gradient data as the updated first gradient data, wherein the target data amount is not greater than the data volume threshold.
In this embodiment, in order to prevent the data volume of the first gradient data from being too large, after the first gradient data is determined, its data volume is compared with a preset data volume threshold. The data volume threshold is set according to the actual transmission resources: when transmission resources are plentiful, the threshold can be set larger; when they are scarce, it can be set smaller. When the data volume of the first gradient data exceeds the data volume threshold, the gradient data in the first gradient data may be sorted in descending or ascending order of gradient value according to the Top-K method. After sorting, the k pieces of gradient data with the largest gradient values are obtained according to the target data amount, where the data volume of those k pieces of gradient data is not greater than the target data amount. In this way, the first gradient data that needs to be uploaded in full is updated, avoiding the increase in modeling overhead that an excessive amount of first gradient data would cause. When the data volume of the first gradient data is less than the data volume threshold, no filtering is needed, and the first gradient data can be uploaded in full directly.
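A corresponding sketch of the Top-K update, under the same assumptions as above; expressing the data volume threshold as a count of gradient entries is likewise an assumption.

```python
import numpy as np

def cap_first_gradient_data(first: np.ndarray, volume_threshold: int) -> np.ndarray:
    """Keep at most `volume_threshold` entries of the first gradient data,
    retaining those with the largest gradient values (Top-K)."""
    if first.size <= volume_threshold:
        return first                            # below the threshold: upload in full
    order = np.argsort(np.abs(first))[::-1]     # sort by gradient value, descending
    return first[order[:volume_threshold]]      # the k largest-magnitude gradients
```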
Step S20: take the gradient data other than the first gradient data among the gradient data to be transmitted as second gradient data, and compress each piece of gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy.
In this embodiment, the gradient data other than the first gradient data among the gradient data to be transmitted is taken as the second gradient data, i.e., the non-important gradient data. According to the compression strategy, each piece of gradient data in the second gradient data is compressed into 2 bits or 4 bits. The 2-bit compression strategy involves 3 thresholds: 0, a set positive threshold, and its opposite; gradient data smaller than the set threshold is compressed to 0, and gradient data not less than the threshold is compressed to the threshold. The 4-bit compression strategy involves 15 thresholds: 0, 7 set positive thresholds, and their opposites (for example, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7); each piece of gradient data is compressed to its corresponding compression threshold. In this way, some precision of the non-important gradient data is sacrificed to reduce the communication overhead.
Exemplarily, the step of compressing each piece of gradient data in the second gradient data according to the 2-bit compression strategy or the 4-bit compression strategy specifically includes:
when the variance of the second gradient data is not less than a preset variance threshold, compressing each piece of gradient data in the second gradient data according to the 4-bit compression strategy;
when the variance of the second gradient data is less than the variance threshold, compressing each piece of gradient data in the second gradient data according to the 2-bit compression strategy.
In this embodiment, when the variance of the second gradient data is less than the variance threshold, the volatility of the second gradient data is small, and the 2-bit compression strategy can be used to compress each piece of gradient data in the second gradient data. The 2 bits are divided into a sign bit and a value bit: the first bit is the sign bit and the other bit is the value bit, where 0 in the sign bit represents a positive number and 1 a negative number. When the variance of the second gradient data is not less than the preset variance threshold, the volatility of the second gradient data is large, and the 2-bit compression strategy, which involves only 3 thresholds and compresses the second gradient data to 0 or the set threshold, would greatly reduce the precision of the gradient data. Therefore, when the variance of the second gradient data is not less than the variance threshold, the 4-bit compression strategy (involving 15 thresholds) is used to compress each piece of gradient data in the second gradient data. The 4 bits are divided into a sign bit and value bits: the first bit is the sign bit and the remaining 3 bits are value bits, where 0 in the sign bit represents a positive number, 1 a negative number, and the 3 value bits store the compressed value. These 4 bits constitute the compressed value after gradient compression.
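The choice between the two strategies might be sketched as follows; the bit layouts are recorded only as comments, since this sketch works on floating-point values rather than packed bits.

```python
import numpy as np

def choose_bit_width(second: np.ndarray, variance_threshold: float) -> int:
    """Pick the compression strategy for the second gradient data.
    2-bit layout: 1 sign bit (0 = positive, 1 = negative) + 1 value bit, 3 thresholds.
    4-bit layout: 1 sign bit + 3 value bits storing the compressed value, 15 thresholds."""
    if np.var(second) < variance_threshold:
        return 2   # low volatility: the coarse 2-bit strategy suffices
    return 4       # high volatility: use the finer-grained 4-bit strategy
```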
Exemplarily, the step of compressing each piece of gradient data in the second gradient data according to the 2-bit compression strategy includes:
taking the average gradient value of the second gradient data as a first compression threshold, and compressing each piece of gradient data in the second gradient data to 0, the first compression threshold, or the opposite of the first compression threshold, thereby completing the compression of each piece of gradient data in the second gradient data.
In this embodiment, when the variance of the second gradient data is less than the variance threshold, the pieces of gradient data in the second gradient data differ little from one another, and the average gradient value of the second gradient data can be used as the compression threshold of the 2-bit compression strategy, i.e., the first compression threshold. It is first determined whether a piece of gradient data is positive or negative. If it is positive, it is compared with the set positive threshold (the first compression threshold): if it is smaller than the first compression threshold, it is compressed to 0; if it is not smaller, it is compressed to the first compression threshold. If it is negative, it is compared with the opposite of the set positive threshold (the opposite of the first compression threshold): if it is smaller than the opposite of the first compression threshold, it is compressed to the opposite of the first compression threshold; if it is not smaller, it is compressed to 0.
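A sketch of this 2-bit quantization rule; taking the average gradient value to be the mean of the absolute values follows the earlier definition of the gradient value, but remains an assumption.

```python
import numpy as np

def compress_2bit(second: np.ndarray) -> np.ndarray:
    """Compress each gradient of the second gradient data to 0, +t or -t, where
    t (the first compression threshold) is the average gradient value."""
    t = np.mean(np.abs(second))   # first compression threshold (an assumption)
    out = np.zeros_like(second)   # gradients strictly between -t and t become 0
    out[second >= t] = t          # positive and not less than t  -> t
    out[second < -t] = -t         # negative and less than -t     -> -t
    return out
```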
Exemplarily, the step of compressing each piece of gradient data in the second gradient data according to the 4-bit compression strategy specifically includes:
determining a compression threshold group corresponding to the second gradient data according to the average gradient value of the second gradient data;
compressing each piece of gradient data in the second gradient data to the corresponding compression threshold in the compression threshold group, thereby completing the compression of each piece of gradient data in the second gradient data.
In this embodiment, the compression threshold group corresponding to the 4-bit compression strategy, i.e., 15 compression thresholds, is determined around the average gradient value of the second gradient data. Each piece of gradient data in the second gradient data is compared with the 15 compression thresholds, and the compression threshold closest to the gradient data is determined among them. The compression threshold closest to the gradient data may be the compression threshold with the smallest difference from the gradient data, the minimum of the compression thresholds greater than the gradient data, or the maximum of the compression thresholds less than the gradient data. Proceeding in this manner completes the compression of each piece of gradient data in the second gradient data.
Exemplarily, the step of determining the compression threshold group corresponding to the second gradient data according to the average gradient value of the second gradient data specifically includes:
determining the second compression thresholds corresponding to the second gradient data according to a preset difference and the average gradient value;
generating the compression threshold group according to the second compression thresholds and the opposite number corresponding to each second compression threshold.
In this embodiment, when the variance of the second gradient data is not less than the variance threshold, the pieces of gradient data in the second gradient data differ widely from one another. The differences between the average gradient value of the second gradient data and the minimum gradient value in the second gradient data, and between the average gradient value and the maximum gradient value, can be used to determine the second compression thresholds corresponding to the 4-bit compression strategy (i.e., 7 positive compression thresholds and their opposites, which, together with 0, form the 15 compression thresholds) as the compression threshold group of the second gradient data.
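One possible construction of the compression threshold group is sketched below. The disclosure only states that the thresholds are derived from the average gradient value and its differences from the minimum and maximum gradient values, so the even spacing used here is an assumption.

```python
import numpy as np

def threshold_group(second: np.ndarray) -> np.ndarray:
    """Build the 15-value threshold group of the 4-bit strategy: 7 positive
    thresholds, their opposites, and 0, sorted in ascending order."""
    values = np.abs(second)
    mean, lo, hi = values.mean(), values.min(), values.max()
    # 3 thresholds strictly below the mean (towards the minimum gradient value)
    # and 4 from the mean up to the maximum; even spacing is an assumption.
    below = np.linspace(lo + (mean - lo) / 4, mean, 3, endpoint=False)
    above = np.linspace(mean, hi, 4)
    positives = np.concatenate([below, above])   # the 7 positive thresholds
    return np.sort(np.concatenate([-positives, [0.0], positives]))
```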
Exemplarily, the step of compressing each gradient data in the second gradient data to the corresponding compression threshold in the compression threshold group, thereby completing the compression of each gradient data in the second gradient data, includes:
taking one gradient data in the second gradient data as target gradient data;
comparing the target gradient data in turn with the second compression thresholds arranged in order in the compression threshold group, and determining, among the second compression thresholds, the target compression threshold corresponding to the target gradient data;
compressing the target gradient data to the target compression threshold, obtaining the next gradient data in the second gradient data as the target gradient data, and repeating the comparing and determining steps until the compression of every gradient data in the second gradient data is completed.
In this embodiment, the gradient data in the second gradient data are traversed. For each gradient value, its sign is first determined; the value is then compared with the second compression thresholds in ascending order and compressed to the value at the smaller end of the interval in which it falls. The resulting 4-bit code is the compressed value corresponding to that gradient value.
The compression of each gradient data in the second gradient data proceeds as follows:
Obtain one gradient value from the second gradient data as the target gradient data, and compare it in turn with the second compression thresholds arranged in order (descending or ascending) in the compression threshold group. For example, suppose the target gradient data is A and the second compression thresholds are -X7, -X6, -X5, -X4, -X3, -X2, -X1, 0, X1, X2, X3, X4, X5, X6, X7, where X1 through X7 increase in order. If A > X3 and A < X4, then A is compressed to X3.
Each gradient value in the second gradient data is obtained in turn as the target gradient data, and the above steps are repeated until the compression of every gradient value in the second gradient data is completed.
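The ordered scan that compresses a value to the smaller end of its interval might look as follows (a sketch; the clamping of values below the smallest threshold is an assumption the disclosure leaves open):

```python
import numpy as np

def snap_to_lower(value: float, thresholds: np.ndarray) -> float:
    """Compress value to the largest threshold not exceeding it (the 'smaller end')."""
    lower = thresholds[0]      # thresholds assumed sorted ascending
    for t in thresholds:       # scan from small to large
        if value < t:
            break              # value lies in [lower, t); keep the smaller end
        lower = t
    return float(lower)
```

With the thresholds of the example above, a value A between X3 and X4 is returned as X3.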
It can be understood that the first compression threshold and the second compression thresholds may be preset by the user according to actual needs, or may be computed by the system from the average gradient value of the second gradient data (optionally further combined with the differences between the minimum and maximum gradient values of the second gradient data and the average gradient value).
Step S30: uploading the first gradient data and the compressed second gradient data to the server.
In this embodiment, the first gradient data, which is of higher importance, is uploaded to the server in full, while the second gradient data is uploaded after gradient compression. In this way, gradient data occupying more memory is converted into gradient data occupying less, reducing the communication overhead of each gradient transmission.
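Tying the steps together, a client-side sketch reusing the helpers above; the magnitude test for importance and the choice of preset difference are assumptions, and the actual upload transport is left out:

```python
import numpy as np

def prepare_upload(grads: np.ndarray, grad_threshold: float, var_threshold: float):
    """Split gradients by importance and compress the less important part."""
    important = np.abs(grads) >= grad_threshold  # magnitude test is an assumption
    first = grads[important]                     # first gradient data, kept intact
    second = grads[~important]                   # second gradient data
    if second.var() >= var_threshold:            # widely spread values: 4-bit strategy
        d = second.std() / 7                     # preset difference (assumed choice)
        compressed = compress_4bit(second, build_threshold_group(second.mean(), d))
    else:                                        # tightly clustered values: 2-bit strategy
        compressed = compress_2bit(second)
    return first, compressed                     # both parts are then sent to the server
```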
In addition, an embodiment of the present invention further provides a federated learning-based gradient compression apparatus.
Referring to FIG. 3, FIG. 3 is a schematic diagram of the functional modules of the first embodiment of the federated learning-based gradient compression apparatus of the present invention.
In this embodiment, the federated learning-based gradient compression apparatus includes:
a gradient data acquisition module, configured to acquire gradient data to be transmitted and take, as first gradient data, the gradient data in the gradient data to be transmitted whose gradient values are not less than a preset gradient threshold;
a gradient data compression module, configured to take the gradient data in the gradient data to be transmitted other than the first gradient data as second gradient data and compress each gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy;
a gradient data upload module, configured to upload the first gradient data and the compressed second gradient data to the server.
Further, the gradient data compression module specifically includes:
a 4-bit compression unit, configured to compress each gradient data in the second gradient data according to the 4-bit compression strategy when the variance of the second gradient data is not less than a preset variance threshold;
a 2-bit compression unit, configured to compress each gradient data in the second gradient data according to the 2-bit compression strategy when the variance of the second gradient data is less than the variance threshold.
Further, the 2-bit compression unit specifically includes:
a 2-bit compression subunit, configured to take the average gradient value of the second gradient data as the first compression threshold and compress each gradient data in the second gradient data to 0, the first compression threshold, or the negation of the first compression threshold, thereby completing the compression of each gradient data in the second gradient data.
Further, the 4-bit compression unit specifically includes:
a threshold group determination subunit, configured to determine the compression threshold group corresponding to the second gradient data according to the average gradient value of the second gradient data;
a gradient data compression subunit, configured to compress each gradient data in the second gradient data to the corresponding compression threshold in the compression threshold group, thereby completing the compression of each gradient data in the second gradient data.
Further, the threshold group determination subunit is specifically further configured to:
determine each second compression threshold corresponding to the second gradient data according to a preset difference and the average gradient value;
generate the compression threshold group according to each second compression threshold and the negation of each second compression threshold.
Further, the gradient data compression subunit is further configured to:
take one gradient data in the second gradient data as target gradient data;
compare the target gradient data in turn with the second compression thresholds arranged in order in the compression threshold group, and determine, among the second compression thresholds, the target compression threshold corresponding to the target gradient data;
compress the target gradient data to the target compression threshold, obtain the next gradient data in the second gradient data as the target gradient data, and repeat the comparing and determining steps until the compression of every gradient data in the second gradient data is completed.
Further, the federated learning-based gradient compression apparatus further includes:
a gradient data sorting module, configured to sort the gradient data in the gradient data to be transmitted by gradient value when the data volume of the first gradient data exceeds the data volume threshold;
a gradient data update module, configured to obtain, from the sorted gradient data, gradient data of a target data volume as the updated first gradient data, wherein the target data volume is not greater than the data volume threshold.
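A sketch of this fallback; reading "sort by gradient value" as sorting by magnitude is an assumption:

```python
import numpy as np

def cap_first_gradients(grads: np.ndarray, target_count: int) -> np.ndarray:
    """Keep only the target_count largest gradients as the updated first gradient data."""
    order = np.argsort(-np.abs(grads))  # descending by magnitude (assumed sort key)
    return grads[order[:target_count]]
```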
Each module in the above federated learning-based gradient compression apparatus corresponds to a step in the embodiments of the above federated learning-based gradient compression method; their functions and implementation processes are not repeated here.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, which may be non-volatile or volatile.
The computer-readable storage medium of the present invention stores a federated learning-based gradient compression program, and when the federated learning-based gradient compression program is executed by a processor, the steps of the above federated learning-based gradient compression method are implemented.
For the method implemented when the federated learning-based gradient compression program is executed, reference may be made to the embodiments of the federated learning-based gradient compression method of the present invention, which are not repeated here.
The present invention provides a federated learning-based gradient compression method, apparatus, device, and computer-readable storage medium. The method includes: acquiring gradient data to be transmitted, and taking, as first gradient data, the gradient data whose gradient values are not less than a preset gradient threshold; taking the gradient data other than the first gradient data as second gradient data, and compressing each gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy; and uploading the first gradient data and the compressed second gradient data to the server. In this way, the present invention screens out the first gradient data of high importance according to the magnitudes of the gradient values, uploads the highly important gradient data in full, and compresses the less important second gradient data with the 2-bit or 4-bit compression strategy. The amount of transmitted gradient data is thus reduced while modeling accuracy is preserved, improving gradient transmission efficiency and solving the technical problem of low gradient transmission efficiency caused by the large amount of gradient data transmitted during federated learning modeling.
It should be noted that, as used herein, the terms "comprise", "include", or any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or system comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not preclude the presence of additional identical elements in the process, method, article, or system comprising that element.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on such an understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit the scope of the patent. Any equivalent structural or process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (20)

1. A federated learning-based gradient compression method, wherein the federated learning-based gradient compression method comprises the following steps:
    acquiring gradient data to be transmitted, and taking, as first gradient data, the gradient data in the gradient data to be transmitted whose gradient values are not less than a preset gradient threshold;
    taking the gradient data in the gradient data to be transmitted other than the first gradient data as second gradient data, and compressing each gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy;
    uploading the first gradient data and the compressed second gradient data to a server.
2. The federated learning-based gradient compression method according to claim 1, wherein the step of compressing each gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy specifically comprises:
    when the variance of the second gradient data is not less than a preset variance threshold, compressing each gradient data in the second gradient data according to the 4-bit compression strategy;
    when the variance of the second gradient data is less than the variance threshold, compressing each gradient data in the second gradient data according to the 2-bit compression strategy.
3. The federated learning-based gradient compression method according to claim 2, wherein the step of compressing each gradient data in the second gradient data according to the 2-bit compression strategy comprises:
    taking the average gradient value of the second gradient data as a first compression threshold, and compressing each gradient data in the second gradient data to 0, the first compression threshold, or the negation of the first compression threshold, thereby completing the compression of each gradient data in the second gradient data.
4. The federated learning-based gradient compression method according to claim 2, wherein the step of compressing each gradient data in the second gradient data according to the 4-bit compression strategy specifically comprises:
    determining a compression threshold group corresponding to the second gradient data according to the average gradient value of the second gradient data;
    compressing each gradient data in the second gradient data to the corresponding compression threshold in the compression threshold group, thereby completing the compression of each gradient data in the second gradient data.
5. The federated learning-based gradient compression method according to claim 4, wherein the step of determining the compression threshold group corresponding to the second gradient data according to the average gradient value of the second gradient data specifically comprises:
    determining each second compression threshold corresponding to the second gradient data according to a preset difference and the average gradient value;
    generating the compression threshold group according to each second compression threshold and the negation of each second compression threshold.
6. The federated learning-based gradient compression method according to claim 5, wherein the step of compressing each gradient data in the second gradient data to the corresponding compression threshold in the compression threshold group, thereby completing the compression of each gradient data in the second gradient data, comprises:
    taking one gradient data in the second gradient data as target gradient data;
    comparing the target gradient data in turn with the second compression thresholds arranged in order in the compression threshold group, and determining, among the second compression thresholds, a target compression threshold corresponding to the target gradient data;
    compressing the target gradient data to the target compression threshold, obtaining the next gradient data in the second gradient data as the target gradient data, and repeating the comparing and determining steps until the compression of every gradient data in the second gradient data is completed.
7. The federated learning-based gradient compression method according to any one of claims 1 to 6, wherein after the step of acquiring gradient data to be transmitted and taking, as first gradient data, the gradient data whose gradient values are not less than a preset gradient threshold, the method further comprises:
    when the data volume of the first gradient data exceeds a data volume threshold, sorting the gradient data in the gradient data to be transmitted by gradient value;
    obtaining, from the sorted gradient data, gradient data of a target data volume as updated first gradient data, wherein the target data volume is not greater than the data volume threshold.
8. A federated learning-based gradient compression apparatus, wherein the federated learning-based gradient compression apparatus comprises:
    a gradient data acquisition module, configured to acquire gradient data to be transmitted and take, as first gradient data, the gradient data in the gradient data to be transmitted whose gradient values are not less than a preset gradient threshold;
    a gradient data compression module, configured to take the gradient data in the gradient data to be transmitted other than the first gradient data as second gradient data and compress each gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy;
    a gradient data upload module, configured to upload the first gradient data and the compressed second gradient data to a server.
9. A federated learning-based gradient compression device, wherein the federated learning-based gradient compression device comprises a processor, a memory, and a federated learning-based gradient compression program stored on the memory and executable by the processor, wherein the federated learning-based gradient compression program, when executed by the processor, implements:
    acquiring gradient data to be transmitted, and taking, as first gradient data, the gradient data in the gradient data to be transmitted whose gradient values are not less than a preset gradient threshold;
    taking the gradient data in the gradient data to be transmitted other than the first gradient data as second gradient data, and compressing each gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy;
    uploading the first gradient data and the compressed second gradient data to a server.
10. The federated learning-based gradient compression device according to claim 9, wherein the compressing each gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy comprises:
    when the variance of the second gradient data is not less than a preset variance threshold, compressing each gradient data in the second gradient data according to the 4-bit compression strategy;
    when the variance of the second gradient data is less than the variance threshold, compressing each gradient data in the second gradient data according to the 2-bit compression strategy.
11. The federated learning-based gradient compression device according to claim 10, wherein the compressing each gradient data in the second gradient data according to the 2-bit compression strategy comprises:
    taking the average gradient value of the second gradient data as a first compression threshold, and compressing each gradient data in the second gradient data to 0, the first compression threshold, or the negation of the first compression threshold, thereby completing the compression of each gradient data in the second gradient data.
12. The federated learning-based gradient compression device according to claim 10, wherein the compressing each gradient data in the second gradient data according to the 4-bit compression strategy comprises:
    determining a compression threshold group corresponding to the second gradient data according to the average gradient value of the second gradient data;
    compressing each gradient data in the second gradient data to the corresponding compression threshold in the compression threshold group, thereby completing the compression of each gradient data in the second gradient data.
13. The federated learning-based gradient compression device according to claim 12, wherein the determining the compression threshold group corresponding to the second gradient data according to the average gradient value of the second gradient data comprises:
    determining each second compression threshold corresponding to the second gradient data according to a preset difference and the average gradient value;
    generating the compression threshold group according to each second compression threshold and the negation of each second compression threshold.
14. The federated learning-based gradient compression device according to claim 13, wherein the compressing each gradient data in the second gradient data to the corresponding compression threshold in the compression threshold group, thereby completing the compression of each gradient data in the second gradient data, comprises:
    taking one gradient data in the second gradient data as target gradient data;
    comparing the target gradient data in turn with the second compression thresholds arranged in order in the compression threshold group, and determining, among the second compression thresholds, a target compression threshold corresponding to the target gradient data;
    compressing the target gradient data to the target compression threshold, obtaining the next gradient data in the second gradient data as the target gradient data, and repeating the comparing and determining steps until the compression of every gradient data in the second gradient data is completed.
15. The federated learning-based gradient compression device according to any one of claims 9 to 14, wherein after the acquiring gradient data to be transmitted and taking, as first gradient data, the gradient data whose gradient values are not less than a preset gradient threshold, the following is further implemented:
    when the data volume of the first gradient data exceeds a data volume threshold, sorting the gradient data in the gradient data to be transmitted by gradient value;
    obtaining, from the sorted gradient data, gradient data of a target data volume as updated first gradient data, wherein the target data volume is not greater than the data volume threshold.
16. A computer-readable storage medium, wherein a federated learning-based gradient compression program is stored on the computer-readable storage medium, and the federated learning-based gradient compression program, when executed by a processor, implements:
    acquiring gradient data to be transmitted, and taking, as first gradient data, the gradient data in the gradient data to be transmitted whose gradient values are not less than a preset gradient threshold;
    taking the gradient data in the gradient data to be transmitted other than the first gradient data as second gradient data, and compressing each gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy;
    uploading the first gradient data and the compressed second gradient data to a server.
17. The computer-readable storage medium according to claim 16, wherein the compressing each gradient data in the second gradient data according to a 2-bit compression strategy or a 4-bit compression strategy comprises:
    when the variance of the second gradient data is not less than a preset variance threshold, compressing each gradient data in the second gradient data according to the 4-bit compression strategy;
    when the variance of the second gradient data is less than the variance threshold, compressing each gradient data in the second gradient data according to the 2-bit compression strategy.
18. The computer-readable storage medium according to claim 17, wherein the compressing each gradient data in the second gradient data according to the 2-bit compression strategy comprises:
    taking the average gradient value of the second gradient data as a first compression threshold, and compressing each gradient data in the second gradient data to 0, the first compression threshold, or the negation of the first compression threshold, thereby completing the compression of each gradient data in the second gradient data.
19. The computer-readable storage medium according to claim 17, wherein the compressing each gradient data in the second gradient data according to the 4-bit compression strategy comprises:
    determining a compression threshold group corresponding to the second gradient data according to the average gradient value of the second gradient data;
    compressing each gradient data in the second gradient data to the corresponding compression threshold in the compression threshold group, thereby completing the compression of each gradient data in the second gradient data.
20. The computer-readable storage medium according to claim 19, wherein the determining the compression threshold group corresponding to the second gradient data according to the average gradient value of the second gradient data comprises:
    determining each second compression threshold corresponding to the second gradient data according to a preset difference and the average gradient value;
    generating the compression threshold group according to each second compression threshold and the negation of each second compression threshold.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210044216.2A CN114386622A (en) 2022-01-14 2022-01-14 Gradient compression method, device, equipment and storage medium
CN202210044216.2 2022-01-14

Publications (1)

Publication Number Publication Date
WO2023134065A1 (en)

Family

ID=81201355

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/089866 WO2023134065A1 (en) 2022-01-14 2022-04-28 Gradient compression method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN114386622A (en)
WO (1) WO2023134065A1 (en)

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN114386622A (en) * 2022-01-14 2022-04-22 平安科技(深圳)有限公司 Gradient compression method, device, equipment and storage medium


Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN109951438A (en) * 2019-01-15 2019-06-28 中国科学院信息工程研究所 A kind of communication optimization method and system of distribution deep learning
US20210295168A1 (en) * 2020-03-23 2021-09-23 Amazon Technologies, Inc. Gradient compression for distributed training
CN112817653A (en) * 2021-01-22 2021-05-18 西安交通大学 Cloud-side-based federated learning calculation unloading computing system and method
CN113487036A (en) * 2021-06-24 2021-10-08 浙江大学 Distributed training method and device of machine learning model, electronic equipment and medium
CN114386622A (en) * 2022-01-14 2022-04-22 平安科技(深圳)有限公司 Gradient compression method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN114386622A (en) 2022-04-22


Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22919706

Country of ref document: EP

Kind code of ref document: A1