CN114548421A - Optimization processing method and device for federal learning communication overhead - Google Patents

Optimization processing method and device for federal learning communication overhead

Info

Publication number
CN114548421A
Authority
CN
China
Prior art keywords
gradient
quantization
local client
round
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210023353.8A
Other languages
Chinese (zh)
Inventor
刘洋
丁文伯
赵子号
毛钰竹
黄绍伦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202210023353.8A
Publication of CN114548421A
Legal status: Pending (Current)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H04L41/0823 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides an optimization processing method and device for federal learning communication overhead. The method comprises the following steps: distributing the initial global model to a local client, and obtaining the target quantization level of the current round, which is computed by the local client based on an adaptive gradient quantization model, the target quantization level being used to quantize the gradient uploaded by the local client in the current round; determining, based on a lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round, the quantization gradient being obtained by the local client quantizing the gradient to be uploaded in the current round with the target quantization level; and aggregating the quantization gradients and updating the initial global model according to the aggregation result to obtain the target global model for the next round. According to the method provided by the invention, the quantization level is dynamically adjusted by the adaptive gradient quantization model and the communication frequency is adjusted by the lazy gradient aggregation model, so that communication efficiency is effectively improved and communication overhead is reduced.

Description

Optimization processing method and device for federal learning communication overhead
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an optimization processing method and device for federal learning communication overhead. The invention further relates to an electronic device and a processor-readable storage medium.
Background
In the federal learning process, the local clients and the central server need to interact many times to obtain a global model that meets the accuracy requirement. For a complex model training process, such as the training of a deep learning model, each model update may contain a large number of model parameters, so the communication overhead of federal learning is high and its communication efficiency is low. Research on improving the communication efficiency of federal learning is therefore of great value.
Currently, to improve the communication efficiency of federal learning, the prior art generally either reduces the number of model transmissions or reduces the number of bits uploaded by each client per round. Although both approaches can reduce the communication overhead to a certain extent, each has limitations; for example, the quantization level is usually determined empirically or by grid search. Therefore, how to design an efficient optimization processing scheme for the federal learning communication overhead has become an urgent problem to be solved.
Disclosure of Invention
Therefore, the invention provides an optimization processing method and device for the federal learning communication overhead, aiming to overcome the defect in the prior art that existing optimization schemes for the federal learning communication overhead are highly limited and thus yield poor communication efficiency.
In a first aspect, the present invention provides an optimization processing method for federal learning communication overhead, which is applied to a central server and includes:
distributing the initial global model to a local client, and obtaining the target quantization level of the current round, which is computed by the local client based on a preset adaptive gradient quantization model; the target quantization level is used to quantize the gradient uploaded by the local client in the current round;
determining, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round; the quantization gradient is obtained by the local client quantizing the gradient to be uploaded in the current round with the target quantization level;
and aggregating the quantization gradients, and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
Further, determining, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round specifically includes:
judging, based on the preset lazy gradient aggregation model, whether the difference between the quantization gradient corresponding to the current round and the quantization gradient corresponding to the previous round is greater than or equal to a preset gradient range threshold, and if so, determining to acquire the quantization gradient corresponding to the local client in the current round.
Further, aggregating the quantization gradients and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round specifically includes:
aggregating the quantization gradients based on a preset quantization gradient aggregation model to obtain the corresponding aggregation result for the gradient descent update;
and updating the initial global model corresponding to the current training round according to the aggregation result to obtain the target global model corresponding to the next training round.
Further, the adaptive gradient quantization model corresponds to the following formula (1):

[Formula (1) appears as an equation image in the original publication and is not reproduced here; it computes the target quantization level b_m^k of a local client from the initial quantization level b_0 and an adaptive factor.]

In the formula, b_m^k represents the target quantization level of the m-th local client in the k-th round; b_0 represents the initial quantization level corresponding to each local client; the adaptive factor represents an adaptive process and is obtained adaptively during model training from Q(g_m^1), the quantization gradient corresponding to the m-th local client in round 1; Q(g_m^k), the quantization gradient corresponding to the m-th local client in round k; ĝ_m^0, the quantization gradient actually uploaded by the m-th local client in round 0; and ĝ_m^(k−1), the quantization gradient actually uploaded by the m-th local client in round (k−1); m denotes the index of the local client.
Further, the lazy gradient aggregation model corresponds to the following formula (2):

||Q_{b_m^k}(g_m^k) − ĝ_m^(k−1)||_2^2 ≥ (1 / (α^2 · M^2)) · Σ_{d=1..D} ξ_d · ||θ^(k+1−d) − θ^(k−d)||_2^2 + 3 · (ε_m^k + ε_m^(k−1))   (2)

In the formula, Q_{b_m^k}(g_m^k) indicates the quantization gradient obtained when the m-th local client quantizes its gradient g_m^k with the target quantization level b_m^k in the k-th round; ĝ_m^(k−1) represents the quantization gradient actually used by the m-th local client in the (k−1)-th round; the left-hand difference is used to measure the amount of the m-th local client's update in the k-th round; the right-hand side is the gradient range threshold, where θ is the parameter vector of the global model, M is the number of local clients, and D is the total number of rounds considered; ε_m^k represents the deviation between the true gradient and the quantization gradient, and ε_m^(k−1) represents the deviation between the quantization gradient and the original gradient in the previous round; ξ_d and α are two hyperparameters, where ξ_d is a penalty term for quantizing the gradient.
In a second aspect, the present invention further provides an optimization processing method for federal learning communication overhead, which is applied to a local client and includes:
acquiring the initial global model distributed by the central server;
determining the target quantization level of the current round based on a preset adaptive gradient quantization model, and quantizing the gradient to be uploaded in the current round with the target quantization level to obtain the quantization gradient corresponding to the current round;
and uploading the quantization gradient corresponding to the current round to the central server, so that the central server can aggregate the quantization gradients.
In a third aspect, the present invention further provides an optimization processing apparatus for federal learning communication overhead, including:
a quantization level adaptive determination unit, configured to distribute the initial global model to a local client and obtain the target quantization level of the current round, which is computed by the local client based on a preset adaptive gradient quantization model; the target quantization level is used to quantize the gradient uploaded by the local client in the current round;
a lazy gradient aggregation processing unit, configured to determine, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round; the quantization gradient is obtained by the local client quantizing the gradient to be uploaded in the current round with the target quantization level;
and a quantization gradient aggregation processing unit, configured to aggregate the quantization gradients and update the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
Further, the lazy gradient aggregation processing unit is specifically configured to:
judge, based on the preset lazy gradient aggregation model, whether the difference between the quantization gradient corresponding to the current round and the quantization gradient corresponding to the previous round is greater than or equal to the preset gradient range threshold, and if so, determine to acquire the quantization gradient corresponding to the local client in the current round.
Further, the quantization gradient aggregation processing unit is specifically configured to:
aggregate the quantization gradients based on a preset quantization gradient aggregation model to obtain the corresponding aggregation result for the gradient descent update;
and update the initial global model corresponding to the current training round according to the aggregation result to obtain the target global model corresponding to the next training round.
Further, the adaptive gradient quantization model corresponds to the following formula (1):

[Formula (1) appears as an equation image in the original publication and is not reproduced here; it computes the target quantization level b_m^k of a local client from the initial quantization level b_0 and an adaptive factor.]

In the formula, b_m^k represents the target quantization level of the m-th local client in the k-th round; b_0 represents the initial quantization level corresponding to each local client; the adaptive factor represents an adaptive process and is obtained adaptively during model training from Q(g_m^1), the quantization gradient corresponding to the m-th local client in round 1; Q(g_m^k), the quantization gradient corresponding to the m-th local client in round k; ĝ_m^0, the quantization gradient actually uploaded by the m-th local client in round 0; and ĝ_m^(k−1), the quantization gradient actually uploaded by the m-th local client in round (k−1); m denotes the index of the local client.
Further, the lazy gradient aggregation model corresponds to the following formula (2):

||Q_{b_m^k}(g_m^k) − ĝ_m^(k−1)||_2^2 ≥ (1 / (α^2 · M^2)) · Σ_{d=1..D} ξ_d · ||θ^(k+1−d) − θ^(k−d)||_2^2 + 3 · (ε_m^k + ε_m^(k−1))   (2)

In the formula, Q_{b_m^k}(g_m^k) indicates the quantization gradient obtained when the m-th local client quantizes its gradient g_m^k with the target quantization level b_m^k in the k-th round; ĝ_m^(k−1) represents the quantization gradient actually used by the m-th local client in the (k−1)-th round; the left-hand difference is used to measure the amount of the m-th local client's update in the k-th round; the right-hand side is the gradient range threshold, where θ is the parameter vector of the global model, M is the number of local clients, and D is the total number of rounds considered; ε_m^k represents the deviation between the true gradient and the quantization gradient, and ε_m^(k−1) represents the deviation between the quantization gradient and the original gradient in the previous round; ξ_d and α are two hyperparameters, where ξ_d is a penalty term for quantizing the gradient.
In a fourth aspect, the present invention further provides an optimization processing apparatus for federal learning communication overhead, including:
a global model acquisition unit, configured to acquire the initial global model distributed by the central server;
a quantization level adaptive processing unit, configured to determine the target quantization level of the current round based on a preset adaptive gradient quantization model, and quantize the gradient to be uploaded in the current round with the target quantization level to obtain the quantization gradient corresponding to the current round;
and a quantization gradient uploading unit, configured to upload the quantization gradient corresponding to the current round to the central server, so that the central server can aggregate the quantization gradients.
In a fifth aspect, the present invention further provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of any of the above optimization processing methods for federal learning communication overhead.
In a sixth aspect, the present invention further provides a processor-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above optimization processing methods for federal learning communication overhead.
According to the optimization processing method for the federal learning communication overhead provided by the invention, the quantization level is adaptively and dynamically adjusted by the adaptive gradient quantization model, and the communication frequency is adjusted by the lazy gradient aggregation model, so that the communication efficiency is effectively improved, the overall number of transmitted bits is reduced, the communication overhead is lowered, and efficient transmission in federal learning is better realized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a first schematic flowchart of the optimization processing method for federal learning communication overhead according to an embodiment of the present invention;
fig. 2 is a complete schematic diagram of a federal learning framework corresponding to the optimization processing method for federal learning communication overhead provided by the embodiment of the present invention;
fig. 3 is a schematic diagram of gradient quantization in an optimization processing method for federal learning communication overhead according to an embodiment of the present invention;
fig. 4 is a first schematic structural diagram of an optimization processing apparatus for federal learning communication overhead according to an embodiment of the present invention;
fig. 5 is a second flowchart of the optimization processing method for federal learning communication overhead according to the embodiment of the present invention;
fig. 6 is a second schematic structural diagram of an optimization processing apparatus for federal learning communication overhead according to an embodiment of the present invention;
fig. 7 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
To address the optimization problem of the federal learning communication overhead, the invention provides an optimization processing method for the federal learning communication overhead: a communication-efficient collaborative learning framework, AQUILA (adaptive quantization of lazily-aggregated gradients), obtained by combining adaptive model quantization with a lazy gradient aggregation strategy, is applied to the federal learning framework to reduce the overall communication traffic of the local clients, thereby better realizing efficient transmission in federal learning.
The following describes embodiments of the optimization processing method for federal learning communication overhead in detail. As shown in fig. 1, which is a first schematic flowchart of the optimization processing method for federal learning communication overhead provided by an embodiment of the present invention, the specific implementation process includes the following steps:
step 101: distributing the initial global model to a local client, and obtaining a target quantization grade of the current round, which is obtained by the local client based on a preset adaptive gradient quantization model; and the target quantization level is used for quantizing the gradient uploaded by the local client in the current turn.
In a federal learning scenario, different local clients can quantize their gradients with different quantization levels, thereby obtaining different quantization gradients. In the embodiment of the present invention, the communication collaborative learning framework mainly selects an optimal number of quantization bits for each communication round of LAQ (Lazily Aggregated Quantization) by optimizing the gradient loss caused by skipping quantized updates.
As shown in fig. 2, which illustrates the training process of the communication collaborative learning framework, the framework includes an adaptive gradient quantization model (quantization level selection) and a lazy gradient aggregation model (lazy aggregation). At each training round k, the central server first distributes the initial global model to each local client by broadcasting; the local client m then calculates its target quantization level b_m^k according to formula (1) corresponding to the adaptive gradient quantization model:

[Formula (1) appears as an equation image in the original publication and is not reproduced here; it computes the target quantization level b_m^k from the initial quantization level b_0 and an adaptive factor.]

In the formula, b_m^k represents the target (optimal) quantization level of the m-th local client in the k-th round; b_0 represents the initial quantization level corresponding to each local client (for example, an initial quantization level of 2); the adaptive factor is obtained adaptively during training from Q(g_m^1), the quantization gradient corresponding to the m-th local client in round 1; Q(g_m^k), the quantization gradient corresponding to the m-th local client in round k; ĝ_m^0, the quantization gradient actually uploaded by the m-th local client in round 0; and ĝ_m^(k−1), the quantization gradient actually uploaded by the m-th local client in round (k−1); m denotes the index of the local client. If the computed target quantization level is higher than a preset threshold b_max, then b_m^k is set to b_max.
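The level selection of this step can be sketched as follows. This is only an illustration: the adaptive factor of formula (1) is not reproducible from this text, so it is taken here as a precomputed input, its multiplicative use is an assumption, and all names are hypothetical.

```python
def select_quantization_level(b0: int, adaptive_factor: float, b_max: int) -> int:
    """Sketch of per-round quantization level selection for one client.

    b0              -- initial quantization level shared by all clients (e.g. 2)
    adaptive_factor -- value of the adaptive term of formula (1), assumed to be
                       computed from the round-1/round-0 and round-k/round-(k-1)
                       quantization gradients as the patent describes
    b_max           -- preset upper threshold on the quantization level
    """
    b = max(1, round(b0 * adaptive_factor))  # scale the initial level adaptively
    return min(b, b_max)                     # cap at b_max, as the text specifies
```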
In the embodiment of the invention, after the initial global model is distributed to the local clients, each local client calculates the target quantization level of the current round, i.e. the dynamically adjusted number of quantization bits, based on the preset adaptive gradient quantization model; once the target quantization level is determined, the central server can obtain it from the local client.
Step 102: determining, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round; the quantization gradient is obtained by the local client quantizing the gradient to be uploaded in the current round with the target quantization level.
In the embodiment of the present invention, determining whether to acquire the quantization gradient corresponding to the local client in the current round based on the preset lazy gradient aggregation model specifically includes: judging, based on the preset lazy gradient aggregation model, whether the difference between the quantization gradient corresponding to the current round and the quantization gradient corresponding to the previous round is greater than or equal to a preset gradient range threshold, and if so, determining to acquire the quantization gradient corresponding to the local client in the current round. The local client is a client used for data communication with the central server in the federal learning architecture; the target quantization level determines the number of quantization bits used.
In this step, the central server determines whether the local client should upload its quantization gradient in the current round according to formula (2) corresponding to the lazy gradient aggregation model:

||Q_{b_m^k}(g_m^k) − ĝ_m^(k−1)||_2^2 ≥ (1 / (α^2 · M^2)) · Σ_{d=1..D} ξ_d · ||θ^(k+1−d) − θ^(k−d)||_2^2 + 3 · (ε_m^k + ε_m^(k−1))   (2)

In the formula, Q_{b_m^k}(g_m^k) indicates the quantization gradient obtained when the m-th local client quantizes its gradient g_m^k with the target quantization level b_m^k in the k-th round; ĝ_m^(k−1) represents the quantization gradient actually used by the m-th local client (or client m) in the (k−1)-th round; the left-hand difference is used to measure the amount of the m-th local client's update in the k-th round (the difference between the quantization gradient corresponding to the current round and the quantization gradient corresponding to the previous round); the right-hand side is the gradient range threshold, where θ is the parameter vector of the global model, M is the number of local clients, and D is the total number of rounds considered; ε_m^k represents the deviation between the true gradient and the quantization gradient, and ε_m^(k−1) represents the deviation between the quantization gradient and the original gradient in the previous round; ξ_d and α are two preset hyperparameters, where ξ_d is a penalty term for quantizing the gradient (some information is lost when going from 32 bits down to, e.g., 2 bits).
If formula (2) above holds, the corresponding local client computes the quantization gradient Q_b(g) using the following formula (3), and the central server obtains the uploaded quantization gradient Q_b(g):

Q_b(g_i) = ||g||_2 · sign(g_i) · ξ_i(g, b)   (3)

In the formula, g denotes the gradient (a vector) and i indexes its components; the quantization operation above is applied to each component g_i, finally yielding Q_b(g); ||g||_2 denotes the L2 norm of the gradient g; sign(g_i) is the sign of g_i (for example, +1 if positive and −1 if negative); and ξ_i(g, b) is a random variable whose value is drawn according to the probabilities defined in formula (4) below.
If formula (2) above does not hold, the corresponding local client does not upload the quantization gradient Q_b(g) in the current round, thereby achieving the goal of transmitting only the model parameters (or gradients) with a sufficiently large update, using fewer quantization bits overall.
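A minimal sketch of this skip/upload decision is given below. It assumes the right-hand side of formula (2) has already been folded into a single threshold value; the function and variable names are hypothetical.

```python
import numpy as np

def should_upload(q_grad_k: np.ndarray, q_grad_prev: np.ndarray, threshold: float) -> bool:
    """Upload only if the squared change between this round's quantization gradient
    and the previously uploaded one reaches the gradient range threshold of
    formula (2); otherwise the client stays silent and the server reuses the
    stale gradient (lazy aggregation)."""
    update_magnitude = float(np.sum((q_grad_k - q_grad_prev) ** 2))
    return update_magnitude >= threshold
```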
In the embodiment of the present invention, AQUILA uses QSGD (Quantized Stochastic Gradient Descent) for quantization. The quantization rule is shown in formula (4) below, where l ∈ [0, b] is the integer such that |g_i| / ||g||_2 ∈ [l/b, (l+1)/b]. For each gradient component g_i, QSGD randomly rounds the normalized magnitude |g_i| / ||g||_2 to one of the two endpoints of this subinterval, with probabilities chosen so that the quantization gradient is an unbiased estimate of the original gradient. To allow a more intuitive understanding of QSGD, the example of fig. 3 was devised. In fig. 3, the quantization uses the 5 endpoints 0, 0.25, 0.5, 0.75 and 1 (that is, b = 4). If a component of the normalized original gradient has the value 0.6, it is mapped to 0.5 with probability 0.6 and to 0.75 with probability 0.4, so that its expected value remains 0.6.

ξ_i(g, b) = l/b with probability 1 − p, and (l + 1)/b with probability p, where p = b · |g_i| / ||g||_2 − l   (4)
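Formulas (3) and (4) can be realized in a few lines; the following minimal sketch (function name and example values are ours, not the patent's) reproduces the fig. 3 behaviour.

```python
import numpy as np

def qsgd_quantize(g: np.ndarray, b: int, rng: np.random.Generator) -> np.ndarray:
    """Stochastically quantize gradient g onto the b+1 endpoints {0, 1/b, ..., 1}
    of the normalized magnitude axis, per formulas (3)-(4):
    Q_b(g_i) = ||g||_2 * sign(g_i) * xi_i(g, b)."""
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return np.zeros_like(g)
    a = np.abs(g) / norm                        # normalized magnitudes in [0, 1]
    l = np.floor(a * b)                         # lower endpoint index: a in [l/b, (l+1)/b]
    p = a * b - l                               # probability of rounding up to (l+1)/b
    xi = (l + (rng.random(g.shape) < p)) / b    # E[xi] = a, so the estimate is unbiased
    return norm * np.sign(g) * xi

rng = np.random.default_rng(0)
g = np.array([0.6, -0.8])                       # ||g||_2 = 1.0, matching the fig. 3 example
print(qsgd_quantize(g, 4, rng))                 # 0.6 -> 0.5 or 0.75; 0.8 -> 0.75 or 1.0
```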
Step 103: aggregating the quantization gradients, and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
In the embodiment of the present invention, aggregating the quantization gradients and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round specifically includes: aggregating the quantization gradients based on a preset quantization gradient aggregation model to obtain the corresponding aggregation result for the gradient descent update; and updating the initial global model corresponding to the current training round according to the aggregation result to obtain the target global model corresponding to the next training round.
In practical implementation, the central server receives the quantization gradients Q_b(g) from the local clients, aggregates these quantization gradients, and uses them to update the initial global model according to the following formula (5):

θ^(k+1) = θ^k − (α / M) · Σ_{m=1..M} ĝ_m^k   (5)

In the formula, α represents the learning rate (learning step size); θ^k is the initial global model of the k-th round (corresponding to the current round), i.e. the global parameter vector of round k; θ^(k+1) is the target global model of the updated (k+1)-th round (corresponding to the next round); m indexes the local clients and M is the number of local clients; ĝ_m^k is the latest quantization gradient of client m (for a client that skipped uploading in round k, this is its most recently uploaded quantization gradient); the summation term is the aggregated gradient of the gradient descent step, after which θ^(k+1), i.e. the target global model of round (k+1), is obtained.
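A minimal sketch of this server-side update, assuming the uniform-averaging form of formula (5) written above (itself a reconstruction) and hypothetical names:

```python
import numpy as np

def server_update(theta_k: np.ndarray, latest_q_grads: list[np.ndarray], alpha: float) -> np.ndarray:
    """One global round: aggregate the latest quantization gradient of each of the
    M clients (a stale one for clients that skipped this round) and take a
    gradient descent step, per formula (5)."""
    aggregated = np.mean(latest_q_grads, axis=0)  # (1/M) * sum over clients
    return theta_k - alpha * aggregated           # theta^{k+1} = theta^k - alpha * aggregate
```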
It should be noted that, in the embodiment of the present invention, the quantization level (or target quantization level) indicates how finely a quantization target (for example, a gradient) is discretized: quantizing a gradient with a given quantization level yields the quantization gradient. The number of quantization bits is the number of bits used to represent each component of the uploaded gradient (vector); for example, a component may originally occupy 32 bits and be reduced to 2 bits. In other words, the gradient is a vector of components, each component is originally stored with, e.g., 32 bits, and quantizing it with quantization level 2 finally yields 2-bit components, i.e. the quantization gradient of the gradient.
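As an illustrative calculation (using only the 32-bit and 2-bit figures above): uploading a gradient with 10^6 components at the original 32 bits per component costs 3.2 × 10^7 bits, whereas quantizing each component to 2 bits costs about 2 × 10^6 bits plus a small constant overhead (e.g. 32 bits for the norm ||g||_2), roughly a 16-fold reduction in upload traffic per round.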
According to the optimization processing method for the federal learning communication overhead provided by the invention, the quantization level is adaptively and dynamically adjusted by the adaptive gradient quantization model, and the communication frequency is adjusted by the lazy gradient aggregation model, so that the communication efficiency is effectively improved, the overall number of transmitted bits is reduced, the communication overhead is lowered, and efficient transmission in federal learning is better realized.
Corresponding to the optimization processing method for federal learning communication overhead, the invention further provides an optimization processing apparatus for federal learning communication overhead. Since the apparatus embodiment is similar to the method embodiment described above, its description is relatively brief; for relevant details, refer to the description in the method embodiment above. The following description of the embodiment of the optimization processing apparatus for federal learning communication overhead is illustrative only. Fig. 4 is a first schematic structural diagram of an optimization processing apparatus for federal learning communication overhead according to an embodiment of the present invention.
The optimization processing apparatus for federal learning communication overhead of the invention includes the following parts:
a quantization level adaptive determination unit 401, configured to distribute the initial global model to a local client and obtain the target quantization level of the current round, which is computed by the local client based on a preset adaptive gradient quantization model; the target quantization level is used to quantize the gradient uploaded by the local client in the current round;
a lazy gradient aggregation processing unit 402, configured to determine, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round; the quantization gradient is obtained by the local client quantizing the gradient to be uploaded in the current round with the target quantization level;
and a quantization gradient aggregation processing unit 403, configured to aggregate the quantization gradients and update the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
Further, the lazy gradient aggregation processing unit is specifically configured to:
judge, based on the preset lazy gradient aggregation model, whether the difference between the quantization gradient corresponding to the current round and the quantization gradient corresponding to the previous round is greater than or equal to the preset gradient range threshold, and if so, determine to acquire the quantization gradient corresponding to the local client in the current round.
Further, the quantization gradient aggregation processing unit is specifically configured to:
aggregate the quantization gradients based on a preset quantization gradient aggregation model to obtain the corresponding aggregation result for the gradient descent update;
and update the initial global model corresponding to the current training round according to the aggregation result to obtain the target global model corresponding to the next training round.
Further, the adaptive gradient quantization model corresponds to the following formula (1):

[Formula (1) appears as an equation image in the original publication and is not reproduced here; it computes the target quantization level b_m^k of a local client from the initial quantization level b_0 and an adaptive factor.]

In the formula, b_m^k represents the target quantization level of the m-th local client in the k-th round; b_0 represents the initial quantization level corresponding to each local client; the adaptive factor represents an adaptive process and is obtained adaptively during model training from Q(g_m^1), the quantization gradient corresponding to the m-th local client in round 1; Q(g_m^k), the quantization gradient corresponding to the m-th local client in round k; ĝ_m^0, the quantization gradient actually uploaded by the m-th local client in round 0; and ĝ_m^(k−1), the quantization gradient actually uploaded by the m-th local client in round (k−1); m denotes the index of the local client.
Further, the lazy gradient aggregation model corresponds to the following formula (2):

||Q_{b_m^k}(g_m^k) − ĝ_m^(k−1)||_2^2 ≥ (1 / (α^2 · M^2)) · Σ_{d=1..D} ξ_d · ||θ^(k+1−d) − θ^(k−d)||_2^2 + 3 · (ε_m^k + ε_m^(k−1))   (2)

In the formula, Q_{b_m^k}(g_m^k) indicates the quantization gradient obtained when the m-th local client quantizes its gradient g_m^k with the target quantization level b_m^k in the k-th round; ĝ_m^(k−1) represents the quantization gradient actually used by the m-th local client in the (k−1)-th round; the left-hand difference is used to measure the amount of the m-th local client's update in the k-th round; the right-hand side is the gradient range threshold, where θ is the parameter vector of the global model, M is the number of local clients, and D is the total number of rounds considered; ε_m^k represents the deviation between the true gradient and the quantization gradient, and ε_m^(k−1) represents the deviation between the quantization gradient and the original gradient in the previous round; ξ_d and α are two hyperparameters, where ξ_d is a penalty term for quantizing the gradient.
By adopting the optimization processing apparatus for the federal learning communication overhead, the quantization level is adaptively and dynamically adjusted by the adaptive gradient quantization model, and the communication frequency is adjusted by the lazy gradient aggregation model, so that the communication efficiency is effectively improved, the overall number of transmitted bits is reduced, the communication overhead is lowered, and efficient transmission in federal learning is better realized.
Correspondingly, the invention further provides another optimization processing method and apparatus for federal learning communication overhead, corresponding to the optimization processing method and apparatus described above. Since these embodiments are similar to the embodiments described above, their description is relatively brief; refer to the description in the embodiments above. The embodiments of the method and apparatus described below are illustrative only. Fig. 5 is a second schematic flowchart of the optimization processing method for federal learning communication overhead according to an embodiment of the present invention.
Step 501: and acquiring an initial global model distributed by the central server.
Step 502: determining the target quantization level of the current round based on a preset adaptive gradient quantization model, and quantizing the gradient to be uploaded in the current round with the target quantization level to obtain the quantization gradient corresponding to the current round.
Step 503: uploading the quantization gradient corresponding to the current round to the central server, so that the central server can aggregate the quantization gradients.
Fig. 6 is a second schematic structural diagram of an optimization processing apparatus for federal learning communication overhead according to an embodiment of the present invention. The apparatus includes the following units:
a global model acquisition unit 601, configured to acquire the initial global model distributed by the central server;
a quantization level adaptive processing unit 602, configured to determine the target quantization level of the current round based on a preset adaptive gradient quantization model, and quantize the gradient to be uploaded in the current round with the target quantization level to obtain the quantization gradient corresponding to the current round;
and a quantization gradient uploading unit 603, configured to upload the quantization gradient corresponding to the current round to the central server, so that the central server can aggregate the quantization gradients.
Corresponding to the optimization processing method for federal learning communication overhead, the invention further provides an electronic device. Since the electronic device embodiment is similar to the method embodiment above, its description is brief; refer to the description of the method embodiment above. The electronic device described below is illustrative only. Fig. 7 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention. The electronic device may include: a processor 701, a memory 702 and a communication bus 703, wherein the processor 701 and the memory 702 communicate with each other through the communication bus 703 and with the outside through a communication interface 704. The processor 701 may invoke logic instructions in the memory 702 to perform the optimization processing method for federal learning communication overhead, the method including: distributing the initial global model to a local client, and obtaining the target quantization level of the current round, which is computed by the local client based on a preset adaptive gradient quantization model, the target quantization level being used to quantize the gradient uploaded by the local client in the current round; determining, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round, the quantization gradient being obtained by the local client quantizing the gradient to be uploaded in the current round with the target quantization level; and aggregating the quantization gradients and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
Furthermore, the logic instructions in the memory 702 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a Memory chip, a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, an embodiment of the present invention further provides a computer program product, including a computer program stored on a processor-readable storage medium, the computer program including program instructions which, when executed by a computer, enable the computer to execute the optimization processing method for federal learning communication overhead provided by the above method embodiments. The method includes: distributing the initial global model to a local client, and obtaining the target quantization level of the current round, which is computed by the local client based on a preset adaptive gradient quantization model, the target quantization level being used to quantize the gradient uploaded by the local client in the current round; determining, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round, the quantization gradient being obtained by the local client quantizing the gradient to be uploaded in the current round with the target quantization level; and aggregating the quantization gradients and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
In still another aspect, an embodiment of the present invention further provides a processor-readable storage medium storing a computer program which, when executed by a processor, performs the optimization processing method for federal learning communication overhead provided by the above embodiments. The method includes: distributing the initial global model to a local client, and obtaining the target quantization level of the current round, which is computed by the local client based on a preset adaptive gradient quantization model, the target quantization level being used to quantize the gradient uploaded by the local client in the current round; determining, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round, the quantization gradient being obtained by the local client quantizing the gradient to be uploaded in the current round with the target quantization level; and aggregating the quantization gradients and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
The processor-readable storage medium can be any available medium or data storage device that can be accessed by a processor, including, but not limited to, magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, non-volatile memory (NAND FLASH), Solid State Disks (SSDs)), etc.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An optimization processing method for federal learning communication overhead, applied to a central server, characterized by comprising the following steps:
distributing the initial global model to a local client, and obtaining the target quantization level of the current round, which is computed by the local client based on a preset adaptive gradient quantization model; the target quantization level is used to quantize the gradient uploaded by the local client in the current round;
determining, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round; the quantization gradient is obtained by the local client quantizing the gradient to be uploaded in the current round with the target quantization level;
and aggregating the quantization gradients, and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
2. The optimization processing method for federal learning communication overhead according to claim 1, wherein determining, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round specifically comprises:
judging, based on the preset lazy gradient aggregation model, whether the difference between the quantization gradient corresponding to the current round and the quantization gradient corresponding to the previous round is greater than or equal to a preset gradient range threshold, and if so, determining to acquire the quantization gradient corresponding to the local client in the current round.
3. The optimization processing method for federal learning communication overhead according to claim 1, wherein aggregating the quantization gradients and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round specifically comprises:
aggregating the quantization gradients based on a preset quantization gradient aggregation model to obtain the corresponding aggregation result for the gradient descent update;
and updating the initial global model corresponding to the current training round according to the aggregation result to obtain the target global model corresponding to the next training round.
4. The optimization processing method for federal learning communication overhead according to claim 1, wherein the adaptive gradient quantization model corresponds to the following formula (1):

[Formula (1) appears as an equation image in the original publication and is not reproduced here; it computes the target quantization level b_m^k of a local client from the initial quantization level b_0 and an adaptive factor.]

In the formula, b_m^k represents the target quantization level of the m-th local client in the k-th round; b_0 represents the initial quantization level corresponding to each local client; the adaptive factor represents an adaptive process and is obtained adaptively during model training from Q(g_m^1), the quantization gradient corresponding to the m-th local client in round 1; Q(g_m^k), the quantization gradient corresponding to the m-th local client in round k; ĝ_m^0, the quantization gradient actually uploaded by the m-th local client in round 0; and ĝ_m^(k−1), the quantization gradient actually uploaded by the m-th local client in round (k−1); m denotes the index of the local client.
5. The optimization processing method for federal learning communication overhead according to claim 2, wherein the lazy gradient aggregation model corresponds to the following formula (2):

||Q_{b_m^k}(g_m^k) − ĝ_m^(k−1)||_2^2 ≥ (1 / (α^2 · M^2)) · Σ_{d=1..D} ξ_d · ||θ^(k+1−d) − θ^(k−d)||_2^2 + 3 · (ε_m^k + ε_m^(k−1))   (2)

In the formula, Q_{b_m^k}(g_m^k) indicates the quantization gradient obtained when the m-th local client quantizes its gradient g_m^k with the target quantization level b_m^k in the k-th round; ĝ_m^(k−1) represents the quantization gradient actually used by the m-th local client in the (k−1)-th round; the left-hand difference is used to measure the amount of the m-th local client's update in the k-th round; the right-hand side is the gradient range threshold, where θ is the parameter vector of the global model, M is the number of local clients, and D is the total number of rounds considered; ε_m^k represents the deviation between the true gradient and the quantization gradient, and ε_m^(k−1) represents the deviation between the quantization gradient and the original gradient in the previous round; ξ_d and α are two hyperparameters, where ξ_d is a penalty term for quantizing the gradient.
6. An optimization processing method for federal learning communication overhead, applied to a local client, characterized by comprising the following steps:
acquiring the initial global model distributed by the central server;
determining the target quantization level of the current round based on a preset adaptive gradient quantization model, and quantizing the gradient to be uploaded in the current round with the target quantization level to obtain the quantization gradient corresponding to the current round;
and uploading the quantization gradient corresponding to the current round to the central server, so that the central server can aggregate the quantization gradients.
7. An optimization processing apparatus for federally learned communication overhead, comprising:
the system comprises a quantization grade self-adaptive determination unit, a target quantization grade determination unit and a target quantization grade determination unit, wherein the quantization grade self-adaptive determination unit is used for distributing an initial global model to a local client and obtaining a target quantization grade of a current turn, which is obtained by the local client based on a preset self-adaptive gradient quantization model; the target quantization level is used for quantizing the gradient uploaded by the local client in the current turn;
the inertia gradient aggregation processing unit is used for determining whether to acquire a quantization gradient corresponding to the local client in the current turn or not based on a preset inertia gradient aggregation model; the quantization gradient is obtained by quantizing the gradient uploaded by the current turn by the local client based on the target quantization level;
and a quantization gradient aggregation processing unit, used for performing aggregation processing on the quantization gradients and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
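For the server side of claim 7, a minimal aggregation sketch follows, under the assumption that clients whose upload was skipped by the inert aggregation test are represented by their last stored quantization gradient and that the update is a plain gradient-descent step; the learning rate lr and the dict layout are illustrative.

```python
# Hedged sketch of server-side aggregation with lazy reuse of stored gradients.
import numpy as np

def aggregate_and_update(theta, fresh_grads, stored_grads, lr=0.1):
    """fresh_grads  : {client_id: quantization gradient uploaded this round}
    stored_grads : {client_id: last known quantization gradient, all clients}"""
    stored_grads.update(fresh_grads)           # lazy reuse for clients that skipped
    mean_grad = np.mean(list(stored_grads.values()), axis=0)
    return theta - lr * mean_grad              # global model for the next round
```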
8. An optimization processing apparatus for federal learning communication overhead, comprising:
a global model acquisition unit, used for acquiring an initial global model distributed by a central server;
a quantization level adaptive processing unit, used for determining a target quantization level of the current round based on a preset adaptive gradient quantization model, and quantizing the gradient to be uploaded in the current round based on the target quantization level to obtain a quantization gradient corresponding to the current round;
and a quantization gradient uploading unit, used for uploading the quantization gradient corresponding to the current round to the central server, so that the central server can perform aggregation processing on the quantization gradient.
9. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the optimization processing method for federal learning communication overhead according to any one of claims 1 to 6.
10. A processor-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the optimization processing method for federal learning communication overhead according to any one of claims 1 to 6.
CN202210023353.8A 2022-01-10 2022-01-10 Optimization processing method and device for federal learning communication overhead Pending CN114548421A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210023353.8A CN114548421A (en) 2022-01-10 2022-01-10 Optimization processing method and device for federal learning communication overhead

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210023353.8A CN114548421A (en) 2022-01-10 2022-01-10 Optimization processing method and device for federal learning communication overhead

Publications (1)

Publication Number Publication Date
CN114548421A true CN114548421A (en) 2022-05-27

Family

ID=81669352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210023353.8A Pending CN114548421A (en) 2022-01-10 2022-01-10 Optimization processing method and device for federal learning communication overhead

Country Status (1)

Country Link
CN (1) CN114548421A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139662A (en) * 2021-04-23 2021-07-20 深圳市大数据研究院 Global and local gradient processing method, device, equipment and medium for federal learning
CN113435604A (en) * 2021-06-16 2021-09-24 清华大学 Method and device for optimizing federated learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIANQIAO WANGNI et al.: "Gradient sparsification for communication-efficient distributed optimization", Proceedings of the 32nd International Conference on Neural Information Processing Systems, 31 December 2018 (2018-12-31) *
DONG YE; HOU WEI; CHEN XIAOJUN; ZENG SHUAI: "Efficient and Secure Federated Learning Based on Secret Sharing and Gradient Selection", Journal of Computer Research and Development, no. 10, 9 October 2020 (2020-10-09) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024094094A1 (en) * 2022-11-02 2024-05-10 华为技术有限公司 Model training method and apparatus

Similar Documents

Publication Publication Date Title
CN112181666B (en) Equipment assessment and federal learning importance aggregation method based on edge intelligence
CN110942154B (en) Data processing method, device, equipment and storage medium based on federal learning
CN111091199B (en) Federal learning method, device and storage medium based on differential privacy
CN113568727B (en) Mobile edge computing task allocation method based on deep reinforcement learning
CN113469325B (en) Hierarchical federation learning method for edge aggregation interval self-adaptive control, computer equipment and storage medium
CN115037608B (en) Quantization method, quantization device, quantization apparatus, and readable storage medium
CN109743713B (en) Resource allocation method and device for electric power Internet of things system
CN116523079A (en) Reinforced learning-based federal learning optimization method and system
CN110072130A (en) A kind of HAS video segment method for pushing based on HTTP/2
WO2022217210A1 (en) Privacy-aware pruning in machine learning
CN115392348A (en) Federal learning gradient quantification method, high-efficiency communication Federal learning method and related device
CN112307044A (en) Adaptive network data acquisition method based on multi-objective optimization and related equipment
CN114548421A (en) Optimization processing method and device for federal learning communication overhead
CN116996938A (en) Internet of vehicles task unloading method, terminal equipment and storage medium
CN113839830B (en) Method, device and storage medium for predicting multiple data packet parameters
CN114422438A (en) Link adjusting method and device of power communication network
CN109219960B (en) Method, device and equipment for optimizing video coding quality smoothness and storage medium
WO2023236609A1 (en) Automatic mixed-precision quantization method and apparatus
WO2023142351A1 (en) Weight adjustment method and apparatus, and storage medium and electronic apparatus
CN112511702B (en) Media frame pushing method, server, electronic equipment and storage medium
CN116491115A (en) Rate controlled machine learning model with feedback control for video coding
US20140321558A1 (en) Video quality measurement considering multiple artifacts
CN115293324A (en) Quantitative perception training method and related device
CN115564055A (en) Asynchronous joint learning training method and device, computer equipment and storage medium
CN113850390A (en) Method, device, equipment and medium for sharing data in federal learning system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination