CN114548421A - Optimization processing method and device for federal learning communication overhead - Google Patents
Optimization processing method and device for federated learning communication overhead
- Publication number
- CN114548421A (application number CN202210023353.8A)
- Authority
- CN
- China
- Prior art keywords
- gradient
- quantization
- local client
- round
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0803—Configuration setting
- H04L41/0823—Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/142—Network analysis or design using statistical or mathematical methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/16—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Abstract
The invention provides an optimization processing method and device for federated learning communication overhead. The method comprises the following steps: distributing an initial global model to a local client, and obtaining the target quantization level of the current round, computed by the local client based on an adaptive gradient quantization model, where the target quantization level is used to quantize the gradient uploaded by the local client in the current round; determining, based on a lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round, where the quantization gradient is obtained by the local client quantizing the gradient to be uploaded in the current round based on the target quantization level; and aggregating the quantization gradients and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round. In the method provided by the invention, the quantization level is dynamically adjusted by the adaptive gradient quantization model, and the communication frequency is adjusted by the lazy gradient aggregation model, so that communication efficiency is effectively improved and communication overhead is reduced.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an optimization processing method and device for federated learning communication overhead. An electronic device and a processor-readable storage medium are also provided.
Background
In the federated learning process, the local clients and the central server need to interact many times to obtain a global model that meets the accuracy requirement. In a complex model training process, such as the training of a deep learning model, each model update may contain a large number of model parameters, making the communication overhead of federated learning substantial and its communication efficiency low. Research on improving the communication efficiency of federated learning is therefore of great value.
At present, to improve federated learning communication efficiency, the prior art generally either reduces the number of model transmissions or reduces the number of bits uploaded by each client per round. Although both approaches can reduce communication overhead to a certain extent, each has limitations; for example, the quantization level is usually determined empirically or by grid search. How to design an efficient optimization scheme for federated learning communication overhead has therefore become a pressing problem.
Disclosure of Invention
Therefore, the invention provides an optimization processing method and device for federated learning communication overhead, aiming to overcome the poor communication efficiency caused by the strong limitations of prior-art optimization schemes for federated learning communication overhead.
In a first aspect, the present invention provides an optimization processing method for federated learning communication overhead, applied to a central server, including:
distributing an initial global model to a local client, and obtaining the target quantization level of the current round, computed by the local client based on a preset adaptive gradient quantization model, where the target quantization level is used to quantize the gradient uploaded by the local client in the current round;
determining, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round, where the quantization gradient is obtained by the local client quantizing the gradient to be uploaded in the current round based on the target quantization level; and
aggregating the quantization gradients, and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
Further, determining, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round specifically includes:
judging, based on the preset lazy gradient aggregation model, whether the difference between the quantization gradient corresponding to the current round and the quantization gradient corresponding to the previous round is greater than or equal to a preset gradient range threshold, and if so, determining to acquire the quantization gradient corresponding to the local client in the current round.
Further, aggregating the quantization gradients and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round specifically includes:
aggregating the quantization gradients based on a preset quantization gradient aggregation model to obtain the corresponding aggregation result after gradient reduction; and
updating the initial global model corresponding to the current training round according to the aggregation result to obtain the target global model corresponding to the next training round.
Further, the adaptive gradient quantization model corresponds to the following formula (1):
in the formula,representing the mth local clientTarget quantization levels in the k-th round, whereinRepresenting a quantization level; b0Representing an initial quantization level corresponding to each local client;representing an adaptive process, obtained adaptively in model training, whereinRepresenting the quantization gradient corresponding to the mth local client at round 1,representing the quantization gradient corresponding to the mth local client in the kth round,representing the quantization gradient actually uploaded by the mth local client in the 0 th round;representing the quantization gradient actually uploaded by the mth local client in the k-1 th round; m denotes the mth local client.
Further, the lazy gradient aggregation model corresponds to the following formula (2). In formula (2), the left-hand side is the difference between the quantization gradient obtained when the m-th local client quantizes its gradient with the target quantization level in the k-th round and the quantization gradient actually uploaded by the m-th local client in round k-1; this difference measures the k-th-round update magnitude of the m-th local client. The right-hand side represents a gradient range threshold, where θ denotes the parameters of the global model and D represents the total number of rounds; it accounts for the difference between the true gradient and the quantized gradient as well as the difference between the quantized gradient and the original gradient. ξ_d and α are two hyperparameters, where ξ_d weights the penalty term for quantizing the gradient.
In a second aspect, the present invention further provides an optimization processing method for federated learning communication overhead, applied to a local client, including:
acquiring the initial global model distributed by a central server;
determining the target quantization level of the current round based on a preset adaptive gradient quantization model, and quantizing the gradient to be uploaded in the current round based on the target quantization level to obtain the quantization gradient corresponding to the current round; and
uploading the quantization gradient corresponding to the current round to the central server, so that the quantization gradients are aggregated at the central server.
In a third aspect, the present invention further provides an optimization processing apparatus for federated learning communication overhead, including:
a quantization level adaptive determination unit, configured to distribute an initial global model to a local client and obtain the target quantization level of the current round, computed by the local client based on a preset adaptive gradient quantization model, where the target quantization level is used to quantize the gradient uploaded by the local client in the current round;
a lazy gradient aggregation processing unit, configured to determine, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round, where the quantization gradient is obtained by the local client quantizing the gradient to be uploaded in the current round based on the target quantization level; and
a quantization gradient aggregation processing unit, configured to aggregate the quantization gradients and update the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
Further, the lazy gradient aggregation processing unit is specifically configured to:
judge, based on the preset lazy gradient aggregation model, whether the difference between the quantization gradient corresponding to the current round and the quantization gradient corresponding to the previous round is greater than or equal to a preset gradient range threshold, and if so, determine to acquire the quantization gradient corresponding to the local client in the current round.
Further, the quantization gradient aggregation processing unit is specifically configured to:
aggregate the quantization gradients based on a preset quantization gradient aggregation model to obtain the corresponding aggregation result after gradient reduction; and
update the initial global model corresponding to the current training round according to the aggregation result to obtain the target global model corresponding to the next training round.
Further, the adaptive gradient quantization model corresponds to the following formula (1):
In formula (1), b_k^m represents the target quantization level of the m-th local client in the k-th round, where b represents a quantization level; b0 represents the initial quantization level corresponding to each local client; the adaptive adjustment term is obtained adaptively during model training from the quantization gradient corresponding to the m-th local client in round 1, the quantization gradient corresponding to the m-th local client in round k, the quantization gradient actually uploaded by the m-th local client in round 0, and the quantization gradient actually uploaded by the m-th local client in round k-1; m denotes the m-th local client.
Further, the lazy gradient aggregation model corresponds to the following formula (2). In formula (2), the left-hand side is the difference between the quantization gradient obtained when the m-th local client quantizes its gradient with the target quantization level in the k-th round and the quantization gradient actually uploaded by the m-th local client in round k-1; this difference measures the k-th-round update magnitude of the m-th local client. The right-hand side represents a gradient range threshold, where θ denotes the parameters of the global model and D represents the total number of rounds; it accounts for the difference between the true gradient and the quantized gradient as well as the difference between the quantized gradient and the original gradient. ξ_d and α are two hyperparameters, where ξ_d weights the penalty term for quantizing the gradient.
In a fourth aspect, the present invention further provides an optimization processing apparatus for federated learning communication overhead, including:
a global model acquisition unit, configured to acquire the initial global model distributed by a central server;
a quantization level adaptive processing unit, configured to determine the target quantization level of the current round based on a preset adaptive gradient quantization model, and to quantize the gradient to be uploaded in the current round based on the target quantization level to obtain the quantization gradient corresponding to the current round; and
a quantization gradient uploading unit, configured to upload the quantization gradient corresponding to the current round to the central server, so that the quantization gradients are aggregated at the central server.
In a fifth aspect, the present invention further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of any of the above optimization processing methods for federated learning communication overhead.
In a sixth aspect, the present invention further provides a processor-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above optimization processing methods for federated learning communication overhead.
With the optimization processing method for federated learning communication overhead provided by the invention, the quantization level is adaptively and dynamically adjusted by the adaptive gradient quantization model, and the communication frequency is adjusted by the lazy gradient aggregation model, so that communication efficiency is effectively improved, the overall number of transmitted bits is reduced, communication overhead is lowered, and efficient transmission in federated learning is better realized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a first flowchart of the optimization processing method for federated learning communication overhead provided by an embodiment of the present invention;
Fig. 2 is an overall schematic diagram of the federated learning framework corresponding to the optimization processing method for federated learning communication overhead provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of gradient quantization in the optimization processing method for federated learning communication overhead provided by an embodiment of the present invention;
Fig. 4 is a first schematic structural diagram of the optimization processing apparatus for federated learning communication overhead provided by an embodiment of the present invention;
Fig. 5 is a second flowchart of the optimization processing method for federated learning communication overhead provided by an embodiment of the present invention;
Fig. 6 is a second schematic structural diagram of the optimization processing apparatus for federated learning communication overhead provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of the physical structure of an electronic device provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
To address the optimization of federated learning communication overhead, the invention provides an optimization processing method in which a communication collaborative learning framework, AQUILA (Adaptive Quantization of Lazily-Aggregated gradients), obtained by combining adaptive model quantization with a lazy gradient aggregation strategy, is applied to the federated learning framework to reduce the overall communication traffic of the local clients, thereby better realizing efficient transmission in federated learning.
Embodiments of the optimization processing method for federated learning communication overhead of the present invention are described in detail below. As shown in Fig. 1, a first flowchart of the optimization processing method for federated learning communication overhead provided by an embodiment of the present invention, the specific implementation process includes the following steps:
step 101: distributing the initial global model to a local client, and obtaining a target quantization grade of the current round, which is obtained by the local client based on a preset adaptive gradient quantization model; and the target quantization level is used for quantizing the gradient uploaded by the local client in the current turn.
In a federated learning scenario, different local clients can quantize their gradients with different quantization levels, thereby obtaining different quantization gradients. In the embodiment of the present invention, the communication collaborative learning framework mainly extends LAQ (Lazily Aggregated Quantization): an optimal number of quantization bits is selected for each communication round by optimizing the gradient loss caused by skipping quantized updates.
As shown in Fig. 2, which illustrates the training process of the communication collaborative learning framework, the framework includes an adaptive gradient quantization model (quantization level selection) and a lazy gradient aggregation model (lazy aggregation strategy). In each training round k, the central server first distributes the initial global model to each local client by broadcasting; local client m then calculates the target quantization level b_k^m according to formula (1) of the adaptive gradient quantization model.
In formula (1), b_k^m represents the target quantization level (optimal quantization level) of the m-th local client in the k-th round, where b represents a quantization level; b0 represents the initial quantization level corresponding to each local client, for example, an initial quantization level of 2; the adaptive adjustment term is obtained adaptively during training from the quantization gradients corresponding to the m-th local client in rounds 1 and k and the quantization gradients actually uploaded by the m-th local client in rounds 0 and k-1; m denotes the m-th local client. If the target quantization level exceeds the preset threshold b_max, b_k^m is set to b_max.
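The level-selection step above can be sketched as follows. Note that formula (1) itself appears only as an image in the original publication, so the adaptive adjustment term here is a HYPOTHETICAL stand-in (the level grows with the relative change between successive quantization gradients); only the b0 starting point and the b_max clipping are taken directly from the text.

```python
import numpy as np

def select_quantization_level(b0, g_curr, g_prev, b_max):
    """Sketch of adaptive quantization level selection.

    HYPOTHETICAL adaptive term: the patent's formula (1) is not
    reproduced in the extracted text, so the level is grown here with
    the relative change between the current quantization gradient and
    the previously uploaded one, then clipped at b_max as described.
    """
    change = np.linalg.norm(g_curr - g_prev) / (np.linalg.norm(g_prev) + 1e-12)
    b = b0 + int(np.ceil(np.log2(1.0 + change)))
    return min(b, b_max)  # clip at the preset threshold b_max
```

With this stand-in, an unchanged gradient keeps the initial level, while a large update pushes the level up until the b_max cap takes effect.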
In the embodiment of the invention, after the initial global model is distributed to the local client, the central server obtains the target quantization level of the current round, that is, the dynamically adjusted number of quantization bits, calculated by the local client based on the preset adaptive gradient quantization model.
Step 102: determining, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round, where the quantization gradient is obtained by the local client quantizing the gradient to be uploaded in the current round based on the target quantization level.
In the embodiment of the present invention, the specific implementation of determining, based on the preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round includes: judging, based on the preset lazy gradient aggregation model, whether the difference between the quantization gradient corresponding to the current round and the quantization gradient corresponding to the previous round is greater than or equal to a preset gradient range threshold, and if so, determining to acquire the quantization gradient corresponding to the local client in the current round. Here, a local client is a client that communicates data with the central server in the federated learning architecture; a target quantization level corresponds to a number of quantization bits, and the quantization levels include the corresponding target quantization level.
In this step, the central server determines whether the local client uploads its quantization gradient in the current round according to the following formula (2) of the lazy gradient aggregation model.
In formula (2), the left-hand side is the difference between the quantization gradient obtained when the m-th local client (client m) quantizes its gradient with the target quantization level in the k-th round and the quantization gradient actually used by the m-th local client in round k-1; this difference measures the k-th-round update magnitude of the m-th local client. The right-hand side represents the gradient range threshold, where θ denotes the parameters of the global model and D represents the total number of rounds; it accounts for the difference between the true gradient and the quantized gradient as well as the difference between the quantized gradient and the original gradient. ξ_d and α are two hyperparameters, where ξ_d is a preset penalty term for quantizing the gradient (reflecting the partial loss incurred when going, for example, from 32 bits to 2 bits) and α is a preset hyperparameter.
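The communication-skipping rule above can be sketched as a simple threshold test. This is a simplified stand-in: the patent's actual right-hand side is a weighted combination of past model differences and quantization-error terms (with hyperparameters ξ_d and α), which a fixed scalar threshold replaces here.

```python
import numpy as np

def should_upload(q_curr, q_prev, threshold):
    """Simplified lazy-aggregation check in the spirit of formula (2):
    the client communicates only when its quantized update has changed
    enough since the round in which it last uploaded.

    threshold stands in for the patent's gradient range threshold,
    whose exact weighted form is not reproduced here.
    """
    return bool(np.linalg.norm(q_curr - q_prev) ** 2 >= threshold)
```

When the check fails, the client skips the round and the central server simply reuses that client's last uploaded quantization gradient.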
If the above formula (2) holds, the corresponding local client computes the quantization gradient Q_b(g) using the following formula (3), and the central server obtains the uploaded quantization gradient Q_b(g).
Q_b(g_i) = ||g||_2 · sign(g_i) · ξ_i(g, b)   (3)
In formula (3), g represents the gradient (a vector) and i indexes its components; applying the above quantization operation to each component g_i yields Q_b(g); ||g||_2 represents the L2 norm of the gradient g; sign(g_i) is the sign of g_i (for example, 1 if positive and -1 if negative); and ξ_i(g, b) is a random value chosen with a certain probability.
If the above formula (2) does not hold, the corresponding local client does not upload the quantization gradient Q_b(g) in the current round, thereby achieving the goal of transmitting the model parameters (or gradients) with larger updates using fewer quantization bits overall.
In the embodiment of the present invention, QSGD (Quantized Stochastic Gradient Descent) is used for quantization in AQUILA. The quantization scheme is shown in formula (4) below, where l ∈ [0, b] is an integer such that |g_i|/||g||_2 ∈ [l/b, (l+1)/b]. For each component g_i of the gradient, QSGD maps the normalized magnitude to one endpoint of its subinterval with a certain probability, so that the quantized gradient is an unbiased estimate of the original gradient. To make QSGD more intuitive, the example of Fig. 3 is provided. In Fig. 3, the target quantization scale is 5: the interval has the 5 endpoints 0, 0.25, 0.5, 0.75, and 1. If a component of the normalized original gradient has a value of 0.6, it is mapped to 0.5 with probability 0.6 and to 0.75 with probability 0.4.
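The stochastic quantization just described can be sketched directly from formula (3) and the subinterval rule. This is a minimal QSGD-style sketch; the function name and the rng parameter are illustrative, not from the patent.

```python
import numpy as np

def qsgd_quantize(g, b, rng=None):
    """QSGD-style stochastic quantization per formula (3):
    Q_b(g_i) = ||g||_2 * sign(g_i) * xi_i(g, b).

    Each normalized magnitude |g_i|/||g||_2 lies in a subinterval
    [l/b, (l+1)/b] and is rounded to one of its endpoints with a
    probability that keeps the quantized gradient unbiased.
    """
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return np.zeros_like(g)
    r = np.abs(g) / norm                  # normalized magnitudes in [0, 1]
    lower = np.floor(r * b)               # index l of the lower endpoint
    p_up = r * b - lower                  # round up with this probability
    xi = (lower + (rng.random(g.shape) < p_up)) / b
    return norm * np.sign(g) * xi
```

With b = 4 intervals (the five endpoints of Fig. 3), a normalized component of 0.6 lands on 0.5 with probability 0.6 and on 0.75 with probability 0.4, matching the example in the text.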
Step 103: aggregating the quantization gradients, and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
In the embodiment of the present invention, the specific implementation of aggregating the quantization gradients and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round includes: aggregating the quantization gradients based on a preset quantization gradient aggregation model to obtain the corresponding aggregation result after gradient reduction; and updating the initial global model corresponding to the current training round according to the aggregation result to obtain the target global model corresponding to the next training round.
In practical implementation, the central server receives the quantization gradients Q_b(g) from the local clients, aggregates them, and updates the initial global model using the following formula (5):
in the formula, α represents a learning rate or a learning step; thetakAn initial global model of the kth round (corresponding to the current round), i.e. a global parameter representing the kth round; thetak+1A target global model representing the updated k +1 round (equivalent to the next round); m represents data of a local client, and M represents the local client;representing a gradient decrease after which θ is obtainedk+1I.e. the target global model of round k + 1; α is the model learning rate.
It should be noted that, in the embodiment of the present invention, the quantization level or target quantization level indicates how many levels are used to quantize a quantization target (for example, a gradient; quantizing a gradient with a given quantization level yields a quantization gradient). The quantization bits are the number of bits used to upload each component of the gradient vector (for example, a component may originally occupy 32 bits and be quantized down to 2 bits). A quantization gradient is thus obtained by passing the gradient through the quantization model (quantization framework): the gradient is a vector with several components, and each component, originally stored in, say, 32 bits, is quantized with a quantization level of 2, finally yielding 2 quantization bits per component, i.e., the quantization gradient of that gradient.
According to the optimization processing method for the federal learning communication overhead, the quantization level is adaptively and dynamically adjusted through the adaptive gradient quantization model, and the communication frequency is adjusted through the inert gradient aggregation model, so that the communication efficiency is effectively improved, the overall number of transmitted bits is reduced, the communication overhead is lowered, and efficient transmission in federal learning is better realized.
Corresponding to the optimization processing method for the federal learning communication overhead, the invention also provides an optimization processing device for the federal learning communication overhead. Since the embodiment of the device is similar to the method embodiment described above, the description is relatively simple, and for relevant points, reference may be made to the description in the above method embodiment section, and the following description of the embodiment of the optimization processing device for federal learning communication overhead is only illustrative. Fig. 4 is a schematic structural diagram of an optimization processing apparatus for federal learning communication overhead according to an embodiment of the present invention.
The optimization processing device for the federal learning communication overhead according to the invention comprises the following parts:
a quantization level adaptive determination unit 401, configured to distribute an initial global model to a local client, and to obtain the target quantization level of the current round determined by the local client based on a preset adaptive gradient quantization model; the target quantization level is used for quantizing the gradient uploaded by the local client in the current round;
an inert gradient aggregation processing unit 402, configured to determine, based on a preset inert gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round; the quantization gradient is obtained by the local client quantizing the gradient to be uploaded in the current round based on the target quantization level;
and a quantization gradient aggregation processing unit 403, configured to perform aggregation processing on the quantization gradients, and to update the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
Further, the inert gradient aggregation processing unit is specifically configured to:
determine, based on a preset inert gradient aggregation model, whether the difference between the quantization gradient corresponding to the current round and the quantization gradient corresponding to the previous round is greater than or equal to a preset gradient range threshold, and if so, determine to acquire the quantization gradient corresponding to the local client in the current round.
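The threshold test above can be sketched as follows. The computation of the gradient range threshold itself (formula (2)) is abstracted into a single `threshold` argument, so this is an assumption-laden illustration rather than the patent's exact rule.

```python
import numpy as np

def should_upload(q_grad_current, q_grad_previous, threshold):
    """Inert (lazy) gradient aggregation rule: upload the current
    round's quantized gradient only if it differs from the previously
    uploaded quantized gradient by at least `threshold`, measured here
    with the squared Euclidean norm."""
    update_size = np.sum((q_grad_current - q_grad_previous) ** 2)
    return update_size >= threshold
```

When the test fails, the client keeps silent for the round and the server reuses the client's last uploaded quantized gradient, which is what saves communication rounds.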
Further, the quantization gradient aggregation processing unit is specifically configured to:
aggregating the quantization gradients based on a preset quantization gradient aggregation model to obtain a corresponding aggregation result for the gradient-descent step;
and updating the initial global model corresponding to the current training round according to the aggregation result to obtain a target global model corresponding to the next training round.
Further, the adaptive gradient quantization model corresponds to the following formula (1):
In the formula, b_m^k represents the target quantization level of the m-th local client in the k-th round, where b denotes a quantization level and b_0 denotes the initial quantization level corresponding to each local client; the remaining term represents an adaptive process, obtained adaptively during model training, in which Q(g_m^1) denotes the quantization gradient corresponding to the m-th local client in round 1, Q(g_m^k) denotes the quantization gradient corresponding to the m-th local client in the k-th round, Q(ĝ_m^0) denotes the quantization gradient actually uploaded by the m-th local client in round 0, and Q(ĝ_m^(k-1)) denotes the quantization gradient actually uploaded by the m-th local client in round k−1; m denotes the m-th local client.
Further, the inert gradient aggregation model corresponds to the following formula (2). In the formula, Q_b(g_m^k) indicates the quantization gradient obtained when the m-th local client quantizes its gradient g_m^k with the target quantization level b_m^k in the k-th round; Q_b(ĝ_m^(k-1)) represents the quantization gradient actually uploaded by the m-th local client in round k−1, obtained by quantizing its gradient with the quantization level actually used in that round; the difference between these two terms is used to measure the k-th-round update amount of the m-th local client, and is compared against a gradient range threshold, θ being a parameter of the global model; D represents the total number of rounds; the remaining two terms represent the difference between the true gradient and the quantized gradient, and the difference between the quantized gradient and the original gradient, respectively; ξ_d and α are two hyperparameters, where ξ_d is a penalty term for the quantization gradient.
By adopting the optimization processing device for the federal learning communication overhead, the quantization level is adaptively and dynamically adjusted through the adaptive gradient quantization model, and the communication frequency is adjusted through the inert gradient aggregation model, so that the communication efficiency is effectively improved, the overall number of transmitted bits is reduced, the communication overhead is lowered, and efficient transmission in federal learning is better realized.
Correspondingly, the invention further provides another optimization processing method and device for the federal learning communication overhead, corresponding to the optimization processing method and device provided above. Since these embodiments are similar to the foregoing method and device embodiments, the description is relatively simple; for relevant points, reference may be made to the description in the above embodiments, and the following description of the second optimization processing method and device for the federal learning communication overhead is only illustrative. Fig. 5 is a second flowchart of an optimization processing method for the federal learning communication overhead according to an embodiment of the present invention.
Step 501: and acquiring an initial global model distributed by the central server.
Step 502: and determining a target quantization grade of the current round based on a preset adaptive gradient quantization model, and quantizing the gradient to be uploaded in the current round based on the target quantization grade to obtain a quantization gradient corresponding to the current round.
Step 503: and uploading the quantization gradient corresponding to the current turn to a central server so as to realize the aggregation processing of the quantization gradient in the central server.
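Steps 501 to 503 together form one client-side round, which can be sketched as follows. The gradient computation and the upload channel are abstracted into callables, and the inline uniform quantizer is an illustrative stand-in for the preset adaptive gradient quantization model; all names are assumptions for the sketch.

```python
import numpy as np

def client_round(global_model, local_gradient_fn, bits, upload_fn):
    """One client-side round (steps 501-503): receive the global model,
    compute the local gradient, quantize it at the round's target
    quantization level `bits`, and upload the quantized gradient."""
    g = local_gradient_fn(global_model)          # step 501/502: local gradient
    levels = 2 ** bits - 1
    g_min, g_max = g.min(), g.max()
    scale = (g_max - g_min) / levels if g_max > g_min else 1.0
    # Uniform b-bit quantizer: snap each component to the nearest level.
    q = g_min + np.round((g - g_min) / scale) * scale
    upload_fn(q)                                 # step 503: send to the server
    return q
```

In practice the target quantization level would come from the adaptive gradient quantization model of formula (1) rather than being passed in as a constant.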
Fig. 6 is a schematic structural diagram of a second optimization processing apparatus for federal learning communication overhead according to an embodiment of the present invention.
A global model obtaining unit 601, configured to obtain an initial global model distributed by a central server;
a quantization level adaptive processing unit 602, configured to determine a target quantization level of a current round based on a preset adaptive gradient quantization model, and quantize a gradient to be uploaded in the current round based on the target quantization level to obtain a quantization gradient corresponding to the current round;
a quantization gradient uploading unit 603, configured to upload the quantization gradient corresponding to the current turn to a central server, so as to implement aggregation processing on the quantization gradient in the central server.
Corresponding to the optimization processing method for the federal learning communication overhead, the invention also provides an electronic device. Since the embodiment of the electronic device is similar to the above method embodiment, the description is simple; please refer to the description of the above method embodiment, and the electronic device described below is only illustrative. Fig. 7 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention. The electronic device may include: a processor (processor) 701, a memory (memory) 702 and a communication bus 703, wherein the processor 701 and the memory 702 communicate with each other through the communication bus 703 and communicate with the outside through a communication interface 704. The processor 701 may invoke logic instructions in the memory 702 to perform the optimization processing method for the federal learning communication overhead, the method comprising: distributing the initial global model to a local client, and obtaining the target quantization level of the current round determined by the local client based on a preset adaptive gradient quantization model; the target quantization level is used for quantizing the gradient uploaded by the local client in the current round; determining whether to acquire a quantization gradient corresponding to the local client in the current round based on a preset inert gradient aggregation model; the quantization gradient is obtained by the local client quantizing the gradient to be uploaded in the current round based on the target quantization level; and performing aggregation processing on the quantization gradients, and updating the initial global model according to the aggregation result to obtain a target global model corresponding to the next round.
Furthermore, the logic instructions in the memory 702 may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a memory chip, a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a processor-readable storage medium, and the computer program includes program instructions, where when the program instructions are executed by a computer, the computer is capable of executing the optimization processing method for the federal learning communication overhead provided by the above-mentioned method embodiments. The method comprises the following steps: distributing the initial global model to a local client, and obtaining a target quantization grade of the current round, which is obtained by the local client based on a preset adaptive gradient quantization model; the target quantization level is used for quantizing the gradient uploaded by the local client in the current turn; determining whether to acquire a quantization gradient corresponding to the current turn of the local client based on a preset inert gradient aggregation model; the quantization gradient is obtained by quantizing the gradient uploaded by the current turn by the local client based on the target quantization level; and performing aggregation processing on the quantitative gradients, and updating the initial global model according to an aggregation result to obtain a target global model corresponding to the next round.
In still another aspect, an embodiment of the present invention further provides a processor-readable storage medium, where a computer program is stored on the processor-readable storage medium, and when executed by a processor, the computer program is implemented to perform the optimization processing method for federal learning communication overhead provided in the foregoing embodiments. The method comprises the following steps: distributing the initial global model to a local client, and obtaining a target quantization grade of the current round, which is obtained by the local client based on a preset adaptive gradient quantization model; the target quantization level is used for quantizing the gradient uploaded by the local client in the current turn; determining whether to acquire a quantization gradient corresponding to the current turn of the local client based on a preset inert gradient aggregation model; the quantization gradient is obtained by quantizing the gradient uploaded by the current turn by the local client based on the target quantization level; and performing aggregation processing on the quantitative gradients, and updating the initial global model according to an aggregation result to obtain a target global model corresponding to the next round.
The processor-readable storage medium can be any available medium or data storage device that can be accessed by a processor, including, but not limited to, magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, non-volatile memory (NAND FLASH), Solid State Disks (SSDs)), etc.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. An optimization processing method aiming at federal learning communication overhead is applied to a central server, and is characterized by comprising the following steps:
distributing the initial global model to a local client, and obtaining a target quantization grade of the current round, which is obtained by the local client based on a preset adaptive gradient quantization model; the target quantization level is used for quantizing the gradient uploaded by the local client in the current turn;
determining whether to acquire a quantization gradient corresponding to the current turn of the local client based on a preset inert gradient aggregation model; the quantization gradient is obtained by quantizing the gradient uploaded by the current turn by the local client based on the target quantization level;
and performing aggregation processing on the quantitative gradients, and updating the initial global model according to an aggregation result to obtain a target global model corresponding to the next round.
2. The optimization processing method for federated learning communication overhead according to claim 1, wherein determining whether to acquire a quantization gradient corresponding to the local client in a current turn based on a preset inert gradient aggregation model specifically includes:
determining, based on a preset inert gradient aggregation model, whether the difference between the quantization gradient corresponding to the current round and the quantization gradient corresponding to the previous round is greater than or equal to a preset gradient range threshold, and if so, determining to acquire the quantization gradient corresponding to the local client in the current round.
3. The optimization processing method for the federal learning communication overhead according to claim 1, wherein the aggregating is performed on the quantization gradient, and the initial global model is updated according to an aggregation result to obtain a target global model corresponding to the next round, specifically comprising:
aggregating the quantization gradients based on a preset quantization gradient aggregation model to obtain a corresponding aggregation result for the gradient-descent step;
and updating the initial global model corresponding to the current training round according to the aggregation result to obtain a target global model corresponding to the next training round.
4. The optimization processing method for federal learning communication overhead as claimed in claim 1, wherein the adaptive gradient quantization model corresponds to formula (1) as follows:
in the formula, b_m^k represents the target quantization level of the m-th local client in the k-th round, where b denotes a quantization level and b_0 denotes the initial quantization level corresponding to each local client; the remaining term represents an adaptive process, obtained adaptively during model training, in which Q(g_m^1) denotes the quantization gradient corresponding to the m-th local client in round 1, Q(g_m^k) denotes the quantization gradient corresponding to the m-th local client in the k-th round, Q(ĝ_m^0) denotes the quantization gradient actually uploaded by the m-th local client in round 0, and Q(ĝ_m^(k-1)) denotes the quantization gradient actually uploaded by the m-th local client in round k−1; m denotes the m-th local client.
5. The optimization processing method for federal learning communication overhead according to claim 2, wherein the inert gradient aggregation model corresponds to the following formula (2):
in the formula, Q_b(g_m^k) indicates the quantization gradient obtained when the m-th local client quantizes its gradient g_m^k with the target quantization level b_m^k in the k-th round; Q_b(ĝ_m^(k-1)) represents the quantization gradient actually uploaded by the m-th local client in round k−1, obtained by quantizing its gradient with the quantization level actually used in that round; the difference between these two terms is used to measure the k-th-round update amount of the m-th local client, and is compared against a gradient range threshold, θ being a parameter of the global model; D represents the total number of rounds; the remaining two terms represent the difference between the true gradient and the quantized gradient, and the difference between the quantized gradient and the original gradient, respectively; ξ_d and α are two hyperparameters, where ξ_d is a penalty term for the quantization gradient.
6. An optimization processing method for federal learning communication overhead is applied to a local client, and is characterized by comprising the following steps:
acquiring an initial global model distributed by a central server;
determining a target quantization grade of the current round based on a preset adaptive gradient quantization model, and quantizing the gradient to be uploaded in the current round based on the target quantization grade to obtain a quantization gradient corresponding to the current round;
and uploading the quantization gradient corresponding to the current turn to a central server so as to realize the aggregation processing of the quantization gradient in the central server.
7. An optimization processing apparatus for federally learned communication overhead, comprising:
the system comprises a quantization grade self-adaptive determination unit, a target quantization grade determination unit and a target quantization grade determination unit, wherein the quantization grade self-adaptive determination unit is used for distributing an initial global model to a local client and obtaining a target quantization grade of a current turn, which is obtained by the local client based on a preset self-adaptive gradient quantization model; the target quantization level is used for quantizing the gradient uploaded by the local client in the current turn;
the inertia gradient aggregation processing unit is used for determining whether to acquire a quantization gradient corresponding to the local client in the current turn or not based on a preset inertia gradient aggregation model; the quantization gradient is obtained by quantizing the gradient uploaded by the current turn by the local client based on the target quantization level;
and the quantization gradient aggregation processing unit is used for performing aggregation processing on the quantization gradients and updating the initial global model according to an aggregation result to obtain a target global model corresponding to the next round.
8. An optimization processing apparatus for federally learned communication overhead, comprising:
the global model obtaining unit is used for obtaining an initial global model distributed by the central server;
the quantization grade self-adaptive processing unit is used for determining a target quantization grade of the current round based on a preset self-adaptive gradient quantization model, and quantizing the gradient to be uploaded in the current round based on the target quantization grade to obtain a quantization gradient corresponding to the current round;
and the quantization gradient uploading unit is used for uploading the quantization gradient corresponding to the current turn to a central server so as to realize the aggregation processing of the quantization gradient in the central server.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program implements the steps of the method for optimization of federal learned communications overhead of any of claims 1 to 6.
10. A processor-readable storage medium having a computer program stored thereon, wherein the computer program when executed by a processor implements the steps of the method for optimization of federally learned communications overhead of any of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210023353.8A CN114548421A (en) | 2022-01-10 | 2022-01-10 | Optimization processing method and device for federal learning communication overhead |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210023353.8A CN114548421A (en) | 2022-01-10 | 2022-01-10 | Optimization processing method and device for federal learning communication overhead |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114548421A true CN114548421A (en) | 2022-05-27 |
Family
ID=81669352
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210023353.8A Pending CN114548421A (en) | 2022-01-10 | 2022-01-10 | Optimization processing method and device for federal learning communication overhead |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114548421A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024094094A1 (en) * | 2022-11-02 | 2024-05-10 | 华为技术有限公司 | Model training method and apparatus |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113139662A (en) * | 2021-04-23 | 2021-07-20 | 深圳市大数据研究院 | Global and local gradient processing method, device, equipment and medium for federal learning |
CN113435604A (en) * | 2021-06-16 | 2021-09-24 | 清华大学 | Method and device for optimizing federated learning |
-
2022
- 2022-01-10 CN CN202210023353.8A patent/CN114548421A/en active Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113139662A (en) * | 2021-04-23 | 2021-07-20 | 深圳市大数据研究院 | Global and local gradient processing method, device, equipment and medium for federal learning |
CN113435604A (en) * | 2021-06-16 | 2021-09-24 | 清华大学 | Method and device for optimizing federated learning |
Non-Patent Citations (2)
Title |
---|
JIANQIAO WANGNI et al.: "Gradient sparsification for communication-efficient distributed optimization", Proceedings of the 32nd International Conference on Neural Information Processing Systems, 31 December 2018 (2018-12-31) * |
董业; 侯炜; 陈小军; 曾帅: "Efficient and secure federated learning based on secret sharing and gradient selection", Journal of Computer Research and Development (计算机研究与发展), no. 10, 9 October 2020 (2020-10-09) * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024094094A1 (en) * | 2022-11-02 | 2024-05-10 | 华为技术有限公司 | Model training method and apparatus |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112181666B (en) | Equipment assessment and federal learning importance aggregation method based on edge intelligence | |
CN110942154B (en) | Data processing method, device, equipment and storage medium based on federal learning | |
CN111091199B (en) | Federal learning method, device and storage medium based on differential privacy | |
CN113568727B (en) | Mobile edge computing task allocation method based on deep reinforcement learning | |
CN113469325B (en) | Hierarchical federation learning method for edge aggregation interval self-adaptive control, computer equipment and storage medium | |
CN115037608B (en) | Quantization method, quantization device, quantization apparatus, and readable storage medium | |
CN109743713B (en) | Resource allocation method and device for electric power Internet of things system | |
CN116523079A (en) | Reinforced learning-based federal learning optimization method and system | |
CN110072130A (en) | A kind of HAS video segment method for pushing based on HTTP/2 | |
WO2022217210A1 (en) | Privacy-aware pruning in machine learning | |
CN115392348A (en) | Federal learning gradient quantification method, high-efficiency communication Federal learning method and related device | |
CN112307044A (en) | Adaptive network data acquisition method based on multi-objective optimization and related equipment | |
CN114548421A (en) | Optimization processing method and device for federal learning communication overhead | |
CN116996938A (en) | Internet of vehicles task unloading method, terminal equipment and storage medium | |
CN113839830B (en) | Method, device and storage medium for predicting multiple data packet parameters | |
CN114422438A (en) | Link adjusting method and device of power communication network | |
CN109219960B (en) | Method, device and equipment for optimizing video coding quality smoothness and storage medium | |
WO2023236609A1 (en) | Automatic mixed-precision quantization method and apparatus | |
WO2023142351A1 (en) | Weight adjustment method and apparatus, and storage medium and electronic apparatus | |
CN112511702B (en) | Media frame pushing method, server, electronic equipment and storage medium | |
CN116491115A (en) | Rate controlled machine learning model with feedback control for video coding | |
US20140321558A1 (en) | Video quality measurement considering multiple artifacts | |
CN115293324A (en) | Quantitative perception training method and related device | |
CN115564055A (en) | Asynchronous joint learning training method and device, computer equipment and storage medium | |
CN113850390A (en) | Method, device, equipment and medium for sharing data in federal learning system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||