CN114548421A - Optimization processing method and device for federal learning communication overhead - Google Patents

Optimization processing method and device for federal learning communication overhead

Info

Publication number
CN114548421A
Authority
CN
China
Prior art keywords
gradient
quantization
local client
round
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210023353.8A
Other languages
Chinese (zh)
Inventor
刘洋
丁文伯
赵子号
毛钰竹
黄绍伦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202210023353.8A
Publication of CN114548421A
Legal status: Pending (Current)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H04L41/0823 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides an optimization processing method and device for federal learning communication overhead. The method comprises the following steps: distributing the initial global model to a local client, and obtaining the target quantization level of the current round, which is computed by the local client based on an adaptive gradient quantization model, the target quantization level being used to quantize the gradient uploaded by the local client in the current round; determining, based on a lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round, the quantization gradient being obtained by the local client quantizing the gradient to be uploaded in the current round with the target quantization level; and aggregating the quantization gradients and updating the initial global model according to the aggregation result to obtain the target global model for the next round. According to the method provided by the invention, the quantization level is dynamically adjusted by the adaptive gradient quantization model and the communication frequency is adjusted by the lazy gradient aggregation model, so that communication efficiency is effectively improved and communication overhead is reduced.

Description

Optimization processing method and device for federal learning communication overhead
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an optimization processing method and device for federal learning communication overhead. The invention further relates to an electronic device and a processor-readable storage medium.
Background
In the federal learning process, the local clients and the central server need to interact many times to obtain a global model that meets the accuracy requirement. For a complex model training process, such as the training of a deep learning model, each model update may contain a large number of model parameters, so the communication overhead of federal learning is high and its communication efficiency is low. Research on improving the communication efficiency of federal learning is therefore of great value.
Currently, to improve the communication efficiency of federal learning, the prior art generally either reduces the number of model transmissions or reduces the number of bits uploaded by each client per round. Although both approaches can reduce the communication overhead to a certain extent, each has limitations; for example, the quantization level is usually determined empirically or by grid search. Therefore, how to design an efficient optimization processing scheme for the federal learning communication overhead has become an urgent problem to be solved.
Disclosure of Invention
Therefore, the invention provides an optimization processing method and device for the federal learning communication overhead, aiming to overcome the defect in the prior art that existing optimization schemes for the federal learning communication overhead are highly limited and thus yield poor communication efficiency.
In a first aspect, the present invention provides an optimization processing method for federal learning communication overhead, which is applied to a central server and includes:
distributing the initial global model to a local client, and obtaining the target quantization level of the current round, which is computed by the local client based on a preset adaptive gradient quantization model; the target quantization level is used to quantize the gradient uploaded by the local client in the current round;
determining, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round; the quantization gradient is obtained by the local client quantizing the gradient to be uploaded in the current round with the target quantization level;
and aggregating the quantization gradients, and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
Further, determining, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round specifically includes:
judging, based on the preset lazy gradient aggregation model, whether the difference between the quantization gradient corresponding to the current round and the quantization gradient corresponding to the previous round is greater than or equal to a preset gradient range threshold, and if so, determining to acquire the quantization gradient corresponding to the local client in the current round.
Further, aggregating the quantization gradients and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round specifically includes:
aggregating the quantization gradients based on a preset quantization gradient aggregation model to obtain the corresponding aggregation result for the gradient descent update;
and updating the initial global model corresponding to the current training round according to the aggregation result to obtain the target global model corresponding to the next training round.
Further, the adaptive gradient quantization model corresponds to the following formula (1):

[Formula (1) appears as an equation image in the original publication and is not reproduced here; it computes the target quantization level b_m^k of a local client from the initial quantization level b_0 and an adaptive factor.]

In the formula, b_m^k represents the target quantization level of the m-th local client in the k-th round; b_0 represents the initial quantization level corresponding to each local client; the adaptive factor represents an adaptive process and is obtained adaptively during model training from Q(g_m^1), the quantization gradient corresponding to the m-th local client in round 1; Q(g_m^k), the quantization gradient corresponding to the m-th local client in round k; ĝ_m^0, the quantization gradient actually uploaded by the m-th local client in round 0; and ĝ_m^(k−1), the quantization gradient actually uploaded by the m-th local client in round (k−1); m denotes the index of the local client.
Further, the lazy gradient aggregation model corresponds to the following formula (2):

||Q_{b_m^k}(g_m^k) − ĝ_m^(k−1)||_2^2 ≥ (1 / (α^2 · M^2)) · Σ_{d=1..D} ξ_d · ||θ^(k+1−d) − θ^(k−d)||_2^2 + 3 · (ε_m^k + ε_m^(k−1))   (2)

In the formula, Q_{b_m^k}(g_m^k) indicates the quantization gradient obtained when the m-th local client quantizes its gradient g_m^k with the target quantization level b_m^k in the k-th round; ĝ_m^(k−1) represents the quantization gradient actually used by the m-th local client in the (k−1)-th round; the left-hand difference is used to measure the amount of the m-th local client's update in the k-th round; the right-hand side is the gradient range threshold, where θ is the parameter vector of the global model, M is the number of local clients, and D is the total number of rounds considered; ε_m^k represents the deviation between the true gradient and the quantization gradient, and ε_m^(k−1) represents the deviation between the quantization gradient and the original gradient in the previous round; ξ_d and α are two hyperparameters, where ξ_d is a penalty term for quantizing the gradient.
In a second aspect, the present invention further provides an optimization processing method for federal learning communication overhead, which is applied to a local client and includes:
acquiring the initial global model distributed by the central server;
determining the target quantization level of the current round based on a preset adaptive gradient quantization model, and quantizing the gradient to be uploaded in the current round with the target quantization level to obtain the quantization gradient corresponding to the current round;
and uploading the quantization gradient corresponding to the current round to the central server, so that the central server can aggregate the quantization gradients.
In a third aspect, the present invention further provides an optimization processing apparatus for federal learning communication overhead, including:
a quantization level adaptive determination unit, configured to distribute the initial global model to a local client and obtain the target quantization level of the current round, which is computed by the local client based on a preset adaptive gradient quantization model; the target quantization level is used to quantize the gradient uploaded by the local client in the current round;
a lazy gradient aggregation processing unit, configured to determine, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round; the quantization gradient is obtained by the local client quantizing the gradient to be uploaded in the current round with the target quantization level;
and a quantization gradient aggregation processing unit, configured to aggregate the quantization gradients and update the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
Further, the lazy gradient aggregation processing unit is specifically configured to:
judge, based on the preset lazy gradient aggregation model, whether the difference between the quantization gradient corresponding to the current round and the quantization gradient corresponding to the previous round is greater than or equal to the preset gradient range threshold, and if so, determine to acquire the quantization gradient corresponding to the local client in the current round.
Further, the quantization gradient aggregation processing unit is specifically configured to:
aggregate the quantization gradients based on a preset quantization gradient aggregation model to obtain the corresponding aggregation result for the gradient descent update;
and update the initial global model corresponding to the current training round according to the aggregation result to obtain the target global model corresponding to the next training round.
Further, the adaptive gradient quantization model corresponds to the following formula (1):

[Formula (1) appears as an equation image in the original publication and is not reproduced here; it computes the target quantization level b_m^k of a local client from the initial quantization level b_0 and an adaptive factor.]

In the formula, b_m^k represents the target quantization level of the m-th local client in the k-th round; b_0 represents the initial quantization level corresponding to each local client; the adaptive factor represents an adaptive process and is obtained adaptively during model training from Q(g_m^1), the quantization gradient corresponding to the m-th local client in round 1; Q(g_m^k), the quantization gradient corresponding to the m-th local client in round k; ĝ_m^0, the quantization gradient actually uploaded by the m-th local client in round 0; and ĝ_m^(k−1), the quantization gradient actually uploaded by the m-th local client in round (k−1); m denotes the index of the local client.
Further, the lazy gradient aggregation model corresponds to the following formula (2):

||Q_{b_m^k}(g_m^k) − ĝ_m^(k−1)||_2^2 ≥ (1 / (α^2 · M^2)) · Σ_{d=1..D} ξ_d · ||θ^(k+1−d) − θ^(k−d)||_2^2 + 3 · (ε_m^k + ε_m^(k−1))   (2)

In the formula, Q_{b_m^k}(g_m^k) indicates the quantization gradient obtained when the m-th local client quantizes its gradient g_m^k with the target quantization level b_m^k in the k-th round; ĝ_m^(k−1) represents the quantization gradient actually used by the m-th local client in the (k−1)-th round; the left-hand difference is used to measure the amount of the m-th local client's update in the k-th round; the right-hand side is the gradient range threshold, where θ is the parameter vector of the global model, M is the number of local clients, and D is the total number of rounds considered; ε_m^k represents the deviation between the true gradient and the quantization gradient, and ε_m^(k−1) represents the deviation between the quantization gradient and the original gradient in the previous round; ξ_d and α are two hyperparameters, where ξ_d is a penalty term for quantizing the gradient.
In a fourth aspect, the present invention further provides an optimization processing apparatus for federal learning communication overhead, including:
a global model acquisition unit, configured to acquire the initial global model distributed by the central server;
a quantization level adaptive processing unit, configured to determine the target quantization level of the current round based on a preset adaptive gradient quantization model, and quantize the gradient to be uploaded in the current round with the target quantization level to obtain the quantization gradient corresponding to the current round;
and a quantization gradient uploading unit, configured to upload the quantization gradient corresponding to the current round to the central server, so that the central server can aggregate the quantization gradients.
In a fifth aspect, the present invention further provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of any of the above optimization processing methods for federal learning communication overhead.
In a sixth aspect, the present invention further provides a processor-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above optimization processing methods for federal learning communication overhead.
According to the optimization processing method for the federal learning communication overhead provided by the invention, the quantization level is adaptively and dynamically adjusted by the adaptive gradient quantization model, and the communication frequency is adjusted by the lazy gradient aggregation model, so that the communication efficiency is effectively improved, the overall number of transmitted bits is reduced, the communication overhead is lowered, and efficient transmission in federal learning is better realized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a first schematic flowchart of the optimization processing method for federal learning communication overhead according to an embodiment of the present invention;
fig. 2 is a complete schematic diagram of a federal learning framework corresponding to the optimization processing method for federal learning communication overhead provided by the embodiment of the present invention;
fig. 3 is a schematic diagram of gradient quantization in an optimization processing method for federal learning communication overhead according to an embodiment of the present invention;
fig. 4 is a first schematic structural diagram of an optimization processing apparatus for federal learning communication overhead according to an embodiment of the present invention;
fig. 5 is a second flowchart of the optimization processing method for federal learning communication overhead according to the embodiment of the present invention;
fig. 6 is a second schematic structural diagram of an optimization processing apparatus for federal learning communication overhead according to an embodiment of the present invention;
fig. 7 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
To address the optimization problem of the federal learning communication overhead, the invention provides an optimization processing method for the federal learning communication overhead: a communication-efficient collaborative learning framework, AQUILA (adaptive quantization of lazily-aggregated gradients), obtained by combining adaptive model quantization with a lazy gradient aggregation strategy, is applied to the federal learning framework to reduce the overall communication traffic of the local clients, thereby better realizing efficient transmission in federal learning.
The following describes embodiments of the optimization processing method for federal learning communication overhead in detail. As shown in fig. 1, which is a first schematic flowchart of the optimization processing method for federal learning communication overhead provided by an embodiment of the present invention, the specific implementation process includes the following steps:
step 101: distributing the initial global model to a local client, and obtaining a target quantization grade of the current round, which is obtained by the local client based on a preset adaptive gradient quantization model; and the target quantization level is used for quantizing the gradient uploaded by the local client in the current turn.
In a federal learning scenario, different local clients can quantize their gradients with different quantization levels, thereby obtaining different quantization gradients. In the embodiment of the present invention, the communication collaborative learning framework mainly selects an optimal number of quantization bits for each communication round of LAQ (Lazily Aggregated Quantization) by optimizing the gradient loss caused by skipping quantized updates.
As shown in fig. 2, which illustrates the training process of the communication collaborative learning framework, the framework includes an adaptive gradient quantization model (quantization level selection) and a lazy gradient aggregation model (lazy aggregation). At each training round k, the central server first distributes the initial global model to each local client by broadcasting; the local client m then calculates its target quantization level b_m^k according to formula (1) corresponding to the adaptive gradient quantization model:

[Formula (1) appears as an equation image in the original publication and is not reproduced here; it computes the target quantization level b_m^k from the initial quantization level b_0 and an adaptive factor.]

In the formula, b_m^k represents the target (optimal) quantization level of the m-th local client in the k-th round; b_0 represents the initial quantization level corresponding to each local client (for example, an initial quantization level of 2); the adaptive factor is obtained adaptively during training from Q(g_m^1), the quantization gradient corresponding to the m-th local client in round 1; Q(g_m^k), the quantization gradient corresponding to the m-th local client in round k; ĝ_m^0, the quantization gradient actually uploaded by the m-th local client in round 0; and ĝ_m^(k−1), the quantization gradient actually uploaded by the m-th local client in round (k−1); m denotes the index of the local client. If the computed target quantization level is higher than a preset threshold b_max, then b_m^k is set to b_max.
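The level selection of this step can be sketched as follows. This is only an illustration: the adaptive factor of formula (1) is not reproducible from this text, so it is taken here as a precomputed input, its multiplicative use is an assumption, and all names are hypothetical.

```python
def select_quantization_level(b0: int, adaptive_factor: float, b_max: int) -> int:
    """Sketch of per-round quantization level selection for one client.

    b0              -- initial quantization level shared by all clients (e.g. 2)
    adaptive_factor -- value of the adaptive term of formula (1), assumed to be
                       computed from the round-1/round-0 and round-k/round-(k-1)
                       quantization gradients as the patent describes
    b_max           -- preset upper threshold on the quantization level
    """
    b = max(1, round(b0 * adaptive_factor))  # scale the initial level adaptively
    return min(b, b_max)                     # cap at b_max, as the text specifies
```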
In the embodiment of the invention, after the initial global model is distributed to the local clients, each local client calculates the target quantization level of the current round, i.e. the dynamically adjusted number of quantization bits, based on the preset adaptive gradient quantization model; once the target quantization level is determined, the central server can obtain it from the local client.
Step 102: determining, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round; the quantization gradient is obtained by the local client quantizing the gradient to be uploaded in the current round with the target quantization level.
In the embodiment of the present invention, determining whether to acquire the quantization gradient corresponding to the local client in the current round based on the preset lazy gradient aggregation model specifically includes: judging, based on the preset lazy gradient aggregation model, whether the difference between the quantization gradient corresponding to the current round and the quantization gradient corresponding to the previous round is greater than or equal to a preset gradient range threshold, and if so, determining to acquire the quantization gradient corresponding to the local client in the current round. The local client is a client used for data communication with the central server in the federal learning architecture; the target quantization level determines the number of quantization bits used.
In this step, the central server determines whether the local client should upload its quantization gradient in the current round according to formula (2) corresponding to the lazy gradient aggregation model:

||Q_{b_m^k}(g_m^k) − ĝ_m^(k−1)||_2^2 ≥ (1 / (α^2 · M^2)) · Σ_{d=1..D} ξ_d · ||θ^(k+1−d) − θ^(k−d)||_2^2 + 3 · (ε_m^k + ε_m^(k−1))   (2)

In the formula, Q_{b_m^k}(g_m^k) indicates the quantization gradient obtained when the m-th local client quantizes its gradient g_m^k with the target quantization level b_m^k in the k-th round; ĝ_m^(k−1) represents the quantization gradient actually used by the m-th local client (or client m) in the (k−1)-th round; the left-hand difference is used to measure the amount of the m-th local client's update in the k-th round (the difference between the quantization gradient corresponding to the current round and the quantization gradient corresponding to the previous round); the right-hand side is the gradient range threshold, where θ is the parameter vector of the global model, M is the number of local clients, and D is the total number of rounds considered; ε_m^k represents the deviation between the true gradient and the quantization gradient, and ε_m^(k−1) represents the deviation between the quantization gradient and the original gradient in the previous round; ξ_d and α are two preset hyperparameters, where ξ_d is a penalty term for quantizing the gradient (some information is lost when going from 32 bits down to, e.g., 2 bits).
If formula (2) above holds, the corresponding local client computes the quantization gradient Q_b(g) using the following formula (3), and the central server obtains the uploaded quantization gradient Q_b(g):

Q_b(g_i) = ||g||_2 · sign(g_i) · ξ_i(g, b)   (3)

In the formula, g denotes the gradient (a vector) and i indexes its components; the quantization operation above is applied to each component g_i, finally yielding Q_b(g); ||g||_2 denotes the L2 norm of the gradient g; sign(g_i) is the sign of g_i (for example, +1 if positive and −1 if negative); and ξ_i(g, b) is a random variable whose value is drawn according to the probabilities defined in formula (4) below.
If formula (2) above does not hold, the corresponding local client does not upload the quantization gradient Q_b(g) in the current round, thereby achieving the goal of transmitting only the model parameters (or gradients) with a sufficiently large update, using fewer quantization bits overall.
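A minimal sketch of this skip/upload decision is given below. It assumes the right-hand side of formula (2) has already been folded into a single threshold value; the function and variable names are hypothetical.

```python
import numpy as np

def should_upload(q_grad_k: np.ndarray, q_grad_prev: np.ndarray, threshold: float) -> bool:
    """Upload only if the squared change between this round's quantization gradient
    and the previously uploaded one reaches the gradient range threshold of
    formula (2); otherwise the client stays silent and the server reuses the
    stale gradient (lazy aggregation)."""
    update_magnitude = float(np.sum((q_grad_k - q_grad_prev) ** 2))
    return update_magnitude >= threshold
```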
In the embodiment of the present invention, AQUILA uses QSGD (Quantized Stochastic Gradient Descent) for quantization. The quantization rule is shown in formula (4) below, where l ∈ [0, b] is the integer such that |g_i| / ||g||_2 ∈ [l/b, (l+1)/b]. For each gradient component g_i, QSGD randomly rounds the normalized magnitude |g_i| / ||g||_2 to one of the two endpoints of this subinterval, with probabilities chosen so that the quantization gradient is an unbiased estimate of the original gradient. To allow a more intuitive understanding of QSGD, the example of fig. 3 was devised. In fig. 3, the quantization uses the 5 endpoints 0, 0.25, 0.5, 0.75 and 1 (that is, b = 4). If a component of the normalized original gradient has the value 0.6, it is mapped to 0.5 with probability 0.6 and to 0.75 with probability 0.4, so that its expected value remains 0.6.

ξ_i(g, b) = l/b with probability 1 − p, and (l + 1)/b with probability p, where p = b · |g_i| / ||g||_2 − l   (4)
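Formulas (3) and (4) can be realized in a few lines; the following minimal sketch (function name and example values are ours, not the patent's) reproduces the fig. 3 behaviour.

```python
import numpy as np

def qsgd_quantize(g: np.ndarray, b: int, rng: np.random.Generator) -> np.ndarray:
    """Stochastically quantize gradient g onto the b+1 endpoints {0, 1/b, ..., 1}
    of the normalized magnitude axis, per formulas (3)-(4):
    Q_b(g_i) = ||g||_2 * sign(g_i) * xi_i(g, b)."""
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return np.zeros_like(g)
    a = np.abs(g) / norm                        # normalized magnitudes in [0, 1]
    l = np.floor(a * b)                         # lower endpoint index: a in [l/b, (l+1)/b]
    p = a * b - l                               # probability of rounding up to (l+1)/b
    xi = (l + (rng.random(g.shape) < p)) / b    # E[xi] = a, so the estimate is unbiased
    return norm * np.sign(g) * xi

rng = np.random.default_rng(0)
g = np.array([0.6, -0.8])                       # ||g||_2 = 1.0, matching the fig. 3 example
print(qsgd_quantize(g, 4, rng))                 # 0.6 -> 0.5 or 0.75; 0.8 -> 0.75 or 1.0
```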
Step 103: aggregating the quantization gradients, and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
In the embodiment of the present invention, aggregating the quantization gradients and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round specifically includes: aggregating the quantization gradients based on a preset quantization gradient aggregation model to obtain the corresponding aggregation result for the gradient descent update; and updating the initial global model corresponding to the current training round according to the aggregation result to obtain the target global model corresponding to the next training round.
In practical implementation, the central server receives the quantization gradients Q_b(g) from the local clients, aggregates these quantization gradients, and uses them to update the initial global model according to the following formula (5):

θ^(k+1) = θ^k − (α / M) · Σ_{m=1..M} ĝ_m^k   (5)

In the formula, α represents the learning rate (learning step size); θ^k is the initial global model of the k-th round (corresponding to the current round), i.e. the global parameter vector of round k; θ^(k+1) is the target global model of the updated (k+1)-th round (corresponding to the next round); m indexes the local clients and M is the number of local clients; ĝ_m^k is the latest quantization gradient of client m (for a client that skipped uploading in round k, this is its most recently uploaded quantization gradient); the summation term is the aggregated gradient of the gradient descent step, after which θ^(k+1), i.e. the target global model of round (k+1), is obtained.
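A minimal sketch of this server-side update, assuming the uniform-averaging form of formula (5) written above (itself a reconstruction) and hypothetical names:

```python
import numpy as np

def server_update(theta_k: np.ndarray, latest_q_grads: list[np.ndarray], alpha: float) -> np.ndarray:
    """One global round: aggregate the latest quantization gradient of each of the
    M clients (a stale one for clients that skipped this round) and take a
    gradient descent step, per formula (5)."""
    aggregated = np.mean(latest_q_grads, axis=0)  # (1/M) * sum over clients
    return theta_k - alpha * aggregated           # theta^{k+1} = theta^k - alpha * aggregate
```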
It should be noted that, in the embodiment of the present invention, the quantization level (or target quantization level) indicates how finely a quantization target (for example, a gradient) is discretized: quantizing a gradient with a given quantization level yields the quantization gradient. The number of quantization bits is the number of bits used to represent each component of the uploaded gradient (vector); for example, a component may originally occupy 32 bits and be reduced to 2 bits. In other words, the gradient is a vector of components, each component is originally stored with, e.g., 32 bits, and quantizing it with quantization level 2 finally yields 2-bit components, i.e. the quantization gradient of the gradient.
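As an illustrative calculation (using only the 32-bit and 2-bit figures above): uploading a gradient with 10^6 components at the original 32 bits per component costs 3.2 × 10^7 bits, whereas quantizing each component to 2 bits costs about 2 × 10^6 bits plus a small constant overhead (e.g. 32 bits for the norm ||g||_2), roughly a 16-fold reduction in upload traffic per round.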
According to the optimization processing method for the federal learning communication overhead provided by the invention, the quantization level is adaptively and dynamically adjusted by the adaptive gradient quantization model, and the communication frequency is adjusted by the lazy gradient aggregation model, so that the communication efficiency is effectively improved, the overall number of transmitted bits is reduced, the communication overhead is lowered, and efficient transmission in federal learning is better realized.
Corresponding to the optimization processing method for federal learning communication overhead, the invention further provides an optimization processing apparatus for federal learning communication overhead. Since the apparatus embodiment is similar to the method embodiment described above, its description is relatively brief; for relevant details, refer to the description in the method embodiment above. The following description of the embodiment of the optimization processing apparatus for federal learning communication overhead is illustrative only. Fig. 4 is a first schematic structural diagram of an optimization processing apparatus for federal learning communication overhead according to an embodiment of the present invention.
The optimization processing apparatus for federal learning communication overhead of the invention includes the following parts:
a quantization level adaptive determination unit 401, configured to distribute the initial global model to a local client and obtain the target quantization level of the current round, which is computed by the local client based on a preset adaptive gradient quantization model; the target quantization level is used to quantize the gradient uploaded by the local client in the current round;
a lazy gradient aggregation processing unit 402, configured to determine, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round; the quantization gradient is obtained by the local client quantizing the gradient to be uploaded in the current round with the target quantization level;
and a quantization gradient aggregation processing unit 403, configured to aggregate the quantization gradients and update the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
Further, the lazy gradient aggregation processing unit is specifically configured to:
judge, based on the preset lazy gradient aggregation model, whether the difference between the quantization gradient corresponding to the current round and the quantization gradient corresponding to the previous round is greater than or equal to the preset gradient range threshold, and if so, determine to acquire the quantization gradient corresponding to the local client in the current round.
Further, the quantization gradient aggregation processing unit is specifically configured to:
aggregate the quantization gradients based on a preset quantization gradient aggregation model to obtain the corresponding aggregation result for the gradient descent update;
and update the initial global model corresponding to the current training round according to the aggregation result to obtain the target global model corresponding to the next training round.
Further, the adaptive gradient quantization model corresponds to the following formula (1):

[Formula (1) appears as an equation image in the original publication and is not reproduced here; it computes the target quantization level b_m^k of a local client from the initial quantization level b_0 and an adaptive factor.]

In the formula, b_m^k represents the target quantization level of the m-th local client in the k-th round; b_0 represents the initial quantization level corresponding to each local client; the adaptive factor represents an adaptive process and is obtained adaptively during model training from Q(g_m^1), the quantization gradient corresponding to the m-th local client in round 1; Q(g_m^k), the quantization gradient corresponding to the m-th local client in round k; ĝ_m^0, the quantization gradient actually uploaded by the m-th local client in round 0; and ĝ_m^(k−1), the quantization gradient actually uploaded by the m-th local client in round (k−1); m denotes the index of the local client.
Further, the lazy gradient aggregation model corresponds to the following formula (2):

||Q_{b_m^k}(g_m^k) − ĝ_m^(k−1)||_2^2 ≥ (1 / (α^2 · M^2)) · Σ_{d=1..D} ξ_d · ||θ^(k+1−d) − θ^(k−d)||_2^2 + 3 · (ε_m^k + ε_m^(k−1))   (2)

In the formula, Q_{b_m^k}(g_m^k) indicates the quantization gradient obtained when the m-th local client quantizes its gradient g_m^k with the target quantization level b_m^k in the k-th round; ĝ_m^(k−1) represents the quantization gradient actually used by the m-th local client in the (k−1)-th round; the left-hand difference is used to measure the amount of the m-th local client's update in the k-th round; the right-hand side is the gradient range threshold, where θ is the parameter vector of the global model, M is the number of local clients, and D is the total number of rounds considered; ε_m^k represents the deviation between the true gradient and the quantization gradient, and ε_m^(k−1) represents the deviation between the quantization gradient and the original gradient in the previous round; ξ_d and α are two hyperparameters, where ξ_d is a penalty term for quantizing the gradient.
By adopting the optimization processing apparatus for the federal learning communication overhead, the quantization level is adaptively and dynamically adjusted by the adaptive gradient quantization model, and the communication frequency is adjusted by the lazy gradient aggregation model, so that the communication efficiency is effectively improved, the overall number of transmitted bits is reduced, the communication overhead is lowered, and efficient transmission in federal learning is better realized.
Correspondingly, the invention further provides another optimization processing method and apparatus for federal learning communication overhead, corresponding to the optimization processing method and apparatus described above. Since these embodiments are similar to the embodiments described above, their description is relatively brief; refer to the description in the embodiments above. The embodiments of the method and apparatus described below are illustrative only. Fig. 5 is a second schematic flowchart of the optimization processing method for federal learning communication overhead according to an embodiment of the present invention.
Step 501: and acquiring an initial global model distributed by the central server.
Step 502: determining the target quantization level of the current round based on a preset adaptive gradient quantization model, and quantizing the gradient to be uploaded in the current round with the target quantization level to obtain the quantization gradient corresponding to the current round.
Step 503: uploading the quantization gradient corresponding to the current round to the central server, so that the central server can aggregate the quantization gradients.
Fig. 6 is a second schematic structural diagram of an optimization processing apparatus for federal learning communication overhead according to an embodiment of the present invention. The apparatus includes the following units:
a global model acquisition unit 601, configured to acquire the initial global model distributed by the central server;
a quantization level adaptive processing unit 602, configured to determine the target quantization level of the current round based on a preset adaptive gradient quantization model, and quantize the gradient to be uploaded in the current round with the target quantization level to obtain the quantization gradient corresponding to the current round;
and a quantization gradient uploading unit 603, configured to upload the quantization gradient corresponding to the current round to the central server, so that the central server can aggregate the quantization gradients.
Corresponding to the optimization processing method for federal learning communication overhead, the invention further provides an electronic device. Since the electronic device embodiment is similar to the method embodiment above, its description is brief; refer to the description of the method embodiment above. The electronic device described below is illustrative only. Fig. 7 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention. The electronic device may include: a processor 701, a memory 702 and a communication bus 703, wherein the processor 701 and the memory 702 communicate with each other through the communication bus 703 and with the outside through a communication interface 704. The processor 701 may invoke logic instructions in the memory 702 to perform the optimization processing method for federal learning communication overhead, the method including: distributing the initial global model to a local client, and obtaining the target quantization level of the current round, which is computed by the local client based on a preset adaptive gradient quantization model, the target quantization level being used to quantize the gradient uploaded by the local client in the current round; determining, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round, the quantization gradient being obtained by the local client quantizing the gradient to be uploaded in the current round with the target quantization level; and aggregating the quantization gradients and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
Furthermore, the logic instructions in the memory 702 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a Memory chip, a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, an embodiment of the present invention further provides a computer program product, including a computer program stored on a processor-readable storage medium, the computer program including program instructions which, when executed by a computer, enable the computer to execute the optimization processing method for federal learning communication overhead provided by the above method embodiments. The method includes: distributing the initial global model to a local client, and obtaining the target quantization level of the current round, which is computed by the local client based on a preset adaptive gradient quantization model, the target quantization level being used to quantize the gradient uploaded by the local client in the current round; determining, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round, the quantization gradient being obtained by the local client quantizing the gradient to be uploaded in the current round with the target quantization level; and aggregating the quantization gradients and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
In still another aspect, an embodiment of the present invention further provides a processor-readable storage medium storing a computer program which, when executed by a processor, performs the optimization processing method for federal learning communication overhead provided by the above embodiments. The method includes: distributing the initial global model to a local client, and obtaining the target quantization level of the current round, which is computed by the local client based on a preset adaptive gradient quantization model, the target quantization level being used to quantize the gradient uploaded by the local client in the current round; determining, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round, the quantization gradient being obtained by the local client quantizing the gradient to be uploaded in the current round with the target quantization level; and aggregating the quantization gradients and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
The processor-readable storage medium can be any available medium or data storage device that can be accessed by a processor, including, but not limited to, magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, non-volatile memory (NAND FLASH), Solid State Disks (SSDs)), etc.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An optimization processing method for federal learning communication overhead, applied to a central server, characterized by comprising the following steps:
distributing the initial global model to a local client, and obtaining the target quantization level of the current round, which is computed by the local client based on a preset adaptive gradient quantization model; the target quantization level is used to quantize the gradient uploaded by the local client in the current round;
determining, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round; the quantization gradient is obtained by the local client quantizing the gradient to be uploaded in the current round with the target quantization level;
and aggregating the quantization gradients, and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
2. The optimization processing method for federal learning communication overhead according to claim 1, wherein determining, based on a preset lazy gradient aggregation model, whether to acquire the quantization gradient corresponding to the local client in the current round specifically comprises:
judging, based on the preset lazy gradient aggregation model, whether the difference between the quantization gradient corresponding to the current round and the quantization gradient corresponding to the previous round is greater than or equal to a preset gradient range threshold, and if so, determining to acquire the quantization gradient corresponding to the local client in the current round.
3. The optimization processing method for federal learning communication overhead according to claim 1, wherein aggregating the quantization gradients and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round specifically comprises:
aggregating the quantization gradients based on a preset quantization gradient aggregation model to obtain the corresponding aggregation result for the gradient descent update;
and updating the initial global model corresponding to the current training round according to the aggregation result to obtain the target global model corresponding to the next training round.
4. The optimization processing method for federal learning communication overhead according to claim 1, wherein the adaptive gradient quantization model corresponds to the following formula (1):

[Formula (1) appears as an equation image in the original publication and is not reproduced here; it computes the target quantization level b_m^k of a local client from the initial quantization level b_0 and an adaptive factor.]

In the formula, b_m^k represents the target quantization level of the m-th local client in the k-th round; b_0 represents the initial quantization level corresponding to each local client; the adaptive factor represents an adaptive process and is obtained adaptively during model training from Q(g_m^1), the quantization gradient corresponding to the m-th local client in round 1; Q(g_m^k), the quantization gradient corresponding to the m-th local client in round k; ĝ_m^0, the quantization gradient actually uploaded by the m-th local client in round 0; and ĝ_m^(k−1), the quantization gradient actually uploaded by the m-th local client in round (k−1); m denotes the index of the local client.
5. The optimization processing method for federal learning communication overhead according to claim 2, wherein the lazy gradient aggregation model corresponds to the following formula (2):

||Q_{b_m^k}(g_m^k) − ĝ_m^(k−1)||_2^2 ≥ (1 / (α^2 · M^2)) · Σ_{d=1..D} ξ_d · ||θ^(k+1−d) − θ^(k−d)||_2^2 + 3 · (ε_m^k + ε_m^(k−1))   (2)

In the formula, Q_{b_m^k}(g_m^k) indicates the quantization gradient obtained when the m-th local client quantizes its gradient g_m^k with the target quantization level b_m^k in the k-th round; ĝ_m^(k−1) represents the quantization gradient actually used by the m-th local client in the (k−1)-th round; the left-hand difference is used to measure the amount of the m-th local client's update in the k-th round; the right-hand side is the gradient range threshold, where θ is the parameter vector of the global model, M is the number of local clients, and D is the total number of rounds considered; ε_m^k represents the deviation between the true gradient and the quantization gradient, and ε_m^(k−1) represents the deviation between the quantization gradient and the original gradient in the previous round; ξ_d and α are two hyperparameters, where ξ_d is a penalty term for quantizing the gradient.
6. An optimization processing method for federal learning communication overhead, applied to a local client, characterized by comprising the following steps:
acquiring the initial global model distributed by the central server;
determining the target quantization level of the current round based on a preset adaptive gradient quantization model, and quantizing the gradient to be uploaded in the current round with the target quantization level to obtain the quantization gradient corresponding to the current round;
and uploading the quantization gradient corresponding to the current round to the central server, so that the central server can aggregate the quantization gradients.
7. An optimization processing apparatus for federally learned communication overhead, comprising:
the system comprises a quantization grade self-adaptive determination unit, a target quantization grade determination unit and a target quantization grade determination unit, wherein the quantization grade self-adaptive determination unit is used for distributing an initial global model to a local client and obtaining a target quantization grade of a current turn, which is obtained by the local client based on a preset self-adaptive gradient quantization model; the target quantization level is used for quantizing the gradient uploaded by the local client in the current turn;
the inertia gradient aggregation processing unit is used for determining whether to acquire a quantization gradient corresponding to the local client in the current turn or not based on a preset inertia gradient aggregation model; the quantization gradient is obtained by quantizing the gradient uploaded by the current turn by the local client based on the target quantization level;
and a quantization gradient aggregation processing unit, used for performing aggregation processing on the quantization gradients and updating the initial global model according to the aggregation result to obtain the target global model corresponding to the next round.
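For the server side of claim 7, a minimal aggregation sketch follows, under the assumption that clients whose upload was skipped by the inert aggregation test are represented by their last stored quantization gradient and that the update is a plain gradient-descent step; the learning rate lr and the dict layout are illustrative.

```python
# Hedged sketch of server-side aggregation with lazy reuse of stored gradients.
import numpy as np

def aggregate_and_update(theta, fresh_grads, stored_grads, lr=0.1):
    """fresh_grads  : {client_id: quantization gradient uploaded this round}
    stored_grads : {client_id: last known quantization gradient, all clients}"""
    stored_grads.update(fresh_grads)           # lazy reuse for clients that skipped
    mean_grad = np.mean(list(stored_grads.values()), axis=0)
    return theta - lr * mean_grad              # global model for the next round
```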
8. An optimization processing apparatus for federal learning communication overhead, comprising:
a global model acquisition unit, used for acquiring an initial global model distributed by a central server;
a quantization level adaptive processing unit, used for determining a target quantization level of the current round based on a preset adaptive gradient quantization model, and quantizing the gradient to be uploaded in the current round based on the target quantization level to obtain a quantization gradient corresponding to the current round;
and a quantization gradient uploading unit, used for uploading the quantization gradient corresponding to the current round to the central server, so that the central server can perform aggregation processing on the quantization gradient.
9. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the optimization processing method for federal learning communication overhead according to any one of claims 1 to 6.
10. A processor-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the optimization processing method for federal learning communication overhead according to any one of claims 1 to 6.
CN202210023353.8A 2022-01-10 2022-01-10 Optimization processing method and device for federal learning communication overhead Pending CN114548421A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210023353.8A CN114548421A (en) 2022-01-10 2022-01-10 Optimization processing method and device for federal learning communication overhead

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210023353.8A CN114548421A (en) 2022-01-10 2022-01-10 Optimization processing method and device for federal learning communication overhead

Publications (1)

Publication Number Publication Date
CN114548421A true CN114548421A (en) 2022-05-27

Family

ID=81669352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210023353.8A Pending CN114548421A (en) 2022-01-10 2022-01-10 Optimization processing method and device for federal learning communication overhead

Country Status (1)

Country Link
CN (1) CN114548421A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139662A (en) * 2021-04-23 2021-07-20 深圳市大数据研究院 Global and local gradient processing method, device, equipment and medium for federal learning
CN113435604A (en) * 2021-06-16 2021-09-24 清华大学 Method and device for optimizing federated learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIANQIAO WANGNI et al.: "Gradient sparsification for communication-efficient distributed optimization", Proceedings of the 32nd International Conference on Neural Information Processing Systems, 31 December 2018 (2018-12-31) *
DONG YE; HOU WEI; CHEN XIAOJUN; ZENG SHUAI: "Efficient and Secure Federated Learning Based on Secret Sharing and Gradient Selection", Journal of Computer Research and Development, no. 10, 9 October 2020 (2020-10-09) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024094094A1 (en) * 2022-11-02 2024-05-10 华为技术有限公司 Model training method and apparatus

Similar Documents

Publication Publication Date Title
CN112181666B (en) Equipment assessment and federal learning importance aggregation method based on edge intelligence
CN110942154B (en) Data processing method, device, equipment and storage medium based on federal learning
CN111091199B (en) Federal learning method, device and storage medium based on differential privacy
CN113568727B (en) Mobile edge computing task allocation method based on deep reinforcement learning
CN113469325B (en) Hierarchical federation learning method for edge aggregation interval self-adaptive control, computer equipment and storage medium
CN115037608B (en) Quantization method, quantization device, quantization apparatus, and readable storage medium
CN109743713B (en) Resource allocation method and device for electric power Internet of things system
CN116523079A (en) Reinforced learning-based federal learning optimization method and system
CN110072130A (en) A kind of HAS video segment method for pushing based on HTTP/2
WO2022217210A1 (en) Privacy-aware pruning in machine learning
CN115392348A (en) Federal learning gradient quantification method, high-efficiency communication Federal learning method and related device
CN112307044A (en) Adaptive network data acquisition method based on multi-objective optimization and related equipment
CN114548421A (en) Optimization processing method and device for federal learning communication overhead
CN116996938A (en) Internet of vehicles task unloading method, terminal equipment and storage medium
CN113839830B (en) Method, device and storage medium for predicting multiple data packet parameters
CN114422438A (en) Link adjusting method and device of power communication network
CN109219960B (en) Method, device and equipment for optimizing video coding quality smoothness and storage medium
WO2023236609A1 (en) Automatic mixed-precision quantization method and apparatus
WO2023142351A1 (en) Weight adjustment method and apparatus, and storage medium and electronic apparatus
CN112511702B (en) Media frame pushing method, server, electronic equipment and storage medium
CN116491115A (en) Rate controlled machine learning model with feedback control for video coding
US20140321558A1 (en) Video quality measurement considering multiple artifacts
CN115293324A (en) Quantitative perception training method and related device
CN115564055A (en) Asynchronous joint learning training method and device, computer equipment and storage medium
CN113850390A (en) Method, device, equipment and medium for sharing data in federal learning system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination