CN111401552B - Federated learning method and system based on batch size adjustment and gradient compression rate adjustment - Google Patents

Federated learning method and system based on batch size adjustment and gradient compression rate adjustment

Info

Publication number
CN111401552B
CN111401552B
Authority
CN
China
Prior art keywords
gradient
compression rate
terminal
batch size
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010166667.4A
Other languages
Chinese (zh)
Other versions
CN111401552A (en)
Inventor
刘胜利
余官定
殷锐
袁建涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202010166667.4A
Publication of CN111401552A
Application granted
Publication of CN111401552B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a federated learning method and system based on adjusting the batch size and the gradient compression rate, used to improve model training performance. In a federated learning scenario, multiple terminals share uplink wireless channel resources and, based on the training data of the local terminals, complete the training of a neural network model together with an edge server. During model training, the terminals compute gradients in batches in their local computation, and in the uplink transmission the gradients are compressed before being transmitted. According to the computing capability of each terminal and the state of its channel, the batch size and the gradient compression rate are adjusted so as to improve the convergence rate of model training while guaranteeing the training time and without reducing model accuracy.

Figure 202010166667

Description

Federated learning method and system based on adjusting batch size and gradient compression rate

Technical Field

The present invention relates to the fields of artificial intelligence and communications, and in particular to a federated learning method and system based on adjusting the batch size and the gradient compression rate.

Background Art

In recent years, with the continuous improvement of hardware and software, artificial intelligence (AI) technology has entered another peak of development. It mines key information from massive amounts of data to enable applications such as face recognition, speech recognition, and data mining. However, in scenarios where data privacy is sensitive, such as patient records in hospitals and customer information in banks, data is usually difficult to obtain, a situation commonly known as information islands. If existing AI training methods are applied as-is, it is difficult to obtain effective results because there is not enough data.

Federated Learning (FL), proposed by Google, is designed to solve the information-island problem. It mainly targets services that are sensitive to data privacy and security, as well as upcoming scenarios such as autonomous driving and the Internet of Things. It distributes model training across several terminals; the terminals do not need to send raw data to the edge server and instead send model parameters or gradient information. Building on the traditional stochastic gradient descent training method, the gradient-averaging method used in federated learning can still achieve good learning performance. In the high-reliability, low-latency wireless communication scenarios of 5G, autonomous driving and the intelligent analysis and decision-making of the Internet of Things all depend on federated learning.

In traditional federated learning scenarios, the terminals and the server are connected by wire, so the communication overhead and the local computing latency are negligible. However, with the development of mobile communication networks and the rapid growth of mobile smart devices, and in order to quickly realize IoT applications and autonomous driving, AI model training can be placed on mobile smart terminals and the traditional wired links can be replaced by wireless links, which makes it very convenient for training terminals to join and leave. Combining federated learning with wireless communication networks is therefore the direction of future development.

However, applying wireless communication to federated learning raises many problems. First, the local computing latency increases. Although the computing power of local terminals keeps improving and AI models can already be deployed on them, the gap between terminals and desktop computers or servers is still large, so the computing latency caused by local gradient computation cannot be ignored. Second, because wireless bandwidth is scarce and wireless channels are unstable, transmitting a large amount of model gradient information incurs a huge communication overhead and causes a large transmission latency.

To address the high training latency caused by local computation and gradient transmission, local batch data processing and gradient compression are applied during model training. In each training round, a terminal can compute the model gradient from only a portion of its data to reduce the latency of local computation, and during transmission the gradient information is compressed so that the original gradient is represented with less data, reducing communication time. However, both the batch size and the gradient compression affect the convergence rate of the model.

Therefore, while controlling the model training latency, the convergence rate of the model must also be considered. How to adjust the batch size and the gradient compression rate in a reasonable way so as to guarantee the training latency and improve the convergence rate is an urgent problem to be solved.

Summary of the Invention

The purpose of the present invention is to provide a federated learning method and system based on adjusting the batch size and the gradient compression rate. By adjusting the batch size and the gradient compression rate, the method improves the convergence rate of the model within a specified learning time, and no raw data needs to be transmitted during federated learning, which better protects user privacy and security.

To achieve the above purpose, the present invention provides the following technical solutions:

In a first aspect, a federated learning method based on adjusting the batch size and the gradient compression rate is provided. The system implementing the federated learning includes an edge server and multiple terminals in wireless communication with the edge server, and the terminals perform model learning based on local data. The federated learning method includes:

the edge server adjusts the batch size and gradient compression rate of each terminal according to the current batch size and gradient compression rate, combined with the computing capability of the terminal and the communication capability between the edge server and the terminal, and transmits the adjusted batch size and gradient compression rate to the terminal;

the terminal performs model learning with the received batch size, compresses the gradient information obtained from model learning according to the received gradient compression rate, and uploads it to the edge server;

the edge server averages all received gradient information and synchronizes the gradient average to the terminals;

the terminal updates its model according to the received gradient average.

In the present invention, in order to reduce terminal computation time and hardware requirements, each local terminal computes gradients in batches, that is, a certain batch size is selected for each gradient computation. In order to reduce the information transmitted from the terminal to the edge server, save communication overhead, and reduce communication time, the terminal compresses the gradient information before uploading it.

In a possible implementation, adjusting the batch size and gradient compression rate of the terminal according to the current batch size and gradient compression rate, combined with the computing capability of the terminal and the communication capability between the edge server and the terminal, includes:

(a) calculating the current learning delay according to the current batch size and gradient compression rate, combined with the computing capability of the terminal and the communication capability between the edge server and the terminal;

(b) comparing the current learning delay with the prescribed learning delay: if the current learning delay is greater than the prescribed learning delay, reducing the batch size and increasing the gradient compression rate; if the current learning delay is less than the prescribed learning delay, increasing the batch size and reducing the gradient compression rate;

(c) repeating steps (a) and (b) until the current learning delay equals the prescribed learning delay; the batch size and gradient compression rate corresponding to the current learning delay that equals the prescribed learning delay are the adjusted batch size and gradient compression rate.

If the current learning delay is greater than the prescribed learning time, the current training cannot be completed on time; in this case the batch size should be reduced and the compression rate increased, to save local computation and wireless transmission time. If the current learning delay is less than the prescribed learning time, the current training can be completed but the convergence rate can still be improved; in this case the batch size can be appropriately increased and the gradient compression rate reduced, to improve convergence performance.

In particular, when the learning time is very short, i.e., the training task is very tight, the batch size of each terminal should be as small as possible and the compression rate as large as possible to guarantee the learning delay; when the learning time is very long, i.e., the task has no strict learning time requirement, the batch size of each terminal should be as large as possible and the compression rate as small as possible to achieve the best convergence performance.
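
As an illustration only, the following Python sketch shows one way such an iterative adjustment of steps (a) to (c) could be implemented. The step sizes for b and c, the value ranges, the stopping tolerance, and the function learning_delay(b, c) are all assumptions made for the example and are not specified by the method above.

```python
def adjust_batch_and_compression(b, c, t_max, learning_delay,
                                 b_step=1, c_step=0.01,
                                 b_range=(1, 1024), c_range=(0.0, 0.99),
                                 tol=1e-3, max_iters=10_000):
    """Iteratively adjust the batch size b and gradient compression rate c
    until the per-round learning delay matches the prescribed delay t_max.

    learning_delay(b, c) is assumed to return the current learning delay;
    a larger c means more aggressive compression (less data transmitted).
    """
    for _ in range(max_iters):
        delay = learning_delay(b, c)
        if abs(delay - t_max) <= tol:
            break                                  # step (c): delay meets the budget
        if delay > t_max:
            # Training would not finish on time: shrink the batch, compress more.
            b = max(b_range[0], b - b_step)
            c = min(c_range[1], c + c_step)
        else:
            # Time budget is not exhausted: grow the batch, compress less.
            b = min(b_range[1], b + b_step)
            c = max(c_range[0], c - c_step)
    return b, c
```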

In step (a), the current learning delay is calculated as follows:

calculating, based on the terminal's computing capability and the batch size, the time required for the terminal to compute gradients with the current batch size;

calculating, based on the communication capability and the gradient compression rate, the time taken for the compressed gradient information to be uploaded to the edge server over the wireless channel;

calculating the time required to aggregate all gradient information and average it to obtain the average gradient;

calculating the time required for the edge server to send the average gradient information down to each terminal;

calculating the time required for each terminal to update its model after receiving the average gradient information;

the sum of these five parts is the current learning delay.

From this calculation it follows that the current learning delay can be changed by adjusting the batch size and the gradient compression rate: when the batch size increases or the gradient compression rate decreases, the current learning delay increases; when the batch size decreases or the gradient compression rate increases, the current learning delay decreases.
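
The sketch below illustrates the five components listed above with simple assumed cost models: a per-sample computation time, an uplink payload proportional to (1 - c), and fixed aggregation, broadcast, and update times. All numeric values and cost models are assumptions made for the example; the description only requires that the delay grows with the batch size and shrinks as the compression rate grows.

```python
def learning_delay(b, c,
                   per_sample_time=2e-3,   # seconds per training sample (assumed)
                   model_bits=8e6,         # uncompressed gradient size in bits (assumed)
                   uplink_rate=5e6,        # uplink rate in bit/s (assumed)
                   t_aggregate=0.05,       # server-side averaging time in s (assumed)
                   t_broadcast=0.10,       # downlink time for the average gradient in s (assumed)
                   t_update=0.02):         # terminal model-update time in s (assumed)
    """Per-round learning delay as the sum of the five parts listed above.

    A larger compression rate c (0 <= c < 1) is taken to mean that only a
    fraction (1 - c) of the original gradient payload is transmitted.
    """
    t_compute = b * per_sample_time                   # (1) local gradient computation
    t_upload = model_bits * (1.0 - c) / uplink_rate   # (2) compressed uplink transmission
    return t_compute + t_upload + t_aggregate + t_broadcast + t_update  # plus (3), (4), (5)
```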

In a possible implementation, when compressing the gradient information, the gradient may be quantized and the quantization loss is then defined as the compression rate; that is, the gradient compression rate includes the compression rate obtained by gradient quantization of the gradient information, expressed as:

[Formula, shown as an image in the original: the compression rate c is defined by the quantization error between Q(x) and x, measured with the squared 2-norm]

where x denotes the gradient information, Q(x) denotes the quantization function, c denotes the compression rate, and the norm is the squared 2-norm.

The convergence rate of the corresponding model training can be expressed as:

[Formula, shown as an image in the original: a convergence bound in terms of α1, β1, B, C, and S]

where α1 and β1 are model-related coefficients with α1 > 0 and β1 > 0, B is the sum of the batch sizes of all terminals, C is the average gradient compression rate of all terminals, and S is the number of training steps.

When the gradient compression rate is the compression rate obtained by gradient quantization, the gradient compression rate is increased or decreased by adjusting the number of quantization bits used by the quantization function.
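
A minimal sketch of fixed-length (uniform) quantization and of measuring the quantization error is shown below. The uniform quantizer and the normalization of the error by the squared norm of x are assumptions made for the example; the exact quantizer and the exact compression-rate formula (shown as an image above) are not reproduced here.

```python
import numpy as np

def quantize_fixed_length(x, num_bits):
    """Uniformly quantize the gradient vector x to num_bits per entry."""
    levels = 2 ** num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((x - lo) / scale).astype(np.int64)   # integer codes to transmit
    dequantized = codes * scale + lo                       # what the server reconstructs
    return dequantized, codes, (lo, scale)

def quantization_compression_rate(x, num_bits):
    """Compression rate c measured as the (normalized) quantization error."""
    q, _, _ = quantize_fixed_length(x, num_bits)
    return float(np.sum((q - x) ** 2) / np.sum(x ** 2))

# Fewer quantization bits give a larger quantization error, i.e. a larger c.
g = np.random.randn(10_000)
print(quantization_compression_rate(g, 2), quantization_compression_rate(g, 8))
```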

In a possible implementation, the gradient compression rate includes the compression rate obtained by sparsifying the gradient information, i.e., selecting the m largest gradients in the gradient matrix for transmission; in this case the transmitted amount of the gradient is defined as the compression rate, expressed as:

[Formula, shown as an image in the original: the compression rate is defined in terms of m and the model size M]

where M denotes the size of the model.

The convergence rate of the corresponding model training can be expressed as:

[Formula, shown as an image in the original: a convergence bound in terms of α2, β2, B, C, and S]

where α2 and β2 are model-related coefficients with α2 > 0 and β2 > 0, B is the sum of the batch sizes of all terminals, C is the average gradient compression rate of all terminals, and S is the number of training steps.

When the gradient compression rate is the compression rate obtained by gradient sparsification, the gradient compression rate is increased or decreased by adjusting the number of transmitted gradients.
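
The sketch below illustrates the top-m sparsification described above, assuming NumPy arrays; the helper names and the choice to transmit (indices, values) pairs are illustrative only.

```python
import numpy as np

def sparsify_top_m(grad, m):
    """Keep the m largest-magnitude gradient entries and their indices.

    The terminal would transmit the (indices, values) pairs; everything
    else is treated as zero on the server side.
    """
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -m)[-m:]   # indices of the m largest |g|
    return idx, flat[idx]

def densify(idx, values, shape):
    """Scatter the sparse upload back into a full-size gradient of model size M."""
    full = np.zeros(int(np.prod(shape)))
    full[idx] = values
    return full.reshape(shape)

# Example: keep 1% of a 100,000-entry gradient.
g = np.random.randn(100_000)
idx, vals = sparsify_top_m(g, m=1_000)
g_hat = densify(idx, vals, g.shape)
```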

In a possible implementation, the gradient average is obtained using the following formula:

G(w) = (1/K) · Σ_{k=1}^{K} g_k(w)

where g_k(w) denotes the gradient computed by terminal k, G(w) denotes the gradient average, w denotes the model parameters, k denotes the terminal index, and K denotes the total number of terminals.

In a possible implementation, the model is updated from the gradient average using the following formula:

w_{t+1} = w_t - α·G(w_t)

where t is the iteration index, α is the learning rate, G(w_t) denotes the gradient average at the t-th iteration, and w_t and w_{t+1} denote the model parameters of the terminal at the t-th and (t+1)-th iterations, respectively.
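
The two formulas above translate directly into code; the sketch below assumes each terminal's (decompressed) gradient arrives as a NumPy array and uses alpha as the learning rate.

```python
import numpy as np

def average_gradients(terminal_grads):
    """G(w) = (1/K) * sum_k g_k(w): average the K uploaded gradients."""
    return sum(terminal_grads) / len(terminal_grads)

def update_model(w_t, avg_grad, alpha=0.01):
    """w_{t+1} = w_t - alpha * G(w_t): synchronous update at each terminal."""
    return w_t - alpha * avg_grad

# Example with K = 3 terminals and a 5-parameter model.
w = np.zeros(5)
grads = [np.random.randn(5) for _ in range(3)]
w = update_model(w, average_gradients(grads), alpha=0.1)
```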

In a second aspect, a federated learning system based on adjusting the batch size and the gradient compression rate is provided, including an edge server connected to a communication node such as a base station, and multiple terminals in wireless communication with the edge server, wherein:

the edge server adjusts the batch size and gradient compression rate of each terminal according to the current batch size and gradient compression rate, combined with the computing capability of the terminal and the communication capability between the edge server and the terminal, and transmits the adjusted batch size and gradient compression rate to the terminal;

the terminal performs model learning with the received batch size, compresses the gradient information obtained from model learning according to the received gradient compression rate, and uploads it to the edge server;

the edge server averages all received gradient information and synchronizes the gradient average to the terminals;

the terminal updates its model according to the received gradient average.

It can be seen from the above technical solutions that the federated learning method and system based on adjusting the batch size and the gradient compression rate provided in the embodiments of the present application have the following advantages: compared with transmitting data directly, transmitting gradients fully protects the privacy and security of user data; compressing the gradients relieves the pressure on communication transmission and reduces the latency; and dynamically adjusting the batch size and compression rate according to the learning time requirement improves the convergence rate of the model while guaranteeing the training time and the model accuracy.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of a federated learning system based on a wireless communication network in an embodiment of the present application;

FIG. 2 is a schematic diagram of the quantization method used for gradient compression in an embodiment of the present application;

FIG. 3 is a flow chart of batch size and compression rate adjustment in one embodiment of the present application;

FIG. 4 is a schematic diagram of the overall training interaction in one embodiment of the present application;

FIG. 5 is a schematic diagram of the sparsification method used for gradient compression in an embodiment of the present application;

FIG. 6 is a flow chart of batch size and compression rate adjustment in another embodiment of the present application;

FIG. 7 is a schematic diagram of the overall training interaction in another embodiment of the present application.

DETAILED DESCRIPTION

To make the purpose, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific implementations described here are only used to explain the present invention and do not limit its scope of protection.

The terms "first", "second", "third", "fourth", etc. (if any) in the specification, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described here can be implemented in an order other than that illustrated or described here. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units clearly listed, and may include other steps or units that are not clearly listed or that are inherent to the process, method, product, or device.

FIG. 1 shows a federated learning system based on a wireless communication network in an embodiment of the present application. The federated learning system includes an edge server connected to a communication node such as a base station, and terminals. The terminals share uplink wireless channel resources and, based on the training data of the local terminals, complete the training of the neural network model together with the edge server. With this federated learning system, the federated learning method based on adjusting the batch size and the gradient compression rate can be implemented. Specifically, for the computing latency caused by local computation, the batch size can be adjusted according to the computing capability of each terminal to reduce the computing latency; for the communication bottleneck, the amount of transmitted information can be reduced to reduce the communication overhead, i.e., gradient compression. The gradient compression rate is adjusted according to the state of the wireless channel of each terminal, thereby reducing the communication latency. To meet the training time requirement, the batch size and compression rate adjustment method of the present invention can improve the convergence rate of model training while guaranteeing the training latency.

Embodiment 1

The federated learning method based on adjusting the batch size and the gradient compression rate provided in this embodiment applies to the scenario in which multiple mobile terminals and an edge server connected to a communication hotspot (such as a base station) jointly train an artificial intelligence model. Other wireless communication technologies can work in the same mode, so this embodiment mainly considers the case of mobile communication technology.

In this embodiment, each terminal performs local computation in batches, and the gradient compression method is quantization; in particular, fixed-length quantization is used. The quantization process is shown in FIG. 2: the gradient information is quantized with a certain number of bits, encoded, and then transmitted to the edge server as the gradient information. When a high number of quantization bits is used to represent the gradient, the original gradient information is preserved as much as possible, but the amount of transmitted information increases; when a low number of quantization bits is used to quantize the gradient, the quantized gradient information deviates from the original gradient information, but the amount of transmitted information decreases. In this case, the compression rate of the gradient can be expressed by the quantization error, i.e.,

[Formula, shown as an image in the original: the compression rate is given by the quantization error between Q(x) and x]

where Q(x) denotes the quantization function and x denotes the gradient.

With gradient quantization, the convergence rate of model training can be expressed as:

[Formula, shown as an image in the original: a convergence bound in terms of α1, β1, B, C, and S]

where α1 and β1 are model-related coefficients with α1 > 0 and β1 > 0, B is the sum of the batch sizes of all terminals, C is the average gradient compression rate of all terminals, and S is the number of training steps.

In this embodiment, the final goal is, on the basis of meeting the training latency requirement, to adjust the batch size and the compression rate according to the computing capability of each terminal and the state of its wireless channel, so as to improve the convergence rate of model training.

Specifically, the batch size and compression rate adjustment algorithm is shown in FIG. 3 and includes the following parts:

301. Calculate the current training delay, which consists of five parts:

(1) the time required to compute the gradient locally with batch size b;

(2) the time taken for the compressed gradient information to be uploaded to the edge server over the wireless channel;

(3) the time required to aggregate all gradient information and compute the average;

(4) the time required for the edge server to send the computed average gradient information to each terminal;

(5) the time required for each terminal to update its model after receiving the gradient information.

302. Since the purpose of this embodiment is to improve the convergence rate while guaranteeing the training time, the current training time is compared with the prescribed training time Tmax; three cases are distinguished, and corresponding changes are made.

303. When the training time is less than the prescribed training time, the batch size and the number of quantization bits should be appropriately increased to improve the convergence rate.

304. When the training time is greater than the prescribed training time, the batch size and the number of quantization bits should be appropriately reduced so that the training is completed in time and the time requirement is met.

305. When the training time equals the prescribed training time, the current batch size and compression rate can be used for training.

When this adjustment procedure is applied to the actual training of federated learning, the specific interaction between the terminals and the base station is shown in FIG. 4 and includes the following steps (a sketch of one such training round follows the list):

401. Initialization: each terminal uploads relevant information, such as its computing capability and the state of its wireless channel, to the base station.

402. Based on the information uploaded by each terminal, the base station uses the adjustment method described in FIG. 3 to obtain the batch size b and the gradient compression rate c.

403. Each terminal computes its local gradient information.

404. According to the gradient compression rate, i.e., the quantization error, the corresponding number of quantization bits is obtained; the gradient is quantized, encoded, and transmitted over the wireless channel.

405. The base station receives the gradient information, averages the gradients, and sends the result to each terminal.

406. Each terminal downloads the average gradient information.

407. Each terminal updates its model using the average gradient information.
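
For reference, the following sketch strings steps 401 to 407 into one synchronous training round. It reuses the helper functions sketched earlier (quantize_fixed_length, average_gradients, update_model); the terminal objects and their local_gradient(w, batch_size) method are placeholders for whatever model and local data each terminal holds.

```python
def federated_round(terminals, w, num_bits, batch_size, alpha=0.01):
    """One synchronous round of federated training with gradient quantization.

    Each object in terminals is assumed to expose a method
    local_gradient(w, batch_size) returning a NumPy gradient vector.
    """
    uploads = []
    for term in terminals:                                    # 403: local gradient computation
        g = term.local_gradient(w, batch_size)
        q, codes, meta = quantize_fixed_length(g, num_bits)   # 404: quantize and encode
        uploads.append(q)          # in practice the codes and meta are what is transmitted
    avg_grad = average_gradients(uploads)                     # 405: base station averages
    return update_model(w, avg_grad, alpha)                   # 406-407: download and update
```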

With the method of the invention, the best model performance can be obtained.

Embodiment 2

The adjustment method provided in this embodiment applies to the scenario in which multiple mobile terminals and an edge server connected to a communication hotspot (such as a base station) jointly train an artificial intelligence model. Other wireless communication technologies can work in the same mode, so this embodiment mainly considers the case of mobile communication.

In this embodiment, each terminal performs local computation in batches, and the gradient compression method is sparsification; in particular, the sparsification selects the larger gradients for transmission. The sparsification process is shown in FIG. 5: after the gradient information is sparsified, the selected gradient values and their indices are transmitted to the edge server. When more gradient information is retained, the loss of gradient information is reduced, but the amount of transmitted information increases; when less gradient information is retained, more gradient information is lost, but the amount of transmitted information decreases. In this case, the compression rate of the gradient can be expressed as the ratio between the total model size and the number of transmitted gradients, i.e.,

[Formula, shown as an image in the original: the compression rate is defined in terms of m and the model size M]

where M denotes the size of the model.

With gradient sparsification, the convergence rate of model training can be expressed as:

[Formula, shown as an image in the original: a convergence bound in terms of α2, β2, B, C, and S]

where α2 and β2 are model-related coefficients with α2 > 0 and β2 > 0, B is the sum of the batch sizes of all terminals, C is the average gradient compression rate of all terminals, and S is the number of training steps.

In this embodiment, the final goal is, on the basis of meeting the training latency requirement, to adjust the batch size and the compression rate according to the computing capability of each terminal and the state of its wireless channel, so as to improve the convergence rate of model training.

Specifically, the batch size and compression rate adjustment algorithm is shown in FIG. 6 and includes the following parts:

601. Calculate the current training delay, which consists of five parts:

(1) the time required to compute the gradient locally with batch size b;

(2) the time taken for the compressed gradient information to be uploaded to the edge server over the wireless channel;

(3) the time required to aggregate all gradient information and compute the average;

(4) the time required for the edge server to send the computed average gradient information to each terminal;

(5) the time required for each terminal to update its model after receiving the gradient information.

602. Since the purpose of this embodiment is to improve the convergence rate while guaranteeing the training time, the current training time is compared with the prescribed training time Tmax; three cases are distinguished, and corresponding changes are made.

603. When the training time is less than the prescribed training time, the batch size and the number of transmitted gradients should be appropriately increased to improve the convergence rate.

604. When the training time is greater than the prescribed training time, the batch size and the number of transmitted gradients should be appropriately reduced so that the training is completed in time and the time requirement is met.

605. When the training time equals the prescribed training time, the current batch size and compression rate can be used for training.

When this adjustment procedure is applied to the actual training of federated learning, the specific interaction between the terminals and the base station is shown in FIG. 7 and includes the following steps:

701. Initialization: each terminal uploads relevant information, such as its computing capability and the state of its wireless channel, to the base station.

702. Based on the information uploaded by each terminal, the base station uses the adjustment method described in FIG. 6 to obtain the batch size b and the gradient compression rate c.

703. Each terminal computes its local gradient information.

704. According to the gradient compression rate, the number of gradients to be transmitted is obtained; the gradients are sorted, the largest gradients and their positions are selected, and they are transmitted over the wireless channel.

705. The base station receives the gradient information, averages the gradients, and sends the result to each terminal.

706. Each terminal downloads the average gradient information.

707. Each terminal updates its model using the average gradient information.

With the method of the invention, the best model performance can be obtained.

By adjusting the batch size and the gradient compression rate, this federated learning method improves the convergence rate of the model within the specified learning time, and no raw data needs to be transmitted during federated learning, which better protects user privacy and security.

Embodiment 3

Embodiment 3 provides a federated learning system based on adjusting the batch size and the gradient compression rate, including an edge server connected to a communication node such as a base station, and multiple terminals in wireless communication with the edge server, wherein:

the edge server adjusts the batch size and gradient compression rate of each terminal according to the current batch size and gradient compression rate, combined with the computing capability of the terminal and the communication capability between the edge server and the terminal, and transmits the adjusted batch size and gradient compression rate to the terminal;

the terminal performs model learning with the received batch size, compresses the gradient information obtained from model learning according to the received gradient compression rate, and uploads it to the edge server;

the edge server averages all received gradient information and synchronizes the gradient average to the terminals;

the terminal updates its model according to the received gradient average.

Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific implementation of this federated learning system is the same as that of the federated learning method based on adjusting the batch size and the gradient compression rate provided in Embodiment 1 and Embodiment 2; reference can be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.

By adjusting the batch size and the gradient compression rate, the federated learning system improves the convergence rate of the model within the specified learning time, and no raw data needs to be transmitted during federated learning, which better protects user privacy and security.

The wireless communication mentioned above may be an existing mobile communication network, i.e., an LTE (Long Term Evolution) or 5G network, or a WiFi network.

The processing capability of the edge server mentioned above far exceeds the computing capability of the local terminals, and the edge server is capable of performing model training independently. The processor may be a general-purpose central processing unit (CPU), a graphics processing unit (GPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits used to execute the above model training programs.

The local terminals mentioned above may be modern smartphones, tablets, laptops, self-driving cars, or other mobile terminals that can support model training; they are equipped with a wireless communication system and can access mainstream wireless networks such as mobile communication networks and WiFi.

The specific implementations described above explain the technical solutions and beneficial effects of the present invention in detail. It should be understood that the above are only the most preferred embodiments of the present invention and are not intended to limit the present invention; any modification, supplement, or equivalent replacement made within the scope of the principles of the present invention shall be included in the scope of protection of the present invention.

Claims (9)

1. A federated learning method based on adjusting the batch size and the gradient compression rate, wherein the system implementing the federated learning comprises an edge server and multiple terminals in wireless communication with the edge server, the terminals performing model learning based on local data, characterized in that the federated learning method comprises:

the edge server adjusting the batch size and gradient compression rate of each terminal according to the current batch size and gradient compression rate, combined with the computing capability of the terminal and the communication capability between the edge server and the terminal, including:

(a) calculating the current learning delay according to the current batch size and gradient compression rate, combined with the computing capability of the terminal and the communication capability between the edge server and the terminal;

(b) comparing the current learning delay with the prescribed learning delay: if the current learning delay is greater than the prescribed learning delay, reducing the batch size and increasing the gradient compression rate; if the current learning delay is less than the prescribed learning delay, increasing the batch size and reducing the gradient compression rate;

(c) repeating steps (a) and (b) until the current learning delay equals the prescribed learning delay, the batch size and gradient compression rate corresponding to the current learning delay that equals the prescribed learning delay being the adjusted batch size and gradient compression rate;

the edge server transmitting the adjusted batch size and gradient compression rate to the terminal;

the terminal performing model learning with the received batch size, compressing the gradient information obtained from model learning according to the received gradient compression rate, and uploading it to the edge server;

the edge server averaging all received gradient information and synchronizing the gradient average to the terminals; and

the terminal updating its model according to the received gradient average.

2. The federated learning method based on adjusting the batch size and the gradient compression rate according to claim 1, characterized in that, in step (a), the current learning delay is calculated as follows:

calculating, based on the terminal's computing capability and the batch size, the time required for the terminal to compute gradients with the current batch size;

calculating, based on the communication capability and the gradient compression rate, the time taken for the compressed gradient information to be uploaded to the edge server over the wireless channel;

calculating the time required to aggregate all gradient information and average it to obtain the average gradient;

calculating the time required for the edge server to send the average gradient information down to each terminal;

calculating the time required for each terminal to update its model after receiving the average gradient information;

the sum of these five parts being the current learning delay.

3. The federated learning method based on adjusting the batch size and the gradient compression rate according to claim 1, characterized in that the gradient compression rate includes the compression rate obtained by gradient quantization of the gradient information, expressed as:

[Formula, shown as an image in the original: the compression rate c is defined by the quantization error between Q(x) and x, measured with the squared 2-norm]

where x denotes the gradient information, Q(x) denotes the quantization function, c denotes the compression rate, and the norm is the squared 2-norm;

the convergence rate of the corresponding model training being expressed as:

[Formula, shown as an image in the original: a convergence bound in terms of α1, β1, B, C, and S]

where α1 and β1 are model-related coefficients with α1 > 0 and β1 > 0, B is the sum of the batch sizes of all terminals, C is the average gradient compression rate of all terminals, and S is the number of training steps.

4. The federated learning method based on adjusting the batch size and the gradient compression rate according to claim 1, characterized in that the gradient compression rate includes the compression rate obtained by sparsifying the gradient information, i.e., selecting the m largest gradients in the gradient matrix for transmission, the transmitted amount of the gradient being defined as the compression rate, expressed as:

[Formula, shown as an image in the original: the compression rate is defined in terms of m and the model size M]

where M denotes the size of the model;

the convergence rate of the corresponding model training being expressed as:

[Formula, shown as an image in the original: a convergence bound in terms of α2, β2, B, C, and S]

where α2 and β2 are model-related coefficients with α2 > 0 and β2 > 0, B is the sum of the batch sizes of all terminals, C is the average gradient compression rate of all terminals, and S is the number of training steps.

5. The federated learning method based on adjusting the batch size and the gradient compression rate according to claim 3, characterized in that, when the gradient compression rate is the compression rate obtained by gradient quantization, the gradient compression rate is increased or decreased by adjusting the number of quantization bits used by the quantization function.

6. The federated learning method based on adjusting the batch size and the gradient compression rate according to claim 4, characterized in that, when the gradient compression rate is the compression rate obtained by gradient sparsification, the gradient compression rate is increased or decreased by adjusting the number of transmitted gradients.

7. The federated learning method based on adjusting the batch size and the gradient compression rate according to claim 1, characterized in that the gradient average is obtained using the following formula:

G(w) = (1/K) · Σ_{k=1}^{K} g_k(w)

where g_k(w) denotes the gradient computed by terminal k, G(w) denotes the gradient average, w denotes the model parameters, k denotes the terminal index, and K denotes the total number of terminals.

8. The federated learning method based on adjusting the batch size and the gradient compression rate according to claim 1, characterized in that the model is updated from the gradient average using the following formula:

w_{t+1} = w_t - α·G(w_t)

where t is the iteration index, G(w_t) denotes the gradient average at the t-th iteration, and w_t and w_{t+1} denote the model parameters of the terminal at the t-th and (t+1)-th iterations, respectively.

9. A federated learning system based on adjusting the batch size and the gradient compression rate, comprising an edge server connected to a communication node such as a base station and multiple terminals in wireless communication with the edge server, characterized in that:

the edge server adjusts the batch size and gradient compression rate of each terminal according to the current batch size and gradient compression rate, combined with the computing capability of the terminal and the communication capability between the edge server and the terminal, including: (a) calculating the current learning delay according to the current batch size and gradient compression rate, combined with the computing capability of the terminal and the communication capability between the edge server and the terminal; (b) comparing the current learning delay with the prescribed learning delay: if the current learning delay is greater than the prescribed learning delay, reducing the batch size and increasing the gradient compression rate; if the current learning delay is less than the prescribed learning delay, increasing the batch size and reducing the gradient compression rate; (c) repeating steps (a) and (b) until the current learning delay equals the prescribed learning delay, the batch size and gradient compression rate corresponding to the current learning delay that equals the prescribed learning delay being the adjusted batch size and gradient compression rate;

the edge server transmits the adjusted batch size and gradient compression rate to the terminal;

the terminal performs model learning with the received batch size, compresses the gradient information obtained from model learning according to the received gradient compression rate, and uploads it to the edge server;

the edge server averages all received gradient information and synchronizes the gradient average to the terminals;

the terminal updates its model according to the received gradient average.

CN202010166667.4A 2020-03-11 2020-03-11 Federated learning method and system based on batch size adjustment and gradient compression rate adjustment Active CN111401552B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010166667.4A CN111401552B (en) 2020-03-11 2020-03-11 Federated learning method and system based on batch size adjustment and gradient compression rate adjustment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010166667.4A CN111401552B (en) 2020-03-11 2020-03-11 Federated learning method and system based on batch size adjustment and gradient compression rate adjustment

Publications (2)

Publication Number Publication Date
CN111401552A CN111401552A (en) 2020-07-10
CN111401552B true CN111401552B (en) 2023-04-07

Family

ID=71428616

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010166667.4A Active CN111401552B (en) 2020-03-11 2020-03-11 Federated learning method and system based on batch size adjustment and gradient compression rate adjustment

Country Status (1)

Country Link
CN (1) CN111401552B (en)

CN118245809B (en) * 2024-05-27 2024-08-16 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Batch size adjustment method in distributed data parallel online asynchronous training

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6303332B2 (en) * 2013-08-28 2018-04-04 富士通株式会社 Image processing apparatus, image processing method, and image processing program
US11475350B2 (en) * 2018-01-22 2022-10-18 Google Llc Training user-level differentially private machine-learned models

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934802A (en) * 2019-02-02 2019-06-25 浙江工业大学 A Cloth Defect Detection Method Based on Fourier Transform and Image Morphology
CN110633805A (en) * 2019-09-26 2019-12-31 深圳前海微众银行股份有限公司 Vertical federated learning system optimization method, device, equipment and readable storage medium
CN110874484A (en) * 2019-10-16 2020-03-10 众安信息技术服务有限公司 Data processing method and system based on neural network and federal learning
CN110782042A (en) * 2019-10-29 2020-02-11 深圳前海微众银行股份有限公司 Horizontal and vertical federation methods, devices, equipment and media

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Haijian Sun. Adaptive Federated Learning With Gradient Compression in Uplink NOMA. arXiv. 2020, 1-10. *
刘俊旭. A Survey of Privacy Protection Research in Machine Learning. Journal of Computer Research and Development. 2020, 346-362. *

Also Published As

Publication number Publication date
CN111401552A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN111401552B (en) Federal learning method and system based on batch size adjustment and gradient compression rate adjustment
CN113315604B (en) Adaptive gradient quantization method for federated learning
CN111176929A (en) Edge federal learning-oriented high-energy-efficiency calculation communication joint optimization method
CN114978413B (en) Information coding control method and related device
CN109743713B (en) Resource allocation method and device for electric power Internet of things system
CN114096006B (en) Resource allocation and data compression combined optimization method in mobile edge computing system
JP7425197B2 (en) Scheduling method and device
CN114697333A (en) An edge computing method for energy queue balance
CN106162188A (en) Video code rate self-adapting regulation method and device
CN110798865A (en) Data compression method, data compression device, computer equipment and computer-readable storage medium
CN114827289B (en) Communication compression method, system, electronic device and storage medium
CN112770398A (en) Far-end radio frequency end power control method based on convolutional neural network
CN116667973A (en) Data transmission method, device and equipment for analog joint source channel coding
CN115150288B (en) Distributed communication system and method
CN102821489A (en) Base station and data compression method on base station side
CN113613270A (en) Fog access network calculation unloading method based on data compression
CN109561129B (en) A Cooperative Computing Offloading Method Based on Optical Fiber-Wireless Network
Lin et al. Channel-adaptive quantization for wireless federated learning
CN108738048B (en) Active storage method of maximized fairness base station based on genetic algorithm
CN114302435B (en) An optimization method for power consumption and delay in mobile edge computing transmission system
CN117750406A (en) Communication resource joint optimization algorithm under electric power 5G mixed scene
Bhat et al. Distortion minimization in energy harvesting sensor nodes with compression power constraints
CN111885641B (en) A buffer resource allocation method in buffer-assisted relay network
CN105592006B (en) IR state transition period selection method and device of ROHC compressor
CN109995406B (en) Beam forming method and baseband processing unit of wireless communication system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant