WO2023035691A1 - Data processing method, system, storage medium and electronic device - Google Patents

Data processing method, system, storage medium and electronic device

Info

Publication number
WO2023035691A1
WO2023035691A1 (PCT/CN2022/096157)
Authority
WO
WIPO (PCT)
Prior art keywords
model parameters, local model, learning rate, iterations, local
Application number
PCT/CN2022/096157
Other languages
English (en)
French (fr)
Inventor
沈力
廖烙锋
段佳
陶大程
Original Assignee
京东科技信息技术有限公司
Priority date: 2021-09-08 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 京东科技信息技术有限公司
Publication of WO2023035691A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • the present disclosure relates to the technical field of adversarial learning, and more specifically, to a data processing method, system, storage medium and electronic equipment.
  • Adversarial learning is a machine learning method.
  • the method of adversarial learning is to let two networks compete against each other. One is the generator network, which continuously captures the probability distribution of real pictures in the training library and converts input random noise into new samples (fake data).
  • the other is the discriminator network, which observes real and fake data simultaneously and judges whether the data is real or fake. Through repeated confrontation, the capabilities of the generator and the discriminator keep increasing until a balance is reached, and finally the generator can generate high-quality, realistic pictures.
  • an adaptive learning rate does not require engineers to manually tune the learning rate, which eliminates the interference of human factors in model learning; it is therefore also an important technique for achieving reliable artificial intelligence.
  • in view of this, the present disclosure provides a data processing method, system, storage medium and electronic device, which reduce the limitations of adversarial learning training and improve the efficiency of engineering practice.
  • the first aspect of the present disclosure discloses a data processing method, the method comprising:
  • during the iterative calculation process, obtaining the adaptive learning rate of the current iteration number of each parallel device; if the current iteration number meets a first preset condition, performing a weighted average calculation on the acquired local model parameters and the adaptive learning rate to obtain weighted-averaged model parameters, and updating the pre-acquired local model parameters with the weighted-averaged model parameters;
  • if the current iteration number meets a second preset condition, obtaining local model parameters based on the weighted-averaged model parameters, the weights and the pre-acquired sum of the number of parallel devices, and updating the local model parameters accordingly;
  • performing calculations on the updated local model parameters through an extragradient algorithm to obtain stochastic gradient directions, and determining target model parameters based on the stochastic gradient directions;
  • performing a network model training operation based on the target model parameters.
  • obtaining the adaptive learning rate of the current iteration number of each parallel device includes:
  • obtaining the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset base learning rate and the current iteration number of each parallel device; when the current iteration number equals the preset number, calculating the local model parameters; and performing a calculation on the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset base learning rate, the current iteration number of each parallel device and the local model parameters to obtain the adaptive learning rate.
  • if the current iteration number meets the first preset condition, performing the weighted average calculation on the acquired local model parameters and the adaptive learning rate to obtain the weighted-averaged model parameters, and updating the pre-acquired local model parameters with them, includes:
  • calculating the difference between the current iteration number and the preset number; if the difference belongs to the set of communication time nodes of each device, determining that the parallel devices are in a communication state, wherein the set of communication time nodes of each device is determined by the number of local update steps of the parallel devices and the total number of iterations, and the total number of iterations is determined by the number of communications between the parallel devices and the number of local update steps of the parallel devices;
  • when the parallel devices are in the communication state, having each parallel device send its local model parameters and the adaptive learning rate to the central device, triggering the central device to perform a weighted average calculation on the local model parameters, the adaptive learning rate and the pre-acquired sum of the number of parallel devices to obtain the weights and the weighted-averaged model parameters, which are determined by the weights, the acquired local model parameters and the sum of the number of parallel devices;
  • updating the pre-acquired local model parameters with the weighted-averaged model parameters.
  • if the difference does not belong to the set of communication time nodes of each parallel device, determining that the parallel devices are in a non-communication state, wherein the set of communication time nodes of each device is determined by the number of local update steps of the parallel devices and the total number of iterations, and the total number of iterations is determined by the number of communications between the parallel devices and the number of local update steps of the parallel devices;
  • updating the local model parameters with the obtained local model parameters.
  • preferably, before obtaining the adaptive learning rate of the current iteration number of each parallel device, the method further includes: obtaining the diameter of the feasible set, the preset base learning rate and the estimated value of the preset gradient upper bound, and performing an initialization calculation on them to obtain the initial learning rate.
  • preferably, before obtaining the adaptive learning rate of the current iteration number of each parallel device, the method further includes: initializing the local model parameters of each parallel device.
  • a second aspect of the present disclosure discloses a data processing system, the system comprising:
  • the acquisition unit is used to acquire the adaptive learning rate of the current number of iterations of each parallel device during the iterative calculation process
  • a first updating unit configured to, if the current iteration number meets the first preset condition, perform a weighted average calculation on the acquired local model parameters and the adaptive learning rate to obtain the weights and the weighted-averaged model parameters, and update the pre-acquired local model parameters with the weighted-averaged model parameters;
  • a second updating unit configured to, if the current iteration number meets the second preset condition, obtain local model parameters based on the weighted-averaged model parameters, the weights and the pre-acquired sum of the number of parallel devices, and update the local model parameters accordingly;
  • a determining unit configured to perform calculations on the updated local model parameters through the extragradient algorithm to obtain stochastic gradient directions, and determine target model parameters based on the stochastic gradient directions;
  • An execution unit configured to execute a network model training operation based on the target model parameters.
  • the acquisition unit includes:
  • An acquisition module for acquiring the diameter of the feasible set, the estimated value of the upper bound of the preset gradient, the preset basic learning rate and the current iteration number of each parallel device;
  • a first calculation module configured to calculate the local model parameters when the current iteration number equals the preset number;
  • a second calculation module configured to perform a calculation on the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset base learning rate, the current iteration number of each parallel device and the local model parameters to obtain the adaptive learning rate.
  • the third aspect of the present disclosure discloses a storage medium, the storage medium including stored instructions, wherein when the instructions are run, the device on which the storage medium is located is controlled to perform the data processing method described in any one of the first aspect.
  • a fourth aspect of the present disclosure discloses an electronic device, including a memory and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by one or more processors to perform the data processing method described in any one of the first aspect.
  • FIG. 1 is a schematic flow diagram of a data processing method disclosed in an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of a comparison of convergence speed effects disclosed in an embodiment of the present disclosure
  • FIG. 3 is a schematic flow diagram of obtaining an adaptive learning rate for the current iteration number of each parallel device disclosed in an embodiment of the present disclosure
  • FIG. 4 is a schematic flow diagram, disclosed in an embodiment of the present disclosure, of updating the pre-acquired local model parameters with the weighted-averaged model parameters;
  • FIG. 5 is a schematic flow diagram, disclosed in an embodiment of the present disclosure, of updating the local model parameters with the obtained local model parameters;
  • FIG. 6 is a schematic structural diagram of a data processing system disclosed in an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of an electronic device disclosed in an embodiment of the present disclosure.
  • the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus comprising a set of elements includes not only those elements, but also other elements not expressly listed, or elements inherent to such a process, method, article or apparatus.
  • without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus comprising said element.
  • the present disclosure discloses a data processing method, system, storage medium and electronic device, which obtains the adaptive learning rate of the current iteration number of each parallel device; if the current iteration number meets the first preset condition, performs a weighted average calculation on the acquired local model parameters and the adaptive learning rate to obtain the weighted-averaged model parameters, and updates the pre-acquired local model parameters with them;
  • if the current iteration number meets the second preset condition, obtains local model parameters based on the weighted-averaged model parameters, the weights and the pre-acquired sum of the number of parallel devices, and updates the local model parameters accordingly; then, through the extragradient algorithm, calculates the updated local model parameters to obtain the stochastic gradient directions, determines the target model parameters, and performs the network model training operation based on the target model parameters.
  • by combining the extragradient algorithm with the adaptive learning rate, an adaptive learning rate and distributed computing can be realized simultaneously during adversarial learning training, which reduces the limitations of adversarial learning training.
  • in addition, the calculation of the adaptive learning rate is performed locally, without communication between devices, thereby reducing the engineers' trial-and-error model training and improving the efficiency of engineering practice.
  • the specific implementation is described in detail through the following embodiments.
  • this scheme solves the adversarial optimization problem min_{x∈X} max_{y∈Y} F(x, y) (formula (1)), where X and Y are the search spaces of the model parameters (also called feasible sets), F is a training function defined for different machine learning problems, min is the minimum and max is the maximum.
  • the above mathematical model covers many problems in engineering practice, such as generative adversarial network training and the solving of bilinear game-theoretic models.
  • we assume that the function F is convex-concave, and we consider both the case where F is smooth and the case where it is non-smooth.
  • our proposed algorithm is presented in the algorithm box in the figure below.
  • for notational convenience, the combination of variable x and variable y (representing the model parameters) is denoted z, and the product set of set X and set Y is denoted Z.
  • FIG. 1 it is a schematic flowchart of a data processing method disclosed in an embodiment of the present disclosure.
  • the data processing method mainly includes the following steps:
  • Step S101 Execute an initialization operation.
  • step S101 the initialization operation includes initialization calculation and initialization of local model parameters of each parallel device.
  • the parameters include the diameter D of the feasible set, the preset base learning rate α, the estimated value G_0 of the preset gradient upper bound, the number K of local update steps of the parallel devices, the number M of parallel devices, and the number R of communications between the parallel devices.
  • G_0 is the estimated value of the preset gradient upper bound and is estimated from the data set.
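  • as an illustration only, the following Python sketch shows one plausible way to obtain such an estimate; the patent does not specify the estimator, so the probing scheme and the function name estimate_gradient_upper_bound are assumptions:

```python
import numpy as np

def estimate_gradient_upper_bound(sample_grad, num_probes=10):
    """Hedged sketch: the text only states that G_0 is estimated from the data set.

    Assumed estimator: probe a few mini-batch stochastic gradients and take the
    largest norm observed as the estimate of the gradient upper bound G_0.
    sample_grad: callable that draws a mini-batch and returns its stochastic gradient.
    """
    norms = [np.linalg.norm(sample_grad()) for _ in range(num_probes)]
    return max(norms)
```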
  • the communication time node set of each parallel device is defined as S = {0, K, 2K, ..., RK}.
  • K is the number of local update steps of parallel devices
  • R is the number of communications between parallel devices.
  • each parallel device executes step S102 to step S105 until the iteration process is completed.
  • where T = KR, and T is the total number of iterations of each parallel device.
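  • as a concrete illustration of the quantities above, the following Python sketch (all variable names and numeric values are illustrative assumptions, not taken from the patent) builds the communication time node set S = {0, K, 2K, ..., RK}, derives T = KR and tests whether a given iteration is a communication round:

```python
D = 1.0        # diameter of the feasible set
alpha = 0.01   # preset base learning rate (the description gives 0.01 or 0.1)
G0 = 1.0       # illustrative estimate of the preset gradient upper bound
K = 10         # number of local update steps per parallel device
M = 4          # number of parallel devices
R = 100        # number of communications between parallel devices

T = K * R                              # total number of iterations of each device
S = set(range(0, R * K + 1, K))        # communication time nodes {0, K, 2K, ..., RK}

def is_communication_round(t: int) -> bool:
    """A device communicates at iteration t when t - 1 falls on a node of S."""
    return (t - 1) in S
```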
  • Step S102 During the iterative calculation process, obtain the adaptive learning rate of the current iteration number of each parallel device.
  • step S102 specifically in the iterative calculation process, the process of obtaining the adaptive learning rate of the current iteration number of each parallel device is as follows:
  • during the iterative calculation process, obtain the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset base learning rate and the current iteration number of each parallel device; then, when the current iteration number equals the preset number, calculate the local model parameters; finally, perform a calculation on the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset base learning rate, the current iteration number of each parallel device and the local model parameters to obtain the adaptive learning rate.
  • the summation term in the denominator of formula (4) first takes the differences between the model parameters that have appeared on the local device, and then sums them.
  • the calculation of the adaptive learning rate only depends on the data set on the local machine and the model iteration parameters that have appeared locally, and does not require mutual communication between machines.
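  • formula (4) itself is not reproduced in this text, so the sketch below only illustrates the locality of the computation; it assumes an AdaGrad-style rule in which the denominator accumulates the differences between successive local iterates, which may differ from the exact expression in the patent:

```python
import numpy as np

def adaptive_learning_rate(local_history, D, alpha, G0):
    """Hedged sketch of the adaptive learning rate of formula (4).

    Assumption: an AdaGrad-style rule whose denominator sums the (squared) norms of
    the differences between model parameters that have appeared on the local device.
    local_history is the list of model-parameter vectors seen locally so far; the
    computation uses no information from other devices.
    """
    diffs = [np.linalg.norm(curr - prev) ** 2
             for prev, curr in zip(local_history[:-1], local_history[1:])]
    return alpha * D / np.sqrt(G0 ** 2 + sum(diffs))
```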
  • Step S103: If the current iteration number meets the first preset condition, perform a weighted average calculation on the acquired local model parameters and the adaptive learning rate to obtain the weighted-averaged model parameters, and update the pre-acquired local model parameters with the weighted-averaged model parameters.
  • in step S103, if t-1 ∈ S, that is, the current iteration number meets the first preset condition, then at the current iteration each parallel device needs to communicate, where S is the set of communication time nodes of the parallel devices.
  • under the machine communication protocol and the model weighted-averaging rule, each device transmits its current model parameters and learning step size to a central device after updating for K steps.
  • on the central device, a weighted average of the models of all devices is computed, where the weights are inversely proportional to each machine's current learning step size; the weighted-averaged model is then broadcast to each parallel device.
  • specifically, the process of performing the weighted average calculation and updating the pre-acquired local model parameters with the weighted-averaged model parameters is as follows:
  • first, calculate the difference between the current iteration number and the preset number; if the difference belongs to the set of communication time nodes of each device, determine that the parallel devices are in a communication state, where the set of communication time nodes of each device is determined by the number of local update steps of the parallel devices and the total number of iterations, and the total number of iterations is determined by the number of communications between parallel devices and the number of local update steps of the parallel devices;
  • when the parallel devices are in the communication state, each parallel device sends its local model parameters and adaptive learning rate to the central device, triggering the central device to perform a weighted average calculation on the local model parameters, the adaptive learning rate and the pre-acquired sum of the number of parallel devices, obtaining the weights and the weighted-averaged local model parameters;
  • the weighted-averaged local model parameters are determined by the weights, the acquired local model parameters and the sum of the number of parallel devices; finally, the pre-acquired local model parameters are updated with the weighted-averaged local model parameters, that is, the central device updates the local model parameters.
  • the formulas for the weights and the weighted-averaged model parameters are as follows: in the weight formula, w_m is the weight, Σ_m denotes the summation over the parallel devices, and the learning rate used is the adaptive learning rate calculated when the iteration number equals τ; in the averaging formula, the weighted-averaged local model parameters are obtained from the weights w_m, the local model parameters of each device and the summation Σ_m over the parallel devices.
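  • since the weight and averaging formulas appear only as images in the source, the following Python sketch illustrates the described server-side rule, weights inversely proportional to each device's current step size, under the assumption that the weights are normalized over the devices:

```python
import numpy as np

def aggregate_on_central_device(local_params, local_lrs):
    """Hedged sketch of the central-device weighted averaging described above.

    local_params: list of model-parameter arrays, one per parallel device
    local_lrs:    list of the devices' current adaptive learning rates (step sizes)
    The weights w_m are taken inversely proportional to each device's current step
    size, as the text states, and are assumed here to be normalized to sum to one.
    """
    inverse = np.array([1.0 / lr for lr in local_lrs])
    weights = inverse / inverse.sum()                     # w_m over the M devices
    averaged = sum(w * z for w, z in zip(weights, local_params))
    return weights, averaged                              # broadcast back to the devices
```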
  • Step S104: If the current iteration number meets the second preset condition, obtain local model parameters based on the weighted-averaged model parameters, the weights and the pre-acquired sum of the number of parallel devices, and update the local model parameters accordingly.
  • in step S104, if t-1 does not belong to S, that is, the current iteration number meets the second preset condition, then at the current iteration the parallel devices do not need to communicate.
  • Step S103 and step S104 are the iterative calculation process, and step S105 is executed after the iterative calculation process is completed.
  • Step S105: Through the extragradient algorithm, perform calculations on the updated local model parameters to obtain the stochastic gradient directions, and determine the target model parameters based on the stochastic gradient directions.
  • the extragradient algorithm is commonly used for adversarial training. Unlike the conventional gradient descent algorithm, it computes the stochastic gradient twice in each iteration and then performs two gradient descent steps. The first gradient descent starts from the current local model and descends along the stochastic gradient direction computed at the current local model; the model obtained in this first step is denoted °z_t^m. The second gradient descent starts again from the current model and descends along the stochastic gradient direction computed at °z_t^m. On each parallel device, we first randomly sample a mini-batch of training samples and use these samples to compute the stochastic gradient directions.
  • in step S105, through the extragradient algorithm, the updated local model parameters are calculated to obtain the first stochastic gradient direction and the second stochastic gradient direction, and the target model parameters are determined based on the first and second stochastic gradient directions, where Σ_m denotes the summation over the parallel devices, Σ_t denotes the summation over the iterations, and T is the total number of iterations of each parallel device.
  • in the corresponding formulas, Π_Z is the projection operator onto the set Z and the step size used is the adaptive learning rate.
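  • the two update formulas are given as images in the source; the sketch below therefore follows the standard extragradient scheme that the surrounding text describes, with hypothetical callables for the stochastic gradient and the projection Π_Z:

```python
def extragradient_step(z, eta, stochastic_grad, project):
    """Hedged sketch of one extragradient iteration on a single parallel device.

    z:               current local model parameters (the combined variable z = (x, y))
    eta:             the adaptive learning rate for this iteration
    stochastic_grad: callable returning a mini-batch stochastic gradient operator at a
                     point (for the min-max problem, (grad_x F, -grad_y F))
    project:         projection onto the feasible set Z (written as Pi_Z in the text)
    """
    g1 = stochastic_grad(z)            # first stochastic gradient, evaluated at z
    z_mid = project(z - eta * g1)      # intermediate model, written as °z_t^m in the text
    g2 = stochastic_grad(z_mid)        # second stochastic gradient, evaluated at °z_t^m
    z_next = project(z - eta * g2)     # second descent again starts from the current z
    return z_next, z_mid
```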
  • Step S106 Perform a network model training operation based on the target model parameters.
  • the network model training operation may be applied in scenarios such as image generation, reliable and robust model training, and the solving of game-theoretic models.
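  • to tie steps S101 to S106 together, the following end-to-end Python sketch (the device and central objects, and all method names, are hypothetical) shows how the local adaptive learning rate, the periodic weighted averaging and the extragradient update interact over T = KR iterations:

```python
def distributed_adversarial_training(devices, central, D, alpha, G0, K, R):
    """Hedged end-to-end sketch of steps S101-S106; all helper names are assumptions."""
    T = K * R
    S = set(range(0, R * K + 1, K))
    for device in devices:
        device.initialize(D, alpha, G0)                    # step S101: initialization
    for t in range(1, T + 1):
        for device in devices:
            device.eta = device.compute_adaptive_lr()      # step S102: local, no communication
        if (t - 1) in S:                                   # step S103: communication round
            params = [d.params for d in devices]
            lrs = [d.eta for d in devices]
            _, averaged = central.aggregate(params, lrs)   # weighted average on the server
            for device in devices:
                device.params = averaged                   # broadcast back to every device
        # step S104: in non-communication rounds each device keeps its own parameters
        for device in devices:
            device.params = device.extragradient_step()    # step S105: two-gradient update
    # step S106: the target model parameters (averaged over devices and iterations)
    # are then used for the final network model training operation
    return devices
```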
  • for the design of the adaptive learning rate in adversarial learning tasks under distributed, locally updated scenarios, the adaptive learning rate in this algorithm is designed based on the model parameters that have appeared in the local machine's iterations and does not require the data-set parameters to be known in advance. Computation of the adaptive learning rate is done entirely locally, requiring no machine-to-machine communication. The adaptive learning rate reduces the engineers' trial-and-error model training and improves the efficiency of engineering practice.
  • the adaptive distributed adversarial learning algorithm of this scheme has important engineering practical significance in many scenarios such as huge amount of model parameters, huge amount of training data, need to realize user privacy protection, distributed computing, and slow communication speed of parallel devices.
  • adopting this technical solution can greatly reduce the training communication, communication error and learning-rate tuning problems of training large-scale adversarial learning models in a distributed manner.
  • for example, for image generation tasks, the ImageNet dataset contains more than 10 million samples, and a generative adversarial network model contains tens of millions of parameters.
  • training directly with tensorflow/pytorch or traditional distributed algorithms would produce extremely high communication traffic between the parallel devices and the central device, and the learning rate would also be difficult to tune.
  • at the same time, the design of the learning rate has an important impact on the quality of the generated images; every adjustment of the learning rate consumes a large amount of GPU computing resources, which greatly increases the cost to the enterprise.
  • the technical solution in this patent can uniformly solve the communication problem and the learning-rate adjustment problem in model training, so that a large-scale adversarial learning network model can be trained quickly and effectively.
  • theoretically, the following convergence guarantee is given for the above algorithm: for a non-differentiable function F, the output of this scheme is proven to converge at a rate in which the function DualGap measures the quality of a given set of model parameters (a commonly used measurement criterion for model parameters in adversarial learning), o denotes the omission of constant terms, E is the expectation, G is the upper bound of the gradient norm of the function F, γ is the ratio of the engineer's initial gradient estimate to the upper bound of the gradient norm of F, T is the total number of iterations of each device, D is the diameter of the feasible set, σ is the noise level of the stochastic gradient, and M is the number of parallel devices.
  • for given model parameters, this measurement criterion is defined through the DualGap function, whose two arguments are both variables, where X and Y are the model search spaces (also called feasible sets), F is the training function defined for the machine learning problem at hand, max is the maximum and min is the minimum; as the number of iterations increases, the output of the algorithm approaches a saddle point of the function F in expectation.
  • for the case where the function F is differentiable, the output of this scheme has a convergence rate in which:
  • V_1(T) is the expected value of the square root of the sum of the norms of the stochastic gradients appearing on each device, o denotes the omission of constant terms, D is the diameter of the feasible set, G is the upper bound of the gradient norm of the function F, M is the number of parallel devices, γ is the ratio of the engineer's initial gradient estimate to the upper bound of the gradient norm of F, L is the smoothness constant of F, T is the total number of iterations of each device, and σ is the noise level of the stochastic gradient; theoretically, the above convergence rate is the best convergence rate that any algorithm can achieve.
  • with reference to FIG. 2, the algorithm proposed in this scheme is applied to the problem of training generative adversarial network models.
  • the Frechet Inception Distance (FID; the lower the FID, the better the algorithm) and the Inception Score (IS; the higher the IS, the better the algorithm) are used to measure the superiority of the algorithm of this scheme; the experimental results show that, for the same amount of communication, the algorithm of this scheme converges quickly and achieves the best results.
  • in FIG. 2, MB-ASMP denotes the mini-batch adaptive mirror single-gradient descent algorithm, MB-UMP denotes the mini-batch global mirror gradient descent algorithm, LocalAdam denotes the local adaptive gradient descent algorithm, and LocalAdaSEG denotes the algorithm of this scheme; the ordinate (1.00, 1.25, 1.50, 1.75, 2.00, 2.25, 2.50, 2.75) represents the IS value, and the abscissa (0, 2, 4, 6, 8, 10) represents the communication volume.
  • the adversarial learning algorithm proposed in this scheme can achieve the best convergence rate, and as the number of devices increases, the convergence rate of the algorithm gradually accelerates.
  • the adaptive learning rate adjustment mechanism in the algorithm greatly reduces the cost of learning rate adjustment and improves the stability of the algorithm.
  • we also theoretically verified the convergence of the algorithm proposed in this scheme ensuring that the algorithm converges in various environments, and enhancing the credibility of the scheme.
  • by combining the extragradient algorithm with the adaptive learning rate, an adaptive learning rate and distributed computing can be realized simultaneously during adversarial learning training, which reduces the limitations of adversarial learning training.
  • the calculation of the adaptive learning rate is performed locally without communication between devices, thereby reducing the trial-and-error model training of engineers and improving the efficiency of engineering practice.
  • the process of obtaining the adaptive learning rate of the current number of iterations of each parallel device in the above step S102 mainly includes the following steps:
  • Step S301 Obtain the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset basic learning rate and the current iteration number of each parallel device.
  • Step S302: When the current iteration number equals the preset number, calculate the local model parameters.
  • Step S303: Perform a calculation on the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset base learning rate, the current iteration number of each parallel device and the local model parameters to obtain the adaptive learning rate.
  • step S301-step S303 is consistent with the execution principle of step S102 above, which can be referred to, and will not be repeated here.
  • in the embodiments of the present disclosure, the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset base learning rate, the current iteration number of each parallel device and the local model parameters are used in the calculation, thereby achieving the purpose of obtaining the adaptive learning rate.
  • with reference to FIG. 4, the process in step S103 of updating the pre-acquired local model parameters with the weighted-averaged model parameters mainly includes the following steps:
  • Step S401 Calculate the difference between the current iteration number and the preset number to obtain the difference.
  • Step S402 If the difference value belongs to the communication time node set of each device, it is determined that each parallel device is in a communication state, and the communication time node set of each device is determined by the local update steps of the parallel device and the total number of iterations, and the total number of iterations is determined by the parallel device The number of inter-communications and the number of steps for local updates of parallel devices are determined.
  • Step S403: When the parallel devices are in the communication state, have each parallel device send its local model parameters and adaptive learning rate to the central device, triggering the central device to perform a weighted average calculation on the local model parameters, the adaptive learning rate and the pre-acquired sum of the number of parallel devices, obtaining the weights and the weighted-averaged model parameters, where the weighted-averaged model parameters are determined by the weights, the acquired local model parameters and the sum of the number of parallel devices.
  • Step S404: Update the pre-acquired local model parameters with the weighted-averaged model parameters.
  • step S401-step S404 is consistent with the execution principle of step S103 above, which can be referred to, and will not be repeated here.
  • in the embodiments of the present disclosure, the difference between the current iteration number and the preset number is calculated; when the parallel devices are in the communication state, each parallel device sends its local model parameters and adaptive learning rate to the central device, triggering the central device to perform the weighted average calculation to obtain the weights and the weighted-averaged model parameters, thereby achieving the purpose of updating the pre-acquired local model parameters with the weighted-averaged model parameters.
  • with reference to FIG. 5, the process in step S104 of obtaining local model parameters based on the weighted-averaged model parameters, the weights and the pre-acquired sum of the number of parallel devices, and updating the local model parameters accordingly if the current iteration number meets the second preset condition, mainly includes the following steps:
  • Step S501 Calculate the difference between the current iteration number and the preset number to obtain the difference.
  • Step S502 If the difference value does not belong to the communication time node set of each parallel device, it is determined that each parallel device is in a non-communication state, and the communication time node set of each device is determined by the local update steps of the parallel device and the total number of iterations, the total number of iterations It is determined by the number of communications between parallel devices and the number of local update steps of parallel devices.
  • Step S503: When the devices are in the non-communication state, obtain local model parameters based on the weighted-averaged model parameters, the weights and the pre-acquired sum of the number of parallel devices.
  • Step S504: Update the local model parameters with the obtained local model parameters.
  • step S501-step S504 is consistent with the execution principle of step S104 above, which can be referred to, and will not be repeated here.
  • in the embodiments of the present disclosure, the difference between the current iteration number and the preset number is calculated; when the devices are in the non-communication state, local model parameters are obtained based on the weighted-averaged model parameters, the weights and the pre-acquired sum of the number of parallel devices, thereby achieving the purpose of updating the local model parameters.
  • the embodiment of the present disclosure also discloses a corresponding data processing system.
  • as shown in FIG. 6, the data processing system includes an acquisition unit 601, a first updating unit 602, a second updating unit 603, a determining unit 604 and an execution unit 605.
  • the obtaining unit 601 is configured to obtain the adaptive learning rate of the current number of iterations of each parallel device.
  • the first updating unit 602 is configured to, if the current iteration number meets the first preset condition, perform a weighted average calculation on the acquired local model parameters and the adaptive learning rate to obtain the weights and the weighted-averaged model parameters, and update the pre-acquired local model parameters with the weighted-averaged model parameters.
  • the second updating unit 603 is configured to, if the current iteration number meets the second preset condition, obtain local model parameters based on the weighted-averaged model parameters, the weights and the pre-acquired sum of the number of parallel devices, and update the local model parameters accordingly.
  • the determining unit 604 is configured to perform calculations on the updated local model parameters through the extragradient algorithm to obtain the stochastic gradient directions, and determine the target model parameters based on the stochastic gradient directions.
  • the executing unit 605 is configured to execute a network model training operation based on the target model parameters.
  • the obtaining unit 601 includes:
  • the obtaining module is used to obtain the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset base learning rate and the current iteration number of each parallel device.
  • the first calculation module is configured to calculate the local model parameters when the current iteration number equals the preset number.
  • the second calculation module is configured to perform a calculation on the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset base learning rate, the current iteration number of each parallel device and the local model parameters to obtain the adaptive learning rate.
  • the first updating unit 602 includes:
  • the third calculation module is used to calculate the difference between the current iteration number and the preset number to obtain the difference.
  • the first determination module is used to determine that each parallel device is in a communication state if the difference value belongs to a set of communication time nodes of each device, and the set of communication time nodes of each device is determined by the local update steps of the parallel device and the total number of iterations, and the total number of iterations The number is determined by the number of communications between parallel devices and the number of local update steps of parallel devices.
  • the fourth calculation module is configured to, when the parallel devices are in the communication state, have each parallel device send its local model parameters and adaptive learning rate to the central device, triggering the central device to perform a weighted average calculation on the local model parameters, the adaptive learning rate and the pre-acquired sum of the number of parallel devices, obtaining the weights and the weighted-averaged model parameters, where the weighted-averaged model parameters are determined by the weights, the acquired local model parameters and the sum of the number of parallel devices.
  • the first updating module is configured to update the pre-acquired local model parameters with the weighted-averaged model parameters.
  • the second updating unit 603 includes:
  • the fifth calculation module is used to calculate the difference between the current number of iterations and the preset number of iterations to obtain a difference.
  • the second determination module is used to determine that each parallel device is in a non-communication state if the difference value does not belong to the communication time node set of each parallel device, and the communication time node set of each device is determined by the local update steps and the total iteration number of the parallel devices , the total number of iterations is determined by the number of communications between parallel devices and the number of local update steps of parallel devices.
  • the obtaining module is configured to, when the devices are in the non-communication state, obtain local model parameters based on the weighted-averaged model parameters, the weights and the pre-acquired sum of the number of parallel devices.
  • the second updating module is configured to update the local model parameters with the obtained local model parameters.
  • further, the system also includes a first initialization unit, and the first initialization unit includes:
  • the acquisition module is used to obtain the diameter of the feasible set, the estimated value of the preset base learning rate and the preset gradient upper bound.
  • the sixth calculation module is used to perform initialization calculation on the diameter of the feasible set, the preset basic learning rate and the estimated value of the preset gradient upper bound to obtain the initial learning rate.
  • a second initialization unit is also included.
  • the second initialization unit is used for initializing local model parameters of each parallel device.
  • in the embodiments of the present disclosure, by combining the extragradient algorithm with the adaptive learning rate, an adaptive learning rate and distributed computing can be realized simultaneously during adversarial learning training, which reduces the limitations of adversarial learning training.
  • the calculation of the adaptive learning rate is performed locally without communication between devices, thereby reducing the trial-and-error model training of engineers and improving the efficiency of engineering practice.
  • An embodiment of the present disclosure further provides a storage medium, the storage medium includes stored instructions, wherein when the instructions are executed, the device where the storage medium is located is controlled to execute the above data processing method.
  • An embodiment of the present disclosure also provides an electronic device, the structural diagram of which is shown in FIG. 7; it specifically includes a memory 701 and one or more instructions 702, where the one or more instructions 702 are stored in the memory 701 and are configured to be executed by one or more processors 703 so as to perform the above data processing method.
  • each embodiment in this specification is described in a progressive manner, the same and similar parts of each embodiment can be referred to each other, and each embodiment focuses on the differences from other embodiments.
  • the description is relatively simple, and for related parts, please refer to the part of the description of the method embodiment.
  • the systems and system embodiments described above are only illustrative; the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, which can be understood and implemented by those skilled in the art without creative effort.

Abstract

A data processing method, system, storage medium and electronic device. During the iterative calculation process, if the current iteration number meets a first preset condition, the local model parameters (II) are updated with the model parameters (I); if the current iteration number meets a second preset condition, the local model parameters (II) are updated with the obtained local model parameters (III); the updated local model parameters (II) are calculated by means of an extragradient algorithm to obtain stochastic gradient directions and determine target model parameters, and a network model training operation is performed on the basis of the target model parameters. By combining the extragradient algorithm with an adaptive learning rate, the adaptive learning rate and distributed computing can be realized simultaneously during adversarial learning training, reducing the limitations of adversarial learning training; in addition, the calculation of the adaptive learning rate is performed locally without communication between devices, thereby reducing the engineers' trial-and-error model training and improving the efficiency of engineering practice.

Description

Data processing method, system, storage medium and electronic device
This disclosure claims priority to the Chinese patent application filed with the China National Intellectual Property Administration on September 8, 2021, with application number 202111048745.1 and title "Data processing method, system, storage medium and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of adversarial learning, and more specifically to a data processing method, system, storage medium and electronic device.
Background
Adversarial learning is a machine learning method. It works by letting two networks compete against each other: one is the generator network, which continuously captures the probability distribution of real pictures in the training library and converts input random noise into new samples (fake data); the other is the discriminator network, which observes real and fake data simultaneously and judges whether the data is real or fake. Through repeated confrontation, the capabilities of the generator and the discriminator keep increasing until a balance is reached, and finally the generator can generate high-quality, realistic pictures.
In adversarial learning, model quality depends heavily on the learning rate used, so an adaptive learning rate is of great practical significance. An adaptive learning rate does not require engineers to manually tune the learning rate, which eliminates the interference of human factors in model learning; it is therefore also an important technique for achieving reliable artificial intelligence.
Because adversarial learning data sets are usually very large, distributed training must be used during training. Moreover, since the loss function of adversarial learning has a min-max structure, existing technical solutions cannot realize an adaptive learning rate and distributed computing at the same time.
Therefore, the training methods for adversarial learning are highly limited.
Summary
In view of this, the present disclosure provides a data processing method, system, storage medium and electronic device, which achieve the purpose of reducing the limitations of adversarial learning training and improving the efficiency of engineering practice.
In order to achieve the above purpose, the disclosed technical solution is as follows:
A first aspect of the present disclosure discloses a data processing method, the method comprising:
during an iterative calculation process, obtaining an adaptive learning rate of the current iteration number of each parallel device;
if the current iteration number meets a first preset condition, performing a weighted average calculation on the acquired local model parameters and the adaptive learning rate to obtain weighted-averaged model parameters, and updating the pre-acquired local model parameters with the weighted-averaged model parameters;
if the current iteration number meets a second preset condition, obtaining local model parameters based on the weighted-averaged model parameters, the weights and the pre-acquired sum of the number of parallel devices, and updating the local model parameters accordingly;
performing calculations on the updated local model parameters through an extragradient algorithm to obtain stochastic gradient directions, and determining target model parameters based on the stochastic gradient directions;
performing a network model training operation based on the target model parameters.
Preferably, the obtaining, during the iterative calculation process, of the adaptive learning rate of the current iteration number of each parallel device includes:
during the iterative calculation process, obtaining the diameter of the feasible set, an estimated value of a preset gradient upper bound, a preset base learning rate and the current iteration number of each parallel device;
when the current iteration number equals a preset number, calculating the local model parameters;
performing a calculation on the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset base learning rate, the current iteration number of each parallel device and the local model parameters to obtain the adaptive learning rate.
Preferably, the performing, if the current iteration number meets the first preset condition, of the weighted average calculation on the acquired local model parameters and the adaptive learning rate to obtain the weighted-averaged model parameters, and the updating of the pre-acquired local model parameters with the weighted-averaged model parameters, includes:
calculating a difference between the current iteration number and the preset number;
if the difference belongs to a set of communication time nodes of each device, determining that the parallel devices are in a communication state, the set of communication time nodes of each device being determined by the number of local update steps of the parallel devices and a total number of iterations, and the total number of iterations being determined by the number of communications between the parallel devices and the number of local update steps of the parallel devices;
when the parallel devices are in the communication state, causing each parallel device to send its local model parameters and the adaptive learning rate to a central device, triggering the central device to perform a weighted average calculation on the local model parameters, the adaptive learning rate and the pre-acquired sum of the number of parallel devices to obtain the weights and the weighted-averaged model parameters, the weighted-averaged model parameters being determined by the weights, the acquired local model parameters and the sum of the number of parallel devices;
updating the pre-acquired local model parameters with the weighted-averaged model parameters.
Preferably, the obtaining, if the current iteration number meets the second preset condition, of the local model parameters based on the weighted-averaged model parameters, the weights and the pre-acquired sum of the number of parallel devices, and the updating of the local model parameters accordingly, includes:
calculating a difference between the current iteration number and the preset number;
if the difference does not belong to the set of communication time nodes of each parallel device, determining that the parallel devices are in a non-communication state, the set of communication time nodes of each device being determined by the number of local update steps of the parallel devices and the total number of iterations, and the total number of iterations being determined by the number of communications between the parallel devices and the number of local update steps of the parallel devices;
when the devices are in the non-communication state, obtaining the local model parameters based on the weighted-averaged model parameters, the weights and the pre-acquired sum of the number of parallel devices;
updating the local model parameters with the obtained local model parameters.
Preferably, before the obtaining of the adaptive learning rate of the current iteration number of each parallel device, the method further includes:
obtaining the diameter of the feasible set, the preset base learning rate and the estimated value of the preset gradient upper bound;
performing an initialization calculation on the diameter of the feasible set, the preset base learning rate and the estimated value of the preset gradient upper bound to obtain an initial learning rate.
Preferably, before the obtaining of the adaptive learning rate of the current iteration number of each parallel device, the method further includes:
initializing the local model parameters of each parallel device.
A second aspect of the present disclosure discloses a data processing system, the system comprising:
an acquisition unit, configured to obtain, during the iterative calculation process, the adaptive learning rate of the current iteration number of each parallel device;
a first updating unit, configured to, if the current iteration number meets the first preset condition, perform a weighted average calculation on the acquired local model parameters and the adaptive learning rate to obtain the weights and the weighted-averaged model parameters, and update the pre-acquired local model parameters with the weighted-averaged model parameters;
a second updating unit, configured to, if the current iteration number meets the second preset condition, obtain local model parameters based on the weighted-averaged model parameters, the weights and the pre-acquired sum of the number of parallel devices, and update the local model parameters accordingly;
a determining unit, configured to perform calculations on the updated local model parameters through the extragradient algorithm to obtain stochastic gradient directions, and determine target model parameters based on the stochastic gradient directions;
an execution unit, configured to perform a network model training operation based on the target model parameters.
Preferably, the acquisition unit includes:
an obtaining module, configured to obtain the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset base learning rate and the current iteration number of each parallel device;
a first calculation module, configured to calculate the local model parameters when the current iteration number equals the preset number;
a second calculation module, configured to perform a calculation on the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset base learning rate, the current iteration number of each parallel device and the local model parameters to obtain the adaptive learning rate.
A third aspect of the present disclosure discloses a storage medium, the storage medium including stored instructions, wherein when the instructions are run, a device on which the storage medium is located is controlled to perform the data processing method described in any one of the first aspect.
A fourth aspect of the present disclosure discloses an electronic device, including a memory and one or more instructions, wherein the one or more instructions are stored in the memory and are configured to be executed by one or more processors to perform the data processing method described in any one of the first aspect.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present disclosure; for those of ordinary skill in the art, other drawings can be obtained from the provided drawings without creative effort.
FIG. 1 is a schematic flowchart of a data processing method disclosed in an embodiment of the present disclosure;
FIG. 2 is a schematic diagram comparing convergence speed effects, disclosed in an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart, disclosed in an embodiment of the present disclosure, of obtaining the adaptive learning rate of the current iteration number of each parallel device;
FIG. 4 is a schematic flowchart, disclosed in an embodiment of the present disclosure, of updating the pre-acquired local model parameters with the weighted-averaged model parameters;
FIG. 5 is a schematic flowchart, disclosed in an embodiment of the present disclosure, of updating the local model parameters with the obtained local model parameters;
FIG. 6 is a schematic structural diagram of a data processing system disclosed in an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an electronic device disclosed in an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only part of the embodiments of the present disclosure, not all of them. Based on the embodiments of the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present disclosure.
In this application, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or device comprising a set of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device comprising said element.
As can be seen from the background, the training methods for adversarial learning are highly limited.
To solve this problem, the present disclosure discloses a data processing method, system, storage medium and electronic device, which obtains the adaptive learning rate of the current iteration number of each parallel device; if the current iteration number meets the first preset condition, performs a weighted average calculation on the acquired local model parameters and the adaptive learning rate to obtain weighted-averaged model parameters, and updates the pre-acquired local model parameters with the weighted-averaged model parameters; if the current iteration number meets the second preset condition, obtains local model parameters based on the weighted-averaged model parameters, the weights and the pre-acquired sum of the number of parallel devices, and updates the local model parameters accordingly; then, through the extragradient algorithm, calculates the updated local model parameters to obtain stochastic gradient directions, determines target model parameters based on the stochastic gradient directions, and performs a network model training operation based on the target model parameters. Through the above solution, combining the extragradient algorithm with the adaptive learning rate makes it possible to realize an adaptive learning rate and distributed computing at the same time during adversarial learning training, which reduces the limitations of adversarial learning training. In addition, the calculation of the adaptive learning rate is performed locally, without communication between devices, which reduces the engineers' trial-and-error model training and improves the efficiency of engineering practice. The specific implementation is described in detail through the following embodiments.
This scheme solves the following adversarial optimization problem:
min_{x∈X} max_{y∈Y} F(x, y)    formula (1)
where X and Y are the search spaces of the model parameters (also called feasible sets), F is a training function defined for different machine learning problems, min is the minimum and max is the maximum. The above mathematical model covers many problems in engineering practice, such as generative adversarial network training and the solving of bilinear game-theoretic models. We assume that the function F is convex-concave, and we consider both the case where F is smooth and the case where it is non-smooth. Our proposed algorithm is presented in the algorithm box in the figure below. For notational convenience, the combination of variable x and variable y (representing the model parameters) is denoted z, and the product set of set X and set Y is denoted Z.
Referring to FIG. 1, which is a schematic flowchart of a data processing method disclosed in an embodiment of the present disclosure, the data processing method mainly includes the following steps:
Step S101: Perform an initialization operation.
In step S101, the initialization operation includes an initialization calculation and the initialization of the local model parameters of each parallel device.
Before the initialization operation, the algorithm parameters are input; the parameters include the diameter D of the feasible set, the preset base learning rate α, the estimated value G_0 of the preset gradient upper bound, the number K of local update steps of the parallel devices, the number M of parallel devices, and the number R of communications between the parallel devices.
The initialization operation proceeds as follows: first, obtain the diameter of the feasible set, the preset base learning rate and the estimated value of the preset gradient upper bound; then perform an initialization calculation on the diameter D of the feasible set, the preset base learning rate α (which takes the value 0.01 or 0.1) and the estimated value G_0 of the preset gradient upper bound to obtain the initial learning rate, where G_0 is estimated from the data set.
The local model parameters of each device are initialized as follows: before obtaining the adaptive learning rate of the current iteration number of each parallel device, the local model parameters of each parallel device are initialized, yielding the initialized local model parameters.
After the initialization operation is completed, the communication time node set of each parallel device is defined as S = {0, K, 2K, ..., RK}, where K is the number of local update steps of the parallel devices and R is the number of communications between the parallel devices.
During the iterations t = 1, 2, ..., T, each parallel device executes step S102 to step S105 until the iteration process is completed, where T = KR and T is the total number of iterations of each parallel device.
Step S102: During the iterative calculation process, obtain the adaptive learning rate of the current iteration number of each parallel device.
In step S102, specifically, the adaptive learning rate of the current iteration number of each parallel device is obtained as follows: first, during the iterative calculation process, obtain the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset base learning rate and the current iteration number of each parallel device; then, when the current iteration number equals the preset number, calculate the local model parameters; finally, perform a calculation on the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset base learning rate, the current iteration number of each parallel device and the local model parameters to obtain the adaptive learning rate.
The adaptive learning rate is given by formula (4), in which the adaptive learning rate calculated when the iteration number equals τ is obtained from the local model parameters calculated when the iteration number equals τ, D, the diameter of the feasible set, α, the preset base learning rate (taking the value 0.01 or 0.1), G_0, the estimated value of the preset gradient upper bound (estimated from the data set), t, the current iteration number, and Σ, the summation. The summation term in the denominator of formula (4) first takes the differences between the model parameters that have appeared on the local device and then sums them.
It should be noted that the calculation of the adaptive learning rate depends only on the data set on the local machine and the model iteration parameters that have appeared locally, and does not require communication between machines.
Step S103: If the current iteration number meets the first preset condition, perform a weighted average calculation on the acquired local model parameters and the adaptive learning rate to obtain weighted-averaged model parameters, and update the pre-acquired local model parameters with the weighted-averaged model parameters.
In step S103, if t-1 ∈ S, that is, the current iteration number meets the first preset condition, then at the current iteration each parallel device needs to communicate, where S is the set of communication time nodes of the parallel devices.
Under the machine communication protocol and the model weighted-averaging rule, the acquired local model parameters and the adaptive learning rate are combined in a weighted average calculation. It is specified that each device transmits its current model parameters and learning step size to a central device after updating for K steps. On the central device, a weighted average of the models of all devices is computed, with weights inversely proportional to each machine's current learning step size; the weighted-averaged model is then broadcast to each parallel device.
Specifically, the process of performing the weighted average calculation and updating the pre-acquired local model parameters with the weighted-averaged model parameters is as follows: first, calculate the difference between the current iteration number and the preset number; next, if the difference belongs to the set of communication time nodes of each device, determine that the parallel devices are in a communication state, where the set of communication time nodes of each device is determined by the number of local update steps of the parallel devices and the total number of iterations, and the total number of iterations is determined by the number of communications between the parallel devices and the number of local update steps of the parallel devices; then, when the parallel devices are in the communication state, each parallel device sends its local model parameters and adaptive learning rate to the central device, triggering the central device to perform a weighted average calculation on the local model parameters, the adaptive learning rate and the pre-acquired sum of the number of parallel devices, obtaining the weights and the weighted-averaged local model parameters, which are determined by the weights, the acquired local model parameters and the sum of the number of parallel devices; finally, the pre-acquired local model parameters are updated with the weighted-averaged local model parameters, that is, the central device updates the local model parameters.
The formulas by which the central device performs the weighted average calculation to obtain the weights and the weighted-averaged model parameters are as follows: in the weight formula, w_m is the weight, Σ_m denotes the summation over the parallel devices, and the learning rate used is the adaptive learning rate calculated when the iteration number equals τ; in the averaging formula, the weighted-averaged local model parameters are obtained from the weights w_m, the local model parameters of each device and the summation Σ_m over the parallel devices.
Step S104: If the current iteration number meets the second preset condition, obtain local model parameters based on the weighted-averaged model parameters, the weights and the pre-acquired sum of the number of parallel devices, and update the local model parameters accordingly.
In step S104, if t-1 does not belong to S, that is, the current iteration number meets the second preset condition, then at the current iteration the parallel devices do not need to communicate.
Specifically, the process is as follows: first, calculate the difference between the current iteration number and the preset number; next, if the difference does not belong to the set of communication time nodes of each parallel device, determine that the parallel devices are in a non-communication state, where the set of communication time nodes of each device is determined by the number of local update steps of the parallel devices and the total number of iterations, and the total number of iterations is determined by the number of communications between the parallel devices and the number of local update steps of the parallel devices; then, when the devices are in the non-communication state, obtain local model parameters based on the weighted-averaged model parameters, the weights and the pre-acquired sum of the number of parallel devices; finally, update the local model parameters with the obtained local model parameters, that is, the central device updates the local model parameters.
Step S103 and step S104 constitute the iterative calculation process, and step S105 is executed after the iterative calculation process is completed.
Step S105: Through the extragradient algorithm, perform calculations on the updated local model parameters to obtain the stochastic gradient directions, and determine the target model parameters based on the stochastic gradient directions.
The extragradient algorithm is commonly used for adversarial training. Unlike the conventional gradient descent algorithm, it computes the stochastic gradient twice in each iteration and then performs two gradient descent steps. The first gradient descent starts from the current local model and descends along the stochastic gradient direction computed at the current local model; the model obtained in this first step is denoted °z_t^m. The second gradient descent starts again from the current model and descends along the stochastic gradient direction computed at °z_t^m. On each parallel device, we first randomly sample a mini-batch of training samples and use these samples to compute the stochastic gradient directions.
In step S105, through the extragradient algorithm, the updated local model parameters are calculated to obtain a first stochastic gradient direction and a second stochastic gradient direction, and the target model parameters are determined based on the first stochastic gradient direction and the second stochastic gradient direction, where Σ_m denotes the summation over the parallel devices, Σ_t denotes the summation over the iterations, and T is the total number of iterations of each parallel device.
In the formula for the first stochastic gradient direction, Π_Z is the projection operator, the step size is the adaptive learning rate, and the first stochastic gradient direction is computed at the current local model parameters. In the formula for the second stochastic gradient direction, Π_Z is the projection operator, the step size is the adaptive learning rate, and the second stochastic gradient direction is computed at °z_t^m.
Step S106: Based on the target model parameters, perform a network model training operation.
In step S106, the network model training operation may be applied in scenarios such as image generation, reliable and robust model training, and the solving of game-theoretic models.
Regarding the design of the adaptive learning rate for adversarial learning tasks in distributed, locally updated model-parameter scenarios, in this algorithm the adaptive learning rate is designed based on the model parameters that have appeared in the local machine's iterations and does not require the data-set parameters to be known in advance. The calculation of the adaptive learning rate is performed entirely locally and requires no communication between machines. The adaptive learning rate reduces the engineers' trial-and-error model training and improves the efficiency of engineering practice.
The adaptive distributed adversarial learning algorithm of this scheme has important engineering significance in many scenarios, such as a huge number of model parameters, a huge amount of training data, the need to protect user privacy, distributed computing, and slow communication between parallel devices. Adopting this technical solution can greatly reduce the training communication, communication error and learning-rate tuning problems of training large-scale adversarial learning models in a distributed manner.
For example, for image generation tasks, the ImageNet data set contains more than ten million samples, and a generative adversarial network model contains tens of millions of parameters. Training directly with tensorflow/pytorch or traditional distributed algorithms would produce extremely high communication traffic between the parallel devices and the central device, and the learning rate would also be difficult to tune. At the same time, the design of the learning rate has an important impact on the quality of the generated images, and every adjustment of the learning rate consumes a large amount of GPU computing resources, which greatly increases the cost to the enterprise. Adopting the technical solution in this patent can uniformly solve the communication problem and the learning-rate adjustment problem in model training, so that a large-scale adversarial learning network model can be trained quickly and effectively.
Theoretically, the following convergence guarantee is given for the above algorithm. For a non-differentiable function F, the output of this scheme is proven to have the convergence rate shown in the corresponding bound, where the function DualGap measures the quality of a given set of model parameters and is a commonly used measurement criterion for model parameters in adversarial learning, o denotes the omission of constant terms, E is the expectation, G is the upper bound of the gradient norm of the function F, γ is the ratio of the engineer's initial gradient estimate to the upper bound of the gradient norm of F, T is the total number of iterations of each device, D is the diameter of the feasible set, σ is the noise level of the stochastic gradient, and M is the number of parallel devices.
For given model parameters, this measurement criterion is defined through the DualGap function, whose arguments are both variables, where X and Y are the model search spaces (also called feasible sets), F is the training function defined for the machine learning problem at hand, max is the maximum and min is the minimum.
As the number of iterations increases, the output of the algorithm approaches a saddle point of the function F in expectation.
For the case where the function F is differentiable, the output of this scheme has the convergence rate shown in the corresponding bound, where V_1(T) is the expected value of the square root of the sum of the norms of the stochastic gradients appearing on each device, o denotes the omission of constant terms, D is the diameter of the feasible set, G is the upper bound of the gradient norm of the function F, M is the number of parallel devices, γ is the ratio of the engineer's initial gradient estimate to the upper bound of the gradient norm of F, L is the smoothness constant of F, T is the total number of iterations of each device, and σ is the noise level of the stochastic gradient. Theoretically, the above convergence rate is the best convergence rate that any algorithm can achieve.
As shown in FIG. 2, the algorithm proposed in this scheme is applied to the problem of training generative adversarial network models. The Frechet Inception Distance (FID; the lower the FID, the better the algorithm) and the Inception Score (IS; the higher the IS, the better the algorithm) are used to measure the superiority of the algorithm of this scheme. The following experimental results show that, for the same amount of communication, the algorithm of this scheme converges quickly and achieves the best results.
In FIG. 2, MB-ASMP denotes the mini-batch adaptive mirror single-gradient descent algorithm, MB-UMP denotes the mini-batch global mirror gradient descent algorithm, LocalAdam denotes the local adaptive gradient descent algorithm, and LocalAdaSEG denotes the algorithm of this scheme; the ordinate (1.00, 1.25, 1.50, 1.75, 2.00, 2.25, 2.50, 2.75) represents the IS value, and the abscissa (0, 2, 4, 6, 8, 10) represents the communication volume.
In summary, the adversarial learning algorithm proposed in this scheme achieves the best convergence rate, and as the number of devices increases, the convergence of the algorithm gradually accelerates. In addition, the adaptive learning-rate adjustment mechanism in the algorithm greatly reduces the cost of learning-rate tuning and improves the stability of the algorithm. At the same time, we have also theoretically verified the convergence of the algorithm proposed in this scheme, ensuring that the algorithm converges in various environments and enhancing the credibility of the scheme.
In the embodiments of the present disclosure, by combining the extragradient algorithm with the adaptive learning rate, an adaptive learning rate and distributed computing can be realized simultaneously during adversarial learning training, which reduces the limitations of adversarial learning training. In addition, the calculation of the adaptive learning rate is performed locally, without communication between devices, which reduces the engineers' trial-and-error model training and improves the efficiency of engineering practice.
Referring to FIG. 3, the process of obtaining the adaptive learning rate of the current iteration number of each parallel device involved in the above step S102 mainly includes the following steps:
Step S301: Obtain the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset base learning rate and the current iteration number of each parallel device.
Step S302: When the current iteration number equals the preset number, calculate the local model parameters.
Step S303: Perform a calculation on the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset base learning rate, the current iteration number of each parallel device and the local model parameters to obtain the adaptive learning rate.
The execution principle of steps S301 to S303 is consistent with that of step S102 above, which may be referred to and is not repeated here.
In the embodiments of the present disclosure, the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset base learning rate, the current iteration number of each parallel device and the local model parameters are calculated, thereby achieving the purpose of obtaining the adaptive learning rate.
Referring to FIG. 4, the process involved in the above step S103 of, if the current iteration number meets the first preset condition, performing the weighted average calculation on the acquired local model parameters and the adaptive learning rate to obtain the weighted-averaged model parameters and updating the pre-acquired local model parameters with them, mainly includes the following steps:
Step S401: Calculate the difference between the current iteration number and the preset number to obtain a difference value.
Step S402: If the difference value belongs to the set of communication time nodes of each device, determine that the parallel devices are in a communication state, where the set of communication time nodes of each device is determined by the number of local update steps of the parallel devices and the total number of iterations, and the total number of iterations is determined by the number of communications between the parallel devices and the number of local update steps of the parallel devices.
Step S403: When the parallel devices are in the communication state, have each parallel device send its local model parameters and adaptive learning rate to the central device, triggering the central device to perform a weighted average calculation on the local model parameters, the adaptive learning rate and the pre-acquired sum of the number of parallel devices to obtain the weights and the weighted-averaged model parameters, which are determined by the weights, the acquired local model parameters and the sum of the number of parallel devices.
Step S404: Update the pre-acquired local model parameters with the weighted-averaged model parameters.
The execution principle of steps S401 to S404 is consistent with that of step S103 above, which may be referred to and is not repeated here.
In the embodiments of the present disclosure, the difference between the current iteration number and the preset number is calculated; when the parallel devices are in the communication state, each parallel device sends its local model parameters and adaptive learning rate to the central device, triggering the central device to perform the weighted average calculation to obtain the weights and the weighted-averaged model parameters, thereby achieving the purpose of updating the pre-acquired local model parameters with the weighted-averaged model parameters.
Referring to FIG. 5, the process involved in the above step S104 of, if the current iteration number meets the second preset condition, obtaining local model parameters based on the weighted-averaged model parameters, the weights and the pre-acquired sum of the number of parallel devices, and updating the local model parameters accordingly, mainly includes the following steps:
Step S501: Calculate the difference between the current iteration number and the preset number to obtain a difference value.
Step S502: If the difference value does not belong to the set of communication time nodes of each parallel device, determine that the parallel devices are in a non-communication state, where the set of communication time nodes of each device is determined by the number of local update steps of the parallel devices and the total number of iterations, and the total number of iterations is determined by the number of communications between the parallel devices and the number of local update steps of the parallel devices.
Step S503: When the devices are in the non-communication state, obtain local model parameters based on the weighted-averaged model parameters, the weights and the pre-acquired sum of the number of parallel devices.
Step S504: Update the local model parameters with the obtained local model parameters.
The execution principle of steps S501 to S504 is consistent with that of step S104 above, which may be referred to and is not repeated here.
In the embodiments of the present disclosure, the difference between the current iteration number and the preset number is calculated; when the devices are in the non-communication state, local model parameters are obtained based on the weighted-averaged model parameters, the weights and the pre-acquired sum of the number of parallel devices, thereby achieving the purpose of updating the local model parameters.
Based on the data processing method disclosed in FIG. 1 of the above embodiment, an embodiment of the present disclosure also correspondingly discloses a data processing system. As shown in FIG. 6, the data processing system includes an acquisition unit 601, a first updating unit 602, a second updating unit 603, a determining unit 604 and an execution unit 605.
The acquisition unit 601 is configured to obtain the adaptive learning rate of the current iteration number of each parallel device.
The first updating unit 602 is configured to, if the current iteration number meets the first preset condition, perform a weighted average calculation on the acquired local model parameters and the adaptive learning rate to obtain the weights and the weighted-averaged model parameters, and update the pre-acquired local model parameters with the weighted-averaged model parameters.
The second updating unit 603 is configured to, if the current iteration number meets the second preset condition, obtain local model parameters based on the weighted-averaged model parameters, the weights and the pre-acquired sum of the number of parallel devices, and update the local model parameters accordingly.
The determining unit 604 is configured to perform calculations on the updated local model parameters through the extragradient algorithm to obtain the stochastic gradient directions, and determine the target model parameters based on the stochastic gradient directions.
The execution unit 605 is configured to perform a network model training operation based on the target model parameters.
Further, the acquisition unit 601 includes:
an obtaining module, configured to obtain the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset base learning rate and the current iteration number of each parallel device;
a first calculation module, configured to calculate the local model parameters when the current iteration number equals the preset number;
a second calculation module, configured to perform a calculation on the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset base learning rate, the current iteration number of each parallel device and the local model parameters to obtain the adaptive learning rate.
Further, the first update unit 602 includes:
a third calculation module, configured to calculate the difference between the current number of iterations and the preset number of iterations to obtain a difference value.
a first determination module, configured to determine that the parallel devices are in a communication state if the difference value belongs to the set of communication time nodes of the devices, where the set of communication time nodes of the devices is determined by the number of local update steps of the parallel devices and the total number of iterations, and the total number of iterations is determined by the number of communications between the parallel devices and the number of local update steps of the parallel devices.
a fourth calculation module, configured to, when the parallel devices are in the communication state, cause each parallel device to send its local model parameters (formula 000194) and its adaptive learning rate to the central device, triggering the central device to perform a weighted average calculation on the local model parameters (formula 000195), the adaptive learning rates, and the pre-acquired total number of parallel devices to obtain the weights and the weighted-averaged model parameters (formula 000196), where the weighted-averaged model parameters (formula 000197) are determined by the weights, the acquired local model parameters (formula 000198), and the total number of parallel devices.
a first update module, configured to update the pre-acquired local model parameters (formula 000200) with the weighted-averaged model parameters (formula 000199).
Further, the second update unit 603 includes:
a fifth calculation module, configured to calculate the difference between the current number of iterations and the preset number of iterations to obtain a difference value.
a second determination module, configured to determine that the parallel devices are in a non-communication state if the difference value does not belong to the set of communication time nodes of the parallel devices, where the set of communication time nodes of the devices is determined by the number of local update steps of the parallel devices and the total number of iterations, and the total number of iterations is determined by the number of communications between the parallel devices and the number of local update steps of the parallel devices.
an acquisition module, configured to obtain the local model parameters (formula 000202) based on the weighted-averaged model parameters (formula 000201), the weights, and the pre-acquired total number of parallel devices when the devices are in the non-communication state.
a second update module, configured to update the local model parameters (formula 000204) with the local model parameters (formula 000203).
Further, the system also includes a first initialization unit, and the first initialization unit includes:
an acquisition module, configured to acquire the diameter of the feasible set, the preset basic learning rate, and the estimated value of the preset gradient upper bound.
a sixth calculation module, configured to perform an initialization calculation on the diameter of the feasible set, the preset basic learning rate, and the estimated value of the preset gradient upper bound to obtain an initial learning rate.
Further, the system also includes a second initialization unit.
The second initialization unit is configured to initialize the local model parameters of each parallel device.
In the embodiments of the present disclosure, by combining the extragradient algorithm with an adaptive learning rate, adaptive learning rates and distributed computation can be achieved simultaneously when training with adversarial learning, which reduces the limitations of adversarial learning training. In addition, the adaptive learning rate is computed locally, without requiring communication between devices, which relieves engineers of trial-and-error model training and improves the efficiency of engineering practice.
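Putting the preceding sketches together, one per-device training loop that is consistent with the description above might look as follows. This is a hedged outline only: the loop structure, the helper names, and the device data layout are assumptions rather than the disclosed implementation.

    def train_device(device, T, preset, comm_nodes, all_devices,
                     D, G_hat, eta_base, grad, project):
        accum = 0.0
        x_prev = device.params.copy()
        x_half = device.params.copy()
        for t in range(T):
            # Local, communication-free computation of the adaptive learning rate.
            device.rate, accum = adaptive_learning_rate(
                D, G_hat, eta_base, device.params, x_prev, x_half, accum)
            # Periodic weighted averaging (communication) or local continuation.
            device.params = synchronize_or_keep(
                device, t, preset, comm_nodes, all_devices)
            x_prev = device.params.copy()
            # Extragradient update yielding the stochastic gradient direction
            # and the next iterate.
            x_half, device.params = extragradient_step(
                device.params, device.rate, grad, project)
        # The final iterate plays the role of the target model parameters on
        # which the network model training operation is based.
        return device.params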
An embodiment of the present disclosure further provides a storage medium. The storage medium includes stored instructions, wherein, when the instructions are run, the device on which the storage medium is located is controlled to perform the above data processing method.
An embodiment of the present disclosure further provides an electronic device, a schematic structural diagram of which is shown in FIG. 7. The electronic device specifically includes a memory 701 and one or more instructions 702, where the one or more instructions 702 are stored in the memory 701 and are configured to be executed by one or more processors 703 to perform the above data processing method.
The specific implementation processes of the above embodiments and their derived forms all fall within the protection scope of the present disclosure.
The embodiments in this specification are described in a progressive manner; for the same or similar parts between the embodiments, reference may be made to each other, and each embodiment focuses on its differences from the other embodiments. In particular, the system and system embodiments are described relatively simply because they are substantially similar to the method embodiments; for relevant details, reference may be made to the description of the method embodiments. The systems and system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement them without creative effort.
Those skilled in the art may further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementations should not be considered to be beyond the scope of the present disclosure.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present disclosure. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

  1. A data processing method, characterized in that the method comprises:
    in an iterative calculation process, acquiring an adaptive learning rate for a current number of iterations of each parallel device;
    if the current number of iterations meets a first preset condition, performing a weighted average calculation on acquired local model parameters (formula 100001) and the adaptive learning rate to obtain weights and weighted-averaged model parameters (formula 100002), and updating pre-acquired local model parameters (formula 100004) with the weighted-averaged model parameters (formula 100003);
    if the current number of iterations meets a second preset condition, obtaining local model parameters (formula 100006) based on the weighted-averaged model parameters (formula 100005), the weights, and a pre-acquired total number of parallel devices, and updating the local model parameters (formula 100008) with the local model parameters (formula 100007);
    performing a calculation on the updated local model parameters (formula 100009) by means of an extragradient algorithm to obtain a stochastic gradient direction, and determining target model parameters based on the stochastic gradient direction; and
    performing a network model training operation based on the target model parameters.
  2. The method according to claim 1, characterized in that the acquiring, in the iterative calculation process, the adaptive learning rate for the current number of iterations of each parallel device comprises:
    in the iterative calculation process, acquiring a diameter of a feasible set, an estimated value of a preset gradient upper bound, a preset basic learning rate, and the current number of iterations of each parallel device;
    when the current number of iterations is equal to a preset number of iterations, calculating local model parameters (formula 100010), local model parameters (formula 100011), and local model parameters (formula 100012); and
    calculating the adaptive learning rate from the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset basic learning rate, the current number of iterations of each parallel device, the local model parameters (formula 100013), the local model parameters (formula 100014), and the local model parameters (formula 100015).
  3. The method according to claim 1, characterized in that the performing, if the current number of iterations meets the first preset condition, the weighted average calculation on the acquired local model parameters (formula 100016) and the adaptive learning rate to obtain the weights and the weighted-averaged model parameters (formula 100017), and updating the pre-acquired local model parameters (formula 100019) with the weighted-averaged model parameters (formula 100018) comprises:
    calculating a difference between the current number of iterations and a preset number of iterations to obtain a difference value;
    if the difference value belongs to a set of communication time nodes of the devices, determining that the parallel devices are in a communication state, wherein the set of communication time nodes of the devices is determined by the number of local update steps of the parallel devices and a total number of iterations, and the total number of iterations is determined by the number of communications between the parallel devices and the number of local update steps of the parallel devices;
    when the parallel devices are in the communication state, causing each parallel device to send local model parameters (formula 100020) and the adaptive learning rate to a central device, triggering the central device to perform a weighted average calculation on the local model parameters (formula 100021), the adaptive learning rates, and the pre-acquired total number of parallel devices to obtain the weights and the weighted-averaged model parameters (formula 100022), wherein the weighted-averaged model parameters (formula 100023) are determined by the weights, the acquired local model parameters (formula 100024), and the total number of parallel devices; and
    updating the pre-acquired local model parameters (formula 100026) with the weighted-averaged model parameters (formula 100025).
  4. The method according to claim 1, characterized in that the obtaining, if the current number of iterations meets the second preset condition, the local model parameters (formula 100028) based on the weighted-averaged model parameters (formula 100027), the weights, and the pre-acquired total number of parallel devices, and updating the local model parameters (formula 100030) with the local model parameters (formula 100029) comprises:
    calculating a difference between the current number of iterations and a preset number of iterations to obtain a difference value;
    if the difference value does not belong to a set of communication time nodes of the parallel devices, determining that the parallel devices are in a non-communication state, wherein the set of communication time nodes of the devices is determined by the number of local update steps of the parallel devices and a total number of iterations, and the total number of iterations is determined by the number of communications between the parallel devices and the number of local update steps of the parallel devices;
    when the devices are in the non-communication state, obtaining the local model parameters (formula 100032) based on the weighted-averaged model parameters (formula 100031), the weights, and the pre-acquired total number of parallel devices; and
    updating the local model parameters (formula 100034) with the local model parameters (formula 100033).
  5. The method according to claim 1, characterized in that, before the acquiring the adaptive learning rate for the current number of iterations of each parallel device, the method further comprises:
    acquiring a diameter of a feasible set, a preset basic learning rate, and an estimated value of a preset gradient upper bound; and
    performing an initialization calculation on the diameter of the feasible set, the preset basic learning rate, and the estimated value of the preset gradient upper bound to obtain an initial learning rate.
  6. The method according to claim 1, characterized in that, before the acquiring the adaptive learning rate for the current number of iterations of each parallel device, the method further comprises:
    initializing local model parameters of each parallel device.
  7. A data processing system, characterized in that the system comprises:
    an acquisition unit, configured to acquire an adaptive learning rate for a current number of iterations of each parallel device;
    a first update unit, configured to, if the current number of iterations meets a first preset condition, perform a weighted average calculation on acquired local model parameters (formula 100035) and the adaptive learning rate to obtain weights and weighted-averaged model parameters (formula 100036), and to update pre-acquired local model parameters (formula 100038) with the weighted-averaged model parameters (formula 100037);
    a second update unit, configured to, if the current number of iterations meets a second preset condition, obtain local model parameters (formula 100040) based on the weighted-averaged model parameters (formula 100039), the weights, and a pre-acquired total number of parallel devices, and to update the local model parameters (formula 100042) with the local model parameters (formula 100041);
    a determination unit, configured to perform a calculation on the updated local model parameters (formula 100043) by means of an extragradient algorithm to obtain a stochastic gradient direction, and to determine target model parameters based on the stochastic gradient direction; and
    an execution unit, configured to perform a network model training operation based on the target model parameters.
  8. The system according to claim 7, characterized in that the acquisition unit comprises:
    an acquisition module, configured to acquire a diameter of a feasible set, an estimated value of a preset gradient upper bound, a preset basic learning rate, and the current number of iterations of each parallel device;
    a first calculation module, configured to calculate local model parameters (formula 100044), local model parameters (formula 100045), and local model parameters (formula 100046) when the current number of iterations is equal to a preset number of iterations; and
    a second calculation module, configured to calculate the adaptive learning rate from the diameter of the feasible set, the estimated value of the preset gradient upper bound, the preset basic learning rate, the current number of iterations of each parallel device, the local model parameters (formula 100047), the local model parameters (formula 100048), and the local model parameters (formula 100049).
  9. A storage medium, characterized in that the storage medium comprises stored instructions, wherein, when the instructions are run, a device on which the storage medium is located is controlled to perform the data processing method according to any one of claims 1 to 6.
  10. An electronic device, characterized in that it comprises a memory and one or more instructions, wherein the one or more instructions are stored in the memory and are configured to be executed by one or more processors to perform the data processing method according to any one of claims 1 to 6.
PCT/CN2022/096157 2021-09-08 2022-05-31 一种数据处理方法、系统、存储介质及电子设备 WO2023035691A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111048745.1A CN113762527A (zh) 2021-09-08 2021-09-08 一种数据处理方法、系统、存储介质及电子设备
CN202111048745.1 2021-09-08

Publications (1)

Publication Number Publication Date
WO2023035691A1 true WO2023035691A1 (zh) 2023-03-16

Family

ID=78793777

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/096157 WO2023035691A1 (zh) 2021-09-08 2022-05-31 一种数据处理方法、系统、存储介质及电子设备

Country Status (2)

Country Link
CN (1) CN113762527A (zh)
WO (1) WO2023035691A1 (zh)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762527A (zh) * 2021-09-08 2021-12-07 京东科技信息技术有限公司 一种数据处理方法、系统、存储介质及电子设备
CN114841341B (zh) * 2022-04-25 2023-04-28 北京百度网讯科技有限公司 图像处理模型训练及图像处理方法、装置、设备和介质
CN115348329B (zh) * 2022-10-17 2023-01-06 南京凯奥思数据技术有限公司 基于梯度传输优化的数据分布式训练方法、系统及介质


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200111194A1 (en) * 2018-10-08 2020-04-09 Rensselaer Polytechnic Institute Ct super-resolution gan constrained by the identical, residual and cycle learning ensemble (gan-circle)
CN109615072A (zh) * 2018-11-27 2019-04-12 长威信息科技发展股份有限公司 一种对抗神经网络的集成方法及计算机设备
CN110136063A (zh) * 2019-05-13 2019-08-16 南京信息工程大学 一种基于条件生成对抗网络的单幅图像超分辨率重建方法
CN111968666A (zh) * 2020-08-20 2020-11-20 南京工程学院 基于深度域自适应网络的助听器语音增强方法
CN113762527A (zh) * 2021-09-08 2021-12-07 京东科技信息技术有限公司 一种数据处理方法、系统、存储介质及电子设备

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LUOFENG LIAO; LI SHEN; JIA DUAN; MLADEN KOLAR; DACHENG TAO: "Local AdaGrad-Type Algorithm for Stochastic Convex-Concave Minimax Problems", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 18 June 2021 (2021-06-18), 201 Olin Library Cornell University Ithaca, NY 14853, XP081991968 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116663639A (zh) * 2023-07-31 2023-08-29 浪潮电子信息产业股份有限公司 一种梯度数据同步方法、系统、装置及介质
CN116663639B (zh) * 2023-07-31 2023-11-03 浪潮电子信息产业股份有限公司 一种梯度数据同步方法、系统、装置及介质

Also Published As

Publication number Publication date
CN113762527A (zh) 2021-12-07


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22866169

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE