WO2023197520A1 - Data processing method and system, device, and readable storage medium - Google Patents

Data processing method and system, device, and readable storage medium

Info

Publication number
WO2023197520A1
WO2023197520A1 PCT/CN2022/118104 CN2022118104W WO2023197520A1 WO 2023197520 A1 WO2023197520 A1 WO 2023197520A1 CN 2022118104 W CN2022118104 W CN 2022118104W WO 2023197520 A1 WO2023197520 A1 WO 2023197520A1
Authority
WO
WIPO (PCT)
Prior art keywords
moving average
training
new
moment
model
Prior art date
Application number
PCT/CN2022/118104
Other languages
English (en)
Chinese (zh)
Inventor
郭振华
邱志勇
赵雅倩
李仁刚
Original Assignee
苏州浪潮智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2023197520A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/38 Information transfer, e.g. on bus
    • G06F 13/42 Bus transfer protocol, e.g. handshake; Synchronisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/54 Interprogram communication
    • G06F 9/544 Buffers; Shared memory; Pipes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Definitions

  • the present application relates to the field of computer technology, and in particular, to a data processing method, system, equipment and non-volatile computer-readable storage medium.
  • model training can be carried out with the help of hardware modules (such as GPU, Graphics Processing Unit).
  • the server as the host sends a large amount of training data to the hardware module, and the hardware module processes the training data for model training. After the model training is completed, the hardware module feeds back the trained model to the host.
  • the inventor realized that, because the amount of training data is large and the data transmission between the host and the hardware module needs to go through storage media such as host memory, GPU cache, GPU memory, etc., the data transmission overhead between the host and the hardware module is large and affects the model training efficiency.
  • this application provides a data processing method, which is applied to a hardware computing platform connected to a host through the CXL (Compute Express Link, high-speed interconnection communication protocol) protocol, including:
  • the training data used to train the target model is shared in the host based on the CXL protocol;
  • calculating the new parameters includes: determining the current value of the moment moving average, adjusting the learning rate based on the current value of the moment moving average, and calculating the new parameters based on the adjusted learning rate;
  • in response to the new model meeting the convergence conditions, the new model is retained and the host is enabled to share the new model based on the CXL protocol.
  • determining the current value of the moment moving average and adjusting the learning rate based on the current value of the moment moving average includes:
  • in response to the current value of the moment moving average being greater than the preset threshold, the warmup strategy is used to adjust the learning rate; in response to the current value of the moment moving average being less than or equal to the preset threshold, the stochastic gradient descent and momentum algorithms are used to adjust the learning rate.
  • determining the current value of the moment moving average based on the preset target attenuation coefficient and the moment moving average maximum value includes:
  • the first formula is:
  • ρt is the current value of the moment moving average
  • ρ∞ is the maximum value of the moment moving average
  • t represents the current training moment
  • β2 is the target attenuation coefficient
  • the warmup strategy is used to adjust the learning rate, including:
  • new parameters are calculated based on the adjusted learning rate, including:
  • stochastic gradient descent and momentum algorithms are used to adjust the learning rate, including:
  • new parameters are calculated based on the adjusted learning rate, including:
  • the hardware computing platform includes multiple computing modules, and each computing module shares memory based on the CXL protocol.
  • the computing module includes: any one or combination of CPU, GPU, FPGA, and ASIC.
  • calculating the update gradient at the current training moment based on the training data, training results, and model parameters output at the previous training moment includes:
  • gt is the update gradient at the current training moment
  • θt-1 represents the model parameters output at the previous training moment
  • X is the training data
  • ft(θt-1; X) represents the training result for the training data.
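  • As a hedged reading of the variables listed above (the formula itself appears only as an image in the original publication), the update gradient would be the gradient of the training output with respect to the model parameters:

      \[ g_t = \nabla_{\theta} f_t(\theta_{t-1};\, X) \]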
  • calculating a new first moving average based on the preset object attenuation coefficient, update gradient and the first moving average of the previous training moment includes:
  • mt is the new first moving average
  • β1 is the object attenuation coefficient
  • mt-1 is the first moving average of the previous training moment
  • gt is the updated gradient of the current training moment.
  • calculating a new second moving average based on the updated gradient, the target attenuation coefficient, the new first moving average, and the second moving average of the previous training moment includes:
  • vt is the new second moving average
  • β2 is the target attenuation coefficient
  • mt is the new first moving average
  • vt-1 is the second moving average of the previous training moment
  • gt is the updated gradient of the current training moment.
  • calculating the learning rate at the current training moment based on the new second moving average and the target attenuation coefficient includes:
  • ρt > 4 means that the current value of the moment moving average is greater than the preset threshold (the preset threshold value is 4); vt is the new second moving average, and β2 is the target attenuation coefficient.
  • calculating the learning rate at the current training moment based on the new second moving average and the target attenuation coefficient also includes:
  • ρt ≤ 4 means that the current value of the moment moving average is not greater than the preset threshold, that is, the preset threshold value is 4; lt-1 is the learning rate output at the previous training moment, αt is the forward step length, gt is the update gradient at the current training moment, and the preset iteration parameter is also used.
  • new parameters are calculated based on the adjusted learning rate, including:
  • ρt > 4 means that the current value of the moment moving average is greater than the preset threshold (the preset threshold value is 4); αt is the forward step length, rt is the correction term of the new second moving average vt, and the correction term of the new first moving average mt is also used; ρt is the current value of the moment moving average, and ρ∞ is the maximum value of the moment moving average.
  • calculating new parameters based on the adjusted learning rate also includes:
  • ρt ≤ 4 means that the current value of the moment moving average is not greater than the preset threshold, which is 4, and θt-1 represents the model parameters output at the previous training moment.
  • the new first moving average mt determines the direction of gradient descent during the model training process, and vt and αt jointly determine the step size of the gradient descent during the model training process.
  • the new first moving average mt is calculated and used to calculate the new parameters, so as to reduce calculation errors.
  • the second aspect of this application provides a data processing system, including: a host, and a hardware computing platform connected to the host through the CXL protocol;
  • Host: used to provide training data for training the target model, and to share, based on the CXL protocol, the new model trained by the hardware computing platform; and
  • Hardware computing platform: used to share the training data in the host based on the CXL protocol; call the target model to process the training data to obtain training results, and calculate new parameters of the target model based on the training results; use the new parameters to update the target model to obtain a new model; and, if the new model meets the convergence conditions, retain the new model. Calculating new parameters includes: determining the current value of the moment moving average, adjusting the learning rate based on the current value of the moment moving average, and calculating new parameters based on the adjusted learning rate.
  • a third aspect of this application provides an electronic device, including:
  • One or more memories for storing computer-readable instructions
  • One or more processors used to execute computer-readable instructions to implement the aforementioned disclosed data processing methods.
  • a fourth aspect of the application provides one or more non-volatile computer-readable storage media storing computer-readable instructions. When executed by one or more processors, the computer-readable instructions cause the one or more processors to execute the data processing method disclosed above.
  • Figure 1 is a flow chart of a data processing method provided in one or more embodiments of the present application.
  • Figure 2 is a schematic diagram of a system framework provided in one or more embodiments of the present application.
  • Figure 3 is a schematic diagram of a connection between devices provided in one or more embodiments of the present application.
  • Figure 4 is a schematic diagram of memory sharing based on the CXL protocol provided in one or more embodiments of the present application.
  • Figure 5 is a schematic diagram of an electronic device provided in one or more embodiments of the present application.
  • this application provides a data processing solution that can reduce the data transmission overhead between the host and the hardware module and improve model training efficiency.
  • an embodiment of the present application discloses a data processing method, which is applied to a hardware computing platform connected to a host through the CXL protocol, including:
  • the hardware computing platform includes multiple computing modules, and each computing module shares memory based on the CXL protocol.
  • Computing modules include: any one or combination of CPU, GPU, FPGA, and ASIC.
  • the target model can be any model, such as CNN, natural language processing model, image classification model, etc.
  • S102: Call the target model to process the training data to obtain training results, and calculate new parameters of the target model based on the training results. Calculating the new parameters includes: determining the current value of the moment moving average, adjusting the learning rate based on the current value of the moment moving average, and calculating the new parameters using the adjusted learning rate.
  • model training process is the process of updating model parameters.
  • Current optimization algorithms used to update model parameters include AdaGrad, RMSProp, Adam, etc., as well as improved variants of Adam such as RAdam and Adabelief.
  • This embodiment uses Adabelief to update the model parameters. Specifically, based on Adabelief, parameters such as the forward step length, the two attenuation coefficients, the iteration parameter, and the maximum value of the moment moving average can be set. After each training result is obtained, new model parameters can be calculated based on these parameters and the values output at the previous training moment. In this embodiment, in order to avoid an adverse impact of the learning rate on the parameter calculation, the current value of the moment moving average is first calculated, and the learning rate is adjusted based on this current value before the new parameters are calculated, so that an appropriate learning rate can be determined and the model parameters can be updated steadily. The calculated new parameters include the weight parameters and bias parameters of the model; that is, the new parameters calculated each time are a collection of many parameters.
  • if the new model meets the convergence conditions, the new model is retained and the host is allowed to share the new model based on the CXL protocol.
  • the convergence conditions can be set with reference to existing related technologies, such as reaching the maximum number of iterations, etc.
  • the host and the hardware computing platform are connected through the CXL protocol, so the host and the hardware computing platform can share each other's memory, IO and cache. The training data therefore does not need to be transmitted through storage media such as host memory, GPU cache and GPU memory; instead, the hardware computing platform directly reads the training data in the host memory, thereby reducing data transmission overhead.
  • the hardware computing platform can adjust the learning rate based on the current value of the moment moving average and calculate new model parameters based on the adjusted learning rate, thereby stabilizing the model parameters, avoiding falling into local optima, ensuring model accuracy, and improving training efficiency. It can be seen that this solution can reduce the data transmission overhead between the host and the hardware module and improve the efficiency of model training.
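  • For illustration only, the following is a minimal, self-contained Python sketch of the kind of parameter update described above: a warmup (rectified) branch selected by the moment moving average, with an Adabelief-style second moving average. The exact formulas in this publication are reproduced as images and are not shown here, so the schedule for ρt, the rectification term, and the hyperparameter names (alpha, beta1, beta2, eps) follow the standard RAdam/Adabelief forms and are assumptions, not the claimed formulas:

      import numpy as np

      class RectifiedAdabeliefSketch:
          """Illustrative sketch only; not the patent's exact update rules."""

          def __init__(self, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, threshold=4.0):
              self.alpha, self.beta1, self.beta2, self.eps = alpha, beta1, beta2, eps
              self.threshold = threshold                    # preset threshold compared against rho_t
              self.rho_inf = 2.0 / (1.0 - beta2) - 1.0      # maximum value of the moment moving average
              self.m = None                                 # first moving average (descent direction)
              self.v = None                                 # second moving average (Adabelief-style)
              self.t = 0                                    # current training moment

          def step(self, theta, grad):
              self.t += 1
              t, b1, b2 = self.t, self.beta1, self.beta2
              if self.m is None:
                  self.m = np.zeros_like(theta)
                  self.v = np.zeros_like(theta)
              self.m = b1 * self.m + (1 - b1) * grad                    # new first moving average
              self.v = b2 * self.v + (1 - b2) * (grad - self.m) ** 2    # new second moving average
              rho_t = self.rho_inf - 2 * t * b2 ** t / (1 - b2 ** t)    # moment moving average (assumed schedule)
              m_hat = self.m / (1 - b1 ** t)                            # correction term of the first moving average
              if rho_t > self.threshold:
                  # warmup branch: variance-adapted, rectified step
                  v_hat = self.v / (1 - b2 ** t)                        # correction term of the second moving average
                  r_t = np.sqrt(((rho_t - 4) * (rho_t - 2) * self.rho_inf) /
                                ((self.rho_inf - 4) * (self.rho_inf - 2) * rho_t))
                  return theta - self.alpha * r_t * m_hat / (np.sqrt(v_hat) + self.eps)
              # otherwise: momentum-style step without variance adaptation
              return theta - self.alpha * m_hat

  • In such a sketch, the hardware computing platform would read the training data directly from the CXL-shared host memory, compute the gradient for the current parameters, and call step(theta, grad) repeatedly until the convergence condition (for example, a maximum number of iterations) is met.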
  • determining the current value of the moment moving average and adjusting the learning rate based on the current value of the moment moving average includes: determining the current value of the moment moving average based on the preset target attenuation coefficient and the maximum value of the moment moving average; in response to the current value of the moment moving average being greater than the preset threshold, using the warmup strategy to adjust the learning rate; and, in response to the current value of the moment moving average being not greater than the preset threshold, using the stochastic gradient descent and momentum algorithms to adjust the learning rate.
  • determining the current value of the moment moving average based on the preset target attenuation coefficient and the maximum value of the moment moving average includes: calculating the current value of the moment moving average according to a first formula; the first formula is:
  • ρt is the current value of the moment moving average
  • ρ∞ is the maximum value of the moment moving average
  • t represents the current training moment
  • β2 is the target attenuation coefficient.
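  • The first formula itself is given as an image in the original publication and is not reproduced in this text; a hedged reconstruction that matches the variables listed above is the standard rectification schedule, under the assumption that ρ∞ is derived from β2:

      \[ \rho_\infty = \frac{2}{1-\beta_2} - 1, \qquad \rho_t = \rho_\infty - \frac{2\, t\, \beta_2^{\,t}}{1-\beta_2^{\,t}} \]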
  • the warmup strategy is used to adjust the learning rate, including: calculating the update gradient at the current training moment based on the training data, the training results, and the model parameters output at the previous training moment; calculating a new first moving average based on the preset object attenuation coefficient, the update gradient, and the first moving average of the previous training moment; calculating a new second moving average based on the update gradient, the target attenuation coefficient, the new first moving average, and the second moving average of the previous training moment; and calculating the learning rate at the current training moment based on the new second moving average and the target attenuation coefficient. Accordingly, calculating new parameters based on the adjusted learning rate includes: calculating the new parameters based on the learning rate at the current training moment, the model parameters output at the previous training moment, the preset forward step length, the correction term of the new second moving average, and the correction term of the new first moving average.
  • the process of calculating new parameters includes:
  • mt is the new first moving average
  • β1 is the object attenuation coefficient
  • mt-1 is the first moving average of the previous training moment
  • gt is the updated gradient of the current training moment.
  • vt = β2·vt-1 + (1 - β2)·(gt - mt)².
  • vt is the new second moving average
  • β2 is the target attenuation coefficient
  • mt is the new first moving average
  • vt-1 is the second moving average of the previous training moment
  • gt is the updated gradient of the current training moment.
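  • A hedged reconstruction of the two moving-average updates: the second matches the formula given above, while the first is the standard exponential moving average of the gradient and is an assumption consistent with the listed variables:

      \[ m_t = \beta_1\, m_{t-1} + (1-\beta_1)\, g_t, \qquad v_t = \beta_2\, v_{t-1} + (1-\beta_2)\,(g_t - m_t)^2 \]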
  • ρt > 4 means that the current value of the moment moving average is greater than the preset threshold, that is, the preset threshold value is 4.
  • vt is the new second moving average
  • β2 is the target attenuation coefficient.
  • ρt > 4 means that the current value of the moment moving average is greater than the preset threshold, that is, the preset threshold value is 4.
  • αt is the forward step length
  • rt is the correction term of the new second moving average vt
  • the correction term of the new first moving average mt is also used in the update.
  • ρt is the current value of the moment moving average
  • ρ∞ is the maximum value of the moment moving average.
  • mt determines the direction of gradient descent during model training
  • vt and αt jointly determine the magnitude of gradient descent during model training.
  • calculating the new parameters with the correction term of the new first moving average mt keeps the calculation error relatively small. That is, in the early stage of model training the correction enlarges the original mt; as t becomes larger, β1^t approaches 0, so 1 - β1^t approaches 1, and the corrected value approaches the original mt in the later stage. Accordingly, when ρt > 4, the learning rate increases gradually and steadily, which helps to slow down the early over-fitting phenomenon in the initial training stage and to maintain the stability of the distribution.
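  • For reference, a hedged sketch of how such a rectified (warmup) update is commonly written when ρt > 4, using the variables defined above; the exact expressions in the original publication are images, so this follows the standard RAdam form and should not be read as the claimed formulas:

      \[ \hat{m}_t = \frac{m_t}{1-\beta_1^{\,t}}, \quad l_t = \sqrt{\frac{1-\beta_2^{\,t}}{v_t + \epsilon}}, \quad r_t = \sqrt{\frac{(\rho_t-4)(\rho_t-2)\,\rho_\infty}{(\rho_\infty-4)(\rho_\infty-2)\,\rho_t}}, \quad \theta_t = \theta_{t-1} - \alpha_t\, r_t\, l_t\, \hat{m}_t \]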
  • using the stochastic gradient descent and momentum algorithms to adjust the learning rate includes: calculating the update gradient at the current training moment based on the training data, the training results, and the model parameters output at the previous training moment; and calculating the learning rate at the current training moment based on the preset iteration parameter, the preset forward step length, the learning rate output at the previous training moment, and the update gradient. Accordingly, calculating new parameters based on the adjusted learning rate includes: calculating the new parameters based on the learning rate at the current training moment and the model parameters output at the previous training moment.
  • the process of calculating new parameters includes:
  • ρt ≤ 4 means that the current value of the moment moving average is not greater than the preset threshold, that is, the preset threshold value is 4.
  • lt-1 is the learning rate output at the previous training moment
  • αt is the forward step length
  • gt is the update gradient at the current training moment
  • the remaining parameter is the preset iteration parameter.
  • ρt ≤ 4 means that the current value of the moment moving average is not greater than the preset threshold, that is, the preset threshold value is 4.
  • θt-1 represents the model parameters output at the previous training moment.
  • the stochastic gradient descent plus momentum (SGD+Momentum) algorithm can be used to effectively avoid a negative learning rate and to keep the learning rate in a relatively stable fluctuation range in the early stage of training.
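  • For reference, the textbook SGD+Momentum update named here can be written as follows, with a momentum coefficient μ and step size η used only as generic illustration symbols; the publication's own learning-rate formula for this branch is not reproduced in this text:

      \[ u_t = \mu\, u_{t-1} + g_t, \qquad \theta_t = \theta_{t-1} - \eta\, u_t \]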
  • the following embodiment builds a hardware interconnection system based on the CXL protocol for model training, which can effectively solve data transmission delay and bandwidth problems, and can support various mainstream communication topologies such as Parameter server and Ring-Allreduce.
  • the hardware interconnection system includes CPU, GPU, FPGA, and ASIC computing devices. It realizes memory sharing among multiple heterogeneous computing devices through the CXL protocol, breaks down the communication delay barrier between heterogeneous devices, and significantly increases the speed of data interaction. See Figure 2 for the overall architecture of the system.
  • Python is used to implement the top-level deep learning framework
  • OneAPI programming is used to implement the target operator.
  • the target operator can be called by the top-level deep learning framework and runs on different underlying computing devices.
  • the different underlying computing devices CPU, GPU, FPGA, and ASIC are interconnected through the CXL protocol, and each computing device and the host device are also connected through the CXL protocol.
  • the target operator implementation includes: the model that needs to be trained, the Rectified-Adabelief optimization algorithm and its related parameters.
  • each computing device (CPU, GPU, FPGA, ASIC, etc.) is connected to the host device through an adapter device.
  • each computing device can be shared between different host devices, that is, different hosts share all computing devices.
  • Each connection line shown in Figure 3 uses the CXL protocol to realize interconnection sharing of IO, cache and memory.
  • Taking the memory sharing of each computing device as an example, a schematic diagram of memory sharing among the computing devices is shown in Figure 4.
  • when any host or computing device accesses the memory of a certain computing device, it is like accessing its own memory.
  • this embodiment uses the Adabelief optimization algorithm to solve the problem of excessive learning rate variance caused by insufficient data in the early stage of training, to achieve a faster convergence speed when completing various deep learning tasks, and to avoid prematurely falling into a local optimum. A heterogeneous computing system that implements the distributed Rectified-Adabelief optimization algorithm is built based on the CXL communication protocol, and the Rectified-Adabelief optimization algorithm is implemented based on the OneAPI programming model so that it can run on a variety of heterogeneous computing devices. This achieves memory consistency between the heterogeneous computing devices, greatly increases the data transmission bandwidth, and reduces data interaction delays between the computing devices.
  • a data processing system provided by an embodiment of the present application is introduced below.
  • the data processing system described below and the data processing method described above can be referred to each other.
  • the embodiment of the present application discloses a data processing system, including: a host, and a hardware computing platform connected to the host through the CXL protocol;
  • Host: used to provide training data for training the target model, and to share, based on the CXL protocol, the new model trained by the hardware computing platform; and
  • Hardware computing platform: used to share the training data in the host based on the CXL protocol; call the target model to process the training data to obtain training results, and calculate new parameters of the target model based on the training results; use the new parameters to update the target model to obtain a new model; and, if the new model meets the convergence conditions, retain the new model. Calculating new parameters includes: determining the current value of the moment moving average, adjusting the learning rate based on the current value of the moment moving average, and calculating new parameters based on the adjusted learning rate.
  • the hardware computing platform is specifically used for:
  • in response to the current value of the moment moving average being greater than the preset threshold, the warmup strategy is used to adjust the learning rate; otherwise, the stochastic gradient descent and momentum algorithms are used to adjust the learning rate.
  • the hardware computing platform is specifically used for:
  • the first formula is:
  • ρt is the current value of the moment moving average
  • ρ∞ is the maximum value of the moment moving average
  • t represents the current training moment
  • β2 is the target attenuation coefficient
  • the hardware computing platform is specifically used for:
  • the hardware computing platform is specifically used for:
  • the hardware computing platform is specifically used for:
  • the hardware computing platform is specifically used for:
  • the hardware computing platform includes multiple computing modules, and each computing module shares memory based on the CXL protocol.
  • the computing module includes: any one or combination of CPU, GPU, FPGA, and ASIC.
  • this embodiment provides a data processing system that can reduce data transmission overhead between the host and the hardware module and improve model training efficiency.
  • An electronic device provided by an embodiment of the present application is introduced below.
  • An electronic device described below and a data processing method and system described above may be referred to each other.
  • an electronic device including:
  • One or more memories 501 for storing computer readable instructions
  • One or more processors 502 are configured to execute computer-readable instructions to implement the methods disclosed in any of the above embodiments.
  • A non-volatile computer-readable storage medium provided by embodiments of the present application is introduced below.
  • The non-volatile computer-readable storage medium described below and the data processing method, system and device described above can be referred to each other.
  • For the specific steps of the method, reference may be made to the corresponding content disclosed in the foregoing embodiments, which will not be repeated here.
  • The non-volatile computer-readable storage medium may be a RAM (random access memory), a ROM (read-only memory), an electrically programmable ROM, an electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of non-volatile computer-readable storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Stored Programmes (AREA)

Abstract

This application relates to the technical field of computers, and provides a data processing method and system, a device, and a readable storage medium. In this application, a host is connected to a hardware computing platform through the CXL protocol, so that the host and the hardware computing platform can share each other's memory, IO, and cache. In this way, training data does not need to be transmitted through storage media such as host memory, GPU cache, and GPU memory; instead, the training data in the host memory is directly read by the hardware computing platform, which reduces data transmission overhead. Moreover, the hardware computing platform can adjust the learning rate based on the current value of the moment moving average and then calculate new parameters of a model, so that the model parameters can be stabilized, model accuracy can be ensured, and training efficiency can be improved.
PCT/CN2022/118104 2022-04-14 2022-09-09 Procédé et système de traitement de données, dispositif et support de stockage lisible WO2023197520A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210387060.8 2022-04-14
CN202210387060.8A CN114461568B (zh) 2022-04-14 2022-04-14 一种数据处理方法、系统、设备及可读存储介质

Publications (1)

Publication Number Publication Date
WO2023197520A1 (fr)

Family

ID=81418423

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/118104 WO2023197520A1 (fr) 2022-04-14 2022-09-09 Procédé et système de traitement de données, dispositif et support de stockage lisible

Country Status (2)

Country Link
CN (1) CN114461568B (fr)
WO (1) WO2023197520A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117112466A (zh) * 2023-10-25 2023-11-24 浪潮(北京)电子信息产业有限公司 一种数据处理方法、装置、设备、存储介质及分布式集群
CN117785489A (zh) * 2024-02-27 2024-03-29 苏州元脑智能科技有限公司 一种服务器及一种任务执行方法、装置和存储介质

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114461568B (zh) * 2022-04-14 2022-07-08 苏州浪潮智能科技有限公司 一种数据处理方法、系统、设备及可读存储介质
CN114925829A (zh) * 2022-07-18 2022-08-19 山东海量信息技术研究院 一种神经网络训练方法、装置、电子设备及存储介质
CN115310566A (zh) * 2022-10-12 2022-11-08 浪潮电子信息产业股份有限公司 分布式训练系统、方法、装置、设备及可读存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113312415A (zh) * 2020-02-27 2021-08-27 Sap欧洲公司 用于数据库操作的近存储器加速
US20210390414A1 (en) * 2020-06-10 2021-12-16 Nvidia Corporation Accelerated training for neural network models
CN114169534A (zh) * 2021-12-09 2022-03-11 京东科技信息技术有限公司 分布式机器学习模型的训练方法、装置、设备及介质
CN114257386A (zh) * 2020-09-10 2022-03-29 华为技术有限公司 检测模型的训练方法、系统、设备及存储介质
CN114461568A (zh) * 2022-04-14 2022-05-10 苏州浪潮智能科技有限公司 一种数据处理方法、系统、设备及可读存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106991095B (zh) * 2016-01-21 2021-09-28 阿里巴巴集团控股有限公司 机器异常的处理方法、学习速率的调整方法及装置
CN110033081A (zh) * 2019-03-08 2019-07-19 华为技术有限公司 一种确定学习率的方法和装置
US20210142177A1 (en) * 2019-11-13 2021-05-13 Nvidia Corporation Synthesizing data for training one or more neural networks
CN113723692A (zh) * 2021-09-02 2021-11-30 深圳前海微众银行股份有限公司 数据处理方法、装置、设备、介质及程序产品

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113312415A (zh) * 2020-02-27 2021-08-27 Sap欧洲公司 用于数据库操作的近存储器加速
US20210390414A1 (en) * 2020-06-10 2021-12-16 Nvidia Corporation Accelerated training for neural network models
CN114257386A (zh) * 2020-09-10 2022-03-29 华为技术有限公司 检测模型的训练方法、系统、设备及存储介质
CN114169534A (zh) * 2021-12-09 2022-03-11 京东科技信息技术有限公司 分布式机器学习模型的训练方法、装置、设备及介质
CN114461568A (zh) * 2022-04-14 2022-05-10 苏州浪潮智能科技有限公司 一种数据处理方法、系统、设备及可读存储介质

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117112466A (zh) * 2023-10-25 2023-11-24 浪潮(北京)电子信息产业有限公司 一种数据处理方法、装置、设备、存储介质及分布式集群
CN117112466B (zh) * 2023-10-25 2024-02-09 浪潮(北京)电子信息产业有限公司 一种数据处理方法、装置、设备、存储介质及分布式集群
CN117785489A (zh) * 2024-02-27 2024-03-29 苏州元脑智能科技有限公司 一种服务器及一种任务执行方法、装置和存储介质
CN117785489B (zh) * 2024-02-27 2024-05-10 苏州元脑智能科技有限公司 一种服务器及一种任务执行方法、装置和存储介质

Also Published As

Publication number Publication date
CN114461568B (zh) 2022-07-08
CN114461568A (zh) 2022-05-10

Similar Documents

Publication Publication Date Title
WO2023197520A1 (fr) Procédé et système de traitement de données, dispositif et support de stockage lisible
WO2021012869A1 (fr) Procédé et dispositif de détermination de vitesse de transmission, appareil et support de stockage
CN105827537B (zh) 一种基于quic协议的拥塞改进方法
WO2020143304A1 (fr) Procédé et appareil d'optimisation de la fonction de perte, dispositif informatique et support de stockage
US7353339B2 (en) Adaptive caching
US9237107B2 (en) Fair quantized congestion notification (FQCN) to mitigate transport control protocol (TCP) throughput collapse in data center networks
JP6433146B2 (ja) 情報処理装置、システム、情報処理方法、コンピュータプログラム
US9680742B2 (en) Packet output processing
WO2022198994A1 (fr) Procédé et appareil de planification de mouvement de bras robotisé, ainsi que support de stockage lisible et bras robotisé
WO2015130404A1 (fr) Mise en forme de paquets dans un processeur réseau
WO2018077236A1 (fr) Procédé et système d'apprentissage automatique distribué
CN109818863A (zh) 链路优先级设置方法及装置
WO2015130403A1 (fr) Planification de paquets dans un processeur réseau
JP2018110387A (ja) リアルタイムライブ環境でのバッファに基づく帯域幅測定および適応的データ送信のための方法およびシステム
US10963386B2 (en) Dynamically determining tracks to prestage from storage to cache by training a machine learning module
WO2021238274A1 (fr) Procédé de mise à jour d'informations de gradient pour apprentissage profond distribué, et appareil associé
WO2022252546A1 (fr) Procédé et dispositif de réglage d'informations, et support d'enregistrement
WO2024098953A1 (fr) Procédé et appareil d'épissage de ligne de voie, et dispositif électronique et support de stockage
JP4616391B2 (ja) 動的データプリフェッチのためのシステム及び方法
JP4782082B2 (ja) パケット処理装置、方法、およびプログラム
CN112383485A (zh) 一种网络拥塞控制方法及装置
WO2017000684A1 (fr) Procédé de lecture de données, dispositif pair, dispositif de commande, et support de stockage
CN113902128B (zh) 改善边缘设备利用效率的异步联邦学习方法、装置及介质
WO2021115039A1 (fr) Plateforme fpga, procédé d'évaluation de performance et d'optimisation de conception associé, et support de stockage
US10061726B2 (en) Precision time management (PTM) for USB retimers that accurately adjusts timestamp fields of isochronous timestamp packets (ITPS)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22937158

Country of ref document: EP

Kind code of ref document: A1