WO2023197520A1 - Data processing method and system, device, and readable storage medium - Google Patents
Data processing method and system, device, and readable storage medium
- Publication number
- WO2023197520A1 (PCT/CN2022/118104)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- moving average
- training
- new
- moment
- model
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/42—Bus transfer protocol, e.g. handshake; Synchronisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/544—Buffers; Shared memory; Pipes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Definitions
- the present application relates to the field of computer technology, and in particular to a data processing method, system, device, and non-volatile computer-readable storage medium.
- model training can be carried out with the help of hardware modules such as a GPU (Graphics Processing Unit).
- the server as the host sends a large amount of training data to the hardware module, and the hardware module processes the training data for model training. After the model training is completed, the hardware module feeds back the trained model to the host.
- the inventor realized that, because the amount of training data is large and data transmission between the host and the hardware module must pass through storage media such as host memory, GPU cache, and GPU memory, the data transmission overhead between the host and the hardware module is large and affects model training efficiency.
- this application provides a data processing method, which is applied to a hardware computing platform connected to a host through the CXL (Compute Express Link, a high-speed interconnect communication) protocol, including:
- the training data used to train the target model is shared in the host based on the CXL protocol;
- calculating the new parameters includes: determining the current value of the moment moving average, adjusting the learning rate based on the current value of the moment moving average, and calculating the new parameters based on the adjusted learning rate;
- in response to the new model meeting the convergence conditions, the new model is retained and the host is enabled to share the new model based on the CXL protocol.
- determining the current value of the moment moving average and adjusting the learning rate based on the current value of the moment moving average includes:
- in response to the current value of the moment moving average being greater than the preset threshold, the warmup strategy is used to adjust the learning rate; in response to the current value of the moment moving average being less than or equal to the preset threshold, the stochastic gradient descent and momentum algorithms are used to adjust the learning rate.
- determining the current value of the moment moving average based on the preset target attenuation coefficient and the moment moving average maximum value includes:
- the first formula is:
- ρt is the current value of the moment moving average
- ρ∞ is the maximum value of the moment moving average
- t represents the current training moment
- β2 is the target attenuation coefficient
- the warmup strategy is used to adjust the learning rate, including:
- new parameters are calculated based on the adjusted learning rate, including:
- stochastic gradient descent and momentum algorithms are used to adjust the learning rate, including:
- new parameters are calculated based on the adjusted learning rate, including:
- the hardware computing platform includes multiple computing modules, and each computing module shares memory based on the CXL protocol.
- the computing module includes: any one or combination of CPU, GPU, FPGA, and ASIC.
- calculating the update gradient at the current training moment based on the training data, training results, and model parameters output at the previous training moment includes:
- gt is the update gradient at the current training moment
- θt−1 represents the model parameters output at the previous training moment
- X is the training data
- ft(θt−1; X) represents the training result for the training data.
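The gradient step above can be sketched concretely. The linear model and MSE loss below are hypothetical stand-ins for the patent's target model ft, which is not fixed in the text; only the role of gt as the derivative of the loss with respect to θt−1 is taken from the description:

```python
import numpy as np

# Hypothetical stand-in for the target model's training loss f_t(theta; X):
# mean squared error of a linear model (the patent does not fix a model).
def loss(theta, X, y):
    return np.mean((X @ theta - y) ** 2)

def update_gradient(theta_prev, X, y):
    """g_t: gradient of the loss w.r.t. the model parameters theta_{t-1}
    output at the previous training moment (analytic MSE gradient)."""
    n = len(y)
    return (2.0 / n) * X.T @ (X @ theta_prev - y)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))          # training data shared from the host
y = X @ np.array([1.0, -2.0, 0.5])   # synthetic targets
theta_prev = np.zeros(3)             # parameters from the previous moment
g_t = update_gradient(theta_prev, X, y)
```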
- calculating a new first moving average based on the preset object attenuation coefficient, update gradient and the first moving average of the previous training moment includes:
- mt is the new first moving average
- β1 is the object attenuation coefficient
- mt-1 is the first moving average of the previous training moment
- gt is the updated gradient of the current training moment.
- calculating a new second moving average based on the updated gradient, the target attenuation coefficient, the new first moving average, and the second moving average of the previous training moment includes:
- vt is the new second moving average
- β2 is the target attenuation coefficient
- mt is the new first moving average
- vt-1 is the second moving average of the previous training moment
- gt is the updated gradient of the current training moment.
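The two moving-average updates described above can be sketched directly. The values of β1 and β2 below are the defaults commonly used for Adam-family optimizers, which the patent does not fix:

```python
import numpy as np

beta1, beta2 = 0.9, 0.999  # object / target attenuation coefficients (assumed values)
m_prev = np.zeros(3)       # first moving average of the previous training moment
v_prev = np.zeros(3)       # second moving average of the previous training moment
g_t = np.array([0.5, -1.0, 2.0])  # update gradient at the current training moment

# New first moving average: exponential average of the gradients.
m_t = beta1 * m_prev + (1 - beta1) * g_t
# New second moving average: as described above, it tracks the squared
# deviation of the gradient from m_t (AdaBelief-style), not the raw g_t**2.
v_t = beta2 * v_prev + (1 - beta2) * (g_t - m_t) ** 2
```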
- calculating the learning rate at the current training moment based on the new second moving average and the target attenuation coefficient includes:
- ρt > 4 means that the current value of the moment moving average is greater than the preset threshold (the preset threshold value is 4); vt is the new second moving average, and β2 is the target attenuation coefficient.
- calculating the learning rate at the current training moment based on the new second moving average and the target attenuation coefficient also includes:
- ρt ≤ 4 means that the current value of the moment moving average is not greater than the preset threshold (the preset threshold value is 4); lt−1 is the learning rate output at the previous training moment, αt is the forward step length, gt is the update gradient at the current training moment, and ε is the preset iteration parameter.
- new parameters are calculated based on the adjusted learning rate, including:
- ρt > 4 means that the current value of the moment moving average is greater than the preset threshold (the preset threshold value is 4); αt is the forward step length, rt is the correction term of the new second moving average vt, and m̂t is the correction term of the new first moving average mt; ρt is the current value of the moment moving average, and ρ∞ is the maximum value of the moment moving average.
- calculating new parameters based on the adjusted learning rate also includes:
- ρt ≤ 4 means that the current value of the moment moving average is not greater than the preset threshold, which is 4, and θt−1 represents the model parameters output at the previous training moment.
- the new first moving average mt determines the descent direction of the gradient during model training, and vt and ρt jointly determine the descent magnitude of the gradient during model training.
- the correction term of the new first moving average mt is used to calculate the new parameters, so as to reduce calculation errors.
- the second aspect of this application provides a data processing system, including: a host, and a hardware computing platform connected to the host through the CXL protocol;
- the host is used to provide the training data for training the target model, and to share, based on the CXL protocol, the new model trained by the hardware computing platform; and
- the hardware computing platform is used to share the training data in the host based on the CXL protocol; call the target model to process the training data to obtain training results, and calculate new parameters of the target model based on the training results; update the target model with the new parameters to obtain a new model; and, if the new model meets the convergence conditions, retain the new model. Calculating the new parameters includes: determining the current value of the moment moving average, adjusting the learning rate based on the current value of the moment moving average, and calculating the new parameters based on the adjusted learning rate.
- a third aspect of this application provides an electronic device, including:
- One or more memories for storing computer-readable instructions
- One or more processors used to execute computer-readable instructions to implement the aforementioned disclosed data processing methods.
- a fourth aspect of the application provides one or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to execute the data processing method disclosed above.
- Figure 1 is a flow chart of a data processing method provided in one or more embodiments of the present application.
- Figure 2 is a schematic diagram of a system framework provided in one or more embodiments of the present application.
- Figure 3 is a schematic diagram of a connection between devices provided in one or more embodiments of the present application.
- Figure 4 is a schematic diagram of memory sharing based on the CXL protocol provided in one or more embodiments of the present application.
- Figure 5 is a schematic diagram of an electronic device provided in one or more embodiments of the present application.
- this application provides a data processing solution that can reduce the data transmission overhead between the host and the hardware module and improve model training efficiency.
- an embodiment of the present application discloses a data processing method, which is applied to a hardware computing platform connected to a host through the CXL protocol, including:
- the hardware computing platform includes multiple computing modules, and each computing module shares memory based on the CXL protocol.
- Computing modules include: any one or combination of CPU, GPU, FPGA, and ASIC.
- the target model can be any model, such as CNN, natural language processing model, image classification model, etc.
- S102: Call the target model to process the training data to obtain training results, and calculate new parameters of the target model based on the training results. Calculating the new parameters includes: determining the current value of the moment moving average, adjusting the learning rate based on the current value of the moment moving average, and calculating the new parameters based on the adjusted learning rate.
- model training process is the process of updating model parameters.
- Current optimization algorithms used to update model parameters include AdaGrad, RMSProp, Adam, etc., as well as improved variants of Adam such as RAdam and Adabelief.
- This embodiment uses Adabelief to update the model parameters. Specifically, based on Adabelief, parameters such as the forward step length, the two attenuation coefficients, the iteration parameter ε, and the maximum value of the moment moving average can be set. After each training result is obtained, new model parameters can be calculated based on these parameters and the values output at the previous training moment. In this embodiment, in order to avoid an unsuitable learning rate affecting the parameter calculation, the current value of the moment moving average is calculated first, and the learning rate is adjusted based on it before the new parameters are calculated, so that an appropriate learning rate can be determined and the model parameters can be updated steadily. The calculated new parameters include the weight parameters and bias parameters of the model; that is, the new parameters calculated each time are a collection of many parameters.
- if the new model meets the convergence conditions, the new model is retained and the host is enabled to share the new model based on the CXL protocol.
- the convergence conditions can be set with reference to existing related technologies, such as reaching the maximum number of iterations, etc.
- the host and the hardware computing platform are connected through the CXL protocol, so they can share each other's memory, IO, and cache. The training data therefore does not need to be transmitted from the host to the hardware computing platform through storage media such as host memory, GPU cache, and GPU memory; instead, the hardware computing platform directly reads the training data in the host memory, thereby reducing data transmission overhead.
- the hardware computing platform can adjust the learning rate based on the current value of the moment moving average and calculate new parameters of the model based on the adjusted learning rate, thereby stabilizing the model parameters, avoiding falling into a local optimum, ensuring model accuracy, and improving training efficiency. It can be seen that this solution can reduce the data transmission overhead between the host and the hardware module and improve the efficiency of model training.
- determining the current value of the moment moving average and adjusting the learning rate based on the current value of the moment moving average includes: determining the current value of the moment moving average based on the preset target attenuation coefficient and the maximum value of the moment moving average; in response to the current value of the moment moving average being greater than the preset threshold, using the warmup strategy to adjust the learning rate; and in response to the current value of the moment moving average being not greater than the preset threshold, using the stochastic gradient descent and momentum algorithms to adjust the learning rate.
- determining the current value of the moment moving average based on the preset target attenuation coefficient and the maximum value of the moment moving average includes: calculating the current value of the moment moving average according to a first formula; the first formula is:
- ρt is the current value of the moment moving average
- ρ∞ is the maximum value of the moment moving average
- t represents the current training moment
- β2 is the target attenuation coefficient.
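The first formula itself is not reproduced in this extracted text; the sketch below therefore assumes the standard rectified-Adam form for ρ∞ and ρt, which matches the symbols defined above:

```python
def rho_inf(beta2):
    # Maximum value of the moment moving average.
    return 2.0 / (1.0 - beta2) - 1.0

def rho_t(t, beta2):
    """Current value of the moment moving average at training moment t
    (standard RAdam form; assumed, since the patent's exact first
    formula is not reproduced in this text)."""
    b2t = beta2 ** t
    return rho_inf(beta2) - 2.0 * t * b2t / (1.0 - b2t)
```

For β2 = 0.999, ρt starts near 1 and grows toward ρ∞ ≈ 1999, so the ρt > 4 test selects the warmup branch only after the first few training moments.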
- the warmup strategy is used to adjust the learning rate, including: calculating the update gradient at the current training moment based on the training data, the training results, and the model parameters output at the previous training moment; calculating a new first moving average based on the preset object attenuation coefficient, the update gradient, and the first moving average of the previous training moment; calculating a new second moving average based on the update gradient, the target attenuation coefficient, the new first moving average, and the second moving average of the previous training moment; and calculating the learning rate at the current training moment based on the new second moving average and the target attenuation coefficient. Accordingly, calculating new parameters based on the adjusted learning rate includes: calculating the new parameters based on the learning rate at the current training moment, the model parameters output at the previous training moment, the preset forward step length, the correction term of the new second moving average, and the correction term of the new first moving average.
- the process of calculating new parameters includes:
- mt is the new first moving average
- β1 is the object attenuation coefficient
- mt-1 is the first moving average of the previous training moment
- gt is the updated gradient of the current training moment.
- vt = β2·vt−1 + (1 − β2)(gt − mt)².
- vt is the new second moving average
- β2 is the target attenuation coefficient
- mt is the new first moving average
- vt-1 is the second moving average of the previous training moment
- gt is the updated gradient of the current training moment.
- ρt > 4 means that the current value of the moment moving average is greater than the preset threshold, that is, the preset threshold value is 4.
- vt is the new second moving average
- β2 is the target attenuation coefficient.
- ρt > 4 means that the current value of the moment moving average is greater than the preset threshold, that is, the preset threshold value is 4.
- αt is the forward step length
- rt is the correction term of the new second moving average vt
- m̂t is the correction term of the new first moving average mt.
- ρt is the current value of the moment moving average
- ρ∞ is the maximum value of the moment moving average.
- mt determines the direction of gradient descent during model training
- vt and ρt jointly determine the magnitude of gradient descent during model training.
- using the correction term m̂t = mt/(1 − β1^t) to calculate the new parameters keeps the calculation error consistently small: in the early stage of model training, dividing by 1 − β1^t amplifies the original mt; as t becomes larger, β1^t approaches 0, so 1 − β1^t approaches 1, and the correction term approaches the original mt in the later stage. Accordingly, when ρt > 4, the learning rate increases gradually and steadily, which helps alleviate early over-fitting of the model in the initial training stage and maintains the stability of the distribution.
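Putting the warmup branch together, one parameter update for the ρt > 4 case can be sketched as below. The rectification term rt and the bias corrections follow the standard RAdam/AdaBelief formulation, since the patent's exact expressions are not reproduced in this text:

```python
import math

def rectified_update(theta_prev, m_t, v_t, t, alpha, beta1=0.9, beta2=0.999, eps=1e-8):
    """One update for the rho_t > 4 branch: alpha is the forward step
    length, m_hat / v_hat are the correction terms of m_t / v_t, and
    r_t rectifies the adaptive learning rate (assumed standard form)."""
    rho_max = 2.0 / (1.0 - beta2) - 1.0          # rho_inf
    b2t = beta2 ** t
    rho = rho_max - 2.0 * t * b2t / (1.0 - b2t)  # rho_t, must be > 4 here
    # Early in training, 1 - beta^t amplifies the averages; later it
    # approaches 1, leaving them close to the originals.
    m_hat = m_t / (1.0 - beta1 ** t)
    v_hat = v_t / (1.0 - b2t)
    r_t = math.sqrt(((rho - 4) * (rho - 2) * rho_max)
                    / ((rho_max - 4) * (rho_max - 2) * rho))
    return theta_prev - alpha * r_t * m_hat / (math.sqrt(v_hat) + eps)

# At t = 10 with beta2 = 0.999, rho_t is already above 4.
theta = rectified_update(theta_prev=1.0, m_t=0.5, v_t=0.04, t=10, alpha=0.01)
```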
- using stochastic gradient descent and momentum algorithms to adjust the learning rate includes: calculating the update gradient at the current training moment based on the training data, the training results, and the model parameters output at the previous training moment; and calculating the learning rate at the current training moment based on the preset iteration parameter, the preset forward step length, the target moving average of the previous training moment, and the update gradient. Accordingly, calculating new parameters based on the adjusted learning rate includes: calculating the new parameters based on the learning rate at the current training moment and the model parameters output at the previous training moment.
- the process of calculating new parameters includes:
- ρt ≤ 4 means that the current value of the moment moving average is not greater than the preset threshold, that is, the preset threshold value is 4.
- lt−1 is the learning rate output at the previous training moment
- αt is the forward step length
- gt is the update gradient at the current training moment
- ε is the preset iteration parameter.
- ρt ≤ 4 means that the current value of the moment moving average is not greater than the preset threshold, that is, the preset threshold value is 4.
- θt−1 represents the model parameters output at the previous training moment.
- in this way, the stochastic gradient descent plus momentum (SGD+Momentum) algorithm can effectively avoid a negative learning rate and keep the learning rate fluctuating within a relatively stable range in the early stage of training.
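A minimal sketch of that branch, assuming a textbook SGD-with-momentum update (the patent's exact formula for the learning rate lt in this branch is not reproduced here, and the momentum coefficient is an assumed default):

```python
def sgdm_update(theta_prev, velocity_prev, g_t, alpha, momentum=0.9):
    """rho_t <= 4 branch: plain SGD plus momentum. alpha is the forward
    step length; no adaptive denominator is involved, so the effective
    step stays stable in the early training moments."""
    velocity = momentum * velocity_prev + g_t  # accumulated descent direction
    theta = theta_prev - alpha * velocity
    return theta, velocity

theta1, v1 = sgdm_update(1.0, 0.0, 0.5, alpha=0.1)
theta2, v2 = sgdm_update(theta1, v1, 0.5, alpha=0.1)
```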
- the following embodiment builds a hardware interconnection system based on the CXL protocol for model training, which can effectively solve data transmission delay and bandwidth problems, and can support various mainstream communication topologies such as Parameter server and Ring-Allreduce.
- the hardware interconnection system includes the computing devices CPU, GPU, FPGA, and ASIC. It realizes memory sharing among multiple heterogeneous computing devices through the CXL protocol, breaks down the communication-latency barrier between heterogeneous devices, and significantly increases the speed of data interaction; see Figure 2 for the overall architecture of the system.
- Python is used to implement the top-level deep learning framework
- OneAPI programming is used to implement the target operator.
- the target operator can be called by the top-level deep learning framework and runs on different underlying computing devices.
- the different underlying computing devices CPU, GPU, FPGA, and ASIC are interconnected through the CXL protocol, and each computing device and the host device are also connected through the CXL protocol.
- the target operator implementation includes: the model that needs to be trained, the Rectified-Adabelief optimization algorithm and its related parameters.
- each computing device (CPU, GPU, FPGA, ASIC, etc.) is connected to the host device through an adapter device.
- each computing device can be shared between different host devices, that is, different hosts share all computing devices.
- Each connection line shown in Figure 3 uses the CXL protocol to realize interconnection sharing of IO, cache and memory.
- taking the memory sharing of each computing device as an example, the schematic diagram of memory sharing among the computing devices is shown in Figure 4.
- when any host or computing device accesses the memory of a certain computing device, it is as if it were accessing its own memory.
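CXL's hardware cache coherence cannot be demonstrated from user space, but the zero-copy access pattern it enables can be illustrated with OS shared memory: the "device" side attaches to the host's buffer and reads it in place instead of receiving a copy. This is an analogy only, not the patent's mechanism:

```python
import numpy as np
from multiprocessing import shared_memory

# Host side: publish the training data once into a shared region.
host_data = np.arange(6, dtype=np.float32)
shm = shared_memory.SharedMemory(create=True, size=host_data.nbytes)
np.ndarray(host_data.shape, dtype=host_data.dtype, buffer=shm.buf)[:] = host_data

# "Computing device" side: attach by name and read in place (no payload copy).
peer = shared_memory.SharedMemory(name=shm.name)
device_view = np.ndarray(host_data.shape, dtype=np.float32, buffer=peer.buf)
total = float(device_view.sum())

del device_view  # release the buffer export before closing
peer.close()
shm.close()
shm.unlink()
```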
- this embodiment uses the Adabelief optimization algorithm to solve the problem of excessive learning-rate variance caused by insufficient data in the early stage of training, to achieve faster convergence on various deep learning tasks, and to avoid prematurely falling into a local optimum.
- a heterogeneous computing system implementing the distributed Rectified-Adabelief optimization algorithm is built based on the CXL communication protocol, and the Rectified-Adabelief optimization algorithm is implemented based on the OneAPI programming model so that it can run on a variety of heterogeneous computing devices. This achieves memory consistency between heterogeneous computing devices, greatly increases data transmission bandwidth, and reduces data interaction delays between computing devices.
- a data processing system provided by an embodiment of the present application is introduced below.
- the data processing system described below and the data processing method described above can be referred to each other.
- the embodiment of the present application discloses a data processing system, including: a host, and a hardware computing platform connected to the host through the CXL protocol;
- the host is used to provide the training data for training the target model, and to share, based on the CXL protocol, the new model trained by the hardware computing platform; and
- the hardware computing platform is used to share the training data in the host based on the CXL protocol; call the target model to process the training data to obtain training results, and calculate new parameters of the target model based on the training results; update the target model with the new parameters to obtain a new model; and, if the new model meets the convergence conditions, retain the new model. Calculating the new parameters includes: determining the current value of the moment moving average, adjusting the learning rate based on the current value of the moment moving average, and calculating the new parameters based on the adjusted learning rate.
- the hardware computing platform is specifically used for:
- the warmup strategy is used to adjust the learning rate; otherwise, the stochastic gradient descent and momentum algorithms are used to adjust the learning rate.
- the hardware computing platform is specifically used for:
- the first formula is:
- ρt is the current value of the moment moving average
- ρ∞ is the maximum value of the moment moving average
- t represents the current training moment
- β2 is the target attenuation coefficient
- the hardware computing platform is specifically used for:
- the hardware computing platform is specifically used for:
- the hardware computing platform is specifically used for:
- the hardware computing platform is specifically used for:
- the hardware computing platform includes multiple computing modules, and each computing module shares memory based on the CXL protocol.
- the computing module includes: any one or combination of CPU, GPU, FPGA, and ASIC.
- this embodiment provides a data processing system that can reduce data transmission overhead between the host and the hardware module and improve model training efficiency.
- An electronic device provided by an embodiment of the present application is introduced below.
- the electronic device described below and the data processing method and system described above may be cross-referenced.
- an electronic device including:
- One or more memories 501 for storing computer readable instructions
- One or more processors 502 are configured to execute computer-readable instructions to implement the methods disclosed in any of the above embodiments.
- the following introduces a non-volatile computer-readable storage medium provided by embodiments of the present application.
- the non-volatile computer-readable storage medium described below and the data processing method, system, and device described above may be cross-referenced.
- for the specific steps of the method, reference may be made to the corresponding content disclosed in the foregoing embodiments, which will not be described again here.
- RAM (random access memory)
- ROM (read-only memory)
- electrically programmable ROM, electrically erasable programmable ROM
- registers, hard disks, removable disks, CD-ROMs, or any other form of non-volatile computer-readable storage medium known in the technical field.
Abstract
The present application relates to a data processing method and system, a device, and a readable storage medium in the technical field of computers. In the present application, a host is connected to a hardware computing platform by means of a CXL protocol, such that the host and the hardware computing platform can share each other's memory, IO, and cache. In this way, training data does not need to be transmitted through storage media such as host memory, GPU cache, and GPU memory; instead, the training data in the host memory is directly read by the hardware computing platform, which reduces data transmission overhead. In addition, the hardware computing platform can adjust a learning rate on the basis of the current value of a moment moving average and then calculate new parameters of a model, such that the model parameters can be stabilized, model accuracy can be guaranteed, and training efficiency can be improved.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210387060.8 | 2022-04-14 | ||
CN202210387060.8A CN114461568B (zh) | 2022-04-14 | 2022-04-14 | 一种数据处理方法、系统、设备及可读存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023197520A1 true WO2023197520A1 (fr) | 2023-10-19 |
Family
ID=81418423
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/118104 WO2023197520A1 (fr) | 2022-04-14 | 2022-09-09 | Procédé et système de traitement de données, dispositif et support de stockage lisible |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114461568B (fr) |
WO (1) | WO2023197520A1 (fr) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117112466A (zh) * | 2023-10-25 | 2023-11-24 | 浪潮(北京)电子信息产业有限公司 | 一种数据处理方法、装置、设备、存储介质及分布式集群 |
CN117785489A (zh) * | 2024-02-27 | 2024-03-29 | 苏州元脑智能科技有限公司 | 一种服务器及一种任务执行方法、装置和存储介质 |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114461568B (zh) * | 2022-04-14 | 2022-07-08 | 苏州浪潮智能科技有限公司 | 一种数据处理方法、系统、设备及可读存储介质 |
CN114925829A (zh) * | 2022-07-18 | 2022-08-19 | 山东海量信息技术研究院 | 一种神经网络训练方法、装置、电子设备及存储介质 |
CN115310566A (zh) * | 2022-10-12 | 2022-11-08 | 浪潮电子信息产业股份有限公司 | 分布式训练系统、方法、装置、设备及可读存储介质 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113312415A (zh) * | 2020-02-27 | 2021-08-27 | Sap欧洲公司 | 用于数据库操作的近存储器加速 |
US20210390414A1 (en) * | 2020-06-10 | 2021-12-16 | Nvidia Corporation | Accelerated training for neural network models |
CN114169534A (zh) * | 2021-12-09 | 2022-03-11 | 京东科技信息技术有限公司 | 分布式机器学习模型的训练方法、装置、设备及介质 |
CN114257386A (zh) * | 2020-09-10 | 2022-03-29 | 华为技术有限公司 | 检测模型的训练方法、系统、设备及存储介质 |
CN114461568A (zh) * | 2022-04-14 | 2022-05-10 | 苏州浪潮智能科技有限公司 | 一种数据处理方法、系统、设备及可读存储介质 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106991095B (zh) * | 2016-01-21 | 2021-09-28 | 阿里巴巴集团控股有限公司 | 机器异常的处理方法、学习速率的调整方法及装置 |
CN110033081A (zh) * | 2019-03-08 | 2019-07-19 | 华为技术有限公司 | 一种确定学习率的方法和装置 |
US20210142177A1 (en) * | 2019-11-13 | 2021-05-13 | Nvidia Corporation | Synthesizing data for training one or more neural networks |
CN113723692A (zh) * | 2021-09-02 | 2021-11-30 | 深圳前海微众银行股份有限公司 | 数据处理方法、装置、设备、介质及程序产品 |
- 2022-04-14: Chinese application CN202210387060.8A granted as patent CN114461568B (active)
- 2022-09-09: PCT application PCT/CN2022/118104 filed; published as WO2023197520A1 (status unknown)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113312415A (zh) * | 2020-02-27 | 2021-08-27 | Sap欧洲公司 | 用于数据库操作的近存储器加速 |
US20210390414A1 (en) * | 2020-06-10 | 2021-12-16 | Nvidia Corporation | Accelerated training for neural network models |
CN114257386A (zh) * | 2020-09-10 | 2022-03-29 | 华为技术有限公司 | 检测模型的训练方法、系统、设备及存储介质 |
CN114169534A (zh) * | 2021-12-09 | 2022-03-11 | 京东科技信息技术有限公司 | 分布式机器学习模型的训练方法、装置、设备及介质 |
CN114461568A (zh) * | 2022-04-14 | 2022-05-10 | 苏州浪潮智能科技有限公司 | 一种数据处理方法、系统、设备及可读存储介质 |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117112466A (zh) * | 2023-10-25 | 2023-11-24 | 浪潮(北京)电子信息产业有限公司 | 一种数据处理方法、装置、设备、存储介质及分布式集群 |
CN117112466B (zh) * | 2023-10-25 | 2024-02-09 | 浪潮(北京)电子信息产业有限公司 | 一种数据处理方法、装置、设备、存储介质及分布式集群 |
CN117785489A (zh) * | 2024-02-27 | 2024-03-29 | 苏州元脑智能科技有限公司 | 一种服务器及一种任务执行方法、装置和存储介质 |
CN117785489B (zh) * | 2024-02-27 | 2024-05-10 | 苏州元脑智能科技有限公司 | 一种服务器及一种任务执行方法、装置和存储介质 |
Also Published As
Publication number | Publication date |
---|---|
CN114461568B (zh) | 2022-07-08 |
CN114461568A (zh) | 2022-05-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2023197520A1 (fr) | Procédé et système de traitement de données, dispositif et support de stockage lisible | |
WO2021012869A1 (fr) | Procédé et dispositif de détermination de vitesse de transmission, appareil et support de stockage | |
CN105827537B (zh) | 一种基于quic协议的拥塞改进方法 | |
WO2020143304A1 (fr) | Procédé et appareil d'optimisation de la fonction de perte, dispositif informatique et support de stockage | |
US7353339B2 (en) | Adaptive caching | |
US9237107B2 (en) | Fair quantized congestion notification (FQCN) to mitigate transport control protocol (TCP) throughput collapse in data center networks | |
JP6433146B2 (ja) | 情報処理装置、システム、情報処理方法、コンピュータプログラム | |
US9680742B2 (en) | Packet output processing | |
WO2022198994A1 (fr) | Procédé et appareil de planification de mouvement de bras robotisé, ainsi que support de stockage lisible et bras robotisé | |
WO2015130404A1 (fr) | Mise en forme de paquets dans un processeur réseau | |
WO2018077236A1 (fr) | Procédé et système d'apprentissage automatique distribué | |
CN109818863A (zh) | 链路优先级设置方法及装置 | |
WO2015130403A1 (fr) | Planification de paquets dans un processeur réseau | |
JP2018110387A (ja) | リアルタイムライブ環境でのバッファに基づく帯域幅測定および適応的データ送信のための方法およびシステム | |
US10963386B2 (en) | Dynamically determining tracks to prestage from storage to cache by training a machine learning module | |
WO2021238274A1 (fr) | Procédé de mise à jour d'informations de gradient pour apprentissage profond distribué, et appareil associé | |
WO2022252546A1 (fr) | Procédé et dispositif de réglage d'informations, et support d'enregistrement | |
WO2024098953A1 (fr) | Procédé et appareil d'épissage de ligne de voie, et dispositif électronique et support de stockage | |
JP4616391B2 (ja) | 動的データプリフェッチのためのシステム及び方法 | |
JP4782082B2 (ja) | パケット処理装置、方法、およびプログラム | |
CN112383485A (zh) | 一种网络拥塞控制方法及装置 | |
WO2017000684A1 (fr) | Procédé de lecture de données, dispositif pair, dispositif de commande, et support de stockage | |
CN113902128B (zh) | 改善边缘设备利用效率的异步联邦学习方法、装置及介质 | |
WO2021115039A1 (fr) | Plateforme fpga, procédé d'évaluation de performance et d'optimisation de conception associé, et support de stockage | |
US10061726B2 (en) | Precision time management (PTM) for USB retimers that accurately adjusts timestamp fields of isochronous timestamp packets (ITPS) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22937158 Country of ref document: EP Kind code of ref document: A1 |