CN114970830A - Flexible communication method for accelerating data parallel distributed deep learning training - Google Patents

Flexible communication method for accelerating data parallel distributed deep learning training

Info

Publication number
CN114970830A
CN114970830A (application CN202210651078.4A)
Authority
CN
China
Prior art keywords
layer
neural network
parameters
completed
communication operation
Prior art date
Legal status
Pending
Application number
CN202210651078.4A
Other languages
Chinese (zh)
Inventor
马胜
侯翔
黎铁军
吴利舟
张建民
罗莉
蒋威
易啸
徐睿
王波
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date: 2022-06-09
Filing date: 2022-06-09
Publication date: 2022-08-30
Application filed by National University of Defense Technology
Priority to CN202210651078.4A
Publication of CN114970830A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multi Processors (AREA)

Abstract

A flexible communication method for accelerating data-parallel distributed deep learning training comprises the following steps. In the back-propagation and parameter-update phases: when every computing node has finished updating the parameters of its local DNN model, the ongoing communication operations used to synchronize the DNN model parameters of all nodes are suspended and the communication state is saved; the communication operation that synchronizes the first-layer neural network parameters of all nodes is then performed, the first-layer neural network is updated, and the next training cycle begins. In the forward-propagation phase of the next training cycle, as soon as computation of the first-layer activations begins, the synchronization operations left unfinished in the previous cycle for the parameters of the second layer through the last layer are started in order and the corresponding layers are updated. When the activation computation of the nth-layer neural network has finished and the (n+1)th-layer neural network has been updated, the activation computation of the (n+1)th layer begins. The method is simple in principle, easy to implement, and significantly improves the speed of data-parallel distributed training.

Description

Flexible communication method for accelerating data parallel distributed deep learning training
Technical Field
The invention relates generally to the technical field of distributed deep learning training, and in particular to a flexible communication method for accelerating data-parallel distributed deep learning training.
Background
Deep learning profoundly influences daily life and plays an indispensable role in fields such as speech recognition, machine translation, and image classification. Deep learning comprises two steps: training and inference. The size of the training data set and the complexity of the deep neural network (DNN) model are two important factors affecting the accuracy of deep learning inference results.
In recent years, with the development of deep learning algorithms, DNN models have become increasingly complex. At the same time, the growth of the Internet has made it easy to acquire large-scale data sets for training DNN models. The surge in model parameters and training data improves the inference accuracy of DNN models, but it also places higher demands on the computational and memory resources of the deep learning training system. Against this background, continuing to train on a single-accelerator system will become increasingly inefficient and will fall far short of the needs of academia and industry.
To address the low training efficiency caused by the rapid growth of model parameters and training data, practitioners have proposed training DNN models with distributed deep learning training systems. In a distributed training system, the training data and the DNN model are distributed across multiple single-accelerator systems, which then perform the training task collectively. This training mode not only lowers the storage and compute requirements placed on each single-accelerator system, but also markedly shortens the time needed to train the DNN model through the cooperative training of the single-accelerator systems.
Data parallelism is one of the most important modes of distributed training and is characterized as follows:
First, a randomly selected mini-batch of training data is divided evenly, and each portion is distributed to one of the single-accelerator systems in the distributed training system.
Second, each single-accelerator system stores a copy of the complete DNN model and trains it with its local data set; the sketch below illustrates this setup.
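The following minimal sketch is not part of the patent; the helper shard_minibatch and the batch shape are hypothetical and only show one randomly selected mini-batch being split evenly while every node keeps a full model replica.
```python
import numpy as np

def shard_minibatch(batch, num_nodes):
    """Split one randomly selected mini-batch evenly; shard i goes to node i."""
    assert len(batch) % num_nodes == 0, "an even split is assumed here"
    shard_size = len(batch) // num_nodes
    return [batch[i * shard_size:(i + 1) * shard_size] for i in range(num_nodes)]

# Each single-accelerator system holds a full copy of the DNN model
# and trains it on its own shard of the mini-batch.
minibatch = np.random.randn(64, 784)              # hypothetical 64-sample batch
shards = shard_minibatch(minibatch, num_nodes=4)  # one shard per node
assert all(len(s) == 16 for s in shards)
```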
At present, the conventional process of training a DNN model in the data-parallel mode is as follows:
First, in the forward-propagation phase, each single-accelerator system computes the activations of its local DNN model layer by layer from front to back.
Then, in the back-propagation phase, the gradients of each layer's neural network parameters (including weights and biases) are computed layer by layer from back to front, and the local DNN model is updated according to these gradients.
Finally, after the parameters of any nth-layer neural network have been updated, the parameter-update phase for that layer begins: the parameters are synchronized across the distributed training system through a communication operation to obtain the global parameters, and the DNN model is updated. After the model has been updated, the next training cycle begins. The sketch below summarizes this conventional loop.
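A hedged sketch of that loop follows; the names model, layer.backward, layer.update, loss_grad_fn, and allreduce are illustrative stand-ins rather than the patent's or any library's API, and the point is only that each layer's synchronization is serialized behind the previous one while nothing overlaps with the forward pass.
```python
def conventional_training_step(model, shard, loss_grad_fn, allreduce):
    # Forward propagation: activations are computed front to back.
    # No communication happens here, so the communication module is idle.
    activations = model.forward(shard)

    # Back propagation + parameter update: gradients are computed back to
    # front, and each layer's parameters are synchronized with a blocking
    # collective, so layer 1 is synchronized only after every deeper layer.
    grad = loss_grad_fn(activations)
    for layer in reversed(model.layers):
        grad, param_grad = layer.backward(grad)
        layer.update(allreduce(param_grad))  # blocks until all nodes contribute
    # Only after the last (layer-1) synchronization can the next cycle begin.
```
This serialization is exactly what leaves the communication module idle during forward propagation and the compute module idle while the layer-1 synchronization waits its turn.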
However, the conventional training process described above still has technical problems that directly affect training efficiency:
First: in the forward-propagation phase, only the computation operations of the neural network take place in the distributed training system and there is no communication between the single-accelerator systems, so the functional module responsible for communication is idle.
Second: in the back-propagation and parameter-update phases, the synchronization of the layer-1 neural network parameters cannot start immediately after those parameters have been computed; the communication for the parameters of the last layer down to the second layer must finish first, and during this period the functional module responsible for neural network computation in the distributed training system is idle.
In view of the foregoing, a method is needed that increases the overlap between computation operations and communication operations in data-parallel distributed training and thereby improves the speed of data-parallel distributed training.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the above problems in the prior art, the invention provides a flexible communication method for accelerating data-parallel distributed deep learning training that is simple in principle, easy to implement, and significantly improves the speed of data-parallel distributed training.
In order to solve the technical problems, the invention adopts the following technical scheme:
a flexible communication method for accelerating data parallel distributed deep learning training comprises the following steps:
in the back propagation phase and the parameter update phase of the distributed training loop: when the parameters of the local DNN models of all the computing nodes are updated, suspending the ongoing communication operation for synchronizing the DNN model parameters of all the computing nodes, and saving the communication state;
after the communication state is stored, synchronizing the communication operation of the first-layer neural network parameters of all the nodes, updating the first-layer neural network, and then starting the next training cycle;
in the forward propagation stage of the next training cycle, when the activation of the first layer of neural network is started to be calculated, the synchronous communication operation of the parameters from the second layer to the last layer of neural network which are not completed in the previous training cycle is started in sequence, and each layer of neural network is updated;
when the activation calculation of the nth layer of neural network is completed and the (n + 1) th layer of neural network is updated, starting the activation calculation of the (n + 1) th layer of neural network; wherein n is more than or equal to 1 and is less than the number of DNN model layers.
As a further improvement of the method of the invention: the distributed training cycle comprises a forward-propagation phase, a back-propagation phase, and a parameter-update phase; the forward-propagation phase completes the computation of the local DNN model's activations on each single-accelerator system; the back-propagation phase computes the parameters used to update the local DNN model; and the parameter-update phase synchronizes the local parameters of all single-accelerator systems through communication operations and updates the DNN model.
As a further improvement of the method of the invention: the parameters include weights and biases.
As a further improvement of the method of the invention: in the back-propagation and parameter-update phases, the parameters of the DNN model are computed in order from the last layer to layer 1, and after the parameters of any layer have been computed, the synchronization of that layer's parameters is started once the synchronization already in progress in the system has finished.
As a further improvement of the method of the invention: in the forward-propagation phase, before the activations of any layer of the neural network are computed, the synchronization of that layer's parameters from the previous training cycle has been completed, i.e., that layer of the model has been updated.
As a further improvement of the method of the invention: in the back-propagation and parameter-update phases, after the parameters of the layer-1 neural network have been computed, the ongoing synchronization of the DNN model parameters in the system is suspended and the communication state is saved.
As a further improvement of the method of the invention: after the communication has been suspended, the synchronization of the layer-1 neural network parameters is started and completed, and the layer-1 neural network is updated.
As a further improvement of the method of the invention: after the layer-1 neural network has been updated, the next training cycle starts; in the forward-propagation phase, computation of the layer-1 activations begins; at the same time, the synchronization of the layer-2 neural network parameters left unfinished in the previous training cycle is started, and once it completes, the layer-2 neural network is updated.
As a further improvement of the method of the invention: after the synchronization of the layer-2 neural network parameters has completed, the unfinished synchronization of the layer-3 neural network parameters from the previous training cycle is started and the layer-3 neural network is updated; and so on, until all parameter synchronizations left unfinished in the previous training cycle have completed and every layer of the neural network has been updated.
As a further improvement of the method of the invention: when the computation of the layer-1 activations has finished and the layer-2 neural network has been updated, computation of the layer-2 activations begins; and so on, until the activations of every layer of the neural network have been computed.
Compared with the prior art, the invention has the following advantages:
The flexible communication method for accelerating data-parallel distributed deep learning training is simple in principle, easy to implement, and markedly improves the speed of data-parallel distributed training; by executing the activation computations in parallel with the synchronization of the parameters left over from the previous training cycle, the invention greatly accelerates data-parallel distributed deep learning training.
Drawings
Fig. 1 is a schematic diagram of the training process of the forward-propagation phase of the first training cycle in a specific application example of the invention.
Fig. 2 is a schematic diagram of the training process of the forward-propagation phase from the second training cycle to the last training cycle in a specific application example of the invention.
Fig. 3 is a schematic diagram of the training process of the back-propagation and parameter-synchronization phases from the first training cycle to the penultimate training cycle in a specific application example of the invention.
Fig. 4 is a schematic diagram of the training process of the back-propagation and parameter-synchronization phases of the last training cycle in a specific application example of the invention.
Fig. 5 is a schematic flow chart of the method of the invention in a specific application.
Detailed Description
The invention will be described in further detail below with reference to the drawings and specific examples.
As shown in Fig. 5, the flexible communication method for accelerating data-parallel distributed deep learning training of the invention includes:
in the back-propagation and parameter-update phases of a distributed training cycle: when every computing node has finished updating the parameters (including weights and biases) of its local DNN model, suspending the ongoing communication operations used to synchronize the DNN model parameters of all nodes and saving the communication state;
after the communication state has been saved, starting and completing the communication operation that synchronizes the first-layer neural network parameters of all nodes, updating the first-layer neural network, and then starting the next training cycle;
in the forward-propagation phase of the next training cycle, beginning to compute the activations of the first-layer neural network while starting, in order, the synchronization operations for the parameters of the second layer through the last layer left unfinished in the previous training cycle, and updating each of those layers;
when the activation computation of the nth-layer neural network (1 ≤ n < the number of DNN model layers) has finished and the (n+1)th-layer neural network has been updated, starting the activation computation of the (n+1)th-layer neural network.
In a specific application example, a distributed training system composed of multiple single-accelerator systems performs data-parallel distributed training in cycles of three phases, executed in order: a forward-propagation phase, a back-propagation phase, and a parameter-update phase, where the parameters include weights and biases. The forward-propagation phase completes the computation of the local DNN model's activations on each single-accelerator system; the back-propagation phase computes the parameters used to update the local DNN model; and the parameter-update phase synchronizes the local parameters of all single-accelerator systems through communication operations and updates the DNN model.
In a specific application example, in the back-propagation and parameter-update phases of a distributed training cycle, the parameters of the DNN model are computed in order from the last layer to layer 1, and after the parameters of any layer have been computed, the synchronization of that layer's parameters is started once the synchronization already in progress in the system has finished.
In a specific application example, in the forward-propagation phase of a distributed training cycle, before the activations of any layer of the neural network are computed, the synchronization of that layer's parameters from the previous training cycle has been completed, i.e., that layer of the model has been updated.
In a specific application example, in the back-propagation and parameter-update phases of the distributed training cycle, after the parameters of the layer-1 neural network have been computed, the ongoing synchronization of the DNN model parameters in the system is suspended and the communication state is saved.
In a specific application example, after the communication has been suspended, the synchronization of the layer-1 neural network parameters is started and completed, and the layer-1 neural network is updated.
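The description leaves open how an in-flight synchronization is suspended and how its state is saved; the sketch below assumes one plausible realization, in which a layer's gradient is reduced chunk by chunk and the saved communication state is simply the index of the next chunk. The primitive allreduce_chunk is a hypothetical stand-in for the system's real collective.
```python
class ResumableSync:
    """Synchronization of one layer's gradient that can be paused and resumed."""

    def __init__(self, layer_id, grad_chunks, allreduce_chunk):
        self.layer_id = layer_id
        self.chunks = grad_chunks        # the layer's gradient, split into chunks
        self.allreduce_chunk = allreduce_chunk
        self.next_chunk = 0              # the saved communication state
        self.suspended = False

    def run(self):
        """Reduce chunks in order until finished or until suspend() is called."""
        while self.next_chunk < len(self.chunks) and not self.suspended:
            i = self.next_chunk
            self.chunks[i] = self.allreduce_chunk(self.chunks[i])
            self.next_chunk = i + 1      # progress survives a suspension
        return self.done()

    def suspend(self):
        # Called once the layer-1 parameters are ready: the ongoing
        # synchronization stops after the current chunk, and next_chunk
        # records where to resume in the next cycle's forward phase.
        self.suspended = True

    def resume(self):
        self.suspended = False
        return self.run()

    def done(self):
        return self.next_chunk == len(self.chunks)
```
Under this assumption, suspending after the layer-1 parameters are ready costs at most one extra chunk of communication, and the forward phase of the next cycle can resume each layer exactly where it stopped.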
In a specific application example, after the layer-1 neural network has been updated, the next training cycle starts. In the forward-propagation phase, computation of the layer-1 activations begins; at the same time, the synchronization of the layer-2 neural network parameters left unfinished in the previous training cycle (if any) is started, and the layer-2 neural network is updated.
In a specific application example, after the synchronization of the layer-2 neural network parameters has completed, the unfinished synchronization of the layer-3 neural network parameters from the previous training cycle (if any) is started and the layer-3 neural network is updated. And so on, until all parameter synchronizations left unfinished in the previous training cycle (if any) have completed and every layer of the neural network has been updated.
In a specific application example, when the computation of the layer-1 activations has finished and the layer-2 neural network has been updated, computation of the layer-2 activations begins. And so on, until the activations of every layer of the neural network have been computed.
In a specific application example, the detailed process of the invention is as follows:
as shown in fig. 1, in a specific application example, the training process of the forward propagation stage of the first training cycle is as follows: and calculating and activating layer by layer from the neural network of the layer 1 to the neural network of the last layer, wherein synchronous communication operation is not performed in the training process.
As shown in Fig. 2, in a specific application example, the training process of the forward-propagation phase from the second training cycle to the last training cycle is as follows:
For the communication operations: first, the unfinished synchronization of the layer-2 neural network parameters from the previous training cycle is started and the layer-2 neural network is updated; then the synchronization of the layer-3 neural network parameters is started and the layer-3 neural network is updated; and so on, until all synchronizations left unfinished in the previous training cycle have completed and every layer of the neural network has been updated.
For the computation operations: first, the activations of the layer-1 neural network are computed; then, when that computation has finished and the layer-2 neural network has been updated, computation of the layer-2 activations begins; and so on, i.e., when the activation computation of the nth-layer neural network (1 < n < the number of DNN model layers) has finished and the (n+1)th-layer neural network has been updated, computation of the (n+1)th-layer activations begins, until the activations of every layer of the DNN model have been computed. The sketch below illustrates how these two streams of operations interleave.
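A hedged sketch of this forward phase follows; it assumes the ResumableSync-style handles sketched earlier plus illustrative layer.forward and layer.apply_update methods that are not defined by the patent. A background thread finishes the leftover synchronizations of layers 2 through L strictly in order, while the compute loop waits only for the one layer it is about to evaluate.
```python
import threading

def overlapped_forward(layers, pending_syncs, x):
    """Forward phase of any training cycle after the first.

    layers[0] (layer 1) was updated before this cycle began;
    pending_syncs[l] is the suspended synchronization of layers[l] for l >= 1.
    """
    updated = [threading.Event() for _ in layers]
    updated[0].set()                     # layer 1 is already up to date

    def drain_pending():
        # Finish the leftover synchronizations of layers 2..L strictly in
        # order, updating each layer as soon as its global gradient arrives.
        for l in range(1, len(layers)):
            pending_syncs[l].resume()    # continues from the saved state
            layers[l].apply_update(pending_syncs[l].chunks)
            updated[l].set()

    comm = threading.Thread(target=drain_pending)
    comm.start()

    for l, layer in enumerate(layers):
        updated[l].wait()                # a layer must be updated before its activation
        x = layer.forward(x)             # the previous iteration produced this layer's input
    comm.join()
    return x
```
Waiting on updated[l] enforces the rule that the activation of layer n+1 is computed only after layer n's activation is finished and layer n+1 has been updated.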
As shown in Fig. 3, in a specific application example, the training process of the back-propagation and parameter-synchronization phases from the first training cycle to the penultimate training cycle is as follows:
For the computation operations: the local DNN model parameters are computed layer by layer from the last-layer neural network to the layer-1 neural network.
For the communication operations: when the parameters of the local nth-layer neural network (1 < n ≤ the number of DNN model layers) have been computed and the synchronization currently in progress in the system has finished, the synchronization of the nth-layer parameters is started; after the parameters of the local layer-1 neural network have been computed, the ongoing communication in the system is suspended and the communication state is saved; once the state has been saved, the synchronization of the layer-1 parameters is started and completed, and the layer-1 neural network is updated. The sketch below illustrates this schedule.
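The following sketch shows that schedule in code; layer.backward, layer.apply_update, start_sync, and sync_now are illustrative assumptions (start_sync could, for example, create the ResumableSync handles sketched above and queue them behind the synchronization already in progress), and only the ordering described here is intended to be faithful.
```python
def backward_with_flexible_comm(layers, loss_grad, start_sync, sync_now):
    """Back-propagation / parameter-synchronization phase (all cycles but the last).

    Returns the handles of the synchronizations that were suspended, so the
    next cycle's forward phase can finish them in layer order.
    """
    in_flight = []                       # per-layer syncs, queued in launch order
    grad = loss_grad

    # Local parameters are computed layer by layer from the last layer to layer 1.
    for l in range(len(layers) - 1, -1, -1):
        grad, param_grad = layers[l].backward(grad)

        if l > 0:
            # Layer n (n > 1): its synchronization starts once the one
            # already in progress in the system has finished.
            in_flight.append(start_sync(l, param_grad))
        else:
            # Layer 1: suspend whatever is still communicating and save its
            # state, then synchronize and update layer 1 immediately so the
            # next training cycle can begin.
            suspended = [s for s in in_flight if not s.done()]
            for s in suspended:
                s.suspend()
            layers[0].apply_update(sync_now(param_grad))
            return suspended
```
In the last training cycle there is nothing to hand over to a following cycle, so no synchronization is suspended and every layer is synchronized to completion.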
As shown in Fig. 4, in a specific application example, the training process of the back-propagation and parameter-synchronization phases of the last training cycle is as follows:
For the computation operations: the local DNN model parameters are computed layer by layer from the last-layer neural network to the layer-1 neural network.
For the communication operations: when the parameters of the local nth-layer neural network (1 ≤ n ≤ the number of DNN model layers) have been computed and the synchronization currently in progress in the system has finished, the synchronization of the nth-layer parameters is started, and the update of the DNN model parameters is finally completed.
As shown in Fig. 5, in the forward-propagation phase, the synchronizations of the parameters of the second layer through the last layer left unfinished in the previous training cycle are performed in order. In the back-propagation and parameter-synchronization phases, as soon as the parameters of any layer of the local DNN model have been computed, the synchronization of that layer's parameters can be started; when the layer-1 neural network parameters have been computed, the ongoing communication is suspended and the communication state is saved; finally, the synchronization of the layer-1 parameters is performed.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above-mentioned embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may be made by those skilled in the art without departing from the principle of the invention.

Claims (10)

1. A flexible communication method for accelerating data-parallel distributed deep learning training, characterized by comprising the following steps:
in the back-propagation and parameter-update phases of a distributed training cycle: when every computing node has finished updating the parameters of its local DNN model, suspending the ongoing communication operations used to synchronize the DNN model parameters of all computing nodes, and saving the communication state;
after the communication state has been saved, performing the communication operation that synchronizes the first-layer neural network parameters of all nodes, updating the first-layer neural network, and then starting the next training cycle;
in the forward-propagation phase of the next training cycle, when computation of the first-layer activations begins, starting in order the synchronization operations left unfinished in the previous training cycle for the parameters of the second layer through the last layer, and updating each of those layers;
when the activation computation of the nth-layer neural network has finished and the (n+1)th-layer neural network has been updated, starting the activation computation of the (n+1)th-layer neural network, wherein n ≥ 1 and n is less than the number of DNN model layers.
2. The flexible communication method for accelerating data-parallel distributed deep learning training according to claim 1, wherein the distributed training cycle comprises a forward-propagation phase, a back-propagation phase, and a parameter-update phase; the forward-propagation phase completes the computation of the local DNN model's activations on each single-accelerator system; the back-propagation phase computes the parameters used to update the local DNN model; and the parameter-update phase synchronizes the local parameters of all single-accelerator systems through communication operations and updates the DNN model.
3. The flexible communication method for accelerating data-parallel distributed deep learning training according to claim 2, wherein the parameters include weights and biases.
4. The flexible communication method for accelerating data-parallel distributed deep learning training according to any one of claims 1 to 3, wherein in the back-propagation and parameter-update phases the parameters of the DNN model are computed in order from the last layer to layer 1, and after the parameters of any layer have been computed, the synchronization of that layer's parameters is started once the synchronization already in progress in the system has finished.
5. The flexible communication method for accelerating data-parallel distributed deep learning training according to any one of claims 1 to 3, wherein in the forward-propagation phase, before the activations of any layer of the neural network are computed, the synchronization of that layer's parameters from the previous training cycle has been completed, i.e., that layer of the model has been updated.
6. The flexible communication method for accelerating data-parallel distributed deep learning training according to any one of claims 1 to 3, wherein in the back-propagation and parameter-update phases, after the parameters of the layer-1 neural network have been computed, the ongoing synchronization of the DNN model parameters in the system is suspended and the communication state is saved.
7. The flexible communication method for accelerating data-parallel distributed deep learning training according to claim 6, wherein after the communication has been suspended, the synchronization of the layer-1 neural network parameters is started and completed, and the layer-1 neural network is updated.
8. The flexible communication method for accelerating data-parallel distributed deep learning training according to claim 7, wherein after the layer-1 neural network has been updated, the next training cycle starts; in the forward-propagation phase, computation of the layer-1 activations begins; at the same time, the synchronization of the layer-2 neural network parameters left unfinished in the previous training cycle is started, and once it completes, the layer-2 neural network is updated.
9. The flexible communication method for accelerating data-parallel distributed deep learning training according to claim 8, wherein after the synchronization of the layer-2 neural network parameters has completed, the unfinished synchronization of the layer-3 neural network parameters from the previous training cycle is started and the layer-3 neural network is updated; and so on, until all parameter synchronizations left unfinished in the previous training cycle have completed and every layer of the neural network has been updated.
10. The flexible communication method for accelerating data-parallel distributed deep learning training according to claim 9, wherein when the computation of the layer-1 activations has finished and the layer-2 neural network has been updated, computation of the layer-2 activations begins; and so on, until the activations of every layer of the neural network have been computed.
CN202210651078.4A 2022-06-09 2022-06-09 Flexible communication method for accelerating data parallel distributed deep learning training Pending CN114970830A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210651078.4A CN114970830A (en) 2022-06-09 2022-06-09 Flexible communication method for accelerating data parallel distributed deep learning training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210651078.4A CN114970830A (en) 2022-06-09 2022-06-09 Flexible communication method for accelerating data parallel distributed deep learning training

Publications (1)

Publication Number Publication Date
CN114970830A 2022-08-30

Family

ID=82960635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210651078.4A Pending CN114970830A (en) 2022-06-09 2022-06-09 Flexible communication method for accelerating data parallel distributed deep learning training

Country Status (1)

Country Link
CN (1) CN114970830A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115600687A (en) * 2022-11-08 2023-01-13 北京百度网讯科技有限公司(Cn) Model training method, device, equipment and storage medium
CN116596091A (en) * 2022-11-08 2023-08-15 北京百度网讯科技有限公司 Model training method, device, equipment and storage medium
CN116596091B (en) * 2022-11-08 2024-02-02 北京百度网讯科技有限公司 Model training method, device, equipment and storage medium
CN116258197A (en) * 2023-05-16 2023-06-13 之江实验室 Distributed training acceleration method and system based on parameter calculation and communication scheduling
CN116258197B (en) * 2023-05-16 2023-09-08 之江实验室 Distributed training acceleration method and system based on parameter calculation and communication scheduling

Similar Documents

Publication Publication Date Title
CN114970830A (en) Flexible communication method for accelerating data parallel distributed deep learning training
CN109902818B (en) Distributed acceleration method and system for deep learning training task
CN114756383B (en) Distributed computing method, system, equipment and storage medium
Yu et al. LLR: Learning learning rates by LSTM for training neural networks
CN110533183B (en) Task placement method for heterogeneous network perception in pipeline distributed deep learning
CN109257429A (en) A kind of calculating unloading dispatching method based on deeply study
CN111858009A (en) Task scheduling method of mobile edge computing system based on migration and reinforcement learning
CN106156810A (en) General-purpose machinery learning algorithm model training method, system and calculating node
CN111274036A (en) Deep learning task scheduling method based on speed prediction
CN111882060A (en) Single-step delay stochastic gradient descent training method for machine learning
CN114356578B (en) Parallel computing method, device, equipment and medium for natural language processing model
CN111860828A (en) Neural network training method, storage medium and equipment
CN114237869B (en) Ray double-layer scheduling method and device based on reinforcement learning and electronic equipment
CN112686383B (en) Method, system and device for reducing distributed random gradient of communication parallelism
CN115293342A (en) Deep convolutional neural network parallel training method based on hybrid parallel
CN113159287A (en) Distributed deep learning method based on gradient sparsity
CN115186806A (en) Distributed graph neural network training method supporting cross-node automatic differentiation
CN111831354A (en) Data precision configuration method, device, chip array, equipment and medium
KR102463147B1 (en) Massively parallel deep learning method and apparatus
CN117331700B (en) Computing power network resource scheduling system and method
CN114186671A (en) Large-batch decentralized distributed image classifier training method and system
CN117744759A (en) Text information identification method and device, storage medium and electronic equipment
CN112862083A (en) Deep neural network inference method and device under edge environment
CN116339942A (en) Self-adaptive scheduling method of distributed training task based on reinforcement learning
US11928598B2 (en) Method and system for distributed neural network training

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination