WO2022042741A1 - Learning model training method, working node, server, device and medium - Google Patents

Learning model training method, working node, server, device and medium

Info

Publication number
WO2022042741A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
training
statistical parameters
model
current
Prior art date
Application number
PCT/CN2021/115544
Other languages
French (fr)
Chinese (zh)
Inventor
徐茂轩
吴臻志
祝夭龙
Original Assignee
北京灵汐科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京灵汐科技有限公司
Publication of WO2022042741A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Definitions

  • the invention relates to the technical field of deep learning, in particular to a deep learning model training method, a working node, a parameter server, an electronic device and a readable storage medium.
  • multiple worker nodes can usually be used to train the same model.
  • different worker nodes are responsible for training different training layers in the same model.
  • in this case, the next training layer must wait for the training of the previous training layer to complete before it can proceed.
  • this waiting time greatly increases the total time for model training, thereby reducing the efficiency of model training.
  • Embodiments of the present invention provide a deep learning model training method, a working node and a parameter server, so as to solve the problem of low model training efficiency in the process of using multiple working nodes to train the same deep learning model in the related art.
  • an embodiment of the present invention provides a deep learning model training method, which is applied to a worker node.
  • the method includes multiple training cycles, and each training cycle includes:
  • the multiple layers of the target model are trained according to the target batch training samples, wherein the multiple layers of the target model include at least one target layer requiring batch normalization; starting from the second training cycle, training the target layer in each training cycle includes:
  • receiving the historical global statistical parameters sent by the parameter server, wherein the historical global statistical parameters are determined by the parameter server according to the historical training data of the current target layer of the target model, and the historical training data includes the target statistical parameters obtained by the current working node in the training cycles before the current training cycle, and the target statistical parameters obtained, in the training cycles before the current training cycle, by other working nodes belonging to the same working network as the current working node;
  • obtaining the target statistical parameters of the current target layer, wherein the target statistical parameters are the statistical parameters of the target batch training samples;
  • determining the actual statistical parameters of the current target layer based on the historical global statistical parameters and the target statistical parameters; and
  • performing batch normalization on the current target layer based on the actual statistical parameters and the target batch training samples, and sending the target statistical parameters to the parameter server.
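  • As an illustration of how such a correction might look, the sketch below merges the historical global statistical parameters (mean, variance and sample count) with the target statistical parameters of the current batch to form the actual statistical parameters. The patent does not prescribe a specific correction formula, so the weighted parallel-variance merge and all function and variable names here are assumptions.

```python
# Hypothetical sketch: combining historical global statistics with the
# statistics of the current target batch. The weighting scheme is an
# assumption; the method only requires that the target statistics be
# corrected using the historical global statistics.
def merge_stats(hist_mean, hist_var, hist_count,
                batch_mean, batch_var, batch_count):
    """Combine two (mean, variance, count) summaries into one."""
    if hist_count == 0:                      # nothing to correct with yet
        return batch_mean, batch_var, batch_count
    total = hist_count + batch_count
    delta = batch_mean - hist_mean
    mean = hist_mean + delta * batch_count / total
    # parallel combination of variances (Chan et al. style)
    m2 = (hist_var * hist_count + batch_var * batch_count
          + delta ** 2 * hist_count * batch_count / total)
    return mean, m2 / total, total
```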
  • an embodiment of the present invention further provides a deep learning model training method, which is applied to a parameter server, wherein the deep learning model training method includes multiple computing cycles, and in each computing cycle the method includes:
  • receiving the target statistical parameters for the target layer in the target training model sent by multiple working nodes, where the target statistical parameters of the target layer include the statistical parameters of the target batch training samples used by the working node that sent them when training the target layer in the training cycle corresponding to the current computing cycle, wherein a plurality of the working nodes belong to the same working network, and the target training model includes at least one target layer;
  • calculating, according to the received target statistical parameters, the historical global statistical parameters of the target layer corresponding to the current computing cycle in the target training model.
  • an embodiment of the present invention further provides a working node, including:
  • the sample acquisition module is used to acquire target batch training samples
  • a training module configured to train multiple layers of the target model according to the target batch training samples, wherein the multiple layers of the target model include at least one target layer requiring batch normalization, and the training module includes:
  • the first receiving unit is configured to receive the historical global statistical parameters sent by the parameter server, wherein the historical global statistical parameters are determined by the parameter server according to the historical training data of the target layer of the target model, and the historical training data includes the target statistical parameters obtained by the current working node in the training cycles before the current training cycle, and the target statistical parameters obtained, in the training cycles before the current training cycle, by other working nodes belonging to the same working network as the current working node;
  • a first obtaining unit configured to obtain the target statistical parameters of the currently trained target layer, wherein the target statistical parameters are the statistical parameters of the target batch training samples;
  • the determining unit is configured to determine the actual statistical parameters of the currently trained target layer based on the historical global statistical parameters and the target statistical parameters, to perform batch normalization on the corresponding target layer based on the actual statistical parameters and the target batch training samples, and to send the target statistical parameters to the parameter server.
  • an embodiment of the present invention further provides a parameter server, including:
  • the parameter receiving module is used to receive the target statistical parameters for the target layer in the target training model sent by the plurality of working nodes, where the target statistical parameters of the target layer include the statistical parameters of the target batch training samples used by the working node that sends them when training the target layer in the training cycle corresponding to the current computing cycle, wherein a plurality of the working nodes belong to the same working network, and the target training model includes at least one target layer;
  • the updating module is used to calculate, according to the received target statistical parameters, the historical global statistical parameters of the target layer corresponding to the current computing cycle in the target training model.
  • an embodiment of the present invention further provides an electronic device, including a processor, a memory, and a program or instruction stored in the memory and executable on the processor, where the program or instruction, when executed by the processor, implements the steps of the deep learning model training method described in the first aspect, or the steps of the deep learning model training method described in the second aspect.
  • an embodiment of the present invention further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or instruction is executed by a processor, the steps of the deep learning model training method described in the first aspect or the steps of the deep learning model training method described in the second aspect are implemented.
  • when calculating the historical global statistical parameters, each working node only needs to send the target statistical parameters obtained by its own calculation to the parameter server.
  • the parameter server can directly calculate and obtain the historical global statistical parameters according to the target statistical parameters sent by each working node.
  • the amount of data that needs to be transmitted is relatively small, which improves the efficiency of the deep learning model training method.
  • the result obtained after standardizing the sample data can be made more accurate.
  • Figure 1 is a schematic diagram of the system architecture of a deep learning model training method based on a mini-batch gradient descent method
  • FIG. 2 is a flowchart of an embodiment of the deep learning model training method provided by the first aspect of the present invention
  • FIG. 3 is a flowchart of another embodiment of the deep learning model training method provided by the first aspect of the present invention.
  • FIG. 4 is a flowchart of an embodiment of the deep learning model training method provided by the second aspect of the present disclosure.
  • FIG. 5 is a flowchart of another embodiment of the deep learning model training method provided by the second aspect of the present disclosure.
  • FIG. 6 is a schematic diagram of data interaction between a working node and a parameter server in a deep learning model training method provided by an embodiment of the present invention
  • FIG. 7 is a structural diagram of a working node provided by an embodiment of the present invention.
  • FIG. 8 is a structural diagram of a parameter server provided by an embodiment of the present invention.
  • the model training is usually performed using the mini-batch gradient descent method.
  • when the mini-batch gradient descent method is used for model training, each iteration uses batch size samples (batch size: the number of samples selected for one training pass) to update the parameters.
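  • As a generic, minimal sketch of one mini-batch gradient descent iteration (not code from the patent), a single parameter update over batch size samples could look as follows; the gradient function and data arrays are placeholders.

```python
import numpy as np

def minibatch_sgd_step(params, batch_x, batch_y, grad_fn, lr=0.01):
    """One mini-batch update: move params against the batch-averaged gradient."""
    return params - lr * grad_fn(params, batch_x, batch_y)

# One iteration: draw batch_size samples, then update the parameters once.
# idx = np.random.choice(len(train_x), size=batch_size, replace=False)
# params = minibatch_sgd_step(params, train_x[idx], train_y[idx], grad_fn)
```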
  • the deep learning model cannot perform small batch calculations on a single worker node, and the training process usually needs to be performed on multiple worker nodes.
  • a network composed of multiple working nodes is hereinafter referred to as a working network.
  • a data-parallel training method for a large-scale deep learning model can be used, for example, the training network as shown in Figure 1 can be used.
  • The same model is placed on each working node 10 for training; the training data set is then divided (generating mini-batch training samples), and the divided mini-batch training samples 20 are distributed to different working nodes 10, so that each working node 10 performs model training based on the mini-batch training samples 20 assigned to it.
  • Each working node 10 either exchanges data with the parameter server 30 after training is completed and reports the training results, or interacts with the parameter server 30 during the training process to standardize the batch training samples (hereinafter referred to as batch normalization), which processes each batch normalization layer in the forward propagation of the training process; the statistical parameters are also used during backpropagation.
  • a single worker node can independently train a deep learning model based on a small batch of training samples and synchronize the updated data to the parameter server; this embodiment is only applicable to small deep learning training models.
  • each worker node uses the same model, and retrieves data from the database for training, respectively.
  • Together, the worker nodes complete mini-batch training with batch size samples (that is, batch size is the sum of the numbers of training samples trained by all worker nodes in one model iteration).
  • when a network layer needs to compute global statistics over all batch size samples (the batch normalization layer, batch norm, is used as an example below), each working node needs to synchronize the data of this layer to the parameter server.
  • after the parameter server completes the calculation of the statistical values, they are synchronized to each working node.
  • the parameter server needs to collect statistics on the training data of all working nodes, so that each working node can correct and calculate the current data of this layer based on the statistical results, and batch normalize the corresponding target layer using the corrected statistical values.
  • in this case, the worker node needs to perform the following waiting process: after reporting the data of this layer, it must wait for the parameter server to collect the statistics of all working nodes and return the results before it can continue training.
  • the present application can solve the problem of low model training efficiency in the process of using the mini-batch gradient descent method for model training.
  • FIG. 2 is a flowchart of a deep learning model training method provided by an embodiment of the present invention, and the method is applied to a worker node.
  • the deep learning model training method includes multiple training cycles, as shown in Figure 2, and each training cycle may include the following steps:
  • Step S210: target batch training samples are obtained.
  • Step S220: multiple layers of the target model are trained according to the target batch training samples.
  • the multiple layers of the target model include at least one target layer that needs batch normalization. Accordingly, starting from the second training cycle, training the target layer in each training cycle includes:
  • Step S221: the historical global statistical parameters sent by the parameter server are received, wherein the historical global statistical parameters are determined by the parameter server according to the historical training data of the current target layer of the target model, and the historical training data includes the target statistical parameters obtained by the current working node in the training cycles before the current training cycle, and the target statistical parameters obtained, in the training cycles before the current training cycle, by other working nodes belonging to the same working network as the current working node;
  • Step S222: the target statistical parameters of the current target layer are obtained, wherein the target statistical parameters include the statistical parameters of the target batch training samples;
  • Step S223: the actual statistical parameters of the current target layer are determined based on the historical global statistical parameters and the target statistical parameters;
  • Step S224: batch normalization is performed on the current target layer based on the actual statistical parameters and the target batch training samples, and the target statistical parameters are sent to the parameter server.
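  • A minimal sketch of one training cycle on a worker node, following steps S210 to S224, is given below. The param_server client and its get_global_stats / push_stats methods, as well as the model and dataset interfaces, are hypothetical placeholders, and merge_stats is the correction sketch shown earlier.

```python
def run_training_cycle(model, param_server, dataset, batch_size):
    batch = dataset.sample(batch_size)                    # S210: target batch training samples
    x = batch.inputs
    for layer in model.layers:                            # S220: forward training, layer by layer
        if layer.needs_batch_norm:                        # a target layer
            hist = param_server.get_global_stats(layer.id)        # S221: historical global stats
            b_mean, b_var = x.mean(axis=0), x.var(axis=0)         # S222: target statistics
            mean, var, _ = merge_stats(hist.mean, hist.var, hist.count,
                                       b_mean, b_var, len(x))     # S223: actual statistics
            param_server.push_stats(layer.id, b_mean, b_var, len(x))  # S224: report target stats
            x = layer.batch_norm(x, mean, var)                        # S224: normalize with actual stats
        else:
            x = layer.forward(x)
    return x
```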
  • the training network for training the target model needs to perform multiple iterations of training on the target model; in each iteration of training (that is, each training cycle), when each working node in the training network trains forward to the target normalization layer (i.e., the target layer above), it performs statistical calculation on the currently used training samples to obtain the target statistical parameters, and reports the target statistical parameters to the parameter server.
  • the target statistical parameter includes the statistical value and the number of samples of the training samples used by the worker nodes that obtain the target statistical parameter.
  • the parameter server may calculate and obtain the historical global statistical parameters according to the received target statistical parameters of each working node belonging to the same working network. In this way, the historical global statistical parameters calculated and obtained by the parameter server can reflect the statistical characteristics of all historical training samples.
  • the historical global statistical parameters are the historical statistical parameters of the sample data that have been trained on the target normalization layer.
  • the working node combines the target statistical parameters of the target batch training samples of the target normalization layer with the historical global statistical parameters to obtain the actual statistical parameters of this normalization layer.
  • when calculating the historical global statistical parameters, each worker node only needs to send the target statistical parameters obtained by its own calculation to the parameter server.
  • the parameter server can directly calculate and obtain the historical global statistical parameters according to the target statistical parameters sent by each working node. During the entire computing process, the amount of data that needs to be transmitted is relatively small, which improves the efficiency of the deep learning model training method. Moreover, by using the historical global statistical parameters to correct the target statistical parameters of the current target layer, the result obtained after standardizing the sample data can be made more accurate.
  • how to train the target layer in the first training cycle is not particularly limited.
  • the target statistical parameters of the target layer in the first cycle and the target batch training samples can be directly used to batch normalize the current target layer.
  • the present disclosure is not limited to this; the target statistical parameters of the target layer in the first cycle can also be corrected using preset values to obtain the actual statistical parameters of the first training cycle, and then the current target layer can be batch normalized using the actual statistical parameters and the target batch training samples.
  • the parameter server can be used to calculate and obtain global statistical parameters according to the target batch training samples of each worker node, and then use the global statistical parameters and the target batch training samples to perform batch normalization on the current target layer.
  • the target model may include multiple normalization layers, and the parameter server may update the historical global statistical parameters of each normalization layer respectively.
  • each worker node can obtain the historical global statistical parameters corresponding to the m1 normalization layer from the parameter server when training forward to the m1 normalization layer, and can obtain the historical global statistical parameters of the m2 normalization layer from the parameter server when training forward to the m2 normalization layer, where the normalization layers in the target model include the m1 normalization layer and the m2 normalization layer.
  • the above-mentioned historical global statistical parameters may be a statistical value or a sequence of statistical values obtained by performing one or more calculations, such as variance, summation, and integration, on the historical data; the historical global statistical parameters may also include the number of training samples for which training has been completed, wherein the statistical value sequence may include multiple statistical values, for example, a variance value and a summation value.
  • the historical global statistical parameters may be the historical statistical parameters received by the working node before training the target layer (that is, received before the current training cycle); they are not statistics of the training samples currently used for the target layer.
  • the above-mentioned target layer may be a target batch normalization layer
  • the target batch training samples may be a sample set including at least one training sample, and a working node applying the deep learning model training method provided by the present application trains the target layer of the target model based on the training samples in the set; that is, the target batch training samples may also be referred to as the current training samples for the training of the target layer.
  • the above-mentioned training of the target layer of the target model according to the target batch training samples can also be understood as: the target batch training samples are trained forward to the target layer of the target model.
  • the above-mentioned target statistical parameter may be a statistical value or a series of statistical values obtained by performing statistical calculation on the target batch of training samples, and the statistical calculation method may be the same as the statistical calculation method performed by the parameter server on the historical training data. This will not be repeated here.
  • before training multiple layers of the target model according to the target batch training samples, the worker nodes also need to obtain the target batch data.
  • this ensures the globality of the target batch data, that is, global data information can be used in the training process.
  • the target batch of training samples may be obtained from a database in a random sampling manner.
  • sampling with replacement can be adopted.
  • that is, after sampling, the target batch training samples are not deleted from the database, and other worker nodes can also randomly select them.
  • in other words, repeated selection of the same training samples can be adopted.
  • the target batch training samples arranged at preset positions may also be acquired from a database, wherein the training samples stored in the database are arranged in random order.
  • for example, the target batch training samples at the preset positions can be obtained by taking the training sample arranged at the first position and the N-1 training samples arranged after it, where N represents the number of training samples included in the batch of training samples.
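  • A minimal sketch of this preset-position strategy, assuming the database exposes the randomly ordered samples as an indexable sequence; the per-worker start offset in the usage comment is an assumption used only to keep the batches of different workers disjoint.

```python
def fetch_batch_at_position(database, start, n):
    """Return the n training samples stored at positions start .. start + n - 1.

    The samples are assumed to already be stored in random order, so a
    contiguous slice behaves like a randomly drawn batch.
    """
    return database[start:start + n]

# e.g. worker k could use start = k * n so that workers read disjoint batches
```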
  • the database can allocate training samples within it to assign different training samples to different worker nodes.
  • the parameter types of the historical global statistical parameters are the same as the parameter types of the target statistical parameters.
  • the historical global statistical parameters include the statistical value of the historical training samples and the number of samples contained in the historical training samples;
  • the target statistical parameters include the statistical value of the target batch training samples and the number of samples contained in the target batch training samples.
  • the above-mentioned number of samples may refer to the number of times the training samples have been used, that is, if the same sample has been trained n times, the number of samples is n.
  • the actual statistical parameters of the target layer may include statistical parameter values.
  • the above-mentioned determination of the actual statistical parameters of the target layer based on the historical global statistical parameters and the target statistical parameters may be to correct and calculate the target statistical parameters by using the historical global statistical parameters, to obtain the actual statistical parameters of the target layer.
  • the above-mentioned batch normalization correction formula can be adjusted according to information such as the variance of the statistical value of the target layer and the degree of deviation from the first statistical value.
  • the specific process of the batch normalization is the same as the batch normalization process of the batch normalization layer in the prior art and will not be repeated here.
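  • For reference, the standard batch normalization transform is shown below; in the method described here, the mean and variance are taken to be the actual (corrected) statistical parameters of the target layer rather than the statistics of the current mini-batch alone. The symbols follow the usual batch normalization notation and are not defined in the patent text.

```latex
\hat{x}_i = \frac{x_i - \mu_{\text{actual}}}{\sqrt{\sigma^{2}_{\text{actual}} + \epsilon}},
\qquad y_i = \gamma \hat{x}_i + \beta
```

  • Here μ_actual and σ²_actual are the actual statistical parameters, ε is a small constant for numerical stability, and γ and β are the learnable scale and shift parameters of the batch normalization layer.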
  • after acquiring the target statistical parameters, the worker node also sends the target statistical parameters to the parameter server, so that the parameter server updates the historical global statistical parameters according to the target statistical parameters and sends the updated historical global statistical parameters to each working node, realizing data synchronization between the working nodes; in this way, the updated historical global statistical parameters can be used in the next iteration to correct and calculate the target statistical parameters of that iteration cycle, until the model training is complete.
  • the historical global statistical parameters and the target statistical parameters may include the training sample statistical parameter value and the number of training samples, so that the communication volume between the worker nodes and the parameter server is small and the communication efficiency is improved.
  • the order in which the working node performs the two steps of batch normalizing the current target layer based on the actual statistical parameters and the target batch training samples, and sending the target statistical parameters to the parameter server, is not specifically limited.
  • each working node may store the actual statistical parameter in the working node, so as to use the actual statistical parameter for back propagation.
  • the working node receives the historical global statistical parameters determined by the parameter server according to the historical training data; when the working node trains the target layer of the target model based on the target batch training samples, it determines the actual statistical parameters of the target layer based on the received historical global statistical parameters and the target statistical parameters of the target batch training samples, and performs forward and back propagation training of the target model based on the actual statistical parameters.
  • in this way, the worker nodes do not need to wait for the parameter server to collect the statistical parameters of the training samples of all nodes, update the statistical parameters accordingly, and send them to each working node before they can perform forward and back propagation training; this greatly reduces the time for which worker nodes wait for the parameter server to issue statistical parameters, which can improve the training efficiency of the deep learning model.
  • the deep learning model training method is equivalent to a gradient descent method. After the data of the target layer is normalized, the parameters of the target model need to be trained by using the normalized data.
  • the model parameters obtained after training by each worker node (referred to as individual target model parameters for ease of description) are sent to the parameter server; the parameter server receives multiple sets of individual target model parameters, updates the overall parameters of the target model by combining all the received individual target model parameters, and sends the updated overall target model to each work node, so that each work node continues to train the updated target model based on different training samples.
  • each training cycle of the deep learning model training method may further include:
  • Step S230: sending the individual target model parameters obtained by training each layer at the current working node to the parameter server;
  • Step S240: before the next training cycle starts, receiving the updated target training model sent by the parameter server as the target training model for the next training cycle.
  • FIG. 4 is a flowchart of a deep learning model training method provided by the present application.
  • the deep learning model training method is applied to a parameter server.
  • the deep learning model training method includes multiple computing cycles, and in each computing cycle, the method may include the following steps:
  • Step S310: the target statistical parameters for the target layer in the target training model sent by the plurality of working nodes are received, where the target statistical parameters of the target layer include the statistical parameters of the target batch training samples used by the working node that sent them when training the target layer in the training cycle corresponding to the current computing cycle.
  • Step S320: the historical global statistical parameters of the target layer corresponding to the current computing cycle in the target training model are calculated according to the received target statistical parameters.
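  • The sketch below illustrates one possible parameter-server side of steps S310 and S320: the server accumulates the per-node target statistics for a target layer and folds them into the historical global statistics. The class and method names are assumptions, and merge_stats is the same hypothetical merge shown for the worker side.

```python
class LayerStatsState:
    """Hypothetical per-target-layer state kept on the parameter server."""

    def __init__(self):
        self.mean, self.var, self.count = 0.0, 0.0, 0     # historical global statistics

    def receive(self, node_mean, node_var, node_count):   # S310: one node's target statistics
        self.mean, self.var, self.count = merge_stats(
            self.mean, self.var, self.count,
            node_mean, node_var, node_count)               # S320: update the historical globals

    def global_stats(self):
        return self.mean, self.var, self.count            # values later synchronized to the nodes
```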
  • the above-mentioned working node may be a working node that trains the target model together with the parameter server, which may be a working node that executes the method shown in FIG. 2 .
  • the parameter server may send the historical global statistical parameters to all working nodes that train the target model.
  • the historical global statistical parameters have the same meaning as the historical global statistical parameters mentioned in the first aspect of the present disclosure, and are not repeated here.
  • the target statistical parameters received in the first calculation cycle are the target statistical parameters obtained by calculation in the first training cycle of the worker node.
  • the deep learning model training method provided by the second aspect of the present disclosure cooperates with the deep learning model training method provided by the first aspect of the present disclosure.
  • the first computing cycle in the deep learning model training method provided by the second aspect of the present disclosure corresponds to the first training cycle in the deep learning model training method provided by the first aspect of the present disclosure.
  • the historical global statistical parameters received in the second training cycle of the method provided by the first aspect are the historical global statistical parameters obtained in the first computing cycle of the method provided by the second aspect.
  • similarly, the second computing cycle in the method provided by the second aspect corresponds to the second training cycle in the method provided by the first aspect, and the historical global statistical parameters received in the third training cycle of the method provided by the first aspect are the historical global statistical parameters obtained in the second computing cycle of the method provided by the second aspect, and so on.
  • the target statistical parameters sent by each working node may be the statistical parameters of the target batch training samples used when training the target layer, which may have the same meaning as the target statistical parameters in the method embodiment shown in FIG. 2 . This will not be repeated here.
  • the target statistical parameters corresponding to different target layers are different, and the historical global statistical parameters corresponding to different target layers are also different.
  • the parameter server may actively deliver the historical global statistical parameters obtained by calculation to each working node; that is to say, the deep learning model training method includes, after step S320, the step of sending the historical global statistical parameters to each working node.
  • alternatively, after calculating and obtaining the historical global statistical parameters, the parameter server does not actively send the historical global statistical parameters to each working node, but sends the historical global statistical parameters only to a worker node that sends a historical global parameter acquisition request; that is to say, the deep learning model training method may further include, after step S320, the step of sending the historical global statistical parameters to a working node in response to receiving a global parameter acquisition request from that working node.
  • the above-mentioned receiving of the target statistical parameters sent by the working nodes when training the target layer of the target model based on different batches of training samples may mean that step S320 is performed only after the target statistical parameters sent by each working node when training the target layer of the target model have been received.
  • step S320 may specifically include:
  • in the case of receiving the target statistical parameters of the target layer corresponding to the current computing cycle sent by a preset number of working nodes, the historical global statistical parameters of the target layer corresponding to the current cycle are calculated based on the historical global statistical parameters obtained in the computing cycles before the current computing cycle and the preset number of received target statistical parameters, wherein the preset number is less than or equal to the total number of working nodes in the working network.
  • in this way, some margin is reserved among the working nodes: after a preset number of working nodes have submitted data to the parameter server, the model can be updated according to the submitted data, and data submitted by working nodes beyond the preset number is no longer waited for or received. This reduces stalls of the overall model training process caused by a certain worker node running slowly or crashing, which can improve the efficiency of model training.
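  • As an illustration of this partial-synchronization idea (one possible realization, not a prescribed one), the server could close a computing cycle as soon as a preset number of nodes have reported, ignoring later submissions for that cycle; the threshold handling below is an assumption.

```python
def collect_cycle(reports, preset_count, layer_state):
    """Fold the first preset_count node reports into the historical global statistics.

    reports is an iterable of (mean, var, count) tuples in arrival order;
    reports arriving after the threshold are ignored for this cycle, so one
    slow or crashed worker cannot stall the whole training process.
    """
    accepted = 0
    for mean, var, count in reports:
        if accepted >= preset_count:
            break                              # late data is no longer waited for
        layer_state.receive(mean, var, count)  # LayerStatsState from the earlier sketch
        accepted += 1
    return layer_state.global_stats()
```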
  • the above-mentioned updating of the historical global statistical parameters based on the target statistical parameters may be performing deviation correction on the historical global statistical parameters according to the target statistical parameters.
  • the parameter server may periodically execute steps S310 and S320, so as to synchronize the historical global statistical parameters of each batch normalization layer to each working node when updating the model data.
  • the parameter server may perform step S310 when obtaining the updated historical global statistical parameters.
  • the parameter server may send the updated historical global statistical parameters to each working node; specifically, when each working node runs forward to the target normalization layer, the updated historical global statistical parameters are respectively sent to that working node, wherein the target normalization layer is the normalization layer associated with the updated historical global statistical parameters.
  • in the process of training the batch normalization layer, it is not necessary for each worker node to send all training sample data to the parameter server and wait for the parameter server to perform global statistics before returning the statistical parameters.
  • instead, each worker node computes statistics on the batch normalization data it uses for the batch normalization layer and reports its own target statistical parameters, thereby reducing the amount of data interaction between the parameter server and each worker node.
  • the parameter server can send the historical global statistical parameters to each working node, so that the working node can correct the target statistical parameters according to the received historical global statistical parameters, obtain the actual statistical parameters it uses, and batch normalize the corresponding target layer using the actual statistical parameters and the target training samples, which can greatly reduce the waiting time of the worker nodes.
  • the deep model training method may further include:
  • Step S330: receiving the individual target model parameters, obtained by training each layer of the target model, sent by each working node;
  • Step S340: updating the target model of the current cycle according to the received individual target model parameters, so as to obtain the target model for the next training cycle of the working nodes.
  • the method includes the following processes:
  • Step 601: working node A obtains the target training data that currently needs to be trained from the database.
  • the target training data is the target batch training samples in the method embodiments shown in FIG. 2 and FIG. 3 .
  • Step 602: working node A trains the target model based on the target training data.
  • the target model can include multiple layers that require global batch size statistical parameters (such as batchnorm batch normalization layer, the following only takes batchnorm batch normalization layer as an example).
  • the global batch size statistical parameters include: variance, summation, average and other statistical values, which also include: the number of target training data.
  • Step 603: when working node A runs forward to the target batch normalization layer, it obtains the target statistical parameters and corrects them based on the historical global statistical parameters to obtain the actual statistical parameters.
  • working node A obtains the target training data of the target batch normalization layer, performs statistical calculation on the target training data to obtain the target statistical parameters, and then uses the historical global statistical parameters of the target batch normalization layer synchronized from the parameter server to correct the target statistical parameters of this working node, obtaining the statistical parameters currently used, that is, the actual statistical parameters.
  • these statistical parameters are retained on the local machine, as they need to be used during back propagation.
  • Step 604: working node A sends the target statistical parameters to the parameter server.
  • Step 605: the parameter server updates the stored historical global statistical parameters according to the target statistical parameters, and sends the updated historical global statistical parameters to each working node (including working node A).
  • the working node and the parameter server cooperate with each other to execute each process of the deep learning model training method as shown in FIG. 2 and FIG. 3 , and can achieve the same beneficial effect. To avoid repetition, details are not repeated here.
  • FIG. 7 is a structural diagram of a working node provided by an embodiment of the present application.
  • the working node 500 includes:
  • a sample acquisition module 510 configured to acquire target batch training samples
  • the training module 520 is configured to train multiple layers of the target model according to the target batch training samples, wherein the multiple layers of the target model include at least one target layer requiring batch normalization.
  • Training module 520 includes
  • the first receiving unit 521 is configured to receive the historical global statistical parameters sent by the parameter server, wherein the historical global statistical parameters are determined by the parameter server according to the historical training data of the target layer of the target model, and the historical training data includes the target statistical parameters obtained by the current working node in the training cycles before the current training cycle, and the target statistical parameters obtained, in the training cycles before the current training cycle, by other working nodes belonging to the same working network as the current working node;
  • the determining unit 522 is configured to determine the actual statistical parameters of the currently trained target layer based on the historical global statistical parameters and the target statistical parameters, and based on the actual statistical parameters and the target batch training samples, the corresponding target layer Batch normalization is performed, and the target statistical parameters are sent to the parameter server.
  • the "corresponding target layer” here refers to the target layer that belongs to the same training cycle as the actual statistical parameters and the target batch training samples and needs to use the above-mentioned actual statistical parameters.
  • the working node is used to execute the deep learning model training method provided in the first aspect of the present disclosure.
  • the principles and beneficial effects of the deep learning model training method have been described in detail above, and will not be repeated here.
  • the sample acquisition module 510 is configured to acquire the target batch training samples from the database in a random sampling manner.
  • the sample obtaining module 510 is configured to obtain the target batch training samples arranged at preset positions from a database, wherein the training samples stored in the database are arranged in random order.
  • the target statistical parameter includes a statistical value of the target batch of training samples and the number of samples of the target batch of training samples.
  • the worker node 500 may further include:
  • a sending module 530, configured to send the individual target model parameters obtained by training each layer at the current working node to the parameter server;
  • the model receiving module 540 is configured to receive the updated target training model sent by the parameter server as the target training model for the next training cycle.
  • FIG. 8 is a structural diagram of a parameter server provided by an embodiment of the present application.
  • the parameter server 600 includes:
  • the parameter receiving module 610 is configured to receive the target statistical parameters for the target layer in the target training model sent by the plurality of working nodes, where the target statistical parameters of the target layer include the statistical parameters of the target batch training samples used by the working node that sends them when training the target layer in the training cycle corresponding to the current computing cycle.
  • the updating module 620 calculates, according to the received target statistical parameters, the historical global statistical parameters of the target layer corresponding to the current calculation period in the target training model.
  • the parameter server 600 is configured to execute the deep learning model training method provided by the second aspect of the present disclosure.
  • the beneficial effects and working principles of the deep learning model training method provided by the second aspect of the present disclosure have been described in detail above and will not be repeated here.
  • the update module 620 may be specifically configured to, in the case of receiving the target statistical parameters of the target layer corresponding to the current computing cycle sent by a preset number of working nodes, calculate the historical global statistical parameters of the target layer corresponding to the current cycle based on the historical global statistical parameters obtained in the computing cycles before the current computing cycle and the preset number of received target statistical parameters, wherein the preset number is less than or equal to the total number of working nodes in the working network.
  • the parameter receiving module 610 is further configured to receive individual target model parameters sent by each working node and obtained by training each layer of the target model.
  • the updating module 620 is further configured to update the target model of the current cycle according to the received individual target model parameters.
  • Embodiments of the invention also provide an electronic device, including a processor, a memory, and a program or an instruction stored in the memory and executable on the processor; when the program or instruction is executed by the processor, the various processes of the deep learning model training method provided by the first aspect of the present disclosure or of the deep learning model training method provided by the second aspect of the present disclosure are implemented, and the same technical effects can be achieved, which are not repeated here to avoid repetition.
  • An embodiment of the present disclosure further provides a readable storage medium, wherein a program or an instruction is stored on the readable storage medium, and when the program or instruction is executed by a processor, the steps of the deep learning model training method according to the first aspect of the present disclosure or the steps of the deep learning model training method according to the second aspect of the present disclosure are implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A deep learning model training method, a working node, a parameter server, an electronic device and a readable medium, the method comprising: acquiring target batch training samples (S210); and training multiple layers of a target model according to the target batch training samples, wherein, starting from the second training cycle, the training of a target layer in each training cycle comprises: receiving historical global statistical parameters sent by the parameter server (S221), wherein the historical global statistical parameters are determined by the parameter server according to the historical training data of the current target layer of the target model, and the historical training data comprises target statistical parameters acquired by the current working node in training cycles prior to the current training cycle and target statistical parameters acquired by the other working nodes in training cycles prior to the current training cycle; acquiring the target statistical parameters of the current target layer (S222); determining the actual statistical parameters of the current target layer based on the historical global statistical parameters and the target statistical parameters (S223); and performing batch standardization on the current target layer based on the actual statistical parameters and the target batch training samples, and sending the target statistical parameters to the parameter server (S224).

Description

Learning model training method, working node, server, device and medium
Technical Field
The invention relates to the technical field of deep learning, in particular to a deep learning model training method, a working node, a parameter server, an electronic device and a readable storage medium.
Background
With the development of information technology, training deep learning models and using the trained models to predict target data has become more and more widely used. In order to further improve the accuracy of the trained models, the number of training samples has also become larger and larger, which increases the complexity of training and lengthens the training time.
In related technologies, multiple working nodes can usually be used to train the same model; for example, different working nodes are responsible for training different training layers in the same model. In this case, the next training layer must wait for the training of the previous training layer to complete before it can proceed, and this waiting time greatly increases the total time for model training, thereby reducing the efficiency of model training.
It can be seen that, in the process of using multiple working nodes to train the same model in the related art, there is a defect of low model training efficiency.
Summary of the Invention
Embodiments of the present invention provide a deep learning model training method, a working node and a parameter server, so as to solve the problem of low model training efficiency in the process of using multiple working nodes to train the same deep learning model in the related art.
In order to solve the above technical problems, the present invention is implemented as follows:
In a first aspect, an embodiment of the present invention provides a deep learning model training method, applied to a working node. The method includes multiple training cycles, and each training cycle includes:
obtaining target batch training samples;
training multiple layers of the target model according to the target batch training samples, wherein the multiple layers of the target model include at least one target layer requiring batch normalization; starting from the second training cycle, training the target layer in each training cycle includes:
receiving the historical global statistical parameters sent by the parameter server, wherein the historical global statistical parameters are determined by the parameter server according to the historical training data of the current target layer of the target model, and the historical training data includes the target statistical parameters obtained by the current working node in the training cycles before the current training cycle, and the target statistical parameters obtained, in the training cycles before the current training cycle, by other working nodes belonging to the same working network as the current working node;
obtaining the target statistical parameters of the current target layer, wherein the target statistical parameters are the statistical parameters of the target batch training samples;
determining the actual statistical parameters of the current target layer based on the historical global statistical parameters and the target statistical parameters; and
performing batch normalization on the current target layer based on the actual statistical parameters and the target batch training samples, and sending the target statistical parameters to the parameter server.
In a second aspect, an embodiment of the present invention further provides a deep learning model training method, applied to a parameter server, wherein the deep learning model training method includes multiple computing cycles, and in each computing cycle the method includes:
receiving the target statistical parameters for the target layer in the target training model sent by multiple working nodes, where the target statistical parameters of the target layer include the statistical parameters of the target batch training samples used by the working node that sent them when training the target layer in the training cycle corresponding to the current computing cycle, wherein a plurality of the working nodes belong to the same working network, and the target training model includes at least one target layer;
calculating, according to the received target statistical parameters, the historical global statistical parameters of the target layer corresponding to the current computing cycle in the target training model.
In a third aspect, an embodiment of the present invention further provides a working node, including:
a sample acquisition module, used to acquire target batch training samples;
a training module, used to train multiple layers of the target model according to the target batch training samples, wherein the multiple layers of the target model include at least one target layer requiring batch normalization, and the training module includes:
a first receiving unit, used to receive the historical global statistical parameters sent by the parameter server, wherein the historical global statistical parameters are determined by the parameter server according to the historical training data of the target layer of the target model, and the historical training data includes the target statistical parameters obtained by the current working node in the training cycles before the current training cycle, and the target statistical parameters obtained, in the training cycles before the current training cycle, by other working nodes belonging to the same working network as the current working node;
a first obtaining unit, used to obtain the target statistical parameters of the currently trained target layer, wherein the target statistical parameters are the statistical parameters of the target batch training samples;
a determining unit, used to determine the actual statistical parameters of the currently trained target layer based on the historical global statistical parameters and the target statistical parameters, to perform batch normalization on the corresponding target layer based on the actual statistical parameters and the target batch training samples, and to send the target statistical parameters to the parameter server.
In a fourth aspect, an embodiment of the present invention further provides a parameter server, including:
a parameter receiving module, used to receive the target statistical parameters for the target layer in the target training model sent by the plurality of working nodes, where the target statistical parameters of the target layer include the statistical parameters of the target batch training samples used by the working node that sends them when training the target layer in the training cycle corresponding to the current computing cycle, wherein a plurality of the working nodes belong to the same working network, and the target training model includes at least one target layer;
an updating module, used to calculate, according to the received target statistical parameters, the historical global statistical parameters of the target layer corresponding to the current computing cycle in the target training model.
In a fifth aspect, an embodiment of the present invention further provides an electronic device, including a processor, a memory, and a program or instruction stored in the memory and executable on the processor, where the program or instruction, when executed by the processor, implements the steps of the deep learning model training method described in the first aspect, or the steps of the deep learning model training method described in the second aspect.
In a sixth aspect, an embodiment of the present invention further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or instruction is executed by a processor, the steps of the deep learning model training method described in the first aspect or the steps of the deep learning model training method described in the second aspect are implemented.
在本发明实施例中,在计算历史全局统计参数时,各个工作节点仅需要将各自计算获得的目标 统计参数发送至参数服务器即可。而参数服务器直接根据各个工作节点发送的目标统计参数即可计算获得历史全局统计参数。整个计算过程中,需要传输的数据量相对较少,从而提高了深度学习模型训练方法的效率。并且,利用历史全局统计参数对当前的目标层的目标统计参数进行修正,可以使得对样本数据进行标准化后获得的结果更加准确。In the embodiment of the present invention, when calculating the historical global statistical parameters, each working node only needs to send the target statistical parameters obtained by their respective calculations to the parameter server. The parameter server can directly calculate and obtain the historical global statistical parameters according to the target statistical parameters sent by each working node. During the entire computing process, the amount of data that needs to be transmitted is relatively small, which improves the efficiency of the deep learning model training method. Moreover, by using the historical global statistical parameters to correct the target statistical parameters of the current target layer, the result obtained after standardizing the sample data can be made more accurate.
Description of the Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
Fig. 1 is a schematic diagram of the system architecture of a deep learning model training method based on the mini-batch gradient descent method;
Fig. 2 is a flowchart of an embodiment of the deep learning model training method provided in the first aspect of the present invention;
Fig. 3 is a flowchart of another embodiment of the deep learning model training method provided in the first aspect of the present invention;
Fig. 4 is a flowchart of an embodiment of the deep learning model training method provided in the second aspect of the present disclosure;
Fig. 5 is a flowchart of another embodiment of the deep learning model training method provided in the second aspect of the present disclosure;
Fig. 6 is a schematic diagram of the data interaction between a working node and a parameter server in the deep learning model training method provided in an embodiment of the present invention;
Fig. 7 is a structural diagram of a working node provided in an embodiment of the present invention;
Fig. 8 is a structural diagram of a parameter server provided in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
In the training process of a large deep learning model, in order to speed up model convergence and improve training efficiency, and considering that the total number of samples may be very large (so that all the sample data cannot be used in every model iteration), the mini-batch gradient descent method is usually used for model training. When the mini-batch gradient descent method is used, each iteration uses batch size (the number of samples selected for one training pass) samples to update the parameters. However, for a large deep learning model with many parameters or a large amount of intermediate activation data, mini-batch computation cannot be performed on a single working node, and the training process usually needs to be distributed over multiple working nodes. For ease of description, a network composed of multiple working nodes is hereinafter referred to as a working network.
For example, when the training process is distributed over multiple working nodes, a data-parallel training method for large deep learning models may be adopted. The training network may be the one shown in Fig. 1, which includes multiple working nodes 10: the same model is placed on each working node 10 for training, the training data set is divided (to generate mini-batch training samples), and the divided mini-batch training samples 20 are distributed to different working nodes 10, so that each working node 10 performs model training based on the mini-batch training samples 20 assigned to it. Each working node 10 exchanges data with a parameter server 30 after training is completed in order to report the training result, or exchanges data with the parameter server 30 during training in order to normalize the batch training samples (hereinafter referred to as batch normalization). The batch normalization processes each batch normalization layer during the forward propagation of the training process, and the actual statistical parameters described above are also used during back propagation.
In one embodiment, a single working node can independently train the deep learning model based on a mini-batch of training samples and synchronize the update data to the parameter server. This embodiment is only suitable for small deep learning models.
In another optional embodiment, in the training process of a large deep learning network model, when the mini-batch gradient descent method is used, all working nodes use the same model and respectively fetch data from the database for training, jointly completing a mini-batch training pass whose sample number is the batch size (that is, the batch size is the sum of the numbers of training samples trained by all working nodes in one model iteration). After each working node finishes running, it synchronizes the model or parameter update data to the parameter server; after the parameter server has obtained the data of all working nodes, it updates the model and synchronizes the updated model to each working node. In the process of model training on each working node, if a network layer needs global batch-size statistics over the batch size samples (a batch normalization layer (batch norm) is taken as an example below), each working node needs to synchronize the data of this layer to the parameter server when it runs to this layer, and only after the parameter server completes the calculation of the statistics are they synchronized back to each working node.
For example, when training the batch normalization layers contained in the model training network (a current model training network usually contains multiple batch normalization layers), the parameter server needs to collect statistics on the training data of all working nodes, so that each working node corrects the current data of this layer based on the statistical result and performs batch normalization on the corresponding target layer using the corrected statistics.
When a network layer needs global statistics over the batch size samples, a working node has to go through the following waiting process:
waiting for the other working nodes to train up to the same batch normalization layer, so that the parameter server can obtain the data of the batch normalization layer reported by each working node; and then waiting for the parameter server to perform statistical calculation on the data of the batch normalization layer reported by each working node and to deliver the statistical result.
This greatly increases the amount of communication between the working nodes and the parameter server, and increases the waiting time of each working node during training, which is a serious bottleneck for the training of the increasingly large models to come. The present application can solve the problem of low model training efficiency in model training using the mini-batch gradient descent method.
Referring to Fig. 2, Fig. 2 is a flowchart of a deep learning model training method provided in an embodiment of the present invention, and the method is applied to a working node. The deep learning model training method includes multiple training cycles. As shown in Fig. 2, each training cycle may include the following steps.
In step S210, target batch training samples are obtained.
In step S220, multiple layers of a target model are trained according to the target batch training samples.
The multiple layers of the target model include at least one target layer requiring batch normalization. Accordingly, starting from the second training cycle, training the target layer in each training cycle includes the following steps.
In step S221, historical global statistical parameters sent by a parameter server are received, wherein the historical global statistical parameters are determined by the parameter server according to historical training data of the current target layer of the target model, and the historical training data include target statistical parameters obtained by the current working node in training cycles before the current training cycle, as well as target statistical parameters obtained, in training cycles before the current training cycle, by other working nodes belonging to the same working network as the current working node.
In step S222, target statistical parameters of the current target layer are obtained, wherein the target statistical parameters include statistical parameters of the target batch training samples.
In step S223, actual statistical parameters of the current target layer are determined based on the historical global statistical parameters and the target statistical parameters.
In step S224, batch normalization is performed on the current target layer based on the actual statistical parameters and the target batch training samples, and the target statistical parameters are sent to the parameter server.
The training network that trains the target model needs to perform multiple iterations of model training on the target model. In each iteration (that is, each training cycle), when a working node in the training network trains forward to the target normalization layer (that is, the target layer above), it performs statistical calculation on the training samples it is currently using to obtain target statistical parameters, and reports the target statistical parameters to the parameter server. It should be pointed out that the target statistical parameters include the statistical values of the training samples used by the working node that produced them as well as the number of those samples. The parameter server can calculate the historical global statistical parameters according to the received target statistical parameters of the working nodes belonging to the same working network. In this way, the historical global statistical parameters calculated by the parameter server can reflect the statistical characteristics of all historical training samples. When a working node iteratively trains the target normalization layer in its next training cycle, the historical global statistical parameters it obtains from the parameter server are the historical statistical parameters of the sample data that has already been used to train the target normalization layer, and the working node combines the target statistical parameters of the target batch training samples of the target normalization layer with the historical global statistical parameters to obtain the actual statistical parameters of this normalization layer.
When the historical global statistical parameters are calculated, each working node only needs to send the target statistical parameters it has calculated to the parameter server, and the parameter server can calculate the historical global statistical parameters directly from the target statistical parameters sent by the working nodes. The amount of data to be transmitted during the whole calculation is relatively small, which improves the efficiency of the deep learning model training method. Moreover, correcting the target statistical parameters of the current target layer with the historical global statistical parameters makes the result obtained after normalizing the sample data more accurate.
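For illustration only, the following Python sketch shows one possible way for a working node to merge its current batch statistics with the historical global statistical parameters received from the parameter server. The function names, the choice of mean and variance as the tracked statistics, and the count-weighted merging rule are assumptions of this sketch, not limitations of the embodiment.

```python
import numpy as np

def combine_statistics(hist_mean, hist_var, hist_count, batch_mean, batch_var, batch_count):
    """Merge historical global statistics with the current target batch statistics.

    The historical values are assumed to be (mean, variance, sample count) accumulated
    by the parameter server over previous training cycles; the batch values are the
    target statistical parameters computed locally for the current target layer.
    """
    total = hist_count + batch_count
    mean = (hist_count * hist_mean + batch_count * batch_mean) / total
    # Law of total variance: within-group variance plus between-group variance.
    var = (hist_count * (hist_var + (hist_mean - mean) ** 2)
           + batch_count * (batch_var + (batch_mean - mean) ** 2)) / total
    return mean, var, total

def worker_target_layer_step(x, hist_mean, hist_var, hist_count, eps=1e-5):
    """Compute target statistics for a batch x, batch-normalize it with the actual
    (combined) statistics, and return the statistics to report to the server."""
    batch_mean = x.mean(axis=0)
    batch_var = x.var(axis=0)
    batch_count = x.shape[0]
    mean, var, _ = combine_statistics(hist_mean, hist_var, hist_count,
                                      batch_mean, batch_var, batch_count)
    x_hat = (x - mean) / np.sqrt(var + eps)        # actual statistical parameters used here
    report = (batch_mean, batch_var, batch_count)  # sent to the parameter server
    return x_hat, report
```

In such a sketch, the report tuple corresponds to the target statistical parameters, and the combined mean and variance correspond to the actual statistical parameters kept locally for back propagation.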
In the present disclosure, how the target layer is trained in the first training cycle is not particularly limited. For example, the target statistical parameters of the target layer in the first cycle and the target batch training samples may be used directly to batch-normalize the current target layer. Of course, the present disclosure is not limited to this; the target statistical parameters of the target layer in the first cycle may also be corrected with preset values to obtain the actual statistical parameters of the first training cycle, and the actual statistical parameters and the target batch training samples are then used to batch-normalize the current target layer. Alternatively, the parameter server may calculate global statistical parameters according to the target batch training samples of all working nodes, and the global statistical parameters and the target batch training samples are then used to batch-normalize the current target layer.
It should be noted that the target model may include multiple normalization layers, and the parameter server may update the historical global statistical parameters of each normalization layer separately. When a working node trains forward to the m1 normalization layer, it may obtain the historical global statistical parameters corresponding to the m1 normalization layer from the parameter server; when a working node trains forward to the m2 normalization layer, it may obtain the historical global statistical parameters corresponding to the m2 normalization layer from the parameter server, where the normalization layers in the target model include the m1 normalization layer and the m2 normalization layer.
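As a minimal, purely illustrative sketch of this per-layer bookkeeping (the layer keys m1 and m2 follow the example above, and the flat dictionary layout is an assumption):

```python
# Each normalization layer of the target model has its own historical global
# statistical parameters on the parameter server.
layer_stats = {
    "m1": {"mean": 0.0, "var": 0.0, "count": 0},
    "m2": {"mean": 0.0, "var": 0.0, "count": 0},
}

def stats_for(layer_name):
    """Return the historical global statistical parameters that a working node should
    receive when it trains forward to the given normalization layer."""
    return layer_stats[layer_name]
```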
The historical global statistical parameters may be a statistical value or a sequence of statistical values obtained by performing one or more of calculations such as variance, summation, and integration on the historical data, or the historical global statistical parameters may further include the number of training samples for which training has been completed, where the sequence of statistical values may include multiple statistical values, for example, a variance value and a summation value.
It should be noted that the historical global statistical parameters may be the historical statistical parameters received by the working node before training the target layer (that is, the historical statistical parameters received before the current training cycle); they do not cover the training samples currently used by the target layer.
As described above, the target layer may be a target batch normalization layer, and the target batch training samples may be a sample set including at least one training sample; a working node applying the deep learning model training method provided in the present application trains the target layer of the target model based on the training samples in this sample set, that is, the target batch training samples may also be called the current training samples with which the target layer is trained. Training the target layer of the target model according to the target batch training samples may also be understood as: the target batch training samples are trained forward to the target layer of the target model.
In addition, the target statistical parameters may be a statistical value or a sequence of statistical values obtained by performing statistical calculation on the target batch training samples, and the statistical calculation may be performed in the same way as the statistical calculation performed by the parameter server on the historical training data, which is not repeated here.
It should be noted that before training the multiple layers of the target model according to the target batch training samples, the working node also needs to obtain the target batch data. For example, the globality of the target batch data obtained by each working node can be improved in the following ways, so that global data information can be used in the training process.
In an optional embodiment, the target batch training samples may be obtained from a database by random sampling.
For example, sampling with replacement may be adopted. In this embodiment, after a working node obtains the target batch training samples from the database, the target batch training samples may not be deleted, and other working nodes may also randomly select training samples from among the target batch training samples.
In another optional embodiment, the target batch training samples arranged at preset positions may also be obtained from the database, where the training samples stored in the database are arranged in random order.
The target batch training samples arranged at preset positions may be obtained starting from the training sample arranged at the first position and taking the N-1 training samples arranged after that training sample, where N represents the number of training samples contained in a batch of training samples. For example, the database may allocate the training samples within it so as to assign different training samples to different working nodes.
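A minimal sketch of the two sampling strategies described above is given below in Python; the database is modeled as a simple in-memory list, and the function names are assumptions of the sketch rather than part of the embodiment.

```python
import random

def sample_with_replacement(database, batch_size):
    """Random sampling with replacement: samples are not removed from the database,
    so other working nodes may also draw the same training samples."""
    return [random.choice(database) for _ in range(batch_size)]

def sample_at_preset_position(database, start, batch_size):
    """Take the sample at a preset position and the N-1 samples after it, assuming
    the database stores its training samples in random (shuffled) order."""
    return database[start:start + batch_size]

# Illustrative usage: two working nodes drawing from the same shuffled database.
db = list(range(1000))
random.shuffle(db)
batch_a = sample_with_replacement(db, 32)
batch_b = sample_at_preset_position(db, 0, 32)
```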
In some optional embodiments, the parameter types of the historical global statistical parameters are the same as the parameter types of the target statistical parameters. For example, the historical global statistical parameters include the statistical parameters of the historical training samples and the number of historical training samples, and the target statistical parameters include the statistical values of the target batch training samples and the number of samples contained in the target batch training samples. The number of samples may refer to the number of times training samples have been used, that is, if the same sample has been trained n times, the sample count is n.
In some optional embodiments, the actual statistical parameters of the target layer may include statistical parameter values.
In some optional embodiments, determining the actual statistical parameters of the target layer based on the historical global statistical parameters and the target statistical parameters may be: performing a correction calculation on the target statistical parameters using the historical global statistical parameters, so as to obtain the actual statistical parameters of the target layer.
In a possible implementation, the correction formula used for batch normalization may be adjusted according to information such as the variance of the statistical values of the target layer and the degree of deviation from the first statistical value. The specific batch normalization process has the same meaning as the batch normalization process of a batch normalization layer in the prior art, and is not repeated here.
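As one hedged illustration of such a correction, assuming the tracked statistics are the mean $\mu$ and variance $\sigma^2$ together with the sample counts $n$, the actual statistical parameters and the normalization of an input $x$ could take the standard form below; the count-weighted combination is only one possible correction formula, not the only one covered by the embodiment:

$$\mu_{\text{actual}} = \frac{n_{\text{hist}}\,\mu_{\text{hist}} + n_{\text{batch}}\,\mu_{\text{batch}}}{n_{\text{hist}} + n_{\text{batch}}}, \qquad \sigma^2_{\text{actual}} = \frac{n_{\text{hist}}\!\left(\sigma^2_{\text{hist}} + (\mu_{\text{hist}} - \mu_{\text{actual}})^2\right) + n_{\text{batch}}\!\left(\sigma^2_{\text{batch}} + (\mu_{\text{batch}} - \mu_{\text{actual}})^2\right)}{n_{\text{hist}} + n_{\text{batch}}},$$

$$\hat{x} = \frac{x - \mu_{\text{actual}}}{\sqrt{\sigma^2_{\text{actual}} + \epsilon}}, \qquad y = \gamma\,\hat{x} + \beta,$$

where $\gamma$ and $\beta$ are the learnable scale and shift parameters of the batch normalization layer and $\epsilon$ is a small constant for numerical stability.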
In some optional embodiments, after obtaining the target statistical parameters, the working node also sends the target statistical parameters to the parameter server, so that the parameter server updates the historical global statistical parameters according to the target statistical parameters and then sends the updated historical global statistical parameters to each working node. This achieves data synchronization among the working nodes, so that the updated historical global statistical parameters can be used in the next iteration to correct the target statistical parameters of that iteration cycle, until the model training is completed. As described above, the historical global statistical parameters and the target statistical parameters may include the statistical parameter values of the training samples and the number of training samples, so that the communication traffic between a working node and the parameter server is small and the communication efficiency is improved.
It should be pointed out that this embodiment does not specifically limit the order in which the working node performs the two steps of batch-normalizing the current target layer based on the actual statistical parameters and the target batch training samples, and sending the target statistical parameters to the parameter server.
In a possible implementation, after obtaining the actual statistical parameters, each working node may store the actual statistical parameters on this working node so as to use them for back propagation.
In the embodiments of the present invention, the working node receives the historical global statistical parameters determined by the parameter server according to the historical training data. When the working node trains the target layer of the target model based on the target batch training samples, it determines the actual statistical parameters of the target layer based on the historical global statistical parameters it has already received and the target statistical parameters of the target batch training samples, and performs the forward and back propagation training of the target model based on the actual statistical parameters. In this way, the working node does not need to wait until the parameter server has obtained the training sample statistical parameters of all working nodes training the target layer, updated the statistical parameters according to them, and delivered them to each working node before forward and back propagation training can be performed. This greatly reduces the time the working node spends waiting for the parameter server to deliver statistical parameters, and can improve the training efficiency of the deep learning model.
As described above, in the present disclosure, the deep learning model training method is equivalent to a gradient descent method. After the data of the target layer has been normalized, the normalized data also needs to be used to train the parameters of the target model.
In the same training network, multiple working nodes train the same model with different training samples. In order to obtain a more accurate model, the model parameters obtained after training by each working node (called individual target model parameters for ease of description) are all sent to the parameter server. After receiving the multiple sets of individual target model parameters, the parameter server performs an overall parameter update of the target model by combining all the received individual target model parameters, and delivers the overall updated target model to each working node, so that each working node continues to train the updated target model based on different training samples.
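A minimal sketch of this overall parameter update is given below, assuming the individual target model parameters are dictionaries of NumPy arrays and the combination rule is a simple element-wise average; both assumptions are illustrative, and other weighting schemes are equally compatible with the embodiment.

```python
import numpy as np

def aggregate_model_parameters(individual_params):
    """Combine the individual target model parameters reported by the working nodes
    into the overall updated target model parameters (here: element-wise average)."""
    keys = individual_params[0].keys()
    return {k: np.mean([p[k] for p in individual_params], axis=0) for k in keys}

# Illustrative usage: three working nodes report parameters for the same two layers.
reports = [{"layer1.weight": np.random.randn(4, 4), "layer2.weight": np.random.randn(4, 2)}
           for _ in range(3)]
updated_model = aggregate_model_parameters(reports)  # delivered back to every working node
```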
For any working node, as shown in Fig. 3, each training cycle of the deep learning model training method may further include the following steps.
In step S230, the individual target model parameters obtained by training the layers on the current working node are sent to the parameter server.
In step S240, before the next training cycle starts, the updated target training model sent by the parameter server is received as the target training model for the next training cycle.
As a second aspect of the present disclosure, reference is made to Fig. 4, which is a flowchart of a deep learning model training method provided in the present application and applied to a parameter server. In the second aspect provided in the present disclosure, the deep learning model training method includes multiple computing cycles, and in each computing cycle the method may include the following steps.
In step S310, target statistical parameters for a target layer in a target training model sent by multiple working nodes are received, wherein the target statistical parameters of the target layer include statistical parameters of the target batch training samples used by the working node that sends the target statistical parameters to train the target layer in the training cycle corresponding to the current computing cycle, the multiple working nodes belong to the same working network, and the target training model includes at least one target layer.
In step S320, historical global statistical parameters of the target layer of the target training model corresponding to the current computing cycle are calculated according to the received target statistical parameters.
In this embodiment, the working nodes may be working nodes that train the target model together with the parameter server, and they may be working nodes executing the method shown in Fig. 2.
After obtaining the historical global statistical parameters, the parameter server may send the historical global statistical parameters to all working nodes that train the target model. The historical global statistical parameters have the same meaning as the historical global statistical parameters mentioned in the first aspect of the present disclosure, and are not repeated here.
It should be pointed out that the target statistical parameters received in the first computing cycle are the target statistical parameters calculated by the working nodes in their first training cycle.
The deep learning model training method provided in the second aspect of the present disclosure cooperates with the deep learning model training method provided in the first aspect of the present disclosure. The first computing cycle in the method of the second aspect corresponds to the first training cycle in the method of the first aspect: the historical global statistical parameters received in the second training cycle of the method of the first aspect are the historical global statistical parameters calculated in the first computing cycle of the method of the second aspect. The second computing cycle in the method of the second aspect corresponds to the second training cycle in the method of the first aspect: the historical global statistical parameters received in the third training cycle of the method of the first aspect are the historical global statistical parameters calculated in the second computing cycle of the method of the second aspect, and so on.
The target statistical parameters sent by each working node may be the statistical parameters of the target batch training samples used when it trains the target layer, and they may have the same meaning as the target statistical parameters in the method embodiment shown in Fig. 2, which is not repeated here.
In addition, even for the same working node and the same training cycle, different target layers correspond to different target statistical parameters, and different target layers also correspond to different historical global statistical parameters.
In the present disclosure, how each working node obtains the historical global statistical parameters calculated by the parameter server is not particularly limited. As an optional embodiment, the parameter server may actively deliver the calculated historical global statistical parameters to each working node. That is to say, the deep learning model training method includes, after step S320:
sending the historical global statistical parameters to each working node.
Of course, the present disclosure is not limited to this. After calculating the historical global statistical parameters, the parameter server may not actively send them to the working nodes, but may instead send the historical global statistical parameters to a working node only when a global parameter acquisition request is received from that working node. That is to say, the deep learning model training method may further include, after step S320:
in response to a global parameter acquisition request, determining the identity information of the working node that sent the global parameter acquisition request; and
sending the historical global statistical parameters to the working node that sent the global parameter acquisition request.
In an optional embodiment, receiving the target statistical parameters sent by the working nodes when they train the target layer of the target model based on different batches of training samples may mean that step S320 is performed only after the target statistical parameters sent by every working node when training the target layer have been received.
In another optional embodiment, receiving the target statistical parameters sent by the working nodes when they train the target layer of the target model based on different batches of training samples may mean that, when the target statistical parameters sent by a preset number of working nodes have been received, the historical global statistical parameters are updated based on the historical global statistical parameters and the preset number of target statistical parameters, wherein the preset number is less than or equal to the total number of the first working nodes. In other words, starting from the second computing cycle, step S320 may specifically include:
in the case where the target statistical parameters of the target layer corresponding to the current computing cycle sent by a preset number of working nodes have been received, calculating the historical global statistical parameters of the target layer corresponding to the current cycle based on the historical global statistical parameters obtained in computing cycles before the current computing cycle and the preset number of target statistical parameters, wherein the preset number is less than or equal to the total number of working nodes in the working network.
In this embodiment, some working nodes are held in reserve. Once the preset number of working nodes have submitted data to the parameter server, the model can be updated according to the submitted data, and data submitted by working nodes beyond the preset number is no longer waited for or received. In this way, the overall model training process is less likely to stall because a certain working node runs slowly or crashes, which can improve the model training efficiency.
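The following Python sketch illustrates, under illustrative assumptions, how a parameter server could update the historical global statistical parameters of one target layer once a preset number of working nodes have reported; the data structures and the count-weighted update rule are assumptions of the sketch rather than the definitive implementation.

```python
class LayerStatsAggregator:
    """Tracks the historical global statistical parameters of one target layer."""

    def __init__(self, preset_number):
        self.preset_number = preset_number   # reports to wait for in each computing cycle
        self.hist_mean, self.hist_var, self.hist_count = 0.0, 0.0, 0
        self.pending = []                    # (mean, var, count) reports of the current cycle

    def receive(self, mean, var, count):
        """Called when a working node reports its target statistical parameters."""
        if len(self.pending) < self.preset_number:
            self.pending.append((mean, var, count))
        if len(self.pending) == self.preset_number:
            self._update()

    def _update(self):
        """Fold the pending reports into the historical global statistical parameters."""
        for mean, var, count in self.pending:
            total = self.hist_count + count
            new_mean = (self.hist_count * self.hist_mean + count * mean) / total
            self.hist_var = (self.hist_count * (self.hist_var + (self.hist_mean - new_mean) ** 2)
                             + count * (var + (mean - new_mean) ** 2)) / total
            self.hist_mean, self.hist_count = new_mean, total
        self.pending.clear()   # reports beyond the preset number are not waited for
```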
Updating the historical global statistical parameters based on the target statistical parameters may be: performing deviation correction on the historical global statistical parameters according to the target statistical parameters.
As described above, the parameter server may perform steps S310 and S320 periodically, so that the historical global statistical parameters of each batch normalization layer are synchronized to each working node when the model data is updated. The parameter server may perform step S310 when the updated historical global statistical parameters are obtained.
In a specific implementation, after obtaining the updated historical global statistical parameters, the parameter server may send the updated historical global statistical parameters to each working node; specifically, the updated historical global statistical parameters may be sent to a working node when that working node runs forward to the target normalization layer, where the target normalization layer is the normalization layer associated with the updated historical global statistical parameters.
With the deep learning model training method provided in the embodiments of the present application, in the process of training a batch normalization layer, batch normalization can be performed without each working node sending all of its training sample data to the parameter server and waiting for the parameter server to perform global statistics and return statistical parameters. Instead, each working node collects statistics on the batch normalization data of the batch normalization layer it uses and reports its own target statistical parameters, which reduces the amount of data interaction between the parameter server and the working nodes. Moreover, the parameter server can send the historical global statistical parameters to each working node, so that the working node can correct the target statistical parameters according to the received historical global statistical parameters to obtain the actual statistical parameters it uses, and perform batch normalization on the corresponding target layer using the actual statistical parameters and the target training samples, which can greatly reduce the waiting time of the working nodes.
In order to update the entire target model, correspondingly, as shown in Fig. 5, the deep learning model training method may further include the following steps.
In step S330, individual target model parameters, sent by each working node, obtained by training the layers of the target model are received.
In step S340, the target model of the current cycle is updated according to the received individual target model parameters, so as to obtain the target model for the next training cycle of the working nodes.
The deep learning model training method provided in the embodiments of the present application is illustrated below with reference to the data interaction process between a working node and the parameter server. As shown in Fig. 6, the method includes the following process.
Step 601: working node A obtains the target training data that currently needs to be trained from the database.
The target training data is the target batch training samples in the method embodiments shown in Fig. 2 and Fig. 3.
Step 602: working node A trains the target model based on the target training data.
The target model may include multiple layers that require global batch-size statistical parameters (such as a batchnorm batch normalization layer; only the batchnorm batch normalization layer is taken as an example below).
The global batch-size statistical parameters include statistical values such as the variance, sum, and mean, and also include the number of target training data.
Step 603: when working node A runs forward to the target batch normalization layer, it obtains the target statistical parameters and corrects them based on the historical global statistical parameters to obtain the actual statistical parameters.
In this step, working node A obtains the target training data of the target batch normalization layer and performs statistical calculation on the target training data to obtain the target statistical parameters, then uses the historical global statistical parameters of the target batch normalization layer synchronized from the parameter server to correct the target statistical parameters of this working node, obtaining the statistical parameters actually used at present, that is, the actual statistical parameters. These statistical parameters are kept on the local machine and are needed during back propagation.
Step 604: working node A sends the target statistical parameters to the parameter server.
Step 605: the parameter server updates the stored historical global statistical parameters according to the target statistical parameters, and sends the updated historical global statistical parameters to each working node (including working node A).
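A minimal end-to-end sketch of this interaction (steps 601 to 605) is given below, reusing the illustrative helper functions and aggregator class sketched earlier in this description; the message passing is modeled as plain function calls, which is an assumption made purely for readability.

```python
import numpy as np

def training_cycle(worker_data, server):
    """One training cycle of working node A for a single target batch normalization layer.

    `worker_data` is the target training data fetched from the database (step 601);
    `server` is assumed to be the LayerStatsAggregator sketched above, holding the
    historical global statistical parameters of this target layer.
    """
    # Steps 602/603: train forward to the target batch normalization layer, compute the
    # target statistical parameters, and correct them with the historical global ones.
    x_hat, (batch_mean, batch_var, batch_count) = worker_target_layer_step(
        worker_data, server.hist_mean, server.hist_var, server.hist_count)
    # Step 604: report the target statistical parameters to the parameter server.
    server.receive(batch_mean, batch_var, batch_count)
    # Step 605 happens on the server side; the actual statistical parameters are kept
    # locally by the worker for the rest of forward propagation and for back propagation.
    return x_hat

# Illustrative usage with two simulated working nodes sharing one aggregator.
server = LayerStatsAggregator(preset_number=2)
for node_data in (np.random.randn(32, 8), np.random.randn(32, 8)):
    training_cycle(node_data, server)
```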
In the embodiments of the present application, the working nodes and the parameter server cooperate with each other to execute the processes of the deep learning model training methods shown in Fig. 2 and Fig. 3, and can achieve the same beneficial effects. To avoid repetition, details are not repeated here.
Reference is made to Fig. 7, which is a structural diagram of a working node provided in an embodiment of the present application. As shown in Fig. 7, the working node 500 includes:
a sample acquisition module 510, configured to obtain target batch training samples; and
a training module 520, configured to train multiple layers of a target model according to the target batch training samples, wherein the multiple layers of the target model include at least one target layer requiring batch normalization.
The training module 520 includes:
a first receiving unit 521, configured to receive historical global statistical parameters sent by a parameter server, wherein the historical global statistical parameters are determined by the parameter server according to historical training data of a target layer of the target model, and the historical training data include target statistical parameters obtained by the current working node in training cycles before the current training cycle, as well as target statistical parameters obtained, in training cycles before the current training cycle, by other working nodes belonging to the same working network as the current working node; and
a determining unit 522, configured to determine actual statistical parameters of the currently trained target layer based on the historical global statistical parameters and the target statistical parameters, perform batch normalization on the corresponding target layer based on the actual statistical parameters and the target batch training samples, and send the target statistical parameters to the parameter server.
It should be explained that the "corresponding target layer" here refers to the target layer that belongs to the same training cycle as the actual statistical parameters and the target batch training samples and that needs to use the actual statistical parameters.
The working node is configured to execute the deep learning model training method provided in the first aspect of the present disclosure. The principle and beneficial effects of the deep learning model training method have been described in detail above and are not repeated here.
Optionally, the sample acquisition module 510 is configured to obtain the target batch training samples from a database by random sampling; or the sample acquisition module 510 is configured to obtain, from a database, the target batch training samples arranged at preset positions, wherein the training samples stored in the database are arranged in random order.
Optionally, the target statistical parameters include the statistical values of the target batch training samples and the number of samples of the target batch training samples.
Optionally, the working node 500 may further include:
a sending module 530, configured to send the individual target model parameters obtained by training the layers on the current working node to the parameter server; and
a model receiving module 540, configured to receive the updated target training model sent by the parameter server as the target training model for the next training cycle.
Reference is made to Fig. 8, which is a structural diagram of a parameter server provided in an embodiment of the present application. As shown in Fig. 8, the parameter server 600 includes:
a parameter receiving module 610, configured to receive target statistical parameters, for a target layer in a target training model, sent by multiple working nodes, wherein the target statistical parameters of the target layer include statistical parameters of the target batch training samples used by the working node that sends the target statistical parameters to train the target layer in the training cycle corresponding to the current computing cycle, the multiple working nodes belong to the same working network, and the target training model includes at least one target layer; and
an updating module 620, configured to calculate, according to the received target statistical parameters, historical global statistical parameters of the target layer of the target training model corresponding to the current computing cycle.
The parameter server 600 is configured to execute the deep learning model training method provided in the second aspect of the present disclosure. The beneficial effects and working principle of the deep learning model training method provided in the second aspect of the present disclosure have been described in detail above and are not repeated here.
Optionally, the updating module 620 may be specifically configured to, in the case where the target statistical parameters of the target layer corresponding to the current computing cycle sent by a preset number of first working nodes have been received, calculate the historical global statistical parameters of the target layer corresponding to the current cycle based on the historical global statistical parameters obtained in computing cycles before the current computing cycle and the preset number of target statistical parameters, wherein the preset number is less than or equal to the total number of working nodes in the working network.
Optionally, the parameter receiving module 610 is further configured to receive the individual target model parameters, sent by each working node, obtained by training the layers of the target model.
Correspondingly, the updating module 620 is further configured to update the target model of the current cycle according to the received individual target model parameters.
An embodiment of the present invention further provides an electronic device, including a processor, a memory, and a program or instruction stored in the memory and executable on the processor. When the program or instruction is executed by the processor, the processes of the deep learning model training method provided in the first aspect of the present disclosure or of the deep learning model training method provided in the second aspect of the present disclosure are implemented, and the same technical effects can be achieved. To avoid repetition, details are not repeated here.
An embodiment of the present disclosure further provides a readable storage medium storing a program or instruction, wherein the program or instruction, when executed by a processor, implements the steps of the deep learning model training method described in the first aspect of the present disclosure, or implements the steps of the deep learning model training method described in the second aspect of the present disclosure.
It should be noted that, herein, the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or further includes elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes the element.
From the description of the above embodiments, a person skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the specific embodiments described above, which are merely illustrative rather than restrictive. Inspired by the present invention, a person of ordinary skill in the art can make many other forms without departing from the spirit of the present invention and the scope protected by the claims, all of which fall within the protection of the present invention.

Claims (16)

  1. A deep learning model training method, applied to a worker node, wherein the method includes a plurality of training cycles, and each training cycle includes:
    obtaining target batch training samples; and
    training a plurality of layers of a target model according to the target batch training samples, wherein the plurality of layers of the target model include at least one target layer requiring batch normalization, and, starting from the second training cycle, training the target layer in each training cycle includes:
    receiving historical global statistical parameters sent by a parameter server, wherein the historical global statistical parameters are determined by the parameter server according to historical training data of the current target layer of the target model, and the historical training data include target statistical parameters obtained by the current worker node in training cycles before the current training cycle and target statistical parameters obtained, in training cycles before the current training cycle, by other worker nodes belonging to the same working network as the current worker node;
    obtaining target statistical parameters of the current target layer, wherein the target statistical parameters are statistical parameters of the target batch training samples;
    determining actual statistical parameters of the current target layer based on the historical global statistical parameters and the target statistical parameters; and
    performing batch normalization on the current target layer based on the actual statistical parameters and the target batch training samples, and sending the target statistical parameters to the parameter server.
  2. The deep learning model training method according to claim 1, wherein the obtaining target batch training samples includes:
    obtaining the target batch training samples from a database by random sampling;
    or,
    obtaining, from a database, the target batch training samples arranged at preset positions, wherein the training samples stored in the database are arranged in random order.
  3. The deep learning model training method according to claim 1 or 2, wherein the target statistical parameters include a statistical value of the target batch training samples and a sample quantity of the target batch training samples.
  4. The deep learning model training method according to claim 1 or 2, wherein each training cycle further includes:
    sending, to the parameter server, individual target model parameters obtained by training each layer at the current worker node; and
    before the next training cycle starts, receiving an updated target training model sent by the parameter server as the target training model for the next training cycle.
  5. A deep learning model training method, applied to a parameter server, wherein the deep learning model training method includes a plurality of computation cycles, and in each computation cycle the method includes:
    receiving target statistical parameters, sent by a plurality of worker nodes, for a target layer in a target training model, wherein the target statistical parameters of the target layer include statistical parameters of the target batch training samples used by the worker node that sent the target statistical parameters to train the target layer in the training cycle corresponding to the current computation cycle, the plurality of worker nodes belong to the same working network, and the target training model includes at least one target layer; and
    calculating, according to the received target statistical parameters, historical global statistical parameters of the target layer corresponding to the current computation cycle in the target training model.
  6. The deep learning model training method according to claim 5, wherein, starting from the second computation cycle, the calculating, according to the received target statistical parameters, historical global statistical parameters of the target layer corresponding to the current computation cycle in the target training model includes:
    in a case where target statistical parameters of the target layer corresponding to the current computation cycle are received from a preset number of worker nodes, calculating the historical global statistical parameters of the target layer corresponding to the current cycle based on the historical global statistical parameters obtained in computation cycles before the current computation cycle and the preset number of target statistical parameters, wherein the preset number is less than or equal to the total number of worker nodes in the working network.
  7. The deep learning model training method according to claim 5 or 6, wherein the deep learning model training method further includes:
    receiving individual target model parameters, sent by each worker node, obtained by training each layer of the target model; and
    updating the target model of the current cycle according to the received individual target model parameters.
  8. A worker node, comprising:
    a sample acquisition module configured to obtain target batch training samples; and
    a training module configured to train a plurality of layers of a target model according to the target batch training samples, wherein the plurality of layers of the target model include at least one target layer requiring batch normalization, and the training module includes:
    a first receiving unit configured to receive historical global statistical parameters sent by a parameter server, wherein the historical global statistical parameters are determined by the parameter server according to historical training data of a target layer of the target model, and the historical training data include target statistical parameters obtained by the current worker node in training cycles before the current training cycle and target statistical parameters obtained, in training cycles before the current training cycle, by other worker nodes belonging to the same working network as the current worker node;
    a first acquisition unit configured to obtain target statistical parameters of the target layer currently being trained, wherein the target statistical parameters are statistical parameters of the target batch training samples; and
    a determining unit configured to determine actual statistical parameters of the target layer currently being trained based on the historical global statistical parameters and the target statistical parameters, perform batch normalization on the corresponding target layer based on the actual statistical parameters and the target batch training samples, and send the target statistical parameters to the parameter server.
  9. The worker node according to claim 8, wherein the sample acquisition module is configured to obtain the target batch training samples from a database by random sampling;
    or,
    the sample acquisition module is configured to obtain, from a database, the target batch training samples arranged at preset positions, wherein the training samples stored in the database are arranged in random order.
  10. The worker node according to claim 8 or 9, wherein the target statistical parameters include a statistical value of the target batch training samples and a sample quantity of the target batch training samples.
  11. The worker node according to claim 8 or 9, wherein the worker node further includes:
    a sending module configured to send, to the parameter server, individual target model parameters obtained by training each layer at the current worker node; and
    a model receiving module configured to receive an updated target training model sent by the parameter server as the target training model for the next training cycle.
  12. A parameter server, comprising:
    a parameter receiving module configured to receive target statistical parameters, sent by a plurality of worker nodes, for a target layer in a target training model, wherein the target statistical parameters of the target layer include statistical parameters of the target batch training samples used by the worker node that sent the target statistical parameters to train the target layer in the training cycle corresponding to the current computation cycle, the plurality of worker nodes belong to the same working network, and the target training model includes at least one target layer; and
    an update module configured to calculate, according to the received target statistical parameters, historical global statistical parameters of the target layer corresponding to the current computation cycle in the target training model.
  13. The parameter server according to claim 12, wherein the update module is configured to, in a case where target statistical parameters of the target layer corresponding to the current computation cycle are received from a preset number of first worker nodes, calculate the historical global statistical parameters of the target layer corresponding to the current cycle based on the historical global statistical parameters obtained in computation cycles before the current computation cycle and the preset number of target statistical parameters, wherein the preset number is less than or equal to the total number of worker nodes in the working network.
  14. The parameter server according to claim 12 or 13, wherein the parameter receiving module is further configured to receive individual target model parameters, sent by each worker node, obtained by training each layer of the target model; and
    the update module is further configured to update the target model of the current cycle according to the received individual target model parameters.
  15. An electronic device, comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, wherein, when the program or instructions are executed by the processor, the steps of the deep learning model training method according to any one of claims 1 to 4 are implemented, or the steps of the deep learning model training method according to any one of claims 5 to 7 are implemented.
  16. A readable storage medium, storing a program or instructions which, when executed by a processor, implement the steps of the deep learning model training method according to any one of claims 1 to 4, or the steps of the deep learning model training method according to any one of claims 5 to 7.
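The worker-side training cycle set out in claims 1 to 4 can be illustrated with a minimal sketch. The function names, the `server` client object, the momentum coefficient used to blend the historical global statistics with the current batch statistics, and the NumPy-based data loader are assumptions introduced for readability; the claims do not fix these details, and in particular they do not fix the exact rule for deriving the actual statistical parameters.

```python
# Minimal worker-side sketch (claims 1-4), assuming NumPy and a hypothetical
# `server` object that communicates with the parameter server.
import numpy as np

def get_target_batch(database, batch_size, rng):
    # Claim 2, first alternative: random sampling from the database.
    idx = rng.choice(len(database), size=batch_size, replace=False)
    return database[idx]

def train_target_layer(x, layer, server, layer_id, momentum=0.9, eps=1e-5):
    # Claim 1: receive the historical global statistical parameters for this target layer.
    hist_mean, hist_var = server.fetch_global_stats(layer_id)

    # Claim 3: target statistical parameters of the current batch
    # (a statistical value plus the sample quantity).
    batch_mean, batch_var, n = x.mean(axis=0), x.var(axis=0), x.shape[0]

    # Actual statistical parameters: an assumed momentum blend of historical and
    # current statistics; the first cycle has no history and uses the batch alone.
    if hist_mean is None:
        mean, var = batch_mean, batch_var
    else:
        mean = momentum * hist_mean + (1.0 - momentum) * batch_mean
        var = momentum * hist_var + (1.0 - momentum) * batch_var

    # Batch-normalize the target layer input with the actual statistics, then report
    # the batch statistics back to the parameter server (last step of claim 1).
    x_hat = (x - mean) / np.sqrt(var + eps)
    server.send_target_stats(layer_id, batch_mean, batch_var, n)
    return layer.forward(x_hat)
```

In the sketch, `server` stands for whatever client object the worker uses to talk to the parameter server; under claim 4 the worker would additionally send its locally trained layer parameters after the cycle and receive the updated target training model before the next cycle starts.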
PCT/CN2021/115544 2020-08-31 2021-08-31 Learning model training method, working node, server, device and medium WO2022042741A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010896348.9 2020-08-31
CN202010896348.9A CN112016699B (en) 2020-08-31 2020-08-31 Deep learning model training method, working node and parameter server

Publications (1)

Publication Number Publication Date
WO2022042741A1 (en) 2022-03-03

Family

ID=73503128

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/115544 WO2022042741A1 (en) 2020-08-31 2021-08-31 Learning model training method, working node, server, device and medium

Country Status (2)

Country Link
CN (1) CN112016699B (en)
WO (1) WO2022042741A1 (en)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016699B (en) * 2020-08-31 2024-02-02 北京灵汐科技有限公司 Deep learning model training method, working node and parameter server
CN114004358B (en) * 2021-12-29 2022-06-14 粤港澳大湾区数字经济研究院(福田) Deep learning model training method
CN116663639B (en) * 2023-07-31 2023-11-03 浪潮电子信息产业股份有限公司 Gradient data synchronization method, system, device and medium


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190026657A1 (en) * 2016-03-26 2019-01-24 Alibaba Group Holding Limited Distributed Cluster Training Method and Apparatus
CN107688493A (en) * 2016-08-05 2018-02-13 阿里巴巴集团控股有限公司 Train the method, apparatus and system of deep neural network
CN108122032A (en) * 2016-11-29 2018-06-05 华为技术有限公司 A kind of neural network model training method, device, chip and system
CN107578094A (en) * 2017-10-25 2018-01-12 济南浪潮高新科技投资发展有限公司 The method that the distributed training of neutral net is realized based on parameter server and FPGA
CN109754060A (en) * 2017-11-06 2019-05-14 阿里巴巴集团控股有限公司 A kind of training method and device of neural network machine learning model
CN108491928A (en) * 2018-03-29 2018-09-04 腾讯科技(深圳)有限公司 Model parameter training method, device, server and storage medium
CN112016699A (en) * 2020-08-31 2020-12-01 北京灵汐科技有限公司 Deep learning model training method, working node and parameter server

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117370471A (en) * 2023-12-07 2024-01-09 苏州元脑智能科技有限公司 Global prediction method, device, equipment and storage medium based on pruning average
CN117370471B (en) * 2023-12-07 2024-02-27 苏州元脑智能科技有限公司 Global prediction method, device, equipment and storage medium based on pruning average

Also Published As

Publication number Publication date
CN112016699B (en) 2024-02-02
CN112016699A (en) 2020-12-01

Similar Documents

Publication Publication Date Title
WO2022042741A1 (en) Learning model training method, working node, server, device and medium
US11687832B1 (en) Training a model using parameter server shards
US10540587B2 (en) Parallelizing the training of convolutional neural networks
CN111091199B (en) Federal learning method, device and storage medium based on differential privacy
WO2021056390A1 (en) Synchronous training method and cluster for convolutional neural network model, and readable storage medium
CN108009642B (en) Distributed machine learning method and system
CN107944566B (en) Machine learning method, main node, working node and system
CN111030861B (en) Edge calculation distributed model training method, terminal and network side equipment
US20100217734A1 (en) Method and system for calculating value of website visitor
EP3688673A1 (en) Neural architecture search
CN109816412A (en) A kind of training pattern generation method, device, equipment and computer storage medium
CN112990478A (en) Federal learning data processing system
JPWO2021119601A5 (en)
CN114760308A (en) Edge calculation unloading method and device
GB2615219A (en) Remote system update and monitoring
CN115374954A (en) Model training method based on federal learning, terminal and storage medium
CN115796289A (en) Client selection method and system for federated Bayesian learning
CN116011991A (en) Multi-user collaborative task guaranteeing method based on agent and backup technology
CN115713128A (en) Federal learning method based on equipment training time fairness
CN115987443A (en) Timing method and device and instrument equipment
CN110753366A (en) Prediction processing method and device for industry short message gateway capacity
CN116341687A (en) Federal learning self-adaptive optimization method based on number of clients and communication period
US20230145177A1 (en) Federated learning method and federated learning system based on mediation process
CN112506673B (en) Intelligent edge calculation-oriented collaborative model training task configuration method
CN115422787B (en) Engine simulation model balancing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21860573; Country of ref document: EP; Kind code of ref document: A1)

NENP Non-entry into the national phase (Ref country code: DE)

122 Ep: PCT application non-entry in European phase (Ref document number: 21860573; Country of ref document: EP; Kind code of ref document: A1)