WO2020015734A1 - Method and apparatus for updating parameters - Google Patents
Method and apparatus for updating parameters
- Publication number
- WO2020015734A1 (application PCT/CN2019/096774)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- training
- node
- batchsize
- samples
- training samples
- Prior art date
Classifications
- G—PHYSICS > G06—COMPUTING; CALCULATING OR COUNTING > G06F—ELECTRIC DIGITAL DATA PROCESSING > G06F8/00—Arrangements for software engineering > G06F8/60—Software deployment > G06F8/65—Updates
- G—PHYSICS > G06—COMPUTING; CALCULATING OR COUNTING > G06F—ELECTRIC DIGITAL DATA PROCESSING > G06F18/00—Pattern recognition > G06F18/20—Analysing > G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation > G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Definitions
- the present invention relates to the field of computer technology, and in particular, to a method and device for updating parameters.
- Deep learning has become a popular research direction due to its powerful representation and fitting capabilities. Deep learning models need to be trained before use.
- the training can use gradient descent.
- the training process can be as follows: input sample data into the model to be trained to obtain training values, and calculate the loss function Loss according to the training values and the true values of the sample data; calculate the gradient data of each parameter in the model to be trained according to the Loss, calculate the update parameters corresponding to each parameter according to the gradient data, and update the parameters in the model to be trained; repeat the above process until the Loss is less than a preset loss threshold, and then end the training.
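- For illustration only, the following is a minimal sketch of this loop in Python for a toy one-parameter linear model; the data, learning rate, and loss threshold are hypothetical and only show the structure described above, not the patent's actual model or training code.

```python
import numpy as np

# Toy setup (assumed): a linear model y = w * x trained by gradient descent.
x = np.array([1.0, 2.0, 3.0, 4.0])          # sample input data
y_true = np.array([2.0, 4.0, 6.0, 8.0])     # true values of the sample data
w = 0.0                                      # parameter of the model to be trained
learning_rate = 0.05                         # assumed value
loss_threshold = 1e-4                        # preset loss threshold (assumed)

while True:
    y_pred = w * x                                  # input sample data into the model -> training values
    loss = np.mean((y_pred - y_true) ** 2)          # loss from training values and true values
    if loss < loss_threshold:                       # end training once the loss is below the threshold
        break
    grad_w = np.mean(2 * (y_pred - y_true) * x)     # gradient of the loss with respect to the parameter
    w = w - learning_rate * grad_w                  # update the parameter from its gradient

print(f"trained parameter w = {w:.4f}, final loss = {loss:.6f}")
```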
- the model to be trained may be stored on each training node participating in the training, and the training node may be a terminal or a server.
- Multiple training nodes each use the same number (which can be called the BatchSize) of different sample data to calculate the gradient data of each parameter separately; the average of the gradient data obtained by the multiple training nodes is then calculated for each parameter, the update parameters are determined based on the average gradient data, and each training node then updates its parameters simultaneously according to the update parameters. The above process is then repeated to continue the training.
- This technique can be called a deep learning synchronous data parallel training technique.
- The computing capabilities of the training nodes differ. Therefore, when the training nodes calculate gradient data based on the same BatchSize of sample data at the same time, a training node with strong computing power finishes its calculation first and then has to wait for the training nodes with weak computing power to finish their calculations before subsequent processing can be performed. In this case, the computing resources of the faster training nodes are wasted and the training efficiency is reduced.
- a method for updating parameters includes:
- the performance parameters include at least one of: the central processing unit (CPU) model, the number of CPUs, the graphics processing unit (GPU) model, and the time taken to process a preset number of training samples.
- the corresponding number of training samples processed within a unit duration is determined according to the pre-stored correspondence between performance parameters and the number of training samples processed within a unit duration, and the performance parameters of each training node; according to the number of training samples processed within the unit duration, the number of training samples that can be processed by each training node within a preset duration is determined as the BatchSize corresponding to each training node.
- the number of training samples that can be processed by each training node within a preset duration is determined according to the number of processing training samples within the unit duration, and as the BatchSize corresponding to each training node, includes :
- the number of training samples processed within the unit duration is determined as the number of training samples that each training node can process within a preset duration, as the BatchSize corresponding to each training node.
- the method further includes:
- the update parameters are sent to each training node.
- a method for updating parameters includes:
- training processing is performed on the model to be trained according to the obtained training samples.
- the training samples include sample input data and output reference data
- the performing training processing on a model to be trained according to the obtained training samples includes:
- an apparatus for updating parameters includes:
- An acquisition module for acquiring performance parameters of each training node
- a determining module configured to determine, according to the performance parameters of each training node, the number of training samples that each training node can process within a preset length of time, as the BatchSize corresponding to each training node;
- a sending module is configured to send the determined BatchSize to the corresponding training nodes respectively.
- the performance parameters include at least one of: the central processing unit (CPU) model, the number of CPUs, the graphics processing unit (GPU) model, and the time taken to process a preset number of training samples.
- the determining module is configured to:
- the corresponding number of training samples processed within a unit duration is determined according to the pre-stored correspondence between performance parameters and the number of training samples processed within a unit duration, and the performance parameters of each training node; according to the number of training samples processed within the unit duration, the number of training samples that can be processed by each training node within a preset duration is determined as the BatchSize corresponding to each training node.
- the determining module is configured to:
- the number of training samples processed within the unit duration is determined as the number of training samples that each training node can process within a preset duration, as the BatchSize corresponding to each training node.
- the apparatus further includes:
- a receiving module configured to receive the gradient data sent by each training node after the determined BatchSize is sent to the corresponding training nodes;
- a calculation module for calculating an average value of the received gradient data
- the determining module is further configured to determine an update parameter of a model to be trained according to the average value
- the sending module is further configured to send the update parameter to each training node.
- an apparatus for updating parameters includes:
- a receiving module for receiving the BatchSize corresponding to the training node sent by the central node
- An acquisition module configured to acquire a corresponding number of training samples according to the BatchSize
- a training module is configured to perform training processing on a model to be trained according to the obtained training samples.
- the training samples include sample input data and output reference data
- the training module is configured to:
- a system for updating parameters includes a central node and a training node, where:
- the central node for performing the method according to the first aspect
- the training node is configured to execute the method described in the second aspect.
- a computer device includes a processor, a communication interface, a memory, and a communication bus.
- the processor, the communication interface, and the memory complete communication with each other through the bus.
- The memory is used to store a computer program; the processor is configured to execute the program stored in the memory to implement the method steps described in any one of the first aspect or the second aspect.
- A computer-readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the method for updating parameters described in the first aspect or the second aspect above.
- The central node determines the sample data batch size BatchSize of each training node according to the performance parameters of each training node, and the training node obtains a corresponding number of training samples according to the BatchSize and performs training processing on the model to be trained.
- the BatchSize of the training node is determined according to the computing performance of the training node.
- the BatchSize of the training node with the stronger computing performance is larger, and the BatchSize of the training node with the weaker computing performance is smaller.
- In this way, the training nodes with stronger computing performance use more training samples to calculate the gradient data, while the training nodes with weaker computing performance use fewer training samples to calculate the gradient data in the same amount of time, so that each training node consumes almost the same time. This avoids the situation in which a training node with strong computing power completes its calculation first and then has to wait for the training nodes with weak computing power to finish before subsequent processing can be performed, thereby avoiding wasting the computing resources of the training nodes and improving the training efficiency.
- FIG. 1 is a schematic flowchart of a device interaction for updating parameters according to an embodiment of the present invention
- FIG. 2 is a schematic flowchart of a device interaction for updating parameters according to an embodiment of the present invention
- FIG. 3 is a flowchart of a method for updating parameters according to an embodiment of the present invention.
- FIG. 4 is a flowchart of a method for updating parameters according to an embodiment of the present invention.
- FIG. 5 is a flowchart of a method for updating parameters according to an embodiment of the present invention.
- FIG. 6 is a flowchart of a method for updating parameters according to an embodiment of the present invention.
- FIG. 7 is a schematic structural diagram of an apparatus for updating parameters according to an embodiment of the present invention.
- FIG. 8 is a schematic structural diagram of an apparatus for updating parameters according to an embodiment of the present invention.
- FIG. 9 is a schematic structural diagram of an apparatus for updating parameters according to an embodiment of the present invention.
- FIG. 10 is a schematic structural diagram of a central node according to an embodiment of the present invention.
- FIG. 11 is a schematic structural diagram of a training node according to an embodiment of the present invention.
- An embodiment of the present invention provides a method for updating parameters.
- the method may be implemented by a central node and a training node together.
- the central node may be a terminal or a server; the training node may be a terminal or a server.
- the training node and the central node are both servers to form a server group, and in another case, the training node and the central node are both terminals.
- The central node and the training nodes can be separate physical devices. As shown in Figure 1, in this case the central node is only used for data interaction with the training nodes and does not itself participate in the process of training the model to be trained using the training samples.
- The central node and a training node can also be virtual modules built in the same physical device, i.e., the central node can be integrated with a training node in one physical device. As shown in Figure 2, the physical device in which the central node and a training node are integrated both performs data interaction with the training nodes in other physical devices and participates in the process of training the model to be trained using the training samples. The present invention does not limit this.
- the central node may include components such as a processor, a memory, and a transceiver.
- The processor may be a CPU (Central Processing Unit) or the like, and can be used to obtain the performance parameters of each training node, determine the BatchSize of each training node according to the performance parameters, and calculate the average value of the received gradient data.
- The memory may be RAM (Random Access Memory), Flash (flash memory), or the like, and can be used to store received data, data required for processing, data generated during processing, and so on, such as the performance parameters of each training node and the received gradient data.
- the transceiver can be used for data transmission with the terminal or other servers.
- the transceiver can include antennas, matching circuits, modems, etc. .
- the central node may further include a screen, an image detection component, an audio output component, an audio input component, and the like.
- the screen can be used to display training results.
- the image detection means may be a camera or the like.
- Audio output components can be speakers, headphones, etc.
- the audio input means may be a microphone or the like.
- the training node may include components such as a processor, a memory, and a transceiver.
- The processor may be a CPU (Central Processing Unit) or the like, and may be used to obtain training samples according to the BatchSize, obtain the output data corresponding to the training samples, calculate the gradient data, and perform parameter update processing on the parameters to be trained according to the update parameters.
- The memory may be RAM (Random Access Memory), Flash (flash memory), or the like, and can be used to store received data, data required for processing, and data generated during processing, such as the BatchSize, the training samples, the output data corresponding to the training samples, the gradient data corresponding to the parameters to be trained, the update parameters, and so on.
- the transceiver can be used for data transmission with the terminal or other servers, for example, sending the gradient data corresponding to the parameters to be trained to the central node, and receiving the corresponding BatchSize sent by the central node.
- the transceiver can include an antenna, a matching circuit, and a modem.
- the terminal may further include a screen, an image detection component, an audio output component, an audio input component, and the like. The screen can be used to display training results and so on.
- The transceiver can be used for data transmission with other devices, and can include antennas, matching circuits, and modems.
- An embodiment of the present invention provides a method for updating parameters.
- the method is applied to a central node.
- a processing flow of the method may include the following steps:
- step 301 performance parameters of each training node are obtained.
- step 302 according to the performance parameters of each training node, the number of training samples that can be processed by each training node within a preset duration is determined as the BatchSize corresponding to each training node.
- step 303 the determined BatchSize is sent to the corresponding training nodes respectively.
- the central node determines the batch size of each training node according to the performance parameters of each training node, so that the training node obtains a corresponding number of training samples according to the batch size, and performs training processing on the model to be trained.
- the BatchSize of the training node is determined according to the computing performance of the training node.
- the BatchSize of the training node with the stronger computing performance is larger, and the BatchSize of the training node with the weaker computing performance is smaller.
- In this way, the training nodes with stronger computing performance use more training samples to calculate the gradient data, while the training nodes with weaker computing performance use fewer training samples to calculate the gradient data in the same amount of time, so that each training node consumes almost the same time. This avoids the situation in which a training node with strong computing power completes its calculation first and then has to wait for the training nodes with weak computing power to finish before subsequent processing can be performed, thereby avoiding wasting the computing resources of the training nodes and improving the training efficiency.
- An embodiment of the present invention provides a method for updating parameters.
- the method is applied to a training node.
- a processing flow of the method may include the following steps:
- step 401 a BatchSize corresponding to the training node sent by the central node is received.
- step 402 a corresponding number of training samples are obtained according to the BatchSize.
- step 403 a training process is performed on the model to be trained according to the obtained training samples.
- the training node obtains a BatchSize determined based on the computing performance of the training node, obtains a corresponding number of training samples according to the BatchSize, and performs training processing on the model to be trained.
- the BatchSize of the training node is determined according to the computing performance of the training node.
- the BatchSize of the training node with the stronger computing performance is larger, and the BatchSize of the training node with the weaker computing performance is smaller.
- In this way, the training nodes with stronger computing performance use more training samples to calculate the gradient data, while the training nodes with weaker computing performance use fewer training samples to calculate the gradient data in the same amount of time, so that each training node consumes almost the same time. This avoids the situation in which a training node with strong computing power completes its calculation first and then has to wait for the training nodes with weak computing power to finish before subsequent processing can be performed, thereby avoiding wasting the computing resources of the training nodes and improving the training efficiency.
- An embodiment of the present invention provides a method for updating parameters.
- the method is applied to a central node and a training node.
- a processing flow of the method may include the following steps:
- step 501 the central node obtains performance parameters of each training node.
- the central node and the training node may be started first. After startup, the central node sends a performance parameter acquisition request to each training node.
- The training node that receives the performance parameter acquisition request stores and parses the request, then obtains its pre-stored performance parameters and its own node identifier, and sends the performance parameters and the node identifier to the central node.
- After receiving the performance parameters and node identifiers sent by each training node, the central node stores them and generates a node identifier table for all the training nodes participating in the training. The table includes the node identifiers and the corresponding performance parameters.
- The above performance parameters include at least one of: the central processing unit (CPU) model, the number of CPUs, the graphics processing unit (GPU) model, and the time taken to process a preset number of training samples.
- The CPU model refers to the model designation that a CPU manufacturer assigns to a CPU product according to its market positioning, for ease of classification and management.
- the CPU model can be said to be an important identifier for distinguishing the performance of the CPU.
- a CPU model can represent a fixed number of CPU cores and a fixed core frequency.
- the time taken to process a preset number of training samples is a parameter obtained and stored in advance by the training node.
- This parameter may be obtained by a technician testing a training node according to a preset number of training samples in advance, and then recording and storing the time spent processing the preset number of training samples.
- the parameter may also be that after each training node is started, each training node automatically obtains a preset number of training samples for testing, and automatically records and stores the time consumed for processing the preset number of training samples.
- The above testing process may be: inputting the sample data of a preset number of training samples into the model to be trained to obtain output data, and determining the gradient data of each parameter to be trained in the model to be trained according to the output data and the output reference data in the training samples.
- The time recorded is the total time from inputting the sample data to obtaining the gradient data.
- any solution that can acquire the time spent processing a preset number of training samples is acceptable, and the present invention is not limited thereto.
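- As one possible illustration of such a test, the sketch below times how long a node takes to process a preset number of training samples; `process_training_samples` is a hypothetical callable standing in for the node's forward-pass and gradient computation, and the helper names in the commented usage are likewise assumptions.

```python
import time

def measure_processing_time(process_training_samples, samples):
    """Record the total time from inputting the sample data to obtaining the
    gradient data. `process_training_samples` is a hypothetical callable that
    runs the forward pass and gradient computation for the given samples."""
    start = time.monotonic()
    process_training_samples(samples)        # forward pass + gradient computation
    return time.monotonic() - start          # time spent on the preset number of samples

# Hypothetical usage on a training node after startup:
# preset_samples = load_preset_training_samples(count=100)
# elapsed = measure_processing_time(run_forward_and_gradients, preset_samples)
# store_performance_parameter("time_for_100_samples", elapsed)
```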
- In step 502, according to the performance parameters of each training node, the central node determines the number of training samples that can be processed by each training node within a preset duration, as the BatchSize corresponding to each training node.
- The central node determines, according to the performance parameters of each training node, the number of training samples that each training node can process within a preset duration, as the BatchSize of each training node. In this way, the number of training samples processed by each training node can be determined according to its computing performance: a training node with stronger computing performance processes more training samples at one time, and a training node with weaker computing performance processes fewer training samples at one time.
- In this way, the time consumed by each training node to process its training samples is approximately the same, which avoids the situation in which the training nodes consume different amounts of time per training iteration and the training nodes with strong computing performance have to wait for the training nodes with weak computing performance to finish processing their training samples before proceeding to the next operation; this avoids wasting computing resources and further improves training efficiency.
- the central node may calculate the BatchSize of each training node according to the correspondence between the pre-stored performance parameters and the number of training samples processed in a unit duration.
- The corresponding processing steps may be as follows: according to the pre-stored correspondence between performance parameters and the number of training samples processed within a unit duration, and the performance parameters of each training node, determine the corresponding number of training samples processed within the unit duration; then, according to the number of training samples processed within the unit duration, determine the number of training samples that each training node can process within the preset duration, as the BatchSize corresponding to each training node.
- The BatchSize is used to indicate the number of training samples that a training node processes simultaneously during one training iteration.
- the preset duration is equivalent to the time consumed by each training node to complete a training session using training samples.
- the following uses the performance parameters of one training node as an example for description.
- the central node obtains the correspondence between the pre-stored performance parameters and the number of processing training samples per unit time.
- the correspondence is the result of a large number of experiments performed by the technicians in advance, and is then stored in the central node in the form of a correspondence table.
- A technician can conduct experiments with training nodes of different CPU models, use these training nodes to process training samples to obtain gradient data, obtain through testing the number of training samples that each training node can process within a unit duration, and record this number against the corresponding CPU model to obtain the correspondence table.
- There may be multiple correspondence tables between performance parameters and the number of training samples processed within a unit duration. These tables include at least the correspondence between the CPU model and the number of training samples processed within a unit duration (as shown in Table 1 below), the correspondence between the number of CPU cores and the number of training samples processed within a unit duration (as shown in Table 2 below), and the correspondence between the GPU model and the number of training samples processed within a unit duration (as shown in Table 3 below). If the performance parameter is the time taken to process a preset number of training samples, the number of training samples processed within a unit duration can be calculated by an algorithm.
- The central node may determine the type of the performance parameter of the training node, obtain the corresponding correspondence table, and query the table for the number of training samples processed within a unit duration that corresponds to the performance parameter of the training node. For example, if the central node determines that the performance parameter of the training node is the CPU model and that the CPU model is Intel i5, the central node obtains the correspondence table corresponding to Table 1, queries the number of training samples processed within a unit duration corresponding to Intel i5, and determines that the number of training samples processed within a unit duration corresponding to the performance parameter of the training node is 380.
- the performance parameters of the training node may be a combination of multiple parameters among the above parameters.
- In this case, the number of training samples processed within a unit duration corresponding to each parameter may be determined first, and the average of the determined numbers may then be used as the number of training samples processed within a unit duration for that training node.
- For this case, different training nodes can be selected for experiments based on the multiple parameters to construct a correspondence table. For example, training nodes with different combinations of the two parameters CPU model and GPU model can be selected for testing; these training nodes are used to process training samples to obtain gradient data, and after the experiment the number of training samples that each training node can process within a unit duration is obtained and recorded against the corresponding combination of CPU model and GPU model to obtain the correspondence table.
- After the number of training samples processed within a unit duration is determined, the number of training samples that the training node can process within the preset duration is determined. Specifically, the product of the number of training samples processed within a unit duration and a preset value can be determined as the number of training samples that each training node can process within the preset duration, as the BatchSize corresponding to each training node; or, the number of training samples processed within a unit duration can itself be determined as the number of training samples that each training node can process within the preset duration, as the BatchSize corresponding to each training node.
- For example, if the number of training samples processed within a unit duration by the training node is 380, the unit duration is 10 s, and the preset duration is set to 30 s, the preset value can be 3 (the preset duration divided by the unit duration), and the BatchSize of the training node is determined as 380 × 3 = 1140.
- The above steps take the central node determining the BatchSize of one training node as an example. For the other training nodes, the central node can perform the same processing steps to determine their BatchSize, which is not described in detail here.
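- A sketch of the BatchSize determination described in step 502 is given below, under the assumption that the correspondence tables are plain lookup dictionaries and that the preset value equals the preset duration divided by the unit duration (e.g., 30 s / 10 s = 3); the table contents mirror Tables 1 and 3, and all names are illustrative.

```python
# Correspondence tables (illustrative, mirroring Tables 1 and 3 below).
SAMPLES_PER_UNIT_BY_CPU_MODEL = {"Intel i3": 300, "Intel i5": 380, "AMD Ryzen 7 2700": 330}
SAMPLES_PER_UNIT_BY_GPU_MODEL = {"Intel(R)HD Graphics 630": 350, "GeForce GTX 760": 200}

def samples_per_unit_duration(performance_parameters):
    """Look up each reported performance parameter in its correspondence table;
    when several parameters are reported, average the looked-up numbers,
    as described above for combined performance parameters."""
    values = []
    if "cpu_model" in performance_parameters:
        values.append(SAMPLES_PER_UNIT_BY_CPU_MODEL[performance_parameters["cpu_model"]])
    if "gpu_model" in performance_parameters:
        values.append(SAMPLES_PER_UNIT_BY_GPU_MODEL[performance_parameters["gpu_model"]])
    return sum(values) / len(values)

def determine_batch_size(performance_parameters, preset_value=3):
    """BatchSize = number of samples processed per unit duration x preset value
    (first option above); preset_value=1 gives the second option."""
    return int(samples_per_unit_duration(performance_parameters) * preset_value)

# Example matching the text: Intel i5 -> 380 per 10 s unit duration, 30 s preset duration.
print(determine_batch_size({"cpu_model": "Intel i5"}, preset_value=3))  # 1140
```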
- step 503 the central node sends the determined BatchSize to the corresponding training nodes.
- After determining the BatchSize of each training node through the above steps, the central node obtains the node identifier of the training node and the BatchSize of the training node, obtains the node address corresponding to the training node according to the node identifier, and then sends the BatchSize corresponding to the training node to the training node according to the node address.
- step 504 the training node receives the BatchSize corresponding to the training node sent by the central node.
- the training node receives the BatchSize sent by the central node. Then, the previously stored model to be trained is initialized according to the received BatchSize and the initial parameters of the model.
- the initial parameters of the model can be stored in the training node in advance, and when used, the training node can directly obtain it from its own memory.
- the initial model parameters may also be stored in the central node, and the training node sends an initial parameter acquisition request to the central node.
- The central node sends the pre-stored initial parameters of the model to the training node according to the initial parameter acquisition request sent by the training node, and the training node receives and stores the initial parameters of the model. The present invention does not limit the way in which the initial parameters are obtained.
- step 505 according to the BatchSize, the training node obtains a corresponding number of training samples.
- After receiving the BatchSize sent by the central node, the training node obtains a corresponding number of training samples according to the BatchSize and starts processing the training samples for training.
- The training samples can come from any known training data set; which training data set is selected depends on the needs of the user (that is, a technician). Among all the training nodes that participate in training the model using the training samples, the training samples used by different nodes are not the same.
- the following provides several options for training nodes to obtain training samples:
- Training samples can be stored in the central node in advance.
- a training node needs to obtain training samples, it sends a training sample acquisition request to the central node.
- The training sample acquisition request carries the node identifier and the BatchSize of the training node. After receiving the training sample acquisition request, the central node obtains a corresponding number of training samples according to the BatchSize and sends the training samples to the training node according to the node identifier.
- The central node may adopt a sequential allocation algorithm and assign training samples to each training node in the order of the training sample set, or it may use any random non-repeating allocation algorithm to assign non-repeating training samples to each training node.
- the central node segments the stored training data set according to the number of training nodes, and each segment of the training data set corresponds to one training node.
- any scheme that allows the central node to allocate different training samples to each training node is acceptable, and the present invention does not limit this.
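- The following sketch illustrates the two allocation options mentioned above (sequential allocation and random non-repeating allocation) on the central node; the node identifiers and BatchSize values are hypothetical.

```python
import random

def allocate_sequential(training_set, batch_sizes):
    """Assign training samples to each node in the order of the training sample set.
    `batch_sizes` maps node_id -> BatchSize."""
    allocation, cursor = {}, 0
    for node_id, size in batch_sizes.items():
        allocation[node_id] = training_set[cursor:cursor + size]
        cursor += size
    return allocation

def allocate_random_non_repeating(training_set, batch_sizes):
    """Assign randomly chosen, non-repeating training samples to each node."""
    indices = list(range(len(training_set)))
    random.shuffle(indices)
    allocation, cursor = {}, 0
    for node_id, size in batch_sizes.items():
        allocation[node_id] = [training_set[i] for i in indices[cursor:cursor + size]]
        cursor += size
    return allocation

# Hypothetical usage with two training nodes of different BatchSize:
# batches = allocate_sequential(training_samples, {"node-1": 1140, "node-2": 760})
```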
- the training samples can be stored in each training node in advance, and the training nodes can directly obtain a corresponding number of training samples in their respective memories according to the BatchSize.
- the training samples stored in each training node may be different parts of the same training data set, that is, the training samples stored in each training node are different.
- the same training data set is stored in each training node, but the training data sets in different training nodes carry different segmentation identifiers, and each training node can only obtain a part of the training data set distinguished by the segmentation identifiers. Training samples.
- any scheme that allows each training node to obtain different training samples is acceptable, and the present invention does not limit this.
- the training samples can be stored in an independent storage server in advance.
- a training node needs to obtain training samples, it sends a training sample acquisition request to the storage server.
- the training sample acquisition request carries the node ID and BatchSize of the training node.
- the storage server obtains a corresponding number of training samples according to the BatchSize, and then sends the training samples to the training node according to the node identifier.
- the processing speed and bandwidth of the storage server are higher than those of the central node. Therefore, the training efficiency of the scheme of storing training samples on the storage server is higher than that of the scheme of storing training samples on the central node.
- step 506 the training node performs training processing on the model to be trained according to the obtained training samples.
- the training node uses the obtained training samples to perform training processing on the model to be trained.
- the above training samples may include sample input data and output reference data.
- The processing steps for the training node and the central node to train the model to be trained may be as follows. In step 601, the training node inputs the sample input data in the obtained training samples into the model to be trained and obtains the output data corresponding to the training samples. In step 602, according to the output reference data in the training samples and the output data, the training node determines the gradient data corresponding to each parameter to be trained in the model to be trained.
- In step 603, the training node sends the gradient data to the central node. In step 604, the central node receives the gradient data sent by the training nodes. In step 605, the central node calculates the average value of the received gradient data. In step 606, the central node determines the update parameters of the model to be trained according to the average value. In step 607, the central node sends the update parameters to each training node. In step 608, the training node receives the update parameters sent by the central node and, according to the update parameters, performs a parameter update on each parameter to be trained in the model to be trained.
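- For illustration, a sketch of one such iteration from the training node's side is given below, using a toy linear model and hypothetical transport callables (`send_to_central`, `recv_from_central`); it is not the patent's implementation, only the shape of steps 601-603 and 608.

```python
import numpy as np

def training_node_iteration(weights, batch, send_to_central, recv_from_central):
    """One iteration on a training node. `batch` is a list of
    (sample_input, output_reference) pairs obtained according to the node's
    BatchSize; the model here is a toy linear model y = x . W (an assumption)."""
    x = np.stack([sample for sample, _ in batch])        # sample input data
    y_ref = np.stack([ref for _, ref in batch])          # output reference data
    y_out = x @ weights                                  # step 601: output data of the model
    grad = x.T @ (y_out - y_ref) / len(batch)            # step 602: gradient data per parameter
    send_to_central(grad)                                # step 603: send gradient data to the central node
    # Steps 604-607 run on the central node (average gradients, compute update parameters).
    new_weights = recv_from_central()                    # step 608: receive the update parameters
    return new_weights                                   # replace the parameters to be trained
```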
- each training node obtains a corresponding number of training samples according to the corresponding BatchSize
- the interaction between a training node and a central node is taken as an example.
- Assume the BatchSize is n, that is, the training node obtains n training samples.
- the training node inputs the sample input data in the n training samples obtained into the model to be trained.
- the model to be trained processes the n sample input data at the same time and outputs n output data.
- Each training node obtains a corresponding number of training samples according to its BatchSize and processes these training samples simultaneously; the time consumed is the above-mentioned preset duration. That is, the time consumed by each training node is the same, which avoids wasting computing resources and further improves training efficiency.
- the Loss corresponding to each of the n output data is calculated according to the n output data and the output reference data in the corresponding training sample and the calculation formula for calculating the loss function Loss.
- the output data and the output reference data may both be in the form of long vectors, and the vector lengths of the output data and the output reference data are the same.
- Loss can be calculated according to the calculation formula of the cross-entropy loss function, such as the following formula (1).
- y represents the output reference data
- p represents the output data
- N represents the vector length of the output reference data or the output data.
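- The image of formula (1) is not reproduced in this text. A standard cross-entropy form consistent with the symbol definitions above is shown below as an assumption; the patent's exact expression may differ.

```latex
\mathrm{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\Big] \qquad (1)
```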
- After the n Loss values are calculated, the average of the n Loss values is calculated as the common Loss of this batch of training samples. Whether training needs to continue can then be judged based on the common Loss.
- A feasible judgment scheme may be to determine whether the common Loss is less than a preset Loss threshold. If the common Loss is less than the preset Loss threshold, it means that the gap between the output data corresponding to the sample input data and the output reference data is very small, so the training can be stopped and ended.
- Another feasible judgment scheme is to determine whether the common Loss has converged, that is, whether the common Loss of this iteration has changed relative to the common Loss of the previous iteration. If there is no change, it means that the common Loss has converged and continuing training cannot make the common Loss smaller, so the training is stopped and ended.
- At this time, the training node can send a notification message to the central node to notify the central node that the current model meets the training end condition. After receiving the notification message, the central node can end the training of the model.
- If the common Loss is greater than or equal to the preset Loss threshold, or the common Loss has not converged, it means that continuing the training can reduce the common Loss and improve the accuracy of the model, so the training can be continued.
- the gradient data of each parameter to be trained is calculated.
- The calculation formula for calculating the gradient data can be as shown in the following formula (2). The obtained gradient data and the node's own node identifier are then sent to the central node.
- where W_i is the current parameter value of the parameter to be trained, Loss is the common Loss, and ΔW_i is the gradient data.
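- The image of formula (2) is likewise not reproduced; a conventional form consistent with these definitions, given as an assumption, is:

```latex
\Delta W_i = \frac{\partial\, \mathrm{Loss}}{\partial W_i} \qquad (2)
```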
- After the central node receives the gradient data and node identifiers sent by the training nodes, it compares the received node identifiers with the node identifiers in the previously generated node identifier table. If the received node identifiers match the node identifiers in the node identifier table, it means that all the training nodes participating in the model training have completed the processing of the training samples and returned their gradient data.
- the central node calculates the average of the gradient data corresponding to each of the parameters to be trained among all the gradient data received, and finally obtains the average of the gradient data of all the parameters to be trained. Then, the center node calculates an update parameter of each to-be-trained parameter according to an average value of gradient data of each to-be-trained parameter and a preset learning rate, and a calculation formula may be as the following formula (3).
- where W_{i+1} is the update parameter of the parameter to be trained and α is the preset learning rate.
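- The image of formula (3) is not reproduced; the conventional gradient-descent update consistent with these definitions is shown below as an assumption, where the bar denotes the average of the gradient data across the training nodes.

```latex
W_{i+1} = W_i - \alpha \cdot \overline{\Delta W_i} \qquad (3)
```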
- After obtaining the update parameters of each parameter to be trained, the central node sends the update parameters to each training node separately. After receiving the update parameters, the training node updates each parameter to be trained in the model to be trained with the received update parameters. In this way, one round of synchronous data-parallel deep learning training is completed. After each training node updates the model parameters, it can continue to obtain the next BatchSize of training samples and perform the above training process again based on the newly obtained training samples.
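- A sketch of the central node's part of this round (steps 604-607) is given below, assuming the update rule of formula (3) above; the data structures and names are illustrative only.

```python
import numpy as np

def central_node_update(current_params, gradients_by_node, learning_rate):
    """Average the gradient data received from all training nodes parameter by
    parameter, then compute the update parameters with the preset learning rate.
    `current_params` maps parameter name -> current value; `gradients_by_node`
    maps node_id -> {parameter name: gradient array}. Assumed structure."""
    updated = {}
    for name, value in current_params.items():
        avg_grad = np.mean([grads[name] for grads in gradients_by_node.values()], axis=0)
        updated[name] = value - learning_rate * avg_grad   # update parameter of this parameter
    return updated                                          # sent back to every training node

# Hypothetical usage with two training nodes and one parameter vector w:
# new_params = central_node_update({"w": w}, {"node-1": {"w": g1}, "node-2": {"w": g2}}, 0.01)
```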
- some training samples can be divided into test samples, and the model is tested using the test samples.
- the obtained test data is compared with the corresponding output reference data to determine whether the obtained test data is correct.
- The proportion of correct test data among the obtained test data is calculated as the training accuracy.
- the test data is the output data obtained by inputting the sample input data in the test sample into the model.
- the training accuracy indicates the accuracy of a model output. When the training accuracy reaches the preset training accuracy threshold, it indicates that the model is sufficiently accurate and no further training is required. Therefore, when the training accuracy reaches the preset training accuracy threshold, the training is stopped and the training process ends. At this time, the training node may send a notification message to the center node to notify the center node that the current model meets the training end condition, and the center node may end the training of the model after receiving the notification message.
- a threshold for the number of iterations can also be set in advance, and the training is stopped when the number of iterations for the training reaches the threshold for the number of iterations, and the training process ends.
- the training node may send a notification message to the center node to notify the center node that the current model meets the training end condition, and the center node may end the training of the model after receiving the notification message.
- any method that can effectively judge the stop of the training is not limited to the above-mentioned methods, and the present invention does not limit this.
- the above training process is repeated until the above conditions for stopping training are reached, and the training is stopped, and each training node gets the same, trained model.
- the above-mentioned central node may be a terminal or a server.
- the training node can be a terminal or a server.
- the central node and the training node can be separate physical devices.
- In this case, the central node is only used for data interaction with the training nodes and does not participate in the process of training the model to be trained using the training samples. The central node and a training node can also be virtual modules established in the same physical device, that is, the central node can be integrated with a training node in one physical device; the physical device in which the central node and the training node are integrated then both interacts with the training nodes in other physical devices and participates in the process of training the model to be trained using the training samples.
- the data communication between the physical device where the central node is located and the physical device where other training nodes are located can use a parallel communication protocol.
- the central node determines the batch size of each training node according to the performance parameters of each training node, and the training node obtains a corresponding number of training samples according to the batch size, and performs training processing on the model to be trained.
- the BatchSize of the training node is determined according to the computing performance of the training node.
- the BatchSize of the training node with the stronger computing performance is larger, and the BatchSize of the training node with the weaker computing performance is smaller.
- In this way, the training nodes with stronger computing performance use more training samples to calculate the gradient data, while the training nodes with weaker computing performance use fewer training samples to calculate the gradient data in the same amount of time, so that each training node consumes almost the same time. This avoids the situation in which a training node with strong computing power completes its calculation first and then has to wait for the training nodes with weak computing power to finish before subsequent processing can be performed, thereby avoiding wasting the computing resources of the training nodes and improving the training efficiency.
- an embodiment of the present invention further provides a device for updating parameters.
- The device may be the central node in the foregoing embodiments. As shown in FIG. 7, the device includes an obtaining module 710, a determining module 720, and a sending module 730.
- the obtaining module 710 is configured to obtain performance parameters of each training node
- the determining module 720 is configured to determine, according to the performance parameters of each training node, the number of training samples that each training node can process within a preset length of time, as the BatchSize corresponding to each training node;
- the sending module 730 is configured to send the determined BatchSize to the corresponding training nodes, respectively.
- the performance parameters include at least one of: the central processing unit (CPU) model, the number of CPUs, the graphics processing unit (GPU) model, and the time taken to process a preset number of training samples.
- the determining module 720 is configured to:
- the corresponding number of training samples processed within a unit duration is determined according to the pre-stored correspondence between performance parameters and the number of training samples processed within a unit duration, and the performance parameters of each training node; according to the number of training samples processed within the unit duration, the number of training samples that can be processed by each training node within a preset duration is determined as the BatchSize corresponding to each training node.
- the determining module 720 is configured to:
- the number of training samples processed within the unit duration is determined as the number of training samples that each training node can process within a preset duration, as the BatchSize corresponding to each training node.
- the apparatus further includes:
- the receiving module 740 is configured to receive the gradient data sent by each training node after the determined BatchSize is sent to the corresponding training nodes;
- a calculation module 750 configured to calculate an average value of the received gradient data
- the determining module 720 is further configured to determine update parameters of a model to be trained according to the average value
- the sending module 730 is further configured to send the update parameter to each training node.
- The central node determines the sample data batch size BatchSize of each training node according to the performance parameters of each training node, so that the training node obtains a corresponding number of training samples according to the BatchSize and performs training processing on the model to be trained.
- the BatchSize of the training node is determined according to the computing performance of the training node.
- the BatchSize of the training node with the stronger computing performance is larger, and the BatchSize of the training node with the weaker computing performance is smaller.
- In this way, the training nodes with stronger computing performance use more training samples to calculate the gradient data, while the training nodes with weaker computing performance use fewer training samples to calculate the gradient data in the same amount of time, so that each training node consumes almost the same time. This avoids the situation in which a training node with strong computing power completes its calculation first and then has to wait for the training nodes with weak computing power to finish before subsequent processing can be performed, thereby avoiding wasting the computing resources of the training nodes and improving the training efficiency.
- an embodiment of the present invention further provides a device for updating parameters.
- the device may be a training node in the foregoing embodiment.
- the device includes a receiving module 910, an obtaining module 920, and a training module 930.
- the receiving module 910 is configured to receive a BatchSize corresponding to the training node sent by the central node;
- the obtaining module 920 is configured to obtain a corresponding number of training samples according to the BatchSize;
- the training module 930 is configured to perform training processing on a model to be trained according to the obtained training samples.
- the training samples include sample input data and output reference data
- the training module 930 is configured to:
- the training node obtains a BatchSize determined based on the computing performance of the training node, obtains a corresponding number of training samples according to the BatchSize, and performs training processing on the model to be trained.
- the BatchSize of the training node is determined according to the computing performance of the training node.
- the BatchSize of the training node with the stronger computing performance is larger, and the BatchSize of the training node with the weaker computing performance is smaller.
- In this way, the training nodes with stronger computing performance use more training samples to calculate the gradient data, while the training nodes with weaker computing performance use fewer training samples to calculate the gradient data in the same amount of time, so that each training node consumes almost the same time. This avoids the situation in which a training node with strong computing power completes its calculation first and then has to wait for the training nodes with weak computing power to finish before subsequent processing can be performed, thereby avoiding wasting the computing resources of the training nodes and improving the training efficiency.
- an embodiment of the present invention further provides a system for updating parameters.
- the system includes a central node and a training node, where:
- the central node is used to obtain the performance parameters of each training node; determine, according to the performance parameters of each training node, the number of training samples that can be processed by each training node within a preset duration as the BatchSize corresponding to each training node; and send the determined BatchSize to the corresponding training nodes respectively;
- the training node is configured to receive a BatchSize corresponding to the training node sent by the central node; obtain a corresponding number of training samples according to the BatchSize; and perform training processing on the model to be trained according to the obtained training samples.
- the central node determines the batch size of each training node according to the performance parameters of each training node, and the training node obtains a corresponding number of training samples according to the batch size, and performs training processing on the model to be trained.
- the BatchSize of the training node is determined according to the computing performance of the training node.
- the BatchSize of the training node with the stronger computing performance is larger, and the BatchSize of the training node with the weaker computing performance is smaller.
- In this way, the training nodes with stronger computing performance use more training samples to calculate the gradient data, while the training nodes with weaker computing performance use fewer training samples to calculate the gradient data in the same amount of time, so that each training node consumes almost the same time. This avoids the situation in which a training node with strong computing power completes its calculation first and then has to wait for the training nodes with weak computing power to finish before subsequent processing can be performed, thereby avoiding wasting the computing resources of the training nodes and improving the training efficiency.
- When the device for updating parameters provided in the foregoing embodiments updates parameters, the division into the above functional modules is used only as an example for description.
- The above-mentioned functions may be allocated to different functional modules as required; that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
- the device for updating parameters and the method for updating parameters provided by the foregoing embodiments belong to the same concept. For specific implementation processes, refer to the method embodiments, and details are not described herein again.
- FIG. 10 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
- the computer device may be a central node in the foregoing embodiment.
- The computer device 1000 may vary greatly due to different configurations or performance, and may include one or more processors (central processing units, CPU) 1001 and one or more memories 1002.
- The memory 1002 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 1001 to implement the following method steps for updating parameters:
- the at least one instruction is loaded and executed by the processor 1001 to implement the following method steps:
- the corresponding number of training samples processed within a unit duration is determined according to the pre-stored correspondence between performance parameters and the number of training samples processed within a unit duration, and the performance parameters of each training node; according to the number of training samples processed within the unit duration, the number of training samples that can be processed by each training node within a preset duration is determined as the BatchSize corresponding to each training node.
- the at least one instruction is loaded and executed by the processor 1001 to implement the following method steps:
- the number of training samples processed within the unit duration is determined as the number of training samples that each training node can process within a preset duration, as the BatchSize corresponding to each training node.
- the at least one instruction is loaded and executed by the processor 1001 to implement the following method steps:
- the update parameters are sent to each training node.
- the central node determines the batch size of each training node according to the performance parameters of each training node, so that the training node obtains a corresponding number of training samples according to the batch size, and performs training processing on the model to be trained.
- the BatchSize of the training node is determined according to the computing performance of the training node.
- the BatchSize of the training node with the stronger computing performance is larger, and the BatchSize of the training node with the weaker computing performance is smaller.
- In this way, the training nodes with stronger computing performance use more training samples to calculate the gradient data, while the training nodes with weaker computing performance use fewer training samples to calculate the gradient data in the same amount of time, so that each training node consumes almost the same time. This avoids the situation in which a training node with strong computing power completes its calculation first and then has to wait for the training nodes with weak computing power to finish before subsequent processing can be performed, thereby avoiding wasting the computing resources of the training nodes and improving the training efficiency.
- FIG. 11 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
- the computer device may be a training node in the foregoing embodiment.
- The computer device 1100 may vary greatly due to different configurations or performance, and may include one or more processors (central processing units, CPU) 1101 and one or more memories 1102.
- The memory 1102 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 1101 to implement the following method steps for updating parameters:
- training processing is performed on the model to be trained according to the obtained training samples.
- the at least one instruction is loaded and executed by the processor 1101 to implement the following method steps:
- the training node obtains a BatchSize determined based on the computing performance of the training node, obtains a corresponding number of training samples according to the BatchSize, and performs training processing on the model to be trained.
- the BatchSize of the training node is determined according to the computing performance of the training node.
- the BatchSize of the training node with the stronger computing performance is larger, and the BatchSize of the training node with the weaker computing performance is smaller.
- In this way, the training nodes with stronger computing performance use more training samples to calculate the gradient data, while the training nodes with weaker computing performance use fewer training samples to calculate the gradient data in the same amount of time, so that each training node consumes almost the same time. This avoids the situation in which a training node with strong computing power completes its calculation first and then has to wait for the training nodes with weak computing power to finish before subsequent processing can be performed, thereby avoiding wasting the computing resources of the training nodes and improving the training efficiency.
- a computer-readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the method of updating parameters in the above embodiments.
- the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
- the program may be stored in a computer-readable storage medium.
- the storage medium mentioned may be a read-only memory, a magnetic disk or an optical disk.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Mobile Radio Communication Systems (AREA)
- Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
CPU model | Number of training samples processed per unit duration |
---|---|
Intel i3 | 300 |
Intel i5 | 380 |
…… | …… |
AMD Ryzen 7 2700 | 330 |

Number of CPU cores | Number of training samples processed per unit duration |
---|---|
2 | 150 |
4 | 270 |
6 | 380 |
8 | 500 |

GPU model | Number of training samples processed per unit duration |
---|---|
Intel(R) HD Graphics 630 | 350 |
AMD Radeon(TM) R9 270 | 480 |
…… | …… |
GeForce GTX 760 | 200 |
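The tables above record a correspondence between a performance parameter (CPU model, number of CPU cores, or GPU model) and the number of training samples processed per unit duration. The sketch below, in Python, illustrates how such a pre-stored correspondence could be turned into a node's BatchSize in the manner described in claims 3 and 4 below; the dictionaries simply reuse the example values from the tables, and `PRESET_MULTIPLIER` is an assumed preset value, not one taken from the disclosure.

```python
# Illustrative lookup only; dictionary contents mirror the example tables above.
SAMPLES_PER_UNIT_BY_CPU_CORES = {2: 150, 4: 270, 6: 380, 8: 500}
SAMPLES_PER_UNIT_BY_GPU_MODEL = {
    "Intel(R) HD Graphics 630": 350,
    "AMD Radeon(TM) R9 270": 480,
    "GeForce GTX 760": 200,
}

PRESET_MULTIPLIER = 2  # assumed "preset value" for the product form of claim 4

def batch_size_from_performance(cpu_cores=None, gpu_model=None, multiplier=PRESET_MULTIPLIER):
    """Look up the samples processed per unit duration and scale it to a BatchSize.

    Claim 4 allows either the per-unit number itself or its product with a preset
    value to be used as the BatchSize; multiplier=1 corresponds to the former.
    """
    if gpu_model is not None:
        per_unit = SAMPLES_PER_UNIT_BY_GPU_MODEL[gpu_model]
    else:
        per_unit = SAMPLES_PER_UNIT_BY_CPU_CORES[cpu_cores]
    return per_unit * multiplier

print(batch_size_from_performance(cpu_cores=8))                                # 1000
print(batch_size_from_performance(gpu_model="GeForce GTX 760", multiplier=1))  # 200
```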
Claims (15)
- A method for updating parameters, characterized in that the method comprises: obtaining a performance parameter of each training node; determining, according to the performance parameter of each training node, the number of training samples that each training node can process within a preset duration, as the sample data batch size (BatchSize) corresponding to each training node; and sending each determined BatchSize to the corresponding training node.
- The method according to claim 1, characterized in that the performance parameter comprises at least one of: a central processing unit (CPU) model, a number of CPUs, a graphics processing unit (GPU) model, and a duration consumed to process a preset number of training samples.
- The method according to claim 1, characterized in that determining, according to the performance parameter of each training node, the number of training samples that each training node can process within a preset duration, as the BatchSize corresponding to each training node, comprises: determining a corresponding number of training samples processed per unit duration according to a pre-stored correspondence between performance parameters and numbers of training samples processed per unit duration, and according to the performance parameter of each training node; and determining, according to the number of training samples processed per unit duration, the number of training samples that each training node can process within the preset duration, as the BatchSize corresponding to each training node.
- The method according to claim 3, characterized in that determining, according to the number of training samples processed per unit duration, the number of training samples that each training node can process within the preset duration, as the BatchSize corresponding to each training node, comprises: determining a product of the number of training samples processed per unit duration and a preset value as the number of training samples that each training node can process within the preset duration, as the BatchSize corresponding to each training node; or determining the number of training samples processed per unit duration as the number of training samples that each training node can process within the preset duration, as the BatchSize corresponding to each training node.
- The method according to claim 1, characterized in that, after sending each determined BatchSize to the corresponding training node, the method further comprises: receiving gradient data sent by each training node; calculating an average value of the received gradient data; determining, according to the average value, update parameters of the model to be trained; and sending the update parameters to each training node.
- A method for updating parameters, characterized in that the method comprises: receiving, from a central node, the BatchSize corresponding to the present training node; obtaining a corresponding number of training samples according to the BatchSize; and performing training processing on a model to be trained according to the obtained training samples.
- The method according to claim 6, characterized in that the training samples comprise sample input data and output reference data; and performing training processing on the model to be trained according to the obtained training samples comprises: inputting the sample input data of the obtained training samples into the model to be trained to obtain output data corresponding to the training samples; determining, according to the output reference data of the training samples and the output data, gradient data corresponding to each parameter to be trained in the model to be trained; sending the gradient data to the central node; and receiving update parameters sent by the central node and updating each parameter to be trained in the model to be trained according to the update parameters.
- An apparatus for updating parameters, characterized in that the apparatus comprises: an obtaining module configured to obtain a performance parameter of each training node; a determining module configured to determine, according to the performance parameter of each training node, the number of training samples that each training node can process within a preset duration, as the sample data batch size (BatchSize) corresponding to each training node; and a sending module configured to send each determined BatchSize to the corresponding training node.
- The apparatus according to claim 8, characterized in that the performance parameter comprises at least one of: a central processing unit (CPU) model, a number of CPUs, a graphics processing unit (GPU) model, and a duration consumed to process a preset number of training samples.
- The apparatus according to claim 8, characterized in that the determining module is configured to: determine a corresponding number of training samples processed per unit duration according to a pre-stored correspondence between performance parameters and numbers of training samples processed per unit duration, and according to the performance parameter of each training node; and determine, according to the number of training samples processed per unit duration, the number of training samples that each training node can process within the preset duration, as the BatchSize corresponding to each training node.
- The apparatus according to claim 10, characterized in that the determining module is configured to: determine a product of the number of training samples processed per unit duration and a preset value as the number of training samples that each training node can process within the preset duration, as the BatchSize corresponding to each training node; or determine the number of training samples processed per unit duration as the number of training samples that each training node can process within the preset duration, as the BatchSize corresponding to each training node.
- The apparatus according to claim 8, characterized in that the apparatus further comprises: a receiving module configured to receive, after each determined BatchSize has been sent to the corresponding training node, gradient data sent by each training node; and a calculating module configured to calculate an average value of the received gradient data; wherein the determining module is further configured to determine, according to the average value, update parameters of the model to be trained, and the sending module is further configured to send the update parameters to each training node.
- An apparatus for updating parameters, characterized in that the apparatus comprises: a receiving module configured to receive, from a central node, the BatchSize corresponding to the present training node; an obtaining module configured to obtain a corresponding number of training samples according to the BatchSize; and a training module configured to perform training processing on a model to be trained according to the obtained training samples.
- The apparatus according to claim 13, characterized in that the training samples comprise sample input data and output reference data; and the training module is configured to: input the sample input data of the obtained training samples into the model to be trained to obtain output data corresponding to the training samples; determine, according to the output reference data of the training samples and the output data, gradient data corresponding to each parameter to be trained in the model to be trained; send the gradient data to the central node; and receive update parameters sent by the central node and update each parameter to be trained in the model to be trained according to the update parameters.
- A system for updating parameters, characterized in that the system comprises a central node and training nodes, wherein: the central node is configured to obtain a performance parameter of each training node, determine, according to the performance parameter of each training node, the number of training samples that each training node can process within a preset duration, as the sample data batch size (BatchSize) corresponding to each training node, and send each determined BatchSize to the corresponding training node; and the training node is configured to receive, from the central node, the BatchSize corresponding to the present training node, obtain a corresponding number of training samples according to the BatchSize, and perform training processing on a model to be trained according to the obtained training samples.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810803723.3A CN110737446B (zh) | 2018-07-20 | 2018-07-20 | 更新参数的方法和装置 |
CN201810803723.3 | 2018-07-20 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020015734A1 true WO2020015734A1 (zh) | 2020-01-23 |
Family
ID=69165013
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/096774 WO2020015734A1 (zh) | 2018-07-20 | 2019-07-19 | 更新参数的方法和装置 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110737446B (zh) |
WO (1) | WO2020015734A1 (zh) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102298569A (zh) * | 2010-06-24 | 2011-12-28 | 微软公司 | 在线学习算法的并行化 |
CN106203395A (zh) * | 2016-07-26 | 2016-12-07 | 厦门大学 | 基于多任务深度学习的人脸属性识别方法 |
CN107203809A (zh) * | 2017-04-20 | 2017-09-26 | 华中科技大学 | 一种基于Keras的深度学习自动化调参方法及系统 |
US20170308789A1 (en) * | 2014-09-12 | 2017-10-26 | Microsoft Technology Licensing, Llc | Computing system for training neural networks |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105184367B (zh) * | 2014-06-09 | 2018-08-14 | 讯飞智元信息科技有限公司 | 深度神经网络的模型参数训练方法及系统 |
CN104035751B (zh) * | 2014-06-20 | 2016-10-12 | 深圳市腾讯计算机系统有限公司 | 基于多图形处理器的数据并行处理方法及装置 |
CN106033371B (zh) * | 2015-03-13 | 2019-06-21 | 杭州海康威视数字技术股份有限公司 | 一种视频分析任务的调度方法及系统 |
CN106991072B (zh) * | 2016-01-21 | 2022-12-06 | 杭州海康威视数字技术股份有限公司 | 在线自学习事件检测模型更新方法及装置 |
CN107622274B (zh) * | 2016-07-15 | 2020-06-02 | 北京市商汤科技开发有限公司 | 用于图像处理的神经网络训练方法、装置以及计算机设备 |
CN107665349B (zh) * | 2016-07-29 | 2020-12-04 | 腾讯科技(深圳)有限公司 | 一种分类模型中多个目标的训练方法和装置 |
CN106293942A (zh) * | 2016-08-10 | 2017-01-04 | 中国科学技术大学苏州研究院 | 基于多机多卡的神经网络负载均衡优化方法和系统 |
US20180144244A1 (en) * | 2016-11-23 | 2018-05-24 | Vital Images, Inc. | Distributed clinical workflow training of deep learning neural networks |
-
2018
- 2018-07-20 CN CN201810803723.3A patent/CN110737446B/zh active Active
-
2019
- 2019-07-19 WO PCT/CN2019/096774 patent/WO2020015734A1/zh active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102298569A (zh) * | 2010-06-24 | 2011-12-28 | 微软公司 | 在线学习算法的并行化 |
US20170308789A1 (en) * | 2014-09-12 | 2017-10-26 | Microsoft Technology Licensing, Llc | Computing system for training neural networks |
CN106203395A (zh) * | 2016-07-26 | 2016-12-07 | 厦门大学 | 基于多任务深度学习的人脸属性识别方法 |
CN107203809A (zh) * | 2017-04-20 | 2017-09-26 | 华中科技大学 | 一种基于Keras的深度学习自动化调参方法及系统 |
Also Published As
Publication number | Publication date |
---|---|
CN110737446B (zh) | 2021-10-12 |
CN110737446A (zh) | 2020-01-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112561078B (zh) | 分布式的模型训练方法及相关装置 | |
WO2021056390A1 (zh) | 卷积神经网络模型同步训练方法、集群及可读存储介质 | |
CN107885595B (zh) | 一种资源分配方法、相关设备及系统 | |
US11436050B2 (en) | Method, apparatus and computer program product for resource scheduling | |
WO2021088964A1 (zh) | 推理系统、推理方法、电子设备及计算机存储介质 | |
CN109660367B (zh) | 基于改进Raft算法的共识达成方法、装置与电子设备 | |
US8032900B2 (en) | Conducting client-server inter-process communication | |
CN106020984B (zh) | 电子设备中进程的创建方法及装置 | |
CN112862112A (zh) | 联邦学习方法、存储介质、终端、服务器、联邦学习系统 | |
CN111355814A (zh) | 一种负载均衡方法、装置及存储介质 | |
CN107544845B (zh) | Gpu资源调度方法及装置 | |
WO2020015734A1 (zh) | 更新参数的方法和装置 | |
WO2022110796A1 (zh) | 云服务请求响应方法及装置、电子设备和存储介质 | |
CN114579311B (zh) | 执行分布式计算任务的方法、装置、设备以及存储介质 | |
CN115563160A (zh) | 数据处理方法、装置、计算机设备和计算机可读存储介质 | |
JP2020003860A (ja) | 学習システム、処理装置、処理方法、およびプログラム | |
CN114021715A (zh) | 基于Tensorflow框架的深度学习训练方法 | |
Surya et al. | Dynamic resource allocation for distributed Tensorflow training in kubernetes cluster | |
CN115860114B (zh) | 深度学习模型的训练方法、装置、电子设备及存储介质 | |
CN112445698A (zh) | 虚拟服务节点性能测试方法、装置和计算机可读存储介质 | |
CN111553379B (zh) | 基于异步训练的图像数据处理方法和系统 | |
WO2022120993A1 (zh) | 在线场景的资源分配方法、装置及电子设备 | |
CN117519910B (zh) | 用于虚拟机的计算快速链接内存确定方法和装置 | |
CN114238004B (zh) | 互联电路的数据传输正确性检查方法及装置、电子设备 | |
WO2020062303A1 (zh) | 训练神经网络的方法和装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19838506 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19838506 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06.08.2021) |
|