CN111461343A - Model parameter updating method and related equipment thereof - Google Patents

Model parameter updating method and related equipment thereof

Info

Publication number
CN111461343A
Authority
CN
China
Prior art keywords
parameter
information
local
training
model
Prior art date
Legal status
Granted
Application number
CN202010234711.0A
Other languages
Chinese (zh)
Other versions
CN111461343B (en)
Inventor
吴志华
于佃海
程默
汤伟
马琳
董大祥
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Publication of CN111461343A
Application granted
Publication of CN111461343B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a model parameter updating method and relevant equipment thereof, and relates to the technical field of deep learning. The specific implementation scheme is as follows: acquiring local incremental information, and updating the parameter snapshot information of the model by combining the local incremental information; reporting the local incremental information to a parameter server node in the distributed training system, and receiving global parameter information returned by the parameter server node; and updating the local parameter information and the parameter snapshot information of the model according to the global parameter information.

Description

Model parameter updating method and related equipment thereof
Cross Reference to Related Applications
This application claims priority to Chinese patent application No. 202010179455.X, entitled "Model parameter updating method and related equipment", filed on March 13, 2020 by Beijing Baidu Netcom Science and Technology Co., Ltd.
Technical Field
The application relates to the technical field of data processing, in particular to the technical field of deep learning, and particularly relates to a model parameter updating method and related equipment.
Background
Current distributed deep learning algorithms involve a parameter server node and a plurality of training nodes. Common distributed deep learning algorithms mainly include the EA-SGD algorithm (a stochastic gradient descent-based algorithm) and the like. In such an algorithm, each training node holds the full training data and trains a deep model; after a training round is reached, it sends its model parameter increment to the parameter server node and updates its local parameters according to the global parameter increment returned by the parameter server node, until training is finished.
However, in the above algorithm, the training process of a training node is halted while it communicates with the parameter server node. The proportion of time spent on communication is therefore high, which slows down the convergence of the model and reduces the training speed.
Disclosure of Invention
The application provides a model parameter updating method and related equipment thereof. By combining parameter snapshot information, the method keeps training and communication running in parallel, reduces the proportion of time spent on communication, and lessens the slowdown in training caused by communication. Meanwhile, by updating with incremental information, the locally optimized values of the parameters are preserved, the useful information explored by the training nodes is retained, and the convergence speed of the model is further improved.
An embodiment of a first aspect of the present application provides a model parameter updating method, including: acquiring local incremental information, and updating the parameter snapshot information of the model by combining the local incremental information; reporting the local incremental information to a parameter server node in a distributed training system, and receiving global parameter information returned by the parameter server node; and updating the local parameter information and the parameter snapshot information of the model according to the global parameter information.
In an embodiment of the present application, the obtaining local incremental information includes: and determining parameter difference information according to the local parameter information and the parameter snapshot information of the model, and determining the local incremental information according to the parameter difference information and the preset weight.
In one embodiment of the present application, the weight is the reciprocal of the total number of training nodes performing the model training in the distributed training system.
In one embodiment of the present application, the local incremental information includes: the identification of the training node, the local parameters to be reported in the model, and the corresponding local increments. The local parameters to be reported are: the dense local parameters of the model, and those sparse local parameters whose values have changed.
In an embodiment of the present application, the updating, according to the global parameter information, the local parameter information and the parameter snapshot information of the model includes: determining global incremental information according to the global parameter information and the parameter snapshot information; and updating the local parameter information of the model according to the global increment information, and determining the global parameter information as the parameter snapshot information of the model.
In an embodiment of the present application, updating the local parameter information of the model according to the global incremental information includes: aiming at each local parameter of the model, acquiring a global increment numerical value corresponding to the local parameter in the global increment information; and adding the global increment numerical value and the current numerical value of the local parameter, and taking the addition result as the current numerical value of the local parameter.
According to the model parameter updating method of the embodiments of the application, local incremental information is obtained, and the parameter snapshot information of the model is updated in combination with the local incremental information; the local incremental information is reported to a parameter server node in a distributed training system, and global parameter information returned by the parameter server node is received; and the local parameter information and the parameter snapshot information of the model are updated according to the global parameter information. By combining the parameter snapshot information, the method keeps training and communication running in parallel, reduces the proportion of time spent on communication, and lessens the slowdown in training caused by communication. Meanwhile, by updating with incremental information, the locally optimized values of the parameters are preserved, the useful information explored by the training nodes is retained, and the convergence speed of the model is further improved.
The embodiment of the second aspect of the present application provides another model parameter updating method, including: receiving incremental information reported by at least one training node in a distributed training system; updating the global parameter information of the model according to the local incremental information reported by the at least one training node to obtain updated global parameter information; and returning the updated global parameter information to the training node.
In an embodiment of the present application, the local incremental information reported by the training node is obtained by determining parameter difference information according to the local parameter information and the parameter snapshot information of the model on the training node; and determining the local incremental information according to the parameter difference information and a preset weight.
In one embodiment of the present application, the local incremental information includes: the identification of the training node, the local parameters to be reported of the model on the training node, and the corresponding local increments. The local parameters to be reported are: the dense local parameters of the model on the training node, and those sparse local parameters whose values have changed.
In an embodiment of the present application, the updating global parameter information of the model according to the local incremental information reported by the at least one training node to obtain updated global parameter information includes: inquiring local increment information reported by the at least one training node aiming at each global parameter in the global parameter information to obtain at least one local increment corresponding to the global parameter; and adding the at least one local increment and the numerical value of the global parameter to obtain an updated global parameter.
The model parameter updating method of the embodiment of the application receives local incremental information reported by at least one training node in a distributed training system; updates the global parameter information of the model according to the local incremental information reported by the at least one training node to obtain updated global parameter information; and returns the updated global parameter information to the training node. By combining the parameter snapshot information, the method keeps training and communication running in parallel, reduces the proportion of time spent on communication, and lessens the slowdown in training caused by communication. Meanwhile, by updating with incremental information, the locally optimized values of the parameters are preserved, the useful information explored by the training nodes is retained, and the convergence speed of the model is further improved.
An embodiment of a third aspect of the present application provides a training node, including: the acquisition module is used for acquiring local incremental information and updating the parameter snapshot information of the model by combining the local incremental information; the reporting module is used for reporting the local incremental information to a parameter server node in a distributed training system and receiving global parameter information returned by the parameter server node; and the updating module is used for updating the local parameter information and the parameter snapshot information of the model according to the global parameter information.
In an embodiment of the present application, the obtaining module is specifically configured to determine parameter difference information according to local parameter information of the model and parameter snapshot information; and determining the local incremental information according to the parameter difference information and a preset weight.
In one embodiment of the present application, the weight is an inverse of a total number of training nodes in the distributed training system for the model training.
In one embodiment of the present application, the local incremental information includes: the identification of the training node, the local parameters to be reported in the model and the corresponding local increment; the local parameters to be reported are as follows: dense local parameters in the model, and parameters in which numerical values vary among the sparse local parameters.
In an embodiment of the present application, the update module is specifically configured to determine global incremental information according to the global parameter information and the parameter snapshot information; and updating the local parameter information of the model according to the global increment information, and determining the global parameter information as the parameter snapshot information of the model.
In an embodiment of the application, the updating module is specifically configured to, for each local parameter of the model, obtain a global increment value corresponding to the local parameter in the global increment information, add the global increment value to the current value of the local parameter, and take an addition result as the current value of the local parameter.
An embodiment of a fourth aspect of the present application provides a parameter server node, including: the receiving module is used for receiving local incremental information reported by at least one training node in the distributed training system; the updating module is used for updating the global parameter information of the model according to the local incremental information reported by the at least one training node to obtain updated global parameter information; and the return module is used for returning the updated global parameter information to the training node.
In an embodiment of the present application, the local incremental information reported by the training node is obtained by determining parameter difference information according to the local parameter information and the parameter snapshot information of the model on the training node; and determining the local incremental information according to the parameter difference information and a preset weight.
In one embodiment of the present application, the local incremental information includes: identification of a training node, local parameters to be reported of a model on the training node and corresponding local increments; the local parameters to be reported are dense local parameters in a model on the training node and parameters with changed values in the sparse local parameters.
In an embodiment of the present application, the updating module is specifically configured to, for each global parameter in the global parameter information, query local increment information reported by the at least one training node, and obtain at least one local increment corresponding to the global parameter; and adding the at least one local increment and the numerical value of the global parameter to obtain an updated global parameter.
An embodiment of a fifth aspect of the present application provides a distributed training system, including: a parameter server node, and a plurality of training nodes; the parameter server node is connected with each training node in the plurality of training nodes; each of the plurality of training nodes is configured to perform the method according to the first aspect of the present application; the parameter server node is configured to perform the method according to the embodiment of the second aspect of the present application.
An embodiment of a sixth aspect of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model parameter update method as described above.
An embodiment of a seventh aspect of the present application proposes a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the model parameter updating method as described above.
Other effects of the above-described alternative will be described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present application;
FIG. 2 is a schematic diagram according to a second embodiment of the present application;
FIG. 3 is a schematic illustration according to a third embodiment of the present application;
FIG. 4 is a schematic flow chart of an algorithm according to an embodiment of the present application;
FIG. 5 is a schematic illustration according to a fourth embodiment of the present application;
FIG. 6 is a schematic illustration according to a fifth embodiment of the present application;
FIG. 7 is a schematic illustration according to a sixth embodiment of the present application;
FIG. 8 is a schematic illustration according to a seventh embodiment of the present application;
FIG. 9 is a block diagram of an electronic device for implementing a model parameter update method according to an embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The model parameter updating method and the related device according to the embodiment of the present application are described below with reference to the drawings.
Fig. 1 is a schematic diagram according to a first embodiment of the present application.
As shown in fig. 1, the model parameter updating method is implemented as follows:
step 101, obtaining local incremental information, and updating the parameter snapshot information of the model by combining the local incremental information.
In the embodiment of the application, the training node can acquire the local incremental information and update the parameter snapshot information of the model according to the local incremental information. It should be noted that the local incremental information may include, but is not limited to, a local increment corresponding to each local parameter; the manner of obtaining the local incremental information is described in the following embodiments. Each training node may store parameter snapshot information, and the parameter snapshot information may include snapshot values for a plurality of parameters; before training, the initial snapshot value of each parameter may be 0. For example, the parameter snapshot information may include the snapshot value corresponding to each local parameter. As an example, global incremental information may be determined based on a comparison of the parameter snapshot information and the global parameter information. In the embodiment of the application, in order to keep the parameter snapshot information and the global parameter information following the same change trend, the training node may obtain the local incremental information, update the parameter snapshot information of the corresponding model according to the local incremental information, and report the local incremental information.
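As a minimal illustration of the per-node state assumed in this step (the function name and the dict-of-floats representation are illustrative, not part of the patent), the snapshot can simply start at zero for every parameter:

```python
def init_snapshot(local_params):
    """Create this training node's parameter snapshot; before training, every snapshot value is 0."""
    return {name: 0.0 for name in local_params}

# e.g. snapshot = init_snapshot(local_params)  # local_params: {parameter name: current value}
```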
And 102, reporting the local incremental information to a parameter server node in the distributed training system, and receiving the global parameter information returned by the parameter server node.
In the embodiment of the present application, when the parameter snapshot information of the model is updated in combination with the local incremental information, the training node may report the local incremental information to the parameter server node in the distributed training system. The parameter server node can generate global parameter information according to the local incremental information reported by each training node, the global parameter information is used as return information corresponding to each training node, and the return information is returned to the corresponding training node, so that the training node can receive the global parameter information returned by the parameter server node.
And 103, updating the local parameter information and the parameter snapshot information of the model according to the global parameter information.
Optionally, global incremental information is determined according to the global parameter information and the parameter snapshot information, local parameter information of the model is updated according to the global incremental information, and the global parameter information is determined as the parameter snapshot information of the model.
In order to ensure that the training nodes do not get trapped in a local optimum of the model, the overall convergence direction must be kept consistent. In the embodiment of the application, when the parameter server node returns the global parameter information, the training node compares the global parameter information with the parameter snapshot information, and the comparison result is taken as the global incremental information. Then the newly obtained current value of each local parameter is updated according to the global incremental information, and at the same time the parameter snapshot information is updated in preparation for the next communication. For example, for each local parameter, the newly obtained current value of the local parameter is added to the corresponding global increment, and the result of the addition is taken as the current value of that local parameter.
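The following is a minimal sketch of step 103 as just described, assuming parameters are held in plain dicts keyed by parameter name; the function and variable names are illustrative, not taken from the patent:

```python
def apply_global_update(local_params, snapshot, global_params):
    """Update local parameters and the parameter snapshot from the returned global parameter information."""
    for name, global_value in global_params.items():
        global_increment = global_value - snapshot[name]  # global increment vs. the stored snapshot
        local_params[name] += global_increment            # add the increment, keeping local progress
        snapshot[name] = global_value                      # the snapshot now tracks the global state
    return local_params, snapshot
```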
In conclusion, combining the parameter snapshot information keeps training and communication running in parallel, reduces the proportion of time spent on communication, and lessens the slowdown in training caused by communication. Meanwhile, by updating with incremental information, the locally optimized values of the parameters are preserved, the useful information explored by the training nodes is retained, and the convergence speed of the model is further improved.
In addition, in the embodiment of the application, the model can be used for image recognition, image comparison, text semantic recognition and the like. For example, when the model is used for image recognition, if the model parameters are updated in the model training process by using the embodiment shown in fig. 1, the accuracy of the model obtained by training can be improved and the number of images required by training can be reduced because the model retains local optimization; and the parameter snapshot information and the incremental updating are used, so that the parallel implementation of parameter communication and model training can be realized, the training speed of the model is improved, the model with high accuracy can be obtained as soon as possible for image recognition, the image recognition efficiency is improved, and the image recognition cost is reduced.
For example, when the model is used for semantic recognition of a text to obtain keywords in the text, if the embodiment shown in fig. 1 is used to update the model parameters in the model training process, the model retains local optimization, so that the accuracy of the trained model can be improved, and the text number and the text labeling cost required by training can be reduced; and the parameter snapshot information and the incremental updating are used, so that the parallel implementation of parameter communication and model training can be realized, the training speed of the model is improved, the model with high accuracy can be obtained as soon as possible and used for text semantic recognition, the efficiency of the text semantic recognition is improved, and the cost of the text semantic recognition is reduced.
As shown in fig. 2, fig. 2 is a schematic diagram according to a second embodiment of the present application, and a specific implementation process of the model parameter updating method is as follows:
step 201, in the process of training the model by using the training data, judging whether a preset communication condition is met.
In the embodiment of the application, the training node trains the model by adopting the training data, and judges whether the preset communication condition is met in real time in the training process until the preset training end condition is met. It should be noted that the training data of the training node may include, but is not limited to, a plurality of small batch sample data; the communication condition can be that the batch number of the small batch of sample data is larger than a preset batch number threshold value; the preset training end condition may be that the model achieves an expected training result. The communication condition may be, for example, that a preset communication time point is reached.
In addition, it can be understood that a training node trains the model with training data, so the training data of the training node needs to be acquired in advance. Therefore, before the model is trained with the training data, the training data assigned to the training node may be acquired in advance. Optionally, the training data is the full training data, or a segment of the training data allocated to the training node according to the total number of training nodes performing model training in the distributed training system and the full training data. For example, the full training data is split into segments according to this total number, and each training node is assigned one segment. In this way, a training node interacts with the parameter server only after training several mini-batches locally, which reduces the proportion of time spent on communication.
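As an illustration of the segmented-training-data option, the sketch below splits the full training data evenly across nodes by index; the round-robin strategy and the function name are assumptions for illustration only:

```python
def shard_training_data(full_dataset, total_nodes, node_index):
    """Return the slice of the full training data assigned to one training node (round-robin split)."""
    return [sample for i, sample in enumerate(full_dataset) if i % total_nodes == node_index]

# e.g. node 2 of 4 trains only on its own segment between communications:
# local_data = shard_training_data(full_dataset, total_nodes=4, node_index=2)
```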
And step 202, when the preset communication condition is met, acquiring local incremental information, updating the parameter snapshot information of the model by combining the local incremental information, and continuing training.
In the embodiment of the application, when a communication round is reached on the training node, that is, when the number of trained mini-batches of sample data is greater than the preset batch-number threshold, the training node may acquire the local incremental information, update the parameter snapshot information of the model according to the local incremental information, and continue training the model. It should be noted that the local incremental information may include, but is not limited to: the identification of the training node, the local parameters to be reported in the model, and the corresponding local increments. The local parameters to be reported may include, but are not limited to, the dense local parameters of the model and those sparse local parameters whose values have changed.
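The following sketch shows one way the local parameters to be reported could be selected, assuming each parameter is known to be dense or sparse and that a changed sparse parameter is detected by comparing its current value with its snapshot value; the helper name and this detection rule are assumptions for illustration:

```python
def select_params_to_report(local_params, snapshot, sparse_names):
    """Pick the local parameters to include in the reported incremental information."""
    to_report = {}
    for name, value in local_params.items():
        if name not in sparse_names:
            to_report[name] = value              # dense local parameters: always reported
        elif value != snapshot[name]:
            to_report[name] = value              # sparse local parameters: reported only when changed
    return to_report
```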
Alternatively, as shown in fig. 3, fig. 3 is a schematic view according to a third embodiment of the present application. The specific implementation process for determining the local incremental information is as follows:
step 301, obtaining local parameter information and parameter snapshot information of the model, and determining parameter difference information.
The local parameter information may include: each local parameter of the model and corresponding local parameter information. The parameter snapshot information may include: each local parameter and corresponding snapshot value.
Step 302, determining local incremental information according to the parameter difference information and the preset weight.
In an embodiment of the present application, the weight may be an inverse of a total number of training nodes performing model training in the distributed training system.
In the embodiment of the present application, the local incremental information may include, but is not limited to: the identification of the training node, the local parameters to be reported in the model and the corresponding local increment; the local parameters to be reported may be dense local parameters in the model and parameters with changed values in the sparse local parameters.
For example, assume that the current value of a local parameter to be reported is x_i, the corresponding snapshot value is x_old, and the total number of training nodes performing model training is N. The local increment Δ corresponding to the local parameter to be reported is then Δ = (x_i - x_old) / N. The corresponding snapshot value is then incrementally updated in a manner similar to voting, e.g. x_old ← x_old + Δ; at the same time, the local increment Δ corresponding to the local parameter to be reported is reported to the parameter server node in the distributed training system.
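Putting the example above into code, here is a minimal sketch of how a training node could compute and report its local increments, assuming dict-of-floats parameters; `report_to_server` is a hypothetical call standing in for the actual communication with the parameter server node:

```python
def build_local_increments(local_params, snapshot, total_nodes):
    """Compute the local increment for every parameter to report and update the snapshot accordingly."""
    increments = {}
    for name, x_i in local_params.items():
        x_old = snapshot[name]
        delta = (x_i - x_old) / total_nodes  # local increment: (x_i - x_old) / N
        snapshot[name] = x_old + delta       # snapshot is advanced by the reported increment
        increments[name] = delta
    return increments

# increments = build_local_increments(local_params, snapshot, total_nodes=N)
# report_to_server(node_id, increments)  # hypothetical RPC to the parameter server node
```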
And 203, reporting the local incremental information to a parameter server node in the distributed training system, and receiving the global parameter information returned by the parameter server node.
In the embodiment of the application, when the parameter server node receives the local incremental information reported by only one training node, the parameter server node may update the global parameter x_global corresponding to the local parameter to be reported to obtain the updated global parameter, for example x_global ← x_global + Δ.

When the parameter server node receives local increment information reported by a plurality of training nodes, it needs to update, for each local parameter to be reported, the corresponding global parameter according to the local increments reported by the plurality of training nodes, so as to obtain the updated global parameter. For example, x_global ← x_global + Δ_1 + Δ_2 + … + Δ_n, where x_global represents the global parameter, n is the number of training nodes whose reports the parameter server node has received, and Δ_n represents the local increment corresponding to the local parameter reported by the n-th training node.
Thus, the parameter snapshot information and the global parameter information can keep the same change trend.
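On the parameter server side, the aggregation described by the formulas above can be sketched as follows, assuming each report is a dict mapping parameter names to local increments; the names are illustrative, not from the patent:

```python
def aggregate_increments(global_params, reports):
    """Add every received local increment to the corresponding global parameter.

    `reports` maps a training-node identifier to its {parameter name: local increment} dict.
    """
    for node_id, increments in reports.items():
        for name, delta in increments.items():
            global_params[name] += delta  # x_global <- x_global + delta for each reporting node
    return global_params
```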
In the embodiment of the application, the parameter server node may generate updated global parameter information according to the local incremental information reported by each training node, use the updated global parameter information as the return information corresponding to each training node, and return the return information to the corresponding training node. Therefore, the training node can receive the global parameter information returned by the parameter server node.
And step 204, updating the local parameter information and the parameter snapshot information of the model according to the global parameter information.
In the embodiment of the application, for each local parameter to be reported in the model, according to the global parameter and the snapshot value corresponding to the local parameter to be reported, the global increment of the local parameter to be reported is determined, the newly acquired current value of the local parameter to be reported is updated according to the global increment, and the global parameter is determined as the snapshot value corresponding to the local parameter to be reported. See the description of step 103 for details.
It should be understood that, because the global parameter information and the parameter snapshot information follow the same change trend, the global incremental information is exactly the parameter increment produced by the other training nodes while the local training node was running, with the influence of the local training node removed.
To better illustrate the above embodiments, an example is now given. As shown in fig. 4, fig. 4 is a schematic flow chart of an algorithm according to an embodiment of the present application. As can be seen from fig. 4, the training thread and the communication thread on a training node are independent of each other, and each training node stores its own parameter snapshot information, so the training thread is not blocked while the node communicates with the parameter server node. This further reduces the proportion of time spent on communication, lessens the slowdown in training caused by communication, and shortens the time of each training round. Meanwhile, the global parameter information from the parameter server node does not overwrite the current values of the local parameters to be reported; instead, for each local parameter to be reported, the increment between the global parameter value and the snapshot value is added to the current value of that local parameter. This preserves the local optimization of the parameters to be reported, retains the useful information explored by the training node, lets the model update balance the correctness of the global direction against the local optimum, and further improves the convergence speed of the model.
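A minimal sketch of the thread layout suggested by fig. 4, reusing the helper functions sketched above; the `state` and `server` objects, their methods, and the synchronization are simplified assumptions for illustration only:

```python
import threading

def communication_loop(state, server, total_nodes, stop_event):
    """Background communication thread: exchange increments without blocking the training thread."""
    while not stop_event.is_set():
        state.wait_for_communication_round()  # e.g. enough mini-batches trained (assumed helper)
        increments = build_local_increments(state.local_params, state.snapshot, total_nodes)
        global_params = server.report(state.node_id, increments)  # hypothetical RPC returning global parameters
        apply_global_update(state.local_params, state.snapshot, global_params)

# The training thread keeps consuming mini-batches while this runs, e.g.:
# threading.Thread(target=communication_loop, args=(state, server, N, stop_event), daemon=True).start()
```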
According to the model parameter updating method of the embodiments of the application, local incremental information is obtained, and the parameter snapshot information of the model is updated in combination with the local incremental information; the local incremental information is reported to the parameter server node in the distributed training system, and the global parameter information returned by the parameter server node is received; and the local parameter information and the parameter snapshot information of the model are updated according to the global parameter information. In this method, the training thread and the communication thread are independent of each other, and each training node stores its own parameter snapshot information, so the training thread is not blocked while communicating with the parameter server node; this further reduces the proportion of time spent on communication, lessens the slowdown in training caused by communication, and shortens the time of each training round. Meanwhile, the global parameter information from the parameter server node does not overwrite the local parameters to be reported; for each local parameter to be reported, the increment between the global parameter value and the snapshot value is added to the current value of that local parameter. This preserves the local optimization of the parameters to be reported, retains the useful information explored by the training node, lets the model update balance the correctness of the global direction against the local optimum, and further improves the convergence speed of the model.
Fig. 5 is a schematic diagram according to a fourth embodiment of the present application. As shown in fig. 5, another model parameter updating method is specifically implemented as follows:
step 501, receiving local incremental information reported by at least one training node in a distributed training system.
In the embodiment of the present application, the training nodes performing model training in the distributed training system report local incremental information to the parameter server node, and the parameter server node may receive the local incremental information reported by those training nodes. It should be noted that the number of training nodes performing model training in the distributed training system is at least one, and that the local incremental information may include, but is not limited to: the identification of the training node, the local parameters to be reported in the model, and the corresponding local increments. The local parameters to be reported may be the dense local parameters of the model on the training node, and those sparse local parameters whose values have changed.
Optionally, determining parameter difference information according to local parameter information of the model on the training node and the parameter snapshot information; and determining local incremental information according to the parameter difference information and the preset weight. See in particular the description of the embodiment shown in fig. 3.
Step 502, updating the global parameter information of the model according to the local incremental information reported by at least one training node to obtain updated global parameter information.
As an example, for each global parameter in the global parameter information, local increment information reported by at least one training node is queried, and at least one local increment corresponding to the global parameter is obtained; and adding the at least one local increment and the numerical value of the global parameter to obtain the updated global parameter.
For example, a training node in the distributed training system reports the local increment Δ corresponding to a local parameter to be reported to the parameter server node, and the parameter server node updates the global parameter x_global corresponding to that local parameter to obtain updated global parameter information. For example, when the number of training nodes reporting information is 1, x_global ← x_global + Δ.
therefore, the snapshot value corresponding to the local parameter to be reported and the global parameter value corresponding to the local parameter to be reported can keep the same change trend.
Step 503, returning the updated global parameter information to the training node.
It can be understood that, in the first implementation scenario, updated global parameter information corresponding to all parameters may be returned to the training node. In a second implementation scenario, the local parameters to be reported may be dense local parameters in a model on a training node and parameters with changed values in the sparse local parameters. Therefore, the parameter server node in the distributed training system can obtain the updated global parameter corresponding to each local parameter to be reported, and return the updated global parameter corresponding to the local parameter to be reported to the training node corresponding to the local parameter to be reported.
The model parameter updating method of the embodiment of the application receives local incremental information reported by at least one training node in a distributed training system; updates the global parameter information of the model according to the local incremental information reported by the at least one training node to obtain updated global parameter information; and returns the updated global parameter information to the training node. In this method, the training thread and the communication thread are independent of each other, and each training node stores its own parameter snapshot information, so the training thread is not blocked while communicating with the parameter server node; this further reduces the proportion of time spent on communication, lessens the slowdown in training caused by communication, and shortens the time of each training round. Meanwhile, the global parameter information on the parameter server node does not overwrite the local parameters to be reported; instead, the increment between the global parameter value and the snapshot value is added to the corresponding local parameter to be reported. This preserves the local optimization of the parameters to be reported, retains the useful information explored by the training node, lets the model update balance the global direction against the local optimum, and further improves the convergence speed of the model.
In order to implement the embodiments described in fig. 1 to fig. 4, an embodiment of the present application further provides a training node.
Fig. 6 is a schematic diagram according to a fifth embodiment of the present application. As shown in fig. 6, the training node 600 includes: an acquisition module 610, a reporting module 620 and an updating module 630.
The obtaining module 610 is configured to obtain local incremental information, and update the parameter snapshot information of the model in combination with the local incremental information; a reporting module 620, configured to report the local incremental information to a parameter server node in the distributed training system, and receive global parameter information returned by the parameter server node; an updating module 630, configured to update the local parameter information and the parameter snapshot information of the model according to the global parameter information.
As a possible implementation manner of the embodiment of the present application, the obtaining module 610 is specifically configured to determine parameter difference information according to the local parameter information of the model and the parameter snapshot information; and determining the local incremental information according to the parameter difference information and a preset weight.
As a possible implementation manner of the embodiment of the present application, the weight is an inverse number of a total number of training nodes performing the model training in the distributed training system.
As a possible implementation manner of the embodiment of the present application, the local incremental information includes: the identification of the training node, the local parameters to be reported in the model and the corresponding local increment; the local parameters to be reported are as follows: dense local parameters in the model, and parameters in which numerical values vary among the sparse local parameters.
As a possible implementation manner of the embodiment of the present application, the updating module 630 is specifically configured to determine global incremental information according to the global parameter information and the parameter snapshot information; and updating the local parameter information of the model according to the global increment information, and determining the global parameter information as the parameter snapshot information of the model.
As a possible implementation manner of the embodiment of the present application, the updating module 630 is specifically configured to, for each local parameter of the model, obtain a global increment value corresponding to the local parameter in the global increment information; and adding the global increment numerical value and the current numerical value of the local parameter, and taking the addition result as the current numerical value of the local parameter.
According to the training node of the embodiment of the application, local incremental information is obtained, and the parameter snapshot information of the model is updated in combination with the local incremental information; the local incremental information is reported to a parameter server node in a distributed training system, and global parameter information returned by the parameter server node is received; and the local parameter information and the parameter snapshot information of the model are updated according to the global parameter information. By combining the parameter snapshot information, the training node keeps training and communication running in parallel, reduces the proportion of time spent on communication, and lessens the slowdown in training caused by communication. Meanwhile, by updating with incremental information, the locally optimized values of the parameters are preserved, the useful information explored by the training nodes is retained, and the convergence speed of the model is further improved.
In order to implement the embodiment described in fig. 5, an embodiment of the present application further provides a parameter server node.
Fig. 7 is a schematic diagram according to a sixth embodiment of the present application. As shown in fig. 7, the parameter server node 700 includes: a receiving module 710, an updating module 720, and a returning module 730.
The receiving module 710 is configured to receive local incremental information reported by at least one training node in the distributed training system; an updating module 720, configured to update the global parameter information of the model according to the local incremental information reported by the at least one training node, to obtain updated global parameter information; and a returning module 730, configured to return the updated global parameter information to the training node.
As a possible implementation manner of the embodiment of the present application, the local incremental information reported by the training node is obtained by determining parameter difference information according to the local parameter information and the parameter snapshot information of the model on the training node; and determining local incremental information according to the parameter difference information and the preset weight.
As a possible implementation manner of the embodiment of the present application, the local incremental information includes: identification of a training node, local parameters to be reported of a model on the training node and corresponding local increments; the local parameters to be reported are dense local parameters in the model on the training node and parameters with changed values in the sparse local parameters.
As a possible implementation manner of the embodiment of the present application, the updating module 720 is specifically configured to, for each global parameter in the global parameter information, query local increment information reported by at least one training node, and obtain at least one local increment corresponding to the global parameter; and adding the at least one local increment and the numerical value of the global parameter to obtain the updated global parameter.
The parameter server node of the embodiment of the application receives local incremental information reported by at least one training node in a distributed training system; updates the global parameter information of the model according to the local incremental information reported by the at least one training node to obtain updated global parameter information; and returns the updated global parameter information to the training node. By combining the parameter snapshot information, the parameter server node keeps training and communication running in parallel, reduces the proportion of time spent on communication, and lessens the slowdown in training caused by communication. Meanwhile, by updating with incremental information, the locally optimized values of the parameters are preserved, the useful information explored by the training nodes is retained, and the convergence speed of the model is further improved.
In order to implement the foregoing embodiments, the embodiments of the present application further provide a distributed training system.
Fig. 8 is a schematic diagram of a seventh embodiment according to the present application. As shown in fig. 8, the distributed training system 800 includes: parameter server node 810, a plurality of training nodes 820.
Wherein parameter server node 810 is connected to each of a plurality of training nodes 820; each of the plurality of training nodes 820 configured to perform the model parameter updating method described in fig. 1-4; and a parameter server node 810 for executing the model parameter updating method described in fig. 5.
In order to implement the above embodiments, an electronic device is further provided in the embodiments of the present application.
Fig. 9 is a block diagram of an electronic device according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 9, the electronic apparatus includes: one or more processors 1001, a memory 1002, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 9 illustrates an example with one processor 1001.
The memory 1002 is a non-transitory computer readable storage medium provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the model parameter updating method provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the model parameter update method provided herein.
The memory 1002, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the model parameter updating method in the embodiment of the present application (for example, the obtaining module 610, the reporting module 620, the updating module 630 shown in fig. 6, the receiving module 710, the updating module 720, and the returning module 730 shown in fig. 7). The processor 1001 executes various functional applications of the server and data processing, i.e., implements the model parameter updating method in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 1002.
The memory 1002 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of the electronic device updated according to the model parameters, and the like. Further, the memory 1002 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 1002 may optionally include memory located remotely from the processor 1001, which may be connected to a model parameter updating electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the model parameter updating method may further include: an input device 1003 and an output device 1004. The processor 1001, the memory 1002, the input device 1003, and the output device 1004 may be connected by a bus or other means, and the bus connection is exemplified in fig. 9.
The input device 1003 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for model parameter updating, and may be, for example, a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or the like. The output device 1004 may include a display device, an auxiliary lighting device (e.g., an LED), a tactile feedback device (e.g., a vibrating motor), and the like.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal.
The systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer, for providing interaction with the user.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved, and the present invention is not limited herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (23)

1. A method for updating model parameters, comprising:
acquiring local incremental information, and updating the parameter snapshot information of the model in combination with the local incremental information;
reporting the local incremental information to a parameter server node in a distributed training system, and receiving global parameter information returned by the parameter server node;
and updating the local parameter information and the parameter snapshot information of the model according to the global parameter information.
2. The method of claim 1, wherein acquiring local incremental information comprises:
determining parameter difference information according to the local parameter information and the parameter snapshot information of the model;
and determining the local incremental information according to the parameter difference information and a preset weight.
3. The method of claim 2, wherein the weight is the inverse of the total number of training nodes in the distributed training system that perform the model training.
4. The method of claim 1 or 2, wherein the local incremental information comprises: an identification of the training node, local parameters to be reported in the model, and corresponding local increments;
the local parameters to be reported are: the dense local parameters in the model, and those sparse local parameters whose values have changed.
5. The method of claim 1, wherein updating the local parameter information and the parameter snapshot information of the model according to the global parameter information comprises:
determining global incremental information according to the global parameter information and the parameter snapshot information;
and updating the local parameter information of the model according to the global increment information, and determining the global parameter information as the parameter snapshot information of the model.
6. The method of claim 5, wherein updating the local parameter information of the model based on the global delta information comprises:
for each local parameter of the model, acquiring a global increment value corresponding to the local parameter from the global increment information;
and adding the global increment value to the current value of the local parameter, and taking the result as the current value of the local parameter.
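(For illustration only; the following sketch is not part of the claims.) A minimal Python sketch of the training-node-side procedure of claims 1 to 6, assuming each model parameter is a NumPy array stored in a dict keyed by parameter name; the function names compute_local_delta, merge_delta_into_snapshot, and apply_global_params are illustrative names introduced here, not terms of the disclosure.

    import numpy as np

    def compute_local_delta(local_params, snapshot, num_trainers):
        # Claims 2-3: local increment = (local parameter - snapshot) * weight,
        # with the weight taken as 1 / (total number of training nodes).
        weight = 1.0 / num_trainers
        return {name: weight * (local_params[name] - snapshot[name])
                for name in local_params}

    def merge_delta_into_snapshot(snapshot, local_delta):
        # Claim 1: update the parameter snapshot in combination with the local increment.
        for name, delta in local_delta.items():
            snapshot[name] = snapshot[name] + delta
        return snapshot

    def apply_global_params(local_params, snapshot, global_params):
        # Claims 5-6: global increment = returned global parameters - snapshot;
        # add it to each local parameter, then take the global parameters as the new snapshot.
        for name, global_value in global_params.items():
            global_delta = global_value - snapshot[name]
            local_params[name] = local_params[name] + global_delta
            snapshot[name] = np.array(global_value, copy=True)
        return local_params, snapshot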
7. A method for updating model parameters, comprising:
receiving local incremental information reported by at least one training node in a distributed training system;
updating the global parameter information of the model according to the local incremental information reported by the at least one training node to obtain updated global parameter information;
and returning the updated global parameter information to the training node.
8. The method of claim 7, wherein the local incremental information reported by a training node is obtained by:
determining parameter difference information according to the local parameter information and the parameter snapshot information of the model on the training node;
and determining the local incremental information according to the parameter difference information and a preset weight.
9. The method of claim 7, wherein the local incremental information comprises: an identification of a training node, local parameters to be reported of a model on the training node, and corresponding local increments;
the local parameters to be reported are: the dense local parameters of the model on the training node, and those sparse local parameters whose values have changed.
10. The method of claim 7, wherein updating the global parameter information of the model according to the local incremental information reported by the at least one training node to obtain the updated global parameter information comprises:
for each global parameter in the global parameter information, querying the local incremental information reported by the at least one training node to obtain at least one local increment corresponding to the global parameter;
and adding the at least one local increment to the value of the global parameter to obtain an updated global parameter.
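(For illustration only; not part of the claims.) Under the same assumptions as the trainer-side sketch above, a minimal sketch of the parameter-server-side procedure of claims 7 to 10; aggregate_local_deltas is an illustrative name.

    def aggregate_local_deltas(global_params, reported_deltas):
        # Claim 10: for each global parameter, query the local increments reported
        # by the training nodes and add each of them to the current value.
        # reported_deltas is a list with one dict per reporting training node.
        for name in global_params:
            for delta in reported_deltas:
                if name in delta:  # sparse parameters may be absent from a report
                    global_params[name] = global_params[name] + delta[name]
        return global_params

The updated global_params dict is what the parameter server would then return to the training nodes, as in the last step of claim 7.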
11. A training node, comprising:
the acquisition module is used for acquiring local incremental information and updating the parameter snapshot information of the model in combination with the local incremental information;
the reporting module is used for reporting the local incremental information to a parameter server node in a distributed training system and receiving global parameter information returned by the parameter server node;
and the updating module is used for updating the local parameter information and the parameter snapshot information of the model according to the global parameter information.
12. The training node of claim 11, wherein the acquisition module is specifically configured to:
determining parameter difference information according to the local parameter information and the parameter snapshot information of the model;
and determining the local incremental information according to the parameter difference information and a preset weight.
13. The training node of claim 12, wherein the weight is the inverse of the total number of training nodes in the distributed training system that perform the model training.
14. The training node of claim 11 or 12, wherein the local incremental information comprises: an identification of the training node, local parameters to be reported in the model, and corresponding local increments;
the local parameters to be reported are: the dense local parameters in the model, and those sparse local parameters whose values have changed.
15. The training node of claim 11, wherein the updating module is specifically configured to:
determining global incremental information according to the global parameter information and the parameter snapshot information;
and updating the local parameter information of the model according to the global increment information, and determining the global parameter information as the parameter snapshot information of the model.
16. The training node of claim 15, wherein the updating module is specifically configured to: for each local parameter of the model, obtain a global increment value corresponding to the local parameter from the global increment information;
and add the global increment value to the current value of the local parameter, and take the result as the current value of the local parameter.
17. A parameter server node, comprising:
the receiving module is used for receiving local incremental information reported by at least one training node in the distributed training system;
the updating module is used for updating the global parameter information of the model according to the local incremental information reported by the at least one training node to obtain updated global parameter information;
and the return module is used for returning the updated global parameter information to the training node.
18. The parameter server node of claim 17, wherein the local incremental information reported by a training node is obtained by:
determining parameter difference information according to the local parameter information and the parameter snapshot information of the model on the training node;
and determining the local incremental information according to the parameter difference information and a preset weight.
19. The parameter server node of claim 17, wherein the local incremental information comprises: an identification of a training node, local parameters to be reported of a model on the training node, and corresponding local increments;
the local parameters to be reported are: the dense local parameters of the model on the training node, and those sparse local parameters whose values have changed.
20. The parameter server node of claim 17, wherein the updating module is specifically configured to:
for each global parameter in the global parameter information, querying the local incremental information reported by the at least one training node to obtain at least one local increment corresponding to the global parameter;
and adding the at least one local increment to the value of the global parameter to obtain the updated global parameter.
21. A distributed training system, comprising:
a parameter server node, and a plurality of training nodes;
the parameter server node is connected with each training node in the plurality of training nodes;
each training node of the plurality of training nodes to perform the method of any of claims 1-6;
the parameter server node for performing the method of any of claims 7-10.
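(For illustration only; not part of the claims.) A single-process simulation of the system of claim 21 that wires together the sketches above; two trainers and one parameter server are emulated in the same process, and the local training step is a placeholder random update rather than a real gradient computation.

    num_trainers = 2
    global_params = {"w": np.zeros(3)}
    trainers = [{"params": {"w": np.zeros(3)}, "snapshot": {"w": np.zeros(3)}}
                for _ in range(num_trainers)]

    for step in range(3):
        reported = []
        for t in trainers:
            # placeholder local training step on the trainer's own data
            t["params"]["w"] = t["params"]["w"] - 0.1 * np.random.randn(3)
            # claims 1-3: build the local increment and fold it into the snapshot
            delta = compute_local_delta(t["params"], t["snapshot"], num_trainers)
            t["snapshot"] = merge_delta_into_snapshot(t["snapshot"], delta)
            reported.append(delta)
        # claims 7-10: the parameter server merges the reported increments
        global_params = aggregate_local_deltas(global_params, reported)
        # claims 5-6: each trainer pulls the merged global parameters back
        for t in trainers:
            t["params"], t["snapshot"] = apply_global_params(
                t["params"], t["snapshot"], global_params)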
22. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
23. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-10.
CN202010234711.0A 2020-03-13 2020-03-27 Model parameter updating method and related equipment thereof Active CN111461343B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010179455 2020-03-13
CN202010179455X 2020-03-13

Publications (2)

Publication Number Publication Date
CN111461343A true CN111461343A (en) 2020-07-28
CN111461343B CN111461343B (en) 2023-08-04

Family

ID=71679314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010234711.0A Active CN111461343B (en) 2020-03-13 2020-03-27 Model parameter updating method and related equipment thereof

Country Status (1)

Country Link
CN (1) CN111461343B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070510A (en) * 2020-08-12 2020-12-11 上海连尚网络科技有限公司 Method and equipment for detecting unqualified commodities based on block chain
CN112559007A (en) * 2020-12-14 2021-03-26 北京百度网讯科技有限公司 Parameter updating method and device of multitask model and electronic equipment
CN112561078A (en) * 2020-12-18 2021-03-26 北京百度网讯科技有限公司 Distributed model training method, related device and computer program product
CN113313263A (en) * 2021-06-21 2021-08-27 京东数科海益信息科技有限公司 Parameter optimization method, device and system of quantum line
CN114997337A (en) * 2022-07-18 2022-09-02 浪潮电子信息产业股份有限公司 Information fusion method, data communication method, device, electronic equipment and storage medium

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809935A (en) * 2015-05-13 2015-07-29 中国航空工业集团公司沈阳飞机设计研究所 Simulation training method for special situation fault of unmanned aerial vehicle and system thereof
US20170046732A1 (en) * 2015-08-14 2017-02-16 International Business Machines Corporation Training a machine to dynamically determine and communicate customized, product-dependent promotions with no or limited historical data over a network
WO2017132428A1 (en) * 2016-01-29 2017-08-03 Yahoo! Inc. Method and system for distributed deep machine learning
CN106446140A (en) * 2016-09-20 2017-02-22 北京百度网讯科技有限公司 Method and device for data persistence
WO2018130267A1 (en) * 2017-01-10 2018-07-19 Huawei Technologies Co., Ltd. Systems and methods for fault tolerance recover during training of a model of a classifier using a distributed system
US20180356989A1 (en) * 2017-06-12 2018-12-13 Pure Storage, Inc. Portable snapshot replication between storage systems
CN109754060A (en) * 2017-11-06 2019-05-14 阿里巴巴集团控股有限公司 A kind of training method and device of neural network machine learning model
CN108491928A (en) * 2018-03-29 2018-09-04 腾讯科技(深圳)有限公司 Model parameter training method, device, server and storage medium
CN109635948A (en) * 2018-12-19 2019-04-16 北京达佳互联信息技术有限公司 On-line training method, apparatus, system and computer readable storage medium
CN110059829A (en) * 2019-04-30 2019-07-26 济南浪潮高新科技投资发展有限公司 A kind of asynchronous parameters server efficient parallel framework and method
CN110704840A (en) * 2019-09-10 2020-01-17 中国人民公安大学 Convolutional neural network CNN-based malicious software detection method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIE JIANG et al.: "Angel: a new large-scale machine learning system", National Science Review, pages 216-236 *
ZHANG JIE: "Research and Implementation of Ear Recognition Based on Few-Shot Learning", China Masters' Theses Full-text Database, Information Science and Technology, no. 8, pages 138-865 *
LONG MANSHENG et al.: "Image recognition of Camellia oleifera diseases based on convolutional neural network and transfer learning", Transactions of the Chinese Society of Agricultural Engineering, vol. 34, no. 18, pages 194-201 *

Also Published As

Publication number Publication date
CN111461343B (en) 2023-08-04

Similar Documents

Publication Publication Date Title
CN111461343A (en) Model parameter updating method and related equipment thereof
CN111753997B (en) Distributed training method, system, device and storage medium
CN110806923B (en) Parallel processing method and device for block chain tasks, electronic equipment and medium
KR102528748B1 (en) Method, apparatus, device and storage medium for constructing knowledge graph
CN111104514A (en) Method and device for training document label model
CN111753914A (en) Model optimization method and device, electronic equipment and storage medium
CN111475750A (en) Page preloading control method, device, system, equipment and storage medium
CN112559870A (en) Multi-model fusion method and device, electronic equipment and storage medium
CN111582477A (en) Training method and device of neural network model
CN110569969A (en) Network model structure sampling method and device and electronic equipment
CN110852379B (en) Training sample generation method and device for target object recognition
CN111563592B (en) Neural network model generation method and device based on super network
CN111737399A (en) Method and device for expanding question and answer set, electronic equipment and readable storage medium
CN112565356A (en) Data storage method and device and electronic equipment
CN113364877A (en) Data processing method, device, electronic equipment and medium
CN112084150A (en) Model training method, data retrieval method, device, equipment and storage medium
CN112446574B (en) Product evaluation method, device, electronic equipment and storage medium
CN111259090A (en) Graph generation method and device of relational data, electronic equipment and storage medium
EP3822818A1 (en) Method, apparatus, device and storage medium for intelligent response
CN111461306B (en) Feature evaluation method and device
CN113868273A (en) Metadata snapshot method and device
CN111026916B (en) Text description conversion method and device, electronic equipment and storage medium
CN112580723A (en) Multi-model fusion method and device, electronic equipment and storage medium
CN111177479A (en) Method and device for acquiring feature vectors of nodes in relational network graph
CN111767444B (en) Page feature construction method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant