CN111461343B - Model parameter updating method and related equipment thereof

Model parameter updating method and related equipment thereof

Info

Publication number
CN111461343B
Authority
CN
China
Prior art keywords
parameter
information
local
training
global
Prior art date
Legal status
Active
Application number
CN202010234711.0A
Other languages
Chinese (zh)
Other versions
CN111461343A (en)
Inventor
吴志华
于佃海
程默
汤伟
马琳
董大祥
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Publication of CN111461343A
Application granted
Publication of CN111461343B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning

Abstract

The application discloses a model parameter updating method and related equipment thereof, and relates to the technical field of deep learning. The specific implementation scheme is as follows: local increment information is obtained, and parameter snapshot information of the model is updated by combining the local increment information; reporting the local increment information to a parameter server node in the distributed training system, and receiving global parameter information returned by the parameter server node; and updating the local parameter information and the parameter snapshot information of the model according to the global parameter information.

Description

Model parameter updating method and related equipment thereof
Cross Reference to Related Applications
This application claims priority to Chinese Patent Application No. 202010179455.X, entitled "Model parameter updating method and related equipment", filed by Beijing Baidu Netcom Science and Technology Co., Ltd. on March 13, 2020.
Technical Field
The application relates to the technical field of data processing, in particular to the technical field of deep learning, and specifically provides a model parameter updating method and related equipment thereof.
Background
Current distributed deep learning algorithms involve a parameter server node and a plurality of training nodes. A commonly used distributed deep learning algorithm is EA-SGD (an elastic-averaging stochastic gradient descent algorithm), among others. In such an algorithm, each training node holds the full training data and trains the deep model; when a preset number of training rounds is reached, the node sends its model parameter increments to the parameter server node and updates its local parameters according to the global parameter increments returned by the parameter server node, until training ends.
However, in such an algorithm, the training process on a training node is suspended while the training node communicates with the parameter server node; communication therefore occupies a large share of the time, which slows down the convergence of the model and reduces the training speed.
Disclosure of Invention
The application provides a model parameter updating method and related equipment. By combining parameter snapshot information, training and communication can proceed in parallel, the communication share is reduced, and the training slowdown caused by communication is mitigated. Meanwhile, updating with increment information preserves the local optimization of the parameters and retains the beneficial information explored by the training nodes, which further improves the convergence speed of the model.
An embodiment of a first aspect of the present application provides a method for updating model parameters, including: acquiring local increment information, and updating the parameter snapshot information of the model by combining the local increment information; reporting the local increment information to a parameter server node in a distributed training system, and receiving global parameter information returned by the parameter server node; and updating the local parameter information and the parameter snapshot information of the model according to the global parameter information.
In one embodiment of the present application, the obtaining the local delta information includes: and determining parameter difference information according to the local parameter information and the parameter snapshot information of the model, and determining the local increment information according to the parameter difference information and preset weights.
In one embodiment of the present application, the weight is the inverse of the total number of training nodes in the distributed training system that perform the model training.
In one embodiment of the present application, the local increment information includes: an identifier of the training node, and the local parameters to be reported in the model together with the corresponding local increments; the local parameters to be reported are: the dense local parameters in the model, and those sparse local parameters whose values have changed.
In one embodiment of the present application, the updating the local parameter information and the parameter snapshot information of the model according to the global parameter information includes: determining global increment information according to the global parameter information and the parameter snapshot information; and updating the local parameter information of the model according to the global increment information, and determining the global parameter information as the parameter snapshot information of the model.
In one embodiment of the present application, updating the local parameter information of the model according to the global increment information includes: for each local parameter of the model, acquiring a global increment value corresponding to the local parameter in the global increment information; and adding the global increment value and the current value of the local parameter, and taking the added result as the current value of the local parameter.
According to the model parameter updating method, local increment information is acquired and the parameter snapshot information of the model is updated in combination with the local increment information; the local increment information is reported to a parameter server node in the distributed training system, and the global parameter information returned by the parameter server node is received; and the local parameter information and the parameter snapshot information of the model are updated according to the global parameter information. By combining the parameter snapshot information, the method allows training and communication to proceed in parallel, reduces the communication share, and mitigates the training slowdown caused by communication. Meanwhile, updating with the increment information preserves the local optimization of the parameters and retains the beneficial information explored by the training nodes, which further improves the convergence speed of the model.
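For illustration only, the following is a minimal Python sketch of the training-node flow of the first aspect. All names (compute_local_delta, training_node_round, the in-process exchange stub) are hypothetical and not part of the embodiments; the exchange with the parameter server node is replaced by an in-process function.

```python
# Illustrative sketch of the training-node flow; all names are hypothetical.

def compute_local_delta(local_params, snapshot, num_trainers):
    # Parameter difference scaled by the preset weight (1 / total number of trainers).
    return {k: (local_params[k] - snapshot[k]) / num_trainers for k in local_params}

def training_node_round(local_params, snapshot, num_trainers, exchange):
    # Step 1: obtain local increment information and fold it into the parameter snapshot.
    delta = compute_local_delta(local_params, snapshot, num_trainers)
    for k, d in delta.items():
        snapshot[k] += d
    # Step 2: report the local increments and receive the global parameter information.
    global_params = exchange(delta)
    # Step 3: apply the global increment to the local parameters, then take the
    # returned global parameters as the new snapshot.
    for k, g in global_params.items():
        local_params[k] += g - snapshot[k]
        snapshot[k] = g

# Minimal in-process stand-in for the parameter server (single-trainer case).
global_store = {"w": 0.0, "b": 0.0}

def exchange(delta):
    for k, d in delta.items():
        global_store[k] += d
    return dict(global_store)

local_params = {"w": 0.3, "b": -0.1}   # values after some local training steps
snapshot = {"w": 0.0, "b": 0.0}        # snapshot values initialised to 0 before training
training_node_round(local_params, snapshot, num_trainers=1, exchange=exchange)
print(local_params, snapshot)           # with one trainer the global increment is zero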
An embodiment of a second aspect of the present application proposes another method for updating model parameters, including: receiving local increment information reported by at least one training node in a distributed training system; updating the global parameter information of the model according to the local increment information reported by the at least one training node to obtain updated global parameter information; and returning the updated global parameter information to the training node.
In one embodiment of the present application, the local incremental information reported by the training node is obtained by determining parameter difference information according to local parameter information and parameter snapshot information of the model on the training node; and determining the local increment information according to the parameter difference information and the preset weight.
In one embodiment of the present application, the local increment information includes: an identifier of the training node, and the local parameters to be reported of the model on the training node together with the corresponding local increments; the local parameters to be reported are: the dense local parameters of the model on the training node, and those sparse local parameters whose values have changed.
In an embodiment of the present application, the updating the global parameter information of the model according to the local increment information reported by the at least one training node to obtain updated global parameter information includes: for each global parameter in the global parameter information, querying the local increment information reported by the at least one training node to obtain at least one local increment corresponding to the global parameter; and adding the at least one local increment to the value of the global parameter to obtain the updated global parameter.
According to the model parameter updating method, local increment information reported by at least one training node in a distributed training system is received; the global parameter information of the model is updated according to the local increment information reported by the at least one training node to obtain updated global parameter information; and the updated global parameter information is returned to the training node. By combining the parameter snapshot information, the method allows training and communication to proceed in parallel, reduces the communication share, and mitigates the training slowdown caused by communication. Meanwhile, updating with the increment information preserves the local optimization of the parameters and retains the beneficial information explored by the training nodes, which further improves the convergence speed of the model.
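As a companion to the training-node sketch above, the following minimal Python sketch illustrates the parameter-server-side update of the second aspect. The names and the report format (one dict of parameter name to local increment per reporting node) are assumptions made for illustration.

```python
# Illustrative sketch of the parameter-server update; names and formats are hypothetical.

def update_global_params(global_params, reports):
    # `reports` is a list of {parameter_name: local_increment} dicts, one per
    # training node that reported in this round; every increment reported for a
    # parameter is added to its global value.
    for report in reports:
        for name, increment in report.items():
            global_params[name] = global_params.get(name, 0.0) + increment
    return global_params

# Example: two training nodes report increments for overlapping parameters.
global_params = {"w": 1.0, "b": 0.5}
reports = [{"w": 0.2}, {"w": -0.05, "b": 0.1}]
print(update_global_params(global_params, reports))   # {'w': 1.15, 'b': 0.6}
```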
An embodiment of a third aspect of the present application proposes a training node, comprising: the acquisition module is used for acquiring local increment information and updating the parameter snapshot information of the model by combining the local increment information; the reporting module is used for reporting the local increment information to a parameter server node in the distributed training system and receiving global parameter information returned by the parameter server node; and the updating module is used for updating the local parameter information and the parameter snapshot information of the model according to the global parameter information.
In one embodiment of the present application, the obtaining module is specifically configured to determine parameter difference information according to local parameter information and parameter snapshot information of the model; and determining the local increment information according to the parameter difference information and the preset weight.
In one embodiment of the present application, the weight is the inverse of the total number of training nodes in the distributed training system that perform the model training.
In one embodiment of the present application, the local increment information includes: an identifier of the training node, and the local parameters to be reported in the model together with the corresponding local increments; the local parameters to be reported are: the dense local parameters in the model, and those sparse local parameters whose values have changed.
In one embodiment of the present application, the update module is specifically configured to determine global incremental information according to the global parameter information and the parameter snapshot information; and updating the local parameter information of the model according to the global increment information, and determining the global parameter information as the parameter snapshot information of the model.
In an embodiment of the present application, the updating module is specifically configured to obtain, for each local parameter of the model, a global increment value corresponding to the local parameter in the global increment information, add the global increment value to a current value of the local parameter, and use an addition result as the current value of the local parameter.
An embodiment of a fourth aspect of the present application proposes a parameter server node, including: the receiving module is used for receiving the local increment information reported by at least one training node in the distributed training system; the updating module is used for updating the global parameter information of the model according to the local increment information reported by the at least one training node to obtain updated global parameter information; and the returning module is used for returning the updated global parameter information to the training node.
In one embodiment of the present application, the local incremental information reported by the training node is obtained by determining parameter difference information according to local parameter information and parameter snapshot information of the model on the training node; and determining the local increment information according to the parameter difference information and the preset weight.
In one embodiment of the present application, the local increment information includes: an identifier of the training node, and the local parameters to be reported of the model on the training node together with the corresponding local increments; the local parameters to be reported are the dense local parameters of the model on the training node and those sparse local parameters whose values have changed.
In an embodiment of the present application, the update module is specifically configured to query, for each global parameter in the global parameter information, local increment information reported by the at least one training node, and obtain at least one local increment corresponding to the global parameter; and adding the values of the at least one local increment and the global parameter to obtain the updated global parameter.
Embodiments of a fifth aspect of the present application provide a distributed training system, including: a parameter server node and a plurality of training nodes; the parameter server node is connected with each training node in the plurality of training nodes; each of the plurality of training nodes is configured to perform a method according to an embodiment of the first aspect of the present application; the parameter server node is configured to perform a method according to an embodiment of the second aspect of the present application.
An embodiment of a sixth aspect of the present application proposes an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model parameter updating method as described above.
Embodiments of a seventh aspect of the present application provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the model parameter updating method as described above.
Other effects of the above alternative will be described below in connection with specific embodiments.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present application;
FIG. 2 is a schematic diagram according to a second embodiment of the present application;
FIG. 3 is a schematic diagram according to a third embodiment of the present application;
FIG. 4 is a schematic flow diagram of an algorithm according to an embodiment of the present application;
FIG. 5 is a schematic diagram according to a fourth embodiment of the present application;
FIG. 6 is a schematic diagram according to a fifth embodiment of the present application;
FIG. 7 is a schematic diagram according to a sixth embodiment of the present application;
FIG. 8 is a schematic diagram according to a seventh embodiment of the present application;
fig. 9 is a block diagram of an electronic device for implementing a model parameter updating method according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The model parameter updating method and the related device according to the embodiments of the present application are described below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram according to a first embodiment of the present application.
As shown in fig. 1, the specific implementation process of the model parameter updating method is as follows:
and step 101, obtaining local increment information, and updating the parameter snapshot information of the model by combining the local increment information.
In the embodiment of the application, the training node may acquire local increment information and update the parameter snapshot information of the model according to the local increment information. It should be noted that the local increment information may include, but is not limited to, a local increment corresponding to each local parameter, and the local increment information may be obtained in the manner described in the following embodiments. Each training node may store its own parameter snapshot information, and the parameter snapshot information may include the snapshot values of a plurality of parameters; before training, the training node may set the initial value of the snapshot value of each parameter to 0. For example, the parameter snapshot information may include the snapshot value corresponding to each local parameter. As one example, global increment information may be determined by comparing the parameter snapshot information with the global parameter information. In the embodiment of the application, in order to keep the parameter snapshot information and the global parameter information following the same change trend, the training node may acquire the local increment information, update the parameter snapshot information of the corresponding model according to the local increment information, and report the local increment information.
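For illustration only, a per-node snapshot store initialised to 0 might look as follows; the parameter names and the use of NumPy are assumptions, not part of the embodiments.

```python
# Hypothetical snapshot store: one zero-initialised snapshot entry per local parameter.
import numpy as np

local_params = {"w": np.random.randn(4), "b": np.zeros(1)}   # stand-in local parameters
snapshot = {name: np.zeros_like(value) for name, value in local_params.items()}
print(snapshot)   # every snapshot value starts at 0 before training
```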
Step 102, reporting the local increment information to a parameter server node in the distributed training system, and receiving global parameter information returned by the parameter server node.
In the embodiment of the application, when the parameter snapshot information of the model is updated by combining the local increment information, the training node can report the local increment information to a parameter server node in the distributed training system. The parameter server node can generate global parameter information according to the local increment information reported by each training node, and takes the global parameter information as return information corresponding to each training node, and returns the return information to the corresponding training node, so that the training node can receive the global parameter information returned by the parameter server node.
And step 103, updating the local parameter information and the parameter snapshot information of the model according to the global parameter information.
Optionally, global increment information is determined according to the global parameter information and the parameter snapshot information, local parameter information of the model is updated according to the global increment information, and the global parameter information is determined to be the parameter snapshot information of the model.
In order to ensure that a training node does not fall into a local optimum of the model and that the overall convergence direction remains consistent, in the embodiment of the application, when the parameter server node returns the global parameter information, the training node compares the global parameter information with the parameter snapshot information and takes the comparison result as the global increment information. The current value of each local parameter, obtained again from the ongoing training, is then updated according to the global increment information in preparation for the next communication, and the parameter snapshot information is updated. For example, for each local parameter, the global increment is added to the newly obtained current value of the local parameter, and the result is taken as the current value of the local parameter.
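A minimal per-parameter sketch of this update, with made-up values and hypothetical names:

```python
# Illustrative per-parameter update with the global increment; names are hypothetical.
def apply_global(local_value, snapshot_value, global_value):
    global_increment = global_value - snapshot_value   # contribution of the other nodes
    new_local = local_value + global_increment         # local exploration is kept
    new_snapshot = global_value                        # snapshot now mirrors the server
    return new_local, new_snapshot

# e.g. local 0.5, snapshot 0.25, server returns 0.75 -> local becomes 1.0
print(apply_global(0.5, 0.25, 0.75))   # (1.0, 0.75)
```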
In conclusion, by combining the parameter snapshot information, training and communication can proceed in parallel, the communication share is reduced, and the training slowdown caused by communication is mitigated. Meanwhile, updating with the increment information preserves the local optimization of the parameters and retains the beneficial information explored by the training nodes, which further improves the convergence speed of the model.
In addition, in the embodiment of the application, the model may be used for image recognition, image comparison, text semantic recognition, and the like. For example, when the model is used for image recognition, if the embodiment shown in fig. 1 is used to update the model parameters during model training, then, because the local optimization of the parameters is preserved, the accuracy of the trained model can be improved and the number of images required for training can be reduced; and the use of parameter snapshot information and increment updates allows parameter communication and model training to proceed in parallel, which improves the training speed of the model, so that a highly accurate model can be obtained as soon as possible for image recognition, improving the efficiency of image recognition and reducing its cost.
For another example, when the model is used for text semantic recognition, such as extracting keywords from a text, if the embodiment shown in fig. 1 is adopted to update the model parameters during model training, then, because the local optimization of the parameters is preserved, the accuracy of the trained model can be improved and the number of texts and the text labeling cost required for training can be reduced; the use of parameter snapshot information and increment updates allows parameter communication and model training to proceed in parallel, which improves the training speed of the model, so that a highly accurate model can be obtained as soon as possible for text semantic recognition, improving the efficiency of text semantic recognition and reducing its cost.
As shown in fig. 2, fig. 2 is a schematic diagram according to a second embodiment of the present application, and a specific implementation procedure of the model parameter updating method is as follows:
step 201, in the process of training the model by using training data, judging whether a preset communication condition is satisfied.
In the embodiment of the application, the training node trains the model with the training data and, during training, judges in real time whether the preset communication condition is satisfied, until a preset training ending condition is met. It should be noted that the training data of the training node may include, but is not limited to, a plurality of mini-batches of sample data; the communication condition may be that the number of trained mini-batches is greater than a preset batch-number threshold; and the preset training ending condition may be that the model reaches the expected training result. The communication condition may also be, for example, reaching a predetermined communication time point.
In addition, it is easy to understand that, since the training node trains the model with training data, the training data of the training node needs to be obtained in advance. Thus, the training data assigned to the model may be acquired before training the model with it. Optionally, the training data is the full training data; or, the full training data is split according to the total number of training nodes performing the model training in the distributed training system, and the split training data are distributed to the training nodes. For example, the full training data is divided according to the total number of training nodes to obtain a plurality of data splits, and each training node is assigned one split. Therefore, a training node interacts with the parameter server only after locally training on a plurality of mini-batches of samples, which reduces the communication share.
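For illustration only, one simple way to realise the data split and the batch-count communication condition is sketched below; the split scheme, the threshold value, and all names are assumptions.

```python
# Illustrative sketch: split the full training data across trainers and check
# a batch-count communication condition. Names and values are hypothetical.

def shard(full_data, num_trainers, rank):
    # Give trainer `rank` every num_trainers-th sample (one simple split scheme).
    return full_data[rank::num_trainers]

def should_communicate(batches_since_last_sync, batch_threshold=10):
    # Communication condition: the number of trained mini-batches exceeds a preset threshold.
    return batches_since_last_sync > batch_threshold

full_data = list(range(100))                             # stand-in for the full training set
print(len(shard(full_data, num_trainers=4, rank=1)))     # 25 samples for trainer 1
print(should_communicate(11))                            # True -> time to report increments
```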
Step 202, when the preset communication condition is met, local increment information is obtained, and parameter snapshot information of the model is updated by combining the local increment information, so that training is continued.
In the embodiment of the application, when the communication round is reached on the training node, that is, when the number of trained mini-batches is greater than the preset batch-number threshold, the local increment information may be obtained and the parameter snapshot information of the model updated according to the local increment information, after which the model continues to train. It should be noted that the local increment information may include, but is not limited to: an identifier of the training node, and the local parameters to be reported in the model together with the corresponding local increments. The local parameters to be reported may include, but are not limited to, the dense local parameters in the model and those sparse local parameters whose values have changed.
Alternatively, as shown in fig. 3, fig. 3 is a schematic diagram according to a third embodiment of the present application. The specific implementation process for determining the local increment information is as follows:
step 301, obtaining local parameter information and parameter snapshot information of the model, and determining parameter difference information.
The local parameter information may include: each local parameter of the model and the corresponding current value. The parameter snapshot information may include: each local parameter and the corresponding snapshot value.
Step 302, determining local increment information according to the parameter difference information and the preset weight.
In the embodiment of the application, the weight may be the inverse of the total number of training nodes performing model training in the distributed training system.
In embodiments of the present application, the local increment information may include, but is not limited to: an identifier of the training node, and the local parameters to be reported in the model together with the corresponding local increments; the local parameters to be reported may be the dense local parameters in the model and those sparse local parameters whose values have changed.
For example, assume that the current value of a local parameter to be reported is x_i, the corresponding snapshot value is x_old, and the total number of training nodes performing the model training is N. The local increment Δ corresponding to the local parameter to be reported is then Δ = (x_i - x_old) / N. The corresponding snapshot value is then incrementally updated in a manner similar to voting, e.g. x_old += Δ; at the same time, the local increment Δ corresponding to the local parameter to be reported is reported to a parameter server node in the distributed training system.
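For illustration only, the selection of parameters to report and the increment computation described above might be realised as follows; the report layout, parameter names, and the use of NumPy are assumptions.

```python
# Illustrative sketch: report all dense parameters plus only the sparse parameters
# whose values changed, with increment delta = (x_i - x_old) / N folded back into
# the snapshot. Names and data layout are hypothetical.
import numpy as np

def build_report(node_id, dense, dense_snapshot, sparse, sparse_snapshot, num_trainers):
    report = {"node_id": node_id, "increments": {}}
    for name, value in dense.items():                    # dense parameters: always reported
        delta = (value - dense_snapshot[name]) / num_trainers
        dense_snapshot[name] += delta
        report["increments"][name] = delta
    for name, value in sparse.items():                   # sparse parameters: only changed entries
        if not np.allclose(value, sparse_snapshot[name]):
            delta = (value - sparse_snapshot[name]) / num_trainers
            sparse_snapshot[name] += delta
            report["increments"][name] = delta
    return report

dense = {"w": np.array([0.4, 0.2])}
dense_snap = {"w": np.zeros(2)}
sparse = {"emb/7": np.array([0.1]), "emb/9": np.array([0.0])}
sparse_snap = {"emb/7": np.zeros(1), "emb/9": np.zeros(1)}
print(build_report("trainer-0", dense, dense_snap, sparse, sparse_snap, num_trainers=2))
```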
And 203, reporting the local increment information to a parameter server node in the distributed training system, and receiving global parameter information returned by the parameter server node.
In this embodiment of the present application, when the parameter server node receives the local increment information reported by only one training node, the parameter server node may update the global parameter corresponding to the local parameter to be reported to obtain the updated global parameter; for example, writing the global parameter as x_global, x_global += Δ.
In this embodiment of the present application, when the parameter server node receives the local increment information reported by a plurality of training nodes, the parameter server node needs to update the global parameter corresponding to the local parameter to be reported according to the local increments corresponding to the local parameters to be reported of the plurality of training nodes, to obtain the updated global parameter. For example, x_global += Δ_1 + Δ_2 + ... + Δ_n, where x_global represents the global parameter, n is the number of training nodes whose reports are received by the parameter server node, and Δ_n represents the local increment corresponding to the local parameter to be reported of the n-th training node.
Thus, the parameter snapshot information and the global parameter information can keep the same change trend.
In this embodiment of the present application, the parameter server node may generate updated global parameter information according to the local increment information reported by each training node, take the updated global parameter information as the return information corresponding to each training node, and return it to the corresponding training node, so that the training node receives the global parameter information returned by the parameter server node.
And step 204, updating the local parameter information and the parameter snapshot information of the model according to the global parameter information.
In the embodiment of the application, for each local parameter to be reported in the model, the global increment of the local parameter to be reported is determined according to the global parameter and the snapshot value corresponding to that local parameter, the newly obtained current value of the local parameter to be reported is updated according to the global increment, and the global parameter is determined as the snapshot value corresponding to the local parameter to be reported. See in particular the description of step 103.
It should be understood that, because the global parameter information and the parameter snapshot information keep the same change trend, the global increment information is the parameter increment information generated by the other training nodes during this period, with the influence of the local training node itself excluded.
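Under the illustrative notation above, the following small numeric check (made-up values) shows this property for two training nodes:

```python
# Illustrative check: the global increment seen by trainer 1 equals exactly the
# increment contributed by trainer 2, because trainer 1's own delta is already
# folded into its snapshot. All values are made up.
global_w = 1.0
snap_1 = 1.0                       # trainer 1's snapshot of parameter w
delta_1, delta_2 = 0.25, -0.125    # increments reported by trainers 1 and 2

snap_1 += delta_1                  # trainer 1 folds its own delta into its snapshot
global_w += delta_1 + delta_2      # the parameter server sums all reported increments

print(global_w - snap_1)           # -0.125 == delta_2, the other node's contribution
```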
In order to better illustrate the above embodiments, an example will now be described. As shown in fig. 4, fig. 4 is a schematic flow diagram of an algorithm according to one embodiment of the present application. As can be seen from fig. 4, the training thread and the communication thread on each training node are independent of each other, and each training node stores parameter snapshot information so as to maintain an independent parameter snapshot, so the training thread is not blocked while the node communicates with the parameter server node; the communication share can thus be further reduced, the training slowdown caused by communication mitigated, and the training time of each round shortened. Meanwhile, the global parameter information from the parameter server node does not overwrite the current values of the local parameters to be reported; instead, for each local parameter to be reported, the increment between the global parameter value and the snapshot value is added to the current value of that local parameter. In this way the local optimization of the local parameters to be reported can be preserved and the beneficial information explored by the training node retained, so that the model update fully balances the correct global direction against the local optimum, further improving the convergence speed of the model.
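For illustration only, the non-blocking pattern of separate training and communication threads might be sketched in Python as follows; the thread structure, queue, and sleeps are assumptions used to show the pattern, not the patent's implementation.

```python
# Illustrative sketch: the training thread keeps consuming mini-batches while a
# separate communication thread exchanges increments with the parameter server,
# so training is not blocked during communication. All names are hypothetical.
import threading, queue, time

outbox = queue.Queue()                # increments waiting to be pushed to the server

def training_thread(steps):
    for step in range(steps):
        time.sleep(0.01)              # stand-in for training on one mini-batch
        if step % 5 == 4:             # communication condition reached
            outbox.put({"w": 0.01})   # stand-in local increment for the communicator

def communication_thread():
    while True:
        delta = outbox.get()
        if delta is None:             # sentinel: training finished
            break
        time.sleep(0.02)              # stand-in for push/pull with the parameter server

t = threading.Thread(target=training_thread, args=(20,))
c = threading.Thread(target=communication_thread)
t.start(); c.start()
t.join(); outbox.put(None); c.join()
print("training finished without blocking on communication")
```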
According to the model parameter updating method, local increment information is acquired and the parameter snapshot information of the model is updated in combination with the local increment information; the local increment information is reported to a parameter server node in the distributed training system, and the global parameter information returned by the parameter server node is received; and the local parameter information and the parameter snapshot information of the model are updated according to the global parameter information. In the method, the training thread and the communication thread are independent of each other, each training node stores parameter snapshot information so as to maintain an independent parameter snapshot, and the training thread is not blocked while the training node communicates with the parameter server node; the communication share can thus be further reduced, the training slowdown caused by communication mitigated, and the training time of each round shortened. Meanwhile, the global parameter information from the parameter server node does not overwrite the local parameters to be reported; instead, for each local parameter to be reported, the increment between the global parameter value and the snapshot value is added to the current value of that local parameter, so the local optimization of the local parameters to be reported can be preserved, the beneficial information explored by the training node retained, and the model update fully balances the correct global direction against the local optimum, further improving the convergence speed of the model.
Fig. 5 is a schematic diagram according to a fourth embodiment of the present application. As shown in fig. 5, another model parameter updating method is specifically implemented as follows:
step 501, receiving local incremental information reported by at least one training node in a distributed training system.
In the embodiment of the application, a training node performing model training in a distributed training system reports local increment information to a parameter server node, and the parameter server node may receive the local increment information reported by the training nodes performing model training in the distributed training system, wherein the number of training nodes performing model training in the distributed training system is at least one. The local increment information may include, but is not limited to: an identifier of the training node, and the local parameters to be reported in the model together with the corresponding local increments. The local parameters to be reported may be the dense local parameters of the model on the training node and those sparse local parameters whose values have changed.
Optionally, determining parameter difference information according to local parameter information and parameter snapshot information of the model on the training node; and determining local increment information according to the parameter difference information and preset weights. See in particular the description of the embodiment shown in fig. 3.
Step 502, updating global parameter information of the model according to the local increment information reported by at least one training node, and obtaining updated global parameter information.
As an example, for each global parameter in the global parameter information, querying local increment information reported by at least one training node to obtain at least one local increment corresponding to the global parameter; and adding the values of at least one local increment and the global parameter to obtain the updated global parameter.
For example, a training node in the distributed training system reports the local increment Δ corresponding to a local parameter to be reported to the parameter server node in the distributed training system, and the parameter server node updates the global parameter information corresponding to that local parameter to obtain the updated global parameter information. For example, when the number of training nodes reporting information is 1, x_global += Δ. Therefore, the snapshot value corresponding to the local parameter to be reported and the value of the corresponding global parameter can keep the same change trend.
And step 503, returning the updated global parameter information to the training node.
It will be appreciated that, in a first implementation scenario, the updated global parameter information corresponding to all parameters may be returned to the training node. In a second implementation scenario, the local parameters to be reported may be the dense local parameters of the model on the training node and those sparse local parameters whose values have changed; in this case, the parameter server node in the distributed training system may obtain the updated global parameter corresponding to each local parameter to be reported and return it to the training node that reported that local parameter.
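A minimal sketch of this second scenario, with hypothetical names and an assumed per-node report format:

```python
# Illustrative sketch: the server applies all reported increments, then returns to
# each training node only the updated global values of the parameters that node
# actually reported. Names and formats are hypothetical.

def build_replies(global_params, reports):
    # `reports` maps node_id -> {parameter_name: local_increment}
    for increments in reports.values():
        for name, inc in increments.items():
            global_params[name] += inc
    return {node: {name: global_params[name] for name in increments}
            for node, increments in reports.items()}

global_params = {"w": 1.0, "emb/7": 0.2, "emb/9": 0.5}
reports = {"trainer-0": {"w": 0.1, "emb/7": 0.05}, "trainer-1": {"w": -0.02}}
print(build_replies(global_params, reports))
# trainer-0 receives the updated "w" and "emb/7"; trainer-1 receives only "w"
```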
According to the model parameter updating method, local increment information reported by at least one training node in a distributed training system is received; the global parameter information of the model is updated according to the local increment information reported by the at least one training node to obtain updated global parameter information; and the updated global parameter information is returned to the training node. In the method, the training thread and the communication thread are independent of each other, each training node stores parameter snapshot information so as to maintain an independent parameter snapshot, and the training thread is not blocked while the training node communicates with the parameter server node; the communication share can thus be further reduced, the training slowdown caused by communication mitigated, and the training time of each round shortened. Meanwhile, the global parameter information from the parameter server node does not overwrite the local parameters to be reported; instead, for each local parameter to be reported, the increment between the global parameter value and the corresponding snapshot value is added to that local parameter, so the local optimization of the local parameters to be reported can be preserved, the beneficial information explored by the training node retained, and the model update fully balances the correct global direction against the local optimum, further improving the convergence speed of the model.
In order to implement the embodiments described in fig. 1 to fig. 4, the embodiments of the present application further propose a training node.
Fig. 6 is a schematic diagram according to a fifth embodiment of the present application. As shown in fig. 6, the training node 600 includes: an acquisition module 610, a reporting module 620, and an updating module 630.
The obtaining module 610 is configured to obtain local incremental information, and update parameter snapshot information of the model in combination with the local incremental information; the reporting module 620 is configured to report the local incremental information to a parameter server node in the distributed training system, and receive global parameter information returned by the parameter server node; and an updating module 630, configured to update the local parameter information and the parameter snapshot information of the model according to the global parameter information.
As a possible implementation manner of the embodiment of the present application, the obtaining module 610 is specifically configured to determine parameter difference information according to local parameter information and parameter snapshot information of the model; and determining the local increment information according to the parameter difference information and the preset weight.
As one possible implementation manner of the embodiment of the present application, the weight is the inverse of the total number of training nodes performing the model training in the distributed training system.
As one possible implementation manner of the embodiment of the present application, the local increment information includes: an identifier of the training node, and the local parameters to be reported in the model together with the corresponding local increments; the local parameters to be reported are: the dense local parameters in the model, and those sparse local parameters whose values have changed.
As a possible implementation manner of the embodiment of the present application, the update module 630 is specifically configured to determine global incremental information according to the global parameter information and the parameter snapshot information; and updating the local parameter information of the model according to the global increment information, and determining the global parameter information as the parameter snapshot information of the model.
As a possible implementation manner of the embodiment of the present application, the update module 630 is specifically configured to obtain, for each local parameter of the model, a global increment value corresponding to the local parameter in the global increment information; and adding the global increment value and the current value of the local parameter, and taking the added result as the current value of the local parameter.
According to the training node, local increment information is acquired and the parameter snapshot information of the model is updated in combination with the local increment information; the local increment information is reported to a parameter server node in a distributed training system, and the global parameter information returned by the parameter server node is received; and the local parameter information and the parameter snapshot information of the model are updated according to the global parameter information. By combining the parameter snapshot information, the training node allows training and communication to proceed in parallel, reduces the communication share, and mitigates the training slowdown caused by communication. Meanwhile, updating with the increment information preserves the local optimization of the parameters and retains the beneficial information explored by the training node, which further improves the convergence speed of the model.
In order to implement the embodiment illustrated in fig. 5, the embodiment of the present application further proposes a parameter server node.
Fig. 7 is a schematic diagram according to a sixth embodiment of the present application. As shown in fig. 7, the parameter server node 700 includes: a receiving module 710, an updating module 720, a returning module 730.
The receiving module 710 is configured to receive local incremental information reported by at least one training node in the distributed training system; the updating module 720 is configured to update global parameter information of the model according to the local increment information reported by the at least one training node, so as to obtain updated global parameter information; and a returning module 730, configured to return the updated global parameter information to the training node.
As one possible implementation manner of the embodiment of the application, the local increment information reported by the training node is obtained by determining parameter difference information according to the local parameter information and the parameter snapshot information of the model on the training node; and determining local increment information according to the parameter difference information and preset weights.
As one possible implementation manner of the embodiment of the present application, the local increment information includes: an identifier of the training node, and the local parameters to be reported of the model on the training node together with the corresponding local increments; the local parameters to be reported are the dense local parameters of the model on the training node and those sparse local parameters whose values have changed.
As one possible implementation manner of the embodiment of the present application, the update module 720 is specifically configured to query, for each global parameter in the global parameter information, local increment information reported by at least one training node, and obtain at least one local increment corresponding to the global parameter; and adding the values of at least one local increment and the global parameter to obtain the updated global parameter.
The parameter server node receives local increment information reported by at least one training node in the distributed training system; updates the global parameter information of the model according to the local increment information reported by the at least one training node to obtain updated global parameter information; and returns the updated global parameter information to the training node. By combining the parameter snapshot information, the parameter server node allows training and communication to proceed in parallel, reduces the communication share, and mitigates the training slowdown caused by communication. Meanwhile, updating with the increment information preserves the local optimization of the parameters and retains the beneficial information explored by the training nodes, which further improves the convergence speed of the model.
In order to implement the above embodiments, the embodiments of the present application further provide a distributed training system.
Fig. 8 is a schematic diagram according to a seventh embodiment of the present application. As shown in fig. 8, the distributed training system 800 includes: a parameter server node 810, a plurality of training nodes 820.
Wherein the parameter server node 810 is connected to each of the plurality of training nodes 820; each of the plurality of training nodes 820 is configured to perform the model parameter updating method described in fig. 1-4; the parameter server node 810 is configured to perform the model parameter updating method described in fig. 5.
In order to achieve the above embodiments, the embodiments of the present application further provide an electronic device.
As shown in fig. 9, a block diagram of an electronic device according to a model parameter updating method according to an embodiment of the present application is shown. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 9, the electronic device includes: one or more processors 1001, a memory 1002, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 1001 is illustrated in fig. 9.
Memory 1002 is a non-transitory computer-readable storage medium provided herein. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the model parameter updating methods provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the model parameter updating method provided by the present application.
The memory 1002 is used as a non-transitory computer readable storage medium, and may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the model parameter updating method in the embodiments of the present application (e.g., the acquisition module 610, the reporting module 620, the updating module 630 shown in fig. 6, the receiving module 710, the updating module 720, and the returning module 730 shown in fig. 7). The processor 1001 executes various functional applications of the server and data processing by executing non-transitory software programs, instructions, and modules stored in the memory 1002, that is, implements the model parameter updating method in the above-described method embodiment.
Memory 1002 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created according to the use of the electronic device updated with the model parameters, and the like. In addition, the memory 1002 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 1002 may optionally include memory remotely located with respect to processor 1001, which may be connected to the model parameter updating electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the model parameter updating method may further include: an input device 1003 and an output device 1004. The processor 1001, memory 1002, input device 1003, and output device 1004 may be connected by a bus or other means, for example by a bus connection in fig. 9.
The input device 1003 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for model parameter updating, such as a touch screen, keypad, mouse, trackpad, touchpad, pointer stick, one or more mouse buttons, trackball, joystick, and like input devices. The output means 1004 may include a display device, auxiliary lighting means (e.g., LEDs), tactile feedback means (e.g., vibration motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computing programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved, and are not limited herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (21)

1. A method for updating model parameters, comprising:
acquiring local increment information, and updating parameter snapshot information of the model by combining the local increment information, wherein the parameter snapshot information comprises snapshot values corresponding to local parameters;
reporting the local increment information to a parameter server node in a distributed training system, and receiving global parameter information returned by the parameter server node;
determining global increment information according to the global parameter information and the parameter snapshot information;
and updating the local parameter information of the model according to the global increment information, and determining the global parameter information as the parameter snapshot information of the model.
2. The method of claim 1, wherein the acquiring local increment information comprises:
determining parameter difference information according to the local parameter information and the parameter snapshot information of the model;
and determining the local increment information according to the parameter difference information and a preset weight.
3. The method of claim 2, wherein the weight is the inverse of the total number of training nodes in the distributed training system that perform the model training.
4. The method according to claim 1 or 2, wherein the local increment information comprises: an identification of the training node, local parameters to be reported in the model, and corresponding local increments;
the local parameters to be reported are: dense local parameters in the model, and those sparse local parameters whose values have changed.
5. The method of claim 1, wherein updating the local parameter information of the model according to the global increment information comprises:
for each local parameter of the model, acquiring a global increment value corresponding to the local parameter in the global increment information;
and adding the global increment value and the current value of the local parameter, and taking the added result as the current value of the local parameter.
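For illustration only, a minimal Python sketch of the training-node round recited in claims 1 to 5 is given below. The dict-of-floats parameter representation and the report_to_server / fetch_global callables are assumptions standing in for the communication with the parameter server node; they are not defined by the claims.

```python
def training_node_round(local_params, snapshot, num_nodes, report_to_server, fetch_global):
    """One reporting round on a training node, following claims 1-5.

    local_params and snapshot are dicts mapping parameter name -> value;
    report_to_server and fetch_global are placeholder callables for the
    communication with the parameter server node.
    """
    # Claims 2 and 3: parameter difference = local value - snapshot value;
    # local increment = difference * preset weight, weight = 1 / number of training nodes.
    weight = 1.0 / num_nodes
    local_delta = {name: (local_params[name] - snapshot[name]) * weight
                   for name in local_params}

    # Claim 1: update the parameter snapshot by combining the local increment.
    for name, delta in local_delta.items():
        snapshot[name] += delta

    # Claim 1: report the local increment and receive the global parameter information.
    report_to_server(local_delta)
    global_params = fetch_global()

    for name, global_value in global_params.items():
        # Claim 1: global increment = global parameter value - snapshot value.
        global_delta = global_value - snapshot[name]
        # Claim 5: add the global increment to the current local value.
        local_params[name] += global_delta
        # Claim 1: the returned global parameter becomes the new snapshot value.
        snapshot[name] = global_value

    return local_params, snapshot
```

In practice, per claim 4, only the dense local parameters and those sparse local parameters whose values changed would be included in the reported increment, together with the identification of the training node.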
6. A method for updating model parameters, comprising:
receiving local increment information reported by at least one training node in a distributed training system;
updating the global parameter information of the model according to the local increment information reported by the at least one training node to obtain updated global parameter information;
returning the updated global parameter information to the training node;
wherein the training node determines global increment information according to the global parameter information and the parameter snapshot information, updates local parameter information of the model according to the global increment information, and determines the global parameter information as the parameter snapshot information of the model, the parameter snapshot information comprising snapshot values corresponding to local parameters.
7. The method of claim 6, wherein the local increment information reported by the training node is obtained by:
determining parameter difference information according to the local parameter information and the parameter snapshot information of the model on the training node;
and determining the local increment information according to the parameter difference information and a preset weight.
8. The method of claim 6, wherein the local increment information comprises: an identification of the training node, local parameters to be reported of the model on the training node, and corresponding local increments;
the local parameters to be reported are: dense local parameters of the model on the training node, and those sparse local parameters whose values have changed.
9. The method of claim 6, wherein updating the global parameter information of the model according to the local increment information reported by the at least one training node to obtain updated global parameter information comprises:
for each global parameter in the global parameter information, querying the local increment information reported by the at least one training node to obtain at least one local increment corresponding to the global parameter;
and adding the at least one local increment to the value of the global parameter to obtain the updated global parameter.
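As a similarly hedged sketch, the server-side aggregation of claims 6 and 9 could look as follows; the message layout (one dict of increments per reporting training node) is an assumed convention for illustration.

```python
def server_update(global_params, reported_deltas):
    """Aggregate reported local increments into the global parameters (claim 9).

    global_params: dict mapping parameter name -> value.
    reported_deltas: list with one dict per reporting training node, each
    mapping parameter name -> local increment (assumed message layout).
    """
    for name in global_params:
        # Claim 9: query the local increments reported for this global
        # parameter and add them to its current value.
        for delta in reported_deltas:
            if name in delta:
                global_params[name] += delta[name]
    return global_params
```

For example, with a single global parameter w = 1.0 and increments 0.1 and -0.05 reported by two training nodes, the updated global value would be 1.05; the updated global parameter information is then returned to the training nodes as in claim 6.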
10. A training node, comprising:
the acquisition module is used for acquiring local increment information, and updating parameter snapshot information of the model by combining the local increment information, wherein the parameter snapshot information comprises snapshot values corresponding to local parameters;
the reporting module is used for reporting the local increment information to a parameter server node in the distributed training system and receiving global parameter information returned by the parameter server node;
the updating module is used for determining global increment information according to the global parameter information and the parameter snapshot information; and updating the local parameter information of the model according to the global increment information, and determining the global parameter information as the parameter snapshot information of the model.
11. The training node of claim 10, wherein the acquisition module is configured to:
determining parameter difference information according to the local parameter information and the parameter snapshot information of the model;
and determining the local increment information according to the parameter difference information and a preset weight.
12. The training node of claim 11, wherein the weight is the inverse of the total number of training nodes in the distributed training system that perform the model training.
13. The training node according to claim 10 or 11, characterized in that the local increment information comprises: an identification of the training node, local parameters to be reported in the model, and corresponding local increments;
the local parameters to be reported are: dense local parameters in the model, and those sparse local parameters whose values have changed.
14. The training node according to claim 10, wherein the updating module is specifically configured to obtain, for each local parameter of the model, a global increment value corresponding to the local parameter in the global increment information;
and to add the global increment value to the current value of the local parameter, taking the added result as the current value of the local parameter.
15. A parameter server node, comprising:
the receiving module is used for receiving the local increment information reported by at least one training node in the distributed training system;
the updating module is used for updating the global parameter information of the model according to the local increment information reported by the at least one training node to obtain updated global parameter information;
the returning module is used for returning the updated global parameter information to the training node;
wherein the training node determines global increment information according to the global parameter information and the parameter snapshot information, updates local parameter information of the model according to the global increment information, and determines the global parameter information as the parameter snapshot information of the model, the parameter snapshot information comprising snapshot values corresponding to local parameters.
16. The parameter server node of claim 15, wherein the local increment information reported by the training node is obtained by:
determining parameter difference information according to the local parameter information and the parameter snapshot information of the model on the training node;
and determining the local increment information according to the parameter difference information and a preset weight.
17. The parameter server node of claim 15, wherein the local increment information comprises: an identification of the training node, local parameters to be reported of the model on the training node, and corresponding local increments;
the local parameters to be reported are: dense local parameters of the model on the training node, and those sparse local parameters whose values have changed.
18. The parameter server node according to claim 15, wherein the updating module is configured to:
for each global parameter in the global parameter information, query the local increment information reported by the at least one training node to obtain at least one local increment corresponding to the global parameter;
and add the at least one local increment to the value of the global parameter to obtain the updated global parameter.
19. A distributed training system, comprising:
a parameter server node and a plurality of training nodes;
the parameter server node is connected with each training node in the plurality of training nodes;
each of the plurality of training nodes for performing the method of any of claims 1-5;
the parameter server node for performing the method of any of claims 6-9.
20. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
21. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-9.
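Tying the two sketches above together, a toy single-process walk-through of the distributed training system of claim 19 (two training nodes, one parameter server, arbitrary example values) could run as follows, reusing training_node_round and server_update from the earlier sketches; this is again only an assumed illustration, not the claimed implementation.

```python
def simulate_one_round():
    # Arbitrary example values: one parameter "w", two training nodes.
    global_params = {"w": 1.0}

    def report(delta):
        # The parameter server node aggregates the reported increment (claim 9).
        server_update(global_params, [delta])

    def fetch():
        # The parameter server node returns the updated global parameters (claim 6).
        return dict(global_params)

    nodes = [
        {"local": {"w": 1.3}, "snapshot": {"w": 1.0}},
        {"local": {"w": 0.9}, "snapshot": {"w": 1.0}},
    ]
    # Each training node performs one reporting round (claims 1-5).
    for node in nodes:
        training_node_round(node["local"], node["snapshot"], len(nodes), report, fetch)

    return global_params, nodes
```

Running this example yields a final global value of w = 1.10: the first node reports an increment of 0.15 and the second reports -0.05, each scaled by the 1/2 weight of claim 3.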
CN202010234711.0A 2020-03-13 2020-03-27 Model parameter updating method and related equipment thereof Active CN111461343B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010179455 2020-03-13
CN202010179455X 2020-03-13

Publications (2)

Publication Number Publication Date
CN111461343A CN111461343A (en) 2020-07-28
CN111461343B true CN111461343B (en) 2023-08-04

Family

ID=71679314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010234711.0A Active CN111461343B (en) 2020-03-13 2020-03-27 Model parameter updating method and related equipment thereof

Country Status (1)

Country Link
CN (1) CN111461343B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070510A (en) * 2020-08-12 2020-12-11 上海连尚网络科技有限公司 Method and equipment for detecting unqualified commodities based on block chain
CN112559007B (en) * 2020-12-14 2022-09-23 北京百度网讯科技有限公司 Parameter updating method and device of multitask model and electronic equipment
CN112561078B (en) * 2020-12-18 2021-12-28 北京百度网讯科技有限公司 Distributed model training method and related device
CN113313263A (en) * 2021-06-21 2021-08-27 京东数科海益信息科技有限公司 Parameter optimization method, device and system of quantum line
CN114997337B (en) * 2022-07-18 2023-01-13 浪潮电子信息产业股份有限公司 Information fusion method, data communication method, information fusion device, data communication device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170046732A1 (en) * 2015-08-14 2017-02-16 International Business Machines Corporation Training a machine to dynamically determine and communicate customized, product-dependent promotions with no or limited historical data over a network
US10613791B2 (en) * 2017-06-12 2020-04-07 Pure Storage, Inc. Portable snapshot replication between storage systems

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809935A (en) * 2015-05-13 2015-07-29 中国航空工业集团公司沈阳飞机设计研究所 Simulation training method for special situation fault of unmanned aerial vehicle and system thereof
WO2017132428A1 (en) * 2016-01-29 2017-08-03 Yahoo! Inc. Method and system for distributed deep machine learning
CN106446140A (en) * 2016-09-20 2017-02-22 北京百度网讯科技有限公司 Method and device for data persistence
WO2018130267A1 (en) * 2017-01-10 2018-07-19 Huawei Technologies Co., Ltd. Systems and methods for fault tolerance recover during training of a model of a classifier using a distributed system
CN109754060A (en) * 2017-11-06 2019-05-14 阿里巴巴集团控股有限公司 A kind of training method and device of neural network machine learning model
CN108491928A (en) * 2018-03-29 2018-09-04 腾讯科技(深圳)有限公司 Model parameter training method, device, server and storage medium
CN109635948A (en) * 2018-12-19 2019-04-16 北京达佳互联信息技术有限公司 On-line training method, apparatus, system and computer readable storage medium
CN110059829A (en) * 2019-04-30 2019-07-26 济南浪潮高新科技投资发展有限公司 A kind of asynchronous parameters server efficient parallel framework and method
CN110704840A (en) * 2019-09-10 2020-01-17 中国人民公安大学 Convolutional neural network CNN-based malicious software detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Angel: a new large-scale machine learning system; Jie Jiang et al.; National Science Review; 216-236 *

Also Published As

Publication number Publication date
CN111461343A (en) 2020-07-28

Similar Documents

Publication Publication Date Title
CN111461343B (en) Model parameter updating method and related equipment thereof
EP3923160A1 (en) Method, apparatus, device and storage medium for training model
CN111539514B (en) Method and apparatus for generating a structure of a neural network
KR102592550B1 (en) Method and apparatus for updating parameter of model
EP3882783A1 (en) Event argument extraction method, event argument extraction apparatus and electronic device
KR102528748B1 (en) Method, apparatus, device and storage medium for constructing knowledge graph
CN111680517B (en) Method, apparatus, device and storage medium for training model
CN111078878B (en) Text processing method, device, equipment and computer readable storage medium
CN112559870B (en) Multi-model fusion method, device, electronic equipment and storage medium
CN111753914A (en) Model optimization method and device, electronic equipment and storage medium
CN111563593B (en) Training method and device for neural network model
JP7222040B2 (en) Model training, image processing method and device, storage medium, program product
CN111241838B (en) Semantic relation processing method, device and equipment for text entity
CN111539224B (en) Pruning method and device of semantic understanding model, electronic equipment and storage medium
CN111652354B (en) Method, apparatus, device and storage medium for training super network
CN110569969A (en) Network model structure sampling method and device and electronic equipment
CN114492788A (en) Method and device for training deep learning model, electronic equipment and storage medium
CN112580723B (en) Multi-model fusion method, device, electronic equipment and storage medium
EP3933712A1 (en) Optimizer learning method and apparatus, electronic device and readable storage medium
CN111783872B (en) Method, device, electronic equipment and computer readable storage medium for training model
CN111680599B (en) Face recognition model processing method, device, equipment and storage medium
CN111553169B (en) Pruning method and device of semantic understanding model, electronic equipment and storage medium
CN111539225B (en) Searching method and device for semantic understanding framework structure
CN111340222B (en) Neural network model searching method and device and electronic equipment
CN111488972B (en) Data migration method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant