CN111814968B - Method and apparatus for distributed training of machine learning models - Google Patents

Method and apparatus for distributed training of machine learning models

Info

Publication number
CN111814968B
Authority
CN
China
Prior art keywords
node
model information
current
nodes
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010957906.8A
Other languages
Chinese (zh)
Other versions
CN111814968A (en
Inventor
石红梅
廉相如
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202010957906.8A priority Critical patent/CN111814968B/en
Publication of CN111814968A publication Critical patent/CN111814968A/en
Application granted granted Critical
Publication of CN111814968B publication Critical patent/CN111814968B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Debugging And Monitoring (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present disclosure relates to a method and apparatus for distributed training of machine learning models. The method comprises the following steps: acquiring the order of the current iteration step among a preset number of iteration steps of distributed training, the preset number being in a logarithmic relation with the number of nodes N; based on the order, acquiring one node from the N nodes as a target node of the current node in the current iteration step; communicating with the target node to acquire the model information shared by the target node; and updating the machine learning model of the current node according to the model information of the current node and the model information of the target node, so that the machine learning model of each node synchronously obtains the model information of all N nodes after the preset number of iteration steps are completed. In this way, the convergence efficiency of distributed training can be ensured; meanwhile, the communication traffic of a single node can be reduced to S (the size of the model information to be communicated), shortening the communication time; in addition, since each node only shares information with its respective target node, load balance among the nodes can be guaranteed.

Description

Method and apparatus for distributed training of machine learning models
Technical Field
The present disclosure relates to the field of machine learning technologies, and in particular, to a method and an apparatus for distributed training of a machine learning model.
Background
With the increasing scale of machine learning data and models, the storage and computing capacity of a single card or a single machine can no longer meet the training requirements of large data and large models, and distributed machine learning has emerged. Distributed machine learning in the related art typically takes one of the following forms:
Taking the Ring-allreduce architecture as an example, N computing devices form a ring, and each computing device is a worker node (worker). In one iteration, each node divides the model information/gradient to be communicated into N parts; each worker completes training on its own batch of training data (mini-batch), calculates the gradient, transmits the gradient to the next worker in the ring, and simultaneously receives the gradient transmitted from the previous worker. For a ring containing N workers, each worker needs to receive the gradients of the other N-1 workers before gradient aggregation is carried out and the model information is updated. Under the Ring-allreduce architecture, the communication traffic of a single node among the N nodes is O(2S(N-1)/N), where S is the size of the model information/gradient to be communicated and N is the number of nodes; when N is large, the communication traffic approaches 2S.
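For intuition, the per-node traffic figures quoted above can be written out as a minimal Python sketch; the function names and the example values of S and N are illustrative assumptions rather than part of the original disclosure:

def ring_allreduce_traffic(S: float, N: int) -> float:
    # Per-node traffic under Ring-allreduce: 2*S*(N-1)/N, approaching 2*S for large N.
    return 2.0 * S * (N - 1) / N

def single_target_traffic(S: float, N: int) -> float:
    # Per-node traffic when each node communicates with exactly one target per iteration step.
    return float(S)

# Hypothetical example: S = 100 (size of the model information), N = 64 nodes.
print(ring_allreduce_traffic(100.0, 64))  # about 196.9, close to 2*S
print(single_target_traffic(100.0, 64))   # 100.0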
Taking other decentralized algorithms as an example, each node selects only its left and right neighbor nodes, or a random node, to communicate with, and shares its model information/gradient. In this case, the communication volume of a single node depends on the number of nodes selected for communication, and the amount of information from other nodes shared in each iteration is limited, so that on average log₂N to N/2 iterations are needed before the information of all nodes can be shared. Information sharing among the nodes is therefore insufficient, which slows down the overall convergence of distributed training.
Disclosure of Invention
The present disclosure provides a method and apparatus for distributed training of machine learning models to solve the problems in the related art.
The technical scheme of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a method for distributed training of a machine learning model, applied to each node of N nodes in a distributed system architecture, including:
acquiring the sequence of the current iteration step in the preset number of iteration steps of distributed training; the preset number is in logarithmic relation with the N;
based on the sequence, acquiring one node from the N nodes as a target node of the current node in the current iteration step;
communicating with the target node to acquire the model information shared by the target node;
and updating the machine learning model of the current node according to the model information of the current node and the model information of the target node so as to synchronously obtain the model information of the N nodes by the machine learning model of each node after the preset number of iteration steps are completed.
Optionally, acquiring one node from the N nodes as a target node of the current node in the current iteration step includes:
acquiring the node spacing corresponding to the order of the current iteration step based on the corresponding relation between the preset order and the node spacing;
and selecting the node which is separated from the current node by the node distance as the target node according to the predetermined cyclic sequencing information of the N nodes.
Optionally, the circular ordering information includes: the cyclic direction of the N nodes and the arrangement sequence among the N nodes;
in the N nodes arranged in the arrangement order, along the circulation direction, a node distance between an mth node located after a current node and the current node is m, where m is a natural number.
Optionally, the method further comprises:
determining the preset number of the iteration steps according to the N;
generating a corresponding relation between an order and a node distance based on the preset number, wherein the node distance corresponding to the order i is 2^i, and i has a value range of [0, log₂N-1].
Optionally, the method further comprises:
receiving the corresponding relation between the preset sequence and the node spacing, wherein the node spacing corresponding to the sequence i is 2^i, and i has a value range of [0, log₂N-1].
Optionally, updating the machine learning model of the current node according to the model information of the current node and the model information of the target node, including:
acquiring a weight matrix and a bias matrix in the model information of the current node, and acquiring a weight matrix and a bias matrix in the model information of the target node;
acquiring the average value of the weight matrix of the current node and the weight matrix of the target node, and acquiring the average value of the bias matrix of the current node and the bias matrix of the target node;
updating a weight matrix in the model information of the current node by using an average value of the weight matrix, and updating a bias matrix in the model information of the current node by using an average value of the bias matrix to update the model information of the current node.
According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus for distributed training of a machine learning model, applied to each node of N nodes in a distributed system architecture, including: a communication module, a machine learning module and a transceiving module;
the machine learning module is configured to acquire the order of the current iteration step among a preset number of iteration steps of distributed training, the preset number being in a logarithmic relation with the N, and to execute the step of obtaining, based on the order, one node from the N nodes as a target node of the current node in the current iteration step;
the communication module configured to perform communication with the target node;
the transceiver module is configured to execute the acquisition of the model information shared by the target nodes by using the communication module and send the model information to the machine learning module;
the machine learning module is further configured to update the machine learning model of the current node according to the model information of the current node and the model information of the target node, so that the machine learning model of each node synchronously obtains the model information of the N nodes after the preset number of iteration steps are completed.
Optionally, the machine learning module is further configured to perform:
acquiring the node spacing corresponding to the order of the current iteration step based on the corresponding relation between the preset order and the node spacing;
and selecting the node which is separated from the current node by the node distance as the target node according to the predetermined cyclic sequencing information of the N nodes.
Optionally, the circular ordering information includes: the cyclic direction of the N nodes and the arrangement sequence among the N nodes;
in the N nodes arranged in the arrangement order, along the circulation direction, a node distance between an mth node located after a current node and the current node is m, where m is a natural number.
Optionally, the machine learning module is further configured to perform:
determining the preset number of the iteration steps according to the N;
generating a corresponding relation between an order and a node distance based on the preset number, wherein the node distance corresponding to the order i is 2^i, and i has a value range of [0, log₂N-1].
Optionally, the machine learning module is further configured to perform:
receiving the corresponding relation between the preset sequence and the node spacing, wherein the node spacing corresponding to the sequence i is 2^i, and i has a value range of [0, log₂N-1].
Optionally, the machine learning module is further configured to perform: acquiring a weight matrix and a bias matrix in the model information of the current node, and acquiring a weight matrix and a bias matrix in the model information of the target node;
acquiring the average value of the weight matrix of the current node and the weight matrix of the target node, and acquiring the average value of the bias matrix of the current node and the bias matrix of the target node;
updating a weight matrix in the model information of the current node by using an average value of the weight matrix, and updating a bias matrix in the model information of the current node by using an average value of the bias matrix to update the model information of the current node.
According to a third aspect of the embodiments of the present disclosure, there is provided a server, including:
a processor;
a memory for storing a computer program executable by the processor;
wherein the processor is configured to execute the computer program in the memory to implement the steps of the method as described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium, in which an executable computer program is capable of implementing the steps of the method as described above when executed by a processor.
According to a fifth aspect of embodiments of the present disclosure, there is provided an application program, which, when executed by a processor of a server, enables the server to perform the steps of the method described above.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
in this embodiment, the order of the current iteration step may be obtained among a preset number of iteration steps of distributed training, the preset number being in a logarithmic relation with the number N of nodes in the distributed system architecture; then, based on the order, one node is acquired from the N nodes as the target node of the current node in the current iteration step; then, the current node communicates with the target node to acquire the model information shared by the target node; and the machine learning model of the current node is updated according to the model information of the current node and the model information of the target node, so that the machine learning model of each node synchronously obtains the model information of the N nodes after the preset number of iteration steps are completed. Thus, the present embodiment can realize information sharing among all nodes within the preset number (log₂N) of iteration steps, ensuring the convergence efficiency of distributed training; meanwhile, a single node only communicates with one target node in each iteration step, so the communication traffic can be reduced to S, shortening the communication time; in addition, each node only shares information with its respective target node, so load balance among the nodes can be guaranteed.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure. It will be apparent to those skilled in the art that other figures can be derived from these figures without inventive exercise.
FIG. 1 is a flow diagram illustrating a method for distributed training of a machine learning model, according to an example embodiment.
FIG. 2 is a flow diagram illustrating an acquisition of a target node in accordance with an exemplary embodiment.
FIG. 3 is a flow diagram illustrating updating model information according to an example embodiment.
FIG. 4 is a diagram illustrating sharing of model information by 8 nodes, according to an example embodiment.
FIG. 5 is a block diagram illustrating an apparatus for distributed training of a machine learning model, according to an example embodiment.
FIG. 6 is a block diagram illustrating a server in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
Therefore, the method for distributed training of a machine learning model provided by the embodiments of the present disclosure may be applied to each of N nodes in a distributed system architecture, and each node may be implemented by a server, a server cluster, or another computing device. The inventive concept is that, through a preset correspondence between order and node spacing, a target node corresponding to the node spacing can be determined for each node in each iteration step, so that each node can communicate with its target node to obtain model information (such as model parameters and/or gradient values); after the preset number of iteration steps, the machine learning model in each node has synchronously obtained the model information of all nodes. Moreover, since each node communicates with only one target node at a time, the communication traffic of a single node is S, which reduces the communication traffic and the number of iterations.
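As a rough per-node skeleton of one sharing round under these assumptions (local_train, exchange_with and average_model are hypothetical placeholders for the training step, the point-to-point communication and the model-information averaging detailed below; N is assumed to be a power of two):

import math

def sharing_round(rank: int, N: int, model, local_train, exchange_with, average_model):
    for i in range(int(math.log2(N))):           # preset number of iteration steps
        model = local_train(model)               # finish training on this step's mini-batch
        target = (rank + 2 ** i) % N             # target node for order i
        received = exchange_with(target, model)  # model information shared by the target
        model = average_model(model, received)   # update the current node's model
    return model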
FIG. 1 is a flow diagram illustrating a method for distributed training of a machine learning model, applied to each of N nodes in a distributed system architecture, according to an exemplary embodiment. Referring to FIG. 1, the method for distributed training of a machine learning model includes steps 11-14:
In step 11, in a preset number of iteration steps of distributed training, the order of the current iteration step is obtained; the preset number is in a logarithmic relation with the N.
In this step, the distributed training architecture includes N nodes (N being a positive integer greater than 1). A cyclic ordering of the N nodes may be formed in advance, and the cyclic ordering information may include the cyclic direction of the N nodes and the arrangement order among the N nodes. Among the N nodes arranged in the arrangement order, along the cyclic direction, the node distance between the mth node located after the current node and the current node is m, where m is a natural number. For convenience of description, in the following embodiments, one cycle of the N nodes is represented by "0, 1, 2, …, N-1".
In this step, each node is preset with an initial machine learning model, for example, a recurrent neural network, a convolutional neural network, a Boltzmann machine, a restricted Boltzmann machine, a deep belief network, a generative adversarial network, or the like, which can be selected according to the specific scenario. Then, each node can acquire different training data for training, and when a preset training end condition is reached, each node ends the training of the current iteration step. In this way, each node completes one update of the model information of its preset machine learning model in the current iteration step.
In this step, the number of iteration steps needed to complete one full round of model-information sharing in the distributed training process may be set to a preset number; that is, every time the preset number of iteration steps has been performed, each node has shared the model information of all nodes. The preset number and the node number N are in a logarithmic relation, namely the preset number C = log₂N, so that the orders of the iteration steps are, in sequence, i = 0, 1, 2, …, log₂N-1. Based on this, each node can acquire the order i of the current iteration step at each iteration step.
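A small sketch of this relationship (assuming, as in the examples of this disclosure, that N is a power of two) might look like:

import math

def preset_step_count(N: int) -> int:
    # Preset number of iteration steps per sharing round: C = log2(N).
    return int(math.log2(N))

def step_orders(N: int) -> list:
    # Orders of the iteration steps: i = 0, 1, ..., log2(N) - 1.
    return list(range(preset_step_count(N)))

# For N = 8 nodes: preset_step_count(8) == 3 and step_orders(8) == [0, 1, 2].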
In step 12, based on the order, one node is obtained from the N nodes as a target node of the current node in the current iteration step.
In this step, each node may determine a corresponding target node according to the order of the current iteration step. It should be noted that in each iteration step, a single node has one and only one target node. The target node refers to the node with which the current node needs to communicate in the current iteration step to obtain the shared model information. Referring to fig. 2, the current node acquires a target node through steps 21 to 22:
in step 21, the current node may obtain a corresponding relationship between a preset order and a node distance. The corresponding relation may include node distances corresponding to respective orders, where the node distance corresponding to the node order i is 2iI has a value range of [0, log2N-1]That is, the node distance of the current node in each iteration step is [2 ] in sequence0,21,22, 2i,…,2M];M= log2N-1. Then, based on the correspondence between the order and the node distance, the current node may obtain the node distance corresponding to the order of the current iteration step.
It should be noted that the current node may acquire the correspondence between order and node distance in the following ways. For example, the current node may determine the preset number of iteration steps from N, and then generate the correspondence between order and node distance based on the preset number. For another example, the correspondence may be generated by a server and sent to each node, so that each node receives the correspondence.
In step 22, the current node may select, according to the predetermined cyclic ordering information of the N nodes, the node that is separated from the current node by the node distance determined in step 21 as the target node. Continuing with the above correspondence as an example, assuming the current iteration step is the 3rd one, the current spacing 2^(3-1) = 2^2 = 4 can be obtained according to the above correspondence. The target node corresponding to node 0 in the 3rd iteration step is then: 0 + current spacing (2^(3-1) = 2^2 = 4) = 4. Since every node moves by the same current spacing when determining its target node, in each iteration step each node communicates with one and only one node (namely the target node), and the traffic of each communication is S (namely the model information of the machine learning model), thereby ensuring load balance among the nodes in the distributed system. Further, since the communication volume of each node is S, the communication time of each node can be kept the same or similar in a communication network under the same conditions. Compared with the scenario in the related art in which excessive traffic at a certain node of the distributed system prolongs the communication time of an iteration, this embodiment can greatly shorten the communication time of each iteration, and thus shorten the duration of the distributed training process.
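The selection rule just described can be sketched as follows (illustrative helper names; node identifiers follow the cyclic ordering 0, 1, …, N-1):

def node_spacing(i: int) -> int:
    # Node spacing for order i: 2**i, with i in [0, log2(N) - 1].
    return 2 ** i

def target_node(current: int, i: int, N: int) -> int:
    # Target of the current node at order i along the cyclic ordering.
    return (current + node_spacing(i)) % N

# For N = 8, node 0 targets nodes 1, 2 and 4 at orders 0, 1 and 2 respectively.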
In step 13, the current node communicates with the target node to obtain the model information shared by the target node.
In step 14, the machine learning model of the current node is updated according to the model information of the current node and the model information of the target node, so that the machine learning models of the nodes synchronously obtain the model information of the N nodes after the preset number of iterations are completed.
In this step, the current node updates the machine learning model of the current node by using the acquired model information. Referring to fig. 3, in step 31, the current node may obtain model information of a machine learning model within itself, where the model information includes a weight matrix and a bias matrix; and obtaining model information of the target node, wherein the model information also comprises a weight matrix and a bias matrix. In step 32, the current node may then obtain an average of both its own weight matrix and the weight matrix of the target node, and obtain an average of both its own bias matrix and the bias matrix of the target node. It should be noted that the average value of the weight matrix is a matrix having the same dimension as the weight matrix, and the average value of the bias matrix is a matrix having the same dimension as the bias matrix. In step 33, the current node may update the weight matrix of the current node by using the average value of the weight matrix, and update the bias matrix of the current node by using the average value of the bias matrix, so as to update the model parameters of the machine learning model in the current node, that is, the machine learning model corresponding to the current iteration step may be obtained. And then, the current node can continue the next iteration step until the preset number of iteration steps are completed, so that the model information of the machine learning model in each node comprises the model information of the machine learning models in all nodes, and the distributed training process is completed.
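A minimal sketch of the averaging in steps 31-33, assuming the model information is held as NumPy arrays under the placeholder keys 'W' (weight matrix) and 'b' (bias matrix):

import numpy as np

def average_model(own: dict, received: dict) -> dict:
    # Element-wise average of the current node's and the target node's model information;
    # each average keeps the same dimensions as the original matrix.
    return {
        "W": (own["W"] + received["W"]) / 2.0,
        "b": (own["b"] + received["b"]) / 2.0,
    }

own = {"W": np.ones((2, 3)), "b": np.zeros((1, 3))}
received = {"W": np.zeros((2, 3)), "b": np.ones((1, 3))}
updated = average_model(own, received)  # both 'W' and 'b' are filled with 0.5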
It should be noted that, for the machine learning model to be trained to a usable level, multiple rounds of model-information sharing are required; that is, steps 11 to 14 are repeated for an integral number of rounds, so the total number of iteration steps for the model to complete training is C1 = x·log₂N, where x is a positive integer.
In this embodiment, in a preset number of iteration steps of distributed training, the order of the current iteration step may be obtained, and the preset number is in a logarithmic relation with the number N of nodes in the distributed system architecture; then, based on the order, one node is acquired from the N nodes as the target node of the current node in the current iteration step; then, the current node communicates with the target node to acquire the model information shared by the target node; and the machine learning model of the current node is updated according to the model information of the current node and the model information of the target node, so that the machine learning model of each node synchronously obtains the model information of the N nodes after the preset number of iteration steps are completed. Thus, the present embodiment can realize information sharing among all nodes within the preset number (log₂N) of iteration steps, ensuring the convergence efficiency of distributed training; meanwhile, a single node only communicates with one target node in each iteration step, so the communication traffic can be reduced to S, shortening the communication time; in addition, each node only shares information with its respective target node, so load balance among the nodes can be guaranteed. Furthermore, after the machine learning model shares model information over x rounds, the effects of reducing communication traffic and shortening training time in each round of model-information updating are amplified, greatly improving training efficiency.
In conjunction with the scenario shown in fig. 4, the working process of each node implementing the method for distributed training of the machine learning model is described in detail:
in this example, the distributed training system implements communication between nodes in a synchronous and decentralized manner. In this example, all nodes in the distributed training system may form a circular list in advance, and each node may establish communication with the target node in each iteration step according to the correspondence.
In each iteration step, each node selects one node (namely the target node) to communicate with; the target node is selected according to the correspondence [2^0, 2^1, 2^2, …, 2^M] between the order of the iteration step and the node distance. Since each node communicates with only one target node in each iteration step, the communication volume is S. Moreover, load balance among the nodes can be ensured.
In each iteration step, the model information shared by the nodes is increased according to the node distance in the corresponding relationship, that is:
in the 0th iteration step, node i shares the model information of target node i+1 corresponding to the current iteration step; at this moment, target node i+1 contains only its own model information, so node i now holds the model information of 2 nodes;
in the 1st iteration step, node i shares the model information of target node i+2 corresponding to the current iteration step; at this time, besides the target node's own model information, the target node also contains the model information shared in the 0th step, so node i now holds the model information of 4 nodes;
in the 2nd iteration step, node i shares the model information of target node i+4 corresponding to the current iteration step, and node i now holds the model information of 8 nodes;
and so on, until after log₂N iteration steps each node has shared the model information of all N nodes.
Take a distributed training system with 8 nodes [0, 1, 2, 3, 4, 5, 6, 7] as an example, where M = log₂N-1 = 2. The target nodes of node 0 are, in sequence, [1, 2, 4]; the target nodes of node 1 are [2, 3, 5]; the target nodes of node 2 are [3, 4, 6]; the target nodes of node 3 are [4, 5, 7]; the target nodes of node 4 are [5, 6, 0]; the target nodes of node 5 are [6, 7, 1]; the target nodes of node 6 are [7, 0, 2]; and the target nodes of node 7 are [0, 1, 3]. Referring to FIG. 4, the model information shared by each node in the different iteration steps can be obtained through 3 iteration steps. Taking node 0 as an example (the analysis of the other nodes is similar):
in the 0 th iteration step S0, each node completes training according to the training data to obtain its own model information. Referring to the node 0, the node 0 queries that the node distance from the target node is 1, that is, the node 0 can communicate with the target node 1 and share the model information of the target node 1, then obtains the average value of the model information of the node 0 and the node 1, and updates the model information of the machine learning model in the node 0. At this time, model information to node 0 and node 1 can be shared within node 0, with the "/" partition in fig. 4.
In the 1 st iteration step S1, each node continues to complete training according to the training data, and obtains its own model information. Referring to the node 0, the node 0 queries that the node distance from the target node is 2, that is, the node 0 can communicate with the target node 2 and share the model information of the target node 2, and then the node 0 can obtain the average value of the model information of the nodes 0 and 2 and update the model information of the machine learning model in the node 0. Since the model information to node 2 and node 3 has already been shared within the target node 2, the model information to node 0, node 1, node 2 and node 3 can be shared within node 0, with the "/" partition in fig. 4.
It should be noted that the numbers in the boxes in fig. 4 indicate model information of nodes that the respective nodes have shared. In practical application, the model information that the node 0 can obtain is only the model information of itself and the model information shared by the target node.
In the 2nd iteration step S2, each node continues training on its training data and obtains its own model information. Referring to node 0, node 0 queries that the node distance to its target node is 4; that is, node 0 can communicate with target node 4 and obtain the model information shared by target node 4, then node 0 calculates the average of the model information of nodes 0 and 4 and updates the model information of the machine learning model in node 0. Since the model information of node 4, node 5, node 6, and node 7 has already been shared within target node 4, the model information of node 0, node 1, node 2, node 3, node 4, node 5, node 6, and node 7 is now shared within node 0, separated by "/" in FIG. 4.
It should be noted that the numbers in the boxes in fig. 4 indicate model information of nodes that the respective nodes have shared. In practical application, the model information that the node 0 can obtain is only the model information of itself and the model information shared by the target node.
With continued reference to FIG. 4, after 3 iteration steps, each of the 8 nodes has shared the model information of all nodes, thereby synchronizing the data of all nodes with the minimum number of iteration steps and the minimum amount of traffic per step.
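The FIG. 4 walkthrough can be reproduced with a small simulation (an illustrative sketch assuming N is a power of two): after log2(N) iteration steps, every node holds the model information of all N nodes.

def simulate_sharing(N: int) -> list:
    shared = [{k} for k in range(N)]          # node k initially knows only its own information
    steps = N.bit_length() - 1                # log2(N) for a power-of-two N
    for i in range(steps):
        spacing = 2 ** i
        snapshot = [set(s) for s in shared]   # exchanges within a step are synchronous
        for k in range(N):
            shared[k] |= snapshot[(k + spacing) % N]
    return shared

result = simulate_sharing(8)
assert all(info == set(range(8)) for info in result)  # 3 steps suffice for 8 nodes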
On the basis of the foregoing method for distributed training of a machine learning model, an embodiment of the present disclosure further provides an apparatus for distributed training of a machine learning model, referring to fig. 5, applied to each node of N nodes in a distributed system architecture, including: a communication module 53, a machine learning module 51 and a transceiver module 52;
the machine learning module 51 configured to perform obtaining, in a preset number of iteration steps of the distributed training, an order of a current iteration step; the preset number is in logarithmic relation with the N; and executing the step of obtaining one node from the N nodes as a target node of the current node in the current iteration step based on the sequence;
the communication module 53 configured to perform communication with the target node;
the transceiver module 52 is configured to perform obtaining of the model information shared by the target node by using the communication module 53, and send the model information to the machine learning module 51;
the machine learning module 51 is further configured to update the machine learning model of the current node according to the model information of the current node and the model information of the target node, so as to obtain the model information of the N nodes synchronously by the machine learning model of each node after the preset number of iteration steps are completed.
Optionally, the machine learning module 51 is further configured to perform:
acquiring the node spacing corresponding to the order of the current iteration step based on the corresponding relation between the preset order and the node spacing;
and selecting the node which is separated from the current node by the node distance as the target node according to the predetermined cyclic sequencing information of the N nodes.
Optionally, the circular ordering information includes: the cyclic direction of the N nodes and the arrangement sequence among the N nodes;
in the N nodes arranged in the arrangement order, along the circulation direction, a node distance between an mth node located after a current node and the current node is m, where m is a natural number.
Optionally, the machine learning module 51 is further configured to perform:
determining the preset number of the iteration steps according to the N;
generating a corresponding relation between an order and a node distance based on the preset number, wherein the node distance corresponding to the order i is 2^i, and i has a value range of [0, log₂N-1].
Optionally, the machine learning module 51 is further configured to perform:
receiving the corresponding relation between the preset sequence and the node spacing, wherein the node spacing corresponding to the sequence i is 2^i, and i has a value range of [0, log₂N-1].
Optionally, the machine learning module 51 is further configured to perform: acquiring a weight matrix and a bias matrix in the model information of the current node, and acquiring a weight matrix and a bias matrix in the model information of the target node;
acquiring the average value of the weight matrix of the current node and the weight matrix of the target node, and acquiring the average value of the bias matrix of the current node and the bias matrix of the target node;
updating a weight matrix in the model information of the current node by using an average value of the weight matrix, and updating a bias matrix in the model information of the current node by using an average value of the bias matrix to update the model information of the current node.
With regard to the apparatus in the above-mentioned embodiment, the working principle of each module in the apparatus has been described in detail in describing the method embodiment, and will not be elaborated here.
FIG. 6 is a block diagram illustrating a server in accordance with an example embodiment. Referring to fig. 6, server 600 may include one or more of the following components: processing component 602, memory 604, power component 606, multimedia component 608, audio component 610, input/output (I/O) interface 612, sensor component 614, and communication component 616.
The processing component 602 generally controls overall operation of the server 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or part of the steps of the method shown in fig. 3. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 can include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operations at the server 600. Examples of such data include instructions for any application or method operating on server 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 604 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power components 606 provide power to the various components of the server 600. The power components 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the server 600.
The multimedia component 608 includes a screen that provides an output interface between the server 600 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 608 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the server 600 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a Microphone (MIC) configured to receive external audio signals when the server 600 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing various aspects of status assessment for the server 600. For example, the sensor component 614 may detect an open/closed status of the server 600, a relative positioning of components, such as a display and keypad of the server 600, the sensor component 614 may also detect a change in position of the server 600 or a component of the server 600, the presence or absence of user contact with the server 600, orientation or acceleration/deceleration of the server 600, and a change in temperature of the server 600. The sensor assembly 614 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate communications between the server 600 and other devices in a wired or wireless manner. The server 600 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 6G), or a combination thereof. In an exemplary embodiment, the communication component 616 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an embodiment of the present disclosure, the server 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the following steps: acquiring the sequence of the current iteration step in the preset number of iteration steps of distributed training; the preset number is in logarithmic relation with the N; based on the sequence, acquiring one node from the N nodes as a target node of the current node in the current iteration step; communicating with the target node to acquire the model information shared by the target node; and updating the machine learning model of the current node according to the model information of the current node and the model information of the target node so as to synchronously obtain the model information of the N nodes by the machine learning model of each node after the preset number of iteration steps are completed.
In an embodiment of the present disclosure, there is also provided a non-transitory computer readable storage medium, such as the memory 604, comprising instructions executable by the processor 620 of the server 600 to perform the steps of: acquiring the sequence of the current iteration step in the preset number of iteration steps of distributed training; the preset number is in logarithmic relation with the N; based on the sequence, acquiring one node from the N nodes as a target node of the current node in the current iteration step; communicating with the target node to acquire the model information shared by the target node; and updating the machine learning model of the current node according to the model information of the current node and the model information of the target node so as to synchronously obtain the model information of the N nodes by the machine learning model of each node after the preset number of iteration steps are completed. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an embodiment of the present disclosure, there is also provided a computer program product, which when executed by a processor of a server, enables the server to perform the steps of: acquiring the sequence of the current iteration step in the preset number of iteration steps of distributed training; the preset number is in logarithmic relation with the N; based on the sequence, acquiring one node from the N nodes as a target node of the current node in the current iteration step; communicating with the target node to acquire the model information shared by the target node; and updating the machine learning model of the current node according to the model information of the current node and the model information of the target node so as to synchronously obtain the model information of the N nodes by the machine learning model of each node after the preset number of iteration steps are completed.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the device/server/storage medium embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the embodiments discussed above that follow in general the principles of the disclosure and include such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (11)

1. A method for distributed training of a machine learning model, applied to each node of N nodes in a distributed system architecture, comprising:
acquiring the sequence of the current iteration step in the preset number of iteration steps of distributed training; the preset number is in logarithmic relation with the N;
acquiring the node spacing corresponding to the order of the current iteration step based on the corresponding relation between the preset order and the node spacing;
selecting a node which is separated from the current node by the node distance as a target node according to the predetermined cyclic sequencing information of the N nodes;
communicating with the target node to acquire the model information shared by the target node;
updating the machine learning model of the current node according to the model information of the current node and the model information of the target node, so that the machine learning model of each node synchronously obtains the model information of the N nodes after the preset number of iteration steps are completed;
the corresponding relation between the preset sequence and the node distance is obtained through the following steps of:
determining the preset number of the iteration steps according to the N;
generating a corresponding relation between an order and a node distance based on the preset number, wherein the node distance corresponding to the order i is 2^i, and i has a value range of [0, log₂N-1].
2. The method of claim 1, wherein the circular ordering information comprises: the cyclic direction of the N nodes and the arrangement sequence among the N nodes;
in the N nodes arranged in the arrangement order, along the circulation direction, a node distance between an mth node located after a current node and the current node is m, where m is a natural number.
3. The method of claim 1, further comprising:
receiving the corresponding relation between the preset sequence and the node spacing, wherein the node spacing corresponding to the sequence i is 2^i, and i has a value range of [0, log₂N-1].
4. The method of claim 1, wherein updating the machine learning model of the current node based on the model information of the current node and the model information of the target node comprises:
acquiring a weight matrix and a bias matrix in the model information of the current node, and acquiring a weight matrix and a bias matrix in the model information of the target node;
acquiring the average value of the weight matrix of the current node and the weight matrix of the target node, and acquiring the average value of the bias matrix of the current node and the bias matrix of the target node;
updating a weight matrix in the model information of the current node by using an average value of the weight matrix, and updating a bias matrix in the model information of the current node by using an average value of the bias matrix to update the model information of the current node.
5. An apparatus for distributed training of a machine learning model, applied to each node of N nodes in a distributed system architecture, comprising: a communication module, a machine learning module and a transceiving module;
the machine learning module is configured to acquire the order of the current iteration step among a preset number of iteration steps of distributed training, the preset number being in a logarithmic relation with the N; to acquire, based on a corresponding relation between a preset order and node spacing, the node spacing corresponding to the order of the current iteration step; and to select, according to predetermined cyclic ordering information of the N nodes, the node which is separated from the current node by the node spacing as a target node;
the communication module configured to perform communication with the target node;
the transceiver module is configured to execute the acquisition of the model information shared by the target nodes by using the communication module and send the model information to the machine learning module;
the machine learning module is further configured to update the machine learning model of the current node according to the model information of the current node and the model information of the target node, so that the machine learning model of each node synchronously obtains the model information of the N nodes after the preset number of iteration steps are completed;
the machine learning module is further configured to perform:
determining the preset number of the iteration steps according to the N;
generating a corresponding relation between an order and a node distance based on the preset number, wherein the node distance corresponding to the order i is 2^i, and i has a value range of [0, log₂N-1].
6. The apparatus of claim 5, wherein the round robin ordering information comprises: the cyclic direction of the N nodes and the arrangement sequence among the N nodes;
in the N nodes arranged in the arrangement order, along the circulation direction, a node distance between an mth node located after a current node and the current node is m, where m is a natural number.
7. The apparatus of claim 5, wherein the machine learning module is further configured to perform:
receiving the corresponding relation between the preset sequence and the node spacing, wherein the node spacing corresponding to the sequence i is 2^i, and i has a value range of [0, log₂N-1].
8. The apparatus of claim 5, wherein the machine learning module is further configured to perform: acquiring a weight matrix and a bias matrix in the model information of the current node, and acquiring a weight matrix and a bias matrix in the model information of the target node;
acquiring the average value of the weight matrix of the current node and the weight matrix of the target node, and acquiring the average value of the bias matrix of the current node and the bias matrix of the target node;
updating a weight matrix in the model information of the current node by using an average value of the weight matrix, and updating a bias matrix in the model information of the current node by using an average value of the bias matrix to update the model information of the current node.
9. A server, comprising:
a processor;
a memory for storing a computer program executable by the processor;
wherein the processor is configured to execute the computer program in the memory to implement the steps of the method according to any of claims 1-4.
10. A computer-readable storage medium, in which an executable computer program is stored which, when executed by a processor, is capable of carrying out the steps of the method according to any one of claims 1 to 4.
11. An application program, characterized in that the application program, when executed by a processor of a server, enables said server to carry out the steps of the method according to any one of claims 1 to 4.
CN202010957906.8A 2020-09-14 2020-09-14 Method and apparatus for distributed training of machine learning models Active CN111814968B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010957906.8A CN111814968B (en) 2020-09-14 2020-09-14 Method and apparatus for distributed training of machine learning models

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010957906.8A CN111814968B (en) 2020-09-14 2020-09-14 Method and apparatus for distributed training of machine learning models

Publications (2)

Publication Number Publication Date
CN111814968A CN111814968A (en) 2020-10-23
CN111814968B true CN111814968B (en) 2021-01-12

Family

ID=72860041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010957906.8A Active CN111814968B (en) 2020-09-14 2020-09-14 Method and apparatus for distributed training of machine learning models

Country Status (1)

Country Link
CN (1) CN111814968B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8185490B1 (en) * 2009-08-05 2012-05-22 The United States Of America As Represented By The Secretary Of The Navy Class-specific iterated subspace classifier
CN107688493A (en) * 2016-08-05 2018-02-13 阿里巴巴集团控股有限公司 Train the method, apparatus and system of deep neural network
CN108829441A (en) * 2018-05-14 2018-11-16 中山大学 A kind of parameter update optimization system of distribution deep learning
CN110084378A (en) * 2019-05-07 2019-08-02 南京大学 A kind of distributed machines learning method based on local learning strategy
CN110502576A (en) * 2019-08-12 2019-11-26 北京迈格威科技有限公司 Data integration method, distributed computational nodes and distributed deep learning training system
US10592732B1 (en) * 2017-12-14 2020-03-17 Perceive Corporation Probabilistic loss function for training network with triplets
CN110929884A (en) * 2019-11-22 2020-03-27 北京大学 Classification method and device for distributed machine learning optimization based on column division
CN111325356A (en) * 2019-12-10 2020-06-23 四川大学 Neural network search distributed training system and training method based on evolutionary computation
CN111353582A (en) * 2020-02-19 2020-06-30 四川大学 Particle swarm algorithm-based distributed deep learning parameter updating method
CN111369009A (en) * 2020-03-04 2020-07-03 南京大学 Distributed machine learning method capable of tolerating untrusted nodes

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111221646A (en) * 2019-12-16 2020-06-02 清华大学 Parameter synchronization method and device for distributed machine learning

Also Published As

Publication number Publication date
CN111814968A (en) 2020-10-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant