WO2020107351A1 - Model training method and related nodes, network and storage device - Google Patents

Model training method and related nodes, network and storage device

Info

Publication number
WO2020107351A1
WO2020107351A1 · PCT/CN2018/118291 · CN2018118291W
Authority
WO
WIPO (PCT)
Prior art keywords
iteration
group
parameters
model
current node
Prior art date
Application number
PCT/CN2018/118291
Other languages
English (en)
Chinese (zh)
Inventor
袁振南
朱鹏新
Original Assignee
袁振南
区链通网络有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 袁振南, 区链通网络有限公司 filed Critical 袁振南
Priority to CN201880002436.0A priority Critical patent/CN109690530A/zh
Priority to PCT/CN2018/118291 priority patent/WO2020107351A1/fr
Publication of WO2020107351A1 publication Critical patent/WO2020107351A1/fr

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks

Definitions

  • This application relates to the field of blockchain technology, in particular to a model training method and its node, network and storage device.
  • a decentralized network contains multiple nodes, and there is no central node in the network.
  • each node in the decentralized network can be used to cooperate to realize the information processing using the model. That is, each node uses its corresponding model to process the input information to output the result.
  • the technical problem that this application mainly solves is to provide a model training method and its nodes, network and storage device to realize the training of the model based on the decentralized network.
  • The first aspect of the present application provides a model training method applied to a decentralized network containing at least one group of nodes, where each group of nodes includes at least one node and at least some of the nodes are used for training to obtain the model parameters of the model. The method includes: the current node adopts a preset decentralized training strategy within its group to obtain in-group parameters for the model; and the current node uses the in-group parameters and the weights of the out-of-group neighbor nodes of the current node relative to the current node to obtain model parameters for the model.
  • A second aspect of the present application provides a node of a decentralized network, including a processor, and a memory and a communication circuit coupled to the processor, wherein the communication circuit is used to communicate with other nodes of the decentralized network, the memory is used to store program instructions, and the processor is used to run the program instructions to perform the above method.
  • a third aspect of the present application provides a decentralized network.
  • the decentralized network includes at least one group of nodes, and each group of nodes includes at least one of the foregoing nodes.
  • a fourth aspect of the present application provides a storage device that stores program instructions, and when the program instructions run on a processor, execute the method described in the first aspect above.
  • The above scheme adopts a preset decentralized training strategy within the group to obtain the in-group parameters, and then weights the in-group parameters with the weights of the out-of-group neighbor nodes, so that ordinary nodes in the decentralized network can obtain the model parameters of the model without a central node.
  • FIG. 1 is a schematic structural diagram of an embodiment of a decentralized network of this application
  • FIG. 2 is a schematic flowchart of an embodiment of a model training method of this application
  • FIG. 3 is a schematic flowchart of step S220 in another embodiment of the model training method of the present application.
  • FIG. 4 is a schematic flowchart of still another embodiment of the model training method of the present application.
  • FIG. 5 is a schematic structural diagram of an embodiment of a node of a decentralized network of this application.
  • FIG. 6 is a schematic structural diagram of an embodiment of a storage device of the present application.
  • the decentralized network 10 includes multiple nodes 11, wherein the multiple nodes 11 are divided into at least one group of nodes, and each group of nodes includes at least one node 11.
  • For example, the multiple nodes 11 of the decentralized network 10 are divided into three groups, and each group includes three nodes 11.
  • the node 11 may be a communication device such as a mobile phone, a tablet computer, a computer, or a server.
  • the nodes 11 can directly communicate with each other, and it is not necessary for all nodes 11 to communicate through the central node.
  • The nodes 11 in a group can communicate with each other, and each node 11 can communicate with at least one node 11 of each other group; a node 11 in another group that communicates with the node 11 is called an out-of-group communication node of the node 11.
  • Other than these out-of-group communication nodes, the node 11 does not directly communicate with the remaining nodes 11 in other groups.
  • the decentralized network 10 may be used to create models and use the created models for data processing.
  • When performing the above information processing, the nodes 11 of the decentralized network 10 may cooperate to perform the information processing by using a model. That is, each node uses its corresponding model to process the input information to output the result.
  • each node may be responsible for different parts of the model.
  • For example, in one reference scenario, the model is a neural network model and different network layers of the neural network model are assigned to different nodes, so that different nodes handle different parts of the model processing, that is, the model is parallelized; in another reference scenario, each node is responsible for all parts of the model, for example, different nodes hold copies of the same model, each node is assigned a part of the data, and the calculation results of all the nodes are then combined in a certain way.
  • the decentralized network 10 may first perform model training to obtain model parameters of the model, and then use the model corresponding to the model parameters to implement information processing as shown above.
  • each node in the decentralized network 10 is used to train to obtain the model parameters of the model.
  • Each group of nodes 11 of the decentralized network 10 first trains to obtain the in-group parameters of the model, and then the weights of neighbor nodes in different groups and the in-group parameters are used to obtain the model parameters of the model. Further, in order to obtain accurate model parameters, the model parameters may be updated over multiple iterations in the manner described above.
  • the following example lists a training principle for the model parameters of the decentralized network of this application.
  • the above grouped decentralized network is used to implement a machine learning algorithm that optimizes the objective function, and then realizes the training of model parameters, where the objective function can be optimized based on the gradient descent method.
  • The model parameter training method of the decentralized network is equivalent to solving the following objective function J: J(x) = Σ_{k=1}^{N} J_k(x), where J_k(x) is the sub-objective function of the k-th node 11 and N is the number of nodes in the decentralized network.
  • The parameter training method of the decentralized network is to let each node in the decentralized network 10 optimize its sub-objective function based on local data, and then exchange the iterated parameters with other nodes in the decentralized network.
  • a certain number of iterations can make the solutions of all nodes in the decentralized network 10 converge to an approximate solution of the objective function, such as an unbiased optimal solution, and then obtain model parameters of the model.
  • the decentralized network can realize the training of its model. Specifically, the decentralized network can use the following training methods to train its model, and then obtain model parameters.
  • FIG. 2 is a schematic flowchart of an embodiment of a model training method of the present application.
  • the method is applied to the decentralized network as described above, and each node of the decentralized network is trained to obtain the model parameters of the model.
  • the method includes the following steps:
  • S210 The current node adopts a preset decentralized training strategy in the group to obtain the group parameters for the model.
  • the current node is any node in the above decentralized network.
  • the model parameters of the model can be obtained by iterative training.
  • Specifically, the current node may use the model parameters obtained in its previous iteration and adopt the preset decentralized training strategy within its group to perform this iteration of training, obtaining the in-group parameters of the model for this iteration; then, in step S220, the in-group parameters of this iteration are used to obtain the model parameters of this iteration. The model parameters are thus updated iteratively with this training method; after a certain number of iterations the model parameters converge, and the converged model parameters can be taken as the final trained model parameters (the overall loop is sketched below).
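  • As a minimal sketch of this iterate-until-convergence structure (an illustration only, not the patent's reference implementation), the following Python outline assumes two hypothetical callables, in_group_update for step S210 and combine_out_group for step S220, and stops once successive model parameters stop changing:

        import numpy as np

        def train_node(k, w0, in_group_update, combine_out_group,
                       max_iters=1000, tol=1e-6):
            """Iterate steps S210/S220 for node k until the model parameters converge.

            in_group_update(k, w_prev, t) -> in-group parameter of iteration t (step S210)
            combine_out_group(k, phi, t)  -> model parameter of iteration t    (step S220)
            Both callables are placeholders for the strategies described below.
            """
            w_prev = np.asarray(w0, dtype=float)
            for t in range(1, max_iters + 1):
                phi = in_group_update(k, w_prev, t)   # S210: train within the group
                w = combine_out_group(k, phi, t)      # S220: weight with out-of-group neighbours
                if np.linalg.norm(w - w_prev) < tol:  # converged: take w as the trained parameters
                    return w, t
                w_prev = w
            return w_prev, max_iters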
  • the preset decentralized training strategy includes but is not limited to the following strategies: gossip-based training strategy, incremental training strategy, consensus training strategy or diffusion training strategy.
  • The diffusion training strategy may specifically be a multitask diffusion strategy with optimized inter-cluster cooperation.
  • the above training strategy can be used to iterate the model parameters to obtain an unbiased optimal solution. For example, when the probability of any node being selected in the random strategy of the gossip training method reaches a uniform random distribution, the solutions of all nodes converge to the unbiased optimal solution.
  • the other three strategies can also converge to the unbiased optimal solution.
  • The gossip-based training strategy means that each node in the network periodically selects, through a certain random strategy, one other node from all nodes, exchanges parameters with only that node, and iterates. The update process of the model parameters w_{k,t} of node k at the t-th iteration can, for example, be as follows: w_{k,t} = (w_{k,t-1} + w_{l,t-1}) / 2, where w_{k,t-1} is the model parameter of node k at the (t-1)-th iteration, l is the sequence number of the randomly selected neighbor node, and w_{l,t-1} is the model parameter of node l at the (t-1)-th iteration.
  • In this embodiment, the gossip-based training strategy can be understood as each node in the group where the current node is located periodically selecting, through a certain random strategy, another node from its own group, exchanging parameters with only that node, and iterating.
  • Gossip is a decentralized, fault-tolerant protocol that guarantees eventual consistency.
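  • The following short Python sketch illustrates one such gossip exchange within a group; the pairwise-averaging rule used here is a common convention and an assumption, since the text above only requires exchanging parameters with a single randomly selected peer:

        import numpy as np

        def gossip_step(w, k, group, rng=np.random.default_rng()):
            """One gossip iteration for node k: pick one random peer l from its group and
            average the two parameter vectors (standard gossip averaging; the exchange rule
            beyond "swap with one random in-group node" is assumed)."""
            peers = [l for l in group if l != k]
            l = int(rng.choice(peers))              # random strategy over in-group peers
            w[k] = w[l] = 0.5 * (w[k] + w[l])       # exchange and average w_{k,t-1} and w_{l,t-1}
            return w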
  • The incremental training strategy is to iterate the model parameters using the following formula.
  • The update process of the model parameters w_{k,t} of node k at the t-th iteration can be as follows: w_{k,t} = w_{k,t-1} - (u/N) ∇J_k(w_{k,t-1}), where w_{k,t-1} is the model parameter of node k at the (t-1)-th iteration; u is the iteration factor, for example a value in (0, 1); N is the number of nodes in the network; ∇ represents the gradient; J_k(w) is the objective function of node k with the model parameter w as its variable; and ∇J_k(w_{k,t-1}) is the gradient value of the objective function after substituting the specific model parameters.
  • the above formula can be appropriately transformed to obtain a specific algorithm for performing an incremental training strategy on the nodes in the group.
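  • A minimal sketch of this incremental update, under the formula reconstructed above and assuming a user-supplied gradient callable grad_Jk (a hypothetical name):

        import numpy as np

        def incremental_step(w_k, grad_Jk, u=0.1, N=9):
            """Incremental update for node k: step along the gradient of the local
            sub-objective J_k, scaled by the iteration factor u divided by the number
            of nodes N, starting from the previous-iteration parameters w_k."""
            return np.asarray(w_k) - (u / N) * grad_Jk(w_k)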
  • The consensus training strategy is to iterate the model parameters using the following formula.
  • The update process of the model parameters w_{k,t} of node k at the t-th iteration can be as follows: w_{k,t} = Σ_{l∈N_k} c_{lk} w_{l,t-1} - u_k ∇J_k(w_{k,t-1}), where w_{k,t-1} is the model parameter of node k at the (t-1)-th iteration; N_k represents the sequence numbers of all the neighbor nodes of node k; w_{l,t-1} is the model parameter of neighbor node l at the (t-1)-th iteration; c_{lk} is the weighting factor of neighbor node l for node k; u_k is the weighting factor of the combined gradient; ∇ represents the gradient; J_k(w) is the objective function of node k with the model parameter w as its variable; and ∇J_k(w_{k,t-1}) is the gradient value of the objective function after substituting the specific model parameters.
  • The above formula can be appropriately transformed to obtain an algorithm for performing the consensus training strategy on the nodes in the group, as described below.
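  • A minimal sketch of this consensus update, under the formula reconstructed above; w_prev, C (the c_lk weights) and grad_Jk are hypothetical containers and callables supplied by the caller:

        def consensus_step(k, w_prev, neighbors, C, u_k, grad_Jk):
            """Consensus update for node k: combine the previous-iteration parameters of
            the neighbour nodes l in N_k with weights c_lk, then subtract the locally
            weighted gradient of J_k evaluated at node k's previous parameters."""
            combined = sum(C[(l, k)] * w_prev[l] for l in neighbors)
            return combined - u_k * grad_Jk(w_prev[k])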
  • The diffusion training strategy is to iterate the model parameters, for example in an adapt-then-combine form: ψ_{k,t} = w_{k,t-1} - u_k ∇J_k(w_{k,t-1}) and w_{k,t} = Σ_{l∈N_k} c_{lk} ψ_{l,t}, where w_{k,t-1} is the model parameter of node k at the (t-1)-th iteration; N_k represents the sequence numbers of all the neighbor nodes of node k; c_{lk} is the weighting factor of neighbor node l for node k; u_k is the weighting factor of the combined gradient; ∇ represents the gradient; J_k(w) is the objective function of node k with the model parameter w as its variable; and ∇J_k(w_{k,t-1}) is the gradient value of the objective function after substituting the specific model parameters.
  • The above formulas can be appropriately transformed to obtain an algorithm for performing the diffusion training strategy on the nodes in the group, as described in detail below.
  • In an embodiment, step S210 adopts the diffusion training strategy to implement the iterative update of the in-group parameters, specifically including the following sub-steps:
  • the current node obtains the initial parameters of the current node in this iteration by using the model parameters obtained by itself in the previous iteration and the reference parameters in this iteration.
  • Specifically, the current node may use the following formula (1) to obtain the initial parameter ψ_{k,t} of the current node in this iteration: ψ_{k,t} = w_{k,t-1} + 2 u_k r_{k,t} (d_{k,t} - r_{k,t} w_{k,t-1}) (1)
  • this time is the t-th iteration
  • the previous time is the t-1th iteration
  • the k is the sequence number of the current node
  • the w k,t-1 is the model parameters obtained by the current node in the previous iteration
  • the u k is a set of weighting factors whose sum is one; the v k,t is a set of random parameters with zero mean, that is, the v k,t is a random number between -1 and 1, And the average value of v k,t distribution is 0.
  • the current node obtains the parameters of the current node in the group of the model in the current iteration according to the initial parameters of the current node in the iteration and the initial parameters of the other nodes in the group in the iteration.
  • Specifically, the current node may use the following formula (2) to obtain the in-group parameter φ_{k,t} of the current node in this iteration: φ_{k,t} = Σ_{l∈G_k} g_l ψ_{l,t} (2)
  • the k is the sequence number of the current node
  • the G k represents the sequence number of the node in the group
  • the g l is the weight of the node l in the group relative to the current node
  • the ψ_{l,t} is the initial parameter of the in-group node l in this iteration.
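  • These sub-steps can be sketched as follows in Python, under the reconstructed formulas (1) and (2); the containers w_prev, G (the g_l weights), and u, r, d for the reference parameters are assumed interfaces for the quantities named above, not part of the patent text:

        import numpy as np

        def diffusion_in_group(k, w_prev, group, G, u, r, d):
            """In-group diffusion update of node k at iteration t.
            w_prev[l]        : model parameters of node l from iteration t-1
            G[(l, k)]        : in-group weight g_l of node l relative to node k
            u[l], r[l], d[l] : step weight and reference parameters of this iteration."""
            # Formula (1): every in-group node adapts its own previous model parameters.
            psi = {l: w_prev[l] + 2.0 * u[l] * r[l] * (d[l] - np.dot(r[l], w_prev[l]))
                   for l in group}
            # Formula (2): node k combines the in-group initial parameters with weights g_l.
            return sum(G[(l, k)] * psi[l] for l in group)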
  • In another embodiment, step S210 adopts the consensus training strategy to implement the iterative update of the in-group parameters, specifically including the following sub-steps:
  • the current node obtains the initial parameters of the current node in the local iteration by using the model parameters obtained by itself in the previous iteration and the weights of other nodes in the group relative to the current node.
  • Specifically, the current node may use the following formula (3) to obtain the initial parameter ψ_{k,t-1} of the current node in this iteration: ψ_{k,t-1} = Σ_{l∈G_k} g_l w_{l,t-1} (3)
  • this time is the t-th iteration
  • the previous time is the t-1th iteration
  • the k is the sequence number of the current node
  • the w_{k,t-1} is the model parameters obtained by the current node in the previous iteration
  • the G k represents the serial number of the node in the group
  • the g l is the weight of the node l in the group relative to the current node.
  • The current node obtains the in-group parameters of the current node for the model in this iteration according to the initial parameters of the current node in this iteration and the reference parameters of this iteration.
  • Specifically, the current node may use the following formula (4) to obtain the in-group parameter φ_{k,t} of the current node in this iteration:
  • φ_{k,t} = ψ_{k,t-1} + 2 u_k r_{k,t} (d_{k,t} - r_{k,t} w_{k,t-1}) (4)
  • this time is the t-th iteration
  • the previous time is the t-1th iteration
  • the k is the sequence number of the current node
  • the w k,t-1 is the model parameters obtained by the current node in the previous iteration
  • the u k is a set of weighting factors whose sum is one; the v k,t is a set of random parameters with zero mean, that is, the v k,t is a random number between -1 and 1, And the average value of v k,t distribution is 0.
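  • Analogously, the consensus sub-steps can be sketched in Python under the reconstructed formulas (3) and (4); as above, the containers w_prev and G and the reference parameters r_k, d_k are assumed interfaces, not part of the patent text:

        import numpy as np

        def consensus_in_group(k, w_prev, group, G, u_k, r_k, d_k):
            """In-group consensus update of node k at iteration t: first combine the
            previous-iteration parameters of the in-group nodes with weights g_l
            (formula (3)), then apply the data-driven correction of node k (formula (4))."""
            psi = sum(G[(l, k)] * w_prev[l] for l in group)                  # formula (3)
            return psi + 2.0 * u_k * r_k * (d_k - np.dot(r_k, w_prev[k]))    # formula (4)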
  • S220: The current node obtains the model parameters for the model by using the in-group parameters and the weights, relative to the current node, of the out-of-group neighbor nodes and of the current node.
  • Specifically, in step S220, the current node uses the in-group parameters of the current node for this iteration and the weights of the out-of-group neighbor nodes of the current node relative to the current node to obtain the model parameters for the model in this iteration.
  • That is, the in-group parameters of this iteration are used to obtain the model parameters of this iteration.
  • The current node pre-stores the weight, relative to the current node, of each of its out-of-group neighbor nodes in the decentralized network, where an out-of-group neighbor node of the current node is a node in a group different from that of the current node and adjacent to the current node; there may be one or more such nodes.
  • After the current node obtains the in-group parameters of the model for this iteration, it may sum the products between the in-group parameters of this iteration and the pre-stored weights of the respective out-of-group neighbor nodes, and take the sum as the model parameters of the current node for the model.
  • Specifically, the current node may use the following formula (5) to obtain the model parameters w_{k,t} of the current node for the model in this iteration: w_{k,t} = Σ_{l∈N_k∪{k}} c_l φ_{l,t} (5)
  • where the k is the sequence number of the current node
  • the N_k represents the sequence numbers of the out-of-group neighbor nodes of the current node
  • the c_l is the weight of node l (an out-of-group neighbor node, or the current node itself) relative to the current node
  • the φ_{k,t} is the in-group parameter of the current node in this iteration, and φ_{l,t} is the in-group parameter of out-of-group neighbor node l in this iteration.
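  • A minimal sketch of this between-group combination under the reconstructed formula (5); whether the current node's own in-group parameter enters the sum, and how the neighbours' in-group parameters phi[l] are received, are assumptions of this illustration:

        def combine_out_group(k, phi, out_neighbors, c):
            """Step S220 for node k: weighted sum of the in-group parameters of this
            iteration, using the pre-stored weights c[l] of the out-of-group neighbour
            nodes (and of the current node itself) relative to the current node.
            phi[l] is the in-group parameter of node l at this iteration."""
            return sum(c[l] * phi[l] for l in list(out_neighbors) + [k])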
  • In the above scheme, a preset decentralized training strategy is adopted within the group to obtain the in-group parameters of the model, and then the weights of the out-of-group neighbor nodes are used to weight the in-group parameters, so that ordinary nodes in the decentralized network can obtain the model parameters of the model without a central node.
  • Moreover, the in-group parameters of the model are obtained within the group first, and the model parameters are then obtained by weighting between the groups, which improves the convergence speed of the model parameters.
  • Moreover, this embodiment adopts the above-mentioned preset decentralized training strategy within the group to update the parameters and then merges between the groups. Further, in order to achieve faster convergence, the aforementioned diffusion training strategy may be adopted.
  • The above-mentioned in-group parameters and/or model parameters may be subjected to noise-adding processing.
  • Specifically, after step S210, preset noise may be used to add noise to the in-group parameters of the current node for this iteration, and the noise-added in-group parameters are updated as the in-group parameters of the current node for this iteration; further, after sub-step S311 or S313 in S210, preset noise may be used to add noise to the initial parameters of the current node in this iteration, and the noise-added initial parameters are updated as the initial parameters of the current node in this iteration.
  • After step S220, preset noise may be used to add noise to the model parameters of the model for this iteration, and the noise-added model parameters are updated as the model parameters of the model for this iteration.
  • The aforementioned preset noise is differential privacy noise, for example, Laplace random noise.
  • The Laplace random noise may be L(F, ε), where ε is a differential privacy parameter satisfying ε-differential privacy, and F is the differential privacy sensitivity of the training objective function J_k of the preset model.
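  • For illustration, Laplace noise with these parameters can be sampled as below; the scale F/ε follows the standard Laplace-mechanism calibration, which is an assumption since the text only names the two parameters of L(F, ε):

        import numpy as np

        def laplace_noise(F, eps, size, rng=np.random.default_rng()):
            """Sample differential-privacy noise L(F, eps): zero-mean Laplace noise whose
            scale is the sensitivity F of the training objective divided by the privacy
            parameter eps."""
            return rng.laplace(loc=0.0, scale=F / eps, size=size)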
  • the preset model may be a neural network model. It can be understood that, in other embodiments, the above-mentioned noise adding process may be performed only on a part of the above-mentioned group parameters, initial parameters, and model parameters.
  • FIG. 4 is a flowchart of still another embodiment of the model training method of the present application.
  • the method is applied to the decentralized network as described above, and each node of the decentralized network is trained to obtain the model parameters of the model.
  • This method uses the diffusion training strategy to train within the group to obtain the in-group parameters, then weights between the groups to obtain the model parameters, and performs differential privacy noise-adding on the parameters of the diffusion training process and on the final model parameters to prevent indirect data leakage.
  • the method includes the following steps:
  • S410: The current node obtains the initial parameters of the current node in this iteration by using the model parameters obtained by itself in the previous iteration and the reference parameters of this iteration.
  • Specifically, the current node may use formula (1) as described above, with the model parameters w_{k,t-1} obtained in its previous iteration and the reference parameters u_k, r_{k,t}, d_{k,t}, to obtain the initial parameter ψ_{k,t} of the current node in this iteration.
  • S420 The current node uses the preset noise to add noise to the initial parameter of the current node in this iteration, and updates the initial parameter after noise addition to the initial parameter of the current node in this iteration.
  • That is, the preset noise is added when the current node calculates the in-group gradient diffusion update.
  • the preset noise is Laplace random noise.
  • Specifically, the current node may use the following formula (6) to add, to the initial parameter ψ_{k,t} of the current node in this iteration, the preset noise scaled by the number of in-group neighbor nodes of the current node, and update the noise-added initial parameter as the initial parameter ψ′_{k,t} of the current node in this iteration: ψ′_{k,t} = ψ_{k,t} + L(F, ε)/n_k (6)
  • where L(F, ε) is the Laplace random noise
  • ε is a differential privacy parameter satisfying ε-differential privacy
  • F is the differential privacy sensitivity of the neural network model training objective function J_k; and n_k is the number of in-group neighbor nodes of the current node.
  • S430: The current node obtains the in-group parameters of the current node for the model in this iteration according to the initial parameters of the current node in this iteration and the initial parameters of the other in-group nodes in this iteration.
  • Specifically, the current node may use the following formula (7) to obtain the in-group parameter φ_{k,t} of the current node in this iteration: φ_{k,t} = Σ_{l∈G_k} g_l ψ′_{l,t} (7)
  • the k is the sequence number of the current node
  • the G k represents the sequence number of the node in the group
  • the g l is the weight of the node l in the group relative to the current node
  • the ψ′_{l,t} is the noise-added initial parameter of the in-group node l in this iteration.
  • S440: The current node uses the in-group parameters of the current node for the model in this iteration and the weights of the out-of-group neighbor nodes of the current node relative to the current node to obtain the model parameters of the model in this iteration.
  • Specifically, the current node may use formula (5) as described above, with the in-group parameter φ_{k,t} of the current node in this iteration and the weights c_l of the out-of-group neighbor nodes relative to the current node, to obtain the model parameters w_{k,t} of the current node in this iteration.
  • S450: The current node uses the preset noise to add noise to the model parameters of the model in this iteration, and updates the noise-added model parameters as the model parameters of the model in this iteration.
  • the preset noise is Laplace random noise.
  • Specifically, the current node may use the following formula (8) to add, to the model parameters w_{k,t} of the current node in this iteration, the preset noise scaled by the number of out-of-group neighbor nodes of the current node, and update the noise-added model parameters as the model parameters w′_{k,t} of the current node in this iteration: w′_{k,t} = w_{k,t} + L(F, ε)/m_k (8)
  • where L(F, ε) is the Laplace random noise
  • ε is a differential privacy parameter satisfying ε-differential privacy
  • F is the differential privacy sensitivity of the neural network model training objective function J_k; and m_k is the number of out-of-group neighbor nodes of the current node.
  • In the above scheme, updating the nodes within the group first with the diffusion strategy and then merging with the nodes outside the group accelerates the convergence of the distributed optimization, while the differential privacy noise prevents indirect data leakage.
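  • The whole S410-S450 iteration can be sketched as follows, combining the reconstructed formulas (1), (6), (7), (5) and (8); the 1/(neighbour count) noise scaling and the recv_phi container (in-group parameters received from out-of-group neighbours over the communication circuit) are assumptions of this illustration, not the patent's exact expressions:

        import numpy as np

        def noised_diffusion_iteration(k, w_prev, group, out_neighbors, recv_phi,
                                       G, c, u, r, d, F, eps,
                                       rng=np.random.default_rng()):
            """One S410-S450 iteration for node k: in-group diffusion with Laplace
            differential-privacy noise added to the initial parameters and to the
            final model parameters."""
            def noise():
                return rng.laplace(0.0, F / eps, size=np.shape(w_prev[k]))

            n_in = max(len(group) - 1, 1)          # in-group neighbours of node k
            n_out = max(len(out_neighbors), 1)     # out-of-group neighbours of node k
            # S410 + S420: adapt (formula (1)) and add noise (formula (6)) for the in-group nodes.
            psi = {l: w_prev[l] + 2.0 * u[l] * r[l] * (d[l] - np.dot(r[l], w_prev[l]))
                      + noise() / n_in
                   for l in group}
            # S430: combine the noise-added in-group initial parameters (formula (7)).
            phi_k = sum(G[(l, k)] * psi[l] for l in group)
            # S440: weight phi_k together with the in-group parameters received from the
            # out-of-group neighbour nodes (formula (5)); recv_phi[l] would arrive over
            # the communication circuit 53.
            w = c[k] * phi_k + sum(c[l] * recv_phi[l] for l in out_neighbors)
            # S450: add noise to the model parameters of this iteration (formula (8)).
            return w + noise() / n_out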
  • FIG. 5 is a schematic structural diagram of an embodiment of a node of a decentralized network of the present application.
  • the node 50 may be a node in the decentralized network as described in FIG. 1 and includes a memory 51, a processor 52, and a communication circuit 53.
  • the communication circuit 53 and the memory 51 are respectively coupled to the processor 52.
  • each component of the node 50 may be coupled together through a bus, or the processor of the node 50 may be connected to other components one by one.
  • the node 50 may be any communication device such as a mobile phone, a notebook, a desktop computer, and a server.
  • the communication circuit 53 is used to communicate with other nodes in the decentralized network.
  • the communication circuit 53 may communicate with nodes in the group in the decentralized network to obtain initial parameters of previous iterations of other nodes in the group.
  • the memory 51 is used to store program instructions executed by the processor 52 and data during processing of the processor 52, wherein the memory 51 includes a non-volatile storage part for storing the above-mentioned program instructions. Furthermore, the memory 51 may also store account related data.
  • the processor 52 controls the operation of the node 50, and the processor 52 may also be called a CPU (Central Processing Unit, central processing unit).
  • the processor 52 may be an integrated circuit chip with signal processing capabilities.
  • The processor 52 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • The processor 52 runs the program instructions stored in the memory 51 to: adopt the preset decentralized training strategy within the group to obtain the in-group parameters for the model; and use the in-group parameters and the weights of the out-of-group neighbor nodes of the current node relative to the current node to obtain the model parameters for the model.
  • The processor 52 executes the step of adopting a preset decentralized training strategy within the group to obtain the in-group parameters for the model by: using the model parameters obtained in its previous iteration to perform this iteration of training within the group with the preset decentralized training strategy, to obtain the in-group parameters of the model for this iteration. The processor 52 executes the step of using the in-group parameters and the weights of the out-of-group neighbor nodes of the current node relative to the current node to obtain the model parameters for the model by: using the in-group parameters of the current node for the model in this iteration and the weights of the out-of-group neighbor nodes of the current node relative to the current node to obtain the model parameters of the model in this iteration.
  • the preset decentralized training strategy includes a gossip-based training strategy, an incremental training strategy, a consensus training strategy, or a diffusion training strategy.
  • The processor 52 executes the step of using the model parameters obtained in the previous iteration and adopting the diffusion training strategy to perform this iteration of training within the group to obtain the in-group parameters of the model for this iteration by: using the model parameters obtained in its previous iteration and the reference parameters of this iteration to obtain the initial parameters of the current node in this iteration; and obtaining the in-group parameters of the current node for the model in this iteration according to the initial parameters of the current node in this iteration and the initial parameters of the other in-group nodes in this iteration.
  • The processor 52 executes the step of using the model parameters obtained in its previous iteration and the reference parameters of this iteration to obtain the initial parameters of the current node in this iteration, which may specifically include: using formula (1) described above to obtain the initial parameter ψ_{k,t} of the current node in this iteration.
  • The processor 52 executes the step of obtaining the in-group parameters of the current node for the model in this iteration according to the initial parameters of the current node in this iteration and the initial parameters of the other in-group nodes in this iteration, which may specifically include: using formula (2) described above to obtain the in-group parameter φ_{k,t} of the current node in this iteration.
  • The processor 52 executes the step of using the model parameters obtained in the previous iteration and adopting the consensus training strategy to perform this iteration of training within the group to obtain the in-group parameters of the model for this iteration by: using the model parameters obtained in its previous iteration and the weights of the other in-group nodes relative to the current node to obtain the initial parameters of the current node in this iteration; and obtaining the in-group parameters of the current node for the model in this iteration according to the initial parameters of the current node in this iteration and the reference parameters of this iteration.
  • The processor 52 executes the step of using the model parameters obtained in its previous iteration and the weights of the other in-group nodes relative to the current node to obtain the initial parameters of the current node in this iteration, which may specifically include: using formula (3) described above to obtain the initial parameter ψ_{k,t-1} of the current node in this iteration.
  • The processor 52 executes the step of obtaining the in-group parameters of the current node for the model in this iteration according to the initial parameters of the current node in this iteration and the reference parameters of this iteration, which may specifically include: using formula (4) described above to obtain the in-group parameter φ_{k,t} of the current node in this iteration.
  • the processor 52 is further configured to: use preset noise to add noise to the initial parameters of the current node in this iteration, and update the initial parameters after noise addition to the current node at this time The initial parameters of the iteration.
  • The processor 52 executes the step of using the preset noise to add noise to the initial parameters of the current node in this iteration, which may specifically include: adding, to the initial parameters of the current node in this iteration, the preset noise scaled by the number of in-group neighbor nodes of the current node.
  • The processor 52 is further configured to: use the preset noise to add noise to the model parameters of the model in this iteration, and update the noise-added model parameters as the model parameters of the model in this iteration.
  • The processor 52 executes the step of using the preset noise to add noise to the model parameters of the model in this iteration, which may specifically include: adding, to the model parameters of the current node for the model in this iteration, the preset noise scaled by the number of out-of-group neighbor nodes of the current node.
  • the preset noise is Laplace random noise.
  • The Laplace random noise may be L(F, ε), where ε is a differential privacy parameter satisfying ε-differential privacy, and F is the differential privacy sensitivity of the training objective function J_k of the preset model.
  • The processor 52 executes the step of using the in-group parameters of the current node for the model in this iteration and the weights of the out-of-group neighbor nodes of the current node relative to the current node to obtain the model parameters of the model in this iteration, which may specifically include: using formula (5) described above to obtain the model parameters w_{k,t} of the current node for the model in this iteration.
  • the above processor 52 is also used to execute the steps of any of the above method embodiments.
  • the present application also provides a schematic structural diagram of an embodiment of a storage device.
  • the storage device 60 stores program instructions 61 executable by the processor, and the program instructions 61 are used to execute the method in the foregoing embodiment.
  • The storage device 60 may specifically be a medium that can store program instructions, such as a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, or it may be a server that stores the program instructions; the server may send the stored program instructions to other devices to run, or it may run the stored program instructions by itself.
  • the storage device 60 may also be a memory as shown in FIG. 5.
  • In the above solution, the preset decentralized training strategy is adopted within the group to obtain the in-group parameters of the model, and then the weights of the out-of-group neighbor nodes are used to weight the in-group parameters, so that ordinary nodes in the decentralized network can obtain the model parameters of the model without a central node.
  • Moreover, the in-group parameters of the model are obtained within the group first, and the model parameters are then obtained by weighting between the groups, which improves the convergence speed of the model parameters.
  • the disclosed method and apparatus may be implemented in other ways.
  • the device implementation described above is only schematic.
  • the division of modules or units is only a division of logical functions.
  • In actual implementation, there may be other divisions; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or software functional unit.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application.
  • The aforementioned storage media include various media that can store program instructions, such as a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Error Detection And Correction (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to a model training method and related nodes, network and storage device. The method is applied to a decentralized network comprising at least one group of nodes, each group of nodes comprising at least one node, and at least some of the nodes being used for training to obtain model parameters of the model. The method comprises: a current node adopting a preset decentralized training strategy within its group to obtain in-group parameters for the model; and obtaining model parameters for the model by using the in-group parameters and the weights of the out-of-group neighbor nodes of the current node relative to the current node. In this way, training of a model based on a decentralized network can be implemented.
PCT/CN2018/118291 2018-11-29 2018-11-29 Procédé d'apprentissage de modèle et nœuds associés, réseau et dispositif de stockage WO2020107351A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880002436.0A CN109690530A (zh) 2018-11-29 2018-11-29 模型训练方法及其节点、网络及存储装置
PCT/CN2018/118291 WO2020107351A1 (fr) 2018-11-29 2018-11-29 Procédé d'apprentissage de modèle et nœuds associés, réseau et dispositif de stockage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/118291 WO2020107351A1 (fr) 2018-11-29 2018-11-29 Procédé d'apprentissage de modèle et nœuds associés, réseau et dispositif de stockage

Publications (1)

Publication Number Publication Date
WO2020107351A1 true WO2020107351A1 (fr) 2020-06-04

Family

ID=66190447

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/118291 WO2020107351A1 (fr) 2018-11-29 2018-11-29 Procédé d'apprentissage de modèle et nœuds associés, réseau et dispositif de stockage

Country Status (2)

Country Link
CN (1) CN109690530A (fr)
WO (1) WO2020107351A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115865607A (zh) * 2023-03-01 2023-03-28 山东海量信息技术研究院 一种分布式训练的计算节点管理方法及相关装置
CN116663639A (zh) * 2023-07-31 2023-08-29 浪潮电子信息产业股份有限公司 一种梯度数据同步方法、系统、装置及介质

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704178B (zh) * 2019-09-04 2023-05-23 北京三快在线科技有限公司 机器学习模型训练方法、平台、电子设备及可读存储介质
CN111475853B (zh) * 2020-06-24 2020-12-11 支付宝(杭州)信息技术有限公司 一种基于分布式数据的模型训练方法及系统
CN113065635A (zh) * 2021-02-27 2021-07-02 华为技术有限公司 一种模型的训练方法、图像增强方法及设备
CN116150612A (zh) * 2021-11-15 2023-05-23 华为技术有限公司 模型训练的方法和通信装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180018590A1 (en) * 2016-07-18 2018-01-18 NantOmics, Inc. Distributed Machine Learning Systems, Apparatus, and Methods
CN108491266A (zh) * 2018-03-09 2018-09-04 联想(北京)有限公司 基于区块链的数据处理方法、装置及电子设备
CN108520303A (zh) * 2018-03-02 2018-09-11 阿里巴巴集团控股有限公司 一种推荐系统构建方法及装置
CN108683738A (zh) * 2018-05-16 2018-10-19 腾讯科技(深圳)有限公司 图数据处理方法和图数据的计算任务发布方法
CN108898219A (zh) * 2018-06-07 2018-11-27 广东工业大学 一种基于区块链的神经网络训练方法、装置及介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180018590A1 (en) * 2016-07-18 2018-01-18 NantOmics, Inc. Distributed Machine Learning Systems, Apparatus, and Methods
CN108520303A (zh) * 2018-03-02 2018-09-11 阿里巴巴集团控股有限公司 一种推荐系统构建方法及装置
CN108491266A (zh) * 2018-03-09 2018-09-04 联想(北京)有限公司 基于区块链的数据处理方法、装置及电子设备
CN108683738A (zh) * 2018-05-16 2018-10-19 腾讯科技(深圳)有限公司 图数据处理方法和图数据的计算任务发布方法
CN108898219A (zh) * 2018-06-07 2018-11-27 广东工业大学 一种基于区块链的神经网络训练方法、装置及介质

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115865607A (zh) * 2023-03-01 2023-03-28 山东海量信息技术研究院 一种分布式训练的计算节点管理方法及相关装置
CN116663639A (zh) * 2023-07-31 2023-08-29 浪潮电子信息产业股份有限公司 一种梯度数据同步方法、系统、装置及介质
CN116663639B (zh) * 2023-07-31 2023-11-03 浪潮电子信息产业股份有限公司 一种梯度数据同步方法、系统、装置及介质

Also Published As

Publication number Publication date
CN109690530A (zh) 2019-04-26

Similar Documents

Publication Publication Date Title
WO2020107351A1 (fr) Procédé d'apprentissage de modèle et nœuds associés, réseau et dispositif de stockage
US11922308B2 (en) Generating neighborhood convolutions within a large network
Scardapane et al. Distributed semi-supervised support vector machines
WO2018099084A1 (fr) Procédé, dispositif, puce et système d'apprentissage de modèle de réseau neuronal
van Wyk et al. Evolutionary neural architecture search for image restoration
CN111695415A (zh) 图像识别模型的构建方法、识别方法及相关设备
WO2022166115A1 (fr) Système de recommandation à seuils adaptatifs pour sélection de voisinage
Zhou et al. A randomized block-coordinate adam online learning optimization algorithm
JP2002230514A (ja) 進化的最適化方法
Salehisadaghiani et al. Nash equilibrium seeking by a gossip-based algorithm
WO2017159402A1 (fr) Système, procédé et programme de co-regroupement
CN114613437B (zh) 一种基于异构图的miRNA与疾病关联预测方法及系统
CN115358487A (zh) 面向电力数据共享的联邦学习聚合优化系统及方法
Prellberg et al. Lamarckian evolution of convolutional neural networks
Kholod et al. Efficient Distribution and Processing of Data for Parallelizing Data Mining in Mobile Clouds.
CN114282678A (zh) 一种机器学习模型的训练的方法以及相关设备
Ying et al. EAGAN: Efficient two-stage evolutionary architecture search for GANs
US20230132545A1 (en) Methods and Systems for Approximating Embeddings of Out-Of-Knowledge-Graph Entities for Link Prediction in Knowledge Graph
CN116629376A (zh) 一种基于无数据蒸馏的联邦学习聚合方法和系统
Ma et al. Finite‐time average consensus based approach for distributed convex optimization
CN113228059A (zh) 面向跨网络的表示学习算法
CN108614932B (zh) 基于边图的线性流重叠社区发现方法、系统及存储介质
Tabealhojeh et al. RMAML: Riemannian meta-learning with orthogonality constraints
Ren et al. Personalized federated learning: A Clustered Distributed Co-Meta-Learning approach
CN110867224B (zh) 用于大规模脑病历分割的多粒度Spark超信任模糊方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18941887

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18941887

Country of ref document: EP

Kind code of ref document: A1