WO2020107351A1 - Model training method and nodes thereof, network and storage device - Google Patents

Model training method and nodes thereof, network and storage device

Info

Publication number
WO2020107351A1
WO2020107351A1 (PCT/CN2018/118291)
Authority
WO
WIPO (PCT)
Prior art keywords
iteration
group
parameters
model
current node
Prior art date
Application number
PCT/CN2018/118291
Other languages
French (fr)
Chinese (zh)
Inventor
袁振南
朱鹏新
Original Assignee
袁振南
区链通网络有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 袁振南 and 区链通网络有限公司
Priority to CN201880002436.0A (CN109690530A)
Priority to PCT/CN2018/118291
Publication of WO2020107351A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • This application relates to the field of blockchain technology, in particular to a model training method and its node, network and storage device.
  • a decentralized network contains multiple nodes, and there is no central node in the network.
  • when performing such information processing, the nodes of the decentralized network can cooperate to process information using the model. That is, each node uses its corresponding model to process the input information to output the result.
  • the technical problem mainly solved by this application is to provide a model training method and its node, network and storage device, so as to realize model training based on a decentralized network.
  • the first aspect of the present application provides a model training method, which is applied to a decentralized network containing at least one group of nodes, where each group of nodes includes at least one node, and at least some of the nodes are used for training to obtain the model parameters of the model; the method includes: the current node adopts a preset decentralized training strategy within its group to obtain in-group parameters for the model; and the in-group parameters and the weights of the out-group neighbor nodes of the current node relative to the current node are used to obtain the model parameters for the model.
  • a second aspect of the present application provides a node of a decentralized network, including a processor, and a memory and a communication circuit coupled to the processor, wherein the communication circuit is used to communicate with other nodes of the decentralized network; the memory is used to store program instructions; and the processor is used to run the program instructions to perform the above method.
  • a third aspect of the present application provides a decentralized network.
  • the decentralized network includes at least one group of nodes, and each group of nodes includes at least one of the foregoing nodes.
  • a fourth aspect of the present application provides a storage device that stores program instructions, and when the program instructions run on a processor, execute the method described in the first aspect above.
  • the above scheme adopts a preset decentralized training strategy within the group to obtain in-group parameters for the model, and then uses the weights of out-group neighbor nodes to weight the in-group parameters, so that an ordinary node of the decentralized network can obtain the model parameters of the model without a central node.
  • FIG. 1 is a schematic structural diagram of an embodiment of a decentralized network of this application
  • FIG. 2 is a schematic flowchart of an embodiment of a model training method of this application
  • FIG. 3A is a schematic flowchart of step S220 in another embodiment of the model training method of the present application.
  • FIG. 3B is a schematic flowchart of step S220 in yet another embodiment of the model training method of the present application.
  • FIG. 5 is a schematic structural diagram of an embodiment of a node of a decentralized network of this application.
  • FIG. 6 is a schematic structural diagram of an embodiment of a storage device of the present application.
  • the decentralized network 10 includes multiple nodes 11, wherein the multiple nodes 11 are divided into at least one group of nodes, and each group of nodes includes at least one node 11.
  • as shown in FIG. 1, in the decentralized network cluster 10, the multiple nodes 11 are divided into three groups, and each group includes three nodes 11.
  • the node 11 may be a communication device such as a mobile phone, a tablet computer, a computer, or a server.
  • the nodes 11 can directly communicate with each other, and it is not necessary for all nodes 11 to communicate through the central node.
  • the nodes 11 within a group can all communicate with each other, and each node 11 can communicate with at least one node 11 of every other group, where the at least one node 11 of another group that communicates with the node 11 is called an out-group communication node of the node 11.
  • except for its out-group communication nodes, a node 11 cannot directly communicate with the other nodes 11 in other groups.
  • the decentralized network 10 may be used to create models and use the created models for data processing.
  • when performing such information processing, the nodes 11 of the decentralized network 10 can cooperate to process information using the model. That is, each node uses its corresponding model to process the input information to output the result.
  • each node may be responsible for different parts of the model.
  • for example, the model is a neural network model, and different network layers of the neural network model are assigned to different nodes, so that different nodes are responsible for different processing parts of the model, that is, the model is parallelized; in another application scenario, each node is responsible for all parts of the model, for example, different nodes hold multiple copies of the same model, each node is assigned a part of the data, and the computation results of all nodes are then combined in a certain way.
  • the decentralized network 10 may first perform model training to obtain model parameters of the model, and then use the model corresponding to the model parameters to implement information processing as shown above.
  • each node in the decentralized network 10 is used to train to obtain the model parameters of the model.
  • each group of nodes 11 of the decentralized network 10 is first trained to obtain in-group parameters of the model, and the weights of neighbor nodes of different groups and the in-group parameters are then used to obtain the model parameters of the model. Further, in order to obtain accurate model parameters, the model parameters may be updated through multiple iterations as described above.
  • the following example lists a training principle for the model parameters of the decentralized network of this application.
  • the above grouped decentralized network is used to implement a machine learning algorithm that optimizes the objective function, and then realizes the training of model parameters, where the objective function can be optimized based on the gradient descent method.
  • the model parameter training of the decentralized network is equivalent to minimizing an objective function J defined over the sub-objective functions of all nodes, where J k (x) is the sub-objective function of the k-th node 11 and N is the number of nodes in the decentralized network.
  • the parameter training method of the decentralized network lets all nodes in the decentralized network 10 each optimize their sub-objective function based on local data, and then exchange iterative parameters with other nodes in the decentralized network.
  • after a certain number of iterations, the solutions of all nodes in the decentralized network 10 converge to an approximate solution of the objective function, such as the unbiased optimal solution, from which the model parameters of the model are obtained.
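  • As a concrete illustration of this training principle, the sketch below simulates a small decentralized network in which every node holds an illustrative quadratic sub-objective J_k(x) = ||A_k x - b_k||^2 over local data, takes one local gradient step per iteration, and then averages its iterate with those of its neighbors; the ring topology, data, objective and step size are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, dim, steps, lr = 6, 3, 300, 0.01

# Illustrative local data: node k holds (A_k, b_k) for J_k(x) = ||A_k x - b_k||^2.
A = [rng.normal(size=(10, dim)) for _ in range(N)]
b = [rng.normal(size=10) for _ in range(N)]

# Simple ring topology standing in for the grouped network; neighbors[k] lists the
# nodes whose iterates node k combines (including itself).
neighbors = {k: [(k - 1) % N, k, (k + 1) % N] for k in range(N)}

x = [np.zeros(dim) for _ in range(N)]  # one local iterate per node
for _ in range(steps):
    # Each node optimizes its own sub-objective on local data (one gradient step).
    adapted = [x[k] - lr * 2 * A[k].T @ (A[k] @ x[k] - b[k]) for k in range(N)]
    # Then it exchanges iterates with its neighbors and averages them.
    x = [np.mean([adapted[l] for l in neighbors[k]], axis=0) for k in range(N)]

# After enough iterations the local solutions agree approximately.
print(max(np.linalg.norm(x[k] - x[0]) for k in range(N)))
```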
  • the decentralized network can realize the training of its model. Specifically, the decentralized network can use the following training methods to train its model, and then obtain model parameters.
  • FIG. 2 is a schematic flowchart of an embodiment of a model training method of the present application.
  • the method is applied to the decentralized network as described above, and each node of the decentralized network is trained to obtain the model parameters of the model.
  • the method includes the following steps:
  • S210 The current node adopts a preset decentralized training strategy in the group to obtain the group parameters for the model.
  • the current node is any node in the above decentralized network.
  • the model parameters of the model can be obtained by iterative training.
  • the current node can use the model parameters obtained in its previous iteration and adopt the preset decentralized training strategy within its group to perform this iteration of training, obtaining the in-group parameters of the model for this iteration; then, in step S220, the model parameters of this iteration are obtained from the in-group parameters of this iteration. The model parameters are thus updated iteratively with this training method; after a certain number of iterations the model parameters converge, and the converged model parameters can be taken as the final trained model parameters.
  • the preset decentralized training strategy includes but is not limited to the following strategies: gossip-based training strategy, incremental training strategy, consensus training strategy or diffusion training strategy.
  • the diffusion training strategy may specifically be a multitask diffusion strategy with optimized inter-cluster cooperation (A Multitask Diffusion Strategy with Optimized Inter-Cluster Cooperation).
  • the above training strategy can be used to iterate the model parameters to obtain an unbiased optimal solution. For example, when the probability of any node being selected in the random strategy of the gossip training method reaches a uniform random distribution, the solutions of all nodes converge to the unbiased optimal solution.
  • the other three strategies can also converge to the unbiased optimal solution.
  • the gossip-based training strategy means that each node in the network periodically selects, through a certain random strategy over all nodes, a single other node, exchanges parameters with only that node, and iterates.
  • in the update of the model parameters w k,t of node k at the t-th iteration, w k,t-1 is the model parameter of node k at the (t-1)-th iteration, l is the sequence number of the randomly selected neighbor node, and w l,t-1 is the model parameter of node l at the (t-1)-th iteration.
  • when this training strategy is applied to the in-group nodes in this application, the gossip-based training strategy can be understood as each node in the group where the current node is located periodically selecting, through a certain random strategy over that group, one other node in the same group, exchanging parameters with only that node, and iterating.
  • Gossip is a decentralized, fault-tolerant protocol that guarantees eventual consistency.
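  • A minimal sketch of such an in-group gossip exchange is given below; it assumes the common pairwise-averaging exchange w_k ← (w_k + w_l)/2 (the exact exchange formula of the publication is not reproduced here), and the group size and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
group = {k: rng.normal(size=4) for k in range(5)}   # model parameters of one group of nodes

def gossip_round(params, rng):
    """One gossip iteration: every node exchanges with a single random in-group peer."""
    updated = dict(params)
    for k in params:
        l = int(rng.choice([p for p in params if p != k]))  # random strategy over the group
        updated[k] = (params[k] + params[l]) / 2            # assumed pairwise averaging
    return updated

for _ in range(50):
    group = gossip_round(group, rng)

print(group[0])   # all nodes drift toward a common value
```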
  • similarly, the incremental training strategy iterates the model parameters according to a gradient-based update formula.
  • in the update of the model parameters w k,t of node k at the t-th iteration, w k,t-1 is the model parameter of node k at the (t-1)-th iteration; u is an iteration factor, for example a value between 0 and 1; N represents the number of nodes in the network; ∇ denotes the gradient; J k (w) is the objective function of node k with the model parameter w as its variable; and ∇J k (w k,t-1 ) is the gradient value of the objective function evaluated at the specific model parameters.
  • when this training strategy is applied to the in-group nodes in this application, the above formula can be appropriately transformed to obtain a specific algorithm for applying the incremental training strategy to the in-group nodes.
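  • The publication's own formula is not reproduced above, so the sketch below only illustrates the textbook incremental strategy that is consistent with the variables listed (u, N, and the gradient of J_k): a single running estimate visits the nodes in turn and each node applies a small step of its own gradient; the data and the exact step scaling are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N, dim, u = 5, 3, 0.1          # u: iteration factor in (0, 1)
A = [rng.normal(size=(8, dim)) for _ in range(N)]
b = [rng.normal(size=8) for _ in range(N)]

def grad_Jk(k, w):
    """Gradient of the illustrative sub-objective J_k(w) = ||A_k w - b_k||^2."""
    return 2 * A[k].T @ (A[k] @ w - b[k])

w = np.zeros(dim)
for t in range(100):
    # One incremental pass: the running estimate visits each node in turn and each
    # node applies a small step of its own gradient, scaled here by u / N.
    for k in range(N):
        w = w - (u / N) * grad_Jk(k, w)

print(w)
```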
  • the consensus training strategy iterates the model parameters according to an update formula in which, for the update of the model parameters w k,t of node k at the t-th iteration, w k,t-1 is the model parameter of node k at the (t-1)-th iteration; N k denotes the sequence numbers of all neighbor nodes of node k; w l,t-1 is the model parameter of neighbor node l at the (t-1)-th iteration; c lk is the weighting factor of neighbor node l of node k; u k is the weighting factor of the combined gradient; ∇ denotes the gradient; J k (w) is the objective function of node k with the model parameter w as its variable; and ∇J k (w k,t-1 ) is the gradient value of the objective function evaluated at the specific model parameters.
  • when this training strategy is applied to the in-group nodes in this application, the above formula can be appropriately transformed to obtain an algorithm for applying the consensus strategy to the in-group nodes.
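  • The sketch below illustrates one standard consensus-type update consistent with the variables listed above, namely combining the neighbors' previous parameters with weights c_lk and then subtracting u_k times the local gradient; this concrete form, the quadratic sub-objectives, the fully connected topology and the uniform weights are assumptions made for illustration rather than the publication's own formula.

```python
import numpy as np

rng = np.random.default_rng(3)
N, dim, steps = 4, 2, 300
A = [rng.normal(size=(6, dim)) for _ in range(N)]
b = [rng.normal(size=6) for _ in range(N)]

def grad_Jk(k, w):
    """Gradient of the illustrative sub-objective J_k(w) = ||A_k w - b_k||^2."""
    return 2 * A[k].T @ (A[k] @ w - b[k])

# Fully connected example: C[l, k] is the combination weight c_lk of neighbor l for node k.
C = np.full((N, N), 1.0 / N)          # each column sums to one
u = np.full(N, 0.02)                  # u_k: weighting factor of the combined gradient

w = [np.zeros(dim) for _ in range(N)]
for t in range(steps):
    new_w = []
    for k in range(N):
        combined = sum(C[l, k] * w[l] for l in range(N))   # combine the neighbors' parameters
        new_w.append(combined - u[k] * grad_Jk(k, w[k]))   # then apply the local gradient step
    w = new_w

print(w[0])
```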
  • the diffusion (extended) training strategy likewise iterates the model parameters according to an update formula in which w k,t-1 is the model parameter of node k at the (t-1)-th iteration; N k denotes the sequence numbers of all neighbor nodes of node k; c lk is the weighting factor of neighbor node l of node k; u k is the weighting factor of the combined gradient; ∇ denotes the gradient; J k (w) is the objective function of node k with the model parameter w as its variable; and ∇J k (w k,t-1 ) is the gradient value of the objective function evaluated at the specific model parameters.
  • when this training strategy is applied to the in-group nodes in this application, the above formula can be appropriately transformed to obtain an algorithm for applying the diffusion training strategy to the in-group nodes, as described in detail below.
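  • The diffusion strategy is often written in an adapt-then-combine form, and the sketch below illustrates that reading with the variables listed above: each node first takes a local gradient step to an intermediate estimate and then combines its neighbors' intermediate estimates with the weights c_lk; the data, topology and weight values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
N, dim, steps = 4, 2, 300
A = [rng.normal(size=(6, dim)) for _ in range(N)]
b = [rng.normal(size=6) for _ in range(N)]

def grad_Jk(k, w):
    """Gradient of the illustrative sub-objective J_k(w) = ||A_k w - b_k||^2."""
    return 2 * A[k].T @ (A[k] @ w - b[k])

C = np.full((N, N), 1.0 / N)   # c_lk: combination weight of neighbor l for node k
u = np.full(N, 0.02)           # u_k: weighting factor of the combined gradient

w = [np.zeros(dim) for _ in range(N)]
for t in range(steps):
    # Adapt: every node takes a local gradient step toward an intermediate estimate psi.
    psi = [w[k] - u[k] * grad_Jk(k, w[k]) for k in range(N)]
    # Combine: every node merges its neighbors' intermediate estimates with weights c_lk.
    w = [sum(C[l, k] * psi[l] for l in range(N)) for k in range(N)]

print(w[0])
```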
  • in an embodiment, step S210 adopts the diffusion (extended) training strategy to implement the iterative update of the in-group parameters, and specifically includes the following sub-steps:
  • the current node obtains the initial parameters of the current node in this iteration by using the model parameters obtained by itself in the previous iteration and the reference parameters in this iteration.
  • the current node may use the following formula (1) to obtain the initial parameter ⁇ k,t of the current node in this iteration;
  • this time is the t-th iteration
  • the previous time is the t-1th iteration
  • the k is the sequence number of the current node
  • the w k,t-1 is the model parameters obtained by the current node in the previous iteration
  • the u k is a set of weighting factors whose sum is one; the v k,t is a set of random parameters with zero mean, that is, each v k,t is a random number between -1 and 1, and the average value of the v k,t distribution is 0.
  • the current node obtains the parameters of the current node in the group of the model in the current iteration according to the initial parameters of the current node in the iteration and the initial parameters of the other nodes in the group in the iteration.
  • the current node uses the following formula (2) to obtain the parameter ⁇ k,t of the current node in this iteration group;
  • the k is the sequence number of the current node
  • the G k represents the sequence numbers of the nodes in the group
  • the g l is the weight of the node l in the group relative to the current node
  • the ⁇ l, t is the initial parameter of the node l in this group in this iteration.
  • in another embodiment, step S210 adopts the consensus training strategy to implement the iterative update of the in-group parameters, and specifically includes the following sub-steps:
  • the current node obtains the initial parameters of the current node in the local iteration by using the model parameters obtained by itself in the previous iteration and the weights of other nodes in the group relative to the current node.
  • the current node may use the following formula (3) to obtain the initial parameter ⁇ k,t-1 of the current node in this iteration;
  • this time is the t-th iteration
  • the previous time is the t-1th iteration
  • the k is the sequence number of the current node
  • the w k,t-1 is the model parameters obtained by the current node in the previous iteration
  • the G k represents the sequence numbers of the nodes in the group
  • the g l is the weight of the node l in the group relative to the current node.
  • the current node obtains the parameters of the current node in the group of the model in the current iteration according to the initial parameters of the current node in the iteration and the reference parameters of the group of iterations.
  • the current node uses the following formula (4) to obtain the parameter ⁇ k,t of the current node in this iteration group;
  • ⁇ k,t ⁇ k,t-1 +2u k r k,t (d k,t -r k,t w k,t-1 ) (4)
  • this time is the t-th iteration
  • the previous time is the t-1th iteration
  • the k is the sequence number of the current node
  • the w k,t-1 is the model parameters obtained by the current node in the previous iteration
  • the u k is a set of weighting factors whose sum is one; the v k,t is a set of random parameters with zero mean, that is, each v k,t is a random number between -1 and 1, and the average value of the v k,t distribution is 0.
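  • A minimal sketch of these consensus sub-steps is given below. It reads formula (4) in the LMS style suggested by its terms, with r k,t as a regression vector and d k,t as a desired response observed by the current node at this iteration, and it assumes that the preceding in-group step combines the previous in-group model parameters with the weights g l; the group size, weights and data values are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(5)
dim = 3
group = [0, 1, 2]                      # G_k: sequence numbers of the nodes in the group
g = {0: 0.4, 1: 0.3, 2: 0.3}           # g_l: in-group weight of node l relative to the current node
u_k = 0.05                             # weighting factor of the current node k
k = 0                                  # take node 0 as the current node

# Model parameters w_{l,t-1} of the in-group nodes from the previous iteration (illustrative).
w_prev = {l: rng.normal(size=dim) for l in group}

# Assumed in-group combination (formula (3)): weight the previous parameters with g_l.
psi_prev = sum(g[l] * w_prev[l] for l in group)

# Reference parameters of this iteration for the current node (illustrative values).
r_kt = rng.normal(size=dim)                  # regression vector r_{k,t}
d_kt = float(r_kt @ rng.normal(size=dim))    # desired response d_{k,t}

# Formula (4): psi_{k,t} = psi_{k,t-1} + 2 u_k r_{k,t} (d_{k,t} - r_{k,t} w_{k,t-1}).
psi_kt = psi_prev + 2 * u_k * r_kt * (d_kt - r_kt @ w_prev[k])

print(psi_kt)    # in-group parameter of the current node for this iteration
```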
  • S220 The current node obtains the model parameters for the model by using the in-group parameters and the weights of the out-group neighbor nodes of the current node relative to the current node.
  • specifically, in step S220 the current node uses its in-group parameters of the model for this iteration and the weights of its out-group neighbor nodes relative to the current node to obtain the model parameters for the model in this iteration.
  • that is, the in-group parameters of this iteration are used to obtain the model parameters of this iteration.
  • the current node pre-stores the weight of each of its out-group neighbor nodes in the decentralized network relative to the current node, where an out-group neighbor node of the current node is a node that belongs to a different group from the current node and is adjacent to the current node; there can be one or more such nodes.
  • after the current node obtains the in-group parameters of the model for this iteration, it can sum the products of the in-group parameters of the model for this iteration and the pre-stored weights of the out-group neighbor nodes, and take the result as the current node's model parameters for the model.
  • the current node can use the following formula (5) to obtain the model parameter w k,t of the current node for the model in this iteration;
  • the k is the sequence number of the current node
  • the N k represents the sequence numbers of the out-group neighbor nodes of the current node
  • the c l is the weight of the out-group neighbor node l relative to the current node
  • the ψ k,t is the in-group parameter of the current node in this iteration.
  • in this embodiment, a preset decentralized training strategy is adopted within the group to obtain the in-group parameters of the model, and the weights of the out-group neighbor nodes are then used to weight the in-group parameters, so that an ordinary node of the decentralized network can obtain the model parameters of the model without a central node.
  • moreover, the in-group parameters of the model are obtained within the group first, and the model parameters are then obtained by weighting between the groups, which improves the convergence speed of the model parameters.
  • in other words, this embodiment updates the parameters within the group using the above-mentioned preset decentralized training strategy and then merges between the groups. Further, in order to achieve faster convergence, the aforementioned diffusion (extended) training strategy may be adopted.
  • further, the above-mentioned in-group parameters and/or model parameters may be subjected to noise processing.
  • specifically, after step S210, preset noise can be used to add noise to the in-group parameters of the current node in this iteration, and the noise-added in-group parameters are updated as the in-group parameters of the current node in this iteration; further, after sub-step S311 or S313 of S210, preset noise can be used to add noise to the initial parameters of the current node in this iteration, and the noise-added initial parameters are updated as the initial parameters of the current node in this iteration.
  • similarly, after step S220, preset noise can be used to add noise to the model parameters of the model in this iteration, and the noise-added model parameters are updated as the model parameters of the model in this iteration.
  • the aforementioned preset noise is differential privacy noise, for example, Laplace random noise.
  • the Laplace random noise may be L(F, ε); wherein ε is a differential privacy parameter satisfying ε-differential privacy, and F is the differential privacy sensitivity of the preset model training objective function J k .
  • the preset model may be a neural network model. It can be understood that, in other embodiments, the above-mentioned noise adding process may be performed only on a part of the above-mentioned group parameters, initial parameters, and model parameters.
  • FIG. 4 is a flowchart of still another embodiment of the model training method of the present application.
  • the method is applied to the decentralized network as described above, and each node of the decentralized network is trained to obtain the model parameters of the model.
  • this method uses the diffusion training strategy within the group to obtain the in-group parameters, then performs weighting between the groups to obtain the model parameters, and applies differential privacy noise to the parameters of the diffusion training process and to the final model parameters to prevent indirect data leakage.
  • the method includes the following steps:
  • S410 The current node obtains the initial parameters of the current node in this iteration by using the model parameters obtained by itself in the previous iteration and the reference parameters of this iteration.
  • specifically, the current node can use formula (1) as described above, the model parameters w k,t-1 obtained by itself in the previous iteration, and the reference parameters u k , r k,t , d k,t of this iteration to obtain the initial parameter ψ k,t of the current node in this iteration.
  • S420 The current node uses the preset noise to add noise to the initial parameter of the current node in this iteration, and updates the initial parameter after noise addition to the initial parameter of the current node in this iteration.
  • that is, the preset noise is added when the current node calculates the in-group diffusion (gradient) update.
  • the preset noise is Laplace random noise.
  • specifically, the current node uses the following formula (6) to add, to the initial parameter ψ k,t of the current node in this iteration, the preset noise scaled according to the number of neighbor nodes in the current node's group, and updates the noise-added initial parameter as the initial parameter ψ′ k,t of the current node in this iteration.
  • in formula (6), L(F, ε) is the Laplace random noise, ε is a differential privacy parameter satisfying ε-differential privacy, F is the differential privacy sensitivity of the neural network model training objective function J k , and the remaining term is the number of neighbor nodes in the current node's group.
  • S430 The current node obtains the in-group parameters of the current node for the model in this iteration according to the initial parameters of the current node in this iteration and the initial parameters of the other nodes in the group in this iteration.
  • the current node can use the following formula (7) to obtain the parameter ⁇ k,t of the current node in this iteration group;
  • the k is the sequence number of the current node
  • the G k represents the sequence number of the node in the group
  • the g l is the weight of the node l in the group relative to the current node
  • the ⁇ l, t ′ is the initial parameter of the node l in the group after the noise is added in this iteration.
  • S440 The current node uses the in-group parameters of the current node for the model in this iteration and the weights of the out-group neighbor nodes of the current node relative to the current node to obtain the model parameters of the model in this iteration.
  • specifically, the current node may use formula (5) as described above, together with the current node's in-group parameter ψ k,t of this iteration and the weights c l of the out-group neighbor nodes relative to the current node, to obtain the current node's model parameters w k,t at this iteration.
  • S450 The current node uses preset noise to denoise the model parameters of the model in this iteration, and updates the model parameters after the denoising to the model parameters of the model in this iteration.
  • the preset noise is Laplace random noise.
  • specifically, the current node uses the following formula (8) to add, to the current node's model parameters w k,t of this iteration, the preset noise scaled according to the number of out-group neighbor nodes of the current node, and updates the noise-added model parameters as the model parameters w k,t ′ of the current node in this iteration.
  • in formula (8), L(F, ε) is the Laplace random noise, ε is a differential privacy parameter satisfying ε-differential privacy, F is the differential privacy sensitivity of the neural network model training objective function J k , and the remaining term is the number of out-group neighbor nodes of the current node.
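  • The sketch below illustrates the two noise-addition points of this embodiment, using numpy's Laplace sampler to stand in for L(F, ε). It assumes the noise is scaled by dividing by the relevant neighbor count; the exact scaling in formulas (6) and (8), and the parameter values, are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

def laplace_noise(F, eps, size, rng):
    """Draw Laplace(0, F/eps) noise, the usual differential-privacy mechanism, where F is
    the sensitivity of the training objective and eps is the privacy parameter."""
    return rng.laplace(loc=0.0, scale=F / eps, size=size)

F, eps, dim = 1.0, 0.5, 3
n_in, n_out = 2, 2    # numbers of in-group and out-group neighbor nodes of the current node

# Step S420 (assumed reading of formula (6)): perturb the initial parameter of this iteration
# before it is shared inside the group, scaling the noise by the in-group neighbor count.
psi_kt = rng.normal(size=dim)
psi_kt_noisy = psi_kt + laplace_noise(F, eps, dim, rng) / n_in

# Step S450 (assumed reading of formula (8)): perturb the model parameters of this iteration
# before they are shared outside the group, scaling by the out-group neighbor count.
w_kt = rng.normal(size=dim)
w_kt_noisy = w_kt + laplace_noise(F, eps, dim, rng) / n_out

print(psi_kt_noisy, w_kt_noisy)
```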
  • the strategy of first updating the in-group nodes using the diffusion strategy and then merging with the out-group nodes can accelerate the convergence of the distributed optimization, and at the same time the differential privacy noise can prevent the problem of indirect data leakage.
  • FIG. 5 is a schematic structural diagram of an embodiment of a node of a decentralized network of the present application.
  • the node 50 may be a node in the decentralized network as described in FIG. 1 and includes a memory 51, a processor 52, and a communication circuit 53.
  • the communication circuit 53 and the memory 51 are respectively coupled to the processor 52.
  • each component of the node 50 may be coupled together through a bus, or the processor of the node 50 may be connected to other components one by one.
  • the node 50 may be any communication device such as a mobile phone, a notebook, a desktop computer, and a server.
  • the communication circuit 53 is used to communicate with other nodes in the decentralized network.
  • the communication circuit 53 may communicate with nodes in the group in the decentralized network to obtain initial parameters of previous iterations of other nodes in the group.
  • the memory 51 is used to store program instructions executed by the processor 52 and data during processing of the processor 52, wherein the memory 51 includes a non-volatile storage part for storing the above-mentioned program instructions. Furthermore, the memory 51 may also store account related data.
  • the processor 52 controls the operation of the node 50, and the processor 52 may also be called a CPU (Central Processing Unit, central processing unit).
  • the processor 52 may be an integrated circuit chip with signal processing capabilities.
  • the processor 52 may also be a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • the general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the processor 52 runs the program instructions stored in the memory 51 to: adopt the preset decentralized training strategy within the group to obtain in-group parameters for the model; and use the in-group parameters and the weights of the out-group neighbor nodes of the current node relative to the current node to obtain the model parameters of the model.
  • specifically, when the processor 52 adopts the preset decentralized training strategy within the group to obtain the in-group parameters for the model, it uses the model parameters obtained in its previous iteration and adopts the preset decentralized training strategy within the group to perform this iteration of training, obtaining the in-group parameters of the model for this iteration; and when the processor 52 uses the in-group parameters and the weights of the out-group neighbor nodes of the current node relative to the current node to obtain the model parameters for the model, it uses the in-group parameters of the model for this iteration and those weights to obtain the model parameters of the model for this iteration.
  • the preset decentralized training strategy includes a gossip-based training strategy, an incremental training strategy, a consensus training strategy, or a diffusion training strategy.
  • when the processor 52 uses the model parameters obtained in its previous iteration and adopts the diffusion training strategy within the group to perform this iteration of training to obtain the in-group parameters of the model for this iteration, this includes: using the model parameters obtained in its previous iteration and the reference parameters of this iteration to obtain the initial parameters of the current node in this iteration; and obtaining the in-group parameters of the current node for the model in this iteration according to the initial parameters of the current node in this iteration and the initial parameters of the other nodes in the group in this iteration.
  • when the processor 52 uses the model parameters obtained in its previous iteration and the reference parameters of this iteration to obtain the initial parameters of the current node in this iteration, this may specifically include: using formula (1) described above to obtain the initial parameter ψ k,t of the current node in this iteration.
  • when the processor 52 obtains the in-group parameters of the current node for the model in this iteration according to the initial parameters of the current node in this iteration and the initial parameters of the other nodes in the group in this iteration, this may specifically include: using formula (2) described above to obtain the in-group parameter ψ k,t of the current node in this iteration.
  • when the processor 52 uses the model parameters obtained in its previous iteration and adopts the consensus training strategy within the group to perform this iteration of training to obtain the in-group parameters of the model for this iteration, this includes: using the model parameters obtained in its previous iteration and the weights of the other nodes in the group relative to the current node to obtain the initial parameters of the current node in this iteration; and obtaining the in-group parameters of the current node for the model in this iteration according to the initial parameters of the current node in this iteration and the reference parameters of this iteration.
  • when the processor 52 uses the model parameters obtained in its previous iteration and the weights of the other nodes in the group relative to the current node to obtain the initial parameters of the current node in this iteration, this may specifically include: using formula (3) described above to obtain the initial parameter ψ k,t-1 of the current node in this iteration.
  • when the processor 52 obtains the in-group parameters of the current node for the model in this iteration according to the initial parameters of the current node in this iteration and the reference parameters of this iteration, this may specifically include: using formula (4) described above to obtain the in-group parameter ψ k,t of the current node in this iteration.
  • the processor 52 is further configured to: use preset noise to add noise to the initial parameters of the current node in this iteration, and update the noise-added initial parameters as the initial parameters of the current node in this iteration.
  • when the processor 52 uses preset noise to add noise to the initial parameter of the current node in this iteration, this may specifically include: adding, to the initial parameter of the current node in this iteration, the preset noise scaled according to the number of neighbor nodes in the current node's group.
  • the processor 52 is further configured to: use preset noise to add noise to the model parameters of the model in this iteration, and update the noise-added model parameters as the model parameters of the model in this iteration.
  • when the processor 52 uses preset noise to add noise to the model parameters of the model in this iteration, this may specifically include: adding, to the current node's model parameters of the model in this iteration, the preset noise scaled according to the number of out-group neighbor nodes of the current node.
  • the preset noise is Laplace random noise.
  • the Laplace random noise may be: L(F, ε); wherein ε is a differential privacy parameter satisfying ε-differential privacy, and F is the differential privacy sensitivity of the preset model training objective function J k .
  • when the processor 52 uses the in-group parameters of the current node for the model in this iteration and the weights of the out-group neighbor nodes of the current node relative to the current node to obtain the model parameters of the model in this iteration, this includes: using formula (5) described above to obtain the model parameters w k,t of the current node for the model in this iteration.
  • the above processor 52 is also used to execute the steps of any of the above method embodiments.
  • the present application also provides a storage device; referring to FIG. 6, FIG. 6 is a schematic structural diagram of an embodiment of the storage device.
  • the storage device 60 stores program instructions 61 executable by the processor, and the program instructions 61 are used to execute the method in the foregoing embodiment.
  • the storage device 60 may specifically be a medium that can store program instructions, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, or it may be a server that stores the program instructions; the server may send the stored program instructions to other devices for execution, or it may run the stored program instructions by itself.
  • the storage device 60 may also be a memory as shown in FIG. 5.
  • in the above manner, the preset decentralized training strategy is adopted within the group to obtain the in-group parameters of the model, and the weights of the out-group neighbor nodes are then used to weight the in-group parameters, so that an ordinary node of the decentralized network can obtain the model parameters of the model without a central node.
  • moreover, the in-group parameters of the model are obtained within the group first, and the model parameters are then obtained by weighting between the groups, which improves the convergence speed of the model parameters.
  • the disclosed method and apparatus may be implemented in other ways.
  • the device implementation described above is only schematic.
  • the division of modules or units is only a division of logical functions; in actual implementation, there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • based on such an understanding, the technical solution of the present application, in essence, or the part contributing to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the methods of the embodiments of the present application.
  • the aforementioned storage media include: U disk, mobile hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), magnetic disk or optical disk and other media that can store program instructions .

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Error Detection And Correction (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Disclosed in the present application are a model training method and nodes thereof, a network and a storage device. The method is applied to a decentralized network comprising at least one group of nodes, wherein each group of nodes comprises at least one node, and at least some of the nodes are used for training to obtain model parameters of the model. The method comprises: using a preset decentralized training policy for a current node in the present group to obtain in-group parameters for the model; and obtaining model parameters for the model by using the in-group parameters and weights of out-group neighbor nodes of the current node with respect to the current node. In the above manner, the training of a model based on a decentralized network can be implemented.

Description

Model training method and its node, network and storage device
[Technical Field]
This application relates to the field of blockchain technology, and in particular to a model training method and its node, network and storage device.
[Background]
At present, various data models are usually needed to realize information processing, for example image recognition using a recognition model. Nowadays, decentralized networks have been widely used in various fields due to their high reliability. A decentralized network contains multiple nodes, and there is no central node in the network. When performing such information processing, the nodes of the decentralized network can cooperate to process information using the model. That is, each node uses its corresponding model to process the input information to output the result.
Before performing information processing with the model, the relevant model needs to be trained first. Since a decentralized network has no central node, the model cannot be trained based on a central node or a parameter node.
[Summary of the Invention]
The technical problem mainly solved by this application is to provide a model training method and its node, network and storage device, so as to realize model training based on a decentralized network.
To solve the above technical problem, a first aspect of the present application provides a model training method, which is applied to a decentralized network containing at least one group of nodes, where each group of nodes includes at least one node, and at least some of the nodes are used for training to obtain the model parameters of the model; the method includes: the current node adopts a preset decentralized training strategy within its group to obtain in-group parameters for the model; and the in-group parameters and the weights of the out-group neighbor nodes of the current node relative to the current node are used to obtain the model parameters for the model.
To solve the above technical problem, a second aspect of the present application provides a node of a decentralized network, including a processor, and a memory and a communication circuit coupled to the processor, wherein the communication circuit is used to communicate with other nodes of the decentralized network; the memory is used to store program instructions; and the processor is used to run the program instructions to perform the above method.
To solve the above technical problem, a third aspect of the present application provides a decentralized network, which includes at least one group of nodes, and each group of nodes includes at least one node as described above.
To solve the above technical problem, a fourth aspect of the present application provides a storage device that stores program instructions; when the program instructions run on a processor, the method described in the first aspect above is executed.
In the above scheme, a preset decentralized training strategy is adopted within the group to obtain in-group parameters for the model, and the weights of out-group neighbor nodes are then used to weight the in-group parameters, so that an ordinary node of the decentralized network can obtain the model parameters of the model without a central node.
[Brief Description of the Drawings]
FIG. 1 is a schematic structural diagram of an embodiment of a decentralized network of this application;
FIG. 2 is a schematic flowchart of an embodiment of a model training method of this application;
FIG. 3A is a schematic flowchart of step S220 in another embodiment of the model training method of this application;
FIG. 3B is a schematic flowchart of step S220 in yet another embodiment of the model training method of this application;
FIG. 4 is a schematic flowchart of still another embodiment of the model training method of this application;
FIG. 5 is a schematic structural diagram of an embodiment of a node of a decentralized network of this application;
FIG. 6 is a schematic structural diagram of an embodiment of a storage device of this application.
[Detailed Description]
In order to better understand the technical solutions of the present application, the embodiments of the present application are described in detail below with reference to the accompanying drawings.
The terms used in the embodiments of the present application are only for the purpose of describing specific embodiments and are not intended to limit the present application. The singular forms "a", "said" and "the" used in the embodiments of the present application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more associated listed items. In addition, "multiple" herein means at least two.
Please refer to FIG. 1, which is a schematic structural diagram of an embodiment of a decentralized network of the present application. In this embodiment, the decentralized network 10 includes multiple nodes 11, wherein the multiple nodes 11 are divided into at least one group of nodes, and each group of nodes includes at least one node 11. As shown in FIG. 1, in the decentralized network cluster 10, the multiple nodes 11 are divided into three groups, and each group includes three nodes 11. Specifically, a node 11 may be a communication device such as a mobile phone, a tablet computer, a computer, or a server.
There is no central node in the decentralized network 10, and the nodes 11 can communicate with each other directly; it is not necessary for all nodes 11 to communicate through a central node. For example, the nodes 11 within a group can all communicate with each other, and each node 11 can communicate with at least one node 11 of every other group, where the at least one node 11 of another group that communicates with the node 11 is called an out-group communication node of the node 11. In this embodiment, except for its out-group communication nodes, a node 11 cannot directly communicate with the other nodes 11 in other groups.
In this embodiment, the decentralized network 10 may be used to create models and to perform data processing using the created models.
Specifically, when performing such information processing, the nodes 11 of the decentralized network 10 can cooperate to process information using the model. That is, each node uses its corresponding model to process the input information to output the result. In one application scenario, each node may be responsible for a different part of the model; for example, the model is a neural network model, and different network layers of the neural network model are assigned to different nodes, so that different nodes are responsible for different processing parts of the model, that is, the model is parallelized. In another application scenario, each node is responsible for all parts of the model; for example, different nodes hold multiple copies of the same model, each node is assigned a part of the data, and the computation results of all nodes are then combined in a certain way.
Before using the model for information processing, the decentralized network 10 may first perform model training to obtain the model parameters of the model, and then use the model corresponding to the model parameters to implement the information processing described above.
In this embodiment, each node of the decentralized network 10 is used for training to obtain the model parameters of the model. For example, each group of nodes 11 of the decentralized network 10 is first trained to obtain in-group parameters of the model, and the weights of neighbor nodes of different groups and the in-group parameters are then used to obtain the model parameters of the model. Further, in order to obtain accurate model parameters, the model parameters may be updated through multiple iterations as described above.
For better understanding, the following example illustrates a training principle for the model parameters of the decentralized network of this application. In this example, the grouped decentralized network described above is used to implement a machine learning algorithm that optimizes an objective function, thereby realizing the training of the model parameters, where the objective function can be optimized based on the gradient descent method. Specifically, the model parameter training of the decentralized network is equivalent to solving an objective function J defined over the sub-objective functions of all nodes, of the form
J(x) = (1/N) · Σ_{k=1}^{N} J_k(x)
where J_k(x) is the sub-objective function of the k-th node 11 and N is the number of nodes in the decentralized network.
In this example, the parameter training method of the decentralized network lets all nodes in the decentralized network 10 each optimize their sub-objective function based on local data, and then exchange iterative parameters with other nodes in the decentralized network. After a certain number of iterations, the solutions of all nodes in the decentralized network 10 converge to an approximate solution of the objective function, such as the unbiased optimal solution, from which the model parameters of the model are obtained.
Based on the above training principle or other similar training principles, the decentralized network can train its model. Specifically, the decentralized network can use the following training method to train its model and obtain the model parameters.
Please refer to FIG. 2, which is a schematic flowchart of an embodiment of a model training method of the present application. In this embodiment, the method is applied to the decentralized network described above, and each node of the decentralized network is trained to obtain the model parameters of the model. Specifically, the method includes the following steps:
S210: The current node adopts a preset decentralized training strategy within its group to obtain in-group parameters for the model.
The current node is any node in the decentralized network described above.
In this embodiment, the model parameters of the model can be obtained by iterative training. Specifically, the current node can use the model parameters obtained in its previous iteration and adopt the preset decentralized training strategy within its group to perform this iteration of training, obtaining the in-group parameters of the model for this iteration; then, in step S220, the model parameters of this iteration are obtained from the in-group parameters of this iteration. The model parameters are thus updated iteratively with this training method; after a certain number of iterations the model parameters converge, and the converged model parameters can be taken as the final trained model parameters.
Specifically, the preset decentralized training strategy includes but is not limited to the following strategies: a gossip-based training strategy, an incremental training strategy, a consensus training strategy, or a diffusion training strategy. The diffusion training strategy may specifically be a multitask diffusion strategy with optimized inter-cluster cooperation (A Multitask Diffusion Strategy with Optimized Inter-Cluster Cooperation). The above training strategies can be used to iterate the model parameters so that they converge to the unbiased optimal solution. For example, when the probability of any node being selected in the random strategy of the gossip training method reaches a uniform random distribution, the solutions of all nodes converge to the unbiased optimal solution. The other three strategies can also converge to the unbiased optimal solution.
Among them, the gossip-based training strategy means that each node in the network periodically selects, through a certain random strategy over all nodes, a single other node, exchanges parameters with only that node, and iterates. For the update of the model parameters w k,t of node k at the t-th iteration, w k,t-1 is the model parameter of node k at the (t-1)-th iteration, l is the sequence number of the randomly selected neighbor node, and w l,t-1 is the model parameter of node l at the (t-1)-th iteration. When this training strategy is applied to the in-group nodes in this application, the gossip-based training strategy can be understood as each node in the group where the current node is located periodically selecting, through a certain random strategy over that group, one other node in the same group, exchanging parameters with only that node, and iterating. Gossip is a decentralized, fault-tolerant protocol that guarantees eventual consistency.
Similarly, the incremental training strategy iterates the model parameters with the following formula; the update of the model parameter w_{k,t} of node k at the t-th iteration may be as follows:

w_{k,t} = w_{k,t-1} - (u / N)·∇J_k(w_{k,t-1})

where w_{k,t-1} is the model parameter of node k at the (t-1)-th iteration; u is the iteration factor, for example a value between 0 and 1; N is the number of nodes in the network; ∇ denotes the gradient; J_k(w_{k,t-1}) is the objective function of node k with the model parameter w as its variable; and ∇J_k(w_{k,t-1}) is the gradient value of the objective function evaluated at the specific model parameters. When this training strategy is applied to the in-group nodes of this application, the above formula can be transformed appropriately to obtain a specific algorithm for applying the incremental training strategy to the nodes within the group.
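For illustration only, a minimal sketch of the incremental update is given below; the quadratic per-node objectives J_k(w) = (w - target_k)^2 and the concrete values of u and N are assumptions used solely to make the example runnable:

```python
import numpy as np

targets = np.array([1.0, 2.0, 3.0, 4.0])   # one illustrative target per node k
N = len(targets)                            # number of nodes in the network
u = 0.5                                     # iteration factor, a value between 0 and 1

def grad_J(k, w):
    """Gradient of the assumed objective J_k(w) = (w - targets[k])**2."""
    return 2.0 * (w - targets[k])

w = 0.0
for t in range(200):
    for k in range(N):                      # the estimate is passed through the nodes in turn
        w = w - (u / N) * grad_J(k, w)      # incremental update w_{k,t} = w_{k,t-1} - (u/N)*grad
print(w)                                    # approaches the minimizer of the summed objectives
```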
The consensus training strategy iterates the model parameters with the following formulas; the update of the model parameter w_{k,t} of node k at the t-th iteration may be as follows:

Ψ_{k,t} = Σ_{l∈N_k} c_{lk}·w_{l,t-1}

w_{k,t} = Ψ_{k,t} - u_k·∇J_k(w_{k,t-1})

where w_{k,t-1} is the model parameter of node k at the (t-1)-th iteration, N_k denotes all the indices of the neighbor nodes of node k, w_{l,t-1} is the model parameter of neighbor node l at the (t-1)-th iteration, c_{lk} is the weighting factor of neighbor node l of node k, u_k is the weighting factor of the combined gradient, ∇ denotes the gradient, J_k(w_{k,t-1}) is the objective function of node k with the model parameter w as its variable, and ∇J_k(w_{k,t-1}) is the gradient value of the objective function evaluated at the specific model parameters. When this training strategy is applied to the in-group nodes of this application, the above formulas can be transformed appropriately to obtain the algorithm for applying the consensus strategy to the nodes within the group, as described in the related description below.
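The following sketch, again purely illustrative, applies the consensus update (combine first, then adapt) to a small ring network; the topology, the uniform combination weights c_lk, the step size u_k, and the quadratic objectives are all assumptions:

```python
import numpy as np

targets = np.array([1.0, 2.0, 3.0, 4.0])
N = len(targets)
neighbors = {k: [(k - 1) % N, k, (k + 1) % N] for k in range(N)}  # ring topology incl. self
c = 1.0 / 3.0          # uniform combination weight c_lk
u_k = 0.1              # weighting factor of the combined gradient

w = np.zeros(N)
for t in range(300):
    psi = np.array([sum(c * w[l] for l in neighbors[k]) for k in range(N)])  # combine neighbors
    w = psi - u_k * 2.0 * (w - targets)                                      # gradient step on J_k
print(w)               # every node approaches the network-wide minimizer
```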
The diffusion training strategy iterates the model parameters with the following formulas; the update of the model parameter w_{k,t} of node k at the t-th iteration may be as follows:

Ψ_{k,t} = w_{k,t-1} - u_k·∇J_k(w_{k,t-1})

w_{k,t} = Σ_{l∈N_k} c_{lk}·Ψ_{l,t}

where w_{k,t-1} is the model parameter of node k at the (t-1)-th iteration, N_k denotes all the indices of the neighbor nodes of node k, c_{lk} is the weighting factor of neighbor node l of node k, u_k is the weighting factor of the combined gradient, ∇ denotes the gradient, J_k(w_{k,t-1}) is the objective function of node k with the model parameter w as its variable, and ∇J_k(w_{k,t-1}) is the gradient value of the objective function evaluated at the specific model parameters. When this training strategy is applied to the in-group nodes of this application, the above formulas can be transformed appropriately to obtain the algorithm for applying the diffusion training strategy to the nodes within the group, as described in the related description below.
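For comparison, an equally illustrative sketch of the diffusion (adapt-then-combine) update follows; it differs from the consensus sketch above only in the order of the gradient step and the neighbor combination, and relies on the same assumed topology, weights, and objectives:

```python
import numpy as np

targets = np.array([1.0, 2.0, 3.0, 4.0])
N = len(targets)
neighbors = {k: [(k - 1) % N, k, (k + 1) % N] for k in range(N)}
c = 1.0 / 3.0
u_k = 0.1

w = np.zeros(N)
for t in range(300):
    psi = w - u_k * 2.0 * (w - targets)                                      # adapt: local gradient step
    w = np.array([sum(c * psi[l] for l in neighbors[k]) for k in range(N)])  # combine: weight neighbors
print(w)
```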
For details of the above training strategies, reference may also be made to the existing related art, which is not repeated here.
For ease of understanding, this step is described in detail below by using the consensus training strategy and the diffusion training strategy to implement the iterative update of the in-group parameters.
In the first example, referring to FIG. 3A, S210 adopts the diffusion training strategy to implement the iterative update of the in-group parameters, which specifically includes the following sub-steps:
S311: The current node obtains its initial parameters for this iteration by using the model parameters obtained in its previous iteration and the reference parameters of this iteration.
For example, the current node may obtain its initial parameter Ψ_{k,t} for this iteration using the following formula (1):

Ψ_{k,t} = w_{k,t-1} + u_k·r_{k,t}·(d_{k,t} - r_{k,t}·w_{k,t-1})    (1)

where this iteration is the t-th iteration, the previous one is the (t-1)-th iteration, k is the index of the current node, w_{k,t-1} is the model parameter obtained by the current node in the previous iteration, and u_k, r_{k,t} and d_{k,t} are the reference parameters of this iteration, in which u_k denotes a weighting factor, r_{k,t} denotes a random factor, d_{k,t} = r_{k,t}·ρ + v_{k,t}, ρ is a hyperparameter, and v_{k,t} is a random parameter. In this embodiment, the u_k form a set of weighting factors that sum to one, and the v_{k,t} are a set of zero-mean random parameters, i.e., each v_{k,t} is a random number between -1 and 1, and the mean of the v_{k,t} distribution is 0.
S312: The current node obtains its in-group parameters of the model for this iteration according to its initial parameters for this iteration and the initial parameters of the other nodes in its group for this iteration.
For example, the current node obtains its in-group parameter Φ_{k,t} for this iteration using the following formula (2):

Φ_{k,t} = Σ_{l∈G_k} g_l·Ψ_{l,t}    (2)

where this iteration is the t-th iteration, k is the index of the current node, G_k denotes the indices of the nodes in the current node's group, g_l is the weight of in-group node l relative to the current node, and Ψ_{l,t} is the initial parameter of in-group node l for this iteration.
Therefore, the in-group parameters of the current node for this iteration can be obtained from the above formulas (1) and (2).
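A minimal sketch of sub-steps S311 and S312 is given below, assuming scalar parameters, a three-node group, and arbitrarily chosen values for ρ, the weights g_l, and the factors u_k; it only illustrates how formulas (1) and (2) could be evaluated:

```python
import numpy as np

rho = 2.0                                   # hyperparameter (assumed value)
group = [0, 1, 2]                           # G_k: indices of the nodes in the current node's group
g = {0: 0.5, 1: 0.3, 2: 0.2}                # weights g_l of the in-group nodes
u = {0: 0.2, 1: 0.3, 2: 0.5}                # weighting factors u_k (sum to one)
w_prev = {0: 1.0, 1: 1.5, 2: 0.5}           # model parameters from the previous iteration

def initial_param(w_km1, u_k, rng):
    """Formula (1): psi = w_{k,t-1} + u_k * r * (d - r * w_{k,t-1})."""
    r = rng.random()                        # random factor r_{k,t}
    v = rng.uniform(-1.0, 1.0)              # zero-mean random parameter v_{k,t}
    d = r * rho + v                         # d_{k,t} = r_{k,t} * rho + v_{k,t}
    return w_km1 + u_k * r * (d - r * w_km1)

rng = np.random.default_rng(0)
psi = {l: initial_param(w_prev[l], u[l], rng) for l in group}   # S311 for every in-group node
phi_k = sum(g[l] * psi[l] for l in group)                       # S312, formula (2)
print(phi_k)
```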
In the second example, referring to FIG. 3B, S210 adopts the consensus training strategy to implement the iterative update of the in-group parameters, which specifically includes the following sub-steps:
S313: The current node obtains its initial parameters for this iteration by using the model parameters obtained in its previous iteration and the weights of the other nodes in its group relative to the current node.
For example, the current node may obtain its initial parameter Ψ_{k,t-1} for this iteration using the following formula (3):

Ψ_{k,t-1} = Σ_{l∈G_k} g_l·w_{l,t-1}    (3)

where this iteration is the t-th iteration, the previous one is the (t-1)-th iteration, k is the index of the current node, w_{l,t-1} is the model parameter obtained by in-group node l in the previous iteration, G_k denotes the indices of the nodes in the current node's group, and g_l is the weight of in-group node l relative to the current node.
S314: The current node obtains its in-group parameters of the model for this iteration according to its initial parameters for this iteration and the reference parameters of this iteration.
For example, the current node obtains its in-group parameter Φ_{k,t} for this iteration using the following formula (4):

Φ_{k,t} = Ψ_{k,t-1} + 2·u_k·r_{k,t}·(d_{k,t} - r_{k,t}·w_{k,t-1})    (4)

where this iteration is the t-th iteration, the previous one is the (t-1)-th iteration, k is the index of the current node, w_{k,t-1} is the model parameter obtained by the current node in the previous iteration, and u_k, r_{k,t} and d_{k,t} are the reference parameters of this iteration, in which u_k denotes a weighting factor, r_{k,t} denotes a random factor, d_{k,t} = r_{k,t}·ρ + v_{k,t}, ρ is a hyperparameter, and v_{k,t} is a random parameter. In this embodiment, the u_k form a set of weighting factors that sum to one, and the v_{k,t} are a set of zero-mean random parameters, i.e., each v_{k,t} is a random number between -1 and 1, and the mean of the v_{k,t} distribution is 0.
Therefore, the in-group parameters of the current node for this iteration can be obtained from the above formulas (3) and (4).
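Likewise, the following sketch illustrates sub-steps S313 and S314 under the same illustrative assumptions (scalar parameters, a three-node group, arbitrary weights and reference parameters):

```python
import numpy as np

rho = 2.0                                   # hyperparameter (assumed value)
group = [0, 1, 2]                           # G_k: indices of the nodes in the current node's group
g = {0: 0.5, 1: 0.3, 2: 0.2}                # weights g_l of the in-group nodes
u_k = 0.3                                   # weighting factor of the current node
w_prev = {0: 1.0, 1: 1.5, 2: 0.5}           # w_{l,t-1} for every in-group node
k = 0                                       # index of the current node

rng = np.random.default_rng(1)
psi_km1 = sum(g[l] * w_prev[l] for l in group)          # S313, formula (3)
r = rng.random()                                        # random factor r_{k,t}
v = rng.uniform(-1.0, 1.0)                              # zero-mean random parameter v_{k,t}
d = r * rho + v                                         # d_{k,t}
phi_k = psi_km1 + 2.0 * u_k * r * (d - r * w_prev[k])   # S314, formula (4)
print(phi_k)
```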
S220: The current node obtains the model parameters of the model by using the in-group parameters and the weights of its out-of-group neighbor nodes relative to the current node.
In this embodiment, if S210 adopts the above iterative manner to obtain the current node's in-group parameters of the model for this iteration, then in this step S220 the current node uses its in-group parameters of the model for this iteration and the weights of its out-of-group neighbor nodes relative to the current node to obtain the model parameters of the model for this iteration; that is, the model parameters of this iteration are obtained from the in-group parameters of this iteration. Specifically, the current node pre-stores the weight, relative to the current node, of each of its out-of-group neighbor nodes in the decentralized network, where an out-of-group neighbor node of the current node is a node that belongs to a different group from the current node and is adjacent to the current node, and there may be one or more such nodes. After obtaining the in-group parameters of the model for this iteration, the current node may add up the products between these in-group parameters and the pre-stored weight of each out-of-group neighbor node, and take the result as the current node's model parameters of the model.
For example, the current node may obtain its model parameters w_{k,t} of the model for this iteration using the following formula (5):

w_{k,t} = Σ_{l∈N_k} c_l·Φ_{l,t}    (5)

where this iteration is the t-th iteration, k is the index of the current node, N_k denotes the indices of the current node's out-of-group neighbor nodes, c_l is the weight of out-of-group neighbor node l relative to the current node, and Φ_{l,t} is the in-group parameter of node l for this iteration, Φ_{k,t} being the in-group parameter of the current node for this iteration.
In this embodiment, the preset decentralized training strategy is adopted within the group to obtain the in-group parameters of the model, and the in-group parameters are then weighted with the weights of the out-of-group neighbor nodes, so that the model parameters of the model can be obtained in the decentralized network using its ordinary nodes, without a central node. Moreover, during iterative training, the in-group parameters of the model are obtained within the group first and the model parameters are then obtained by weighting between groups, which improves the convergence speed of the model parameters. For example, considering the characteristics of a grouped decentralized network, in order to allow the algorithm to converge faster to an asymptotically unbiased optimal solution during parallel training, this embodiment adopts a training manner in which the parameters are first updated within the group using the preset decentralized training strategy described above and are then merged between groups. Further, to achieve faster convergence, the diffusion training strategy described above may be adopted.
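A short illustrative sketch of the inter-group combination in S220 follows; the out-of-group neighbor set, their pre-stored weights c_l, and the in-group parameters assumed to be received from them are all hypothetical values:

```python
out_neighbors = [3, 4]                      # N_k: indices of the current node's out-of-group neighbors
c = {3: 0.6, 4: 0.4}                        # pre-stored weights c_l relative to the current node
phi = {3: 0.9, 4: 1.1}                      # in-group parameters received from those neighbors

w_k = sum(c[l] * phi[l] for l in out_neighbors)   # weighted combination giving w_{k,t}
print(w_k)
```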
Further, in order to prevent leakage of the data transmitted between nodes, noise may be added to the above in-group parameters and/or model parameters. For example, after the above S210, preset noise is used to add noise to the current node's in-group parameters of this iteration, and the noised in-group parameters are updated as the current node's in-group parameters of this iteration; further, after sub-step S311 or S313 in S210, preset noise may be used to add noise to the current node's initial parameters of this iteration, and the noised initial parameters are updated as the current node's initial parameters of this iteration. After the above S220, preset noise is used to add noise to the model parameters of the model of this iteration, and the noised model parameters are updated as the model parameters of the model of this iteration. The preset noise is differential-privacy noise, for example Laplacian random noise. Specifically, the Laplacian random noise may be L(F, ε), where ε is the parameter satisfying ε-differential privacy and F is the differential-privacy sensitivity of the preset model training objective function J_k; the preset model may be a neural network model. It can be understood that, in other embodiments, the above noise-adding process may be performed on only some of the above in-group parameters, initial parameters, and model parameters.
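For illustration, the differential-privacy noise addition can be sketched as below; the sensitivity F, the privacy parameter ε, and the neighbor count are assumed values, and numpy's Laplace sampler with scale F/ε stands in for L(F, ε):

```python
import numpy as np

def add_laplace_noise(value, F, eps, num_neighbors, rng):
    """Add Laplacian noise L(F, eps) divided by the neighbor count to a parameter."""
    noise = rng.laplace(loc=0.0, scale=F / eps)
    return value + noise / num_neighbors

rng = np.random.default_rng(42)
phi_noised = add_laplace_noise(1.05, F=0.5, eps=1.0, num_neighbors=3, rng=rng)
print(phi_noised)
```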
Referring to FIG. 4, FIG. 4 is a flowchart of still another embodiment of the model training method of the present application. In this embodiment, the method is applied to the decentralized network described above, and each node of the decentralized network is trained to obtain the model parameters of the model. The method adopts the diffusion training strategy within the group to obtain the in-group parameters, then weights between groups to obtain the model parameters, and performs differential-privacy noise addition on the parameters of the diffusion training process and on the finally obtained model parameters, so as to prevent indirect data leakage. Specifically, the method includes the following steps:
S410: The current node obtains its initial parameters for this iteration by using the model parameters obtained in its previous iteration and the reference parameters of this iteration.
Specifically, the current node may use formula (1) described above to obtain its initial parameter Ψ_{k,t} for this iteration from the model parameter w_{k,t-1} obtained in its previous iteration and the reference parameters u_k, r_{k,t} and d_{k,t} of this iteration.
S420: The current node uses preset noise to add noise to its initial parameters for this iteration, and updates the noised initial parameters as its initial parameters for this iteration.
In this embodiment, the preset noise is added when the current node computes the in-group gradient diffusion update. The preset noise is Laplacian random noise. Using the following formula (6), the current node adds, to its initial parameter Ψ_{k,t} of this iteration, the quotient of the preset noise and the number n_k of the current node's in-group neighbor nodes, and updates the noised initial parameter as its initial parameter Ψ′_{k,t} for this iteration:

Ψ′_{k,t} = Ψ_{k,t} + L(F, ε) / n_k    (6)

where L(F, ε) is the Laplacian random noise, ε is the parameter satisfying ε-differential privacy, F is the differential-privacy sensitivity of the neural network model training objective function J_k, and n_k is the number of in-group neighbor nodes of the current node.
S430: The current node obtains its in-group parameters of the model for this iteration according to its initial parameters for this iteration and the initial parameters of the other nodes in its group for this iteration.
Specifically, the current node may obtain its in-group parameter Φ_{k,t} for this iteration using the following formula (7):

Φ_{k,t} = Σ_{l∈G_k} g_l·Ψ′_{l,t}    (7)

where this iteration is the t-th iteration, k is the index of the current node, G_k denotes the indices of the nodes in the current node's group, g_l is the weight of in-group node l relative to the current node, and Ψ′_{l,t} is the noised initial parameter of in-group node l for this iteration.
S440: The current node obtains the model parameters of the model for this iteration by using its in-group parameters of the model for this iteration and the weights of its out-of-group neighbor nodes relative to the current node.
Specifically, the current node may use formula (5) described above to obtain its model parameters w_{k,t} of the model for this iteration from its in-group parameter Φ_{k,t} of this iteration and the weights c_l of the out-of-group neighbor nodes relative to the current node.
S450: The current node uses preset noise to add noise to the model parameters of the model of this iteration, and updates the noised model parameters as the model parameters of the model of this iteration.
For example, the preset noise is Laplacian random noise. Using the following formula (8), the current node adds, to its model parameter w_{k,t} of this iteration, the quotient of the preset noise and the number m_k of the current node's out-of-group neighbor nodes, and updates the noised model parameter as its model parameter w′_{k,t} for this iteration:

w′_{k,t} = w_{k,t} + L(F, ε) / m_k    (8)

where L(F, ε) is the Laplacian random noise, ε is the parameter satisfying ε-differential privacy, F is the differential-privacy sensitivity of the neural network model training objective function J_k, and m_k is the number of out-of-group neighbor nodes of the current node.
In this embodiment, by adopting the strategy in which the in-group nodes first perform the diffusion-style update and then merge with out-of-group nodes, the convergence speed of the distributed optimization can be accelerated; at the same time, adding differential-privacy noise can prevent the problem of indirect data leakage.
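Putting the pieces of this embodiment together, a purely illustrative end-to-end sketch for one node and one iteration is given below; the group topology, weights, reference parameters, sensitivity F, privacy parameter ε, and the values received from out-of-group neighbors are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
rho, F, eps = 2.0, 0.5, 1.0                  # hyperparameter and differential-privacy settings (assumed)

group = [0, 1, 2]                            # G_k: in-group node indices, node 0 is the current node
g = {0: 0.5, 1: 0.3, 2: 0.2}                 # in-group weights g_l
u = {0: 0.2, 1: 0.3, 2: 0.5}                 # weighting factors u_k
out_neighbors = [3, 4]                       # N_k: out-of-group neighbor indices
c = {3: 0.6, 4: 0.4}                         # out-of-group weights c_l
w_prev = {0: 1.0, 1: 1.5, 2: 0.5}            # model parameters of the previous iteration

def laplace():
    return rng.laplace(loc=0.0, scale=F / eps)   # stand-in for L(F, eps)

# S410 / formula (1): each in-group node computes its initial parameter.
psi = {}
for l in group:
    r = rng.random()
    v = rng.uniform(-1.0, 1.0)
    d = r * rho + v
    psi[l] = w_prev[l] + u[l] * r * (d - r * w_prev[l])

# S420 / formula (6): add Laplacian noise scaled by the in-group neighbor count.
n_in = len(group) - 1
psi_noised = {l: psi[l] + laplace() / n_in for l in group}

# S430 / formula (7): combine the noised initial parameters within the group.
phi_k = sum(g[l] * psi_noised[l] for l in group)   # this value is shared with out-of-group neighbors

# S440 / formula (5): combine the in-group parameters received from the out-of-group neighbors.
phi_from_neighbors = {3: 0.9, 4: 1.1}              # assumed values received from nodes 3 and 4
w_k = sum(c[l] * phi_from_neighbors[l] for l in out_neighbors)

# S450 / formula (8): add Laplacian noise scaled by the out-of-group neighbor count.
w_k_noised = w_k + laplace() / len(out_neighbors)
print(w_k_noised)
```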
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of an embodiment of a node of the decentralized network of the present application. In this embodiment, the node 50 may be a node in the decentralized network described in FIG. 1, and includes a memory 51, a processor 52, and a communication circuit 53, where the communication circuit 53 and the memory 51 are respectively coupled to the processor 52. Specifically, the components of the node 50 may be coupled together through a bus, or the processor of the node 50 may be connected to the other components one by one. The node 50 may be any communication device such as a mobile phone, a notebook computer, a desktop computer, or a server.
The communication circuit 53 is used to communicate with other nodes in the decentralized network. For example, the communication circuit 53 may communicate with the nodes in the same group in the decentralized network to obtain the initial parameters of the other nodes in the group for the previous iteration.
The memory 51 is used to store the program instructions executed by the processor 52 and the data of the processor 52 during processing, where the memory 51 includes a non-volatile storage portion for storing the above program instructions. Moreover, the memory 51 may also store account-related data.
The processor 52 controls the operation of the node 50; the processor 52 may also be called a CPU (Central Processing Unit). The processor 52 may be an integrated circuit chip with signal processing capability. The processor 52 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, and so on.
In this embodiment, by invoking the program instructions stored in the memory 51, the processor 52 is configured to: adopt the preset decentralized training strategy within the group to obtain the in-group parameters of the model; and obtain the model parameters of the model by using the in-group parameters and the weights of the current node's out-of-group neighbor nodes relative to the current node.
In some embodiments, when the processor 52 adopts the preset decentralized training strategy within the group to obtain the in-group parameters of the model, this includes: using the model parameters obtained in the previous iteration, performing this iteration of training within the group with the preset decentralized training strategy to obtain the in-group parameters of the model for this iteration; and when the processor 52 obtains the model parameters of the model by using the in-group parameters and the weights of the current node's out-of-group neighbor nodes relative to the current node, this includes: obtaining the model parameters of the model for this iteration by using the current node's in-group parameters of the model for this iteration and the weights of the current node's out-of-group neighbor nodes relative to the current node.
In some embodiments, the preset decentralized training strategy includes a gossip-based training strategy, an incremental training strategy, a consensus training strategy, or a diffusion training strategy.
In some embodiments, when the processor 52 uses the model parameters obtained in the previous iteration and performs this iteration of training within the group with the diffusion training strategy to obtain the in-group parameters of the model for this iteration, this includes: obtaining the current node's initial parameters for this iteration by using the model parameters obtained in its previous iteration and the reference parameters of this iteration; and obtaining the current node's in-group parameters of the model for this iteration according to its initial parameters for this iteration and the initial parameters of the other nodes in its group for this iteration.
Further, when the processor 52 obtains the current node's initial parameters for this iteration by using the model parameters obtained in its previous iteration and the reference parameters of this iteration, this may specifically include: obtaining the current node's initial parameter Ψ_{k,t} for this iteration using formula (1) described above.
Further, when the processor 52 obtains the current node's in-group parameters of the model for this iteration according to its initial parameters for this iteration and the initial parameters of the other nodes in its group for this iteration, this may specifically include: obtaining the current node's in-group parameter Φ_{k,t} for this iteration using formula (2) described above.
In some embodiments, when the processor 52 uses the model parameters obtained in the previous iteration and performs this iteration of training within the group with the consensus training strategy to obtain the in-group parameters of the model for this iteration, this includes: obtaining the current node's initial parameters for this iteration by using the model parameters obtained in its previous iteration and the weights of the other nodes in its group relative to the current node; and obtaining the current node's in-group parameters of the model for this iteration according to its initial parameters for this iteration and the reference parameters of this iteration.
Further, when the processor 52 obtains the current node's initial parameters for this iteration by using the model parameters obtained in its previous iteration and the weights of the other nodes in its group relative to the current node, this may specifically include: obtaining the current node's initial parameter Ψ_{k,t-1} for this iteration using formula (3) described above.
Further, when the processor 52 obtains the current node's in-group parameters of the model for this iteration according to its initial parameters for this iteration and the reference parameters of this iteration, this may specifically include: obtaining the current node's in-group parameter Φ_{k,t} for this iteration using formula (4) described above.
In some embodiments, the processor 52 is further configured to: use preset noise to add noise to the current node's initial parameters for this iteration, and update the noised initial parameters as the current node's initial parameters for this iteration.
Further, when the processor 52 uses preset noise to add noise to the current node's initial parameters for this iteration, this may specifically include: adding, to the current node's initial parameters for this iteration, the quotient of the preset noise and the number of in-group neighbor nodes of the current node.
In some embodiments, the processor 52 is further configured to: use preset noise to add noise to the model parameters of the model of this iteration, and update the noised model parameters as the model parameters of the model of this iteration.
Further, when the processor 52 uses preset noise to add noise to the model parameters of the model of this iteration, this may specifically include: adding, to the current node's model parameters of the model for this iteration, the quotient of the preset noise and the number of out-of-group neighbor nodes of the current node.
In some embodiments, the preset noise is Laplacian random noise. Further, the Laplacian random noise may be L(F, ε), where ε is the parameter satisfying ε-differential privacy and F is the differential-privacy sensitivity of the preset model training objective function J_k.
In some embodiments, when the processor 52 obtains the model parameters of the model for this iteration by using the current node's in-group parameters of the model for this iteration and the weights of the current node's out-of-group neighbor nodes relative to the current node, this includes: obtaining the current node's model parameters w_{k,t} of the model for this iteration using formula (5) described above.
The above processor 52 is further configured to perform the steps of any of the above method embodiments.
Referring to FIG. 6, the present application further provides a schematic structural diagram of an embodiment of a storage device. In this embodiment, the storage device 60 stores program instructions 61 executable by a processor, and the program instructions 61 are used to perform the methods in the above embodiments.
The storage device 60 may specifically be a medium that can store program instructions, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc; or it may be a server that stores the program instructions, and the server may send the stored program instructions to other devices for execution, or may run the stored program instructions itself.
In an embodiment, the storage device 60 may also be the memory shown in FIG. 5.
In the above solutions, the preset decentralized training strategy is adopted within the group to obtain the in-group parameters of the model, and the in-group parameters are then weighted with the weights of the out-of-group neighbor nodes, so that the model parameters of the model can be obtained in the decentralized network using its ordinary nodes, without a central node. Moreover, during iterative training, the in-group parameters of the model are obtained within the group first and the model parameters are then obtained by weighting between groups, which improves the convergence speed of the model parameters.
In the several embodiments provided in this application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the apparatus implementations described above are merely illustrative; for example, the division of modules or units is only a division of logical functions, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of this application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or some of the steps of the methods of the embodiments of this application. The aforementioned storage medium includes various media that can store program instructions, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only implementations of this application and do not limit the patent scope of this application. Any equivalent structural or process transformation made using the contents of the description and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of this application.

Claims (17)

  1. A model training method, wherein the method is applied to a decentralized network comprising at least one group of nodes, each group of nodes comprising at least one node, and at least some of the nodes being used for training to obtain model parameters of a model;
    the method comprising:
    adopting, by a current node, a preset decentralized training strategy within its group to obtain in-group parameters of the model; and
    obtaining model parameters of the model by using the in-group parameters and the weights of out-of-group neighbor nodes of the current node relative to the current node.
  2. The method according to claim 1, wherein the adopting a preset decentralized training strategy within the group to obtain in-group parameters of the model comprises:
    using the model parameters obtained in the previous iteration, performing this iteration of training within the group with the preset decentralized training strategy to obtain the in-group parameters of the model for this iteration;
    and the obtaining model parameters of the model by using the in-group parameters and the weights of the out-of-group neighbor nodes of the current node relative to the current node comprises:
    obtaining the model parameters of the model for this iteration by using the current node's in-group parameters of the model for this iteration and the weights of the out-of-group neighbor nodes of the current node relative to the current node.
  3. The method according to claim 2, wherein the preset decentralized training strategy comprises a gossip-based training strategy, an incremental training strategy, a consensus training strategy, or a diffusion training strategy.
  4. The method according to claim 3, wherein the using the model parameters obtained in the previous iteration and performing this iteration of training within the group with the diffusion training strategy to obtain the in-group parameters of the model for this iteration comprises:
    obtaining the current node's initial parameters for this iteration by using the model parameters obtained in its previous iteration and the reference parameters of this iteration; and
    obtaining the current node's in-group parameters of the model for this iteration according to its initial parameters for this iteration and the initial parameters of the other nodes in its group for this iteration.
  5. The method according to claim 4, wherein the obtaining the current node's initial parameters for this iteration by using the model parameters obtained in its previous iteration and the reference parameters of this iteration comprises:
    obtaining the current node's initial parameter Ψ_{k,t} for this iteration using the following formula:
    Ψ_{k,t} = w_{k,t-1} + u_k·r_{k,t}·(d_{k,t} - r_{k,t}·w_{k,t-1})
    wherein this iteration is the t-th iteration, the previous one is the (t-1)-th iteration, k is the index of the current node, w_{k,t-1} is the model parameter obtained by the current node in the previous iteration, and u_k, r_{k,t} and d_{k,t} are the reference parameters of this iteration, in which u_k denotes a weighting factor, r_{k,t} denotes a random factor, d_{k,t} = r_{k,t}·ρ + v_{k,t}, ρ is a hyperparameter, and v_{k,t} is a random parameter.
  6. The method according to claim 4, wherein the obtaining the current node's in-group parameters of the model for this iteration according to its initial parameters for this iteration and the initial parameters of the other nodes in its group for this iteration comprises:
    obtaining the current node's in-group parameter Φ_{k,t} for this iteration using the following formula:
    Φ_{k,t} = Σ_{l∈G_k} g_l·Ψ_{l,t}
    wherein this iteration is the t-th iteration, k is the index of the current node, G_k denotes the indices of the nodes in the current node's group, g_l is the weight of in-group node l relative to the current node, and Ψ_{l,t} is the initial parameter of in-group node l for this iteration.
  7. The method according to claim 3, wherein the using the model parameters obtained in the previous iteration and performing this iteration of training within the group with the consensus training strategy to obtain the in-group parameters of the model for this iteration comprises:
    obtaining the current node's initial parameters for this iteration by using the model parameters obtained in its previous iteration and the weights of the other nodes in its group relative to the current node; and
    obtaining the current node's in-group parameters of the model for this iteration according to its initial parameters for this iteration and the reference parameters of this iteration.
  8. The method according to claim 7, wherein the obtaining the current node's initial parameters for this iteration by using the model parameters obtained in its previous iteration and the weights of the other nodes in its group relative to the current node comprises:
    obtaining the current node's initial parameter Ψ_{k,t-1} for this iteration using the following formula:
    Ψ_{k,t-1} = Σ_{l∈G_k} g_l·w_{l,t-1}
    wherein this iteration is the t-th iteration, the previous one is the (t-1)-th iteration, k is the index of the current node, w_{l,t-1} is the model parameter obtained by in-group node l in the previous iteration, G_k denotes the indices of the nodes in the current node's group, and g_l is the weight of in-group node l relative to the current node;
    and the obtaining the current node's in-group parameters of the model for this iteration according to its initial parameters for this iteration and the reference parameters of this iteration comprises:
    obtaining the current node's in-group parameter Φ_{k,t} for this iteration using the following formula:
    Φ_{k,t} = Ψ_{k,t-1} + 2·u_k·r_{k,t}·(d_{k,t} - r_{k,t}·w_{k,t-1})
    wherein u_k, r_{k,t} and d_{k,t} are the reference parameters of this iteration, in which u_k denotes a weighting factor, r_{k,t} denotes a random factor, d_{k,t} = r_{k,t}·ρ + v_{k,t}, ρ is a hyperparameter, and v_{k,t} is a random parameter.
  9. The method according to any one of claims 4 to 8, wherein after the obtaining the current node's initial parameters for this iteration by using the model parameters obtained in its previous iteration and the reference parameters of this iteration, the method further comprises:
    using preset noise to add noise to the current node's initial parameters for this iteration, and updating the noised initial parameters as the current node's initial parameters for this iteration.
  10. The method according to claim 9, wherein the using preset noise to add noise to the current node's initial parameters for this iteration comprises:
    adding, to the current node's initial parameters for this iteration, the quotient of the preset noise and the number of in-group neighbor nodes of the current node.
  11. The method according to claim 2, wherein after the obtaining the model parameters of the model for this iteration by using the current node's in-group parameters of the model for this iteration and the weights of the out-of-group neighbor nodes of the current node relative to the current node, the method further comprises:
    using preset noise to add noise to the model parameters of the model of this iteration, and updating the noised model parameters as the model parameters of the model of this iteration.
  12. The method according to claim 11, wherein the using preset noise to add noise to the model parameters of the model of this iteration comprises:
    adding, to the current node's model parameters of the model for this iteration, the quotient of the preset noise and the number of out-of-group neighbor nodes of the current node.
  13. The method according to any one of claims 7 to 12, wherein
    the preset noise is Laplacian random noise; the Laplacian random noise is L(F, ε), wherein ε is the parameter satisfying ε-differential privacy, and F is the differential-privacy sensitivity of the preset model training objective function J_k.
  14. The method according to claim 2, wherein the obtaining the model parameters of the model for this iteration by using the current node's in-group parameters of the model for this iteration and the weights of the out-of-group neighbor nodes of the current node relative to the current node comprises:
    obtaining the current node's model parameters w_{k,t} of the model for this iteration using the following formula:
    w_{k,t} = Σ_{l∈N_k} c_l·Φ_{l,t}
    wherein this iteration is the t-th iteration, k is the index of the current node, N_k denotes the indices of the out-of-group neighbor nodes of the current node, c_l is the weight of out-of-group neighbor node l relative to the current node, and Φ_{l,t} is the in-group parameter of node l for this iteration, Φ_{k,t} being the in-group parameter of the current node for this iteration.
  15. A node of a decentralized network, comprising a processor, and a memory and a communication circuit coupled to the processor, wherein
    the communication circuit is configured to communicate with other nodes of the decentralized network;
    the memory is configured to store program instructions; and
    the processor is configured to run the program instructions to perform the method according to any one of claims 1 to 14.
  16. A decentralized network, wherein the decentralized network comprises at least one group of nodes, and each group of nodes comprises at least one node according to claim 15.
  17. A storage device, wherein the storage device stores program instructions, and when the program instructions run on a processor, the method according to any one of claims 1 to 14 is performed.
PCT/CN2018/118291 2018-11-29 2018-11-29 Model training method and nodes thereof, network and storage device WO2020107351A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880002436.0A CN109690530A (en) 2018-11-29 2018-11-29 Model training method and its node, network and storage device
PCT/CN2018/118291 WO2020107351A1 (en) 2018-11-29 2018-11-29 Model training method and nodes thereof, network and storage device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/118291 WO2020107351A1 (en) 2018-11-29 2018-11-29 Model training method and nodes thereof, network and storage device

Publications (1)

Publication Number Publication Date
WO2020107351A1 true WO2020107351A1 (en) 2020-06-04

Family

ID=66190447

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/118291 WO2020107351A1 (en) 2018-11-29 2018-11-29 Model training method and nodes thereof, network and storage device

Country Status (2)

Country Link
CN (1) CN109690530A (en)
WO (1) WO2020107351A1 (en)


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704178B (en) * 2019-09-04 2023-05-23 北京三快在线科技有限公司 Machine learning model training method, platform, electronic device and readable storage medium
CN111475853B (en) * 2020-06-24 2020-12-11 支付宝(杭州)信息技术有限公司 Model training method and system based on distributed data
CN113065635A (en) * 2021-02-27 2021-07-02 华为技术有限公司 Model training method, image enhancement method and device
CN116150612A (en) * 2021-11-15 2023-05-23 华为技术有限公司 Model training method and communication device


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180018590A1 (en) * 2016-07-18 2018-01-18 NantOmics, Inc. Distributed Machine Learning Systems, Apparatus, and Methods
CN108520303A (en) * 2018-03-02 2018-09-11 阿里巴巴集团控股有限公司 A kind of recommendation system building method and device
CN108491266A (en) * 2018-03-09 2018-09-04 联想(北京)有限公司 Data processing method, device based on block chain and electronic equipment
CN108683738A (en) * 2018-05-16 2018-10-19 腾讯科技(深圳)有限公司 The calculating task dissemination method of diagram data processing method and diagram data
CN108898219A (en) * 2018-06-07 2018-11-27 广东工业大学 A kind of neural network training method based on block chain, device and medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115865607A (en) * 2023-03-01 2023-03-28 山东海量信息技术研究院 Distributed training computing node management method and related device
CN116663639A (en) * 2023-07-31 2023-08-29 浪潮电子信息产业股份有限公司 Gradient data synchronization method, system, device and medium
CN116663639B (en) * 2023-07-31 2023-11-03 浪潮电子信息产业股份有限公司 Gradient data synchronization method, system, device and medium

Also Published As

Publication number Publication date
CN109690530A (en) 2019-04-26

Similar Documents

Publication Publication Date Title
WO2020107351A1 (en) Model training method and nodes thereof, network and storage device
US20220318307A1 (en) Generating Neighborhood Convolutions Within a Large Network
Scardapane et al. Distributed semi-supervised support vector machines
van Wyk et al. Evolutionary neural architecture search for image restoration
CN111695415A (en) Construction method and identification method of image identification model and related equipment
Xu et al. An incremental learning vector quantization algorithm for pattern classification
WO2022166115A1 (en) Recommendation system with adaptive thresholds for neighborhood selection
Salehisadaghiani et al. Nash equilibrium seeking by a gossip-based algorithm
Damicelli et al. Topological reinforcement as a principle of modularity emergence in brain networks
Mohapatra et al. Financial time series prediction using distributed machine learning techniques
Ying et al. EAGAN: Efficient two-stage evolutionary architecture search for GANs
US20230132545A1 (en) Methods and Systems for Approximating Embeddings of Out-Of-Knowledge-Graph Entities for Link Prediction in Knowledge Graph
CN113228059A (en) Cross-network-oriented representation learning algorithm
JP7063274B2 (en) Information processing equipment, neural network design method and program
CN108614932B (en) Edge graph-based linear flow overlapping community discovery method, system and storage medium
Ren et al. Personalized federated learning: A Clustered Distributed Co-Meta-Learning approach
CN110867224B (en) Multi-granularity Spark super-trust fuzzy method for large-scale brain pathology segmentation
Wang Multimodal emotion recognition algorithm based on edge network emotion element compensation and data fusion
Nguyen et al. Meta-learning and personalization layer in federated learning
CN111275562A (en) Dynamic community discovery method based on recursive convolutional neural network and self-encoder
CN115544307A (en) Directed graph data feature extraction and expression method and system based on incidence matrix
JP7041239B2 (en) Deep distance learning methods and systems
TWI823488B (en) Method for implementing edge-optimized incremental learning for deep neural network and computer system
CN112465066A (en) Graph classification method based on clique matching and hierarchical pooling
Du et al. A dynamic adaptive iterative clustered federated learning scheme

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18941887

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18941887

Country of ref document: EP

Kind code of ref document: A1