WO2020107351A1

WO2020107351A1 - Model training method and nodes thereof, network and storage device

Info

Publication number: WO2020107351A1
Application number: PCT/CN2018/118291
Authority: WO
Inventors: 袁振南; 朱鹏新
Original assignee: 袁振南; 区链通网络有限公司
Priority date: 2018-11-29
Filing date: 2018-11-29
Publication date: 2020-06-04
Also published as: CN109690530A

Abstract

Disclosed in the present application are a model training method and nodes thereof, a network and a storage device. The method is applied to a decentralized network comprising at least one group of nodes, wherein each group of nodes comprises at least one node, and at least some of the nodes are used for training to obtain model parameters of the model. The method comprises: using a preset decentralized training policy for a current node in the present group to obtain in-group parameters for the model; and obtaining model parameters for the model by using the in-group parameters and weights of out-group neighbor nodes of the current node with respect to the current node. In the above manner, the training of a model based on a decentralized network can be implemented.

Description

Model training method and its node, network and storage device

【Technical Field】

This application relates to the field of blockchain technology, in particular to a model training method and its node, network and storage device.

【Background technique】

At present, it is usually necessary to use various data models to realize information processing, such as image recognition using a recognition model. Nowadays, decentralized networks have been widely used in various fields due to their high reliability. A decentralized network contains multiple nodes, and there is no central node in the network. When performing the above information processing, each node in the decentralized network can be used to cooperate to realize the information processing using the model. That is, each node uses its corresponding model to process the input information to output the result.

Before performing the above information processing by using the model, it is necessary to train to obtain the relevant model. For the decentralized network, because it does not have a central node, it is impossible to train the model based on the central node or the parameter node.

[Invention content]

The technical problem that this application mainly solves is to provide a model training method and its nodes, network and storage device to realize the training of the model based on the decentralized network.

In order to solve the above technical problems, the first aspect of the present application provides a model training method, which is applied to a decentralized network containing at least one group of nodes, where each group of nodes includes at least one node, and at least part of the nodes Used for training to obtain the model parameters of the model; the method includes: the current node adopts a preset decentralized training strategy within the group to obtain the group parameters for the model; using the group parameters and the The weights of neighboring nodes outside the current node group with respect to the current node obtain model parameters for the model.

In order to solve the above technical problems, a second aspect of the present application provides a node of a decentralized network, including a processor and a memory and a communication circuit coupled to the processor, wherein the communication circuit is used to Communicate with other nodes of the centralized network; the memory is used to store program instructions; the processor is used to run the program instructions to perform the above method.

In order to solve the above technical problems, a third aspect of the present application provides a decentralized network. The decentralized network includes at least one group of nodes, and each group of nodes includes at least one of the foregoing nodes.

In order to solve the above technical problems, a fourth aspect of the present application provides a storage device that stores program instructions, and when the program instructions run on a processor, execute the method described in the first aspect above.

The above scheme adopts a preset decentralized training strategy within the group to obtain the parameters within the group, and then uses the weights of the neighbor nodes outside the group to weight the parameters within the group, so as to realize the use of its common parameters in the decentralized network. The node can get the model parameters of the model without the central node. .

【Explanation】

FIG. 1 is a schematic structural diagram of an embodiment of a decentralized network of this application;

2 is a schematic flowchart of an embodiment of a model training method of this application;

3A is a schematic flowchart of step S220 in another embodiment of the model training method of the present application;

3B is a schematic flowchart of step S220 in another embodiment of the model training method of the present application;

4 is a schematic flowchart of still another embodiment of the model training method of the present application;

5 is a schematic structural diagram of an embodiment of a node of a decentralized network of this application;

6 is a schematic structural diagram of an embodiment of a storage device of the present application.

【detailed description】

In order to better understand the technical solution of the present application, the following describes the embodiments of the present application in detail with reference to the accompanying drawings.

The terms used in the embodiments of the present application are only for the purpose of describing specific embodiments, and are not intended to limit the present application. The singular forms "a", "said" and "the" used in the embodiments of the present application and the appended claims are also intended to include the majority forms unless the context clearly indicates other meanings. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more associated listed items. In addition, "multiple" in this article means at least 2.

Please refer to FIG. 1, which is a schematic structural diagram of an embodiment of a decentralized network of the present application. In this embodiment, the decentralized network 10 includes multiple nodes 11, wherein the multiple nodes 11 are divided into at least one group of nodes, and each group of nodes includes at least one node 11. As shown in FIG. 1, in the decentralized network cluster 10, multiple nodes 11 are divided into three groups, and each group includes three nodes 11. Specifically, the node 11 may be a communication device such as a mobile phone, a tablet computer, a computer, or a server.

There is no central node in the decentralized network 10, and the nodes 11 can directly communicate with each other, and it is not necessary for all nodes 11 to communicate through the central node. For example, the nodes 11 in the group can communicate with each other, and each node 11 can communicate with at least one node 11 of each other group, wherein at least one node 11 in another group that communicates with the node 11 is called the node 11 Communication node outside the group. In this embodiment, except for the communication nodes outside the group, the node 11 cannot directly communicate with other nodes 11 in other groups.

In this embodiment, the decentralized network 10 may be used to create models and use the created models for data processing.

Specifically, each node 11 of the decentralized network 10 may be used to perform information processing by using a model in coordination with each node in the decentralized network when performing the above information processing. That is, each node uses its corresponding model to process the input information to output the result. In an application scenario, each node may be responsible for different parts of the model. For example, the model is a neural network model, and different network layers of the neural network model are assigned to different nodes, so that different nodes and different model processing parts , That is, parallelize the model; in another reference scenario, each node is responsible for all parts of the model, for example, different nodes have multiple copies of the same model, each node is assigned to a part of the data, and then all The calculation results of the nodes are combined in a certain way.

Before using the model for information processing, the decentralized network 10 may first perform model training to obtain model parameters of the model, and then use the model corresponding to the model parameters to implement information processing as shown above.

In this embodiment, each node in the decentralized network 10 is used to train to obtain the model parameters of the model. For example, each group of nodes 11 of the decentralized network 10 is first trained to obtain the group parameters of the model, and then the weights of neighbor nodes of different groups and the group parameters are used to obtain the model parameters of the model. Further, in order to obtain accurate model parameters, the model parameters may be iterated as many times as shown above.

For better understanding, the following example lists a training principle for the model parameters of the decentralized network of this application. In this example, the above grouped decentralized network is used to implement a machine learning algorithm that optimizes the objective function, and then realizes the training of model parameters, where the objective function can be optimized based on the gradient descent method. Specifically, the model parameter training method of the decentralized network is equivalent to solving the following objective function J:

Where J _k (x) is the sub-objective function of the k-th node 11, N is the number of nodes in the decentralized network,

Means that J is defined as

In this example, the parameter training method of the decentralized network is to let all nodes in the decentralized network 10 optimize sub-target data based on local data, and then exchange iterative parameters with other nodes in the decentralized network. A certain number of iterations can make the solutions of all nodes in the decentralized network 10 converge to an approximate solution of the objective function, such as an unbiased optimal solution, and then obtain model parameters of the model.

Based on the above training principle or other similar training principles, the decentralized network can realize the training of its model. Specifically, the decentralized network can use the following training methods to train its model, and then obtain model parameters.

Please refer to FIG. 2, which is a schematic flowchart of an embodiment of a model training method of the present application. In this embodiment, the method is applied to the decentralized network as described above, and each node of the decentralized network is trained to obtain the model parameters of the model. Specifically, the method includes the following steps:

S210: The current node adopts a preset decentralized training strategy in the group to obtain the group parameters for the model.

The current node is any node in the above decentralized network.

In this embodiment, the model parameters of the model can be obtained by iterative training. Specifically, the current node can use the model parameters obtained from its previous iteration to adopt this preset decentralized training strategy in this group for this iteration training to obtain the group parameters of the model for this iteration, and then step In S220, the local iteration model parameters are obtained using the local iteration group parameters. Therefore, the model parameters are continuously updated iteratively using this training method. After iterating to a certain number of times, the model parameters converge, and the converged model parameters can be taken as the final training model parameters.

Specifically, the preset decentralized training strategy includes but is not limited to the following strategies: gossip-based training strategy, incremental training strategy, consensus training strategy or diffusion training strategy. The diffusion training strategy can be specifically (A Multitask Diffusion Strategy with Optimized Inter-Cluster Cooperation). The above training strategy can be used to iterate the model parameters to obtain an unbiased optimal solution. For example, when the probability of any node being selected in the random strategy of the gossip training method reaches a uniform random distribution, the solutions of all nodes converge to the unbiased optimal solution. The other three strategies can also converge to the unbiased optimal solution.

Among them, the gossip-based training strategy refers to that each node in the network periodically passes a certain random strategy from all nodes, only exchanges parameters with one other node at a time, and iterates; the model parameters of node k at the tth iteration The update process of w _k,t can be as follows:

Among them, w _k,t-1 is the model parameter of node k at t-1th iteration, l is the sequence number of other neighbor nodes randomly selected, w _l,t-1 is the node l at t-1th iteration Model parameters. For the application of this training strategy to the nodes in the group in this application, the training strategy based on gossip can be understood as using each node in the group where the current node is located to periodically pass a certain random strategy from the group where it belongs, Exchange parameters with other nodes in a group and iterate. Among them, Gossip is a decentralized, fault-tolerant protocol that guarantees eventual consistency.

In the same way, the incremental training strategy is to iterate the model parameters using the following formula _{, and the} update process of the model parameters w _{k,t of the} node k at the t-th iteration can be as follows:

Among them, w _{k, t-1} is the model parameter of node k in the t-1th iteration; u is the iteration factor, for example, the value of 0-1; N represents the number of nodes in the network,

Represents the gradient, J _k (w _k,t-1 ) is the variable of node k is the objective function of the model parameter w,

It represents the gradient value of the objective function after substituting specific model parameters. For the application of this training strategy to the nodes in the group in this application, the above formula can be appropriately transformed to obtain a specific algorithm for performing an incremental training strategy on the nodes in the group.

Consensus training strategy, that is to use the following formula to iterate the model parameters, the node k at the tth iteration of the model parameter w _{k, t} update process can be as follows:

Where w _k,t-1 is the model parameter of node k at the t-1th iteration, N _k represents all the sequence numbers of the neighbor nodes of node k, and w _l,t-1 is the neighbor node l at the t-1th time right neighbor node l iterative model parameters, c _lk weighting factor for the node k, u _k is the weight of the composition gradient weighting factor,

It represents the gradient value of the objective function after substituting specific model parameters. For the application of this training strategy to the nodes in the group in this application, the above formula can be appropriately transformed to obtain an algorithm for performing a consensus strategy on the nodes in the group, as described in detail below.

To extend the training strategy, iterate the model parameters using the following formula _{, and the} update process of the model parameters w _k,t at the t-th iteration of node k can be as follows:

Among them, w _{k, t-1} is the model parameter of node k at the t-1th iteration, N _k represents all the sequence numbers of the neighbor nodes of node k, _clk is the weight factor of neighbor node l of node k, and u _k is The weighting factor of the combined gradient,

Represents the gradient, J _k (w _k,t-1 ) is the variable of node k is the objective function of the model parameter w

It represents the gradient value of the objective function after substituting specific model parameters. For the application of this training strategy to the nodes in the group in this application, the above formula can be appropriately transformed to obtain an algorithm for expanding the training strategy for the nodes in the group, as described in detail below.

For the specific training strategy, reference may also be made to existing related technologies, which will not be repeated here.

In order to facilitate everyone's understanding, the following uses a consensus training strategy and an extended training strategy to implement iterative update of the parameters in the group to describe this step in detail.

In the first example, referring to FIG. 3A, this S210 adopts an extended training strategy to implement iterative update of the parameters in the group, specifically including the following sub-steps:

S311: The current node obtains the initial parameters of the current node in this iteration by using the model parameters obtained by itself in the previous iteration and the reference parameters in this iteration.

For example, the current node may use the following formula (1) to obtain the initial parameter Ψ _{k,t of} the current node in this iteration;

Ψ _k,t = w _k,t-1 +u _k r _k,t (d _k,t -r _k,t w _k,t-1 ) (1)

Wherein, this time is the t-th iteration, the previous time is the t-1th iteration, the k is the sequence number of the current node, and the w _k,t-1 is the model parameters obtained by the current node in the previous iteration, The u _k , r _k,t and d _k,t are reference parameters of this iteration, where the u _k represents a weighting factor; the r _k,t represents a random factor; and the d _k,t =r _{k ,t} ·ρ+v _k,t , the ρ is a hyperparameter, and the v _k,t is a random parameter. In this embodiment, the u _k is a set of weighting factors whose sum is one; the v _k,t is a set of random parameters with zero mean, that is, the v _k,t is a random number between -1 and 1, And the average value of v _k,t distribution is 0.

S312: The current node obtains the parameters of the current node in the group of the model in the current iteration according to the initial parameters of the current node in the iteration and the initial parameters of the other nodes in the group in the iteration.

Specifically, the current node uses the following formula (2) to obtain the parameter Φ _{k,t of} the current node in this iteration group;

Among them, this time is the t-th iteration, the k is the sequence number of the current node, the G _k represents the sequence number of the node in the group, the g _{l is} the weight of the node l in the group relative to the current node, The Ψ _{l, t is} the initial parameter of the node l in this group in this iteration.

Therefore, the initial parameters of the current node in this iteration can be obtained from the above formulas 1 and 2.

In the second example, referring to FIG. 3B, this S210 adopts a consensus training strategy to implement iterative update of the parameters within the group, specifically including the following sub-steps:

S313: The current node obtains the initial parameters of the current node in the local iteration by using the model parameters obtained by itself in the previous iteration and the weights of other nodes in the group relative to the current node.

Specifically, the current node may use the following formula (3) to obtain the initial parameter Ψ _{k,t-1 of} the current node in this iteration;

Wherein, this time is the t-th iteration, the previous time is the t-1th iteration, the k is the sequence number of the current node, and the w _k,t-1 is the modulus parameter obtained by the current node in the previous iteration, The G _k represents the serial number of the node in the group, and the g _{l is} the weight of the node l in the group relative to the current node.

S314: The current node obtains the parameters of the current node in the group of the model in the current iteration according to the initial parameters of the current node in the iteration and the reference parameters of the group of iterations.

Specifically, the current node uses the following formula (4) to obtain the parameter Φ _{k,t of} the current node in this iteration group;

Φ _k,t = Ψ _k,t-1 +2u _k r _k,t (d _k,t -r _k,t w _k,t-1 ) (4)

Therefore, the initial parameters of the current node in this iteration can be obtained from the above formulas 3 and 4.

S220: The current node obtains the model parameters for the model by using the parameters in the group and the weights of the neighbor nodes outside the group and the current node relative to the current node.

In this embodiment, if S210 uses the iterative method to obtain the in-group parameters of the current node for the model in this iteration, then in step S220, the current node reuses the current node's The in-group parameters and the weights of the out-of-group neighbor nodes of the current node relative to the current node obtain the model parameters for the model in this iteration. The parameters of the local iteration are used to obtain the model parameters of the local iteration. Specifically, the current node pre-stores the weight of each out-of-group neighbor node in the decentralized network relative to the current node, where the out-of-group neighbor node of the current node is a group different from the current node and adjacent to the current node There can be one or more nodes. After the current node obtains the in-group parameters of the model for this iteration, it can add the product between the in-group parameters of the model for this iteration and the weights of the pre-stored neighbor nodes of each group as the current node Model parameters for the model.

For example, the current node can use the following formula (5) to obtain the model parameter w _{k,t of} the current node for the model in this iteration;

Here, this is the t-th iteration, the k is the sequence number of the current node, the N _k represents the sequence number of the out-group neighbor node of the current node, and the _cl is the out-group neighbor node l relative to the The weight of the current node. The Φ _k,t is the parameter of the current node in the group of this iteration.

In this embodiment, a preset decentralized training strategy is adopted within the group to obtain the in-group parameters of the model, and then the weights of the out-group neighbor nodes are used to weight the in-group parameters to achieve utilization in the decentralized network Its ordinary nodes can get the model parameters of the model, without the central node. Moreover, in the process of using iterative training, the parameters within the model group are obtained within the group first, and then the model parameters are obtained by weighting the components, which improves the convergence speed of the model parameters. For example, considering the characteristics of a grouped decentralized network, in order to enable the algorithm to converge to an asymptotically unbiased optimal solution faster during parallel training, this embodiment adopts the above-mentioned preset decentralization within the group. The training strategy updates the parameters and then merges between the groups. Further, in order to achieve faster convergence, the aforementioned extended training strategy may be adopted.

Further, in order to prevent the problem of leakage of data transmitted between nodes, the above-mentioned group parameters and/or model parameters may be subjected to noise processing. For example, after the above S210, the preset node is used to denoise the parameters of the current node in the group of this iteration, and the parameters in the group after the noise are updated to the current node in the group of this iteration Parameters; further, after sub-steps S311 or S313 in S210, the initial parameters of the current node in this iteration can be denoised using preset noise, and the initial parameters after denoising are updated to the current The initial parameters of the node in this iteration. After the above S220, the preset model noise is used to denoise the model parameters of the model in this iteration, and the denoised model parameters are updated to the model parameters of the model in the current iteration . Wherein, the aforementioned preset noise is differential privacy noise, for example, laplacian random noise. Specifically, the Laplace random noise may be L(F, ε); wherein, ε is a differential privacy parameter that satisfies ε, and F is a differential privacy sensitivity of a preset model training objective function J _k , The preset model may be a neural network model. It can be understood that, in other embodiments, the above-mentioned noise adding process may be performed only on a part of the above-mentioned group parameters, initial parameters, and model parameters.

Please refer to FIG. 4, which is a flowchart of still another embodiment of the model training method of the present application. In this embodiment, the method is applied to the decentralized network as described above, and each node of the decentralized network is trained to obtain the model parameters of the model. This method uses the extended training strategy to train within the group to obtain the parameters within the group, and then weights between the groups to obtain the model parameters, and performs differential privacy denoising on the parameters of the extended training process and the final model parameters to prevent data indirection Give way. Specifically, the method includes the following steps:

S410: The current node obtains the initial parameters of the current node in this iteration by using the model parameters obtained by itself in the previous iteration and the reference parameters in this iteration.

Specifically, the current node can use the formula (1) as described above, the model parameters w _{k, t-1} obtained from the previous iteration of itself _, and the reference parameters u _k , r _{k, t} , d _{k, t} to obtain the initial parameter Ψ _{k,t of} the current node in this iteration.

S420: The current node uses the preset noise to add noise to the initial parameter of the current node in this iteration, and updates the initial parameter after noise addition to the initial parameter of the current node in this iteration.

In this embodiment, the preset noise is added when the gradient expansion update in the current node calculation group. The preset noise is Laplace random noise. The current node uses the following formula (6) to increase the preset noise and the number of neighbor nodes in the current node's group for the initial parameter Ψ _k,t of the current node in this iteration

And update the initial parameter after noise addition to the initial parameter Ψ′ _k,t of the current node in this iteration.

Where L(F, ε) is Laplace random noise, ε is the differential privacy parameter that satisfies ε, and F is the differential privacy sensitivity of the neural network model training objective function J _k ;

The number of neighbor nodes in the current node group.

S430: The current node obtains the parameters of the current node in the group of the model in the current iteration according to the initial parameters of the current node in the iteration and the initial parameters of the other nodes in the group in the iteration.

Specifically, the current node can use the following formula (7) to obtain the parameter Φ _{k,t of} the current node in this iteration group;

Among them, this time is the t-th iteration, the k is the sequence number of the current node, the G _k represents the sequence number of the node in the group, the g _{l is} the weight of the node l in the group relative to the current node, The Ψ _{l, t} ′ is the initial parameter of the node l in the group after the noise is added in this iteration.

S440: The current node uses the in-group parameters of the current node to the model in this iteration and the weights of the out-group neighbor nodes of the current node relative to the current node to obtain the model parameters of the model in this iteration.

Specifically, the current node may use the formula (5) as described above, and the current node’s in-group parameter Φ _k,t and the weight c _{l of the out-of-} group neighbor nodes relative to the current node in this iteration are used to obtain the current The node's model parameters w _k,t at this iteration.

S450: The current node uses preset noise to denoise the model parameters of the model in this iteration, and updates the model parameters after the denoising to the model parameters of the model in this iteration.

For example, the preset noise is Laplace random noise. The current node uses the following formula (8) to increase the preset noise and the number of out-of-group neighbor nodes of the current node's model parameters w _k,t in this iteration

And update the model parameters after noise addition to the model parameters w _k,t ′ of the current node in this iteration.

The number of neighbors outside the group of the current node.

In this embodiment, the strategy of adopting the update strategy of the nodes in the group first using the diffusion strategy and then merging with the nodes outside the group can accelerate the convergence speed of distributed optimization, and at the same time, the noise of differential privacy can prevent the problem of indirect data leakage.

Please refer to FIG. 5, which is a schematic structural diagram of an embodiment of a node of a decentralized network of the present application. In this embodiment, the node 50 may be a node in the decentralized network as described in FIG. 1 and includes a memory 51, a processor 52, and a communication circuit 53. The communication circuit 53 and the memory 51 are respectively coupled to the processor 52. Specifically, each component of the node 50 may be coupled together through a bus, or the processor of the node 50 may be connected to other components one by one. The node 50 may be any communication device such as a mobile phone, a notebook, a desktop computer, and a server.

The communication circuit 53 is used to communicate with other nodes in the decentralized network. For example, the communication circuit 53 may communicate with nodes in the group in the decentralized network to obtain initial parameters of previous iterations of other nodes in the group.

The memory 51 is used to store program instructions executed by the processor 52 and data during processing of the processor 52, wherein the memory 51 includes a non-volatile storage part for storing the above-mentioned program instructions. Furthermore, the memory 51 may also store account related data.

The processor 52 controls the operation of the node 50, and the processor 52 may also be called a CPU (Central Processing Unit, central processing unit). The processor 52 may be an integrated circuit chip with signal processing capabilities. The processor 52 may also be a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), an off-the-shelf programmable gate array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components . The general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

In this embodiment, the processor 52 uses program instructions stored in the memory 51 to: use the preset decentralized training strategy in the group to obtain the group parameters of the model; use the group parameters and The weight of the neighboring node outside the group with the current node relative to the current node obtains the model parameters of the model.

In some embodiments, the processor 52 executes the use of a preset decentralized training strategy in the group to obtain the group parameters for the model, including: using the model parameters obtained by the previous iteration of itself in the group The preset decentralized training strategy is used for this iterative training to obtain the in-group parameters of the model for this iteration; the processor 52 executes the use of the in-group parameters and the out-of-group neighbor nodes with the current node Relative to the weight of the current node, the model parameters for the model are obtained, including: using the current node to compare the in-group parameters of the model and the out-group neighbor nodes of the current node in this iteration The weight of the current node obtains the model parameters of the model in this iteration.

In some embodiments, the preset decentralized training strategy includes a gossip-based training strategy, an incremental training strategy, a consensus training strategy, or a diffusion training strategy.

In some embodiments, the processor 52 executes the model parameters obtained by using the previous iteration, and adopts a diffusion training strategy to perform this iteration training in this group to obtain the group parameters of the model for this iteration, including: Use the model parameters obtained from the previous iteration and the reference parameters of this iteration to obtain the initial parameters of the current node in this iteration; according to the initial parameters of the current node in this iteration and other nodes in this group in this iteration The initial parameters of the current node to obtain the parameters of the current node within the group of the model in this iteration.

Further, the processor 52 executes the model parameters obtained by using the previous iteration of itself and the reference parameters of this iteration to obtain the initial parameters of the current node at this iteration, which may specifically include: using the formula (1) described above ) Obtain the initial parameter Ψ _{k,t of} the current node in this iteration.

Further, the processor 52 executes the process according to the initial parameters of the current node at this iteration and the initial parameters of other nodes of the group at this iteration to obtain the current node in the group of the model in this iteration The parameters may specifically include: using the formula (2) described above to obtain the parameter Φ _{k,t of} the current node in the group of this iteration.

In some embodiments, the processor 52 executes the model parameters obtained by using the previous iteration, and adopts a consensus training strategy to perform this iteration training within the group to obtain the group parameters of the model for this iteration, including: Use the model parameters obtained by the previous iteration of itself and the weights of other nodes in the group relative to the current node to obtain the initial parameters of the current node in the local iteration; according to the initial parameters of the current node in this iteration and the reference of the group of iteration Parameters to get the parameters of the current node within the group of the model in this iteration.

Further, the processor 52 executes the model parameters obtained from the previous iteration of itself and the weights of other nodes in the group relative to the current node to obtain the initial parameters of the current node in the local iteration, which may specifically include: using the above Formula (3) obtains the initial parameter Ψ _{k,t-1 of} the current node in this iteration.

Further, the processor 52 executes the parameters according to the initial parameters of the current node in the current iteration and the reference parameters of the set of iterations to obtain the parameters of the current node for the model in the current iteration, which may specifically include : Use the formula (4) described above to obtain the parameter Φ _{k,t of} the current node in the group of this iteration.

In some embodiments, the processor 52 is further configured to: use preset noise to add noise to the initial parameters of the current node in this iteration, and update the initial parameters after noise addition to the current node at this time The initial parameters of the iteration.

Further, the processor 52 executes the use of preset noise to add noise to the initial parameter of the current node in this iteration, which may specifically include: adding the preset to the initial parameter of the current node in this iteration Noise and the number of neighbor nodes in the current node group

In some embodiments, the processor 52 is further configured to: use preset noise to denoise the model parameters of the model in this iteration, and update the model parameters after denoising to the Iterate over the model parameters of the model.

Further, the processor 52 executes the process of adding noise to the model parameters of the model in the current iteration using preset noise, which may specifically include: modeling the model for the current node in the current iteration The parameter increases the preset noise and the number of neighbor nodes outside the group of the current node

In some embodiments, the preset noise is Laplace random noise. Further, the Laplacian random noise may be: L(F, ε); wherein, ε is a differential privacy parameter satisfying ε, and F is a differential privacy sensitivity of a preset model training objective function J _k .

In some embodiments, the processor 52 executes the use of the current node in this iteration for the in-group parameters of the model and the weight of the out-group neighbor nodes of the current node relative to the current node to obtain The model parameters of the model in this iteration include: using the above formula (5) to obtain the model parameters w _{k,t of} the current node for the model in this iteration.

The above processor 52 is also used to execute the steps of any of the above method embodiments.

Please refer to FIG. 6, the present application also provides a schematic structural diagram of an embodiment of a storage device. In this embodiment, the storage device 60 stores program instructions 61 executable by the processor, and the program instructions 61 are used to execute the method in the foregoing embodiment.

The storage device 60 may specifically be a medium that can store program instructions, such as a U disk, a mobile hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disk, or It may also be a server that stores the program instructions. The server may send the stored program instructions to other devices for operation, or it may run the stored program instructions by itself.

In an embodiment, the storage device 60 may also be a memory as shown in FIG. 5.

In the above scheme, the preset decentralized training strategy is adopted in the group to obtain the in-group parameters of the model, and then the weights of the out-group neighbor nodes are used to weight the in-group parameters to achieve the use of the Ordinary nodes can get the model parameters of the model, without a central node. Moreover, in the process of using iterative training, the parameters within the model group are obtained within the group first, and then the model parameters are obtained by weighting the components, which improves the convergence speed of the model parameters.

In the several embodiments provided in this application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the device implementation described above is only schematic. For example, the division of modules or units is only a division of logical functions. In actual implementation, there may be other divisions, for example, multiple units or components may be combined or Can be integrated into another system, or some features can be ignored, or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application essentially or part of the contribution to the existing technology or all or part of the technical solution can be embodied in the form of a software product, the computer software product is stored in a storage medium , Including several instructions to enable a computer device (which may be a personal computer, server, or network device, etc.) or processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage media include: U disk, mobile hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), magnetic disk or optical disk and other media that can store program instructions .

The above are only the embodiments of the present application, and therefore do not limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made by using the description and drawings of this application, or directly or indirectly used in other related technical fields, The same reason is included in the scope of patent protection of this application.

Claims

A model training method, characterized in that the method is applied to a decentralized network containing at least one group of nodes, wherein each group of nodes includes at least one node, and at least part of the nodes are used for training to obtain the model Model parameters

The method includes:

The current node uses the preset decentralized training strategy in this group to obtain the group parameters for the model;

Using the in-group parameters and the weights of the out-of-group neighbor nodes with the current node relative to the current node, the model parameters for the model are obtained.
The method according to claim 1, wherein the use of a preset decentralized training strategy in the group to obtain the group parameters for the model includes:

Using the model parameters obtained from the previous iteration of itself, the iterative training is performed in the group using a preset decentralized training strategy to obtain the group parameters of the model for this iteration;

The use of the in-group parameters and the weight of the out-of-group neighbor nodes with the current node relative to the current node to obtain model parameters for the model includes:

Using the current node's in-group parameters of the model in this iteration and the weight of the current node's out-of-group neighbor nodes relative to the current node, the model parameters of the model in this iteration are obtained.
The method according to claim 2, wherein the preset decentralized training strategy comprises a gossip-based training strategy, an incremental training strategy, a consensus training strategy or a diffusion training strategy.
The method according to claim 3, wherein the model parameters obtained from the previous iteration are used in this group to adopt the diffusion training strategy for this iteration training to obtain the group parameters of the model for this iteration ,include:

Use the model parameters obtained from the previous iteration of itself and the reference parameters of this iteration to obtain the initial parameters of the current node at this iteration;

According to the initial parameters of the current node in the current iteration and the initial parameters of the other nodes in the group in the current iteration, the parameters of the current node in the group of the model in the current iteration are obtained.
The method according to claim 4, wherein the using the model parameters obtained by the previous iteration of itself and the reference parameters of this iteration to obtain the initial parameters of the current node at this iteration includes:

Use the following formula to obtain the initial parameter Ψ k,t of the current node in this iteration;

Ψ k,t = w k,t-1 +u k r k,t (d k,t -r k,t w k,t-1 )

Wherein, this time is the t-th iteration, the previous time is the t-1th iteration, the k is the sequence number of the current node, and the w k,t-1 is the model parameters obtained by the current node in the previous iteration, The u k , r k,t and d k,t are reference parameters of this iteration, where the u k represents a weighting factor; the r k,t represents a random factor; and the d k,t =r k ,t ·ρ+v k,t , the ρ is a hyperparameter, and the v k,t is a random parameter.
The method according to claim 4, characterized in that, according to the initial parameters of the current node in this iteration and the initial parameters of other nodes in this group in this iteration, the pair of the current node in this iteration is obtained Parameters within the group of the model include:

Use the following formula to obtain the parameter Φ k,t of the current node in the group of this iteration;

Among them, this time is the t-th iteration, the k is the sequence number of the current node, the G k represents the sequence number of the node in the group, the g l is the weight of the node l in the group relative to the current node, The Ψ l, t is the initial parameter of the node l in this group in this iteration.
The method according to claim 3, wherein the model parameters obtained from the previous iteration are used in this group to adopt a consensus training strategy for this iteration training to obtain the group parameters of the model for this iteration ,include:

Use the model parameters obtained by the previous iteration of itself and the weights of other nodes in the group relative to the current node to obtain the initial parameters of the current node in the local iteration;

According to the initial parameters of the current node in this iteration and the reference parameters of the group of iterations, the parameters of the current node in the group of the model in this iteration are obtained.
The method according to claim 7, wherein the using the model parameters obtained by the previous iteration of itself and the weights of other nodes in the group relative to the current node to obtain the initial parameters of the current node in the local iteration includes:

Use the following formula to obtain the initial parameter Ψ k,t-1 of the current node in this iteration;

Wherein, this time is the t-th iteration, the previous time is the t-1th iteration, the k is the sequence number of the current node, and the w k,t-1 is the model parameters obtained by the current node in the previous iteration, The G k represents the serial number of the node in the group, and the g l is the weight of the node l in the group relative to the current node;

According to the initial parameters of the current node in the current iteration and the reference parameters of the group of iterations, the parameters of the current node in the group of the model in the current iteration include:

Use the following formula to obtain the parameter Φ k,t of the current node in the group of this iteration;

Φ k,t = Ψ k,t-1 +2u k r k,t (d k,t -r k,t w k,t-1 )

Where u k , r k,t and d k,t are reference parameters of this iteration, where u k represents a weighting factor; r k,t represents a random factor; and d k,t = r k,t ·ρ+v k,t , the ρ is a hyperparameter, and the v k,t is a random parameter.
The method according to any one of claims 4 to 8, wherein the model parameters obtained from the previous iteration of itself and the reference parameters of this iteration are used to obtain the current node after the initial parameters of this iteration, The method also includes:

Use preset noise to add noise to the initial parameter of the current node in this iteration, and update the initial parameter after noise addition to the initial parameter of the current node in this iteration.
The method according to claim 9, wherein the use of preset noise to add noise to the initial parameters of the current node at this iteration includes:

Adding the preset noise and the number of neighbor nodes in the current node group to the initial parameter of the current node in this iteration
The method according to claim 2, characterized in that, in the current iteration, the in-group parameters of the model and neighbor nodes outside the group of the current node are compared with the current node in the current iteration using the current node After obtaining the model parameters of the model in this iteration, the method further includes:

Use preset noise to denoise the model parameters of the model in the current iteration, and update the model parameters after the noise to the model parameters of the model in the current iteration.
The method according to claim 11, wherein the use of preset noise to denoise the model parameters of the model at this iteration includes:

Adding the preset noise and the number of out-of-group neighbor nodes of the current node to the model parameters of the model for the current node in this iteration
The method according to any one of claims 7 to 12, characterized in that

The preset noise is Laplace random noise; the Laplace random noise is: L(F, ε); wherein, ε is a privacy privacy parameter that satisfies ε, and F is a preset model training The differential privacy sensitivity of the objective function J k .
The method according to claim 2, characterized in that the current node is used in this iteration to compare the in-group parameters of the model and the out-group neighbor nodes of the current node with respect to the current node The weights to obtain the model parameters of the model in this iteration include:

Use the following formula to obtain the model parameters w k,t of the current node for the model in this iteration;

Here, this is the t-th iteration, the k is the sequence number of the current node, the N k represents the sequence number of the out-group neighbor node of the current node, and the cl is the out-group neighbor node l relative to the The weight of the current node. The Φ k,t is the parameter of the current node in the group of this iteration.
A node of a decentralized network is characterized by comprising a processor, a memory and a communication circuit coupled to the processor, wherein,

The communication circuit is used to communicate with other nodes of the decentralized network;

The memory is used to store program instructions;

The processor is configured to execute the program instructions to perform the method according to any one of claims 1 to 14.
A decentralized network, characterized in that the decentralized network includes at least one group of nodes, and each group of nodes includes at least one node according to claim 15.
A storage device, characterized in that the storage device stores program instructions, and when the program instructions run on a processor, the method according to any one of claims 1-14 is executed.