CN116862025A - Model training method, system, client and server node, electronic device and storage medium - Google Patents
- Publication number
- CN116862025A (application number CN202310928680.2A)
- Authority
- CN
- China
- Prior art keywords
- training
- node
- client node
- round
- training period
- Prior art date
- Legal status: Pending (assumed, not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
Abstract
The application provides a model training method, a model training system, a client node and a server node, an electronic device, and a storage medium. The method is applied to a client node and comprises the following steps: acquiring the time constant and the global model aggregation parameters of the current training round sent by a server node; updating the parameters of the local model of the client node with the global model aggregation parameters; determining the node type of the client node, specifically a fast node or a slow node, by comparing the time constant with the training duration of the client node in the previous training round; in the case that the client node is a fast node, training the updated local model with the sample set of the client node; and in the case that the client node is a slow node, dividing the sample set into n non-empty sub-sample sets and selecting p of them to train the updated local model, thereby reducing the idling of computing resources.
Description
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a model training method, a model training system, a client and server node, electronic equipment and a storage medium.
Background
Currently, with the rapid development of artificial intelligence technology, a large model of artificial intelligence is applied to multiple fields such as image analysis, speech recognition, word processing, intelligent recommendation, safety detection and the like, and a model training method represented by federal learning (Federated Learning) has become a new leading-edge technical hotspot field.
In the model training method of federal learning, a server node interfaces with a plurality of client nodes, and the client nodes and the server node cooperatively complete training of a model. However, due to the difference in computing performance between different client nodes, in the collaborative training process, the client node with relatively higher computing performance can complete its own training task relatively quickly, which often results in that such client node needs to wait for the client node with relatively lower computing performance to complete the training task, and further results in the idling of the overall computing resource, and influences the utilization efficiency of the computing resource.
Disclosure of Invention
The embodiment of the application aims to provide a model training method, a system, a client and server node, electronic equipment and a storage medium, which are used for solving the problems in the prior art.
A first aspect of an embodiment of the present application provides a model training method, where the method is applied to a client node, and the method includes:
acquiring a time constant and a global model aggregation parameter of a training period of the round sent by a server node;
updating parameters of a local model of the client node by utilizing the global model aggregation parameters;
determining a node type of the client node according to the comparison of the time constant and the training duration of the client node in the last round of training period, wherein the node type is specifically a fast node or a slow node;
training the updated local model by using a sample set of the client node under the condition that the client node is a fast node; or,
and under the condition that the client node is a slow node, dividing the sample set into n non-empty sub-sample sets, and selecting p non-empty sub-sample sets from the n non-empty sub-sample sets to train the updated local model, wherein n is a positive integer greater than 1, p is a positive integer greater than or equal to 1, and n is greater than p.
In one embodiment, the method further comprises: calculating the value of n according to the ratio of the training duration to the time constant, wherein the value of n is greater than or equal to the ratio; and dividing the sample set into n non-empty sub-sample sets specifically includes: dividing the sample set equally into n non-empty sub-sample sets.
In one embodiment, the method further comprises: calculating the value of p by multiplying the value of n by the ratio of the time constant to the training duration and rounding down.
In one embodiment, selecting p non-empty sub-sample sets from the n non-empty sub-sample sets to train the updated local model specifically includes:
for the n non-empty sub-sample sets, respectively determining the total number of training rounds in which each non-empty sub-sample set has cumulatively participated over the previous training cycles;
according to the selection sequence of the total number of training rounds from small to large, selecting p non-empty sub-sample sets from the n non-empty sub-sample sets;
and training the updated local model by using the selected p non-empty sub-sample sets.
In an embodiment, in case the client node is a fast node, the method further comprises:
acquiring the training time length of the training period of the round and the updating gradient of the parameters of the local model in the client node;
and sending the training time length and the update gradient to the server node.
In an embodiment, in case the client node is a slow node, the method further comprises:
determining the actual training time length for training the updated local model by the selected p non-empty sub-sample sets;
and calculating the training time length when the sample set is used for training the updated local model according to the actual training time length and the ratio of the total sample amount in the p non-empty sub-sample sets to the total sample amount in the sample set, and taking the training time length as the training time length of the training period of the round.
In an embodiment, in the case that the client node is a slow node, the method further includes calculating the update gradient of the parameters of the local model of the client node in the current training round by the following formula:

$$\tilde{g}_i^k = \frac{1}{p_i}\sum_{j=1}^{p_i}\left(\tilde{g}_{ij}^k - g_{ij}\right) + \bar{g}_i$$

wherein i is the number of the client node; k is the round of the current training period; $\tilde{g}_i^k$ is the update gradient of the parameters of the local model in the current training round; $p_i$ is the number of non-empty sub-sample sets selected in the client node; $\tilde{g}_{ij}^k$ is the actual update gradient of the j-th non-empty sub-sample set with respect to the parameters of the local model in the current training round; $g_{ij}$ is the update gradient of the client node's local model on the j-th non-empty sub-sample set in its last training before the current training round; and $\bar{g}_i$ is an estimate of the average gradient of each non-empty sub-sample set in the data set of the client node.
A second aspect of an embodiment of the present application provides a model training method, where the method is applied to a server node, and the method includes:
respectively acquiring the sample number of sample sets in each client node, the training time length of the last round of training period and the update gradient of the last round of training period;
calculating global model aggregation parameters of the training period of the round by using global model aggregation parameters of the training period of the latest round, the sample number of the sample set in each client node and the update gradient of the training period of the latest round;
calculating the time constant of the training period of the round by using the sample number of the sample set in each client node and the training time length of the last round of training period;
and respectively sending the time constant of the training period of the round and the global model aggregation parameters to each client node.
A third aspect of an embodiment of the present application provides a client node, including:
the acquisition unit is used for acquiring the time constant and the global model aggregation parameter of the round of training period sent by the server node;
an updating unit, configured to update parameters of a local model of the client node by using the global model aggregation parameters;
The comparison unit is used for determining the node type of the client node according to the comparison of the time constant and the training duration of the client node in the last round of training period, wherein the node type is specifically a fast node or a slow node;
the first training unit is used for training the updated local model by utilizing the sample set of the client node under the condition that the client node is a fast node; or,
and the second training unit is used for dividing the sample set into n non-empty sub-sample sets and selecting p non-empty sub-sample sets from the n non-empty sub-sample sets to train the updated local model under the condition that the client node is a slow node, wherein n is a positive integer greater than 1, p is a positive integer greater than or equal to 1, and n is greater than p.
A fourth aspect of an embodiment of the present application provides a server node, including:
the second acquisition unit is used for respectively acquiring the sample number of the sample set in each client node, the training time length of the last round of training period and the update gradient of the last round of training period;
the global model aggregation parameter calculation unit is used for calculating the global model aggregation parameter of the training period of the round by using the global model aggregation parameter of the training period of the latest round, the sample number of the sample set in each client node and the update gradient of the training period of the latest round;
The time constant calculation unit is used for calculating the time constant of the training period of the round by utilizing the sample number of the sample set in each client node and the training duration of the last round of training period;
and the sending unit is used for respectively sending the time constant of the training period of the round and the global model aggregation parameter to each client node.
A fifth aspect of an embodiment of the present application provides a model training system, including a server node and a plurality of client nodes connected to the server node, wherein:
the server node comprises: the second acquisition unit is used for respectively acquiring the sample number of the sample set in each client node, the training time length of the last round of training period and the update gradient of the last round of training period; the global model aggregation parameter calculation unit is used for calculating the global model aggregation parameter of the training period of the round by using the global model aggregation parameter of the training period of the latest round, the sample number of the sample set in each client node and the update gradient of the training period of the latest round; the time constant calculation unit is used for calculating the time constant of the training period of the round by utilizing the sample number of the sample set in each client node and the training duration of the last round of training period; the sending unit is used for respectively sending the time constant of the training period of the round and the global model aggregation parameter to each client node;
The client node comprises: the acquisition unit is used for acquiring the time constant and the global model aggregation parameter of the round of training period sent by the server node; an updating unit, configured to update parameters of a local model of the client node by using the global model aggregation parameters; the comparison unit is used for determining the node type of the client node according to the comparison of the time constant and the training duration of the client node in the last round of training period, wherein the node type is specifically a fast node or a slow node; the first training unit is used for training the updated local model by utilizing the sample set of the client node under the condition that the client node is a fast node; or the second training unit is configured to divide the sample set into n non-null sub-sample sets and select p non-null sub-sample sets from the n non-null sub-sample sets to train the updated local model when the client node is a slow node, where n is a positive integer greater than 1, p is a positive integer greater than or equal to 1, and n is greater than p.
A sixth aspect of an embodiment of the present application provides an electronic device, including:
A processor;
a memory for storing processor-executable instructions; wherein the processor is configured to perform the method according to any of the embodiments of the application.
A seventh aspect of the embodiments of the present application provides a storage medium storing a computer program executable by a processor to perform the method according to any one of the embodiments of the present application.
In the model training method provided by the embodiment of the application, the time constant and the global model aggregation parameters of the current training round sent by the server node are first acquired; the parameters of the local model of the client node are then updated with the global model aggregation parameters; and the node type of the client node is determined by comparing the time constant with the training duration of the client node in the previous training round. If the client node is a fast node, the updated local model is trained with the client node's sample set; if the client node is a slow node, the sample set is divided into n non-empty sub-sample sets and p of them are selected to train the updated local model. In this way, the number of samples used by slow nodes for model training is reduced according to the node type of the client node, so that slow nodes can complete training relatively quickly, which reduces the idling of the overall computing resources and improves their utilization efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a heterogeneous federal learning system according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a model training method for federal learning according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a model training method according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of a model training method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a client node according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a server node according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a model training system according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application. In the description of the present application, terms such as "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance or order.
As described above, in the model training method of federal learning, because there is a difference in computing performance between different client nodes, in the collaborative training process, the client node with relatively higher computing performance can complete its own training task relatively quickly, which often results in that such client node needs to wait for the client node with relatively lower computing performance to complete the training task, thereby resulting in idling of the overall computing resource and affecting the utilization efficiency of the computing resource.
In view of this, the embodiment of the application provides a model training method, a device, an electronic device and a storage medium, which can reduce the idle of the whole computing resource and further improve the utilization efficiency of the computing resource.
As shown in fig. 1, a schematic structural diagram of a heterogeneous federal learning system according to an embodiment of the present application includes a server node 1 and a plurality of client nodes 2, where the server node 1 is connected to each client node 2, so that the server node 1 can communicate with each client node 2. Wherein, each client node 2 stores a respective sample set, and the sample sets (including the number of samples in the sample set) of different client nodes 2 are generally different; in addition, the local model is stored in each client node 2, and the global model is stored in the server node 1.
In the federation learning model training method, as shown in fig. 2, the server node 1 may send a first parameter of a first round training period to each client node 2 in the first round training period, after each client node 2 obtains the first parameter, each client node uses its own sample set to train its own local model, and feeds back a second parameter to the server after the training is completed, and the server node 1 updates its own global model based on the second parameter fed back by each client node 2, and generates the first parameter of the second round training period, thereby completing the model training of the first round training period.
Then in the second round of training period, the server node 1 sends the first parameter of the second round of training period to each client node 2 again, each client node 2 trains the local model of itself again by using the own sample set based on the first parameter after acquiring the first parameter, and feeds back the second parameter to the server after completing training, the server node 1 updates the global model of itself again based on the second parameter fed back by each client node 2, and generates the first parameter of the third round of training period, thus completing the model training of the second round of training period; and by analogy, respectively performing model training from the third training period to the Nth training period until the training termination condition is met.
Here, the first parameter refers to data sent by the server node 1 to the client node 2, and the second parameter refers to data sent by the client node 2 to the server node 1, and specific description will be given in the following embodiments for specific inclusion of the first parameter and the second parameter. In addition, for the nth round training period, the round number of the present round training period is N.
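For orientation, the following Python sketch outlines the message flow of one training round described above; all class, field, and method names (FirstParams, SecondParams, run_round, and so on) are illustrative assumptions rather than identifiers from the patent.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FirstParams:            # server -> clients, per round
    time_constant: float
    global_params: list       # global model aggregation parameters

@dataclass
class SecondParams:           # client -> server, per round
    train_duration: float
    update_gradient: list

def run_round(server, clients: List) -> None:
    first = server.make_first_params()      # time constant + aggregation params
    feedback = []
    for client in clients:                  # in practice, clients run in parallel
        feedback.append(client.train_one_round(first))
    server.update_global_model(feedback)    # aggregate; prepare next round's params
```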
As shown in fig. 3, the model training method provided by the embodiment of the application is applied to the client node 2 in the heterogeneous federal learning system, and can train the local model in the client node 2, and can reduce the idling of the whole computing resource of the heterogeneous federal learning system in the model training process, thereby improving the utilization efficiency of the computing resource. The client node 2 as the application of the method may be any one or more client nodes in the heterogeneous federal learning system, for example, each client node 2 in the heterogeneous federal learning system trains its own local model through the method. For the method, the method specifically comprises the following steps:
step S31: and acquiring the time constant and the global model aggregation parameter of the training period of the round sent by the server node.
The present training cycle may be any one of the second to nth training cycles. For example, the present training period may be a 10 th training period, and the last training period is a 9 th training period (i.e., the training period closest to the present training period among the previous training periods).
In the training period of the round, the server node respectively sends first parameters, namely the time constant of the training period of the round and the global model aggregation parameters, to each client node. Correspondingly, the client node can acquire the time constant and the global model aggregation parameter of the training period of the round, which are sent by the server node.
Step S32: and updating the parameters of the local model of the client node by utilizing the global model aggregation parameters.
In the step S31, the client node obtains the time constant and the global model aggregation parameter of the training period, so in the step S32, the parameters of the local model of the client node itself are further updated by using the global model aggregation parameter, for example, the parameters of the local model of the client node itself are updated to the global model aggregation parameter.
Step S33: and determining the node type of the client node according to the comparison of the time constant and the training time length of the client node in the last round of training period, executing the step S34 if the client node is a fast node, and executing the step S35 if the client node is a slow node.
The node type is specifically a fast node or a slow node, for example, when the time constant is smaller than the training duration of the client node in the last round of training period, the time of the client node in the last round of training period is relatively longer, the training speed is relatively slower, and the node type of the client node is a slow node; or when the time constant is greater than or equal to the training duration of the client node in the last round of training period, the time of the client node in the last round of training period is relatively short, the training speed is relatively high, and the node type of the client node is a fast node.
Thus, in the step S33, the time constant may be compared with the training duration of the client node in the last training period, and if the time constant is less than the training duration, it may be determined that the node type of the client node is a slow node; conversely, if the time constant is greater than or equal to the training period, the node type of the client node may be determined to be a fast node.
Step S34: and under the condition that the client node is a fast node, training the updated local model by using the sample set of the client node.
If the time constant is greater than or equal to the training time of the client node in the last round of training period, the client node is a fast node, which also indicates that the time of the client node in the last round of training period is relatively short, and the training speed is relatively fast, so that the updated local model is trained by using the sample set of the client node in the present round of training period.
The training of the updated local model with the sample set of the client node may specifically use the samples in the sample set (generally all of them): for example, all samples in the sample set are input to the local model, and m training iterations are performed on the local model using, for example, a gradient descent algorithm, thereby completing the training of the local model for the current round. The value of m can generally be set according to practical needs; for example, m may be 3, 4, 5, 6, or another value.
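A minimal sketch of this fast-node branch, assuming a hypothetical model object exposing a params array and a gradient(samples) method; it is a sketch under those assumptions, not the patent's implementation.

```python
import time

def train_fast_node(model, sample_set, m=5, lr=0.01):
    """Fast-node branch (sketch): m gradient-descent iterations over the
    full sample set, timing the round as the training duration."""
    start = time.monotonic()
    for _ in range(m):
        grad = model.gradient(sample_set)        # gradient over all samples
        model.params = model.params - lr * grad  # one gradient-descent step
    train_duration = time.monotonic() - start    # second parameter, part 1
    update_gradient = model.gradient(sample_set) # second parameter, part 2
    return train_duration, update_gradient
```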
Of course, after the training of the local model is achieved in the present training period, the client node needs to feed back a second parameter to the server node. Thus, the method may further comprise obtaining a training duration of the training cycle and an update gradient of parameters of the local model in the client node, and then sending the obtained training duration and the update gradient to the server node.
For example, in the process of performing m times of iterative training on the local model by adopting a gradient descent algorithm, the total time of the m times of iterative training can be obtained and used as the training time length of the training period of the round; and aiming at the local model in the client node, calculating the update gradient of the parameters of the local model in the client node through an update gradient related calculation formula of the model parameters.
Step S35: and under the condition that the client node is a slow node, dividing the sample set into n non-empty sub-sample sets, and selecting p non-empty sub-sample sets from the n non-empty sub-sample sets to train the updated local model.
If the time constant is smaller than the training duration of the client node in the last round of training period, the client node is a slow node, which also indicates that the time of the client node in the last round of training period is relatively longer and the training speed is relatively slower, so that only part of samples in the sample set of the client node are utilized to train the updated local model in the present round of training period, thereby reducing the time of model training.
In this step S35, n is a positive integer greater than 1, p is a positive integer greater than or equal to 1, and n is greater than p. For example, n has a value of 2 and p has a value of 1; or n is 3, p is 1 or 2, etc. Wherein, the non-empty sub-sample set specifically refers to the number of samples in the non-empty sub-sample set being greater than or equal to 1.
In practical applications, the specific size of n may be set to a fixed positive integer greater than 1, and the specific size of p may be set to a positive integer greater than or equal to 1 and less than n, which is low in implementation cost, but tends to reduce the sample utilization rate in the sample set.
In order to increase the utilization of samples in the sample set of the client node, the values of n and p may be set as follows. The value of n may be calculated from the ratio of the training duration of the client node in the previous training round to the time constant, where the value of n is greater than or equal to that ratio, i.e., n is calculated by formula one:

$$n_i \ge \frac{T_i}{T_{th}} \quad \text{(formula one)}$$

In formula one, i is the number of the client node in the heterogeneous federated learning system; $n_i$ is the calculated value of n for the client node numbered i; $T_i$ is the training duration of the client node numbered i in the previous training round; and $T_{th}$ is the time constant of the current training round.

Since the time constant $T_{th}$ is smaller than the training duration $T_i$, the ratio $T_i/T_{th}$ is greater than 1; in practice, a specific value of $n_i$ can therefore be obtained by rounding $T_i/T_{th}$ down and adding 1, 2, or another positive integer to the result.

Accordingly, the value of p can be calculated as follows: p is obtained by multiplying the value of n by the ratio of the time constant to the training duration and rounding down, i.e., by formula two:

$$p_i = \left\lfloor n_i \cdot \frac{T_{th}}{T_i} \right\rfloor \quad \text{(formula two)}$$

Similarly, in formula two, i is the number of the client node in the heterogeneous federated learning system; $p_i$ is the calculated value of p for the client node numbered i; $T_i$ is the training duration in the previous training round; $T_{th}$ is the time constant of the current training round; $n_i$ is the value of n for the client node numbered i, which can be calculated by formula one; and $\lfloor\cdot\rfloor$ is the floor (round-down) operator.
Therefore, the values of n and p can be calculated by formula one and formula two, respectively. After the value of n is calculated according to formula one, the sample set may be divided evenly into n non-empty sub-sample sets, so that the number of samples in each non-empty sub-sample set is equal or substantially equal.
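A minimal sketch of formulas one and two together with the equal division, assuming the sample set is a Python list with at least $n_i$ elements and that 1 is the positive integer added after rounding down; the names are illustrative.

```python
import math
import random

def split_for_slow_node(sample_set, T_i, T_th, extra=1):
    """Compute n_i (formula one: floor(T_i/T_th) + extra) and p_i
    (formula two), then divide the samples into n_i near-equal,
    non-empty sub-sample sets."""
    n = math.floor(T_i / T_th) + extra      # satisfies n_i >= T_i / T_th
    p = max(math.floor(n * T_th / T_i), 1)  # formula two, at least 1
    shuffled = list(sample_set)
    random.shuffle(shuffled)
    subsets = [shuffled[j::n] for j in range(n)]  # sizes differ by at most 1
    return n, p, subsets
```

For example, with $T_i = 30$ and $T_{th} = 10$, this yields $n_i = 4$ and $p_i = 1$, so the slow node trains on roughly a quarter of its samples in the current round.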
Thus, a first specific way to select p non-empty sub-sample sets from n non-empty sub-sample sets may be to randomly select p non-empty sub-sample sets from the n non-empty sub-sample sets, where the random selection is implemented at a lower cost, but is prone to a phenomenon that part of the samples are trained less often, and another part of the samples are trained more often, resulting in an imbalance in training frequency among the samples.
The second specific way of selecting p non-empty sample sets from n non-empty sample sets may be to first determine, for the n non-empty sample sets, the total number of training rounds in which each non-empty sample set has accumulated in each previous round of training cycle, and then select p non-empty sample sets from the n non-empty sample sets according to a selection order of the total number of training rounds from small to large, so that p non-empty sample sets with the minimum total number of training rounds can be selected from the n non-empty sample sets, and further train the updated local model by using the selected p non-empty sample sets, thereby balancing training times between samples.
A specific way to select p non-empty sub-sample sets from the n non-empty sub-sample sets in ascending order of the total number of training rounds is to sort the n non-empty sub-sample sets in ascending order of their total number of training rounds (for example, using bubble sort) and then select the first p non-empty sub-sample sets; alternatively, the n non-empty sub-sample sets may be sorted in descending order of the total number of training rounds, and the last p non-empty sub-sample sets selected.
For the specific way of selecting p non-empty sub-sample sets from n non-empty sub-sample sets, the first way is low in implementation cost, but easily causes unbalance of training times among samples, and the second way can balance training times among samples. In practical application, when the number of training rounds is small, the sample training times are easy to be unbalanced by adopting the first mode for random selection, and when the number of training rounds is large, the sample training times are not easy to be unbalanced by random selection.
Therefore, the two modes can be combined to obtain a third implementation mode, in the third implementation mode, whether the training wheel number of the training period of the wheel is larger than a preset value can be judged first, if so, the fact that the training wheel number is larger can be indicated, and the first implementation mode can be adopted; otherwise, if the number of training rounds is smaller than or equal to the preset value, the number of training rounds is smaller, and the second implementation manner can be adopted. Thus, with this third implementation, the problems of balance between samples and implementation cost are comprehensively considered, so that the implementation cost can be comprehensively reduced and unbalance between samples can be reduced. For the training wheel number of the training period of the present wheel, for example, if the training period of the present wheel is the 5 th training period, the training wheel number is 5; if the training period of the present round is the training period of the nth round, the training round number is N.
For the n non-empty sub-sample sets, respectively determining the total number of the accumulated participated training rounds of each non-empty sub-sample set in each previous round of training period, wherein the total number of the training rounds reflects the samples in the corresponding non-empty sub-sample set, and the number of the accumulated participated training rounds in each previous round of training period; for example, if the total number of training rounds is larger, the samples in the corresponding non-empty sub-sample set are described, and the more training rounds are accumulated in each previous training cycle; otherwise, if the total number of training rounds is smaller, the samples in the corresponding non-empty sub-sample set are indicated, and the number of training rounds which are accumulated in each previous training cycle is smaller.
Therefore, in practical application, for the n non-null sub-sample sets, the total number of training rounds in which each non-null sub-sample set is accumulated in each previous round of training period is determined, which may be specifically achieved by, for example, for each non-null sub-sample set in the n non-null sub-sample sets, firstly, obtaining the total number of training rounds in which each sample in the non-null sub-sample set is accumulated in each previous round of training period, and then summing the total number of training rounds in which each sample is accumulated in each previous round of training period, so as to obtain the total number of training rounds in which each non-null sub-sample set is accumulated in each previous round of training period.
Of course, after determining in this way the total number of training rounds that a non-empty sub-sample set has accumulated in the previous training rounds, the sequence number of that non-empty sub-sample set can be stored in a sequence-number set, and a corresponding round record parameter can be constructed to record the total number of training rounds the non-empty sub-sample set has cumulatively participated in over the previous training rounds. The elements of the sequence-number set are the sequence numbers of the non-empty sub-sample sets obtained by division in the previous training rounds, and each sequence number uniquely identifies the corresponding non-empty sub-sample set.
Further, for the n non-empty sub-sample sets, determining the total number of training rounds that each has accumulated in the previous training rounds may be implemented as follows: first acquire the sequence-number set; then, for each of the n non-empty sub-sample sets, determine whether its sequence number is contained in the sequence-number set. If so, the non-empty sub-sample set was already generated in a previous training round, so its round record parameter can be read to obtain the total number of training rounds it has cumulatively participated in. If not, the set is a new non-empty sub-sample set, and the total number of training rounds accumulated by each of its samples in the previous training rounds can be obtained first and then summed, to obtain the total number of training rounds the non-empty sub-sample set has cumulatively participated in over the previous training rounds.
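Combining the three selection ways described above, a sketch might look as follows; the dictionary round_counts (mapping each hashable sample to its accumulated number of training rounds) is an assumed stand-in for the sequence-number set and round record parameters, and the threshold value is illustrative.

```python
import random

def select_subsets(subsets, round_counts, p, current_round, threshold=20):
    """Select p sub-sample sets: randomly once enough rounds have passed
    (first way), otherwise the p sets with the fewest accumulated
    training rounds (second way); together, the third implementation."""
    if current_round > threshold:               # many rounds: random is safe
        return random.sample(subsets, p)
    def total_rounds(subset):                   # per-set accumulated rounds
        return sum(round_counts.get(s, 0) for s in subset)
    ranked = sorted(subsets, key=total_rounds)  # ascending total rounds
    return ranked[:p]                           # p least-trained sets
```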
After the p non-empty sub-sample sets are selected, the updated local model may be trained by using the p non-empty sub-sample sets, and in the training process, samples in the p non-empty sub-sample sets may be input to the local model, and m iterative training may be performed on the local model by using, for example, a gradient descent algorithm, so that training of the local model is implemented in the present training period. The value of m may be the same as that of m in step S34, which is not described here again.
In the model training method provided by the embodiment of the application, the time constant and the global model aggregation parameters of the current training round sent by the server node are first acquired; the parameters of the local model of the client node are then updated with the global model aggregation parameters; and the node type of the client node is determined by comparing the time constant with the training duration of the client node in the previous training round. If the client node is a fast node, the updated local model is trained with the client node's sample set; if the client node is a slow node, the sample set is divided into n non-empty sub-sample sets and p of them are selected to train the updated local model. In this way, the number of samples used by slow nodes for model training is reduced according to the node type of the client node, so that slow nodes can complete training relatively quickly, which reduces the idling of the overall computing resources and improves their utilization efficiency.
In the model training method of federal learning, after the client node completes the training of the local model of the client node, the second parameter needs to be fed back to the server node in each training period. Therefore, in the case that the client node is a slow node, after performing the above step S35, the method may further include, acquiring the training duration of the training period of the present round, and the update gradient of the parameters of the local model in the client node by:
the method comprises the steps that p selected non-empty sub-sample sets can be determined first, and the actual training duration and the actual updating gradient of the updated local model can be trained; for the actual training duration, for example, in the process of inputting the samples in the p non-empty sub-sample sets into a local model and performing m times of iterative training on the local model by adopting a gradient descent algorithm, the total time of the m times of iterative training can be obtained as the actual training duration; and aiming at the local model in the client node, calculating the actual update gradient of the parameters of the local model in the client node through an update gradient related calculation formula of the model parameters.
After determining the actual training duration and the actual update gradient, the training duration when the updated local model is trained by the sample set can be calculated according to the actual training duration and the ratio of the total sample amount in the p non-empty sub-sample sets to the total sample amount in the sample set, and the training duration can be calculated by using the formula three as the training duration of the training period of the present round.
$$T_i = TP_i \times \frac{1}{Q_i} \quad \text{(formula three)}$$

In formula three, i is the number of the client node in the heterogeneous federated learning system; $TP_i$ is the actual training duration measured in the client node numbered i; and $Q_i$ is, for the client node numbered i, the ratio of the total number of samples in the p non-empty sub-sample sets to the total number of samples in the sample set.

When the sample set is divided evenly into n non-empty sub-sample sets using the method above, so that the number of samples in each non-empty sub-sample set is equal or substantially equal, $Q_i = p_i / n_i$, where $p_i$ is the number p of non-empty sub-sample sets selected in the client node numbered i.
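Under the equal-division assumption ($Q_i = p_i/n_i$), formula three reduces to a one-line scaling; the following sketch uses illustrative names only.

```python
def extrapolate_duration(actual_duration, p, n):
    """Formula three: scale the measured duration on p of n sub-sample
    sets up to an estimated duration for the full sample set."""
    q = p / n                       # Q_i under equal division
    return actual_duration / q      # T_i = TP_i * (1 / Q_i)
```

For example, if training 1 of 4 sub-sample sets took 5 seconds, the extrapolated full-set duration is 20 seconds.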
After determining the actual training duration and the actual update gradient, the update gradient of the parameters of the local model of the client node in the current training round can be calculated by formula four:

$$\tilde{g}_i^k = \frac{1}{p_i}\sum_{j=1}^{p_i}\left(\tilde{g}_{ij}^k - g_{ij}\right) + \bar{g}_i \quad \text{(formula four)}$$

In formula four, i is the number of the client node in the heterogeneous federated learning system; k is the round of the current training period (for example, the round of the second training period is 2 and the round of the 5th training period is 5); $\tilde{g}_i^k$ is, for the client node numbered i, the update gradient of the parameters of the local model in the training period of round k (i.e., the current training round); $p_i$ is the number p of non-empty sub-sample sets selected in the client node numbered i; $\tilde{g}_{ij}^k$ is the actual update gradient of the j-th non-empty sub-sample set with respect to the parameters of the local model in the training period of round k, where j indexes the selected $p_i$ non-empty sub-sample sets; $g_{ij}$ is the update gradient of the client node's local model on the j-th non-empty sub-sample set in its last training before the current training round; and $\bar{g}_i$ is an estimate of the average gradient of the non-empty sub-sample sets in the data set of the client node, which can be calculated by formula five:

$$\bar{g}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} g_{ij} \quad \text{(formula five)}$$

Therefore, through formula four, the update gradient of the parameters of the local model of the client node in the current training round can be calculated, and through formula three, the training duration of the current training round can be calculated; the client node can then feed back the update gradient and the training duration as second parameters to the server node, so that the server node can update its global model using them.

It should be noted that when the update gradient of the client node is calculated by formula four, the term $\bar{g}_i$ is introduced, which allows the calculated update gradient to be more accurate.
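As a concrete illustration of formulas four and five as reconstructed above, the following sketch assumes the gradients are NumPy arrays or plain floats, with stored_grads holding the last gradient recorded for each of the n sub-sample sets; the function and argument names are illustrative, not from the patent.

```python
def slow_node_update_gradient(actual_grads, stored_grads, selected_ids):
    """Formulas four and five (sketch). actual_grads[j] is this round's
    gradient on the j-th selected sub-sample set; stored_grads[j] is the
    gradient from the last round in which set j participated."""
    p = len(selected_ids)
    n = len(stored_grads)
    g_bar = sum(stored_grads) / n                          # formula five
    corr = sum(actual_grads[j] - stored_grads[j] for j in selected_ids) / p
    for j in selected_ids:
        stored_grads[j] = actual_grads[j]   # refresh the per-set memory
    return corr + g_bar                                    # formula four
```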
Fig. 4 shows a model training method according to another embodiment of the present application, which is applied to a server node connected to a client node in the heterogeneous federal learning system. For the method, the method specifically comprises the following steps:
Step S41: and respectively acquiring the sample number of the sample set in each client node, the training time length of the last round of training period and the update gradient of the last round of training period.
In the model training method for federal learning of the present application, after the client node completes the training of its own local model in each round of training period, the client node can acquire the update gradient of the parameters of the local model in the present round of training period and the training duration of the present round of training period, and then feed back the update gradient and the training duration to the server node.
Therefore, in step S41, a specific manner of obtaining the training duration of the last round of training period and the update gradient of the last round of training period of each client node may be that after each client node completes training on its own local model in the last round of training period, the update gradient of the parameter of the local model in the last round of training period and the training duration of the last round of training period are sent to the server node, so that the server node can obtain the training duration of the last round of training period and the update gradient of the last round of training period of each client node.
In practical application, the number of samples of the sample set in the client node generally does not change much in a short time, so the server node may acquire the number of samples of the sample set in each client node in a first round of training period, for example, after the client node completes training on its own local model in the first round of training period, the number of samples of the sample set is also fed back to the server node, so that the server node can acquire the number of samples of the sample set in the client node. And, after the server node obtains the number of samples of the sample set in each client node in the first round of training period, the number of samples of the sample set in each client node may be stored in the database, and in each subsequent round of training period, the number of samples of the sample set in each client node may be obtained from the database.
In the first training cycle, the model is usually initialized. Assuming the total number of client nodes is $N_c$, the initialization may be performed as follows: the server node may initialize the parameters of its global model in a Gaussian-random manner, denoting the initialization parameters $w_0'$, and then send the initialization parameters to the $N_c$ client nodes; accordingly, each client node receives the initialization parameters $w_0'$ and uses them to initialize its local model.
Of course, in the first round of training period, since there is no last round of training period, the training duration in the last round of training period is not stored in each client node, at this time, each client node may respectively train the initialized local model by using its own sample set (for example, all samples in the sample set), so as to obtain the training duration of the first round of training period, and the update gradient of the parameters of the local model in the first round of training period, and feedback the training duration and the update gradient to the server node, and of course, may also feedback the number of samples in its own sample set.
Step S42: and calculating the global model aggregation parameters of the training period of the round by using the global model aggregation parameters of the training period of the latest round, the sample number of the sample set in each client node and the update gradient of the training period of the latest round.
After the server node obtains the number of samples of the sample set in each client node, the training duration of the last round of training period, and the update gradient of the last round of training period through the above step S41, in step S42, the global model aggregation parameter of the last round of training period, and the number of samples of the sample set in each client node and the update gradient of the last round of training period are used to calculate the global model aggregation parameter of the present round of training period, for example, the global model aggregation parameter of the present round of training period may be calculated by using formulas six to eight.
$$|D_T| = \sum_{i=1}^{N_c} |D_i| \quad \text{(formula six)}$$

In formula six, the total number of client nodes is $N_c$; i is the number of the client node; and $|D_i|$ is the number of samples in the sample set of the client node numbered i.

$$g^k = \frac{1}{|D_T|}\sum_{i=1}^{N_c} |D_i|\,\tilde{g}_i^{k-1} \quad \text{(formula seven)}$$

In formula seven, $|D_T|$ is calculated by formula six; $\tilde{g}_i^{k-1}$ is the update gradient of the latest training round from the client node numbered i; the total number of client nodes is $N_c$; i is the number of the client node; and $|D_i|$ is the number of samples in the sample set of the client node numbered i.

$$w_k = w_{k-1} - \eta\, g^k \quad \text{(formula eight)}$$

In formula eight, $w_{k-1}$ is the global model aggregation parameter of the latest training round; $g^k$ can be obtained by formula seven; $\eta$ is the learning rate, where $0 < \eta < 1$ and the size of $\eta$ generally decreases gradually as the number of training rounds increases; and $w_k$ is the calculated global model aggregation parameter of the current training round.
Of course, after calculating the global model aggregation parameters of the training period according to the formulas six to eight, the server node may update its own global model by using the global model aggregation parameters.
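The server-side update of formulas six to eight can be sketched as follows, assuming the update gradients and aggregation parameters are NumPy arrays (plain floats also work); the function name aggregate is an illustrative assumption.

```python
def aggregate(w_prev, sample_counts, client_grads, eta):
    """Formulas six-eight: sample-weighted average of the clients'
    latest update gradients, then one global descent step."""
    d_total = sum(sample_counts)                    # formula six: |D_T|
    g_k = sum(d * g for d, g in
              zip(sample_counts, client_grads)) / d_total  # formula seven
    return w_prev - eta * g_k                       # formula eight
```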
After calculating the global model aggregation parameter of the present round of training period, the server node may further determine whether a training termination condition is met, for example, the training termination condition may be that the global model aggregation parameter converges and/or the cycle round is greater than a preset maximum round (for example, the preset maximum round may be 100), for example, if the global model aggregation parameter converges, it is illustrated that training of the model is completed, and it may be determined that the training termination condition is met and the next round of training period is terminated; or if the cycle number is greater than the preset maximum number, the training of the model is also indicated to be completed, the condition for ending the training can be determined to be met, and the next training period is ended.
In addition, after initializing the parameters of its global model, the server node initializes the global model aggregation parameters to $w_0'$. Correspondingly, in the first training round, each client node trains the initialized local model, thereby obtaining the training duration of the first round and the update gradient of the parameters of the local model in the first round, and feeds the training duration and the update gradient back to the server node, together with the number of samples in its own sample set. The server node then calculates the global model aggregation parameters of the second training round using formulas six to eight, where $w_{k-1}$ in formula eight is specifically $w_0'$.
Step S43: and calculating the time constant of the training period of the round by using the sample number of the sample set in each client node and the training time length of the last round of training period.
Because the number of samples in the sample set of each client node is usually different, a time constant calculated directly from the training durations of the previous training round generally cannot truly reflect the training efficiency of each client node. Therefore, in step S43, the time constant of the current training round is calculated using both the number of samples in each client node's sample set and the training duration of the previous training round. For example, for each client node, the training duration of its previous training round can be divided by the number of samples in its sample set to obtain its average training duration per sample in the previous round; the client nodes can then be sorted in ascending order of this per-sample training duration, and the client node at the median, 70th, 80th, or 90th percentile selected, whose training duration in the previous training round is used as the time constant of the current training round. Of course, other ways of calculating the time constant of the current training round may also be used.
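A sketch of this time-constant computation, using the median (percentile 0.5) by default with the 70th/80th/90th percentiles as the alternatives mentioned above; names are illustrative.

```python
def time_constant(durations, sample_counts, percentile=0.5):
    """Sort clients by average per-sample duration in the last round and
    return the last-round duration of the client at the given percentile."""
    per_sample = [t / s for t, s in zip(durations, sample_counts)]
    order = sorted(range(len(durations)), key=per_sample.__getitem__)
    idx = min(int(percentile * len(order)), len(order) - 1)
    return durations[order[idx]]    # that client's last-round duration
```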
The execution order of steps S42 and S43 is not limited: for example, step S42 may be executed before step S43, step S43 may be executed before step S42, or steps S42 and S43 may be executed in parallel, among other execution orders.
Step S44: and respectively sending the time constant of the training period of the round and the global model aggregation parameters to each client node.
The server node calculates the time constant of the training period of the present round through the step S43, and sends the time constant of the training period of the present round and the global model aggregation parameter to each client node after calculating the global model aggregation parameter of the training period of the present round through the step S42.
Since the model training method of steps S41 to S44 adopts the same inventive concept as the model training method of steps S31 to S35, it can likewise solve the problems in the prior art, which will not be repeated here.
Based on the same inventive concept as the model training method provided by the embodiment of the present application, the embodiment of the present application also provides a client node, for which, if unclear, reference may be made to the corresponding content of the method embodiment. As shown in fig. 5, a specific structure of the client node 50 is shown, where the client node 50 includes: an acquisition unit 501, an update unit 502, a comparison unit 503, a first training unit 504 and a second training unit 505, wherein:
An obtaining unit 501, configured to obtain a time constant and a global model aggregation parameter of a training period of the present round sent by a server node;
an updating unit 502, configured to update parameters of a local model of the client node with the global model aggregation parameters;
a comparing unit 503, configured to determine a node type of the client node according to the comparison between the time constant and a training duration of the client node in a last training period, where the node type is specifically a fast node or a slow node;
a first training unit 504, configured to train the updated local model with the sample set of the client node when the client node is a fast node; or,
a second training unit 505, configured to divide the sample set into n non-null sub-sample sets when the client node is a slow node, and select p non-null sub-sample sets from the n non-null sub-sample sets to train the updated local model, where n is a positive integer greater than 1, p is a positive integer greater than or equal to 1, and n is greater than p.
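As an illustrative sketch, the branch implemented by units 503 to 505 can be expressed as follows, where `train` and `partial_train` are assumed names for the full-set and sub-sample-set training routines, and the use of a non-strict comparison to classify the node is likewise an assumption:

```python
def client_step(model, sample_set, time_constant, last_duration,
                train, partial_train):
    """One client-side round: classify the node by comparing its last-round
    training duration to the server's time constant, then train accordingly."""
    if last_duration <= time_constant:        # comparing unit 503: fast node
        return train(model, sample_set)       # first training unit 504
    return partial_train(model, sample_set)   # second training unit 505
```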
With the client node 50 provided by the embodiment of the present application, since the client node 50 adopts the same inventive concept as the model training method provided by the embodiment of the present application, it can likewise solve the technical problems that the method solves, which is not described again here.
In addition, in practical applications, the technical effects obtained by combining the client node 50 with specific hardware devices, cloud technology and the like also fall within the protection scope of the present application; for example, the units of the client node 50 may be distributed across different nodes of a cluster to improve efficiency, or some of the units may be deployed in the cloud to reduce cost.
The client node 50 may further include an n-value calculating unit, configured to calculate a value of n by using a ratio of the training duration to the time constant, where the value of n is greater than or equal to the ratio; and dividing the sample set into n non-empty sub-sample sets, specifically including: the sample set is divided equally into n non-empty sub-sample sets.
The client node 50 may further comprise a p-value calculation unit for calculating the value of p by multiplying the value of n by the ratio of the time constant to the training period and rounding down.
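A minimal sketch of these two calculations, assuming `last_duration` and `time_constant` are the quantities compared above; taking the ceiling gives the smallest n satisfying the stated constraint (any larger integer would also satisfy it):

```python
import math

def compute_n(last_duration, time_constant):
    # n must be at least last_duration / time_constant; ceil is the minimum.
    return math.ceil(last_duration / time_constant)

def compute_p(n, time_constant, last_duration):
    # p = floor(n * time_constant / last_duration); keep p >= 1.
    return max(1, math.floor(n * time_constant / last_duration))

n = compute_n(last_duration=9.0, time_constant=4.0)      # n = 3
p = compute_p(n, time_constant=4.0, last_duration=9.0)   # p = 1
```

Since a slow node has last_duration greater than time_constant, n · time_constant / last_duration is strictly less than n, so p computed this way is always less than n, as required.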
Selecting p non-empty sub-sample sets from the n non-empty sub-sample sets to train the updated local model may specifically include:
determining, for each of the n non-empty sub-sample sets, the cumulative total number of training rounds in which the sub-sample set has participated in previous training periods;

selecting p non-empty sub-sample sets from the n non-empty sub-sample sets in ascending order of this cumulative total;
and training the updated local model by using the selected p non-empty sub-sample sets.
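As an illustrative sketch of this least-trained-first selection, assuming the client keeps a per-subset counter of cumulative training rounds:

```python
def select_subsets(round_counts, p):
    """Pick the p sub-sample sets that have joined the fewest training rounds
    so far; round_counts[j] is the cumulative total for subset j."""
    order = sorted(range(len(round_counts)), key=lambda j: round_counts[j])
    chosen = order[:p]
    for j in chosen:                  # the chosen subsets each gain one round
        round_counts[j] += 1
    return chosen

counts = [3, 0, 2, 1]
print(select_subsets(counts, p=2))    # -> [1, 3]
```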
In the case that the client node is a fast node, the client node 50 may further include a parameter feedback unit, configured to obtain the training duration of the present round of training period and the update gradient of the parameters of the local model in the client node, and to send the training duration and the update gradient to the server node.
In the case that the client node is a slow node, the client node 50 may further include a training duration calculation unit, configured to determine the actual training duration of training the updated local model with the selected p non-empty sub-sample sets, and to estimate, from the actual training duration and the ratio of the total number of samples in the p non-empty sub-sample sets to the total number of samples in the sample set, the training duration that the full sample set would require; this estimate is taken as the training duration of the present round of training period.
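A one-line sketch of this extrapolation, assuming the sample counts are known: the measured partial-training duration is scaled up by the inverse of the fraction of samples actually used:

```python
def full_set_duration(actual_duration, samples_used, samples_total):
    """Estimate the duration that training on the whole sample set would have
    taken, from the measured duration on the p selected sub-sample sets."""
    return actual_duration * samples_total / samples_used

print(full_set_duration(6.0, samples_used=200, samples_total=600))  # -> 18.0
```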
In the case where the client node is a slow node, the client node 50 may calculate the update gradient of the parameters of the local model in the present training period by the following formula:

g̃_i^k = ḡ_i + (1/p_i) · Σ_{j=1}^{p_i} ( ĝ_ij^k − g_ij )

wherein i is the number of the client node; k is the round of the present training period; g̃_i^k is the update gradient of the parameters of the local model of the client node in the present training period; p_i is the number of selected non-empty sub-sample sets in the client node; ĝ_ij^k is the actual update gradient of the j-th selected non-empty sub-sample set with respect to the parameters of the local model in the present training period; g_ij is the update gradient of the client node's local model on the j-th non-empty sub-sample set in its most recent training before the present training period; and ḡ_i is an estimate of the average gradient of each non-empty sub-sample set in the data set of the client node.
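The following sketch implements such an estimator with NumPy arrays standing in for model-sized gradients; since the original formula image does not survive in this text, its SAGA-style form (average the per-subset corrections, then add the stored average-gradient estimate) is an assumption consistent with the stated definitions of ĝ_ij^k, g_ij and ḡ_i:

```python
import numpy as np

def slow_node_update_gradient(actual_grads, stored_grads, avg_grad_estimate):
    """Assumed SAGA-style estimate of the slow node's full update gradient.

    actual_grads:      list of ĝ_ij^k computed this round on the p selected
                       sub-sample sets
    stored_grads:      list of g_ij, the last stored gradient for each of
                       those same subsets
    avg_grad_estimate: ḡ_i, running estimate of the average subset gradient
    """
    p = len(actual_grads)
    correction = sum(a - s for a, s in zip(actual_grads, stored_grads)) / p
    return avg_grad_estimate + correction

# Toy usage with 2-dimensional "gradients".
g_hat = [np.array([1.0, 0.0]), np.array([0.5, 0.5])]
g_old = [np.array([0.8, 0.2]), np.array([0.4, 0.4])]
g_bar = np.array([0.6, 0.3])
print(slow_node_update_gradient(g_hat, g_old, g_bar))  # -> [0.75 0.25]
```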
Based on the same inventive concept as the model training method provided by the embodiment of the present application, the embodiment of the present application also provides a server node; any details that are unclear below may be found in the corresponding content of the method embodiment. As shown in fig. 6, the server node 60 includes: a second acquisition unit 601, a global model aggregation parameter calculation unit 602, a time constant calculation unit 603, and a sending unit 604, wherein:
A second obtaining unit 601, configured to obtain, respectively, a number of samples of the sample set in each client node, a training duration of a last round of training period, and an update gradient of the last round of training period;
the global model aggregation parameter calculating unit 602 is configured to calculate the global model aggregation parameter of the training period of the present round by using the global model aggregation parameter of the training period of the last round, the number of samples of the sample set in each client node, and the update gradient of the training period of the last round;
a time constant calculating unit 603, configured to calculate a time constant of the training period of the current round by using the number of samples of the sample set in each client node and the training duration of the last round of training period;
and the sending unit 604 is configured to send the time constant of the training period and the global model aggregation parameter of the present round to each client node respectively.
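As an illustrative sketch of one server-side round tying these four units together, assuming each per-client report holds (sample count, last-round duration, last-round update gradient); the exact aggregation rule of formulas six to eight is not reproduced in this text, so the sample-count-weighted gradient step below is an assumption:

```python
import numpy as np

def server_round(prev_params, reports, lr=1.0, percentile=0.5):
    """reports: list of (sample_count, last_duration, update_gradient) tuples.
    Returns (new global model aggregation parameter, time constant)."""
    total = sum(n for n, _, _ in reports)
    # Assumed aggregation: sample-count-weighted average of client gradients,
    # applied as a gradient step to the previous global parameters.
    agg = sum((n / total) * g for n, _, g in reports)
    new_params = prev_params - lr * agg
    # Time constant (step S43): rank clients by per-sample duration and take
    # the whole-round duration of the client at the chosen percentile.
    ranked = sorted(reports, key=lambda r: r[1] / r[0])
    idx = min(int(percentile * len(ranked)), len(ranked) - 1)
    time_constant = ranked[idx][1]
    return new_params, time_constant

# Toy usage with three clients and 2-dimensional parameters.
w = np.zeros(2)
reps = [(100, 10.0, np.array([0.1, 0.2])),
        (200, 30.0, np.array([0.0, 0.1])),
        (50, 12.0, np.array([0.2, 0.0]))]
print(server_round(w, reps))
```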
With the server node 60 provided by the embodiment of the present application, since the server node 60 adopts the same inventive concept as the model training method provided by the embodiment of the present application, it can likewise solve the technical problems that the method solves, which is not described again here.
Based on the same inventive concept as the model training method provided by the embodiment of the present application, the embodiment of the present application further provides a model training system based on the idea of federated learning. In terms of system structure, as shown in fig. 7, the model training system 70 includes the server node 60 provided by the embodiment of the present application and a plurality of client nodes 50, provided by the embodiment of the present application, connected to the server node 60, so that training of a model is achieved through the cooperative training of the server node 60 and the plurality of client nodes 50 in the model training system 70.
As shown in fig. 8, the present embodiment further provides an electronic device 8, including: at least one processor 81 and a memory 82 (one processor is illustrated in fig. 8). The processor 81 and the memory 82 may be connected by a bus 80; the memory 82 stores instructions executable by the processor 81, and the instructions are executed by the processor 81 to enable the electronic device 8 to implement all or part of the flow of the method in the embodiments of the application.
In practical applications, the electronic device 8 may be a mobile phone, a notebook computer, a desktop computer, a server, or a server cluster. For example, in the scenario provided by the embodiment of the present application, the electronic device 8 may serve as a server node or a client node.
The embodiment of the application also provides a storage medium storing a computer program, and the computer program can be executed by a processor to complete all or part of the flow of the method in the embodiments of the application. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid state drive (SSD), or the like. The storage medium may also comprise a combination of the above memories.
Although embodiments of the present application have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the application, and such modifications and variations are within the scope of the application as defined by the appended claims.
Claims (13)
1. A method of model training, the method comprising:
acquiring a time constant and a global model aggregation parameter of a training period of the round sent by a server node;
updating parameters of a local model of the client node by utilizing the global model aggregation parameters;
determining a node type of the client node according to the comparison of the time constant and the training duration of the client node in the last round of training period, wherein the node type is specifically a fast node or a slow node;
training the updated local model by using a sample set of the client node under the condition that the client node is a fast node; or,
and under the condition that the client node is a slow node, dividing the sample set into n non-empty sub-sample sets, and selecting p non-empty sub-sample sets from the n non-empty sub-sample sets to train the updated local model, wherein n is a positive integer greater than 1, p is a positive integer greater than or equal to 1, and n is greater than p.
2. The method according to claim 1, wherein the method further comprises: calculating the value of n according to the ratio of the training duration to the time constant, wherein the value of n is greater than or equal to the ratio; and

dividing the sample set into n non-empty sub-sample sets specifically comprises: dividing the sample set equally into n non-empty sub-sample sets.
3. The method according to claim 2, wherein the method further comprises: calculating the value of p by multiplying the value of n by the ratio of the time constant to the training duration and rounding down.
4. The method according to claim 3, wherein selecting p non-empty sub-sample sets from the n non-empty sub-sample sets to train the updated local model specifically comprises:
determining, for each of the n non-empty sub-sample sets, the cumulative total number of training rounds in which the sub-sample set has participated in previous training periods;

selecting p non-empty sub-sample sets from the n non-empty sub-sample sets in ascending order of this cumulative total;
and training the updated local model by using the selected p non-empty sub-sample sets.
5. The method of claim 1, wherein in the event that the client node is a fast node, the method further comprises:
acquiring the training time length of the training period of the round and the updating gradient of the parameters of the local model in the client node;
and sending the training time length and the update gradient to the server node.
6. The method of claim 1, wherein in the case where the client node is a slow node, the method further comprises:
determining the actual training duration of training the updated local model with the selected p non-empty sub-sample sets;

and estimating, from the actual training duration and the ratio of the total number of samples in the p non-empty sub-sample sets to the total number of samples in the sample set, the training duration that would be required to train the updated local model with the full sample set, and taking this estimate as the training duration of the present round of training period.
7. The method of claim 1, wherein, in the case where the client node is a slow node, the method further comprises calculating, by the following formula, the update gradient of the parameters of the local model of the client node in the present training period:

g̃_i^k = ḡ_i + (1/p_i) · Σ_{j=1}^{p_i} ( ĝ_ij^k − g_ij )

wherein i is the number of the client node; k is the round of the present training period; g̃_i^k is the update gradient of the parameters of the local model in the present training period; p_i is the number of selected non-empty sub-sample sets in the client node; ĝ_ij^k is the actual update gradient of the j-th selected non-empty sub-sample set with respect to the parameters of the local model in the present training period; g_ij is the update gradient of the client node's local model on the j-th non-empty sub-sample set in its most recent training before the present training period; and ḡ_i is an estimate of the average gradient of each non-empty sub-sample set in the data set of the client node.
8. A method of model training, the method comprising:
respectively acquiring the sample number of sample sets in each client node, the training time length of the last round of training period and the update gradient of the last round of training period;
calculating global model aggregation parameters of the training period of the round by using global model aggregation parameters of the training period of the latest round, the sample number of the sample set in each client node and the update gradient of the training period of the latest round;
calculating the time constant of the training period of the round by using the sample number of the sample set in each client node and the training time length of the last round of training period;
and respectively sending the time constant of the training period of the round and the global model aggregation parameters to each client node.
9. A client node, comprising:
the acquisition unit is used for acquiring the time constant and the global model aggregation parameter of the round of training period sent by the server node;
an updating unit, configured to update parameters of a local model of the client node by using the global model aggregation parameters;
the comparison unit is used for determining the node type of the client node according to the comparison of the time constant and the training duration of the client node in the last round of training period, wherein the node type is specifically a fast node or a slow node;
the first training unit is used for training the updated local model by utilizing the sample set of the client node under the condition that the client node is a fast node; or,
and the second training unit is used for dividing the sample set into n non-empty sub-sample sets and selecting p non-empty sub-sample sets from the n non-empty sub-sample sets to train the updated local model under the condition that the client node is a slow node, wherein n is a positive integer greater than 1, p is a positive integer greater than or equal to 1, and n is greater than p.
10. A server node, comprising:
the second acquisition unit is used for respectively acquiring the sample number of the sample set in each client node, the training time length of the last round of training period and the update gradient of the last round of training period;
the global model aggregation parameter calculation unit is used for calculating the global model aggregation parameter of the training period of the round by using the global model aggregation parameter of the training period of the latest round, the sample number of the sample set in each client node and the update gradient of the training period of the latest round;
the time constant calculation unit is used for calculating the time constant of the training period of the round by utilizing the sample number of the sample set in each client node and the training duration of the last round of training period;
And the sending unit is used for respectively sending the time constant of the training period of the round and the global model aggregation parameter to each client node.
11. A model training system comprising a server node and a plurality of client nodes connected to the server node, wherein:
the server node comprises: the second acquisition unit is used for respectively acquiring the sample number of the sample set in each client node, the training time length of the last round of training period and the update gradient of the last round of training period; the global model aggregation parameter calculation unit is used for calculating the global model aggregation parameter of the training period of the round by using the global model aggregation parameter of the training period of the latest round, the sample number of the sample set in each client node and the update gradient of the training period of the latest round; the time constant calculation unit is used for calculating the time constant of the training period of the round by utilizing the sample number of the sample set in each client node and the training duration of the last round of training period; the sending unit is used for respectively sending the time constant of the training period of the round and the global model aggregation parameter to each client node;
the client node comprises: the acquisition unit is used for acquiring the time constant and the global model aggregation parameter of the round of training period sent by the server node; an updating unit, configured to update parameters of a local model of the client node by using the global model aggregation parameters; the comparison unit is used for determining the node type of the client node according to the comparison of the time constant and the training duration of the client node in the last round of training period, wherein the node type is specifically a fast node or a slow node; the first training unit is used for training the updated local model by utilizing the sample set of the client node under the condition that the client node is a fast node; or the second training unit is configured to divide the sample set into n non-null sub-sample sets and select p non-null sub-sample sets from the n non-null sub-sample sets to train the updated local model when the client node is a slow node, where n is a positive integer greater than 1, p is a positive integer greater than or equal to 1, and n is greater than p.
12. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions; wherein the processor is configured to perform the method of any of claims 1-8.
13. A storage medium storing a computer program executable by a processor to perform the method of any one of claims 1-8.