CN116266283A

CN116266283A - Federal learning method and device, central server and data terminal

Info

Publication number: CN116266283A
Application number: CN202111537430.3A
Authority: CN
Inventors: 周颖
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Shanghai ICT Co Ltd; CM Intelligent Mobility Network Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Shanghai ICT Co Ltd; CM Intelligent Mobility Network Co Ltd
Priority date: 2021-12-15
Filing date: 2021-12-15
Publication date: 2023-06-20

Abstract

The invention provides a federal learning method, a federal learning device, a central server and a data terminal, wherein the federal learning method comprises the following steps: fusion calculation is carried out on first training model parameters sent by a plurality of data ends, so that first global model parameters are obtained, and the first global model parameters and a first threshold value are sent to each data end; receiving a second training model parameter recalculated by each data end of the second round of federation learning to obtain a second global model parameter; and determining the second global model parameter as a converged global model parameter under the condition that the difference value between the second global model parameter and the first global model parameter is smaller than a preset value. According to the invention, by setting the threshold value, the data terminal decides whether to participate in the next round of federal learning according to the corresponding threshold value, and the abnormal data terminal can be removed, so that the final model effect of federal learning is ensured, and the communication expense is reduced.

Description

Federal learning method and device, central server and data terminal

Technical Field

The invention relates to the field of artificial intelligence, in particular to a federal learning method, a federal learning device, a central server and a data terminal.

Background

Federal learning is to perform joint training to obtain a global model on the premise that users do not disclose data mutually, so that the problem of data privacy safety is effectively solved. In general, in the realization flow of the federal learning algorithm, a data end is required to send complete model parameters to a central server for updating in each global model training round. As existing models increasingly tend to use complex deep neural networks, which require multiple rounds of iterative updating, significant communication overhead is incurred. Especially in the case of large global model sizes, network bandwidth limitations and increased number of data end nodes may exacerbate federal learning communication bottlenecks. On the premise of ensuring that the performance of the model is not reduced, the training efficiency of federal learning is effectively improved, the communication cost is reduced, and the method becomes a research hotspot problem of current federal learning.

Currently, the ways to address the communication overhead of federal learning are generally: reducing the communication times, reducing the communication transmission content, optimizing the algorithm, optimizing the network and the like. The existing methods for solving the communication overhead mainly comprise the following steps: 1. model compression: in the federal model training process, the structures of the model to be transmitted can be defined among the participants, and the model transmission is only carried out according to the defined structures, such as the mode of quantitatively encoding the model. The model compression can effectively reduce the parameter transmission quantity, improve the training efficiency of federal learning, protect the original parameters of the model from leakage to a certain extent, and improve the safety of the model. 2. And (3) reducing the communication times: the communication frequency of the feedback of the client is usually reduced, so that the client can update locally, and the updated information aggregation model is periodically sent to the central server at intervals or set fixed local model times. This periodic averaging scheme reduces the number of communications between the central server and the clients, thereby reducing the overall communication cost of the training model.

The first method described above speeds up communication transmission from the transmission content by compressing the model, which is usually at the expense of accuracy of the model, and the reasons that communication transmission is usually affected are mainly network delay, interruption, etc., and cannot solve the main problem. The second method reduces the communication frequency, so as to reduce the communication frequency, and the model transmission is performed after a plurality of rounds of local communication, which is very likely to cause local convergence and affect the final model effect.

Therefore, it is desirable to design a federal learning method that reduces communication overhead while guaranteeing the final model effect.

Disclosure of Invention

The embodiment of the invention provides a federal learning method, a federal learning device, a central server and a data terminal, which are used for solving the problems that in the prior art, the final model effect of federal learning cannot be ensured and the communication cost is reduced.

In order to solve the technical problems, the embodiment of the invention provides the following technical scheme:

in a first aspect, an embodiment of the present invention provides a federal learning method, applied to a central server, where the method includes:

receiving a model training result sent by each data end in a plurality of data ends; the model training result comprises a first training model parameter;

Fusion calculation is carried out on the plurality of first training model parameters to obtain global model parameters of the first-round federation learning, and the first global model parameters of the first-round federation learning and a first threshold value are sent to each data end;

receiving second training model parameters which are sent by each data end participating in the second-round federation learning and recalculate according to the first global model parameters of the first-round federation learning and the data amount of the data end, and carrying out fusion calculation on the second training model parameters to obtain second global model parameters of the second-round federation learning; whether to participate in the second federation learning is determined by the data end according to the first global model parameter, the first training model parameter and the first threshold value of the first federation learning;

and determining the second global model parameter as a converged global model parameter under the condition that the difference value between the second global model parameter and the first global model parameter is smaller than a preset value.

Optionally, the method further comprises:

when the difference value between the second global model parameter and the first global model parameter is larger than or equal to the preset value, sending a second global model parameter and a second threshold value of the second round of federation learning to a data end participating in the second round of federation learning;

Receiving third training model parameters which are sent by each data end participating in the third-round federation learning and are obtained by recalculating the second global model parameters and the data volume of the data end according to the second-round federation learning, and carrying out fusion calculation on the third training model parameters to obtain third global model parameters of the third-round federation learning; whether to participate in the third-round federal learning is determined by the data end according to the second global model parameter, the second training model parameter and the second threshold value of the second-round federal learning;

and determining the third global model parameter as the converged global model parameter under the condition that the difference value between the third global model parameter and the second global model parameter is smaller than the preset value.

Optionally, the first threshold and the second threshold are each determined from a round of federal learning.

Optionally, the model training result further includes: the first training model parameters send time stamps, data amount of the data end and physical machine information;

receiving a second training model parameter which is sent by each data end participating in the second round of federation learning and is recalculated according to the first global model parameter of the first round of federation learning and the data volume of the data end, wherein the second training model parameter comprises the following components:

Receiving second training model parameters which are sent by each data end participating in the second-round federation learning and recalculated according to the first global model parameters of the first-round federation learning and the data amount of the data end in the waiting time of the second-round federation learning;

the waiting time of the second-round federation learning is determined according to a first training model parameter sending time stamp, a first training model parameter reaching time stamp, data volume of the data end and physical machine information of the data end corresponding to the data end participating in the first-round federation learning.

Optionally, the method further comprises:

in the waiting time of the second round of federation learning, a second training model parameter sent by the first data end is not received, and in the waiting time of the nth round of federation learning, under the condition that the second training model parameter sent by the first data end is received, performing a performance calculation on the second training model parameter and a global model parameter obtained by the (N-1) th round of federation learning to obtain a first alignment degree;

under the condition that the first alignment degree is smaller than a threshold value corresponding to the N-th round of federation learning, performing fusion calculation on the second training model parameters sent by the first data end and the second training model parameters sent by each data end participating in the N-th round of federation learning to obtain global model parameters of the N-th round of federation learning;

Transmitting global model parameters and corresponding thresholds of the Nth round of federation learning to each data end participating in the Nth round of federation learning; wherein the first data end is one of the data ends of each data end participating in the second-round federation learning;

n is a positive integer greater than or equal to 3.

Optionally, the method further comprises:

determining the feedback time length corresponding to each data end according to the first training model parameter arrival time stamp, the first training model parameter sending time stamp, the data amount of the data end and the physical machine information of the data end corresponding to each data end participating in the first round of federal learning;

and determining the maximum time length in the feedback time length corresponding to each data end as the waiting time length of the second round of federal learning.

In a second aspect, an embodiment of the present invention further provides a federal learning method, applied to a data end, where the method includes:

initial model training is carried out, and a model training result is sent to a central server; the model training result comprises a first training model parameter;

receiving a first global model parameter and a first threshold value, which are transmitted by the central server, of the first federation learning of the central server;

And under the condition that participation in second-round federal learning is determined according to the first global model parameter, the first training model parameter and the first threshold value of the first-round federal learning, recalculating the received first global model parameter and data end data volume to obtain a second training model parameter and sending the second training model parameter to the central server so as to obtain a converged global model parameter by the central server.

Optionally, in the case of determining to participate in the second round of federal learning according to the first global model parameter, the first training model parameter, and the first threshold value of the first round of federal learning, before recalculating the received first global model parameter and the data volume to obtain the second training model parameter and sending the second training model parameter to the central server, the method further includes:

carrying out consistency calculation on the first global model parameters and the first training model parameters to obtain a second alignment degree;

and determining to participate in a second round of federal learning if the second alignment is greater than the first threshold.

In a third aspect, an embodiment of the present invention further provides a federal learning apparatus, applied to a central server, where the apparatus includes:

the first receiving module is used for receiving the model training result sent by each data end in the plurality of data ends; the model training result comprises a first training model parameter;

The first processing module is used for carrying out fusion calculation on the plurality of first training model parameters to obtain global model parameters of the first-round federal learning, and sending the first global model parameters of the first-round federal learning and a first threshold value to each data end;

the second processing module is used for receiving second training model parameters which are sent by each data end participating in the second round of federal learning and recalculate according to the first global model parameters of the first round of federal learning and the data volume of the data end, and carrying out fusion calculation on the second training model parameters to obtain second global model parameters of the second round of federal learning; whether to participate in the second federation learning is determined by the data end according to the first global model parameter, the first training model parameter and the first threshold value of the first federation learning;

and the first determining module is used for determining the second global model parameter as a converged global model parameter under the condition that the difference value between the second global model parameter and the first global model parameter is smaller than a preset value.

Optionally, the apparatus further comprises:

the first sending module is used for sending a second global model parameter and a second threshold value of the second round of federal learning to a data end participating in the second round of federal learning under the condition that the difference value between the second global model parameter and the first global model parameter is larger than or equal to the preset value;

The parameter receiving module is used for receiving third training model parameters which are sent by each data end participating in the third federation learning and are obtained by recalculating the second global model parameters and the data volume of the data end according to the second federation learning, and carrying out fusion calculation on the third training model parameters to obtain third global model parameters of the third federation learning; whether to participate in the third-round federal learning is determined by the data end according to the second global model parameter, the second training model parameter and the second threshold value of the second-round federal learning;

and the second determining module is used for determining the third global model parameter as the converged global model parameter under the condition that the difference value between the third global model parameter and the second global model parameter is smaller than the preset value.

the second processing module includes:

the first receiving unit is used for receiving second training model parameters which are sent by each data end participating in the second round of federal learning and recalculate according to the first global model parameters of the first round of federal learning and the data volume of the data end in the waiting time of the second round of federal learning;

Optionally, the second processing module further includes:

the first processing unit is used for performing a performance calculation on the second training model parameter and the global model parameter obtained by the N-1 th round of federation learning to obtain a first alignment degree under the condition that the second training model parameter transmitted by the first data end is not received in the waiting time of the second round of federation learning and the second training model parameter transmitted by the first data end is received in the waiting time of the N-1 th round of federation learning;

the second processing unit is used for carrying out fusion calculation on the second training model parameters sent by the first data end and the second training model parameters sent by each data end participating in the N-th round of federal learning under the condition that the first alignment degree is smaller than a threshold value corresponding to the N-th round of federal learning to obtain global model parameters of the N-th round of federal learning;

the first sending unit is used for sending global model parameters and corresponding thresholds of the Nth round of federation learning to each data end participating in the Nth round of federation learning and the first data end;

Wherein the first data end is one of the data ends of each data end participating in the second-round federation learning;

n is a positive integer greater than or equal to 3.

Optionally, the second processing module further includes:

a first determining unit for

and the second determining unit is used for determining that the maximum time length in the feedback time length corresponding to each data end is the waiting time length of the second round of federal learning.

In a fourth aspect, an embodiment of the present invention further provides a federal learning apparatus, applied to a data end, where the apparatus includes:

the third processing module is used for carrying out initial model training and sending a model training result to the central server; the model training result comprises a first training model parameter;

the second receiving module is used for receiving a first global model parameter and a first threshold value, which are transmitted by the central server, of the first federal learning of the central server;

And the fourth processing module is used for recalculating the received first global model parameters and data volume of the data end to obtain second training model parameters and sending the second training model parameters to the central server so as to obtain converged global model parameters by the central server under the condition that the participation in the second round of federal learning is determined according to the first global model parameters, the first training model parameters and the first threshold value of the first round of federal learning.

Optionally, the apparatus further comprises:

the fifth processing module is used for carrying out consistency calculation on the first global model parameter and the first training model parameter to obtain a second alignment degree;

and a third determining module, configured to determine to participate in a second round of federal learning if the second alignment degree is greater than the first threshold.

In a fifth aspect, an embodiment of the present invention further provides a central server, including: a processor, a memory, and a program stored on the memory and executable on the processor, which when executed by the processor, performs the steps of the federal learning method according to any of the first aspects.

In a sixth aspect, an embodiment of the present invention further provides a data terminal, including: a processor, a memory, and a program stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the federal learning method according to any of the second aspects.

In a seventh aspect, an embodiment of the present invention further provides a readable storage medium, where a program is stored, the program when executed by a processor implementing the steps of the federal learning method according to any one of the first aspects, or implementing the steps of the federal learning method according to any one of the second aspects.

The beneficial effects of the invention are as follows:

according to the scheme, the threshold value is set in each round of federation learning, so that the data end determines whether to participate in federation learning of the next round according to the global model parameter, the training model parameter and the corresponding threshold value of the previous round of federation learning, and the data end node with abnormality or inconsistent with the overall data condition can be eliminated from the data end, thereby realizing the final model effect of federation learning and reducing communication expenditure.

Drawings

FIG. 1 is a flow chart of a federal learning method applied to a central server according to an embodiment of the present invention;

FIG. 2 shows a flowchart of a federal learning method applied to a data end according to an embodiment of the present invention;

FIG. 3 is a flow chart of first round federal learning provided by an embodiment of the present invention;

FIG. 4 is a flow chart of federal learning cycle training provided by an embodiment of the present invention;

FIG. 5 is a flow chart of a federal learning method provided by an embodiment of the present invention;

FIG. 6 is a schematic diagram of a federal learning device according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a federal learning device according to an embodiment of the present invention;

FIG. 8 is a schematic diagram illustrating a structure of a central server according to an embodiment of the present invention;

fig. 9 is a schematic structural diagram of a data terminal according to an embodiment of the present invention.

Detailed Description

The present invention will be described in detail below with reference to the drawings and the specific embodiments thereof in order to make the objects, technical solutions and advantages of the present invention more apparent.

Aiming at the problems that in the prior art, the final model effect of federation learning cannot be ensured and communication overhead is reduced, the invention provides a federation learning method, a federation learning device, a central server and a data terminal.

As shown in fig. 1, an embodiment of the present invention provides a federal learning method, which is applied to a central server, and includes:

step 101: receiving a model training result sent by each data end in a plurality of data ends; the model training results include first training model parameters.

It should be noted that, the central server provided in the embodiment of the present invention is connected to a plurality of data terminals.

The first-round federal learning process at the data end comprises the following steps: each data end performs initial model training locally to obtain local model training results, wherein the model training results comprise first training model parameters G _i ¹ Wherein i is the ith data terminal.

After each data end obtains the local model training results, the first training model parameters G in the model training results are respectively sent to the central server _i ¹ . The central server sequentially receives the first training model parameters G sent by each data end in the first round of federal learning _i ¹ 。

Step 102: and carrying out fusion calculation on the plurality of first training model parameters to obtain first global model parameters of the first-round federation learning, and sending the first global model parameters of the first-round federation learning and a first threshold value to each data end.

In the process of first-round federation learning by the central server, sequentially receiving first training model parameters G sent by each data end _i ¹ Then, for all the first training model parameters G _i ¹ Performing integration calculation to obtain a first global model parameter G of first-round federal learning ¹ . Transmitting the first global model parameter G to each data end participating in first-round federal learning ¹ Initial threshold V ₁ (first threshold). The first threshold is determined by the central server, so that each subsequent data end can determine whether to participate in federal learning of the second round according to the first threshold.

The function of the global model parameters is:

G ^j ＝F(G ₁ ^j ,G ₂ ^j ,...,G _n ^j )

wherein j is the j-th federation learning round, and the global model parameters are the training model parameters G of each data end ₁ ^j ,G ₂ ^j ,...,G _n ^j Fusion calculation shows that the first global model parameter in the first round is G ¹ The first global model parameters are obtained by fusion calculation of the first training model parameters of each data end.

Step 103: receiving second training model parameters which are sent by each data end participating in the second-round federation learning and recalculate according to the first global model parameters of the first-round federation learning and the data amount of the data end, and carrying out fusion calculation on the second training model parameters to obtain second global model parameters of the second-round federation learning; whether to participate in the second federation study is determined by the data end according to the first global model parameter, the first training model parameter and the first threshold value of the first federation study.

It should be noted that, whether to participate in the second round of federal learning is determined by the data end according to the first global model parameter, the first training model parameter and the first threshold value of the first round of federal learning, and whether to participate in the next round of federal learning in the federal learning process of the third round and the subsequent rounds is determined by the data end according to the global model parameter, the training model parameter and the corresponding threshold value of the last round of federal learning.

Specifically, after the first round of federation learning, the data end and the central server respectively perform federation learning cycle training, and at the data end, each data end participating in the first round of federation learning receives a first global model parameter G of the first round of federation learning ¹ First global model parameter G for first-round federal learning ¹ With the first training model parameter G _i ¹ Performing consistency calculation to obtain a second alignment degree, and combining the second alignment degree with an initial threshold V ₁ Comparing if the second alignment degree of one data terminal is smaller than the initial threshold V ₁ The data of the local data end and the data of other data ends are regarded as larger differences and can be regarded as abnormal data, so that the data end does not need to feed back updated second training model parameters to the central server in the federal learning process of the second round; if the second alignment degree of one data terminal is greater than the initial threshold V ₁ And the data of the local data end is considered to be similar to the overall distribution of the data of other data ends, and the data conditions are similar, so that the data continues to participate in the federal learning of the second round. Each data end participating in the second round of federal learning is combined with a first global model parameter G ¹ And the local data volume m _i (data volume of data end) re-calculating to obtain second training model parameters G updated by federal learning of the second round _i ² And will update the second training model parameters G _i ² And sending the data to a central server. The central server receives updated second training model parameters G sent by the data end participating in the second round of federal learning _i ² 。

The data end performs consistency calculation on the global model parameters and the training model parameters as follows:

wherein N is the total number of parameters, r _j Is the second degree of alignment. The consistency is thatThe principle of the calculation function is as follows: and calculating how many parameters in the two updates have the same sign, and normalizing the result through the total number N of the parameters. The percentage of the same symbol parameter given in both updates is calculated and used to measure the alignment of the local updates.

The consistency calculation function provided by the embodiment of the invention can effectively update by the data end whether the data end follows the writing direction or is only an abnormal value, and the consistency of the comparison parameter symbol and the global model parameter is directly measured without considering the learning rate and the size of the local data set.

In the whole federal learning process, the training model parameters are calculated from the local data end, and the global model parameters fed back to the data ends by the central server are one federal learning round.

The central server receives updated second training model parameters G sent by the data end participating in the second round of federal learning _i ² Updated second training model parameters G sent by all data terminals _i ² Fusion calculation is carried out to obtain a second global model parameter G of the second round of federal learning ² . Transmitting the second global model parameter G to a data end participating in the second round of federal learning ² And the data side and the central server repeat the corresponding steps in step 103, respectively.

In the embodiment of the invention, in order to reduce the frequency of federal learning, a consistency calculation function for judging the overall model parameters and the training model parameters is added at the data end, and if the inconsistency is extremely large, the data of the data end is regarded as abnormal data, so that the data end does not continuously participate in federal learning, and the communication frequency of federal learning can be reduced. Specifically, the embodiment of the invention calculates and counts how many parameters exist in the two updates and has the same sign, normalizes the result through the total number of the parameters, calculates the percentage of the parameters with the same sign in the two updates, measures the consistency (second alignment degree) of the local update and the global update according to the percentage, and compares with the threshold value to judge whether the consistency is required to be proposed or not, thereby eliminating a data set which is not consistent with the condition of the whole data, and reducing unnecessary communication.

Step 104: and determining the second global model parameter as a converged global model parameter under the condition that the difference value between the second global model parameter and the first global model parameter is smaller than a preset value.

It should be noted that after each round of federation learning, the central server needs to calculate whether the difference between the global model parameter of the round of federation learning and the global model parameter of the previous round of federation learning is smaller than a preset value, that is, whether the obtained global model parameter converges, and if the difference is smaller than the preset value (the global model parameter converges), the federation learning is ended, that is, if the difference between the second global model parameter and the first global model parameter is smaller than the preset value, it is determined that the second global model parameter converges, and then the second global model parameter is the converged global model parameter.

According to the embodiment of the invention, the data end determines whether to participate in the federation learning of the next round according to the global model parameter, the training model parameter and the corresponding threshold value of the federation learning of the previous round by setting the threshold value in each round of federation learning, and the data end node with abnormality or inconsistent with the overall data condition can be eliminated from the data end, so that the final model effect of federation learning is ensured, and the communication cost is reduced.

Optionally, the method further comprises:

It should be further noted that, if after each round of federation learning, the central server calculates that the difference between the global model parameter of the federation learning round and the global model parameter of the previous federation learning round is greater than or equal to a preset value, that is, the obtained global model parameter does not converge, then the global model parameter of the federation learning round and the corresponding threshold value are sent to the data end participating in the federation learning round, and the next round of federation learning is continued, that is, the step of step 103 is continuously repeated until the global model parameter converges.

Specifically, under the condition that the difference value between the second global model parameter and the first global model parameter is larger than or equal to the preset value, the second global model parameter is not converged, the second global model parameter and a second threshold value of the second round of federation learning are sent to a data end participating in the second round of federation learning, the data end obtains a second alignment degree of the second round of federation according to the second global model parameter and the second training model parameter, if the second alignment degree is larger than the second threshold value, federation learning participating in a third round of federation is determined, the data end recalculates according to the second global model parameter and the data end data volume of the second round of federation learning to obtain a third training model parameter, fusion calculation is carried out on the third training model parameter to obtain a third global model parameter of the third round of federation, if the difference value between the third global model parameter and the second global model parameter is smaller than the preset value, the third global model parameter is determined to be the converged global model parameter, and otherwise, the fourth round of federation learning is continued until the global model parameter converges.

As a preferred embodiment, both the first threshold and the second threshold are determined from a round of federal learning.

That is, the first threshold is determined from a learning round of the first round of federal learning and the second threshold is determined from a learning round of the second round of federal learning.

Specifically, the corresponding threshold for each round of federal learning is determined by the central server according to the threshold function model as follows:

wherein j is the j-th federation learning round. In the first round of federal learning, the threshold was set to 0.5, and the threshold gradually increased as the number of rounds of federal learning increased.

Preferably, the model training result further comprises: the first training model parameters send time stamps, data amount of the data end and physical machine information;

That is, the model training results further include: training model parameter transmit timestamp t _i Data amount m of data end _i Physical machine information H _i . In the federal learning process of each round, the central server transmits a time stamp t according to the training model parameters transmitted by each data end _i Training model parameters reach a timestamp T _i Data amount m of data end _i Physical machine information H of data end _i Calculating, and measuring to obtain updated training of each data endFeedback duration S of training model parameters _i According to the feedback time length S of the updated training model parameters of each data end _i And determining the waiting time length, and collecting updated training model parameters sent by each data end participating in the next round of federation learning in the waiting time length in the next round of federation learning process, so as to continuously fuse in the federation learning process to obtain global model parameters of the federation learning.

In the process of the second-round federal learning, determining the waiting time of the second-round federal learning according to the first training model parameter sending time stamp, the first training model parameter reaching time stamp, the data volume of the data end and the physical machine information of the data end corresponding to the data end participating in the first-round federal learning, and receiving the first global model parameter and the second training model parameter which are recalculated according to the first global model parameter and the data volume of the data end sent by each data end participating in the second-round federal learning in the waiting time of the second-round federal learning.

Further, the method further comprises:

transmitting global model parameters and corresponding thresholds of the Nth round of federation learning to each data end participating in the Nth round of federation learning; the first data end is one of the data ends of each data end participating in the second-round federation learning;

n is a positive integer greater than or equal to 3.

It should be noted that, in the process of federal learning of a certain round, if the data end returns updated training model parameters to the central server within the waiting time period S, the signal of the data end is considered to be normal, the central server receives the updated training model parameters in the process of federal learning of the round, performs fusion calculation on the updated training model parameters to obtain global model parameters of federal learning of the round, and sends the global model parameters to the data end; if the data end does not return updated training model parameters to the central server within the waiting time length S, the signal of the data end is considered to be interrupted, and the central server does not send the global model parameters to the data end after obtaining the global model parameters of the federation learning of the present round in the federation learning process of the present round.

Further, in the process of federation learning of a certain round, if the central server receives the updated training model parameters sent by the data end with signal interruption, the central server compares the updated training model parameters with the fused global model parameters to obtain a first alignment, if the first alignment is smaller than the corresponding threshold value of federation learning of the round, the updated training model parameters sent by the data end are fused and calculated, and the global model parameters of federation learning of the round are fed back to the data end, so that the data end is considered to be re-accessed in the process of federation learning of the next round, namely, communication of the data end is restored.

In the second federal learning process, the central server does not receive the updated second training model parameters transmitted by the first data end in the waiting period of the second federal learning, and in the condition that the second training model parameters transmitted by the first data end are received in the waiting period corresponding to the fourth federal learning (N is 4), the central server performs consistency calculation on the second training model parameters transmitted by the first data end and the global model parameters of the third federal learning to obtain a first alignment degree, and if the first alignment degree is smaller than the corresponding threshold value V of the fourth federal learning ₄ The second training model parameters sent by the first data end and other participating fourth-wheel federation are combinedThe updated training model parameters sent by each learned data end are subjected to fusion calculation to obtain global model parameters of the fourth-round federal learning, and the global model parameters are sent to each data end and the first data end participating in the fourth-round federal learning and correspond to a threshold V ₄ The first data terminal is considered to be re-accessed.

As a preferred embodiment, the method further comprises:

That is, the central server determines the feedback duration S corresponding to each data end according to the arrival time stamp of the training model parameter, the transmission time stamp of the training model parameter, the data amount of the data end and the physical machine information of the data end, which are received by the central server in this round, sent by each data end _i Selecting the feedback time length S of each data end _i The largest value is the waiting time S corresponding to the next federal learning. In the next federation learning process, the central server waits for the duration S, and collects updated training model parameters sent by each data end participating in the next federation learning process, namely, the waiting duration S represents the cut-off duration of the feedback updated training model parameters of each data end collected by the central server in the federation learning of the subsequent round.

Specifically, in the first federation learning process of the central server, the central server sequentially receives model training results sent by each data end, and sends a time stamp t according to first training model parameters of each data end _i First training model parameter arrival time stamp T _i Data end physical machine information H of each data end _i Data amount m of data end _i Calculating and measuring the updated second training model parameter feedback time length S of each data end _i Determining each data endIs updated the second training model parameter feedback duration S _i The largest value in the number is the waiting time S.

The feedback duration of each data end is as follows:

S _i ＝T _i -t _i +f(m _i +M _i ),i＝1,...,n

wherein n is n data ends in the first round of federal learning process, f (m) _i +M _i ) Calculating a time length for the model, S _i The feedback time length of each data end.

According to the feedback time length of each data end, the formula for determining the waiting time length is as follows:

S＝max[S ₁ ，S ₂ ，...,S _n ]

wherein S is ₁ ，S ₂ ，...,S _n And S is the waiting time length for the feedback time length of each data terminal.

In the embodiment of the invention, the data end with the communication process or communication interruption is removed in the time waiting mechanism of the central server, so that unnecessary communication feedback is reduced, and the learning efficiency of federal learning is improved. In addition, a federal learning re-admission mechanism is added at the central server to recover the data end nodes that are communication delays and helpful to the global model. Specifically, the maximum waiting time is obtained by measuring the model calculation time and the transmission time of the data end, the data end exceeding the maximum waiting time does not participate in the federal learning of the round, the data end node with communication delay is removed, the problem of overlong waiting communication in the federal learning process is solved, a re-access mechanism of the data end node is also added, the data end node can perform model parameter consistency calculation when the central server receives the feedback of the data end because of slower feedback of the communication reason, the first alignment degree is obtained, and if the first alignment degree is larger than the current threshold, the communication of the data end is restored and the global model parameters of the data end node are fed back.

As shown in fig. 2, the embodiment of the present invention further provides a federal learning method, which is applied to a data end, and the method includes:

step 201: initial model training is carried out, and a model training result is sent to a central server; the model training results include first training model parameters.

It should be noted that, the central server is connected to a plurality of data terminals, and each data terminal executes the federal learning method in the embodiment of the present invention.

In the first-round federal learning process, the data end firstly performs initial model training to obtain respective local model training results, the model training results comprise first training model parameters, and after the respective data end obtains the respective local first model training results, the data end respectively sends the first training model parameters in the respective model training results to the central server.

Step 202: and receiving a first global model parameter and a first threshold value, which are transmitted by the central server, of the first federation learning of the central server.

And in the process of first-round federation learning, the central server sequentially receives the first training model parameters sent by each data end, and then performs integrated calculation on all the first training model parameters to obtain first global model parameters of the first-round federation learning, and sends the first global model parameters and an initial threshold (first threshold) to each data end participating in the first-round federation learning. The initial threshold is determined by the central server, so that each subsequent data end can determine whether to participate in the federal learning of the next round according to the initial threshold.

It should be noted that, the model training result further includes a first training model parameter sending timestamp, a data end data volume and data end physical machine information, the central server determines a waiting duration according to the first training model parameter arrival timestamp, the first training model parameter sending timestamp, the data end data volume and the data end physical machine information, the waiting duration represents a cut-off duration of the training model parameters updated by the feedback of each data end collected by the central server in the federal learning of the subsequent rounds.

According to the embodiment of the invention, the threshold value is set in each round of federation learning, so that the data end decides whether to participate in the federation learning of the next round according to the corresponding threshold value of the federation learning of the previous round, and the data end nodes with abnormality or inconsistent data and whole can be removed, thereby realizing the final model effect of federation learning and reducing communication expenditure.

The flow of first-round federation learning is described below with reference to fig. 3, taking the example that the central server connects two data ends (data end 1 and data end 2).

Step 1: the data end 1 and the data end 2 respectively perform initial model training to obtain a model training result, wherein the model training result comprises training model parameters (first training model parameters), a training model parameter sending time stamp (first training model parameter sending time stamp), data amount of the data end and physical machine information of the data end; step 2: the data end 1 and the data end 2 respectively send model training results to a central server; step 3: the central server calculates waiting time according to the training model parameters in the training model result, the data volume of the data end, the physical machine information of the data end and the arrival time stamp of the training model parameters, fuses the training model parameters to obtain global model parameters (first global model parameters), and calculates an initial threshold (first threshold); step 4: the central server sends global model parameters and initial thresholds to data terminal 1 and data terminal 2, respectively.

Step 203: and under the condition that participation in the second round of federal learning is determined according to the first global model parameter, the first training model parameter and the first threshold value of the first round of federal learning, recalculating the received first global model parameter and data volume of the data end to obtain a second training model parameter, and sending the second training model parameter to the central server so as to obtain a converged global model parameter by the central server.

That is, after receiving global model parameters of the previous round of federation learning, the data end determines whether to participate in the next round of federation learning according to the global model parameters, the training model parameters and the corresponding threshold values, if yes, recalculates according to the global model parameters and the data amount of the data end to obtain updated training model parameters, and sends the updated training model parameters to the central server, so that the central server determines whether to perform the next round of federation learning, wherein the central server needs to calculate whether a difference value between the global model parameters of the previous round of federation learning and the global model parameters of the previous round of federation learning is smaller than a preset value, that is, whether the obtained global model parameters are converged, if the difference value is smaller than the preset value (the global model parameters are converged), the federation learning is ended, if the difference value is larger than or equal to the preset value (the global model parameters are not converged), the global model parameters of the previous round of federation learning and the corresponding threshold values are sent to the data end of the previous round of federation learning, the next round of federation learning is continued, and the step is repeated, that is performed until the global model parameters are converged.

In an exemplary embodiment, when the participation in the second round of federal learning is determined according to the first global model parameter, the first training model parameter and the first threshold, the received first global model parameter and the data volume of the data end are recalculated to obtain a second training model parameter and sent to the central server, the central server obtains the second global model parameter according to the second training model parameter, if the central server determines that the difference between the second global model parameter and the first global model parameter is smaller than a preset value, federal learning is ended, the second global model parameter is determined to be a converged global model parameter, and if the central server determines that the difference between the second global model parameter and the first global model parameter is larger than or equal to the preset value, federal learning of the third round is continued until the global model parameter converges.

As a result of the fact that in a preferred embodiment,

under the condition that participation in the second round of federal learning is determined according to the first global model parameter, the first training model parameter and the first threshold value of the first round of federal learning, the received first global model parameter and data volume of the data end are recalculated to obtain a second training model parameter and sent to the central server, and the method further comprises:

Specifically, after receiving global model parameters of the first round of federation learning, the data end determines whether to participate in the next round of federation learning according to the first global model parameters, the first training model parameters and a first threshold value, wherein the process comprises the following steps: and (3) carrying out consistency calculation on the first global model parameter and the first training model parameter of the first round of federation learning to obtain a second alignment degree, comparing the second alignment degree with the initial first threshold value, and if the second alignment degree of one data end is smaller than the first threshold value, regarding that the data of the local data end and the data of other data ends have larger difference and can be regarded as abnormal data, so that in the next round of federation learning process, the data end does not need to feed back updated training model parameters to a central server, namely does not participate in the subsequent federation learning, and if the second alignment degree of one data end is larger than the first threshold value, regarding that the data of the local data end and the data of other data ends are distributed approximately in the whole, and the data condition is similar, so that the next round of federation learning is continuously participated.

Next, with reference to fig. 4, a flow of the federal learning cycle training will be described by taking an example in which a central server connects two data ends (data end 1 and data end 2).

Step 1: the data end 1 and the data end 2 respectively perform consistency calculation locally to obtain a second alignment degree, and compare the second alignment degree with a corresponding threshold (a first threshold) to determine whether to return to a locally updated training model parameter (a second training model parameter); step 2: the data end 1 and the data end 2 respectively send updated training model parameters to the central server; step 3: the central server fuses updated training model parameters respectively sent by the data end 1 and the data end 2 in the calculation waiting time to obtain global model parameters (second global model parameters), and calculates a corresponding threshold value (second threshold value) of the federation learning of the round; step 4: transmitting the global model parameters (second global model parameters) and the corresponding thresholds (second thresholds); the central server calculates whether the difference value between the global model parameter of the federation learning round and the global model parameter (the first global model parameter) of the federation learning round is smaller than a preset value, if the difference value is smaller than the preset value (the global model parameter converges), the federation learning is finished, and if the difference value is larger than or equal to the preset value, the steps are repeated until the global model parameter converges.

In the following, with reference to fig. 5, a flow of the federal learning method will be specifically described by taking an example in which a central server connects two data ends (data end 1 and data end 2).

The data end 1 and the data end 2 respectively calculate training model parameters; the central server calculates an initial threshold; the data end 1 and the data end 2 respectively send training model parameters, a training model parameter sending time stamp, data amount of the data end and physical machine information of the data end to the central server; the central server integrates the training model parameters, calculates to obtain global model parameters, and calculates to obtain waiting time according to the training model parameter arrival time stamp, the training model parameter sending time stamp, the data volume of the data end and the physical machine information of the data end; the central server feeds back global model parameters and corresponding thresholds to the data end 1 and the data end 2; the data end 1 and the data end 2 respectively calculate the alignment degree (second alignment degree) according to the global model parameter and the local training model parameter, judge whether the second alignment degree is larger than a corresponding threshold value, and if so, send updated training model parameters to the central server; the central server only collects updated training model parameters sent by each data end in the waiting time, calculates the updated training model parameters to obtain global model parameters, judges whether the global model parameters of the federal learning are converged, if yes, ends the federal learning, otherwise returns to the central server to integrate the training model parameters, calculates to obtain global model parameters, and carries out cyclic training according to the steps of the arrival time stamp of the training model parameters, the sending time stamp of the training model parameters, the data volume of the data end and the physical machine information of the data end to calculate to obtain the waiting time until the global model parameters are converged.

It should be noted that, in the above embodiments, all descriptions about the central server are applicable to the embodiments of the data end, and the same technical effects as those of the embodiments can be achieved.

As shown in fig. 6, an embodiment of the present invention further provides a federal learning apparatus, which is applied to a central server, and the apparatus includes:

a first receiving module 601, configured to receive a model training result sent by each of a plurality of data ends; the model training result comprises a first training model parameter;

the first processing module 602 is configured to perform fusion calculation on the plurality of first training model parameters to obtain global model parameters of the first federal learning, and send the first global model parameters of the first federal learning and a first threshold to each data end;

the second processing module 603 is configured to receive a second training model parameter that is sent by each data end participating in the second round of federal learning and recalculated according to the first global model parameter of the first round of federal learning and the data amount of the data end, and perform fusion calculation on the second training model parameter to obtain a second global model parameter of the second round of federal learning; whether to participate in the second federation learning is determined by the data end according to the first global model parameter, the first training model parameter and the first threshold value of the first federation learning;

A first determining module 604, configured to determine that the second global model parameter is a converged global model parameter if a difference between the second global model parameter and the first global model parameter is less than a preset value.

Optionally, the apparatus further comprises:

the second processing module 603 includes:

Optionally, the second processing module 603 further includes:

n is a positive integer greater than or equal to 3.

Optionally, the second processing module 603 further includes:

The first determining unit is used for determining the feedback duration corresponding to each data end according to the first training model parameter arrival time stamp, the first training model parameter sending time stamp, the data amount of the data end and the physical machine information of the data end corresponding to each data end participating in the first round of federal learning;

It should be noted that, the federal learning device provided in the embodiment of the present invention is a device capable of executing the federal learning method applied to the central server, so all embodiments of the federal learning method applied to the central server are applicable to the device, and the same or similar technical effects can be achieved.

As shown in fig. 7, an embodiment of the present invention further provides a federal learning device, applied to a data end, where the device includes:

the third processing module 701 is configured to perform initial model training and send a model training result to the central server; the model training result comprises a first training model parameter;

a second receiving module 702, configured to receive a first global model parameter and a first threshold value of a first federal learning of the central server, which are sent by the central server;

The fourth processing module 703 is configured to recalculate the received first global model parameter and data amount to obtain a second training model parameter and send the second training model parameter to the central server, where the first global model parameter, the first training model parameter, and the first threshold are determined to participate in the second federal learning according to the first federal learning round, so that the central server obtains the converged global model parameter.

Optionally, the apparatus further comprises:

It should be noted that, the federation learning device provided in the embodiment of the present invention is a device capable of executing the federation learning method applied to the data end, so all embodiments of the federation learning method applied to the data end are applicable to the device, and the same or similar technical effects can be achieved.

As shown in fig. 8, an embodiment of the present invention further provides a central server, including: a processor 800; and a memory 810 connected to the processor 800 through a bus interface, the memory 810 storing programs and data used by the processor 800 in performing operations, the processor 800 calling and executing the programs and data stored in the memory 810.

Wherein the central server further comprises a transceiver 820, the transceiver 820 being connected to the bus interface for receiving and transmitting data under the control of the processor 800; specifically, the processor 800 invokes and executes the programs and data stored in the memory 810, and the transceiver 820 performs the following processes:

receiving a model training result sent by each data end in a plurality of data ends; the model training results include first training model parameters.

The processor 800 performs the following process:

Optionally, the transceiver 820 is further configured to:

and under the condition that the difference value between the second global model parameter and the first global model parameter is larger than or equal to the preset value, sending a second global model parameter and a second threshold value of the second round of federal learning to a data end participating in the second round of federal learning.

The processor 800 is further configured to:

the transceiver 820 is specifically configured to:

Optionally, the processor 800 is specifically configured to:

n is a positive integer greater than or equal to 3.

Optionally, the processor 800 is specifically configured to:

Wherein in fig. 8, a bus architecture may comprise any number of interconnected buses and bridges, and in particular, one or more processors represented by processor 800 and various circuits of memory represented by memory 810, linked together. The bus architecture may also link together various other circuits such as peripheral devices, voltage regulators, power management circuits, etc., which are well known in the art and, therefore, will not be described further herein. The bus interface provides a user interface 830. The transceiver 820 may be a number of elements, i.e., including a transmitter and a receiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 800 is responsible for managing the bus architecture and general processing, and the memory 810 may store data used by the processor 800 in performing operations.

As shown in fig. 9, an embodiment of the present invention further provides a data terminal, including: a processor 900; and a memory 910 connected to the processor 900 through a bus interface, the memory 910 being configured to store programs and data used by the processor 900 when performing operations, the processor 900 calling and executing the programs and data stored in the memory 910.

Wherein the data terminal further comprises a transceiver 920, the transceiver 920 is connected to the bus interface, and is used for receiving and transmitting data under the control of the processor 900; specifically, the processor 900 calls and executes the programs and data stored in the memory 910, and the processor 900 performs the following processes:

initial model training is carried out, and a model training result is sent to a central server; the model training results include first training model parameters.

The transceiver 920 performs the following procedure:

and receiving a first global model parameter and a first threshold value, which are transmitted by the central server, of the first federation learning of the central server. The processor 900 performs the following process:

Optionally, before the processor 900 recalculates the received first global model parameter and the data volume to obtain the second training model parameter and sends the second training model parameter to the central server, where the first global model parameter, the first training model parameter and the first threshold are determined to participate in the second round of federal learning according to the first round of federal learning, the processor 900 is further configured to:

Wherein in fig. 9, a bus architecture may comprise any number of interconnected buses and bridges, and in particular one or more processors represented by processor 900 and various circuits of memory represented by memory 910, linked together. The bus architecture may also link together various other circuits such as peripheral devices, voltage regulators, power management circuits, etc., which are well known in the art and, therefore, will not be described further herein. The bus interface provides a user interface 930. Transceiver 920 may be a number of elements, including a transmitter and a receiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 900 is responsible for managing the bus architecture and general processing, and the memory 910 may store data used by the processor 900 in performing operations.

In addition, a specific embodiment of the present invention also provides a readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the steps of the federal learning method according to any one of the above.

In the several embodiments provided in this application, it should be understood that the disclosed methods and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may be physically included separately, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in hardware plus software functional units.

The integrated units implemented in the form of software functional units described above may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform part of the steps of the transceiving method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.

While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that various modifications and changes can be made without departing from the principles of the present invention, and such modifications and changes are intended to be within the scope of the present invention.

Claims

1. A federal learning method, for use with a central server, the method comprising:

2. The federal learning method according to claim 1, wherein the method further comprises:

3. The federal learning method according to claim 2, wherein the first threshold and the second threshold are each determined from a round of federal learning.

4. The federal learning method according to claim 1, wherein the model training results further comprise: the first training model parameters send time stamps, data amount of the data end and physical machine information;

5. The federal learning method according to claim 4, wherein the method further comprises:

n is a positive integer greater than or equal to 3.

6. The federal learning method according to claim 4, wherein the method further comprises:

7. A federal learning method, for application to a data side, the method comprising:

8. The federal learning method according to claim 7, wherein, in the case of determining to participate in the second round of federal learning based on the first global model parameter, the first training model parameter, and the first threshold value of the first round of federal learning, before recalculating the received first global model parameter and data-side data amount to obtain the second training model parameter and transmitting the second training model parameter to the central server, the method further comprises:

9. A federal learning apparatus for use with a central server, the apparatus comprising:

10. A federal learning apparatus for use on a data side, the apparatus comprising:

11. A central server, comprising: a processor, a memory and a program stored on the memory and executable on the processor, which when executed by the processor performs the steps of the federal learning method according to any one of claims 1 to 6.

12. A data terminal, comprising: a processor, a memory and a program stored on the memory and executable on the processor, which when executed by the processor performs the steps of the federal learning method according to claim 7 or 8.

13. A readable storage medium, wherein a program is stored on the readable storage medium, which when executed by a processor, implements the steps of the federal learning method according to any one of claims 1 to 6, or implements the steps of the federal learning method according to claim 7 or 8.