CN116050548A - Federated learning method and device, and electronic equipment - Google Patents

Federated learning method and device, and electronic equipment

Info

Publication number
CN116050548A
Authority
CN
China
Prior art keywords
model
data
training
target
client
Prior art date
Legal status
Granted
Application number
CN202310306787.3A
Other languages
Chinese (zh)
Other versions
CN116050548B (en)
Inventor
谢翀
陈永红
兰鹏
罗伟杰
赵豫陕
Current Assignee
Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Original Assignee
Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Priority to CN202310306787.3A
Publication of CN116050548A
Application granted
Publication of CN116050548B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/20 Ensemble learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The application discloses a federated learning method, a federated learning device and electronic equipment. The method comprises the following steps: sending a meta-model and initial model parameters to each client so that each client performs a first round of model training based on the meta-model, the initial model parameters, the client's local training data and a target patch model; determining a plurality of target clients from among the clients and collecting the first model parameters obtained by the current nth round of training of each target client; calculating, based on the first model parameters, second model parameters for the (n+1)th round of model training; sending the second model parameters to each target client so that each target client performs model retraining based on the meta-model, the second model parameters, its local training data and the target patch model to obtain a current second model; and re-determining a plurality of target clients and repeating the above steps until the current second model trained by each client satisfies the training conditions. The method and device can prevent the trained models from deviating.

Description

Federated learning method and device, and electronic equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular to a federated learning method, a federated learning device, and an electronic device.
Background
Federated learning is essentially a distributed machine learning framework. Its core idea is to perform distributed model training among a plurality of data sources that each hold local data, and to jointly train a model only by exchanging intermediate model parameters, without exchanging local individual or sample data, so that the raw data never leaves the local site.
However, although existing federated learning methods guarantee that plaintext data does not leave the local site, the models of some clients may deviate, resulting in a poor local simulation effect.
Disclosure of Invention
In view of the above, the invention provides a federated learning method, a federated learning device and an electronic device, aiming to solve the problem that existing federated learning methods easily cause the models trained by clients to deviate and thus lead to a poor local simulation effect.
To solve the above problems, the present application provides a federated learning method, including:
sending a meta model and initial model parameters to each client so that each client performs a first round of model training based on the meta model, the initial model parameters, training data local to the client and a target patch model;
determining a plurality of target clients for model retraining from among the clients, and collecting first model parameters of the first model obtained by the current nth round of training of each target client, wherein n is a positive integer;
calculating, based on the first model parameters of each target client and using a predetermined calculation mode, second model parameters for the (n+1)th round of model training;
transmitting the second model parameters to each target client so that each target client performs model retraining based on the received meta-model, the second model parameters, the client's local training data and a target patch model to obtain a current second model;
judging whether the current second model trained by each client satisfies the training conditions, re-determining a plurality of target clients for model retraining when the current second model trained by any client does not satisfy the training conditions, and stopping training when the second models trained by all clients satisfy the training conditions.
Optionally, before sending the meta-model and the initial model parameters to each client, the method further includes:
receiving the data types and the data identifiers of the training data corresponding to each data type sent by each client;
calculating, based on the data types and the number of data identifiers sent by each client, the proportion of the total data identifiers among the data types;
before each round of model training, determining a plurality of target data identifiers from the data identifiers corresponding to each target data type based on the data proportion and the target data types contained in each client, and sending each target data identifier to the corresponding client so as to redistribute the training data used by each client for model training.
Optionally, determining a plurality of target data identifiers from the data identifiers corresponding to each target data type based on the data proportion and the target data types contained in each client specifically includes:
determining the target data proportion corresponding to each client based on the data proportion and the target data types contained in that client;
and determining a plurality of target data identifiers from the data identifiers of each data type sent by the corresponding client based on the target data proportion of that client.
Optionally, calculating, based on the first model parameters of each target client and using a predetermined calculation mode, the second model parameters for the (n+1)th round of model training specifically includes:
determining the gradient parameter corresponding to each target client based on the first model parameters obtained by that target client's nth round of training and the historical model parameters obtained by its (n-1)th round of training;
calculating a target gradient parameter using a preset calculation formula based on the gradient parameters of the target clients;
and determining the second model parameters for the (n+1)th round of model training based on the target gradient parameter.
Optionally, the method further comprises: determining any one of a mapping patch model, a residual patch model and an internal patch model as the target patch model based on the task type of the model training task, the degree of difference between the server data and the client data, and the structure of the server meta-model.
Optionally, determining any one of a mapping patch model, a residual patch model and an internal patch model as the target patch model based on the task type of the model training task, the degree of difference between the server data and the client data, and the structural complexity of the server meta-model specifically includes:
when the task type is a monitoring task or a positioning task, determining the mapping patch model as the target patch model;
when the degree of difference between the server data and the client data is greater than a preset difference threshold, determining the residual patch model as the target patch model;
and when the structural complexity of the server meta-model is greater than a preset complexity, determining the internal patch model as the target patch model.
Optionally, the mapping patch model includes: a mapping network and an activation layer;
the residual patch model includes: a residual connection layer;
the internal patch model includes: a convolution layer and an activation layer.
In order to solve the above problems, the present application provides a federated learning method, which is applied to each client and includes:
receiving the meta-model and initial model parameters sent by a server, and performing a first round of model training based on the meta-model, the initial model parameters, the client's local training data and a target patch model;
receiving second model parameters sent by the server, wherein the second model parameters are calculated by the server based on the first model parameters obtained by the nth round of training of a plurality of target clients; and performing the (n+1)th round of model training based on the second model parameters sent by the server, the received meta-model, the client's local training data and the target patch model.
Optionally, before receiving the meta-model and the initial model parameters sent by the server, the method further includes:
transmitting the data types and the data identifiers of the training data corresponding to each data type to a server, so that the server calculates, based on the data types and the number of data identifiers transmitted by each client, the proportion of the total data identifiers among the data types, and so that, before each round of model training, the server determines a plurality of target data identifiers from the data identifiers corresponding to each target data type based on the data proportion and the target data types contained in each client;
and receiving each target data identifier sent by the server, determining from the local training data the target training data corresponding to each target data identifier, and performing model training based on the redistributed target training data.
Optionally, the target patch model includes any one of the following: a mapping patch model, a residual patch model and an internal patch model;
wherein the mapping patch model includes: a mapping network and an activation layer;
the residual patch model includes: a residual connection layer;
and the internal patch model includes: a convolution layer and an activation layer.
To solve the above problems, the present application provides a federated learning device, including:
the first sending module is used for sending the meta-model and the initial model parameters to each client so that each client carries out first-round model training based on the meta-model, the initial model parameters, training data of the local client and the target patch model;
the acquisition module is used for determining a plurality of target clients for model retraining from the clients and acquiring first model parameters of a first model obtained by the current nth round of training of the target clients, wherein n is a positive integer;
the calculation module is used for calculating, based on the first model parameters of each target client and using a predetermined calculation mode, second model parameters for the (n+1)th round of model training;
the second sending module is used for sending the second model parameters to each target client so that each target client performs model retraining based on the received meta-model, the second model parameters, training data of the local client and a target patch model to obtain a current second model;
the judging module is used for judging whether the current second model trained by each client satisfies the training conditions, re-determining, via the acquisition module, a plurality of target clients for model retraining when the current second model trained by any client does not satisfy the training conditions, and stopping training when the second models trained by all clients satisfy the training conditions.
To solve the above problems, the present application provides a federated learning device, including: a receiving module and a model training module;
the receiving module is used for receiving the meta-model and initial model parameters sent by the server and for receiving second model parameters sent by the server, wherein the second model parameters are calculated by the server based on the first model parameters obtained by the nth round of training of a plurality of target clients;
the model training module is used for performing a first round of model training based on the meta-model, the initial model parameters, the client's local training data and a target patch model, and for performing the (n+1)th round of model training based on the second model parameters sent by the server, the received meta-model, the client's local training data and the target patch model.
To solve the above problems, the present application provides an electronic device including at least a memory and a processor, where the memory stores a computer program and the processor, when executing the computer program stored in the memory, implements the steps of any of the federated learning methods described above.
According to the federated learning method, the federated learning device and the electronic equipment, during each round of model training the target clients that need to perform the next round of model training are determined from among the clients, and the model parameters for the next round of model training are then calculated based on the current model parameters of each target client, so that the model parameters are determined more reasonably and accurately. When a target client performs the next round of training, it can train based on the patch model and the parameters calculated by the server, so that a model fitting the local simulation conditions is obtained through accurate training, the problem of the finally trained model deviating is avoided, and the model trained by each client has a good local simulation effect.
The foregoing is only an overview of the technical solution of the present invention. In order that the technical means of the invention may be understood more clearly and implemented in accordance with the content of the specification, and to make the above and other objects, features and advantages of the invention more readily apparent, specific embodiments of the invention are set forth below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a flow chart of a federated learning method according to an embodiment of the present application;
FIG. 2 is a flow chart of a federated learning method according to yet another embodiment of the present application;
FIG. 3 is a block diagram of a federated learning device according to another embodiment of the present application;
FIG. 4 is a block diagram of a federated learning device according to another embodiment of the present application;
FIG. 5 is a block diagram of an electronic device according to another embodiment of the present application.
Detailed Description
Various aspects and features of the present application are described herein with reference to the accompanying drawings.
It should be understood that various modifications may be made to the embodiments of the application herein. Therefore, the above description should not be taken as limiting, but merely as exemplification of the embodiments. Other modifications within the scope and spirit of this application will occur to those skilled in the art.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the application and, together with a general description of the application given above and the detailed description of the embodiments given below, serve to explain the principles of the application.
These and other characteristics of the present application will become apparent from the following description of a preferred form of embodiment, given as a non-limiting example, with reference to the accompanying drawings.
It is also to be understood that, although the present application has been described with reference to some specific examples, those skilled in the art can certainly realize many other equivalent forms of the present application.
The foregoing and other aspects, features, and advantages of the present application will become more apparent in light of the following detailed description when taken in conjunction with the accompanying drawings.
Specific embodiments of the present application will be described hereinafter with reference to the accompanying drawings; however, it is to be understood that the disclosed embodiments are merely exemplary of the application, which can be embodied in various forms. Well-known and/or repeated functions and constructions are not described in detail to avoid obscuring the application with unnecessary or excessive detail. Therefore, specific structural and functional details disclosed herein are not intended to be limiting, but merely serve as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present application in virtually any appropriately detailed structure.
The specification may use the words "in one embodiment," "in another embodiment," "in yet another embodiment," or "in other embodiments," each of which may refer to one or more of the same or different embodiments in accordance with the present application.
An embodiment of the present application provides a federated learning method, which may be specifically applied to an electronic device such as a server. As shown in fig. 1, the federated learning method in this embodiment includes the following steps:
step S101, a meta model and initial model parameters are sent to each client so that each client performs a first round of model training based on the meta model, the initial model parameters, training data of the local client and a target patch model;
In this step, when the training round is the first round, the server may send the pre-stored meta-model and the initial model parameters of the meta-model to each client, so that each client performs local model training based on its local training data, the received meta-model and the received initial model parameters, to obtain the first model.
Step S102, determining a plurality of target clients for model retraining from the clients, and collecting first model parameters of a first model obtained by current n-th round training of the target clients, wherein n is a positive integer;
In this step, after the first round of model training is completed, in each subsequent round of model training, the server needs to screen out some of the clients as target clients. Specifically, 20% of the clients may be randomly selected as target clients; the specific percentage can be set and adjusted according to actual needs. After determining the target clients, the server may obtain the model parameters of the first model obtained by the current training of each target client. In the implementation process, the server may further acquire the identifier of each target client and the number of communication rounds between the server and the clients, and store them in association with the collected first model parameters.
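For illustration only, the random sampling of target clients described in this step could be sketched as follows; the function and parameter names are assumptions of this sketch, not taken from the patent, and the 20% fraction is the adjustable example value mentioned above.

```python
import random

def sample_target_clients(client_ids, fraction=0.2, seed=None):
    """Randomly pick a subset of clients as the target clients for the next round.
    The 0.2 default mirrors the 20% example above and can be adjusted to actual needs."""
    rng = random.Random(seed)
    k = max(1, round(len(client_ids) * fraction))
    return rng.sample(client_ids, k)

# Example: 10 clients, 20% of them sampled as target clients.
targets = sample_target_clients([f"client_{i}" for i in range(1, 11)], fraction=0.2, seed=0)
```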
Step S103, calculating, based on the first model parameters of each target client and using a preset calculation mode, second model parameters for the (n+1)th round of model training;
In the implementation process, after the server obtains the first model parameters of the current first model of each target client, it can calculate, based on those first model parameters, the model parameters that the target clients will use for the next round of model training, namely the second model parameters. The calculation may be a weighted calculation that yields a relative (aggregated) value.
Step S104, the second model parameters are sent to each target client so that each target client carries out model retraining based on the received meta-model, the second model parameters, training data of the local client and a target patch model to obtain a current second model;
In this step, any one of the mapping patch model, the residual patch model and the internal patch model may specifically be determined as the target patch model. In this step, each client adds the patch module when performing model training, which achieves a better initialization, compensates for the weakness of the client performing only a one-step training update on the model, and makes the finally trained model more accurate.
Step S105, judging whether the current second model trained by each client satisfies the training conditions, and re-determining a plurality of target clients for model retraining when the current second model trained by any client does not satisfy the training conditions; and stopping training when the second models trained by all clients satisfy the training conditions.
In this step, the model training conditions may be preset according to actual needs, so that each time a round of model training is completed, whether the model of each client meets the requirements can be judged based on the training conditions. When the model of any client does not meet the requirements, model training needs to continue and the process returns to step S102; when the models of all clients meet the requirements, training can be finished, i.e., federated learning stops.
According to the federated learning method, during each round of model training the target clients that need to perform the next round of model training are determined from among the clients, and the model parameters for the next round of model training are then calculated based on the current model parameters of each target client, so that the model parameters are determined more reasonably and accurately. When a target client performs the next round of training, it can train based on the patch model and the parameters calculated by the server, so that a model fitting the local simulation conditions is obtained through accurate training, the problem of the finally trained model deviating is avoided, and the model trained by each client has a good local simulation effect.
Based on the foregoing embodiment, a further embodiment of the present application provides a federated learning method. In this embodiment, for each round of model training, the server may further reallocate to each client the target training data used for model training, based on the data identifiers and data types of the local training data uploaded by each client. That is, before executing step S101, i.e., before the server sends the meta-model and the initial model parameters to each client, the server may also receive the data types sent by each client and the data identifier of each piece of training data corresponding to each data type; then, based on the data types and the number of data identifiers sent by each client, calculate the proportion of the total data identifiers among the data types. Accordingly, before each round of model training, the server can determine a plurality of target data identifiers from the data identifiers corresponding to each target data type based on the data proportion and the target data types contained in each client, and send the target data identifiers to the corresponding clients so as to redistribute the training data each client uses for model training.
Taking an optical character recognition (Optical Character Recognition, OCR) task in the image field as an example, the server needs to collect statistics on the OCR data of all participants (the clients participating in federated learning) before initiating the federated learning task. These statistics do not obtain the clients' real data; instead, each client is asked to count the data types and data identifiers of its local OCR dataset, and the server, acting as the global coordinator, uniformly allocates training data to all clients based on these identifiers. Taking clients A, B and C as an example, each client can configure data identifiers for its local training data in advance and determine the data type corresponding to each piece of training data. A data identifier can be any information that uniquely characterizes a piece of training data, such as an ID, a picture, a two-dimensional code or a character string. Client A sends the data types A, B and C of its local training data to the server, together with data identifiers 11, 12 and 13 of the training data corresponding to data type A, data identifiers 21, 22, 23 and 24 corresponding to data type B, and data identifiers 31, 32 and 33 corresponding to data type C. Similarly, client B sends the data types C and D of its local training data to the server, together with data identifiers 34 and 35 corresponding to data type C and data identifiers 41, 42 and 43 corresponding to data type D. Likewise, client C sends the data types B and D of its local training data to the server, together with data identifiers 25 and 26 corresponding to data type B and data identifiers 44, 45 and 46 corresponding to data type D.
The server therefore receives the data types and the data identifiers of the training data corresponding to each data type sent by client A, client B and client C. The server can then count the number of data identifiers corresponding to each data type to obtain the total amount of training data per data type, and calculate the proportion among the data types from these totals. That is, the server counts 3 data identifiers in total for data type A, 6 for data type B, 5 for data type C and 6 for data type D, and thus determines that the proportion of the total data identifiers among the data types is data type A : data type B : data type C : data type D = 3:6:5:6. After determining this proportion, the server can, for each round of model training, determine a plurality of target data identifiers from the data identifiers corresponding to each target data type based on the data proportion and the target data types contained in each client.
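The counting described above can be illustrated with a short Python sketch. The dictionary layout and function name are assumptions made for illustration; the identifiers are abbreviated to the numbers used in the example.

```python
from collections import Counter

def total_identifiers_per_type(client_reports):
    """client_reports maps each client to {data type: [data identifiers]} as uploaded.
    Returns the total number of data identifiers per data type across all clients."""
    totals = Counter()
    for per_type in client_reports.values():
        for data_type, identifiers in per_type.items():
            totals[data_type] += len(identifiers)
    return dict(totals)

reports = {
    "client A": {"A": [11, 12, 13], "B": [21, 22, 23, 24], "C": [31, 32, 33]},
    "client B": {"C": [34, 35], "D": [41, 42, 43]},
    "client C": {"B": [25, 26], "D": [44, 45, 46]},
}
print(total_identifiers_per_type(reports))  # {'A': 3, 'B': 6, 'C': 5, 'D': 6}, i.e. 3:6:5:6
```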
In this embodiment, by receiving the data identifiers sent by the clients, the training data can subsequently be redistributed based on those identifiers, avoiding the local data leakage that would result from receiving the training data directly. At the same time, by determining the data proportion, the server gains an accurate view of the global distribution of the overall training data, so that training data for model training can be allocated to the clients reasonably based on that global distribution, improving the generalization performance of every client's model.
In yet another embodiment of the present application, after obtaining the data proportion, the server may determine the target data identifiers as follows: determine the target data proportion corresponding to each client based on the data proportion and the target data types contained in that client, and then determine a plurality of target data identifiers from the data identifiers of each data type sent by the corresponding client based on that client's target data proportion. For example, the server determines that the proportion of the total data identifiers among the data types is data type A : data type B : data type C : data type D = 3:6:5:6. According to the data types A, B and C contained in client A, the server determines that the target data proportion corresponding to client A is 3:6:5. According to the data types C and D contained in client B, the server determines that the target data proportion corresponding to client B is 5:6. Similarly, according to the data types B and D contained in client C, the server determines that the target data proportion corresponding to client C is 6:6. When determining the target data identifiers for client A, the server can select a corresponding number of data identifiers from data type A, data type B and data type C according to the proportional relationship data type A : data type B : data type C = 3:6:5, thereby obtaining the target data identifiers corresponding to client A. In the same way, the target data identifiers corresponding to client B and to client C can be determined. The server can then send the target data identifiers to the corresponding clients, so that each client can reasonably determine, from its local training data, the target training data to use for model training according to the received target data identifiers.
In this embodiment, the target data proportion corresponding to each client is determined based on the global data proportion, so that the target training data allocated to each client better matches the global distribution of the training data, which provides a guarantee for subsequently training a model with high generalization on that target training data.
Based on the foregoing embodiments, a further embodiment of the present application provides a federated learning method in which the second model parameters are calculated from the first model parameters of each target client as follows: determine the gradient parameter corresponding to each target client based on the first model parameters obtained by that target client's nth round of training and the historical model parameters obtained by its (n-1)th round of training; calculate a target gradient parameter using a preset calculation formula based on the gradient parameters of the target clients; and determine the second model parameters for the (n+1)th round of model training based on the target gradient parameter. For example, when the server receives the nth-round first model parameters returned by target client A and target client B, it further combines them with target client A's and target client B's historical first model parameters from the (n-1)th round to calculate the gradient parameter corresponding to target client A and the gradient parameter corresponding to target client B, and then weights these two gradient parameters to obtain the second model parameters that target client A and target client B will use in the (n+1)th round.
Specifically, after the gradient parameters of target client A and target client B are obtained, the relative gradient parameter (the target gradient parameter) may be obtained by computing a relative value from each client's data volume and gradient parameter. A specific calculation formula may be: target gradient parameter = (x × gradient parameter of client A + y × gradient parameter of client B) / (x + y), where x represents the data volume of client A and y represents the data volume of client B. After the server calculates the target gradient parameter, it may use the target gradient parameter to determine the second model parameters, that is, use the target gradient parameter to update the current model parameters of the server meta-model, thereby obtaining the second model parameters.
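Written out, the weighting above is simply a data-volume-weighted average of the per-client gradient parameters; generalizing the two-client formula to K target clients with data volumes x_k and gradient parameters g_k gives:

\[
g_{\text{target}} = \frac{x\, g_A + y\, g_B}{x + y},
\qquad\text{and more generally}\qquad
g_{\text{target}} = \frac{\sum_{k=1}^{K} x_k\, g_k}{\sum_{k=1}^{K} x_k}.
\]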
In this embodiment, before federated learning is performed, each client may determine any one of the mapping patch model, the residual patch model and the internal patch model as the target patch model based on the task type of the model training task, the degree of difference between the server data and the client data, and the structure of the server meta-model. Alternatively, the server may make this determination based on the same factors. Specifically, when the task type is a monitoring task or a positioning task, the mapping patch model is determined as the target patch model; when the difference between the server data and the client data is greater than a preset difference threshold, the residual patch model is determined as the target patch model; and when the structural complexity of the server meta-model is greater than a preset complexity, the internal patch model is determined as the target patch model.
That is, the three patch structures are designed for different application scenarios, and this embodiment can decide which one to use according to the characteristics of the scenario, as follows. Mapping patch structure: this structure adopts a mapping network layer plus an activation layer. Its principle is to apply a nonlinear transformation to the original data features and capture the important features relevant to the scenario's requirements, ensuring that these features are extracted well enough to support the upper-layer application; this patch structure can be adopted when the upper-layer application needs to capture scenario-specific elements, for example monitoring and positioning. Residual patch structure: this structure contains a residual connection layer that connects the initial model with the locally trained model; its principle is to pass the original feature information into the local model through the residual connection, which prevents global information from being forgotten during local model training. It is mainly suited to scenarios where the server data differs considerably from the local training data; for example, province A has trained a server model for promoting a service within the province and next wants to promote the service in province B, but because the client groups served in provinces A and B differ, the data distributions also differ, and this patch structure gives a better result when extending the application. Internal patch structure: this structure uses a convolution layer plus an activation layer and, compared with the two structures above, handles the case of a large initial model more easily, which is common in natural language processing. Compared with computer vision, initial models in natural language processing are larger in scale and more complex in structure, so internal patches need to be added specifically at key internal locations; for upper-layer applications such as information analysis, element extraction and text generation, this patch structure works better.
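The selection rule in this paragraph can be summarized in a few lines of Python. The thresholds, default branch and names below are assumptions for illustration; the patent only requires comparison against preset thresholds and does not fix an ordering among the checks.

```python
def choose_target_patch(task_type, data_difference, model_complexity,
                        difference_threshold=0.5, complexity_threshold=1e8):
    """Pick a patch structure from the scenario features described above.
    Threshold values are placeholders; the check order is an assumption of this sketch."""
    if task_type in ("monitoring", "positioning"):
        return "mapping patch"    # upper-layer application needs scenario-specific elements
    if data_difference > difference_threshold:
        return "residual patch"   # server data and local data differ strongly
    if model_complexity > complexity_threshold:
        return "internal patch"   # large, complex meta-model (e.g. NLP-scale)
    return "mapping patch"        # default choice, assumed here for completeness
```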
In the implementation process of this embodiment, the meta-model may specifically be a DBNet detection model, a convolutional recurrent neural network (Convolutional Recurrent Neural Network, CRNN) or the like, which is not limited here.
According to the federated learning method described above, the local training data of each client is taken into account, the data is then redistributed based on that local training data, and model training is performed by combining the patch model with the local training data, which avoids the problem of the finally trained model deviating and gives each client's trained model a good local simulation effect.
Another embodiment of the present application provides a federated learning method which, as shown in fig. 2, may specifically be applied to each client. The method in this embodiment includes the following steps:
step S201, receiving a meta-model and initial model parameters sent by a server, and performing a first round of model training based on the meta-model, the initial model parameters, training data of a client local area and a target patch model;
step S202, receiving second model parameters sent by a server, wherein the second model parameters are obtained by calculating first model parameters obtained by the server based on n-th round training of a plurality of target clients; and performing model training of the n+1th round based on the second model parameters, the received meta-model, training data of the client and the target patch model sent by the server.
That is, in the case that the number of training rounds is the first round, the client receives the meta-model and the initial model parameters sent by the server, and then the client can perform the first round of model training based on the meta-model, the initial model parameters, training data local to the client and the target patch model;
When the training round is not the first round, each target client receives the second model parameters sent by the server, which are calculated by the server based on the first model parameters obtained by the nth round of training of the plurality of target clients; each target client can then perform the (n+1)th round of model training based on the second model parameters sent by the server, the received meta-model, its local training data and the target patch model, and training stops once the current second model trained by each client satisfies the training conditions.
According to the federated learning method in this embodiment, the server determines from among the clients the target clients that need the next round of model training, and then calculates the model parameters for that next round based on the current model parameters of the target clients, so that the model parameters are determined more reasonably and accurately. When a target client subsequently performs the next round of training, it can train based on the patch model and the parameters calculated by the server, so that a model fitting the local simulation conditions is obtained through accurate training, the problem of the finally trained model deviating is avoided, and the finally trained model has a good local simulation effect.
In another embodiment of the present application, in order to improve the generalization performance of the model trained locally by a client, before federated learning is performed each client may also send the data types of its local training data and the data identifiers of the training data corresponding to each data type to the server, so that the server reallocates to the client the training data used for model training based on those data types and identifiers. Specifically, each client can configure data identifiers for its local training data in advance and determine the data type corresponding to each piece of training data; a data identifier can be any information that uniquely characterizes a piece of training data, such as an ID, a picture, a two-dimensional code or a character string. Taking clients A, B and C as an example, client A sends the data types A, B and C of its local training data to the server, together with data identifiers 11, 12 and 13 of the training data corresponding to data type A, data identifiers 21, 22, 23 and 24 corresponding to data type B, and data identifiers 31, 32 and 33 corresponding to data type C; client B likewise sends the data types and data identifiers of its local training data to the server, as does client C. The server therefore receives the data types and the corresponding data identifiers sent by client A, client B and client C. It can then count the number of data identifiers for each data type to obtain the total amount of training data per data type, and calculate the proportion among the data types from these totals. Further, after calculating the data proportion, the server may determine the target data proportion corresponding to each client based on the data proportion and the target data types contained in that client. Thus, for each round of model training, the server can determine a plurality of target data identifiers from the data identifiers corresponding to each client according to that client's target data proportion, and send them to the corresponding client. After receiving the target data identifiers sent by the server, the client can determine from its local training data the target training data corresponding to those identifiers, and then perform model training based on the redistributed target training data.
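On the client side, keeping only the reassigned samples can be sketched as below; the data layout and function name are assumed purely for illustration.

```python
def select_target_training_data(local_data, target_identifiers):
    """local_data maps each data identifier to its locally stored training sample.
    Returns only the samples whose identifiers the server reassigned to this client."""
    return {i: local_data[i] for i in target_identifiers if i in local_data}

# Example: the client holds samples 11, 12 and 13 and is assigned identifiers 11 and 13.
subset = select_target_training_data({11: "img_11", 12: "img_12", 13: "img_13"}, [11, 13])
```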
In this embodiment, the client configures the data identifier for the training data and sends the data identifier to the server, so that the server can conveniently redistribute the training data based on the data identifier, and the problem of local data leakage caused by directly sending the training data to the server is avoided. By receiving the target data identifier sent by the server, the target training data matched with the global distribution condition of the training data can be obtained based on the target data identifier, and a guarantee is provided for training to obtain a model with good generalization performance.
In the implementation process of this embodiment, the client feeds the first model parameters obtained in the nth round of training back to the server as follows: after the server determines the target clients that need to perform model retraining, it sends a Message to each target client. Receipt of the Message indicates to a target client that the second model obtained by its current nth round of training does not satisfy the model training conditions and that the next round, the (n+1)th round, of model training is needed; the target client therefore treats the second model obtained in the current round as the first model and feeds the first model parameters of that model back to the server according to the Message, so that the server can calculate the second model parameters for the (n+1)th round of model training based on the nth-round first model parameters of each target client.
In this embodiment, the server determines from among the clients the target clients that need the next round of model training, and then calculates the model parameters for that next round based on the current model parameters of the target clients, so that the model parameters are determined more reasonably and accurately. When a target client subsequently performs the next round of training, it can train based on the patch model and the parameters calculated by the server, so that a model fitting the local simulation conditions is obtained through accurate training, the problem of the finally trained model deviating is avoided, and the finally trained model has a good local simulation effect.
In a specific implementation of this embodiment, the target patch model includes any one of the following: a mapping patch model, a residual patch model and an internal patch model; wherein the mapping patch model includes a mapping network and an activation layer, the residual patch model includes a residual connection layer, and the internal patch model includes a convolution layer and an activation layer. Specifically, before federated learning, each client may determine any one of the mapping patch model, the residual patch model and the internal patch model as the target patch model based on the task type of the model training task, the degree of difference between the server data and the client data, and the structure of the server meta-model. Alternatively, the server may make this determination based on the same factors and then send the target patch model to each client, so that each client can subsequently perform model training in combination with the target patch model, laying a foundation for improving the model training effect.
On the basis of the above embodiments, the federated learning method of the present application is explained below in conjunction with a specific application scenario. As shown in fig. 3, the federated learning method in this embodiment includes the following processes:
Step one, the server collects the requirements of all participants (client 1, client 2, ..., client n participating in federated learning). Taking an OCR task in the image field as an example, before initiating the federated learning task the server counts the data identifiers and data types of all participants' OCR data. Based on the data types and the number of data identifiers sent by each client, the server calculates the proportion of the total data identifiers among the data types, and then determines the target data proportion corresponding to each client based on that proportion and the data types of each client's data. Before each round of model training, the server determines a plurality of target data identifiers from the data identifiers of each data type sent by the corresponding client based on that client's target data proportion, thereby distributing data to all clients and allocating target training data to each client.
Step two, the server initiates the federated learning task and drives all clients to perform federated learning. The server establishes a message queue with each client for subsequent data communication, and at the same time starts a monitoring process; if communication between the server and any party's client fails, the corresponding message queue is restarted and the message is retransmitted.
Step three, after initiating the federated learning task, the server initializes a set of initial model parameters as the initial values of all client models and sends them to each client through the message queue established with that client. At the same time, the server can also send the locally stored meta-model to each client.
In the implementation process of this embodiment, the meta-model may specifically be a DBNet detection model, a convolutional recurrent neural network (Convolutional Recurrent Neural Network, CRNN) or the like.
Step four, after receiving the initial model parameters and the meta-model sent by the server, the client performs a first round of model training according to the meta-model, the initial model parameters, the target training data obtained based on the target data identifiers, and the target patch model, thereby obtaining a first model.
In this step, after receiving the meta-model, the initial model parameters and the target data identifiers sent by the server, each of client 1, client 2, ..., client n determines, based on the target data identifiers, the target training data corresponding to those identifiers from its local training data, and then performs the first round of model training in combination with the meta-model, the initial model parameters and the target patch model.
Step five, after a round of model training is completed, the server judges whether the current model trained by each client satisfies the preset model training conditions; when the model training conditions are not met, the server samples the clients and determines the target clients for the next round of local update operations.
Specifically, the server sends a Message to the corresponding target clients based on the identifier information of the sampled target clients, for example sending the Message to target client 2 and target client 5 based on their ids. The target client may further take the first model parameters obtained in the current round of training and the first model parameters obtained in the previous round of training, calculate a gradient parameter from them, and then send the gradient parameter to the server.
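One plausible reading of this gradient calculation, sketched in Python; the sign convention, flat parameter layout and function name are assumptions of this sketch, since the patent only states that the gradient parameter is computed from the current and previous first model parameters.

```python
def client_gradient_parameter(previous_params, current_params):
    """Per-parameter difference between the previous round and the current round, taken here
    as the 'gradient parameter' the target client sends back (sign convention assumed)."""
    return [p_prev - p_curr for p_prev, p_curr in zip(previous_params, current_params)]
```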
In this embodiment, the server may sample the clients randomly, and the number sampled can be set according to actual needs, for example 20% of the total number of clients; that is, 20% of all clients are randomly selected as target clients.
Step six, after receiving the first model parameters fed back by each target client, the server combines them with that target client's first model parameters from the previous round to obtain the gradient parameters; alternatively, the server directly receives the gradient parameters fed back by each target client. Then, according to the gradient parameters and data volume of each target client, the gradients needed to update the meta-model are weighted and aggregated to obtain the target gradient parameters.
That is, assuming there are two target clients, client A with 100 pieces of data and client B with 1000 pieces of data, the gradient (target gradient parameter) the server's meta-model needs to apply is (100 × gradient of client A + 1000 × gradient of client B) / 1100 = (1 × gradient of client A + 10 × gradient of client B) / 11.
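The same weighting, written as a small helper; parameter vectors are represented here as flat lists of floats purely for illustration.

```python
def aggregate_target_gradient(gradients, data_volumes):
    """Weighted aggregation of per-client gradient parameters by local data volume.
    With volumes 100 and 1000 this reproduces the 1/11 and 10/11 weights in the example above."""
    total = sum(data_volumes)
    length = len(gradients[0])
    return [
        sum(volume * grad[i] for volume, grad in zip(data_volumes, gradients)) / total
        for i in range(length)
    ]

g = aggregate_target_gradient([[0.2, -0.1], [0.4, 0.3]], [100, 1000])
# g[0] == (100 * 0.2 + 1000 * 0.4) / 1100
```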
Step seven, having calculated the target gradient parameters, the server can determine the second model parameters for the next round of model training based on the target gradient parameters and the current model parameters of the server meta-model.
In this step, after determining the second model parameters, the server may further update the current model parameters of the meta-model based on them, so that the meta-model carries the updated model parameters (the second model parameters). The server can send the updated model parameters to the target clients. Meanwhile, the server can re-determine the target data identifiers corresponding to each target client based on that client's target data proportion and send them to the corresponding target client, so that each client can re-determine its target training data based on the target data identifiers.
In this step, the second model parameters specifically refer to the model parameters other than the target patch model parameters; likewise, the target gradient parameters refer to the gradients of the model parameters other than the target patch model parameters.
Step eight, after obtaining the second model parameters, each target client can perform model retraining based on the second model parameters, the meta-model, its own target patch model and the target training data reassigned to it, obtaining the second model corresponding to the current training round.
In this embodiment, when performing model training the client may monitor the Message queue between itself and the server; if the server sends a Message for the latest round, the client reads the Message, updates its local initialization model parameters with the model parameters carried in the Message, and then performs model training.
The client loads the corresponding patch model as the target patch model according to the task type of the model training task, the degree of difference between the server data and the client data, and the structural complexity of the server meta-model. It then starts the local optimization operation: it reads the local support-training set and, based on that set and the target patch model, performs one step of gradient-descent training under the respective settings of the three models. It then reads the query-training set and updates the model using the second derivative computed from the gradients obtained in that training step. The model parameters other than the patch structure are sent back to the server as one of the main parameters of the Message. The support-training set consists of one part of the reassigned target training data, and the query-training set consists of another part of it.
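The support/query two-step procedure above resembles a meta-learning (MAML-style) update: one gradient step on the support-training set, then an update whose gradient flows back through that step (hence the second derivative) using the query-training set. Below is a minimal sketch on a toy linear model; the loss, learning rates and update rule are assumptions for illustration, not the patent's exact procedure.

```python
import torch

def one_step_support_query_update(w, support, query, inner_lr=0.01, outer_lr=0.001):
    """w: leaf tensor with requires_grad=True, standing in for the model parameters.
    support/query: (inputs, targets) tensors drawn from the two parts of the reassigned data."""
    xs, ys = support
    xq, yq = query

    # One step of gradient descent on the support-training set (graph kept for 2nd-order terms).
    support_loss = ((xs @ w - ys) ** 2).mean()
    (grad_s,) = torch.autograd.grad(support_loss, w, create_graph=True)
    w_adapted = w - inner_lr * grad_s

    # Update computed on the query-training set, differentiated back through the inner step.
    query_loss = ((xq @ w_adapted - yq) ** 2).mean()
    (meta_grad,) = torch.autograd.grad(query_loss, w)

    return (w - outer_lr * meta_grad).detach().requires_grad_(True)

w = torch.zeros(4, requires_grad=True)
support = (torch.randn(8, 4), torch.randn(8))
query = (torch.randn(8, 4), torch.randn(8))
w = one_step_support_query_update(w, support, query)
```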
In this embodiment, the structure of the patch model includes the following schemes (a minimal code sketch of all three is given after the list):
1) Mapping patch structure: a mapping network layer plus an activation layer. A mapping network layer and an activation layer are appended after the initial model sent by the server. The mapping network is a fully connected layer whose input and output dimensions match; it mainly adjusts the initialization parameters so as to learn the local data trends of different clients. The activation layer removes the influence of some invalid parameters and highlights the importance of the parameters related to important features.
2) Residual patch structure: a residual connection layer. A residual connection is added between the initial model sent by the server and the model locally trained by the client in the previous round. The residual structure mainly addresses the case where the local model deviates substantially from the global model, preventing the local model parameters from being biased toward the global ones.
3) Internal patch structure: a convolution layer plus an activation layer. A convolution layer and an activation layer are inserted inside the initial model sent by the server. The convolution layer is a convolutional neural network re-determined according to the specific network. This patch structure is designed because the previous two patches treat the initialization model as a whole and only add processing around it; in practice, however, the initialization model can be large and structurally complex, so patching only outside the model has a limited effect. By adding a patch inside, the roles of the different parts of the initialization model can be clearly separated, so that the basic feature part retains the generalization of the global model while the classification part carries the local personalized effect of the model.
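The sketch below gives one minimal PyTorch reading of the three patch structures; the split of the initial model into a feature part and a classifier part, the layer dimensions, and the ReLU activation are assumptions, since this embodiment only fixes the layer types involved.

```python
import torch.nn as nn

class MappingPatch(nn.Module):
    """Mapping patch: a fully connected mapping layer plus an activation layer
    appended after the initial model sent by the server."""
    def __init__(self, base_model: nn.Module, dim: int):
        super().__init__()
        self.base_model = base_model
        self.mapping = nn.Linear(dim, dim)  # input/output dimensions match
        self.activation = nn.ReLU()         # damps invalid parameters

    def forward(self, x):
        return self.activation(self.mapping(self.base_model(x)))


class ResidualPatch(nn.Module):
    """Residual patch: one residual connection between the initial model and
    the model locally trained by the client in the previous round."""
    def __init__(self, base_model: nn.Module, previous_local_model: nn.Module):
        super().__init__()
        self.base_model = base_model
        self.previous_local_model = previous_local_model

    def forward(self, x):
        return self.base_model(x) + self.previous_local_model(x)


class InternalPatch(nn.Module):
    """Internal patch: a convolution layer plus an activation layer inserted
    inside the initial model, between its feature part and its classifier part."""
    def __init__(self, feature_part: nn.Module, classifier_part: nn.Module, channels: int):
        super().__init__()
        self.feature_part = feature_part
        self.patch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.classifier_part = classifier_part

    def forward(self, x):
        return self.classifier_part(self.patch(self.feature_part(x)))
```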
Step nine, judging whether the second model obtained by the current training of each client meets the preset model training condition; when it does, federal learning is completed. When the preset model training condition is not met, the second model obtained by the current training is taken as the first model and the process returns to step five, until the second model obtained by training at each client meets the preset model training condition.
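Putting steps five through nine together, the server-side control flow could be organized roughly as follows; every method name here is a placeholder for the corresponding operation described above, not an API defined by this application.

```python
def federated_training_loop(server, clients, max_rounds=100):
    """Run rounds of federated training until the second models obtained by the
    clients meet the preset training condition (or a round limit is reached)."""
    server.broadcast_meta_model_and_initial_parameters(clients)
    for round_index in range(1, max_rounds + 1):
        target_clients = server.select_target_clients(clients)
        first_params = {c: c.report_first_model_parameters() for c in target_clients}
        second_params = server.compute_second_model_parameters(first_params)
        server.send_second_model_parameters(target_clients, second_params)
        second_models = [c.retrain_local_model(second_params) for c in target_clients]
        if server.training_condition_met(second_models):
            break  # federated learning is complete
```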
According to the federal learning method of this embodiment, each client performs model training in combination with a patch model, which realizes a personalized training mode oriented to service types, operation ranges and the like; model training is performed in combination with local training data and reassigned training data, so that the trained model meets local service requirements, the problem of model deviation is avoided, the local simulation effect is improved, and subsequent model migration is facilitated.
Another embodiment of the present application provides a federal learning apparatus, as shown in fig. 4, comprising:
a first sending module 11, configured to send a meta model and initial model parameters to each client, so that each client performs a first round of model training based on the meta model, the initial model parameters, training data local to the client, and a target patch model;
The acquisition module 12 is configured to determine a plurality of target clients for model retraining from the clients, and acquire first model parameters of a first model obtained by current nth round training of each target client, where n is a positive integer;
the calculation module 13 is configured to calculate, based on the first model parameters of each target client, to obtain second model parameters for performing n+1 model training by adopting a predetermined calculation manner;
a second sending module 14, configured to send the second model parameters to each of the target clients, so that each of the target clients performs model retraining based on the received meta-model, the second model parameters, training data local to the client, and a target patch model, to obtain a current second model;
the judging module is used for judging whether the current second model obtained by training at each client meets the training conditions, and for re-determining, through the acquisition module, a plurality of target clients for model retraining when it does not; training is stopped when the second model obtained by training at each client meets the training conditions.
In this embodiment, in a specific implementation process, the federal learning device further includes: the device comprises a receiving module, a data volume duty ratio calculating module and a determining and sending module;
The receiving module is used for: receiving data identifiers of all training data corresponding to all data types sent by all clients;
the data volume duty ratio calculation module is used for calculating and obtaining the data duty ratio of the total data identification amount among the data types based on the data types and the number of the data identifications sent by each client;
and the determining and sending module is used for determining, before each round of model training, a plurality of target data identifiers from the data identifiers corresponding to each target data type based on the data duty ratio and the target data types contained in each client, and sending each target data identifier to the corresponding client, so as to redistribute the training data used by each client for model training.
In a specific implementation process of this embodiment, the determining and sending module is specifically configured to: determine the target data duty ratio corresponding to each client based on the data duty ratio and the target data types contained in each client; and determine a plurality of target data identifiers from the data identifiers of the data types sent by the corresponding clients based on the target data duty ratio of each client.
In a specific implementation process of this embodiment, the calculation module is specifically configured to: determine the gradient parameters corresponding to each target client based on the first model parameters obtained by the n-th round of training of each target client and the historical model parameters obtained by the (n-1)-th round of training of each target client; calculate the target gradient parameters by a preset calculation formula based on the gradient parameters of each target client; and determine the second model parameters for the (n+1)-th round of model training based on the target gradient parameters.
In a specific implementation process of this embodiment, the federal learning apparatus further includes a patch model determining module, where the patch model determining module is configured to: and determining any one of a mapping patch model, a residual patch model and an internal patch model as the target patch model based on the task type of the model training task, the data difference degree of the server data and the client data and the structure of the server meta model.
In a specific implementation process of this embodiment, the patch model determining module is specifically configured to: when the task type is a monitoring task or a positioning task, determining that the mapping patch model is the target patch model; when the data difference between the server data and the client data is larger than a preset difference threshold, determining the residual patch model as the target patch model; and when the structural complexity of the server meta-model is greater than the preset complexity, determining the internal patch model as the target patch model.
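Expressed as code, this selection rule might look like the sketch below; the concrete threshold values, the string labels, and the fall-through default are assumptions, since this embodiment only states the three conditions.

```python
def select_target_patch_model(task_type, data_difference, model_complexity,
                              difference_threshold=0.5, complexity_threshold=10):
    """Choose the target patch model from the three candidate patch structures."""
    if task_type in ("monitoring", "positioning"):
        return "mapping_patch"
    if data_difference > difference_threshold:
        return "residual_patch"
    if model_complexity > complexity_threshold:
        return "internal_patch"
    return "mapping_patch"  # assumed default when no condition is met
```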
In a specific implementation process of this embodiment, the mapping patch model includes: mapping a network and an activation layer; the residual patch model includes: a residual error connection layer; the internal patch model includes: a convolution layer and an activation layer.
According to the federal learning device, in each round of model training, the target clients that need to perform the next round of model training are determined from the clients, and the model parameters for the next round of model training are then calculated based on the current model parameters of each target client, making the determination of the model parameters more reasonable and accurate. When a target client performs the next round of training based on the patch model and the model parameters calculated by the server, it can accurately train a model that fits the local conditions, which avoids deviation of the finally trained model and gives the model trained by each client a good local simulation effect.
Another embodiment of the present application provides a federal learning apparatus, as shown in fig. 5, comprising: a receiving module 11 and a model training module 12;
the receiving module 11 is configured to receive the meta-model and the initial model parameters sent by the server, and to receive the second model parameters sent by the server, where the second model parameters are calculated by the server based on the first model parameters obtained by the n-th round of training of a plurality of target clients;
The model training module 12 is configured to perform a first round of model training based on the meta-model, initial model parameters, training data local to the client, and a target patch model, and perform an n+1th round of model training based on the second model parameters sent by the server, the received meta-model, training data local to the client, and the target patch model.
In this embodiment, in a specific implementation process, the federal learning device further includes: a sending module.
The sending module is configured to: before the meta-model and the initial model parameters sent by the server are received, transmit the data types and the data identifiers of the training data corresponding to each data type to the server, so that the server calculates, based on the data types and the number of data identifiers sent by the client, the data duty ratio of the total data identifiers among the data types, and, before each round of model training, determines a plurality of target data identifiers from the data identifiers corresponding to each target data type based on the data duty ratio and the target data types contained in each client;
the receiving module is further configured to receive each target data identifier sent by the server, determine, based on each target data identifier, target training data corresponding to the target data identifier from the local training data, and perform model training based on the target training data obtained by redistribution.
The sending module in this embodiment is further configured to, based on the Message received from the server, take the second model obtained by the current round of training as the first model and feed the model parameters of the first model back to the server, so that the server calculates, based on the first model parameters of each target client in the current round of training, the second model parameters for the next round of model training.
In this embodiment, the target patch model includes any one of the following: mapping a patch model, a residual patch model and an internal patch model; wherein the mapping patch model comprises: mapping a network and an activation layer; the residual patch model includes: a residual error connection layer; the internal patch model includes: a convolution layer and an activation layer.
According to the federal learning device of this embodiment, the server determines the target clients that need to perform the next round of model training from all the clients, and then calculates the model parameters for the next round of model training based on the current model parameters of each target client, making the determination of the model parameters more reasonable and accurate. When a target client subsequently performs the next round of training based on the patch model and the model parameters calculated by the server, it can accurately train a model that fits the local conditions, which avoids deviation of the finally trained model and gives it a good local simulation effect.
Another embodiment of the present application provides an electronic device, at least including a memory, and a processor, where the memory stores a computer program, and the processor when executing the computer program on the memory implements the following method steps:
step one, a meta model and initial model parameters are sent to each client so that each client carries out a first round of model training based on the meta model, the initial model parameters, training data of the local client and a target patch model;
step two, determining a plurality of target clients for model retraining from the clients, and collecting first model parameters of a first model obtained by current n-th round training of each target client, wherein n is a positive integer;
step three, calculating and obtaining second model parameters for n+1 rounds of model training by adopting a preset calculation mode based on the first model parameters of each target client;
step four, the second model parameters are sent to each target client so that each target client carries out model retraining based on the received meta-model, the second model parameters, training data of the local client and a target patch model to obtain a current second model;
step five, judging whether the current second model obtained by training at each client meets the training conditions, and re-determining a plurality of target clients for model retraining when it does not; and stopping training when the second model obtained by training at each client meets the training conditions.
For the specific implementation process of the above method steps, reference may be made to the embodiments of the federal learning method described above; details are not repeated here.
According to the electronic device, in each round of model training, the target clients that need to perform the next round of model training are determined from the clients, and the model parameters for the next round of model training are then calculated based on the current model parameters of each target client, making the determination of the model parameters more reasonable and accurate. When a target client performs the next round of training based on the patch model and the model parameters calculated by the server, it can accurately train a model that fits the local conditions, which avoids deviation of the finally trained model and gives the model trained by each client a good local simulation effect.
The above embodiments are only exemplary embodiments of the present application and are not intended to limit the present application, the scope of which is defined by the claims. Various modifications and equivalent arrangements may be made to the present application by those skilled in the art, which modifications and equivalents are also considered to be within the scope of the present application.

Claims (13)

1. A federal learning method, applied to a server, comprising:
sending a meta model and initial model parameters to each client so that each client performs a first round of model training based on the meta model, the initial model parameters, training data local to the client and a target patch model;
determining a plurality of target clients for model retraining from the clients, and collecting first model parameters of a first model obtained by current nth round training of the target clients, wherein n is a positive integer;
based on the first model parameters of each target client, calculating and obtaining second model parameters for n+1 rounds of model training by adopting a preset calculation mode;
transmitting the second model parameters to each target client so that each target client performs model retraining based on the received meta-model, the second model parameters, training data of the local client and a target patch model to obtain a current second model;
Judging whether the current second model obtained by training each client conforms to training conditions, and re-determining a plurality of target clients for model retraining when the current second model obtained by training each client does not conform to the training conditions; and stopping training when the second model obtained by training at each client meets the training conditions.
2. The method of claim 1, wherein prior to sending the meta-model and initial model parameters to each client, the method further comprises:
receiving data identifiers of all training data corresponding to all data types sent by all clients;
based on the data types and the number of the data identifiers sent by each client, calculating and obtaining the data duty ratio of the total data identifiers among the data types;
before each round of model training is carried out, a plurality of target data identifiers are determined from a plurality of data identifiers corresponding to each target data type based on the data duty ratio and the target data types contained in each client, and each target data identifier is sent to the corresponding client so as to redistribute training data for each client to carry out model training.
3. The method of claim 2, wherein determining a plurality of target data identifiers from a plurality of data identifiers corresponding to each target data type based on the data duty cycle and a target data type included in each client, specifically comprises:
determining the target data duty ratio corresponding to each client based on the data duty ratio and the target data type contained in each client;
and determining a plurality of target data identifiers from the data identifiers of the data types sent by the corresponding clients based on the target data duty ratio of the clients.
4. The method of claim 1, wherein the calculating, based on the first model parameters of each target client, by using a predetermined calculation mode to obtain the second model parameters for n+1 rounds of model training specifically includes:
determining gradient parameters corresponding to each target client based on first model parameters obtained by n-th round training of each target client and historical model parameters obtained by n-1 th round training of each target client;
calculating to obtain target gradient parameters by adopting a preset calculation formula based on the gradient parameters of each target client;
second model parameters for n+1 rounds of model training are determined based on the target gradient parameters.
5. The method of claim 1, wherein the method further comprises: and determining any one of a mapping patch model, a residual patch model and an internal patch model as the target patch model based on the task type of the model training task, the data difference degree of the server data and the client data and the structure of the server meta model.
6. The method as claimed in claim 5, wherein determining any one of a mapping patch model, a residual patch model, and an internal patch model as the target patch model based on a task type of the model training task, a data difference between server data and client data, and a structural complexity of a server meta model, specifically includes:
when the task type is a monitoring task or a positioning task, determining that the mapping patch model is the target patch model;
when the data difference between the server data and the client data is larger than a preset difference threshold, determining the residual patch model as the target patch model;
and when the structural complexity of the server meta-model is greater than the preset complexity, determining the internal patch model as the target patch model.
7. The method as recited in claim 5, wherein the mapping patch model comprises: mapping a network and an activation layer;
the residual patch model includes: a residual error connection layer;
the internal patch model includes: a convolution layer and an activation layer.
8. A federal learning method applied to each client, comprising:
receiving meta-model and initial model parameters sent by a server side, and performing first-round model training based on the meta-model, the initial model parameters, training data of a client side local and a target patch model;
receiving second model parameters sent by a server, wherein the second model parameters are obtained by calculating first model parameters obtained by the server based on n-th round training of a plurality of target clients; and carrying out n+1st round of model training based on the second model parameters sent by the server, the received meta model, training data of the local client and the target patch model.
9. The method of claim 8, wherein prior to receiving the metamodel, initial model parameters sent by the server, the method further comprises:
transmitting the data types and the data identifications of the training data corresponding to the data types to a server, so that the server calculates and obtains the data duty ratio of the total data identifications among the data types based on the data types and the number of the data identifications transmitted by the client; before each round of model training is carried out, the server determines a plurality of target data identifications from a plurality of data identifications corresponding to each target data type based on the data duty ratio and the target data types contained in each client;
And receiving each target data identifier sent by the server, determining target training data corresponding to the target data identifier from the local training data based on each target data identifier, and performing model training based on the target training data obtained through redistribution.
10. The method of claim 8, wherein the target patch model comprises any one of: mapping a patch model, a residual patch model and an internal patch model;
wherein the mapping patch model comprises: mapping a network and an activation layer;
the residual patch model includes: a residual error connection layer;
the internal patch model includes: a convolution layer and an activation layer.
11. A federal learning apparatus, comprising:
the first sending module is used for sending the meta-model and the initial model parameters to each client so that each client carries out first-round model training based on the meta-model, the initial model parameters, training data of the local client and the target patch model;
the acquisition module is used for determining a plurality of target clients for model retraining from the clients and acquiring first model parameters of a first model obtained by the current nth round of training of the target clients, wherein n is a positive integer;
The calculation module is used for calculating and obtaining second model parameters for n+1 rounds of model training by adopting a preset calculation mode based on the first model parameters of each target client;
the second sending module is used for sending the second model parameters to each target client so that each target client performs model retraining based on the received meta-model, the second model parameters, training data of the local client and a target patch model to obtain a current second model;
the judging module is used for judging whether the current second model obtained by training each client accords with the training conditions, and re-determining a plurality of target clients for model retraining based on the acquisition module when the current second model obtained by training each client does not accord with the training conditions; and stopping training when the second model obtained by training at each client meets the training conditions.
12. A federal learning apparatus, comprising: the receiving module and the model training module;
the receiving module is used for receiving a meta-model and initial model parameters sent by the server side and receiving second model parameters sent by the server side, wherein the second model parameters are obtained by the server side through calculation based on first model parameters obtained by n-th round training of a plurality of target clients;
The model training module is used for performing a first round of model training based on the meta-model, the initial model parameters, the training data of the local client side and the target patch model, and performing an n+1th round of model training based on the second model parameters sent by the server side, the received meta-model, the training data of the local client side and the target patch model.
13. An electronic device comprising at least a memory, a processor, the memory having stored thereon a computer program, the processor, when executing the computer program on the memory, implementing the steps of the federal learning method according to any of claims 1-7 or 8-10.
CN202310306787.3A 2023-03-27 2023-03-27 Federal learning method and device and electronic equipment Active CN116050548B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310306787.3A CN116050548B (en) 2023-03-27 2023-03-27 Federal learning method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310306787.3A CN116050548B (en) 2023-03-27 2023-03-27 Federal learning method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN116050548A true CN116050548A (en) 2023-05-02
CN116050548B CN116050548B (en) 2023-07-04

Family

ID=86127601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310306787.3A Active CN116050548B (en) 2023-03-27 2023-03-27 Federal learning method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN116050548B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3699825A2 (en) * 2019-02-22 2020-08-26 Ubotica Technologies Ltd. Systems and methods for deploying and updating neural networks at the edge of a network
WO2021008017A1 (en) * 2019-07-17 2021-01-21 深圳前海微众银行股份有限公司 Federation learning method, system, terminal device and storage medium
CN111768008A (en) * 2020-06-30 2020-10-13 平安科技(深圳)有限公司 Federal learning method, device, equipment and storage medium
CN112883377A (en) * 2021-02-23 2021-06-01 优守(浙江)科技有限公司 Feature countermeasure based federated learning poisoning detection method and device
CN113537518A (en) * 2021-07-19 2021-10-22 哈尔滨工业大学 Model training method and device based on federal learning, equipment and storage medium
CN113901405A (en) * 2021-10-11 2022-01-07 杭州中奥科技有限公司 Watermark detection method and system based on federated learning model and electronic equipment
CN114492833A (en) * 2021-12-29 2022-05-13 上海智能网联汽车技术中心有限公司 Internet of vehicles federal learning layered knowledge safe migration method based on gradient memory
CN114386069A (en) * 2022-01-06 2022-04-22 北京数牍科技有限公司 Federal learning model training method based on condition privacy set intersection
CN115018019A (en) * 2022-08-05 2022-09-06 深圳前海环融联易信息科技服务有限公司 Model training method and system based on federal learning and storage medium
CN115374479A (en) * 2022-08-31 2022-11-22 南京理工大学 Federal learning privacy protection method under non-independent same distributed data scene

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wei Yong; Lv Congmin: "Exploring the Development Model of Digital Twin Smart Cities Using Complex Adaptive System Theory", Electronics World, no. 09, pages 104-106 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118333192A (en) * 2024-06-12 2024-07-12 杭州金智塔科技有限公司 Federal modeling method for data element circulation

Also Published As

Publication number Publication date
CN116050548B (en) 2023-07-04

Similar Documents

Publication Publication Date Title
CN108205655B (en) Key point prediction method and device, electronic equipment and storage medium
CN108520220B (en) Model generation method and device
WO2022217781A1 (en) Data processing method, apparatus, device, and medium
CN112001274B (en) Crowd density determining method, device, storage medium and processor
CN110659581B (en) Image processing method, device, equipment and storage medium
CN110569911B (en) Image recognition method, device, system, electronic equipment and storage medium
CN110765882B (en) Video tag determination method, device, server and storage medium
CN111046027A (en) Missing value filling method and device for time series data
US20220237917A1 (en) Video comparison method and apparatus, computer device, and storage medium
WO2023273628A1 (en) Video loop recognition method and apparatus, computer device, and storage medium
CN111639968A (en) Trajectory data processing method and device, computer equipment and storage medium
CN114332984A (en) Training data processing method, device and storage medium
WO2021169366A1 (en) Data enhancement method and apparatus
CN111885419B (en) Posture processing method and device, storage medium and electronic device
CN115718868A (en) Model training method, device and system
CN111091106A (en) Image clustering method and device, storage medium and electronic device
CN113641835A (en) Multimedia resource recommendation method and device, electronic equipment and medium
CN110097004B (en) Facial expression recognition method and device
CN110427998A (en) Model training, object detection method and device, electronic equipment, storage medium
CN116050548B (en) Federal learning method and device and electronic equipment
Song et al. Pushing participatory sensing further to the edge
CN110601909B (en) Network maintenance method and device, computer equipment and storage medium
CN115272667B (en) Farmland image segmentation model training method and device, electronic equipment and medium
US10943168B2 (en) System and method for determining an artificial intelligence model in a decentralized network
US20190311247A1 (en) System and method for determining an artificial intelligence model in a decentralized network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant