CN113837399A - Federated learning model training method, device, system, storage medium and equipment

Federated learning model training method, device, system, storage medium and equipment

Info

Publication number
CN113837399A
Authority
CN
China
Prior art keywords
local
model
parameters
updated
global
Prior art date
Legal status
Granted
Application number
CN202111248479.7A
Other languages
Chinese (zh)
Other versions
CN113837399B (en)
Inventor
马鑫
包仁义
徐松
畅绍政
雷江涛
刘兵
张凯
蒋锦鹏
Current Assignee
Yidu Cloud Beijing Technology Co Ltd
Original Assignee
Yidu Cloud Beijing Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Yidu Cloud Beijing Technology Co Ltd
Priority to CN202111248479.7A
Publication of CN113837399A
Application granted
Publication of CN113837399B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Feedback Control In General (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a method, a device, a system, a storage medium and equipment for training a federated learning model. Applied at a client, the method includes the following steps: performing model training on local data to obtain a private model, which is trained only on local data and never aggregated, and a local model, which participates in global aggregation; determining a local weight parameter corresponding to the local model according to the degree of difference between the private model and the local model; sending the local model parameters and the local weight parameter corresponding to the local model to a server, so that the server updates the global model parameters according to the local model parameters and the local weight parameters; receiving the updated global model parameters from the server; and updating the local model parameters and the local weight parameter according to the updated global model parameters to obtain updated local model parameters and an updated local weight parameter. By applying the method, a global model with better model performance can be obtained.

Description

Federated learning model training method, device, system, storage medium and equipment
Technical Field
The application relates to the technical field of federated learning, and in particular to a method, a device, a system, a storage medium and equipment for training a federated learning model.
Background
Federated machine learning is a machine learning framework that can effectively solve the data-silo problem: participants can jointly build models without sharing their data, which technically breaks down data silos and enables AI collaboration. The federated averaging algorithm (FedAvg) performs well when the data are independent and identically distributed, and is then comparable to a centralized algorithm.
However, in practical applications the global model is formed mainly by aggregating the clients' local models, and the standards, features, and distributions of the data produced by different clients are inconsistent. When the clients' data are not independently and identically distributed, that is, Non-IID data, a client-drift phenomenon arises, and the performance of the federated averaging algorithm degrades.
Disclosure of Invention
In order to solve the above problems in the background art, embodiments of the present application provide a method, an apparatus, a system, a storage medium, and a device for training a federated learning model.
According to a first aspect of the present application, a method for training a federated learning model is provided, the method including: performing model training on local data to obtain a private model and a local model; determining a local weight parameter corresponding to the local model according to the degree of difference between the private model and the local model; sending the local model parameters and the local weight parameter corresponding to the local model to a server, so that the server updates global model parameters according to the local model parameters and the local weight parameters; receiving the updated global model parameters from the server; and updating the local model parameters and the local weight parameter according to the updated global model parameters to obtain updated local model parameters and an updated local weight parameter.
According to an embodiment of the present application, determining the local weight parameter corresponding to the local model according to the degree of difference between the private model and the local model includes: determining a private gradient vector from the private model; determining a local gradient vector from the local model; determining a difference angle from the private gradient vector and the local gradient vector; and determining the local weight parameter corresponding to the local model based on the difference angle.
According to an embodiment of the present application, determining the local weight parameter corresponding to the local model based on the difference angle includes: determining the sample proportion of the local data in the global data; determining an adjustment factor from the difference angle, where the value of the adjustment factor is inversely proportional to the difference angle; and combining the sample proportion and the adjustment factor to obtain the local weight parameter.
According to an embodiment of the present application, performing model training on local data to obtain the private model and the local model includes: receiving an initialization model from the server; performing a first model parameter setting on the initialization model to obtain an initialized private model; performing a second model parameter setting on the initialization model to obtain an initialized local model; training the initialized private model on the local data to obtain the private model; and training the initialized local model on the local data to obtain the local model.
According to a second aspect of the present application, there is provided a method for training a federated learning model, the method including: receiving local model parameters and local weight parameters from a plurality of clients; updating global model parameters according to the plurality of local model parameters and local weight parameters; and sending the updated global model parameters to each client, so that each client updates its local model parameters and local weight parameter according to the updated global model parameters to obtain updated local model parameters and an updated local weight parameter.
According to an embodiment of the present application, updating the global model parameters according to the plurality of local model parameters and local weight parameters includes: normalizing the local weight parameters to obtain global weight parameters; and aggregating the local model parameters according to the global weight parameters to update the global model parameters and obtain the updated global model parameters.
According to a third aspect of the present application, there is provided a training apparatus for a federated learning model, the apparatus including: a model training module, configured to perform model training on local data to obtain a private model and a local model; a parameter determining module, configured to determine a local weight parameter corresponding to the local model according to the degree of difference between the private model and the local model; a first sending module, configured to send the local model parameters and the local weight parameter corresponding to the local model to a server, so that the server updates global model parameters according to the local model parameters and the local weight parameters; a first receiving module, configured to receive the updated global model parameters from the server; and a first updating module, configured to update the local model parameters and the local weight parameter according to the updated global model parameters to obtain updated local model parameters and an updated local weight parameter.
According to an embodiment of the present application, the parameter determining module is configured to: determine a private gradient vector from the private model; determine a local gradient vector from the local model; determine a difference angle from the private gradient vector and the local gradient vector; and determine the local weight parameter corresponding to the local model based on the difference angle.
According to an embodiment of the present application, the parameter determining module is further configured to: determine the sample proportion of the local data in the global data; determine an adjustment factor from the difference angle, where the value of the adjustment factor is inversely proportional to the difference angle; and combine the sample proportion and the adjustment factor to obtain the local weight parameter.
According to an embodiment of the present application, the model training module includes: a receiving submodule, configured to receive an initialization model from the server; a setting submodule, configured to perform a first model parameter setting on the initialization model to obtain an initialized private model, and further configured to perform a second model parameter setting on the initialization model to obtain an initialized local model; and a training submodule, configured to train the initialized private model on the local data to obtain the private model, and further configured to train the initialized local model on the local data to obtain the local model.
According to a fourth aspect of the present application, there is provided a training apparatus for a federated learning model, the apparatus including: a second receiving module, configured to receive local model parameters and local weight parameters from a plurality of clients; a second updating module, configured to update global model parameters according to the plurality of local model parameters and local weight parameters; and a second sending module, configured to send the updated global model parameters to each client, so that each client updates its local model parameters and local weight parameter according to the updated global model parameters to obtain updated local model parameters and an updated local weight parameter.
According to an embodiment of the present application, the second updating module includes: a normalization submodule, configured to normalize the local weight parameters to obtain global weight parameters; and an aggregation submodule, configured to aggregate the local model parameters according to the global weight parameters, so as to update the global model parameters and obtain the updated global model parameters.
According to a fifth aspect of the present application, a system for training a federated learning model is provided. The system includes a server and a plurality of clients. The server includes: a second receiving module, configured to receive local model parameters and local weight parameters from the plurality of clients; a second updating module, configured to update global model parameters according to the plurality of local model parameters and local weight parameters; and a second sending module, configured to send the updated global model parameters to each client, so that each client updates its local model parameters and local weight parameter according to the updated global model parameters to obtain updated local model parameters and an updated local weight parameter. Each client includes: a model training module, configured to perform model training on local data to obtain a private model and a local model; a parameter determining module, configured to determine a local weight parameter corresponding to the local model according to the degree of difference between the private model and the local model; a first sending module, configured to send the local model parameters and the local weight parameter corresponding to the local model to the server, so that the server updates the global model parameters according to the local model parameters and the local weight parameters; a first receiving module, configured to receive the updated global model parameters from the server; and a first updating module, configured to update the local model parameters and the local weight parameter according to the updated global model parameters to obtain updated local model parameters and an updated local weight parameter.
According to a sixth aspect of the present application, there is provided a computer device comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the program implementing the method of any of the above.
According to a seventh aspect of the present application, there is provided a storage medium containing computer executable instructions for performing the method of any one of the above when executed by a computer processor.
The method, device, system, storage medium, and equipment for training a federated learning model provided by the embodiments of the present application train a private model and a local model on local data and determine a local weight parameter according to the degree of difference between the private model and the local model, so that the global model parameters can be updated according to each set of local model parameters and its corresponding local weight parameter, yielding updated global model parameters. The local model parameters are then updated with the updated global model parameters, the degree of difference between the private model and the updated local model is determined anew to obtain an updated local weight parameter, and the global model is updated again with the updated local model parameters and updated local weight parameters, and so on. In this way, each update of the global model can correspond to different local weight parameters. Through this dynamic adjustment of the weight coefficients, the global model can better learn the deviation information of the local models and converges better, thereby achieving higher accuracy and better model performance.
It is to be understood that the teachings of this application need not achieve all of the above-described benefits, but rather that specific embodiments may achieve specific technical results, and that other embodiments of this application may achieve benefits not mentioned above.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present application will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present application are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
in the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Fig. 1 is a schematic flow chart illustrating a first implementation flow of a method for training a federated learning model according to an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating a second implementation flow of a method for training a federated learning model according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a third implementation flow of a method for training a federated learning model according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating an implementation scenario of a method for training a federated learning model according to an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating an implementation module of a training apparatus for a federated learning model according to an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating an implementation module of a training apparatus for a federated learning model according to another embodiment of the present application;
FIG. 7 is a schematic diagram illustrating an implementation apparatus of a training system for the federated learning model according to an embodiment of the present application;
fig. 8 shows a schematic block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The principles and spirit of the present application will be described with reference to a number of exemplary embodiments. It should be understood that these embodiments are given merely to enable those skilled in the art to better understand and to implement the present application, and do not limit the scope of the present application in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The technical solution of the present application is further elaborated below with reference to the drawings and the specific embodiments.
Fig. 1 shows a first implementation flow diagram of a method for training a federated learning model according to an embodiment of the present application.
Referring to fig. 1, according to the first aspect of the present application, a method for training a federated learning model is provided. The method is applied at a client and includes: operation 101, performing model training on local data to obtain a private model and a local model; operation 102, determining a local weight parameter corresponding to the local model according to the degree of difference between the private model and the local model; operation 103, sending the local model parameters and the local weight parameter corresponding to the local model to the server, so that the server updates the global model parameters according to the local model parameters and the local weight parameters; operation 104, receiving the updated global model parameters from the server; and operation 105, updating the local model parameters and the local weight parameter according to the updated global model parameters to obtain updated local model parameters and an updated local weight parameter.
In the federated learning model training method provided by the application, each client trains a private model and a local model on its local data and, taking the private model as a reference, determines the local weight parameter corresponding to the local model according to the degree of difference between the private model and the local model. The global model can therefore update its parameters according to each set of local model parameters and the corresponding local weight parameter, yielding updated global model parameters. The local model parameters are then updated with the updated global model parameters, the degree of difference between the private model and the updated local model is determined anew to obtain an updated local weight parameter, and the global model is updated again with the updated local model parameters and updated local weight parameters, and so on. Each update of the global model can thus correspond to different local weight parameters. Through this dynamic adjustment of the weight coefficients, the global model can better learn the deviation information of the local models and converges better, thereby achieving higher accuracy and better model performance.
In operation 101, the client refers to a data participant that needs the global model to be trained; it can be understood that there are a plurality of clients in the federated learning training process, and the method can be applied at each of them. The local data are the training data a client contributes for obtaining the global model and may be private in nature; for example, when the clients are hospitals, the local data may be the case data of each hospital. Further, global data denotes the set of all local data. It should be added that, in the first round of training, the initialization models used to train the private model and the local model may have the same or different initialization parameters, as needed. The private model refers to a model trained only on local data; it does not participate in the aggregation and updating of the global model. The local model refers to the model that participates in the aggregation of the global model; it is trained on local data and its parameters are updated according to the global model parameters.
In operation 102, the local data of different clients typically exhibit non-independent, non-identically distributed (Non-IID) characteristics. The local model serves as the aggregation basis of the global model, and as the local model is updated, the difference between the private model and the local model reflects the difference between the local data and the data of the other clients; this difference can therefore be used to determine the local weight parameter corresponding to the local model, which is then used in the aggregation of the global model. For example, when the degree of difference between the private model and the local model is larger, the local data differ more from the other clients' data. In this case, if the formula is designed so that the resulting local weight parameter is smaller, local data with a larger degree of difference have less influence on the global model, and the global model leans toward the dominant tendency of the training data. Alternatively, if the formula is designed so that the resulting local weight parameter is larger, local data with a larger degree of difference have more influence on the global model, and the global model leans toward representing the divergent training data. For instance, when training a federated learning model in which the clients are hospitals in different regions (or different hospitals in the same region) and the local data are the corresponding medical data, some hospitals' local data may reflect patient characteristics and detection criteria that differ considerably from those of other regions. Conversely, the condition may also be set so that the smaller the difference between the private model and the local model, the smaller the value of the local weight parameter.
In operation 103, each client sends the local model parameters and the local weight parameter corresponding to its local model to the server. The server is the end that performs the aggregate update of the global model; it may be a server independent of the clients or another specific device with a data-processing function. It should be added that, depending on the global model updating method, the local model parameters may be one or more of all of the local model's parameters, a subset of its parameters, or quantities related to them, such as the gradient parameters or variable parameters of the local model. The server aggregates the local model parameters and local weight parameters from the clients to obtain the global model parameters corresponding to the global model and updates them. It can be understood that, in the first round of updating, there may be no difference between the private model and the local model; in this case, the method may use a preset weight coefficient for the global model update, or may set other conditions to obtain the corresponding local weight parameter, for example determining the weight coefficient as the proportion of the local data in the total training data. As the global model is aggregated and updated, a large difference between the local data and the other training data is reflected in the degree of difference between the private model and the local model.
In operation 104, the updated global model parameters are received from the server. Similarly, depending on the strategy of the federated learning algorithm, the updated global model parameters sent by the server to the client may be one or more of all of the global model parameters, a subset of them, or quantities related to them. This is not elaborated further below.
In operation 105, the local model parameters are updated with the updated global model parameters to obtain an updated local model. Then, as in operation 102, the updated local weight parameter is determined according to the difference between the private model and the updated local model, and the updated local model parameters and updated local weight parameter can be sent to the server so that the server aggregates and updates the global model, and so on, until a global model meeting the set requirements is obtained. It should be added that, depending on the federated learning algorithm, after the local model parameters are updated with the updated global model parameters, another round of training may be performed on the private model and the updated local model with the local data, after which the difference between the private model and the local model after that round of training is compared, as in operation 102, to determine the updated local weight parameter. It will be appreciated that, since the private model is never updated according to the global model while the local model is, the model parameters of the private model and the local model will differ with each round of training. Evidently, the updated local weight parameter takes a different value from the local weight parameter before the update.
In summary, the method takes the private model as a reference and uses the degree of difference between the private model and the local model, which is updated along with the federated learning algorithm, to dynamically adjust the local weight parameters, so that the global model can better learn the deviation information of all local models during aggregation, converge better, and achieve better model performance.
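As a concrete illustration of operations 101 to 105, the following is a minimal client-side sketch using plain numpy vectors as model parameters and a toy quadratic loss standing in for training on local data. The class and method names, the toy loss, and the exponential weighting are illustrative assumptions and not an API or formula taken from the patent.

```python
import numpy as np

class Client:
    """One federated participant holding a private model and a local model."""

    def __init__(self, init_params, data_mean, n_samples):
        self.private_w = np.array(init_params, dtype=float)  # never aggregated
        self.local_w = np.array(init_params, dtype=float)    # synced with the global model
        self.mu = np.asarray(data_mean, dtype=float)         # stands in for the local data
        self.n = n_samples

    def _grad(self, w):
        # Gradient of the toy local loss ||w - mu||^2.
        return 2.0 * (w - self.mu)

    def train_round(self, lr=0.1, steps=3):
        # Operation 101: train both the private model and the local model locally.
        for _ in range(steps):
            g_priv, g_loc = self._grad(self.private_w), self._grad(self.local_w)
            self.private_w = self.private_w - lr * g_priv
            self.local_w = self.local_w - lr * g_loc
        # Operation 102: difference angle between the two gradient vectors,
        # converted into a local weight parameter (assumed exponential form).
        cos = g_priv @ g_loc / (np.linalg.norm(g_priv) * np.linalg.norm(g_loc) + 1e-12)
        angle = float(np.arccos(np.clip(cos, -1.0, 1.0)))
        weight = self.n * np.exp(-angle)
        # Operation 103: the local model parameters and the weight go to the server.
        return self.local_w.copy(), weight

    def apply_global(self, global_w):
        # Operations 104 and 105: only the local model adopts the updated global
        # parameters; the private model keeps training from its own trajectory,
        # so the next difference angle (and hence the weight) will change.
        self.local_w = np.array(global_w, dtype=float)
```

A server would collect the (parameters, weight) pairs from all clients, normalize the weights, aggregate the parameters, and broadcast the result back, as described in the server-side sections below.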
Fig. 2 shows a schematic flow chart of an implementation of a method for training a federated learning model according to an embodiment of the present application.
Referring to fig. 2, according to an embodiment of the present application, determining the local weight parameter corresponding to the local model according to the degree of difference between the private model and the local model in operation 102 includes: operation 1021, determining a private gradient vector from the private model; operation 1022, determining a local gradient vector from the local model; operation 1023, determining a difference angle from the private gradient vector and the local gradient vector; and operation 1024, determining the local weight parameter corresponding to the local model based on the difference angle.
It is to be understood that the degree of difference between the private model and the local model may be evaluated by various criteria, such as the difference between corresponding layers of the private model and the local model, or the difference between their output results.
In operation 1021, the gradient record corresponding to the private model is loaded based on the training of the private model, and the private gradient vector corresponding to the private model is then determined from this gradient record. Similarly, in operation 1022, the gradient record of the local model is loaded based on its training, and the local gradient vector corresponding to the local model is determined from it. It should be understood that operations 1021 and 1022 are numbered only to distinguish them and do not imply an execution order; they may be performed simultaneously, or operation 1022 may be performed before operation 1021.
In operation 1023, the angle formed by the private gradient vector and the local gradient vector, together with its specific value, is obtained from the two vectors. This angle characterizes the degree of difference between the private model and the local model and is therefore called the difference angle. In operation 1024, the difference angle is converted through a set function to obtain the corresponding local weight parameter. As described above, the difference angle may be either proportional or inversely proportional to the value of the local weight parameter.
According to an embodiment of the present application, determining the local weight parameter corresponding to the local model based on the difference angle in operation 1024 includes: first, determining the sample proportion of the local data in the global data; then, determining an adjustment factor from the difference angle, where the value of the adjustment factor is inversely proportional to the difference angle; and finally, combining the sample proportion and the adjustment factor to obtain the local weight parameter.
As described above, since the parameters of the private model and the local model are identical in the first round of updating, the method introduces the global data, that is, the set of the local data of all clients. The sample proportion is determined as the ratio of the local data to the global data and serves as the reference term of the weight coefficient; the difference angle then acts as an adjustment factor applied to the sample proportion to obtain the weight coefficient. Determined in this way, the weight coefficient also carries the sample-proportion information, which prevents clients with very few samples from having an outsized influence on the global model, so that the resulting global model reflects the tendency of the majority of the global data.
In one embodiment, the sample proportion may be determined using the following formula:

$X_i^k = \dfrac{s_i}{\sum_{j=1}^{N} s_j}$

where $i$ denotes the $i$-th client; $N$ denotes the total number of clients; $s_i$ denotes the sample size of the local data of the $i$-th client; $X_i^k$ denotes the sample proportion; and $k$ denotes the $k$-th iteration, with $k$ a positive integer greater than or equal to 1.
In one embodiment, the difference angle may be determined using the following formula:

$\alpha_i^k = \arccos\left(\dfrac{\hat{g}_i^k \cdot g_i^k}{\lVert \hat{g}_i^k \rVert \, \lVert g_i^k \rVert}\right)$

where $\alpha_i^k$ denotes the difference angle, $\hat{g}_i^k$ denotes the private gradient vector, and $g_i^k$ denotes the local gradient vector.
In one embodiment, the local weight parameter may be determined using a formula of the form

$p_i^k = f\left(X_i^k, \alpha_i^k; \lambda, \rho\right)$

(the exact expression appears in the original publication only as an equation image), where $p_i^k$ denotes the local weight parameter corresponding to the $k$-th iteration of the $i$-th client, and $\lambda$ and $\rho$ are preset hyper-parameters used to adjust the magnitude of the local weight parameter, which can be set manually as required. Consistent with the description above, the formula combines the sample proportion $X_i^k$ with an adjustment factor that decreases as the difference angle $\alpha_i^k$ increases.
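Because the exact weighting expression is shown only as an image in the source, the sketch below simply instantiates the stated constraints: the sample proportion $X_i^k$ is scaled by an adjustment factor that shrinks as the difference angle grows, with $\lambda$ and $\rho$ as tunable hyper-parameters. The exponential form lam * exp(-rho * angle) is an assumed choice that satisfies this inverse relationship, not the patent's formula.

```python
import numpy as np

def sample_proportion(sample_sizes, i):
    """X_i = s_i / sum_j s_j over the N clients."""
    s = np.asarray(sample_sizes, dtype=float)
    return float(s[i] / s.sum())

def difference_angle(g_private, g_local):
    """Angle (radians) between the private and local gradient vectors."""
    cos = np.dot(g_private, g_local) / (
        np.linalg.norm(g_private) * np.linalg.norm(g_local) + 1e-12)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def local_weight(sample_sizes, i, g_private, g_local, lam=1.0, rho=1.0):
    """p_i^k: sample proportion scaled by an angle-dependent adjustment factor."""
    adjustment = lam * np.exp(-rho * difference_angle(g_private, g_local))  # assumed form
    return sample_proportion(sample_sizes, i) * adjustment

# Example: client 0 holds 100 of 200 samples; its two gradients point in
# slightly different directions, so its weight comes out a little below 0.5.
p0 = local_weight([100, 80, 20], 0, np.array([1.0, 0.2]), np.array([1.0, 0.0]))
```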
In another case, the method may further adjust the local weight parameters at the server to obtain normalized local weight parameters, so as to meet the server's requirement of normalizing the local weight parameters. Specifically, this can be expressed as follows:
$\tilde{p}_i^k = \dfrac{p_i^k}{\sum_{j=1}^{N} p_j^k}$

where $\tilde{p}_i^k$ denotes the global weight parameter corresponding to the $k$-th iteration of the $i$-th client.
On this basis, the server can obtain the local weight parameter and the global weight parameter corresponding to each client and, according to the parameter aggregation formula set for the federated learning model, aggregate the local model parameters using either the local weight parameters or the global weight parameters to obtain the global model parameters.
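The following is a sketch of the server-side step just described: the local weight parameters are normalized into global weight parameters, and the global model parameters are formed as the weighted combination of the clients' local model parameters. The weighted-average rule is an assumption consistent with FedAvg-style aggregation; the patent gives its aggregation formula only as an image.

```python
import numpy as np

def normalize_weights(local_weights):
    """Turn the local weight parameters p_i^k into global weights summing to 1."""
    p = np.asarray(local_weights, dtype=float)
    return p / p.sum()

def aggregate(local_params, local_weights):
    """Weighted combination of the clients' local model parameter vectors."""
    global_weights = normalize_weights(local_weights)
    params = np.stack([np.asarray(w, dtype=float) for w in local_params])
    return global_weights @ params   # one vector of updated global parameters

# Example: three clients; the third has a large difference angle and few samples,
# so its unnormalized weight is small and it moves the global parameters only slightly.
updated = aggregate([[1.0, 0.0], [0.8, 0.1], [5.0, -2.0]], [0.45, 0.40, 0.05])
```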
According to an embodiment of the present application, performing model training on local data in operation 101 to obtain the private model and the local model includes: first, receiving an initialization model from the server; then, performing a first model parameter setting on the initialization model to obtain an initialized private model; then, performing a second model parameter setting on the initialization model to obtain an initialized local model; training the initialized private model on the local data to obtain the private model; and finally, training the initialized local model on the local data to obtain the local model.
In this method, the initialization model used to train the private model and the local model comes from the server, which sends it to each client; the initialization models sent to different clients may be the same or different, as needed. The client then sets the initialization model parameters for model training to obtain the initialized private model and the initialized local model, trains both on the local data, and, once the set iteration criterion is met, obtains the private model and the local model used for the first round of aggregation. By analogy, after the local model is updated with the global model parameters, the private model and the updated local model can again be trained on the local data, and once the set iteration criterion is met, the private model and the local model for the second round of aggregation are obtained.
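The small sketch below illustrates the first and second model parameter settings of operation 101: the server's initialization model is copied twice on the client, once to seed the private model and once to seed the local model. Whether the two settings are identical or slightly different is left open by the text, so both options are shown; the perturbation scale and the seed are illustrative assumptions.

```python
import numpy as np

def init_client_models(server_init_params, identical=True, seed=0):
    """Derive the initialized private model and the initialized local model
    from the initialization model sent by the server."""
    base = np.asarray(server_init_params, dtype=float)
    private_init = base.copy()                      # first model parameter setting
    if identical:
        local_init = base.copy()                    # second setting equal to the first
    else:
        rng = np.random.default_rng(seed)
        local_init = base + 0.01 * rng.standard_normal(base.shape)  # second setting perturbed
    return private_init, local_init

private_w0, local_w0 = init_client_models(np.zeros(4))
```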
Specifically, the aggregate update of the global model may adopt a formula that combines the clients' loss functions into the global objective and advances the parameters from one iteration to the next (the exact expression appears in the original publication only as an equation image). In it, $L(\theta)$ denotes the loss function corresponding to the global model; $l_i(\theta_k)$ denotes the loss function corresponding to client $i$; $\theta_{k+1}$ denotes the local model parameters of the $(k+1)$-th iteration; and $\theta_k$ denotes the local model parameters of the $k$-th iteration.
Fig. 3 shows a third implementation flow diagram of a method for training a federated learning model in the embodiment of the present application.
Referring to fig. 3, according to the second aspect of the present application, there is provided a method for training a federated learning model. The method is applied at the server and includes: operation 301, receiving local model parameters and local weight parameters from a plurality of clients; operation 302, updating the global model parameters according to the plurality of local model parameters and local weight parameters; and operation 303, sending the updated global model parameters to each client, so that each client updates its local model parameters and local weight parameter according to the updated global model parameters to obtain updated local model parameters and an updated local weight parameter.
In the federated learning model training method provided by the application, each client trains a private model and a local model on its local data and determines the local weight parameter corresponding to the local model according to the degree of difference between the private model and the local model, so that the global model parameters can be updated according to each set of local model parameters and the corresponding local weight parameter, yielding updated global model parameters. The local model parameters are then updated with the updated global model parameters, the degree of difference between the private model and the updated local model is determined anew to obtain an updated local weight parameter, and the global model is updated again with the updated local model parameters and updated local weight parameters, and so on. Each update of the global model can thus correspond to different local weight parameters; through this dynamic adjustment of the weight coefficients, the global model can better learn the deviation information of the local models and converges better, thereby achieving higher accuracy and better model performance.
The server refers to a server or another specific electronic device with data-processing capability used to train the global model; it is configured to receive the local model parameters and local weight parameters from the clients, aggregate the parameters of all clients, and thereby update the global model parameters.
Before operation 301, the server sends the corresponding initialization model to the clients so that each client trains the initialization model on its local data to obtain its private model and local model. After obtaining the private model and the local model, each client obtains the local weight parameter corresponding to its local model by comparing the difference between the private model and the local model. In operation 301, the server receives the local model parameters and the local weight parameters from each client.
In operation 302, the server implements an update of the global model parameters by aggregating the local model parameters and the local weight parameters to obtain updated global model parameters.
In operation 303, the server sends the updated global model parameters to each client, so that each client updates its local model parameters with them for the next iteration. In the next iteration, after the updated local model has been trained again, the difference between the private model and the updated local model is determined anew, giving the updated local weight coefficient. The server receives the updated local model parameters and updated local weight coefficients from the clients and performs another round of aggregation and updating of the global model parameters, and so on, so that the global model is updated over multiple rounds and the target global model is finally obtained.
According to an embodiment of the present application, updating the global model parameters according to the plurality of local model parameters and local weight parameters in operation 302 includes: first, normalizing the local weight parameters to obtain global weight parameters; and then aggregating the local model parameters according to the global weight parameters to update the global model parameters and obtain the updated global model parameters.
As can be seen from the description of operation 1024, when updating the global model parameters with the local weight parameters, the server needs to normalize the local weight parameters so that they are expressed on a common scale.
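A compact sketch of operations 301 to 303 on the server side follows: the (parameters, weight) payloads received from the clients are normalized and aggregated, and the updated global parameters are returned for broadcast. np.average normalizes the weights internally, which matches the normalization requirement above; the list-of-tuples payload format is an illustrative assumption.

```python
import numpy as np

def server_round(client_payloads):
    """client_payloads: list of (local_model_params, local_weight) tuples,
    one per client, as received in operation 301."""
    params = np.stack([np.asarray(p, dtype=float) for p, _ in client_payloads])
    weights = np.array([w for _, w in client_payloads], dtype=float)
    # Operation 302: normalize the local weights and aggregate the parameters.
    updated_global = np.average(params, axis=0, weights=weights)
    # Operation 303: broadcast these parameters so every client refreshes its
    # local model (the private models are not touched).
    return updated_global

new_global = server_round([(np.array([1.0, 0.0]), 0.45),
                           (np.array([0.8, 0.1]), 0.40),
                           (np.array([5.0, -2.0]), 0.05)])
```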
To facilitate further understanding of the above embodiments, a specific implementation scenario is provided below for description.
Fig. 4 is a schematic view of an implementation scenario of a method for training a federated learning model according to an embodiment of the present application.
Referring to fig. 4, in this implementation scenario the method is applied to a server and a plurality of clients. Each client corresponds to a hospital in a different region, and the server is communicatively connected to each client and used to train the detection model corresponding to the medical data.
First, the server sends an initialization model to each client, with initialization parameters $\theta^0$. Each client then sets the private model parameters and the local model parameters from the initialization model; in the initialized state the private model parameters and the local model parameters are the same.

Next, each client trains the initialized private model and the initialized local model. After each round of training, the gradient records of the private model and the local model are loaded, the vector angle $\alpha_i^k$ is computed from the gradient records, and the local weight parameter $p_i^k$ is obtained. The client then sends the local model parameters and the local weight parameter corresponding to its local model to the server.

The server receives the local weight parameters $p_i^k$ of all clients, normalizes them to obtain the global weight parameter corresponding to each local model, and aggregates the global weight parameters with the local model parameters to update the global model parameters. The server then sends the updated global model parameters to each client so that each client updates its local model parameters.
The next round of training is then performed with the private model and the updated local model, and so on; at the end of each round, the gradient records of the private model and the local model are loaded, so that the corresponding weight parameters are determined dynamically in every round, the global model is updated through these dynamic weight parameters, and a global model with higher accuracy is finally obtained.
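To tie the scenario together, the following end-to-end toy run simulates a few rounds with three clients whose data are Non-IID (the third client's data lie far from the others). The quadratic loss, the exponential weighting lam * exp(-rho * angle), and the weighted-average aggregation are all illustrative assumptions; the sketch only strings the pieces of the round trip together so it can be run and inspected.

```python
import numpy as np

N, rounds, lr, lam, rho = 3, 5, 0.2, 1.0, 1.0
mus = [np.array([0.2, 0.1]), np.array([0.5, 0.2]), np.array([4.0, -3.0])]  # Non-IID data means
sizes = np.array([100.0, 80.0, 20.0])                                      # local sample sizes
global_w = np.zeros(2)
private = [global_w.copy() for _ in range(N)]   # private models: never synced
local = [global_w.copy() for _ in range(N)]     # local models: synced every round

def grad(w, mu):
    # Gradient of the toy local loss ||w - mu||^2.
    return 2.0 * (w - mu)

for k in range(rounds):
    params, weights = [], []
    for i in range(N):
        g_priv, g_loc = grad(private[i], mus[i]), grad(local[i], mus[i])
        private[i] = private[i] - lr * g_priv
        local[i] = local[i] - lr * g_loc
        cos = g_priv @ g_loc / (np.linalg.norm(g_priv) * np.linalg.norm(g_loc) + 1e-12)
        angle = np.arccos(np.clip(cos, -1.0, 1.0))
        weights.append((sizes[i] / sizes.sum()) * lam * np.exp(-rho * angle))  # assumed form
        params.append(local[i].copy())
    weights = np.array(weights) / np.sum(weights)                   # server-side normalization
    global_w = sum(w_i * p_i for w_i, p_i in zip(weights, params))  # aggregation
    for i in range(N):
        local[i] = global_w.copy()                                  # broadcast updated parameters
    print(f"round {k + 1}: global_w = {np.round(global_w, 3)}")
```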
Fig. 5 shows a schematic diagram of the implementation modules of a training apparatus for a federated learning model according to an embodiment of the present application.
Referring to fig. 5, according to the third aspect of the present application, there is provided a training apparatus for a federated learning model, applied at a client, the apparatus including: a model training module 501, configured to perform model training on local data to obtain a private model and a local model; a parameter determining module 502, configured to determine a local weight parameter corresponding to the local model according to the degree of difference between the private model and the local model; a first sending module 503, configured to send the local model parameters and the local weight parameter corresponding to the local model to the server, so that the server updates the global model parameters according to the local model parameters and the local weight parameters; a first receiving module 504, configured to receive the updated global model parameters from the server; and a first updating module 505, configured to update the local model parameters and the local weight parameter according to the updated global model parameters to obtain updated local model parameters and an updated local weight parameter.
According to an embodiment of the present application, the parameter determining module 502 is configured to: determine a private gradient vector from the private model; determine a local gradient vector from the local model; determine a difference angle from the private gradient vector and the local gradient vector; and determine the local weight parameter corresponding to the local model based on the difference angle.
According to an embodiment of the present application, the parameter determining module 502 is further configured to: determine the sample proportion of the local data in the global data; determine an adjustment factor from the difference angle, where the value of the adjustment factor is inversely proportional to the difference angle; and combine the sample proportion and the adjustment factor to obtain the local weight parameter.
According to an embodiment of the present application, the model training module 501 includes: a receiving submodule 5011, configured to receive an initialization model from the server; a setting submodule 5012, configured to perform a first model parameter setting on the initialization model to obtain an initialized private model, and further configured to perform a second model parameter setting on the initialization model to obtain an initialized local model; and a training submodule 5013, configured to train the initialized private model on the local data to obtain the private model, and further configured to train the initialized local model on the local data to obtain the local model.
Fig. 6 is a schematic diagram illustrating the implementation modules of a training apparatus for a federated learning model according to another embodiment of the present application.
Referring to fig. 6, according to the fourth aspect of the present application, there is provided a training apparatus for a federated learning model, applied at the server, the apparatus including: a second receiving module 601, configured to receive local model parameters and local weight parameters from a plurality of clients; a second updating module 602, configured to update the global model parameters according to the plurality of local model parameters and local weight parameters; and a second sending module 603, configured to send the updated global model parameters to each client, so that each client updates its local model parameters and local weight parameter according to the updated global model parameters to obtain updated local model parameters and an updated local weight parameter.
According to an embodiment of the present application, the second updating module 602 includes: a normalization submodule 6021, configured to normalize the local weight parameters to obtain global weight parameters; and an aggregation submodule 6022, configured to aggregate the local model parameters according to the global weight parameters, so as to update the global model parameters and obtain the updated global model parameters.
Fig. 7 shows a schematic diagram of an implementation apparatus of a training system for a federated learning model according to an embodiment of the present application.
Referring to fig. 7, according to the fifth aspect of the present application, there is provided a system for training a federated learning model. The system includes a server 600 and a plurality of clients 500. The server 600 includes: a second receiving module 601, configured to receive local model parameters and local weight parameters from the plurality of clients; a second updating module 602, configured to update the global model parameters according to the plurality of local model parameters and local weight parameters; and a second sending module 603, configured to send the updated global model parameters to each client, so that each client updates its local model parameters and local weight parameter according to the updated global model parameters to obtain updated local model parameters and an updated local weight parameter. Each client 500 includes: a model training module 501, configured to perform model training on local data to obtain a private model and a local model; a parameter determining module 502, configured to determine a local weight parameter corresponding to the local model according to the degree of difference between the private model and the local model; a first sending module 503, configured to send the local model parameters and the local weight parameter corresponding to the local model to the server, so that the server updates the global model parameters according to the local model parameters and the local weight parameters; a first receiving module 504, configured to receive the updated global model parameters from the server; and a first updating module 505, configured to update the local model parameters and the local weight parameter according to the updated global model parameters to obtain updated local model parameters and an updated local weight parameter.
It should be noted here that the above description of the apparatus and system embodiments for training the federated learning model is similar to the description of the method embodiments shown in figs. 1 to 4 and has similar beneficial effects, so it is not repeated. For technical details not disclosed in the apparatus and system embodiments of the present application, please refer to the description of the method embodiments shown in figs. 1 to 4; for brevity, they are not described again.
According to a sixth aspect of the present application, there is provided a computer device comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of the above when executing the program.
According to a seventh aspect of the present application, there is provided a storage medium containing computer executable instructions for performing the method of any one of the above when executed by a computer processor.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
FIG. 8 shows a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 802 or a computer program loaded from a storage unit 808 into a random access memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 performs the various methods and processes described above, such as the training method of the federated learning model. For example, in some embodiments, the training method of the federated learning model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the training method of the federated learning model described above can be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the training method of the federated learning model in any other suitable manner (for example, by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chips (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose and is coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present application may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this application, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, and no limitation is imposed herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
The above description covers only specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any changes or substitutions that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A method for training a federated learning model, the method comprising:
performing model training on local data to obtain a local model and a partial model;
determining a local weight parameter corresponding to the local model according to the degree of difference between the local model and the partial model;
sending the local model parameters and the local weight parameter corresponding to the local model to a server, so that the server updates global model parameters according to the local model parameters and the local weight parameters;
receiving the updated global model parameters from the server; and
updating the local model parameters and the local weight parameter according to the updated global model parameters to obtain updated local model parameters and an updated local weight parameter.
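The client-side flow of claim 1 can be illustrated with a minimal NumPy sketch. The machine translation renders both client-side models identically; below they are distinguished as the local model (whose parameters are uploaded) and the partial model (used only to measure divergence), which is an interpretive assumption. The toy linear model, the function names (mse_grad, client_round), and the angle-based weight formula are likewise illustrative, not the claimed implementation.

```python
import numpy as np

def mse_grad(params, X, y):
    """Mean-squared-error gradient of a linear model; stands in for any local model."""
    return 2.0 * X.T @ (X @ params - y) / len(y)

def client_round(global_params, X, y, lr=0.1, steps=20):
    """One user-terminal round following the steps of claim 1 (illustrative only)."""
    # Train two models on the same local data: the local model starts from the
    # global parameters, the partial model from a separate initialization.
    local = global_params.copy()
    partial = np.zeros_like(global_params)
    for _ in range(steps):
        local -= lr * mse_grad(local, X, y)
        partial -= lr * mse_grad(partial, X, y)

    # Degree of difference between the two models -> local weight parameter.
    g_local, g_partial = mse_grad(local, X, y), mse_grad(partial, X, y)
    cos = g_local @ g_partial / (np.linalg.norm(g_local) * np.linalg.norm(g_partial) + 1e-12)
    angle = np.arccos(np.clip(cos, -1.0, 1.0))
    weight = 1.0 / (angle + 1e-6)  # smaller divergence between models -> larger weight

    # These two values are what the user terminal uploads to the server.
    return local, weight
```

On receiving the updated global model parameters, the user terminal would call client_round again with the new values, which yields the updated local model parameters and the updated local weight parameter for the next round.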
2. The method of claim 1, wherein determining the local weight parameter corresponding to the local model according to the degree of difference between the local model and the partial model comprises:
determining a local gradient vector from the local model;
determining a partial gradient vector from the partial model;
determining a divergence angle between the local gradient vector and the partial gradient vector, the angle characterizing the degree of difference; and
determining the local weight parameter corresponding to the local model based on the divergence angle.
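A sketch of the angle computation in claim 2. How the two gradient vectors are obtained (for example, as gradients of the two models on the same local batch) is left open by the claim; only the included-angle calculation is shown, and the function name is an assumption.

```python
import numpy as np

def divergence_angle(local_grad: np.ndarray, partial_grad: np.ndarray) -> float:
    """Included angle (in radians) between the two gradient vectors."""
    cos = np.dot(local_grad, partial_grad) / (
        np.linalg.norm(local_grad) * np.linalg.norm(partial_grad) + 1e-12)
    # Clip to guard against floating-point values slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))
```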
3. The method of claim 2, wherein determining the local weight parameter corresponding to the local model based on the divergence angle comprises:
determining a sample proportion of the local data to the global data;
determining an adjustment factor according to the divergence angle, wherein the value of the adjustment factor is inversely proportional to the divergence angle; and
combining the sample proportion and the adjustment factor to obtain the local weight parameter.
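Claim 3 only constrains the adjustment factor to be inversely related to the divergence angle and to be combined with the sample proportion; the concrete 1/(angle + eps) form and the multiplicative combination in the sketch below are assumptions made for illustration.

```python
def local_weight(angle: float, n_local: int, n_global: int, eps: float = 1e-6) -> float:
    """Combine the sample proportion with an angle-dependent adjustment factor."""
    sample_proportion = n_local / n_global   # share of the global data held locally
    adjustment_factor = 1.0 / (angle + eps)  # larger angle -> smaller factor
    return sample_proportion * adjustment_factor
```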
4. The method of claim 1, wherein performing model training on the local data to obtain the local model and the partial model comprises:
receiving an initialization model from the server;
setting first model parameters of the initialization model to obtain an initialized local model;
setting second model parameters of the initialization model to obtain an initialized partial model;
training the initialized local model on the local data to obtain the local model; and
training the initialized partial model on the local data to obtain the partial model.
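Claim 4 does not specify how the first and second model parameters are set from the server's initialization model. The sketch below reuses the initialization parameters as-is for the local model and perturbs them for the partial model; this is one possible reading chosen purely for illustration.

```python
import numpy as np

def initialize_models(init_params: np.ndarray, seed: int = 0):
    """Derive the two starting points of claim 4 from the server's initialization model."""
    rng = np.random.default_rng(seed)
    local_init = init_params.copy()                                  # "first" model parameters
    partial_init = init_params + 0.01 * rng.standard_normal(init_params.shape)  # "second" model parameters
    return local_init, partial_init
```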
5. A method for training a federated learning model, the method comprising:
receiving local model parameters and local weight parameters from a plurality of user terminals;
updating global model parameters according to the plurality of local model parameters and local weight parameters; and
sending the updated global model parameters to each user terminal, so that each user terminal updates its local model parameters and local weight parameter according to the updated global model parameters to obtain updated local model parameters and an updated local weight parameter.
6. The method of claim 5, wherein updating the global model parameters according to the plurality of local model parameters and local weight parameters comprises:
normalizing the local weight parameters to obtain global weight parameters; and
aggregating the local model parameters according to the global weight parameters to update the parameters of the global model and obtain the updated global model parameters.
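The server-side update of claims 5 and 6 amounts to normalizing the received local weight parameters into global weight parameters and taking the weighted combination of the received local model parameters. The sketch below assumes the parameters arrive as flat NumPy vectors and that a weighted sum is the aggregation rule; both are illustrative assumptions.

```python
import numpy as np

def server_update(payloads):
    """Aggregate (local_model_params, local_weight) pairs from the user terminals.

    Returns the updated global model parameters that the server broadcasts back
    to every user terminal.
    """
    params = np.stack([p for p, _ in payloads])
    weights = np.array([w for _, w in payloads], dtype=float)
    global_weights = weights / weights.sum()                 # claim 6: normalization
    return (global_weights[:, None] * params).sum(axis=0)    # weighted aggregation
```

Each user terminal then replaces its local model parameters with the broadcast result and recomputes its local weight parameter in the following round.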
7. A training apparatus for a federated learning model, characterized in that the apparatus comprises:
a model training module, configured to perform model training on local data to obtain a local model and a partial model;
a parameter determining module, configured to determine a local weight parameter corresponding to the local model according to the degree of difference between the local model and the partial model;
a first sending module, configured to send the local model parameters and the local weight parameter corresponding to the local model to a server, so that the server updates global model parameters according to the local model parameters and the local weight parameters;
a first receiving module, configured to receive the updated global model parameters from the server; and
a first updating module, configured to update the local model parameters and the local weight parameter according to the updated global model parameters to obtain updated local model parameters and an updated local weight parameter.
8. A training apparatus for a federated learning model, characterized in that the apparatus comprises:
a second receiving module, configured to receive local model parameters and local weight parameters from a plurality of user terminals;
a second updating module, configured to update global model parameters according to the plurality of local model parameters and local weight parameters; and
a second sending module, configured to send the updated global model parameters to each user terminal, so that each user terminal updates its local model parameters and local weight parameter according to the updated global model parameters to obtain updated local model parameters and an updated local weight parameter.
9. A training system for a federated learning model, characterized in that the system comprises a server and a plurality of user terminals, wherein the server comprises:
a second receiving module, configured to receive local model parameters and local weight parameters from the plurality of user terminals;
a second updating module, configured to update global model parameters according to the plurality of local model parameters and local weight parameters; and
a second sending module, configured to send the updated global model parameters to each user terminal, so that each user terminal updates its local model parameters and local weight parameter according to the updated global model parameters to obtain updated local model parameters and an updated local weight parameter;
and each user terminal comprises:
a model training module, configured to perform model training on local data to obtain a local model and a partial model;
a parameter determining module, configured to determine a local weight parameter corresponding to the local model according to the degree of difference between the local model and the partial model;
a first sending module, configured to send the local model parameters and the local weight parameter corresponding to the local model to the server, so that the server updates the global model parameters according to the local model parameters and the local weight parameters;
a first receiving module, configured to receive the updated global model parameters from the server; and
a first updating module, configured to update the local model parameters and the local weight parameter according to the updated global model parameters to obtain updated local model parameters and an updated local weight parameter.
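Putting the client-side and server-side modules of claim 9 together, a toy simulation with three user terminals fitting the same linear relation from differently scaled local data might look as follows. It reuses the pieces sketched after claims 1 and 6; the linear model, the weight formula, and the number of rounds are assumptions used only to show how the modules interact.

```python
import numpy as np

def mse_grad(params, X, y):
    return 2.0 * X.T @ (X @ params - y) / len(y)

def client_step(global_params, X, y, lr=0.1, steps=50):
    """User-terminal modules: train both models, derive the weight, return the upload."""
    local, partial = global_params.copy(), np.zeros_like(global_params)
    for _ in range(steps):
        local -= lr * mse_grad(local, X, y)
        partial -= lr * mse_grad(partial, X, y)
    g_l, g_p = mse_grad(local, X, y), mse_grad(partial, X, y)
    cos = g_l @ g_p / (np.linalg.norm(g_l) * np.linalg.norm(g_p) + 1e-12)
    weight = 1.0 / (np.arccos(np.clip(cos, -1.0, 1.0)) + 1e-6)
    return local, weight

def server_step(payloads):
    """Server modules: normalize the weights and aggregate the uploaded parameters."""
    params = np.stack([p for p, _ in payloads])
    ws = np.array([w for _, w in payloads], dtype=float)
    ws /= ws.sum()
    return (ws[:, None] * params).sum(axis=0)

# Three user terminals with differently scaled local data for y = 2x + noise.
rng = np.random.default_rng(0)
datasets = []
for scale in (0.8, 1.0, 1.2):
    X = scale * rng.normal(size=(50, 1))
    y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=50)
    datasets.append((X, y))

global_params = np.zeros(1)
for _ in range(5):                                             # five communication rounds
    payloads = [client_step(global_params, X, y) for X, y in datasets]
    global_params = server_step(payloads)                      # broadcast to all terminals
print(global_params)                                           # approaches [2.0]
```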
10. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1-4 or 5-6.
11. A storage medium containing computer-executable instructions which, when executed by a computer processor, perform the method according to any one of claims 1-4 or 5-6.
CN202111248479.7A 2021-10-26 2021-10-26 Training method, device, system, storage medium and equipment for federal learning model Active CN113837399B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111248479.7A CN113837399B (en) 2021-10-26 2021-10-26 Training method, device, system, storage medium and equipment for federal learning model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111248479.7A CN113837399B (en) 2021-10-26 2021-10-26 Training method, device, system, storage medium and equipment for federal learning model

Publications (2)

Publication Number Publication Date
CN113837399A true CN113837399A (en) 2021-12-24
CN113837399B CN113837399B (en) 2023-05-30

Family

ID=78966142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111248479.7A Active CN113837399B (en) 2021-10-26 2021-10-26 Training method, device, system, storage medium and equipment for federal learning model

Country Status (1)

Country Link
CN (1) CN113837399B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5586033A (en) * 1992-09-10 1996-12-17 Deere & Company Control system with neural network trained as general and local models
CN111723947A (en) * 2020-06-19 2020-09-29 深圳前海微众银行股份有限公司 Method and device for training federated learning model
CN112364943A (en) * 2020-12-10 2021-02-12 广西师范大学 Federal prediction method based on federal learning
CN113112027A (en) * 2021-04-06 2021-07-13 杭州电子科技大学 Federal learning method based on dynamic adjustment model aggregation weight

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
常瑞; 白杨森; 孟庆涛: "Research on the learning optimization problem of belief rule base expert systems", 华北水利水电大学学报(自然科学版) (Journal of North China University of Water Resources and Electric Power, Natural Science Edition) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114330759A (en) * 2022-03-08 2022-04-12 富算科技(上海)有限公司 Training method and system for longitudinal federated learning model
CN114330759B (en) * 2022-03-08 2022-08-02 富算科技(上海)有限公司 Training method and system for longitudinal federated learning model
TWI800304B (en) * 2022-03-16 2023-04-21 英業達股份有限公司 Fedrated learning system using synonym
CN114844889A (en) * 2022-04-14 2022-08-02 北京百度网讯科技有限公司 Video processing model updating method and device, electronic equipment and storage medium
CN114707662A (en) * 2022-04-15 2022-07-05 支付宝(杭州)信息技术有限公司 Federal learning method and device and federal learning system
CN114741611A (en) * 2022-06-08 2022-07-12 杭州金智塔科技有限公司 Federal recommendation model training method and system
CN114741611B (en) * 2022-06-08 2022-10-14 杭州金智塔科技有限公司 Federal recommendation model training method and system
CN115145966A (en) * 2022-09-05 2022-10-04 山东省计算中心(国家超级计算济南中心) Comparison federal learning method and system for heterogeneous data

Also Published As

Publication number Publication date
CN113837399B (en) 2023-05-30

Similar Documents

Publication Publication Date Title
CN113837399A (en) Federal learning model training method, device, system, storage medium and equipment
CN112561078B (en) Distributed model training method and related device
CN112580733B (en) Classification model training method, device, equipment and storage medium
CN114065863B (en) Federal learning method, apparatus, system, electronic device and storage medium
CN114500339B (en) Node bandwidth monitoring method and device, electronic equipment and storage medium
CN113642711B (en) Processing method, device, equipment and storage medium of network model
CN114936323B (en) Training method and device of graph representation model and electronic equipment
CN113705362A (en) Training method and device of image detection model, electronic equipment and storage medium
CN114742237A (en) Federal learning model aggregation method and device, electronic equipment and readable storage medium
CN114528916A (en) Sample clustering processing method, device, equipment and storage medium
CN116798592B (en) Method, device, equipment and storage medium for determining facility layout position
CN116841870A (en) Test method, system, device, equipment and storage medium
CN114758130B (en) Image processing and model training method, device, equipment and storage medium
CN113344213A (en) Knowledge distillation method, knowledge distillation device, electronic equipment and computer readable storage medium
CN115018009B (en) Object description method, and network model training method and device
CN114844889B (en) Video processing model updating method and device, electronic equipment and storage medium
CN117236995A (en) Payment rate estimation method, device, equipment and storage medium
EP4036861A2 (en) Method and apparatus for processing point cloud data, electronic device, storage medium, computer program product
CN114037814B (en) Data processing method, device, electronic equipment and medium
CN116030150B (en) Avatar generation method, device, electronic equipment and medium
CN115860077B (en) Method, device, equipment and storage medium for processing state data
CN110851943A (en) Modeling method and device for battery charging performance
CN117634590A (en) Model distillation method, device, equipment and storage medium
CN114461923B (en) Community discovery method, device, electronic equipment and storage medium
CN116614379B (en) Bandwidth adjustment method and device for migration service and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant