CN113537509A - Collaborative model training method and device - Google Patents

Collaborative model training method and device Download PDF

Info

Publication number
CN113537509A
Authority
CN
China
Prior art keywords
model
binary classification
updated
model parameters
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110719669.6A
Other languages
Chinese (zh)
Inventor
余剑峤
朱元绍
刘毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University of Science and Technology
Southern University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University of Science and Technology filed Critical Southwest University of Science and Technology
Priority to CN202110719669.6A priority Critical patent/CN113537509A/en
Publication of CN113537509A publication Critical patent/CN113537509A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a collaborative model training method and a collaborative model training device, and relates to the technical field of deep learning. The collaborative model training method comprises the following steps: selecting a shared data set from a client set corresponding to a plurality of clients; participating in federated learning training according to the shared data set and a preset score to obtain classification model parameters; broadcasting the classification model parameters to the plurality of clients; obtaining updated model parameters returned by the plurality of clients; aggregating according to the updated model parameters to obtain a plurality of global binary classification models; and integrating the plurality of global binary classification models into a global multi-classification model. The collaborative model training method can improve the performance, accuracy and convergence speed of the federated learning framework, does not need to share any data or depend on a specific model, provides complete privacy protection, and has smaller accuracy fluctuation, thereby reducing the negative influence of the data on model convergence and being suitable for large-scale user scenarios.

Description

Collaborative model training method and device
Technical Field
The invention relates to the technical field of deep learning, in particular to a collaborative model training method and device.
Background
Federal Learning (FL) is a privacy-oriented framework that allows distributed edge devices to jointly train a shared global model without transmitting their sensitive data to a centralized server. The objective of federal learning is to balance the conflict between obtaining large amounts of data and protecting sensitive information. However, the data stored locally on each edge device is typically not independently and identically distributed (non-IID). This data heterogeneity presents serious challenges to the optimization and convergence of the global model.
Currently, to address the communication-related challenges in the federated learning framework, an efficient model update aggregation algorithm, Federated Averaging (FedAvg), has been introduced. Federated learning nevertheless still faces statistical challenges. On the one hand, FedAvg-based variant aggregation algorithms rely on distributed stochastic gradient descent (D-SGD), which is widely used to iteratively train deep learning models under independent and identically distributed (IID) sampling of the training data. The purpose of learning from training samples with IID sampling is to ensure that the stochastic gradient is an unbiased estimate of the full gradient. It is not practical to ensure that the local data of each edge client is always IID. On the other hand, studies have shown that data heterogeneity, i.e. a non-IID distribution, may reduce the convergence speed, which is unfavorable for obtaining a robust shared model.
Disclosure of Invention
The present invention is directed to solving at least one of the problems of the prior art. Therefore, an embodiment of the invention provides a collaborative model training method which can improve the performance, accuracy and convergence speed of a federated learning framework, does not need to share any data or depend on a specific model, provides complete privacy protection, has smaller accuracy fluctuation, reduces the negative influence of the data on model convergence, and is suitable for large-scale user scenarios.
The embodiment of the invention also provides another collaborative model training method.
The embodiment of the invention also provides another collaborative model training method.
The embodiment of the invention also provides a collaborative model training device.
The embodiment of the invention also provides another cooperative model training device.
The collaborative model training method according to the embodiment of the first aspect of the invention comprises the following steps:
selecting a shared data set from a client set corresponding to a plurality of clients;
participating in federal learning training according to the shared data set and preset scores to obtain classification model parameters;
broadcasting the classification model parameters to the plurality of clients;
obtaining the parameters of the updated model returned by the plurality of clients;
aggregating according to the updated model parameters to obtain a plurality of global binary classification models;
integrating the plurality of global binary classification models into a global multi-classification model.
The collaborative model training method according to the embodiment of the first aspect of the invention has at least the following advantages: a shared data set is first selected from the client sets corresponding to a plurality of clients; federated learning training is then performed according to the shared data set and the preset score to obtain classification model parameters; the classification model parameters are broadcast to the plurality of clients; the updated model parameters and updated binary classification models returned by the plurality of clients are then obtained; a plurality of global binary classification models are obtained by aggregating the updated model parameters; and finally the plurality of global binary classification models are integrated into a global multi-classification model.
According to some embodiments of the present invention, the selecting a shared data set from a set of clients corresponding to a plurality of clients includes: obtaining client labels corresponding to the plurality of clients; and selecting the shared data set from a plurality of client sets according to the client tags.
According to some embodiments of the present invention, the aggregating according to the updated model parameters to obtain a plurality of global binary classification models includes: obtaining an updating binary classification model corresponding to the updating model parameter; grouping the updated model parameters according to the updated binary classification model to obtain a plurality of classification groups to be aggregated; respectively extracting parameters to be aggregated of each classified group to be aggregated; and aggregating according to the parameters to be aggregated and the updated binary classification model to obtain the plurality of global binary classification models.
According to a second aspect of the invention, a collaborative model training method comprises:
acquiring classification model parameters broadcasted by a server;
obtaining local data labels corresponding to a plurality of local binary classification models;
initializing a plurality of the local binary classification models according to the local data tags;
optimizing the initialized plurality of local binary classification models according to a local data set to obtain a plurality of updated model parameters and a plurality of updated binary classification models;
sending the updated model parameters back to the server.
The collaborative model training method according to the embodiment of the second aspect of the invention has at least the following advantages: firstly, classification model parameters broadcasted by a server are obtained, local data labels corresponding to a plurality of local binary classification models are obtained, then a plurality of local binary classification models are initialized according to the local data labels, the initialized local binary classification models are optimized according to a local data set, a plurality of updated model parameters and a plurality of updated binary classification models are obtained, and finally the updated model parameters are sent back to the server.
According to some embodiments of the present invention, the optimizing the initialized plurality of local binary classification models according to the local data set to obtain a plurality of updated model parameters and a plurality of updated binary classification models includes: acquiring a target function corresponding to the local binary classification model; and executing random gradient descent according to the local data set and the objective function, and optimizing the local binary classification model to obtain the updated model parameters and the updated binary classification model.
The cooperative model training method according to the embodiment of the third aspect of the present invention is applied to a server and a client, wherein the server is in communication connection with the client, and the method comprises the following steps:
the server performs the collaborative model training method according to the embodiment of the first aspect of the present invention, and the client performs the collaborative model training method according to the embodiment of the second aspect of the present invention.
The collaborative model training method according to the embodiment of the third aspect of the invention has at least the following advantages: the server side executes the collaborative model training method of the first aspect of the invention, and the client side executes the collaborative model training method of the second aspect of the invention, so that the performance, accuracy and convergence of the Federal learning framework can be improved, the precision fluctuation is smaller, the negative influence of data on the model convergence is reduced, and the method is suitable for large-scale user scenes.
A collaborative model training apparatus according to an embodiment of a fourth aspect of the present invention includes:
the selection module is used for selecting a shared data set from client sets corresponding to a plurality of clients;
the training module is used for participating in federal learning training according to the shared data set and a preset score to obtain a classification model parameter;
a broadcasting module for broadcasting the classification model parameters to the plurality of clients;
the first acquisition module is used for acquiring the updated model parameters returned by the plurality of clients;
the aggregation module is used for aggregating according to the updated model parameters to obtain a plurality of global binary classification models;
and the integration module is used for integrating the global binary classification models into a global multi-classification model.
The cooperative model training apparatus according to the fourth aspect of the present invention has at least the following advantages: by executing the collaborative model training method of the embodiment of the first aspect of the invention, the performance, accuracy and convergence speed of the federal learning framework can be improved, no data need to be shared, no specific model need to be based, complete privacy protection is provided, and precision fluctuation is smaller, so that the negative influence of data on model convergence is reduced, and the collaborative model training method is suitable for large-scale user scenes.
The cooperative model training apparatus according to the fifth aspect of the present invention includes:
the second acquisition module is used for acquiring the classification model parameters broadcasted by the server;
the third acquisition module is used for acquiring local data labels corresponding to the local binary classification models;
an initialization module for initializing a plurality of the local binary classification models according to the local data tags;
the optimization module is used for optimizing the initialized local binary classification models according to a local data set to obtain a plurality of updated model parameters and a plurality of updated binary classification models;
and the return module is used for sending the updated model parameters back to the server.
The cooperative model training apparatus according to the fifth aspect of the present invention has at least the following advantages: by executing the collaborative model training method of the embodiment of the second aspect of the invention, the extra computational burden can be greatly reduced, the accuracy of the data is improved, no data needs to be shared, complete privacy protection is provided, and the performance and the convergence of the federal learning framework are improved.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart illustrating a collaborative model training method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a FedOVA algorithm according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart illustrating a collaborative model training method according to another embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a collaborative model training apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a collaborative model training apparatus according to another embodiment of the present invention.
Reference numerals:
the system comprises a selection module 400, a training module 410, a broadcasting module 420, a first acquisition module 430, an aggregation module 440, an integration module 450, a second acquisition module 500, a third acquisition module 510, an initialization module 520, an optimization module 530, and a return module 540.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
In the description of the present invention, unless otherwise explicitly limited, terms such as arrangement, installation, connection and the like should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of the above terms in the present invention in combination with the specific contents of the technical solutions.
The embodiment of the invention aims to solve the non-IID distribution problem caused by missing classes on individual clients. Under the limitation of non-IID client data, the non-IID problem in federated learning means that each client cannot obtain enough multi-class labels through data sharing to train a high-accuracy multi-class classification model, which limits the scalability of the federated learning framework; a multi-class model therefore cannot be trained directly on a client. To this end, the joint multi-class classification task on non-IID data can be decomposed into multiple binary classification tasks on the clients using a One-vs-All method. In this embodiment, a federated One-vs-All (FedOVA) algorithm can therefore be designed by combining the One-vs-All method (i.e. using multiple classifiers, taking each class in turn as the positive class and performing a binary discrimination for it to obtain the classification of each class) with the FedAvg training scheme (the Federated Averaging algorithm, whose essence lies in averaging the weights trained by the individual users, with each user's proportion in the average depending on the number of samples it owns). By decomposing the multi-classification task into multiple binary classification tasks, the algorithm can effectively handle non-IID data in federated learning. An overview of the FedOVA algorithm is shown in FIG. 2. The FedOVA algorithm first decomposes the multi-class classification problem into more direct binary classification problems, and then combines their respective outputs using an ensemble learning method.
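For illustration, the following Python sketch shows the One-vs-All relabelling that FedOVA builds on: a multi-class label vector is turned into one binary view per class. The helper name and the use of NumPy are illustrative and not part of the patent.

```python
import numpy as np

def one_vs_all_views(labels, num_classes):
    """Relabel a multi-class label vector into one binary view per class.

    For class i, samples of class i become positives (1) and all other
    samples become negatives (0), which is the One-vs-All decomposition
    the FedOVA scheme builds on.
    """
    labels = np.asarray(labels)
    return {i: (labels == i).astype(np.int64) for i in range(num_classes)}

# Example: 10-class labels turned into 10 binary label vectors.
y = np.array([0, 3, 7, 3, 9])
binary_views = one_vs_all_views(y, num_classes=10)
print(binary_views[3])  # [0 1 0 1 0]
```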
Referring to fig. 1, a collaborative model training method according to an embodiment of the first aspect of the present invention includes:
step S100, a shared data set is selected from a set of clients corresponding to a plurality of clients.
Here, the client set refers to the data sets held by the clients; a subset of the clients may be selected as the shared client set κ. Optionally, suppose each client k holds a local data set D_k, k ∈ {1, 2, ..., K}, and the total number of classes is n. A part of the clients can be selected from all clients, and the data of those clients (i.e. their D_k) serves as the shared data set. Specifically, labels may be assigned to the different clients, and for each round of training the server randomly selects, according to these labels, a subset of clients as the shared set κ to participate in the federated learning training.
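A minimal sketch of this per-round client selection, assuming the clients are identified by integer IDs; the function name and interface are illustrative only.

```python
import random

def select_clients(client_ids, fraction_c, seed=None):
    """Randomly select a subset of clients to participate in one round.

    fraction_c corresponds to the preset score C in the text
    (e.g. C = 0.2 selects 20% of all clients per round).
    """
    rng = random.Random(seed)
    m = max(1, int(fraction_c * len(client_ids)))
    return rng.sample(client_ids, m)

# Example: 20 out of 100 clients per round when C = 0.2.
round_clients = select_clients(list(range(100)), fraction_c=0.2, seed=0)
```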
And step S110, participating in federal learning training according to the shared data set and the preset score to obtain classification model parameters.
Here, the classification model parameters may be the parameters of the binary classifier models obtained by the server through federated learning training according to the shared data set and the preset score. Optionally, the preset score may be set as required. With the preset score C, training participates in federated learning according to the randomly selected shared set κ, which can accelerate the training and convergence speed. The binary classifier model parameters ω_i^t obtained through federated learning training are the classification model parameters, where i represents a classifier ID, i ∈ {1, 2, ..., n}.
Step S120, broadcasting the classification model parameters to a plurality of clients.
Optionally, the server may broadcast the classification model parameters ω_i^t to the plurality of clients, so that after receiving ω_i^t, each client can perform local training to obtain updated results.
Step S130, obtaining updated model parameters returned by the plurality of clients.
The updated model parameters may be the binary classifier parameters obtained after the classifiers are optimized by local training. Optionally, when a client performs local training, one binary classification model needs to be trained for each class of label, i.e. multiple models are trained. Each client uses its local data set D_k to perform stochastic gradient descent to optimize the classifiers, and then returns the resulting updated model parameters to the server.
And step S140, aggregating according to the updated model parameters to obtain a plurality of global binary classification models.
Here, a global binary classification model may be a global two-class classifier model, and there may be a plurality of such models. Optionally, since each client only trains part of the classifiers, the parameters of all classifiers are not returned, and because each classifier is independent, asynchronous updating can be performed to reduce the computational burden. Specifically, the server groups the returned updated model parameters according to their corresponding binary classifier models, and then aggregates the parameters collected in each group G_i to obtain the corresponding global binary classification model. Since each local client holds a plurality of local binary classification models, after the updated model parameters are returned to the server, the server aggregates the local binary classification models of each class into global binary classification models, thereby obtaining the plurality of global binary classification models.
And S150, integrating a plurality of global binary classification models into a global multi-classification model.
Optionally, the aggregation of step S140 is repeated, and the plurality of global binary classification models are integrated into a global multi-classification model once the final integrated classifier reaches convergence, thereby obtaining the global multi-classification model. There is no redundancy between the component binary classifiers in FedOVA, and the training of each classifier does not affect the other classifiers. Therefore, during federated learning training, a component binary classifier can be updated independently as soon as its training is finished, which improves the communication efficiency between the clients and the server and adds no extra operations to the federated learning framework.
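The integration of the global binary classification models into one multi-class predictor can be sketched as follows, assuming each component classifier outputs a single logit per sample; PyTorch is used only because the experiments below mention it, and the exact ensemble rule of the patent may differ.

```python
import torch

def ensemble_predict(binary_models, x):
    """Combine n One-vs-All binary classifiers into a multi-class prediction.

    Each model outputs a confidence that its own class is present; the class
    whose classifier is most confident is taken as the prediction. This is a
    sketch of the ensemble step, not the patent's exact implementation.
    """
    with torch.no_grad():
        scores = torch.stack(
            [torch.sigmoid(m(x)).squeeze(-1) for m in binary_models], dim=-1)
    return scores.argmax(dim=-1)
```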
In the collaborative model training method described above, a shared data set is first selected from the client sets corresponding to a plurality of clients; federated learning training is then performed according to the shared data set and the preset score to obtain classification model parameters; the classification model parameters are broadcast to the plurality of clients; the updated model parameters and updated binary classification models returned by the clients are obtained; a plurality of global binary classification models are obtained by aggregating the updated model parameters; and finally the plurality of global binary classification models are integrated into a global multi-classification model. This can improve the performance, accuracy and convergence speed of the federated learning framework without sharing any data or depending on a specific model, provides complete privacy protection, and has smaller accuracy fluctuation, thereby reducing the negative influence of data on model convergence and being suitable for large-scale user scenarios.
In some embodiments of the present invention, selecting a shared data set from a set of clients corresponding to a plurality of clients includes:
and obtaining client labels corresponding to the plurality of clients. Wherein, the client tag can be a unique tag corresponding to the client. Alternatively, for the non-IID configuration, the number of unique tags held by the client can be represented using the parameter 1 ≦ l ≦ 10. For example, non-IID-2 means that each client has two different tags. The method is realized by grouping training data according to labels and dividing each group into (l multiplied by K)/n partitions, finally distributing the partitions with different labels to each client, and acquiring the labels of the partitions to obtain the client labels corresponding to each client.
And selecting a shared data set from a plurality of client sets according to the client tag. Optionally, assuming that the number K of the clients is 100, 20% of the clients may be randomly selected for training in each round of training according to the client labels corresponding to different clients (that is, the preset score C is 0.2), and the data of the 20% of the clients are respectively obtained, so as to obtain the shared data set. The client terminals participating in training are randomly selected through the client labels to obtain the shared data set, all sample categories are not needed, the best accuracy can be obtained on each task, and complete privacy protection is provided.
In some embodiments of the present invention, aggregating according to the updated model parameters to obtain a plurality of global binary classification models includes:
and obtaining an updated binary classification model corresponding to the updated model parameters. Wherein the updated bivariate classification model may be an updated local bivariate classification model. Optionally, when the client performs local training, a classification model, that is, multiple models need to be trained for each type of label, and therefore, the updated model parameter of each type corresponds to an updated local binary classification model, that is, an updated binary classification model.
And grouping the updated model parameters according to the updated binary classification model to obtain a plurality of classification groups to be aggregated. Optionally, since each client only trains part of the classifiers, parameters of all the classifiers are not returned, and each classifier is independent, asynchronous update can be performed to reduce computational burden. And the server groups the returned parameters according to the corresponding binary classifier models to obtain a plurality of groups of classified groups to be aggregated.
And respectively extracting the parameters to be polymerized of each classified group to be polymerized. Optionally, for each group of classified groups to be aggregated, each group G may be aggregatediObtaining the parameter to be polymerized.
Aggregate according to the parameters to be aggregated and the updated binary classification models to obtain the plurality of global binary classification models. Optionally, for an updated binary classification model f_i, the aggregation follows the sample-weighted averaging of FedAvg and can be expressed by the following formula (1):

ω_i^{t+1} = Σ_{k ∈ G_i} (n_k / n_{G_i}) · ω_{i,k}^{t+1}    (1)

In formula (1), ω_i^{t+1} is the aggregated parameter of the global binary classification model f_i, G_i is the classification group to be aggregated, ω_{i,k}^{t+1} is the updated model parameter returned by client k, k ∈ {1, 2, ..., K}, n_k is the number of training samples on client k, and n_{G_i} = Σ_{k ∈ G_i} n_k. By formula (1), the updated binary classification models of each category are aggregated into the plurality of global binary classification models. The multi-class classification problem is thus decomposed into more direct binary classification problems, and their respective outputs are then combined using an ensemble learning method. Since the component binary classifiers in FedOVA have no redundancy, the training of each classifier does not influence the others; with asynchronous updating, a component binary classifier can be updated independently as soon as its training is finished during the federated learning training process, which improves the communication efficiency between the clients and the server.
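A minimal sketch of this grouping-and-averaging step, assuming each returned update is a (class id, state dict, sample count) triple; the helper name and data layout are illustrative assumptions rather than the patent's implementation.

```python
from collections import defaultdict
import torch

def aggregate_binary_models(client_updates):
    """Sample-weighted FedAvg aggregation per One-vs-All classifier.

    client_updates: list of (class_id, state_dict, num_samples) tuples.
    Returns a dict mapping class_id -> aggregated state_dict, following
    formula (1): each client's parameters are weighted by its share of
    the samples collected in group G_i.
    """
    groups = defaultdict(list)
    for class_id, state, num_samples in client_updates:
        groups[class_id].append((state, num_samples))

    global_states = {}
    for class_id, updates in groups.items():
        total = float(sum(n for _, n in updates))
        agg = {key: torch.zeros_like(val, dtype=torch.float32)
               for key, val in updates[0][0].items()}
        for state, n in updates:
            for key, val in state.items():
                agg[key] += val.float() * (n / total)
        global_states[class_id] = agg
    return global_states
```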
Referring to fig. 3, a collaborative model training method according to an embodiment of a second aspect of the present invention includes:
step S300, obtaining the classification model parameters broadcasted by the server.
Optionally, each client may receive the classification model parameters ω_i^t broadcast by the server; after receiving ω_i^t, each client can perform local training to obtain its updated results.
Step S310, local data labels corresponding to the local binary classification models are obtained.
The local data tag may be the label corresponding to the local data set D_k. Optionally, suppose the local data set of the client is the F-MNIST data set; if the client only has label "1" and label "2", the local binary classification models f_1 and f_2 and their corresponding local data tags "1" and "2" can be obtained.
Step S320, initializing a plurality of local binary classification models according to the local data labels.
Optionally, referring to the FedOVA algorithm shown in fig. 2, after receiving the classification model parameters ω_i^t, the client initializes some OVA component classifier models according to its own local data label distribution, i.e. the corresponding classification model parameters ω_i^t. Taking the example that the client only has label "1" and label "2", the client will at this point initialize the parameters of the local binary classification models f_1 and f_2. Unlike a multi-class classifier ensemble, each classifier of FedOVA is dedicated to distinguishing one particular class. Such a design ensures low error correlation between the different classifiers, thereby enhancing the diversity among the classifiers and improving the overall classification accuracy.
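A sketch of this label-dependent initialization, assuming the server broadcasts one state dict per class and that a simple linear model stands in for the unspecified classifier architecture; all names here are illustrative.

```python
import torch.nn as nn

def init_local_classifiers(global_params, local_labels, model_fn=None):
    """Initialize only the OVA component classifiers whose classes appear locally.

    global_params: dict class_id -> state_dict broadcast by the server.
    local_labels:  set of class ids present in the client's local data.
    model_fn:      factory for a fresh binary classifier (placeholder below).
    """
    if model_fn is None:
        # Placeholder binary classifier; the patent does not fix an architecture.
        model_fn = lambda: nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 1))

    local_models = {}
    for class_id in local_labels:
        model = model_fn()
        model.load_state_dict(global_params[class_id])
        local_models[class_id] = model
    return local_models
```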
Step S330, a plurality of updated model parameters and a plurality of updated binary classification models are obtained by optimizing the initialized local binary classification models according to the local data set.
Optionally, during the federated learning training process, the purpose of FedOVA is to train an expert-level binary classifier for each class, thereby solving the problem of difficult convergence when training a multi-classifier model with non-IID data. For each binary classifier, each client uses its local data set D_k to perform stochastic gradient descent to optimize the classifier, obtaining an updated result that comprises the updated binary classification model and the corresponding updated model parameters. Since each component classifier in FedOVA treats all samples that do not belong to the current class as the negative class, missing negative sample classes does not have a significant impact on the performance of the classifier. Furthermore, FedOVA only needs to create a new classifier for each newly emerging label class, which allows adaptation to new environments during federated learning training without drastic changes.
Step S340, sending the updated model parameters back to the server.
Optionally, the update result may be returned to the server, so that the server may aggregate the local two-class models of each class into a plurality of global two-class models according to the returned update model parameters, and finally integrate the global two-class models into one global multi-class model.
The collaborative model training method comprises the steps of firstly obtaining classification model parameters broadcasted by a server, obtaining local data labels corresponding to a plurality of local binary classification models, initializing the local binary classification models according to the local data labels, optimizing the initialized local binary classification models according to a local data set to obtain a plurality of updated model parameters and a plurality of updated binary classification models, and finally sending the updated model parameters back to the server, so that extra calculation burden can be greatly reduced, the accuracy of data is improved, any data does not need to be shared, complete privacy protection is provided, and the performance and the convergence of a federated learning framework are improved.
In some embodiments of the present invention, optimizing the initialized local binary classification models according to the local data set to obtain a plurality of updated model parameters includes:
and acquiring a target function corresponding to the local binary classification model. Optionally, for each binary classifier, the goal is to minimize the following objective function:
Figure BDA0003136056130000101
formula II, DkThe representation contains training samples (x)i,yi) ω is a parameter of the binary classifier,
Figure BDA0003136056130000102
is a loss function, namely:
Figure BDA0003136056130000103
and executing random gradient descent according to the local data set and the target function, and optimizing the local binary classification model to obtain updated model parameters. Optionally, a plurality of binary classifiers are trained according to the FedOVA algorithm shown in fig. 2, and the output of the classifier with the most grip is selected as the prediction result. Local data set D can be obtainedkAnd substituting the binary classifier parameters omega into the formula II to obtain an updated result, wherein the updated result comprises an updated binary classification model and corresponding updated model parameters, so that the performance of federal learning in a non-IID data scene can be remarkably improved, the accuracy is higher, and the convergence is faster.
The cooperative model training method according to the embodiment of the third aspect of the present invention is applied to a server and a client, wherein the server is in communication connection with the client, and the method includes:
the server executes the collaborative model training method according to the embodiment of the first aspect of the present invention, and the client executes the collaborative model training method according to the embodiment of the second aspect of the present invention.
According to the collaborative model training method, the server executes the collaborative model training method in the embodiment of the first aspect of the invention, and the client executes the collaborative model training method in the embodiment of the second aspect of the invention, so that the performance, accuracy and convergence of a federal learning framework can be improved, the precision fluctuation is smaller, the negative influence of data on model convergence is reduced, and the collaborative model training method is suitable for large-scale user scenes.
The following describes the process of the collaborative model training method according to an embodiment of the present invention in detail. It is to be understood that the following description is only exemplary, and not a specific limitation of the invention.
The collaborative model training method comprises the following steps:
the embodiment of the invention carries out the collaborative model training according to the FedOVA algorithm shown in figure 2, and the FedOVA training process is to repeat the following steps for the communication rounds from 1 to T:
first, the parameters of the two classifier models are initialized.
Suppose that in federated learning there is one server responsible for coordination and a set of clients, where each client k has a local data set D_k, k ∈ {1, 2, ..., K}, and the total number of classes is n. For each round of training, the server randomly selects a subset κ of the client set with the score C to participate in the federated learning training (participating with the score C can accelerate the training speed and the convergence speed of the global model). The server then broadcasts the binary classifier model parameters ω_i^t to the selected clients, where i represents a classifier ID, i ∈ {1, 2, ..., n}.
And secondly, the client performs local training.
After a client receives the binary classifier model parameters, it initializes some of the OVA component classifier models according to its own local data label distribution, i.e. the corresponding parameters ω_i^t. For example, for the F-MNIST data set, if a client has only label "1" and label "2", this client will only initialize the parameters of the classifiers f_1 and f_2. For each binary classifier, whose goal is to minimize the objective function shown in formula (2), each client optimizes the classifier by performing stochastic gradient descent on its local data set, and then sends the update back to the server.
And thirdly, the server aggregates the local two-classification models of each class and integrates to obtain a global multi-classification model.
Because each client only trains part of the classifiers, the parameters of all classifiers are not returned, and since each classifier is independent, asynchronous updating is performed to reduce the computational burden. The server groups the returned parameters according to the corresponding binary classifier models and then aggregates the parameters collected in each group G_i. For model f_i, the aggregation process can be expressed as formula (1), so that a plurality of global binary classification models are obtained by aggregating the local binary classification models of each category; the above steps are repeated until the final integrated classifier converges, and finally the global binary classification models are integrated into a global multi-classification model. Since the purpose of FedOVA is to train an expert-level binary classifier for each class, the problem of difficult convergence when training a multi-classifier model with non-IID data is solved.
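Putting the three steps together, one communication round might look like the following sketch, which simply reuses the hypothetical helpers from the earlier sketches; the client interface (.labels, .dataset, .num_samples) is assumed for illustration and is not defined by the patent.

```python
def fedova_round(global_params, clients, fraction_c):
    """One hypothetical FedOVA communication round, reusing the helpers sketched above.

    clients: dict client_id -> object exposing .labels (set of class ids),
             .dataset (local data) and .num_samples(class_id).
    """
    selected = select_clients(list(clients.keys()), fraction_c)

    updates = []
    for cid in selected:
        client = clients[cid]
        local_models = init_local_classifiers(global_params, client.labels)
        for class_id, model in local_models.items():
            new_state = train_binary_classifier(model, client.dataset, class_id)
            updates.append((class_id, new_state, client.num_samples(class_id)))

    # Asynchronous-style aggregation: only classifiers that received updates change.
    aggregated = aggregate_binary_models(updates)
    global_params.update(aggregated)
    return global_params
```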
In some specific embodiments, a simulation experiment may be performed on the collaborative model training method in the embodiments of the present invention:
for experimental hardware environmentFor explanation: all simulation experiments were developed using Python 3.7 and PyTorch1.7, and using Nvidia GeForce RTX2080 Ti GPU and PyTorch1.7
Figure BDA0003136056130000121
The Silver CPU was run and all experiments were performed in sequence to simulate distributed training.
First, the performance of FedOVA and FedAvg was evaluated using non-IID data. Simulation experiments can be performed on three representative public data sets. For the non-IID configuration, the parameter l may be used to indicate the number of unique labels held by a client. For example, non-IID-2 means that each client has two different labels. This is achieved by grouping the training data by label, dividing each group into (l × K)/n partitions, and finally assigning partitions with different labels to each client. For consistency of the experiments, the same data allocation method can be used for each data set, and the training data is distributed to 100 clients according to the non-IID-2 configuration. Training samples were assigned to 100 clients according to the non-IID configuration, and 10 Convolutional Neural Networks (CNNs) were trained as binary classifiers. By default, 20% of the clients are selected for training in each round, i.e. the score C is 0.2. Each client trains for 5 epochs on its local data set with a batch size of 15 (i.e. E = 5 and B = 15).
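The non-IID-l allocation described above can be sketched as follows, assuming (l × K) is divisible by n so that every label group splits into equally many partitions; the function is a simplified illustration, not the patent's data pipeline.

```python
import numpy as np

def partition_non_iid(labels, num_clients, labels_per_client, seed=0):
    """Partition sample indices into non-IID-l client shards by label.

    Each label group is split into (l * K) / n partitions, and each client
    receives l partitions carrying different labels, as described above.
    Returns a list of index arrays, one per client (a simplified sketch).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    parts_per_class = (labels_per_client * num_clients) // len(classes)

    # Split each class's (shuffled) indices into equally sized shards.
    pools = {c: list(np.array_split(rng.permutation(np.where(labels == c)[0]),
                                    parts_per_class)) for c in classes}

    clients = []
    for k in range(num_clients):
        # Give client k shards from labels_per_client consecutive, distinct classes.
        chosen = [classes[(k + j) % len(classes)] for j in range(labels_per_client)]
        clients.append(np.concatenate([pools[c].pop() for c in chosen]))
    return clients
```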
To better study the performance of FedOVA under non-IID data distributions, a series of simulations can be performed for different non-IID configurations, e.g. testing non-IID-l with l ∈ {2, 3, 5}. Whether the FedOVA or the FedAvg algorithm is used, the accuracy increases as the number of label classes in the clients' local data sets increases. However, two points remain noteworthy. Table 1 compares the accuracy, reported in percent (%), for the different non-IID configurations.
TABLE 1
As shown in Table 1, first, different non-IID configurations have a dramatic effect on the performance of FedAvg but no significant effect on FedOVA; second, FedOVA also achieves superior performance compared with FedAvg in every non-IID scenario. These two observations fully illustrate the robustness of the FedOVA algorithm to the non-IID case.
Second, performance was tested with large numbers of clients and small per-client data. Keeping the total number of training samples unchanged and greatly increasing the number of clients (i.e. K is increased from 100 to 1000 for F-MNIST and CIFAR-10; K is set to 500 for KWS because its data size is too limited), the local data size of each client is correspondingly reduced. For each training round, C is set to 0.2 and K × C clients are selected to participate. The average of the last 20 rounds was taken as the final accuracy. The experimental results are shown in Table 2, which compares the accuracy, reported in percent (%), for different numbers of clients.
TABLE 2
As shown in Table 2, as the number of clients increases, using FedAvg results in a large drop in accuracy; this is not the case with FedOVA. The reason is that the FedOVA algorithm adopts the OVA idea to train a set of binary classifiers, and the increase in the number of clients improves the robustness of the binary classifiers to different environments. In this way, competitive results are still possible with less data.
The above experiment proves that: the OVA-based federated learning training algorithm (FedOVA algorithm) can obviously improve the performance of federated learning in non-IID data scenes. In extensive experiments on a given set of computer vision and speech recognition data, the FedOVA algorithm consistently outperformed FedAvg under various non-IID configurations. Experimental results also show that FedOVA has faster convergence speed and smaller precision fluctuation, so that the negative influence of non-IID data on model convergence is reduced. Finally, experimental results show that FedOVA can be applied to large-scale user scenarios, especially in cases where a large number of customers (up to 1000) each have a small amount of data.
Referring to fig. 4, a collaborative model training apparatus according to a fourth aspect of the present invention includes:
a selecting module 400, configured to select a shared data set from a set of clients corresponding to multiple clients;
the training module 410 is used for participating in federal learning training according to the shared data set and the preset score to obtain classification model parameters;
a broadcasting module 420 for broadcasting the classification model parameters to a plurality of clients;
a first obtaining module 430, configured to obtain updated model parameters and an updated binary classification model returned by multiple clients;
the aggregation module 440 is configured to aggregate the updated model parameters to obtain a plurality of global binary classification models;
an integration module 450 configured to integrate the plurality of global binary classification models into a global multi-classification model.
By implementing the collaborative model training method of the embodiment of the first aspect of the present invention, the collaborative model training apparatus described above can improve the performance, accuracy and convergence speed of the federal learning framework, does not need to share any data, does not need to be based on a specific model, and provides complete privacy protection with less precision fluctuation, thereby reducing the negative impact of data on model convergence, and being suitable for large-scale user scenarios.
Referring to fig. 5, a collaborative model training apparatus according to an embodiment of a fifth aspect of the present invention includes:
a second obtaining module 500, configured to obtain a classification model parameter broadcasted by the server;
a third obtaining module 510, configured to obtain local data tags corresponding to multiple local binary classification models;
an initialization module 520 for initializing a plurality of local binary classification models according to local data tags;
an optimizing module 530, configured to optimize the initialized local binary classification models according to the local data set to obtain a plurality of updated model parameters and a plurality of updated binary classification models;
and a returning module 540 for sending the updated model parameters back to the server.
By implementing the collaborative model training method according to the embodiment of the second aspect of the present invention, the collaborative model training apparatus described above can greatly reduce the additional computational burden, improve the accuracy of data, do not need to share any data, provide complete privacy protection, and improve the performance and convergence of the federal learning framework.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.
The above-described embodiments of the apparatus are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may also be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an illustrative embodiment," "an example," "a specific example," or "some examples" or the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (8)

1. A collaborative model training method, comprising:
selecting a shared data set from a client set corresponding to a plurality of clients;
participating in federal learning training according to the shared data set and preset scores to obtain classification model parameters;
broadcasting the classification model parameters to the plurality of clients;
obtaining updated model parameters returned by the plurality of clients;
aggregating according to the updated model parameters to obtain a plurality of global binary classification models;
integrating the plurality of global binary classification models into a global multi-classification model.
2. The method of claim 1, wherein selecting the shared data set from a set of clients corresponding to a plurality of clients comprises:
obtaining client labels corresponding to the plurality of clients;
and selecting the shared data set from a plurality of client sets according to the client tags.
3. The method of claim 1, wherein the aggregating according to the updated model parameters to obtain a plurality of global binary classification models comprises:
obtaining an updating binary classification model corresponding to the updating model parameter;
grouping the updated model parameters according to the updated binary classification model to obtain a plurality of classification groups to be aggregated;
respectively extracting parameters to be aggregated of each classified group to be aggregated;
and aggregating according to the parameters to be aggregated and the updated binary classification model to obtain the plurality of global binary classification models.
4. A collaborative model training method, comprising:
acquiring classification model parameters broadcasted by a server;
obtaining local data labels corresponding to a plurality of local binary classification models;
initializing a plurality of the local binary classification models according to the local data tags;
optimizing the initialized plurality of local binary classification models according to a local data set to obtain a plurality of updated model parameters and a plurality of updated binary classification models;
sending the updated model parameters back to the server.
5. The method of claim 4, wherein optimizing the initialized plurality of local binary classification models from the local dataset to obtain a plurality of updated model parameters comprises:
acquiring a target function corresponding to the local binary classification model;
and executing random gradient descent according to the local data set and the objective function, and optimizing the local binary classification model to obtain the updated model parameters.
6. A collaborative model training method is used for a server and a client, wherein the server is in communication connection with the client, and the method comprises the following steps:
the server executes the collaborative model training method according to any one of claims 1 to 3, and the client executes the collaborative model training method according to any one of claims 4 to 5.
7. A collaborative model training apparatus, comprising:
the selection module is used for selecting a shared data set from client sets corresponding to a plurality of clients;
the training module is used for participating in federal learning training according to the shared data set and a preset score to obtain a classification model parameter;
a broadcasting module for broadcasting the classification model parameters to the plurality of clients;
the first acquisition module is used for acquiring the updated model parameters returned by the plurality of clients;
the aggregation module is used for aggregating according to the updated model parameters to obtain a plurality of global binary classification models;
and the integration module is used for integrating the global binary classification models into a global multi-classification model.
8. A collaborative model training apparatus, comprising:
the second acquisition module is used for acquiring the classification model parameters broadcasted by the server;
the third acquisition module is used for acquiring local data labels corresponding to the local binary classification models;
an initialization module for initializing a plurality of the local binary classification models according to the local data tags;
the optimization module is used for optimizing the initialized local binary classification models according to a local data set to obtain a plurality of updated model parameters and a plurality of updated binary classification models;
and the return module is used for sending the updated model parameters back to the server.
CN202110719669.6A 2021-06-28 2021-06-28 Collaborative model training method and device Pending CN113537509A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110719669.6A CN113537509A (en) 2021-06-28 2021-06-28 Collaborative model training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110719669.6A CN113537509A (en) 2021-06-28 2021-06-28 Collaborative model training method and device

Publications (1)

Publication Number Publication Date
CN113537509A true CN113537509A (en) 2021-10-22

Family

ID=78096984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110719669.6A Pending CN113537509A (en) 2021-06-28 2021-06-28 Collaborative model training method and device

Country Status (1)

Country Link
CN (1) CN113537509A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114492849A (en) * 2022-01-24 2022-05-13 光大科技有限公司 Model updating method and device based on federal learning
CN114548426A (en) * 2022-02-17 2022-05-27 北京百度网讯科技有限公司 Asynchronous federal learning method, business service prediction method, device and system
CN115225575A (en) * 2022-06-08 2022-10-21 香港理工大学深圳研究院 Unknown network flow classification method based on metadata assistance and federal learning
CN116822647A (en) * 2023-05-25 2023-09-29 大连海事大学 Model interpretation method based on federal learning

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458245A (en) * 2019-08-20 2019-11-15 图谱未来(南京)人工智能研究院有限公司 A kind of multi-tag disaggregated model training method, data processing method and device
CN111091200A (en) * 2019-12-20 2020-05-01 深圳前海微众银行股份有限公司 Updating method, system, agent, server and storage medium of training model
CN111368776A (en) * 2020-03-13 2020-07-03 长安大学 High-resolution remote sensing image classification method based on deep ensemble learning
CN111552986A (en) * 2020-07-10 2020-08-18 鹏城实验室 Block chain-based federal modeling method, device, equipment and storage medium
CN111865815A (en) * 2020-09-24 2020-10-30 中国人民解放军国防科技大学 Flow classification method and system based on federal learning
CN111967910A (en) * 2020-08-18 2020-11-20 中国银行股份有限公司 User passenger group classification method and device
CN112101578A (en) * 2020-11-17 2020-12-18 中国科学院自动化研究所 Distributed language relationship recognition method, system and device based on federal learning
CN112348200A (en) * 2020-11-02 2021-02-09 中国科学院信息工程研究所 Controlled shared learning method and system based on federal learning
CN112364913A (en) * 2020-11-09 2021-02-12 重庆大学 Federal learning communication traffic optimization method and system based on core data set
CN112560991A (en) * 2020-12-25 2021-03-26 中山大学 Personalized federal learning method based on hybrid expert model
CN112949837A (en) * 2021-04-13 2021-06-11 中国人民武装警察部队警官学院 Target recognition federal deep learning method based on trusted network
CN113033712A (en) * 2021-05-21 2021-06-25 华中科技大学 Multi-user cooperative training people flow statistical method and system based on federal learning

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458245A (en) * 2019-08-20 2019-11-15 图谱未来(南京)人工智能研究院有限公司 A kind of multi-tag disaggregated model training method, data processing method and device
CN111091200A (en) * 2019-12-20 2020-05-01 深圳前海微众银行股份有限公司 Updating method, system, agent, server and storage medium of training model
CN111368776A (en) * 2020-03-13 2020-07-03 长安大学 High-resolution remote sensing image classification method based on deep ensemble learning
CN111552986A (en) * 2020-07-10 2020-08-18 鹏城实验室 Block chain-based federal modeling method, device, equipment and storage medium
CN111967910A (en) * 2020-08-18 2020-11-20 中国银行股份有限公司 User passenger group classification method and device
CN111865815A (en) * 2020-09-24 2020-10-30 中国人民解放军国防科技大学 Flow classification method and system based on federal learning
CN112348200A (en) * 2020-11-02 2021-02-09 中国科学院信息工程研究所 Controlled shared learning method and system based on federal learning
CN112364913A (en) * 2020-11-09 2021-02-12 重庆大学 Federal learning communication traffic optimization method and system based on core data set
CN112101578A (en) * 2020-11-17 2020-12-18 中国科学院自动化研究所 Distributed language relationship recognition method, system and device based on federal learning
CN112560991A (en) * 2020-12-25 2021-03-26 中山大学 Personalized federal learning method based on hybrid expert model
CN112949837A (en) * 2021-04-13 2021-06-11 中国人民武装警察部队警官学院 Target recognition federal deep learning method based on trusted network
CN113033712A (en) * 2021-05-21 2021-06-25 华中科技大学 Multi-user cooperative training people flow statistical method and system based on federal learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUANSHAO ZHU et al.: "FedOVA: One-vs-All Training Method for Federated Learning with Non-IID Data", 2021 International Joint Conference on Neural Networks (IJCNN), pages 1-7 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114492849A (en) * 2022-01-24 2022-05-13 光大科技有限公司 Model updating method and device based on federal learning
CN114492849B (en) * 2022-01-24 2023-09-08 光大科技有限公司 Model updating method and device based on federal learning
CN114548426A (en) * 2022-02-17 2022-05-27 北京百度网讯科技有限公司 Asynchronous federal learning method, business service prediction method, device and system
CN114548426B (en) * 2022-02-17 2023-11-24 北京百度网讯科技有限公司 Asynchronous federal learning method, business service prediction method, device and system
CN115225575A (en) * 2022-06-08 2022-10-21 香港理工大学深圳研究院 Unknown network flow classification method based on metadata assistance and federal learning
CN115225575B (en) * 2022-06-08 2023-11-24 香港理工大学深圳研究院 Unknown network flow classification method based on metadata assistance and federal learning
CN116822647A (en) * 2023-05-25 2023-09-29 大连海事大学 Model interpretation method based on federal learning
CN116822647B (en) * 2023-05-25 2024-01-16 大连海事大学 Model interpretation method based on federal learning

Similar Documents

Publication Publication Date Title
Yurochkin et al. Bayesian nonparametric federated learning of neural networks
CN113537509A (en) Collaborative model training method and device
US10719780B2 (en) Efficient machine learning method
Das et al. A bacterial evolutionary algorithm for automatic data clustering
CN107563410A (en) The sorting technique and equipment with multi-task learning are unanimously clustered based on topic categories
CN111125469B (en) User clustering method and device of social network and computer equipment
CN108205570A (en) A kind of data detection method and device
Lu et al. Toward direct edge-to-edge transfer learning for IoT-enabled edge cameras
Sublime et al. From horizontal to vertical collaborative clustering using generative topographic maps
CN116701979A (en) Social network data analysis method and system based on limited k-means
CN115759289A (en) Federal learning method, system and device based on user grouping cooperation
Mozaffari et al. E2FL: Equal and equitable federated learning
Nikoloutsopoulos et al. Personalized federated learning with exact stochastic gradient descent
Feldkamp Data farming output analysis using explainable AI
CN113590898A (en) Data retrieval method and device, electronic equipment, storage medium and computer product
CN117829307A (en) Federal learning method and system for data heterogeneity
CN108491968A (en) Based on agricultural product quality and safety emergency resources scheduling model computational methods
Zhang et al. Federated multi-task learning with non-stationary heterogeneous data
Yi et al. pFedES: Model Heterogeneous Personalized Federated Learning with Feature Extractor Sharing
CN115952860A (en) Heterogeneous statistics-oriented clustering federal learning method
CN113256507B (en) Attention enhancement method for generating image aiming at binary flow data
CN114723071A (en) Federal learning method and device based on client classification and information entropy
WO2016122636A1 (en) Updates to a prediction model using statistical analysis groups
O'Connor et al. Biclustering using message passing
CN113807370A (en) Data processing method, device, equipment, storage medium and computer program product

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination