CN115099334A - Multi-party cooperative data learning system and learning model training method - Google Patents


Info

Publication number
CN115099334A
Authority
CN
China
Prior art keywords
model
local
training
learning
client
Prior art date
Legal status
Pending
Application number
CN202210715202.9A
Other languages
Chinese (zh)
Inventor
武星
裴洁
钱权
Current Assignee
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology
Priority to CN202210715202.9A
Publication of CN115099334A


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/70 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer
    • G06F21/71 - Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer, to assure secure computing or processing of information
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a multi-party collaborative data learning system and a learning model training method that combine the advantages of active learning and federated learning. On the premise of protecting data privacy, the small amount of labeled data held by each client is used for collaborative model training. In each round of federated learning, the predicted loss values are mapped to a probability distribution from which the clients that participate in training are selected, which accelerates the convergence of the global model and reduces communication traffic. Each client loads the current global model parameters into its local model and then performs active learning: the local model guides sample querying, and high-information samples are selected and labeled. The labeled data set of each client is thereby expanded, and federated learning is carried out again to obtain a model with better performance. The accuracy and generalization performance of the model are improved while keeping the sample-labeling cost as low as possible.

Description

Multi-party cooperative data learning system and learning model training method
Technical Field
The invention relates to data information technology, and in particular to a multi-party cooperative data learning system and a learning model training method.
Background
The development of artificial intelligence depends on large amounts of data. Supported by large labeled data sets, deep learning has made great progress, and the resulting deep learning models are widely applied in production practice, for example in face recognition systems, smoke alarm systems, intelligent workshops, and automatic driving. Deep learning also has great prospects in fields such as medical treatment, finance, industry, and education. In recent years, the internet has matured and the internet of things has developed rapidly; all kinds of industries generate large amounts of data every day, and how to use these data to train deep learning models has become a practical challenge.
Using these data faces the following challenges. First, the data are characterized by large volume and low value density, and most of them need to be labeled manually or by other costly methods. The high cost of building labeled data sets has driven many researchers to look for ways to reduce labeling cost, giving rise to research directions such as active learning. Second, an excellent deep learning model cannot be obtained without being driven by a large amount of data: the amount of data collected by a single enterprise in a given field may not meet the requirements for training a deep learning model, and some of the data collected by an enterprise may not belong to the enterprise at all but to individuals or other enterprises. In the traditional centralized approach, multiple enterprises or individuals cooperate by uploading their data to the same high-performance server center for deep learning model training, which raises problems such as privacy protection, industry competition, legal restrictions, intellectual property protection, data storage, and communication. To address these issues, researchers have proposed the federated learning framework.
Disclosure of Invention
Aiming at the problem that improving the accuracy of a learning model depends on a large amount of high-quality data, a multi-party collaborative data learning system and a learning model training method are provided. The advantages of active learning and federated learning are fully combined: on the premise of protecting data privacy, the labeled data of each client are used for federated collaborative training on a classification task; the model obtained from federated training guides each participant in active learning to sample informative data; federated training is then carried out again. The sample-labeling cost is reduced as far as possible while the accuracy and generalization performance of the model are improved.
The technical scheme of the invention is as follows. A multi-party cooperative data learning system comprises a central server that hosts a global model and multiple clients, each with a local classification model. The central server issues model parameters; each client performs inference and training with its local labeled data and returns the inference and training results to the central server; the central server receives the results from the multiple clients and carries out federated training of the global model. Each client then performs active learning on its local unlabeled data according to the trained global model parameters it receives, expanding its local labeled data set, and the global model is trained again by federated learning on the labeled data sets expanded by the multiple clients.
A multi-party cooperative data learning model training method specifically comprises the following steps:
1) the central server initially generates a random global model parameter W and sends it to all clients;
2) each client receives the global model parameter W issued by the central server, loads it into the local model, performs one round of model inference on all samples of its local labeled data set D_L with the local model, records the predicted loss value of the current round, and uploads it to the central server;
3) the central server receives the set V of predicted loss values uploaded by all clients in the round, applies a linear numerical mapping to all elements of V so that they sum to 1, and uses the mapped values as a probability distribution to select the clients that participate in the next round of federated learning; the number of selected clients is half the total number of clients;
4) the selected clients train their local models with their local labeled data sets D_L and upload the updated local model parameters to the central server;
5) the central server receives the local model parameters uploaded by the selected clients, updates the global model parameter W with a model aggregation algorithm, and sends it to all clients;
6) steps 2)-5) are repeated iteratively until a given number of training rounds is reached, completing the training of the federated global model;
7) each client receives the trained global model parameters W* sent by the central server, loads them into the local model for active learning, selects the unlabeled samples with the largest information gain using the local model, and asks an expert to label them;
8) with the supplemented labeled data sets, step 6) is executed again to retrain the global model, until the labeled data sets can no longer be expanded.
Further, the predicted loss value in step 2) is calculated as:

$V_i^t = \frac{1}{n_i^2} \sum_{k=1}^{n_i} \ell(x_k, y_k; W_{local})$

where V_i^t denotes the predicted loss value of the i-th client in the t-th round of federated learning, t denotes the t-th round of federated training, i denotes the i-th client, n_i denotes the number of samples currently in the labeled data set D_L of the i-th client, l(·) denotes the loss function, x_k and y_k denote the k-th sample and its label respectively, and W_local denotes the local model parameters.
Further, the linear numerical mapping in step 3) is calculated as:

$newV_i^t = V_i^t / \sum_{j=1}^{m} V_j^t$

where newV_i^t denotes the value of V_i^t after the linear numerical mapping and $\sum_{j=1}^{m} V_j^t$ denotes the sum of the predicted loss values of all clients. The linearly mapped predicted loss values are used as a discrete probability distribution to sample clients: the higher a client's predicted loss value, the higher its probability of being sampled.
Further, the model aggregation algorithm in step 5) is formulated as:

$W = \frac{1}{h} \sum_{i=1}^{h} W_{local}^{i}$

where h is the number of currently selected clients and $W_{local}^{i}$ denotes the local model parameters uploaded by the i-th of the selected clients.
Further, the active learning method of the local model in step 7) is as follows:
7.1) two auxiliary classifiers are added to the local model architecture used in global model training; the auxiliary classifiers are connected to the backbone network of the local model, in parallel with the main classifier of the local model, forming the local active learning model;
7.2) the local active learning model is trained using the labeled data set D_L and the unlabeled data set D_U;
7.3) with the difference loss function as the objective function, training maximizes the differences among the auxiliary classifiers to obtain a tighter decision boundary, so that high-information samples are selected from the unlabeled samples and added to the labeled data set.
Furthermore, the two additional auxiliary classifiers have the same network architecture as the main classifier. Their network parameters are generated by adding random Gaussian noise to the network parameters of the main classifier, with the added noise p ~ N(0, 0.1). The feature map obtained after a data sample passes through the backbone network enters the main classifier and the auxiliary classifiers respectively, and the classifiers do not affect one another.
Further, the local active learning model in step 7.2) is trained as follows:
the backbone network together with the main classifier is denoted by θ, the backbone network alone by b, and the two auxiliary classifiers by θ_1 and θ_2; p denotes the probability distribution output by a sample passing through θ, p_1 the probability distribution output through (b, θ_1), and p_2 the probability distribution output through (b, θ_2);
A: the local active learning model is trained with the labeled data set;
A-1: the cross-entropy loss L_CE produced by inference of a sample through θ, (b, θ_1) and (b, θ_2) is calculated;
the cross-entropy loss function is:

$L_{CE} = -\sum_{c=1}^{C} \mathbb{1}[y = c] \log p_c(y \mid x)$

where C is the total number of sample classes, c denotes a sample class, 1[·] denotes the indicator function, and p_c(y|x) denotes the probability that sample x belongs to class c;
A-2: the local active learning model parameters are updated, where η is the learning rate and ∇ denotes the gradient:

$\theta \leftarrow \theta - \eta \nabla_{\theta} L_{CE}$
$\theta_1 \leftarrow \theta_1 - \eta \nabla_{\theta_1} L_{CE}$
$\theta_2 \leftarrow \theta_2 - \eta \nabla_{\theta_2} L_{CE}$

B: the auxiliary classifiers are trained with the unlabeled data set;
B-1: the difference loss L_dist produced by inference of a sample through (b, θ_1) and (b, θ_2) is calculated:

$L_{dist} = d(p_1, p_2) + d(p_1, p) + d(p_2, p)$

where d(·, ·) denotes the discrepancy between two output probability distributions;
B-2: the auxiliary classifier parameters are updated by a gradient step on L_dist, maximizing the differences among the auxiliary classifiers:

$\theta_1 \leftarrow \theta_1 + \eta \nabla_{\theta_1} L_{dist}$
$\theta_2 \leftarrow \theta_2 + \eta \nabla_{\theta_2} L_{dist}$
the invention has the beneficial effects that: the invention relates to a multi-party collaborative data learning system and a learning model training method, which fully utilize a small amount of labeled data sets of all clients through federal learning and are used for guiding the sampling of samples actively learned by a single client; in the federal learning training, the client is subjected to sampling training by using the prediction loss value as probability distribution, so that the model convergence can be accelerated and the communication traffic can be reduced; in active learning, a tighter decision boundary can be obtained by maximizing the difference between the auxiliary classifiers, so that high-information samples are effectively selected; by utilizing the advantages of active learning and federal learning, a large amount of unmarked data of each participant can be fully utilized to carry out the collaborative training of the model on the premise of protecting the data privacy.
Drawings
FIG. 1 is a schematic diagram of a multi-party collaborative data learning system according to the present invention;
FIG. 2 is a block flow diagram of a multi-party collaborative data learning model training method of the present invention;
FIG. 3 is a schematic diagram of a sample sampling strategy in the multi-party collaborative data learning system according to the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
As shown in the structural schematic of the multi-party collaborative data learning system in fig. 1, a local client loads the global model parameters into its local model, performs local model inference with its local labeled data, obtains a predicted loss value, and sends it back to the central server, so that the central server can select the clients that participate in the next round of federated learning. After a selected client trains its local model with its locally labeled data, it uploads the local model parameters to the central server; the central server aggregates the selected clients' model parameters to obtain an updated global model and issues the updated global model parameters to all clients for the next round of federated learning; federated training ends once the global model has converged. Each client then loads the trained global model parameters into its local model, performs active learning on its local unlabeled data with the local model, expands its local labeled data set, and federated learning is carried out again.
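For illustration only, the following is a minimal Python/PyTorch-style sketch of this overall flow. The function and parameter names (run_collaborative_learning, report_loss, select_clients, train_local, aggregate, active_learning) are hypothetical stand-ins for the steps S200-S700 described below and are not part of the patent; the per-step callables are assumed to be supplied by the caller.

```python
# Illustrative orchestration skeleton only; the per-step callables correspond to
# S200-S700 described below and are supplied by the caller.
import copy

def run_collaborative_learning(server_model, clients, fl_rounds, al_cycles,
                               report_loss, select_clients, train_local,
                               aggregate, active_learning):
    """Alternate federated-learning rounds with active-learning cycles."""
    for _ in range(al_cycles):                         # one cycle per labeled-set expansion
        for _ in range(fl_rounds):                     # federated-learning rounds (S200-S600)
            global_w = copy.deepcopy(server_model.state_dict())
            losses = [report_loss(c, global_w) for c in clients]      # S200
            chosen = select_clients(clients, losses)                   # S300
            local_ws = [train_local(c, global_w) for c in chosen]      # S400
            server_model.load_state_dict(aggregate(local_ws))          # S500
        for c in clients:                              # S700: expand each labeled set
            active_learning(c, copy.deepcopy(server_model.state_dict()))
    return server_model
```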
Fig. 2 is a flow diagram of the multi-party collaborative data learning model training method; with reference to fig. 2, the method specifically includes the following steps:
S100: the central server initially generates a random global model parameter W and sends it to all clients;
S200: each client receives the global model parameter W issued by the central server, loads it into the local model, performs one round of model inference on all samples of its local labeled data set D_L with the local model, records the predicted loss value V_i^t of the current round of federated learning, and uploads it to the central server;
specifically, the predicted loss value V_i^t is calculated as:

$V_i^t = \frac{1}{n_i^2} \sum_{k=1}^{n_i} \ell(x_k, y_k; W_{local})$

where t denotes the t-th round of federated training, i denotes the i-th client, n_i denotes the number of samples currently in the labeled data set D_L of the i-th client, l(·) denotes the loss function, x_k and y_k denote the k-th sample and its label respectively, and W_local denotes the local model parameters.
Each client locally stores a labeled data set D_L with a small number of samples and an unlabeled data set D_U with a large number of samples. The denominator of the predicted loss value is the square of the number of labeled samples for two reasons: on the one hand, it accounts for clients with more labeled data samples; on the other hand, it reduces the influence of a client whose labeled data set contains few samples but a large amount of noisy data.
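As a concrete illustration of this computation, a minimal PyTorch-style sketch is given below, assuming the loss function l is cross-entropy and that the labeled set D_L is exposed as a standard data loader of (sample, label) batches; the function name is hypothetical.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def predicted_loss_value(local_model, labeled_loader, device="cpu"):
    """V_i^t = (1 / n_i^2) * sum_k l(x_k, y_k; W_local), with l = cross-entropy here."""
    local_model.eval()
    total_loss, n_labeled = 0.0, 0
    for x, y in labeled_loader:                  # iterate the labeled set D_L once
        x, y = x.to(device), y.to(device)
        logits = local_model(x)
        total_loss += F.cross_entropy(logits, y, reduction="sum").item()
        n_labeled += y.size(0)
    return total_loss / (n_labeled ** 2)
```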
S300: the central server receives the set V of predicted loss values uploaded by all clients in the round, V = {V_1^t, V_2^t, ..., V_m^t}, where m denotes the number of clients. A linear numerical mapping is applied to all elements of V so that they sum to 1, and the mapped values are used as a probability distribution to select the clients that participate in the next round of federated learning; the number of selected clients is half the total number of clients.
Specifically, the linear numerical mapping is calculated as:

$newV_i^t = V_i^t / \sum_{j=1}^{m} V_j^t$

where newV_i^t denotes the value of V_i^t after the linear numerical mapping and $\sum_{j=1}^{m} V_j^t$ denotes the sum of the predicted loss values of all clients.
The benefit of this step is that the central server can sample clients by treating the linearly mapped predicted loss values newV_i^t as a discrete probability distribution: the higher a client's predicted loss value, the higher its probability of being sampled. First, training on the local labeled data set of a client with a higher predicted loss value is more helpful to the current global model and accelerates the convergence of the global model; second, because not all clients participate in each round of federated training, the communication volume is reduced.
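The selection step could, for example, be sketched as follows, assuming NumPy for the sampling; the function and argument names are illustrative, and sampling without replacement is one possible reading of selecting half of the clients.

```python
import numpy as np

def select_clients(client_ids, pred_losses, fraction=0.5, rng=None):
    """Map predicted losses to a probability distribution and sample a fraction of clients."""
    rng = rng or np.random.default_rng()
    v = np.asarray(pred_losses, dtype=float)
    probs = v / v.sum()                          # linear mapping: newV_i = V_i / sum_j V_j
    k = max(1, int(len(client_ids) * fraction))  # half of the clients by default
    picked = rng.choice(len(client_ids), size=k, replace=False, p=probs)
    return [client_ids[j] for j in picked]
```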
S400: the selected clients train their local models with their local labeled data sets D_L and upload the updated local model parameters W_local to the central server;
S500: the central server receives the local model parameters uploaded by the selected clients, updates the global model parameter W with a model aggregation algorithm, and sends it to all clients;
specifically, the algorithm formula of the aggregate global model is as follows:
Figure BDA0003709199840000071
wherein h is the number of currently selected clients,
Figure BDA0003709199840000072
and representing the local model parameters uploaded by the ith client in the selected clients.
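A minimal sketch of this aggregation, assuming the local model parameters are exchanged as PyTorch state dictionaries, is shown below; an unweighted average is used, matching the 1/h formula above, and the function name is illustrative.

```python
import torch

def aggregate(local_state_dicts):
    """W = (1/h) * sum_i W_local^i over the h selected clients."""
    h = len(local_state_dicts)
    global_state = {}
    for name in local_state_dicts[0]:
        stacked = torch.stack([sd[name].float() for sd in local_state_dicts])
        global_state[name] = stacked.sum(dim=0) / h   # plain average of the h parameter sets
    return global_state
```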
S600: steps S200-S500 are repeated iteratively until a given number of training rounds is reached, completing the training of the federated global model.
The benefit of this step is that repeating S200-S500 makes full use of the information in the labeled data samples that each client already has. The global model obtained by federated training performs better than a local model trained by a single client on its small number of labeled samples, and using it to guide each client's active learning allows samples that benefit model performance to be queried more effectively.
S700: each client receives the updated parameters W* of the trained global model sent by the central server and loads them into its local model for active learning; using the local model, it selects the part of the unlabeled samples with the largest information gain and asks an expert to label them. As shown in fig. 3, the active learning performed by each client in S700 includes the following steps:
S700-1: the local model architecture is modified; two auxiliary classifier modules, Classifier1 and Classifier2, identical to the main Classifier module, are added after the backbone network module to build the local active learning model;
S700-2: the local active learning model is trained using the labeled data set D_L and the unlabeled data set D_U;
S700-3: using the local active learning model, the part of the unlabeled data set D_U with the largest information gain is selected and moved to the labeled data set D_L.
Specifically, in sub-step S700-1 of step S700, the two added modules Classifier1 and Classifier2 have the same network architecture as the main Classifier module, and their network parameters are generated by adding random Gaussian noise to the network parameters of the main classifier, with noise p ~ N(0, 0.1). The feature map obtained after a data sample passes through the backbone module enters the main classifier and the auxiliary classifiers respectively, and the classifiers do not affect one another.
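A minimal sketch of constructing the two auxiliary classifiers, assuming PyTorch modules and reading N(0, 0.1) as Gaussian noise with standard deviation 0.1 (an assumption, since the notation could also denote the variance), could look as follows; the function name is illustrative.

```python
import copy
import torch

def make_auxiliary_classifiers(main_classifier, noise_std=0.1):
    """Copy the main classifier twice and perturb each copy's parameters with Gaussian noise."""
    aux1, aux2 = copy.deepcopy(main_classifier), copy.deepcopy(main_classifier)
    for aux in (aux1, aux2):
        with torch.no_grad():
            for p in aux.parameters():
                p.add_(torch.randn_like(p) * noise_std)   # assumed reading: noise ~ N(0, 0.1)
    return aux1, aux2
```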
Specifically, in sub-step S700-2 of step S700, the backbone network module together with the main classifier module is denoted by θ, the backbone network alone by b, and the two auxiliary classifier modules by θ_1 and θ_2; p denotes the probability distribution output by a sample passing through θ, p_1 the probability distribution output through (b, θ_1), and p_2 the probability distribution output through (b, θ_2). The local active learning model is trained as follows:
1. The local active learning model is trained with the labeled data set.
(1) The cross-entropy loss L_CE produced by inference of a sample through θ, (b, θ_1) and (b, θ_2) is calculated; the cross-entropy loss function is:

$L_{CE} = -\sum_{c=1}^{C} \mathbb{1}[y = c] \log p_c(y \mid x)$

where C is the total number of sample classes, c denotes a sample class, 1[·] denotes the indicator function, and p_c(y|x) denotes the probability that sample x belongs to class c.
(2) The local active learning model parameters are updated, where η is the learning rate and ∇ denotes the gradient:

$\theta \leftarrow \theta - \eta \nabla_{\theta} L_{CE}$
$\theta_1 \leftarrow \theta_1 - \eta \nabla_{\theta_1} L_{CE}$
$\theta_2 \leftarrow \theta_2 - \eta \nabla_{\theta_2} L_{CE}$
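One labeled-data training step could be sketched as follows, assuming a single optimizer covering the backbone and all three classifiers; whether the backbone also receives gradients from the auxiliary branches is an implementation choice made here for compactness, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def labeled_step(backbone, main_head, aux1, aux2, optimizer, x, y):
    """One step of part A: cross-entropy through theta, (b, theta_1) and (b, theta_2)."""
    feat = backbone(x)                                # shared feature b(x)
    loss = (F.cross_entropy(main_head(feat), y)       # L_CE through theta
            + F.cross_entropy(aux1(feat), y)          # L_CE through (b, theta_1)
            + F.cross_entropy(aux2(feat), y))         # L_CE through (b, theta_2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                  # updates theta, theta_1, theta_2 with lr eta
    return loss.item()
```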
2. The auxiliary classifiers are trained with the unlabeled data set.
(1) The difference loss L_dist produced by inference of a sample through (b, θ_1) and (b, θ_2) is calculated; the difference loss function is:

$L_{dist} = d(p_1, p_2) + d(p_1, p) + d(p_2, p)$

where d(·, ·) denotes the discrepancy between two output probability distributions.
(2) The auxiliary classifier parameters are updated by a gradient step on L_dist, maximizing the differences among the auxiliary classifiers:

$\theta_1 \leftarrow \theta_1 + \eta \nabla_{\theta_1} L_{dist}$
$\theta_2 \leftarrow \theta_2 + \eta \nabla_{\theta_2} L_{dist}$
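One unlabeled-data training step could be sketched as follows. The discrepancy d(·, ·) is taken here to be the mean absolute difference between class-probability vectors, which is an assumption (the formula above only fixes the form of L_dist); the auxiliary classifiers are updated by gradient ascent on L_dist so that their differences are maximized, and aux_optimizer is assumed to cover only θ_1 and θ_2.

```python
import torch
import torch.nn.functional as F

def d(p, q):
    """Assumed discrepancy: mean absolute difference between class probabilities."""
    return (p - q).abs().mean()

def unlabeled_step(backbone, main_head, aux1, aux2, aux_optimizer, x_unlabeled):
    """One step of part B: maximize L_dist with respect to theta_1 and theta_2 only."""
    with torch.no_grad():                             # backbone and main head are not updated here
        feat = backbone(x_unlabeled)
        p = F.softmax(main_head(feat), dim=1)
    p1 = F.softmax(aux1(feat), dim=1)
    p2 = F.softmax(aux2(feat), dim=1)
    l_dist = d(p1, p2) + d(p1, p) + d(p2, p)          # L_dist as defined above
    aux_optimizer.zero_grad()
    (-l_dist).backward()                              # gradient ascent on L_dist
    aux_optimizer.step()
    return l_dist.item()
```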
the auxiliary classifier is the same as the main classifier in the model architecture, and the difference is that:
(1) the parameters of the auxiliary classifier are different, and the parameters of the auxiliary classifier are obtained by adding random Gaussian noise to the parameters of the main classifier, so the parameters of the three classifiers are different;
(2) the invention can be briefly described as three stages, namely an initial federal learning stage, a local active learning stage and a federal learning stage after local labeled data set expansion, wherein the parameters of a main classifier are obtained by initial federal learning training, and in the local active learning stage, after two auxiliary classifiers are added to a local model, the local model also needs to be trained, wherein the parameters of the main classifier are updated only when the labeled data set is used for training,
Figure BDA0003709199840000093
however, the parameters of the two auxiliary classifiers are not only updated when trained with labeled data sets:
Figure BDA0003709199840000094
Figure BDA0003709199840000095
it is also updated when training is performed with unlabeled datasets:
Figure BDA0003709199840000096
Figure BDA0003709199840000097
the auxiliary classifier updates parameters by using an unmarked data set, and aims to train the differences among the maximized auxiliary classifiers by using a difference loss function as a target function to obtain a tighter decision boundary, thereby selecting high-information samples and adding the samples into a label data set.
Specifically, in sub-step S700-3 of step S700, the unlabeled samples are sorted in descending order of F(x), and the part of the unlabeled data set D_U with the largest information gain is selected in order and moved to the labeled data set D_L, where F(x) is:

$F(x) = d(p_1(x), p_2(x))$
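The query step could be sketched as follows, again taking d(·, ·) as the mean absolute difference between the two predicted class distributions and assuming a map-style dataset of (sample, placeholder-label) pairs; budget and the function name are illustrative.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def query_samples(backbone, aux1, aux2, unlabeled_dataset, budget):
    """Rank unlabeled samples by F(x) = d(p1(x), p2(x)) and return the top-budget indices."""
    scores = []
    for i in range(len(unlabeled_dataset)):
        x = unlabeled_dataset[i][0].unsqueeze(0)          # one unlabeled sample, batch of 1
        feat = backbone(x)
        p1 = F.softmax(aux1(feat), dim=1)
        p2 = F.softmax(aux2(feat), dim=1)
        scores.append((p1 - p2).abs().mean().item())      # assumed d: mean absolute difference
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:budget]                                  # indices to send to the expert for labeling
```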
the method has the advantages that two auxiliary classifiers are added to the local model, the structure of the local model is the same as that of the main classifier, so that the method is simple to implement, the difference between the maximized auxiliary classifiers is trained by taking the difference loss function as the objective function, a tighter decision boundary can be obtained, and the high-information samples can be effectively selected.
S800: with the supplemented labeled data sets, step S600 is executed again to retrain the global model, until the labeled data sets can no longer be expanded.
The benefit of this step is that after S700 the labeled data set of each client has been further expanded, and federated training is then performed again to obtain a model with better performance.
In this embodiment, as shown in fig. 2, an active federated learning model training system implemented on the basis of the method includes a central server and a plurality of participant devices; the participant devices store large unlabeled data sets, and each participant has experts in the relevant domain to perform high-quality labeling of data samples.
The above-described embodiments express only several embodiments of the present invention, and their description is specific and detailed, but they are not to be understood as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the inventive concept, and these fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (8)

1. A multi-party cooperative data learning system, characterized by comprising a central server that hosts a global model and multiple clients each having a local classification model, wherein the central server issues model parameters; each client performs inference and training with its local labeled data and returns the inference and training results to the central server; the central server receives the results from the multiple clients and carries out federated training of the global model; each client performs active learning on its local unlabeled data according to the trained global model parameters it receives, expanding its local labeled data set; and the global model is trained again by federated learning on the labeled data sets expanded by the multiple clients.
2. A multi-party cooperative data learning model training method, characterized by comprising the following steps:
1) the central server initially generates a random global model parameter W and sends it to all clients;
2) each client receives the global model parameter W issued by the central server, loads it into the local model, performs model inference on all samples of the local labeled data set D_L with the local model, records the predicted loss value of the current round, and uploads it to the central server;
3) the central server receives the set V of predicted loss values uploaded by all clients in the round, applies a linear numerical mapping to all elements of V so that they sum to 1, and uses the mapped values as a probability distribution to select the clients that participate in the next round of federated learning, the number of selected clients being half the total number of clients;
4) the selected clients train their local models with their local labeled data sets D_L and upload the updated local model parameters to the central server;
5) the central server receives the local model parameters uploaded by the selected clients, updates the global model parameter W with a model aggregation algorithm, and sends it to all clients;
6) steps 2)-5) are repeated iteratively until a given number of training rounds is reached, completing the training of the federated global model;
7) each client receives the trained global model parameters W* sent by the central server, loads them into the local model for active learning, selects the unlabeled samples with the largest information gain using the local model, and asks an expert to label them;
8) with the supplemented labeled data sets, step 6) is executed again to retrain the global model, until the labeled data sets can no longer be expanded.
3. The multi-party collaborative data learning model training method according to claim 2, characterized in that the predicted loss value in step 2) is calculated as:

$V_i^t = \frac{1}{n_i^2} \sum_{k=1}^{n_i} \ell(x_k, y_k; W_{local})$

where V_i^t denotes the predicted loss value of the i-th client in the t-th round of federated learning, t denotes the t-th round of federated training, i denotes the i-th client, n_i denotes the number of samples currently in the labeled data set D_L of the i-th client, l(·) denotes the loss function, x_k and y_k denote the k-th sample and its label respectively, and W_local denotes the local model parameters.
4. The multi-party collaborative data learning model training method according to claim 3, characterized in that the linear numerical mapping in step 3) is calculated as:

$newV_i^t = V_i^t / \sum_{j=1}^{m} V_j^t$

where newV_i^t denotes the value of V_i^t after the linear numerical mapping and $\sum_{j=1}^{m} V_j^t$ denotes the sum of the predicted loss values of all clients; the linearly mapped predicted loss values are used as a discrete probability distribution to sample clients, and the higher a client's predicted loss value, the higher its probability of being sampled.
5. The multi-party collaborative data learning model training method according to any one of claims 2 to 4, characterized in that the model aggregation algorithm in step 5) is formulated as:

$W = \frac{1}{h} \sum_{i=1}^{h} W_{local}^{i}$

where h is the number of currently selected clients and $W_{local}^{i}$ denotes the local model parameters uploaded by the i-th of the selected clients.
6. The multi-party collaborative data learning model training method according to claim 5, characterized in that the active learning method of the local model in step 7) is as follows:
7.1) two auxiliary classifiers are added to the local model architecture used in global model training; the auxiliary classifiers are connected to the backbone network of the local model, in parallel with the main classifier of the local model, forming the local active learning model;
7.2) the local active learning model is trained using the labeled data set D_L and the unlabeled data set D_U;
7.3) with the difference loss function as the objective function, training maximizes the differences among the auxiliary classifiers to obtain a tighter decision boundary, so that high-information samples are selected from the unlabeled samples and added to the labeled data set.
7. The multi-party collaborative data learning model training method according to claim 6, characterized in that the two added auxiliary classifiers have the same network architecture as the main classifier; their network parameters are generated by adding random Gaussian noise to the network parameters of the main classifier, with the added noise p ~ N(0, 0.1); the feature map obtained after a data sample passes through the backbone network enters the main classifier and the auxiliary classifiers respectively, and the classifiers do not affect one another.
8. The multi-party collaborative data learning model training method according to claim 7, characterized in that the local active learning model of step 7.2) is trained as follows:
the backbone network together with the main classifier is denoted by θ, the backbone network alone by b, and the two auxiliary classifiers by θ_1 and θ_2; p denotes the probability distribution output by a sample passing through θ, p_1 the probability distribution output through (b, θ_1), and p_2 the probability distribution output through (b, θ_2);
A: the local active learning model is trained with the labeled data set;
A-1: the cross-entropy loss L_CE produced by inference of a sample through θ, (b, θ_1) and (b, θ_2) is calculated;
the cross-entropy loss function is:

$L_{CE} = -\sum_{c=1}^{C} \mathbb{1}[y = c] \log p_c(y \mid x)$

where C is the total number of sample classes, c denotes a sample class, 1[·] denotes the indicator function, and p_c(y|x) denotes the probability that sample x belongs to class c;
A-2: the local active learning model parameters are updated, where η is the learning rate and ∇ denotes the gradient:

$\theta \leftarrow \theta - \eta \nabla_{\theta} L_{CE}$
$\theta_1 \leftarrow \theta_1 - \eta \nabla_{\theta_1} L_{CE}$
$\theta_2 \leftarrow \theta_2 - \eta \nabla_{\theta_2} L_{CE}$

B: the auxiliary classifiers are trained with the unlabeled data set;
B-1: the difference loss L_dist produced by inference of a sample through (b, θ_1) and (b, θ_2) is calculated:

$L_{dist} = d(p_1, p_2) + d(p_1, p) + d(p_2, p)$

where d(·, ·) denotes the discrepancy between two output probability distributions;
B-2: the auxiliary classifier parameters are updated by a gradient step on L_dist, maximizing the differences among the auxiliary classifiers:

$\theta_1 \leftarrow \theta_1 + \eta \nabla_{\theta_1} L_{dist}$
$\theta_2 \leftarrow \theta_2 + \eta \nabla_{\theta_2} L_{dist}$
CN202210715202.9A (filed 2022-06-23, priority 2022-06-23) Multi-party cooperative data learning system and learning model training method, published as CN115099334A (pending)

Priority Applications (1)

Application Number: CN202210715202.9A (CN)
Priority Date / Filing Date: 2022-06-23
Title: Multi-party cooperative data learning system and learning model training method

Publications (1)

Publication Number: CN115099334A (pending)
Publication Date: 2022-09-23
Family ID: 83292709
Country: CN



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination