CN112364943A - Federal prediction method based on federal learning - Google Patents

Federal prediction method based on federal learning

Info

Publication number: CN112364943A
Application number: CN202011456395.8A
Authority: CN (China)
Prior art keywords: training, round, neural network, cluster, local
Legal status: Granted
Other languages: Chinese (zh)
Other versions: CN112364943B
Inventors: 李先贤, 段锦欢, 王金艳
Assignee (original and current): Guangxi Normal University
Application filed by Guangxi Normal University on 2020-12-10; priority to CN202011456395.8A
Publication of CN112364943A: 2021-02-12; publication of CN112364943B (grant): 2022-04-22
Legal status: Active

Classifications

    • G06F18/23213 — Pattern recognition; clustering techniques; non-hierarchical techniques using statistics or function optimisation, with a fixed number of clusters, e.g. k-means clustering
    • G06N20/00 — Machine learning
    • G06N3/04 — Neural networks; architecture, e.g. interconnection topology
    • G06N3/08 — Neural networks; learning methods
    • G06Q10/04 — Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"


Abstract

The invention discloses a federated prediction method based on federated learning. By unitizing the locally updated gradient vector, the parameter change of the neural network model updated by a single participant differs only in direction, not in magnitude. This protects data privacy without homomorphic encryption, differential privacy, or other encryption techniques, and greatly reduces the communication cost between devices and the server without loss of accuracy. In addition, because data in federated learning scenarios often differ greatly across participants, the performance of local participants can be improved by increasing the weight of local information: the uploaded neural network model parameters are clustered with the k-means algorithm to find similar parameters, and the aggregation weight of similar parameters is increased, so that the aggregated model better fits each participant's data scenario.

Description

Federal prediction method based on federal learning
Technical Field
The invention relates to the technical field of federated learning, and in particular to a federated prediction method based on federated learning.
Background
In most industries, because of competition, privacy and security concerns, and complex administrative procedures, data are not shared; even among different departments of the same company, centralized integration of data meets serious resistance. In reality, integrating data scattered across places and organizations is nearly impossible, or prohibitively expensive. At the same time, countries are strengthening the protection of data security and privacy: the European Union's recently introduced General Data Protection Regulation (GDPR) shows that ever-stricter management of user data privacy and security is a worldwide trend. To address the two problems of data silos and privacy security, Google proposed the federated learning framework in 2016. Its design goal is to carry out efficient machine learning among multiple participants or computing nodes while guaranteeing information security during big-data exchange, protecting terminal data and personal privacy, and ensuring legal compliance.
In federated learning, multiple data owners form a federation and jointly participate in training a global model. To protect data privacy and model parameters, the participants share only encrypted model parameters or encrypted intermediate results, never the original data, so the data remain usable yet invisible, and the jointly built model can achieve better performance. As laws and regulations on data security mature, more and more companies and organizations are prioritizing privacy and security, and more and more researchers are working in this field.
In federated learning, a matrix $D_i$ denotes the data held by each data owner i; each row of the matrix is a sample and each column a feature. Some data sets also contain label data: in finance a label may be a user's credit, in marketing the user's purchasing desire, in education a student's degree. Denote the feature space by X, the label space by Y, and the sample-ID space by I, so that the features X, labels Y, and sample IDs I form a complete training data set (I, X, Y). In existing federated learning, parameter aggregation uses plain federated averaging, and the prediction performance of the resulting model is poor. Moreover, because the shared parameters must be encrypted, homomorphic or other encryption technologies greatly increase the communication cost between devices and the server, making communication inefficient.
Disclosure of Invention
The invention aims to solve the poor prediction performance of existing federated learning by providing a federated prediction method based on federated learning.
To solve this problem, the invention is realized by the following technical scheme:
A federated prediction method based on federated learning comprises the following steps:
step 1, the server initializes the parameters of a neural network model and sends them to all participants; all participants take the initialized parameters as the round-0 model parameters of their local neural network models;
step 2, participants meeting the conditions send the server a request to participate in the t-th round of training, and the server selects cK of them as training participants for the t-th round;
step 3, each training participant of the t-th round trains its local neural network model with its local training data set $D_k$, using stochastic gradient descent with gradient unit vectors to update the round-(t-1) model parameters $w_k^{t-1}$ of the local neural network model and obtain the round-t upload model parameters $w_k^{t}$;
Step 4, uploading model parameters of the t-th round of the local neural network model by all training participants of the t-th round of training
Figure BDA0002828727710000023
Uploading to a server;
step 5, the server aggregates the round-t upload model parameters $w_k^{t}$ based on a k-means clustering algorithm to obtain the discrete cluster's round-t final model parameter $\bar{w}_d^{t}$ and each cluster's round-t final model parameter $\bar{w}_i^{t}$;
Step 6, the server enables the t-th clustering cluster final model parameter of each clustering cluster
Figure BDA0002828727710000027
Respectively sending the parameters to training participants corresponding to the t-th round clustering point model parameters of the corresponding clustering clusters, and obtaining the final model parameters of the t-th round discrete clusters of the discrete clusters
Figure BDA0002828727710000028
Transmitting the training participator corresponding to the discrete point model parameter of the t-th round of the discrete cluster and other participators which do not participate in the t-th round;
step 7, judging whether the local neural network models of all participants have converged or the preset number of training rounds has been reached: if yes, go to step 8; otherwise let t = t + 1 and return to step 2;
step 8, each participant feeds its local test data into its local neural network model and uses the model to predict on that data.
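For orientation only, the following minimal Python sketch mirrors the round structure of steps 1 to 8. The helper names (initialize_parameters, meets_conditions, local_train, aggregate, and so on) are hypothetical stand-ins for the operations described above, not part of the claimed method.

```python
import numpy as np

def run_federated_prediction(server, participants, c, max_rounds):
    # Step 1: the server initializes and distributes the model parameters.
    w0 = server.initialize_parameters()
    for p in participants:
        p.model_params = np.copy(w0)   # round-0 local parameters

    for t in range(1, max_rounds + 1):
        # Step 2: eligible participants request to join; the server picks cK of them.
        requesters = [p for p in participants if p.meets_conditions()]
        trainers = server.select(requesters, int(c * len(participants)))

        # Steps 3-4: local training with unit-gradient SGD, then upload.
        uploads = {p.id: p.local_train(p.model_params) for p in trainers}

        # Step 5: k-means-based aggregation into per-cluster and discrete-cluster
        # final parameters; `assignment` maps normal trainers to their cluster.
        cluster_final, discrete_final, assignment = server.aggregate(uploads)

        # Step 6: distribute the final parameters.
        for p in participants:
            p.model_params = (cluster_final[assignment[p.id]]
                              if p.id in assignment else discrete_final)

        # Step 7: stop on convergence (or when max_rounds is reached).
        if all(p.converged() for p in participants):
            break

    # Step 8: each participant predicts on its own local test data.
    return [p.predict(p.local_test_data) for p in participants]
```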
The specific process of step 3 is as follows:
Step 3.1, training participant k trains the current local neural network model with the local training data set $D_k$, using stochastic gradient descent with gradient unit vectors to update the round-(t-1) model parameters $w_k^{t-1}$ and obtain the first round-t model parameters $w_{k,1}^{t}$ of the local neural network model.
Step 3.2, training participant k first selects part of the data from the local training data set $D_k$ to form a partial local training data set $B_k$, then trains the current local neural network model on $B_k$, using stochastic gradient descent with gradient unit vectors to update the first round-t model parameters $w_{k,1}^{t}$ and obtain the second round-t model parameters $w_{k,2}^{t}$.
Step 3.3, training participant k trains the current local neural network model with the local training data set $D_k$, using stochastic gradient descent with gradient unit vectors to update the second round-t model parameters $w_{k,2}^{t}$ and obtain the round-t upload model parameters $w_k^{t}$.
The specific process of step 5 is as follows:
Step 5.1, the server clusters all round-t upload model parameters $w_k^{t}$ into M clusters using the k-means clustering algorithm and computes each cluster's center coordinates;
Step 5.2, from each cluster's round-t upload model parameters $w_k^{t}$, the server selects those whose distance to the cluster's center coordinates is outside a preset range as round-t discrete point model parameters, and forms a new discrete cluster from the discrete points selected across all clusters; the remaining round-t upload model parameters of each cluster serve as round-t clustering point model parameters;
Step 5.3, the server selects a certain number of the discrete cluster's round-t discrete point model parameters and averages them to obtain the discrete cluster's round-t intermediate model parameter $\tilde{w}_d^{t}$; meanwhile, it averages the round-t clustering point model parameters of each cluster to obtain that cluster's round-t intermediate model parameter $\tilde{w}_i^{t}$;
Step 5.4, the server calculates the t-th round discrete cluster final model parameter of the discrete cluster
Figure BDA0002828727710000037
And the t round clustering final model parameter of each clustering cluster
Figure BDA0002828727710000038
Wherein:
Figure BDA0002828727710000039
Figure BDA00028287277100000310
the alpha is the weight of the current cluster i, beta is the weight of other clusters j, gamma is the weight of a discrete cluster, alpha + beta + gamma is 1, and alpha > beta > gamma;
Figure BDA00028287277100000311
the Euclidean distance between the cluster center coordinate of the current cluster i and the cluster center coordinates of other cluster j is obtained; i belongs to M, j is not equal to i, and M is the number of the clustering clusters; k belongs to cK, and the cK is the number of the training participants; t is the number of training rounds.
Compared with the prior art, the invention has the following features:
1. Because data in federated learning scenarios often differ greatly across participants, the performance of local participants can be improved by increasing the weight of local information. The method clusters the uploaded neural network model parameters with the k-means algorithm to find similar parameters and increases their aggregation weight, making the result better suited to each local participant's data scenario.
2. By unitizing the locally updated gradient vector, the method makes the parameter change of a single participant's neural network model differ only in direction, not in magnitude. This protects data privacy without homomorphic encryption, differential privacy, or other encryption techniques, and greatly reduces the communication cost between devices and the server without loss of accuracy.
Drawings
FIG. 1 is a schematic diagram of the federated prediction method based on federated learning.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to specific examples.
For those skilled in the art of federated learning, before federated training can be performed, the federated learning participants and server must be identified and a federated learning environment built. Before participating in federated learning, each participant prepares the local training data set to be trained and a local neural network model chosen according to the actual application scenario.
Taking keyboard input prediction as an example, the participants are thousands of mobile devices (mobile phones) participating in federated learning, and the server is Alibaba Cloud or Baidu Cloud. The local training data set consists of data from input-method (e.g., Gboard) users who opted to share text snippets typed in Google applications. Each snippet is truncated to a phrase of several words, and snippets are only occasionally logged from any individual user. Before training, the logs are anonymized and stripped of personally identifiable information, and a snippet is used for training only when it begins with a sentence marker. The local neural network model is a variant of the long short-term memory (LSTM) recurrent neural network called the coupled input-forget gate (CIFG), used as the next-word prediction model, with tied input-embedding and output-projection matrices to reduce model size and speed up training. Given a vocabulary of size V, an embedding matrix $W \in \mathbb{R}^{D \times V}$ maps a one-hot vector $v \in \mathbb{R}^{V}$ to a dense embedding $d \in \mathbb{R}^{D}$; the output projection of the CIFG maps the hidden state $h \in \mathbb{R}^{D}$ to an output vector $W^{T} h \in \mathbb{R}^{V}$. A softmax over the output vector converts the raw logits to normalized probabilities, and the model is trained with the cross-entropy loss between output and target labels. All virtual-keyboard users use mobile devices (mobile phones), but different users type different data on the keyboard.
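As a rough illustration of the tied embedding and output projection just described (the CIFG recurrence itself is omitted), consider the following numpy sketch; the sizes V and D and all weights are illustrative assumptions:

```python
import numpy as np

V, D = 10000, 96                      # vocabulary and embedding sizes (illustrative)
rng = np.random.default_rng(0)
W = rng.normal(0, 0.02, size=(D, V))  # shared matrix: columns are word embeddings

def embed(token_id):
    # one-hot v in R^V -> dense embedding d in R^D (a column lookup in W)
    return W[:, token_id]

def next_word_probs(h):
    # hidden state h in R^D -> probabilities over the vocabulary
    logits = W.T @ h                  # tied output projection W^T h in R^V
    logits -= logits.max()            # numerical stability
    p = np.exp(logits)
    return p / p.sum()                # softmax over the output vector

def cross_entropy(h, target_id):
    # training loss between output distribution and target label
    return -np.log(next_word_probs(h)[target_id])
```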
Taking bank money-laundering prediction as an example, the participants are banks participating in federated learning, and the server is Alibaba Cloud or Baidu Cloud. The local training data set is a bank's business data with four fields: client id, the number x1 of cases where the source of funds is inconsistent with the business scope, the number x2 of large transactions, and the label Y indicating whether money laundering occurred. The local neural network model is a fully connected neural network with two layers of three nodes and a softmax output, with relu as the activation function. In this bank anti-money-laundering task, bank A and bank B are in different regions; because their business is the same, their business data share the same feature space but cover different customers.
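A minimal numpy sketch of the described bank-side network — two fully connected layers of three nodes with relu and a softmax output — might look as follows; the weights shown are random placeholders, not trained values:

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # inputs: x1, x2
W2, b2 = rng.normal(size=(3, 3)), np.zeros(3)   # second layer of three nodes
W3, b3 = rng.normal(size=(2, 3)), np.zeros(2)   # two-class softmax output

def predict(x1, x2):
    h = np.maximum(W1 @ np.array([x1, x2]) + b1, 0.0)  # layer 1, relu
    h = np.maximum(W2 @ h + b2, 0.0)                   # layer 2, relu
    z = W3 @ h + b3
    p = np.exp(z - z.max())
    return p / p.sum()   # [P(money laundering), P(not money laundering)]
```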
Referring to FIG. 1, the federated prediction method based on federated learning includes the following steps:
Step 1, the server initializes the neural network model parameters and sends them to all K participants; every participant takes the initialized parameters as the round-0 model parameters $w_k^{0}$ of its local neural network model, where k ∈ K and K is the number of participants.
In the keyboard input prediction example, the round-0 original model parameters are broadcast to all mobile devices (mobile phones) participating in federated training.
In the bank money-laundering prediction example, the round-0 original model parameters are broadcast to all bank devices participating in federated training.
Step 2, participants meeting the conditions send the server a request to participate in the t-th round of training, and the server selects cK participants as training participants for the t-th round.
Participants that meet a certain condition (determined by the task of the federated training plan) may send the server a request to join the current round. After receiving the requests, the server selects a subset of the requesters to participate; participants not selected may send requests again after some time. The server takes the number of participants and a timeout into account: the round of training succeeds only if enough devices can join before the timeout.
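A minimal sketch of this admission logic, under the assumption of simple polling with a timeout (poll_requests, min_trainers, and timeout_s are hypothetical names for mechanics the text leaves unspecified):

```python
import time

def collect_round_participants(server, c, K, min_trainers, timeout_s):
    deadline = time.time() + timeout_s
    requests = []
    while time.time() < deadline and len(requests) < int(c * K):
        requests.extend(server.poll_requests())  # eligible participants asking to join
        time.sleep(0.1)                          # avoid a busy-wait
    if len(requests) < min_trainers:
        return None                              # too few devices: this round fails
    return requests[: int(c * K)]                # the cK training participants
```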
Step 3, each training participant of the t-th round trains its local neural network model with its local training data set $D_k$, using stochastic gradient descent with gradient unit vectors to update the round-(t-1) model parameters $w_k^{t-1}$ of the local neural network model and obtain the round-t upload model parameters $w_k^{t}$, where k ∈ cK and cK is the number of training participants.
By unitizing the locally updated gradient vector, the parameter change of a single participant's neural network model differs only in direction, not in magnitude. Even if an attacker obtains the neural network model parameters, only the gradient unit vector is revealed, and the local data cannot be inferred from it, so data privacy is protected. The parameters need not be encrypted when uploaded, which improves communication efficiency between the participants and the server.
Step 3.1, training participant k trains the current local neural network model with the local training data set $D_k$, using stochastic gradient descent with gradient unit vectors to update the round-(t-1) model parameters $w_k^{t-1}$ and obtain the first round-t model parameters $w_{k,1}^{t}$. This phase is trained once:

$$w_{k,1}^{t} = w_k^{t-1} - \eta \, \frac{\nabla \ell\big(w_k^{t-1}; x\big)}{\big\| \nabla \ell\big(w_k^{t-1}; x\big) \big\|}$$

where $\eta$ is the learning rate and $\nabla \ell(\cdot\,; x)$ is the model-parameter gradient on sample x of participant k in the t-th round of training.
Step 3.2, training participant k first selects part of the data from the local training data set $D_k$ to form a partial local training data set $B_k$, then trains the current local neural network model on $B_k$, using stochastic gradient descent with gradient unit vectors to update the first round-t model parameters $w_{k,1}^{t}$ and obtain the second round-t model parameters $w_{k,2}^{t}$. This phase may be trained multiple times:

$$w_{k,2}^{t} = w_{k,1}^{t} - \eta \, \frac{\nabla \ell\big(w_{k,1}^{t}; x\big)}{\big\| \nabla \ell\big(w_{k,1}^{t}; x\big) \big\|}$$

where $\eta$ is the learning rate and $\nabla \ell(\cdot\,; x)$ is the model-parameter gradient on sample x of participant k in the t-th round of training.
Step 3.3, training participant k trains the current local neural network model with the local training data set $D_k$, using stochastic gradient descent with gradient unit vectors to update the second round-t model parameters $w_{k,2}^{t}$ and obtain the round-t upload model parameters $w_k^{t}$. This phase is trained once:

$$w_{k}^{t} = w_{k,2}^{t} - \eta \, \frac{\nabla \ell\big(w_{k,2}^{t}; x\big)}{\big\| \nabla \ell\big(w_{k,2}^{t}; x\big) \big\|}$$

where $\eta$ is the learning rate and $\nabla \ell(\cdot\,; x)$ is the model-parameter gradient on sample x of participant k in the t-th round of training.
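Putting steps 3.1 to 3.3 together, a minimal sketch of the three-phase local update with gradient unit vectors might read as follows; grad is a stand-in for the model's gradient routine, and the 50% sampling used to form B_k is an assumption, since the selection rule is not specified above:

```python
import numpy as np

def unit_sgd_step(w, batch, grad, eta):
    # SGD where the update moves by the *unit* gradient vector, so local
    # updates differ in direction only, never in magnitude.
    g = grad(w, batch)
    return w - eta * g / (np.linalg.norm(g) + 1e-12)

def local_train(w_prev, D_k, grad, eta, inner_epochs, rng):
    w = unit_sgd_step(w_prev, D_k, grad, eta)       # step 3.1: once on all of D_k
    B_k = [x for x in D_k if rng.random() < 0.5]    # step 3.2: sampled subset (assumed 50%)
    for _ in range(inner_epochs):                   #           trained multiple times
        w = unit_sgd_step(w, B_k, grad, eta)
    return unit_sgd_step(w, D_k, grad, eta)         # step 3.3: once on all of D_k
```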
In the keyboard input prediction example, the mobile devices participating in the current round feed the local training data set (the input-method users' data), including the log data and log text, into the local neural network model.
In the bank money-laundering prediction example, the bank devices participating in the current round feed the local training data set (the bank's business data) into the local neural network model; the data comprise four fields: client id, the number x1 of cases where the source of funds is inconsistent with the business scope, the number x2 of large transactions, and the label Y indicating whether money laundering occurred.
Step 4, all training participants of the t-th round upload the round-t model parameters $w_k^{t}$ of their local neural network models to the server.
The server waits for each trained participant to return its result. If enough participants return results before the timeout, the round of training succeeds; otherwise it fails. After a successful round, the server aggregates with the aggregation algorithm.
Step 5, the server aggregates the round-t upload model parameters $w_k^{t}$ based on the k-means clustering algorithm to obtain each cluster's round-t final model parameter $\bar{w}_i^{t}$ and the discrete cluster's round-t final model parameter $\bar{w}_d^{t}$.
Federated averaging directly averages the parameters, yet data in federated learning scenarios often differ greatly across participants, and the performance of local participants can be improved by increasing the weight of local information. The uploaded neural network model parameters are therefore clustered with the k-means algorithm to find similar parameters, and during aggregation the weight of the local cluster's parameters is increased for local data, which better fits each participant's data scenario and improves the performance of the neural network model.
Step 5.1, the server clusters all round-t upload model parameters $w_k^{t}$ into M clusters using the k-means clustering algorithm and computes each cluster's center coordinates $c_i^{t}$, where i ∈ M and M is the number of clusters.
Step 5.2, from each cluster's round-t upload model parameters $w_k^{t}$, the server selects those whose distance to the cluster's center coordinates is outside the preset range as round-t discrete point model parameters, and forms a new discrete cluster from the discrete points selected across all clusters; the remaining round-t upload model parameters of each cluster serve as round-t clustering point model parameters.
A round-t upload model parameter $w_k^{t}$ whose Euclidean distance to the cluster-center coordinate $c_i^{t}$ is within the preset range, i.e. at most a threshold determined by $s_i$, is called a round-t clustering point model parameter (its training participant is a normal training participant of the cluster). A round-t upload model parameter $w_k^{t}$ whose Euclidean distance to the cluster-center coordinate exceeds the preset range is called a round-t discrete point model parameter (its training participant is an abnormal training participant of the cluster). Here $s_i$ is the standard deviation of the distances from all model parameters in cluster i to the cluster-center coordinate.
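A minimal numpy sketch of steps 5.1 and 5.2 — k-means over the flattened uploads followed by the discrete-point split — is given below; the threshold multiple tau applied to s_i is an assumption, since the preset range is not fixed above:

```python
import numpy as np

def kmeans(X, M, iters=50, seed=0):
    # X: (n, p) array, one flattened upload parameter vector per row.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), M, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                # nearest center per upload
        for i in range(M):
            if np.any(labels == i):
                centers[i] = X[labels == i].mean(axis=0)
    return centers, labels

def split_discrete(X, centers, labels, tau=2.0):
    # True -> clustering point; False -> discrete point (joins the discrete cluster).
    dist = np.linalg.norm(X - centers[labels], axis=1)
    keep = np.ones(len(X), dtype=bool)
    for i in range(len(centers)):
        in_i = labels == i
        s_i = dist[in_i].std()                   # std of distances to center in cluster i
        keep[in_i] = dist[in_i] <= tau * s_i     # tau is an assumed threshold multiple
    return keep
```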
Step 5.3, the server selects Q of the discrete cluster's round-t discrete point model parameters and averages them to obtain the discrete cluster's round-t intermediate model parameter $\tilde{w}_d^{t}$; meanwhile, it averages Q of each cluster's round-t clustering point model parameters to obtain each cluster's round-t intermediate model parameter $\tilde{w}_i^{t}$:

$$\tilde{w}_d^{t} = \frac{1}{Q} \sum_{q=1}^{Q} w_{d,q}^{t}, \qquad \tilde{w}_i^{t} = \frac{1}{Q} \sum_{q=1}^{Q} w_{i,q}^{t}$$

where $w_{d,q}^{t}$ and $w_{i,q}^{t}$ denote the selected round-t discrete point and clustering point model parameters, respectively.
Step 5.4, the server computes the discrete cluster's round-t final model parameter $\bar{w}_d^{t}$ and each cluster's round-t final model parameter $\bar{w}_i^{t}$ as weighted combinations of the intermediate model parameters: the current cluster i carries weight α, the other clusters j together carry weight β (apportioned among them according to $d_{i,j}^{t}$, the Euclidean distance between the cluster-center coordinates of the current cluster i and of the other clusters j in the t-th round of training), and the discrete cluster carries weight γ, with α + β + γ = 1 and α > β > γ.
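Since the exact aggregation formulas are not fully reproduced above, the following sketch implements one plausible reading of step 5.4: β's share is apportioned over the other clusters in inverse proportion to the center distance $d_{i,j}^{t}$, and the discrete cluster's final parameter mixes its own intermediate with the cluster intermediates. This is an assumption for illustration, not the patented formula:

```python
import numpy as np

def aggregate(cluster_mids, discrete_mid, centers, alpha=0.6, beta=0.3, gamma=0.1):
    # cluster_mids: per-cluster intermediate parameters; discrete_mid: the
    # discrete cluster's intermediate; centers: cluster-center coordinates.
    # Requires alpha + beta + gamma == 1, alpha > beta > gamma, and M >= 2.
    M = len(cluster_mids)
    finals = []
    for i in range(M):
        d = np.array([np.linalg.norm(centers[i] - centers[j])
                      for j in range(M) if j != i])
        others = [cluster_mids[j] for j in range(M) if j != i]
        lam = 1.0 / (d + 1e-12)
        lam /= lam.sum()                         # inverse-distance weights, sum to 1
        mix = sum(l * w for l, w in zip(lam, others))
        finals.append(alpha * cluster_mids[i] + beta * mix + gamma * discrete_mid)
    # Discrete-cluster final: weight gamma on its own mean, rest spread evenly
    # over the cluster intermediates (an assumed, symmetric choice).
    discrete_final = gamma * discrete_mid + (1 - gamma) * np.mean(cluster_mids, axis=0)
    return finals, discrete_final
```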
Step 6, the server sends each cluster's round-t final model parameter $\bar{w}_i^{t}$ to the training participants (normal training participants) corresponding to that cluster's round-t clustering point model parameters, and sends the discrete cluster's round-t final model parameter $\bar{w}_d^{t}$ to the training participants (abnormal training participants) corresponding to the round-t discrete point model parameters, as well as to the other participants that did not take part in round t.
Step 7, judging whether the local neural network models of all participants have converged or the preset number of training rounds has been reached: if yes, go to step 8; otherwise let t = t + 1 and return to step 2.
Model convergence means that the loss function value of the local neural network model no longer changes, or that its change is smaller than a set threshold.
Step 8, each participant feeds its local test data into its local neural network model and uses the model to predict on that data.
In the keyboard input prediction example, each mobile device obtains a trained coupled input-forget gate neural network model. The device feeds the user's real-time keyboard input characters into the model; the output is the predicted next input word, which is displayed on the virtual keyboard for the user to select.
In the bank money-laundering prediction example, each bank obtains a trained anti-money-laundering neural network model. Business data consisting of three fields (client id, the number x1 of cases where the source of funds is inconsistent with the business scope, and the number x2 of large transactions) are fed in as input, and the output predicts whether the business data are suspected of involving money laundering.
It should be noted that although the above-described embodiments of the present invention are illustrative, the invention is not limited to them. Other embodiments made by those skilled in the art in light of the teachings of the invention, without departing from its principles, are considered to be within the scope of the invention.

Claims (5)

1. A federated prediction method based on federated learning, characterized by comprising the following steps:
step 1, the server initializes the parameters of a neural network model and sends them to all participants; all participants take the initialized parameters as the round-0 model parameters of their local neural network models;
step 2, participants meeting the conditions send the server a request to participate in the t-th round of training, and the server selects cK of them as training participants for the t-th round;
step 3, each training participant of the t-th round trains its local neural network model with its local training data set $D_k$, using stochastic gradient descent with gradient unit vectors to update the round-(t-1) model parameters $w_k^{t-1}$ of the local neural network model and obtain the round-t upload model parameters $w_k^{t}$;
Step 4, uploading model parameters of the t-th round of the local neural network model by all training participants of the t-th round of training
Figure FDA0002828727700000013
Uploading to a server;
step 5, the server aggregates the round-t upload model parameters $w_k^{t}$ based on a k-means clustering algorithm to obtain the discrete cluster's round-t final model parameter $\bar{w}_d^{t}$ and each cluster's round-t final model parameter $\bar{w}_i^{t}$;
Step 6, the server enables the t-th clustering cluster final model parameter of each clustering cluster
Figure FDA0002828727700000017
Are sent to the phases respectivelyTraining participants corresponding to the t-th round clustering point model parameters of the cluster to be clustered, and final model parameters of the t-th round discrete clusters of the discrete clusters
Figure FDA0002828727700000018
Transmitting the training participator corresponding to the discrete point model parameter of the t-th round of the discrete cluster and other participators which do not participate in the t-th round;
step 7, judging whether the local neural network models of all participants have converged or the preset number of training rounds has been reached: if yes, go to step 8; otherwise add 1 to the training round number t and return to step 2;
step 8, each participant feeds its local test data into its local neural network model and uses the model to predict on that data;
where k ∈ cK, cK is the number of training participants, and t is the training round number.
2. The federated prediction method based on federated learning of claim 1, characterized in that the specific process of step 3 is as follows:
step 3.1, training participant k trains the current local neural network model with the local training data set $D_k$, using stochastic gradient descent with gradient unit vectors to update the round-(t-1) model parameters $w_k^{t-1}$ and obtain the first round-t model parameters $w_{k,1}^{t}$ of the local neural network model;
Step 3.2, training participant k first from local training data set DkSelecting a part of data to form a part of local training data set BkReuse part of the local training data set BkTraining the current local neural network model, and applying a stochastic gradient descent method to make a connection in the training processOver-gradient unit vector for updating first model parameters of t-th round of local neural network model
Figure FDA00028287277000000111
Obtaining the second model parameter of the t-th round of the local neural network model
Figure FDA0002828727700000021
Step 3.3, training participant k utilizes local training dataset DkTraining the current local neural network model, and updating the second model parameter of the t-th round of the local neural network model by using a random gradient descent method and gradient unit vectors in the training process
Figure FDA0002828727700000022
Obtaining the t round uploading model parameter of the local neural network model
Figure FDA0002828727700000023
The k belongs to cK, and the cK is the number of training participants; t is the number of training rounds.
3. The federated prediction method of claim 2, characterized in that in step 3.1 the local neural network model is trained once with the local training data set $D_k$; in step 3.2 it is trained multiple times with the partial local training data set $B_k$; and in step 3.3 it is trained once with the local training data set $D_k$; where k ∈ cK and cK is the number of training participants.
4. The federated prediction method based on federated learning of claim 1, characterized in that the specific process of step 5 is as follows:
step 5.1, the server clusters all round-t upload model parameters $w_k^{t}$ into M clusters using the k-means clustering algorithm and computes each cluster's center coordinates;
step 5.2, from each cluster's round-t upload model parameters $w_k^{t}$, the server selects those whose distance to the cluster's center coordinates is outside a preset range as round-t discrete point model parameters, and forms a new discrete cluster from the discrete points selected across all clusters; the remaining round-t upload model parameters of each cluster serve as round-t clustering point model parameters;
step 5.3, the server selects a certain number of the discrete cluster's round-t discrete point model parameters and averages them to obtain the discrete cluster's round-t intermediate model parameter $\tilde{w}_d^{t}$; meanwhile, it averages the round-t clustering point model parameters of each cluster to obtain that cluster's round-t intermediate model parameter $\tilde{w}_i^{t}$;
Step 5.4, the server calculates the t-th round discrete cluster final model parameter of the discrete cluster
Figure FDA00028287277000000210
And eachT-th round clustering final model parameter of each clustering cluster
Figure FDA00028287277000000211
Wherein:
Figure FDA00028287277000000212
Figure FDA00028287277000000213
the alpha is the weight of the current cluster i, beta is the weight of other clusters j, gamma is the weight of a discrete cluster, alpha + beta + gamma is 1, and alpha > beta > gamma;
Figure FDA00028287277000000214
the Euclidean distance between the cluster center coordinate of the current cluster i and the cluster center coordinates of other cluster j is obtained; i belongs to M, j is not equal to i, and M is the number of the clustering clusters; k belongs to cK, and the cK is the number of the training participants; t is the number of training rounds.
5. The federated prediction method based on federated learning of claim 1, characterized in that in step 7 convergence of a participant's local neural network model means that the loss function value of the local neural network model no longer changes or that its change is smaller than a set threshold.
CN202011456395.8A 2020-12-10 2020-12-10 Federal prediction method based on federal learning Active CN112364943B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011456395.8A CN112364943B (en) 2020-12-10 2020-12-10 Federal prediction method based on federal learning

Publications (2)

Publication Number Publication Date
CN112364943A 2021-02-12
CN112364943B 2022-04-22

Family

ID=74536164

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011456395.8A Active CN112364943B (en) 2020-12-10 2020-12-10 Federal prediction method based on federal learning

Country Status (1)

Country Link
CN (1) CN112364943B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399742A (en) * 2019-07-29 2019-11-01 深圳前海微众银行股份有限公司 A kind of training, prediction technique and the device of federation's transfer learning model
KR20190103088A (en) * 2019-08-15 2019-09-04 엘지전자 주식회사 Method and apparatus for recognizing a business card using federated learning
CN111339212A (en) * 2020-02-13 2020-06-26 深圳前海微众银行股份有限公司 Sample clustering method, device, equipment and readable storage medium
CN111860832A (en) * 2020-07-01 2020-10-30 广州大学 Method for enhancing neural network defense capacity based on federal learning

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113094407B (en) * 2021-03-11 2022-07-19 广发证券股份有限公司 Anti-money laundering identification method, device and system based on horizontal federal learning
CN113094407A (en) * 2021-03-11 2021-07-09 广发证券股份有限公司 Anti-money laundering identification method, device and system based on horizontal federal learning
CN113051557A (en) * 2021-03-15 2021-06-29 河南科技大学 Social network cross-platform malicious user detection method based on longitudinal federal learning
CN113051557B (en) * 2021-03-15 2022-11-11 河南科技大学 Social network cross-platform malicious user detection method based on longitudinal federal learning
CN113033819A (en) * 2021-03-25 2021-06-25 支付宝(杭州)信息技术有限公司 Heterogeneous model-based federated learning method, device and medium
CN113033819B (en) * 2021-03-25 2022-11-11 支付宝(杭州)信息技术有限公司 Heterogeneous model-based federated learning method, device and medium
CN112799708B (en) * 2021-04-07 2021-07-13 支付宝(杭州)信息技术有限公司 Method and system for jointly updating business model
CN112799708A (en) * 2021-04-07 2021-05-14 支付宝(杭州)信息技术有限公司 Method and system for jointly updating business model
CN113139600A (en) * 2021-04-23 2021-07-20 广东安恒电力科技有限公司 Intelligent power grid equipment anomaly detection method and system based on federal learning
WO2022226903A1 (en) * 2021-04-29 2022-11-03 浙江大学 Federated learning method for k-means clustering algorithm
CN113344220A (en) * 2021-06-18 2021-09-03 山东大学 User screening method, system, equipment and storage medium based on local model gradient in federated learning
CN113487351A (en) * 2021-07-05 2021-10-08 哈尔滨工业大学(深圳) Privacy protection advertisement click rate prediction method, device, server and storage medium
CN113378243B (en) * 2021-07-14 2023-09-29 南京信息工程大学 Personalized federal learning method based on multi-head attention mechanism
CN113378243A (en) * 2021-07-14 2021-09-10 南京信息工程大学 Personalized federal learning method based on multi-head attention mechanism
CN113469373B (en) * 2021-08-17 2023-06-30 北京神州新桥科技有限公司 Model training method, system, equipment and storage medium based on federal learning
CN113469373A (en) * 2021-08-17 2021-10-01 北京神州新桥科技有限公司 Model training method, system, equipment and storage medium based on federal learning
CN113837399A (en) * 2021-10-26 2021-12-24 医渡云(北京)技术有限公司 Federal learning model training method, device, system, storage medium and equipment
CN113837399B (en) * 2021-10-26 2023-05-30 医渡云(北京)技术有限公司 Training method, device, system, storage medium and equipment for federal learning model
CN114077901A (en) * 2021-11-23 2022-02-22 山东大学 User position prediction framework based on clustering and used for image federation learning
CN114077901B (en) * 2021-11-23 2024-05-24 山东大学 User position prediction method based on clustering graph federation learning
CN114611722B (en) * 2022-03-16 2024-05-24 中南民族大学 Safe transverse federal learning method based on cluster analysis
CN114611722A (en) * 2022-03-16 2022-06-10 中南民族大学 Safe horizontal federal learning method based on cluster analysis
CN114925744A (en) * 2022-04-14 2022-08-19 支付宝(杭州)信息技术有限公司 Joint training method and device
CN114900343B (en) * 2022-04-25 2023-01-24 西安电子科技大学 Internet of things equipment abnormal flow detection method based on clustered federal learning
CN114900343A (en) * 2022-04-25 2022-08-12 西安电子科技大学 Internet of things equipment abnormal flow detection method based on clustered federal learning
CN115018085B (en) * 2022-05-23 2023-06-16 郑州大学 Data heterogeneity-oriented federal learning participation equipment selection method
CN115018085A (en) * 2022-05-23 2022-09-06 郑州大学 Data heterogeneity-oriented federated learning participation equipment selection method
CN116522228A (en) * 2023-04-28 2023-08-01 哈尔滨工程大学 Radio frequency fingerprint identification method based on feature imitation federal learning
CN116522228B (en) * 2023-04-28 2024-02-06 哈尔滨工程大学 Radio frequency fingerprint identification method based on feature imitation federal learning
CN116502709A (en) * 2023-06-26 2023-07-28 浙江大学滨江研究院 Heterogeneous federal learning method and device
CN117094410A (en) * 2023-07-10 2023-11-21 西安电子科技大学 Model repairing method for poisoning damage federal learning
CN117094410B (en) * 2023-07-10 2024-02-13 西安电子科技大学 Model repairing method for poisoning damage federal learning
CN116665319B (en) * 2023-07-31 2023-11-24 华南理工大学 Multi-mode biological feature recognition method based on federal learning
CN116665319A (en) * 2023-07-31 2023-08-29 华南理工大学 Multi-mode biological feature recognition method based on federal learning
CN116701972B (en) * 2023-08-09 2023-11-24 腾讯科技(深圳)有限公司 Service data processing method, device, equipment and medium
CN116701972A (en) * 2023-08-09 2023-09-05 腾讯科技(深圳)有限公司 Service data processing method, device, equipment and medium

Also Published As

Publication number Publication date
CN112364943B (en) 2022-04-22

Similar Documents

Publication Publication Date Title
CN112364943B (en) Federal prediction method based on federal learning
CN112949837B (en) Target recognition federal deep learning method based on trusted network
WO2021179720A1 (en) Federated-learning-based user data classification method and apparatus, and device and medium
CN112733967B (en) Model training method, device, equipment and storage medium for federal learning
WO2021022707A1 (en) Hybrid federated learning method and architecture
CN109657837A (en) Default Probability prediction technique, device, computer equipment and storage medium
CN110796484B (en) Method and device for constructing customer activity degree prediction model and application method thereof
EP4038519A1 (en) Federated learning using heterogeneous model types and architectures
CN112039702B (en) Model parameter training method and device based on federal learning and mutual learning
WO2023185539A1 (en) Machine learning model training method, service data processing method, apparatuses, and systems
CN114580663A (en) Data non-independent same-distribution scene-oriented federal learning method and system
CN114186694B (en) Efficient, safe and low-communication longitudinal federal learning method
CN112100642B (en) Model training method and device for protecting privacy in distributed system
CN111860865A (en) Model construction and analysis method, device, electronic equipment and medium
CN113377797A (en) Method, device and system for jointly updating model
CN113360514A (en) Method, device and system for jointly updating model
CN114362948B (en) Federated derived feature logistic regression modeling method
CN115310625A (en) Longitudinal federated learning reasoning attack defense method
CN112560059A (en) Vertical federal model stealing defense method based on neural pathway feature extraction
CN116187469A (en) Client member reasoning attack method based on federal distillation learning framework
CN115618008A (en) Account state model construction method and device, computer equipment and storage medium
CN114282692A (en) Model training method and system for longitudinal federal learning
Mao et al. A novel user membership leakage attack in collaborative deep learning
CN114491616A (en) Block chain and homomorphic encryption-based federated learning method and application
US20230419182A1 (en) Methods and systems for imrpoving a product conversion rate based on federated learning and blockchain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant