CN112926088B - Federated learning privacy policy selection method based on game theory - Google Patents
Federated learning privacy policy selection method based on game theory
- Publication number
- CN112926088B (application CN202110292473.3A)
- Authority
- CN
- China
- Prior art keywords
- participants
- participant
- server
- service
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/042—Backward inferencing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a federated learning privacy policy selection method based on game theory, comprising the following steps: the server offers the participants thresholds at different service costs; each participant selects the optimal threshold according to whether it satisfies the required service quality, the privacy disclosure cost, and so on; and the server updates the service cost in the next training iteration. Through multiple iterations the server obtains the optimal model parameters, maintains a long-term stable service state of the model, and provides the model to the participants. The method effectively prevents malicious participant behavior such as free-riding, so that the server can collect service fees to the greatest extent and the participants obtain long-term high-quality service.
Description
Technical Field
The invention relates to the field of federated learning and privacy protection, and in particular to a federated learning privacy policy selection method based on game theory.
Background
Federated learning is a machine-learning process that cooperatively trains a model without collecting all of each participant's data. When the participants adopt no privacy-protection settings, the model trained by the server has the best service quality, but participant privacy may be disclosed, seriously harming the participants' interests; when the participants adopt very strong privacy-protection settings, their personal privacy is guaranteed, but the model's service quality suffers. A threshold is therefore needed to balance the participants' privacy-protection strength against the service quality of the server-trained model.
In model training, however, most participants are selfish: when given the right to protect their personal privacy information, they choose the maximum privacy-protection threshold, and when choosing a threshold they usually consider their immediate rather than long-term interests.
Although many incentive mechanisms have been proposed to address this selfishness in the trade-off between model service quality and privacy protection, most focus on one-shot game models, such as a one-shot game in outsourcing services that improves k-anonymity privacy preservation by designing a coalition policy and then sharing the resulting revenue among cooperating users. Moreover, most of these game processes rest on the complete-information assumption, i.e., that each player knows the strategies and payoffs of the other players. The complete-information assumption is difficult to satisfy in reality.
Existing trade-off strategy methods therefore do not consider long-term service benefits; that is, they cannot train a high-quality model that serves the participants over the long term while guaranteeing their privacy. Most existing methods also consider only a single game, whereas in practice the two sides may play repeatedly, so the influence of the current game on subsequent rounds must be taken into account.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a federated learning privacy policy selection method based on game theory: the server offers the participants thresholds at different service costs; each participant selects the optimal threshold according to whether it satisfies the required service quality, the privacy disclosure cost, and so on; and the server updates the service cost in the next training iteration. Through multiple iterations the server obtains the optimal model parameters, maintains a long-term stable service state of the model, and provides the model to the participants.
The aim of the invention is achieved by the following technical scheme:
A federated learning privacy policy selection method based on game theory, the method comprising: the server offers the participants, at different service costs, thresholds λ_MQ/λ_PI, i.e., the ratio of the model-quality trade-off parameter λ_MQ to the privacy-intensity trade-off parameter λ_PI; each participant selects the optimal threshold according to whether it satisfies the required service quality and the privacy disclosure cost, trains the model on its own data set, and sends the selected threshold and the training-updated model parameters to the server; the server aggregates the model parameters of all participants to train the model further, obtains optimized model parameters, updates the service threshold λ_MQ/λ_PI and the service cost, and sends them to the participants, so that the server keeps the model in a long-term stable service state and provides it to the participants.
Further, in each iteration the server calculates each participant's utility utility_i and the average utility over all participants, utility_avg = (Σ_i utility_i)/N. When a participant's utility_i is greater than or equal to the population average utility_avg, that participant keeps its strategy unchanged in the next iteration; otherwise, it reselects the threshold λ_MQ/λ_PI in the next iteration.
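The retention rule above can be sketched in a few lines; the function name, data shapes, and the `reselect` callback are our illustrative assumptions, not part of the patent.

```python
def next_round_thresholds(utilities, thresholds, reselect):
    """Keep participant i's threshold when utility_i >= the population
    average; otherwise let that participant pick a new threshold via
    the `reselect` callback (a stand-in for re-choosing lambda_MQ/lambda_PI)."""
    avg = sum(utilities) / len(utilities)
    return [t if u >= avg else reselect()
            for u, t in zip(utilities, thresholds)]
```

For example, with utilities [3, 1, 2] (average 2), only the second participant reselects its threshold for the next round.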
Further, a participant who trains the model with low-quality data obtains no reward, i.e., its reward is 0, while a participant who trains the model with high-quality data obtains a reward, as follows:
where x is the iteration number of the evolutionary game, b is the revenue the server awards to the participants, qos(ρ, r) = 1 - (1 - ρ)^r is the model's global QoS contribution function, ρ is the global QoS contribution, r_x is the number of participants selecting the high-quality threshold in the x-th iteration, r_fix is the value of outdated data in each new training round, qos_expect = p is the expected value of the global service quality, pdr(σ, r) is the participants' privacy contribution function, σ is the per-participant privacy contribution, Z is the probability that a participant selects the high-quality threshold, and N is the total number of participants.
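The global QoS contribution term qos(ρ, r) = 1 - (1 - ρ)^r given above can be computed directly; the helper below is a sketch (the function name and type hints are ours, the formula is the patent's).

```python
def qos(rho: float, r: int) -> float:
    """qos(rho, r) = 1 - (1 - rho)**r: the global QoS contribution when
    r participants, each contributing rho, select the high-quality
    threshold. It approaches 1 as r grows, so every extra high-quality
    participant helps, but with diminishing returns."""
    return 1.0 - (1.0 - rho) ** r
```

The monotonicity in r is what lets the server trade reward budget against the number of high-quality participants it needs.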
Further, by adjusting the reward budget b, the server finally stabilizes the whole evolutionary game model at a probability Z. The probability Z is the probability, sought during model training and updating, that a participant selects the high-quality threshold, which guarantees the long-term service quality of the model. After multiple iterations, the participants in the whole service finally train with high-quality data with probability Z and with low-quality data with probability 1 - Z, where Z is calculated as Z = utility_avg/utility_i.
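The stated relation Z = utility_avg/utility_i can be sketched as follows; clamping the ratio into [0, 1] is our assumption, added only so the ratio can serve as a probability.

```python
def high_quality_probability(avg_utility: float, utility_i: float) -> float:
    """Z = avg_utility / utility_i per the text; the clamp to [0, 1] is
    our assumption so the ratio is usable as a selection probability."""
    return max(0.0, min(1.0, avg_utility / utility_i))
```

A participant whose utility sits well above the average thus gets a low Z, consistent with the text's rule that above-average participants keep their current strategy.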
Further, the participant utility utility_i is differentiated by data quality; the specific calculation formula is:
where j denotes the level of high-quality data, reward(x) denotes the reward of each participant in iteration round x, and cost(x) is the service fee each participant pays in iteration round x.
Further, the method comprises the following steps:
step S1: by using different thresholds lambda MQ /λ PI Training a model and for each threshold lambda MQ /λ PI Setting different service fees; wherein lambda is MQ Lambda is a trade-off parameter for model quality PI A trade-off parameter for privacy intensity;
step S2: the server sends the trained models, different thresholds, corresponding service fees and the like to each participant for selection;
step S3: each participant selects a proper threshold lambda according to the requirements of the participants on the service quality, the privacy protection strength and the like MQ /λ PI And pays the service fee corresponding to the server;
step S4: after the participants finish the selection, using a local data set training model, and sending the threshold value, the corresponding cost and the updated parameters to a server;
step S5: the server gathers and adjusts the model parameters of each participant, trains the model again, and resets the service charge according to the selection condition of the threshold value;
step S6: the server selects a threshold lambda for each participant MQ /λ PI Compared with the threshold range set by the server, when the threshold lambda is selected by the participant MQ /λ PI If the service quality is not within the threshold range set by the server, the participant is considered to select low service quality, and no rewards are obtained, and the benefit is 0; when the participant selects the threshold lambda MQ /λ PI When within the threshold set by the server, the participant is considered to be selecting a high quality of service,
step S7: the server uses the utility of each participant i Calculating utility average value of overall participants of the iterationutility,utility=∑utility i N, and send to each participant, calculate the utility of the participant i The following are provided:
step S8: repeating the steps S2 to S7, and keeping the threshold selection of the current iteration unchanged when the utility of the participant is greater than or equal to the average utility value of the population; otherwise, the participant needs to reselect the threshold.
The beneficial effects of the invention are as follows:
(1) Good participant privacy protection: the method effectively avoids leakage of the model participants' data privacy. Taking the federated-learning train-aggregate-update model as the scenario and using the evolutionary game from game theory, participants' privacy is guaranteed through their own choices while they obtain long-term high-quality service.
(2) Good long-term server service: the invention effectively prevents malicious participant behavior such as free-riding. Taking the federated-learning train-aggregate-update model as the scenario and using the evolutionary game from game theory, the server obtains high-quality data through the participants' choices, trains the aggregated and updated model, and provides long-term high-quality service to the participants.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art could obtain other drawings from them without inventive effort.
Fig. 1 is a flow chart of the federated learning privacy policy selection method based on game theory of the present invention.
Fig. 2 is a schematic diagram of the interaction of a server and a participant in the method of the invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific examples and the accompanying drawings; from this disclosure, those skilled in the art will readily understand other advantages and effects of the present invention. The invention may also be practiced or applied in other, different embodiments, and the details herein may be modified in various ways without departing from the spirit of the invention.
As shown in Figs. 1-2, the federated learning privacy policy selection method based on game theory according to an embodiment of the present invention includes the following steps:
step S1: by using different thresholds lambda MQ /λ PI Training a model and for each threshold lambda MQ /λ PI Setting different service fees; which is a kind ofIn lambda, lambda MQ Lambda is a trade-off parameter for model quality PI A trade-off parameter for privacy intensity;
step S2: the server sends the trained models, different thresholds, corresponding service fees and the like to each participant for selection;
step S3: each participant selects a proper threshold lambda according to the requirements of the participants on the service quality, the privacy protection strength and the like MQ /λ PI And pays the service fee corresponding to the server;
step S4: after the participants finish the selection, using a local data set training model, and sending the threshold value, the corresponding cost and the updated parameters to a server;
step S5: the server gathers and adjusts the model parameters of each participant, trains the model again, and resets the service charge according to the selection condition of the threshold value;
step S6: the server selects a threshold lambda for each participant MQ /λ PI Compared with the threshold range set by the server, when the threshold lambda is selected by the participant MQ /λ PI If the service quality is not within the threshold range set by the server, the participant is considered to select low service quality, and no rewards are obtained, and the benefit is 0; when the participant selects the threshold lambda MQ /λ PI When within the threshold set by the server, the participant is considered to be selecting a high quality of service,
step S7: the server uses the utility of each participant i Calculating utility average value of overall participants of the iterationutility,utility=∑utility i N, and send to each participant;
where j represents the level of high quality data, x is the number of iterations of the evolutionary game, reward (x) represents the awards of the iterative participants of the x-th round, cost (x) is the expense of the iterative participants of the x-th round, and b is the benefit awarded to the participants by the server,qos(ρ,r)=1-(1-ρ) r As a function of the contribution degree of the global service quality of the model, ρ is the contribution degree of the global service quality, r x To select the number of participants for the high quality threshold in the xth iteration, r fix Qos for the value of outdated data in each round of new training expect P is the expected value of global service quality, pdr (sigma, r) is the contribution function of the privacy of the participants when privacy leakage occurs, sigma is the contribution degree of the privacy of the participants, Z is the probability of the participants selecting a high quality threshold value, and N is the total number of the participants;
step S8: repeating the steps S2 to S7, and keeping the threshold selection of the current iteration unchanged when the utility of the participant is greater than or equal to the average utility value of the population; otherwise, the participant needs to reselect the threshold. The server eventually stabilizes the entire evolutionary game model at a probability Z by adjusting the budget b of the rewards.
Since parameters such as each participant's p, ρ, σ, qos_expect, r_{x-1}, and r_fix are statistically stable (the amount of outdated data is statistically stable), a participant could easily calculate its own utility if it knew how many participants chose the high-quality threshold. However, under the incomplete-information assumption and with boundedly rational participants, they cannot learn the other participants' choices until the iteration is completed.
One embodiment of the application of the present invention in a smart medical scenario is as follows.
In the smart-medical scenario, the hospital server and the diagnostic data of each participant jointly train a Parkinson's disease monitoring system through federated learning, realizing online Parkinson's diagnosis. The specific implementation process is as follows:
step S1: the hospital server trains the model by using different thresholds (the trade-off weight ratio of the parkinsonism monitoring system service quality and the participant privacy classifier), and at the same time, each threshold is set by the hospital server to different service fees;
step S2: the hospital server sends the trained model and different thresholds and corresponding service fees to each participant for selection and training;
step S3: each participant selects a threshold according to whether the service effect of parkinsonism monitoring, the privacy protection intensity of the participant and the like are met, and pays the service fee corresponding to the hospital server;
step S4: each participant uses own data set training model, and sends the selected threshold value and the model parameters after training update to the server;
step S5: the hospital server retrains and updates the parkinsonism system according to the threshold value selected by each participant and the submitted parameters, and resets the service fee of each threshold value;
step S6: the hospital server compares the threshold selected by the participant with the threshold range set by the server, and when the threshold selected by the participant is not in the threshold range set by the server, the hospital server considers that the participant selects low service quality, and rewards are not obtained, and the benefit is 0; when the threshold selected by the participant is within the threshold range set by the server, the participant is considered to be selected by high service quality;
step S7: the hospital server calculates the utility of each participant i And ensemble averaging utility for this iterationutilityAnd send to each participant;
step S8: repeating S2-S7, and obtaining the utility of the party i Greater than or equal to utility averageutilityWhen the current iteration threshold is selected as the previous iteration, the utility of the participator is kept the same i Less than the utility averageutilityAt this time, the threshold is reselected for training.
In a real scenario, because a federated-learning-based application is a long-term service, the model should be updated continuously to cope with concept drift, but long-term service cannot be provided unless a certain number of participants supply high-quality data. The invention therefore models the problem as an evolutionary game: through repeated iterations between the participants and the server, an evolutionarily stable strategy is found (a strategy that, once adopted by most participants, cannot be displaced by any alternative strategy), which keeps the model in a stable service state, lets the server collect service fees to the greatest extent, and brings long-term benefits to the participants.
Those skilled in the art will appreciate that the foregoing describes a preferred embodiment of the invention and is not intended to limit it; the technical solutions described in the foregoing embodiments may still be modified, or some of their elements replaced by equivalents. Any modifications, equivalents, and alternatives falling within the spirit and principles of the invention are intended to be included within its scope.
Claims (3)
1. A federated learning privacy policy selection method based on game theory, characterized by comprising the following steps: the server offers the participants, at different service costs, thresholds λ_MQ/λ_PI, i.e., the ratio of the model-quality trade-off parameter λ_MQ to the privacy-intensity trade-off parameter λ_PI; each participant selects the optimal threshold according to whether it satisfies the required service quality and the privacy disclosure cost, trains the model on its own data set, and sends the selected threshold and the training-updated model parameters to the server; the server aggregates the model parameters of all participants to train the model further, obtains optimized model parameters, updates the service threshold λ_MQ/λ_PI and the service cost, and sends them to the participants, so that the server keeps the model in a long-term stable service state and provides it to the participants;
in each iteration the server calculates each participant's utility utility_i and the average utility over all participants, utility_avg = (Σ_i utility_i)/N; when a participant's utility_i is greater than or equal to the population average utility_avg, that participant keeps its strategy unchanged in the next iteration; otherwise, it reselects the threshold λ_MQ/λ_PI in the next iteration;
a participant who trains the model with low-quality data obtains no reward, i.e., its reward is 0, while a participant who trains the model with high-quality data obtains a reward, as follows:
where x is the iteration number of the evolutionary game, b is the revenue the server awards to the participants, qos(ρ, r) = 1 - (1 - ρ)^r is the model's global QoS contribution function, ρ is the global QoS contribution, r_x is the number of participants selecting the high-quality threshold in the x-th iteration, r_fix is the value of outdated data in each new training round, qos_expect = p is the expected value of the global service quality, pdr(σ, r) is the participants' privacy contribution function when privacy leakage occurs, σ is the per-participant privacy contribution, Z is the probability that a participant selects the high-quality threshold, and N is the total number of participants;
by adjusting the reward budget b, the server finally stabilizes the whole evolutionary game model at a probability Z; the probability Z is the probability, sought during model training and updating, that a participant selects the high-quality threshold, which guarantees the long-term service quality of the model; after multiple iterations, the participants in the whole service finally train with high-quality data with probability Z and with low-quality data with probability 1 - Z, where Z is calculated as Z = utility_avg/utility_i.
2. The federated learning privacy policy selection method based on game theory according to claim 1, characterized in that: the participant utility utility_i is differentiated by data quality; the specific calculation formula is:
where j denotes the level of high-quality data, reward(x) denotes the reward of each participant in iteration round x, and cost(x) is the service fee each participant pays in iteration round x.
3. The federated learning privacy policy selection method based on game theory according to claim 1, characterized in that the method comprises the following steps:
Step S1: the server trains the model with different thresholds λ_MQ/λ_PI and sets a different service fee for each threshold λ_MQ/λ_PI, where λ_MQ is the model-quality trade-off parameter and λ_PI is the privacy-intensity trade-off parameter;
Step S2: the server sends the trained models, the different thresholds, the corresponding service fees, etc. to each participant for selection;
Step S3: each participant selects a suitable threshold λ_MQ/λ_PI according to its own requirements on service quality, privacy-protection strength, etc., and pays the corresponding service fee to the server;
Step S4: after making its selection, each participant trains the model on its local data set and sends the threshold, the corresponding fee, and the updated parameters to the server;
Step S5: the server aggregates and adjusts the model parameters of all participants, trains the model again, and resets the service fees according to how the thresholds were selected;
Step S6: the server compares the threshold λ_MQ/λ_PI selected by each participant with the threshold range it has set; if the selected threshold λ_MQ/λ_PI is not within the server's threshold range, the participant is considered to have selected low service quality and obtains no reward, i.e., a benefit of 0; if the selected threshold λ_MQ/λ_PI is within the server's threshold range, the participant is considered to have selected high service quality;
Step S7: the server uses each participant's utility utility_i to calculate the iteration's population average utility utility_avg = (Σ_i utility_i)/N and sends it to each participant;
the participant utility utility_i is calculated as follows:
Step S8: steps S2 to S7 are repeated; when a participant's utility is greater than or equal to the population average utility, its threshold selection from the current iteration is kept unchanged; otherwise, the participant must reselect its threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110292473.3A CN112926088B (en) | 2021-03-18 | 2021-03-18 | Federal learning privacy policy selection method based on game theory |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110292473.3A CN112926088B (en) | 2021-03-18 | 2021-03-18 | Federal learning privacy policy selection method based on game theory |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112926088A CN112926088A (en) | 2021-06-08 |
CN112926088B true CN112926088B (en) | 2024-03-19 |
Family
ID=76175100
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110292473.3A Active CN112926088B (en) | 2021-03-18 | 2021-03-18 | Federal learning privacy policy selection method based on game theory |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112926088B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019056572A1 (en) * | 2017-09-25 | 2019-03-28 | 深圳大学 | Model-based collaborative filtering method for collaborative web quality-of-service prediction for privacy protection |
CN111611610A (en) * | 2020-04-12 | 2020-09-01 | 西安电子科技大学 | Federal learning information processing method, system, storage medium, program, and terminal |
CN111754000A (en) * | 2020-06-24 | 2020-10-09 | 清华大学 | Quality-aware edge intelligent federal learning method and system |
CN112257063A (en) * | 2020-10-19 | 2021-01-22 | 上海交通大学 | Cooperative game theory-based detection method for backdoor attacks in federal learning |
-
2021
- 2021-03-18 CN CN202110292473.3A patent/CN112926088B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019056572A1 (en) * | 2017-09-25 | 2019-03-28 | 深圳大学 | Model-based collaborative filtering method for collaborative web quality-of-service prediction for privacy protection |
CN111611610A (en) * | 2020-04-12 | 2020-09-01 | 西安电子科技大学 | Federal learning information processing method, system, storage medium, program, and terminal |
CN111754000A (en) * | 2020-06-24 | 2020-10-09 | 清华大学 | Quality-aware edge intelligent federal learning method and system |
CN112257063A (en) * | 2020-10-19 | 2021-01-22 | 上海交通大学 | Cooperative game theory-based detection method for backdoor attacks in federal learning |
Non-Patent Citations (1)
Title |
---|
A Survey of Game-Theory-Based Privacy Protection Methods; Zhou Dandan; Li Weiwei; Sun Yuqing; Journal of Chinese Computer Systems (No. 12); full text *
Also Published As
Publication number | Publication date |
---|---|
CN112926088A (en) | 2021-06-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107194723B (en) | Bidirectional matching recommendation method for borrowing items and lenders in network micropayment | |
CN107426621B (en) | A kind of method and system showing any active ues image in mobile terminal direct broadcasting room | |
JP2016048535A (en) | Method and apparatus for prediction based on multi-source heterogeneous data | |
CN111432361B (en) | User incentive strategy determination method and system based on crowd sensing network | |
CN111292001B (en) | Combined decision method and device based on reinforcement learning | |
CN115345317B (en) | Fair reward distribution method facing federal learning based on fairness theory | |
CN113724096B (en) | Group knowledge sharing method based on public evolution game model | |
Panagopoulos et al. | Modeling and evaluating a robust feedback-based reputation system for e-commerce platforms | |
CN111565188B (en) | VANET trust model working method based on combination of message type and trust value confidence | |
CN111861661A (en) | Electric vehicle charging transaction method and device | |
Huang et al. | Crowdsourcing with heterogeneous workers in social networks | |
Cai et al. | 2cp: Decentralized protocols to transparently evaluate contributivity in blockchain federated learning environments | |
CN112926088B (en) | Federal learning privacy policy selection method based on game theory | |
CN113298668B (en) | Mobile crowd-sourcing aware user large-scale rapid recruitment method considering social network | |
Tian et al. | Stick or carrot for traffic demand management? Evidence from experimental economics | |
CN112101528B (en) | Terminal contribution measurement method based on back propagation | |
CN114116705A (en) | Method and device for determining contribution value of participants in joint learning | |
CN116451806A (en) | Federal learning incentive distribution method and device based on block chain | |
Nayak et al. | Dynamic advertising in VANETs using repeated auctions | |
Zhang et al. | The evolution of cooperation in public goods games on the scale-free community network under multiple strategy-updating rules | |
CN111030764A (en) | Crowdsourcing user information age management algorithm based on random game online learning | |
Wu et al. | Multi-agent Bayesian Learning with Best Response Dynamics: Convergence and Stability | |
CN117114126B (en) | Web3.0 federal learning cloud architecture and excitation method | |
Garcia Alvarado | Network-based policies versus tax evasion | |
Lv et al. | MODELING and simulation of live streaming e-commerce information dissemination considering opinion leader in social networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |