CN114330587A - Federal learning incentive method under specific index - Google Patents

Federal learning incentive method under specific index

Info

Publication number
CN114330587A
CN114330587A
Authority
CN
China
Prior art keywords
data
platform server
platform
model
island
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210001509.2A
Other languages
Chinese (zh)
Inventor
王丽霞
王大维
王南
高强
刘晓强
教传铭
曲睿婷
胡非
张福良
张戈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Information and Telecommunication Branch of State Grid Liaoning Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Information and Telecommunication Branch of State Grid Liaoning Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Information and Telecommunication Branch of State Grid Liaoning Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202210001509.2A priority Critical patent/CN114330587A/en
Publication of CN114330587A publication Critical patent/CN114330587A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a two-stage federated learning incentive method under a specific index, which comprises the following steps: receiving a platform model precision improvement task index issued by a platform server; formulating a learning strategy according to the model precision improvement target issued by the platform server; training based on the learning strategy and obtaining the total reward amount set by the platform server; and obtaining the reward amount distributed by the platform server in proportion to the contribution to the improvement of the platform model precision value. The two-stage federated learning incentive mechanism under a specific model precision index can be combined with the actual situation to reduce unnecessary cost waste; the incentive mechanism, designed from the perspective of data quality and data quantity, is more comprehensive and scientific and systematically improves the training efficiency of federated learning.

Description

Federated learning incentive method under a specific index
Technical Field
The invention belongs to the field of distributed machine learning and provides a federated learning incentive method under a specific index.
Background
With the continuous development of machine learning technology, data security has become an unavoidable problem, and federated learning, as a new distributed machine learning paradigm, can solve the data privacy problem well. The basic federated learning model addresses data privacy, but, much like crowd sensing, such techniques still face another problem: collaboration between the data islands and the platform server becomes inefficient. It is therefore common practice to design appropriate incentive schemes to maximize the benefits of each participant and of society.
The main research directions of federated learning incentive mechanisms are the Stackelberg game, auctions, contract theory, the Shapley value, reinforcement learning, blockchain, and the like. The Stackelberg game can model the relationships among all the relevant parties in federated learning well: the relationship between the platform server and the data islands is described as a leader-follower game. However, current research mainly focuses on complex incentive mechanisms under uncertain, theoretically constructed indexes, whereas in reality the accuracy of the training model may only need to meet a specific index. Pursuing only the theoretical optimal solution without considering the actual situation neglects the model precision redundancy problem in actual operation and may increase cost; moreover, data quality and data quantity have not been used effectively as the basis of the incentive scheme.
Disclosure of Invention
In view of the above problems, the present invention provides a federated learning incentive method under a specific index, which is suitable for collaboration between a platform server and a plurality of data islands and comprises the following steps,
s1: receiving a platform model precision improvement task index issued by a platform server;
s2: making a learning strategy according to a model precision improvement target issued by a platform server;
s3: training and acquiring the total reward amount of the platform server based on the learning strategy;
s4: obtaining the reward amount distributed by the platform server according to the proportion of the contribution to the improvement of the platform model precision value.
Further, in step S2 the data island formulates a learning strategy based on maximizing its own utility; the specific steps are as follows,
1) establishing a utility model of a data island:
U_i = R_i - C_i, i ∈ {1, ..., N},    (1)
where it is set that
[Equation image not reproduced: R_i, the reward of data island i, allocated from the total reward in proportion to Δθ_i]
C_i = v_i·a_i + μ_i·q_i, Δθ_i = σ·log_κ(q_i·a_i);
where U_i is the utility of data island i, R_i is the reward obtained by data island i, C_i is the training cost of data island i, Δθ_i is the improvement of the model training precision contributed by data island i, a_i is the data quantity, q_i is the data quality, v_i is the combined data computation and storage cost parameter of data island i, μ_i is the data processing cost parameter of data island i, κ > 1 is a training parameter, and σ is a precision parameter;
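For readability, the utility model can be restated in consolidated notation. The explicit form of R_i shown below is an interpretation inferred from the proportional distribution rule of step S4 and equations (6) to (8); it is not a reproduction of the original formula image.

```latex
% Consolidated restatement; the form of R_i is an assumption (see lead-in).
\begin{aligned}
  U_i &= R_i - C_i, \qquad i \in \{1,\dots,N\},\\
  C_i &= v_i a_i + \mu_i q_i, \qquad
  \Delta\theta_i = \sigma \log_{\kappa}(q_i a_i),\\
  R_i &= \frac{\Delta\theta_i}{\sum_{j=1}^{N} \Delta\theta_j}\,\gamma N
  \qquad \text{(assumed: the total reward } \gamma N \text{ is shared in proportion to } \Delta\theta_i\text{).}
\end{aligned}
```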
2) based on the utility maximization of the data island, an objective function is established for the utility model:
max_{(a_i, q_i)} U_i,
where the decision variables of data island i are the number a_i of data sets participating in training and the data quality q_i, i.e. its own utility-maximization strategy; the second stage is a Nash equilibrium game among the data islands:
U_i(a_i*, q_i*) ≥ U_i(a_i, q_i) for every alternative strategy (a_i, q_i) of data island i, with the strategies of the other islands held fixed.
Solving the second-stage game:
the first derivative of U_i with respect to q_i: [equation image not reproduced];
the first derivative of U_i with respect to a_i: [equation image not reproduced];
calculating the Hessian matrix: [equation images not reproduced];
solving the system of equations [equation image not reproduced], the decision variables for training are obtained as [equation image not reproduced].
further, the platform server maximizes the total reward amount based on the effect thereof, and the specific steps are as follows:
1) establishing a platform server total reward information calculation model:
U=V-R, (3)
the setting is carried out in a way that,
Figure BDA0003454549490000033
u is the utility obtained by the platform server, V represents the total valuation increment of the model and is set as a constant, R represents the total incentive cost paid by the platform server, gamma is the average reward amount of the platform decision, and N is the number of data islands;
2) based on the first-stage game between the platform server and the data islands, the utility of the platform server is maximized, and the objective function is established as:
max_γ U,
where the decision variable of the platform server is the average reward amount γ provided by the platform;
substituting the optimal decision variables of the data islands obtained above [equation image not reproduced] into the platform server objective function gives [equation image not reproduced];
taking the first derivative with respect to γ [equation image not reproduced], letting the first derivative be zero [equation image not reproduced], and solving gives [equation image not reproduced];
the optimal strategy value on the platform server side is γ*, which gives the actual total reward amount.
Further, from the decision variables of each data island, namely the data set quantity a_i and the data quality q_i, the precision improvement contributed by a specific island to the platform model training and its contribution ratio are calculated through Δθ_i = σ·log_κ(q_i·a_i); the platform server distributes the rewards according to this proportion:
[Equation (6): image not reproduced]
[Equation (7): image not reproduced]
according to (6) and (7), there is:
[Equation (8): image not reproduced; the reward R_i of data island i, allocated from the total reward in proportion to Δθ_i / Σ_j Δθ_j]
the two-stage federal learning incentive mechanism under the specific model precision index can be combined with the actual situation, unnecessary cost waste is reduced, the incentive mechanism designed from the angle of data quality and data quantity is more comprehensive and scientific, and the training efficiency of federal learning is systematically improved.
Drawings
FIG. 1 is a schematic overall flow diagram of the present invention;
FIG. 2 is a schematic diagram of one training round of the federated learning model under a specific accuracy index.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to FIG. 1, the present invention provides a federated learning incentive method under a specific index, suitable for collaboration between a platform server and a plurality of data islands, comprising the following steps performed by each data island:
s1: receiving a platform model precision improvement task index issued by a platform server;
s2: making a learning strategy according to a model precision improvement target issued by a platform server;
s3: training and acquiring the reward amount of the platform server based on the learning strategy;
s4: obtaining the share of the total reward amount distributed by the platform server according to the proportion of the contribution to the improvement of the platform model precision value.
The study rests mainly on two hypotheses: the training data cost of a data island is related to the quality and quantity of its data, and the accuracy improvement of the data model is likewise related to the quality and quantity of the data. The Stackelberg game is used for the analysis: the first stage of the two-stage game is a leader-follower game between the platform server and the data islands; the second stage is a Nash equilibrium game among the data islands, meaning that for any data island i the final strategy is the one with the maximum utility, i.e. no other strategy yields a higher utility than the final strategy. When this holds for all data islands, a Nash equilibrium is reached among the data islands.
The specific implementation mode is as follows:
s1: receiving a platform model precision improvement task index issued by a platform server;
s2: making a learning strategy according to a model precision improvement target issued by a platform server;
the data island is used for making a learning strategy based on self utility maximization, and the specific steps are as follows,
1) establishing a utility model of a data island:
U_i = R_i - C_i, i ∈ {1, ..., N},    (1)
where it is set that
[Equation image not reproduced: R_i, the reward of data island i, allocated from the total reward in proportion to Δθ_i]
C_i = v_i·a_i + μ_i·q_i, Δθ_i = σ·log_κ(q_i·a_i);
where U_i is the utility of data island i; R_i is the reward obtained by data island i; C_i is the training cost of data island i; Δθ_i is the improvement of the model training precision contributed by data island i; a_i is the data quantity; q_i is the data quality; v_i is the combined data computation and storage cost parameter of data island i, a known fixed parameter; μ_i is the data processing cost parameter of data island i, a known fixed parameter; κ > 1 is a training parameter and σ is a precision parameter, both known fixed parameters. The higher the data quantity a_i, the higher the data computation and storage cost; the higher the data quality q_i, the higher the data processing cost. The higher the data quality and data quantity, the higher the accuracy of the model parameters and the easier it is to raise Δθ_i, but the improvement of model parameter accuracy by data quality and data quantity follows a law of diminishing marginal returns.
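As an illustration only, the following sketch computes the utility U_i of a single data island under these definitions. The reward form (total reward γN shared in proportion to Δθ_i) is an assumption based on the distribution rule of step S4, and all numeric values are hypothetical.

```python
import math

def delta_theta(q_i: float, a_i: float, sigma: float, kappa: float) -> float:
    """Accuracy improvement contributed by data island i: sigma * log_kappa(q_i * a_i)."""
    return sigma * math.log(q_i * a_i, kappa)

def island_utility(q_i, a_i, v_i, mu_i, sigma, kappa, gamma, n_islands, sum_delta_theta):
    """U_i = R_i - C_i, with R_i taken as the share of the total reward gamma * N
    allocated in proportion to Delta_theta_i (assumed reward form)."""
    d_i = delta_theta(q_i, a_i, sigma, kappa)
    reward = gamma * n_islands * d_i / sum_delta_theta      # R_i (assumed form)
    cost = v_i * a_i + mu_i * q_i                           # C_i = v_i*a_i + mu_i*q_i
    return reward - cost

# Hypothetical numbers, for illustration only.
sigma, kappa, gamma, N = 0.05, 2.0, 10.0, 5
print(island_utility(q_i=0.8, a_i=200.0, v_i=0.01, mu_i=0.5,
                     sigma=sigma, kappa=kappa, gamma=gamma,
                     n_islands=N, sum_delta_theta=2.0))
```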
2) Based on the utility maximization of the data island, an objective function is established for the utility model:
max_{(a_i, q_i)} U_i,
where the decision variables of data island i are the number a_i of data sets participating in training and the data quality q_i, and the optimal strategy under this condition is its own utility-maximization strategy; the second stage is a Nash equilibrium game among the data islands:
U_i(a_i*, q_i*) ≥ U_i(a_i, q_i) for every alternative strategy (a_i, q_i) of data island i, with the strategies of the other islands held fixed.
the meaning of the method is that for any data island i, the final strategy result is the result with the maximum utility, namely the utility of any other strategy is not as great as the final strategy utility. When all data islands meet the requirements, it can be said that a nash equilibrium state is achieved between the data islands.
Solving the second-stage game determines, under self-utility maximization, the optimal data quantity and data quality of each data island, i.e. its local precision target:
the first derivative of U_i with respect to q_i: [equation image not reproduced];
the first derivative of U_i with respect to a_i: [equation image not reproduced];
calculating the Hessian matrix: [equation images not reproduced];
solving the system of equations [equation image not reproduced] gives the optimal strategy of data participation in learning, i.e. the optimal data quantity a_i* and data quality q_i* [equation image not reproduced].
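As a numerical illustration of the first-order and second-order conditions above, the following sketch evaluates finite-difference approximations of the first derivatives and the Hessian of U_i at a candidate strategy. The proportional reward form and all parameter values are assumptions made for illustration only.

```python
import math

def u_i(q, a, v=0.01, mu=0.5, sigma=0.05, kappa=2.0, gamma=10.0, n=5, d_others=1.5):
    """Utility of one island; the other islands' total contribution d_others is held fixed."""
    d = sigma * math.log(q * a, kappa)
    return gamma * n * d / (d + d_others) - (v * a + mu * q)

def grad_and_hessian(q, a, h=1e-3):
    """Central finite-difference gradient and Hessian of u_i at (q, a)."""
    dq = (u_i(q + h, a) - u_i(q - h, a)) / (2 * h)
    da = (u_i(q, a + h) - u_i(q, a - h)) / (2 * h)
    dqq = (u_i(q + h, a) - 2 * u_i(q, a) + u_i(q - h, a)) / h ** 2
    daa = (u_i(q, a + h) - 2 * u_i(q, a) + u_i(q, a - h)) / h ** 2
    dqa = (u_i(q + h, a + h) - u_i(q + h, a - h)
           - u_i(q - h, a + h) + u_i(q - h, a - h)) / (4 * h ** 2)
    return (dq, da), ((dqq, dqa), (dqa, daa))

grad, hess = grad_and_hessian(q=0.8, a=200.0)
print("first derivatives:", grad)
print("Hessian:", hess)
# At an interior optimum the first derivatives are approximately zero and the
# Hessian is negative definite, which is what the Hessian check above establishes.
```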
s3: training and acquiring the total reward amount of the platform server based on the learning strategy;
1) establishing the total reward amount calculation model of the platform server:
U = V - R,    (3)
where it is set that
[Equation image not reproduced: the total incentive cost R expressed through the average reward amount γ and the number of data islands N],
U is the utility obtained by the platform server; V represents the total valuation increment of the model, i.e. the increment of the model's estimated value, which is determined by the platform or a third party according to the actual conditions of the specific model and can therefore reasonably be assumed to be a given constant; R represents the total incentive cost paid by the platform server; γ is the average reward amount decided by the platform, i.e. the platform adjusts the degree of incentive by deciding the average reward amount, thereby regulating the whole incentive mechanism, and finally obtains, according to the training situation of the data islands, the average reward amount that maximizes the platform's utility; and N is the number of data islands;
2) Because the precision of the training model parameters is a value given by the server platform, and the utility function of the server platform is the valuation increment of the model (based on the data quality) minus the total incentive R paid out, the smaller the total incentive R paid out by the server platform, the greater the utility of the server. Therefore, the objective function of the server platform is:
max_γ U,
where the decision variable of the platform server is the average reward amount γ provided by the platform;
substituting the optimal decision variables of the data islands obtained above [equation image not reproduced] into the platform server objective function gives [equation image not reproduced];
taking the first derivative with respect to γ [equation image not reproduced], letting the first derivative be zero [equation image not reproduced], and solving gives [equation image not reproduced];
the optimal strategy value on the platform server side is γ*; its significance is that, once the accuracy requirement of the specific model is determined, the platform server only needs to set the corresponding reward amount, and the utility of the platform server is maximized on the premise that the accuracy of the specific model is achieved.
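As an illustration of the platform-side (first-stage) decision, the sketch below searches over the average reward amount γ for the smallest value whose induced total precision improvement reaches the required index, which maximizes U = V - γN when V is a constant and the total reward is taken as γN (an assumption consistent with γ being the average reward over N islands). The islands' response to γ is represented by a stand-in function and is a pure assumption for illustration.

```python
def induced_precision_gain(gamma: float) -> float:
    """Stand-in for the equilibrium total precision improvement sum(Delta_theta_i)
    induced by an average reward gamma; a real implementation would solve the
    second-stage game (e.g. by best-response dynamics). Hypothetical concave response."""
    return 2.0 * (1.0 - 1.0 / (1.0 + gamma))

def platform_choice(target_gain: float, n_islands: int, v_total: float, gamma_grid=None):
    """Pick the smallest gamma on a grid that meets the accuracy target;
    platform utility is U = V - gamma * N, so a smaller gamma means a higher utility."""
    if gamma_grid is None:
        gamma_grid = [0.1 * k for k in range(1, 501)]
    for gamma in gamma_grid:
        if induced_precision_gain(gamma) >= target_gain:
            return gamma, v_total - gamma * n_islands
    return None, None

# Hypothetical numbers, for illustration only.
gamma_star, platform_utility = platform_choice(target_gain=1.5, n_islands=5, v_total=100.0)
print(gamma_star, platform_utility)
```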
S4: distributing the total reward amount according to the ratio of the precision improvement contributed by each single data island to the overall improvement of the platform model precision value.
Using the above data set quantity a_i and data quality q_i, the precision improvement contributed by a specific island to the platform model training is calculated through Δθ_i = σ·log_κ(q_i·a_i); the platform server distributes the rewards according to this proportion:
[Equation (6): image not reproduced]
[Equation (7): image not reproduced]
according to (6) and (7), there is:
[Equation (8): image not reproduced; the reward R_i of data island i, allocated from the total reward in proportion to Δθ_i / Σ_j Δθ_j]
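A minimal sketch of the proportional distribution described by equations (6) to (8): each island's share of the total reward equals its Δθ_i divided by the sum over all islands. Taking the total reward to be γN is an assumption consistent with γ being the average reward over N islands.

```python
import math

def distribute_rewards(q, a, sigma=0.05, kappa=2.0, gamma=10.0):
    """Split the total reward gamma * N among islands in proportion to
    Delta_theta_i = sigma * log_kappa(q_i * a_i)."""
    n = len(q)
    d = [sigma * math.log(q[i] * a[i], kappa) for i in range(n)]
    total = sum(d)
    return [gamma * n * d_i / total for d_i in d]

# Hypothetical three-island example; the shares sum to gamma * N = 30.
print(distribute_rewards(q=[0.9, 0.6, 0.3], a=[300.0, 200.0, 100.0]))
```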
in the scheme, the data quality evaluation generally comprises consistency, integrity and timeliness, the platform issues a thirteen-dimensional table comprising three indexes of data type, data integrity and data timeliness, namely the data type, the data integrity and the data timeliness respectively correspond to three dimensions, and the three dimensions are represented by values in an interval of 0-1. The standards of the three index platforms are all 1, then the data islands are compared after data of the data islands are input into a table, namely the data type dimension is the number of data types owned by the data islands in the type types provided by the platforms; the data integrity is the integrity degree of data owned by the data island in the platform; the data timeliness is how much the data timeliness owned by the data island accounts for the timeliness standard provided by the platform. And finally, realizing the quantification of the data quality of each data island by the weighted average of the three indexes.
Referring to FIG. 2, the federated learning incentive scheme model proposed in the present invention mainly targets training under the requirement of a specific model accuracy index. In practice, meeting the requirement of a specific accuracy index makes the training mechanism more efficient and avoids wasting training cost on unnecessarily high training accuracy. In addition, although the model training process may run for multiple rounds, a single round of the training mechanism can be analyzed; the end of one round can be regarded as the start of the next, so the multi-round process is simply this mechanism repeated, and the mechanism of the present invention therefore simplifies the training process.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (4)

1. A federated learning incentive method under a specific index, suitable for collaboration between a platform server and a plurality of data islands, characterized by comprising the following steps:
s1: receiving a platform model precision improvement task index issued by a platform server;
s2: making a learning strategy according to a model precision improvement target issued by a platform server;
s3: training and acquiring the total reward amount of the platform server based on the learning strategy;
s4: obtaining the reward amount distributed by the platform server according to the proportion of the contribution to the improvement of the platform model precision value.
2. The federated learning incentive method under a specific index as claimed in claim 1, wherein in step S2 the data island formulates a learning strategy based on maximizing its own utility, the specific steps being as follows,
1) establishing a utility model of a data island:
U_i = R_i - C_i, i ∈ {1, ..., N},    (1)
where it is set that
[Equation image not reproduced: R_i, the reward of data island i, allocated from the total reward in proportion to Δθ_i]
C_i = v_i·a_i + μ_i·q_i, Δθ_i = σ·log_κ(q_i·a_i);
where U_i is the utility of data island i, R_i is the reward obtained by data island i, C_i is the training cost of data island i, Δθ_i is the improvement of the model training precision contributed by data island i, a_i is the data quantity, q_i is the data quality, v_i is the combined data computation and storage cost parameter of data island i, μ_i is the data processing cost parameter of data island i, κ > 1 is a training parameter, and σ is a precision parameter;
2) based on the utility maximization of the data island, an objective function is established for the utility model:
max_{(a_i, q_i)} U_i,
where the decision variables of data island i are the number a_i of data sets participating in training and the data quality q_i, i.e. its own utility-maximization strategy; the second stage is a Nash equilibrium game among the data islands:
U_i(a_i*, q_i*) ≥ U_i(a_i, q_i) for every alternative strategy (a_i, q_i) of data island i, with the strategies of the other islands held fixed.
Solving the second-stage game:
the first derivative of U_i with respect to q_i: [equation image not reproduced];
the first derivative of U_i with respect to a_i: [equation image not reproduced];
calculating the Hessian matrix: [equation images not reproduced];
solving the system of equations [equation image not reproduced], the decision variables for training are obtained as [equation image not reproduced].
3. The federated learning incentive method under a specific index as claimed in claim 1, wherein the platform server sets the corresponding total reward amount based on maximizing its own utility, the specific steps being as follows:
1) establishing the total reward amount calculation model of the platform server:
U = V - R,    (3)
where it is set that
[Equation image not reproduced: the total incentive cost R expressed through the average reward amount γ and the number of data islands N],
U is the utility obtained by the platform server, V represents the total valuation increment of the model and is set as a constant, R represents the total incentive cost paid by the platform server, γ is the average reward amount decided by the platform, and N is the number of data islands;
2) based on the first-stage game between the platform server and the data islands, the utility of the platform server is maximized, and the objective function is established as:
max_γ U,
where the decision variable of the platform server is the average reward amount γ provided by the platform;
substituting the optimal decision variables of the data islands obtained above [equation image not reproduced] into the platform server objective function gives [equation image not reproduced];
taking the first derivative with respect to γ [equation image not reproduced], letting the first derivative be zero [equation image not reproduced], and solving gives [equation image not reproduced];
the optimal strategy value on the platform server side is γ*, which gives the actual total reward amount.
4. The two-stage federated learning incentive method under a specific index as claimed in claim 1, wherein:
from the decision variables of each data island, namely the data set quantity a_i and the data quality q_i, the precision improvement contributed by a specific island to the platform model training and its contribution ratio are calculated through Δθ_i = σ·log_κ(q_i·a_i); the platform server distributes the rewards according to this proportion:
[Equation (6): image not reproduced]
[Equation (7): image not reproduced]
according to (6) and (7), there is:
[Equation (8): image not reproduced; the reward R_i of data island i, allocated from the total reward in proportion to Δθ_i / Σ_j Δθ_j]
CN202210001509.2A 2022-01-04 2022-01-04 Federal learning incentive method under specific index Pending CN114330587A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210001509.2A CN114330587A (en) 2022-01-04 2022-01-04 Federal learning incentive method under specific index

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210001509.2A CN114330587A (en) 2022-01-04 2022-01-04 Federal learning incentive method under specific index

Publications (1)

Publication Number Publication Date
CN114330587A true CN114330587A (en) 2022-04-12

Family

ID=81022869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210001509.2A Pending CN114330587A (en) 2022-01-04 2022-01-04 Federal learning incentive method under specific index

Country Status (1)

Country Link
CN (1) CN114330587A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114819197A (en) * 2022-06-27 2022-07-29 杭州同花顺数据开发有限公司 Block chain alliance-based federal learning method, system, device and storage medium
CN114819197B (en) * 2022-06-27 2023-07-04 杭州同花顺数据开发有限公司 Federal learning method, system, device and storage medium based on blockchain alliance


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination