CN114330587A - Federal learning incentive method under specific index - Google Patents

Federal learning incentive method under specific index

Info

Publication number
CN114330587A
CN114330587A
Authority
CN
China
Prior art keywords
data
platform server
platform
model
island
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210001509.2A
Other languages
Chinese (zh)
Inventor
王丽霞
王大维
王南
高强
刘晓强
教传铭
曲睿婷
胡非
张福良
张戈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Information and Telecommunication Branch of State Grid Liaoning Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Information and Telecommunication Branch of State Grid Liaoning Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Information and Telecommunication Branch of State Grid Liaoning Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202210001509.2A priority Critical patent/CN114330587A/en
Publication of CN114330587A publication Critical patent/CN114330587A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a two-stage federated learning incentive method under a specific index, which comprises the following steps: receiving a platform model precision improvement task index issued by a platform server; formulating a learning strategy according to the model precision improvement target issued by the platform server; training based on the learning strategy and obtaining the total reward amount set by the platform server; and obtaining the reward amount distributed by the platform server in proportion to the contribution to the improvement of the platform model precision value. The two-stage federated learning incentive mechanism under a specific model precision index can be combined with the actual situation to reduce unnecessary cost waste; the incentive mechanism, designed from the perspective of data quality and data quantity, is more comprehensive and scientific and systematically improves the training efficiency of federated learning.

Description

Federated learning incentive method under a specific index
Technical Field
The invention belongs to the field of distributed machine learning and provides a federated learning incentive method under a specific index.
Background
With the continuous development of machine learning technology, data security has become an unavoidable problem, and federated learning, as a new distributed machine learning paradigm, can solve the data privacy problem well. The basic federated learning model addresses data privacy, but, much like crowd sensing, such techniques still face another problem: collaboration between the data islands and the platform server becomes inefficient. It is therefore common practice to design appropriate incentive schemes to maximize the benefits of each participant and of society.
The main research directions of federated learning incentive mechanisms are the Stackelberg game, auctions, contract theory, the Shapley value, reinforcement learning, blockchain, and the like. The Stackelberg game can model the relationships among all the relevant parties in federated learning well: the relationship between the platform server and the data islands is described as a leader-follower game. However, current research mainly focuses on complex incentive mechanisms under uncertain, theoretically constructed indexes, whereas in reality the accuracy of the training model may only need to meet a specific index. Pursuing only the theoretical optimal solution without considering the actual situation neglects the model precision redundancy problem in actual operation and may increase cost; moreover, data quality and data quantity have not been used effectively as the basis of the incentive scheme.
Disclosure of Invention
In view of the above problems, the present invention provides a federated learning incentive method under a specific index, which is suitable for collaboration between a platform server and a plurality of data islands and comprises the following steps,
s1: receiving a platform model precision improvement task index issued by a platform server;
s2: making a learning strategy according to a model precision improvement target issued by a platform server;
s3: training and acquiring the total reward amount of the platform server based on the learning strategy;
s4: obtaining the reward amount distributed by the platform server according to the proportion of the contribution to the improvement of the platform model precision value.
Further, in step S2 the data island formulates a learning strategy based on maximizing its own utility; the specific steps are as follows,
1) establishing a utility model of a data island:
U_i = R_i - C_i, i ∈ {1, ..., N},    (1)
where it is set that
[Equation image not reproduced: R_i, the reward of data island i, allocated from the total reward in proportion to Δθ_i]
C_i = v_i·a_i + μ_i·q_i, Δθ_i = σ·log_κ(q_i·a_i);
where U_i is the utility of data island i, R_i is the reward obtained by data island i, C_i is the training cost of data island i, Δθ_i is the improvement of the model training precision contributed by data island i, a_i is the data quantity, q_i is the data quality, v_i is the combined data computation and storage cost parameter of data island i, μ_i is the data processing cost parameter of data island i, κ > 1 is a training parameter, and σ is a precision parameter;
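For readability, the utility model can be restated in consolidated notation. The explicit form of R_i shown below is an interpretation inferred from the proportional distribution rule of step S4 and equations (6) to (8); it is not a reproduction of the original formula image.

```latex
% Consolidated restatement; the form of R_i is an assumption (see lead-in).
\begin{aligned}
  U_i &= R_i - C_i, \qquad i \in \{1,\dots,N\},\\
  C_i &= v_i a_i + \mu_i q_i, \qquad
  \Delta\theta_i = \sigma \log_{\kappa}(q_i a_i),\\
  R_i &= \frac{\Delta\theta_i}{\sum_{j=1}^{N} \Delta\theta_j}\,\gamma N
  \qquad \text{(assumed: the total reward } \gamma N \text{ is shared in proportion to } \Delta\theta_i\text{).}
\end{aligned}
```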
2) based on the utility maximization of the data island, an objective function is established for the utility model:
max_{(a_i, q_i)} U_i,
where the decision variables of data island i are the number a_i of data sets participating in training and the data quality q_i, i.e. its own utility-maximization strategy; the second stage is a Nash equilibrium game among the data islands:
U_i(a_i*, q_i*) ≥ U_i(a_i, q_i) for every alternative strategy (a_i, q_i) of data island i, with the strategies of the other islands held fixed.
Solving the second-stage game:
the first derivative of U_i with respect to q_i: [equation image not reproduced];
the first derivative of U_i with respect to a_i: [equation image not reproduced];
calculating the Hessian matrix: [equation images not reproduced];
solving the system of equations [equation image not reproduced], the decision variables for training are obtained as [equation image not reproduced].
further, the platform server maximizes the total reward amount based on the effect thereof, and the specific steps are as follows:
1) establishing a platform server total reward information calculation model:
U=V-R, (3)
the setting is carried out in a way that,
Figure BDA0003454549490000033
u is the utility obtained by the platform server, V represents the total valuation increment of the model and is set as a constant, R represents the total incentive cost paid by the platform server, gamma is the average reward amount of the platform decision, and N is the number of data islands;
2) based on the first-stage game between the platform server and the data islands, the utility of the platform server is maximized, and the objective function is established as:
max_γ U,
where the decision variable of the platform server is the average reward amount γ provided by the platform;
substituting the optimal decision variables of the data islands obtained above [equation image not reproduced] into the platform server objective function gives [equation image not reproduced];
taking the first derivative with respect to γ [equation image not reproduced], letting the first derivative be zero [equation image not reproduced], and solving gives [equation image not reproduced];
the optimal strategy value on the platform server side is γ*, which gives the actual total reward amount.
Further, from the decision variables of each data island, namely the data set quantity a_i and the data quality q_i, the precision improvement contributed by a specific island to the platform model training and its contribution ratio are calculated through Δθ_i = σ·log_κ(q_i·a_i); the platform server distributes the rewards according to this proportion:
[Equation (6): image not reproduced]
[Equation (7): image not reproduced]
according to (6) and (7), there is:
[Equation (8): image not reproduced; the reward R_i of data island i, allocated from the total reward in proportion to Δθ_i / Σ_j Δθ_j]
the two-stage federal learning incentive mechanism under the specific model precision index can be combined with the actual situation, unnecessary cost waste is reduced, the incentive mechanism designed from the angle of data quality and data quantity is more comprehensive and scientific, and the training efficiency of federal learning is systematically improved.
Drawings
FIG. 1 is a schematic overall flow diagram of the present invention;
FIG. 2 is a schematic diagram of one training round of the federated learning model under a specific accuracy index.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to FIG. 1, the present invention provides a federated learning incentive method under a specific index, suitable for collaboration between a platform server and a plurality of data islands, comprising the following steps performed by each data island:
s1: receiving a platform model precision improvement task index issued by a platform server;
s2: making a learning strategy according to a model precision improvement target issued by a platform server;
s3: training and acquiring the reward amount of the platform server based on the learning strategy;
s4: obtaining the share of the total reward amount distributed by the platform server according to the proportion of the contribution to the improvement of the platform model precision value.
The study rests mainly on two hypotheses: the training data cost of a data island is related to the quality and quantity of its data, and the accuracy improvement of the data model is likewise related to the quality and quantity of the data. The Stackelberg game is used for the analysis: the first stage of the two-stage game is a leader-follower game between the platform server and the data islands; the second stage is a Nash equilibrium game among the data islands, meaning that for any data island i the final strategy is the one with the maximum utility, i.e. no other strategy yields a higher utility than the final strategy. When this holds for all data islands, a Nash equilibrium is reached among the data islands.
The specific implementation mode is as follows:
s1: receiving a platform model precision improvement task index issued by a platform server;
s2: making a learning strategy according to a model precision improvement target issued by a platform server;
the data island is used for making a learning strategy based on self utility maximization, and the specific steps are as follows,
1) establishing a utility model of a data island:
U_i = R_i - C_i, i ∈ {1, ..., N},    (1)
where it is set that
[Equation image not reproduced: R_i, the reward of data island i, allocated from the total reward in proportion to Δθ_i]
C_i = v_i·a_i + μ_i·q_i, Δθ_i = σ·log_κ(q_i·a_i);
where U_i is the utility of data island i; R_i is the reward obtained by data island i; C_i is the training cost of data island i; Δθ_i is the improvement of the model training precision contributed by data island i; a_i is the data quantity; q_i is the data quality; v_i is the combined data computation and storage cost parameter of data island i, a known fixed parameter; μ_i is the data processing cost parameter of data island i, a known fixed parameter; κ > 1 is a training parameter and σ is a precision parameter, both known fixed parameters. The higher the data quantity a_i, the higher the data computation and storage cost; the higher the data quality q_i, the higher the data processing cost. The higher the data quality and data quantity, the higher the accuracy of the model parameters and the easier it is to raise Δθ_i, but the improvement of model parameter accuracy by data quality and data quantity follows a law of diminishing marginal returns.
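As an illustration only, the following sketch computes the utility U_i of a single data island under these definitions. The reward form (total reward γN shared in proportion to Δθ_i) is an assumption based on the distribution rule of step S4, and all numeric values are hypothetical.

```python
import math

def delta_theta(q_i: float, a_i: float, sigma: float, kappa: float) -> float:
    """Accuracy improvement contributed by data island i: sigma * log_kappa(q_i * a_i)."""
    return sigma * math.log(q_i * a_i, kappa)

def island_utility(q_i, a_i, v_i, mu_i, sigma, kappa, gamma, n_islands, sum_delta_theta):
    """U_i = R_i - C_i, with R_i taken as the share of the total reward gamma * N
    allocated in proportion to Delta_theta_i (assumed reward form)."""
    d_i = delta_theta(q_i, a_i, sigma, kappa)
    reward = gamma * n_islands * d_i / sum_delta_theta      # R_i (assumed form)
    cost = v_i * a_i + mu_i * q_i                           # C_i = v_i*a_i + mu_i*q_i
    return reward - cost

# Hypothetical numbers, for illustration only.
sigma, kappa, gamma, N = 0.05, 2.0, 10.0, 5
print(island_utility(q_i=0.8, a_i=200.0, v_i=0.01, mu_i=0.5,
                     sigma=sigma, kappa=kappa, gamma=gamma,
                     n_islands=N, sum_delta_theta=2.0))
```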
2) Based on the utility maximization of the data island, an objective function is established for the utility model:
max_{(a_i, q_i)} U_i,
where the decision variables of data island i are the number a_i of data sets participating in training and the data quality q_i, and the optimal strategy under this condition is its own utility-maximization strategy; the second stage is a Nash equilibrium game among the data islands:
U_i(a_i*, q_i*) ≥ U_i(a_i, q_i) for every alternative strategy (a_i, q_i) of data island i, with the strategies of the other islands held fixed.
the meaning of the method is that for any data island i, the final strategy result is the result with the maximum utility, namely the utility of any other strategy is not as great as the final strategy utility. When all data islands meet the requirements, it can be said that a nash equilibrium state is achieved between the data islands.
Solving the second-stage game determines, under self-utility maximization, the optimal data quantity and data quality of each data island, i.e. its local precision target:
the first derivative of U_i with respect to q_i: [equation image not reproduced];
the first derivative of U_i with respect to a_i: [equation image not reproduced];
calculating the Hessian matrix: [equation images not reproduced];
solving the system of equations [equation image not reproduced] gives the optimal strategy of data participation in learning, i.e. the optimal data quantity a_i* and data quality q_i* [equation image not reproduced].
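As a numerical illustration of the first-order and second-order conditions above, the following sketch evaluates finite-difference approximations of the first derivatives and the Hessian of U_i at a candidate strategy. The proportional reward form and all parameter values are assumptions made for illustration only.

```python
import math

def u_i(q, a, v=0.01, mu=0.5, sigma=0.05, kappa=2.0, gamma=10.0, n=5, d_others=1.5):
    """Utility of one island; the other islands' total contribution d_others is held fixed."""
    d = sigma * math.log(q * a, kappa)
    return gamma * n * d / (d + d_others) - (v * a + mu * q)

def grad_and_hessian(q, a, h=1e-3):
    """Central finite-difference gradient and Hessian of u_i at (q, a)."""
    dq = (u_i(q + h, a) - u_i(q - h, a)) / (2 * h)
    da = (u_i(q, a + h) - u_i(q, a - h)) / (2 * h)
    dqq = (u_i(q + h, a) - 2 * u_i(q, a) + u_i(q - h, a)) / h ** 2
    daa = (u_i(q, a + h) - 2 * u_i(q, a) + u_i(q, a - h)) / h ** 2
    dqa = (u_i(q + h, a + h) - u_i(q + h, a - h)
           - u_i(q - h, a + h) + u_i(q - h, a - h)) / (4 * h ** 2)
    return (dq, da), ((dqq, dqa), (dqa, daa))

grad, hess = grad_and_hessian(q=0.8, a=200.0)
print("first derivatives:", grad)
print("Hessian:", hess)
# At an interior optimum the first derivatives are approximately zero and the
# Hessian is negative definite, which is what the Hessian check above establishes.
```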
s3: training and acquiring the total reward amount of the platform server based on the learning strategy;
1) establishing the total reward amount calculation model of the platform server:
U = V - R,    (3)
where it is set that
[Equation image not reproduced: the total incentive cost R expressed through the average reward amount γ and the number of data islands N],
U is the utility obtained by the platform server; V represents the total valuation increment of the model, i.e. the increment of the model's estimated value, which is determined by the platform or a third party according to the actual conditions of the specific model and can therefore reasonably be assumed to be a given constant; R represents the total incentive cost paid by the platform server; γ is the average reward amount decided by the platform, i.e. the platform adjusts the degree of incentive by deciding the average reward amount, thereby regulating the whole incentive mechanism, and finally obtains, according to the training situation of the data islands, the average reward amount that maximizes the platform's utility; and N is the number of data islands;
2) Because the precision of the training model parameters is a value given by the server platform, and the utility function of the server platform is the valuation increment of the model (based on the data quality) minus the total incentive R paid out, the smaller the total incentive R paid out by the server platform, the greater the utility of the server. Therefore, the objective function of the server platform is:
max_γ U,
where the decision variable of the platform server is the average reward amount γ provided by the platform;
substituting the optimal decision variables of the data islands obtained above [equation image not reproduced] into the platform server objective function gives [equation image not reproduced];
taking the first derivative with respect to γ [equation image not reproduced], letting the first derivative be zero [equation image not reproduced], and solving gives [equation image not reproduced];
the optimal strategy value on the platform server side is γ*; its significance is that, once the accuracy requirement of the specific model is determined, the platform server only needs to set the corresponding reward amount, and the utility of the platform server is maximized on the premise that the accuracy of the specific model is achieved.
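As an illustration of the platform-side (first-stage) decision, the sketch below searches over the average reward amount γ for the smallest value whose induced total precision improvement reaches the required index, which maximizes U = V - γN when V is a constant and the total reward is taken as γN (an assumption consistent with γ being the average reward over N islands). The islands' response to γ is represented by a stand-in function and is a pure assumption for illustration.

```python
def induced_precision_gain(gamma: float) -> float:
    """Stand-in for the equilibrium total precision improvement sum(Delta_theta_i)
    induced by an average reward gamma; a real implementation would solve the
    second-stage game (e.g. by best-response dynamics). Hypothetical concave response."""
    return 2.0 * (1.0 - 1.0 / (1.0 + gamma))

def platform_choice(target_gain: float, n_islands: int, v_total: float, gamma_grid=None):
    """Pick the smallest gamma on a grid that meets the accuracy target;
    platform utility is U = V - gamma * N, so a smaller gamma means a higher utility."""
    if gamma_grid is None:
        gamma_grid = [0.1 * k for k in range(1, 501)]
    for gamma in gamma_grid:
        if induced_precision_gain(gamma) >= target_gain:
            return gamma, v_total - gamma * n_islands
    return None, None

# Hypothetical numbers, for illustration only.
gamma_star, platform_utility = platform_choice(target_gain=1.5, n_islands=5, v_total=100.0)
print(gamma_star, platform_utility)
```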
S4: distributing the total reward amount according to the ratio of the precision improvement contributed by each single data island to the overall improvement of the platform model precision value.
Using the above data set quantity a_i and data quality q_i, the precision improvement contributed by a specific island to the platform model training is calculated through Δθ_i = σ·log_κ(q_i·a_i); the platform server distributes the rewards according to this proportion:
[Equation (6): image not reproduced]
[Equation (7): image not reproduced]
according to (6) and (7), there is:
[Equation (8): image not reproduced; the reward R_i of data island i, allocated from the total reward in proportion to Δθ_i / Σ_j Δθ_j]
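A minimal sketch of the proportional distribution described by equations (6) to (8): each island's share of the total reward equals its Δθ_i divided by the sum over all islands. Taking the total reward to be γN is an assumption consistent with γ being the average reward over N islands.

```python
import math

def distribute_rewards(q, a, sigma=0.05, kappa=2.0, gamma=10.0):
    """Split the total reward gamma * N among islands in proportion to
    Delta_theta_i = sigma * log_kappa(q_i * a_i)."""
    n = len(q)
    d = [sigma * math.log(q[i] * a[i], kappa) for i in range(n)]
    total = sum(d)
    return [gamma * n * d_i / total for d_i in d]

# Hypothetical three-island example; the shares sum to gamma * N = 30.
print(distribute_rewards(q=[0.9, 0.6, 0.3], a=[300.0, 200.0, 100.0]))
```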
in the scheme, the data quality evaluation generally comprises consistency, integrity and timeliness, the platform issues a thirteen-dimensional table comprising three indexes of data type, data integrity and data timeliness, namely the data type, the data integrity and the data timeliness respectively correspond to three dimensions, and the three dimensions are represented by values in an interval of 0-1. The standards of the three index platforms are all 1, then the data islands are compared after data of the data islands are input into a table, namely the data type dimension is the number of data types owned by the data islands in the type types provided by the platforms; the data integrity is the integrity degree of data owned by the data island in the platform; the data timeliness is how much the data timeliness owned by the data island accounts for the timeliness standard provided by the platform. And finally, realizing the quantification of the data quality of each data island by the weighted average of the three indexes.
Referring to FIG. 2, the federated learning incentive scheme model proposed in the present invention mainly targets training under the requirement of a specific model accuracy index. In practice, meeting the requirement of a specific accuracy index makes the training mechanism more efficient and avoids wasting training cost on unnecessarily high training accuracy. In addition, although the model training process may run for multiple rounds, a single round of the training mechanism can be analyzed; the end of one round can be regarded as the start of the next, so the multi-round process is simply this mechanism repeated, and the mechanism of the present invention therefore simplifies the training process.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (4)

1. A federated learning incentive method under a specific index, suitable for collaboration between a platform server and a plurality of data islands, characterized by comprising the following steps:
s1: receiving a platform model precision improvement task index issued by a platform server;
s2: making a learning strategy according to a model precision improvement target issued by a platform server;
s3: training and acquiring the total reward amount of the platform server based on the learning strategy;
s4: obtaining the reward amount distributed by the platform server according to the proportion of the contribution to the improvement of the platform model precision value.
2. The federated learning incentive method under a specific index as claimed in claim 1, wherein in step S2 the data island formulates a learning strategy based on maximizing its own utility, the specific steps being as follows,
1) establishing a utility model of a data island:
U_i = R_i - C_i, i ∈ {1, ..., N},    (1)
where it is set that
[Equation image not reproduced: R_i, the reward of data island i, allocated from the total reward in proportion to Δθ_i]
C_i = v_i·a_i + μ_i·q_i, Δθ_i = σ·log_κ(q_i·a_i);
where U_i is the utility of data island i, R_i is the reward obtained by data island i, C_i is the training cost of data island i, Δθ_i is the improvement of the model training precision contributed by data island i, a_i is the data quantity, q_i is the data quality, v_i is the combined data computation and storage cost parameter of data island i, μ_i is the data processing cost parameter of data island i, κ > 1 is a training parameter, and σ is a precision parameter;
2) based on the utility maximization of the data island, an objective function is established for the utility model:
max_{(a_i, q_i)} U_i,
where the decision variables of data island i are the number a_i of data sets participating in training and the data quality q_i, i.e. its own utility-maximization strategy; the second stage is a Nash equilibrium game among the data islands:
U_i(a_i*, q_i*) ≥ U_i(a_i, q_i) for every alternative strategy (a_i, q_i) of data island i, with the strategies of the other islands held fixed.
Solving the second-stage game:
the first derivative of U_i with respect to q_i: [equation image not reproduced];
the first derivative of U_i with respect to a_i: [equation image not reproduced];
calculating the Hessian matrix: [equation images not reproduced];
solving the system of equations [equation image not reproduced], the decision variables for training are obtained as [equation image not reproduced].
3. The federated learning incentive method under a specific index as claimed in claim 1, wherein the platform server sets the corresponding total reward amount based on maximizing its own utility, the specific steps being as follows:
1) establishing the total reward amount calculation model of the platform server:
U = V - R,    (3)
where it is set that
[Equation image not reproduced: the total incentive cost R expressed through the average reward amount γ and the number of data islands N],
U is the utility obtained by the platform server, V represents the total valuation increment of the model and is set as a constant, R represents the total incentive cost paid by the platform server, γ is the average reward amount decided by the platform, and N is the number of data islands;
2) based on the first-stage game between the platform server and the data islands, the utility of the platform server is maximized, and the objective function is established as:
max_γ U,
where the decision variable of the platform server is the average reward amount γ provided by the platform;
substituting the optimal decision variables of the data islands obtained above [equation image not reproduced] into the platform server objective function gives [equation image not reproduced];
taking the first derivative with respect to γ [equation image not reproduced], letting the first derivative be zero [equation image not reproduced], and solving gives [equation image not reproduced];
the optimal strategy value on the platform server side is γ*, which gives the actual total reward amount.
4. The two-stage federated learning incentive method under a specific index as claimed in claim 1, wherein:
from the decision variables of each data island, namely the data set quantity a_i and the data quality q_i, the precision improvement contributed by a specific island to the platform model training and its contribution ratio are calculated through Δθ_i = σ·log_κ(q_i·a_i); the platform server distributes the rewards according to this proportion:
[Equation (6): image not reproduced]
[Equation (7): image not reproduced]
according to (6) and (7), there is:
[Equation (8): image not reproduced; the reward R_i of data island i, allocated from the total reward in proportion to Δθ_i / Σ_j Δθ_j]
CN202210001509.2A 2022-01-04 2022-01-04 Federal learning incentive method under specific index Pending CN114330587A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210001509.2A CN114330587A (en) 2022-01-04 2022-01-04 Federal learning incentive method under specific index

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210001509.2A CN114330587A (en) 2022-01-04 2022-01-04 Federal learning incentive method under specific index

Publications (1)

Publication Number Publication Date
CN114330587A true CN114330587A (en) 2022-04-12

Family

ID=81022869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210001509.2A Pending CN114330587A (en) 2022-01-04 2022-01-04 Federal learning incentive method under specific index

Country Status (1)

Country Link
CN (1) CN114330587A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114819197A (en) * 2022-06-27 2022-07-29 杭州同花顺数据开发有限公司 Block chain alliance-based federal learning method, system, device and storage medium
CN114819197B (en) * 2022-06-27 2023-07-04 杭州同花顺数据开发有限公司 Federal learning method, system, device and storage medium based on blockchain alliance


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination