CN116485430A - Federated learning forgetting mechanism and method for data circulation - Google Patents

Federated learning forgetting mechanism and method for data circulation

Info

Publication number
CN116485430A
CN116485430A
Authority
CN
China
Prior art keywords
data
model
market
federal
participants
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310414143.6A
Other languages
Chinese (zh)
Inventor
黄建国
钱敬
唐浩竣
魏宗正
王鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN202310414143.6A priority Critical patent/CN116485430A/en
Publication of CN116485430A publication Critical patent/CN116485430A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0283Price estimation or determination
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a federated learning forgetting mechanism and method for data circulation, belonging to the technical field of deep learning. The method uses federated learning to realize the implicit circulation of data, allocates rewards and penalties to data contributors in the market based on the Shapley value, and uses a federated unlearning algorithm to realize revocation during data transactions. The invention addresses the difficulties of data ownership confirmation, transaction pricing, privacy protection and circulation in the data trading process.

Description

Federated learning forgetting mechanism and method for data circulation
Technical Field
The invention belongs to the technical field of deep learning. It relates to building a data circulation trading market based on federated learning, provides a data trading mechanism based on the quality of the generated model, sets reward and penalty mechanisms in the data trading process to standardize the operation of the market, and uses a federated unlearning algorithm to realize data revocation in the market.
Background
In the era of big data, massive, diverse and complex data are continuously generated, which has become a new trend in social and economic development. However, the vitality of data has not yet been truly released, and data is bound to be the next technical frontier now that deep learning is developing rapidly. The existence of data islands prevents data from being shared among enterprises, so the value of data remains untapped. To solve this problem, the invention constructs a data circulation trading market, starting from data transactions, to realize data circulation and sharing. At the same time, the operating mechanism of the market is standardized, so that it features a sound data trading mode, reasonable allocation and reward/penalty mechanisms, and replaceable and revocable data use.
At present, the circulation and trading of data are not standardized, and data transactions face many problems: data piracy, how to realize an optimal trading strategy, how to protect user privacy, and how to regulate the market so that it can develop and operate sustainably. Meanwhile, in building a safe and stable data trading market that treats data as a commodity, one major difficulty must be solved: unlike ordinary physical commodities, data cannot be easily returned and replaced, yet returns and replacement are an essential part of any market, and without them the standardization of the market is almost impossible.
In fact, some existing studies can solve parts of this problem, for example: federated-learning-based solutions to some data privacy issues; data integration and sharing platforms; various data security technologies such as encryption algorithms, access control, identity verification and auditing to ensure the security of data circulation; technical means such as data cleaning, data matching and data restoration to keep data invisible during circulation; and open data interfaces that provide data services to third-party applications. However, it is almost impossible to construct a data circulation trading market on the basis of this research alone. Existing federated learning only solves the implicit circulation of data, i.e., it guarantees that user privacy is not leaked while data circulates. The prior art does not consider how data is sold as a commodity, nor how to regulate the transaction process, such as allocating rewards to data contributors. Moreover, the existing technology cannot guarantee that data can be withdrawn during circulation.
Consider the following case: a target client wants to exit the federation after the federated learning process ends, and therefore wants its contribution deleted from the global model. For example, suppose a system is jointly operated by multiple e-commerce platforms that co-train a model to provide personalized product recommendations to users. During training, each platform feeds its users' purchase histories and browsing behavior into the model so that the model can learn user interests and preferences. After training is completed, one of the platforms decides to exit the federation and no longer wishes to share its users' data. That platform's data must now be removed from the trained model to preserve its users' privacy. As in centralized machine learning, the most straightforward way to delete a client's contribution is to retrain the federated model from scratch. However, the cost of retraining from scratch is usually prohibitive, and retraining for every single data removal is not a sustainable long-term practice. The present invention therefore designs an unlearning model to solve this problem.
Disclosure of Invention
To solve the problems of the existing data trading mode, such as a crude transaction mechanism, incomplete reward/penalty and allocation mechanisms, and irrevocable data, the invention proposes a Quality-oriented Task Allocation algorithm (QTA), uses federated learning to realize the implicit circulation of data, allocates rewards and penalties to data contributors in the market based on the Shapley value, and uses a federated unlearning algorithm to realize revocation in the data transaction process.
The technical scheme provided by the invention aiming at the technical problems is as follows:
A federated learning forgetting mechanism and method for data circulation comprises the following steps:
Step one: train data through a federated learning framework to generate local models, and let the local models replace the direct circulation of data in the market, realizing the implicit flow of data; the models are then aggregated in the server to generate a global model.
Step two: establish a buyer's market based on the Quality-oriented Task Allocation algorithm (QTA): data sellers send their expected prices to the server, which recalculates the sales prices and returns them to the buyer for selection; once payment is made, the transaction is recorded and the seller is considered to have joined the federated collaboration.
Step three: calculate each participant's contribution to the global model through a Shapley-value-based reward mechanism, and from it the participant's reward allocation.
Step four: when a user requests to exit the market and withdraw its data, the server calculates the impact on the model via a penalty mechanism to determine the amount of compensation.
Step five: for target users who want to exit the market, perform data revocation with a forgetting mechanism that reverses federated learning.
In step three, the specific steps of the reward mechanism are as follows:
3.1: Calculate the contribution of participant combinations. From the perspective of the federated collective, the data value of an individual participant cannot substitute for its contribution within the collaboration. The reward mechanism must evaluate the marginal value gain that introducing a given participant brings to the federation, based on the value of participant combinations, so that different rewards are given for different contributions.
3.2: Use the Shapley value to calculate the marginal value gain of each participant.
3.3: Calculate the total reward from the total contribution. The total reward is the joint property of all participants.
3.4: Calculate the marginal value gain of a new participant after it joins the market, and determine its reward from its share of the total contribution.
In step four, the penalty mechanism is constructed as follows:
4.1: Compute a feature vector for each user. Features are the attributes of the user's data, such as timestamps, geographic location, device information and behavioral characteristics; each user's data can thus be regarded as a feature vector.
4.2: Calculate the effect of each feature on the model's predictions. The contribution of each feature is computed with the Shapley value, yielding an importance ranking of the features.
4.3: Calculate the compensation for each user's data. The compensation is computed from the contribution of the user's data and is proportional to it.
4.4: The withdrawing user pays the compensation, and the total reward pool and contributions are updated.
The invention has the following beneficial effects: it realizes a safe, stable and fully functional data circulation trading system. The system uses a distributed artificial-intelligence architecture (training data to generate models and using the models as the element of circulation), a blockchain (transaction traceability) and a federated unlearning algorithm (data withdrawal). A multidimensional data trading market is constructed, which technically accelerates data trading, stimulates market vitality, and makes data transactions 'usable but invisible', 'traceable', 'meterable' and 'exchangeable and revocable'. The invention addresses the difficulties of data ownership confirmation, transaction pricing, privacy protection and circulation in the data trading process.
Drawings
FIG. 1 is the basic market architecture built on federated learning.
Fig. 2 is the detailed flow of the data transaction process of the present invention.
FIG. 3 is a detailed explanation of the reward and penalty mechanisms of the present invention.
FIG. 4 is a flowchart of the federated unlearning algorithm of the present invention.
Detailed Description
The invention provides a data transaction and circulation mode based on federated learning, which realizes data transactions based on the quality of the generated model, sets reward and penalty mechanisms for the transaction process, and uses a federated unlearning algorithm to realize data revocation in the market. The specific steps are as follows:
Step one: train data through a federated learning framework to generate local models, let the local models replace the direct circulation of data in the market to realize implicit data flow, and then aggregate the models in the server to generate a global model.
Step two: based on the Quality-oriented Task Allocation algorithm (QTA), concretize the data transaction process so that the requester obtains the largest amount of data within a given budget and gets the most accurate model.
Step three: calculate each participant's contribution to the global model through the Shapley-value-based reward mechanism, and from it the participant's reward allocation.
Step four: when a user asks to exit the market and withdraw its data, the server calculates the impact on the model through the penalty mechanism to determine the amount of compensation.
Step five: carry out data revocation with the federated unlearning algorithm.
Fig. 1 shows the basic market architecture built on federated learning; the specific implementation steps are as follows:
1.1 The requester sends a transaction request to the data trading market, which accepts the requester's purchase demand.
1.2 Data holders in the trading market train local models in a distributed manner on their different devices.
1.3 The generated local models are aggregated into a global model by the server of the data trading market.
1.4 The global model is sold to the requester.
The following is a brief introduction to federated learning:
Federated learning (FL) is a distributed machine learning framework in which the data never moves: a machine learning model is completed by multiple participants over multiple rounds of training. A federated system consists of a federated server and several participants, each holding its own local data. During training, each participant's local data never leaves its device; instead, local models are generated locally, and training terminates once the federated model has gone through enough rounds to meet the requirements.
The federal learning implementation process:
the training process of federal learning is divided into two phases:
and (5) local updating: each participant updates the local model based on the respective local data and sends parameter gradient updates to the federal server.
Model aggregation: the federal server aggregates all participants 'updates, e.g., federal average (FedAvg) simply weights aggregate the gradient values uploaded by each participant based on each participant's data volume and gives the aggregate gradient to update the global model.
The federal server sends the updated global model parameters to each participant, and each participant updates the local model and prepares for the next round of training.
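The aggregation phase above can be sketched as a data-volume-weighted average of the client parameters. This is an illustrative FedAvg-style sketch, not the patent's implementation; the function and variable names are hypothetical:

```python
# Illustrative FedAvg-style aggregation: each client i uploads parameters
# w_i trained on n_i local samples; the server returns the average of the
# parameter vectors weighted by each client's data volume.

def fedavg(client_params, client_sizes):
    """Weighted average of client parameter vectors by local data volume."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        weight = n / total
        for k in range(dim):
            global_params[k] += weight * params[k]
    return global_params

# Two clients; the second holds three times as much data as the first.
w = fedavg([[1.0, 0.0], [5.0, 4.0]], [25, 75])  # -> [4.0, 3.0]
```

The aggregated model is then redistributed to the participants for the next round, as described above.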
In practice, federated learning also has a decentralized peer-to-peer architecture, and the definition of federated learning generally includes the following assumptions: 1. multiple parties participate; 2. the data does not move; 3. transmission is trusted. According to the data dimensions held by the participants, federated learning is divided into horizontal federated learning, vertical federated learning and federated transfer learning; according to the type of participant, it is divided into cross-device and cross-organization federated learning.
Fig. 2 shows the specific implementation of a data transaction. The present invention proposes a Quality-oriented Task Allocation algorithm (QTA) that applies a greedy strategy to allocate federated learning tasks, issued by requesters in the market, to devices holding suitable data. On this algorithm the invention builds a purchasing market for data, so that a buyer can obtain the maximum amount of data available under a fixed budget.
The specific steps of the QTA algorithm are as follows:
2.1 The requester has a budget for its task; each device with suitable data generates an expected sales price upon receiving the request, and the prices generated by all devices form a set of candidate sales prices.
2.2 Sales price update: each device generates a sales price for its data based on the requester's budget and sends the price and its data volume to the server, which recalculates the prices to update them.
The sales price is updated because a device's data price is the sum of its data cost and computation cost, whereas the price sold by the server must also include the cost of the data transmission process.
2.3 The server quotes to the buyer: the server updates the sales prices, obtains a sales price vector, and quotes it to the buyer for selection; the update adds the transmission cost of the data to the final model cost. The server's profit is the payment per unit of data minus the device's price per unit of data minus the transmission cost, multiplied by the total amount of transacted data.
2.4 Greedy selection for the requester: the server uses a greedy strategy to select devices for the requester according to the sales price vector until the requester's budget is exhausted. The greedy strategy is as follows: after the sales price vector is generated, under the requester's given budget the server always picks the cheapest remaining data model, removes its price from the vector, updates the requester's budget, and repeats until the budget is exhausted.
QTA achieves a globally optimal solution when requesters submit their federated learning tasks in a particular order. In the trading market, all requesters issue federated learning tasks independently, and any requester can join the data trading market at any time. The greedy strategy proposed here ensures that each requester makes maximal use of the training data of the Internet-of-Things devices currently on the market, i.e., excluding devices already selected by earlier requesters. Therefore, the greedy-selection-based QTA proposed by the invention is optimal in an asynchronous market.
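The greedy selection of step 2.4 can be sketched as follows; this is a minimal illustration with hypothetical names, treating each device's price as a scalar:

```python
# Greedy device selection: given the sales price vector, repeatedly pick
# the cheapest remaining data model until the requester's budget runs out.

def greedy_select(prices, budget):
    """Return (indices of selected devices, leftover budget)."""
    order = sorted(range(len(prices)), key=lambda i: prices[i])
    selected, remaining = [], budget
    for i in order:
        if prices[i] > remaining:
            break  # the next-cheapest model no longer fits the budget
        selected.append(i)
        remaining -= prices[i]
    return selected, remaining

chosen, left = greedy_select([30, 10, 20, 50], budget=65)
# -> devices 1, 2 and 0 are bought (prices 10 + 20 + 30), leaving 5
```

Sorting once and consuming cheapest-first is equivalent to repeatedly deleting the minimum from the price vector, as the text describes.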
Fig. 3 is a detailed explanation of the reward and penalty mechanisms of the present invention.
Reward mechanism:
The contributions of the multiple parties in federated learning are evaluated with the Shapley value, a classical cooperative-game solution concept, and the rewards are allocated according to each party's contribution, making the reward mechanism reasonable, fair, efficient and realistic. The formula for each party's Shapley value is:
φ_i = (1/|Π|) Σ_{π∈Π} [ v(S_π(i) ∪ {i}) − v(S_π(i)) ]
where π ∈ Π ranges over all orderings of the participants, S_π(i) is the combination of participants placed before i in the ordering π, and v is a value function such as test accuracy. The Shapley value of party i can be understood as the expectation of i's marginal contribution over all orders of joining the federation: enumerating, over all possible joining orders, the value gain that party i brings to the federation gives i's contribution. A reward mechanism can be built on this value. If the total contribution of all parties is Φ, the contribution of the i-th party is φ_i, and the total transaction amount of the data trading market is O, then the reward of the i-th user is
R_i = (φ_i / Φ) · O
where φ_i is the party's contribution score and satisfies Σ_i φ_i = Φ.
The process for a participant to join the federated collaboration is: the participant submits an application; the market calculates the total contribution of all existing participants; the Shapley value is used to compute the new participant's individual contribution after it joins the market; and the participant's share of the total contribution is multiplied by the total reward, giving the reward the participant obtains after joining the market.
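The Shapley computation above can be sketched by enumerating join orders exactly as the formula prescribes. This brute-force version (hypothetical names; a toy additive value function) is exponential in the number of participants, so a practical market would approximate it, e.g. by sampling permutations:

```python
from itertools import permutations

def shapley(players, v):
    """Average marginal value gain of each player over all join orders."""
    phi = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = frozenset()
        for p in order:
            phi[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    return {p: phi[p] / len(perms) for p in players}

# Toy value function v: coalition -> accuracy gain. Gains are additive
# here, so each party's Shapley value equals its standalone gain.
gains = {"A": 0.6, "B": 0.3}
phi = shapley(["A", "B"], lambda s: sum(gains[p] for p in s))
total = sum(phi.values())
reward_A = phi["A"] / total * 1000.0  # hypothetical market amount O = 1000
```

The last line mirrors the reward rule R_i = (φ_i / Φ) · O.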
Penalty mechanism:
In the data trading market, each participant may opt into federated learning and has the right to opt out of the platform. Since a user exiting the federated collaboration withdraws the contribution it brought to the total reward, the corresponding reward is deducted from that user according to a penalty mechanism. Thus, when a user asks to exit the market and withdraw its data, the server calculates the impact on the model to determine the amount of compensation. When a participant submits a revocation application, the model uses Shapley values over the feature vectors (the attributes of the user's data, such as timestamps, geographic location, device information and behavioral characteristics) to compute the influence of the data on the model's predictions and produces an importance ranking. The compensation for each user's data is computed from its contribution to the model's predictions and is proportional to that contribution: the contribution of each user's data is multiplied by the change in model accuracy to obtain the influence of that data on model accuracy, and the compensation, which may be a fixed amount or a proportion, is then computed from this influence. The calculation is
W_i = C · ΔP_i / Σ_{j=1}^{N} ΔP_j
where W_i is the influence-based payment of party i; N is the number of participants; C is the total compensation amount corresponding to the influences; P_i is the accuracy of the model without the user's data; and ΔP_i is the influence of a user's data, i.e., the change in model accuracy after that data is used. The denominator sums the influences of all participants' data and normalizes them, ensuring that the total compensation of all participants does not exceed the sustainable compensation amount.
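A minimal sketch of this normalized compensation rule, with symbols following the formula above (the function name and example figures are hypothetical):

```python
# Each withdrawing participant i pays W_i = C * dP_i / sum_j dP_j, where
# dP_i is the change in model accuracy attributable to i's data. The
# normalization guarantees the payouts sum to exactly C.

def compensation(influences, C):
    """Map each participant's accuracy influence dP_i to its payment."""
    total = sum(influences.values())
    return {i: C * dp / total for i, dp in influences.items()}

pay = compensation({"u1": 0.02, "u2": 0.06}, C=100.0)
# u1 pays a quarter of C, u2 three quarters
```

Because the shares are proportions of the same total, adding a participant rescales everyone's payment without exceeding C.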
Forgetting mechanism:
Based on the distribution of all models generated by the federated learning algorithm, the quality of forgetting can be measured by the distance between the retrained model and the unlearned model, both in weight space and in output space.
After t rounds of federated training, the goal of each client is a local model that minimizes its (local) empirical risk. This can be written as
w* = argmin_w F_i(w), with F_i(w) = (1/|D_i|) Σ_{(x_j, y_j)∈D_i} L(w; (x_j, y_j))
where L(w; (x_j, y_j)) is the prediction loss on a sample; each client runs several mini-batch stochastic-gradient-descent steps locally to find a model with small empirical loss; and F_i(w) is the average sample loss of client i, the client to be forgotten.
One major way to implement the forgetting mechanism is to reverse the learning process: during forgetting, the client does not learn model parameters that minimize the empirical loss, but instead tries to learn parameters that maximize it. To find a model with large empirical loss, the client can simply run several mini-batch stochastic gradient ascent steps. However, maximizing the loss by plain gradient ascent is problematic: the loss is unbounded, so each gradient step moves toward a model with higher loss, and after a few steps the result is likely to be an arbitrary model similar to a random one.
To address this problem, the model to be forgotten must be kept sufficiently close to a reference model that has effectively learned the data distribution of the other clients: the range of the loss is bounded, and the procedure stops automatically once the loss exceeds a given threshold. Specifically, it is suggested to use the average of the other clients' models as the reference model; the target client i can compute it locally as
w_ref = (1/(N−1)) Σ_{j≠i} w_j.
For the client i to be forgotten, the variation of F_i(w) is restricted to an ℓ2-norm ball of radius δ around w_ref:
Ω = { w : ‖w − w_ref‖_2 ≤ δ }.
One way to solve this constrained problem is projected gradient ascent. Let P_Ω denote the projection operator onto the ℓ2-norm ball Ω. Then, for a given step size η, client i iteratively updates
w^(k+1) = P_Ω( w^(k) + η ∇F_i(w^(k)) ).
The client's model is updated iteratively in this way until a good model is finally obtained. If the validation accuracy falls below a predetermined threshold τ, early stopping is performed; this prevents excessive gradient-ascent training from producing an arbitrary model. Through gradient-ascent training, the resulting model counteracts the historical effect of the low-quality data, and by applying the parameters obtained through gradient ascent to the client's local model, the influence of the low-quality data retained in the other clients' local models can also be eliminated during subsequent aggregation.
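The projected-gradient-ascent loop can be sketched in one dimension, where the projection onto the norm ball reduces to clamping onto an interval. This is purely illustrative: the toy loss, step size and radius are invented for the example:

```python
# Unlearning by projected gradient ascent: the client ASCENDS its local
# loss F_i, but each step is projected back into a ball of radius delta
# around the reference model w_ref (the average of the other clients'
# models), so the result cannot drift into an arbitrary random model.

def project(w, w_ref, delta):
    """1-D projection onto the interval [w_ref - delta, w_ref + delta]."""
    return max(w_ref - delta, min(w_ref + delta, w))

def unlearn(w, w_ref, grad_Fi, eta, delta, steps):
    """Iteratively maximize F_i under the norm-ball constraint."""
    for _ in range(steps):
        w = project(w + eta * grad_Fi(w), w_ref, delta)
    return w

# Toy local loss F_i(w) = (w - 2)^2, gradient 2*(w - 2): ascent drives w
# away from the forgotten client's optimum at w = 2, while the projection
# keeps it within delta of the reference model.
w_final = unlearn(w=1.9, w_ref=0.0, grad_Fi=lambda w: 2.0 * (w - 2.0),
                  eta=0.1, delta=1.0, steps=50)
# w_final stays within [w_ref - delta, w_ref + delta] = [-1.0, 1.0]
```

In the full method the early-stopping test on validation accuracy (threshold τ) replaces the fixed step count used here.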
This is the new federated revocation approach proposed by the present invention: it successfully revokes the contribution of any client by reversing the learning process, relies only on the client that wants to exit the federation, and does not require the server to track the clients' parameter-update history.

Claims (7)

1. A federated learning forgetting mechanism and method for data circulation, characterized in that the method comprises the following steps:
step one: training data through a federated learning framework to generate local models, and letting the local models replace the direct circulation of the data in the market, realizing the implicit flow of the data; then performing model aggregation in a server to generate a global model;
step two: establishing a buyer's market based on the Quality-oriented Task Allocation algorithm QTA: data sellers send their expected prices to the server, which calculates the sales prices and returns them to the buyer for selection; once payment is made, the transaction is recorded and the seller is considered to have joined the federated collaboration;
step three: calculating each participant's contribution to the global model through a Shapley-value-based reward mechanism, and from it the participant's reward allocation;
step four: when a user requests to exit the market and withdraw data, the server calculates the impact on the model through a penalty mechanism to determine the amount of compensation;
step five: for target users who are to exit the market, performing data revocation with a forgetting mechanism that reverses federated learning.
2. The federated learning forgetting mechanism and method for data circulation of claim 1, wherein step one comprises the following specific steps:
1.1 a requester sends a transaction request to the data trading market, and the trading market accepts the requester's purchase demand;
1.2 data holders in the trading market train local models in a distributed manner on different devices;
1.3 the generated local models are aggregated into a global model by the server of the data trading market;
1.4 the global model is sold to the requester.
3. The federated learning forgetting mechanism and method for data circulation according to claim 1 or 2, wherein step two proceeds as follows:
2.1 the requester has a budget for its task; upon receiving the request, each device holding suitable data generates an expected sales price, and the prices generated by all devices form a set of candidate sales prices;
2.2 sales price update: each device generates a sales price for its data according to the requester's budget and transmits its data price and data quantity to the server, which recalculates and thereby updates the sales prices;
2.3 the server quotes to the buyer: the server updates the sales prices to obtain a sales price vector and issues a quotation for the buyer to choose from; the updated sales price is computed by adding the cost of data transmission to the final model cost; the server's profit is the payment per unit of data, minus the device's price per unit of data and the transmission cost, multiplied by the total amount of transacted data;
2.4 greedy selection for the requester: the server uses a greedy strategy to select devices for the requester according to the sales price vector until the requester's budget is exhausted;
the greedy strategy is as follows: after the sales price vector is generated, under the requester's given budget the server each time selects the lowest-priced data model for the requester, deletes that price from the sales price vector, updates the requester's budget, and then again selects the lowest-priced data model, repeating this cycle until the requester's budget is exhausted.
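The greedy strategy of step 2.4 can be sketched as follows (a minimal sketch; the device identifiers and price values are illustrative, not from the claim):

```python
def greedy_select(prices, budget):
    """Greedy QTA-style selection: repeatedly buy the cheapest remaining
    data model until the requester's budget is exhausted.
    `prices` maps device id -> quoted sales price (illustrative names)."""
    remaining = dict(prices)
    chosen, spent = [], 0.0
    while remaining:
        dev = min(remaining, key=remaining.get)   # lowest-priced data model
        price = remaining.pop(dev)                # delete it from the price vector
        if spent + price > budget:                # budget would be exceeded: stop
            break
        chosen.append(dev)
        spent += price                            # update the requester's budget
    return chosen, spent

picked, cost = greedy_select({"d1": 5.0, "d2": 2.0, "d3": 4.0, "d4": 9.0},
                             budget=10.0)
# d2 (2.0) then d3 (4.0) are bought; d1 (5.0) would exceed the budget, so stop.
```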
4. The federated learning forgetting mechanism and method for data circulation according to claim 1 or 2, wherein in step three the reward mechanism is as follows:
the contribution of each of the multiple users in federated learning is evaluated via the Shapley value, and rewards are allocated in proportion to each party's contribution; the Shapley value of each participant is computed as

$\phi_i(v) = \frac{1}{|\Pi|} \sum_{\pi \in \Pi} \left[ v(S_{\pi,i} \cup \{i\}) - v(S_{\pi,i}) \right]$

wherein $\pi \in \Pi$ is a permutation of all participants, $S_{\pi,i}$ is the coalition of participants that precede $i$ in the permutation $\pi$, and $v$ is the cost function; the Shapley value of participant $i$ is understood as $i$'s expected marginal contribution over all orders of joining the federation, i.e. enumerating, over all possible joining orders, the expected value gain that participant $i$ brings to the federation, taken as $i$'s contribution to the federation, by which a reward mechanism can be built; if the total contribution of all participants is $\Phi$, the contribution of the $i$-th participant is $\phi_i$, and the total transaction amount of the data circulation market is $O$, then the reward of the $i$-th user is

$R_i = \frac{\phi_i}{\Phi} \cdot O$

wherein $\phi_i$ is the participant's contribution score and satisfies $\sum_i \phi_i = \Phi$;
the process by which a participant joins the federated collaboration is: the participant submits an application; the market computes the total contribution value of each existing participant; the Shapley value is used to compute the individual contribution of the participant after it joins the market; and the participant's share is multiplied by the total reward in proportion, yielding the reward obtained after joining the market.
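The permutation-form Shapley value and the proportional reward split can be sketched as follows (a toy example; the additive per-participant "data quality" cost function and the market amount are illustrative assumptions):

```python
from itertools import permutations

def shapley(players, value):
    """Exact permutation-form Shapley value: average marginal contribution
    of each player over all joining orders. `value` maps a frozenset of
    players to the coalition's worth (the cost function v)."""
    phi = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        before = frozenset()
        for p in order:
            phi[p] += value(before | {p}) - value(before)
            before = before | {p}
    return {p: s / len(perms) for p, s in phi.items()}

# Toy cost function: coalition worth = 10 per unit of each member's data quality.
quality = {"A": 3.0, "B": 1.0, "C": 2.0}
v = lambda coalition: 10.0 * sum(quality[p] for p in coalition)

phi = shapley(list(quality), v)
total = sum(phi.values())                              # total contribution Phi
rewards = {p: phi[p] / total * 600.0 for p in phi}     # O = 600 market amount
```

For an additive cost function like this one, each participant's Shapley value equals its standalone worth, so A, B, C receive 300, 100, 200 respectively.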
5. The federated learning forgetting mechanism and method for data circulation according to claim 3, wherein in step three the reward mechanism is as follows:
the contribution of each of the multiple users in federated learning is evaluated via the Shapley value, and rewards are allocated in proportion to each party's contribution; the Shapley value of each participant is computed as

$\phi_i(v) = \frac{1}{|\Pi|} \sum_{\pi \in \Pi} \left[ v(S_{\pi,i} \cup \{i\}) - v(S_{\pi,i}) \right]$

wherein $\pi \in \Pi$ is a permutation of all participants, $S_{\pi,i}$ is the coalition of participants that precede $i$ in the permutation $\pi$, and $v$ is the cost function; the Shapley value of participant $i$ is understood as $i$'s expected marginal contribution over all orders of joining the federation, i.e. enumerating, over all possible joining orders, the expected value gain that participant $i$ brings to the federation, taken as $i$'s contribution to the federation, by which a reward mechanism can be built; if the total contribution of all participants is $\Phi$, the contribution of the $i$-th participant is $\phi_i$, and the total transaction amount of the data circulation market is $O$, then the reward of the $i$-th user is

$R_i = \frac{\phi_i}{\Phi} \cdot O$

wherein $\phi_i$ is the participant's contribution score and satisfies $\sum_i \phi_i = \Phi$;
the process by which a participant joins the federated collaboration is: the participant submits an application; the market computes the total contribution value of each existing participant; the Shapley value is used to compute the individual contribution of the participant after it joins the market; and the participant's share is multiplied by the total reward in proportion, yielding the reward obtained after joining the market.
6. The federated learning forgetting mechanism and method for data circulation according to claim 1, 2 or 5, wherein in step four the penalty mechanism is as follows:
after a participant submits a revocation application, the feature vector of each user is computed; the influence of each feature on the model's prediction result is computed, giving an importance ranking of the features; the compensation for each user's data is computed according to that data's contribution to the model's prediction result, the compensation being proportional to the contribution; the contribution of each user's data to the prediction result is multiplied by the change in model accuracy, yielding the influence of each user's data on model accuracy; corresponding compensation is then computed according to the influence, the compensation being a fixed amount or a proportion; the calculation formula is

$W_i = \frac{\Delta P_i}{\sum_{j=1}^{N} \Delta P_j}$

wherein $W_i$ represents the influence factor of the participant; $N$ represents the number of all participants; $C$ represents the amount of compensation corresponding to each unit of influence; $P_i$ represents the accuracy of the model when the user's data is not used; $\Delta P_i$ represents the influence of the given user's data; and $\Delta P_j$ represents the change in model accuracy after the user's data is used; the compensation paid to user $i$ is $C \cdot W_i$.
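The influence-normalisation step of the penalty mechanism can be sketched as follows (a minimal sketch under the assumption that the compensation is the user's normalised accuracy-change share of a base amount C; user ids and values are illustrative):

```python
def compensation(delta_p, base_amount):
    """Penalty-mechanism sketch: each withdrawing user's influence factor W
    is its accuracy change normalised over all users' accuracy changes,
    and the compensation is that share of a base amount C."""
    total = sum(delta_p.values())
    w = {u: d / total for u, d in delta_p.items()}   # influence factors W_i
    return {u: base_amount * w[u] for u in delta_p}, w

# Three users whose data changed model accuracy by 2%, 6%, and 2%.
pay, w = compensation({"u1": 0.02, "u2": 0.06, "u3": 0.02}, base_amount=100.0)
# u2 contributed the most accuracy and so receives the largest compensation.
```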
7. The federated learning forgetting mechanism and method for data circulation according to claim 1, 2 or 5, wherein step five proceeds as follows:
after t rounds of federated training, each client's goal is to learn a local model that minimizes the empirical risk, which is formulated as:

$\min_{w} F_i(w), \quad F_i(w) = \frac{1}{n_i} \sum_{j=1}^{n_i} L(w; (x_j, y_j))$

wherein $L(w; (x_j, y_j))$ is the prediction loss on sample $(x_j, y_j)$; each client locally performs several mini-batch stochastic gradient descent steps to search for a model with small empirical loss, and $F_i(w)$ is the average sample prediction loss of the client to be forgotten;
one major way to implement the forgetting mechanism is to reverse the learning process; that is, during forgetting the client does not learn model parameters that minimize the empirical loss, but instead strives to learn model parameters that maximize it;
a loss range is defined, and the process stops automatically once the loss exceeds a given threshold; the average of the other clients' models is used as a reference model, which the target client $i$ computes locally as

$w_{ref} = \frac{1}{N-1} \sum_{j \neq i} w_j$;

for the client $i$ to be forgotten, the variation range of $F_i(w)$ is limited to the $\ell_2$-norm ball of radius $\delta$ around $w_{ref}$, giving the constrained problem

$\max_{w} F_i(w) \quad \text{s.t.} \quad \|w - w_{ref}\|_2 \le \delta$;

projected gradient ascent is used, with $P_\Omega$ denoting the projection operator onto the ball $\Omega = \{w : \|w - w_{ref}\|_2 \le \delta\}$; then, for a given step size $\eta_\mu$, client $i$ iteratively updates

$w^{t+1} = P_\Omega\left(w^{t} + \eta_\mu \nabla F_i(w^{t})\right)$;

the client's data is continuously updated and iterated to obtain the optimal model; when the verification accuracy falls below a predetermined threshold $\tau$, early stopping is performed.
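The constrained gradient-ascent forgetting step can be sketched as follows (a toy quadratic loss stands in for the client's loss $F_i$; the threshold-based early stopping here uses a loss cap, a simplifying assumption for the claim's accuracy-threshold test):

```python
import numpy as np

def project(w, w_ref, delta):
    """Project w onto the l2 ball of radius delta centred at the reference model."""
    diff = w - w_ref
    norm = np.linalg.norm(diff)
    return w if norm <= delta else w_ref + diff * (delta / norm)

def unlearn(w_ref, grad_fn, loss_fn, loss_cap, delta=1.0, eta=0.1, steps=50):
    """Forget client i by gradient ASCENT on its loss F_i, constrained to the
    delta-ball around the other clients' average model; stop early once the
    loss exceeds the given cap."""
    w = w_ref.copy()
    for _ in range(steps):
        w = project(w + eta * grad_fn(w), w_ref, delta)  # ascend, then project
        if loss_fn(w) > loss_cap:
            break
    return w

# Toy loss F_i(w) = ||w - c||^2: ascent drives w away from the client's optimum c.
c = np.array([2.0, 0.0])
grad = lambda w: 2.0 * (w - c)
loss = lambda w: float(np.sum((w - c) ** 2))
w_ref = np.zeros(2)                      # average of the other clients' models
w_out = unlearn(w_ref, grad, loss, loss_cap=8.0, delta=1.0, eta=0.1)
```

The projection keeps the forgotten model within distance $\delta$ of the reference model, so the ascent raises the target client's loss without drifting arbitrarily far from what the remaining clients agree on.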
CN202310414143.6A 2023-04-18 2023-04-18 Federal learning forgetting mechanism and method for data circulation Pending CN116485430A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310414143.6A CN116485430A (en) 2023-04-18 2023-04-18 Federal learning forgetting mechanism and method for data circulation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310414143.6A CN116485430A (en) 2023-04-18 2023-04-18 Federal learning forgetting mechanism and method for data circulation

Publications (1)

Publication Number Publication Date
CN116485430A true CN116485430A (en) 2023-07-25

Family

ID=87224504

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310414143.6A Pending CN116485430A (en) 2023-04-18 2023-04-18 Federal learning forgetting mechanism and method for data circulation

Country Status (1)

Country Link
CN (1) CN116485430A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117172338A (en) * 2023-11-02 2023-12-05 数据空间研究院 Contribution evaluation method in longitudinal federal learning scene
CN117172338B (en) * 2023-11-02 2024-02-02 数据空间研究院 Contribution evaluation method in longitudinal federal learning scene
CN117892843A (en) * 2024-03-18 2024-04-16 中国海洋大学 Machine learning data forgetting method based on game theory and cryptography

Similar Documents

Publication Publication Date Title
US10970733B2 (en) Systems and methods for coupon issuing
CN116485430A (en) Federal learning forgetting mechanism and method for data circulation
Satzger et al. Auction-based crowdsourcing supporting skill management
JP6967116B1 (en) Electronic ticket management system and program
Yassine et al. Double auction mechanisms for dynamic autonomous electric vehicles energy trading
CN107103408A (en) Complex task distribution method under a kind of mass-rent environment
Cheng Reverse auction with buyer–supplier negotiation using bi-level distributed programming
Sueyoshi An agent-based approach equipped with game theory: strategic collaboration among learning agents during a dynamic market change in the California electricity crisis
JP2022535636A (en) Financial product recommendation method, device, electronic device and program
CN107146158A (en) A kind of electronic data processing method and device
Wang et al. Integration of simulation‐based cost model and multi‐criteria evaluation model for bid price decisions
CN112182399A (en) Multi-party security calculation method and device for federated learning
Guo et al. Dissolving the segmentation of a shared mobility market: A framework and four market structure designs
KR102176108B1 (en) Differential fee payment system through professional experts
Chen et al. Stability and convergence in matching processes for shared mobility systems
CN110852771A (en) Multi-level architecture sale system and method for establishing multi-level architecture sale system
Huang et al. An Online Inference-Aided Incentive Framework for Information Elicitation Without Verification
Lalith et al. Distributed memory parallel implementation of agent-based economic models
CN113761070A (en) Block chain intelligence data sharing excitation method, system, equipment and medium
Tang et al. Intelligent Agents for Auction-based Federated Learning: A Survey
KR102302785B1 (en) Service method for point market platform
CN117391359B (en) Method, device, electronic equipment and storage medium for resource scheduling
Brânzei et al. Online Learning in Multi-unit Auctions
Xu et al. Multi-Agent Deep Reinforcement Learning for Decentralized Proactive Transshipment
US20200184502A1 (en) Multilevel structure sales system and method for establishing the sales system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination