CN115358831A - User bidding method and device based on multi-agent reinforcement learning algorithm under federal learning - Google Patents
- Publication number
- CN115358831A CN115358831A CN202211120985.2A CN202211120985A CN115358831A CN 115358831 A CN115358831 A CN 115358831A CN 202211120985 A CN202211120985 A CN 202211120985A CN 115358831 A CN115358831 A CN 115358831A
- Authority
- CN
- China
- Prior art keywords
- agent
- bidding
- uploaded
- model
- sample client
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/08—Auctions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention discloses a user bidding method and device based on a multi-agent reinforcement learning algorithm under federal learning, wherein the method comprises the following steps: a learning task issued by the federal learning platform is obtained; sample clients upload bidding information to the federal platform by using a reinforcement learning algorithm; after the platform selects sample clients through the algorithm, the global sharing model is issued to the selected sample clients; the selected sample clients perform local training and upload update parameters; and the platform aggregates the uploaded update model parameters according to an aggregation algorithm and updates the model parameters of the global model. The method realizes dynamic bidding by users participating in federal learning while relieving overfitting of the model, and solves the lack of fairness in federal learning and the model overfitting caused in existing auction-based incentive mechanisms by the fact that, after a user submits its bidding strategy, the strategy cannot be changed in the subsequent training process.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a user bidding method and device based on a multi-agent reinforcement learning algorithm under federal learning.
Background
With increasing privacy concerns and the introduction of relevant policies, it becomes increasingly difficult for traditional machine learning to collect data for centralized training. Federal learning is the most promising deep learning paradigm because it protects user privacy by not requiring users to upload their raw data. However, the participating users in federal learning consume a large amount of resources, such as computation and communication, during the training process, which means that selfish participating users will not devote themselves fully to the learning task without sufficient return. Meanwhile, the underlying network structure of federal learning is complex, and node resources are limited and heterogeneous; if the federal initiator has no corresponding incentive and selection measures, huge communication overhead is incurred, network resources are wasted, and the popularization of federal learning is hindered.
In incentive mechanisms of the related art, the selection of participating users and the sharing of profits can be performed using game-theoretic techniques, in particular by incorporating an auction method into federal learning. In one implementation, high-quality participating users can be selected through a lightweight, multidimensional incentive scheme; in another implementation, the learning quality of participating users can be integrated into federal learning by setting up an incentive mechanism framework for quality-aware incentives and model aggregation. However, existing auction-based incentive mechanisms are almost all static: these methods assume that participating users determine their own strategies during the auction and then no longer modify them as the platform's behavior changes, which merely maximizes the utility of the platform or social welfare but fails to maximize the utility of both the platform and the participating users. In such static methods, once a participating user determines its bidding information during the federal learning auction, its strategy cannot be changed in subsequent training, and whether selected or not, the user can only wait to be selected. A second category of existing methods, dynamic bidding, assumes that user information is transparent, that is, that each user knows the private information of the other users, which cannot be realized in practical applications.
Disclosure of Invention
The invention provides a user bidding method and device based on a multi-agent reinforcement learning algorithm under federal learning, which introduces multi-agent reinforcement learning into the incentive mechanism of federal learning, thereby solving the problem in prior-art auction-based incentive mechanisms that a strategy cannot be changed in the subsequent training process, which causes a loss of fairness in federal learning. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a user bidding method under federal learning based on a multi-agent reinforcement learning algorithm, where the method includes:
the method comprises the steps of obtaining a learning task issued by a federal learning platform, selecting a sample client from a client set based on the learning task and bidding information uploaded by the client set participating in federal learning, and issuing a global sharing model to the sample client;
receiving an update model parameter uploaded by each sample client, wherein the update model parameter is formed by using a multi-agent reinforcement learning algorithm to output bidding information to be submitted of the sample client in the current round before training of the sample client is started, and training a global sharing model according to configuration in the bidding information to be submitted after the sample client is selected;
aggregating the update model parameters uploaded by each sample client, and updating the model parameters in the global sharing model by using the aggregated update model parameters;
and if the updated global sharing model reaches the preset model precision in the test task, determining that the learning task issued by the federal learning platform is completed; otherwise, repeatedly executing the step of updating the model parameters in the global sharing model over multiple rounds, until the updated global sharing model reaches the preset model precision in the test task.
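For illustration, the four steps of the first aspect can be sketched as the following platform-side loop. This is a minimal sketch under assumptions, not the claimed implementation; all helper objects and method names (submit_bid, select_sample_clients, local_train, aggregate, evaluate) are hypothetical placeholders introduced here.

```python
# Minimal sketch of the platform-side loop; all helper objects and
# method names are hypothetical placeholders, not the patent's API.

def run_learning_task(platform, clients, global_model, target_accuracy, max_rounds=100):
    for round_idx in range(max_rounds):
        # Step 1: collect bidding information and select sample clients.
        bids = {c.client_id: c.submit_bid(round_idx) for c in clients}
        sample_clients = platform.select_sample_clients(bids)

        # Step 2: issue the global sharing model; each selected client trains
        # locally according to the configuration in its bid and uploads an update.
        updates = [c.local_train(global_model, bids[c.client_id]) for c in sample_clients]

        # Step 3: aggregate the uploaded update parameters into the global model.
        global_model = platform.aggregate(global_model, updates)

        # Step 4: stop once the preset precision is reached on the test task.
        if platform.evaluate(global_model) >= target_accuracy:
            break
    return global_model
```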
Optionally, the process of outputting, by the sample client, bidding information to be submitted of the sample client in the current round by using a multi-agent reinforcement learning algorithm includes:
and taking the sample client as an intelligent agent, observing the self historical state information in the federal learning environment by the intelligent agent, and outputting the bidding information to be submitted of the sample client in the current turn by using the historical state information.
Optionally, the multi-agent reinforcement learning algorithm includes a policy engine and an experience pool, and the process in which the sample client serves as an agent, the agent observes its own historical state information in the federal learning environment, and the historical state information is used to output the bidding information to be submitted by the sample client in the current turn includes:
the sample client side is used as an intelligent agent, historical task state information observed by each intelligent agent in a federated learning environment is stored by using an experience pool in the multi-intelligent-agent reinforcement learning algorithm, and the historical task state information at least comprises whether the intelligent agent is selected in a historical turn, a historical resource value, a historical provided data amount and a historical unit resource amount;
historical task state information observed by the intelligent agent in the federal learning environment is used as state information of the intelligent agent in the current round and input to the policy engine in the multi-agent reinforcement learning algorithm, and the bidding information to be submitted by the intelligent agent in the current round is output.
Optionally, after the bidding information to be submitted by the agent in the current round is output by inputting the historical task state information observed by the agent in the federal learning environment, as the state information of the agent in the current round, into the policy engine in the multi-agent reinforcement learning algorithm, the method further comprises:
and calculating the income resources fed back by the federal learning environment aiming at the intelligent agent in the current turn, and storing the historical state of the environment observed by the intelligent agent in the current turn, the bidding information to be submitted, the environmental state after the bidding information to be submitted is uploaded, and the income resources fed back to the intelligent agent by the federal learning environment aiming at the bidding information to be submitted uploaded in the current turn by using an experience pool in the multi-intelligent-agent reinforcement learning algorithm.
Optionally, the calculating revenue resources fed back by the federal learning environment in the current turn for the intelligent agent includes:
respectively acquiring resource parameters related to the intelligent agent in the bidding process based on the bidding information to be uploaded of the intelligent agent in the current round;
and inputting resource parameters related to the intelligent agent in the bidding process into a pre-constructed revenue function to obtain revenue resources fed back by the intelligent agent in the current turn in the federal learning environment.
Optionally, each sample client is configured with a policy engine, the policy engine includes an action network and a value network, and the process of outputting the bidding information to be submitted by the agent in the current round by inputting the historical task state information observed by the agent in the federal learning environment, as the state information of the agent in the current round, into the policy engine includes:
historical task state information observed by the intelligent agent in the federal learning environment is input into the action network of the policy engine as state information of the intelligent agent in the current round, and bidding information to be submitted of the intelligent agent in the current round is output, so that the bidding information to be uploaded of the intelligent agent in the current training round is obtained;
the state information of the intelligent agent in the current round and the competitive bidding information to be uploaded by the intelligent agent in the current round are input into the value network in the policy engine, and the competitive bidding information to be uploaded is evaluated to obtain an evaluation score of the competitive bidding information to be uploaded;
the action network is trained by using the evaluation score of the competitive bidding information to be uploaded, the network parameters of the action network are updated through gradient rise, the value network is trained by using the evaluation score of the competitive bidding information to be uploaded and income resources actually fed back by the intelligent agent, and the network parameters of the value network are updated through a time sequence difference method.
Optionally, the aggregating the update model parameters uploaded by each sample client, and updating the model parameters in the global sharing model by using the aggregated update model parameters includes:
respectively calculating the ratio of the data volume of each sample client to the data volume of all the sample clients to obtain the data volume proportion corresponding to each sample client;
and after the data volume ratio corresponding to each sample client is multiplied by the update model parameters uploaded by the corresponding sample client, aggregating the update model parameters corresponding to all the sample clients, and updating the model parameters in the global sharing model by accumulating the aggregated update model parameters.
In a second aspect, an embodiment of the present invention provides a user bidding device under federal learning based on a multi-agent reinforcement learning algorithm, where the device includes:
the system comprises an acquisition unit, a resource allocation unit and a resource allocation unit, wherein the acquisition unit is used for acquiring a learning task issued by a federal learning platform, selecting a sample client from a client set based on the learning task and bidding information uploaded by the client set participating in federal learning, and issuing a global sharing model to the sample client;
the system comprises a receiving unit, a calculating unit and a processing unit, wherein the receiving unit is used for receiving an update model parameter uploaded by each sample client, and the update model parameter is formed by using a multi-agent reinforcement learning algorithm to output bidding information to be submitted of the sample client in the current turn before training of the sample client, and training a global sharing model according to configuration in the bidding information to be submitted after the sample client is selected;
the aggregation unit is used for aggregating the update model parameters uploaded by each sample client and updating the model parameters in the global sharing model by using the aggregated update model parameters;
and the selecting unit is used for judging that the learning task issued by the Federal learning platform is completed if the updated global sharing model reaches the preset model precision in the test task, otherwise, repeatedly executing the step of updating the model parameters in the global sharing model in multiple turns so as to ensure that the updated global sharing model reaches the preset model precision in the test task.
Optionally, the apparatus further comprises:
the output unit is used for outputting the process of the competitive bidding information to be submitted of the sample client in the current round by the sample client by using a multi-agent reinforcement learning algorithm;
the output unit is specifically configured to use the sample client as an agent, and the agent observes historical state information of the agent in a federal learning environment and outputs bidding information to be submitted of the sample client in a current round by using the historical state information.
Optionally, the multi-agent reinforcement learning algorithm includes a policy engine and an experience pool, and the output unit includes:
the storage module is used for storing historical task state information observed by each intelligent agent in a federated learning environment by using an experience pool in the multi-intelligent-agent reinforcement learning algorithm with the sample client as the intelligent agent, wherein the historical task state information at least comprises whether the intelligent agent is selected in a historical turn, a historical resource value, a historical provided data volume and a historical unit resource volume;
and the output module is used for outputting bidding information to be submitted of the intelligent agent in the current round by inputting the historical task state information observed by the intelligent agent in the federal learning environment as the state information of the intelligent agent in the current round into the policy maker in the multi-intelligent-agent reinforcement learning algorithm.
Optionally, the output unit further includes:
and the calculation module is used for inputting the historical task state information observed by the intelligent agent in the federal learning environment as the state information of the intelligent agent in the current turn into the policy maker in the multi-intelligent-agent reinforcement learning algorithm, outputting the competitive bidding information to be submitted by the intelligent agent in the current turn, calculating the income resources fed back by the federal learning environment aiming at the intelligent agent in the current turn, and storing the historical state of the environment observed by the intelligent agent in the current turn, the competitive bidding information to be submitted, the environmental state after the competitive bidding information to be submitted is uploaded and the income resources fed back to the intelligent agent by the federal learning environment aiming at the competitive bidding information to be submitted in the current turn by using an experience pool in the multi-intelligent-agent reinforcement learning algorithm.
Optionally, the computing module is specifically configured to obtain resource parameters related to the intelligent agents in the bidding process based on the bidding information to be uploaded of the intelligent agents in the current round;
the calculating module is specifically configured to input resource parameters related to the intelligent agent in the bidding process into a pre-constructed revenue function, so as to obtain revenue resources fed back by the intelligent agent in the current turn in the federal learning environment.
Optionally, each sample client is configured with a policy engine, the policy engine includes an action network and a value network, and the output module is specifically configured to output bidding information to be submitted of the agent in the current round by inputting historical task state information observed by the agent in the federal learning environment as state information of the agent in the current round into the action network of the policy engine, so as to obtain bidding information to be uploaded of the agent in the current training round;
the output module is specifically used for inputting the state information of the intelligent agent in the current round and the bidding information to be uploaded of the intelligent agent in the current round to the value network in the policy device, evaluating the bidding information to be uploaded and obtaining the evaluation score of the bidding information to be uploaded;
the action network is trained by using the evaluation score of the competitive bidding information to be uploaded, the network parameters of the action network are updated through gradient rise, the value network is trained by using the evaluation score of the competitive bidding information to be uploaded and income resources actually fed back by the intelligent agent, and the network parameters of the value network are updated through a time sequence difference method.
Optionally, the aggregation unit includes:
the computing module is used for respectively computing the ratio of the data volume of each sample client to the data volume of all the sample clients to obtain the data volume ratio corresponding to each sample client;
and the aggregation module is used for aggregating the update model parameters corresponding to all the sample clients after multiplying the data volume ratio corresponding to each sample client by the update model parameters uploaded by the corresponding sample client, and updating the model parameters in the global sharing model by accumulating the aggregated update model parameters.
In a third aspect, an embodiment of the present invention provides a storage medium, on which executable instructions are stored, and when executed by a processor, the instructions cause the processor to implement the method described in the first aspect.
In a fourth aspect, an embodiment of the present invention provides an apparatus for user bidding under federal learning based on a multi-agent reinforcement learning algorithm, including:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of the first aspect.
As can be seen from the above, in the user bidding method and apparatus based on a multi-agent reinforcement learning algorithm under federal learning provided in the embodiments of the present invention, a learning task issued by the federal learning platform is obtained; a sample client is selected from the client set based on the learning task and the bidding information uploaded by the client set participating in federal learning; the global sharing model is issued to the sample clients; the update model parameters uploaded by each sample client are received, where the update model parameters are formed by the sample client, before training starts, using the multi-agent reinforcement learning algorithm to output its bidding information to be submitted in the current round and then, after being selected, training the global sharing model according to the configuration in the bidding information to be submitted; the update model parameters uploaded by each sample client are further aggregated, and the model parameters in the global sharing model are updated by using the aggregated update model parameters; if the updated global sharing model reaches the preset model precision in the test task, the learning task issued by the federal learning platform is determined to be completed; otherwise, the step of updating the model parameters in the global sharing model is repeatedly executed over multiple rounds until the updated global sharing model reaches the preset model precision in the test task. Therefore, compared with the auction-based incentive mechanisms in the prior art, the embodiments of the present invention can adjust the bidding information uploaded by the client by using the multi-agent reinforcement learning algorithm, thereby solving the loss of fairness in federal learning caused by bidding strategies that cannot be changed in the subsequent training process.
In addition, the technical effects that the embodiment can also realize include:
(1) Competitive bidding information uploaded by the client is adjusted based on the multi-agent reinforcement learning algorithm, so that the probability of the client being selected is increased, the fairness of participating users in federal learning is guaranteed, the suboptimal problem caused by fixed strategies is solved, and the goal of jointly maximizing the utility of the federal learning platform and the participating users is realized.
(2) The multi-agent reinforcement learning algorithm adopts centralized training and distributed execution, so that the client corresponding to each participating user can observe more states, and the stability of the agent training process is improved.
(3) The multi-agent reinforcement learning algorithm adopts an asynchronous deep reinforcement training mode, and decouples the steps of executing the learning task and updating the bidding information, so that the two can work in parallel, and the training of the model is accelerated.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below. It is to be understood that the drawings in the following description are merely exemplary of some embodiments of the invention. For a person skilled in the art, without inventive effort, further figures can be obtained from these figures.
Fig. 1 is a schematic flow chart of a user bidding method under federal learning based on a multi-agent reinforcement learning algorithm according to an embodiment of the present invention;
FIG. 2 is a block diagram of a process of outputting bidding information to be submitted by a sample client in a current round by a multi-agent reinforcement learning algorithm according to an embodiment of the present invention;
fig. 3 is a block diagram illustrating a user bidding apparatus under federal learning based on a multi-agent reinforcement learning algorithm according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely a few embodiments of the invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
It is to be noted that the terms "comprises" and "comprising" and any variations thereof in the embodiments and drawings of the present invention are intended to cover non-exclusive inclusions. A process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention provides a user bidding method and device based on a multi-agent reinforcement learning algorithm under federal learning, which adjusts the bidding information uploaded by a client by using the multi-agent reinforcement learning algorithm, thereby solving the lack of fairness in federal learning caused by prior-art auction-based incentive mechanisms in which a strategy cannot be changed during subsequent training. Traditional auction techniques require the private information of participating users during the auction, and the whole auction process is static: the bids of the participating users are fixed, the bidding information uploaded by a client cannot be adjusted even after a bid fails, and participating users cannot dynamically change their bidding information. As a result, federal learning suffers a loss of fairness, participating users with weak resources are rarely selected by the federal learning platform, and the resources of participating users are greatly wasted. The embodiment of the invention introduces a multi-agent reinforcement learning algorithm into the incentive mechanism of federal learning and adjusts the bidding information uploaded by the client based on this algorithm, so as to increase the probability that the client corresponding to a participating user is selected, reduce aggregation time, guarantee the fairness of participating users in federal learning, solve the suboptimal problem caused by fixed strategies, and achieve the goal of jointly maximizing the utility of the federal learning platform and the participating users.
The following provides a detailed description of embodiments of the invention.
Fig. 1 is a schematic flow chart of a user bidding method under federal learning based on a multi-agent reinforcement learning algorithm according to an embodiment of the present invention. The method may comprise the steps of:
s100: the method comprises the steps of obtaining a learning task issued by a federal learning platform, selecting a sample client from a client set based on the learning task and bidding information uploaded by the client set participating in federal learning, and issuing a global sharing model to the sample client.
The learning task issued by the federal learning platform is issued by a server corresponding to a federal issuer, and the learning task is suitable for various application scenarios related to data collection training, such as a target recognition task, a data classification task and the like. In order to ensure that the server selects a proper client to perform model training, each client uploads bidding information, and the server further selects each client based on the learning task.
The bidding information here consists of computing resources, data volume and bidding resources. Specifically, in the process of selecting a sample client from a client set based on a learning task and bidding information uploaded by the client set participating in federal learning, after receiving the bidding information of the client, a federal learning platform can perform modeling solution on the learning task to obtain a sample client and bidding resources corresponding to the sample client, wherein the modeling solution process comprises the following steps:
$s_n \in \{0, 1\}$  (2)
$t_n^{\max} \le T^{\max}$  (3)
Constraint (1) requires that the total payment made by the federal learning platform to the selected clients does not exceed the budget of the federal learning platform; constraint (2) indicates that each client is either selected or not selected, taking the value 1 if selected and 0 if not; and constraint (3) requires that the training time of a selected client cannot exceed the maximum time specified by the federal learning platform.
By solving this optimization problem, the federal learning platform picks out the sample client set and the amount of resources that it purchases from each sample client.
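For illustration, the selection step can be sketched with a simple greedy heuristic under constraints (2) and (3). Since the objective and constraint (1) are described only in words above, the scoring rule and the dictionary keys used below are assumptions, not the patent's actual solver.

```python
# Greedy illustration of the selection step; the ranking rule and keys
# ("payment", "data_size", "train_time", "client_id") are assumptions.

def select_sample_clients(bids, budget, t_max):
    feasible = [b for b in bids if b["train_time"] <= t_max]          # constraint (3)
    # Prefer clients that ask for less payment per unit of offered data.
    feasible.sort(key=lambda b: b["payment"] / max(b["data_size"], 1))
    selected, spent = [], 0.0
    for bid in feasible:
        if spent + bid["payment"] <= budget:                          # constraint (1)
            selected.append(bid["client_id"])                         # s_n = 1, constraint (2)
            spent += bid["payment"]
    return selected, spent
```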
S110: and receiving an updated model parameter uploaded by each sample client, wherein the updated model parameter is formed by using a multi-agent reinforcement learning algorithm to output bidding information to be submitted of the sample client in the current round before training of the sample client is started, and training a global sharing model according to the configuration in the bidding information to be submitted after the sample client is selected.
In the embodiment of the invention, the sample clients are the clients wishing to participate in the learning task, and only the selected sample clients can download and train the global model. Each sample client runs a multi-agent reinforcement learning algorithm. Specifically, after a selected sample client receives the global sharing model, it trains the global model according to the configuration in its own bidding information to obtain the update model parameters.
Specifically, the process in which a sample client uses the multi-agent reinforcement learning algorithm to output its bidding information to be submitted in the current round is as follows: the sample client serves as an agent, the agent observes its own historical state information in the federal learning environment, and the historical state information is used to output the bidding information to be submitted by the sample client in the current round.
The multi-agent reinforcement learning algorithm comprises a policy engine and an experience pool. In the process in which the sample client serves as an agent, observes its own historical state information in the federal learning environment, and uses that historical state information to output the bidding information to be submitted in the current round, the experience pool in the multi-agent reinforcement learning algorithm is used to store the historical task state information observed by each agent in the federal learning environment. The historical task state information corresponds to the agent's states when submitting bidding information in the past and the feedback of the federal learning environment on that past bidding information, and at least comprises whether the agent was selected in a historical turn, the historical resource value, the historical provided data amount and the historical unit resource amount. The historical task state information observed by the agent in the federal learning environment is then used as the state information of the agent in the current round and input into the policy engine in the multi-agent reinforcement learning algorithm, and the bidding information to be submitted by the agent in the current round is output.
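For illustration, the historical task state observed by one agent can be represented as a small record such as the following sketch; the field names are assumptions introduced here and are not taken from the patent text.

```python
from dataclasses import dataclass

@dataclass
class AgentObservation:
    """One agent's historical task state; field names are illustrative."""
    selected_last_round: int     # whether the agent's bid succeeded (1) or failed (0)
    last_bid_price: float        # historical resource value asked by the agent
    last_data_amount: int        # historical provided data amount
    last_unit_resource: float    # historical unit resource amount

    def to_vector(self):
        # Flattened state vector fed to the policy engine's action network.
        return [float(self.selected_last_round), self.last_bid_price,
                float(self.last_data_amount), self.last_unit_resource]
```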
It can be appreciated that the multi-agent reinforcement learning algorithm described above can learn how to map task states in the federal learning environment into bidding information, so that the clients and the platform obtain maximum resource revenue simultaneously. The basic model of the system is a Markov game; in a Markov game, all agents simultaneously select and execute their bidding information to be submitted according to the current task state (or observation) of the federal learning environment. The game is defined as a tuple $(n, S, A_1, \ldots, A_n, T, \gamma, R_1, \ldots, R_n)$, where $n$ is the number of agents; $S$ is the task state space of the multi-agent reinforcement learning algorithm, namely the historical task state information of each agent; $A_i$ is the set of bidding information that agent $i$ can submit; $T: S \times A_1 \times A_2 \times \cdots \times A_n \times S \to [0,1]$ is the state transition function, namely the probability distribution over the next task state given the current task state and the joint action; and $R_i: S \times A_1 \times A_2 \times \cdots \times A_n \times S \to [0,1]$ is the reward function of agent $i$, where $R_i(s, a_1, \ldots, a_n, s_{t+1})$ is the reward obtained by agent $i$ in task state $s_{t+1}$ after the joint action $(a_1, \ldots, a_n)$ is taken in task state $s$. The expected cumulative reward of agent $i$ may be expressed as the expectation of the discounted sum of its per-round rewards, $\mathbb{E}\bigl[\sum_{t} \gamma^{t} r_i^{t}\bigr]$.
the reward function of the agent may be expressed as:
grid; d i Is the data volume of agent i; m is i For the unit computing power of the agent, c i Is unit cost;average profit, x, for agent i to get on its own resource demand i The resources are served to the agents to their needs. Due to uncertainty in the behavior of the agent owner, for example, the agent may use it for a long time for other things, resulting in a device with little remaining resources available for task training, where x is used i Defined as random variables within a certain intervalWherein x is i Following the probability distribution function F (x) i )。
Further, in order to know in real time the profit fed back to the sample client in each turn, after the bidding information to be submitted by the agent in the current turn is output, the revenue resources fed back by the federal learning environment for the agent in the current turn can be calculated. The experience pool in the multi-agent reinforcement learning algorithm is then used to store the historical state of the environment observed by the agent in the current turn, the bidding information to be submitted, the environmental state after the bidding information to be submitted is uploaded, and the revenue resources fed back by the federal learning environment to the agent for the bidding information submitted in the current turn. Specifically, the resource parameters related to the agent in the bidding process can be obtained based on the bidding information to be uploaded by the agent in the current round, and these resource parameters are then input into a pre-constructed revenue function to obtain the revenue resources fed back by the federal learning environment for the agent in the current turn.
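For illustration, the experience pool and the revenue calculation described above can be sketched as follows. The revenue function itself is not reproduced in this text, so it is passed in as a caller-supplied callable; all names and the buffer capacity are illustrative assumptions.

```python
import random
from collections import deque

class ExperiencePool:
    """Fixed-size buffer for (state, bid, next_state, reward) tuples."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, bid, next_state, reward):
        self.buffer.append((state, bid, next_state, reward))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

def record_round(pool, state, bid, next_state, resource_params, revenue_fn):
    # revenue_fn stands for the pre-constructed revenue function mentioned in
    # the text; its exact form is not given here, so the caller supplies it.
    reward = revenue_fn(**resource_params)
    pool.store(state, bid, next_state, reward)
    return reward
```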
Each sample client is provided with a policy engine, and the policy engine comprises an action network and a value network. Specifically, in the process of outputting the bidding information to be submitted by the agent in the current round, the historical task state information observed by the agent in the federal learning environment is used as the state information of the agent in the current round and input into the action network of the policy engine, and the bidding information to be submitted by the agent in the current round is output, thereby obtaining the bidding information to be uploaded by the agent in the current training round. The state information of the agent in the current turn and the bidding information to be uploaded by the agent in the current turn are then input into the value network in the policy engine, and the bidding information to be uploaded is evaluated to obtain its evaluation score. The action network is trained by using the evaluation score of the bidding information to be uploaded, and the network parameters of the action network are updated through gradient ascent; the value network is trained by using the evaluation score of the bidding information to be uploaded and the revenue resources actually fed back to the agent, and the network parameters of the value network are updated through a temporal-difference method.
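For illustration, one policy engine as described above can be sketched with an action network that maps the agent's state to a bid and a value network that scores state-bid pairs, the former updated by gradient ascent on the value network's score and the latter toward a temporal-difference target. PyTorch is used only for illustration; the layer sizes, activation choices and batch layout are assumptions.

```python
import torch
import torch.nn as nn

class ActionNetwork(nn.Module):
    """Maps the agent's observed state to a bid (the action); sizes are assumptions."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, action_dim), nn.Sigmoid())
    def forward(self, state):
        return self.net(state)

class ValueNetwork(nn.Module):
    """Scores a (state, bid) pair with a single evaluation value."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def update_policy_engine(actor, critic, actor_opt, critic_opt,
                         target_actor, target_critic, batch, gamma=0.99):
    state, action, reward, next_state = batch
    # Value network: temporal-difference update toward r + gamma * Q'(s', mu'(s')).
    with torch.no_grad():
        td_target = reward + gamma * target_critic(next_state, target_actor(next_state))
    critic_loss = nn.functional.mse_loss(critic(state, action), td_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Action network: gradient ascent on the value network's evaluation score,
    # implemented as gradient descent on its negative.
    actor_loss = -critic(state, actor(state)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```

The target_actor and target_critic above correspond to the target networks paired with the main networks in the later description; how they are refreshed is sketched after the algorithm listing below.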
Specifically, in an actual application scenario, it can be assumed that at a certain time step t there are m sample clients and a task initiator in a certain area, where one time step t corresponds to one round of the federal learning process: the client set submits task bids, the federal learning platform selects clients, and the selected sample clients perform local training and upload updated model parameters. A task bid comprises the bidding information of the sample client (data volume and computing resources) and the payment it expects to obtain. In federal learning, each sample client serves as an agent and is provided with a reinforcement learning policy engine, which is built from a multilayer perceptron in deep learning and comprises an input layer, a hidden layer and an output layer. The policy engine is represented as follows: s_t is the state of the federal learning environment at time t, which includes the state of each agent and the state of the federal learning platform. In each time slot t, the observation space of agent i consists of: the price provided by the agent in the previous round; the bidding result of the previous round, which belongs to {0, 1}, where 0 indicates that the bid failed and 1 indicates that the bid succeeded; the unit computational resources provided by the agent, which are related to the agent's own resource requirements, since the agent will not necessarily allocate all of its computational resources to the training task during the training time; and the amount of data provided by the agent in the previous round. Before the current training round begins, agent i observes this state information of the previous learning task and inputs the state observed from the federal learning environment into the action network, and the action network outputs a strategy through calculation; this strategy is the bidding information to be submitted by the agent in the current round, and it is an array containing the four attributes of the bidding information. After the user takes the action in the current turn, the federal learning environment feeds back a reward to the agent; each agent has its own reward function, and in the embodiment of the invention the revenue resources of the agent in the current turn can be calculated by using the agent's reward function.
Exemplarily, fig. 2 is a block diagram of the process in which a sample client outputs its bidding information to be submitted in the current round by using the multi-agent reinforcement learning algorithm. Each policy engine comprises an action network and a value network, and the action network and the value network each comprise a main network and a target network. The specific algorithm process, in conjunction with fig. 2, is as follows:
For episode = 1 to M, iterate:
    initialize the action space;
    For t = 1 to T, iterate:
        a) each agent selects its action according to its action network and the current state s_t;
        b) execute the joint action a = (a_1, ..., a_N), observe the reward r and the new state s_{t+1};
        c) put (s_t, a_t, r_t, s_{t+1}) into the experience pool D;
        d) s_t ← s_{t+1};
        For agent i = 1 to N, iterate:
            randomly sample a small batch of stored samples (X_j, a_j, r_j, X'_j) from the experience pool;
            update the action network by gradient ascent;
    After all updates are completed, update the target network of each agent i: θ'_i ← τθ_i + (1 − τ)θ'_i.
S120: and aggregating the update model parameters uploaded by each sample client, and updating the model parameters in the global sharing model by using the aggregated update model parameters.
Specifically, the ratio of the data volume of each sample client to the data volume of all the sample clients can be calculated respectively to obtain the data volume proportion corresponding to each sample client, the update model parameters corresponding to all the sample clients are aggregated after the data volume proportion corresponding to each sample client is multiplied by the update model parameters uploaded by the corresponding sample client, and the model parameters in the global sharing model are updated by accumulating the aggregated update model parameters.
It can be understood that the model parameters of the global sharing model issued to each sample client are the same, while the model parameters obtained after each sample client trains the global sharing model according to the configuration in its output bidding information to be submitted are different. Here, each sample client obtains local model parameters after training the global sharing model, the update model parameters are obtained as the difference between the local model parameters and the model parameters of the issued global sharing model, and the federal learning platform then updates the model parameters in the global sharing model accordingly.
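For illustration, the data-volume-weighted aggregation of these uploaded differences can be sketched as follows; the sign convention (local parameters minus issued global parameters) and the NumPy representation are assumptions made for the sketch.

```python
import numpy as np

def client_delta(local_params, issued_global_params):
    # Uploaded update: difference between the locally trained parameters and
    # the global parameters issued for this round (sign convention assumed).
    return local_params - issued_global_params

def apply_deltas(global_params, deltas, data_sizes):
    # Weight each delta by the client's share of the total data volume,
    # accumulate, and add the result onto the global sharing model's parameters.
    total = float(sum(data_sizes))
    accumulated = np.zeros_like(global_params, dtype=float)
    for delta, n in zip(deltas, data_sizes):
        accumulated += (n / total) * delta
    return global_params + accumulated
```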
S130: and if the updated global sharing model reaches the preset model precision in the test task, judging that the learning task issued by the federal learning platform is completed, otherwise, repeatedly executing the step of updating the model parameters in the global sharing model in multiple turns so as to ensure that the updated global sharing model reaches the preset model precision in the test task.
In the embodiment of the invention, considering the huge exploration space of the agents, a multi-agent reinforcement learning algorithm with centralized training and distributed execution is adopted as the framework. Each sample client serves as an agent, and each agent is provided with a policy engine consisting of an action network and a value network; each action network and each value network in turn consists of two networks (a main network and a target network) used for training and updating. The agent observes the task state of the current round, for example whether it was selected, the historical price, the historical provided data amount and the historical unit resource amount, and this task state is used as the input of the action network in the policy engine; the action network gives the action of the current round, namely the bidding information to be submitted in the current round, while the value network of each agent takes the local states observed and the actions made by the agents as input and scores the action output by the agent. Specifically, before each federal learning training round begins, each agent observes its own historical information s (historical bidding result, historical computing resources, historical data amount and historical bid) as its state and inputs it into the policy engine; the policy engine outputs the action a of the agent, namely the bidding information of the user's current training round; the user submits the bidding information to the federal learning platform, namely the environment; the federal learning platform selects suitable sample clients to maximize its own profit; the federal learning environment feeds back a reward value r to each agent and transitions to the next state s'; and the experience pool stores the tuple (s, a, s', r). When the experience pool can no longer take in new data, the policy engine starts training. In the embodiment of the present invention, centralized training with the idea of distributed execution is specifically adopted to train the policy engine. Centralized training may be expressed as follows: first, the action network in each agent's policy engine selects an action a according to the current state, and then the value network calculates a Q value for the state-action pair as feedback on action a to the action network; the value network is trained on the estimated and actual Q values, and the action network updates its strategy according to the feedback of the value network. In order to obtain a more accurate Q value, the value network in the policy engine has access to the actions and states of all agents during the training process; the value network updates its own parameters through a temporal-difference method, and the parameters of the action network are then updated through gradient ascent. Distributed execution may be expressed as follows: after the centralized training is finished, each agent executes its policy according to the state distribution it currently observes. After being trained for a sufficiently long time, the policy engine enters a convergence state, finally achieving an optimal real-time bidding effect.
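For illustration, the difference between centralized training and distributed execution can be sketched as follows: during training the value network receives the observations and actions of all agents, while at execution time each action network only needs its own local observation. The function names are illustrative and build on the network sketches given earlier.

```python
import torch

def centralized_critic_input(observations, actions):
    """During centralized training, the value network sees every agent's
    observation and action concatenated into one input vector."""
    return torch.cat([torch.cat(observations, dim=-1),
                      torch.cat(actions, dim=-1)], dim=-1)

def decentralized_act(action_network, own_observation):
    # Distributed execution: each agent acts from its own local observation only.
    with torch.no_grad():
        return action_network(own_observation)
```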
In the user bidding method under federal learning based on a multi-agent reinforcement learning algorithm provided by the embodiment of the invention, a learning task issued by the federal learning platform is obtained; a sample client is selected from the client set based on the learning task and the bidding information uploaded by the client set participating in federal learning; the global sharing model is issued to the sample clients; the update model parameters uploaded by each sample client are received, where the update model parameters are formed by the sample client, before training starts, using the multi-agent reinforcement learning algorithm to output its bidding information to be submitted in the current turn and then, after being selected, training the global sharing model according to the configuration in the bidding information to be submitted; the update model parameters uploaded by each sample client are further aggregated, and the model parameters in the global sharing model are updated by using the aggregated update model parameters; if the updated global sharing model reaches the preset model precision in the test task, it is determined that the learning task issued by the federal learning platform is completed; otherwise, the step of updating the model parameters in the global sharing model is repeatedly executed over multiple rounds until the updated global sharing model reaches the preset model precision in the test task. Therefore, compared with the auction-based incentive mechanisms in the prior art, the embodiment of the invention can adjust the bidding information uploaded by the client by using the multi-agent reinforcement learning algorithm to increase the probability of the client being selected, thereby solving the loss of fairness in federal learning caused by bidding strategies that cannot be changed in the subsequent training process.
Based on the above embodiment, another embodiment of the present invention provides a user bidding apparatus under federal learning based on multi-agent reinforcement learning algorithm, as shown in fig. 3, the apparatus includes:
the obtaining unit 20 may be configured to obtain a learning task issued by a federal learning platform, select a sample client from the client set based on the learning task and bidding information uploaded by the client set participating in federal learning, and issue a global sharing model to the sample client;
the receiving unit 22 may be configured to receive an update model parameter uploaded by each sample client, where the update model parameter is formed by using a multi-agent reinforcement learning algorithm to output bidding information to be submitted of the sample client in a current round before training of the sample client starts, and training a global sharing model according to configuration in the bidding information to be submitted after the sample client is selected;
the aggregation unit 24 may be configured to aggregate the update model parameters uploaded by each sample client, and update the model parameters in the global sharing model by using the aggregated update model parameters;
the selecting unit 26 may be configured to determine that the learning task issued by the federal learning platform is completed if the updated global sharing model reaches the preset model accuracy in the test task, and otherwise, repeat the step of updating the model parameters in the global sharing model for multiple rounds, so that the updated global sharing model reaches the preset model accuracy in the test task.
Optionally, the apparatus further comprises:
the output unit can be used for the process that the sample client outputs the bidding information to be submitted of the sample client in the current round by using a multi-agent reinforcement learning algorithm;
the output unit is specifically configured to use the sample client as an agent, and the agent observes historical state information of the agent in a federal learning environment and outputs bidding information to be submitted of the sample client in a current round by using the historical state information.
Optionally, the multi-agent reinforcement learning algorithm includes a policer and an experience pool, and the output unit includes:
a storage module, configured to use an experience pool in the multi-agent reinforcement learning algorithm to store historical task status information observed by each agent in a federated learning environment with the sample client as an agent, where the historical task status information at least includes whether the agent is selected in a historical turn, a historical resource value, a historical provided data amount, and a historical unit resource amount;
and the output module can be used for outputting the bidding information to be submitted of the intelligent agent in the current round by inputting the historical task state information observed by the intelligent agent in the federal learning environment as the state information of the intelligent agent in the current round into the policy maker of the multi-intelligent-agent reinforcement learning algorithm.
Optionally, the output unit further includes:
and the calculation module can be used for calculating the income resources fed back by the federal learning environment for the intelligent agent in the current turn by inputting the historical task state information observed by the intelligent agent in the federal learning environment as the state information of the intelligent agent in the current turn into the policy maker in the multi-intelligent-agent reinforcement learning algorithm, outputting the competitive bidding information to be submitted by the intelligent agent in the current turn, and storing the historical state of the environment observed by the intelligent agent in the current turn, the competitive bidding information to be submitted, the environmental state after the competitive bidding information to be submitted is uploaded and the income resources fed back to the intelligent agent by the federal learning environment for the competitive bidding information to be submitted in the current turn by using the experience pool in the multi-intelligent-agent reinforcement learning algorithm.
Optionally, the computing module may be specifically configured to obtain resource parameters related to the intelligent agents in the bidding process based on the bidding information to be uploaded of the intelligent agents in the current round;
the calculation module may be further configured to input resource parameters related to the intelligent agent in the bidding process to a pre-established revenue function, so as to obtain revenue resources fed back by the intelligent agent in the current round in the federal learning environment.
Optionally, each sample client is configured with a policy maker, the policy maker includes an action network and a value network, and the output module is specifically configured to input the historical task state information observed by the agent in the federated learning environment, as the agent's state information for the current round, into the action network of the policy maker and to output the bidding information to be submitted by the agent in the current round, thereby obtaining the bidding information to be uploaded by the agent in the current training round;
the output module is further specifically configured to input the agent's state information for the current round and the bidding information to be uploaded by the agent in the current round into the value network of the policy maker, so as to evaluate the bidding information to be uploaded and obtain an evaluation score of the bidding information to be uploaded;
the action network is trained using the evaluation score of the bidding information to be uploaded, and its network parameters are updated by gradient ascent; the value network is trained using the evaluation score of the bidding information to be uploaded and the revenue resource actually fed back to the agent, and its network parameters are updated by a temporal difference method.
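This actor-critic training scheme can be sketched as follows; the layer sizes, learning rates, discount factor and the deterministic one-step update are assumptions chosen for illustration, not parameters prescribed by the embodiment.

```python
# Sketch of the policy maker's action network (actor) and value network
# (critic). The actor is updated by gradient ascent on the critic's
# evaluation score; the critic is updated towards a temporal-difference (TD)
# target built from the revenue actually fed back. All hyperparameters and
# layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

STATE_DIM, BID_DIM, GAMMA = 4, 1, 0.95

actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, BID_DIM))
critic = nn.Sequential(nn.Linear(STATE_DIM + BID_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def train_step(state, bid, revenue, next_state):
    """state/next_state: (batch, STATE_DIM); bid: (batch, BID_DIM); revenue: (batch, 1)."""
    # Value network: TD update towards revenue + discounted next-step value.
    with torch.no_grad():
        next_bid = actor(next_state)
        td_target = revenue + GAMMA * critic(torch.cat([next_state, next_bid], dim=-1))
    critic_loss = nn.functional.mse_loss(critic(torch.cat([state, bid], dim=-1)), td_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Action network: gradient ascent on the critic's evaluation score of the
    # actor's own bid (implemented as descent on the negated score).
    actor_loss = -critic(torch.cat([state, actor(state)], dim=-1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```

In a multi-agent setting, each sample client would hold its own copy of these networks and train them from its own experience pool.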
Optionally, the aggregation unit 24 includes:
a calculation module, configured to calculate, for each sample client, the ratio of that sample client's data volume to the total data volume of all sample clients, so as to obtain the data volume proportion corresponding to each sample client;
and an aggregation module, configured to multiply the data volume proportion corresponding to each sample client by the update model parameters uploaded by that sample client, aggregate the weighted update model parameters of all the sample clients, and update the model parameters in the global sharing model by accumulating the aggregated update model parameters.
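The data-volume-weighted aggregation can be sketched as follows; representing each client's update model parameters as a flat NumPy vector that is accumulated onto the global parameters is an assumption made to keep the sketch short.

```python
# Sketch of data-volume-weighted aggregation of client updates onto the
# global sharing model. Flat NumPy parameter vectors are an illustrative
# assumption.
import numpy as np

def aggregate_and_update(global_params, client_updates, client_data_volumes):
    """client_updates: list of per-client update parameter vectors;
    client_data_volumes: list of per-client data volumes (same order)."""
    total = float(sum(client_data_volumes))
    aggregated = np.zeros_like(global_params)
    for update, volume in zip(client_updates, client_data_volumes):
        aggregated += (volume / total) * update   # weight by data-volume proportion
    return global_params + aggregated             # accumulate onto the global sharing model
```

Clients contributing more data thus pull the global sharing model proportionally harder, which mirrors the data-volume ratio described above.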
Based on the above method embodiments, another embodiment of the present invention provides a storage medium having executable instructions stored thereon which, when executed by a processor, cause the processor to implement the above method.
Based on the above embodiment, another embodiment of the present invention provides a vehicle including:
one or more processors;
a storage device to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described above. The vehicle may be a non-autonomous vehicle or an autonomous vehicle.
The system and apparatus embodiments correspond to the method embodiments and achieve the same technical effects; for details, reference may be made to the method embodiments, which are not repeated here. Those of ordinary skill in the art will understand that the figures are merely schematic representations of one embodiment, and that the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
Those of ordinary skill in the art will also understand that the modules in the devices of the embodiments may be distributed within the devices as described in the embodiments, or may, with corresponding changes, be located in one or more devices other than those of the embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A user bidding method under federated learning based on a multi-agent reinforcement learning algorithm, characterized by comprising the following steps:
obtaining a learning task issued by a federated learning platform, selecting sample clients from a client set based on the learning task and the bidding information uploaded by the client set participating in federated learning, and issuing a global sharing model to the sample clients;
receiving update model parameters uploaded by each sample client, wherein the update model parameters are formed by the sample client using a multi-agent reinforcement learning algorithm to output its bidding information to be submitted in the current round before training starts and, after the sample client is selected, training the global sharing model according to the configuration in the bidding information to be submitted;
aggregating the update model parameters uploaded by each sample client, and updating the model parameters in the global sharing model by using the aggregated update model parameters;
and if the updated global sharing model reaches the preset model precision on the test task, judging that the learning task issued by the federated learning platform is completed; otherwise, repeatedly executing the step of updating the model parameters in the global sharing model over multiple rounds until the updated global sharing model reaches the preset model precision on the test task.
2. The method of claim 1, wherein the process in which the sample client outputs its bidding information to be submitted in the current round using a multi-agent reinforcement learning algorithm comprises:
taking the sample client as an agent, the agent observing its own historical state information in the federated learning environment, and outputting the bidding information to be submitted by the sample client in the current round using the historical state information.
3. The method of claim 2, wherein the multi-agent reinforcement learning algorithm comprises a policy maker and an experience pool, and wherein taking the sample client as an agent, the agent observing its historical state information in the federated learning environment and using the historical state information to output the bidding information to be submitted by the sample client in the current round, comprises:
taking the sample client as an agent, and using the experience pool in the multi-agent reinforcement learning algorithm to store the historical task state information observed by each agent in the federated learning environment, wherein the historical task state information comprises at least whether the agent was selected in a historical round, a historical resource value, a historical provided data volume and a historical unit resource amount;
and inputting the historical task state information observed by the agent in the federated learning environment, as the agent's state information for the current round, into the policy maker in the multi-agent reinforcement learning algorithm, and outputting the bidding information to be submitted by the agent in the current round.
4. The method of claim 3, wherein after inputting the historical task state information observed by the agent in the federated learning environment, as the agent's state information for the current round, into the policy maker in the multi-agent reinforcement learning algorithm and outputting the bidding information to be submitted by the agent in the current round, the method further comprises:
calculating the revenue resource fed back by the federated learning environment to the agent in the current round, and using the experience pool in the multi-agent reinforcement learning algorithm to store the environment state observed by the agent in the current round, the bidding information to be submitted, the environment state after the bidding information to be submitted is uploaded, and the revenue resource fed back to the agent by the federated learning environment for the bidding information uploaded in the current round.
5. The method of claim 4, wherein calculating the revenue resource fed back by the federated learning environment to the agent in the current round comprises:
acquiring, based on the bidding information to be uploaded by the agent in the current round, the resource parameters related to the agent in the bidding process;
and inputting the resource parameters related to the agent in the bidding process into a pre-constructed revenue function to obtain the revenue resource fed back to the agent by the federated learning environment in the current round.
6. The method of claim 3, wherein each sample client is configured with a policy maker comprising an action network and a value network, and wherein inputting the historical task state information observed by the agent in the federated learning environment, as the agent's state information for the current round, into the policy maker in the multi-agent reinforcement learning algorithm and outputting the bidding information to be submitted by the agent in the current round comprises:
inputting the historical task state information observed by the agent in the federated learning environment, as the agent's state information for the current round, into the action network of the policy maker, and outputting the bidding information to be submitted by the agent in the current round, thereby obtaining the bidding information to be uploaded by the agent in the current training round;
inputting the agent's state information for the current round and the bidding information to be uploaded by the agent in the current round into the value network of the policy maker, and evaluating the bidding information to be uploaded to obtain an evaluation score of the bidding information to be uploaded;
and training the action network using the evaluation score of the bidding information to be uploaded, updating the network parameters of the action network by gradient ascent, training the value network using the evaluation score of the bidding information to be uploaded and the revenue resource actually fed back to the agent, and updating the network parameters of the value network by a temporal difference method.
7. The method according to any one of claims 1 to 6, wherein aggregating the update model parameters uploaded by each sample client and updating the model parameters in the global sharing model using the aggregated update model parameters comprises:
calculating, for each sample client, the ratio of that sample client's data volume to the total data volume of all sample clients, so as to obtain the data volume proportion corresponding to each sample client;
and multiplying the data volume proportion corresponding to each sample client by the update model parameters uploaded by that sample client, aggregating the weighted update model parameters of all the sample clients, and updating the model parameters in the global sharing model by accumulating the aggregated update model parameters.
8. A user bidding apparatus under federated learning based on a multi-agent reinforcement learning algorithm, the apparatus comprising:
an acquisition unit, configured to acquire a learning task issued by a federated learning platform, select sample clients from a client set based on the learning task and the bidding information uploaded by the client set participating in federated learning, and issue a global sharing model to the sample clients;
a receiving unit, configured to receive the update model parameters uploaded by each sample client, the update model parameters being formed by the sample client using a multi-agent reinforcement learning algorithm to output its bidding information to be submitted in the current round and, after being selected, training the global sharing model according to the configuration in the bidding information to be submitted;
an aggregation unit, configured to aggregate the update model parameters uploaded by each sample client and update the model parameters in the global sharing model using the aggregated update model parameters;
and a selecting unit, configured to judge that the learning task issued by the federated learning platform is completed if the updated global sharing model reaches the preset model precision on the test task, and otherwise to repeatedly execute the step of updating the model parameters in the global sharing model over multiple rounds until the updated global sharing model reaches the preset model precision on the test task.
9. A storage medium having executable instructions stored thereon which, when executed by a processor, cause the processor to implement the user bidding method under federated learning based on a multi-agent reinforcement learning algorithm.
10. A user bidding device under federated learning based on a multi-agent reinforcement learning algorithm, the device comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the user bidding method under federated learning based on a multi-agent reinforcement learning algorithm.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210309611.9A CN114971819A (en) | 2022-03-28 | 2022-03-28 | User bidding method and device based on multi-agent reinforcement learning algorithm under federal learning |
CN2022103096119 | 2022-03-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115358831A (en) | 2022-11-18
Family
ID=82975873
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210309611.9A Withdrawn CN114971819A (en) | 2022-03-28 | 2022-03-28 | User bidding method and device based on multi-agent reinforcement learning algorithm under federal learning |
CN202211120985.2A Pending CN115358831A (en) | 2022-03-28 | 2022-09-15 | User bidding method and device based on multi-agent reinforcement learning algorithm under federal learning |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210309611.9A Withdrawn CN114971819A (en) | 2022-03-28 | 2022-03-28 | User bidding method and device based on multi-agent reinforcement learning algorithm under federal learning |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN114971819A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115544899A (en) * | 2022-11-23 | 2022-12-30 | 南京邮电大学 | Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115130683A (en) * | 2022-07-18 | 2022-09-30 | 山东大学 | Asynchronous federal learning method and system based on multi-agent model |
CN115086399B (en) * | 2022-07-28 | 2022-12-06 | 深圳前海环融联易信息科技服务有限公司 | Federal learning method and device based on hyper network and computer equipment |
CN117076113B (en) * | 2023-08-17 | 2024-09-06 | 重庆理工大学 | Industrial heterogeneous equipment multi-job scheduling method based on federal learning |
Also Published As
Publication number | Publication date |
---|---|
CN114971819A (en) | 2022-08-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115358831A (en) | User bidding method and device based on multi-agent reinforcement learning algorithm under federal learning | |
Du et al. | Learning resource allocation and pricing for cloud profit maximization | |
CN110189174A (en) | Mobile crowd sensing excitation method based on data quality sensing | |
CN111064633B (en) | Cloud-edge cooperative power information communication equipment automated testing resource allocation method | |
CN113992676B (en) | Incentive method and system for layered federal learning under terminal edge cloud architecture and complete information | |
CN113128828B (en) | Satellite observation distributed online planning method based on multi-agent reinforcement learning | |
CN108921298B (en) | Multi-agent communication and decision-making method for reinforcement learning | |
CN109884897B (en) | Unmanned aerial vehicle task matching and calculation migration method based on deep reinforcement learning | |
CN109068288B (en) | Method and system for selecting mobile crowd sensing incentive mechanism based on multi-attribute user | |
Kumar et al. | Federated control with hierarchical multi-agent deep reinforcement learning | |
CN112950251A (en) | Reputation-based vehicle crowd sensing node reverse combination auction excitation optimization method | |
Lim et al. | Incentive mechanism design for resource sharing in collaborative edge learning | |
CN109922152A (en) | Calculating discharging method and system in a kind of mobile edge calculations | |
CN107784561A (en) | The implementation method of online incentive mechanism in a kind of mobile mass-rent system | |
CN114415735B (en) | Dynamic environment-oriented multi-unmanned aerial vehicle distributed intelligent task allocation method | |
CN110390560A (en) | A kind of mobile intelligent perception multitask pricing method based on Stackelberg game | |
Rjoub et al. | Explainable AI-based federated deep reinforcement learning for trusted autonomous driving | |
Zhu et al. | Agent-based dynamic scheduling for earth-observing tasks on multiple airships in emergency | |
CN104657390B (en) | A kind of answer platform method and system | |
Liew et al. | Economics of semantic communication in metaverse: an auction approach | |
CN113660304A (en) | Unmanned aerial vehicle group distributed learning resource control method based on bidirectional auction game | |
CN116566891A (en) | Delay-sensitive service function chain parallel route optimization method, device and medium | |
CN115271092A (en) | Crowd funding incentive method for indoor positioning federal learning | |
CN109872058A (en) | Multimedia crowd sensing excitation method for machine learning system | |
Yue et al. | Ai-enhanced incentive design for crowdsourcing in internet of vehicles |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | ||
Inventor after: Zeng Rongfei; An Shuyang; Zeng Chao; Su Mai; Wang Jiaqi
Inventor before: Zeng Rongfei; An Shuyang; Zeng Chao; Han Bo; Su Mai; Wang Jiaqi