CN115358831A - User bidding method and device based on multi-agent reinforcement learning algorithm under federal learning - Google Patents

User bidding method and device based on multi-agent reinforcement learning algorithm under federal learning Download PDF

Info

Publication number
CN115358831A
CN115358831A · Application CN202211120985.2A
Authority
CN
China
Prior art keywords
agent
bidding
uploaded
model
sample client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211120985.2A
Other languages
Chinese (zh)
Inventor
曾荣飞
安树阳
曾超
韩波
苏迈
王家齐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Publication of CN115358831A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/06: Buying, selling or leasing transactions
    • G06Q30/08: Auctions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a user bidding method and device based on a multi-agent reinforcement learning algorithm under federal learning. The method comprises the following steps: a learning task issued by a federal learning platform is obtained; each sample client uploads bidding information to the federal platform using a reinforcement learning algorithm; after the platform selects sample clients through a selection algorithm, it issues the global sharing model to the selected sample clients; the selected sample clients perform local training and upload their update parameters; and the platform aggregates the uploaded update model parameters according to an aggregation algorithm and updates the model parameters in the global model. The method realizes dynamic bidding for users participating in federal learning while alleviating model overfitting, and solves the lack of fairness in federal learning and the model overfitting caused by existing auction-based incentive mechanisms, in which a user's bidding strategy cannot be changed during subsequent training once it has been submitted.

Description

User bidding method and device based on multi-agent reinforcement learning algorithm under federal learning
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a user bidding method and device based on a multi-agent reinforcement learning algorithm under federal learning.
Background
With increasing privacy concerns and the advent of related policies, it has become increasingly difficult for traditional machine learning to collect data for centralized training. Federal learning is the most promising deep learning paradigm because it does not require users to upload raw data, thereby protecting user privacy. However, participating users in federal learning consume substantial computation, communication and other resources during training, which means that selfish participating users will not devote themselves fully to the learning task without sufficient return. Meanwhile, because the underlying network structure of federal learning is complex and node resources are limited and heterogeneous, a federal initiator without corresponding incentive and selection measures will incur huge communication overhead, wasting network resources and hindering the adoption of federal learning.
In incentive mechanisms of the related art, the selection of participating users and the sharing of profits can be performed using game-theoretic techniques, in particular by incorporating an auction method into federal learning. In one implementation, high-quality participating users are selected through a lightweight, multi-dimensional incentive scheme; in another, the learning quality of participating users is integrated into federal learning through an incentive mechanism framework for quality-aware incentives and model aggregation. However, existing auction-based incentive mechanisms are almost entirely static: by default, participating users determine their own strategies during the auction and then no longer modify them as the platform's behavior changes, which at best maximizes the utility of the platform or social welfare but cannot maximize the utility of both the platform and the participating users. The first problem is that, in the federal learning auction, once participating users have determined their bidding information, the strategy cannot be changed in subsequent training; whether selected or not, participating users can only wait to be selected. The second problem is that existing dynamic bidding methods assume that user information is transparent, i.e. that every user knows the private information of the other users, which is impossible to realize in practical applications.
Disclosure of Invention
The invention provides a user bidding method and device based on a multi-agent reinforcement learning algorithm under federal learning, which introduces a multi-agent reinforcement learning mode into an incentive mechanism of federal learning, thereby solving the problem that the incentive mechanism based on auction in the prior art causes loss of federal learning fairness due to the fact that a strategy cannot be changed in the follow-up training process. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a user bidding method under federal learning based on a multi-agent reinforcement learning algorithm, where the method includes:
the method comprises the steps of obtaining a learning task issued by a federal learning platform, selecting a sample client from a client set based on the learning task and bidding information uploaded by the client set participating in federal learning, and issuing a global sharing model to the sample client;
receiving an update model parameter uploaded by each sample client, wherein the update model parameter is formed by using a multi-agent reinforcement learning algorithm to output bidding information to be submitted of the sample client in the current round before training of the sample client is started, and training a global sharing model according to configuration in the bidding information to be submitted after the sample client is selected;
aggregating the update model parameters uploaded by each sample client, and updating the model parameters in the global sharing model by using the aggregated update model parameters;
and if the updated global shared model reaches the preset model precision in the test task, judging that the learning task released by the Federal learning platform is completed, otherwise, repeatedly executing the step of updating the model parameters in the global shared model in multiple turns so as to ensure that the updated global shared model reaches the preset model precision in the test task.
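For illustration only, the steps above can be arranged into a platform-side training loop as in the following sketch. This is a minimal sketch under assumptions, not the claimed implementation: the names select_clients, submit_bid, local_train, aggregate and evaluate, and the round limit, are hypothetical stand-ins for the operations the method defines.

    # Hypothetical sketch of the platform-side loop; all helper names are assumptions.
    def run_learning_task(platform, clients, global_model, target_accuracy, max_rounds=100):
        for round_idx in range(max_rounds):
            # Collect bidding information and select sample clients for this round.
            bids = {c.id: c.submit_bid() for c in clients}        # bids produced by each client's RL policy
            selected = platform.select_clients(bids)              # budget/time-constrained selection
            # Issue the global sharing model and receive the uploaded update parameters.
            updates = [(c.id, c.local_train(global_model, bids[c.id])) for c in selected]
            # Aggregate the uploaded updates and refresh the global sharing model.
            global_model = platform.aggregate(global_model, updates)
            # Stop once the updated model reaches the preset accuracy on the test task.
            if platform.evaluate(global_model) >= target_accuracy:
                break
        return global_model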
Optionally, the process of outputting, by the sample client, bidding information to be submitted of the sample client in the current round by using a multi-agent reinforcement learning algorithm includes:
and taking the sample client as an intelligent agent, observing the self historical state information in the federal learning environment by the intelligent agent, and outputting the bidding information to be submitted of the sample client in the current turn by using the historical state information.
Optionally, the multi-agent reinforcement learning algorithm includes a policy engine and an experience pool, the sample client is used as an agent, the agent observes historical state information of the agent in a federal learning environment, and outputs bidding information to be submitted of the sample client in a current turn by using the historical state information, where the policy engine includes:
the sample client side is used as an intelligent agent, historical task state information observed by each intelligent agent in a federated learning environment is stored by using an experience pool in the multi-intelligent-agent reinforcement learning algorithm, and the historical task state information at least comprises whether the intelligent agent is selected in a historical turn, a historical resource value, a historical provided data amount and a historical unit resource amount;
historical task state information observed by the intelligent agent in the federal learning environment is used as state information of the intelligent agent in the current round and input to a strategy device in the multi-intelligent-agent reinforcement learning algorithm, and bidding information to be submitted of the intelligent agent in the current round is output.
Optionally, after the outputting of the bidding information to be submitted of the agent in the current round by inputting the historical task state information observed by the agent in the federated learning environment as the state information of the agent in the current round to the policy engine in the multi-agent reinforcement learning algorithm, the method further comprises:
and calculating the income resources fed back by the federal learning environment aiming at the intelligent agent in the current turn, and storing the historical state of the environment observed by the intelligent agent in the current turn, the bidding information to be submitted, the environmental state after the bidding information to be submitted is uploaded, and the income resources fed back to the intelligent agent by the federal learning environment aiming at the bidding information to be submitted uploaded in the current turn by using an experience pool in the multi-intelligent-agent reinforcement learning algorithm.
Optionally, the calculating revenue resources fed back by the federal learning environment in the current turn for the intelligent agent includes:
respectively acquiring resource parameters related to the intelligent agent in the bidding process based on the bidding information to be uploaded of the intelligent agent in the current round;
and inputting resource parameters related to the intelligent agent in the bidding process into a pre-constructed revenue function to obtain revenue resources fed back by the intelligent agent in the current turn in the federal learning environment.
Optionally, each sample client is configured with a policy engine, the policy engine includes an action network and a value network, and the policy engine inputs historical task state information observed in the federal learning environment as state information of the agent in the current round into the multi-agent reinforcement learning algorithm, and outputs bidding information to be submitted of the agent in the current round, including:
historical task state information observed by the intelligent agent in the federal learning environment is input into the action network of the policy engine as state information of the intelligent agent in the current round, and bidding information to be submitted of the intelligent agent in the current round is output, so that the bidding information to be uploaded of the intelligent agent in the current training round is obtained;
the state information of the intelligent agent in the current round and the competitive bidding information to be uploaded of the intelligent agent in the current round are input into the value network in the strategy device, and the competitive bidding information to be uploaded is evaluated to obtain an evaluation score of the competitive bidding information to be uploaded;
the action network is trained by using the evaluation score of the bidding information to be uploaded, and its network parameters are updated through gradient ascent; the value network is trained by using the evaluation score of the bidding information to be uploaded and the revenue resources actually fed back to the agent, and its network parameters are updated through the temporal difference method.
Optionally, the aggregating the update model parameters uploaded by each sample client, and updating the model parameters in the global sharing model by using the aggregated update model parameters includes:
respectively calculating the ratio of the data volume of each sample client to the data volume of all the sample clients to obtain the data volume proportion corresponding to each sample client;
and after the data volume ratio corresponding to each sample client is multiplied by the update model parameters uploaded by the corresponding sample client, aggregating the update model parameters corresponding to all the sample clients, and updating the model parameters in the global sharing model by accumulating the aggregated update model parameters.
In a second aspect, an embodiment of the present invention provides a user bidding device under federal learning based on a multi-agent reinforcement learning algorithm, where the device includes:
the system comprises an acquisition unit, a resource allocation unit and a resource allocation unit, wherein the acquisition unit is used for acquiring a learning task issued by a federal learning platform, selecting a sample client from a client set based on the learning task and bidding information uploaded by the client set participating in federal learning, and issuing a global sharing model to the sample client;
the system comprises a receiving unit, a calculating unit and a processing unit, wherein the receiving unit is used for receiving an update model parameter uploaded by each sample client, and the update model parameter is formed by using a multi-agent reinforcement learning algorithm to output bidding information to be submitted of the sample client in the current turn before training of the sample client, and training a global sharing model according to configuration in the bidding information to be submitted after the sample client is selected;
the aggregation unit is used for aggregating the update model parameters uploaded by each sample client and updating the model parameters in the global sharing model by using the aggregated update model parameters;
and the selecting unit is used for judging that the learning task issued by the Federal learning platform is completed if the updated global sharing model reaches the preset model precision in the test task, otherwise, repeatedly executing the step of updating the model parameters in the global sharing model in multiple turns so as to ensure that the updated global sharing model reaches the preset model precision in the test task.
Optionally, the apparatus further comprises:
the output unit is used for outputting the process of the competitive bidding information to be submitted of the sample client in the current round by the sample client by using a multi-agent reinforcement learning algorithm;
the output unit is specifically configured to use the sample client as an agent, and the agent observes historical state information of the agent in a federal learning environment and outputs bidding information to be submitted of the sample client in a current round by using the historical state information.
Optionally, the multi-agent reinforcement learning algorithm includes a policy engine and an experience pool, and the output unit includes:
the storage module is used for storing historical task state information observed by each intelligent agent in a federated learning environment by using an experience pool in the multi-intelligent-agent reinforcement learning algorithm with the sample client as the intelligent agent, wherein the historical task state information at least comprises whether the intelligent agent is selected in a historical turn, a historical resource value, a historical provided data volume and a historical unit resource volume;
and the output module is used for outputting bidding information to be submitted of the intelligent agent in the current round by inputting the historical task state information observed by the intelligent agent in the federal learning environment as the state information of the intelligent agent in the current round into the policy maker in the multi-intelligent-agent reinforcement learning algorithm.
Optionally, the output unit further includes:
and the calculation module is used for inputting the historical task state information observed by the intelligent agent in the federal learning environment as the state information of the intelligent agent in the current turn into the policy maker in the multi-intelligent-agent reinforcement learning algorithm, outputting the competitive bidding information to be submitted by the intelligent agent in the current turn, calculating the income resources fed back by the federal learning environment aiming at the intelligent agent in the current turn, and storing the historical state of the environment observed by the intelligent agent in the current turn, the competitive bidding information to be submitted, the environmental state after the competitive bidding information to be submitted is uploaded and the income resources fed back to the intelligent agent by the federal learning environment aiming at the competitive bidding information to be submitted in the current turn by using an experience pool in the multi-intelligent-agent reinforcement learning algorithm.
Optionally, the computing module is specifically configured to obtain resource parameters related to the intelligent agents in the bidding process based on the bidding information to be uploaded of the intelligent agents in the current round;
the calculating module is specifically configured to input resource parameters related to the intelligent agent in the bidding process into a pre-constructed revenue function, so as to obtain revenue resources fed back by the intelligent agent in the current turn in the federal learning environment.
Optionally, each sample client is configured with a policy engine, the policy engine includes an action network and a value network, and the output module is specifically configured to output bidding information to be submitted of the agent in the current round by inputting historical task state information observed by the agent in the federal learning environment as state information of the agent in the current round into the action network of the policy engine, so as to obtain bidding information to be uploaded of the agent in the current training round;
the output module is specifically used for inputting the state information of the intelligent agent in the current round and the bidding information to be uploaded of the intelligent agent in the current round to the value network in the policy device, evaluating the bidding information to be uploaded and obtaining the evaluation score of the bidding information to be uploaded;
the action network is trained by using the evaluation score of the bidding information to be uploaded, and its network parameters are updated through gradient ascent; the value network is trained by using the evaluation score of the bidding information to be uploaded and the revenue resources actually fed back to the agent, and its network parameters are updated through the temporal difference method.
Optionally, the aggregation unit includes:
the computing module is used for respectively computing the ratio of the data volume of each sample client to the data volume of all the sample clients to obtain the data volume ratio corresponding to each sample client;
and the aggregation module is used for aggregating the update model parameters corresponding to all the sample clients after multiplying the data volume ratio corresponding to each sample client by the update model parameters uploaded by the corresponding sample client, and updating the model parameters in the global sharing model by accumulating the aggregated update model parameters.
In a third aspect, an embodiment of the present invention provides a storage medium, on which executable instructions are stored, and when executed by a processor, the instructions cause the processor to implement the method described in the first aspect.
In a fourth aspect, an embodiment of the present invention provides an apparatus for user bidding under federal learning based on a multi-agent reinforcement learning algorithm, including:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of the first aspect.
As can be seen from the above, in the user bidding method and apparatus based on a multi-agent reinforcement learning algorithm under federal learning provided in the embodiments of the present invention, a learning task issued by a federal learning platform is obtained, a sample client is selected from a client set based on the learning task and the bidding information uploaded by the client set participating in federal learning, and a global sharing model is issued to the sample clients. The update model parameters uploaded by each sample client are then received; these update model parameters are formed by the sample client using the multi-agent reinforcement learning algorithm, before training starts, to output the bidding information to be submitted in the current round, and then, after being selected, training the global sharing model according to the configuration in that bidding information. The update model parameters uploaded by each sample client are further aggregated, and the model parameters in the global sharing model are updated with the aggregated update model parameters. If the updated global sharing model reaches the preset model accuracy in the test task, the learning task issued by the federal learning platform is determined to be completed; otherwise, the step of updating the model parameters in the global sharing model is repeated over multiple rounds until the updated global sharing model reaches the preset model accuracy in the test task. Therefore, compared with the auction-based incentive mechanisms in the prior art, the embodiments of the present invention can adjust the bidding information uploaded by a client by using the multi-agent learning system, thereby solving the problem that auction-based incentive mechanisms in the prior art lose federal learning fairness because the strategy cannot be changed during subsequent training.
In addition, the technical effects that the embodiment can also realize include:
(1) Competitive bidding information uploaded by the client is adjusted based on a multi-agent reinforcement learning algorithm, so that the probability of selecting the client is increased, the fairness of participating in users in federal learning is guaranteed, the suboptimal problem caused by a fixed strategy is solved, and the goal of jointly maximizing the utility of a federal learning platform and participating users is realized.
(2) The multi-agent reinforcement learning algorithm adopts centralized training and distributed execution, so that the corresponding client of the participating user can observe more states, and the stability in the agent training process is improved.
(3) The multi-agent reinforcement learning algorithm adopts an asynchronous deep reinforcement training mode, and decouples the steps of executing the learning task and updating the bidding information, so that the two can work in parallel, and the training of the model is accelerated.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below. It is to be understood that the drawings in the following description are merely exemplary of some embodiments of the invention. For a person skilled in the art, without inventive effort, further figures can be obtained from these figures.
Fig. 1 is a schematic flow chart of a user bidding method under federal learning based on a multi-agent reinforcement learning algorithm according to an embodiment of the present invention;
FIG. 2 is a block diagram of a process of outputting bidding information to be submitted by a sample client in a current round by a multi-agent reinforcement learning algorithm according to an embodiment of the present invention;
fig. 3 is a block diagram illustrating a user bidding apparatus under federal learning based on a multi-agent reinforcement learning algorithm according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely a few embodiments of the invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
It is to be noted that the terms "comprises" and "comprising" and any variations thereof in the embodiments and drawings of the present invention are intended to cover non-exclusive inclusions. A process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention provides a user bidding method and device based on a multi-agent reinforcement learning algorithm under federal learning, which adjusts the bidding information uploaded by a client using a multi-agent learning system, thereby solving the lack of fairness in federal learning caused by prior-art auction-based incentive mechanisms in which the strategy never changes during subsequent training. Traditional auction techniques require the private information of participating users during the auction, and the whole auction process is static: the bids of participating users are fixed, the bidding information uploaded by a client cannot be adjusted even after a bid fails, and participating users cannot dynamically change their bidding information. As a result, federal learning suffers a loss of fairness, participating users with weak resources are rarely selected by the federal learning platform, and the resources of participating users are largely wasted. The embodiment of the invention introduces a multi-agent reinforcement learning algorithm into the incentive mechanism of federal learning and adjusts the bidding information uploaded by a client based on that algorithm, so as to increase the probability that the client corresponding to a participating user is selected, reduce aggregation time, guarantee fairness among participating users in federal learning, solve the suboptimality caused by fixed strategies, and achieve the goal of jointly maximizing the utility of the federal learning platform and the participating users.
The following provides a detailed description of embodiments of the invention.
Fig. 1 is a schematic flow chart of a user bidding method under federal learning based on a multi-agent reinforcement learning algorithm according to an embodiment of the present invention. The method may comprise the steps of:
s100: the method comprises the steps of obtaining a learning task issued by a federal learning platform, selecting a sample client from a client set based on the learning task and bidding information uploaded by the client set participating in federal learning, and issuing a global sharing model to the sample client.
The learning task issued by the federal learning platform is issued by a server corresponding to the federal initiator, and the learning task is applicable to various application scenarios that involve collecting data for training, such as target recognition tasks and data classification tasks. To ensure that the server selects suitable clients for model training, each client uploads bidding information, and the server selects clients based on the learning task and the uploaded bidding information.
The bidding information here consists of computing resources, data volume and bidding resources. Specifically, in the process of selecting a sample client from a client set based on a learning task and bidding information uploaded by the client set participating in federal learning, after receiving the bidding information of the client, a federal learning platform can perform modeling solution on the learning task to obtain a sample client and bidding resources corresponding to the sample client, wherein the modeling solution process comprises the following steps:
Σ_n p_n·s_n ≤ B (1)

s_n ∈ {0,1} (2)

t_n^max ≤ T_max (3)

Constraint (1) states that the total payment from the federal learning platform to the selected clients does not exceed the platform budget B, where p_n is the payment to client n; constraint (2) states that each client is either selected (s_n = 1) or not selected (s_n = 0); and constraint (3) states that the training time of a selected client cannot exceed the maximum time T_max specified by the federal learning platform.
By solving the expression, the federated learning platform will pick out the sample client set and the amount of resources purchased by the federated learning platform at each sample client.
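The patent gives the selection model only as the constraints above (the full objective appears as a formula image), but for illustration a simple greedy selection that respects the budget constraint (1) and the time constraint (3) can be sketched as follows. The ranking rule (cheapest price per unit of offered data first) is purely an assumption used to make the sketch concrete.

    # Illustrative greedy selection under constraints (1) and (3); the ranking
    # heuristic is an assumption, not the patent's actual objective.
    def select_clients(bids, budget, t_max):
        """bids: list of dicts with keys 'client', 'price', 'data', 'time'."""
        ranked = sorted(bids, key=lambda b: b["price"] / max(b["data"], 1))
        selected, spent = [], 0.0
        for b in ranked:
            if b["time"] > t_max:                 # constraint (3): training time within T_max
                continue
            if spent + b["price"] > budget:       # constraint (1): total payment within the budget B
                continue
            selected.append(b["client"])          # constraint (2): s_n = 1 for selected clients
            spent += b["price"]
        return selected, spent

    # Example with made-up bids:
    bids = [{"client": "A", "price": 10.0, "data": 500, "time": 30},
            {"client": "B", "price": 6.0, "data": 200, "time": 80},
            {"client": "C", "price": 4.0, "data": 300, "time": 25}]
    print(select_clients(bids, budget=12.0, t_max=60))   # -> (['C'], 4.0)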
S110: and receiving an updated model parameter uploaded by each sample client, wherein the updated model parameter is formed by using a multi-agent reinforcement learning algorithm to output bidding information to be submitted of the sample client in the current round before training of the sample client is started, and training a global sharing model according to the configuration in the bidding information to be submitted after the sample client is selected.
In the embodiment of the invention, the sample clients are clients wishing to participate in the learning task, only the selected sample clients can download and train the global model, each sample client has a multi-agent reinforcement learning algorithm, and specifically, after the selected sample clients receive the global shared model, the global model is trained according to the configuration in the bidding information of the selected sample clients to obtain the updated model parameters.
The specific process by which a sample client outputs the bidding information to be submitted in the current round using the multi-agent reinforcement learning algorithm is as follows: the sample client serves as an agent, the agent observes its own historical state information in the federal learning environment, and that historical state information is used to output the bidding information the sample client will submit in the current round.
The multi-agent reinforcement learning algorithm comprises a policy engine and an experience pool. With the sample client serving as an agent, the experience pool in the multi-agent reinforcement learning algorithm stores the historical task state information observed by each agent in the federal learning environment. The historical task state information corresponds to the states in which the agent previously submitted bidding information and the feedback it received from federal learning on those bids, and includes at least whether the agent was selected in a historical round, the historical resource value, the historical provided data amount and the historical unit resource amount. The historical task state information observed by the agent in the federal learning environment is then input to the policy engine in the multi-agent reinforcement learning algorithm as the agent's state information in the current round, and the bidding information to be submitted by the agent in the current round is output.
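A minimal sketch of such an experience pool is given below; the field names are assumptions, since the description only specifies which quantities are stored.

    import random
    from collections import deque

    # Minimal sketch of the per-agent experience pool described above.
    class ExperiencePool:
        def __init__(self, capacity=10000):
            self.buffer = deque(maxlen=capacity)

        def store(self, state, bid, next_state, reward):
            # state / next_state: (selected_flag, resource_value, data_amount, unit_resource)
            # bid: the bidding information submitted in this round
            # reward: the revenue resource fed back by the federal learning environment
            self.buffer.append((state, bid, next_state, reward))

        def sample(self, batch_size):
            # Random minibatch used later for centralized training of the policy engine.
            return random.sample(self.buffer, min(batch_size, len(self.buffer)))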
It can be appreciated that the multi-agent reinforcement learning algorithm described above learns how to map task states in the federal learning environment into bidding information so that the client and the platform obtain maximum resource revenue simultaneously. The basic model is a Markov game: in a Markov game, all agents simultaneously select and execute their bidding information to be submitted according to the current task state (or observation) of the federal learning environment. The game is defined as a tuple (n, S, A_1, ..., A_n, T, γ, R_1, ..., R_n), where n is the number of agents; S is the task state of the multi-agent reinforcement learning algorithm, namely the historical task state information of each agent; A_i is the set of bidding information that agent i may submit; T: S × A_1 × A_2 × ... × A_n × S → [0,1] is the state transition function, i.e. the probability distribution over the next task state given the current task state and the joint action; and R_i: S × A_1 × A_2 × ... × A_n × S → [0,1] is the reward function of agent i, where R_i(s, a_1, ..., a_n, s_{t+1}) is the reward obtained by agent i in task state s_{t+1} after the joint action (a_1, ..., a_n) is taken in task state s. The expected cumulative reward obtained by a certain agent i can be expressed as

J_i = E[ Σ_t γ^t · r_i^t ],

where γ is the discount factor and r_i^t is the reward of agent i at time step t. The reward function of agent i is built from the following quantities: d_i, the data volume of agent i; m_i, the unit computing power of agent i; c_i, the unit cost; and the average profit agent i obtains on its own resource demand x_i, where x_i denotes the resources the agent keeps for its own needs. Because the behavior of the device owner is uncertain (for example, the owner may occupy the device with other tasks for a long time, leaving few remaining resources available for task training), x_i is defined as a random variable within a certain interval, following the probability distribution function F(x_i).
Further, in order to know in real time the revenue fed back to each sample client in each round, after the bidding information to be submitted by the agent in the current round has been output, the revenue resources fed back by the federal learning environment to the agent in the current round can be calculated. The experience pool in the multi-agent reinforcement learning algorithm then stores the historical environment state observed by the agent in the current round, the bidding information to be submitted, the environment state after the bidding information is uploaded, and the revenue resources fed back to the agent by the federal learning environment for the bidding information uploaded in the current round. Specifically, the resource parameters related to the agent in the bidding process can be obtained from the bidding information to be uploaded by the agent in the current round, and these resource parameters are then input into a pre-constructed revenue function to obtain the revenue resources fed back by the federal learning environment to the agent in the current round.
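The pre-constructed revenue function itself appears only as a formula image in the original publication. Purely as a placeholder, the sketch below assumes a simple "payment minus training cost" form built from the quantities named in the description (data volume d_i, unit computing power m_i, unit cost c_i); the actual revenue function of the patent may differ.

    # Assumed placeholder for the pre-constructed revenue function; the true
    # form is given as a formula image in the original and may differ.
    def revenue(selected, payment, d_i, m_i, c_i):
        """selected: whether the agent's bid was accepted in this round;
        payment: resources paid by the platform to the agent;
        d_i: data volume offered; m_i: unit computing power; c_i: unit cost."""
        if not selected:
            return 0.0                      # an unselected agent trains nothing and earns nothing
        training_cost = c_i * d_i * m_i     # assumed cost model, not taken from the patent
        return payment - training_cost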
Each sample client is provided with a policy engine, which comprises an action network and a value network. Specifically, in the process of outputting the bidding information to be submitted by the agent in the current round, the historical task state information observed by the agent in the federal learning environment is input into the action network of the policy engine as the agent's state information for the current round, and the bidding information to be submitted by the agent in the current round is output, thereby obtaining the bidding information to be uploaded by the agent in the current training round. The agent's state information for the current round and the bidding information to be uploaded in the current round are then input into the value network of the policy engine, and the bidding information to be uploaded is evaluated to obtain its evaluation score. The action network is trained using the evaluation score of the bidding information to be uploaded, and its network parameters are updated through gradient ascent; the value network is trained using the evaluation score of the bidding information to be uploaded and the revenue resources actually fed back to the agent, and its network parameters are updated through the temporal difference method.
Specifically, in an actual application scenario, it can be assumed that at a certain time step t there are m sample clients and one task initiator in a certain area. A time step t corresponds to one round of the federal learning process: the client set submits task bids, the federal learning platform selects clients, and the selected sample clients perform local training and upload updated model parameters. A task bid contains the bidding information of the sample client (data volume and computing resources) and the payment it expects to obtain. In federal learning, each sample client serves as an agent and is provided with a reinforcement learning policy engine, which is built from a multi-layer perceptron in deep learning and comprises an input layer, a hidden layer and an output layer. The policy engine is represented as follows. Let s_t be the state of the federal learning environment at time t, which includes the state of each agent and the state of the federal learning platform. In each time slot t, the observation space of agent i is

o_i^t = (p_i^{t-1}, s_i^{t-1}, m_i^{t-1}, d_i^{t-1}),

where p_i^{t-1} is the price provided by the agent in the previous round; s_i^{t-1} ∈ {0,1} is the previous round's bidding result, where s = 0 indicates that the bid failed and s = 1 indicates that the bid succeeded; m_i^{t-1} is the unit computational resource provided by the agent (the unit computational resource of each agent is related to the agent's own resource requirements, since an agent will not necessarily allocate all of its computational resources to the training task during the training time); and d_i^{t-1} is the amount of data provided by the agent in the previous round. Before the current training round begins, agent i observes this historical state information of the learning task and inputs the state observed from the federal learning environment into the action network, which outputs a strategy a_i^t through calculation. This strategy is the bidding information to be submitted by the agent in the current round: a_i^t is an array containing the four attributes of the bidding information. The reward of the agent is the feedback from the federal learning environment after the user takes the action in the current round; each agent has its own reward function, and in the embodiment of the invention the revenue resources of the agent in the current round can be calculated using the agent's reward function.
Exemplarily, fig. 2 is a block diagram of the process by which a sample client outputs the bidding information to be submitted in the current round using the multi-agent reinforcement learning algorithm. Each policy engine comprises an action network and a value network, and the action network and the value network each comprise a main network and a target network. With reference to fig. 2, the specific algorithm proceeds as follows:
For episode = 1 to M:
  initialize the action space;
  For t = 1 to T:
    a) each agent i selects an action a_i^t according to its current observation and its action network;
    b) execute the joint action a = (a_1, ..., a_N), observe the reward r and the new state s_{t+1};
    c) put (s_t, a_t, r_t, s_{t+1}) into the experience pool D;
    d) s_t ← s_{t+1};
    For agent i = 1 to N:
      randomly sample a small batch of stored samples (X_j, a_j, r_j, X'_j) from the experience pool;
      compute the target value for the main network update, and update the main value network by minimizing the loss function between its estimated Q value and the target value;
      update the action network by gradient ascent;
    after all updates are completed, update the target network of each agent i: θ'_i ← τ·θ_i + (1 - τ)·θ'_i.
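The exact target-value, loss and gradient-ascent expressions are given as formula images in the original publication. A hedged sketch of one centralized-training update for a single agent, consistent with the steps listed above and reusing the networks sketched earlier, is given below; for readability it treats the critic as seeing only this agent's own observation and action, whereas the centralized scheme would concatenate those of all agents, and the optimizer choice, gamma and tau are assumed hyper-parameters.

    import torch
    import torch.nn.functional as F

    # Hedged sketch of one centralized-training update for agent i.
    def update_agent(actor, critic, target_actor, target_critic,
                     actor_opt, critic_opt, batch, gamma=0.95, tau=0.01):
        obs, acts, rewards, next_obs = batch    # minibatch sampled from the experience pool D

        # Value network: minimize the error between the estimated Q value and the
        # target value computed with the target networks (temporal difference).
        with torch.no_grad():
            target_q = rewards + gamma * target_critic(next_obs, target_actor(next_obs))
        critic_loss = F.mse_loss(critic(obs, acts), target_q)
        critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

        # Action network: gradient ascent on the value network's score of its output
        # (implemented as descent on the negated Q value).
        actor_loss = -critic(obs, actor(obs)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

        # Soft target update: theta'_i <- tau * theta_i + (1 - tau) * theta'_i.
        for net, target in ((actor, target_actor), (critic, target_critic)):
            for p, tp in zip(net.parameters(), target.parameters()):
                tp.data.copy_(tau * p.data + (1 - tau) * tp.data)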
S120: and aggregating the update model parameters uploaded by each sample client, and updating the model parameters in the global sharing model by using the aggregated update model parameters.
Specifically, the ratio of each sample client's data volume to the total data volume of all sample clients is calculated to obtain the data volume proportion corresponding to each sample client. The update model parameters uploaded by each sample client are multiplied by that client's data volume proportion, the weighted update model parameters of all sample clients are aggregated, and the model parameters in the global sharing model are updated by accumulating the aggregated update model parameters.
It can be understood that the model parameters of the global sharing model issued to each sample client are the same, while the model parameters obtained after each sample client trains the global sharing model according to the configuration in its output bidding information are different. Each sample client obtains local model parameters after training the global sharing model, the update model parameters are obtained by subtracting the local model parameters from the model parameters of the issued global sharing model, and the federal learning platform then uses them to update the model parameters in the global sharing model.
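The data-volume-weighted aggregation just described can be sketched as follows; parameter dictionaries of plain floats are used instead of real tensors for brevity, and the names are illustrative.

    # Sketch of the data-volume-weighted aggregation described above. Each upload
    # is (data_volume, update), where update = {parameter_name: uploaded delta}.
    def aggregate_updates(global_model, uploads):
        total_data = sum(data_volume for data_volume, _ in uploads)
        aggregated = {name: 0.0 for name in global_model}
        for data_volume, update in uploads:
            weight = data_volume / total_data            # this client's data volume proportion
            for name, delta in update.items():
                aggregated[name] += weight * delta       # weighted sum of the uploaded updates
        # Accumulate the aggregated update onto the current global model parameters.
        return {name: value + aggregated[name] for name, value in global_model.items()}

    # Example: two clients holding 300 and 100 samples respectively.
    global_model = {"w": 1.0, "b": 0.5}
    uploads = [(300, {"w": 0.2, "b": -0.1}), (100, {"w": -0.4, "b": 0.3})]
    print(aggregate_updates(global_model, uploads))      # approximately {'w': 1.05, 'b': 0.5}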
S130: and if the updated global sharing model reaches the preset model precision in the test task, judging that the learning task issued by the federal learning platform is completed, otherwise, repeatedly executing the step of updating the model parameters in the global sharing model in multiple turns so as to ensure that the updated global sharing model reaches the preset model precision in the test task.
In the embodiment of the invention, considering the huge exploration space of the agents, a multi-agent reinforcement learning algorithm with centralized training and distributed execution is adopted as the framework. Each sample client serves as an agent, and each agent is provided with a policy engine consisting of an action network and a value network; the action network and the value network each consist of two networks (a main network and a target network) used for training and updating. The agent observes the task state of the current round, for example whether it was selected, the historical price, the historical provided data amount and the historical unit resource amount, and this task state is used as the input of the action network in the policy engine. The action network gives the action of the current round, i.e. the bidding information to be submitted in the current round. The value network of each agent takes as input the local states observed by all agents and the actions they take, and scores the action output by the agent. Specifically, before each federal learning training round begins, each agent observes its own historical information s (historical bidding result, historical computing resources, historical data amount and historical bid) as its state and inputs it into the policy engine; the policy engine outputs the agent's action a, i.e. the bidding information of the user for the current training round. The user submits the bidding information to the federal learning platform, i.e. the environment; the federal learning platform selects suitable sample clients so as to maximize its own profit; the federal learning environment feeds back a reward value r to each agent and transitions to the next state s', and the experience pool stores the tuple (s, a, s', r). Once the experience pool can no longer collect new data, the policy engine starts training. In the embodiment of the present invention, training follows the idea of centralized training and distributed execution. Centralized training can be expressed as follows: first, the action network in each agent's policy engine selects an action a according to the current state; then the value network calculates a Q value for the state-action pair as feedback on the action a to the action network. The value network is trained on the estimated and actual Q values, and the action network updates its strategy according to the feedback of the value network. In order to obtain a more accurate Q value, the value network in the policy engine has access to the actions and states of all agents during training; the value network updates its parameters through the temporal difference method, and the parameters of the action network are then updated through gradient ascent. Distributed execution means that, after centralized training is finished, each agent executes its policy according to the state it currently observes, in a distributed manner. After sufficient training, the policy engine begins to converge, finally achieving an optimal real-time bidding effect.
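After centralized training finishes, execution is fully distributed: each agent submits its bid using only the state it observes locally, without access to the other agents' private information, as in the brief sketch below (the deterministic, noise-free action is an assumption for illustration; it reuses the action network sketched earlier).

    import torch

    # Minimal sketch of distributed execution after centralized training.
    def submit_bid(actor, observation):
        """observation: (previous price, previous bidding result, unit compute, data amount)."""
        with torch.no_grad():
            obs = torch.tensor(observation, dtype=torch.float32)
            bid = actor(obs)                 # the bidding information for the current round
        return bid.tolist()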
In the user bidding method under federal learning based on a multi-agent reinforcement learning algorithm provided by the embodiment of the present invention, a sample client is selected from the client set based on the learning task issued by the federal learning platform and the bidding information uploaded by the client set participating in federal learning, and the global sharing model is issued to the sample clients. The update model parameters uploaded by each sample client are received; these update model parameters are formed by the sample client using the multi-agent reinforcement learning algorithm, before training starts, to output the bidding information to be submitted in the current round, and then, after being selected, training the global sharing model according to the configuration in that bidding information. The update model parameters uploaded by each sample client are further aggregated, and the model parameters in the global sharing model are updated with the aggregated update model parameters. If the updated global sharing model reaches the preset model accuracy in the test task, the learning task issued by the federal learning platform is determined to be completed; otherwise, the step of updating the model parameters in the global sharing model is repeated over multiple rounds until the updated global sharing model reaches the preset model accuracy in the test task. Therefore, compared with the auction-based incentive mechanisms in the prior art, the embodiment of the present invention can adjust the bidding information uploaded by a client by using the multi-agent learning system so as to increase the probability that the client is selected, thereby solving the problem that auction-based incentive mechanisms in the prior art lose federal learning fairness because the strategy cannot be changed during subsequent training.
Based on the above embodiment, another embodiment of the present invention provides a user bidding apparatus under federal learning based on multi-agent reinforcement learning algorithm, as shown in fig. 3, the apparatus includes:
the obtaining unit 20 may be configured to obtain a learning task issued by a federal learning platform, select a sample client from the client set based on the learning task and bidding information uploaded by the client set participating in federal learning, and issue a global sharing model to the sample client;
the receiving unit 22 may be configured to receive an update model parameter uploaded by each sample client, where the update model parameter is formed by using a multi-agent reinforcement learning algorithm to output bidding information to be submitted of the sample client in a current round before training of the sample client starts, and training a global sharing model according to configuration in the bidding information to be submitted after the sample client is selected;
the aggregation unit 24 may be configured to aggregate the update model parameters uploaded by each sample client, and update the model parameters in the global sharing model by using the aggregated update model parameters;
the selecting unit 26 may be configured to determine that the learning task issued by the federal learning platform is completed if the updated global sharing model reaches the preset model accuracy in the test task, and otherwise, repeat the step of updating the model parameters in the global sharing model for multiple rounds, so that the updated global sharing model reaches the preset model accuracy in the test task.
Optionally, the apparatus further comprises:
the output unit can be used for the process that the sample client outputs the bidding information to be submitted of the sample client in the current round by using a multi-agent reinforcement learning algorithm;
the output unit is specifically configured to use the sample client as an agent, and the agent observes historical state information of the agent in a federal learning environment and outputs bidding information to be submitted of the sample client in a current round by using the historical state information.
Optionally, the multi-agent reinforcement learning algorithm includes a policy engine and an experience pool, and the output unit includes:
a storage module, configured to use an experience pool in the multi-agent reinforcement learning algorithm to store historical task status information observed by each agent in a federated learning environment with the sample client as an agent, where the historical task status information at least includes whether the agent is selected in a historical turn, a historical resource value, a historical provided data amount, and a historical unit resource amount;
and the output module can be used for outputting the bidding information to be submitted of the intelligent agent in the current round by inputting the historical task state information observed by the intelligent agent in the federal learning environment as the state information of the intelligent agent in the current round into the policy maker of the multi-intelligent-agent reinforcement learning algorithm.
Optionally, the output unit further includes:
and the calculation module can be used for calculating the income resources fed back by the federal learning environment for the intelligent agent in the current turn by inputting the historical task state information observed by the intelligent agent in the federal learning environment as the state information of the intelligent agent in the current turn into the policy maker in the multi-intelligent-agent reinforcement learning algorithm, outputting the competitive bidding information to be submitted by the intelligent agent in the current turn, and storing the historical state of the environment observed by the intelligent agent in the current turn, the competitive bidding information to be submitted, the environmental state after the competitive bidding information to be submitted is uploaded and the income resources fed back to the intelligent agent by the federal learning environment for the competitive bidding information to be submitted in the current turn by using the experience pool in the multi-intelligent-agent reinforcement learning algorithm.
Optionally, the computing module may be specifically configured to obtain resource parameters related to the intelligent agents in the bidding process based on the bidding information to be uploaded of the intelligent agents in the current round;
the calculation module may be further configured to input resource parameters related to the intelligent agent in the bidding process to a pre-established revenue function, so as to obtain revenue resources fed back by the intelligent agent in the current round in the federal learning environment.
Optionally, each sample client is configured with a policy engine, the policy engine includes an action network and a value network, and the output module is specifically configured to output bidding information to be submitted of the agent in the current round by inputting historical task state information observed by the agent in the federal learning environment as state information of the agent in the current round into the action network of the policy engine, so as to obtain bidding information to be uploaded of the agent in the current training round;
the output module is specifically further configured to evaluate the bidding information to be uploaded by inputting the state information of the agent in the current round and the bidding information to be uploaded of the agent in the current round to the value network in the policer, so as to obtain an evaluation score of the bidding information to be uploaded;
the action network is trained by using the evaluation score of the bidding information to be uploaded, and its network parameters are updated through gradient ascent; the value network is trained by using the evaluation score of the bidding information to be uploaded and the revenue resources actually fed back to the agent, and its network parameters are updated through the temporal difference method.
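This action-network/value-network pair corresponds to an actor-critic scheme. A condensed DDPG-style sketch in PyTorch is shown below; the target networks, exploration noise, batching, and the state/bid dimensions are assumptions made only for the example.

    import torch
    import torch.nn as nn

    state_dim, bid_dim, gamma = 4, 2, 0.99   # assumed dimensions and discount factor

    # Action network: maps the agent's state to a bid.
    actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, bid_dim))
    # Value network: scores a (state, bid) pair.
    critic = nn.Sequential(nn.Linear(state_dim + bid_dim, 64), nn.ReLU(), nn.Linear(64, 1))
    actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
    critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

    def update(state, bid, revenue, next_state):
        # Value network: temporal-difference target r + gamma * Q(s', actor(s')).
        with torch.no_grad():
            next_q = critic(torch.cat([next_state, actor(next_state)], dim=-1))
            target = revenue + gamma * next_q
        td_loss = nn.functional.mse_loss(critic(torch.cat([state, bid], dim=-1)), target)
        critic_opt.zero_grad()
        td_loss.backward()
        critic_opt.step()

        # Action network: gradient ascent on the value network's evaluation score,
        # implemented here as minimising the negative score.
        score = critic(torch.cat([state, actor(state)], dim=-1)).mean()
        actor_opt.zero_grad()
        (-score).backward()
        actor_opt.step()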
Optionally, the aggregation unit 24 includes:
a calculation module, configured to calculate, for each sample client, the ratio of its data volume to the total data volume of all sample clients, so as to obtain the data volume proportion corresponding to each sample client;
and an aggregation module, configured to multiply the data volume proportion corresponding to each sample client by the update model parameters uploaded by that sample client, aggregate the weighted update model parameters of all sample clients, and update the model parameters in the global sharing model by accumulating the aggregated update model parameters.
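In code, this data-volume-weighted aggregation is the familiar FedAvg-style rule; a minimal sketch with illustrative variable names follows.

    import numpy as np

    def aggregate(updates, data_sizes):
        # updates: per-client update model parameters (arrays of identical shape)
        # data_sizes: per-client data volumes used for local training
        total = float(sum(data_sizes))
        ratios = [n / total for n in data_sizes]   # data volume proportion per client
        # Multiply each client's update by its proportion and accumulate the results.
        return sum(r * np.asarray(u) for r, u in zip(ratios, updates))

    # The aggregated result then refreshes the corresponding parameters of the
    # global sharing model before the next round begins.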
Based on the above method embodiments, another embodiment of the present invention provides a storage medium having executable instructions stored thereon, which when executed by a processor, cause the processor to implement the above method.
Based on the above embodiment, another embodiment of the present invention provides a vehicle including:
one or more processors;
a storage device to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described above. The vehicle may be a non-autonomous vehicle or an autonomous vehicle.
The system and apparatus embodiments correspond to the method embodiments and have the same technical effects; for specific details, refer to the method embodiments. The device embodiments are derived from the method embodiments, so their detailed description can be found in the method embodiment section and is not repeated here. Those of ordinary skill in the art will understand that the figures are merely schematic diagrams of one embodiment, and that the blocks or flows in the figures are not necessarily required to practice the present invention.
Those of ordinary skill in the art will also understand that the modules in the devices of the embodiments may be distributed in those devices as described, or may be relocated, with corresponding changes, in one or more devices different from those of the embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced, and that such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A user bidding method under federated learning based on a multi-agent reinforcement learning algorithm, characterized by comprising the following steps:
obtaining a learning task issued by a federated learning platform, selecting a sample client from a client set participating in federated learning based on the learning task and the bidding information uploaded by the client set, and issuing a global sharing model to the sample client;
receiving the update model parameters uploaded by each sample client, wherein the update model parameters are obtained by the sample client using the multi-agent reinforcement learning algorithm to output its bidding information to be submitted for the current round before training starts and, after the sample client is selected, training the global sharing model according to the configuration in the bidding information to be submitted;
aggregating the update model parameters uploaded by each sample client, and updating the model parameters in the global sharing model by using the aggregated update model parameters;
and if the updated global sharing model reaches the preset model precision on the test task, determining that the learning task issued by the federated learning platform is completed; otherwise, repeatedly executing the step of updating the model parameters in the global sharing model over multiple rounds until the updated global sharing model reaches the preset model precision on the test task.
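Viewed procedurally, the steps of claim 1 amount to the server-side loop sketched below; the helper callables (submit_bid, select_clients, local_train, aggregate_updates, evaluate) are illustrative placeholders supplied by the caller, not terms of the claim.

    def run_learning_task(global_model, clients, submit_bid, select_clients,
                          local_train, aggregate_updates, evaluate,
                          target_precision, max_rounds=100):
        # One federated learning task: bid, select, train locally, aggregate,
        # and repeat until the preset model precision is reached on the test task.
        for _ in range(max_rounds):
            bids = {c: submit_bid(c) for c in clients}        # each agent outputs its bid for the round
            sample_clients = select_clients(bids)             # platform selects sample clients from the bids
            updates = [local_train(c, global_model, bids[c])  # selected clients train the global sharing model
                       for c in sample_clients]
            global_model = aggregate_updates(global_model, updates)
            if evaluate(global_model) >= target_precision:    # preset precision reached: task completed
                break
        return global_model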
2. The method of claim 1, wherein the process of the sample client using the multi-agent reinforcement learning algorithm to output the bidding information to be submitted by the sample client in the current round comprises:
taking the sample client as an agent, the agent observing its own historical state information in the federated learning environment, and outputting the bidding information to be submitted by the sample client in the current round by using the historical state information.
3. The method of claim 2, wherein the multi-agent reinforcement learning algorithm comprises a policy maker and an experience pool, and wherein taking the sample client as an agent, the agent observing its own historical state information in the federated learning environment, and using the historical state information to output the bidding information to be submitted by the sample client in the current round comprises:
taking the sample client as an agent, and using the experience pool in the multi-agent reinforcement learning algorithm to store the historical task state information observed by each agent in the federated learning environment, the historical task state information at least including whether the agent was selected in a historical round, the historical resource value, the amount of data historically provided, and the historical unit resource amount;
and inputting the historical task state information observed by the agent in the federated learning environment, as the state information of the agent in the current round, into the policy maker of the multi-agent reinforcement learning algorithm, and outputting the bidding information to be submitted by the agent in the current round.
4. The method of claim 3, wherein after outputting the bidding information to be submitted by the agent in the current round by inputting the historical task state information observed by the agent in the federated learning environment, as the state information of the agent in the current round, into the policy maker of the multi-agent reinforcement learning algorithm, the method further comprises:
calculating the revenue resources fed back by the federated learning environment to the agent in the current round, and using the experience pool in the multi-agent reinforcement learning algorithm to store the environment state observed by the agent in the current round, the bidding information to be submitted, the environment state after the bidding information is uploaded, and the revenue resources fed back to the agent by the federated learning environment for the bidding information uploaded in the current round.
5. The method of claim 4, wherein calculating the revenue resources fed back by the federated learning environment to the agent in the current round comprises:
acquiring the resource parameters related to the agent in the bidding process based on the bidding information to be uploaded by the agent in the current round;
and inputting the resource parameters related to the agent in the bidding process into a pre-constructed revenue function to obtain the revenue resources fed back to the agent in the current round by the federated learning environment.
6. The method of claim 3, wherein each sample client is configured with a policy maker comprising an action network and a value network, and wherein outputting the bidding information to be submitted by the agent in the current round by inputting the historical task state information observed in the federated learning environment, as the state information of the agent in the current round, into the policy maker of the multi-agent reinforcement learning algorithm comprises:
inputting the historical task state information observed by the agent in the federated learning environment, as the state information of the agent in the current round, into the action network of the policy maker, and outputting the bidding information to be submitted by the agent in the current round, so as to obtain the bidding information to be uploaded by the agent in the current training round;
inputting the state information of the agent in the current round and the bidding information to be uploaded by the agent in the current round into the value network of the policy maker, and evaluating the bidding information to be uploaded to obtain an evaluation score of the bidding information to be uploaded;
and training the action network by using the evaluation score of the bidding information to be uploaded and updating the network parameters of the action network through gradient ascent, and training the value network by using the evaluation score of the bidding information to be uploaded and the revenue resources actually fed back to the agent and updating the network parameters of the value network through the temporal difference method.
7. The method according to any one of claims 1 to 6, wherein aggregating the update model parameters uploaded by each sample client and updating the model parameters in the global sharing model by using the aggregated update model parameters comprises:
calculating, for each sample client, the ratio of its data volume to the total data volume of all sample clients, so as to obtain the data volume proportion corresponding to each sample client;
and multiplying the data volume proportion corresponding to each sample client by the update model parameters uploaded by that sample client, aggregating the weighted update model parameters of all sample clients, and updating the model parameters in the global sharing model by accumulating the aggregated update model parameters.
8. A user bidding apparatus under federated learning based on a multi-agent reinforcement learning algorithm, the apparatus comprising:
the system comprises an acquisition unit, a resource allocation unit and a resource allocation unit, wherein the acquisition unit is used for acquiring a learning task issued by a federal learning platform, selecting a sample client from a client set based on the learning task and bidding information uploaded by the client set participating in federal learning, and issuing a global sharing model to the sample client;
a receiving unit, configured to receive the update model parameters uploaded by each sample client, wherein the update model parameters are obtained by the sample client using the multi-agent reinforcement learning algorithm to output its bidding information to be submitted for the current round and, after being selected, training the global sharing model according to the configuration in the bidding information to be submitted;
an aggregation unit, configured to aggregate the update model parameters uploaded by each sample client and update the model parameters in the global sharing model by using the aggregated update model parameters;
and a selecting unit, configured to determine that the learning task issued by the federated learning platform is completed if the updated global sharing model reaches the preset model precision on the test task, and otherwise to repeatedly execute the step of updating the model parameters in the global sharing model over multiple rounds until the updated global sharing model reaches the preset model precision on the test task.
9. A storage medium having executable instructions stored thereon which, when executed by a processor, cause the processor to implement the user bidding method under federated learning based on a multi-agent reinforcement learning algorithm.
10. A user bidding apparatus under federated learning based on a multi-agent reinforcement learning algorithm, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for user bidding under federated learning based on a multi-agent reinforcement learning algorithm.
CN202211120985.2A 2022-03-28 2022-09-15 User bidding method and device based on multi-agent reinforcement learning algorithm under federal learning Pending CN115358831A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210309611.9A CN114971819A (en) 2022-03-28 2022-03-28 User bidding method and device based on multi-agent reinforcement learning algorithm under federal learning
CN2022103096119 2022-03-28

Publications (1)

Publication Number Publication Date
CN115358831A true CN115358831A (en) 2022-11-18

Family

ID=82975873

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202210309611.9A Withdrawn CN114971819A (en) 2022-03-28 2022-03-28 User bidding method and device based on multi-agent reinforcement learning algorithm under federal learning
CN202211120985.2A Pending CN115358831A (en) 2022-03-28 2022-09-15 User bidding method and device based on multi-agent reinforcement learning algorithm under federal learning

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202210309611.9A Withdrawn CN114971819A (en) 2022-03-28 2022-03-28 User bidding method and device based on multi-agent reinforcement learning algorithm under federal learning

Country Status (1)

Country Link
CN (2) CN114971819A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115544899A (en) * 2022-11-23 2022-12-30 南京邮电大学 Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115130683A (en) * 2022-07-18 2022-09-30 山东大学 Asynchronous federal learning method and system based on multi-agent model
CN115086399B (en) * 2022-07-28 2022-12-06 深圳前海环融联易信息科技服务有限公司 Federal learning method and device based on hyper network and computer equipment
CN117076113B (en) * 2023-08-17 2024-09-06 重庆理工大学 Industrial heterogeneous equipment multi-job scheduling method based on federal learning

Also Published As

Publication number Publication date
CN114971819A (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN115358831A (en) User bidding method and device based on multi-agent reinforcement learning algorithm under federal learning
Du et al. Learning resource allocation and pricing for cloud profit maximization
Jiao et al. Toward an automated auction framework for wireless federated learning services market
CN110189174A (en) Mobile crowd sensing excitation method based on data quality sensing
CN111064633B (en) Cloud-edge cooperative power information communication equipment automated testing resource allocation method
CN113992676B (en) Incentive method and system for layered federal learning under terminal edge cloud architecture and complete information
CN113128828B (en) Satellite observation distributed online planning method based on multi-agent reinforcement learning
CN108921298B (en) Multi-agent communication and decision-making method for reinforcement learning
CN109068288B (en) Method and system for selecting mobile crowd sensing incentive mechanism based on multi-attribute user
Kumar et al. Federated control with hierarchical multi-agent deep reinforcement learning
CN112950251A (en) Reputation-based vehicle crowd sensing node reverse combination auction excitation optimization method
Lim et al. Incentive mechanism design for resource sharing in collaborative edge learning
CN107784561A (en) The implementation method of online incentive mechanism in a kind of mobile mass-rent system
CN109922152A (en) Calculating discharging method and system in a kind of mobile edge calculations
CN114415735B (en) Dynamic environment-oriented multi-unmanned aerial vehicle distributed intelligent task allocation method
CN110390560A (en) A kind of mobile intelligent perception multitask pricing method based on Stackelberg game
Zhu et al. Agent-based dynamic scheduling for earth-observing tasks on multiple airships in emergency
CN116362327A (en) Model training method and system and electronic equipment
CN104657390B (en) A kind of answer platform method and system
Liew et al. Economics of semantic communication in metaverse: an auction approach
CN113660304A (en) Unmanned aerial vehicle group distributed learning resource control method based on bidirectional auction game
CN116566891A (en) Delay-sensitive service function chain parallel route optimization method, device and medium
CN115271092A (en) Crowd funding incentive method for indoor positioning federal learning
CN109872058A (en) Multimedia crowd sensing excitation method for machine learning system
Li et al. A cooperative analysis to incentivize communication-efficient federated learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Zeng Rongfei, An Shuyang, Zeng Chao, Su Mai, Wang Jiaqi
Inventor before: Zeng Rongfei, An Shuyang, Zeng Chao, Han Bo, Su Mai, Wang Jiaqi