CN115358831A - User bidding method and device based on multi-agent reinforcement learning algorithm under federated learning - Google Patents
- Publication number
- CN115358831A (application number CN202211120985.2A)
- Authority
- CN
- China
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/08—Auctions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention discloses a user bidding method and device based on a multi-agent reinforcement learning algorithm under federated learning. The method includes: acquiring a learning task published by a federated learning platform; sample clients uploading bidding information to the federated platform using a reinforcement learning algorithm; the platform selecting sample clients through an algorithm and delivering the global shared model to the selected sample clients; the selected sample clients performing local training and uploading updated parameters; and the platform aggregating the uploaded updated model parameters according to an aggregation algorithm and updating the model parameters in the global model, thereby completing the learning task published by the federated learning platform. The method enables dynamic bidding by federated learning participants while alleviating model overfitting, and solves the problems of existing auction-based incentive mechanisms in which, once a user submits a bidding strategy, the strategy does not change during subsequent training, leading to a lack of fairness in federated learning and to model overfitting.
Description
Technical Field
The present invention relates to the technical field of artificial intelligence, and in particular to a user bidding method and device based on a multi-agent reinforcement learning algorithm under federated learning.
Background
With users' increasing attention to privacy and the introduction of related policies, it is becoming more and more difficult for traditional machine learning to collect data for centralized training. Federated learning has become one of the most promising deep learning paradigms because it protects user privacy by not requiring users to upload raw data. However, participants in federated learning consume large amounts of computing, communication, and other resources during training, which means that self-interested participants will not wholeheartedly take part in learning tasks without sufficient reward. At the same time, because the underlying network structure of federated learning is complex and node resources are limited and heterogeneous, a federation initiator without appropriate incentive and selection measures will incur huge communication overhead. These problems not only waste network resources but also hinder the adoption of federated learning.
In the incentive mechanisms of related technologies, game-theoretic techniques can be used to select participants and distribute profits, for example by incorporating an auction method into federated learning. In one implementation, a lightweight, multi-dimensional incentive scheme selects high-quality participants; in another, an incentive-mechanism framework integrates participants' learning quality into federated learning for quality-aware incentives and model aggregation. However, existing auction-based incentive mechanisms are almost all static: they assume that once a participant has determined its strategy, the participant does not change that strategy in response to changes in the platform's behavior. Such approaches maximize only the utility of the platform or of social welfare, rather than jointly maximizing the utility of the platform and the participants. Concretely, in a federated learning auction, once a participant has determined its bid, the bid does not change in subsequent training; whether selected or not, the participant can only wait to be chosen. Under such a mechanism, even a participant that loses the auction cannot change its strategy, so resource-rich participants are always selected while resource-poor but honest participants are never selected. This not only undermines the fairness of federated learning and dampens participants' enthusiasm, but also, because the same clients are repeatedly selected, reduces data diversity and may cause the model to overfit. Furthermore, existing dynamic bidding methods assume that user information is transparent, i.e., that each user knows the private information of the other users, which is impossible in practical applications.
Summary of the Invention
The present invention provides a user bidding method and device based on a multi-agent reinforcement learning algorithm under federated learning. By introducing multi-agent reinforcement learning into the incentive mechanism of federated learning, it solves the problem in the prior art that auction-based incentive mechanisms lead to a lack of fairness in federated learning because bidding strategies do not change during subsequent training. The specific technical scheme is as follows:
In a first aspect, an embodiment of the present invention provides a user bidding method based on a multi-agent reinforcement learning algorithm under federated learning, the method comprising:
acquiring a learning task published by a federated learning platform, selecting sample clients from a set of clients participating in federated learning based on the learning task and the bidding information uploaded by the set of clients, and delivering a global shared model to the sample clients;
receiving updated model parameters uploaded by each sample client, the updated model parameters being formed by the sample client using a multi-agent reinforcement learning algorithm, before training starts, to output the bidding information to be submitted by the sample client in the current round and, after being selected, training the global shared model according to the configuration in the bidding information to be submitted;
aggregating the updated model parameters uploaded by the sample clients, and using the aggregated updated model parameters to update the model parameters in the global shared model;
if the updated global shared model reaches a preset model accuracy on a test task, determining that the learning task published by the federated learning platform is completed; otherwise, repeating the step of updating the model parameters in the global shared model for multiple rounds until the updated global shared model reaches the preset model accuracy on the test task.
Optionally, the process in which the sample client uses the multi-agent reinforcement learning algorithm to output the bidding information to be submitted by the sample client in the current round includes:
taking the sample client as an agent, the agent observing its own historical state information in the federated learning environment, and using the historical state information to output the bidding information to be submitted by the sample client in the current round.
Optionally, the multi-agent reinforcement learning algorithm includes a strategist and an experience pool, and the taking the sample client as an agent, the agent observing its own historical state information in the federated learning environment and using the historical state information to output the bidding information to be submitted by the sample client in the current round includes:
taking the sample client as an agent, and using the experience pool in the multi-agent reinforcement learning algorithm to store the historical task state information observed by each agent in the federated learning environment, the historical task state information including at least whether the agent was selected in historical rounds, the historical resource values, the historical amount of data provided, and the historical per-unit resource amount;
inputting the historical task state information observed by the agent in the federated learning environment, as the agent's state information in the current round, into the strategist of the multi-agent reinforcement learning algorithm, and outputting the bidding information to be submitted by the agent in the current round.
Optionally, after the inputting the historical task state information observed by the agent in the federated learning environment, as the agent's state information in the current round, into the strategist of the multi-agent reinforcement learning algorithm and outputting the bidding information to be submitted by the agent in the current round, the method further includes:
calculating the revenue resources fed back by the federated learning environment to the agent in the current round, and using the experience pool in the multi-agent reinforcement learning algorithm to store the historical state of the environment observed by the agent in the current round, the bidding information to be submitted, the state of the environment after the bidding information to be submitted is uploaded, and the revenue resources fed back to the agent by the federated learning environment for the bidding information uploaded in the current round.
Optionally, the calculating the revenue resources fed back by the federated learning environment to the agent in the current round includes:
based on the bidding information to be uploaded by the agent in the current round, respectively obtaining the resource parameters involved in the agent's bidding process;
inputting the resource parameters involved in the agent's bidding process into a pre-built revenue function, to obtain the revenue resources fed back by the federated learning environment to the agent in the current round.
Optionally, each sample client is configured with a strategist, the strategist includes an action network and a value network, and the inputting the historical task state information observed in the federated learning environment, as the agent's state information in the current round, into the strategist of the multi-agent reinforcement learning algorithm and outputting the bidding information to be submitted by the agent in the current round includes:
inputting the historical task state information observed by the agent in the federated learning environment, as the agent's state information in the current round, into the action network of the strategist, and outputting the bidding information to be submitted by the agent in the current round, to obtain the bidding information to be uploaded by the agent in the current training round;
inputting the agent's state information in the current round and the bidding information to be uploaded by the agent in the current round into the value network of the strategist, and evaluating the bidding information to be uploaded to obtain an evaluation score of the bidding information to be uploaded;
wherein the action network is trained using the evaluation score of the bidding information to be uploaded, the network parameters of the action network are updated by gradient ascent, the value network is trained using the evaluation score of the bidding information to be uploaded and the revenue resources actually fed back to the agent, and the network parameters of the value network are updated by the temporal-difference method.
Optionally, the aggregating the updated model parameters uploaded by the sample clients and using the aggregated updated model parameters to update the model parameters in the global shared model includes:
respectively calculating the ratio of each sample client's data volume to the data volume of all sample clients, to obtain the data-volume proportion corresponding to each sample client;
multiplying the data-volume proportion corresponding to each sample client by the updated model parameters uploaded by that sample client, aggregating the updated model parameters corresponding to all sample clients, and updating the model parameters in the global shared model by accumulating the aggregated updated model parameters.
In a second aspect, an embodiment of the present invention provides a user bidding device based on a multi-agent reinforcement learning algorithm under federated learning, the device comprising:
an acquiring unit, configured to acquire a learning task published by a federated learning platform, select sample clients from a set of clients participating in federated learning based on the learning task and the bidding information uploaded by the set of clients, and deliver a global shared model to the sample clients;
a receiving unit, configured to receive updated model parameters uploaded by each sample client, the updated model parameters being formed by the sample client using a multi-agent reinforcement learning algorithm, before training starts, to output the bidding information to be submitted by the sample client in the current round and, after being selected, training the global shared model according to the configuration in the bidding information to be submitted;
an aggregation unit, configured to aggregate the updated model parameters uploaded by the sample clients, and use the aggregated updated model parameters to update the model parameters in the global shared model;
a selection unit, configured to determine that the learning task published by the federated learning platform is completed if the updated global shared model reaches a preset model accuracy on a test task, and otherwise repeat the step of updating the model parameters in the global shared model for multiple rounds until the updated global shared model reaches the preset model accuracy on the test task.
Optionally, the device further includes:
an output unit, used in the process in which the sample client uses the multi-agent reinforcement learning algorithm to output the bidding information to be submitted by the sample client in the current round;
the output unit being specifically configured to take the sample client as an agent, the agent observing its own historical state information in the federated learning environment and using the historical state information to output the bidding information to be submitted by the sample client in the current round.
Optionally, the multi-agent reinforcement learning algorithm includes a strategist and an experience pool, and the output unit includes:
a storage module, configured to take the sample client as an agent and use the experience pool in the multi-agent reinforcement learning algorithm to store the historical task state information observed by each agent in the federated learning environment, the historical task state information including at least whether the agent was selected in historical rounds, the historical resource values, the historical amount of data provided, and the historical per-unit resource amount;
an output module, configured to input the historical task state information observed by the agent in the federated learning environment, as the agent's state information in the current round, into the strategist of the multi-agent reinforcement learning algorithm, and output the bidding information to be submitted by the agent in the current round.
Optionally, the output unit further includes:
a calculation module, configured to, after the historical task state information observed by the agent in the federated learning environment is input, as the agent's state information in the current round, into the strategist of the multi-agent reinforcement learning algorithm and the bidding information to be submitted by the agent in the current round is output, calculate the revenue resources fed back by the federated learning environment to the agent in the current round, and use the experience pool in the multi-agent reinforcement learning algorithm to store the historical state of the environment observed by the agent in the current round, the bidding information to be submitted, the state of the environment after the bidding information to be submitted is uploaded, and the revenue resources fed back to the agent by the federated learning environment for the bidding information uploaded in the current round.
Optionally, the calculation module is specifically configured to, based on the bidding information to be uploaded by the agent in the current round, respectively obtain the resource parameters involved in the agent's bidding process;
the calculation module is further specifically configured to input the resource parameters involved in the agent's bidding process into a pre-built revenue function, to obtain the revenue resources fed back by the federated learning environment to the agent in the current round.
Optionally, each sample client is configured with a strategist, the strategist includes an action network and a value network, and the output module is specifically configured to input the historical task state information observed by the agent in the federated learning environment, as the agent's state information in the current round, into the action network of the strategist and output the bidding information to be submitted by the agent in the current round, to obtain the bidding information to be uploaded by the agent in the current training round;
the output module is further specifically configured to input the agent's state information in the current round and the bidding information to be uploaded by the agent in the current round into the value network of the strategist, and evaluate the bidding information to be uploaded to obtain an evaluation score of the bidding information to be uploaded;
wherein the action network is trained using the evaluation score of the bidding information to be uploaded, the network parameters of the action network are updated by gradient ascent, the value network is trained using the evaluation score of the bidding information to be uploaded and the revenue resources actually fed back to the agent, and the network parameters of the value network are updated by the temporal-difference method.
Optionally, the aggregation unit includes:
a calculation module, configured to respectively calculate the ratio of each sample client's data volume to the data volume of all sample clients, to obtain the data-volume proportion corresponding to each sample client;
an aggregation module, configured to multiply the data-volume proportion corresponding to each sample client by the updated model parameters uploaded by that sample client, aggregate the updated model parameters corresponding to all sample clients, and update the model parameters in the global shared model by accumulating the aggregated updated model parameters.
In a third aspect, an embodiment of the present invention provides a storage medium on which executable instructions are stored, and when the instructions are executed by a processor, the processor implements the method described in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a device for user bidding based on a multi-agent reinforcement learning algorithm under federated learning, comprising:
one or more processors; and
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in the first aspect.
As can be seen from the above, in the user bidding method and device based on a multi-agent reinforcement learning algorithm under federated learning provided by the embodiments of the present invention, a learning task published by a federated learning platform is acquired; sample clients are selected from a set of clients participating in federated learning based on the learning task and the bidding information uploaded by the set of clients, and the global shared model is delivered to the sample clients; the updated model parameters uploaded by each sample client are received, the updated model parameters being formed by the sample client using a multi-agent reinforcement learning algorithm, before training starts, to output the bidding information to be submitted in the current round and, after being selected, training the global shared model according to the configuration in that bidding information; the updated model parameters uploaded by the sample clients are then aggregated, and the aggregated updated model parameters are used to update the model parameters in the global shared model; if the updated global shared model reaches a preset model accuracy on a test task, it is determined that the learning task published by the federated learning platform is completed, otherwise the step of updating the model parameters in the global shared model is repeated for multiple rounds until the updated global shared model reaches the preset model accuracy on the test task. Therefore, compared with the auction-based incentive mechanisms of the prior art, the embodiments of the present invention can use a multi-agent learning system to adjust the bidding information uploaded by clients, thereby solving the problem that prior-art auction-based incentive mechanisms lead to a lack of fairness in federated learning because bidding strategies do not change during subsequent training.
In addition, the technical effects that can be achieved by this embodiment include:
(1) The bidding information uploaded by a client is adjusted based on the multi-agent reinforcement learning algorithm, so as to increase the probability of the client being selected, ensure the fairness of participants in federated learning, solve the suboptimality caused by fixed strategies, and achieve the goal of jointly maximizing the utility of the federated learning platform and the participants.
(2) The multi-agent reinforcement learning algorithm adopts centralized training and distributed execution, so that the clients corresponding to the participants can observe more states, improving the stability of the agent training process.
(3) The multi-agent reinforcement learning algorithm adopts an asynchronous deep reinforcement learning training scheme, decoupling the execution of the learning task from the updating of the bidding information so that the two can proceed in parallel, which accelerates model training.
Of course, implementing any product or method of the present invention does not necessarily require achieving all of the advantages described above at the same time.
Brief Description of the Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a user bidding method based on a multi-agent reinforcement learning algorithm under federated learning provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of the multi-agent reinforcement learning algorithm outputting the bidding information to be submitted by a sample client in the current round, provided by an embodiment of the present invention;
Fig. 3 is a block diagram of a user bidding device based on a multi-agent reinforcement learning algorithm under federated learning provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
It should be noted that the terms "comprising" and "having" and any variations thereof in the embodiments of the present invention and in the drawings are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device comprising a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.
The present invention provides a user bidding method and device based on a multi-agent reinforcement learning algorithm under federated learning, which uses a multi-agent learning system to adjust the bidding information uploaded by clients, thereby solving the problem in the prior art that auction-based incentive mechanisms lead to a lack of fairness in federated learning because bidding strategies do not change during subsequent training. Traditional auction techniques require the private information of participants, and the entire auction process is static; that is, the bids of participants are fixed, and the bidding information uploaded by a client is not adjusted even after a bid fails. As a result, participants cannot dynamically change their bids, federated learning lacks fairness, resource-poor participants are rarely selected by the federated learning platform, and the participants' resources are greatly wasted. Such auction mechanisms merely maximize the utility of the federated learning platform or of social welfare, and fail to jointly maximize the utility of the platform and the participants. The embodiments of the present invention introduce a multi-agent reinforcement learning algorithm into the incentive mechanism of federated learning and adjust the bidding information uploaded by a client based on the multi-agent reinforcement learning algorithm, so as to increase the probability that the client corresponding to a participant is selected, reduce the aggregation time, ensure the fairness of participants in federated learning, solve the suboptimality caused by fixed strategies, and achieve the goal of jointly maximizing the utility of the federated learning platform and the participants.
The embodiments of the present invention are described in detail below.
Fig. 1 is a schematic flowchart of a user bidding method based on a multi-agent reinforcement learning algorithm under federated learning provided by an embodiment of the present invention. The method may include the following steps:
S100: acquire a learning task published by a federated learning platform, select sample clients from a set of clients participating in federated learning based on the learning task and the bidding information uploaded by the set of clients, and deliver a global shared model to the sample clients.
The learning task published by the federated learning platform is issued by the server corresponding to the federation publisher. The learning task is applicable to various application scenarios involving data collection and training, such as object recognition tasks and data classification tasks. Since the federated learning process trains the global shared model on selected clients that hold data, selecting high-quality clients to update the model parameters of the global shared model improves the effect of the learning task. To ensure that the server selects suitable clients for model training, each client uploads bidding information, and the server then selects clients based on the learning task.
Here the bidding information consists of the computing resources, the data volume, and the bid resources. Specifically, when selecting sample clients from the client set based on the learning task and the bidding information uploaded by the set of clients participating in federated learning, after the federated learning platform receives the clients' bidding information, the sample clients and the bid resources corresponding to each sample client can be obtained by modeling and solving the learning task. The modeling and solving process is as follows:
s_n ∈ {0, 1}    (2)
t_n^max ≤ T^max    (3)
where constraint (1) states that the total payment by the federated learning platform to the selected clients must not exceed the platform's budget; constraint (2) states that each client is either selected (s_n = 1) or not selected (s_n = 0); and constraint (3) states that the training time of a selected client must not exceed the maximum time specified by the federated learning platform.
By solving the above expressions, the federated learning platform selects the set of sample clients and the amount of resources it purchases from each sample client.
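A minimal sketch of this selection step is given below, assuming the platform scores each bid by data volume per unit of requested payment and fills its budget greedily; the field names (data_size, bid_price, train_time) and the greedy heuristic are illustrative assumptions, not the actual solver used by the platform.
from dataclasses import dataclass

@dataclass
class Bid:
    client_id: int
    data_size: int     # d_n: amount of local data offered
    compute: float     # m_n: unit computing capability
    bid_price: float   # payment requested by the client
    train_time: float  # estimated local training time t_n^max

def select_clients(bids, budget, t_max):
    """Return (selected_ids, payments) respecting constraints (1)-(3)."""
    # Constraint (3): drop clients whose training time exceeds the platform limit.
    feasible = [b for b in bids if b.train_time <= t_max]
    # Greedy heuristic: prefer more data per unit of requested payment.
    feasible.sort(key=lambda b: b.data_size / max(b.bid_price, 1e-9), reverse=True)
    selected, payments, spent = [], {}, 0.0
    for b in feasible:
        # Constraint (1): the total payment must stay within the platform budget.
        if spent + b.bid_price <= budget:
            selected.append(b.client_id)       # constraint (2): s_n = 1
            payments[b.client_id] = b.bid_price
            spent += b.bid_price
    return selected, payments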
S110: receive the updated model parameters uploaded by each sample client, the updated model parameters being formed by the sample client using the multi-agent reinforcement learning algorithm, before training starts, to output the bidding information to be submitted by the sample client in the current round and, after being selected, training the global shared model according to the configuration in the bidding information to be submitted.
In the embodiment of the present invention, a sample client is a client that wishes to participate in the learning task. Only selected sample clients may download and train the global model, and each sample client runs the multi-agent reinforcement learning algorithm. Specifically, after receiving the global shared model, a selected sample client trains the global model according to the configuration in its bidding information and obtains updated model parameters.
In the process in which a sample client uses the multi-agent reinforcement learning algorithm to output the bidding information to be submitted in the current round, the sample client acts as an agent; the agent observes its own historical state information in the federated learning environment and uses the historical state information to output the bidding information to be submitted by the sample client in the current round.
The above multi-agent reinforcement learning algorithm includes a strategist and an experience pool. Specifically, with the sample client acting as an agent, the experience pool of the multi-agent reinforcement learning algorithm is used to store the historical task state information observed by each agent in the federated learning environment. This historical task state information corresponds to the state of the agent's historically submitted bids and the federated learning feedback on those bids, and includes at least whether the agent was selected in historical rounds, the historical resource values, the historical amount of data provided, and the historical per-unit resource amount. The historical task state information observed by the agent in the federated learning environment is then input, as the agent's state information in the current round, into the strategist of the multi-agent reinforcement learning algorithm, which outputs the bidding information to be submitted by the agent in the current round.
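A minimal sketch of such an experience pool is given below, assuming each transition is stored as a (state, action, reward, next state) tuple; the field names (selected, price, data_size, unit_resource) are illustrative assumptions for the historical task state described above.
import random
from collections import deque, namedtuple

Observation = namedtuple("Observation", ["selected", "price", "data_size", "unit_resource"])
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])

class ExperiencePool:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)
    def store(self, state, action, reward, next_state):
        # One tuple per round: observed state, submitted bid, environment feedback, next state.
        self.buffer.append(Transition(state, action, reward, next_state))
    def sample(self, batch_size):
        # Mini-batch sampling used when the strategist is trained.
        return random.sample(self.buffer, batch_size)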
It can be understood that the above multi-agent reinforcement learning algorithm learns how to map the task state in the federated learning environment to bidding information, so that the clients and the platform simultaneously obtain the maximum resource benefit. The basic model of the system is a Markov game, in which all agents simultaneously select and execute the bidding information to be submitted according to the task state (or observations) of the current federated learning environment. It is defined as a tuple (n, S, A_1, ..., A_n, T, γ, R_1, ..., R_n), where n is the number of agents; S is the task state of the multi-agent reinforcement learning algorithm, i.e., the historical task state information of each agent; A_i is the set of bidding information to be submitted by agent i; T: S × A_1 × A_2 × ... × A_n × S → [0, 1] is the state transition function, i.e., the probability distribution over the next task state given the current task state and the joint action; and R_i is the reward function of agent i, where R_i(s, a_1, ..., a_n, s') is the reward obtained by agent i in task state s_{t+1} after the joint action (a_1, ..., a_n) is taken in task state s. The cumulative reward expected by an agent i is the expected discounted sum of its per-round rewards, with discount factor γ.
The reward function of an agent is expressed in terms of the following quantities: the price offered by agent i; d_i, the data volume of agent i; m_i, the unit computing capability of the agent; c_i, the unit cost; the average profit agent i obtains from serving its own resource demand; and x_i, the amount of resources the agent devotes to its own needs. Owing to the uncertainty of the behavior of the agent's owner (for example, the owner may use the device for other purposes for a long time, leaving almost no resources available for task training), x_i is defined as a random variable over a certain interval and follows a probability distribution function F(x_i).
Further, in order to know in real time the revenue fed back to each sample client in every round, after the bidding information to be submitted by the agent in the current round is output, the revenue resources fed back by the federated learning environment to the agent in the current round can be calculated, and the experience pool of the multi-agent reinforcement learning algorithm can be used to store the historical state of the environment observed by the agent in the current round, the bidding information to be submitted, the state of the environment after the bidding information is uploaded, and the revenue resources fed back to the agent by the federated learning environment for the bidding information uploaded in the current round. Specifically, based on the bidding information to be uploaded by the agent in the current round, the resource parameters involved in the agent's bidding process are obtained, and these resource parameters are input into a pre-built revenue function to obtain the revenue resources fed back by the federated learning environment to the agent in the current round.
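Purely as an illustration of such a revenue function, the sketch below assumes that a selected agent's per-round payoff is its payment minus its training cost, plus the average profit earned on the resources x_i it keeps for its own needs; this particular combination of price, d_i, m_i, c_i, x_i, and the own-use profit is an assumption, and the concrete revenue function of the embodiment may combine these quantities differently.
def agent_revenue(selected, price, d_i, m_i, c_i, x_i, profit_own):
    # Illustrative payoff: resources kept for the owner's own needs earn their average profit.
    own_use_profit = profit_own * x_i
    if not selected:
        return own_use_profit
    # Training cost grows with the data volume and shrinks with the unit computing capability.
    training_cost = c_i * d_i / max(m_i, 1e-9)
    return price + own_use_profit - training_cost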
Each of the above sample clients is configured with a strategist, and the strategist includes an action network and a value network. Specifically, in the process of outputting the bidding information to be submitted by the agent in the current round, the historical task state information observed by the agent in the federated learning environment is input, as the agent's state information in the current round, into the action network of the strategist, which outputs the bidding information to be submitted by the agent in the current round, yielding the bidding information to be uploaded by the agent in the current training round. The agent's state information in the current round and the bidding information to be uploaded in the current round are then input into the value network of the strategist, which evaluates the bidding information to be uploaded and produces an evaluation score. The action network is trained using the evaluation score of the bidding information to be uploaded, and its network parameters are updated by gradient ascent; the value network is trained using the evaluation score of the bidding information to be uploaded and the revenue resources actually fed back to the agent, and its network parameters are updated by the temporal-difference method.
In an actual application scenario, it may be assumed that at a certain time step t there are m sample clients and one task initiator in a region. Here a time step t corresponds to one round of federated learning in which the set of clients submits task bids, the federated learning platform selects clients, and the selected sample clients train locally and upload updated model parameters; a task bid contains the sample client's bidding information (data volume, computing resources) and the payment it hopes to obtain. In federated learning, each sample client acts as an agent and owns a reinforcement learning strategist, which is built from a multi-layer perceptron of deep learning and contains an input layer, hidden layers, and an output layer. The strategist is represented as follows: s_t is the state of the federated learning environment at time t, including the state of each agent and the state of the federated learning platform. In each time slot t, the observation of agent i contains the price offered by the agent in the previous round; the bidding result of the previous round, where 0 indicates a failed bid and 1 a successful bid; the unit computing resources offered by the agent (since an agent does not necessarily allocate all of its computing resources to the training task during the training time, each agent's unit computing resources are related to the agent's own resource demand); and the amount of data offered by the agent in the previous round. Before the current training round starts, agent i observes the previous state information about the current learning task and inputs the state observed from the federated learning environment into the action network; the action network computes and outputs a policy, i.e., the bidding information to be submitted by the agent in the current round, as an array containing the four attributes of the bidding information. After the user takes an action in the current round, the federated learning environment feeds back a reward, and each agent has its own reward function; in the embodiment of the present invention, the agent's reward function is used to compute the agent's revenue resources in the current round.
Exemplarily, Fig. 2 is a flow diagram of the multi-agent reinforcement learning algorithm outputting the bidding information to be submitted by a sample client in the current round. Here each strategist includes an action network and a value network, and the action network and the value network each consist of a main network and a target network. With reference to Fig. 2, the specific algorithm flow is as follows:
for episode = 1 to M:
    initialize the action space;
    for t = 1 to T:
        a) each agent i selects an action a_i;
        b) execute the joint action a = (a_1, ..., a_N), and observe the reward r and the new state s_{t+1};
        c) put (s_t, a_t, r_t, s_{t+1}, t) into the experience pool D;
        d) s_t ← s_{t+1};
        for agent i = 1 to N:
            randomly sample a stored mini-batch (X_j, a_j, r_j, X'_j) from the experience pool;
            main network update: update the main network and the target network by minimizing the loss function;
            update the action network by gradient ascent;
    after all updates are completed, update the target network of each agent i: θ'_i ← τθ_i + (1 - τ)θ'_i
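Under the scheme above, the following is a minimal PyTorch-style sketch of how one strategist's networks might be updated from a sampled mini-batch (temporal-difference update for the value network, gradient ascent for the action network, soft update of the target networks). The class names, network sizes, and hyperparameters are illustrative assumptions rather than the embodiment's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActionNet(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, act_dim), nn.Sigmoid())
    def forward(self, obs):
        return self.net(obs)  # bid attributes scaled to (0, 1)

class ValueNet(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def update_agent(actor, critic, target_actor, target_critic,
                 actor_opt, critic_opt, batch, gamma=0.95, tau=0.01):
    # Single-agent simplification: in the centralized-training variant described
    # above, the critic would receive the observations and actions of all agents.
    obs, act, rew, next_obs = batch  # tensors; rew has shape [batch_size, 1]
    # Temporal-difference target and value-network (critic) update.
    with torch.no_grad():
        td_target = rew + gamma * target_critic(next_obs, target_actor(next_obs))
    critic_loss = F.mse_loss(critic(obs, act), td_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Action-network (actor) update by gradient ascent on the critic's score
    # (implemented as minimizing the negative score).
    actor_loss = -critic(obs, actor(obs)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    # Soft update of the target networks: theta' <- tau*theta + (1 - tau)*theta'.
    for target, online in ((target_actor, actor), (target_critic, critic)):
        for tp, p in zip(target.parameters(), online.parameters()):
            tp.data.mul_(1 - tau).add_(tau * p.data)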
S120: aggregate the updated model parameters uploaded by the sample clients, and use the aggregated updated model parameters to update the model parameters in the global shared model.
Specifically, the ratio of each sample client's data volume to the data volume of all sample clients can be calculated to obtain the data-volume proportion corresponding to each sample client; the data-volume proportion corresponding to each sample client is multiplied by the updated model parameters uploaded by that sample client, the updated model parameters corresponding to all sample clients are aggregated, and the model parameters in the global shared model are updated by accumulating the aggregated updated model parameters.
It can be understood that the model parameters of the global shared model delivered to each sample client are identical, whereas the model parameters obtained by each sample client after training the global shared model according to the configuration of its output bidding information differ. After each sample client trains the global shared model, it obtains local model parameters; subtracting the model parameters of the delivered global shared model from the local model parameters yields the updated model parameters, and the federated learning platform then updates the model parameters of the global shared model.
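A minimal sketch of this data-weighted aggregation is given below, assuming each sample client uploads a parameter delta (its local parameters minus the delivered global parameters) together with its data volume; the variable names are illustrative assumptions.
def aggregate_and_update(global_params, client_deltas, client_data_sizes):
    # global_params and each delta are dicts mapping a parameter name to a list of floats.
    total = float(sum(client_data_sizes))
    new_params = {name: list(values) for name, values in global_params.items()}
    for delta, n_k in zip(client_deltas, client_data_sizes):
        weight = n_k / total  # data-volume proportion of this sample client
        for name, values in delta.items():
            for i, v in enumerate(values):
                # Accumulate the weighted update onto the global shared model.
                new_params[name][i] += weight * v
    return new_params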
S130: if the updated global shared model reaches a preset model accuracy on a test task, determine that the learning task published by the federated learning platform is completed; otherwise, repeat the step of updating the model parameters in the global shared model for multiple rounds until the updated global shared model reaches the preset model accuracy on the test task.
在本发明实施例中，考虑智能体的探索空间巨大，这里采用集中式训练、分布式执行的多智能体强化学习算法作为框架。每一个样本客户端作为一个智能体，每个智能体都有一个策略器，策略器由动作网络和价值网络组成，每个动作网络与价值网络都分别由两个网络（主网络和目标网络）组成，以用来训练更新。智能体会观察当前轮次的任务状态，例如，历史轮次中是否被选中，历史价格，历史提供数据量，以及历史单位资源量等，作为策略器中动作网络的输入，动作网络给出当前轮次的动作，即当前轮次待提交的竞标信息；每个智能体的价值网络以各智能体观察到的局部状态和做出的动作作为输入，以此对该智能体输出的动作进行打分。具体而言，在每一个联邦学习训练轮次开始前，每个智能体将观察自己的历史信息s（历史竞标结果，历史计算资源，历史数据量，历史竞价）作为状态输入到策略器中，策略器将输出智能体的动作a，即用户当前训练轮次的竞标信息，用户将竞标信息提交至联邦学习平台（即环境），联邦学习平台会选择合适的样本客户端以最大化自己的利润，联邦学习环境会反馈给每个智能体奖励值r，并转变到下一个状态s'，经验池会对元组(s, a, s', r)进行存储。当经验池无法再收集新的数据时，策略器会开始进行训练，在本发明实施例中具体采用集中式训练、分布式执行的思想来训练策略器，集中式训练可表现为：首先，每个智能体策略器中的动作网络根据当前的状态选择一个动作a，然后，价值网络根据状态-动作对计算一个Q值，作为对动作网络做出动作a的反馈。这里价值网络根据估计的Q值和实际的Q值来进行训练，动作网络根据价值网络的反馈来更新策略。为了获得更准确的Q值，训练过程中策略器中的价值网络具有所有智能体的动作及状态，且价值网络通过时序差分法来更新价值网络参数，而后通过梯度上升更新动作网络的参数。分布式执行可表现为：集中训练完成后，由每个智能体根据自己当前观察到的状态分布式执行。策略器经过足够时间的训练后逐渐收敛，最终达到较优的实时竞标效果。
In this embodiment of the present invention, considering the huge exploration space of the agents, a multi-agent reinforcement learning algorithm with centralized training and distributed execution is adopted as the framework. Each sample client acts as an agent, and each agent has a strategist consisting of an action network and a value network; each action network and each value network is in turn composed of two networks (a main network and a target network) used for training and updating. The agent observes the task state of the current round, for example whether it was selected in historical rounds, historical prices, the historical amount of data provided, and historical unit resource amounts, as the input of the action network in the strategist, and the action network outputs the action of the current round, i.e., the bidding information to be submitted in the current round; the value network of each agent takes the local states observed and the actions made by the agents as input and uses them to score the action output by that agent. Specifically, before each federated learning training round begins, each agent feeds its observed historical information s (historical bidding results, historical computing resources, historical data volume, historical bids) into the strategist as its state, and the strategist outputs the agent's action a, i.e., the user's bid for the current training round. The user submits the bidding information to the federated learning platform (the environment), which selects suitable sample clients to maximize its own profit; the federated learning environment then feeds back a reward r to each agent and transitions to the next state s', and the experience pool stores the tuple (s, a, s', r). When the experience pool can no longer collect new data, the strategist begins training. This embodiment specifically trains the strategist with the idea of centralized training and distributed execution. Centralized training proceeds as follows: first, the action network in each agent's strategist selects an action a according to the current state; then, the value network computes a Q value from the state-action pair as feedback on action a. The value network is trained on the estimated and actual Q values, and the action network updates its policy according to the value network's feedback. To obtain more accurate Q values, during training the value network of the strategist has access to the actions and states of all agents; the value network's parameters are updated by the temporal-difference method, after which the action network's parameters are updated by gradient ascent. Distributed execution means that, after centralized training is completed, each agent executes on its own according to the state it currently observes. After sufficient training the strategist gradually converges and finally achieves effective real-time bidding.
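The centralized-training, distributed-execution scheme described above is close in spirit to actor-critic methods such as MADDPG. The PyTorch sketch below is only an illustrative reading of that description: the layer sizes, tensor layout, and the names Actor, Critic, and maddpg_losses are assumptions of this sketch, not the patent's implementation.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Action network: maps an agent's own observed state to its bid for the round."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh(),   # bid components scaled to [-1, 1]
        )
    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Value network: scores a joint state-action pair with a single Q value."""
    def __init__(self, joint_obs_dim: int, joint_act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )
    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

def maddpg_losses(i, actors, critics, target_actors, target_critics, batch, gamma=0.95):
    """Centralized-training losses for agent i.

    batch: obs, act, next_obs of shape [B, n_agents, dim]; rew of shape [B, n_agents, 1].
    The critic sees every agent's state and action; the actor only sees agent i's own state.
    """
    obs, act, rew, next_obs = batch
    B, n_agents = obs.shape[0], obs.shape[1]
    joint_obs, joint_act = obs.reshape(B, -1), act.reshape(B, -1)

    # Value network: temporal-difference target built from the target copies.
    with torch.no_grad():
        next_act = torch.cat([target_actors[j](next_obs[:, j]) for j in range(n_agents)], dim=-1)
        td_target = rew[:, i] + gamma * target_critics[i](next_obs.reshape(B, -1), next_act)
    critic_loss = nn.functional.mse_loss(critics[i](joint_obs, joint_act), td_target)

    # Action network: gradient ascent on the critic's score of its own fresh action
    # (implemented as minimizing the negative score).
    fresh_act = act.clone()
    fresh_act[:, i] = actors[i](obs[:, i])
    actor_loss = -critics[i](joint_obs, fresh_act.reshape(B, -1)).mean()
    return critic_loss, actor_loss
```

The main/target pairs mentioned above would be obtained by copying each network (for instance with copy.deepcopy) and letting the copies track the main networks slowly; one conventional soft-update rule is sketched further below in this section.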
本发明实施例提供的基于多智能体强化学习算法在联邦学习下的用户竞价方法，通过获取联邦学习平台发布的学习任务，基于学习任务以及参与联邦学习的客户端集合所上传的竞标信息从客户端集合中选取样本客户端，并向样本客户端下发全局共享模型，接收每个样本客户端上传的更新模型参数，该更新模型参数为样本客户端在训练开始之前使用多智能体强化学习算法输出样本客户端在当前轮次的待提交竞标信息，被选中后按照所述待提交竞标信息中的配置训练全局共享模型所形成的，进一步对各个样本客户端上传的更新模型参数进行聚合，使用聚合后的更新模型参数对全局共享模型中的模型参数进行更新，若更新后的全局共享模型在测试任务中达到预设模型精度，则判定完成联邦学习平台发布的学习任务，否则，重复执行多个轮次对全局共享模型中模型参数进行更新的步骤，以使得更新后的全局共享模型在测试任务中达到预设模型精度。由此可知，与现有技术基于拍卖的激励机制相比，本发明实施例可使用多智能体学习系统来调整客户端上传的竞标信息，以增加客户端被选取的概率，从而解决了现有技术基于拍卖的激励机制由于策略在后续训练过程中不会发生改变而导致联邦学习公平性缺失的问题。In the user bidding method under federated learning based on a multi-agent reinforcement learning algorithm provided by this embodiment of the present invention, the learning task released by the federated learning platform is obtained; sample clients are selected from the set of clients participating in federated learning based on the learning task and the bidding information uploaded by the client set, and the global shared model is delivered to the sample clients; the updated model parameters uploaded by each sample client are received, where the updated model parameters are formed by the sample client using the multi-agent reinforcement learning algorithm before training starts to output the bidding information to be submitted in the current round and, after being selected, training the global shared model according to the configuration in that bidding information; the updated model parameters uploaded by the sample clients are then aggregated, and the aggregated updated model parameters are used to update the model parameters of the global shared model. If the updated global shared model reaches the preset model accuracy on the test task, it is determined that the learning task released by the federated learning platform has been completed; otherwise, the step of updating the model parameters of the global shared model is repeated for further rounds until the updated global shared model reaches the preset accuracy on the test task. It can thus be seen that, compared with the auction-based incentive mechanisms of the prior art, this embodiment of the present invention can use a multi-agent learning system to adjust the bidding information uploaded by clients so as to increase the probability of a client being selected, thereby solving the fairness problem in federated learning that arises because, in prior-art auction-based incentive mechanisms, the strategy does not change during subsequent training.
基于上述实施例，本发明的另一实施例提供了一种基于多智能体强化学习算法在联邦学习下的用户竞价装置，如图3所示，所述装置包括：Based on the above embodiments, another embodiment of the present invention provides a user bidding device based on a multi-agent reinforcement learning algorithm under federated learning. As shown in FIG. 3, the device includes:
获取单元20,可以用于获取联邦学习平台发布的学习任务，基于所述学习任务以及参与联邦学习的客户端集合所上传的竞标信息从所述客户端集合中选取样本客户端，并向样本客户端下发全局共享模型；The obtaining unit 20 may be configured to obtain the learning task released by the federated learning platform, select sample clients from the set of clients participating in federated learning based on the learning task and the bidding information uploaded by the client set, and deliver the global shared model to the sample clients;
接收单元22,可以用于接收每个样本客户端上传的更新模型参数，所述更新模型参数为样本客户端在训练开始之前使用多智能体强化学习算法输出样本客户端在当前轮次的待提交竞标信息，被选中后按照所述待提交竞标信息中的配置训练全局共享模型所形成的；The receiving unit 22 may be configured to receive the updated model parameters uploaded by each sample client, where the updated model parameters are formed by the sample client using the multi-agent reinforcement learning algorithm before training starts to output the bidding information to be submitted by the sample client in the current round and, after being selected, training the global shared model according to the configuration in the bidding information to be submitted;
聚合单元24,可以用于对各个样本客户端上传的更新模型参数进行聚合，使用聚合后的更新模型参数对所述全局共享模型中的模型参数进行更新；The aggregation unit 24 may be configured to aggregate the updated model parameters uploaded by each sample client, and update the model parameters of the global shared model using the aggregated updated model parameters;
选取单元26,可以用于若更新后的全局共享模型在测试任务中达到预设模型精度，则判定完成联邦学习平台发布的学习任务，否则，重复执行多个轮次对全局共享模型中模型参数进行更新的步骤，以使得更新后的全局共享模型在测试任务中达到预设模型精度。The selection unit 26 may be configured to determine that the learning task released by the federated learning platform has been completed if the updated global shared model reaches the preset model accuracy on the test task; otherwise, the step of updating the model parameters of the global shared model is repeated for further rounds until the updated global shared model reaches the preset model accuracy on the test task.
可选的,所述装置还包括:Optionally, the device also includes:
输出单元，可以用于样本客户端使用多智能体强化学习算法输出样本客户端在当前轮次的待提交竞标信息的过程；an output unit, which may be used in the process in which the sample client uses the multi-agent reinforcement learning algorithm to output the bidding information to be submitted by the sample client in the current round;
所述输出单元，具体用于以所述样本客户端作为智能体，所述智能体观察在联邦学习环境中自身的历史状态信息，并利用所述历史状态信息输出所述样本客户端在当前轮次的待提交竞标信息。The output unit is specifically configured to take the sample client as an agent; the agent observes its own historical state information in the federated learning environment and uses the historical state information to output the bidding information to be submitted by the sample client in the current round.
可选的,所述多智能体强化学习算法包括策略器和经验池,所述输出单元包括:Optionally, the multi-agent reinforcement learning algorithm includes a strategist and an experience pool, and the output unit includes:
存储模块，可以用于以所述样本客户端作为智能体，使用所述多智能体强化学习算法中经验池来存储联邦学习环境中各个智能体观察到的历史任务状态信息，所述历史任务状态信息至少包括智能体在历史轮次中是否被选中、历史资源值、历史提供数据量以及历史单位资源量；a storage module, which may be configured to take the sample client as an agent and use the experience pool of the multi-agent reinforcement learning algorithm to store the historical task state information observed by each agent in the federated learning environment, where the historical task state information includes at least whether the agent was selected in historical rounds, historical resource values, the historical amount of data provided, and historical unit resource amounts;
输出模块，可以用于通过将所述智能体在所述联邦学习环境中观察到的历史任务状态信息作为智能体在当前轮次的状态信息输入至所述多智能体强化学习算法中策略器，输出智能体在当前轮次的待提交竞标信息。an output module, which may be configured to input the historical task state information observed by the agent in the federated learning environment into the strategist of the multi-agent reinforcement learning algorithm as the agent's state information for the current round, and to output the bidding information to be submitted by the agent in the current round.
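To make the storage module and experience pool concrete, the fragment below sketches one possible shape for the observed task state and the stored transition tuples; the class names and field names (selected, price, data_volume, unit_resource) are illustrative assumptions, not identifiers from the patent.

```python
import random
from collections import deque
from dataclasses import dataclass, field
from typing import List

@dataclass
class TaskState:
    selected: List[int] = field(default_factory=list)       # whether the agent won in past rounds
    price: List[float] = field(default_factory=list)        # historical resource values / prices
    data_volume: List[float] = field(default_factory=list)  # historical amount of data provided
    unit_resource: List[float] = field(default_factory=list)

class ExperiencePool:
    """Stores (state, bid, next_state, reward) tuples observed by one agent."""
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)
    def store(self, state, bid, next_state, reward):
        self.buffer.append((state, bid, next_state, reward))
    def sample(self, batch_size: int):
        return random.sample(self.buffer, batch_size)
    def full(self) -> bool:
        # training starts once the pool can no longer collect new data
        return len(self.buffer) == self.buffer.maxlen
```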
可选的,所述输出单元还包括:Optionally, the output unit also includes:
计算模块，可以用于在所述通过将所述智能体在所述联邦学习环境中观察到的历史任务状态信息作为智能体在当前轮次的状态信息输入至所述多智能体强化学习算法中策略器，输出智能体在当前轮次的待提交竞标信息之后，计算联邦学习环境针对智能体在当前轮次反馈的收益资源，并使用所述多智能体强化学习算法中经验池存储智能体在当前轮次观察到环境的历史状态、待提交竞标信息、待提交竞标信息上传后的环境状态以及联邦学习环境针对当前轮次上传的待提交竞标信息反馈给智能体的收益资源。a calculation module, which may be configured to, after the historical task state information observed by the agent in the federated learning environment is input into the strategist of the multi-agent reinforcement learning algorithm as the agent's state information for the current round and the bidding information to be submitted by the agent in the current round is output, calculate the revenue resource fed back by the federated learning environment to the agent for the current round, and use the experience pool of the multi-agent reinforcement learning algorithm to store the historical environment state observed by the agent in the current round, the bidding information to be submitted, the environment state after the bidding information to be submitted has been uploaded, and the revenue resource fed back to the agent by the federated learning environment for the bidding information uploaded in the current round.
可选的,所述计算模块,具体可以用于基于智能体在当前轮次上的待上传竞标信息,分别获取智能体在竞标过程中涉及的资源参数;Optionally, the calculation module can be specifically used to obtain the resource parameters involved in the bidding process of the agent based on the bidding information to be uploaded by the agent in the current round;
所述计算模块,具体还可以用于将所述智能体在竞标过程中涉及的资源参数输入至预先构建的收益函数,得到联邦学习环境针对智能体在当前轮次反馈的收益资源。The calculation module can also be specifically configured to input the resource parameters involved in the bidding process of the agent into a pre-built revenue function, so as to obtain the revenue resources fed back by the federated learning environment for the agent in the current round.
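The exact form of the pre-built revenue function is not given in this passage, so the sketch below only assumes one plausible shape: payment, unit_cost and data_volume are placeholder resource parameters introduced for this illustration.

```python
def revenue(selected: bool, payment: float, unit_cost: float, data_volume: float) -> float:
    """Assumed revenue resource fed back to the agent for the current round.

    If the bid wins, the agent receives `payment` and bears `unit_cost * data_volume`
    for local training; if it loses, the round yields nothing.
    """
    if not selected:
        return 0.0
    return payment - unit_cost * data_volume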
可选的，每个样本客户端配置有一个策略器，所述策略器包括动作网络和价值网络，所述输出模块，具体可以用于通过将所述智能体在所述联邦学习环境中观察到的历史任务状态信息作为智能体在当前轮次的状态信息输入至所述策略器中动作网络，输出智能体在当前轮次的待提交竞标信息，得到智能体在当前训练轮次的待上传竞标信息；Optionally, each sample client is configured with a strategist, and the strategist includes an action network and a value network. The output module may be specifically configured to input the historical task state information observed by the agent in the federated learning environment into the action network of the strategist as the agent's state information for the current round, and to output the bidding information to be submitted by the agent in the current round, thereby obtaining the bidding information to be uploaded by the agent in the current training round;
所述输出模块，具体还可以用于通过将所述智能体在当前轮次的状态信息以及智能体在当前轮次的待上传竞标信息输入至所述策略器中价值网络，对所述待上传竞标信息进行评估，得到待上传竞标信息的评估分数；The output module may be further configured to input the agent's state information for the current round and the agent's bidding information to be uploaded in the current round into the value network of the strategist, so as to evaluate the bidding information to be uploaded and obtain an evaluation score for it;
其中，所述动作网络利用所述待上传竞标信息的评估分数进行训练，所述动作网络的网络参数通过梯度上升来更新，所述价值网络利用所述待上传竞标信息的评估分数以及智能体实际反馈的收益资源进行训练，所述价值网络的网络参数通过时序差分法来更新。Here, the action network is trained using the evaluation score of the bidding information to be uploaded, and its network parameters are updated by gradient ascent; the value network is trained using the evaluation score of the bidding information to be uploaded together with the revenue resource actually fed back to the agent, and its network parameters are updated by the temporal-difference method.
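The evaluation score produced by the value network and the main/target bookkeeping implied above could look roughly as follows; value_net stands for a critic of the kind sketched earlier in the description, and tau and the soft-update rule are conventional defaults assumed for this sketch rather than values from the patent.

```python
import torch

@torch.no_grad()
def score_bid(value_net, joint_state, joint_action):
    """Evaluation score of a to-be-uploaded bid: the value network's Q estimate."""
    return value_net(joint_state, joint_action).squeeze(-1)

@torch.no_grad()
def soft_update(target_net, main_net, tau: float = 0.01):
    """Let the target copy track the main network slowly, keeping the TD target stable."""
    for tp, p in zip(target_net.parameters(), main_net.parameters()):
        tp.mul_(1.0 - tau).add_(tau * p)
```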
可选的，所述聚合单元24包括：Optionally, the aggregation unit 24 includes:
计算模块，可以用于分别计算各个样本客户端的数据量与所有样本客户端的数据量的比值，得到每个样本客户端对应的数据量占比；a calculation module, which may be configured to calculate, for each sample client, the ratio of that client's data volume to the total data volume of all sample clients, obtaining the data-volume proportion corresponding to each sample client;
聚合模块，可以用于将每个样本客户端对应的数据量占比乘以相应样本客户端上传的更新模型参数后，聚合所有样本客户端对应的更新模型参数，通过累加聚合后更新模型参数对全局共享模型中的模型参数进行更新。an aggregation module, which may be configured to multiply the data-volume proportion corresponding to each sample client by the updated model parameters uploaded by that sample client, aggregate the weighted updated model parameters of all sample clients, and update the model parameters of the global shared model by accumulating the aggregated updated model parameters.
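The data-volume-weighted aggregation described above follows the familiar FedAvg pattern; the sketch below is a minimal illustration of it, with model parameters represented as plain dictionaries for simplicity.

```python
from typing import Dict, List

def aggregate(updates: List[Dict[str, float]], data_volumes: List[float]) -> Dict[str, float]:
    """Weight each client's uploaded parameters by its share of the total data volume and sum them."""
    total = sum(data_volumes)
    global_params: Dict[str, float] = {name: 0.0 for name in updates[0]}
    for client_update, n_k in zip(updates, data_volumes):
        weight = n_k / total                        # data-volume proportion of this client
        for name, value in client_update.items():
            global_params[name] += weight * value   # accumulate the weighted parameters
    return global_params
```

For example, aggregate([{"w": 1.0}, {"w": 3.0}], [10, 30]) weights the two uploads by 0.25 and 0.75 and returns {"w": 2.5}.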
基于上述方法实施例,本发明的另一实施例提供了一种存储介质,其上存储有可执行指令,该指令被处理器执行时使处理器实现上述方法。Based on the foregoing method embodiments, another embodiment of the present invention provides a storage medium on which executable instructions are stored, and when the instructions are executed by a processor, the processor implements the foregoing method.
基于上述实施例,本发明的另一实施例提供了一种车辆,包括:Based on the above embodiments, another embodiment of the present invention provides a vehicle, including:
一个或多个处理器;one or more processors;
存储装置,用于存储一个或多个程序,storage means for storing one or more programs,
其中，当所述一个或多个程序被所述一个或多个处理器执行时，使得所述一个或多个处理器实现上述的方法。所述车辆可以为非自动驾驶车辆，也可以为自动驾驶车辆。Wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the above method. The vehicle may be a non-autonomous vehicle or an autonomous vehicle.
上述系统、装置实施例与方法实施例相对应，与该方法实施例具有同样的技术效果，具体说明参见方法实施例。装置实施例是基于方法实施例得到的，具体的说明可以参见方法实施例部分，此处不再赘述。本领域普通技术人员可以理解：附图只是一个实施例的示意图，附图中的模块或流程并不一定是实施本发明所必须的。The above system and device embodiments correspond to the method embodiments and have the same technical effect as the method embodiments; for details, refer to the method embodiments. The device embodiments are derived from the method embodiments; for a specific description, refer to the method embodiment section, which is not repeated here. Those of ordinary skill in the art can understand that the accompanying drawing is only a schematic diagram of one embodiment, and the modules or processes in the accompanying drawing are not necessarily required for implementing the present invention.
本领域普通技术人员可以理解:实施例中的装置中的模块可以按照实施例描述分布于实施例的装置中,也可以进行相应变化位于不同于本实施例的一个或多个装置中。上述实施例的模块可以合并为一个模块,也可以进一步拆分成多个子模块。Those of ordinary skill in the art can understand that: the modules in the device in the embodiment may be distributed in the device in the embodiment according to the description in the embodiment, or may be changed and located in one or more devices different from the embodiment. The modules in the above embodiments can be combined into one module, and can also be further split into multiple sub-modules.
最后应说明的是：以上实施例仅用以说明本发明的技术方案，而非对其限制；尽管参照前述实施例对本发明进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述实施例所记载的技术方案进行修改，或者对其中部分技术特征进行等同替换；而这些修改或者替换，并不使相应技术方案的本质脱离本发明实施例技术方案的精神和范围。Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2022103096119 | 2022-03-28 | ||
CN202210309611.9A CN114971819A (en) | 2022-03-28 | 2022-03-28 | User bidding method and device based on multi-agent reinforcement learning algorithm under federal learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115358831A true CN115358831A (en) | 2022-11-18 |
Family
ID=82975873
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210309611.9A Withdrawn CN114971819A (en) | 2022-03-28 | 2022-03-28 | User bidding method and device based on multi-agent reinforcement learning algorithm under federal learning |
CN202211120985.2A Pending CN115358831A (en) | 2022-03-28 | 2022-09-15 | User bidding method and device based on multi-agent reinforcement learning algorithm under federated learning |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210309611.9A Withdrawn CN114971819A (en) | 2022-03-28 | 2022-03-28 | User bidding method and device based on multi-agent reinforcement learning algorithm under federal learning |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN114971819A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115544899A (en) * | 2022-11-23 | 2022-12-30 | 南京邮电大学 | Water plant water intake pump station energy-saving scheduling method based on multi-agent deep reinforcement learning |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115130683B (en) * | 2022-07-18 | 2025-02-14 | 山东大学 | Asynchronous federated learning method and system based on multi-agent model |
CN115086399B (en) * | 2022-07-28 | 2022-12-06 | 深圳前海环融联易信息科技服务有限公司 | Federal learning method and device based on hyper network and computer equipment |
CN117076113B (en) * | 2023-08-17 | 2024-09-06 | 重庆理工大学 | Industrial heterogeneous equipment multi-job scheduling method based on federal learning |
Also Published As
Publication number | Publication date |
---|---|
CN114971819A (en) | 2022-08-30 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| CB03 | Change of inventor or designer information | Inventor after: Zeng Rongfei; An Shuyang; Zeng Chao; Su Mai; Wang Jiaqi. Inventor before: Zeng Rongfei; An Shuyang; Zeng Chao; Han Bo; Su Mai; Wang Jiaqi. |