CN114139680A - Automatic negotiation intelligent agent design method based on deep reinforcement learning - Google Patents
- Publication number
- CN114139680A
- Authority
- CN
- China
- Prior art keywords
- agent
- action
- reinforcement learning
- sarsa
- lambda
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/101—Collaborative creation, e.g. joint development of products or services
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Entrepreneurship & Innovation (AREA)
- Human Resources & Organizations (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses an automatic negotiation multi-agent design method based on deep reinforcement learning, comprising the following steps. First, each agent independently learns a policy using a reinforcement learning algorithm. Second, a Long Short-Term Memory network (LSTM) is used to learn the Q function in SARSA(λ), reducing an intractable state space to a manageable number of features. Finally, agents are constructed by combining the reinforcement learning algorithm SARSA(λ) with the LSTM neural network. Each agent learns independently: it learns a policy that maps its observations of the environment state to actions, updates that policy by interacting with the other agents in the environment, and ultimately learns a suitable action policy for automatic negotiation. Compared with the prior art, agents constructed in this way select better winning coalitions in automatic negotiation, thereby increasing the payoff they obtain from the negotiation.
Description
Technical Field
The invention relates to the field of multi-agent reinforcement learning, in particular to a multi-agent reinforcement learning method.
Background
Multiple artificial intelligence agents in the same environment interact with one another and gain benefits by coordinating their actions. Many tasks, such as search and rescue, multi-robot patrolling and supply chain management, are very difficult for a single agent and require a collaborating team. In many cases, stakeholders can choose the partners with whom they collaborate.
In restricted bargaining environments, there are many connections between the Shapley value, the core, and competitive or Nash equilibria. When a particular class of competitive markets is modeled as a multiplayer game and the traders are evenly distributed, the Shapley value converges to a competitive equilibrium. Although these results apply to various market settings, they do not cover arbitrary negotiation settings. Moreover, success in social tasks such as negotiation requires many aspects of intelligence. Previous studies have mainly examined the emergence of communication in cooperative games, such as referential games, a variant of the Lewis signaling game in which messages are used to disambiguate between different possible referents. Work on negotiation in classical game theory usually relies on simple offer/bargaining games, which do not explicitly address the problem of communication.
Deep multi-agent reinforcement learning (MARL) has enabled agents to acquire complex behaviors, including motor skills and linguistic communication, by learning through repeated interaction with other agents. However, the environments considered in prior MARL work involve only two agents and do not consider team formation, thereby avoiding the problem of coalition selection. How an agent can exchange information during a game to form a coalition that benefits it, and thus obtain a higher payoff than agents constructed by other methods, is a technical problem that urgently needs to be solved.
Disclosure of Invention
The invention aims to provide an automatic negotiation agent design method based on deep reinforcement learning, which uses a deep reinforcement learning algorithm to construct agents for negotiation.
The invention is realized by adopting the following technical scheme:
An automatic negotiation agent design method based on deep reinforcement learning comprises the following steps:
Compared with existing agent algorithms, the invention has the following advantages:
1) unlike conventional automatic negotiation agents, whose rules must be set manually, an agent constructed with the deep reinforcement learning algorithm generalizes well and can handle different negotiation rules without its settings being adjusted by hand;
2) an agent constructed with the deep reinforcement learning algorithm can select better winning coalitions in automatic negotiation, thereby increasing the payoff it obtains from the negotiation.
Drawings
FIG. 1 is an overall flowchart of the automatic negotiation multi-agent design method based on deep reinforcement learning according to the present invention.
FIG. 2 is a schematic diagram of the structure of the deep reinforcement learning algorithm.
Detailed Description
The technical solution of the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
The invention discloses an automatic negotiation multi-agent design method based on deep reinforcement learning. First, each agent independently learns a policy using a reinforcement learning algorithm. Second, a Long Short-Term Memory network (LSTM) is used to learn the Q function in SARSA(λ), reducing an intractable state space to a manageable number of features. Finally, agents are constructed by combining the reinforcement learning algorithm SARSA(λ) with the LSTM neural network. Each agent learns independently, learns a policy that maps its observations of the environment state to actions, updates that policy by interacting with the other agents in the environment, and ultimately learns a suitable action policy for automatic negotiation.
The automatic negotiation agent design method of the invention combines a deep learning algorithm with a reinforcement learning algorithm to form a deep reinforcement learning algorithm for training agents to negotiate automatically. The specific flow is as follows:
step 1-1, in the decision step, SARSA(λ) selects the action with the maximum action value max Q and applies it to the environment to obtain a reward; that is, in state s, SARSA(λ) selects the action a expected to bring the maximum return as the estimated action;
step 1-2, SARSA(λ) takes the estimated action as the action actually executed next, computes the difference between the realized value and the estimate max Q of the selected action, and updates Q(s, a) in the Q-table;
step 1-3, SARSA(λ) updates every step taken on the way to obtaining the reward: steps closer to the reward are treated as more important and steps farther away as less important, with the rate of decay controlled by the parameter λ;
using SARSA(λ), the optimal policy can be learned more quickly and effectively;
step 2-1, the features of the agent are extracted and output as the agent's implicit feature information $x_i$, according to:

$$x_i = \mathrm{embedding}(a_i, o_i) \qquad (1)$$

where $o_i$ denotes the agent's local observation and attribute information, $a_i$ denotes the action selected by the agent based on its local observation and policy, and $\mathrm{embedding}$ denotes a multi-layer perceptron;
step 2-2, the implicit feature information $x_i$ is fed into LSTM networks, with each input sequence encoded by its own LSTM, yielding two fixed-size vectors; the two vectors are concatenated and passed through a feedforward layer followed by a ReLU nonlinearity to obtain the agent's action-state value at time step t, which is used to select the action policy;
the input at each time step t consists of two parts: the agent's revenue allocation and the messages used for communication in the negotiation. First, two embedding tables, one per input type, convert the two inputs into dense vectors; each input sequence is then encoded by its own LSTM, yielding two fixed-size vectors. The two vectors are concatenated and fed into a feedforward layer followed by a ReLU nonlinearity to obtain the agent's action-state value at time step t, which is used to select the action policy;
step 3, each agent obtains its corresponding value function.
An agent for automatic negotiation is built by combining the reinforcement learning algorithm SARSA(λ) with the LSTM neural network; the network weights are trained and optimized with the Adam optimizer using its default parameter settings. Each agent is independent and is trained with the sole goal of obtaining a higher reward, learning a policy that maps its observations of the environment state to the actions to be taken; each agent uses the reinforcement learning algorithm to update its own policy by interacting with the other agents in the environment, and finally learns an appropriate behavior policy.
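Steps 1-1 to 1-3 describe tabular SARSA(λ) with eligibility traces. The patent does not give the update equations, so the sketch below assumes the textbook form: TD error δ = r + γQ(s′, a′) − Q(s, a), then Q ← Q + αδE and E ← γλE for every traced pair. It is a minimal illustration, not the patented implementation. The ε-greedy selection, the accumulating traces, all hyperparameter values, and the five-state chain environment `chain_step` (a stand-in for a negotiation) are assumptions.

```python
import random
from collections import defaultdict


def sarsa_lambda(env_step, n_actions, start_state, episodes=500,
                 alpha=0.1, gamma=0.9, lam=0.8, epsilon=0.1, seed=0):
    """Tabular SARSA(lambda) with accumulating eligibility traces (illustrative)."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return

    def choose(state):
        # epsilon-greedy: usually the action with max Q, occasionally a random one
        if rng.random() < epsilon:
            return rng.randrange(n_actions)
        values = [Q[(state, a)] for a in range(n_actions)]
        best = max(values)
        return rng.choice([a for a, v in enumerate(values) if v == best])

    for _ in range(episodes):
        E = defaultdict(float)  # eligibility trace per (state, action)
        s, a, done = start_state, choose(start_state), False
        while not done:
            s_next, r, done = env_step(s, a)
            # on-policy: the next action actually taken is also the one used in the target
            a_next = None if done else choose(s_next)
            target = r if done else r + gamma * Q[(s_next, a_next)]
            delta = target - Q[(s, a)]  # "reality minus estimate"
            E[(s, a)] += 1.0
            for key in list(E):
                Q[key] += alpha * delta * E[key]
                E[key] *= gamma * lam  # earlier steps receive geometrically less credit
            s, a = s_next, a_next
    return Q


# Hypothetical 5-state chain: action 1 moves right, action 0 moves left;
# reaching state 4 ends the episode with reward 1.
N_STATES = 5


def chain_step(state, action):
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


Q = sarsa_lambda(chain_step, n_actions=2, start_state=0)
greedy = [max(range(2), key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
```

Because each trace decays by γλ per step, the TD error at the rewarding step flows back to earlier state-action pairs with geometrically smaller weight. This is the "closer steps matter more" behaviour that step 1-3 describes.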
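Equation (1) can be sketched as follows. The sketch assumes, since the patent does not say, that the action $a_i$ is one-hot encoded and concatenated with a real-valued observation vector $o_i$ before being passed through the multi-layer perceptron. The layer sizes, the random initialisation, and the helper names `one_hot`, `MLP` and `embedding` are all hypothetical.

```python
import math
import random


def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


class MLP:
    """Small multi-layer perceptron: ReLU on hidden layers, linear output layer."""

    def __init__(self, sizes, seed=0):
        rng = random.Random(seed)
        self.layers = []
        for n_in, n_out in zip(sizes[:-1], sizes[1:]):
            scale = 1.0 / math.sqrt(n_in)
            weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                       for _ in range(n_out)]
            self.layers.append((weights, [0.0] * n_out))

    def __call__(self, x):
        for depth, (weights, bias) in enumerate(self.layers):
            x = [sum(w * xj for w, xj in zip(row, x)) + b
                 for row, b in zip(weights, bias)]
            if depth < len(self.layers) - 1:  # ReLU on hidden layers only
                x = [max(0.0, v) for v in x]
        return x


def embedding(action, observation, n_actions, mlp):
    """x_i = embedding(a_i, o_i): one-hot action concatenated with the observation."""
    return mlp(one_hot(action, n_actions) + list(observation))


# Hypothetical sizes: 3 actions, 4 observation features, 8-dimensional x_i.
N_ACTIONS, OBS_DIM, FEATURE_DIM = 3, 4, 8
mlp = MLP([N_ACTIONS + OBS_DIM, 16, FEATURE_DIM], seed=0)
x_i = embedding(1, [0.2, 0.5, 0.0, 1.0], N_ACTIONS, mlp)
```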
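The two-stream architecture of step 2-2 can be illustrated with a minimal pure-Python forward pass. The sketch makes several assumptions. Both inputs (revenue allocation and negotiation messages) are taken to arrive as sequences of integer token IDs, and each LSTM's final hidden state is used as its fixed-size vector. The vocabulary sizes, dimensions, number of actions, and the names `LSTM` and `NegotiationEncoder` are hypothetical. The weights are random, so the sketch only shows the data flow: embedding tables, one LSTM per stream, concatenation, a feedforward layer, and then ReLU.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, x):
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    s = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-s, s) for _ in range(cols)] for _ in range(rows)]


class LSTM:
    """Single-layer LSTM; returns the final hidden state as a fixed-size vector."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # one weight matrix per gate (input, forget, output, candidate) over [x; h]
        self.W = {g: rand_matrix(n_hidden, n_in + n_hidden, rng) for g in "ifog"}
        self.b = {g: [0.0] * n_hidden for g in "ifog"}

    def encode(self, sequence):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in sequence:
            z = list(x) + h
            pre = {g: [v + b for v, b in zip(matvec(self.W[g], z), self.b[g])]
                   for g in "ifog"}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["g"]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


class NegotiationEncoder:
    def __init__(self, n_alloc_tokens, n_msg_tokens, emb=8, hidden=16,
                 n_actions=5, seed=0):
        rng = random.Random(seed)
        # two embedding tables, one per input stream
        self.alloc_table = rand_matrix(n_alloc_tokens, emb, rng)
        self.msg_table = rand_matrix(n_msg_tokens, emb, rng)
        # one LSTM per input sequence
        self.alloc_lstm = LSTM(emb, hidden, rng)
        self.msg_lstm = LSTM(emb, hidden, rng)
        # feedforward layer over the concatenated encodings
        self.W_ff = rand_matrix(n_actions, 2 * hidden, rng)
        self.b_ff = [0.0] * n_actions

    def action_values(self, alloc_tokens, msg_tokens):
        a_vec = self.alloc_lstm.encode([self.alloc_table[t] for t in alloc_tokens])
        m_vec = self.msg_lstm.encode([self.msg_table[t] for t in msg_tokens])
        joint = a_vec + m_vec  # list concatenation of the two fixed-size vectors
        out = [v + b for v, b in zip(matvec(self.W_ff, joint), self.b_ff)]
        return [max(0.0, v) for v in out]  # ReLU, as described in step 2-2


enc = NegotiationEncoder(n_alloc_tokens=11, n_msg_tokens=20)
q = enc.action_values([3, 7, 0], [5, 5, 12, 1])
best_action = max(range(len(q)), key=q.__getitem__)
```

Applying ReLU to the output, as the text describes, makes every action value non-negative. Q-networks more commonly leave the final layer linear, so treat this as a literal reading of the description rather than a recommendation.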
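The description says only that the network weights are optimised with Adam "with default parameter settings". The sketch below assumes this means the hyperparameters proposed in the original Adam paper (learning rate 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8), which are also the defaults of PyTorch's `torch.optim.Adam`. It applies them to a hypothetical one-parameter objective f(w) = (w − 3)² in place of the network loss.

```python
import math


class Adam:
    """Adam update rule (Kingma & Ba, 2015) with its commonly used default hyperparameters."""

    def __init__(self, n_params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # first-moment (mean) estimate per parameter
        self.v = [0.0] * n_params  # second-moment (uncentred variance) estimate
        self.t = 0                 # step counter for bias correction

    def step(self, params, grads):
        self.t += 1
        new_params = []
        for k, (p, g) in enumerate(zip(params, grads)):
            self.m[k] = self.beta1 * self.m[k] + (1 - self.beta1) * g
            self.v[k] = self.beta2 * self.v[k] + (1 - self.beta2) * g * g
            m_hat = self.m[k] / (1 - self.beta1 ** self.t)  # bias-corrected moments
            v_hat = self.v[k] / (1 - self.beta2 ** self.t)
            new_params.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params


# Hypothetical toy objective f(w) = (w - 3)^2 with gradient 2 (w - 3).
opt = Adam(n_params=1)
w = [0.0]
w = opt.step(w, [2 * (w[0] - 3.0)])
first_step = w[0]  # the first Adam step moves by roughly lr in the descent direction
for _ in range(9999):
    w = opt.step(w, [2 * (w[0] - 3.0)])
```

In the method described above, each agent would own its own network and optimiser state and update only from its own reward, which matches the statement that agents are trained independently.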
Fig. 2 is a schematic structural diagram of a deep reinforcement learning algorithm.
The invention combines neural networks from deep learning with a reinforcement learning algorithm and applies them to multi-agent systems. This enables agents in automatic negotiation to understand the messages exchanged between agents and to communicate with other agents, so that they select better winning coalitions and thereby obtain higher negotiation payoffs. Given the automatic negotiation environment and the information from each negotiation round, an agent uses the trained model of the invention: it obtains the negotiation rules when the negotiation starts and receives the information of each round as the negotiation proceeds. This allows the agent to exchange messages and make proposals during the negotiation, and thus to obtain better negotiation payoffs.
Claims (3)
1. An automatic negotiation agent design method based on deep reinforcement learning, characterized by comprising the following steps:
step 1, each agent independently learns a policy using the deep reinforcement learning algorithm SARSA(λ);
step 2, the action-state value of each agent at time step t is obtained using an LSTM neural network;
step 3, agents for automatic negotiation are constructed by combining the reinforcement learning algorithm SARSA(λ) with the LSTM neural network; each agent is trained with the sole goal of obtaining a higher reward and separately learns a policy that maps its observations of the environment state to the actions to be taken; each agent uses the reinforcement learning algorithm to update its own policy by interacting with the other agents in the environment, and finally learns an appropriate behavior policy.
2. The method according to claim 1, wherein step 1 specifically comprises the following steps:
step 1-1, in the decision step, SARSA(λ) selects the action with the maximum action value max Q and applies it to the environment to obtain a reward; that is, in state s, SARSA(λ) selects the action a expected to bring the maximum return as the estimated action;
step 1-2, SARSA(λ) takes the estimated action as the action actually executed next, computes the difference between the realized value and the estimate max Q of the selected action, and updates Q(s, a) in the Q-table;
step 1-3, SARSA(λ) updates the steps taken to obtain the reward.
3. The method according to claim 1, wherein step 2 specifically comprises the following steps:
step 2-1, the features of the agent are extracted and output as the agent's implicit feature information $x_i$, according to:

$$x_i = \mathrm{embedding}(a_i, o_i)$$

where $o_i$ denotes the agent's local observation and attribute information, $a_i$ denotes the action selected by the agent based on its local observation and policy, and $\mathrm{embedding}$ denotes a multi-layer perceptron;
step 2-2, the implicit feature information $x_i$ is fed into LSTM networks, with each input sequence encoded by its own LSTM, yielding two fixed-size vectors; the two vectors are concatenated and passed through a feedforward layer followed by a ReLU nonlinearity to obtain the agent's action-state value at time step t, which is used to select the action policy.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111318748.2A CN114139680A (en) | 2021-11-09 | 2021-11-09 | Automatic negotiation intelligent agent design method based on deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111318748.2A CN114139680A (en) | 2021-11-09 | 2021-11-09 | Automatic negotiation intelligent agent design method based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114139680A true CN114139680A (en) | 2022-03-04 |
Family
ID=80393366
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111318748.2A Pending CN114139680A (en) | 2021-11-09 | 2021-11-09 | Automatic negotiation intelligent agent design method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114139680A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116882461A (en) * | 2023-09-01 | 2023-10-13 | 北京航空航天大学 | Neural network evaluation optimization method and system based on neuron plasticity |
CN116882461B (en) * | 2023-09-01 | 2023-11-21 | 北京航空航天大学 | Neural network evaluation optimization method and system based on neuron plasticity |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | CB03 | Change of inventor or designer information | Inventor after: Hao Jianye; Zheng Yan; Lin Jie; Chen Siqi; Ma Yi. Inventor before: Lin Jie; Chen Siqi; Hao Jianye; Zheng Yan; Ma Yi |