CN114139680A - Automatic negotiation intelligent agent design method based on deep reinforcement learning - Google Patents

Automatic negotiation intelligent agent design method based on deep reinforcement learning

Info

Publication number
CN114139680A
Authority
CN
China
Prior art keywords
agent
action
reinforcement learning
sarsa
lambda
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111318748.2A
Other languages
Chinese (zh)
Inventor
Lin Jie
Chen Siqi
Hao Jianye
Zheng Yan
Ma Yi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202111318748.2A priority Critical patent/CN114139680A/en
Publication of CN114139680A publication Critical patent/CN114139680A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/101 Collaborative creation, e.g. joint development of products or services

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an automatic negotiation multi-agent design method based on deep reinforcement learning, comprising the following steps: first, each agent independently learns a policy using a reinforcement learning algorithm; second, a Long Short-Term Memory network (LSTM) is used to learn the Q function in SARSA(λ), reducing an intractable state space to a manageable number of features; finally, agents are constructed by combining the reinforcement learning algorithm SARSA(λ) with the LSTM network, where each agent learns independently, maps its observations of the environment state to a policy over actions to be taken, updates its policy by interacting with other agents in the environment, and finally learns a suitable action policy for automatic negotiation. Compared with the prior art, the constructed agents can make better selections of winning coalitions in automatic negotiation, thereby improving the payoff an agent obtains in the negotiation.

Description

Automatic negotiation intelligent agent design method based on deep reinforcement learning
Technical Field
The invention relates to the field of multi-agent reinforcement learning, and in particular to an automatic negotiation agent design method based on multi-agent reinforcement learning.
Background
Multiple artificial intelligence entities in the same environment interact and gain benefits by coordinating their actions. Many tasks are intractable for a single agent and require a team of collaborators, for example search and rescue, multi-robot patrol, and supply chain management. In many cases, a stakeholder may select the partners with whom to collaborate.
In a finite bargaining environment, there are many links between the Shapley value and the core and competitive or Nash equilibria. When a particular category of competitive market is modeled as a multiplayer game and the set of traders is evenly dispersed, the Shapley value converges to a competitive equilibrium. While these results apply to various market settings, they do not cover arbitrary negotiation settings. At the same time, success in social tasks such as negotiation requires many aspects of intelligence. Previous studies have primarily examined interactions arising in cooperative games, such as the reference game, a variant of the Lewis signaling game, in which information is used to disambiguate between different possible referents. Work on negotiation in classical game theory usually uses simple forms of offer/bargaining games, which do not explicitly address the problem of communication.
Deep multi-agent reinforcement learning (MARL) has gradually achieved complex behaviors, including motor skills and language communication, by having agents learn through repeated interaction with other agents. However, the environments considered in the prior MARL art involve only two agents and do not consider the problem of team formation, thereby avoiding the problem of coalition selection. How to form a beneficial coalition by exchanging information in a game, so as to obtain a higher payoff than agents constructed by other methods, is a technical problem to be solved urgently.
Disclosure of Invention
The invention aims to provide an automatic negotiation agent design method based on deep reinforcement learning, which constructs an agent for negotiation using a deep reinforcement learning algorithm.
The invention is realized by adopting the following technical scheme:
An automatic negotiation agent design method based on deep reinforcement learning comprises the following steps:
step 1, each agent independently learns a policy using the deep reinforcement learning algorithm SARSA(λ);
step 2, obtaining the action-state value of the agent at time step t using an LSTM neural network;
step 3, combining the reinforcement learning algorithm SARSA(λ) with the LSTM neural network to construct agents for automatic negotiation; each agent trains with obtaining a higher reward value as its only goal, and separately learns a policy that maps observations of the environment state to actions to be taken; each agent uses the reinforcement learning algorithm to update its own policy by interacting with other agents in the environment, and finally learns an appropriate behavior policy.
Compared with existing agent algorithms, the invention has the following advantages:
1) compared with traditional automatic negotiation agents that require manually set rules, an agent constructed with the deep reinforcement learning algorithm generalizes well and can handle different negotiation rules without manually adjusting the settings of an agent built with the method;
2) an agent constructed with the deep reinforcement learning algorithm can select a better winning coalition in automatic negotiation, thereby improving the payoff the agent obtains in the negotiation.
Drawings
FIG. 1 is an overall flowchart of the auto-negotiation multi-agent design method based on deep reinforcement learning according to the present invention.
FIG. 2 is a schematic structural diagram of the deep reinforcement learning algorithm.
Detailed Description
The technical solution of the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
The invention discloses an auto-negotiation multi-agent design method based on deep reinforcement learning. First, each agent independently learns a policy using a reinforcement learning algorithm; second, a Long Short-Term Memory network (LSTM) is used to learn the Q function in SARSA(λ), reducing the intractable state space to a manageable number of features; finally, agents are constructed by combining the reinforcement learning algorithm SARSA(λ) with the LSTM network, where each agent learns independently, maps its observations of the environment state to a policy over actions to be taken, updates its policy by interacting with other agents in the environment, and finally learns a suitable action policy for automatic negotiation.
The invention relates to an automatic negotiation agent design method based on deep reinforcement learning, which combines a deep learning algorithm and a reinforcement learning algorithm into a deep reinforcement learning algorithm for training an agent to carry out automatic negotiation. The specific flow is as follows:
Step 1, each agent independently learns a policy using the reinforcement learning algorithm, namely the deep reinforcement learning algorithm SARSA(λ); this specifically comprises the following steps:
step 1-1, in the decision step, SARSA(λ) selects the maximum action value maxQ to apply to the environment and obtain a return; that is, in state s, SARSA(λ) selects the action a that can bring the maximum return as the estimated action;
step 1-2, SARSA(λ) takes the estimated action as the action to be executed next, computes the difference between the observed return and the estimate of the selected maximum action value maxQ (the temporal-difference error), and updates Q(s, a) in the Q table;
step 1-3, SARSA(λ) updates all steps taken on the way to obtaining the reward, where steps closer to the reward-obtaining step are more important and steps farther away are less important (the decay is controlled by the parameter λ);
with SARSA(λ), the optimal policy can be learned more quickly and effectively, as the sketch below illustrates.
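For illustration only, the following is a minimal sketch of tabular SARSA(λ) with accumulating eligibility traces. The environment interface (env.reset()/env.step()), the epsilon-greedy exploration, and all hyperparameter values are assumptions for the example; the invention itself replaces the Q table with an LSTM, as described in step 2.

```python
import numpy as np
from collections import defaultdict

def sarsa_lambda(env, n_actions, episodes=500, alpha=0.1,
                 gamma=0.99, lam=0.9, epsilon=0.1):
    """Tabular SARSA(lambda) with accumulating eligibility traces."""
    Q = defaultdict(lambda: np.zeros(n_actions))  # Q(s, a) estimates

    def choose(s):
        # epsilon-greedy: usually the action with maximum Q (step 1-1)
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        E = defaultdict(lambda: np.zeros(n_actions))  # eligibility traces
        s = env.reset()
        a = choose(s)
        done = False
        while not done:
            s_next, r, done = env.step(a)  # assumed interface
            a_next = choose(s_next)
            # TD error: difference between reality and estimate (step 1-2)
            delta = r + gamma * Q[s_next][a_next] * (not done) - Q[s][a]
            E[s][a] += 1.0
            # credit every visited (s, a) pair, decayed by gamma * lambda,
            # so steps closer to the reward matter more (step 1-3)
            for st in list(E):
                Q[st] += alpha * delta * E[st]
                E[st] *= gamma * lam
            s, a = s_next, a_next
    return Q
```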
Step 2, an LSTM neural network is adopted to address the excessively large space of negotiation states and actions:
step 2-1, extracting the features of the agent, with the output being the agent's implicit feature information x_i, given by the formula:
x_i = embedding(a_i, o_i)    (1)
where o_i denotes the local observation of the agent and its attribute information, a_i denotes the action selected by the agent based on its local observation and policy, and embedding denotes a multi-layer perceptron;
step 2-2, passing the implicit feature information x_i into the LSTM network and encoding each input sequence, one LSTM per input sequence, to obtain two fixed-size vectors; the two fixed-size vectors are concatenated, input into a feedforward layer, and passed through a ReLU nonlinearity to obtain the agent's action-state value at time step t, which is used to select the action policy to be taken.
Concretely, the input at each time step t consists of two parts: the proposed allocation of revenue to the agent, and the information used for communication in the negotiation. First, two embedding tables, one per input type, convert the two inputs into dense vectors; each input sequence is then encoded with its own LSTM, yielding the two fixed-size vectors. These are concatenated, input into the feedforward layer, and passed through the ReLU nonlinearity to produce the agent's action-state value at time step t for selecting the action policy to be taken; a sketch of this encoder follows.
Step 3, each agent obtains its corresponding value function.
An agent for automatic negotiation is built by combining the reinforcement learning algorithm SARSA(λ) with the LSTM neural network; the network weights are trained and optimized using an Adam optimizer with default parameter settings; each agent is independent, trains with obtaining a higher reward value as its only goal, and learns a policy that maps observations of the environment state to actions to be taken; each agent uses the reinforcement learning algorithm to update its own policy by interacting with other agents in the environment, and finally learns an appropriate behavior policy. A training sketch follows.
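A sketch of how these pieces might be combined in training, assuming the hypothetical NegotiationQNet from the previous sketch and a one-step SARSA-style temporal-difference update (the λ eligibility traces over network gradients are omitted for brevity); Adam is used with its default parameter settings, as in the description:

```python
import torch

net = NegotiationQNet()                   # from the sketch above
opt = torch.optim.Adam(net.parameters())  # default Adam settings
gamma = 0.99

def td_update(alloc, msg, a, r, alloc_next, msg_next, a_next, done):
    """One SARSA-style TD update of the network weights."""
    q = net(alloc, msg)[0, a]  # Q(s, a) for the action actually taken
    with torch.no_grad():
        q_next = torch.tensor(0.0) if done else net(alloc_next, msg_next)[0, a_next]
        target = r + gamma * q_next       # bootstrapped SARSA target
    loss = (target - q) ** 2              # squared TD error
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```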
Fig. 2 is a schematic structural diagram of a deep reinforcement learning algorithm.
The invention combines neural networks from deep learning with a reinforcement learning algorithm and applies them to the field of multi-agent systems, so that an agent can understand the communication information exchanged between agents, communicate with other agents during automatic negotiation, and select a better winning coalition, thereby obtaining a higher negotiation payoff. Based on the automatic negotiation environment and the information from each round of negotiation, the agent uses the trained model of the invention: it obtains the automatic negotiation rules when the negotiation starts and the information of each round during the negotiation, so that the agent can exchange information and make offers in the automatic negotiation, and thereby obtain a better negotiation payoff.

Claims (3)

1. An automatic negotiation agent design method based on deep reinforcement learning, characterized by comprising the following steps:
step 1, each agent independently learns a policy using the deep reinforcement learning algorithm SARSA(λ);
step 2, obtaining the action-state value of the agent at time step t using an LSTM neural network;
step 3, combining the reinforcement learning algorithm SARSA(λ) with the LSTM neural network to construct agents for automatic negotiation; each agent trains with obtaining a higher reward value as its only goal, and separately learns a policy that maps observations of the environment state to actions to be taken; each agent uses the reinforcement learning algorithm to update its own policy by interacting with other agents in the environment, and finally learns an appropriate behavior policy.
2. The method according to claim 1, wherein step 1 specifically comprises the following steps:
step 1-1, in the decision step, SARSA(λ) selects the maximum action value maxQ to apply to the environment and obtain a return; that is, in state s, SARSA(λ) selects the action a that can bring the maximum return as the estimated action;
step 1-2, SARSA(λ) takes the estimated action as the action to be executed next, computes the difference between the observed return and the estimate of the selected maximum action value maxQ, and updates Q(s, a) in the Q table;
step 1-3, SARSA(λ) updates the steps taken to obtain the reward.
3. The method according to claim 1, wherein step 2 specifically comprises the following steps:
step 2-1, extracting the features of the agent, with the output being the agent's implicit feature information x_i, given by the formula:
x_i = embedding(a_i, o_i)
where o_i denotes the local observation of the agent and its attribute information, a_i denotes the action selected by the agent based on its local observation and policy, and embedding denotes a multi-layer perceptron;
step 2-2, passing the implicit feature information x_i into the LSTM network and encoding each input sequence, one LSTM per input sequence, to obtain two fixed-size vectors; the two fixed-size vectors are concatenated, input into a feedforward layer, and passed through a ReLU nonlinearity to obtain the agent's action-state value at time step t, which is used to select the action policy to be taken.
CN202111318748.2A 2021-11-09 2021-11-09 Automatic negotiation intelligent agent design method based on deep reinforcement learning Pending CN114139680A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111318748.2A CN114139680A (en) 2021-11-09 2021-11-09 Automatic negotiation intelligent agent design method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111318748.2A CN114139680A (en) 2021-11-09 2021-11-09 Automatic negotiation intelligent agent design method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN114139680A 2022-03-04

Family

ID=80393366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111318748.2A Pending CN114139680A (en) 2021-11-09 2021-11-09 Automatic negotiation intelligent agent design method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114139680A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116882461A (en) * 2023-09-01 2023-10-13 北京航空航天大学 Neural network evaluation optimization method and system based on neuron plasticity
CN116882461B (en) * 2023-09-01 2023-11-21 北京航空航天大学 Neural network evaluation optimization method and system based on neuron plasticity

Similar Documents

Publication Publication Date Title
CN110404264B (en) Multi-person non-complete information game strategy solving method, device and system based on virtual self-game and storage medium
Yu et al. Emotional multiagent reinforcement learning in spatial social dilemmas
Sandholm et al. On multiagent Q-learning in a semi-competitive domain
Yang et al. Multiagent reinforcement learning for multi-robot systems: A survey
Fudenberg et al. Noncooperative game theory for industrial organization: an introduction and overview
Kirman Complex economics: individual and collective rationality
Dosi et al. Norms as emergent properties of adaptive learning: The case of economic routines
CN108921298B (en) Multi-agent communication and decision-making method for reinforcement learning
Oliphant The learning barrier: Moving from innate to learned systems of communication
Busemeyer et al. Theoretical tools for understanding and aiding dynamic decision making
CN116187787B (en) Intelligent planning method for cross-domain allocation problem of combat resources
CN114139680A (en) Automatic negotiation intelligent agent design method based on deep reinforcement learning
Wang et al. Application of deep reinforcement learning in werewolf game agents
Savarimuthu Mechanisms for norm emergence and norm identification in multi-agent societies
Critch Toward negotiable reinforcement learning: shifting priorities in Pareto optimal sequential decision-making
CN116167415A (en) Policy decision method in multi-agent cooperation and antagonism
Jin et al. The Convergence Analysis of Evolutionary Dynamics for Continuous Action Iterated Dilemma in Information Loss Networks
Łatek et al. Bounded rationality via recursion
CN116128028A (en) Efficient deep reinforcement learning algorithm for continuous decision space combination optimization
Madeira et al. Designing a reinforcement learning-based adaptive AI for large-scale strategy games
Azaria Irrational, but Adaptive and Goal Oriented: Humans Interacting with Autonomous Agents.
Petrosian et al. Cooperative differential games with dynamic updating
Sun Meta-learning processes in multi-agent systems
Dhami The political economy of redistribution under asymmetric information
Verhagen et al. Adjustable autonomy, delegation and distribution of decision making

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Hao Jianye

Inventor after: Zheng Yan

Inventor after: Lin Jie

Inventor after: Chen Siqi

Inventor after: Ma Yi

Inventor before: Lin Jie

Inventor before: Chen Siqi

Inventor before: Hao Jianye

Inventor before: Zheng Yan

Inventor before: Ma Yi

CB03 Change of inventor or designer information