CN114139680A - Automatic negotiation intelligent agent design method based on deep reinforcement learning

Info

Publication number: CN114139680A
Application number: CN202111318748.2A
Authority: CN
Inventors: 林杰; 陈锶奇; 郝建业; 郑岩; 马亿
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2021-11-09
Filing date: 2021-11-09
Publication date: 2022-03-04

Abstract

The invention discloses a method for designing automatic negotiation agents based on deep reinforcement learning. First, each agent independently learns a policy using a reinforcement learning algorithm. Second, a long short-term memory (LSTM) network is used to learn the Q function in SARSA(λ), reducing an otherwise intractable state space to a manageable number of features. Finally, the agents are constructed by combining the reinforcement learning algorithm SARSA(λ) with the LSTM neural network: each agent independently learns a policy that maps its observations of the environment state to the action to be taken, updates that policy by interacting with the other agents in the environment, and eventually learns a suitable behavior policy for automatic negotiation. Compared with the prior art, an agent built in this way can select better winning coalitions in automatic negotiation, thereby increasing the payoff it obtains from the negotiation.

Description

Automatic negotiation intelligent agent design method based on deep reinforcement learning

Technical Field

The invention relates to the field of multi-agent reinforcement learning, in particular to a multi-agent reinforcement learning method.

Background

Multiple artificial intelligence entities in the same environment interact and gain benefits by coordinating their actions. Many tasks, such as search and rescue, multi-robot patrol, and supply chain management, are too difficult for a single agent and require a collaborating team. In many cases, a stakeholder can choose the partners with whom to collaborate.

In restricted bargaining environments, there are many links between the Shapley value, the core, and competitive or Nash equilibria. When a particular class of competitive market is modeled as a multiplayer game and the set of traders is evenly spread, the Shapley value converges to a competitive equilibrium. Although these results apply to a range of market domains, they do not cover arbitrary negotiation settings. Moreover, success in social tasks such as negotiation requires many aspects of intelligence. Previous studies have mainly examined the emergence of communication in cooperative games such as the referential game, a variant of the Lewis signaling game in which messages are used to disambiguate between possible referents. Work on negotiation in classical game theory usually relies on simple forms of offer/counter-offer bargaining games, which do not explicitly address the problem of communication.

Deep multi-agent reinforcement learning (MARL) has gradually succeeded in producing complex behaviors, including motor skills and language communication, by having an agent learn through repeated interaction with other agents. However, the environments considered in prior MARL work involve only two agents and do not consider team formation, and thereby avoid the problem of coalition selection. How an agent can form a coalition that benefits itself by exchanging information during a game, and so obtain a higher payoff than agents constructed by other methods, is a technical problem to be solved urgently.

Disclosure of Invention

The invention aims to provide an automatic negotiation agent design method based on deep reinforcement learning, which realizes the construction of an agent for negotiation by using a deep reinforcement learning algorithm.

The invention is realized by adopting the following technical scheme:

an automatic negotiation intelligent agent design method based on deep reinforcement learning comprises the following steps:

step 1, each agent independently learns a policy using the deep reinforcement learning algorithm SARSA(λ);

step 2, obtaining the action-state value of the agent at time step t by using an LSTM neural network;

step 3, combining the reinforcement learning algorithm SARSA(λ) with the LSTM neural network to construct an agent for automatic negotiation; each agent is trained with the sole goal of obtaining a higher reward, and independently learns a policy that maps observations of the environment state to the actions to be taken; each agent uses the reinforcement learning algorithm to update its own policy by interacting with the other agents in the environment, and eventually learns a suitable behavior policy.

Compared with prior agent algorithms, the invention has the following advantages:

1) compared with traditional automatic negotiation agents, whose rules must be set manually, an agent constructed with the deep reinforcement learning algorithm generalizes well and can handle different negotiation rules without manual adjustment of its settings;

2) an agent constructed with the deep reinforcement learning algorithm can select better winning coalitions in automatic negotiation, thereby increasing the payoff it obtains from the negotiation.

Drawings

FIG. 1 is an overall flowchart of the automatic negotiation agent design method based on deep reinforcement learning according to the present invention.

FIG. 2 is a schematic diagram of the structure of the deep reinforcement learning algorithm.

Detailed Description

The technical solution of the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.

The invention discloses a method for designing automatic negotiation agents based on deep reinforcement learning. First, each agent independently learns a policy using a reinforcement learning algorithm. Second, a long short-term memory (LSTM) network is used to learn the Q function in SARSA(λ), reducing an otherwise intractable state space to a manageable number of features. Finally, the agents are constructed by combining the reinforcement learning algorithm SARSA(λ) with the LSTM neural network: each agent independently learns a policy that maps its observations of the environment state to the action to be taken, updates that policy by interacting with the other agents in the environment, and eventually learns a suitable behavior policy for automatic negotiation.

The invention relates to an automatic negotiation intelligent agent design method based on deep reinforcement learning, which combines a deep learning algorithm and a reinforcement learning algorithm to form a deep reinforcement learning algorithm for training an intelligent agent to carry out automatic negotiation, and the specific flow is as follows:

step 1, each agent independently learns a policy using the deep reinforcement learning algorithm SARSA(λ); this step specifically comprises the following steps:

step 1-1, in the decision stage, SARSA(λ) selects the action with the maximum action value max Q and applies it to the environment to obtain a return; that is, when in state s, SARSA(λ) selects the action a expected to bring the maximum return as the estimated action;

step 1-2, SARSA(λ) takes the estimated action as the action to be executed next, computes the difference between the actual and the estimated value of the selected maximum action value max Q, and updates Q(s, a) in the Q table accordingly;

step 1-3, SARSA(λ) updates all the steps taken to obtain the reward: steps closer to the reward are weighted more heavily and steps farther away less heavily, with the rate of decay controlled by the parameter λ;

by using SARSA(λ), the optimal policy can be learned more quickly and effectively;
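The trace-based update of steps 1-1 to 1-3 can be illustrated with a minimal tabular sketch. It follows the textbook SARSA(λ) form with accumulating eligibility traces and an ε-greedy policy, which is an assumption, since the text does not give the update equations. The toy chain environment, the function names, and the values of `alpha`, `gamma`, `lam` and `epsilon` are hypothetical.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick a greedy action (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(Q[(state, a)] for a in actions)
    return rng.choice([a for a in actions if Q[(state, a)] == best])

def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next, done,
                      alpha=0.1, gamma=0.9, lam=0.8):
    """Apply one SARSA(lambda) update in place and return the TD error."""
    target = r if done else r + gamma * Q[(s_next, a_next)]
    delta = target - Q[(s, a)]            # gap between the target and the current estimate
    E[(s, a)] += 1.0                      # accumulating trace for the visited pair
    for key in list(E):
        Q[key] += alpha * delta * E[key]  # credit every recently visited pair
        E[key] *= gamma * lam             # older steps receive less credit
    return delta

def run_episode(Q, rng, n_states=5, actions=(0, 1), epsilon=0.1, max_steps=200):
    """Toy chain (hypothetical): action 1 moves right, action 0 moves left, reward 1 at the right end."""
    E = defaultdict(float)                # traces are reset at the start of each episode
    s = 0
    a = epsilon_greedy(Q, s, actions, epsilon, rng)
    for _ in range(max_steps):
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        done = s_next == n_states - 1
        r = 1.0 if done else 0.0
        a_next = epsilon_greedy(Q, s_next, actions, epsilon, rng)
        sarsa_lambda_step(Q, E, s, a, r, s_next, a_next, done)
        if done:
            break
        s, a = s_next, a_next

Q = defaultdict(float)
rng = random.Random(0)
for _ in range(200):
    run_episode(Q, rng)
```

The trace decay factor `gamma * lam` is what makes steps nearer the reward count more than distant ones; setting `lam` to 0 would reduce the update to one-step SARSA.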

step 2, using an LSTM neural network to handle the excessively large state-action space of negotiation:

step 2-1, extracting the features of the agent and outputting them as the agent's implicit feature information x_i, according to the following formula:

x_i = embedding(a_i, o_i) (1)

where o_i denotes the agent's local observation together with its attribute information, a_i denotes the action selected by the agent based on its local observation and policy, and embedding denotes a multi-layer perceptron;
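As a rough illustration of formula (1), the sketch below implements the embedding as a small multi-layer perceptron applied to the concatenation of a one-hot action a_i and an observation vector o_i. The layer sizes, the one-hot encoding, the random initialization and the helper name `init_mlp` are assumptions made for the example; the text only states that embedding is a multi-layer perceptron.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Create (weight, bias) pairs for consecutive layer sizes."""
    return [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def embedding(a_i, o_i, params):
    """x_i = embedding(a_i, o_i): concatenate action and observation, then apply the MLP."""
    h = np.concatenate([a_i, o_i])
    for k, (W, b) in enumerate(params):
        h = h @ W + b
        if k < len(params) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
    return h

rng = np.random.default_rng(0)
n_actions, obs_dim, hidden_dim, feat_dim = 4, 6, 16, 8  # hypothetical sizes
params = init_mlp([n_actions + obs_dim, hidden_dim, feat_dim], rng)

a_i = np.eye(n_actions)[2]       # one-hot encoding of the chosen action (assumed encoding)
o_i = rng.normal(size=obs_dim)   # stand-in for a local observation
x_i = embedding(a_i, o_i, params)
```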

step 2-2, passing the implicit feature information x_i into the LSTM networks as input sequences and encoding each input sequence, each input sequence having its own LSTM network, to obtain two fixed-size vectors; concatenating the two fixed-size vectors, feeding the result into a feedforward layer and then through a ReLU nonlinearity to obtain the action-state value of the agent at time step t, which is used to select the action policy to be taken;

the input at each time step t consists of two parts: the allocation of payoff to the agent, and the information communicated during the negotiation. First, two embedding tables, one for each kind of input, convert the inputs into dense vectors; each input sequence is then encoded by an LSTM, again with one LSTM per input sequence, yielding two fixed-size vectors. The two vectors are concatenated and fed into a feedforward layer followed by a ReLU nonlinearity to obtain the action-state value of the agent at time step t, which is used to select the action policy to be taken;
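The data flow described in step 2 (two embedding tables, one LSTM per input sequence, concatenation, feedforward layer, ReLU) might look like the following forward-pass sketch. It is written in plain NumPy with a standard LSTM cell. The vocabulary sizes, the dimensions, the representation of payoff allocations as discrete token ids, and all names in `net` are assumptions for illustration, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_lstm(input_dim, hidden_dim, rng):
    """One weight matrix covering the four gates, applied to [x_t, h_prev]."""
    return {"W": rng.normal(0.0, 0.1, size=(input_dim + hidden_dim, 4 * hidden_dim)),
            "b": np.zeros(4 * hidden_dim)}

def lstm_encode(xs, p):
    """Run an LSTM over a sequence of vectors and return the final hidden state."""
    H = p["b"].size // 4
    h, c = np.zeros(H), np.zeros(H)
    for x in xs:
        z = np.concatenate([x, h]) @ p["W"] + p["b"]
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = f * c + i * g          # update the cell state
        h = o * np.tanh(c)         # expose the gated hidden state
    return h

def action_values(proposal_ids, message_ids, net):
    """Action-state values at one time step from the two input sequences."""
    prop_vecs = net["prop_table"][proposal_ids]   # embedding lookup gives dense vectors
    msg_vecs = net["msg_table"][message_ids]
    h_prop = lstm_encode(prop_vecs, net["prop_lstm"])   # one LSTM per input sequence
    h_msg = lstm_encode(msg_vecs, net["msg_lstm"])
    joint = np.concatenate([h_prop, h_msg])             # two fixed-size vectors, joined
    return np.maximum(joint @ net["W_out"] + net["b_out"], 0.0)  # feedforward layer + ReLU

rng = np.random.default_rng(0)
emb_dim, hidden_dim, n_actions = 8, 16, 5   # hypothetical sizes
net = {
    "prop_table": rng.normal(0.0, 0.1, size=(10, emb_dim)),  # 10 discretized payoff shares (assumed)
    "msg_table": rng.normal(0.0, 0.1, size=(20, emb_dim)),   # 20 message symbols (assumed)
    "prop_lstm": init_lstm(emb_dim, hidden_dim, rng),
    "msg_lstm": init_lstm(emb_dim, hidden_dim, rng),
    "W_out": rng.normal(0.0, 0.1, size=(2 * hidden_dim, n_actions)),
    "b_out": np.zeros(n_actions),
}
q_t = action_values(np.array([3, 3, 4]), np.array([7, 1, 12, 0]), net)
```

Because each LSTM returns only its final hidden state, the two sequences may have different lengths while the vector passed to the feedforward layer keeps a fixed size.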

step 3, each agent obtains its corresponding value function.

An agent for automatic negotiation is built by combining the reinforcement learning algorithm SARSA(λ) with the LSTM neural network; the network weights are trained and optimized using the Adam optimizer with default parameter settings; each agent is independent, is trained with the sole goal of obtaining a higher reward, and learns a policy that maps observations of the environment state to the actions to be taken; each agent uses the reinforcement learning algorithm to update its own policy by interacting with the other agents in the environment, and eventually learns a suitable behavior policy.
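For reference, a minimal sketch of the Adam update rule is given below. The text does not say which library's defaults are meant, so the values used here (learning rate 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8) are the commonly cited ones and should be read as an assumption. The quadratic loss stands in for the squared temporal-difference error that a value network would minimize.

```python
import numpy as np

class Adam:
    """Adam optimizer for a single parameter array, with commonly cited default values."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None   # running mean of gradients
        self.v = None   # running mean of squared gradients
        self.t = 0      # step counter used for bias correction

    def step(self, w, grad):
        """Return the updated parameters for one gradient step."""
        if self.m is None:
            self.m = np.zeros_like(w)
            self.v = np.zeros_like(w)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)   # bias-corrected first moment
        v_hat = self.v / (1 - self.beta2 ** self.t)   # bias-corrected second moment
        return w - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# Stand-in loss: sum(w**2), whose gradient is 2 * w.
w = np.array([1.0, -2.0])
opt = Adam()
w_after_one = opt.step(w, 2.0 * w)
```

In the first step the bias correction makes the update magnitude approximately equal to the learning rate for every coordinate, regardless of the gradient scale.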

Fig. 2 is a schematic structural diagram of a deep reinforcement learning algorithm.

The invention combines a deep learning neural network with a reinforcement learning algorithm and applies the combination to multi-agent systems, so that an agent can understand the information communicated between agents, communicate with the other agents during automatic negotiation, and select a better winning coalition, thereby obtaining a higher negotiation payoff. To use the invention, the agent loads the trained model, obtains the negotiation rules when the automatic negotiation starts, and obtains the information of each round of negotiation as the negotiation proceeds, so that it can communicate and make proposals during the negotiation and thereby obtain a better negotiation payoff.

Claims

1. An automatic negotiation intelligent agent design method based on deep reinforcement learning is characterized by comprising the following steps:

step 2, obtaining the action-state value of the agent at time step t by using an LSTM neural network;

2. The method according to claim 1, wherein the step 1 specifically includes the following steps:

step 1-3, SARSA (lambda) updates the steps taken to acquire the reward.

3. The method according to claim 1, wherein the step 2 specifically includes the following steps:

x_i = embedding(a_i, o_i)

step 2-2, passing the implicit feature information x_i into the LSTM networks as input sequences and encoding each input sequence, each input sequence having its own LSTM network, to obtain two fixed-size vectors; and concatenating the two fixed-size vectors, feeding the result into a feedforward layer and then through a ReLU nonlinearity to obtain the action-state value of the agent at time step t, which is used to select the action policy to be taken.