CN109934753B - Multi-Agent emergency action decision method based on JADE platform and reinforcement learning - Google Patents

Multi-Agent emergency action decision method based on JADE platform and reinforcement learning

Info

Publication number
CN109934753B
CN109934753B
Authority
CN
China
Prior art keywords
agent
emergency
reinforcement learning
emergency resource
resource warehouse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910182048.1A
Other languages
Chinese (zh)
Other versions
CN109934753A (en)
Inventor
赵佳宝
潘东旭
潘昱宸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN201910182048.1A
Publication of CN109934753A
Application granted
Publication of CN109934753B
Status: Active

Links

Images

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A10/00: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
    • Y02A10/40: Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a multi-Agent emergency action decision method based on a JADE platform and reinforcement learning, which comprises the following steps: starting the JADE platform and establishing a monitoring Agent, and using the monitoring Agent to judge in real time whether an emergency public event has occurred; registering the emergency resource guarantee service behaviors of each emergency resource warehouse Agent with the monitoring Agent, executing reinforcement learning for each emergency resource warehouse Agent, and obtaining from the monitoring Agent the reinforcement learning feedback value corresponding to each emergency resource warehouse Agent; and selecting, according to the reinforcement learning feedback values, one or more emergency resource warehouse Agents to join the emergency resource allocation sequence. The method combines multi-Agent technology with a reinforcement learning algorithm: the supply of the emergency resource warehouses is allocated from the global perspective of the whole emergency action system, and the reinforcement learning algorithm makes full use of the autonomy of the Agents to raise the intelligence level and the self-adaptation capability of the multi-Agent system.

Description

Multi-Agent emergency action decision method based on JADE platform and reinforcement learning
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a multi-Agent emergency action decision method based on a JADE platform and reinforcement learning.
Background
With the rapid development of China's economy and society, sudden public events of all kinds occur one after another. According to official statistics, natural disasters alone affected 130 million people in 2018, with direct economic losses exceeding 260 billion yuan. Effective emergency action can not only prevent and reduce the occurrence of sudden public events, but also protect people's lives and property when such events do occur, bring the situation under control as soon as possible, and minimize losses. Therefore, how to use artificial intelligence technologies such as multi-Agent systems and reinforcement learning to monitor, manage, and assist decision making across the whole emergency action process is work that deserves further development and is of great significance.
An Agent is a class of computing entities or programs that can perceive a particular environment and run autonomously within it to achieve a set of goals on behalf of its designer or user. A multi-Agent system (MAS) essentially embodies "divide and conquer" thinking. The characteristics of multi-Agent systems give them unique advantages in many distributed application areas. E-commerce, transportation, emergency rescue, diagnostic systems, and telecommunication systems all share the characteristic of distributed interaction; adopting a multi-Agent system can markedly improve the way different entities interact, optimize execution plans, and provide better, faster, and more reliable service. In addition, multi-Agent systems are also extremely effective solutions for building certain information decision support systems. JADE is a multi-Agent system simulation and implementation platform based on the FIPA standard; it is fully featured, soundly structured, and highly portable, and it greatly simplifies the development of multi-Agent systems. Reinforcement learning, a typical learning method that requires no supervised labels, is widely applied in fields such as autonomous driving, intelligent control, and decision support; implementing a reinforcement learning algorithm that exploits the autonomy of the Agents in a multi-Agent system improves the overall intelligence of the system.
Disclosure of Invention
The aim of the invention is to provide a multi-Agent emergency action decision method based on the JADE platform and reinforcement learning, which decides how the emergency resource warehouses should cooperate in supplying emergency resources by comprehensively considering transportation cost, distance, time, effectiveness, and other factors, so that emergency resource guarantees are provided promptly and effectively at a lower economic cost.
To achieve this aim, the invention provides a multi-Agent emergency action decision method based on a JADE platform and reinforcement learning, comprising the following steps:
step 1, starting the JADE platform and establishing a monitoring Agent, and using the monitoring Agent to judge in real time whether an emergency public event has occurred; if so, proceeding directly to step 2, and if not, repeating this step to continue judging;
step 2, registering the emergency resource guarantee service behavior of each emergency resource warehouse Agent with the monitoring Agent, executing reinforcement learning for each emergency resource warehouse Agent, and obtaining from the monitoring Agent the reinforcement learning feedback value corresponding to each emergency resource warehouse Agent;
and step 3, selecting, according to the reinforcement learning feedback values, one or more emergency resource warehouse Agents and adding them to the emergency resource allocation sequence.
Further, in step 2, the monitoring Agent searches in real time for all possible emergency resource warehouse Agents through the yellow page service of the JADE platform.
Further, in step 2, the specific steps of reinforcement learning of the emergency resource warehouse Agent are as follows:
step a, initializing the learning rate λ_t, the discount factor γ, and the Q values;
step b, each emergency resource warehouse Agent obtains the current state s_t through the interaction of the initiator class of the JADE interaction protocol with the responder class of the environment, selects the optimal action a_t in the current state s_t according to the state transfer function P, performs action a_t, and transitions to the new state s_{t+1};
step c, the emergency resource warehouse Agent obtains the return value r_{t+1} from the external environment using the initiator class of the JADE interaction protocol and updates the Q value;
and step d, exiting reinforcement learning after the Q values converge.
Further, in step b, the interaction information between the initiator class of the JADE interaction protocol and the responder class of the environment is stored in ontology form using the content language of JADE.
Further, in step b, the state transfer function P selects the action policy based on the softmax function, so that the action policy with a larger average feedback value is more likely to be adopted.
Further, in step b, the probability normalization formula of the state transfer function P is:
$$P(a_i \mid s_t) = \frac{e^{\bar{r}(a_i)/\tau}}{\sum_{a \in A} e^{\bar{r}(a)/\tau}}$$

where τ represents the annealing temperature used to control the exploration rate (the larger the differences between the average rewards, the more likely the optimal strategy is to be chosen), \bar{r}(a) denotes the average feedback value of action a, the numerator e^{\bar{r}(a_i)/\tau} represents the pre-normalization probability that the selection of action a_i causes a state transition, and the denominator \sum_{a \in A} e^{\bar{r}(a)/\tau} sums the pre-normalization probabilities of a state transition over all actions in the action set.
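For illustration, a minimal Java sketch of this softmax selection rule follows; the avgReward array (holding the average feedback value of each action) and the use of java.util.Random are assumptions made here, not details given in the patent.

```java
import java.util.Random;

public final class SoftmaxSelection {
    // Samples an action index from the softmax distribution over average
    // feedback values, with annealing temperature tau controlling exploration.
    public static int select(double[] avgReward, double tau, Random rng) {
        double[] w = new double[avgReward.length];
        double sum = 0.0;
        for (int i = 0; i < avgReward.length; i++) {
            w[i] = Math.exp(avgReward[i] / tau); // pre-normalization probability
            sum += w[i];
        }
        double u = rng.nextDouble() * sum;       // sampling this way avoids explicit normalization
        double acc = 0.0;
        for (int i = 0; i < w.length; i++) {
            acc += w[i];
            if (u <= acc) return i;
        }
        return w.length - 1;                     // guard against floating-point rounding
    }
}
```

A larger τ flattens the distribution and encourages exploration; a smaller τ concentrates probability on the action with the best average feedback, matching the property above that better strategies are more likely, but not certain, to be adopted.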
Further, in step c, the calculation formula of the Q value is:
$$Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \lambda_t \left[ r_{t+1} + \gamma \max_{a' \in A} Q_t(s_{t+1}, a') - Q_t(s_t, a_t) \right]$$

where γ ∈ [0,1) is the discount factor, λ_t is the learning rate, A is the action set, S is the state set, Q_t(s_t, a_t) is the Q value determined by s_t and a_t at time t, Q_{t+1}(s_t, a_t) is its updated value at time t+1, and max_{a'∈A} Q_t(s_{t+1}, a') is the maximum of the Q-table entries in the new state s_{t+1}.
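A one-step Java sketch of this update rule follows, assuming the Q table is stored as a state-by-action array; the constant learning rate and the integer state/action indices are illustrative assumptions.

```java
public final class QUpdate {
    // Applies Q_{t+1}(s,a) = Q_t(s,a) + lambda * (r + gamma * max_{a'} Q_t(s',a') - Q_t(s,a)).
    public static void update(double[][] q, int s, int a, double r, int sNext,
                              double lambda, double gamma) {
        double maxNext = Double.NEGATIVE_INFINITY;
        for (int a2 = 0; a2 < q[sNext].length; a2++) { // max over the action set A in state s'
            maxNext = Math.max(maxNext, q[sNext][a2]);
        }
        q[s][a] += lambda * (r + gamma * maxNext - q[s][a]); // temporal-difference correction
    }
}
```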
Further, the action set A = {a1, a2} and the state set S = {C1, C2, D, F1, F2}, where C1 represents the inventory capacity that the emergency resource warehouse Agent can effectively provide, C2 represents the type of emergency material that the emergency resource warehouse Agent can effectively provide, D represents the distance between the emergency resource warehouse Agent and the place where the emergency public event occurred, F1 represents the transportation cost of the emergency resources per unit distance, F2 represents the transportation cost of the emergency resources per unit mass, a1 represents choosing to add the emergency resource warehouse Agent to the emergency resource allocation sequence, and a2 represents choosing not to add it; a possible encoding of these sets is sketched below.
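A minimal Java sketch of one way to encode the action set and a single state observation; the field types and the treatment of C1, D, F1, and F2 as plain numeric values are assumptions made for illustration, not specified by the patent.

```java
// Hypothetical encoding of A = {a1, a2} and one observation from S = {C1, C2, D, F1, F2}.
enum WarehouseAction {
    JOIN_ALLOCATION, // a1: add this warehouse Agent to the emergency resource allocation sequence
    STAY_OUT         // a2: do not add it
}

record WarehouseState(
        double capacity,            // C1: inventory capacity the warehouse can effectively provide
        String materialType,        // C2: type of emergency material it can effectively provide
        double distance,            // D: distance to the place where the event occurred
        double costPerUnitDistance, // F1: transportation cost per unit distance
        double costPerUnitMass      // F2: transportation cost per unit mass
) {}
```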
The invention has the following beneficial effects: (1) it combines multi-Agent technology with a reinforcement learning algorithm, allocating the supply of the emergency resource warehouses from the global perspective of the whole emergency action system, while the reinforcement learning algorithm makes full use of the autonomy of the Agents to raise the intelligence level and self-adaptation capability of the multi-Agent system; (2) the system has strong extensibility and applicability: it can be combined with a digital emergency plan system and use existing monitoring data and a case library for computer-aided decision making, commanding emergency actions more scientifically and effectively; (3) the Agents are constructed on the JADE platform, realizing the development of a multi-Agent system; using the communication interaction protocols, yellow page service, ontology support, Agent migration, and other facilities provided by JADE, the JADE-based multi-Agent system combines simulation of the emergency rescue process and action details with decision-support applications for handling actual emergency public events, building a system application framework that provides auxiliary decision making in normal times, optimization through simulation exercises, and decision support in wartime.
Drawings
FIG. 1 is a general flow chart of the emergency action decision based on the finite state machine model of the present invention;
FIG. 2 is a structural diagram of reinforcement learning on the JADE platform of the present invention.
Detailed Description
As shown in FIG. 1, the invention provides a multi-Agent emergency action decision method based on a JADE platform and reinforcement learning, comprising the following steps:
step 1, starting the JADE platform and establishing a monitoring Agent, and managing the emergency actions for sudden public events by scheduling sub-behaviors with a finite state machine (FSM) model; the finite state machine starts from initial state 1 and executes behavior 1: using the monitoring Agent to judge in real time whether an emergency public event has occurred; if an emergency public event has occurred, proceeding directly to step 2 and entering intermediate state 3; if not, entering intermediate state 2 (the early-warning behavior) and then migrating back to initial state 1, repeating this step to continue judging;
step 2, executing behavior 3: registering the emergency resource guarantee service behavior of each emergency resource warehouse Agent with the monitoring Agent, entering intermediate state 4, and executing behavior 4: reinforcement learning of each emergency resource warehouse Agent, obtaining from the monitoring Agent the reinforcement learning feedback value corresponding to each emergency resource warehouse Agent. The reinforcement learning task corresponds to a four-tuple E = <S, A, P, R>, where S is the current state, A is the action set, P is the state transfer function, and R is the feedback function. The state transfer function P selects the action strategy based on the softmax function, which ensures both that an action strategy with a larger average feedback value is more likely to be adopted and that an action strategy with a lower average feedback value still has a chance of being adopted. The emergency resource warehouse Agent obtains feedback values from the environment (the monitoring Agent) that mainly reflect effectiveness, economic benefit, time, and distance. According to the basic principle of reinforcement learning, if a certain behavior strategy of the emergency resource warehouse Agent changes the environment and obtains a positive feedback value, the tendency of the Agent to produce that behavior strategy is reinforced; conversely, it is weakened. The reinforcement learning objective in the multi-Agent system is still to maximize the reward feedback value, and the long-term cumulative feedback value is calculated by the γ-discounted cumulative feedback method, in which the discount weight decreases at the rate γ at each step (written out explicitly after these steps);
and step 3, executing behavior 5: selecting, according to the reinforcement learning feedback values, one or more emergency resource warehouse Agents and adding them to the emergency resource allocation sequence; at this point the finite state machine behavior of the monitoring Agent terminates and the emergency action ends.
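Written out explicitly (the patent states this rule only in words), the γ-discounted cumulative feedback takes the standard form

$$R_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}, \qquad \gamma \in [0,1),$$

so a feedback value received k steps in the future is weighted by γ^k, which decreases at the rate γ at each step.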
Further, in step 2, the monitoring Agent searches in real time for all possible emergency resource warehouse Agents through the yellow page service of the JADE platform, as sketched below.
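A minimal Java sketch of such a yellow-page lookup follows; the service type string "emergency-resource-guarantee" is a hypothetical registration name, not one given by the patent.

```java
import jade.core.Agent;
import jade.domain.DFService;
import jade.domain.FIPAException;
import jade.domain.FIPAAgentManagement.DFAgentDescription;
import jade.domain.FIPAAgentManagement.ServiceDescription;

public final class WarehouseLookup {
    // Queries JADE's Directory Facilitator (the yellow pages) for every Agent
    // that registered an emergency resource guarantee service.
    public static DFAgentDescription[] findWarehouses(Agent monitor) {
        DFAgentDescription template = new DFAgentDescription();
        ServiceDescription sd = new ServiceDescription();
        sd.setType("emergency-resource-guarantee"); // hypothetical service type
        template.addServices(sd);
        try {
            return DFService.search(monitor, template); // all matching registrations
        } catch (FIPAException e) {
            e.printStackTrace();
            return new DFAgentDescription[0];
        }
    }
}
```

Each emergency resource warehouse Agent would previously have registered the corresponding ServiceDescription with DFService.register(...) during its setup.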
Further, in step 2, the specific steps of reinforcement learning of the emergency resource warehouse Agent are as follows:
step a, initializing the learning rate λ_t, the discount factor γ, and the Q values, where the discount factor γ ∈ [0,1); γ = 0 means that only the immediate return r is taken into account;
step b, each emergency resource warehouse Agent obtains the current state s_t through the interaction of the initiator class of the JADE interaction protocol with the responder class of the environment, selects the optimal action a_t in the current state s_t according to the state transfer function P, performs action a_t, and transitions to the new state s_{t+1};
step c, the emergency resource warehouse Agent obtains the return value r_{t+1} from the external environment using the initiator class of the JADE interaction protocol and updates the Q value;
and step d, after the Q values converge, exiting reinforcement learning, entering termination state 5, and executing behavior 5.
Further, in step b, the interaction information between the initiator class of the JADE interaction protocol and the responder class of the environment is stored in ontology form using the content language of JADE.
Further, in step b, the state transfer function P selects the action policy based on the softmax function, so that the action policy with a larger average feedback value is more likely to be adopted.
Further, in step b, the probability normalization formula of the state transfer function P is:
$$P(a_i \mid s_t) = \frac{e^{\bar{r}(a_i)/\tau}}{\sum_{a \in A} e^{\bar{r}(a)/\tau}}$$

where τ represents the annealing temperature used to control the exploration rate (the larger the differences between the average rewards, the more likely the optimal strategy is to be chosen), \bar{r}(a) denotes the average feedback value of action a, the numerator e^{\bar{r}(a_i)/\tau} represents the pre-normalization probability that the selection of action a_i causes a state transition, and the denominator \sum_{a \in A} e^{\bar{r}(a)/\tau} sums the pre-normalization probabilities of a state transition over all actions in the action set.
Further, in step c, the calculation formula of the Q value is:
$$Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \lambda_t \left[ r_{t+1} + \gamma \max_{a' \in A} Q_t(s_{t+1}, a') - Q_t(s_t, a_t) \right]$$

where γ ∈ [0,1) is the discount factor, λ_t is the learning rate, r_{t+1} is the feedback value at time t+1, A is the action set, S is the state set, Q(s, a) is the Q value determined by s and a, Q_t(s_t, a_t) is the Q value determined by s_t and a_t at time t, Q_{t+1}(s_t, a_t) is its updated value at time t+1, and max_{a'∈A} Q_t(s_{t+1}, a') is the maximum of the Q-table entries in the new state. The learning rate λ_t controls the learning speed: its value is proportional to the convergence speed, but it must not be too large, otherwise learning will not converge properly. The discount factor γ ∈ [0,1) balances the time horizon: the larger its value, the more weight is given to long-term feedback values; the smaller its value, the more weight is given to the immediate feedback value. The Q-value formula, obtained from the Bellman equation, continuously approaches the optimal solution through iterative updates, so the reinforcement learning behavior is implemented as a JADE CyclicBehaviour, allowing the learning process to repeat continuously, as sketched below.
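A minimal Java sketch of wrapping the learning loop in a JADE CyclicBehaviour follows; the per-step learner logic (softmax selection plus the Q update sketched earlier) and the convergence test are hypothetical stand-ins for whatever the warehouse Agent actually does through the interaction protocol.

```java
import jade.core.behaviours.CyclicBehaviour;

public class LearningBehaviour extends CyclicBehaviour {
    private int steps = 0;

    @Override
    public void action() {
        // One learning step: observe s_t, select a_t by softmax, receive r_{t+1}
        // from the monitoring Agent, and apply the Q update given above.
        steps++;
        boolean converged = steps > 10_000; // placeholder convergence criterion
        if (converged) {
            myAgent.removeBehaviour(this);  // step d: leave reinforcement learning
        }
    }
}
```

A warehouse Agent would install this in its setup() method with addBehaviour(new LearningBehaviour()), after which JADE's scheduler re-runs action() until the behaviour removes itself.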
Further, the action set A = {a1, a2} and the state set S = {C1, C2, D, F1, F2}, where C1 represents the inventory capacity that the emergency resource warehouse Agent can effectively provide, C2 represents the type of emergency material that the emergency resource warehouse Agent can effectively provide, D represents the distance between the emergency resource warehouse Agent and the place where the emergency public event occurred, F1 represents the transportation cost of the emergency resources per unit distance, F2 represents the transportation cost of the emergency resources per unit mass, a1 represents choosing to add the emergency resource warehouse Agent to the emergency resource allocation sequence, and a2 represents choosing not to add it.
As shown in FIG. 2, which gives the structure diagram of the reinforcement learning algorithm combined with the multi-Agent emergency action decision method on the JADE platform, the emergency actions for sudden public events are managed by the monitoring Agent. The specific way an emergency resource warehouse Agent interacts with the environment (the monitoring Agent) is as follows: the emergency resource warehouse Agent creates the initiator class of a JADE interaction protocol, the monitoring Agent creates the corresponding responder class, and the JADE ontology classes are used to define the feedback value r_{t+1} as a Concept structure, the state s_t as a Predicate structure, and the action a_t as an Action structure; this information is then communicated between the two, as in the sketch below.
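A hedged Java sketch of those three ontology elements follows; the class and field names are illustrative assumptions that follow JADE's bean conventions, not names taken from the patent.

```java
import jade.content.AgentAction;
import jade.content.Concept;
import jade.content.Predicate;

public class EmergencyOntologyElements {

    // Feedback value r_{t+1}, modeled as a Concept structure.
    public static class Feedback implements Concept {
        private double value;
        public double getValue() { return value; }
        public void setValue(double value) { this.value = value; }
    }

    // State s_t, modeled as a Predicate structure.
    public static class WarehouseStatus implements Predicate {
        private String status;
        public String getStatus() { return status; }
        public void setStatus(String status) { this.status = status; }
    }

    // Action a_t, modeled as an agent action structure.
    public static class JoinAllocation implements AgentAction {
        private boolean join;
        public boolean getJoin() { return join; }
        public void setJoin(boolean join) { this.join = join; }
    }
}
```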
The invention is based on multi-Agent technology, the JADE platform, and reinforcement learning. Approaching emergency handling from a global perspective, it combines multi-Agent technology with a reinforcement learning algorithm to establish a more intelligent decision support method: by comprehensively considering factors such as time, cost, and effectiveness, it allocates the emergency resource warehouses globally and makes full and effective use of each warehouse for emergency guarantee work. Applying the multi-Agent idea to the decision support system greatly enhances the system's self-adaptation capability, while reinforcement learning improves the coordination among the Agents and advances the intelligence of the system. The system can be combined with a digital emergency plan system and use existing monitoring data and a case library for computer-aided decision making, commanding emergency actions more scientifically and effectively; it can effectively balance economic cost against time efficiency, is more intelligent and adaptive, and has high extensibility and significant practical application value.

Claims (5)

1. A multi-Agent emergency action decision method based on a JADE platform and reinforcement learning is characterized by comprising the following steps:
step 1, starting the JADE platform and establishing a monitoring Agent, and using the monitoring Agent to judge in real time whether an emergency public event has occurred; if so, proceeding directly to step 2, and if not, repeating this step to continue judging;
step 2, registering the emergency resource guarantee service behavior of each emergency resource warehouse Agent with the monitoring Agent, executing reinforcement learning for each emergency resource warehouse Agent, and obtaining from the monitoring Agent the reinforcement learning feedback value corresponding to each emergency resource warehouse Agent;
step 3, selecting, according to the reinforcement learning feedback values, one or more emergency resource warehouse Agents and adding them to the emergency resource allocation sequence;
in the step 2, the specific steps of reinforcement learning of the emergency resource warehouse Agent are as follows:
step a, initializing the learning rate λ_t, the discount factor γ, and the Q values;
step b, each emergency resource warehouse Agent obtains the current state s_t through the interaction of the initiator class of the JADE interaction protocol with the responder class of the environment, selects the optimal action a_t in the current state s_t according to the state transfer function P, performs action a_t, and transitions to the new state s_{t+1};
step c, the emergency resource warehouse Agent obtains the return value r_{t+1} from the external environment using the initiator class of the JADE interaction protocol and updates the Q value;
and step d, exiting reinforcement learning after the Q values converge.
2. The multi-Agent emergency action decision method based on the JADE platform and reinforcement learning according to claim 1, wherein in step 2, the monitoring Agent searches in real time for all possible emergency resource warehouse Agents through the yellow page service of the JADE platform.
3. The multi-Agent emergency action decision method based on the JADE platform and reinforcement learning according to claim 1, wherein in step b, the interaction information between the initiator class of the JADE interaction protocol and the responder class of the environment is stored in ontology form using the content language of JADE.
4. The multi-Agent emergency action decision method based on the JADE platform and reinforcement learning according to claim 3, wherein in step c, the calculation formula of the Q value is:

$$Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \lambda_t \left[ r_{t+1} + \gamma \max_{a' \in A} Q_t(s_{t+1}, a') - Q_t(s_t, a_t) \right]$$

where γ ∈ [0,1) is the discount factor, λ_t is the learning rate, A is the action set, S is the state set, Q_t(s_t, a_t) is the Q value determined by s_t and a_t at time t, Q_{t+1}(s_t, a_t) is its updated value at time t+1, and max_{a'∈A} Q_t(s_{t+1}, a') is the maximum of the Q-table entries in the new state.
5. The multi-Agent emergency action decision method based on the JADE platform and reinforcement learning according to claim 4, wherein the action set A = {a1, a2} and the state set S = {C1, C2, D, F1, F2}, where C1 represents the inventory capacity that the emergency resource warehouse Agent can effectively provide, C2 represents the type of emergency material that the emergency resource warehouse Agent can effectively provide, D represents the distance between the emergency resource warehouse Agent and the place where the emergency public event occurred, F1 represents the transportation cost of the emergency resources per unit distance, F2 represents the transportation cost of the emergency resources per unit mass, a1 represents choosing to add the emergency resource warehouse Agent to the emergency resource allocation sequence, and a2 represents choosing not to add it.
CN201910182048.1A 2019-03-11 2019-03-11 Multi-Agent emergency action decision method based on JADE platform and reinforcement learning Active CN109934753B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910182048.1A CN109934753B (en) 2019-03-11 2019-03-11 Multi-Agent emergency action decision method based on JADE platform and reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910182048.1A CN109934753B (en) 2019-03-11 2019-03-11 Multi-Agent emergency action decision method based on JADE platform and reinforcement learning

Publications (2)

Publication Number Publication Date
CN109934753A CN109934753A (en) 2019-06-25
CN109934753B true CN109934753B (en) 2023-05-16

Family

ID=66986738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910182048.1A Active CN109934753B (en) 2019-03-11 2019-03-11 Multi-Agent emergency action decision method based on JADE platform and reinforcement learning

Country Status (1)

Country Link
CN (1) CN109934753B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102207928B (en) * 2011-06-02 2013-04-24 河海大学常州校区 Reinforcement learning-based multi-Agent sewage treatment decision support system
CN102622269B (en) * 2012-03-15 2014-06-04 广西大学 Java agent development (JADE)-based intelligent power grid power generation dispatching multi-Agent system
CN106980548A (en) * 2017-02-22 2017-07-25 中国科学院合肥物质科学研究院 Intelligent repository scheduling Agent system and method based on Jade

Also Published As

Publication number Publication date
CN109934753A (en) 2019-06-25

Similar Documents

Publication Publication Date Title
Dong et al. Task scheduling based on deep reinforcement learning in a cloud manufacturing environment
Russell et al. Q-decomposition for reinforcement learning agents
CN112329948B (en) Multi-agent strategy prediction method and device
CN114691363A (en) Cloud data center self-adaption efficient resource allocation method based on deep reinforcement learning
CN107831685B (en) Group robot control method and system
CN110688754B (en) Combat architecture modeling and optimal searching method
CN112149990B (en) Fuzzy supply and demand matching method based on prediction
CN116614394A (en) Service function chain placement method based on multi-target deep reinforcement learning
CN116663416A (en) CGF decision behavior simulation method based on behavior tree
CN109934753B (en) Multi-Agent emergency action decision method based on JADE platform and reinforcement learning
CN115018231A (en) Autonomous task planning method and system for reinforcement learning deep space probe based on dynamic rewards
CN115906673B (en) Combat entity behavior model integrated modeling method and system
CN115730630A (en) Control method and device of intelligent agent, electronic equipment and storage medium
CN113469369B (en) Method for relieving catastrophic forgetting for multitasking reinforcement learning
David et al. Optimal health monitoring via wireless body area networks
Elfahim et al. Deep Reinforcement Learning Approach for Emergency Response Management
Yang et al. Energy saving strategy of cloud data computing based on convolutional neural network and policy gradient algorithm
Madhumala et al. Hybrid model for virtual machine optimization in cloud data center
Shen et al. Goal autonomous agent architecture
CN109388802A (en) A kind of semantic understanding method and apparatus based on deep learning
Asadi et al. A dynamic hierarchical task transfer in multiple robot explorations
CN114815598B (en) Multi-level optimal control system and method based on Stackelberg-Nash differential game
CN112149798B (en) AI model training method, AI model calling method, apparatus and readable storage medium
CN114722085B (en) Service combination method, system, equipment and medium for meeting user demand constraint
Egon et al. Deep Reinforcement Learning: Training Intelligent Agents to Make Complex Decisions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant