CN109934753B - Multi-Agent emergency action decision method based on JADE platform and reinforcement learning - Google Patents
- Publication number
- CN109934753B CN201910182048.1A
- Authority
- CN
- China
- Prior art keywords
- agent
- emergency
- reinforcement learning
- emergency resource
- resource warehouse
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A10/00—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
- Y02A10/40—Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping
Abstract
The invention provides a multi-Agent emergency action decision method based on a JADE platform and reinforcement learning, which comprises the following steps: starting a JADE platform and establishing a monitoring Agent, and using the monitoring Agent to judge in real time whether an emergency public event has occurred; registering the emergency resource guarantee service behavior of each emergency resource warehouse Agent with the monitoring Agent, executing reinforcement learning for each emergency resource warehouse Agent, and obtaining from the monitoring Agent the reinforcement learning feedback value corresponding to each emergency resource warehouse Agent; and selecting one or more emergency resource warehouse Agents according to the reinforcement learning feedback values and adding them to the emergency resource allocation sequence. The method combines multi-Agent technology with a reinforcement learning algorithm, allocating the supplies of the emergency resource warehouses globally from the perspective of the whole emergency action system, and the reinforcement learning algorithm makes full use of the autonomy of the Agents to raise the intelligence level and adaptive capability of the multi-Agent system.
Description
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a multi-Agent emergency action decision method based on a JADE platform and reinforcement learning.
Background
With the rapid economic and social development of China, sudden public events of various kinds occur frequently. According to official statistics, natural disasters in 2018 alone affected 130 million people and caused direct economic losses exceeding 260 billion yuan. Effective emergency action can not only prevent and reduce the occurrence of emergency public events, but also protect people's lives and property when such events do occur, bring the situation under control as soon as possible, and minimize losses. Therefore, how to use artificial intelligence technologies such as multi-Agent systems and reinforcement learning to monitor, manage and assist decision making throughout the emergency action process is work that deserves further development and is of great significance.
An Agent is a class of computing entity or program that can perceive its environment in a particular setting and run autonomously to achieve a set of goals on behalf of its designer or user. A Multi-Agent System (MAS) is in essence an application of the "divide-and-conquer" idea. The characteristics of multi-Agent systems give them unique advantages in many distributed application areas. Electronic commerce, transportation, emergency rescue, dialectical systems and telecommunication systems all have a distributed, interactive character; adopting a multi-Agent system can markedly improve the interaction among different entities, optimize execution plans, and provide better, faster and more reliable services. In addition, multi-Agent systems are extremely effective solutions in the construction of certain information decision support systems. JADE, a multi-Agent simulation and implementation platform based on the FIPA standard, offers complete functionality, a sound architecture and strong portability, and greatly simplifies the development of multi-Agent systems. Reinforcement learning is a classic learning method that does not rely on supervised labels and is widely applied in fields such as autonomous driving, intelligent control and decision support; implementing a reinforcement learning algorithm by exploiting the autonomy of the Agents in a multi-Agent system improves the overall intelligence of the system.
Disclosure of Invention
The invention aims to provide a multi-Agent emergency action decision method based on the JADE platform and reinforcement learning, which decides how the individual emergency resource warehouses should cooperate to supply emergency resources by comprehensively considering transportation cost, distance, time, effectiveness and other factors, so that emergency resource guarantees are provided promptly and effectively at a lower economic cost.
In order to achieve the aim of the invention, the invention provides a multi-Agent emergency action decision method based on a JADE platform and reinforcement learning, which comprises the following steps:
step 1, starting the JADE platform and establishing a monitoring Agent, and using the monitoring Agent to judge in real time whether an emergency public event has occurred; if so, proceeding directly to step 2, and if not, repeating this step to continue judging;
step 2, registering emergency resource guarantee service behaviors of each emergency resource warehouse Agent on the monitoring Agent, executing reinforcement learning of each emergency resource warehouse Agent, and obtaining reinforcement learning feedback values corresponding to each emergency resource warehouse Agent from the monitoring Agent;
and step 3, selecting one or more emergency resource warehouse Agents according to the reinforcement learning feedback values and adding them to the emergency resource allocation sequence.
Further, in step 2, the monitoring Agent searches in real time for all possible emergency resource warehouse Agents through the yellow page service of JADE.
Further, in step 2, the specific steps of reinforcement learning of the emergency resource warehouse Agent are as follows:
step a, initializing the learning rate λ_t, the discount factor γ and the Q values;
step b, each emergency resource warehouse Agent obtains the current state s_t through the interaction between the initiator class of the JADE interaction protocol and the responder class of the environment, selects the optimal action a_t in the current state s_t according to the state transfer function P, executes action a_t, and transitions to the new state s_{t+1};
step c, the emergency resource warehouse Agent obtains the return value r_{t+1} from the external environment using the initiator class of the JADE interaction protocol and updates the Q value;
and step d, exiting reinforcement learning after the Q value has converged.
Further, in step b, the interaction information between the initiator class of the JADE interaction protocol and the responder class of the environment is stored in ontology form using the JADE content language.
Further, in step b, the state transfer function P selects the action policy based on the softmax function, so that the action policy with a larger average feedback value is more likely to be adopted.
Further, in step b, the probability normalization formula of the state transfer function P is:
P(a_i | s_t) = exp(Q(s_t, a_i)/τ) / Σ_{a_j ∈ A} exp(Q(s_t, a_j)/τ)
where τ represents the annealing temperature used to control the search rate; the larger the difference between the average feedback values of the actions, the more likely the optimal action strategy is selected; exp(Q(s_t, a_i)/τ) represents the un-normalized probability that the corresponding action selection causes a state transition, and Σ_{a_j ∈ A} exp(Q(s_t, a_j)/τ) represents the sum of the un-normalized probabilities of a state transition over all actions in the action set.
Further, in step c, the calculation formula of the Q value is:
Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + λ_t [ r_{t+1} + γ · max_{a′∈A} Q_t(s_{t+1}, a′) − Q_t(s_t, a_t) ]
wherein γ ∈ [0,1) is the discount factor, λ_t is the learning rate, A is the action set, S is the state set, Q_t(s_t, a_t) represents the Q value determined by s_t and a_t at time t, Q_{t+1}(s_t, a_t) represents its updated value at time t+1, and max_{a′∈A} Q_t(s_{t+1}, a′) represents the maximum value in the Q table for the new state s_{t+1}.
Further, the action set A = {a1, a2} and the state set S = {C1, C2, D, F1, F2}, wherein C1 represents the inventory capacity that the emergency resource warehouse Agent can effectively provide, C2 represents the type of emergency material that the emergency resource warehouse Agent can effectively provide, D represents the distance between the emergency resource warehouse Agent and the place where the emergency public event occurs, F1 represents the transportation cost of emergency resources per unit distance, F2 represents the transportation cost of emergency resources per unit mass, a1 represents choosing to add the emergency resource warehouse Agent to the emergency resource allocation sequence, and a2 represents choosing not to add it.
The invention has the following beneficial effects: (1) it combines multi-Agent technology with a reinforcement learning algorithm, allocating the supplies of the emergency resource warehouses globally from the perspective of the whole emergency action system, and the reinforcement learning algorithm makes full use of the autonomy of the Agents to raise the intelligence level and adaptive capability of the multi-Agent system; (2) it has strong expansibility and applicability, can be combined with a digital emergency plan system, and uses existing monitoring data and a case library for computer-aided decision making, so that emergency actions are commanded more scientifically and effectively; (3) Agents are built on the JADE platform to implement the multi-Agent system; using the communication interaction protocols, yellow page service, ontology support and Agent migration provided by JADE, the multi-Agent system based on the JADE platform combines the simulation of the emergency rescue and treatment process and action details with decision-support applications for handling actual emergency public events, constructing a system application framework that provides decision support in peacetime and optimization through confrontation exercises in wartime.
Drawings
FIG. 1 is a general flow chart of the emergency action decision based on the finite state machine model of the present invention;
FIG. 2 is a structural diagram of reinforcement learning on the JADE platform of the present invention.
Detailed Description
As shown in FIG. 1, the invention provides a multi-Agent emergency action decision method based on a JADE platform and reinforcement learning, which comprises the following steps:
step 1, starting the JADE platform and establishing a monitoring Agent, and managing the emergency action for the emergency public event by scheduling sub-behaviors with a finite state machine (FSM) model, a sketch of which is given after step 3 below. The finite state machine starts from initial state 1 and executes behavior 1: judging in real time, by means of the monitoring Agent, whether an emergency public event has occurred; if an emergency public event has occurred, proceeding directly to step 2 and entering intermediate state 3; if not, entering intermediate state 2 (the early-warning behavior) and then migrating back to initial state 1 to repeat this step and continue judging;
step 2, executing behavior 3: registering the emergency resource guarantee service behavior of each emergency resource warehouse Agent with the monitoring Agent, then entering intermediate state 4 and executing behavior 4: reinforcement learning of each emergency resource warehouse Agent, and obtaining from the monitoring Agent the reinforcement learning feedback value corresponding to each emergency resource warehouse Agent. The reinforcement learning task corresponds to a four-tuple E = <S, A, P, R>, where S is the state set, A is the action set, P is the state transfer function and R is the feedback function. The state transfer function P selects the action strategy based on the softmax function, which ensures that an action strategy with a larger average feedback value is more likely to be adopted while an action strategy with a smaller average feedback value still has a chance of being adopted. The emergency resource warehouse Agent obtains feedback values, mainly covering effectiveness, economic benefit, time and distance, from the environment (the monitoring Agent). According to the basic principle of reinforcement learning, if a behavior strategy of the emergency resource warehouse Agent changes the environment and obtains a positive feedback value, the tendency of the Agent to produce that behavior strategy is reinforced; otherwise it is weakened. The reinforcement learning goal in the multi-Agent system is still to maximize the reward feedback value, and the long-term cumulative feedback value is calculated using the γ-discounted cumulative feedback method, i.e. the feedback value at each subsequent time step is weighted by an additional factor of γ;
and step 3, executing behavior 5: selecting one or more emergency resource warehouse Agents according to the reinforcement learning feedback values and adding them to the emergency resource allocation sequence; at this point the finite state machine behavior of the monitoring Agent ends and the emergency action is concluded.
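By way of illustration only, the finite state machine of steps 1 to 3 could be registered with JADE's FSMBehaviour class as in the following sketch; the class name MonitorAgent, the state labels, the event codes and the helper methods checkForEmergency() and issueEarlyWarning() are assumptions made for this example and are not part of the patented method.

```java
import jade.core.Agent;
import jade.core.behaviours.FSMBehaviour;
import jade.core.behaviours.OneShotBehaviour;

// Illustrative monitoring Agent; state labels, event codes and helper methods are assumptions.
public class MonitorAgent extends Agent {

    private static final String S1_MONITOR  = "STATE-1-MONITOR";
    private static final String S2_WARN     = "STATE-2-EARLY-WARNING";
    private static final String S3_REGISTER = "STATE-3-REGISTER-SERVICES";
    private static final String S4_LEARN    = "STATE-4-REINFORCEMENT-LEARNING";
    private static final String S5_ALLOCATE = "STATE-5-ALLOCATE";

    private static final int EVENT_OCCURRED = 1;
    private static final int NO_EVENT = 0;

    @Override
    protected void setup() {
        FSMBehaviour fsm = new FSMBehaviour(this);

        // Behavior 1: judge in real time whether an emergency public event has occurred.
        fsm.registerFirstState(new OneShotBehaviour(this) {
            private int result;
            public void action() { result = checkForEmergency() ? EVENT_OCCURRED : NO_EVENT; }
            public int onEnd()   { return result; }   // the returned event drives the FSM transition
        }, S1_MONITOR);

        // Behavior 2: early warning, after which the FSM migrates back to state 1.
        fsm.registerState(new OneShotBehaviour(this) {
            public void action() { issueEarlyWarning(); }
        }, S2_WARN);

        // Behavior 3: register the warehouse Agents' emergency resource guarantee services.
        fsm.registerState(new OneShotBehaviour(this) {
            public void action() { /* behavior 3 */ }
        }, S3_REGISTER);

        // Behavior 4: collect the reinforcement learning feedback values.
        fsm.registerState(new OneShotBehaviour(this) {
            public void action() { /* behavior 4 */ }
        }, S4_LEARN);

        // Behavior 5: build the emergency resource allocation sequence, then the FSM ends.
        fsm.registerLastState(new OneShotBehaviour(this) {
            public void action() { /* behavior 5 */ }
        }, S5_ALLOCATE);

        fsm.registerTransition(S1_MONITOR, S3_REGISTER, EVENT_OCCURRED);
        fsm.registerTransition(S1_MONITOR, S2_WARN, NO_EVENT);
        // Reset states 1 and 2 so the monitor/warn loop can run again on the next pass.
        fsm.registerDefaultTransition(S2_WARN, S1_MONITOR, new String[] { S1_MONITOR, S2_WARN });
        fsm.registerDefaultTransition(S3_REGISTER, S4_LEARN);
        fsm.registerDefaultTransition(S4_LEARN, S5_ALLOCATE);

        addBehaviour(fsm);
    }

    private boolean checkForEmergency() { return false; }  // placeholder for real monitoring data
    private void issueEarlyWarning()    { /* placeholder */ }
}
```

In such a sketch the integer returned by onEnd() of the monitoring behavior is what selects the transition either to the registration state (behavior 3) or to the early-warning state (behavior 2).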
Further, in step 2, the monitoring Agent searches in real time for all possible emergency resource warehouse Agents through the yellow page service of the JADE platform.
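For illustration only, the following sketch shows one way the yellow page (Directory Facilitator) lookup just described could be realized in JADE, together with the corresponding service registration performed by a warehouse Agent; the service type string "emergency-resource-guarantee", the class names and the helper method are assumptions made for this example.

```java
import jade.core.AID;
import jade.core.Agent;
import jade.domain.DFService;
import jade.domain.FIPAException;
import jade.domain.FIPAAgentManagement.DFAgentDescription;
import jade.domain.FIPAAgentManagement.ServiceDescription;

// Illustrative warehouse Agent: publishes its emergency resource guarantee service on the DF.
class WarehouseAgent extends Agent {
    @Override
    protected void setup() {
        DFAgentDescription dfd = new DFAgentDescription();
        dfd.setName(getAID());
        ServiceDescription sd = new ServiceDescription();
        sd.setType("emergency-resource-guarantee");   // assumed service type
        sd.setName(getLocalName() + "-warehouse");
        dfd.addServices(sd);
        try {
            DFService.register(this, dfd);            // publish on the JADE yellow pages
        } catch (FIPAException e) {
            e.printStackTrace();
        }
    }

    @Override
    protected void takeDown() {
        try {
            DFService.deregister(this);               // withdraw the service on shutdown
        } catch (FIPAException e) {
            e.printStackTrace();
        }
    }
}

// Helper the monitoring Agent could call to find every registered warehouse Agent.
class MonitoringLookup {
    static AID[] findWarehouses(Agent monitor) {
        DFAgentDescription template = new DFAgentDescription();
        ServiceDescription sd = new ServiceDescription();
        sd.setType("emergency-resource-guarantee");   // must match the registered type
        template.addServices(sd);
        try {
            DFAgentDescription[] results = DFService.search(monitor, template);
            AID[] warehouses = new AID[results.length];
            for (int i = 0; i < results.length; i++) {
                warehouses[i] = results[i].getName(); // AID of each candidate warehouse Agent
            }
            return warehouses;
        } catch (FIPAException e) {
            e.printStackTrace();
            return new AID[0];
        }
    }
}
```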
Further, in step 2, the specific steps of reinforcement learning of the emergency resource warehouse Agent are as follows:
step a, initializing the learning rate λ_t, the discount factor γ and the Q values, wherein the discount factor γ ∈ [0,1), and γ = 0 means that only the immediate return r is considered;
step b, each emergency resource warehouse Agent obtains the current state s_t through the interaction between the initiator class of the JADE interaction protocol and the responder class of the environment, selects the optimal action a_t in the current state s_t according to the state transfer function P, executes action a_t, and transitions to the new state s_{t+1};
step c, the emergency resource warehouse Agent obtains the return value r_{t+1} from the external environment using the initiator class of the JADE interaction protocol (see the interaction sketch after step d) and updates the Q value;
and step d, after the Q value has converged, exiting reinforcement learning, entering termination state 5, and executing behavior 5.
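The environment interaction of steps b and c can be realized with JADE's FIPA-Request initiator/responder classes. The sketch below is an assumption rather than the patent's implementation: it uses AchieveREInitiator on the warehouse Agent side and encodes the reply as a plain string, whereas the invention carries this content as ontology objects (see the ontology sketch further below).

```java
import jade.core.AID;
import jade.core.Agent;
import jade.lang.acl.ACLMessage;
import jade.proto.AchieveREInitiator;

// Illustrative initiator used by a warehouse Agent to query the environment (monitoring Agent).
// The reply format "nextState;reward" is an assumption made for this example.
class EnvironmentQuery extends AchieveREInitiator {

    EnvironmentQuery(Agent warehouse, AID monitor, String chosenAction) {
        super(warehouse, buildRequest(monitor, chosenAction));
    }

    private static ACLMessage buildRequest(AID monitor, String chosenAction) {
        ACLMessage request = new ACLMessage(ACLMessage.REQUEST);
        request.addReceiver(monitor);
        request.setContent(chosenAction);   // the action a_t chosen by the warehouse Agent
        return request;
    }

    @Override
    protected void handleInform(ACLMessage inform) {
        String[] parts = inform.getContent().split(";");
        String nextState = parts[0];                   // s_{t+1} observed after the action
        double reward = Double.parseDouble(parts[1]);  // r_{t+1} used to update the Q value
        // ...hand nextState and reward to the Q-learning behavior of the warehouse Agent...
    }
}
```

On the monitoring Agent side, a corresponding responder class (for example an AchieveREResponder) would receive the request and reply with the state and return value.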
Further, in step b, the interaction information between the initiator class of the JADE interaction protocol and the responder class of the environment is stored in ontology form using the JADE content language.
Further, in step b, the state transfer function P selects the action policy based on the softmax function, so that the action policy with a larger average feedback value is more likely to be adopted.
Further, in step b, the probability normalization formula of the state transfer function P is:
P(a_i | s_t) = exp(Q(s_t, a_i)/τ) / Σ_{a_j ∈ A} exp(Q(s_t, a_j)/τ)
where τ represents the annealing temperature used to control the search rate; the larger the difference between the average feedback values of the actions, the more likely the optimal action strategy is selected; exp(Q(s_t, a_i)/τ) represents the un-normalized probability that the corresponding action selection causes a state transition, and Σ_{a_j ∈ A} exp(Q(s_t, a_j)/τ) represents the sum of the un-normalized probabilities of a state transition over all actions in the action set.
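A minimal, self-contained sketch of this softmax (Boltzmann) selection rule is given below; the class name SoftmaxPolicy and the assumption that the Q values for the current state are passed in as a double array are illustrative only.

```java
import java.util.Random;

// Minimal sketch of the softmax (Boltzmann) selection described above; qValues[i] is Q(s_t, a_i)
// for the current state and tau is the annealing temperature.
final class SoftmaxPolicy {
    private static final Random RNG = new Random();

    static int selectAction(double[] qValues, double tau) {
        double[] weights = new double[qValues.length];
        double sum = 0.0;
        for (int i = 0; i < qValues.length; i++) {
            weights[i] = Math.exp(qValues[i] / tau);   // un-normalized selection weight
            sum += weights[i];
        }
        // Sample from the normalized distribution: actions with larger Q values are chosen
        // more often, but every action keeps a non-zero probability of being explored.
        double r = RNG.nextDouble() * sum;
        double cumulative = 0.0;
        for (int i = 0; i < weights.length; i++) {
            cumulative += weights[i];
            if (r <= cumulative) {
                return i;
            }
        }
        return weights.length - 1;   // numerical safety fallback
    }
}
```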
Further, in step c, the calculation formula of the Q value is:
Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + λ_t [ r_{t+1} + γ · max_{a′∈A} Q_t(s_{t+1}, a′) − Q_t(s_t, a_t) ]
wherein γ ∈ [0,1) is the discount factor, λ_t is the learning rate, r_{t+1} is the feedback value at time t+1, A is the action set, S is the state set, Q(s, a) is the Q value determined by s and a, Q_t(s_t, a_t) represents the Q value determined by s_t and a_t at time t, Q_{t+1}(s_t, a_t) represents its updated value at time t+1, and max_{a′∈A} Q_t(s_{t+1}, a′) represents the maximum value in the Q table for the new state s_{t+1}. The learning rate λ_t controls the learning speed: the larger its value, the faster the convergence, but it must not be too large, otherwise the learning will not converge properly. The discount factor γ ∈ [0,1): the larger its value, the more weight is given to long-term feedback values; the smaller its value, the more weight is given to the immediate feedback value. The Q-value update formula, derived from the Bellman equation, continuously approaches the optimal solution through iterative updating, so the reinforcement learning behavior is set up as a CyclicBehaviour (cyclic class) and the learning process can be repeated continuously.
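To make the cyclic structure concrete, here is a minimal sketch of the Q-value update running inside a JADE CyclicBehaviour. It reuses the SoftmaxPolicy sketch above; the string state encoding, the fixed values of the learning rate, discount factor and annealing temperature, and the two placeholder methods standing in for the interaction-protocol exchange with the monitoring Agent are all assumptions.

```java
import jade.core.Agent;
import jade.core.behaviours.CyclicBehaviour;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the Q-value update of step c inside a JADE CyclicBehaviour.
class QLearningBehaviour extends CyclicBehaviour {

    private final Map<String, double[]> qTable = new HashMap<>(); // state -> Q(s, a) for a1, a2
    private final double learningRate = 0.1;   // learning rate lambda_t (assumed value)
    private final double gamma = 0.9;          // discount factor in [0, 1) (assumed value)
    private final double tau = 0.5;            // annealing temperature for the softmax policy
    private String state = "initial";

    QLearningBehaviour(Agent warehouseAgent) {
        super(warehouseAgent);
    }

    @Override
    public void action() {
        double[] q = qTable.computeIfAbsent(state, s -> new double[2]);
        int a = SoftmaxPolicy.selectAction(q, tau);        // choose a1 or a2 by softmax

        // In the full system these two values come back from the monitoring Agent
        // through the initiator/responder classes of the JADE interaction protocol.
        String nextState = observeNextState(a);
        double reward = observeReward(a);

        double[] qNext = qTable.computeIfAbsent(nextState, s -> new double[2]);
        double maxNext = Math.max(qNext[0], qNext[1]);
        // Q_{t+1}(s_t,a_t) = Q_t(s_t,a_t) + lambda_t * (r_{t+1} + gamma * max_a' Q_t(s_{t+1},a') - Q_t(s_t,a_t))
        q[a] = q[a] + learningRate * (reward + gamma * maxNext - q[a]);
        state = nextState;
    }

    private String observeNextState(int action) { return "initial"; } // placeholder
    private double observeReward(int action)    { return 0.0; }       // placeholder
}
```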
Further, the action set A = {a1, a2} and the state set S = {C1, C2, D, F1, F2}, wherein C1 represents the inventory capacity that the emergency resource warehouse Agent can effectively provide, C2 represents the type of emergency material that the emergency resource warehouse Agent can effectively provide, D represents the distance between the emergency resource warehouse Agent and the place where the emergency public event occurs, F1 represents the transportation cost of emergency resources per unit distance, F2 represents the transportation cost of emergency resources per unit mass, a1 represents choosing to add the emergency resource warehouse Agent to the emergency resource allocation sequence, and a2 represents choosing not to add it.
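For readability of the sketches above, the state variables could be grouped in a simple container class such as the following; the field names and the kilometre unit for the distance are assumptions, not definitions from the patent.

```java
// Illustrative container for the state variables S = {C1, C2, D, F1, F2}.
class WarehouseState {
    double c1;   // C1: inventory capacity the warehouse Agent can effectively provide
    String c2;   // C2: type of emergency material the warehouse Agent can effectively provide
    double d;    // D:  distance to the place where the emergency public event occurred (km, assumed)
    double f1;   // F1: transportation cost of emergency resources per unit distance
    double f2;   // F2: transportation cost of emergency resources per unit mass

    WarehouseState(double c1, String c2, double d, double f1, double f2) {
        this.c1 = c1;
        this.c2 = c2;
        this.d = d;
        this.f1 = f1;
        this.f2 = f2;
    }
}
```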
As shown in FIG. 2, which is a structural diagram of the multi-Agent model on the JADE platform combined with the reinforcement learning algorithm, the emergency action for the emergency public event is managed by the monitoring Agent. The specific interaction between an emergency resource warehouse Agent and the environment (the monitoring Agent) is as follows: the emergency resource warehouse Agent creates the initiator class of a JADE interaction protocol, the monitoring Agent creates the responder class of the JADE interaction protocol, and the JADE ontology classes are used to define the feedback value r_{t+1} with a Concept structure, the state s_t with a Predicate structure and the action a_t with an Action structure, and to communicate this information.
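A minimal sketch of such ontology elements is given below: a Concept bean for the feedback value, a Predicate bean for the state and an AgentAction bean for the action. The class and field names are assumptions; a working system would also register these beans in a jade.content.onto.Ontology subclass and attach a codec such as the SLCodec to each Agent's content manager.

```java
import jade.content.AgentAction;
import jade.content.Concept;
import jade.content.Predicate;

// Illustrative ontology beans for the three structures named above.
class FeedbackValue implements Concept {
    private double value;                          // the return value r_{t+1}
    public double getValue() { return value; }
    public void setValue(double value) { this.value = value; }
}

class CurrentState implements Predicate {
    private String stateId;                        // an encoding of the state s_t
    public String getStateId() { return stateId; }
    public void setStateId(String stateId) { this.stateId = stateId; }
}

class ChosenAction implements AgentAction {
    private String actionId;                       // "a1" (join the allocation sequence) or "a2"
    public String getActionId() { return actionId; }
    public void setActionId(String actionId) { this.actionId = actionId; }
}
```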
The invention is based on multi-Agent technology, the JADE platform and reinforcement learning. By combining multi-Agent technology with a reinforcement learning algorithm from the global perspective of the emergency treatment action, it establishes a more intelligent decision support method: it allocates the individual emergency resource warehouses globally by comprehensively considering factors such as time, cost and effectiveness, makes full and effective use of each emergency resource warehouse for emergency guarantee work, and, by applying the multi-Agent idea to the decision support system, greatly strengthens the adaptive capability of the system. Reinforcement learning improves the coordination among the Agents and promotes the intelligence of the system. The system can be combined with a digital emergency plan system and use existing monitoring data and a case library for computer-aided decision making, so that emergency actions are commanded more scientifically and effectively; it can effectively balance economic cost against time efficiency, has stronger intelligence and adaptability, and offers greater expansibility and substantial practical application value.
Claims (5)
1. A multi-Agent emergency action decision method based on a JADE platform and reinforcement learning is characterized by comprising the following steps:
step 1, starting the JADE platform and establishing a monitoring Agent, and using the monitoring Agent to judge in real time whether an emergency public event has occurred; if so, proceeding directly to step 2, and if not, repeating this step to continue judging;
step 2, registering emergency resource guarantee service behaviors of each emergency resource warehouse Agent on the monitoring Agent, executing reinforcement learning of each emergency resource warehouse Agent, and obtaining reinforcement learning feedback values corresponding to each emergency resource warehouse Agent from the monitoring Agent;
step 3, selecting one or more emergency resource warehouse Agents according to the reinforcement learning feedback values and adding them to an emergency resource allocation sequence;
in the step 2, the specific steps of reinforcement learning of the emergency resource warehouse Agent are as follows:
step a, initializing the learning rate λ_t, the discount factor γ and the Q values;
step b, each emergency resource warehouse Agent obtains the current state s_t through the interaction between the initiator class of the JADE interaction protocol and the responder class of the environment, selects the optimal action a_t in the current state s_t according to the state transfer function P, executes action a_t, and transitions to the new state s_{t+1};
step c, the emergency resource warehouse Agent obtains the return value r_{t+1} from the external environment using the initiator class of the JADE interaction protocol and updates the Q value;
and step d, exiting reinforcement learning after the Q value has converged.
2. The multi-Agent emergency action decision method based on the JADE platform and reinforcement learning according to claim 1, wherein in step 2, the monitoring Agent searches in real time for all possible emergency resource warehouse Agents through the yellow page service of the JADE platform.
3. The multi-Agent emergency action decision method based on the JADE platform and reinforcement learning according to claim 1, wherein in step b, the interaction information between the initiator class of the JADE interaction protocol and the responder class of the environment is stored in ontology form using the JADE content language.
4. The multi-Agent emergency action decision method based on the JADE platform and reinforcement learning according to claim 3, wherein in the step c, the calculation formula of the Q value is:
Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + λ_t [ r_{t+1} + γ · max_{a′∈A} Q_t(s_{t+1}, a′) − Q_t(s_t, a_t) ]
wherein γ ∈ [0,1) is the discount factor, λ_t is the learning rate, A is the action set, S is the state set, Q_t(s_t, a_t) represents the Q value determined by s_t and a_t at time t, Q_{t+1}(s_t, a_t) represents its updated value at time t+1, and max_{a′∈A} Q_t(s_{t+1}, a′) represents the maximum value in the Q table for the new state s_{t+1}.
5. The multi-Agent emergency action decision method based on the JADE platform and reinforcement learning according to claim 4, wherein the action set A = {a1, a2} and the state set S = {C1, C2, D, F1, F2}, C1 represents the inventory capacity that the emergency resource warehouse Agent can effectively provide, C2 represents the type of emergency material that the emergency resource warehouse Agent can effectively provide, D represents the distance between the emergency resource warehouse Agent and the place where the emergency public event occurs, F1 represents the transportation cost of emergency resources per unit distance, F2 represents the transportation cost of emergency resources per unit mass, a1 represents choosing to add the emergency resource warehouse Agent to the emergency resource allocation sequence, and a2 represents choosing not to add it.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910182048.1A CN109934753B (en) | 2019-03-11 | 2019-03-11 | Multi-Agent emergency action decision method based on JADE platform and reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910182048.1A CN109934753B (en) | 2019-03-11 | 2019-03-11 | Multi-Agent emergency action decision method based on JADE platform and reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109934753A CN109934753A (en) | 2019-06-25 |
CN109934753B (en) | 2023-05-16
Family
ID=66986738
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910182048.1A Active CN109934753B (en) | 2019-03-11 | 2019-03-11 | Multi-Agent emergency action decision method based on JADE platform and reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109934753B (en) |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102207928B (en) * | 2011-06-02 | 2013-04-24 | 河海大学常州校区 | Reinforcement learning-based multi-Agent sewage treatment decision support system |
CN102622269B (en) * | 2012-03-15 | 2014-06-04 | 广西大学 | Java agent development (JADE)-based intelligent power grid power generation dispatching multi-Agent system |
CN106980548A (en) * | 2017-02-22 | 2017-07-25 | 中国科学院合肥物质科学研究院 | Intelligent repository scheduling Agent system and method based on Jade |
- 2019-03-11 CN CN201910182048.1A patent/CN109934753B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN109934753A (en) | 2019-06-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |