CN110674470A - Distributed task planning method for multiple robots in dynamic environment - Google Patents
- Publication number: CN110674470A (application CN201911022986.1A)
- Authority
- CN
- China
- Prior art keywords
- tree
- intention
- reward
- action
- robots
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
Abstract
The invention belongs to the field of robotics and discloses a distributed task planning method for multiple robots in a dynamic environment. Its aim is to enable multiple robots, through distributed planning, to collect more information and avoid threats in a dynamic environment within a bounded time horizon. The technical scheme fuses intention sharing and intention prediction into a distributed planning method: the shared and predicted teammate intentions are merged into a local search tree to form a global reward, which guides the local tree search and ultimately yields an effective decision. The invention offers low communication cost, generality, and high efficiency.
Description
Technical Field
The invention belongs to the field of robotics, relates to multi-robot task planning methods, and in particular relates to a distributed task planning method for multiple robots in a dynamic environment. The method can be applied to distributed multi-robot planning in disaster search-and-rescue scenarios such as earthquakes, fires, and nuclear radiation leaks.
Background
Monte Carlo tree search belongs to the family of random-sampling (statistical-experiment) methods, a branch of computational mathematics developed in the 1940s to meet the needs of the early atomic energy programs. Traditional empirical methods cannot approximate the true physical process and struggle to produce satisfactory results, whereas Monte Carlo methods can faithfully simulate the real physical process, so their solutions agree well with reality. This is a computational approach grounded in probability and statistics: random numbers (or, more commonly, pseudo-random numbers) are used to solve a wide range of computational problems. The problem to be solved is associated with a probability model, and statistical simulation or sampling is performed on a computer to obtain an approximate solution.
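As an illustration of the random-sampling idea described above (a textbook example, not part of the patent itself), a classic Monte Carlo computation estimates π by sampling points in the unit square and counting how many fall inside the quarter circle:

```python
import random

def estimate_pi(n_samples: int, seed: int = 0) -> float:
    """Estimate pi by uniform sampling in the unit square: the fraction
    of points inside the quarter circle approximates pi/4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples

print(estimate_pi(100_000))  # close to 3.14159 for large n
```

The approximation error shrinks on the order of 1/sqrt(n), which is why Monte Carlo methods trade sample count for accuracy.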
As shown in fig. 1, Monte Carlo tree search can be divided into four steps: selection, expansion, random simulation, and backpropagation. The selection phase starts at the root node and repeatedly selects child nodes until a leaf node is reached. Expanding the decision tree in the most promising direction is the essence of Monte Carlo tree search, i.e., selecting tree nodes with as much "potential" as possible. What makes a node promising? Either a high win rate or a small visit count. A node with a high win rate is likely to lead to a win and deserves deeper analysis; a node with few visits has not been fully explored and may yet prove a dark horse. Expansion occurs at the selected leaf node: if the outcome can already be decided, the round ends; otherwise one or more child nodes are created and one of them is selected. From that node the game is played out with a random policy until an outcome is reached (an exact return is obtained); this is the random simulation. In the final step, backpropagation starts at the leaf node and propagates the updated node statistics back toward the root.
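The four steps above can be sketched as a minimal, generic MCTS iteration. The node fields, the classic UCT scoring rule, and the `expand`/`rollout` hooks are illustrative assumptions; the patent itself later uses a dynamically adaptive variant of UCT:

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state          # problem-specific state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

def uct_score(node, c=1.41):
    """Upper Confidence bound for Trees: balances exploitation
    (average reward) against exploration (low visit count)."""
    if node.visits == 0:
        return float("inf")
    exploit = node.total_reward / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore

def mcts_iteration(root, expand, rollout):
    # 1. Selection: descend by UCT score until a leaf is reached.
    node = root
    while node.children:
        node = max(node.children, key=uct_score)
    # 2. Expansion: create children via the problem-specific hook.
    for child_state in expand(node.state):
        node.children.append(Node(child_state, parent=node))
    if node.children:
        node = random.choice(node.children)
    # 3. Random simulation: estimate the value of this node.
    reward = rollout(node.state)
    # 4. Backpropagation: update statistics up to the root.
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent
    return reward
```

Repeating `mcts_iteration` many times grows an unbalanced tree concentrated on high-reward branches, which is the behavior the later stages of the patent rely on.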
However, the Monte Carlo tree search method has a major drawback: its search space remains very large, and as a centralized planning method it scales poorly and is computationally expensive. So although in principle the Monte Carlo method uses a random policy, in practice certain "empirical" policies can be used; how to acquire such experience and apply it to Monte Carlo tree search is one of the problems addressed by the present invention. In addition, how to extend the Monte Carlo tree search method to distributed robot decision-making, forming an effective and general distributed planning method with reduced communication cost, is another problem addressed by the invention.
Disclosure of Invention
The technical problems the invention aims to solve are how to share and predict intention information among multiple robots, and how to use that information to guide the growth of a local search tree into a final decision. The invention provides a distributed planning method for multi-robot disaster search and rescue that enables multiple robots to cooperate in efficient search and rescue while avoiding danger.
To address these problems, the technical scheme of the invention is as follows:
A distributed task planning method for multiple robots in a dynamic environment comprises three stages: intention prediction, intention sharing, and fusion of intention sharing with intention prediction. It is realized by the following steps:
first, intention prediction: the robots share the currently perceivable (partial) environment information and their current probabilistic action decision sequences, and on that basis predict the currently unobservable environment, the future environment, and the teammates' action decisions, as follows:
1.1 form a Markov state transition matrix describing the environment's dynamics from expert experience;
1.2 the robots share their currently observable environment information, each locally accumulating a history of environment observations;
1.3 compute a prediction of the environment from the locally stored observation history and the Markov transition matrix;
1.4 predict the teammates' actions with a heuristic greedy method, i.e., a short-horizon approximation that assumes each teammate moves toward the nearest waypoint with the largest reward, finally forming an intention prediction for the teammates;
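Steps 1.1 through 1.4 can be sketched as follows. The transition matrix values, the waypoint representation, and the reward-versus-distance trade-off in the greedy score are illustrative assumptions, not values specified by the patent:

```python
import numpy as np

def predict_environment(belief: np.ndarray, transition: np.ndarray,
                        steps: int) -> np.ndarray:
    """Step 1.3: propagate a belief over environment states forward
    through an expert-supplied Markov transition matrix (step 1.1)."""
    for _ in range(steps):
        belief = belief @ transition
    return belief

def predict_teammate_action(teammate_pos, waypoints, rewards):
    """Step 1.4: greedy short-horizon heuristic. Assume the teammate
    heads for the waypoint with the best reward relative to distance;
    the 1/(1+dist) weighting is an assumption for illustration."""
    def score(i):
        dist = np.linalg.norm(np.asarray(waypoints[i], dtype=float)
                              - np.asarray(teammate_pos, dtype=float))
        return rewards[i] / (1.0 + dist)
    return max(range(len(waypoints)), key=score)
```

Because the prediction uses only the shared observation history and a fixed transition model, no extra communication is needed to refresh it each planning period.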
second, intention sharing: each robot forms a local behavior intention from its current Monte Carlo search tree, the behavior intention being represented as a probability distribution over action sequences, as follows:
2.1 each branch in the Monte Carlo search tree represents a decision over a future action sequence; compute the rewards stored in the branches that end at local leaf nodes;
2.2 select the subset of branches with the largest rewards, which corresponds to selecting the subset of action decision sequences with the largest rewards;
2.3 compute the probability distribution over action sequences on the principle that a sequence with a larger reward has a larger probability of being selected in the future, forming the robot's own behavior intention;
2.4 publish the local behavior intention on a topic through a loosely coupled publish/subscribe communication mechanism, while subscribing on the same topic to the behavior intention information of the other robots;
2.5 store the other robots' behavior intentions for the current planning stage of the current time step, forming the local record of behavior intentions used later to compute the joint reward;
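Steps 2.1 through 2.3 might be sketched as below. The softmax weighting and the top-k cutoff are illustrative assumptions; the patent only requires that larger rewards map to larger selection probabilities:

```python
import math

def form_intention(branch_rewards: dict, k: int = 3,
                   temperature: float = 1.0) -> dict:
    """Map branch rewards (action sequence -> reward) to a probability
    distribution over the k best action sequences (steps 2.2-2.3)."""
    top = sorted(branch_rewards.items(), key=lambda kv: kv[1],
                 reverse=True)[:k]
    weights = [math.exp(r / temperature) for _, r in top]
    total = sum(weights)
    return {seq: w / total for (seq, _), w in zip(top, weights)}

intent = form_intention({("up", "left"): 5.0,
                         ("up", "right"): 4.0,
                         ("down",): 1.0}, k=2)
# higher-reward sequences receive higher probability
```

The resulting dictionary is what step 2.4 would serialize and publish on the shared topic.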
thirdly, fusion of intention sharing and intention prediction: based on the local Monte Carlo search tree, compute a near-term reward from the shared intentions and supplement it with a long-term reward from the predicted intentions, which reduces communication, lets the planning algorithm look further ahead, and improves the planning result, as follows:
3.1 in the selection stage of the Monte Carlo tree, adopt a dynamically adaptive UCT method to balance exploration and exploitation;
3.2 in the expansion stage of the Monte Carlo search tree, adopt a forced downward-expansion strategy to force exploration in the depth direction of the tree;
3.3 in the random simulation stage of the Monte Carlo tree, compute a joint reward by splicing together two rewards, thereby guiding the unbalanced growth of the tree and forming the final joint plan; the two rewards are a short-term reward, computed from the actions expressed in the robot's own current tree branch together with actions sampled from the locally stored behavior intentions of the other robots, and a long-term reward, computed from the teammate-action prediction of the first stage;
3.4 in the backpropagation stage of the Monte Carlo tree, propagate the computed joint reward toward the root node of the tree, updating the statistics stored in the tree, including rewards and node visit counts; after 3.4, return to 3.1, forming a local inner loop;
3.5 after executing the inner loop of 3.1-3.4 a set number of times, return to 1.1 to form an outer loop, wherein the inner loop is executed 10000 times and the outer loop 100 times, producing an unbalanced search tree; iteratively following the branches with the largest reward through the tree yields the final decision sequence for the current planning time step.
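The reward splicing of step 3.3 might be sketched as below. The additive combination and the discount factor on the long-term part are illustrative assumptions; the patent specifies only that the two rewards are spliced into one joint reward:

```python
import random

def sample_teammate_actions(intentions):
    """Draw one action sequence per teammate from the shared
    intention distributions stored in step 2.5."""
    sampled = []
    for dist in intentions:            # dist: action sequence -> probability
        seqs, probs = zip(*dist.items())
        sampled.append(random.choices(seqs, weights=probs)[0])
    return sampled

def joint_reward(own_branch_reward, intentions, long_term_reward,
                 short_term_fn, discount=0.9):
    """Step 3.3: splice a short-term reward (own branch plus sampled
    teammate intentions) with a discounted long-term predicted reward."""
    teammate_seqs = sample_teammate_actions(intentions)
    short = own_branch_reward + short_term_fn(teammate_seqs)
    return short + discount * long_term_reward
```

The joint reward returned here is what step 3.4 would backpropagate through the local tree, so branches compatible with teammates' likely plans accumulate higher statistics.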
The invention can achieve the following beneficial effects:
firstly, it extends the Monte Carlo tree search method to distributed robot planning, constructing a very general distributed planning method applicable to all distributed sequential decision problems, i.e., planning problems that can be decided discretely step by step;
secondly, it can handle dynamic changes in the environment: because the environment state is predicted in the dynamically changing environment, this prediction can be decoupled from the distributed planning method into an independent prediction component, which the fusion stage then combines into a joint reward that guides the local Monte Carlo search tree toward a joint decision;
finally, the invention does not obtain all intention information between robots through sharing; on the contrary, most teammate intention information is obtained by prediction, which greatly reduces the communication cost. The lower communication cost makes the method suitable for environments with harsh communication conditions and thereby improves its generality.
Drawings
- FIG. 1 is a diagram of the Monte Carlo tree search process;
- FIG. 2 shows the overall framework of the invention;
- FIG. 3 is a flow chart of the method.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
A specific embodiment of the invention comprises the following steps:
A distributed task planning method for multiple robots in a dynamic environment, as shown in FIG. 2, comprises three stages: intention prediction, intention sharing, and fusion of intention sharing with intention prediction. It is realized by the following steps:
first, intention prediction: the robots share the currently perceivable (partial) environment information and their current probabilistic action decision sequences, and on that basis predict the currently unobservable environment, the future environment, and the teammates' action decisions, as follows:
1.1 form a Markov state transition matrix describing the environment's dynamics from expert experience;
1.2 the robots share their currently observable environment information, each locally accumulating a history of environment observations;
1.3 compute a prediction of the environment from the locally stored observation history and the Markov transition matrix;
1.4 predict the teammates' actions with a heuristic greedy method, i.e., a short-horizon approximation that assumes each teammate moves toward the nearest waypoint with the largest reward, finally forming an intention prediction for the teammates;
second, intention sharing: each robot forms a local behavior intention from its current Monte Carlo search tree, the behavior intention being represented as a probability distribution over action sequences, as follows:
2.1 each branch in the Monte Carlo search tree represents a decision over a future action sequence; compute the rewards stored in the branches that end at local leaf nodes;
2.2 select the subset of branches with the largest rewards, which corresponds to selecting the subset of action decision sequences with the largest rewards;
2.3 compute the probability distribution over action sequences on the principle that a sequence with a larger reward has a larger probability of being selected in the future, forming the robot's own behavior intention;
2.4 publish the local behavior intention on a topic through a loosely coupled publish/subscribe communication mechanism, while subscribing on the same topic to the behavior intention information of the other robots;
2.5 store the other robots' behavior intentions for the current planning stage of the current time step, forming the local record of behavior intentions used later to compute the joint reward;
thirdly, fusion of intention sharing and intention prediction: based on the local Monte Carlo search tree, compute a near-term reward from the shared intentions and supplement it with a long-term reward from the predicted intentions, which reduces communication, lets the planning algorithm look further ahead, and improves the planning result, as follows:
3.1 in the selection stage of the Monte Carlo tree, adopt a dynamically adaptive UCT method to balance exploration and exploitation;
3.2 in the expansion stage of the Monte Carlo search tree, adopt a forced downward-expansion strategy to force exploration in the depth direction of the tree;
3.3 in the random simulation stage of the Monte Carlo tree, compute a joint reward by splicing together two rewards, thereby guiding the unbalanced growth of the tree and forming the final joint plan; the two rewards are a short-term reward, computed from the actions expressed in the robot's own current tree branch together with actions sampled from the locally stored behavior intentions of the other robots, and a long-term reward, computed from the teammate-action prediction of the first stage;
3.4 in the backpropagation stage of the Monte Carlo tree, propagate the computed joint reward toward the root node of the tree, updating the statistics stored in the tree, including rewards and node visit counts; after 3.4, return to 3.1, forming a local inner loop;
3.5 after executing the inner loop of 3.1-3.4 a set number of times, return to 1.1 to form an outer loop, wherein the inner loop is executed 10000 times and the outer loop 100 times, producing an unbalanced search tree; iteratively following the branches with the largest reward through the tree yields the final decision sequence for the current planning time step.
As shown in FIG. 3, the inner loop refers mainly to the growth of the local Monte Carlo search tree; this layer centers on the fusion of intention sharing and intention prediction, i.e., the local joint reward is computed by combining the shared and predicted intentions, guiding the growth of the Monte Carlo search tree. In the outer loop, planning proceeds periodically: an intention is extracted from the current period's Monte Carlo search tree and shared with the other robots, while the robot's own local observations are likewise shared to form a global observation. Through continued iteration of the inner and outer loops, a global plan is finally formed.
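The inner/outer loop structure of FIG. 3 might be organized as in the driver below. The hook functions stand in for the patent's stages, and the loop counts are parameters (the embodiment uses 10000 inner and 100 outer iterations); this is a structural sketch, not a definitive implementation:

```python
def plan_time_step(grow_tree, extract_intention, share, predict,
                   inner_iters=10_000, outer_iters=100):
    """Outer loop: each planning period, publish/subscribe intentions
    and refresh predictions (stages one and two). Inner loop: grow the
    local Monte Carlo search tree under the fused joint reward (stage
    three). Returns the best decision found for this time step."""
    best = None
    for _ in range(outer_iters):
        shared = share(extract_intention())   # stage two: pub/sub intentions
        predicted = predict()                 # stage one: Markov + greedy prediction
        for _ in range(inner_iters):
            best = grow_tree(shared, predicted)  # stage three: one MCTS iteration
    return best
```

Separating the stages behind hooks mirrors the decoupling the patent claims as a benefit: the prediction component can be replaced without touching the tree search.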
The embodiments described above are presented to enable a person of ordinary skill in the art to make and use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without inventive effort. The invention is therefore not limited to the above embodiments; improvements and modifications made by those skilled in the art on the basis of this disclosure fall within the protection scope of the invention.
Claims (1)
1. A distributed task planning method for multiple robots in a dynamic environment, characterized by comprising three stages, namely intention prediction, intention sharing, and fusion of intention sharing with intention prediction, realized by the following steps:
first, intention prediction: the robots share the currently perceivable (partial) environment information and their current probabilistic action decision sequences, and on that basis predict the currently unobservable environment, the future environment, and the teammates' action decisions, as follows:
1.1 form a Markov state transition matrix describing the environment's dynamics from expert experience;
1.2 the robots share their currently observable environment information, each locally accumulating a history of environment observations;
1.3 compute a prediction of the environment from the locally stored observation history and the Markov transition matrix;
1.4 predict the teammates' actions with a heuristic greedy method, i.e., a short-horizon approximation that assumes each teammate moves toward the nearest waypoint with the largest reward, finally forming an intention prediction for the teammates;
second, intention sharing: each robot forms a local behavior intention from its current Monte Carlo search tree, the behavior intention being represented as a probability distribution over action sequences, as follows:
2.1 each branch in the Monte Carlo search tree represents a decision over a future action sequence; compute the rewards stored in the branches that end at local leaf nodes;
2.2 select the subset of branches with the largest rewards, which corresponds to selecting the subset of action decision sequences with the largest rewards;
2.3 compute the probability distribution over action sequences on the principle that a sequence with a larger reward has a larger probability of being selected in the future, forming the robot's own behavior intention;
2.4 publish the local behavior intention on a topic through a loosely coupled publish/subscribe communication mechanism, while subscribing on the same topic to the behavior intention information of the other robots;
2.5 store the other robots' behavior intentions for the current planning stage of the current time step, forming the local record of behavior intentions used later to compute the joint reward;
thirdly, fusion of intention sharing and intention prediction: based on the local Monte Carlo search tree, compute a near-term reward from the shared intentions and supplement it with a long-term reward from the predicted intentions, which reduces communication, lets the planning algorithm look further ahead, and improves the planning result, as follows:
3.1 in the selection stage of the Monte Carlo tree, adopt a dynamically adaptive UCT method to balance exploration and exploitation;
3.2 in the expansion stage of the Monte Carlo search tree, adopt a forced downward-expansion strategy to force exploration in the depth direction of the tree;
3.3 in the random simulation stage of the Monte Carlo tree, compute a joint reward by splicing together two rewards, thereby guiding the unbalanced growth of the tree and forming the final joint plan; the two rewards are a short-term reward, computed from the actions expressed in the robot's own current tree branch together with actions sampled from the locally stored behavior intentions of the other robots, and a long-term reward, computed from the teammate-action prediction of the first stage;
3.4 in the backpropagation stage of the Monte Carlo tree, propagate the computed joint reward toward the root node of the tree, updating the statistics stored in the tree, including rewards and node visit counts; after 3.4, return to 3.1, forming a local inner loop;
3.5 after executing the inner loop of 3.1-3.4 a set number of times, return to 1.1 to form an outer loop, wherein the inner loop is executed 10000 times and the outer loop 100 times, producing an unbalanced search tree; iteratively following the branches with the largest reward through the tree yields the final decision sequence for the current planning time step.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911022986.1A CN110674470B (en) | 2019-10-25 | 2019-10-25 | Distributed task planning method for multiple robots in dynamic environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110674470A true CN110674470A (en) | 2020-01-10 |
CN110674470B CN110674470B (en) | 2022-09-23 |
Family
ID=69084366
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911022986.1A Active CN110674470B (en) | 2019-10-25 | 2019-10-25 | Distributed task planning method for multiple robots in dynamic environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110674470B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112827174A (en) * | 2021-02-05 | 2021-05-25 | 清华大学 | Distributed multi-robot target searching method |
CN117521576A (en) * | 2024-01-08 | 2024-02-06 | 深圳鸿芯微纳技术有限公司 | Computing resource sharing method, device, equipment and medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103278164A (en) * | 2013-06-13 | 2013-09-04 | 北京大学深圳研究生院 | Planning method for simulated path of robot under complex dynamic scene and simulation platform |
CN109540150A (en) * | 2018-12-26 | 2019-03-29 | 北京化工大学 | One kind being applied to multi-robots Path Planning Method under harmful influence environment |
CN109839110A (en) * | 2019-01-09 | 2019-06-04 | 浙江大学 | A kind of multiple target point path planning method based on quick random search tree |
Non-Patent Citations (1)
Title |
---|
CHEN Yanjie et al., "Service robot path planning with incremental sampling of the local environment", Chinese Journal of Scientific Instrument (《仪器仪表学报》) *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112827174A (en) * | 2021-02-05 | 2021-05-25 | 清华大学 | Distributed multi-robot target searching method |
CN112827174B (en) * | 2021-02-05 | 2024-05-07 | 清华大学 | Distributed multi-robot target searching method |
CN117521576A (en) * | 2024-01-08 | 2024-02-06 | 深圳鸿芯微纳技术有限公司 | Computing resource sharing method, device, equipment and medium |
CN117521576B (en) * | 2024-01-08 | 2024-04-26 | 深圳鸿芯微纳技术有限公司 | Computing resource sharing method, device, equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
CN110674470B (en) | 2022-09-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA3060900A1 (en) | System and method for deep reinforcement learning | |
CN110674470B (en) | Distributed task planning method for multiple robots in dynamic environment | |
CN108363478B (en) | For wearable device deep learning application model load sharing system and method | |
CN109034670A (en) | Satellite on-orbit activity planning method and system | |
Wang et al. | Learning to traverse over graphs with a Monte Carlo tree search-based self-play framework | |
CN112221149B (en) | Artillery and soldier continuous intelligent combat drilling system based on deep reinforcement learning | |
JP2022522180A (en) | Insulation development path prediction methods, equipment, equipment and computer programs | |
Fu | Simulation-based algorithms for Markov decision processes: Monte Carlo tree search from AlphaGo to AlphaZero | |
Tang et al. | ADP with MCTS algorithm for Gomoku | |
Chen et al. | Policy gradient from demonstration and curiosity | |
Srivastava et al. | Implementation of ant colony optimization in economic load dispatch problem | |
Juan et al. | Optimization of fuzzy rule based on adaptive genetic algorithm and ant colony algorithm | |
Tong et al. | Enhancing rolling horizon evolution with policy and value networks | |
Galván-López et al. | Heuristic-based multi-agent monte carlo tree search | |
Vale et al. | A machine learning-based approach to accelerating computational design synthesis | |
CN113139644B (en) | Information source navigation method and device based on deep Monte Carlo tree search | |
CN110991712B (en) | Planning method and device for space debris removal task | |
CN112827174B (en) | Distributed multi-robot target searching method | |
CN114861368A (en) | Method for constructing railway longitudinal section design learning model based on near-end strategy | |
Itazuro et al. | Design environment of reinforcement learning agents for intelligent multiagent system | |
Uchiya et al. | IDEAL: Interactive design environment for agent system with learning mechanism | |
Zaw et al. | Verifying the gaming strategy of self-learning game by using PRISM-games | |
Yang | Multi-agent actor-critic reinforcement learning for argumentative dialogue systems | |
Omondi et al. | A Selection Variation for Improved Throughput and Accuracy of Monte Carlo Tree Search Algorithms | |
Waledzik et al. | Proactive and reactive risk-aware project scheduling |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||