CN110674470A - Distributed task planning method for multiple robots in dynamic environment - Google Patents

Distributed task planning method for multiple robots in dynamic environment Download PDF

Info

Publication number
CN110674470A
CN110674470A (application number CN201911022986.1A)
Authority
CN
China
Prior art keywords
tree
intention
reward
action
robots
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911022986.1A
Other languages
Chinese (zh)
Other versions
CN110674470B (en)
Inventor
杨文靖
王戟
徐利洋
杨绍武
黄达
李明龙
蔡中轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN201911022986.1A priority Critical patent/CN110674470B/en
Publication of CN110674470A publication Critical patent/CN110674470A/en
Application granted granted Critical
Publication of CN110674470B publication Critical patent/CN110674470B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10: Complex mathematical operations
    • G06F 17/18: Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063: Operations research, analysis or management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Data Mining & Analysis (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Mathematical Analysis (AREA)
  • Development Economics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Marketing (AREA)
  • Mathematical Optimization (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Algebra (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Educational Administration (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Feedback Control In General (AREA)
  • Manipulator (AREA)

Abstract

The invention belongs to the field of robotics and discloses a distributed task planning method for multiple robots in a dynamic environment. Its aim is to enable multiple robots, through distributed planning, to collect more information and avoid threats in a dynamic environment within a given time horizon. The technical scheme fuses intention sharing and intention prediction into a distributed planning method: the shared and predicted teammate intentions are merged into each robot's local search tree to form a global reward, which guides the local tree search and ultimately produces an effective decision. The invention has the advantages of low communication cost, generality, and high efficiency.

Description

Distributed task planning method for multiple robots in dynamic environment
Technical Field
The invention belongs to the field of robotics and relates to a multi-robot task planning method, in particular to a distributed task planning method for multiple robots in a dynamic environment. The method can be applied to distributed multi-robot planning in disaster search-and-rescue scenarios such as earthquakes, fires, and nuclear radiation leaks.
Background
The Monte Carlo method is a random sampling, or statistical testing, technique and a branch of computational mathematics; it was developed in the 1940s to serve the emerging atomic energy program. Traditional empirical methods cannot faithfully approximate the underlying physical process and therefore struggle to produce satisfactory results, whereas the Monte Carlo method can simulate the real physical process directly, so its solutions agree well with reality. It is a computational approach grounded in probability and statistics: random numbers (or, more commonly, pseudo-random numbers) are used to solve computational problems. The problem to be solved is associated with a probability model, and statistical simulation or sampling is performed on a computer to obtain an approximate solution.
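To make the Monte Carlo principle above concrete, here is a minimal, self-contained illustration (ours, not part of the patent): estimating π by associating it with a probability model (uniform sampling in the unit square) and counting samples.

```python
import random

def estimate_pi(n_samples=100_000, seed=0):
    """Estimate pi by sampling points uniformly in the unit square
    and counting how many fall inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # area of quarter circle / area of square = pi / 4
    return 4.0 * inside / n_samples
```

With 100,000 samples the estimate typically lands within about 0.01 of π, illustrating how sampling yields an approximate solution to a deterministic problem.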
As shown in FIG. 1, Monte Carlo tree search (MCTS) proceeds in four steps: selection, expansion, random simulation, and back-propagation. The selection phase starts at the root node and repeatedly chooses child nodes until a leaf is reached. Expanding the decision tree in the most promising direction is the essence of MCTS; that is, the search prefers tree nodes with the greatest "potential". What makes a node potential? Either a high win rate or a low visit count. A node with a high win rate is likely to lead to a winning position, so subsequent moves from it deserve more analysis; a node with a low visit count has not yet been studied sufficiently and may turn out to be a dark horse. Expansion takes place at the selected leaf: if the outcome can already be decided there, the playout ends; otherwise one or more child nodes are created and one of them is selected. From that node the game is played with a random policy until a win or loss is reached and an exact return is obtained; this is the random simulation step. In the last step, back-propagation starts from the leaf node and propagates the updated node statistics back toward the root.
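The four MCTS steps just described can be sketched in a compact, self-contained form. This is an illustrative implementation under simplifying assumptions (a single agent, a fixed rollout depth, UCT with a fixed exploration constant, the same action set at every node); all names are ours, not the patent's.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children, self.visits, self.total_reward = [], 0, 0.0

def uct_score(child, parent_visits, c=1.4):
    if child.visits == 0:
        return float("inf")  # unvisited nodes are tried first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore

def mcts(root_state, actions, step, reward, n_iter=1000, depth=5, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(n_iter):
        node = root
        # 1. selection: descend while the node is fully expanded
        while node.children and len(node.children) == len(actions):
            node = max(node.children, key=lambda ch: uct_score(ch, node.visits))
        # 2. expansion: add one untried child
        tried = {ch.action for ch in node.children}
        untried = [a for a in actions if a not in tried]
        if untried:
            a = rng.choice(untried)
            node.children.append(Node(step(node.state, a), node, a))
            node = node.children[-1]
        # 3. random simulation (rollout) with a random policy
        state, total = node.state, 0.0
        for _ in range(depth):
            state = step(state, rng.choice(actions))
            total += reward(state)
        # 4. back-propagation of the rollout return toward the root
        while node is not None:
            node.visits += 1
            node.total_reward += total
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).action
```

For example, on a toy one-dimensional walk toward position 5, `mcts(0, [-1, 1], step=lambda s, a: s + a, reward=lambda s: -abs(s - 5))` selects the action that moves toward the goal.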
However, Monte Carlo tree search has significant drawbacks: its search space remains very large, and as a centralized planning method it scales poorly and is computationally expensive. So, although in principle the Monte Carlo method uses a random policy, in practice certain "empirical" policies can be used instead; how to acquire such experience, and how to apply it within Monte Carlo tree search, is one problem addressed by the invention. In addition, how to extend Monte Carlo tree search to distributed robot decision-making, forming an effective and general distributed planning method that reduces communication cost, is another problem the invention is concerned with.
Disclosure of Invention
The technical problem the invention aims to solve is how to share and predict intention information among multiple robots, and how to use this information to guide the growth of each robot's local search tree into a final decision. The invention provides a distributed planning method for multi-robot disaster search and rescue that enables multiple robots to cooperate to carry out search and rescue efficiently while avoiding danger.
To address these problems, the technical scheme of the invention is as follows:
a distributed task planning method for multiple robots in a dynamic environment comprises three stages of intention prediction, intention sharing and intention prediction fusion, and is realized by the following steps:
first, intent prediction: the method comprises the following steps of sharing current partially-perceivable environment information and a current probabilistic action decision sequence among a plurality of robots, and predicting current unobservable environment, future environment and teammate action decisions based on the conditions, wherein the method comprises the following steps:
1.1 forming a Markov state transition matrix of an environment change rule through expert experience;
1.2 a plurality of robots share current observable environmental information to locally form historical environmental observation information;
1.3 based on the environment historical observation information stored locally and the prediction of the Markov dynamic transfer matrix computing environment;
1.4, predicting the actions of teammates by a greedy method based on heuristic factors, namely predicting the teammates to move towards the nearest path point with the maximum awarded reward by a short-term approximation method, and finally forming intention prediction for the teammates;
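The greedy short-horizon prediction of step 1.4 could look like the following sketch on a grid world. The waypoint representation, Manhattan-distance metric, and axis-by-axis movement are our illustrative assumptions, not details fixed by the patent.

```python
def predict_teammate_intention(teammate_pos, waypoints, horizon=3):
    """Greedy short-horizon prediction (step 1.4): assume the teammate
    heads for the nearest waypoint among those with the largest reward.
    `waypoints` maps (x, y) -> reward; returns the predicted path.
    All names here are illustrative, not from the patent text."""
    best_reward = max(waypoints.values())
    candidates = [p for p, r in waypoints.items() if r == best_reward]
    # nearest max-reward waypoint by Manhattan distance
    goal = min(candidates,
               key=lambda p: abs(p[0] - teammate_pos[0]) + abs(p[1] - teammate_pos[1]))
    path, (x, y) = [], teammate_pos
    for _ in range(horizon):
        if (x, y) == goal:
            break
        # move one grid cell toward the goal, axis by axis
        if x != goal[0]:
            x += 1 if goal[0] > x else -1
        elif y != goal[1]:
            y += 1 if goal[1] > y else -1
        path.append((x, y))
    return path
```

Predicting a few steps ahead like this gives each robot a cheap stand-in for teammate intentions when no shared message is available.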
second, intention sharing: each robot forms a local behavior intention from its current Monte Carlo search tree, the behavior intention being represented as a probability distribution over action sequences, through the following steps:
2.1 each branch in the Monte Carlo search tree represents a decision on a future action sequence; compute the rewards stored at the leaf nodes of the different local branches;
2.2 select the subset of branches with the largest rewards, i.e., the action decision sequences with the largest rewards;
2.3 compute a probability distribution over these action sequences, following the principle that sequences with larger rewards are more likely to be selected in the future, and form the robot's own behavior intention;
2.4 publish the local behavior intention on a topic through a loosely coupled publish/subscribe communication mechanism, while subscribing on the same topic to the behavior intention information of the other robots;
2.5 store the other robots' behavior intentions for the current planning stage of the current time step, forming local copies to be used later when computing the joint reward;
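Steps 2.1 to 2.3 can be sketched as follows: keep the top-k branches by stored reward and turn their rewards into selection probabilities. The softmax form is one natural choice consistent with "larger reward, larger probability"; the patent does not fix a particular formula, so this is an assumption.

```python
import math

def behavior_intention(branch_rewards, k=3, temperature=1.0):
    """Steps 2.1-2.3 as a sketch: keep the k action sequences with the
    largest rewards and map their rewards to a probability distribution
    (softmax), so larger rewards get larger probability.
    `branch_rewards` maps an action sequence (tuple) to its reward."""
    top = sorted(branch_rewards.items(), key=lambda kv: kv[1], reverse=True)[:k]
    z = sum(math.exp(r / temperature) for _, r in top)
    return {seq: math.exp(r / temperature) / z for seq, r in top}
```

The resulting dictionary is what a robot would publish on the shared topic in step 2.4 and what its teammates would sample from when computing the joint reward.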
thirdly, fusion of intention sharing and intention prediction: on the local Monte Carlo search tree, compute a short-term reward from the shared intentions and supplement it with a long-term reward from the predicted intentions, which reduces communication, lets the planning algorithm look further ahead, and improves the planning result, through the following steps:
3.1 in the selection stage of the Monte Carlo tree, use a dynamically adaptive UCT rule to balance exploration and exploitation;
3.2 in the expansion stage of the Monte Carlo search tree, use a forced downward-expansion strategy to force exploration in the depth direction of the tree;
3.3 in the random simulation stage of the Monte Carlo tree, compute a joint reward by splicing two rewards together, thereby guiding the unbalanced growth of the tree and forming the final joint plan; the two rewards are a short-term reward, computed from the actions on the current branch of the robot's own tree together with actions sampled from the locally stored behavior intentions of the other robots, and a long-term reward, computed from the teammate action predictions of the first stage;
3.4 in the back-propagation stage of the Monte Carlo tree, propagate the computed joint reward toward the root node, updating the statistics stored in the tree, including rewards and node visit counts; after executing 3.4, return to 3.1, forming a local inner loop;
3.5 after executing the inner loop of 3.4 a certain number of times, return to 1.1, forming an outer loop; the inner loop is executed 10000 times and the outer loop 100 times, producing an unbalanced search tree, from which the branch with the largest reward is extracted iteratively to form the final decision sequence for the current planning time step.
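The joint reward of step 3.3 can be sketched as below, assuming the shared intentions are the distributions produced in stage two and the predictions carry per-teammate long-term reward estimates. The splicing here is a simple sum of the two components, which is one plausible reading of the text; the structure and names are our assumptions.

```python
import random

def joint_reward(own_branch_actions, shared_intentions, predicted_rewards,
                 reward_fn, seed=None):
    """Step 3.3 as a sketch: the short-term reward scores the robot's own
    branch together with teammate actions sampled from the shared
    intentions; the long-term reward comes from the stage-1 predictions.
    Names and structure are illustrative assumptions."""
    rng = random.Random(seed)
    # sample one action sequence per teammate from its intention distribution
    sampled = {}
    for robot, dist in shared_intentions.items():
        seqs, probs = zip(*dist.items())
        sampled[robot] = rng.choices(seqs, weights=probs)[0]
    short_term = reward_fn(own_branch_actions, sampled)
    long_term = sum(predicted_rewards.values())
    return short_term + long_term
```

The returned value is what would be back-propagated toward the root in step 3.4, biasing the tree's growth toward branches that cooperate well with teammates.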
The invention can achieve the following beneficial effects:
firstly, the Monte Carlo tree search method is extended to distributed robot planning, constructing a very general distributed planning method applicable to any distributed sequential decision problem, i.e., any planning problem whose decisions can be made in discrete steps;
secondly, the method handles dynamic changes in the environment: because the environment state is predicted, the prediction can be decoupled from the distributed planner as an independent component, and in the fusion stage its output is combined into the joint reward that guides each local Monte Carlo search tree toward a joint decision;
finally, the invention does not obtain all intention information among the robots through sharing; on the contrary, the larger part of each teammate's intention information is obtained by prediction, which greatly reduces the communication cost. The reduced communication makes the method suitable for environments with harsh communication conditions, thereby improving the generality of the method.
Drawings
FIG. 1 illustrates the Monte Carlo tree search process;
FIG. 2 shows the overall framework of the invention;
FIG. 3 is a flow chart of the method.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
The specific implementation mode of the invention comprises the following steps:
a distributed task planning method for multiple robots in a dynamic environment is disclosed, as shown in FIG. 2, and comprises three stages of intention prediction, intention sharing, and fusion of intention sharing and intention prediction, and is realized by the following steps:
first, intent prediction: the method comprises the following steps of sharing current partially-perceivable environment information and a current probabilistic action decision sequence among a plurality of robots, and predicting current unobservable environment, future environment and teammate action decisions based on the conditions, wherein the method comprises the following steps:
1.1 forming a Markov state transition matrix of an environment change rule through expert experience;
1.2 a plurality of robots share current observable environmental information to locally form historical environmental observation information;
1.3 based on the environment historical observation information stored locally and the prediction of the Markov dynamic transfer matrix computing environment;
1.4, predicting the actions of teammates by a greedy method based on heuristic factors, namely predicting the teammates to move towards the nearest path point with the maximum awarded reward by a short-term approximation method, and finally forming intention prediction for the teammates;
second, intention sharing: the method comprises the following steps that a plurality of robots form local behavior intents according to a current Monte Carlo search tree, wherein the behavior intents are represented by action sequence probability distribution, and the method comprises the following steps:
2.1, one branch in the Monte Carlo tree search represents a decision for a future behavior action sequence, and the reward stored by different branches corresponding to local leaf nodes is calculated;
2.2, selecting a part of branches with the largest reward, and correspondingly selecting a part of action decision sequences with the largest reward;
2.3 calculating the probability distribution of the action sequence according to the principle that the probability of the action decision sequence is selected to be the maximum in the future when the reward is larger, and forming the action intention of the user;
2.4, local action intentions are published on a topic through a loosely coupled communication mechanism of publishing and subscribing, and meanwhile, action intention information of other robots is subscribed on the topic;
2.5 storing the behavior intention of other robots in the current planning stage of the current time step to form a local behavior intention for calculating joint rewards later;
thirdly, fusing intention sharing and intention prediction: the method comprises the following steps of based on a local Monte Carlo search tree, calculating a recent reward based on a shared intention, supplementing a long-term reward by a predicted intention, reducing communication, enabling a planning algorithm to look longer and improving a planning effect, and comprises the following steps:
3.1 in the selection stage of the Monte Carlo tree, adopting a dynamic self-adaptive UCT method to realize the balance of exploration and utilization;
3.2 in the expansion stage of the Monte Carlo search tree, adopting a strategy of forcing downward expansion to realize forced exploration on the tree in the depth direction of the tree;
3.3 in the random simulation phase of the Monte Carlo tree, calculating a joint reward by splicing two rewards, thereby guiding the unbalanced growth of the tree and forming a final joint plan, wherein the two rewards comprise a short-term reward and a long-term reward, the short-term reward is calculated by the action expressed in the branch of the current tree of the self and the action sampled from the action intentions of other robots stored locally, and the long-term reward is calculated by the prediction of the teammate action, namely the first step;
3.4 in the backward propagation stage of the Monte Carlo tree, propagating the calculated joint reward to the root node direction of the tree, updating the statistical information stored in the tree, including reward and node access times, returning to 3.1 after executing 3.4, and forming a local inner loop;
3.5 executing the inner loop in 3.4 for a certain number of times, returning to 1.1 to form an outer loop, wherein the inner loop is executed 10000 times, the outer loop is executed 100 times to form an unbalanced search tree, and iteratively searching the branches with the largest prize in the tree to form a final decision sequence of the current planning time step.
As shown in FIG. 3, the inner loop refers to the growth of the local Monte Carlo search tree; this layer performs the fusion of intention sharing and intention prediction, i.e., the local joint reward is computed by superposing the shared intentions and the predicted intentions so as to guide the growth of the tree. In the outer loop, planning proceeds periodically: at each period an intention is extracted from the current Monte Carlo search tree and shared with the other robots, and each robot's local observations are likewise shared so that a global observation is formed. Through continuous iteration of the inner and outer loops, a global plan is finally formed.
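The inner/outer loop structure of FIG. 3 can be summarized in pseudocode; every function name below is a placeholder of ours, not an identifier from the patent, and the sketch is not runnable as-is.

```python
# pseudocode: overall control flow of one planning time step
def plan_time_step(n_outer=100, n_inner=10000):
    tree = init_search_tree()
    for _ in range(n_outer):                     # outer loop: periodic planning
        share_observations()                     # local observations -> global observation
        publish(extract_intention(tree))         # periodic intention from the current tree
        teammate_intents = subscribe()           # intentions shared by the other robots
        predictions = predict_teammates()        # stage-1 intention prediction
        for _ in range(n_inner):                 # inner loop: tree growth (steps 3.1-3.4)
            grow_tree_once(tree, teammate_intents, predictions)
    return best_action_sequence(tree)            # step 3.5: extract the final decision
```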
The embodiments described above are presented to enable a person of ordinary skill in the art to make and use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without inventive effort. Therefore, the invention is not limited to the embodiments above; improvements and modifications made by those skilled in the art on the basis of this disclosure fall within the protection scope of the invention.

Claims (1)

1. A distributed task planning method for multiple robots in a dynamic environment, characterized by comprising three stages, namely intention prediction, intention sharing, and fusion of intention sharing and intention prediction, realized by the following steps:
first, intention prediction: the robots share the currently partially observable environment information and their current probabilistic action decision sequences, and on this basis predict the currently unobservable environment, the future environment, and the teammates' action decisions, through the following steps:
1.1 form a Markov state transition matrix describing the environment's dynamics from expert experience;
1.2 the robots share their current environmental observations, so that each robot locally accumulates a history of environmental observations;
1.3 compute a prediction of the environment based on the locally stored observation history and the Markov transition matrix;
1.4 predict the teammates' actions with a heuristic greedy method, i.e., as a short-horizon approximation, predict that each teammate moves toward the nearest waypoint carrying the largest reward, finally forming an intention prediction for the teammates;
second, intention sharing: each robot forms a local behavior intention from its current Monte Carlo search tree, the behavior intention being represented as a probability distribution over action sequences, through the following steps:
2.1 each branch in the Monte Carlo search tree represents a decision on a future action sequence; compute the rewards stored at the leaf nodes of the different local branches;
2.2 select the subset of branches with the largest rewards, i.e., the action decision sequences with the largest rewards;
2.3 compute a probability distribution over these action sequences, following the principle that sequences with larger rewards are more likely to be selected in the future, and form the robot's own behavior intention;
2.4 publish the local behavior intention on a topic through a loosely coupled publish/subscribe communication mechanism, while subscribing on the same topic to the behavior intention information of the other robots;
2.5 store the other robots' behavior intentions for the current planning stage of the current time step, forming local copies to be used later when computing the joint reward;
thirdly, fusion of intention sharing and intention prediction: on the local Monte Carlo search tree, compute a short-term reward from the shared intentions and supplement it with a long-term reward from the predicted intentions, which reduces communication, lets the planning algorithm look further ahead, and improves the planning result, through the following steps:
3.1 in the selection stage of the Monte Carlo tree, use a dynamically adaptive UCT rule to balance exploration and exploitation;
3.2 in the expansion stage of the Monte Carlo search tree, use a forced downward-expansion strategy to force exploration in the depth direction of the tree;
3.3 in the random simulation stage of the Monte Carlo tree, compute a joint reward by splicing two rewards together, thereby guiding the unbalanced growth of the tree and forming the final joint plan; the two rewards are a short-term reward, computed from the actions on the current branch of the robot's own tree together with actions sampled from the locally stored behavior intentions of the other robots, and a long-term reward, computed from the teammate action predictions of the first stage;
3.4 in the back-propagation stage of the Monte Carlo tree, propagate the computed joint reward toward the root node, updating the statistics stored in the tree, including rewards and node visit counts; after executing 3.4, return to 3.1, forming a local inner loop;
3.5 after executing the inner loop of 3.4 a certain number of times, return to 1.1, forming an outer loop; the inner loop is executed 10000 times and the outer loop 100 times, producing an unbalanced search tree, from which the branch with the largest reward is extracted iteratively to form the final decision sequence for the current planning time step.
CN201911022986.1A 2019-10-25 2019-10-25 Distributed task planning method for multiple robots in dynamic environment Active CN110674470B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911022986.1A CN110674470B (en) 2019-10-25 2019-10-25 Distributed task planning method for multiple robots in dynamic environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911022986.1A CN110674470B (en) 2019-10-25 2019-10-25 Distributed task planning method for multiple robots in dynamic environment

Publications (2)

Publication Number Publication Date
CN110674470A true CN110674470A (en) 2020-01-10
CN110674470B CN110674470B (en) 2022-09-23

Family

ID=69084366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911022986.1A Active CN110674470B (en) 2019-10-25 2019-10-25 Distributed task planning method for multiple robots in dynamic environment

Country Status (1)

Country Link
CN (1) CN110674470B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112827174A (en) * 2021-02-05 2021-05-25 清华大学 Distributed multi-robot target searching method
CN117521576A (en) * 2024-01-08 2024-02-06 深圳鸿芯微纳技术有限公司 Computing resource sharing method, device, equipment and medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103278164A (en) * 2013-06-13 2013-09-04 北京大学深圳研究生院 Planning method for simulated path of robot under complex dynamic scene and simulation platform
CN109540150A (en) * 2018-12-26 2019-03-29 北京化工大学 One kind being applied to multi-robots Path Planning Method under harmful influence environment
CN109839110A (en) * 2019-01-09 2019-06-04 浙江大学 A kind of multiple target point path planning method based on quick random search tree

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103278164A (en) * 2013-06-13 2013-09-04 北京大学深圳研究生院 Planning method for simulated path of robot under complex dynamic scene and simulation platform
CN109540150A (en) * 2018-12-26 2019-03-29 北京化工大学 One kind being applied to multi-robots Path Planning Method under harmful influence environment
CN109839110A (en) * 2019-01-09 2019-06-04 浙江大学 A kind of multiple target point path planning method based on quick random search tree

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN, Yanjie et al.: "Service robot path planning with incremental sampling of the local environment", Chinese Journal of Scientific Instrument (《仪器仪表学报》) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112827174A (en) * 2021-02-05 2021-05-25 清华大学 Distributed multi-robot target searching method
CN112827174B (en) * 2021-02-05 2024-05-07 清华大学 Distributed multi-robot target searching method
CN117521576A (en) * 2024-01-08 2024-02-06 深圳鸿芯微纳技术有限公司 Computing resource sharing method, device, equipment and medium
CN117521576B (en) * 2024-01-08 2024-04-26 深圳鸿芯微纳技术有限公司 Computing resource sharing method, device, equipment and medium

Also Published As

Publication number Publication date
CN110674470B (en) 2022-09-23

Similar Documents

Publication Publication Date Title
CA3060900A1 (en) System and method for deep reinforcement learning
CN110674470B (en) Distributed task planning method for multiple robots in dynamic environment
CN108363478B (en) For wearable device deep learning application model load sharing system and method
CN109034670A (en) Satellite on-orbit activity planning method and system
Wang et al. Learning to traverse over graphs with a Monte Carlo tree search-based self-play framework
CN112221149B (en) Artillery and soldier continuous intelligent combat drilling system based on deep reinforcement learning
JP2022522180A (en) Insulation development path prediction methods, equipment, equipment and computer programs
Fu Simulation-based algorithms for Markov decision processes: Monte Carlo tree search from AlphaGo to AlphaZero
Tang et al. ADP with MCTS algorithm for Gomoku
Chen et al. Policy gradient from demonstration and curiosity
Srivastava et al. Implementation of ant colony optimization in economic load dispatch problem
Juan et al. Optimization of fuzzy rule based on adaptive genetic algorithm and ant colony algorithm
Tong et al. Enhancing rolling horizon evolution with policy and value networks
Galván-López et al. Heuristic-based multi-agent monte carlo tree search
Vale et al. A machine learning-based approach to accelerating computational design synthesis
CN113139644B (en) Information source navigation method and device based on deep Monte Carlo tree search
CN110991712B (en) Planning method and device for space debris removal task
CN112827174B (en) Distributed multi-robot target searching method
CN114861368A (en) Method for constructing railway longitudinal section design learning model based on near-end strategy
Itazuro et al. Design environment of reinforcement learning agents for intelligent multiagent system
Uchiya et al. IDEAL: Interactive design environment for agent system with learning mechanism
Zaw et al. Verifying the gaming strategy of self-learning game by using PRISM-games
Yang Multi-agent actor-critic reinforcement learning for argumentative dialogue systems
Omondi et al. A Selection Variation for Improved Throughput and Accuracy of Monte Carlo Tree Search Algorithms
Waledzik et al. Proactive and reactive risk-aware project scheduling

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant