CN110674470A - Distributed task planning method for multiple robots in dynamic environment - Google Patents

Distributed task planning method for multiple robots in dynamic environment Download PDF

Info

Publication number
CN110674470A
CN110674470A (application number CN201911022986.1A)
Authority
CN
China
Prior art keywords
tree
intention
reward
action
robots
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911022986.1A
Other languages
Chinese (zh)
Other versions
CN110674470B (en)
Inventor
杨文靖
王戟
徐利洋
杨绍武
黄达
李明龙
蔡中轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN201911022986.1A priority Critical patent/CN110674470B/en
Publication of CN110674470A publication Critical patent/CN110674470A/en
Application granted granted Critical
Publication of CN110674470B publication Critical patent/CN110674470B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10: Complex mathematical operations
    • G06F 17/18: Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063: Operations research, analysis or management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Data Mining & Analysis (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Mathematical Analysis (AREA)
  • Development Economics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Marketing (AREA)
  • Mathematical Optimization (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Algebra (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Educational Administration (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Feedback Control In General (AREA)
  • Manipulator (AREA)

Abstract

The invention belongs to the field of robotics and discloses a distributed task planning method for multiple robots in a dynamic environment. Its aim is to enable multiple robots, through distributed planning, to collect more information and avoid threats in a dynamic environment within a given time horizon. The technical scheme fuses intention sharing and intention prediction into a distributed planning method: the shared and predicted teammate intentions are merged into each robot's local search tree to form a global reward, which guides the local tree search and ultimately produces an effective decision. The invention has the advantages of low communication cost, generality, and high efficiency.

Description

Distributed task planning method for multiple robots in dynamic environment
Technical Field
The invention belongs to the field of robotics and relates to a multi-robot task planning method, in particular to a distributed task planning method for multiple robots in a dynamic environment. The method can be applied to distributed multi-robot planning in disaster search-and-rescue scenarios such as earthquakes, fires, and nuclear radiation leaks.
Background
The Monte Carlo method is a random sampling, or statistical testing, technique and a branch of computational mathematics; it was developed in the 1940s to serve the emerging atomic energy program. Traditional empirical methods cannot faithfully approximate the underlying physical process and therefore struggle to produce satisfactory results, whereas the Monte Carlo method can simulate the real physical process directly, so its solutions agree well with reality. It is a computational approach grounded in probability and statistics: random numbers (or, more commonly, pseudo-random numbers) are used to solve computational problems. The problem to be solved is associated with a probability model, and statistical simulation or sampling is performed on a computer to obtain an approximate solution.
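To make the Monte Carlo principle above concrete, here is a minimal, self-contained illustration (ours, not part of the patent): estimating π by associating it with a probability model (uniform sampling in the unit square) and counting samples.

```python
import random

def estimate_pi(n_samples=100_000, seed=0):
    """Estimate pi by sampling points uniformly in the unit square
    and counting how many fall inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # area of quarter circle / area of square = pi / 4
    return 4.0 * inside / n_samples
```

With 100,000 samples the estimate typically lands within about 0.01 of π, illustrating how sampling yields an approximate solution to a deterministic problem.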
As shown in FIG. 1, Monte Carlo tree search (MCTS) proceeds in four steps: selection, expansion, random simulation, and back-propagation. The selection phase starts at the root node and repeatedly chooses child nodes until a leaf is reached. Expanding the decision tree in the most promising direction is the essence of MCTS; that is, the search prefers tree nodes with the greatest "potential". What makes a node potential? Either a high win rate or a low visit count. A node with a high win rate is likely to lead to a winning position, so subsequent moves from it deserve more analysis; a node with a low visit count has not yet been studied sufficiently and may turn out to be a dark horse. Expansion takes place at the selected leaf: if the outcome can already be decided there, the playout ends; otherwise one or more child nodes are created and one of them is selected. From that node the game is played with a random policy until a win or loss is reached and an exact return is obtained; this is the random simulation step. In the last step, back-propagation starts from the leaf node and propagates the updated node statistics back toward the root.
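The four MCTS steps just described can be sketched in a compact, self-contained form. This is an illustrative implementation under simplifying assumptions (a single agent, a fixed rollout depth, UCT with a fixed exploration constant, the same action set at every node); all names are ours, not the patent's.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children, self.visits, self.total_reward = [], 0, 0.0

def uct_score(child, parent_visits, c=1.4):
    if child.visits == 0:
        return float("inf")  # unvisited nodes are tried first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore

def mcts(root_state, actions, step, reward, n_iter=1000, depth=5, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(n_iter):
        node = root
        # 1. selection: descend while the node is fully expanded
        while node.children and len(node.children) == len(actions):
            node = max(node.children, key=lambda ch: uct_score(ch, node.visits))
        # 2. expansion: add one untried child
        tried = {ch.action for ch in node.children}
        untried = [a for a in actions if a not in tried]
        if untried:
            a = rng.choice(untried)
            node.children.append(Node(step(node.state, a), node, a))
            node = node.children[-1]
        # 3. random simulation (rollout) with a random policy
        state, total = node.state, 0.0
        for _ in range(depth):
            state = step(state, rng.choice(actions))
            total += reward(state)
        # 4. back-propagation of the rollout return toward the root
        while node is not None:
            node.visits += 1
            node.total_reward += total
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).action
```

For example, on a toy one-dimensional walk toward position 5, `mcts(0, [-1, 1], step=lambda s, a: s + a, reward=lambda s: -abs(s - 5))` selects the action that moves toward the goal.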
However, Monte Carlo tree search has significant drawbacks: its search space remains very large, and as a centralized planning method it scales poorly and is computationally expensive. So, although in principle the Monte Carlo method uses a random policy, in practice certain "empirical" policies can be used instead; how to acquire such experience, and how to apply it within Monte Carlo tree search, is one problem addressed by the invention. In addition, how to extend Monte Carlo tree search to distributed robot decision-making, forming an effective and general distributed planning method that reduces communication cost, is another problem the invention is concerned with.
Disclosure of Invention
The technical problem the invention aims to solve is how to share and predict intention information among multiple robots, and how to use this information to guide the growth of each robot's local search tree into a final decision. The invention provides a distributed planning method for multi-robot disaster search and rescue that enables multiple robots to cooperate to carry out search and rescue efficiently while avoiding danger.
To address these problems, the technical scheme of the invention is as follows:
a distributed task planning method for multiple robots in a dynamic environment comprises three stages of intention prediction, intention sharing and intention prediction fusion, and is realized by the following steps:
first, intent prediction: the method comprises the following steps of sharing current partially-perceivable environment information and a current probabilistic action decision sequence among a plurality of robots, and predicting current unobservable environment, future environment and teammate action decisions based on the conditions, wherein the method comprises the following steps:
1.1 forming a Markov state transition matrix of an environment change rule through expert experience;
1.2 a plurality of robots share current observable environmental information to locally form historical environmental observation information;
1.3 based on the environment historical observation information stored locally and the prediction of the Markov dynamic transfer matrix computing environment;
1.4, predicting the actions of teammates by a greedy method based on heuristic factors, namely predicting the teammates to move towards the nearest path point with the maximum awarded reward by a short-term approximation method, and finally forming intention prediction for the teammates;
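The greedy short-horizon prediction of step 1.4 could look like the following sketch on a grid world. The waypoint representation, Manhattan-distance metric, and axis-by-axis movement are our illustrative assumptions, not details fixed by the patent.

```python
def predict_teammate_intention(teammate_pos, waypoints, horizon=3):
    """Greedy short-horizon prediction (step 1.4): assume the teammate
    heads for the nearest waypoint among those with the largest reward.
    `waypoints` maps (x, y) -> reward; returns the predicted path.
    All names here are illustrative, not from the patent text."""
    best_reward = max(waypoints.values())
    candidates = [p for p, r in waypoints.items() if r == best_reward]
    # nearest max-reward waypoint by Manhattan distance
    goal = min(candidates,
               key=lambda p: abs(p[0] - teammate_pos[0]) + abs(p[1] - teammate_pos[1]))
    path, (x, y) = [], teammate_pos
    for _ in range(horizon):
        if (x, y) == goal:
            break
        # move one grid cell toward the goal, axis by axis
        if x != goal[0]:
            x += 1 if goal[0] > x else -1
        elif y != goal[1]:
            y += 1 if goal[1] > y else -1
        path.append((x, y))
    return path
```

Predicting a few steps ahead like this gives each robot a cheap stand-in for teammate intentions when no shared message is available.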
second, intention sharing: each robot forms a local behavior intention from its current Monte Carlo search tree, the behavior intention being represented as a probability distribution over action sequences, through the following steps:
2.1 each branch in the Monte Carlo search tree represents a decision on a future action sequence; compute the rewards stored at the leaf nodes of the different local branches;
2.2 select the subset of branches with the largest rewards, i.e., the action decision sequences with the largest rewards;
2.3 compute a probability distribution over these action sequences, following the principle that sequences with larger rewards are more likely to be selected in the future, and form the robot's own behavior intention;
2.4 publish the local behavior intention on a topic through a loosely coupled publish/subscribe communication mechanism, while subscribing on the same topic to the behavior intention information of the other robots;
2.5 store the other robots' behavior intentions for the current planning stage of the current time step, forming local copies to be used later when computing the joint reward;
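Steps 2.1 to 2.3 can be sketched as follows: keep the top-k branches by stored reward and turn their rewards into selection probabilities. The softmax form is one natural choice consistent with "larger reward, larger probability"; the patent does not fix a particular formula, so this is an assumption.

```python
import math

def behavior_intention(branch_rewards, k=3, temperature=1.0):
    """Steps 2.1-2.3 as a sketch: keep the k action sequences with the
    largest rewards and map their rewards to a probability distribution
    (softmax), so larger rewards get larger probability.
    `branch_rewards` maps an action sequence (tuple) to its reward."""
    top = sorted(branch_rewards.items(), key=lambda kv: kv[1], reverse=True)[:k]
    z = sum(math.exp(r / temperature) for _, r in top)
    return {seq: math.exp(r / temperature) / z for seq, r in top}
```

The resulting dictionary is what a robot would publish on the shared topic in step 2.4 and what its teammates would sample from when computing the joint reward.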
thirdly, fusion of intention sharing and intention prediction: on the local Monte Carlo search tree, compute a short-term reward from the shared intentions and supplement it with a long-term reward from the predicted intentions, which reduces communication, lets the planning algorithm look further ahead, and improves the planning result, through the following steps:
3.1 in the selection stage of the Monte Carlo tree, use a dynamically adaptive UCT rule to balance exploration and exploitation;
3.2 in the expansion stage of the Monte Carlo search tree, use a forced downward-expansion strategy to force exploration in the depth direction of the tree;
3.3 in the random simulation stage of the Monte Carlo tree, compute a joint reward by splicing two rewards together, thereby guiding the unbalanced growth of the tree and forming the final joint plan; the two rewards are a short-term reward, computed from the actions on the current branch of the robot's own tree together with actions sampled from the locally stored behavior intentions of the other robots, and a long-term reward, computed from the teammate action predictions of the first stage;
3.4 in the back-propagation stage of the Monte Carlo tree, propagate the computed joint reward toward the root node, updating the statistics stored in the tree, including rewards and node visit counts; after executing 3.4, return to 3.1, forming a local inner loop;
3.5 after executing the inner loop of 3.4 a certain number of times, return to 1.1, forming an outer loop; the inner loop is executed 10000 times and the outer loop 100 times, producing an unbalanced search tree, from which the branch with the largest reward is extracted iteratively to form the final decision sequence for the current planning time step.
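The joint reward of step 3.3 can be sketched as below, assuming the shared intentions are the distributions produced in stage two and the predictions carry per-teammate long-term reward estimates. The splicing here is a simple sum of the two components, which is one plausible reading of the text; the structure and names are our assumptions.

```python
import random

def joint_reward(own_branch_actions, shared_intentions, predicted_rewards,
                 reward_fn, seed=None):
    """Step 3.3 as a sketch: the short-term reward scores the robot's own
    branch together with teammate actions sampled from the shared
    intentions; the long-term reward comes from the stage-1 predictions.
    Names and structure are illustrative assumptions."""
    rng = random.Random(seed)
    # sample one action sequence per teammate from its intention distribution
    sampled = {}
    for robot, dist in shared_intentions.items():
        seqs, probs = zip(*dist.items())
        sampled[robot] = rng.choices(seqs, weights=probs)[0]
    short_term = reward_fn(own_branch_actions, sampled)
    long_term = sum(predicted_rewards.values())
    return short_term + long_term
```

The returned value is what would be back-propagated toward the root in step 3.4, biasing the tree's growth toward branches that cooperate well with teammates.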
The invention can achieve the following beneficial effects:
firstly, the Monte Carlo tree search method is extended to distributed robot planning, constructing a very general distributed planning method applicable to any distributed sequential decision problem, i.e., any planning problem whose decisions can be made in discrete steps;
secondly, the method handles dynamic changes in the environment: because the environment state is predicted, the prediction can be decoupled from the distributed planner as an independent component, and in the fusion stage its output is combined into the joint reward that guides each local Monte Carlo search tree toward a joint decision;
finally, the invention does not obtain all intention information among the robots through sharing; on the contrary, the larger part of each teammate's intention information is obtained by prediction, which greatly reduces the communication cost. The reduced communication makes the method suitable for environments with harsh communication conditions, thereby improving the generality of the method.
Drawings
FIG. 1 illustrates the Monte Carlo tree search process;
FIG. 2 shows the overall framework of the invention;
FIG. 3 is a flow chart of the method.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
The specific implementation mode of the invention comprises the following steps:
a distributed task planning method for multiple robots in a dynamic environment is disclosed, as shown in FIG. 2, and comprises three stages of intention prediction, intention sharing, and fusion of intention sharing and intention prediction, and is realized by the following steps:
first, intent prediction: the method comprises the following steps of sharing current partially-perceivable environment information and a current probabilistic action decision sequence among a plurality of robots, and predicting current unobservable environment, future environment and teammate action decisions based on the conditions, wherein the method comprises the following steps:
1.1 forming a Markov state transition matrix of an environment change rule through expert experience;
1.2 a plurality of robots share current observable environmental information to locally form historical environmental observation information;
1.3 based on the environment historical observation information stored locally and the prediction of the Markov dynamic transfer matrix computing environment;
1.4, predicting the actions of teammates by a greedy method based on heuristic factors, namely predicting the teammates to move towards the nearest path point with the maximum awarded reward by a short-term approximation method, and finally forming intention prediction for the teammates;
second, intention sharing: the method comprises the following steps that a plurality of robots form local behavior intents according to a current Monte Carlo search tree, wherein the behavior intents are represented by action sequence probability distribution, and the method comprises the following steps:
2.1, one branch in the Monte Carlo tree search represents a decision for a future behavior action sequence, and the reward stored by different branches corresponding to local leaf nodes is calculated;
2.2, selecting a part of branches with the largest reward, and correspondingly selecting a part of action decision sequences with the largest reward;
2.3 calculating the probability distribution of the action sequence according to the principle that the probability of the action decision sequence is selected to be the maximum in the future when the reward is larger, and forming the action intention of the user;
2.4, local action intentions are published on a topic through a loosely coupled communication mechanism of publishing and subscribing, and meanwhile, action intention information of other robots is subscribed on the topic;
2.5 storing the behavior intention of other robots in the current planning stage of the current time step to form a local behavior intention for calculating joint rewards later;
thirdly, fusing intention sharing and intention prediction: the method comprises the following steps of based on a local Monte Carlo search tree, calculating a recent reward based on a shared intention, supplementing a long-term reward by a predicted intention, reducing communication, enabling a planning algorithm to look longer and improving a planning effect, and comprises the following steps:
3.1 in the selection stage of the Monte Carlo tree, adopting a dynamic self-adaptive UCT method to realize the balance of exploration and utilization;
3.2 in the expansion stage of the Monte Carlo search tree, adopting a strategy of forcing downward expansion to realize forced exploration on the tree in the depth direction of the tree;
3.3 in the random simulation phase of the Monte Carlo tree, calculating a joint reward by splicing two rewards, thereby guiding the unbalanced growth of the tree and forming a final joint plan, wherein the two rewards comprise a short-term reward and a long-term reward, the short-term reward is calculated by the action expressed in the branch of the current tree of the self and the action sampled from the action intentions of other robots stored locally, and the long-term reward is calculated by the prediction of the teammate action, namely the first step;
3.4 in the backward propagation stage of the Monte Carlo tree, propagating the calculated joint reward to the root node direction of the tree, updating the statistical information stored in the tree, including reward and node access times, returning to 3.1 after executing 3.4, and forming a local inner loop;
3.5 executing the inner loop in 3.4 for a certain number of times, returning to 1.1 to form an outer loop, wherein the inner loop is executed 10000 times, the outer loop is executed 100 times to form an unbalanced search tree, and iteratively searching the branches with the largest prize in the tree to form a final decision sequence of the current planning time step.
As shown in FIG. 3, the inner loop refers to the growth of the local Monte Carlo search tree; this layer performs the fusion of intention sharing and intention prediction, i.e., the local joint reward is computed by superposing the shared intentions and the predicted intentions so as to guide the growth of the tree. In the outer loop, planning proceeds periodically: at each period an intention is extracted from the current Monte Carlo search tree and shared with the other robots, and each robot's local observations are likewise shared so that a global observation is formed. Through continuous iteration of the inner and outer loops, a global plan is finally formed.
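The inner/outer loop structure of FIG. 3 can be summarized in pseudocode; every function name below is a placeholder of ours, not an identifier from the patent, and the sketch is not runnable as-is.

```python
# pseudocode: overall control flow of one planning time step
def plan_time_step(n_outer=100, n_inner=10000):
    tree = init_search_tree()
    for _ in range(n_outer):                     # outer loop: periodic planning
        share_observations()                     # local observations -> global observation
        publish(extract_intention(tree))         # periodic intention from the current tree
        teammate_intents = subscribe()           # intentions shared by the other robots
        predictions = predict_teammates()        # stage-1 intention prediction
        for _ in range(n_inner):                 # inner loop: tree growth (steps 3.1-3.4)
            grow_tree_once(tree, teammate_intents, predictions)
    return best_action_sequence(tree)            # step 3.5: extract the final decision
```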
The embodiments described above are presented to enable a person of ordinary skill in the art to make and use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without inventive effort. Therefore, the invention is not limited to the embodiments above; improvements and modifications made by those skilled in the art on the basis of this disclosure fall within the protection scope of the invention.

Claims (1)

1. A distributed task planning method for multiple robots in a dynamic environment, characterized by comprising three stages, namely intention prediction, intention sharing, and fusion of intention sharing and intention prediction, realized by the following steps:
first, intention prediction: the robots share the currently partially observable environment information and their current probabilistic action decision sequences, and on this basis predict the currently unobservable environment, the future environment, and the teammates' action decisions, through the following steps:
1.1 form a Markov state transition matrix describing the environment's dynamics from expert experience;
1.2 the robots share their current environmental observations, so that each robot locally accumulates a history of environmental observations;
1.3 compute a prediction of the environment based on the locally stored observation history and the Markov transition matrix;
1.4 predict the teammates' actions with a heuristic greedy method, i.e., as a short-horizon approximation, predict that each teammate moves toward the nearest waypoint carrying the largest reward, finally forming an intention prediction for the teammates;
second, intention sharing: each robot forms a local behavior intention from its current Monte Carlo search tree, the behavior intention being represented as a probability distribution over action sequences, through the following steps:
2.1 each branch in the Monte Carlo search tree represents a decision on a future action sequence; compute the rewards stored at the leaf nodes of the different local branches;
2.2 select the subset of branches with the largest rewards, i.e., the action decision sequences with the largest rewards;
2.3 compute a probability distribution over these action sequences, following the principle that sequences with larger rewards are more likely to be selected in the future, and form the robot's own behavior intention;
2.4 publish the local behavior intention on a topic through a loosely coupled publish/subscribe communication mechanism, while subscribing on the same topic to the behavior intention information of the other robots;
2.5 store the other robots' behavior intentions for the current planning stage of the current time step, forming local copies to be used later when computing the joint reward;
thirdly, fusion of intention sharing and intention prediction: on the local Monte Carlo search tree, compute a short-term reward from the shared intentions and supplement it with a long-term reward from the predicted intentions, which reduces communication, lets the planning algorithm look further ahead, and improves the planning result, through the following steps:
3.1 in the selection stage of the Monte Carlo tree, use a dynamically adaptive UCT rule to balance exploration and exploitation;
3.2 in the expansion stage of the Monte Carlo search tree, use a forced downward-expansion strategy to force exploration in the depth direction of the tree;
3.3 in the random simulation stage of the Monte Carlo tree, compute a joint reward by splicing two rewards together, thereby guiding the unbalanced growth of the tree and forming the final joint plan; the two rewards are a short-term reward, computed from the actions on the current branch of the robot's own tree together with actions sampled from the locally stored behavior intentions of the other robots, and a long-term reward, computed from the teammate action predictions of the first stage;
3.4 in the back-propagation stage of the Monte Carlo tree, propagate the computed joint reward toward the root node, updating the statistics stored in the tree, including rewards and node visit counts; after executing 3.4, return to 3.1, forming a local inner loop;
3.5 after executing the inner loop of 3.4 a certain number of times, return to 1.1, forming an outer loop; the inner loop is executed 10000 times and the outer loop 100 times, producing an unbalanced search tree, from which the branch with the largest reward is extracted iteratively to form the final decision sequence for the current planning time step.
CN201911022986.1A 2019-10-25 2019-10-25 Distributed task planning method for multiple robots in dynamic environment Active CN110674470B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911022986.1A CN110674470B (en) 2019-10-25 2019-10-25 Distributed task planning method for multiple robots in dynamic environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911022986.1A CN110674470B (en) 2019-10-25 2019-10-25 Distributed task planning method for multiple robots in dynamic environment

Publications (2)

Publication Number Publication Date
CN110674470A true CN110674470A (en) 2020-01-10
CN110674470B CN110674470B (en) 2022-09-23

Family

ID=69084366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911022986.1A Active CN110674470B (en) 2019-10-25 2019-10-25 Distributed task planning method for multiple robots in dynamic environment

Country Status (1)

Country Link
CN (1) CN110674470B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112827174A (en) * 2021-02-05 2021-05-25 清华大学 Distributed multi-robot target searching method
CN117521576A (en) * 2024-01-08 2024-02-06 深圳鸿芯微纳技术有限公司 Computing resource sharing method, device, equipment and medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103278164A (en) * 2013-06-13 2013-09-04 北京大学深圳研究生院 Planning method for simulated path of robot under complex dynamic scene and simulation platform
CN109540150A (en) * 2018-12-26 2019-03-29 北京化工大学 One kind being applied to multi-robots Path Planning Method under harmful influence environment
CN109839110A (en) * 2019-01-09 2019-06-04 浙江大学 A kind of multiple target point path planning method based on quick random search tree

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103278164A (en) * 2013-06-13 2013-09-04 北京大学深圳研究生院 Planning method for simulated path of robot under complex dynamic scene and simulation platform
CN109540150A (en) * 2018-12-26 2019-03-29 北京化工大学 One kind being applied to multi-robots Path Planning Method under harmful influence environment
CN109839110A (en) * 2019-01-09 2019-06-04 浙江大学 A kind of multiple target point path planning method based on quick random search tree

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN, Yanjie et al.: "Service robot path planning with incremental sampling of the local environment", Chinese Journal of Scientific Instrument (《仪器仪表学报》) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112827174A (en) * 2021-02-05 2021-05-25 清华大学 Distributed multi-robot target searching method
CN112827174B (en) * 2021-02-05 2024-05-07 清华大学 Distributed multi-robot target searching method
CN117521576A (en) * 2024-01-08 2024-02-06 深圳鸿芯微纳技术有限公司 Computing resource sharing method, device, equipment and medium
CN117521576B (en) * 2024-01-08 2024-04-26 深圳鸿芯微纳技术有限公司 Computing resource sharing method, device, equipment and medium

Also Published As

Publication number Publication date
CN110674470B (en) 2022-09-23

Similar Documents

Publication Publication Date Title
CA3060900A1 (en) System and method for deep reinforcement learning
CN110674470B (en) Distributed task planning method for multiple robots in dynamic environment
CN108363478B (en) For wearable device deep learning application model load sharing system and method
CN109034670A (en) Satellite on-orbit activity planning method and system
Wang et al. Learning to traverse over graphs with a Monte Carlo tree search-based self-play framework
CN112221149B (en) Artillery and soldier continuous intelligent combat drilling system based on deep reinforcement learning
JP2022522180A (en) Insulation development path prediction methods, equipment, equipment and computer programs
Fu Simulation-based algorithms for Markov decision processes: Monte Carlo tree search from AlphaGo to AlphaZero
Tang et al. ADP with MCTS algorithm for Gomoku
Chen et al. Policy gradient from demonstration and curiosity
Srivastava et al. Implementation of ant colony optimization in economic load dispatch problem
Juan et al. Optimization of fuzzy rule based on adaptive genetic algorithm and ant colony algorithm
Tong et al. Enhancing rolling horizon evolution with policy and value networks
Galván-López et al. Heuristic-based multi-agent monte carlo tree search
Vale et al. A machine learning-based approach to accelerating computational design synthesis
CN113139644B (en) Information source navigation method and device based on deep Monte Carlo tree search
CN110991712B (en) Planning method and device for space debris removal task
CN112827174B (en) Distributed multi-robot target searching method
CN114861368A (en) Method for constructing railway longitudinal section design learning model based on near-end strategy
Itazuro et al. Design environment of reinforcement learning agents for intelligent multiagent system
Uchiya et al. IDEAL: Interactive design environment for agent system with learning mechanism
Zaw et al. Verifying the gaming strategy of self-learning game by using PRISM-games
Yang Multi-agent actor-critic reinforcement learning for argumentative dialogue systems
Omondi et al. A Selection Variation for Improved Throughput and Accuracy of Monte Carlo Tree Search Algorithms
Waledzik et al. Proactive and reactive risk-aware project scheduling

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant