CN110989352B

CN110989352B - Group robot collaborative search method based on Monte Carlo tree search algorithm

Info

Publication number: CN110989352B
Application number: CN201911272386.0A
Authority: CN
Inventors: 丁肇红; 吴莹莹; 温晓静
Original assignee: Shanghai Institute of Technology
Current assignee: Shanghai Institute of Technology
Priority date: 2019-12-06
Filing date: 2019-12-06
Publication date: 2022-05-27
Anticipated expiration: 2039-12-06
Also published as: CN110989352A

Abstract

The invention discloses a group robot collaborative search method based on a Monte Carlo tree search algorithm, and belongs to the technical field of multi-agent target search. The method comprises the following steps: setting target positions according to the target monitoring area of a ground mobile robot group; treating the trajectory planning of each ground mobile robot as a two-dimensional trajectory planning problem, determining the possible access node sequences of each ground mobile robot during collaborative search based on the Monte Carlo tree search algorithm, and optimizing the probability distribution over the robot's access sequences by a probability descent method; communicating with the other ground robots, updating the joint probability distribution of the access sequences of the ground robot group, and selecting the first node of the access sequence with the highest probability as the robot's next access node; and, taking the motion constraints of the ground robots into account, generating smooth closed-loop trajectories for the ground mobile robots from piecewise smooth curves, thereby solving the optimal observation problem of cooperative area target search by a ground mobile robot group under a time constraint.

Description

Group robot collaborative search method based on Monte Carlo tree search algorithm

Technical Field

The invention belongs to the technical field of multi-agent regional monitoring, and particularly relates to a group robot collaborative search method based on a Monte Carlo tree search algorithm.

Background

At present, multi-agent environment perception technology is mainly used to passively complete tasks such as environment detection, target identification and tracking, real-time localization, and map construction, and most of this work involves only a single agent. Research on ground mobile robot groups has concentrated mostly on centralized formation of the group, communication mechanisms between robots, and task and resource allocation among multiple robots; collaborative target search by a mobile robot group has rarely been studied. Deep learning, which has developed rapidly, currently focuses mainly on processing text, images, video, and similar data, but it is time-consuming and computationally complex, so it cannot be applied to a practical multi-robot system. In a complex, large-scale, dynamic environment, a robot must exchange a large amount of information with its environment, and deep learning methods do not allow it to perceive an active target well.

In the existing literature, target search by mobile robots focuses on known static environments: the environment is discretized and a traditional search algorithm finds a path between a start point and an end point, which is the absolute shortest path at the resolution of the environment map. Robot trajectory planning has also been carried out with an improved A* algorithm and with the particle swarm algorithm. However, A* searches slowly and is computationally expensive, so it is difficult to find the optimal trajectories of a robot group while satisfying multiple constraints. The particle swarm algorithm divides the planning space coarsely, has difficulty satisfying motion constraints, cannot search outside the set of candidate paths, and is not accurate enough, so it also struggles to find the optimal trajectory in complex environments under multiple constraints. Moreover, most research on mobile robot target search focuses on single-robot applications and rarely addresses collaborative search and perception by robot groups.

The Monte Carlo tree search algorithm is a game-tree search algorithm that uses Monte Carlo simulation for evaluation; it requires little domain knowledge and is highly extensible. The upper confidence bound (UCB) strategy is a method for solving the multi-armed bandit problem, and UCT (Upper Confidence bounds applied to Trees), which applies the UCB strategy within Monte Carlo tree search, has been shown to greatly improve the playing strength of computer game engines. At present the algorithm is applied mainly in game development; a few papers apply Monte Carlo tree search to online trajectory planning of a single robot, mostly in two-dimensional space. In the field of robot group target search, no patent yet uses this algorithm to accomplish a target search task.

Disclosure of Invention

For large-scale, incompletely known, unstructured complex environments, the invention uses the Monte Carlo tree search algorithm and segmented Dubins planning to solve the problems of rapid cooperative regional trajectory planning and target search for a robot group under multiple constraints. The method plans the most effective trajectory for each ground mobile robot so that targets in the area are searched and observed at minimum cost and maximum speed, and area information is collected in real time as fully as possible, achieving effective monitoring of a large-scale area.

To achieve the above purpose, the technical solution adopted by the invention is as follows:

A group robot collaborative search method based on a Monte Carlo tree search algorithm comprises the following steps:

step 1: set parameters, including the travel time budget threshold of the ground mobile robot group and the number of iterations of the Monte Carlo tree search algorithm; set n target positions for the ground mobile robot group according to its target monitoring area. The ground mobile robot group comprises N independently moving ground mobile robots, each equipped with a vision sensor; the monitoring radius of each robot is set according to the sensing distance of its vision sensor and serves as the neighborhood of each target position;

step 2: randomly select an initial position for each ground mobile robot in the environment to be explored; each ground mobile robot obtains an access node sequence in parallel based on the Monte Carlo tree search algorithm and explores the environment. During observation, the more data the vision sensor and lidar collect, the higher the reward value. The robot group forms access sequences that satisfy the time constraint according to a greedy principle, and after observation each ground mobile robot returns to its own initial position, so that each robot forms a closed-loop trajectory;

step 3: repeat step 2 until the time budget of the ground mobile robot group is exhausted or the maximum number of iterations is reached, and output the access node sequences of the ground mobile robot group;

step 4: each ground mobile robot in the group executes the above steps synchronously, and the planned trajectories let the ground mobile robots visit as many target positions as possible in the shortest time, achieving a rapid cooperative target search task.
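A minimal sketch of the step 1 parameters and the neighborhood test they imply follows. The class `SearchConfig`, its field names, and the example values are hypothetical; the only rule taken from the text is that a target counts as observed once a robot is within the monitoring radius R of it.

```python
import math
from dataclasses import dataclass, field

@dataclass
class SearchConfig:
    """Hypothetical container for the step 1 parameters."""
    time_budget: float     # travel time budget threshold of the robot group
    max_iterations: int    # iteration count of the Monte Carlo tree search
    monitor_radius: float  # R, set from the vision sensor's sensing distance
    num_robots: int        # N independently moving ground robots
    targets: list = field(default_factory=list)  # the n target positions (x, y)

def observed_targets(config, robot_xy):
    """Indices of targets whose neighborhood (radius R) contains the robot."""
    rx, ry = robot_xy
    return [i for i, (tx, ty) in enumerate(config.targets)
            if math.hypot(tx - rx, ty - ry) <= config.monitor_radius]

cfg = SearchConfig(time_budget=600.0, max_iterations=1000, monitor_radius=2.0,
                   num_robots=5, targets=[(5.0, 5.0), (12.0, 3.0)])
print(observed_targets(cfg, (4.0, 4.0)))  # [0]
```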
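The greedy construction of a time-feasible closed loop in step 2 can be sketched for one robot as below. This is a sketch under stated assumptions, not the patented procedure: travel time is approximated by straight-line distance over speed (the Dubins curves of step 24 are ignored), each target's `reward` stands in for the amount of vision and lidar data expected there, and `greedy_closed_loop` is a hypothetical name.

```python
import math

def greedy_closed_loop(start, targets, rewards, speed, budget):
    """Greedily build a visit order that can still return to `start` in time.

    start:   (x, y) initial position of one robot
    targets: list of (x, y) target positions
    rewards: reward of each target (more expected sensor data -> higher)
    Returns (list of target indices, total travel time including the return).
    """
    def travel(a, b):
        return math.dist(a, b) / speed  # straight-line time (an approximation)

    pos, used, order = start, 0.0, []
    remaining = set(range(len(targets)))
    while remaining:
        # Only targets that still leave enough budget to get back home.
        feasible = [i for i in remaining
                    if used + travel(pos, targets[i]) + travel(targets[i], start) <= budget]
        if not feasible:
            break
        # Greedy rule: highest reward first, nearer target breaks ties.
        best = max(feasible, key=lambda i: (rewards[i], -travel(pos, targets[i])))
        used += travel(pos, targets[best])
        pos = targets[best]
        order.append(best)
        remaining.remove(best)
    used += travel(pos, start)  # close the loop at the initial position
    return order, used

order, total = greedy_closed_loop((0.0, 0.0), [(3.0, 0.0), (0.0, 4.0), (10.0, 0.0)],
                                  [1.0, 2.0, 5.0], speed=1.0, budget=12.0)
print(order, total)  # [1, 0] 12.0
```

A target is accepted only if the robot can still return to its initial position within the budget, which is what keeps every sequence a closed loop; in the example the most rewarding target (index 2) is skipped because it cannot be reached and left in time.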

Further, step 2 is implemented by the following steps:

step 21: adopt an improved Monte Carlo tree search algorithm, the upper confidence bound tree search algorithm (UCT), which combines Monte Carlo tree search with the upper confidence bound formula, to determine the possible access node sequences of each ground mobile robot during collaborative search;

step 22: optimize the probability distribution over the access sequences of the ground mobile robot by a stochastic probability descent method, and update the access sequence probabilities using the maximum entropy principle; the probability update formula is as follows:

where x is a possible access sequence of the ground mobile robot, q_n is the probability distribution corresponding to the access sequences x, E_{q_n}[f] is the expectation of the objective function f under the probability distribution q_n, and H(q_n) is the entropy of q_n;
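The update formula itself is not reproduced above. As a hedged illustration only, the sketch below implements one descent step in the style of probability-collectives updates, where the probability of each candidate sequence x is moved according to (E_q[f] - f(x))/β + H(q) + ln q(x); the step size `alpha`, the temperature `beta`, and treating f as a reward to be maximized are all assumptions. For a single robot with a fixed f, the fixed point of this step is the maximum-entropy (Boltzmann) distribution q(x) ∝ exp(f(x)/β), which the example converges to.

```python
import math

def entropy(q):
    """Shannon entropy H(q) of a discrete distribution."""
    return -sum(p * math.log(p) for p in q if p > 0.0)

def max_entropy_step(q, f, alpha=0.1, beta=1.0, eps=1e-12):
    """One descent step on q, the distribution over candidate access sequences.

    q: probability of each candidate sequence x
    f: objective value f(x) of each sequence (treated here as a reward)
    """
    expected_f = sum(p * fx for p, fx in zip(q, f))  # E_q[f]
    h = entropy(q)                                    # H(q)
    updated = []
    for p, fx in zip(q, f):
        grad = (expected_f - fx) / beta + h + math.log(p)
        updated.append(max(p - alpha * p * grad, eps))  # keep strictly positive
    total = sum(updated)
    return [p / total for p in updated]  # renormalize onto the simplex

q = [1 / 3] * 3
f = [1.0, 2.0, 3.0]
for _ in range(3000):
    q = max_entropy_step(q, f)
boltzmann = [math.exp(v) / sum(math.exp(w) for w in f) for v in f]
print([round(p, 4) for p in q])  # approximately [0.09, 0.2447, 0.6652]
```

Sequences with a higher objective value gain probability, while the entropy term keeps lower-valued alternatives alive instead of collapsing onto a single sequence too early.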

step 23: communicate with the other ground mobile robots, update the joint probability distribution of the access sequences of the ground mobile robot group, and select the first node of the access sequence with the highest probability as the robot's next access node;

step 24: the motion of the ground mobile robot is constrained by its minimum turning radius and maximum speed; taking these motion characteristics into account, a segmented Dubins curve is used to plan a smooth closed-loop trajectory for the ground mobile robot.
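Step 23 can be illustrated under a simplifying assumption: each robot shares a few candidate sequences with their probabilities, the joint probability of a combination is the product of the per-robot probabilities, and combinations in which two robots would head for the same first node are excluded. These rules and the name `next_nodes` are assumptions made for the example; the text does not state how the joint distribution is formed.

```python
from itertools import product
from math import prod

def next_nodes(candidates):
    """Pick each robot's next access node from the most probable joint plan.

    candidates: one list per robot of (sequence, probability) pairs, where a
    sequence is a tuple of target indices.
    Joint probability = product of the per-robot probabilities; plans where
    two robots share a first node are skipped (assumed conflict rule).
    """
    best_plan, best_p = None, -1.0
    for plan in product(*candidates):
        firsts = [seq[0] for seq, _ in plan]
        if len(set(firsts)) < len(firsts):
            continue  # conflicting first nodes: treated as probability 0
        p = prod(prob for _, prob in plan)
        if p > best_p:
            best_plan, best_p = plan, p
    if best_plan is None:
        return None
    return [seq[0] for seq, _ in best_plan]

r0 = [((2, 0), 0.7), ((1, 3), 0.3)]
r1 = [((2, 3), 0.6), ((0, 1), 0.4)]
print(next_nodes([r0, r1]))  # [2, 0]
```

Exhaustive enumeration grows exponentially with the number of robots, so this form only suits a handful of candidates per robot.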

Further, step 21 specifically comprises the following steps:

step 211: select the best node to expand as a child node in the search tree. Following a greedy principle, unexplored nodes are preferred, and among them the node with the highest reward value, i.e. the node offering the most observation information, is chosen; if all nodes have been visited, the node with the highest upper confidence bound value is selected, where the upper confidence bound is calculated as:

where a is the node index, t is the iteration number, N_t(a) is the number of times node a has been visited, and Q_t(a) is the reward estimate of node a;

step 212: from the child node selected in step 211, randomly expand leaf nodes; a newly expanded node must not duplicate an earlier child node or any child node in the search trees of the other ground mobile robots;

step 213: compute the latest reward estimate for the expanded leaf node sequence obtained in step 212;

step 214: after all nodes in the Monte Carlo tree have been traversed, update the reward values of the corresponding nodes in the search tree by back-propagating the reward estimate from step 213;

step 215: repeat steps 211 to 214 until the reward value of at least one node in the Monte Carlo tree reaches a preset threshold, then select, among all nodes whose reward values reach the threshold, the node with the maximum reward value as the new access node.
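The upper confidence bound formula is not reproduced above. Given the symbols defined (a, t, N_t(a), Q_t(a)), a common textbook form is a_t = argmax_a [Q_t(a) + c * sqrt(ln t / N_t(a))]; the exploration constant c does not appear in the text, so both the form and the default `c=1.4` are assumptions. The sketch combines that rule with the greedy preference for unexplored nodes described in step 211; `select_child` is a hypothetical name.

```python
import math

def select_child(children, t, c=1.4):
    """Step 211 style selection among a node's children.

    children: list of dicts with keys 'reward' (immediate reward of visiting
              the node), 'visits' (N_t(a)) and 'value' (Q_t(a), mean reward).
    t:        current iteration number (t >= 1).
    Returns the index of the selected child.
    """
    unvisited = [i for i, ch in enumerate(children) if ch["visits"] == 0]
    if unvisited:
        # Greedy rule: among unexplored nodes, take the most rewarding one.
        return max(unvisited, key=lambda i: children[i]["reward"])

    def ucb(ch):
        # Exploitation term Q_t(a) plus an exploration bonus that shrinks with N_t(a).
        return ch["value"] + c * math.sqrt(math.log(t) / ch["visits"])

    return max(range(len(children)), key=lambda i: ucb(children[i]))
```

A rarely visited node keeps a large exploration bonus, so it can be chosen over a node with a better mean reward until enough visits have been made.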
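The non-duplication rule of step 212 amounts to filtering candidate leaves before the random choice. In this hypothetical sketch, each robot's tree is summarized by the set of nodes already in it, and those sets are assumed to be shared over the communication link of step 23; `expand_leaf` is an illustrative name.

```python
import random

def expand_leaf(candidates, own_children, other_trees, rng=random):
    """Randomly pick a new leaf that no tree (own or another robot's) already has.

    candidates:   node ids that could be expanded from the selected node
    own_children: node ids already expanded in this robot's tree
    other_trees:  list of sets of node ids in the other robots' trees
    Returns a node id, or None when every candidate is already taken.
    """
    taken = set(own_children).union(*other_trees)
    fresh = [n for n in candidates if n not in taken]
    return rng.choice(fresh) if fresh else None
```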
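Steps 211 to 215 can be put together in a compact single-robot sketch. Several choices here are assumptions rather than details from the text: a tree node is a tuple of visited target indices, a rollout's reward is the sum of per-target rewards, travel time is straight-line distance over speed, UCB uses ln t with t the iteration number, and the cross-robot rules of steps 212 and 23 are left out. Mapping the five steps onto selection, greedy expansion, random rollout, back-propagation, and a threshold stop is an interpretation; `plan_next_node` is a hypothetical name.

```python
import math
import random

def plan_next_node(start, targets, rewards, speed, budget,
                   iterations=500, threshold=float("inf"), c=1.4, seed=0):
    """Single-robot UCT sketch of steps 211-215; returns the next target index.

    A tree node is a visit sequence (tuple of target indices) and stores
    [visit count, summed rollout reward]; its mean is the reward estimate Q.
    """
    rng = random.Random(seed)
    stats = {(): [0, 0.0]}

    def elapsed(seq):
        pts = [start] + [targets[i] for i in seq]
        return sum(math.dist(a, b) for a, b in zip(pts, pts[1:])) / speed

    def feasible(seq):
        # Targets still reachable with enough budget left to return home.
        here, used = (targets[seq[-1]] if seq else start), elapsed(seq)
        return [i for i in range(len(targets)) if i not in seq and used
                + (math.dist(here, targets[i]) + math.dist(targets[i], start)) / speed
                <= budget]

    scored = {}
    for t in range(1, iterations + 1):
        seq, path = (), [()]
        while True:  # selection: descend while every feasible child is in the tree
            kids = [seq + (i,) for i in feasible(seq)]
            new = [k for k in kids if k not in stats]
            if new or not kids:
                break
            seq = max(kids, key=lambda k: stats[k][1] / stats[k][0]
                      + c * math.sqrt(math.log(t) / stats[k][0]))
            path.append(seq)
        if new:  # greedy rule: expand the unexplored child with the highest reward
            seq = max(new, key=lambda k: rewards[k[-1]])
            stats[seq] = [0, 0.0]
            path.append(seq)
        roll, options = seq, feasible(seq)
        while options:  # random rollout estimates the reward of the sequence
            roll += (rng.choice(options),)
            options = feasible(roll)
        result = sum(rewards[i] for i in roll)
        for node in path:  # back-propagate the estimate along the visited path
            stats[node][0] += 1
            stats[node][1] += result
        scored = {k[0]: v[1] / v[0] for k, v in stats.items() if len(k) == 1}
        if scored and max(scored.values()) >= threshold:
            break  # stop once some first node reaches the reward threshold
    return max(scored, key=scored.get) if scored else None

start, targets, rewards = (0.0, 0.0), [(1.0, 0.0), (2.0, 0.0), (0.0, 5.0)], [4, 4, 6]
print(plan_next_node(start, targets, rewards, speed=1.0, budget=10.5))  # 0
```

In this small scenario the single most rewarding target (reward 6) uses up almost the whole budget, whereas the two nearby targets together yield 8, so the search chooses a nearby target first even though a purely greedy first pick would not.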

Further, step 24 specifically comprises the following steps:

the simplified kinematic model used for each wheeled ground robot is as follows:

where (x_p, y_p, θ) is the current pose of the robot chassis, V is the speed of the robot, and μ is the turning-rate control input; the maximum turning rate corresponds to a minimum turning radius, and the initial and terminal tangent directions correspond to the initial and terminal coordinates.
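The kinematic model itself is not reproduced above. With the variables it names, a common choice for such a wheeled robot is the Dubins-car model dx_p/dt = V cos θ, dy_p/dt = V sin θ, dθ/dt = μ with |μ| ≤ μ_max, so that the minimum turning radius is V / μ_max. The sketch below assumes that model and integrates it with small Euler steps; the function `simulate` and the parameter values are hypothetical. Holding μ at μ_max should trace a circle of radius V / μ_max, so half a turn ends two radii away from the start.

```python
import math

def simulate(pose, speed, mu, duration, dt=1e-4, mu_max=1.0):
    """Euler-integrate the assumed Dubins-car model for a constant control mu.

    pose: (x_p, y_p, theta); speed: V; mu: turning-rate command, clipped to
    [-mu_max, mu_max] so the path never curves tighter than V / mu_max.
    """
    x, y, theta = pose
    mu = max(-mu_max, min(mu_max, mu))
    for _ in range(int(round(duration / dt))):
        x += speed * math.cos(theta) * dt
        y += speed * math.sin(theta) * dt
        theta += mu * dt
    return x, y, theta

V, MU_MAX = 2.0, 0.5
rho_min = V / MU_MAX                   # minimum turning radius
half_turn = math.pi / MU_MAX           # time to turn through pi radians
x, y, _ = simulate((0.0, 0.0, 0.0), V, 10.0, half_turn, mu_max=MU_MAX)
print(round(rho_min, 3), round(y, 2))  # 4.0 8.0
```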

Further, step 3 specifically comprises the following steps:

take the total travel time T of the ground mobile robot group as the objective function, expressed as:

when the group objective function of the ground mobile robots is minimized, return the trajectory curve of each ground mobile robot, i.e. the path of each ground mobile robot corresponding to the shortest total travel time, and each ground mobile robot searches for targets along the obtained path; here χ is the final trajectory of the robot and X_i is the i-th segmented Dubins curve of the robot.
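The expression for T is not reproduced above. One plausible reading, stated here only as an assumption, is that each robot's closed-loop trajectory χ is the concatenation of its segmented Dubins curves X_i, so its travel time is the total curve length divided by V, and the group objective sums these times over the robots. The helper names and the segment lengths below are hypothetical.

```python
def robot_time(segment_lengths, speed):
    """Travel time of one closed loop chi = X_1 + X_2 + ... at constant speed."""
    return sum(segment_lengths) / speed

def group_time(plan, speed):
    """Assumed group objective T: sum of every robot's closed-loop travel time."""
    return sum(robot_time(segments, speed) for segments in plan)

# Each candidate plan lists, per robot, the lengths of its Dubins segments X_i.
plans = {
    "plan_a": [[4.0, 6.5, 3.5], [5.0, 5.0]],
    "plan_b": [[3.0, 7.0, 2.0], [4.0, 4.0, 2.0]],
}
best = min(plans, key=lambda name: group_time(plans[name], speed=1.0))
print(best, group_time(plans[best], speed=1.0))  # plan_b 22.0
```

If T is instead meant as the time until the last robot returns, replacing `sum` with `max` in `group_time` gives that makespan form.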

Owing to the above technical solution, the invention has the following advantages and positive effects compared with the prior art:

1. Compared with single-robot target search methods, the proposed group robot collaborative search method based on the Monte Carlo tree search algorithm solves the problem of rapid trajectory planning for a mobile robot group and enables target search by the group. While respecting its own motion characteristics, the mobile robot group searches for and observes targets at minimum cost and maximum speed and collects regional information in real time as fully as possible;

2. The method solves the cooperative trajectory planning problem for multiple ground mobile robots and, to improve execution efficiency, uses the Monte Carlo tree search algorithm together with smooth Dubins planning for fast trajectory planning and target search.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:

FIG. 1 is a schematic flow chart of a group robot collaborative search method based on a Monte Carlo tree search algorithm according to the present invention;

FIG. 2 shows the relative positions of the ground mobile robot group, the target positions, and the target position neighborhoods in the group robot collaborative search method based on the Monte Carlo tree search algorithm.

Detailed Description

While the embodiments of the present invention will be described and illustrated in detail with reference to the accompanying drawings, it is to be understood that the invention is not limited to the specific embodiments disclosed, but is intended to cover various modifications, equivalents, and alternatives falling within the scope of the invention as defined by the appended claims.

As shown in FIG. 1, this embodiment discloses a group robot collaborative search method based on a Monte Carlo tree search algorithm, comprising the following steps:

step 1: set parameters, including the travel time budget threshold of the ground mobile robot group and the number of iterations of the Monte Carlo tree search algorithm; set n target positions for the ground mobile robot group according to its target monitoring area. The group comprises N independently moving ground mobile robots, each equipped with a vision sensor, and the monitoring radius R of each robot is set according to the sensing distance of its vision sensor (in this embodiment all vision sensors are identical, so every robot has the same monitoring radius R, equal to the sensing distance of the vision sensor). To save search time and improve the efficiency and adaptability of target search, this monitoring radius is used as the neighborhood of each target position: the robot group is regarded as having observed a target as soon as the target lies within the monitoring radius. FIG. 2 shows the relative positions of a robot group (in this embodiment composed of 5 ground mobile robots), the target positions, and the target position neighborhoods, where points A, B, C, D, and E are the starting points of the robots on the ground, the black points are the target positions of the monitoring area, and the gray circular areas are the target position neighborhoods;

step 2: randomly select an initial position for each ground mobile robot in the environment to be explored; each ground mobile robot obtains an access node sequence in parallel based on the Monte Carlo tree search algorithm and explores the environment. During observation, the more data the vision sensor and lidar collect, the higher the reward value. The robot group forms access sequences that satisfy the time constraint according to a greedy principle, and after observation each ground mobile robot returns to its own initial position, so that each robot forms a closed-loop trajectory; specifically, this step is implemented as follows:

step 21: adopt an improved Monte Carlo tree search algorithm, the upper confidence bound tree search algorithm (UCT), which combines Monte Carlo tree search with the upper confidence bound formula, to determine the possible access node sequences of each ground mobile robot during collaborative search; specifically, this comprises the following steps:

step 24: the motion of the ground mobile robot is constrained by its minimum turning radius and maximum speed; taking these motion characteristics into account, a segmented Dubins curve is used to plan a smooth closed-loop trajectory for the ground mobile robot. Specifically, this step comprises the following:

where (x_p, y_p, θ) is the current pose of the robot chassis, V is the speed of the robot, and μ is the turning-rate control input; the maximum turning rate corresponds to a minimum turning radius, and the initial and terminal tangent directions correspond to the initial and terminal coordinates. The Dubins curve is the shortest path between two points in the plane that satisfies a curvature constraint and prescribed tangent directions at the start and end, which makes it a practically executable route for a ground mobile robot.

step 3: repeat step 2 until the time budget of the ground mobile robot group is exhausted or the maximum number of iterations is reached, and output the access node sequences of the ground mobile robot group; specifically, this step comprises the following:

when the group objective function of the ground mobile robots is minimized, return the trajectory curve of each ground mobile robot, i.e. the path of each ground mobile robot corresponding to the shortest total travel time, and each ground mobile robot searches for targets along the obtained path; here χ is the final trajectory of the robot and X_i is the i-th segmented Dubins curve of the robot.
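To make the Dubins step concrete, the sketch below computes a shortest path between two poses using only the four words with a straight middle segment (LSL, RSR, LSR, RSL), following the widely used closed-form expressions for Dubins paths. The words with three arcs (RLR, LRL) are omitted to keep it short, so the result is an upper bound on the true Dubins distance; it coincides with the true distance whenever the optimal path has a straight middle segment, which is always the case once the endpoints are sufficiently far apart. The function name and return format are assumptions, not part of the patent.

```python
import math

TWO_PI = 2 * math.pi

def mod2pi(a):
    return a % TWO_PI

def dubins_csc(start, goal, rho):
    """Shortest curve-straight-curve Dubins path between poses (x, y, theta).

    rho is the minimum turning radius. Returns (word, length). Internally the
    arc angles t, q and the straight part p are expressed in units of rho.
    """
    dx, dy = goal[0] - start[0], goal[1] - start[1]
    d = math.hypot(dx, dy) / rho
    phi = math.atan2(dy, dx)
    a, b = mod2pi(start[2] - phi), mod2pi(goal[2] - phi)
    sa, sb, ca, cb = math.sin(a), math.sin(b), math.cos(a), math.cos(b)
    cab = math.cos(a - b)
    words = {}

    p2 = 2 + d * d - 2 * cab + 2 * d * (sa - sb)                  # LSL
    if p2 >= 0:
        tmp = math.atan2(cb - ca, d + sa - sb)
        words["LSL"] = (mod2pi(tmp - a), math.sqrt(p2), mod2pi(b - tmp))
    p2 = 2 + d * d - 2 * cab + 2 * d * (sb - sa)                  # RSR
    if p2 >= 0:
        tmp = math.atan2(ca - cb, d - sa + sb)
        words["RSR"] = (mod2pi(a - tmp), math.sqrt(p2), mod2pi(tmp - b))
    p2 = -2 + d * d + 2 * cab + 2 * d * (sa + sb)                 # LSR
    if p2 >= 0:
        p = math.sqrt(p2)
        tmp = math.atan2(-ca - cb, d + sa + sb) - math.atan2(-2.0, p)
        words["LSR"] = (mod2pi(tmp - a), p, mod2pi(tmp - b))
    p2 = -2 + d * d + 2 * cab - 2 * d * (sa + sb)                 # RSL
    if p2 >= 0:
        p = math.sqrt(p2)
        tmp = math.atan2(ca + cb, d - sa - sb) - math.atan2(2.0, p)
        words["RSL"] = (mod2pi(a - tmp), p, mod2pi(b - tmp))

    word = min(words, key=lambda w: sum(words[w]))
    return word, sum(words[word]) * rho

print(dubins_csc((0.0, 0.0, 0.0), (5.0, 2.0, 0.0), 1.0))  # LSR, about 5.41
```

Chaining such segments through the access nodes of step 23 and back to the initial position yields the segmented closed-loop trajectory χ = X_1, X_2, ... described in step 3.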

step 4: each ground mobile robot in the group executes the above steps synchronously, and the planned trajectories let the ground mobile robots visit as many target positions as possible in the shortest time, achieving a rapid cooperative target search task.

The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A group robot collaborative search method based on a Monte Carlo tree search algorithm, characterized in that it comprises the following steps:

wherein step 2 is implemented by the following steps:

step 24: the motion of the ground mobile robot is constrained by its minimum turning radius and maximum speed; taking these motion characteristics into account, a segmented Dubins curve is used to plan a smooth closed-loop trajectory for the ground mobile robot;

2. The group robot collaborative search method based on the Monte Carlo tree search algorithm according to claim 1, wherein step 21 specifically comprises the following steps:

step 214: after all nodes in the Monte Carlo tree have been traversed, update the reward values of the corresponding nodes in the search tree by back-propagation according to the reward estimate from step 213;

3. The group robot collaborative search method based on the Monte Carlo tree search algorithm according to claim 1, wherein step 24 specifically comprises the following steps:

where (x_p, y_p, θ) is the current pose of the robot chassis, V is the speed of the robot, and μ is the turning-rate control input; the maximum turning rate corresponds to a minimum turning radius, and the initial and terminal tangent directions correspond to the initial and terminal coordinates.

4. The group robot collaborative search method based on the Monte Carlo tree search algorithm according to claim 1, wherein step 3 specifically comprises the following steps:

when the objective function of the ground mobile robot group is minimized, return the trajectory curve of each ground mobile robot, i.e. the path of each ground mobile robot corresponding to the shortest total travel time, and each ground mobile robot searches for targets along the obtained path; here χ is the final trajectory of the robot and X_i is the i-th segmented Dubins curve of the robot.