CN104063541A

CN104063541A - Hierarchical decision making mechanism-based multirobot cooperation method

Info

Publication number: CN104063541A
Application number: CN201410274560.6A
Authority: CN
Inventors: 梁志伟; 沈萍; 刘娟
Original assignee: Nanjing Post and Telecommunication University
Current assignee: Nanjing Post and Telecommunication University
Priority date: 2014-06-18
Filing date: 2014-06-18
Publication date: 2014-09-24
Anticipated expiration: 2034-06-18
Also published as: CN104063541B

Abstract

The present invention provides a multi-robot cooperation method based on a hierarchical decision-making mechanism. Players first select a formation according to the position of the ball. All players then vote for the player they consider the best ball holder at that moment, the forward ball holder, and the remaining roles are assigned. Each player checks whether it is the forward ball holder. If it is, it walks to the ball and dribbles, and an ideal behavior prediction model that mathematically models the opponent's speed is used in the forward ball holder's walk-or-kick decision module. If it is not, it receives one of the other roles, walks to its position point, and formation selection is performed. The invention thus realizes, in sequence, the selection of the forward ball holder and the assignment of all other player roles, establishes the DOBMP model for the forward ball holder's decision module, and finally uses a dynamic programming algorithm to reduce the high-dimensional computation introduced by the role function, ensuring smooth role rotation as the ball position keeps changing.

Description

Multi-robot Collaboration Method Based on Hierarchical Decision-Making Mechanism

Technical Field

The invention relates to a multi-robot cooperation method based on a hierarchical decision-making mechanism.

Background Art

The two most influential robot soccer World Cup competitions today are FIRA (Federation of International Robot-soccer Association) and RoboCup. The biggest difference between them is that FIRA allows a team to use traditional centralized control, as if all teammates were controlled by a single brain, whereas RoboCup requires distributed control, so that each player has its own brain and is an independent agent. This calls for in-depth research on multi-agent systems (MAS): multiple agents must plan to accomplish given tasks through cooperation and competition, using evolutionary algorithms and swarm intelligence to achieve a breakthrough in overall team behavior.

In the RoboCup3D simulation competition, a match cannot be won on individual ability alone; all players must coordinate and cooperate. The competition mainly tests how multiple agents can cooperate efficiently and compete tenaciously in a complex, dynamic environment. The number of players per team in the RoboCup3D simulation environment grew from 6 agents in 2010 to 9 in 2011 and to 11 at present, which places higher demands on multi-agent cooperation.

Cooperation mechanisms for multiple robots have been studied to varying degrees in recent years, both in China and abroad. For example, FC Portugal (Portugal) addresses player role assignment with the Iterated Optimal Assignment (IOA) method, which searches for a constrained optimum based on the well-known greedy algorithm and is combined with a role exchange mechanism. Inspired by human soccer, others have proposed building an imitation learning mechanism to unify complex human behavior with robot actions; however, because the underlying framework of imitation learning is not well understood, the interaction interface is also hard to obtain. The UT Austin Villa team (USA) designs its target framework with a subtask-set optimization method and coordinates the positioning of the whole team with a dynamic role assignment algorithm. The BoldHearts team (UK) uses a coalition algorithm that aims to build a strong coalition meeting the demands of the environment and optimizes its action parameters accordingly, together with a gradient-free Infotaxis policy search algorithm that locally maximizes the rate of information gain. The Robocanes team (USA) uses a spatio-temporal model matching method to build the relevant motion model and its internal state, borrows the walking engine of the German B-Human team, and optimizes the parameter configurations of different behaviors with a genetic algorithm and the SARSA learning algorithm.

All of the above methods require some optimization mechanism and learning method; for the role assignment problem they are computationally expensive and slow to update. These are problems that should be considered and solved in multi-robot cooperation.

Summary of the Invention

The purpose of the invention is to provide a multi-robot cooperation method based on a hierarchical decision-making mechanism that achieves effective cooperation of the whole multi-robot team. The method realizes, in sequence, the selection of the forward ball holder and the assignment of all other player roles, establishes the DOBMP model for the forward ball holder's dribbling decision module, and finally uses a dynamic programming algorithm to reduce the high-dimensional computation introduced by the role function, ensuring smooth role rotation as the ball position keeps changing.

The technical solution of the invention is:

A multi-robot cooperation method based on a hierarchical decision-making mechanism, in which:

players select a formation according to the position of the ball;

all players then vote for the player they consider the best ball holder at that moment, the forward ball holder, after which the other roles are assigned;

each player determines whether it is the forward ball holder; if so, it walks to the ball and dribbles, and an ideal behavior prediction model that mathematically models the opponent's speed is used in the forward ball holder's walk-or-kick decision module, that is, to decide whether to kick the ball to the target point or to dribble it there;

if it is not the forward ball holder, it is assigned one of the other roles, walks to its position point, and formation selection is performed.

Further, using the ideal behavior prediction model to mathematically model the opponent's speed for the forward ball holder's walk-or-kick decision module specifically comprises:

from the opponent's average speed and current position, the time T the opponent needs to reach the ball is calculated; the time our player needs to execute the kick is also known, so a threshold can be set to predict whether our robot can successfully kick the ball to the target point;

assuming the opponent can block our kick within time t, the smaller the value of T-t, the more likely our kick is to succeed;

when T-t is less than the set threshold, the kick is considered able to succeed, and the ball is kicked to the target point.

Further, if the opponent still blocks our kick after the decision has been made, the stored table of the opponent's instantaneous speeds is modified; that is, if our side fails to complete the kick, a penalty value p is applied to the speed table:

$p = \frac{V_{err}}{n} = \frac{V_{rea} - V_{ave}}{n} \qquad (3)$

where V_err is the difference between the opponent's real speed V_rea and its average speed V_ave, and n is the number of sampled instantaneous speeds.

Further, a dynamic programming function optimization algorithm is used to reduce the amount of computation:

first, the distance from each agent to the first role position is calculated; the role assignment function y_r is then used to calculate the distances for all possible combinations of agents reaching the first and second positions, and for each pair of agents the lowest-cost combination for reaching these two positions is saved;

the positioning for the k-th agent is built on k-1 agents having reached positions {p_1, ..., p_(k-1)}; that is, the role assignment function y_r is used to calculate the distances for all possible combinations of agents reaching positions {p_1, ..., p_(k-1)}, and the lowest-cost combination for each group of agents reaching these positions is saved;

then the distance from each agent to the k-th position p_k is added, and the lowest-cost combination for all agents to reach these different positions is calculated.

Further, when calculating the lowest-cost combination: if a lower-cost positioning exists for any subset, then the cost of the complete positioning containing it is necessarily lower as well.

Further, voting is carried out with a voting system that uses different weights.

Further, in the voting system, the bytes of the communication message are allocated as follows:

Further, for the dynamic assignment of player roles, the role assignment function y_r is used to achieve the best positioning:

selection is made in lexicographic order, such that among all possible assignments the total movement of all agents is the shortest path;

within the shortest path, when the paths of two players intersect, so that a collision would occur, the role assignment function y_r exchanges the two players' target positions, which by the triangle inequality gives a lower cost.

The beneficial effects of the invention are: supported by the voting communication system, the method realizes in sequence the selection of the forward ball holder CF and the assignment of the other player roles, and updates all player roles synchronously; the CF kick decision uses the DOBMP model for analysis; and the dynamic programming function greatly reduces the computation required for role updates, which speeds up role updating and ensures smooth role rotation as the ball position changes.

Brief Description of the Drawings

Figure 1 is a schematic diagram of the optimization process of the hierarchical decision-making mechanism.

Figure 2 is a schematic diagram of formation selection.

Figure 3 is a diagram of the overall formation positioning.

Figure 4 is a schematic diagram illustrating minimum-cost positioning.

Figure 5 is a decision flow chart of using DOBPM to mathematically model the opponent's speed for the CF walk-or-kick decision module.

Figure 6 shows formation positioning under different play modes.

Figure 7 is a schematic diagram of attacking with role rotation.

Detailed Description

Preferred embodiments of the invention are described in detail below with reference to the accompanying drawings.

Based on the RoboCup3D simulation platform, the embodiment designs a multi-robot cooperation strategy based on hierarchical decision making that achieves effective cooperation of the whole multi-robot team. The strategy has three main parts: the role assignment function, the voting communication system, and the hierarchical decision-making mechanism under the ideal behavior prediction model (Desired of Behavior Prediction Model, DOBMP). It realizes, in sequence, the selection of the forward ball holder (Center Forward, CF) and the assignment of all other player roles, establishes the DOBMP model for the CF dribbling decision module, and finally uses a dynamic programming algorithm to reduce the high-dimensional computation introduced by the role function, ensuring smooth role rotation as the ball position keeps changing.

Embodiment

Apollo3D's strategy uses a hierarchical decision-making mechanism (Hierarchical Decision Making, HDM), shown in Figure 1. In hierarchical decision making, the players first judge from the position of the ball which formation to adopt, and then all players vote for the player they consider the best ball holder at that moment, the CF. The CF is the key to a soccer match: whether it passes or dribbles forward is central to the team's overall strategy. Roles are not permanently bound to players during the decision process. At one moment robot A may be best placed to receive the ball and so becomes the forward CF; at the next moment, if an opponent's interception prevents A from breaking through with the ball, A passes to a teammate and then changes role according to the positioning at that moment. The roles and positions of the other players are likewise determined by their own current positions. Finally, a coordination mechanism handles communication among all players and the synchronized update of roles: each player sends its own position and the ball position through the communication system, so every player learns the best positioning of its teammates and the players reach agreement, which favors cooperation. The choice of CF additionally depends on whether the player has fallen, whether it can see the ball, whether the ball is in front of or behind it, its distance to the ball, whether it is the goalkeeper, and whether it was CF in the previous decision cycle; each of these factors carries a different weight.

Formation Selection

As in human soccer, the RoboCup3D simulation competition defines play modes for different situations, such as kick-off (Kick-Off), goal kick (Goal_Kick), throw-in (Throw_In), corner kick (Corner_Kick), and so on. From the point of view of a soccer match, a team's overall strategy divides into attack and defense. A player's choice of action depends on which side controls the ball: when our side has the ball the team enters the attacking state, and when the opponent has the ball it enters the defending state. The different formations are shown in Figure 2.

Team formation is defined relative to the position of the ball. Figure 3 shows the overall positioning of the team when the ball is at the center of the field. The formation can be divided into an attacking part and a guarding part. The positions of the attacking roles, CF, WFL, WFR, SFL, SFR, CAM and FF, are obtained by adding fixed offsets to the ball's coordinates. The only special role is CF: it is always the player closest to the ball, and its target position is the ball position itself. A line is drawn from the center of the goal to the ball; the guarding players CDM, CBL and CBR are all positioned on this line, at fixed offsets from the goal line. The position of the goalkeeper GK is essentially independent of its teammates, to make sure our own goal is protected; if at some moment GK is the best candidate for CF, another player must be assigned the GK role and stand at the center of the goal.
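The ball-relative formation described above can be sketched as follows. This is only an illustration: the offset values, the defender distances, the position of our goal center at (-15, 0), and the function name `formation_targets` are all assumptions, not values given in this document.

```python
import math

# Hypothetical offsets (metres) added to the ball position for attacking roles.
ATTACK_OFFSETS = {
    "WFL": (-1.0, 2.0), "WFR": (-1.0, -2.0),
    "SFL": (-3.0, 2.0), "SFR": (-3.0, -2.0),
    "CAM": (-2.0, 0.0), "FF": (4.0, 0.0),
}
# Hypothetical distances (metres) from our goal center along the goal-ball line.
DEFENDER_DISTANCES = {"CBL": 2.0, "CBR": 3.0, "CDM": 6.0}


def formation_targets(ball, own_goal=(-15.0, 0.0)):
    """Return a role -> (x, y) target map for a given ball position."""
    bx, by = ball
    gx, gy = own_goal
    targets = {"CF": (bx, by)}  # CF targets the ball itself
    for role, (dx, dy) in ATTACK_OFFSETS.items():
        targets[role] = (bx + dx, by + dy)
    # Unit vector from our goal center towards the ball.
    length = math.hypot(bx - gx, by - gy)
    ux, uy = ((bx - gx) / length, (by - gy) / length) if length > 0 else (1.0, 0.0)
    for role, dist in DEFENDER_DISTANCES.items():
        d = min(dist, length)  # never place a defender beyond the ball
        targets[role] = (gx + ux * d, gy + uy * d)
    targets["GK"] = own_goal  # goalkeeper stays at the goal center
    return targets
```

Calling `formation_targets((0.0, 0.0))` yields eleven target points, one per role, which can then be handed to the role assignment step.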

Role Assignment Function y_r

Once the overall formation is determined, the key step is the dynamic assignment of player roles. The role assignment function y_r is used to achieve the best positioning: given the external state information, the function computes the best match between players and roles at the current moment. Before discussing this function, three conditions that it must satisfy are stated:

(1) Choose the closest positions: among all possible assignments, each agent takes the position nearest (or nearer) to it, so that the total movement of all agents is shortest; this requires selecting in lexicographic order.

(2) Obstacle avoidance: players should avoid colliding with other players as far as possible while moving to their assigned positions.

(3) Dynamic consistency: given a set of target positions, if y_r outputs assignment m at time T, then the function continues to output m while the players move to their target positions.

If there are n players, there are n! possible assignments. The function is given the external state information, in particular the positions of the n players and the n target positions. The cost of an assignment is represented as an n-tuple of distances arranged in descending order. This gives the costs of the n! feasible assignments, which are compared in lexicographic order, as shown in Figure 4 and Table 1.

The costs of the various assignments in lexicographic order:

Table 1. Assignment costs in lexicographic order

This lexicographic ordering readily yields the minimum required by property (1). If the paths of two players intersect, so that a collision would occur, the function y_r can obtain a lower cost by exchanging their target positions, according to the triangle inequality.
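The lexicographic comparison of assignment costs can be sketched by brute force over all n! assignments. This is a minimal illustration assuming Euclidean distance as the cost of one player reaching one position; the function names are hypothetical and the approach is only practical for small n.

```python
import math
from itertools import permutations


def assignment_cost(agents, targets, perm):
    """Cost n-tuple of one assignment: distances sorted longest first."""
    distances = (math.dist(agents[i], targets[j]) for i, j in enumerate(perm))
    return tuple(sorted(distances, reverse=True))


def best_assignment(agents, targets):
    """Try every permutation; Python compares tuples lexicographically."""
    n = len(agents)
    return min(permutations(range(n)),
               key=lambda perm: assignment_cost(agents, targets, perm))


# Two players whose straight-line paths would cross if they kept the
# "wrong" targets: swapping targets shortens both paths.
agents = [(0.0, 0.0), (4.0, 0.0)]
targets = [(3.0, 0.0), (1.0, 0.0)]
perm = best_assignment(agents, targets)  # perm[i] is the target index of agent i
```

Because the tuple is sorted longest first, the comparison first minimizes the longest individual walk, then the second longest, and so on.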

Voting Communication System

For all players to reach their target positions without error, they must be coordinated and in full agreement on the role assignment being executed. If every player knew the exact positions of the ball and of its teammates, no coordination would be needed, because each player could independently compute the best assignment. The problem is that each player's field of view is limited to 120° and the perception data it receives is noisy, so the distances and angles of observed objects contain errors and exact positions are unavailable. Fortunately, Simspark allows agents to communicate: players can exchange messages every other simulation cycle (40 ms). The bandwidth of this channel is limited, however: only one player can send a message at a time, and the message content is limited to 20 bytes.

The 3D simulation environment provides a so-called sound system that lets each robot broadcast ('say') a message every two cycles (40 ms); other robots receive ('hear') the message in the next simulation cycle but cannot tell which agent it came from, so the player number must be included in the message. All messages sent and received are limited to 20 bytes of ASCII characters, and some ASCII characters are not allowed. To compress the data, Apollo3D divides the field into a 5000*5000 grid and encodes with the 83 characters between '*' and '~', so that a 20-character message can take 83^20 distinct values.
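One possible way to pack a grid cell into message characters is a base-83 encoding, sketched below. The text does not say which 83 characters are used or how the grid maps to field coordinates, so the alphabet (83 consecutive characters starting at '*'), the 30 m by 20 m field size, and all function names are assumptions. Since 83^2 = 6889 exceeds 5000, two characters are enough for one grid index.

```python
# Assumed alphabet: 83 consecutive ASCII characters starting at '*'.
ALPHABET = [chr(c) for c in range(ord("*"), ord("*") + 83)]
BASE = len(ALPHABET)
GRID = 5000  # cells per axis, as stated in the text


def encode_index(idx):
    """Encode a grid index in [0, GRID) as two base-83 characters."""
    if not 0 <= idx < GRID:
        raise ValueError("grid index out of range")
    high, low = divmod(idx, BASE)
    return ALPHABET[high] + ALPHABET[low]


def decode_index(text):
    """Inverse of encode_index."""
    return ALPHABET.index(text[0]) * BASE + ALPHABET.index(text[1])


def encode_position(x, y, length=30.0, width=20.0):
    """Quantize a field position (origin at the field center) to 4 characters."""
    ix = min(GRID - 1, max(0, int((x + length / 2) / length * GRID)))
    iy = min(GRID - 1, max(0, int((y + width / 2) / width * GRID)))
    return encode_index(ix) + encode_index(iy)


def decode_position(text, length=30.0, width=20.0):
    """Return the center of the encoded grid cell."""
    ix, iy = decode_index(text[0:2]), decode_index(text[2:4])
    return ((ix + 0.5) / GRID * length - length / 2,
            (iy + 0.5) / GRID * width - width / 2)
```

Under these assumptions a position costs 4 of the 20 available bytes, which leaves room for the player number, the ball position, and the voting fields.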

The specific allocation of message bytes is shown in Table 2; note that bytes 14-18 form the basis of the hierarchical decision-making system used by Apollo3D. In addition, Apollo3D encrypts the messages each player 'says' and decrypts those it 'hears', to keep our team's communication secure and to add some robustness to interference.

Table 2. Allocation of communication message bytes

It must be stressed that relying only on the most recently received assignment information is unwise: noise during the match, fallen players, accumulated self-localization error, and occasional loss or delay of server messages all make the received information less accurate. A voting system with different weights is therefore used, so that the whole team adopts one common assignment even when messages are occasionally lost or a player sends wrong data.

Specifically, in the weighted voting system, the player with the ball has the most important task in the match and is called the CF. The CF is selected according to: whether the player has fallen, whether it can see the ball, whether the ball is in front of or behind it, its distance to the ball, whether it is the goalkeeper, and whether it was CF in the previous decision cycle. Each of these factors carries a different weight, expressed as a probability in (0, 1).
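A weighted CF vote along these lines might look like the sketch below. The text lists the factors but not their weights or how ballots are combined, so the weight values, the scoring formula, the majority rule with a lowest-number tie-break, and all names here are assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PlayerState:
    number: int
    fallen: bool
    sees_ball: bool
    ball_in_front: bool
    distance_to_ball: float
    is_goalkeeper: bool
    was_cf: bool


# Hypothetical weights in (0, 1); the real values are not given in the text.
WEIGHTS = {"distance": 0.9, "sees_ball": 0.5, "ball_in_front": 0.4,
           "was_cf": 0.3, "fallen": 0.8, "goalkeeper": 0.6}


def cf_score(p, max_distance=30.0):
    """Higher score means a better CF candidate."""
    closeness = max(0.0, 1.0 - p.distance_to_ball / max_distance)
    score = WEIGHTS["distance"] * closeness
    score += WEIGHTS["sees_ball"] * p.sees_ball
    score += WEIGHTS["ball_in_front"] * p.ball_in_front
    score += WEIGHTS["was_cf"] * p.was_cf
    score -= WEIGHTS["fallen"] * p.fallen
    score -= WEIGHTS["goalkeeper"] * p.is_goalkeeper
    return score


def vote(believed_states):
    """One player's ballot: the number of the best candidate it knows of."""
    return max(believed_states, key=cf_score).number


def elect_cf(ballots):
    """Majority vote over received ballots; ties go to the lowest number."""
    counts = Counter(ballots)
    top = max(counts.values())
    return min(number for number, c in counts.items() if c == top)
```

Because every player tallies the same received ballots, a single lost or corrupted message changes at most one ballot and usually leaves the elected CF unchanged.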

Ideal Behavior Prediction Model

In MAS, predicting the behavior of other agents is a challenging research problem. In theory, a single agent can observe the behavior of other agents directly and build a fixed behavior model from it, but this is only possible when there are many repeated interactions between the agents. In the RoboCup3D simulation competition, an opponent's behavior cannot be predicted by simple observation, and the match changes in real time without enough interaction to build a useful model.

The embodiment designs an ideal behavior prediction model, DOBPM, to predict the best behavior of a single agent under given conditions. Rather than assuming from theoretical analysis what other agents will do, DOBPM analyzes the error of their best behavior to describe their expected behavior. DOBPM can be used to decide when to shoot, when to pass, the best moment to contest the ball, and so on. The embodiment uses DOBPM to mathematically model the opponent's speed for the CF walk-or-kick decision module, that is, to decide whether to kick the ball to the target point or dribble it there. The overall decision flow is shown in Figure 5.

During the match, the opponent's walking is first sampled over several cycles and its instantaneous speed V_i is calculated:

$V_i = \frac{\sqrt{(y_c - y_b)^2 + (x_c - x_b)^2}}{\Delta t} \qquad (1)$

where (x_b, y_b) is the opponent's position at the previous sample and (x_c, y_c) is its position at the current moment. The opponent's average speed can be obtained as the harmonic mean:

$V_{ave} = \frac{1}{\frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_i}} = \frac{n}{\frac{1}{V_1} + \frac{1}{V_2} + \cdots + \frac{1}{V_n}} \qquad (2)$
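Equations (1) and (2) can be sketched in a few lines. The function names are hypothetical, and skipping zero-speed samples to avoid division by zero is an assumption; the text does not say how a stationary opponent is handled.

```python
import math


def instantaneous_speeds(positions, dt):
    """Equation (1): speed between each pair of consecutive sampled positions."""
    return [math.hypot(xc - xb, yc - yb) / dt
            for (xb, yb), (xc, yc) in zip(positions, positions[1:])]


def harmonic_mean_speed(speeds):
    """Equation (2): harmonic mean of the sampled speeds.

    Zero samples are skipped (an assumption) so that 1/V_i is defined.
    """
    moving = [v for v in speeds if v > 0.0]
    if not moving:
        return 0.0
    return len(moving) / sum(1.0 / v for v in moving)


# Opponent positions sampled once per second (illustrative values).
samples = [(0.0, 0.0), (0.3, 0.4), (0.3, 1.4)]
speeds = instantaneous_speeds(samples, dt=1.0)
v_ave = harmonic_mean_speed(speeds)
```

The harmonic mean is never larger than the arithmetic mean and is pulled towards the slower samples.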

From the opponent's average speed and current position, the time T the opponent needs to reach the ball can be calculated. The time our player needs to execute the kick is also known, so a threshold can be set to predict whether our robot can successfully kick the ball to the target point. Assuming the opponent can block our kick within time t, the smaller the value of T-t, the more likely our kick is to succeed. When T-t is less than the set threshold, the kick is considered able to succeed, and the ball is kicked to the target point. If the opponent still blocks our kick after the decision has been made, the predicted average speed of the opponent was inaccurate, and the stored table of the opponent's instantaneous speeds should be modified. That is, if our side fails to complete the kick, a penalty value p is applied to the speed table:

$p = \frac{V_{err}}{n} = \frac{V_{rea} - V_{ave}}{n} \qquad (3)$

where V_err is the difference between the opponent's real speed V_rea and its average speed V_ave, and n is the number of sampled instantaneous speeds. The real speed is obtained from the time the opponent took to walk from its initial position to its final position (the ball position) and the distance between the two positions.
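The decision rule and the penalty can be sketched as follows. The comparison follows the description as written (kick when T-t is below the threshold); t and the threshold are taken as inputs because the text does not say how they are obtained. Adding p to every stored sample is one possible reading of "applying the penalty to the speed table" and is an assumption, as are all function names.

```python
import math


def time_to_ball(opponent, ball, v_ave):
    """T: time the opponent needs to reach the ball at its average speed."""
    if v_ave <= 0.0:
        return math.inf
    return math.dist(opponent, ball) / v_ave


def choose_action(T, t, threshold):
    """Kick when T - t is below the threshold, otherwise dribble."""
    return "kick" if (T - t) < threshold else "dribble"


def penalty(v_real, v_ave, n):
    """Penalty p = V_err / n with V_err = V_rea - V_ave."""
    return (v_real - v_ave) / n


def apply_penalty(speed_table, p):
    """Assumed update: shift every stored instantaneous speed by p."""
    return [v + p for v in speed_table]


T = time_to_ball(opponent=(3.0, 4.0), ball=(0.0, 0.0), v_ave=0.5)
action = choose_action(T, t=9.5, threshold=1.0)
```

If a kick chosen this way is blocked, the opponent was faster than modelled, so p is positive and the stored speeds rise for the next prediction.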

Dynamic Programming Optimization Method Based on Hierarchical Decision Making

The four modules of the hierarchical decision-making mechanism have been described above. With the voting communication mechanism and the ideal behavior model in place, the process of player role assignment and kicking is obtained. The 11 roles of a team are assigned to 11 robots; the goalkeeper always guards the goal and CF is always the player closest to the ball, while the remaining nine role positions are produced by the dynamic programming function y_r. If the goalkeeper happens to be the player closest to the ball, that is, when GK is CF, y_r must consider 10! = 3,628,800 different positioning schemes, compute the cost of each, and select the optimum by lexicographic ordering. All of this must be completed within the 0.02 s simulation cycle, which calls for a dynamic programming function (Dynamic Planning Function) optimization algorithm to reduce the computation.

Here A and P denote the sets of n agents and their positions, and the positioning is m := y_r(A, P). If a lower-cost positioning exists for any subset, then the cost of the complete positioning containing it is necessarily lower as well. The positioning for the k-th agent is built on k-1 agents having reached positions {p_1, ..., p_(k-1)}. Take the dynamic programming process for three robots, shown in Table 3, as an example. First the distance from each of the three agents to the first role position is calculated. The role assignment function y_r is then used to calculate the distances for all possible combinations of agents reaching the first and second positions, and for each pair of agents the lowest-cost combination for reaching these two positions is saved. Finally the distance from each agent to the third position is added, and the lowest-cost combination for all agents to reach the three positions is calculated.

Table 3. Position assignment scheme for three robots

For n agents, the dynamic programming runs for n iterations, and each iteration is equivalent to evaluating a binomial term of degree at most n-1:

$\sum_{k=1}^{n} \binom{n-1}{k-1} = \sum_{k=0}^{n-1} \binom{n-1}{k} = 2^{n-1} \qquad (4)$

Thus, when 11 agents take part in a match, 10 agents (all except the goalkeeper) participate in role assignment. With the dynamic programming optimization, the amount of computation is n·2^{n-1} = 10 × 2^9 = 5120, whereas the unoptimized role assignment algorithm requires 10! = 3,628,800 evaluations. The optimization markedly reduces the amount of computation and also the time spent on role switching.
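The two counts quoted above can be checked with a few lines of Python. The helper names are hypothetical, and the per-iteration count n·C(n-1, k-1) is an interpretation of equation (4) multiplied by the factor n, not a formula stated separately in the text.

```python
from math import comb, factorial


def dp_evaluations(n):
    # Sum over the n iterations; iteration k contributes n * C(n-1, k-1),
    # which equals k * C(n, k): every k-agent subset times k choices of
    # the agent that takes the newest position.
    return sum(n * comb(n - 1, k - 1) for k in range(1, n + 1))


def brute_force_evaluations(n):
    # Every permutation of n agents over n positions is evaluated.
    return factorial(n)


print(dp_evaluations(10))           # 5120
print(brute_force_evaluations(10))  # 3628800
```

For n = 10 the dynamic programming count is roughly 700 times smaller than the exhaustive one.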

Experimental verification

In all experiments, the DrawAnnotation function in Roboviz is used to display each of our players' current role names above their heads. The meaning of each role is explained in Figure 5. Our team, Apollo3D, is shown as the blue robots, and the red robots are the opponents.

Experiment 1: Formation positioning in different play modes

This experiment examines formation positioning under different play modes in the RoboCup 3D simulation competition, as shown in Figure 6: (a) positioning of both teams before kick-off; (b) positioning for our left corner kick; (c) positioning for our goal kick; (d) positioning for a corner kick at our penalty area. CF, WFL, WFR, SFL, SFR, CAM and FF are the attacking roles of the team. WFL, WFR, SFL, SFR and CAM follow closely behind CF and form a rectangle, standing at its four corners and its center, each facing the ball as far as possible, so that every player stays as close to the ball as the formation allows. FF always stands in front of the opponent's penalty area. Repeated experiments show that CF's shot may be intercepted by the opponent or may be off target; in that case FF can quickly occupy a favorable position and take over the CF role at the next moment to follow up the shot, which works very well. The CDM, CBL, CBR and GK roles take on the defensive tasks in our own half. When our team is attacking, CDM occupies the midfield, either to prevent an opponent counterattack or to support our players in launching our own counterattack.

Experiment 2: Role rotation and DOBMP model validation

Figure 7 shows the positioning and role switching during an attack. In panel (a), player No. 2 is the forward CF. After being blocked by an opponent and falling down, No. 2 quickly switches to the CAM role. Player No. 7, who is then facing the ball and closest to it, quickly switches to CF, as shown in panels (b) and (c). When player No. 7 is also intercepted by the opponent, the DOBMP model is applied and decides to kick the ball to the target point, namely the position of player No. 3. At the same time No. 3 rotates into the CF role, while No. 2 and No. 7 rotate to SFL and CAM, as shown in panel (d). The Simspark competition platform stipulates that when more than two players of the same team are inside a circle of radius 1 m centered on the ball, the players farther from the ball are automatically pushed away. Player No. 2 is therefore pushed away by the platform when approaching the ball, while No. 5 rotates to CF and No. 3 rotates to SFL, as shown in panel (e).
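The kick-or-dribble decision that the DOBMP model makes in this experiment can be sketched as follows. The sketch follows the wording of claim 2 literally: T is the time the opponent needs to reach the ball, estimated from its average speed and current position; t is the time within which the opponent is assumed able to block the kick; the kick is chosen when T - t falls below a set threshold. All function and parameter names are hypothetical, and the numeric values in the example are made up.

```python
from math import dist, inf


def time_to_reach(opponent_pos, ball_pos, avg_speed):
    """Time T for the opponent to reach the ball at its average speed."""
    if avg_speed <= 0:
        return inf  # assumption: a stationary opponent never arrives
    return dist(opponent_pos, ball_pos) / avg_speed


def should_kick(opponent_pos, ball_pos, avg_speed, t, threshold):
    """Return True to kick to the target point, False to keep dribbling.

    Implements the rule exactly as worded in claim 2: kick when the
    value T - t is less than the set threshold.
    """
    T = time_to_reach(opponent_pos, ball_pos, avg_speed)
    return (T - t) < threshold


# Made-up values: opponent 3 m from the ball, moving at 1 m/s on average.
decision = should_kick((3.0, 0.0), (0.0, 0.0), avg_speed=1.0, t=2.5, threshold=1.0)
```

When the function returns False, the forward ball holder would continue walking with the ball toward the target point instead of kicking.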

The hierarchical decision-making mechanism based on role assignment, supported by the voting communication system, first selects the forward ball holder CF, then assigns the roles of the other players, and updates all player roles synchronously. The CF's kicking decision is analyzed with the DOBMP model. To address the computational cost of role updates, dynamic programming is used to greatly reduce the amount of computation, which considerably speeds up role updates and ensures smooth role rotation as the ball position changes.

Claims

1. A multi-robot collaboration method based on hierarchical decision-making mechanism, characterized in that:

The players select a formation according to the position of the ball in order to respond to the match;

All players then vote for the player they consider to be the best ball holder at that moment, namely the forward ball holder, and the other roles are then assigned;

Each player judges whether it is the forward ball holder; if it is, it walks to the ball and walks with the ball, and an ideal behavior prediction model that mathematically models the opponent's speed is used in the forward ball holder's walking and kicking decision module, that is, to decide whether to kick the ball to the target point or to walk with the ball to the target point;

If it is not the forward ball holder, then after the other roles have been assigned it walks to its position point and the formation is selected.

2. The multi-robot collaboration method based on a hierarchical decision-making mechanism as claimed in claim 1, characterized in that the ideal behavior prediction model mathematically models the opponent's speed for use in the forward ball holder's walking and kicking decision module, specifically:

From the opponent's average speed and its current position, the time T the opponent needs to reach the ball position is calculated; the time our player needs to perform the kicking action is also known, and a threshold is set to predict whether our robot can successfully kick the ball to the target point;

Assuming that the opponent can prevent us from kicking the ball within time t, the smaller the value of T-t, the greater the possibility that our team successfully completes the kicking task;

When the value of T-t is less than the set threshold, the kicking task is considered able to succeed, and the ball is kicked to the target point.

3. The multi-robot collaboration method based on a hierarchical decision-making mechanism as claimed in claim 2, characterized in that: if, after the decision has been made, the opponent can still stop us from kicking the ball, the established table of the opponent's instantaneous speeds is modified; that is, if we fail to complete the kicking task, a penalty value p is set for the speed table:

$p = \frac{V_{err}}{n} = \frac{V_{rea} - V_{ave}}{2} \qquad (3)$

where V_err is the difference between the opponent's real speed and its average speed, and n is the number of instantaneous speed samples.

4. The multi-robot collaboration method based on a hierarchical decision-making mechanism according to any one of claims 1-3, wherein a dynamic programming optimization algorithm is used to reduce the amount of computation:

First, the distance from each agent to the first role position is calculated; then the role assignment function y_r is used to calculate the distances for all possible combinations of agents reaching the first and second positions, and the lowest-cost positioning combination is saved for each pair of agents reaching these two positions;

The positioning of the k-th agent is built on k-1 agents reaching the positions {p_1, …, p_{k-1}}; that is, the role assignment function y_r is used to calculate the distances for all possible combinations of agents reaching the positions {p_1, …, p_{k-1}}, and the lowest-cost positioning combination is saved for each group of agents reaching the positions {p_1, …, p_{k-1}};

The distance from each agent to the position p_k is then added, and the lowest-cost positioning combination for all agents reaching these three different positions is calculated.

5. The multi-robot collaboration method based on a hierarchical decision-making mechanism as claimed in claim 4, wherein, when calculating the lowest-cost positioning combination: if a lower positioning cost exists in any subset, then the cost of the entire positioning scheme containing that positioning is necessarily lower.

6. The multi-robot collaboration method based on a hierarchical decision-making mechanism as claimed in claim 5, wherein a voting system with different weights is used to vote.

7. The multi-robot collaboration method based on a hierarchical decision-making mechanism as claimed in claim 6, characterized in that, in the voting system, the bytes of the communication message are allocated as follows:

8. The multi-robot collaboration method based on a hierarchical decision-making mechanism as claimed in claim 7, characterized in that, for the dynamic assignment of player roles, the role assignment function y_r is used to achieve optimal positioning:

Using lexicographic ordering as the selection method, the chosen scheme is the one in which the total path of all agents is the shortest among all possible positioning schemes of the agents;

Within the shortest-path scheme, when the paths of two players intersect, a collision would occur; by the triangle inequality, the role assignment function y_r obtains a lower cost by exchanging the target positions of the two players.