WO2022120955A1 - Multi-agent simulation method and platform using method - Google Patents

Multi-agent simulation method and platform using method

Info

Publication number
WO2022120955A1
Authority
WO
WIPO (PCT)
Prior art keywords
agent
algorithm
squares
tested
unit
Prior art date
Application number
PCT/CN2020/138782
Other languages
French (fr)
Chinese (zh)
Inventor
刘延东
韩东
王鲁佳
须成忠
Original Assignee
中国科学院深圳先进技术研究院 (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院 (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences)
Publication of WO2022120955A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation

Definitions

  • the invention belongs to the technical field of agent simulation and is suitable for verifying the simulation of multi-agent reinforcement learning algorithms, in particular to a multi-agent simulation method and a platform using the method.
  • a multi-agent simulation system is composed of a group of agents that share the environment, perceive the environment, and interact with the environment. Each agent interacts with the environment independently, takes actions according to individual goals, and affects the environment.
  • multi-agent simulation systems such as traffic congestion control, resource scheduling management, base station communication transmission, etc.
  • the existing multi-agent simulation platforms are mainly based on game interaction scenes, and the state space is based on images.
  • although pixel information provides a partial observation of the environment, its dimensionality and channel count are large. Even after image preprocessing such as cropping, downscaling or changing the number of channels, the state dimension remains large, which places high demands on computer hardware and makes verifying and testing an algorithm time-consuming.
  • at the same time, the state information of a multi-agent simulation environment is of a single kind. For example, a game-based interactive environment observes the pixels of one image frame, whereas an environment based on object motion states generally observes information such as the position, velocity and relative distance of the objects.
  • the algorithm therefore needs different network input dimensions depending on the form of state information the environment provides, and this adjustment process is cumbersome and error-prone.
  • the purpose of the present invention is to provide a multi-agent simulation method compatible with both pixel and non-pixel observations, and a platform using the method, aiming to solve the technical problems that adjusting the algorithm to be tested is complicated by its many input dimensions and that switching between pixel and non-pixel observations is error-prone.
  • the present invention provides a multi-agent simulation method, the method includes the following steps:
  • the agents upload the observation information formed by the search space, the obstacles and the other agents to the algorithm to be tested;
  • the algorithm to be tested guides the agents to move, step by step, according to the task requirements and the observation information; the scoring system scores each movement of an agent according to the task requirements of the algorithm to be tested, until the agent completes the task;
  • one square corresponds to one pixel and to one unit length of a physical quantity; one agent occupies one square, and the unit time step of its movement is one square; one obstacle occupies one square, and the agent cannot enter the square where an obstacle is located.
  • the present invention also provides a multi-agent simulation platform, comprising:
  • An interactive environment unit constructing a search space with a length of 10 squares and a width of 10 squares; and setting a predetermined number of agents and obstacles in the search space;
  • a scene unit electrically connected to the interactive environment unit, for providing the number and position of the agent and the obstacle in the search space in a preset scene;
  • an algorithm unit electrically connected to the interactive environment unit, for loading the algorithm to be tested, receiving the observation information that the agents feed back to that algorithm, and outputting the algorithm's decision information on the agents' movement directions;
  • a scoring unit which is electrically connected to the algorithm unit and the interactive environment unit respectively, and is used to score each movement of the agent according to the task requirements of the algorithm to be tested, until the agent completes the task; the total task score is fed back to the algorithm to be tested; the agents execute the task many times, and the algorithm to be tested obtains the optimal strategy after continuous trial and error.
  • one square corresponds to one pixel and to one unit length of a physical quantity; one agent occupies one square, and the unit time step of its movement is one square; one obstacle occupies one square, and the agent cannot enter the square where an obstacle is located.
  • a square corresponds to both a pixel and a unit length of a physical quantity, so that a square can be used as a pixel (target points, passable areas and obstacles have different pixel values) and also as the unit minimum distance (describing the position of an agent, the distance between an agent and an obstacle, and the distance between an agent and a target point).
  • Fig. 1 is the realization flow chart of the multi-agent simulation method provided by the first embodiment of the present invention
  • FIG. 2 is a schematic diagram of a preset simulation scene provided by the present invention.
  • FIG. 3 is a schematic diagram of a pixel-based observation state of a multi-agent in an embodiment of the present invention
  • FIG. 4 is a schematic diagram of a user-defined scene layout in an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of the number of user-defined scene agents in an embodiment of the present invention.
  • FIG. 6 is a functional block diagram of a multi-agent simulation platform provided by Embodiment 2 of the present application.
  • FIG. 1 shows a multi-agent simulation method provided by Embodiment 1 of the present invention, and the method includes the following steps:
  • the search space is limited to a certain space, which reduces the dimension of the pixel and reduces the waiting time of the algorithm training process.
  • the agent uploads the observation information composed of the search space, obstacles and other agents to the algorithm to be tested;
  • the algorithm to be tested guides the agent to move one by one according to the task requirements and observation information; the scoring system scores each movement of the agent according to the task requirements of the algorithm to be tested, until the agent completes the task;
  • a square corresponds to one pixel and to one unit length of a physical quantity; an agent occupies a square, and the unit time step of its movement is one square; an obstacle occupies a square, and the agent cannot enter the square where the obstacle is located.
  • the self-tuning and trial-and-error scheme of the algorithm to be tested is set by the algorithm itself.
  • This application provides, for reference by the algorithm to be tested, the total score of each task, the movement pattern of each agent, and the record of individual rewards or penalties.
  • the multi-agent simulation method of the present application is not limited to reinforcement learning algorithms, but can also be applied to other algorithms that can realize self-learning.
  • each movement target of the agent is one of the upper, lower, left and right squares of the current square.
  • the scoring includes: a reward for adding points or a penalty for deducting points.
  • the observation range of the agent is the area of 3*3 squares centered on the square where it is located.
  • step S2 the number and positions of the agents and obstacles are determined according to the requirements of the algorithm to be tested or the preset scene.
  • This step provides a user-defined interface, and users can modify the scene to a certain extent according to their own needs. Such as changing the number of agents or the distribution of obstacles.
  • setting the search space to a limited space of 10*10 can ensure the validity and timeliness of the algorithm to be tested and reduce the waiting time of the algorithm training process.
  • Each square corresponds to a pixel point and a unit length of a physical quantity at the same time, which helps to be compatible with pixel observations and non-pixel observations in the training process and facilitates user testing.
  • Embodiment 2:
  • FIG. 6 shows a multi-agent simulation platform using the above simulation method provided by Embodiment 2 of the present invention, including:
  • the interactive environment unit constructs a search space with a length of 10 squares and a width of 10 squares; and sets a predetermined number of agents and obstacles in the search space.
  • the environment interaction unit is responsible for transferring the observation of the agent from the environment to the algorithm, and displaying the behavior of the algorithm decision on the visual interface.
  • the agent observes the environment information from the scene in the search space and transmits the observed state to the interaction module.
  • the interaction module uploads the observation state to the algorithm, the algorithm predicts the behavior through the trained model, the agent executes the behavior in the interaction module, and obtains new observations from the scene, the reward for the behavior, and whether the training is completed.
  • the scene unit is electrically connected to the interactive environment unit, and is used to provide the number and position of the agent and the obstacle in the search space in the preset scene.
  • the scene unit of the simulation platform presets three experimental scenes commonly used for multiple agents, namely "pursuit-escape", "multi-target navigation" and "exchange positions", and provides a user-defined interface.
  • the solid polygon represents the agent
  • the hollow polygon represents the target of the agent.
  • Agents with different tasks are represented by solid polygons with different shapes, such as solid circles and solid diamonds in the figure. Its moving targets are corresponding hollow circles and hollow diamonds.
  • the scene is initialized first, and the properties such as the number and color of the agents are determined.
  • the location of the start and end points in the scene. Determine the number and location of obstacles.
  • the observation information of the agent is determined.
  • the observation information based on images and non-images is different.
  • the image-based observation information is the pixels of the grid around the agent, and the non-image-based observation information is the physical quantity that reflects the environmental information (such as position, relative distance between agents, and relative distance between the agent and the target).
  • the algorithm unit is electrically connected with the interactive environment unit, and is used for loading the algorithm to be tested, receiving observation information fed back to the algorithm to be tested by the agent, and outputting the decision information of the algorithm to be tested on the moving direction of the agent.
  • the specific user adds the algorithm to be tested in the algorithm unit, obtains the observation state of the scene fed back by the agent through the interface of the environment interaction unit, predicts the behavior and returns the behavior to the agent, and guides the agent to move.
  • the scoring unit is electrically connected with the algorithm unit and the interactive environment unit respectively, and is used for scoring each movement of the agent according to the task requirements of the algorithm to be tested, until the agent completes the task. After each task is completed, the total score of the task is fed back to the algorithm to be tested; when the agent performs multiple tasks, the algorithm to be tested will get the optimal strategy after continuous trial and error.
  • the scoring unit should design the instant reward of the agent, that is, the numerical reward for each behavior of the agent. Whether the reward design can reflect the behavior of the agent determines the effect of the optimization strategy.
  • a square corresponds to a pixel and a unit length of a physical quantity; by using a small square as a dual attribute of a pixel and a unit length, on a simulation platform, it can easily provide users with different environmental states information to facilitate user testing.
  • An agent occupies a square, and the unit time step of its movement is one square; an obstacle occupies a square, and the agent cannot enter the square where the obstacle is located.
  • each movement target of the agent is one of the upper, lower, left and right squares of the current square.
  • the scoring of the intelligent body movement result by the scoring unit includes: a reward for adding points or a penalty for deducting points.
  • the observation range of the agent is an area of 3*3 squares centered on the square where it is located.
  • the solid circle and the solid diamond are agents with different tasks.
  • the hollow diamond is the target of the agent represented by the solid diamond
  • the hollow circle is the target of the agent represented by the solid circle.
  • the 3*3 hollow boxes around the agent represent the observation range of the agent. Taking the agent as the center, the observation range is determined according to the observation depth set by the user. Take the pixels in each grid (RGB three-channel) as the observed state of the agent.
  • Figure 3 shows the observed state of 3x3.
  • Image pixel-based observation states are often used in game-related multi-agent reinforcement learning simulation platforms.
  • the physical quantities commonly used for the observation state include the position of the agent, the relative position between the agent and the agent, and the relative position between the agent and the target point, etc.
  • the algorithm to be tested can choose different physical quantities to represent the observed state.
  • the configuration of the number and positions of the agents and obstacles by the scene unit is determined according to the requirements of the algorithm to be tested or a preset scene.
  • the user can change the scene setting of the search space according to the test requirements of the algorithm. It is mainly divided into two aspects: (1) changing the distribution of obstacles in the scene; (2) changing the number of agents in the scene.
  • the difficulty of the test can be increased by changing the distribution of obstacles in the scene.
  • the solid circles and solid diamonds shown in Figure 4 represent agents for different tasks, respectively.
  • the hollow rhombus is the target of the solid circle, and the hollow square is the target of the solid rhombus.
  • Agents that need to swap positions must each pass through one and the same stretch of the route. To complete the task, the agents must learn to navigate cooperatively: one agent passes through the area first, and the other waits and passes once the area becomes traversable.
  • solid hexagons, solid triangles, solid circles, and solid diamonds represent agents for different tasks, respectively.
  • the hollow rhombus is the target of the solid circle
  • the hollow square is the target of the solid rhombus
  • the hollow hexagon is the target of the solid hexagon
  • the hollow triangle is the target of the solid triangle.
  • the number of agents that need to swap positions has increased from two to four.
  • the algorithm needs to coordinate the strategies of the four agents to complete the navigation task in a narrower space.
  • the global observation state and joint action space become complex, the task completion becomes more difficult, and the performance requirements of the algorithm are higher.
  • the customizability of the simulation platform can meet more needs of users.
  • the following table shows the agent behavior states and related rewards of the three preset scenarios applied in the search space.
  • the multi-agent simulation platform provided by the embodiment of the present invention is limited to a certain space (10*10 squares), which reduces the waiting time of the algorithm training process.
  • the pixel point and unit length attributes are integrated into the smallest unit of the simulation environment, one square, which solves the insufficiency that the existing methods can only be based on one kind of state information (image pixel or physical quantity).
  • a user-defined interface is also provided, and users can modify the scene to a certain extent according to their own needs to meet the testing needs of the algorithm.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention belongs to the technical field of agent simulation, in particular to simulation technologies for verifying multi-agent reinforcement learning algorithms, and relates to a multi-agent simulation method and a platform using the method. In the method, setting the search space to a limited 10*10 space ensures the validity and timeliness of the algorithm to be tested, thereby shortening the waiting time during algorithm training. Each square corresponds to one pixel and to one unit length of a physical quantity, so that pixel and non-pixel observation states are compatible during training and a user can conveniently test and fine-tune an algorithm. The simulation platform using the method has the same technical effect.

Description

Multi-agent simulation method and platform using the method

Technical Field

The invention belongs to the technical field of agent simulation, is suitable for simulation that verifies multi-agent reinforcement learning algorithms, and in particular relates to a multi-agent simulation method and a platform using the method.

Background Art

A multi-agent simulation system is composed of a group of agents that share an environment, perceive it and interact with it. Each agent interacts with the environment independently, takes actions according to its individual goals and affects the environment. In the real world there are many examples of multi-agent simulation systems, such as traffic congestion control, resource scheduling and management, and base station communication.

Existing multi-agent simulation platforms are mainly built around game interaction scenes, and their state space is image-based. Although pixel information provides a partial observation of the environment, its dimensionality and channel count are large. Even after image preprocessing such as cropping, downscaling or changing the number of channels, the state dimension remains large, which places high demands on computer hardware and makes verifying and testing an algorithm time-consuming. At the same time, the state information of a multi-agent simulation environment is of a single kind: a game-based interactive environment observes the pixels of one image frame, whereas an environment based on object motion states generally observes information such as the position, velocity and relative distance of the objects. The algorithm must therefore design different network input dimensions according to the form of state information the environment provides, and this adjustment process is cumbersome and error-prone.

Technical Problem

The purpose of the present invention is to provide a multi-agent simulation method compatible with both pixel and non-pixel observations, and a platform using the method, aiming to solve the technical problems that adjusting the algorithm to be tested is complicated by its many input dimensions and that switching between pixel and non-pixel observations is error-prone.

Technical Solution
In one aspect, the present invention provides a multi-agent simulation method, the method comprising the following steps:

S1. Construct a search space 10 squares long and 10 squares wide;

S2. Set a predetermined number of agents and obstacles in the search space;

S3. The agents upload the observation information formed by the search space, the obstacles and the other agents to the algorithm to be tested;

S4. The algorithm to be tested guides the agents to move, step by step, according to the task requirements and the observation information; a scoring system scores each movement of an agent according to the task requirements of the algorithm to be tested, until the agent completes the task;

S5. The total task score is fed back to the algorithm to be tested; the agents execute the task many times, and after continuous trial and error the algorithm to be tested obtains the optimal strategy.

In the search space, one square corresponds to one pixel and to one unit length of a physical quantity; one agent occupies one square, and the unit time step of its movement is one square; one obstacle occupies one square, and an agent cannot enter the square where an obstacle is located.
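The following is a minimal sketch, in Python, of how steps S1-S5 and the grid rules above might be realized. It is an illustration only: the class name GridWorld, the action encoding and the numerical rewards are assumptions made for this example and are not part of the disclosed method.

```python
# Minimal sketch (not the patent's reference implementation) of a 10x10 grid world
# following steps S1-S5. Class name, action encoding and reward values are assumptions.
from dataclasses import dataclass, field

ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

@dataclass
class GridWorld:
    size: int = 10                               # S1: 10x10 search space
    obstacles: set = field(default_factory=set)  # S2: squares an agent cannot enter
    agents: dict = field(default_factory=dict)   # S2: agent id -> (row, col)
    targets: dict = field(default_factory=dict)  # agent id -> goal square

    def in_bounds(self, cell):
        r, c = cell
        return 0 <= r < self.size and 0 <= c < self.size

    def step(self, agent_id, action):
        """Move one agent by one square (S4) and return (new_cell, reward, done)."""
        r, c = self.agents[agent_id]
        dr, dc = ACTIONS[action]
        nxt = (r + dr, c + dc)
        if not self.in_bounds(nxt) or nxt in self.obstacles:
            reward = -1.0            # illustrative penalty for an invalid move
        else:
            self.agents[agent_id] = nxt
            reward = -0.1            # illustrative small step cost
        done = self.agents[agent_id] == self.targets[agent_id]
        if done:
            reward = 10.0            # illustrative reward for reaching the goal
        return self.agents[agent_id], reward, done

if __name__ == "__main__":
    env = GridWorld(obstacles={(4, 4), (4, 5)},
                    agents={"a0": (0, 0)}, targets={"a0": (9, 9)})
    print(env.step("a0", "right"))   # ((0, 1), -0.1, False)
```

An algorithm under test would call step() once per unit time step, receiving the new square, the immediate score and a completion flag.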
In another aspect, the present invention also provides a multi-agent simulation platform, comprising:

an interactive environment unit, which constructs a search space 10 squares long and 10 squares wide and sets a predetermined number of agents and obstacles in the search space;

a scene unit, electrically connected to the interactive environment unit, for providing the number and positions of the agents and obstacles in the search space for a preset scene;

an algorithm unit, electrically connected to the interactive environment unit, for loading the algorithm to be tested, receiving the observation information that the agents feed back to that algorithm, and outputting the algorithm's decision information on the agents' movement directions;

a scoring unit, electrically connected to the algorithm unit and the interactive environment unit respectively, for scoring each movement of an agent according to the task requirements of the algorithm to be tested until the agent completes the task, and for feeding the total task score back to the algorithm to be tested; the agents execute the task many times, and after continuous trial and error the algorithm to be tested obtains the optimal strategy.

In the search space, one square corresponds to one pixel and to one unit length of a physical quantity; one agent occupies one square, and the unit time step of its movement is one square; one obstacle occupies one square, and an agent cannot enter the square where an obstacle is located.

Beneficial Effects

In the multi-agent simulation method of the present invention and the platform using it, the smallest unit, a single square, corresponds both to one pixel and to one unit length of a physical quantity. A square can therefore be used as a pixel (target points, passable areas and obstacles have different pixel values) and also as the unit minimum distance (describing the position of an agent, the distance between an agent and an obstacle, and the distance between an agent and a target point). By giving the small square this dual attribute of pixel and unit length, a single simulation platform can provide users with different kinds of environment state information, i.e. it is directly compatible with both pixel and non-pixel observations, which makes it convenient for users to test their algorithms.
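As an illustration of this dual role, the sketch below (reusing the GridWorld example above) reads the same scene out in both forms; the function names render_pixels/render_physical and the particular pixel values are assumptions made for this example.

```python
# Illustrative sketch only: the same square acts as a pixel and as a unit length.
import numpy as np

def render_pixels(env):
    """One square = one pixel: return a size x size image of the scene."""
    img = np.zeros((env.size, env.size), dtype=np.uint8)     # passable area = 0
    for cell in env.obstacles:
        img[cell] = 255                                       # obstacles
    for cell in env.targets.values():
        img[cell] = 128                                       # target points
    for cell in env.agents.values():
        img[cell] = 64                                        # agents
    return img

def render_physical(env, agent_id):
    """One square = one unit length: positions and distances measured in squares."""
    r, c = env.agents[agent_id]
    tr, tc = env.targets[agent_id]
    return {
        "position": (r, c),
        "distance_to_target": abs(tr - r) + abs(tc - c),      # Manhattan distance
        "distance_to_nearest_obstacle": min(
            (abs(orr - r) + abs(occ - c) for orr, occ in env.obstacles),
            default=None),
    }
```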
Brief Description of the Drawings

Fig. 1 is a flow chart of the implementation of the multi-agent simulation method provided by Embodiment 1 of the present invention;

Fig. 2 is a schematic diagram of the preset simulation scenes provided by the present invention;

Fig. 3 is a schematic diagram of the pixel-based observation state of multiple agents in an embodiment of the present invention;

Fig. 4 is a schematic diagram of a user-defined scene layout in an embodiment of the present invention;

Fig. 5 is a schematic diagram of a user-defined number of agents in a scene in an embodiment of the present invention;

Fig. 6 is a functional block diagram of the multi-agent simulation platform provided by Embodiment 2 of the present application.
本发明的实施方式Embodiments of the present invention
为了使本发明的目的、技术方案及优点更加清楚明白,以下结合附图1-6及实施例,对本发明进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本发明,并不用于限定本发明。In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings 1-6 and the embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.
以下结合具体实施例对本发明的具体实现进行详细描述:The specific implementation of the present invention is described in detail below in conjunction with specific embodiments:
实施例一:Example 1:
Fig. 1 shows a multi-agent simulation method provided by Embodiment 1 of the present invention. The method includes the following steps:

S1. Construct a search space 10 squares long and 10 squares wide.

This step confines the search space to a limited area, which lowers the pixel dimensionality and reduces the waiting time of the algorithm training process.

S2. Set a predetermined number of agents and obstacles in the search space.

S3. The agents upload the observation information composed of the search space, the obstacles and the other agents to the algorithm to be tested.

S4. The algorithm to be tested guides the agents to move, step by step, according to the task requirements and the observation information; the scoring system scores each movement of an agent according to the task requirements of the algorithm to be tested, until the agent completes the task.

S5. The total task score is fed back to the algorithm to be tested; the agents execute the task many times, and after continuous trial and error the algorithm to be tested obtains the optimal strategy.

As shown in Figs. 2-5, in the search space one square corresponds to one pixel and to one unit length of a physical quantity; one agent occupies one square, and the unit time step of its movement is one square; one obstacle occupies one square, and an agent cannot enter the square where an obstacle is located.
In practice, the self-tuning and trial-and-error scheme of the algorithm to be tested is set by the algorithm itself; the present application provides, for reference, the total score of each task, the movement pattern of each agent, and the record of individual rewards or penalties.

Preferably, the multi-agent simulation method of the present application is not limited to reinforcement learning algorithms and can also be applied to other algorithms capable of self-learning.

Preferably, in step S4, the target of each movement of an agent is one of the squares above, below, to the left of or to the right of its current square.

Preferably, in step S4, the scoring includes a reward that adds points or a penalty that deducts points.

Preferably, in step S3, the observation range of an agent is the 3*3 area of squares centered on the square it occupies.

Preferably, in step S2, the number and positions of the agents and obstacles are determined according to the requirements of the algorithm to be tested or according to a preset scene.

This step provides a user-defined interface through which users can modify the scene to a certain extent according to their own needs, for example by changing the number of agents or the distribution of obstacles.

In the embodiment of the present invention, setting the search space to a limited 10*10 space ensures the validity and timeliness of the algorithm to be tested and reduces the waiting time of the training process. Each square corresponds simultaneously to one pixel and to one unit length of a physical quantity, which helps make pixel and non-pixel observations compatible during training and is convenient for user testing.
Embodiment 2:

Fig. 6 shows a multi-agent simulation platform using the above simulation method, provided by Embodiment 2 of the present invention and comprising:

an interactive environment unit, which constructs a search space 10 squares long and 10 squares wide and sets a predetermined number of agents and obstacles in the search space.

Specifically, the environment interaction unit is responsible for passing the agents' observations of the environment to the algorithm and for displaying the behavior decided by the algorithm on the visual interface. An agent observes environment information from the scene in the search space and passes the observed state to the interaction module. The interaction module uploads the observed state to the algorithm, the algorithm predicts a behavior through the trained model, the agent executes the behavior in the interaction module, and the scene returns a new observation, the reward for the behavior, and whether this training episode is finished.
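A minimal sketch of this observe, predict, execute, reward loop, assuming the GridWorld and render_pixels examples above and a hypothetical policy object with act() and learn() methods, might look as follows:

```python
# Sketch of the interaction loop between environment, agents and the algorithm under
# test. The policy interface (act/learn) is an assumption made for this illustration.
def run_episode(env, policies, max_steps=200):
    """One training episode: observe -> predict behavior -> execute -> reward/done."""
    total_reward = {agent_id: 0.0 for agent_id in env.agents}
    done = {agent_id: False for agent_id in env.agents}
    for _ in range(max_steps):
        for agent_id, policy in policies.items():
            if done[agent_id]:
                continue
            obs = render_pixels(env)              # or render_physical(env, agent_id)
            action = policy.act(obs)              # algorithm predicts the behavior
            _, reward, finished = env.step(agent_id, action)
            policy.learn(obs, action, reward)     # reward fed back to the algorithm
            total_reward[agent_id] += reward
            done[agent_id] = finished
        if all(done.values()):                    # episode ends when all agents finish
            break
    return total_reward
```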
a scene unit, electrically connected to the interactive environment unit, for providing the number and positions of the agents and obstacles in the search space for a preset scene.

As shown in Fig. 2, the scene unit of the simulation platform presets three experimental scenes commonly used for multiple agents, namely "pursuit-escape", "multi-target navigation" and "exchange positions", and provides a user-defined interface. A solid polygon represents an agent and a hollow polygon represents an agent's target. Agents with different tasks are represented by solid polygons of different shapes, such as the solid circle and the solid diamond in the figure; their movement targets are the corresponding hollow circle and hollow diamond.

In the scene unit, the scene is first initialized: the number, colors and other attributes of the agents, the positions of the start and end points in the scene, and the number and positions of the obstacles are determined. Next, the observation information of the agents is determined; the image-based and non-image-based observation information differ between scenes. Image-based observation information consists of the pixels of the squares around an agent, while non-image-based observation information consists of physical quantities reflecting the environment (such as positions, relative distances between agents, and relative distances between an agent and its target). Finally, it is judged whether a training episode has ended; note that an episode ends only when all agents in the scene have completed their tasks.
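An illustrative scene configuration for such initialization is sketched below; the dictionary layout, key names and coordinates are assumptions chosen for this example rather than values taken from the patent.

```python
# Hypothetical configuration of an "exchange positions" scene and a helper that
# builds the GridWorld sketch from it. All coordinates are arbitrary example values.
exchange_positions_scene = {
    "size": 10,
    "agents":  {"a0": {"start": (2, 0), "goal": (2, 9), "color": "red"},
                "a1": {"start": (2, 9), "goal": (2, 0), "color": "blue"}},
    "obstacles": [(1, 4), (1, 5), (3, 4), (3, 5)],   # narrow corridor both agents share
    "observation": "pixels",                         # or "physical"
    "observation_depth": 1,                          # 1 -> a 3*3 window around each agent
}

def build_env(cfg):
    """Instantiate the GridWorld sketch from the earlier example from a scene config."""
    return GridWorld(size=cfg["size"],
                     obstacles=set(cfg["obstacles"]),
                     agents={k: v["start"] for k, v in cfg["agents"].items()},
                     targets={k: v["goal"] for k, v in cfg["agents"].items()})
```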
an algorithm unit, electrically connected to the interactive environment unit, for loading the algorithm to be tested, receiving the observation information that the agents feed back to that algorithm, and outputting the algorithm's decision information on the agents' movement directions.

Specifically, the user adds the algorithm to be tested in the algorithm unit; through the interface of the environment interaction unit the algorithm obtains the observed state of the scene fed back by the agents, predicts behaviors and returns them to the agents, thereby guiding the agents' movement.

a scoring unit, electrically connected to the algorithm unit and the interactive environment unit respectively, for scoring each movement of an agent according to the task requirements of the algorithm to be tested until the agent completes its task. After each task is completed, the total task score is fed back to the algorithm to be tested; when the agents execute the task many times, the algorithm to be tested obtains the optimal strategy after continuous trial and error.

Specifically, the scoring unit has to design the agents' immediate reward, i.e. the numerical reward for each behavior of an agent. Whether the reward design reflects how good or bad an agent's behavior is determines the effectiveness of the optimized strategy.
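A possible shape for such an immediate-reward design is sketched below; the event names and numerical values are assumptions made for illustration, since the concrete rewards are left to the preset scenes (see the table referenced in Embodiment 3).

```python
# Illustrative immediate-reward design for the scoring unit; values are assumptions.
def immediate_reward(moved, hit_obstacle, reached_goal, collided_with_agent):
    """Return the numerical reward for one behavior of one agent."""
    if reached_goal:
        return 10.0           # large bonus: the task-completing behavior
    if hit_obstacle or collided_with_agent:
        return -1.0           # penalty: the move was invalid or unsafe
    if moved:
        return -0.1           # small step cost encourages short paths
    return -0.2               # idling is discouraged slightly more than moving
```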
In the search space, one square corresponds to one pixel and to one unit length of a physical quantity. By using the small square with this dual attribute of pixel and unit length, a single simulation platform can conveniently provide users with different kinds of environment state information, which facilitates user testing.

One agent occupies one square, and the unit time step of its movement is one square; one obstacle occupies one square, and an agent cannot enter the square where an obstacle is located.

Preferably, in the interactive environment unit, the target of each movement of an agent is one of the squares above, below, to the left of or to the right of its current square.

Preferably, the scoring unit's scoring of an agent's movement result includes a reward that adds points or a penalty that deducts points.

Preferably, in the interactive environment unit, the observation range of an agent is the 3*3 area of squares centered on the square it occupies.

For the observation state based on image pixels, as shown in Fig. 3, the solid circle and the solid diamond are agents with different tasks. The hollow diamond is the target of the agent represented by the solid diamond, and the hollow square enclosing a hollow circle is the target of the agent represented by the solid circle. The 3*3 hollow boxes around an agent represent its observation range: centered on the agent, the observation range is determined by the observation depth set by the user, and the pixels (three RGB channels) in each square of that range are taken as the agent's observed state. Fig. 3 shows a 3*3 observation state. Observation states based on image pixels are commonly used in game-related multi-agent reinforcement learning simulation platforms.
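A sketch of extracting this pixel-based observation from the GridWorld example, assuming an observation depth d (d = 1 gives the 3*3 window) and an illustrative RGB coding of the squares:

```python
# Sketch only: returns a (2d+1) x (2d+1) x 3 RGB window centered on the agent.
import numpy as np

def pixel_observation(env, agent_id, depth=1):
    def rgb_of(cell):
        # illustrative color coding; squares outside the space count as blocked
        if not env.in_bounds(cell) or cell in env.obstacles:
            return (0, 0, 0)          # obstacle or out of bounds
        if cell in env.targets.values():
            return (0, 255, 0)        # target point
        if cell in env.agents.values():
            return (255, 0, 0)        # an agent (including this one)
        return (255, 255, 255)        # passable square
    r, c = env.agents[agent_id]
    window = [[rgb_of((r + dr, c + dc)) for dc in range(-depth, depth + 1)]
              for dr in range(-depth, depth + 1)]
    return np.asarray(window, dtype=np.uint8)     # shape (3, 3, 3) for depth=1
```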
For the observation state based on physical quantities, the physical quantities commonly used as the observed state include the position of an agent, the relative positions between agents, and the relative position between an agent and its target point. For different scenes, the algorithm to be tested can choose different physical quantities to represent the observed state.
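A corresponding sketch of a physical-quantity observation vector; which quantities are included, and in what order, is a choice left to the algorithm to be tested, and the selection below is only an example:

```python
# Sketch only: a non-pixel observation built from positions measured in squares.
def physical_observation(env, agent_id):
    r, c = env.agents[agent_id]
    tr, tc = env.targets[agent_id]
    obs = [r, c, tr - r, tc - c]                       # own position, offset to target
    for other_id, (orr, occ) in env.agents.items():    # relative positions of other agents
        if other_id != agent_id:
            obs.extend([orr - r, occ - c])
    return obs
```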
Preferably, the scene unit's configuration of the number and positions of the agents and obstacles is determined according to the requirements of the algorithm to be tested or according to a preset scene.

Specifically, the user can change the scene settings of the search space according to the testing needs of the algorithm, mainly in two respects: (1) changing the distribution of obstacles in the scene; (2) changing the number of agents in the scene. Changing the distribution of obstacles increases the difficulty of the test. The solid circle and solid diamond shown in Fig. 4 represent agents with different tasks; the hollow diamond is the target of the solid circle, and the hollow square enclosing a hollow circle is the target of the solid diamond. The agents that need to exchange positions share a stretch of the route that both must pass through, so to complete the task they must learn cooperative navigation: one agent passes through the area first, and the other waits and passes once the area becomes traversable. Changing the number of agents in the scene tests the performance of the algorithm. As shown in Fig. 5, the solid hexagon, solid triangle, solid circle and solid diamond represent agents with different tasks; the hollow diamond is the target of the solid circle, the hollow square enclosing a hollow circle is the target of the solid diamond, the hollow hexagon is the target of the solid hexagon, and the hollow triangle is the target of the solid triangle. The number of agents that need to exchange positions increases from two to four, so the algorithm has to coordinate the strategies of four agents to complete the navigation task in an even narrower space. The global observation state and the joint action space become more complex, the task becomes harder to complete, and the performance requirements on the algorithm are higher. This customizability of the simulation platform can satisfy more of the users' needs.
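A sketch of such a customization, building on the scene-configuration example above; the helper name customize and the added coordinates are assumptions for illustration:

```python
# Hypothetical use of the user-defined interface: raise the difficulty of the preset
# scene by adding obstacles and increasing the number of agents from two to four.
import copy

def customize(scene, extra_obstacles=(), extra_agents=None):
    cfg = copy.deepcopy(scene)
    cfg["obstacles"].extend(extra_obstacles)         # (1) change the obstacle layout
    cfg["agents"].update(extra_agents or {})         # (2) change the number of agents
    return cfg

four_agent_scene = customize(
    exchange_positions_scene,
    extra_obstacles=[(5, 4), (5, 5)],
    extra_agents={"a2": {"start": (0, 4), "goal": (9, 4), "color": "green"},
                  "a3": {"start": (9, 4), "goal": (0, 4), "color": "yellow"}},
)
env = build_env(four_agent_scene)
```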
Embodiment 3:

The following table shows information such as the agent behavior states and the associated rewards for the three preset scenes applied in the search space.
[Table provided as an image in the original publication; it lists the behavior states and the corresponding rewards for the three preset scenes.]
The multi-agent simulation platform provided by the embodiments of the present invention is confined to a limited space (10*10 squares), which reduces the waiting time of the algorithm training process. The pixel and unit-length attributes are merged into the smallest unit of the simulation environment, a single square, which overcomes the limitation of existing methods that can only be based on one kind of state information (image pixels or physical quantities). In addition, a user-defined interface is provided, through which users can modify the scene to a certain extent according to their own needs so as to meet the testing requirements of the algorithm.

The above descriptions are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

  1. A multi-agent simulation method, characterized in that the method comprises the following steps:
    S1. constructing a search space 10 squares long and 10 squares wide;
    S2. setting a predetermined number of agents and obstacles in the search space;
    S3. the agents uploading the observation information formed by the search space, the obstacles and the other agents to an algorithm to be tested;
    S4. the algorithm to be tested guiding the agents to move, step by step, according to task requirements and the observation information; a scoring system scoring each movement of an agent according to the task requirements of the algorithm to be tested, until the agent completes the task;
    S5. feeding the total task score back to the algorithm to be tested; the agents executing the task many times, and the algorithm to be tested obtaining the optimal strategy after continuous trial and error;
    wherein, in the search space, one square corresponds to one pixel and to one unit length of a physical quantity; one agent occupies one square, and the unit time step of its movement is one square; one obstacle occupies one square, and an agent cannot enter the square where an obstacle is located.
  2. The method according to claim 1, characterized in that, in step S4, the target of each movement of an agent is one of the squares above, below, to the left of or to the right of its current square.
  3. The method according to claim 1, characterized in that, in step S4, the scoring comprises a reward that adds points or a penalty that deducts points.
  4. The method according to claim 1, characterized in that, in step S3, the observation range of an agent is the 3*3 area of squares centered on the square it occupies.
  5. The method according to claim 1, characterized in that, in step S2, the number and positions of the agents and obstacles are determined according to the requirements of the algorithm to be tested or according to a preset scene.
  6. A multi-agent simulation platform, characterized in that the platform comprises:
    an interactive environment unit, which constructs a search space 10 squares long and 10 squares wide and sets a predetermined number of agents and obstacles in the search space;
    a scene unit, electrically connected to the interactive environment unit, for providing the number and positions of the agents and obstacles in the search space for a preset scene;
    an algorithm unit, electrically connected to the interactive environment unit, for loading an algorithm to be tested, receiving the observation information that the agents feed back to that algorithm, and outputting the algorithm's decision information on the agents' movement directions;
    a scoring unit, electrically connected to the algorithm unit and the interactive environment unit respectively, for scoring each movement of an agent according to the task requirements of the algorithm to be tested until the agent completes the task, and for feeding the total task score back to the algorithm to be tested; the agents execute the task many times, and the algorithm to be tested obtains the optimal strategy after continuous trial and error;
    wherein, in the search space, one square corresponds to one pixel and to one unit length of a physical quantity; one agent occupies one square, and the unit time step of its movement is one square; one obstacle occupies one square, and an agent cannot enter the square where an obstacle is located.
  7. The platform according to claim 6, characterized in that, in the interactive environment unit, the target of each movement of an agent is one of the squares above, below, to the left of or to the right of its current square.
  8. The platform according to claim 6, characterized in that the scoring unit's scoring of an agent's movement result comprises a reward that adds points or a penalty that deducts points.
  9. The platform according to claim 6, characterized in that, in the interactive environment unit, the observation range of an agent is the 3*3 area of squares centered on the square it occupies.
  10. The platform according to claim 6, characterized in that the scene unit's configuration of the number and positions of the agents and obstacles is determined according to the requirements of the algorithm to be tested or according to a preset scene.
PCT/CN2020/138782 2020-12-11 2020-12-24 Multi-agent simulation method and platform using method WO2022120955A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011442726.2 2020-12-11
CN202011442726.2A CN114626175A (en) 2020-12-11 2020-12-11 Multi-agent simulation method and platform adopting same

Publications (1)

Publication Number Publication Date
WO2022120955A1 true WO2022120955A1 (en) 2022-06-16

Family

ID=81895881

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/138782 WO2022120955A1 (en) 2020-12-11 2020-12-24 Multi-agent simulation method and platform using method

Country Status (2)

Country Link
CN (1) CN114626175A (en)
WO (1) WO2022120955A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912356A (en) * 2023-09-13 2023-10-20 深圳大学 Hexagonal set visualization method and related device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010042256A9 (en) * 2008-05-30 2010-06-03 The University Of Memphis Research Foundation Methods of improved learning in simultaneous recurrent neural networks
US20180165602A1 (en) * 2016-12-14 2018-06-14 Microsoft Technology Licensing, Llc Scalability of reinforcement learning by separation of concerns
CN108563112A (en) * 2018-03-30 2018-09-21 南京邮电大学 Control method for emulating Soccer robot ball-handling
CN110135644A (en) * 2019-05-17 2019-08-16 北京洛必德科技有限公司 A kind of robot path planning method for target search
CN110977967A (en) * 2019-11-29 2020-04-10 天津博诺智创机器人技术有限公司 Robot path planning method based on deep reinforcement learning
CN111612126A (en) * 2020-04-18 2020-09-01 华为技术有限公司 Method and device for reinforcement learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010042256A9 (en) * 2008-05-30 2010-06-03 The University Of Memphis Research Foundation Methods of improved learning in simultaneous recurrent neural networks
US20180165602A1 (en) * 2016-12-14 2018-06-14 Microsoft Technology Licensing, Llc Scalability of reinforcement learning by separation of concerns
CN108563112A (en) * 2018-03-30 2018-09-21 南京邮电大学 Control method for emulating Soccer robot ball-handling
CN110135644A (en) * 2019-05-17 2019-08-16 北京洛必德科技有限公司 A kind of robot path planning method for target search
CN110977967A (en) * 2019-11-29 2020-04-10 天津博诺智创机器人技术有限公司 Robot path planning method based on deep reinforcement learning
CN111612126A (en) * 2020-04-18 2020-09-01 华为技术有限公司 Method and device for reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YAN, FENGTING ET AL.: "DP-Q(λ):Real-time Path Planning for Multi-agent in Large-scale Web3D Scene", JOURNAL OF SYSTEM SIMULATION, vol. 31, no. 1, 31 January 2019 (2019-01-31), pages 19 - 26, XP055941909, ISSN: 1004-731X *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912356A (en) * 2023-09-13 2023-10-20 深圳大学 Hexagonal set visualization method and related device
CN116912356B (en) * 2023-09-13 2024-01-09 深圳大学 Hexagonal set visualization method and related device

Also Published As

Publication number Publication date
CN114626175A (en) 2022-06-14

Similar Documents

Publication Publication Date Title
Han et al. A dynamic resource allocation framework for synchronizing metaverse with iot service and data
CN113495578B (en) Digital twin training-based cluster track planning reinforcement learning method
CN110865653B (en) Distributed cluster unmanned aerial vehicle formation transformation method
US7537523B2 (en) Dynamic player groups for interest management in multi-character virtual environments
CN106949893A (en) The Indoor Robot air navigation aid and system of a kind of three-dimensional avoidance
CN106991713A (en) Method and apparatus, medium, processor and the terminal of scene in more new game
Kumar et al. Federated control with hierarchical multi-agent deep reinforcement learning
CN110308740A (en) A kind of unmanned aerial vehicle group dynamic task allocation method towards mobile target tracking
CN107491086A (en) Unmanned plane formation obstacle avoidance and system under time-varying network topology
CN107918403A (en) A kind of implementation method of multiple no-manned plane flight path collaborative planning
Ding et al. Hierarchical reinforcement learning framework towards multi-agent navigation
CN111208842B (en) Virtual unmanned aerial vehicle and entity unmanned aerial vehicle mixed cluster task control system
WO2022121207A1 (en) Trajectory planning method and apparatus, device, storage medium, and program product
WO2022120955A1 (en) Multi-agent simulation method and platform using method
CN110162097A (en) Unmanned plane distribution formation control method based on energy consumption
Sui et al. Path planning of multiagent constrained formation through deep reinforcement learning
CN112114592B (en) Method for realizing autonomous crossing of movable frame-shaped barrier by unmanned aerial vehicle
CN108628294A (en) A kind of autonomous cooperative control system of multirobot target and its control method
CN106557611A (en) The Dynamic Load-balancing Algorithm research of distributed traffic network simulation platform and application
Wu et al. Digital twin-enabled reinforcement learning for end-to-end autonomous driving
CN115327499A (en) Radar target track simulation method based on load unmanned aerial vehicle
CN107247253A (en) A kind of phased-array radar beam dispath information visuallization system and method
CN117170410B (en) Control method for unmanned aerial vehicle formation flight and related products
CN117516562A (en) Road network processing method and related device
CN107543549A (en) Route planning method under the unilateral imaging constraints of unmanned plane

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20964884

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20964884

Country of ref document: EP

Kind code of ref document: A1