CN114626175A - Multi-agent simulation method and platform adopting same


Info

Publication number: CN114626175A
Application number: CN202011442726.2A
Authority: CN (China)
Prior art keywords: algorithm, agent, tested, unit, intelligent
Prior art date: 2020-12-11
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 刘延东, 韩东, 王鲁佳, 须成忠
Current Assignee: Shenzhen Institute of Advanced Technology of CAS
Original Assignee: Shenzhen Institute of Advanced Technology of CAS
Application filed 2020-12-11 by Shenzhen Institute of Advanced Technology of CAS
Priority to CN202011442726.2A (2020-12-11) and PCT/CN2020/138782
Publication of CN114626175A: 2022-06-14


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation

Abstract

The invention is applicable to the technical field of agent simulation, in particular to simulation for verifying multi-agent reinforcement learning algorithms, and relates to a multi-agent simulation method and a platform adopting the method. The method limits the search space to a finite 10 × 10 area, which ensures the effectiveness and timeliness of the algorithm to be tested and reduces the waiting time of the algorithm training process. Each cell corresponds simultaneously to one pixel point and to one unit length of a physical quantity, which makes the pixel and non-pixel observation states compatible during training and allows users to test, tune, and optimize algorithms conveniently. A simulation platform adopting the method achieves the same technical effects.

Description

Multi-agent simulation method and platform adopting same
Technical Field
The invention belongs to the technical field of agent simulation, is applicable to simulation for verifying multi-agent reinforcement learning algorithms, and particularly relates to a multi-agent simulation method and a platform adopting the method.
Background
A multi-agent simulation system consists of a group of agents that share, perceive, and interact with an environment; each agent interacts with the environment independently, takes actions according to its individual goals, and thereby influences the environment. The real world offers many examples of such systems, such as traffic congestion control, resource scheduling and management, and base-station communication transmission.
Existing multi-agent simulation platforms are mainly built around game interaction scenes, with state spaces based on images. Although pixel information provides a partial observation of the environment, its dimensionality is large and it spans many channels. Even after image preprocessing such as cropping, downscaling, or changing the number of channels, the state dimension remains large, which places high demands on computer hardware and lengthens the time needed to verify and test an algorithm. Meanwhile, the state information of a multi-agent simulation environment is of a single kind: interactive environments based on games generally observe the pixels of one image frame, while environments based on object motion generally observe information such as the position, velocity, and relative distance of objects. An algorithm must therefore design different network input dimensions according to the form of state information the environment provides, and this adjustment process is cumbersome and error-prone.
Disclosure of Invention
The invention aims to provide a multi-agent simulation method compatible with both pixel and non-pixel observation, and a platform adopting the method, so as to solve the technical problems that the large number of input dimensions makes adjusting the algorithm to be tested cumbersome, and that switching between pixel observation and non-pixel observation is error-prone.
In one aspect, the present invention provides a multi-agent simulation method, comprising the steps of:
S1, constructing a search space with a length of 10 cells and a width of 10 cells;
S2, setting a preset number of agents and obstacles in the search space;
S3, each agent uploads observation information formed from the search space, the obstacles, and the other agents to an algorithm to be tested;
S4, the algorithm to be tested guides the agent to move step by step according to the task requirements and the observation information; a scoring system scores each movement of the agent against the task requirements of the algorithm to be tested until the agent completes the task;
S5, feeding the total task score back to the algorithm to be tested; the agents execute a plurality of tasks, and an optimal strategy is obtained after the algorithm to be tested is repeatedly tested and debugged;
In the search space, one cell corresponds both to one pixel point and to one unit length of a physical quantity; one of said agents occupies one of said cells, and the unit time step of its movement is one cell; one of said cells is occupied by one of said obstacles, and no agent can enter the cell in which an obstacle is located.
In another aspect, the present invention further provides a multi-agent simulation platform, comprising:
the interactive environment unit, used for constructing a search space with a length of 10 cells and a width of 10 cells, and for setting a preset number of agents and obstacles in the search space;
the scene unit, electrically connected with the interactive environment unit and used for providing the number and positions of the agents and obstacles in the search space when a preset scene is selected;
the algorithm unit, electrically connected with the interactive environment unit and used for loading the algorithm to be tested, receiving the observation information fed back by the agents to the algorithm to be tested, and outputting the decision information of the algorithm to be tested on the direction of motion of the agents;
the scoring unit, electrically connected with both the algorithm unit and the interactive environment unit and used for scoring each movement of the agent against the task requirements of the algorithm to be tested until the agent completes the task, and for feeding the total task score back to the algorithm to be tested; the agents execute a plurality of tasks, and an optimal strategy is obtained after the algorithm to be tested is repeatedly tested and debugged.
In the search space, one cell corresponds both to one pixel point and to one unit length of a physical quantity; an agent occupies one cell, and the unit time step of its movement is one cell; an obstacle occupies one cell, and no agent can enter the cell in which an obstacle is located.
The invention relates to a multi-agent simulation method and a platform adopting the method. The minimum unit provided by the method corresponds both to one pixel point and to one unit length of a physical quantity, so that one cell can serve either as one pixel point (target points, passable areas, and obstacles taking different pixel values) or as the minimum unit of distance (describing the position of an agent, the distance between an agent and an obstacle, and the distance between an agent and a target point). The cell thus carries the dual attributes of pixel point and unit length. On a single simulation platform, different forms of environment state information can be provided to the user; that is, pixel observation and non-pixel observation are compatible, which makes algorithm testing convenient.
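As an illustration of this dual attribute, the sketch below reads the same cell abstraction in both ways; the pixel values and function names are hypothetical.

    PIXELS = {"free": (255, 255, 255), "obstacle": (0, 0, 0), "target": (0, 255, 0)}

    def cell_as_pixel(kind):
        """The cell viewed as one pixel: an RGB value per cell type (illustrative)."""
        return PIXELS[kind]

    def cell_as_distance(a, b):
        """Cells viewed as unit lengths: Manhattan distance between two cells."""
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    print(cell_as_pixel("obstacle"))         # (0, 0, 0)
    print(cell_as_distance((2, 3), (7, 3)))  # 5 unit lengths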
Drawings
FIG. 1 is a flow chart of an implementation of a multi-agent simulation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a preset simulation scenario provided by the present invention;
FIG. 3 is a schematic diagram of pixel-based observed states of a multi-agent in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a layout of a user-defined scene in an embodiment of the present invention;
FIG. 5 is a schematic diagram of a scene with an increased number of agents in an embodiment of the invention;
FIG. 6 is a functional block diagram of a multi-agent simulation platform provided in the second embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to fig. 1 to 6 and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The following detailed description of specific implementations of the present invention is provided in conjunction with specific embodiments:
the first embodiment is as follows:
FIG. 1 shows a multi-agent simulation method according to an embodiment of the present invention, which includes the following steps:
S1, constructing a search space with a length of 10 cells and a width of 10 cells;
This step confines the search space to a limited area, which reduces the pixel dimensionality and therefore the waiting time of the algorithm training process.
S2, setting a preset number of agents and obstacles in the search space;
S3, each agent uploads observation information formed from the search space, the obstacles, and the other agents to the algorithm to be tested;
S4, the algorithm to be tested guides the agent to move step by step according to the task requirements and the observation information; a scoring system scores each movement of the agent against the task requirements of the algorithm to be tested until the agent completes the task;
S5, feeding the total task score back to the algorithm to be tested; the agents execute a plurality of tasks, and an optimal strategy is obtained after the algorithm to be tested is repeatedly tested and debugged;
as shown in fig. 2-5, in the search space, one square corresponds to one pixel point and the unit length of one physical quantity; an agent occupies a square grid, and the unit time step of the movement of the agent is a square grid; an obstacle occupies a square grid and the agent cannot enter the square grid in which the obstacle is located.
In actual operation, the self-optimization and trial-and-error schemes of the algorithm to be tested are set by the algorithm, and the total score of each task and the motion mode and the fractional reward or punishment record of each agent are provided for the algorithm to be tested to be referred.
Preferably, the multi-agent simulation method is not limited to reinforcement learning algorithms; it can also be applied to other algorithms capable of self-learning.
Preferably, in step S4, each movement of the agent targets one of the four cells above, below, to the left of, or to the right of its current cell.
Preferably, in step S4, the scoring includes a score bonus or a score penalty.
Preferably, in step S3, the observation range of the agent is the 3 × 3 cell area centered on the cell in which the agent is located.
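Extracting such a window might look like the following sketch, assuming the search space is stored as a list of rows of integer cell types; padding out-of-bounds cells with -1 is an illustrative choice, not the platform's convention.

    def observation_window(grid, pos, depth=1):
        """Return the (2*depth+1) x (2*depth+1) window centered on pos; depth=1 gives 3 x 3."""
        x0, y0 = pos
        window = []
        for dy in range(-depth, depth + 1):
            row = []
            for dx in range(-depth, depth + 1):
                x, y = x0 + dx, y0 + dy
                if 0 <= y < len(grid) and 0 <= x < len(grid[0]):
                    row.append(grid[y][x])
                else:
                    row.append(-1)  # padding for cells outside the search space
            window.append(row)
        return window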
Preferably, in step S2, the number and positions of the agents and obstacles are determined according to the requirements of the algorithm to be tested or a preset scene.
This step provides a user-defined interface through which the user can modify the scene to a certain extent according to their own requirements, such as changing the number of agents or the distribution of obstacles.
In the embodiment of the invention, limiting the search space to a finite 10 × 10 area ensures the effectiveness and timeliness of the algorithm to be tested and reduces the waiting time of the algorithm training process. Each cell corresponds simultaneously to one pixel point and to one unit length of a physical quantity, which makes pixel observation and non-pixel observation compatible during training and convenient for user testing.
Embodiment two:
FIG. 6 shows a multi-agent simulation platform adopting the simulation method according to the second embodiment of the present invention, which includes:
The interactive environment unit is used for constructing a search space with a length of 10 cells and a width of 10 cells, and for setting a preset number of agents and obstacles in the search space.
Specifically, the environment interaction unit is responsible for passing the agents' observations of the environment to the algorithm and for expressing the behaviors decided by the algorithm on the visual interface. An agent observes environmental information from the scene of the search space and transmits the observed state to the interaction module. The interaction module uploads the observed state to the algorithm, the algorithm predicts a behavior through its trained model, the agent executes the behavior in the interaction module, obtains a new observation from the scene, and receives information such as the reward for the behavior and whether the training episode has ended.
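The loop described above might be sketched as follows, assuming hypothetical env and algorithm objects; reset, step, and predict are illustrative names rather than the platform's actual API.

    def run_episode(env, algorithm):
        obs = env.reset()                         # agents observe the initial scene
        done = False
        total_reward = 0.0
        while not done:
            action = algorithm.predict(obs)       # the trained model predicts a behavior
            obs, reward, done = env.step(action)  # new observation, reward, end flag
            total_reward += reward
        return total_reward                       # total task score fed back to the algorithm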
The scene unit is electrically connected with the interactive environment unit and is used for providing the number and positions of the agents and obstacles in the search space when a preset scene is selected.
As shown in FIG. 2, three common multi-agent experimental scenes, namely "pursuit-escape", "multi-target navigation", and "position exchange", are preset in the scene unit of the simulation platform, and a user-defined interface is provided. Solid polygons represent agents, and hollow polygons represent the agents' targets. Agents with different tasks are represented by solid polygons of different shapes, such as the solid circle and the solid diamond in the figure; their movement targets are the corresponding hollow circle and hollow diamond.
In the scene unit, the scene is first initialized: the number of agents and attributes such as color are determined, along with the positions of the start and end points in the scene and the number and positions of the obstacles. Then the observation information of the agents is determined; in different scenes, image-based and non-image-based observation information differ. Image-based observation information consists of the pixels of the cells around the agent, while non-image-based observation information consists of physical quantities reflecting environmental information (e.g., position, relative distance between agents, relative distance between an agent and its target). Finally, whether a training episode has ended is judged: when all agents in the scene have completed their tasks, the training episode ends.
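A minimal sketch of this initialization and the end-of-episode judgment, assuming a hypothetical Scene container; the field names and the example values are illustrative.

    from dataclasses import dataclass, field

    @dataclass
    class Scene:
        agents: dict = field(default_factory=dict)    # agent id -> (start, goal, color)
        obstacles: set = field(default_factory=set)   # cells no agent may enter
        image_based: bool = False                     # pixel vs physical-quantity observation

        def done(self, positions):
            # the training episode ends once every agent has reached its goal
            return all(positions[i] == goal
                       for i, (_, goal, _) in self.agents.items())

    scene = Scene(agents={0: ((0, 0), (9, 9), "red")}, obstacles={(4, 4), (4, 5)})
    print(scene.done({0: (9, 9)}))  # True: the only agent is at its goal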
The algorithm unit is electrically connected with the interactive environment unit and is used for loading the algorithm to be tested, receiving the observation information fed back by the agents to the algorithm to be tested, and outputting the decision information of the algorithm to be tested on the direction of motion of the agents.
Specifically, the user adds the algorithm to be tested in the algorithm unit; through the interface of the environment interaction unit, the algorithm obtains the observed state of the scene fed back by the agent, predicts a behavior, and returns it to the agent to guide its movement.
The scoring unit is electrically connected with both the algorithm unit and the interactive environment unit and is used for scoring each movement of the agent against the task requirements of the algorithm to be tested until the agent completes the task. After each task is finished, the total task score is fed back to the algorithm to be tested; as the agents execute a plurality of tasks, the algorithm to be tested is repeatedly tested and debugged to obtain an optimal strategy.
Specifically, the scoring unit designs the instant reward of the agent, i.e., a numerical reward for each behavior of the agent. Whether the reward design reflects the quality of the agent's behavior determines the effectiveness of the optimized strategy.
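An instant reward along these lines might be sketched as follows; the numeric values and the progress-shaping term are illustrative assumptions, not the reward table of the preset scenes.

    def instant_reward(old_pos, new_pos, goal, hit_obstacle):
        if hit_obstacle:
            return -1.0       # penalty: attempted to enter a blocked cell
        if new_pos == goal:
            return 10.0       # bonus: the task is completed
        old_d = abs(old_pos[0] - goal[0]) + abs(old_pos[1] - goal[1])
        new_d = abs(new_pos[0] - goal[0]) + abs(new_pos[1] - goal[1])
        return 0.1 if new_d < old_d else -0.1  # shaped by progress toward the goal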
In the search space, one cell corresponds both to one pixel point and to one unit length of a physical quantity. Using the cell as a carrier of the dual attributes of pixel point and unit length makes it convenient to provide different forms of environment state information to users on a single simulation platform, facilitating user testing.
An agent occupies one cell, and the unit time step of its movement is one cell; an obstacle occupies one cell, and no agent can enter the cell in which an obstacle is located.
Preferably, in the interactive environment unit, each movement of an agent targets one of the four cells above, below, to the left of, or to the right of its current cell.
Preferably, the scoring unit's scoring of an agent's movement results includes a score bonus or a score penalty.
Preferably, in the interactive environment unit, the observation range of an agent is the 3 × 3 cell area centered on the cell in which the agent is located.
For the observed state based on image pixels, as shown in FIG. 3, the solid circle and the solid diamond are agents with different tasks. The hollow diamond is the target of the agent represented by the solid diamond, and the hollow square frame is the target of the agent represented by the solid circle. The 3 × 3 hollow boxes around an agent represent its observation range: centered on the agent, the range is determined by the observation depth set by the user. The pixels in each cell (three RGB channels) are taken as the observed state of the agent; FIG. 3 shows a 3 × 3 observed state. Observed states based on image pixels are commonly used on game-related multi-agent reinforcement learning simulation platforms.
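Mapping a window of cell types to an RGB observed state might look like the sketch below; the colors per cell type are illustrative and would come from the scene configuration in a real platform.

    import numpy as np

    COLORS = {0: (255, 255, 255),  # passable cell
              1: (0, 0, 0),        # obstacle
              2: (255, 0, 0),      # agent
              3: (0, 255, 0)}      # target

    def render_observation(window):
        """Map an n x n window of cell types to an n x n x 3 RGB array."""
        h, w = len(window), len(window[0])
        img = np.zeros((h, w, 3), dtype=np.uint8)
        for y in range(h):
            for x in range(w):
                img[y, x] = COLORS.get(window[y][x], (128, 128, 128))
        return img  # the agent's image-based observed state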
Physical quantities commonly used as the observed state include the position of an agent, the relative position between agents, and the relative position between an agent and a target point. For different scenes, the algorithm to be tested may select different physical quantities to represent the observed state.
Preferably, the scene unit's configuration of the number and positions of the agents and obstacles is determined according to the requirements of the algorithm to be tested or a preset scene.
Specifically, the user may change the scene settings of the search space according to the testing requirements of the algorithm, mainly in two respects: (1) changing the distribution of obstacles in the scene; (2) changing the number of agents in the scene. Changing the distribution of obstacles can increase the difficulty of the test. As shown in FIG. 4, the solid circle and the solid diamond represent agents with different tasks; the hollow diamond is the target of the solid circle, and the hollow square frame is the target of the solid diamond. There is an area that the agents must pass through, and it is the same area in which they must exchange positions. To complete the task, the agents must learn collaborative navigation: one agent passes through the area first, while the other waits and passes through once the area is clear. Changing the number of agents in a scene tests the performance of the algorithm. As shown in FIG. 5, the solid hexagon, solid triangle, solid circle, and solid diamond represent agents with different tasks; the hollow diamond is the target of the solid circle, the hollow square frame is the target of the solid diamond, the hollow hexagon is the target of the solid hexagon, and the hollow triangle is the target of the solid triangle. The number of agents that need to exchange positions increases from two to four. The algorithm must coordinate the strategies of four agents and complete the navigation task in a narrower space. The global observed state and the joint action space become more complex, the task becomes harder to complete, and the performance requirements on the algorithm rise accordingly. This customizability of the simulation platform can meet more user requirements.
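These two kinds of modification might be exposed through a factory function like the hypothetical make_scene below; its parameter names and the capacity check are illustrative.

    def make_scene(n_agents=2, obstacle_cells=frozenset(), task="position_exchange"):
        # rough sanity check: starts, goals, and obstacles must fit in the 10 x 10 space
        if n_agents * 2 + len(obstacle_cells) > 100:
            raise ValueError("scene exceeds the 10 x 10 search space")
        return {"agents": n_agents, "obstacles": set(obstacle_cells), "task": task}

    easy = make_scene(n_agents=2, obstacle_cells={(4, 4), (4, 5), (4, 6)})
    hard = make_scene(n_agents=4, obstacle_cells={(4, y) for y in range(3, 8)})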
Embodiment three:
The following table shows the behavior states of the agents and the associated rewards in the three preset scenes of the search space.
[Table: agent behavior states and associated rewards for the three preset scenes; rendered as an image (BDA0002830641590000081) in the original publication.]
The multi-agent simulation platform provided by the embodiment of the invention confines the search space to a limited area (a 10 × 10 grid), which reduces the waiting time of the algorithm training process. Fusing the pixel-point and unit-length attributes into the minimum unit cell of the simulation environment overcomes the drawback that existing methods can only be based on one kind of state information (image pixels or physical quantities). Meanwhile, a user-defined interface is provided so that users can modify the scene to a certain extent according to their own requirements, meeting the testing needs of the algorithm.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A multi-agent simulation method, characterized in that the method comprises the steps of:
S1, constructing a search space with a length of 10 cells and a width of 10 cells;
S2, setting a preset number of agents and obstacles in the search space;
S3, each agent uploads observation information formed from the search space, the obstacles, and the other agents to an algorithm to be tested;
S4, the algorithm to be tested guides the agent to move step by step according to the task requirements and the observation information; a scoring system scores each movement of the agent against the task requirements of the algorithm to be tested until the agent completes the task;
S5, feeding the total task score back to the algorithm to be tested; the agents execute a plurality of tasks, and an optimal strategy is obtained after the algorithm to be tested is repeatedly tested and debugged;
in the search space, one cell corresponds both to one pixel point and to one unit length of a physical quantity; one of said agents occupies one of said cells, and the unit time step of its movement is one cell; one of said cells is occupied by one of said obstacles, and no agent can enter the cell in which an obstacle is located.
2. The method of claim 1, wherein in step S4, each movement of the agent targets one of the four cells above, below, to the left of, or to the right of its current cell.
3. The method of claim 1, wherein in step S4, the scoring comprises: a score bonus or a score penalty.
4. The method according to claim 1, wherein in step S3, the observation range of the agent is the 3 × 3 cell area centered on the cell in which the agent is located.
5. The method according to claim 1, wherein in step S2, the number and positions of the agents and obstacles are determined according to the requirements of the algorithm to be tested or a preset scene.
6. A multi-agent simulation platform, the platform comprising:
an interactive environment unit, for constructing a search space with a length of 10 cells and a width of 10 cells, and for setting a preset number of agents and obstacles in the search space;
a scene unit, electrically connected with the interactive environment unit, for providing the number and positions of the agents and obstacles in the search space when a preset scene is selected;
an algorithm unit, electrically connected with the interactive environment unit, for loading the algorithm to be tested, receiving the observation information fed back by the agents to the algorithm to be tested, and outputting the decision information of the algorithm to be tested on the direction of motion of the agents;
a scoring unit, electrically connected with both the algorithm unit and the interactive environment unit, for scoring each movement of the agent against the task requirements of the algorithm to be tested until the agent completes the task, and for feeding the total task score back to the algorithm to be tested; the agents execute a plurality of tasks, and an optimal strategy is obtained after the algorithm to be tested is repeatedly tested and debugged;
in the search space, one cell corresponds both to one pixel point and to one unit length of a physical quantity; one of said agents occupies one of said cells, and the unit time step of its movement is one cell; one of said cells is occupied by one of said obstacles, and no agent can enter the cell in which an obstacle is located.
7. The platform of claim 6, wherein in the interactive environment unit, each movement of an agent targets one of the four cells above, below, to the left of, or to the right of its current cell.
8. The platform of claim 6, wherein the scoring unit's scoring of an agent's movement result comprises: a score bonus or a score penalty.
9. The platform of claim 6, wherein in the interactive environment unit, the observation range of an agent is the 3 × 3 cell area centered on the cell in which the agent is located.
10. The platform of claim 6, wherein the scene unit's configuration of the number and positions of the agents and obstacles is determined according to the requirements of the algorithm to be tested or a preset scene.
CN202011442726.2A 2020-12-11 2020-12-11 Multi-agent simulation method and platform adopting same Pending CN114626175A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011442726.2A CN114626175A (en) 2020-12-11 2020-12-11 Multi-agent simulation method and platform adopting same
PCT/CN2020/138782 WO2022120955A1 (en) 2020-12-11 2020-12-24 Multi-agent simulation method and platform using method


Publications (1)

Publication Number Publication Date
CN114626175A true CN114626175A (en) 2022-06-14

Family

ID=81895881


Country Status (2)

Country Link
CN (1) CN114626175A (en)
WO (1) WO2022120955A1 (en)



Also Published As

Publication number Publication date
WO2022120955A1 (en) 2022-06-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination