CN114489043B

CN114489043B - Multi-agent path planning method and device, electronic equipment and storage medium

Info

Publication number: CN114489043B
Application number: CN202111602040.XA
Authority: CN
Inventors: 芦维宁; 戴汉奇; 杨君; 陈章; 梁斌
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2021-12-24
Filing date: 2021-12-24
Publication date: 2024-02-09
Anticipated expiration: 2041-12-24
Also published as: CN114489043A

Abstract

The application relates to the technical field of multi-agent collaborative planning, and in particular to a multi-agent path planning method and device, an electronic device and a storage medium. The method comprises the following steps: collecting the perception information of each agent of a plurality of agents within its perception range; imaging the perception information of each agent within the perception range to generate a perception image; and performing feature extraction and information aggregation on the perception image with a composite neural network so as to map the perception information to a target action policy, generating predicted actions of each agent at a plurality of time steps based on the target action policy, generating an optimal planned path from the predicted actions at the plurality of time steps, and controlling each agent to act according to the optimal planned path. This solves the problem of how to plan collaboratively for multiple agents when information is only partially known.

Description

Multi-agent path planning method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of multi-agent collaborative planning technologies, and in particular, to a multi-agent path planning method, device, electronic apparatus, and storage medium.

Background

After many years of research and development, multi-agent systems have found applications in a wide range of fields. In the military field, examples include cooperative reconnaissance and strike by unmanned aerial vehicle swarms and joint penetration and guidance of multi-warhead missiles; in the civil field, examples include cooperative sorting in logistics warehousing and joint search and rescue in emergency response. Multi-agent collaborative planning is an important branch of multi-agent systems: it is the class of problems that seek an optimal set of collision-free paths for multiple agents from their starting locations to their target locations.

Depending on whether prior information is available, multi-agent collaborative planning can be divided into planning with completely known information and planning with incompletely known information. Compared with the completely known case, collaborative planning with incompletely known information requires a higher level of intelligence and applies to a wider range of real scenarios, so it has considerable research significance and application prospects. However, it is harder to design and computationally more complex, and has long been a difficult research problem.

Disclosure of Invention

The application provides a multi-agent path planning method and device, an electronic device and a storage medium, so as to solve the problem of how to plan collaboratively for multiple agents when information is only partially known.

An embodiment of a first aspect of the present application provides a multi-agent path planning method, including the following steps: collecting the perception information of each intelligent agent in the multiple intelligent agents in a perception range; performing imaging processing on the perception information of each intelligent agent in the perception range to generate a perception image; and performing feature extraction and information aggregation on the perceived image based on a composite neural network, mapping the perceived information into a target action strategy, generating predicted actions of each intelligent agent at a plurality of moments based on the target action strategy, generating an optimal planning path according to the predicted actions at the plurality of moments, and controlling each intelligent agent to act according to the optimal planning path.

Further, the composite neural network is composed of a CNN (convolutional neural network), GraphSAGE (Graph SAmple and aggreGatE, a graph neural network) and an MLP (multilayer perceptron).

Further, the feature extraction and information aggregation are performed on the perceived image based on the composite neural network to map the perceived information into a target action policy, including: extracting features of the perceived image by using the CNN to obtain a first feature tensor; performing feature tensor aggregation on the first feature tensor by utilizing the GraphSAGE to obtain a second feature tensor; the second feature tensor is input into the MLP to be mapped to a target action strategy based on a probability distribution.

Further, the sensing information includes position information of other agents within the perception range of the agent, position information of the target point or projection of the target point on the boundary of the perception range, and position information of an obstacle within the perception range of the agent.

Further, performing imaging processing on the perception information of each agent in the perception range to generate a perception image includes: generating a state channel according to the position information of the other agents; generating a target channel according to the position information of the target point or the projection of the target point on the boundary of the perception range; generating an obstacle channel according to the obstacle position information; and generating the state channel, the target channel and the obstacle channel on a binary image to obtain the perceived image.

Further, before controlling each agent to act according to the optimal planned path, the method further includes: judging whether the predicted action of the intelligent body is collision action or position exchange action; and if the predicted action is the collision action or the position exchange action, executing a collision avoidance protection mechanism, and replacing the collision action or the position exchange action by using an idle action.

An embodiment of a second aspect of the present application provides a multi-agent path planning apparatus, including: the acquisition module is used for acquiring the perception information of each intelligent agent in the multiple intelligent agents in the perception range; the processing module is used for carrying out imaging processing on the perception information of each intelligent agent in the perception range to generate a perception image; the planning module is used for carrying out feature extraction and information aggregation on the perceived image based on a composite neural network, mapping the perceived information into a target action strategy, generating predicted actions of each intelligent agent at a plurality of moments based on the target action strategy, generating an optimal planning path according to the predicted actions at the plurality of moments, and controlling each intelligent agent to act according to the optimal planning path.

Further, the composite neural network consists of a convolutional neural network (CNN), the graph neural network GraphSAGE and a multilayer perceptron (MLP).

Further, the planning module is further configured to perform feature extraction on the perceived image by using the CNN to obtain a first feature tensor; performing feature tensor aggregation on the first feature tensor by utilizing the GraphSAGE to obtain a second feature tensor; the second feature tensor is input into the MLP to be mapped to a target action strategy based on a probability distribution.

Further, the processing module is further configured to generate a state channel according to the position information of the other agents; generate a target channel according to the position information of the target point or the projection of the target point on the boundary of the perception range; generate an obstacle channel according to the obstacle position information within the perception range of the agent; and generate the state channel, the target channel and the obstacle channel on a binary image to obtain the perceived image.

Further, the apparatus further comprises a collision avoidance module, which is used for judging, before each agent is controlled to act according to the optimal planned path, whether the predicted action of each agent is a collision action or a position exchange action; and, if the predicted action is a collision action or a position exchange action, executing a collision avoidance protection mechanism that replaces the collision action or the position exchange action with an idle action.

An embodiment of a third aspect of the present application provides an electronic device, including: a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the multi-agent path planning method according to the above embodiments.

An embodiment of a fourth aspect of the present application provides a computer-readable storage medium on which a computer program is stored, the program being executed by a processor to implement the multi-agent path planning method described in the above embodiments.

Therefore, the application has at least the following beneficial effects:

Collaborative planning is carried out by the composite neural network on the basis of the perception information of the agents, so that multi-agent collaborative planning is achieved when information is only partially known. Within a limited time, each agent can find an optimal or suboptimal collision-free path from its starting point to its target point. The agents can therefore reach their target points efficiently and without collision, the design difficulty and computational complexity are reduced, and the efficiency of multi-agent collaborative planning is improved. This solves the technical problem of how to plan collaboratively for multiple agents when information is only partially known.

Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a flow chart of a multi-agent path planning method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a composite network structure consisting of CNN, GraphSAGE and MLP provided in accordance with an embodiment of the present application;

FIG. 3 is a schematic diagram of an imaging processing method of single agent sensing information according to an embodiment of the present application;

FIG. 4 is a schematic diagram showing a relationship between a success rate θ of a simulation experiment and the number of agents according to an embodiment of the present application;

FIG. 5 is a schematic diagram showing a relationship between the time-consumption increment ratio of a simulation experiment and the number of agents according to an embodiment of the present application;

FIG. 6 is a block diagram of a multi-agent path planning apparatus provided in accordance with an embodiment of the present application;

FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present application and are not to be construed as limiting the present application.

The key to the difficulty described in the background is how to handle cooperation and information sharing among the agents. The communication network among the agents can be regarded as a non-Euclidean graph, and a graph neural network has the advantage of being able to learn from non-Euclidean structured data: it can aggregate the complex structural information in the communication network while retaining rich attribute information. The embodiments of the application aim to solve multi-agent collaborative planning when information is only partially known, and therefore provide a multi-agent collaborative planning method based on a graph neural network.

The multi-agent path planning method and device, electronic device and storage medium of the embodiments of the present application are described below with reference to the accompanying drawings. To address the problem of how to plan collaboratively for multiple agents when information is only partially known, the application provides a multi-agent path planning method. In this method, collaborative planning is carried out by a composite neural network on the basis of the perception information of the agents, so that within a limited time each agent can find an optimal or suboptimal collision-free path from its starting point to its target point. The agents can therefore reach their target points efficiently and without collision, the design difficulty and computational complexity are reduced, and the efficiency of multi-agent collaborative planning is improved. This solves the technical problem of how to plan collaboratively for multiple agents when information is only partially known.

Specifically, fig. 1 is a flow chart of a multi-agent path planning method according to an embodiment of the present application.

As shown in fig. 1, the multi-agent path planning method includes the following steps:

In step S101, sensing information of each of the multiple agents in a sensing range is collected.

The sensing information comprises position information of other agents in the agent sensing range, position information of a target point or projection of the target point on the boundary of the sensing range, and position information of an obstacle in the agent sensing range.

It can be understood that the embodiment of the present application may sample the sensing information of the multiple agents. As shown in FIG. 2, the number of agents sampled in layer 1 is $N_1$, the number of agents sampled for each of those agents in layer 2 is $N_2$, and layer 2 therefore samples $N_1 \times N_2$ agents in total.
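
The two-layer sampling described above can be sketched as follows. This is only an illustrative sketch: the function name `sample_two_hop`, the dictionary representation of the communication graph and the use of sampling with replacement (so that the sample sizes stay fixed at N_1 and N_1 × N_2) are assumptions, not details stated in the application.

```python
import random


def sample_two_hop(graph, center, n1, n2, seed=0):
    """Sample n1 neighbours of `center`, then n2 neighbours of each of them.

    `graph` maps an agent id to the list of agents it can communicate with.
    Sampling with replacement keeps the sample sizes fixed even when an agent
    has fewer neighbours than requested (an assumption of this sketch).
    """
    rng = random.Random(seed)

    def draw(node, k):
        neighbours = graph.get(node, [])
        if not neighbours:
            return [node] * k  # isolated agent: fall back to the agent itself
        return [rng.choice(neighbours) for _ in range(k)]

    layer1 = draw(center, n1)                # N_1 agents
    layer2 = [draw(u, n2) for u in layer1]   # N_1 groups of N_2 agents
    return layer1, layer2


graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: []}
layer1, layer2 = sample_two_hop(graph, center=0, n1=3, n2=2)
```

With `n1=3` and `n2=2`, layer 1 holds 3 sampled agents and layer 2 holds 3 × 2 = 6 sampled agents in total.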

It should be noted that the goal of multi-agent collaborative planning is as follows: on a two-dimensional grid plane, each agent finds, within a limited time, an optimal or suboptimal collision-free path from its starting point to its target point.

The simulation environment of the embodiment of the application is a two-dimensional grid plane of size $W \times H$ that contains $N$ agents $V = \{v_1, \dots, v_N\}$, $N$ target points $G = \{G_1, \dots, G_N\}$ and a number of randomly scattered obstacles. Each agent has a perception radius $r_{ob}$, and agent $v_i$ holds a perceived image at time $t$ whose width and height are $W_{OB}$ and $H_{OB}$ respectively. Each agent has a communication radius $r_{com}$: when the distance between agent $v_i$ and agent $v_j$ does not exceed $r_{com}$, i.e. $\|p_i - p_j\| \le r_{com}$, agent $v_i$ and agent $v_j$ can communicate with each other; otherwise they cannot. A communication network graph of the multiple agents at time $t$ is defined accordingly, in which $\varepsilon_t$ denotes the set of edges connecting agents that can communicate with each other. In summary: $W$, $H$ are the width and height of the two-dimensional grid plane; $V = \{v_1, \dots, v_N\}$ and $G = \{G_1, \dots, G_N\}$ are the set of $N$ agents and the set of target points corresponding to the agents; $r_{ob}$, $r_{com}$ are the perception radius and communication radius of an agent; $W_{OB}$, $H_{OB}$ are the width and height of the perceived image of agent $v_i$; $p_i$, $p_j$ are the position information of agents $v_i$ and $v_j$; and $\varepsilon_t$ is the set of edges between communicable agents in the communication network graph at time $t$.
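
As an illustration of the communication rule above, the following minimal sketch builds the edge set of the communication network graph from agent positions. The function name `communication_edges` is hypothetical, and the Euclidean norm is assumed for ‖p_i − p_j‖ because the text does not say which norm is used.

```python
import math


def communication_edges(positions, r_com):
    """Return the undirected edges (i, j), i < j, with ||p_i - p_j|| <= r_com.

    `positions` is a list of (x, y) grid coordinates, one per agent.
    The Euclidean norm is an assumption of this sketch.
    """
    edges = set()
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= r_com:
                edges.add((i, j))
    return edges


positions = [(0, 0), (3, 4), (10, 10)]
edges = communication_edges(positions, r_com=5)
```

Here agents 0 and 1 are exactly 5 cells apart and can communicate, while agent 2 is out of range of both.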

In step S102, the perceived information of each agent in the perceived range is subjected to imaging processing, and a perceived image is generated.

In this embodiment, performing imaging processing on the sensing information of each agent in the sensing range to generate a sensing image, including: generating a state channel according to the position information of other intelligent agents; generating a target channel according to the position information of the target point or the projection of the target point on the boundary of the perception range; generating an obstacle channel according to the obstacle position information in the perception range of the intelligent body; and generating the state channel, the target channel and the obstacle channel on the binary image to obtain a perceived image.

Specifically, the imaging of a single agent's sensing information is shown in FIG. 3: the sensing information of the agent is rendered onto a binary image of size $W_{OB} \times H_{OB}$ that is divided into three channels, namely state, target and obstacle. As shown in FIG. 3, the state channel displays the position information of other agents within the perception range of the agent; the target channel displays the position of the target point corresponding to the agent or the projection of the target point on the boundary of the perception map; the obstacle channel displays obstacle position information within the perception range of the agent.
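
A minimal sketch of this three-channel imaging is given below. Several details are assumptions made only for illustration: the perception window is taken to be a square of side 2·r_ob + 1 centred on the agent, the projection of an out-of-range target is obtained by clamping its offset onto the window boundary, and the function name `perception_image` is hypothetical.

```python
def perception_image(agent, others, obstacles, target, r_ob):
    """Build binary state, target and obstacle channels centred on `agent`.

    Each channel is a (2*r_ob + 1) x (2*r_ob + 1) grid of 0/1 values,
    indexed as channel[row][col] with the agent at the centre cell.
    """
    size = 2 * r_ob + 1
    ax, ay = agent

    def blank():
        return [[0] * size for _ in range(size)]

    def mark(channel, cells):
        for x, y in cells:
            dx, dy = x - ax, y - ay
            if abs(dx) <= r_ob and abs(dy) <= r_ob:  # inside the window
                channel[dy + r_ob][dx + r_ob] = 1

    state, goal, obstacle = blank(), blank(), blank()
    mark(state, others)        # other agents within the perception range
    mark(obstacle, obstacles)  # obstacles within the perception range
    # Target outside the window: clamp its offset onto the window boundary
    # (one possible reading of "projection on the boundary").
    tx = max(-r_ob, min(r_ob, target[0] - ax))
    ty = max(-r_ob, min(r_ob, target[1] - ay))
    goal[ty + r_ob][tx + r_ob] = 1
    return state, goal, obstacle


state, goal, obstacle = perception_image(
    agent=(5, 5), others=[(6, 5), (15, 15)], obstacles=[(5, 4)],
    target=(20, 5), r_ob=4)
```

In this example the agent at (15, 15) is outside the window and is not drawn, and the distant target at (20, 5) appears on the right-hand edge of the target channel.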

In step S103, feature extraction and information aggregation are performed on the perceived image based on the composite neural network, perceived information is mapped into a target action policy, a predicted action of each agent at a plurality of moments is generated based on the target action policy, an optimal planning path is generated according to the predicted actions at the plurality of moments, and each agent is controlled to act according to the optimal planning path.

Here, the composite neural network, which may also be referred to as a GNN (graph neural network), is composed of a CNN, GraphSAGE and an MLP.

It can be appreciated that, in order for the agents to reach their respective target points efficiently and without collision under the practical constraints that the sensing and communication ranges of the agents are limited and no global positioning is available, the embodiment of the application constructs a composite neural network composed of a CNN, GraphSAGE and an MLP. By training this network, the system can better determine which information is helpful for multi-agent collaborative planning. The CNN extracts features from the sensing information of each agent; these features are input into GraphSAGE for aggregation and then into the MLP, which maps them to an action policy (up, down, left, right, idle) based on a probability distribution. The final path of each agent is represented by a series of sequential actions.
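
The last two steps, turning network outputs into one of the five actions and turning a sequence of actions into a path, can be sketched as follows. The softmax normalisation, the greedy choice of the most probable action and the grid convention that "up" decreases y are assumptions of this sketch; the application only states that the mapping is based on a probability distribution.

```python
import math

# Grid displacement of each action; the direction convention is assumed.
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0),
           "right": (1, 0), "idle": (0, 0)}


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(scores):
    """Pick the most probable action (greedy choice, assumed here)."""
    probs = softmax(scores)
    names = list(ACTIONS)
    return names[max(range(len(probs)), key=probs.__getitem__)]


def rollout(start, actions):
    """Convert a sequence of action names into the path it traces."""
    path = [start]
    for name in actions:
        dx, dy = ACTIONS[name]
        x, y = path[-1]
        path.append((x + dx, y + dy))
    return path


action = select_action([0.1, 0.2, 0.0, 2.0, 0.3])
path = rollout((0, 0), ["right", "right", "down", "idle"])
```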

In this embodiment, feature extraction and information aggregation are performed on the perceived image based on the composite neural network to map the perceived information into a target action policy, including: extracting features of the perceived image by using CNN to obtain a first feature tensor; performing feature tensor aggregation on the first feature tensor by utilizing GraphSAGE to obtain a second feature tensor; the second feature tensor is input into the MLP to map to a target action strategy based on the probability distribution.

It can be understood that, taking the generation of a single agent's predicted action at time t shown in FIG. 2 as an example, the sensing information of the multiple agents is first sampled; the sampled information is then fed into the composite network composed of CNN, GraphSAGE and MLP for feature extraction and information aggregation, and the sensing information is mapped to a specific predicted action.

Specifically, the predicted action is generated in the same way for every agent; the embodiment of the application takes agent $v_i$ as an example to describe how the predicted action at time $t$ is generated:

First, each agent images its perceived information onto the three channels of state, target and obstacle, and inputs them into the CNN for feature extraction. The purpose of the CNN is to convert the perceived map information into a high-dimensional feature tensor at time $t$ that describes the state of other agents, the target and the obstacles within the perception range. Taking agent $v_i$ as an example, a 2-hop GraphSAGE centred on agent $v_i$ is then constructed, in which the number of agents sampled in layer 1 is $N_1$, the number of agents sampled for each of those agents in layer 2 is $N_2$, and layer 2 samples $N_1 \times N_2$ agents in total. The high-dimensional feature tensors of the central agent $v_i$, the $N_1$ agents of layer 1 and the $N_1 \times N_2$ agents of layer 2 are then aggregated to obtain a $Q$-dimensional feature tensor. This feature tensor is input into the MLP and mapped, based on a probability distribution, to an action policy (up, down, left, right, idle), which yields the predicted action of agent $v_i$ at time $t$. Finally, the path of agent $v_i$ is composed of a series of sequential actions.
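
The 2-hop aggregation step can be illustrated with the simplified sketch below. It uses a mean aggregator and plain concatenation, which is one common GraphSAGE variant; the learned weight matrices and nonlinearities of a real GraphSAGE layer are omitted, and the application does not state which aggregator it uses. The function names and the toy feature vectors are hypothetical.

```python
def mean(vectors):
    """Element-wise mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def aggregate_two_hop(features, center, layer1, layer2):
    """Fuse the centre feature with mean-pooled 1-hop and 2-hop features.

    `features` maps agent id -> feature vector (e.g. the CNN output).
    `layer1` holds the N_1 sampled agents, `layer2[k]` the N_2 agents
    sampled for `layer1[k]`.
    """
    hidden = []
    for u, group in zip(layer1, layer2):
        # Hop 2 -> hop 1: concatenate each sampled agent with the mean of
        # the agents sampled for it.
        hidden.append(features[u] + mean([features[w] for w in group]))
    # Hop 1 -> centre: concatenate the centre with the mean of hop-1 results.
    return features[center] + mean(hidden)


features = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [4.0, 4.0]}
fused = aggregate_two_hop(features, center=0,
                          layer1=[1, 2], layer2=[[0, 2], [0, 1]])
```

With 2-dimensional inputs the fused vector has 6 dimensions; in a trained network a learned projection would map it to the Q-dimensional tensor that is passed to the MLP.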

It should be noted that, in the embodiment of the present application, the maximum planning time of agent $v_i$ may be defined as $T_i^{max} = 3T_i^{*}$, where $T_i^{*}$ is the optimal path planning time of agent $v_i$. In each planning run, if the planning time of agent $v_i$ is greater than $T_i^{max}$, the run is regarded as a failure case.
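
The failure criterion can be expressed in a few lines. The function names are hypothetical, and time is assumed to be counted in discrete planning steps.

```python
def max_planning_time(optimal_time, factor=3):
    """T_i^max = 3 * T_i^*, the planning budget of one agent."""
    return factor * optimal_time


def is_failure(planning_time, optimal_time):
    """A run fails when the agent needs more than T_i^max."""
    return planning_time > max_planning_time(optimal_time)


budget = max_planning_time(10)
```

For an agent whose optimal plan takes 10 steps, the budget is 30 steps: finishing in exactly 30 steps still counts as a success, while 31 steps is a failure.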

In this embodiment, before controlling each agent to act according to the optimal planned path, the method further includes: judging whether the predicted action of the intelligent body is a collision action or a position exchange action; if the predicted motion is a collision motion or a position exchange motion, a collision avoidance protection mechanism is executed, and the collision motion or the position exchange motion is replaced by an idle motion.

It can be understood that, since the predicted actions cannot be guaranteed to be free of conflicts such as collisions, the embodiment of the present application further adds a collision avoidance protection mechanism to avoid the following two conflict situations: 1) the predicted action would cause the agent to collide with an obstacle or with another agent; 2) the predicted action would cause the agent to exchange positions with another agent. If either situation is judged to occur, the collision avoidance protection mechanism is started and the agent replaces the predicted action with an idle action; if neither situation occurs, the agent performs the predicted action.
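
One way to realise this protection mechanism is sketched below. The function name `apply_collision_guard` is hypothetical, and repeating the check until no conflict remains is an assumption of this sketch: it covers the case where making one agent idle creates a new conflict for an agent that wanted to move into its cell.

```python
def apply_collision_guard(positions, proposed, obstacles):
    """Replace conflicting moves with idle actions.

    `positions` and `proposed` list the current and predicted next cell of
    every agent; `obstacles` is a set of blocked cells. Returns the next
    cells that are actually executed.
    """
    nxt = list(proposed)
    n = len(nxt)
    while True:
        blocked = []
        for i in range(n):
            if nxt[i] == positions[i]:
                continue  # already idle, nothing to check
            others = [j for j in range(n) if j != i]
            hits_obstacle = nxt[i] in obstacles
            hits_agent = any(nxt[j] == nxt[i] for j in others)
            swaps = any(nxt[j] == positions[i] and positions[j] == nxt[i]
                        for j in others)
            if hits_obstacle or hits_agent or swaps:
                blocked.append(i)
        if not blocked:
            return nxt
        for i in blocked:
            nxt[i] = positions[i]  # idle action: stay in place


positions = [(0, 0), (1, 0), (5, 5), (3, 3), (5, 3)]
proposed = [(1, 0), (0, 0), (5, 6), (4, 3), (4, 3)]
safe = apply_collision_guard(positions, proposed, obstacles={(5, 6)})
```

In this example agents 0 and 1 try to swap cells, agent 2 moves into an obstacle, and agents 3 and 4 target the same cell, so all five predicted actions are replaced by idle actions. The loop terminates because an agent that has been made idle is never moved again.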

Simulation experiments were performed to verify the composite network of CNN, GraphSAGE and MLP constructed in the embodiment of the application. The CNN applies Conv2d-BatchNorm2d-ReLU-MaxPool2d followed by Conv2d-BatchNorm2d-ReLU three times in succession; all convolution kernels have size 3 and stride 1, with zero padding. GraphSAGE uses a 2-layer sampling network (K=3), with 3 samples per layer. The input is 128-dimensional and the output is 7-dimensional. Supervised learning is performed on an open-source dataset: the experiments use 30000 different maps of size 20 × 20 with an obstacle rate of 10%, of which 21000 are used for training, 4500 for testing and 4500 for validation. The network is trained and tested with the same number of agents (4, 6, 8, 10 and 12, respectively), and the agents are assumed to have perception radius $r_{ob} = 4$ and communication radius $r_{com} = 5$. The success rate θ and the time-consumption increment ratio are used as metrics. The success rate is $\theta = n_{suc}/n$, where $n_{suc}$ is the number of agents that successfully complete planning and $n$ is the total number of agents. The time-consumption increment ratio is computed from $FT$, the total planning time taken to complete the task, and $FT^{*}$, the optimal time to complete the task.
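
The two metrics can be computed as in the sketch below. The success rate follows the definition given above. The formula for the time-consumption increment ratio did not survive in this text, so the relative-increase form (FT − FT*) / FT* used here is an assumption based on the name of the metric and the two quantities it is defined from; the function names are hypothetical.

```python
def success_rate(n_suc, n):
    """theta = n_suc / n, the share of agents that complete planning."""
    return n_suc / n


def time_increment_ratio(ft, ft_opt):
    """Relative increase of the total planning time FT over the optimum FT*.

    Assumed form (FT - FT*) / FT*; the source text does not show the formula.
    """
    return (ft - ft_opt) / ft_opt


# Dataset split reported in the experiment description.
SPLIT = {"train": 21000, "test": 4500, "validation": 4500}

theta = success_rate(n_suc=9, n=12)
ratio = time_increment_ratio(ft=120, ft_opt=100)
```

Under this assumed form, a ratio of 0 means that the agents finished in the optimal time, and larger values mean longer detours or more waiting.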

In the embodiment of the application, the simulation results are compared with those of a composite network that contains no GraphSAGE and consists only of a CNN and an MLP; the results are shown in FIG. 4 and FIG. 5. FIG. 4 shows the relationship between the success rate θ and the number of agents: the success rate decreases as the number of agents tested increases, and the method of the embodiment is clearly superior to the composite network consisting of CNN and MLP only. FIG. 5 shows the relationship between the time-consumption increment ratio and the number of agents: system performance decreases as the number of agents increases, and the method of the embodiment of the application is again clearly superior to the composite network consisting of CNN and MLP only. Compared with the CNN and MLP network, the composite network of CNN, GraphSAGE and MLP constructed in the application achieves a higher success rate and a smaller time-consumption increment ratio, which demonstrates the effectiveness of the method provided by the embodiment of the application.

In summary, the embodiment of the application provides a multi-agent collaborative planning method based on a graph neural network. The method comprises the following steps: constructing a composite neural network consisting of CNN, GraphSAGE and MLP, and training the network; imaging the sensing information of each agent in real time, inputting it into the composite neural network, and predicting the action of each agent; and judging whether the predicted action would cause a collision and, according to the judgment, either executing the action or starting the collision avoidance protection mechanism.

According to the multi-agent path planning method provided by the embodiment of the application, collaborative planning is carried out by the composite neural network on the basis of the perception information of the agents, so that multi-agent collaborative planning is achieved when information is only partially known. Within a limited time, each agent can find an optimal or suboptimal collision-free path from its starting point to its target point, so that the agents reach their target points efficiently and without collision, the design difficulty and computational complexity are reduced, and the efficiency of multi-agent collaborative planning is improved.

Next, a multi-agent path planning apparatus according to an embodiment of the present application will be described with reference to the accompanying drawings.

Fig. 6 is a block schematic diagram of a multi-agent path planning apparatus according to an embodiment of the present application.

As shown in fig. 6, the multi-agent path planning apparatus 10 includes: an acquisition module 100, a processing module 200 and a planning module 300.

The acquisition module 100 is configured to acquire sensing information of each of the multiple agents within a sensing range; the processing module 200 is configured to perform imaging processing on the sensing information of each agent in the sensing range, so as to generate a sensing image; the planning module 300 is configured to perform feature extraction and information aggregation on the perceived image based on the composite neural network, map the perceived information into a target action policy, generate predicted actions of each agent at multiple times based on the target action policy, and generate an optimal planning path according to the predicted actions at multiple times, so as to control each agent to perform actions according to the optimal planning path.

Further, the sensing information includes position information of other agents within the agent sensing range, position information of the target point or projection of the target point on the boundary of the sensing range, and obstacle position information within the agent sensing range.

Further, the processing module 200 is further configured to generate a status channel according to the location information of the other agents; generating a target channel according to the position information of the target point or the projection of the target point on the boundary of the perception range; generating an obstacle channel according to the obstacle position information in the perception range of the intelligent body; and generating the state channel, the target channel and the obstacle channel on the binary image to obtain a perceived image.

Further, the planning module 300 is further configured to perform feature extraction on the perceived image by using CNN to obtain a first feature tensor; performing feature tensor aggregation on the first feature tensor by utilizing GraphSAGE to obtain a second feature tensor; the second feature tensor is input into the MLP to map to a target action strategy based on the probability distribution.

Further, the apparatus 10 of the embodiment of the present application further includes a collision avoidance module. The collision avoidance module is used for judging, before each agent is controlled to act according to the optimal planned path, whether the predicted action of each agent is a collision action or a position exchange action; if the predicted action is a collision action or a position exchange action, a collision avoidance protection mechanism is executed and the collision action or the position exchange action is replaced by an idle action.

It should be noted that the foregoing explanation of the embodiment of the multi-agent path planning method is also applicable to the multi-agent path planning apparatus of this embodiment, and will not be repeated herein.

According to the multi-agent path planning device provided by the embodiment of the application, collaborative planning is carried out by the composite neural network on the basis of the perception information of the agents, so that multi-agent collaborative planning is achieved when information is only partially known. Within a limited time, each agent can find an optimal or suboptimal collision-free path from its starting point to its target point, so that the agents reach their target points efficiently and without collision, the design difficulty and computational complexity are reduced, and the efficiency of multi-agent collaborative planning is improved.

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:

a memory 701, a processor 702, and a computer program stored on the memory 701 and executable on the processor 702.

The processor 702 implements the multi-agent path planning method provided in the above embodiments when executing the computer program.

Further, the electronic device further includes:

a communication interface 703 for communication between the memory 701 and the processor 702.

The memory 701 is used for storing a computer program executable on the processor 702.

The memory 701 may include a high-speed RAM (Random Access Memory), and may also include a non-volatile memory, such as at least one disk memory.

If the memory 701, the processor 702, and the communication interface 703 are implemented independently, the communication interface 703, the memory 701, and the processor 702 may be connected to each other through a bus and communicate with each other. The bus may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in Fig. 7, but this does not mean that there is only one bus or only one type of bus.

Alternatively, in a specific implementation, if the memory 701, the processor 702, and the communication interface 703 are integrated on a chip, the memory 701, the processor 702, and the communication interface 703 may communicate with each other through internal interfaces.

The processor 702 may be a CPU (Central Processing Unit), an ASIC (Application Specific Integrated Circuit), or one or more integrated circuits configured to implement the embodiments of the present application.

The embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the multi-agent path planning method described above.

In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. In addition, the different embodiments or examples described in this specification, and the features of the different embodiments or examples, may be combined by those skilled in the art provided that they do not contradict each other.

Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "N" is at least two, such as two, three, etc., unless explicitly defined otherwise.

Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or N executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application also includes further implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art to which the embodiments of the present application pertain.

It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable gate arrays, field programmable gate arrays, and the like.

Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.

Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the application, and that changes, modifications, substitutions, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims

1. A multi-agent path planning method, characterized by comprising the following steps:

collecting perception information of each agent among the multiple agents within a perception range;

performing imaging processing on the perception information of each agent within the perception range to generate a perceived image; and

performing feature extraction on the perceived image by using a convolutional neural network (CNN) to obtain a first feature tensor, performing feature tensor aggregation on the first feature tensor by using a graph neural network GraphSAGE to obtain a second feature tensor, inputting the second feature tensor into a multi-layer perceptron (MLP) to map it to a target action strategy based on a probability distribution, generating a predicted action of each agent at a plurality of moments based on the target action strategy, generating an optimal planning path according to the predicted actions at the plurality of moments, and controlling each agent to act according to the optimal planning path, so that each agent finds, within a limited time in a two-dimensional grid plane, an optimal or suboptimal path that reaches its respective target point from its respective starting point without collision, wherein the two-dimensional grid plane contains $N$ agents $V = \{v_1, \ldots, v_N\}$ and $N$ target points $G = \{g_1, \ldots, g_N\}$ together with randomly distributed obstacles; each agent has a perception radius $r_{ob}$; for the perceived image of agent $v_i$ at time $t$, $W$ and $H$ denote the width and height of the two-dimensional grid plane, and $W_{OB}$ and $H_{OB}$ denote the width and height of the perceived image, respectively; each agent has a communication radius $r_{com}$, and when the distance between agent $v_i$ and agent $v_j$ is no greater than $r_{com}$, i.e. $\|p_i - p_j\| \le r_{com}$, agent $v_i$ and agent $v_j$ can communicate with each other, where $p_i$ and $p_j$ denote the position information of agents $v_i$ and $v_j$; and a communication network graph of the multiple agents at time $t$ is defined, where $\varepsilon_t$ denotes the set of edges between agents that can communicate with each other.

2. The method of claim 1, wherein the perception information includes position information of other agents within the perception range of an agent, position information of a target point or a projection of the target point on a boundary of the perception range, and obstacle position information within the perception range of the agent.

3. The method according to claim 2, wherein performing imaging processing on the perception information of each agent within the perception range to generate the perceived image includes:

generating a state channel according to the position information of the other agents;

generating a target channel according to the position information of the target point or the projection of the target point on the boundary of the perception range;

generating an obstacle channel according to the obstacle position information within the perception range of the agent;

and generating the state channel, the target channel and the obstacle channel on a binary image to obtain the perceived image.

4. A method according to any one of claims 1-3, further comprising, prior to controlling each agent to act on the optimal planned path:

judging whether the predicted action of each agent is a collision action or a position exchange action;

and if the predicted action is the collision action or the position exchange action, executing a collision avoidance protection mechanism, and replacing the collision action or the position exchange action by using an idle action.

5. A multi-agent path planning apparatus, comprising:

the acquisition module is used for collecting perception information of each agent among the multiple agents within a perception range;

the processing module is used for performing imaging processing on the perception information of each agent within the perception range to generate a perceived image; and

the planning module is used for performing feature extraction on the perceived image by using a convolutional neural network (CNN) to obtain a first feature tensor, performing feature tensor aggregation on the first feature tensor by using a graph neural network GraphSAGE to obtain a second feature tensor, inputting the second feature tensor into a multi-layer perceptron (MLP) to map it to a target action strategy based on a probability distribution, generating a predicted action of each agent at a plurality of moments based on the target action strategy, generating an optimal planning path according to the predicted actions at the plurality of moments, and controlling each agent to act according to the optimal planning path, so that each agent finds, within a limited time in a two-dimensional grid plane, an optimal or suboptimal path that reaches its respective target point from its respective starting point without collision, wherein the two-dimensional grid plane contains $N$ agents $V = \{v_1, \ldots, v_N\}$ and $N$ target points $G = \{g_1, \ldots, g_N\}$ together with randomly distributed obstacles; each agent has a perception radius $r_{ob}$; for the perceived image of agent $v_i$ at time $t$, $W$ and $H$ denote the width and height of the two-dimensional grid plane, and $W_{OB}$ and $H_{OB}$ denote the width and height of the perceived image, respectively; each agent has a communication radius $r_{com}$, and when the distance between agent $v_i$ and agent $v_j$ is no greater than $r_{com}$, i.e. $\|p_i - p_j\| \le r_{com}$, agent $v_i$ and agent $v_j$ can communicate with each other, where $p_i$ and $p_j$ denote the position information of agents $v_i$ and $v_j$; and a communication network graph of the multiple agents at time $t$ is defined, where $\varepsilon_t$ denotes the set of edges between agents that can communicate with each other.

6. The apparatus of claim 5, wherein the perception information includes position information of other agents within the perception range of an agent, position information of a target point or a projection of the target point on a boundary of the perception range, and obstacle position information within the perception range of the agent.

7. The apparatus of claim 6, wherein the processing module is further configured to: generate a state channel according to the position information of the other agents; generate a target channel according to the position information of the target point or the projection of the target point on the boundary of the perception range; generate an obstacle channel according to the obstacle position information within the perception range of the agent; and generate the state channel, the target channel and the obstacle channel on a binary image to obtain the perceived image.

8. The apparatus according to any one of claims 5-7, further comprising:

the collision avoidance module is used for judging, before each agent is controlled to act according to the optimal planning path, whether the predicted action of each agent is a collision action or a position exchange action; and if the predicted action is the collision action or the position exchange action, executing a collision avoidance protection mechanism, and replacing the collision action or the position exchange action with an idle action.

9. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the multi-agent path planning method of any of claims 1-4.

10. A computer-readable storage medium, on which a computer program is stored, characterized in that the program is executed by a processor for implementing the multi-agent path planning method according to any one of claims 1-4.