CN111738372A

CN111738372A - Distributed multi-agent space-time feature extraction method and behavior decision method

Info

Publication number: CN111738372A
Application number: CN202010873794.8A
Authority: CN
Inventors: 蒲志强; 刘振; 王彗木; 丘腾海; 易建强
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2020-08-26
Filing date: 2020-08-26
Publication date: 2020-10-02
Anticipated expiration: 2040-08-26
Also published as: CN111738372B

Abstract

The invention provides a distributed multi-agent space-time feature extraction method and a behavior decision method. The behavior decision method comprises the following steps: intelligent agent for acquiring current moment and previous moments

The state information which can be sensed constructs a space-time state vector; inputting the space-time state vector into the network generation layer of the graph and outputting the intelligent agent

The original feature vector of (2); inputting the original feature vector into a spatial feature extraction layer, and outputting a spatial relationship feature vector; inputting the space relation characteristic vector into a space-time characteristic extraction layer, and outputting the space-time relation characteristic vector; calculating an agent based on the obtained spatiotemporal relationship feature vectors

A behavior decision at the current moment; updating time step, calculating next time agent

Spatiotemporal features and behavioral decisions. The invention realizes the extraction of the distributed space-time characteristic relation of the multi-agent system under the constraints of complex environment, time-varying topology, limited resources and the like, and improves the self-adaptive capacity and performance robustness of the multi-agent system in large-scale complex tasks.

Description

Distributed multi-agent space-time feature extraction method and behavior decision method

Technical Field

The invention belongs to the field of multi-agent and group intelligence, and particularly relates to a distributed multi-agent space-time feature extraction method and a behavior decision method.

Background

The multi-agent system has the advantages of distributivity, simplicity, flexibility, robustness and the like, provides a brand-new solution for a plurality of complex problems with great challenges, and the group intelligence represented by the multi-agent system is also one of five intelligent forms of the key development established in 'new generation artificial intelligence development planning' in China. With the rapid development of micro-nano electronics, computing platforms, autonomous control and other emerging technologies, a multi-agent system composed of unmanned autonomous platforms such as unmanned aerial vehicles and unmanned vehicles is increasingly applied in important scenes concerning national civilization and national defense safety. The unmanned autonomous multi-agent system can quickly form area coverage in a networked, distributed and collaborative mode, achieves cluster resource optimization scheduling, improves task completion rate and response speed, can serve as a normalized deployment system to serve the fields of mountain patrol, disaster early warning, environment monitoring, regional logistics and the like on the one hand, and can also serve as a quick response system for emergencies to provide the capabilities of quick material scheduling, disaster monitoring evaluation, communication guarantee support and the like under the scenes such as epidemic prevention and control, sudden disasters, large-scale active personal defense and the like on the other hand.

The behavior decision of the multi-agent mainly comprises a centralized mode and a distributed mode. The centralized decision-making has a central decision-making point, all information is gathered to the central node through a communication network, behavior decision-making instructions of all the intelligent agents are calculated through a centralized planning decision-making algorithm, and then the decision-making instructions of all the intelligent agents are issued to each intelligent agent through the communication network for execution. The centralized mode has high requirement on the reliability of a communication network and a central node, and has larger behavior delay, and the intelligent agent is difficult to realize self-adaptive autonomous behavior decision along with the change of tasks and environments when facing to the actual application scene, thereby greatly limiting the exertion of the intelligent synergistic effect of the multi-intelligent-agent system. In an actual scene, a multi-agent system often covers a large range and is difficult to form a centralized network, a single agent often only has limited environment sensing capability, communication capability and behavior capability, and the communication topology connection relationship among the agents will change all the time in a dynamic task, so that distributed decision brings better self-adaption capability and task performance for the multi-agent system in complex environments and tasks.

When an agent makes a decision, the decision-making basis is the state information of the current task and environment, and when large-scale clustering and complex tasks/environments are faced, how to abstract and reduce the state information by effective means and further extract the space-time characteristic relationship between the agent and between the agent and the task environment elements is a key for ensuring the multi-agent system to realize abstract understanding of the task and the environment and further realize autonomous decision-making and intelligent control.

The graph attention network is a machine learning method which is just emerging in recent years, multiple problems in reality are abstracted into a graph structure, a graph neural network is adopted for feature extraction, an attention mechanism is further adopted for realizing fusion of different feature representation spaces, and related technologies are gradually verified and applied in scenes such as social networks, traffic road condition forecast and the like. On the other hand, the long-term and short-term memory network is an important recurrent neural network and is widely applied to the processing of timing problems, particularly in the fields of voice recognition, semantic analysis and the like. In particular, the long and short term memory network with peepholes has the characteristic of repeatedly refining the historical state, thereby having better performance when dealing with complex state and time-varying systems. The essential spatial topological structure of the multi-agent system and the time sequence dependency of tasks enable the graph attention network and the long-short term memory network with peepholes to have natural application advantages on the multi-agent, however, due to the rising of the related technology, the application of the graph attention network and the long-short term memory network on an unmanned autonomous multi-agent system composed of an unmanned aerial vehicle, an unmanned vehicle and the like is rarely reported at present, particularly, the two networks are combined for extracting the spatiotemporal relation characteristics of the multi-agent system, and the method has important prospective and innovative significance in the machine learning field and the multi-agent system field.

Disclosure of Invention

In order to solve the above problems in the prior art, that is, to solve the problem of efficient spatiotemporal feature relation extraction of a multi-agent system in dynamic and complex tasks, a first aspect of the present invention provides a distributed multi-agent spatiotemporal feature extraction method, which includes the following steps:

step S100, at the time

，Based on

Agent for each moment from moment

Observable spatial state vector

Splicing to obtain agents

At the current moment

Space-time state vector of

(ii) a Wherein,

the number is a preset historical state number;

step S200 based on

Obtaining agents by generating layers through a graph network

Of the original feature vector

(ii) a Wherein,

，

a selected feature space dimension;

step S300 based on

Obtaining the intelligent agent through the spatial characteristic relation extraction layer

At the current moment

Space relation characteristic vector of

(ii) a The spatial feature relation extraction layer is constructed in a mode that a graph attention network module and a full-connection network module are stacked alternately;

step S400, acquiring time

Front side

Individual moment intelligent agent

Spatial relationship feature vector of

Will be

、

Inputting the data into a time-space feature extraction layer, outputting an intelligent agent by adopting a long-term and short-term memory network with peepholes based on graph convolution operation

At the current moment

Space-time relation characteristic vector of

。

In some preferred embodiments, step S100 "agent at each moment in time

Observable spatial state vectors "including agent self states, task goal states, observable other agent states, observable environmental element states;

the self state of the intelligent agent comprises the self position, speed and acceleration state of the intelligent agent;

the task target state comprises a target position and a speed state;

the observable other agent states include observable location, speed states of other agents;

the observable environmental element states include observable position and velocity states of obstacles in the environment, and position states of no-traffic zones in the environment.

In some preferred embodiments, the graph network generation layer in step S200 is composed of a plurality of layers of fully-connected neural networks.

In some preferred embodiments, the method for extracting spatial feature relationship based on the spatial feature relationship extraction layer in step S300 includes:

step S310, using the agent

Of the original feature vector⁰

And the original feature vectors of all neighboring agents⁰

Obtaining, as input, a spatial relationship feature vector by a first graph attention network module¹

(ii) a Wherein,

，

as an agent

A set of neighbor agents capable of direct communication;

⁰

，

⁰

；

step S320, in¹

For input, obtaining space relation characteristic vector through first full-connection network module²

；

Step S330, based on the space relation feature vector obtained in the step S320, through the stacked attention network module and the full-connection network module, the method of the step S310 and the step S320 is adopted to iteratively calculate the second

Spatial relationship feature vector of order

^f2-1

、^f2

(ii) a Wherein,

，

the number of stacked layers of the power network module and the fully connected network module is noted for the figure;

step S340, in the iterative computation

The second timeBased on^k2(-1)

By the method of step S310, through

Obtaining space relation characteristic vector by drawing attention network module^k2-1

(ii) a Splicing vector value⁰

,²

,⁴

,…,^k2(-1)

,^k2-1

]Input the first

Fully connecting network modules to obtain

Spatial relationship feature vector^k2

As time of day

Lower intelligent agent

Final output based on the spatial feature relationship extraction layer

。

In some preferred embodiments, the spatial relationship feature vector¹

The acquisition method comprises the following steps:

step S311, using a learnable matrix

Pair relation feature vector⁰

、⁰

Linear transformation is carried out and spliced into a new relation characteristic vector⁰

=[

⁰

,

⁰

](ii) a Will be provided with⁰

Inputting a full-connection neural network and outputting an intelligent agent

For intelligent agent

Attention coefficient of

(ii) a Obtaining an agent

For intelligent agent

Attention normalization coefficient of

，

Step S312, adopting a multi-head attention mechanism for the first time

Calculating the normalized attention coefficient by the method of step S311

(ii) a Intelligent agent under fusion of computing multi-head attention mechanism

Spatial relationship feature vector of¹

，

¹

Wherein,

to keep track of the number of the first hours of attention,

the function is activated for the sigmoid and,

is as follows

Attention is drawn to the chosen linear transformation matrix,

representing a concatenation operation of the vectors.

In some preferred embodiments, the long-short term memory network with peephole based on graph volume operation comprises

The long and short term memory network units are connected in series and provided with peepholes;

and an input gate, a forgetting gate and an output gate of the long and short term memory network unit are constructed based on a graph convolution neural network.

In some preferred embodiments, the spatiotemporal relationship feature vector

The acquisition method comprises the following steps:

the long-short term memory network unit near the output end is recorded as

And increase the serial number in the reverse direction;

first, the

A long and short term memory network unit

The cell state of (A) is recorded as

The output is a space-time relation feature vector

Input is as

Temporal spatial relationship feature vector

The first stepp+1 unit

Output space-time relation feature vector

And cell state thereof

(ii) a Wherein,

；

based on cell state notation

Feature vector of space-time relation

、

、

Obtaining spatiotemporal relation feature vectors through the long-short term memory network

。

In some preferred embodiments, the spatiotemporal relationship feature vector

The calculation method comprises the following steps:

step 401, will

Temporal spatial relationship feature vector

The first step

A unit

Network element status of

And the first

A unit

Output space-time relation feature vector

Inputting the input data into a forgetting gate based on a neural network adopting graph convolution, and calculating to obtain an forgetting gate output variable

，

Wherein, the graph convolution operation is represented,

and

the weight coefficient matrix and the bias of the forgetting gate diagram convolutional neural network respectively,

activating a function for sigmoid;

step 402, will

Temporal spatial relationship feature vector

The first step

A unit

Output space-time relation feature vector

And cell state thereof

Inputting to an input gate based on the use of a convolutional neural network, and comparing the cell states

Updating, wherein the calculation formula is as follows:

；

；

wherein,

is the output of the input gate or gates,

in order to be a transitional state of the cell,

、

the weight coefficient matrix corresponding to the neural network is convolved for the input gate map,

、

the corresponding bias of the neural network is convolved for the input gate map,

for the purpose of the tanh activation function,

is the Hadamard product;

at step 403, will

Temporal spatial relationship feature vector

The first step

A unit

Space-time of outputRelational feature vector

And step 402 updated

A unit

Network element status of

Input to an output gate based on a neural network using convolution of a graph to obtain

A unit

Output space-time relation feature vector

The calculation formula is as follows:

wherein,

is the output variable of the output gate,

、

and (4) convolution of the weight coefficient matrix and the bias corresponding to the neural network for the output gate graph.

In some preferred embodiments, all of the agents in the distributed multi-agent share learnable parameters.

The invention provides a distributed multi-agent behavior decision method in a second aspect, and the method is based on the above method for extracting the space-time characteristics of the distributed multi-agent to obtain the agent

At the current moment

Space-time relation characteristic vector of

Computing agents using model-knowledge-driven methods or reinforcement learning data-driven methods

At the current moment

Behavioral decision set of

(ii) a Wherein,

，

is the chosen decision space dimension.

In some preferred embodiments of the distributed multi-agent behavioral decision method, all agents in the distributed multi-agent share learnable parameters.

The invention provides a distributed multi-agent space-time feature extraction system, which comprises a state vector acquisition module, an original feature generation module, a space relation calculation module and a space-time relation calculation module;

the state vector acquisition module is configured to acquire the state vector at the moment

Based on

Agent for each moment from moment

Observable spatial state vector

Splicing to obtain agents

At the current moment

Space-time state vector of

(ii) a Wherein,

the number is a preset historical state number;

the primitive feature generation module configured to generate a primitive feature based on

Obtaining agents by generating layers through a graph network

Of the original feature vector

(ii) a Wherein,

，

a selected feature space dimension;

the spatial relationship calculation module is configured to calculate the spatial relationship based on

At the current moment

Space relation characteristic vector of

the time-space relation calculation module is configured to obtain time

Front side

Individual moment intelligent agent

Spatial relationship feature vector of

Will be

、

At the current moment

Space-time relation characteristic vector of

。

The fourth aspect of the invention provides a distributed multi-agent behavior decision system, which is based on the above-mentioned distributed multi-agent space-time feature extraction system and further comprises a behavior decision calculation module;

the behavioral decision calculation module is configured to be based on an agent

At the current moment

Space-time relation characteristic vector of

At the current moment

Behavioral decision set of

(ii) a Wherein,

，

is the chosen decision space dimension.

In a fifth aspect of the present invention, a storage device is proposed, in which a plurality of programs are stored, said programs being adapted to be loaded and executed by a processor to implement the above-mentioned distributed multi-agent spatiotemporal feature extraction method, and/or the above-mentioned distributed multi-agent behaviour decision method.

In a sixth aspect of the present invention, a processing apparatus is provided, which includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is suitable to be loaded and executed by a processor to realize the above-mentioned distributed multi-agent space-time feature extraction method and/or the above-mentioned distributed multi-agent behavior decision method.

The invention has the beneficial effects that:

by adopting a distributed characteristic extraction and behavior decision mode, compared with a centralized mode, the method is closer to the actual application scene of a large-scale multi-agent system, and can fully exert the application advantages of the multi-agent system such as distribution, networking and synergy; the spatial-temporal characteristic relation contained in the multi-agent system is extracted through the graph attention mechanism and the long and short memory networks, so that an important basis can be provided for the follow-up intelligent behavior decision of the multi-agent system, the agent can realize the autonomous behavior decision in dynamic and complex tasks, and the characteristic extraction layer is constructed by adopting parameter learnable models such as the graph neural network and the long and short memory networks with peepholes, so that the extraction of hidden characteristics and changed characteristics in the agent can be realized, and the task and environment adaptability of the agent is improved.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is a flow chart of a distributed multi-agent behavior decision method according to an embodiment of the present invention;

FIG. 2 is a diagram of a distributed multi-agent spatiotemporal feature extraction method for a single agent in an embodiment of the invention

Generating a layer schematic diagram;

FIG. 3 is a diagram of a distributed multi-agent spatiotemporal feature extraction method for a single agent in an embodiment of the invention

Schematic diagram of the spatial feature extraction layer;

FIG. 4 is a diagram of a distributed multi-agent spatiotemporal feature extraction method for a single agent in an embodiment of the invention

The space-time feature extraction layer schematic diagram of (1);

FIG. 5 is a diagram of a distributed multi-agent spatiotemporal feature extraction method for a single agent in an embodiment of the invention

The 1 st long-short term memory network unit structure diagram in the space-time characteristic extraction layer;

FIG. 6 is a schematic diagram of intelligent agent collaboration behavior in an embodiment of the invention

FIG. 7 is a block diagram of a distributed multi-agent behavior decision system according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

The invention provides a novel multi-agent space-time feature extraction method, which aims at the problems of high-efficiency space-time feature relation extraction and intelligent decision making of a multi-agent system in dynamic and complex tasks, adopts a graph structure to express the space-time relation between an agent individual and the environment, further adopts a graph neural network and an attention mechanism to extract the space feature relation, adopts a long-short term memory network with peepholes to realize the time feature relation extraction, and finally realizes the autonomous space-time feature relation extraction and the intelligent behavior decision making of the agent in the dynamic and complex tasks.

The invention discloses a distributed multi-agent space-time feature extraction method, which comprises the following steps:

step S100, at the time

，Based on

Agent for each moment from moment

Observable spatial state vector

Splicing to obtain agents

At the current moment

Space-time state vector of

(ii) a Wherein,

the number is a preset historical state number;

step S200 based on

Obtaining agents by generating layers through a graph network

Of the original feature vector

(ii) a Wherein,

，

a selected feature space dimension;

step S300 based on

At the current moment

Space relation characteristic vector of

step S400, acquiring time

Front side

Individual moment intelligence

Spatial relationship feature vector of

Will be

、

At the current moment

Space-time relation characteristic vector of

。

In order to more clearly explain the present invention, the following will expand the detailed description of the steps in an embodiment of the distributed multi-agent spatio-temporal feature extraction method according to the present invention with reference to the drawings.

In one embodiment of the present invention, the device comprises

Individual agent's multi-agent system relies on method specification, the first of which

（

) The method for extracting space-time characteristics of an agent comprises steps S100-S400, as shown in FIG. 1 (in FIG. 1, a distributed multi-agent behavior decision is shown)The flow diagram of the policy method includes the content of the distributed multi-agent spatio-temporal feature extraction method, so this embodiment only describes the relevant content, and other parts are described in the following corresponding embodiments, such as distributed sharing parameters, agent behavior decision based on spatio-temporal relation feature vectors, and updating time steps to return to the next time calculation in step S100).

Step S100, at the current time

Based on

Agent for each moment from moment

Observable spatial state vector

Splicing to obtain agents

At the current moment

Space-time state vector of

(ii) a Wherein,

the preset historical state number is an adjustable non-negative integer.

In this step, each discrete moment agent is obtained

Observable spatial states including agent's own state, task goal state, observable states of other agents, observable environmental element statesState; the state of the agent itself includes but is not limited to the state of the agent's own position, velocity, acceleration; task goal states include, but are not limited to, goal position, speed state; other agent states that may be observed include, but are not limited to, the location, speed state of other agents that may be observed; the observable environmental element states include, but are not limited to, observable position, velocity state of obstacles in the environment, position state of no-pass zones in the environment, and other environmental state information affecting multi-agent system tasks.

Step S200 based on

Obtaining agents by generating layers through a graph network

Of the original feature vector

(ii) a Wherein,

，

is the selected feature space dimension.

In this step, the graph network generation layer is realized by using a multilayer fully-connected neural network.

FIG. 2 shows a method for extracting spatiotemporal features of a distributed multi-agent space-time feature of a single agent

A schematic layer diagram is generated. Spatial state vector

After being input into the network generation layer of the graph, the graph passes through

Layer is connected completelyThe neural network is connected for extracting the original features, and the network input is the digitalized state information, so that the features do not need to be extracted by adopting a convolutional network and the like, and the fitting capacity of the original feature vector to the complex intelligent agent relation is enhanced by the multilayer full-connection network.

Step S300 based on

At the current moment

Space relation characteristic vector of

(ii) a The spatial feature relation extraction layer is constructed in a mode that a graph attention network module and a full-connection network module are stacked alternately.

In this embodiment, the attention network module uses a multi-head attention mechanism to implement feature aggregation of any agent with respect to surrounding agents and environmental information under different attentions.

FIG. 3 shows a distributed multi-agent spatiotemporal feature extraction method for a single agent according to the present invention

As shown in the figure, the method for extracting a spatial feature relationship based on the spatial feature relationship extraction layer in this embodiment includes:

step S310, using the agent

Of the original feature vector⁰

And all neighbors intelligenceOriginal feature vector of a volume⁰

(ii) a Wherein,

，

as an agent

A set of neighbor agents capable of direct communication;

⁰

，

⁰

，⁰

transmission to agents via a communication network

。

Intelligent agent

Set of directly communicable neighbor agents

The obtaining method may be: setting a determination parameter of observable distance

When another agent is present

And an agent

Is less than

Then, it is determined

。

In this embodiment, the spatial relationship feature vector¹

The acquisition method comprises the following steps:

step S311, using a learnable matrix

Pair relation feature vector⁰

、⁰

=[

⁰

，

⁰

](ii) a Will be provided with⁰

Inputting a full-connection neural network and outputting an intelligent agent

For intelligent agent

Attention coefficient of

(ii) a Obtaining an agent

For intelligent agent

Attention normalization coefficient of

The formula is

Step S312, adopting a multi-head attention mechanism for the first time

Calculating the normalized attention coefficient by the method of step S311

(ii) a Intelligent calculation under fusion of multi-head attention mechanismBody

Spatial relationship feature vector of¹

The formula is

¹

Wherein,

to keep track of the number of the first hours of attention,

the function is activated for the sigmoid and,

is as follows

Attention is drawn to the chosen linear transformation matrix,

representing a concatenation operation of the vectors.

Spatial relationship feature vector^y

（

Is odd and

) The calculation of (2) can also be performed by adopting the methods of step (311) and step (312), wherein¹

Become into^y

Will be⁰

、⁰

Is changed into^y-1

、

^y-1

、^y-1

And (4) finishing.

Step S320, in¹

，²

。

In this step, the fully-connected network module is composed of a plurality of fully-connected neural network layers, and performs enhanced representation and dimension compression on the features.

Step S330, based on the space relation characteristic vector obtained in the step S320, through the stacked graph attention network module and the full connection network module, adoptingStep S310, step S320 method iteration calculation

Spatial relationship feature vector of order^f2-1

、^f2

(ii) a Wherein,

，

the number of stacked layers of network modules and fully connected network modules is noted.

Step S340, in the iterative computation

Then, based on^k2(-1)

By the method of step S310, through

(ii) a Splicing vector value⁰

,²

,⁴

,…,^k2(-1)

,^k2-1

]Input the first

Fully connecting network modules to obtain

Spatial relationship feature vector^k2

As time of day

Lower intelligent agent

Final output based on the spatial feature relationship extraction layer

。

In this embodiment, steps S330 and S340 are processes of performing iterative computations by using the methods of steps S310 and S320 based on the alternating stack graph attention network module and the fully-connected network module, and each stack of the graph attention network module and the fully-connected network module is taken as a unit, which is included in this embodiment

A unit, therefore, needs to be performed

And (3) iterative calculation:

the first time is step S310 and step S320, obtaining¹

、²

；

The second iteration adopts the graph attention network module and the full-connection network module in the second unit to obtain the feature vector²

And spatial relationship feature vectors corresponding to all neighbor agents²

（

，²

Transmission to agents via a communication network

) Adopting the methods of step S310 and step S320 as input to respectively obtain the space relation feature vectors³

、⁴

；

Repeating the steps in the same way

The method operations of sub (including the first and second iterations) steps 310, 320, where the second step

Second space closingIs a feature vector^f2-1

、^f2

，

；

In the first place

At the time of the second iteration, based on^k2(-1)

By the method of step S310, through

(ii) a Then the original feature vector is processed⁰

The space relation feature vector obtained by the method of the step 302²

,⁴

,…,^k2(-1)

And the feature vectors obtained after the last operation step 301^k2-1

Splicing is carried out to obtain the spliced characteristic vector⁰

,²

,⁴

,…,^k2(-1)

,^k2-1

]Inputting the data into the last full-connection network module and outputting the space relation characteristic vector^k2

I.e. the current time

Lower intelligent agent

Final output of the spatial feature relationship extraction layer, denoted as^k2

。

Step S400, acquiring time

Front side

（

Adjustable positive integer) time agent

Spatial relationship feature vector of

Will be

、

At the current moment

Space-time relation characteristic vector of

Wherein

。

FIG. 4 is a diagram illustrating a distributed multi-agent spatiotemporal feature extraction method for a single agent according to the present invention

The space-time feature extraction layer diagram of (1). In this embodiment, the long and short term memory network with peephole based on graph convolution operation comprises

The long and short term memory network units are connected in series and provided with peepholes; the input gate, forgetting gate and output gate of the long and short term memory network unit are constructed based on the graph convolution neural network.

Space-time relation feature vector

The acquisition method comprises the following steps: the long-short term memory network unit near the output end is recorded as

And increase the serial number in the reverse direction; first, the

A long and short term memory network unit

The cell state of (A) is recorded as

The output is a space-time relation feature vector

Input is as

Temporal spatial relationship feature vector

The first stepp+1 unit

Output space-time relation feature vector

And cell state thereof

(ii) a Wherein,

；

based on cell state notation

Feature vector of space-time relation

、

、

。

In this embodiment, the structure diagram of the 1 st long/short term memory network unit is shown in fig. 5, wherein the dotted line part represents the state of the previous long/short term memory network unit as the signal source. First, the

Space-time relation characteristic vector output by long and short term memory network unit

The calculation method comprises the following steps:

step 401, will

Temporal spatial relationship feature vector

The first step

A unit

Network element status of

And the first

A unit

Output space-time relation feature vector

，

Wherein, the graph convolution operation is represented,

and

activating a function for sigmoid;

step 402, will

Temporal spatial relationship feature vector

The first step

A unit

Output space-time relation feature vector

And cell state thereof

Updating, wherein the calculation formula is as follows:

；

；

wherein,

is the output of the input gate or gates,

in order to be a transitional state of the cell,

、

、

for the purpose of the tanh activation function,

is the Hadamard product;

at step 403, will

Temporal spatial relationship feature vector

The first step

A unit

Output space-time relation feature vector

And step 402 updated

A unit

Network element status of

A unit

Output space-time relation feature vector

The calculation formula is as follows:

wherein,

is the output variable of the output gate,

、

A distributed multi-agent behavior decision method according to a second embodiment of the present invention, as shown in fig. 1, based on the above-mentioned distributed multi-agent spatiotemporal feature extraction method, further includes step S500: obtaining an agent

At the current moment

Space-time relation characteristic vector of

The method based on model knowledge driving or the method based on reinforcement learning data driving (preferably adopting Actor-criticic architecture to train and learn the behavior of the intelligent agent) is adopted to calculate the intelligent agent

At the current moment

Behavioral decision set of

(ii) a Wherein,

，

is the chosen decision space dimension.

In the embodiments of the above-mentioned method for extracting spatiotemporal features of distributed multi-agent and the method for deciding behaviors of distributed multi-agent, the distributed shared parameters of all agents in distributed multi-agent may include a graph network generation layer in step S200, a spatial feature extraction layer in step S300, learnable parameters of relevant neural networks in the spatiotemporal feature extraction layer in step S400, and control and decision parameters for making a behavior decision of an agent based on the extracted spatiotemporal features.

FIG. 6 is a diagram illustrating intelligent collaboration behavior of agents in an embodiment of the invention. Including 8 enclosure agents (little black circle), 1 target agent (little triangle), 2 barrier agents (pentagon) in this embodiment, for clearer demonstration, the little black circle of agent is connected through the lines in the picture. The target of the enclosure intelligent agent is to form an enclosure for the target intelligent agent, in the process, collision between the target intelligent agent and the obstacle and collision between the target intelligent agent and other enclosure intelligent agents are avoided, and the target intelligent agent tries to get rid of the enclosure. In an embodiment, four phases are included: in a series of processes, the enclosure intelligent agent adopts the method provided by the invention to extract space-time characteristics, and on the basis, an Actor-Critic architecture is adopted to make a behavior decision, a target intelligent agent adopts a traditional artificial potential field method to make a behavior decision, and an obstacle intelligent agent is set as a static obstacle. In a certain time, the captive intelligent bodies learn the cooperative behaviors, the achievement realizes the enclosure of the target intelligent bodies, and the self-adaptive and distributed cooperative advantages of the method provided by the invention in the process of coping with complex and dynamic multi-intelligent-body behavior decision are shown.

The description in the above embodiment is a data processing method at the current time, and at the next time

Will be based on

And (5) extracting the time-space characteristics and generating behavior decision according to the method, namely updating the time step, returning to the step S100, and calculating the next time.

The description in the embodiment is a process of space-time feature extraction and behavior decision generation of the intelligent agents, and each intelligent agent adopts the method to calculate at the same time; each intelligent agent can independently calculate, and the calculation result can be sent to each intelligent agent after the calculation of each intelligent agent is carried out through the cloud platform.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the above-described distributed multi-agent behavior decision method may refer to the corresponding process in the foregoing embodiment of the distributed multi-agent spatiotemporal feature extraction method, and details are not repeated herein.

The distributed multi-agent space-time feature extraction system comprises a state vector acquisition module, an original feature generation module, a space relation calculation module and a space-time relation calculation module;

，Based on

Agent for each moment from moment

Observable spatial state vector

Splicing to obtain agents

At the current moment

Space-time state vector of

(ii) a Wherein,

the number is a preset historical state number;

Obtaining agents by generating layers through a graph network

Of the original feature vector

(ii) a Wherein,

，

a selected feature space dimension;

At the current moment

Space relation characteristic vector of

the time-space relation calculation module is configured to obtain time

Front side

Individual moment intelligent agent

Spatial relationship feature vector of

Will be

、

At the current moment

Space-time relation characteristic vector of

。

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the above-described distributed multi-agent spatiotemporal feature extraction system may refer to the corresponding process in the foregoing embodiment of the distributed multi-agent spatiotemporal feature extraction method, and will not be described herein again.

The distributed multi-agent behavior decision system of the fourth embodiment of the invention is based on the above-mentioned distributed multi-agent space-time feature extraction system, and also includes a behavior decision calculation module;

At the current moment

Space-time relation characteristic vector of

At the current moment

Behavioral decision set of

(ii) a Wherein,

，

is the chosen decision space dimension.

The flow chart of this embodiment is shown in fig. 7.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the above-described distributed multi-agent behavior decision system may refer to the corresponding process in the foregoing embodiment of the distributed multi-agent behavior decision method, and details are not repeated herein.

It should be noted that, the distributed multi-agent spatiotemporal feature extraction system and the distributed multi-agent behavior decision system provided in the foregoing embodiments are only exemplified by the division of the above functional modules, and in practical applications, the above functions may be allocated by different functional modules according to needs, that is, the modules or steps in the embodiments of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiments may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the above described functions. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.

A storage device according to a fifth embodiment of the present invention stores a plurality of programs, which are suitable for being loaded and executed by a processor to implement the above-mentioned method for extracting spatiotemporal features of distributed multi-agent, and/or the above-mentioned method for deciding the behavior of distributed multi-agent.

A processing apparatus according to a sixth embodiment of the present invention includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is suitable to be loaded and executed by a processor to realize the above-mentioned distributed multi-agent space-time feature extraction method and/or the above-mentioned distributed multi-agent behavior decision method.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The computer program, when executed by a Central Processing Unit (CPU), performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C + + or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.

The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims

1. A distributed multi-agent space-time feature extraction method is characterized by comprising the following steps:

step S100, at the time

，Based on

Agent for each moment from moment

Observable spatial state vector

Splicing to obtain agents

At the current moment

Space-time state vector of

(ii) a Wherein,

the number is a preset historical state number;

step S200 based on

Obtaining agents by generating layers through a graph network

Of the original feature vector

(ii) a Wherein,

，

a selected feature space dimension;

step S300 based on

At the current moment

Space relation characteristic vector of

step S400, acquiring time

Front side

Individual moment intelligent agent

Spatial relationship feature vector of

Will be

、

At the current moment

Space-time relation characteristic vector of

。

2. The distributed multi-agent spatiotemporal feature extraction method as defined in claim 1, wherein "agent at each moment of time" in step S100

Observable spatial state vectors "including agent's own state, task goal state, other intelligence observableEnergy state, observable environmental element state;

the task target state comprises a target position and a speed state; the observable other agent states include observable location, speed states of other agents;

3. The distributed multi-agent spatiotemporal feature extraction method as defined in claim 1, wherein the graph network generation layer in step S200 is composed of a plurality of layers of fully connected neural networks.

4. The method for extracting spatiotemporal features of distributed multi-agent as claimed in claim 1, wherein the method for extracting spatial feature relationship based on the spatial feature relationship extraction layer in step S300 comprises:

step S310, using the agent

Of the original feature vector⁰

And the original feature vectors of all neighboring agents⁰

(ii) a Wherein,

，

as an agent

A set of neighbor agents capable of direct communication;

⁰

，

⁰

；

step S320, in¹

；

Spatial relationship feature vector of order

^f2-1

、^{f 2}

(ii) a Wherein,

，

step S340, in the iterative computation

Then, based on^k2(-1)

By the method of step S310, through

(ii) a Splicing vector value⁰

,²

,⁴

,…,^k2(-1)

,^k2-1

]Input the first

Fully connecting network modules to obtain

Spatial relationship feature vector^{k 2}

As time of day

Lower intelligent agent

Final output based on the spatial feature relationship extraction layer

。

5. The distributed multi-agent spatiotemporal feature extraction method as defined in claim 4, wherein the spatial relationship feature vector¹

The acquisition method comprises the following steps:

step S311, using a learnable matrix

Pair relation feature vector⁰

、⁰

=[

⁰

,

⁰

](ii) a Will be provided with⁰

Inputting a full-connection neural network and outputting an intelligent agent

For intelligent agent

Attention coefficient of

(ii) a Obtaining an agent

For intelligent agent

Attention normalization coefficient of

，

Step S312, adopting a multi-head attention mechanism for the first time

Calculating the normalized attention coefficient by the method of step S311

Spatial relationship feature vector of¹

，

¹

Wherein,

to keep track of the number of the first hours of attention,

the function is activated for the sigmoid and,

is as follows

Attention is drawn to the chosen linear transformation matrix,

representing a concatenation operation of the vectors.

6. The distributed multi-agent spatiotemporal feature extraction method as defined in claim 1, wherein the graph convolution operation based long and short term memory network with peepholes comprises

The long and short term memory network units are connected in series and provided with peepholes; and an input gate, a forgetting gate and an output gate of the long and short term memory network unit are constructed based on a graph convolution neural network.

7. The distributed multi-agent spatiotemporal feature extraction method as claimed in claim 6, wherein the spatiotemporal relationship feature vector

The acquisition method comprises the following steps:

the long-short term memory network unit near the output end is recorded as

And increase the serial number in the reverse direction;

first, the

A long and short term memory network unit

The cell state of (A) is recorded as

The output is a space-time relation feature vector

Input is as

Temporal spatial relationship feature vector

The first stepp+1 unit

Output space-time relation feature vector

And cell state thereof

(ii) a Wherein,

；

based on cell state notation

Feature vector of space-time relation

、

、

。

8. The distributed multi-agent spatiotemporal feature extraction method as defined in claim 7, wherein the spatiotemporal relationship is characterizedEigenvector

The calculation method comprises the following steps:

step 401, will

Temporal spatial relationship feature vector

The first step

A unit

Network element status of

And the first

A unit

Output space-time relation feature vector

，

Wherein,

the operation of the convolution of the graph is shown,

and

activating a function for sigmoid;

step 402, will

Temporal spatial relationship feature vector

The first step

A unit

Output space-time relation feature vector

And cell state thereof

Updating, wherein the calculation formula is as follows:

wherein,

is the output of the input gate or gates,

in order to be a transitional state of the cell,

、

、

for the purpose of the tanh activation function,

is the Hadamard product;

at step 403, will

Temporal spatial relationship feature vector

The first step

A unit

Output space-time relation feature vector

And step 402 updated

A unit

Network element status of

A unit

Output space-time relation feature vector

The calculation formula is as follows:

wherein,

is the output variable of the output gate,

、

9. A distributed multi-agent spatiotemporal feature extraction method according to any of claims 1-8, characterized in that all agents in the distributed multi-agent share learnable parameters.

10. A distributed multi-agent behavior decision method, characterized in that an agent is obtained based on the distributed multi-agent spatiotemporal feature extraction method of any one of claims 1 to 9

At the current moment

Space-time relation characteristic vector of

At the current moment

Behavioral decision set of

(ii) a Wherein,

，

is the chosen decision space dimension.

11. A distributed multi-agent behavior decision method as defined in claim 10, wherein all agents in the distributed multi-agent share learnable parameters.

12. A distributed multi-agent space-time feature extraction system is characterized by comprising a state vector acquisition module, an original feature generation module, a space relation calculation module and a space-time relation calculation module;

，Based on

Agent for each moment from moment

Observable spatial state vector

Splicing to obtain agents

At the current moment

Space-time state vector of

(ii) a Wherein,

the number is a preset historical state number;

Obtaining agents by generating layers through a graph network

Of the original feature vector

(ii) a Wherein,

，

a selected feature space dimension;

At the current moment

Space relation characteristic vector of

(ii) a Wherein the spatial feature relation extraction layer adopts a graphThe attention network module and the full-connection network module are constructed in an alternate stacking mode;

the time-space relation calculation module is configured to obtain time

Front side

Individual moment intelligent agent

Spatial relationship feature vector of

Will be

、

At the current moment

Space-time relation characteristic vector of

。

13. A distributed multi-agent behavior decision making system, based on the distributed multi-agent spatiotemporal feature extraction system of claim 12, further comprising a behavior decision calculation module; the behavioral decision calculation module is configured to be based on an agent

At the current moment

Space-time relation characteristic vector of

At the current moment

Behavioral decision set of

(ii) a Wherein,

，

is the chosen decision space dimension.