CN112257875A

CN112257875A - Task understanding method of multiple intelligent agents based on extreme learning machine

Info

Publication number: CN112257875A
Application number: CN202011269619.4A
Authority: CN
Inventors: 辛斌; 李朝阳; 王淼; 陈杰; 王晴; 杨庆凯; 鲁赛; 张若伟
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2020-11-13
Filing date: 2020-11-13
Publication date: 2021-01-22

Abstract

An extreme learning machine-based task understanding method for multiple agents comprises: initializing the multi-agent parameters and the environmental situation perception information; formulating task understanding sample data for the extreme learning machine according to the tasks and parameters of the multi-agent and the environmental situation perception information; determining a task understanding network structure of the extreme learning machine according to the task understanding sample data; training the task understanding network structure with the task understanding sample data to obtain a task understanding model of the extreme learning machine; and, after the multi-agent receives a task instruction, acquiring the current environmental situation perception information and multi-agent parameters and inputting them into the task understanding model to obtain a task understanding result. The method makes full use of information such as the battlefield environment situation and the multi-agent capability values, generates an understanding result that closely fits the commander's thinking, avoids to a certain extent the dependence on the subjective factors of an expert system, and ensures the accuracy of task understanding.

Description

Task understanding method of multiple intelligent agents based on extreme learning machine

Technical Field

The disclosure belongs to the technical field of task understanding of multi-agent, and particularly relates to a task understanding method of multi-agent based on an extreme learning machine.

Background

With the rapid development of computer, communication and artificial intelligence technology, the traditional form of warfare and the thinking behind it have changed. As modernization and informatization advance, new technologies are applied to combat systems, and modes of combat continue to upgrade and evolve. Multi-agent systems reduce casualties, offer strong maneuverability and survivability and flexible function configuration, and are suited to executing dangerous tasks in harsh environments; they have become an important force on the new battlefield.

The deep integration of the multi-agent system with the manned platform is an important guarantee for forming an integrated cooperative system of manned platform and multi-agent system. The OODA loop model describes four stages of decision making: observation, orientation (judgment), decision and action. Task understanding by the multi-agent system is an important link in the OODA loop: the unmanned platform combines its understanding of the environment and the situation to generate a task understanding result consistent with an incompletely expressed or unclear instruction issued by the manned platform. Classic task understanding methods include hierarchical task decomposition and mapping from natural language to action instructions. However, these methods depend to some extent on the subjective factors of expert systems, which can cause problems such as the same instruction producing different task understanding results under the same conditions.

Disclosure of Invention

In view of the above, the present disclosure provides a multi-agent task understanding method based on an extreme learning machine, which can make full use of situation information of a battlefield environment and information such as multi-agent capability values to generate an understanding result that effectively fits the thinking of a commander, avoid subjective factors that depend on an expert system to a certain extent, enable the same instruction to obtain the same task understanding result under the same condition, and ensure the accuracy of task understanding.

According to an aspect of the present disclosure, there is provided an extreme learning machine-based task understanding method for multiple agents, including:

initializing the multi-agent parameters and environmental situation perception information;

formulating task understanding sample data of the multi-agent of the extreme learning machine according to the tasks of the multi-agent, the parameters of the multi-agent and the environmental situation perception information;

determining a task understanding network structure of the multi-agent of the extreme learning machine according to the task understanding sample data of the multi-agent;

training the task understanding network structure of the multiple agents of the extreme learning machine by using the task sample data of the multiple agents to obtain a task understanding model of the multiple agents of the extreme learning machine;

and after receiving a task instruction, the multi-agent obtains current environment situation perception information and multi-agent parameters, and inputs the current environment situation perception information and the multi-agent parameters into a multi-agent task understanding model of the extreme learning machine to obtain a task understanding result of the multi-agent.

In one possible implementation, the task understanding sample data includes task understanding data and task understanding label data, and is divided into task understanding training data and task understanding test data.

In one possible implementation, training a task understanding network structure of the multi-agents of the extreme learning machine by using task understanding sample data of the multi-agents to obtain a task understanding model of the multi-agents of the extreme learning machine includes:

training the task understanding network structure of the multi-agent of the extreme learning machine by using the task understanding training data of the multi-agent to obtain an initial task understanding network structure of the multi-agent of the extreme learning machine;

testing the initial task understanding network structure of the multi-agent of the extreme learning machine by using the task understanding test data of the multi-agent and, when the performance requirements of the multi-agent are met, storing the initial task understanding network structure as the task understanding model of the multi-agent of the extreme learning machine; otherwise, adjusting the initial task understanding network structure of the multi-agent of the extreme learning machine.

In one possible implementation, the multi-agent parameters include the number n of the multi-agents and the capability values of the multi-agents, n being a positive integer;

the environmental situation perception information includes the sum of the threat degrees of the target objects in the environment and the number or distribution density of the target objects.

In one possible implementation, the task understanding sample data format of the multi-agent is $[x_1, \ldots, x_n, y_1, y_2]$ and the task understanding sample label data format is $[o_1, \ldots, o_i, \ldots, o_n]$, where $x_i$ is the capability value of the $i$-th agent for the task instruction, $y_1$ is the threat value of the target objects perceived from the environmental situation, $y_2$ is the number or distribution density of the target objects perceived from the environmental situation, $o_i$ takes the value 0 or 1, $1 \le i \le n$, and $i$ is a positive integer denoting the agent number.

In one possible implementation, the multi-agent task understanding network structure includes an input layer, a hidden layer and an output layer; the number of input-layer nodes is n + 2, the number of output-layer nodes is 1, and the number of hidden-layer nodes is smaller than the number of sample data.

In one possible implementation, adjusting the initial task understanding network structure of the multi-agent of the extreme learning machine is adjusting the number of nodes of the hidden layer of the task understanding network structure of the multi-agent.

In one possible implementation, the tasks of the multi-agent include search tasks and strike tasks.

In one possible implementation, the activation function of the hidden layer is a sigmoid function.

The method initializes the multi-agent parameters and environmental situation perception information; formulates task understanding sample data of the multi-agent of the extreme learning machine according to the tasks of the multi-agent, the parameters of the multi-agent and the environmental situation perception information; determines the task understanding network structure of the multi-agent of the extreme learning machine according to the task understanding sample data; trains the task understanding network structure with the task understanding sample data to obtain a task understanding model of the multi-agent of the extreme learning machine; and, after the multi-agent receives a task instruction, obtains the current environmental situation perception information and multi-agent parameters and inputs them into the task understanding model to obtain the task understanding result of the multi-agent. In this way, information such as the battlefield environment situation and the multi-agent capability values is fully used to generate an understanding result that closely fits the commander's thinking, and the dependence on the subjective factors of an expert system is avoided to a certain extent, so that the same instruction yields the same task understanding result under the same conditions and the accuracy of task understanding is ensured.

Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.

FIG. 1 illustrates a flowchart of a task understanding method for an extreme learning machine-based multi-agent in accordance with an embodiment of the present disclosure;

FIG. 2 illustrates a hypothetical scenario diagram of a task understanding method for an extreme learning machine-based multi-agent in accordance with an embodiment of the present disclosure;

FIG. 3 shows a schematic region partitioning diagram of a hypothetical scenario for a task understanding method for an extreme learning machine-based multi-agent, according to an embodiment of the present disclosure;

FIG. 4 shows a search task understanding network diagram for an extreme learning machine-based multi-agent in accordance with an embodiment of the present disclosure;

FIG. 5 illustrates a schematic diagram of a strike task understanding network for an extreme learning machine-based multi-agent in accordance with an embodiment of the present disclosure;

FIG. 6 illustrates a task understanding result diagram for an extreme learning machine-based multi-agent in accordance with an embodiment of the present disclosure;

FIG. 7 illustrates a task allocation diagram for an extreme learning machine-based multi-agent in accordance with an embodiment of the present disclosure.

Detailed Description

Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.

The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.

In a complex battlefield environment, multiple agents can execute dangerous tasks in harsh conditions and effectively reduce casualties. In battle, the commander derives tactics from information such as the situation and the environment and issues corresponding combat instructions to the multiple agents. To achieve efficient cooperative combat between the commander and the multiple agents, it is particularly important that the agents reasonably understand and present the instructions issued by the commander. The input data for task understanding come from multiple sources; to make better use of this information, an extreme learning machine is adopted to perform the task understanding inference.

FIG. 1 shows a flowchart of a task understanding method for an extreme learning machine-based multi-agent, according to an embodiment of the present disclosure. As shown in fig. 1, the method may include:

step S1: initializing the multi-agent parameters and environmental situation awareness information.

In a specific scenario, the capability values and related parameters of the multiple agents can be initialized to different values; the parameters include, for example, the speed, detection capability, strike capability and initial position of each agent. The target objects (the enemy's agents) are initialized as well, by setting their capability values, positions and so on. After initialization is complete, the models representing all agents can be displayed at their positions in the scenario.

In one example, the multi-agent parameters are the parameters of the friendly multi-agent and may include the number n of agents and the capability value of each agent, n being a positive integer. The environmental situation perception information describes the enemy multi-agent (the target objects) and may include the sum of threat degrees and the number or distribution density of the enemy agents obtained by situation perception in the environment.

Here, the capability value of an agent may be the range, centered on the agent, within which the agent can search or strike.
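The parameters initialized in step S1 can be pictured as a small set of records. The following is a minimal sketch only: the class names, field names, units and every number except the two capability values 0.395 km and 0.522 km, the threat sum 3 and the target count 5 are assumptions introduced for illustration, and treating those two capability values as search ranges is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FriendlyAgent:
    """Parameters of one friendly agent (field names are illustrative)."""
    number: int                       # agent number i, 1-based
    position_km: Tuple[float, float]  # initial position in the scene
    speed_km_h: float                 # hypothetical unit
    search_range_km: float            # capability value used for search tasks
    strike_range_km: float            # capability value used for strike tasks


@dataclass
class Situation:
    """Environmental situation perception information about the target objects."""
    threat_sum: float   # sum of threat degrees
    target_count: int   # number of target objects


def capability_values(agents: List[FriendlyAgent], task: str) -> List[float]:
    """Return the capability value of every agent for the given task type."""
    if task == "search":
        return [a.search_range_km for a in agents]
    if task == "strike":
        return [a.strike_range_km for a in agents]
    raise ValueError(f"unknown task type: {task!r}")


# Positions, speeds and strike ranges below are made up for the example.
agents = [
    FriendlyAgent(1, (1.0, 2.0), 30.0, search_range_km=0.395, strike_range_km=0.2),
    FriendlyAgent(2, (1.5, 2.5), 30.0, search_range_km=0.522, strike_range_km=0.3),
]
situation = Situation(threat_sum=3.0, target_count=5)
print(capability_values(agents, "search"))  # [0.395, 0.522]
```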

FIG. 2 shows a hypothetical scenario diagram of a task understanding method for an extreme learning machine-based multi-agent, according to an embodiment of the present disclosure. FIG. 3 shows a schematic region partitioning diagram of that hypothetical scenario, according to an embodiment of the present disclosure.

As shown in fig. 2, the assumed environment scene is built in the AnyLogic simulation environment and includes obstacles such as rivers; the area measures 10 km × 6 km. There are 15 friendly agents; for example, agent 1 has a capability value of 0.395 km, agent 2 has a capability value of 0.522 km, and so on. For the enemy multi-agent (target objects), the sum of threat degrees is 3, the number is 5, and so on. The assumed scene is then preprocessed: the large scene area is divided into several task areas according to the relative positions of the multiple agents. As shown in fig. 3, the assumed scene of fig. 2 is divided into 8 small areas according to the relative positional relationship of the multiple agents, and the types of obstacles and enemy agents present in different areas may be the same or different. In this way, the task understanding sample data of the multi-agent of the extreme learning machine can be formulated using the assumed scene and the relevant parameters of the friendly and enemy multi-agents in the scene.

Step S2: formulating task understanding sample data of the multi-agent of the extreme learning machine according to the tasks of the multi-agent, the parameters of the multi-agent and the environmental situation perception information.

The task understanding sample data is randomly divided into two parts, namely task understanding training data and task understanding test data, and the number of the task understanding test data is less than that of the task understanding training data.

The task understanding sample data format comprises a task understanding data format and a task understanding label data format.

In one example, the task understanding sample data format of the multi-agent is $[x_1, \ldots, x_n, y_1, y_2]$ and the task understanding sample label data format of the multi-agent is $[o_1, \ldots, o_i, \ldots, o_n]$, where $x_i$ is the capability value of the $i$-th agent for the task instruction, $y_1$ is the threat value of the target objects perceived from the environmental situation, $y_2$ is the number or distribution density of the target objects perceived from the environmental situation, $o_i$ takes the value 0 or 1, $1 \le i \le n$, and $i$ is a positive integer denoting the agent number.

For different tasks, such as the typical search task and strike task, the task understanding sample data format and the task understanding label data format required for training the corresponding extreme learning machine model can be defined separately.

For example, if the number of agents is 15, then for the search task the designed sample data format is $[x_1, \ldots, x_{15}, y_1, y_2]$ and the label data format is a one-dimensional vector $[o_1, \ldots, o_{15}]$.

Here $x_1$, $x_2$, etc. represent the capability values of agent 1, agent 2, etc. for the task area specified in the search instruction, as described for fig. 2 and fig. 3; $y_1$ is the sum of threat values present in the search target area, obtained by environmental situation perception; and $y_2$ is the distribution density of enemy agents in the search target area, obtained by environmental situation perception. Each element of the label data vector $[o_1, \ldots, o_{15}]$ is 0 or 1 and indicates whether the corresponding agent performs the search task. For example, the label data vector [0,0,0,0,0,1,0,0,0,1,0,0,0,0,0] means that the 6th agent and the 10th agent perform the search task and the other agents do not.
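The search task sample and label formats described above can be sketched in a few lines. The function names are hypothetical, and all capability values except the first two, as well as the density value 0.5, are placeholders chosen only to make the example run.

```python
from typing import List, Sequence


def make_sample(capabilities: Sequence[float], threat_sum: float, density: float) -> List[float]:
    """Build [x_1, ..., x_n, y_1, y_2] from n capability values and two situation values."""
    return [*capabilities, threat_sum, density]


def assigned_agents(label: Sequence[int]) -> List[int]:
    """Return the 1-based numbers of the agents whose label element o_i is 1."""
    if any(o not in (0, 1) for o in label):
        raise ValueError("label elements must be 0 or 1")
    return [i for i, o in enumerate(label, start=1) if o == 1]


# 15 agents: the first two capability values come from the text, the rest are placeholders.
capabilities = [0.395, 0.522] + [0.4] * 13
sample = make_sample(capabilities, threat_sum=3.0, density=0.5)
label = [0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

print(len(sample))             # 17, i.e. n + 2 with n = 15
print(assigned_agents(label))  # [6, 10]
```

The length check on the sample matches the n + 2 input nodes of the network, and decoding the label reproduces the "6th and 10th agent" reading of the example vector.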

Search task samples are then generated on the basis of the divided task areas. The multiple agents are placed at certain positions in the scene. After the commander issues an instruction to search a task area, the task understanding module of the multiple agents is triggered, and the AnyLogic simulation system automatically records the current capability value of each agent and the current environmental situation perception values in the search task sample data format to generate a sample. The commander then produces the corresponding label data from experience, following the search task label data format. The search task understanding sample data and label data are stored in a text file. For example, 700 groups of search task understanding sample data are generated; 500 groups are randomly extracted to form the search task understanding training set, and the remaining 200 groups form the search task understanding test set.
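The random 500/200 division of the 700 generated samples can be sketched as follows. The helper name, the fixed seed and the integer placeholders standing in for real (sample, label) records are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(samples: Sequence[T], n_train: int, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly draw n_train samples for training; the remainder forms the test set."""
    if not 0 < n_train < len(samples):
        raise ValueError("n_train must be between 1 and len(samples) - 1")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:]]
    return train, test


all_samples = list(range(700))  # placeholders for 700 recorded (sample, label) pairs
train_set, test_set = split_samples(all_samples, n_train=500)
print(len(train_set), len(test_set))  # 500 200
```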

For the strike task, the number of agents is again 15; the designed sample data format is $[z_1, \ldots, z_{15}, k_1, k_2]$ and the label data format is a one-dimensional vector $[O_1, \ldots, O_{15}]$.

Here $z_1$, $z_2$, etc. represent the capability values of agent 1, agent 2, etc. for striking a target object (an enemy agent) in the task area, as described for fig. 2 and fig. 3; $k_1$ is the sum of threat values of the group to which the target object belongs, obtained by environmental situation perception; and $k_2$ is the number of clusters where the target object is located, obtained by environmental situation perception. Each element of the label data vector $[O_1, \ldots, O_{15}]$ is 0 or 1 and indicates whether the corresponding agent is assigned to execute the strike task. For example, the strike task label data vector [0,0,0,0,0,1,0,0,0,1,0,0,0,0,0] means that agent 6 and agent 10 are assigned to perform the strike task and the other agents are not.

After the search task sample data have been generated, the simulation is run again, the situation in the environment shown in figs. 2 and 3 is observed, and agents in the enemy agent group are randomly selected as targets of the strike task. When the commander issues a strike instruction, the AnyLogic simulation system automatically generates strike task sample data in the strike task understanding sample data format, and the commander produces the strike task label data for that situation from experience, following the strike task understanding sample label data format. The strike task understanding sample data and label data are then stored in the corresponding text files. As with the search task, 700 groups of strike task understanding sample data are generated; 500 groups are randomly extracted to form the strike task understanding training set, and the remaining 200 groups form the strike task understanding test set.

Step S3: determining the task understanding network structure of the multi-agent of the extreme learning machine according to the task understanding sample data of the multi-agent.

In one example, the multi-agent task understanding network structure may include an input layer, a hidden layer and an output layer; the number of input-layer nodes is n + 2 (n being the number of friendly agents), the number of output-layer nodes is 1, and the number of hidden-layer nodes is smaller than the number of sample data.

The number of input-layer nodes corresponds to the dimension of the sample data of the friendly multi-agent, and the number of hidden-layer nodes can be set, according to the commander's experience, to a value smaller than the number of samples. The weight coefficients from the input layer to the hidden layer and the bias coefficients of the hidden layer and the output layer are initialized with random numbers, and a sigmoid function is adopted as the activation function of the hidden layer.
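As background, the forward computation of a single-hidden-layer extreme learning machine with a sigmoid hidden layer can be written as below. This is the standard textbook formulation rather than text from the disclosure: $L$ (number of hidden nodes), $N$ (number of training samples), $H$ and $T$ are symbols introduced here, while $W$ (with rows $\mathbf{w}_j$), $a_j$, $b$ and $\beta$ follow the parameter names used in this description. In the standard formulation the output weights are obtained in closed form by least squares, with the output bias either omitted or absorbed by appending a constant column to $H$; the disclosure does not spell out this step.

```latex
\begin{align}
h_j(\mathbf{x}) &= g\!\left(\mathbf{w}_j^{\top}\mathbf{x} + a_j\right),
  \qquad g(v) = \frac{1}{1 + e^{-v}}, \qquad j = 1, \ldots, L \\
o(\mathbf{x}) &= \sum_{j=1}^{L} \beta_j \, h_j(\mathbf{x}) + b \\
\hat{\boldsymbol{\beta}} &= H^{\dagger} T,
  \qquad H_{kj} = h_j(\mathbf{x}_k), \qquad k = 1, \ldots, N
\end{align}
```

Here $\mathbf{x} \in \mathbb{R}^{n+2}$ is one sample, $T$ stacks the 0/1 labels of the agent being modeled, and $H^{\dagger}$ is the Moore-Penrose pseudoinverse of $H$.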

Figs. 4 and 5 show schematic diagrams of the search task understanding network and the strike task understanding network, respectively, of the extreme learning machine-based multi-agent according to an embodiment of the present disclosure.

For example, the extreme learning machine models of the search task and the strike task are each trained using the 15 friendly agents and the assumed scenario shown in figs. 2 and 3.

As shown in fig. 4, for the extreme learning machine model of the search task trained for the $i$-th agent, the network structure of the multi-agent search task understanding model is set to have 17 input nodes, 30 hidden-layer nodes and 1 output-layer node. The hidden-layer neuron biases $a_j^{(i)}$, the output neuron bias $b^{(i)}$, the input-layer-to-hidden-layer weights $W^{(i)}$ and the hidden-layer-to-output-layer weights $\beta^{(i)}$ of the extreme learning machine are randomly initialized, and the sigmoid function is selected as the activation function. The extreme learning machine model of the strike task trained for the $i$-th agent works on the same principle as that of the search task; as shown in fig. 5, the only difference is that the input of the strike task model has one additional node for the enemy unit group situation.
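The 17-30-1 structure for one agent can be sketched as a small class. This is an illustrative sketch, not the disclosed implementation: the class name, the uniform initialization range [-1, 1], the seed and the example input are assumptions. All four parameter groups are drawn at random here to mirror the initialization described above; a standard extreme learning machine would afterwards fit the output weights to the training data.

```python
import math
import random
from typing import List, Sequence


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


class AgentElm:
    """Single-hidden-layer network for one agent: n_inputs -> n_hidden (sigmoid) -> 1."""

    def __init__(self, n_inputs: int = 17, n_hidden: int = 30, seed: int = 0) -> None:
        rng = random.Random(seed)
        # W: input-to-hidden weights, a: hidden biases, beta: hidden-to-output weights, b: output bias.
        self.W = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.a = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        self.beta = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        self.b = rng.uniform(-1.0, 1.0)

    def hidden(self, x: Sequence[float]) -> List[float]:
        """Return the sigmoid activations of all hidden nodes for one sample."""
        if len(x) != len(self.W[0]):
            raise ValueError("sample dimension does not match the input layer")
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + aj)
                for row, aj in zip(self.W, self.a)]

    def output(self, x: Sequence[float]) -> float:
        """Return the single real-valued output for one sample."""
        return sum(bj * hj for bj, hj in zip(self.beta, self.hidden(x))) + self.b


net = AgentElm()
x = [0.4] * 15 + [3.0, 0.5]   # placeholder sample [x_1..x_15, y_1, y_2]
print(len(net.hidden(x)))     # 30
```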

Step S4: training the task understanding network structure of the multi-agent of the extreme learning machine by using the task understanding sample data of the multi-agent to obtain a task understanding model of the multi-agent of the extreme learning machine.

In one example, the task understanding network structure of the multi-agent of the extreme learning machine may be trained using the task understanding training data of the multi-agent to obtain an initial task understanding network structure of the multi-agent of the extreme learning machine. The initial task understanding network structure is then tested using the task understanding test data of the multi-agent; when the performance requirements of the multi-agent are met, the initial task understanding network structure is stored as the task understanding model of the multi-agent of the extreme learning machine; otherwise, the initial task understanding network structure is adjusted.

For the i-th friendly agent, the multi-agent task understanding model of the extreme learning machine is trained using the training sample sets of the search task and the strike task stored in the text files, and its performance is tested using the corresponding test sample sets. Taking the search task understanding model as an example: the 500-group search task understanding training sample set corresponding to the i-th agent is loaded from the text file and used to train the multi-agent search task understanding model of the extreme learning machine; the remaining 200 groups of search task understanding test samples are then used to test the performance of the trained model. If the performance requirements of the multi-agent are met, the parameters of the search task understanding model are saved. Otherwise, the number of hidden-layer nodes of the search task understanding model is adjusted, the above steps are re-executed to generate a certain amount of new search task understanding sample data, and the model is trained and tested again until a search task understanding model with better performance is obtained, whereupon its parameters are saved. The training process of the multi-agent strike task understanding model of the extreme learning machine is similar to that of the search task. If an ideal extreme learning machine model cannot be obtained with the current samples, a certain number of samples can be regenerated to replace an equal number of samples in the current training set, and the task understanding model is retrained.
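The train, test and adjust cycle can be sketched as follows for a single agent. Several parts are assumptions rather than details from the disclosure: the output weights are fitted by ridge-regularized least squares (a common extreme learning machine variant), the real-valued output is thresholded at 0.5, test accuracy stands in for the unspecified performance requirement, and the toy data rule is invented. Only the hidden-node adjustment is shown; sample regeneration is omitted.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Predictor = Callable[[Sequence[float]], int]
Data = Tuple[List[Vector], List[int]]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def solve(A: List[Vector], y: Vector) -> Vector:
    """Solve the square system A z = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def train_elm(X: Sequence[Sequence[float]], t: Sequence[int], n_hidden: int,
              seed: int = 0, ridge: float = 1e-3) -> Predictor:
    """Fit one single-output ELM and return a predictor that outputs 0 or 1."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) for _ in X[0]] for _ in range(n_hidden)]
    a = [rng.uniform(-1, 1) for _ in range(n_hidden)]

    def hidden(x: Sequence[float]) -> Vector:
        # The trailing 1.0 lets the last output weight act as the output bias.
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + aj)
                for row, aj in zip(W, a)] + [1.0]

    H = [hidden(x) for x in X]
    m = n_hidden + 1
    # Regularized normal equations: (H^T H + ridge * I) beta = H^T t.
    A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0) for j in range(m)]
         for i in range(m)]
    y = [sum(h[i] * ti for h, ti in zip(H, t)) for i in range(m)]
    beta = solve(A, y)
    return lambda x: 1 if sum(b * h for b, h in zip(beta, hidden(x))) >= 0.5 else 0


def accuracy(predict: Predictor, X: Sequence[Sequence[float]], t: Sequence[int]) -> float:
    """Fraction of samples whose predicted 0/1 value equals the label."""
    return sum(predict(x) == ti for x, ti in zip(X, t)) / len(X)


def fit_with_adjustment(train: Data, test: Data, hidden_sizes: Sequence[int],
                        required_accuracy: float) -> Optional[Tuple[int, Predictor]]:
    """Try hidden-layer sizes in order; keep the first model that passes the test."""
    (X_tr, t_tr), (X_te, t_te) = train, test
    for n_hidden in hidden_sizes:
        if n_hidden >= len(X_tr):  # hidden nodes must be fewer than samples
            continue
        predict = train_elm(X_tr, t_tr, n_hidden)
        if accuracy(predict, X_te, t_te) >= required_accuracy:
            return n_hidden, predict
    return None


def toy_data(count: int, seed: int) -> Data:
    """Invented rule for illustration: the agent is selected when its own value exceeds 0.5."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random(), rng.random()] for _ in range(count)]
    return X, [1 if x[0] > 0.5 else 0 for x in X]


train, test = toy_data(120, seed=1), toy_data(60, seed=2)
result = fit_with_adjustment(train, test, hidden_sizes=[5, 10, 20], required_accuracy=0.9)
print("no size met the requirement" if result is None else f"kept {result[0]} hidden nodes")
```

Returning `None` when no size passes leaves room for the fallback described above, namely regenerating part of the training set and retraining.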

Step S5: after receiving a task instruction, the multi-agent obtains the current environmental situation perception information and multi-agent parameters and inputs them into the multi-agent task understanding model of the extreme learning machine to obtain the task understanding result of the multi-agent.

FIG. 6 illustrates a task understanding result diagram for an extreme learning machine-based multi-agent in accordance with an embodiment of the present disclosure; FIG. 7 illustrates a task allocation diagram for an extreme learning machine-based multi-agent in accordance with an embodiment of the present disclosure.

After the search task understanding model and the strike task understanding model of the multi-agent of the extreme learning machine are trained, each agent loads its corresponding trained extreme learning machine model. After the commander issues a search or strike instruction to the agents, the AnyLogic simulation system acquires multi-source data from the environment, such as the current environmental situation perception information and the capability values of the friendly agents, and inputs them into the search task understanding model or strike task understanding model of the multi-agent of the extreme learning machine. The task understanding result of the multi-agent is obtained by inference. The result is a one-dimensional vector of dimension n, where n is the number of friendly agents; each element is 0 or 1 and corresponds to the task understanding result of one agent. As shown in FIG. 6, when the task understanding module of the multi-agent of the extreme learning machine receives the trigger instruction of a search task, the 15 agents in the current scene are friendly unit 0, friendly unit 1, ..., friendly unit 14, and their search capability values are the values in the capability/distance column; the current environmental situation perception information (the threat degree and distribution density of the enemy agents) is given by the corresponding enemy threat degree and enemy distribution density values. The search task understanding result is obtained by inference with the multi-agent search task understanding model of the extreme learning machine. As shown in FIG. 7, the array [0,0,0,0,0,1,0,0,0,1,0,0,0,0,0] indicates that the search task is assigned to agent 6 and agent 10; this understanding result reasonably fits the commander's intent.
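Step S5 can be sketched as querying one single-output model per agent and collecting the answers into the n-dimensional 0/1 vector. The function name, the 0.5 threshold and the stand-in models are assumptions; the disclosure does not state how a real-valued network output is mapped to 0 or 1.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def understand_task(models: Sequence[Model], capabilities: Sequence[float],
                    threat_sum: float, density: float, threshold: float = 0.5) -> List[int]:
    """Feed [x_1..x_n, y_1, y_2] to each agent's model and threshold its output to 0 or 1."""
    if len(models) != len(capabilities):
        raise ValueError("one model per agent is required")
    x = [*capabilities, threat_sum, density]
    return [1 if model(x) >= threshold else 0 for model in models]


# Stand-in models: agent 6 and agent 10 always answer 1.0, all others 0.0.
# Trained per-agent models would be loaded here instead.
models = [(lambda x, i=i: 1.0 if i in (6, 10) else 0.0) for i in range(1, 16)]
capabilities = [0.395, 0.522] + [0.4] * 13   # placeholders except the first two values

result = understand_task(models, capabilities, threat_sum=3.0, density=0.5)
print(result)  # [0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
```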

Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. A method for task understanding of multi-agents based on an extreme learning machine, the method comprising:

2. The task understanding method according to claim 1, wherein the task understanding sample data includes task understanding data and task understanding label data, and is divided into task understanding training data and task understanding test data.

3. A task understanding method according to claim 2, wherein training a task understanding network structure of the multi-agents of the extreme learning machine using the task understanding sample data of the multi-agents to obtain a task understanding model of the multi-agents of the extreme learning machine comprises:

4. A task understanding method according to claim 1, wherein said multi-agent parameters include the number n of said multi-agents and the ability values of said multi-agents, n being a positive integer;

the environmental situation perception information comprises the sum of the threat degrees of the target objects in the environment and the number or distribution density of the target objects.

5. A task understanding method according to claim 4, wherein the task understanding sample data format of the multi-agent is $[x_1, \ldots, x_n, y_1, y_2]$ and the task understanding sample label data format is $[o_1, \ldots, o_i, \ldots, o_n]$, where $x_i$ is the capability value of the $i$-th agent for the task instruction, $y_1$ is the threat value of the target objects perceived from the environmental situation, $y_2$ is the number or distribution density of the target objects perceived from the environmental situation, $o_i$ takes the value 0 or 1, $1 \le i \le n$, and $i$ is a positive integer denoting the agent number.

6. A task understanding method according to claim 3, characterized in that the task understanding network structure of the multi-agent comprises an input layer, a hidden layer and an output layer; the number of input-layer nodes is n + 2, the number of output-layer nodes is 1, and the number of hidden-layer nodes is smaller than the number of sample data.

7. A task understanding method of claim 6, wherein adjusting the initial task understanding network structure of the multi-agent of the extreme learning machine is adjusting the number of nodes of the hidden layer of the task understanding network structure of the multi-agent.

8. A task understanding method according to claim 1, wherein the tasks of the multi-agent include search tasks and strike tasks.

9. The task understanding method of claim 6, wherein the activation function of the hidden layer is a sigmoid function.