CN112307622B - Autonomous planning system and planning method for generating force by computer - Google Patents
Autonomous planning system and planning method for generating force by computer
- Publication number
- CN112307622B (application CN202011190896.6A)
- Authority
- CN
- China
- Prior art keywords
- task
- module
- action
- decision
- autonomous
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 25
- 230000009471 action Effects 0.000 claims abstract description 71
- 230000007613 environmental effect Effects 0.000 claims abstract description 26
- 230000008569 process Effects 0.000 claims abstract description 10
- 230000004044 response Effects 0.000 claims abstract description 8
- 230000006399 behavior Effects 0.000 claims description 51
- 230000006870 function Effects 0.000 claims description 22
- 238000004891 communication Methods 0.000 claims description 14
- 238000011156 evaluation Methods 0.000 claims description 11
- 238000013527 convolutional neural network Methods 0.000 claims description 10
- 230000004913 activation Effects 0.000 claims description 9
- 238000001994 activation Methods 0.000 claims description 9
- 230000008447 perception Effects 0.000 claims description 9
- 238000013507 mapping Methods 0.000 claims description 6
- 230000008450 motivation Effects 0.000 claims description 6
- 238000004364 calculation method Methods 0.000 claims description 5
- 230000001419 dependent effect Effects 0.000 claims description 2
- 238000000605 extraction Methods 0.000 claims description 2
- 230000008859 change Effects 0.000 claims 1
- 239000003795 chemical substances by application Substances 0.000 description 48
- 238000004088 simulation Methods 0.000 description 12
- 238000011160 research Methods 0.000 description 9
- 238000005516 engineering process Methods 0.000 description 8
- 238000004422 calculation algorithm Methods 0.000 description 4
- 238000010586 diagram Methods 0.000 description 4
- 239000013598 vector Substances 0.000 description 4
- 230000007123 defense Effects 0.000 description 3
- 238000013461 design Methods 0.000 description 3
- 238000012549 training Methods 0.000 description 3
- 230000000007 visual effect Effects 0.000 description 3
- 230000008901 benefit Effects 0.000 description 2
- 238000010276 construction Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 230000004438 eyesight Effects 0.000 description 2
- 239000011159 matrix material Substances 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 241000282414 Homo sapiens Species 0.000 description 1
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000019771 cognition Effects 0.000 description 1
- 230000001186 cumulative effect Effects 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 238000012217 deletion Methods 0.000 description 1
- 230000037430 deletion Effects 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000004880 explosion Methods 0.000 description 1
- 238000010304 firing Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 230000004297 night vision Effects 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 230000008520 organization Effects 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 230000001737 promoting effect Effects 0.000 description 1
- 238000005215 recombination Methods 0.000 description 1
- 230000006798 recombination Effects 0.000 description 1
- 230000002787 reinforcement Effects 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 230000007704 transition Effects 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Evolutionary Computation (AREA)
- Geometry (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to an autonomous decision-making system and decision-making method for computer generated forces (CGF). A sensing unit collects environmental data of the simulated battlefield environment in which the autonomous decision-making system is located and shares the environmental data and threat level with other autonomous decision-making systems; it models the collected environmental data, the shared environmental data, and the shared threat levels to form situation information, and generates a threat level based on the situation information. A decision unit receives the task issued by the superior autonomous planning system, plans the task, and generates a path for executing the task according to the task target and the threat level. A behavior unit simulates the real response of the corresponding battlefield object to different environments and, while traveling to the target location along the planned path, generates the action sequence for executing the task according to the threat level and the task target.
Description
Technical Field
The invention belongs to the technical field of computer generated force (CGF) simulation, and particularly relates to an autonomous decision system and decision method for computer generated forces.
Background
Domestic research on intelligent modeling and simulation technology for computer generated forces (CGF) started relatively late. The national "Ninth Five-Year" defense pre-research key project "comprehensive multi-weapon platform demonstration system" and the national "863" research subject "distributed virtual environment technology DVENET" played a positive role in advancing CGF modeling and simulation technology.
In recent years, key CGF technologies and applications have established a research basis and related demonstration systems. Beihang University led the development of DVENET, in which computer generated forces for air, sea, and land combat were partially realized. The National University of Defense Technology established the warfare CGF platform SEFBG on the basis of research into control-theoretic behavior modeling methodology. Beyond these systems, the automation institute of the National University of Defense Technology has developed force behavior modeling based on finite state machines; a key laboratory of Beihang University has researched computer generated forces for air combat; and the armored engineering institute has studied a computer generated force system for armored combat vehicles.
Overall, research on CGF artificial intelligence modeling and simulation technology in China is still at an initial stage, and a gap remains relative to foreign work in both theoretical research and system design.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an autonomous decision-making system and decision-making method for computer generated forces.
In order to achieve the above object, the present invention provides an autonomous planning system for computer generated forces, comprising a sensing unit, a decision unit, and a behavior unit;
the sensing unit is used for collecting environmental data of the simulated battlefield environment and sharing the environmental data and threat level with other autonomous decision-making systems; it models the collected environmental data, the shared environmental data, and the shared threat levels to form situation information, and generates a threat level based on the situation information;
the decision unit is used for receiving the task issued by the superior autonomous planning system, planning the task, and generating a path for executing the task according to the task target and the threat level;
and the behavior unit is used for simulating the real response of the corresponding battlefield object to different environments and, in the process of reaching the target location along the path for executing the task, generating an action sequence for executing the task according to the threat level and the task target.
Further, the sensing unit comprises a plurality of sensors and a blackboard system; the sensors are used for collecting environmental data, and the blackboard system is used for storing and sharing environmental data and threat levels among friendly units.
Further, the sensor comprises a basic timer, a core function and a data area;
The basic timer is used for defining the duration for which the sensor remains in the activated state and the interval between two activations of the sensor;
the core function is used for defining the conditions for activation and deactivation of the sensor, and its behavior upon activation and deactivation;
the data area is used for storing the sensor's state information, including whether the sensor is activated and whether it is available, together with associated data, the parent sensor it depends on, and custom data required by the program logic.
Further, the sensing unit further comprises a communication capability module, which is used for defining the communication capability of the CGF agent, including the communication range within which shared data can be read, and for setting the weight of a shared threat level according to the reliability and validity of the data source.
Further, the sensing unit comprises a threat calculation module in which a deep convolutional neural network model is built; it extracts features from the situation information and calculates and outputs the threat level.
Further, the decision unit comprises a path planning module, which plans the path with minimum loss for the CGF agent from the initial position to the target position as the optimal path.
Further, when the CGF agent moves toward the target point, the path planning module repeatedly selects the node n that minimizes f(n) and moves toward it; the loss f(n) is calculated as follows:
f(n)=g(n)+h(n)
where g(n) represents the total cost from the current location to node n, and h(n) represents the heuristic evaluation cost from node n to the target location.
Further, the behavior unit comprises a state module and an action module; the state module is used for defining single-action attributes, action sets, and the basic connection relations among the single actions within an action set, and the action module is used for defining state jump relations among different action sets, with the threat level used as the jump condition.
Further, the behavior unit adopts a Control AI module to encapsulate the state module and the action module; the Control AI module uses a behavior tree to decide the CGF agent's behavior, with the threat level used as a decision branch condition of the behavior tree.
Another aspect of the present invention provides a planning method using the above autonomous planning system for computer generated forces, comprising the following steps:
the sensing unit acquires environmental data of the simulated battlefield environment in real time and shares environmental data and threat levels with other autonomous decision-making systems; it forms situation information and generates a threat level based on the situation information;
if the present planning system is a superior autonomous planning system, the decision unit receives the task and then generates a subordinate task sequence based on a decision model described by a behavior tree; if the present planning system is a subordinate autonomous planning system, the decision unit generates a path for executing the task according to the task target and the situation information; the decision unit then generates a series of action sets based on the behavior tree, which are executed by the behavior unit;
after the task is completed, it is reported according to the issued task rules, or the system waits upon completion.
The technical scheme of the invention has the following beneficial technical effects:
(1) The invention exploits the shift invariance of convolutional neural networks, which effectively reduces the computational burden of high-dimensional input features; earlier convolutional layers learn small local patterns and later layers learn higher-level abstractions built on them, and the deep convolutional neural network has been verified to be effective for threat perception and threat level identification;
(2) the invention solves the problem of unsafe paths caused by the traditional A* algorithm ignoring enemy fire threat during path planning; the scheme establishes a secondary plan on top of the original path planning, introduces enemy fire threat values into the A* evaluation function, and weights the whole evaluation function with them as input factors of the loss function;
(3) the basic tactical action system maps scenarios to motivations (or states) and then maps motivations (or states) to specific action sequences; a multi-layer finite action state machine framework realizes the combination and serialization of a large number of elementary actions, which can be classified naturally according to the entity's basic response patterns to environmental stimuli;
(4) the Control AI module is easier to maintain and modify, making the CGF agent more modular, and its optimized navigation mesh technology makes navigation more efficient;
(5) the autonomous decision-making system realizes simulation of computer generated forces: the CGF has multiple sensors and an information sharing function, incorporates shared information into threat level judgment, applies credibility attenuation to simulate a human memory model, and thereby achieves a more realistic simulation effect.
Drawings
FIG. 1 is a schematic diagram of a deep convolutional neural network in one embodiment of the present invention;
FIG. 2 is a schematic diagram of a basic tactical action architecture in one embodiment of the present invention;
FIG. 3 is a schematic diagram of the composition of an autonomous planning system for computer generated forces.
Detailed Description
The objects, technical solutions and advantages of the present invention will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the invention. In the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.
An autonomous planning system for computer generated force in one embodiment of the present invention, in conjunction with fig. 3, includes: the system comprises a sensing unit, a decision unit and a behavior unit.
The sensing unit is used by the CGF agent to acquire battlefield data according to its own capability model and to model this data perceptually, forming cognition of the battlefield situation. The sensing unit comprises sensors, a blackboard system, a communication capability module, and a feature training module.
The sensors are used for collecting environmental data; each agent may have several sensors, such as vision and hearing, according to the class of data perceived. A sensor comprises a basic timer, a core function, and a data area. The basic timer defines the duration for which the sensor remains activated and the interval between two activations. The core function defines the activation condition by which the sensor is activated and deactivated. The data area stores the sensor's state information and associated data, specifically whether the sensor is activated and available, its parent sensor, and custom data.
In addition, extra data or functions may be added for different sensor implementations. An important optional attribute is "dependency", which defines the other sensors on which a sensor depends; when the sensor is created, the sensors it depends on are created together with it.
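The sensor structure described in the preceding paragraphs (basic timer, core function, data area, dependency) can be sketched as follows. This is an illustrative sketch only: all identifiers (`Sensor`, `update`, `activate_condition`, the timing logic) are assumptions, since the patent describes components rather than an API.

```python
class Sensor:
    """Sketch of a CGF sensor: basic timer, core function, and data area."""

    def __init__(self, name, duration, interval, activate_condition, parent=None):
        self.name = name
        self.duration = duration                       # time the sensor stays activated
        self.interval = interval                       # minimum gap between two activations
        self.activate_condition = activate_condition   # core function: when to activate
        # data area: state information, associated data, dependent parent sensor
        self.data = {"active": False, "available": True,
                     "parent": parent, "custom": {}}
        self.last_activation = float("-inf")

    def update(self, now, environment):
        """Apply the basic timer and core function for one simulation tick."""
        if self.data["active"]:
            if now - self.last_activation >= self.duration:
                self.data["active"] = False            # duration elapsed: deactivate
        elif (now - self.last_activation >= self.interval
              and self.activate_condition(environment)):
            self.data["active"] = True                 # core function fired: activate
            self.last_activation = now
        return self.data["active"]
```

A vision sensor, for example, could use `activate_condition=lambda env: env["enemy_dist"] < 100`, staying active for `duration` seconds and not re-activating before `interval` seconds have passed.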
The blackboard system is used for the storage and sharing of data. The sensors and the blackboard system are designed on the VMS platform (a virtual military simulation platform developed by Nanjui Chen Xin Chun Network Technology Co., Ltd.), and control of the sensors and use of blackboard data are completed while constructing the behavior tree set. Since the agent must have both sensing and data-sharing capability, both capabilities are constructed when describing the agent's behavior with a behavior tree.
Data acquisition by the sensors is defined according to the reconnaissance capability of the CGF agent, including definitions of vision and hearing capabilities, night vision devices, infrared, radar, and so on; a sharing mechanism is also established so that, when communication conditions are good, the information acquired by all units in a group is modeled uniformly. At run time, the required sensors are driven by the behavior tree; data are collected from the simulation engine according to whether each sensor is activated or deactivated, and related data are stored on the blackboard system. The simulation engine comprises the physical world, agent behaviors, and the like, and produces simulation data while the simulation system runs. Other behavior tree nodes obtain sensor state information and related data from the blackboard system through the provided sensor access interface, for decision making.
The communication capability module defines the communication capability of CGF agents to realize information sharing among them. For example, within normal communication range CGF agents can share the threat information each has acquired, with its weight increased or decreased according to the reliability of the information source; attenuating the intensity of threat information over time gives the CGF agent a model of information memory.
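The reliability weighting and memory attenuation just described can be illustrated with a small sketch. The weighting scheme (a weighted average with exponential decay by report age) is an assumption for illustration; the patent only states that weights follow source reliability and that threat information intensity attenuates.

```python
def fuse_shared_threats(own_threat, shared, decay=0.9):
    """Weighted fusion of shared threat levels with credibility attenuation.

    Each entry of `shared` is (threat_level, source_reliability, age_in_ticks);
    older reports decay by decay**age, a simple stand-in for the memory model.
    """
    total, weight_sum = own_threat, 1.0     # the agent's own observation has weight 1
    for level, reliability, age in shared:
        w = reliability * decay ** age      # credibility attenuation over time
        total += level * w
        weight_sum += w
    return total / weight_sum               # fused threat level
```

With no shared reports the agent keeps its own estimate; a fresh, fully reliable report counts as much as the agent's own observation, and the same report weighs progressively less as it ages.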
The feature training module performs feature recognition training on the acquired shared information using a deep convolutional neural network, and outputs an effective threat level as the decision basis (the activation function input).
The structure of the deep convolutional neural network is shown in FIG. 1. For the threat perception problem, the situation information acquired by the CGF agent through its own sensing capability is taken as input and feature information is extracted, including the number, type, position distribution, orientation, speed, and visibility of enemy targets and of friendly units. This forms a time-sequenced two-dimensional tensor (I) as the input of the convolutional neural network, convolved with a two-dimensional kernel (K):
S(i,j) = (I*K)(i,j) = Σm Σn I(m,n) K(i-m, j-n).
Convolutional neural network operation (based on TensorFlow): an x-by-y window is slid over the two-dimensional perception tensor; at each possible position the surrounding two-dimensional feature patch is extracted and multiplied (tensor product) with a learned shared weight matrix (the convolution kernel), then converted into a one-dimensional vector; all the vectors are spatially recombined into a 3D output tensor of shape (friendly units, target units, threat level). Each convolutional layer convolves its input with a window of a given size (the convolution kernel) and comprises a convolution stage, a detection stage, and a pooling stage. The output layer outputs the threat level.
The decision unit establishes a multi-layer finite state machine (HFSM) over CGF groups, subgroups, and single entities, realizing the construction of tactical rules for CGF agents with various task capabilities. It receives the task issued by the superior autonomous planning system, plans the task, and generates a path for executing the task according to the task target and the threat level.
In the hierarchical CGF agent organization, a superior CGF agent controls subordinate CGF agents, planning combat objectives on a larger scale and over a longer horizon than its subordinates; it is responsible for distributing tasks to the subordinates and monitoring how each unit completes its task. A subordinate CGF agent belongs to its superior, receives the superior's commands and tasks and plans their execution, and responds autonomously to battlefield environment changes while executing the assigned task.
The state jump conditions in the multi-layer finite state machine framework are optimized with a local reinforcement learning method: through continuous interaction between the CGF agent and the environment, a state jump condition matrix is learned in local space, yielding more reasonable and effective reactive intelligent behavior locally. The problem is represented as a Markov decision process, described by the tuple (S, A, P, R, γ), where: S is the finite state set, corresponding to the first (bottom) layer of the multi-layer finite state machine; A is the finite action set, corresponding to the CGF agent's action set; P is the state transition probability; R is the reward function; and γ is the discount factor used to compute the cumulative return. The jump condition uses the generated threat level.
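One standard way to learn such jump rules from the (S, A, P, R, γ) tuple is tabular Q-learning, sketched below. The patent does not name a specific algorithm, so this is an assumption; `step` stands in for the transition probability P, `reward` for R, and all hyperparameters are illustrative.

```python
import random

def learn_jump_policy(states, actions, step, reward,
                      gamma=0.9, alpha=0.1, episodes=500):
    """Tabular Q-learning over a small MDP (S, A, P, R, gamma).

    Returns the greedy action per state, usable as a learned state-jump
    rule for the bottom layer of the finite state machine.
    """
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = random.choice(states)
        for _ in range(20):                              # short rollout
            a = random.choice(actions)                   # exploratory policy
            s2 = step(s, a)
            target = reward(s, a, s2) + gamma * max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])    # TD update
            s = s2
    return {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
```

On a toy two-state environment where hiding under threat is the only rewarded behavior, the learned policy maps the "threat" state to "hide", i.e. the threat level drives the jump.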
The decision unit comprises a path planning module for finding the optimal path of the CGF agent from the initial position to the target position. Path planning for a CGF agent can be described as finding the optimal path from the start to the target in a weighted graph with losses: the problem space is expressed as a set of entity position states and the edges connecting them, each edge carrying a loss value. The loss from state X to Y is given by the (positive) loss function cost(X, Y) of the edge; if X has no edge to Y, cost(X, Y) is undefined (represented by the largest machine-representable integer). If states X and Y are adjacent, cost(X, Y) and cost(Y, X) are defined as the edge loss between X and Y. A path is a sequence of vertices from the start position to the target position, and personnel or equipment move from one position state to the next along this sequence (incurring the corresponding loss) until the target is reached.
The simplified representation of the A* algorithm is: f(n) = g(n) + h(n), where g(n) is the cost from the initial node to node n (the sum of all cost(X, Y) along the way) and h(n) is the heuristic evaluation cost from node n to the target point; in each iteration of the main loop, the node n with the smallest f(n) is examined while moving from the initial point toward the target. A heuristic function h', weighted by the node threat level coefficient, is introduced to evaluate the cost of reaching neighboring navigation points from an arbitrary position.
The heuristic function may be: h(n) = h'(n, w1) + cost(w1, w2) + h'(w2, goal), i.e., the evaluation cost of the corresponding node is increased by the node threat level coefficient, so that the finally generated node sequence changes to avoid part of the threat area. h'(n, w1) is an evaluation function estimating the actual loss from n to w1; the closer the estimate is to the actual loss, the faster the algorithm runs.
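A threat-weighted A* search of the kind described can be sketched as follows. The exact weighting form `h(n) * (1 + threat(n))` is an assumption for illustration (the text only states that the threat value weights the evaluation function), and inflating the heuristic this way deliberately trades optimality for threat avoidance.

```python
import heapq

def a_star(start, goal, neighbors, cost, h, threat=lambda n: 0.0):
    """A* search with the node threat level folded into the evaluation f(n)."""
    frontier = [(h(start), 0.0, start, [start])]
    closed = set()
    while frontier:
        _, g, n, path = heapq.heappop(frontier)       # node with smallest f(n)
        if n == goal:
            return path
        if n in closed:
            continue
        closed.add(n)
        for n2 in neighbors(n):
            if n2 not in closed:
                g2 = g + cost(n, n2)                  # total cost so far
                f2 = g2 + h(n2) * (1.0 + threat(n2))  # threat-weighted estimate
                heapq.heappush(frontier, (f2, g2, n2, path + [n2]))
    return None                                       # no path exists
```

On a 3x3 grid with a high threat value at the center node, the generated node sequence bends around the threatened node instead of crossing it.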
The behavior unit simulates the real responses of battlefield objects to different battlefield environments and generates the action sequences the CGF must execute to complete specific tasks. The CGF agent has intelligent behavior capability: on the basis of a physical capability model, it can sense the virtual battlefield environment and respond reasonably to its own state and to state changes of other entities; it can autonomously or semi-autonomously simulate the tactical behavior of real personnel, replacing a real fighter or weapon system and fulfilling that role and function in the simulation system. To act autonomously or semi-autonomously and interact with the environment and other agents, the CGF agent needs capabilities such as target information perception, path planning, and autonomous cross fire.
The behavior unit comprises a state module and an action module; the state module defines the basic connection relations between single-action attributes and actions, and the action module defines the state jump relations between different action sets.
The basic tactical action system maps scenarios to motivations (or states) and then maps motivations (or states) to specific action sequences; a multi-layer finite action state machine framework realizes the combination and serialization of a large number of elementary actions. The advantage of this approach is that the design phase does not attempt to exhaustively enumerate every possible scenario and its resulting behavior, which would quickly cause a state space explosion and "unfriendly" behavior logic; instead the focus is on providing generic behavior templates for CGF agents, which can then be classified naturally by their basic response patterns to environmental stimuli. As shown in FIG. 2, behavior units can be combined into a generic tactical action model hierarchy adapted to specific tactical actions in different environments. For example, generic models such as lying down, concealing, observing, aiming, and shooting can be combined into lying down and observing when there is no threat; lying down, concealing, and observing under threat; and specific actions such as lying down and firing in open terrain.
FIG. 2 is a schematic diagram of the basic tactical action system in an embodiment of the present application. Solid arrows in FIG. 2 represent the basic connections between single-action attributes and actions, and dashed arrows represent the state jump relations between different action sets.
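The state module / action module split above can be sketched as a small finite action state machine in which the threat level is the jump condition. The action names and threshold values (0.7 to engage, 0.3 to stand down) are illustrative assumptions, not taken from the patent.

```python
class ActionSet:
    """State module sketch: a named action set (elementary actions in order)."""
    def __init__(self, name, actions):
        self.name = name
        self.actions = actions

class ActionStateMachine:
    """Action module sketch: state jumps between action sets driven by threat."""
    def __init__(self, transitions, initial):
        self.transitions = transitions      # list of (source name, predicate, target set)
        self.state = initial

    def step(self, threat_level):
        for src, jump_if, dst in self.transitions:
            if src == self.state.name and jump_if(threat_level):
                self.state = dst            # the threat level is the jump condition
                break
        return self.state.actions           # action sequence of the current set

patrol = ActionSet("patrol", ["stand", "observe", "advance"])
engage = ActionSet("engage", ["lie_down", "aim", "fire"])
fsm = ActionStateMachine(
    [("patrol", lambda t: t >= 0.7, engage),    # high threat: jump to engage
     ("engage", lambda t: t < 0.3, patrol)],    # threat passed: back to patrol
    initial=patrol)
```

Feeding the machine a rising then falling threat level makes it jump to the engage action set and later back to patrol, mirroring the dashed-arrow jumps of FIG. 2.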
The Control AI module is a behavior modeling framework developed on the VMS platform that uses behavior trees as the CGF agent behavior modeling method. It consists of three parts: a visual editor, a basic tactical action library, and task-type-based tactical action plans (sets of behavior trees). The way the Control AI module realizes a CGF agent differs from the original CGF agent: the original agent's HFSM becomes difficult to maintain and modify as behavior grows complex, and since many behaviors are hard-coded in the engine, they cannot be modified externally by third parties through the FSM. Here the CGF agent's behavior is decided by a Behavior Tree, with threat levels used as the decision branch conditions, which makes the agent more modular. The Control AI module uses optimized navigation mesh (NavMesh) technology, so the CGF agent can navigate the map efficiently and find smooth paths through buildings and corridors, whereas the original agent, using grids with precision no finer than 1 m, could not navigate freely inside buildings and could only move on path layers predefined in the model. A CGF agent using the new navigation technique can navigate to every place in the environment a real person could reach, and the navigation data can be updated dynamically as the terrain changes (through map editing tools, or the creation and deletion of buildings).
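A behavior tree with the threat level as a decision branch condition can be sketched with the two standard composite nodes. This is a generic behavior-tree sketch, not the Control AI module's actual implementation; node and action names are assumptions.

```python
class Selector:
    """Composite node: succeeds as soon as one child succeeds."""
    def __init__(self, *children):
        self.children = children
    def tick(self, ctx):
        return any(child.tick(ctx) for child in self.children)

class Sequence:
    """Composite node: succeeds only if every child succeeds."""
    def __init__(self, *children):
        self.children = children
    def tick(self, ctx):
        return all(child.tick(ctx) for child in self.children)

class Condition:
    def __init__(self, predicate):
        self.predicate = predicate
    def tick(self, ctx):
        return self.predicate(ctx)

class Action:
    def __init__(self, name):
        self.name = name
    def tick(self, ctx):
        ctx.setdefault("log", []).append(self.name)   # record the executed action
        return True

# The threat level (read from the blackboard context) is the branch condition.
tree = Selector(
    Sequence(Condition(lambda ctx: ctx["threat"] >= 0.7),
             Action("take_cover"), Action("return_fire")),
    Sequence(Action("advance"), Action("observe")))
```

Ticking the tree with a high threat level runs the defensive branch; with a low threat level the condition fails and the selector falls through to the default advance-and-observe branch.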
Another aspect of the present invention provides a planning method for the autonomous planning system for computer generated forces, comprising the following steps:
(1) the sensing unit acquires environmental data of the simulated battlefield environment in real time and shares environmental data and threat levels with other autonomous decision-making systems; it forms situation information and generates a threat level based on the situation information;
(2) if the present planning system is a superior autonomous planning system, the decision unit receives the task and then generates a subordinate task sequence based on a decision model described by a behavior tree; if the present planning system is a subordinate autonomous planning system, the decision unit generates a path for executing the task according to the task target and the situation information; the decision unit then generates a series of action sets based on the behavior tree, which are executed by the behavior unit;
(3) after the task is completed, it is reported according to the issued task rules, or the system waits upon completion.
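The control flow of steps (1)-(3) can be sketched as a single cycle. Every callable name below is an illustrative stand-in for one unit of the system (sensing, decision, behavior, reporting), not a real API.

```python
def planning_cycle(perceive, decide_subtasks, plan_path, act, report,
                   is_superior, task):
    """One pass through the planning method's steps (1)-(3)."""
    situation, threat = perceive()                        # (1) sense, share, assess threat
    if is_superior:
        result = decide_subtasks(task, situation, threat) # superior: subordinate task sequence
    else:
        path = plan_path(task, situation)                 # (2) subordinate: path to target
        result = act(path, threat)                        # action sets from the behavior tree
    return report(task, result)                           # (3) report per the task rules
```

With stub callables, a superior system yields a subtask sequence while a subordinate system yields the executed action sequence, matching the branch in step (2).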
In summary, the invention relates to an autonomous decision-making system and decision-making method for computer generated forces: a sensing unit collects environmental data of the simulated battlefield environment in which the autonomous decision-making system is located and shares environmental data and threat levels with other autonomous decision-making systems, models the collected and shared data to form situation information, and generates a threat level based on it; a decision unit receives the task issued by the superior autonomous planning system, plans the task, and generates a path for executing the task according to the task target and the threat level; and a behavior unit simulates the real response of the corresponding battlefield object to different environments and, while traveling to the target location along the planned path, generates the action sequence for executing the task according to the threat level and the task target.
It is to be understood that the above-described embodiments of the present invention are merely illustrative of the principles of the present invention and are in no way limiting. Accordingly, any modification, equivalent replacement, or improvement made without departing from the spirit and scope of the present invention shall be included in the scope of the present invention. Furthermore, the appended claims are intended to cover all such changes and modifications as fall within the scope and boundary of the appended claims, or equivalents of such scope and boundary.
Claims (6)
1. An autonomous planning system for computer-generated forces, comprising a perception unit, a decision unit and a behavior unit;
The perception unit is used for collecting environmental data of the simulated battlefield environment and sharing environmental data and threat levels with other autonomous planning systems; it builds a model from the collected environmental data and the shared environmental data and threat levels to form situation information, and generates a threat level based on the situation information;
the decision unit is used for receiving the task issued by the upper-level autonomous planning system, planning the task and generating a path for executing the task according to the task target and the threat level;
The behavior unit is used for simulating the real response of the corresponding battlefield object to different environments and, in the process of reaching the target location along the path for executing the task, generating an action sequence for executing the task according to the threat level and the task target; the perception unit comprises a plurality of sensors and a blackboard system; the sensors are used for collecting environmental data; the blackboard system is used for storing environmental data and threat levels and sharing them among friendly forces; each sensor comprises a basic timer, a core function and a data area;
The basic timer is used for defining the duration for which the sensor remains in the activated state and the interval between two successive activations of the sensor;
The core function is used for defining the conditions under which the sensor is activated and deactivated, and its behavior upon activation and deactivation;
The data area is used for storing state information of the sensor, namely whether the sensor is activated and whether it is available, together with associated data, the dependent parent sensor, and custom data required by the program logic; the perception unit further comprises a communication capability module for defining the communication capability of the CGF agent, including the communication range within which shared data can be read, and for setting the weight of each shared threat level according to the reliability and validity of its data source; the perception unit also comprises a threat calculation module containing a built-in deep convolutional neural network model, which extracts features from the situation information and calculates and outputs the threat level;
the decision unit comprises a path planning module, which plans the path of minimum loss for the CGF agent from the initial position to the target position as the optimal path; as the CGF agent moves toward the target point, the path planning module cyclically checks the node n with the smallest f(n) and moves toward node n; the loss f(n) is calculated as f(n) = g(n) + h(n), where g(n) represents the total cost from the current position to any node n and h(n) represents the heuristic estimate of the cost from node n to the target position; on each pass of the main loop while moving from the initial point to the target point, the node n with the smallest f(n) is checked; a heuristic function h' is used to estimate the cost of reaching an adjacent navigation point from any position, with node threat-level coefficients applied; the heuristic function is h(n) = h'(n, w1) + cost(w1, w2) + h'(w2, goal), i.e., the estimated cost of the corresponding node is increased by its threat-level coefficient, thereby changing the finally generated node sequence so as to avoid part of the threat area; h'(n, w1) is an evaluation function estimating the actual loss from n to w1;
The behavior unit comprises a state module and an action module; the state module is used for defining individual action attributes, action sets, and the basic connection relations among the individual actions within an action set; the action module is used for defining state-jump relations between different action sets, with threat levels used as the jump conditions; the basic tactical action system maps the scene to a motivation or state, and then maps the motivation or state to a specific action sequence; a multi-layer finite action state machine framework is used to realize the combination and serialization of a large number of elementary actions; the behavior unit uses a control AI module to encapsulate the state module and the action module, and the control AI module uses a behavior tree to decide CGF agent behavior, with the threat level as a decision branch condition of the behavior tree.
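Claim 1's threat calculation module builds a deep convolutional neural network over the situation information. As a toy illustration only (the patent gives no architecture or weights), a single convolution, a ReLU, global average pooling, and a sigmoid already show the feature-extraction-to-threat-level mapping; the kernel and the scalar weights below are hypothetical.

```python
import math

def conv2d_valid(grid, kernel):
    """Single-channel 'valid' 2-D convolution in pure Python."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(grid) - kh + 1):
        row = []
        for j in range(len(grid[0]) - kw + 1):
            row.append(sum(grid[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

def threat_level(situation, kernel, w, b):
    """Toy stand-in for the CNN threat module: one conv layer, ReLU,
    global average pooling, then a sigmoid producing a threat level in
    (0, 1). The weights are hypothetical, not trained."""
    feat = conv2d_valid(situation, kernel)
    relu = [[max(0.0, v) for v in row] for row in feat]
    pooled = sum(sum(row) for row in relu) / (len(relu) * len(relu[0]))
    return 1.0 / (1.0 + math.exp(-(w * pooled + b)))
```

A real implementation would stack many learned convolutional layers; the point here is only the mapping from a situation grid to a bounded threat score.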
2. The autonomous planning system for computer-generated forces of claim 1, wherein the decision unit includes a path planning module that plans the path of least loss for the CGF agent from the starting location to the target location as the optimal path.
3. The autonomous planning system for computer-generated forces of claim 2, wherein, as the CGF agent moves toward the target point, the path planning module cyclically checks the node n with the smallest f(n) and moves toward node n; the loss f(n) is calculated as follows:
f(n) = g(n) + h(n)
where g(n) represents the total cost from the current position to any node n, and h(n) represents the heuristic estimate of the cost from node n to the target position.
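The cost function of claim 3 can be realized with a standard A* search in which the cost of entering a node is inflated by its threat-level coefficient, steering the generated node sequence around threatened areas. The 4-connected grid, the Manhattan heuristic, and the (1 + threat) weighting below are illustrative assumptions, not the patent's exact scheme.

```python
import heapq

def a_star_with_threat(grid, threat, start, goal):
    """A* over a grid of 0 (free) / 1 (blocked) cells.

    f(n) = g(n) + h(n); the step cost into a cell is scaled by
    (1 + threat coefficient) so high-threat nodes are detoured around.
    `threat` maps (row, col) -> threat coefficient (default 0).
    """
    def h(a, b):  # Manhattan-distance heuristic to the target position
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    rows, cols = len(grid), len(grid[0])
    open_set = [(h(start, goal), 0.0, start, [start])]
    best_g = {start: 0.0}
    while open_set:
        # each main-loop pass takes the node n with the smallest f(n)
        f, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        if g > best_g.get(node, float("inf")):
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dr, node[1] + dc)
            r, c = nxt
            if 0 <= r < rows and 0 <= c < cols and grid[r][c] == 0:
                ng = g + 1.0 * (1.0 + threat.get(nxt, 0.0))
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    heapq.heappush(
                        open_set, (ng + h(nxt, goal), ng, nxt, path + [nxt]))
    return None  # no reachable path
```

With a high threat coefficient on a cell, the returned node sequence detours around it even though the geometric path through it would be shorter.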
4. The autonomous planning system for computer-generated forces of claim 1, wherein the behavior unit comprises a state module and an action module; the state module is used for defining individual action attributes, action sets, and the basic connection relations among the individual actions within an action set; the action module is used for defining state-jump relations between different action sets, with threat levels as the jump conditions.
5. The autonomous planning system for computer-generated forces of claim 4, wherein the behavior unit uses a control AI module to encapsulate the state module and the action module, and the control AI module uses a behavior tree to decide CGF agent behavior, with the threat level as a decision branch condition of the behavior tree.
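A minimal sketch of the branch structure claims 4 and 5 describe: a behavior-tree node that selects a subtree based on the threat level read from a blackboard. The node classes, the action names, and the 0.5 threshold are illustrative assumptions.

```python
class Action:
    """Leaf node: records which action the tree selected."""
    def __init__(self, name):
        self.name = name

    def tick(self, blackboard):
        blackboard["last_action"] = self.name
        return "SUCCESS"

class ThreatBranch:
    """Selects the evasive subtree when the threat level exceeds a
    threshold, otherwise continues the task subtree -- the threat level
    acting as the decision branch condition of the behavior tree."""
    def __init__(self, threshold, high_threat_child, low_threat_child):
        self.threshold = threshold
        self.high = high_threat_child
        self.low = low_threat_child

    def tick(self, blackboard):
        child = self.high if blackboard["threat"] > self.threshold else self.low
        return child.tick(blackboard)

# Hypothetical two-branch tree: take cover under high threat, else advance.
tree = ThreatBranch(0.5, Action("take_cover"), Action("advance_to_target"))
```

Deeper trees would nest further `ThreatBranch` and sequence nodes, and the leaves would trigger the action sets defined by the state module.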
6. A planning method using the autonomous planning system for computer-generated forces as claimed in any one of claims 1 to 5, comprising the steps of:
the perception unit acquires environmental data of the simulated battlefield environment in real time and shares environmental data and threat levels with the other autonomous planning systems; it forms situation information and generates a threat level based on the situation information;
if the planning system itself is an upper-level autonomous planning system, the decision unit receives the task and then generates a lower-level task sequence based on a decision model, the decision model being described by a behavior tree; if it is a lower-level autonomous planning system, the decision unit generates a path for executing the task according to the task target and the situation information; the decision unit then generates a series of action sets based on the behavior tree, and the action sets are executed by the behavior unit;
after completing the task, reporting according to the rules of the issued task, or waiting for the next task.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011190896.6A CN112307622B (en) | 2020-10-30 | 2020-10-30 | Autonomous planning system and planning method for generating force by computer |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011190896.6A CN112307622B (en) | 2020-10-30 | 2020-10-30 | Autonomous planning system and planning method for generating force by computer |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112307622A (en) | 2021-02-02
CN112307622B (en) | 2024-05-17
Family
ID=74332758
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011190896.6A Active CN112307622B (en) | 2020-10-30 | 2020-10-30 | Autonomous planning system and planning method for generating force by computer |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112307622B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113065694A (en) * | 2021-03-18 | 2021-07-02 | 徐州九鼎机电总厂 | Tactical action rule intelligent routing algorithm based on immersive human-computer interaction simulation system |
CN113534662B (en) * | 2021-06-04 | 2022-12-02 | 中国人民解放军军事科学院国防科技创新研究院 | Behavior tree-based unmanned system cluster control method |
CN113505538B (en) * | 2021-07-28 | 2022-04-12 | 哈尔滨工业大学 | Unmanned aerial vehicle autonomous combat system based on computer generated force |
CN114912274B (en) * | 2022-05-13 | 2024-08-20 | 中国人民解放军国防科技大学 | Situation awareness method, device, equipment and medium based on space partitioning |
CN114792072B (en) * | 2022-05-18 | 2024-01-16 | 中国人民解放军国防科技大学 | Function-based equipment decision behavior simulation modeling method and system |
CN115900433B (en) * | 2022-12-08 | 2024-08-13 | 北京理工大学 | Decision method of multi-agent unmanned countermeasure system based on SWOT analysis and behavior tree |
CN115854784B (en) * | 2023-02-17 | 2023-09-19 | 中国人民解放军96901部队 | Double-gear rolling task planning method |
CN116452011B (en) * | 2023-03-14 | 2023-10-24 | 中国人民解放军32370部队 | Data processing method and device for agent decision |
CN116976144B (en) * | 2023-09-20 | 2023-12-26 | 北京数易科技有限公司 | Weapon force deployment exercise method, system and medium based on simulation platform |
CN117131706B (en) * | 2023-10-24 | 2024-01-30 | 中国人民解放军国防科技大学 | Decision control device and behavior control method for generating force of weapon by computer |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101090729A (en) * | 2003-05-30 | 2007-12-19 | 艾更斯司股份有限公司 | Antibodies and related molecules that bind to psca proteins |
CN101131755A (en) * | 2007-10-19 | 2008-02-27 | 北京航空航天大学 | Route planning method for remaining CGF team formation based on real landform |
CN103336694A (en) * | 2013-07-08 | 2013-10-02 | 北京航空航天大学 | Entity behavioral modeling assembling method and system |
TW201341401A (en) * | 2012-01-09 | 2013-10-16 | Covx Technologies Ireland Ltd | Mutant antibodies and conjugation thereof |
CN104866549A (en) * | 2015-05-12 | 2015-08-26 | 中国人民解放军装甲兵工程学院 | Terrain environment database system suitable for multi-agent simulation |
CN105005820A (en) * | 2015-04-03 | 2015-10-28 | 北京理工大学 | Target assignment optimizing method based on particle swarm algorithm of population explosion |
CN105257425A (en) * | 2014-06-11 | 2016-01-20 | 凯文·李·弗里斯特 | Quintuple-effect generation multi-cycle hybrid renewable energy system with integrated energy provisioning, storage facilities and amalgamated control system |
CN106682351A (en) * | 2017-01-10 | 2017-05-17 | 北京捷安申谋军工科技有限公司 | Fight simulation system generating military strength based on computer and simulation method |
CN107944694A (en) * | 2017-11-21 | 2018-04-20 | 中国人民解放军陆军装甲兵学院 | A kind of equipment Safeguard operational process analysis method towards operational performance |
CN110260871A (en) * | 2019-04-17 | 2019-09-20 | 太原理工大学 | A kind of manoeuvre of forces environmental modeling method that facing area threatens |
CN110288606A (en) * | 2019-06-28 | 2019-09-27 | 中北大学 | A kind of three-dimensional grid model dividing method of the extreme learning machine based on ant lion optimization |
CN111460730A (en) * | 2020-03-26 | 2020-07-28 | 中国电子科技集团公司第二十八研究所 | Future combat intelligent technology application design method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030167454A1 (en) * | 2001-03-30 | 2003-09-04 | Vassil Iordanov | Method of and system for providing metacognitive processing for simulating cognitive tasks |
- 2020-10-30: application CN202011190896.6A filed in China; granted as CN112307622B (en), status Active
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101090729A (en) * | 2003-05-30 | 2007-12-19 | 艾更斯司股份有限公司 | Antibodies and related molecules that bind to psca proteins |
CN101131755A (en) * | 2007-10-19 | 2008-02-27 | 北京航空航天大学 | Route planning method for remaining CGF team formation based on real landform |
TW201341401A (en) * | 2012-01-09 | 2013-10-16 | Covx Technologies Ireland Ltd | Mutant antibodies and conjugation thereof |
CN103336694A (en) * | 2013-07-08 | 2013-10-02 | 北京航空航天大学 | Entity behavioral modeling assembling method and system |
CN105257425A (en) * | 2014-06-11 | 2016-01-20 | 凯文·李·弗里斯特 | Quintuple-effect generation multi-cycle hybrid renewable energy system with integrated energy provisioning, storage facilities and amalgamated control system |
CN105005820A (en) * | 2015-04-03 | 2015-10-28 | 北京理工大学 | Target assignment optimizing method based on particle swarm algorithm of population explosion |
CN104866549A (en) * | 2015-05-12 | 2015-08-26 | 中国人民解放军装甲兵工程学院 | Terrain environment database system suitable for multi-agent simulation |
CN106682351A (en) * | 2017-01-10 | 2017-05-17 | 北京捷安申谋军工科技有限公司 | Fight simulation system generating military strength based on computer and simulation method |
CN107944694A (en) * | 2017-11-21 | 2018-04-20 | 中国人民解放军陆军装甲兵学院 | A kind of equipment Safeguard operational process analysis method towards operational performance |
CN110260871A (en) * | 2019-04-17 | 2019-09-20 | 太原理工大学 | A kind of manoeuvre of forces environmental modeling method that facing area threatens |
CN110288606A (en) * | 2019-06-28 | 2019-09-27 | 中北大学 | A kind of three-dimensional grid model dividing method of the extreme learning machine based on ant lion optimization |
CN111460730A (en) * | 2020-03-26 | 2020-07-28 | 中国电子科技集团公司第二十八研究所 | Future combat intelligent technology application design method |
Non-Patent Citations (1)
Title |
---|
Integrated Behavior Modeling of Computer-Generated Surface Warship Forces; Yao Nan; Proceedings of the 2010 Conference on System Simulation Technology and Its Applications; pp. 63-67 *
Also Published As
Publication number | Publication date |
---|---|
CN112307622A (en) | 2021-02-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112307622B (en) | Autonomous planning system and planning method for generating force by computer | |
Heidari et al. | An efficient modified grey wolf optimizer with Lévy flight for optimization tasks | |
Yu et al. | A knee-guided differential evolution algorithm for unmanned aerial vehicle path planning in disaster management | |
Hu et al. | HG-SMA: hierarchical guided slime mould algorithm for smooth path planning | |
CN109961130A (en) | A kind of method and apparatus that target object position is determined based on particle swarm algorithm | |
Pyke et al. | Dynamic pathfinding for a swarm intelligence based UAV control model using particle swarm optimisation | |
Dwivedi et al. | What do navigation agents learn about their environment? | |
Tolt et al. | Multi-aspect path planning for enhanced ground combat simulation | |
CN112348285B (en) | Crowd evacuation simulation method in dynamic environment based on deep reinforcement learning | |
Zeng et al. | Spiral aquila optimizer based on dynamic gaussian mutation: applications in global optimization and engineering | |
Tang | Simulating complex adaptive geographic systems: A geographically aware intelligent agent approach | |
CN115631320A (en) | Pre-calculation cell display method, pre-calculation cell generation method and device | |
Chown | Cognitive Modeling. | |
Yuan et al. | Method of robot episode cognition based on hippocampus mechanism | |
Lyu et al. | Toward modeling emotional crowds | |
Woolley et al. | Genetic evolution of hierarchical behavior structures | |
Tan et al. | Advances in Swarm Intelligence: 14th International Conference, ICSI 2023, Shenzhen, China, July 14–18, 2023, Proceedings, Part I | |
Felicioni et al. | Goln: Graph object-based localization network | |
Yang | A networked multi-agent combat model: Emergence explained | |
Cil et al. | MABSIM: A multi agent based simulation model of military unit combat | |
Von Mammen et al. | An organic computing approach to self-organizing robot ensembles | |
Bocca et al. | Intelligent agents for moving and operating computer generated forces | |
Bandini et al. | GP generation of pedestrian behavioral rules in an evacuation model based on SCA | |
Das et al. | Agent based decision making for Integrated Air Defense system | |
Gorton et al. | A survey of air combat behavior modeling using machine learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||