CN112307622A - Autonomous planning system and planning method for generating military forces by computer - Google Patents
Autonomous planning system and planning method for generating military forces by computer
- Publication number
- CN112307622A (application CN202011190896.6A)
- Authority
- CN
- China
- Prior art keywords
- task
- threat level
- module
- decision
- autonomous
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
Abstract
The invention relates to an autonomous decision-making system and decision-making method for computer generated forces (CGF). A sensing unit acquires environmental data from the simulated battlefield environment and shares the environmental data and threat levels with other autonomous decision-making systems; it models situation information from the acquired and shared environmental data and threat levels, and generates a threat level from that situation information. A decision unit receives a task issued by a superior autonomous planning system, plans the task, and generates a path for executing the task according to the task objective and the threat level. A behavior unit simulates the realistic reactions of corresponding battlefield objects to different environments and, while moving to the target location along the planned path, generates the action sequence for executing the task according to the threat level and the task objective.
Description
Technical Field
The invention belongs to the technical field of computer generated force (CGF) simulation, and in particular relates to an autonomous decision-making system and decision-making method for computer generated forces.
Background
Research on computer generated force (CGF) artificial-intelligence modeling and simulation started late in China. For CGF systems as a whole, the Ninth Five-Year national defense pre-research key project "Integrated Multi-Weapon Platform Demonstration System" and the national "863" research subject "Distributed Virtual Environment Network (DVENET)" played a positive role in advancing CGF modeling and simulation technology.
In recent years, key CGF technologies and applications have established a research foundation, and related demonstration systems have been built. DVENET, led by Beihang University, implements partial computer generated forces for air, sea, and land warfare. The National University of Defense Technology established the naval-warfare computer generated forces platform SEFBG based on research into control-theoretic behavior modeling. In addition, the automation institute of the National University of Defense Technology has developed armed-behavior modeling based on finite state machines; a key laboratory at Beihang has researched air-combat computer generated forces; and the Armored Force Engineering Institute has researched a CGF system for armored combat vehicles.
Overall, Chinese research on CGF artificial-intelligence modeling and simulation remains at an early stage, and a gap remains between theoretical research and system design.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an autonomous decision-making system and decision-making method for computer generated forces.
To this end, the invention provides an autonomous planning system for computer generated forces, comprising a sensing unit, a decision unit, and a behavior unit;
the sensing unit is used for acquiring environmental data of the simulated battlefield environment and sharing the environmental data and threat levels with other autonomous decision-making systems, for modeling situation information based on the acquired environmental data, the shared environmental data, and the shared threat levels, and for generating a threat level based on the situation information;
the decision unit is used for receiving a task issued by a superior autonomous planning system, planning the task, and generating a path for executing the task according to the task objective and the threat level;
and the behavior unit is used for simulating the realistic reactions of corresponding battlefield objects to different environments and, while moving to the target location along the path for executing the task, generating the action sequence for executing the task according to the threat level and the task objective.
Further, the sensing unit comprises a plurality of sensors and a blackboard system; the sensors are used for collecting environmental data, and the blackboard system is used for storing environmental data and threat levels and sharing them among friendly units.
Further, each sensor comprises a basic timer, a kernel function, and a data area;
the basic timer defines how long the sensor remains in the activated state and the interval between two activations of the sensor;
the kernel function defines the conditions under which the sensor activates and deactivates, and the sensor's behavior upon activation and deactivation;
the data area stores the sensor's state information (whether the sensor is activated and whether it is available), together with associated data, the parent sensor it depends on, and custom data required by the program logic.
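For illustration, the sensor structure described above (basic timer, kernel function, data area) might be sketched as follows; all names and the update logic are assumptions, not the patent's implementation:

```python
from dataclasses import dataclass, field


@dataclass
class Sensor:
    """Sketch of a CGF sensor; all names are illustrative assumptions."""
    name: str
    active_duration: float          # basic timer: how long the sensor stays activated
    activation_interval: float      # basic timer: gap between two activations
    activate_condition: callable    # kernel function: when to activate
    deactivate_condition: callable  # kernel function: when to deactivate
    activated: bool = False         # data area: state information
    available: bool = True
    parent: "Sensor | None" = None  # data area: dependent parent sensor
    custom_data: dict = field(default_factory=dict)

    def update(self, env: dict) -> None:
        """Apply the kernel-function conditions to the current environment."""
        if not self.available:
            return
        if not self.activated and self.activate_condition(env):
            self.activated = True
        elif self.activated and self.deactivate_condition(env):
            self.activated = False
```

A vision sensor, for instance, could activate when ambient light passes a threshold and deactivate when it falls below it.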
Furthermore, the sensing unit comprises a communication capability module that defines the communication capability of the CGF agent, reads the communication range for shared data, and weights shared threat levels according to the reliability and validity of the data source.
Furthermore, the sensing unit comprises a threat calculation module containing a deep convolutional neural network model; the module extracts features from the situation information, calculates the threat level, and outputs it.
Further, the decision unit comprises a path planning module that plans the minimum-loss path from the CGF agent's starting position to the target position as the optimal path.
Further, as the CGF agent moves toward the target point, the path planning module repeatedly selects and moves to the node n that minimizes f(n); the loss f(n) is calculated as:
f(n) = g(n) + h(n)
where g(n) is the total cost from the start position to node n, and h(n) is the heuristically estimated cost from node n to the target position.
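A minimal sketch of the loop above, repeatedly expanding the open node that minimizes f(n) = g(n) + h(n); the graph representation and cost values are illustrative:

```python
import heapq


def a_star(graph, start, goal, h):
    """A* search: expand the open node minimizing f(n) = g(n) + h(n).

    graph: dict mapping node -> list of (neighbor, edge_cost) pairs.
    h:     heuristic estimating the cost from a node to the goal.
    """
    open_heap = [(h(start), 0, start, [start])]  # (f, g, node, path)
    best_g = {start: 0}
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path, g
        for nbr, cost in graph.get(node, []):
            g2 = g + cost
            if g2 < best_g.get(nbr, float("inf")):
                best_g[nbr] = g2
                heapq.heappush(open_heap, (g2 + h(nbr), g2, nbr, path + [nbr]))
    return None, float("inf")
```

With h always zero the search degenerates to Dijkstra's algorithm; a better-informed h (for instance, straight-line distance on the battlefield map) expands fewer nodes.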
Furthermore, the behavior unit comprises a state module and an action module. The state module defines the attributes of single actions, the action sets, and the basic connection relations among the single actions within an action set; the action module defines the state-jump relations among different action sets, with the threat level serving as the jump condition.
Furthermore, the behavior unit encapsulates the state module and the action module in a Control AI module, which uses a behavior tree to decide the CGF agent's behavior, with the threat level as a decision-branch condition of the behavior tree.
The invention also provides a planning method using the above autonomous planning system for computer generated forces, comprising the following steps:
the sensing unit collects environmental data of the simulated battlefield environment in real time and shares environmental data and threat levels with other autonomous decision-making systems; it forms situation information and generates a threat level based on that information;
if the system is an upper-layer autonomous planning system, the decision unit receives a task and generates a sequence of lower-level tasks based on a decision model described by a behavior tree; if it is a lower-layer autonomous planning system, the decision unit generates a path for executing the task according to the task objective and the situation information; the decision unit then generates a series of action sets based on the behavior tree, which are executed by the behavior unit;
and after the task is finished, the system reports completion or waits, according to the rules of the issued task.
The technical scheme of the invention has the following beneficial effects:
(1) the convolutional neural network is shift-invariant, which effectively reduces the computational burden of high-dimensional input features; earlier convolutional layers learn smaller local patterns, while later layers learn more abstract patterns built on them, and comparative verification shows that the deep convolutional neural network is effective for threat-level identification in the threat-perception problem;
(2) threat-weighted path planning solves the problem that the traditional A* algorithm ignores enemy fire threats and therefore produces unsafe paths; the scheme performs a secondary planning pass on top of the original path planning, introduces the enemy fire-threat value into the A* evaluation function, and weights the evaluation as an input factor of the loss function;
(3) the basic tactical action system maps a scenario to a motivation (or state) and then maps that motivation (or state) to a specific action sequence; meta-actions are combined and serialized by a multi-layer finite action state machine framework and can be classified naturally according to an entity's basic reaction patterns to environmental stimuli;
(4) the Control AI module makes the system easier to maintain and modify and the CGF agent more modular, and its optimized navigation-mesh technology enables more efficient navigation;
(5) the autonomous decision-making system simulates weaponry: the computer generated forces carry multiple sensors and an information-sharing function, shared information contributes to threat-level judgment, and its credibility is attenuated over time to simulate a human memory model, making the simulation more realistic.
Drawings
FIG. 1 is a schematic diagram of a deep convolutional neural network in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram of a basic tactical action system in accordance with one embodiment of the present invention;
FIG. 3 is a schematic diagram of the composition of a computer-generated armed force autonomous planning system.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings in conjunction with the following detailed description. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
In one embodiment, and with reference to FIG. 3, the autonomous planning system for computer generated forces comprises a sensing unit, a decision unit, and a behavior unit.
The sensing unit allows the CGF agent to obtain battlefield data according to its own capability model and to model that data perceptually, forming cognition of the battlefield situation. The sensing unit comprises sensors, a blackboard system, a communication capability module, and a feature training module.
The sensors are used for collecting environmental data; each agent can have multiple sensors according to the categories of sensed data, such as vision and hearing. Each sensor comprises a basic timer, a kernel function, and a data area. The basic timer defines how long the sensor remains activated and the interval between two activations. The kernel function defines the sensor's activation condition and its actions upon activation and deactivation. The data area stores the sensor's state information and associated data, specifically whether the sensor is activated, whether it is available, its parent sensor, and custom data.
Additional data or functions may be added when implementing different sensor types. An important optional attribute is "dependency", which lists the other sensors this sensor depends on; when the sensor is created, the sensors it depends on are created together with it.
The blackboard system stores and shares data. The sensors and the blackboard system are designed on the VMS platform (a virtual military simulation platform developed by Nanjing Ruixin Network Technology Co., Ltd.); sensor control and use of blackboard data are both performed while constructing the behavior-tree set. Since the agent must be able to sense and to share data, both capabilities are built in when the behavior tree describing the agent's behavior is constructed.
Sensor data acquisition is defined according to the reconnaissance capability of the CGF agent, including definitions of vision and hearing, night-vision devices, infrared, radar, and so on; a sharing mechanism is also established, and the information gathered by all units in a formation is modeled uniformly (provided the communication state is good). At runtime, the required sensors are run through the behavior tree, data are collected from the simulation engine according to whether each sensor is activated or deactivated, and the relevant data are stored on the blackboard system. The simulation engine simulates the physical world and agent behaviors, and generates simulation data as the simulation system runs. Other behavior-tree nodes obtain sensor state information and related data from the blackboard system, through the provided sensor access interface, for decision-making.
The communication capability module defines the communication capabilities of CGF agents to enable information sharing among them. For example, within normal communication range, CGF agents can share the threat information each has acquired, increasing or decreasing its weight according to the reliability of the source; each CGF agent builds an information-memory capability by attenuating the intensity of threat information over time.
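The attenuation of shared threat information (the information-memory model) could be sketched as exponential decay; the half-life constant and the reliability weighting here are assumptions, not the patent's calibration:

```python
import math


def remembered_threat(initial_level, source_reliability, age_seconds,
                      half_life=30.0):
    """Weight a shared threat level by source reliability, then decay it
    over time to mimic fading memory. half_life is an assumed constant."""
    weighted = initial_level * source_reliability
    return weighted * math.exp(-math.log(2) * age_seconds / half_life)
```

A threat report from a fully reliable source keeps its full value when fresh and drops to half its weighted value after one half-life, so stale shared information gradually stops driving decisions.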
The feature training module lets the CGF agent perform feature-recognition training on the acquired shared information using the deep convolutional neural network, outputting an effective threat level as a decision basis (activation-function input).
The structure of the deep convolutional neural network is shown in FIG. 1. In the threat-perception problem, the input is the situation information obtained by the CGF agent according to its own perception capability. Feature information is extracted, including the number, type, position distribution, orientation, speed, and communication situation of enemy targets, and the same attributes of friendly units; this forms a time-series two-dimensional input I to the convolutional neural network, and convolution is performed with a two-dimensional kernel K:
S(i,j)=(I*K)(i,j)=∑m∑nI(m,n)K(i-m,j-n)。
Working process of the convolutional neural network (TensorFlow-based): a window of size x by y slides over the two-dimensional perception-information tensor, stopping at each possible position and extracting the surrounding two-dimensional feature patch; each patch is multiplied (tensor product) with the same learned weight matrix (the convolution kernel) and converted into a one-dimensional vector, and all vectors are then spatially recombined into a 3D output tensor shaped by (friendly units, target units, threat level). The convolutional layer convolves the input with a window (convolution kernel) of a given size; each convolutional layer comprises a convolution stage, a detector stage, and a pooling stage. The output layer outputs the threat level.
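The convolution formula S(i,j) = (I*K)(i,j) = ∑m∑n I(m,n)K(i-m,j-n) above can be checked with a small pure-Python implementation (full-size output, no framework required):

```python
def conv2d_full(I, K):
    """Discrete 2-D convolution S(i,j) = sum_m sum_n I(m,n) * K(i-m, j-n)."""
    hi, wi = len(I), len(I[0])
    hk, wk = len(K), len(K[0])
    # full convolution output has size (hi+hk-1) x (wi+wk-1)
    S = [[0] * (wi + wk - 1) for _ in range(hi + hk - 1)]
    for i in range(hi + hk - 1):
        for j in range(wi + wk - 1):
            for m in range(hi):
                for n in range(wi):
                    if 0 <= i - m < hk and 0 <= j - n < wk:
                        S[i][j] += I[m][n] * K[i - m][j - n]
    return S
```

Note that deep-learning frameworks such as TensorFlow actually compute cross-correlation (the kernel is not flipped) and usually a "valid" or "same" output size; the formula in the text is the mathematically standard convolution.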
The decision unit establishes a hierarchical finite state machine (HFSM) for groups, subgroups, and single entities of CGF agents, constructing tactical rules for CGF agents with various task capabilities. It receives tasks issued by the superior autonomous planning system, performs task planning, and generates a path for executing the task according to the task objective and the threat level.
In the hierarchical CGF agent organization, an upper-layer CGF agent controls lower-layer CGF agents, planning combat objectives at a larger scale and over a longer horizon than the lower layer; it assigns tasks to the lower layer and monitors how its subordinate units complete them. A lower-layer CGF agent is subordinate to an upper-layer agent: it receives the latter's commands and tasks, plans and executes them, and reacts autonomously to changes in the battlefield environment while doing so.
To optimize the design of the state-jump conditions in the hierarchical finite state machine framework, a local reinforcement learning method is used: through continuous interaction between the CGF agent and the environment, a state-jump-condition matrix is learned in the local space, yielding more reasonable and effective reactive behaviors of the CGF agent in local situations. This is expressed as a Markov decision process, described by the tuple (S, A, P, R, γ), where: S is a finite state set corresponding to the first (bottom) layer of the state space in the hierarchical finite state machine; A is a finite action set corresponding to the CGF agent's action set; P is the state-transition probability; R is the reward function; and γ is the discount factor used to compute the cumulative return. The jump condition uses the generated threat level.
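A tabular Q-learning loop over the tuple (S, A, P, R, γ) is one common way to learn such jump conditions; the states, actions, and reward below are toy assumptions, not the patent's actual local-space learning method:

```python
import random


def q_learning(states, actions, step, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Learn Q(s, a) by interaction; step(s, a) -> (next_state, reward, done)."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = states[0]
        done = False
        while not done:
            if rng.random() < epsilon:          # epsilon-greedy exploration
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```

After training, the greedy action in each threat state can be read off the table and frozen into the state-jump-condition matrix.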
The decision unit comprises a path planning module for finding the optimal path from the CGF agent's start position to the target position. Path planning for a CGF agent can be described as finding a minimum-loss path in a weighted graph: the problem space is represented as a set of position states of the entities and the edges connecting them, each edge carrying a loss value. The loss from state X to Y is given by the (positive) edge loss function cost(X, Y); if no edge connects X and Y, cost(X, Y) is undefined (represented by the largest machine-representable integer); if X and Y are adjacent, cost(X, Y) and cost(Y, X) are defined as the loss of the edge between them. A path is a sequence of vertices from the start position to the target position; personnel and equipment move along it from one position state to the next (incurring the corresponding loss) until the target is reached.
The simplified A* formulation is f(n) = g(n) + h(n), where g(n) is the cost from the initial node to any node n (the sum of the relevant cost(X, Y) values) and h(n) is the heuristically estimated cost from node n to the target point; on each pass of the main loop while moving from the initial point to the target point, the node n with the smallest f(n) is selected. A heuristic function h' is then introduced that uses the node threat-level coefficient when evaluating the cost of reaching neighboring navigation points from an arbitrary position.
The heuristic function may take the form h'(n, w1) + cost(w1, w2) + h'(w2, goal): the evaluated cost of each node is increased by its threat-level coefficient, so that the finally generated node sequence changes to avoid the threatened region. h'(n, w1) is an evaluation function estimating the actual loss from n to w1; the closer the evaluation is to the actual loss, the faster the algorithm runs.
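The threat weighting of the evaluation might be sketched as follows; the multiplicative inflation factor is an illustrative assumption, not the patent's calibration:

```python
def threat_weighted_estimate(h_n_to_w1, cost_w1_w2, h_w2_to_goal,
                             threat_coefficient):
    """Evaluate the route n -> w1 -> w2 -> goal, inflating the estimate by
    the node threat-level coefficient so threatened waypoints look more
    expensive to the A* search."""
    base = h_n_to_w1 + cost_w1_w2 + h_w2_to_goal
    return base * (1.0 + threat_coefficient)
```

A waypoint with threat coefficient 0.5 thus costs 1.5 times its geometric estimate, steering the generated node sequence toward safer detours.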
The behavior unit simulates the realistic reactions of all battlefield objects to different combat environments and generates the action sequences the CGF must execute to complete a specific task. The CGF agent has intelligent behavior capability: on top of its physical capability model, it can perceive the virtual battlefield environment and react reasonably to changes in its own state and the states of other entities; it can autonomously or semi-autonomously simulate the tactical behaviors of real personnel, substituting for real fighters or weaponry and fulfilling their roles and functions in the simulation system. To act autonomously or semi-autonomously and interact with the environment and other agents, the CGF agent needs capabilities such as target-information perception, path planning, and autonomous fire engagement.
The behavior unit comprises a state module and an action module: the state module defines single-action attributes and the basic connections between actions, and the action module defines the state-jump relations between different action sets.
The basic tactical action system maps a scenario to a motivation (or state), and then maps that motivation (or state) to a specific action sequence; meta-actions are combined and serialized by the multi-layer finite action state machine framework. The advantage is that, at design time, the focus is on providing a generic behavior template for the CGF agent rather than exhaustively enumerating every possible scenario and its resulting behaviors, which would quickly explode the state space and produce "unfriendly" behavior logic. CGF agents can thus be classified more naturally by their basic response patterns to environmental stimuli. As shown in FIG. 2, behavior units can be combined into a universal tactical action model system adapted to specific tactical actions in different environments. For example, from common models such as lying prone, taking cover, observing, shouldering a weapon, and shooting, combination can form lying-and-observing without threat; lying, taking cover, and observing under threat; and special behaviors such as lying, shouldering, and shooting in open terrain.
Fig. 2 is a schematic diagram of a basic tactical action system according to an embodiment of the present application. Solid arrows in fig. 2 indicate underlying connection relationships between individual action attributes and actions, and dashed arrows in fig. 2 indicate state-hopping relationships between different action sets.
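The state-jump behavior of FIG. 2, with the threat level as the jump condition, can be sketched as a small state machine; the state names, thresholds, and action sets below are illustrative assumptions:

```python
# basic connections between single actions within an action set
# (the solid arrows in FIG. 2); names are illustrative
ACTION_SETS = {
    "observe_no_threat":    ["lie_down", "observe"],
    "observe_under_threat": ["lie_down", "take_cover", "observe"],
    "engage_open_area":     ["lie_down", "shoulder_weapon", "shoot"],
}


class ActionStateMachine:
    """Minimal finite state machine over action sets; the threat level is
    the state-jump condition (the dashed arrows in FIG. 2)."""

    def __init__(self):
        self.state = "observe_no_threat"

    def step(self, threat_level):
        if threat_level >= 0.7:
            self.state = "engage_open_area"
        elif threat_level >= 0.3:
            self.state = "observe_under_threat"
        else:
            self.state = "observe_no_threat"
        return ACTION_SETS[self.state]
```

Each tick, the machine jumps according to the current threat level and emits the action set of the new state for the behavior unit to execute.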
A Control AI module encapsulates the state module and the action module. The Control AI module is a behavior modeling framework developed on the VMS platform that uses a behavior tree as the CGF agent behavior modeling method. It consists of three parts: a visual editor, a basic tactical action library, and task-type-based tactical action schemes (behavior-tree sets). Its way of implementing a CGF agent differs from the original CGF agent as follows: the original agent used an HFSM, which is hard to maintain and modify as behaviors grow complex, and since many behaviors were hard-coded in the engine, third parties could not modify them externally through the FSM. Instead, a Behavior Tree decides the CGF agent's behavior, with the threat level as a decision-branch condition, and the Control AI module makes the CGF agent more modular. The Control AI module uses an optimized navigation mesh (Navmesh) technology: the CGF agent navigates the map efficiently and finds smooth paths through buildings and corridors, whereas the original CGF agent used a grid with precision below 1 m, could not navigate freely inside buildings, and could only move on path layers predefined in the model. CGF agents using the new navigation technology can reach every place in the environment that a human in the loop could reach, and when the terrain changes (the map is modified with editing tools, or buildings are created or deleted) the navigation data can be updated dynamically to match the latest terrain.
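A behavior-tree decision with the threat level as a branch condition might be sketched as follows; the node types and action names are illustrative, not the Control AI module's actual API:

```python
class Selector:
    """Ticks children in order; succeeds at the first child that succeeds."""
    def __init__(self, *children):
        self.children = children

    def tick(self, blackboard):
        return any(child.tick(blackboard) for child in self.children)


class Sequence:
    """Ticks children in order; fails at the first child that fails."""
    def __init__(self, *children):
        self.children = children

    def tick(self, blackboard):
        return all(child.tick(blackboard) for child in self.children)


class Condition:
    def __init__(self, predicate):
        self.predicate = predicate

    def tick(self, blackboard):
        return self.predicate(blackboard)


class Action:
    def __init__(self, name):
        self.name = name

    def tick(self, blackboard):
        blackboard.setdefault("log", []).append(self.name)  # pretend to act
        return True


# threat level as the decision-branch condition
tree = Selector(
    Sequence(Condition(lambda bb: bb["threat_level"] >= 0.5),
             Action("take_cover"), Action("return_fire")),
    Sequence(Action("advance"), Action("observe")),
)
```

Because the selector short-circuits, a high threat level drives the defensive branch and a low one falls through to the movement branch, which is how a single tree can cover both reactive and routine behavior.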
The invention also provides a method for planning with the autonomous planning system for computer generated forces, comprising the following steps:
(1) the sensing unit collects environmental data of the simulated battlefield environment in real time and shares environmental data and threat levels with other autonomous decision-making systems; it forms situation information and generates a threat level based on that information;
(2) if the system is an upper-layer autonomous planning system, the decision unit receives a task and generates a sequence of lower-level tasks based on a decision model described by a behavior tree; if it is a lower-layer autonomous planning system, the decision unit generates a path for executing the task according to the task objective and the situation information; the decision unit then generates a series of action sets based on the behavior tree, which are executed by the behavior unit;
(3) after the task is finished, the system reports completion or waits, according to the rules of the issued task.
In summary, the invention relates to an autonomous planning system and planning method for computer-generated forces. The sensing unit collects environmental data of the simulated battlefield environment and shares environmental data and threat levels with other autonomous decision systems; it models situation information from the collected and shared environmental data and threat levels, and generates a threat level from the situation information. The decision unit receives a task issued by a superior autonomous planning system, plans the task, and generates a path for executing the task according to the task target and the threat level. The behavior unit simulates the real reactions of the corresponding battlefield objects to different environments and, while following the path to the target location, generates an action sequence for executing the task according to the threat level and the task target.
It is to be understood that the above-described embodiments merely illustrate the principles of the invention and are not to be construed as limiting it. Any modification, equivalent replacement, or improvement made without departing from the spirit and scope of the invention falls within its protection scope. Further, the appended claims are intended to cover all such variations and modifications as fall within their scope and boundaries, or the equivalents thereof.
Claims (10)
1. An autonomous planning system for computer-generated forces, characterized by comprising a sensing unit, a decision unit and a behavior unit;
the sensing unit is used for collecting environmental data of the simulated battlefield environment and sharing environmental data and threat levels with other autonomous decision systems; and for modeling situation information from the collected environmental data, the shared environmental data and the threat levels, and generating a threat level based on the situation information;
the decision unit is used for receiving a task issued by a superior autonomous planning system, planning the task, and generating a path for executing the task according to the task target and the threat level;
and the behavior unit is used for simulating the real reactions of corresponding battlefield objects to different environments and, in the process of reaching the target location along the path for executing the task, generating an action sequence for executing the task according to the threat level and the task target.
2. The autonomous planning system for computer-generated forces according to claim 1, characterized in that the sensing unit comprises several perceptrons and a blackboard system; the perceptrons are used for collecting environmental data; the blackboard system is used for storing and sharing environmental data and threat levels among friendly units.
3. The autonomous planning system for computer-generated forces according to claim 2, characterized in that each perceptron comprises a basic timer, a kernel function and a data area;
the basic timer is used for defining the duration for which the perceptron remains in an activated state and the interval between two activations of the perceptron;
the kernel function is used for defining the conditions under which the perceptron is activated and deactivated, and the behavior of the perceptron upon activation and deactivation;
the data area is used for storing state information of the perceptron, including whether it is activated and whether it is available, as well as associated data, the dependent parent perceptron, and custom data required by the program logic.
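The perceptron structure of claim 3 can be sketched as follows: a basic timer (active duration and re-activation interval), a kernel function (activation/deactivation conditions and behavior), and a data area. This is a hedged sketch; all field and parameter names are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Perceptron:
    active_duration: float          # basic timer: seconds the perceptron stays active
    reactivation_interval: float    # basic timer: cooldown between two activations
    data: dict = field(default_factory=dict)  # data area: associated/custom data
    active: bool = False            # data area: state information
    available: bool = True
    last_deactivated: float = float("-inf")

    def kernel(self, now, condition):
        """Kernel function: decide activation/deactivation and update state."""
        cooled_down = now - self.last_deactivated >= self.reactivation_interval
        if not self.active and condition and self.available and cooled_down:
            self.active = True                     # activation behavior
            self.data["activated_at"] = now
        elif self.active and now - self.data.get("activated_at", now) >= self.active_duration:
            self.active = False                    # deactivation behavior
            self.last_deactivated = now

sensor = Perceptron(active_duration=2.0, reactivation_interval=5.0)
sensor.kernel(now=0.0, condition=True)   # activates at t=0
sensor.kernel(now=2.5, condition=True)   # active window (2 s) has expired
sensor.kernel(now=4.0, condition=True)   # still inside the 5 s cooldown
print(sensor.active)                      # → False
```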
4. The autonomous planning system for computer-generated forces according to claim 3, characterized in that the sensing unit further comprises a communication capability module, which defines the communication capability of the CGF agent, including the communication range within which shared data can be read, and sets the weight of shared threat levels according to the reliability and validity of the data source.
5. The autonomous planning system for computer-generated forces according to claim 4, characterized in that the sensing unit comprises a threat calculation module in which a deep convolutional neural network model is built; the module extracts features from the situation information to calculate and output the threat level.
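As an illustrative stand-in for the deep convolutional neural network of claim 5 (which the patent does not specify in detail), the sketch below uses a single fixed 3x3 convolution, a global max pool, and a sigmoid to map a situation grid to a threat level in (0, 1). The grid contents and the filter are assumptions for the example.

```python
import math

def threat_level(grid):
    """One 3x3 averaging 'convolution' over the situation grid,
    global max pooling, then a sigmoid squash to a threat level."""
    h, w = len(grid), len(grid[0])
    best = float("-inf")
    for i in range(h - 2):
        for j in range(w - 2):
            # stand-in for a learned filter: local average of hostile density
            s = sum(grid[i + di][j + dj] for di in range(3) for dj in range(3)) / 9.0
            best = max(best, s)
    return 1 / (1 + math.exp(-best))   # sigmoid -> threat level in (0, 1)

grid = [[0.0] * 6 for _ in range(6)]
for i in range(2, 5):
    for j in range(2, 5):
        grid[i][j] = 1.0               # cluster of hostile contacts
print(round(threat_level(grid), 2))    # → 0.73
```

A real implementation would replace the fixed filter with learned convolutional layers; the pipeline shape (feature extraction, pooling, scalar output) is the point being illustrated.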
6. The autonomous planning system for computer-generated forces according to any one of claims 1 to 5, characterized in that the decision unit comprises a path planning module, which takes the path with the least loss for the CGF agent from the starting position to the target position as the optimal path.
7. The autonomous planning system for computer-generated forces according to claim 6, characterized in that, as the CGF agent moves toward the target point, the path planning module repeatedly evaluates candidate nodes and moves to the node n that minimizes f(n); the loss f(n) is calculated as follows:
f(n)=g(n)+h(n)
where g(n) represents the actual cost from the starting position to node n, and h(n) represents the heuristically estimated cost from node n to the target position.
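The loss f(n) = g(n) + h(n) above is the standard A* evaluation function. A minimal sketch on a toy grid follows; the graph, the unit step cost, and the Manhattan-distance heuristic are illustrative assumptions, not details from the patent.

```python
import heapq

def a_star(start, goal, neighbors, cost, heuristic):
    """Each iteration expands the node n that minimizes f(n) = g(n) + h(n)."""
    open_set = [(heuristic(start, goal), start)]   # priority queue keyed by f(n)
    g = {start: 0}                                 # g(n): actual cost from start
    parent = {start: None}
    while open_set:
        _, n = heapq.heappop(open_set)
        if n == goal:                              # reconstruct the optimal path
            path = []
            while n is not None:
                path.append(n)
                n = parent[n]
            return path[::-1]
        for m in neighbors(n):
            tentative = g[n] + cost(n, m)
            if tentative < g.get(m, float("inf")):
                g[m] = tentative
                parent[m] = n
                heapq.heappush(open_set, (tentative + heuristic(m, goal), m))
    return None

# 3x3 four-connected grid with one blocked cell; h(n) is Manhattan distance.
blocked = {(1, 1)}
def neighbors(p):
    x, y = p
    cand = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in cand
            if 0 <= a < 3 and 0 <= b < 3 and (a, b) not in blocked]

path = a_star((0, 0), (2, 2), neighbors,
              cost=lambda a, b: 1,
              heuristic=lambda p, q: abs(p[0] - q[0]) + abs(p[1] - q[1]))
print(len(path))   # → 5 (four unit moves around the blocked cell)
```

Because the Manhattan heuristic never overestimates the remaining cost on this grid, the returned path is guaranteed to be the least-loss path of claim 6.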
8. The autonomous planning system for computer-generated forces according to claim 1, characterized in that the behavior unit comprises a state module and an action module; the state module is used for defining the attributes of single actions, action sets, and the basic connection relations among the single actions within an action set; the action module is used for defining the state-jump relations among different action sets, with the threat level as the jump condition.
9. The autonomous planning system for computer-generated forces according to claim 7, characterized in that the behavior unit employs a Control AI module encapsulating the state module and the action module; the Control AI module uses a behavior tree to decide the behavior of the CGF agent, with the threat level as a decision branch condition of the behavior tree.
10. A planning method using the autonomous planning system for computer-generated forces according to any one of claims 1 to 9, characterized by comprising the following steps:
the sensing unit collects environmental data of the simulated battlefield environment in real time and shares environmental data and threat levels with other autonomous decision systems; it forms situation information and generates a threat level based on the situation information;
if the system is an upper-layer autonomous planning system, the decision unit receives a task and generates a lower-level task sequence based on a decision model described by a behavior tree; if the system is a lower-layer autonomous planning system, the decision unit generates a path for executing the task according to the task target and the situation information, then generates a series of action sets based on the behavior tree, which are executed by the behavior unit;
and after the task is finished, the system reports completion or waits, according to the rules of the issued task.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011190896.6A CN112307622A (en) | 2020-10-30 | 2020-10-30 | Autonomous planning system and planning method for generating military forces by computer |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112307622A true CN112307622A (en) | 2021-02-02 |
Family
ID=74332758
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011190896.6A Pending CN112307622A (en) | 2020-10-30 | 2020-10-30 | Autonomous planning system and planning method for generating military forces by computer |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112307622A (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030167454A1 (en) * | 2001-03-30 | 2003-09-04 | Vassil Iordanov | Method of and system for providing metacognitive processing for simulating cognitive tasks |
CN101090729A (en) * | 2003-05-30 | 2007-12-19 | 艾更斯司股份有限公司 | Antibodies and related molecules that bind to psca proteins |
CN101131755A (en) * | 2007-10-19 | 2008-02-27 | 北京航空航天大学 | Route planning method for remaining CGF team formation based on real landform |
CN103336694A (en) * | 2013-07-08 | 2013-10-02 | 北京航空航天大学 | Entity behavioral modeling assembling method and system |
TW201341401A (en) * | 2012-01-09 | 2013-10-16 | Covx Technologies Ireland Ltd | Mutant antibodies and conjugation thereof |
CN104866549A (en) * | 2015-05-12 | 2015-08-26 | 中国人民解放军装甲兵工程学院 | Terrain environment database system suitable for multi-agent simulation |
CN105005820A (en) * | 2015-04-03 | 2015-10-28 | 北京理工大学 | Target assignment optimizing method based on particle swarm algorithm of population explosion |
CN105257425A (en) * | 2014-06-11 | 2016-01-20 | 凯文·李·弗里斯特 | Quintuple-effect generation multi-cycle hybrid renewable energy system with integrated energy provisioning, storage facilities and amalgamated control system |
CN106682351A (en) * | 2017-01-10 | 2017-05-17 | 北京捷安申谋军工科技有限公司 | Fight simulation system generating military strength based on computer and simulation method |
CN107944694A (en) * | 2017-11-21 | 2018-04-20 | 中国人民解放军陆军装甲兵学院 | A kind of equipment Safeguard operational process analysis method towards operational performance |
CN110260871A (en) * | 2019-04-17 | 2019-09-20 | 太原理工大学 | A kind of manoeuvre of forces environmental modeling method that facing area threatens |
CN110288606A (en) * | 2019-06-28 | 2019-09-27 | 中北大学 | A kind of three-dimensional grid model dividing method of the extreme learning machine based on ant lion optimization |
CN111460730A (en) * | 2020-03-26 | 2020-07-28 | 中国电子科技集团公司第二十八研究所 | Future combat intelligent technology application design method |
Non-Patent Citations (1)
Title |
---|
Yao Nan: "Integrated Behavior Modeling of Computer-Generated Surface Ship Forces", 2010 Conference on System Simulation Technology and Its Applications, pages 63-67 *
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113065694A (en) * | 2021-03-18 | 2021-07-02 | 徐州九鼎机电总厂 | Tactical action rule intelligent routing algorithm based on immersive human-computer interaction simulation system |
CN113534662A (en) * | 2021-06-04 | 2021-10-22 | 中国人民解放军军事科学院国防科技创新研究院 | Behavior tree-based unmanned system cluster control method |
CN113505538A (en) * | 2021-07-28 | 2021-10-15 | 哈尔滨工业大学 | Unmanned aerial vehicle autonomous combat system based on computer generated force |
CN113505538B (en) * | 2021-07-28 | 2022-04-12 | 哈尔滨工业大学 | Unmanned aerial vehicle autonomous combat system based on computer generated force |
CN114792072B (en) * | 2022-05-18 | 2024-01-16 | 中国人民解放军国防科技大学 | Function-based equipment decision behavior simulation modeling method and system |
CN114792072A (en) * | 2022-05-18 | 2022-07-26 | 中国人民解放军国防科技大学 | Function-based equipment decision behavior simulation modeling method and system |
CN115854784A (en) * | 2023-02-17 | 2023-03-28 | 中国人民解放军96901部队 | Double-gear rolling task planning method |
CN115854784B (en) * | 2023-02-17 | 2023-09-19 | 中国人民解放军96901部队 | Double-gear rolling task planning method |
CN116452011A (en) * | 2023-03-14 | 2023-07-18 | 中国人民解放军32370部队 | Data processing method and device for agent decision |
CN116452011B (en) * | 2023-03-14 | 2023-10-24 | 中国人民解放军32370部队 | Data processing method and device for agent decision |
CN116976144A (en) * | 2023-09-20 | 2023-10-31 | 北京数易科技有限公司 | Weapon force deployment exercise method, system and medium based on simulation platform |
CN116976144B (en) * | 2023-09-20 | 2023-12-26 | 北京数易科技有限公司 | Weapon force deployment exercise method, system and medium based on simulation platform |
CN117131706A (en) * | 2023-10-24 | 2023-11-28 | 中国人民解放军国防科技大学 | Decision control device and behavior control method for generating force of weapon by computer |
CN117131706B (en) * | 2023-10-24 | 2024-01-30 | 中国人民解放军国防科技大学 | Decision control device and behavior control method for generating force of weapon by computer |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112307622A (en) | Autonomous planning system and planning method for generating military forces by computer | |
Brown et al. | Spatial process and data models: Toward integration of agent-based models and GIS | |
Yu et al. | A knee-guided differential evolution algorithm for unmanned aerial vehicle path planning in disaster management | |
Du et al. | Improved chimp optimization algorithm for three-dimensional path planning problem | |
Wei et al. | Recurrent MADDPG for object detection and assignment in combat tasks | |
CN112348285B (en) | Crowd evacuation simulation method in dynamic environment based on deep reinforcement learning | |
Wu et al. | Review of multiple unmanned surface vessels collaborative search and hunting based on swarm intelligence | |
Hu et al. | HG-SMA: Hierarchical guided slime mould algorithm for smooth path planning | |
Narayanan et al. | First-year report of ARL directors strategic initiative (FY20-23): artificial intelligence (AI) for command and control (C2) of multi-domain operations (MDO) | |
Turan et al. | Using artificial intelligence for modeling of the realistic animal behaviors in a virtual island | |
Tolt et al. | Multi-aspect path planning for enhanced ground combat simulation | |
Lui et al. | An architecture to support autonomous command agents for onesaf testbed simulations | |
Low et al. | A federated agent-based crowd simulation architecture | |
CN111723941A (en) | Rule generation method and device, electronic equipment and storage medium | |
Yang | A networked multi-agent combat model: Emergence explained | |
Lyu et al. | Toward modeling emotional crowds | |
Woolley et al. | Genetic evolution of hierarchical behavior structures | |
Pax et al. | Multi-agent system simulation of indoor scenarios | |
Von Mammen et al. | An organic computing approach to self-organizing robot ensembles | |
Tan et al. | Advances in Swarm Intelligence: 14th International Conference, ICSI 2023, Shenzhen, China, July 14–18, 2023, Proceedings, Part I | |
Cil et al. | MABSIM: A multi agent based simulation model of military unit combat | |
Bocca et al. | Intelligent agents for moving and operating computer generated forces | |
Khazab et al. | Web-based multi-agent system architecture in a dynamic environment | |
Zhang et al. | UAV Flight Path Planning Based on Multi-strategy Improved White Sharks Optimization | |
Gao et al. | Virtual space ontologies for scripting agents |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||