CN112560332A - Aviation soldier system intelligent behavior modeling method based on global situation information


Publication number
CN112560332A
Authority
CN
China
Prior art keywords
situation, color, combat, state, behavior
Prior art date
Legal status
Granted
Application number
CN202011375776.3A
Other languages
Chinese (zh)
Other versions
CN112560332B (en)
Inventor
李妮
董力维
王泽
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University
Priority to CN202011375776.3A
Publication of CN112560332A
Application granted
Publication of CN112560332B
Status: Active
Anticipated expiration


Classifications

    • G06F30/27 — Computer-aided design [CAD]: design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06F18/23213 — Pattern recognition: non-hierarchical clustering using statistics or function optimisation, e.g. modelling of probability density functions, with a fixed number of clusters, e.g. K-means clustering
    • G06N3/045 — Neural networks: combinations of networks
    • G06N3/08 — Neural networks: learning methods
    • G06V10/56 — Image or video recognition: extraction of image or video features relating to colour
    • G06F2111/08 — Details relating to CAD techniques: probabilistic or stochastic CAD


Abstract

The invention discloses an intelligent behavior modeling method for an aviation soldier system based on global situation information. By comprehensively studying and judging the complex global aerial battlefield situation, the global situation is expressed mathematically as a state vector. A situation feature extraction and perception algorithm based on a two-dimensional GIS situation map obtains the element information that cannot be read directly from the state vector, yielding the global situation state space perceived by the aviation intelligent behavior model. A reward value generation algorithm based on network connected-domain maximization drives the aviation intelligent behavior model to evolve iteratively toward high return under the excitation of an incomplete global situation. The technical scheme of the invention provides an effective theoretical basis and technical support for gaining greater combat information advantage under incomplete battlefield situation perception, generating efficient air combat command and control decisions, analyzing, deducing and replaying air combat schemes, and improving the combat level of the aviation soldier system.

Description

Aviation soldier system intelligent behavior modeling method based on global situation information
Technical Field
The invention belongs to the technical field of operational situation analysis and aviation soldier modeling and simulation, and particularly relates to an aviation soldier system intelligent behavior modeling method based on global situation information.
Background
In recent years, China has been undergoing a new military transformation, and the security situation it faces is increasingly complex. Aviation forces are characterized by rapid application of combat power, a multidimensional battlefield space, rapidly changing combat situations and diverse combat modes, and are an important force for safeguarding national security.
The aviation soldier force system is a typical complex system: the uncertainty of the combat situation, the complexity of weapons and equipment, and the sheer volume of combat tasks pose enormous challenges for combat simulation research, and confrontation simulation research on aviation force systems is still far from mature. As the pace and complexity of modern system-level warfare keep increasing, human decision-making alone can hardly keep up with the rapidly changing combat situation of an aviation force system. Future aviation system combat requires fast, automated and autonomous decisions, and intelligent technology is urgently needed to extend the human brain and improve the capability of command information systems, so as to adapt to a high-speed, complex and changeable battlefield environment.
Current force-system confrontation is mainly network-centric warfare, which ultimately turns information advantage into command-and-control decision advantage and, through it, into combat-power advantage. Network-centric warfare acquires, fuses, transmits and processes battlefield environment information of all kinds through detection and information networks composed of combat entities distributed across the battlefield, forming a battlefield perception situation that is rapidly shared and presented to the command-and-control center as the basis for generating command-and-control decisions.
Force confrontation technology has therefore become a key point of system-level confrontation simulation research, responding to the command decision environment of joint operations at all levels in current informatized combat. Aviation formation decision models based on rule sets suffer from high knowledge-acquisition cost, poor adaptability to changing decision environments, poor reconfigurability, heavy modeling workload and complex maintenance.
With breakthroughs in artificial intelligence in perception and cognition, machine learning techniques represented by deep learning and reinforcement learning have advanced behavior modeling and make it possible to break through the bottleneck of confrontation behavior decision-making for aviation force systems. The core idea of reinforcement-learning behavior modeling is to endow an agent with a reinforcement learning algorithm; the situation environment is the carrier of confrontation and learning and the object the reinforcement learning agent interacts with, and it becomes the core element of the whole closed loop of reinforcement-learning-based behavior modeling. It is therefore important to address two issues centered on the situation: first, selecting a suitable state vector space to reasonably express the complex aerial battlefield situation so that the aviation force behavior model can perceive the situation environment effectively; second, generating effective continuous rewards for the confrontation decision behaviors of the aviation force under the excitation of incomplete global situation environment information. At present, the theoretical research and technical means needed to solve these two problems for concrete system-confrontation simulation scenarios — effective expression and perception of an incomplete confrontation situation, and reward generation for the aviation intelligent agent under incomplete-situation excitation — still require deep exploration and careful design.
In the field of reinforcement-learning intelligent behavior modeling, work abroad started earlier, has undergone decades of theoretical development and application exploration, and has formed systematic research methods and a broad application base. The wave of artificial intelligence driven by reinforcement learning and deep learning has also developed in China, but reinforcement-learning behavior modeling research aimed at global situation information is still scarce and basically at an early stage. Studying an intelligent behavior modeling method for the aviation soldier system based on global situation information, and forming a reinforcement-learning behavior modeling framework that is general and practical for aerial force combat, is therefore an important subject for breaking through intelligent decision-making and improving combat capability in modern air combat from the artificial intelligence level.
At present, China lacks research and methods for multi-formation aviation force system combat confrontation based on reinforcement learning. Most research is carried out by scientific research institutions and lacks original algorithm theory and deep application expansion, and most of it targets modeling problems with small, well-defined state spaces, such as tactical-level two-aircraft close-range air combat.
Disclosure of Invention
Aiming at the problems in current aviation soldier force-system combat simulation — complex models, complex operation rules, complex formal expression of the red and blue situations, and difficulty in generating effective confrontation decisions caused by the strong uncertainty of force configurations and task flows — the invention studies intelligent behavior modeling of the force system under a global situation and proposes an aviation soldier system intelligent behavior modeling method based on global situation information. The specific technical scheme of the invention is as follows:
An aviation soldier system intelligent behavior modeling method based on global situation information comprises the following steps:
S1: according to the air combat characteristics of the aviation force and the importance of the factors that influence the air combat outcome, extract key elements to construct an environment state space vector that effectively represents the aviation battlefield situation; select the own-side fire network connected-domain ratio A_b, the own-side information network connected-domain ratio I_b, the enemy fire network connected-domain ratio A_r, the enemy information network connected-domain ratio I_r, the remaining weapon ammunition percentage ξ and the aviation formation battle-loss ratio ε to form the environment state space vector S = ⟨A_b, I_b, A_r, I_r, ξ, ε⟩ describing the aviation battlefield situation (an illustrative sketch of this vector follows the step overview below);
S2: use the situation feature extraction and perception algorithm based on the two-dimensional GIS situation map to acquire the environment situation information A_b, I_b, A_r and I_r that cannot be obtained directly for the environment state space vector. On the two-dimensional GIS situation map with graphic features — in which the own-side information detection range, the own-side fire strike range and the detected enemy force entity positions are each drawn in clearly distinguishable colors — perform image feature extraction to obtain monochromatic feature layers of the own-side and enemy information network connected domains and fire network connected domains, so that the reinforcement learning agent can perceive the situation environment information;
in the image feature extraction part, a color-based feature extraction method extracts the image color features contained in the two-dimensional GIS situation map and zeroes the color values of non-feature pixels, obtaining the information network connected domain and the fire network connected domain formed by the own-side aviation combat entities as two monochromatic feature layers reflecting the own-side (intelligent blue) combat situation; enemy combat entities are located by the same feature extraction method, the corresponding entity information is retrieved from the own weapon equipment rule base, and the information network connected domain and fire network connected domain formed by the enemy (intelligent red) combat entities are simulated, generating two monochromatic feature layers reflecting the enemy combat situation;
S3: design the combat behavior space of the aviation soldier system; divide the executable task sets of the aircraft formations according to the combat characteristics of the force formations in the aviation soldier system, and integrate the executable tasks of all aircraft formations to form the combat behavior space of the aviation soldier system;
S4: generate an effective real-time reward mechanism with a reward value generation algorithm based on network connected-domain maximization;
perform color histogram statistics on the feature layers obtained in step S2, calculate the proportions of colored pixels of the information network connected domain and the fire network connected domain in each monochromatic feature layer to obtain numerical parameters characterizing the own-side and enemy situation features, obtain quantitative combat-advantage parameters of both sides through weighted combination, and design a reward function based on the comparison of combat advantages; this gives clearly positive or negative real-time reward feedback for every behavior decision of the agent, and the reward mechanism drives the agent toward continuously optimized behavior decisions;
S5: construct the state transition model and design the action selection strategy;
the transition of the aviation air-combat situation conforms to a first-order Markov decision process, i.e. the state transition probability depends only on the current state; the behavior selection strategy is designed as a greedy-random algorithm that selects the behavior with the greatest utility in a given state while adding randomness to behavior selection, allowing the aviation formation to "explore" the state space;
S6: based on the temporal-difference algorithm, fuse the aviation state space vector, the state transition model and the action selection strategy formed in steps S1-S5 with the reward value generation algorithm based on network connected-domain maximization into an improved reinforcement learning framework for aviation combat confrontation, and carry out iterative learning and training of the force Agent on this framework.
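For illustration only, the six-element environment state space vector of step S1 might be held in a small structure such as the following sketch; the field names and the NumPy representation are assumptions, not part of the patent text:

    # Illustrative sketch (not from the patent): the six-element state vector of step S1.
    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class SituationState:
        A_b: float   # own-side fire network connected-domain ratio
        I_b: float   # own-side information network connected-domain ratio
        A_r: float   # enemy fire network connected-domain ratio
        I_r: float   # enemy information network connected-domain ratio
        xi: float    # remaining weapon ammunition percentage
        eps: float   # formation battle-loss ratio

        def as_vector(self) -> np.ndarray:
            """Return S = <A_b, I_b, A_r, I_r, xi, eps> as a 6-dimensional vector."""
            return np.array([self.A_b, self.I_b, self.A_r, self.I_r, self.xi, self.eps])

    # Example: a state in which the own side holds a slight network advantage.
    s = SituationState(A_b=0.42, I_b=0.55, A_r=0.31, I_r=0.47, xi=0.8, eps=0.1).as_vector()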
Further, the situation feature extraction and perception algorithm process based on the two-dimensional GIS situation map in step S2 includes:
S2-1: image feature extraction; a frame of m×n two-dimensional GIS situation map based on the RGB color space is abstracted into a matrix

C = (c_ij), 0 ≤ i < m, 0 ≤ j < n

each element of which is a three-dimensional vector representing the RGB color value of the pixel at the corresponding position, as shown in the following formula:

c_ij = [r, g, b]_ij

where c_ij ∈ [RGB] is the three-dimensional RGB color value of the pixel at position (i, j), r is the red component, g is the green component and b is the blue component of the color, each ranging from 0 to 255;
S2-2: let the color value range of the information perception domain of the own-side combat entities be c_I^b, the color value of the fire strike domain of the own-side combat entities be c_A^b, and the color value of the enemy combat entities be c_r; steps S2-3 to S2-8 are executed for every frame of the two-dimensional GIS situation map;
S2-3: copy the two-dimensional GIS situation map and, pixel by pixel on copy layer I, judge whether the current pixel belongs to the own-side information perception area; if so, keep its color value, otherwise set the color value to 0:

c_ij^I = c_ij if c_ij ∈ c_I^b, and c_ij^I = 0 otherwise;

S2-4: copy the two-dimensional GIS situation map and, pixel by pixel on copy layer II, judge whether the current pixel belongs to the own-side fire strike area; if so, keep its color value, otherwise set the color value to 0:

c_ij^II = c_ij if c_ij = c_A^b, and c_ij^II = 0 otherwise;

S2-5: copy the two-dimensional GIS situation map and, pixel by pixel on copy layer III, judge whether the current pixel belongs to an enemy combat entity; if so, keep its color value, otherwise set the color value to 0:

c_ij^III = c_ij if c_ij = c_r, and c_ij^III = 0 otherwise;

S2-6: let the enemy combat entities be e_1, e_2, ..., e_p, and retrieve their corresponding information perception ranges and fire strike ranges from the weapon equipment rule base; execute step S2-7 and step S2-8 on the layer obtained after the processing of step S2-5;
S2-7: copy the layer obtained after the processing of step S2-5 and, pixel by pixel on copy layer IV, judge whether the current pixel belongs to an enemy combat entity; if so, assign the color value c_I^r to all pixels in the circle centered on the current pixel whose radius is the information perception range of the corresponding combat entity; otherwise keep the color value of the current pixel:

c_ij^IV = c_I^r for every pixel inside that circle if c_ij = c_r, and c_ij^IV = c_ij otherwise

where c_r is the color value of the enemy combat entities and c_I^r is the color value corresponding to the red-side information perception range;
S2-8: copy the layer obtained after the processing of step S2-5 and, pixel by pixel on copy layer V, judge whether the current pixel belongs to an enemy combat entity; if so, assign the color value c_A^r to all pixels in the circle centered on the current pixel whose radius is the fire strike range of the corresponding combat entity; otherwise keep the color value of the current pixel:

c_ij^V = c_A^r for every pixel inside that circle if c_ij = c_r, and c_ij^V = c_ij otherwise

where c_r is the color value of the enemy combat entities and c_A^r is the color value corresponding to the red-side fire strike range;
therefore, the intelligent aviation soldier system obtains four feature layers respectively reflecting own party and enemy information network connected domains and fire network connected domains from the two-dimensional GIS situation map, namely, the layer I, the layer II, the layer IV and the layer V, and completes situation feature extraction and perception.
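By way of illustration only, steps S2-3 to S2-8 can be sketched as per-pixel color masking plus circle filling. The concrete color values, the NumPy-based helpers and the use of a single shared radius per layer (collapsing the per-entity range lookup of steps S2-6 to S2-8 into one radius argument) are assumptions, not the patent's reference implementation:

    # Illustrative sketch of steps S2-3 to S2-8 (color values and helper names are assumed).
    import numpy as np

    C_I_B = (255, 200, 100)   # assumed color of the own-side information perception domain
    C_A_B = (255, 120, 120)   # assumed color of the own-side fire strike domain
    C_R   = (0, 0, 255)       # assumed color of detected enemy combat entities
    C_I_R = (0, 80, 160)      # assumed color of the simulated enemy information perception domain
    C_A_R = (0, 160, 80)      # assumed color of the simulated enemy fire strike domain

    def keep_color(img: np.ndarray, color) -> np.ndarray:
        """Layers I/II/III: keep pixels of one color, zero everything else (S2-3..S2-5)."""
        mask = np.all(img == np.array(color), axis=-1)
        out = np.zeros_like(img)
        out[mask] = color
        return out

    def paint_circles(entity_layer: np.ndarray, radius_px: int, color) -> np.ndarray:
        """Layers IV/V: around every enemy-entity pixel, fill a circle whose radius is the
        perception/strike range looked up in the weapon equipment rule base (S2-7/S2-8)."""
        out = entity_layer.copy()
        ys, xs = np.where(np.all(entity_layer == np.array(C_R), axis=-1))
        h, w = entity_layer.shape[:2]
        yy, xx = np.mgrid[0:h, 0:w]
        for y, x in zip(ys, xs):
            out[(yy - y) ** 2 + (xx - x) ** 2 <= radius_px ** 2] = color
        return out

    def extract_feature_layers(gis_map: np.ndarray, info_radius_px: int, fire_radius_px: int):
        layer_I   = keep_color(gis_map, C_I_B)                      # own information network connected domain
        layer_II  = keep_color(gis_map, C_A_B)                      # own fire network connected domain
        layer_III = keep_color(gis_map, C_R)                        # detected enemy entities
        layer_IV  = paint_circles(layer_III, info_radius_px, C_I_R) # simulated enemy information network
        layer_V   = paint_circles(layer_III, fire_radius_px, C_A_R) # simulated enemy fire network
        return layer_I, layer_II, layer_IV, layer_V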
Further, the aviation force comprises fighter formations, bomber formations, early-warning aircraft formations, unmanned reconnaissance aircraft formations and electronic jammer formations, and the combat behavior space of the intelligent aviation soldier force system is as follows:
S3-1: the combat aircraft comprise fighters and bombers; according to the combat characteristics of fighters, the executable tasks of a fighter formation comprise: area patrol J_1, take-off area patrol J_2, route patrol J_3, take-off route patrol J_4, escort J_5, take-off escort J_6, air interception J_7 and return J_8;
S3-2: according to the combat characteristics of bombers, the executable tasks comprise: area patrol H_1, take-off area patrol H_2, route patrol H_3, take-off route patrol H_4, area assault H_5, take-off area assault H_6, target assault H_7, take-off target assault H_8 and return H_9;
S3-3: according to the combat characteristics of the early-warning aircraft, its executable tasks comprise: area patrol detection Y_1, route patrol detection Y_2, early-warning aircraft detection mode Y_3, early-warning aircraft radar on/off Y_4 and detection task cancellation Y_5;
S3-4: according to the combat characteristics of the electronic jammer, its executable tasks comprise: area jamming R_1, route jamming R_2, jamming pattern setting R_3, jamming switch-off R_4 and jamming termination R_5;
S3-5: according to the combat characteristics of the unmanned reconnaissance aircraft, its executable tasks comprise: area patrol reconnaissance W_1, route patrol reconnaissance W_2 and reconnaissance task cancellation W_3;
S3-6: collecting the executable tasks of the different aviation formations described in steps S3-1 to S3-5 gives the combat behavior space of the decision behavior model of the whole intelligent aviation soldier force system, A = {J_1, ..., J_8, H_1, ..., H_9, Y_1, ..., Y_5, R_1, ..., R_5, W_1, ..., W_3}.
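For reference, the 30-element behavior space of step S3-6 can be written out as a flat list of task identifiers; a minimal sketch (the identifiers simply mirror the symbols above and are not normative names):

    # Illustrative sketch of the step S3-6 behavior space A (30 actions in total).
    FIGHTER   = [f"J{i}" for i in range(1, 9)]    # J1..J8: patrols, escort, interception, return
    BOMBER    = [f"H{i}" for i in range(1, 10)]   # H1..H9: patrols, assaults, return
    AWACS     = [f"Y{i}" for i in range(1, 6)]    # Y1..Y5: detection patrols, mode, radar on/off, cancel
    JAMMER    = [f"R{i}" for i in range(1, 6)]    # R1..R5: area/route jamming, pattern, off, end
    RECON_UAV = [f"W{i}" for i in range(1, 4)]    # W1..W3: patrol reconnaissance, cancel

    ACTION_SPACE = FIGHTER + BOMBER + AWACS + JAMMER + RECON_UAV
    assert len(ACTION_SPACE) == 30  # matches the 30 selectable behaviors per state in step S6-1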
Further, the reward value generation algorithm based on the network connected domain maximization in the step S4 includes:
S4-1: count the color-histogram pixel proportions in the feature layers obtained in step S2; execute steps S4-2 to S4-4 separately for the four m×n feature layers representing the own-side information network connected domain, the own-side fire network connected domain, the enemy information network connected domain and the enemy fire network connected domain;
S4-2: color quantization; let the color interval of the layer be range, which contains the own-side combat entity information perception domain color value c_I^b, the own-side combat entity fire strike domain color value c_A^b and the enemy combat entity color value c_r, i.e. the following conditions are satisfied:

c_I^b ∈ range, c_A^b ∈ range, c_r ∈ range

divide range into N color intervals bin_i = [c_i1, c_i2], each called a bin of the color histogram, as follows:

range = bin_1 ∪ bin_2 ∪ … ∪ bin_N

where c_i1 is the lower bound of the color interval bin_i and c_i2 is its upper bound;
S4-3: perform color detection pixel by pixel and count the number of pixels whose color falls in each interval, obtaining the color histogram, expressed as:

h_i = (1 / (m·n)) · Σ_{p=0..m-1} Σ_{q=0..n-1} δ(c_pq − c_i), i = 1, 2, …, N

where h_i denotes the proportion of pixels whose color falls in the interval bin_i = [c_i1, c_i2]; c_pq is the color value of the pixel at position (p, q); c_i is the center color value of the color sub-interval bin_i: c_i = 0.5 × (c_i1 + c_i2); and δ(c_pq − c_i) is the color judgment function, of the specific form:

δ(c_pq − c_i) = 1 if c_pq ∈ bin_i, and δ(c_pq − c_i) = 0 otherwise;

S4-4: count the total pixel proportion of non-zero color values:

h_T = Σ_{i: c_i ≠ 0} h_i

where h_T denotes the total pixel proportion of non-zero color values;
S4-5: executing steps S4-2 to S4-4 for the four layers gives the total non-zero-color pixel proportions h_T(1), h_T(2), h_T(3) and h_T(4) of the four monochromatic feature layers, corresponding respectively to the own-side information network situation characteristic parameter I_b = h_T(1), the own-side fire network situation characteristic parameter A_b = h_T(2), the enemy information network situation characteristic parameter I_r = h_T(3) and the enemy fire network situation characteristic parameter A_r = h_T(4);
S4-6: based on the result of step S4-5, obtain the quantified combat-advantage parameters of both sides through weighted combination, with P_b denoting the combat advantage of the own side in the system confrontation and P_r the combat advantage of the enemy:

P_b = ω_1·I_b + ω_2·A_b
P_r = ω_1·I_r + ω_2·A_r

where ω_1 is the weight of the information network advantage in the overall combat advantage and ω_2 is the weight of the fire network advantage; the weights are adjusted within (0, 1) and satisfy ω_1 + ω_2 = 1;
S4-7: designing a formalized reward function based on the contrast of the operational advantages of the two parties; the core idea of the reward function for proposing the reward mechanism is that: comparing a one-time behavior decision made under the current situation with the comprehensive combat superiority of two parties formed after the interaction of the battlefield environment to obtain a reward value based on the current situation and the decision; specifically, if the decision makes the intelligent agent have the comprehensive combat advantage relative to the enemy, the reward is positive, and the greater the advantage is, the greater the absolute value of the reward value is; if the decision makes the intelligent agent have the disadvantage of comprehensive combat relative to the enemy, the reward is negative, and the greater the disadvantage, the greater the absolute value of the reward value; meanwhile, the reward parameters need to be normalized;
the reward function is expressed as: the proportion of the operational advantage of one intelligent agent to the total operational advantage of the two intelligent agents is used as a main reward value, a minimum value delta is matched to introduce a positive and negative numerical characteristic, and the following formula is shown as follows:
Figure BDA0002807170270000071
wherein R is an award value based on the current situation and decision; delta is a minimum value in the range of (10)-4,10-3) The significance is to avoid divide by zero while introducing normalized prize values into the positive and negative numerical features.
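A compact sketch of the reward generation of steps S4-1 to S4-7 follows. Because the feature layers are monochromatic, the histogram binning of steps S4-2/S4-3 is collapsed here into a direct count of non-zero pixels, and the closed-form reward expression mirrors the reconstruction given above; the weight and δ values are assumed:

    # Illustrative sketch of the reward value generation of step S4 (parameter values are assumed).
    import numpy as np

    def nonzero_pixel_ratio(layer: np.ndarray) -> float:
        """S4-2..S4-4: fraction h_T of pixels with a non-zero color value in a monochromatic layer."""
        colored = np.any(layer != 0, axis=-1)
        return float(colored.mean())

    def reward(layer_I, layer_II, layer_IV, layer_V, w1=0.5, w2=0.5, delta=5e-4) -> float:
        I_b = nonzero_pixel_ratio(layer_I)    # own information network situation parameter
        A_b = nonzero_pixel_ratio(layer_II)   # own fire network situation parameter
        I_r = nonzero_pixel_ratio(layer_IV)   # enemy information network situation parameter
        A_r = nonzero_pixel_ratio(layer_V)    # enemy fire network situation parameter
        P_b = w1 * I_b + w2 * A_b             # own combined combat advantage (S4-6)
        P_r = w1 * I_r + w2 * A_r             # enemy combined combat advantage (S4-6)
        # S4-7: normalized, sign-carrying reward; delta avoids division by zero.
        return (P_b - P_r) / (P_b + P_r + delta)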
Further, the specific process of step S5 is as follows:
S5-1: the transition of the combat situation is described probabilistically; the transition probability between states,

P(s' | s, a),

denotes the probability of reaching state s' after executing action a in state s; all the transition probabilities form a matrix called the environment transition matrix, denoted T;
S5-2: after the own side selects a behavior a, the change of the combat situation is fully expressed by the state transition matrix, and the air combat process conforms to a first-order Markov decision process, i.e. the transition probability depends only on the current state;
S5-3: combined with the probabilities in the state transition model, in each state s a behavior a is selected with a certain probability by following the policy π in that state, forming a "state-behavior" pair (s, a); the value of the "state-behavior" pair is given by the Q function and denoted Q_π(s, a);
S5-4: in behavior selection, a random selection component is added on top of the greedy strategy to form the behavior selection strategy μ, which selects one behavior from the behavior space in each state and transitions to the next state with a certain probability; the strategy μ is constructed by first setting an exploration constant τ ∈ (0, 1) and, at each behavior selection, generating a random number ρ in the interval [0, 1]:

μ: select a random behavior from the behavior space A if ρ < τ, and select argmax_a Q(s, a) otherwise;

taking τ = 0.2, there is a 20% probability of a freely chosen exploratory action.
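The greedy-random behavior selection strategy μ of step S5-4 corresponds to what is usually called ε-greedy selection; a minimal sketch with τ = 0.2 (the dictionary-based interface is an assumption):

    # Illustrative sketch of the greedy-random behavior selection strategy mu (step S5-4).
    import random

    def select_action(q_values: dict, tau: float = 0.2) -> str:
        """q_values maps each behavior in the behavior space to Q(s, a) for the current state s."""
        rho = random.random()                      # random number in [0, 1]
        if rho < tau:                              # with probability tau: free exploration
            return random.choice(list(q_values))
        return max(q_values, key=q_values.get)     # otherwise: greedy, maximum-utility behavior

    # Example: with tau = 0.2 roughly one decision in five explores a random behavior.
    q = {"J7": 0.8, "H5": 0.3, "Y1": 0.1}
    a = select_action(q)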
Further, the specific process of step S6 is as follows:
S6-1: use the situation perception information obtained in step S2, the remaining weapon-ammunition percentage obtained from the simulation platform and the aviation formation battle-loss ratio to form the state space vector, with s denoting the specific state space vector at a given moment; build a GRBF neural network consisting of an input layer, a discretization layer, a hidden layer and an output layer to discretize the Q-function values of "state-behavior" pairs, thereby partitioning the continuous state space and obtaining the "state-behavior" pair values corresponding to discrete states. The network input is the state space vector, and the output is the set of values of all "state-behavior" pairs obtained by selecting the different behaviors in the state corresponding to that vector; the input layer and the discretization layer have the same dimension as the state space vector, the hidden layer has m nodes in total, and the output layer has the same dimension as the behavior space; for the aviation Agent, 30 behaviors are selectable from the behavior space in every state, and the calculation formula is:

Q(s, a_j) = Σ_{i=1..m} w_ij · b̄_i(s), j = 1, 2, …, 30

where Q(s, a_j) is the Q-function value of executing the j-th behavior in state s, w_ij is the connection weight between the i-th hidden-layer node and the j-th output-layer node, and b̄_i(s) is the normalized output of the i-th hidden-layer node:

b̄_i(s) = b_i(s) / Σ_{k=1..m} b_k(s)

in which the radial basis function b_i(s) is calculated as:

b_i(s) = exp(−‖s − d_i‖² / (2σ_i²))

where d_i is the center of the i-th basis function, with the same dimension as s, σ_i is the width of the i-th basis function, and ‖s − d_i‖ is the Euclidean distance between the input state and the basis function center; after the number of hidden-layer nodes is set manually, the d_i and σ_i are all determined by the k-means clustering algorithm;
S6-2: carry out iterative learning and training of the force Agent based on the framework of step S6-1; the learning process is counted in cycles, with one complete combat round regarded as one learning cycle, and the decision process of the intelligent aviation combat system is described by steps S6-3 to S6-7;
S6-3: initialize the GRBF neural network of the aviation Agent, set the GRBF centers and widths by k-means clustering, set the maximum number of learning cycles K, and let k = 1;
S6-4: start the learning of the k-th iteration cycle and start the confrontation simulation, with t the current time, t = t_0 and s_t = s_0, where s_0 is the initial state;
S6-5: in the k-th iteration cycle, execute behavior a_t in state s_t following the policy μ; then, on the basis of the immediate reward R_t obtained by step S4, transition to the new state s_{t+1} and continue to execute behavior a_{t+1} following the policy μ; compute the GRBF network output corresponding to s_t and update the hidden-to-output layer weights with the temporal-difference algorithm according to:

w_{i,id(a_t)}^k = w_{i,id(a_t)}^{k−1} + α · [ R_t + γ · Σ_{i'} w_{i',id(a_{t+1})}^{k−1} · b̄_{i'}(s_{t+1}) − Σ_{i'} w_{i',id(a_t)}^{k−1} · b̄_{i'}(s_t) ] · b̄_i(s_t)

where w_{i,id(a_t)}^k is the connection weight between the i-th hidden-layer node and the id(a_t)-th output-layer node of the GRBF neural network obtained by iteration in the k-th learning cycle; w_{i,id(a_t)}^{k−1} is the connection weight between the i-th hidden-layer node and the id(a_t)-th output-layer node in the (k−1)-th learning cycle; w_{i,id(a_{t+1})}^{k−1} is the connection weight between the i-th hidden-layer node and the id(a_{t+1})-th output-layer node in the (k−1)-th learning cycle; b̄_i(s_t) is the normalized radial basis output for the state s_t described in S6-1; b̄_i(s_{t+1}) is that for the state s_{t+1}; id(a_t) is the index of behavior a_t; id(a_{t+1}) is the index of behavior a_{t+1}; α is the learning rate, taking values in (0, 1); and γ is the discount factor;
S6-6: let t = t + 1 and repeat step S6-5 until the confrontation simulation is decided and reaches the terminal state of the current iteration cycle;
S6-7: let k = k + 1 and repeat steps S6-4 to S6-6 until k > K.
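The following is a rough, non-normative sketch of the GRBF value network of step S6-1 and the iterative learning of steps S6-2 to S6-7. The Gaussian form of the basis functions, the width heuristic, the discount factor γ, the scikit-learn k-means call and the environment interface (`reset()` / `step()` returning the next state, the step-S4 reward and a terminal flag) are all assumptions, not part of the patent text:

    # Illustrative sketch only: GRBF Q-network (step S6-1) and TD training loop (steps S6-2 to S6-7).
    import numpy as np
    from sklearn.cluster import KMeans

    class GRBFQNetwork:
        def __init__(self, sample_states: np.ndarray, n_hidden: int, n_actions: int = 30):
            km = KMeans(n_clusters=n_hidden, n_init=10).fit(sample_states)
            self.d = km.cluster_centers_                      # centers d_i of the basis functions
            dists = np.linalg.norm(self.d[:, None] - self.d[None, :], axis=-1)
            self.sigma = dists.mean(axis=1) + 1e-6            # widths sigma_i (heuristic choice)
            self.w = np.zeros((n_hidden, n_actions))          # hidden-to-output weights w_ij

        def features(self, s: np.ndarray) -> np.ndarray:
            """Normalized radial basis outputs for state s."""
            b = np.exp(-np.linalg.norm(s - self.d, axis=1) ** 2 / (2 * self.sigma ** 2))
            return b / b.sum()

        def q_values(self, s: np.ndarray) -> np.ndarray:
            """Q(s, a_j) = sum_i w_ij * normalized b_i(s), one value per behavior."""
            return self.features(s) @ self.w

    def train(net: GRBFQNetwork, env, policy, K: int, alpha: float = 0.1, gamma: float = 0.9):
        """policy maps a Q-value vector to a behavior index, e.g. the greedy-random strategy mu."""
        for k in range(1, K + 1):                             # S6-3/S6-7: learning cycles 1..K
            s = env.reset()                                   # S6-4: initial state s_0
            a = policy(net.q_values(s))
            done = False
            while not done:                                   # S6-5/S6-6: step until the terminal state
                s_next, r, done = env.step(a)                 # immediate reward from step S4
                a_next = policy(net.q_values(s_next))
                td = r + gamma * net.q_values(s_next)[a_next] - net.q_values(s)[a]
                net.w[:, a] += alpha * td * net.features(s)   # temporal-difference weight update
                s, a = s_next, a_next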
The invention has the beneficial effects that:
1. The invention carries out new exploratory research on intelligent decision modeling of aviation formations based on a deep reinforcement learning algorithm. Compared with rule-set-based aviation formation decision models, the deep-reinforcement-learning-based intelligent behavior model requires less time for knowledge acquisition, adapts well to changes in the decision environment, is convenient to reuse and needs no manual rule maintenance, supporting continuously changing tasks in a highly dynamic battlefield environment.
2. The invention proposes a situation feature extraction and perception algorithm based on a two-dimensional GIS situation map. Using a situation-map information extraction mechanism similar to human decision-making, it builds the ability to perceive incomplete situation environment information in system confrontation, serves the generation of behavior values for the force agent during confrontation, and supports a complete closed loop of the reinforcement learning process.
3. The invention proposes a reward value generation algorithm based on network connected-domain maximization, designs a reward mechanism for continuous decision optimization with maximization of the own-side combat network connected domains at its core, and supports driving the force agent to complete decision generation and optimization under the excitation of incomplete confrontation situation information.
Drawings
In order to illustrate embodiments of the present invention or technical solutions in the prior art more clearly, the drawings which are needed in the embodiments will be briefly described below, so that the features and advantages of the present invention can be understood more clearly by referring to the drawings, which are schematic and should not be construed as limiting the present invention in any way, and for a person skilled in the art, other drawings can be obtained on the basis of these drawings without any inventive effort. Wherein:
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a situation feature extraction and perception algorithm flow based on a two-dimensional GIS situation map of the invention;
FIG. 3 is a two-dimensional confrontational map of the present invention, wherein (a) and (b) are two different examples of confrontational maps (the two-dimensional GIS situational map is the input to the situational feature extraction algorithm);
fig. 4 is a monochrome feature map layer obtained by color feature extraction according to the present invention, where (a) is a two-dimensional GIS situation map before color feature extraction, and (b) is the monochrome feature map layer obtained by color feature extraction;
FIG. 5 is a flow chart of a reward value generation algorithm based on network connected domain maximization according to the present invention;
FIG. 6 is a diagram illustrating situation characteristic parameters of two enemies and the my party obtained through color histogram statistics, wherein (a) is four monochrome characteristic image layers representing battle network situations, and (b) is a situation characteristic parameter obtained by performing color quantization on the four monochrome characteristic image layers;
FIG. 7 is a transition model of the present invention constructed from a plurality of probabilities;
FIG. 8 is the most effective behavior sequence selected by the invention;
FIG. 9 is a TD-Q based reinforcement learning framework of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Aiming at the technical bottlenecks in intelligent behavior modeling of the aviation soldier system, the invention explores, on the basis of a deep reinforcement learning algorithm, the intelligent behavior modeling problem of a multi-formation aviation soldier system whose combat situation is complex and changeable and whose state space is difficult to describe.
The invention mainly solves two technical problems: (1) effective state-space expression of the complex aerial battlefield situation and effective perception of the global situation by the aviation behavior model; (2) generation of effective continuous reward values for the confrontation decision behaviors of the aviation force under the excitation of incomplete global situation environment information. On this basis, an intelligent confrontation decision behavior model of the aviation soldier system is constructed, and an effective, high-return aerial formation combat behavior sequence space is formed through iterative training. This supports building an intelligent aviation force model based on reinforcement learning, and further provides an effective theoretical basis and technical support for gaining greater combat information advantage under incomplete battlefield situation perception, generating efficient air combat command decisions, analyzing, deducing and replaying air combat schemes, and improving the combat level of the aviation soldier system.
As shown in figure 1, the invention establishes an intelligent behavior model of the aviation soldier system based on global situation information. By comprehensively studying and judging the complex global aerial battlefield situation, it selects a series of key situation information items that influence the outcome of the system confrontation to form the state space elements of aviation combat and expresses the global situation mathematically as a state vector; it proposes a situation feature extraction and perception algorithm based on a two-dimensional GIS (Geographic Information System) situation map to obtain the element information that cannot be read directly from the state vector, thereby obtaining the global situation state space perceived by the aviation intelligent behavior model; and it proposes a reward value generation algorithm based on network connected-domain maximization to drive the aviation intelligent behavior model to evolve iteratively toward high return under the excitation of an incomplete global situation. Together, this scheme forms a complete technical chain for constructing an intelligent behavior model of the aviation force system.
Specifically, the aviation soldier system intelligent behavior modeling method based on global situation information comprises the following steps:
S1: the aviation force generally consists of fighter formations, bomber formations, early-warning aircraft formations, unmanned reconnaissance aircraft formations and electronic jammer formations. According to the air combat characteristics of the aviation force and the importance of the factors that influence the air combat outcome, extract key elements to construct an environment state space vector that effectively represents the aviation battlefield situation; select the own-side fire network connected-domain ratio A_b, the own-side information network connected-domain ratio I_b, the enemy fire network connected-domain ratio A_r, the enemy information network connected-domain ratio I_r, the remaining weapon ammunition percentage ξ and the aviation formation battle-loss ratio ε to form the environment state space vector S = ⟨A_b, I_b, A_r, I_r, ξ, ε⟩ describing the aviation battlefield situation;
S2: use the situation feature extraction and perception algorithm based on the two-dimensional GIS situation map to acquire the environment situation information A_b, I_b, A_r and I_r that cannot be obtained directly for the environment state space vector. On the two-dimensional GIS situation map with graphic features — in which the own-side information detection range, the own-side fire strike range and the detected enemy force entity positions are each drawn in clearly distinguishable colors — perform image feature extraction to obtain monochromatic feature layers of the own-side and enemy information network connected domains and fire network connected domains, so that the reinforcement learning agent can perceive the situation environment information;
in the image feature extraction part, a color-based feature extraction method extracts the image color features contained in the two-dimensional GIS situation map and zeroes the color values of non-feature pixels, obtaining the information network connected domain and the fire network connected domain formed by the own-side aviation combat entities as two monochromatic feature layers reflecting the own-side (intelligent blue) combat situation; enemy combat entities are located by the same feature extraction method, the corresponding entity information is retrieved from the own weapon equipment rule base, and the information network connected domain and fire network connected domain formed by the enemy (intelligent red) combat entities are simulated, generating two monochromatic feature layers reflecting the enemy combat situation;
S3: design the combat behavior space of the aviation soldier system; divide the executable task sets of the aircraft formations according to the combat characteristics of the force formations in the aviation soldier system, and integrate the executable tasks of all aircraft formations to form the combat behavior space of the aviation soldier system;
S4: generate an effective real-time reward mechanism with a reward value generation algorithm based on network connected-domain maximization;
perform color histogram statistics on the feature layers obtained in step S2, calculate the proportions of colored pixels of the information network connected domain and the fire network connected domain in each monochromatic feature layer to obtain numerical parameters characterizing the own-side and enemy situation features, obtain quantitative combat-advantage parameters of both sides through weighted combination, and design a reward function based on the comparison of combat advantages; this gives clearly positive or negative real-time reward feedback for every behavior decision of the agent, and the reward mechanism drives the agent toward continuously optimized behavior decisions;
S5: construct the state transition model and design the action selection strategy;
in the confrontation of the aviation force system, because the two sides make their decisions independently, the transition of the combat situation is non-deterministic and has to be described by probabilities, as shown in fig. 7. Because the air combat situation changes drastically, the transition of the aviation air-combat situation is considered to conform to a first-order Markov Decision Process (MDP), i.e. the state transition probability depends only on the current state. The behavior selection strategy is designed as a greedy-random algorithm that selects the behavior with the greatest utility in a given state while adding randomness to behavior selection, allowing the aviation formation to "explore" the state space;
S6: based on the temporal-difference algorithm, fuse the aviation state space vector, the state transition model and the action selection strategy formed in steps S1-S5 with the reward value generation algorithm based on network connected-domain maximization into an improved reinforcement learning framework for aviation combat confrontation, and carry out iterative learning and training of the force Agent on this framework.
As shown in figs. 2-3, the situation feature extraction and perception algorithm based on the two-dimensional GIS situation map in step S2 is as follows:
S2-1: image feature extraction; a frame of m×n two-dimensional GIS situation map based on the RGB color space is abstracted into a matrix

C = (c_ij), 0 ≤ i < m, 0 ≤ j < n

each element of which is a three-dimensional vector representing the RGB color value of the pixel at the corresponding position, as shown in the following formula:

c_ij = [r, g, b]_ij

where c_ij ∈ [RGB] is the three-dimensional RGB color value of the pixel at position (i, j), r is the red component, g is the green component and b is the blue component of the color, each ranging from 0 to 255;
S2-2: let the color value range of the information perception domain of the own-side combat entities be c_I^b, the color value of the fire strike domain of the own-side combat entities be c_A^b, and the color value of the enemy combat entities be c_r; steps S2-3 to S2-8 are executed for every frame of the two-dimensional GIS situation map;
S2-3: copy the two-dimensional GIS situation map and, pixel by pixel on copy layer I, judge whether the current pixel belongs to the own-side information perception area; if so, keep its color value, otherwise set the color value to 0:

c_ij^I = c_ij if c_ij ∈ c_I^b, and c_ij^I = 0 otherwise;

S2-4: copy the two-dimensional GIS situation map and, pixel by pixel on copy layer II, judge whether the current pixel belongs to the own-side fire strike area; if so, keep its color value, otherwise set the color value to 0:

c_ij^II = c_ij if c_ij = c_A^b, and c_ij^II = 0 otherwise;

S2-5: copy the two-dimensional GIS situation map and, pixel by pixel on copy layer III, judge whether the current pixel belongs to an enemy combat entity; if so, keep its color value, otherwise set the color value to 0:

c_ij^III = c_ij if c_ij = c_r, and c_ij^III = 0 otherwise;

S2-6: let the enemy combat entities be e_1, e_2, ..., e_p, and retrieve their corresponding information perception ranges and fire strike ranges from the weapon equipment rule base; execute step S2-7 and step S2-8 on the layer obtained after the processing of step S2-5;
S2-7: copy the layer obtained after the processing of step S2-5 and, pixel by pixel on copy layer IV, judge whether the current pixel belongs to an enemy combat entity; if so, assign the color value c_I^r to all pixels in the circle centered on the current pixel whose radius is the information perception range of the corresponding combat entity; otherwise keep the color value of the current pixel:

c_ij^IV = c_I^r for every pixel inside that circle if c_ij = c_r, and c_ij^IV = c_ij otherwise

where c_r is the color value of the enemy combat entities and c_I^r is the color value corresponding to the red-side information perception range;
S2-8: copy the layer obtained after the processing of step S2-5 and, pixel by pixel on copy layer V, judge whether the current pixel belongs to an enemy combat entity; if so, assign the color value c_A^r to all pixels in the circle centered on the current pixel whose radius is the fire strike range of the corresponding combat entity; otherwise keep the color value of the current pixel:

c_ij^V = c_A^r for every pixel inside that circle if c_ij = c_r, and c_ij^V = c_ij otherwise

where c_r is the color value of the enemy combat entities and c_A^r is the color value corresponding to the red-side fire strike range;
the intelligent aviation soldier system thus obtains from the two-dimensional GIS situation map four feature layers reflecting, respectively, the own-side and enemy information network connected domains and fire network connected domains, namely layer I, layer II, layer IV and layer V, completing situation feature extraction and perception, as shown in FIG. 4, in which (a) is the two-dimensional situation map before color feature extraction and (b) is a monochromatic feature layer obtained by color feature extraction;
In step S3, the aviation force comprises fighter formations, bomber formations, early-warning aircraft formations, unmanned reconnaissance aircraft formations and electronic jammer formations, and the combat behavior space of the intelligent aviation soldier force system is as follows:
S3-1: the combat aircraft comprise fighters and bombers; according to the combat characteristics of fighters, the executable tasks of a fighter formation comprise: area patrol J_1, take-off area patrol J_2, route patrol J_3, take-off route patrol J_4, escort J_5, take-off escort J_6, air interception J_7 and return J_8;
S3-2: according to the combat characteristics of bombers, the executable tasks comprise: area patrol H_1, take-off area patrol H_2, route patrol H_3, take-off route patrol H_4, area assault H_5, take-off area assault H_6, target assault H_7, take-off target assault H_8 and return H_9;
S3-3: according to the combat characteristics of the early-warning aircraft, its executable tasks comprise: area patrol detection Y_1, route patrol detection Y_2, early-warning aircraft detection mode (air, sea or alternating) Y_3, early-warning aircraft radar on/off Y_4 and detection task cancellation Y_5;
S3-4: according to the combat characteristics of the electronic jammer, its executable tasks comprise: area jamming R_1, route jamming R_2, jamming pattern setting (barrage jamming, aimed jamming) R_3, jamming switch-off R_4 and jamming termination R_5;
S3-5: according to the combat characteristics of the unmanned reconnaissance aircraft, its executable tasks comprise: area patrol reconnaissance W_1, route patrol reconnaissance W_2 and reconnaissance task cancellation W_3;
S3-6: collecting the executable tasks of the different aviation formations described in steps S3-1 to S3-5 gives the combat behavior space of the decision behavior model of the whole intelligent aviation soldier force system, A = {J_1, ..., J_8, H_1, ..., H_9, Y_1, ..., Y_5, R_1, ..., R_5, W_1, ..., W_3}.
As shown in fig. 5, the flow of the reward value generation algorithm based on the network connected domain maximization in step S4 is as follows:
s4-1: counting the pixel proportion of the color histogram in the feature map layer obtained in the step S2; respectively executing the steps S4-2 to S4-4 to four m multiplied by n characteristic image layers representing the own-party information network connected domain, the own-party fire network connected domain, the enemy information network connected domain and the enemy fire network connected domain;
s4-2: color quantization; setting the color interval of the map layer as range, wherein the range comprises the color value range of the information perception domain of the own combat entity
Figure BDA0002807170270000151
The fire striking domain color value of the own combat entity is
Figure BDA0002807170270000152
And the color value of the enemy combat entity is crNamely, the following conditions are satisfied:
Figure BDA0002807170270000153
randomly dividing range into N color intervals bini=[ci1,ci2]Each bin is called a bin of the color histogram, as follows:
range=bin1∪bin2U…∪binN
in the formula, ci1Is the color interval biniLower boundary of ci2Is the color interval biniThe upper bound of (c);
S4-3: perform color detection pixel by pixel and count the number of pixels whose color falls in each interval to obtain the color histogram, expressed as:

H = {h_1, h_2, …, h_N},  h_i = (1/(m·n)) · Σ_{p=0}^{m−1} Σ_{q=0}^{n−1} δ(c_pq − c_i)

where h_i denotes the proportion of pixels whose color falls in the interval bin_i = [c_i1, c_i2]; c_pq is the color value of the pixel at position (p, q); c_i is the center color value of the color sub-interval bin_i, c_i = 0.5 × (c_i1 + c_i2); and δ(c_pq − c_i) is a color decision function of the specific form:

δ(c_pq − c_i) = 1, if c_pq ∈ bin_i;  δ(c_pq − c_i) = 0, otherwise;
S4-4: count the total pixel proportion of non-zero color values:

h_T = Σ_{i: c_i ≠ 0} h_i

where h_T denotes the total pixel proportion of non-zero color values;
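As a concrete illustration of steps S4-2 to S4-4, the following Python sketch computes the color histogram of one m×n single-color feature layer and its total non-zero-color pixel proportion h_T; it assumes the layer is supplied as a 2-D array of scalar color codes (an RGB layer would first be mapped to such codes), and the default bin count is an illustrative assumption.

import numpy as np

def histogram_and_nonzero_ratio(layer: np.ndarray, n_bins: int = 16):
    """Steps S4-2..S4-4 (sketch): quantize the colors of one m x n feature layer
    into n_bins intervals, build the histogram h_i, and return the total
    proportion h_T of pixels whose color value is non-zero."""
    values = layer.reshape(-1).astype(float)      # flatten the m x n layer
    total = values.size
    hist, _ = np.histogram(values, bins=n_bins)   # pixel counts per color interval bin_i
    h = hist / total                              # h_i: per-bin pixel proportion
    h_T = np.count_nonzero(values) / total        # total proportion of non-zero colors
    return h, h_T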
S4-5: executing steps S4-2 to S4-4 four times yields the total non-zero-color pixel proportions h_T(1), h_T(2), h_T(3), h_T(4) of the four monochromatic feature layers, corresponding respectively to the own information network situation characteristic parameter I_b = h_T(1), the own fire network situation characteristic parameter A_b = h_T(2), the enemy information network situation characteristic parameter I_r = h_T(3) and the enemy fire network situation characteristic parameter A_r = h_T(4); the situation characteristic parameters of both sides obtained through color histogram statistics are shown in fig. 6, in which (a) shows the four monochromatic feature layers representing the combat network situation and (b) shows the situation characteristic parameters obtained by color quantization of the four monochromatic feature layers;
S4-6: based on the result of step S4-5, the quantified combat advantage parameters of the two sides are obtained by weighted integration, with P_b denoting the combat advantage of the own side in the system confrontation and P_r denoting the combat advantage of the enemy side in the system confrontation; the combat advantages of the two sides are:
P_b = ω_1·I_b + ω_2·A_b
P_r = ω_1·I_r + ω_2·A_r
where ω_1 represents the weight of the information network advantage in the comprehensive combat advantage and ω_2 represents the weight of the fire network advantage in the comprehensive combat advantage; the weight values are adjusted within (0, 1) and satisfy ω_1 + ω_2 = 1;
S4-7: design a formalized reward function based on the comparison of the combat advantages of the two sides; the core idea of the proposed reward mechanism is: a single behavior decision made under the current situation is evaluated against the comprehensive combat advantages of the two sides formed after interaction with the battlefield environment, yielding a reward value based on the current situation and decision; specifically, if the decision gives the agent a comprehensive combat advantage over the enemy, the reward is positive, and the greater the advantage, the greater the absolute value of the reward; if the decision leaves the agent at a comprehensive combat disadvantage relative to the enemy, the reward is negative, and the greater the disadvantage, the greater the absolute value of the reward; meanwhile, the reward parameter needs to be normalized;
Corresponding to this reward mechanism, the reward function is expressed as follows: the proportion of one agent's combat advantage relative to the total combat advantage of the two agents serves as the main reward value, combined with a very small value δ to introduce positive and negative numerical characteristics, as shown in the following formula:
R = (P_b − P_r) / (P_b + P_r + δ)
where R is the reward value based on the current situation and decision; δ is a very small value in the range (10^−4, 10^−3), whose significance is to avoid division by zero while giving the normalized reward value positive and negative numerical characteristics.
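A compact sketch of steps S4-5 to S4-7 follows; the closed-form reward R = (P_b − P_r)/(P_b + P_r + δ) reconstructed above, as well as the example weights ω_1 = ω_2 = 0.5 and δ = 10^−3, are assumptions consistent with the stated properties (normalized value, sign determined by the advantage comparison), not a verbatim transcription of the patent formula.

def combat_advantage(I: float, A: float, w1: float = 0.5, w2: float = 0.5) -> float:
    """Step S4-6 (sketch): weighted combination of the information-network and
    fire-network situation parameters; w1 + w2 = 1 (example values, adjustable in (0, 1))."""
    return w1 * I + w2 * A

def reward(I_b: float, A_b: float, I_r: float, A_r: float, delta: float = 1e-3) -> float:
    """Step S4-7 (sketch): normalized reward from the advantage comparison;
    the exact closed form is an assumption, and delta avoids division by zero."""
    P_b = combat_advantage(I_b, A_b)   # own comprehensive combat advantage
    P_r = combat_advantage(I_r, A_r)   # enemy comprehensive combat advantage
    return (P_b - P_r) / (P_b + P_r + delta)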
As shown in fig. 7 to 8, the specific process of step S5 is:
S5-1: the transition of the combat situation is described probabilistically; the transition probability between states is

T(s, a, s′) = P(s′ | s, a)

meaning the probability of reaching state s′ after executing behavior a in state s; all the transition probabilities form a matrix, called the environment transition matrix and denoted T;
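As an illustrative sketch only (the state and behavior encodings are assumptions), the environment transition matrix T of step S5-1 can be held as a mapping from a (state, behavior) pair to a probability distribution over successor states:

import random
from collections import defaultdict

# T[(s, a)][s_next] = P(s_next | s, a); the probabilities for each (s, a) sum to 1.
T = defaultdict(dict)
T[("s0", "J1")] = {"s1": 0.7, "s2": 0.3}   # example entry; values are illustrative

def sample_next_state(s: str, a: str) -> str:
    """Draw a successor state s' with probability T(s, a, s')."""
    states, probs = zip(*T[(s, a)].items())
    return random.choices(states, weights=probs, k=1)[0]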
s5-2: after the own party selects the behavior a, the change of the fighting situation is completely expressed by the state transition matrix, and the air combat process conforms to the first-order Markov decision process, namely the transition probability is only related to the current state;
S5-3: combining the probabilities in the state transition model, in each state s a behavior a is selected with a certain probability by following the policy π, forming a 'state-behavior' pair (s, a); the value of the 'state-behavior' pair is given by the Q function and denoted Q^π(s, a);
S5-4: in behavior selection, a random-selection component is added on top of the greedy strategy to form the behavior selection strategy μ, so that in each state a behavior is chosen from the behavior space and the system transfers to the next state with a certain probability; the construction of the behavior selection strategy μ first sets an exploration constant τ ∈ (0, 1); each time a behavior is selected, a random number ρ in the interval [0, 1] is generated, and:

a_t = a behavior drawn at random from the behavior space A, if ρ ≤ τ;
a_t = argmax_{a∈A} Q(s_t, a), if ρ > τ;

taking τ = 0.2, there is a 20% probability of freely choosing a behavior.
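The behavior selection strategy μ of step S5-4 is essentially an ε-greedy rule with exploration constant τ; a minimal Python sketch (with τ = 0.2 as above, and the Q values supplied as a plain list) is:

import random

def select_behavior(q_values: list, tau: float = 0.2) -> int:
    """Step S5-4 (sketch): with probability tau pick a random behavior index,
    otherwise pick the behavior with the largest Q value (greedy choice)."""
    rho = random.random()                      # random number in [0, 1]
    if rho <= tau:                             # exploration branch
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda j: q_values[j])  # greedy branch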
As shown in fig. 9, the specific process of step S6 is:
S6-1: the situation perception information obtained in step S2, the remaining weapon/ammunition percentage obtained from the simulation platform and the aviation soldier formation battle loss ratio are used to form the state space vector, with s denoting the specific state space vector at a given moment; a GRBF neural network consisting of an input layer, a discrete layer, a hidden layer and an output layer is built to discretize the Q function values of 'state-behavior' pairs, so as to partition the continuous state space and obtain the 'state-behavior' pair values corresponding to discrete states; the network input is the state space vector, and the output is the set of values of all 'state-behavior' pairs obtained by selecting different behaviors in the state corresponding to that vector; the network input layer and the discrete layer have the same dimension as the state space vector; the hidden layer of the network has m nodes in total, and the output layer has the same dimension as the behavior space; for an aviation soldier Agent, there are 30 selectable behaviors in the behavior space in each state, and the calculation formula is as follows:
Q(s, a_j) = Σ_{i=1}^{m} w_ij · b̄_i(s)

where Q(s, a_j) is the Q function value for executing the j-th behavior in state s, w_ij is the connection weight between the i-th node of the hidden layer and the j-th node of the output layer, and b̄_i(s) is the normalized output of the i-th node of the hidden layer:

b̄_i(s) = b_i(s) / Σ_{l=1}^{m} b_l(s)

where the radial basis function b_i(s) is calculated as:

b_i(s) = exp( −‖s − d_i‖² / (2σ_i²) )
where d_i is the center of the i-th basis function, with the same dimension as s, σ_i is the width of the i-th basis function, and ‖s − d_i‖ is the Euclidean distance between the input state and the center of the basis function; after the number of hidden-layer nodes is set manually, d_i and σ_i are all determined by the k-means clustering algorithm;
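The GRBF value network of step S6-1 can be sketched as follows: the centers d_i and widths σ_i come from k-means clustering over sampled state vectors, the hidden layer uses normalized Gaussian radial basis functions, and the output layer is a linear map to the 30 behaviors; the use of scikit-learn's KMeans, the width heuristic and all default values are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans

class GRBFQNetwork:
    """Sketch of the GRBF value network of step S6-1: normalized Gaussian RBF
    hidden layer, linear output layer with one node per behavior."""

    def __init__(self, state_samples: np.ndarray, n_hidden: int, n_actions: int = 30):
        km = KMeans(n_clusters=n_hidden, n_init=10).fit(state_samples)
        self.centers = km.cluster_centers_            # d_i, one center per hidden node
        # Width sigma_i: mean distance from each center to the other centers (heuristic assumption).
        dists = np.linalg.norm(self.centers[:, None] - self.centers[None, :], axis=-1)
        self.sigmas = dists.mean(axis=1) + 1e-6
        self.w = np.zeros((n_hidden, n_actions))      # hidden-to-output weights w_ij

    def basis(self, s: np.ndarray) -> np.ndarray:
        """Normalized radial basis activations b_i(s) / sum_l b_l(s)."""
        b = np.exp(-np.linalg.norm(s - self.centers, axis=1) ** 2 / (2 * self.sigmas ** 2))
        return b / (b.sum() + 1e-12)

    def q_values(self, s: np.ndarray) -> np.ndarray:
        """Q(s, a_j) = sum_i w_ij * normalized basis_i(s), for every behavior j."""
        return self.basis(s) @ self.w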
S6-2: iterative learning training of the force Agent is carried out based on the framework of step S6-1; the learning process is counted in cycles, and the completion of one round of combat is regarded as the completion of one learning cycle; the decision process of the intelligent aviation soldier combat system is described in steps S6-3 to S6-10;
S6-3: initialize the GRBF neural network of the aviation soldier Agent, set the centers and widths of the GRBF through k-means clustering, set the maximum number of learning cycles K, and let k = 1;
S6-4: start the learning of the k-th iteration cycle and start the confrontation simulation; let t be the current time, with t = 0 and s_t = s_0, where s_0 is the initial state;
S6-5: in the k-th iteration cycle, execute behavior a_t in state s_t following the policy μ; then, on the basis of the immediate reward r_t obtained as in step S4, transition to a new state s_{t+1} and continue by selecting behavior a_{t+1} following the policy μ; calculate the GRBF network output corresponding to s_t, and update the hidden-to-output layer weights with the temporal-difference algorithm according to the following formulas:

δ_t = r_t + Σ_{l=1}^{m} w_{l,id(a_{t+1})}^{k−1} · b̄_l(s_{t+1}) − Σ_{l=1}^{m} w_{l,id(a_t)}^{k−1} · b̄_l(s_t)

w_{i,id(a_t)}^{k} = w_{i,id(a_t)}^{k−1} + α · δ_t · b̄_i(s_t)
where w_{i,id(a_t)}^{k} is the connection weight, obtained by iteration in the k-th learning cycle, between the i-th node of the hidden layer and the id(a_t)-th node of the output layer of the GRBF neural network; w_{i,id(a_t)}^{k−1} is the connection weight between the i-th node of the hidden layer and the id(a_t)-th node of the output layer of the GRBF neural network in the (k−1)-th learning cycle; w_{i,id(a_{t+1})}^{k−1} is the connection weight between the i-th node of the hidden layer and the id(a_{t+1})-th node of the output layer of the GRBF neural network in the (k−1)-th learning cycle; b_i(s_t) is the radial basis function of state s_t described in S6-1; b_i(s_{t+1}) is the radial basis function of state s_{t+1}; id(a_t) is the index of behavior a_t; id(a_{t+1}) is the index of behavior a_{t+1}; and α denotes the learning rate, with value range (0, 1);
S6-6: let t = t + 1 and repeatedly execute step S6-5 until the confrontation simulation reaches a win-or-lose outcome, i.e., the terminal state of the current iteration cycle;
S6-7: let k = k + 1 and repeatedly perform steps S6-4 to S6-6 until k > K.
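Putting steps S6-3 to S6-7 together, the per-step update can be sketched as a SARSA-style temporal-difference rule applied to the GRBF output weights; the environment interface (env.reset / env.step), the undiscounted TD error and the default learning rate are assumptions, and GRBFQNetwork and select_behavior refer to the earlier sketches.

def train(net, env, n_cycles: int, alpha: float = 0.1, tau: float = 0.2):
    """Sketch of the iterative learning of steps S6-3..S6-7 (SARSA-style TD update)."""
    for k in range(1, n_cycles + 1):               # learning cycles k = 1..K (S6-3, S6-7)
        s = env.reset()                            # S6-4: start the confrontation simulation
        a = select_behavior(net.q_values(s), tau)  # behavior chosen following policy mu
        done = False
        while not done:                            # S6-5/S6-6: step until the terminal state
            s_next, r, done = env.step(a)          # immediate reward r_t from step S4
            a_next = select_behavior(net.q_values(s_next), tau)
            td = r + net.q_values(s_next)[a_next] - net.q_values(s)[a]  # TD error (undiscounted)
            net.w[:, a] += alpha * td * net.basis(s)   # update hidden-to-output weights for a_t
            s, a = s_next, a_next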
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. An aviation soldier system intelligent behavior modeling method based on global situation information is characterized by comprising the following steps:
S1: according to the air combat characteristics of the aviation soldiers and the importance of the relevant factors influencing the aviation soldier air combat result, key elements are extracted to construct an environment state space vector that effectively represents the aviation soldier air combat battlefield situation; the own fire network connected domain ratio A_b, the own information network connected domain ratio I_b, the enemy fire network connected domain ratio A_r, the enemy information network connected domain ratio I_r, the remaining weapon/ammunition percentage ξ and the aviation soldier battle loss ratio ε are selected to form the environment state space vector S = <A_b, I_b, A_r, I_r, ξ, ε> describing the aviation soldier battlefield situation;
S2: the environment situation information that cannot be directly acquired in the environment state space vector, namely A_b, I_b, A_r and I_r, is acquired using a situation feature extraction and perception algorithm based on a two-dimensional GIS situation map; on the basis of the two-dimensional GIS situation map with graphic features, i.e., the own information detection range, the own fire striking range and the positions of detected enemy force entities, obvious color distinction and image feature extraction are carried out to obtain the monochromatic feature layers of the own and enemy information network connected domains and fire network connected domains, realizing the perception of situation environment information by reinforcement learning;
in the image feature extraction part, a color-based feature extraction method is adopted to extract the image color features contained in the two-dimensional GIS situation map, and the color values of non-feature pixels are removed to obtain the information network connected domain and the fire network connected domain composed of the own aviation soldier system combat entities, serving as two monochromatic feature layers reflecting the combat situation of the own side, i.e., the intelligent blue party; the enemy combat entities are located by the feature extraction method, the corresponding entity information is retrieved from the own weapon equipment rule base, the information network connected domain and the fire network connected domain formed by the enemy, i.e., the intelligent red party, combat entities are simulated, and two monochromatic feature layers reflecting the enemy combat situation are generated;
s3: designing a combat behavior space of an army system of an aviation soldier; dividing an executable task set of the airplane formation according to the combat characteristics of the armed force formation in the aviation soldier system; the executable tasks of the airplane formation in the aviation soldier system are integrated to form a combat action space of the aviation soldier system;
s4: generating an effective real-time reward mechanism by adopting a reward value generation algorithm based on network connected domain maximization;
color histogram statistics are performed on the feature layers obtained in step S2, and the proportion of colored pixels of the information network connected domain and the fire network connected domain in each monochromatic feature layer is calculated to obtain numerical parameters representing the own and enemy situation characteristics; the quantified combat advantage parameters of the two sides are obtained through weighted integration, and a reward function based on the comparison of combat advantages is designed, so as to provide clear positive or negative real-time reward feedback for each behavior decision of the agent and drive the agent to make continuously optimized behavior decisions through the reward mechanism;
s5: constructing a state transition model and designing an action selection strategy;
the transition of the aviation soldier air combat situation conforms to a first-order Markov decision process, i.e., the state transition probability depends only on the current state; the behavior selection strategy is designed with a greedy-random algorithm, which selects the behavior with the maximum utility in a given state while adding randomness to the behavior selection, allowing the aviation soldier formation to 'explore' the state space;
S6: based on the temporal-difference algorithm, the aviation soldier state space vector, the state transition model and the behavior selection strategy algorithm formed in steps S1-S5 are fused with the reward value generation algorithm based on network connected domain maximization to form an improved reinforcement learning framework for aviation soldier combat confrontation, and iterative learning training of the force Agent is carried out based on this framework.
2. The aviation soldier system intelligent behavior modeling method based on global situation information as claimed in claim 1, wherein in step S2, the situation feature extraction and perception algorithm flow based on the two-dimensional GIS situation map is as follows:
S2-1: image feature extraction; a frame of the m×n two-dimensional GIS situation map based on the RGB color space is abstracted into a matrix

C = [c_ij]_{m×n}

each element of which is the three-dimensional vector of RGB color values of the pixel at the corresponding position, as shown in the following formula:

c_ij = [r, g, b]_ij

where c_ij ∈ [RGB], 0 ≤ i < m, 0 ≤ j < n, is the three-dimensional vector of RGB color values of the pixel at the corresponding position; r represents the red component of the color, g represents the green component and b represents the blue component, and the value ranges of the three elements are all 0-255;
S2-2: let the color value of the information perception domain of the own combat entity be c_b^I, the color value of the fire striking domain of the own combat entity be c_b^A and the color value of the enemy combat entity be c_r; execute steps S2-3 to S2-8 on each frame of the two-dimensional GIS situation map;
S2-3: copy the two-dimensional GIS situation map; on copy layer I, judge pixel by pixel whether the current pixel belongs to the own information perception area; if so, keep its color value, and if not, assign its color value to 0, as in the following formula:

c_ij^(I) = c_ij, if the pixel belongs to the own information perception area (c_ij = c_b^I);  c_ij^(I) = 0, otherwise;
S2-4: copy the two-dimensional GIS situation map; on copy layer II, judge pixel by pixel whether the current pixel belongs to the own fire striking area; if so, keep its color value, and if not, assign its color value to 0, as in the following formula:

c_ij^(II) = c_ij, if the pixel belongs to the own fire striking area (c_ij = c_b^A);  c_ij^(II) = 0, otherwise;
S2-5: copy the two-dimensional GIS situation map; on copy layer III, judge pixel by pixel whether the current pixel belongs to an enemy combat entity; if so, keep its color value, and if not, assign its color value to 0, as in the following formula:

c_ij^(III) = c_ij, if the pixel belongs to an enemy combat entity (c_ij = c_r);  c_ij^(III) = 0, otherwise;
S2-6: let the enemy combat entities be e_1, e_2, …, e_p; retrieve from the weapon equipment rule base the corresponding information perception ranges r_1^I, r_2^I, …, r_p^I and the corresponding fire striking ranges r_1^A, r_2^A, …, r_p^A; execute steps S2-7 and S2-8 on the layer obtained after the processing of step S2-6;
S2-7: copy the layer obtained after the processing of step S2-6; on copy layer IV, judge pixel by pixel whether the current pixel belongs to an enemy combat entity; if so, assign the color value c_r^I to all pixels within the circle centered on the current pixel whose radius is the information perception range of the corresponding combat entity; if not, keep the color value of the current pixel; that is, for each enemy combat entity e_k detected at pixel (i, j) (c_ij = c_r), the color of every pixel within distance r_k^I of (i, j) is set to c_r^I, where c_r is the color value of the enemy combat entity and c_r^I is the color value corresponding to the red information perception range;
S2-8: copy the layer obtained after the processing of step S2-6; on copy layer V, judge pixel by pixel whether the current pixel belongs to an enemy combat entity; if so, assign the color value c_r^A to all pixels within the circle centered on the current pixel whose radius is the fire striking range of the corresponding combat entity; if not, keep the color value of the current pixel; that is, for each enemy combat entity e_k detected at pixel (i, j) (c_ij = c_r), the color of every pixel within distance r_k^A of (i, j) is set to c_r^A, where c_r is the color value of the enemy combat entity and c_r^A is the color value corresponding to the red fire striking range;
therefore, the intelligent aviation soldier system obtains four feature layers respectively reflecting own party and enemy information network connected domains and fire network connected domains from the two-dimensional GIS situation map, namely, the layer I, the layer II, the layer IV and the layer V, and completes situation feature extraction and perception.
3. The method for modeling the intelligent behavior of the aviation soldier system based on the global situation information as claimed in claim 1, wherein the aviation soldier forces comprise fighter formation, bomber formation, early warning machine formation, unmanned reconnaissance machine formation and electronic interference machine formation, and the operation behavior space of the intelligent aviation soldier force system is as follows:
S3-1: the combat aircraft comprise fighters and bombers; according to the combat characteristics of the fighter, the executable tasks of the fighter formation comprise: area patrol J1, take-off area patrol J2, route patrol J3, take-off route patrol J4, protection J5, take-off protection J6, air interception J7 and return J8;
S3-2: according to the combat characteristics of the bomber, the executable tasks of the bomber formation comprise: area patrol H1, take-off area patrol H2, route patrol H3, take-off route patrol H4, area assault H5, take-off area assault H6, target assault H7, take-off target assault H8 and return H9;
S3-3: according to the combat characteristics of the early warning aircraft, its executable tasks comprise: area patrol detection Y1, route patrol detection Y2, setting the early warning aircraft detection mode Y3, early warning aircraft radar switch-on/switch-off Y4 and cancellation of the detection task Y5;
S3-4: according to the combat characteristics of the electronic jammer, its executable tasks comprise: area interference R1, route interference R2, setting an interference pattern R3, switching off interference R4 and ending interference R5;
S3-5: according to the combat characteristics of the unmanned reconnaissance aircraft, its executable tasks comprise: area patrol reconnaissance W1, route patrol reconnaissance W2 and cancellation of the reconnaissance task W3;
S3-6: collating the executable tasks of the different aviation soldier formations described in steps S3-1 to S3-5 yields the combat action space of the decision-making action model of the whole intelligent aviation soldier force system: A = {J1, …, J8, H1, …, H9, Y1, …, Y5, R1, …, R5, W1, …, W3}.
4. The aviation soldier system intelligent behavior modeling method based on global situation information as claimed in claim 1, wherein in step S4, the reward value generation algorithm flow based on network connected domain maximization is as follows:
S4-1: perform color histogram pixel-proportion statistics on the feature layers obtained in step S2; execute steps S4-2 to S4-4 on each of the four m×n feature layers representing the own information network connected domain, the own fire network connected domain, the enemy information network connected domain and the enemy fire network connected domain;
S4-2: color quantization; let the color interval of the layer be range, where range contains the color value c_b^I of the information perception domain of the own combat entity, the color value c_b^A of the fire striking domain of the own combat entity and the color value c_r of the enemy combat entity, i.e., the following condition is satisfied:

{c_b^I, c_b^A, c_r} ⊆ range
Divide range into N color intervals bin_i = [c_i1, c_i2], each of which is called a bin of the color histogram, as follows:

range = bin_1 ∪ bin_2 ∪ … ∪ bin_N

where c_i1 is the lower bound of the color interval bin_i and c_i2 is the upper bound of the color interval bin_i;
S4-3: perform color detection pixel by pixel and count the number of pixels whose color falls in each interval to obtain the color histogram, expressed as:

H = {h_1, h_2, …, h_N},  h_i = (1/(m·n)) · Σ_{p=0}^{m−1} Σ_{q=0}^{n−1} δ(c_pq − c_i)

where h_i denotes the proportion of pixels whose color falls in the interval bin_i = [c_i1, c_i2]; c_pq is the color value of the pixel at position (p, q); c_i is the center color value of the color sub-interval bin_i, c_i = 0.5 × (c_i1 + c_i2); and δ(c_pq − c_i) is a color decision function of the specific form:

δ(c_pq − c_i) = 1, if c_pq ∈ bin_i;  δ(c_pq − c_i) = 0, otherwise;
S4-4: count the total pixel proportion of non-zero color values:

h_T = Σ_{i: c_i ≠ 0} h_i

where h_T denotes the total pixel proportion of non-zero color values;
S4-5: executing steps S4-2 to S4-4 four times yields the total non-zero-color pixel proportions h_T(1), h_T(2), h_T(3), h_T(4) of the four monochromatic feature layers, corresponding respectively to the own information network situation characteristic parameter I_b = h_T(1), the own fire network situation characteristic parameter A_b = h_T(2), the enemy information network situation characteristic parameter I_r = h_T(3) and the enemy fire network situation characteristic parameter A_r = h_T(4);
S4-6: based on the result of step S4-5, the quantified combat advantage parameters of the two sides are obtained by weighted integration, with P_b denoting the combat advantage of the own side in the system confrontation and P_r denoting the combat advantage of the enemy side in the system confrontation; the combat advantages of the two sides are:
P_b = ω_1·I_b + ω_2·A_b
P_r = ω_1·I_r + ω_2·A_r
where ω_1 represents the weight of the information network advantage in the comprehensive combat advantage and ω_2 represents the weight of the fire network advantage in the comprehensive combat advantage; the weight values are adjusted within (0, 1) and satisfy ω_1 + ω_2 = 1;
S4-7: design a formalized reward function based on the comparison of the combat advantages of the two sides; the core idea of the proposed reward mechanism is: a single behavior decision made under the current situation is evaluated against the comprehensive combat advantages of the two sides formed after interaction with the battlefield environment, yielding a reward value based on the current situation and decision; specifically, if the decision gives the agent a comprehensive combat advantage over the enemy, the reward is positive, and the greater the advantage, the greater the absolute value of the reward; if the decision leaves the agent at a comprehensive combat disadvantage relative to the enemy, the reward is negative, and the greater the disadvantage, the greater the absolute value of the reward; meanwhile, the reward parameter needs to be normalized;
the reward function is expressed as follows: the proportion of one agent's combat advantage relative to the total combat advantage of the two agents serves as the main reward value, combined with a very small value δ to introduce positive and negative numerical characteristics, as shown in the following formula:
R = (P_b − P_r) / (P_b + P_r + δ)
where R is the reward value based on the current situation and decision; δ is a very small value in the range (10^−4, 10^−3), whose significance is to avoid division by zero while giving the normalized reward value positive and negative numerical characteristics.
5. The aviation soldier system intelligent behavior modeling method based on global situation information as claimed in claim 1, wherein the specific process of step S5 is as follows:
S5-1: the transition of the combat situation is described probabilistically; the transition probability between states is

T(s, a, s′) = P(s′ | s, a)

meaning the probability of reaching state s′ after executing behavior a in state s; all the transition probabilities form a matrix, called the environment transition matrix and denoted T;
s5-2: after the own party selects the behavior a, the change of the fighting situation is completely expressed by the state transition matrix, and the air combat process conforms to the first-order Markov decision process, namely the transition probability is only related to the current state;
S5-3: combining the probabilities in the state transition model, in each state s a behavior a is selected with a certain probability by following the policy π, forming a 'state-behavior' pair (s, a); the value of the 'state-behavior' pair is given by the Q function and denoted Q^π(s, a);
S5-4: in behavior selection, a random-selection component is added on top of the greedy strategy to form the behavior selection strategy μ, so that in each state a behavior is chosen from the behavior space and the system transfers to the next state with a certain probability; the construction of the behavior selection strategy μ first sets an exploration constant τ ∈ (0, 1); each time a behavior is selected, a random number ρ in the interval [0, 1] is generated, and:

a_t = a behavior drawn at random from the behavior space A, if ρ ≤ τ;
a_t = argmax_{a∈A} Q(s_t, a), if ρ > τ;

taking τ = 0.2, there is a 20% probability of freely choosing a behavior.
6. The aviation soldier system intelligent behavior modeling method based on global situation information as claimed in claim 1, wherein the specific process of step S6 is as follows:
S6-1: the situation perception information obtained in step S2, the remaining weapon/ammunition percentage obtained from the simulation platform and the aviation soldier formation battle loss ratio are used to form the state space vector, with s denoting the specific state space vector at a given moment; a GRBF neural network consisting of an input layer, a discrete layer, a hidden layer and an output layer is built to discretize the Q function values of 'state-behavior' pairs, so as to partition the continuous state space and obtain the 'state-behavior' pair values corresponding to discrete states; the network input is the state space vector, and the output is the set of values of all 'state-behavior' pairs obtained by selecting different behaviors in the state corresponding to that vector; the network input layer and the discrete layer have the same dimension as the state space vector; the hidden layer of the network has m nodes in total, and the output layer has the same dimension as the behavior space; for an aviation soldier Agent, there are 30 selectable behaviors in the behavior space in each state, and the calculation formula is as follows:
Q(s, a_j) = Σ_{i=1}^{m} w_ij · b̄_i(s)

where Q(s, a_j) is the Q function value for executing the j-th behavior in state s, w_ij is the connection weight between the i-th node of the hidden layer and the j-th node of the output layer, and b̄_i(s) is the normalized output of the i-th node of the hidden layer:

b̄_i(s) = b_i(s) / Σ_{l=1}^{m} b_l(s)

where the radial basis function b_i(s) is calculated as:

b_i(s) = exp( −‖s − d_i‖² / (2σ_i²) )
where d_i is the center of the i-th basis function, with the same dimension as s, σ_i is the width of the i-th basis function, and ‖s − d_i‖ is the Euclidean distance between the input state and the center of the basis function; after the number of hidden-layer nodes is set manually, d_i and σ_i are all determined by the k-means clustering algorithm;
S6-2: iterative learning training of the force Agent is carried out based on the framework of step S6-1; the learning process is counted in cycles, and the completion of one round of combat is regarded as the completion of one learning cycle; the decision process of the intelligent aviation soldier combat system is described in steps S6-3 to S6-10;
S6-3: initialize the GRBF neural network of the aviation soldier Agent, set the centers and widths of the GRBF through k-means clustering, set the maximum number of learning cycles K, and let k = 1;
S6-4: start the learning of the k-th iteration cycle and start the confrontation simulation; let t be the current time, with t = 0 and s_t = s_0, where s_0 is the initial state;
S6-5: in the k-th iteration cycle, execute behavior a_t in state s_t following the policy μ; then, on the basis of the immediate reward r_t obtained as in step S4, transition to a new state s_{t+1} and continue by selecting behavior a_{t+1} following the policy μ; calculate the GRBF network output corresponding to s_t, and update the hidden-to-output layer weights with the temporal-difference algorithm according to the following formulas:

δ_t = r_t + Σ_{l=1}^{m} w_{l,id(a_{t+1})}^{k−1} · b̄_l(s_{t+1}) − Σ_{l=1}^{m} w_{l,id(a_t)}^{k−1} · b̄_l(s_t)

w_{i,id(a_t)}^{k} = w_{i,id(a_t)}^{k−1} + α · δ_t · b̄_i(s_t)
where w_{i,id(a_t)}^{k} is the connection weight, obtained by iteration in the k-th learning cycle, between the i-th node of the hidden layer and the id(a_t)-th node of the output layer of the GRBF neural network; w_{i,id(a_t)}^{k−1} is the connection weight between the i-th node of the hidden layer and the id(a_t)-th node of the output layer of the GRBF neural network in the (k−1)-th learning cycle; w_{i,id(a_{t+1})}^{k−1} is the connection weight between the i-th node of the hidden layer and the id(a_{t+1})-th node of the output layer of the GRBF neural network in the (k−1)-th learning cycle; b_i(s_t) is the radial basis function of state s_t described in S6-1; b_i(s_{t+1}) is the radial basis function of state s_{t+1}; id(a_t) is the index of behavior a_t; id(a_{t+1}) is the index of behavior a_{t+1}; and α denotes the learning rate, with value range (0, 1);
S6-6: let t = t + 1 and repeatedly execute step S6-5 until the confrontation simulation reaches a win-or-lose outcome, i.e., the terminal state of the current iteration cycle;
S6-7: let k = k + 1 and repeatedly perform steps S6-4 to S6-6 until k > K.
CN202011375776.3A 2020-11-30 2020-11-30 Aviation soldier system intelligent behavior modeling method based on global situation information Active CN112560332B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011375776.3A CN112560332B (en) 2020-11-30 2020-11-30 Aviation soldier system intelligent behavior modeling method based on global situation information


Publications (2)

Publication Number Publication Date
CN112560332A true CN112560332A (en) 2021-03-26
CN112560332B CN112560332B (en) 2022-08-02

Family

ID=75045501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011375776.3A Active CN112560332B (en) 2020-11-30 2020-11-30 Aviation soldier system intelligent behavior modeling method based on global situation information

Country Status (1)

Country Link
CN (1) CN112560332B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8879426B1 (en) * 2009-09-03 2014-11-04 Lockheed Martin Corporation Opportunistic connectivity edge detection
CN101964019A (en) * 2010-09-10 2011-02-02 北京航空航天大学 Against behavior modeling simulation platform and method based on Agent technology
CN108021754A (en) * 2017-12-06 2018-05-11 北京航空航天大学 A kind of unmanned plane Autonomous Air Combat Decision frame and method
CN111488992A (en) * 2020-03-03 2020-08-04 中国电子科技集团公司第五十二研究所 Simulator adversary reinforcing device based on artificial intelligence

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEIREN KONG et al.: "UAV autonomous aerial combat maneuver strategy generation with observation error based on state-adversarial deep deterministic policy gradient and inverse reinforcement learning", Electronics *
YAN Xu et al.: "Agent-based simulation evaluation method for combat unit task sustainability" (基于agent的作战单元任务持续性仿真评估方法), Journal of System Simulation (系统仿真学报) *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112990452A (en) * 2021-05-06 2021-06-18 中国科学院自动化研究所 Man-machine confrontation knowledge driving type decision-making method and device and electronic equipment
CN113268865A (en) * 2021-05-12 2021-08-17 中国人民解放军军事科学院评估论证研究中心 Aircraft behavior modeling construction method based on regular flow chain
CN113268865B (en) * 2021-05-12 2022-02-22 中国人民解放军军事科学院评估论证研究中心 Aircraft behavior modeling construction method based on regular flow chain
CN113283110A (en) * 2021-06-11 2021-08-20 中国人民解放军国防科技大学 Situation perception method for intelligent confrontation simulation deduction
CN113283110B (en) * 2021-06-11 2022-05-27 中国人民解放军国防科技大学 Situation perception method for intelligent confrontation simulation deduction
CN113505538A (en) * 2021-07-28 2021-10-15 哈尔滨工业大学 Unmanned aerial vehicle autonomous combat system based on computer generated force
CN113505538B (en) * 2021-07-28 2022-04-12 哈尔滨工业大学 Unmanned aerial vehicle autonomous combat system based on computer generated force
CN114330093A (en) * 2021-10-26 2022-04-12 北京航空航天大学 Multi-platform collaborative intelligent confrontation decision-making method for aviation soldiers based on DQN
CN115909027A (en) * 2022-11-14 2023-04-04 中国人民解放军32370部队 Situation estimation method and device
CN115909027B (en) * 2022-11-14 2023-06-09 中国人民解放军32370部队 Situation estimation method and device
CN117852319A (en) * 2024-03-07 2024-04-09 中国人民解放军国防科技大学 Space target visibility judging method for space foundation situation awareness system
CN117852319B (en) * 2024-03-07 2024-05-17 中国人民解放军国防科技大学 Space target visibility judging method for space foundation situation awareness system

Also Published As

Publication number Publication date
CN112560332B (en) 2022-08-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant